Most TDS posts bring together theoretical expertise and real-world impact—whether it’s a short tutorial or a thorough explanation of cutting-edge research. This week, you should check out one of the best examples we’ve seen in recent memory of a project that bridges the theory/application gap.
Have you ever stood in front of your recycling bin not sure whether an item belonged in it or not? Duncan Wang, Arnaud Guzman-Annès, Sophie Courtemanche-Martel, and Jake Hogan walk us through their project on AI’s potential to streamline and optimize recycling programs. They do a stellar job laying out the technical foundations (in this case, CNNs, or convolutional neural networks), and then proceed to show how leveraging CNNs can drive real change and produce a positive outcome in an area that affects all of us. We hope you feel inspired reading about their work—we certainly did!
Inspiration can, and does, come in multiple flavors and shades. More often than not, it’s not a specific topic or approach that resonates with our readers, but rather a sense of useful insight: you leave the article feeling better equipped to think through a problem or execute a solution. Seth Billiau’s exploration of permutation feature importance is a solid example of the first kind. Looking at black-box models and how difficult it is to explain the way they produce their results, Seth shows how zooming in on individual features (shuffling one at a time and measuring how much the model’s performance suffers) can reveal which ones hold the most predictive value. Taking a few steps back (or is it forward?), TDS podcast host Jeremie Harris and his recent guest Evan Hubinger chatted about inner alignment problems and how we might address safety concerns that lie just beyond the field’s immediate horizon. On the other end of the spectrum, Prukalpa shared the lessons she and her team learned from their collaboration with TechStyle, a fashion retailer. She focuses on how they implemented a brand-new data warehouse that matched the company’s needs, and on the challenges they faced along the way.
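For readers who want to see the permutation feature importance idea in motion, here is a minimal sketch in plain Python. It is not taken from Seth’s post. It assumes a regression model exposed as a `predict` function and uses mean squared error as the metric. The names `permutation_importance` and `toy_model` and the toy dataset are hypothetical, chosen for illustration. The logic: shuffle one feature column at a time and record how much the error rises. The more the model relies on a feature, the bigger the jump.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Average squared difference between targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    X: List[Row],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled (higher = more important)."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, predict(X))
    importances: List[float] = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target while
            # leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_squared_error(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical "black box" model that secretly uses only feature 0.
def toy_model(rows: List[Row]) -> List[float]:
    return [3.0 * row[0] for row in rows]


# Hypothetical toy data: feature 0 drives the target, feature 1 is irrelevant.
X = [[float(i), float((7 * i) % 11)] for i in range(50)]
y = [3.0 * row[0] for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)  # feature 0 gets a large positive score; feature 1 scores exactly 0.0
```

In practice you would compute the scores on held-out data rather than the training set. scikit-learn also ships a ready-made version as `sklearn.inspection.permutation_importance`.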
For a post that balances practical knowledge and whimsy, look no further than Matt Sosna’s writing on the surprising similarities between algorithms and the behavior of large schools of fish. Taking his PhD research as a point of departure, Matt draws fascinating connections between the collective decision-making he’d observed in fish and ensemble learning algorithms, and wonders whether "fish schools act as a massive neural network."
Finding unexpected links between disparate areas and concepts is one of the hallmarks of productive learning. If you’re looking for hands-on advice on how to expand your data science skill set, Julia Nikulski compiled several excellent tips based on her personal journey, ranging from creating your own datasets to diving into academic research. For even more ideas (and more inspiration), once you’ve read Julia’s post, continue with TDS Editor Elliot Gunn’s recent conversation with Randy Au, a Quantitative User Experience Researcher at Google. Randy emphasizes the importance of educating stakeholders about the value of data insights, and of educating oneself in the specific domains that are relevant to the organization you’ve joined.
If you’re looking for other ways to improve your workflows and add efficiency to your day-to-day data science processes, we’re not done yet! Noah Burbank published a comprehensive deep dive into the art of choosing the right text-classification models based on his work at Salesforce, where the choice often hinges on "how well our training data captures the diversity of the data it’ll score in production." Meanwhile, Parul Pandey offers a way out of a common conundrum: the moment you realize how difficult it is to decipher your own codebase. As a solution, Parul proposes automating your project’s structure, from creating a reusable template to pushing your code to GitHub.
Wherever you are in your data science journey, we hope you found something new to sink your teeth into this week—whether it’s a tool, a process, or a practical application of a lofty idea. Thank you for reading, discussing, and supporting our work.
Until the next Variable,
TDS Editors
Recent additions to our curated topics:
Getting Started
- Node2vec Explained Graphically by Remy Lau
- 20 Lessons Learned Going from Junior Data Scientist to Chief Data Scientist by Mathias Gruber
- LinkedIn’s Response to Prophet—Silverkite and Greykite by Eryk Lewinson
Hands-On Tutorials
- How Vicious Do You Think Your Social Media Comments Are? by Navjot Bians
- How to Tune Hyperparameters of Machine Learning Models by Chanin Nantasenamat
- How to Solve a Staff Scheduling Problem with Python by Khuyen Tran
Deep Dives
- Seat of Knowledge: AI Systems with Deeply Structured Knowledge by Gadi Singer
- Can We Study Snowpack with Satellites? by Will Norris
- Raspberry Pi Gardening: Monitoring a Vegetable Garden Using a Raspberry Pi—Part 1 by Christian Hollinger