Early-career data scientists often worry about their coding and math skills, or about whether or not they’ve mastered enough algorithms. As we see time and again, though, the answer to the most complicated questions often lies in combining the right tools with a clear approach. Learning to do that is easier said than done, of course. Here are six recent articles that take on thorny problems and tackle them with curiosity and patience—we hope they inspire you to try something new.
- Learn how to leverage machine learning and statistics to measure social impact. Julia Nikulski was curious to know how the makeup of a company’s board of directors might affect its performance in areas like sustainability and human rights. Using multiple data sources and techniques that draw on NLP and regression analysis, Julia shows the clear effect of placing the right people in the right positions to drive better corporate policy.
- Discover a new method to calculate a crucial metric. Population density might not be something most of us think about on a daily basis, or ever, but it’s a key stat for governments, healthcare systems, and urban planners (among many others). Nick Jones walks us through the powerful features of the High Resolution Settlement Layer (HRSL), a "large and ingeniously-produced dataset portraying the number of people in each 30-meter grid cell across most of the earth." He also explains how to query it with just a few lines of code.
- Catch up on the latest in the exciting world of augmentations. If you’re only planning to read one deep learning-focused blog post this week, you might as well go for Jonathan Laserson’s fascinating explainer on Test Time Augmentation, contrastive loss, and how we’ve finally arrived at a moment in which self-supervised learning is becoming a reality.
- Make sense of propensity score stratifications. Leihua Ye has published a series of informative posts in recent weeks, focused on the challenge of inferring causal relations from massive amounts of data. In the latest installment, Leihua turns to propensity scores, and shares both the theoretical background and hands-on details you need to take advantage of stratification methods.
- Find out if you’re doing anomaly detection right. "There is no scarcity in offers and solutions in business intelligence, but it’s often hard to evaluate the performance and quantify the potential impact the tool can have on the business," says Julia Bohutska. She goes on to explain how her team built an evaluation pipeline based on business data in order to measure their success at anomaly detection.
- Explore the power of Importance-Weighted Regression (IWR). Rama Ramakrishnan’s series on deriving optimal policies from data is now complete; in this final post, Rama turns to a three-step process that will allow you to make informed business decisions using a single model, one that directly predicts the difference in outcomes between different actions.
Thank you for joining us this week—here’s to learning, sharing knowledge, and finding new ways to reduce complexity in our work (and beyond). We’re always grateful for the time you spend with our authors’ articles, and for your support of their writing.
Until the next Variable, TDS Editors
Recent additions to our curated topics:
Getting Started
- How to Build Your Data Analytics Team by Louise de Leyritz
- How to Communicate More Effectively as a Data Scientist by 👩🏻‍💻 Kessie Zhang
- How xticks and xticklabels Really Work: a Walkthrough by Henry Alpert
Hands-On Tutorials
- When a Uniform Threshold Is Not Enough by Sébastien Gilbert
- Search Algorithms—Concepts and Implementation by Siwei Causevic
- Why Start Using sktime for Forecasting? by Joanna Lenczuk
Deep Dives
- Train Your Own Chess AI by Logan Spears
- Transformers, Can You Rate the Complexity of Reading Passages? by Peggy Chang
- Mindful Machines: Neuroscience & Critical Theory for Ethical AI by Haaya Naushan