In a recent post about the benefits of data marts, Vicky Yu highlighted a surprising stat: just a few years ago, data scientists would spend up to 80 percent of their work hours wrangling and cleaning data. Even if the percentage is lower these days (as some surveys suggest), that still leaves relatively little time for analysis and insight generation. Which makes one wonder: isn’t that supposed to be the core of a data professional’s job?
This week, we turn to the crucial (if sometimes under-discussed) area of Data Analysis, and share several recent articles that focus on how analysts should work—and how they can make the most of their finite time and powerful skills.
- Learn how to find the best solution to a given problem. A solid analysis should produce good decisions. As Julia Kho explains, that’s precisely what optimization does: her new article walks us through the different levels of analytics and the various components of data-informed problem-solving.
- Organizations can help team members use their time more effectively. For Mikkel Dengsøe, it’s important to examine why data analysts end up spending a lot of time on tedious, repetitive tasks instead of on, well, analyzing data. It’s only then that teams can restructure and refocus their most valuable workflows.
- Are we doing exploratory data analysis all wrong? "EDA is not a specific set of instructions one can execute – it is a way of thinking about the data and a way of practicing curiosity." Viyaleta Apgar‘s recent post is a thought-provoking invitation to reevaluate the way we approach analysis, and to avoid rushing into the process without reflecting on our own assumptions, biases, and blind spots.
- Take a glimpse into the day-to-day experience of a healthcare data analyst. Data-focused work can look strikingly different across industries and workplaces, which is why it’s so useful to learn about the real-life experiences of practitioners. Rashi Desai generously shares her firsthand impressions of data-analytics work in the healthcare sector, and dispels some common misconceptions along the way.
Looking to explore a few more topics this week? We certainly hope so! Here are some recent standouts we think you’ll enjoy:
- In our latest Author Spotlight, we had the distinct privilege of chatting with Tessa Xie about the twists and turns of Data Science career paths and the benefits of public writing.
- How should you compare regression models that have all or some variables in common? Sachin Date‘s recent deep dive explores this question in great detail.
- Suneeta Mall published a deep dive of her own on an equally fascinating issue: the problem of labeling errors and how they affect the outcomes of deep learning projects.
- Are you looking for a new job (or planning to start soon)? Don’t miss Emma Ding‘s handy resource on the three job-search mistakes you should absolutely avoid.
- For any creative tinkerer who’d like to experiment with a Cool New Thing, here’s Chanin Nantasenamat‘s tutorial on building a real-time transcription app in Python.
- For the more theory-inclined, here’s a thorough – and fascinating— exploration of multi-dimensional Fourier transformations in convolutional neural networks, courtesy of Sascha Kirch.
- Finally, we kicked off the new month with our June Edition, which rounded up some of our best articles on the SHAP library and explainable AI.
Your support means so much to us, whether it’s reading and engaging with our authors’ work, sharing it with your friends and colleagues, or becoming a Medium member. Thank you!
Until the next Variable,
TDS Editors