The Soft Skills You Need to Succeed as a Data Scientist

Overview of Your Journey

Introduction
Skill 1 – Communication
Skill 2 – Collaboration
Skill 3 – Curiosity
Skill 4 – Project Management
Skill 5 – Mentoring
Wrapping Up

Introduction

When you are working on your career as a data scientist, it’s easy to focus on the hard skills. You might want to learn a new ML algorithm like an SVM with a non-linear kernel, a new software technology like MLflow, or a new AI trend like ChatGPT.

These skills are comfortable to learn because it is easy to measure success in them. Take MLflow as an example. You might first start to learn about what MLflow can provide to your ML lifecycle. You learn about model artifacts, ML project structure, and model registration. You finish a course, spend a few hours reading the user guide, and even implement it in a real-life project. Great! After doing this, you can confidently say that you know some MLflow and can add it as a skill on your CV.

What about a soft skill like, say, time management? How would you go through the same process? Really stop and think about this. There are certainly books on time management you could read, but it is not nearly as concrete as reading the documentation on MLflow. You could implement time management in your daily routines, but it is not as demonstrable as implementing MLflow in an ML project. You could list time management in your CV, but what does that even mean? 😧

It is a fact of life that soft skills are harder to measure, simply by their nature of being less tangible. Many people draw the conclusion that soft skills are less valuable than hard skills. But this is a critical mistake! Just because something is difficult to measure does not mean that it is not worth working on!

I’m sure that we’ve all experienced a colleague who had time management down to such a degree that their output was almost twice that of others. This is a boost that is almost impossible to obtain with hard skills. Nowhere have I ever seen someone learn MLflow, and then have twice the output of other data scientists. So even if soft skills are hard to measure, they can provide value well beyond many typical hard skills 🔥

This is especially true for Data Science. The positive stereotype of a data scientist is someone with great problem-solving capabilities. The negative stereotype is that he or she is a bit lacking in common soft skills needed to succeed in business environments. By spending some of your time working on soft skills, you can gain a massive advantage and use this to forge your own career path.

In this blog post, you will learn the top 5 soft skills a data scientist needs to succeed. This is, of course, just based on my own opinion. However, that opinion has been shaped by watching many other data scientists and seeing what has made some of them stand out from the rest.

Skill 1 – Communication

The first one is as standard as they come. You should learn how to communicate well. This includes many things:

You should be able to communicate the findings of an exploratory data analysis (EDA) phase clearly. And for the love of god, tailor it to the audience. A CEO does not want to hear about the choice of distribution you made to fit the data, or which Docker image you used for running the experiment. The CEO might be enthusiastic about data science, but he or she has hundreds of other things to consider as well. Give a high-level overview of the EDA and focus on business outcomes for the CEO.
When giving talks, make sure that you say something that is of value to the audience. This sounds obvious, but apparently, it’s not. Don’t ramble on about complex architecture or intricate hyperparameter tuning just to make yourself sound smart. This is just a defense mechanism. Rather, make sure that what you say leaves the listener with something of value. By doing this, you will suddenly have people coming up to you wanting to discuss what you talked about.
When speaking to others, make sure that you listen to what they are saying. This is not the same as nodding and waiting for your turn to speak. To actually listen means to put yourself into the shoes of the speaker, and to carefully try to understand their perspective. Say that a product owner explains to you that they want a faster pace and less exploration. Rather than waiting to explain why they are wrong, take a minute to actually listen. The product owner is maybe evaluated on progress, and might not understand the upside to exploratory analysis. And the exploratory analysis might have gone on a bit longer than necessary if you are being honest. Try to listen and then work with the product owner to find a good solution for both of you.

In addition, you should work on your writing. No, really. It’s not awful, but it could be shorter. You have a tendency to write complicated sentences, make unnecessary explanations, and drag on about details that don’t matter half as much as you think. Don’t worry, I do too 😳

Brevity can signal confidence. Compare the following two responses to an email about a feature request due next Friday:

I think that will be possible by next Friday. I will start by looking into the problem, understand the solution space, and then work on it in an iterative manner. I will ask for advice if needed, and otherwise progress towards the goal of finishing the feature by next Friday as you requested. I’m sure that everything will go well and that I will deliver a satisfactory result by next Friday.
I will work towards delivering that by next Friday. I will ask for advice if any blockers emerge.

There is not much more information in the first statement, except for vague talk about the iterations of your work and promises of satisfactory results. Imagine being a project manager or a tech lead who has to read stuff like this day in and day out. Cut this out! Say what you want to say in a professional manner, and then move on to solving the problems at hand.

Skill 2 – Collaboration

Very little impactful data science work is done by a single data scientist. Sure, there are a few exceptions. But most impactful data science is done by teams of data professionals with backing from other occupations like front-end/back-end developers, platform engineers, testers, domain experts, project managers, and the list goes on.

This means that collaboration is not only useful but completely essential for successful data science. Here are a few ways you can work on your collaboration skills:

When depending on other roles, understand the interface between your work and theirs. Say that you are collaborating with a data engineer who writes Spark in Databricks or in Synapse Analytics. The output of their work is tables that are cleaned and in the correct format for data science. But what is the correct format? This depends completely on which features you want to use, and which algorithms you plan to use. You don’t want to end up in a situation where a data engineer meticulously cleans a column in a table that you immediately drop because you are not planning to use it. This is a symptom of bad collaboration.
When other roles depend on you, plan early for how to secure good collaboration. Say you are planning to develop an ML model that ingests real-time data and predicts a value. The prediction will then be sent to both the user in a front-end app and also to a Power BI dashboard for internal tracking. Then both the front-end developers and the data analysts should be kept in the loop regarding the future format of the data. You might even provide them with mock data for the exact structure of the data. In this way, you ensure that the people that depend on you don’t have to wait for you to finish to do their work. When people are collaborating well, it’s like parallel processing. When they are not, it becomes single-threaded and everything slows down.
When collaborating with other data scientists, delegate clear ownership. Since data scientists come from a variety of backgrounds, their skill sets are quite different. You might have a data scientist who is really good at getting those extra percentages of accuracy for a model. Another data scientist might be really good at putting models into production and monitoring for data drift. Different people can take ownership of different aspects based on their experience. Every data scientist can still contribute to every aspect, though.
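The mock data idea from the second point can be made concrete with very little code. Below is a minimal sketch, assuming a prediction payload with four fields. The `Prediction` class, its field names, and the `make_mock_predictions` helper are all hypothetical and only serve as an illustration; the real structure is whatever you agree on with the front-end developers and data analysts.

```python
import json
import random
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Prediction:
    """Hypothetical payload agreed on with the consumers of the model."""

    entity_id: str          # who or what the prediction is for (assumed field)
    predicted_value: float  # the model output
    model_version: str      # lets consumers see which model produced the value
    created_at: str         # ISO 8601 timestamp in UTC


def make_mock_predictions(n: int, seed: int = 0) -> list[dict]:
    """Generate n fake predictions that follow the agreed structure."""
    rng = random.Random(seed)  # seeded so everyone works with the same mock data
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return [
        asdict(
            Prediction(
                entity_id=f"user-{i:03d}",
                predicted_value=round(rng.uniform(0, 100), 2),
                model_version="0.0.1-mock",
                created_at=(start + timedelta(minutes=i)).isoformat(),
            )
        )
        for i in range(n)
    ]


if __name__ == "__main__":
    # Share this JSON with the teams that depend on your output.
    print(json.dumps(make_mock_predictions(3), indent=2))
```

With something like this in hand, the front-end and dashboard work can start before the real model exists, and any disagreement about field names or types surfaces early.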

Finally, there is the more generic stuff that has nothing to do with data science. Everyone working on the same team deserves to be treated with respect. This is independent of their technical background, skill level, gender, or any other factor that is irrelevant for common courtesy.

People sometimes make mistakes. It’s important to acknowledge mistakes while making it clear that they are a natural part of learning. You should work towards a team culture where mistakes can be admitted without fear of retaliation or ridicule. If you fail to create such a culture, then mistakes will not stop happening. They will simply happen under the radar and resurface later when fixing them is a lot harder.

Skill 3 – Curiosity

I’ve always felt that data scientists are naturally curious people. They like to learn a new ML algorithm or keep track of new developments in their field. But it varies a lot more whether this curiosity extends to technologies, methods, and approaches in nearby disciplines.

Some data scientists are excited to learn more about software development, design principles, project management, data analysis, data engineering, business impact, and so on. Others want to stick to their own bubble and only work on data science. While this is perfectly fine, you should not be surprised if you are evaluated lower than colleagues that have the curiosity to explore nearby areas.

Is this unfair? Not really 🤷

The data scientist who knows software development is simply more useful than the one who doesn’t. Software development skills open up more projects that a data scientist can work on. The interface to roles like front-end/back-end developers and data engineers also suddenly becomes a lot easier to manage.

Others only view your output through their own lens. Say a tester is tasked with running integration tests for several components that you have written. If your components are well-documented, composed modularly, use good coding standards, and have unit tests, then the job of the tester is a lot easier. On the other hand, if you simply have a lot of free-flowing code in a massive R script, then tracking down errors becomes a lot of work for the tester. Naturally, the tester will think that the person that puts effort into the software aspect is more skilled. This is independent of what the ML model within the script does.
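To make the contrast concrete, here is a minimal sketch of what a small, documented, and tested component could look like in Python. The function `fill_missing_with_median` is a hypothetical example of a data cleaning step, not something taken from a real project.

```python
import unittest
from statistics import median
from typing import Optional


def fill_missing_with_median(values: list[Optional[float]]) -> list[float]:
    """Replace None entries with the median of the observed values.

    Raises ValueError if there are no observed values to take a median of.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


class FillMissingWithMedianTest(unittest.TestCase):
    def test_fills_gaps_with_median(self):
        self.assertEqual(
            fill_missing_with_median([1.0, None, 3.0]), [1.0, 2.0, 3.0]
        )

    def test_rejects_all_missing(self):
        with self.assertRaises(ValueError):
            fill_missing_with_median([None, None])


if __name__ == "__main__":
    unittest.main()
```

A tester who receives a component like this can see what it promises, how it fails, and how to check it, without reading through a long script first.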

Business impact is another classic. One of the most common pieces of negative feedback data scientists get is that they are too removed from business objectives. A data scientist who understands the business and comes up with data science use cases that generate ROI will naturally be more valuable to the business.

So how does one work on this broad curiosity? There is only a limited amount of time you can spend on other disciplines, but I have two general suggestions:

Spend some time trying to understand what other roles are really working on when talking to them. It is pretty quick to pick up some knowledge of KPIs and OKRs when talking to a business analyst, but this knowledge can be super valuable. Personally, I know very little about computer networks since I don’t have an informatics background. But I do know why one would use a private network, the (extremely) rough outline of how one would set this up, and when it might be appropriate to invest in this. I’ve gained this knowledge mainly from talking to network engineers. This knowledge, although pretty surface level, is valuable for knowing when to contact a network engineer about this.
When working on projects, jump on the opportunity to do something slightly out of your comfort zone. Does someone need to implement automatic linting in a continuous integration pipeline? I’ll have a go at that! Even though you don’t know much about CI/CD or YAML files, you will probably figure it out. If not, you can always ask for help. By jumping at opportunities to learn something new you…learn something new. I know, it’s pretty profound 😉

Skill 4 – Project Management

Think back on previous projects that have involved a team effort. Think about those projects that have failed to meet deadlines, or have gone over budget. What is the common denominator? Is it too little hyperparameter tuning? Too poor model artifact logging?

Probably not, right? One of the most common reasons for project failures is bad project management. Project management has the responsibility of breaking a project down into manageable phases. Each phase should then be continuously estimated for the amount of work left.

There is a lot more than this that a dedicated project manager is responsible for, ranging from sprint execution to retrospectives. But I don’t want to focus on project management as a role. I want to focus on project management as a skill. In the same way that anyone in a team can display leadership as a skill, anyone in a team can also display project management as a skill. And boy, is this a useful skill for a data scientist.

Let’s for concreteness focus on estimating a single phase. The fact of the matter is that much of data science work is very difficult to estimate:

How long will a data cleaning phase take? Completely depends on the data you are working with.
How long will an exploratory data analysis phase take? Completely depends on what you find out along the way.

You get my point. This has led many to think that estimating the duration of the phases in a data science project is pointless.

I think this is the wrong conclusion. What is more accurate is that estimating the duration of a data science phase is difficult to do accurately before starting the phase. But project management is working with continuous estimation. Or, at least, this is what good project management is supposed to be doing 😁

Imagine instead of estimating a data cleaning job in advance that you are one week into the task of cleaning the data. You now know that there are three data sources stored in different databases. Two of the databases are lacking proper documentation, while the last one is lacking data models but is pretty well documented. Some of the data is missing in all three data sources, but not as much as you feared. What can you say about this?

Certainly, you don’t have zero information. You know that you won’t finish the data cleaning job tomorrow. On the other hand, you are very sure that three months are way too long for this job. Hence you have a kind of distribution giving the probability of when the phase is finished. This distribution has a "mean" (a guess for the duration of the phase) and a "standard deviation" (the amount of uncertainty in the guess).

The important point is that this conceptual distribution changes every day. You get more and more information about the work that needs to be done. Naturally, the "standard deviation" will shrink over time as you become more and more certain of when the phase will be finished. It is your job to quantify this information for stakeholders. And don’t use the distribution language I’ve used here when explaining this to stakeholders. That can stay between us.
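If you want to put rough numbers on this conceptual distribution, one option is a three-point (PERT-style) estimate. This is a technique I am swapping in purely for illustration; it is only one of several ways to do it. The `PhaseEstimate` class and the week values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhaseEstimate:
    """Three-point estimate for the duration of a phase, in weeks."""

    optimistic: float   # best case
    most_likely: float  # your single best guess
    pessimistic: float  # worst case

    @property
    def mean(self) -> float:
        # Classic PERT weighting: the most likely value counts four times.
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6

    @property
    def std_dev(self) -> float:
        # PERT rule of thumb: one sixth of the optimistic-to-pessimistic range.
        return (self.pessimistic - self.optimistic) / 6

    def summary(self) -> str:
        """Stakeholder-friendly wording without any distribution language."""
        return (
            f"between {self.optimistic:g} and {self.pessimistic:g} weeks "
            f"(best guess about {self.mean:.1f})"
        )


if __name__ == "__main__":
    # Hypothetical estimates made one week apart for the same phase.
    week_1 = PhaseEstimate(optimistic=3, most_likely=4, pessimistic=6)
    week_2 = PhaseEstimate(optimistic=3.5, most_likely=4, pessimistic=5)
    for label, estimate in [("week 1", week_1), ("week 2", week_2)]:
        print(label, estimate.summary(), f"uncertainty={estimate.std_dev:.2f}")
```

The second estimate has a narrower range and a smaller `std_dev`, which is the shrinking uncertainty described above. The `summary` method is what you would actually say out loud.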

Having a data scientist able to say something like this is super valuable:

"I think this phase will take between 3 and 6 weeks. I can give you an updated estimate in a week that will be more accurate."

Skill 5 – Mentoring

Mentoring more junior data scientists is often seen as a necessary evil. It’s honest work for sure, but something that is not emphasized much. If the junior data scientist would magically learn the concepts themselves that would be preferable, right?

As you probably can tell, I disagree. Mentoring junior data scientists is immensely helpful for both you and them. Here are three reasons:

You learn a lot from explaining concepts: This one is pretty straightforward. By explaining concepts and ideas to junior data scientists, you learn the concepts better yourself. I’ve often found that explaining something to a junior data scientist has helped me articulate something more clearly. It is often only when someone asks you that you realize that you might not understand something as well as you thought. This is a great opportunity to learn more about the topic. In addition, you can highlight to the junior data scientist that it is okay to not know everything. In fact, this is inevitable.
You get minor management experience: Soon, you might be stepping into more senior roles such as chief data scientist. Roles like these often do not have formal management responsibilities for other employees. Yet, there is the expectation that you can lead and influence others. Like any other skill, this comes with practice. In your day-to-day data cleaning and model tuning, you get little practice with this. So if you never mentor anyone, then don’t be surprised if you struggle to lead and influence others. And if you are weighing the possibility of going into a management track, then having had no mentoring responsibilities in the past is a bit of a red flag. Why have you never mentored anyone? Is it because you don’t want to do it, or because other people don’t want you to do it? Neither of these possibilities looks great.
You get to build a connection with junior data scientists: Sure, there is a natural power imbalance between a mentor and a junior data scientist. Nevertheless, it is often the mentor that the junior data scientist will connect most with if the mentor does a good job. By taking responsibility and mentoring junior data scientists, you will quickly find yourself surrounded by people whom you have mentored. These people often look up to you and value your advice. This is not such a bad situation to be in.

My advice is to become a mentor early in your career. The three benefits above only apply if you take the mentoring job seriously. If you do a poor job mentoring, you get few of the benefits and might even get a reputation for being a bad mentor 😬

Some companies have very low expectations for mentoring. You might only be asked to have a coffee with the junior data scientist once a month. I would advise going beyond the call of duty. Let the junior data scientist know that they can come to you with problems and questions. Stepping up like this for junior data scientists is a sign that you can take on responsibilities without being explicitly asked.

Wrapping Up

In this blog post, we’ve seen how soft skills can be super valuable for data scientists who want to move forward with their careers. When interviewing senior candidates in data science, I look at the soft skills they have accumulated as much as the hard skills. Write me a comment if there are other soft skills that you think are essential for data scientists to have.

If you are interested in data science, programming, or anything in between, then feel free to follow me on LinkedIn and say hi ✋
