March 7, 2020

12 data science realities unknown to beginners

So you want to become a data scientist? Close your eyes and tell me what you see yourself doing.

I can’t really hear you but I’m going to assume it sounded something more or less like this: “I’m in front of my computer doing modelling using the latest and coolest algorithms and deciding the next steps of the business that I’m working for. I present my findings, everyone is extremely impressed and it changes the way that we do business.” This is what I had in mind when I was an aspiring data scientist. I never really stopped to think about the tiny little annoying details that might come my way.

From what I’ve experienced so far, being a data scientist has its fair share of … let’s call them side-effects. In this article, I will walk you through some of the day-to-day realities a data scientist most likely deals with, in the hope of illustrating what the job is actually like.

#1: You’re probably going to receive data that is messy beyond explanation, and you will need to track down 5 different people to explain 5 different parts of it to you, often in incompatible ways.

#2: Prior to implementing something that is going to be useful, you will work on many things that are going to be discarded.

#3: You will spend more time than you can imagine on data cleaning and feature engineering. After all, that is often where the magic happens.

#4: You might actually need to use Excel or build extremely simple products, depending on the problem. It is not lazy, but rather smart, to start small and build up to more complex solutions.

#5: You’re going to have to learn to ward people off when they come to you asking for dashboards or fancy visualisations of their data. Though it is sometimes fun to work on a simple side project.

#6: Knowing data science is not enough; you also need to know how to handle data responsibly. You have to make sure you follow compliance requirements and data protection rules such as GDPR.

#7: Clients are going to come asking about the latest thing they heard of (the shiny new algorithm they read about in Wired magazine or a new explainability tool) even if it has nothing to do with the project you’re on. Most of the time, it will be your responsibility to explain why it does or does not fit the project.

#8: You’re not actually going to develop algorithms. Those are already made, packaged and delivered for you. It sure helps to know how everything works but your relationship with the algorithms themselves will probably only consist of tuning their parameters.

#9: You are going to have to explain what machine learning/data science/AI can and cannot achieve before, during and after a project.

#10: You cannot hide behind your computer after receiving a piece of data. Understanding your problem’s domain is one of the most important parts of the job. You are going to have to learn a good deal about how “the thing that the data is collected from” works.

#11: You do have to convince business stakeholders of the value of your solution. Even though we live in a world hyped beyond explanation about data and AI, it still takes effort to bring everyone on board, especially when you are changing how something is done in your company.

#12: You also have to get your business stakeholders to understand the non-deterministic nature of your solution. Machine learning solutions do not base their results on physical rules; the results are based on patterns in the data. This fact is sometimes hard to get across to stakeholders who are used to hearing “cat”, “dog” or “rabbit” instead of “dog with a probability of 0.879”.
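
To make that last point a bit more concrete, here is a minimal sketch of what “dog with a probability of …” can look like in code. Everything in it is an assumption made for illustration: the raw scores are made up rather than produced by a real model, `softmax` is just one common way to turn scores into probabilities, and the 0.8 threshold in the hypothetical `decide` helper is an arbitrary choice that you would normally agree on with your stakeholders.

```python
import math


def softmax(scores):
    """Turn raw model scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps exp() numerically stable.
    exps = {label: math.exp(score - highest) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def decide(probabilities, threshold=0.8):
    """Return the most likely label, or "unsure" if the model is not confident enough."""
    label, probability = max(probabilities.items(), key=lambda item: item[1])
    if probability >= threshold:
        return label, probability
    return "unsure", probability


# Made-up scores for a single picture; a real model would produce these.
raw_scores = {"cat": 0.5, "dog": 3.0, "rabbit": 0.2}

probabilities = softmax(raw_scores)
label, confidence = decide(probabilities)

print(f"Prediction: {label} with a probability of {confidence:.3f}")
```

The stakeholder usually only wants the label, but the probability is what tells you how much to trust it. Raising the threshold makes the solution say “unsure” more often, which is a business trade-off rather than a purely technical one.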

If you go online and read answers to Quora questions on “the reality of being a data scientist”, you’re likely going to see similar or even worse-sounding facts from others. Nevertheless, these facts should not discourage you from pursuing a career in data science.

If you ask me, these little “side-effects” are the things that keep me on my toes and make the job even more fun for me. I like that the data does not come extremely clean every time. The process of cleaning helps me understand the data better and learn things about it that I would otherwise not have realized. I also like studying different domains. The nature of my job requires me to learn about different industries, and no matter how uninterested I think I am in an industry, I always end up learning something that fascinates me.

All I want is for you to be aware of what to expect from a data scientist position. You probably won’t have everything you’re looking for from your first day on the job. But as long as you enjoy what you’re doing enough to deal with the side-troubles, you will be fine.