June 5, 2020

What is the work product of data science?

My family asks me to explain what I do. I have tried to explain it many times without success. Most of the time the conversation comes down to: “Okay, just tell us what you make. What do you produce as a result of your efforts?” I've come to realise that even that is not an easy question to answer when it comes to data science.

It's also one of the most commonly asked questions on the internet. What does a data scientist deliver? What do they do that everyone is so hyped about? Many people, most of whom want to become data scientists, are not clear on this. The reason is that there is no one thing a data scientist does. What you deliver will depend on the company you work for and the type of work your employer requires from you. That’s why understanding what you will be creating is an integral part of understanding a position before applying for it or accepting an offer.

Let’s see some examples of what you might be building as a data scientist. See if there is a favourite of yours in this list. It will be useful to know where your preferences lie when choosing a job.

  • A trained model: data scientists analyse data and train machine learning models on it. In this case, your main goal will be to create the best-performing model based on your business’s requirements. At the end of the project, you will have a trained model to deliver.
  • An API: making an API means giving end-users a simplified way to interact with your model. Let’s say your model predicts the number of bananas that will be sold tomorrow in Amsterdam based on the neighbourhood. Your API will then be set up in such a way that it receives a neighbourhood name and returns the predicted number of bananas. It is just a way for the outside world to interact with your model.
  • A complete pipeline: on top of your trained model, a complete pipeline might be expected. A pipeline is a complete product that includes reading the data, analysis, model building and even preparing the results in a format that the end-user can access easily. If you ask me, everything that comes after preparing your model is (in a technical sense) not data science anymore, but some companies prefer people who can be involved in the product-building part of a data science team. This does not mean that you need engineering skills, though. Many times you’d be part of a bigger team that is building the whole pipeline.
  • A presentation on the results of the analysis: this deliverable can be requested on its own or together with other deliverables, and it is very common. You not only analyse data and build a model but also present your work in a way that is understandable and clear. Most of the time this deliverable is required to draw a conclusion from your work and has the potential to influence business decisions. The key skill you need to channel here is explaining a complex process in terms non-technical people can understand.
  • A report: it is possible that no one will interact with your model actively and that the work will be periodic. In this case, the model will be scheduled to run, say, every week and it will produce some sort of report. This report could be the result of some analysis on the data or, again, the prediction of a certain value. The format could be anything: an Excel sheet, a CSV file, or even one or more visualisations.
  • A dashboard: this is my least favourite deliverable. Sometimes, companies need you to prepare a way to interact with the data that they have. Though this is more of a data analyst's type of work, I’ve seen many companies ask data scientists to do it.
  • Advice: sometimes the business people need the data scientist to understand what’s going on with the data and explain it to them. You might say that this sounds like a data analyst's job, but it’s not necessarily. With the recent demand for AI ethics and explainability, companies want to understand how the ML models fit their data so well and why. So your responsibility might not end with a model that performs very well. You might need to prepare an explanation of why a certain model fits the data and which features drive changes in the predicted value, in what direction and by how much.
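
To make the banana example from the list a bit more concrete, here is a minimal sketch in Python of how a trained model and the API around it relate to each other. Everything in it is an assumption for illustration: the sales numbers are made up, the "model" is just an average per neighbourhood, and the names train_model and handle_request are hypothetical. A real API would usually sit behind a web framework, but the idea is the same: a neighbourhood name goes in, a predicted number of bananas comes out.

```python
from statistics import mean

# Hypothetical past daily banana sales per neighbourhood (made-up numbers).
PAST_SALES = {
    "Jordaan": [120, 130, 110],
    "De Pijp": [200, 220, 210],
}


def train_model(sales):
    """'Train' a toy model: the average daily sales per neighbourhood."""
    return {name: mean(values) for name, values in sales.items()}


def handle_request(model, request):
    """API layer: take a neighbourhood name, return a predicted banana count."""
    name = request.get("neighbourhood")
    if name not in model:
        # The caller gets a clear message instead of a crash.
        return {"error": f"unknown neighbourhood: {name}"}
    return {"neighbourhood": name, "predicted_bananas": round(model[name])}


# The trained model is one deliverable; the function wrapping it is another.
model = train_model(PAST_SALES)
response = handle_request(model, {"neighbourhood": "Jordaan"})
print(response)
```

Notice that the person calling handle_request never needs to know how the model was built. That separation is what makes the model and the API two different deliverables, even though the analysis and model building behind them are the same.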

These are the most common types of products or deliverables I've seen or heard data scientists deliver so far. You can hear these mentioned by many of the guests on the So you want to be a data scientist? podcast. Even though they seem different, many times the work you do is similar: analysis and model building. What changes is what you present as the end result.

Nevertheless, what you work towards will be a big part of your job. It is important to feel motivated by what you’re preparing. So make sure to ask about it in your job interviews. Learn what type of responsibilities will be given to you as a data scientist. It will not only show that you are truly interested and knowledgeable but also give you great insight into the job you’re applying for.