April 30, 2020

7 things I've learned in 2 years of being a junior data scientist

Getting a data science job is the goal, but it is not the end. On your first day, you are likely going to find yourself among a group of people who seem like they know what they’re doing at all times. I'm going to be honest: it’s scary and overwhelming at first. You work hard to learn and improve your skills, and just when you get hired and think you are good enough, self-doubt comes rushing in again.

I've had my fair share of self-doubt in my first couple of months, or maybe even years, of working as a junior data scientist. Still, it helped me learn a lot about myself and the profession.

In this article, I will share some lessons I've learned in my first 2 years as a data scientist. Hopefully, this will help you understand the struggles you’ll likely face when you start off your journey as a professional data scientist. Knowing the potential problems is a good way of understanding the reality of a job.

The most important thing by far is to be organized, use version control and document your progress

Many of my projects started very slowly. The data took a while to arrive, there were many meetings to join and I did some side work in the meantime. That’s why, on my first couple of projects, I did not set up a solid file system from the start and did not document what I had done. Big mistake. Once a project really gets going, everything moves fast. So fast that if you don’t already have a system for tracking your progress, you will likely not have the time to set one up once the project is under way.

Now I organize my folders and files and set up the relevant version control environment before the action starts. On top of this, I made it a habit to write headlines and comments in my notebook even before I start coding, so that when I do start coding I know where everything goes.
I also keep a weekly progress log to remember what I did each week and what I plan to work on the coming week. This way I can look back any time and refresh my memory on what’s been done so far. Oh, and most importantly, I’ve learned to use proper naming. Let me tell you, when you have 10 different notebooks covering different parts of a project, all with very similar or non-descriptive names, trying to figure out which one did what after a 2-month break is NOT fun.
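
To make this concrete, here is a minimal sketch of what such a setup could look like in Python. Everything in it is an assumption made for illustration: the folder names, the file name progress_log.md and the two helper functions are hypothetical placeholders, not a standard layout, so adapt them to your own projects.

```python
from datetime import date
from pathlib import Path

# Hypothetical folder layout; rename these to match your own project.
SUBFOLDERS = ["data/raw", "data/processed", "notebooks", "src", "reports"]


def scaffold_project(root):
    """Create the project folders and a progress log, if they are missing."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    log = root / "progress_log.md"
    if not log.exists():
        log.write_text("# Weekly progress log\n", encoding="utf-8")
    return log


def log_week(log_path, done, planned, today=None):
    """Append one weekly entry: what was done and what is planned next."""
    today = today or date.today()
    lines = [f"\n## Week of {today.isoformat()}\n", "Done:\n"]
    lines += [f"- {item}\n" for item in done]
    lines.append("Planned:\n")
    lines += [f"- {item}\n" for item in planned]
    with open(log_path, "a", encoding="utf-8") as f:
        f.writelines(lines)
```

Calling scaffold_project("my_project") once at kickoff and log_week(...) at the end of each week would be enough to keep the habit going. Running git init in the project root before the first commit covers the version control part.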

Be confident in your abilities: you know more than you think

It is very common to lack confidence when you’re just starting to work. Everyone around you seems like they know what they’re doing whereas you’re just trying to learn the ropes. In comparison, of course, the people who have been working in this position for years know more than you do. That’s only natural, but you know a lot too. Don’t take your hard work for granted.

Besides, you always have the opportunity to learn more. This job is not about knowing everything by heart; it is about having the flexibility to learn new things when you need to. That’s where your real value lies: having the foundation to be able to learn new tools and techniques quickly.

Adopt some side projects on topics you’re not confident about

It is very helpful to work on a topic you’re not confident in, either alone or in a group. Gather some of your junior colleagues, convince some seniors too and adopt a side project together. Your colleagues are basically resources. Use their knowledge to improve yourself. Having an extracurricular project that you work on together is great. There will be no time, money or client pressure. It will be all about learning. If you cannot convince your colleagues to join, do it alone, and maybe they will want to join in when you talk about how much you’re learning and how much fun you’re having.

No shame in following your supervisor around

Most likely your supervisor is going to be a senior data scientist. If that’s not the case, you should probably get a mentor who is a senior data scientist. This is the person who already knows what’s what, both on the business side and on the technical side. It’s kind of like lion cubs learning to hunt by watching their parents. You need to observe and learn from this person. Pay attention to how they navigate daily office things, how they talk to clients, how they explain technical things to non-technical people, and how and why they make certain decisions about projects. Make sure to have your observational hat on around them.

Listening is key

I was aware that communication mattered a lot in data science when I started working. I just wasn’t aware of how important it was. Whether it’s your supervisor, your client or the head of a team you’re working with, you have to pay attention to what they’re saying. I’m sure this is true for many positions, but I find it to be particularly true for data scientists. This is probably because a data scientist acts as an intermediary between many different teams, takes input from different parts of the organization and is responsible for combining this information and making sense of it. That’s why you need to learn to listen for the information you need.

Ask. All. Your. Questions.

I cannot emphasize this enough. It is natural to think you should not show what you don’t know, but trust me, asking is the key to becoming a better data scientist. No one expects you to know everything or have all the answers, especially when you're a junior. Feel free to say "I don’t know" or "I need to get back to you" when you don’t have an answer right away. Don’t be afraid to ask what someone is talking about if they’re using phrases and terms you’re not familiar with. Hiding what you don’t know will only hinder your professional relationship with someone. After all, how much you contribute to the discussion is more important than how knowledgeable you come across.

Trust your instincts but support them with outside input

When you’re working on a data science project, there are many decision points: how to separate the train and test sets, what type of encoding to use for the categorical variables, which features to include in the training, what new features to generate... Some of these are more important than others, but they all have an impact on the final model. At some point, you have to stop trying to optimize every decision. That’s why you sometimes need to trust your instincts and your experience (which is very little at the start) to decide which way to go. I found it very useful to note down most, if not all, of the decisions and assumptions I made, to be able to trace them back. If at any point you feel unsure or confused, bring your worries to a colleague or colleagues and discuss your assumptions. You will learn a lot from their experience and input.
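
One lightweight way to note decisions down is a small decision log that lives next to the code. The sketch below is only an illustration under assumed names: the Decision fields, the decisions.jsonl file name and the record and load helpers are hypothetical, and a plain text file or notebook cell would work just as well.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class Decision:
    """One modelling decision plus the reasoning and assumptions behind it."""
    topic: str
    choice: str
    reason: str
    assumptions: list = field(default_factory=list)
    made_on: str = field(default_factory=lambda: date.today().isoformat())


def record(decision, path="decisions.jsonl"):
    """Append the decision as one JSON line, so the history is never overwritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(decision)) + "\n")


def load(path="decisions.jsonl"):
    """Read every recorded decision back, oldest first."""
    with open(path, encoding="utf-8") as f:
        return [Decision(**json.loads(line)) for line in f if line.strip()]
```

An entry could be as short as record(Decision("train/test split", "split by date", "avoid leakage from future rows")). Reading the list back with load() before a discussion with a colleague gives you the assumptions to walk through together.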

As you might have realized, many of these points are about learning. This is because learning never ends when you are a data scientist. That’s probably the number one thing you should keep in mind when considering this profession. A significant portion of a junior data scientist's time is spent finding holes in their knowledge and mending them, all while trying to keep up with the latest developments.

I hope it was helpful to look at the other side of the coin and see what to expect after you start working. Let me know in the comments if you expect any other struggles in your first couple of years of data science!