January 24, 2021

So you learned a bit of Python, now what?

What do you need to do to become a data scientist? According to the internet, you need to:

  • Learn Python (or R)
  • Learn machine learning

Sounds easy enough to follow, right?

Well, not really. If you’ve already been through some of the learning, you might be aware that things can get out of hand.

It’s pretty straightforward to learn Python, or at least the basics of it. There are many websites and online resources, and it’s a programming language after all. Once you have the basics down, the actual learning comes from doing.

Next comes machine learning. And that is where it starts getting overwhelming really fast.

This is because there is no single structure for learning machine learning. The more you dive into it, the more you realize that you also need to understand other topics, such as data exploration and understanding, what to look for in the data, dealing with problems in the data, which problems ML algorithms are good for, how to prepare your data for these algorithms, and so on.

And the problem is, these concepts are all intertwined.

To understand how well a machine learning algorithm can perform, you need to understand evaluation metrics. To understand evaluation metrics, you need to know about the factors that affect them, which brings you back to data quality. But how you structure and clean your data depends on the model you want to train, too.
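Here is one small illustration of how a metric and the data behind it are tied together. It is a minimal sketch with made-up labels, not a real model: the numbers and the "always predict the majority class" rule are assumptions I picked for the example. It shows how the same predictions can look great on one metric and terrible on another, purely because of what the data looks like.

```python
# Hypothetical labels: 95 negatives and 5 positives (an imbalanced dataset).
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: share of actual positives that the model found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
recall = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}")  # 0.95
print(f"recall   = {recall:.2f}")    # 0.00
```

You cannot tell whether 95% accuracy is good without first looking at the data, and that is exactly the kind of loop described above.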

So, close up, it all feels like a big mess: there is no single set of prerequisites, everything is somehow connected to everything else, and you feel lost. That’s why, even when you know 80% of the fundamentals, you might still be left feeling like you don’t know what you’re doing.

I know, because it happened to me.

Somehow it feels like you either know everything or nothing because of this complex relationship the concepts have with each other.

I struggled with this in the early years of my career. I knew a good amount about how things worked together, but then someone would mention a concept I wasn’t familiar with. I’d feel like a complete failure for not knowing what they were talking about and not being able to place it on the mental map I had of how data science worked.

The solution I came up with in time to deal with this was to get practical. It’s a very obvious solution when you think about it. Data science is a practical profession. You work hands-on by definition. So, of course, the way to understand things is not by trying to force them into rigid structures in your head but by letting them stay in a fluid state, where you know how things relate to each other. This can only be achieved with experience though.

You need to do hands-on projects. Any type of project. Set a goal and work towards it, step by step, tackling each challenge one by one. Maybe you will try to train an ML model and the library will give you an error. Okay, then you probably first need to get rid of the missing values. And bam, you will never again forget that this is a step in data science.

Or maybe you will want to plot your data but will have a hard time making it look the way you want. You will struggle for a while, you will try and fail, but eventually you will learn how to do it. Then, I promise you, you won't forget it even if you want to.

Just by keeping at it, you will learn and grow into data science.

There are some caveats, of course. Doing this alone, you might miss some important decision points or struggle more than necessary with certain steps. That's why you will want some support, for example by taking my Hands-on Data Science course.

There, you only work on the parts of a project where you will learn. You will be responsible for doing the project yourself, but I will be there each step of the way to show you how a professional would approach things. That way you can see what you missed, what mistakes you made, if any, and whether you have fallen into common pitfalls. That, if you ask me, is the most efficient way to improve your data science skills.

Hands-on work is underrated in learning data science. By that, I mean independent hands-on work.

Many courses out there do not give you enough freedom to think and figure things out yourself. Yes, they have a project integrated with the course, but you just end up watching what the instructor is doing and repeating it.

This gives the illusion of doing hands-on work, but when you just repeat someone else’s steps, you do not get to ask yourself the critical questions or notice the hidden decision points.

If you don't face these critical points while learning, you will have much more trouble becoming an independent data science practitioner.