March 6, 2020

Amazing things you can do with data science that you shouldn’t worry about as a beginner

Isn’t it super fun to learn data science? Especially if you are completely new to the area. It is impossible not to get excited, and a little bit overwhelmed, by everything you can learn. Maybe you start with programming and learn a little bit of Python. Then come the commonly used libraries and tools, maybe some of the important machine learning algorithms and wait, what is that? Is that you in front of your computer, credit card in hand, buying a course on the latest techniques of Natural Language Processing?

Obviously, something went really wrong here, but no worries, it is not uncommon. There are many interesting topics out there and there is no shortage of good online courses. So we tend to try to learn things that end up being unrelated to each other, even though they all fall under the same massive, overarching category of “Data Science”. This, in turn, makes us feel unsatisfied and inadequate when it comes to our skills.

I’ve been there and I’ve done that. In this article, let me tell you about some of the most popular and cool-sounding sub-disciplines or practices (whatever you want to call them) of data science that you, as a beginner, should not worry about just yet.

Let’s start with my favourite: Deep Learning. Deep learning must be one of the coolest-sounding things to be an expert in. New application areas are being discovered constantly and algorithms are being improved non-stop by brilliant people. It is almost impossible not to be excited about using them in your own work after watching all those YouTube videos of self-driving cars, face-recognition algorithms or cancer-cell-detecting models. Except you might never get to. Especially if you’re going to work on a more business-facing side of data science, deep learning is unlikely to ever be the answer. Instead, you will be working with some of the simpler algorithms and improving them using preprocessing and parameter tuning. There is, of course, nothing wrong with learning the fundamentals of Deep Learning algorithms, because they work impressively well on some complex problems and it feels nice to understand, at least on a high level, how they work. But you have to remember that the people who are experts on these algorithms, or in Deep Learning itself, have been working exclusively in this area for a serious amount of time. Don’t expect to become proficient in it just by taking a course. And unless you are aiming for a specific position at a specific company that you know for certain is looking for people with solid Deep Learning skills, do not go deep into this subject (pun not intended).

Natural Language Processing, Semantic Analysis, Speech Recognition and the like. We cannot really say these areas belong to data science. They are subfields of the bigger subject, Artificial Intelligence. But we would probably need another article just to talk about the hierarchy of AI, data science, machine learning and all, so let’s just let that be for now. NLP (Natural Language Processing) and its related subjects are some of the trendiest things to work on. Thanks to advances in NLP, computers can be a part of our very human world much more naturally than they used to. But this doesn’t mean that you need to know about it. It is again a very specialized subject that goes very deep and branches out into even more subjects that go even deeper. I once attended a conference on Semantic Analysis as a volunteer, where I did not understand most of the research that was presented. It is just such a specialized topic. So, for now, let NLP go as one of the things you will learn if you ever need to. Know that it will not help you progress on your journey to data science. Similar logic follows for other subjects, including but not limited to computer vision and other advanced topics of AI.

Apart from these advanced topics, there are some tools or applications that are getting popular. One of them is visualization. Visualizations are a crucial tool for data science. They are a great way to communicate your findings and convey your message to your stakeholders. But getting lost in the world of making visualizations, if you ask me, is not worth it for the value it brings. Don’t get me wrong, knowing how to make clean, understandable and informative visuals is definitely a must-have skill, but unless your job has a specific focus on visuals I would not recommend diving too deeply into this topic. It is something you can always learn more about once you are confident in your fundamental data science skills, or even after you have found a job.

Another common pitfall I observe is learning many languages, meaning learning R or Scala even though you already know Python. If you have a reason to learn this extra language (e.g. the company that you really want to work at strictly uses only R), then by all means, go ahead. But if you are only doing it to make your CV look nicer, I would say drop it. No one counts the number of languages you have on your CV. It is better to have solid skills in one language than to kind of know two or three.

You have probably also started hearing about explainability and fairness in data science. These are part of the overall AI ethics discussion, and some tools have been emerging to address rising concerns in this area. I don’t know if there are courses going deep into this topic yet, but my suggestion would be to learn what they are and not worry about applying them. It is definitely important to be informed about AI ethics, its implications and current concerns. But the tools used to address these concerns have not yet fully matured, and it would be a waste of your time to try to understand and master them. Long story short, find some articles, read them and spend some time thinking about being an informed data scientist when it comes to ethics.

These are the common subjects I see many beginners struggle with. It is hard to know, starting out, which skills are relevant and which are only nice-to-have. It is common to get distracted, annoyed and inevitably discouraged by the seemingly endless things you “think” you need to learn. But it is no surprise. Data science is an overarching term that covers too wide an area of expertise.

At the end of the day, it is your responsibility to figure out what to prioritize. Having a clear vision of where you want to end up in your data science career will help you steer clear of the distractions and master the relevant skills. Until you figure out your vision, I hope this article can help you keep your focus on the important things.

Are there any other data science subcategories, applications or topics that you’re not sure about? Let me know in the comments and I will share my thoughts.