May 29, 2020

How is data science different from computer science?

This is a question I get a lot from aspiring data scientists. I totally understand where the confusion comes from. A lot of people with a background in computer science become data scientists, and the main tool of a data scientist is programming. It's only natural to think that these two fields are closely related. Let me briefly explain, though, why they are very different. This distinction matters because your understanding of data science will greatly affect what you choose to study and how you approach learning it.

I always ask people to imagine a diagram in their heads when I explain how these two topics relate to each other. Let’s start:

Computer science is a vast area with many sub-disciplines. It includes anything and everything that has to do with computers. This could be algorithms, operating systems, databases, programming languages, cryptography and, yes, you guessed it, artificial intelligence (AI). Even though it has been hyped so much and is often perceived as a standalone discipline, AI is a sub-discipline of computer science.

BUT

Even though AI is merely a sub-discipline of the bigger area of computer science, it is in itself a vast discipline. It covers anything that has to do with machines managing to do things they were not specifically programmed to do. This could be robotics, knowledge representation and learning, planning, or evolutionary computing. Many people are not familiar with the general field of AI, so it's normal if these terms sound foreign to you. But somewhere in there is a term we have all grown tired of hearing every day: machine learning.

Basically, machine learning is part of a sub-discipline of AI. It is a way to learn from data how factors affect each other, without having to explicitly program those relationships into a computer.
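To make "learning from data without explicit programming" a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy data (hours studied and exam scores), the helper name fit_line, and the choice of ordinary least squares as the learning method. Real machine learning work usually relies on dedicated libraries and far messier data.

```python
# Hypothetical toy data: hours studied and the exam score that followed.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Estimate slope and intercept of a straight line with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The relationship is estimated from the examples, not written by hand.
slope, intercept = fit_line(hours, scores)

# Use the learned relationship on an input that was not in the data.
predicted = slope * 6.0 + intercept
print(f"learned rule: score = {slope:.1f} * hours + {intercept:.1f}")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

Nowhere in this code is the rule linking hours to scores written down by the programmer. The program only receives examples and estimates how one factor affects the other, which is the core idea behind machine learning.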

I hope everything is clear so far. So, the big question is, "What does data science have to do with any of this?"

Well, data science is a profession where machine learning is heavily used. But working with and optimizing ML algorithms is not a data scientist's only job. There is much more to it, such as dealing with the data, preparing the data, preventing possible pitfalls, and understanding subtle problems with the model.

The analogy I draw to describe the relationship between CS and DS is this: data science is computer science's grandson's girlfriend. It does not really belong to the family, but it is not completely unrelated either.

Apart from how they relate to each other, there is also some confusion caused by the use of code as part of a data scientist's job. Programming is just the way data scientists implement machine learning algorithms and deal with the data. It is merely a tool. Many kinds of engineers, and even artists, use code in today's world. The fact that programming is part of a data scientist's job in no way means that you need a computer science degree to become a data scientist.

Understanding data science doesn't end here though. The data science kick-starter course has a whole module dedicated to this, including information on how data science relates to other disciplines and how topics inside machine learning relate to each other. Make sure to check it out. You can find the sign-up form below this article.

I recommend you become confident in your understanding of what data science is and where it stands relative to its close family of topics. This will bring much more clarity and focus to your journey. Don't hesitate to ask questions in the comments section if there is anything that is unclear!