May 21, 2020

Best data science resources and how to use them to attain data science competency

One of the most common requests I get is to recommend online courses. This is not an easy feat. The online course scene is very competitive: many courses offer different sets of knowledge, and nearly all of them claim to be the best. That's why this week I sat down and reviewed the content of the most popular courses and books out there for learning data science.

In this article, I will share my opinions on the popular data science courses and books. At the end of the article, you can find my recommended way of studying data science using these resources.

I know...

...you just want to get to the list of recommended courses but bear with me a second. What I have to say matters for your course choice. There are a couple of things you should keep in mind when looking for courses on data science:

  • The creators of each of these courses have different ideals in mind when they are preparing their courses and they choose topics to teach based on that. Don't fully trust one instructor to have a full and unbiased view of the discipline.
  • Take claims like "complete bootcamp" with a grain of salt. The fact that a course covers everything you have ever heard about data science doesn't mean that it covers it all properly.
  • At the end of the day, you have to choose the course that is right for you. The teaching style, the level of the course and the prerequisites will matter. Don't just go with the most popular choice.
  • If you're having trouble identifying courses that fit your specific situation, my suggestion is this: find the lowest-rated reviews of a course and see whether what people are complaining about applies to you. For example, a lot of people might be giving the course 5 stars, but if you studied maths and someone in a similar situation took the course and disliked it for math-related reasons, that is evidence the course might be wrong for you.
  • You don't have to get all your knowledge from a single course. You can find a couple of courses that complement each other. Sometimes it's even better to get your knowledge from a couple of sources. A bit of overlap is not a bad thing.
  • Most importantly, a course cannot give you everything you need to know. Think of it as a starting point, choose it smartly but don't overthink it. All you need to learn is the fundamentals. The rest comes from practice.

Now that you've listened to my advice, let's jump to the review section. First, it's important for you to know how I evaluated these courses.

How did I evaluate these resources?

If you've taken my data science kick-starter course, you know I divide the knowledge required for doing data science into three layers: theoretical, practical and technical. I will evaluate each course based on how well it covers the required theoretical knowledge.

Keep in mind that I am judging the courses from the point of view of gaining the fundamental knowledge required for data science. All I care about is whether a course will take you from "not knowing much about data science" to a point where you can start implementing your own projects. Once you know enough to start implementing, you will fill the gaps in your knowledge by trial and error. A lot in data science comes down to experience. Not everything can be learned by studying.

Disclaimer: I am not taking the courses or doing their quizzes/exercises. My knowledge will be limited. What I try to do here is to go over the syllabus and preview videos of the course and try to come to a verdict. I also browse through the course reviews to see how previous students reacted to it.

I did these reviews because I want to offer my current knowledge as a data scientist to help you judge which course is better for you. The difference between me and a non-data scientist doing this is that I have a better view of the required skills and of what is important for a data scientist to know. At the end of the day, my evaluations of these courses are my personal opinions and do not represent any other party or organization. Nothing in this article is sponsored or promoted.

That's enough cautious talk from me. Let's get down to business.

Online courses

IBM Data-Science Professional Certification

This certification is a very good start. It is suitable for total beginners. It will take you from an "I don't know what data science is" point to a very knowledgeable point. It covers a good breadth of topics but it is not complete and does not include some of the more advanced topics. It is a good way to learn the fundamentals of data science, in a simple way. I would not recommend this course to be the only course you take. Complementing it with a course that has more in-depth teachings would serve you better. Overall, it is a good first course.

Johns Hopkins University Data Science Specialization

This course feels very academic yet also has a very practical side. It includes some of the fundamental knowledge but I find it lacking when it comes to providing a full picture of data science. Important knowledge such as data preparation and dealing with data-related issues is missing, although there are sections about statistics and research. It doesn't feel like it would translate well to industry. It goes into some advanced topics such as dimensionality reduction but I'm not sure enough foundation is provided beforehand. I would not recommend this course for a beginner. Moreover, I would not have chosen it for my own studying simply because it is not complete.

Applied Data Science with Python Specialization from the University of Michigan

This course is good for some things and not so good for others. It is good for visualizations and for people who are interested in getting into text mining, but not for getting a good picture of machine learning or general data science practices. It has a good breadth of popular machine learning algorithms but somehow misses unsupervised learning and lacks data preparation. It could be a good side course if you have time, even if only to learn the basics of text mining.

Andrew Ng's Machine Learning

This course is specifically for machine learning and not general data science. It teaches the math and theory of machine learning while also touching upon advanced topics. Thus, it is not a beginner's course. The biggest downside of this course is its use of pretty outdated tools, like MATLAB. Additionally, it neglects some very important algorithms such as decision trees and random forests. Thus, it doesn't feel like a complete package; or maybe it was complete 8 years ago but now has gaps in it. I would recommend taking this course only after having a basic understanding of data science.

The Data Science Course 2020: Complete Data Science Bootcamp

This is one very ambitious course. It has everything from career advice to Python to machine learning algorithms to statistics. With the number of topics in this course, you could make five courses. And although it covers many topics, it looks like it misses some fundamental concepts such as data preparation and machine learning fundamentals. Moreover, the structure seems a bit messy. As I mentioned before, courses that are very ambitious about what they teach tend to be superficial. I have not taken this course myself but I do not believe deep learning and career advice should be in the same course. With all due respect, I would not go for this course when all the other courses are available.

Python for Data Science and Machine Learning Bootcamp

One of the most complete courses I've seen on the internet. It has a good introduction to data-science-related technologies and teaches some data preparation practically with Python while also touching upon data visualization. Many of the most popular algorithms are introduced but I'm not sure how well. The lessons do not seem to go deep into how these algorithms work but rather into how they are implemented in Python. Most of the basic data science fundamentals are included. From what I've seen, it is a very practical course. It is probably not very good for learning the theoretical side of data science but it is good for learning how to code machine learning in Python.

Data Science A-Z™: Real-Life Data Science Exercises Included

It seems like this course is made to create one very specific type of data scientist. It is not a theoretical course; rather, it tries to be very practical by teaching the how-tos of data science, but with very specific tools and software. I don't think the skills taught in this course generalize very well. Thus, I would not recommend this course.

Machine Learning A-Z™: Hands-On Python & R In Data Science

This course teaches everything in both Python and R, which makes it too long, but of course you can skip the implementation videos for the language you are not interested in. It has a good selection of machine learning algorithms but machine learning seems to be the only thing this course focuses on. It does not include visualizations, Python basics, data science technologies or the theory of machine learning. Only the data preparation that is relevant to training machine learning algorithms is mentioned, and only briefly. This course could be a good option for learning more about machine learning algorithms and how to implement them rather than a source for learning general data science.

Complete Machine Learning and Data Science: Zero to Mastery

This is not one of the first courses to pop up but I included it here because I took the web development course from the same guys and loved it. It seems to be a very practical and hands-on course. It doesn't give off the vibe that it will teach much of the theory behind ML algorithms or data science in general. Rather, it will teach you Python, setting up a data science environment, the basics of data science and implementations of ML algorithms. This is a good course for when you want to get hands-on.

Professional Certificate in Data Science by HarvardX

This course is one of the best I could find. It is theoretically very complete. It goes over the fundamentals of machine learning properly and introduces the most important algorithms as well as the technologies. I found the data preparation section (called data wrangling in the course) a bit lacking. The only thing I don't like about this course is the fact that the main technology is R. All in all, it sounds like a solid course to start with.

MicroMasters® Program in Data Science by UC San Diego

This is an advanced course that can be taken to further your learning. It has a great section on math and statistics. If you'd like to have a solid theoretical basis for your machine learning knowledge, I would recommend taking this course once you are confident in the main concepts of data science.

Data Science Foundations by IBM

This is about as simple as a data science course gets. I would take this course if you are still on the fence about getting into data science and want to get a taste of what it is. It is just a very brief introduction and doesn't take much time. You can take this course in combination with Data Analysis with Python and Machine Learning with Python to go further into data science.

Books

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition

This book is one of the best books out there and maybe even the best. It basically has everything you need to know to feel confident in your data science knowledge: all the theory, the maths and even statistics. It is a standalone champion if you ask me.

Although I think this book will work well for anyone, the general opinion is that this is a mid-level book rather than a beginner's book. I agree to a certain degree. I would not recommend the whole book to a beginner, but only the first part. This is because the second part goes into deep learning, which is not necessary for starters. The first part should be enough to give you a very solid understanding of data science anyway.

You can easily complement it with one of the simple courses I mentioned above to have a softer start to data science and then go all-in with this book.

Data Science from Scratch, 2nd Edition

The book's goal is to get the reader to implement the algorithms by hand without using any libraries. In doing so, the book teaches the fundamentals of data science and machine learning very well. Even if you don't want to implement machine learning algorithms from scratch, the first 11 chapters, before the hands-on coding starts, are very useful for learning the theory behind data science. Also, the author is hilarious.
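To give a flavour of what "by hand without using any libraries" can look like, here is a minimal sketch of my own. It is not taken from the book, and the function names (mean, fit_line) and the toy data are made up for illustration. It fits a straight line to a few points with ordinary least squares using only built-in Python.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept, built-ins only."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations (proportional to the covariance of x and y).
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (proportional to the variance of x).
    var = sum((x - x_bar) ** 2 for x in xs)
    slope = cov / var
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Toy data (hypothetical) that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Writing even a tiny algorithm like this yourself forces you to understand what a library call would otherwise hide, which is the kind of learning the book is going for.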

Introduction to Machine Learning with Python

This is another great beginner-level book. It focuses mainly on machine learning and the theory behind it. It doesn't spend much time on working with the data. Nevertheless, it is a good book for studying both the theory and the practical side of machine learning algorithms.

Learning platforms

I know there are online platforms such as DataCamp and Dataquest. I haven't had time this week to review them. I will go through them soon and share my opinions by updating this article.

Summary or final verdict if you will

There are a couple of paths I'm going to suggest:

Option 1

If you want to learn the theory behind data science and machine learning and also get hands-on

IBM Data-Science Professional Certification to build a foundation

+

Andrew Ng's Machine Learning course to get theoretical knowledge of machine learning (while being mindful of the dated tools and taking the implementations with a grain of salt)

+

Python for Data Science and Machine Learning Bootcamp or Machine Learning A-Z™: Hands-On Python & R In Data Science to get hands-on knowledge

Option 2

If you want to focus on theoretical knowledge

IBM Data-Science Professional Certification to build a foundation

+

MicroMasters® Program in Data Science by UC San Diego to dive deeper into advanced topics

Option 3

If you want to take only one course

Professional Certificate in Data Science by HarvardX to study overall data science skills

Option 4

If you are a very hands-on oriented person who doesn't want to learn much theory

Complete Machine Learning and Data Science: Zero to Mastery

Option 5 (personal favorite)

Courses and your learning journey are very personal. That's why I've been very careful not to make any grand claims about what you should or should not do. But of course, I have a favourite. I am in no way saying this is the best way possible for everyone. I'm merely saying that if I had to give my brother or sister advice on how to study data science, I would tell them to follow this option.

IBM Data-Science Professional Certification to build a foundation

+

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow to get a thorough and sound understanding of data science and machine learning concepts

+

Keep doing projects on the side, no matter how small

Final word

Look at these resources practically: take from them whatever you can and leave whatever doesn't serve you at that point. You will have time to iterate, revisit and re-learn some (if not most) of the concepts. Don't see the end of a course or the attainment of a certification as the end of your learning journey, by which time you should know everything. Instead, it is the beginning of a new chapter in your learning. At this point, you should start putting your skills to use on projects. This way you can practice what you learn, find gaps in your knowledge and improve yourself even further by addressing those gaps.

I hope this article will help you identify your first steps towards data science so that you can have a solid transition to the more hands-on chapter of your journey.