Welcome to my review of the Coursera Data Science Specialization. The specialization, offered on Coursera in partnership with Johns Hopkins University, consisted of 10 courses: nine topic courses followed by a capstone project. Each course focused on a different element of data science.
I've taken many MOOCs, including quite a few on Coursera, and I have yet to be disappointed by the format or quality of their courses. This specialization was no different: I had a great experience, I believe I learned a lot, and I gained a certification that looks great on my LinkedIn profile.
An interesting feature of this course is the way it authenticates students: it identifies you by taking a photo of your face with your webcam and then recording your typing pattern. It's debatable how reliable typing-pattern authentication is on its own, but combined with the webcam photo I'm sure it's more than adequate.
There are a total of 10 courses included in the Data Science Specialization offered by Coursera. I'll go over them briefly below and let you know my personal opinion on each one of them.
The Data Scientist's Toolbox: This is the foundation course, which introduces the basic tools and ideas used throughout the specialization. It is split into two sections. The first introduces the concepts behind analysing data and turning it into actionable knowledge. The second is a more hands-on lesson in the tools used throughout the specialization, including, but not limited to, version control with Git and GitHub, Markdown, R and RStudio. I found it easy to understand, and if you're already familiar with data science you may not need it. If you're new, however, it's a great starting point; I found it very useful and highly recommend it to beginners. The course runs for 4 weeks, which is very manageable with a small time commitment per week.
R Programming: This is where things start to get really interesting, as we begin using the R language, a necessity for any data science enthusiast! The course helps you get to grips with programming in R and using it to analyse data correctly. We were taught how to set up our programming environment properly for the tasks at hand, along with the most important R skills: reading data into R, using R packages, writing R functions, commenting, debugging and profiling R code, and more. It was a very helpful introduction to R programming, and by the end of it I felt I had a good grasp of the subject. The course runs for 4 weeks, which is enough time to complete it to a decent level provided you put the time in each week.
Getting and Cleaning Data: This was one of my favourite courses, and it teaches something incredibly valuable. It's all well and good wanting to analyse data, but where do you get that data from? Data is rarely available in a clean format that you can simply download and analyse; you need to be able to gather it and then clean it so that it's ready for analysis. In my own real-world work, much of the data I analyse comes from the web, and this course does a great job of teaching how to pull data from the web and make it tidy. It also covers getting data from APIs, from databases and from other people in the correct formats. A very useful course! It takes 4 weeks, and I didn't find it too difficult to complete while putting in a decent number of hours per week.
Exploratory Data Analysis: Summarizing data using exploratory techniques is an essential part of data science, and that is exactly what this course teaches. You learn how to apply these techniques before any formal modelling is done, which can inform the development of more advanced statistical models. The course discusses the plotting systems available in R and covers the basic elements of constructing data graphics. There is also some discussion of multivariate statistical techniques that can be used to visualise high-dimensional data.
Reproducible Research: Reproducible research is a very important concept in data science. It focuses on publishing data analyses together with the associated data and code so that anyone with reason to do so can reproduce and verify them. This is increasingly important as datasets grow bigger and computations become more complex. Having the data and code available shifts the focus to the actual content of the analysis rather than to a written report, which can sometimes be unreliable. In this course we were taught several literate statistical analysis tools that make it easy to publish a data analysis as a document others can use to replicate the analysis and get the same results. To be honest, I found this course quite challenging, but the help available got me by, and by the end of it I felt I had a strong grip on the concepts. Again, this is a 4-week course, and I'd say that's enough time to complete it if you can put in enough time each week.
Statistical Inference: This course covers statistical inference, the process of drawing scientific conclusions from data. It covers several broad modes of inference, including data-oriented strategies, the explicit use of designs and randomization in analyses, and statistical modelling. The course does a very good job of making sense of the different complexities involved in statistical inference. Personally, I found it quite challenging and had to reach out for help multiple times. By the end, though, I had a solid understanding of the broad directions of statistical inference and could use it to make informed choices when analysing data. This is another 4-week course, and I found it slightly challenging to complete in that timeframe. However, I was struggling with other work at the same time, so you may find it easier.
Regression Models: In this course we covered linear models, which rely on linear assumptions and are sometimes described as the most important tool in the data scientist's toolkit (and I'd have to agree). The course covers least squares, regression analysis and inference using regression models, as well as ANOVA and ANCOVA as special cases of regression models. It also provided a modern understanding of regression model selection and demonstrated various uses of regression models, including scatterplot smoothing. I found this course quite easy and, to be honest, I felt it could have gone into a bit more depth. Overall, though, it gave me a good understanding of regression models, and I now feel comfortable using them in day-to-day situations. The course is 4 weeks long, and I was able to complete it well ahead of time even though I was still struggling with other work.
Practical Machine Learning: I found this course particularly essential for understanding how data science is used in the real world. It gave me the most useful information for applying my data science knowledge, and I frequently use what I learned in it in real-world situations. The course grounds you in concepts such as training and test sets, overfitting and error rates. It covers plenty of algorithmic and model-based machine learning methods, which gave me a thorough understanding of regression, classification trees, Naive Bayes and random forests. I was able to build complete prediction functions, covering data collection, feature creation, algorithms and evaluation. This course was challenging, and I struggled to complete it within the 4 weeks provided, but I did manage it and learned a lot.
Developing Data Products: In my opinion, this is perhaps the second most essential course in the list. It helped me understand what data products are and how to create them using R packages, interactive graphics and Shiny. Using the techniques from this course, I was able to tell a story with data that a general audience could understand. This is another 4-week course, and I was able to complete it with plenty of time to spare. I found it easy, but essential learning.
Data Science Capstone: Here comes the exciting stuff: a real-world application! This looks great on my CV, and I'm sure it's convinced a recruiter or two that I know my stuff when it comes to data science. We created a fully functional data product, and I was able to make it look really good. I recommend spending some time on this one to make it the best it can be, as it's a perfect showcase for your skills on your CV or LinkedIn profile.
Whether the specialization is right for you depends entirely on your individual needs and current level of understanding. As a relative beginner to data science, I found it incredibly helpful, and it has made my CV much more interesting to recruiters. I have received many more job offers since earning this certification, and in general I feel more capable of performing data science tasks.
For the price, you're not going to find anything better than this. At $49 per course it's incredibly cheap. The only downside is that assignments are marked by other students, but that's the price you pay for such an affordable, high-quality education in this field.
Overall, I cannot recommend Coursera's Data Science Specialization enough. I found the courses incredibly valuable, and they helped me further my career and skill set. If you're at all interested in data science and aren't sure where to start, or you just want to brush up on your skills, give it a go. You won't regret it!