A year ago, I was a numbers geek with no coding background who wanted to learn data science. After trying an online programming course, I was so inspired that I enrolled in one of the best computer science programs in Canada.
Two weeks later, I realized that I could learn everything I needed through edX, Coursera, and Udacity instead. So I dropped out.
The decision was not difficult. I could learn the content I wanted faster, more efficiently, and for a fraction of the cost.
I already had a university degree and, perhaps more importantly, I already had the university experience. Paying $30K+ to go back to school seemed irresponsible.
I started creating my own data science master's degree using online courses shortly afterward, after realizing it was a better fit for me than computer science. I scoured the introduction to programming landscape. I've already taken several courses and audited portions of many others. I know the options, and what skills are needed if you're targeting a data analyst or data scientist role.
For this guide, I spent 20+ hours trying to find every single online introduction to programming course offered as of August 2016, extracting key bits of information from their syllabi and reviews, and compiling their ratings. For this task, I turned to none other than the open source Class Central community and its database of thousands of course ratings and reviews.
Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.
How we picked courses to consider
Each course had to fit four criteria:
It introduces programming and, optionally, computer science. See "A note on Programming vs. Computer Science" below.
The language of instruction is Python or R. These are by far the two most popular programming languages used in data science.
It must be an interactive online course, so no books or text-based tutorials. Regarding the latter, Codecademy's video-less and text editor-based courses would qualify, but strict text tutorials like the ones from R tutorial would not. Though books are viable ways to learn programming, Python, and R, this guide focuses on courses.
It must be a decent length: at least ten hours in total for estimated completion.
How we evaluated courses
We believe we covered every notable course that fits the above criteria. Since there are seemingly hundreds of Python and R courses on Udemy, we considered only the most reviewed and highest rated ones. There is a chance we missed something, however. Please let us know if you think that is the case.
We compiled the average rating and number of reviews for each course from Class Central and other review sites, then calculated a weighted average rating for each course. If a series had multiple courses (like Rice University's Part 1 and Part 2), we calculated the weighted average rating across all courses. We also read text reviews and used this feedback to supplement the numerical ratings.
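As a rough illustration of how such a weighted average could be computed, here is a minimal sketch. It assumes each source's rating is weighted by its number of reviews, which the article does not spell out; the function name and the figures are hypothetical and are not taken from the actual ratings data.

```python
def weighted_average_rating(sources):
    """Combine (average_rating, review_count) pairs into a single rating.

    Assumption: each rating is weighted by its review count.
    """
    total_reviews = sum(count for _, count in sources)
    if total_reviews == 0:
        raise ValueError("cannot average ratings with zero reviews")
    weighted_sum = sum(rating * count for rating, count in sources)
    return weighted_sum / total_reviews


# Hypothetical figures for illustration only: two review sites,
# or two parts of one course series.
example_sources = [(4.8, 200), (4.5, 100)]
combined = weighted_average_rating(example_sources)
print(f"{combined:.2f} stars over {sum(c for _, c in example_sources)} reviews")
```

Weighting by review count keeps a handful of reviews on one site from counting as much as hundreds on another.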
We made subjective syllabus judgment calls based on three factors:
Coverage of the fundamentals of programming.
Coverage of more advanced, but useful, topics in programming. (E.g. several courses choose to not cover object-oriented programming. We believe this is a key topic, though not a deal-breaker, hence these courses only being docked marks and not excluded from consideration.)
Relevance of the syllabus to data science.
A note on Programming vs. Computer Science
Programming is not computer science and vice versa. There is a difference of which beginners may not be acutely aware. Borrowing this answer from Programmers Stack Exchange:
Computer science is the study of what computers [can] do; programming is the practice of making computers do things.
The course we are looking for introduces programming and optionally touches on relevant aspects of computer science that would benefit a new programmer in terms of awareness. Many of the courses considered, you'll notice, do indeed have a computer science portion.
None of the courses, however, are strictly computer science courses, which is why something like Harvard's CS50x on edX is excluded.
Our pick for the best programming course for data scientists is...
University of Toronto's "Learn to Program" series on Coursera. LTP1: The Fundamentals and LTP2: Crafting Quality Code have a near-perfect weighted average rating of 4.71 out of 5 stars over 284 reviews. They also have a great mix of content difficulty and scope for the beginner data scientist.
This free, Python-based introduction to programming sets itself apart from the other 20+ courses we considered.
Jennifer Campbell and Paul Gries, two associate professors in the University of Toronto's department of computer science (which is regarded as one of the best in the world), teach the series. The self-paced, self-contained Coursera courses match the material in their book, "Practical Programming: An Introduction to Computer Science Using Python 3." LTP1 covers 40-50% of the book and LTP2 covers another 40%. The remaining 10-20% is not particularly useful for data science, which helped the series' case for being our pick.
The professors kindly and promptly sent me detailed course syllabi upon request, which were difficult to find online prior to the course's official restart in September 2016.
LTP1: The Fundamentals provides an introduction to computer programming intended for people with no programming experience. It covers the basics of programming in Python including elementary data types (numeric types, strings, lists, dictionaries, and files), control flow, functions, objects, methods, fields, and mutability. The syllabus covers:
Installing Python, IDLE, mathematical expressions, variables, assignment statement, calling and defining functions, and syntax and semantic errors.
Strings, input/output, function reuse, function design recipe, and docstrings.
Booleans, import, namespaces, and if statements.
For loops and fancy string manipulation.
While loops, lists, and mutability.
For loops over indices, parallel lists and strings, and files.
LTP2: Crafting Quality Code assumes you already know the basics of programming in Python: elementary data types (numeric types, strings, lists, dictionaries, and files), control flow, functions, objects, methods, fields, and mutability. You need to be good at these in order to succeed in this course.
It covers the next steps: designing larger programs, testing your code so that you know it works, reading code in order to understand how effective it is, and creating your own types. The syllabus covers:
Designing algorithms: how do you decide what to do in a function body? How do you figure out what functions to write in the first place?
Automated testing: doctest and unittest.
Analyzing code for speed - details of searching and sorting.
Creating new types: classes in Python.
Functions as arguments, default parameter values, and exceptions.
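To give a flavor of two of these topics (automated testing with doctest and default parameter values), here is a minimal sketch. The function is a hypothetical example written for this guide, not material taken from the course.

```python
import doctest


def average(values, default=0.0):
    """Return the mean of values, or default if values is empty.

    The examples below double as tests that doctest can run.

    >>> average([1, 2, 3])
    2.0
    >>> average([])
    0.0
    >>> average([], default=-1.0)
    -1.0
    """
    if not values:
        return default
    return sum(values) / len(values)


if __name__ == "__main__":
    # Runs every docstring example in this module and reports mismatches.
    doctest.testmod()
```

Run as a script, this prints nothing when every example passes; a mismatch between the documented and actual output is reported as a failure.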
Associate professor Gries also provided the following commentary on the course structure: "Each module has between about 45 minutes to a bit more than an hour of video. There are in-video quiz questions, which will bring the total time spent studying the videos to perhaps 2 hours."
These videos are generally shorter than ten minutes each.
He continued: "In addition, we have one exercise (a dozen or two or so multiple choice and short-answer questions) per module, which should take an hour or two. There are three programming assignments in LTP1, each of which might take four to eight hours of work. There are two programming assignments in LTP2 of similar size."
He emphasized that the estimate of 6-8 hours per week is a rough guess: "Estimating time spent is incredibly student-dependent, so please take my estimates in that context. For example, someone who knows a bit of programming, perhaps in another programming language, might take half the time of someone completely new to programming. Sometimes someone will get stuck on a concept for a couple of hours, while they might breeze through on other concepts ... That's one of the reasons the self-paced format is so appealing to us."
In total, the University of Toronto's Learn to Program series runs an estimated 12 weeks at 6-8 hours per week, which is about standard for most online courses created by universities. If you prefer to binge-study your MOOCs, that's 72-96 hours, which could feasibly be completed in two to three weeks, especially if you have a bit of programming experience.
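The arithmetic behind that estimate can be checked with a short sketch. The 12 weeks and 6-8 hours per week come from the figures above; the binge-study pace of 35 hours per week is an assumption made here for illustration.

```python
WEEKS = 12
HOURS_PER_WEEK_RANGE = (6, 8)  # low and high estimates from the article

# Total study time at the suggested pace: 72 to 96 hours.
low_total, high_total = (WEEKS * hours for hours in HOURS_PER_WEEK_RANGE)

# Assumption for illustration: a full-time "binge" pace of 35 hours per week.
BINGE_HOURS_PER_WEEK = 35

binge_weeks_low = low_total / BINGE_HOURS_PER_WEEK
binge_weeks_high = high_total / BINGE_HOURS_PER_WEEK

print(f"Total: {low_total}-{high_total} hours")
print(f"At {BINGE_HOURS_PER_WEEK} h/week: "
      f"{binge_weeks_low:.1f}-{binge_weeks_high:.1f} weeks")
```

At that assumed pace the series takes roughly 2.1 to 2.7 weeks, consistent with the two-to-three-week estimate.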
Another great Python option
If you already have some familiarity with programming, and don't mind a syllabus that has a notable skew towards games and interactive applications, I would also recommend Rice University's An Introduction to Interactive Programming in Python (Part 1 and Part 2) on Coursera.
With 6,000+ reviews and the highest weighted average rating of 4.93/5 stars, this popular course is noted for its engaging videos, challenging quizzes, and enjoyable mini projects. Compared with our #1 pick, it is slightly more difficult, and it focuses less on the fundamentals and more on topics that aren't applicable to data science.
The materials are self-paced and free, and a paid certificate is available. Access to graded materials requires purchasing the course for $79 (USD).
The condensed course description and full syllabus are as follows:
"This two-part course is designed to help students with very little or no computing background learn the basics of building simple interactive applications ... To make learning Python easy, we have developed a new browser-based programming environment that makes developing interactive applications in Python simple. These applications will involve windows whose contents are graphical and respond to buttons, the keyboard, and the mouse.
Recommended background: A knowledge of high school mathematics is required. While the class is designed for students with no prior programming experience, some beginning programmers have viewed the class as being fast-paced. For students interested in some light preparation prior to the start of class, we recommend a self-paced Python learning site such as codecademy.com."
Timeline: 5 weeks
Estimated time commitment: 7-10 hours per week
Week 0 - statements, expressions, variables
Understand the structure of this class, and explore Python as a calculator.
Week 1 - functions, logic, conditionals
Learn the basic constructs of Python programming, and create a program that plays a variant of Rock-Paper-Scissors.
Week 2 - event-driven programming, local/global variables
Learn the basics of event-driven programming, understand the difference between local and global variables, and create an interactive program that plays a simple guessing game.
Week 3 - canvas, drawing, timers
Create a canvas in Python, learn how to draw on the canvas, and create a digital stopwatch.
Week 4 - lists, keyboard input, the basics of modeling motion
Learn the basics of lists in Python, model moving objects in Python, and recreate the classic arcade game "Pong."
Week 5 - mouse input, list methods, dictionaries
Read mouse input, learn about list methods and dictionaries, and draw images.
Week 6 - classes and object-oriented programming
Learn the basics of object-oriented programming in Python using classes, and work with tiled images.
Week 7 - basic game physics, sprites
Understand the math of acceleration and friction, work with sprites, and add sound to your game.
Week 8 - sets and animation
Learn about sets in Python, compute collisions between sprites, and animate sprites.
If you are set on R
If you are set on an introduction to programming course in R, we recommend DataCamp's series of R courses: Introduction to R, Intermediate R, Intermediate R - Practice, and Writing Functions in R. Though the latter three come at a price point of $25/month, DataCamp is best in category for covering the programming fundamentals and R-specific topics, which is reflected in its average rating of 4.29/5 stars.
We believe the best approach to learning programming for data science using online courses is to do it first through Python. Why? There is a lack of MOOC options that teach core programming principles and use R as the language of instruction. We found six such R courses that fit our testing criteria, compared to twenty-two Python-based courses. Most of the R courses didn't receive great ratings and failed to meet most of our subjective testing criteria.
Another option for R would be to take a Python-based introduction to programming course to cover the fundamentals of programming, and then pick up R syntax with an R basics course. This is what I did, using Udacity's Data Analysis with R for the second step. It worked well for me.
You can also pick up R with our top recommendation for a statistics class, which teaches the basics of R through coding up stats problems.