The practice of data science requires the use of analytics tools, technologies, and programming languages to help data professionals extract insights and value from data. A recent Kaggle survey of nearly 24,000 data professionals revealed that Python, SQL, and R are the most popular programming languages. Python was by far the most popular, used by 83% of respondents. Additionally, 3 out of 4 data professionals recommended that aspiring data scientists learn Python first.
In October 2018, Kaggle conducted a worldwide survey of 23,859 data professionals (the 2018 Machine Learning and Data Science Survey). The survey included a variety of questions about data science, machine learning, education and more. Kaggle released the raw survey data, and many of its members have analyzed it. I will be exploring the survey data over the next couple of months. When I find something interesting, I'll be sure to post it here on my blog. Today's post is about the programming languages data professionals used for data science and machine learning in 2018.
Most Popular Programming Languages
Of the data professionals who identified as data scientists, 93% used Python, 54% used SQL and 46% used R.
The survey also asked respondents, "What specific programming language do you use most often?" As seen in Figure 2, a little over half (54%) of data professionals use Python most often. The remaining programming languages are much less popular: only 13% of data pros said they use R most often, and 8% said they use SQL most often.
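The two questions measure different things. "Used" percentages can add up to more than 100% because a respondent can use several languages (the data-scientist figures above, 93% + 54% + 46%, already exceed 100%), while "most often" allows one answer per person. The sketch below illustrates that difference on a handful of made-up respondents; the `respondents` records and the helper names `usage_share` and `most_often_share` are hypothetical and are not Kaggle's data or schema.

```python
from collections import Counter

# Hypothetical toy responses, NOT Kaggle data: each respondent lists every
# language they use (multi-select) and the one they use most often (single choice).
respondents = [
    {"uses": {"Python", "SQL"}, "most_often": "Python"},
    {"uses": {"Python", "R"}, "most_often": "R"},
    {"uses": {"Python", "SQL", "R"}, "most_often": "Python"},
    {"uses": {"SQL"}, "most_often": "SQL"},
]


def usage_share(rows):
    """Percent of respondents who use each language; shares can sum past 100."""
    counts = Counter(lang for row in rows for lang in row["uses"])
    return {lang: 100 * n / len(rows) for lang, n in counts.items()}


def most_often_share(rows):
    """Percent of respondents naming each language as their main one.

    Each respondent contributes exactly one answer, so the shares sum to 100.
    """
    counts = Counter(row["most_often"] for row in rows)
    return {lang: 100 * n / len(rows) for lang, n in counts.items()}


print(usage_share(respondents))       # Python 75.0, SQL 75.0, R 50.0 (sums to 200)
print(most_often_share(respondents))  # Python 50.0, R 25.0, SQL 25.0 (sums to 100)
```

In this toy example Python and SQL tie on usage, yet Python is named "most often" twice as often as SQL, which is the same kind of gap the survey shows between SQL's usage rate and its 8% "most often" share.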
Compared with 2017, Python usage increased 23 percentage points (from 60% in 2017 to 83% in 2018), and SQL usage increased 2 percentage points (from 44% to 46%). However, R usage decreased 10 percentage points (from 46% to 36%).
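A change in percentage points is the plain difference between two shares, which is not the same as a percent change relative to the starting level. The short sketch below computes both from the figures above; the 2018 values are obtained by applying the reported point changes to the 2017 shares, and the `changes` helper is a hypothetical name used only for illustration.

```python
# Share of all respondents using each language, in percent.
# 2017 values are as reported; 2018 values = 2017 value + reported point change.
usage_2017 = {"Python": 60, "SQL": 44, "R": 46}
usage_2018 = {"Python": 83, "SQL": 46, "R": 36}


def changes(before, after):
    """Return {language: (percentage-point change, relative % change)}."""
    result = {}
    for lang, old in before.items():
        points = after[lang] - old                  # difference in percentage points
        relative = round(100 * points / old, 1)     # change relative to the old share
        result[lang] = (points, relative)
    return result


for lang, (points, relative) in changes(usage_2017, usage_2018).items():
    print(f"{lang}: {points:+d} points, {relative:+.1f}% relative")
# Python: +23 points, +38.3% relative
# SQL: +2 points, +4.5% relative
# R: -10 points, -21.7% relative
```

Read this way, Python's 23-point gain means its user base grew by roughly 38% relative to 2017, while R's 10-point drop is a relative decline of about 22%.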
Which Programming Language is Recommended Most?
The survey also asked respondents which programming language they would recommend an aspiring data scientist learn first (see Figure 3). Three out of four data professionals recommended Python. The remaining programming languages were recommended at much lower rates: R by 12% of respondents and SQL by 5%.
Data professionals who identified as data scientists made similar recommendations for aspiring data scientists: Python (78%), R (13%) and SQL (5%).
The results of the Kaggle survey of over 23,000 data professionals paint a clear picture of the most popular programming languages for data professionals. Python is by far the most popular programming language, followed by SQL and R. Not surprisingly, Python is also the most recommended programming language for aspiring data scientists. So, even though data professionals have access to many different programming languages, it appears that Python is becoming the default programming language for data science and machine learning.