Often, when you’re learning a new skill, the hardest thing is knowing where to start. This is especially true with data science or programming in general. There are so many languages you could learn, all of which are bundled up in terminology that can seem dense and strange when you’re unfamiliar with it.
Data scientists often use one of three main languages: Python, R and SAS. In this article, we will look at which one is most appropriate for a beginner to pick up and learn, and how your choice will depend on the environment you plan to work in.
Before we dive into the details, here’s a brief overview of each language:
It’s worth noting that the data science industry is extremely dynamic, so the recommendations we’re offering here are likely to differ from what they would have been two years ago. If you’ve previously toyed with this decision, it’s worth revisiting it with an open mind, in light of today’s data science work environment.
Cast your mind back to learning a new human language as part of a group. Perhaps you took French or Spanish? Those learning with an intention to visit the country for fun or work had an edge because of their motivation and opportunity to test it out in its native context. Learning a computer language has some parallels.
So, it is perhaps worth taking a moment to consider why you are looking at learning data science skills. What industries are you hoping to go into? Will you work as a freelancer, or do you want to join a small company or a corporate? What country will you work in? All of these aspects will determine both your choice of data science language and your ultimate learning success.
Geography can also play a part in your choice, so it’s wise to check the trends in your country of work.
It’s now time to weigh up your choices.
Python use has grown rapidly in the past few years and some are now predicting that it will be more popular than Java and C in three to four years. This makes it a good language to have in your toolkit as your investment in learning will have longevity. Plus, there’s a good chance that Python will be used on projects you will end up working on, meaning you’ll be increasing your employability and be able to hit the ground running.
Another benefit is that even as a beginner you can carry out meaningful work, and the language will remain useful as you advance in your career. Python is free and open source, which makes it highly accessible. However, there is no vendor-provided customer support, so it is well worth investing in a course to learn it.
Python has libraries and functions for almost any statistical operation you would need, and it has become very strong in operations on structured data. Plus, as it is a general programming language, you can apply your skills outside data science.
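To give a flavour of what this looks like in practice, here is a minimal sketch that groups a small structured dataset and computes summary statistics using only Python’s standard library (`csv`, `io` and `statistics`). The sample data and the `summarise_sales` function name are invented for illustration and are not taken from any real project.

```python
import csv
import io
import statistics

# Hypothetical sample data, invented purely for illustration.
SAMPLE_CSV = """region,sales
North,120
South,95
North,130
South,105
East,110
"""


def summarise_sales(csv_text):
    """Group the sales column by region and return the mean and median per region."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collect each region's sales figures as floats.
        groups.setdefault(row["region"], []).append(float(row["sales"]))
    return {
        region: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for region, values in groups.items()
    }


summary = summarise_sales(SAMPLE_CSV)
for region, stats in summary.items():
    print(region, stats)
```

For larger datasets, many data scientists reach for third-party libraries such as pandas rather than the standard library, but the idea of loading, grouping and summarising structured data is the same.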
Python is commonly used in web development. So, if you are considering working as a freelancer or with a start-up or an SME, starting with this language could quickly offer crossover benefits between online business and analytics.
Like Python, R is free and open source, making it accessible and driving its popularity.
R is the most difficult of the three languages to learn and it can be difficult to transition into if you already know another programming language, such as C. One R expert explains that its difficulty “is due to the fact that it is radically different from other analytics software and… is…an unavoidable by-product of its extreme power and flexibility”. There is a large online community for support, but not as large as that for Python. Some say that R makes hard things easy, and easy things hard.
Don’t let the difficulty put you off, however. R’s flexibility and power pay dividends for those who persevere. This is especially true if data visualisation is a priority: the number of add-on packages means that there is likely to be something to meet your needs, making R the best of the three for visualisation.
Because of its power, R has a strong user base in academia and among pure data scientists. It is used in almost every sector, including finance, banking, health sciences and social media. Machine learning is one often-mentioned application, so for those heading for an AI-focused career, R could be worth the extra learning effort.
SAS is the undisputed market leader in the commercial analytics space. Unfortunately, an individual licence is likely to be prohibitively expensive, unless you are eligible for a (limited) free student and teacher version.
SAS potentially offers the fastest learning trajectory of the three languages, especially for those with no coding experience. Familiar features such as drag and drop mean that you can get going without needing to code. There is also good technical support, along with training materials.
SAS has a user-friendly GUI and a vast array of statistical functions. It is similar to SQL – so you have a head start if you already know that language, which may influence your choice. Some complain that it misses out on the latest statistical functions compared to nimble, open-source R and Python.
It stands to reason that if you have access to SAS in your place of work, and you’re likely to stay working within large companies that prefer SAS, it’s a no-brainer to start here.
Which language you choose will depend on the direction you want to take your career, but in terms of accessibility, cost and career prospects, Python is the sensible choice for most beginners.