There are many toolkits available for typical data analysis, data mining and machine learning tasks. There are programming languages, libraries, frameworks and development environments to choose from. Some are aimed at solving particular problems, others at particular techniques and algorithms. This presents a challenge to newcomers. Which toolkits should you learn? Just one? All of them?

I’ve made the mistake of trying to use too many. I’ve been using R, Octave, Python and Matlab on and off, because some online courses offered exercises in one language but not the others. The issue is that knowledge of a tool fades over time, so if I’m not actively programming in Matlab, I gradually forget its idiosyncrasies. And fundamentally, those idiosyncrasies are all that separates the alternatives.

There is a better way: choose one toolkit and stick with it. Sure, you’ll miss some of the courses offered in other languages, but there are so many alternatives available that I wouldn’t worry. There are more courses and resources than anyone can realistically complete.

I’d choose between R and Python. Better yet, choose the one you’re already using, or the one closest to what you already use. There are better ways to spend time than learning the peculiarities of each environment and tool. Instead of learning and re-learning how to process data frames in each programming language, practice solving problems, learn new techniques or study the fundamentals. Investing time in learning every tool is wasteful. I’ve personally settled on the following:

  • Python as the programming language
  • NumPy & Pandas as data manipulation libraries
  • scikit-learn as the machine learning toolkit
  • matplotlib and seaborn as visualization libraries
  • Jupyter as the IDE
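To give a feel for how these pieces fit together, here is a minimal sketch (not a recommended workflow) that uses pandas and NumPy for the data handling and scikit-learn for a model. It assumes a recent scikit-learn (0.23 or later, for load_iris(as_frame=True)) and uses the small iris dataset bundled with scikit-learn; the choice of logistic regression is only for illustration. Plotting with matplotlib or seaborn is left out to keep it short, and the whole thing can be pasted into a single Jupyter cell.

```python
# Illustrative sketch of the stack: pandas + NumPy for data, scikit-learn for the model.
# Assumes scikit-learn >= 0.23; the iris dataset ships with scikit-learn (no download).
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the dataset as a pandas DataFrame: four feature columns plus a "target" column.
iris = load_iris(as_frame=True)
df: pd.DataFrame = iris.frame

# A typical pandas step: mean of each feature per class.
class_means = df.groupby("target").mean()

# Separate features and labels as NumPy arrays, holding out 25% of rows for testing.
X = df.drop(columns="target").to_numpy()
y = df["target"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier (an illustrative choice) and compute accuracy with NumPy.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = float(np.mean(model.predict(X_test) == y_test))

print(class_means)
print(f"Test accuracy: {accuracy:.2f}")
```

The point is less the model than the fact that one language and one set of libraries cover loading, reshaping and modelling, so the same API habits carry over from task to task.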

I chose Python because I also use it at work. I’d rather reinforce my knowledge and slowly achieve mastery than stumble around in ten different languages. The machine learning world has plenty of novel directions to explore beyond API quirks. I heartily recommend the Anaconda Navigator bundle, which includes everything necessary to begin working in Python. It reminds me somewhat of the installable LAMP stacks that were popular years ago.

There’s one reason I’d consider switching to another environment or programming language: getting paid to do it, for example by joining a company that uses R. Otherwise, I’ll keep using the same tools and focus on more valuable activities, like trying deep learning.