At first, one might think that data science is difficult because of the rigorous, complex math behind machine learning algorithms. Or you might find data cleaning difficult: getting the data into a usable state for model fitting. However, as this article argues, these can actually be a couple of the easiest parts of data science. Once you understand the math, you can grasp new algorithms fairly quickly, and once you have some experience cleaning data, it becomes a tedious but moderately easy task. Some machine learning algorithms are even implemented for you already in software libraries and can simply be treated as black boxes. And when trying to figure out which model to fit to a problem, there are handy flowcharts and diagrams available that can greatly simplify the process.

Rather, data science is difficult because one must know which questions to ask as well as how to answer them (i.e., figuring out what data to collect and how to determine whether you have found a solution). Data collection, cleaning, exploratory analysis and visualization, model fitting, model analysis, and model validation are all necessary steps when trying to answer a data-based question. These skills improve only through constant practice and exposure. I feel there can be some misunderstanding about what exactly data science is, and hopefully this post helps readers gain a better idea of the various components that go into answering questions with a data-driven approach, and serves as a preview of the difficulties that can arise when entering the field.