So, what is Data Science then?

I just finished a post explaining the relationship between Artificial Intelligence, Machine Learning, and Deep Learning. And somebody immediately asked: But what about Data Science? How does Data Science relate to all this?

Good question. So that is what I am going to write about today.

In case you do not want to read the whole post from yesterday (shame on you!), here is a quick summary:

  • Artificial Intelligence covers anything that enables computers to behave like humans. Machine Learning is part of it, as are language understanding and computer vision.
  • Machine Learning deals with the extraction of patterns from data sets. This means that the machine can find rules for optimal behavior and can also adapt to changes in the world. Deep Learning is part of this, but so are decision trees, k-means clustering, and linear regression, among others.
  • Deep Learning is a specific class of Machine Learning algorithms that use complex neural networks. Recent advances in parallel computing have made those algorithms feasible.

Deep Learning is a subset of Machine Learning methods, and Machine Learning in turn is a subset of Artificial Intelligence.

This finally brings us to Data Science. The picture below gives an idea of how Data Science relates to those fields:

[Figure: ai_ml_dl_ds, how Data Science relates to AI, ML, and DL]

Data Science is the practical application of all those fields (AI, ML, DL) in a business context. “Business” here is a flexible term, since it also covers cases where you work on scientific research. In that case, your “business” is science, which is actually more true than you might like to think.

But whatever the context of your application is, the goals are always the same:

  • extracting insights from data,
  • predicting developments,
  • deriving the best actions for an optimal outcome,
  • or sometimes even performing those actions in an automated fashion.

As you can also see in the diagram above, Data Science covers more than just the application of those techniques. It also covers related fields like traditional statistics and the visualization of data or results. Finally, Data Science includes the data preparation necessary to get the analysis done. In fact, this is where you will spend most of your time as a data scientist.

A more traditional definition describes a data scientist as somebody with programming skills, statistical knowledge, and business understanding. While this skill mix does allow you to do the job of a data scientist, the definition falls a bit short. Others realized this as well, which led to a battle of Venn diagrams.

The problem is that people can be good data scientists even if they never write a single line of code. Other data scientists can create great predictive models with the help of the right tools, but without a deeper understanding of statistics. So the “unicorn” data scientist, who masters all of these skills at once, is not only overpaid and hard to find; such a person might also be unnecessary.

For this reason, I prefer the definition above, which focuses more on the “what” and less on the “how”. Data scientists are people who apply all those analytical techniques, together with the necessary data preparation, in the context of a business application. The tools do not matter to me as long as the results are correct and reliable.