Gartner predicts that 40 percent of data science tasks will be automated in less than three years. However, as the complexity of machines increases, so will the need for the “science” side of data science.
It seems not a month goes by without a headline warning that a shortage of data scientists is imminent. Then there is the competing claim from research firm Gartner that 40 percent of data science tasks will be automated by 2020.
While Gartner is an often-cited authority within the technology community for its predictions, some data is missing from its reasoning. For example, have Gartner analysts sat down with actual data scientists who work within organizations of all sizes? Or is its prediction based merely on the current momentum of machine learning and artificial intelligence permeating the Internet of Things?
Sean Downes, senior data scientist at Expedia.com, led a session at Qubole’s Data Platforms conference in May 2017 entitled “Industrializing Data Science.” He walked participants through the specific issues Expedia faced with migrating its huge data infrastructure to the cloud. In between the discussion of A/B testing and microservice logs, Downes shared some speed bumps that could either substantiate or counter Gartner’s “data science automation” prediction.
Clarity and detail regarding the data are paramount to data scientists:
- Who owns the data?
- Who owns that field?
- What is this field?
- Where did that field go?
- Why is this field NULL?
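To make the last of those questions concrete, here is a minimal Python sketch of a check that counts how often each field is missing. The function name `null_counts` and the sample rows are hypothetical illustrations, not anything Downes presented or anything drawn from Expedia's systems.

```python
from collections import Counter


def null_counts(records):
    """Count, per field, how many records have a missing (None) value.

    A field that is absent from a record is counted as missing too.
    """
    # Collect every field name seen in any record.
    fields = set()
    for record in records:
        fields.update(record)

    counts = Counter()
    for record in records:
        for field in fields:
            if record.get(field) is None:
                counts[field] += 1
    return dict(counts)


# Hypothetical sample rows, invented for illustration only.
sample_rows = [
    {"booking_id": 1, "hotel_id": 10, "price": 120.0},
    {"booking_id": 2, "hotel_id": None, "price": 95.5},
    {"booking_id": 3, "price": None},
]

report = null_counts(sample_rows)
```

A count like this only shows where the gaps are. Who owns the field and why it is empty remain questions for a person to chase down.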
Downes said he was once asked at what point one can consider oneself a data scientist. He answered, “the emphasis is on science.” Scientists are persistent in questioning and in seeking solutions to the human- and systems-based problems that arise. While artificial intelligence promises to automate the provision of that detail, the data is intended for human use. Despite the eagerness of the marketing world to predict human behavior, human emotion throws a wrench into predictive analytics. It’s akin to chasing a constantly moving target.
Why data scientists still top AI
Inevitably, data scientists will pose more questions. Until artificial intelligence can predict which question an expert is going to ask based on a multitude of factors (which means the machine must be able to read minds), trained data scientists will still be in short supply.
Data scientists are not engineers and engineers are not data scientists. However, Downes emphasized fundamental practices for streamlining the data science process that also apply to data engineering:
- Pick a robust standard and stick to it.
- Production code matters, so document and format it to convey what you intend to do as well as the results you expect.
- Pipelines count as production code.
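As one possible reading of that advice, the Python sketch below shows a small pipeline step whose docstring states both its intention and its expected result, with a check that enforces the expectation. The function `drop_incomplete_rows` and the sample data are assumptions made for illustration and do not come from the talk.

```python
def drop_incomplete_rows(rows, required_fields):
    """Keep only rows with a non-null value for every required field.

    Intention: downstream model training should never see a row that is
    missing a required field.
    Expected result: a list no longer than the input, in which every row
    has all required fields populated.
    """
    cleaned = [
        row for row in rows
        if all(row.get(field) is not None for field in required_fields)
    ]
    # Enforce the expectation stated in the docstring.
    assert len(cleaned) <= len(rows)
    return cleaned


# Hypothetical input rows, invented for illustration only.
raw_rows = [
    {"user_id": 1, "clicks": 3},
    {"user_id": 2, "clicks": None},
    {"user_id": None, "clicks": 7},
]

training_rows = drop_incomplete_rows(raw_rows, ["user_id", "clicks"])
```

Writing the intention next to the code means the next person to touch the pipeline can tell whether a change still does what the step was meant to do.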
But he also included a message to engineers on how they can help data scientists do their jobs:
- Create a space for data scientists to save test and training data.
- Grant cluster bootstrap permissions.
- Provide access to S3 buckets.
- Sandbox clusters are an important part of a data scientist’s ability to test models.
Given the close interaction between data scientists, data engineers, and the rest of an enterprise, if Gartner’s prediction is correct, data science tasks will not be the only ones automated (to a certain extent). An organization will also rely on machines to function as data engineers and analysts. And given the increasing frequency of data breaches, Gartner’s proposed “citizen data scientist” might have decreasing access to large data sets if the U.S. adopts regulations similar to the European Union’s General Data Protection Regulation, which is set to take effect in 2018.
While everyone waits to see the full extent of artificial intelligence capabilities, data scientists are still in demand. As the complexity of machines increases, so will the need for the “science” side of data science.