Data scientists used to be more in demand, while now the new category of data engineers seems to be equally desirable.
The corporate world of data science has changed. There used to be a data science lab, similar to a research lab, where ideas were tried out, one at a time, until the best model or dashboard was found. The lab included a few data scientists who were responsible for every step in the data science cycle: accessing the data and controlling its quality, building models, creating dashboards, and especially productionizing the final application.
Now that data science, or AI if you wish, has reached maturity, this kind of research lab philosophy might not work anymore because we are building more dashboards, training more and larger models, needing better data quality, complying with data protection laws, following the company’s guidelines, and so on. At whatever level we are working, be it producing 1000 reports per month or training a billion-parameter AI model, it is clear that this is not a business for just a few individuals, no matter how skilled they are.
What we observe in the corporate world is indeed a diversification (a specialization) of the original data scientist profession into different data professional profiles: Data Analysts, Data Engineers, Data Scientists, and Machine Learning Engineers. This is without counting in Software Developers and IT Administrators, who do not work directly with the data but are still necessary for the smooth functioning of the data science lab.
In 2018, Andrea De Mauro and his colleagues scraped ten job search websites and grouped job openings for data professionals into four categories: business analysts, data scientists, developers, and system managers. (The work of De Mauro and his colleagues can be found here and here.) Tom Davenport and DJ Patil, in their latest Harvard Business Review article, still refer to just one category, the data scientists, who now face a number of new challenges, including the ethics of data science.
In my experience, and after running a number of interviews with data professionals, I am more inclined to agree with De Mauro and his colleagues. That is, I prefer to separate the data professions into a few different categories. However, a few years later, I believe that new categories of professionals have emerged who focus exclusively on the data, on the reports, or on the machine learning models.
We recently ran a few searches on Indeed.com, limited to the U.S. job market and to job openings posted in a 24-hour period, for data analysts, business analysts, data scientists, big data, data engineers, and machine learning engineers.
Data and business analysts are the most sought-after professionals (at least as of today), whereas data scientists and data engineers are in lower demand.
Machine learning engineers represent a new category of professionals who emerged with the maturity of the data scientist profession. While the boundary between data scientists and machine learning engineers is sometimes fuzzy, usually, machine learning engineers are in charge of developing a structured framework for repetitive creation, training, and optimization of machine learning models.
A huge surprise is the relatively high number of big data job posts still out there. The wording “big data” has by now almost disappeared from the data science landscape, and yet it survives persistently in the job-hunting field.
Every data science lab requires three different types of data analytics:
- Reporting and generating dashboards;
- Plumbing around a machine learning model;
- Creating and training machine learning models themselves.
Reports and dashboards are the oldest forms of data analytics, and yet they continue to be very useful for understanding at a glance what is going on within a specific domain. In the eyes of an expert manager, nothing can replace a line plot when we talk about revenues or a bar chart when we count occurrences.
In the last few years, however, predictive analytics and, therefore, the application of machine learning algorithms have been adopted in many ‘data labs.’ Together with the classic analysis of the current situation with dashboards and reports, applications are developed to predict what is coming next, what the next trend is, and what the next best choice could be. Creating and training machine learning models has become common practice in ‘data science labs’, even in the small ones.
With such common practice comes the need for data blending, data cleaning, model testing, and data quality monitoring. A framework must be put in place to regularly update the data coming from all sorts of data sources, clean the merged data, and monitor their quality. All this plumbing activity, aimed at controlling and repeating the data acquisition process, has the flavor of a massive engineering effort.
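To make the plumbing concrete, here is a minimal sketch of the three steps just described: blending data from two sources, cleaning the merged data, and monitoring their quality. Everything in it is an assumption made for illustration: the two sources (customers and orders), the field names, the `QualityReport` class, and the choice of completeness as the quality measure. A real framework would be considerably richer.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Hypothetical quality summary for one run of the pipeline."""
    total_rows: int
    complete_rows: int

    @property
    def completeness(self) -> float:
        # Share of rows with no missing required value; 1.0 for an empty table.
        return self.complete_rows / self.total_rows if self.total_rows else 1.0


def blend(customers: list[dict], orders: list[dict]) -> list[dict]:
    """Join each order with its customer record on the shared customer_id key."""
    by_id = {c["customer_id"]: c for c in customers}
    # Orders without a matching customer are kept, just without customer fields.
    return [{**by_id.get(o["customer_id"], {}), **o} for o in orders]


def clean(rows: list[dict], required: list[str]) -> list[dict]:
    """Turn empty strings into None, fill absent fields, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        normalised = {k: (None if v == "" else v) for k, v in row.items()}
        for field in required:
            normalised.setdefault(field, None)
        key = tuple(sorted(normalised.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(normalised)
    return cleaned


def monitor(rows: list[dict], required: list[str]) -> QualityReport:
    """Count how many rows have a value for every required field."""
    complete = sum(all(row.get(f) is not None for f in required) for row in rows)
    return QualityReport(total_rows=len(rows), complete_rows=complete)
```

Run on a schedule, `monitor(clean(blend(customers, orders), required), required)` would give the lab a single number to track over time, and a drop in completeness would be the signal that one of the sources needs attention.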
Aggregating job posts for the analysts (business and data analysts), the scientists (data scientists and machine learning engineers), and the data experts (data engineers and big data experts) together, we found a clear dominance of the analyst job positions.
Let’s explore these professional profiles and their skills in greater detail.
Data and business analysts are responsible for the creation and maintenance of dashboards and reports to reliably measure all aspects of the field and/or the business they work in. Their goal is to transform relevant insights into impactful decisions in their area of work.
Traditionally, data analysts were employed mainly in the finance and business sectors. However, dashboards and reports are by now used by all sorts of professionals with deep expertise in a specific domain, be it finance, healthcare, IoT, pharma, or the automotive industry, to track, discover, and correct issues of any type.
Data analysts are not necessarily IT or machine learning experts per se, but they know their data inside and out: the collection process, the business cases, the data preparation techniques, and the most suitable KPIs to provide ad-hoc insights.
From a skillset perspective, they are usually well-versed in data visualization and blending techniques since their main role within the data science lab is to explain the problem, monitor the process, and interpret the results. They are the necessary link between the data, the algorithms, the IT experts, and the business stakeholders.
Data scientists and machine learning engineers focus on the analytical methods for the transformation of data into insights. They are required to build and train machine learning and statistical models that identify patterns, extract relevant content, and make predictions based on large volumes of data.
In the past few years, some data scientists have evolved into a kind of engineer. Indeed, while data scientists use their knowledge to design innovative algorithms and applications, machine learning engineers are responsible for the framework that automatically creates, trains, updates, optimizes, and monitors existing machine learning models.
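As a rough illustration of what such a framework automates, the sketch below monitors a model on each new batch of data and retrains it when its error grows too large. It is only a toy under stated assumptions: `MeanModel`, `mean_absolute_error`, `monitor_and_retrain`, and the `max_error` threshold are hypothetical names invented here, and the model is a placeholder that simply predicts the mean of its training targets.

```python
from statistics import mean


class MeanModel:
    """Placeholder model: always predicts the mean of its training targets."""

    def fit(self, targets: list[float]) -> "MeanModel":
        self.prediction = mean(targets)
        return self

    def predict(self) -> float:
        return self.prediction


def mean_absolute_error(model: MeanModel, targets: list[float]) -> float:
    """Average absolute distance between the prediction and the observed targets."""
    return mean(abs(t - model.predict()) for t in targets)


def monitor_and_retrain(model: MeanModel, batches: list[list[float]], max_error: float):
    """Check the model on each incoming batch and retrain when the error is too high."""
    retrain_count = 0
    for batch in batches:
        if mean_absolute_error(model, batch) > max_error:
            # The data have drifted: create and train a fresh model on the new batch.
            model = MeanModel().fit(batch)
            retrain_count += 1
    return model, retrain_count
```

The point of the design is that no human decides when to retrain: the threshold is set once, and the loop takes care of monitoring and updating from then on.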
Data scientists and machine learning engineers alike are usually specialized in a sub-branch of machine learning algorithms. We find data scientists with deep knowledge of classic machine learning algorithms, specialized in deep learning networks, experts in Natural Language Processing, or proficient in time series analysis.
These professional figures are expected to keep up with the most advanced research and best practices for data science algorithms and techniques.
The last professional category to enter the data science landscape has been that of the data engineers. To obtain well-performing machine learning models, we need great, clean, exhaustive data.
Data engineers are typically responsible for the data pipelines bringing together information from different sources, for the data quality control, and for the maintenance of the data update infrastructure. Big data experts are those data engineers specialized in dealing with large amounts of data. They have been added to the mix of professionals who take care of the data quality.
Data engineers have become a very popular and well-paid category. Since a model’s performance largely relies on the data that have been used to train it and since many modern AI models need extraordinarily large amounts of data for training, it is clear why the role of a data engineer is on the rise.
What is the recipe for building the perfect data science lab? Hire a certain number of data scientists, then double this number and hire as many data analysts? It is hard to say.
Usually, data analysts are the most common data professionals in an average data science lab unless the data science lab is highly specialized in producing AI machines.
Data scientists used to be more in demand, while now the new category of data engineers seems to be equally desirable. Indeed, the high demand for data engineers seems destined to grow in the coming year.
While we cannot say the exact proportions, we can state for sure that an average data science lab will need all three categories of data science professionals.