Data engineers, data analysts, and data scientists are all valuable additions to businesses of all sizes and scopes. But they each have a different job to do.
Data scientists, data engineers, and data analysts all have one prominent task in common: They apply analysis to data. Granted, there are additional overlapping components to each job description, and the data scientist is the new kid on the block.
Given the overlap between the jobs, small- and medium-sized businesses that are wondering whether they need a data scientist tend to think a data scientist will help design their database infrastructure and manage the data influx. Or they believe that the glitzy new analytics dashboard they just incorporated is their data scientist (or data analyst, for that matter).
This doesn’t mean that a data scientist lacks the skill to help you decide which data warehouse construct fits your business goals and objectives. And data scientists do analyze data with some of the same tools as a data analyst. However, the expected “work products” are different and largely depend on prediction and inference using machine learning and statistical tools.
Machine learning and statistics aren’t precisely the same – the main difference is the intention of the results. We’ll come back to that shortly.
Data engineers are your architects of data. Need a functional database that accurately collects and stores structured, semi-structured, and unstructured data? That is the data engineer's job. More data means more stress on your current system, so as the business grows, they use various tools to help scale the infrastructure. A data engineer is your architecture superhero who deploys data management and data warehouse tools such as Hadoop, Redshift, and Google BigQuery, working in languages such as SQL and Java.
Data analysts have been around for decades, and arguably longer than that. They gather data and run varying degrees of descriptive statistical calculations on a specific dataset they’ve pulled (with the help of the data engineer). Then they report the results. Because they often work with Excel, SAS, SPSS, IBM Watson, or some other analytical software, they don’t need to know the intricate math underlying the quantitative analysis. It helps if they do, but their primary role is translating those numbers into “what does this mean in non-mathematical language?”
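To make "descriptive statistical calculations" concrete, here is a minimal sketch using only Python's standard library. The `monthly_orders` figures and the `describe` helper are hypothetical, invented purely for illustration; an analyst would more likely get the same numbers from Excel or SPSS.

```python
import statistics

# Hypothetical monthly order counts pulled for a report (illustrative data only).
monthly_orders = [120, 135, 128, 150, 142, 160]


def describe(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(monthly_orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The numbers themselves are the easy part; the analyst's contribution is explaining what a median of 138.5 orders a month means for the business.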
Data scientists are expected to go deeper. They pull a specific, often huge, dataset to answer a particular question and test the data using machine learning and statistical algorithms. Some enterprises will require that they know SQL (or some version thereof) to extract the data from the database. They also use Excel, SAS, SPSS, and IBM Watson to get an overview of the data.
Data scientists are also expected to perform some form of extract, load, and transform (or extract, transform, and load if the data needs cleaning first). Because the data scientist is part programmer and part statistician, a basic data science toolset comprises R, Python, C++, and Matlab (though a company can require additional languages based on its internal infrastructure). Learn more: Enterprise scale analytics with R (white paper)
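The extract, transform, and load order mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RAW_CSV` export, the `sales` table, and the three helper functions are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the columns and values are made up for illustration.
RAW_CSV = """customer,amount
alice, 10.50
bob,
carol,7.25
"""


def extract(text):
    """Extract: read the raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip records that cannot be used
        cleaned.append((row["customer"].strip(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), conn)

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(row_count, total)
```

In the extract, load, and transform variant, the raw rows would be loaded first and the cleaning step would run inside the database afterwards.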
Data scientists build or tune machine learning algorithms to help scale predictions. (See: How to apply machine learning to event processing.) But they also use complex statistical modeling to determine whether the answer to their initial question supports robust inference, meaning it generalizes to the population the data set represents. Prediction and inference aren’t exactly the same thing, and one of the traits of an expert data scientist is both knowing the difference and developing tools that reflect it.
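One way to see the difference between prediction and inference is to fit the same simple model and ask two different questions of it. The sketch below is a plain least-squares line written with the standard library; the `xs` and `ys` values and the `fit_line` helper are hypothetical, and a real analysis would normally use a statistics package rather than hand-rolled formulas.

```python
import math

# Hypothetical observations (illustrative only), e.g. ad spend vs. sales.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns the slope, the intercept, and the standard error of the slope.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, used to estimate the noise in the data.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope_se


slope, intercept, slope_se = fit_line(xs, ys)

# Prediction: what value do we expect for a new, unseen x?
predicted = intercept + slope * 6

# Inference: how confident are we that x is related to y at all?
t_statistic = slope / slope_se

print(f"predicted y at x=6: {predicted:.2f}")
print(f"slope = {slope:.2f}, standard error = {slope_se:.3f}, t = {t_statistic:.1f}")
```

The prediction question only cares whether the forecast is accurate; the inference question cares whether the estimated relationship would hold up beyond this particular sample.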
Data engineers, data analysts, and data scientists are valuable additions to businesses of all sizes and scopes. Hopefully, there is now more clarity as to how each provides a unique contribution to the world of data.