Time Consumed by Data Prep: Is This a Bad Thing?

Data professionals are spending too much time on data prep, but the quality assurance it provides ensures that projects work with clean data sets.

Written By
Joe McKendrick
Apr 4, 2022

To have a responsive, responsible and accurate artificial intelligence or analytics system, one needs data. The catch is that data scientists and analysts are forced to spend more time on data prep than on creating models and making them valuable to their businesses. This suggests a need for more data engineers and database administrators to handle much of the front-end work that goes into supporting data-driven applications. Importantly, it means a high degree of teamwork is needed to make data analytics practical.

Ask any data scientist or analyst about the level of support they need to do the jobs they were hired to do. SAS did exactly that in its recent study of 277 data managers and scientists, which finds data professionals are spending too much time on data preparation and not enough on model creation. Respondents spend 58% of their time, more than they would prefer, gathering, exploring, managing and cleaning data.

A typical data science project involves a variety of activities, almost always beginning with preparing data. On average, 11% of data scientists’ or analysts’ time is spent creating computer models. The question is: is this enough?

Data prep may be onerous and take time away from working on business issues, but it’s necessary, the SAS study’s authors point out. “Regardless of your level in the organization, data management will probably take a large share of your time, even with the development of low code/no code tools and AI and machine learning algorithms being written for it,” they write. “The likely reason is that the data you have and how you decide what’s relevant is probably specific to your industry and organization. As is the case for how you approach your model-building, knowing which data is relevant and why has a lot to do with the issues you are trying to solve.”

Data scientist and Data Science Bootcamp Leader Patrick Butler agrees, noting that the whole front-end process of managing and cleaning data “is an intrinsic part of the modeling process.” Without it, “all the modeling that follows is truly just math.” Up-front quality assurance of incoming data is essential for ensuring that training data is built on clean data sets.

Joe McKendrick is RTInsights Industry Editor and an industry analyst focusing on artificial intelligence, digital, cloud and Big Data topics. His work also appears in Forbes and Harvard Business Review. Over the last three years, he served as co-chair for the AI Summit in New York, as well as on the organizing committee for IEEE's International Conferences on Edge Computing. Follow him on Twitter @joemckendrick.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved
