Building the Data Prep Business Case Part 1: Five Scenarios to Quantify ROI


When it comes to building a business case for data preparation, start by taking a fresh look at your current approach and then consider areas where data can impact your ability to be successful.

Data is at the forefront of most decisions. Nevertheless, data has little value until it is validated, cleaned, and put into business context. This is called data preparation: turning raw data into meaningful information. Data prep is an intensive effort that can take a lot of time and resources. So even though many organizations have assembled a team of data practitioners to handle data preparation requests, most recognize the inefficiencies of their current process, which often involves manual coding, legacy ETL tools, or other less-than-ideal methods.


Frustrated with how data preparation has been done in the past, business analysts and IT teams realize there is a new and better way to prepare their data but struggle to articulate how this new approach can help. When it comes to building a business case for data preparation, start by taking a fresh look at your current approach, including the tools and techniques you use, and then consider areas where data can impact your ability to be successful.

Identifying the Business Value of Data Preparation

The first step in building your business case is to identify where the value of data preparation will manifest itself. In other words, where would timely and accurate information have the most impact? Timely and accurate information can demonstrate its value in several ways, so it makes sense to review these five scenarios to see where it can help you the most:

Saving Operational Costs: Using data to deeply understand and optimize current operations is a benefit that spans many lines of business and industries. For instance, in supply chain management, properly matching supplies to actual demand is the key to success. The challenge is that demand is often articulated regionally in spreadsheets, with each department, geographical location, or even individual worker relying on a separately managed spreadsheet, while supplies, inventories, and scheduling are in an ERP or supply chain management system. Proper data preparation can accelerate the consolidation of these sources and integrate inventory data with demand and replenishment needs much faster, helping to avoid expedited shipping fees and ensure on-time delivery.

Similarly, marketing teams want to de-duplicate name and address data to achieve savings. Clean and complete data that includes proper demographics helps marketers target their high-cost direct mail campaigns to the right segments of people and avoid sending duplicate mailers to the same household.
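To make the de-duplication idea concrete, here is a minimal sketch in Python. It assumes a mailing list held as a list of dictionaries with hypothetical `name`, `address`, and `postal_code` fields, and it treats two records as the same household when their normalized address and postal code match. Real address matching usually needs more than this (abbreviation handling, fuzzy matching), so read it as an illustration rather than a production approach.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def dedupe_households(records):
    """Keep the first record seen for each normalized (address, postal code)."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize(rec["address"]), normalize(rec["postal_code"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical sample data: the first two rows are the same household,
# entered with different casing, spacing, and punctuation.
mailing_list = [
    {"name": "Ann Lee", "address": "12 Oak St.", "postal_code": "94107"},
    {"name": "Bob Lee", "address": "12  oak st", "postal_code": "94107"},
    {"name": "Cy Diaz", "address": "40 Elm Ave", "postal_code": "94110"},
]

households = dedupe_households(mailing_list)
print(len(households))  # 2 mailers instead of 3
```

The savings estimate for a business case then follows from the number of duplicate mailers avoided multiplied by the cost per mailer.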

Creating New Revenue Streams: Data is exploding and more abundant than ever. While companies used to relish the idea of compiling vast amounts of data, the reality is that data can hurt your business faster than it helps if it fails to deliver an accurate view. For example, in the insurance industry, inaccurate demographic or property information could lead to premiums that are set too low, reducing overall premium income. It can also affect claim adjudication, which is another big topic because most health-related services are paid by third-party payers. Data preparation tools can assist in finding data that shows improper evidence of insurance, misclassification of damages, missing codes, and more, helping payers go back and recover claims that were paid on ineligible coverage.

In manufacturing, data products and services are emerging go-to-market strategies that can bring in new revenue streams. Data from sensors and IoT devices has become a source of subscription fees, even when the device itself is a one-time chargeable component. However, before the device data can be used to create revenue-generating data products, it must first be prepared and contextualized into meaningful information.

Enhancing Customer Experience: Customer experience encompasses every aspect of a company’s offering, from the quality of customer care to advertising, packaging, product and service features, ease of use, and reliability. However, data from all these experiences is distributed. Some data is in CRM, customer support, or survey platforms. Some is collected in product usage analysis, and some in various marketing tools. To discover the right patterns and early indicators of customer dissatisfaction, data from these distributed systems must be integrated and prepared for analysis and downstream machine learning. Data preparation accelerates this work.

Marketing is a good example, as marketing teams typically invest in various channels to engage with customers. Understanding customer value in each channel requires a big-picture effort. It involves integrating the data from customer transaction systems (aka revenue systems) with the leads accumulated from each marketing channel and constantly testing investments against outcomes. Essentially, this means shifting funds from one channel to another and re-integrating the data once again to see whether investing in one would have an uplift over the other. In practice, this is hard, but with the right data preparation solution, the process is accelerated, and the discovery of insights becomes second nature.
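The channel comparison described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each customer is attributed to a single channel, and the field names (`customer_id`, `channel`, `amount`) and the sample figures are hypothetical, not taken from any particular CRM or revenue system.

```python
from collections import defaultdict


def revenue_per_dollar(leads, transactions, spend):
    """Join transactions to lead channels and return revenue per dollar spent."""
    # Map each customer to the channel that produced the lead.
    channel_of = {lead["customer_id"]: lead["channel"] for lead in leads}

    revenue = defaultdict(float)
    for tx in transactions:
        channel = channel_of.get(tx["customer_id"])
        if channel is not None:  # skip revenue with no matching lead
            revenue[channel] += tx["amount"]

    return {ch: revenue[ch] / cost for ch, cost in spend.items() if cost > 0}


# Hypothetical sample data.
leads = [
    {"customer_id": "c1", "channel": "email"},
    {"customer_id": "c2", "channel": "search"},
    {"customer_id": "c3", "channel": "email"},
]
transactions = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": 300.0},
    {"customer_id": "c3", "amount": 80.0},
    {"customer_id": "c9", "amount": 50.0},  # no lead on record
]
spend = {"email": 100.0, "search": 200.0}

result = revenue_per_dollar(leads, transactions, spend)
print(result)  # {'email': 2.0, 'search': 1.5}
```

Re-running the same calculation after each budget shift gives the before-and-after comparison; the unmatched transaction shows why incomplete lead data understates a channel's value.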

Ensuring Compliance: In regulated industries like finance, pharmaceuticals, or healthcare, the lack of complete and accurate data can lead to millions of dollars in fines. Achieving compliance must be an ongoing focus as regulations continue to evolve around the world.

For instance, the General Data Protection Regulation (GDPR) has set rules for the collection and processing of personal information of individuals within the EU, but the impact applies to most organizations globally. Data preparation solutions that have a centralized, governed data catalog and can show the lineage of the data support compliance by documenting how the data was collected, where it is stored, who had access to it, and who has made copies of it.

Avoiding Reputational Damage: Of the five areas where data preparation delivers value, this is the hardest to quantify. Reputational damage can take any form, from small, everyday damage to large public relations disasters. On the larger end, in banking, for example, poor-quality data and incomplete knowledge about customers can lead to accidental trade with sanctioned parties or suspected terrorist financiers, resulting in public fallout and hefty fines.

In automotive and manufacturing, the failure to properly test products can lead to external failures and customer dissatisfaction. This can harm a brand’s image, which could take years to rebuild. Testing and Quality Assurance (QA) of a vehicle, device, or piece of equipment involves validating sensor data against proper data ranges and flagging anomalies and outliers. In many cases, the data from sensors is stored in semi-structured files, each collecting information from a past set interval. A data prep solution that can assess the quality of the entire dataset not only accelerates the go-to-market aspect of these products but also ensures high quality and complete verification.
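As a rough illustration of range validation and outlier flagging, the Python sketch below checks each reading against an allowed range and then flags statistical outliers with a modified z-score based on the median absolute deviation. The article does not name a specific outlier method, so that choice is an assumption here, as are the sensor field names, the ranges, and the sample readings.

```python
from statistics import median

# Hypothetical valid ranges for two sensor fields.
VALID_RANGES = {
    "coolant_temp_c": (-40.0, 130.0),
    "oil_pressure_kpa": (0.0, 700.0),
}


def out_of_range(readings, valid_ranges):
    """Return (row index, field, value) for missing or out-of-range values."""
    flagged = []
    for i, row in enumerate(readings):
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            if value is None or not (low <= value <= high):
                flagged.append((i, field, value))
    return flagged


def outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical readings parsed from a semi-structured sensor file.
readings = [
    {"coolant_temp_c": 88.0, "oil_pressure_kpa": 310.0},
    {"coolant_temp_c": 90.5, "oil_pressure_kpa": 305.0},
    {"coolant_temp_c": 250.0, "oil_pressure_kpa": 300.0},
    {"coolant_temp_c": 89.0, "oil_pressure_kpa": None},
    {"coolant_temp_c": 91.0, "oil_pressure_kpa": 315.0},
]

range_issues = out_of_range(readings, VALID_RANGES)
temp_outliers = outliers([r["coolant_temp_c"] for r in readings])
print(range_issues)
print(temp_outliers)
```

Running both checks over every file, rather than a sample, is what "assessing the quality of the entire dataset" amounts to in practice.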


Data preparation is a necessity in every company, and until recently, it involved either manual coding or legacy ETL solutions. But just as with any initiative that is challenging the status quo, embarking on a new data preparation approach requires engaging and persuading executives and stakeholders across your company to see the upside. Building a business case is one way to do it and a step that an organization can’t afford to overlook.

In Part 2 of the series, we will discuss additional ways to build a business case for data prep by articulating how data accuracy and agility can help drive ROI.

Farnaz Erfan

About Farnaz Erfan

Farnaz Erfan is the Senior Director and Head of Product Marketing at Paxata. She has over 19 years of experience in software development, product design, and marketing roles within the data and analytics space at companies such as IBM, Pentaho, Birst, and Paxata.
