Enterprises are struggling with real-time data quality control.
Almost 90 percent of companies report bad data polluting their data stores, according to a June 22 survey from Dimensional Research.
Sponsored by StreamSets, the survey, which polled 314 data professionals, found that enterprises struggle to control data in motion, lacking visibility into, security over, and control of those flows.
“The industry has long been fixated on managing data at rest and this myopia creates a real risk for enterprises as they attempt to harness big and fast data,” said Girish Pancha, CEO of StreamSets.
Some 66 percent of respondents use ETL/data integration tools, and 77 percent use hand coding, to design their data pipelines.
These tools “do not let you watch the data in motion, which means you are flying blind and can’t detect data quality or data flow issues,” StreamSets stated.
The survey also found that 68 percent of respondents said ensuring data quality was their most common challenge, yet only 34 percent were confident in their ability to detect divergent data. Fifty-three percent of respondents change each pipeline multiple times per month. Other challenges cited were complying with security and privacy policies (60 percent) and upgrading data infrastructure (40 percent).
The survey asked respondents to rate how well they could detect five key conditions in their data flows:
- A specific data flow pipeline has stopped operating.
- Data flow throughput is degrading or latency is growing.
- Error rates are increasing.
- The values of incoming data are diverging from historical norms.
- Personal information is present within the data flows.
Only 12 percent rated their performance as excellent across all five. Nearly half said the ability to detect real-time data quality issues would be of value to them. When asked whether they would like a single portal to access and manage all data, 72 percent rated such a portal valuable or very valuable.
The full report can be accessed here.