Improve Data Lifecycle Efficiency with Automation

Organizations that look at data as they do any other critical corporate asset or resource will be the most successful.

How an organization manages its resources, from acquisition through utilization, affects the value it returns (improved operational efficiency, decreased cost, increased profit, etc.). Whether it is a retailer selling, booking, and shipping, or a manufacturer making the ubiquitous ‘widget,’ a successful business understands how materials are acquired and used to support its overall strategic goals. To increase profit, it studies the use of these resources and defines processes and procedures to use them as efficiently as possible, garnering maximum value while reducing costs. The same approach should be applied to the data lifecycle.

Many organizations fail to take these same actions with their data. As we become a more data-driven society, how we use information is of critical concern. The pace at which we identify, collect, integrate, and utilize information in its various forms has accelerated over the past few years, and there is every indication that this trend will continue. Information once ‘trapped’ in non-conventional spaces is being integrated with legacy types of data to form deeper insights and provide a more holistic view of who we are and how we may react to certain stimuli, whether from a business or personal perspective.

A holistic approach to automation

How can we become more efficient with how we manage that data? Can automation be utilized to reduce human intervention, thereby reducing costs and possible error while safeguarding that information and following any necessary regulatory protocols? We will explore how the automation of internal processes can impact the value an organization can derive from its data throughout the data lifecycle.

Data Creation: The first step in the data lifecycle is the creation of enterprise data. This data can be created organically by people inside the organization (e.g., by creating new records in an HR, payroll, or operational system), acquired from outside the organization and then migrated or integrated, or captured automatically (e.g., sensor data).

Automation at this phase usually focuses on how the information is collected, with human intervention required only when an error occurs. The same holds for sensor or machine-collected data: we try to limit human intervention to cases where there is an error in capture or processing. It is also possible to automate some data entry functions. Input forms are typically kept as simple as possible and include the data checks needed to ensure that data is entered accurately (that is, that it meets the business rules defined for it). By integrating sensor and image-capturing architecture into the environment, it is possible to scan forms as they are submitted and let the system populate fields based on those same business rules.

Regardless of how the data is acquired, if we can automate its validation as part of the capture, we can increase the quality of the information and realize a significant cost reduction downstream by eliminating the need for manual cleansing, removal of duplicates, and similar rework.
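To make validation at capture concrete, here is a minimal sketch in Python. The record fields (employee_id, email, salary) and the three business rules are hypothetical and chosen only for illustration; a real system would load its rules from wherever the organization defines them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical business rules for an HR-style record. The field names and
# rules are illustrative assumptions, not taken from any specific system.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_record(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    employee_id = record.get("employee_id")
    if employee_id is None or not str(employee_id).strip():
        errors.append("employee_id is required")
    if not EMAIL_PATTERN.match(str(record.get("email", ""))):
        errors.append("email is not well formed")
    salary = record.get("salary")
    if not isinstance(salary, (int, float)) or salary < 0:
        errors.append("salary must be a non-negative number")
    return errors


@dataclass
class CaptureResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)    # (record, errors) pairs for human review
    duplicates: list = field(default_factory=list)  # valid records whose key was already seen


def capture(records) -> CaptureResult:
    """Validate and de-duplicate records as they arrive, before they are stored."""
    result = CaptureResult()
    seen_ids = set()
    for record in records:
        errors = validate_record(record)
        if errors:
            result.rejected.append((record, errors))
            continue
        key = str(record["employee_id"]).strip()
        if key in seen_ids:
            result.duplicates.append(record)
            continue
        seen_ids.add(key)
        result.accepted.append(record)
    return result
```

Only the rejected list needs a person's attention; accepted records flow on untouched, which is where the downstream savings come from.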

Data Storage: This second step is the act of taking the created data and moving it to a secure, organized, and well-governed repository. This step of the data lifecycle is where most of the foundational data management activities take place (integration, cleansing, data quality, etc.).

Achieving real-time visibility into business operations allows organizations to identify and act on opportunities and to address situations where improvements are needed. Real-time data ingestion to feed powerful analytics solutions demands the movement of high volumes of data from diverse sources, without impacting source systems and with sub-second latency. Traditional batch methods of moving data introduce unwelcome delays: by the time the data is collected and delivered, it is already out of date and cannot support real-time operational decision-making. Real-time data ingestion is therefore a critical step in collecting and delivering high-velocity data, in a wide range of formats, within the timeframe organizations need to get the most value from it. Sophisticated data platforms support real-time ingestion from sources that include databases, log files, sensors, and message queues, and delivery to targets that include cloud platforms, transactional databases, and messaging systems. Using non-intrusive Change Data Capture (CDC), these solutions read new transactions from the source database’s transaction log and move the changed data without impacting the database workload.
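The idea behind CDC can be sketched in a few lines of Python. This is a simplified, in-memory illustration under stated assumptions, not the API of any CDC product: the ChangeEvent and Replica names are hypothetical, and the list of events stands in for entries a real tool would read from the database’s transaction log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change. The shape of this event is an assumption for illustration."""
    position: int               # increasing position of the change in the log
    operation: str              # "insert", "update", or "delete"
    key: str                    # primary key of the affected row
    row: Optional[dict] = None  # new row contents; not needed for deletes


class Replica:
    """A target that is kept in sync by applying only the changes, never full reloads."""

    def __init__(self):
        self.rows = {}
        self.last_position = 0  # checkpoint: highest log position applied so far

    def apply(self, events) -> int:
        """Apply events in log order and return how many were newly applied."""
        applied = 0
        for event in sorted(events, key=lambda e: e.position):
            if event.position <= self.last_position:
                continue  # already applied, so redelivered events are harmless
            if event.operation in ("insert", "update"):
                self.rows[event.key] = dict(event.row or {})
            elif event.operation == "delete":
                self.rows.pop(event.key, None)
            else:
                raise ValueError(f"unknown operation: {event.operation}")
            self.last_position = event.position
            applied += 1
        return applied
```

Because the target tracks the last position it applied, the same batch of events can be delivered twice without corrupting the copy, and the source database is never queried for a full extract.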

Lambda architectures speed ingestion by routing data through a speed layer directly into the serving layer. Beyond introducing them, we must also look at the supporting functions that verify and maintain data quality as the data moves through such a rapid-use system. Automating data quality checks within the ingestion process ensures that the architecture delivers data that is not only fast but also verified and reliable.

Data Use/Usage: At this phase, data is used to support the activities and strategic initiatives of the organization. This is largely known as the consumption phase, where data is turned into information, and information creates actionable intelligence.

Business Intelligence (BI) has evolved from traditional methods, which required manual interpretation and technical expertise to analyze data, into a self-service, automated approach that generates in-depth analysis from complex datasets. Modern self-service BI tools powered by augmented analytics have user-friendly interfaces that enable business users without technical or analytical skills to derive valuable insights from data in real time. These tools can handle large sets of data from multiple sources quickly and efficiently. Machine Learning (ML) has improved the speed, reliability, and ultimately the value of the most commonly used BI and analytics solutions, which can now draw on immense volumes of disparate data at close to the speed of thought. Augmented data analytics is the use of ML and Natural Language Processing (NLP) to enhance BI, data analytics, and data sharing. This extends the value of an organization’s data foundation and provides near-real-time business information. Traditional decision support systems struggled to process data in a timely manner, whereas today petabytes are processed faster than ever, offering quicker time to value.

Data Archive/Purge/Destruction: Some represent this as a single phase and others as two distinct phases (Archival and Destruction). Archiving is the removal of data sets from active data repositories (e.g., production) to “cold” storage that requires less maintenance. Purging is the actual destruction of the data, carried out from time to time based on retention requirements and organizational preferences. At this point, it is important to weigh the cost of storage against the value we can still derive from archived data: is it necessary to destroy the data at all, and if so, at what point does it become either no longer useful or too costly to keep?

When it comes to automation at this phase, data retention regulations must be taken into account whenever data is archived or purged. Retention must now also be considered from a data privacy viewpoint (GDPR/CCPA), including whether detailed data can be retained or archived at all. Even so, once the business rules are defined and entered into the archiving architecture, the process will run as defined unless an error occurs that requires human involvement.
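A rule-driven archive and purge decision might look like the following sketch. The data categories and retention periods are hypothetical placeholders; actual periods must come from the regulations and policies that apply to the organization.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    archive_after_days: int  # age at which data moves to cold storage
    purge_after_days: int    # age at which data is destroyed


# Hypothetical categories and periods, for illustration only. Real values
# are set by legal, compliance, and business requirements.
RULES = {
    "invoice": RetentionRule(archive_after_days=365, purge_after_days=7 * 365),
    "web_log": RetentionRule(archive_after_days=30, purge_after_days=90),
}


def decide(category: str, created: date, today: date) -> str:
    """Return "keep", "archive", "purge", or "review" for one data set."""
    rule = RULES.get(category)
    if rule is None:
        return "review"  # no rule defined: escalate to a person instead of guessing
    age_days = (today - created).days
    if age_days >= rule.purge_after_days:
        return "purge"
    if age_days >= rule.archive_after_days:
        return "archive"
    return "keep"
```

For example, with today set to 1 January 2024, a web_log data set created on 1 November 2023 is 61 days old and would be archived, while one created on 1 September 2023 is 122 days old and would be purged. Anything in a category without a rule is routed to a person, which mirrors the "human involvement only on error" principle.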

Additional data lifecycle automation benefits

By incorporating data automation into each of the four phases, businesses can benefit directly from these modern processes. Human effort is freed up and limited to monitoring results or dealing with issues encountered during each phase. Automation also reduces the potential for the human error that arises when data is entered and handled manually. Naturally, savings in time and money go hand in hand with greater efficiency. This holistic approach to data automation gives the business a higher-quality product to use in the course of its work and reduces reliance on IT, allowing employees to focus on more critical tasks. Organizations that look at data as they do any other critical corporate asset or resource will be the most successful. Take the next step in your data modernization journey and embrace automation from start to finish.