Automation of the Data Lifecycle: Focus on Data Creation


A deeper look into how automation adds value at each phase of the data lifecycle and how automation at this level impacts the business (data) consumer.

In our previous article, “Improve Data Lifecycle Efficiency with Automation,” we discussed how and where automation takes place throughout the data lifecycle. We walked through each phase and summarized how automation has increased the speed and efficiency with which we identify, collect, integrate, and utilize information. In this piece and the ones to follow, we will take a deeper look into how automation adds value at each phase of the data lifecycle and how automation at this level impacts the business (data) consumer.

The first step in the data lifecycle is the creation of enterprise data. This data can be created organically by those internal to the organization (e.g., entering new records into an HR, payroll, operational, or transactional system), acquired from outside the organization, or captured autonomously with no human intervention (e.g., sensor data). Below we will focus primarily on autonomous data creation and how that data impacts its individual consumers (i.e., how automation at this phase of the data lifecycle provides value). We will look at the creation of this information from two perspectives: a purely automated view, and one where the data is input by human means and automation is used to validate the information at the capture point.

Automation and data creation

In the case of automated data creation, data is captured via machine and entered directly into a data repository/mart/warehouse, which can offer significant insights if effectively managed. An example of this automated data creation and capture rests comfortably in the palm of our hand: the mighty cellular telephone. Long before cell towers and voicemail, the telephone companies were capturing information from switches to determine who made a call, who received it, its duration, and ultimately what to charge for it. This was then printed on our monthly phone bill, which most of us dutifully paid. There was no human intervention at this point, either in the creation or the collection of the information. Automation in the gathering of information has advanced, as have the tools we now use to collect it. Today, cellular towers, oil rigs, and pipelines can be fitted with sensors that provide real-time information that, when collected and analyzed, can deliver near-immediate value. It is this value that we will focus on over the next few paragraphs. Alternatively, data can now be captured at its origin, analytics performed there, and only the resultant data moved to the data mart/warehouse. Processing data at the edge reduces the cost and effort of ETL (Extract, Transform, and Load) prior to adding the data to the warehouse/mart.
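To make the edge-processing idea concrete, here is a minimal sketch of aggregating raw sensor readings locally so that only a compact summary (rather than every sample) is shipped to the warehouse. The function name, window size, and sample values are our own illustrative assumptions, not part of any particular product.

```python
from statistics import mean

def summarize_readings(readings, window=60):
    """Aggregate raw sensor readings at the edge; only the summary
    statistics (not every individual sample) move downstream."""
    summaries = []
    for start in range(0, len(readings), window):
        batch = readings[start:start + window]
        summaries.append({
            "count": len(batch),
            "mean": round(mean(batch), 2),
            "min": min(batch),
            "max": max(batch),
        })
    return summaries

# 120 simulated temperature samples collapse into 2 summary rows.
raw = [20.1, 20.3, 19.8, 20.0] * 30
print(summarize_readings(raw, window=60))
```

In this toy example, 120 raw samples become two small records, which is the essence of the reduced ETL burden described above.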

Regardless of whether the raw data or the analysis results are moved to the data store, these automated creation, capture, storage, and analysis capabilities provide opportunities to leverage critical, real-time data across multiple business use cases to improve organizational efficiency, streamline costs, increase profitability, and ensure the safety and health of employees. For example, Predictive Asset Maintenance and Asset Optimization have been significantly improved by connected devices that operate without manual intervention, helping identify and schedule preventive maintenance before an asset has to be taken out of service or replaced. The reduction of asset downtime and the identification of potentially dangerous failures provide significant value to the organization, its reputation, and its bottom line (think of a catastrophic asset failure on an offshore oil rig that could be predicted and repaired proactively). Automation of this type is not limited to data coming from sensors. The use of drones and image analysis is gaining significant traction across multiple business verticals, automating and enhancing the human ability to analyze and predict potential issues with organizational assets more efficiently. Cellular towers and cargo ships (among other things) can be inspected by drones that provide tens of thousands of images for analysis, either to determine whether issues should be resolved before a system failure or to locate the failure once one has occurred. Drones are used to detect crop damage or infestation on farms, where again the images are analyzed by AI rather than humans to look for possible problems. Further, the use of new, advanced sensors (infrared, radiation, heat, and water) is taking root, and these IoT devices can be placed in areas that are unsuitable for humans.
These advanced devices provide vital data on asset failure and adherence to standard operational norms, and they support the tracking of throughput, capacity, and the like. In these cases, we have created the data without manual intervention and have been able to utilize it (in many cases through a process that also requires no human intervention) to resolve issues.
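The "adherence to standard operational norms" check above can be sketched very simply: compare each reading against an operational band and flag anything outside it for preventive maintenance. The thresholds and vibration values below are hypothetical placeholders, not real asset specifications.

```python
def flag_out_of_norm(readings, low, high):
    """Flag sensor readings that fall outside the standard operational
    band, so maintenance can be scheduled before the asset fails."""
    return [(i, r) for i, r in enumerate(readings) if not (low <= r <= high)]

# Hypothetical vibration levels; anything above 2.0 warrants inspection.
vibration = [0.8, 0.9, 1.1, 3.7, 0.9, 4.2]
alerts = flag_out_of_norm(vibration, low=0.0, high=2.0)
print(alerts)  # → [(3, 3.7), (5, 4.2)]
```

Real predictive-maintenance systems use statistical or machine-learned models rather than fixed thresholds, but the flow (capture, compare to norms, alert) is the same.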

Then there are cases where a combination of automated and manual processes is used to collect data. We are all familiar with the use of scanners at the corner store or supermarket to read bar codes and determine not only the product but its price, rather than having the cashier enter the price of each item into the register (you may not remember those days, but we certainly do). In this case, automation is used to reduce or eliminate the entry of incorrect pricing information, which could result in either an overpayment or an underpayment, assuming the pricing information was entered correctly in the first place.
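At its core, the checkout scanner is a lookup: the scanned code keys into a centrally maintained price table, and an unknown code is rejected rather than left for the cashier to guess. A minimal sketch, with a hypothetical price table and made-up barcodes:

```python
# Hypothetical price book maintained upstream of the register.
PRICE_BOOK = {
    "012345678905": ("Milk 1L", 1.89),
    "036000291452": ("Bread", 2.49),
}

def scan_item(barcode):
    """Resolve a scanned barcode to (product, price); reject unknown
    codes instead of allowing a manual price entry."""
    if barcode not in PRICE_BOOK:
        raise KeyError(f"Unknown barcode: {barcode}")
    return PRICE_BOOK[barcode]

print(scan_item("012345678905"))  # → ('Milk 1L', 1.89)
```

Note how the design moves the error surface: mistyped prices at the register disappear, and the remaining risk is concentrated in the single, auditable price table, the caveat the paragraph above points out.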

A good example is an Amazon warehouse. When items are placed into the warehouse, the number of items and the storage location are scanned, entering location and quantity information into the organization’s internal system. The same is true when items are picked: a picker is told to go to a specific location and pull X number of items from that location. Both the location and the items are again scanned as the items are placed into a bin. While there is no manual entry of data, the system checks the validity of the location and the scanned items to ensure that they match the instruction and that both were scanned correctly (valid location numbers and SKUs). Items are scanned once more prior to packing, as is the packing slip, to ensure that all the necessary items have been packed into the correct box. While some possibility of process error remains, the possibility of entering incorrect data has been eliminated by the scanner. The result in both cases is a decrease in the manual intervention needed to enter vital data into the transactional system, yielding a faster process with more accurate data capture, which in turn supports prediction as well as planning, forecasting, and projecting.
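The pick-validation step described above can be sketched as a simple comparison of the scanned values against the pick instruction; the pick is accepted only when location, item, and quantity all match. The field names and identifiers below are illustrative, not Amazon's actual data model.

```python
def validate_pick(expected, scanned_location, scanned_sku, scanned_qty):
    """Compare a scanned pick against the pick instruction; both the
    location and the item must match before the pick is accepted."""
    errors = []
    if scanned_location != expected["location"]:
        errors.append("wrong location")
    if scanned_sku != expected["sku"]:
        errors.append("wrong item")
    if scanned_qty != expected["qty"]:
        errors.append("wrong quantity")
    return (len(errors) == 0, errors)

pick = {"location": "A-12-3", "sku": "SKU-4471", "qty": 2}
print(validate_pick(pick, "A-12-3", "SKU-4471", 2))  # → (True, [])
print(validate_pick(pick, "A-12-4", "SKU-4471", 2))  # → (False, ['wrong location'])
```

The same check, run again at packing against the packing slip, gives the double verification the paragraph describes.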

The value of automation as part of the data collection cycle will continue to evolve, as will the technology. Collection devices will progress from autonomous (working on their own) to connected (as part of the IoT) to coordinated (acting in concert with one another with minimal or no human intervention). As more information becomes available from these types of data collection devices, organizations will be able to better control the resources they have available and become more productive. As an example, consider agriculture, where autonomous, connected, and coordinated devices manage crop production, irrigation, ploughing, etc., to improve crop yield.

It is important to note, however, that while we can use automation in the collection of data, we must still take the time to verify that data for completeness and accuracy rather than blindly relying on it. Standard data-checking processes, as well as any business rules, should still be applied; but because the manual entry step has been eliminated, experience has shown that the quality of the data is markedly better. Of course, we also look at automated processes for the validation and cleansing of data once it has been collected by any means, again decreasing the time it takes to gain access to the data and increasing its validity and value, so long as the information used to cleanse and validate the data is itself accurate. But that is another article.
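The business-rule checks mentioned above can be expressed as a small, declarative rule table applied to each collected record; a record passes only when every rule holds. The rule names and the call-record fields are hypothetical, chosen to echo the phone-bill example earlier in the article.

```python
def validate_record(record, rules):
    """Apply named business rules to a collected record; return the
    names of the rules that failed (an empty list means it passes)."""
    return [name for name, check in rules.items() if not check(record)]

# Hypothetical rules for an automatically captured call record.
rules = {
    "duration_positive": lambda r: r["duration_sec"] > 0,
    "caller_present":    lambda r: bool(r.get("caller")),
}

call = {"caller": "555-0100", "duration_sec": -4}
print(validate_record(call, rules))  # → ['duration_positive']
```

Keeping the rules in a table like this makes it easy to add or retire checks as the business rules evolve, without touching the validation engine itself.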


About Scott Schlesinger and Aaron Gavzy

Scott Schlesinger is a data, analytics, and AI professional with over two decades of experience helping client organizations make faster and more informed decisions leveraging business intelligence, analytics, AI, and data management technologies. Mr. Schlesinger is a digital strategist, innovator, and people leader with demonstrated success in building and leading large consulting practices as a senior executive/Partner within the Big 4 and global consulting firms/system integrators. Aaron Gavzy is a Lead Data and Business Strategist focusing on Global Data, Analytics, AI, and Advisory Services. He has over 35 years of demonstrated experience in the development and delivery of innovative strategic directives for solving business and tactical issues across a variety of industries. He is a recognized thought leader in the areas of Data Strategy/Governance/Privacy as well as Analytics and Organizational Change. A speaker and author who has led consulting practices at large multi-national consulting firms, he has also served as the CIO of a Health Care firm and the CFO of an Advertising Agency.
