A recent Eckerson Group event for CDOs focused on how to evaluate, select, and implement data pipelines and products.
The growth in data driven by digitalization and other efforts, combined with the need to gain actionable insights from that data, is placing a new focus on data pipelines. The demand for access to what are often very high volumes of complex streaming data is outpacing the ability of IT and data engineering departments to provide that access. As such, businesses require modern, intelligent data pipelines that automate many of the processes across the data lifecycle, from ingestion to final analysis.
These issues were the focus of a recent Eckerson Group event, “CDO TechVent for Modern Data Pipelines: Practices and Products You Need to Know.” The event overview notes that “as enterprises democratize data consumption and invest in advanced analytics, they need ever-higher volumes of complex, fast-moving data. To meet this demand, data teams need to accelerate the development of data pipelines, automate their execution, and continuously validate the output quality. And along the way, they need to master the data lifecycle, from ingestion and transformation to testing, orchestration, and monitoring.”
The three-hour event featured speakers from various companies and organizations working to address data pipeline issues. As the title notes, it was aimed at chief data officers (CDOs), with the goal of helping them evaluate and select data pipeline products and learn best practices for implementing them. Here is a summary of the sessions:
Building Data Products on Snowflake using DataOps
Speakers: Guy Adams, Co-Founder & CTO, DataOps.live, and Mark Bartlo, Sr. Sales Engineer, DataOps.live
This talk focused on a challenge that many face when building data products. For any data product effort to have a real impact, businesses must use the right methodology and tools to build and manage those products.
The speakers noted that businesses must bring the best aspects of DevOps to data, an approach many are calling DataOps. By adopting such a methodology and using the right platform, businesses can improve developer productivity without compromising agility and governance.
The speakers then led the audience through the journey from automated development, orchestration, observability, and deployment to effective lifecycle management of data products. Throughout the talk, they noted how business stakeholders could benefit from this modern approach. They concluded the session by profiling how a major pharmaceutical research company was able to scale out to 50 data products in less than 18 months.
Data Product Based Design Pattern for Data Integrations
Speaker: Avinash Shahdadpuri, Co-Founder & CTO, Nexla
This session addressed the need to take a product-centric approach to data. The right approach can simplify how every data management task is done. The speaker noted that data integration is at the heart of most data product efforts. As such, businesses need a comprehensive data integration design pattern when creating and consuming data products. The speaker then explained how logical data products extend this design pattern to enable multi-speed data integration.
Liberate Your Enterprise Data for Cloud Analytics
Speaker: Mike Pickett, VP of Growth, StreamSets
This session noted that while cloud platforms have revolutionized the world of analytics, many companies still face challenges in transferring their most valuable and comprehensive data. The reason: the data is stored in enterprise systems and must now be moved to cloud data environments for analysis. The speaker covered how businesses can overcome typical obstacles to freeing up enterprise data, and then discussed how, once that is done, integrating this data can improve analytics, refine financial and regulatory reporting, streamline operations, and enhance the customer experience.
The Best Data Pipelines are the Ones You Never Build
Speaker: Mark Van de Wiel, Field CTO, Fivetran
This session examined how cloud computing commoditized access to data center resources. The speaker noted that, among its many benefits, the cloud introduced businesses to nearly infinite scalability at the click of a button under a pay-as-you-go scheme and provided access to ready-built machine learning routines.
To some, this seems to imply that all you have to do is bring your data. But the speaker cautioned that this is not as easy as it sounds.
To Code or Not to Code ELT Pipelines: That is the Question!
Speaker: Elesh Mistry, Lead Solutions Engineer, Rivery
Okay, in the world of typically sedate session titles, let’s give the speaker props for making this one entertaining and engaging.
The speaker discussed whether it is ridiculous to pay for a SaaS ETL/ELT solution when you can script a data pipeline yourself. The session unpacked the pros and cons of coding your own data pipelines, considered the costs of the different alternatives, and then provided clear guidelines for when businesses should or should not code their data pipelines.