Queryable Dataflows will significantly reduce the manual overhead currently associated with managing DataOps.
One of the things that most conspires to make IT far less agile than anyone wants is the time and effort that goes into manually constructing data pipelines. Not only is it a challenge to construct those pipelines in the first place, but deciphering how they work and how they have been implemented can also take days or even weeks.
To make it simpler to modify existing data pipelines, Ascend is now providing a technical preview of a tool that allows IT teams to launch queries directly against data pipelines constructed using the company’s recently launched Autonomous Dataflow Service. That service makes it simpler to create pipelines using declarative tools on instances of the Apache Spark in-memory computing framework hosted on public clouds provided by Amazon Web Services (AWS), Microsoft, or Google.
Dubbed Queryable Dataflows, this new capability will significantly reduce the manual overhead currently associated with managing DataOps, says Ascend CEO Sean Knapp.
Queryable Dataflows makes it possible for DataOps teams to explore and profile large raw datasets incrementally as they build pipelines. That capability not only makes it simpler and faster to construct new pipelines, but it can also be employed to ensure results are accurate before data is exposed to downstream applications.
Pipelines are now able to handle staging and exploration in a way that offloads those activities from the data warehouse. In addition, interactive queries can immediately be moved into production as stages within the Autonomous Dataflow Service, eliminating the need for recoding and reprocessing.
Queryable Dataflows also helps optimize operational analytics and reporting. Data analysts and data scientists can connect directly to pipeline stages without first having to load data into a warehouse, which means they can employ their preferred tools to access data.
Previous generations of tools for managing data pipelines are, by comparison, little more than glorified job schedulers, says Knapp.
DataOps as an IT discipline is under increased pressure because the rate of change within IT environments has accelerated. Thanks in part to the rise of DevOps processes and microservices, the rate at which data pipelines need to be optimized or updated has increased dramatically. Manual updates to data pipelines will create some very predictable friction points between DataOps and DevOps, says Knapp.
“DataOps and DevOps need to be harmonized,” says Knapp.
In fact, that lack of harmony is what often leads developers to try to make an end run around the internal IT team by employing an open source database to build an application, only to discover over time that the database doesn’t scale to meet the demands of the application. Then they wind up having to reengage with internal IT teams to move that application onto a platform such as Apache Spark or a different database.
It’s not clear to what degree DataOps and DevOps might one day converge. For now, both areas are likely to remain domains unto themselves. However, it’s also abundantly clear that DataOps processes need to improve to the point where the construction of data pipelines is no longer a major bottleneck holding back the modernization of IT.