
Whether Ghostbusting or Analyzing Data: Cross the Streams


Harnessing data streams — joining both batch and real-time events — empowers data scientists and analysts to address sophisticated problems.

Written By
Pete Goddard
Oct 28, 2021

Individual streams provide data related to a particular dimension: the price of a stock, a customer’s order, a device’s metrics. A single stream of data can serve analytics and applications, but its uses are narrow and local.

Crossing streams unveils grander possibilities, ones filled with history, context, and related signals. When our Ghostbusters heroes (Venkman and the gang) needed to rise to the challenge and defeat the Stay Puft Marshmallow Man, they joined forces (and streams)! The whole was greater than the sum of the parts.

In our community, data scientists, analysts, and developers are similarly called to action. Harnessing data streams, joining both batch and real-time events, empowers you to address sophisticated problems. And, as with Venkman, sometimes you need others to bring their gear and help. Here are four components vital to crossing streams successfully:

1) Bring together data, use cases, and people.

Accelerating innovation, maximizing efficiency, and providing flexibility are established priorities for sophisticated data systems. A nimble, evolving software backbone realizes these goals. Open-source core components provide the long-term agility and interoperability paramount for success.

Tools evolve, and sometimes you need to use that new ghost trap.

2) Future-proof your data stack with open-source formats.

Data portability has long been a sacred requirement for enterprise data teams. Walled gardens create future debt, and vendor lock-in has an unspoken long-term cost, one often paid in business drag. Store data using open formats.

CSV and JSON have been popular for years, with Avro, Protocol Buffers, Parquet, ORC, and others more recently gaining ground. Each has its own reason to exist, but all are designed to deliver structured data to many independent systems without assuming anything about the processing that happens downstream.

As data volumes have grown and the financial and latency costs of moving data have compounded, the concept of open data has expanded to include in-memory formats, not just those persisted on disk. It is now often unacceptable to require data to be copied, moved, serialized, or translated in any way. Apache Arrow, in particular, has built a significant community on its ability to serve in-memory data to a range of data processing libraries across many languages with minimal overhead, zero-copy reads, and fast access at scale.
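To make the zero-copy idea concrete, here is a minimal, stdlib-only Python sketch. It does not use Apache Arrow or its API; it uses the built-in `array` and `memoryview` types as a stand-in to show what it means for a consumer to read a contiguous column buffer without copying it. The `prices` column and its values are hypothetical.

```python
import array

# A hypothetical column of prices held in one contiguous buffer,
# similar to how a columnar in-memory format lays out a column.
prices = array.array("d", [101.5, 102.0, 99.75, 100.25])

# memoryview exposes that buffer to a consumer without copying it.
view = memoryview(prices)

# Slicing a memoryview is also zero-copy: it references a window of the same buffer.
window = view[1:3]

# By contrast, list() materializes an independent copy of the values.
copied = list(prices)

# Mutate the original buffer in place (allowed while views exist, because its size does not change).
prices[1] = 105.0

print(window.tolist())  # [105.0, 99.75]: the view sees the change, so no copy was made
print(copied[1])        # 102.0: the copy is detached from the original buffer
```

The same principle, applied across libraries and languages through a standardized columnar layout, is broadly what lets Arrow-based tools hand data to one another without re-serializing it.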

But let’s remember, in Ghostbusters, the data was just the start of the adventure. 

3) Make joining real-time and static data a fundamental requirement.

A modern data engine must bring together data from a variety of sources. The jargon of warehouses, lakes, and the centaur-like lakehouse is now common imagery. However, the growing popularity of event streams is a not-so-quiet canary suggesting that static data is no longer the whole story.

Data changes. Modern workloads live in a state of flux. Real-time data matters.

Data engines and processing libraries must be architected to address and move fluidly between real-time and static data workloads. “Continuous intelligence” is a trendy phrase for systems that combine the context of history with the event signals of the moment. Modern data systems should be built to process real-time data, event streams, and other updates as a first-class competency. These should be core strengths, not add-ons, not afterthoughts.
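As an illustration of treating the stream-static join as a first-class operation, the following minimal Python sketch enriches each live event with static historical context as it arrives and updates running state incrementally. It is not any particular engine’s API; the `Tick` type, the `HISTORICAL_AVG` table, and the `live_feed` generator are hypothetical stand-ins for a batch table and an event stream.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Tick:
    symbol: str
    price: float


# Hypothetical static ("batch") context: historical average price per symbol.
HISTORICAL_AVG: Dict[str, float] = {"ABC": 100.0, "XYZ": 50.0}


def enrich(ticks: Iterable[Tick], history: Dict[str, float]) -> Iterator[dict]:
    """Join each live tick to its static history as it arrives (a stream-static join)."""
    session_high: Dict[str, float] = {}  # real-time state, updated per event
    for tick in ticks:
        avg = history.get(tick.symbol)
        if avg is None:
            continue  # inner-join semantics: drop ticks with no historical context
        session_high[tick.symbol] = max(session_high.get(tick.symbol, tick.price), tick.price)
        yield {
            "symbol": tick.symbol,
            "price": tick.price,
            # Context of history combined with the signal of the moment.
            "vs_history_pct": round(100.0 * (tick.price - avg) / avg, 2),
            "session_high": session_high[tick.symbol],
        }


def live_feed() -> Iterator[Tick]:
    # Hypothetical stand-in for a real event stream (e.g. a message-queue consumer).
    yield Tick("ABC", 102.0)
    yield Tick("XYZ", 49.0)
    yield Tick("QQQ", 10.0)  # no history, so the join filters it out
    yield Tick("ABC", 101.0)


for row in enrich(live_feed(), HISTORICAL_AVG):
    print(row)
```

A production engine would also need to handle updates to the static side, late or out-of-order events, and time-based (as-of) joins; the sketch keeps only the core pattern of joining history to the event of the moment.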

After all, as we learned in Ghostbusters, the Gatekeeper and the Keymaster are a lot less powerful until they are joined together.

4) Always put the user first.

Today’s data users have a variety of skills, tools, workflows, and priorities. Coalescing a team around a shared platform serves the individual while energizing the team. Data systems that maximize individuals’ efficiency and foster collaboration drive business value.

Open data software lights the way. The intriguing mix of cooperation and competition in open projects yields an unrivaled pace of progress and ingenuity. Because these projects are organized to encourage interoperability, community development promises enhancements, integrations, and user experience upgrades. Popular paths become paved roads. Such systems make each user an army of one while supporting the interdependent work required for any even moderately complex use case.

After all, one proton pack is powerful, but four working together is invincible.

I ain’t ’fraid of no ghost.

Pete Goddard

Pete Goddard is the CEO and co-founder of Deephaven Data Labs, a data company building software for modern data teams. After founding quantitative trading company Walleye Capital in 2005, Pete and his engineering team were searching for ways to help quants, data scientists, developers, and portfolio managers discover and evolve strategies and signals more quickly. After witnessing how Walleye benefited from the solution they built, Pete took those engineers, the data system, and its related IP out of Walleye and formed Deephaven as an independent company.


Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved
