How Do You Integrate Real-Time Data? Informatica’s Approach


A key challenge for real-time analytics is simply getting a handle on all the data.

In a modern business, real-time data might be flowing from different devices or processes—machine sensors, application and IT logs, web clickstreams, or gateways. It might then move to a relational database, the cloud, disk storage, a Hadoop distribution, or a data warehouse.

Integrating that data is crucial for real-time analytics. In predictive maintenance, for example, real-time data from a variety of machine sensors must be compared against an analytic model built on historical data. Or consider healthcare. A heart monitor, a blood pressure monitor, and oxygen monitors might be streaming real-time data from an ill patient; then there are lab results from medical equipment.

“Having the ability to take data from many individual or siloed systems, combining the datasets together with other sources such as patient history and previous lab results, and generating a more complex or complete picture of what is going on with the patient – that is a very powerful example of why data integration is so important,” said Rodrigo Sanchez Bredee, senior director of IoT product management at Informatica.

How Informatica Gets Data to Vibe

Informatica has been named as a leader in Gartner’s Magic Quadrant for data integration for ten consecutive years, for data quality for nine consecutive years, and for master data management for six consecutive years. It wasn’t until 2013, however, that Informatica launched its real-time data integration platform, Vibe Data Stream.

How does Informatica integrate real-time data? A key element of Vibe Data Stream is what Informatica calls a “brokerless” ultra-messaging (UM) technology, which uses a publish-subscribe model. The agents are Java Virtual Machine nodes that can be deployed on devices and subscribe to data from connections such as TCP, UDP, Syslog, SNMP, MQTT, and others.

The use of agents in VDS offers a “nothing in the middle data hop,” according to Informatica.

Agents talk to each other using a control protocol. Doing without a broker reduces bottlenecks when scaling out, and “fewer hops along the way reduces latency,” Bredee said. Better reliability and more efficient network usage are also benefits, he said.
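To make the brokerless idea concrete, here is a minimal in-process sketch in Python. It is not Informatica's API: the Agent class, its method names, and the topic strings are all hypothetical, and a direct method call stands in for the control protocol and the network. The point it illustrates is only that the publishing agent hands each message straight to its subscribers, with no intermediate broker hop.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, bytes], None]


class Agent:
    """Toy agent (hypothetical): delivers messages directly to subscribed peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        # topic -> handlers of peers that registered interest with this agent
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self.received: List[Tuple[str, bytes]] = []

    def subscribe_to(self, source: "Agent", topic: str) -> None:
        # Stand-in for a control protocol: tell the source we want this topic.
        source._subscribers[topic].append(self._on_message)

    def _on_message(self, topic: str, payload: bytes) -> None:
        self.received.append((topic, payload))

    def publish(self, topic: str, payload: bytes) -> int:
        # One hop: the source calls each subscriber itself, no broker in between.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


# Usage: a source agent publishes a log line to one subscribed target agent.
sensor = Agent("sensor-gateway")
target = Agent("kafka-loader")
target.subscribe_to(sensor, "syslog")
delivered = sensor.publish("syslog", b"disk nearly full")
```

In a broker-based design the same message would travel from the source to the broker and then from the broker to the target, which is the extra hop Bredee refers to.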

The agents can then stream millions of records per second to big data platforms and targets such as Apache Kafka (a connector allows Spark Streaming), Hadoop, Cassandra, and complex-event processing platforms, such as Informatica RulePoint.

A graphical user interface on VDS allows customers to visually map source-to-target patterns and configure messaging, including some basic transformations such as filtering and time-stamping.

The VDS graphical user interface allows point-and-drag data connections
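The article does not show how VDS implements these transformations, so the following is only a rough Python sketch of what filtering and time-stamping a stream of records might look like. The function names, the record fields, and the "ingested_at" key are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]


def keep_if(records: Iterable[Record],
            predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Filtering: pass along only the records that satisfy the predicate."""
    for record in records:
        if predicate(record):
            yield record


def add_timestamp(records: Iterable[Record],
                  clock: Callable[[], float] = time.time) -> Iterator[Record]:
    """Time-stamping: attach the time at which each record was seen."""
    for record in records:
        yield {**record, "ingested_at": clock()}


# Usage: keep only error-level log records, then stamp them.
# A fixed clock is injected here so the result is reproducible.
raw = [
    {"level": "INFO", "msg": "ok"},
    {"level": "ERROR", "msg": "pump offline"},
]
errors_only = keep_if(raw, lambda r: r["level"] == "ERROR")
stamped = list(add_timestamp(errors_only, clock=lambda: 1700000000.0))
```

Because both steps are generators, records flow through one at a time rather than being collected into batches, which suits a streaming setting.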

Use Cases for Real-Time Data Integration

One client of Vibe Data Stream is ConocoPhillips, which is connecting VDS to obtain information from its numerous oil and gas well sites in the United States. Normally, oil companies decide which regions to pump on a quarterly or monthly basis. The vision is to allow ConocoPhillips to make those decisions daily or even more frequently, based on parameters such as weather conditions, workforce availability, and demand and pricing for different petroleum products.

Under the architecture, Vibe Data Stream agents will listen to the database used by the well sites to capture and analyze changes as they happen.

VDS also supports condition-based maintenance programs for oil and gas pipelines. Pipelines are normally inspected and maintained on a calendar schedule, but they can now be inspected and maintained based on their actual condition, with the operational cycle extended or shortened according to the data.

“What we’re trying to get to is a more specific condition maintenance where maybe you can extend some of the cycles and leverage the availability to pump a little bit more or, the opposite, pull in or shorten the cycle between cleanings. Ultimately we want to give our customers a way of deriving insights that they can operationalize,” said Bredee.
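As a rough illustration of extending or shortening a cycle based on condition, the Python sketch below scales a baseline cleaning interval by a condition score. The scoring scale, the linear formula, and the 50 percent cap are assumptions chosen for the example, not anything Informatica or its customers have described.

```python
def next_cleaning_interval_days(baseline_days: int,
                                condition_score: float,
                                max_change: float = 0.5) -> int:
    """Scale a calendar-based interval by a pipeline condition score.

    condition_score is a hypothetical value in [0, 1]:
    0 = worst condition, 0.5 = as expected, 1 = best condition.
    """
    if not 0.0 <= condition_score <= 1.0:
        raise ValueError("condition_score must be between 0 and 1")
    # Map the score linearly onto a factor in [1 - max_change, 1 + max_change].
    factor = 1.0 + max_change * (2.0 * condition_score - 1.0)
    return max(1, round(baseline_days * factor))


# Usage: a 90-day calendar cycle, adjusted for three different conditions.
as_expected = next_cleaning_interval_days(90, 0.5)  # cycle unchanged
healthy = next_cleaning_interval_days(90, 1.0)      # cycle extended
degraded = next_cleaning_interval_days(90, 0.0)     # cycle shortened
```

In practice the score would come from streamed sensor data compared against a model built on historical data, which is the integration problem the article describes.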

Another use case is pricing optimization at gasoline stations. While gasoline companies normally had to wait a day to change prices, regulations in India were recently changed to allow intra-day price changes. Informatica could not disclose the client but said Vibe Data Stream will be embedded into a local gateway that will inform a gasoline company of the demand for and use of different grades of gasoline at its stations.

“We’re actually going to employ tens of thousands of agents, collecting data from these gas pumps essentially and moving it to where the client can make pricing and supply decisions,” said Bredee.


While data is being moved, agents can also be configured to filter, parse, and enrich that data. One example involves cellular tower data. Vibe Data Stream will collect and filter call detail record (CDR) and log data from a cellular tower, adding metadata such as location, then move it to Informatica PowerCenter Real Time, which is an enterprise data integration platform. The result is that a cell-phone carrier can identify high-value customers who had negative experiences such as dropped calls, then offer those customers something to compensate for the poor experience, such as a coupon.
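The parse, filter, and enrich steps in the cellular example can be sketched in a few lines of Python. Everything specific here is assumed for illustration: the comma-separated record layout, the tower-to-location table, the "DROPPED" status value, and the set of high-value callers. Real CDR formats vary by carrier and are considerably richer.

```python
from typing import Dict, Iterable, Iterator, Set

# Hypothetical lookup table used to enrich records with tower location.
TOWER_LOCATIONS = {"T-001": "Austin, TX", "T-002": "Denver, CO"}


def parse_cdr(line: str) -> Dict[str, object]:
    """Parse an assumed 'caller,tower,status,duration_s' record."""
    caller, tower, status, duration = line.strip().split(",")
    return {"caller": caller, "tower": tower,
            "status": status, "duration_s": int(duration)}


def enrich(record: Dict[str, object],
           locations: Dict[str, str]) -> Dict[str, object]:
    """Add location metadata based on the tower that handled the call."""
    return {**record, "location": locations.get(str(record["tower"]), "unknown")}


def dropped_calls_for(lines: Iterable[str], high_value: Set[str],
                      locations: Dict[str, str]) -> Iterator[Dict[str, object]]:
    """Keep only dropped calls made by high-value customers, enriched."""
    for line in lines:
        record = parse_cdr(line)
        if record["status"] == "DROPPED" and record["caller"] in high_value:
            yield enrich(record, locations)


# Usage: three raw records, one of which is a dropped call by a high-value caller.
cdr_lines = [
    "555-0100,T-001,DROPPED,12",
    "555-0101,T-002,COMPLETED,340",
    "555-0102,T-002,DROPPED,5",
]
to_compensate = list(dropped_calls_for(cdr_lines, {"555-0100"}, TOWER_LOCATIONS))
```

The filtered, enriched records are what a downstream platform would then use to decide which customers receive an offer.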

Looking ahead, Bredee said real-time data integration also has a use case with connected cars.

Suppose, for example, a car’s data collection system finds a huge pothole on a high-speed road. If the car is able to notify other cars behind it within a one-mile radius, it could prevent accidents.

“That is another great use case where time is of the essence and the volume and velocity of the data is high,” said Bredee.
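The one-mile-radius check in the pothole example comes down to a distance calculation on GPS coordinates. The Python sketch below uses the standard haversine formula; the function names, car identifiers, and coordinates are made up, and for simplicity it ignores direction of travel, so it selects every car within the radius rather than only those behind the reporting car.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def cars_to_notify(hazard: Tuple[float, float],
                   cars: Dict[str, Tuple[float, float]],
                   radius_miles: float = 1.0) -> List[str]:
    """Return the ids of cars located within radius_miles of the hazard."""
    return [car_id for car_id, (lat, lon) in cars.items()
            if distance_miles(hazard[0], hazard[1], lat, lon) <= radius_miles]


# Usage: one car roughly a third of a mile away, one several miles away.
pothole = (40.0, -105.0)
nearby_cars = {"car-a": (40.005, -105.0), "car-b": (40.05, -105.0)}
notified = cars_to_notify(pothole, nearby_cars)
```

The calculation itself is cheap; the hard part, as the quote suggests, is getting position and hazard data to the right place fast enough at high volume.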



Chris Raphael

Chris Raphael covers fast data technologies and business use cases for real-time analytics. Follow him on Twitter at raphaelc44.
