Video Interview: Data Streams, Convergence, and Fusion

In our road trip video, we talk with MapR, which explains its converged data streams technology, and Objectivity, which goes into its streaming IoT platform.

What’s the best approach for dealing with data streams? In our road trip video, we talked with MapR, which has developed a converged data streaming platform that enables applications such as fraud detection.

We also talked with Objectivity, which has developed a platform called “ThingSpan” for obtaining insights from IoT data streams:


Transcript

Adrian: With us today is Will Ochandarena, MapR’s director of products. I know that a lot of folks that go to the RTInsights site are familiar with the company, but maybe you can just give us a little bit of a background. Then we’ll get into new products.

Will: Sure. MapR is a data and analytics company. We’ve been around for about six years now. We started out primarily focused on Hadoop, but over time have bloomed into what we’re now calling the Converged Data Platform. When we think about a Converged Data Platform, it’s everything from a converged data layer, which includes file-based storage, NoSQL database, and now event streaming, and on top of that, processing services for everything from batch processing, to SQL, to real-time processing. Putting all that together in a way that is easy to build Big Data applications on top of …

Adrian: I understand that you’re making a pretty big push into streaming data right now. Maybe you can tell us a little bit about what you’re doing and what the impetus was for that.

Will: Sure. The new addition to the Converged Platform is event streaming, which we call MapR Streams. We describe MapR Streams as “the global publish-subscribe event streaming system for big data.” Deconstructing that a little bit: the publish-subscribe part gives you an idea of where it fits. It connects the producers of data to the consumers of data in real time through a reliable channel, and when combined with stream processing technologies like Spark Streaming, Flink, and Apex, you can build these real-time applications.
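
To make the publish-subscribe idea concrete, here is a minimal in-memory sketch in Python. It is not MapR Streams or its API; the `Topic` class and its method names are hypothetical, and a real system would add persistence, partitioning, and delivery guarantees. It shows only the core pattern: producers append to a shared log, and each consumer reads from its own position.

```python
from collections import defaultdict


class Topic:
    """Hypothetical append-only message log; each consumer keeps its own offset."""

    def __init__(self):
        self.log = []                    # messages in arrival order
        self.offsets = defaultdict(int)  # consumer name -> next index to read

    def publish(self, message):
        """Append a message and return its sequence number."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, consumer):
        """Return every message this consumer has not yet seen."""
        start = self.offsets[consumer]
        batch = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return batch


topic = Topic()
topic.publish({"card": "1234", "amount": 40})
topic.publish({"card": "1234", "amount": 900})
first_read = topic.poll("fraud-checker")   # both messages
second_read = topic.poll("fraud-checker")  # nothing new
audit_read = topic.poll("audit-log")       # an independent consumer sees both
```

Because each consumer tracks its own offset, adding a second consumer does not affect what the first one receives.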

Adrian: How do you distinguish yourself in this market as it gets to be at least crowded in terms of mind share, if not capabilities?

Will: Sure. Two big things. One is the idea of convergence. By building the stream processing into the same data layer that’s doing NoSQL database and file-based storage, we give customers the opportunity to build applications that draw on all of these services in a consistent way. For example, think of a fraud detection application for financial services. In that application, it’s not just about doing the stream processing, it’s about doing the stream processing based on user-specific data that was gleaned through machine learning. For that end-to-end application, there’s the component that needs to run machine learning on all of the historical data to build profiles for each individual user, storing that model in a NoSQL database so that, within the real-time streaming context, they can draw on the model data in the database and actually make decisions on whether a transaction is anomalous or fits that user’s profile. All these things put together lend themselves to the Converged story and make it easier to build these applications.
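
The fraud detection flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MapR's implementation: the "model" is just a per-user mean and standard deviation of past amounts, a plain dict stands in for the NoSQL database, and the function names and the threshold of 3 standard deviations are hypothetical choices.

```python
from statistics import mean, pstdev


def build_profiles(history):
    """Batch step: summarize each user's historical amounts as (mean, std)."""
    by_user = {}
    for user, amount in history:
        by_user.setdefault(user, []).append(amount)
    # The returned dict stands in for a profile table in a NoSQL database.
    return {user: (mean(a), pstdev(a)) for user, a in by_user.items()}


def is_anomalous(profiles, user, amount, threshold=3.0):
    """Streaming step: compare one incoming transaction with the stored profile."""
    if user not in profiles:
        return True  # assumption: unknown users are flagged for review
    avg, std = profiles[user]
    if std == 0:
        return amount != avg
    return abs(amount - avg) / std > threshold


history = [("alice", 20), ("alice", 25), ("alice", 30), ("bob", 400)]
profiles = build_profiles(history)
typical = is_anomalous(profiles, "alice", 27)    # close to Alice's usual spend
unusual = is_anomalous(profiles, "alice", 500)   # far outside her profile
```

The point of the sketch is the division of labor: the profile is built offline from historical data, and the real-time path only performs a cheap lookup and comparison.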

Adrian: When you say “converge,” you’re talking about convergence of streaming, and more static, if you will.

Will: Yeah. Convergence of all of the data services that these applications need, whether they’re batch-oriented, file-based data services, operational NoSQL database, or the stream processing. That convergence is one of the big differentiators of our platform, because it helps customers avoid building not just clusters, but data silos where some portion of the application data sits in one place and some sits in another. Building an end-to-end security policy across all of these systems is often very complex because each system has a different way of doing authentication and authorization. Some have security, some don’t, and of course, there’s all of the duct tape you may need to build an application that sits on top of multiple disparate data services. All of that’s avoided with this idea of convergence. The second big differentiator is how we deal with global data.

Like I mentioned earlier, it’s a global publish-subscribe system. What we mean by global is the ability to synchronize event data between multiple clusters that are distributed worldwide. Because, as we see more and more Internet of Things-type use cases, data isn’t always created in one location, it’s created by different endpoints worldwide, and there’s the balance between collecting close to the endpoint to optimize latency, and processing in one location to figure out what’s going on in aggregate worldwide. The special thing that we do there is, in addition to just copying over messages from one cluster to the other worldwide, we copy all of the metadata about those messages: When it came in, who listened to it already, what the sequence number is, allowing for failover between one cluster and another; so really, what we provide is the single platform you bring in that you can standardize on, and every application gets the services that they need.
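
To show why copying metadata along with the messages matters for failover, here is a small Python sketch. It is a toy model, not MapR's replication protocol: the `Cluster` class, the `replicate` function, and the field names are all hypothetical. The key idea is that a consumer's read position travels with the data, so it can resume on another cluster without rereading or skipping messages.

```python
import copy


class Cluster:
    """Hypothetical cluster holding messages plus metadata about them."""

    def __init__(self, name):
        self.name = name
        self.messages = []  # each entry: sequence number, arrival time, payload
        self.cursors = {}   # consumer name -> next sequence number to read

    def publish(self, payload, ts):
        seq = len(self.messages)
        self.messages.append({"seq": seq, "ts": ts, "payload": payload})
        return seq

    def consume(self, consumer):
        start = self.cursors.get(consumer, 0)
        batch = self.messages[start:]
        self.cursors[consumer] = len(self.messages)
        return [m["payload"] for m in batch]


def replicate(source, target):
    """Copy messages and consumer cursors, not just the payloads."""
    target.messages = copy.deepcopy(source.messages)
    target.cursors = dict(source.cursors)


tokyo, london = Cluster("tokyo"), Cluster("london")
tokyo.publish("reading-a", ts=1)
tokyo.publish("reading-b", ts=2)
seen_in_tokyo = tokyo.consume("dashboard")   # reads a and b
tokyo.publish("reading-c", ts=3)
replicate(tokyo, london)
after_failover = london.consume("dashboard")  # resumes at c only
```

If only the payloads were copied, the consumer would start from the beginning in London and process the first two readings twice.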

Objectivity

Adrian: Brian Clark with Objectivity, and you’ve been there for a long time. Objectivity is certainly one of those companies that’s stood the test of time. Tell me a little bit about what you’re up to today, and how we got to this schism. The last I talked to Objectivity was a while ago.

Brian: Today, we’re working in the areas of Big Data and fast data, particularly applied to the Internet of Things. With our object database technology, we’ve had customers dealing with Big Data problems and fast data problems even before they became buzzwords or buzz phrases. I’ve been with the company since almost the beginning, so about 25 years, and seen a lot of our customers deploy systems. If you haven’t guessed by now from the name “Objectivity,” our prime products are object databases. In that world, it’s all about the objects and relationships between the objects.

Another one of our products is called InfiniteGraph, so again, I guess you’ve guessed from the name that it’s a graph database. If you think about the nodes and edges of a graph, they’re no more than the objects and relationships in an object database. Our customers have been working in this area, mining the relationships between the data, for many years, in many different areas. In the defense and intelligence space, our customers would build what were then called data fusion applications, which, as the name suggests, involve collecting data from multiple sensors and fusing that data to give a common operating picture, so that the analysts, or the planners, can make decisions based on the real-time information.

The first-generation systems were very brittle, built in C++, with a fixed data model. The next generation, which we call sensor fusion, was much more dynamic, with a dynamic schema, written in Java, and basically using model-driven architectures. So, if you like, the underpinnings of the work we do in the Internet of Things, where we’re talking about a lot of industrial and commercial sensors, are based on the work we did with the military and intelligence agencies.

Adrian: Okay, and I understand you have something new in the works in terms of a product. How does that fit?

Brian: Yes, so building on our technology and experience using objects and relationships, we’ve announced a new product called ThingSpan. You can think of the name as spanning the Internet of Things.

Adrian: Okay.

Brian: Obviously, in that world, the things are the objects, and of course, there’s the relationships between the objects. ThingSpan is actually architected to work with Hadoop in the Hadoop environment, and particularly, leveraging Spark, Apache Spark. One of the nice things about what we’re doing is, not only do we leverage the Spark data frames for getting data in and out of the database, we’re also exposing our relationship information through data frames.
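
As a rough illustration of treating things as objects and exposing their relationships in tabular form, consider the Python sketch below. It does not use ThingSpan or Spark; the `Thing` class and `edge_rows` function are hypothetical, and a plain list of dicts stands in for a data frame. Each relationship becomes one row with a source, a relationship name, and a destination, which is the shape a tabular engine can filter and join.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    """Hypothetical IoT object that holds named relationships to other things."""
    name: str
    kind: str
    related: list = field(default_factory=list)  # (relationship, Thing) pairs

    def relate(self, relationship, other):
        self.related.append((relationship, other))


def edge_rows(things):
    """Flatten object relationships into rows, one per edge of the graph."""
    return [
        {"src": thing.name, "relationship": rel, "dst": other.name}
        for thing in things
        for rel, other in thing.related
    ]


pump = Thing("pump-1", "pump")
sensor = Thing("sensor-7", "sensor")
plant = Thing("plant-a", "site")
sensor.relate("monitors", pump)
pump.relate("located_in", plant)

rows = edge_rows([pump, sensor, plant])
monitors = [r for r in rows if r["relationship"] == "monitors"]
```

The objects are the nodes and the rows are the edges, so the same data can be walked as a graph or queried as a table.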

Adrian: It sounds like you have built on the object model, refined it, and now, by taking that orthogonal view, looking at the relationships as a primary thing to model and giving yourself a graph view, you’ve really, in fact, modernized what you have while building on the past.

Brian: Yeah, I think you’re right about people, sort of, when you’re learning or growing up, you tend to think in terms of the objects and the relationships, how things are related. I think in the business world, when people are trying to solve problems, they tend to slip back into the old data view of the world and forget about the relationships.
