Data Streaming at the Edge: IBM and Apache Quarks

How do you perform analytics on data streaming from edge devices? IBM discusses their latest project, Quarks, now an Apache incubator project.

Dan Debrunner and Will Marshall of IBM Streams talk about how to stream data from edge devices, such as smartphones, and conduct analytics. What happens, for instance, when the edge device has only 512 MB of RAM?

Interview by Justin Grammens, RTInsights industry analyst:

Transcript:

Justin: Hello, and welcome to the RTInsights Real-Time Talk podcast series. I’m Justin Grammens, industry analyst at RTInsights, and your host for this ongoing discussion about the Internet of Things, real-time analytics, and cognitive computing solutions that provide significant business value. This week we are excited to have Will and Dan from IBM to talk about a new product release from them called Quarks.

Quarks is defined as a model and run-time to be embedded in gateways and devices for edge analytics. Our conversation covered not only Quarks, but some of the real-world applications IBM is solving today in the areas of data, IoT, and real-time analytics. I hope you enjoy the discussion.

Well, I want to thank you guys for your time. Thank you so much for being on the Real-Time Talk podcast series by RTInsights. I guess if you could just introduce yourselves real quick to our listeners, that would be a great place to start. Will, you want to go first?

Will: Yeah, sure. I’ve been with IBM a year and a half. I’ve worked on the IBM Streams project, and recently have been working on Quarks. Before that I was working at OpenStack, and a little bit of robotic stuff, but yeah, and real-time data streaming for the past year and a half.

Justin: Great, great. Dan? You want to go next?

Dan: Yeah. I’m Dan Debrunner. I’ve been working on the IBM Streams project, streaming technology, since 2008, when it first came out of research, IBM Research. I thought this was an interesting technology, and I thought it was going to be the next big thing. I think that has happened. It’s just maybe slower than I thought. We’re seeing a lot of streaming activity within maybe the last two years, something like that, maybe less. Before that I was in database internals, and actually when I started out, I was working on embedded systems, so in some ways working on Quarks is coming full circle a number of years later.

Justin: Great, great, cool. I guess for our listeners, maybe one of you two guys could define just what is Quarks?

Data Streaming and Edge Analytics

Dan: What we want Quarks to be is a community for building edge analytics systems, that is, systems that can run analytics at the edge of the network, primarily in the Internet of Things. While Quarks at the moment is a run-time and a programming model, a streaming programming model for edge analytics, we see it as much more than just code. We want it to be a community. That’s why we went open source. We want to see contributions from other people, other companies involved in this space, and really define the de facto standard for analytics at the edge of the network on constrained devices.

Will: One of the questions that we get a lot is what is the difference between Quarks and other current streaming technologies like Spark Streaming, Node-RED, Flink; and one thing that we realized working next to these technologies is that they don’t extend to the edge very well in that they’re cluster-based technologies. Spark might not be the best solution to run on the edge device. Quarks is written from the ground up to be on an edge device, on an Android phone or on a Raspberry Pi, and deal with that data.

Justin: Got it. Got it. Excellent. Yeah, I actually went out some time last week and downloaded the source code, and went ahead and compiled it. It was up and running here within five minutes. It was a really, really great experience, and I was able to work with both real-time data and the simulator that you guys have that will sit there and kick out data. My background is in software. I have a company that deals a lot with streaming data technologies. What is the intended and current Quarks user base right now?

Dan: I think that comes from … As we said, we both worked on IBM Streams, IBM’s streaming product, and we have a number of customers that are using that technology in the IoT space, but just at the back end. What we saw is that they came to us really with a need, and in talking with them we heard, I guess, two main problems. The first was: how do I reduce the data volume that I send from my devices to my central system for analytics, because the cellular bill is becoming very expensive?

The other generic use case was we need to run analytics even when we can’t get to the back end, and they understand that the analytics you may run at the edge won’t be as rich as the analytics at the back end, but if they’ve gone out of cell phone service area or they’re just not connected, they still want to be able to take some actions based upon potential failures with the edge devices.
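The first of the two problems Dan describes, cutting the volume of data sent from the device, can be illustrated with a small sketch. This is plain Java and not the Quarks API; the class name, the `deadband` method, and the threshold value are all hypothetical. It forwards a sensor reading only when it has moved by more than a threshold since the last forwarded value, which is one simple way an edge device might avoid sending every sample over a cellular link.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the Quarks API.
public class EdgeFilterSketch {

    // Returns the readings that would be forwarded to the back end:
    // a reading is kept only if it differs from the last forwarded
    // reading by more than the threshold (a "deadband" filter).
    public static List<Double> deadband(List<Double> readings, double threshold) {
        List<Double> forwarded = new ArrayList<>();
        Double last = null;
        for (double r : readings) {
            if (last == null || Math.abs(r - last) > threshold) {
                forwarded.add(r);
                last = r;
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        // Made-up temperature samples for the demonstration.
        List<Double> readings = Arrays.asList(20.0, 20.1, 20.2, 25.0, 25.1, 19.0);
        List<Double> sent = deadband(readings, 1.0);
        System.out.println(sent.size() + " of " + readings.size()
                + " readings forwarded: " + sent);
    }
}
```

With the sample data above, only three of the six readings would leave the device. A real deployment would choose the threshold per sensor and might aggregate instead of filter.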

Mobile Streaming

Justin: Got it. Got it. Yeah, so those are two specific use cases you’re seeing in the market as driving the need for this type of software to be written and run on the edge device?

Dan: Correct. Streams is used on the back end with companies like Ford and SilverHook Powerboat Racing, and so anyone that’s really dealing with lots of devices out there that are moving and have unreliable network or expensive networking options.

Justin: Now, this came out of the IBM Streams project? Is that correct? What’s the genealogy, I guess, of Quarks?

Will: That gets back to what I was saying before where yeah, we worked on IBM Streams, and one of the things we realized was that it didn’t translate well to mobile devices, edge devices. Going back to one of the use cases, IBM Streams was used in the Tour de France. It was used to monitor the locations of the cyclists and to do analytics on them. One of the issues was that they would lose connectivity. Let’s say for example, one of the cyclists falls over in a dead zone, in a network dead zone. Then if you want to know about that immediately or if you want to do some action on that in the dead zone, then some other technology would be needed. That’s one of the idiomatic use cases for Quarks, is Tour de France, dead zone, what to do without network connectivity.
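The dead-zone scenario Will describes might be sketched as follows. This is plain Java rather than the Quarks API, and every name in it (the class, the methods, the 3.0 g threshold, the rider ID) is an assumption made for illustration. The idea is that the device raises a local alert as soon as it detects a possible fall, whether or not it has a connection, and holds the event in a bounded buffer until the network comes back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; this is not the Quarks API.
public class OfflineBufferSketch {

    // Assumed threshold: accelerations above this are treated as a possible fall.
    static final double FALL_THRESHOLD_G = 3.0;

    private final int capacity;
    private final Deque<String> pending = new ArrayDeque<>();

    // Stand-ins for "take an action on the device" and "send to the back end".
    public final List<String> localAlerts = new ArrayList<>();
    public final List<String> deliveredToBackEnd = new ArrayList<>();

    public OfflineBufferSketch(int capacity) {
        this.capacity = capacity;
    }

    // Runs the local analytic on one reading, with or without connectivity.
    public void onReading(String riderId, double accelG, boolean connected) {
        if (accelG <= FALL_THRESHOLD_G) {
            return; // nothing notable, so nothing is sent or stored
        }
        String event = riderId + " possible fall (" + accelG + "g)";
        localAlerts.add(event); // the local action does not wait for the network
        if (pending.size() == capacity) {
            pending.removeFirst(); // buffer full: drop the oldest event
        }
        pending.addLast(event);
        if (connected) {
            flush();
        }
    }

    // Called when the device regains connectivity.
    public void onReconnect() {
        flush();
    }

    public int pendingCount() {
        return pending.size();
    }

    private void flush() {
        while (!pending.isEmpty()) {
            deliveredToBackEnd.add(pending.removeFirst());
        }
    }

    public static void main(String[] args) {
        OfflineBufferSketch edge = new OfflineBufferSketch(100);
        edge.onReading("rider-17", 1.0, true);  // normal riding, ignored
        edge.onReading("rider-17", 4.5, false); // possible fall inside a dead zone
        System.out.println("local alerts: " + edge.localAlerts);
        System.out.println("delivered while offline: " + edge.deliveredToBackEnd);
        edge.onReconnect();
        System.out.println("delivered after reconnect: " + edge.deliveredToBackEnd);
    }
}
```

Dropping the oldest event when the buffer fills is only one possible policy; on a device with very little memory, keeping the oldest events or summarizing them could be equally reasonable.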

Justin: Good. Yeah, yeah. Very good. What’s the performance of Quarks? Is it riding on top of something like MQTT or some type of a queuing mechanism?

Dan: To maybe step back a little, what Quarks is … in some ways I describe it as a software development kit for the edge. Because we wanted to run on constrained devices, we didn’t want to make it some big, monolithic thing. It’s very modular. You can pick and choose the pieces that you want. I guess one model is you just run at the edge and you’re self-contained, and you don’t communicate with anything. That’s probably unlikely. Our more typical model is that yes, it’s running at the edge, it’s doing real-time analytics, and then it’s sending information back to back end systems over something like MQTT or the IBM Watson IoT Platform. Or maybe you’re running in a more corporate environment, where you’re running a copy of Quarks on every machine in your machine room to do some analytics about the health of those devices. By itself, Quarks just piggybacks on any particular message hub that people want to use, and the system has modular connectors, so if there’s some IoT-scale message hub that we don’t support, it would be very easy to add.
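The modular-connector idea Dan mentions could look roughly like the sketch below. It is plain Java and does not reproduce the actual Quarks connector classes; the `MessageHubConnector` interface, the in-memory implementation, and the topic name are all hypothetical. The point is only that the analytics code depends on a small interface, so support for a different message hub can be added by writing another implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not the Quarks connector classes.
public class ConnectorSketch {

    // Assumed minimal contract that any message hub connector would satisfy.
    public interface MessageHubConnector {
        void publish(String topic, String payload);
    }

    // In-memory stand-in for a real hub such as an MQTT broker.
    public static class InMemoryConnector implements MessageHubConnector {
        public final List<String> sent = new ArrayList<>();

        @Override
        public void publish(String topic, String payload) {
            sent.add(topic + ":" + payload);
        }
    }

    // Edge-side analytic: averages the samples locally and publishes one
    // summary value instead of every sample. Assumes a non-empty array.
    public static void reportAverage(double[] samples, MessageHubConnector hub) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        hub.publish("device/health/avg", String.valueOf(sum / samples.length));
    }

    public static void main(String[] args) {
        InMemoryConnector hub = new InMemoryConnector();
        reportAverage(new double[] {40.0, 42.0, 44.0}, hub);
        System.out.println(hub.sent);
    }
}
```

Swapping `InMemoryConnector` for an implementation backed by a real client library would leave `reportAverage` unchanged, which is the property the modular design is after.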

Dan: Our focus is running on constrained devices, so that’s why we started out with all our samples running on the Raspberry Pi, because physically it is a small device with a fairly low CPU capability, and I think about 512 megabytes of RAM. The newer versions have more RAM. I’m running on a Raspberry Pi that’s probably three or four years old. Now, we’re not restricted to Raspberry Pi. It’s just a very convenient platform to demonstrate that you can run on a constrained device.

Justin: Sure, yeah. I saw the examples, at least the ones that I came across were in Java, and I heard you guys mention Android earlier on. Anything with iPhone at all, being able to push data out that way?

Will: Right now we’re in Java only, so that’s a couple of different flavors of Java: Java 7, Java 8, Android Java. We’re not working with Swift right now. We have thought about the different languages that we want to support, but I think right now the higher priority is fleshing out Quarks in Java as it currently is. As for the next possible step, just to list them: Swift, Scilab, and Python are potential languages that we would want to support, things that people would use data science-wise, but also languages that simply have a large user base. Like I said, that’s not what we’re focusing on right now.

Justin: Sure, but it is open source like you said, right? If there’s stuff that comes in from developers and stuff, you’d be open to contributions from the community I’m guessing, right? You could maybe leverage from that.

Apache Quarks

Dan: Yes, we’re very open to that. Part of the path we went down was that, based upon our experience with our IBM Streams product, we knew we could produce a small Java run-time very quickly, open source that to start a community, and use that as the platform to show people the concept, really: as you said, we go from Quarks over a message hub to a back end analytics system. The idea was to get people excited about it, and then to actually go to real customers, as far as IBM is concerned, and say, “Okay, we have this concept. Where do you actually need to run it on the edge?” Because we would guess that not every edge platform is going to be able to run Java.

We had one example with a major modem manufacturer from the US who said, “I don’t really care what it’s implemented in at the moment. I just want to see the concept.” We took that approach and ran with it, and so far I’ve actually been surprised: the companies we have talked to about doing real deployments are saying that yeah, they just have a full Linux machine down there on the piece of hardware that’s on the edge, and it can run something like Java. But I would hope that people from the community get involved and do an open Swift implementation or even maybe a C or C++ implementation, because I think the more platforms we can reach, the greater its adoption will be, and it’ll be good to set up that de facto standard at the edge.

Justin: Sure. Very good, very good. Well, I appreciate your guys’ time. Do you have anything else you’d like to add here at the end with regards to the project?

Dan: I guess our latest news is that we have been accepted for incubation at the Apache Software Foundation.

Justin: Congratulations.

Dan: We are now Apache Quarks, an incubating project with the normal disclaimers. We just got the code over there yesterday, I think, was it?

Will: Yeah.

Dan: Please, anyone listening, feel free to join the developer list and get involved.

Justin: Great, great. Well, Will and Dan, thank you so much for your time. I appreciate it.

Will: Yeah. Thank you, Justin.