“When we look at what’s behind the dynamic growth in the big data arena, right now we see it at Apache Spark.”
TIBCO’s Accelerator for Apache Spark consists of 40 out-of-the-box building blocks to speed implementations of Spark. In the video below, Hayden Schultz, global architect for TIBCO, explains how the accelerator can work with big data and machine learning, and how it speeds time to value for business customers.
Adrian: We’re here today with Hayden Schultz, global architect for TIBCO. We’re going to talk about some new products that are coming out. To start, tell me a little bit about yourself and how you came to TIBCO.
Hayden: I was working for StreamBase, a small startup in the Boston area that makes the StreamBase event-processing product that TIBCO acquired in 2013. I’ve been working on event processing for almost 13 years. At StreamBase that was mostly financial markets: high-frequency trading, a lot of currency trading, a lot of asset classes. Now, at TIBCO, we’re moving StreamBase into all kinds of things: vehicles, the Internet of Things, and, as we’ll talk about today, big data.
We see a lot of repeatable solutions that are not necessarily obvious from the customer’s point of view. We have a lot of different products, and rather than just giving the customer a bunch of parts from which they can build complicated things, we’re putting [things] together in a common application framework. We build an application … the case we’re going to talk about today [is a] big data application. What it does for customers is give them an example of how the products [get] put together, and how they work with other technologies in a scalable, reliable, best-practices solution. You can take this and customize it for something else.
We’ve been calling it a product, but this is something that we’re giving out. This is not something that we’re selling. They’re example implementations … full systems that you can take and customize for your own software. We’re releasing them with an open-source license, so you can do whatever you want with them. We started by focusing on Hadoop. Hadoop is in there; it’s part of the fundamental building blocks that Apache Spark uses. When we look at what’s behind the dynamic growth in the big data arena, right now we see it at Apache Spark.
One way of thinking about this is that while the accelerators are a new thing that we’re giving out, the various components are not really new, in a sense … any of our customers could have taken our products and built systems on top of big data clusters … built systems on top of Spark. They can do that without us. What this does is let someone who’s new to it use these components. We have an example of a customer using StreamBase and Spark: they have their currency trading algorithms written in StreamBase.
What they wanted to do was back testing, which means they have a large amount of historical data, and they want to know whether their new algorithms perform better than their old algorithms. Now, if they knew what the new currency data was going to be like, that would be easy; it would be no challenge at all. But they don’t, so what they do is store in their big data cluster all the financial trades that happened. They store all of the raw quotes that happened every day.
What they do to train their new algorithm, which is evaluated and compared against their old algorithm, is take six months of data and partition it into one-day chunks, which turns out to be something like 136 different eight-hour chunks of data. Then they run the new algorithm, with StreamBase running inside a Spark cluster.
They take 136 simultaneous partitions of data, and they run them all in their cluster, and they end up training on six months’ worth of data in under an hour.
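The back-testing pattern Hayden describes (split the history into one-day partitions, run the algorithm on each partition independently, then combine the results) can be sketched roughly as follows. This is only an illustration under stated assumptions: a standard-library thread pool stands in for the Spark cluster, prices are integers in hypothetical pips, and `old_algo`, `new_algo`, and the P&L scoring are made-up placeholders rather than the customer's StreamBase logic.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical quote history: (trading day, price in pips).
QUOTES = [
    (date(2016, 1, 4), 110), (date(2016, 1, 4), 112),
    (date(2016, 1, 4), 111), (date(2016, 1, 4), 113),
    (date(2016, 1, 5), 111), (date(2016, 1, 5), 109),
    (date(2016, 1, 5), 113), (date(2016, 1, 5), 112),
]

def partition_by_day(quotes):
    """Group quotes into one partition per trading day, preserving order."""
    parts = defaultdict(list)
    for day, price in quotes:
        parts[day].append(price)
    return dict(parts)

def old_algo(prev, curr):
    """Placeholder strategy: always hold one unit long."""
    return 1

def new_algo(prev, curr):
    """Placeholder strategy: go long after a drop, short otherwise."""
    return 1 if curr < prev else -1

def backtest(prices, algo):
    """Replay one day's prices and return the P&L of the positions taken."""
    pnl = 0
    for i in range(1, len(prices) - 1):
        position = algo(prices[i - 1], prices[i])
        pnl += position * (prices[i + 1] - prices[i])
    return pnl

def run_backtests(quotes, algo, max_workers=4):
    """Back-test every one-day partition independently and in parallel."""
    parts = partition_by_day(quotes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda prices: backtest(prices, algo), parts.values())
        return dict(zip(parts.keys(), results))

old_total = sum(run_backtests(QUOTES, old_algo).values())
new_total = sum(run_backtests(QUOTES, new_algo).values())
print("new algorithm beats old:", new_total > old_total)
```

Because no partition depends on another day's state, the work scales with the number of workers available, which is the property that lets a cluster process all of the partitions at once.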
Hayden: When they started doing this, it used to take them many, many hours. They would typically think of it in days. Now it’s under an hour. They can do a lot more experimentation, they can evaluate different changes, and get new versions of their algorithm to investors. … The accelerator is a sample solution; it assumes that you are using the TIBCO products. But you can swap other components in and out if you want.
What we start with is StreamBase for capturing the data. In one typical example you have a Kafka bus as the initial source of the data, or a JMS bus, or just a socket or web services. There are adapters we use to connect to the data source. Those are very simple StreamBase applications that connect to the data source and then write the data directly into HDFS, or possibly write it out using Flume. Then it’s in the big data system. Once you have a large portfolio of data to look at, you can run your data analytics on it … The data scientist who comes in and works in the TIBCO stack would use Spotfire.
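The capture step (an adapter reads events from a source and lands them in the big data store) might look something like the sketch below. It is not StreamBase code: a plain Python list stands in for the Kafka, JMS, or socket source, a temporary local directory stands in for HDFS, and the record fields (`ts`, `symbol`, `price`), the `day=YYYY-MM-DD` directory layout, and the `land_events` helper are all assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def land_events(events, root, batch_size=2):
    """Write events as JSON lines under root/day=YYYY-MM-DD/, one file per batch."""
    root = Path(root)
    buffers = {}   # day -> records waiting to be written
    written = []   # paths of the files created

    def flush(day):
        records = buffers.pop(day, [])
        if not records:
            return
        day_dir = root / f"day={day}"
        day_dir.mkdir(parents=True, exist_ok=True)
        # Number each file by how many already exist in the day's directory.
        path = day_dir / f"part-{len(list(day_dir.iterdir())):05d}.jsonl"
        path.write_text("".join(json.dumps(r) + "\n" for r in records))
        written.append(path)

    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        buffers.setdefault(day, []).append(event)
        if len(buffers[day]) >= batch_size:
            flush(day)
    for day in list(buffers):   # write whatever is left at end of stream
        flush(day)
    return written

# Hypothetical events: three on 2016-01-04 (UTC) and one on 2016-01-05.
demo_events = [
    {"ts": 1451865600, "symbol": "EURUSD", "price": 1.0832},
    {"ts": 1451865660, "symbol": "EURUSD", "price": 1.0835},
    {"ts": 1451865720, "symbol": "EURUSD", "price": 1.0829},
    {"ts": 1451952000, "symbol": "EURUSD", "price": 1.0748},
]

with tempfile.TemporaryDirectory() as tmp:
    files = land_events(demo_events, tmp)
    layout = sorted(p.relative_to(tmp).as_posix() for p in files)
    line_count = sum(len(p.read_text().splitlines()) for p in files)
```

Landing the data in per-day directories is one possible design choice; it makes a later step that processes one day at a time a matter of reading one directory.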
They’ll look at the data, connecting from Spotfire to the Spark system through a Spark connector that was newly certified by Databricks. They can run Spark SQL commands, and they can run R commands directly from Spotfire. They can even use TERR, the TIBCO Enterprise Runtime for R, to analyze the data as well. In the case of the Accelerator for Apache Spark, what we do is use Spotfire to prepare the data and understand the relationships, and once that’s done, it uses Sparkling Water, an H2O.ai layer on top of Apache Spark.
That trains a machine learning model. The machine learning model is then saved: it’s trained and stored inside the big data system.
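The train-then-save step can be illustrated in miniature. The sketch below is not Sparkling Water or H2O code: as a stand-in, it fits a one-variable least-squares line using only the standard library, saves the fitted parameters as JSON, and reloads them for scoring. The data, the helper names, and the file name `model.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def save_model(model, path):
    """Persist the trained parameters so another process can score with them."""
    Path(path).write_text(json.dumps(model))

def load_model(path):
    """Read back parameters written by save_model."""
    return json.loads(Path(path).read_text())

def score(model, x):
    """Apply the saved model to a new input."""
    return model["slope"] * x + model["intercept"]

# Hypothetical training data that follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.json"
    save_model(train(xs, ys), path)
    restored = load_model(path)

prediction = score(restored, 5.0)
```

Separating training from scoring through a saved artifact means the component that applies the model does not need to be the one that trained it.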
Adrian: If I were to sum it up, and if I were talking to a potential customer of yours, it sounds like the important thing is that when you’re looking at an accelerator, what you’re accelerating is the time to value, to use an overused term. It’s the time to deliver something that’s usable, because you have this template.
Hayden: Exactly. If you look at why new projects fail, and at large companies a lot of projects fail, the reason is that it takes too much investment on the company’s part before they start showing some return. The idea here is, “We’re going to give you something that’s already built, front end and all, and instead of doing all of the plumbing, which is done, you work on the business logic.”
Adrian: For someone that hasn’t been working with you, how can they get started?
Hayden: They download it from the TIBCO Accelerator download site. They get the full source to everything. It’s a totally open product; do whatever you want with it.