6 Big Data and ML Takeaways from Strata 2017

The recent Strata Data Conference offered a look at the future of AI, big data, data science, machine learning, stream processing and more.

The Strata Data Conference, held at the end of last month in New York, pledged to bring together the leading minds, and most promising new ideas, in big data. Between AI, big data, data science, machine learning (ML), stream processing, and more, it was a sophisticated, productive, and ultimately fascinating look at the future of this fast-growing industry.

A few ideas and innovations percolated into the conversations held during and just after the event.

1. Hadoop is (still) everywhere

While the conference itself is now called only “Strata” instead of “Strata + Hadoop World,” Hadoop itself hasn’t disappeared.


The name has faded, in part, because big data companies are moving away from the idea that a single Hadoop/HDFS data lake is the best path forward. Instead, they’re promoting multi-cluster data environments combined with analytics applications that can pull different types of data from any number of sources. That’s what Hortonworks is pushing for in its new DataPlane Service, for example. Hadoop is still the underlying technology, but as companies build ever more elaborate multi-cluster applications on top of it, the name gets buried beneath a heap of other technical jargon.

2. Support for Azure is growing

Microsoft has always been a second or third player in the big data, analytics, and ML space, but the buzz around its Azure cloud is growing. The company rolled out significant changes to its SQL Server relational database, which now complements the HDFS-compatible Data Lake Store. The Azure Machine Learning Workbench aims to be an end-to-end data science solution and is compatible with Jupyter notebooks and Microsoft’s own Visual Studio Code editor.

On top of this, more companies are supporting Azure storage with their applications, including Cloudera with Altus Data Engineering and WANdisco with Fusion, which connects to HDInsight, among others. If Microsoft keeps up the momentum, it just might become a go-to resource for ML enthusiasts looking to store and analyze the multiple object stores they have kicking around.

3. GPUs are still the way to go

While AI-focused computer processors are growing in popularity, the graphics processing unit (GPU) is still the go-to hardware resource for companies wanting to do complex parallel processing. NVIDIA remains the industry leader in this space, and isn’t resting on its laurels.

The company announced new partnerships with Kinetica and H2O.ai that take advantage of NVIDIA’s DGX Station, which the company bills as “the world’s first personal supercomputer for leading-edge AI development.”

4. Machine learning is a core technology

Machine learning is no longer a value-add or a market differentiator—customers expect it built into their data and analytics software, and the industry is responding in kind.

Microsoft announced the release of its redesigned Azure Machine Learning service, which also automates data preparation. The Cloudera Data Science Workbench aims to make ML more accessible for “analytics teams working at scale,” while IBM announced the Data Science Experience, which gives data scientists a suite of tools to create ML applications in a social environment.

5. IoT lacks ML’s and big data’s open source framework

The Internet of Things (IoT) is again falling out of favor as ML takes center stage. One of the primary reasons is that the IoT industry has yet to agree on any standards, or even on an open source framework, that could solve its issues with security and interoperability.

Data lakes have Hadoop, whereas IoT is still searching for its trendsetter. And that means lots of companies just aren’t willing to invest as heavily in it as they might want to, for fear of security problems or that the industry will move in a different direction next year.

Plus, ML doesn’t require hardware to implement, not with all the user-friendly platforms and APIs that exist these days: simply build a model, throw some data at it, and see what sticks. That’s a compelling argument for companies wanting to do more with their data as quickly as possible. Could that change in the year or two to come? It’s possible, but it seems more likely that between now and next year’s Strata we’ll see more big data and ML innovations than IoT ones. Maybe the Internet of Things needs a rebranding.

6. Meanwhile, GPUs get another

The newly announced GPU Open Analytics Initiative (GOAI) is a collaboration between Anaconda, H2O.ai, and MapD Technologies. Its initial goal is to create a common GPU Data Frame and a corresponding Python API that allow end-to-end computation on the GPU, keeping data in GPU memory rather than transferring it back and forth between the CPU and GPU.


GOAI says it “will foster the development of a data science ecosystem on GPUs by allowing resident applications to interchange data seamlessly and efficiently.” In practice, this could look similar to the following: “Users of the MapD Core database can output the results of a SQL query into the GPU Data Frame, which then can be manipulated by the Anaconda NumPy-like Python API or used as input into the H2O suite of machine learning algorithms without additional data manipulation.”
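To make the idea of interchanging data without copies a little more concrete, here is a minimal, CPU-only sketch in plain Python. It does not use GOAI, MapD, Anaconda, or H2O code. The stage functions (`query_stage`, `transform_stage`, `model_stage`) and the sample prices are hypothetical stand-ins, and an ordinary in-memory buffer plays the role of the shared data frame. The point is only that each stage works on the same buffer instead of exporting and re-importing the data.

```python
from array import array


def query_stage():
    """Hypothetical 'SQL query' stage: returns columns as typed buffers."""
    return {"price": array("d", [10.0, 20.0, 30.0, 40.0])}


def transform_stage(frame):
    """Hypothetical analytics stage: scales a column in place through a view."""
    view = memoryview(frame["price"])  # a view onto the buffer, not a copy
    for i in range(len(view)):
        view[i] = view[i] * 1.5
    return frame


def model_stage(frame):
    """Hypothetical ML stage: reads the same buffer and computes a mean."""
    view = memoryview(frame["price"])
    return sum(view) / len(view)


frame = query_stage()
original_buffer = frame["price"]

frame = transform_stage(frame)
mean_price = model_stage(frame)

# Every stage touched the same underlying buffer; nothing was re-serialized.
same_buffer = frame["price"] is original_buffer
print(same_buffer, mean_price)
```

In a GPU setting, the shared buffer would live in GPU memory, so the saving would come from avoiding round trips over the bus to the CPU rather than from avoiding Python-level copies.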

In the long term, this could be more of exactly what IoT is missing: various companies, oftentimes competitors, coming together with the understanding that if they make their entire industry more accessible, everyone benefits.

About Joel Hans
