Spark Summit East News Roundup

Data science-as-a-service and the cloud featured heavily in product announcements made during Spark Summit East 2017, held Feb. 7-9.

Written By

Sue Walsh

Feb 13, 2017

Cazena announces data science sandbox as a service
Qubole unveils big data service on the Oracle cloud
BlueData supports machine learning use cases, DevOps
MemSQL releases new Spark 2 connector
Impetus launches Spark streaming contest

Cazena announces data science sandbox as a service

Analytic platform provider Cazena has announced a new Data Science Sandbox as a service. The new service allows data scientists to run a variety of analytics in the cloud without the need to build, manage or maintain any infrastructure. The Data Science Sandbox has end-to-end capabilities and embedded support for R, SQL, Python and other analytics. Users can access it through a web interface or through tools like RStudio Server or Hue Notebooks. Additional storage and processing power are available on demand, and the Sandbox includes built-in data movers that quickly and securely transfer data sets from cloud or on-site sources. It’s delivered as a service on Amazon Web Services and Microsoft Azure, and a free one-week trial is available.

Qubole unveils big data service on the Oracle cloud

Data-as-a-service provider Qubole announced the general availability of its Qubole Data Service (QDS) on the Oracle Cloud. The new service provides customers with fast access to analytics to assist in business decision making.

“Cloud is rapidly becoming the dominant deployment option for big data because it provides unmatched agility, time to market and cost efficiency,” said Ashish Thusoo, CEO of Qubole. “Oracle is setting a new standard for price-performance in the cloud which makes it a perfect fit for Qubole.”

QDS offers an enterprise-grade turnkey data platform that can handle all types of big data workloads and uses open-source processing engines such as Hive, Hadoop and Spark. It features built-in auto-scaling capabilities and works with Oracle’s NVMe SSD storage. Users can combine data from any Oracle database, data lake and third-party sources. Oracle Cloud’s object storage architecture and Qubole’s built-in connectors provide cost-effective flexibility.

BlueData supports machine learning use cases, DevOps

BlueData’s newest release of the BlueData EPIC software platform brings support for new machine learning use cases in addition to DevOps agility. Its self-service interface is designed to allow data science teams to create Docker-based environments quickly. These environments can run on shared infrastructure on-site or in the cloud, with secure access to common data such as that found in an HDFS data lake or Amazon S3.

“It’s time for enterprises to extend the benefits of DevOps to their data science and engineering teams, whether for real-time analytics and machine learning or other use cases,” said Kumar Sreekanti, co-founder and CEO at BlueData. “BlueData customers can bring this agility and speed to their data science operations, with the ability to create fully integrated data science environments in just a few mouse clicks — both on-premises and in the public cloud.”

Other highlights of the new release include the option for data science teams to use RStudio Server, Zeppelin, or JupyterHub notebooks. They are all pre-configured as Docker images and available in the BlueData EPIC App Store.

Data science environments are pre-configured for R and Python support with or without Spark, according to BlueData. This lets data teams use their preferred languages and tools. R, Python, Hadoop, SQL and Spark jobs for persistent or transient clusters can be easily submitted from the BlueData EPIC web-based interface or REST API. The new release also has H2O and Spark MLlib pre-integrated. It also includes a variety of action scripts, including bootstrap, to automate data science operations.

MemSQL releases new Spark 2 connector

Database platform provider MemSQL has released a new Spark 2 connector. It offers full support for all Apache Spark 2 functions, including the use of SparkSession as an entry point for the DataFrame API. The new connector offers bi-directional data movement between Spark and MemSQL, along with SQL pushdown support for filters, which speeds up database processing.

“The new MemSQL Spark Connector with support for Spark 2.0 and 2.1 continues our journey of being the best database to store and retrieve data quickly from Apache Spark,” said Nikita Shamgunov, CTO and co-founder, MemSQL. “With data sources continuing to expand, enterprises need to implement architectures that support fast, operational analytics. The MemSQL and Spark combination empowers users to harness streaming data and capitalize on real-time analytics.”
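
For readers unfamiliar with the term, the filter pushdown mentioned above can be illustrated with a minimal sketch. This is not the MemSQL connector's API: Python's built-in sqlite3 module stands in for the database, and the table, column and function names are hypothetical. It only shows why sending a filter to the database as a WHERE clause moves fewer rows than fetching everything and filtering afterward.

```python
import sqlite3


def make_db():
    """Create a small in-memory table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(1, "east", 10.0), (2, "west", 25.0), (3, "east", 40.0), (4, "west", 5.0)],
    )
    return conn


def filter_client_side(conn, region):
    """No pushdown: fetch every row, then filter in the calling program."""
    rows = conn.execute(
        "SELECT id, region, amount FROM events ORDER BY id"
    ).fetchall()
    kept = [row for row in rows if row[1] == region]
    return kept, len(rows)  # len(rows) = rows transferred from the database


def filter_pushed_down(conn, region):
    """Pushdown: the filter travels to the database as a WHERE clause."""
    rows = conn.execute(
        "SELECT id, region, amount FROM events WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()
    return rows, len(rows)  # only matching rows are transferred


if __name__ == "__main__":
    conn = make_db()
    print(filter_client_side(conn, "east"))
    print(filter_pushed_down(conn, "east"))
```

Both functions return the same matching rows; the difference is the second value, the number of rows that had to leave the database.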

Impetus launches Spark streaming contest

Big data software and services company Impetus is hosting a Spark Streaming Innovation contest. Participating teams will compete to build a real-time streaming data application on the company’s StreamAnalytix visual development platform. They’ll be given a dataset and an analytical problem to be solved via the creation of a Spark Streaming application.

Registration begins on February 8 and will close on March 31. The winner will be announced on April 18. The grand prize is $10,000, with $5,000 and $3,000 second and third prizes, respectively. A student winner will also be selected and awarded $2,000.

Sue Walsh

Sue Walsh is a news writer for RTInsights and a freelance writer and social media manager living in New York City. Her specialties include tech, security and e-commerce. You can follow her on Twitter at @girlfridaygeek.