MapR Releases New Ecosystem Pack Optimized for Apache Spark


MapR Ecosystem Pack 3.0 provides enhanced integrations with Spark 2.1, as well as analytics with Hive 2.1 and business intelligence with Drill 1.10.

MapR Technologies, Inc., a converged data platform provider, has announced the release of version 3.0 of the MapR Ecosystem Pack (MEP) program.

MEP is a collection of open source ecosystem products that gives big data applications running on the MapR Converged Data Platform inter-project compatibility. New features of MEP Version 3.0 include new Spark connectors for MapR-DB and HBase, integration with Apache Drill, a faster version of Hive and improved security for Spark.

“The adoption of Spark and Drill continues to advance at a fast pace with enterprises worldwide,” said Will Ochandarena, senior director, product management, MapR Technologies. “With a regular cadence of ecosystem updates that make it easier to adopt for production use, our customers immediately benefit from rapid open source innovation with the reliability, scale and performance of the Converged Data Platform.”

According to the company, other key features of the new release include:

Apache Spark 2.1.0
The Spark 2.1 release focuses on improvements in enterprise-ready stability and security including:

  • Scalable partition handling
  • Data Type APIs graduate to “stable”
  • More than 1200 fixes on the Spark 2.X line
  • Secure connections using MapR-SASL, in addition to Kerberos, for inbound client connections to the Spark Thrift server and for Spark connections to the Hive Metastore
  • Support for impersonation on SELECT statements

Native Spark Connector for MapR-DB JSON
The Native Spark Connector for MapR-DB JSON makes it easier to build real-time or batch pipelines between data and MapR-DB while leveraging Spark or Spark Streaming within the pipeline, MapR stated. Designed to be highly efficient and simplify code development, the Native Spark Connector includes:

  • Two new APIs that allow you to load data from a MapR-DB JSON table to a Spark RDD or save a Spark RDD to a MapR-DB JSON table
  • A custom data partitioner for better performance
  • Use of MapR-DB data locality when launching Spark executors to read data
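
The announcement does not describe how the connector's custom partitioner works. As a rough illustration of the general idea behind range partitioning by document key, here is a minimal Python sketch. It is not MapR's implementation: the function names, split keys and sample documents are all hypothetical, and it assumes only that documents carry an `_id` field and are grouped by sorted key ranges.

```python
from bisect import bisect_right
from collections import defaultdict


def range_partition(doc_id, split_keys):
    """Return the partition index for doc_id, given sorted split keys.

    Partition 0 holds ids below split_keys[0]; partition i holds ids in
    [split_keys[i-1], split_keys[i]); the last partition holds the rest.
    """
    return bisect_right(split_keys, doc_id)


def group_by_partition(docs, split_keys):
    """Group documents by the key range their "_id" falls into."""
    partitions = defaultdict(list)
    for doc in docs:
        partitions[range_partition(doc["_id"], split_keys)].append(doc)
    return dict(partitions)


# Hypothetical split points and documents, for illustration only.
SPLIT_KEYS = ["g", "p"]
DOCS = [{"_id": "alice"}, {"_id": "harry"}, {"_id": "zoe"}, {"_id": "gina"}]
GROUPED = group_by_partition(DOCS, SPLIT_KEYS)
```

Keeping documents from the same key range in the same partition is one common way to limit data movement between a sorted store and a parallel compute engine, which may be the kind of benefit the "better performance" claim refers to.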

The release also includes Apache Drill 1.10, which adds tools for BI, end-to-end security, and usability. It offers native connectivity for Tableau, support for Kerberos and MapR-SASL authentication, and improved compatibility with Parquet files generated by Hive and Spark.


Sue Walsh

About Sue Walsh

Sue Walsh is a freelance writer and social media manager living in New York City. Her specialties include tech, security and e-commerce. You can follow her on Twitter at @girlfridaygeek.
