Qubole Aims to Optimize Big Data Analysis


Qubole aims to optimize analytic workloads with automation and predictive analytics.

Qubole has announced it will launch an “autonomous data platform and intelligence” product to ease the use of big data analytics in the enterprise.

The volume, velocity and variety of data have been increasing every day, “and the traditional approach of solving this challenge by adding more resources to the data team is no longer viable,” said Balaji Mohanam, senior product manager at Qubole. “Due to complexity, projects take longer and are more expensive to become successful. And when you bring big data and cloud computing together, that compounds the complexity, which makes it difficult to deliver business value.”

Qubole provides Qubole Data Service (QDS), which is optimized for the cloud and offers ad hoc analysis, predictive analysis, machine learning, streaming and MapReduce, among other capabilities. Users without software development skills can use the QDS workbench through a SmartQuery interface without knowing how to write a SQL query.

The autonomous data platform and intelligence product will abstract low-level services and automatically execute tasks based on policy, configuration or intelligence, which frees data teams to focus on higher-order problems. In addition, workload-aware auto scaling can add nodes to or remove nodes from a compute cluster both predictively and reactively, thereby reducing compute spend.

The platform features an intelligence engine that collects metadata about the underlying infrastructure, data platform and application and, through a combination of heuristics and machine learning, provides both the data team and the autonomous data platform with actionable insights. For instance, the intelligence engine might recommend which data tables should be optimized through partitioning or sorting for different workloads.

The platform optimizes analytics workloads by monitoring real-time and historical data. For example, it can monitor a cluster’s HDFS storage to ensure that enough capacity remains for running jobs to complete, and launch more nodes if necessary. The optimization can be rate-based, in which past data is used to estimate HDFS storage use and predict when to launch additional nodes, or threshold-based, which involves real-time monitoring.
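The difference between the two approaches can be illustrated with a minimal Python sketch. This is not Qubole's implementation: the names `Sample`, `threshold_scale`, `fill_rate` and `rate_scale`, the 80 percent utilization limit, and the use of a least-squares slope to estimate growth are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One hypothetical HDFS usage reading."""
    t: float        # seconds since monitoring started
    used_gb: float  # HDFS space in use at time t


def threshold_scale(used_gb: float, capacity_gb: float,
                    max_util: float = 0.8) -> bool:
    """Threshold-based: react when current utilization crosses a limit."""
    return used_gb / capacity_gb >= max_util


def fill_rate(samples: Sequence[Sample]) -> float:
    """Least-squares slope of usage over time, in GB per second."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(s.t for s in samples) / n
    mean_u = sum(s.used_gb for s in samples) / n
    var_t = sum((s.t - mean_t) ** 2 for s in samples)
    if var_t == 0:
        return 0.0
    cov = sum((s.t - mean_t) * (s.used_gb - mean_u) for s in samples)
    return cov / var_t


def rate_scale(samples: Sequence[Sample], capacity_gb: float,
               lead_time_s: float, max_util: float = 0.8) -> bool:
    """Rate-based: project usage forward by the time a new node needs
    to come online, and scale if the projection crosses the limit."""
    rate = fill_rate(samples)
    if rate <= 0:
        return False
    projected = samples[-1].used_gb + rate * lead_time_s
    return projected / capacity_gb >= max_util


history = [Sample(0, 100), Sample(60, 160), Sample(120, 220)]
react_now = threshold_scale(history[-1].used_gb, capacity_gb=1000)
scale_ahead = rate_scale(history, capacity_gb=1000, lead_time_s=600)
```

In this example the cluster is only 22 percent full, so the threshold check does not fire, but usage is growing at about 1 GB per second, so the rate-based check asks for more nodes before the limit is reached.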

Qubole’s announcement coincides with its Data Platforms 2017 conference, which features speakers including R. David Edelman, President Obama’s “Geek in Chief,” Alex Sadovsky of Oracle, and big data experts from Microsoft, Amazon, and Expedia.

Presentation overviews:

Using Apache Spark machine learning for pattern detection

Using ORC files to speed analytics
