Organizations will rely on these cloud services for more ephemeral types of workloads involving, for example, the building of artificial intelligence models.
Cloudera this week announced a significant expansion of its portfolio with the launch of three cloud services that promise to reduce the time and effort required to stand up a Big Data platform based on Hadoop.
Designed to be deployed on any cloud, the Cloudera Data Platform (CDP) is a managed instance of Hadoop that is simpler for IT teams to set up and manage than an on-premises edition of a distribution of Hadoop. On top of that platform, Cloudera is also making available a data warehouse service, a machine learning service that comes complete with workspaces for data scientists, and a data management and analytics service.
Each of these offerings is intended to reduce the amount of time required to attain a return on investment in Hadoop, says Mick Hollison, chief marketing officer for Cloudera.
Cloudera expects that organizations will rely on these cloud services for more ephemeral types of workloads, such as the building of artificial intelligence (AI) models, says Hollison. In environments where a Big Data application is accessed on a 24/7 basis and involves a lot of data movement, an on-premises instance of Hadoop would be less expensive to run over time, he notes.
Initially, the Cloudera services will be available on Amazon Web Services (AWS), with support for Microsoft Azure and Google Cloud Platform to follow soon, adds Hollison. An on-premises edition of CDP, dubbed CDP Data Center, is available now as a technology preview. It is expected to become generally available later this year, with annual subscriptions starting at $10,000 per node.
“This is a whole new chapter for us,” says Hollison.
Cloudera is also giving customers the option of storing data directly in the object file systems provided by those cloud service providers or relying on the more converged compute and storage model enabled by the Hadoop Distributed File System (HDFS), says Hollison. That’s a significant development because many critics of Hadoop as a platform have identified HDFS as a layer of overhead that isn’t required when object file systems in the cloud are readily available.
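To illustrate the choice Hollison describes: Hadoop deployments can point the default filesystem at cloud object storage through Hadoop's S3A connector rather than at an HDFS namenode. A minimal configuration sketch follows; the bucket name and namenode host are hypothetical, and a real CDP deployment would manage these settings through its own tooling.

```xml
<!-- core-site.xml sketch: use cloud object storage as the default
     filesystem instead of HDFS (bucket/host names are hypothetical) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- object-storage option via the S3A connector -->
    <value>s3a://example-analytics-bucket</value>
    <!-- converged compute/storage alternative would be, e.g.:
         hdfs://namenode.example.com:8020 -->
  </property>
</configuration>
```

With the object-storage option, compute clusters can be torn down without losing data, which is what makes the more ephemeral, spin-up/spin-down workloads the article describes economical in the cloud.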
In general, organizations are adopting the Cloudera platform to build analytics applications and data warehouses fed by data that streams into the platform in near real time. Cloudera doesn’t expect those data warehouses to replace legacy data warehouses so much as to drive a wide range of new processes that require large amounts of data to be processed instantly. To bridge the divide between those two worlds, Hollison says there has been a lot of adoption of Cloudera Replication Manager software to move data between various platforms.
There’s clearly a lot more competition these days between various Big Data platforms. The number of providers of Hadoop distributions may have consolidated, but reliance on the platform remains relatively steady. The big issue continues to be determining what types of emerging applications will make the most sense to build on an open source Big Data platform such as Hadoop.