Apache Pulsar: Core Technology for Streaming Applications

One sign of growing interest in an open-source technology is the number of contributors to the effort and the pace at which they are becoming active. By that measure, Apache Pulsar, a cloud-native, distributed messaging and streaming platform originally created at Yahoo!, is on the rise.

Pulsar was designated a top-level project by the Apache Software Foundation in 2018. That designation is given when a technology has attracted a robust community of developers and users and is mature enough to be self-sustaining. The number of Pulsar contributors has doubled in the last two years, and this summer the community hit a milestone: its 400th contributor.

Pulsar's popularity, and the need for it, reflect the growing use of real-time streaming data in businesses today.

Objectives from the start

The initial goal for Pulsar was to create a multi-tenant, scalable messaging system that could serve as a unified platform for a wide variety of demanding use cases. Pulsar offered a distinctive architecture that separates the serving and storage layers, with Apache BookKeeper as the storage component. This two-layer architecture simplifies cluster operations: users can easily expand clusters and replace failed nodes, and the design provides much higher write and read availability.
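
To make the separation of serving and storage concrete, here is a minimal in-memory sketch in Python. It is not Pulsar or BookKeeper code: the `StorageLayer` and `Broker` classes, their methods, and the topic and message names are all hypothetical, chosen only to illustrate why a serving node that holds no message data can be replaced without copying anything.

```python
from typing import Dict, List


class StorageLayer:
    """Toy stand-in for the durable storage layer (the role BookKeeper plays)."""

    def __init__(self) -> None:
        # One append-only list of messages per topic.
        self._logs: Dict[str, List[str]] = {}

    def append(self, topic: str, message: str) -> None:
        self._logs.setdefault(topic, []).append(message)

    def read(self, topic: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored log.
        return list(self._logs.get(topic, []))


class Broker:
    """Toy serving-layer node that keeps no message data of its own."""

    def __init__(self, name: str, storage: StorageLayer) -> None:
        self.name = name
        self._storage = storage

    def publish(self, topic: str, message: str) -> None:
        self._storage.append(topic, message)

    def read_all(self, topic: str) -> List[str]:
        return self._storage.read(topic)


storage = StorageLayer()
broker_1 = Broker("broker-1", storage)
broker_1.publish("orders", "order-1001")
broker_1.publish("orders", "order-1002")

# Simulate losing broker-1 and starting a replacement against the same storage.
del broker_1
broker_2 = Broker("broker-2", storage)
broker_2.publish("orders", "order-1003")
recovered = broker_2.read_all("orders")
```

In this sketch the replacement broker sees all three messages because the data never lived on the failed node. A real deployment also involves replication, topic ownership, and metadata coordination, none of which is modeled here.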

At its core, Pulsar uses a publish-and-subscribe technique for building streaming data applications. This method is desirable because different programs can subscribe to specific streams, filtering out the many others that are not of interest.
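
The publish-and-subscribe idea can be sketched in a few lines of Python. This is an in-memory toy, not the Pulsar client API: `ToyBroker`, its methods, and the topic names are assumptions made for illustration. It shows how each program receives only the streams it subscribed to.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """In-memory stand-in for a pub/sub broker (hypothetical, not the Pulsar API)."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber of the topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = ToyBroker()
billing_inbox: List[str] = []
audit_inbox: List[str] = []

# Billing cares only about orders; audit follows orders and logins.
broker.subscribe("orders", billing_inbox.append)
broker.subscribe("orders", audit_inbox.append)
broker.subscribe("logins", audit_inbox.append)

broker.publish("orders", "order-1001")
broker.publish("logins", "user-42")
broker.publish("clicks", "page-7")  # no subscribers, so nobody receives it
```

Publishers never address a consumer directly. They name a topic, and the broker decides who receives the message, which is what lets new subscribers be added without changing the producers.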

Pulsar is a multi-tenant, high-performance solution for server-to-server messaging. It can run on everything from bare-metal machines to Kubernetes clusters, both on-premises and in the cloud.

Some key features, according to the Apache community site, are:

  • Native support for multiple clusters in a Pulsar instance, with seamless geo-replication of messages across clusters.
  • Very low publish and end-to-end latency.
  • A simple client API with bindings for Java, Go, Python and C++.
  • Multiple subscription modes (exclusive, shared, and failover) for topics.
  • Guaranteed message delivery with persistent message storage provided by Apache BookKeeper.
  • Pulsar Functions, a lightweight serverless computing framework, offers the capability for stream-native data processing.
  • Pulsar IO, a serverless connector framework built on Pulsar Functions, makes it easier to move data in and out of Apache Pulsar.
  • Tiered Storage offloads data from hot/warm storage to cold/long-term storage (such as S3 and GCS) when the data is aging out.
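
The three subscription modes named in the list above differ in how messages on one subscription are divided among consumers. The following Python sketch simulates that difference under simplifying assumptions. The `dispatch` function and consumer names are hypothetical and not part of any Pulsar API. The sketch treats the first consumer in the list as the active one in failover mode, and it distributes shared-mode messages round-robin.

```python
from typing import Dict, List


def dispatch(messages: List[str], consumers: List[str], mode: str) -> Dict[str, List[str]]:
    """Assign messages to consumers under a simplified subscription mode."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    received: Dict[str, List[str]] = {name: [] for name in consumers}
    if mode == "exclusive":
        # Only a single consumer may attach to the subscription.
        if len(consumers) > 1:
            raise ValueError("exclusive mode allows only one consumer")
        received[consumers[0]].extend(messages)
    elif mode == "failover":
        # Assumption: the first consumer is active; the rest are idle standbys.
        received[consumers[0]].extend(messages)
    elif mode == "shared":
        # Messages are spread round-robin across all attached consumers.
        for index, message in enumerate(messages):
            received[consumers[index % len(consumers)]].append(message)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return received


messages = ["m1", "m2", "m3", "m4"]
shared = dispatch(messages, ["a", "b"], "shared")
failover = dispatch(messages, ["a", "b"], "failover")
# If the active consumer "a" disconnects, the standby "b" takes over.
after_failure = dispatch(messages, ["b"], "failover")
```

Shared mode trades per-topic ordering across consumers for throughput. Exclusive and failover modes keep a single active reader, so ordering is preserved. A real broker also tracks acknowledgments and redelivery, which this sketch leaves out.
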
Salvatore Salamone

About Salvatore Salamone

Salvatore Salamone is a physicist by training who has been writing about science and information technology for more than 30 years. During that time, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He also is the author of three business technology books.
