Will Apache Pulsar eventually kill Apache Kafka?

Tarun Manrai
2 min read · Mar 27, 2020



LinkedIn open-sourced Kafka in 2011, and for years it was one of the few good options for critical, large-scale messaging. As daily usage grew and message counts climbed into the billions, a newer platform emerged: Apache Pulsar, which was developed at Yahoo, open-sourced in 2016, and became a top-level Apache project in 2018. According to the research and analysis firm GigaOm, Pulsar is considerably faster than Kafka, with about 2.5 times the throughput and 40% lower latency. Pulsar supports both consumption models: Kafka-style, offset-based reading of a topic and traditional pub-sub message consumption. Pulsar also has a different architecture for data persistence. Unlike Kafka, it does not store log files on the local broker; it stores all data in a specialized data layer, Apache BookKeeper.
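To make the two consumption styles concrete, here is a minimal in-memory sketch. It is a toy model, not the Pulsar client API: the `Topic` class, its method names, and the "billing" subscription are all hypothetical, and new subscriptions start at the beginning of the log purely for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy stand-in for a topic's append-only message log (hypothetical)."""
    log: list = field(default_factory=list)
    # Subscription name -> index of the next message to deliver.
    cursors: dict = field(default_factory=dict)

    def publish(self, message):
        self.log.append(message)

    def receive(self, subscription):
        """Pub-sub style: the topic tracks the position per subscription."""
        position = self.cursors.setdefault(subscription, 0)
        if position >= len(self.log):
            return None  # nothing new for this subscription
        self.cursors[subscription] = position + 1  # treat delivery as acknowledged
        return self.log[position]

    def read_from(self, offset):
        """Reader style: the caller owns the offset and can start anywhere."""
        return self.log[offset:]


topic = Topic()
for event in ["a", "b", "c"]:
    topic.publish(event)

first = topic.receive("billing")   # position is remembered by the topic
second = topic.receive("billing")
replay = topic.read_from(0)        # caller picks the starting offset
```

In the pub-sub path the consumer never mentions a position, while in the reader path the position is entirely the caller's responsibility, which is the distinction the paragraph above describes.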

Pulsar also offers Function as a Service (FaaS) support through Pulsar Functions, a feature that lets real-time data streams be analyzed, aggregated, or summarized as they arrive. There are certain problems in Kafka that Pulsar can resolve. Here are a few of them.

· It is difficult to scale Kafka, as it stores data within the broker as distributed logs.

· Changing the number of partitions to allow more storage can disrupt message ordering.

· Cluster rebalancing can affect the performance of connected consumers and producers.

· The topics in Kafka can lose messages in failure scenarios.

· Additional real-time event analysis tools, such as Apache Heron, Apache Storm, or Apache Spark, are required to process the incoming traffic.
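The ordering problem in the second point above can be illustrated with a short sketch. It assumes keyed messages are routed by a stable hash of the key modulo the partition count. The helper names are hypothetical, and `zlib.crc32` is only a stand-in hash (Kafka's default partitioner uses murmur2), but the effect is the same: when the partition count changes, many keys land on a different partition, so newer messages for a key are no longer ordered relative to its older ones.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Route a key to a partition: stable hash modulo partition count.

    crc32 is an illustrative stand-in, not the hash Kafka actually uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count, new_count):
    """Return the keys whose partition changes when the count changes."""
    return [
        key for key in keys
        if partition_for(key, old_count) != partition_for(key, new_count)
    ]


keys = [f"order-{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 6)
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Any key in `moved` would have its earlier messages on one partition and its later messages on another, and ordering is only guaranteed within a single partition.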

This does not mean that Kafka is bad or that Pulsar will completely kill it. But for a large-scale messaging platform, the aspects of Kafka that cause problems are resolved by Pulsar. Thanks to its architecture, Pulsar is much faster than Kafka, which makes it an attractive option for engineers. However, shifting from Kafka to Pulsar involves a learning curve, and that cost has to be weighed in the ROI of the move.

Visit www.entradasoft.com for all types of consultancy and managed services in big data, data engineering, and cloud computing.
