Is Apache Kafka Right for You?
Apache Kafka is a durable publish-subscribe messaging system that exchanges data between processes, applications, and servers. This article gives an overview of messaging and distributed logs and defines the important concepts. Over the last few years, Apache Kafka has grown in both functionality and reach. One-third of the Fortune 500 companies use Kafka. Among its many functions, Kafka offers the following:
Messaging System: Messaging is widely used in two different ways:
· Queuing (such as SQS, Celery, etc.): Each message goes to exactly one worker, so the work is divided among the queue's consumers.
· Publish-Subscribe (such as SNS, PubNub, etc.): It acts like a notification system, and each subscriber gets a copy of every message.
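The difference between the two delivery models can be sketched in a few lines of plain Python. This is only an illustration and does not use Kafka or any messaging library; the function names, worker names, and the round-robin choice of worker are assumptions made for the example.

```python
from collections import defaultdict
from itertools import cycle


def deliver_queue(messages, workers):
    """Queue semantics: each message goes to exactly one worker (round-robin here)."""
    received = defaultdict(list)
    for message, worker in zip(messages, cycle(workers)):
        received[worker].append(message)
    return dict(received)


def deliver_pubsub(messages, subscribers):
    """Publish-subscribe semantics: every subscriber gets a copy of every message."""
    return {subscriber: list(messages) for subscriber in subscribers}


messages = ["m1", "m2", "m3", "m4"]
queue_result = deliver_queue(messages, ["worker-a", "worker-b"])
pubsub_result = deliver_pubsub(messages, ["sub-a", "sub-b"])
print(queue_result)   # the work is split between the two workers
print(pubsub_result)  # both subscribers see all four messages
```

In the queue case the four messages are divided between the workers, while in the publish-subscribe case each subscriber ends up with all four.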
Stream Processing: The Streams API builds on the messaging system and lets you process messages as continuous streams. Kafka makes the following easy to perform:
· Stateless operations, such as filtering and transforming stream messages.
· Stateful operations, such as joins and aggregations over a time window.
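To make the stateless versus stateful distinction concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API (which is a Java/Scala library); the event data, the function names, and the 60-second tumbling window are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical page-view events: (timestamp in seconds, user, page)
events = [
    (0, "alice", "/home"),
    (12, "bob", "/pricing"),
    (47, "alice", "/pricing"),
    (61, "carol", "/home"),
    (75, "bob", "/pricing"),
]


def stateless_pipeline(stream):
    """Filter and transform: each event is handled on its own, no memory needed."""
    for timestamp, user, page in stream:
        if page == "/pricing":              # filter
            yield timestamp, user.upper()   # transform


def windowed_counts(stream, window_seconds=60):
    """Stateful aggregation: count events per tumbling time window."""
    counts = Counter()  # state that must survive from one event to the next
    for timestamp, _user, _page in stream:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


pricing_views = list(stateless_pipeline(events))
views_per_minute = windowed_counts(events)
print(pricing_views)
print(views_per_minute)
```

The stateless pipeline can be restarted or run in parallel freely, because it remembers nothing. The windowed count has to keep its counters somewhere, which is the kind of state a stream-processing framework manages for you.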
Use cases of Kafka
· Kafka helps propagate changes between components: when the subscribers form a single consumer group, each message is delivered to only one node in the group, which then copies it on to the other components.
· Kafka helps track and analyze website activity.
· Incoming data can be published to Kafka topics and processed with the Streams API.
How does Kafka work?
From the outside, Kafka looks like a publish-subscribe system that delivers messages persistently, in order, and at scale. Its topics are split into partitions, which enables parallel consumption. Messages written to Kafka are persisted and replicated to peer brokers for fault tolerance, and they are retained for a configurable period (which could be 7 days, 30 days, or any other length). The log is the key to Kafka: a time-ordered, append-only sequence of records, where a record can hold any kind of data.
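The log idea can be modeled with a small toy class. This is a sketch under simplifying assumptions, not how Kafka stores data on disk: the class name PartitionLog, its methods, and the timestamps are hypothetical, and it models a single partition in memory with no replication.

```python
class PartitionLog:
    """Toy model of one partition: an append-only list addressed by offset."""

    def __init__(self):
        self._records = []      # each entry is (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, timestamp):
        """Add a record at the end of the log and return its offset."""
        offset = self._next_offset
        self._records.append((offset, timestamp, value))
        self._next_offset += 1
        return offset

    def read(self, from_offset):
        """Return (offset, value) pairs at or after from_offset, in log order."""
        return [(o, v) for o, _t, v in self._records if o >= from_offset]

    def expire(self, now, retention_seconds):
        """Drop records older than the retention period; offsets are never reused."""
        self._records = [r for r in self._records if now - r[1] <= retention_seconds]


log = PartitionLog()
log.append("signup", timestamp=0)
log.append("login", timestamp=100)
last_offset = log.append("purchase", timestamp=200)

all_records = log.read(0)
log.expire(now=250, retention_seconds=200)   # "signup" is now too old
after_expiry = log.read(0)
print(all_records)
print(after_expiry)
```

Records are only ever added at the end and removed from the old end by retention, and a reader picks its own starting offset, so the same data can be read many times by different consumers.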
When should you not use Kafka?
Kafka is fast because it treats the log structure as a first-class citizen, but this design comes with its own set of limitations:
· Kafka does not have individual message IDs. The messages are addressed by their offset in the log.
· Kafka does not track which individual messages a consumer has processed. It records only the offset each consumer has reached in the log.
· Kafka's higher-level abstractions, such as the Streams API, are available only for Java and Scala, so services written in other languages cannot use them.
· Within a consumer group, each partition is consumed by only one consumer. A flood of tasks on a single partition can therefore cause starvation, and adding new consumers will not help.
· If you handle only about a thousand messages a day, Kafka may not be the right choice. It is built for large-scale stream processing.
· Individual messages cannot be deleted from Kafka. It keeps every part of the log for a configured retention period.
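The point about partitions and consumers can be illustrated with a short sketch. It uses a simple round-robin assignment as a stand-in, which is an assumption for the example and not Kafka's actual assignment logic; the function name, partition names, and consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["orders-0", "orders-1"]

two_consumers = assign_partitions(partitions, ["c1", "c2"])
four_consumers = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])

# Consumers that received no partition have nothing to read.
idle = [consumer for consumer, owned in four_consumers.items() if not owned]
print(two_consumers)
print(four_consumers)
print(idle)
```

With two partitions, the third and fourth consumers receive nothing, so a backlog on one partition is not relieved by adding consumers. Under this model the usual remedy is more partitions, not more consumers.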