Apache Kafka

Apache Kafka

Apache Kafka is a framework implementation of a software bus using stream-processing. It is an open-source software platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.Apache Kafka is a distributed data store optimized for ingesting and processing streaming data in real-time. Streaming data is data that is continuously generated by thousands of data sources, which typically send the data records in simultaneously. A streaming platform needs to handle this constant influx of data, and process the data sequentially and incrementally.

Kafka provides three main functions to its users:

  • Publish and subscribe to streams of records
  • Effectively store streams of records in the order in which records were generated
  • Process streams of records in real time

Benefits of Kafka's approach

Scalable

Kafka’s partitioned log model allows data to be distributed across multiple servers, making it scalable beyond what would fit on a single server. 

Fast

Kafka decouples data streams so there is very low latency, making it extremely fast.

Durable

Partitions are distributed and replicated across many servers, and the data is all written to disk. This helps protect against server failure, making the data very fault-tolerant and durable.