11 Comments
Favour Lawrence's avatar

This article is so comprehensive. One can find their way around Kafka with this.

Dinesh Solanki's avatar

This article is awesome. It explains every capability of the tool so that we can go deeper as needed. As a beginner in Kafka, I find this very helpful. Thanks!

Mridul's avatar

This article is so helpful and easy to follow.

One suggestion: allow click-to-zoom on the images for mobile users.

Snowfire's avatar

I currently work on a team that heavily uses and builds on top of Kafka infrastructure (Kafka itself, plus Kafka Streams and Kafka Connect), and until this article I had not come across anything this crisp, succinct, and comprehensive. I have already bookmarked it as my go-to article moving forward! Thank you, Stanislav!

Stanislav Kozlovski's avatar

🙇‍♂️

Neural Foundry's avatar

This is hands down one of the most comprehensive explainers on Kafka I've seen. Breaking down the whole stream-table duality thing and explaining how the metadata log works with KRaft was super helpful; I always found that aspect a bit confusing. The tiered storage section is also fascinating. I didn't realize how much cost savings you could get by offloading to S3. One thing I'm curious about is the consumer group protocol details: how does the coordinator handle a situation where a consumer is slow but not dead? Does it just wait indefinitely, or is there some kind of timeout? Either way, great stuff, definitely bookmarking this!

Stanislav Kozlovski's avatar

The relevant consumer timeouts are `session.timeout.ms` and `max.poll.interval.ms`; I wrote about them back in the day here: https://www.confluent.io/blog/apache-kafka-data-access-semantics-consumers-and-membership/

Basically, as long as the consumer heartbeats and actively polls for messages, it’ll remain in the group. It can accumulate consumer lag in that time.
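
To make the "slow but not dead" case concrete, here is a rough toy model in Python. It is only a sketch, not Kafka's actual coordinator code: the `ConsumerState` class and `membership_status` function are hypothetical names made up for illustration, and the two constants assume the consumer settings `session.timeout.ms` and `max.poll.interval.ms` at what I believe are their recent default values.

```python
from dataclasses import dataclass

# Assumed values, mirroring recent defaults of the consumer settings
# session.timeout.ms and max.poll.interval.ms.
SESSION_TIMEOUT_MS = 45_000
MAX_POLL_INTERVAL_MS = 300_000


@dataclass
class ConsumerState:
    """Hypothetical snapshot of one consumer's activity timestamps."""
    last_heartbeat_ms: int
    last_poll_ms: int


def membership_status(
    state: ConsumerState,
    now_ms: int,
    session_timeout_ms: int = SESSION_TIMEOUT_MS,
    max_poll_interval_ms: int = MAX_POLL_INTERVAL_MS,
) -> str:
    """Toy check of whether a consumer would still be in its group."""
    # No heartbeat within the session timeout: treated as dead.
    if now_ms - state.last_heartbeat_ms > session_timeout_ms:
        return "evicted: session timeout"
    # Heartbeating but not polling: treated as stuck, so it leaves the group.
    if now_ms - state.last_poll_ms > max_poll_interval_ms:
        return "left: poll interval exceeded"
    # Heartbeating and polling in time: stays a member, even if it lags.
    return "member"


# A slow consumer: last poll was 100 s ago, but it is still heartbeating.
slow = ConsumerState(last_heartbeat_ms=99_000, last_poll_ms=0)
slow_status = membership_status(slow, now_ms=100_000)

# A stuck consumer: heartbeats continue, but no poll for 400 s.
stuck = ConsumerState(last_heartbeat_ms=399_000, last_poll_ms=0)
stuck_status = membership_status(stuck, now_ms=400_000)

# A dead consumer: no heartbeat for 50 s.
dead = ConsumerState(last_heartbeat_ms=0, last_poll_ms=0)
dead_status = membership_status(dead, now_ms=50_000)
```

The point of the model is that slowness alone never removes a consumer: only a missed heartbeat window or a missed poll window does, and lag can grow freely in between.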

If you’re surprised how much S3 can save you, you’ll be blown away by how much you can save by eliminating networking: https://topicpartition.io/blog/kip-1150-diskless-topics-in-apache-kafka

Alex's avatar

Incredible article! I was going to use Kafka for an earlier draft of my project. I scaled the project down and realized it had become overkill. But I really do like Kafka a lot.

Stanislav Kozlovski's avatar

For small use cases, you should just use Postgres: https://topicpartition.io/blog/postgres-pubsub-queue-benchmarks

Ankit's avatar

One of the best high-level overviews of Kafka.

Pathwings's avatar

Awesome article...
