This document discusses building leakproof stream processing pipelines with Apache Kafka and Apache Spark. It provides an overview of offset management in Spark Streaming from Kafka, including storing offsets in external data stores like ZooKeeper, Kafka, and HBase. The document also covers Spark Streaming Kafka consumer types and workflows, and addressing issues like maintaining offsets during planned and unplanned maintenance or application errors.
Related topics: