Kafka Streams Study Notes 5: Processor API

weixin_40455124

License: CC 4.0 BY-SA

Category: kafka. Tags: kafka stream process api

Article link: https://blog.csdn.net/weixin_40455124/article/details/126863690

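This article looks at how to combine the DSL (Domain Specific Language) and the Processor API when building Apache Kafka Streams applications. It covers the advantages, such as access to record metadata, scheduled tasks, and finer-grained control, as well as the drawbacks, such as code complexity and maintenance cost. The core content covers adding source processors, stateful and stateless processors, and using aggregation and sinks.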

Summary

Kafka Streams allows us to mix both the DSL and the Processor API in an application.

Advantages

Access to record metadata (topic, partition, offset information, record headers, and so on)
Ability to schedule periodic functions (not supported by the DSL)
More fine-grained control over when records get forwarded to downstream processors
More granular access to state stores
Ability to circumvent any limitations you come across in the DSL

Disadvantages

More verbose code, which can lead to higher maintenance costs and impair readability
A higher barrier to entry for other project maintainers
More footguns, including accidental reinvention of DSL features or abstractions, exotic problem-framing, and performance traps

Main Topology methods

addSource: creates a source processor.
addProcessor: adds a processor, which must be attached to a parent (the source processor or an upstream processor) via parentNames.

  public <KIn, VIn, KOut, VOut> Topology addProcessor(
      String name,
      org.apache.kafka.streams.processor.api.ProcessorSupplier<KIn, VIn, KOut, VOut> supplier,
      String... parentNames)

Stateless Processors

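  public interface Processor<K, V> {
    void init(ProcessorContext context);
    void process(K key, V value);
    void close();
  }

This is the older org.apache.kafka.streams.processor.Processor interface. The newer org.apache.kafka.streams.processor.api.Processor<KIn, VIn, KOut, VOut>, which the addProcessor signature above expects, receives a Record instead: process(Record<KIn, VIn> record).

  context.forward(newRecord); // sends the record to the downstream (child) processors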
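To make the forwarding idea concrete, here is a small toy model in plain Java. It is not the Kafka Streams API: Node, UpperCase, Collector and run are hypothetical names used only for illustration. It shows that a downstream node sees a record only when its parent calls forward, so a stateless processor can filter simply by not forwarding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of parent/child forwarding. NOT the Kafka Streams API:
// Node, UpperCase and Collector are hypothetical names for illustration.
class ForwardingModel {

    /** A node in the graph; forward() hands a record to every child. */
    static abstract class Node {
        final List<Node> children = new ArrayList<>();

        abstract void process(String key, String value);

        void forward(String key, String value) {
            for (Node child : children) {
                child.process(key, value);
            }
        }
    }

    /** Stateless step: drops null values, upper-cases the rest, forwards. */
    static class UpperCase extends Node {
        @Override
        void process(String key, String value) {
            if (value == null) {
                return; // nothing is forwarded, so children never see it
            }
            forward(key, value.toUpperCase(Locale.ROOT));
        }
    }

    /** Plays the role of a sink: remembers whatever reaches it. */
    static class Collector extends Node {
        final List<String> seen = new ArrayList<>();

        @Override
        void process(String key, String value) {
            seen.add(key + "=" + value);
        }
    }

    /** Wires upper -> sink, feeds two records, returns what the sink saw. */
    static List<String> run() {
        UpperCase upper = new UpperCase();
        Collector sink = new Collector();
        upper.children.add(sink);

        upper.process("k1", "hello");
        upper.process("k2", null);
        return sink.seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In a real topology the wiring is done by name through addProcessor and parentNames rather than by adding children directly.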

Stateful Processors

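The main difference is in how the state store is materialized. In the DSL, the store is passed to the operator through Materialized:

  KeyValueBytesStoreSupplier storeSupplier =
      Stores.persistentTimestampedKeyValueStore("my-store");
  grouped.aggregate(
      initializer,
      adder,
      Materialized.<String, String>as(storeSupplier));

In the Processor API there is no aggregate-style operator that sets up a store for you. You create the state store yourself, add it to the topology, and connect it to the processors that use it:

  builder.addStateStore(storeBuilder, "Digital Twin Processor");

Inside the processor (typically in init), look the store up by name, then read and write it:

  this.kvStore = (KeyValueStore) context.getStateStore("digital-twin-store");

  kvStore.get(key);
  kvStore.put(key, digitalTwin);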

Sink (output)

addSink: adds a sink processor that writes records to an output topic.

Periodic Functions with Punctuate

This cannot be done in the DSL; it is only available through the Processor API.

  this.context.schedule(
        Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, this::enforceTtl);

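schedule returns a Cancellable object when the punctuator is created. Keep it so the punctuator can be cancelled later, for example in close:

  @Override
  public void close() {
    // cancel the punctuator
    punctuator.cancel();
  }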
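The schedule call above passes this::enforceTtl, but these notes do not show its body. Below is a minimal sketch of what such a callback might do, in plain Java. It makes several assumptions: a HashMap stands in for the state store, the store keeps a last-seen timestamp per key, and TtlSketch, touch and contains are hypothetical names. The only part taken from Kafka Streams is the callback shape, which matches Punctuator#punctuate(long timestamp).

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

// Sketch only: a HashMap stands in for the state store, and the
// "last seen" layout is an assumption, not taken from the notes.
class TtlSketch {
    private final Map<String, Long> lastSeenMs = new HashMap<>();
    private final long ttlMs;

    TtlSketch(Duration ttl) {
        this.ttlMs = ttl.toMillis();
    }

    /** Would be called while processing a record: remember when the key was last updated. */
    void touch(String key, long nowMs) {
        lastSeenMs.put(key, nowMs);
    }

    /** Same shape as Punctuator#punctuate(long timestamp): evict entries older than the TTL. */
    void enforceTtl(long timestamp) {
        lastSeenMs.entrySet().removeIf(e -> timestamp - e.getValue() > ttlMs);
    }

    boolean contains(String key) {
        return lastSeenMs.containsKey(key);
    }

    public static void main(String[] args) {
        TtlSketch store = new TtlSketch(Duration.ofSeconds(5));
        store.touch("old", 0L);
        store.touch("fresh", 9_000L);
        store.enforceTtl(10_000L); // "old" is 10s stale, "fresh" only 1s
        System.out.println(store.contains("old") + " " + store.contains("fresh"));
    }
}
```

With a real KeyValueStore the same idea would iterate over the store and delete expired keys instead of calling removeIf on a map.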

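Types of punctuation (trigger modes)

Stream time: the function does not execute unless data arrives on a continuous basis. Stream time only advances when new records come in, so subsequent records are required.

Wall clock time: the function executes whether or not new records arrive. This means periodic functions will continue to execute regardless of whether or not new messages arrive.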
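The difference between the two modes can be sketched with a toy model in plain Java. This is not the Kafka Streams API, and the exact firing times in a real application may differ (for instance around the first punctuation). PunctuationModel and its methods are hypothetical names. The point is only that the stream-time check happens when a record arrives, while the wall-clock check happens on every tick.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two trigger modes. NOT the Kafka Streams API, and the
// exact firing times in a real application may differ from this model.
class PunctuationModel {

    /** Stream time only moves when a record arrives, so firing is checked per record. */
    static List<Long> streamTimeFirings(long[] recordTimestampsMs, long intervalMs) {
        List<Long> firings = new ArrayList<>();
        long streamTime = Long.MIN_VALUE;
        long nextDue = Long.MIN_VALUE;
        boolean started = false;
        for (long ts : recordTimestampsMs) {
            streamTime = Math.max(streamTime, ts); // stream time never moves backwards
            if (!started) {
                nextDue = streamTime + intervalMs; // first record starts the clock
                started = true;
            } else if (streamTime >= nextDue) {
                firings.add(streamTime);
                nextDue = streamTime + intervalMs;
            }
        }
        return firings;
    }

    /** Wall-clock time advances on its own, so firing happens every interval. */
    static List<Long> wallClockFirings(long startMs, long endMs, long intervalMs) {
        List<Long> firings = new ArrayList<>();
        for (long t = startMs + intervalMs; t <= endMs; t += intervalMs) {
            firings.add(t);
        }
        return firings;
    }

    public static void main(String[] args) {
        long[] records = {0L, 5_000L, 12_000L, 40_000L};
        System.out.println("stream time: " + streamTimeFirings(records, 10_000L));
        System.out.println("wall clock:  " + wallClockFirings(0L, 40_000L, 10_000L));
    }
}
```

In this model no stream-time punctuation happens between the records at 12s and 40s, while the wall-clock variant also fires at 20s and 30s.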

Accessing Record Metadata

Record headers: context.headers()
Offset: context.offset()
Partition: context.partition()
Timestamp: context.timestamp()
Topic: context.topic()

Combining the Processor API with the DSL

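Processors: applying a processor is a terminal operation (meaning it returns void and downstream operators cannot be chained). The Processor is applied to one record at a time.

Transformers (several *Transformer* interface variants): these can return one or more records, depending on which variation you use, and are therefore the better choice if you need to chain a downstream operator.

Note that this describes the API at the time these notes were written. In Kafka Streams 3.3 and later, KStream#process accepts the newer processor.api types and returns a KStream, and the transform methods are deprecated.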