Summary
Kafka Streams allows us to mix both the DSL and the Processor API in one application.
Advantages
- Access to record metadata (topic, partition, offset information, record headers, and so on)
- Ability to schedule periodic functions (not supported by the DSL)
- More fine-grained control over when records get forwarded to downstream processors
- More granular access to state stores
- Ability to circumvent any limitations you come across in the DSL
Disadvantages
- More verbose code, which can lead to higher maintenance costs and impair readability
- A higher barrier to entry for other project maintainers
- More footguns, including accidental reinvention of DSL features or abstractions, exotic problem-framing, and performance traps
Main Topology methods
- addSource: creates a source processor.
- addProcessor: adds a processor that must be attached to at least one parent (a source processor or an upstream processor), referenced by name.
public <KIn, VIn, KOut, VOut> Topology addProcessor(
    String name,
    org.apache.kafka.streams.processor.api.ProcessorSupplier<KIn, VIn, KOut, VOut> supplier,
    String... parentNames)
- Stateless Processors
public interface Processor<K, V> {
void init(ProcessorContext context);
void process(K key, V value);
void close();
}
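context.forward(newRecord); sends the record to the next (downstream) processor.
Note: forward(Record) belongs to the newer org.apache.kafka.streams.processor.api interfaces, where process takes a single Record<KIn, VIn>; the interface shown above is the older two-argument form.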
- Stateful Processors
The main difference from the DSL is how the state store is materialized. In the DSL:
KeyValueBytesStoreSupplier storeSupplier =
Stores.persistentTimestampedKeyValueStore("my-store");
grouped.aggregate(
initializer,
adder,
Materialized.<String, String>as(storeSupplier));
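DSL operators such as aggregate fall back to an internal state store when no Materialized is given; passing a Materialized chooses the store explicitly. The Processor API has no such default: the state store must be added to the topology, connected to the processors that use it, and then looked up by name inside the processor.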
builder.addStateStore(storeBuilder, "Digital Twin Processor");
this.kvStore = (KeyValueStore) context.getStateStore("digital-twin-store");
kvStore.get(key)/kvStore.put(key, digitalTwin);
- Sink processors (output)
addSink: creates a sink processor that writes records to an output topic.
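The wiring rules above (each processor and sink names its parents, and forwarding passes a record on to the children) can be illustrated with a toy in-memory model. This is plain Java, not the Kafka Streams API: the ToyTopology class, its pipe and outputOf methods, and the String-only values are all hypothetical simplifications that ignore keys, serdes, topics, and partitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model only: mimics how topology nodes are wired together by parent name.
// None of these types are Kafka Streams classes.
class ToyTopology {
    private final Map<String, Function<String, String>> logic = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, List<String>> sinkOutput = new HashMap<>();

    // A source has no parents; it passes values through unchanged.
    ToyTopology addSource(String name) {
        add(name, Function.identity());
        return this;
    }

    // A processor must be attached to at least one already-registered parent.
    ToyTopology addProcessor(String name, Function<String, String> fn, String... parentNames) {
        requireParents(name, parentNames);
        add(name, fn, parentNames);
        return this;
    }

    // A sink collects whatever its parents forward to it.
    ToyTopology addSink(String name, String... parentNames) {
        requireParents(name, parentNames);
        add(name, Function.identity(), parentNames);
        sinkOutput.put(name, new ArrayList<>());
        return this;
    }

    private void requireParents(String name, String... parentNames) {
        if (parentNames.length == 0) {
            throw new IllegalArgumentException(name + " needs at least one parent");
        }
    }

    private void add(String name, Function<String, String> fn, String... parentNames) {
        if (logic.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate node name: " + name);
        }
        // Validate parents before registering, so a node cannot be its own parent.
        for (String parent : parentNames) {
            if (!logic.containsKey(parent)) {
                throw new IllegalArgumentException("Unknown parent: " + parent);
            }
        }
        logic.put(name, fn);
        children.put(name, new ArrayList<>());
        for (String parent : parentNames) {
            children.get(parent).add(name);
        }
    }

    // Feed one value into a node and let the result flow through the graph.
    void pipe(String nodeName, String value) {
        String result = logic.get(nodeName).apply(value);
        if (sinkOutput.containsKey(nodeName)) {
            sinkOutput.get(nodeName).add(result);
        }
        // "Forward": hand the result to every child of this node.
        for (String child : children.get(nodeName)) {
            pipe(child, result);
        }
    }

    List<String> outputOf(String sinkName) {
        return sinkOutput.get(sinkName);
    }
}
```

For example, a source named "source", a processor "upper" (String::toUpperCase) attached to "source", and a sink "sink" attached to "upper" form a three-node chain; piping a value into "source" delivers the upper-cased value to the sink. Attaching a node to a parent name that has not been added yet fails immediately.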
Periodic Functions with Punctuate
This cannot be done in the DSL; it is only available through the Processor API.
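this.punctuator = this.context.schedule(
    Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, this::enforceTtl);
schedule returns a Cancellable object when the punctuator is created; keep a reference to it so the punctuator can be cancelled later.
@Override
public void close() {
  // cancel the punctuator
  punctuator.cancel();
}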
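The enforceTtl callback passed to schedule is not shown in these notes. The following is a minimal sketch of what such a TTL check might do, with a plain HashMap standing in for the state store. The TtlSketch class, the lastSeen map, and the put and contains helpers are hypothetical; in a real punctuator the timestamp argument is supplied by Kafka Streams and the entries would come from the state store rather than a map.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a HashMap plays the role of the key-value state store.
class TtlSketch {
    // key -> time of the last update, in milliseconds
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long ttlMs;

    TtlSketch(Duration ttl) {
        this.ttlMs = ttl.toMillis();
    }

    void put(String key, long timestampMs) {
        lastSeen.put(key, timestampMs);
    }

    boolean contains(String key) {
        return lastSeen.containsKey(key);
    }

    // Same shape as a punctuator callback: receives the current time in ms
    // and removes every entry that has not been updated within the TTL.
    void enforceTtl(long nowMs) {
        lastSeen.entrySet().removeIf(entry -> nowMs - entry.getValue() > ttlMs);
    }
}
```

With a TTL of 10 seconds, an entry last updated at time 0 is removed by a call at 12,000 ms, while an entry updated at 8,000 ms is kept.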
Types of punctuation (trigger modes)
Stream time: the function does not execute unless data arrives on a continuous basis, because stream time only advances when new records arrive.
Wall clock time: the function executes whether or not new records arrive, so periodic functions continue to run even when the input is idle.
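The difference between the two trigger modes can be illustrated with a small simulation. This is a simplified model in plain Java, not Kafka Streams code: the PunctuationSketch class and its onRecord and onClockTick methods are hypothetical, times are plain millisecond values passed in by the caller, and the real scheduler handles details that this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of one periodic function under each trigger mode.
// Not Kafka Streams code: the caller supplies all times in milliseconds.
class PunctuationSketch {
    private final long intervalMs;
    private long streamTime = 0; // highest record timestamp seen so far
    private long nextStreamFire;
    private long nextWallClockFire;
    final List<Long> streamTimeFirings = new ArrayList<>();
    final List<Long> wallClockFirings = new ArrayList<>();

    PunctuationSketch(long intervalMs) {
        this.intervalMs = intervalMs;
        this.nextStreamFire = intervalMs;
        this.nextWallClockFire = intervalMs;
    }

    // Stream time only moves when a record arrives, so this is the only
    // place where a stream-time punctuation can be triggered.
    void onRecord(long recordTimestampMs) {
        streamTime = Math.max(streamTime, recordTimestampMs);
        if (streamTime >= nextStreamFire) {
            streamTimeFirings.add(streamTime);
            nextStreamFire = nextAfter(nextStreamFire, streamTime);
        }
    }

    // Called as the system clock advances, whether or not records arrived.
    void onClockTick(long wallClockMs) {
        if (wallClockMs >= nextWallClockFire) {
            wallClockFirings.add(wallClockMs);
            nextWallClockFire = nextAfter(nextWallClockFire, wallClockMs);
        }
    }

    // Next scheduled time strictly after 'now', skipping any missed intervals.
    private long nextAfter(long scheduled, long now) {
        long missed = (now - scheduled) / intervalMs;
        return scheduled + (missed + 1) * intervalMs;
    }
}
```

With a 10-second interval, two early records (at 1,000 and 2,000 ms) followed by silence produce no stream-time firings, while clock ticks at 10,000, 20,000, and 30,000 ms each fire the wall-clock function. The stream-time function fires only once a later record (for example at 35,000 ms) moves stream time past the schedule.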
Accessing Record Metadata
Record headers context.headers()
Offset context.offset()
Partition context.partition()
Timestamp context.timestamp()
Topic context.topic()
Combining the Processor API with the DSL
Processors: a processor is a terminal operation (it returns void, so downstream operators cannot be chained). It applies a Processor to one record at a time.
Transformers (several *Transformer* interface variants) can return one or more records, depending on which variant you use, and are therefore a better fit if you need to chain a downstream operator.
Note: this describes the older API; in Kafka Streams 3.3 and later, process also returns a KStream and the transform methods are deprecated.