Flink Timers in Practice: Processing Time and Event Time

渣渣盟

Published 2025-05-16 08:00:00


License: CC 4.0 BY-SA


Article link: https://blog.csdn.net/m0_57376564/article/details/147991095



package processfunction

import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}
import org.apache.flink.streaming.api.functions.{KeyedProcessFunction, ProcessFunction}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector
import source.{ClickSource, Event}
/**
 *
 * @PROJECT_NAME: flink1.13
 * @PACKAGE_NAME: processfunction
 * @author: 赵嘉盟-HONOR
 * @date: 2023-11-23 23:48
 * @DESCRIPTION
 *
 */
object TimeTimer {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val data = env.addSource(new ClickSource).assignAscendingTimestamps(_.timestamp)

    // Timer based on processing time
    data.keyBy(data => "data").process(new KeyedProcessFunction[String, Event, String] {
      override def processElement(i: Event, context: KeyedProcessFunction[String, Event, String]#Context, collector: Collector[String]): Unit = {
        val currentTime = context.timerService().currentProcessingTime()
        collector.collect("Element arrived, current processing time: " + currentTime)
        // Fire 5 seconds of wall-clock time after this element was processed
        context.timerService().registerProcessingTimeTimer(currentTime + 5000L)
      }
      override def onTimer(timestamp: Long, ctx: KeyedProcessFunction[String, Event, String]#OnTimerContext, out: Collector[String]): Unit = {
        out.collect("Timer fired, timer timestamp: " + timestamp)
      }
    }).print("ProcessingTimeTimer")

    // Timer based on event time
    val data1 = env.addSource(new EventSource).assignAscendingTimestamps(_.timestamp)
    data1.keyBy(data => "data").process(new KeyedProcessFunction[String, Event, String] {
      override def processElement(i: Event, context: KeyedProcessFunction[String, Event, String]#Context, collector: Collector[String]): Unit = {
        val currentTime = context.timerService().currentWatermark()
        collector.collect(s"Element arrived, current watermark: $currentTime, element timestamp: ${i.timestamp}")
        // Fire once the watermark reaches the element's timestamp plus 5000 ms
        context.timerService().registerEventTimeTimer(i.timestamp + 5000L)
      }

      override def onTimer(timestamp: Long, ctx: KeyedProcessFunction[String, Event, String]#OnTimerContext, out: Collector[String]): Unit = {
        out.collect("Timer fired, timer timestamp: " + timestamp)
      }
    }).print("EventTimeTimer")

    env.execute("ProcessingTimeTimer")
  }
  class EventSource extends RichSourceFunction[Event] {
    override def run(sourceContext: SourceFunction.SourceContext[Event]): Unit = {
      sourceContext.collect(Event("Mary","./root",100L))
      Thread.sleep(5000L)
      sourceContext.collect(Event("Mary", "./root", 200L))
      Thread.sleep(5000L)
      sourceContext.collect(Event("Mary", "./root", 1000L))
      Thread.sleep(5000L)
      sourceContext.collect(Event("Mary", "./root", 6000L))
      Thread.sleep(5000L)
      sourceContext.collect(Event("Mary", "./root", 6001L))
      Thread.sleep(5000L)
    }

    // run() emits a fixed set of events and then returns, so there is nothing to stop here.
    override def cancel(): Unit = ()
  }
}

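This code shows how to use Apache Flink's KeyedProcessFunction to implement timers based on processing time and on event time. The rest of the article walks through the code and then covers the background concepts.

Code walkthrough

1. Environment setup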

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)

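StreamExecutionEnvironment.getExecutionEnvironment: obtains the stream execution environment.
env.setParallelism(1): sets the parallelism to 1 so that the output is easy to follow while debugging.

2. Timer based on processing time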

val data = env.addSource(new ClickSource).assignAscendingTimestamps(_.timestamp)
data.keyBy(data => "data").process(new KeyedProcessFunction[String, Event, String] {
  override def processElement(i: Event, context: KeyedProcessFunction[String, Event, String]#Context, collector: Collector[String]): Unit = {
    val currentTime = context.timerService().currentProcessingTime()
    collector.collect("Element arrived, current processing time: " + currentTime)
    context.timerService().registerProcessingTimeTimer(currentTime + 5000L)
  }
  override def onTimer(timestamp: Long, ctx: KeyedProcessFunction[String, Event, String]#OnTimerContext, out: Collector[String]): Unit = {
    out.collect("Timer fired, timer timestamp: " + timestamp)
  }
}).print("ProcessingTimeTimer")

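addSource(new ClickSource): reads data from the custom source ClickSource.
assignAscendingTimestamps(_.timestamp): assigns each element its timestamp and generates watermarks on the assumption that timestamps never decrease. The processing-time timer does not depend on these timestamps.
keyBy(data => "data"): routes every element to the same key, because timers are only available on a keyed stream.
process: handles the stream with a KeyedProcessFunction.
- processElement: called for each element; it reads the current processing time and registers a timer 5 seconds later.
- onTimer: the callback that runs when a timer fires; its timestamp argument is the time the timer was registered for.
print: prints the results.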
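Because processElement registers one timer per element, a busy key can accumulate many timers. Flink keeps only one timer per key and timestamp, so rounding the target time to a coarser resolution lets nearby registrations collapse into one. The sketch below models that idea in plain Scala, with no Flink dependency; the object name, the helper `coalesce`, and the arrival times are hypothetical and chosen only for illustration.

```scala
object TimerCoalescing {
  /** Rounds a target fire time down to a multiple of `resolutionMs`. */
  def coalesce(fireTime: Long, resolutionMs: Long): Long =
    (fireTime / resolutionMs) * resolutionMs

  def main(args: Array[String]): Unit = {
    // Hypothetical processing times (ms) at which five elements arrive.
    val arrivals = Seq(10000L, 10120L, 10480L, 10730L, 10999L)

    // One timer per element when the exact time (arrival + 5 s) is used.
    val exact = arrivals.map(_ + 5000L).distinct

    // With 1-second resolution, all five registrations share one timestamp.
    val rounded = arrivals.map(t => coalesce(t + 5000L, 1000L)).distinct

    println(s"timers without coalescing: ${exact.size}")
    println(s"timers with 1 s resolution: ${rounded.size}")
  }
}
```

In a real job the rounded value would be the argument passed to registerProcessingTimeTimer. The trade-off is precision: rounding down means a timer may fire up to one resolution step earlier than the exact time.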

3. Timer based on event time

val data1 = env.addSource(new EventSource).assignAscendingTimestamps(_.timestamp)
data1.keyBy(data => "data").process(new KeyedProcessFunction[String, Event, String] {
  override def processElement(i: Event, context: KeyedProcessFunction[String, Event, String]#Context, collector: Collector[String]): Unit = {
    val currentTime = context.timerService().currentWatermark()
    collector.collect(s"Element arrived, current watermark: $currentTime, element timestamp: ${i.timestamp}")
    context.timerService().registerEventTimeTimer(i.timestamp + 5000L)
  }
  override def onTimer(timestamp: Long, ctx: KeyedProcessFunction[String, Event, String]#OnTimerContext, out: Collector[String]): Unit = {
    out.collect("Timer fired, timer timestamp: " + timestamp)
  }
}).print("EventTimeTimer")

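addSource(new EventSource): reads data from the custom source EventSource.
assignAscendingTimestamps(_.timestamp): assigns each element its timestamp and generates watermarks on the assumption that timestamps never decrease.
keyBy(data => "data"): routes every element to the same key.
process: handles the stream with a KeyedProcessFunction.
- processElement: called for each element; it reads the current watermark and registers a timer at the element's timestamp plus 5000 ms, that is, 5 seconds later in event time.
- onTimer: the callback that runs once the watermark reaches the timer's timestamp.
print: prints the results.

4. Custom data source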

class EventSource extends RichSourceFunction[Event] {
  override def run(sourceContext: SourceFunction.SourceContext[Event]): Unit = {
    sourceContext.collect(Event("Mary", "./root", 100L))
    Thread.sleep(5000L)
    sourceContext.collect(Event("Mary", "./root", 200L))
    Thread.sleep(5000L)
    sourceContext.collect(Event("Mary", "./root", 1000L))
    Thread.sleep(5000L)
    sourceContext.collect(Event("Mary", "./root", 6000L))
    Thread.sleep(5000L)
    sourceContext.collect(Event("Mary", "./root", 6001L))
    Thread.sleep(5000L)
  }
  // run() emits a fixed set of events and then returns, so there is nothing to stop here.
  override def cancel(): Unit = ()
}

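run: simulates data generation by emitting one element every 5 seconds of wall-clock time; the event timestamps themselves are 100, 200, 1000, 6000 and 6001 ms.
cancel: called when the job is cancelled. This source has no loop to stop, so the method has nothing to do. A body of ??? would throw NotImplementedError on cancellation, so an empty body is the safer choice.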
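To see which event-time timers this source triggers, it helps to replay its five timestamps by hand. The sketch below is a plain-Scala model, not Flink code: it assumes the watermark for ascending timestamps is the largest timestamp seen minus 1 ms, that a timer fires once the watermark is greater than or equal to its timestamp, and that the watermark catches up with each element before the next one arrives (reasonable here, since the source sleeps 5 seconds between elements). The names `EventTimeTimeline`, `Step` and `replay` are hypothetical.

```scala
object EventTimeTimeline {
  /** State after one element: its timestamp, the resulting watermark, and the timers that fired. */
  final case class Step(timestamp: Long, watermark: Long, fired: Seq[Long])

  /**
   * Replays element timestamps in arrival order. Each element registers a timer
   * at timestamp + delayMs; then the watermark advances to (max timestamp - 1)
   * and every pending timer at or below the watermark fires.
   * Returns the per-element steps and the timers still pending at the end.
   */
  def replay(timestamps: Seq[Long], delayMs: Long): (Seq[Step], Seq[Long]) = {
    var pending = Set.empty[Long]
    var maxTs = Long.MinValue
    val steps = timestamps.map { ts =>
      pending = pending + (ts + delayMs)
      maxTs = math.max(maxTs, ts)
      val watermark = maxTs - 1
      val (fired, rest) = pending.partition(_ <= watermark)
      pending = rest
      Step(ts, watermark, fired.toSeq.sorted)
    }
    (steps, pending.toSeq.sorted)
  }

  def main(args: Array[String]): Unit = {
    // The timestamps emitted by EventSource, with the 5000 ms delay used in processElement.
    val (steps, remaining) = replay(Seq(100L, 200L, 1000L, 6000L, 6001L), 5000L)
    steps.foreach(println)
    println(s"still pending when the source ends: $remaining")
  }
}
```

Under these assumptions nothing fires for the first three elements. The element with timestamp 6000 moves the watermark to 5999 and fires the timers for 5100 and 5200. The element with timestamp 6001 moves the watermark to 6000 and fires the timer for 6000. The timers for 11000 and 11001 stay pending until the source finishes; at that point Flink emits a final watermark of Long.MaxValue, which fires them. In the job itself, currentWatermark() inside processElement reports the watermark from before the current element, so the first element prints Long.MinValue.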

5. Running the job

env.execute("ProcessingTimeTimer")

Starts the Flink job. Both pipelines run in the same job, which is named ProcessingTimeTimer.

Background

1. KeyedProcessFunction

Purpose: processes a keyed stream, with access to keyed state and timers.
Core methods:
- processElement: called for each element.
- onTimer: called when a timer fires.

2. Processing Time

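Definition: the system time of the machine at the moment the data is processed.
Characteristics: simple and efficient, but it cannot account for out-of-order events.
Use cases: scenarios that need low latency and are not sensitive to event order.

3. Event Time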

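Definition: the time at which the event actually occurred, usually carried in the record as a timestamp.
Characteristics: can handle out-of-order events, but requires the watermark mechanism.
Use cases: scenarios that are sensitive to event order, such as log analysis and transaction monitoring.

4. Watermark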

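Purpose: marks the progress of event time, which is what makes handling out-of-order events possible. A watermark with value t declares that no more elements with a timestamp of t or earlier are expected.
Generation: usually derived from the event timestamps seen so far.
Late data: the watermark can lag behind the largest timestamp by a fixed bound, so that data arriving within that bound is still accepted.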
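The effect of that bound can be worked through with a few numbers. The sketch below is a plain-Scala model rather than Flink code: it assumes the watermark equals the largest timestamp seen so far, minus the allowed out-of-orderness, minus 1 ms, and that it is updated after every element (Flink normally emits watermarks periodically instead). The names `BoundedWatermark`, `watermark` and `classify`, and the sample timestamps, are hypothetical.

```scala
object BoundedWatermark {
  /** Watermark after seeing `maxTimestamp`, tolerating `maxOutOfOrdernessMs` of disorder. */
  def watermark(maxTimestamp: Long, maxOutOfOrdernessMs: Long): Long =
    maxTimestamp - maxOutOfOrdernessMs - 1

  /**
   * Pairs each arriving timestamp with a flag that is true when the element is late,
   * meaning its timestamp is at or below the watermark in effect when it arrives.
   */
  def classify(arrivals: Seq[Long], maxOutOfOrdernessMs: Long): Seq[(Long, Boolean)] = {
    var maxTs = Long.MinValue
    arrivals.map { ts =>
      // Before the first element there is no watermark yet, so nothing can be late.
      val late = maxTs != Long.MinValue && ts <= watermark(maxTs, maxOutOfOrdernessMs)
      maxTs = math.max(maxTs, ts)
      (ts, late)
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical out-of-order arrivals (event timestamps in ms).
    val arrivals = Seq(1000L, 3000L, 2000L, 7000L, 4000L, 6000L)
    println(classify(arrivals, 2000L)) // tolerate 2 s of disorder
    println(classify(arrivals, 0L))    // no tolerance, as with ascending timestamps
  }
}
```

With a 2-second bound only the element with timestamp 4000 is late, because it arrives after 7000 has pushed the watermark to 4999. With no bound, 2000, 4000 and 6000 are all late. A larger bound accepts more disorder at the cost of timers and windows firing later.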

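5. Timers

Types:
- Processing-time timer: fires based on system time.
- Event-time timer: fires based on event time, that is, when the watermark reaches the timer's timestamp.
Registration: call context.timerService().registerProcessingTimeTimer or context.timerService().registerEventTimeTimer.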
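Two properties of registration are easy to miss: timers are scoped to the current key, and registering the same timestamp twice for one key leaves a single timer. The TimerService also offers deleteProcessingTimeTimer and deleteEventTimeTimer to cancel a timer before it fires. The toy class below imitates those semantics in plain Scala so they can be tried without a cluster; `ToyTimerService` and its methods are hypothetical stand-ins, not Flink's implementation.

```scala
import scala.collection.immutable.SortedSet

/** Toy stand-in for a keyed timer service: one timer per (fire time, key). */
final class ToyTimerService {
  // A set removes duplicates; sorting by fire time gives the firing order.
  private var timers = SortedSet.empty[(Long, String)]

  def register(key: String, time: Long): Unit = timers = timers + ((time, key))

  def delete(key: String, time: Long): Unit = timers = timers - ((time, key))

  /** Advances time to `now` and returns the timers that fire, earliest first. */
  def advanceTo(now: Long): Seq[(Long, String)] = {
    val (due, later) = timers.partition(_._1 <= now)
    timers = later
    due.toSeq
  }
}

object ToyTimerDemo {
  def main(args: Array[String]): Unit = {
    val service = new ToyTimerService
    service.register("Mary", 5000L)
    service.register("Mary", 5000L) // same key and time: still one timer
    service.register("Bob", 5000L)  // different key: a separate timer
    service.register("Mary", 8000L)
    service.delete("Mary", 8000L)   // cancelled before it can fire

    println(service.advanceTo(6000L)) // the two 5000 ms timers, one per key
    println(service.advanceTo(9000L)) // nothing left to fire
  }
}
```

The same register-then-delete pattern is what a job would use for an inactivity timeout: remember the last registered timestamp, delete that timer when a new element arrives, and register a fresh one.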