Table of Contents
1. Task Launch Entry Point
2. Task Execution Command: FlinkTaskExecuteCommand
3. Creating and Initializing FlinkExecution
4. Task Execution: FlinkExecution.execute()
5. Source Processing
6. Transform Processing
7. Sink Processing
This article analyzes the Flink engine execution flow based on the SeaTunnel 2.3.x source code, using seatunnel-examples/seatunnel-flink-connector-v2-example/src/main/java/org/apache/seatunnel/example/flink/v2/SeaTunnelApiExample.java as the entry point to walk through the full flow.
1. Task Launch Entry Point
Core code of the launcher:

```java
// 1. Initialize the Flink launch command arguments
FlinkCommandArgs flinkCommandArgs = new FlinkCommandArgs();
// 2. SeaTunnel.run() invokes the Flink execute command
SeaTunnel.run(flinkCommandArgs.buildCommand());
```
- `buildCommand()` returns a `FlinkTaskExecuteCommand` instance
- `SeaTunnel.run()` eventually calls `FlinkTaskExecuteCommand.execute()`
2. Task Execution Command: FlinkTaskExecuteCommand
Core execution flow:

```java
public void execute() {
    // 1. Parse the config file into a Config object
    Config config = ConfigBuilder.of(configFile);
    // 2. Create a FlinkExecution instance
    FlinkExecution seaTunnelTaskExecution = new FlinkExecution(config);
    // 3. Run the job
    seaTunnelTaskExecution.execute();
}
```
3. Creating and Initializing FlinkExecution
3.1 Core Component Initialization

```java
public FlinkExecution(Config config) {
    // Create the three processors
    this.sourcePluginExecuteProcessor = new SourceExecuteProcessor(
            jarPaths, config.getConfigList(Constants.SOURCE), jobContext);
    this.transformPluginExecuteProcessor = new TransformExecuteProcessor(
            jarPaths,
            TypesafeConfigUtils.getConfigList(config, Constants.TRANSFORM, Collections.emptyList()),
            jobContext);
    this.sinkPluginExecuteProcessor = new SinkExecuteProcessor(
            jarPaths, config.getConfigList(Constants.SINK), jobContext);

    // Initialize the Flink execution environment
    this.flinkRuntimeEnvironment = FlinkRuntimeEnvironment.getInstance(
            this.registerPlugin(config, jarPaths));

    // Inject the runtime environment into each processor
    this.sourcePluginExecuteProcessor.setRuntimeEnvironment(flinkRuntimeEnvironment);
    this.transformPluginExecuteProcessor.setRuntimeEnvironment(flinkRuntimeEnvironment);
    this.sinkPluginExecuteProcessor.setRuntimeEnvironment(flinkRuntimeEnvironment);
}
```
3.2 Key Objects

| Component | Type | Role |
|---|---|---|
| sourcePluginExecuteProcessor | SourceExecuteProcessor | Ingests data from sources |
| transformPluginExecuteProcessor | TransformExecuteProcessor | Applies transformation logic |
| sinkPluginExecuteProcessor | SinkExecuteProcessor | Writes data out |
| flinkRuntimeEnvironment | FlinkRuntimeEnvironment | Wraps the Flink StreamExecutionEnvironment |
4. Task Execution: FlinkExecution.execute()
DAG construction flow:

```java
public void execute() {
    // Initialize the list of data streams
    List<DataStream<Row>> dataStreams = new ArrayList<>();
    // Run the three stages in order
    dataStreams = sourcePluginExecuteProcessor.execute(dataStreams);
    dataStreams = transformPluginExecuteProcessor.execute(dataStreams);
    sinkPluginExecuteProcessor.execute(dataStreams);
}
```
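Each stage shares the same execute signature, so the DAG is built simply by threading the list of streams through the stages in order. A minimal, self-contained sketch of this chaining pattern (the interface and names are illustrative, not SeaTunnel's actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class ChainDemo {
    // Illustrative stand-in for the shared processor contract:
    // each stage receives the upstream streams and returns its outputs.
    interface ExecuteProcessor<T> {
        List<T> execute(List<T> upstream);
    }

    static List<String> runPipeline() {
        ExecuteProcessor<String> source = up -> List.of("row1", "row2");
        ExecuteProcessor<String> transform = up -> {
            List<String> out = new ArrayList<>();
            for (String row : up) out.add(row.toUpperCase());
            return out;
        };
        List<String> streams = new ArrayList<>();
        streams = source.execute(streams);     // source ignores the (empty) upstream list
        streams = transform.execute(streams);  // transform consumes the source output
        return streams;                        // a sink stage would consume this and return null
    }

    public static void main(String[] args) {
        System.out.println(runPipeline());  // [ROW1, ROW2]
    }
}
```

The design choice worth noting is that each stage is oblivious to what precedes or follows it; the driver decides the order, which is what lets `execute()` read as three plain method calls.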
5. Source Processing
5.1 Plugin Initialization
Call chain:

```
SourceExecuteProcessor()
  → super(jarPaths, pluginConfigs, jobContext)  // parent constructor
  → this.plugins = initializePlugins(jarPaths, pluginConfigs)
```
Core plugin-loading logic:

```java
protected List<SeaTunnelSource> initializePlugins(...) {
    SeaTunnelSourcePluginDiscovery discovery = new SeaTunnelSourcePluginDiscovery();
    List<SeaTunnelSource> sources = new ArrayList<>();
    for (Config sourceConfig : pluginConfigs) {
        // 1. Identify the plugin
        PluginIdentifier identifier = PluginIdentifier.of(
                ENGINE_TYPE, PLUGIN_TYPE, sourceConfig.getString(PLUGIN_NAME));
        // 2. Load the dependency JARs
        jars.addAll(discovery.getPluginJarPaths(Lists.newArrayList(identifier)));
        // 3. Create the plugin instance
        SeaTunnelSource source = discovery.createPluginInstance(identifier);
        // 4. Initialize the plugin
        source.prepare(sourceConfig);
        source.setJobContext(jobContext);
        // 5. Validate batch-mode constraints
        if (jobContext.getJobMode() == JobMode.BATCH
                && source.getBoundedness() == Boundedness.UNBOUNDED) {
            throw new UnsupportedOperationException("Unbounded source not support batch mode");
        }
        sources.add(source);
    }
    jarPaths.addAll(jars);
    return sources;
}
```
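The discovery class resolves a plugin class from its identifier (engine type, plugin type, plugin name). The lookup idea can be sketched with a plain map-backed registry; this is a simplified stand-in for the SPI-based discovery, and every name in it is illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class DiscoveryDemo {
    // Simplified stand-in for PluginIdentifier: engine + plugin type + plugin name.
    record PluginId(String engine, String type, String name) {}

    // Map-backed registry standing in for SPI-based plugin discovery.
    static final Map<PluginId, Supplier<Object>> REGISTRY = new HashMap<>();

    static void register(PluginId id, Supplier<Object> factory) {
        REGISTRY.put(id, factory);
    }

    // Mirrors createPluginInstance(): look up by identifier, fail loudly if absent.
    static Object createPluginInstance(PluginId id) {
        Supplier<Object> factory = REGISTRY.get(id);
        if (factory == null) {
            throw new IllegalArgumentException("Plugin not found: " + id);
        }
        return factory.get();  // a fresh instance per call
    }

    public static void main(String[] args) {
        PluginId fakeSource = new PluginId("flink", "source", "FakeSource");
        register(fakeSource, () -> "FakeSourceInstance");
        System.out.println(createPluginInstance(fakeSource));
    }
}
```

In the real code the registry is populated from plugin JARs on the classpath rather than by explicit `register` calls, but the identifier-keyed lookup is the same.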
5.2 Data Stream Creation
Execution entry:

```java
public List<DataStream<Row>> execute(List<DataStream<Row>> upstreamDataStreams) {
    StreamExecutionEnvironment env = flinkRuntimeEnvironment.getStreamExecutionEnvironment();
    List<DataStream<Row>> sources = new ArrayList<>();
    for (int i = 0; i < plugins.size(); i++) {
        SeaTunnelSource internalSource = plugins.get(i);
        Config pluginConfig = pluginConfigs.get(i);
        // 1. Create the SourceFunction
        BaseSeaTunnelSourceFunction sourceFunction;
        if (internalSource instanceof SupportCoordinate) {
            sourceFunction = new SeaTunnelCoordinatedSource(internalSource);
        } else {
            sourceFunction = new SeaTunnelParallelSource(internalSource);  // default path
        }
        // 2. Create the DataStreamSource
        DataStreamSource<Row> sourceStream = addSource(
                env,
                sourceFunction,
                "SeaTunnel " + internalSource.getClass().getSimpleName(),
                internalSource.getBoundedness() == Boundedness.BOUNDED);
        // 3. Set the parallelism
        if (pluginConfig.hasPath(CommonOptions.PARALLELISM.key())) {
            sourceStream.setParallelism(pluginConfig.getInt(CommonOptions.PARALLELISM.key()));
        }
        // 4. Register the result table
        registerResultTable(pluginConfig, sourceStream);
        sources.add(sourceStream);
    }
    return sources;
}
```
Result table registration logic:

```java
void registerResultTable(Config config, DataStream<Row> dataStream) {
    if (config.hasPath(RESULT_TABLE_NAME)) {
        String tableName = config.getString(RESULT_TABLE_NAME);
        StreamTableEnvironment tableEnv = getStreamTableEnvironment();
        if (!TableUtil.tableExists(tableEnv, tableName)) {
            if (config.hasPath("field_name")) {
                // Register with explicit field names
                tableEnv.registerDataStream(
                        tableName, dataStream, config.getString("field_name"));
            } else {
                // Default registration
                tableEnv.registerDataStream(tableName, dataStream);
            }
        }
    }
}
```
6. Transform Processing
6.1 Plugin Initialization
Key validation logic:

```java
public void prepare(Config pluginConfig) {
    // Both source_table_name and result_table_name are required
    if (!pluginConfig.hasPath(SOURCE_TABLE_NAME) || !pluginConfig.hasPath(RESULT_TABLE_NAME)) {
        throw new IllegalArgumentException("Missing required table name config");
    }
    // Input and output table names must differ
    if (Objects.equals(
            pluginConfig.getString(SOURCE_TABLE_NAME),
            pluginConfig.getString(RESULT_TABLE_NAME))) {
        throw new IllegalArgumentException("Source and result table names must be different");
    }
    this.inputTableName = pluginConfig.getString(SOURCE_TABLE_NAME);
    this.outputTableName = pluginConfig.getString(RESULT_TABLE_NAME);
    // Delegate to the concrete transform's configuration setup
    setConfig(pluginConfig);
}
```
6.2 Transform Execution
Core processing flow:

```java
public List<DataStream<Row>> execute(List<DataStream<Row>> upstreamDataStreams) {
    DataStream<Row> input = upstreamDataStreams.get(0);  // default to the first upstream stream
    for (int i = 0; i < plugins.size(); i++) {
        SeaTunnelTransform transform = plugins.get(i);
        Config pluginConfig = pluginConfigs.get(i);
        // 1. Resolve the input stream (looked up by source_table_name)
        DataStream<Row> stream = fromSourceTable(pluginConfig).orElse(input);
        // 2. Apply the transform
        input = flinkTransform(transform, stream);
        // 3. Register the result table
        registerResultTable(pluginConfig, input);
    }
    return Collections.singletonList(input);
}
```
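The `fromSourceTable(...).orElse(input)` pattern is what wires the DAG: a stage first looks up a stream registered under its `source_table_name`, and only falls back to the previous stage's output if no such table exists. A self-contained sketch of that fallback, with a plain map standing in for the table environment (all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class TableLookupDemo {
    // Registered result tables, keyed by result_table_name (stand-in for the table env).
    static final Map<String, String> TABLES = new HashMap<>();

    // Stand-in for fromSourceTable(): empty when the named table is absent.
    static Optional<String> fromSourceTable(String sourceTableName) {
        return Optional.ofNullable(TABLES.get(sourceTableName));
    }

    public static void main(String[] args) {
        TABLES.put("fake_source", "stream-from-fake-source");
        String input = "first-upstream-stream";

        // Named table found: use the registered stream.
        String s1 = fromSourceTable("fake_source").orElse(input);
        // No such table: fall back to the first upstream stream, as in execute().
        String s2 = fromSourceTable("missing").orElse(input);

        System.out.println(s1 + " / " + s2);
    }
}
```

This is why a config without `source_table_name` still works in a linear pipeline: each stage silently consumes its predecessor's output.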
Transform operator implementation:

```java
protected DataStream<Row> flinkTransform(SeaTunnelTransform transform, DataStream<Row> stream) {
    // Type system conversion
    SeaTunnelDataType inputType = TypeConverterUtils.convert(stream.getType());
    transform.setTypeInfo(inputType);
    // Create the row converters
    FlinkRowConverter inputConverter = new FlinkRowConverter(inputType);
    FlinkRowConverter outputConverter = new FlinkRowConverter(transform.getProducedType());
    // Implement the transform via flatMap
    return stream.flatMap(
            new FlatMapFunction<Row, Row>() {
                @Override
                public void flatMap(Row value, Collector<Row> out) {
                    // Convert the incoming Flink Row to a SeaTunnelRow
                    SeaTunnelRow inRow = inputConverter.reconvert(value);
                    // Run the transform's core logic
                    SeaTunnelRow outRow = (SeaTunnelRow) transform.map(inRow);
                    if (outRow != null) {
                        // Convert the output row back to a Flink Row
                        Row result = outputConverter.convert(outRow);
                        out.collect(result);
                    }
                }
            },
            TypeConverterUtils.convert(transform.getProducedType()));
}
```
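The flatMap body is a fixed three-step sandwich: convert in, transform, convert out, with a `null` result meaning "drop this row". The pattern can be shown without Flink at all; here strings stand in for Flink rows and integers for SeaTunnel rows (a sketch, not the real converter API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TransformFlatMapDemo {
    // Illustrative transform: drop negative rows, double the rest.
    // Returning null plays the role of filtering the row out.
    static Integer map(Integer inRow) {
        return inRow < 0 ? null : inRow * 2;
    }

    // Mirrors the flatMap body: reconvert in, transform, convert out, skip nulls.
    static List<String> flatMapAll(List<String> flinkRows) {
        Function<String, Integer> reconvert = Integer::valueOf;     // "Row" -> "SeaTunnelRow"
        Function<Integer, String> convert = i -> "row(" + i + ")";  // "SeaTunnelRow" -> "Row"
        List<String> out = new ArrayList<>();
        for (String value : flinkRows) {
            Integer outRow = map(reconvert.apply(value));
            if (outRow != null) {  // transforms may drop rows by returning null
                out.add(convert.apply(outRow));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(flatMapAll(List.of("1", "-2", "3")));  // [row(2), row(6)]
    }
}
```

flatMap rather than map is used precisely because it allows zero outputs per input, which is how filtering transforms are expressed.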
7. Sink Processing
7.1 Plugin Initialization
Sink-specific handling (abridged):

```java
protected List<SeaTunnelSink> initializePlugins(...) {
    List<SeaTunnelSink> sinks = new ArrayList<>();
    for (Config sinkConfig : pluginConfigs) {
        PluginIdentifier identifier = PluginIdentifier.of(
                ENGINE_TYPE, PLUGIN_TYPE, sinkConfig.getString(PLUGIN_NAME));
        SeaTunnelSink sink = discovery.createPluginInstance(identifier);
        sink.prepare(sinkConfig);
        // Data-save-mode handling
        if (sink instanceof SupportDataSaveMode) {
            SupportDataSaveMode saveModeSink = (SupportDataSaveMode) sink;
            saveModeSink.checkOptions(sinkConfig);  // validate the save-mode options
        }
        sinks.add(sink);
    }
    return sinks;
}
```
7.2 Data Output
Execution flow:

```java
public List<DataStream<Row>> execute(List<DataStream<Row>> upstreamDataStreams) {
    DataStream<Row> input = upstreamDataStreams.get(0);
    for (int i = 0; i < plugins.size(); i++) {
        Config sinkConfig = pluginConfigs.get(i);
        SeaTunnelSink sink = plugins.get(i);
        // 1. Resolve the input stream
        DataStream<Row> stream = fromSourceTable(sinkConfig).orElse(input);
        // 2. Set the type information
        sink.setTypeInfo((SeaTunnelRowType) TypeConverterUtils.convert(stream.getType()));
        // 3. Handle the data save mode
        if (sink instanceof SupportDataSaveMode) {
            SupportDataSaveMode saveModeSink = (SupportDataSaveMode) sink;
            DataSaveMode saveMode = saveModeSink.getDataSaveMode();
            saveModeSink.handleSaveMode(saveMode);
        }
        // 4. Adapt to the Flink Sink API
        DataStreamSink<Row> dataStreamSink = stream
                .sinkTo(SinkV1Adapter.wrap(new FlinkSink<>(sink)))
                .name(sink.getPluginName());
        // 5. Set the parallelism
        if (sinkConfig.hasPath(CommonOptions.PARALLELISM.key())) {
            dataStreamSink.setParallelism(sinkConfig.getInt(CommonOptions.PARALLELISM.key()));
        }
    }
    return null;  // The sink is the end of the DAG; there is no downstream stream
}
```
FlinkSink adapter pseudocode:

```java
class FlinkSink implements SinkFunction<Row> {
    private final SeaTunnelSink<SeaTunnelRow, ?, ?, ?> sink;

    public void invoke(Row value) {
        // Convert the Flink Row into a SeaTunnelRow
        SeaTunnelRow row = converter.convert(value);
        // Delegate to the SeaTunnel sink's write logic
        sink.write(row);
    }
}
```
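The adapter pattern above can be shown self-contained, without Flink or SeaTunnel on the classpath. Both interfaces below are hypothetical stand-ins for the two APIs being bridged; only the shape of the bridge matches the pseudocode:

```java
import java.util.ArrayList;
import java.util.List;

public class SinkAdapterDemo {
    // Stand-in for Flink's SinkFunction: the engine calls invoke() per record.
    interface FlinkSinkFunction<T> { void invoke(T value); }

    // Stand-in for the SeaTunnel sink's write contract.
    interface SeaTunnelLikeSink { void write(String seaTunnelRow); }

    // The adapter: accepts engine-side records, converts, delegates to the plugin sink.
    static class FlinkSinkAdapter implements FlinkSinkFunction<Integer> {
        private final SeaTunnelLikeSink sink;

        FlinkSinkAdapter(SeaTunnelLikeSink sink) { this.sink = sink; }

        @Override
        public void invoke(Integer value) {
            String row = "row(" + value + ")";  // stands in for FlinkRowConverter
            sink.write(row);
        }
    }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        FlinkSinkAdapter adapter = new FlinkSinkAdapter(written::add);
        adapter.invoke(1);
        adapter.invoke(2);
        System.out.println(written);  // [row(1), row(2)]
    }
}
```

The point of the adapter is that neither side changes: Flink keeps calling its own interface, and the SeaTunnel sink keeps receiving its own row type.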
Full Execution Flow Overview
Key Design Takeaways
- Plugin architecture:
  - Source/Transform/Sink plugins are loaded dynamically via the SPI mechanism
  - Plugin discovery classes: `SeaTunnel*PluginDiscovery`
- Table name mapping:
  - `source_table_name` and `result_table_name` chain stages into a DAG
  - Tables are registered via `registerDataStream()`
- Type system conversion:
  - `TypeConverterUtils` converts between SeaTunnel types and Flink types
  - `FlinkRowConverter` converts row data in both directions
- Execution environment encapsulation:
  - `FlinkRuntimeEnvironment` manages the execution environment in one place
  - Unified batch and streaming: `JobMode` selects the execution mode
- Adapter pattern:
  - `SeaTunnelParallelSource` adapts a SeaTunnelSource to Flink's SourceFunction
  - `FlinkSink` adapts a SeaTunnelSink to Flink's SinkFunction
This article has walked through the SeaTunnel Flink engine from job launch to DAG construction, focusing on the plugin loading mechanism, the type conversion system, and table name mapping, and should serve as a detailed reference for understanding how SeaTunnel works internally.