1. Preface
StreamGraph: the initial graph generated from the code a user writes with the DataStream API; it describes the topology of the program.
Both the StreamGraph and the JobGraph are generated on the Flink client and then submitted to the Flink cluster; the conversion from JobGraph to ExecutionGraph happens in the JobMaster.
An application developed with the DataStream API is first turned into Transformations and then mapped to a StreamGraph. This graph is independent of the concrete execution; its core purpose is to express the logic of the computation.
The StreamGraph is generated in the Flink client: when the client submits the application it invokes its main method, the user's business logic is assembled into a pipeline of Transformations, and the construction of the StreamGraph is triggered when StreamExecutionEnvironment.execute() is finally called.
- StreamNode: the class that represents an operator, together with all related properties such as parallelism and incoming/outgoing edges.
A StreamNode is a node in the StreamGraph, converted from a Transformation; roughly speaking, one StreamNode corresponds to one operator.
- StreamEdge: the edge connecting two StreamNodes.
A StreamEdge is an edge in the StreamGraph that connects two StreamNodes; a StreamNode can have multiple outgoing and incoming edges. A StreamEdge carries information such as side outputs, the partitioner, and field selection.
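All debugger dumps in this article come from running Flink's SocketWindowWordCount example. A rough sketch of that pipeline is given below so the Transformation ids 1–5 that show up later are easier to follow (abridged from the Flink examples; the parallelism of 4 and the shape of the WordWithCount POJO are assumptions read off the dumps):
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        // LegacySourceTransformation, id=1, "Socket Stream", parallelism=1
        DataStream<String> text = env.socketTextStream("localhost", 9999, "\n");

        DataStream<WordWithCount> windowCounts = text
                // OneInputTransformation, id=2, "Flat Map"
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) {
                        for (String word : value.split("\\s")) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                })
                // PartitionTransformation, id=3, "Partition" (KeyGroupStreamPartitioner / HASH)
                .keyBy(value -> value.word)
                // OneInputTransformation, id=4, "Window(TumblingProcessingTimeWindows(5000), ...)"
                .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                .reduce((a, b) -> new WordWithCount(a.word, a.count + b.count));

        // LegacySinkTransformation, id=5, "Print to Std. Out", parallelism=1
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    /** Simple POJO; the field names match the PojoType shown in the dumps. */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }
}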
2. Code
2.1. Entry point
Calling StreamExecutionEnvironment.execute() in the user code leads to the following chain:
-> execute(getJobName())
-> execute(getStreamGraph(jobName))
-> getStreamGraph(jobName, true)
So the final entry point is StreamExecutionEnvironment#getStreamGraph.
/**
*
* getStreamGraph calls StreamGraphGenerator#generate, which
* builds the stream graph from the StreamExecutionEnvironment and all transformations it contains.
*
* Getter of the {@link org.apache.flink.streaming.api.graph.StreamGraph StreamGraph} of the
* streaming job with the option to clear previously registered {@link Transformation
* transformations}. Clearing the transformations allows, for example, to not re-execute the
* same operations when calling {@link #execute()} multiple times.
*
* @param jobName Desired name of the job
* @param clearTransformations Whether or not to clear previously registered transformations
* @return The streamgraph representing the transformations
*/
@Internal
public StreamGraph getStreamGraph(String jobName, boolean clearTransformations) {
// obtain the StreamGraphGenerator and use it to build the StreamGraph
StreamGraph streamGraph = getStreamGraphGenerator().setJobName(jobName).generate();
if (clearTransformations) {
this.transformations.clear();
}
return streamGraph;
}
2.2. StreamGraphGenerator#generate
Here a StreamGraphGenerator is obtained via getStreamGraphGenerator(), and its generate() method then builds the StreamGraph.
A key field is List<Transformation<?>> transformations.
A Transformation represents the operation that creates a new DataStream from one or more DataStreams.
Under the hood, every DataStream is backed by a Transformation that describes how that DataStream was produced.
// build the StreamGraph
public StreamGraph generate() {
// streamGraph = {StreamGraph@4411}
// jobName = null
// executionConfig = {ExecutionConfig@4105} "ExecutionConfig{executionMode=PIPELINED, closureCleanerLevel=RECURSIVE, parallelism=1, maxParallelism=-1, numberOfExecutionRetries=-1, forceKryo=false, disableGenericTypes=false, enableAutoGeneratedUids=true, objectReuse=false, autoTypeRegistrationEnabled=true, forceAvro=false, autoWatermarkInterval=200, latencyTrackingInterval=0, isLatencyTrackingConfigured=false, executionRetryDelay=10000, restartStrategyConfiguration=Cluster level default restart strategy, taskCancellationIntervalMillis=-1, taskCancellationTimeoutMillis=-1, useSnapshotCompression=false, defaultInputDependencyConstraint=ANY, globalJobParameters=org.apache.flink.api.common.ExecutionConfig$GlobalJobParameters@1, registeredTypesWithKryoSerializers={}, registeredTypesWithKryoSerializerClasses={}, defaultKryoSerializers={}, defaultKryoSerializerClasses={}, registeredKryoTypes=[], registeredPojoTypes=[]}"
// executionMode = {ExecutionMode@3509} "PIPELINED"
// closureCleanerLevel = {ExecutionConfig$ClosureCleanerLevel@3510} "RECURSIVE"
// parallelism = 4
// maxParallelism = -1
// numberOfExecutionRetries = -1
// forceKryo = false
// disableGenericTypes = false
// enableAutoGeneratedUids = true
// objectReuse = false
// autoTypeRegistrationEnabled = true
// forceAvro = false
// autoWatermarkInterval = 200
// latencyTrackingInterval = 0
// isLatencyTrackingConfigured = false
// executionRetryDelay = 10000
// restartStrategyConfiguration = {RestartStrategies$FallbackRestartStrategyConfiguration@3511} "Cluster level default restart strategy"
// taskCancellationIntervalMillis = -1
// taskCancellationTimeoutMillis = -1
// useSnapshotCompression = false
// defaultInputDependencyConstraint = {InputDependencyConstraint@3512} "ANY"
// globalJobParameters = {ExecutionConfig$GlobalJobParameters@3513}
// registeredTypesWithKryoSerializers = {LinkedHashMap@3514} size = 0
// registeredTypesWithKryoSerializerClasses = {LinkedHashMap@3515} size = 0
// defaultKryoSerializers = {LinkedHashMap@3516} size = 0
// defaultKryoSerializerClasses = {LinkedHashMap@3517} size = 0
// registeredKryoTypes = {LinkedHashSet@3518} size = 0
// registeredPojoTypes = {LinkedHashSet@3519} size = 0
// checkpointConfig = {CheckpointConfig@4399}
// checkpointingMode = {CheckpointingMode@3524} "EXACTLY_ONCE"
// checkpointInterval = -1
// checkpointTimeout = 600000
// minPauseBetweenCheckpoints = 0
// maxConcurrentCheckpoints = 1
// forceCheckpointing = false
// forceUnalignedCheckpoints = false
// unalignedCheckpointsEnabled = false
// alignmentTimeout = 0
// approximateLocalRecovery = false
// externalizedCheckpointCleanup = null
// failOnCheckpointingErrors = true
// preferCheckpointForRecovery = false
// tolerableCheckpointFailureNumber = -1
// savepointRestoreSettings = {SavepointRestoreSettings@4412} "SavepointRestoreSettings.none()"
// scheduleMode = null
// chaining = false
// userArtifacts = null
// timeCharacteristic = null
// globalDataExchangeMode = null
// allVerticesInSameSlotSharingGroupByDefault = true
// streamNodes = {HashMap@4415} size = 0
// sources = {HashSet@4416} size = 0
// sinks = {HashSet@4417} size = 0
// virtualSideOutputNodes = {HashMap@4418} size = 0
// virtualPartitionNodes = {HashMap@4419} size = 0
// vertexIDtoBrokerID = {HashMap@4420} size = 0
// vertexIDtoLoopTimeout = {HashMap@4421} size = 0
// stateBackend = null
// iterationSourceSinkPairs = {HashSet@4422} size = 0
// timerServiceProvider = null
streamGraph = new StreamGraph(executionConfig, checkpointConfig, savepointRestoreSettings);
// STREAMING
shouldExecuteInBatchMode = shouldExecuteInBatchMode(runtimeExecutionMode);
configureStreamGraph(streamGraph);
alreadyTransformed = new HashMap<>();
// An example of the transformations list. It contains three elements;
// only the most recent one (index 2) is shown here.
//
// {LegacySinkTransformation@3460} "LegacySinkTransformation{id=5, name='Print to Std. Out', outputType=GenericType<java.lang.Object>, parallelism=1}"
// | input = {OneInputTransformation@3443} "OneInputTransformation{id=4, name='Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)', outputType=PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>, parallelism=4}"
// | | input = {PartitionTransformation@3577} "PartitionTransformation{id=3, name='Partition', outputType=PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>, parallelism=4}"
// | | | input = {OneInputTransformation@3364} "OneInputTransformation{id=2, name='Flat Map', outputType=PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>, parallelism=4}"
// | | | |---- input = {LegacySourceTransformation@3548} "LegacySourceTransformation{id=1, name='Socket Stream', outputType=String, parallelism=1}"
// | | | | |---- operatorFactory = {SimpleUdfStreamOperatorFactory@3555}
// | | | | |---- boundedness = {Boundedness@3253} "CONTINUOUS_UNBOUNDED"
// | | | | |---- id = 1
// | | | | |---- name = "Socket Stream"
// | | | | |---- outputType = {BasicTypeInfo@3254} "String"
// | | | | |---- typeUsed = true
// | | | | |---- parallelism = 1
// | | | | |---- maxParallelism = -1
// | | | | |---- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | | | |---- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | | | |---- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3556} size = 0
// | | | | |---- managedMemorySlotScopeUseCases = {HashSet@3557} size = 0
// | | | | |---- uid = null
// | | | | |---- userProvidedNodeHash = null
// | | | | |---- bufferTimeout = -1
// | | | | |---- slotSharingGroup = null
// | | | | |---- coLocationGroupKey = null
// | | | |---- operatorFactory = {SimpleUdfStreamOperatorFactory@3370}
// | | | |---- stateKeySelector = null
// | | | |---- stateKeyType = null
// | | | |---- id = 2
// | | | |---- name = "Flat Map"
// | | | |---- outputType = {PojoTypeInfo@3369} "PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>"
// | | | |---- typeUsed = true
// | | | |---- parallelism = 4
// | | | |---- maxParallelism = -1
// | | | |---- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | | |---- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | | |---- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3550} size = 0
// | | | |---- managedMemorySlotScopeUseCases = {HashSet@3551} size = 0
// | | | |---- uid = null
// | | | |---- userProvidedNodeHash = null
// | | | |---- bufferTimeout = -1
// | | | |---- slotSharingGroup = null
// | | | |---- coLocationGroupKey = null
// | | |-- partitioner = {KeyGroupStreamPartitioner@3591} "HASH"
// | | |-- shuffleMode = {ShuffleMode@3592} "UNDEFINED"
// | | |-- id = 3
// | | |-- name = "Partition"
// | | |-- outputType = {PojoTypeInfo@3369} "PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>"
// | | |-- typeUsed = true
// | | |-- parallelism = 4
// | | |-- maxParallelism = -1
// | | |-- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | |-- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | |-- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3594} size = 0
// | | |-- managedMemorySlotScopeUseCases = {HashSet@3595} size = 0
// | | |-- uid = null
// | | |-- userProvidedNodeHash = null
// | | |-- bufferTimeout = -1
// | | |-- slotSharingGroup = null
// | | |-- coLocationGroupKey = null
// | |-- operatorFactory = {SimpleUdfStreamOperatorFactory@3578}
// | |-- stateKeySelector = {SocketWindowWordCount$lambda@3579}
// | |-- stateKeyType = {BasicTypeInfo@3254} "String"
// | |-- id = 4
// | |-- name = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
// | |-- outputType = {PojoTypeInfo@3369} "PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>"
// | |-- typeUsed = true
// | |-- parallelism = 4
// | |-- maxParallelism = -1
// | |-- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | |-- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | |-- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3580} size = 0
// | |-- managedMemorySlotScopeUseCases = {HashSet@3581} size = 1
// | |-- uid = null
// | |-- userProvidedNodeHash = null
// | |-- bufferTimeout = -1
// | |-- slotSharingGroup = null
// | |-- coLocationGroupKey = null
// |-- operatorFactory = {SimpleUdfStreamOperatorFactory@3606}
// |-- stateKeySelector = null
// |-- stateKeyType = null
// |-- id = 5
// |-- name = "Print to Std. Out"
// |-- outputType = {GenericTypeInfo@3608} "GenericType<java.lang.Object>"
// |-- typeUsed = false
// |-- parallelism = 1
// |-- maxParallelism = -1
// |-- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// |-- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// |-- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3609} size = 0
// |-- managedMemorySlotScopeUseCases = {HashSet@3610} size = 0
// |-- uid = null
// |-- userProvidedNodeHash = null
// |-- bufferTimeout = -1
// |-- slotSharingGroup = null
// |-- coLocationGroupKey = null
// the transformations list is populated in DataStream#doTransform
for (Transformation<?> transformation : transformations) {
// traverse every transformation and translate it into the graph
transform(transformation);
}
final StreamGraph builtStreamGraph = streamGraph;
alreadyTransformed.clear();
alreadyTransformed = null;
streamGraph = null;
return builtStreamGraph;
}
2.3. Transformation example: DataStream#map
A map transformation wraps the user-defined MapFunction into a StreamMap operator,
then wraps the StreamMap into a OneInputTransformation, and finally stores that transformation in the environment.
When env.execute() is called, the collection of transformations is traversed to build the StreamGraph.
Common transformations on a DataStream are map, flatMap, filter, and so on.
These transformations build up a tree of Transformations, and that tree is turned into the StreamGraph.
Taking map as an example, let's look at the data in List<Transformation<?>> transformations:
2.3.1. DataStream#map
/**
*
* Applies a Map transformation on a {@link DataStream}.
*
* The transformation calls a {@link MapFunction} for each element of the DataStream.
*
* Each MapFunction call returns exactly one element.
*
* The user can also extend {@link RichMapFunction} to gain access to other features
* provided by the {@link org.apache.flink.api.common.functions.RichFunction} interface.
*
* @param mapper The MapFunction that is called for each element of the DataStream.
* @param outputType {@link TypeInformation} for the result type of the function.
* @param <R> output type
* @return The transformed {@link DataStream}.
*/
public <R> SingleOutputStreamOperator<R> map(
MapFunction<T, R> mapper, TypeInformation<R> outputType) {
// returns a new DataStream; StreamMap is an implementation of StreamOperator
return transform("Map", outputType, new StreamMap<>(clean(mapper)));
}
2.3.2. DataStream#transform
This method passes the user-defined operator, together with the type information of the DataStream that the transformation produces.
/**
*
*
* Method for passing user defined operators along with the type information that will transform
* the DataStream.
*
* @param operatorName name of the operator, for logging purposes
* @param outTypeInfo the output type of the operator
* @param operator the object containing the transformation logic
* @param <R> type of the return stream
* @return the data stream constructed
* @see #transform(String, TypeInformation, OneInputStreamOperatorFactory)
*/
@PublicEvolving
public <R> SingleOutputStreamOperator<R> transform(
String operatorName,
TypeInformation<R> outTypeInfo,
OneInputStreamOperator<T, R> operator) {
// operatorName: Flat Map
// outTypeInfo : PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>
// operator : StreamFlatMap
return doTransform(operatorName, outTypeInfo, SimpleOperatorFactory.of(operator));
}
2.3.3. DataStream#doTransform
protected <R> SingleOutputStreamOperator<R> doTransform(
String operatorName,
TypeInformation<R> outTypeInfo,
StreamOperatorFactory<R> operatorFactory) {
// read the output type of the input Transform to coax out errors about MissingTypeInfo
transformation.getOutputType();
// build a OneInputTransformation (a single-input transformation)
OneInputTransformation<T, R> resultTransform =
new OneInputTransformation<>(
this.transformation,
operatorName,
operatorFactory,
outTypeInfo,
environment.getParallelism());
// the new transformation is linked to the transformation of the current DataStream, building up a tree
@SuppressWarnings({"unchecked", "rawtypes"})
SingleOutputStreamOperator<R> returnStream =
new SingleOutputStreamOperator(environment, resultTransform);
// add it to the List<Transformation<?>> transformations collection of the StreamExecutionEnvironment
getExecutionEnvironment().addOperator(resultTransform);
return returnStream;
}
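A minimal sketch of what doTransform builds for the example job (WordTokenizer is a hypothetical name standing in for the example's anonymous FlatMapFunction; getTransformation() and getInput() are @Internal accessors, used here only for illustration):
// assuming `text` is the socket source stream from the pipeline sketch in the preface
SingleOutputStreamOperator<WordWithCount> words = text.flatMap(new WordTokenizer());
// flatMap/doTransform registered a OneInputTransformation ("Flat Map", id=2) in the environment

// the new transformation points back to its input, forming the tree seen in the debugger dump above
OneInputTransformation<String, WordWithCount> flatMapTransform =
        (OneInputTransformation<String, WordWithCount>) words.getTransformation();
flatMapTransform.getInput();   // the LegacySourceTransformation of the socket source (id=1)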
2.3.4. Transformation
Not every Transformation is turned into a physical operation at the runtime layer. Some are purely logical concepts, such as union, split/select, and partition. The transformation tree shown in the Javadoc below is optimized at runtime into the operation graph underneath it.
- This is spelled out in the class comment of Transformation.
The information of union, split/select (removed in 1.12), and partition is encoded into the edge from Source to Map.
The source code also shows that UnionTransformation, SplitTransformation (removed in 1.12), SelectTransformation (removed in 1.12),
and PartitionTransformation have no StreamOperator member variable, because they carry no concrete operation,
whereas essentially all other Transformation subclasses do.
/**
* A {@code Transformation} represents the operation that creates a DataStream.
* Every DataStream has an underlying {@code Transformation} that is the origin of said DataStream.
*
* API operations such as DataStream#map build a tree of {@code Transformation}s under the hood.
*
* When the stream program is executed, this graph is translated into a StreamGraph by the StreamGraphGenerator.
*
* A {@code Transformation} does not necessarily correspond to a physical operation at runtime.
*
* Some operations are only logical concepts, for example:
*
*
* <pre>{@code
* Source Source
* + +
* | |
* v v
* Rebalance HashPartition
* + +
* | |
* | |
* +------>Union<------+
* +
* |
* v
* Split
* +
* |
* v
* Select
* +
* v
* Map
* +
* |
* v
* Sink
* }</pre>
*
* would be optimized at runtime into this operation graph:
*
* <pre>{@code
* Source Source
* + +
* | |
* | |
* +------->Map<-------+
* +
* |
* v
* Sink
* }</pre>
*
* Operations such as partitioning, union, and split/select end up being encoded in the edges that connect to the map operator.
*
**/
3. Source code of StreamGraph generation
3.1. StreamGraphGenerator#transform
StreamGraphGenerator#generate() iterates over the transformations and calls transform() for each of them.
/**
* Transforms one {@code Transformation}.
*
*
* <p>This checks whether we already transformed it and exits early in that case.
* If not it delegates to one of the transformation specific methods.
*/
// translate each transformation into StreamNodes and StreamEdges of the StreamGraph
// the return value is the collection of ids for this transform, usually of size 1 (except for FeedbackTransformation)
private Collection<Integer> transform(Transformation<?> transform) {
// if this transformation has already been translated, return the cached result directly
if (alreadyTransformed.containsKey(transform)) {
return alreadyTransformed.get(transform);
}
LOG.debug("Transforming " + transform);
if (transform.getMaxParallelism() <= 0) {
// if the max parallelism hasn't been set, then first use the job wide max parallelism
// from the ExecutionConfig.
int globalMaxParallelismFromConfig = executionConfig.getMaxParallelism();
if (globalMaxParallelismFromConfig > 0) {
transform.setMaxParallelism(globalMaxParallelismFromConfig);
}
}
// called to trigger a MissingTypeInfo exception if the output type is not set
// call at least once to trigger exceptions about MissingTypeInfo
transform.getOutputType();
@SuppressWarnings("unchecked")
final TransformationTranslator<?, Transformation<?>> translator =
(TransformationTranslator<?, Transformation<?>>)
translatorMap.get(transform.getClass());
Collection<Integer> transformedIds;
if (translator != null) {
// delegate to the translator registered for this transformation type
transformedIds = translate(translator, transform);
} else {
transformedIds = legacyTransform(transform);
}
// need this check because the iterate transformation adds itself before
// transforming the feedback edges
if (!alreadyTransformed.containsKey(transform)) {
alreadyTransformed.put(transform, transformedIds);
}
return transformedIds;
}
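The translatorMap looked up above is populated in a static initializer of StreamGraphGenerator. An abridged sketch (assuming Flink 1.12; only the translators relevant to the example job are listed, the real map registers more):
private static final Map<
                Class<? extends Transformation>,
                TransformationTranslator<?, ? extends Transformation>>
        translatorMap;

static {
    @SuppressWarnings("rawtypes")
    Map<Class<? extends Transformation>, TransformationTranslator<?, ? extends Transformation>>
            tmp = new HashMap<>();
    tmp.put(OneInputTransformation.class, new OneInputTransformationTranslator<>());        // map, flatMap, window, ...
    tmp.put(PartitionTransformation.class, new PartitionTransformationTranslator<>());      // keyBy, rebalance, ...
    tmp.put(SideOutputTransformation.class, new SideOutputTransformationTranslator<>());    // getSideOutput
    tmp.put(UnionTransformation.class, new UnionTransformationTranslator<>());
    tmp.put(LegacySourceTransformation.class, new LegacySourceTransformationTranslator<>());
    tmp.put(LegacySinkTransformation.class, new LegacySinkTransformationTranslator<>());
    // ... further entries omitted
    translatorMap = Collections.unmodifiableMap(tmp);
}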
private Collection<Integer> translate(
final TransformationTranslator<?, Transformation<?>> translator,
final Transformation<?> transform) {
checkNotNull(translator);
checkNotNull(transform);
final List<Collection<Integer>> allInputIds = getParentInputIds(transform.getInputs());
// the appropriate translation logic is invoked depending on the transformation type
// if it has already been handled, return directly
// the recursive call might have already transformed this
if (alreadyTransformed.containsKey(transform)) {
return alreadyTransformed.get(transform);
}
// determine the slot sharing group (here: "default")
final String slotSharingGroup =
determineSlotSharingGroup(
transform.getSlotSharingGroup(),
allInputIds.stream()
.flatMap(Collection::stream)
.collect(Collectors.toList()));
final TransformationTranslator.Context context =
new ContextImpl(this, streamGraph, slotSharingGroup, configuration);
// dispatch depending on the execution mode; for a streaming job this goes to translateForStreaming
return shouldExecuteInBatchMode
? translator.translateForBatch(transform, context) // batch execution
: translator.translateForStreaming(transform, context); // streaming execution
}
3.2. SimpleTransformationTranslator#translateForStreaming
@Override
public final Collection<Integer> translateForStreaming(
final T transformation, final Context context) {
checkNotNull(transformation);
checkNotNull(context);
// the actual work happens in the subclass-specific translateForStreamingInternal
final Collection<Integer> transformedIds =
translateForStreamingInternal(transformation, context);
configure(transformation, context);
return transformedIds;
}
/**
* Operator-to-translator mapping:
* partition-related transformations (e.g. keyBy): PartitionTransformationTranslator#translateForStreamingInternal
* map / flatMap transformations: OneInputTransformationTranslator#translateForStreamingInternal
*
* Translates a given {@link Transformation} to its runtime implementation for STREAMING-style
* execution.
*
* @param transformation The transformation to be translated.
* @param context The translation context.
* @return The ids of the "last" {@link StreamNode StreamNodes} in the transformation graph
* corresponding to this transformation. These will be the nodes that a potential following
* transformation will need to connect to.
*/
protected abstract Collection<Integer> translateForStreamingInternal(
final T transformation, final Context context);
/**
* OneInputTransformationTranslator#translateForStreamingInternal
* handles single-input transformations such as map and flatMap
*
* @param transformation The transformation to be translated.
* @param context The translation context.
* @return
*/
@Override
public Collection<Integer> translateForStreamingInternal(
final OneInputTransformation<IN, OUT> transformation, final Context context) {
return translateInternal(
transformation,
transformation.getOperatorFactory(),
transformation.getInputType(),
transformation.getStateKeySelector(),
transformation.getStateKeyType(),
context);
}
3.3. AbstractOneInputTransformationTranslator#translateInternal
Before this method is called, the upstream transformations have already been translated recursively, so all upstream conversions are guaranteed to be finished.
The method then builds a StreamNode from the transformation and finally connects it to the upstream nodes by creating StreamEdges.
protected Collection<Integer> translateInternal(
final Transformation<OUT> transformation,
final StreamOperatorFactory<OUT> operatorFactory,
final TypeInformation<IN> inputType,
@Nullable final KeySelector<IN, ?> stateKeySelector,
@Nullable final TypeInformation<?> stateKeyType,
final Context context) {
checkNotNull(transformation);
checkNotNull(operatorFactory);
checkNotNull(inputType);
checkNotNull(context);
final StreamGraph streamGraph = context.getStreamGraph();
final String slotSharingGroup = context.getSlotSharingGroup();
final int transformationId = transformation.getId();
final ExecutionConfig executionConfig = streamGraph.getExecutionConfig();
// add the StreamNode
streamGraph.addOperator(
transformationId,
slotSharingGroup,
transformation.getCoLocationGroupKey(),
operatorFactory,
inputType,
transformation.getOutputType(),
transformation.getName());
if (stateKeySelector != null) {
TypeSerializer<?> keySerializer = stateKeyType.createSerializer(executionConfig);
streamGraph.setOneInputStateKey(transformationId, stateKeySelector, keySerializer);
}
// resolve the parallelism
int parallelism =
transformation.getParallelism() != ExecutionConfig.PARALLELISM_DEFAULT
? transformation.getParallelism()
: executionConfig.getParallelism();
// set the parallelism
streamGraph.setParallelism(transformationId, parallelism);
streamGraph.setMaxParallelism(transformationId, transformation.getMaxParallelism());
final List<Transformation<?>> parentTransformations = transformation.getInputs();
checkState(
parentTransformations.size() == 1,
"Expected exactly one input transformation but found "
+ parentTransformations.size());
// add the StreamEdges connecting this node to its input
for (Integer inputId : context.getStreamNodeIds(parentTransformations.get(0))) {
streamGraph.addEdge(inputId, transformationId, 0);
}
return Collections.singleton(transformationId);
}
3.4. PartitionTransformationTranslator#translateForStreamingInternal
Translating a partition does not produce a concrete StreamNode or StreamEdge; instead, a virtual node is added.
When the downstream transform of the partition (such as a map) adds its edge (by calling StreamGraph.addEdge), the partition information is written into that edge.
// handling of logical transformations (partition, union, etc.)
@Override
protected Collection<Integer> translateForStreamingInternal(
final PartitionTransformation<OUT> transformation, final Context context) {
return translateInternal(transformation, context);
}
// handling of logical transformations (partition, union, etc.)
private Collection<Integer> translateInternal(
final PartitionTransformation<OUT> transformation, final Context context) {
checkNotNull(transformation);
checkNotNull(context);
final StreamGraph streamGraph = context.getStreamGraph();
final List<Transformation<?>> parentTransformations = transformation.getInputs();
checkState(
parentTransformations.size() == 1,
"Expected exactly one input transformation but found "
+ parentTransformations.size());
final Transformation<?> input = parentTransformations.get(0);
List<Integer> resultIds = new ArrayList<>();
for (Integer inputId : context.getStreamNodeIds(input)) {
// generate a new virtual id
final int virtualId = Transformation.getNewNodeId();
// add a virtual partition node; no StreamNode is created for it
streamGraph.addVirtualPartitionNode(
inputId,
virtualId,
transformation.getPartitioner(),
transformation.getShuffleMode());
resultIds.add(virtualId);
}
return resultIds;
}
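In the example job the PartitionTransformation with id=3 comes from the keyBy call: the KeyedStream it returns is backed by exactly such a transformation. A minimal sketch (`words` is the flatMap output from the sketch in section 2.3.3; getTransformation() and getPartitioner() are internal accessors used only for illustration):
KeyedStream<WordWithCount, String> keyed = words.keyBy(value -> value.word);

// the KeyedStream wraps a PartitionTransformation whose partitioner is a
// KeyGroupStreamPartitioner -- shown as "HASH" in the dumps (virtual node id 6)
PartitionTransformation<WordWithCount> partition =
        (PartitionTransformation<WordWithCount>) keyed.getTransformation();
partition.getPartitioner();   // KeyGroupStreamPartitioner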
3.5. StreamGraph#addEdge
Called from AbstractOneInputTransformationTranslator#translateInternal().
public void addEdge(Integer upStreamVertexID, Integer downStreamVertexID, int typeNumber) {
addEdgeInternal(
upStreamVertexID,
downStreamVertexID,
typeNumber,
null,
new ArrayList<String>(),
null,
null);
}
private void addEdgeInternal(
Integer upStreamVertexID,
Integer downStreamVertexID,
int typeNumber,
StreamPartitioner<?> partitioner,
List<String> outputNames,
OutputTag outputTag,
ShuffleMode shuffleMode) {
// if the upstream vertex is a virtual side-output node, recurse and pass the side-output information along
if (virtualSideOutputNodes.containsKey(upStreamVertexID)) {
int virtualId = upStreamVertexID;
upStreamVertexID = virtualSideOutputNodes.get(virtualId).f0;
if (outputTag == null) {
outputTag = virtualSideOutputNodes.get(virtualId).f1;
}
// recursive call
addEdgeInternal(
upStreamVertexID,
downStreamVertexID,
typeNumber,
partitioner,
null,
outputTag,
shuffleMode);
} else if (virtualPartitionNodes.containsKey(upStreamVertexID)) {
// if the upstream vertex is a virtual partition node, recurse and pass the partitioner information along
int virtualId = upStreamVertexID;
upStreamVertexID = virtualPartitionNodes.get(virtualId).f0;
if (partitioner == null) {
partitioner = virtualPartitionNodes.get(virtualId).f1;
}
shuffleMode = virtualPartitionNodes.get(virtualId).f2;
// recursive call
addEdgeInternal(
upStreamVertexID,
downStreamVertexID,
typeNumber,
partitioner,
outputNames,
outputTag,
shuffleMode);
} else {
// actually build the StreamEdge
// upstream node
StreamNode upstreamNode = getStreamNode(upStreamVertexID);
// downstream node
StreamNode downstreamNode = getStreamNode(downStreamVertexID);
// if no partitioner was specified: use ForwardPartitioner when the parallelism matches, otherwise RebalancePartitioner
// If no partitioner was specified and the parallelism of upstream and downstream
// operator matches use forward partitioning, use rebalance otherwise.
if (partitioner == null
&& upstreamNode.getParallelism() == downstreamNode.getParallelism()) {
partitioner = new ForwardPartitioner<Object>();
} else if (partitioner == null) {
partitioner = new RebalancePartitioner<Object>();
}
// a ForwardPartitioner with mismatching upstream/downstream parallelism is rejected with an exception
if (partitioner instanceof ForwardPartitioner) {
if (upstreamNode.getParallelism() != downstreamNode.getParallelism()) {
throw new UnsupportedOperationException(
"Forward partitioning does not allow "
+ "change of parallelism. Upstream operation: "
+ upstreamNode
+ " parallelism: "
+ upstreamNode.getParallelism()
+ ", downstream operation: "
+ downstreamNode
+ " parallelism: "
+ downstreamNode.getParallelism()
+ " You must use another partitioning strategy, such as broadcast, rebalance, shuffle or global.");
}
}
if (shuffleMode == null) {
shuffleMode = ShuffleMode.UNDEFINED;
}
// build the edge
StreamEdge edge =
new StreamEdge(
upstreamNode,
downstreamNode,
typeNumber,
partitioner,
outputTag,
shuffleMode);
// register the edge on both nodes
getStreamNode(edge.getSourceId()).addOutEdge(edge);
getStreamNode(edge.getTargetId()).addInEdge(edge);
}
}
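With no explicit partitioner the choice above only depends on the two parallelisms. A small hypothetical helper that mirrors the decision (not Flink code; the parallelism values are the ones from the dump in section 4.5):
// mirrors the default choice made in addEdgeInternal when partitioner == null
static StreamPartitioner<Object> defaultPartitioner(int upstreamParallelism, int downstreamParallelism) {
    return upstreamParallelism == downstreamParallelism
            ? new ForwardPartitioner<>()      // equal parallelism     -> FORWARD
            : new RebalancePartitioner<>();   // different parallelism -> REBALANCE
}

// Source (p=1)   -> Flat Map (p=4): defaultPartitioner(1, 4) -> REBALANCE
// Flat Map (p=4) -> Window (p=4)  : keyed edge, keeps the KeyGroupStreamPartitioner ("HASH")
// Window (p=4)   -> Sink (p=1)    : defaultPartitioner(4, 1) -> REBALANCE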
4. StreamGraph data structures
4.1. StreamGraph fields
StreamGraph has quite a few important fields, such as the job name, the configuration, the schedule mode (EAGER), the collection of StreamNodes, and so on.
private static final Logger LOG = LoggerFactory.getLogger(StreamGraph.class);
public static final String ITERATION_SOURCE_NAME_PREFIX = "IterationSource";
public static final String ITERATION_SINK_NAME_PREFIX = "IterationSink";
// job name
private String jobName;
// configuration
private final ExecutionConfig executionConfig;
private final CheckpointConfig checkpointConfig;
private SavepointRestoreSettings savepointRestoreSettings = SavepointRestoreSettings.none();
// ScheduleMode : EAGER
private ScheduleMode scheduleMode;
// true
private boolean chaining;
// user artifacts uploaded with the job
private Collection<Tuple2<String, DistributedCache.DistributedCacheEntry>> userArtifacts;
// {TimeCharacteristic@3582} "EventTime"
private TimeCharacteristic timeCharacteristic;
// ALL_EDGES_PIPELINED
private GlobalDataExchangeMode globalDataExchangeMode;
/**
* Flag to indicate whether to put all vertices into the same slot sharing group by default.
* */
private boolean allVerticesInSameSlotSharingGroupByDefault = true;
// the collection of StreamNodes
private Map<Integer, StreamNode> streamNodes;
// ids of the source nodes of the graph
private Set<Integer> sources;
// ids of the sink nodes of the graph (here: 5)
private Set<Integer> sinks;
private Map<Integer, Tuple2<Integer, OutputTag>> virtualSideOutputNodes;
// {Integer@3683} 6 -> {Tuple3@3914} "(2,HASH,UNDEFINED)"
private Map<Integer, Tuple3<Integer, StreamPartitioner<?>, ShuffleMode>> virtualPartitionNodes;
protected Map<Integer, String> vertexIDtoBrokerID;
protected Map<Integer, Long> vertexIDtoLoopTimeout;
private StateBackend stateBackend;
private Set<Tuple2<StreamNode, StreamNode>> iterationSourceSinkPairs;
private InternalTimeServiceManager.Provider timerServiceProvider;
4.2. The StreamNode class
A StreamNode represents a vertex in the graph. The most important fields are the parallelism, the operator name, and the incoming/outgoing edges.
private final int id;
// parallelism
private int parallelism;
// operator name
private final String operatorName;
private List<StreamEdge> inEdges = new ArrayList<StreamEdge>();
private List<StreamEdge> outEdges = new ArrayList<StreamEdge>();
// state partitioning
private KeySelector<?, ?>[] statePartitioners = new KeySelector[0];
private TypeSerializer<?> stateKeySerializer;
// other parameters
/**
* Maximum parallelism for this stream node.
* The maximum parallelism is the upper limit for dynamic scaling and the number of key groups used for partitioned state.
*/
private int maxParallelism;
private ResourceSpec minResources = ResourceSpec.DEFAULT;
private ResourceSpec preferredResources = ResourceSpec.DEFAULT;
private final Map<ManagedMemoryUseCase, Integer> managedMemoryOperatorScopeUseCaseWeights = new HashMap<>();
private final Set<ManagedMemoryUseCase> managedMemorySlotScopeUseCases = new HashSet<>();
private long bufferTimeout;
private @Nullable String slotSharingGroup;
private @Nullable String coLocationGroup;
private StreamOperatorFactory<?> operatorFactory;
private TypeSerializer<?>[] typeSerializersIn = new TypeSerializer[0];
private TypeSerializer<?> typeSerializerOut;
private final Class<? extends AbstractInvokable> jobVertexClass;
private InputFormat<?, ?> inputFormat;
private OutputFormat<?> outputFormat;
private String transformationUID;
private String userHash;
private final Map<Integer, StreamConfig.InputRequirement> inputRequirements = new HashMap<>();
4.3. The StreamEdge class
A StreamEdge is an edge in the streaming topology.
Such an edge does not necessarily translate into a connection between two job vertices (because of chaining / optimization).
The most important fields of a StreamEdge are sourceId / targetId and the partitioner-related information (outputPartitioner).
// id
private final String edgeId;
// source StreamNode id
private final int sourceId;
// target StreamNode id
private final int targetId;
/** The type number of the input for co-tasks. */
private final int typeNumber;
/** The side-output tag (if any) of this {@link StreamEdge}. */
private final OutputTag outputTag;
/**
* Partitioning information.
*
* The {@link StreamPartitioner} on this {@link StreamEdge}.
* */
private StreamPartitioner<?> outputPartitioner;
/**
* source StreamNode name
* The name of the operator in the source vertex.
* */
private final String sourceOperatorName;
/**
* target StreamNode name
* The name of the operator in the target vertex.
* */
private final String targetOperatorName;
/**
* PIPELINED
*/
private final ShuffleMode shuffleMode;
/**
* Buffer timeout.
*/
private long bufferTimeout;
4.4. StreamGraph methods
Most StreamGraph methods are setters/getters for the fields listed above.
Let's take a quick look at a few of the core ones.
4.4.1. addOperator
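translateInternal in section 3.3 calls the public 7-argument addOperator overload, which picks the invokable task class and delegates to the private variant shown below. A sketch of that overload (assuming Flink 1.12):
public <IN, OUT> void addOperator(
        Integer vertexID,
        @Nullable String slotSharingGroup,
        @Nullable String coLocationGroup,
        StreamOperatorFactory<OUT> operatorFactory,
        TypeInformation<IN> inTypeInfo,
        TypeInformation<OUT> outTypeInfo,
        String operatorName) {
    // sources run as SourceStreamTask, everything else translated here as OneInputStreamTask
    Class<? extends AbstractInvokable> invokableClass =
            operatorFactory.isStreamSource() ? SourceStreamTask.class : OneInputStreamTask.class;
    addOperator(
            vertexID,
            slotSharingGroup,
            coLocationGroup,
            operatorFactory,
            inTypeInfo,
            outTypeInfo,
            operatorName,
            invokableClass);
}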
private <IN, OUT> void addOperator(
Integer vertexID,
@Nullable String slotSharingGroup,
@Nullable String coLocationGroup,
StreamOperatorFactory<OUT> operatorFactory,
TypeInformation<IN> inTypeInfo,
TypeInformation<OUT> outTypeInfo,
String operatorName,
Class<? extends AbstractInvokable> invokableClass) {
// build the StreamNode and put it into the streamNodes map
// vertexID: 2
// slotSharingGroup : default
// coLocationGroup : null
// StreamOperatorFactory : SimpleUdfStreamOperatorFactory
// inTypeInfo : String
// outTypeInfo : PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>
// operatorName : Flat Map
// invokableClass : class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask
addNode(
vertexID,
slotSharingGroup,
coLocationGroup,
invokableClass,
operatorFactory,
operatorName);
setSerializers(vertexID, createSerializer(inTypeInfo), null, createSerializer(outTypeInfo));
if (operatorFactory.isOutputTypeConfigurable() && outTypeInfo != null) {
// sets the output type which must be know at StreamGraph creation time
operatorFactory.setOutputType(outTypeInfo, executionConfig);
}
if (operatorFactory.isInputTypeConfigurable()) {
operatorFactory.setInputType(inTypeInfo, executionConfig);
}
if (LOG.isDebugEnabled()) {
LOG.debug("Vertex: {}", vertexID);
}
}
4.4.2. addNode
protected StreamNode addNode(
Integer vertexID,
@Nullable String slotSharingGroup,
@Nullable String coLocationGroup,
Class<? extends AbstractInvokable> vertexClass,
StreamOperatorFactory<?> operatorFactory,
String operatorName) {
if (streamNodes.containsKey(vertexID)) {
throw new RuntimeException("Duplicate vertexID " + vertexID);
}
// vertexID: 2
// slotSharingGroup : default
// coLocationGroup : null
// vertexClass : class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask
// StreamOperatorFactory : SimpleUdfStreamOperatorFactory
// operatorName : Flat Map
StreamNode vertex =
new StreamNode(
vertexID,
slotSharingGroup,
coLocationGroup,
operatorFactory,
operatorName,
vertexClass);
// put it into the streamNodes map
streamNodes.put(vertexID, vertex);
return vertex;
}
4.4.3. addEdge
addEdge and addEdgeInternal are the methods already shown and annotated in section 3.5; the code is not repeated here.
4.5. Example data
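The JSON below is the plan of exactly this StreamGraph. It can be printed for any job before execution, for example as in this minimal sketch (StreamExecutionEnvironment#getExecutionPlan() builds the StreamGraph and returns its JSON representation):
// print the StreamGraph's JSON plan, then run the job
System.out.println(env.getExecutionPlan());
env.execute("Socket Window WordCount");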
{
"nodes" : [ {
"id" : 1,
"type" : "Source: Socket Stream",
"pact" : "Data Source",
"contents" : "Source: Socket Stream",
"parallelism" : 1
}, {
"id" : 2,
"type" : "Flat Map",
"pact" : "Operator",
"contents" : "Flat Map",
"parallelism" : 4,
"predecessors" : [ {
"id" : 1,
"ship_strategy" : "REBALANCE",
"side" : "second"
} ]
}, {
"id" : 4,
"type" : "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)",
"pact" : "Operator",
"contents" : "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)",
"parallelism" : 4,
"predecessors" : [ {
"id" : 2,
"ship_strategy" : "HASH",
"side" : "second"
} ]
}, {
"id" : 5,
"type" : "Sink: Print to Std. Out",
"pact" : "Data Sink",
"contents" : "Sink: Print to Std. Out",
"parallelism" : 1,
"predecessors" : [ {
"id" : 4,
"ship_strategy" : "REBALANCE",
"side" : "second"
} ]
} ]
}
streamGraph = {StreamGraph@3473}
jobName = "Socket Window WordCount"
executionConfig = {ExecutionConfig@3399} "ExecutionConfig{executionMode=PIPELINED, closureCleanerLevel=RECURSIVE, parallelism=4, maxParallelism=-1, numberOfExecutionRetries=-1, forceKryo=false, disableGenericTypes=false, enableAutoGeneratedUids=true, objectReuse=false, autoTypeRegistrationEnabled=true, forceAvro=false, autoWatermarkInterval=200, latencyTrackingInterval=0, isLatencyTrackingConfigured=false, executionRetryDelay=10000, restartStrategyConfiguration=Cluster level default restart strategy, taskCancellationIntervalMillis=-1, taskCancellationTimeoutMillis=-1, useSnapshotCompression=false, defaultInputDependencyConstraint=ANY, globalJobParameters=org.apache.flink.api.common.ExecutionConfig$GlobalJobParameters@1, registeredTypesWithKryoSerializers={}, registeredTypesWithKryoSerializerClasses={}, defaultKryoSerializers={}, defaultKryoSerializerClasses={}, registeredKryoTypes=[], registeredPojoTypes=[]}"
checkpointConfig = {CheckpointConfig@3465}
checkpointingMode = {CheckpointingMode@3936} "EXACTLY_ONCE"
checkpointInterval = -1
checkpointTimeout = 600000
minPauseBetweenCheckpoints = 0
maxConcurrentCheckpoints = 1
forceCheckpointing = false
forceUnalignedCheckpoints = false
unalignedCheckpointsEnabled = false
alignmentTimeout = 0
approximateLocalRecovery = false
externalizedCheckpointCleanup = null
failOnCheckpointingErrors = true
preferCheckpointForRecovery = false
tolerableCheckpointFailureNumber = -1
savepointRestoreSettings = {SavepointRestoreSettings@3467} "SavepointRestoreSettings.none()"
scheduleMode = {ScheduleMode@3580} "EAGER"
chaining = true
userArtifacts = {ArrayList@3581} size = 0
timeCharacteristic = {TimeCharacteristic@3582} "EventTime"
globalDataExchangeMode = {GlobalDataExchangeMode@3583} "ALL_EDGES_PIPELINED"
allVerticesInSameSlotSharingGroupByDefault = true
streamNodes = {HashMap@3584} size = 4
{Integer@3540} 1 -> {StreamNode@3929} "Source: Socket Stream-1"
key = {Integer@3540} 1
value = {StreamNode@3929} "Source: Socket Stream-1"
id = 1
parallelism = 1
maxParallelism = -1
minResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
preferredResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
managedMemoryOperatorScopeUseCaseWeights = {HashMap@3939} size = 0
managedMemorySlotScopeUseCases = {HashSet@3940} size = 0
bufferTimeout = -1
operatorName = "Source: Socket Stream"
slotSharingGroup = "default"
coLocationGroup = null
statePartitioners = {KeySelector[0]@3942}
stateKeySerializer = null
operatorFactory = {SimpleUdfStreamOperatorFactory@3498}
typeSerializersIn = {TypeSerializer[2]@3943}
typeSerializerOut = {StringSerializer@3944}
inEdges = {ArrayList@3945} size = 0
outEdges = {ArrayList@3946} size = 1
0 = {StreamEdge@3953} "(Source: Socket Stream-1 -> Flat Map-2, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
edgeId = "Source: Socket Stream-1_Flat Map-2_0_REBALANCE"
sourceId = 1
targetId = 2
typeNumber = 0
outputTag = null
outputPartitioner = {RebalancePartitioner@3971} "REBALANCE"
sourceOperatorName = "Source: Socket Stream"
targetOperatorName = "Flat Map"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
0 = {StreamEdge@3953} "(Source: Socket Stream-1 -> Flat Map-2, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
jobVertexClass = {Class@3947} "class org.apache.flink.streaming.runtime.tasks.SourceStreamTask"
inputFormat = null
outputFormat = null
transformationUID = null
userHash = null
sortedInputs = false
{Integer@3652} 2 -> {StreamNode@3715} "Flat Map-2"
key = {Integer@3652} 2
value = {StreamNode@3715} "Flat Map-2"
id = 2
parallelism = 4
maxParallelism = -1
minResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
preferredResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
managedMemoryOperatorScopeUseCaseWeights = {HashMap@3956} size = 0
managedMemorySlotScopeUseCases = {HashSet@3957} size = 0
bufferTimeout = -1
operatorName = "Flat Map"
slotSharingGroup = "default"
coLocationGroup = null
statePartitioners = {KeySelector[0]@3958}
stateKeySerializer = null
operatorFactory = {SimpleUdfStreamOperatorFactory@3490}
typeSerializersIn = {TypeSerializer[2]@3959}
typeSerializerOut = {PojoSerializer@3960}
inEdges = {ArrayList@3961} size = 1
0 = {StreamEdge@3953} "(Source: Socket Stream-1 -> Flat Map-2, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
edgeId = "Source: Socket Stream-1_Flat Map-2_0_REBALANCE"
sourceId = 1
targetId = 2
typeNumber = 0
outputTag = null
outputPartitioner = {RebalancePartitioner@3971} "REBALANCE"
sourceOperatorName = "Source: Socket Stream"
targetOperatorName = "Flat Map"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
outEdges = {ArrayList@3962} size = 1
0 = {StreamEdge@3968} "(Flat Map-2 -> Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4, typeNumber=0, outputPartitioner=HASH, bufferTimeout=-1, outputTag=null)"
edgeId = "Flat Map-2_Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4_0_HASH"
sourceId = 2
targetId = 4
typeNumber = 0
outputTag = null
outputPartitioner = {KeyGroupStreamPartitioner@3663} "HASH"
sourceOperatorName = "Flat Map"
targetOperatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
jobVertexClass = {Class@3963} "class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask"
inputFormat = null
outputFormat = null
transformationUID = null
userHash = null
sortedInputs = false
{Integer@3777} 4 -> {StreamNode@3781} "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4"
key = {Integer@3777} 4
value = {StreamNode@3781} "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4"
id = 4
parallelism = 4
maxParallelism = -1
minResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
preferredResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
managedMemoryOperatorScopeUseCaseWeights = {HashMap@3978} size = 0
managedMemorySlotScopeUseCases = {HashSet@3979} size = 1
bufferTimeout = -1
operatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
slotSharingGroup = "default"
coLocationGroup = null
statePartitioners = {KeySelector[1]@3980}
stateKeySerializer = {StringSerializer@3944}
operatorFactory = {SimpleUdfStreamOperatorFactory@3743}
typeSerializersIn = {TypeSerializer[2]@3981}
typeSerializerOut = {PojoSerializer@3982}
inEdges = {ArrayList@3983} size = 1
0 = {StreamEdge@3968} "(Flat Map-2 -> Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4, typeNumber=0, outputPartitioner=HASH, bufferTimeout=-1, outputTag=null)"
edgeId = "Flat Map-2_Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4_0_HASH"
sourceId = 2
targetId = 4
typeNumber = 0
outputTag = null
outputPartitioner = {KeyGroupStreamPartitioner@3663} "HASH"
sourceOperatorName = "Flat Map"
targetOperatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
outEdges = {ArrayList@3984} size = 1
0 = {StreamEdge@3989} "(Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4 -> Sink: Print to Std. Out-5, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
edgeId = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4_Sink: Print to Std. Out-5_0_REBALANCE"
sourceId = 4
targetId = 5
typeNumber = 0
outputTag = null
outputPartitioner = {RebalancePartitioner@3992} "REBALANCE"
sourceOperatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
targetOperatorName = "Sink: Print to Std. Out"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
jobVertexClass = {Class@3963} "class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask"
inputFormat = null
outputFormat = null
transformationUID = null
userHash = null
sortedInputs = false
{Integer@3921} 5 -> {StreamNode@3930} "Sink: Print to Std. Out-5"
key = {Integer@3921} 5
value = {StreamNode@3930} "Sink: Print to Std. Out-5"
id = 5
parallelism = 1
maxParallelism = -1
minResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
preferredResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
managedMemoryOperatorScopeUseCaseWeights = {HashMap@3998} size = 0
managedMemorySlotScopeUseCases = {HashSet@3999} size = 0
bufferTimeout = -1
operatorName = "Sink: Print to Std. Out"
slotSharingGroup = "default"
coLocationGroup = null
statePartitioners = {KeySelector[0]@4000}
stateKeySerializer = null
operatorFactory = {SimpleUdfStreamOperatorFactory@3810}
typeSerializersIn = {TypeSerializer[2]@4001}
typeSerializerOut = null
inEdges = {ArrayList@4002} size = 1
0 = {StreamEdge@3989} "(Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4 -> Sink: Print to Std. Out-5, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
edgeId = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4_Sink: Print to Std. Out-5_0_REBALANCE"
sourceId = 4
targetId = 5
typeNumber = 0
outputTag = null
outputPartitioner = {RebalancePartitioner@3992} "REBALANCE"
sourceOperatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
targetOperatorName = "Sink: Print to Std. Out"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
outEdges = {ArrayList@4003} size = 0
jobVertexClass = {Class@3963} "class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask"
inputFormat = null
outputFormat = null
transformationUID = null
userHash = null
sortedInputs = false
sources = {HashSet@3585} size = 1
0 = {Integer@3540} 1
sinks = {HashSet@3586} size = 1
0 = {Integer@3540} 5
virtualSideOutputNodes = {HashMap@3587} size = 0
virtualPartitionNodes = {HashMap@3588} size = 1
{Integer@3683} 6 -> {Tuple3@3914} "(2,HASH,UNDEFINED)"
vertexIDtoBrokerID = {HashMap@3589} size = 0
vertexIDtoLoopTimeout = {HashMap@3590} size = 0
stateBackend = null
iterationSourceSinkPairs = {HashSet@3591} size = 0
timerServiceProvider = null