1. Preface
StreamGraph: the initial graph generated from the code a user writes with the DataStream API; it describes the topology of the program.
Both the StreamGraph and the JobGraph are generated on the Flink client and then submitted to the Flink cluster; the conversion from JobGraph to ExecutionGraph happens in the JobMaster.
An application developed with the DataStream API is first turned into Transformations and then mapped to a StreamGraph. This graph is independent of the concrete execution; its core purpose is to express the logic of the computation.
The StreamGraph is generated in the Flink client: when the client submits the application it invokes its main method, the user's business logic is assembled into a pipeline of Transformations, and the construction of the StreamGraph is triggered when StreamExecutionEnvironment.execute() is finally called.
- StreamNode: the class that represents an operator, together with all related properties such as parallelism and incoming/outgoing edges.
A StreamNode is a node in the StreamGraph, converted from a Transformation; roughly speaking, one StreamNode corresponds to one operator.
- StreamEdge: the edge connecting two StreamNodes.
A StreamEdge is an edge in the StreamGraph that connects two StreamNodes; a StreamNode can have multiple outgoing and incoming edges. A StreamEdge carries information such as side outputs, the partitioner, and field selection.
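All debugger dumps in this article come from running Flink's SocketWindowWordCount example. A rough sketch of that pipeline is given below so the Transformation ids 1–5 that show up later are easier to follow (abridged from the Flink examples; the parallelism of 4 and the shape of the WordWithCount POJO are assumptions read off the dumps):
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        // LegacySourceTransformation, id=1, "Socket Stream", parallelism=1
        DataStream<String> text = env.socketTextStream("localhost", 9999, "\n");

        DataStream<WordWithCount> windowCounts = text
                // OneInputTransformation, id=2, "Flat Map"
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) {
                        for (String word : value.split("\\s")) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                })
                // PartitionTransformation, id=3, "Partition" (KeyGroupStreamPartitioner / HASH)
                .keyBy(value -> value.word)
                // OneInputTransformation, id=4, "Window(TumblingProcessingTimeWindows(5000), ...)"
                .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                .reduce((a, b) -> new WordWithCount(a.word, a.count + b.count));

        // LegacySinkTransformation, id=5, "Print to Std. Out", parallelism=1
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }

    /** Simple POJO; the field names match the PojoType shown in the dumps. */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }
}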
2. Code
2.1. Entry point
Calling StreamExecutionEnvironment.execute() in the user code leads to the following chain:
-> execute(getJobName())
-> execute(getStreamGraph(jobName))
-> getStreamGraph(jobName, true)
So the final entry point is StreamExecutionEnvironment#getStreamGraph.
/**
*
* getStreamGraph calls StreamGraphGenerator#generate, which
* builds the stream graph from the StreamExecutionEnvironment and all transformations it contains.
*
* Getter of the {@link org.apache.flink.streaming.api.graph.StreamGraph StreamGraph} of the
* streaming job with the option to clear previously registered {@link Transformation
* transformations}. Clearing the transformations allows, for example, to not re-execute the
* same operations when calling {@link #execute()} multiple times.
*
* @param jobName Desired name of the job
* @param clearTransformations Whether or not to clear previously registered transformations
* @return The streamgraph representing the transformations
*/
@Internal
public StreamGraph getStreamGraph(String jobName, boolean clearTransformations) {
// obtain the StreamGraphGenerator and use it to build the StreamGraph
StreamGraph streamGraph = getStreamGraphGenerator().setJobName(jobName).generate();
if (clearTransformations) {
this.transformations.clear();
}
return streamGraph;
}
2.2. StreamGraphGenerator#generate
Here a StreamGraphGenerator is obtained via getStreamGraphGenerator(), and its generate() method then builds the StreamGraph.
A key field is List<Transformation<?>> transformations.
A Transformation represents the operation that creates a new DataStream from one or more DataStreams.
Under the hood, every DataStream is backed by a Transformation that describes how that DataStream was produced.
// build the StreamGraph
public StreamGraph generate() {
// streamGraph = {StreamGraph@4411}
// jobName = null
// executionConfig = {ExecutionConfig@4105} "ExecutionConfig{executionMode=PIPELINED, closureCleanerLevel=RECURSIVE, parallelism=1, maxParallelism=-1, numberOfExecutionRetries=-1, forceKryo=false, disableGenericTypes=false, enableAutoGeneratedUids=true, objectReuse=false, autoTypeRegistrationEnabled=true, forceAvro=false, autoWatermarkInterval=200, latencyTrackingInterval=0, isLatencyTrackingConfigured=false, executionRetryDelay=10000, restartStrategyConfiguration=Cluster level default restart strategy, taskCancellationIntervalMillis=-1, taskCancellationTimeoutMillis=-1, useSnapshotCompression=false, defaultInputDependencyConstraint=ANY, globalJobParameters=org.apache.flink.api.common.ExecutionConfig$GlobalJobParameters@1, registeredTypesWithKryoSerializers={}, registeredTypesWithKryoSerializerClasses={}, defaultKryoSerializers={}, defaultKryoSerializerClasses={}, registeredKryoTypes=[], registeredPojoTypes=[]}"
// executionMode = {ExecutionMode@3509} "PIPELINED"
// closureCleanerLevel = {ExecutionConfig$ClosureCleanerLevel@3510} "RECURSIVE"
// parallelism = 4
// maxParallelism = -1
// numberOfExecutionRetries = -1
// forceKryo = false
// disableGenericTypes = false
// enableAutoGeneratedUids = true
// objectReuse = false
// autoTypeRegistrationEnabled = true
// forceAvro = false
// autoWatermarkInterval = 200
// latencyTrackingInterval = 0
// isLatencyTrackingConfigured = false
// executionRetryDelay = 10000
// restartStrategyConfiguration = {RestartStrategies$FallbackRestartStrategyConfiguration@3511} "Cluster level default restart strategy"
// taskCancellationIntervalMillis = -1
// taskCancellationTimeoutMillis = -1
// useSnapshotCompression = false
// defaultInputDependencyConstraint = {InputDependencyConstraint@3512} "ANY"
// globalJobParameters = {ExecutionConfig$GlobalJobParameters@3513}
// registeredTypesWithKryoSerializers = {LinkedHashMap@3514} size = 0
// registeredTypesWithKryoSerializerClasses = {LinkedHashMap@3515} size = 0
// defaultKryoSerializers = {LinkedHashMap@3516} size = 0
// defaultKryoSerializerClasses = {LinkedHashMap@3517} size = 0
// registeredKryoTypes = {LinkedHashSet@3518} size = 0
// registeredPojoTypes = {LinkedHashSet@3519} size = 0
// checkpointConfig = {CheckpointConfig@4399}
// checkpointingMode = {CheckpointingMode@3524} "EXACTLY_ONCE"
// checkpointInterval = -1
// checkpointTimeout = 600000
// minPauseBetweenCheckpoints = 0
// maxConcurrentCheckpoints = 1
// forceCheckpointing = false
// forceUnalignedCheckpoints = false
// unalignedCheckpointsEnabled = false
// alignmentTimeout = 0
// approximateLocalRecovery = false
// externalizedCheckpointCleanup = null
// failOnCheckpointingErrors = true
// preferCheckpointForRecovery = false
// tolerableCheckpointFailureNumber = -1
// savepointRestoreSettings = {SavepointRestoreSettings@4412} "SavepointRestoreSettings.none()"
// scheduleMode = null
// chaining = false
// userArtifacts = null
// timeCharacteristic = null
// globalDataExchangeMode = null
// allVerticesInSameSlotSharingGroupByDefault = true
// streamNodes = {HashMap@4415} size = 0
// sources = {HashSet@4416} size = 0
// sinks = {HashSet@4417} size = 0
// virtualSideOutputNodes = {HashMap@4418} size = 0
// virtualPartitionNodes = {HashMap@4419} size = 0
// vertexIDtoBrokerID = {HashMap@4420} size = 0
// vertexIDtoLoopTimeout = {HashMap@4421} size = 0
// stateBackend = null
// iterationSourceSinkPairs = {HashSet@4422} size = 0
// timerServiceProvider = null
streamGraph = new StreamGraph(executionConfig, checkpointConfig, savepointRestoreSettings);
// STREAMING
shouldExecuteInBatchMode = shouldExecuteInBatchMode(runtimeExecutionMode);
configureStreamGraph(streamGraph);
alreadyTransformed = new HashMap<>();
// An example of the transformations list. It contains three elements;
// only the most recent one (index 2) is shown here.
//
// {LegacySinkTransformation@3460} "LegacySinkTransformation{id=5, name='Print to Std. Out', outputType=GenericType<java.lang.Object>, parallelism=1}"
// | input = {OneInputTransformation@3443} "OneInputTransformation{id=4, name='Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)', outputType=PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>, parallelism=4}"
// | | input = {PartitionTransformation@3577} "PartitionTransformation{id=3, name='Partition', outputType=PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>, parallelism=4}"
// | | | input = {OneInputTransformation@3364} "OneInputTransformation{id=2, name='Flat Map', outputType=PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>, parallelism=4}"
// | | | |---- input = {LegacySourceTransformation@3548} "LegacySourceTransformation{id=1, name='Socket Stream', outputType=String, parallelism=1}"
// | | | | |---- operatorFactory = {SimpleUdfStreamOperatorFactory@3555}
// | | | | |---- boundedness = {Boundedness@3253} "CONTINUOUS_UNBOUNDED"
// | | | | |---- id = 1
// | | | | |---- name = "Socket Stream"
// | | | | |---- outputType = {BasicTypeInfo@3254} "String"
// | | | | |---- typeUsed = true
// | | | | |---- parallelism = 1
// | | | | |---- maxParallelism = -1
// | | | | |---- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | | | |---- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | | | |---- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3556} size = 0
// | | | | |---- managedMemorySlotScopeUseCases = {HashSet@3557} size = 0
// | | | | |---- uid = null
// | | | | |---- userProvidedNodeHash = null
// | | | | |---- bufferTimeout = -1
// | | | | |---- slotSharingGroup = null
// | | | | |---- coLocationGroupKey = null
// | | | |---- operatorFactory = {SimpleUdfStreamOperatorFactory@3370}
// | | | |---- stateKeySelector = null
// | | | |---- stateKeyType = null
// | | | |---- id = 2
// | | | |---- name = "Flat Map"
// | | | |---- outputType = {PojoTypeInfo@3369} "PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>"
// | | | |---- typeUsed = true
// | | | |---- parallelism = 4
// | | | |---- maxParallelism = -1
// | | | |---- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | | |---- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | | |---- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3550} size = 0
// | | | |---- managedMemorySlotScopeUseCases = {HashSet@3551} size = 0
// | | | |---- uid = null
// | | | |---- userProvidedNodeHash = null
// | | | |---- bufferTimeout = -1
// | | | |---- slotSharingGroup = null
// | | | |---- coLocationGroupKey = null
// | | |-- partitioner = {KeyGroupStreamPartitioner@3591} "HASH"
// | | |-- shuffleMode = {ShuffleMode@3592} "UNDEFINED"
// | | |-- id = 3
// | | |-- name = "Partition"
// | | |-- outputType = {PojoTypeInfo@3369} "PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>"
// | | |-- typeUsed = true
// | | |-- parallelism = 4
// | | |-- maxParallelism = -1
// | | |-- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | |-- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | | |-- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3594} size = 0
// | | |-- managedMemorySlotScopeUseCases = {HashSet@3595} size = 0
// | | |-- uid = null
// | | |-- userProvidedNodeHash = null
// | | |-- bufferTimeout = -1
// | | |-- slotSharingGroup = null
// | | |-- coLocationGroupKey = null
// | |-- operatorFactory = {SimpleUdfStreamOperatorFactory@3578}
// | |-- stateKeySelector = {SocketWindowWordCount$lambda@3579}
// | |-- stateKeyType = {BasicTypeInfo@3254} "String"
// | |-- id = 4
// | |-- name = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
// | |-- outputType = {PojoTypeInfo@3369} "PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>"
// | |-- typeUsed = true
// | |-- parallelism = 4
// | |-- maxParallelism = -1
// | |-- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | |-- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// | |-- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3580} size = 0
// | |-- managedMemorySlotScopeUseCases = {HashSet@3581} size = 1
// | |-- uid = null
// | |-- userProvidedNodeHash = null
// | |-- bufferTimeout = -1
// | |-- slotSharingGroup = null
// | |-- coLocationGroupKey = null
// |-- operatorFactory = {SimpleUdfStreamOperatorFactory@3606}
// |-- stateKeySelector = null
// |-- stateKeyType = null
// |-- id = 5
// |-- name = "Print to Std. Out"
// |-- outputType = {GenericTypeInfo@3608} "GenericType<java.lang.Object>"
// |-- typeUsed = false
// |-- parallelism = 1
// |-- maxParallelism = -1
// |-- minResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// |-- preferredResources = {ResourceSpec@3549} "ResourceSpec{UNKNOWN}"
// |-- managedMemoryOperatorScopeUseCaseWeights = {HashMap@3609} size = 0
// |-- managedMemorySlotScopeUseCases = {HashSet@3610} size = 0
// |-- uid = null
// |-- userProvidedNodeHash = null
// |-- bufferTimeout = -1
// |-- slotSharingGroup = null
// |-- coLocationGroupKey = null
// the transformations list is populated in DataStream#doTransform
for (Transformation<?> transformation : transformations) {
// traverse every transformation and translate it into the graph
transform(transformation);
}
final StreamGraph builtStreamGraph = streamGraph;
alreadyTransformed.clear();
alreadyTransformed = null;
streamGraph = null;
return builtStreamGraph;
}
2.3. Transformation example: DataStream#map
A map transformation wraps the user-defined MapFunction into a StreamMap operator,
then wraps the StreamMap into a OneInputTransformation, and finally stores that transformation in the environment.
When env.execute() is called, the collection of transformations is traversed to build the StreamGraph.
Common transformations on a DataStream are map, flatMap, filter, and so on.
These transformations build up a tree of Transformations, and that tree is turned into the StreamGraph.
Taking map as an example, let's look at the data in List<Transformation<?>> transformations:
2.3.1. DataStream#map
/**
*
* Applies a Map transformation on a {@link DataStream}.
*
* The transformation calls a {@link MapFunction} for each element of the DataStream.
*
* Each MapFunction call returns exactly one element.
*
* The user can also extend {@link RichMapFunction} to gain access to other features
* provided by the {@link org.apache.flink.api.common.functions.RichFunction} interface.
*
* @param mapper The MapFunction that is called for each element of the DataStream.
* @param outputType {@link TypeInformation} for the result type of the function.
* @param <R> output type
* @return The transformed {@link DataStream}.
*/
public <R> SingleOutputStreamOperator<R> map(
MapFunction<T, R> mapper, TypeInformation<R> outputType) {
// returns a new DataStream; StreamMap is an implementation of StreamOperator
return transform("Map", outputType, new StreamMap<>(clean(mapper)));
}
2.3.2. DataStream#transform
This method passes the user-defined operator, together with the type information of the DataStream that the transformation produces.
/**
*
*
* Method for passing user defined operators along with the type information that will transform
* the DataStream.
*
* @param operatorName name of the operator, for logging purposes
* @param outTypeInfo the output type of the operator
* @param operator the object containing the transformation logic
* @param <R> type of the return stream
* @return the data stream constructed
* @see #transform(String, TypeInformation, OneInputStreamOperatorFactory)
*/
@PublicEvolving
public <R> SingleOutputStreamOperator<R> transform(
String operatorName,
TypeInformation<R> outTypeInfo,
OneInputStreamOperator<T, R> operator) {
// operatorName: Flat Map
// outTypeInfo : PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>
// operator : StreamFlatMap
return doTransform(operatorName, outTypeInfo, SimpleOperatorFactory.of(operator));
}
2.3.3. DataStream#doTransform
protected <R> SingleOutputStreamOperator<R> doTransform(
String operatorName,
TypeInformation<R> outTypeInfo,
StreamOperatorFactory<R> operatorFactory) {
// read the output type of the input Transform to coax out errors about MissingTypeInfo
transformation.getOutputType();
// build a OneInputTransformation (a single-input transformation)
OneInputTransformation<T, R> resultTransform =
new OneInputTransformation<>(
this.transformation,
operatorName,
operatorFactory,
outTypeInfo,
environment.getParallelism());
// the new transformation is linked to the transformation of the current DataStream, building up a tree
@SuppressWarnings({"unchecked", "rawtypes"})
SingleOutputStreamOperator<R> returnStream =
new SingleOutputStreamOperator(environment, resultTransform);
// add it to the List<Transformation<?>> transformations collection of the StreamExecutionEnvironment
getExecutionEnvironment().addOperator(resultTransform);
return returnStream;
}
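A minimal sketch of what doTransform builds for the example job (WordTokenizer is a hypothetical name standing in for the example's anonymous FlatMapFunction; getTransformation() and getInput() are @Internal accessors, used here only for illustration):
// assuming `text` is the socket source stream from the pipeline sketch in the preface
SingleOutputStreamOperator<WordWithCount> words = text.flatMap(new WordTokenizer());
// flatMap/doTransform registered a OneInputTransformation ("Flat Map", id=2) in the environment

// the new transformation points back to its input, forming the tree seen in the debugger dump above
OneInputTransformation<String, WordWithCount> flatMapTransform =
        (OneInputTransformation<String, WordWithCount>) words.getTransformation();
flatMapTransform.getInput();   // the LegacySourceTransformation of the socket source (id=1)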
2.3.4. Transformation
Not every Transformation is turned into a physical operation at the runtime layer. Some are purely logical concepts, such as union, split/select, and partition. The transformation tree shown in the Javadoc below is optimized at runtime into the operation graph underneath it.
- This is spelled out in the class comment of Transformation.
The information of union, split/select (removed in 1.12), and partition is encoded into the edge from Source to Map.
The source code also shows that UnionTransformation, SplitTransformation (removed in 1.12), SelectTransformation (removed in 1.12),
and PartitionTransformation have no StreamOperator member variable, because they carry no concrete operation,
whereas essentially all other Transformation subclasses do.
/**
* A {@code Transformation} represents the operation that creates a DataStream.
* Every DataStream has an underlying {@code Transformation} that is the origin of said DataStream.
*
* API operations such as DataStream#map build a tree of {@code Transformation}s under the hood.
*
* When the stream program is executed, this graph is translated into a StreamGraph by the StreamGraphGenerator.
*
* A {@code Transformation} does not necessarily correspond to a physical operation at runtime.
*
* Some operations are only logical concepts, for example:
*
*
* <pre>{@code
* Source Source
* + +
* | |
* v v
* Rebalance HashPartition
* + +
* | |
* | |
* +------>Union<------+
* +
* |
* v
* Split
* +
* |
* v
* Select
* +
* v
* Map
* +
* |
* v
* Sink
* }</pre>
*
* would be optimized at runtime into this operation graph:
*
* <pre>{@code
* Source Source
* + +
* | |
* | |
* +------->Map<-------+
* +
* |
* v
* Sink
* }</pre>
*
* Operations such as partitioning, union, and split/select end up being encoded in the edges that connect to the map operator.
*
**/
3. Source code of StreamGraph generation
3.1. StreamGraphGenerator#transform
StreamGraphGenerator#generate() iterates over the transformations and calls transform() for each of them.
/**
* Transforms one {@code Transformation}.
*
*
* <p>This checks whether we already transformed it and exits early in that case.
* If not it delegates to one of the transformation specific methods.
*/
// translate each transformation into StreamNodes and StreamEdges of the StreamGraph
// the return value is the collection of ids for this transform, usually of size 1 (except for FeedbackTransformation)
private Collection<Integer> transform(Transformation<?> transform) {
// if this transformation has already been translated, return the cached result directly
if (alreadyTransformed.containsKey(transform)) {
return alreadyTransformed.get(transform);
}
LOG.debug("Transforming " + transform);
if (transform.getMaxParallelism() <= 0) {
// if the max parallelism hasn't been set, then first use the job wide max parallelism
// from the ExecutionConfig.
int globalMaxParallelismFromConfig = executionConfig.getMaxParallelism();
if (globalMaxParallelismFromConfig > 0) {
transform.setMaxParallelism(globalMaxParallelismFromConfig);
}
}
// called to trigger a MissingTypeInfo exception if the output type is not set
// call at least once to trigger exceptions about MissingTypeInfo
transform.getOutputType();
@SuppressWarnings("unchecked")
final TransformationTranslator<?, Transformation<?>> translator =
(TransformationTranslator<?, Transformation<?>>)
translatorMap.get(transform.getClass());
Collection<Integer> transformedIds;
if (translator != null) {
// delegate to the translator registered for this transformation type
transformedIds = translate(translator, transform);
} else {
transformedIds = legacyTransform(transform);
}
// need this check because the iterate transformation adds itself before
// transforming the feedback edges
if (!alreadyTransformed.containsKey(transform)) {
alreadyTransformed.put(transform, transformedIds);
}
return transformedIds;
}
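The translatorMap looked up above is populated in a static initializer of StreamGraphGenerator. An abridged sketch (assuming Flink 1.12; only the translators relevant to the example job are listed, the real map registers more):
private static final Map<
                Class<? extends Transformation>,
                TransformationTranslator<?, ? extends Transformation>>
        translatorMap;

static {
    @SuppressWarnings("rawtypes")
    Map<Class<? extends Transformation>, TransformationTranslator<?, ? extends Transformation>>
            tmp = new HashMap<>();
    tmp.put(OneInputTransformation.class, new OneInputTransformationTranslator<>());        // map, flatMap, window, ...
    tmp.put(PartitionTransformation.class, new PartitionTransformationTranslator<>());      // keyBy, rebalance, ...
    tmp.put(SideOutputTransformation.class, new SideOutputTransformationTranslator<>());    // getSideOutput
    tmp.put(UnionTransformation.class, new UnionTransformationTranslator<>());
    tmp.put(LegacySourceTransformation.class, new LegacySourceTransformationTranslator<>());
    tmp.put(LegacySinkTransformation.class, new LegacySinkTransformationTranslator<>());
    // ... further entries omitted
    translatorMap = Collections.unmodifiableMap(tmp);
}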
private Collection<Integer> translate(
final TransformationTranslator<?, Transformation<?>> translator,
final Transformation<?> transform) {
checkNotNull(translator);
checkNotNull(transform);
final List<Collection<Integer>> allInputIds = getParentInputIds(transform.getInputs());
// the appropriate translation logic is invoked depending on the transformation type
// if it has already been handled, return directly
// the recursive call might have already transformed this
if (alreadyTransformed.containsKey(transform)) {
return alreadyTransformed.get(transform);
}
// determine the slot sharing group (here: "default")
final String slotSharingGroup =
determineSlotSharingGroup(
transform.getSlotSharingGroup(),
allInputIds.stream()
.flatMap(Collection::stream)
.collect(Collectors.toList()));
final TransformationTranslator.Context context =
new ContextImpl(this, streamGraph, slotSharingGroup, configuration);
// dispatch depending on the execution mode; for a streaming job this goes to translateForStreaming
return shouldExecuteInBatchMode
? translator.translateForBatch(transform, context) // batch execution
: translator.translateForStreaming(transform, context); // streaming execution
}
3.2. SimpleTransformationTranslator#translateForStreaming
@Override
public final Collection<Integer> translateForStreaming(
final T transformation, final Context context) {
checkNotNull(transformation);
checkNotNull(context);
// the actual work happens in the subclass-specific translateForStreamingInternal
final Collection<Integer> transformedIds =
translateForStreamingInternal(transformation, context);
configure(transformation, context);
return transformedIds;
}
/**
* Operator-to-translator mapping:
* partition-related transformations (e.g. keyBy): PartitionTransformationTranslator#translateForStreamingInternal
* map / flatMap transformations: OneInputTransformationTranslator#translateForStreamingInternal
*
* Translates a given {@link Transformation} to its runtime implementation for STREAMING-style
* execution.
*
* @param transformation The transformation to be translated.
* @param context The translation context.
* @return The ids of the "last" {@link StreamNode StreamNodes} in the transformation graph
* corresponding to this transformation. These will be the nodes that a potential following
* transformation will need to connect to.
*/
protected abstract Collection<Integer> translateForStreamingInternal(
final T transformation, final Context context);
/**
* OneInputTransformationTranslator#translateForStreamingInternal
* handles single-input transformations such as map and flatMap
*
* @param transformation The transformation to be translated.
* @param context The translation context.
* @return
*/
@Override
public Collection<Integer> translateForStreamingInternal(
final OneInputTransformation<IN, OUT> transformation, final Context context) {
return translateInternal(
transformation,
transformation.getOperatorFactory(),
transformation.getInputType(),
transformation.getStateKeySelector(),
transformation.getStateKeyType(),
context);
}
3.3. AbstractOneInputTransformationTranslator#translateInternal
Before this method is called, the upstream transformations have already been translated recursively, so all upstream conversions are guaranteed to be finished.
The method then builds a StreamNode from the transformation and finally connects it to the upstream nodes by creating StreamEdges.
protected Collection<Integer> translateInternal(
final Transformation<OUT> transformation,
final StreamOperatorFactory<OUT> operatorFactory,
final TypeInformation<IN> inputType,
@Nullable final KeySelector<IN, ?> stateKeySelector,
@Nullable final TypeInformation<?> stateKeyType,
final Context context) {
checkNotNull(transformation);
checkNotNull(operatorFactory);
checkNotNull(inputType);
checkNotNull(context);
final StreamGraph streamGraph = context.getStreamGraph();
final String slotSharingGroup = context.getSlotSharingGroup();
final int transformationId = transformation.getId();
final ExecutionConfig executionConfig = streamGraph.getExecutionConfig();
// add the StreamNode
streamGraph.addOperator(
transformationId,
slotSharingGroup,
transformation.getCoLocationGroupKey(),
operatorFactory,
inputType,
transformation.getOutputType(),
transformation.getName());
if (stateKeySelector != null) {
TypeSerializer<?> keySerializer = stateKeyType.createSerializer(executionConfig);
streamGraph.setOneInputStateKey(transformationId, stateKeySelector, keySerializer);
}
// resolve the parallelism
int parallelism =
transformation.getParallelism() != ExecutionConfig.PARALLELISM_DEFAULT
? transformation.getParallelism()
: executionConfig.getParallelism();
// set the parallelism
streamGraph.setParallelism(transformationId, parallelism);
streamGraph.setMaxParallelism(transformationId, transformation.getMaxParallelism());
final List<Transformation<?>> parentTransformations = transformation.getInputs();
checkState(
parentTransformations.size() == 1,
"Expected exactly one input transformation but found "
+ parentTransformations.size());
// add the StreamEdges connecting this node to its input
for (Integer inputId : context.getStreamNodeIds(parentTransformations.get(0))) {
streamGraph.addEdge(inputId, transformationId, 0);
}
return Collections.singleton(transformationId);
}
3.4. PartitionTransformationTranslator#translateForStreamingInternal
Translating a partition does not produce a concrete StreamNode or StreamEdge; instead, a virtual node is added.
When the downstream transform of the partition (such as a map) adds its edge (by calling StreamGraph.addEdge), the partition information is written into that edge.
// handling of logical transformations (partition, union, etc.)
@Override
protected Collection<Integer> translateForStreamingInternal(
final PartitionTransformation<OUT> transformation, final Context context) {
return translateInternal(transformation, context);
}
// handling of logical transformations (partition, union, etc.)
private Collection<Integer> translateInternal(
final PartitionTransformation<OUT> transformation, final Context context) {
checkNotNull(transformation);
checkNotNull(context);
final StreamGraph streamGraph = context.getStreamGraph();
final List<Transformation<?>> parentTransformations = transformation.getInputs();
checkState(
parentTransformations.size() == 1,
"Expected exactly one input transformation but found "
+ parentTransformations.size());
final Transformation<?> input = parentTransformations.get(0);
List<Integer> resultIds = new ArrayList<>();
for (Integer inputId : context.getStreamNodeIds(input)) {
// generate a new virtual id
final int virtualId = Transformation.getNewNodeId();
// add a virtual partition node; no StreamNode is created for it
streamGraph.addVirtualPartitionNode(
inputId,
virtualId,
transformation.getPartitioner(),
transformation.getShuffleMode());
resultIds.add(virtualId);
}
return resultIds;
}
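In the example job the PartitionTransformation with id=3 comes from the keyBy call: the KeyedStream it returns is backed by exactly such a transformation. A minimal sketch (`words` is the flatMap output from the sketch in section 2.3.3; getTransformation() and getPartitioner() are internal accessors used only for illustration):
KeyedStream<WordWithCount, String> keyed = words.keyBy(value -> value.word);

// the KeyedStream wraps a PartitionTransformation whose partitioner is a
// KeyGroupStreamPartitioner -- shown as "HASH" in the dumps (virtual node id 6)
PartitionTransformation<WordWithCount> partition =
        (PartitionTransformation<WordWithCount>) keyed.getTransformation();
partition.getPartitioner();   // KeyGroupStreamPartitioner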
3.5. StreamGraph#addEdge
Called from AbstractOneInputTransformationTranslator#translateInternal().
public void addEdge(Integer upStreamVertexID, Integer downStreamVertexID, int typeNumber) {
addEdgeInternal(
upStreamVertexID,
downStreamVertexID,
typeNumber,
null,
new ArrayList<String>(),
null,
null);
}
private void addEdgeInternal(
Integer upStreamVertexID,
Integer downStreamVertexID,
int typeNumber,
StreamPartitioner<?> partitioner,
List<String> outputNames,
OutputTag outputTag,
ShuffleMode shuffleMode) {
// if the upstream vertex is a virtual side-output node, recurse and pass the side-output information along
if (virtualSideOutputNodes.containsKey(upStreamVertexID)) {
int virtualId = upStreamVertexID;
upStreamVertexID = virtualSideOutputNodes.get(virtualId).f0;
if (outputTag == null) {
outputTag = virtualSideOutputNodes.get(virtualId).f1;
}
// recursive call
addEdgeInternal(
upStreamVertexID,
downStreamVertexID,
typeNumber,
partitioner,
null,
outputTag,
shuffleMode);
} else if (virtualPartitionNodes.containsKey(upStreamVertexID)) {
// if the upstream vertex is a virtual partition node, recurse and pass the partitioner information along
int virtualId = upStreamVertexID;
upStreamVertexID = virtualPartitionNodes.get(virtualId).f0;
if (partitioner == null) {
partitioner = virtualPartitionNodes.get(virtualId).f1;
}
shuffleMode = virtualPartitionNodes.get(virtualId).f2;
// recursive call
addEdgeInternal(
upStreamVertexID,
downStreamVertexID,
typeNumber,
partitioner,
outputNames,
outputTag,
shuffleMode);
} else {
// actually build the StreamEdge
// upstream node
StreamNode upstreamNode = getStreamNode(upStreamVertexID);
// downstream node
StreamNode downstreamNode = getStreamNode(downStreamVertexID);
// if no partitioner was specified: use ForwardPartitioner when the parallelism matches, otherwise RebalancePartitioner
// If no partitioner was specified and the parallelism of upstream and downstream
// operator matches use forward partitioning, use rebalance otherwise.
if (partitioner == null
&& upstreamNode.getParallelism() == downstreamNode.getParallelism()) {
partitioner = new ForwardPartitioner<Object>();
} else if (partitioner == null) {
partitioner = new RebalancePartitioner<Object>();
}
// a ForwardPartitioner with mismatching upstream/downstream parallelism is rejected with an exception
if (partitioner instanceof ForwardPartitioner) {
if (upstreamNode.getParallelism() != downstreamNode.getParallelism()) {
throw new UnsupportedOperationException(
"Forward partitioning does not allow "
+ "change of parallelism. Upstream operation: "
+ upstreamNode
+ " parallelism: "
+ upstreamNode.getParallelism()
+ ", downstream operation: "
+ downstreamNode
+ " parallelism: "
+ downstreamNode.getParallelism()
+ " You must use another partitioning strategy, such as broadcast, rebalance, shuffle or global.");
}
}
if (shuffleMode == null) {
shuffleMode = ShuffleMode.UNDEFINED;
}
// build the edge
StreamEdge edge =
new StreamEdge(
upstreamNode,
downstreamNode,
typeNumber,
partitioner,
outputTag,
shuffleMode);
// register the edge on both nodes
getStreamNode(edge.getSourceId()).addOutEdge(edge);
getStreamNode(edge.getTargetId()).addInEdge(edge);
}
}
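With no explicit partitioner the choice above only depends on the two parallelisms. A small hypothetical helper that mirrors the decision (not Flink code; the parallelism values are the ones from the dump in section 4.5):
// mirrors the default choice made in addEdgeInternal when partitioner == null
static StreamPartitioner<Object> defaultPartitioner(int upstreamParallelism, int downstreamParallelism) {
    return upstreamParallelism == downstreamParallelism
            ? new ForwardPartitioner<>()      // equal parallelism     -> FORWARD
            : new RebalancePartitioner<>();   // different parallelism -> REBALANCE
}

// Source (p=1)   -> Flat Map (p=4): defaultPartitioner(1, 4) -> REBALANCE
// Flat Map (p=4) -> Window (p=4)  : keyed edge, keeps the KeyGroupStreamPartitioner ("HASH")
// Window (p=4)   -> Sink (p=1)    : defaultPartitioner(4, 1) -> REBALANCE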
4. StreamGraph data structures
4.1. StreamGraph fields
StreamGraph has quite a few important fields, such as the job name, the configuration, the schedule mode (EAGER), the collection of StreamNodes, and so on.
private static final Logger LOG = LoggerFactory.getLogger(StreamGraph.class);
public static final String ITERATION_SOURCE_NAME_PREFIX = "IterationSource";
public static final String ITERATION_SINK_NAME_PREFIX = "IterationSink";
// job name
private String jobName;
// configuration
private final ExecutionConfig executionConfig;
private final CheckpointConfig checkpointConfig;
private SavepointRestoreSettings savepointRestoreSettings = SavepointRestoreSettings.none();
// ScheduleMode : EAGER
private ScheduleMode scheduleMode;
// true
private boolean chaining;
// user artifacts uploaded with the job
private Collection<Tuple2<String, DistributedCache.DistributedCacheEntry>> userArtifacts;
// {TimeCharacteristic@3582} "EventTime"
private TimeCharacteristic timeCharacteristic;
// ALL_EDGES_PIPELINED
private GlobalDataExchangeMode globalDataExchangeMode;
/**
* Flag to indicate whether to put all vertices into the same slot sharing group by default.
* */
private boolean allVerticesInSameSlotSharingGroupByDefault = true;
// the collection of StreamNodes
private Map<Integer, StreamNode> streamNodes;
// ids of the source nodes of the graph
private Set<Integer> sources;
// ids of the sink nodes of the graph (here: 5)
private Set<Integer> sinks;
private Map<Integer, Tuple2<Integer, OutputTag>> virtualSideOutputNodes;
// {Integer@3683} 6 -> {Tuple3@3914} "(2,HASH,UNDEFINED)"
private Map<Integer, Tuple3<Integer, StreamPartitioner<?>, ShuffleMode>> virtualPartitionNodes;
protected Map<Integer, String> vertexIDtoBrokerID;
protected Map<Integer, Long> vertexIDtoLoopTimeout;
private StateBackend stateBackend;
private Set<Tuple2<StreamNode, StreamNode>> iterationSourceSinkPairs;
private InternalTimeServiceManager.Provider timerServiceProvider;
4.2. The StreamNode class
A StreamNode represents a vertex in the graph. The most important fields are the parallelism, the operator name, and the incoming/outgoing edges.
private final int id;
// parallelism
private int parallelism;
// operator name
private final String operatorName;
private List<StreamEdge> inEdges = new ArrayList<StreamEdge>();
private List<StreamEdge> outEdges = new ArrayList<StreamEdge>();
// state partitioning
private KeySelector<?, ?>[] statePartitioners = new KeySelector[0];
private TypeSerializer<?> stateKeySerializer;
// other parameters
/**
* Maximum parallelism for this stream node.
* The maximum parallelism is the upper limit for dynamic scaling and the number of key groups used for partitioned state.
*/
private int maxParallelism;
private ResourceSpec minResources = ResourceSpec.DEFAULT;
private ResourceSpec preferredResources = ResourceSpec.DEFAULT;
private final Map<ManagedMemoryUseCase, Integer> managedMemoryOperatorScopeUseCaseWeights = new HashMap<>();
private final Set<ManagedMemoryUseCase> managedMemorySlotScopeUseCases = new HashSet<>();
private long bufferTimeout;
private @Nullable String slotSharingGroup;
private @Nullable String coLocationGroup;
private StreamOperatorFactory<?> operatorFactory;
private TypeSerializer<?>[] typeSerializersIn = new TypeSerializer[0];
private TypeSerializer<?> typeSerializerOut;
private final Class<? extends AbstractInvokable> jobVertexClass;
private InputFormat<?, ?> inputFormat;
private OutputFormat<?> outputFormat;
private String transformationUID;
private String userHash;
private final Map<Integer, StreamConfig.InputRequirement> inputRequirements = new HashMap<>();
4.3. The StreamEdge class
A StreamEdge is an edge in the streaming topology.
Such an edge does not necessarily translate into a connection between two job vertices (because of chaining / optimization).
The most important fields of a StreamEdge are sourceId / targetId and the partitioner-related information (outputPartitioner).
// id
private final String edgeId;
// source StreamNode id
private final int sourceId;
// target StreamNode id
private final int targetId;
/** The type number of the input for co-tasks. */
private final int typeNumber;
/** The side-output tag (if any) of this {@link StreamEdge}. */
private final OutputTag outputTag;
/**
* Partitioning information.
*
* The {@link StreamPartitioner} on this {@link StreamEdge}.
* */
private StreamPartitioner<?> outputPartitioner;
/**
* source StreamNode name
* The name of the operator in the source vertex.
* */
private final String sourceOperatorName;
/**
* target StreamNode name
* The name of the operator in the target vertex.
* */
private final String targetOperatorName;
/**
* PIPELINED
*/
private final ShuffleMode shuffleMode;
/**
* Buffer timeout.
*/
private long bufferTimeout;
4.4. StreamGraph methods
Most StreamGraph methods are setters/getters for the fields listed above.
Let's take a quick look at a few of the core ones.
4.4.1. addOperator
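translateInternal in section 3.3 calls the public 7-argument addOperator overload, which picks the invokable task class and delegates to the private variant shown below. A sketch of that overload (assuming Flink 1.12):
public <IN, OUT> void addOperator(
        Integer vertexID,
        @Nullable String slotSharingGroup,
        @Nullable String coLocationGroup,
        StreamOperatorFactory<OUT> operatorFactory,
        TypeInformation<IN> inTypeInfo,
        TypeInformation<OUT> outTypeInfo,
        String operatorName) {
    // sources run as SourceStreamTask, everything else translated here as OneInputStreamTask
    Class<? extends AbstractInvokable> invokableClass =
            operatorFactory.isStreamSource() ? SourceStreamTask.class : OneInputStreamTask.class;
    addOperator(
            vertexID,
            slotSharingGroup,
            coLocationGroup,
            operatorFactory,
            inTypeInfo,
            outTypeInfo,
            operatorName,
            invokableClass);
}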
private <IN, OUT> void addOperator(
Integer vertexID,
@Nullable String slotSharingGroup,
@Nullable String coLocationGroup,
StreamOperatorFactory<OUT> operatorFactory,
TypeInformation<IN> inTypeInfo,
TypeInformation<OUT> outTypeInfo,
String operatorName,
Class<? extends AbstractInvokable> invokableClass) {
// build the StreamNode and put it into the streamNodes map
// vertexID: 2
// slotSharingGroup : default
// coLocationGroup : null
// StreamOperatorFactory : SimpleUdfStreamOperatorFactory
// inTypeInfo : String
// outTypeInfo : PojoType<org.apache.flink.streaming.examples.socket.SocketWindowWordCount$WordWithCount, fields = [count: Long, word: String]>
// operatorName : Flat Map
// invokableClass : class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask
addNode(
vertexID,
slotSharingGroup,
coLocationGroup,
invokableClass,
operatorFactory,
operatorName);
setSerializers(vertexID, createSerializer(inTypeInfo), null, createSerializer(outTypeInfo));
if (operatorFactory.isOutputTypeConfigurable() && outTypeInfo != null) {
// sets the output type which must be know at StreamGraph creation time
operatorFactory.setOutputType(outTypeInfo, executionConfig);
}
if (operatorFactory.isInputTypeConfigurable()) {
operatorFactory.setInputType(inTypeInfo, executionConfig);
}
if (LOG.isDebugEnabled()) {
LOG.debug("Vertex: {}", vertexID);
}
}
4.4.2. addNode
protected StreamNode addNode(
Integer vertexID,
@Nullable String slotSharingGroup,
@Nullable String coLocationGroup,
Class<? extends AbstractInvokable> vertexClass,
StreamOperatorFactory<?> operatorFactory,
String operatorName) {
if (streamNodes.containsKey(vertexID)) {
throw new RuntimeException("Duplicate vertexID " + vertexID);
}
// vertexID: 2
// slotSharingGroup : default
// coLocationGroup : null
// vertexClass : class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask
// StreamOperatorFactory : SimpleUdfStreamOperatorFactory
// operatorName : Flat Map
StreamNode vertex =
new StreamNode(
vertexID,
slotSharingGroup,
coLocationGroup,
operatorFactory,
operatorName,
vertexClass);
// put it into the streamNodes map
streamNodes.put(vertexID, vertex);
return vertex;
}
4.4.3. addEdge
addEdge and addEdgeInternal are the methods already shown and annotated in section 3.5; the code is not repeated here.
4.5. Example data
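The JSON below is the plan of exactly this StreamGraph. It can be printed for any job before execution, for example as in this minimal sketch (StreamExecutionEnvironment#getExecutionPlan() builds the StreamGraph and returns its JSON representation):
// print the StreamGraph's JSON plan, then run the job
System.out.println(env.getExecutionPlan());
env.execute("Socket Window WordCount");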
{
"nodes" : [ {
"id" : 1,
"type" : "Source: Socket Stream",
"pact" : "Data Source",
"contents" : "Source: Socket Stream",
"parallelism" : 1
}, {
"id" : 2,
"type" : "Flat Map",
"pact" : "Operator",
"contents" : "Flat Map",
"parallelism" : 4,
"predecessors" : [ {
"id" : 1,
"ship_strategy" : "REBALANCE",
"side" : "second"
} ]
}, {
"id" : 4,
"type" : "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)",
"pact" : "Operator",
"contents" : "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)",
"parallelism" : 4,
"predecessors" : [ {
"id" : 2,
"ship_strategy" : "HASH",
"side" : "second"
} ]
}, {
"id" : 5,
"type" : "Sink: Print to Std. Out",
"pact" : "Data Sink",
"contents" : "Sink: Print to Std. Out",
"parallelism" : 1,
"predecessors" : [ {
"id" : 4,
"ship_strategy" : "REBALANCE",
"side" : "second"
} ]
} ]
}
streamGraph = {StreamGraph@3473}
jobName = "Socket Window WordCount"
executionConfig = {ExecutionConfig@3399} "ExecutionConfig{executionMode=PIPELINED, closureCleanerLevel=RECURSIVE, parallelism=4, maxParallelism=-1, numberOfExecutionRetries=-1, forceKryo=false, disableGenericTypes=false, enableAutoGeneratedUids=true, objectReuse=false, autoTypeRegistrationEnabled=true, forceAvro=false, autoWatermarkInterval=200, latencyTrackingInterval=0, isLatencyTrackingConfigured=false, executionRetryDelay=10000, restartStrategyConfiguration=Cluster level default restart strategy, taskCancellationIntervalMillis=-1, taskCancellationTimeoutMillis=-1, useSnapshotCompression=false, defaultInputDependencyConstraint=ANY, globalJobParameters=org.apache.flink.api.common.ExecutionConfig$GlobalJobParameters@1, registeredTypesWithKryoSerializers={}, registeredTypesWithKryoSerializerClasses={}, defaultKryoSerializers={}, defaultKryoSerializerClasses={}, registeredKryoTypes=[], registeredPojoTypes=[]}"
checkpointConfig = {CheckpointConfig@3465}
checkpointingMode = {CheckpointingMode@3936} "EXACTLY_ONCE"
checkpointInterval = -1
checkpointTimeout = 600000
minPauseBetweenCheckpoints = 0
maxConcurrentCheckpoints = 1
forceCheckpointing = false
forceUnalignedCheckpoints = false
unalignedCheckpointsEnabled = false
alignmentTimeout = 0
approximateLocalRecovery = false
externalizedCheckpointCleanup = null
failOnCheckpointingErrors = true
preferCheckpointForRecovery = false
tolerableCheckpointFailureNumber = -1
savepointRestoreSettings = {SavepointRestoreSettings@3467} "SavepointRestoreSettings.none()"
scheduleMode = {ScheduleMode@3580} "EAGER"
chaining = true
userArtifacts = {ArrayList@3581} size = 0
timeCharacteristic = {TimeCharacteristic@3582} "EventTime"
globalDataExchangeMode = {GlobalDataExchangeMode@3583} "ALL_EDGES_PIPELINED"
allVerticesInSameSlotSharingGroupByDefault = true
streamNodes = {HashMap@3584} size = 4
{Integer@3540} 1 -> {StreamNode@3929} "Source: Socket Stream-1"
key = {Integer@3540} 1
value = {StreamNode@3929} "Source: Socket Stream-1"
id = 1
parallelism = 1
maxParallelism = -1
minResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
preferredResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
managedMemoryOperatorScopeUseCaseWeights = {HashMap@3939} size = 0
managedMemorySlotScopeUseCases = {HashSet@3940} size = 0
bufferTimeout = -1
operatorName = "Source: Socket Stream"
slotSharingGroup = "default"
coLocationGroup = null
statePartitioners = {KeySelector[0]@3942}
stateKeySerializer = null
operatorFactory = {SimpleUdfStreamOperatorFactory@3498}
typeSerializersIn = {TypeSerializer[2]@3943}
typeSerializerOut = {StringSerializer@3944}
inEdges = {ArrayList@3945} size = 0
outEdges = {ArrayList@3946} size = 1
0 = {StreamEdge@3953} "(Source: Socket Stream-1 -> Flat Map-2, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
edgeId = "Source: Socket Stream-1_Flat Map-2_0_REBALANCE"
sourceId = 1
targetId = 2
typeNumber = 0
outputTag = null
outputPartitioner = {RebalancePartitioner@3971} "REBALANCE"
sourceOperatorName = "Source: Socket Stream"
targetOperatorName = "Flat Map"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
0 = {StreamEdge@3953} "(Source: Socket Stream-1 -> Flat Map-2, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
jobVertexClass = {Class@3947} "class org.apache.flink.streaming.runtime.tasks.SourceStreamTask"
inputFormat = null
outputFormat = null
transformationUID = null
userHash = null
sortedInputs = false
{Integer@3652} 2 -> {StreamNode@3715} "Flat Map-2"
key = {Integer@3652} 2
value = {StreamNode@3715} "Flat Map-2"
id = 2
parallelism = 4
maxParallelism = -1
minResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
preferredResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
managedMemoryOperatorScopeUseCaseWeights = {HashMap@3956} size = 0
managedMemorySlotScopeUseCases = {HashSet@3957} size = 0
bufferTimeout = -1
operatorName = "Flat Map"
slotSharingGroup = "default"
coLocationGroup = null
statePartitioners = {KeySelector[0]@3958}
stateKeySerializer = null
operatorFactory = {SimpleUdfStreamOperatorFactory@3490}
typeSerializersIn = {TypeSerializer[2]@3959}
typeSerializerOut = {PojoSerializer@3960}
inEdges = {ArrayList@3961} size = 1
0 = {StreamEdge@3953} "(Source: Socket Stream-1 -> Flat Map-2, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
edgeId = "Source: Socket Stream-1_Flat Map-2_0_REBALANCE"
sourceId = 1
targetId = 2
typeNumber = 0
outputTag = null
outputPartitioner = {RebalancePartitioner@3971} "REBALANCE"
sourceOperatorName = "Source: Socket Stream"
targetOperatorName = "Flat Map"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
outEdges = {ArrayList@3962} size = 1
0 = {StreamEdge@3968} "(Flat Map-2 -> Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4, typeNumber=0, outputPartitioner=HASH, bufferTimeout=-1, outputTag=null)"
edgeId = "Flat Map-2_Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4_0_HASH"
sourceId = 2
targetId = 4
typeNumber = 0
outputTag = null
outputPartitioner = {KeyGroupStreamPartitioner@3663} "HASH"
sourceOperatorName = "Flat Map"
targetOperatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
jobVertexClass = {Class@3963} "class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask"
inputFormat = null
outputFormat = null
transformationUID = null
userHash = null
sortedInputs = false
{Integer@3777} 4 -> {StreamNode@3781} "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4"
key = {Integer@3777} 4
value = {StreamNode@3781} "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4"
id = 4
parallelism = 4
maxParallelism = -1
minResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
preferredResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
managedMemoryOperatorScopeUseCaseWeights = {HashMap@3978} size = 0
managedMemorySlotScopeUseCases = {HashSet@3979} size = 1
bufferTimeout = -1
operatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
slotSharingGroup = "default"
coLocationGroup = null
statePartitioners = {KeySelector[1]@3980}
stateKeySerializer = {StringSerializer@3944}
operatorFactory = {SimpleUdfStreamOperatorFactory@3743}
typeSerializersIn = {TypeSerializer[2]@3981}
typeSerializerOut = {PojoSerializer@3982}
inEdges = {ArrayList@3983} size = 1
0 = {StreamEdge@3968} "(Flat Map-2 -> Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4, typeNumber=0, outputPartitioner=HASH, bufferTimeout=-1, outputTag=null)"
edgeId = "Flat Map-2_Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4_0_HASH"
sourceId = 2
targetId = 4
typeNumber = 0
outputTag = null
outputPartitioner = {KeyGroupStreamPartitioner@3663} "HASH"
sourceOperatorName = "Flat Map"
targetOperatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
outEdges = {ArrayList@3984} size = 1
0 = {StreamEdge@3989} "(Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4 -> Sink: Print to Std. Out-5, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
edgeId = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4_Sink: Print to Std. Out-5_0_REBALANCE"
sourceId = 4
targetId = 5
typeNumber = 0
outputTag = null
outputPartitioner = {RebalancePartitioner@3992} "REBALANCE"
sourceOperatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
targetOperatorName = "Sink: Print to Std. Out"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
jobVertexClass = {Class@3963} "class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask"
inputFormat = null
outputFormat = null
transformationUID = null
userHash = null
sortedInputs = false
{Integer@3921} 5 -> {StreamNode@3930} "Sink: Print to Std. Out-5"
key = {Integer@3921} 5
value = {StreamNode@3930} "Sink: Print to Std. Out-5"
id = 5
parallelism = 1
maxParallelism = -1
minResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
preferredResources = {ResourceSpec@3492} "ResourceSpec{UNKNOWN}"
managedMemoryOperatorScopeUseCaseWeights = {HashMap@3998} size = 0
managedMemorySlotScopeUseCases = {HashSet@3999} size = 0
bufferTimeout = -1
operatorName = "Sink: Print to Std. Out"
slotSharingGroup = "default"
coLocationGroup = null
statePartitioners = {KeySelector[0]@4000}
stateKeySerializer = null
operatorFactory = {SimpleUdfStreamOperatorFactory@3810}
typeSerializersIn = {TypeSerializer[2]@4001}
typeSerializerOut = null
inEdges = {ArrayList@4002} size = 1
0 = {StreamEdge@3989} "(Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4 -> Sink: Print to Std. Out-5, typeNumber=0, outputPartitioner=REBALANCE, bufferTimeout=-1, outputTag=null)"
edgeId = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4_Sink: Print to Std. Out-5_0_REBALANCE"
sourceId = 4
targetId = 5
typeNumber = 0
outputTag = null
outputPartitioner = {RebalancePartitioner@3992} "REBALANCE"
sourceOperatorName = "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)"
targetOperatorName = "Sink: Print to Std. Out"
shuffleMode = {ShuffleMode@3664} "UNDEFINED"
bufferTimeout = -1
outEdges = {ArrayList@4003} size = 0
jobVertexClass = {Class@3963} "class org.apache.flink.streaming.runtime.tasks.OneInputStreamTask"
inputFormat = null
outputFormat = null
transformationUID = null
userHash = null
sortedInputs = false
sources = {HashSet@3585} size = 1
0 = {Integer@3540} 1
sinks = {HashSet@3586} size = 1
0 = {Integer@3540} 5
virtualSideOutputNodes = {HashMap@3587} size = 0
virtualPartitionNodes = {HashMap@3588} size = 1
{Integer@3683} 6 -> {Tuple3@3914} "(2,HASH,UNDEFINED)"
vertexIDtoBrokerID = {HashMap@3589} size = 0
vertexIDtoLoopTimeout = {HashMap@3590} size = 0
stateBackend = null
iterationSourceSinkPairs = {HashSet@3591} size = 0
timerServiceProvider = null