A Deep Dive into the Flink Client Source Code: Decoding the Logic Behind Job Submission

Edingbrugh.南空

Published 2025-06-23 09:28:44

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/qq_42773076/article/details/148830535

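Within the Flink ecosystem, the Flink Client is the hub through which users interact with a cluster: it parses the commands a user submits and triggers job execution. This article walks through the Flink Client source code, from command-line argument parsing to handing the job over to the cluster, to show how the job submission flow works.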

1. Overview of the Flink Client's Core Responsibilities

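The Flink Client's main task is to parse the command line a user submits, identify the specific command, and turn it step by step into an executable job. Whether the deployment is local, yarn-session, YARN per-job, Kubernetes, or standalone, the client follows roughly the same flow: argument parsing, command identification, environment setup, and job submission. The core flow is: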

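Parse arguments: identify the type of the submitted arguments and the execution environment, then invoke the user's main method via reflection so that it obtains the ExecutionEnvironment;
Build the job graph: running the main method of the submitted jar generates the StreamGraph and the JobGraph, and the job is handed to a PipelineExecutor;
Execute the job: once the Flink cluster receives the JobGraph, it converts it into an ExecutionGraph and schedules it for execution.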

2. The Flink Job Submission Flow in Detail

2.1 Submission entry point: the `bin/flink` script

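In day-to-day use, jobs are managed through the bin/flink script, which is the starting point of the interaction between the Flink Client and the cluster. The core of the script (abridged) is: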

# Add Client-specific JVM options
FLINK_ENV_JAVA_OPTS="${FLINK_ENV_JAVA_OPTS} ${FLINK_ENV_JAVA_OPTS_CLI}"
# Add HADOOP_CLASSPATH to allow the usage of Hadoop file systems
exec "${JAVA_RUN}" $JVM_ARGS $FLINK_ENV_JAVA_OPTS "${log_setting[@]}" \
    -classpath "`manglePathList "$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" org.apache.flink.client.cli.CliFrontend "$@"

As the script shows, bin/flink ultimately hands processing to the org.apache.flink.client.cli.CliFrontend class. Its main method is the entry point of the whole client. The core logic of main is:

public static void main(final String[] args) {
    EnvironmentInformation.logEnvironmentInfo(LOG, "Command Line Client", args);
    LOG.info("CliFrontend start");
    // 1. Locate the configuration directory
    final String configurationDirectory = getConfigurationDirectoryFromEnv();
    // 2. Load the global configuration
    final Configuration configuration = GlobalConfiguration.loadConfiguration(configurationDirectory);
    // 3. Load the custom command lines
    final List<CustomCommandLine> customCommandLines = loadCustomCommandLines(configuration, configurationDirectory);
    try {
        final CliFrontend cli = new CliFrontend(configuration, customCommandLines);
        SecurityUtils.install(new SecurityConfiguration(cli.configuration));
        // Run parseAndRun inside the installed security context and keep its return code
        final int retCode = SecurityUtils.getInstalledContext().runSecured(() -> cli.parseAndRun(args));
        // Exit with the return code of parseAndRun
        System.exit(retCode);
    } catch (Throwable t) {
        // exception handling
    }
}

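The main method first determines the configuration directory, then loads the global configuration and the custom command lines, and finally instantiates a CliFrontend and calls parseAndRun, which starts the rest of the submission flow.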

2.2 Loading the configuration: locating and parsing the configuration files

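In CliFrontend's main method, configuration loading has two steps: locating the configuration directory and loading its contents. getConfigurationDirectoryFromEnv locates the directory with the following priority: the environment variable named by ConfigConstants.ENV_FLINK_CONF_DIR (FLINK_CONF_DIR), then "../conf", then "conf". If none of them is found, it throws an exception: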

public static String getConfigurationDirectoryFromEnv() {
    String location = System.getenv(ConfigConstants.ENV_FLINK_CONF_DIR);
    if (location != null) {
        return location;
    } else if (new File(CONFIG_DIRECTORY_FALLBACK_1).exists()) {
        location = CONFIG_DIRECTORY_FALLBACK_1;
    } else if (new File(CONFIG_DIRECTORY_FALLBACK_2).exists()) {
        location = CONFIG_DIRECTORY_FALLBACK_2;
    } else {
        throw new RuntimeException(
                "The configuration directory was not specified. "
                        + "Please specify the directory containing the configuration file through the '"
                        + ConfigConstants.ENV_FLINK_CONF_DIR
                        + "' environment variable.");
    }
    return location;
}

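Once the directory is known, GlobalConfiguration.loadConfiguration(configurationDirectory) loads the configuration file into a Configuration object, which supplies the settings later steps need, such as the cluster address and resource allocation options.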
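
The lookup order can be tried out in isolation. The following is a minimal sketch, not Flink's code: the class ConfDirLookupSketch and its resolve method are hypothetical, the environment and file-system lookups are passed in as functions so the order can be exercised without a real installation, and the constant values are assumptions about what the Flink constants contain.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical stand-in for the lookup order; not Flink's actual class. */
class ConfDirLookupSketch {

    // Assumed values for the environment variable name and the two fallbacks.
    static final String ENV_KEY = "FLINK_CONF_DIR";
    static final String FALLBACK_1 = "../conf";
    static final String FALLBACK_2 = "conf";

    /**
     * Resolves the configuration directory.
     *
     * @param env    looks up an environment variable, returning null if unset
     * @param exists tells whether a path exists
     */
    static String resolve(Function<String, String> env, Predicate<String> exists) {
        String location = env.apply(ENV_KEY);
        if (location != null) {
            // The environment variable wins and is returned without an existence check.
            return location;
        }
        if (exists.test(FALLBACK_1)) {
            return FALLBACK_1;
        }
        if (exists.test(FALLBACK_2)) {
            return FALLBACK_2;
        }
        throw new IllegalStateException(
                "No configuration directory found; set " + ENV_KEY + ".");
    }

    public static void main(String[] args) {
        // Environment variable set: it takes priority over both fallbacks.
        System.out.println(resolve(key -> "/opt/flink/conf", path -> true));
        // Environment variable unset and only the second fallback exists.
        System.out.println(resolve(key -> null, path -> path.equals("conf")));
    }
}
```

Note that, as in the listing above, a value taken from the environment variable is returned as-is, so a wrong path only surfaces later when the configuration file is loaded.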

2.3 Identifying the action: deciding what the client should do

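After main has loaded the global configuration and the custom command lines, it calls parseAndRun to perform the actual operation. This method takes the first submitted argument as the action, removes it from the argument array (it is not needed afterwards), and then dispatches on the action. Part of the code: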

public int parseAndRun(String[] args) {
    String action = args[0];
    // Strip the action from the arguments
    final String[] params = Arrays.copyOfRange(args, 1, args.length);
    try {
        // do action
        switch (action) {
            case ACTION_RUN:
                run(params);
                return 0;
            case ACTION_RUN_APPLICATION:
                runApplication(params);
                return 0;
            case ACTION_LIST:
                list(params);
                return 0;
            case ACTION_INFO:
                info(params);
                return 0;
            case ACTION_CANCEL:
                cancel(params);
                return 0;
            case ACTION_STOP:
                stop(params);
                return 0;
            case ACTION_SAVEPOINT:
                savepoint(params);
                return 0;
            case "-h":
            case "--help":
                CliFrontendParser.printHelp(customCommandLines);
                return 0;
            // handling of the other actions
        }
    } catch (CliArgsException ce) {
        // exception handling
    }
}

For example, when the action is ACTION_RUN, the run method is called to carry out the job submission.
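
The dispatch pattern itself, taking the first argument as the action and passing the rest on, can be shown with a small self-contained sketch. The class ActionDispatchSketch is hypothetical, only two actions are modelled, and the handlers merely print what they would do. Returning 1 for an empty argument list or an unknown action is an assumption of this sketch.

```java
import java.util.Arrays;

/** Hypothetical sketch of first-argument dispatch; not Flink's CliFrontend. */
class ActionDispatchSketch {

    static int parseAndRun(String[] args) {
        if (args.length < 1) {
            System.out.println("Please specify an action.");
            return 1;
        }
        // The first argument is the action; everything after it belongs to the action.
        String action = args[0];
        String[] params = Arrays.copyOfRange(args, 1, args.length);
        switch (action) {
            case "run":
                System.out.println("run " + Arrays.toString(params));
                return 0;
            case "list":
                System.out.println("list " + Arrays.toString(params));
                return 0;
            case "-h":
            case "--help":
                System.out.println("usage: <action> [options]");
                return 0;
            default:
                System.out.println("\"" + action + "\" is not a valid action.");
                return 1;
        }
    }

    public static void main(String[] args) {
        int code = parseAndRun(new String[] {"run", "job.jar", "--input", "in.txt"});
        System.out.println("exit code " + code);
    }
}
```

Running main prints `run [job.jar, --input, in.txt]` followed by `exit code 0`.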

2.4 Identifying the deployment mode: determining where the job runs

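By default, loadCustomCommandLines loads three command-line clients, Generic, Yarn, and Default, and later the first one, in that order, whose isActive() returns true is chosen. The run method first obtains the options Flink provides by default, merges in the user's arguments, and then calls validateAndGetActiveCommandLine to determine the deployment mode. The core code: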

protected void run(String[] args) throws Exception {
    LOG.info("Running 'run' command.");
    // Options provided by default
    final Options commandOptions = CliFrontendParser.getRunCommandOptions();
    final CommandLine commandLine = getCommandLine(commandOptions, args, true);
    // Print the help message and return
    if (commandLine.hasOption(HELP_OPTION.getOpt())) {
        CliFrontendParser.printHelpForRun(customCommandLines);
        return;
    }
    final CustomCommandLine activeCommandLine = validateAndGetActiveCommandLine(checkNotNull(commandLine));
    // Build program options for a Python or a non-Python program
    final ProgramOptions programOptions = ProgramOptions.create(commandLine);
    final List<URL> jobJars = getJobJarAndDependencies(programOptions);
    final Configuration effectiveConfiguration = getEffectiveConfiguration(activeCommandLine, commandLine, programOptions, jobJars);
    LOG.debug("Effective executor configuration: {}", effectiveConfiguration);
    try (PackagedProgram program = getPackagedProgram(programOptions, effectiveConfiguration)) {
        executeProgram(effectiveConfiguration, program);
    }
}

Through these steps the Flink Client identifies the deployment mode of the job and prepares for execution.
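
The "first active candidate wins" selection can be sketched as follows. Everything here is hypothetical: the Candidate class, the select method, and in particular the activation rules (the option keys checked by the first two candidates) are simplified assumptions chosen for illustration, not the real isActive() implementations.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of ordered command-line selection. */
class ActiveCommandLineSketch {

    static final class Candidate {
        final String name;
        final Predicate<Map<String, String>> isActive;

        Candidate(String name, Predicate<Map<String, String>> isActive) {
            this.name = name;
            this.isActive = isActive;
        }
    }

    // Order matters: the catch-all default must come last.
    static final List<Candidate> CANDIDATES = Arrays.asList(
            new Candidate("generic", opts -> opts.containsKey("execution.target")),
            new Candidate("yarn", opts -> opts.containsKey("yarn.application.id")),
            new Candidate("default", opts -> true));

    /** Returns the name of the first candidate that reports itself active. */
    static String select(Map<String, String> options) {
        for (Candidate candidate : CANDIDATES) {
            if (candidate.isActive.test(options)) {
                return candidate.name;
            }
        }
        throw new IllegalStateException("No command line is active for the given options.");
    }

    public static void main(String[] args) {
        System.out.println(select(Collections.singletonMap("execution.target", "remote")));
        System.out.println(select(Collections.<String, String>emptyMap()));
    }
}
```

Because the last candidate is always active, the selection never fails in this sketch. The ordering is what gives the more specific clients priority over the fallback.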

2.5 Running the job: from building the options to running the program

After run has determined the deployment mode, it calls ProgramOptions.create to build the options for running the program. This method handles Python and non-Python programs differently:

public static ProgramOptions create(CommandLine line) throws CliArgsException {
    if (isPythonEntryPoint(line) || containsPythonDependencyOptions(line)) {
        return createPythonProgramOptions(line);
    } else {
        return new ProgramOptions(line);
    }
}
// Logic for Python programs
public static ProgramOptions createPythonProgramOptions(CommandLine line) throws CliArgsException {
    try {
        ClassLoader classLoader = getPythonClassLoader();
        Class<?> pythonProgramOptionsClazz = Class.forName("org.apache.flink.client.cli.PythonProgramOptions", false, classLoader);
        Constructor<?> constructor = pythonProgramOptionsClazz.getConstructor(CommandLine.class);
        return (ProgramOptions) constructor.newInstance(line);
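    } catch (InstantiationException | InvocationTargetException | NoSuchMethodException
            | IllegalAccessException | ClassNotFoundException e) {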
        throw new CliArgsException(
                "Python command line option detected but the flink-python module seems to be missing "
                        + "or not working as expected.",
                e);
    }
}
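
Loading a class by name and calling its constructor reflectively is what lets the client work without a hard dependency on an optional module. Below is a minimal sketch of that pattern using only JDK classes. OptionalModuleSketch and instantiate are hypothetical names, and java.lang.StringBuilder merely stands in for a class that may or may not be on the classpath.

```java
import java.lang.reflect.Constructor;

/** Hypothetical sketch of reflective loading of an optional class. */
class OptionalModuleSketch {

    /** Instantiates className through its (String) constructor. */
    static Object instantiate(String className, String argument) {
        try {
            // initialize=false: static initializers run only when the class is first used.
            Class<?> clazz = Class.forName(
                    className, false, OptionalModuleSketch.class.getClassLoader());
            Constructor<?> constructor = clazz.getConstructor(String.class);
            return constructor.newInstance(argument);
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing constructor, and a failing constructor.
            throw new IllegalStateException(
                    "Class " + className + " seems to be missing or not working as expected.", e);
        }
    }

    public static void main(String[] args) {
        Object built = instantiate("java.lang.StringBuilder", "abc");
        System.out.println(built.getClass().getName() + ": " + built);
    }
}
```

Catching ReflectiveOperationException handles every checked exception these reflective calls can throw, which is one way to avoid listing them individually.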

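With the program options built, getJobJarAndDependencies collects the URLs of the job jar and its dependency jars, and getEffectiveConfiguration produces the effective configuration for this run. Next, getPackagedProgram delegates to buildProgram, which packages the job parameters, including the entry-point class, the dependency jars, and the program arguments: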

PackagedProgram buildProgram(final ProgramOptions runOptions, final Configuration configuration)
        throws FileNotFoundException, ProgramInvocationException, CliArgsException {
    runOptions.validate();
    String[] programArgs = runOptions.getProgramArgs();
    String jarFilePath = runOptions.getJarFilePath();
    List<URL> classpaths = runOptions.getClasspaths();
    // Get the entry-point class name
    String entryPointClass = runOptions.getEntryPointClassName();
    File jarFile = jarFilePath != null ? getJarFile(jarFilePath) : null;
    return PackagedProgram.newBuilder()
           .setJarFile(jarFile)
           .setUserClassPaths(classpaths)
           .setEntryPointClassName(entryPointClass)
           .setConfiguration(configuration)
           .setSavepointRestoreSettings(runOptions.getSavepointRestoreSettings())
           .setArguments(programArgs)
           .build();
}

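Finally, executeProgram(effectiveConfiguration, program) runs the program. It delegates to ClientUtils, which sets the user-code class loader as the context class loader of the current thread, installs the context environments, and invokes the main method of the jar via reflection: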

public static void executeProgram(
        PipelineExecutorServiceLoader executorServiceLoader,
        Configuration configuration,
        PackagedProgram program,
        boolean enforceSingleJobExecution,
        boolean suppressSysout)
        throws ProgramInvocationException {
    checkNotNull(executorServiceLoader);
    final ClassLoader userCodeClassLoader = program.getUserCodeClassLoader();
    final ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();
    try {
        Thread.currentThread().setContextClassLoader(userCodeClassLoader);
        LOG.info(
                "Starting program (detached: {})",
                !configuration.getBoolean(DeploymentOptions.ATTACHED));
        // getExecutionEnvironment in user code will return this environment
        ContextEnvironment.setAsContext(
                executorServiceLoader,
                configuration,
                userCodeClassLoader,
                enforceSingleJobExecution,
                suppressSysout);
        StreamContextEnvironment.setAsContext(
                executorServiceLoader,
                configuration,
                userCodeClassLoader,
                enforceSingleJobExecution,
                suppressSysout);
        program.invokeInteractiveModeForExecution();        
    } finally {
        Thread.currentThread().setContextClassLoader(contextClassLoader);
    }
}
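
The reason user code does not need to know where it runs is the "context environment" idea: the client installs a factory before calling main, and getExecutionEnvironment consults it. The sketch below illustrates that idea only. ContextEnvironmentSketch, Env, and the string targets are hypothetical, and clearing the factory in a finally block is an assumption of this sketch rather than something shown in the listing above.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a context-installed environment factory. */
class ContextEnvironmentSketch {

    static final class Env {
        final String target;

        Env(String target) {
            this.target = target;
        }
    }

    // Installed by the client before user code runs; null means "no context".
    private static Supplier<Env> contextFactory;

    static void setAsContext(String target) {
        contextFactory = () -> new Env(target);
    }

    static void unsetAsContext() {
        contextFactory = null;
    }

    /** What user code calls; falls back to a local environment without a context. */
    static Env getExecutionEnvironment() {
        return contextFactory != null ? contextFactory.get() : new Env("local");
    }

    /** Stands in for the user's main method. */
    static String userMain() {
        return getExecutionEnvironment().target;
    }

    public static void main(String[] args) {
        System.out.println(userMain());
        setAsContext("remote");
        try {
            System.out.println(userMain());
        } finally {
            unsetAsContext();
        }
        System.out.println(userMain());
    }
}
```

Running main prints `local`, `remote`, and `local` again: the same user code picks up a different environment depending on what the caller installed.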

// Method in the PackagedProgram class
public void invokeInteractiveModeForExecution() throws ProgramInvocationException {
    FlinkSecurityManager.monitorUserSystemExitForCurrentThread();
    callMainMethod(mainClass, args);
}

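// Method in the PackagedProgram class (simplified: validation and detailed error handling omitted)
private static void callMainMethod(Class<?> entryClass, String[] args)
        throws ProgramInvocationException {
    try {
        Method mainMethod = entryClass.getMethod("main", String[].class);
        // main is static, so the receiver is null; the cast passes args as a single String[] argument
        mainMethod.invoke(null, (Object) args);
    } catch (ReflectiveOperationException e) {
        throw new ProgramInvocationException(
                "Could not invoke the main method of " + entryClass.getName(), e);
    }
}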
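
The last two steps, swapping the thread's context class loader and invoking main reflectively, can be reproduced with plain JDK classes. This is a minimal sketch under stated assumptions: ReflectiveMainSketch, invokeMain, and DemoJob are hypothetical names, and an empty URLClassLoader stands in for the user-code class loader built from the job jar.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch of a reflective main call under a swapped context class loader. */
class ReflectiveMainSketch {

    /** Stands in for the entry-point class of a user jar. */
    public static class DemoJob {
        static String lastArgs;
        static ClassLoader seenLoader;

        public static void main(String[] args) {
            lastArgs = String.join(",", args);
            seenLoader = Thread.currentThread().getContextClassLoader();
        }
    }

    static void invokeMain(Class<?> entryClass, ClassLoader userLoader, String[] args) {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        try {
            Thread.currentThread().setContextClassLoader(userLoader);
            Method mainMethod = entryClass.getMethod("main", String[].class);
            // Static method: null receiver. The cast keeps args as one String[] argument.
            mainMethod.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke main of " + entryClass.getName(), e);
        } finally {
            // Always restore the caller's class loader, even if main throws.
            Thread.currentThread().setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        ClassLoader userLoader = new URLClassLoader(new URL[0], parent);
        invokeMain(DemoJob.class, userLoader, new String[] {"--input", "in.txt"});
        System.out.println("main saw args: " + DemoJob.lastArgs);
        System.out.println("main saw the user loader: " + (DemoJob.seenLoader == userLoader));
    }
}
```

The `(Object) args` cast matters: without it, the varargs call to invoke would spread the array into separate arguments and the call would fail for anything but a one-element array.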

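At this point the Flink Client has completed the whole path from the flink script to the reflective call of the jar's main method. The core classes involved are CliFrontend, PackagedProgram, CliFrontendParser, and ClientUtils, which cooperate to get a Flink job submitted and running. Understanding the client source code and the submission flow makes it easier to tune Flink jobs and to diagnose problems in development and operations.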