Flink 1.12.2 Source Code Notes: AbstractRichFunction

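This article introduces the RichFunction concept in Apache Flink, including its lifecycle methods open() and close() and how to obtain and use the RuntimeContext. RichFunction gives access to the execution context, allows configuration at initialization time, and supports resource management before and after the function runs. The article also uses examples to show how these methods are applied in real data sources (MysqlSource, SocketSourceFunction) and data sinks (MysqlSink).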

I. Preface

AbstractRichFunction is used in many places in the Flink code base, so let's take a quick look at what this abstract class does.

II. The code

2.1. RichFunction

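AbstractRichFunction implements the RichFunction interface, so we look at that interface first.
RichFunction is the base interface for all rich user-defined functions. It extends the Function interface, which is an empty marker interface.
It defines the function's lifecycle methods as well as the methods for accessing the context in which the function executes.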

2.1.1 void open(Configuration parameters) throws Exception;

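The initialization method of the function.
It is called before the actual working methods (such as map or join), which makes it suitable for one-time setup work.
For functions that are part of an iteration, this method is invoked at the beginning of each iteration superstep.
The Configuration object passed to the function can be used for configuration and initialization.
It contains all the parameters that were configured on the function in the program composition.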


    /**
     * Initialization method for the function.
     *
     * It is called before the actual working methods (like
     * <i>map</i> or <i>join</i>) and thus suitable for one time setup work.
     *
     * For functions that are part of an iteration, this method will be invoked at the beginning of each iteration superstep.
     *
     * <p>The configuration object passed to the function can be used for configuration and initialization.
     *
     * The configuration contains all parameters that were configured on the function in the program composition.
     *
     * <pre>{@code
     * public class MyFilter extends RichFilterFunction<String> {
     *
     *     private String searchString;
     *
     *     public void open(Configuration parameters) {
     *         this.searchString = parameters.getString("foo");
     *     }
     *
     *     public boolean filter(String value) {
     *         return value.equals(searchString);
     *     }
     * }
     * }</pre>
     *
     * <p>By default, this method does nothing.
     *
     * @param parameters The configuration containing the parameters attached to the contract.
     * @throws Exception Implementations may forward exceptions, which are caught by the runtime.
     *     When the runtime catches an exception, it aborts the task and lets the fail-over logic
     *     decide whether to retry the task execution.
     * @see org.apache.flink.configuration.Configuration
     */
    void open(Configuration parameters) throws Exception;

2.1.2 void close() throws Exception;

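The tear-down method for the user code.
It is called after the last call to the main working methods.
For functions that are part of an iteration, this method is invoked after each iteration superstep.
It can be used for clean-up work.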

    /**
     *
     * Tear-down method for the user code.
     * It is called after the last call to the main working  methods (e.g. <i>map</i> or <i>join</i>).
     * For functions that are part of an iteration, this method will be invoked after each iteration superstep.
     * <p>This method can be used for clean up work.
     *
     * @throws Exception Implementations may forward exceptions, which are caught by the runtime.
     *     When the runtime catches an exception, it aborts the task and lets the fail-over logic
     *     decide whether to retry the task execution.
     */
    void close() throws Exception;
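
To make the call order concrete, here is a minimal, self-contained sketch of the lifecycle described above. It does not use Flink: `MiniRichFunction`, `runTask`, and `trace` are hypothetical stand-ins for a rich function and for the runtime's task loop, and the real runtime does considerably more (context injection, failure handling, iterations).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, Flink-free model of the open() -> work -> close() call order.
class LifecycleSketch {

    /** Stand-in for a rich map function; this is not a Flink interface. */
    interface MiniRichFunction<I, O> {
        void open() throws Exception;
        O map(I value) throws Exception;
        void close() throws Exception;
    }

    /** Stand-in for the task loop: open once, map every record, then close. */
    static <I, O> List<O> runTask(MiniRichFunction<I, O> fn, List<I> input) throws Exception {
        List<O> out = new ArrayList<>();
        fn.open();
        try {
            for (I value : input) {
                out.add(fn.map(value));
            }
        } finally {
            fn.close(); // in this sketch close() runs even if map() throws
        }
        return out;
    }

    /** Runs a function that records every lifecycle call and returns the recorded trace. */
    static List<String> trace() {
        List<String> events = new ArrayList<>();
        MiniRichFunction<String, String> fn = new MiniRichFunction<String, String>() {
            @Override public void open() { events.add("open"); }
            @Override public String map(String v) { events.add("map:" + v); return v.toUpperCase(); }
            @Override public void close() { events.add("close"); }
        };
        try {
            runTask(fn, Arrays.asList("a", "b"));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // prints [open, map:a, map:b, close]
    }
}
```

The point of the sketch is only the ordering: one open() before the first record, one close() after the last.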

2.1.3 RuntimeContext getRuntimeContext();

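Gets the context that contains information about the UDF's runtime, such as the parallelism of the function, the subtask index of the function, or the name of the task that executes the function.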

    /**
     * Gets the context that contains information about the UDF's runtime, such as the parallelism
     * of the function, the subtask index of the function, or the name of the task that executes the
     * function.
     *
     * <p>The RuntimeContext also gives access to the {@link
     * org.apache.flink.api.common.accumulators.Accumulator}s and the {@link
     * org.apache.flink.api.common.cache.DistributedCache}.
     *
     * @return The UDF's runtime context.
     */
    RuntimeContext getRuntimeContext();
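
One common use of the subtask index and the parallelism is to let each parallel instance pick its own share of the work. The sketch below illustrates that idea without Flink: `MiniRuntimeContext` and `partitionsFor` are hypothetical, and only the two getter names are borrowed from the real RuntimeContext.

```java
import java.util.ArrayList;
import java.util.List;

// Flink-free sketch: a parallel instance uses its subtask index to choose its partitions.
class RuntimeContextSketch {

    /** Hypothetical stand-in that exposes two values the real RuntimeContext also provides. */
    static final class MiniRuntimeContext {
        private final int indexOfThisSubtask;
        private final int numberOfParallelSubtasks;

        MiniRuntimeContext(int indexOfThisSubtask, int numberOfParallelSubtasks) {
            this.indexOfThisSubtask = indexOfThisSubtask;
            this.numberOfParallelSubtasks = numberOfParallelSubtasks;
        }

        int getIndexOfThisSubtask() { return indexOfThisSubtask; }
        int getNumberOfParallelSubtasks() { return numberOfParallelSubtasks; }
    }

    /** Returns every partition p with p % parallelism == subtask index. */
    static List<Integer> partitionsFor(MiniRuntimeContext ctx, int totalPartitions) {
        List<Integer> mine = new ArrayList<>();
        for (int p = ctx.getIndexOfThisSubtask(); p < totalPartitions; p += ctx.getNumberOfParallelSubtasks()) {
            mine.add(p);
        }
        return mine;
    }

    public static void main(String[] args) {
        // Three parallel instances sharing seven partitions.
        for (int i = 0; i < 3; i++) {
            System.out.println("subtask " + i + " -> " + partitionsFor(new MiniRuntimeContext(i, 3), 7));
        }
        // prints:
        // subtask 0 -> [0, 3, 6]
        // subtask 1 -> [1, 4]
        // subtask 2 -> [2, 5]
    }
}
```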

2.1.4 IterationRuntimeContext getIterationRuntimeContext();


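Returns a specialized version of RuntimeContext that carries additional information about the iteration in which the function is executed. This IterationRuntimeContext is only available when the function is part of an iteration; otherwise the method throws an exception.

    /**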
     * Gets a specialized version of the {@link RuntimeContext}, which has additional information about the iteration in which the function is executed.
     *
     * This IterationRuntimeContext is only available if the function is part of an iteration. Otherwise, this method throws an exception.
     *
     * @return The IterationRuntimeContext.
     * @throws java.lang.IllegalStateException Thrown, if the function is not executed as part of an
     *     iteration.
     */
    IterationRuntimeContext getIterationRuntimeContext();

2.1.5 void setRuntimeContext(RuntimeContext t);

    /**
     * Sets the function's runtime context. Called by the framework when creating a parallel instance of the function.
     *
     * @param t The runtime context.
     */
    void setRuntimeContext(RuntimeContext t);

2.1.6 The official demo

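import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;

public class MyFilter extends RichFilterFunction<String> {

    private String searchString;

    @Override
    public void open(Configuration parameters) {
        // Configuration offers getString(String key, String defaultValue); the one-argument
        // call in the Javadoc above is treated here as shorthand for it.
        this.searchString = parameters.getString("foo", null);
    }

    @Override
    public boolean filter(String value) {
        return value.equals(searchString);
    }
}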

2.2. AbstractRichFunction

AbstractRichFunction is an abstract implementation of the RichFunction interface.
Rich functions have additional methods for initialization (open(Configuration)) and tear-down (close()),
and they can access their runtime context through getRuntimeContext().

The abstract class adds one field, private transient RuntimeContext runtimeContext;, and implements the setRuntimeContext, getRuntimeContext, and getIterationRuntimeContext methods.

    @Override
    public void setRuntimeContext(RuntimeContext t) {
        this.runtimeContext = t;
    }

    @Override
    public RuntimeContext getRuntimeContext() {
        if (this.runtimeContext != null) {
            return this.runtimeContext;
        } else {
            throw new IllegalStateException("The runtime context has not been initialized.");
        }
    }

    @Override
    public IterationRuntimeContext getIterationRuntimeContext() {
        if (this.runtimeContext == null) {
            throw new IllegalStateException("The runtime context has not been initialized.");
        } else if (this.runtimeContext instanceof IterationRuntimeContext) {
            return (IterationRuntimeContext) this.runtimeContext;
        } else {
            throw new IllegalStateException("This stub is not part of an iteration step function.");
        }
    }
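
The null check above means the context can only be read after the framework has injected it, which is why setup code that needs the context belongs in open() rather than in the constructor. The following Flink-free sketch mirrors that check. `MiniAbstractRichFunction` and `GoodFunction` are hypothetical, and the context is reduced to a plain task-name string for brevity.

```java
// Flink-free sketch of the null check shown above and of reading the context in open().
class ContextInitSketch {

    /** Hypothetical stand-in for AbstractRichFunction; the context is just a String here. */
    abstract static class MiniAbstractRichFunction {
        private transient String runtimeContext;

        void setRuntimeContext(String t) { this.runtimeContext = t; }

        String getRuntimeContext() {
            if (runtimeContext != null) {
                return runtimeContext;
            }
            throw new IllegalStateException("The runtime context has not been initialized.");
        }

        void open() {}
    }

    /** Reads the context in open(), after it has been injected. */
    static final class GoodFunction extends MiniAbstractRichFunction {
        String taskName;

        @Override
        void open() { taskName = getRuntimeContext(); }
    }

    /** Returns true if reading the context before injection fails with IllegalStateException. */
    static boolean failsBeforeInjection() {
        try {
            new GoodFunction().getRuntimeContext();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Mimics the framework: inject the context first, then call open(). */
    static String openWithContext(String context) {
        GoodFunction fn = new GoodFunction();
        fn.setRuntimeContext(context);
        fn.open();
        return fn.taskName;
    }

    public static void main(String[] args) {
        System.out.println(failsBeforeInjection());       // prints true
        System.out.println(openWithContext("Map (1/4)")); // prints Map (1/4)
    }
}
```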

III. Examples

Here are a few examples that show how these methods are used in practice.

3.1. Data sources based on RichSourceFunction

  • A MySQL data source that reads rows from a MySQL database.
package org.apache.flink.table.examples.java.basics;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MysqlSource extends RichSourceFunction<Tuple3<String, String, String>> {

    private static final long serialVersionUID = 3334654984018091675L;

    // JDBC handles are created in open() on the worker, so they are never serialized.
    private transient Connection connect;
    private transient PreparedStatement ps;

    // Set to false by cancel(), which is called from a different thread than run().
    private volatile boolean running = true;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        Class.forName("com.mysql.jdbc.Driver");
        connect = DriverManager.getConnection("jdbc:mysql://192.168.xx.xx:3306", "root", "xxxxx");
        ps = connect.prepareStatement("select id,name,age from user ");
    }

    @Override
    public void run(SourceContext<Tuple3<String, String, String>> ctx) throws Exception {
        // try-with-resources closes the ResultSet once all rows are emitted or the source is cancelled.
        try (ResultSet resultSet = ps.executeQuery()) {
            while (running && resultSet.next()) {
                ctx.collect(Tuple3.of(resultSet.getString(1), resultSet.getString(2), resultSet.getString(3)));
            }
        }
    }

    @Override
    public void cancel() {
        // Only signal run() to stop; resources are released in close().
        running = false;
    }

    @Override
    public void close() throws Exception {
        // Close the statement before the connection that created it.
        try {
            if (ps != null) {
                ps.close();
            }
        } finally {
            if (connect != null) {
                connect.close();
            }
            super.close();
        }
    }
}

  • A socket data source

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.table.examples.java.connectors;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.RuntimeContextInitializationContextAdapters;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.table.data.RowData;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * The {@link SocketSourceFunction} opens a socket and consumes bytes.
 *
 * <p>It splits records by the given byte delimiter (`\n` by default) and delegates the decoding to a
 * pluggable {@link DeserializationSchema}.
 *
 * <p>Note: This is only an example and should not be used in production. The source function is not
 * fault-tolerant and can only work with a parallelism of 1.
 */
public final class SocketSourceFunction extends RichSourceFunction<RowData> implements ResultTypeQueryable<RowData> {

	private final String hostname;
	private final int port;
	private final byte byteDelimiter;
	private final DeserializationSchema<RowData> deserializer;

	private volatile boolean isRunning = true;
	private Socket currentSocket;

	public SocketSourceFunction(String hostname, int port, byte byteDelimiter, DeserializationSchema<RowData> deserializer) {
		this.hostname = hostname;
		this.port = port;
		this.byteDelimiter = byteDelimiter;
		this.deserializer = deserializer;
	}

	@Override
	public TypeInformation<RowData> getProducedType() {
		return deserializer.getProducedType();
	}

	@Override
	public void open(Configuration parameters) throws Exception {
		deserializer.open(
				RuntimeContextInitializationContextAdapters.deserializationAdapter(getRuntimeContext())
		);
	}

	@Override
	public void run(SourceContext<RowData> ctx) throws Exception {
		while (isRunning) {
			// open and consume from socket
			try (final Socket socket = new Socket()) {
				currentSocket = socket;
				socket.connect(new InetSocketAddress(hostname, port), 0);
				try (InputStream stream = socket.getInputStream()) {
					ByteArrayOutputStream buffer = new ByteArrayOutputStream();
					int b;
					while ((b = stream.read()) >= 0) {
						// buffer until delimiter
						if (b != byteDelimiter) {
							buffer.write(b);
						}
						// decode and emit record
						else {
							ctx.collect(deserializer.deserialize(buffer.toByteArray()));
							buffer.reset();
						}
					}
				}
			} catch (Throwable t) {
				t.printStackTrace(); // print and continue
			}
			Thread.sleep(1000);
		}
	}

	@Override
	public void cancel() {
		isRunning = false;
		try {
			currentSocket.close();
		} catch (Throwable t) {
			// ignore
		}
	}
}
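
The core of run() above is the buffer-until-delimiter loop. The following sketch isolates that loop with plain JDK streams so it can be tried without a socket or a DeserializationSchema. `DelimiterSplitSketch`, `split`, and `splitString` are hypothetical names, and decoding to a UTF-8 string stands in for the pluggable deserializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Flink-free sketch of the buffer-until-delimiter loop used in run() above.
class DelimiterSplitSketch {

    /** Reads the stream byte by byte and returns one string per delimiter-terminated record. */
    static List<String> split(InputStream stream, byte delimiter) throws IOException {
        List<String> records = new ArrayList<>();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int b;
        while ((b = stream.read()) >= 0) {
            if (b != delimiter) {
                buffer.write(b); // buffer until delimiter
            } else {
                // decode and emit the record, then start a new one
                records.add(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
                buffer.reset();
            }
        }
        // Bytes after the last delimiter remain in the buffer and are not emitted.
        return records;
    }

    /** Convenience wrapper: splits an in-memory string on '\n'. */
    static List<String> splitString(String text) {
        try {
            return split(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)), (byte) '\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(splitString("a,1\nb,2\npartial")); // prints [a,1, b,2]
    }
}
```

As in the original loop, a trailing record without a closing delimiter is never emitted.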

3.2. A KafkaSourceFunction data source

static class KafkaSourceFunction extends RichParallelSourceFunction<Tuple3<Integer, Long, Integer>> {
		private volatile boolean running = true;
		private final int numElementsPerProducer;
		private final boolean unBounded;

		KafkaSourceFunction(int numElementsPerProducer) {
			this.numElementsPerProducer = numElementsPerProducer;
			this.unBounded = true;
		}

		KafkaSourceFunction(int numElementsPerProducer, boolean unBounded) {
			this.numElementsPerProducer = numElementsPerProducer;
			this.unBounded = unBounded;
		}

		@Override
		public void run(SourceContext<Tuple3<Integer, Long, Integer>> ctx) throws Exception{
			long timestamp = INIT_TIMESTAMP;
			int sourceInstanceId = getRuntimeContext().getIndexOfThisSubtask();
			for (int i = 0; i < numElementsPerProducer && running; i++) {
				ctx.collect(new Tuple3<>(i, timestamp++, sourceInstanceId));
			}

			while (running && unBounded) {
				Thread.sleep(100);
			}
		}

		@Override
		public void cancel() {
			running = false;
		}
	}
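
The running flag above is a cooperative cancellation pattern: run() loops on a volatile flag and cancel(), called from another thread, flips it. The sketch below shows that pattern with plain JDK threads. `MiniSource` and `stopsAfterCancel` are hypothetical, and the sleep durations are arbitrary.

```java
// Flink-free sketch of cooperative cancellation through a volatile flag.
class CancelSketch {

    /** Hypothetical stand-in for a source whose run() loops until cancel() is called. */
    static final class MiniSource {
        // volatile so that the write in cancel() is visible to the thread executing run()
        private volatile boolean running = true;

        void run() throws InterruptedException {
            while (running) {
                Thread.sleep(10); // stands in for emitting a record
            }
        }

        void cancel() {
            running = false;
        }
    }

    /** Starts run() on a worker thread, cancels it from the caller, and reports whether run() returned. */
    static boolean stopsAfterCancel() {
        MiniSource source = new MiniSource();
        Thread worker = new Thread(() -> {
            try {
                source.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            Thread.sleep(50);
            source.cancel();
            worker.join(2000); // wait for run() to notice the flag
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(stopsAfterCancel()); // prints true
    }
}
```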

3.3. A sink based on RichSinkFunction

package org.apache.flink.table.examples.java.basics;


import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MysqlSink extends RichSinkFunction<Tuple3<String, String, String>> {

    private static final long serialVersionUID = -8930276689109741501L;

    // JDBC handles are created in open() on the worker, so they are never serialized.
    private transient Connection connect;
    private transient PreparedStatement ps;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        Class.forName("com.mysql.jdbc.Driver");
        connect = DriverManager.getConnection("jdbc:mysql://192.168.xxx.xxx:3306", "root", "xxxxx");
        ps = connect.prepareStatement("insert into user (id,name,sex) values (?,?,?)");
    }

    @Override
    public void invoke(Tuple3<String, String, String> value, Context context) throws Exception {
        // One insert per incoming record.
        ps.setString(1, value.f0);
        ps.setString(2, value.f1);
        ps.setString(3, value.f2);
        ps.executeUpdate();
    }

    @Override
    public void close() throws Exception {
        // Close the statement before the connection that created it.
        try {
            if (ps != null) {
                ps.close();
            }
        } finally {
            if (connect != null) {
                connect.close();
            }
            super.close();
        }
    }
}
