DataX (14): Source Code Walkthrough of Channel


License: CC 4.0 BY-SA


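This article analyzes DataX's Channel component in detail: its role as the communication bridge between Reader and Writer, how the in-memory channel works, and the rate-limiting mechanism (applied on push; pull is only counted). It focuses on how the configured channel value relates to the actual concurrency and the number of file splits, and on the role of the statPush and statPull methods in speed control.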


I. Overview

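Channel is the communication component between Reader and Writer. Because Readers and Writers are paired one-to-one, the channel value can be viewed as the number of data pipelines in a DataX job: the Reader writes data into the channel and the Writer reads data from it. Channel also provides rate limiting, either by data size (bytes) or by number of records.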
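The push/pull relationship can be pictured as a plain producer/consumer pair. The following is a minimal sketch, not DataX code: it assumes one reader thread and one writer thread sharing a bounded `java.util.concurrent.ArrayBlockingQueue` as the "channel", and uses a hypothetical `Integer` sentinel to mark the end of data (DataX uses a dedicated terminate record for a similar purpose).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a "reader" thread pushes records into a bounded queue
// (the channel) and a "writer" thread pulls them out in FIFO order.
final class ChannelBridgeSketch {
    // Assumed end-of-data marker; records themselves are 0..n-1.
    private static final Integer END = Integer.MIN_VALUE;

    static List<Integer> transfer(int recordCount, int capacity) {
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(capacity);
        List<Integer> received = new ArrayList<>();

        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < recordCount; i++) {
                    channel.put(i); // blocks while the channel is full
                }
                channel.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    Integer r = channel.take(); // blocks while the channel is empty
                    if (r.equals(END)) {
                        break;
                    }
                    received.add(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join(); // join makes the writer's list updates visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(10, 4));
    }
}
```

The small capacity (4) forces the reader to block repeatedly, which is exactly the back-pressure a channel provides.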

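In a real ETL job, however, the configured channel value may have no visible effect, because the actual DataX concurrency is also bounded by the number of splits the reader produces from the data source. For example, if the source consists of 3 files on HDFS, the reader produces 3 splits, and DataX starts at most 3 concurrent reading threads, even if a larger channel value is configured.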

For this reason, the setting section is usually left without extra configuration, and only the reader and writer sections are written.
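The relationship between the channel setting and the split count can be written as a one-line rule. This is a hedged sketch that assumes the DataX scheduler caps the number of channels at the number of tasks (splits) the reader produces; the class and method names are hypothetical.

```java
// Hypothetical sketch: assumes channels are capped at the number of splits.
final class ConcurrencySketch {
    static int effectiveConcurrency(int configuredChannels, int splitCount) {
        if (configuredChannels <= 0 || splitCount <= 0) {
            throw new IllegalArgumentException("both values must be positive");
        }
        return Math.min(configuredChannels, splitCount);
    }

    public static void main(String[] args) {
        // 3 HDFS files -> 3 splits; a larger channel value adds no threads.
        System.out.println(effectiveConcurrency(10, 3)); // 3
        // Fewer channels than splits: the extra splits wait for a free channel.
        System.out.println(effectiveConcurrency(2, 3));  // 2
    }
}
```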

II. Class Hierarchy

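At present the DataX channel hierarchy contains only two classes: the abstract base class Channel and its subclass MemoryChannel.

How MemoryChannel works

MemoryChannel implements the doPush, doPushAll, doPull and doPullAll methods; in essence, it stores records in an ArrayBlockingQueue.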
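MemoryChannel's doXXX methods rely on a handful of `ArrayBlockingQueue` calls. The hypothetical helper below (not part of DataX) exercises two of them: `addAll` for batch insertion and `drainTo(collection, maxElements)` for batch removal. Because `addAll` inserts element by element and throws `IllegalStateException` once the queue is full, a caller has to make sure `remainingCapacity()` is large enough first, which is why doPushAll waits on that condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical helper exercising the ArrayBlockingQueue calls MemoryChannel uses.
final class QueueOpsDemo {
    // Inserts `items` as one batch, then drains at most `max` of them in FIFO order.
    static List<String> fillThenDrain(List<String> items, int capacity, int max) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        if (items.size() > queue.remainingCapacity()) {
            // addAll would throw IllegalStateException part-way through the batch
            throw new IllegalArgumentException("batch of " + items.size()
                    + " does not fit into capacity " + capacity);
        }
        queue.addAll(items);                 // batch insert, as in doPushAll
        List<String> buffer = new ArrayList<>();
        queue.drainTo(buffer, max);          // batch removal, as in doPullAll
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println(fillThenDrain(Arrays.asList("r1", "r2", "r3"), 4, 2));
    }
}
```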

Main methods

[Figure: main methods of Channel]

III. Detailed Method Analysis

1. Writing Data

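Channel provides the push method, which the Reader calls to write data. The parent class Channel implements push and pushAll, while doPush and doPushAll are implemented in the subclass MemoryChannel.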
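Splitting push (parent) from doPush (subclass) is the template method pattern. The simplified, single-threaded model below is hypothetical (`SketchChannel` and `DequeSketchChannel` are not DataX classes): it keeps the same flow of validate, store via the subclass, then update statistics in the parent, and uses a string's `length()` as a stand-in for `Record.getByteSize()`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simplified model of the Channel / MemoryChannel split.
abstract class SketchChannel {
    private long pushedRecords;
    private long pushedBytes;

    // Parent-side flow, analogous to Channel.push: validate, store, count.
    public final void push(String record) {
        if (record == null) {
            throw new IllegalArgumentException("record must not be null");
        }
        doPush(record);                 // storage is the subclass's job
        statPush(1L, record.length());  // length() stands in for getByteSize()
    }

    // Subclass hook, analogous to MemoryChannel.doPush.
    protected abstract void doPush(String record);

    private void statPush(long records, long bytes) {
        pushedRecords += records;
        pushedBytes += bytes;
    }

    long pushedRecords() { return pushedRecords; }
    long pushedBytes() { return pushedBytes; }
}

// Decides only where records are kept, as MemoryChannel does with its queue.
final class DequeSketchChannel extends SketchChannel {
    final Deque<String> store = new ArrayDeque<>();

    @Override
    protected void doPush(String record) {
        store.addLast(record);
    }

    public static void main(String[] args) {
        DequeSketchChannel ch = new DequeSketchChannel();
        ch.push("ab");
        ch.push("cde");
        System.out.println(ch.pushedRecords() + " records, " + ch.pushedBytes() + " bytes");
    }
}
```

Making `push` final here is a design choice of the sketch: subclasses can change storage but cannot skip the statistics step.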

In the parent class Channel:

[Figure: sequence diagram of push]

[Figure: sequence diagram of pushAll]


  /**
   * Push a single record into the channel (called on the Reader side).
   *
   * @param r Record
   */
  public void push(final Record r) {
    Validate.notNull(r, "record must not be null.");
    // doPush is implemented by the subclass MemoryChannel
    this.doPush(r);
    // update statistics and apply rate limiting
    this.statPush(1L, r.getByteSize());
  }

  /**
   * Push multiple records into the channel. Before pushing, checks that the
   * collection is not null and contains no null elements.
   *
   * @param rs Collection<Record>
   */
  public void pushAll(final Collection<Record> rs) {
    Validate.notNull(rs);
    Validate.noNullElements(rs);
    this.doPushAll(rs);
    this.statPush(rs.size(), this.getByteSize(rs));
  }

The doXXX methods in the subclass MemoryChannel:

  @Override
  protected void doPush(Record r) {
    try {
      long startTime = System.nanoTime();
      // ArrayBlockingQueue.put blocks until the queue has free space
      this.queue.put(r);
      // accumulate time blocked in put, i.e. waiting for the Writer to free space
      waitWriterTime += System.nanoTime() - startTime;
      // update the number of bytes held in the channel
      memoryBytes.addAndGet(r.getMemorySize());
    } catch (InterruptedException ex) {
      Thread.currentThread().interrupt();
    }
  }

  @Override
  protected void doPushAll(Collection<Record> rs) {
    try {
      // acquire the lock
      lock.lockInterruptibly();
      long startTime = System.nanoTime();
      int bytes = getRecordBytes(rs);
      while (memoryBytes.get() + bytes > this.byteCapacity || rs.size() > this.queue
          .remainingCapacity()) {
        // adding rs would exceed byteCapacity or the queue's free slots:
        // keep waiting (in 200 ms slices) for the notInsufficient signal
        notInsufficient.await(200L, TimeUnit.MILLISECONDS);
      }
      // add the records to the queue
      this.queue.addAll(rs);
      // accumulate the time spent waiting in pushAll
      waitWriterTime += System.nanoTime() - startTime;
      // update the number of bytes held in the channel
      memoryBytes.addAndGet(bytes);
      // signal that data can now be pulled
      notEmpty.signalAll();
    } catch (InterruptedException e) {
      throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR, e);
    } finally {
      lock.unlock();
    }
  }
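doPushAll and doPullAll together form a classic bounded buffer guarded by a `ReentrantLock` with two `Condition`s. The sketch below is a simplified, hypothetical model (`BatchChannelSketch` is not a DataX class) that keeps both limits, the queue's record capacity and a byte capacity, and models each record as a `byte[]` whose length stands in for `getMemorySize()`. One deliberate difference from the source above: it calls `lockInterruptibly()` before entering `try`. If that call is interrupted inside the `try`, the `finally` block calls `unlock()` on a lock the thread does not hold, and `ReentrantLock` throws `IllegalMonitorStateException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of MemoryChannel's batch path: a record limit (queue
// capacity) plus a byte limit, guarded by one lock and two conditions.
final class BatchChannelSketch {
    private final ArrayBlockingQueue<byte[]> queue;
    private final long byteCapacity;
    private final AtomicLong memoryBytes = new AtomicLong();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notInsufficient = lock.newCondition(); // "there is room"
    private final Condition notEmpty = lock.newCondition();        // "there is data"

    BatchChannelSketch(int recordCapacity, long byteCapacity) {
        this.queue = new ArrayBlockingQueue<>(recordCapacity);
        this.byteCapacity = byteCapacity;
    }

    void pushAll(Collection<byte[]> rs) throws InterruptedException {
        long bytes = sizeOf(rs);
        lock.lockInterruptibly(); // acquired before try: finally never unlocks an unheld lock
        try {
            // wait while the batch would exceed the byte limit or the free queue slots
            while (memoryBytes.get() + bytes > byteCapacity
                    || rs.size() > queue.remainingCapacity()) {
                notInsufficient.await(200L, TimeUnit.MILLISECONDS);
            }
            queue.addAll(rs);
            memoryBytes.addAndGet(bytes);
            notEmpty.signalAll(); // data is available for pullers
        } finally {
            lock.unlock();
        }
    }

    void pullAll(Collection<byte[]> rs, int bufferSize) throws InterruptedException {
        rs.clear();
        lock.lockInterruptibly();
        try {
            // take up to bufferSize records; wait while the queue is empty
            while (queue.drainTo(rs, bufferSize) <= 0) {
                notEmpty.await(200L, TimeUnit.MILLISECONDS);
            }
            memoryBytes.addAndGet(-sizeOf(rs));
            notInsufficient.signalAll(); // room was freed for pushers
        } finally {
            lock.unlock();
        }
    }

    long memoryBytes() {
        return memoryBytes.get();
    }

    // byte[].length stands in for Record.getMemorySize()
    private static long sizeOf(Collection<byte[]> rs) {
        long total = 0;
        for (byte[] r : rs) {
            total += r.length;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        BatchChannelSketch ch = new BatchChannelSketch(4, 1024);
        ch.pushAll(Arrays.asList(new byte[10], new byte[20]));
        List<byte[]> buffer = new ArrayList<>();
        ch.pullAll(buffer, 8);
        System.out.println(buffer.size() + " records pulled, " + ch.memoryBytes() + " bytes left");
    }
}
```

As in the original, a single batch whose byte size exceeds `byteCapacity` can never satisfy the wait condition, so the pusher would wait forever; batch sizes have to stay below the byte capacity.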

Speed control is performed in Channel's statPush method.

[Figure: sequence diagram of statPush]

  /**
   * Speed control. Communication records the total number of bytes and records
   * written so far; at regular intervals the current speed is checked, and if it
   * is too high the thread sleeps for a while to bring the speed down.
   *
   * @param recordSize
   * @param byteSize
   */
  private void statPush(long recordSize, long byteSize) {
    // currentCommunication tracks the running totals of bytes and records read by the Reader
    currentCommunication.increaseCounter(CommunicationTool.READ_SUCCEED_RECORDS, recordSize);
    currentCommunication.increaseCounter(READ_SUCCEED_BYTES, byteSize);

    // Recording the wait counters on the read side is enough: the write (pull) side
    // may be blocked right now, but the read side can already see its wait counter
    currentCommunication.setLongCounter(CommunicationTool.WAIT_READER_TIME, waitReaderTime);
    currentCommunication.setLongCounter(CommunicationTool.WAIT_WRITER_TIME, waitWriterTime);

    // is any rate limit configured?
    boolean isChannelByteSpeedLimit = (this.byteSpeed > 0);
    boolean isChannelRecordSpeedLimit = (this.recordSpeed > 0);
    if (!isChannelByteSpeedLimit && !isChannelRecordSpeedLimit) {
      return;
    }

    // lastCommunication holds the snapshot (and timestamp) taken at the last check
    long lastTimestamp = lastCommunication.getTimestamp();
    long nowTimestamp = System.currentTimeMillis();
    long interval = nowTimestamp - lastTimestamp;
    // check the speed only once every flowControlInterval milliseconds
    if (interval - this.flowControlInterval >= 0) {
      long byteLimitSleepTime = 0;
      long recordLimitSleepTime = 0;
      // a byte-speed limit is configured for the channel
      if (isChannelByteSpeedLimit) {
        // speed = (current byte total - last byte total) / elapsed time
        long nowTotalReadBytes = CommunicationTool.getTotalReadBytes(currentCommunication);
        long lastTotalReadBytes = CommunicationTool.getTotalReadBytes(lastCommunication);
        long currentByteSpeed = (nowTotalReadBytes - lastTotalReadBytes) * 1000 / interval;
        if (currentByteSpeed > this.byteSpeed) {
          // sleep time from the byte limit: the time this interval's bytes should
          // have taken at the target speed, minus the time actually elapsed
          byteLimitSleepTime = currentByteSpeed * interval / this.byteSpeed - interval;
        }
      }
      // a record-speed limit is configured for the channel
      if (isChannelRecordSpeedLimit) {
        // despite their names, these two variables hold cumulative record counts
        long nowRecordSpeed = CommunicationTool.getTotalReadRecords(currentCommunication);
        long lastRecordSpeed = CommunicationTool.getTotalReadRecords(lastCommunication);
        long currentRecordSpeed = (nowRecordSpeed - lastRecordSpeed) * 1000 / interval;
        if (currentRecordSpeed > this.recordSpeed) {
          // sleep time from the record limit, computed the same way
          recordLimitSleepTime = currentRecordSpeed * interval / this.recordSpeed - interval;
        }
      }

      // sleep for the larger of the two times
      long sleepTime = Math.max(byteLimitSleepTime, recordLimitSleepTime);
      if (sleepTime > 0) {
        try {
          Thread.sleep(sleepTime);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
      // save the number of bytes read successfully
      lastCommunication.setLongCounter(READ_SUCCEED_BYTES,
          currentCommunication.getLongCounter(READ_SUCCEED_BYTES));
      // save the number of bytes that failed to be read
      lastCommunication.setLongCounter(READ_FAILED_BYTES,
          currentCommunication.getLongCounter(READ_FAILED_BYTES));
      // save the number of records read successfully
      lastCommunication.setLongCounter(CommunicationTool.READ_SUCCEED_RECORDS,
          currentCommunication.getLongCounter(CommunicationTool.READ_SUCCEED_RECORDS));
      // save the number of records that failed to be read
      lastCommunication.setLongCounter(CommunicationTool.READ_FAILED_RECORDS,
          currentCommunication.getLongCounter(CommunicationTool.READ_FAILED_RECORDS));
      // record the timestamp of this snapshot
      lastCommunication.setTimestamp(nowTimestamp);
    }
  }
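The sleep-time formula in statPush is easier to check with numbers. The pure function below is a hypothetical restatement of that arithmetic (`ThrottleMath` and its parameter names are illustrative): it takes the byte and record deltas since the last check, the elapsed interval in milliseconds and the per-second limits, and returns the larger of the two sleep times. For example, if 3,000,000 bytes were pushed during a 2000 ms interval under a limit of 1,000,000 bytes/s, the current speed is 1,500,000 bytes/s; at the limit those bytes should have taken 3000 ms, so the channel sleeps 3000 - 2000 = 1000 ms.

```java
// Hypothetical restatement of statPush's sleep-time arithmetic.
// Deltas are counted since the last check, intervalMs > 0, limits are per second
// (a limit <= 0 means "not configured"). Integer division truncates, as in the source.
final class ThrottleMath {
    static long sleepMillis(long deltaBytes, long deltaRecords, long intervalMs,
                            long byteSpeed, long recordSpeed) {
        long byteSleep = 0;
        long recordSleep = 0;
        if (byteSpeed > 0) {
            long currentByteSpeed = deltaBytes * 1000 / intervalMs;
            if (currentByteSpeed > byteSpeed) {
                // time needed at the limit minus time actually elapsed
                byteSleep = currentByteSpeed * intervalMs / byteSpeed - intervalMs;
            }
        }
        if (recordSpeed > 0) {
            long currentRecordSpeed = deltaRecords * 1000 / intervalMs;
            if (currentRecordSpeed > recordSpeed) {
                recordSleep = currentRecordSpeed * intervalMs / recordSpeed - intervalMs;
            }
        }
        // the stricter limit wins
        return Math.max(byteSleep, recordSleep);
    }

    public static void main(String[] args) {
        System.out.println(sleepMillis(3_000_000, 0, 2000, 1_000_000, 0));     // 1000
        System.out.println(sleepMillis(3_000_000, 500, 1000, 1_000_000, 200)); // 2000
    }
}
```

In the second call the byte limit asks for 2000 ms and the record limit for 1500 ms, so the larger value is used.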

2. Reading Data

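Channel provides the pull method, which the Writer calls to read data. The parent class Channel implements pull and pullAll, while doPull and doPullAll are implemented in the subclass MemoryChannel.

[Figure: sequence diagram of pull]

[Figure: sequence diagram of pullAll]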

  public Record pull() {
    // doPull is implemented by the subclass and returns one record
    Record record = this.doPull();
    // update the pull statistics via statPull
    this.statPull(1L, record.getByteSize());
    return record;
  }

  public void pullAll(final Collection<Record> rs) {
    Validate.notNull(rs);
    // doPullAll is implemented by the subclass and fills rs with records
    this.doPullAll(rs);
    // update the pull statistics via statPull
    this.statPull(rs.size(), this.getByteSize(rs));
  }

The doXXX methods provided by MemoryChannel:

  @Override
  protected Record doPull() {
    try {
      long startTime = System.nanoTime();
      // ArrayBlockingQueue.take blocks until a record is available
      Record r = this.queue.take();
      // accumulate time blocked in take, i.e. waiting for the Reader to supply data
      waitReaderTime += System.nanoTime() - startTime;
      // update the number of bytes held in the channel
      memoryBytes.addAndGet(-r.getMemorySize());
      return r;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  @Override
  protected void doPullAll(Collection<Record> rs) {
    assert rs != null;
    rs.clear();
    try {
      long startTime = System.nanoTime();
      lock.lockInterruptibly();
      // take at most bufferSize records out of the queue
      while (this.queue.drainTo(rs, bufferSize) <= 0) {
        // the queue is empty: wait for the notEmpty signal
        notEmpty.await(200L, TimeUnit.MILLISECONDS);
      }
      // accumulate the time spent waiting in pullAll
      waitReaderTime += System.nanoTime() - startTime;
      // update the number of bytes held in the channel
      int bytes = getRecordBytes(rs);
      memoryBytes.addAndGet(-bytes);
      // signal that there is room for push again
      notInsufficient.signalAll();
    } catch (InterruptedException e) {
      throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR, e);
    } finally {
      lock.unlock();
    }
  }
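Both doPullAll and doPushAll wait with `await(200L, TimeUnit.MILLISECONDS)` rather than an untimed `await()`. One plausible reason, visible in the code above, is that the single-record doPush writes with `queue.put()` without taking the lock or signalling `notEmpty` (and doPull likewise never signals `notInsufficient`), so a batch waiter would never be woken by them; the timeout makes the loop re-check the queue periodically. The demo below is a hypothetical illustration of that effect, not DataX code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: a producer puts into the queue without signalling notEmpty,
// yet the consumer still gets the record because its await() has a timeout.
final class TimedAwaitDemo {
    static List<String> pullAfterUnsignalledPut() throws InterruptedException {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(8);
        ReentrantLock lock = new ReentrantLock();
        Condition notEmpty = lock.newCondition();
        List<String> rs = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(50);     // usually lets the consumer start waiting first
                queue.put("record");  // like doPush: no lock, no notEmpty.signalAll()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        lock.lockInterruptibly();
        try {
            // same loop shape as doPullAll; an untimed await() here could sleep forever
            while (queue.drainTo(rs, 8) <= 0) {
                notEmpty.await(200L, TimeUnit.MILLISECONDS); // wakes on timeout, re-checks
            }
        } finally {
            lock.unlock();
        }
        producer.join();
        return rs;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(pullAfterUnsignalledPut());
    }
}
```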

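statPull applies no rate limiting. Data flows Reader -> Channel -> Writer, so once the Reader's push rate is limited, the Writer's pull rate is already bounded by it and needs no separate limit.

[Figure: sequence diagram of statPull]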

  /**
   * statPull applies no rate limiting. Data flows Reader -> Channel -> Writer, so
   * once the Reader's push rate is limited, the Writer's pull rate needs no limit.
   *
   * @param recordSize long
   * @param byteSize   long
   */
  private void statPull(long recordSize, long byteSize) {
    currentCommunication.increaseCounter(CommunicationTool.WRITE_RECEIVED_RECORDS, recordSize);
    currentCommunication.increaseCounter(CommunicationTool.WRITE_RECEIVED_BYTES, byteSize);
  }
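Because statPush counts what the Reader pushed and statPull only counts what the Writer received, the difference between the two is the number of records still buffered in the channel. The class below is a hypothetical, minimal stand-in for Communication (it is not the DataX class); its counter keys are plain strings named after the CommunicationTool constants, and the real constants' string values are not assumed here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for Communication: thread-safe named long counters.
final class CounterSketch {
    // Illustrative keys named after the CommunicationTool constants.
    static final String READ_SUCCEED_RECORDS = "READ_SUCCEED_RECORDS";
    static final String WRITE_RECEIVED_RECORDS = "WRITE_RECEIVED_RECORDS";

    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    void increaseCounter(String key, long delta) {
        counters.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(delta);
    }

    long getLongCounter(String key) {
        AtomicLong value = counters.get(key);
        return value == null ? 0L : value.get();
    }

    // Analogous to statPush (counting part only): what the Reader pushed.
    void onPush(long records) { increaseCounter(READ_SUCCEED_RECORDS, records); }

    // Analogous to statPull: what the Writer received; no throttling.
    void onPull(long records) { increaseCounter(WRITE_RECEIVED_RECORDS, records); }

    // Records pushed but not yet pulled. The two reads are not atomic together,
    // so under concurrency this is an approximate snapshot.
    long recordsInChannel() {
        return getLongCounter(READ_SUCCEED_RECORDS) - getLongCounter(WRITE_RECEIVED_RECORDS);
    }

    public static void main(String[] args) {
        CounterSketch c = new CounterSketch();
        c.onPush(5);
        c.onPull(3);
        System.out.println(c.recordsInChannel()); // 2
    }
}
```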