/**
* (C) 2010-2011 Alibaba Group Holding Limited.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* version 2 as published by the Free Software Foundation.
*
*/
package com.taobao.datax.plugins.writer.hdfswriter;
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.lang.reflect.Method;
import java.net.URI;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.log4j.Logger;
import com.taobao.datax.common.exception.DataExchangeException;
import com.taobao.datax.common.exception.ExceptionTracker;
import com.taobao.datax.common.plugin.Line;
import com.taobao.datax.common.plugin.LineReceiver;
import com.taobao.datax.common.plugin.PluginParam;
import com.taobao.datax.common.plugin.PluginStatus;
import com.taobao.datax.common.plugin.Writer;
import com.taobao.datax.plugins.common.DFSUtils;
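/**
 * HdfsWriter is a DataX writer plugin that dumps {@link Line} records
 * received from a {@link LineReceiver} into HDFS, either as plain or
 * compressed text files or as Hadoop {@link SequenceFile}s, selected by the
 * 'filetype' parameter (TXT, TXT_COMP, SEQ, SEQ_COMP).
 *
 * A rough sketch of the parameters this plugin reads (the keys are the
 * ParamKey references used below; the values are only illustrative):
 *
 * <pre>
 * dir         = hdfs://taobao/dw   # target directory (or file when concurrency == 1)
 * prefixname  = bazhen.csy         # file name, or file-name prefix when split
 * fileType    = TXT                # TXT | TXT_COMP | SEQ | SEQ_COMP
 * fieldSplit  = \u0001             # field separator
 * lineSplit   = \n                 # record separator
 * encoding    = UTF-8
 * bufferSize  = 8192
 * concurrency = 10                 # number of concurrent output files
 * delMode     = 3                  # pre-write cleanup policy, see prepare()
 * </pre>
 */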
public class HdfsWriter extends Writer {
private static final Logger logger = Logger.getLogger(HdfsWriter.class);
private FileSystem fs;
private Path p = null;
private char FIELD_SPLIT = '\u0001';
private char LINE_SPLIT = '\n';
private int BUFFER_SIZE = 8 * 1024;
private String ENCODING = "UTF-8";
private String delMode = "3";
private String hadoop_conf = "";
private int concurrency = 10;
private char[] nullChars = null;
private static char[] searchChars = new char[2];
private DfsWriterStrategy dfsWriterStrategy = null;
static {
Thread.currentThread().setContextClassLoader(HdfsWriter.class.getClassLoader());
}
/*
 * NOTE: if the user sets the parameter 'splitnum' to 1, hdfswriter does no
 * splitting and uses dir + prefixname as the fixed HDFS file name. For
 * example, with dir = hdfs://taobao/dw, prefixname = bazhen.csy and
 * splitnum = 1, hdfswriter dumps to the single file
 * hdfs://taobao/dw/bazhen.csy.
 *
 * In all other cases prefixname is only a file-name prefix. For example,
 * with dir = hdfs://taobao/dw, prefixname = bazhen.csy and splitnum = 2, the
 * generated file names are hdfs://taobao/dw/bazhen.csy-0 and
 * hdfs://taobao/dw/bazhen.csy-1, where the suffix is the thread number.
 */
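/**
 * Prepares the target location before any data is written. The cleanup
 * behaviour follows 'delMode' as implemented below: with a single writer
 * (concurrency == 1) the fixed target file dir/prefixname is deleted;
 * otherwise delMode "4" deletes every file under dir, while delMode "3"
 * deletes only the files matching dir/prefixname-*. The file system handle
 * opened here is closed again in the finally block.
 */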
@Override
public int prepare(PluginParam param) {
String dir = param.getValue(ParamKey.dir);
String ugi = param.getValue(ParamKey.ugi, null);
String prefixname = param.getValue(ParamKey.prefixname,
"prefix");
delMode = param.getValue(ParamKey.delMode, this.delMode);
concurrency = param.getIntValue(ParamKey.concurrency, 1);
hadoop_conf = param.getValue(ParamKey.hadoop_conf, "");
if (dir.endsWith("*")) {
dir = dir.substring(0, dir.lastIndexOf("*"));
}
if (dir.endsWith("/")) {
dir = dir.substring(0, dir.lastIndexOf("/"));
}
Path rootpath = new Path(dir);
try {
fs = DFSUtils.createFileSystem(new URI(dir),
DFSUtils.getConf(dir, ugi, hadoop_conf));
/* No splitting: dir + prefixname is the absolute target file name. */
if (concurrency == 1) {
DFSUtils.deleteFile(fs, new Path(dir + "/" + prefixname), true);
}
/* Splitting: dir is treated as a directory and cleaned up according to delMode. */
else {
if ("4".equals(delMode))
DFSUtils.deleteFiles(fs, rootpath, true, true);
else if ("3".equals(delMode))
DFSUtils.deleteFiles(fs, new Path(dir + "/" + prefixname
+ "-*"), true, true);
}
} catch (Exception e) {
logger.error(ExceptionTracker.trace(e));
throw new DataExchangeException(String.format(
"HdfsWriter failed to prepare the target on the file system: %s, %s",
e.getMessage(), e.getCause()));
} finally {
closeAll();
}
return PluginStatus.SUCCESS.value();
}
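/** Delegates splitting of the job parameters to {@link HdfsFileSplitter}. */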
@Override
public List<PluginParam> split(PluginParam param) {
HdfsFileSplitter splitter = new HdfsFileSplitter();
splitter.setParam(param);
splitter.init();
return splitter.split();
}
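/**
 * Reads the writer parameters (separators, encoding, buffer size, null
 * replacement, target dir), opens the HDFS file system, and picks the
 * {@link DfsWriterStrategy} matching the 'filetype' parameter:
 * SEQ / SEQ_COMP use the sequence-file strategy, TXT_COMP the compressed
 * text strategy and TXT the plain text strategy.
 */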
@Override
public int init() {
FIELD_SPLIT = param.getCharValue(ParamKey.fieldSplit,
FIELD_SPLIT);
ENCODING = param.getValue(ParamKey.encoding, ENCODING);
LINE_SPLIT = param.getCharValue(ParamKey.lineSplit,
LINE_SPLIT);
searchChars[0] = FIELD_SPLIT;
searchChars[1] = LINE_SPLIT;
BUFFER_SIZE = param.getIntValue(ParamKey.bufferSize,
BUFFER_SIZE);
delMode = param.getValue(ParamKey.delMode, this.delMode);
nullChars = param.getValue(ParamKey.nullChar, "")
.toCharArray();
hadoop_conf = param.getValue(ParamKey.hadoop_conf, "");
String ugi = param.getValue(ParamKey.ugi, null);
String dir = param.getValue(ParamKey.dir);
try {
fs = DFSUtils.createFileSystem(new URI(dir),
DFSUtils.getConf(dir, ugi, hadoop_conf));
} catch (Exception e) {
logger.error(ExceptionTracker.trace(e));
closeAll();
throw new DataExchangeException(String.format(
"HdfsWriter failed to initialize the file system: %s, %s",
e.getMessage(), e.getCause()));
}
if (dir != null) {
p = new Path(dir);
} else {
closeAll();
throw new DataExchangeException("Can't find the param ["
+ ParamKey.dir + "] in hdfs-writer-param.");
}
String filetype = param.getValue(ParamKey.fileType, "TXT");
if ("SEQ".equalsIgnoreCase(filetype)
|| "SEQ_COMP".equalsIgnoreCase(filetype))
dfsWriterStrategy = new DfsWriterSequeueFileStrategy();
else if ("TXT_COMP".equalsIgnoreCase(filetype))
dfsWriterStrategy = new DfsWriterTextFileStrategy(true);
else if ("TXT".equalsIgnoreCase(filetype))
dfsWriterStrategy = new DfsWriterTextFileStrategy(false);
else {
closeAll();
throw new DataExchangeException(
"HdfsWriter cannot recognize filetype: " + filetype);
}
return PluginStatus.SUCCESS.value();
}
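/**
 * Opens the output through the selected writer strategy. With delMode "2"
 * the target path is deleted first, so the new file replaces any old one.
 */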
@Override
public int connect() {
if (p == null) {
closeAll();
throw new DataExchangeException(
"HdfsWriter can't connect: the file system was not initialized.");
}
try {
if ("2".equals(delMode))
DFSUtils.deleteFile(fs, p, true);
dfsWriterStrategy.open();
getMonitor().setStatus(PluginStatus.CONNECT);
return PluginStatus.SUCCESS.value();
} catch (Exception ex) {
closeAll();
logger.error(ExceptionTracker.trace(ex));
throw new DataExchangeException(String.format(
"HdfsWriter failed to connect to the file system: %s, %s",
ex.getMessage(), ex.getCause()));
}
}
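/**
 * Pulls lines from the receiver and hands them to the writer strategy; the
 * strategy and the file system handle are closed afterwards in all cases.
 */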
@Override
public int startWrite(LineReceiver receiver) {
getMonitor().setStatus(PluginStatus.WRITE);
try {
dfsWriterStrategy.write(receiver);
} catch (Exception ex) {
throw new DataExchangeException(String.format(
"Errors occurred while starting to write: %s, %s",
ex.getMessage(), ex.getCause()));
} finally {
dfsWriterStrategy.close();
closeAll();
}
return PluginStatus.SUCCESS.value();
}
@Override
public int commit() {
return PluginStatus.SUCCESS.value();
}
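/** Closes the file system handle and marks the write phase as finished. */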
@Override
public int finish() {
closeAll();
getMonitor().setStatus(PluginStatus.WRITE_OVER);
return PluginStatus.SUCCESS.value();
}
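/**
 * Closes the shared HDFS file system handle, wrapping any failure in a
 * DataExchangeException.
 */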
private void closeAll() {
try {
IOUtils.closeStream(fs);
} catch (Exception e) {
throw new DataExchangeException(String.format(
"HdfsWriter failed to close the file system: %s, %s",
e.getMessage(), e.getCause()));
}
}
@Override
public int cleanup() {
closeAll();
return PluginStatus.SUCCESS.value();
}
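/**
 * Strategy for dumping received lines into a concrete HDFS file format;
 * the implementations below cover sequence files and plain or compressed
 * text files.
 */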
public interface DfsWriterStrategy {
void open();
void write(LineReceiver receiver);
void close();
}
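/**
 * Writes records as a Hadoop {@link SequenceFile}, optionally compressed,
 * using the configured key and value classes.
 */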
class DfsWriterSequeueFileStrategy implements DfsWriterStrategy {
private Configuration conf = null;
private SequenceFile.Writer writer = null;
private Writable key = null;
private Writable value = null;
private boolean compressed = false;
private String keyClassName = null;
private String valueClassName = null;
priva