The distcp Command in Hadoop

This article explains Hadoop's distcp command in detail, covering how it works, how to use it, and its common use cases. distcp is a parallel copy tool that transfers data efficiently between Hadoop filesystems by running a MapReduce job. Concrete command runs are shown as examples.



1. What is the distcp command?

Hadoop comes with a useful program called distcp for copying data to and from Hadoop filesystems in parallel.

2. How is distcp implemented?

distcp is implemented as a MapReduce job where the work of copying is done by the maps that run in parallel across the cluster. There are no reducers.

Each file is copied by a single map, and distcp tries to give each map approximately the same amount of data by bucketing files into roughly equal allocations.

By default, up to 20 maps are used, but this can be changed by specifying the -m argument to distcp.
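As a sketch, the map count can be raised when copying a large number of files; the paths and the value 40 below are illustrative, not from the run in this article:

```shell
# Use up to 40 maps instead of the default 20 (paths are illustrative)
hadoop distcp -m 40 /data/source /data/dest
```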

3. How do you use distcp?

hadoop distcp dir1 dir2

  • If dir2 does not exist, it will be created, and the contents of the dir1 directory will be copied there.
  • If dir2 already exists, then dir1 will be copied under it, creating the directory structure dir2/dir1. If this isn’t what you want, you can supply the -overwrite option to keep the same directory structure and force files to be overwritten.
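The two cases above can be summarized as follows (dir1 and dir2 are placeholder paths):

```shell
# dir2 absent: dir2 is created and receives the contents of dir1
hadoop distcp dir1 dir2            # result: dir2/<files of dir1>

# dir2 present: dir1 is nested inside it
hadoop distcp dir1 dir2            # result: dir2/dir1/<files of dir1>

# Keep a flat layout and force files to be overwritten instead
hadoop distcp -overwrite dir1 dir2
```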

4. Uses of distcp

A very common use case for distcp is for transferring data between two HDFS clusters.
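An inter-cluster copy can be expressed with fully qualified hdfs:// URIs; the namenode hostnames and port below are placeholders:

```shell
# Copy /foo from the first cluster to the second. It is common to run the
# job on the destination cluster so the map tasks write to local datanodes.
# Hostnames and the 8020 port are illustrative.
hadoop distcp hdfs://namenode1:8020/foo hdfs://namenode2:8020/foo
```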

5. Example
  • Before running the command
    Inspect the current contents of HDFS.
  • Run hadoop distcp /test /lawson
    This command copies the contents of the /test directory into the /lawson directory.
[root@server4 hadoop]# hadoop distcp /test /lawson
19/01/14 18:08:17 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/test], targetPath=/lawson, targetPathExists=false, preserveRawXattrs=false}
19/01/14 18:08:17 INFO client.RMProxy: Connecting to ResourceManager at server4/192.168.211.4:8032
19/01/14 18:08:17 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
19/01/14 18:08:17 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
19/01/14 18:08:17 INFO client.RMProxy: Connecting to ResourceManager at server4/192.168.211.4:8032
19/01/14 18:08:18 INFO mapreduce.JobSubmitter: number of splits:2
19/01/14 18:08:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547460247783_0001
19/01/14 18:08:19 INFO impl.YarnClientImpl: Submitted application application_1547460247783_0001
19/01/14 18:08:19 INFO mapreduce.Job: The url to track the job: http://server4:8088/proxy/application_1547460247783_0001/
19/01/14 18:08:19 INFO tools.DistCp: DistCp job-id: job_1547460247783_0001
19/01/14 18:08:19 INFO mapreduce.Job: Running job: job_1547460247783_0001
19/01/14 18:08:31 INFO mapreduce.Job: Job job_1547460247783_0001 running in uber mode : false
19/01/14 18:08:31 INFO mapreduce.Job:  map 0% reduce 0%
19/01/14 18:08:42 INFO mapreduce.Job:  map 50% reduce 0%
19/01/14 18:08:43 INFO mapreduce.Job:  map 100% reduce 0%
19/01/14 18:08:44 INFO mapreduce.Job: Job job_1547460247783_0001 completed successfully
19/01/14 18:08:44 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=218172
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=889
		HDFS: Number of bytes written=30
		HDFS: Number of read operations=28
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=7
	Job Counters 
		Launched map tasks=2
		Other local map tasks=2
		Total time spent by all maps in occupied slots (ms)=15439
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=15439
		Total vcore-milliseconds taken by all map tasks=15439
		Total megabyte-milliseconds taken by all map tasks=15809536
	Map-Reduce Framework
		Map input records=2
		Map output records=0
		Input split bytes=268
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=293
		CPU time spent (ms)=740
		Physical memory (bytes) snapshot=201781248
		Virtual memory (bytes) snapshot=4150059008
		Total committed heap usage (bytes)=93454336
	File Input Format Counters 
		Bytes Read=591
	File Output Format Counters 
		Bytes Written=0
	org.apache.hadoop.tools.mapred.CopyMapper$Counter
		BYTESCOPIED=30
		BYTESEXPECTED=30
		COPY=2
  • After running the command
    You can see that a /lawson directory has been created, and it contains a users.txt file.

  • Next, rename the file under /test to distcp.txt and run the same command again:

[root@server4 hadoop]# hadoop distcp /test /lawson
19/01/14 18:14:17 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/test], targetPath=/lawson, targetPathExists=true, preserveRawXattrs=false}
19/01/14 18:14:17 INFO client.RMProxy: Connecting to ResourceManager at server4/192.168.211.4:8032
19/01/14 18:14:18 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
19/01/14 18:14:18 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
19/01/14 18:14:18 INFO client.RMProxy: Connecting to ResourceManager at server4/192.168.211.4:8032
19/01/14 18:14:19 INFO mapreduce.JobSubmitter: number of splits:2
19/01/14 18:14:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547460247783_0002
19/01/14 18:14:19 INFO impl.YarnClientImpl: Submitted application application_1547460247783_0002
19/01/14 18:14:19 INFO mapreduce.Job: The url to track the job: http://server4:8088/proxy/application_1547460247783_0002/
19/01/14 18:14:19 INFO tools.DistCp: DistCp job-id: job_1547460247783_0002
19/01/14 18:14:19 INFO mapreduce.Job: Running job: job_1547460247783_0002
19/01/14 18:14:26 INFO mapreduce.Job: Job job_1547460247783_0002 running in uber mode : false
19/01/14 18:14:26 INFO mapreduce.Job:  map 0% reduce 0%
19/01/14 18:14:33 INFO mapreduce.Job:  map 50% reduce 0%
19/01/14 18:14:35 INFO mapreduce.Job:  map 100% reduce 0%
19/01/14 18:14:35 INFO mapreduce.Job: Job job_1547460247783_0002 completed successfully
19/01/14 18:14:35 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=218164
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=906
		HDFS: Number of bytes written=30
		HDFS: Number of read operations=29
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=7
	Job Counters 
		Launched map tasks=2
		Other local map tasks=2
		Total time spent by all maps in occupied slots (ms)=11509
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=11509
		Total vcore-milliseconds taken by all map tasks=11509
		Total megabyte-milliseconds taken by all map tasks=11785216
	Map-Reduce Framework
		Map input records=2
		Map output records=0
		Input split bytes=266
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=100
		CPU time spent (ms)=520
		Physical memory (bytes) snapshot=203853824
		Virtual memory (bytes) snapshot=4150059008
		Total committed heap usage (bytes)=93454336
	File Input Format Counters 
		Bytes Read=610
	File Output Format Counters 
		Bytes Written=0
	org.apache.hadoop.tools.mapred.CopyMapper$Counter
		BYTESCOPIED=30
		BYTESEXPECTED=30
		COPY=2

Inspect the contents of /lawson:

[root@server4 hadoop]# hadoop fs -ls /lawson
Found 2 items
drwxr-xr-x   - root supergroup          0 2019-01-14 18:14 /lawson/test
-rw-r--r--   3 root supergroup         30 2019-01-14 18:08 /lawson/users.txt


[root@server4 hadoop]# hadoop fs -ls /lawson/test
Found 1 items
-rw-r--r--   3 root supergroup         30 2019-01-14 18:14 /lawson/test/distcp.txt

You can see that a test directory has been created under /lawson, and it contains the distcp.txt file. Because /lawson already existed this time, /test was nested under it instead of having its contents copied in directly, which matches the behavior described in section 3.
