Moving Data In and Out of Hadoop
• Existing tools like Flume, Scribe, and Chukwa are mainly designed for transporting log
files. What if you need to transfer other file formats, such as semi-structured or binary
files?
Solution:
• The HDFS File Slurper is an open-source utility that can copy files of any format into or
out of HDFS.
The HDFS File Slurper is a simple tool that automates copying files from a local directory
into HDFS, or from HDFS back to a local directory. It follows a five-step process:
• Scan: The Slurper reads files from the source directory.
• Determine HDFS destination: Optionally, it consults a script to determine where in HDFS
the file should be placed.
• Write: The file is copied to HDFS.
• Verify: An optional verification step ensures successful transfer.
• Relocate file: The original file is moved to a completed directory after a successful copy.
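The five steps above can be sketched in a few lines of Python. This is an illustration of the workflow only, not the Slurper's actual code: a plain local directory stands in for HDFS, and the names slurp, checksum, choose_dest, and completed_dir are hypothetical. In a real deployment the copy step would go through an HDFS client, and the destination script would be whatever the Slurper is configured to call.

```python
import hashlib
import shutil
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def slurp(source_dir, dest_root, completed_dir, choose_dest=None, verify=True):
    """Copy every file in source_dir to a destination, then move it aside.

    Assumption: dest_root is a local directory standing in for HDFS.
    choose_dest is an optional callable (hypothetical) that maps a source
    file to the directory it should be written to.
    """
    source_dir = Path(source_dir)
    dest_root = Path(dest_root)
    completed_dir = Path(completed_dir)
    completed_dir.mkdir(parents=True, exist_ok=True)
    copied = []

    # Step 1 (Scan): list the regular files in the source directory.
    for src in sorted(p for p in source_dir.iterdir() if p.is_file()):
        # Step 2 (Determine destination): optionally ask a callback.
        dest_dir = Path(choose_dest(src)) if choose_dest else dest_root
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / src.name

        # Step 3 (Write): copy the file byte for byte, whatever its format.
        shutil.copyfile(src, dest)

        # Step 4 (Verify): optionally compare checksums of source and copy.
        if verify and checksum(src) != checksum(dest):
            dest.unlink()  # discard the bad copy
            continue       # leave the source in place for a later retry

        # Step 5 (Relocate): move the original to the completed directory.
        shutil.move(str(src), str(completed_dir / src.name))
        copied.append(dest)

    return copied
```

For example, slurp("in", "hdfs", "completed", choose_dest=lambda p: Path("hdfs") / p.suffix.lstrip(".")) would route each file into a subdirectory named after its extension. Moving the original only after verification means a failed copy is simply picked up again on the next scan.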
Technique 3: Scheduling Regular Ingress Activities with Oozie