Assignment - Week 2
2. Log in to your gateway node and open a terminal.
3. Write a command to find out your home directory on the gateway node.
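Hint: either of these prints the home directory (pwd works because a fresh login session starts there):
    echo $HOME
    pwd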
3. There is a third-party service that will drop a file named orders.csv into the
landing folder under your home directory.
You then need to filter for all the orders whose status is PENDING_PAYMENT,
create a new file named orders_filtered.csv, and put it in the staging folder.
Then take this file and put it into the landing folder in your HDFS home
directory, and do a couple more things...
So, to simulate this:
1. Create two folders named landing and staging in your home directory.
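Hint: a single mkdir can create both folders at once:
    mkdir ~/landing ~/staging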
2. Copy the file present under the /data/retail_db/orders folder to the landing
folder in your home directory.
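Hint: the exact file name under /data/retail_db/orders may vary on your cluster (it is often a single part file rather than orders.csv), so a wildcard copy is a safe sketch:
    cp /data/retail_db/orders/* ~/landing/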
3. Apply the grep command to filter for all orders with PENDING_PAYMENT
status.
4. Create a new file named orders_filtered.csv under your staging folder with
the filtered results (see the hint below, which covers steps 3 and 4 together).
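Hint for steps 3 and 4: grep does the filtering and a shell redirect writes the matches into the new file in one line. The wildcard stands in for whatever file landed in your landing folder:
    grep PENDING_PAYMENT ~/landing/* > ~/staging/orders_filtered.csv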
5. Create a folder hierarchy named data/landing in your HDFS home directory.
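Hint: relative paths in HDFS commands resolve against your HDFS home directory, and -p creates the whole hierarchy in one go:
    hdfs dfs -mkdir -p data/landing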
6. Copy the orders_filtered.csv file from the staging folder on your local
filesystem to the data/landing folder in HDFS.
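Hint: -put (or its synonym -copyFromLocal) uploads a local file into HDFS:
    hdfs dfs -put ~/staging/orders_filtered.csv data/landing/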
7. Run a command to check the number of records in the orders_filtered.csv file
under the data/landing folder.
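Hint: stream the file out of HDFS and count the lines on the local side:
    hdfs dfs -cat data/landing/orders_filtered.csv | wc -l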
8. Write a command to list the files in the data/landing folder in HDFS.
9. Reframe this command so that you can see the file sizes in KBs
(human-readable form) instead of bytes.
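Hint for steps 8 and 9: plain -ls reports sizes in bytes; adding -h switches to human-readable units (K/M/G):
    hdfs dfs -ls data/landing
    hdfs dfs -ls -h data/landing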
10. Change the permissions of this file:
give read, write, and execute to the owner;
read and write to the group;
and read to others.
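Hint: in octal notation, rwx for the owner, rw- for the group, and r-- for others is 764:
    hdfs dfs -chmod 764 data/landing/orders_filtered.csv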
11. Create a new folder data/staging in your HDFS home and move
orders_filtered.csv from data/landing to data/staging.
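Hint: -mv moves the file within HDFS itself, without routing any data through the local machine:
    hdfs dfs -mkdir data/staging
    hdfs dfs -mv data/landing/orders_filtered.csv data/staging/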
12. Now let's assume a Spark program has run on your staging folder to do some
processing, and the processed results give you just 2 lines as
output:
3617,2013-08-15 00:00:00.0,8889,PENDING_PAYMENT
68714,2013-09-06 00:00:00.0,8889,PENDING_PAYMENT
To simulate this, create a new file called orders_result.csv in the home
directory of your local gateway node using the vi editor and add the above 2 records.
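Hint: open the file with vi, press i to enter insert mode, type (or paste) the two records, then press Esc and type :wq to save and quit:
    vi ~/orders_result.csv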
13. Move orders_result.csv from local to HDFS under a new directory called
data/results (think of it as if the Spark program has run and created this file).
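Hint: -moveFromLocal uploads the file and deletes the local copy, which matches the "move" in this step:
    hdfs dfs -mkdir data/results
    hdfs dfs -moveFromLocal ~/orders_result.csv data/results/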
14. Now we want to bring the processed results back to a local folder named
data/results. Run a command to bring the file from HDFS to local.
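Hint: create the local target folder first, then pull the file down with -get (or its synonym -copyToLocal):
    mkdir -p ~/data/results
    hdfs dfs -get data/results/orders_result.csv ~/data/results/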
15. Rename the file orders_result.csv under the data/results folder on your
local filesystem to final_results.csv.
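Hint: on the local filesystem a rename is just a move:
    mv ~/data/results/orders_result.csv ~/data/results/final_results.csv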
16. Now we are done, so delete all the directories that you have created, both
locally and in HDFS.
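Hint: a sketch of the cleanup, assuming you created only the folders named in the steps above (double-check the paths before deleting anything recursively):
    rm -r ~/landing ~/staging ~/data
    hdfs dfs -rm -r data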