HADOOP: A Solution To Big Data Problems Using Partitioning Mechanism Map-Reduce
ABSTRACT
With the increased usage of the internet, data usage is also growing exponentially year on year. To handle such enormous data, a better platform for processing it was needed. A programming model called MapReduce was therefore introduced, which processes big amounts of data in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. This approach lowers the risk of catastrophic system failure and unexpected data loss. Since Hadoop has emerged as a popular tool for Big Data implementation, the paper deals with the overall architecture of Hadoop along with the details of its various components.
This algorithm divides a task into small parts, assigns those parts to many computers connected over the network, and collects their results to form the final result dataset.
The first is the map job, which takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key/value pairs).
The reduce job takes the output from a map as input and combines those data tuples into a smaller set
of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the
map job.
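The two-stage flow just described can be illustrated with the standard word-count example. The following is a single-process Python sketch of the idea, not Hadoop's actual Java API:

```python
from itertools import groupby
from operator import itemgetter

def map_job(document):
    """Map: break input into (key, value) tuples, one (word, 1) pair per word."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_job(word, counts):
    """Reduce: combine all tuples sharing a key into a single, smaller tuple."""
    return (word, sum(counts))

def map_reduce(documents):
    # Run the map job over every input document.
    tuples = [pair for doc in documents for pair in map_job(doc)]
    # Shuffle/sort: group the tuples by key before reducing.
    tuples.sort(key=itemgetter(0))
    # The reduce job always runs after the map job, one call per distinct key.
    return [reduce_job(word, (count for _, count in group))
            for word, group in groupby(tuples, key=itemgetter(0))]

print(map_reduce(["big data big clusters", "big data"]))
# [('big', 3), ('clusters', 1), ('data', 2)]
```

On a real cluster the map and reduce calls run on different nodes and the framework performs the shuffle; here the sort-and-group step stands in for it.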
@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 4 | May-Jun 2018 Page: 1297
International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470
Hadoop Map Reduce architecture
Mapper Phase

In the Mapper phase, the input data is split into two components, a key and a value. The key is writable and comparable in the processing stage; the value is writable only during the processing stage. When a client submits input data to the Hadoop system, the Job Tracker assigns tasks to the Task Trackers, and the input data is split into several input splits.

Reducer Phase

Shuffled and sorted data is passed as input to the reducer. In this phase, all incoming data is combined, values with the same key are merged, and the results are written into the HDFS system. The Record Writer writes data from the reducer to HDFS. The reducer is not mandatory, for example for simple searching and mapping purposes.
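The point that the reducer is optional can be illustrated with a map-only job, the kind used for simple searching or filtering. A minimal single-machine Python sketch (the function name and sample records are hypothetical):

```python
def map_only_search(records, needle):
    """Map-only job: emit (key, value) pairs for records matching `needle`.
    With zero reducers, mapper output is written directly to storage,
    skipping the shuffle/sort and reduce phases entirely."""
    return [(i, rec) for i, rec in enumerate(records) if needle in rec]

print(map_only_search(["error: disk full", "ok", "error: timeout"], "error"))
# [(0, 'error: disk full'), (2, 'error: timeout')]
```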
Hadoop - HDFS File System

The Hadoop File System was developed using a distributed file system design. It runs on commodity hardware. Unlike other distributed systems, HDFS is highly fault-tolerant and designed using low-cost hardware. HDFS holds very large amounts of data and provides easier access. To store such huge data, the files are stored across multiple machines. These files are stored in a redundant fashion to rescue the system from possible data loss in case of failure. HDFS also makes applications available for parallel processing.

Features of HDFS
- Hadoop provides a command interface to interact with HDFS.
- The built-in servers of the name node and data node help users to easily check the status of the cluster.
- Streaming access to file system data.
- HDFS provides file permissions and authentication.
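The redundant, multi-machine storage described above can be sketched as a toy replica-placement simulation. The replication factor of 3 is HDFS's default; the round-robin policy and node names here are simplifications, not the name node's real placement algorithm:

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct data nodes, round-robin.
    A toy stand-in for the name node's replica placement policy."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]
print(place_replicas(["blk_0", "blk_1"], nodes))
# {'blk_0': ['node1', 'node2', 'node3'], 'blk_1': ['node2', 'node3', 'node4']}
```

Because every block lives on several machines, losing any single node leaves at least two copies of each of its blocks available elsewhere.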
HDFS Architecture:-
For every node (commodity hardware/system) in a cluster, there will be a data node. These nodes manage the data storage of their system. Data nodes perform read-write operations on the file systems, as per client request. They also perform operations such as block creation, deletion, and replication according to the instructions of the name node.
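The division of labor just described, with the name node issuing instructions and the data nodes executing block operations, can be sketched as follows. The class and method names are hypothetical illustrations, not Hadoop's actual classes:

```python
class DataNode:
    """Stores blocks and executes instructions from the name node."""
    def __init__(self, name):
        self.name = name
        self.blocks = set()

    def create(self, block):     # block creation, per name-node instruction
        self.blocks.add(block)

    def delete(self, block):     # block deletion, per name-node instruction
        self.blocks.discard(block)

class NameNode:
    """Tracks the data nodes and directs block replication."""
    def __init__(self, datanodes):
        self.datanodes = datanodes

    def replicate(self, block, source, target):
        # Instruct `target` to create a copy of a block held by `source`.
        if block in source.blocks:
            target.create(block)

dn1, dn2 = DataNode("node1"), DataNode("node2")
nn = NameNode([dn1, dn2])
dn1.create("blk_0")
nn.replicate("blk_0", dn1, dn2)
print(sorted(dn2.blocks))   # ['blk_0']
```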
Block

Generally the user data is stored in the files of HDFS. A file in the file system is divided into one or more segments and/or stored in individual data nodes. These file segments are called blocks. In other words, the minimum amount of data that HDFS can read or write is called a block. The default block size is 64 MB, but it can be increased as per need by changing the HDFS configuration.

[Fig-1.6 Hadoop Architecture Model]

How Does Hadoop Work?
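Before any processing, Hadoop splits each input file into fixed-size blocks as described above. A minimal sketch of that splitting, using the 64 MB default block size and a hypothetical 200 MB file:

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=64 * MB):
    """Return the (offset, length) of each HDFS block for a file of
    `file_size` bytes. Only the last block may be smaller than `block_size`."""
    blocks = []
    for offset in range(0, file_size, block_size):
        blocks.append((offset, min(block_size, file_size - offset)))
    return blocks

blocks = split_into_blocks(200 * MB)
print(len(blocks))           # 4 blocks: three of 64 MB plus one of 8 MB
print(blocks[-1][1] // MB)   # 8
```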
Data Management for Hadoop:- Big data skills are in high demand. Now business users can profile, transform and cleanse data, on Hadoop or anywhere else it may reside, using an intuitive user interface. Data analysts can run SAS code on Hadoop for even better performance. With SAS ("Statistical Analysis System"), we can:

Access and load Hadoop data fast:- Turn big data into valuable data with quick, easy access to Hadoop and the ability to load to and from relational data sources as well as SAS datasets.

Stop the "garbage in, garbage out" cycle:- Integrated data quality delivers pristine data that fuels accurate analytics, amplified by the power of Hadoop.

Put big data to work for you:- Transform, filter and summarize data yourself, and get more value from your big data.

Get more out of your computing resources:- Optimize your workloads, and gain high availability across the Hadoop cluster.
WHY SAS ("Statistical Analysis System")?

Better productivity through faster management of big data:- In-Hadoop data quality and code execution take advantage of MapReduce and YARN to speed the process of accessing trusted data.

Big data management:- Big data is becoming the backbone of business information. We help business and IT work together to deliver big data that is enterprise ready, with no need to write code (unless you want to).

Data you can trust:- Make big data better. SAS provides multiple data integration and data quality transformations to profile, parse and join your data without moving it out of Hadoop.
Conclusion

We have entered the Big Data era. This paper describes the concept of Big Data analytics with the help of the partitioning mechanism MapReduce, and describes the management of large amounts of data through HDFS. The paper also focuses on Big Data processing problems; these technical challenges must be addressed for efficient and fast processing of Big Data. Hadoop provides a solution to many of these big data problems.