Hadoop Ecosystem Overview
About Neev

Web: Magento eCommerce, SaaS Applications, Video Streaming Portals, Rich Internet Apps, Custom Development
Mobile: iPhone, Android, Windows Phone 7, HTML5 Apps
Cloud: AWS Consulting Partner, Rackspace, Joyent, Heroku, Google App Engine

Key Company Highlights
• Neev Technologies established in Jan ’05
• VC funding in 2009 by Basil Partners
• Part of Publicis Groupe
• Member of NASSCOM
• 250+ team with experience in managing offshore, distributed development
• Development Centers in Bangalore and Pune
• Offices at Bangalore, USA, Delhi, Pune, Singapore and Stockholm

Practices
• User Interface Design and User Experience Design
• Performance Consulting
• Quality Assurance & Testing
• Outsourced Product Development
Hadoop in a Nutshell : An Overview
• Hadoop, as we know, is a Java-based, massively scalable, distributed framework for processing large data sets (several petabytes) across clusters of thousands of commodity computers.
• The Hadoop ecosystem has grown over the last few years, and there is a lot of jargon in terms of tools as well as frameworks.
• Many organizations are investing and innovating heavily in Hadoop to make it better and easier to use. The mind map on the next slide gives a high-level picture of the ecosystem.
Hadoop : The Big Picture
Hadoop Core
The core consists of:
1) HDFS (Hadoop Distributed File System) is designed to run on a cluster of commodity machines. It is highly fault tolerant and well suited to processing large data sets. Files stored in HDFS are split into blocks, typically 64 MB or 128 MB, which are distributed across the nodes of the cluster. Each block is also replicated on several nodes, generally 3, to avoid data loss in case of failure (a short API sketch follows this list).
2) MapReduce is a software framework for processing large data sets (petabyte scale) on a cluster of commodity hardware. When a MapReduce job runs, Hadoop splits the input and locates the nodes in the cluster that hold each split. Tasks are then run on or close to the nodes where the data resides, keeping the computation near the data. This prevents the network from being flooded with data or becoming a bottleneck (see the word-count sketch below).
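
To make the block and replication behavior concrete, here is a minimal sketch using Hadoop's Java FileSystem API. It writes a small file and prints its block size, replication factor, and the hosts holding each block. The namenode URI and file path are hypothetical placeholders, and the values reported depend on the cluster's configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlocksDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS normally comes from core-site.xml; hypothetical host here
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file (hypothetical path)
        Path path = new Path("/data/sample.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("hello hdfs");
        }

        // Inspect how HDFS stored it: block size, replication, block hosts
        FileStatus status = fs.getFileStatus(path);
        System.out.println("Block size:  " + status.getBlockSize());
        System.out.println("Replication: " + status.getReplication());
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("Block hosts: " + String.join(",", loc.getHosts()));
        }
    }
}
```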
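And here is the canonical word-count example in Hadoop's Java MapReduce API, showing the map phase (emit each word with a count of 1) and the reduce phase (sum the counts per word). Class names and command-line paths are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every token in the input line
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts for each word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // local pre-aggregation on map nodes
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```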
Hadoop : Distributions

Apache: The purely open-source distribution, maintained by the Apache Software Foundation.
Cloudera: The leading distribution, with capabilities like management, security, high availability, and integration with many other software and hardware solutions.
HortonWorks: The only distribution available for Windows Server.
MapR: Offers unique features like mounting the cluster over NFS.
GreenPlum: Uses an SQL-based database engine.
Intel: Intel's open-source distribution.
Amazon EMR: Amazon's hosted MapReduce offering, Elastic MapReduce, part of AWS. EMR allows a Hadoop cluster to be deployed and MapReduce jobs to be run in the cloud with just a few clicks.
Related Projects

Avro: Data serialization framework used in Hadoop and other systems.
Pig: Framework for analyzing large data sets using a high-level language called Pig Latin.
Hive: Data warehouse framework that supports querying of large data sets stored in Hadoop (a JDBC sketch follows this list).
HBase: Distributed, scalable data store built on Hadoop.
Mahout: Scalable machine learning library.
YARN: The next generation of MapReduce.
Oozie: Workflow scheduler that runs sequences of MapReduce and other pre- and post-processing jobs at scheduled times or based on data availability.
Flume: Distributed, reliable, and available service for collecting, aggregating, and moving log data to HDFS.
Sqoop: Designed for transferring data between Hadoop and relational databases.
Cascading: Application framework for building applications on Hadoop.
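
As an illustration of querying data in Hadoop through Hive, here is a hedged sketch that runs a HiveQL query from Java over JDBC against HiveServer2. The driver class and jdbc:hive2 URL scheme ship with Hive's JDBC client, but the host, port, credentials, and the pageviews table are assumptions for the example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryDemo {
    public static void main(String[] args) throws Exception {
        // Register Hive's JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 listens on port 10000 by default; host and credentials
        // are hypothetical for this sketch
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // pageviews is a hypothetical table whose data lives in HDFS;
             // Hive compiles this query into jobs on the cluster
             ResultSet rs = stmt.executeQuery(
                 "SELECT url, COUNT(*) AS hits FROM pageviews GROUP BY url")) {
            while (rs.next()) {
                System.out.println(rs.getString("url") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```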
Related Technologies

Twitter Storm: As opposed to Hadoop, which is a batch processing system, Storm is a distributed real-time processing system developed at Twitter. Storm is fast, scalable, and easy to use (a topology sketch follows this list).
HPCC: High Performance Computing Cluster, an MPP (massively parallel processing) computing platform that helps solve problems involving huge volumes of data.
Dremel: A scalable, interactive, ad-hoc query system for analysis of read-only nested data, built by Google.
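
To make the batch-versus-real-time contrast concrete, below is a minimal sketch of a Storm topology, assuming the Storm 1.x Java API: a spout continuously emits a sentence (standing in for a live feed) and a bolt maintains running word counts in memory. All class and component names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class StreamingWordCount {

    // Spout: continuously emits the same sentence (a stand-in for a live feed)
    public static class SentenceSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        public void open(Map conf, TopologyContext ctx, SpoutOutputCollector collector) {
            this.collector = collector;
        }
        public void nextTuple() {
            Utils.sleep(100); // throttle the demo feed
            collector.emit(new Values("the quick brown fox"));
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("sentence"));
        }
    }

    // Bolt: splits each sentence and keeps running word counts in memory
    public static class CountBolt extends BaseRichBolt {
        private final Map<String, Integer> counts = new HashMap<>();
        public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {}
        public void execute(Tuple input) {
            for (String word : input.getStringByField("sentence").split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {}
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout());
        builder.setBolt("counter", new CountBolt()).shuffleGrouping("sentences");

        // Run in-process for ten seconds, then shut down
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("streaming-word-count", new Config(), builder.createTopology());
        Thread.sleep(10_000);
        cluster.shutdown();
    }
}
```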
Clients
Partnerships
Neev Information Technologies Pvt. Ltd.

India - Bangalore
The Estate, # 121, 6th Floor,
Dickenson Road,
Bangalore - 560042
Phone: +91 80 25594416

India - Pune
#13 L’Square, 3rd Floor,
Parihar Chowk, Aundh,
Pune - 411007
Phone: +91-64103338

USA
1121 Boyce Rd Ste 1400,
Pittsburgh, PA 15241
Phone: +1 888-979-7860

Sweden
Neev AB, Birger Jarlsgatan 53, 6tr,
11145, Stockholm
Phone: +46723250723

Singapore
#08-03 SGX Centre 2,
4 Shenton Way,
Singapore 068807
Phone: +65 6435 1961

sales@neevtech.com
For more info on our offerings, visit www.neevtech.com
