Copyright©2014 NTT corp. All Rights Reserved. 
Apache Hadoop - What’s next? - @ db tech showcase 2014 
Tsuyoshi Ozawa 
ozawa.tsuyoshi@lab.ntt.co.jp
•Tsuyoshi Ozawa 
•Researcher & Engineer @ NTT / Twitter: @oza_x86_64 
•A Hadoop developer 
•Merged patches: 53 patches! 
•Author of “Hadoop 徹底入門 2nd Edition”, Chapter 22 (YARN) 
About me
Quiz!!
Does Hadoop have a SPoF? 
Quiz
Quiz 
All master nodes in Hadoop can run in highly available mode
Is Hadoop only for MapReduce? 
Quiz
Quiz 
Hadoop is not only for MapReduce but also for Spark/Tez/Storm and so on…
•Current Status of Hadoop - New features since Hadoop 2 - 
•HDFS 
•No SPoF with Namenode HA + JournalNode 
•Scaling out the Namenode with Namenode Federation 
•YARN 
•Resource Management with YARN 
•No SPoF with ResourceManager HA 
•MapReduce 
•No SPoF with ApplicationMaster restart 
•What’s next? - Coming features in the 2.6 release - 
•HDFS 
•Heterogeneous Storage 
•Memory as Storage Tier 
•YARN 
•Label-based scheduling 
•RM HA Phase 2 
Agenda
HDFS IN HADOOP 2
•Once upon a time, the NameNode was a SPoF 
•In Hadoop 2, the NameNode has a QuorumJournalManager (a configuration sketch follows below) 
•Replication is done by a Paxos-based protocol 
See also: 
http://blog.cloudera.com/blog/2012/10/quorum-based-journaling-in-cdh4-1/ 
NameNode with JournalNode 
[Figure: The NameNode’s QuorumJournalManager replicates edits to three JournalNodes, each backed by a local disk] 
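As a rough reference (not from the slides): the shared edits directory for QJM is declared in hdfs-site.xml along these lines, where the JournalNode hosts jn1-jn3, the nameservice name “mycluster”, and the local path are placeholder values. 
<property> 
  <!-- NameNode writes its edit log to the JournalNode quorum --> 
  <name>dfs.namenode.shared.edits.dir</name> 
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value> 
</property> 
<property> 
  <!-- Local disk where each JournalNode stores its copy of the edits --> 
  <name>dfs.journalnode.edits.dir</name> 
  <value>/var/hadoop/journal</value> 
</property> 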
•Once upon a time, the scalability of the NameNode was limited by its memory 
•In Hadoop 2, the NameNode has a Federation feature (a configuration sketch follows below) 
•Distributing metadata per namespace 
NameNode Federation 
Figures from: 
https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
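As a rough reference (not from the slides): federated NameNodes are declared in hdfs-site.xml roughly as below, where the nameservice IDs ns1/ns2 and the hostnames are placeholder values. 
<property> 
  <!-- Two independent namespaces, each served by its own NameNode --> 
  <name>dfs.nameservices</name> 
  <value>ns1,ns2</value> 
</property> 
<property> 
  <name>dfs.namenode.rpc-address.ns1</name> 
  <value>nn1.example.com:8020</value> 
</property> 
<property> 
  <name>dfs.namenode.rpc-address.ns2</name> 
  <value>nn2.example.com:8020</value> 
</property> 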
RESOURCE MANAGEMENT IN HADOOP 2
YARN 
•Generic resource management framework 
•YARN = Yet Another Resource Negotiator 
•Proposed by Arun C Murthy in 2011 
•Container-level resource management 
•A container is a more generic unit of resource than slots 
•Separates the JobTracker’s role: 
•Job Scheduling/Resource Management/Isolation 
•Task Scheduling 
What’s YARN? 
[Figure: MRv1 architecture (a JobTracker with TaskTrackers holding map/reduce slots) vs. MRv2 and YARN architecture (the YARN ResourceManager with Impala/Spark/MRv2 masters and YARN NodeManagers holding generic containers)] 
•Running various processing frameworks on the same cluster 
•Batch processing with MapReduce 
•Interactive query with Impala 
•Interactive deep analytics (e.g. Machine Learning) with Spark 
Why YARN? (Use case) 
[Figure: MRv2/Tez, Impala, and Spark running side by side on YARN over HDFS, serving periodic long batch queries, interactive aggregation queries, and interactive machine-learning queries] 
•More effective resource management for multiple processing frameworks 
•It is difficult to use the cluster’s entire resources without thrashing 
•*Real* big data cannot be moved out of HDFS/S3 
Why YARN? (Technical reason) 
[Figure: Separate masters for MapReduce and Impala schedule onto the same slaves; each framework has its own scheduler, so Job1 and Job2 overlap on a slave and cause thrashing, while HDFS slaves hold the data] 
•Resources are managed by the JobTracker 
•Job-level Scheduling 
•Resource Management 
MRv1 Architecture 
[Figure: A Master for MapReduce schedules onto map/reduce slots on its MapReduce slaves, while a separate Master for Impala runs alongside; each scheduler only knows its own resource usage] 
•Idea 
•One global resource manager (ResourceManager) 
•A common resource pool for all frameworks (NodeManager and Container) 
•A scheduler for each framework (AppMaster) 
YARN Architecture 
[Figure: A client submits a job to the ResourceManager (1); the ResourceManager launches the application’s Master in a container on a NodeManager (2); the Master then launches its slaves in containers across the NodeManagers (3)] 
YARN and Mesos 
YARN 
•An AppMaster is launched for each job 
•More scalability 
•Higher latency 
•One container per request 
•One Master per job 
Mesos 
•An AppMaster is launched for each app (framework) 
•Less scalability 
•Lower latency 
•A bundle of containers per request 
•One Master per framework 
[Figure: YARN’s ResourceManager launches Master1 and Master2 (one per job) onto NodeManagers, while Mesos’ ResourceMaster hosts Master1 and Master2 (one per framework) over its slaves] 
Policy/Philosophy is different
•MapReduce 
•Of course, it works 
•DAG-style processing framework 
•Spark on YARN 
•Hive on Tez on YARN 
•Interactive Query 
•Impala on YARN (via llama) 
•Users 
•Yahoo! 
•Twitter 
•LinkedIn 
•Hadoop 2 @ Twitter http://www.slideshare.net/Hadoop_Summit/t-235p210-cvijayarenuv2 
YARN Eco-system
YARN COMPONENTS
•Master Node of YARN 
•Role 
•Accepting requests from 
1.Application Masters for allocating containers 
2.Clients for submitting jobs 
•Managing Cluster Resources 
•Job-level Scheduling (a scheduler configuration sketch follows the figure below) 
•Container Management 
•Launching the Application-level Master (e.g. for MapReduce) 
ResourceManager (RM) 
[Figure: A client submits a job to the ResourceManager (1), which launches the job’s Master in a container on a NodeManager (2); the Master sends container allocation requests to the RM (3), which forwards the allocations to the NodeManager (4)] 
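As a rough reference (not from the slides): the job-level scheduler the RM uses is pluggable and set in yarn-site.xml; the CapacityScheduler shown here is the usual Apache Hadoop 2 default, and the hostname is a placeholder. 
<property> 
  <!-- Pluggable job-level scheduler used by the ResourceManager --> 
  <name>yarn.resourcemanager.scheduler.class</name> 
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.hostname</name> 
  <value>rm.example.com</value> 
</property> 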
•Slave Node of YARN 
•Role 
•Accepting requests from RM 
•Monitoring the local machine and reporting it to the RM 
•Health Check (a health-check configuration sketch follows the figure below) 
•Managing local resources 
NodeManager (NM) 
[Figure: Clients or a Master request containers from the ResourceManager (1); the RM allocates containers on the NodeManager (2); the NM launches the containers (3) and returns container information such as host and port (4), while periodically reporting its health to the RM via heartbeat] 
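As a rough reference (not from the slides): the health check can be backed by an admin-supplied script configured in yarn-site.xml; the script path and interval below are example values. 
<property> 
  <!-- Script the NM runs periodically; if it reports ERROR, the node is marked unhealthy --> 
  <name>yarn.nodemanager.health-checker.script.path</name> 
  <value>/etc/hadoop/health-check.sh</value> 
</property> 
<property> 
  <!-- Run the health-check script every 10 minutes --> 
  <name>yarn.nodemanager.health-checker.interval-ms</name> 
  <value>600000</value> 
</property> 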
•Master of Applications (e.g. the Master of MapReduce, Tez, Spark, etc.) 
•Runs in Containers 
•Roles 
•Getting containers from the ResourceManager 
•Application-level Scheduling 
•How many Map tasks run, and where? 
•When will Reduce tasks be launched? (a related MapReduce setting is sketched below) 
ApplicationMaster (AM) 
[Figure: The Master of MapReduce, running in a container on a NodeManager, requests containers from the ResourceManager (1) and receives the list of allocated containers (2)] 
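As one concrete example of application-level scheduling (not from the slides): the MapReduce AM decides when to start reducers based on map progress via a mapred-site.xml setting; 0.80 below is just an illustrative value. 
<property> 
  <!-- Launch reduce tasks only after 80% of the map tasks have completed --> 
  <name>mapreduce.job.reduce.slowstart.completedmaps</name> 
  <value>0.80</value> 
</property> 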
RESOURCE MANAGER HA
•What happens when the ResourceManager fails? 
•New jobs cannot be submitted 
•NOTE: 
•Already-launched apps continue to run 
•AppMaster recovery is done by each framework 
•e.g. MRv2 
ResourceManager High Availability 
[Figure: With the ResourceManager down, the client cannot submit new jobs, but each Master and its containers on the NodeManagers continue to run] 
•Approach 
•Storing RM state in ZooKeeper 
•Automatic failover by the EmbeddedElector 
•Manual failover via RMHAUtils 
•NodeManagers use a local RMProxy to reach whichever RM is active (a configuration sketch follows the figure below) 
ResourceManager High Availability 
[Figure: The active ResourceManager stores all of its state in the RMStateStore on ZooKeeper (1); when the active RM fails (2), the EmbeddedElector detects the failure (3), the standby RM fails over and becomes active (4), and it loads the state back from the RMStateStore (5)] 
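As a rough reference (not from the slides): the corresponding yarn-site.xml settings look roughly like this; the RM IDs rm1/rm2, the hostnames, and the ZooKeeper quorum are placeholder values. 
<property> 
  <name>yarn.resourcemanager.ha.enabled</name> 
  <value>true</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.ha.rm-ids</name> 
  <value>rm1,rm2</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.hostname.rm1</name> 
  <value>rm1.example.com</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.hostname.rm2</name> 
  <value>rm2.example.com</value> 
</property> 
<property> 
  <!-- Persist RM state so the standby can take over --> 
  <name>yarn.resourcemanager.recovery.enabled</name> 
  <value>true</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.store.class</name> 
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> 
</property> 
<property> 
  <name>yarn.resourcemanager.zk-address</name> 
  <value>zk1:2181,zk2:2181,zk3:2181</value> 
</property> 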
CAPACITY PLANNING ON YARN
•Define resources with XML (etc/hadoop/yarn-site.xml) 
Resource definition on NodeManager 
[Figure: A NodeManager offering 8 CPU cores and 8 GB of memory] 
<property> 
<name>yarn.nodemanager.resource.cpu-vcores</name> 
<value>8</value> 
</property> 
<property> 
<name>yarn.nodemanager.resource.memory-mb</name> 
<value>8192</value> 
</property> 
This declares 8 CPU cores and 8 GB of memory as the container resources of this NodeManager.
Container allocation on ResourceManager 
•The RM accepts a container request and sends it to an NM, but the request can be rewritten 
•Small requests are rounded up to minimum-allocation-mb 
•Large requests are rounded down to maximum-allocation-mb 
<property> 
<name>yarn.scheduler.minimum-allocation-mb</name> 
<value>1024</value> 
</property> 
<property> 
<name>yarn.scheduler.maximum-allocation-mb</name> 
<value>8192</value> 
</property> 
[Figure: The client’s Master requests a 512 MB container from the ResourceManager, which rewrites the request to 1024 MB before passing it to a NodeManager]
•Define how much resource MapTasks or ReduceTasks use (a JVM heap-sizing note follows below) 
•MapReduce: etc/hadoop/mapred-site.xml 
Container allocation at framework side 
[Figure: A NodeManager offering 8 CPU cores and 8 GB of memory] 
<property> 
<name>mapreduce.map.memory.mb</name> 
<value>1024</value> 
</property> 
<property> 
<name>mapreduce.reduce.memory.mb</name> 
<value>4096</value> 
</property> 
[Figure: The Master asks the NodeManager for containers of 1024 MB memory and 1 CPU core for its map tasks, and a matching container is launched]
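Note (not from the slides): the container sizes above only cap the container; the task JVM heap is set separately and should fit inside the container. A rough mapred-site.xml sketch, with illustrative -Xmx values: 
<property> 
  <!-- JVM heap for map tasks; keep it below mapreduce.map.memory.mb (1024 MB above) --> 
  <name>mapreduce.map.java.opts</name> 
  <value>-Xmx800m</value> 
</property> 
<property> 
  <!-- JVM heap for reduce tasks; keep it below mapreduce.reduce.memory.mb (4096 MB above) --> 
  <name>mapreduce.reduce.java.opts</name> 
  <value>-Xmx3276m</value> 
</property> 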
WHAT’S NEXT? - HDFS -
•HDFS-2832, HDFS-5682 
•Handling various storage types in HDFS 
•SSD, memory, disk, and so on. 
•Setting quotas per storage type, e.g.: 
•Setting the SSD quota on /home/user1 to 10 TB 
•Setting the SSD quota on /home/user2 to 10 TB 
•Not configuring any SSD quota on the remaining user directories (i.e. leaving them at the defaults) 
Heterogeneous Storages for HDFS Phase 2 
<configuration> 
... 
<property> 
<name>dfs.datanode.data.dir</name> 
<value>[DISK]/mnt/sdc2/,[DISK]/mnt/sdd2,[SSD]/mnt/sde2</value> 
</property> 
... 
</configuration>
•HDFS-5851 
•Introducing an explicit “Cache” layer in HDFS 
•Discardable Distributed Memory (DDM) 
•Applications can accelerate their processing by using memory (a configuration sketch follows below) 
•Discardable Memory and Materialized Queries is one example 
•Differences between RDD and DDM: 
•Multi-tenancy aware 
•Whether data is handled in the processing layer or in the storage layer 
Support memory as a storage medium
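As a rough sketch (not from the slides, and the exact support in 2.6 may differ): a memory tier surfaces on the DataNode as a RAM_DISK storage type alongside normal disks; the paths and the lock limit below are placeholder values. 
<property> 
  <!-- Add a memory-backed directory alongside a normal disk --> 
  <name>dfs.datanode.data.dir</name> 
  <value>[DISK]/mnt/sdc2,[RAM_DISK]/mnt/ramdisk</value> 
</property> 
<property> 
  <!-- Upper bound (bytes) on memory the DataNode may lock for in-memory storage --> 
  <name>dfs.datanode.max.locked.memory</name> 
  <value>4294967296</value> 
</property> 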
•Archival storage 
•HDFS-6584 
•Transparent encryption 
•HDFS-6134 
And, more!
WHAT’S NEXT? - YARN -
•Non-stop YARN upgrades (YARN-666) 
•NodeManager, ResourceManager, Applications 
•Before 2.6.0 
•Restarting the RM -> the RM restarts all AMs -> all jobs restart 
•Restarting NMs -> the NMs are removed from the cluster -> their containers are restarted! 
•After 2.6.0 (a configuration sketch follows the figure below) 
•Restarting the RM -> AMs continue to run 
•Restarting an NM -> the NM restores its state from local data 
Support for rolling upgrades in YARN 
[Figure: The ResourceManager, the NodeManagers, and the application Masters with their containers all keep running across a rolling upgrade]
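As a rough reference (not from the slides): the work-preserving restart behavior in 2.6 is driven by yarn-site.xml settings along these lines; the recovery directory is a placeholder path. 
<property> 
  <!-- RM restart without killing running applications --> 
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name> 
  <value>true</value> 
</property> 
<property> 
  <!-- NM persists container state locally and reacquires containers after restart --> 
  <name>yarn.nodemanager.recovery.enabled</name> 
  <value>true</value> 
</property> 
<property> 
  <name>yarn.nodemanager.recovery.dir</name> 
  <value>/var/hadoop/yarn-nm-recovery</value> 
</property> 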
•Now we can run various subsystems on YARN 
•Interactive query engines: Spark, Impala, … 
•Batch processing engines: MapReduce, Tez, … 
•Problem 
•Interactive query engines allocate resources at the same time, which can delay the daily batch 
•Time-based reservation scheduling (a configuration sketch follows the figure below) 
•8:00am - 6:00pm: allocate resources to Impala 
•6:00pm - 0:00am: allocate resources to MapReduce 
YARN reservation-subsystem 
[Figure: Timeline with 8:00am-6:00pm reserved for the interactive query engine and 6:00pm-0:00am for batch processing for the next day]
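As a very rough sketch (property names are taken from the later reservation-system documentation and may differ in 2.6; the queue name “batch” is a placeholder): the reservation system is enabled in yarn-site.xml and individual queues are marked reservable in capacity-scheduler.xml. 
<!-- yarn-site.xml --> 
<property> 
  <name>yarn.resourcemanager.reservation-system.enable</name> 
  <value>true</value> 
</property> 
<!-- capacity-scheduler.xml --> 
<property> 
  <!-- Allow capacity in this queue to be reserved for specific time windows --> 
  <name>yarn.scheduler.capacity.root.batch.reservable</name> 
  <value>true</value> 
</property> 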
•YARN-796 
•Handling heterogeneous machines in one YARN cluster 
•GPU cluster 
•High-memory cluster 
•40 Gbps network cluster 
•Labeling nodes and scheduling based on labels (a configuration sketch follows the figure below) 
•Admins can add/remove labels via yarn rmadmin commands 
Support for admin-specified labels in YARN 
[Figure: A client submits a job that requests the GPU label; the ResourceManager schedules it onto the NodeManagers labeled GPU rather than those labeled 40G network]
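As a rough sketch based on how node labels later shipped (exact names may differ in 2.6; the HDFS path and label names are placeholders): labels are enabled in yarn-site.xml, and then registered and attached to nodes with yarn rmadmin subcommands such as -addToClusterNodeLabels and -replaceLabelsOnNode. 
<property> 
  <name>yarn.node-labels.enabled</name> 
  <value>true</value> 
</property> 
<property> 
  <!-- Where the RM persists the cluster's label-to-node mappings --> 
  <name>yarn.node-labels.fs-store.root-dir</name> 
  <value>hdfs://namenode:8020/yarn/node-labels</value> 
</property> 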
•Timeline service security 
•YARN-1935 
•Minimal support for running long-running services on YARN 
•YARN-896 
•Support for automatic, shared cache for YARN application artifacts 
•YARN-1492 
•And more! 
•Please check Wiki http://wiki.apache.org/hadoop/Roadmap 
And, more!
•Hadoop 2 is evolving rapidly 
•I would appreciate it if you can catch up via this presentation! 
•New components since v2 
•HDFS 
•Quorum Journal Manager 
•Namenode Federation 
•ResourceManager 
•NodeManager 
•Application Master 
•New features in 2.6: 
•Discardable memory store on HDFS, and so on 
•Rolling upgrades, labels for heterogeneous clusters on YARN, the reservation system, and so on… 
•Questions or feedback -> user@hadoop.apache.org 
•Issues -> https://issues.apache.org/jira/browse/{HDFS,YARN,HADOOP,MAPREDUCE} 
Summary
•YARN-666 
•https://www.youtube.com/watch?v=O4Q73e2ua9Y&feature=youtu.be 
•http://www.slideshare.net/Hadoop_Summit/t-145p230avavilapalli-mac
