La big datacamp2014_vikram_dixit

© Hortonworks Inc. 2013.
Hive 0.13: An upgrade in
Performance, Scaling,
Security and Multi-tenancy
Vikram Dixit
(vikram@apache.org)

Hive – SQL on Hadoop
• Open source Apache project
• Started by Facebook in 2009
• Tools to enable easy data extract/transform/load (ETL)
• Work with structured, unstructured, semi-structured data
• Access to files stored either directly in Apache HDFS™ or in other
data storage systems such as Apache HBase™
• Query execution via MapReduce/Tez
• Metadata sharing via HCatalog allows your Pig scripts to work with
Hive tables

Using Hive Effectively
• Understanding Hive’s use case
– Current focus on making it a fast analytics engine that scales
– Transactions coming
• Understand the storage mechanism right for you
– ORC File - highest compression, metadata used to enable faster reads
– Parquet - intermediate compression, fast reads
– RC File - legacy, most widely used but suffers in performance
– Text - ease of use but lowest in terms of performance
• Use the right execution engine
– Tez is the recommended execution engine for performance
– MapReduce is used as the fallback in cases where Tez cannot yet run the query
• Use the right configuration flags
– Many optimizations are turned on by default
– Some are not. These need tuning for your cluster, because a default that suits
every cluster is hard to come up with.

What’s new in Hive 0.13
•Speed
– Hive on Tez – Broadcast Joins, Bucket Map Joins
– Vectorized Query processing
– Split elimination for ORC file
– Parquet file format support
•Scale
– Smaller hash tables allowing more scalable map joins
– More scalable dynamic partition loads

What’s new in Hive 0.13
• More SQL improvements
– SQL standard Authorization
– Char support, Decimal improvements
– Permanent UDFs
– Streaming ingest from Flume for ACID capability
• Additional Improvements
– Hive Server 2 improvements
– HCatalog parity with Hive data types
– JDBC improvements viz. job cancel, async execution
• Even more goodies
– Mavenization
– Parallel test framework
– Lots of documentation

Stinger Project
(announced February 2013)
Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative
A broad, community-based effort to drive the next generation of Hive
Goals:
• Speed
– Improve Hive query performance by 100X to allow for interactive query times (seconds)
• Scale
– The only SQL interface to Hadoop designed for queries that scale from TB to PB
• SQL
– Support broadest range of SQL semantics for analytic applications running against Hadoop
…all IN Hadoop
Hive 0.11, May 2013:
• Base Optimizations
• SQL Analytic Functions
• ORCFile, Modern File Format
Hive 0.12, October 2013:
• VARCHAR, DATE Types
• ORCFile predicate pushdown
• Advanced Optimizations
• Performance Boosts via YARN
Hive 0.13, April 2014:
• Hive on Apache Tez
• Query Service
• Buffer Cache
• Cost Based Optimizer (Optiq)
• Vectorized Processing

SPEED: Increasing Hive Performance
Key Highlights
– Tez: New execution engine
– Vectorized Query Processing
– Startup time improvement
– Statistics to accelerate query execution
– Cost Based Optimizer: Optiq (missed the cut for 0.13)
Interactive Query Times across ALL use cases
• Simple and advanced queries in seconds
• Integrates seamlessly with existing tools
• Currently a >100x improvement in just nine months
Elements of Fast SQL Execution
• Query Planner/Cost Based Optimizer w/ Statistics
• Query Startup
• Query Execution
• I/O Path

Apache Tez (“Speed”)
• Replaces MapReduce as the execution primitive for Pig, Hive, Cascading etc.
– Lower latency for interactive queries
– Higher throughput for batch queries
– 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft
• A YARN ApplicationMaster runs a DAG of Tez tasks
• Each task has a pluggable Input, Processor and Output
– Tez Task = <Input, Processor, Output>
[Figure: a Tez task drawn as Input, Processor, Output]

Hive-on-MR vs. Hive-on-Tez
SELECT g1.x, g1.avg, g2.cnt
FROM (SELECT a.x, AVG(a.y) AS avg FROM a GROUP BY a.x) g1
JOIN (SELECT b.x, COUNT(b.y) AS cnt FROM b GROUP BY b.x) g2
ON (g1.x = g2.x)
ORDER BY avg;
[Figure: side-by-side execution plans for the query above. Panel "Hive – MR": stages GROUP a BY a.x, GROUP b BY b.x, JOIN (a,b) and ORDER BY, drawn as chained map (M) and reduce (R) jobs with HDFS writes between them. Panel "Hive – Tez": stages GROUP BY a.x, GROUP BY x, JOIN (a,b) and ORDER BY, drawn as a single graph of M and R tasks.]
Tez avoids unnecessary writes to HDFS
HIVE-4660

Shuffle Join
SELECT ss.ss_item_sk, ss.ss_quantity, inv.inv_quantity_on_hand
FROM inventory inv
JOIN store_sales ss
ON (inv.inv_item_sk = ss.ss_item_sk);

Broadcast Join
• Similar to map-join w/o the need to build a hash table on the client
• Will work with any level of sub-query nesting
• Uses stats to determine if applicable
• How it works:
– Broadcast result set is computed in parallel on the cluster
– Join processors are spun up in parallel
– Broadcast set is streamed to the join processors
– Join processors build the hash table
– Other relation is joined with the hash table
• Tez handles:
– Best parallelism
– Best data transfer of the hashed relation
– Best scheduling to avoid latencies
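
The build-and-probe steps listed under "How it works" can be sketched in a few lines of Python. This is a single-process model of what one join processor does, not Hive or Tez code; `broadcast_hash_join` is a hypothetical name, and the sample rows reuse the column names from the Shuffle Join query purely for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(small_rows, large_rows, small_key, large_key):
    """Inner-join two lists of dict rows using a hash table on the small side."""
    # Build phase: the broadcast (small) relation is hashed by its join key.
    hash_table = defaultdict(list)
    for row in small_rows:
        hash_table[row[small_key]].append(row)

    # Probe phase: the other relation is streamed past the hash table.
    joined = []
    for row in large_rows:
        for match in hash_table.get(row[large_key], []):
            joined.append({**match, **row})
    return joined


# Made-up sample data, only to exercise the function.
inventory = [
    {"inv_item_sk": 1, "inv_quantity_on_hand": 40},
    {"inv_item_sk": 2, "inv_quantity_on_hand": 15},
]
store_sales = [
    {"ss_item_sk": 1, "ss_quantity": 3},
    {"ss_item_sk": 1, "ss_quantity": 5},
    {"ss_item_sk": 3, "ss_quantity": 7},
]
result = broadcast_hash_join(inventory, store_sales, "inv_item_sk", "ss_item_sk")
for row in result:
    print(row)
```

In the real system every join processor receives the same broadcast set and builds its own copy of the hash table, which is why the approach only applies when the broadcast side is small enough to fit in memory.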

Broadcast Join
FROM store_sales ss
JOIN inventory inv
[Figure: two plans for this join, each reading from HDFS.
– Map-join plan: Inventory scan (runs as single local map task); Store Sales scan and Join (Inventory hash table read as side file).
– Broadcast-join plan: Inventory scan (runs on cluster, potentially more than 1 mapper), connected by a broadcast edge to Store Sales scan and Join.]

Dynamically partitioned Hash join
• Kicks in when large table is bucketed
– Bucketed table
– Dynamic as part of query processing
– Enabled via set hive.convert.join.bucket.mapjoin.tez = true; (use 0.13.1)
• Uses custom edge to match the partitioning on the smaller table
• Allows hash-join in cases where broadcast would be too large
• Tez gives us the option of building custom edges and vertex managers
– Fine grained control over how the data is replicated and partitioned
– Scheduling and actual data transfer are handled by Tez

Dynamically Partitioned Hash Join
FROM store_sales ss
JOIN inventory inv
[Figure: two plans for this join, each reading from HDFS.
– Dynamically partitioned plan: Inventory scan (runs on cluster, potentially more than 1 mapper); a custom edge routes the outputs of that stage to the correct mappers of the next stage; Store Sales scan and Join (custom vertex reads both inputs, no side file reads).
– Map-join plan: Inventory scan (runs as single local map task); Store Sales scan and Join (Inventory hash table read as side file).]

Dynamically Partitioned Hash Join
Plans look very similar to a map join, but the way they execute differs between
MR and Tez.
Hive – MR (Bucket map-join)
• Not dynamically partitioned.
• Both tables need to be bucketed by the join key.
• The local task that generates the hash table writes n files corresponding to n buckets.
• The number of mappers for the join must be the same as the number of buckets.
• Each of these mappers reads the corresponding bucket file of the local task to perform the join.
Hive – Tez
• Only one of the sides needs to be bucketed; the other side is dynamically bucketed.
• Also works if neither side is explicitly bucketed, but another operation forced bucketing in the pipeline (traits).
• No writing to HDFS.
• There can be more mappers than buckets, but splits do not span multiple buckets.
• The dynamically bucketed mappers have as many outputs as there are buckets, and custom Tez routing ensures these outputs reach the right mappers.
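
The Tez-side behaviour, where the unbucketed side is bucketed on the fly and each output is routed to the task holding the matching bucket, can be modelled roughly as follows. This is a single-process Python sketch, not Hive or Tez code. All function names are hypothetical, and `bucket_of` uses a plain modulo on integer keys as a stand-in for Hive's real bucketing hash.

```python
def bucket_of(key, num_buckets):
    # Stand-in bucketing function for integer keys (an assumption, not Hive's hash).
    return key % num_buckets


def route_small_side(small_rows, key, num_buckets):
    """Model of the custom edge: one output per bucket for the small side."""
    outputs = [[] for _ in range(num_buckets)]
    for row in small_rows:
        outputs[bucket_of(row[key], num_buckets)].append(row)
    return outputs


def join_bucket(small_part, large_bucket, small_key, large_key):
    """Hash-join one bucket of the large table with its slice of the small side."""
    table = {}
    for row in small_part:
        table.setdefault(row[small_key], []).append(row)
    return [
        {**match, **row}
        for row in large_bucket
        for match in table.get(row[large_key], [])
    ]


def dynamically_partitioned_join(small_rows, large_buckets, small_key, large_key):
    """Join a pre-bucketed large table with a dynamically bucketed small table."""
    num_buckets = len(large_buckets)
    routed = route_small_side(small_rows, small_key, num_buckets)
    joined = []
    for b in range(num_buckets):
        joined.extend(join_bucket(routed[b], large_buckets[b], small_key, large_key))
    return joined


# Made-up data: store_sales is already split into 2 buckets by ss_item_sk % 2.
store_sales_buckets = [
    [{"ss_item_sk": 2, "ss_quantity": 1}, {"ss_item_sk": 4, "ss_quantity": 2}],
    [{"ss_item_sk": 1, "ss_quantity": 3}, {"ss_item_sk": 5, "ss_quantity": 4}],
]
inventory = [{"inv_item_sk": i, "inv_quantity_on_hand": 10 * i} for i in range(1, 5)]
result = dynamically_partitioned_join(
    inventory, store_sales_buckets, "inv_item_sk", "ss_item_sk"
)
print(result)
```

Each join task only ever holds the hash table for its own bucket, which is what allows a hash join in cases where broadcasting the whole smaller relation would be too large.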

Bulk Inner loop: Vectorization
• Avoid Writable objects & use primitive int/long
– Allows efficient JIT code for primitive types
• Generate per-type loops & avoid runtime type-checks
• The classes generated look like
– LongColEqualDoubleColumn
– LongColEqualLongColumn
– LongColEqualLongScalar
• Avoid duplicate operations on repeated values
– isRepeating & hasNulls
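
A rough Python analogue of one generated per-type loop, in the spirit of LongColEqualLongScalar, is sketched below. It is not Hive's Java implementation. The class and function here are hypothetical stand-ins that only model a column batch of primitive longs and the isRepeating shortcut, and null handling (hasNulls) is left out to keep the sketch short.

```python
from array import array


class LongColumnVector:
    """Hypothetical column batch: primitive 64-bit ints plus a repeat flag."""

    def __init__(self, values, is_repeating=False):
        self.vector = array("q", values)  # packed longs, no per-value objects
        self.is_repeating = is_repeating  # True: every row equals vector[0]


def long_col_equal_long_scalar(col, scalar, batch_size):
    """Return the row indexes in the batch where the column equals the scalar."""
    if col.is_repeating:
        # One comparison decides the whole batch.
        return list(range(batch_size)) if col.vector[0] == scalar else []
    vec = col.vector
    # Tight loop over one known type: no runtime type checks per row.
    return [i for i in range(batch_size) if vec[i] == scalar]


batch = LongColumnVector([5, 7, 5, 9])
repeated = LongColumnVector([5], is_repeating=True)
print(long_col_equal_long_scalar(batch, 5, 4))
print(long_col_equal_long_scalar(repeated, 5, 4))
```

Python cannot show the JIT benefit the slide refers to. The point of the sketch is the shape of the loop: one specialised function per type combination, working on a whole batch at a time.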

ORC: ZeroCopy & caching
• Use memory mapped I/O path in HDFS
– HDFS in-memory cache
• ORC reads can start deserializing early
– there is no blocking read() call
• Allow OS read-ahead to kick-in
• Use buffer-cache pages without copying them
• Avoid wasting heap space on ORC stripes
• Decompress directly from mapped buffers
– Fast JNI code for SNAPPY decompressors

Scaling
• Reduce size of map join hash tables
– Hundred bytes were being used to store an integer (Map join key)
– HIVE-6430 reduced sizes of the hash tables by 60-70% in many cases
– Allowed more efficient use of memory and hence more tables to fit in memory
• Large number of open record writers in ORC file reduced to just 1
– HIVE-6455
– In a multi-insert scenario, performance is now much better and many more
inserts can be done in parallel

TPC-DS 10 TB Query Times (Tuned Queries)
Data: 10 TB data loaded into ORCFile using defaults.
Hardware: 20 nodes: 24 CPU threads, 64 GB RAM, 5 local SATA drives each.
Queries: TPC-DS queries with partition filters added.

© Hortonworks Inc. 2011
Security
Architecting the Future of Big Data
• Old authorization based on grant/revoke
– Incomplete model, e.g. anybody can run a grant statement
– Does not try to follow the standard
• Why follow the standard?
– A lot of thought has been put into the standard, which is important for security!
– It’s a standard!
• Hive should have built-in authorization
– Easy to use, no additional components to manage
– New features that need authorization keep getting added
– Life cycle of objects should be synced with authorization policy

Managing privileges
• Grant/revoke privilege on object to/from user/role
• SHOW GRANT statement
– View privilege grants based on user/role name and/or object name
• Privileges: INSERT, SELECT, DELETE, UPDATE, ALL
• Privileges for some actions based on object ownership
– Table/view ownership: most alter commands, drop
– Database ownership: create table, drop database
• URI privileges based on file permissions
• https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+based+hive+authorization#SQLStandardBasedHiveAuthorization-Configuration
• Use Hive 0.13.1, which fixes the issues listed under known issues in the
above wiki doc.

Hive Server 2 improvements
• HiveServer2 now supports Thrift over HTTP and Kerberos/LDAP
authentication on HTTP
• Also supports HTTPS
• HiveServer2 can keep sessions alive
– Between different JDBC queries
• New security model helps
– All secure queries run as “hive” user
• Ideal for short exploratory queries
• Uses same JARs (no download for task)
• Even better JIT performance on >1 queries

Other improvements
• Insert-update-delete semantics. Streaming ingest from Flume. (HIVE-5317)
– Transaction manager added. Only the ORC file format is supported at this time.
• Broad UDF support via permanent functions. No need to run add jar for the
most commonly used UDFs. Ideally, an admin adds the permanent (trusted) functions.
• Parquet is a supported storage format.
https://cwiki.apache.org/confluence/display/Hive/Parquet
• HCatalog now supports all the datatypes supported in Hive.
• Hive is now mavenized (Thanks Brock Noland!)
• Parallel test framework means unit testing happens faster and
changes get in faster.
• Lots of new documentation for all the new features. (Thanks Lefty!)
• Bottom line: Hive 0.13 is the fastest, most feature-rich version of Hive
so far.

Future Work
• Many more improvements coming up
• Speed
– Sort Merge Bucket Map join in Tez
– Total ordering of data
– Skew joins
– Cost based Optimizer
• Security
– Authorizing permanent UDF access
– Authorizing ‘show grant’
– Support hdfs ACL in URI permission checks (new in hadoop 2.4)
– More SQL syntax support – eg revoke just admin option on a role
• Multi-tenancy
– Sticky HS2 sessions for improved performance in a multi-tenant environment
– Improve scheduling in a multi-tenant environment

Questions?
