The Big Data Technology Landscape
Q/A 15 minutes
Agenda
NoSQL
What is it?
Types of NoSQL Databases
Why NoSQL?
Advantages of NoSQL
NoSQL Vendors
SQL versus NoSQL
NewSQL
Comparison of SQL, NoSQL and NewSQL
Hadoop
Features of Hadoop
Key Advantages of Hadoop
Versions of Hadoop
What is NoSQL?
Typical NoSQL characteristics:
- No joins
- No multi-document transactions
- Easy to distribute
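A minimal sketch of why such stores are easy to distribute, assuming a simple hash-based placement scheme. The names `shard_for` and `ShardedStore` are hypothetical and do not come from any particular product. Each record is a self-contained document (related data is embedded rather than joined), so a key alone decides which node holds it.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


class ShardedStore:
    """Toy key-value store that spreads documents across several in-memory 'nodes'."""

    def __init__(self, num_nodes: int):
        # Each dict stands in for one machine in the cluster (an assumption).
        self.nodes = [{} for _ in range(num_nodes)]

    def put(self, key: str, document: dict) -> None:
        self.nodes[shard_for(key, len(self.nodes))][key] = document

    def get(self, key: str):
        return self.nodes[shard_for(key, len(self.nodes))].get(key)


store = ShardedStore(num_nodes=3)
# Orders are embedded inside the user document, so no join is needed to read them.
store.put("user:1", {"name": "Asha", "orders": [{"item": "book", "qty": 2}]})
store.put("user:2", {"name": "Ravi", "orders": []})

print(store.get("user:1")["orders"])
```

Because a lookup touches only the node chosen by the key, capacity can grow by adding nodes. Real systems also have to handle rebalancing and replication, which this sketch leaves out.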
SQL | NoSQL
Relational database | Non-relational, distributed database
Relational model | Model-less approach
Pre-defined schema | Dynamic schema for unstructured data
Table-based databases | Document-based, graph-based, wide-column store, or key-value pair databases
Vertically scalable (by increasing system resources) | Horizontally scalable (by creating a cluster of commodity machines)
Uses SQL | No single standard query language; each store provides its own query language or API (UnQL, Unstructured Query Language, was proposed but not widely adopted)
Not preferred for large datasets | Largely preferred for large datasets
Not a best fit for hierarchical data | Best fit for hierarchical storage, as it stores data as key-value pairs, similar to JSON (JavaScript Object Notation)
Emphasis on ACID properties | Follows Brewer’s CAP theorem
Excellent support from vendors | Relies heavily on community support
Supports complex querying and data-keeping needs | Does not have good support for complex querying
Can be configured for strong consistency | Some support strong consistency (e.g., MongoDB); others can be configured for eventual consistency (e.g., Cassandra)
Examples: Oracle, DB2, MySQL, MS SQL, PostgreSQL, etc. | Examples: MongoDB, HBase, Cassandra, Redis, Neo4j, CouchDB, Couchbase, Riak, etc.
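To illustrate the "pre-defined schema" versus "dynamic schema" contrast, here is a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational database and a plain dictionary of JSON strings as a stand-in for a document store. The table name, field names, and the `insert_with_extra_column` helper are assumptions made up for the example.

```python
import json
import sqlite3

# Relational side: the schema is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Asha"))


def insert_with_extra_column() -> bool:
    """Try to store a field the schema does not define; report whether it worked."""
    try:
        conn.execute(
            "INSERT INTO customers (id, name, twitter) VALUES (?, ?, ?)",
            (2, "Ravi", "@ravi"),
        )
        return True
    except sqlite3.OperationalError:
        # The table has no 'twitter' column, so the insert is rejected.
        return False


sql_accepts_new_field = insert_with_extra_column()

# Document side: each record carries its own structure, including nested data.
documents = {}
documents["1"] = json.dumps({"name": "Asha"})
documents["2"] = json.dumps(
    {"name": "Ravi", "twitter": "@ravi", "address": {"city": "Pune"}}
)

ravi = json.loads(documents["2"])
print(sql_accepts_new_field, ravi["address"]["city"])
```

In the relational case the new field would need a schema change first. In the document case the second record simply has more fields than the first.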
NewSQL
Hadoop
Apache Open-Source Software Framework
Inspired by
- Google MapReduce
- Google File System
HDFS (redundant, reliable storage)
YARN (cluster resource manager)
Hadoop Ecosystem
Ambari
(Provisioning, Managing & Monitoring Hadoop Cluster)
HDFS is the file system, whereas HBase is a Hadoop database. The relationship
is like that between NTFS and MySQL.
HDFS is WORM (write once, read many times). Recent versions support
appending data, but this feature is rarely used. HBase, however, supports
real-time random reads and writes.
HDFS is based on Google File System (GFS), whereas HBase is based on Google
Bigtable.
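The access-pattern difference can be sketched with two toy classes. These are illustrative in-memory models only, not the HDFS or HBase client APIs, and the class and method names are assumptions.

```python
class WriteOnceFile:
    """Toy model of an HDFS-style file: written once, optionally appended, read many times."""

    def __init__(self, records):
        self._records = list(records)

    def append(self, record):
        # Appending is allowed; there is deliberately no update-in-place method.
        self._records.append(record)

    def scan(self):
        # Reads are sequential scans over the whole file.
        return iter(self._records)


class RandomAccessTable:
    """Toy model of an HBase-style table: rows are read and written by row key."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        # Writing to an existing row key and column replaces the old value.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return dict(self._rows.get(row_key, {}))


log = WriteOnceFile(["event-1", "event-2"])
log.append("event-3")

table = RandomAccessTable()
table.put("user:1", "city", "Pune")
table.put("user:1", "city", "Mumbai")  # random write to an existing row

print(list(log.scan()), table.get("user:1"))
```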
Hadoop Ecosystem Components for Data Ingestion
Sqoop:
Sqoop stands for SQL to Hadoop. It can import data from external
systems into HDFS and populate tables in Hive and HBase.
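The idea of moving relational rows into file-based storage can be sketched as follows. This is not Sqoop itself: `sqlite3` stands in for the external database, an in-memory text buffer stands in for a file on HDFS, and the function name `copy_query_to_delimited_text` is hypothetical.

```python
import csv
import io
import sqlite3


def copy_query_to_delimited_text(conn, query, out) -> int:
    """Write each row returned by `query` to `out` as one comma-delimited line."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in conn.execute(query):
        writer.writerow(row)
        count += 1
    return count


# Source: a small relational table (stand-in for an external RDBMS).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "book", 2), (2, "pen", 10)],
)

# Target: a text buffer standing in for a file in HDFS.
target_file = io.StringIO()
rows_copied = copy_query_to_delimited_text(
    source, "SELECT id, item, qty FROM orders ORDER BY id", target_file
)

print(rows_copied)
print(target_file.getvalue())
```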
Flume:
Flume is a log aggregator in the Hadoop ecosystem: it collects logs from
different machines and places them in HDFS.
Hadoop Ecosystem Components for Data Processing
MapReduce:
It is a programming paradigm that allows distributed and parallel processing of
huge datasets. It is based on Google MapReduce.
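The paradigm can be sketched with the classic word count, run here in a single process. The function names are assumptions made for the example. In Hadoop, the map and reduce steps would run in parallel on different machines, and the framework would perform the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big ideas", "data moves fast"])
print(counts)
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on its own key. That independence is what allows the work to be spread across a cluster.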
Spark:
It is both a programming model and a computing model. It is an open-source
big data processing framework.
It is written in Scala. It provides in-memory computing for Hadoop.
Spark can run alongside MapReduce on top of Hadoop YARN, or it can be used
independently of Hadoop (standalone).
Hadoop Ecosystem Components for Data Analysis
Pig:
It is a high-level scripting platform used with Hadoop. It serves as an
alternative to writing MapReduce programs directly. It has two parts:
Pig Latin: a SQL-like scripting language.
Pig runtime: the runtime environment that executes Pig Latin scripts.
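A Pig Latin script is usually written as a sequence of named steps, such as load, filter, group, and aggregate. The sketch below mimics that step-by-step data-flow style in plain Python. The sample records and variable names are made up, and this is an analogy rather than Pig Latin syntax.

```python
from collections import defaultdict

# Step 1 (load): records of (customer, country, amount); sample data is invented.
records = [
    ("alice", "IN", 120),
    ("bob", "US", 40),
    ("carol", "IN", 75),
    ("dave", "US", 300),
]

# Step 2 (filter): keep only purchases of 50 or more.
large_purchases = [r for r in records if r[2] >= 50]

# Step 3 (group): collect the amounts for each country.
by_country = defaultdict(list)
for _customer, country, amount in large_purchases:
    by_country[country].append(amount)

# Step 4 (aggregate per group): total amount for each country.
totals = {country: sum(amounts) for country, amounts in by_country.items()}

print(totals)
```

Each step produces a named intermediate result that the next step consumes. That is the main stylistic difference from a single declarative SQL query.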
Hive:
Hive is a data warehouse software project built on top of Hadoop. The three main
tasks performed by Hive are summarization, querying, and analysis.
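A summarization query of the kind Hive is used for might look like the sketch below. It runs against Python's built-in `sqlite3` module purely as a local stand-in, not against Hive. The `page_views` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, visitor TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("home", "u1", 30),
        ("home", "u2", 50),
        ("pricing", "u1", 120),
        ("pricing", "u3", 80),
        ("pricing", "u2", 100),
    ],
)

# Summarize: number of views and average time spent per page.
summary = conn.execute(
    """
    SELECT page, COUNT(*) AS views, AVG(seconds) AS avg_seconds
    FROM page_views
    GROUP BY page
    ORDER BY views DESC
    """
).fetchall()

for page, views, avg_seconds in summary:
    print(page, views, avg_seconds)
```

The query states what result is wanted and leaves the execution plan to the engine. On a Hadoop cluster, that plan would be carried out as distributed jobs over data in HDFS.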
Impala:
It is a high-performance SQL engine that runs on a Hadoop cluster. It is ideal
for interactive analysis, with very low latency measured in milliseconds. It
supports a dialect of SQL called Impala SQL.
Answer a few quick questions …
Fill in the blanks
http://www.mongodb.com/nosql-explained
http://nosql-database.org/
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
http://hadoop.apache.org/
Thank you