Columnar Database - HBase

Dr. Richa Sharma


Commonwealth University
Introduction
 Stores data tables by columns rather than by rows!

 This allows efficient data retrieval, especially when a query performs aggregate operations, and is therefore quite helpful for data analytics and data warehousing!

 Columnar storage enables better data compression due to the similarity of data within a column – this further speeds up aggregation queries and data analytics!

 HBase, Cassandra and Amazon Redshift are examples of columnar databases.

 Columnar databases can use traditional SQL to load data and execute queries.
Example of data in columnar DB

 Let’s assume a snapshot of a table as:

Attr1 Attr2 Attr3
1111  Val1  10000
2222  Val2  20000
3333  Val3  15000

 Columnar storage of this table will store the data as:

(1111, 2222, 3333; Val1, Val2, Val3; 10000, 20000, 15000)

 Row-oriented storage of this table will store the data as:

(1111, Val1, 10000; 2222, Val2, 20000; 3333, Val3, 15000)
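The difference between the two layouts can be sketched in Python (an illustrative sketch using the table above, not how any real engine lays out bytes on disk):

```python
# Row-oriented: records stored together, one tuple per row.
row_store = [
    (1111, 'Val1', 10000),
    (2222, 'Val2', 20000),
    (3333, 'Val3', 15000),
]

# Column-oriented: each attribute's values stored contiguously.
column_store = {
    'Attr1': [1111, 2222, 3333],
    'Attr2': ['Val1', 'Val2', 'Val3'],
    'Attr3': [10000, 20000, 15000],
}

# An aggregate over Attr3 touches one contiguous list in the column
# store, but must pick a field out of every tuple in the row store.
print(sum(column_store['Attr3']))    # 45000
print(sum(r[2] for r in row_store))  # 45000
```

Both layouts hold the same data; the columnar one lets an aggregate read only the attribute it needs.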
Columnar vs Row DB
 Columnar databases store data vertically in a table, while row-oriented databases store data horizontally, organizing each record in a row of a table.

 The data in a columnar database has a highly compressible nature that speeds up aggregate operations such as AVG, MIN and MAX on big data. Such operations are relatively slower on relational databases.

 Column-based DBMS use a self-indexing mechanism, which uses less disk space than an RDBMS containing the same data!

 The relational model focuses on structured data and adheres to the principles of normalization, ensuring data integrity and consistency through well-defined relationships. An RDBMS prioritizes transactional processing and quick access to entire records. Columnar databases leverage vertical storage to enhance query performance, making them particularly suitable for data warehousing and analytics tasks.
Benefits of Columnar DB
 Data within a single column is homogeneous - this makes it highly amenable to compression. Columnar databases capitalize on this by applying advanced compression techniques, significantly reducing storage requirements and associated costs. This compression also results in less I/O overhead!
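To illustrate why a homogeneous column compresses so well, here is a toy run-length encoding of a repetitive column (a sketch only; real engines use more sophisticated schemes, and the column values are invented for the example):

```python
from itertools import groupby

# A sorted, low-cardinality column collapses to a few (value, count) pairs.
column = ['US'] * 4 + ['UK'] * 3 + ['IN'] * 2

def rle_encode(values):
    # Each run of identical values becomes one (value, run_length) pair.
    return [(v, len(list(g))) for v, g in groupby(values)]

def rle_decode(pairs):
    # Expand each pair back into its run of repeated values.
    return [v for v, n in pairs for _ in range(n)]

encoded = rle_encode(column)
print(encoded)  # [('US', 4), ('UK', 3), ('IN', 2)]
assert rle_decode(encoded) == column
```

Nine stored values shrink to three pairs; a row store interleaves unrelated fields, so runs like this rarely occur.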

 In a columnar database, only the columns relevant to a query need to be accessed and processed. This contrasts with row-based databases, where entire rows must be read, even if only a few columns are needed. This selective data retrieval translates to faster query performance, especially for analytical queries that typically aggregate or scan large volumes of data.

 Columnar databases are well-suited for vectorized operations, where the same operation is applied to multiple data points simultaneously, making them useful for big data processing.
Benefits of Columnar DB
 Due to their structure, columnar databases are inherently efficient at aggregating and summarizing data, operations that are fundamental to analytics and reporting. This makes them an ideal choice for business intelligence and analytical applications!

 Columnar databases store sparse data efficiently. In scenarios where there are many missing or null values, columnar databases do not store any data for those missing values. This leads to significant storage savings compared to row-based systems.

 Columnar databases are generally easier to scale horizontally, which means adding more servers to handle increased load. This scalability is particularly beneficial in cloud computing environments where resources can be dynamically adjusted based on demand.

Source: https://atlan.com/what-is/columnar-database
Limitations of Columnar DB
 Columnar databases are optimized for read-heavy analytical queries, and are not suitable for transactional workloads.

 Columnar databases have higher overhead for writing data, as each data insertion or update may require accessing and modifying several distinct column files, leading to increased I/O overhead.

 Not well suited to row-oriented access patterns, such as SQL-style point queries or table joins!

 High learning curve for developers, and cost considerations as well!
HBase
Introduction

 HBase is a distributed, column-oriented database that is very effective for handling large, sparse datasets.

 HBase is based on Bigtable, a high-performance, proprietary database developed by Google and described in a white paper in 2006!

 HBase is written in Java.

 HBase integrates seamlessly with Apache Hadoop and runs on top of the Hadoop Distributed File System (HDFS). HBase serves as a direct input and output to the Apache MapReduce framework for Hadoop, and works with Apache Phoenix (SQL layer) to enable SQL-like queries over HBase tables.
Features of HBase
 HBase not only ensures scalability and consistency, it also has other features that make it a popular choice for dealing with big data:

◦ HBase has some built-in features that other databases lack, such as versioning, compression, garbage collection (for expired data), and in-memory tables.

◦ Having these features available out of the box means that application developers need to write less code for such requirements.

◦ HBase guarantees atomicity at the row level, which means that one can have strong consistency at a crucial level of HBase’s data model.

◦ The fact that HBase guarantees strong consistency makes it easier to transition from relational databases to HBase.
HBase Architecture

 HBase lives in the Hadoop ecosystem, where it benefits from its proximity to other related tools. Being a distributed system, HBase is, by design, fault tolerant.

 In HBase, a row is a collection of column families. A column family is a collection of columns. A column is a collection of key-value pairs.

 A table in HBase is basically a big map - more accurately, a map of maps!

 In an HBase table, keys are arbitrary strings that each map to a row of data. A row is itself a map in which keys are called columns and values are stored as uninterpreted arrays of bytes.
HBase CRUD operations
 In HBase, columns are grouped into column families, so a column’s full name consists of two parts: the column family name and the column qualifier!

 An HBase table might look like this if it were a Python dictionary:

hbase_table = {                # Table
    'row1': {                  # Row key
        'cf1:col1': 'value1',  # Column family, column, & value
        'cf1:col2': 'value2',
        'cf2:col1': 'value3'
    },
    'row2': {
        # More row data
    }
}
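Under this map-of-maps model, reading a cell is just two nested lookups: first by row key, then by 'family:qualifier'. A runnable version of the dictionary above (the row and column names are illustrative):

```python
hbase_table = {
    'row1': {
        'cf1:col1': 'value1',
        'cf1:col2': 'value2',
        'cf2:col1': 'value3'
    },
    'row2': {}
}

# A cell is addressed by row key, then by 'family:qualifier'.
value = hbase_table['row1']['cf1:col1']
print(value)  # value1

# Rows are sparse: absent columns simply have no key, and no storage.
print('cf2:col1' in hbase_table['row2'])  # False
```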
HBase CRUD operations
 Create command in HBase supports creation of tables. Example: the following command creates a table named ‘wiki’ with a single column family named ‘text’:

create 'wiki', 'text'

When the table is created, it is empty; it has no rows and no columns!

 Put command allows adding data to an HBase table. Example:

put 'wiki', 'Home', 'text:', 'Welcome to the wiki!'

This command inserts a new row into the wiki table with the key 'Home', adding 'Welcome to the wiki!' to the column called 'text:'!
HBase CRUD operations
 Get command in HBase helps retrieve data from the table. The get command requires two parameters: the table name and the row key. We can optionally specify a list of columns to return. Example:

get 'wiki', 'Home', 'text:'

This command returns: “value=Welcome to the wiki!”. It fetches the value of the text: column from the wiki table!

 Scan operations simply return all rows in the entire table. Scans are powerful and great for development purposes. Example:

scan 'wiki'
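The four shell commands above can be mimicked with a toy in-memory model in Python (illustrative only; real HBase distributes and persists this map across a cluster):

```python
# Toy in-memory model of the HBase shell commands: create, put, get, scan.
tables = {}

def create(table, *families):
    # create 'wiki', 'text'  -- a new table is empty: no rows, no columns.
    tables[table] = {'families': set(families), 'rows': {}}

def put(table, row_key, column, value):
    # put 'wiki', 'Home', 'text:', 'Welcome to the wiki!'
    tables[table]['rows'].setdefault(row_key, {})[column] = value

def get(table, row_key, column=None):
    # get 'wiki', 'Home', 'text:'  -- column is optional.
    row = tables[table]['rows'][row_key]
    return row[column] if column else row

def scan(table):
    # scan 'wiki'  -- returns every row in the table.
    return tables[table]['rows']

create('wiki', 'text')
put('wiki', 'Home', 'text:', 'Welcome to the wiki!')
print(get('wiki', 'Home', 'text:'))  # Welcome to the wiki!
print(scan('wiki'))
```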
HBase – importing Data
 When we set up a new database, one major problem that we encounter is how to migrate data into it!

 Handcrafting ‘put’ operations with static strings can do this task – but that’s a cumbersome solution.

 A better solution is to have scripts ready to migrate data from the original data source to HBase!

 Much of the Big Data for which HBase is the best solution can be exported as XML files with informative XML tags, for which we can write scripts to extract data and put it into the HBase table map!
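Such an import script can be sketched with Python's standard XML parser. The `<page>`/`<title>`/`<text>` tags below are a hypothetical export format, and the dictionary stands in for the put calls a real script would issue against the wiki table:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export: each <page> becomes one HBase row.
xml_data = """
<wiki>
  <page><title>Home</title><text>Welcome to the wiki!</text></page>
  <page><title>About</title><text>All about this wiki.</text></page>
</wiki>
"""

hbase_rows = {}  # stand-in for put calls against the 'wiki' table
for page in ET.fromstring(xml_data).findall('page'):
    row_key = page.findtext('title')               # row key
    hbase_rows[row_key] = {'text:': page.findtext('text')}  # column value

print(hbase_rows['Home']['text:'])  # Welcome to the wiki!
```

For large exports, ET.iterparse would stream the file instead of loading it whole.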
HBase – importing Data
 Oftentimes, the data that needs to be imported to HBase is big blobs of text content, which take longer to read and write!

 HBase provides compression utilities to speed up data reads.

 HBase supports two compression algorithms: Gzip (GZ) and Lempel-Ziv-Oberhumer (LZO)! LZO has licensing problems – that makes Gzip a favourable choice over LZO.

 HBase features Bloom filters as a faster way of determining whether data exists before incurring an expensive disk read!!

 A Bloom filter is a useful data structure to determine whether a particular column exists for a given row key, or just whether a given row key exists at all (BLOOMFILTER=>'ROW').
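The idea behind a Bloom filter can be shown in a few lines of Python. This is a minimal sketch, not HBase's implementation (which lives in its storefiles and is configured per column family); the sizes and hash scheme are arbitrary choices for the example:

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size  # the filter is just a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent" (skip the disk read);
        # True means "possibly present" (go read the disk).
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add('row-42')
print(bf.might_contain('row-42'))   # True
print(bf.might_contain('row-999'))  # almost certainly False
```

The key property is that a Bloom filter never gives false negatives, so a negative answer safely avoids the disk read; the price is a small chance of false positives.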
Attributes of database to explore!

 Nature of problem and usage of database – problems where “Big Data” processing is a requirement! Example: Airbnb, Yahoo, eBay, Meetup etc.

 Unique characteristic of database – similar to a relational DB but treats tables, rows and columns differently. Tables are maps of maps (column families).

 Communication interface of database – HBase provides a command-line interface for interacting with the database!

 Scalability – Highly scalable for Big Data with good performance!

 Security – One can enable Kerberos-based authentication and TLS encryption at the HBase cluster level.


Attributes of database to explore!

 Durability – HBase can gracefully recover from individual server failures because it uses write-ahead logging (WAL): each update is appended to a log on disk before it is applied to the in-memory store, so that a recovering node can replay the log to rebuild any state that had not yet been flushed to disk.

 Database Replication – HBase does support cluster-to-cluster replication. A typical multi-cluster setup could have clusters separated geographically by some distance. In this case, for a given column family, one cluster is the system of record, while the other clusters merely provide access to the replicated data.
