Chapter 3 - Data Storage and Processing Systems

CHAPTER 3

DATA STORAGE AND
PROCESSING SYSTEMS
THS. NGUYỄN ĐÌNH THỌ
[email protected]

CLOUD COMPUTING LECTURE - THS NGUYEN DINH THO


CONTENTS
• Distributed storage systems and memory consistency: NFS & AFS
• Storage systems: HDFS & GFS
• NoSQL databases
• Big Data & cloud computing


DISTRIBUTED STORAGE SYSTEM AND
MEMORY CONSISTENCY

• NFS
• AFS



NFS- NETWORK FILE SYSTEM

Network File System (NFS) is a distributed file system protocol
originally developed by Sun Microsystems in 1984. It allows a
user on a client computer to access files over a computer
network much as local storage is accessed. NFS, like many
other protocols, builds on the Open Network Computing
Remote Procedure Call (ONC RPC) system. NFS is an open
standard defined in Requests for Comments (RFCs), so
anyone can implement the protocol.
https://en.wikipedia.org/wiki/Network_File_System#Platforms

TIMELINE OF NFS PROTOCOLS



THE NFS ARCHITECTURE



IMPLEMENTATIONS:

There are three ways to implement a network file system:
• Upper kernel layer
• Lower kernel layer
• Middle kernel layer (vnode layer)
An important aspect of an NFS implementation is an effective
caching mechanism to boost performance.


VFS-VIRTUAL FILE SYSTEM

• NFS is implemented using the Virtual File System (VFS) abstraction,
which is now used by many different operating systems.
• Essence: VFS provides a standard file system interface and
hides the difference between accessing a local and a remote
file system.
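The idea of one interface hiding local and remote access can be sketched in a few lines of Python. This is only an illustration, not kernel code: the class names FileSystem, LocalFS and RemoteFS and the helper copy are hypothetical, and the "remote" side is simulated in memory rather than over a network.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """The common interface that callers program against (the VFS role)."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class LocalFS(FileSystem):
    """Hypothetical local file system backed by an in-memory dict."""

    def __init__(self):
        self._disk = {}

    def read(self, path):
        return self._disk[path]

    def write(self, path, data):
        self._disk[path] = data


class RemoteFS(FileSystem):
    """Stand-in for an NFS client: forwards each call to a server object."""

    def __init__(self, server: FileSystem):
        self._server = server
        self.calls = 0  # number of simulated remote calls

    def read(self, path):
        self.calls += 1
        return self._server.read(path)

    def write(self, path, data):
        self.calls += 1
        self._server.write(path, data)


def copy(src_fs: FileSystem, src: str, dst_fs: FileSystem, dst: str) -> None:
    """Works the same whether either side is local or remote."""
    dst_fs.write(dst, src_fs.read(src))
```

The caller of copy never needs to know which side is remote, which is the point of the abstraction.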


VFS-VIRTUAL FILE SYSTEM



RPCS IN FILE SYSTEM

• Many (traditional) distributed file systems use remote
procedure calls (RPCs) to access files. When wide-area networks
need to be crossed, alternatives need to be considered.


RPCS IN FILE SYSTEM

(a) Reading data from a file in NFS version 3.
(b) Reading data using a compound procedure in NFS version 4.
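The difference between the two captions can be illustrated with a toy model that only counts round trips. This is a simplified sketch, not the real NFS wire protocol: the NfsServer class and the operation names are hypothetical, file handles are just file names, and a real version 4 compound would also carry operations such as OPEN.

```python
class NfsServer:
    """Toy server that counts how many round trips clients make."""

    def __init__(self, files):
        self.files = files          # name -> bytes
        self.round_trips = 0

    def _lookup(self, name):
        if name not in self.files:
            raise FileNotFoundError(name)
        return name                 # the "file handle" is just the name here

    def _read(self, handle, offset, count):
        return self.files[handle][offset:offset + count]

    def rpc(self, op, *args):
        """Version 3 style: one operation per round trip."""
        self.round_trips += 1
        return getattr(self, "_" + op)(*args)

    def compound(self, ops):
        """Version 4 style: several operations in one round trip."""
        self.round_trips += 1
        current, result = None, None
        for op, args in ops:
            if op == "lookup":
                current = self._lookup(*args)   # sets the current file handle
            elif op == "read":
                result = self._read(current, *args)
            else:
                raise ValueError(op)
        return result


def read_v3(server, name, offset, count):
    handle = server.rpc("lookup", name)
    return server.rpc("read", handle, offset, count)


def read_v4(server, name, offset, count):
    return server.compound([("lookup", (name,)), ("read", (offset, count))])
```

With the same file, read_v3 costs two round trips and read_v4 costs one, which matters most when each round trip crosses a wide-area network.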
DISCUSS: HOW TO READ AND WRITE
FILES IN NFS?


AFS- ANDREW FILE SYSTEM

• The Andrew File System (AFS) is a distributed file system
that uses a set of trusted servers to present a
homogeneous, location-transparent file name space to all
client workstations. It was developed by Carnegie Mellon
University as part of the Andrew Project. AFS is named after
Andrew Carnegie and Andrew Mellon. Its primary use is in
distributed computing.
https://en.wikipedia.org/wiki/Andrew_File_System

AFS ARCHITECTURE



LOCAL CACHING

• File reads and writes operate on a locally cached copy
• The local copy is sent back to the master when the file is closed
• Open local copies are notified of external updates through
callbacks
Tradeoffs:
• Shared database files do not work well on this system
• Does not support write-through to a shared medium
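The caching behaviour listed above can be sketched as a small in-memory model. It is an illustration under simplifying assumptions, not AFS itself: AfsServer, AfsClient and their method names are hypothetical, whole files are cached, and a callback simply drops the stale cached copy.

```python
class AfsServer:
    """Holds the master copies and remembers who has a cached copy."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}   # path -> set of clients to notify on change

    def fetch(self, client, path):
        self.callbacks.setdefault(path, set()).add(client)
        return self.files.get(path, b"")

    def store(self, client, path, data):
        self.files[path] = data
        # Notify every other client that its cached copy is now stale.
        for other in self.callbacks.get(path, set()) - {client}:
            other.break_callback(path)
        self.callbacks[path] = {client}


class AfsClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}       # path -> locally cached contents
        self.dirty = set()    # paths modified since the last close

    def open(self, path):
        if path not in self.cache:
            self.cache[path] = self.server.fetch(self, path)

    def read(self, path):
        return self.cache[path]          # served from the local copy

    def write(self, path, data):
        self.cache[path] = data          # only the local copy changes
        self.dirty.add(path)

    def close(self, path):
        if path in self.dirty:           # send the copy back on close
            self.server.store(self, path, self.cache[path])
            self.dirty.discard(path)

    def break_callback(self, path):
        self.cache.pop(path, None)       # next open() fetches a fresh copy
```

Because updates reach the server only at close, two clients editing the same file at once overwrite each other, which is why shared database files fit this model poorly.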
STORAGE SYSTEMS HDFS & GFS



HDFS- HADOOP FILE SYSTEM

• HDFS is a distributed file system that is fault tolerant,
scalable, and easy to expand.
• HDFS is the primary distributed storage for Hadoop
applications.
• HDFS provides interfaces for applications to move
themselves closer to the data.
• HDFS is designed to "just work"; however, a working
knowledge of it helps with diagnostics and improvements.

HDFS ARCHITECTURE



COMPONENTS OF HDFS

• NameNode:
The heart of an HDFS file system. It maintains and manages
the file system metadata, e.g., which blocks make up a file and
on which DataNodes those blocks are stored.
• DataNode:
Where HDFS stores the actual data. A cluster usually has many
of these.

HDFS ARCHITECTURE



UNIQUE FEATURES OF HDFS

• Failure tolerant: data is duplicated across multiple DataNodes to protect
against machine failures. The default replication factor is 3 (every
block is stored on three machines).
• Scalability: data transfers happen directly with the DataNodes, so
read/write capacity scales fairly well with the number of DataNodes.
• Space: need more disk space? Just add more DataNodes and rebalance.
• Industry standard: other distributed applications are built on top of
HDFS (HBase, MapReduce).


UNIQUE FEATURES OF HDFS

• HDFS is designed to process large data
sets with write-once-read-many
semantics; it is not intended for
low-latency access.


HDFS – DATA ORGANIZATION

• Each file written into HDFS is split into data blocks
• Each block is stored on one or more nodes
• Each copy of a block is called a replica
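A minimal sketch of this organization, combined with the NameNode's role as the keeper of block metadata, might look as follows. It is an illustration only: the 128 MB block size is an assumption (the slides do not state one), the NameNode class here is a hypothetical stand-in, and replicas are assigned round-robin as a placeholder for a real placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size; configurable in practice


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block of a file of file_size bytes."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


class NameNode:
    """Keeps only metadata: which blocks make up a file and where they live."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        self.replication = min(replication, len(self.datanodes))
        self.block_map = {}  # path -> list of (block_id, [datanode names])

    def add_file(self, path, file_size, block_size=BLOCK_SIZE):
        blocks = []
        for index, _ in enumerate(split_into_blocks(file_size, block_size)):
            block_id = f"{path}#blk{index}"
            # Placeholder placement: consecutive DataNodes, round-robin.
            nodes = [self.datanodes[(index + r) % len(self.datanodes)]
                     for r in range(self.replication)]
            blocks.append((block_id, nodes))
        self.block_map[path] = blocks
        return blocks
```

Only the last block of a file may be shorter than the block size; every other block is full.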


HDFS – DATA ORGANIZATION

• Block placement policy (for a replication factor of 3):
First replica is placed on the local node
Second replica is placed on a node in a different rack
Third replica is placed on a different node in the same rack
as the second replica
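The three placement rules can be written down directly. The sketch below is a simplified illustration, not Hadoop's implementation: the function name place_replicas and the topology dictionary (rack name to list of node names) are hypothetical, and the remote rack and nodes are chosen at random.

```python
import random


def place_replicas(writer, topology, rng=random):
    """Pick three nodes for a block written from `writer`.

    topology maps a rack name to the list of node names in that rack.
    """
    local_rack = next(rack for rack, nodes in topology.items()
                      if writer in nodes)
    # The second and third replicas share one remote rack, so that rack
    # needs at least two nodes.
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote], 2)
    return [writer, second, third]
```

Keeping two replicas in one remote rack survives the loss of a whole rack while limiting cross-rack write traffic to a single transfer.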


READ OPERATION IN HDFS



WRITE OPERATION IN HDFS



GFS- GOOGLE FILE SYSTEM

• Google File System (GFS or GoogleFS) is a proprietary
distributed file system developed by Google to provide
efficient, reliable access to data using large clusters of
commodity hardware. A new version of Google File System,
code-named Colossus, was released in 2010.
https://en.wikipedia.org/wiki/Google_File_System

ARCHITECTURE

• Files are divided into chunks
• Fixed-size chunks (64 MB)
• Each chunk is replicated over chunkservers; the copies are called replicas
• Unique 64-bit chunk handles
• Chunks are stored as Linux files
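Because chunks have a fixed size, a byte offset in a file maps to a chunk by plain integer division. The sketch below shows that arithmetic; the function names locate and chunks_for_range are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed chunk size from the slide (64 MB)


def locate(offset):
    """Map a byte offset in a file to (chunk index, offset inside the chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset, length):
    """Chunk indices touched by reading `length` bytes starting at `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

A read that straddles a chunk boundary therefore touches two chunks, possibly on different chunkservers.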


ARCHITECTURE

• Single master
• Multiple chunkservers
• Grouped into Racks
• Connected through switches
• Multiple clients
• Master/chunkserver coordination
• HeartBeat messages



ARCHITECTURE



NOSQL DATABASE



WHAT IS SQL DATABASE ?



DATABASE

• A database is an organized collection of data, stored and
accessed electronically. Database designers typically
organize the data to model aspects of reality in a way that
supports processes requiring information, such as (for
example) modelling the availability of rooms in hotels in a
way that supports finding a hotel with vacancies.


DATABASE MANAGEMENT SYSTEM

• The database management system (DBMS) is the software that
interacts with end users, applications, and the database itself to
capture and analyze data. A general-purpose DBMS allows the
definition, creation, querying, update, and administration of
databases. A database is generally stored in a DBMS-specific
format that is not portable, but different DBMSs can share data
by using standards such as SQL and ODBC or JDBC.
• https://en.wikipedia.org/wiki/Database

INTRODUCTION

• In computing systems (web and business applications), an
enormous amount of data is generated every day. A large
share of this data is handled by relational database management
systems (RDBMS). The relational model was introduced in E. F. Codd's
1970 paper "A Relational Model of Data for Large Shared Data Banks",
which made data modeling and application programming much easier.
Beyond the intended benefits, the relational model is well suited to
client-server programming, and today it is the predominant technology
for storing structured data in web and business applications.
WHAT RULES DO CLASSICAL RELATIONAL
DATABASES FOLLOW?


CLASSICAL RELATIONAL DATABASES FOLLOW
THE ACID RULES

• A database transaction must be atomic, consistent, isolated,
and durable.
Atomic: A transaction is a logical unit of work
that must either complete with all of its data
modifications or perform none of them.
Consistent: At the end of the transaction, all
data must be left in a consistent state.

CLASSICAL RELATIONAL DATABASES FOLLOW
THE ACID RULES

Isolated: Modifications of data performed by a
transaction must be independent of other transactions.
Otherwise, the outcome of a transaction may be
erroneous.
Durable: When the transaction is completed,
the effects of its modifications must
be permanent in the system.
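Atomicity is easiest to see with a transfer between two accounts: either both updates happen or neither does. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the accounts table, the names alice and bob, and the helper functions are hypothetical examples.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src))
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # triggers the rollback


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the check fails, both UPDATE statements are undone together, so the database never shows money that has left one account without arriving in the other.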


DISTRIBUTED SYSTEMS

• A distributed system consists of multiple computers and
software components that communicate through a computer
network (a local network or a wide area network). A
distributed system can consist of any number of possible
configurations, such as mainframes, workstations, personal
computers, and so on. The computers interact with each other
and share the resources of the system to achieve a common
goal.

ADVANTAGES OF DISTRIBUTED
COMPUTING

• Reliability (fault tolerance):
An important advantage of a distributed computing system is
reliability. If some of the machines within the system crash, the
rest of the computers remain unaffected and work does not
stop.
• Scalability:
A distributed computing system can easily be expanded by
adding more machines as needed.

ADVANTAGES OF DISTRIBUTED
COMPUTING

• Sharing of resources:
Shared data is essential to many applications, such as banking
and reservation systems. Just as data is shared in a
distributed system, other resources can also be shared (e.g.
expensive printers).
• Flexibility:
Because the system is very flexible, it is easy to install,
implement, and debug new services.

ADVANTAGES OF DISTRIBUTED
COMPUTING
• Speed:
A distributed computing system can have more computing power, and
its speed sets it apart from other systems.
• Open system:
Because it is an open system, every service is equally accessible to
every client, local or remote.
• Performance:
The collection of processors in the system can provide higher
performance (and a better price/performance ratio) than a centralized
computer.

DISADVANTAGES OF DISTRIBUTED
COMPUTING

• Troubleshooting:
Troubleshooting and diagnosing problems is harder.
• Software:
Less software support is the main disadvantage of
distributed computing systems.


DISADVANTAGES OF DISTRIBUTED
COMPUTING

• Networking:
The network infrastructure can create several problems, such
as transmission problems, overloading, and loss of messages.
• Security:
Easy access in a distributed computing system increases
security risk, and sharing data raises the problem of
data security.


WHAT IS NOSQL?

• NoSQL is a class of non-relational database management systems
that differ from traditional relational database management
systems in some significant ways. It is designed for
distributed data stores with very large-scale data
storage needs (for example Google or Facebook, which
collect terabits of data every day for their users). This
type of data store may not require a fixed schema, avoids
join operations, and typically scales horizontally.

BRIEF HISTORY OF NOSQL
• The term NoSQL was coined by Carlo Strozzi in 1998.
He used it to name his lightweight, open-source
database, which did not have an SQL interface.
• In early 2009, when last.fm wanted to organize an event on
open-source distributed databases, Eric Evans, a Rackspace
employee, reused the term to refer to databases that are
non-relational, distributed, and do not conform to atomicity,
consistency, isolation, and durability, the four defining features of
traditional relational database systems.

BRIEF HISTORY OF NOSQL
• In the same year, NoSQL was discussed and debated extensively at
the "no:sql(east)" conference held in Atlanta, USA.
• After that, the discussion and practice of NoSQL gained
momentum, and NoSQL saw unprecedented growth.


WHY NOSQL?
• Today, data is becoming easier to access and capture
through third parties such as Facebook, Google+ and others.
• Personal user information, social graphs, geolocation data,
user-generated content and machine logging data are just a few
examples where the data has been increasing exponentially.
• To provide such services properly, huge amounts of data must be
processed, which SQL databases were never designed for.
NoSQL databases evolved to handle this volume of data
properly.

WHY NOSQL?



RDBMS VS NOSQL ?



RDBMS VS NOSQL
RDBMS                                   NoSQL
- Structured and organized data         - Stands for Not Only SQL
- Structured query language (SQL)       - No declarative query language
- Fixed schema                          - No predefined schema
- Data and its relationships are        - Key-value pair storage, column store,
  stored in separate tables               document store, graph databases
- Data Manipulation Language,           - Eventual consistency rather than
  Data Definition Language                ACID properties
- Tight consistency                     - Unstructured and unpredictable data
                                        - CAP theorem

CAP THEOREM (BREWER’S THEOREM)
The CAP theorem states that there are three basic requirements which exist in a
special relation when designing applications for a distributed architecture.
• Consistency: the data in the database remains consistent
after the execution of an operation. For example, after an update operation all
clients see the same data.
• Availability: the system is always on (guaranteed service
availability), with no downtime.

CAP THEOREM (BREWER’S THEOREM)

• Partition tolerance: the system continues
to function even if the communication among the servers is
unreliable, i.e. the servers may be partitioned into multiple
groups that cannot communicate with one another.

CAP THEOREM (BREWER’S THEOREM)

• In theory, it is impossible to fulfill all three
requirements at once. According to CAP, a distributed
system can satisfy only two of the three
requirements. Therefore, current NoSQL
databases follow different combinations of C,
A, and P from the CAP theorem. Here is a brief
description of the three combinations CA, CP, and AP:

CAP THEOREM (BREWER’S THEOREM)

• CA: Single-site cluster, therefore all nodes are always in
contact. When a partition occurs, the system blocks.
• CP: Some data may not be accessible, but the rest is still
consistent/accurate.
• AP: System is still available under partitioning, but some of
the data returned may be inaccurate.
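The difference between CP and AP behaviour during a partition can be sketched with a toy store that has two replicas. This is an illustration under strong simplifications, not how any particular database works: the TwoReplicaStore class, its mode flag, and the Unavailable exception are hypothetical.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a request during a partition."""


class TwoReplicaStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError(mode)
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # True when the replicas cannot communicate

    def write(self, replica, key, value):
        if not self.partitioned:
            for r in self.replicas:            # normal case: update both
                r[key] = value
        elif self.mode == "CP":
            # Stay consistent by giving up availability.
            raise Unavailable("cannot reach the other replica")
        else:
            # Stay available by accepting a write the other side will not see.
            self.replicas[replica][key] = value

    def read(self, replica, key):
        return self.replicas[replica].get(key)
```

In CP mode the write during the partition fails but both replicas still agree; in AP mode the write succeeds and the two replicas temporarily return different answers.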


CAP THEOREM (BREWER’S THEOREM)



NOSQL PROS/CONS



NOSQL PROS/CONS
Advantages:
• High scalability
• Distributed computing
• Lower cost
• Schema flexibility, semi-structured data
• No complicated relationships

NOSQL PROS/CONS

Disadvantages:
• No standardization
• Limited query capabilities (so far)
• Eventual consistency is not intuitive to program for


THE BASE

• The CAP theorem states that a distributed computer system
cannot guarantee all three of the following properties at the
same time: consistency, availability, and partition tolerance.


THE BASE

• Basically Available indicates that the system does guarantee
availability, in terms of the CAP theorem.
• Soft state indicates that the state of the system may change
over time, even without input. This is because of the eventual
consistency model.
• Eventual consistency indicates that the
system will become consistent over time, given that the
system doesn't receive input during that time.

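Eventual consistency can be illustrated with replicas that accept writes independently and later exchange their contents. The sketch below is one simple way to do it, assuming versioned values with "highest version wins"; the Replica class and the anti_entropy function are hypothetical names, and real systems use more careful conflict resolution.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def put(self, key, value, version):
        """Keep the value only if it is newer than what we already have."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """One exchange round: every replica pushes its entries to all others."""
    for src in replicas:
        for dst in replicas:
            if src is not dst:
                for key, (version, value) in list(src.data.items()):
                    dst.put(key, value, version)
```

Before the exchange, replicas may disagree (soft state); once the exchange has run and no new writes arrive, every replica returns the same value.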
ACID VS BASE
ACID            BASE
Atomicity       Basically Available
Consistency     Soft state
Isolation       Eventual consistency
Durability


NOSQL CATEGORIES

• There are four general types (most common categories)
of NoSQL databases. Each of these categories has its
own specific attributes and limitations. There is no
single solution that is better than all the others;
however, some databases are better suited to
specific problems. To clarify the NoSQL databases,
let's discuss the most common categories:

NOSQL CATEGORIES

• Key-value stores
• Column-oriented
• Graph
• Document oriented



KEY-VALUE STORES

• Key-value stores are the most basic type of NoSQL
database.
• Designed to handle huge amounts of data.
• Based on Amazon's Dynamo paper.


KEY-VALUE STORES

• Key-value stores allow developers to store schema-less
data.
• In key-value storage, the database stores data as a hash
table where each key is unique and the value can be a
string, JSON, a BLOB (Binary Large OBject), etc.
• Keys may be strings, hashes, lists, sets, or sorted sets, and
values are stored against these keys.

KEY-VALUE STORES

• For example, a key-value pair might consist of a key like
"Name" that is associated with a value like "Robin".
• Key-value stores can be used as collections, dictionaries,
associative arrays, etc.
• Key-value stores follow the 'Availability' and 'Partition tolerance'
aspects of the CAP theorem.


KEY-VALUE STORES

• Key-value stores work well for shopping cart contents,
or individual values like color schemes, a landing page URI,
or a default account number.
• Examples of key-value store databases: Redis, Dynamo,
Riak


KEY-VALUE STORES



KEY-VALUE STORES



KEY-VALUE STORES

• Keys are mapped to (possibly) more complex values (e.g.,
lists)
• Keys can be stored in a hash table and can be distributed
easily
• Such stores typically support regular CRUD (Create, Read,
Update, and Delete) operations
• That is, no joins or aggregate functions

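The CRUD interface of a key-value store can be sketched on top of a Python dictionary, which is itself a hash table. This is a minimal in-memory illustration; the KeyValueStore class and its method names are hypothetical and not the API of Redis, Dynamo, or Riak.

```python
import json


class KeyValueStore:
    """Unique keys mapped to opaque values, stored here as JSON text."""

    def __init__(self):
        self._table = {}  # hash table: key -> serialized value

    def create(self, key, value):
        if key in self._table:
            raise KeyError(f"key already exists: {key!r}")
        self._table[key] = json.dumps(value)

    def read(self, key):
        return json.loads(self._table[key])

    def update(self, key, value):
        if key not in self._table:
            raise KeyError(f"no such key: {key!r}")
        self._table[key] = json.dumps(value)

    def delete(self, key):
        del self._table[key]

    def keys(self):
        return list(self._table)
```

Note what is missing: there is no way to query by value, join two keys, or aggregate, because the store never looks inside the value.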
COLUMN-ORIENTED DATABASES

• Column-oriented databases primarily work on columns, and
every column is treated individually.
• Values of a single column are stored contiguously.
• A column store keeps data in column-specific files.
• In column stores, query processors work on columns too.
• Examples of column-oriented databases: BigTable,
Cassandra, SimpleDB, etc.

COLUMN-ORIENTED DATABASES

• All data within each column data file has the same type, which
makes it ideal for compression.
• Column stores can improve the performance of queries because
they can access only the specific column data needed.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG,
MIN, MAX).
• Used for data warehouses and business intelligence, customer
relationship management (CRM), library card catalogs, etc.

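Why aggregation is cheap in a column layout can be shown by storing the same records both ways. The sketch below is an in-memory illustration with made-up example data; the function names to_columns and column_sum are hypothetical, and every row is assumed to have the same fields.

```python
# Row-oriented layout: one record after another (example data).
rows = [
    {"id": 1, "city": "Hanoi", "amount": 120},
    {"id": 2, "city": "Hanoi", "amount": 80},
    {"id": 3, "city": "Hue", "amount": 50},
]


def to_columns(rows):
    """Column-oriented layout: all values of one column stored together."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def column_sum(columns, name):
    """An aggregate touches only the one column it needs."""
    return sum(columns[name])


columns = to_columns(rows)
total = column_sum(columns, "amount")
```

In the row layout, summing the amounts means reading every field of every record; in the column layout, only the amount list is read, and because its values share one type it also compresses well.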
COLUMN-ORIENTED DATABASES



COLUMN-ORIENTED DATABASES
• Columnar databases are a hybrid of RDBMSs and key-value
stores
• Values are stored in groups of zero or more columns,
but in column order (as opposed to row order)


GRAPH THEORY ??



GRAPH THEORY ??

• In mathematics, graph theory is the study of graphs, which are mathematical
structures used to model pairwise relations between objects. A graph in this
context is made up of vertices, nodes, or points which are connected by edges,
arcs, or lines. A graph may be undirected, meaning that there is no distinction
between the two vertices associated with each edge, or its edges may be
directed from one vertex to another; see Graph (discrete mathematics) for
more detailed definitions and for other variations in the types of graph that are
commonly considered. Graphs are one of the prime objects of study in discrete
mathematics.
• https://en.wikipedia.org/wiki/Graph_theory

GRAPH DATABASES

• A graph data structure consists of a finite (and possibly
mutable) set of ordered pairs, called edges or arcs, of certain
entities called nodes or vertices.


GRAPH DATABASES



WHAT IS A GRAPH DATABASE?

• A graph database stores data in a graph.
• It is capable of elegantly representing any kind of data in a
highly accessible way.
• A graph database is a collection of nodes and edges.
• Each node represents an entity (such as a student or
business) and each edge represents a connection or
relationship between two nodes.

WHAT IS A GRAPH DATABASE?

• Every node and edge is defined by a unique identifier.
• Each node knows its adjacent nodes.
• As the number of nodes increases, the cost of a local step (or
hop) remains the same.
• Index for lookups.
• E.g., Neo4j, VertexDB, OrientDB, Titan.

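The statement that each node knows its adjacent nodes can be sketched with an adjacency list. The GraphDB class below is a hypothetical in-memory illustration, not the API of Neo4j or any other product; the node and edge names in the usage are made-up examples.

```python
class GraphDB:
    def __init__(self):
        self.nodes = {}      # node id -> properties (the index for lookups)
        self.adjacent = {}   # node id -> list of (edge label, neighbour id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.adjacent.setdefault(node_id, [])

    def add_edge(self, src, label, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.adjacent[src].append((label, dst))

    def neighbors(self, node_id, label=None):
        """One hop: cost depends on this node's degree, not on graph size."""
        return [dst for (lbl, dst) in self.adjacent[node_id]
                if label is None or lbl == label]
```

For example, after adding nodes "alice" and "cs101" and an edge ("alice", "ENROLLED_IN", "cs101"), calling neighbors("alice") follows the relationship directly, where a relational database would need a join.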
WHAT IS A GRAPH DATABASE?



COMPARISON BETWEEN THE CLASSIC
RELATIONAL MODEL AND THE GRAPH
MODEL?



COMPARISON BETWEEN THE CLASSIC
RELATIONAL MODEL AND THE GRAPH
MODEL
Relational model    Graph model
Tables              Vertex and edge sets
Rows                Vertices
Columns             Key/value pairs
Joins               Edges


DOCUMENT ORIENTED DATABASES

• A collection of documents
• Data in this model is stored inside documents.
• A document is a key-value collection where the key allows access to its value.
• Documents are not typically forced to have a schema and therefore are
flexible and easy to change.
• Documents are stored in collections in order to group different kinds of data.
• Documents can contain many different key-value pairs, or key-array pairs, or
even nested documents.
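A collection of schema-less documents, including a key-array pair and a nested document, can be sketched with plain Python dictionaries. The example data and the helper names get_path and find are hypothetical; real document databases such as MongoDB offer much richer query languages.

```python
# Two documents in one collection; they do not share the same fields.
collection = [
    {"_id": 1, "name": "Robin", "tags": ["admin", "staff"]},
    {"_id": 2, "name": "Lan", "address": {"city": "Hanoi", "zip": "100000"}},
]


def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; None if it is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, value):
    """Return the documents whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]
```

A query on address.city simply skips documents that have no address, which is the flexibility the missing schema buys.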


COMPARISON BETWEEN THE CLASSIC
RELATIONAL MODEL AND THE DOCUMENT
MODEL



COMPARISON BETWEEN THE CLASSIC
RELATIONAL MODEL AND THE DOCUMENT
MODEL
Relational model    Document model
Tables              Collections
Rows                Documents
Columns             Key/value pairs
Joins               Not available

DOCUMENT ORIENTED DATABASES

• Examples of document-oriented databases: MongoDB,
CouchDB


DOCUMENT ORIENTED DATABASES

• Documents are stored in some standard format or encoding (e.g.,
XML, JSON, PDF or Office documents)
• These are typically referred to as Binary Large Objects (BLOBs)
• Documents can be indexed
• This allows document stores to outperform traditional file systems
• E.g., MongoDB and CouchDB (both can be queried using
MapReduce)

THERE IS A LARGE NUMBER OF
COMPANIES USING NOSQL

• Google
• Facebook
• Mozilla
• Adobe
• Foursquare
• LinkedIn
• Digg
• McGraw-Hill Education
• Vermont Public Radio
https://www.w3resource.com/mongodb/nosql.php

BIGDATA & CLOUD COMPUTING



WHAT IS BIG DATA?



BIGDATA

• Big Data refers to large data sets (characterized by volume, variety,
velocity, variability, veracity, and complexity) that are difficult to
analyze, capture, curate, search, share, store, transfer, and visualize,
and whose privacy is difficult to manage.
• Big Data Is About Massive Data Volume
• Big Data Means Unstructured Data
• Big Data Is for Social Media & Sentiment Analysis

BIGDATA

• Log data collected from hundreds of distributed servers
• Different file formats (each CDN, or Content Delivery Network, has its
own format)
• Different file sizes: from a few bytes to 20 MB.
• Big data uses inductive statistics and concepts from nonlinear
system identification to infer laws (regressions, nonlinear
relationships, and causal effects) from large sets of data, to reveal
relationships and dependencies, and to perform predictions.

BIGDATA



BIG DATA USE CASES (IBM)



AN ABUNDANCE OF TECHNOLOGIES …



BIG DATA TECHNOLOGIES …



BIG IMPACT OF BIG DATA



BIG DATA MODELS AND ALGORITHMS

• Foundational Models
• Algorithms and Programming Techniques
• Analytics and Metrics
• Representation Formats for Multimedia Big Data



BIG DATA ARCHITECTURES

• Big Data as a Service


• Cloud Computing Techniques for Big Data
• Big Data Open Platforms
• Big Data in Mobile and Pervasive Computing



BIG DATA SEARCH AND MINING

• Algorithms and Systems for Big Data Search


• Distributed, and Peer-to-peer Search
• Machine learning based on Big Data
• Visualization Analytics for Big Data



BIG DATA MANAGEMENT

• Big Data Persistence and Preservation


• Big Data Quality and Provenance Control
• Management Issues of Social Network Big Data



BIG DATA PROTECTION, INTEGRITY AND
PRIVACY

• Models and Languages for Big Data Protection


• Privacy Preserving Big Data Analytics
• Big Data Encryption



SECURITY APPLICATIONS OF BIG DATA

• Anomaly Detection in Very Large Scale Systems


• Collaborative Threat Detection using Big Data Analytics



BIG DATA FOR ENTERPRISE AND
SOCIETY
• Big Data Economics
• Value Creation through Big Data Analytics
• Big Data for Business Model Innovation
• Big Data in Business Performance Management
• SME-centric Big Data Analytics
• Big Data for Verticals
• Scientific Applications of Big Data
• Large-scale Social Media and Recommendation Systems
• Big Data in Enterprise Management Models and Practices
• Big Data in Government Management Models and Practices
• Big Data in Smart Planet Solutions
• Big Data for Enterprise Transformation


HTTPS://WWW.RESEARCHGATE.NET/PROFILE/BERNICE_PURCELL/PUBLICATION/256888844_BIG_DATA_USING_CLOUD_COMPUTING/LINKS/0DEEC52406F6FC4DF8000000/BIG-DATA-USING-CLOUD-COMPUTING.PDF


NOW- DISCUSS
