Chapter 3 - Data Storage and Processing Systems
NoSQL Database
• NFS
• AFS
https://en.wikipedia.org/wiki/Andrew_File_System
AFS ARCHITECTURE
• NameNode :
The heart of an HDFS file system. It maintains and manages
the file system metadata, e.g. which blocks make up a file and
on which DataNodes those blocks are stored.
• DataNode :
Where HDFS stores the actual data. A cluster usually has
many DataNodes.
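
The split between metadata and data described above can be sketched as follows. This is only an illustrative toy in Python (3.9 or later), not the Hadoop API: the class name, method names, block IDs, and DataNode names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy metadata server: it holds no file data, only the block maps."""

    # file path -> ordered list of block IDs that make up the file
    file_blocks: dict[str, list[str]] = field(default_factory=dict)
    # block ID -> names of the DataNodes that hold a replica of the block
    block_locations: dict[str, list[str]] = field(default_factory=dict)

    def add_file(self, path: str, placement: dict[str, list[str]]) -> None:
        """Record a file whose blocks were placed on the given DataNodes."""
        self.file_blocks[path] = list(placement)
        for block_id, datanodes in placement.items():
            self.block_locations[block_id] = list(datanodes)

    def locate(self, path: str) -> list[tuple[str, list[str]]]:
        """Return (block ID, DataNodes) pairs in file order."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


# Hypothetical file split into two blocks, each replicated on two DataNodes.
namenode = NameNode()
namenode.add_file(
    "/logs/app.log",
    {
        "blk_1": ["datanode-a", "datanode-b"],
        "blk_2": ["datanode-b", "datanode-c"],
    },
)
layout = namenode.locate("/logs/app.log")
print(layout)
```

A client would ask the NameNode for this layout and then read the block contents directly from the listed DataNodes.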
CLOUD COMPUTING LECTURE - THS NGUYEN DINH THO 19
HDFS ARCHITECTURE
https://en.wikipedia.org/wiki/Google_File_System
ARCHITECTURE
• Single master
• Multiple chunkservers
• Grouped into Racks
• Connected through switches
• Multiple clients
• Master/chunkserver coordination
• HeartBeat messages
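
The master/chunkserver coordination listed above can be illustrated with a minimal sketch. It assumes that a chunkserver counts as alive while its latest HeartBeat is no older than a timeout; the class, the timeout value, and the server names are hypothetical and not taken from the Google File System itself. Time is passed in explicitly so the example is deterministic.

```python
class Master:
    """Toy single master that tracks chunkserver liveness via heartbeats."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        # chunkserver name -> time of the most recent heartbeat
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, chunkserver: str, now: float) -> None:
        """Record a HeartBeat message received at time `now`."""
        self.last_seen[chunkserver] = now

    def live_chunkservers(self, now: float) -> list[str]:
        """Chunkservers whose last heartbeat is within the timeout."""
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen <= self.timeout_s
        )


master = Master(timeout_s=10.0)
master.heartbeat("rack1/cs1", now=0.0)
master.heartbeat("rack1/cs2", now=0.0)
master.heartbeat("rack1/cs1", now=8.0)  # cs2 stays silent after t=0
alive = master.live_chunkservers(now=12.0)
print(alive)
```

In this sketch the silent chunkserver simply drops out of the live list; what a real master does next (for example re-replicating its chunks) is outside the scope of the example.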
• Sharing of Resources :
Shared data is essential to many applications, such as banking and
reservation systems. Besides data, other resources in a
distributed system can also be shared (e.g. expensive
printers).
• Flexibility :
Because the system is flexible, it is easy to install,
implement, and debug new services.
ADVANTAGES OF DISTRIBUTED COMPUTING
• Speed :
A distributed computing system can pool more computing power than a
single machine, which makes it faster than other systems.
• Open system :
Because it is an open system, every service is equally accessible to every
client, whether local or remote.
• Performance :
The collection of processors in the system can provide higher
performance (and a better price/performance ratio) than a centralized
computer.
DISADVANTAGES OF DISTRIBUTED COMPUTING
• Networking :
The network infrastructure can cause problems such as
transmission errors, overloading, and lost messages.
• Security :
Easy access in a distributed computing system increases
security risk, and sharing data raises data-security
problems.
Disadvantages of NoSQL
• No standardization
• Limited query capabilities (so far)
• Eventual consistency is not intuitive to program for
• Key-value stores
• Column-oriented
• Graph
• Document oriented
• All data within a column data file has the same type, which
makes it well suited to compression.
• Column stores can improve query performance because a query
reads only the columns it needs.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG,
MIN, MAX).
• Well suited to data warehouses, business intelligence, customer
relationship management (CRM), library card catalogs, etc.
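
The column layout and the aggregation queries mentioned above can be sketched with plain Python lists. This is an illustration only, not the storage format of any particular database, and the sample rows are made up.

```python
# Row-oriented view: one record per entry (hypothetical sample data).
rows = [
    {"id": 1, "city": "Hanoi", "amount": 120},
    {"id": 2, "city": "Hue", "amount": 80},
    {"id": 3, "city": "Hanoi", "amount": 100},
]

# Column-oriented view: one list per column, each holding a single type.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation touches only the one column it needs.
amounts = columns["amount"]
stats = {
    "COUNT": len(amounts),
    "SUM": sum(amounts),
    "AVG": sum(amounts) / len(amounts),
    "MIN": min(amounts),
    "MAX": max(amounts),
}
print(stats)
```

With the row layout, the same aggregates would require reading every field of every record, including the `id` and `city` values the query never uses.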
COLUMN-ORIENTED DATABASES
• A collection of documents
• Data in this model is stored inside documents.
• A document is a key-value collection in which each key gives access to its value.
• Documents are typically not required to follow a schema, so they are
flexible and easy to change.
• Documents are stored in collections, which group different kinds of data.
• Documents can contain many different key-value pairs, or key-array pairs, or
even nested documents.
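
The document model described above can be sketched with nested Python dictionaries. This is an illustrative toy and not the API of MongoDB or any other document database; the `insert` helper, collection names, and field values are hypothetical.

```python
# Each collection maps a document ID to a document (a dict of key-value pairs).
collections: dict[str, dict[str, dict]] = {"users": {}, "orders": {}}


def insert(collection: str, doc_id: str, document: dict) -> None:
    """Store a document in the named collection under its ID."""
    collections[collection][doc_id] = document


# Key-value pairs, a key-array pair, and a nested document in one record.
insert(
    "users",
    "u1",
    {"name": "An", "tags": ["admin", "dev"], "address": {"city": "Hanoi"}},
)
# A second document in the same collection with different fields:
# no shared schema is enforced.
insert("users", "u2", {"name": "Binh", "email": "binh@example.com"})

city = collections["users"]["u1"]["address"]["city"]
fields_u2 = sorted(collections["users"]["u2"])
print(city, fields_u2)
```

Adding a new field to one document requires no change to the others, which is the flexibility the bullets above refer to.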
• Google
• Facebook
• Mozilla
• Adobe
• Foursquare
• LinkedIn
• Digg
• McGraw-Hill Education
• Vermont Public Radio
https://www.w3resource.com/mongodb/nosql.php
BIG DATA & CLOUD COMPUTING
• Foundational Models
• Algorithms and Programming Techniques
• Analytics and Metrics
• Representation Formats for Multimedia Big Data