Advanced Database Chapter 6 Distributed database
Chapter 6:
Distributed Databases and Client-Server
Architectures
Outline
Distributed Database Concepts
Data Fragmentation, Replication and Allocation
Types of Distributed Database Systems
Query Processing in Distributed Databases
Concurrency Control and Recovery
Client-Server Architecture
Distributed Database Concepts
A distributed database (DDB) is a collection of
multiple logically related databases distributed over a
computer network.
A transaction can be executed by multiple
Distributed Database System
Advantages
Management of distributed data with different
levels of transparency:
The physical placement of data (files, relations,
etc.) is hidden from the user (distribution
transparency).
Replication transparency:
Copies of data can be stored at multiple sites
for better availability, without the user needing
to know that copies exist.
Fragmentation transparency:
A relation can be fragmented horizontally (each
fragment is a subset of its tuples) or vertically
(each fragment is a subset of its columns), without
the user needing to know about the fragments.
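The two fragmentation styles can be sketched in a few lines of Python. This is only an illustration, not a DBMS feature: the relation is modelled as a list of dicts, and the names `horizontal_fragment`, `vertical_fragment` and the sample `employee` rows are hypothetical.

```python
# Hypothetical sample relation: each dict is one tuple (row).
employee = [
    {"ssn": 1, "name": "Abebe", "dno": 5},
    {"ssn": 2, "name": "Sara", "dno": 4},
    {"ssn": 3, "name": "Kebede", "dno": 5},
]

def horizontal_fragment(relation, predicate):
    """Keep whole tuples that satisfy the predicate (a subset of rows)."""
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation, columns):
    """Keep only the named columns of every tuple (a subset of columns)."""
    return [{col: row[col] for col in columns} for row in relation]

# Rows of department 5 only; all columns are kept.
dept5 = horizontal_fragment(employee, lambda r: r["dno"] == 5)

# All rows, but only the key and the name columns.
names = vertical_fragment(employee, ["ssn", "name"])
```

Keeping the key column (`ssn` here) in a vertical fragment is what allows the fragments to be joined back into the original relation.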
Increased availability:
Availability is the probability that the system is
continuously available (usable or accessible)
during a time interval.
A distributed database system has multiple nodes
(computers) and if one fails then others are
available to do the job.
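As a rough illustration of why extra nodes help, suppose each node is available with probability p and nodes fail independently. Both are assumptions made for this sketch; the slides do not state a failure model. The chance that at least one of n nodes is available is then 1 - (1 - p)^n. The function name is hypothetical.

```python
def availability_any(p: float, n: int) -> float:
    """Probability that at least one of n independent nodes is up.

    Assumes every node is up with the same probability p and that
    failures are independent, which real systems only approximate.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 - (1.0 - p) ** n

# Availability grows quickly as nodes are added.
for n in (1, 2, 3):
    print(n, round(availability_any(0.9, n), 4))
```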
Improved performance:
A distributed DBMS fragments the database to
keep data closer to where it is needed most
This reduces data management overhead (access
and modification time) significantly
Easier expansion (scalability):
Refers to expansion of the system in terms of
adding more data, increasing database sizes or
adding more processors
Data Fragmentation, Replication
and Allocation
Data Fragmentation
Split a relation into logically related and correct parts.
Vertical fragmentation
It is a subset of a relation which is created by a
Mixed (Hybrid) fragmentation
A combination of vertical fragmentation and
horizontal fragmentation.
This is achieved by applying SELECT and PROJECT
operations together.
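A mixed fragment can be read as a selection followed by a projection. The sketch below assumes a relation stored as a list of dicts; `mixed_fragment` and the sample data are hypothetical and only illustrate the idea.

```python
def mixed_fragment(relation, predicate, columns):
    """SELECT rows matching the predicate, then PROJECT the given columns."""
    selected = [row for row in relation if predicate(row)]      # horizontal step
    return [{c: row[c] for c in columns} for row in selected]   # vertical step

# Hypothetical sample relation.
employee = [
    {"ssn": 1, "name": "Abebe", "salary": 4000, "dno": 5},
    {"ssn": 2, "name": "Sara", "salary": 5200, "dno": 4},
    {"ssn": 3, "name": "Kebede", "salary": 3100, "dno": 5},
]

# Key and name of department-5 employees only.
frag = mixed_fragment(employee, lambda r: r["dno"] == 5, ["ssn", "name"])
```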
Data Replication and Allocation
Data replication refers to the distribution of the whole
or part of the data to a number of sites.
Useful in improving availability of data & Improve
The disadvantage of full replication is that it can slow
Types of Distributed Database Systems
Homogeneous
All sites of the database system
[Figure: homogeneous system. Labels: Site 1, Site 5; Oracle; Windows; Unix]
Heterogeneous
Federated: Each site may run different database
[Figure: heterogeneous system. Labels: Network; Object Oriented DBMS; Relational; Site 2, Site 3; Linux]
The heterogeneity present in federated database
systems (FDBSs) may arise from several sources:
Differences in data models:
Relational, object-oriented, hierarchical, network,
etc.
Differences in constraints:
Each site may have its own data access and
processing constraints.
Differences in query language:
Even with the same data model, the languages and
their versions may vary. SQL has multiple versions.
Query Processing in Distributed
Databases
Issues
Cost of transferring data (files and results) over the
network.
This cost is usually high. So, some optimization is
necessary.
Example: Suppose we have the Employee relation at site 1
and Department relation at Site 2
Employee at site 1. 10,000 rows. Row size = 100 bytes.
This means, table size = 10^6 bytes (1,000,000 bytes).
Department at Site 2. 100 rows. Row size = 35 bytes.
This means, table size = 3,500 bytes.
Query Q : For each employee, retrieve employee name and
Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)
Result
If every employee is related to a department, the result
[Figure: Department relation; Site 2 and Site 3]
Strategies (Available options):
1. Transfer Employee and Department to site 3.
Total transfer bytes = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3.
Total transfer size = 1,000,000 + 400,000 = 1,400,000
bytes.
3. Transfer Department relation to site 1, execute the join at
site 1, and send the result to site 3
Total bytes transferred = 3500 + 400,000 = 403,500
bytes.
Optimization criteria: minimizing data transfer.
Preferred strategy: strategy 3.
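The comparison above can be reproduced with a short calculation. This is a sketch of the arithmetic only: the relation sizes and the 400,000-byte result size are the figures used in the strategies above, and the variable names are hypothetical.

```python
EMPLOYEE_BYTES = 10_000 * 100    # 10,000 rows x 100 bytes = 1,000,000
DEPARTMENT_BYTES = 100 * 35      # 100 rows x 35 bytes = 3,500
RESULT_BYTES = 400_000           # join result size used in the strategies

# Bytes sent over the network by each strategy (result needed at site 3).
strategies = {
    "1: ship both relations to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
    "2: ship Employee to site 2, result to site 3": EMPLOYEE_BYTES + RESULT_BYTES,
    "3: ship Department to site 1, result to site 3": DEPARTMENT_BYTES + RESULT_BYTES,
}

# Pick the strategy that moves the fewest bytes.
best = min(strategies, key=strategies.get)

for name, cost in strategies.items():
    print(f"{name}: {cost:,} bytes")
print("cheapest:", best)
```

Only network transfer is counted here; local processing cost at each site is ignored, matching the optimization criterion stated above.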
Now suppose the result site is 2.
Possible strategies :
1. Transfer Employee relation to site 2, execute the
query and present the result to the user at site 2
Total transfer size = 1,000,000 bytes for Q.
2. Transfer Department relation to site 1, execute join
at site 1 and send the result back to site 2
Total transfer size for Q:
3500 + 400,000 = 403,500 bytes
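Under the same criterion (fewest bytes transferred), the two options for result site 2 can be compared directly. A small sketch with hypothetical names; the sizes are those used in the example.

```python
def transfer_cost(*shipments: int) -> int:
    """Total bytes moved over the network for one strategy."""
    return sum(shipments)

EMPLOYEE_BYTES = 1_000_000
DEPARTMENT_BYTES = 3_500
RESULT_BYTES = 400_000

# Strategy 1: ship Employee to site 2 and join there (result is already local).
ship_employee = transfer_cost(EMPLOYEE_BYTES)

# Strategy 2: ship Department to site 1, join there, ship the result back.
ship_department = transfer_cost(DEPARTMENT_BYTES, RESULT_BYTES)

preferred = "strategy 2" if ship_department < ship_employee else "strategy 1"
print(ship_employee, ship_department, preferred)
```

With these sizes the second option moves fewer bytes, even though it sends data in both directions.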
Concurrency Control and Recovery
Distributed Databases encounter a number of
concurrency control and recovery problems which are
not present in centralized databases.
Some of these problems are listed below:
Dealing with multiple copies of data items
Distributed commit
Distributed deadlock
Details
Dealing with multiple copies of data items:
The concurrency control must maintain global
consistency.
Likewise, the recovery mechanism must recover all
copies and maintain consistency after recovery.
Failure of individual sites:
Database availability must not be affected by the
failure of one or two sites, and the recovery scheme
must recover failed sites before they are made available for use.
Communication link failure:
This failure may create network partition which
Distributed deadlock:
Since transactions are processed at multiple sites,
Distributed Concurrency control
Primary site technique: A single site is assigned
[Figure: primary site with Site 1, Site 2, Site 3 and Site 5]
Transaction management:
Concurrency control and commit are managed by
this site
All locks are kept at that site and all requests for
Advantages:
It is an extension of centralized two-phase
locking and hence simple to implement and
manage
Data items are locked only at one site but they
can be accessed at any site at which they reside
Disadvantages:
All transaction management activities go to the
primary site, which is likely to overload it.
If the primary site fails, the entire system is
inaccessible
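The idea of keeping every lock at one site can be sketched as a single lock table. This is a toy model under stated assumptions, not a real DBMS component: the class and method names are hypothetical, there is no network messaging, and a conflicting request is simply refused instead of being queued.

```python
class PrimarySiteLockManager:
    """Toy lock table held at one (primary) site; no waiting or queues."""

    def __init__(self):
        # item -> (mode, set of transaction ids holding the lock)
        self._locks = {}

    def acquire(self, txn, item, mode):
        """Try to lock item in mode 'S' (shared) or 'X' (exclusive)."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)              # shared locks are compatible
            return True
        if holders == {txn}:
            # The sole holder may re-request or upgrade its own lock.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self._locks[item] = (new_mode, holders)
            return True
        return False                      # conflict: caller must wait or abort

    def release_all(self, txn):
        """Release every lock held by txn (done at commit or abort)."""
        for item in list(self._locks):
            _, holders = self._locks[item]
            holders.discard(txn)
            if not holders:
                del self._locks[item]


lm = PrimarySiteLockManager()
first = lm.acquire("T1", "x", "S")     # granted
second = lm.acquire("T2", "x", "S")    # granted, shared with T1
blocked = lm.acquire("T2", "x", "X")   # refused while T1 still holds x
lm.release_all("T1")
upgraded = lm.acquire("T2", "x", "X")  # granted, T2 is now the only holder
```

Because every request passes through this one object, the sketch also shows the drawbacks listed above: the primary site sees all lock traffic, and if it is lost, no transaction can obtain a lock.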
Primary site with backup site
To aid recovery, a backup site is designated which
as primary site.
Client-Server Database Architecture
It consists of clients running client software, a set of
servers which provide all database functionalities and
a reliable communication infrastructure.
[Figure: Server 1, Server 2, ..., Server n and Client 1, Client 2, Client 3, ..., Client n]
Server: is responsible for local data management at a
site, much like centralized DBMS software.
Client: is responsible for most of the distribution
function; it accesses data distribution information from
the DBMS catalog and processes all requests that
require access to more than one site.
The communication software manages communication
among clients and servers
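The division of work between client and server can be sketched as follows. Everything here is hypothetical and simplified: the catalog is a plain dict, servers are in-memory tables, and "communication" is an ordinary function call. The point is only that each server does local access while the client handles the multi-site part.

```python
# Hypothetical catalog: which server (site) stores which relation.
CATALOG = {"employee": "server1", "department": "server2"}

# Hypothetical local data held by each server.
SERVERS = {
    "server1": {"employee": [{"name": "Abebe", "dno": 5},
                             {"name": "Sara", "dno": 4}]},
    "server2": {"department": [{"dno": 5, "dname": "Research"},
                               {"dno": 4, "dname": "Admin"}]},
}

def server_scan(site, relation):
    """Server side: local data management only; returns one local relation."""
    return list(SERVERS[site][relation])

def client_join(left, right, key):
    """Client side: find sites in the catalog, fetch, then combine results."""
    left_rows = server_scan(CATALOG[left], left)      # sub-query to one site
    right_rows = server_scan(CATALOG[right], right)   # sub-query to another site
    # Assumes key values are unique in the right-hand relation.
    index = {row[key]: row for row in right_rows}
    return [{**l, **index[l[key]]} for l in left_rows if l[key] in index]

result = client_join("employee", "department", "dno")
```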
An SQL query is processed as follows:
Client parses a user query and decomposes it into a
to the client.
The client combines the results of sub queries and