Advanced Database Chapter 6 Distributed database

Chapter 6 discusses distributed databases and client-server architectures, covering concepts such as data fragmentation, replication, and allocation. It highlights the advantages of distributed databases, including increased availability and improved performance, while also addressing challenges like concurrency control and recovery. The chapter concludes with an overview of client-server architecture, detailing the roles of clients and servers in managing data distribution and query processing.

Uploaded by leulz3000
© All Rights Reserved

Advanced Database

Chapter 6: Distributed Databases and Client-Server Architectures

Outline
• Distributed Database Concepts
• Data Fragmentation, Replication and Allocation
• Types of Distributed Database Systems
• Query Processing in Distributed Databases
• Concurrency Control and Recovery
• Client-Server Architecture

Distributed Database Concepts
• A distributed database (DDB) is a collection of multiple logically related databases distributed over a computer network.
• A transaction can be executed by multiple networked computers in a unified manner.

Distributed Database System
• Advantages
  - Management of distributed data with different levels of transparency.
  - Distribution transparency: the physical placement of data (files, relations, etc.) is not known to the user.

Distributed Database System (cont…)
• Example:
  - The EMPLOYEE, PROJECT and WORKS_ON tables may be fragmented horizontally and stored, with possible replication, at different sites.

Distributed Database System (cont…)
• Advantages (cont…)
  - Replication transparency: copies of data can be stored at multiple sites for better availability, without the user being aware of the copies.
  - Fragmentation transparency: a relation can be fragmented horizontally (a subset of its tuples) or vertically (a subset of its columns), without the user being aware of the fragments.

Distributed Database System (cont…)
• Advantages (cont…)
  - Increased availability
    - Availability is the probability that the system is continuously available (usable or accessible) during a time interval.
    - A distributed database system has multiple nodes (computers); if one fails, the others are available to do the job.
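The availability argument can be made concrete with a small sketch. It assumes that each node is up independently with the same probability p and that any single node can serve a request (as under full replication). Neither assumption comes from the slides, and the function name is made up for illustration.

```python
def replicated_availability(p: float, n: int) -> float:
    """Probability that at least one of n nodes is up.

    Assumes each node is available with probability p, independently
    of the others, and that any single node can do the job.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The system is unavailable only if all n nodes are down at once.
    return 1.0 - (1.0 - p) ** n


for nodes in (1, 2, 3):
    print(nodes, round(replicated_availability(0.9, nodes), 4))
```

Under these assumptions, every extra node shrinks the probability that all nodes are down together by a factor of (1 - p).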

Distributed Database System (cont…)
• Other Advantages (cont…)
  - Improved performance
    - A distributed DBMS fragments the database to keep data closer to where it is needed most.
    - This significantly reduces data management overhead (access and modification time).
  - Easier expansion (scalability)
    - The system can be expanded by adding more data, increasing database sizes, or adding more processors.

Data Fragmentation, Replication and Allocation
• Data Fragmentation
  - Split a relation into logically related and correct parts.
  - A relation can be fragmented in three ways:
    - Horizontal fragmentation
    - Vertical fragmentation
    - Mixed (hybrid) fragmentation
• Horizontal fragmentation
  - A horizontal fragment is a subset of the tuples of a relation: those tuples that satisfy a selection condition.
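Horizontal fragmentation can be sketched as a selection over an in-memory relation. The rows below are invented for illustration, and `horizontal_fragment` is a hypothetical helper rather than any DBMS API. The attribute names follow the Employee schema used later in the chapter.

```python
# Hypothetical EMPLOYEE rows; the values are invented for illustration.
EMPLOYEE = [
    {"SSN": "111", "Lname": "Smith", "Salary": 30000, "Dno": 5},
    {"SSN": "222", "Lname": "Wong", "Salary": 40000, "Dno": 5},
    {"SSN": "333", "Lname": "Zelaya", "Salary": 25000, "Dno": 4},
]


def horizontal_fragment(relation, condition):
    """Return the tuples of `relation` that satisfy `condition` (a selection)."""
    return [row for row in relation if condition(row)]


emp_d5 = horizontal_fragment(EMPLOYEE, lambda r: r["Dno"] == 5)
emp_d4 = horizontal_fragment(EMPLOYEE, lambda r: r["Dno"] == 4)

# The original relation is recovered as the union of its horizontal fragments.
reconstructed = emp_d5 + emp_d4
print(sorted(r["SSN"] for r in reconstructed))
```

Because the two conditions here are disjoint and together cover every row, the union of the fragments contains each tuple exactly once.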

Data Fragmentation, Replication and Allocation (cont…)
• Vertical fragmentation
  - A vertical fragment is a subset of a relation created from a subset of its columns, so it contains only the values of the selected columns.
  - Because no selection condition is used to create a vertical fragment, each fragment must include the primary key attribute of the parent relation (for example, Employee).
  - In this way all vertical fragments of a relation are connected.
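The role of the primary key in vertical fragments can be illustrated with a short sketch. It assumes SSN is the primary key of Employee. The rows and the helper names `vertical_fragment` and `rejoin` are hypothetical.

```python
# Hypothetical EMPLOYEE rows; SSN is assumed to be the primary key.
EMPLOYEE = [
    {"SSN": "111", "Fname": "John", "Lname": "Smith", "Salary": 30000, "Dno": 5},
    {"SSN": "222", "Fname": "Franklin", "Lname": "Wong", "Salary": 40000, "Dno": 5},
]


def vertical_fragment(relation, key, columns):
    """Project `relation` onto `columns`, always keeping the primary key."""
    wanted = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in wanted} for row in relation]


def rejoin(left, right, key):
    """Reconnect two vertical fragments by joining them on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


names = vertical_fragment(EMPLOYEE, "SSN", ["Fname", "Lname"])
pay = vertical_fragment(EMPLOYEE, "SSN", ["Salary", "Dno"])
print(rejoin(names, pay, "SSN") == EMPLOYEE)
```

Without the shared key in both fragments there would be no way to tell which salary belongs to which name.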

Data Fragmentation, Replication and Allocation (cont…)
• Mixed (hybrid) fragmentation
  - A combination of vertical and horizontal fragmentation.
  - This is achieved by SELECT-PROJECT operations.
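A mixed fragment can be sketched as a PROJECT applied to the result of a SELECT. The relation contents and the helper names `select` and `project` are assumptions made for this illustration.

```python
def select(relation, condition):
    """SELECT: keep the tuples that satisfy `condition`."""
    return [row for row in relation if condition(row)]


def project(relation, columns):
    """PROJECT: keep only the listed columns of every tuple."""
    return [{c: row[c] for c in columns} for row in relation]


# Hypothetical EMPLOYEE rows; the values are invented for illustration.
EMPLOYEE = [
    {"SSN": "111", "Lname": "Smith", "Salary": 30000, "Dno": 5},
    {"SSN": "222", "Lname": "Wong", "Salary": 40000, "Dno": 5},
    {"SSN": "333", "Lname": "Zelaya", "Salary": 25000, "Dno": 4},
]

# Mixed fragment: last names of department-5 employees, with the key SSN kept.
mixed = project(select(EMPLOYEE, lambda r: r["Dno"] == 5), ["SSN", "Lname"])
print(mixed)
```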

Data Replication and Allocation
• Data replication refers to the distribution of the whole or part of the data to a number of sites.
  - It improves the availability of data and the performance of global queries, since the result of such a query can be obtained from any one site.
  - In full replication, the entire database is replicated at every site; in partial replication, some selected part is replicated to some of the sites.

Data Replication and Allocation (cont…)
• Data replication and allocation
  - The disadvantage of full replication is that it can slow down update operations, since a single logical update must be performed on every copy of the database to keep the copies consistent.
  - Each fragment must be assigned to a particular site in the distributed system. This process is called data distribution (or data allocation).
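The update cost of replication can be seen in a toy model where one logical write is applied to every copy. The class `ReplicatedItem` and the site names are hypothetical. Real systems also need concurrency control and failure handling, which this sketch leaves out.

```python
class ReplicatedItem:
    """One logical data item with a copy at each of several sites."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}

    def read(self, site):
        # A read can be served by any single copy.
        return self.copies[site]

    def write(self, value):
        # One logical update becomes one physical update per copy.
        for site in self.copies:
            self.copies[site] = value
        return len(self.copies)  # number of physical writes performed


full = ReplicatedItem(["site1", "site2", "site3", "site4", "site5"], 100)
partial = ReplicatedItem(["site1", "site3"], 100)
print(full.write(120), partial.write(120))
```

The fully replicated item needs five physical writes per logical update, while the partially replicated one needs two.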

Types of Distributed Database Systems
• Homogeneous
  - All sites of the database system have an identical setup, i.e., the same database system software.
[Figure: five sites (Site 1 to Site 5) connected by a communications network; every site runs Oracle, on Windows, Unix, or Linux.]

Types of Distributed Database Systems
• Heterogeneous
  - Federated: each site may run a different database system, but data access is managed through a single conceptual schema.
[Figure: five sites (Site 1 to Site 5) on Unix, Windows, and Linux, connected by a communications network; the sites run object-oriented, relational, hierarchical, and network DBMSs.]

Types of Distributed Database Systems
• The heterogeneity present in federated database systems (FDBSs) may arise from several sources:
  - Differences in data models: relational, object-oriented, hierarchical, network, etc.
  - Differences in constraints: each site may have its own data access and processing constraints.
  - Differences in query languages: even with the same data model, the languages and their versions may vary. SQL has multiple versions.

Query Processing in Distributed Databases
• Issues
  - Cost of transferring data (files and results) over the network.
  - This cost is usually high, so some optimization is necessary.
• Example: suppose the Employee relation is at site 1 and the Department relation is at site 2.
  - Employee at site 1: 10,000 rows, row size = 100 bytes, so table size = 10,000 * 100 = 10^6 bytes.
  - Department at site 2: 100 rows, row size = 35 bytes, so table size = 100 * 35 = 3,500 bytes.

Query Processing in Distributed Databases (cont…)
• Issues (cont…)
  - Query Q: for each employee, retrieve the employee name and the name of the department where the employee works.
  - Q: π_{Fname, Lname, Dname} (Employee ⋈_{Dno = Dnumber} Department)
  - Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)
  - Department(Dname, Dnumber, Mgrssn, Mgrstartdate)

Query Processing in Distributed Databases (cont…)
• Result
  - If every employee is related to a department, the result of this query will have 10,000 tuples.
  - Suppose that each result tuple is 40 bytes long. The query is submitted at site 3 and the result is sent to this site: query result size = 40 * 10,000 = 400,000 bytes.
  - Suppose that the Employee and Department relations are not present at site 3.
[Figure: Employee at Site 1, Department at Site 2, query submitted at Site 3.]

Query Processing in Distributed Databases (cont…)
• Strategies (available options):
  1. Transfer Employee and Department to site 3.
     Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes.
  2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3.
     Total bytes transferred = 1,000,000 + 400,000 = 1,400,000 bytes.
  3. Transfer Department to site 1, execute the join at site 1, and send the result to site 3.
     Total bytes transferred = 3,500 + 400,000 = 403,500 bytes.
• Optimization criterion: minimizing data transfer.
• Preferred strategy: strategy 3.

Query Processing in Distributed Databases (cont…)
• Now suppose the result site is site 2.
• Possible strategies:
  1. Transfer the Employee relation to site 2, execute the query, and present the result to the user at site 2.
     Total bytes transferred = 1,000,000 bytes.
  2. Transfer the Department relation to site 1, execute the join at site 1, and send the result back to site 2.
     Total bytes transferred = 3,500 + 400,000 = 403,500 bytes.

Concurrency Control and Recovery
• Distributed databases encounter a number of concurrency control and recovery problems that are not present in centralized databases.
• Some of these problems are listed below:
  - Dealing with multiple copies of data items
  - Failure of individual sites
  - Communication link failure
  - Distributed commit
  - Distributed deadlock

Concurrency Control and Recovery (cont…)
• Details
  - Dealing with multiple copies of data items:
    - Concurrency control must maintain global consistency across the copies.
    - Likewise, the recovery mechanism must recover all copies and maintain consistency after recovery.
  - Failure of individual sites:
    - Database availability must not be affected by the failure of one or two sites, and the recovery scheme must recover failed sites before they are made available for use.

Concurrency Control and Recovery (cont…)
• Details (cont…)
  - Communication link failure:
    - This failure may create a network partition, which would affect database availability even though all database sites may be running.
  - Distributed commit:
    - Problems can arise when a transaction accesses databases stored on multiple sites and some sites fail during the commit process. The two-phase commit protocol is used to deal with this problem.
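The idea behind two-phase commit can be sketched with an in-memory coordinator and participants. This is a simplified illustration, not a full protocol: the class and function names are hypothetical, a failed site is modelled simply as a participant that votes no, and logging, timeouts, and coordinator failure are not modelled.

```python
class Participant:
    """A site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if this site is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every site votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    decision = "commit" if all(votes) else "abort"
    for p in participants:                        # phase 2: broadcast decision
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


ok_sites = [Participant("site1"), Participant("site2")]
bad_sites = [Participant("site1"), Participant("site2", can_commit=False)]
print(two_phase_commit(ok_sites), two_phase_commit(bad_sites))
```

The point of the two phases is that no site commits until every site has promised it can, so the transaction ends in the same state everywhere.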

Concurrency Control and Recovery (cont…)
• Details (cont…)
  - Distributed deadlock:
    - Since transactions are processed at multiple sites, two or more sites may become involved in a deadlock. This must be resolved in a distributed manner.
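The slides do not say which detection technique is intended. One common approach, assumed here, is to combine the wait-for graphs reported by each site and look for a cycle in the combined graph. The function names and the transactions T1 and T2 are hypothetical.

```python
def has_cycle(graph):
    """Return True if the directed graph {node: set(successors)} has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True          # back edge: a cycle exists
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


def merge_wait_for(*local_graphs):
    """Union the wait-for edges reported by each site into one global graph."""
    merged = {}
    for g in local_graphs:
        for txn, waits_for in g.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged


site1 = {"T1": {"T2"}}   # at site 1, T1 waits for a lock held by T2
site2 = {"T2": {"T1"}}   # at site 2, T2 waits for a lock held by T1
print(has_cycle(site1), has_cycle(site2), has_cycle(merge_wait_for(site1, site2)))
```

Neither site sees a cycle in its own graph. The deadlock only becomes visible once the edges from both sites are combined, which is why a purely local check is not enough.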

Concurrency Control and Recovery (cont…)
• Distributed concurrency control
  - Primary site technique: a single site is designated as the primary site, which serves as the coordinator for transaction management.
[Figure: Sites 1 to 5 connected by a communications network, with one site marked as the primary site.]

Concurrency Control and Recovery
• Transaction management:
  - Concurrency control and commit are managed by the primary site.
  - All locks are kept at that site, and all requests for locking or unlocking are sent there.
  - In two-phase locking, this site manages the locking and releasing of data items.
  - If all transactions follow the two-phase policy at all sites, then serializability is guaranteed.
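A central lock table at the primary site might look like the sketch below. It is a deliberately small model: the class name is hypothetical, only exclusive locks are handled, the messages between sites are not modelled, and all locks of a transaction are released together at the end, which is one simple way to respect the two-phase rule.

```python
class PrimarySiteLockManager:
    """Central lock table kept at the primary site (exclusive locks only)."""

    def __init__(self):
        self.owner = {}  # data item -> transaction currently holding the lock

    def lock(self, txn, item):
        # Grant the lock if the item is free or already held by this transaction.
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False  # the caller must wait and retry

    def unlock_all(self, txn):
        # Release every lock held by txn at once, e.g. at commit or abort.
        for item in [i for i, t in self.owner.items() if t == txn]:
            del self.owner[item]


lm = PrimarySiteLockManager()
print(lm.lock("T1", "x"), lm.lock("T2", "x"))  # T2 is refused while T1 holds x
lm.unlock_all("T1")
print(lm.lock("T2", "x"))                      # now T2 can acquire x
```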

Concurrency Control and Recovery (cont…)
• Advantages:
  - It is an extension of centralized two-phase locking and hence simple to implement and manage.
  - Data items are locked at only one site, but they can be accessed at any site at which they reside.
• Disadvantages:
  - All transaction management activities go to the primary site, which is likely to overload that site.
  - If the primary site fails, the entire system is inaccessible.

Concurrency Control and Recovery (cont…)
• Primary site with backup site
  - To aid recovery, a backup site is designated, which behaves as a shadow of the primary site.
  - In case of primary site failure, the backup site can act as the primary site.

Client-Server Database Architecture
• It consists of clients running client software, a set of servers that provide all database functionality, and a reliable communication infrastructure.
[Figure: Servers 1 to n connected to Clients 1 to n.]

Client-Server Database Architecture
• Server: responsible for local data management at a site, much like centralized DBMS software.
• Client: responsible for most of the distribution functions; it accesses data distribution information from the DBMS catalog and processes all requests that require access to more than one site.
• The communication software manages communication among clients and servers.

Client-Server Database Architecture
• Processing of a SQL query proceeds as follows:
  - The client parses a user query and decomposes it into a number of independent sub-queries, each of which is sent to the appropriate server.
  - Each server processes its sub-query and sends the result to the client.
  - The client combines the results of the sub-queries and produces the final result.
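The three steps above can be sketched with in-memory stand-ins for the client and the servers. Everything here is hypothetical: the class names, the dictionary that plays the role of the DBMS catalog, and the data. Real SQL parsing and network communication are left out, and the "sub-query" is simply the same selection run on each server's fragment.

```python
class Server:
    """Holds one fragment and answers sub-queries on it locally."""

    def __init__(self, rows):
        self.rows = rows

    def run(self, predicate):
        return [row for row in self.rows if predicate(row)]


class Client:
    """Sends one sub-query per server and merges the partial results."""

    def __init__(self, catalog):
        self.catalog = catalog  # site name -> Server; stands in for the DBMS catalog

    def query(self, predicate):
        # Each server evaluates the sub-query on its own fragment.
        partial_results = [srv.run(predicate) for srv in self.catalog.values()]
        # The client combines the partial results into the final result.
        return [row for part in partial_results for row in part]


client = Client({
    "site1": Server([{"Lname": "Smith", "Dno": 5}, {"Lname": "Zelaya", "Dno": 4}]),
    "site2": Server([{"Lname": "Wong", "Dno": 5}]),
})
print(client.query(lambda r: r["Dno"] == 5))
```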

Thank you.
Questions?
