0% found this document useful (0 votes)
42 views25 pages

Distributed Databases: Benefits and Issues To Be Considered

Distributed databases allow data to be stored across multiple computers or sites connected through a network. This provides benefits like improved performance through parallel processing, high availability if some sites fail, and the ability to scale incrementally. However, distributed databases also introduce challenges around ensuring consistency, optimizing queries that require data from multiple sites, and managing transactions that update data on different systems. Effective solutions involve strategies like data fragmentation, replication, and optimization of query plans to minimize network communication costs.

Uploaded by

vipinvisvanath
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views25 pages

Distributed Databases: Benefits and Issues To Be Considered

Distributed databases allow data to be stored across multiple computers or sites connected through a network. This provides benefits like improved performance through parallel processing, high availability if some sites fail, and the ability to scale incrementally. However, distributed databases also introduce challenges around ensuring consistency, optimizing queries that require data from multiple sites, and managing transactions that update data on different systems. Effective solutions involve strategies like data fragmentation, replication, and optimization of query plans to minimize network communication costs.

Uploaded by

vipinvisvanath
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 25

Distributed Databases

Benefits and issues to be considered

1
What is a Distributed Database?
 The DB is stored on several computers,
interconnected through network.
 Each site can process local transactions
involving data only on that site.
 Some sites may get involved in global
transactions, which access data from several
sites.

2
Examples
 University of the West Indies
 Bursary - Financial information
 Personnel - Staff information
 Registry - Student information
 Multinational
 HQ in Kingston
 Manufacturing in Trinidad
 Warehouse in Miami
 European HQ in London
 Each site keeps data on local employees, as well as
data relevant to the operation of the site.
3
Distributed Database Architecture

ES1 ES2 ES3

GCS

LCS1 LCS2 LCS3

LIS1 LIS2 LIS3

4
Data Dictionaries
 A data dictionary in a non-distributed system
contains so-called meta-information,
information about the data.
 Examples: structure of tables, data types of
attributes.
 In distributed DBMS’s, data dictionary must
also say where the fragments can be found.

5
Why Distributed Databases I
 More natural for representing many real world
organizations.
 Local autonomy
 Local organization is fully responsible for accuracy
and safety of its own portion of DB.
 Improved performance
 Speed up of query processing
 Possibility of parallel processing
 Local processing
 If data must be accessed often at a site, store it locally.

6
Why Distributed Databases II

 Improved availability/reliability
 DB can continue to function even if some
subsystems are down if we replicate data or
hardware.
 Security
 Avoid destruction of DB by replicating vital data.
 Incremental growth
 Increasing size of DB
 Increasing operations
 Example: Include R&D facility.
7
What do we want from a distributed
database?
 No reliance on central master site.
 Would be a bottleneck.
 If down, whole system is down.

 Continuous Operation
 Enforces reliability and availability

 Distributed Query Processing


 A query may require accessing data at a no. of sites.
 Same data may be available at a no. of sites.

 Distributed Transaction Management


 Same copy of a data item may be a a number of sites.

 Transparency
 Local Autonomy

8
Transparency
 Allow users to use the system as if it is
not distributed.
 Replication transparency
 Fragmentation transparency
 Hardware and OS transparency
 Language and DBMS transparency
 Applies mostly to multidatabase systems.

9
Autonomy
 All operations at a site are controlled by that site.
 Types of autonomy
 Design
 Individual DBs can use data models and transaction

management techniques that they prefer.


 Communication
 Individual DBs can decide which information they want

to make accessible to other sites


 Execution
 Individual DBs can decide how to execute

transactions submitted to them.

10
Issues and Problems
 Distributed database design
 How should DB and applications be placed across
sites.
 How should data dictionary be placed across sites.
 Replication and partitioning

 Distributed query processing


 How to break down a query into series of data
manipulation operations.
 Reliability
 If site becomes inaccessible, how do you ensure that
DBs at other sites remain consistent and up-to-date?

11
Replication and Fragmentation
 Replication:
 Should we maintain several identical copies of
(part of) the DB, with each replica stored at a
different site?
 Fragmentation
 Should we partition a table into separate parts,
each stored at a different site?
 If we do, how should we partition the table?

12
Replication
 Advantages:
 Increased availability
 If one site goes down, the data may be available from
elsewhere.
 Increased parallelism
 We may send parts of the same query to different sites.
 Disadvantages
 Increased storage space
 Increased overhead on update.
 Update needs to be copied to all sites containing the
relevant replica.
13
Fragmentation
 Why fragment? Why not simply store
different complete tables at different sites?
 Applications usually access only a subset of a
relation.
 Can keep tuples at the site most frequently
used.
 In fragmentation, make sure that we do not
lose information.

14
Types of Fragmentation
 Horizontal fragmentation
 Assign different tuples to each fragment (e.g.,
through a selection in the sense of relational
algebra).
 Vertical fragmentation
 Assign different attributes to each fragment.
 Must ensure re-constructability.
 Mixed Fragmentation
 Mixture of the two.

15
Problems in Fragmentation
 Increased response time if an application
needs to access more than one fragment.
 Especially in vertical fragmentation, ensuring
data integrity may become more difficult.
 Allocation: where to place the various
fragments, and whether to replicate it.

16
Allocation
 Allocation concerns where to store each
fragment and whether to replicate it.
 Possibilities:
 Partitioned DB
 Fragmentation, no replication
 Partially replicated DB
 Fragmentation, with each fragment stored at
more than one site.
 Fully replicated DB
 DB is replicated in full at each site.
17
Evaluation of Partitioned DB
 Query processing is moderately difficult, and
can be time consuming.
 Updating of DB is easy.
 Directory management is moderately difficult
 Reliability is very low.
 Requires least amount of space.

18
Evaluation of Partially Replicated
DBs
 Query processing is moderately difficult, and
can be time consuming.
 Updating of DB is moderately difficult.
 Directory management is moderately difficult.
 Reliability is high.
 Moderate amount of space.

19
Evaluation of Fully Replicated
DBs
 Query processing is easy, and can be done
quickly.
 Updating of DB is easy but time-consuming.
 Directory management is easy but time
consuming.
 Reliability is very high.
 Requires a lot of space.

20
Query Optimization
 Each DBMS has a query processor.
 The query processor takes a high level query
(e.g. SQL) and translates it into a set of
relational algebraic expressions.
 Since this can be done in a number of
different ways, query processor must choose
the best one. This is called query
optimization.

21
Example of query optimization
 Consider tables:
 Empl(Eno, Ename, Title)
 Job(Eno, JobNo, Resp, Dur)
 Query:
SELECT Ename
FROM Empl, Job
WHERE Empl.Eno = Job.Eno
AND Resp = ‘Manager’;
 Two strategies
ename(Resp = ‘Manager’ E J)
ename(E Resp = ‘Manager’(J))
22
Distributed Query
Processing/Optimization
 Query processing/optimization is more
difficult in distributed DBMS
 Require both global optimization and local
optimization
 A query may require data from more than one
site, and communication has a cost.
 It may be possible to perform some sub-
queries in parallel.
 Cost no longer dependent on only number of
tuples accessed.
23
Example
Site 1: E1 = ‘E3’(E)

Site 2: E2 = ‘E3’(E)

Site 3: J1 = ‘E3’(J)

Site 4: J2 = ‘E3’(J)

Results are expected at site 5.


24
Different Possibilities
 Strategy 1:
 Send everything to Site 5 and perform the original
query there.
 Strategy 2:
 Do selections at sites 3 and 4
 Send results to sites 1 and 2
 Perform join and projections
 Send result to site 5.
 Strategy 2 might seem better but if communication
to site 5 is cheap/fast, then 1 may be better.
25

You might also like