unit 4 part 4

A distributed database system consists of multiple interrelated databases spread across a network, managed by a distributed database management system. It can be homogeneous, with identical software across sites, or heterogeneous, where different schemas and software are used. Key concepts include local and global transactions, data replication, fragmentation, and transparency, which are essential for efficient data management and user interaction in distributed environments.

Uploaded by

bhavyagu12
DISTRIBUTED DATABASES

Dr. Avdhesh Gupta
Professor
Department of Information Technology
AKGEC, Ghaziabad
Distributed Database System
• A distributed database is a collection of
multiple, logically interrelated databases
distributed over a computer network
• A distributed database management
system is a software system that permits
the management of a distributed database
• Data spread over multiple computers (also
referred to as sites or nodes).
• Network interconnects the computers
• Data shared by users on multiple computers
Homogeneous and Heterogeneous Distributed Databases
• In a homogeneous distributed database
– All sites have identical software
– Are aware of each other and agree to
cooperate in processing user requests.
– Appears to user as a single system
– Goal: provide a view of a single database,
hiding details of distribution
• In a heterogeneous distributed database
– Different sites may use different schemas and
software
• Difference in schema is a major problem for query
processing
• Difference in software is a major problem for
transaction processing
– Sites may not be aware of each other and may
provide only limited facilities for cooperation in
transaction processing
– Goal: integrate existing databases to provide useful functionality
Local and Global Transactions

– A local transaction accesses data in the single site at which the transaction
was initiated.
– A global transaction either accesses data in a site different from the one at
which the transaction was initiated or accesses data in several different sites.
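The distinction can be expressed as a small check. The following is a minimal sketch, assuming each transaction is described by its originating site and the set of sites whose data it accesses; the function name and the site labels are hypothetical.

```python
def classify_transaction(origin_site: str, accessed_sites: set[str]) -> str:
    """Classify a transaction as 'local' or 'global'.

    A transaction is local when every site it accesses is the site
    at which it was initiated; otherwise it is global.
    """
    if accessed_sites <= {origin_site}:
        return "local"
    return "global"


# Hypothetical site labels, for illustration only.
kind_a = classify_transaction("S1", {"S1"})        # touches only its own site
kind_b = classify_transaction("S1", {"S2"})        # touches a different site
kind_c = classify_transaction("S1", {"S1", "S2"})  # touches several sites
```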
Distributed Data Storage
• Assume relational data model
• Replication
– System maintains several identical copies or
replicas of data, stored in different sites
• Fragmentation
– Relation is partitioned into several fragments
stored in distinct sites
• Replication and fragmentation can be
combined
– Relation is partitioned into several fragments:
system maintains several identical replicas of
each such fragment.
Data Replication
• A relation or fragment of a relation is said
to be replicated if it is stored redundantly
in two or more sites.
• Full replication of a relation is the case
where the relation is stored at all sites.
• Partial replication is where fragments are
replicated at different sites, but not every site
contains all the fragments
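As an illustration of these definitions, the sketch below labels each relation or fragment by how widely it is stored. It assumes a placement map from names to the set of sites holding a copy; the function name, labels, and site names are hypothetical.

```python
def replication_kind(placement: dict[str, set[str]], all_sites: set[str]) -> dict[str, str]:
    """Label each relation or fragment by how widely it is stored.

    'not replicated' : stored at fewer than two sites
    'full'           : stored at every site
    'replicated'     : stored at two or more sites, but not all
    """
    kinds = {}
    for name, sites in placement.items():
        if len(sites) < 2:
            kinds[name] = "not replicated"
        elif sites >= all_sites:
            kinds[name] = "full"
        else:
            kinds[name] = "replicated"
    return kinds


# Hypothetical placement of three data items over three sites.
sites = {"S1", "S2", "S3"}
placement = {
    "account1": {"S1", "S2", "S3"},
    "account2": {"S1", "S3"},
    "branch": {"S2"},
}
kinds = replication_kind(placement, sites)
```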
Data Fragmentation
• Division of relation r into fragments r1,
r2, …, rn which contain sufficient
information to reconstruct relation r.
• Three types
– Horizontal Fragmentation
– Vertical Fragmentation
– Mixed Fragmentation
• Horizontal fragmentation: each tuple of r is
assigned to one or more fragments
– Each tuple of the global relation must be present
in at least one fragment
– Usually tuples are kept at sites where they may be
used most to minimize data transfer
– The fragments may be
• Disjoint : A tuple appears in only one fragment
• Overlapping : A tuple appears in more than one
fragment
Horizontal Fragmentation of account Relation

branch_name   account_number   balance
Hillside      A-305            500
Hillside      A-226            336
Hillside      A-155            62

account1 = σ branch_name=“Hillside” (account)

branch_name   account_number   balance
Valleyview    A-177            205
Valleyview    A-402            10000
Valleyview    A-408            1123
Valleyview    A-639            750

account2 = σ branch_name=“Valleyview” (account)
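The two selections can be reproduced in a few lines. This is a minimal sketch using the account tuples shown in the slide; the helper name select_branch is hypothetical, and the union at the end illustrates how disjoint horizontal fragments reconstruct the original relation.

```python
# The account relation as (branch_name, account_number, balance) tuples.
account = [
    ("Hillside", "A-305", 500),
    ("Hillside", "A-226", 336),
    ("Hillside", "A-155", 62),
    ("Valleyview", "A-177", 205),
    ("Valleyview", "A-402", 10000),
    ("Valleyview", "A-408", 1123),
    ("Valleyview", "A-639", 750),
]


def select_branch(relation, branch_name):
    """Selection: keep only the tuples whose branch_name matches."""
    return [row for row in relation if row[0] == branch_name]


account1 = select_branch(account, "Hillside")
account2 = select_branch(account, "Valleyview")

# Reconstruction: the union of the fragments gives back the relation.
reconstructed = set(account1) | set(account2)
```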


• Vertical fragmentation: the schema for
relation r is split into several smaller schemas
(fragmentation on the basis of attributes)
– Each attribute must be present in at least one
fragment
– All schemas must contain a common primary key
to ensure the lossless-join property.
– A special attribute, the tuple-id attribute may be
added to each schema to serve as a candidate key.
Vertical Fragmentation of employee_info Relation
branch_name   customer_name   tuple_id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4
Hillside      Kahn            5
Valleyview    Kahn            6
Valleyview    Green           7

deposit1 = Π branch_name, customer_name, tuple_id (employee_info)

account_number   balance   tuple_id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
A-155            62        5
A-408            1123      6
A-639            750       7

deposit2 = Π account_number, balance, tuple_id (employee_info)
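The lossless-join property can be checked directly on this example. The sketch below assumes the unfragmented rows are obtained by pairing the two fragments on tuple_id (the slide shows only the fragments); the helpers project and join_on are hypothetical names.

```python
COLUMNS = ("branch_name", "customer_name", "account_number", "balance", "tuple_id")
ROWS = [
    ("Hillside", "Lowman", "A-305", 500, 1),
    ("Hillside", "Camp", "A-226", 336, 2),
    ("Valleyview", "Camp", "A-177", 205, 3),
    ("Valleyview", "Kahn", "A-402", 10000, 4),
    ("Hillside", "Kahn", "A-155", 62, 5),
    ("Valleyview", "Kahn", "A-408", 1123, 6),
    ("Valleyview", "Green", "A-639", 750, 7),
]
employee_info = [dict(zip(COLUMNS, row)) for row in ROWS]


def project(relation, attributes):
    """Projection: keep only the named attributes of each row.

    Duplicate elimination is not needed here because tuple_id
    makes every projected row distinct.
    """
    return [{a: row[a] for a in attributes} for row in relation]


def join_on(left, right, key):
    """Join two fragments on a shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


deposit1 = project(employee_info, ["branch_name", "customer_name", "tuple_id"])
deposit2 = project(employee_info, ["account_number", "balance", "tuple_id"])

# Joining the fragments on tuple_id reconstructs the original rows.
rebuilt = join_on(deposit1, deposit2, "tuple_id")
```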
Mixed Fragmentation
• Horizontal fragmentation followed by vertical
fragmentation
• Vertical fragmentation followed by horizontal
fragmentation
Data Transparency
• Data transparency: Degree to which
system user may remain unaware of the
details of how and where the data items
are stored in a distributed system
• Consider transparency issues in relation
to:
– Fragmentation transparency
– Replication transparency
– Location transparency
• Fragmentation Transparency
– Users are unaware of how a relation has been
fragmented
• Replication Transparency
– Users are unaware of what data objects have been
replicated and where the replicas have been placed
• Location Transparency
– Users are unaware of the physical location of the data
Naming of Data Items - Criteria
1. Every data item must have a system-wide
unique name.
2. It should be possible to find the location of
data items efficiently.
3. It should be possible to change the location
of data items transparently.
4. Each site should be able to create new data
items autonomously.
Centralized Scheme - Name Server
• Structure:
– name server assigns all names
– each site maintains a record of local data items
– sites ask name server to locate non-local data items
• Advantages:
– satisfies naming criteria 1-3
• Disadvantages:
– does not satisfy naming criterion 4
– name server is a potential performance bottleneck
– name server is a single point of failure: if it crashes, sites
cannot locate non-local data items
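The centralized scheme can be sketched as a single lookup table. This is an illustrative sketch only, assuming an in-memory dictionary stands in for the name server; the class and method names are hypothetical.

```python
class NameServer:
    """Central registry mapping each data item name to its current site."""

    def __init__(self):
        self._locations = {}

    def register(self, name: str, site: str) -> None:
        # Criterion 1: the server rejects duplicates, so names are unique system-wide.
        if name in self._locations:
            raise ValueError(f"name already in use: {name}")
        self._locations[name] = site

    def locate(self, name: str) -> str:
        # Criterion 2: a single lookup finds the location.
        return self._locations[name]

    def move(self, name: str, new_site: str) -> None:
        # Criterion 3: the location changes while the name stays the same.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_site


server = NameServer()
server.register("account", "site17")
before = server.locate("account")
server.move("account", "site4")
after = server.locate("account")
```

Every register call goes through the one server, which is why criterion 4 (autonomous creation at each site) is not met in this scheme.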
Use of Aliases
• Alternative to centralized scheme: each site prefixes its
own site identifier to any name that it generates, e.g.,
site17.account.
– Fulfills having a unique identifier, and avoids problems
associated with central control.
– However, fails to achieve network transparency.

• Solution: create a set of aliases for data items; store
the mapping of aliases to the real names at each site.
• Users use the alias names
• The user can be unaware of the physical location of a
data item, and is unaffected if the data item is moved
from one site to another.
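The alias scheme can be sketched as follows. This is a minimal illustration, assuming each site keeps its own alias table in memory; make_name, SiteCatalog, and the site identifiers are hypothetical.

```python
def make_name(site_id: str, local_name: str) -> str:
    """Build a system-wide unique name by prefixing the creating site's identifier."""
    return f"{site_id}.{local_name}"


class SiteCatalog:
    """Per-site mapping from user-visible aliases to real (site-prefixed) names."""

    def __init__(self):
        self._aliases = {}

    def define_alias(self, alias: str, real_name: str) -> None:
        self._aliases[alias] = real_name

    def resolve(self, alias: str) -> str:
        return self._aliases[alias]


catalog = SiteCatalog()
catalog.define_alias("account", make_name("site17", "account"))
first = catalog.resolve("account")

# If the data item moves, only the mapping changes; users keep the same alias.
catalog.define_alias("account", make_name("site4", "account"))
second = catalog.resolve("account")
```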
Distributed Transactions
• Transaction may access data at several sites.
• Each site has a local transaction manager responsible for:
– Maintaining a log for recovery purposes
– Participating in coordinating the concurrent execution of the
transactions executing at that site.
• Each site has a transaction coordinator, which is
responsible for:
– Starting the execution of transactions that originate at the site.
– Distributing subtransactions to appropriate sites for execution.
– Coordinating the termination of each transaction that originates
at the site, which may result in the transaction being committed
at all sites or aborted at all sites.
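The division of work between the two components can be sketched as below. This is a simplified all-or-nothing illustration and not a complete commit protocol: it ignores site and network failures, and the rule that a balance may not go negative is a hypothetical stand-in for any reason a subtransaction might fail. All class and method names are assumptions.

```python
class LocalTransactionManager:
    """Per-site manager: keeps a log and stages updates until told to commit."""

    def __init__(self, site: str, data: dict[str, int]):
        self.site = site
        self.data = dict(data)
        self.log = []          # log kept for recovery purposes
        self._pending = {}     # staged balance changes per transaction

    def execute(self, txn_id: str, deltas: dict[str, int]) -> bool:
        """Stage the changes; report False if any balance would go negative."""
        self.log.append((txn_id, "start"))
        if any(self.data.get(k, 0) + d < 0 for k, d in deltas.items()):
            return False
        self._pending[txn_id] = dict(deltas)
        return True

    def commit(self, txn_id: str) -> None:
        for key, delta in self._pending.pop(txn_id, {}).items():
            self.data[key] = self.data.get(key, 0) + delta
        self.log.append((txn_id, "commit"))

    def abort(self, txn_id: str) -> None:
        self._pending.pop(txn_id, None)
        self.log.append((txn_id, "abort"))


class TransactionCoordinator:
    """Distributes subtransactions, then commits at all sites or aborts at all sites."""

    def __init__(self, managers: dict[str, LocalTransactionManager]):
        self.managers = managers

    def run(self, txn_id: str, subtransactions: dict[str, dict[str, int]]) -> str:
        involved = [self.managers[site] for site in subtransactions]
        ok = all(
            self.managers[site].execute(txn_id, deltas)
            for site, deltas in subtransactions.items()
        )
        for manager in involved:
            if ok:
                manager.commit(txn_id)
            else:
                manager.abort(txn_id)
        return "committed" if ok else "aborted"


s1 = LocalTransactionManager("S1", {"A-305": 500})
s2 = LocalTransactionManager("S2", {"A-177": 205})
coordinator = TransactionCoordinator({"S1": s1, "S2": s2})

# Transfer 100 from A-305 (site S1) to A-177 (site S2).
outcome_ok = coordinator.run("T1", {"S1": {"A-305": -100}, "S2": {"A-177": 100}})
# Attempt to transfer 1000, which S1 cannot cover, so every site aborts.
outcome_bad = coordinator.run("T2", {"S2": {"A-177": 1000}, "S1": {"A-305": -1000}})
```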