unit 4 part 4
– A local transaction accesses data in the single site at which the transaction
was initiated.
– A global transaction either accesses data in a site different from the one at
which the transaction was initiated or accesses data in several different sites.
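The distinction between the two kinds of transaction can be sketched as a small check. This is only an illustration: the function name `classify_transaction` and the site labels `S1`, `S2`, `S3` are assumptions, not part of the original material.

```python
def classify_transaction(origin_site: str, accessed_sites: set[str]) -> str:
    """Return 'local' if every access stays at the origin site, else 'global'."""
    # A transaction is local only when it touches no site other than its origin.
    return "local" if accessed_sites <= {origin_site} else "global"


print(classify_transaction("S1", {"S1"}))        # local
print(classify_transaction("S1", {"S2"}))        # global: a different site
print(classify_transaction("S1", {"S1", "S3"}))  # global: several sites
```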
Distributed Data Storage
• Assume relational data model
• Replication
– System maintains several identical copies or
replicas of data, stored in different sites
• Fragmentation
– Relation is partitioned into several fragments
stored in distinct sites
• Replication and fragmentation can be
combined
– Relation is partitioned into several fragments;
the system maintains several identical replicas of
each fragment.
Data Replication
• A relation or fragment of a relation is said
to be replicated if it is stored redundantly
in two or more sites.
• Full replication of a relation is the case
where the relation is stored at all sites.
• Partial replication is the case where fragments
are replicated at different sites, but not every
site contains all the fragments.
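These definitions can be checked mechanically against a placement map. The sketch below is a minimal illustration; the site names, fragment names, and the helper `replication_kind` are hypothetical.

```python
ALL_SITES = {"S1", "S2", "S3"}  # assumed set of sites in the system


def is_replicated(sites: set[str]) -> bool:
    """A relation or fragment is replicated if it is stored at two or more sites."""
    return len(sites) >= 2


def replication_kind(placement: dict[str, set[str]], all_sites: set[str]) -> str:
    """Classify a placement map of fragment name -> sites holding a copy."""
    if placement and all(sites == all_sites for sites in placement.values()):
        return "full"      # every fragment is stored at every site
    if any(is_replicated(sites) for sites in placement.values()):
        return "partial"   # some redundancy, but not everything everywhere
    return "none"


partial = {"account1": {"S1", "S2"}, "account2": {"S2", "S3"}}
full = {"account1": set(ALL_SITES), "account2": set(ALL_SITES)}

print(replication_kind(partial, ALL_SITES))  # partial
print(replication_kind(full, ALL_SITES))     # full
```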
Data Fragmentation
• Division of relation r into fragments r1,
r2, …, rn which contain sufficient
information to reconstruct relation r.
• Three types
– Horizontal Fragmentation
– Vertical Fragmentation
– Mixed Fragmentation
• Horizontal fragmentation: each tuple of r is
assigned to one or more fragments
– Each tuple of the global relation must be present
in at least one fragment
– Tuples are usually kept at the sites where they are
used most, to minimize data transfer
– The fragments may be
• Disjoint : A tuple appears in only one fragment
• Overlapping : A tuple appears in more than one
fragment
Horizontal Fragmentation of account Relation
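A minimal sketch of horizontal fragmentation, assuming the account relation has the attributes branch_name, account_number and balance, and that the fragmentation predicate is the branch name. The rows reuse the branch names, account numbers and balances from this section's example data; the helper `select` is hypothetical.

```python
account = [
    {"branch_name": "Hillside", "account_number": "A-305", "balance": 500},
    {"branch_name": "Hillside", "account_number": "A-226", "balance": 336},
    {"branch_name": "Valleyview", "account_number": "A-177", "balance": 205},
    {"branch_name": "Valleyview", "account_number": "A-402", "balance": 10000},
    {"branch_name": "Hillside", "account_number": "A-155", "balance": 62},
    {"branch_name": "Valleyview", "account_number": "A-408", "balance": 1123},
    {"branch_name": "Valleyview", "account_number": "A-639", "balance": 750},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Each fragment holds the tuples of one branch (disjoint fragments).
account1 = select(account, lambda t: t["branch_name"] == "Hillside")
account2 = select(account, lambda t: t["branch_name"] == "Valleyview")


def by_number(t):
    return t["account_number"]


# The union of the fragments reconstructs the original relation.
reconstructed = account1 + account2
assert sorted(reconstructed, key=by_number) == sorted(account, key=by_number)

# Disjointness: no account number appears in both fragments.
assert not {by_number(t) for t in account1} & {by_number(t) for t in account2}
```

Storing account1 at the Hillside site and account2 at the Valleyview site would keep each tuple where it is most likely to be used.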
Vertical Fragmentation of employee_info Relation
branch_name   customer_name   tuple_id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4
Hillside      Kahn            5
Valleyview    Kahn            6
Valleyview    Green           7
deposit1 = Π branch_name, customer_name, tuple_id (employee_info)
account_number   balance   tuple_id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
A-155            62        5
A-408            1123      6
A-639            750       7
deposit2 = Π account_number, balance, tuple_id (employee_info)
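Because both projections keep tuple_id, the original relation can be rebuilt by joining the two fragments on that attribute. The sketch below uses the rows from the two tables; the function name `join_on_tuple_id` and the tuple layout (tuple_id last) are illustrative assumptions.

```python
# Fragment rows, with tuple_id as the last field of each row.
deposit1 = [
    ("Hillside", "Lowman", 1), ("Hillside", "Camp", 2),
    ("Valleyview", "Camp", 3), ("Valleyview", "Kahn", 4),
    ("Hillside", "Kahn", 5), ("Valleyview", "Kahn", 6),
    ("Valleyview", "Green", 7),
]
deposit2 = [
    ("A-305", 500, 1), ("A-226", 336, 2), ("A-177", 205, 3),
    ("A-402", 10000, 4), ("A-155", 62, 5), ("A-408", 1123, 6),
    ("A-639", 750, 7),
]


def join_on_tuple_id(left, right):
    """Natural join of two fragments whose last field is the shared tuple_id."""
    right_by_id = {row[-1]: row[:-1] for row in right}
    return [
        l[:-1] + right_by_id[l[-1]] + (l[-1],)
        for l in left
        if l[-1] in right_by_id
    ]


# (branch_name, customer_name, account_number, balance, tuple_id)
employee_info = join_on_tuple_id(deposit1, deposit2)
print(employee_info[0])  # ('Hillside', 'Lowman', 'A-305', 500, 1)
```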
Mixed Fragmentation
Horizontal fragmentation followed by vertical
fragmentation
Vertical fragmentation followed by horizontal
fragmentation
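As a rough illustration of the first ordering (horizontal, then vertical), the sketch below first selects the tuples of one branch and then projects the result into two column groups. The helpers `select_branch` and `project` and the choice of column groups are assumptions made for the example.

```python
COLUMNS = ("branch_name", "customer_name", "account_number", "balance", "tuple_id")

employee_info = [
    ("Hillside", "Lowman", "A-305", 500, 1),
    ("Hillside", "Camp", "A-226", 336, 2),
    ("Valleyview", "Camp", "A-177", 205, 3),
    ("Valleyview", "Kahn", "A-402", 10000, 4),
]


def select_branch(rows, branch):
    """Horizontal step: keep only the tuples of one branch."""
    return [r for r in rows if r[COLUMNS.index("branch_name")] == branch]


def project(rows, names):
    """Vertical step: keep only the named columns, in the given order."""
    idx = [COLUMNS.index(n) for n in names]
    return [tuple(r[i] for i in idx) for r in rows]


hillside = select_branch(employee_info, "Hillside")
hillside_customers = project(hillside, ("branch_name", "customer_name", "tuple_id"))
hillside_balances = project(hillside, ("account_number", "balance", "tuple_id"))

print(hillside_customers)  # [('Hillside', 'Lowman', 1), ('Hillside', 'Camp', 2)]
print(hillside_balances)   # [('A-305', 500, 1), ('A-226', 336, 2)]
```

Swapping the two steps (project first, then select within each projection) gives the second ordering.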
Data Transparency
• Data transparency: degree to which a
system user may remain unaware of the
details of how and where the data items
are stored in a distributed system
• Consider transparency issues in relation
to:
– Fragmentation transparency
– Replication transparency
– Location transparency
• Fragmentation Transparency
– Users are unaware of how a relation has been
fragmented
• Replication Transparency
– Users are unaware of what data objects have been
replicated and where the replicas have been placed
• Location Transparency
– Users are unaware of the physical location of the data
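One way to picture the three forms of transparency is a read routine that consults a catalog on the user's behalf. Everything in this sketch is hypothetical (the catalog dictionaries, the site names, and the `read` function); it only shows that the caller names a relation and nothing else.

```python
# Hypothetical catalog: which fragments make up a relation, and where replicas live.
FRAGMENTS = {"account": ["account1", "account2"]}
REPLICAS = {"account1": ["S1", "S2"], "account2": ["S3"]}

# Hypothetical per-site storage, keyed by (site, fragment).
STORAGE = {
    ("S1", "account1"): [("Hillside", "A-305", 500)],
    ("S2", "account1"): [("Hillside", "A-305", 500)],
    ("S3", "account2"): [("Valleyview", "A-177", 205)],
}


def read(relation: str) -> list[tuple]:
    """Return all tuples of a relation; the caller never sees fragments or sites."""
    rows = []
    for fragment in FRAGMENTS[relation]:        # fragmentation is hidden
        site = REPLICAS[fragment][0]            # choice of replica is hidden
        rows.extend(STORAGE[(site, fragment)])  # physical location is hidden
    return rows


print(read("account"))
```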
Naming of Data Items - Criteria
1. Every data item must have a system-wide
unique name.
2. It should be possible to find the location of
data items efficiently.
3. It should be possible to change the location
of data items transparently.
4. Each site should be able to create new data
items autonomously.
Centralized Scheme - Name Server
• Structure:
– name server assigns all names
– each site maintains a record of local data items
– sites ask name server to locate non-local data items
• Advantages:
– satisfies naming criteria 1-3
• Disadvantages:
– does not satisfy naming criterion 4
– name server is a potential performance bottleneck
– name server is a single point of failure; if it crashes,
the sites cannot run
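The centralized scheme can be sketched as a single lookup table owned by the name server. The class and method names here are assumptions made for illustration, not an actual system's interface.

```python
class NameServer:
    """Minimal sketch of a central name server mapping names to sites."""

    def __init__(self):
        self._location = {}  # data item name -> site currently holding it

    def register(self, name: str, site: str) -> None:
        # Criterion 1: the server rejects duplicates, so names are unique system-wide.
        if name in self._location:
            raise ValueError(f"name already in use: {name}")
        self._location[name] = site

    def locate(self, name: str) -> str:
        # Criterion 2: one dictionary lookup finds the location.
        return self._location[name]

    def move(self, name: str, new_site: str) -> None:
        # Criterion 3: the name stays the same when the item changes site.
        self._location[name] = new_site


server = NameServer()
server.register("account", "S1")
print(server.locate("account"))  # S1
server.move("account", "S2")
print(server.locate("account"))  # S2
```

Every `register` call has to go through the one server, which is why sites cannot create data items autonomously (criterion 4) and why the server becomes both a bottleneck and a single point of failure.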
Use of Aliases
• Alternative to centralized scheme: each site prefixes its
own site identifier to any name that it generates, e.g.,
site17.account.
– Fulfills having a unique identifier, and avoids problems
associated with central control.
– However, fails to achieve network transparency.
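A short sketch of site-prefixed naming follows. The helper names are hypothetical, and the alias table at the end is an assumption about how the aliases named in the heading might hide the prefix; the text itself does not spell that mechanism out.

```python
def make_global_name(site_id: str, local_name: str) -> str:
    """Each site builds unique names on its own by prefixing its identifier."""
    return f"{site_id}.{local_name}"


def site_of(global_name: str) -> str:
    """The location is readable from the name, so transparency is lost."""
    return global_name.split(".", 1)[0]


name = make_global_name("site17", "account")
print(name)           # site17.account
print(site_of(name))  # site17

# Assumed alias table: users refer to the short alias, and the mapping
# supplies the site-prefixed name behind the scenes.
aliases = {"account": name}


def resolve(alias: str) -> str:
    """Translate an alias to its full name; unknown names pass through unchanged."""
    return aliases.get(alias, alias)


print(resolve("account"))  # site17.account
```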