
ADBMS

A distributed database is a collection of interconnected databases spread across various locations, allowing for independent management while appearing as a single database to users. The Distributed Database Management System (DDBMS) facilitates the management of these databases, ensuring data transparency and synchronization across sites. While distributed databases offer advantages such as modular development, reliability, and lower communication costs, they also present challenges like complex software requirements and data integrity issues.

Uploaded by

krishna226633pt

ADVANCED DATABASE MANAGEMENT SYSTEM

Distributed Databases
❖ Definition of Distributed databases
A distributed database is a collection of multiple interconnected databases, which are
spread physically across various locations that communicate via a computer network.

In other words, instead of keeping all data in a single, centralised database, the data
is kept in multiple databases that are interrelated. A distributed database has several
sites (site1, site2, …), where each site is associated with one database and the sites are
logically connected to each other via a network or the Internet. Here, a site means a
computer or system.

Features

➢ Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
➢ Data is physically stored across multiple sites. Data in each site can be managed
by a DBMS independent of the other sites.
➢ The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
➢ A distributed database is not a loosely connected file system.
➢ A distributed database incorporates transaction processing, but it is not
synonymous with a transaction processing system.
❖ Distributed Database Management System
(DDBMS)
A distributed database management system (DDBMS) is a centralized
software system that manages a distributed database in a manner as if it
were all stored in a single location.

It is a software system that manages a distributed database while making the
distribution transparent to the user.

That is, the DDBMS manages the distributed databases in such a way that the end
user need not be aware that the data is distributed. The user should feel as if the
data is being accessed from a single database.

Features

● It is used to create, retrieve, update and delete distributed databases.
● It synchronizes the database periodically and provides access
mechanisms by virtue of which the distribution becomes transparent
to the users.
● It ensures that the data modified at any site is universally updated.
● It is used in application areas where large volumes of data are processed
and accessed by numerous users simultaneously.
● It is designed for heterogeneous database platforms.
● It maintains confidentiality and data integrity of the databases.
Advantages of Distributed Databases

Following are the advantages of distributed databases over centralized databases.

Modular Development − If the system needs to be expanded to new locations or new units, in centralized database systems, the action
requires substantial efforts and disruption in the existing functioning. However, in distributed databases, the work simply requires adding new
computers and local data to the new site and finally connecting them to the distributed system, with no interruption in current functions.

More Reliable − In case of database failures, the total system of centralized databases comes to a halt. However, in distributed systems, when
a component fails, the system continues to function, possibly at reduced performance. Hence a DDBMS is more reliable.

Better Response − If data is distributed in an efficient manner, then user requests can be met from local data itself, thus providing faster
response. On the other hand, in centralized systems, all queries have to pass through the central computer for processing, which increases the
response time.

Lower Communication Cost − In distributed database systems, if data is located locally where it is mostly used, then the communication
costs for data manipulation can be minimized. This is not feasible in centralized systems.

Easier Expansion − Expanding a centralized database usually means taking the whole system down for some time. In a
distributed database, a site can be added or removed without taking the whole system down.

Reduced Operating Cost − Because the centralized data is divided among different sites, it is easily accessible where it is used, which reduces operating cost.

Fast Data Processing − Because the data is divided among a number of sites, processing of the data becomes faster.
Disadvantages of Distributed Databases

Following are some of the adversities associated with distributed databases.


Need for complex and expensive software − DDBMS demands complex and often expensive software to provide data transparency and
co-ordination across several sites.
Processing overhead − Even simple operations may require a large number of communications and additional calculations to provide
uniformity in data across the sites.
Data integrity − The need to update data at multiple sites poses problems of data integrity.
Overheads for improper data distribution − Responsiveness of queries is largely dependent upon proper data distribution. Improper data
distribution often leads to very slow response to user requests.
Types of Distributed Databases:

Distributed databases can be broadly classified into homogeneous and
heterogeneous distributed database environments, each with further
subdivisions, as shown in the following illustration.
★ Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating systems.
Its properties are −
● The sites use very similar software.
● The sites use identical DBMS or DBMS from the same vendor.
● Each site is aware of all other sites and cooperates with other sites to process user requests.
● The database is accessed through a single interface as if it is a single database.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −
❖ Autonomous − Each database is independent and functions on its own. The databases are integrated by a controlling application and
use message passing to share data updates.
❖ Non-autonomous − Data is distributed across the homogeneous nodes and a central or master DBMS coordinates data
updates across the sites.

★ Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems, DBMS products and data models.
Its properties are −
● Different sites use dissimilar schemas and software.
● The system may be composed of a variety of DBMSs like relational, network, hierarchical or object oriented.
● Query processing is complex due to dissimilar schemas.
● Transaction processing is complex due to dissimilar software.
● A site may not be aware of other sites and so there is limited co-operation in processing user requests.
Types of Heterogeneous Distributed Databases
❖ Federated − The heterogeneous database systems are independent in nature and integrated together so that they function as
a single database system.
❖ Un-federated − The database systems employ a central coordinating module through which the databases are accessed.
❖ Distribution transparency
The three dimensions of distribution transparency are −
1) Location transparency 2) Fragmentation transparency 3) Replication transparency
➢ Location Transparency

Location transparency ensures that the user can query any table(s) or fragment(s) of a table as if they were stored locally at the user’s site. The fact that the table
or its fragments are stored at a remote site in the distributed database system should be completely hidden from the end user, as should the address of the remote site(s) and the access
mechanisms. To provide location transparency, the DDBMS should have access to an updated and accurate data dictionary and DDBMS
directory, which contain the details of the locations of data.

➢ Fragmentation Transparency

Fragmentation transparency enables users to query upon any table as if it were unfragmented. Thus, it hides the fact that the table the user is querying on is actually a
fragment or union of some fragments. It also conceals the fact that the fragments are located at diverse sites. This is somewhat similar to users of SQL views, where the user
may not know that they are using a view of a table instead of the table itself.

➢ Replication Transparency

Replication transparency ensures that the replication of databases is hidden from the users. It enables users to query a table as if only a single copy of the table
exists.
Replication transparency is associated with concurrency transparency and failure transparency. Whenever a user updates a data item, the update is reflected in all the copies
of the table. However, this operation should not be known to the user. This is concurrency transparency. Also, in case of failure of a site, the user can still proceed with his
queries using replicated copies without any knowledge of failure. This is failure transparency.

➢ Combination of Transparencies

In any distributed database system, the designer should ensure that all the stated transparencies are maintained to a considerable extent. The designer may choose
to fragment tables, replicate them and store them at different sites, all hidden from the end user. However, complete distribution transparency is a tough task and requires
considerable design effort.
❖ DDBMS Architecture
Some of the common architectural models are −

● Distributed database reference architecture


● Client - Server Architecture for DDBMS
● Peer - to - Peer Architecture for DDBMS
● Multi - DBMS Architecture

1) Distributed database reference architecture

❏ This architecture contains four views: the external view, the global conceptual view,
the local conceptual view and the local internal view.
❏ The external view is the view that is displayed to external users.
❏ The global conceptual view describes how the data is logically related across the
distributed databases, that is, how the databases relate to each other globally.
❏ The local conceptual view describes the database at a particular site.
❏ The local internal schema describes how the data is physically stored, for example in a
particular directory or file.
2) Client - Server Architecture for DDBMS
- This is a two-level architecture where the functionality is divided into
servers and clients.
- The server functions primarily encompass data management, query
processing, optimization and transaction management.
- Client functions include mainly user interface. However, they have
some functions like consistency checking and transaction management.
- The two different client - server architectures are

● Single Server Multiple Client
● Multiple Server Multiple Client (shown in the following
diagram)
3) Peer- to-Peer Architecture for DDBMS
- In these systems, each peer acts both as a client and a server for imparting
database services.
- The peers share their resources with other peers and coordinate their
activities.
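- This architecture generally has four levels of
schemas −
➢ Global Conceptual Schema − Depicts the global logical
view of data.
➢ Local Conceptual Schema − Depicts logical data
organization at each site.
➢ Local Internal Schema − Depicts physical data
organization at each site.
➢ External Schema − Depicts user view of data.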
4) Multi - DBMS Architectures
This is an integrated database system formed by a collection of two or more autonomous
database systems.

Multi-DBMS can be expressed through six levels of schemas −

➢ Multi-database View Level − Depicts multiple user views
comprising subsets of the integrated distributed
database.
➢ Multi-database Conceptual Level − Depicts the integrated
multi-database, comprising global logical
multi-database structure definitions.
➢ Multi-database Internal Level − Depicts the data distribution
across different sites and multi-database to local data
mapping.
➢ Local database View Level − Depicts public view of local data.
➢ Local database Conceptual Level − Depicts local data
organization at each site.
➢ Local database Internal Level − Depicts physical data
organization at each site.
❖ Design problem of distributed systems

Design problems are the problems that can arise when designing and using distributed databases. Some of these issues are listed here.
1) Distributed database design
There are two alternative design methods:
i) Partitioned database
Here the data is divided into different parts.
ii) Replicated database
Here the data is replicated, that is, copies of the data are kept at some sites (partial replication) or at all sites (full replication).

Fundamental design issues


While using distributed databases, we should ensure that the following problems do not arise.
i) Locality of reference
Locality of reference should not become a problem. The data should be available at the site where most of it, and the most important part of it, is needed. It should
not happen that data is missing exactly where it is used most.
ii) Reliability and availability
Data should be reliable. For example, if the value of a particular variable differs between two sites, users get inconsistent data, which should not happen. Data
should also be available at any time.
iii) Performance
Performance is related to query processing. It depends on the output obtained after a number of queries have been processed. The goal is
maximum performance with the minimum number of queries.
iv) Communication cost
The cost of communication between sites that share data must be low, because communication cost affects performance as
well as time.

Optimal design parameters


i) Database storage
In an optimal design of a distributed database, the database storage required must be minimal.
ii) Processing transactions running on it
The cost of the transactions (groups of operations) that run on the database must be low. That is, the queries being executed should be optimized.
iii) Message communication among sites
Message communication cost should be low at all the sites.
2) Distributed query processing
Query processor design algorithms
The distributed query processor designs the execution of a query so that the required data is manipulated as efficiently as possible and at
minimum cost. Because the data may be stored at multiple places, the query processor must fetch it from the most efficient database.
Factors to be considered
The query processor must consider the following two factors.
i) Distribution of data
The query processor must take into account how far away the required data is. The system needs to work out how to fetch the data while passing
through the minimum number of hops between sites.
ii) Lack of sufficient locally available information
A site may not have enough information locally to answer a query. It should not happen that very important data is unavailable at every site.
The main aim of query processing is to minimize the cost, which is given by the following formula. The goal is minimum data
transmission as well as minimum local processing.
Min{cost = data transmission + local processing}
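The cost formula can be illustrated with a small Python sketch. The three candidate plans and their cost figures are made-up values used only for illustration; a real query processor would estimate them from statistics.

```python
# Hypothetical candidate plans for one query; the numbers are made-up cost units.
plans = {
    "ship whole table to query site": {"data_transmission": 120.0, "local_processing": 15.0},
    "filter at remote site, ship result": {"data_transmission": 20.0, "local_processing": 30.0},
    "use local replica": {"data_transmission": 0.0, "local_processing": 45.0},
}

def total_cost(plan):
    """cost = data transmission + local processing"""
    return plan["data_transmission"] + plan["local_processing"]

def cheapest_plan(candidates):
    """Return the name of the plan with the minimum total cost."""
    return min(candidates, key=lambda name: total_cost(candidates[name]))

best = cheapest_plan(plans)
print(best, total_cost(plans[best]))
```

With these assumed figures, the plan that reads a local replica wins because it avoids data transmission entirely, even though its local processing cost is the highest.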

3) Distributed directory management


➢ A directory is like a diary: the database maintains a directory in which it stores information about all its data sets.
➢ It stores the location of the data, that is, where exactly a particular piece of data has been stored.
➢ The system needs the directory to find the location of particular data when a query is fired.
➢ The issues with the directory are whether to store it locally or globally, how exactly to store it, and how to update it.
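As a rough illustration, a directory can be thought of as a mapping from data sets to the sites that store them. The sketch below is a minimal in-memory model in Python; the fragment names, site names and the functions `locate` and `register` are hypothetical.

```python
# Hypothetical directory: fragment name -> list of sites that store it.
directory = {
    "STUDENT_cs": ["site1"],
    "STUDENT_fees": ["site2", "site3"],
}

def locate(fragment):
    """Return the sites holding a fragment; raises KeyError if it is unknown."""
    return directory[fragment]

def register(fragment, site):
    """Update the directory when a site starts storing a copy of a fragment."""
    sites = directory.setdefault(fragment, [])
    if site not in sites:
        sites.append(site)

register("STUDENT_cs", "site3")
print(locate("STUDENT_cs"))
```

Whether such a structure lives at one site (global) or is copied to every site (local) is exactly the design issue listed above: a single copy is easy to update but every lookup needs a network round trip, while local copies are fast to read but must all be kept up to date.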

4) Distributed Concurrency control


➢ Concurrency means multiple transactions are executing simultaneously.
➢ Suppose two transactions execute together and both operate on the same data set. Then data integrity must be maintained.
➢ This means the value of a particular data item must be the same at all sites, to avoid inconsistent data.
➢ The system must check whether integrity is maintained across all replicas. That is, the system should update all replicas whenever the value of a particular
data set is updated.
❖ Design strategies (top-down, bottom-up)

Top Down Approach

❏ A top down approach is used to create a new database. When we start
designing a new database with new tables and a new schema, we use the top
down approach.
❏ You model the objects at a logical level.
❏ Then you apply the objects to a physical database design.
❏ For example, a relational database would need the objects to be mapped
to tables.

Bottom up Approach

❏ A bottom up approach is used to migrate a database from one physical
database to another.
❏ Migrating from Oracle to SQL Server usually requires some changes, as
the column data types are not completely compatible.
❏ You would create tables based on the existing tables.
❏ Sometimes, you try to make a near exact copy, to minimize the
application coding changes.
❏ Other times, you alter the table structure, usually to normalize further or
to group columns together in a more logical way.
❖ Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table
are called fragments.

Fragmentation can be of three types: horizontal, vertical, and hybrid (combination of horizontal
and vertical).

Horizontal fragmentation can further be classified into two techniques: primary horizontal
fragmentation and derived horizontal fragmentation.

Fragmentation should be done in such a way that the original table can be reconstructed from the
fragments whenever required. This requirement is called “re-constructiveness.”
● Advantages of Fragmentation
Since data is stored close to the site of usage, efficiency of the database system is increased.
Local query optimization techniques are sufficient for most queries since data is locally available.
Since irrelevant data is not available at the sites, security and privacy of the database system can be maintained.

● Disadvantages of Fragmentation

When data from different fragments are required, the access times may be very high.
In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
Lack of back-up copies of data in different sites may render the database ineffective in case of failure of a site.
★ Vertical Fragmentation

In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to maintain re-constructiveness, each fragment
should contain the primary key field(s) of the table. Vertical fragmentation can be used to enforce privacy of data.

For example, let us consider that a University database keeps records of all registered students in a Student table having the following schema.

STUDENT

Regd_No Name Course Address Semester Fees Marks

Now, the fees details are maintained in the accounts section. In this
case, the designer will fragment the database as follows −

CREATE TABLE STD_FEES AS

SELECT Regd_No, Fees

FROM STUDENT;
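The re-constructiveness rule for vertical fragments can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The sample rows and the second fragment STD_INFO are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT,
                      Address TEXT, Semester INTEGER, Fees REAL, Marks REAL);
-- Made-up sample rows.
INSERT INTO STUDENT VALUES (1, 'Asha', 'Computer Science', 'Pune', 3, 50000, 81),
                           (2, 'Ravi', 'Physics', 'Mumbai', 1, 40000, 74);
-- Vertical fragments: each one keeps the primary key Regd_No.
CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT;
CREATE TABLE STD_INFO AS
    SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT;
""")

original = conn.execute(
    "SELECT Regd_No, Name, Course, Address, Semester, Fees, Marks "
    "FROM STUDENT ORDER BY Regd_No").fetchall()

# Reconstruction: join the fragments on the shared primary key.
rebuilt = conn.execute(
    "SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks "
    "FROM STD_INFO i JOIN STD_FEES f ON i.Regd_No = f.Regd_No "
    "ORDER BY i.Regd_No").fetchall()

print(rebuilt == original)
```

The join works only because both fragments carry Regd_No, which is why each vertical fragment should contain the primary key.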
★ Horizontal Fragmentation

Horizontal fragmentation groups the tuples of a table in accordance with the values of one or more fields.
Horizontal fragmentation should also conform to the rule of re-constructiveness. Each horizontal
fragment must have all columns of the original base table.

For example, in the student schema, if the details of all students of the
Computer Science course need to be maintained at the School of Computer
Science, then the designer will horizontally fragment the database as
follows −

CREATE TABLE COMP_STD AS

SELECT * FROM STUDENT WHERE Course = 'Computer Science';
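For horizontal fragments, reconstruction is a union of the fragments. The following sketch, again using Python's built-in sqlite3 module, assumes a shortened STUDENT schema, made-up rows and a complementary fragment OTHER_STD that holds every row not in COMP_STD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shortened schema and made-up rows, for illustration only.
CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT NOT NULL);
INSERT INTO STUDENT VALUES (1, 'Asha', 'Computer Science'),
                           (2, 'Ravi', 'Physics'),
                           (3, 'Meena', 'Computer Science');
-- Horizontal fragments: every fragment keeps all columns of the base table.
CREATE TABLE COMP_STD AS SELECT * FROM STUDENT WHERE Course = 'Computer Science';
CREATE TABLE OTHER_STD AS SELECT * FROM STUDENT WHERE Course <> 'Computer Science';
""")

original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()

# Reconstruction: the union of the fragments gives back the base table.
rebuilt = conn.execute(
    "SELECT * FROM COMP_STD UNION ALL SELECT * FROM OTHER_STD "
    "ORDER BY Regd_No").fetchall()

print(rebuilt == original)
```

The two predicates must cover every row exactly once. Here Course is declared NOT NULL so that no row falls outside both fragments.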

★ Hybrid Fragmentation

In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used.
This is the most flexible fragmentation technique since it generates fragments with minimal extraneous
information. However, reconstruction of the original table is often an expensive task.

Hybrid fragmentation can be done in two alternative ways −

At first, generate a set of horizontal fragments; then generate vertical fragments from one or
more of the horizontal fragments.
At first, generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.
❖ Data Replication

Data replication is the process of storing separate copies of the database at two or
more sites. It is a popular fault tolerance technique of distributed databases.

Advantages of Data Replication

Reliability − In case of failure of any site, the database system continues to
work since a copy is available at another site(s).
Reduction in Network Load − Since local copies of data are available,
query processing can be done with reduced network usage, particularly
during prime hours. Data updating can be done at non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query
processing and consequently quick response time.
Simpler Transactions − Transactions require fewer joins of tables
located at different sites and minimal coordination across the network. Thus,
they become simpler in nature.

Disadvantages of Data Replication

❖ Increased Storage Requirements − Maintaining multiple copies of data
is associated with increased storage costs. The storage space required
is in multiples of the storage required for a centralized system.
❖ Increased Cost and Complexity of Data Updating − Each time a data
item is updated, the update needs to be reflected in all the copies of the
data at the different sites. This requires complex synchronization
techniques and protocols.
❖ Undesirable Application – Database coupling − If complex update
mechanisms are not used, removing data inconsistency requires
complex co-ordination at application level. This results in undesirable
application – database coupling.
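The trade-off between availability for reads and the cost of updates can be seen in a small Python model. The sketch below uses a simple "read one copy, write all copies" rule, which is only one possible synchronization scheme and is an assumption here. The class `ReplicatedItem` and the site names are hypothetical.

```python
class ReplicatedItem:
    """One data item copied to several sites (hypothetical in-memory model)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.up = set(sites)  # sites that are currently available

    def fail(self, site):
        self.up.discard(site)

    def read(self):
        # Read one: any available copy can answer the query.
        for site, value in self.copies.items():
            if site in self.up:
                return value
        raise RuntimeError("no replica available")

    def write(self, value):
        # Write all: refuse the update unless every copy can be changed,
        # so that the copies never disagree.
        if self.up != set(self.copies):
            raise RuntimeError("cannot update all replicas")
        for site in self.copies:
            self.copies[site] = value

item = ReplicatedItem(["site1", "site2"], 10)
item.write(20)
item.fail("site1")
print(item.read())  # still answered by site2
```

Reads survive the failure of a site, which is the reliability advantage. Updates become harder, because under this rule a single failed site blocks every write.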
➢ Query Processing
- Query Processing
It includes translation of high level queries into low level
expressions that can be used at the physical level of the file
system, query optimization and actual execution of the query to
get the result.
- Query Optimization
It is a process in which multiple query execution plans for
satisfying a query are examined and the most efficient query plan is
identified for execution.
- Query Processing steps
The steps consist of parsing and translation, optimization, and execution of
the query.
- The parser and translator check the syntax of the query. They also
verify attribute and relation names.
- They give the internal representation of the query to the query
optimizer.
Query Tree
A query tree is used to represent relational algebra expressions.
1) It is a tree data structure.
2) Input relations are represented as leaf nodes.
3) Relational algebra operations are internal nodes.
SQL Query
SELECT Book_title, price FROM Book WHERE price > 50;
Relational algebra expression: σ price>50 (π book_title,price (Book))
Two equivalent query trees for this query:
σ price>50                  π book_title,price
    ↓                           ↓
π book_title,price          σ price>50
    ↓                           ↓
  Book                        Book
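A query tree can be modelled directly as a small data structure. The Python sketch below is a minimal illustration: the classes `Relation`, `Select` and `Project`, the function `evaluate` and the sample Book rows are all assumptions. It builds both orderings of selection and projection for the query above and checks that they return the same result.

```python
from dataclasses import dataclass

@dataclass
class Relation:          # leaf node: an input relation
    rows: list

@dataclass
class Select:            # internal node: sigma (selection)
    predicate: object
    child: object

@dataclass
class Project:           # internal node: pi (projection)
    columns: tuple
    child: object

def evaluate(node):
    """Evaluate a query tree bottom-up and return a list of row dicts."""
    if isinstance(node, Relation):
        return node.rows
    if isinstance(node, Select):
        return [row for row in evaluate(node.child) if node.predicate(row)]
    if isinstance(node, Project):
        return [{c: row[c] for c in node.columns} for row in evaluate(node.child)]
    raise TypeError(f"unknown node type: {type(node).__name__}")

# Made-up sample data.
book = Relation([
    {"book_title": "A", "price": 30, "author": "x"},
    {"book_title": "B", "price": 80, "author": "y"},
])

def is_expensive(row):
    return row["price"] > 50

select_on_top = Select(is_expensive, Project(("book_title", "price"), book))
project_on_top = Project(("book_title", "price"), Select(is_expensive, book))

print(evaluate(select_on_top) == evaluate(project_on_top))
```

Both trees give the same rows, but they do different amounts of work: selecting first means the projection only has to handle the rows that pass the filter. Choosing between such equivalent trees is the job of the query optimizer.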
Query evaluation plan
It is used to fully specify how to evaluate a query. Each operation in the query tree is annotated with instructions that specify the
algorithm or the index to be used to evaluate that operation.
Different evaluation plans for a given query can have different costs. It is the responsibility of the query optimizer to generate the least
costly plan.
The best query evaluation plan is finally submitted to the query evaluation engine for actual execution.

Query Blocks
Query submitted to the database system is first decomposed into query blocks.
A query block forms a basic unit that can be translated into relational algebra expression and optimized.
❖ Distributed Reliability Protocols
Recovery in distributed database
Recovery is more complicated than in a centralised system.
The failures related to distributed databases are:
- Loss of message
- Failure of site at which subtransaction is running
- Failure of communication link
→ The recovery system must ensure atomicity (all or none). The following protocols help to guarantee
recovery:
❖ Two phase commit protocol
Two phase commit protocol contains two phases. Below, CS is the coordinator site and PS is a participating site.
1) Voting phase
In this phase, the participating sites vote on whether they are ready to commit the transaction or not.
Steps of the voting phase are as follows:
i) [T, Prepare] − The CS sends this message to all participating sites to announce that it is ready to commit, and creates an entry
in its log file.
ii) [T, Ready] or [T, Not ready] − Each PS replies with one of these messages, depending on whether it is ready to commit or not, and keeps the entry in
its own log file.
2) Decision phase
In this phase, the coordinator site decides whether the transaction can be committed or has to be aborted.
i) A [T, Ready] message from all participating sites allows the CS to commit transaction T.
ii) At least one [T, Not ready] message makes the CS abort transaction T.
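The two phases can be sketched from the point of view of the coordinator site (CS). This is a simplified Python model: participants are plain functions that return a vote, and the names `collect_vote` and `two_phase_commit`, the use of `ConnectionError` to represent a failed site, and the tuple log format are all assumptions.

```python
def collect_vote(participant, transaction):
    """Ask one participating site for its vote."""
    try:
        return participant(transaction)
    except ConnectionError:
        # A site that fails before sending its vote counts as "not ready".
        return "not ready"

def two_phase_commit(transaction, participants, log):
    # Phase 1 (voting): log [T, prepare] and ask every participating site.
    log.append((transaction, "prepare"))
    votes = [collect_vote(p, transaction) for p in participants]

    # Phase 2 (decision): commit only if every site voted "ready".
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    log.append((transaction, decision))
    return decision

def ready_site(transaction):
    return "ready"

def failing_site(transaction):
    raise ConnectionError("site unreachable")

coordinator_log = []
print(two_phase_commit("T", [ready_site, ready_site], coordinator_log))
print(two_phase_commit("T2", [ready_site, failing_site], coordinator_log))
```

A real implementation would also send the decision to every participating site and force the log records to stable storage before each message. Both steps are left out here.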
- Failure of a participating site:
i) If the site fails before sending [T, Ready], the transaction is aborted.
ii) If the site fails after sending [T, Ready], the transaction proceeds in the normal way.
- Recovery process when a participating site restarts after failure:
The PS checks its log file.
i) The log contains [T, commit] → redo the transaction. The log contains [T, abort] → undo the transaction.
ii) The log contains [T, Ready] → the site contacts the CS to find out whether to commit or abort T.
iii) No such record in the log → undo T and abort the transaction.
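The recovery rules for a restarting participating site amount to a lookup in its log. The sketch below is a minimal Python version: the function `recover_participant`, the tuple log format and the `ask_coordinator` callback are hypothetical names.

```python
def recover_participant(transaction, log, ask_coordinator):
    """Decide what a restarted participating site does with a transaction.

    log is a list of (transaction, state) records written before the failure.
    ask_coordinator is called only when the site voted but saw no decision.
    """
    states = {state for (t, state) in log if t == transaction}
    if "commit" in states:
        return "redo"
    if "abort" in states:
        return "undo"
    if "ready" in states:
        # The site voted, so it must ask the coordinator for the outcome.
        return "redo" if ask_coordinator(transaction) == "commit" else "undo"
    # No record at all: the site never voted, so the transaction is undone.
    return "undo"

print(recover_participant("T", [("T", "ready")], lambda t: "commit"))
```

Only the [T, Ready] case needs a message to the coordinator. In the other cases the site can recover from its own log alone.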
- Failure of the coordinator site:
All the participating sites communicate with each other to determine the status of T.
i) Any site has [T, commit] in its log → commit T. Any site has [T, abort] in its log → abort T.
ii) Some site has no [T, Ready] in its log → that site has not voted, so abort T.
iii) No such record in any log → all participating sites wait for the CS to recover. Until then, all participating sites are blocked.
- Recovery process when the coordinator site restarts after failure:
The log file is checked.
i) The log contains [T, commit] → redo the commit. The log contains [T, abort] → undo or abort T and share this message with all PSs.
ii) The log contains only [T, Prepare] → abort T.



❖ Three phase commit protocol
- The three phase commit protocol is an extension of the two phase commit protocol that avoids blocking even if the coordinator site fails during
recovery.
- Conditions to avoid blocking:
i) No network partitioning
ii) At least one site is available
iii) At most ‘k’ sites fail, where k is a predetermined number.
- Messages of the three phases:
i) [T1, Prepare]
ii) [T1, Pre commit]
iii) [T1, commit]
- The coordinator site sends the [T1, Prepare] message and receives votes from the participating sites. Then it sends the [T1, Pre commit] message and, after
ensuring that k sites know about the decision to commit, sends the actual [T1, commit] message.
- If the coordinator site fails, a new coordinator site is chosen, which communicates with the remaining sites to check whether the old coordinator site had
decided to commit. This can be checked easily: it is enough that any one of the k sites has the [T, Pre commit] message.
- Blocking is avoided, but the overhead increases.
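The check made by a newly chosen coordinator can be sketched as follows. This is a simplified Python model, not a complete three phase commit: the function `new_coordinator_decision` and the log format are hypothetical, and network partitions are assumed not to occur.

```python
def new_coordinator_decision(transaction, surviving_logs):
    """Decide the outcome of a transaction after the old coordinator failed.

    surviving_logs holds one list of (transaction, state) records per
    reachable site.
    """
    states = {state for log in surviving_logs
              for (t, state) in log if t == transaction}
    if "abort" in states:
        return "abort"
    if "commit" in states or "pre commit" in states:
        # Some site knows the old coordinator had decided to commit.
        return "commit"
    # Nobody saw a pre commit, so the old coordinator cannot have committed.
    return "abort"

logs = [
    [("T1", "prepare")],
    [("T1", "prepare"), ("T1", "pre commit")],
]
print(new_coordinator_decision("T1", logs))
```

Because the pre commit record reaches k sites before any commit is sent, at least one surviving site will hold it as long as at most k sites fail. This is what lets the remaining sites decide without waiting for the failed coordinator.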
Parallel Database System
❖ Parallel Database Systems

➢ Companies need to handle huge amounts of data at high data transfer rates.
➢ Client-server and centralized systems are not efficient enough for this. The need to improve efficiency gave birth to the concept of parallel
databases.
➢ A parallel database system improves the performance of data processing by using multiple resources, such as multiple CPUs and disks,
in parallel.
➢ It also parallelizes many operations, such as data loading and query processing.

A parallel database is one in which multiple processors work in parallel on the database to provide its services.

A parallel database system seeks to improve performance through parallelization of various operations such as loading data, building
indexes and evaluating queries. Parallel systems improve processing and I/O speeds by using multiple CPUs and disks in parallel.

Need :
Multiple resources like CPUs and Disks are used in parallel. The operations are performed simultaneously, as opposed
to serial processing. A parallel server can allow access to a single database by users on multiple machines. It also
performs many parallelization operations like data loading, query processing, building indexes, and evaluating
queries.
Advantages :

Here, we will discuss the advantages of parallel databases.

1. Performance Improvement –
By connecting multiple resources like CPU and disks in parallel we can significantly increase the performance of the
system.

2. High availability –
In a parallel database, nodes have little contact with each other, so the failure of one node does not cause the failure of
the entire system. This amounts to significantly higher database availability.

3. Proper resource utilization –
Due to parallel execution, the CPUs are seldom idle, so the resources are properly utilized.

4. Increase Reliability –
When one site fails, the execution can continue with another available site that has a copy of the data, making the
system more reliable.
❖ Working of parallel database
The working of a parallel database can be summarized as follows −

➢ Step 1 − Parallel processing divides a large task into many
smaller tasks and executes the smaller tasks concurrently on
several CPUs, so that the task completes more quickly.

➢ Step 2 − The driving force behind parallel database systems is
the demand of applications that have to query extremely large
databases of the order of terabytes or that have to process a
large number of transactions per second.

➢ Step 3 − In parallel processing, many operations are
performed simultaneously as opposed to serial processing, in
which the computational steps are performed sequentially.
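The idea of splitting one large task into smaller tasks can be sketched in Python with the standard concurrent.futures module. The function `parallel_sum` and the worker count are assumptions. A thread pool is used to keep the example short, and because standard Python threads may not run CPU-bound work truly in parallel, the sketch shows the structure of the computation rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, workers=4):
    """Sum a list by splitting it into chunks that are summed concurrently."""
    # Ceiling division, so that every value lands in exactly one chunk.
    chunk = max(1, -(-len(values) // workers))
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]

    # Each smaller task (summing one chunk) is handed to a worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, parts))

    # The partial results are combined into the final answer.
    return sum(partial_sums)

print(parallel_sum(list(range(1, 101))))
```

A parallel database follows the same pattern for operations such as scans and aggregates: each processor works on its own part of the data and the partial results are merged at the end.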
Parallel Databases Architecture

1) Shared Memory System


● A shared memory system uses multiple processors which are attached to a global
shared memory via an intercommunication channel or communication bus.
● A shared memory system has a large amount of cache memory at each processor,
so referencing of the shared memory is avoided.
● If a processor performs a write operation to a memory location, cached copies of the
data at that location should be updated or removed.

Advantages of Shared memory system

● Data is easily accessible to any processor.


● One processor can send message to other efficiently.

Disadvantages of Shared memory system

● The waiting time of processors increases as the number of processors grows.
● The bandwidth of the shared bus or channel becomes a bottleneck.
2) Shared Disk System

● A shared disk system uses multiple processors which can all access multiple disks via an
intercommunication channel, and every processor has local memory.
● Each processor has its own memory, so the data sharing is efficient.
● Systems built around this architecture are called clusters.

Advantages of Shared Disk System

● Fault tolerance is achieved using a shared disk system.
Fault tolerance: If a processor or its memory fails, another processor can complete
the task. This is called fault tolerance.

Disadvantage of Shared Disk System

● Shared disk system has limited scalability as large amount of data travels through the
interconnection channel.
● If more processors are added the existing processors are slowed down.

Applications of Shared Disk System

Digital Equipment Corporation (DEC): DEC clusters running the Rdb relational database used the
shared disk system. Rdb is now owned by Oracle.
3) Shared Nothing Disk System

● Each processor in the shared nothing system has its own local memory and local
disk.
● Processors can communicate with each other through intercommunication channel.
● Any processor can act as a server to serve the data which is stored on local disk.

Advantages of Shared nothing disk system

● Any number of processors and disks can be connected, as per the requirement, in a shared
nothing disk system.
● A shared nothing disk system can support many processors, which makes the
system more scalable.

Disadvantages of Shared nothing disk system

● Data partitioning is required in a shared nothing disk system.
● The cost of communication for accessing a non-local disk is much higher.
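
The data partitioning mentioned above can be sketched as follows. This is a minimal illustration, assuming hash partitioning on an integer key; the function name `hash_partition` and the employee rows are hypothetical and not taken from any particular system.

```python
from collections import defaultdict


def hash_partition(rows, key, num_nodes):
    """Assign each row to a node by taking its integer key modulo the node count."""
    partitions = defaultdict(list)
    for row in rows:
        node = row[key] % num_nodes
        partitions[node].append(row)
    return dict(partitions)


# Hypothetical employee rows, spread over 3 shared-nothing nodes on "eid".
employees = [{"eid": i, "name": f"emp{i}"} for i in range(1, 8)]
partitions = hash_partition(employees, "eid", 3)

for node, rows in sorted(partitions.items()):
    print(node, [r["eid"] for r in rows])
```

A query on a single `eid` only needs the node that owns that key, while a query on any other attribute has to contact every node, which is where the communication cost comes from.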

Applications of Shared nothing disk system

● Teradata database machine.


● The Grace and Gamma research prototypes.
4) Hierarchical System or Non-Uniform Memory Architecture

● The hierarchical system is a hybrid of the shared memory system, shared disk system and shared nothing system.
● The hierarchical model is also known as Non-Uniform Memory Architecture (NUMA).
● In this system each group of processors has a local memory, but processors from other groups can also access that memory
in a cache-coherent manner.
● NUMA uses local and remote memory (memory from another group), hence accessing remote memory takes longer than accessing local memory.

Advantages of NUMA

● Improves the scalability of the system.


● The memory bottleneck (contention for a single shared memory) is reduced in this architecture.

Disadvantages of NUMA

The cost of the architecture is higher compared to other architectures.


Performance Measures
There are two main measures of performance of a database system, which are explained below −
Throughput − The number of tasks that can be completed in a given time interval. A system that processes a large number of small transactions can improve throughput by
processing many transactions in parallel.
Response time − The amount of time it takes to complete a single task from the time it is submitted. A system that processes large transactions can improve response time, as
well as throughput, by performing subtasks of each transaction in parallel.

Parallel query evaluation: Speed up and scale up

Parameters for Parallel Databases


Some parameters to judge the performance of Parallel Databases are:

1. Response time: It is the time taken to complete a single task from the time it is submitted.

2. Speed up in Parallel database:

● Speed up is the reduction in the time taken to complete a running task that is obtained by increasing the degree of parallelism (resources).
● Adding more resources leads to less time for solving the same problem.

n times more resources → n times speedup

● Ideally, the time required for a running task is inversely proportional to the number of resources.
● Formula:
Speed up = TS / TL
Where, TS = Time required to execute the task on the smaller (original) system & TL = Time required to execute the same task on the larger system with N times the resources
● Speed-up is linear if it equals N (the factor by which the resources were increased).
● Speed-up is sub-linear if speed-up is less than N.
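
As a worked illustration of the speed-up formula, the sketch below computes TS / TL and compares it with N. The timings (120 s on the original system, 40 s with 4 times the resources) are hypothetical numbers chosen only for the example.

```python
def speed_up(ts, tl):
    """Speed up = TS / TL for the same task on the small and the large system."""
    return ts / tl


def classify_speed_up(ts, tl, n):
    """Compare the measured speed-up with the resource factor n."""
    return "linear" if speed_up(ts, tl) >= n else "sub-linear"


# Hypothetical measurement: 120 s on 1 processor, 40 s on 4 processors.
measured = speed_up(120, 40)
kind = classify_speed_up(120, 40, 4)
print(measured, kind)  # 3.0 sub-linear
```

Here 4 times the resources give only a 3 times faster run, so the speed-up is sub-linear.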

3. Scale up in Parallel database:


- Scale-up is the ability to keep performance constant when the size of the problem
and the resources increase proportionally.
- Adding more resources solves a larger version of the problem in the
same time

n times more resources → n times larger problem solvable

Formula:
Let Q be a task and QN a task that is N times larger than Q
TS = Execution time of task Q on smaller machine MS
TL = Execution time of task QN on larger machine ML (N times larger than MS)

Scale Up = TS /TL
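
A small worked example of the scale-up ratio, under the usual reading that TS is measured for the original task on the small machine and TL for the N times larger task on the N times larger machine. The timings are hypothetical.

```python
def scale_up(ts, tl):
    """Scale up = TS / TL; a value of 1 means linear scale-up."""
    return ts / tl


# Hypothetical measurement: task Q takes 100 s on the small machine,
# the N-times-larger task takes 125 s on the N-times-larger machine.
ratio = scale_up(100, 125)
is_linear = ratio >= 1.0
print(ratio, is_linear)  # 0.8 False
```

A ratio below 1 means the larger system did not fully keep up with the larger problem, i.e. scale-up is sub-linear.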
❖ Benefits of parallel Database
The benefits of the parallel database are explained below −

➢ Speed
■ Speed is the main advantage of parallel databases. The server breaks up a user's database request into parts and
sends each part to a separate computer.
■ The computers work on the pieces, and the outputs are combined and returned to the user. This speeds up most
requests for data, so that large databases can be accessed more quickly.

➢ Capacity
■ As more users request access to the database, network administrators can add more machines to the parallel
server, increasing its overall capacity.
■ For example, a parallel database enables a large online store to serve requests for information from
thousands of users at the same time. With a single server, this level of performance is not feasible.

➢ Reliability
■ Despite the failure of any computer in the cluster, a properly configured parallel database will continue to work. The
database server senses that there is no response from a computer and redirects its work to the other
computers.
■ Many companies, such as online retailers, need their database to be accessible at all times. This is where a
parallel database does well.
■ This also helps in carrying out scheduled maintenance one computer at a time. Technicians send a
server command to take the affected machine out of service, then perform the required maintenance and updates.
❖ Benefits for queries
Parallel query processing can benefit the following types of queries −

➢ Select statements that scan large numbers of pages but output a few rows only.

➢ Select statements that include union, order by, or distinct, since these queries can populate worktables in parallel,
and can make use of parallel sorting.

➢ Select statements that use merge joins can use parallel processing for scanning tables and also for sorting and
merging.

➢ Select statements where the reformatting strategy is chosen by the optimizer, since these can populate worktables
in parallel, and can make use of parallel sorting.

➢ Create index statements, and the alter table - add constraint clauses that create indexes, unique and primary keys.
❖ Query Parallelism: I/O Parallelism (Data Partitioning)

The two techniques used in query evaluation are as follows:

1. Inter query parallelism

● This technique allows multiple queries to run simultaneously on different processors.
● Inter query parallelism improves the throughput of the system, since more queries are completed in a given time.

For example: If there are 6 queries and each query takes 3 seconds to evaluate, evaluating them one after another takes
18 seconds. With inter query parallelism on 6 processors, all of them can complete in about 3 seconds.
However, inter query parallelism is difficult to achieve every time.

2. Intra Query Parallelism

● In this technique a query is divided into sub-queries which can run simultaneously on different processors; this minimizes the query
evaluation time.
● Intra query parallelism improves the response time of the system.

For Example: If a query is divided into 6 sub-queries, each of which takes 3 seconds to evaluate, evaluating them one after another takes
18 seconds. With intra query parallelism the sub-queries run on different processors at the same time, so the query can complete in about 3 seconds.
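
The idea of splitting one query into sub-queries can be sketched as below. This is only an illustration of the structure, assuming a simple SUM over a salary column; the function names are hypothetical, and Python threads are used for brevity even though they do not give true CPU parallelism in CPython.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Sub-query: aggregate one partition of the rows."""
    return sum(row["salary"] for row in chunk)


def parallel_total(rows, workers):
    """Split the rows into chunks, run the sub-queries concurrently, combine the results."""
    size = max(1, (len(rows) + workers - 1) // workers)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


rows = [{"salary": s} for s in (50000, 20000, 15000, 75000, 25000)]
total = parallel_total(rows, workers=3)
print(total)  # 185000
```

Inter query parallelism would instead submit several independent queries to the pool, one per worker.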
Optimization of Parallel Query
● Parallel Query optimization is nothing but selecting the efficient query evaluation plan.
● Parallel Query optimization plays an important role in developing system to minimize the cost of query evaluation.

Two factors play a very important role in parallel query optimization.

a) total time spent to find the best plan.


b) amount of time required to execute the plan.

Goals of Query optimization.


Query Optimization is done with an aim to:

● Speed up the queries by finding the evaluation plans which give the fastest result on execution.
● Increase the performance of the system.
● Select the best query evaluation plan.
● Avoid the unwanted plan.
Object Oriented Database
➢ Object oriented database systems are an alternative to
relational database and other database systems.
➢ In an object oriented database, information is represented
in the form of objects.
➢ Object oriented databases follow the same principles as
object oriented programming languages.
➢ If we combine the features of the relational model
(transaction, concurrency, recovery) with object oriented
concepts, the resultant model is called the object
oriented database model.


Features of OODBMS

In OODBMS, every entity is considered as an object. Similar objects are classified into classes and subclasses, and the
relationship between two objects is maintained using the concept of inverse reference.

Some of the features of OODBMS are as follows:

1. Complexity

OODBMS has the ability to represent the complex internal structure (of an object) with multilevel complexity.

2. Inheritance

Creating a new object from an existing object in such a way that new object inherits all characteristics of an existing object.

3. Encapsulation

It is a data hiding concept in OOPL which binds together the data and the functions that manipulate it, and keeps them hidden from the outside
world.

4. Persistency

OODBMS allows the creation of persistent objects (the object remains in storage even after program execution ends). The database manages
recovery and concurrency for such objects.
Advantages:

Supports Complex Data Structures: ODBMS is designed to handle complex data structures, such as inheritance, polymorphism, and encapsulation. This
makes it easier to work with complex data models in an object-oriented programming environment.

Improved Performance: ODBMS provides improved performance compared to traditional relational databases for complex data models. ODBMS can
reduce the amount of mapping and translation required between the programming language and the database, which can improve performance.

Reduced Development Time: ODBMS can reduce development time since it eliminates the need to map objects to tables and allows developers to work
directly with objects in the database.

Supports Rich Data Types: ODBMS supports rich data types, such as audio, video, images, and spatial data, which can be challenging to store and retrieve
in traditional relational databases.

Scalability: ODBMS can scale horizontally and vertically, which means it can handle larger volumes of data and can support more users.

Disadvantages:

Limited Adoption: ODBMS is not as widely adopted as traditional relational databases, which means it may be more challenging to find developers with
experience working with ODBMS.

Lack of Standardization: ODBMS lacks standardization, which means that different vendors may implement different features and functionality.

Cost: ODBMS can be more expensive than traditional relational databases since it requires specialized software and hardware.

Integration with Other Systems: ODBMS can be challenging to integrate with other systems, such as business intelligence tools and reporting software.

Scalability Challenges: ODBMS may face scalability challenges due to the complexity of the data models it supports, which can make it challenging to
partition data across multiple nodes.
What is Object?

❖ An object consists of an entity and attributes which describe the state of a real world object and the actions associated
with that object.
❖ An object is a uniquely identifiable entity that contains both the attributes that describe the state of a real world object and
the actions associated with it.
❖ An object typically has two components: state (value) and behaviour (operations).
❖ Hence it is somewhat similar to a program variable in a programming language, except that it will have a
complex data structure as well as specific operations defined by the programmer.
❖ Each object has to maintain information about its current state and additionally has actions and behaviour
that have to be modelled.
❖ The current state of an object is described by one or more attributes (instance variables).
Characteristics of Object
Some important characteristics of an object are:

1. Object identifier

The object identifier (OID) is a unique, system generated identifier.

It is assigned by the system when a new object is created.

2. Object Name
In addition to its object identity, an object may have a unique object name within a particular database.
The name is used to refer to the object in a program.
With the help of the name, the user can also reach the objects that are referenced by this object.

3. Structure of object
· Structure defines how the object is constructed using a type constructor.
· In an object oriented database the state of a complex object can be constructed from other objects by using certain type constructors.
· An object is formally represented as a triple (i,c,v), where 'i' is the object identifier, 'c' is the type constructor and 'v' is the current value of the object.

4. Object Lifetime

The lifetime specifies the period for which an object is valid.

i) Transient object

In OOPL, objects which are present only at the time of execution are called transient objects.
For example: Variables in OOPL

ii) Persistent objects

An object oriented (OO) database can store objects permanently. Such objects persist beyond program termination and can be retrieved later and
shared by other programs; they are called persistent objects.
For example: Objects stored in secondary memory
❖ Object Identity
· Every object has a unique identity. In an object oriented system, an OID is assigned to an object when it is created.
· In an RDBMS, identity is value based: the primary key is used to provide uniqueness of each tuple in a relation. The primary key is
unique only for that relation and not for the entire system. It is chosen from the attributes of the relation, which
makes identity dependent on the object state.
· In an OOPL, object identity is provided by a variable name or pointer; in an OODBMS the system generated OID plays this role.

Properties of OID

1. Uniqueness: No two objects in the system can have the same OID; it is generated automatically by the system.

2. Invariant: The OID of an object cannot be changed throughout the object's entire lifetime.

3. Invisible: OID is not visible to users.
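
The OID properties above can be illustrated with a small sketch. The class name `PersistentObject` and the counter-based generator are assumptions made for the example; a real OODBMS generates and hides OIDs internally.

```python
import itertools


class PersistentObject:
    """Toy object whose identity is a system generated OID, independent of its state."""

    _next_oid = itertools.count(1)  # stands in for the system's OID generator

    def __init__(self, **state):
        self._oid = next(PersistentObject._next_oid)
        self.state = dict(state)

    @property
    def oid(self):
        # Read-only property: there is no setter, so the OID is invariant.
        return self._oid


a = PersistentObject(name="Radha", address="Pune")
b = PersistentObject(name="Radha", address="Pune")

# Same state, different identity.
same_state = a.state == b.state
different_identity = a.oid != b.oid

# Changing the state does not change the OID.
oid_before = a.oid
a.state["address"] = "Delhi"
oid_after = a.oid
```

Two objects with identical attribute values remain distinct, which is the difference from value based identity using a primary key.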

● Complex Object

An object oriented database allows complex objects. A complex object basically means that there is no restriction on the
structure of the object; it can be constructed in any manner that fits the database.

In an object oriented database, an object is represented by a triple of three elements:

O:<i,c,v>

Where i is OID, c is type constructor and v is the value of an object


TYPE Constructor:

The type constructor specifies how the object is constructed, i.e. it gives the basic structure of the
object.

Different types of type constructor are as follows:

1) Atom - The object stores an atomic value. It can be a number, a string etc.
2) Set - Object stores a collection of values of the same type without duplication

Eg: {123,234,345}

3) Bag - Object stores a collection of values of the same type with duplication allowed

Eg: {123,234,345,123}

4) List - Ordered collection of items of same type

(1: 123, 2:234, 3:456, 4:345)

5) Array - Ordered collection of items of same type with fixed size

[123,234,345]

6) Tuple - Collection of named elements, each of which can be of any of the above types

Eg:

Attribute: Name | Set of Locations | Array of Employees
Type:      Atom | Set              | Array
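
The (i, c, v) representation and the type constructors can be sketched with Python's built-in collections, using the usual definitions in which a set holds no duplicates and a bag may. The name `ObjectTriple` and the mapping of constructors to Python types are illustrative assumptions.

```python
from collections import Counter, namedtuple

# (i, c, v): object identifier, type constructor, current value.
ObjectTriple = namedtuple("ObjectTriple", ["oid", "constructor", "value"])

o1 = ObjectTriple(1, "atom", "Radha")
o2 = ObjectTriple(2, "set", frozenset({123, 234, 345, 123}))   # duplicate collapses
o3 = ObjectTriple(3, "bag", Counter([123, 234, 345, 123]))     # duplicate is kept
o4 = ObjectTriple(4, "list", [123, 234, 456, 345])             # ordered, variable size
o5 = ObjectTriple(5, "array", (123, 234, 345))                 # ordered, fixed size
# A tuple object refers to its components by their OIDs.
o6 = ObjectTriple(6, "tuple", {"name": o1.oid, "locations": o2.oid, "employees": o5.oid})

print(len(o2.value), o3.value[123])  # 3 2
```

Storing OIDs rather than copies inside `o6` is what allows complex objects to share components.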


❖ Object structure
Object consists of entity and attributes which can describe the state of real world object and action associated with that object.
Attributes: Attributes are nothing but the properties of objects in the system.
Example: Employee can have attribute 'name' and 'address' with assigned values as:
Attribute Value
Name Radha
Address Pune
ID 07
Type of Attributes
The three types of attributes are as follows:

1. Simple attributes
Attributes can be of primitive data type such as integer, string, real etc. which can take literal value.
Example: 'ID' is simple attribute and value is 07.

2. Complex attributes
Attributes which consist of collections or reference of other multiple objects are called as complex attributes.
Example: Collection of Employees consists of many employee names.

3. Reference attributes
Attributes that represent a relationship between objects and consist of value or collection of values are called as reference
attributes.
Example: Manager is reference of staff object.
Temporal Database
A temporal database is a database that uses some aspect of time to organize its information. In the temporal
database, each tuple in a relation is associated with time. It stores information about the states of the real world and time.
A conventional (non-temporal) database does not store information about past states; it only stores information about the current state.
Whenever the state of the database changes, the information in the database gets updated. In many fields, it is very
necessary to store information about past states. For example, a stock database must store information about past stock
prices for analysis. In a conventional database, historical information has to be modelled manually in the schema.
There are various terminologies in the temporal database:
● Valid Time: The valid time is the time period during which a fact is true with respect to the real world.
● Transaction Time: The transaction time is the time period during which a fact is stored as current in the
database.
● Decision Time: Decision time in the temporal database is the time at which the decision is made about the fact.

Temporal databases are often built on top of a relational database. But plain relational databases have some problems when used as
temporal databases: they do not provide support for complex temporal operations, and their query languages provide poor support for
performing temporal queries.
● Temporal database concept combines all database applications that make use of the time aspect while arranging information.
● Generally database models maintain some aspect of the real world by holding the current state of data, and do not store information
about past states of the database.
● When the database state changes, the database gets updated and past information is overwritten, but in many
applications it is necessary to maintain past information.
● Example: In a medical database we need to keep patients' medical history for treatment of patients.
● Temporal database applications have been developed since the very early days of database design, but the design and
development of the temporal aspects has mainly been left in the hands of application designers and developers.
Examples
Applications that use temporal databases are as follows.
(a) Healthcare database: Keeps track of patients' medical history for further treatment.
(b) Airline reservation system: In general all reservation systems require all time- related information about reservation time, valid arrival time, departure time etc.
(c) Insurance database: History of all accidents and claims is stored along with corresponding time and date for further processing.
(d) Sales database: Sales database needs to maintain sales information along with date and time, which may be helpful for further marketing decisions.

Time specification/temporal data types


(a) DATE: 'Date' data type stores year (yyyy) information, month (mm) information and day (dd) information.
There are many formats available.
• YYYY: MM: DD
• DD: MM: YYYY
• DD - MM - YYYY.
• Day - Mon - Year

(b) TIME: 'Time' data type stores time in the form of two digits for hours (HH) information, two digits for minutes (MM) information and two digits for seconds (SS)
information.
Formats :
• HH: MM : SS
• HH - MM - SS

(c) TIMESTAMP: 'Timestamp' combines above two data types together to store complete time information.
Format:
• DD: MM: YY HH: MM: SS

(d) INTERVAL: Interval refers to a period of time. It may span a few days, months or years.
EXAMPLE:

Date | Real world event | Database action | What the database shows
April 7, 1992 | Kannan is born | Nothing | There is no person called Kannan
April 8, 1992 | Kannan's father officially reports Kannan's birth | Inserted: Person(Kannan, Delhi) | Kannan lives in Delhi
August 26, 2015 | After graduation, Kannan moves to Chennai, but forgets to register his new address | Nothing | Kannan lives in Delhi
December 26, 2015 | Nothing | Nothing | Kannan lives in Delhi
December 27, 2015 | Kannan registers his new address | Updated: Person(Kannan, Chennai) | Kannan lives in Chennai
April 1, 2020 | Kannan dies | Deleted: Person(Kannan) | There is no person called Kannan

Valid time:
- Valid time is the time for which a fact is true in the real world. A valid time period may be in the past, span the
current time, or occur in the future.
- When the birth is registered, the table contains: Person(Kannan, Delhi, 7-April-1992, ∞)
- After the move is registered, the table contains: Person(Kannan, Delhi, 7-April-1992, 25-August-2015) and
Person(Kannan, Chennai, 26-August-2015, ∞)

Transaction time:
- Transaction time records the time period during which a database entry is accepted as correct.
- Transaction time periods can only occur in the past or up to the current time.
- In a transaction time table, records are never deleted.
- Only new records can be inserted, and existing ones updated by setting their transaction end time to show that they
are no longer current.
- Person table: two columns are added, Transaction-From and Transaction-To. Transaction-From is the time a transaction was made and
Transaction-To is the time that the transaction was superseded.
- Together with the valid time columns, this makes the table a bitemporal table.

Bi-Temporal Relations:
- Bitemporal relations contain both valid time and transaction time.
- This provides both historical and rollback information.
- Historical information (e.g. “Where did Kannan live in 1996?”) is provided by valid time.
- Rollback information (e.g. “In 1996, where did the database believe Kannan lived?”) is provided by transaction time.
- The valid time and transaction time do not have to be the same for a single fact.
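
The difference between a historical query and a rollback query can be sketched with the Kannan example. This is a minimal in-memory illustration, assuming half-open periods [from, to) and `date.max` as a stand-in for ∞; the row layout and the function `city` are hypothetical, and the final deletion in 2020 is left out to keep the sketch short.

```python
from datetime import date

FOREVER = date.max  # stands in for the open end of a period

# (city, valid_from, valid_to, transaction_from, transaction_to), half-open periods.
rows = [
    # Belief held from the birth registration until the move was registered.
    ("Delhi", date(1992, 4, 7), FOREVER, date(1992, 4, 8), date(2015, 12, 27)),
    # Corrected belief, recorded on 27 December 2015.
    ("Delhi", date(1992, 4, 7), date(2015, 8, 26), date(2015, 12, 27), FOREVER),
    ("Chennai", date(2015, 8, 26), FOREVER, date(2015, 12, 27), FOREVER),
]


def city(valid_at, known_at):
    """Where did Kannan live on valid_at, according to the database as of known_at?"""
    for c, vf, vt, tf, tt in rows:
        if vf <= valid_at < vt and tf <= known_at < tt:
            return c
    return None


historical = city(date(1996, 6, 1), date(2016, 1, 1))     # valid time query
rollback = city(date(2015, 10, 1), date(2015, 10, 1))     # what the database believed then
corrected = city(date(2015, 10, 1), date(2016, 1, 1))     # what it believes later
print(historical, rollback, corrected)  # Delhi Delhi Chennai
```

For October 2015 the two answers differ: Kannan actually lived in Chennai, but the database still recorded Delhi until the change was registered.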
Types of temporal databases
(a) Valid time temporal database
Valid time is defined as time at which particular event occurred or duration during which particular event is considered to be true.
A temporal database that uses valid time is called a valid time temporal database.
Emp_validTime (EMP_VT)

Eid | Ename | Salary | DNo | VST (Valid Start Time) | VET (Valid End Time)

Valid time database schema

In the above relation (EMP_VT) the non temporal key (Eid) and Valid Start Time (VST) together are treated as the new primary
key.

Eid | Ename   | Salary | DNo | VST      | VET
1   | Akshaya | 50000  | 10  | 01-01-99 | 15-12-99
2   | Amruta  | 20000  | 40  | 01-01-01 | 01-01-02
3   | Pallavi | 15000  | 50  | 01-01-98 | 01-01-99
4   | Bhavana | 75000  | 30  | 31-12-98 | 01-01-01
5   | Shubhra | 25000  | 20  | 02-02-01 | 02-02-10

• Whenever one or more attributes of the above employee table are updated
(1) System can overwrite the old values as in case of simple (non temporal) databases
OR
(2) System can create a new version and close the current version by changing its Valid End Time (VET) to the time of the change.

In these cases we generally go for the second approach i.e. creating a new version instead of overwriting.
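
The second approach (closing the current version and creating a new one) can be sketched as follows. The function `update_salary`, the use of `None` for an open Valid End Time, and the new salary of 55000 are assumptions made for the illustration.

```python
from datetime import date


def update_salary(history, eid, new_salary, change_date):
    """Close the current version of employee eid and append a new version."""
    for version in history:
        if version["eid"] == eid and version["vet"] is None:
            version["vet"] = change_date  # close the current version
            new_version = dict(version, salary=new_salary, vst=change_date, vet=None)
            history.append(new_version)
            return new_version
    raise KeyError(f"no current version for employee {eid}")


# One current version; vet=None means the version is still valid.
history = [
    {"eid": 1, "ename": "Akshaya", "salary": 50000, "dno": 10,
     "vst": date(1999, 1, 1), "vet": None},
]

update_salary(history, 1, 55000, date(1999, 12, 15))
for v in history:
    print(v["salary"], v["vst"], v["vet"])
```

The old salary stays in the table with a closed period, so queries about any past date can still be answered.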
(b) Transaction time temporal database
• Transaction time is defined as the time at which a particular event is actually recorded/stored in a database.
• A temporal database that uses transaction time is called a transaction time temporal database:

Eid | Ename | Salary | DNo | TST (Transaction Start Time) | TET (Transaction End Time)

Transaction time database schema

In the above relation the non temporal key (Eid) and Transaction Start Time (TST) together are treated as the new primary key.
• Example:

Eid | Ename   | Salary | DNo | TST      | TET
1   | Akshaya | 50000  | 10  | 01-01-99 | 15-12-99
2   | Amruta  | 20000  | 40  | 01-01-01 | 01-01-02
3   | Pallavi | 15000  | 50  | 01-01-98 | 01-01-99
4   | Bhavana | 75000  | 30  | 31-12-98 | 01-01-01
5   | Shubhra | 25000  | 20  | 02-02-01 | 02-02-10

• A transaction time database is also called a rollback database, because users can logically roll back to
the database state at any past point in time T by retrieving all tuple versions U whose transaction time
period [U.TST, U.TET] includes point T.



(c) Bitemporal database schema
• In some cases only one of the above times is required, but in many cases we require both of the above time dimensions.
Such time is called bitemporal time.
• A database that uses both times is called a bitemporal database.

Eid | Ename | Salary | DNo | VST (Valid Start Time) | VET (Valid End Time) | TST (Transaction Start Time) | TET (Transaction End Time)

Bitemporal time temporal database schema

In above relation (Emp_BT) the non temporal key (EID) and transaction start time (TST) together act as primary key.
• Example:

Eid | Ename   | Salary | DNo | VST      | VET      | TST      | TET
1   | Akshaya | 50000  | 10  | 01-01-99 | Now      | 30-10-99 | 15-12-99
2   | Amruta  | 20000  | 40  | 01-01-01 | 01-01-02 | 01-01-01 | UC
3   | Pallavi | 15000  | 50  | 01-01-98 | 01-01-99 | 01-01-98 | UC
4   | Bhavana | 75000  | 30  | 31-12-98 | 01-01-01 | 15-02-01 | UC
5   | Shubhra | 25000  | 20  | 02-02-01 | 02-02-10 | 17-03-09 | UC

• As shown in the above table, tuples whose transaction end time (TET) is UC (until changed) represent
currently valid information.



Temporal query language
(a) Temporal selection: We can select temporal data using conditions that involve the time attributes.
(b) Temporal projection: The tuples in the projection inherit their
time intervals from the tuples in the original relation.
(c) Temporal join: The time interval of a joined tuple is the intersection of the time intervals of the
original tuples. Tuples whose intersection is empty are discarded from the join.
(d) Temporal functional dependency: Adding the time dimension may invalidate a functional
dependency.
A temporal functional dependency X → Y holds on relation R if,
for all instances i of R, all snapshots of i satisfy the functional dependency X → Y.
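
The temporal join described in (c) can be sketched as an ordinary join whose result period is the intersection of the input periods. The helper names, the half-open periods and the department rows are hypothetical, chosen only to show one matching and one discarded pair.

```python
from datetime import date


def intersect(a_start, a_end, b_start, b_end):
    """Return the overlap of two half-open periods, or None if it is empty."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return (start, end) if start < end else None


def temporal_join(left, right, key):
    """Join on key and keep only pairs whose valid periods overlap."""
    result = []
    for l in left:
        for r in right:
            if l[key] != r[key]:
                continue
            period = intersect(l["vst"], l["vet"], r["vst"], r["vet"])
            if period is not None:  # empty intersections are discarded
                result.append({**l, **r, "vst": period[0], "vet": period[1]})
    return result


emps = [{"ename": "Akshaya", "dno": 10,
         "vst": date(1999, 1, 1), "vet": date(1999, 12, 15)}]
depts = [
    {"dno": 10, "dname": "Sales", "vst": date(1999, 6, 1), "vet": date(2000, 6, 1)},
    {"dno": 10, "dname": "Marketing", "vst": date(2000, 6, 1), "vet": date(2001, 6, 1)},
]

joined = temporal_join(emps, depts, "dno")
print(joined)
```

Only the first department version overlaps the employee's period, so the result holds a single tuple valid from 1 June 1999 to 15 December 1999.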
Applications of Temporal Databases
1. Finance: It is used to maintain the stock price histories.
2. Factory monitoring: It can be used in a Factory Monitoring System for storing information about current and past readings of
sensors in the factory.
3. Healthcare: The histories of the patient need to be maintained for giving the right treatment.
4. Banking: For maintaining the credit histories of the user.

Examples of Temporal Databases

1. An EMPLOYEE table has a Department column that records the department the employee is assigned to. If an employee is
transferred to another department at some point in time, this can be tracked if the EMPLOYEE table is an
application time-period table that assigns the appropriate time period to each department he/she works for.

Temporal Relation

A temporal relation is defined as a relation in which each tuple in a table of the database is associated with time, the
time can be either transaction time or valid time.
Types of Temporal Relation

There are mainly three types of temporal relations:

1. Uni-Temporal Relation: The relation which is associated with valid or transaction time is called Uni-Temporal relation. It is related to only one time.

2. Bi-Temporal Relation: The relation which is associated with both valid time and transaction time is called a Bi-Temporal relation. Valid time has two parts namely start
time and end time, similar in the case of transaction time.

3. Tri-Temporal Relation: The relation which is associated with three aspects of time namely Valid time, Transaction time, and Decision time called as Tri-Temporal
relation.

Features of Temporal Databases

● The temporal database provides built-in support for the time dimension.
● Temporal database stores data related to the time aspects.
● A temporal database contains historical data in addition to current data.
● It provides a uniform way to deal with historical data.

Challenges of Temporal Databases

1. Data Storage: In temporal databases, each version of the data needs to be stored separately. As a result, storing the data in temporal databases requires
more storage as compared to storing data in non-temporal databases.
2. Schema Design: The temporal database schema must accommodate the time dimension. Creating such a schema is more difficult than creating a schema
for non-temporal databases.
3. Query Processing: Processing the query in temporal databases is slower than processing the query in non-temporal databases due to the additional
complexity of managing temporal data.
Active Databases
➢ An active database is a database that includes a set of triggers which are fired
automatically in response to events.
➢ These databases can be difficult to maintain because of the
complexity that arises in understanding the effect of these triggers.
➢ In such a database, before executing a statement that modifies the database,
the DBMS first checks whether any trigger is activated by that
statement.
➢ If a trigger is activated, the DBMS evaluates the condition part and then
executes the action part only if the specified condition evaluates to true.
➢ It is possible for a single statement to activate more than one trigger. In
such a situation, the DBMS processes the triggers in some arbitrary order.
➢ The execution of the action part of a trigger may activate either other
triggers or the same trigger that initiated this action.
➢ A trigger that activates itself is called a ‘recursive trigger’.
➢ The DBMS executes such chains of triggers in some predefined manner, but
the chains make the overall behaviour harder to understand.
Features of Active Database:
1. It possesses all the concepts of a conventional database i.e. data modelling facilities, query language etc.
2. It supports all the functions of a traditional database like data definition, data manipulation, storage
management etc.
3. It supports definition and management of ECA (Event-Condition-Action) rules.
4. It detects event occurrence.
5. It must be able to evaluate conditions and to execute actions.
6. This means that it has to implement rule execution.
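
An ECA rule can be tried out with the trigger support of SQLite through Python's standard `sqlite3` module. The table names, the trigger name `log_raise` and the salary values are hypothetical; the sketch only shows how event, condition and action map onto a trigger definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (eid INTEGER PRIMARY KEY, ename TEXT, salary INTEGER);
CREATE TABLE salary_audit (eid INTEGER, old_salary INTEGER, new_salary INTEGER);

-- Event:     an UPDATE of emp.salary
-- Condition: the WHEN clause
-- Action:    the INSERT between BEGIN and END
CREATE TRIGGER log_raise
AFTER UPDATE OF salary ON emp
WHEN NEW.salary > OLD.salary
BEGIN
    INSERT INTO salary_audit VALUES (OLD.eid, OLD.salary, NEW.salary);
END;
""")

conn.execute("INSERT INTO emp VALUES (1, 'Akshaya', 50000)")
conn.execute("UPDATE emp SET salary = 60000 WHERE eid = 1")  # condition true, action runs
conn.execute("UPDATE emp SET salary = 55000 WHERE eid = 1")  # condition false, no action

audit = conn.execute("SELECT * FROM salary_audit").fetchall()
print(audit)  # [(1, 50000, 60000)]
```

Both updates raise the event, but only the first satisfies the condition, so the action runs once.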

Examples of Active Databases:


1. Real-time Databases
2. In-Memory Databases
3. Transactional Databases
4. Time-series Databases
1.Real-time Databases:
● Oracle TimesTen: A relational database that runs in memory and is intended for real-time applications that need response
times of less than one millisecond.
● VoltDB: A lightning-fast in-memory database for instantaneous analytics and data processing.

2.In-Memory Databases:
● SAP HANA: A column-oriented, in-memory relational database management system for processing large amounts of data and
real-time analytics.
● MemSQL: Uses in-memory processing for real-time data insights, combining analytics and transactions on a single platform.

3.Transactional Databases:
● MySQL Cluster: Offers automatic sharding and synchronous replication for high availability and real-time data access.
● Microsoft SQL Server with Always On: High availability and disaster recovery are provided by Microsoft SQL Server with
Always On, which enables real-time read access to replicated databases.

4.Time-series Databases:
● InfluxDB: For time-stamped data, InfluxDB is designed to withstand heavy write and query loads. It is frequently utilized in IoT
and monitoring applications.
● Prometheus: A toolkit for alerting and monitoring that keeps track of time series data and is used to analyze and monitor
systems in real time.
These databases and platforms support a variety of real-time data handling requirements, including high-throughput
stream processing, low-latency transaction processing, and event-driven architectures.
Advantages :
1. Enhances traditional database functionalities with powerful rule processing capabilities.
2. Enable a uniform and centralized description of the business rules relevant to the information system.
3. Avoids redundancy of checking and repair operations.
4. Suitable platform for building large and efficient knowledge base and expert systems
XML DATABASES
● XML is a markup language very much like HTML, but XML was designed to carry (transfer)
data and not to display data.
● XML stands for Extensible Markup Language, which gives a mechanism to define the structure of
a document which is to be transferred over the internet.
● XML defines a standard way of adding elements to documents. Hence, XML is used for
structured documentation.
● Unlike HTML, XML tags are not predefined; one can define one's own tags in XML.
➢ Goals of XML
● XML should be directly usable over the Internet. Users must be able to view XML
documents as easily as HTML documents. This may be possible only when XML
browsers are as robust and widely available as HTML browsers.
● XML should support a wide variety of applications.
● It should be easy to write programs that process XML documents.
● The number of optional features in XML should be kept to a minimum, as they cause
confusion for programmers.
● XML documents should be logically clear.
● The design of XML shall be formal and concise, and it should be possible to prepare it quickly.
● XML documents shall be easy to create.
➢ Well Formed Document
• XML documents that satisfy the rules given below are called well-formed XML
documents.
(a) An XML document must start with an XML declaration that indicates the version of
XML being used and any other relevant attributes.
<?xml version="1.0" standalone="yes"?>
(b) A well-formed XML document should be syntactically correct.
(c) Conditions for the tree model (a syntactically correct XML document)
• There should be only one root element.
• Every element is enclosed in a matching pair of start and end tags, within the start and
end tags of its parent element.
• These conditions ensure that the nested elements specify a well-formed tree
structure.
(d) User-defined tags
• An element within an XML document can have any tag name specified by the user. There
is no predefined set of tag names that a document knows.
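The well-formedness conditions above (a single root element, properly nested start and end tags) can be tried out with a short Python sketch. It assumes only the standard library parser `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are made up for illustration. Note that this checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, more than one root element, etc.
        return False


# Hypothetical sample documents.
good = "<drink><price>2.50</price></drink>"
bad_nesting = "<drink><price>2.50</drink></price>"   # tags closed in the wrong order
two_roots = "<drink></drink><snack></snack>"          # more than one root element

print(is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(two_roots))
```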
➢ Valid XML Documents
XML documents that satisfy the conditions given below are called valid XML documents.
(a) Conditions
• The document must be well formed.
• The elements used in the start and end tag pairs must follow a specified structure.
• This structure is specified in a separate XML DTD (Document Type Definition) file or XML schema
file.
(b) DTD Syntax
• First, a name is given to the root tag of the document.
• Then the elements and their nested structures are specified in top-down fashion, as in the
example below.

<!DOCTYPE Company [
<!ELEMENT Company (Employee*)>
<!ELEMENT Employee (ID, Name, DeptNo, Project)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT DeptNo (#PCDATA)>
<!ELEMENT Project (Name, Number)>
<!ELEMENT Number (#PCDATA)>
]>
Representation of XML Objects
➢ Introduction
The basic object in XML is the XML document, which can be represented as a hierarchical data model
or tree data model.
➢ Structuring concepts used to construct an XML document

(a) Element
● An element is a pair of tags together with the content between them, which can be character data,
child elements or a mixture of both.
● Elements can be of two types: simple and complex.
● Elements constructed from other elements by nesting are called complex elements.
● Elements containing data values are called simple elements.

(b) Attributes
Additional information that describes elements.

(c) There are some additional concepts used in XML, such as entities, identifiers, and references.
➢ A major difference between XML and HTML
Tag names : In HTML, tag names describe how text is to be displayed; in
XML, tag names are defined to describe the meaning of the data elements in the document.

Processing : With the help of user-defined tags, the data elements in an
XML document can be processed automatically by computer applications.

1) Tree Representation
The XML model is made up of two kinds of elements :
(i) Complex elements are shown as internal nodes.
(ii) Simple elements are shown as leaf nodes.
Hence, the XML model is called a tree model or hierarchical model.

○ Example
(a) Tree Representation
• Simple elements: <price>, <amount>
• Complex elements: <drink>, <snack>, etc.
2) Textual Representation
• Whenever the value of the standalone attribute in an XML document is set to "yes", the document is
known as a schemaless XML document.
E.g.
<inventory>
  <drink>
    <lemonade>
      <price>$2.50</price>
      <amount>20</amount>
    </lemonade>
    <pop>
      <price>$1.50</price>
      <amount>10</amount>
    </pop>
  </drink>
  <snack>
    <chips>
      <price>$4.50</price>
      <amount>60</amount>
    </chips>
  </snack>
</inventory>
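The link between the textual and the tree representation can be illustrated with a small Python sketch. It parses a cleaned-up copy of the inventory document with the standard `xml.etree.ElementTree` module and classifies every element as complex (internal node, has child elements) or simple (leaf node, holds a data value). The variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the inventory example (no spaces inside the tags).
INVENTORY = """<inventory>
  <drink>
    <lemonade><price>$2.50</price><amount>20</amount></lemonade>
    <pop><price>$1.50</price><amount>10</amount></pop>
  </drink>
  <snack>
    <chips><price>$4.50</price><amount>60</amount></chips>
  </snack>
</inventory>"""

root = ET.fromstring(INVENTORY)

# len(element) is the number of child elements:
# 0 means a leaf node (simple element), more than 0 an internal node (complex element).
complex_tags = sorted({e.tag for e in root.iter() if len(e) > 0})
simple_tags = sorted({e.tag for e in root.iter() if len(e) == 0})

print("complex:", complex_tags)
print("simple:", simple_tags)
```

In this sketch only `price` and `amount` end up as leaf nodes; every other element is an internal node of the tree.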
➢ Types of XML documents

(a) Data-centric XML documents


• Documents that contain many small data items following a specific structure, and
that may therefore have been extracted from a structured database, are called data-centric XML
documents.
• They are formatted as XML documents in order to exchange and display the data over
the internet.
(b) Document-centric XML documents
• Documents that contain large amounts of text, such as news articles or books, are
referred to as document-centric XML documents.
• There are only a few structured data elements in these documents. Sometimes
there are no data elements at all.
(c) Hybrid XML documents
• If both types of data are used together in a document, it is referred to as a
hybrid XML document.
• In such documents, some parts contain structured data and other parts
are predominantly unstructured.
XML - Structured Data or Semi Structured Data

● Data-centric XML documents are sometimes considered semi-structured data and sometimes structured data.
● An XML document is considered structured data if it is written as per a predefined XML schema or DTD.
● An XML document is considered semi-structured data if it is allowed not to conform to any particular schema.

XML Document Type Definition (DTD)


Introduction
● A DTD is the way we describe the valid syntax of an XML document: it lists all the elements that can occur in the document, which elements
can occur in combination, how elements can be nested, what attributes are available for each element type, and so on.
● A DTD describes the rules for analysing an XML document.
● A DTD ensures that the contents of an XML document conform to the rules the document is expected to follow.
Example
<!DOCTYPE Book [
<!ELEMENT Book (Author, Title, Chapter+)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Chapter (#PCDATA)>
<!ATTLIST Chapter id ID #REQUIRED>
]>

(i) <!DOCTYPE Book [

The DTD starts with the above line. It indicates that this DTD corresponds to an XML document that contains Book as the main element
or root element.

(ii) <!ELEMENT Book (Author, Title, Chapter+)>

This line indicates that the first element in our XML document is Book. It also signifies that the Book element
is the parent of the sub-elements Author, Title and Chapter. The + sign after Chapter indicates that Book can contain one or more Chapter
elements.

(iii) <!ELEMENT Author (#PCDATA)>

This line specifies that the element Author contains parsed character data (PCDATA). Element content is declared with #PCDATA; CDATA is
an attribute type and cannot be used as an element content model.

(iv) <!ELEMENT Title (#PCDATA)> and <!ELEMENT Chapter (#PCDATA)>

These lines specify that the elements Title and Chapter are of type PCDATA, parsed character data.

(v) <!ATTLIST Chapter id ID #REQUIRED>

This line specifies that id is an attribute of the element Chapter, declared here with the attribute type ID. The #REQUIRED keyword specifies that id is mandatory.
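To make the meaning of these declarations concrete, the following Python sketch hand-checks the same rules for a Book document: the child order Author, Title, one or more Chapter elements, text-only content, and a mandatory id on every Chapter. This is not a DTD validator (the standard library parser does not validate against DTDs); the function name `conforms_to_book_rules` and the sample documents are hypothetical, and the uniqueness check assumes the id attribute is of type ID.

```python
import xml.etree.ElementTree as ET


def conforms_to_book_rules(xml_text: str) -> bool:
    """Hand-coded check of the Book rules; a sketch, not a general DTD validator."""
    try:
        book = ET.fromstring(xml_text)
    except ET.ParseError:
        return False                      # not even well formed
    if book.tag != "Book":
        return False                      # root element must be Book
    children = list(book)
    tags = [child.tag for child in children]
    # Content model (Author, Title, Chapter+): fixed order, at least one Chapter.
    if len(tags) < 3 or tags[:2] != ["Author", "Title"]:
        return False
    chapters = children[2:]
    if any(child.tag != "Chapter" for child in chapters):
        return False
    # (#PCDATA): these elements hold text only, no nested elements.
    if any(len(child) > 0 for child in children):
        return False
    # #REQUIRED: every Chapter needs an id; assumed type ID means ids must be unique.
    ids = [chapter.get("id") for chapter in chapters]
    return None not in ids and len(set(ids)) == len(ids)


VALID = ('<Book><Author>A. Writer</Author><Title>XML Basics</Title>'
         '<Chapter id="c1">Intro</Chapter><Chapter id="c2">DTDs</Chapter></Book>')
NO_CHAPTER = '<Book><Author>A. Writer</Author><Title>XML Basics</Title></Book>'
MISSING_ID = ('<Book><Author>A. Writer</Author><Title>XML Basics</Title>'
              '<Chapter>Intro</Chapter></Book>')

print(conforms_to_book_rules(VALID))
print(conforms_to_book_rules(NO_CHAPTER))
print(conforms_to_book_rules(MISSING_ID))
```

In practice a validating parser would read the DTD itself instead of hard-coding the rules as done here.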
Spatial Databases
Spatial data is data associated with geographic locations such as cities and towns. A spatial database is optimized to store and query data
that represents objects defined in a geometric space.

Characteristics of Spatial Database


A spatial database system has the following characteristics

It is a database system
It offers spatial data types (SDTs) in its data model and query language.
It supports spatial data types in its implementation, providing at least spatial indexing and efficient algorithms for spatial join.

Example
A road map is a visualization of geographic information. A road map is a 2-dimensional object which contains points, lines, and polygons
that can represent cities, roads, and political boundaries such as states or provinces.

In general, spatial data can be of two types −

Vector data: This data is represented as discrete points, lines and polygons.
Raster data: This data is represented as a matrix of square cells.

Spatial data in the form of points, lines, polygons etc. is used by many different databases.
A spatial DBMS makes spatial data management simple for users and applications.
A spatial database supports various concepts for databases that keep information about objects in a multidimensional
region.
Conventional databases offer only a limited set of data types and operations for spatial applications, which makes the modeling of real
world spatial applications extremely difficult.
A common example of spatial data is a map of railway tracks. Such a map is a two-dimensional object containing points
and lines that represent cities and railway routes.

2) Spatial databases
a. Cartographic databases
These databases store maps that include two-dimensional spatial descriptions of their objects, from countries and
states to rivers, cities, roads, seas, and so on.
b. Meteorological databases
Weather information is 3D, since temperatures and other meteorological information are related to three
dimensional spatial points.
3) Spatial database management
• The spatial relationships among the objects are important, and they are often needed when querying the database.
• Some extensions that are needed for spatial databases are models that can interpret spatial characteristics.
• In addition, special indexing and storage structures are often used for improving performance.
• The basic extensions required include two-dimensional geometric concepts such as points, lines, circles and polygons, in order to specify the
spatial characteristics of objects.

Performance factors
• For better performance, special techniques for spatial indexing are needed.
• One of the best known techniques is the use of R-trees and their variations.
• R-trees group together objects that are in close spatial proximity on the same leaf nodes of a tree-structured index.
• Typical criteria for dividing the space include minimizing the rectangle areas, since this leads to a quicker narrowing of the search space.
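The idea of grouping nearby objects under small bounding rectangles can be sketched in a few lines of Python. This is not a full R-tree; it only shows the minimum bounding rectangle (MBR) of a leaf, the overlap test used to prune a search, and a common insertion heuristic that picks the leaf whose rectangle area grows least. All names (`Rect`, `mbr`, `enlargement`) and coordinates are assumptions made for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    return (r.xmax - r.xmin) * (r.ymax - r.ymin)


def mbr(rects: list[Rect]) -> Rect:
    """Minimum bounding rectangle enclosing all given rectangles."""
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    """True if the rectangles overlap; a search can skip a leaf whose MBR fails this test."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def enlargement(leaf: Rect, new: Rect) -> float:
    """Extra area the leaf's MBR would need in order to also cover the new object."""
    return area(mbr([leaf, new])) - area(leaf)


# Two hypothetical leaf nodes, each summarized by the MBR of its objects.
leaf_a = mbr([Rect(0, 0, 1, 1), Rect(2, 2, 3, 3)])
leaf_b = mbr([Rect(10, 10, 11, 11)])
new_object = Rect(1, 1, 2, 2)

# Insert into the leaf that needs the least enlargement (keeps rectangle areas small).
best_leaf = min([leaf_a, leaf_b], key=lambda leaf: enlargement(leaf, new_object))
print(best_leaf)
```

Keeping the rectangles small matters because a query window then overlaps fewer of them, so fewer leaves have to be inspected.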

4) Types of data
a. Point data
• A point has a spatial extent characterized completely by its position or location.
• Point data can be a collection of points in multidimensional space.
• Point data stored in a database can be based on direct measurements or derived from data obtained through measurements.
• Raster data is an example of directly measured point data.
b. Region data
• Region data has a spatial extent represented by location and boundaries.
• The location can be shown as the position of fixed points for the region.
• The boundary can be represented as a line in 2D space and as surface in 3D space.

• Region data consists of a collection of regions in multidimensional space.
• Vector data is an example of region data.
• Region data is nothing but a simple geometric approximation to an actual data object.
5. Types of spatial queries
a. Spatial range query
• A spatial range query has an associated region.
• These queries search for the objects of a particular type that are within a given area.
Example:
(i) Find all hotels within 500 km of Mumbai.
(ii) Find all pubs in Mumbai.
b. Nearest neighbour query
• Finds an object of a particular type that is closest to a given point or location.
• Answers can be ordered by the distance from the object.
• Such queries are very important in the context of multimedia databases.
• For example, find 10 water parks near Mumbai.
c. Spatial join queries
• Typically joins the objects of two types based on some spatial condition, such as the objects intersecting or being within a certain distance of
one another.
• If more detailed information is given about each record, the query may become more complex.
• For example, find pairs of cities that are within two miles of each other.
• In the above example each record is a point representing a city, and the query can be answered by a self join.
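The three query types above can be sketched with brute-force Python over a small set of named points. The coordinates are made-up planar values (not real geographic data), distances are plain Euclidean distances, and the function names are hypothetical. A real spatial DBMS would answer these queries through a spatial index rather than scanning every object.

```python
import math
from itertools import combinations

# Hypothetical places with planar (x, y) coordinates.
places = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (10.0, 0.0), "D": (1.0, 1.0)}


def range_query(objects, center, radius):
    """Spatial range query: all objects within `radius` of `center`."""
    return sorted(name for name, p in objects.items() if math.dist(p, center) <= radius)


def nearest_neighbours(objects, center, k):
    """Nearest neighbour query: the k objects closest to `center`, ordered by distance."""
    return sorted(objects, key=lambda name: math.dist(objects[name], center))[:k]


def distance_self_join(objects, max_dist):
    """Spatial self join: all pairs of objects within `max_dist` of each other."""
    return [(a, b) for a, b in combinations(sorted(objects), 2)
            if math.dist(objects[a], objects[b]) <= max_dist]


print(range_query(places, (0.0, 0.0), 5.0))
print(nearest_neighbours(places, (0.0, 0.0), 2))
print(distance_self_join(places, 2.0))
```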
6. Applications
a. GIS
• Geographical Information Systems (GIS) are used in areas such as environmental, emergency, and
battle management.
• Point data and region data must be handled properly.
• ArcInfo is a widely used GIS software.
b. CAD/CAM
• Spatial objects include the surfaces of design objects.
• Point data and region data are used extensively.
• Range and spatial join queries are commonly used.
c. Multimedia database systems
• They contain objects like images, audio and video and various types of time series data.
• Multimedia data is mapped to a collection of points in which the distance between them is very important.
NoSQL Databases
These are used for large sets of distributed data. Some big data performance issues are not handled effectively by
relational databases; such issues are more easily managed by NoSQL databases. They are very efficient at analyzing large
volumes of unstructured data that may be stored across multiple virtual servers in the cloud.

NoSQL is a type of database management system (DBMS) that is designed to handle and store large volumes of unstructured
and semi-structured data. Unlike traditional relational databases that use tables with predefined schemas to store data, NoSQL
databases use flexible data models that can adapt to changes in data structures and are capable of scaling horizontally to
handle growing amounts of data.
The term NoSQL originally referred to “non-SQL” or “non-relational” databases, but the term has since evolved to mean “not
only SQL,” as NoSQL databases have expanded to include a wide range of different database architectures and data models.

NoSQL databases are generally classified into four main categories:

1. Document databases: These databases store data as semi-structured documents, such as JSON or XML, and can be
queried using document-oriented query languages.
2. Key-value stores: These databases store data as key-value pairs, and are optimized for simple and fast read/write
operations.
3. Column-family stores: These databases store data as column families, which are sets of columns that are treated as
a single entity. They are optimized for fast and efficient querying of large amounts of data.
4. Graph databases: These databases store data as nodes and edges, and are designed to handle complex
relationships between data.
Four Types of NoSQL Database
1. Key-value store databases
• This is a very simple type of NoSQL database.
• It is specially designed for storing schema-free data.
• Each data item is stored as a value along with an indexed key.
This type is generally used when you need quick performance for
basic Create-Read-Update-Delete operations and the data is not
connected.
Example :
• Storing and retrieving session information for web pages.
• Storing user profiles and preferences.
• Storing shopping cart data for e-commerce.
Limitations
• It may not work well for complex queries that attempt to connect
multiple relations of data.
• If the data contains many many-to-many relationships, a key-value store
is likely to show poor performance.
Examples
• Cassandra
• Azure Table Storage (ATS)
• DynamoDB
Example of unstructured data for user records
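A minimal sketch of the key-value idea, using an in-memory Python dictionary as a stand-in for a real store: values are opaque to the store, are looked up only by key, and need not share a schema. The class name `KeyValueStore`, the key format and the sample user records are all hypothetical.

```python
class KeyValueStore:
    """Toy in-memory key-value store supporting basic CRUD by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Create or update: the store does not inspect the value's structure.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# Schema-free user records: each value has its own shape.
store.put("user:1", {"name": "Asha", "theme": "dark"})
store.put("user:2", {"name": "Ben", "cart": ["pen", "notebook"], "language": "en"})
store.put("session:abc", "user:1")           # session token mapped to a user key

current_user = store.get(store.get("session:abc"))
store.delete("session:abc")                  # e.g. on logout
print(current_user, store.get("session:abc"))
```

The lookup by key is fast, but a question such as "which users have a pen in their cart" would require scanning every value, which reflects the limitation on complex queries noted above.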
2. Column store database
• Instead of storing data in relational tuples (table rows), it is stored in
cells grouped in columns.
• It offers very high performance and a highly scalable architecture.
Examples:
1. HBase
2. Big Table
3. Hyper Table
Use Cases
• Common use cases for a column-family database include event
logging and blogs, as with document databases, but the data is
stored in a different fashion.

• In logging, every application can write its own set of columns and
have each row key formatted in such a way as to promote easy lookup
based on application and timestamp.
• Counters are another typical use case. An application may
need an easy way to count or increment as events
occur.
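The logging and counter use cases can be sketched with a toy column-family structure in Python. The row key format `application#timestamp`, the class `ColumnFamily` and its methods are assumptions made for this illustration; they do not correspond to the API of HBase or any other product.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column name: value}; rows may have different columns."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def increment(self, row_key, column, by=1):
        # Counter column: add to the current value, starting from 0.
        self._rows[row_key][column] = self._rows[row_key].get(column, 0) + by
        return self._rows[row_key][column]

    def scan_prefix(self, prefix):
        """Return (row_key, columns) pairs whose key starts with prefix, in key order."""
        return [(k, dict(self._rows[k])) for k in sorted(self._rows) if k.startswith(prefix)]


logs = ColumnFamily()
# Row key = application + ISO timestamp, so one application's rows sort by time.
logs.put("billing#2024-01-01T10:05:00", "level", "INFO")
logs.put("billing#2024-01-01T10:00:00", "level", "ERROR")
logs.put("billing#2024-01-01T10:00:00", "message", "card declined")
logs.put("search#2024-01-01T10:01:00", "query_ms", 42)   # a different set of columns

billing_rows = logs.scan_prefix("billing#")

counters = ColumnFamily()
counters.increment("page:/home", "views")
home_views = counters.increment("page:/home", "views")
print(billing_rows, home_views)
```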
3. Document database
• Document databases build on the concept of key-value stores, where the
values are "documents" that contain a lot of complex data.
• Every document has a unique key, which is used to retrieve the
document.
• The key is used for storing, retrieving and managing document-oriented
information, also known as semi-structured data.
Examples:
MongoDB
CouchDB
• An example of such a system would be an event logging system for
an application, or online blogging.
• In online blogging, each user would be a document, each post a
document, and each comment, like, or action a document.
• All documents would contain information about the type of data,
username, post content, or timestamp of document creation.
Limitations
• It is challenging for a document store to handle a transaction on
multiple documents.
• Document databases may not be a good choice if the data must be
aggregated.
4. Graph database
• Data is stored as a graph: each entity acts as a node, and the relationships
between entities are stored as links between the nodes.
Examples:
Neo4j
Polyglot

Use Cases
• A very important and popular application is social networking
sites, which can benefit from quickly locating friends, friends of friends,
likes, and so on.
• Mapping applications such as Google Maps can use graphs to model their data
easily, for finding nearby locations or building shortest routes for directions.
• Many recommendation systems make effective use of this model.
Limitations
• Graph databases may not offer a better choice than other
NoSQL variations.
• If an application needs to scale horizontally, this may lead to poor
performance.
• They are not very efficient when all nodes with a given
parameter need to be updated.
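The "friends of friends" use case can be sketched as a breadth-first traversal over an adjacency structure. The Python below uses a plain dictionary of sets as a stand-in for a graph database; the people and the function name `within_hops` are invented for illustration.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each node reachable within max_hops to its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue                      # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


# Hypothetical friendship graph: nodes are people, edges are "is friend of" links.
friends = {
    "asha": {"ben", "chen"},
    "ben": {"asha", "dev"},
    "chen": {"asha"},
    "dev": {"ben", "eli"},
    "eli": {"dev"},
}

hops = within_hops(friends, "asha", 2)
direct_friends = sorted(p for p, d in hops.items() if d == 1)
friends_of_friends = sorted(p for p, d in hops.items() if d == 2)
print(direct_friends, friends_of_friends)
```

Because each step follows stored links directly, the cost depends on the size of the neighbourhood visited rather than on the total number of records, which is why this kind of query suits the graph model.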
5. Comparison
NoSQL databases are often used in applications where there is a high volume of data that needs to be processed and
analyzed in real-time, such as social media analytics, e-commerce, and gaming. They can also be used for other
applications, such as content management systems, document management, and customer relationship management.
However, NoSQL databases may not be suitable for all applications, as they may not provide the same level of data
consistency and transactional guarantees as traditional relational databases. It is important to carefully evaluate the
specific needs of an application when choosing a database management system.
NoSQL, originally referring to "non-SQL" or "non-relational", is a type of database that provides a mechanism for storage and
retrieval of data that is modeled by means other than the tabular relations used in relational databases. Such
databases came into existence in the late 1960s, but did not obtain the NoSQL moniker until a surge of popularity in
the early twenty-first century. NoSQL databases are used in real-time web applications and big data, and their use is
increasing over time.
● NoSQL systems are also sometimes called "Not only SQL" to emphasize the fact that they may support SQL-like query languages. Motivations for using a NoSQL database
include simplicity of design, simpler horizontal scaling to clusters of machines, and finer control over availability. The data structures used by NoSQL
databases are different from those used by default in relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL
database depends on the problem it should solve.
● NoSQL databases, also known as "not only SQL" databases, are a new type of database management system that has gained popularity in recent years.
Unlike traditional relational databases, NoSQL databases are designed to handle large amounts of unstructured or semi-structured data, and they can
accommodate dynamic changes to the data model. This makes NoSQL databases a good fit for modern web applications, real-time analytics, and big data
processing.
● Data structures used by NoSQL databases are sometimes also viewed as more flexible than relational database tables. Many NoSQL stores compromise
consistency in favor of availability, speed, and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level query
languages, lack of standardized interfaces, and huge previous investments in existing relational databases.
● Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability) transactions, but a few databases, such as MarkLogic, Aerospike, FairCom
c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them central to their designs.
● Most NoSQL databases offer a concept of eventual consistency, in which database changes are propagated to all nodes eventually, so queries for data might not return
updated data immediately or might read data that is not accurate, a problem known as stale reads. Also, some NoSQL systems may
exhibit lost writes and other forms of data loss. Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss.
● One simple example of a NoSQL database is a document database. In a document database, data is stored in documents rather than tables. Each document
can contain a different set of fields, making it easy to accommodate changing data requirements.
● For example, consider a database that holds data regarding employees. In a relational database, this information might be stored in tables, with
one table for employee information and another table for department information. In a document database, each employee would be stored as a separate
document, with all of their information contained within the document.
● NoSQL databases are a relatively new type of database management system that has gained popularity in recent years due to their scalability and
flexibility. They are designed to handle large amounts of unstructured or semi-structured data and can handle dynamic changes to the data model. This
makes NoSQL databases a good fit for modern web applications, real-time analytics, and big data processing.
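The employee example above can be sketched in Python to contrast the two layouts. Plain lists and dictionaries stand in for relational tables and for a document collection; all names, ids and field names are invented for illustration.

```python
# Relational layout: two "tables" with fixed columns, linked by dept_id.
employees_table = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ben", "dept_id": 20},
]
departments_table = [
    {"dept_id": 10, "dept_name": "Research"},
    {"dept_id": 20, "dept_name": "Sales"},
]


def employee_with_department(emp_id):
    """Answering the question in the relational layout needs a join on dept_id."""
    emp = next(e for e in employees_table if e["emp_id"] == emp_id)
    dept = next(d for d in departments_table if d["dept_id"] == emp["dept_id"])
    return {"name": emp["name"], "department": dept["dept_name"]}


# Document layout: one self-contained document per employee, keyed by id.
# Documents may carry different fields (Asha has skills, Ben has remote).
employee_documents = {
    1: {"name": "Asha", "department": {"id": 10, "name": "Research"}, "skills": ["SQL", "Python"]},
    2: {"name": "Ben", "department": {"id": 20, "name": "Sales"}, "remote": True},
}

joined = employee_with_department(1)
doc = employee_documents[1]
print(joined["department"], doc["department"]["name"])
```

The document layout answers the question with a single lookup, at the cost of repeating department details in every employee document.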
Key Features of NoSQL:
1. Dynamic schema: NoSQL databases do not have a fixed schema and can accommodate changing data
structures without the need for migrations or schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to a database
cluster, making them well-suited for handling large amounts of data and high levels of traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use a document-based data model, where
data is stored in a semi-structured format, such as JSON or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, use a key-value data model, where data is stored
as a collection of key-value pairs.
5. Column-based: Some NoSQL databases, such as Cassandra, use a column-based data model, where data is
organized into columns instead of rows.
6. Distributed and high availability: NoSQL databases are often designed to be highly available and to
automatically handle node failures and data replication across multiple nodes in a database cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and dynamic manner,
with support for multiple data types and changing data structures.
8. Performance: NoSQL databases are optimized for high performance and can handle a high volume of reads
and writes, making them suitable for big data and real-time applications.
Advantages of NoSQL: There are many advantages of working with NoSQL databases such as MongoDB and Cassandra. The main
advantages are high scalability and high availability.
1. High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means partitioning the data and placing it on multiple machines in
such a way that the order of the data is preserved. Vertical scaling means adding more resources to the existing
machine, whereas horizontal scaling means adding more machines to handle the data. Vertical scaling is not that easy to
implement, but horizontal scaling is easy to implement. Examples of horizontally scaling databases are MongoDB, Cassandra, etc.
NoSQL can handle a huge amount of data because of this scalability: as the data grows, NoSQL scales itself automatically to handle that
data in an efficient manner.
2. Flexibility: NoSQL databases are designed to handle unstructured or semi-structured data, which means that they can
accommodate dynamic changes to the data model. This makes NoSQL databases a good fit for applications that need to handle
changing data requirements.
3. High availability: The auto-replication feature in NoSQL databases makes them highly available, because in case of a failure the data
can be recovered from a replica in its previous consistent state.
4. Scalability: NoSQL databases are highly scalable, which means that they can handle large amounts of data and traffic with ease.
This makes them a good fit for applications that need to handle large amounts of data or traffic.
5. Performance: NoSQL databases are designed to handle large amounts of data and traffic, which means that they can offer
improved performance compared to traditional relational databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective than traditional relational databases, as they are typically
less complex and do not require expensive hardware or software.
7. Agility: Ideal for agile development.
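The sharding described in point 1 can be sketched as a small range-based router in Python. Each shard holds one contiguous key range, so reading the shards in sequence returns the keys in order. The class `RangeShardRouter`, the split points and the sample keys are assumptions for illustration; real systems may also shard by hash, which spreads load evenly but does not preserve key order.

```python
import bisect


class RangeShardRouter:
    """Toy range-based sharding: shard i holds the keys between split point i-1 and i."""

    def __init__(self, split_points):
        self.split_points = sorted(split_points)
        # One more shard than split points; each dict stands in for a machine.
        self.shards = [dict() for _ in range(len(self.split_points) + 1)]

    def shard_for(self, key):
        # Index of the shard whose key range contains `key`.
        return bisect.bisect_right(self.split_points, key)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)

    def scan(self):
        """Yield all (key, value) pairs in key order by visiting the shards in sequence."""
        for shard in self.shards:
            yield from sorted(shard.items())


router = RangeShardRouter(split_points=["h", "p"])   # three shards: < h, h to < p, >= p
for name in ["zoe", "asha", "kim", "ben", "pia"]:
    router.put(name, {"name": name})

ordered_keys = [key for key, _ in router.scan()]
print(router.shard_for("asha"), router.shard_for("kim"), router.shard_for("zoe"), ordered_keys)
```

Horizontal scaling in this sketch would mean adding a split point and moving the affected key range to a new shard.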
Disadvantages of NoSQL: NoSQL has the following disadvantages.

1. Lack of standardization: There are many different types of NoSQL databases, each with its own unique strengths and weaknesses. This lack of standardization can make it
difficult to choose the right database for a specific application
2. Lack of ACID compliance: NoSQL databases are not fully ACID-compliant, which means that they do not guarantee the consistency, integrity, and durability of data. This can
be a drawback for applications that require strong data consistency guarantees.
3. Narrow focus: NoSQL databases have a very narrow focus, as they are mainly designed for storage and provide very little other functionality. Relational databases are a better choice in
the field of transaction management than NoSQL.
4. Open-source: Most NoSQL databases are open-source. There is no reliable standard for NoSQL yet. In other words, two database systems are likely to be unequal.
5. Lack of support for complex queries: NoSQL databases are not designed to handle complex queries, which means that they are not a good fit for applications that require
complex data analysis or reporting.
6. Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional relational databases. This can make them less reliable and less secure than
traditional databases.
7. Management challenge: The purpose of big data tools is to make the management of a large amount of data as simple as possible. But it is not so easy. Data management in
NoSQL is much more complex than in a relational database. NoSQL, in particular, has a reputation for being challenging to install and even more hectic to manage on a daily
basis.
8. GUI is not available: GUI mode tools to access the database are not widely available in the market.
9. Backup: Backup is a great weak point for some NoSQL databases like MongoDB. MongoDB has no approach for the backup of data in a consistent manner.
10. Large document size: Some database systems like MongoDB and CouchDB store data in JSON format. This means that documents are quite large (BigData, network
bandwidth, speed), and having descriptive key names actually hurts since they increase the document size.

Types of NoSQL database: Types of NoSQL databases and the name of the database system that falls in that category are:

1. Graph Databases: Examples – Amazon Neptune, Neo4j
2. Key value store: Examples – Memcached, Redis, Coherence
3. Column: Examples – HBase, Big Table, Accumulo
4. Document-based: Examples – MongoDB, CouchDB, Cloudant
When should NoSQL be used:
1. When a huge amount of data needs to be stored and retrieved.
2. The relationship between the data you store is not that important
3. The data changes over time and is not structured.
4. Support of Constraints and Joins is not required at the database level
5. The data is growing continuously and you need to scale the database regularly to handle the data.

In conclusion, NoSQL databases offer several benefits over traditional relational databases, such as scalability,
flexibility, and cost-effectiveness. However, they also have several drawbacks, such as a lack of standardization, lack
of ACID compliance, and lack of support for complex queries. When choosing a database for a specific application, it
is important to weigh the benefits and drawbacks carefully to determine the best fit.
