ADBMS
Distributed Databases
❖ Definition of Distributed databases
A distributed database is a collection of multiple interconnected databases, which are
spread physically across various locations that communicate via a computer network.
Features
➢ Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
➢ Data is physically stored across multiple sites. Data in each site can be managed
by a DBMS independent of the other sites.
➢ The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
➢ A distributed database is not a loosely connected file system.
➢ A distributed database incorporates transaction processing, but it is not
synonymous with a transaction processing system.
❖ Distributed Database Management System
(DDBMS)
A distributed database management system (DDBMS) is a centralized
software system that manages a distributed database in a manner as if it
were all stored in a single location.
Features
Modular Development − If the system needs to be expanded to new locations or new units, in centralized database systems, the action
requires substantial efforts and disruption in the existing functioning. However, in distributed databases, the work simply requires adding new
computers and local data to the new site and finally connecting them to the distributed system, with no interruption in current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to a halt. However, in distributed systems, when
a component fails, the system continues to function, possibly at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from local data itself, thus providing faster
response. On the other hand, in centralized systems, all queries have to pass through the central computer for processing, which increases the
response time.
Lower Communication Cost − In distributed database systems, if data is located locally where it is mostly used, then the communication
costs for data manipulation can be minimized. This is not feasible in centralized systems.
Easier Expansion - Expanding a centralized database usually means taking the whole system down for some time. In a
distributed database, a site can be added or removed without shutting down the whole system.
Reduced Operating Cost - Because the data is divided across different sites, it is easily accessible, which reduces operating cost.
Fast Data Processing - Because the data is divided across a number of sites, each site handles a smaller portion, so processing becomes faster.
Disadvantages of Distributed Databases
➢ Location Transparency
Location transparency ensures that the user can query any table(s) or fragment(s) of a table as if they were stored locally at the user’s site. The fact that the table
or its fragments are stored at a remote site in the distributed database system should be completely hidden from the end user. The address of the remote site(s) and the access
mechanisms are also hidden. To provide location transparency, the DDBMS should have access to an updated and accurate data dictionary and DDBMS
directory, which contain the details of the locations of data.
➢ Fragmentation Transparency
Fragmentation transparency enables users to query upon any table as if it were unfragmented. Thus, it hides the fact that the table the user is querying on is actually a
fragment or union of some fragments. It also conceals the fact that the fragments are located at diverse sites. This is somewhat similar to users of SQL views, where the user
may not know that they are using a view of a table instead of the table itself.
➢ Replication Transparency
Replication transparency ensures that replication of databases is hidden from the users. It enables users to query a table as if only a single copy of the table
exists.
Replication transparency is associated with concurrency transparency and failure transparency. Whenever a user updates a data item, the update is reflected in all the copies
of the table. However, this operation should not be known to the user. This is concurrency transparency. Also, in case of failure of a site, the user can still proceed with their
queries using replicated copies without any knowledge of the failure. This is failure transparency.
➢ Combination of Transparencies
In any distributed database system, the designer should ensure that all the stated transparencies are maintained to a considerable extent. The designer may choose
to fragment tables, replicate them and store them at different sites, all without the end user being aware of it. However, complete distribution transparency is a tough task and requires
considerable design effort.
❖ DDBMS Architecture
Some of the common architectural models are −
Design problems are the problems you can face while using distributed databases. Some of these issues are listed here.
1) Distributed database design
There are two alternative design methods
i) Partitioned database
Here the data is divided into different parts.
ii) Replicated database
Here copies of the data are kept at several sites (partial replication) or at all sites (full replication).
Bottom up Approach
Fragmentation can be of three types: horizontal, vertical, and hybrid (combination of horizontal
and vertical).
Horizontal fragmentation can further be classified into two techniques: primary horizontal
fragmentation and derived horizontal fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from the
fragments whenever required. This requirement is called “re-constructiveness.”
● Advantages of Fragmentation
Since data is stored close to the site of usage, efficiency of the database system is increased.
Local query optimization techniques are sufficient for most queries since data is locally available.
Since irrelevant data is not available at the sites, security and privacy of the database system can be maintained.
● Disadvantages of Fragmentation
When data from different fragments are required, the access times may be very high.
In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
Lack of back-up copies of data in different sites may render the database ineffective in case of failure of a site.
★ Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to maintain re-constructiveness, each fragment
should contain the primary key field(s) of the table. Vertical fragmentation can be used to enforce privacy of data.
For example, let us consider that a University database keeps records of all registered students in a Student table having the following schema.
STUDENT
Now, the fees details are maintained in the accounts section. In this
case, the designer will fragment the database as follows −
FROM STUDENT;
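As a minimal sketch of vertical fragmentation, the Python snippet below splits a small in-memory STUDENT table into two column groups that both keep the primary key, then rebuilds the table by joining on that key. The column names (regd_no, name, course, fees), the fragment names and the sample rows are assumptions made for illustration; they are not the schema of the example above.

```python
# Hypothetical STUDENT rows; column names and values are illustrative assumptions.
STUDENT = [
    {"regd_no": 1, "name": "Asha", "course": "CS", "fees": 5000},
    {"regd_no": 2, "name": "Ravi", "course": "IT", "fees": 4500},
]

def vertical_fragment(rows, key, columns):
    """Project each row onto the primary key plus the chosen columns."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]

def reconstruct(frag_a, frag_b, key):
    """Rebuild the table by joining two vertical fragments on the key."""
    by_key = {row[key]: row for row in frag_b}
    return [{**row, **by_key[row[key]]} for row in frag_a]

std_fees = vertical_fragment(STUDENT, "regd_no", ["fees"])            # for the accounts section
std_info = vertical_fragment(STUDENT, "regd_no", ["name", "course"])  # remaining columns
rebuilt = reconstruct(std_info, std_fees, "regd_no")
```

Because both fragments carry regd_no, the join recovers every original row, which is the re-constructiveness requirement in action.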
★ Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table according to the values of one or more fields.
Horizontal fragmentation should also conform to the rule of re-constructiveness. Each horizontal
fragment must have all columns of the original base table.
CREATE COMP_STD
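A rough Python sketch of horizontal fragmentation follows. It selects whole rows by a predicate and shows that the union of the fragments gives back the base table. The sample rows, the column names and the choice of "Computer Science" as the selection value are assumptions; only the fragment name COMP_STD is borrowed from the text.

```python
# Hypothetical rows; column names and values are illustrative assumptions.
STUDENT = [
    {"regd_no": 1, "name": "Asha", "course": "Computer Science"},
    {"regd_no": 2, "name": "Ravi", "course": "Physics"},
    {"regd_no": 3, "name": "Meena", "course": "Computer Science"},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows (all columns) that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

comp_std = horizontal_fragment(STUDENT, lambda r: r["course"] == "Computer Science")
other_std = horizontal_fragment(STUDENT, lambda r: r["course"] != "Computer Science")

# Reconstruction is the union of the fragments (sorted here only for comparison).
rebuilt = sorted(comp_std + other_std, key=lambda r: r["regd_no"])
```

The two predicates are complementary, so every row lands in exactly one fragment and nothing is lost or duplicated on reconstruction.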
★ Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used.
This is the most flexible fragmentation technique since it generates fragments with minimal extraneous
information. However, reconstruction of the original table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways:
First generate a set of horizontal fragments; then generate vertical fragments from one or
more of the horizontal fragments.
First generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.
❖ Data Replication
Data replication is the process of storing separate copies of the database at two or
more sites. It is a popular fault tolerance technique of distributed databases.
Query Blocks
A query submitted to the database system is first decomposed into query blocks.
A query block forms a basic unit that can be translated into a relational algebra expression and optimized.
❖ Distributed Reliability Protocols
Recovery in distributed database
Recovery is more complicated than in a centralised system.
The failures specific to distributed databases are:
- Loss of message
- Failure of the site at which a subtransaction is running
- Failure of communication link
→ The recovery system must ensure atomicity (all or none). The following protocol helps to guarantee this:
❖ Two phase commit protocol
The two phase commit protocol involves a coordinator site (CS) and the participating sites (PS), and contains two phases
1) Voting phase
In this, participating sites vote on whether they are ready to commit the transaction or not
Steps of voting phase are as follows
i) [T, Prepare] - CS sends this message to all participating sites to ask whether they are ready to commit T, and creates an entry
in its log file.
ii) [T, Ready] or [T, Not Ready] - each PS replies with one of these messages, depending on whether it is ready to commit or not, and keeps this entry in
its own log file.
2) Decision phase
In this, the coordinator site decides whether the transaction can be committed or has to be aborted.
i) If CS receives [T, Ready] from all participating sites, it commits transaction T.
ii) If CS receives at least one [T, Not Ready], it aborts transaction T.
- Failure of participating site:
i) If the site fails before sending [T, Ready], the transaction is aborted.
ii) If the site fails after sending [T, Ready], the transaction proceeds in the normal way.
- Recovery process when participating site restarts after failure:
PS checks its log file:
i) Log contains [T, commit] → redo the transaction; log contains [T, abort] → undo the transaction
ii) Log contains [T, ready] → the site contacts CS to find out whether to commit or abort T
iii) No such record in log → undo T and abort the transaction
- Failure of coordinator site:
All the participating sites communicate with each other to determine the status of T
i) Any site has [T, commit] in its log → commit T; any site has [T, abort] in its log → abort T
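The voting, decision and recovery rules above can be sketched in a few lines of Python. This is a single-process simulation under simplifying assumptions (no real network, no timeouts, one transaction T); the function names and the string form of the log entries are hypothetical and chosen only to mirror the notation used in these notes.

```python
def two_phase_commit(votes):
    """Simulate one round of 2PC.

    votes maps a participant name to True (ready) or False (not ready).
    Returns the decision, the participants' replies and the coordinator log.
    """
    log = ["[T, prepare]"]  # voting phase: CS asks every PS to prepare
    replies = {site: ("ready" if ok else "not ready") for site, ok in votes.items()}
    # decision phase: commit only if every participant voted ready
    decision = "commit" if all(votes.values()) else "abort"
    log.append(f"[T, {decision}]")
    return decision, replies, log


def recover_participant(log):
    """Action a restarted participant takes, based on its own log."""
    if "[T, commit]" in log:
        return "redo T"
    if "[T, abort]" in log:
        return "undo T"
    if "[T, ready]" in log:
        return "ask coordinator"  # outcome unknown locally
    return "undo T"  # no record: the site never voted, so T is aborted


decision, replies, cs_log = two_phase_commit({"site_a": True, "site_b": False})
```

A single "not ready" vote is enough to abort, while a participant that finds only [T, ready] in its log cannot decide on its own and has to ask the coordinator.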
➢ Companies need to handle huge amounts of data with high data transfer rates.
➢ Client-server and centralized systems are not efficient enough for this. The need to improve efficiency gave birth to the concept of Parallel
Databases.
➢ A parallel database system improves the performance of data processing by using multiple resources, such as multiple CPUs and disks,
in parallel.
➢ It also parallelizes many operations, such as data loading and query processing.
A parallel database is one in which multiple processors work in parallel on the database to provide its services.
A parallel database system seeks to improve performance through parallelization of various operations like loading data, building
indexes and evaluating queries. Parallel systems improve processing and I/O speeds by using multiple CPUs and disks in parallel.
Need :
Multiple resources like CPUs and Disks are used in parallel. The operations are performed simultaneously, as opposed
to serial processing. A parallel server can allow access to a single database by users on multiple machines. It also
performs many parallelization operations like data loading, query processing, building indexes, and evaluating
queries.
Advantages :
1. Performance Improvement –
By connecting multiple resources like CPU and disks in parallel we can significantly increase the performance of the
system.
2. High availability –
In the parallel database, nodes have less contact with each other, so the failure of one node doesn’t cause the failure of
the entire system. This amounts to significantly higher database availability.
4. Increase Reliability –
When one site fails, the execution can continue with another available site which has a copy of the data, making the
system more reliable.
❖ Working of parallel database
Parallel database works in step by step manner −
● Shared disk system uses multiple processors which can access multiple disks via an
intercommunication channel, and every processor has local memory.
● Each processor has its own memory so the data sharing is efficient.
● Systems built around this architecture are called clusters.
● Shared disk system has limited scalability as a large amount of data travels through the
interconnection channel.
● If more processors are added, the existing processors are slowed down.
Digital Equipment Corporation (DEC): DEC clusters running relational databases used the
shared disk system; the database product is now owned by Oracle.
3) Shared Nothing Disk System
● Each processor in the shared nothing system has its own local memory and local
disk.
● Processors can communicate with each other through intercommunication channel.
● Any processor can act as a server to serve the data which is stored on local disk.
● Any number of processors and disks can be connected as per the requirement in a shared
nothing disk system.
● Shared nothing disk system can support many processors, which makes the
system more scalable.
● Hierarchical model system is a hybrid of shared memory system, shared disk system and shared nothing system.
● Hierarchical model is also known as Non-Uniform Memory Architecture (NUMA).
● In this system each group of processors has a local memory, but processors from other groups can also access the memory associated
with that group in a coherent manner.
● NUMA uses local and remote memory (memory from other groups), so access to remote memory takes longer than access to local memory.
Advantages of NUMA
Disadvantages of NUMA
1. Response time: It is the time taken to complete a single task.
● Speed up is the reduction in the time taken to complete a running task when the degree of parallelism (resources) is increased.
● Adding more resources leads to less time for solving the same problem.
● Ideally, the time required for a running task is inversely proportional to the number of resources.
● Formula:
Speed up = TS / TL
Where, TS = Time required to execute the task on the smaller system & TL = Time required to execute the same task on the larger system (with N times the resources)
● Speed-up is linear if it equals N (the factor by which the resources are increased).
● Speed-up is sub-linear if speed-up is less than N.
Scale up formula:
Let Q be a task and QN a task that is N times bigger than Q
TS = Execution time of task Q on the smaller machine MS
TL = Execution time of task QN on the larger machine ML (N times larger than MS)
Scale Up = TS / TL
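The two ratios can be checked with a small Python sketch. The timings below are made-up illustrative numbers, not measurements, and the function names are hypothetical. Both ratios use TS / TL; they differ only in what TL measures: the same task on a larger system for speed up, and an N times larger task on an N times larger system for scale up.

```python
def speed_up(t_small, t_large):
    """Speed up = TS / TL for the same task on a small and a large system."""
    return t_small / t_large

def scale_up(t_small, t_large):
    """Scale up = TS / TL, with TL measured on an N-times bigger task and machine."""
    return t_small / t_large

def classify_speed_up(value, n):
    """Compare a measured speed up against the linear ideal of N."""
    return "linear or better" if value >= n else "sub-linear"

# Assumed numbers: 4 times the resources cut a 120 s task to 40 s.
s = speed_up(120.0, 40.0)
kind = classify_speed_up(s, 4)

# Assumed numbers: a 4x bigger task on a 4x bigger machine still takes 100 s.
u = scale_up(100.0, 100.0)
```

Here the speed up is 3 on 4 times the resources, so it is sub-linear, while a scale up of 1 means the larger system absorbed the larger task with no increase in elapsed time.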
❖ Benefits of parallel Database
The benefits of the parallel database are explained below −
➢ Speed
■ Speed is the main advantage of parallel databases. The server breaks up a user's database request into parts and
sends each part to a separate computer.
■ The computers work on the pieces, and the outputs are combined and returned to the user. This speeds up most
requests for data, so that large databases can be accessed more easily.
➢ Capacity
■ As more users request access to the database, the network administrators can add more machines to the parallel
server, increasing its overall capacity.
■ For example, a parallel database enables a large online store to serve information to thousands of users
at the same time. With a single server, this level of performance is not feasible.
➢ Reliability
■ Despite the failure of any computer in the cluster, a properly configured parallel database will continue to work. The
database server senses that there is no response from a single computer and redirects its function to the other
computers.
■ Many companies, such as online retailers, want their database to be accessible as fast as possible. This is where a
parallel database is a good fit.
■ This method also helps technicians conduct scheduled maintenance on a computer-by-computer basis. They send a
server command to take the affected machine out of service, then perform the required maintenance and updates.
❖ Benefits for queries
Parallel query processing can benefit the following types of queries −
➢ Select statements that scan large numbers of pages but output a few rows only.
➢ Select statements that include union, order by, or distinct, since these queries can populate worktables in parallel,
and can make use of parallel sorting.
➢ Select statements that use merge joins can use parallel processing for scanning tables and also for sorting and
merging.
➢ Select statements where the reformatting strategy is chosen by the optimizer, since these can populate worktables
in parallel, and can make use of parallel sorting.
➢ Create index statements, and the alter table - add constraint clauses that create indexes, unique and primary keys.
❖ Query Parallelism: I/O Parallelism (Data Partitioning)
● In this technique a query is divided into sub-queries which can run simultaneously on different processors; this minimizes the query
evaluation time.
● Intra query parallelism improves the response time of the system.
For Example: If a query is divided into 6 sub-queries that each take 3 seconds to evaluate, evaluating them one after another takes 18 seconds.
With intra query parallelism the 6 sub-queries run at the same time on different processors, so the evaluation can finish in about 3 seconds.
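The idea of splitting one query into sub-queries over partitions can be sketched with Python's standard concurrent.futures module. The snippet treats SUM over a list as the query, uses round-robin partitioning, and runs one sub-query per partition. It is a structural sketch only: the data is an in-memory list rather than a table, and threads in CPython will not give a real speed-up for this CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Split rows into n round-robin partitions."""
    return [rows[i::n] for i in range(n)]

def parallel_sum(rows, workers=6):
    """Evaluate SUM(rows) as one sub-query per partition, then combine."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, parts))  # sub-queries run concurrently
    return sum(partials)                       # final merge step

total = parallel_sum(list(range(1, 101)))
```

The final merge is cheap because each sub-query has already reduced its partition to a single partial result.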
Optimization of Parallel Query
● Parallel query optimization means selecting the most efficient query evaluation plan.
● Parallel query optimization plays an important role in minimizing the cost of query evaluation.
● Speed up queries by finding the evaluation plan which gives the fastest result on execution.
● Increase the performance of the system.
● Select the best query evaluation plan.
● Avoid the unwanted plan.
Object Oriented Database
➢ Object oriented database systems are alternative to
In OODBMS, every entity is considered an object and represented in a table. Similar objects are grouped into classes and subclasses, and the
relationship between two objects is maintained using the concept of inverse reference.
1. Complexity
OODBMS has the ability to represent the complex internal structure (of an object) with multilevel complexity.
2. Inheritance
Creating a new object from an existing object in such a way that new object inherits all characteristics of an existing object.
3. Encapsulation
It is a data hiding concept from OOPL which binds together the data and the functions that manipulate it, and keeps them hidden from the outside
world.
4. Persistency
OODBMS allows the creation of persistent objects (objects that continue to exist even after the program finishes executing). This feature helps to address the
problems of recovery and concurrency.
Advantages:
Supports Complex Data Structures: ODBMS is designed to handle complex data structures, such as inheritance, polymorphism, and encapsulation. This
makes it easier to work with complex data models in an object-oriented programming environment.
Improved Performance: ODBMS provides improved performance compared to traditional relational databases for complex data models. ODBMS can
reduce the amount of mapping and translation required between the programming language and the database, which can improve performance.
Reduced Development Time: ODBMS can reduce development time since it eliminates the need to map objects to tables and allows developers to work
directly with objects in the database.
Supports Rich Data Types: ODBMS supports rich data types, such as audio, video, images, and spatial data, which can be challenging to store and retrieve
in traditional relational databases.
Scalability: ODBMS can scale horizontally and vertically, which means it can handle larger volumes of data and can support more users.
Disadvantages:
Limited Adoption: ODBMS is not as widely adopted as traditional relational databases, which means it may be more challenging to find developers with
experience working with ODBMS.
Lack of Standardization: ODBMS lacks standardization, which means that different vendors may implement different features and functionality.
Cost: ODBMS can be more expensive than traditional relational databases since it requires specialized software and hardware.
Integration with Other Systems: ODBMS can be challenging to integrate with other systems, such as business intelligence tools and reporting software.
Scalability Challenges: ODBMS may face scalability challenges due to the complexity of the data models it supports, which can make it challenging to
partition data across multiple nodes.
What is Object?
❖ An object consists of attributes which describe the state of a real world object and the actions associated
with that object.
❖ An object is a uniquely identifiable entity that contains both the attributes that describe the state of a real world object and
the actions associated with it.
❖ An object typically has two components: state (value) and behaviour (operations).
❖ Hence it is somewhat similar to a program variable in a programming language, except that it will have a
complex data structure as well as specific operations defined by the programmer.
❖ Each object has to maintain information about current state of object and additionally has action and behaviour
that have to be modelled.
❖ The current state of object is described by one or more attributes(instance variables).
Characteristics of Object
Some important characteristics of an object are:
1. Object identifier
This is a system-generated identifier which is assigned when a new object is created.
2. Object Name
In addition to object identity, some objects may have a unique object name within a particular database.
The name is used to refer to different objects in the program.
With the help of the name, the user can also reach objects that are referenced by this object.
3. Structure of object
· Structure defines how the object is constructed using a constructor.
· In an object oriented database the state of a complex object can be constructed from other objects by using certain type constructors.
· Formally, an object is represented as (i,c,v), where 'i' is the object identifier, 'c' is the type constructor and 'v' is the current value of the object.
4. Object Lifetime
i) Transient object
In OOPL, objects which exist only during program execution are called transient objects.
For example: Variables in OOPL
Properties of OID
1. Uniqueness: No two objects in the system can have the same OID; it is generated automatically by the system.
● Complex Object
An object oriented database allows complex objects. A complex object basically means that there is no restriction on the
structure of the object; it can be constructed in whatever manner fits the database.
In an object oriented database, an object is represented by a triple of three elements:
O:<i,c,v>
Here i is the object identifier, c is the type constructor and v is the current value. The type constructor tells you how the object is constructed, i.e. it gives the basic structure of the
object.
1) Atom - The object stores an atomic value. It can be a number, a string, etc.
2) Set - Object stores a collection of values of the same type without duplication
Eg: {123,234,345}
3) Bag - Object stores a collection of values of the same type with duplication allowed
Eg: {123,234,345,123}
4) List - Ordered collection of items of the same type
Eg: [123,234,345]
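The (i, c, v) triple and the type constructors can be mimicked with Python's built-in collections, as a rough illustration. It follows the usual convention that a set holds no duplicates while a bag (multiset) may. The identifiers i1 to i4 and the Obj name are hypothetical; a real OODBMS generates OIDs itself.

```python
from collections import Counter, namedtuple

# (i, c, v): object identifier, type constructor, current value
Obj = namedtuple("Obj", ["oid", "constructor", "value"])

o1 = Obj("i1", "atom", 123)
o2 = Obj("i2", "set", frozenset({123, 234, 345}))      # duplicates are not kept
o3 = Obj("i3", "bag", Counter([123, 234, 345, 123]))   # duplicates are counted
o4 = Obj("i4", "list", [123, 234, 345])                # order is preserved
```

Counter is used for the bag because it records how many times each value occurs, which a plain set cannot do.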
1. Simple attributes
Attributes can be of primitive data type such as integer, string, real etc. which can take literal value.
Example: 'ID' is simple attribute and value is 07.
2. Complex attributes
Attributes which consist of collections or reference of other multiple objects are called as complex attributes.
Example: Collection of Employees consists of many employee names.
3. Reference attributes
Attributes that represent a relationship between objects and consist of value or collection of values are called as reference
attributes.
Example: Manager is reference of staff object.
Temporal Database
A temporal database is a database that stores some aspect of time information along with the data. In a temporal
database, each tuple in a relation is associated with time. It stores information about the states of the real world over time.
A conventional database does not store information about past states; it only stores information about current states.
Whenever the state of the database changes, the information in the database gets updated. In many fields, it is very
necessary to store information about past states. For example, a stock database must store information about past stock
prices for analysis. Without temporal support, such historical information has to be modelled manually in the schema.
There are various terminologies in the temporal database:
● Valid Time: The valid time is a time in which the facts are true with respect to the real world.
● Transaction Time: The transaction time is the time period during which the fact is stored in the
database.
● Decision Time: Decision time in the temporal database is the time at which the decision is made about the fact.
Temporal databases are often built on top of a relational database. But relational databases have some limitations for temporal
data, i.e. they do not provide support for complex temporal operations, and their query languages provide poor support for
temporal queries.
● Temporal database concept combines all database applications that make use of the time aspect while arranging information.
● Generally database models maintain some aspect of the real world having the current state of data and do not store information
about past states of the database.
● When the database state changes, the database gets updated and past information is automatically deleted but in many
applications it is necessary to maintain past information.
● Example: In a medical database we need to keep patients' medical history for treatment of patients.
● Temporal database applications have been developed since the early days of database design, but designing and implementing
the temporal concepts has mainly been left to application designers and developers.
Examples
Applications that use temporal databases are as follows.
(a) Healthcare database: Keeps track of patients' medical history for further treatment.
(b) Airline reservation system: In general, all reservation systems require time-related information such as reservation time, valid arrival time, departure time etc.
(c) Insurance database: History of all accidents and claims is stored along with corresponding time and date for further processing.
(d) Sales database: Sales database needs to maintain sales information along with date and time, which may be helpful for further marketing decisions.
(b) TIME: 'Time' data type stores time in the form of two digits for hours (HH) information, two digits for minutes (MM) information and two digits for seconds (SS)
information.
Formats :
• HH: MM : SS
• HH - MM - SS
(c) TIMESTAMP: 'Timestamp' combines above two data types together to store complete time information.
Format:
• DD: MM: YY HH: MM: SS
(d) INTERVAL: Interval refers to a period of time. It may span a few days, months or years.
EXAMPLE:
Date | Real world event | Database action | What the database shows
April 8, 1992 | Kannan’s father officially reports Kannan’s birth | Inserted: Person(Kannan, Delhi) | Kannan lives in Delhi
August 26, 2015 | After graduation, Kannan moves to Chennai, but forgets to register his new address | Nothing | Kannan lives in Delhi
December 27, 2015 | Kannan registers his new address | Updated: Person(Kannan, Chennai) | Kannan lives in Chennai
Transaction time:
- Transaction time records the time period during which a database entry is accepted as correct.
- Transaction time periods can only occur in the past or up to current time.
- In a transaction time table , records are never deleted.
- Only new records can be inserted, and existing ones updated by setting their transaction end time to show that they
are no longer current.
- In the Person table, this adds two columns: Transaction-From and Transaction-To.
- Transaction-From is the time a transaction was made, and Transaction-To is the time that the transaction was superseded.
- Together with valid time, this makes the table a bitemporal table.
Bi-Temporal Relations:
- Bitemporal relations contain both valid time and transaction time.
- This provides both historical and rollback information.
- Historical information (e.g. “Where did Kannan live in 1996?”) is provided by valid time.
- Rollback information (e.g. “In 1996, where did the database believe Kannan lived?”) is provided by transaction time.
- The valid time and Transaction time do not have to be the same for the single fact.
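The difference between the two time dimensions can be made concrete with a small Python sketch of the Kannan example. The row layout, the half-open intervals and the use of date.max for "until changed" are assumptions of this sketch. It also assumes that the birth was registered on the day it happened (April 8, 1992), since the example only gives the reporting date.

```python
from datetime import date

FOREVER = date.max  # stands in for "until changed"

# (city, valid_from, valid_to, tx_from, tx_to); intervals are half-open [from, to)
PERSON_KANNAN = [
    ("Delhi",   date(1992, 4, 8),  FOREVER,           date(1992, 4, 8),   date(2015, 12, 27)),
    ("Delhi",   date(1992, 4, 8),  date(2015, 8, 26), date(2015, 12, 27), FOREVER),
    ("Chennai", date(2015, 8, 26), FOREVER,           date(2015, 12, 27), FOREVER),
]

def city(valid_on, known_on):
    """Where Kannan lived on valid_on, according to the database as of known_on."""
    for name, v_from, v_to, t_from, t_to in PERSON_KANNAN:
        if v_from <= valid_on < v_to and t_from <= known_on < t_to:
            return name
    return None

believed_then = city(date(2015, 10, 1), known_on=date(2015, 10, 1))
known_now = city(date(2015, 10, 1), known_on=date(2020, 1, 1))
```

For October 2015 the database believed at the time that Kannan lived in Delhi, while the corrected records later show Chennai. The first row is never deleted, only closed off by setting its transaction end time.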
Types of temporal databases
(a) Valid time temporal database
Valid time is defined as time at which particular event occurred or duration during which particular event is considered to be true.
A temporal database that uses valid time is called a valid time temporal database.
Emp_validTime (Emp VT)
Eid | Ename | Salary | DNo | VST | VET | TST | TET
1 | Akshaya | 50000 | 10 | 01-01-99 | Now | 30-10-99 | 15-12-99
2 | Amruta | 20000 | 40 | 01-01-01 | 01-01-02 | 01-01-01 | UC
• As shown in the above table, tuples whose transaction end time (TET) is UC represent currently valid information.
1. An EMPLOYEE table records the department that each employee is assigned to. If an employee is
transferred to another department at some point in time, this can be tracked if the EMPLOYEE table is an
application time-period table that assigns the appropriate time period to each department he/she works for.
Temporal Relation
A temporal relation is defined as a relation in which each tuple in a table of the database is associated with time, the
time can be either transaction time or valid time.
Types of Temporal Relation
1. Uni-Temporal Relation: The relation which is associated with valid or transaction time is called Uni-Temporal relation. It is related to only one time.
2. Bi-Temporal Relation: The relation which is associated with both valid time and transaction time is called a Bi-Temporal relation. Valid time has two parts namely start
time and end time, similar in the case of transaction time.
3. Tri-Temporal Relation: The relation which is associated with three aspects of time, namely valid time, transaction time, and decision time, is called a Tri-Temporal
relation.
● The temporal database provides built-in support for the time dimension.
● Temporal database stores data related to the time aspects.
● A temporal database contains historical data as well as current data.
● It provides a uniform way to deal with historical data.
1. Data Storage: In temporal databases, each version of the data needs to be stored separately. As a result, storing the data in temporal databases requires
more storage as compared to storing data in non-temporal databases.
2. Schema Design: The temporal database schema must accommodate the time dimension. Creating such a schema is more difficult than creating a schema
for non-temporal databases.
3. Query Processing: Processing the query in temporal databases is slower than processing the query in non-temporal databases due to the additional
complexity of managing temporal data.
Active Databases
➢ An active database is a database that includes a set of triggers.
➢ These databases are very difficult to maintain because of the
complexity that arises in understanding the effect of these triggers.
➢ In such a database, before executing a statement that modifies the database,
the DBMS first checks whether any trigger specified for that statement is
activated.
➢ If the trigger is active then DBMS evaluates the condition part and then
executes the action part only if the specified condition evaluates to true.
➢ It is possible to activate more than one trigger within a single statement. In
such a situation, DBMS processes the triggers in an arbitrary order.
➢ The execution of the action part of a trigger may either activate other
triggers or the same trigger that initiated this action.
➢ A trigger that activates itself is called a ‘recursive trigger’.
➢ The DBMS executes such chains of triggers in some predefined manner, but
this makes their overall effect harder to understand.
Features of Active Database:
1. It possesses all the concepts of a conventional database, i.e. data modelling facilities, query language, etc.
2. It supports all the functions of a traditional database, such as data definition, data manipulation, storage
management, etc.
3. It supports the definition and management of ECA (Event-Condition-Action) rules.
4. It detects event occurrences.
5. It must be able to evaluate conditions and to execute actions.
6. In other words, it has to implement rule execution.
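The event, condition and action parts of a trigger can be sketched with Python's built-in sqlite3 module. This is only an illustration: the stock and reorder_log tables, the low_stock trigger and the threshold of 10 are hypothetical names and values chosen for the example.

```python
import sqlite3


def run_eca_demo():
    """Create one ECA rule and return the rows written by its action part."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER);
        CREATE TABLE reorder_log (item TEXT, qty INTEGER);

        -- Event: an UPDATE of stock.qty
        -- Condition: the new quantity is below 10
        -- Action: record the item in reorder_log
        CREATE TRIGGER low_stock
        AFTER UPDATE OF qty ON stock
        WHEN NEW.qty < 10
        BEGIN
            INSERT INTO reorder_log (item, qty) VALUES (NEW.item, NEW.qty);
        END;

        INSERT INTO stock VALUES ('chips', 60);
        """
    )
    # Event occurs, condition is false: the action is skipped.
    conn.execute("UPDATE stock SET qty = 50 WHERE item = 'chips'")
    # Event occurs, condition is true: the action runs.
    conn.execute("UPDATE stock SET qty = 5 WHERE item = 'chips'")
    rows = conn.execute("SELECT item, qty FROM reorder_log").fetchall()
    conn.close()
    return rows
```

Only the second update satisfies the condition, so the log ends up with a single row for that update.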
2. In-Memory Databases:
● SAP HANA: A column-oriented, in-memory relational database management system for processing large amounts of data and
real-time analytics.
● MemSQL: Uses in-memory processing for real-time data insights, combining analytics and transactions on a single platform.
3. Transactional Databases:
● MySQL Cluster: Offers automatic sharding and synchronous replication for high availability and real-time data access.
● Microsoft SQL Server with Always On: High availability and disaster recovery are provided by Microsoft SQL Server with
Always On, which enables real-time read access to replicated databases.
4. Time-series Databases:
● InfluxDB: For time-stamped data, InfluxDB is designed to withstand heavy write and query loads. It is frequently utilized in IoT
and monitoring applications.
● Prometheus: A toolkit for alerting and monitoring that keeps track of time series data and is used to analyze and monitor
systems in real time.
These databases and platforms support a variety of real-time data handling requirements, including high-throughput
stream processing, low-latency transaction processing, and event-driven architectures.
Advantages of Active Databases:
1. Enhances traditional database functionality with powerful rule-processing capabilities.
2. Enables a uniform and centralized description of the business rules relevant to the information system.
3. Avoids redundancy of checking and repair operations.
4. Provides a suitable platform for building large and efficient knowledge bases and expert systems.
XML DATABASES
● XML is a markup language much like HTML, but XML was designed to carry (transfer)
data, not to display it.
● XML stands for Extensible Markup Language. It provides a mechanism for defining the structure of
a document that is to be transferred over the Internet.
● XML defines a standard way of adding elements to documents. Hence, XML is used for
structured documentation.
● Unlike HTML tags, XML tags are not predefined; users can define their own tags in XML.
➢ Goals of XML
● XML should be directly usable over the Internet. Users must be able to view XML
documents as easily as HTML documents. This is possible only when XML
browsers are as robust and widely available as HTML browsers.
● XML should support a wide variety of applications.
● It should be easy to write programs that process XML documents.
● The number of optional features in XML should be kept to a minimum, since optional
features cause confusion for programmers.
● XML documents should be logically clear.
● The design of XML should be formal and concise, and it should be possible to prepare it quickly.
● XML documents should be easy to create.
➢ Well Formed Document
• XML documents that satisfy the rules given below are called well-formed XML
documents.
(a) An XML document should start with an XML declaration that indicates the version of
XML being used and any other relevant attributes.
<?xml version="1.0" standalone="yes"?>
(b) A well-formed XML document should be syntactically correct.
(c) Conditions for Tree Model / syntactically correct XML document
• There should be only one root element.
• Every element must be enclosed in a matching pair of start and end tags, nested within the start and
end tags of its parent element.
• Above conditions will ensure that the nested elements specify a well-formed tree
structure.
(d) User defined Tags
• An element within an XML document can have any tag name specified by the user. There
is no predefined set of tag names that a program processing the document
can expect.
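The well-formedness rules above can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree parser, which rejects documents that break the single-root or proper-nesting rules; the function name is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One root element, tags properly nested.
GOOD = (
    '<?xml version="1.0" standalone="yes"?>'
    "<inventory><drink><price>2.50</price></drink></inventory>"
)
# End tags close in the wrong order, so the elements do not form a tree.
BAD_NESTING = "<inventory><drink></inventory></drink>"
# Two top-level elements, so there is no single root.
TWO_ROOTS = "<drink></drink><snack></snack>"
```

A well-formedness check says nothing about validity: a document can pass this test and still violate its DTD or schema.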
➢ Valid XML Documents
XML documents that satisfy the conditions given below are called valid XML documents.
(a) Conditions
• The document must be well formed.
• The elements used in the start and end tag pairs must follow a specified structure.
• This structure is specified in a separate XML DTD (Document Type Definition) file or XML schema
file.
(b) DTD Syntax
• A DTD starts with the name given to the root tag of the document.
• Then the elements and their nested structures are specified in a top-down fashion.
(a) Element
● An element is a pair of tags together with its content, which can be character data, child elements, or a
mixture of both.
● Elements can be of two types: simple and complex.
● Elements constructed from other elements by nesting them are called complex elements.
● Elements containing data values are called simple elements.
(b) Attributes
Additional information that describes elements.
(c) There are some additional concepts used in XML, such as entities, identifiers, and references.
➢ Major differences between XML and HTML
Tag names: In HTML, tag names describe how text is to be displayed; in
XML, tag names are defined to describe the meaning of the data elements in the document.
Processing: Because tags are user defined and describe meaning, the data elements in an
XML document can be processed automatically by computer applications.
1) Tree Representation
The XML model is made up of two kinds of elements:
(i) Complex elements are shown as internal nodes.
(ii) Simple elements are shown as leaf nodes.
Hence, the XML model is called a tree model or hierarchical model.
○ Example
(a) Tree Representation
• Simple elements: <price>, <amount>
• Complex elements: <drink>, <snack>, etc.
2) Textual Representation
• When the value of the standalone attribute in the XML declaration is set to "yes", the document is
known as a schemaless XML document.
E.g.
<inventory>
  <drink>
    <lemonade>
      <price>$2.50</price>
      <amount>20</amount>
    </lemonade>
    <pop>
      <price>$1.50</price>
      <amount>10</amount>
    </pop>
  </drink>
  <snack>
    <chips>
      <price>$4.50</price>
      <amount>60</amount>
    </chips>
  </snack>
</inventory>
➢ Types of XML documents
● Data-centric XML documents are sometimes considered semi-structured data and sometimes structured data.
● A document is considered structured data if it is written as per a predefined XML schema or DTD.
● A document is considered semi-structured data if it does not have to conform to any particular schema.
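The tree model of the inventory document shown earlier can be explored with a short sketch. It uses Python's standard xml.etree.ElementTree and treats an element with no children as simple (a leaf node) and any other element as complex (an internal node); classify_elements is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

INVENTORY_XML = """<inventory>
  <drink>
    <lemonade><price>$2.50</price><amount>20</amount></lemonade>
    <pop><price>$1.50</price><amount>10</amount></pop>
  </drink>
  <snack>
    <chips><price>$4.50</price><amount>60</amount></chips>
  </snack>
</inventory>"""


def classify_elements(xml_text):
    """Split tag names into simple (leaf) and complex (internal) elements."""
    root = ET.fromstring(xml_text)
    simple, complex_ = [], []
    # iter() walks the tree in document order, starting at the root.
    for elem in root.iter():
        if len(elem) == 0:      # no child elements: leaf node
            simple.append(elem.tag)
        else:                   # has child elements: internal node
            complex_.append(elem.tag)
    return simple, complex_
```

For this document the internal nodes are inventory, drink, lemonade, pop, snack and chips, while every price and amount element is a leaf.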
A spatial database system has the following characteristics:
• It is a database system.
• It offers spatial data types (SDTs) in its data model and query language.
• It supports spatial data types in its implementation, providing at least spatial indexing and efficient algorithms for spatial join.
Example
A road map is a visualization of geographic information. A road map is a 2-dimensional object which contains points, lines, and polygons
that can represent cities, roads, and political boundaries such as states or provinces.
Vector data: This data is represented as discrete points, lines and polygons
Raster data: This data is represented as a matrix of square cells.
The spatial data in the form of points, lines, polygons etc. is used by many different databases as shown above.
The spatial DBMS makes spatial data management simple for user and application.
A spatial database supports concepts for databases that keep track of objects in a multidimensional
space.
Only a limited set of data types and operations is available for spatial applications, which makes modeling
real-world spatial applications extremely difficult.
A common example of spatial data is a map of railway tracks. Such a map is a two-dimensional object that contains points
and lines representing cities and railway routes.
2) Spatial databases
a. Cartographic databases
These databases store maps that include two-dimensional spatial descriptions of their objects, from countries and
states to rivers, cities, roads, seas, and so on.
b. Meteorological databases
Weather information is 3D, since temperatures and other meteorological information are related to three
dimensional spatial points.
3) Spatial database management
• The spatial relationships among the objects are important, and they are often needed when querying the database.
• Some extensions that are needed for spatial databases are models that can interpret spatial characteristics.
• In addition, special indexing and storage structures are often used for improving performance.
• The basic extensions required include two-dimensional geometric concepts, such as points, lines, circles, and polygons, in order to specify the
spatial characteristics.
Performance factors
• For better performance special techniques for spatial indexing are needed.
• One of the best-known techniques is the use of R-trees and their variations.
• R-trees group together objects that are in close spatial physical proximity on the same leaf nodes of a tree-structured index.
• Typical criteria for dividing the space include minimizing the rectangle areas, since this would lead to a quicker narrowing of the search space.
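The rectangle-area criterion can be illustrated with a small sketch. It is not a full R-tree: it only shows the common heuristic of placing a new object in the leaf whose bounding rectangle needs the least area enlargement. The Rect class and the choose_leaf function are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other: "Rect") -> "Rect":
        """Smallest rectangle that covers both self and other."""
        return Rect(
            min(self.xmin, other.xmin), min(self.ymin, other.ymin),
            max(self.xmax, other.xmax), max(self.ymax, other.ymax),
        )


def enlargement(leaf: Rect, entry: Rect) -> float:
    """Extra area the leaf rectangle needs in order to cover entry."""
    return leaf.union(entry).area() - leaf.area()


def choose_leaf(leaves, entry: Rect) -> int:
    """Index of the leaf needing least enlargement (ties: smaller area)."""
    return min(
        range(len(leaves)),
        key=lambda i: (enlargement(leaves[i], entry), leaves[i].area()),
    )
```

Keeping rectangles small means that a search rectangle overlaps fewer leaves, which is why the search space narrows quickly.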
4) Types of data
a. Point data
• A point has a spatial extent characterized completely by its position or location.
• Point data can be a collection of points in multidimensional space.
• Point data stored in a database can be based on direct measurements or derived by transforming data obtained through measurements.
• Raster data is an example of directly measured point data.
b. Region data
• Region data has a spatial extent represented by location and boundaries.
• The location can be shown as the position of fixed points for the region.
• The boundary can be represented as a line in 2D space and as surface in 3D space.
NoSQL is a type of database management system (DBMS) that is designed to handle and store large volumes of unstructured
and semi-structured data. Unlike traditional relational databases that use tables with predefined schemas to store data, NoSQL
databases use flexible data models that can adapt to changes in data structures and are capable of scaling horizontally to
handle growing amounts of data.
The term NoSQL originally referred to “non-SQL” or “non-relational” databases, but the term has since evolved to mean “not
only SQL,” as NoSQL databases have expanded to include a wide range of different database architectures and data models.
1. Document databases: These databases store data as semi-structured documents, such as JSON or XML, and can be
queried using document-oriented query languages.
2. Key-value stores: These databases store data as key-value pairs, and are optimized for simple and fast read/write
operations.
3. Column-family stores: These databases store data as column families, which are sets of columns that are treated as
a single entity. They are optimized for fast and efficient querying of large amounts of data.
4. Graph databases: These databases store data as nodes and edges, and are designed to handle complex
relationships between data.
Four Types of NoSQL Database
1. Key-value store databases
• This is a very simple NoSQL database.
• It is designed for storing schema-free data.
• Each data item is stored as a value together with an indexed key.
This type is generally used when you need fast performance for
basic Create-Read-Update-Delete operations and the data is not
interconnected.
Example:
•Storing and retrieving session information for web pages
•Storing user profiles and preferences
•Storing shopping cart data for e-commerce
Limitations
•It may not work well for complex queries that try to connect
multiple relations of data.
•If the data contains many many-to-many relationships, a key-value store
is likely to show poor performance.
Examples
• Redis
• Azure Table Storage (ATS)
• DynamoDB
Example of unstructured data for user records
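A key-value store can be pictured as a dictionary with Create-Read-Update-Delete operations on whole values. The sketch below is an in-memory illustration only; the KeyValueStore class and the session key format are assumptions made for the example, not the API of any product listed above.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: opaque values looked up by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Create or update the value stored under key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Read the value stored under key, or default if absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Delete key; return True if it existed."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()
# Records need not share a schema: each value is just whatever was stored.
store.put("session:42", {"user": "alice", "cart": ["chips", "lemonade"]})
store.put("session:43", {"user": "bob", "theme": "dark"})
```

The store can only fetch by key. Finding every session whose cart contains chips would require reading all values, which is the limitation noted above for queries that connect data.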
2. Column store database
• Instead of storing data in relational tuples (table rows), it is stored in
cells grouped in columns.
• It offers very high performance and a highly scalable architecture.
Examples:
1. HBase
2. Bigtable
3. Hypertable
Use Cases
• Common use cases for a column-family database include event
logging and blogs, as with document databases, but the data is
stored in a different fashion.
• In logging, every application can write its own set of columns and
have each row key formatted in a way that promotes easy lookup
based on application and timestamp.
• Counters are another distinctive use case: an application can use
the store as an easy way to count or increment as events
occur.
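The logging and counter use cases can be sketched with nested dictionaries. This is a toy model, not the API of HBase or Bigtable: the ColumnFamily class, the "application#timestamp" row-key format and the column names are all assumptions made for illustration.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column name -> value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def increment(self, row_key, column, amount=1):
        """Counter column: add amount and return the new total."""
        row = self._rows[row_key]
        row[column] = row.get(column, 0) + amount
        return row[column]

    def scan_prefix(self, prefix):
        """Rows whose key starts with prefix, in sorted key order."""
        return [(k, dict(self._rows[k]))
                for k in sorted(self._rows) if k.startswith(prefix)]


def log_row_key(app, timestamp):
    """Row key that sorts by application first, then by timestamp."""
    return f"{app}#{timestamp}"


logs = ColumnFamily()
# Each application writes its own set of columns.
logs.put(log_row_key("billing", "2024-01-01T10:00:00"), "level", "ERROR")
logs.put(log_row_key("billing", "2024-01-01T10:00:00"), "invoice_id", 17)
logs.put(log_row_key("auth", "2024-01-01T10:00:05"), "user", "alice")
```

Because the application name leads the row key, all log rows for one application sit next to each other and can be read with a single prefix scan.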
3. Document database
• Document databases build on the concept of key-value stores, where the
values are "documents" that contain complex data.
• Every document has a unique key, which is used to retrieve the
document.
• The key is used for storing, retrieving, and managing document-oriented
information, also known as semi-structured data.
Examples:
MongoDB
CouchDB
• Examples of such systems are an event logging system for
an application, or online blogging.
• In online blogging, each user is a document, each post is a
document, and each comment, like, or action is a document.
• Each document contains information such as the type of data,
username, post content, or timestamp of document creation.
Limitations
• It is challenging for a document store to handle a transaction that spans
multiple documents.
• Document databases may not be a good choice if the data must be queried in
aggregate.
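The blogging example and the aggregation limitation can be sketched together. The DocumentStore class below is a hypothetical in-memory model that keeps each document as a JSON string under a unique key; it does not reproduce the MongoDB or CouchDB APIs.

```python
import json
from collections import Counter


class DocumentStore:
    """Toy document store: unique key -> JSON document."""

    def __init__(self):
        self._docs = {}

    def save(self, key, document):
        self._docs[key] = json.dumps(document)

    def load(self, key):
        """Retrieve one document by its key (the cheap operation)."""
        return json.loads(self._docs[key])

    def count_by_type(self):
        """Aggregate over all documents: every document must be read."""
        return Counter(json.loads(raw)["type"] for raw in self._docs.values())


blog = DocumentStore()
blog.save("user:1", {"type": "user", "username": "asha"})
blog.save("post:1", {"type": "post", "username": "asha",
                     "content": "Hello", "created": "2024-01-01"})
blog.save("comment:1", {"type": "comment", "username": "ravi", "post": "post:1"})
blog.save("comment:2", {"type": "comment", "username": "asha", "post": "post:1"})
```

Loading one document touches a single key, whereas count_by_type has to decode every document, which is why aggregate queries fit this model less well.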
4. Graph database
• Data is stored as a graph: each entity acts as a node, and the relationships
between entities are stored as links (edges) between the nodes.
Examples:
Neo4j
Polyglot
Use Cases
•A very important and popular application is social networking
sites, which benefit from quickly locating friends, friends of friends,
likes, and so on.
• Mapping services such as Google Maps can use graphs to model their data
for finding nearby locations or building shortest routes for directions.
•Many recommendation systems make effective use of this model.
Limitations
• Graph databases do not always offer a better choice than the other
NoSQL variants.
• If an application needs to scale horizontally, a graph database may show poor
performance.
• They are not very efficient when all nodes with a given
parameter need to be updated.
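The friends-of-friends use case is a two-step traversal over an adjacency structure. The sketch below uses a plain dictionary as the graph; the names in it are invented, and a real graph database such as Neo4j would express the same traversal in its own query language.

```python
def friends_of_friends(graph, person):
    """People exactly two links away from person.

    graph maps each node to the list of nodes it is linked to.
    """
    direct = set(graph.get(person, ()))
    second = set()
    for friend in direct:
        # Follow each friend's links one more step.
        second.update(graph.get(friend, ()))
    # Drop the person themselves and anyone who is already a direct friend.
    return second - direct - {person}


SOCIAL = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan", "eve"],
    "dan": ["bob", "cat"],
    "eve": ["cat"],
}
```

Each step only follows links out of nodes already reached, so the cost depends on the size of the neighbourhood rather than on the size of the whole data set.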
5. Comparison
NoSQL databases are often used in applications where there is a high volume of data that needs to be processed and
analyzed in real-time, such as social media analytics, e-commerce, and gaming. They can also be used for other
applications, such as content management systems, document management, and customer relationship management.
However, NoSQL databases may not be suitable for all applications, as they may not provide the same level of data
consistency and transactional guarantees as traditional relational databases. It is important to carefully evaluate the
specific needs of an application when choosing a database management system.
NoSQL, originally referring to "non-SQL" or "non-relational", is a kind of database that provides a mechanism for storage and
retrieval of data that is modeled by means other than the tabular relations used in relational databases. Such
databases have existed since the late 1960s, but did not obtain the NoSQL moniker until a surge of popularity in
the early twenty-first century. NoSQL databases are used in real-time web applications and big data, and their use is
increasing over time.
● NoSQL systems are also sometimes called "Not only SQL" to emphasize the fact that they may support SQL-like query languages. The advantages of a NoSQL database
include simplicity of design, simpler horizontal scaling to clusters of machines, and finer control over availability. The data structures used by NoSQL
databases are different from those used by default in relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL
database depends on the problem it should solve.
● NoSQL databases, also known as “not only SQL” databases, are a newer type of database management system that has gained popularity in recent years.
Unlike traditional relational databases, NoSQL databases are designed to handle large amounts of unstructured or semi-structured data, and they can
accommodate dynamic changes to the data model. This makes NoSQL databases a good fit for modern web applications, real-time analytics, and big data
processing.
● Data structures used by NoSQL databases are sometimes also viewed as more flexible than relational database tables. Many NoSQL stores compromise
consistency in favor of availability, speed, and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level query
languages, lack of standardized interfaces, and huge previous investments in existing relational databases.
● Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability) transactions, but a few databases, such as MarkLogic, Aerospike, FairCom
c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them central to their designs.
● Most NoSQL databases offer a concept of eventual consistency, in which database changes are propagated to all nodes eventually, so queries for data might not return
updated data immediately or might read data that is not accurate, a problem known as stale reads. Also, some NoSQL systems may
exhibit lost writes and other forms of data loss. Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss.
● One simple example of a NoSQL database is a document database. In a document database, data is stored in documents rather than tables. Each document
can contain a different set of fields, making it easy to accommodate changing data requirements.
● For example, consider a database that holds data about employees. In a relational database, this information might be stored in tables, with
one table for employee information and another table for department information. In a document database, each employee would be stored as a separate
document, with all of their information contained within the document.
● NoSQL databases are a relatively new type of database management system that has gained popularity in recent years due to their scalability and
flexibility. They are designed to handle large amounts of unstructured or semi-structured data and can handle dynamic changes to the data model. This
makes NoSQL databases a good fit for modern web applications, real-time analytics, and big data processing.
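The employee example above can be made concrete with plain Python data structures. This is only a sketch: the table and field names, and the sample employees, are invented for illustration and do not come from any particular database.

```python
# Relational layout: two tables linked by dept_id.
EMPLOYEES = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ravi", "dept_id": 20},
]
DEPARTMENTS = [
    {"dept_id": 10, "dept_name": "Research"},
    {"dept_id": 20, "dept_name": "Sales"},
]


def relational_lookup(emp_id):
    """Answer the query by joining the employee row to its department row."""
    emp = next(e for e in EMPLOYEES if e["emp_id"] == emp_id)
    dept = next(d for d in DEPARTMENTS if d["dept_id"] == emp["dept_id"])
    return {"name": emp["name"], "department": dept["dept_name"]}


# Document layout: one self-contained document per employee.
# Documents may carry different fields (skills vs. phone).
EMPLOYEE_DOCS = {
    1: {"name": "Asha", "department": "Research", "skills": ["SQL", "XML"]},
    2: {"name": "Ravi", "department": "Sales", "phone": "555-0100"},
}


def document_lookup(emp_id):
    """Answer the same query from a single document, with no join."""
    doc = EMPLOYEE_DOCS[emp_id]
    return {"name": doc["name"], "department": doc["department"]}
```

Both lookups return the same answer. The document layout avoids the join but repeats the department name in every employee document.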
Key Features of NoSQL:
1. Dynamic schema: NoSQL databases do not have a fixed schema and can accommodate changing data
structures without the need for migrations or schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to a database
cluster, making them well-suited for handling large amounts of data and high levels of traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use a document-based data model, where
data is stored in a semi-structured format, such as JSON or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, use a key-value data model, where data is stored
as a collection of key-value pairs.
5. Column-based: Some NoSQL databases, such as Cassandra, use a column-based data model, where data is
organized into columns instead of rows.
6. Distributed and high availability: NoSQL databases are often designed to be highly available and to
automatically handle node failures and data replication across multiple nodes in a database cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and dynamic manner,
with support for multiple data types and changing data structures.
8. Performance: NoSQL databases are optimized for high performance and can handle a high volume of reads
and writes, making them suitable for big data and real-time applications.
Advantages of NoSQL: There are many advantages of working with NoSQL databases such as MongoDB and Cassandra. The main
advantages are high scalability and high availability.
1. High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means partitioning the data and placing the partitions on multiple machines in
such a way that the order of the data is preserved. Vertical scaling means adding more resources to the existing
machine, whereas horizontal scaling means adding more machines to handle the data. Vertical scaling is not that easy to
implement, whereas horizontal scaling is comparatively easy to implement. Examples of horizontal scaling databases are MongoDB, Cassandra, etc.
NoSQL can handle huge amounts of data because of this scalability: as the data grows, NoSQL scales itself automatically to handle that
data in an efficient manner.
2. Flexibility: NoSQL databases are designed to handle unstructured or semi-structured data, which means that they can
accommodate dynamic changes to the data model. This makes NoSQL databases a good fit for applications that need to handle
changing data requirements.
3. High availability: The auto-replication feature in NoSQL databases makes them highly available, because in case of any failure the data
replicates itself to the previous consistent state.
4. Scalability: NoSQL databases are highly scalable, which means that they can handle large amounts of data and traffic with ease.
This makes them a good fit for applications that need to handle large amounts of data or traffic.
5. Performance: NoSQL databases are designed to handle large amounts of data and traffic, which means that they can offer
improved performance compared to traditional relational databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective than traditional relational databases, as they are typically
less complex and do not require expensive hardware or software.
7. Agility: Ideal for agile development.
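Sharding as described in the first advantage, where data is split across machines while its order is preserved, can be sketched as range partitioning. In the sketch below each shard is an in-memory dictionary standing in for a separate machine; RangeShardedStore and the split points are hypothetical, and real systems such as MongoDB or Cassandra use their own partitioning schemes.

```python
from bisect import bisect_right


class RangeShardedStore:
    """Toy range-sharded store: each shard owns a contiguous key range."""

    def __init__(self, split_points):
        # n split points define n + 1 shards.
        self._split_points = sorted(split_points)
        self._shards = [dict() for _ in range(len(self._split_points) + 1)]

    def shard_for(self, key):
        """Index of the shard whose key range contains key."""
        return bisect_right(self._split_points, key)

    def put(self, key, value):
        self._shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self._shards[self.shard_for(key)].get(key)

    def scan(self):
        """Yield all items in key order by visiting shards left to right."""
        for shard in self._shards:
            for key in sorted(shard):
                yield key, shard[key]


cluster = RangeShardedStore(split_points=["h", "p"])
for name in ["zoe", "alice", "henry", "bob", "pat"]:
    cluster.put(name, len(name))
```

Adding a split point adds a shard, which is the horizontal scaling step. Because the ranges are ordered, a full scan still returns keys in sorted order.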
Disadvantages of NoSQL: NoSQL has the following disadvantages.
1. Lack of standardization: There are many different types of NoSQL databases, each with its own unique strengths and weaknesses. This lack of standardization can make it
difficult to choose the right database for a specific application.
2. Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which means that they do not guarantee the consistency, integrity, and durability of data. This can
be a drawback for applications that require strong data consistency guarantees.
3. Narrow focus: NoSQL databases have a very narrow focus: they are mainly designed for storage and provide very little other functionality. Relational databases are a better choice in
the field of transaction management than NoSQL.
4. Open-source: Many NoSQL databases are open source, and there is no reliable standard for NoSQL yet. In other words, two database systems are likely to be unequal.
5. Lack of support for complex queries: NoSQL databases are not designed to handle complex queries, which means that they are not a good fit for applications that require
complex data analysis or reporting.
6. Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional relational databases. This can make them less reliable and less secure than
traditional databases.
7. Management challenge: The purpose of big data tools is to make the management of a large amount of data as simple as possible. But it is not so easy. Data management in
NoSQL is much more complex than in a relational database. NoSQL, in particular, has a reputation for being challenging to install and even more hectic to manage on a daily
basis.
8. GUI is not available: GUI tools for accessing the database are not widely available in the market.
9. Backup: Backup is a weak point for some NoSQL databases. With MongoDB, for example, taking a consistent backup of a distributed deployment requires extra care.
10. Large document size: Some database systems like MongoDB and CouchDB store data in JSON format. This means that documents are quite large (BigData, network
bandwidth, speed), and having descriptive key names actually hurts since they increase the document size.
Types of NoSQL database: Types of NoSQL databases and the name of the database system that falls in that category are:
In conclusion, NoSQL databases offer several benefits over traditional relational databases, such as scalability,
flexibility, and cost-effectiveness. However, they also have several drawbacks, such as a lack of standardization, lack
of ACID compliance, and lack of support for complex queries. When choosing a database for a specific application, it
is important to weigh the benefits and drawbacks carefully to determine the best fit.