DBMS_Unit I
Data:- Data is raw facts that can be recorded and stored. Data consists of facts such as
text, graphics, images, video and audio segments that have meaning in the user's
environment.
File System :- A file system is a type of software that allows users to access and organize
small groups of data. It is usually integrated into a computer's operating system and is
responsible for storing and retrieving files from a storage medium, such as a hard disk or
flash drive.
A file based system is a collection of application programs that perform services for
the users wishing to access information. Each program within a file based system defines and
manages its own data.
File based systems were developed as better alternatives to paper based filing systems. By
having files stored on computers, the data could be accessed more efficiently. It was
common practice for larger companies to have each of its departments looking after its
own data.
Data Inconsistency:- There are two reasons for inconsistency: i) more than one person modifies
the data simultaneously; ii) wrong data is entered. For example, consider a person who has both
a savings and a current account, each recorded in a separate file. If the person's address is
changed in only one file, the two files disagree, which creates data inconsistency.
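The address example can be sketched in a few lines of Python. This is only an illustration: the two dictionaries stand in for the separate savings and current account files, and the customer ID, name and addresses are made-up values.

```python
# Two separate "files", each keeping its own copy of the customer's address.
# The file names, customer ID and addresses are hypothetical.
savings_file = {"C101": {"name": "Ravi", "address": "12 Lake Road"}}
current_file = {"C101": {"name": "Ravi", "address": "12 Lake Road"}}

# The address change is applied to only one of the files.
savings_file["C101"]["address"] = "7 Hill Street"


def is_consistent(customer_id):
    """Return True when both files hold the same address for the customer."""
    return (savings_file[customer_id]["address"]
            == current_file[customer_id]["address"])


consistent = is_consistent("C101")
print("Addresses agree:", consistent)
```

Because each file is maintained on its own, nothing forces the second copy to follow the first.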
Difficulty in Accessing Data:- It becomes difficult to access data when data is stored in
different files. When data is stored in more than one file it becomes extremely difficult to
access data as large amount of data has to be searched.
Limited Data Sharing:- Data are stored in different files. Different files may have
different formats and these files may be stored in different folders. So, due to this data
isolation, it is difficult to share data among different applications.
Integrity Problems:- Data integrity means that the data contained in the database is both
correct and consistent. For this purpose, the stored data must satisfy certain consistency
constraints. For example, the balance of any account should not be less than 0. In a file
based system, this rule has to be enforced by adding appropriate code to every program that
updates the balance.
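For comparison, a DBMS lets such a rule be declared once in the schema instead of being coded in each program. The sketch below is an illustration only, using Python's built-in sqlite3 module; the account table and its column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance must not be negative" is declared once, in the schema.
conn.execute(
    "CREATE TABLE account ("
    " acc_no TEXT PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A1', 500)")

try:
    # This would make the balance negative, so the DBMS rejects the update.
    conn.execute("UPDATE account SET balance = balance - 600 WHERE acc_no = 'A1'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(rejected, balance)
```

The rejected update leaves the stored balance unchanged, whichever program issued it.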
Data dependence:- The file structure is defined in the program code, so it is very difficult to
change the existing structure. The programmer has to find all the affected programs and modify
them. This characteristic is known as program-data dependence.
Incompatible file formats:- File structure is stored in program code, therefore the
structure is dependent on programming language. This makes it difficult to process jointly.
Fixed Queries:- File based systems are dependent on application program. Programs are
written to satisfy particular functions. Any new requirement needs a new program.
Security:- There is no provision for security. Paper files can be lost leading to permanent
data loss.
File System | DBMS
A file system is software that manages and organizes the files in a storage medium. It controls how data is stored and retrieved. | DBMS, or Database Management System, is a software application. It is used for accessing, creating, and managing databases.
The file system provides the details of data representation and storage of data. | DBMS gives an abstract view of data that hides the details.
Storing and retrieving of data can't be done efficiently in a file system. | DBMS is efficient to use as there are a wide variety of methods to store and retrieve data.
It does not offer data recovery processes. | There is backup and recovery for data in a DBMS.
The file system doesn't have a crash recovery mechanism. | DBMS provides a crash recovery mechanism.
Protecting a file system is very difficult. | DBMS offers a good protection mechanism.
In a file management system, the redundancy of data is greater. | The redundancy of data is low in the DBMS.
Data inconsistency is higher in the file system. | Data inconsistency is low in a database management system.
The file system offers lesser security. | Database Management System offers high security.
File System allows you to store the data as isolated data files and entities. | Database Management System stores data as well as defined constraints and interrelations.
The centralization process is hard in a File Management System. | Centralization is easy to achieve in the DBMS.
It doesn't offer backup and recovery of data if it is lost. | DBMS provides backup and recovery of data even if it is lost.
There is no efficient query processing in the file system. | You can easily query data in a database using the SQL language.
The file system doesn't offer concurrency. | DBMS provides a concurrency facility.
Database Approach:-
Database management systems can be classified based on several criteria, such as the data
model, user numbers and database distribution, all described below.
Classification Based on Data Model:- relational data models, hierarchical data models and
network data models.
The most popular data model in use today is the relational data model. DBMSs such as
Oracle, MS SQL Server, DB2 and MySQL support this model.
The hierarchical and network data models are still used in industry, mainly on mainframe
platforms. However, they are not commonly used due to their complexity.
In recent years, newer object-oriented data models were introduced. In an object-oriented
database management system (OODBMS), information is represented in the form of objects, as
used in object-oriented programming. An OODBMS combines database capabilities with
object-oriented programming language capabilities.
There are four main distribution systems for database systems and these, in turn, can be used
to classify the DBMS.
Centralized systems:-
With a centralized database system, the DBMS and database are stored at a single site that is
used by several other systems too. This is illustrated in Figure
Distributed database system:-
In a distributed database system, the actual database and the DBMS software are distributed
across various sites that are connected by a computer network, as shown in Figure
Homogeneous distributed database systems use the same DBMS software at multiple
sites. Data exchange between these various sites can be handled easily.
For example, library information systems by the same vendor, such as Geac Computer
Corporation, use the same DBMS software which allows easy data exchange between the
various Geac library sites.
In a heterogeneous distributed database system, different sites might use different DBMS
software, but there is additional common software to support data exchange between these
sites. For example, the various library database systems use the same machine-readable
cataloguing (MARC) format to support library record data exchange.
2. Insulation between Programs and Data, and Data Abstraction :-In traditional file
processing, the structure of data files is embedded in the application programs, so any
changes to the structure of a file may require changing all programs that access this file. By
contrast, DBMS access programs do not require such changes in most cases. The structure of
data files is stored in the DBMS catalog separately from the access programs. We call this
property program-data independence.
Data Abstraction:- The characteristic that allows program data independence and program
operation independence is called data abstraction. The DBMS provides essential information to
the user while hiding the internal details.
3. Support of Multiple Views of the Data:- A database typically has many users, each of
whom may require a different perspective or view of the database. A view may be a subset of
the database or it may contain virtual data that is derived from the database files but is not
explicitly stored. Some users may not need to be aware of whether the data they refer to is
stored or derived. A multiuser DBMS whose users have a variety of distinct applications
must provide facilities for defining multiple views. For example, a student who wants to see
his exam results is given a student view containing only his number and marks, which
satisfies his need.
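The student view can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed design: the table name, the view name `student_result`, the columns and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT,
        address TEXT,
        marks   INTEGER,
        fee_due INTEGER
    );
    INSERT INTO student VALUES (1, 'Asha', 'Guntur', 82, 0);
    INSERT INTO student VALUES (2, 'Kiran', 'Nellore', 67, 1500);

    -- The student view exposes only the roll number and marks.
    CREATE VIEW student_result AS
        SELECT roll_no, marks FROM student;
""")

result_rows = conn.execute(
    "SELECT * FROM student_result ORDER BY roll_no"
).fetchall()
print(result_rows)
```

The view stores no data of its own. It is derived from the student table each time it is queried, and other user groups could be given different views over the same table.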
4. Sharing of Data and Multiuser Transaction Processing :-A multiuser DBMS must allow
multiple users to access the database at the same time. This is essential if data for multiple
applications is to be integrated and maintained in a single database. The DBMS must include
concurrency control software to ensure that several users trying to update the same data do so
in a controlled manner so that the result of the updates is correct.
For example, when several reservation clerks try to assign a seat on an airline flight,
the DBMS should ensure that each seat can be accessed by only one clerk at a time for
assignment to a passenger. These types of applications are generally called online transaction
processing (OLTP) applications.
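The seat example can be sketched as a single "assign only if still free" update. This is a simplified illustration using Python's built-in sqlite3 module with one connection, so it shows only the controlled check-and-assign step, not the locking a real multiuser DBMS performs; the seat table, seat number and passenger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no TEXT PRIMARY KEY, passenger TEXT)")
conn.execute("INSERT INTO seat VALUES ('12A', NULL)")
conn.commit()


def assign_seat(seat_no, passenger):
    """Assign the seat only if it is still free; return True on success."""
    with conn:  # one transaction: committed on success, rolled back on error
        cur = conn.execute(
            "UPDATE seat SET passenger = ? "
            "WHERE seat_no = ? AND passenger IS NULL",
            (passenger, seat_no),
        )
        return cur.rowcount == 1


first = assign_seat("12A", "passenger of clerk 1")
second = assign_seat("12A", "passenger of clerk 2")
holder = conn.execute("SELECT passenger FROM seat").fetchone()[0]
print(first, second, holder)
```

The second clerk's attempt changes no row, so the seat stays with the first passenger.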
1. Controlling Redundancy:- In a file system, each application has its own private files, which
cannot be shared between multiple applications. This can often lead to considerable
redundancy in the stored data. Redundancy means storing the same data multiple times, which
causes several problems.
A DBMS allows sharing of data between tables and controls duplication of data.
By using a security and authorization subsystem, the DBMS gives access to the data in the
database only to privileged users (with a valid username and password). This helps in
maintaining the security and confidentiality of data.
3. Providing persistent storage:- A database provides permanent storage for all data. Even after
the program that used the data has terminated, the data can still be accessed from storage.
4. Providing storage structure and search technique for efficient query processing:- In a
DBMS, indexes provide specialized data structures and search techniques that speed up the disk
search for desired records. To answer frequent queries quickly, the DBMS uses a buffering or
caching module. The query processing and optimization module of the DBMS is responsible for
choosing an efficient query execution plan.
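The effect of an index on the chosen execution plan can be observed in SQLite through its `EXPLAIN QUERY PLAN` statement. The sketch below is an illustration under assumptions: the employee table, its columns and the index name `idx_employee_id` are made up, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, f"emp{i}", f"D{i % 10}") for i in range(1000)],
)

query = "SELECT name FROM employee WHERE emp_id = 42"


def plan(sql):
    """Return the plan descriptions reported by the query optimizer."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_employee_id ON employee (emp_id)")
plan_after = plan(query)   # the optimizer can now search through the index

print(plan_before)
print(plan_after)
```

The query text is the same in both runs. Only the plan chosen by the optimizer changes once the index exists.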
5. Providing Backup and Recovery:- A DBMS must provide facilities for recovering from
hardware or software failures. The backup and recovery subsystem of the DBMS is
responsible for recovery. For example, if the computer system fails in the middle of a
complex update transaction, the recovery subsystem is responsible for making sure that the
database is restored to the state it was in before the transaction started.
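The idea of undoing a half-finished update can be sketched with a transaction that is rolled back. This is a simplified illustration using Python's built-in sqlite3 module: the failure is simulated with an exception rather than a real system crash, and the account table and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(amount, fail_midway=False):
    """Move money from A to B as a single transaction."""
    with conn:  # commits if the block finishes, rolls back on an exception
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = 'A'",
            (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure in the middle of the update")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE acc_no = 'B'",
            (amount,))


try:
    transfer(30, fail_midway=True)
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(balances)
```

Although the first update had already run, both accounts show their original balances after the failure.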
6. Providing multiple user interfaces:- Many types of users, with different levels of technical
knowledge, use the database. A DBMS provides multiple user interfaces, such as queries,
application programs, forms and menu-driven programs, so that different types of users can
interact with the database.
7. Represent Complex relationship among data:- A database may include numerous varieties
of data that are interrelated in many ways. A DBMS must have the capability to represent a
variety of complex relationships among the data, to define new relationships as they arise,
and to retrieve and update related data easily and efficiently.
1. Size:- DBMS software is a complex and extremely large piece of software. It occupies many
megabytes of disk space and needs substantial memory to run efficiently.
2. Cost:- The cost of DBMS varies significantly, depending on the environment and
functionality provided. There is also the recurrent annual maintenance cost.
3. High degree of failure:- The centralization of resources increases the vulnerability of the
system. Since all users and applications rely on the availability of the DBMS, i.e. the
centralized database, the failure of any component can bring operations to a halt. If the
server fails, the entire system stops functioning.
4. Performance of DBMS:- Because a DBMS is general-purpose software that serves many
applications, it can be considerably slower than a program written for one specific task.
Data Models:- Data Model can be defined as an integrated collection of concepts for
describing and manipulating data, relationships between data, and constraints on the data in
an organization.
The purpose of a data model is to represent data and make data understandable. The three
categories of data models are
Object based data models:- These models use concepts such as entities, attributes, and
relationships. An entity is an object, an attribute is a property of an entity, and a
relationship is an association between entities.
Ex: ER Model
Physical data models:- Physical data models describe how data is stored in the computer.
Ex: tables
1. Hierarchical model
2. Network model
3. Relational model
Hierarchical model:- Hierarchical Database model is one of the oldest database models.
Information Management System (IMS) is based on this model.
This model is like a structure of a tree with records forming the nodes and fields forming the
branches of a tree.
The tree structure contains nodes. In general, a root node can have any number of
dependents. Each of these dependents can have any number of lower level dependents.
The different elements present in the hierarchical tree structure have Parent-Child
relationship. A parent element can have many children elements but a child element cannot
have many parent elements. That is, hierarchical model cannot represent many to many
relationships among records(M:M Relations).
Ex: In a family tree, the great grandparent record is the root of the tree. Parents can have
many children, exhibiting one-to-many relationships. The grandparents and children are the
nodes, or dependents, of the root.
A Sample database:- Let us take the example of the sample database consisting of parts,
supplier and shipments.
Each row in Supplier table is identified by a unique SNO (Supplier Number) that uniquely
identifies the entire row of the table. Likewise each part has a unique PNO (Part Number).
Not more than one shipment exists for a given supplier/part combination in the shipments
table.
The tree structure is built with the PART node as the parent node and the SUPPLIER node as the
child node. Each of the 3 trees in the figure consists of one PART record occurrence, together
with a set of subordinate SUPPLIER record occurrences. There is one supplier record for each
supplier of a particular part. Each supplier occurrence includes the corresponding shipment
quantity.
Part P1 is supplied by supplier S1, P2 is supplied by S2, and P3 is supplied by S1 and S3.
Operations on Hierarchical Model:- There are four basic operations Insert, Update, Delete
and Retrieve that can be performed on Hierarchical Model
Insert Operation:- A part P4 that is not supplied by any supplier can be inserted without any
problem, because a parent can exist without any child. It is not possible to insert the
information of a supplier, e.g. S4, who does not supply any part, because a child node
cannot exist without a parent. Hence, the insert anomaly exists only for those
children that have no corresponding parent.
Update Operation:- To update the city of supplier S1, we need to perform multiple updates, one
for each part that S1 supplies. If any of them is missed, the data becomes inconsistent.
Delete Operation:- If we delete the information of part P3, we lose the data of supplier S3 too.
Record Retrieval:- Record retrieval methods for the hierarchical model are complex.
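The three anomalies can be reproduced with a small Python sketch of the PART/SUPPLIER trees. This is an illustration only: nested dictionaries stand in for the tree records, the part-supplier pairs follow the sample database, and the cities (apart from VJA for S1) and quantities are made-up values.

```python
# Each PART is a root; its SUPPLIER children carry the shipment quantity.
# Cities and quantities are made-up values for illustration.
parts = {
    "P1": [{"sno": "S1", "city": "VJA", "qty": 300}],
    "P2": [{"sno": "S2", "city": "HYD", "qty": 200}],
    "P3": [{"sno": "S1", "city": "VJA", "qty": 400},
           {"sno": "S3", "city": "VZG", "qty": 100}],
}

# Insert: a part with no supplier is fine (a parent can exist without children).
parts["P4"] = []
# A supplier S4 with no part has no parent to attach to, so it cannot be stored.

# Update: S1's city is repeated under every part that S1 supplies.
updates_needed = sum(
    1 for children in parts.values() for s in children if s["sno"] == "S1"
)

# Delete: removing P3 also removes the only record of supplier S3.
del parts["P3"]
remaining_suppliers = {s["sno"] for children in parts.values() for s in children}

print(updates_needed, sorted(remaining_suppliers))
```

Supplier data exists only underneath a part, which is why one change to a supplier touches several records and deleting a part can erase a supplier.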
Advantages:-
Simplicity:- Since the database is based on a hierarchical structure, the relationship between
the various layers is logically simple. Thus, the design of a hierarchical database is simple.
Data Security:- The hierarchical model was the first database model to offer data security
that is provided and enforced by the DBMS.
Database Integrity:- Because of its inherent parent-child structure, database integrity is
highly promoted in these systems.
Efficiency:- The hierarchical database model is very efficient when the database
contains a large number of 1:N (one-to-many) relationships and when the users
require a large number of transactions using data whose relationships are fixed.
Disadvantages:-
Lack of structural independence:- If the physical structure is changed, the applications also
have to be modified. Thus, in a hierarchical database the benefits of data independence are
limited by structural dependence.
Operational Anomalies:- The hierarchical model suffers from insert, update and deletion
anomalies. The retrieval operation is also complex and asymmetric, so the hierarchical model
is not suitable for all cases.
Network Model:- The network model uses a graph structure to represent general connections
among the nodes. A network structure allows 1:1 (one:one), 1:M (one:many) and M:M
(many:many) relationships among entities. In network database terminology, a relationship is a
set. Each set is made up of at least two types of records: an owner record (equivalent to the
parent in the hierarchical model) and a member record (similar to the child record in the
hierarchical model).
Sample Database: - Let us take the example of the sample database consisting of supplier,
parts and shipments.
Network view of Sample Database:- The network view of the supplier-part database is shown in
the following diagram. A connector occurrence specifies the association (shipment) between one
supplier and one part. All the supplier occurrences are placed on a chain. Similarly, all the
part occurrences are placed on a chain.
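The connector idea can be sketched in Python. This is a rough illustration, not the actual storage layout of a network DBMS: plain tuples stand in for connector records, list scans stand in for following chains, the shipments are assumed to match the sample database, and the names and quantities are made up.

```python
# Owner records: suppliers and parts. Names are hypothetical.
suppliers = {"S1": "Supplier 1", "S2": "Supplier 2", "S3": "Supplier 3"}
parts = {"P1": "Part 1", "P2": "Part 2", "P3": "Part 3"}

# Connector (member) records: each one associates one supplier with one part
# and carries the shipment quantity (made-up values).
connectors = [
    ("S1", "P1", 300),
    ("S2", "P2", 200),
    ("S1", "P3", 400),
    ("S3", "P3", 100),
]


def parts_of(sno):
    """Collect the parts reached through the supplier's connectors."""
    return [pno for s, pno, _ in connectors if s == sno]


def suppliers_of(pno):
    """Collect the suppliers reached through the part's connectors."""
    return [sno for sno, p, _ in connectors if p == pno]


print(parts_of("S1"), suppliers_of("P3"))
```

Because a connector belongs to both a supplier and a part, the same structure can be traversed from either side. This is how the many-to-many relationship is represented.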
Supports more relationship types:- The network model can support one-to-many (1:N) and
many-to-many (M:N) relationships, which helps in modeling real life situations.
Data access: - The data access is easier than the hierarchical model.
Data Integrity: - The network model does not allow a member to exist without an owner.
Thus, a user must first define the owner record and then the member record. This ensures
the data integrity.
Data independence: - The network model is better than the hierarchical model in
isolating the data from the application programs.
Database Standards:- The network model is based on published standards (formulated by the
CODASYL Database Task Group). The standards included a Data Definition Language (DDL) and
a Data Manipulation Language (DML), thus greatly enhancing database
administration and portability.
Relational model:-
The relational model stores data in the form of tables. This concept was proposed by Dr.
E. F. Codd. The relational model consists of three major components:
1. The set of relations and set of domains that defines the way data can be represented (data
structure).
2. Integrity rules that define the procedure to protect the data (data integrity).
3. The operations that can be performed on the data (data manipulation).
● In the relational model, data is represented as a collection of rows and columns called a
relation (table).
● All values are scalar. (At any given row/column position in the relation there is one and
only one value.)
● Operations performed on a relation result in another relation.
Basic Terminology used in Relational Model
The figure shows a relation with the formal names of the basic components marked. The
entire structure is, as we have said, a relation.
Domains:- A domain is the set of all possible values that an attribute may validly contain.
Domains are often confused with data types, but this is inaccurate. Data type is a physical
concept while domain is a logical one. "Number" is a data type and "Age" is a domain.
Body of a Relation:- The body of the relation consists of an unordered set of zero or more
tuples.
Keys of a Relation:- A key is a set of one or more columns (attributes) whose values identify
rows uniquely. Some different types of keys are:
Primary key:- An attribute or set of attributes of a relation that identifies each row
uniquely. A primary key satisfies the properties of uniqueness and NOT NULL.
Ex: PNO in the PART relation, SNO in the SUPPLIER relation, and (SNO, PNO) in the SHIPMENT
relation.
Foreign key:- An attribute or set of attributes of a relation that refers to the primary key
of some other relation.
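The keys of the sample database can be declared as follows. This is a minimal sketch using Python's built-in sqlite3 module: the key columns SNO and PNO come from the text, while the remaining column names and the sample rows are assumptions. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT, city TEXT);
    CREATE TABLE part     (pno TEXT PRIMARY KEY, pname TEXT);
    CREATE TABLE shipment (
        sno TEXT NOT NULL REFERENCES supplier (sno),
        pno TEXT NOT NULL REFERENCES part (pno),
        qty INTEGER,
        PRIMARY KEY (sno, pno)  -- at most one shipment per supplier/part pair
    );
    INSERT INTO supplier VALUES ('S1', 'Supplier 1', 'VJA');
    INSERT INTO part     VALUES ('P1', 'Part 1');
    INSERT INTO shipment VALUES ('S1', 'P1', 300);
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key = rejected("INSERT INTO supplier VALUES ('S1', 'Other', 'HYD')")
dangling_ref = rejected("INSERT INTO shipment VALUES ('S9', 'P1', 50)")
print(duplicate_key, dangling_ref)
```

The first insert violates the uniqueness of the primary key SNO. The second violates the foreign key, because no supplier S9 exists.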
The four basic operations, Insert, Update, Delete and Retrieve, are
shown below on the sample database in the relational model:
Insert Operation:- Suppose we wish to insert the information of a supplier who does not
supply any part. It can be inserted in the SUPPLIER table without any anomaly. So, we can say
that insert operations can be performed in all cases without any anomaly.
Update Operation:- The update operation in the relational model is very simple and free of
anomalies.
Suppose supplier S1 has moved from VJA to VZG. In that case we need to change the
record so that the SUPPLIER table is up to date. Since supplier number is
the primary key of the SUPPLIER table, there is only a single entry for S1, which
needs a single update.
Record Retrieval:- Record retrieval methods for relational model are simple.
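The insert, update and retrieve operations described above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column names, supplier names, quantities and the city HYD are illustrative, while S1 moving from VJA to VZG follows the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT, city TEXT);
    CREATE TABLE shipment (sno TEXT, pno TEXT, qty INTEGER,
                           PRIMARY KEY (sno, pno));
    INSERT INTO supplier VALUES ('S1', 'Supplier 1', 'VJA');
    INSERT INTO shipment VALUES ('S1', 'P1', 300), ('S1', 'P3', 400);
""")

# Insert: a supplier who supplies no part is stored without any anomaly.
conn.execute("INSERT INTO supplier VALUES ('S4', 'Supplier 4', 'HYD')")

# Update: S1 appears once in SUPPLIER, so one statement changes one row,
# even though S1 has two shipments.
cur = conn.execute("UPDATE supplier SET city = 'VZG' WHERE sno = 'S1'")
rows_updated = cur.rowcount

# Retrieve: state what is wanted; the DBMS decides how to find it.
cities = conn.execute(
    "SELECT sno, city FROM supplier ORDER BY sno"
).fetchall()
print(rows_updated, cities)
```

Supplier data is kept in its own table, separate from the shipments, which is why neither operation depends on which parts a supplier ships.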
Design, implementation, maintenance and usage ease:- The relational database model
achieves both data independence and structure independence making the database design,
maintenance, administration and usage much easier than the other models.
Query capability:- SQL is a fourth generation language (4GL). A 4GL allows the user to
specify what must be done without specifying how it must be done.
These information islands will prevent the information integration that is essential
for the smooth and efficient functioning of the organization. These individual databases
will also create problems like data inconsistency, data duplication, data redundancy and so
on.
Components of DBMS:
There are five major components in the database system environment:
• Data
• Software
• Hardware
• Users
• Procedures
1. Data: The database contains the operational data and the meta-data,
the 'data about data'. Operational data is the data entered day to day into the
database tables.
Meta-data is the description of the data, stored in the data dictionary.
3. Hardware: Hardware includes Secondary Storage devices like Hard disk, input and
output devices.
5. Procedures: Procedures are application programs that perform a specific task or job.
Instance:- The data in the database at a particular moment in time is called a database
state or snapshot. It is also called the current set of occurrences or instances in the
database.
The internal level:- The internal level has an internal schema, which describes the
physical storage structure of the database. The internal schema uses a physical data model
and describes the complete details of data storage and access paths for the database.
The conceptual level:- The conceptual level has a conceptual schema, which describes the
structure of the whole database for a community of users. The conceptual schema hides
the details of physical storage structures and concentrates on describing entities, data
types, relationships, user operations, and constraints. Usually, a representational data
model is used to describe the conceptual schema when a database system is implemented.
This implementation conceptual schema is often based on a conceptual schema design in
a high-level data model.
The external or view level:- It includes a number of external schemas or user views. Each
external schema describes the part of the database that a particular user group is interested
in and hides the rest of the database from that user group. As in the previous case, each
external schema is typically implemented using a representational data model, possibly
based on an external schema design in a high level data model. The three-schema
architecture is a convenient tool with which the user can visualize the schema levels in a
database system.
Data Independence
The three-schema architecture can be used to further explain the concept of data
independence, which can be defined as the capacity to change the schema at one level of a
database system without having to change the schema at the next higher level. We can
define two types of data independence:
Logical data independence:- it is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database (by adding a record type or data item), to
change constraints, or to reduce the database (by removing a record type or data item). In
the last case, external schemas that refer only to the remaining data should not be affected.
Physical data independence:- is the capacity to change the internal schema without
having to change the conceptual schema. Hence, the external schemas need not be
changed as well. Changes to the internal schema may be needed because some physical
files had to be reorganized, for example by creating additional access structures, to
improve the performance of retrieval or update. If the same data as before remains in the
database, we should not have to change the conceptual schema.
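Both kinds of data independence can be observed in a small experiment. The sketch below is illustrative only and uses Python's built-in sqlite3 module: a view plays the role of an external schema, adding a column stands for a conceptual-level change, and creating an index stands for an internal-level change. All table, view, column and index names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER);
    INSERT INTO student VALUES (1, 'Asha', 82), (2, 'Kiran', 67);
    -- External schema: the view used by an application.
    CREATE VIEW student_result AS SELECT roll_no, marks FROM student;
""")

app_query = "SELECT roll_no, marks FROM student_result ORDER BY roll_no"
before = conn.execute(app_query).fetchall()

# Conceptual-level change: the database is expanded with a new data item.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical_change = conn.execute(app_query).fetchall()

# Internal-level change: an additional access structure is created.
conn.execute("CREATE INDEX idx_student_marks ON student (marks)")
after_physical_change = conn.execute(app_query).fetchall()

print(before == after_logical_change == after_physical_change)
```

The application query and the view definition are never edited, yet they return the same rows after both changes.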
The database approach emphasizes data integration and sharing across organizations. As with
any business decision, the database approach entails some additional costs and risks that must
be recognized and managed when implementing this approach. Some of the costs and
risks of the database approach are as follows:
New Specialized Personnel:- Frequently, organizations that adopt the database approach
need to hire or train individuals to design and implement databases, provide database
administration services and manage a staff of new people. Further, because of the rapid
changes in technology, these new people will have to be retrained or upgraded on a regular
basis.
Installation & Management Costs and complexity:- A multi-user database management
system is a large and complex suite of software that has a high initial cost, requires a staff of
trained personnel to install and operate, and also has substantial annual maintenance &
support costs. Installing such a system may also require upgrades to the hardware and data
communications system in the organization.
Conversion Costs:- Converting traditional file processing systems to modern database
technology has a cost, measured in terms of money, time, and organizational commitment.
Need for explicit Backup & Recovery:- A shared corporate database must be accurate and
available at all times. This requires that comprehensive procedures be developed and used
for providing backup copies of data and for restoring a database when damage occurs.
Organizational Conflict:- A shared database requires a consensus on data definitions and
ownership as well as responsibilities for accurate data maintenance. Experience has shown
that conflicts over data definitions, data formats and coding, and rights to update shared
data are frequent and often difficult to resolve.