Data Base Management System
(DBMS)
Unit -1
Data Base Management System
Data: Data are basic raw facts and figures.
Ex: a name, a digit, a picture etc.
Data Base: Collection of related data
Ex. the names, telephone numbers and addresses of all the
people you know
Data Base Management System:
A DBMS is a set of programs that controls creation, storage,
management, and retrieval of data in a database.
Ex: MS-Access, Oracle, MySQL, Sybase, IBM DB2, Ingres, etc.
Use of DBMS
Corporate
Airlines
Hotels
Banks
Colleges /University
Railway reservation
Telecommunication Industry
Data mining
Libraries
Disadvantages of Flat File Systems
No centralized control
Data Redundancy
Data Inconsistency
Data cannot be shared
Standards cannot be enforced
Security issues
Integrity cannot be maintained
Data Dependence
Advantages of Using DBMS
Controlled (reduced) data redundancy
Data Consistency
Mass Data Storage
Centralized Access
Automatic Backup Possible
Data Recovery Possible
Integrity Constraints
Easy updating and fetching of data
Only authorized Access
Data Base Characteristics
Controls data redundancy.
Enforces user defined rules.
Ensures data sharing.
It has automatic and intelligent backup and recovery
procedures.
It has central dictionary to store information pertaining to
data and its manipulation.
It has different interfaces via which user can manipulate
the data.
Enforces data access authorization.
Represents complex relationship between data.
Classification of DBMS
• Relational DBMS:
Modeling concept: tables and constraints on tables
Query Language: SQL
Applications: suited for traditional business processing
• Object-Oriented DBMS
Modeling concepts: objects, classes, inheritance
Query Language: object oriented OQL
Applications: suited for CAD databases, CASE databases, office
automation
• Object-Relational DBMS:
Incorporate OO concepts into relational model
Similar functionality as OO-DBMS, but different implementations
Language: SQL extended to process objects. E.g., Cloudscape, DB2
DBMS Overview
[Figure: a user issues applications/queries, which pass through the query processor and the storage manager to reach the stored metadata and data.]
• Database: collection of interrelated information about world being modeled
• DBMS: general-purpose software to define, create, modify, retrieve, delete and
manipulate a database
Data Base Users
•DBMS designers and implementers
•Database administrator (DBA)
“superuser” of a database, similar to a system
administrator.
Define schemas, views, authorization, indexes, tuning
parameters, etc.
•Application programmers
•End users
Roles of Data Base Administrator
A database administrator (DBA) is a person responsible for the design,
implementation, maintenance and repair of an organization's database. The key
roles of a DBA are :
To provide space to each user.
To create the external and logical schemas.
To provide security against unauthorized access.
To grant permissions to users.
To install, configure and upgrade the DBMS server software and related
products.
To take backups and recover data.
To monitor the performance of the machine and the database.
Three-Layer Abstraction
[Figure: External Schema 1, External Schema 2 and External Schema 3 sit above a single Conceptual Schema, which sits above the Physical Schema.]
Description of Levels
External (Users) Level:
• Any number of users may exist at this level.
• Different users may have different external views of the same data.
• It insulates the users from the details of the internal and conceptual levels.
Conceptual Level:
•This level is designed by data base administrator.
•Under this level a schema of data base is created by DBA.
•It represents the entire database and there can be only one conceptual view per database.
•It represents entities, their attributes and relationships between them.
•It is independent of the hardware and software.
• This is also known as Logical Level.
Internal Level:
•It indicates how the data will be stored and describes the data structures and access
methods to be used by the database (i.e., the physical implementation of the data).
•It is concerned with storage space allocation, indexes, data compression etc.
Data Independence
• When a schema at a lower level is changed, only the
mappings between this schema and higher-level
schemas need to be changed in a DBMS that fully
supports data independence.
• The higher-level schemas themselves are unchanged.
Hence, the application programs need not be changed
since they refer to the external schemas.
• Disadvantages of two levels of mappings:
Overhead during compilation or execution of a query or
program
Data Independence
Logical Data Independence: The capacity to change the conceptual
schema without having to change the external schemas and their
application programs.
Example : Addition or removal of new entities, attributes,
relationships etc to the conceptual schema should be possible
without affecting existing external schema.
Physical Data Independence: The capacity to change the internal
schema without having to change the conceptual schema.
Example:
Using new storage devices, different data structures, different access
methods, different file organization or storage structures and
modifying indexes.
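Both kinds of independence can be tried out in a small sketch. It assumes Python's built-in sqlite3 module with an in-memory database, and the student table, the mca_students view and the idx_student_course index are hypothetical names chosen for illustration. The application query reads only from the view, so it should return the same rows after an index is added (a physical change) and after a column is added to the base table (a logical change).

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Alok", "MCA"), (102, "Amit", "MCA"), (103, "Simran", "BCA")],
)
# External view used by a hypothetical application program.
conn.execute(
    "CREATE VIEW mca_students AS SELECT rollno, name FROM student WHERE course = 'MCA'"
)


def app_query():
    """The application only ever talks to the external view."""
    return conn.execute(
        "SELECT rollno, name FROM mca_students ORDER BY rollno"
    ).fetchall()


before = app_query()

# Physical change: add a new access method (an index) to the internal level.
conn.execute("CREATE INDEX idx_student_course ON student (course)")
after_physical = app_query()

# Logical change: add a new attribute to the conceptual schema.
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")
after_logical = app_query()

print(before == after_physical == after_logical)
```

The application query text never changes; only the definitions beneath the view do.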
Mapping between views
Two mappings are required in a database system with three different
views:
• Mapping between conceptual and external view
• Mapping between internal and conceptual view
Mapping between views specifies the methods of deriving the record
at one level from the record at lower level.
DBMS Interface
• Provides users means to interact with database:
Menu driven interface
Forms based interface
Using SQL
WWW connectivity.
DBMS Languages
• Data Definition Language (DDL)
Used to describe a schema
Eg: Create table, drop table, alter table etc
• Data Manipulation Language (DML)
Used by users to query the DB and change the data
Eg: Select, insert into, update, delete etc
• Data Control Language (DCL)
Used to specify access control on data
Eg: Grant, Revoke etc
• View Definition Language (VDL)
Define views
Eg: Create view etc
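The language categories above can be tried out in a short sketch. It assumes Python's built-in sqlite3 module and an in-memory database, neither of which is part of these notes; the student table and the student_names view are hypothetical. SQLite has no GRANT or REVOKE statements, so DCL is not shown here.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a full DBMS.
conn = sqlite3.connect(":memory:")

# DDL: describe the schema.
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)

# DML: insert, update and delete data.
conn.execute("INSERT INTO student VALUES (101, 'Alok', 'Delhi')")
conn.execute("INSERT INTO student VALUES (102, 'Amit', 'Noida')")
conn.execute("UPDATE student SET address = 'Agra' WHERE rollno = 102")
conn.execute("DELETE FROM student WHERE rollno = 101")

# VDL: define a view over the base table.
conn.execute("CREATE VIEW student_names AS SELECT rollno, name FROM student")

# DML again: query the table and the view.
rows = conn.execute("SELECT * FROM student").fetchall()
view_rows = conn.execute("SELECT * FROM student_names").fetchall()
print(rows)
print(view_rows)
```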
Data Model
•Concepts and tools used to describe DB schemas
•A Data Model is a mechanism that provides abstraction
for database applications. Different models provide
different abstraction levels
•Data modeling is used for representing entities of interest
and their relationships in the database.
•It allows the conceptualization of the association
between various entities and their attributes.
Data Model Classification
• Flat file (Primitive model)
• Traditional Models
• Hierarchical Data Model
• Network Data model
• Relational Data model
• Object Based Models
• Entity-Relationship Model
• Object- Oriented Models
• Semi structured Data Model
Features of Flat Files
A flat file database is a type of database that stores
data in a single table or a file. Placing data in a flat file
offers following advantages:
• All records are stored at one place
• Easy to set up using different office applications
• Easy to understand
• Records can be viewed or extracted based on simple
criteria
Disadvantages of flat files
• Potential duplication
• Data Inconsistency
• No centralized access
• Harder to change data format
• Poor at complex queries
• Poor at authorized access
Hierarchical Data Model
In this model data is organized into a tree-like
structure, implying a single upward link in each record
to describe the nesting, and a sort field to keep the
records in a particular order in each same-level list.
Drawbacks: Hierarchical DBMS
Cannot handle many-to-many relationships.
Cannot reflect all real-life situations.
Difficult to perform insert, delete and update operations.
Network Data Model
In the network model, entities are organised in a graph in which some
entities can be accessed through several paths.
The basic data modeling construct in the network model is the set
construct. A set consists of an owner record type, a set name, and a
member record type. A member record type can have that role in more
than one set, hence the multi-parent concept is supported. An owner
record type can also be a member or owner in another set.
Relational Data Model
The relational model is based on the relation (table) construct.
It is governed by Codd's 12 rules.
All information is stored in the form of rows and
columns.
Relational Data Model
Example of tabular data in the relational model
Attributes:
customer-id   customer-name  customer-street  customer-city  account-number
192-83-7465   Johnson        Alma             Palo Alto      A-101
019-28-3746   Smith          North            Rye            A-215
192-83-7465   Johnson        Alma             Palo Alto      A-201
321-12-3123   Jones          Main             Harrison       A-217
019-28-3746   Smith          North            Rye            A-201
Sample Relational Database
Instance
The collection of information stored in the database at
a particular moment is called an instance of the
database.
Ex: Amit, 101 etc.
Schema
The overall design of the database is called the
database schema.
A schema is the structure of the table which is
decided before storing the data.
Example:
Create table student
( rollno number(5),
name char(15),
address varchar2(25));
Tuple
A tuple is a related record stored in a row of the
table.
Ex: 101,Alok,MCA,85%
Tuple: a record (row)
Attribute: a column
Entity set: a table
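The difference between schema and instance can be seen in a small sketch, assuming Python's built-in sqlite3 module and SQLite column types in place of the Oracle-style types used in the schema example. Inserting a tuple changes the instance but leaves the schema as it was.

```python
import sqlite3

# Assumption: an in-memory SQLite database; the column types are SQLite's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER, name TEXT, address TEXT)")


def schema():
    """Column names and types, read back from the DBMS."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance():
    """The tuples stored at this moment."""
    return conn.execute("SELECT * FROM student ORDER BY rollno").fetchall()


schema_before, instance_before = schema(), instance()

# Storing a tuple changes the instance only.
conn.execute("INSERT INTO student VALUES (101, 'Alok', 'Delhi')")

schema_after, instance_after = schema(), instance()
print(schema_after)
print(instance_after)
```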
Semi-Structured Data Model
• The semi-structured data model is self-describing: the information that is
normally held in a separate schema is contained within the data itself. This
property is called the self-describing property.
• In such database there is no clear separation between the data and the schema,
and the degree to which it is structured depends on the application. In some
forms of semistructured data there is no separate schema, in others it exists but
only places loose constraints on the data.
• Semistructured data has recently emerged as an important topic of study for a
variety of reasons. First, there are data sources such as the Web, which we would
like to treat as databases but which cannot be constrained by a schema. Second, it
may be desirable to have an extremely flexible format for data exchange between
disparate databases.
Object Based Models
It is designed using the entities in the real world, attributes of each
entity and their relationship. It picks up each thing/object in the real
world which is involved in the requirement.
There are two types of object based data Models – Entity
Relationship Model and Object oriented data model.
The ER data model is one of the most important data models and forms the
basis of most database designs. It defines the
mapping between the entities in the database.
Object oriented data model, along with the mapping between the
entities, describes the state of each entity and the tasks performed
by them.
Object-Oriented Model
This data model is another method of representing real world
objects. It treats each real-world thing as an object, kept separate
from other objects. It groups related functionality together and
allows that functionality to be inherited by related sub-groups.
Let us consider an Employee database to understand this model
better. In this database we have different types of employees:
Engineer, Accountant, Manager, Clerk. All of these employees
belong to the Person group. A Person can have attributes such as
name, address, age and phone.
Advantages
Because of inheritance, attributes and functionality can be re-used, which
reduces the cost of maintaining the same data in several places.
The information is encapsulated, so it cannot be misused by
other objects. A new feature can be added by creating a new class that inherits
from a parent class and adds the feature, which reduces overhead and
maintenance costs.
This also makes the model more flexible when requirements
change.
Code is re-used because of inheritance.
Since each class binds its attributes to its functionality, it closely
represents a real world object. Each object can be seen as a real entity, so
the model is easier to understand.
Disadvantages
It is not yet developed widely or completely enough for general use in database systems, so
it has not been widely adopted by users.
Entity/Relationship (E/R) Model
• Entities: objects
• Relationships: associate entities
• Roles of entities in a relationship
• Constraints on entities:
domain constraints
key constraints
• Constraints on relationships:
Cardinality constraints
Participation constraints
Weak Entity Sets
• Multiway relationships
• Subclass/superclass Relationships
• Aggregation
Symbols Used in E-R Notation
Entities and Entity Sets
[Figure: entity set Customer (attributes id, name, street, city) linked by relationship custacct to entity set Account (attributes number, balance).]
• Entities:
nouns, “things” in the world
Have attributes: course name, id, address, dept, age,
room, …
• Entity sets: a set of entities
Attributes
• Single-valued versus multi-valued:
“telephone number”: multi-valued
“Salary”: single-valued
• Atomic versus composite:
“Age”: atomic
“Address”: composite
• Derived versus stored:
Derived: derived from other attributes or entities, e.g.,
“age” derived from “date of birth.”
Stored: all other attributes
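A derived attribute is computed when it is needed instead of being stored. The sketch below, in Python, derives "age" from a stored "date of birth"; the function name age_on and the sample dates are illustrative assumptions.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Derive the 'age' attribute from the stored 'date of birth' attribute."""
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


print(age_on(date(2000, 6, 15), date(2024, 6, 14)))
print(age_on(date(2000, 6, 15), date(2024, 6, 15)))
```

Storing only the date of birth avoids having to update a stored age every year.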
ER Diagram
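[Figure: ER diagram. Entity set Customer (id, name, street, city, dob, derived attribute age, multivalued attribute tel) is linked to entity set Account (number, balance) by relationship custacct, which has attribute opendate.]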
• Graphical representation of ER schema. Put as much information as possible.
• Entity set: rectangle
• Attribute: ellipse
• Derived attribute: dashed ellipse (“age”)
• Multivalued attribute: double ellipse (“tel”)
• Relationship set: diamond, with lines connected to its entity sets. May have
attributes, called “relationship attributes.”
•Not specified how to represent a composite attribute.
E-R Diagrams
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes (will study later)
E-R Diagram With Composite, Multivalued, and
Derived Attributes
E-R Diagram for Hospital Management System
ER Diagram for Library Management System
Types of Keys
Super Key: a set of attributes within a table that uniquely identifies each
record in the table. Every candidate key is a super key; a super key may also contain extra attributes.
Candidate Key: a minimal super key, i.e. one from which no attribute can be removed without
losing uniqueness. One of the candidate keys is selected to be the primary key.
Primary Key: the candidate key that is chosen to uniquely identify entities within an
entity set, e.g. rollno.
Foreign Key: an attribute (or set of attributes) in one relation whose values must match
primary key values of another (base) relation.
Composite Key: a key that consists of two or more attributes that together uniquely identify an
entity occurrence. No single attribute of a composite
key is a key on its own.
Secondary Key or Alternate Key: the candidate keys that are not selected as the primary
key are known as secondary or alternate keys.
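The super key and candidate key definitions can be checked mechanically against sample data. The Python sketch below uses the rows of the sample customer relation from the relational model example; the attribute names are written with underscores and the function names are illustrative assumptions. Note that a key is a property of the schema: uniqueness in one instance only shows that an attribute set has not yet been ruled out as a key.

```python
from itertools import combinations

ATTRS = ("customer_id", "customer_name", "customer_street",
         "customer_city", "account_number")
ROWS = [
    ("192-83-7465", "Johnson", "Alma", "Palo Alto", "A-101"),
    ("019-28-3746", "Smith", "North", "Rye", "A-215"),
    ("192-83-7465", "Johnson", "Alma", "Palo Alto", "A-201"),
    ("321-12-3123", "Jones", "Main", "Harrison", "A-217"),
    ("019-28-3746", "Smith", "North", "Rye", "A-201"),
]


def is_superkey(attrs):
    """True if no two rows agree on all of the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(ROWS)


def candidate_keys():
    """Minimal super keys of this instance, smallest attribute sets first."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if is_superkey(combo) and not contains_smaller_key:
                keys.append(combo)
    return keys


for key in candidate_keys():
    print(key)
```

In this small instance, pairs such as (customer_name, account_number) also pass the uniqueness test. That is an accident of the sample data, which is why the designer, not the data, decides which candidate key becomes the primary key.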
Roles in a Relationship
• Role: the function that an entity plays in a
relationship
• Needed when entity set is related to itself via a
relationship.
[Figure: entity set employee related to itself through the relationship "works for", with the roles manager and worker.]
Key Constraints on Entity Sets
• Associate each entity set with a “key,” which is a set of
attributes that uniquely identifies an entity in the entity set.
• In ER diagram: denoted by underlining the attributes
• Multiple keys possible:
One primary key is chosen and underlined.
Other keys, called secondary keys, either not
indicated or listed in a side comment attached to the
diagram.
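[Figure: entity set Account with attributes number (key) and balance; entity set student with attributes dept, name and course, where dept and name together form the key.]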
No two accounts have the same number. No two students have the same
name in the same dept.
Cardinality Constraints
[Figure: three diagrams of entity sets A and B, labelled many-to-many, many-to-one and one-to-one.]
Multiplicity of binary relationship set R between entity sets A and B
Example: For “One-to-one,” an entity in A is associated with at most one
entity in B, and vice versa.
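The multiplicity definitions can be expressed as a small check over the (a, b) pairs of a relationship set. The Python sketch below is an illustration only; the function name multiplicity is an assumption. A cardinality constraint is declared on the schema, so this check reports only the tightest classification that a given instance satisfies.

```python
from collections import defaultdict


def multiplicity(pairs):
    """Classify relationship R between A and B from its (a, b) pairs."""
    bs_of_a = defaultdict(set)
    as_of_b = defaultdict(set)
    for a, b in pairs:
        bs_of_a[a].add(b)
        as_of_b[b].add(a)
    # Each entity in A is associated with at most one entity in B.
    a_has_one_b = all(len(bs) <= 1 for bs in bs_of_a.values())
    # Each entity in B is associated with at most one entity in A.
    b_has_one_a = all(len(as_) <= 1 for as_ in as_of_b.values())
    if a_has_one_b and b_has_one_a:
        return "one-to-one"
    if a_has_one_b:
        return "many-to-one"
    if b_has_one_a:
        return "one-to-many"
    return "many-to-many"


print(multiplicity([("a1", "b1"), ("a2", "b1")]))
```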
Data Association
•Associations exist between different attributes of
an entity.
•An association between two attributes indicates
that the values of the associated attributes are
interdependent.
•Relationship exists between entities (Binary
Relationship)
•A possible relationship that may exist between any
two sets may be -
• one-to-one
• one-to-many
• many-to-many
Many-to-One Relationship (cont)
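[Figure: customer and account linked by custacct, with opendate attached to the relationship.]
In a many-to-one relationship, relationship attributes can be
repositioned to the entity set on the “many” side.
[Figure: the same diagram with opendate attached to an entity set instead of the relationship.]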
Participation Constraints
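• Participation describes how an entity set E takes part in a
relationship set R. It can be
- Total: every entity in E participates in at least one relationship in R
- Partial: some entities in E may not participate in any relationship in R
[Figure: two diagrams of entity sets A and B, labelled Total and Partial.]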
Weak Entity Sets
• Weak entity sets: they do not have sufficient attributes to form a key.
They need to “borrow” attributes from other entity sets to form a key.
• Example:
Transactions of different accounts could have the same trans#, so “trans#”
cannot be a key
By borrowing attribute “number” from “account,” we have a key for
“transaction.”
“Transaction” is a weak entity set related to accounts via log relationship.
[Figure: entity set account (number, balance) linked by relationship log to entity set transaction (trans#).]
Weak Entity Sets (cont’)
• A weak entity set depends upon (one or more) strong entity sets, from which it derives
its key, via a one-to-many relationship.
• The “helper” entity set that provides the attributes is called the “owner”
entity set.
• A weak entity set may have a discriminator (or partial key) that
distinguishes between weak entities related to the same strong entity.
• Key of weak entity set = key of owner entity set(s) + discriminator
[Figure: strong entity set account (number, balance) linked by relationship log to weak entity set transaction (trans#, amount).]
An entity set that has a primary key is termed as strong
entity set.
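The rule "key of weak entity set = key of owner entity set + discriminator" can be sketched as tables. The example assumes Python's built-in sqlite3 module; the table name transaction_log and the column name trans_no (for trans#) are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

# Strong (owner) entity set: has its own primary key.
conn.execute("CREATE TABLE account (number TEXT PRIMARY KEY NOT NULL, balance REAL)")

# Weak entity set: key = owner's key (number) + discriminator (trans_no).
conn.execute("""
    CREATE TABLE transaction_log (
        number   TEXT    NOT NULL REFERENCES account(number),
        trans_no INTEGER NOT NULL,
        amount   REAL,
        PRIMARY KEY (number, trans_no)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500.0)")
conn.execute("INSERT INTO account VALUES ('A-215', 700.0)")


def try_insert(number, trans_no, amount):
    """Return True if the DBMS accepts the transaction row."""
    try:
        conn.execute(
            "INSERT INTO transaction_log VALUES (?, ?, ?)",
            (number, trans_no, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False


same_trans_no_other_account = [try_insert("A-101", 1, 50.0), try_insert("A-215", 1, 75.0)]
duplicate_within_account = try_insert("A-101", 1, 20.0)
no_owner_account = try_insert("A-999", 1, 10.0)
print(same_trans_no_other_account, duplicate_within_account, no_owner_account)
```

Two accounts may both have a transaction numbered 1, but the same number cannot repeat within one account, and a transaction cannot exist without its owner account.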
Subclass/Superclass Relationships
• Reason: An ES may have members with special properties not associated with all
ES members.
• Example: Different accounts have different attributes.
Checking Account: overdraft amount,
Savings account: interest-rate.
• Possible representations in ER:
Add an attribute “accountType”: a checking account has a value for the
“overdraft” attribute. A savings account has a value for the “rate” attribute.
Problem: inconsistency; useless attributes; different accounts participate
in different relationships.
Add two columns, one per account type (e.g. IsCheckingAcc and IsSavingAcc):
The overdraft value is stored in the IsCheckingAcc column
and the interest rate is stored in the IsSavingAcc column.
Problem: there will be many NULL values.
Subclass/Superclass Relationships
[Figure: entity set accounts (account#, balance) with an ISA triangle leading to the subclasses savings (rate) and checkings (overdraft).]
• The problems stated previously can be solved by using subclass/superclass
relationships.
• “Savings” and “checkings” are subclasses of the “account” ES.
• “Accounts” is a superclass of savings and checkings ES’s.
• An entity in a subclass must belong to the superclass as well.
Every savings/checking account is also an account.
• Attribute Inheritance:
Subclasses inherit all attributes of the superclass.
Key of the subclass is the same as the key for the superclass.
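One common way to carry this ISA hierarchy into tables is one table per entity set, with each subclass table re-using the superclass key. The sketch below assumes Python's built-in sqlite3 module; the column name account_no (for account#) and the sample values are assumptions, and other table mappings are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
    -- Superclass: attributes shared by every account.
    CREATE TABLE accounts (account_no TEXT PRIMARY KEY NOT NULL, balance REAL);
    -- Subclasses: same key as the superclass, plus their own attributes.
    CREATE TABLE savings (
        account_no TEXT PRIMARY KEY NOT NULL REFERENCES accounts(account_no),
        rate REAL);
    CREATE TABLE checkings (
        account_no TEXT PRIMARY KEY NOT NULL REFERENCES accounts(account_no),
        overdraft REAL);
    INSERT INTO accounts VALUES ('A-101', 500.0), ('A-215', 700.0);
    INSERT INTO savings VALUES ('A-101', 3.5);
    INSERT INTO checkings VALUES ('A-215', 1000.0);
""")

# Attribute inheritance: a savings account also has a balance.
savings_rows = conn.execute("""
    SELECT a.account_no, a.balance, s.rate
    FROM accounts a JOIN savings s ON s.account_no = a.account_no
""").fetchall()

# An entity in a subclass must belong to the superclass as well.
try:
    conn.execute("INSERT INTO savings VALUES ('A-999', 2.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(savings_rows, orphan_rejected)
```

No column is left NULL for accounts of the other type, which is the problem the single-table representations run into.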
Subclass/Superclass Relationships
• Superclass and Subclass relationships arise during schema design due
to the process of specialization and generalization
• Specialization: process of classifying a class of objects into more
specialized subclasses
E.g., start with an employee ES, then specialize it into different
types of employees.
• Generalization: Reverse of specialization. A process of synthesis of
two or more lower-level ES to produce a higher-level ES.
Specialization
An entity set may include subgroupings of entities that are
distinct in some way from other entities in the set.
For instance, a subset of entities within an entity set may
have attributes that are not shared by all the entities in the
entity set.
The process of introducing new characteristics to an
existing class of objects to create one or more new classes
of objects is called specialization.
Specialization Example
Generalization
A bottom-up design process – combine a number of
entity sets that share the same features into a higher-
level entity set.
Generalization is the process of viewing objects as a
single general class by concentrating on the general
properties of the constituent sets while ignoring their
differences.
Specialization and generalization are simple
inversions of each other. They are represented in an E-R
diagram in the same way.
Multiple Inheritance
•Subclass inherits all its attributes from its superclass.
•If a subclass has 2 or more superclasses, then it inherits from
all the superclasses.
Aggregation : Form 1
• This represents a “whole-part” or “a-part-of” relationship. It is drawn as a
hollow diamond followed by a line.
• In this type of relationship, as described here, a child object does not exist without its parent, and
a parent object may contain multiple instances of the child object. (In UML, this strict existence
dependency is usually called composition and is drawn with a filled diamond; a hollow diamond marks
the weaker form, in which parts can exist independently.)
• Take the example of the relationship between Department and Teacher. A
Teacher object cannot exist independently of a
department: if we delete a department, the teachers associated with that
department are also deleted.
Another form of Aggregation
Form 2: It allows a relationship set to participate in another relation.
Express that “an employee works on a specific project possibly using some
machines (could be 0).”
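[Figure, Design 1: a three-way relationship Works connecting employees, projects and machinery.]
Design 1: incorrect, since it requires each project to use tools.
[Figure, Design 2: relationship work between employees and projects, with a second relationship "using" drawn from work to machinery.]
“Design” 2: incorrect, since “relationships of relationships” are not permitted in ER!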
Aggregation
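[Figure: the relationship set work, together with employees and projects, is enclosed in a box as an aggregate entity set; the relationship "using" connects that box to machinery.]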
• Aggregation is an abstraction through which relationships are treated
as higher-level entities.
• Treat the relationship set “work” and the ES’s “employees” and
“projects” as a higher-level ES -- an aggregate entity set.
• Permit relationships between aggregate entity sets and other entity set.
• To create tables out of it: create a table consisting of the primary key of the
aggregated relationship and the primary key of the associated entity set.
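The table-creation rule for aggregation can be sketched as follows, assuming Python's built-in sqlite3 module. The table uses holds the primary key of the aggregated relationship works (emp_id, proj_id) together with the primary key of machinery. All column names and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY);
    CREATE TABLE projects  (proj_id INTEGER PRIMARY KEY);
    CREATE TABLE machinery (machine_id INTEGER PRIMARY KEY);
    -- The relationship set that is treated as a higher-level entity set.
    CREATE TABLE works (
        emp_id  INTEGER NOT NULL REFERENCES employees(emp_id),
        proj_id INTEGER NOT NULL REFERENCES projects(proj_id),
        PRIMARY KEY (emp_id, proj_id));
    -- Key of the aggregate (emp_id, proj_id) + key of the associated entity set.
    CREATE TABLE uses (
        emp_id     INTEGER NOT NULL,
        proj_id    INTEGER NOT NULL,
        machine_id INTEGER NOT NULL REFERENCES machinery(machine_id),
        PRIMARY KEY (emp_id, proj_id, machine_id),
        FOREIGN KEY (emp_id, proj_id) REFERENCES works(emp_id, proj_id));
    INSERT INTO employees VALUES (1), (2);
    INSERT INTO projects VALUES (10);
    INSERT INTO machinery VALUES (100);
    INSERT INTO works VALUES (1, 10), (2, 10);
    INSERT INTO uses VALUES (1, 10, 100);
""")

# Employee 2 works on project 10 without using any machine.
machines_per_assignment = conn.execute("""
    SELECT w.emp_id, w.proj_id, COUNT(u.machine_id)
    FROM works w
    LEFT JOIN uses u ON u.emp_id = w.emp_id AND u.proj_id = w.proj_id
    GROUP BY w.emp_id, w.proj_id
    ORDER BY w.emp_id
""").fetchall()
print(machines_per_assignment)
```

Because uses refers to works and not the other way round, an assignment with zero machines is allowed, which is the case the first design could not express.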
Draw E-R Diagram
Supplier(S_ID, Sname, Status, City)
Parts (P_ID, Pname, Color, Weight, City)
Projects(Pr_ID, Pr_name, City)
Supplied_Quantity(S_ID, P_ID, Pr_ID, Quantity)
Draw an E-R Diagram.
E/R Design Principles
• Keep the schema stable: schemas should not change often, so store
frequently changing information as instances.
Example: currently each project consists of 10 members. Since later projects may
have more or fewer employees, do not hard-code the 10 employees as 10
attributes of the project entity.
• Avoid redundancy: schemas should prevent representing the same
facts multiple times.
An attribute/relationship is redundant if deleting it does not result in a
loss of any information
Redundancy may cause:
wastage of space
application programming more difficult: need to update all instances of a
fact to avoid inconsistency of database
• Consistent and clear names for attributes, entities, and relationships
Redundant Attributes
[Figure: departments (dept #, mgr start date) related to employees (ssno) by manages, which also has the attribute start date.]
Redundant attribute: the manager's start date is stored twice.
Redundant Relationship
[Figure: suppliers related to items by supplies, items related to projects by used-by, and projects related to suppliers by is-customer-of.]
• The fact that a project is-customer-of a supplier can be
derived from the relationships “used-by” and “supplies”:
A project is-customer-of a supplier if the supplier
supplies an item used by the project.
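The derivation can be written as a join, so "is-customer-of" does not have to be stored at all. The sketch below assumes Python's built-in sqlite3 module; the supplier, item and project values are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplies (supplier TEXT, item TEXT);
    CREATE TABLE used_by  (item TEXT, project TEXT);
    INSERT INTO supplies VALUES ('S1', 'bolt'), ('S2', 'nut');
    INSERT INTO used_by  VALUES ('bolt', 'P1'), ('nut', 'P1'), ('nut', 'P2');
    -- Derived relationship: computed from the two stored ones.
    CREATE VIEW is_customer_of AS
        SELECT DISTINCT u.project, s.supplier
        FROM used_by u JOIN supplies s ON s.item = u.item;
""")

customers = conn.execute(
    "SELECT project, supplier FROM is_customer_of ORDER BY project, supplier"
).fetchall()
print(customers)
```

Since the view is recomputed from "supplies" and "used-by", it cannot become inconsistent with them.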
Case Study 1
• Design a DB representing cities, counties, and states in the US:
For states, record the name, population, and state capital (a city).
For counties, record the name, the population, and the located state.
For cities, record the name, the population, the located state and the
located county.
• Uniqueness assumptions:
Names of states are unique.
Names of counties are unique within a state (e.g., 26 states have
Washington Counties).
Cities are unique only within a state (e.g., there are 24 Springfields
among the 50 states).
Some counties and cities have the same name, even within a state
(e.g., Los Angeles).
All cities are located within a single county
Design 1: bad
[Figure: entity set cities (Ci. name, Ci. Popu., Co. name, Co. Popu.) related to entity set states (name, Popu.) by the relationships Located and capital.]
Problem: County Population is repeated for each city.
Design 2: good
[Figure: entity set counties (Co. name, Co. Popu.) related to states (name, Popu.) by Located; entity set cities (Ci. name, Ci. Popu.) related to counties by Belongs-to and to states by capitals.]
The population of a county is derived
from those of its cities.
Case Study 2
• Design a DB consistent with the following facts.
Trains are either local trains or express trains, but never both.
A train has a unique number and an engineer.
Stations are either express stops or local stops, but never
both.
A station has a unique name and an address.
All local trains stop at all stations.
Express trains stop only at express stations.
For each train and each station the train stops at, there is a
time.
Design 1: bad
[Figure: trains (number, type, engineer) related to stations (name, addr, type) by StopsAt, which has the attribute time.]
Problem: does not capture the constraints that express trains stop only at
express stations and local trains stop at all stations.
Design 2: good
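[Figure: train (number, engineer) has ISA subclasses local trains and express trains; stations (name, address) has ISA subclasses express stations and local stations. Local trains are related to stations by StopsAt2 (attribute time); express trains are related to express stations by StopsAt1 (attribute time).]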
Case Study 3
An accounting firm wants a simple HR application that will help
it to keep track of its Employees, their positions (or
designations), allowances (or perks), salary scales, and which
departments have those positions.
The application must keep track of all the positions in the firm,
the employees appointed to fill up those positions, the
allowances granted to these positions, the salary scales for
these positions, and the departments having these positions.
Dr. Edgar F. Codd (1923-2003)
Codd completed his PhD at the
University of Michigan in 1965,
and presented a thesis on the topic
of a self-reproducing computer
consisting of a large number of
simple identical cells, each of
which interacts in a uniform
manner with its four immediate
neighbors.
Codd reported this work in a
book entitled Cellular Automata
published by Academic Press in
1968.
Codd's 12 Rules
Rule 1 : The Information Rule
All data should be presented in table form.
Rollno  Name    Age  College
10      Rohit   20   Bv
11      Rahul   21   Abes
12      Amit    22   Jss
13      Simran  23   its
Rule 2 : Guaranteed Access Rule
All data should be accessible without ambiguity.
This can be accomplished through a combination
of the table name, primary key and column name.
Rule 3: Systematic treatment of null values
A field should be allowed to remain empty. This
involves the support of null value, which is distinct
from an empty string or a number with a value of zero.
Most database implementations support a
NOT NULL column constraint, which prevents null values in
a specific table column.
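The distinction between null, an empty string and zero can be observed directly. The sketch below assumes Python's built-in sqlite3 module, where Python's None is stored as SQL NULL; the marks column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER NOT NULL, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(10, "Rohit", 0), (11, "", 85), (12, None, None)],  # None is stored as NULL
)


def rollnos(where):
    """Roll numbers of the rows that satisfy a fixed WHERE condition."""
    sql = "SELECT rollno FROM student WHERE " + where + " ORDER BY rollno"
    return [row[0] for row in conn.execute(sql)]


zero_marks = rollnos("marks = 0")       # a known value of zero
null_marks = rollnos("marks IS NULL")   # value missing altogether
empty_names = rollnos("name = ''")      # a known, empty string
null_names = rollnos("name IS NULL")    # name missing altogether

# The NOT NULL constraint on rollno rejects a missing value.
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'Amit', 70)")
    null_rollno_rejected = False
except sqlite3.IntegrityError:
    null_rollno_rejected = True

print(zero_marks, null_marks, empty_names, null_names, null_rollno_rejected)
```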
Rule 4: Dynamic on-line catalog based on the relational
model
A relational database must provide access to its structure
through the same tools that are used to access the data.
This is usually accomplished by storing the structure definition
within special system tables.
These tables are created, owned and maintained by the DBMS.
They can be accessed by the users in the same manner as ordinary
tables, depending on the user’s privileges.
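As an illustration, SQLite keeps its catalog in a system table named sqlite_master, which can be read with an ordinary SELECT. The sketch assumes Python's built-in sqlite3 module; other DBMS products use different catalog table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER, name TEXT)")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")

# The catalog is queried with the same language as ordinary data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
print(catalog)
```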
Rule 5 : Comprehensive data sub-language Rule
The database must support at least one clearly defined
language that includes functionality for data definition,
data manipulation, data integrity and transaction
control.
Most commercial relational databases use a dialect of
standard SQL (Structured Query Language) as their
comprehensive language.
Rule 6 : View updating Rule
Data can be presented in different logical
combinations called views.
Each view should support the same full range of data
manipulation that is available through direct access to a table.
In practice, providing update and delete access to
logical views is difficult and is not fully supported by
current databases.
Rule 7 : High-level Insert, Update and Delete
Data can be retrieved from a relational database in sets
constructed of data from multiple rows and multiple
tables.
This rule states that insert, update and delete operations
should be supported for any retrievable set rather than just for
a single row in a single table.
Rule 8 : Physical data independence
The user is isolated from the physical method of storing
and retrieving information from the database.
Changes can be made to the underlying architecture (
hardware, disk storage methods) without affecting how the
user accesses it.
Rule 9 : Logical Data Independence.
How the data is viewed should not be changed
when the logical structure (table’s structure) of the
database changes.
This rule is difficult to satisfy.
Most databases rely on strong ties between the
data viewed and the actual structure of underlying
tables.
Rule 10 : Integrity Independence
The database language (SQL) should support constraints on user input
that maintain database integrity, defined in the database itself rather than in application programs.
At a minimum, databases preserve two
constraints through SQL:
A primary key must be unique and not null.
If a foreign key is defined in one table, any value
in it must exist as a primary key in another table.
Rule 11 : Distribution Independence
A user should be totally unaware of whether or
not the database is distributed ( whether parts of
the database exist in multiple locations).
A variety of reasons make this rule difficult to
implement.
Rule 12: The Non subversion rule
There should be no way to modify the database
structure or bypass its integrity rules other than through the
set-oriented (multiple-row) database language (SQL).
Most databases today support administrative
tools that allow some direct manipulation of
the data structure.
Short Questions:
Q.1 What is data independence?
Q.2 What do you mean by DBMS?
Q.3 What is the difference between generalization and
specialization?
Q.4 Describe the characteristics of a DBMS.
Q.5 Explain all components of an E-R diagram.
Q.6 What is the role of keys in a DBMS, and what types of
keys are there?
Long Questions:
Q.1 Describe the three levels of abstraction of a DBMS.
Q.2 Differentiate between physical and logical data
independence.
Q.3 Discuss all 12 of Dr. E. F. Codd's rules.
Q.4 What is a data model? Discuss various data models
available in DBMS.
Q.5 Differentiate between weak and strong entity sets
with an example.
Q.6 What is a DBMS? How does it differ from a
conventional file system?