UNIT - III
DATABASE DESIGN
INTRODUCTION
Database design can be defined as a collection of tasks or processes that cover the design, development, implementation, and maintenance of an enterprise data management system.
Designing a proper database reduces maintenance costs, improves data consistency, and makes cost-effective use of disk storage space.
The designer should follow the constraints and decide how the elements correlate and
what kind of data must be stored.
The main objectives behind database design are to produce the physical and logical design models of the proposed database system.
To elaborate, the logical model concentrates primarily on the requirements of the data and on how the data should be organized as a whole, so that the stored data is described independently of physical conditions.
On the other hand, the physical database design model involves translating the logical design model onto physical storage media, making use of hardware resources and software systems such as the Database Management System (DBMS).
4.2 DATABASE DESIGN PROCESS
The process of designing a database involves the following steps:
[Figure: flowchart - Determine the purpose of the database → Find and organize the information → Create tables for the information → Establish relationships between the tables → Redefine your design]
Fig 4.1 Steps involved in Database Design Process
Determine the purpose of the database:
We should consider what the database will be used for, how it is expected to be used, who we expect to use it, etc. This will help us to develop a mission statement and prepare for the remaining steps.
Find and organize the information:
Once the purpose of the database has been figured out, we need to gather the data that needs to be stored there. After the necessary information is gathered, we need to organize it. It is usually easiest to organize the information by breaking each piece into its smallest useful parts.
Create tables for the information:
Once the information is organized, we want to divide up the information into tables. Separate the data into major entities or subjects. Then, each subject will become a table. Label each table with the subject within that table.
Establish relationships between the tables:
It can be hard to use a database with independent or unrelated tables. It is
best to look at each individual table and decide how the data within relates
to the data in other tables. We can then add fields to the tables or create new
ones to clarify the established relationships so everything is connected.
Redefine your design:
One of the last database design steps is to take a step back once to check if
we've “completed” the database. We need to scan it and analyze the design
for any errors. Run the database with the tables and records to see if we can
get the results that we want. Necessary adjustments and refinements are to
be made to get the desired output.
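The "create tables" and "establish relationships" steps above can be sketched with SQLite from Python. The school-style schema and all table and column names here are illustrative assumptions, not from the text:

```python
import sqlite3

# Hypothetical schema for illustration: two related tables, one per subject.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Step: create tables for the information, one table per subject.
conn.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    )""")

# Step: establish relationships by adding a foreign-key field.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    )""")

conn.execute("INSERT INTO department VALUES (1, 'CO')")
conn.execute("INSERT INTO student VALUES (101, 'Teena', 1)")

# Step: refine the design by running queries and checking the results.
row = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student s JOIN department d ON s.dept_id = d.dept_id
""").fetchone()
print(row)  # ('Teena', 'CO')
```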
OBJECTIVES OF DATABASE DESIGN
The database supports both required and ad hoc information retrieval. The
database must store the data necessary to support information requirements defined
during the design process and any possible ad hoc queries that may be posed by a user.
The tables are constructed properly and efficiently. Each table in the database
represents a single subject, is composed of relatively distinct fields, keeps redundant
data to an absolute minimum, and is identified throughout the database by a field with
unique values.
Data integrity is imposed at the field, table, and relationship levels. These levels of
integrity help guarantee that the data structures and their values will be valid and
accurate at all times.
The database supports business rules relevant to the organization. The data must
provide valid and accurate information that is always meaningful to the business.
The database lends itself to future growth. The database structure should be easy to
modify or expand as the information requirements of the business change and grow.
The database should be flexible. The database should not be implemented in a rigid manner assuming that the business remains constant forever.
The database should be efficient. The database design should make full and efficient use of the facilities provided; also, the users must be able to interact with the database without any time delay.
4.3 DATABASE DESIGN TOOLS
• Need for Database Design Tools
• Desired Features
• Advantages
• Disadvantages
• Commercial Database Design Tools
4.3.1 NEED FOR DATABASE DESIGN TOOLS
The database design tools are used to automate the task of designing a business system. Many database design tools are available with a variety of features. The design tools are vendor specific.
The database design tools increase the overall productivity because the manual tasks are automated; less time is spent in performing tedious tasks and more time is spent in thinking about the actual design of the database.
The quality of the end product is improved by using database design tools.
4.3.2 DESIRED FEATURES OF DATABASE DESIGN TOOLS
Various features of database design tools are as follows:
• The database design tool should capture the user needs.
• The database design tool should have the capability to model the flow of data in an organization.
• The database design tool should have the capability to model entities and their relationships.
• The database design tool should have the capability to generate Data Definition Language (DDL) statements to create database objects.
• The database design tool should support the full database life cycle.
• The database design tool should generate reports for documentation and user feedback.
4.3.3 ADVANTAGES
The advantages of using database design tools are as follows:
• The amount of code to be written is reduced; as a result, the database design time is reduced.
• Chances of errors because of manual work are reduced.
• Easy to convert the business model to a working database model.
• Easy to ensure that all business requirements are met.
• A higher quality, more accurate product is produced.
4.3.4 DISADVANTAGES
Some of the disadvantages of database design tools are given below:
• More expense is involved for the tool itself.
• Developers may require special training to use the tool.
4.3.5 COMMERCIAL DATABASE DESIGN TOOLS
Various popular database design tools are as follows:
HeidiSQL
This free and open-source software is one of the most popular data modeling tools for
MariaDB and MySQL worldwide.
Archi
It is an open-source conceptual and physical data modeling tool that uses
the ArchiMate modeling language. This language supports the analysis and
visualization of various complex database systems.
PgModeler
PgModeler is an open-source database modeler that supports multiple PostgreSQL
databases.
MySQL Workbench
MySQL Workbench is more than just a visual database design tool; it also integrates
database administration, performance monitoring, and database migration.
ModelSphere
Open Model Sphere is an open-source UML modeling tool that supports all forms of data
models - conceptual, logical, and physical. It allows for the conversion of models from
one type to another.
Database Deployment Manager
Database Deployment Manager (DDM) is an open-source database design tool that
allows users - typically programmers - to create models and diagrams. It is also a
database management software that enables users to create and maintain databases and
create ER diagrams between tables.
DBDesigner
DBDesigner is an online database modeling tool that allows users to design database
schema without writing any SQL code. Its simple and intuitive user interface has
features that simplify the modeling process.
4.4 FUNCTIONAL DEPENDENCIES
• Introduction
• Redundancy & Data Anomaly
• Armstrong's axioms/properties
• Types of functional dependencies
4.4.1 INTRODUCTION
4.4.1.1 REDUNDANCY AND DATA ANOMALY
Redundancy means having multiple copies of the same data in the database. This problem arises when a database is not normalized. Suppose the attributes of a table of student details are: student ID, student name, college name, college rank, course opted.
As can be observed, the values of the attributes college name, college rank, and course are being repeated, which can lead to problems. Problems caused due to redundancy are: insertion anomaly, deletion anomaly, and updation anomaly.
1. Insertion Anomaly
If a student detail has to be inserted whose course has not been decided yet, then insertion will not be possible till the course is decided for the student.
This problem happens when the insertion of a data record is not possible without adding some additional unrelated data to the record.
2. Deletion Anomaly
If the details of students in this table are deleted then the details of college will also
get deleted. This anomaly happens when deletion of a data record results in losing
some unrelated information that was stored as part of the record that was deleted from
a table
11 is not possible to delete some information without losing some other information
in the table as well.
3. Updation Anomaly
Suppose if the rank of the college changes, then changes will have to be made all over the database, which will be time-consuming and computationally costly. If updation does not occur at all places, then the database will be in an inconsistent state.
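The updation anomaly can be sketched in a few lines. The table contents below are an illustrative assumption modeled on the student/college example in the text:

```python
# A sketch of the update anomaly: the same college_rank is stored redundantly
# in every student row of an unnormalized table (names are illustrative).
students = [
    {"student_id": 1, "name": "Asha", "college": "ABC", "college_rank": 5},
    {"student_id": 2, "name": "Ravi", "college": "ABC", "college_rank": 5},
]

# Updating the rank in only one place leaves the data inconsistent.
students[0]["college_rank"] = 3

ranks = {row["college_rank"] for row in students if row["college"] == "ABC"}
print(ranks)  # {3, 5} -> two different ranks recorded for the same college
```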
4.4.1.2 WHAT IS FUNCTIONAL DEPENDENCY?
• Functional dependency in DBMS is a relationship between attributes of a table that are dependent on each other; it was introduced by E. F. Codd and helps in preventing data redundancy.
• A functional dependency is a constraint that specifies the relationship between two sets of attributes, where one set can accurately determine the value of the other set.
• It is denoted as X → Y, where X is a set of attributes that is capable of determining the value of Y. The attribute set on the left side of the arrow, X, is called the Determinant, while the set on the right side, Y, is called the Dependent.
The above suggests the following:
Functional Dependency
A → B
B - functionally dependent on A
A - determinant set
B - dependent attribute
Example:

roll_no   name    dept_name   dept_building
UCS01     Teena   CO          A4
UCS02     Uma     IT          A3
UCS03     Siva    CO          A4
UCS04     Rex     IT          A3
UCS05     Vinu    EC          B2
UCS06     Meena   ME          B2
From the above table we can conclude some valid functional dependencies:
roll_no → {name, dept_name, dept_building} - Here, roll_no can determine the values of the fields name, dept_name and dept_building, hence this is a valid functional dependency.
roll_no → dept_name - Since roll_no can determine the whole set {name, dept_name, dept_building}, it can determine its subset dept_name also.
dept_name → dept_building - dept_name can identify the dept_building accurately, since each dept_name is associated with exactly one dept_building in the table.
More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
name → dept_name - Students with the same name can have different dept_name, hence this is not a valid functional dependency.
dept_building → dept_name - There can be multiple departments in the same building; in the above table, departments ME and EC are in the same building B2, hence dept_building → dept_name is an invalid functional dependency.
More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building → roll_no, etc.
Example
The following is an example that would make it easier to understand functional dependency. We have a table with two attributes - DeptId and DeptName.
DeptId - Department ID
DeptName - Department Name
The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute. This is because if you want to know the department name, then first you need to have the DeptId.

DeptId   DeptName
001      Finance
002      Marketing
003      HR

Therefore, the above functional dependency between DeptId and DeptName can be written as DeptName being functionally dependent on DeptId:
DeptId → DeptName
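A small checker for whether a dependency X → Y holds in a given set of rows can be sketched as follows. The function name and the dict-per-row format are assumptions for illustration:

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    The FD holds when every lhs value maps to exactly one rhs value.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = [
    {"DeptId": "001", "DeptName": "Finance"},
    {"DeptId": "002", "DeptName": "Marketing"},
    {"DeptId": "003", "DeptName": "HR"},
]
print(fd_holds(rows, ("DeptId",), ("DeptName",)))  # True
```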
4.4.2 ARMSTRONG'S AXIOMS/PROPERTIES OF FUNCTIONAL DEPENDENCIES
Armstrong's axioms were developed by William Armstrong in 1974 to reason about functional dependencies. The axioms suggest rules that hold true for all functional dependencies:
Reflexivity
A → B, if B is a subset of A.
Augmentation
AC → BC, if A → B.
Transitivity
If A → B and B → C, then A → C, i.e. a transitive relation.
1. Reflexivity:
If Y is a subset of X, then X → Y holds by the reflexivity rule.
For example, {roll_no, name} → name is valid.
2. Augmentation:
If X → Y is a valid dependency, then XZ → YZ is also valid by the augmentation rule.
For example, if {roll_no, name} → dept_building is valid, then {roll_no, name, dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity:
If X → Y and Y → Z are both valid dependencies, then X → Z is also valid by the transitivity rule.
For example, if roll_no → dept_name and dept_name → dept_building, then roll_no → dept_building is also valid.
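Taken together, the axioms let us compute the closure of an attribute set, that is, everything a set of attributes can determine. A minimal sketch; the function and the tuple representation of FDs are illustrative assumptions:

```python
def closure(attrs, fds):
    """Compute the closure of attrs under a list of FDs given as
    (lhs, rhs) tuples of attribute tuples, repeatedly applying any FD
    whose left side is already contained in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [(("roll_no",), ("dept_name",)),
       (("dept_name",), ("dept_building",))]
# dept_building enters the closure by transitivity.
print(closure({"roll_no"}, fds))
```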
4.4.3 TYPES OF FUNCTIONAL DEPENDENCIES IN DBMS
• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued functional dependency
• Transitive functional dependency
4.4.3.1 Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset of the determinant, i.e. if X → Y and Y is a subset of X, then it is called a trivial functional dependency.
For example,

roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of the determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of a trivial functional dependency.
4.4.3.2 Non-trivial Functional Dependency
In a non-trivial functional dependency, the dependent is strictly not a subset of the determinant, i.e. if X → Y and Y is not a subset of X, then it is called a non-trivial functional dependency.
For example,

roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no.
Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.
4.4.3.3 Multivalued Functional Dependency
In a multivalued functional dependency, entities of the dependent set are not dependent on each other, i.e. if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.
For example,

roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18
45        abc    19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and age are not dependent on each other (i.e. neither name → age nor age → name exists).
4.4.3.4 Transitive Functional Dependency
In a transitive functional dependency, the dependent is indirectly dependent on the determinant, i.e. if a → b and b → c, then according to the axiom of transitivity, a → c. This is a transitive functional dependency.
For example,

roll_no   name    dept_name   dept_building
UCS01     Teena   CO          A4
UCS02     Uma     IT          A3
UCS03     Siva    CO          A4
UCS04     Rex     IT          A3

Here, roll_no → dept_name and dept_name → dept_building.
Hence, according to the axiom of transitivity, roll_no → dept_building is a valid functional dependency. This is an indirect functional dependency, hence called a transitive functional dependency.
4.5 NORMALIZATION
• Introduction
• 2NF
• 3NF
• BCNF
• Denormalization
4.5.1 INTRODUCTION
A large database defined as a single relation may result in data duplication. This repetition of data may result in:
• Making relations very large.
• Difficulty in maintaining and updating data, as it would involve searching many records in the relation.
• Wastage and poor utilization of disk space and resources.
• The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that satisfy desirable
properties. Normalization is a process of decomposing the relations into relations with fewer
attributes.
Database Normalization is a technique of organizing the data in the database.
Normalization is a systematic approach of decomposing tables to eliminate data redundancy
(repetition) and undesirable characteristics like insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.
4.5.2 ADVANTAGES & DISADVANTAGES OF NORMALIZATION
4.5.2.1 Advantages of Normalization
• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.
4.5.2.2 Disadvantages of Normalization
• You cannot start building the database before knowing what the user needs.
• The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
• It is very time-consuming and difficult to normalize relations of a higher degree.
• Careless decomposition may lead to a bad database design, leading to serious problems.
4.5.3 TYPES OF NORMAL FORMS
Normalization works through a series of stages called normal forms. The normal forms apply to individual relations. A relation is said to be in a particular normal form if it satisfies the required constraints.
Following are the various types of normal forms:
[Figure: decomposition of a relation through successive normal forms, starting by eliminating repeating groups]
Figure 4.2 Types of Normal Forms
Normal Form   Description
1NF           A relation is in 1NF if it contains only atomic values.
2NF           A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF           A relation will be in 3NF if it is in 2NF and no transitive dependency exists.
BCNF          A stronger definition of 3NF is known as Boyce Codd normal form.
4NF           A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
5NF           A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining should be lossless.
4.5.3.1 First Normal Form (1NF)
• A relation will be in 1NF if it contains only atomic valued attributes or columns.
• It states that an attribute of a table cannot hold multiple values. It must hold only single values. Values stored in a column should be of the same domain.
• All the columns in a table should have unique names.
• First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
Example:
Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.
EMPLOYEE table:

EMP_ID   EMP_NAME   EMP_PHONE                EMP_STATE
14       John       7272826385, 9064738238   UP
20       Harry      8574783832               Bihar
12       Sam        7390372389, 8589830302   Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID   EMP_NAME   EMP_PHONE    EMP_STATE
14       John       7272826385   UP
14       John       9064738238   UP
20       Harry      8574783832   Bihar
12       Sam        7390372389   Punjab
12       Sam        8589830302   Punjab
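The 1NF decomposition, splitting a multi-valued phone column into one row per phone number, can be sketched as follows; the row values mirror the EMPLOYEE example and are otherwise illustrative:

```python
# A sketch of decomposing a multi-valued EMP_PHONE column into 1NF rows.
employees = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]

# One output row per individual phone number: atomic values only.
rows_1nf = [
    (emp_id, name, phone.strip(), state)
    for emp_id, name, phones, state in employees
    for phone in phones.split(",")
]
for row in rows_1nf:
    print(row)
```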
4.5.3.2 Second Normal Form (2NF)
• In 2NF, the relation must be in 1NF.
• In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
Example:
Let's assume a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table:

TEACHER_ID   SUBJECT     TEACHER_AGE
25           Chemistry   30
25           Biology     30
47           English     35
83           Maths       38
83           Computer    38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That's why it violates the rule for 2NF. To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:

TEACHER_ID   TEACHER_AGE
25           30
47           35
83           38
TEACHER_SUBJECT table:

TEACHER_ID   SUBJECT
25           Chemistry
25           Biology
47           English
83           Maths
83           Computer
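The 2NF decomposition above amounts to projecting the TEACHER relation onto two smaller relations and dropping duplicate rows; a minimal sketch using the same data:

```python
# The unnormalized TEACHER relation: (teacher_id, subject, teacher_age).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Maths", 38),
    (83, "Computer", 38),
]

# Project onto (teacher_id, age) and deduplicate: TEACHER_DETAIL.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})

# Project onto (teacher_id, subject): TEACHER_SUBJECT.
teacher_subject = [(tid, subj) for tid, subj, _ in teacher]

print(teacher_detail)   # [(25, 30), (47, 35), (83, 38)]
print(teacher_subject)
```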
4.5.3.3 Third Normal Form (3NF)
• A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:

EMP_ID   EMP_NAME    EMP_ZIP   EMP_STATE   EMP_CITY
222      Harry       201010    UP          Noida
333      Stephan     02228     US          Boston
444      Lan         60007     US          Chicago
555      Katharine   06389     UK          Norwich
666      John        462007    MP          Bhopal
Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.
EMPLOYEE table:

EMP_ID   EMP_NAME    EMP_ZIP
222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007

EMPLOYEE_ZIP table:

EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
4.5.3.4 Boyce Codd Normal Form (BCNF)
• BCNF is the advanced version of 3NF. It is stricter than 3NF.
• A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
• For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example:
Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:

EMP_ID   EMP_COUNTRY   EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
264      India         Designing    D394        283
264      India         Testing      D394        300
364      UK            Stores       D283        232
364      UK            Developing   D283        549

In the above table, the functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:

EMP_ID   EMP_COUNTRY
264      India
364      UK
EMP_DEPT table:

EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549

EMP_DEPT_MAPPING table:

EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now this is in BCNF, because the left side of both functional dependencies is a key.
4.5.3.5 Fourth Normal Form (4NF)
• A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
• For a dependency A → B, if multiple values of B exist for a single value of A, then the relation has a multi-valued dependency.
STUDENT table:

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE:

STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics
STUDENT_HOBBY:

STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey
4.5.3.6 Fifth Normal Form (5NF)
• A relation is in 5NF if it is in 4NF and does not contain any join dependency, and joining should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
• 5NF is also known as Project-Join Normal Form (PJ/NF).
SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester 1
Computer    John       Semester 1
Math        John       Semester 1
Math        Akash      Semester 2
Chemistry   Praveen    Semester 1
In the above table, John takes both Computer and Math classes for Semester 1, but he doesn't take Math class for Semester 2. In this case, a combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not know the subject and who will be taking that subject, so we leave Lecturer and Subject as NULL. But all three columns together act as a primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
P1:

SEMESTER     SUBJECT
Semester 1   Computer
Semester 1   Math
Semester 1   Chemistry
Semester 2   Math

P2:

SUBJECT     LECTURER
Computer    Anshika
Computer    John
Math        John
Math        Akash
Chemistry   Praveen

P3:

SEMESTER     LECTURER
Semester 1   Anshika
Semester 1   John
Semester 2   Akash
Semester 1   Praveen
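A quick way to see that a 5NF decomposition like this is lossless is to natural-join the three projections and check that the original relation is reconstructed exactly; a sketch over the same data:

```python
# Original relation: (SUBJECT, LECTURER, SEMESTER).
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2, P3.
p1 = {(sem, sub) for sub, _, sem in original}     # (SEMESTER, SUBJECT)
p2 = {(sub, lect) for sub, lect, _ in original}   # (SUBJECT, LECTURER)
p3 = {(sem, lect) for _, lect, sem in original}   # (SEMESTER, LECTURER)

# Natural join P1 |><| P2 |><| P3 on the shared columns.
rejoined = {
    (sub, lect, sem)
    for sem, sub in p1
    for sub2, lect in p2
    if sub2 == sub and (sem, lect) in p3
}
print(rejoined == original)  # True -> the join is lossless
```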
4.6 DENORMALIZATION
Denormalization is a database optimization technique where we add redundant data to the database to get rid of complex join operations. This is done to speed up database access. Denormalization is done after normalization to improve the performance of the database. Data from one table is included in another table to reduce the number of joins in the query, and this helps in speeding up performance.
Example:
Suppose after normalization we have two tables: first, a Student table and second, a Branch table. The Student table has the attributes Roll_no, Student_name, Age, and Branch_id.
Student table:

Roll_no   Student_name   Age   Branch_id
1         Andrew         18    10
2         Angel          19    10
3         Priya          20    10
4         Analisa        21    11
5         Anna           22    12

The Branch table is related to the Student table with Branch_id as the foreign key in the Student table.
Branch table:

Branch_id   Branch_name   HOD
10          CSE           Meat
11          EC            Dee
12          EX            Drpar

If we want the names of students along with the name of the branch, then we need to perform a join operation. The problem here is that if the table is large, we need a lot of time to perform the join operations. So, we can add the data of Branch_name from the Branch table to the Student table, and this will help in reducing the time that would have been used in the join operation and thus optimize the database.
4.6.1 ADVANTAGES OF DENORMALIZATION
Query execution is fast since we have to join fewer tables
Fetching queries in a normalized database generally requires joining a large number of tables, but we already know that the more joins, the slower the query. To overcome this, we can add redundancy to a database by copying values between parent and child tables, minimizing the number of joins needed for a query.
Make the database more convenient to manage
In a normalized database, calculated values needed by applications are not stored; calculating these values on-the-fly takes longer, slowing down the execution of the query. Thus, with denormalization, fetching queries can be simpler because we need to look at fewer tables.
Facilitate and accelerate reporting
Suppose you need certain statistics very frequently. It requires a long time to create them
from live data and slows down the entire system. Suppose you want to monitor client
revenues over a certain year for any or all clients. Generating such reports from live data
will require "searching" throughout the entire database, significantly slowing it down.
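The denormalization step described in the example, copying Branch_name into the Student table, can be sketched in SQLite; table and column names follow the example but the code itself is an illustrative assumption:

```python
import sqlite3

# Normalized starting point: two tables linked by branch_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, branch_name TEXT)")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch_id INTEGER)")
conn.execute("INSERT INTO branch VALUES (10, 'CSE')")
conn.execute("INSERT INTO student VALUES (1, 'Andrew', 10)")

# Denormalize: add the redundant column and copy the value in.
conn.execute("ALTER TABLE student ADD COLUMN branch_name TEXT")
conn.execute("""UPDATE student SET branch_name =
                (SELECT branch_name FROM branch
                 WHERE branch.branch_id = student.branch_id)""")

# The common query now reads one table instead of joining two.
row = conn.execute("SELECT name, branch_name FROM student").fetchone()
print(row)  # ('Andrew', 'CSE')
```

The trade-off shown here is exactly the one the text describes: the redundant branch_name column must now be kept in sync whenever the Branch table changes.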
4.6.2 DISADVANTAGES OF DENORMALIZATION
The following are the disadvantages of denormalization:
• It takes large storage due to data redundancy.
• It makes it expensive to update and insert data in a table.
• It makes update and insert code harder to write.
• Since data can be modified in several places, it can become inconsistent; we will need to update every piece of duplicate data. We can do this by using triggers, transactions, and/or procedures for all operations that must be performed together.
4.6.3 HOW IS DENORMALIZATION DIFFERENT FROM NORMALIZATION?
Denormalization differs from normalization in the following manner:
Denormalization is a technique used to merge data from multiple tables into a single table that can be queried quickly. Normalization, on the other hand, is used to remove redundant data from a database and replace it with non-redundant and reliable data.
Denormalization is used when joins are costly and queries are run regularly on the tables. Normalization, on the other hand, is typically used when a large number of insert/update/delete operations are performed, and joins between those tables are not expensive.
TRANSACTION PROCESSING &
DATABASE SECURITY
5.1 TRANSACTION PROCESSING
> INTRODUCTION
> TRANSACTION OPERATIONS
> TRANSACTION STATES
> PROPERTIES OF TRANSACTIONS
> SCHEDULES & CONFLICTS
  o SERIAL SCHEDULE
  o PARALLEL SCHEDULE
> SERIALIZABILITY
> ANOMALIES DUE TO INTERLEAVED TRANSACTIONS
  o WR CONFLICTS
  o RW CONFLICTS
  o WW CONFLICTS
> LOCK BASED CONCURRENCY CONTROL
  o LOCK BASED PROTOCOL
  o TIMESTAMP BASED PROTOCOL
5.1.1 INTRODUCTION
A transaction is a program including a collection of database operations, executed as a logical unit of data processing. The operations performed in a transaction include one or more database operations like insert, delete, update or retrieve data. It is an atomic process that is either performed to completion entirely or is not performed at all.
Each high level operation can be divided into a number of low level tasks or operations. For example, a data update operation can be divided into three tasks:
• read_item() - reads the data item from storage to main memory.
• modify_item() - changes the value of the item in main memory.
• write_item() - writes the modified value from main memory to storage.
Database access is restricted to read_item() and write_item() operations. Likewise, for all transactions, read and write form the basic database operations.
5.1.2 TRANSACTION OPERATIONS
The low level operations performed in a transaction are:
• begin_transaction - A marker that specifies the start of transaction execution.
• read_item or write_item - Database operations that may be interleaved with main memory operations as part of the transaction.
• end_transaction - A marker that specifies the end of the transaction.
• commit - A signal to specify that the transaction has been successfully completed in its entirety and will not be undone.
• rollback - A signal to specify that the transaction has been unsuccessful and so all temporary changes in the database are undone. A committed transaction cannot be rolled back.
5.1.3 TRANSACTION STATES
A transaction may go through a subset of five states: active, partially committed, committed, failed and aborted.
Active — The initial state where the transaction enters is the active state. The transaction remains in this state while it is executing read, write or other operations.
Partially Committed — The transaction enters this state after the last statement of the transaction has been executed.
Committed — The transaction enters this state after successful completion of the transaction, when the system checks have issued the commit signal.
Failed — The transaction goes from the partially committed state or active state to the failed state when it is discovered that normal execution can no longer proceed or system checks fail.
Aborted — This is the state after the transaction has been rolled back after failure and the database has been restored to the state it was in before the transaction began.
The following state transition diagram depicts the states in the transaction and the low level transaction operations that cause changes in states.

Figure 5.1 State Transition Diagram — States in a transaction
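The diagram can be sketched as a small transition table. This is only a sketch: the state names follow the text above, but the event names on the failure edges ("abort", "rollback") are assumptions, since the figure itself labels them with the low-level operations:

```python
# (current state, triggering event) -> next state, per Figure 5.1
TRANSITIONS = {
    ("start", "begin_transaction"): "active",
    ("active", "end_transaction"): "partially committed",
    ("active", "abort"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "abort"): "failed",
    ("failed", "rollback"): "aborted",
}

def step(state, event):
    # raise on any transition the diagram does not allow
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"illegal transition {event!r} from {state!r}")
    return TRANSITIONS[(state, event)]

# a successful transaction: start -> active -> partially committed -> committed
s = step("start", "begin_transaction")
s = step(s, "end_transaction")
s = step(s, "commit")
```

Note that "committed" and "aborted" have no outgoing edges: once a transaction commits it cannot be rolled back, matching the commit/rollback semantics above.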
5.1.4 DESIRABLE PROPERTIES OF TRANSACTIONS
Any transaction must maintain the ACID properties, viz., Atomicity, Consistency, Isolation, and Durability.
Atomicity — This property states that a transaction is an atomic unit of processing; that is, either it is performed in its entirety or not performed at all. No partial update should exist.
Consistency — A transaction should take the database from one consistent state to another consistent state. It should not adversely affect any data item in the database.
Isolation — A transaction should be executed as if it is the only one in the system. There
should not be any interference from the other concurrent transactions that are
simultaneously running.
Durability — If a committed transaction brings about a change, that change should be
durable in the database and not lost in case of any failure.
5.1.5 SCHEDULES AND CONFLICTS
In a system with a number of simultaneous transactions, a schedule is the total order of execution of operations. Given a schedule S comprising n transactions, say T1, T2, T3, ..., Tn; for any transaction Ti, the operations in Ti must execute as laid down in the schedule S.
Types of Schedules
There are two types of schedules —
Serial Schedules — In a serial schedule, at any point of time, only one transaction is active, i.e. there is no overlapping of transactions. This is depicted in the following graph.

Figure 5.2 Serial Schedule
Parallel Schedules — In parallel schedules, more than one transaction is active simultaneously, i.e. the transactions contain operations that overlap in time. This is depicted in the following graph.

Figure 5.3 Parallel Schedule
5.1.6 CONFLICTS IN SCHEDULES
In a schedule comprising multiple transactions, a conflict occurs when two active transactions perform non-compatible operations. Two operations are said to be in conflict when all of the following three conditions exist simultaneously —
• The two operations are parts of different transactions.
• Both the operations access the same data item.
• At least one of the operations is a write_item() operation, i.e. it tries to modify the data item.
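The three conditions translate directly into a predicate. This is a sketch; representing an operation as a (transaction, kind, item) triple is an assumption made for illustration:

```python
def in_conflict(op1, op2):
    """Each operation is a (transaction, kind, item) triple,
    where kind is 'read' or 'write'."""
    t1, k1, x1 = op1
    t2, k2, x2 = op2
    return (t1 != t2                     # parts of different transactions
            and x1 == x2                 # both access the same data item
            and "write" in (k1, k2))     # at least one is a write_item()

# T1's read of A conflicts with T2's write of A ...
assert in_conflict(("T1", "read", "A"), ("T2", "write", "A"))
# ... but two reads of the same item never conflict
assert not in_conflict(("T1", "read", "A"), ("T2", "read", "A"))
```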
5.1.7 SERIALIZABILITY
A serializable schedule of 'n' transactions is a parallel schedule which is equivalent to a serial schedule comprising the same 'n' transactions. A serializable schedule combines the correctness of a serial schedule with the better CPU utilization of a parallel schedule.
Equivalence of Schedules
Equivalence of two schedules can be of the following types —
• Result equivalence — Two schedules producing identical results are said to be result equivalent.
• View equivalence — Two schedules that perform similar actions in a similar manner are said to be view equivalent.
• Conflict equivalence — Two schedules are said to be conflict equivalent if both contain the same set of transactions and have the same order of conflicting pairs of operations.
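Conflict serializability is commonly tested with a precedence graph: draw an edge from Ti to Tj for every conflicting pair in which Ti's operation comes first; the schedule is conflict serializable exactly when the graph has no cycle. A sketch, reusing the (transaction, kind, item) representation assumed earlier:

```python
def is_conflict_serializable(schedule):
    # schedule: list of (transaction, kind, item), kind in {'read', 'write'}
    txns = {t for t, _, _ in schedule}
    edges = {t: set() for t in txns}
    for i, (t1, k1, x1) in enumerate(schedule):
        for t2, k2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "write" in (k1, k2):
                edges[t1].add(t2)    # t1's conflicting op precedes t2's

    # depth-first search for a cycle in the precedence graph
    visiting, done = set(), set()
    def has_cycle(t):
        visiting.add(t)
        for u in edges[t]:
            if u in visiting or (u not in done and has_cycle(u)):
                return True
        visiting.discard(t)
        done.add(t)
        return False
    return not any(has_cycle(t) for t in txns if t not in done)

serializable = [("T1", "read", "A"), ("T1", "write", "A"), ("T2", "read", "A")]
lost_update  = [("T1", "read", "A"), ("T2", "read", "A"),
                ("T1", "write", "A"), ("T2", "write", "A")]
```

The second schedule produces edges both from T1 to T2 and from T2 to T1, so the cycle reveals it is not equivalent to any serial order.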
5.1.8 ANOMALIES DUE TO INTERLEAVED TRANSACTIONS
When read and write operations are done alternately, there is a possibility of some types of anomalies. These are classified into three categories.
1. Write-Read Conflicts (WR Conflicts)
This conflict occurs when a transaction reads data which is written by another transaction but not yet committed. This happens when Transaction T2 tries to read an object A that has been modified by another Transaction T1, which has not yet completed (committed). This type of read is called a dirty read.

Figure 5.4 Write-Read Conflicts (WR Conflicts)
Suppose the transactions are interleaved according to the above schedule. The account transfer program T1 deducts $100 from account A; then the interest deposit program T2 reads the current values of accounts A and B and adds 6% interest to each; finally the transfer program credits $100 to account B. The outcome of this interleaving is different from the normal execution in which the two transactions run one by one. This type of anomaly leaves the database in an inconsistent state.
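With assumed starting balances of $1,000 in each account (the chapter does not give initial values), the damage from the interleaving above is easy to compute:

```python
A = B = 1000.0   # assumed starting balances

# Serial execution: T1 (move $100 from A to B), then T2 (add 6% interest)
a, b = A - 100, B + 100
serial_total = a * 1.06 + b * 1.06       # 1.06 * 2000 = 2120

# Interleaving of Figure 5.4: T2 reads A after T1's debit (a dirty read)
# but reads B before T1's credit
a = (A - 100) * 1.06                     # 954
b = B * 1.06 + 100                       # 1160
interleaved_total = a + b                # 2114
```

The interleaved run loses the $6 of interest that should have been earned on the transferred $100, so the two totals disagree even though each transaction is individually correct.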
2. Read-Write Conflicts (RW Conflicts)
This conflict occurs when a transaction writes data which was previously read by another transaction. The anomalous behaviour here is that a Transaction T2 could change the value of an object A that has been read by a Transaction T1, while T1 is still in progress. If T1 tries to read A again it will get a different result. This type of read is called an unrepeatable read.

Figure 5.5 Read-Write Conflicts (RW Conflicts)

For example, after T2's update only a smaller balance may remain in account A. Now T1 will try to reduce it by $100. This makes the database inconsistent.
3. Write-Write Conflicts (WW Conflicts)
This conflict occurs when the data updated by a transaction is overwritten by another transaction, which might lead to loss of the update. The third type of anomalous behaviour is that one transaction is updating an object while another one is also in progress. This type of write is called a blind write.

Figure 5.6 Write-Write Conflicts (WW Conflicts)
If A and B are two accounts and their values have to be kept equal always, suppose Transaction T1 updates both objects to 3,000 and T2 updates both objects to 2,000. At first T1 updates the value of object A to 3,000. Immediately T2 makes A 2,000 and B 2,000 and commits. After the completion of T2, T1 updates B to 3,000. Now the value of A is 2,000 and the value of B is 3,000; they are not equal. The constraint is violated in this case due to the interleaved (non-serial) scheduling.
5.1.9 LOCK BASED CONCURRENCY CONTROL
In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories —
• Lock based protocols
• Timestamp based protocols
5.1.9.1 LOCK-BASED PROTOCOLS
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds —
Binary Locks — A lock on a data item can be in two states; it is either locked or unlocked.
Shared/Exclusive — This type of locking mechanism differentiates the locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock. Allowing more than one transaction to write on the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed.
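A shared/exclusive lock can be sketched with a condition variable: any number of readers may hold the lock at once, but a writer requires sole access. This is a simplified sketch; real DBMS lock managers also handle fairness, queuing and deadlock, all of which are omitted here:

```python
import threading

class SharedExclusiveLock:
    """Many concurrent readers OR one writer, never both."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:              # readers wait out any writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()      # wake a waiting writer

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()            # writer needs sole access
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = SharedExclusiveLock()
lock.acquire_shared()
lock.acquire_shared()      # two readers may coexist
lock.release_shared()
lock.release_shared()
lock.acquire_exclusive()   # writer proceeds once all readers are gone
lock.release_exclusive()
```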
There are four types of lock protocols available —
(a) Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write' operation is performed. Transactions may unlock the data item after completing the 'write' operation.
(b) Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.

Figure 5.7 Pre-claiming Lock Protocol
(c) Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it only releases the acquired locks.
Figure 5.8 Two-Phase Locking Protocol

Two-phase locking has two phases: one is growing, where all the locks are being acquired by the transaction; and the second phase is shrinking, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock
and then upgrade it to an exclusive lock.
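The growing/shrinking discipline can be enforced with a little per-transaction bookkeeping. This is only a sketch: the lock-manager side (blocking on conflicting holders, upgrades, deadlock handling) is omitted, and the class name is an illustrative assumption:

```python
class TwoPhaseTxn:
    """Tracks the 2PL rule: after the first unlock, no new lock may be taken."""
    def __init__(self):
        self.held = set()
        self.shrinking = False     # flips to True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested in shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True      # the shrinking phase begins
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A")
t.lock("B")        # growing phase: acquire freely
t.unlock("A")      # shrinking phase begins
# t.lock("C") would now raise RuntimeError
```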
(d) Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL. After acquiring all the locks in the first
phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL
does not release a lock after using it. Strict-2PL holds all the locks until the commit point
and releases all the locks at a time.
Figure 5.9 Strict Two-Phase Locking Protocol

Strict-2PL does not have cascading aborts as 2PL does.
5.1.9.2 TIMESTAMP-BASED PROTOCOLS
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among
transactions at the time of execution, whereas timestamp-based protocols start working as soon
as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by
the age of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004 is
two seconds younger and the priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the
system know when the last ‘read and write’ operation was performed on the data item.
5.1.9.2.1 Timestamp Ordering Protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. It is the responsibility of the protocol system that the conflicting pair of tasks should be executed according to the timestamp values of the transactions.
• The timestamp of transaction Ti is denoted as TS(Ti).
• The read timestamp of data item X is denoted by R-timestamp(X).
• The write timestamp of data item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows —
• If a transaction Ti issues a read(X) operation —
o If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
o If TS(Ti) >= W-timestamp(X), the operation is executed and R-timestamp(X) is updated to max(R-timestamp(X), TS(Ti)).
• If a transaction Ti issues a write(X) operation —
o If TS(Ti) < R-timestamp(X), the operation is rejected and Ti is rolled back.
o If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
o Otherwise, the operation is executed and W-timestamp(X) is updated to TS(Ti).
5.1.9.2.2 Thomas' Write Rule
The basic protocol states that if TS(Ti) < W-timestamp(X), then the write operation is rejected and Ti is rolled back. The timestamp ordering rules can be modified to make the schedule view serializable: instead of rolling Ti back, the 'write' operation itself is simply ignored.
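The rules above, including Thomas' write rule, fit in a few lines. This is a sketch under stated assumptions: timestamps are plain integers, and a rollback is represented simply by a returned string rather than an actual abort:

```python
class Item:
    def __init__(self):
        self.r_ts = 0    # R-timestamp(X)
        self.w_ts = 0    # W-timestamp(X)

def read(ts, x):
    if ts < x.w_ts:
        return "reject"              # Ti is too old: X was overwritten later
    x.r_ts = max(x.r_ts, ts)
    return "execute"

def write(ts, x, thomas=False):
    if ts < x.r_ts:
        return "reject"              # a younger transaction already read X
    if ts < x.w_ts:
        # Thomas' write rule: skip the obsolete write instead of rolling back
        return "ignore" if thomas else "reject"
    x.w_ts = ts
    return "execute"

x = Item()
write(5, x)                          # executes; W-timestamp(X) becomes 5
r1 = read(3, x)                      # rejected: reader is older than the writer
w1 = write(3, x, thomas=True)        # ignored under Thomas' write rule
```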
5.2 DATABASE SECURITY
> INTRODUCTION
> COMMON THREATS & CHALLENGES
> CONTROL MEASURES TO PROVIDE SECURITY
> BEST PRACTICES FOR EVALUATING DATABASE SECURITY
> CLASSIFICATION OF DATABASE SECURITY
o TYPES
o DIFFERENT LEVELS
5.2.1 INTRODUCTION
cools, controls, and measures designed t
integrity, and availability. It also mean
loss of data. Security of data base i
Database security refers to the range of
©stablish and preserve database confidentiality,
keeping sensitive information safe and prevents the
Sontrolled by Database Administrator (DBA).
5.2.2 COMMON THREATS AND CHALLENGES
erabilities, or patterns of carelessness or misus
Many software misconfigurations, vuln
of databa:
an result in breaches. The following are among the most common types or causes
Security attacks and their causes:Relational Database Management System 5.43
Insider threat:
An insider threat is a security threat from any one of three sources with privileged access to the database:
o A malicious insider who intends to do harm
o A negligent insider who makes errors that make the database vulnerable to attack
o An infiltrator, an outsider who somehow obtains credentials via a scheme such as phishing or by gaining access to the credential database itself
Insider threats are among the most common causes of database security breaches and are often the result of allowing too many employees to hold privileged user access credentials.
Exploitation of database software vulnerabilities:
Hackers make their living by finding and targeting vulnerabilities in all kinds of software, including database management software. All major commercial database software vendors and open source database management platforms issue regular security patches to address these vulnerabilities, but failure to apply these patches in a timely fashion can increase your exposure.
SQL/NoSQL injection attacks:
A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack strings into database queries served by web applications or HTTP headers. Organizations that don't follow secure web application coding practices and perform regular vulnerability testing are open to these attacks.
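The standard defence is parameterized queries, which pass user input as data rather than splicing it into the SQL text. A sketch with Python's built-in sqlite3 module; the table and its contents are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

user_input = "' OR '1'='1"   # a classic injection string

# Vulnerable: string concatenation lets the input rewrite the query,
# turning it into ... WHERE name = '' OR '1'='1' (matches every row)
unsafe = "SELECT * FROM users WHERE name = '" + user_input + "'"
leaked = conn.execute(unsafe).fetchall()

# Safe: the placeholder treats the input as a plain value, not SQL
safe = conn.execute("SELECT * FROM users WHERE name = ?",
                    (user_input,)).fetchall()
```

The concatenated query returns every user, while the parameterized query correctly returns nothing, since no user is literally named `' OR '1'='1`.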
Buffer overflow exploitations:
Buffer overflow occurs when a process attempts to write more data to a fixed-length block of memory than it is allowed to hold. Attackers may use the excess data, stored in adjacent memory addresses, as a foundation from which to launch attacks.
Denial of service (DoS/DDoS) attacks:
In a denial of service (DoS) attack, the attacker floods the target server (in this case the database server) with so many requests that the server can no longer fulfill legitimate requests from actual users, and, in many cases, the server becomes unstable or crashes.
Malware:
Malware is software written specifically to exploit vulnerabilities or otherwise cause damage to the database. Malware may arrive via any endpoint device connecting to the database's network.
Attacks on backups:
Organizations that fail to protect backup data with the same stringent controls used to
protect the database itself can be vulnerable to attacks on backups.
5.2.3 CONTROL MEASURES TO PROVIDE DATABASE SECURITY
The following are the main control measures used to provide security of data in databases:
1. Authentication
2. Access control
3. Inference control
4. Flow control
5. Database security applying statistical methods
6. Encryption
These are explained below.
1. Authentication: Authentication is the process of confirming whether the user logs in only according to the rights provided to him to perform the activities of the database. A particular user can log in only up to his privilege, but he can't access the other sensitive data. The privilege of accessing sensitive data is restricted by using authentication. By using authentication tools for biometrics such as retina scans and fingerprints, the database can be protected from unauthorized/malicious users.
2. Access Control: The security mechanism of a DBMS must include some provisions for restricting access to the database by unauthorized users. Access control is done by creating user accounts and controlling the login process through the DBMS. Thus, database access to sensitive data is possible only to those people (database users) who are allowed to access such data, and access is restricted to unauthorized persons. The database system must also keep track of all operations performed by a certain user throughout the entire login time.
3. Inference Control: This method is known as the countermeasure to the statistical database security problem. It is used to prevent the user from completing any inference channel. This method protects sensitive information from indirect disclosure. Inferences are of two types: identity disclosure or attribute disclosure.
4. Flow Control: This prevents information from flowing in a way that it reaches unauthorized users. Channels that are pathways for information to flow implicitly in ways that violate the privacy policy of a company are called covert channels.
5. Database Security applying Statistical Method: Statistical database security focuses on the protection of confidential individual values stored in and used for statistical purposes, and on retrieving summaries of values based on categories. It does not permit retrieving individual information. This allows access to the database to get statistical information about the number of employees in the company, but not to access the detailed confidential/personal information about a specific individual employee.
6. Encryption: This method is mainly used to protect sensitive data (such as credit card numbers and OTP numbers). The data is encoded using encoding algorithms. An unauthorized user who tries to access this encoded data will face difficulty in decoding it, but authorized users are given decoding keys to decode the data.
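For stored credentials in particular, the usual practice is not to encrypt but to hash each password with a random salt and a slow key-derivation function, so that even a stolen table does not reveal the passwords. A minimal sketch using only Python's standard library (the iteration count and parameter names are illustrative choices, not prescriptions):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, rounds=100_000):
    """Return (salt, digest); store both, never the plain password."""
    salt = salt or os.urandom(16)          # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)
    return salt, digest

def verify_password(password, salt, digest, rounds=100_000):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)
    # constant-time comparison avoids leaking information via timing
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("s3cret!")
assert verify_password("s3cret!", salt, digest)
```

The salt defeats precomputed rainbow tables, and the high round count makes brute-force guessing of each individual password expensive.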
5.2.4 BEST PRACTICES FOR EVALUATING DATABASE SECURITY
Consider each of the following areas:
Physical security: Whether your database server is on-premises or in a cloud data center, it must be located within a secure, climate-controlled environment.
Administrative and network access controls: The practical minimum number of users should have access to the database, and their permissions should be restricted to the minimum levels necessary for them to do their jobs. Likewise, network access should be limited to the minimum level of permissions necessary.
End user account/device security: Always be aware of who is accessing the database and when and how the data is being used. Data monitoring solutions can alert you if data activities are unusual or appear risky. All user devices connecting to the network housing the database should be physically secure and subject to security controls at all times.
Encryption: All data, including data in the database and credential data, should be protected with best-in-class encryption while at rest and in transit. All encryption keys should be handled in accordance with best-practice guidelines.
Database software security: Always use the latest version of your database management software, and apply all patches as soon as they are issued.
Application/web server security: Any application or web server that interacts with the database can be a channel for attack and should be subject to ongoing security testing and best practice management.
Backup security: All backups, copies, or images of the database must be subject to the same (or equally stringent) security controls as the database itself.
Auditing: Record all logins to the database server and operating system, and log all operations performed on sensitive data as well. Database security standard audits should be performed regularly.
5.2.5 CLASSIFICATION OF DATABASE SECURITY
Database security is broadly classified into physical and logical security. Database recovery is the way of restoring a database to a correct state in the event of a failure.
• Physical security — Physical security refers to the security of the hardware associated with the system and the protection of the site where the computer resides. Natural events like fires, floods, and earthquakes can be considered as some of the physical threats. It is advisable to have backup copies of databases in the face of massive disasters.
• Logical security — Logical security refers to the security measures present in the operating system or the DBMS designed to handle threats to the data. Logical security is far more difficult to accomplish.
5.2.5.1 DATABASE SECURITY AS PER THE LEVELS
Database security is performed at different levels. This is explained below.
(a) Database Security at Design Level
It is necessary to take care of database security at the stage of database design. Some guidelines to implement the most secure system are:
• The database design should be simple.
• The database must be normalized.
• Create a unique key for each user or group of users.
(b) Database Security at Maintenance Level
Once the database is designed, the administrator plays an important role in the maintenance of the database. The security issues at the maintenance level can be classified into the following —
• Operating system issues and availability
• Confidentiality and accountability through authorization rules
• Encryption
• Authentication schemes
(c) Database Security through Access Control
A database for an enterprise contains a great deal of information and usually has several groups of users. Most users need to access only a small portion of the database, which is allocated to them. The DBMS should provide mechanisms to control access to the data; in particular, it is a way to control the data accessible by a given user.
The mechanisms for access control at the DBMS level are as follows —
• Discretionary access control: DAC is identity-based access control. DAC mechanisms will be controlled by user identification such as username and password. DAC is discretionary because the owners can transfer objects or any of their information to other users. In simple words, the owner can determine the access privileges.
• Mandatory access control: The operating system in MAC will provide access to the user based on their identities and data. For gaining access, the user has to submit their personal information. It is very secure because the rules and restrictions are imposed by the admin and will be strictly followed. MAC settings and policy management will be established in a secure network and are limited to system administrators.