Unit 1 Notes DBMS
PAPER 105
Unit 1
1. What do you understand about DBMS?
What is DBMS?
A Database Management System (DBMS) is software for storing and retrieving users’ data while
enforcing appropriate security measures. It consists of a group of programs that manipulate
the database. The DBMS accepts a request for data from an application and instructs the
operating system to provide the specific data. In large systems, a DBMS helps users and other
third-party software store and retrieve data. A DBMS allows users to create their own databases
as per their requirements. In the broad sense, the term “DBMS” also covers the users of the database and the
application programs that access it. It provides an interface between the data and the software application.
Example of a DBMS
Let us see a simple example of a university database. This database maintains information
about students, courses, and grades in a university environment. The database is
organized as five files:
• The STUDENT file stores the data of each student.
• The COURSE file stores data on each course.
• The SECTION file stores information about the sections of a particular course.
• The GRADE file stores the grades that students receive in the various sections.
• The TUTOR file contains information about each professor.
To define this database:
We need to specify the structure of the records of each file by defining the different types of
data elements to be stored in each record.
We can also use a coding scheme to represent the values of a data item.
In relational terms, the database will have five tables, with foreign keys linking the related
tables.
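As a rough illustration of the five-table design described above, the sketch below creates the tables in an in-memory SQLite database using Python's standard sqlite3 module. The notes only name the five files, so every column name here (student_id, course_id, and so on) is an assumption made for the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# All column names below are illustrative assumptions, not part of the notes.
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tutor   (tutor_id   INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE section (
    section_id INTEGER PRIMARY KEY,
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    tutor_id   INTEGER NOT NULL REFERENCES tutor(tutor_id)
);
CREATE TABLE grade (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    section_id INTEGER NOT NULL REFERENCES section(section_id),
    grade      TEXT NOT NULL,
    PRIMARY KEY (student_id, section_id)
);
""")

# List the tables that now exist in the database.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)
```

Each REFERENCES clause is one of the foreign keys that tie the five tables together, for example a grade row pointing at both a student and a section.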
History of DBMS
Here are the important landmarks from the history of DBMS:
Characteristics of DBMS
Here are the characteristics and properties of a Database Management System:
Users of DBMS
Following are the various categories of users of DBMS:
Popular DBMS software:
• MySQL
• Microsoft Access
• Oracle
• PostgreSQL
• dBASE
• FoxPro
• SQLite
• IBM DB2
• LibreOffice Base
• MariaDB
• Microsoft SQL Server
Application of DBMS
Below are the popular database system applications:
Types of Databases
There are various types of databases used for storing different varieties of data:
1) Centralized Database
It is the type of database that stores data in a single, central database system. It lets users
access the stored data from different locations through several applications. These
applications include an authentication process so that users can access data securely. An example of
a centralized database is a central library that holds a single database for every library in
a college/university.
Advantages of Centralized Database
o It reduces the risk in data management, i.e., manipulation of data will not affect
the core data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.
Disadvantages of Centralized Database
o The size of the centralized database is large, which increases the response time for
fetching the data.
o It is not easy to update such an extensive database system.
o If the central server fails, all the data may become unavailable or be lost, which could be a huge loss.
2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among different
database systems of an organization. These database systems are connected via communication
links. Such links help the end-users to access the data easily. Examples of the Distributed
database are Apache Cassandra, HBase, Ignite, etc.
We can further divide a distributed database system into:
o Homogeneous DDB: Database systems that run on the same operating
system, use the same application process, and use the same hardware devices.
o Heterogeneous DDB: Database systems that run on different operating
systems, under different application procedures, and use different hardware devices.
Advantages of Distributed Database
o Modular development is possible in a distributed database, i.e., the system can be
expanded by including new computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.
3) Relational Database
This database is based on the relational data model, which stores data in the form of rows (tuples)
and columns (attributes) that together form a table (relation). A relational database uses SQL
for storing, manipulating, and maintaining the data. E.F. Codd proposed the relational model in
1970. Each table in the database has a key that uniquely identifies each of its
rows. Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
Properties of Relational Database
The following four properties, known as the ACID properties, are guaranteed for transactions in a
relational database:
• A means Atomicity: A transaction either completes in full or has no effect at all.
It follows the 'all or nothing' strategy. For example, a transaction will either be
committed or will abort.
• C means Consistency: A transaction must take the database from one valid state to
another, so that all defined rules still hold afterwards. For example, in a transfer between
accounts, the total balance before and after the
transaction should be the same, i.e., it should remain conserved.
• I means Isolation: Several users can access data in the database at the same time.
Transactions should therefore be isolated from one another. For example, when
multiple transactions occur at the same time, the effects of an uncommitted transaction should not be visible to
the other transactions in the database.
• D means Durability: Once a transaction completes and commits its
data, the changes should remain permanent, even after a system failure.
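The atomicity and consistency properties above can be sketched with a small money transfer, using Python's standard sqlite3 module. The account table, the names A and B, and the rule that a balance may not go negative are all assumptions made for this example.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        # 'with conn' commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit would break the CHECK rule, so the earlier credit is undone too.
        return False

def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

Calling transfer(conn, "A", "B", 30) succeeds and leaves the total unchanged. Calling transfer(conn, "A", "B", 500) fails on the debit, and the rollback also removes the credit that had already been applied to B, which is the 'all or nothing' behaviour.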
4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets.
It is not a relational database as it stores data not only in tabular form but in several different
ways. It came into existence when the demand for building modern applications increased.
Thus, NoSQL presented a wide variety of database technologies in response to the demands.
We can further divide a NoSQL database into the following four types:
• Key-value storage: It is the simplest type of database storage, where every
single item is stored as a key (or attribute name) together with its value.
• Document-oriented Database: A type of database used to store data as JSON-like
documents. It helps developers store data in the same document-model format as
used in the application code.
• Graph Databases: It is used for storing vast amounts of data in a graph-like structure.
Most commonly, social networking websites use the graph database.
• Wide-column stores: It resembles the tables of a relational database, but
data is stored grouped by columns rather than by rows.
Advantages of NoSQL Database
o It enables good productivity in the application development as it is not required to store
data in a structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.
5) Cloud Database
A type of database where data is stored in a virtual environment and runs on a cloud
computing platform. It provides users with various cloud computing services (SaaS, PaaS,
IaaS, etc.) for accessing the database. There are numerous cloud platforms; popular options
include:
o Amazon Web Services (AWS)
o Microsoft Azure
o Kamatera
o phoenixNAP
o ScienceSoft
o Google Cloud SQL, etc.
6) Object-oriented Databases
The type of database that uses the object-based data model approach for storing data in the
database system. The data is represented and stored as objects which are similar to the objects
used in the object-oriented programming language.
7) Hierarchical Databases
It is the type of database that stores data in the form of parent-child relationship nodes. Here,
data is organized in a tree-like structure.
Data is stored in the form of records that are connected via links. Each child record in the tree
has only one parent. On the other hand, each parent record can have multiple child
records.
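A minimal sketch of this one-parent rule in Python is given below. The category names are made up for illustration; the point is only that every record stores exactly one parent link, while a parent may appear in many such links.

```python
# Each record stores exactly one parent (None for the root).
# The category names are hypothetical.
parent_of = {
    "Shoes": None,
    "Men": "Shoes",
    "Women": "Shoes",
    "Sneakers": "Men",
    "Boots": "Men",
    "Sandals": "Women",
}

def children_of(record):
    """Return the child records of a parent, in insertion order."""
    return [child for child, parent in parent_of.items() if parent == record]

def path_to_root(record):
    """Follow the single parent link upward until the root is reached."""
    path = [record]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path
```

Because each record has a single parent, path_to_root always finds exactly one path, which is what makes the structure a tree.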
8) Network Databases
It is the database that typically follows the network data model. Here, the representation of data
is in the form of nodes connected via links between them. Unlike the hierarchical database, it
allows each record to have multiple children and parent nodes to form a generalized graph
structure.
9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This database is
basically designed for a single user.
Advantage of Personal Database
o It is simple and easy to handle.
o It occupies less storage space as it is small in size.
User Access | Only one user can access data at a time. | Multiple users can access data at a time.
Meaning | The users are not required to write procedures. | The user has to write procedures for managing databases.
In 1-Tier Architecture, the database is directly available to the user: the client, server,
and database are all present on the same
machine. For example, to learn SQL we set up an SQL server and the database on the local
system. This enables us to interact directly with the relational database and execute
operations. This architecture is rarely used in production; industry systems generally use 2-Tier or 3-Tier
Architecture.
2-Tier Architecture
The 2-tier architecture is similar to a basic client-server model. The application at the client
end directly communicates with the database on the server side. APIs like ODBC and JDBC
are used for this interaction. The server side is responsible for providing query processing
and transaction management functionalities. On the client side, the user interfaces and
application programs are run. The application on the client side establishes a connection with
the server side in order to communicate with the DBMS.
An advantage of this type is that it is easier to maintain and understand, and it is compatible
with existing systems. However, this model gives poor performance when there are a large
number of users.
DBMS 2-Tier Architecture
3-Tier Architecture
In 3-Tier Architecture, there is another layer between the client and the server. The client
does not directly communicate with the server. Instead, it interacts with an application server,
which in turn communicates with the database system, where query processing and
transaction management take place. This intermediate layer acts as a medium for the
exchange of partially processed data between the server and the client. This type of
architecture is used in the case of large web applications.
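The separation between the three tiers can be sketched in a single Python file. This is a toy illustration only: in a real system each tier would usually run on a different machine. The class StudentService, the function render_student, and the sample rows are all hypothetical names chosen for the example.

```python
import sqlite3

# Data tier: the database itself (an in-memory SQLite database here).
def make_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
    conn.commit()
    return conn

# Application tier: the only layer that holds a database connection.
class StudentService:
    def __init__(self, conn):
        self._conn = conn

    def student_name(self, roll_no):
        """Validate the request, then query the data tier."""
        if not isinstance(roll_no, int) or roll_no <= 0:
            raise ValueError("roll_no must be a positive integer")
        row = self._conn.execute(
            "SELECT name FROM student WHERE roll_no = ?", (roll_no,)).fetchone()
        return row[0] if row else None

# Presentation tier: talks to the application tier only, never to the database.
def render_student(service, roll_no):
    name = service.student_name(roll_no)
    if name is None:
        return f"Student {roll_no}: not found"
    return f"Student {roll_no}: {name}"
```

The client code receives a StudentService object and never sees the connection, which mirrors the rule that the client does not communicate with the database server directly.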
A Data Model is the modeling of the data description, data semantics, and consistency constraints
of the data. It provides the conceptual tools for describing the design of a database at each level
of data abstraction. The following four data models are used for understanding the
structure of the database:
1) Relational Data Model: This type of model designs the data in the form of rows and
columns within a table. Thus, a relational model uses tables for representing data and the
relationships between them. Tables are also called relations. This model was initially described by
Edgar F. Codd in 1969. The relational data model is the most widely used model and is primarily
used by commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as
objects and relationships among them. These objects are known as entities, and relationship is
an association among these entities. This model was designed by Peter Chen and published in
a 1976 paper. It is widely used in database design. A set of attributes describes the entities.
For example, student_name, student_id describes the 'student' entity. A set of the same type of
entities is known as an 'Entity set', and the set of the same type of relationships is known as
'relationship set'.
4) Semistructured Data Model: This type of data model is different from the other three data
models (explained above). The semistructured data model allows individual data items of the
same type to have different sets of attributes. The
Extensible Markup Language, also known as XML, is widely used for representing
semistructured data. Although XML was initially designed for adding markup
information to text documents, it gained importance because of its application in the exchange
of data.
A Data Model gives us an idea of how the final system will look after its complete
implementation. It defines the data elements and the relationships between the data elements.
Data Models are used to show how data is stored, connected, accessed and updated in the
database management system. Here, we use a set of symbols and text to represent the
information so that members of the organisation can communicate and understand it. Although
many data models are in use nowadays, the Relational model is the most widely
used model. Apart from the Relational model, there are many other types of data models,
which we will study in detail here. Some of the Data Models in DBMS are:
1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
5. Object-Oriented Data Model
6. Object-Relational Data Model
7. Flat Data Model
8. Semi-Structured Data Model
9. Associative Data Model
10. Context Data Model
Hierarchical Model
The Hierarchical Model was the first DBMS model. This model organises the data in a
hierarchical tree structure. The hierarchy starts from the root, which holds the root data, and then
expands in the form of a tree by adding child nodes to parent nodes. This model easily
represents some real-world relationships such as food recipes or the sitemap of a website.
Example: We can represent the relationship between the shoes present on a shopping
website in the following way:
Network Model
This model is an extension of the hierarchical model. It was the most popular model before
the relational model. This model is the same as the hierarchical model; the only difference is
that a record can have more than one parent. It replaces the hierarchical tree with a
graph. Example: The node Student has two parents, i.e. CSE
Department and Library. This was not possible in the hierarchical model.
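The Student example can be sketched in Python as a mapping from each record to a list of parents. The root record College is an assumption added so that both parents have something above them; it does not come from the notes.

```python
# Each record maps to the list of its parents. Unlike a tree, the list may
# hold more than one entry. "College" is a hypothetical root record.
parents_of = {
    "College": [],
    "CSE Department": ["College"],
    "Library": ["College"],
    "Student": ["CSE Department", "Library"],
}

def paths_to_root(record):
    """Return every path from record up to a record that has no parent."""
    if not parents_of[record]:
        return [[record]]
    paths = []
    for parent in parents_of[record]:
        for tail in paths_to_root(parent):
            paths.append([record] + tail)
    return paths
```

For Student the function returns two paths, one through CSE Department and one through Library, which is the "more than one path to the same record" property of the network model.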
1. Ability to manage more relationships: As this model has more relationships,
the data is more closely related. This model can manage one-to-one relationships as well
as many-to-many relationships.
2. Many paths: As there are more relationships, there can be more than one path to
the same record. This makes data access fast and simple.
3. Circular Linked List: The operations on the network model are done with the help of
a circular linked list. The current position is maintained by a program, and this
position navigates through the records according to the relationships.
Advantages of Network Model
• The data can be accessed faster as compared to the hierarchical model. This is because
the data is more related in the network model and there can be more than one path to reach a
particular node. So the data can be accessed in many ways.
• As there is a parent-child relationship so data integrity is present. Any change in
parent record is reflected in the child record.
Disadvantages of Network Model
• As more and more relationships need to be handled, the system might get complex.
So, a user must have detailed knowledge of the model to work with it.
• Any change, such as an update, deletion or insertion, is very complex.
Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model diagram. In this
model, we represent the real-world problem in the pictorial form to make it easy for the
stakeholders to understand. It is also very easy for the developers to understand the system
by just looking at the ER diagram. We use the ER diagram as a visual tool to represent an
ER Model. ER diagram has the following three components:
Features of ER Model
Relational Model
Relational Model is the most widely used model. In this model, the data is maintained in the
form of a two-dimensional table. All the information is stored in the form of row and
columns. The basic structure of a relational model is tables. So, the tables are also
called relations in the relational model. Example: In this example, we have an Employee
table.
Features of Relational Model
• Tuples: Each row in the table is called a tuple. A row contains all the information about
one instance of the object. In the above example, each row has all the information about one
specific individual; for instance, the first row has information about John.
• Attribute or field: Attributes are the properties that define the table or relation. The
values of an attribute should come from the same domain. In the above example, we have
different attributes of the employee like Salary, Mobile_no, etc.
Advantages of Relational Model
• Simple: This model is simpler than the network and hierarchical
models.
• Scalable: This model can be easily scaled, as we can add as many rows and columns
as we want.
• Structural Independence: We can make changes in database structure without
changing the way to access the data. When we can make changes to the database structure
without affecting the capability to DBMS to access the data we can say that structural
independence has been achieved.
Disadvantages of Relational Model
• Hardware Overheads: To hide the complexities and make things easier for the
user, this model requires more powerful computers and data storage devices.
• Bad Design: The relational model is very easy to design and use, so users don't
need to know how the data is stored in order to access it. This ease of design can lead to the
development of a poorly designed database, which slows down as the database grows.
But all these disadvantages are minor as compared to the advantages of the relational model.
These problems can be avoided with the help of proper implementation and organisation.
In the above example, we have two objects, Employee and Department. All the data and
relationships of each object are contained in a single unit. The attributes of the employee, like Name and Job_title,
and the methods that will be performed by that object are stored as a single
object. The two objects are connected through a common attribute, i.e. the Department_id, and
the communication between the two is done with the help of this common id.
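The Employee and Department objects described above could look roughly like this in Python. The salary attribute and the give_raise method are assumptions added to show data and behaviour stored together; only Name, Job_title and Department_id come from the text.

```python
from dataclasses import dataclass

@dataclass
class Department:
    department_id: int
    name: str

@dataclass
class Employee:
    name: str
    job_title: str
    salary: float          # hypothetical attribute, used by the method below
    department_id: int     # common attribute linking Employee to Department

    def give_raise(self, percent):
        """A method stored together with the data it operates on."""
        self.salary = round(self.salary * (1 + percent / 100), 2)

def department_of(employee, departments):
    """Find the employee's department through the shared department_id."""
    for dept in departments:
        if dept.department_id == employee.department_id:
            return dept
    return None
```

Here the attributes and the method travel together as one object, and department_of follows the common id from one object to the other.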
Object-Relational Model
As the name suggests, it is a combination of both the relational model and the object-oriented
model. This model was built to fill the gap between the object-oriented model and the relational
model. It offers advanced features; for example, we can build complex data types according
to our requirements from the existing data types. The problem with this model is that it
can get complex and difficult to handle. So, a proper understanding of this model is required.
Semi-Structured Model
The semi-structured model is an evolved form of the relational model. We cannot differentiate
between data and schema in this model. Example: Web-based data sources, where we can't
separate the schema from the data of the website. In this model, some entities may
have missing attributes while others may have extra attributes. This model gives flexibility
in storing the data. It also gives flexibility to the attributes. Example: The value stored in
an attribute can be either an atomic value or a collection of values.
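A short Python sketch of these two kinds of flexibility, using JSON as the semi-structured format, is given below. The records and field names are invented for the example.

```python
import json

# Three records of the same kind with different attribute sets.
# All names and values are hypothetical.
records = json.loads("""
[
  {"name": "Asha",  "email": "asha@example.com"},
  {"name": "Ravi",  "phone": ["555-0101", "555-0102"]},
  {"name": "Meera", "email": "meera@example.com", "nickname": "Mimi"}
]
""")

def attribute_names(record):
    """Each record carries its own schema: the set of keys it happens to have."""
    return sorted(record.keys())

def as_list(value):
    """Treat an attribute uniformly whether it is atomic or a collection."""
    return value if isinstance(value, list) else [value]
```

The second record has no email, the third has an extra nickname, and phone holds a collection rather than a single value, yet all three are valid members of the same data set.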
• Items: Items contain the name and the identifier (some numeric value).
• Links: Links contain the identifier, source, verb and target.
Example : Let us say we have a statement "The world cup is being hosted by London from
30 May 2020". In this data two links need to be stored:
1. The world cup is being hosted by London. The source here is 'the world cup', the verb
'is being' and the target is 'London'.
2. ...from 30 May 2020. The source here is the previous link, the verb is 'from' and the
target is '30 May 2020'.
This is represented using the table as follows:
Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database
System.
Components of ER Diagram
Entity
An Entity may be an object with a physical existence – a particular person, car, house, or
employee – or it may be an object with a conceptual existence – a company, a job, or a
university course.
Entity Set: An Entity is an object of Entity Type and a set of all entities is called an entity
set. For Example, E1 is an entity having Entity Type Student and the set of all students is
called Entity Set. In ER diagram, Entity Type is represented as:
Entity Set
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute. Strong Entity does not depend on
other Entity in the Schema. It has a primary key, that helps in identifying it uniquely, and it
is represented by a rectangle. These are called Strong Entity Types.
2. Weak Entity
An Entity type usually has a key attribute that uniquely identifies each entity in the entity set. But
some entity types exist for which key attributes can’t be defined. These are called Weak
Entity types.
For example, a company may store the information of dependents (Parents, Children,
Spouse) of an Employee. But the dependents cannot exist without the employee. So
Dependent will be a Weak Entity Type and Employee will be the Identifying Entity type for
Dependent, which means it is a Strong Entity Type.
A weak entity type is represented by a Double Rectangle. The participation of weak entity
types is always total. The relationship between the weak entity type and its identifying strong
entity type is called identifying relationship and it is represented by a double diamond.
Attributes
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB,
Age, Address, and Mobile_No are the attributes that define entity type Student. In ER
diagram, the attribute is represented by an oval.
Attribute
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key
attribute. For example, Roll_No will be unique for each student. In ER diagram, the key
attribute is represented by an oval with the attribute name underlined.
Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For
example, the Address attribute of the student Entity type consists of Street, City, State, and
Country. In ER diagram, the composite attribute is represented by an oval connected to
the ovals of its component attributes.
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example, Phone_No
(can be more than one for a given student). In ER diagram, a multivalued attribute is
represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. e.g.; Age (can be derived from DOB). In ER diagram, the derived attribute is
represented by a dashed oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Entity and Attributes
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.
Relationship Set
Degree of a Relationship Set
The number of different entity sets participating in a relationship set is called the degree of a
relationship set.
1. Unary Relationship: When there is only ONE entity set participating in a relationship, the
relationship is called a unary relationship. For example, one person is married to only one
person.
Unary Relationship
2. Binary Relationship: When there are TWO entity sets participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship
3. n-ary Relationship: When there are n entity sets participating in a relationship, the
relationship is called an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is known
as cardinality. Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in the
relationship, the cardinality is one-to-one. Let us assume that a male can marry one female
and a female can marry one male. So the relationship will be one-to-one.
The total number of tables that can be used in this is 2.
2. One-to-Many: When each entity in one entity set can be related to more than one entity in
the other entity set, the cardinality is one-to-many. The total number of tables that can be used in this is 2. Let us assume
that one surgeon department can accommodate many doctors. So the cardinality will be 1 to M.
It means one department has many doctors.
3. Many-to-One: When entities in one entity set can take part only once in the relationship
and entities in the other entity set can take part more than once, the cardinality is many-to-one.
Like one-to-many, this mapping can be implemented with 2 tables.
In this case, each student is taking only 1 course but 1 course has been taken by many
students.
4. Many-to-Many: When entities in all entity sets can take part more than once in the
relationship, the cardinality is many-to-many. Let us assume that a student can take more than one
course and one course can be taken by many students. So the relationship will be many-to-
many.
The total number of tables that can be used in this is 3.
many to many cardinality
In this example, student S1 is enrolled in C1 and C3, and course C3 is taken by S1, S3,
and S4. So it is a many-to-many relationship.
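The three tables needed for a many-to-many relationship can be sketched with Python's sqlite3 module as follows. The table and column names are assumptions; the enrolment rows are the ones from the example above (S1 in C1 and C3, and C3 taken by S1, S3 and S4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY);
-- Third table: one row per (student, course) pair.
CREATE TABLE enrolled (
    student_id TEXT NOT NULL REFERENCES student(student_id),
    course_id  TEXT NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student  VALUES ('S1'), ('S3'), ('S4');
INSERT INTO course   VALUES ('C1'), ('C3');
INSERT INTO enrolled VALUES ('S1', 'C1'), ('S1', 'C3'), ('S3', 'C3'), ('S4', 'C3');
""")

def courses_of(student_id):
    """All courses one student is enrolled in."""
    rows = conn.execute(
        "SELECT course_id FROM enrolled WHERE student_id = ? ORDER BY course_id",
        (student_id,))
    return [row[0] for row in rows]

def students_in(course_id):
    """All students enrolled in one course."""
    rows = conn.execute(
        "SELECT student_id FROM enrolled WHERE course_id = ? ORDER BY student_id",
        (course_id,))
    return [row[0] for row in rows]
```

The enrolled table is what allows both directions to have "many": neither the student table nor the course table could hold the pairs on its own without repeating rows.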
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of students will be total. Total
participation is shown by a double line in the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in the
relationship. If some courses have no students enrolled in them, the participation of the
course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
Total Participation and Partial Participation
Every student in the Student Entity set participates in a relationship but there exists a course
C4 that is not taking part in the relationship.
How to Draw ER Diagram?
• The very first step is to identify all the entities, place them in rectangles, and
label them accordingly.
• The next step is to identify the relationships between them and place them
accordingly using diamonds, making sure that relationships are not connected to each
other.
• Attach attributes to the entities properly.
• Remove redundant entities and relationships.
• Add proper colors to highlight the data present in the database.
Employee ID   First Name   Last Name
11            Andrew       Johnson
22            Tom          Wood
33            Alex         Hale
In the above-given example, employee ID is a primary key because it uniquely identifies an
employee record. In this table, no other employee can have the same employee ID.
• Keys help you to identify any row of data in a table. In a real-world application, a
table could contain thousands of records. Moreover, the records could be duplicated. Keys in
RDBMS ensure that you can uniquely identify a table record despite these challenges.
• Keys allow you to establish and identify relationships between
tables.
• Keys help you to enforce identity and integrity in the relationship.
1. Super Key
2. Primary Key
3. Candidate Key
4. Alternate Key
5. Foreign Key
6. Compound Key
7. Composite Key
8. Surrogate Key
• Super Key – A super key is a set of one or more attributes that uniquely identifies rows
in a table.
• Primary Key – is a column or group of columns in a table that uniquely identifies
every row in that table.
• Candidate Key – is a set of attributes that uniquely identifies tuples in a table.
A candidate key is a super key with no redundant attributes.
• Alternate Key – is a candidate key that was not chosen as the primary key. Like the
primary key, it uniquely identifies every row in that table.
• Foreign Key – is a column that creates a relationship between two tables by referring to
the primary key of another table. The
purpose of foreign keys is to maintain data integrity and allow navigation between two
different instances of an entity.
• Compound Key – has two or more attributes that allow you to uniquely recognize a
specific record. It is possible that each column may not be unique by itself within the
database.
• Composite Key – is a combination of two or more columns that uniquely identifies
rows in a table. The combination of columns guarantees uniqueness, though individual
uniqueness is not guaranteed.
• Surrogate Key – An artificial key which aims to uniquely identify each record is
called a surrogate key. This kind of key is typically created when you don’t
have any natural primary key.
Example:
In this table, StudID, Roll No, and Email are all qualified to become the primary key. But since StudID
is the primary key, Roll No and Email become alternate keys.
Candidate key Example: In the given table Stud ID, Roll No, and email are candidate keys
which help us to uniquely identify the student record in the table.
DeptCode DeptName
001 Science
002 English
005 Computer
By adding DeptCode as a foreign key column in the Teacher table, we can create a
relationship between the two tables.
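A sketch of this relationship using Python's sqlite3 module is shown below. The department rows are the ones listed above; the teacher table, its columns, and the teacher names are hypothetical, since the notes do not reproduce that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE teacher (
    teacher_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    dept_code  TEXT REFERENCES department(dept_code)  -- foreign key
);
INSERT INTO department VALUES ('001', 'Science'), ('002', 'English'), ('005', 'Computer');
-- Hypothetical teacher rows, for illustration only.
INSERT INTO teacher VALUES (1, 'Teacher A', '002'), (2, 'Teacher B', '005');
""")

def department_of(teacher_name):
    """Follow the foreign key from a teacher to the department name."""
    row = conn.execute(
        "SELECT d.dept_name FROM teacher AS t "
        "JOIN department AS d ON d.dept_code = t.dept_code "
        "WHERE t.name = ?",
        (teacher_name,)).fetchone()
    return row[0] if row else None
```

The JOIN condition on dept_code is the navigation between the two tables that the foreign key makes possible.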
This article will explain different integrity constraints and the importance of their role
in database design. We'll also look at some common types of integrity constraints in
DBMS. Understanding these concepts is essential for creating robust, efficient databases.
We will also provide examples of how these constraints can be used to protect your data.
So, if you’re curious to learn more about integrity constraints in DBMS, read on!
Integrity constraints are rules that help to maintain the accuracy and consistency of data in
a database. They can be used to enforce business rules or to ensure that data is entered
correctly. For example, a simple integrity constraint in DBMS might state that all customers
must have a valid email address. This would prevent someone from accidentally entering an
invalid email address into the database. Integrity constraints can also be used to enforce
relationships between tables.
For example, if a customer can only have one shipping address, then an integrity constraint
can be used to ensure that only one shipping address is entered for each customer. Enforcing
integrity constraints in SQL can help prevent data inconsistencies and errors, making it
easier to manage and query the data.
Integrity constraints are an important part of maintaining database correctness. They ensure
that the data in the database adheres to a set of rules, which can help prevent errors and
inconsistencies. In some cases, integrity constraints can be used to enforce business rules,
such as ensuring that a customer's balance remains within a certain limit.
In other cases, they can be used to enforce data integrity, such as ensuring that all values in
a column are unique. Integrity constraints in SQL can be either enforced by the database
system or by application code. Enforcing them at the database level can help ensure that the
rules are always followed, even if the application code is changed. However, enforcing them
at the application level can give the developer more flexibility in how the rules are enforced.
1. Domain Constraint
A domain constraint is a restriction on the values that can be stored in a column. For
example, if you have a column for "age," domain integrity constraints in DBMS would
ensure that only values between 1 and 120 can be entered into that column. This ensures that
only valid data is entered into the database.
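The "age" rule above can be sketched with a CHECK clause. This is only an illustration using Python's built-in sqlite3 module; the table name person, the sample names, and the helper try_insert_age are assumptions made for the example.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " name TEXT,"
    " age INTEGER CHECK (age BETWEEN 1 AND 120))"  # domain restriction on age
)

def try_insert_age(age):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO person VALUES ('Test', ?)", (age,))
        return True
    except sqlite3.IntegrityError:
        return False

accepted_valid = try_insert_age(45)     # inside the domain 1..120
accepted_invalid = try_insert_age(150)  # outside the domain
```

The database itself refuses the out-of-range value, so the rule holds no matter which application performs the insert.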
2. Entity Integrity Constraint
An entity integrity constraint is a restriction on null values in the primary key. Null values
are values that are unknown or not applicable, and a row whose primary key is null cannot be
uniquely identified. Entity integrity constraints ensure that no primary key column contains
a null value. For example, if "student ID" is the primary key of a student table, an entity
integrity constraint in DBMS would ensure that this column cannot contain any null values.
3. Referential Integrity Constraint
A referential integrity constraint is a restriction on how foreign keys can be used. A foreign
key is a column in one table that references a primary key in another table. For example,
let's say you have a table of employees and a table of department managers. The "manager
ID" column in the employees table would be a foreign key that references the "manager ID"
column (the primary key) in the managers table.
Referential integrity constraints in DBMS would ensure that every manager ID stored in the
employees table has a corresponding manager ID in the managers table. In other
words, it would prevent you from assigning an employee to a manager who doesn't exist.
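A minimal sketch of the employee and manager example, again using sqlite3. The table and column names and the sample rows are assumptions; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE manager (manager_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " manager_id INTEGER REFERENCES manager(manager_id))"  # foreign key
)

conn.execute("INSERT INTO manager VALUES (1, 'Meera')")
conn.execute("INSERT INTO employee VALUES (10, 'Ravi', 1)")  # manager 1 exists

try:
    # Manager 99 does not exist, so this row would be an orphan.
    conn.execute("INSERT INTO employee VALUES (11, 'Sana', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```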
4. Key Constraint
Key constraints in DBMS are a restriction on duplicate values. A key is composed of one or
more columns whose values uniquely identify each row in the table. For example, let's say
you have a table of products with columns for "product ID" and "product name." The
combination of these two values would be the key for each product, and a key constraint
would ensure that no two products have the same combination of product ID and product
name.
Within databases, a key constraint is a rule that defines how data in a column(s) can be
stored in a table. There are several different types of key constraints in DBMS, each with its
own specific purpose. Now, we'll take a high-level look at the five most common types of
key constraints: primary key constraints, unique key constraints, foreign key constraints,
NOT NULL constraints, and check constraints.
A primary key constraint (also known as a "primary key") is a type of key constraint that
requires every value in a given column to be unique. In other words, no two rows in a table
can have the same value for their primary key column(s). A primary key can either be a
single column or multiple columns (known as a "composite" primary key). The null value
is not allowed in the primary key column(s).
A unique key constraint is a column or set of columns that ensures that the values stored in
the column are unique. A table can have more than one unique key constraint, unlike the
primary key. A unique key column can contain NULL values. Like primary keys, unique
keys can be made up of a single column or multiple columns.
A foreign key constraint defines a relationship between two tables. A foreign key in one
table references a primary key in another table. Foreign keys prevent invalid data from being
inserted into the foreign key column. Foreign keys can reference a single column or multiple
columns.
A NOT NULL constraint is used to ensure that no row can be inserted into the table without
a value being specified for the column(s) with this type of constraint. Thus, every row must
have a non-NULL value for these columns.
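The primary key, unique key, and NOT NULL constraints described above can be tried out together. The sketch below assumes a hypothetical product table with an added sku column; the helper rejected simply reports whether SQLite refused a statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " product_id INTEGER PRIMARY KEY,"   # primary key: unique, identifies the row
    " sku TEXT UNIQUE,"                  # unique key: no duplicate values
    " product_name TEXT NOT NULL)"       # NOT NULL: a value is always required
)
conn.execute("INSERT INTO product VALUES (1, 'SKU-1', 'Pen')")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary_key = rejected("INSERT INTO product VALUES (1, 'SKU-2', 'Ink')")
dup_unique = rejected("INSERT INTO product VALUES (2, 'SKU-1', 'Ink')")
missing_name = rejected("INSERT INTO product VALUES (3, 'SKU-3', NULL)")
# A NULL in the unique column is accepted in SQLite.
null_unique_ok = not rejected("INSERT INTO product VALUES (4, NULL, 'Cap')")
```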
5. Check Constraints
A check constraint enforces data integrity by allowing you to specify conditions that must
be met for data to be inserted into a column. For example, you could use a check constraint
to ensure that only positive integer values are inserted into a particular column. Check
constraints are usually used in combination with other constraints (such as NOT NULL
constraints) to enforce more complex rules.
There are several different types of key constraints in DBMS that you can use in SQL
databases. Each type of constraint has its own specific use cases and benefits. By
understanding when to use each type of constraint, you can ensure that your database is both
reliable and consistent.
Advantages of Integrity Constraints
Integrity constraints in DBMS can be used to enforce rules at the database level, which
means that they are applied to all users and applications that access the database. There are
several advantages to using integrity constraints in SQL, which will be outlined in more
detail below.
1. Declarative Ease
One of the advantages of integrity constraints is that they can be declared easily. Integrity
constraints are written in a declarative language, which means that they can be specified
without having to write code. This makes it easy for even non-technical users to understand
and specify rules.
2. Centralized Rules
Another advantage of integrity constraints is that they provide a centralized way to specify
rules. Therefore, rules only have to be specified once and then they can be enforced across
the entire database. This is much more efficient than having to specify rules individually for
each application or user.
3. Flexibility in Loading Data
Integrity constraints also provide flexibility when loading data into the database. When data
is loaded into the database, the integrity constraints are checked automatically. In other
words, if there are any problems with the data, they can be detected and corrected
immediately.
4. Application Development Productivity
Using integrity constraints can also help to maximize application development productivity.
This is because developers do not have to write code to enforce rules; they can simply
specify the rules using an integrity constraint language. This saves time and effort during
development and makes it easier to create consistent and reliable applications.
5. Immediate User Feedback
Finally, using integrity constraints in DBMS provides immediate feedback to users when
they attempt to violate a rule. For example, if a user tries to insert an invalid value into a
database column, the database will reject the attempted insertion and return an error message
to the user instead. This provides a clear indication to the user that their input is incorrect
and needs to be corrected.
Integrity constraints are important for several reasons. First, they help to ensure the accuracy
of data by preventing invalid data from being entered into the database. Second, they help
to maintain the consistency of data across different tables and fields. Third, because the
DBMS enforces them for every user and application, they cannot be bypassed by faulty
application code. Note that they do not decide who may access data; that is the job of
authorization mechanisms.
Finally, they can help performance, because the indexes that usually back primary key and
unique constraints speed up lookups. By enforcing integrity constraints, databases can
maintain a high level of accuracy and consistency.
So far, we have studied relations and their representation. A relational database system
maintains all the information about a relation or table, from its schema to the constraints
applied to it. All of this metadata is stored. In general, metadata is data about data. The
structure that stores the relational schemas and other metadata about the relations is
known as the Data Dictionary or System Catalog.
A data dictionary is like the A-Z dictionary of the relational database system holding all
information of each relation in the database.
With this, the system also keeps the following data based on users of the system:
o Name of authorized users
o Accounting and authorization information about users.
o The authentication information for users, such as passwords or other related
information.
In addition to this, the system may also store some statistical and descriptive data about
the relations, such as:
A system may also store the storage organization, whether sequential, hash, or heap. It
also notes the location where each relation is stored:
o If relations are stored in the files of the operating system, the data dictionary notes and
stores the names of the files.
o If the database stores all the relations in a single file, the data dictionary notes and stores
the blocks containing records of each relation in a data structure similar to a linked list.
At last, it also stores the information regarding each index of all the relations:
All the above information or metadata is stored in a data dictionary. The data dictionary is
also updated whenever changes occur in the relations. Such metadata constitutes
a miniature database. Some systems store the metadata in the form of a relation in the database
itself. The system designers decide how the data dictionary is represented. Also, a
data dictionary often stores its data in a non-normalized form, so that the data in the
dictionary can be accessed quickly.
For example, in the data dictionary, it uses underline below the value to represent that the
following field contains a primary key.
So, whenever the database system needs to fetch records from a relation, it first looks up
the location and storage organization of the relation in the data dictionary. After
confirming the details, it retrieves the required record from the database.
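As one concrete illustration of a system catalog, SQLite keeps its metadata in a table called sqlite_master and exposes column details through PRAGMA table_info. The sketch below uses these two real SQLite features; the student table and the index name are assumptions made for the example, and other DBMS products expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(student)").fetchall()
pk_columns = [row[1] for row in columns if row[5] > 0]
required_columns = [row[1] for row in columns if row[3] == 1]
```

Here the metadata is queried with the same SQL interface that is used for ordinary data, which is what "storing the metadata as a relation in the database itself" means in practice.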
Attribute Name   Data Type   Max Field Size   Description   isRequired
1. Data models in DBMS provide very little information about the database, so a data
dictionary is very essential to have proper knowledge about entities, relationships, and
attributes that are present in a data model.
2. The Data Dictionary provides consistency by reducing data redundancy in the
collection and use of data across various members of a team.
3. The Data Dictionary provides structured analysis and design tools by enforcing the use
of data standards. Data standards are the set of rules that govern the way data is collected,
recorded, and represented.
4. Using a Data Dictionary helps to define naming conventions that are used in a model.
There are mainly two types of data dictionary in a database management system:
Every relational database has an Integrated Data Dictionary contained within the DBMS.
This integrated data dictionary acts as a system catalog that is accessed and updated by the
relational database. Older databases did not include an integrated data dictionary, so in
that case, the database administrator had to use a Stand-Alone Data Dictionary. In DBMS, an
Integrated Data Dictionary can bind metadata to data.
The Integrated Data Dictionary can be further classified into two types:
• Active: An active data dictionary is updated automatically by the DBMS whenever any
changes are made to the database. This is also known as a self-updating dictionary as it keeps
the information up-to-date.
In DBMS, this type of data dictionary is very flexible as it allows the Database Administrator
to define and manage all the confidential data. It doesn't matter whether the data is
computerized or not. A stand-alone data dictionary allows database designers to interact with
end-users regardless of the data dictionary format.
There is no standard format for a data dictionary. Below given are some of the common
elements:
1. Data Elements: The Data Dictionary stores the definition of all the data elements such
as name, datatype, storage formats, and validation rules.
2. Tables: All information regarding the table, such as the user who created the table, the
number of rows and columns, the date on which the table was created and accessed, etc.
3. Index: Indexes for defined database tables are stored in the data dictionary. For each
index, the DBMS stores the index name, the attributes it uses, its location and
characteristics, and its date of creation.
4. Programs: Programs defined to access the database, including reports, application and
screen formats, SQL queries, etc., are also stored in the data dictionary.
5. Relationship between data elements: The Data Dictionary stores the type of
relationship; for example, if it is compulsory or optional, the cardinality of the relationship and
connectivity, etc.
6. Administrations and End-Users: The Data Dictionary stores all the information of
the administration along with the end-users.
The metadata in DBMS, which is stored in the Data Dictionary, is similar to a monitor that
monitors the use of the database and the allocation of permission to access the database by the
users.
How to Create a Data Dictionary?
While creating a stand-alone data dictionary, the database administrator can take the help of a
template in SQL Server, Oracle, or even Microsoft Excel.
Composition = is composed of
Sequence + AND
Selection [|] OR
A large database defined as a single relation may result in data duplication. This repetition of
data may result in:
So, to handle these problems, we should analyze and decompose the relations with redundant
data into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It
is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion
Anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
The main reason for normalizing the relations is removing these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that helps to guide you in
creating a good database structure.
o Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted
into a relation because some of its data is missing.
o Deletion Anomaly: A deletion anomaly occurs when the deletion of
data results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.
Normalization works through a series of stages called normal forms. The normal forms apply
to individual relations. A relation is said to be in a particular normal form if it satisfies
the constraints of that form.
Normal Form   Description
2NF   A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
      functionally dependent on the primary key.
3NF   A relation will be in 3NF if it is in 2NF and no transitive dependency exists.
4NF   A relation will be in 4NF if it is in Boyce-Codd normal form and has no
      multi-valued dependency.
5NF   A relation is in 5NF if it is in 4NF, does not contain any join dependency, and
      joining is lossless.
Advantages of Normalization
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
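The 1NF conversion shown above can be sketched in Python: each value of the multi-valued EMP_PHONE attribute becomes its own row. The function name to_1nf and the dictionary representation of a row are assumptions made for illustration.

```python
# One row of the un-normalized EMPLOYEE table; EMP_PHONE holds two values.
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
]

def to_1nf(rows, multi_valued_column):
    """Emit one row per value of the multi-valued column (atomic values only)."""
    flat = []
    for row in rows:
        for value in row[multi_valued_column].split(","):
            new_row = dict(row)                      # copy the other attributes
            new_row[multi_valued_column] = value.strip()
            flat.append(new_row)
    return flat

employee_1nf = to_1nf(employee, "EMP_PHONE")
```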
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In
a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
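The 2NF decomposition above is a pair of projections. A small sketch, using the rows of the TEACHER table and plain Python tuples, shows both projections and checks that joining them on TEACHER_ID gives back the original rows.

```python
# Rows of the TEACHER table: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# TEACHER_AGE depends only on TEACHER_ID, so it moves to its own table.
teacher_detail = sorted({(tid, age) for tid, _subject, age in teacher})
teacher_subject = [(tid, subject) for tid, subject, _age in teacher]

# Joining the two tables on TEACHER_ID reproduces the original relation.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for tid2, age in teacher_detail
    if tid == tid2
}
```

Each teacher's age is now stored once, so changing it requires updating a single row.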
A relation is in third normal form if it holds at least one of the following conditions for every
non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
Now, this is in BCNF because the left side of both functional dependencies is a key.
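Whether the left side of a functional dependency is a key can be checked by computing an attribute closure. The sketch below applies the standard closure algorithm to the two functional dependencies listed above; the function names closure and is_superkey are assumptions made for illustration.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attributes, relation_attributes, fds):
    """A superkey determines every attribute of the relation."""
    return closure(attributes, fds) >= set(relation_attributes)

ALL_ATTRS = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
FDS = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"})),
    (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"})),
]

# In the original table neither left side is a key on its own ...
emp_id_is_key = is_superkey({"EMP_ID"}, ALL_ATTRS, FDS)
emp_dept_is_key = is_superkey({"EMP_DEPT"}, ALL_ATTRS, FDS)
# ... but the pair {EMP_ID, EMP_DEPT} determines every attribute.
pair_is_key = is_superkey({"EMP_ID", "EMP_DEPT"}, ALL_ATTRS, FDS)
# After decomposition, EMP_ID is a key of EMP_COUNTRY(EMP_ID, EMP_COUNTRY).
emp_id_key_in_part = is_superkey({"EMP_ID"}, {"EMP_ID", "EMP_COUNTRY"}, FDS)
```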
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Example
SUBJECT LECTURER SEMESTER
In the above table, John takes both the Computer and Math classes for Semester 1 but he doesn't
take the Math class for Semester 2. In this case, the combination of all these fields is required
to identify valid data.
Suppose we add a new semester, Semester 3, but do not know the subject or who will
be taking that subject, so we leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
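To see why three relations are needed, the sketch below joins P1, P2 and P3 as Python sets of tuples. Joining only P1 and P2 produces extra combinations, and P3 filters them out. The tuple layout (SUBJECT, LECTURER, SEMESTER) for the joined result is an assumption made for the example.

```python
# P1(SEMESTER, SUBJECT), P2(SUBJECT, LECTURER), P3(SEMESTER, LECTURER)
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}

# Join of P1 and P2 on SUBJECT only: contains combinations that never occurred.
p1_join_p2 = {
    (subject, lecturer, semester)
    for semester, subject in p1
    for subject2, lecturer in p2
    if subject == subject2
}

# Adding P3 keeps only the (SEMESTER, LECTURER) pairs that really exist.
joined = {
    (subject, lecturer, semester)
    for subject, lecturer, semester in p1_join_p2
    if (semester, lecturer) in p3
}
```

The two-way join wrongly suggests that John teaches Math in Semester 2; the three-way join removes that tuple, which matches the description of the table above.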
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the same
relation as it was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations
gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:
Employee ⋈ Department
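The join on EMP_ID can be sketched in Python with the rows listed above. The EMPLOYEE rows are read here as (EMP_ID, name, age, city), which is an assumption about the column meaning; the DEPARTMENT columns follow the header given above. The dictionary lookup relies on each EMP_ID appearing at most once in DEPARTMENT, which holds for this data.

```python
# EMPLOYEE rows, assumed to mean (EMP_ID, name, age, city)
employee = [
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]
# DEPARTMENT rows: (DEPT_ID, EMP_ID, DEPT_NAME)
department = [
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
]

# Index DEPARTMENT by the common column EMP_ID, then join.
dept_by_emp = {emp_id: (dept_id, dept_name) for dept_id, emp_id, dept_name in department}
joined = [
    (emp_id, name, age, city) + dept_by_emp[emp_id]
    for emp_id, name, age, city in employee
    if emp_id in dept_by_emp
]

# Projecting the join back onto each schema recovers both relations: no loss.
employee_again = sorted({row[:4] for row in joined})
department_again = sorted({(row[4], row[0], row[5]) for row in joined})
```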
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one
decomposed table.
o If a relation R is decomposed into relations R1 and R2, then the dependencies of R must
either be a part of R1 or R2 or be derivable from the combination of functional
dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional dependency set
(A->BC). The relation R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A->BC is a part of relation R1(ABC).
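Dependency preservation can be tested mechanically. The sketch below uses the standard textbook test: starting from the left side of a dependency, it repeatedly collects the attributes that can be derived inside each decomposed relation and then checks whether the right side has been reached. The function names closure and is_preserved are assumptions made for illustration.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, relations, fds):
    """Check whether fd can be enforced using only the decomposed relations."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            # Attributes derivable from what is known so far, restricted to rel.
            gained = closure(result & rel, fds) & rel
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

# The example above: R(A, B, C, D) with A -> BC, split into R1(ABC) and R2(AD).
fds = [({"A"}, {"B", "C"})]
preserved = is_preserved(fds[0], [{"A", "B", "C"}, {"A", "D"}], fds)

# A counter-example: R(A, B, C) with A -> B and B -> C, split into (AB) and (AC).
fds2 = [({"A"}, {"B"}), ({"B"}, {"C"})]
lost = not is_preserved(fds2[1], [{"A", "B"}, {"A", "C"}], fds2)
```

In the counter-example, B -> C cannot be checked without joining the two tables, because B and C never appear together in one decomposed relation.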
Example: Suppose there is a bike manufacturer company which produces two colors(white
and black) of each model every year.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL.
The representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
Let R1(A, B, C) and R2(C, D) be a decomposition of a given relation R(A, B, C, D). If the
join of R1 and R2 over C is equal to relation R, then we can say that a join dependency
(JD) exists.
A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join
decomposition of R.
In this notation, *((A, B, C), (C, D)) is a JD of R if the join of the two projections on their
common attribute C is equal to the relation R.
Here, *(R1, R2, R3) is used to indicate that the relations R1, R2, R3 and so on form a JD of R.
A statement that the values in some columns of a relation must be contained in the values of
other columns is known as an Inclusion Dependency. Inclusion dependencies, like functional
dependencies, represent one-to-many relationships. However, inclusion dependencies are more
commonly used to represent relationships between relations. A foreign key is an example of
an inclusion dependency: the values in the referencing column are contained in the primary
key column of the referenced relation.
Let's say we take two relations, namely R and S that are created by using two entity sets in a
way that every entity in R is also S entity. Inclusion dependence occurs when projecting R's
key attributes gives a relation that is contained in the relation acquired by projecting S's key
attributes.
Let's name the relations R as teacher and S as student, so take the attribute as teacher_id, so
we can write:
student:
1 Rahul Singh 1 18
teacher_id is the primary key of the teacher table and a foreign key in the student table, so
every teacher_id value that appears in the student table must also appear in the teacher table.
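An inclusion dependency of this kind is a subset test on column values. The sketch below uses made-up rows: apart from the student name Rahul Singh, the data, the column names, and the helper inclusion_holds are assumptions for illustration.

```python
# Hypothetical data: teacher primary keys and student rows with a teacher_id.
teacher_ids = {1, 2, 3}
student_rows = [
    {"stu_id": 1, "name": "Rahul Singh", "teacher_id": 1},
    {"stu_id": 2, "name": "Example Student", "teacher_id": 3},
]

def inclusion_holds(referencing_rows, column, referenced_values):
    """True if every value of `column` appears among the referenced values."""
    return {row[column] for row in referencing_rows} <= set(referenced_values)

holds = inclusion_holds(student_rows, "teacher_id", teacher_ids)

# A student pointing at a teacher that does not exist breaks the dependency.
bad_rows = student_rows + [{"stu_id": 3, "name": "Another Student", "teacher_id": 9}]
violated = not inclusion_holds(bad_rows, "teacher_id", teacher_ids)
```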
Inference axioms for inclusion dependencies are described in the following list:
• Reflexive rule: a set of attributes always projects onto itself:
if X ⊇ X then X -> X.
• Projection and Permutation rule: if AB -> CD then A -> C and B -> D.
• Transitivity rule: if a table A projects to B and B projects to C, then we
can conclude A -> C.
Foundation Rule
These rules can be applied to any database system that manages stored data using only its
relational capabilities. This is the foundation rule, which acts as a base for all the other rules.
Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a combination
of table-name, primary-key (row value), and attribute-name (column value). No other means,
such as pointers, can be used to access data.
Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a
very important rule because a NULL can be interpreted as one of the following: data is missing,
data is not known, or data is not applicable.
Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known
as the data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog that they use to access the database itself.
Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be
used directly or by means of some application. If the database allows access to data without
any help of this language, then it is considered a violation.
View Updating Rule
All the views of a database that can theoretically be updated must also be updatable by the
system.
High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, update, and deletion. This must not be limited
to a single row; that is, it must also support union, intersection, and minus operations to yield
sets of data records.
Physical Data Independence
The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data
is being accessed by external applications.
Logical Data Independence
The logical data in a database must be independent of its user's view (application). Any change
in logical data must not affect the applications using it. For example, if two tables are merged
or one is split into two different tables, there should be no impact or change on the user
application. This is one of the most difficult rules to apply.
Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can
be independently modified without the need for any change in the application. This rule makes
a database independent of the front-end application and its interface.
Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users
should always get the impression that the data is located at one site only. This rule has been
regarded as the foundation of distributed database systems.
Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must
not be able to subvert the system and bypass security and integrity constraints.