Components of DBMS
Component Description
DBMS NOTES 1
Duplication of data
Inconsistency
Waste of space
Applications of DBMS
Banking: maintaining customer information, accounts, loans, and banking
transactions
Advantages of DBMS
Reduction in Data Redundancy: eliminates duplication of data, reducing
waste of space
Better Interaction with Users: provides better service to users through
efficient data retrieval and modification
Efficient System:
It is very common to change the content of stored data. These changes
can be made more easily in a database management system than in a
conventional system, because they need not have any impact on
application programs. The cost of developing and maintaining systems is
also lower.
Disadvantages of DBMS
Other Disadvantages
Atomicity and integrity problems are found.
There is no concurrent access and recovery.
Security Problems
"Every user of the database system should not be allowed to access all
data. Since application programs are added to the file-oriented systems
in an ad-hoc manner, it was difficult to enforce such security systems."
Who is a DBA?
A Database Administrator (DBA)
Role of DBA
Defining conceptual schema: A DBA creates the conceptual schema
corresponding to the abstract level database design made by the data
administrator.
Insulation between Programs, data, and data abstraction: A database
system provides insulation between programs, data, and data
abstraction, which means that changes to one do not affect the others.
Database Models
A database model is a collection of conceptual
tools for describing data, data relationships, data semantics, and
consistency constraints.
Hierarchical: Organises data in a tree structure, with each child node having only one parent node
Relational: Organises data into tables with rows and columns, with each row representing a single record
Object-Oriented: Uses objects to represent data and relationships between them
Network: Allows for many-to-many relationships between data entities
Relations (Tables)
A relation (table) consists of rows (records) and columns (fields).
Table Structure:
Column Description
Operations
Three basic operations are used to develop useful sets of data:
Projection: Extracts fields from a table, allowing the user to create new
tables that contain only the required information.
These operations are all part of Relational Algebra.
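As a rough illustration of projection, here is a minimal sketch that assumes a relation is held as a list of Python dictionaries. The `employees` data and the `project` helper are hypothetical and are not taken from these notes.

```python
def project(relation, fields):
    """Return a new relation containing only the named fields, without duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        projected = tuple(row[f] for f in fields)
        if projected not in seen:  # a relation is a set, so drop duplicate rows
            seen.add(projected)
            result.append(dict(zip(fields, projected)))
    return result

# Hypothetical sample relation.
employees = [
    {"id": 1, "name": "Asha", "dept": "Sales"},
    {"id": 2, "name": "Ravi", "dept": "Sales"},
    {"id": 3, "name": "Mina", "dept": "IT"},
]

depts = project(employees, ["dept"])
```

Projecting on `dept` keeps one row per distinct department, which is why the result is smaller than the input.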
Class Description
Instance:
Types of Schema
There are three types of schema:
Conceptual Schema: Describes the stored data in terms of the data model
of the DBMS.
Data Independence
Definition:
Relationships
A relationship is a meaningful association between one or more entity types.
HAS-A relationship: a relationship between two entities where one has or
owns the other
Relationship Type
A set of meaningful associations between one or more participating entity
types.
Relationships with the same attributes fall into one relationship set.
Degree of a Relationship
Defined as the number of entities associated with the relationship.
Degree Description
Connectivity or Cardinality
Describes the mapping of associated entity instances in a relationship.
Attributes
Define the properties of a data object in an entity.
Single-Valued Attribute: an attribute that holds a single value for a single
entity
Entity Set
Weak Entity Set: an entity set that does not possess sufficient attributes to
form a primary key
Mapping Constraints 🗂️
One-to-One: an entity in A is associated with at most one entity in B, and
an entity in B is associated with at most one entity in A
Generalization
set that contains entities possessing these common
features.
Specialization
Aggregation
Specialization
is a top-down process of defining super-classes and their related
sub-classes: we first define a super-class, then its sub-classes and their
attributes and relationships. Generalization is the reverse, bottom-up
process.
Advantages of Generalization:
Enables the entity type to share common attributes among different classes
Allows for the creation of a class that can be refined progressively into finer
sub-classes
Inheritance
Key Points:
A class can be created at a broad level and then refined progressively into
finer sub-classes
Can be created at a broad level and then refined progressively into finer
sub-classes
Subclass:
Inherits all the properties of its superclass to which it can add its own
properties
Examples:
Superclass Subclass Unique Attributes
E-R Diagrams
Banking System:
class Bank {
  // One bank holds many accounts.
  accounts: {
    accountNumber: string;
    accountHolder: string;
    balance: number;
  }[] = [];
}
class Account {
  // One account records many transactions.
  transactions: {
    transactionId: string;
    transactionDate: Date;
    amount: number;
  }[] = [];
}
Hospital System:
class Hospital {
  // One hospital registers many patients.
  patients: {
    patientId: string;
    patientName: string;
    address: string;
  }[] = [];
}
class Patient {
  // One patient has many appointments.
  appointments: {
    appointmentId: string;
    appointmentDate: Date;
    doctor: string;
  }[] = [];
}
Advantages and Disadvantages of Different Models
Hierarchical Model
Advantages:
Simplicity
Data sharing
Data security
Data integrity
Efficiency
Disadvantages:
Implementation complexity
Inflexibility
Operational anomalies
Network Model
Advantages:
Simplicity
Database standards
Disadvantages:
System complexity
Operational anomalies
Not user-friendly
Relational Model
Advantages:
Simplicity
No anomalies
Structural independence
Disadvantages:
Hardware overheads
Database Users
There are different types of users depending on their need and way of
accessing the database:
Application Programmers:
They are the developers who interact with the database by means of DML
queries. These queries are written in application programs, in languages
such as C, C++, Java, etc.
Specialized Users: These users are also sophisticated users, but they write
special database application programs.
Naive Users: These are the users who use the existing applications and
interfaces.
File Organization
File Organization: A file organization refers to the way the files are physically
arranged on a storage device.
Indexed Sequential File Organization: An effective way of organizing the
records when there is a need to access individual records directly.
Economy of storage
Convenience of updates
Ease of retrieval
Reliability
Security
Integrity
In sequential organized files, the records are written one after another
in order when the file is created and can be accessed only in that order
in which they are written when the file is used for input.
Advantages:
Easy to handle
Involves no overhead
Disadvantages:
Advantages:
Disadvantages:
Can only be stored on disks
Definition:
An indexed sequential file organization is a combination of sequential
file and relative file organizations. It provides the benefits of both
access methods, allowing for efficient sequential access and direct
access to individual records.
Structure:
The indexed sequential file organization consists of an index with
pointers to a sequential data file. The index is structured as a binary
search tree, allowing for fast lookup and retrieval of records.
Example:
Consider a credit card billing system with a master file of customer
account information. The account number is used as the index key,
allowing for fast retrieval of individual records. The file can be
accessed in batch mode to generate customer invoices and build summary
reports of accounts activity on a monthly basis.
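The index-plus-data-file structure can be sketched roughly as follows. This is an illustration under assumptions: the account records are hypothetical, and a sorted list searched with `bisect` stands in for the binary search tree index described above.

```python
import bisect

# Sequential data file: records kept in account-number order (hypothetical data).
data_file = [
    (1001, "Alice", 250.00),
    (1005, "Bob", 75.50),
    (1012, "Chen", 0.00),
    (1020, "Dina", 410.25),
]

# Index: sorted keys; position i in the index points at record i in the data file.
index_keys = [record[0] for record in data_file]

def fetch(account_number):
    """Direct access: binary-search the index, then follow the pointer."""
    pos = bisect.bisect_left(index_keys, account_number)
    if pos < len(index_keys) and index_keys[pos] == account_number:
        return data_file[pos]
    return None

def scan():
    """Sequential access: read every record in key order, as a batch run would."""
    return list(data_file)
```

`fetch` serves the direct lookups of individual accounts, while `scan` corresponds to the monthly batch pass over the whole master file.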
Advantages:
Disadvantages:
support multiple keys or access paths. It allows for efficient retrieval
of records based on different keys or combinations of keys.
Example:
Consider a banking system with several types of users, including
tellers, loan officers, branch managers, and account holders. Each user
needs to access the same data in different ways. A multi-key file
organization can support multiple access paths, including account ID,
overdraft limit, social security number, and group code.
Approach:
One approach to support multi-key file organization is to use a single
data file and multiple indexes, each providing a different access path
to the data records.
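A minimal sketch of the single-data-file, multiple-index approach, assuming in-memory records. The field values and the helper names `build_index` and `lookup` are hypothetical.

```python
# Single data file (hypothetical records).
accounts = [
    {"account_id": "A1", "ssn": "111-11-1111", "group_code": "G1", "overdraft_limit": 500},
    {"account_id": "A2", "ssn": "222-22-2222", "group_code": "G2", "overdraft_limit": 0},
    {"account_id": "A3", "ssn": "333-33-3333", "group_code": "G1", "overdraft_limit": 500},
]

def build_index(records, key):
    """Map each key value to the positions of the matching records."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[key], []).append(position)
    return index

# One index per access path, all pointing into the same data file.
by_account = build_index(accounts, "account_id")
by_group = build_index(accounts, "group_code")

def lookup(index, value):
    """Follow the index entries back to the records in the data file."""
    return [accounts[pos] for pos in index.get(value, [])]
```

Each index stores only positions, so the records themselves are kept once no matter how many access paths are added.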
Hashing
Definition:
Hashing is a technique of storing and retrieving records in a file
using a hash function. The hash function calculates the address of the
page where the record is stored based on one or more fields in the
record.
Hash Function:
A hash function is a mathematical formula that manipulates the keys in
some way to compute the index for the keys in the hash table.
Example:
Consider a hash function that takes the first two characters of the
staff number, converts them to an integer value, and then adds this
value to the remaining digits of the field. The resulting sum is used as
the address of the disk page where the record is stored.
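The example hash function might look like the sketch below. The notes do not say how the two leading characters are converted to an integer, so summing their character codes is an assumption made here, and the staff number `SL21` is a hypothetical input.

```python
def staff_hash(staff_number):
    """Hash a staff number such as 'SL21' to a page address.

    Assumption: the two leading characters are converted to an integer by
    summing their character codes; the notes do not specify the conversion.
    """
    prefix, digits = staff_number[:2], staff_number[2:]
    prefix_value = sum(ord(ch) for ch in prefix)
    return prefix_value + int(digits)

address = staff_hash("SL21")  # ord('S') + ord('L') + 21 = 83 + 76 + 21 = 180
```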
Collision Resolution:
Open Addressing: Probes other slots in the hash table to find an empty slot
Chained Overflow: Stores multiple records with the same hash value in a linked list
Separate Chaining: Uses a linked list to store all records with the same hash value
Initial State:
T0 | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8
---| ---| ---| ---| ---| ---| ---| ---| ---
null | null | null | null | null | null | null | null | null
Insertion of Elements:
h(5) = 5 mod 9 = 5
h(28) = 28 mod 9 = 1
h(19) = 19 mod 9 = 1
...
T0 | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8
---| ---| ---| ---| ---| ---| ---| ---| ---
null | 28 -> 19 | null | null | null | 5 | null | null | null
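The insertions shown above can be reproduced with a short sketch of chaining. It uses Python lists as the chains and appends each new key to the end of its chain, which matches the 28 -> 19 order in the table; only the three keys whose hashes are listed are inserted.

```python
M = 9  # number of slots, T0..T8

def h(key):
    """Hash function from the worked example: key mod 9."""
    return key % M

def insert(table, key):
    """Append the key to the chain stored in slot h(key)."""
    table[h(key)].append(key)

def search(table, key):
    """Only the chain in slot h(key) has to be examined."""
    return key in table[h(key)]

table = [[] for _ in range(M)]  # initial state: every chain is empty
for key in (5, 28, 19):
    insert(table, key)
```

Keys 28 and 19 both hash to slot 1, so they share one chain instead of competing for the slot.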
## Collision Resolution by Open Addressing 🗂️
In open addressing, all elements of the dynamic set are stored in the
hash table itself. Each entry of the hash table either contains an
element of the dynamic set or a sentinel value indicating that the slot
is free.
h(k, i) = (k mod m + c1 * i + c2 * i^2) mod m
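The probe formula can be tried out with the following sketch of quadratic probing. The table size `m = 11`, the constants `c1 = 1` and `c2 = 3`, and the inserted keys are assumptions chosen for illustration, and `None` plays the role of the sentinel for a free slot.

```python
def probe(k, i, m, c1, c2):
    """Slot examined on the i-th probe for key k."""
    return (k % m + c1 * i + c2 * i * i) % m

def insert(table, k, c1=1, c2=3):
    """Place k in the first free slot of its probe sequence and return that slot."""
    m = len(table)
    for i in range(m):
        slot = probe(k, i, m, c1, c2)
        if table[slot] is None:  # None is the sentinel for a free slot
            table[slot] = k
            return slot
    raise RuntimeError("no free slot found in the probe sequence")

table = [None] * 11
slots = [insert(table, k) for k in (10, 22, 31, 4, 15, 28, 17)]
```

Key 15 collides with 4 in slot 4 and moves to slot 8 on its second probe; key 17 needs four probes (slots 6, 10, 9, 3) before it finds a free slot.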
| 59 | 4 | 5 | 4 | 9 | 14 |
| 21 | 10 | 3 | 10 | 1 | |
| 65 | 10 | 5 | 10 | 1 | 2 |
| 88 | 0 | 1 | 0 | 1 | 2 |
## B-Tree
**Advantages:**
**Example:**
| 4 | 2 | 4 |
| 5 | 3 | 5 |
| ... | ... | ... |
## B-Trees
+---------------+
| 40 |
+---------------+
| / \ |
|/ \ |
+---------------+---------------+
| 14 | 68 |
+---------------+---------------+
| / \ / \ / \ |
|/ \| |/ \| |/ \|
+---------------+---------------+---------------+
| 5 | 15 | 35 | 45 | 55 |
+---------------+---------------+---------------+
B+ Trees
A B+ tree is a variation of a B-tree that is well-suited for disk access.
Properties of B+ Trees
Internal Nodes: Store only keys and child pointers.
Example of a B+ Tree
+---------------+
| 40 |
+---------------+
| / \ |
|/ \ |
+---------------+---------------+
| 14 | 68 |
+---------------+---------------+
| / \ / \ / \ |
|/ \| |/ \| |/ \|
+---------------+---------------+---------------+
| Data | Data | Data | Data | Data |
+---------------+---------------+---------------+
Properties of Relational Tables
Values are Atomic: Columns in a relational table are not repeating groups
or arrays.
Column Values are of the Same Kind: All values in a column come from the
same domain.
RDBMS vs DBMS
Feature | RDBMS | DBMS
---| ---| ---
Relationship between tables | Specified at table creation | Programmatically specified
Codd's 12 Rules
Dr. E.F. Codd's 12 rules for a Relational Database Management System
(RDBMS):
Rule 3: Systematic treatment of null values
Poor Data Control: There is no centralized control over the fields, making it
difficult to manage and enforce data standards.
Security Problems: Enforcing security measures is challenging because
application programs are often added in an ad-hoc manner, leading to
inconsistent security practices.
Database Keys
Understanding database keys is crucial for designing and working with
relational databases. Here are definitions of primary key, superkey, and foreign
key:
Primary Key
The primary key is a column (or a set of columns) used to uniquely identify
each row in a table. Each table can have only one primary key.
Superkey
A superkey is a set of one or more columns (attributes) that can be used to
uniquely identify a record in a table. Note that a table can have multiple
superkeys.
Foreign Key
A foreign key is a field (or collection of fields) in one table that uniquely
identifies a row of another table. It's used to link two tables together to enforce
referential integrity.
Suppose we have a table "Students" and another table "Courses". If the
"Courses" table has a primary key called "CourseID", and the "Students" table
has a column called "EnrolledCourse" that also contains "CourseID", then the
"EnrolledCourse" column is a foreign key in the "Students" table referencing
the "CourseID" primary key in the "Courses" table.
Please note that the foreign key in one table points to a primary key in another
table.
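To see referential integrity being enforced, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The column types and the course code `PHY999` are assumptions for illustration; SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE Courses (CourseID TEXT PRIMARY KEY)")
conn.execute(
    """CREATE TABLE Students (
           StudentID INTEGER PRIMARY KEY,
           StudentName TEXT,
           EnrolledCourse TEXT REFERENCES Courses(CourseID)
       )"""
)

conn.execute("INSERT INTO Courses VALUES ('CS101')")
conn.execute("INSERT INTO Students VALUES (1, 'John', 'CS101')")  # course exists: accepted

try:
    # 'PHY999' is not in Courses, so the foreign key constraint rejects the row.
    conn.execute("INSERT INTO Students VALUES (2, 'Jane', 'PHY999')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The second insert fails because its `EnrolledCourse` value has no matching `CourseID`, so only one student row is stored.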
Example:
Students Table
StudentID | StudentName | EnrolledCourse
---| ---| ---
1 | John | CS101
2 | Jane | MTH101
Courses Table
Normalization
Normalization is a database design technique that reduces data redundancy
and eliminates undesirable characteristics like Insertion, Update and Deletion
Anomalies.
The main aim of Normalization is to divide a database into two or more tables
and define relationships between them. In brief,
Used to control access and rights to database data.
Given its comprehensive capabilities, SQL falls under all categories making it a
powerful tool for interacting with relational databases.
CREATE TABLE Students(StudentID int, StudentName varchar(255));
ALTER TABLE Students ADD Email varchar(255);
DROP TABLE Students;
Functional Dependency
Functional Dependency is a fundamental concept in the study of database
systems. It refers to the relationship between two sets of attributes in a
database.
To explain briefly: a functional dependency A → B holds when each value of A
is associated with exactly one value of B.
This means that if you know the value of A, you can predict the value
of B with certainty. This concept is key in the creation of database
schemas, particularly in the normalization process.
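As a rough check of this idea, the sketch below tests whether a dependency A → B is violated by a given set of rows. The `students` rows and the `holds` helper are hypothetical. Passing the check only shows that the sample data does not contradict the dependency; it does not prove the dependency is a rule of the schema.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in lhs)
        b = tuple(row[attr] for attr in rhs)
        if seen.setdefault(a, b) != b:  # same A value, different B value
            return False
    return True

# Hypothetical sample rows.
students = [
    {"StudentID": 1, "StudentName": "John", "Course": "CS101"},
    {"StudentID": 1, "StudentName": "John", "Course": "MTH101"},
    {"StudentID": 2, "StudentName": "Jane", "Course": "CS101"},
]
```

Here `StudentID -> StudentName` is consistent with the rows, while `StudentID -> Course` is not, because student 1 appears with two different courses.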
Pitfalls of Locks Based Protocol
Lock-based protocols have several limitations. Here are some of the main
pitfalls:
Deadlock: This can occur when two or more transactions each hold a lock
that another needs, creating a cycle where each is waiting for the other to
release a lock.
Decrease in concurrent processing: The more locks in the system, the less
the concurrent processing. This could lead to inefficiencies in the system.
Note that the severity of these pitfalls may vary depending on the specific
lock-based protocol implemented.
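The waiting cycle behind a deadlock can be detected with a wait-for graph. The sketch below is one common way to do it (a depth-first search for a cycle) and is an illustration rather than a method described in these notes; the transaction names are hypothetical.

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph.

    wait_for maps each transaction to the set of transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {t: WHITE for t in wait_for}

    def visit(t):
        colour[t] = GREY
        for other in wait_for.get(t, ()):
            state = colour.get(other, WHITE)
            if state == GREY:  # we came back to a transaction on the current path
                return True
            if state == WHITE and visit(other):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and visit(t) for t in wait_for)
```

For example, `has_deadlock({"T1": {"T2"}, "T2": {"T1"}})` reports a deadlock because T1 and T2 wait on each other, whereas a simple chain of waits does not.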
1. NOT NULL Constraint: A NOT NULL constraint ensures that a column cannot have
a null value.
2. Unique Key: A unique key is a set of one or more fields/columns
of a table that uniquely identifies a record in a database table. It is like
a primary key, but it can accept a null value (how many nulls are allowed
depends on the database system).
Indexes are essential tools to expedite data retrieval in a database. Let's
explore a few common types:
1. B-Tree:
This is the most common type of index. It allows the database to find data
by leading the database system through a tree of data nodes.
2. Bitmap Index:
It's mainly useful in databases where the data in indexed columns has a
limited number of distinct values. Bitmap indexes use bit arrays (commonly
referred to as bitmaps) and answer queries by performing bitwise logical
operations on these bitmaps.
3. Hash Index:
In a hash index, a hash function computes a numeric value from the indexed
key. This value gives the address (bucket) where the data is stored;
distinct keys can occasionally produce the same value (a collision).
4. Clustered Index:
It sorts and stores the data rows in the table or view based on their key
values. There can only be one clustered index per table.
5. Non-Clustered Index:
It's just like a book index where reference is given to the page where the
information is saved but the pages aren't sorted in order.
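To make the bitmap index idea concrete, here is a small sketch that uses Python integers as bit arrays. The two columns and their values are hypothetical, and a real DBMS would store and compress its bitmaps differently.

```python
rows = ["red", "blue", "red", "green", "blue", "red"]  # hypothetical colour column
sizes = ["S", "M", "M", "S", "M", "S"]                 # hypothetical size column

def build_bitmaps(column):
    """One integer bitmap per distinct value; bit i is set when row i holds that value."""
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

def matching_rows(bitmap):
    """Row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

colour_index = build_bitmaps(rows)
size_index = build_bitmaps(sizes)

# WHERE colour = 'red' AND size = 'S' becomes a single bitwise AND.
red_and_small = colour_index["red"] & size_index["S"]
```

Because each column has only a few distinct values, a handful of bitmaps covers the whole table, and combined conditions reduce to bitwise operations.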
Characteristics of SQL
SQL (Structured Query Language) is a standardized programming language
used for managing and manipulating databases. Here are some primary
characteristics of SQL:
Ubiquitous: SQL is almost universally used. Any application that interacts
with a database likely uses SQL.
Scalability and Flexibility: SQL can handle a large amount of data and can
be scaled up and down as per requirements.
High Performance: SQL can handle heavy loads and performs well in big
data scenarios.
Characteristics of a Database
A database is an organized collection of data that is stored and accessed
electronically. Databases allow us to preserve, retrieve, and manipulate data
effectively, making them critical for any type of serious computational work.
Here are some key characteristics of a database:
These are the key features that make databases an essential tool for managing
data in any enterprise application.
It is less efficient if you want to access data located near the end of the file,
as you have to traverse all the preceding data.
Random Access
In random access, data can be read or written no matter the order. You can
start from any position.
It is more efficient if you want to access data located anywhere in the file,
as you can jump directly to that point.
Remember that the choice of access depends on the specific use-case in your
program.
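The difference between the two access styles can be sketched with fixed-length records. This is an illustration under assumptions: an in-memory `io.BytesIO` buffer stands in for a file on disk, and the 8-byte record size and record contents are hypothetical.

```python
import io

RECORD_SIZE = 8  # assumption: fixed-length records of 8 bytes
records = [b"rec-0000", b"rec-0001", b"rec-0002", b"rec-0003"]
f = io.BytesIO(b"".join(records))  # stands in for a file on disk

def read_sequential(f, n):
    """Read records 0..n in order and return the last one read."""
    f.seek(0)
    record = b""
    for _ in range(n + 1):
        record = f.read(RECORD_SIZE)
    return record

def read_random(f, n):
    """Jump straight to record n using its computed byte offset."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE)
```

Both functions return the same record, but the sequential version has to read every preceding record first, while the random-access version performs a single seek.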
Relational Algebra
Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output. It is mainly used to
manipulate the data in a relational database.
Union: combining two relations,
This operation is used to find the tuples present in one relation but not
in the second relation.
7. Join Operation:
This operation combines related tuples from two relations into single tuples,
based on a common attribute.
These operations are fundamental to manage and manipulate data in relational
databases.
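Several of these operations map directly onto Python set operations. The sketch below assumes relations are sets of tuples; the sample data and the simple `join` helper, which matches the last field of one relation against the first field of the other, are hypothetical.

```python
# Relations as sets of tuples (hypothetical data): (student, course)
enrolled_2023 = {("John", "CS101"), ("Jane", "MTH101")}
enrolled_2024 = {("Jane", "MTH101"), ("Omar", "CS101")}

union = enrolled_2023 | enrolled_2024          # tuples in either relation
difference = enrolled_2023 - enrolled_2024     # tuples in the first but not the second
intersection = enrolled_2023 & enrolled_2024   # tuples common to both

def join(left, right):
    """Join on the last field of `left` and the first field of `right`."""
    return {l + r[1:] for l in left for r in right if l[-1] == r[0]}

courses = {("CS101", "Databases"), ("MTH101", "Calculus")}
joined = join(enrolled_2023, courses)
```

Union, difference, and intersection require both relations to have the same fields, whereas the join combines relations with different fields through the shared course code.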
Example:
Example:
Here Student_name → Class_name but Student_name is not a super key which violates
BCNF.
1. Domain Constraint: This ensures that all entries in a column must only be
of a specific data type.
2. Primary Key Constraint: This ensures that all entries are unique and that
there are no null values in the primary key column.
3. Foreign Key Constraint: This ensures that the values in the foreign key
column match the values of a primary key in another table.
4. Unique Constraint: This ensures that all values in a column are unique,
differentiating the records.
5. Not Null Constraint: This ensures that a column cannot have a null value.
6. Check Constraint: This ensures that the value in a column meets a specific
condition.
Note: Violation of these constraints can affect the integrity of the database
data.
Database Normalization: 4th and 5th Normal Forms
Fourth Normal Form (4NF)
Fourth Normal Form (4NF) states that a table should not have any multi-valued
dependency, which means that each independent multi-valued fact should be
represented in separate tables.
DBMS Architecture
The architecture of a Database Management System (DBMS) can be seen as
either 2-tier or 3-tier.
3-tier architecture adds an intermediary layer between the user and the
database, often improving performance, manageability, and security.
Definitions
Entity: A distinct, real-world object in an entity-relationship model.
Weak Entity: An entity that depends on another entity-type for its existence.
Normalization in Databases
Normalization is a process in database design that aims to reduce redundant
data, ensure data consistency and enhance the integrity of the database.
Please note that while normalization has its benefits, it's not always the optimal
approach. It depends on the specific application and requirements.
Syntax Example
Here is a basic example in SQL:
SELECT column_name
FROM table_name
WHERE column_name IN (SELECT column_name FROM table_name WHERE condition);
The query within the parentheses is the subquery or the nested query.
This set of values is then used by the main query to execute and retrieve
the final data.
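A runnable version of this pattern, sketched with Python's built-in `sqlite3` module. The table names, columns, and rows are hypothetical and chosen only to give the subquery something to select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Courses (CourseID TEXT, Department TEXT);
    CREATE TABLE Students (StudentName TEXT, EnrolledCourse TEXT);
    INSERT INTO Courses VALUES ('CS101', 'Computing'), ('MTH101', 'Mathematics');
    INSERT INTO Students VALUES ('John', 'CS101'), ('Jane', 'MTH101');
    """
)

# Conceptually, the inner query yields the set of course IDs in 'Computing';
# the outer query then keeps the students whose course is in that set.
names = [
    row[0]
    for row in conn.execute(
        """SELECT StudentName
           FROM Students
           WHERE EnrolledCourse IN (
               SELECT CourseID FROM Courses WHERE Department = 'Computing'
           )"""
    )
]
```

Only John is returned, because CS101 is the only course the subquery finds in the Computing department.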
Database Normalization
Database Normalization involves organizing the attributes and tables of a
database to minimize redundancy and dependency. It generally encompasses
First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form
(3NF).
Example:
In a Student table, if `StudentID` determines `StudentName`,
then we have `StudentID -> StudentName`.
Example:
In a Course table `(CourseID, ProfessorID) -> CourseName`,
`CourseName` is fully functionally dependent on
`(CourseID, ProfessorID)` as it can't be determined by a subset.
Normal Forms
1NF (First Normal Form): Data is only in tabular form with no repeating
groups.
2NF (Second Normal Form): All non-key attributes are fully functionally
dependent on the primary key.
Types of Keys
There are several types of keys in a database:
1. Primary Key: A unique identifier for a record in a table. There can only be
one primary key in a table.
2. Foreign Key: Used to link two tables together. It's a field in a table that is a
primary key in another table.
5. Secondary Key: Also known as a non-prime key, this key is not a part of the
primary key but is still used for retrieval purposes.
7. Alternate Key: Alternate keys are the candidate keys other than the
primary key.
primary key of the entity in the ER model becomes the primary key of the
new table.
2. Mapping of Weak Entity Types: Weak Entity types are the ones that are
dependent on some other entity type. A separate table is created with a
foreign key that refers to the primary key of its owner entity.
The above SQL snippet illustrates the creation of a new 'Employee' table with a
foreign key 'Department_ID' referring to the 'Department' table, thereby
mapping a relationship between the two entities in the original ER model.
Lock in Programming:
Used in multithreaded applications to prevent multiple threads from
accessing the same resource concurrently, which could lead to inconsistent
results or corruption.
A thread will "lock" a resource before using it, which prevents other threads
from accessing it until the original thread has "unlocked" it.
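import threading
# create a lock
lock = threading.Lock()
# acquire the lock before using the shared resource
lock.acquire()
try:
    pass  # critical section: use the shared resource here
finally:
    # release the lock so other threads can proceed
    lock.release()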
Shared Locks
A shared lock (S lock) allows concurrent transactions to read (SELECT) a
resource but not to write (UPDATE) it.
Exclusive Locks
An exclusive lock (X lock) prevents other transactions from both reading
(SELECT) and writing (UPDATE) to the resource.
In essence, 'Shared' means other transactions can read but not change, while
'Exclusive' means no other transactions can read or change.
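The rules for the two lock modes can be summarised as a compatibility matrix. The sketch below is a minimal illustration; the `can_grant` helper is hypothetical and leaves out queuing, lock upgrades, and release.

```python
# Compatibility matrix: (lock already held, lock requested) -> may both coexist?
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share a resource
    ("S", "X"): False,  # a writer must wait for the readers
    ("X", "S"): False,  # readers must wait for the writer
    ("X", "X"): False,  # only one writer at a time
}

def can_grant(held_modes, requested):
    """Grant the request only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For instance, `can_grant(["S", "S"], "S")` is true, while `can_grant(["S"], "X")` is false because the exclusive request conflicts with the existing shared lock.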
A well-designed database should avoid these anomalies. This involves methods
such as normalizing your database to 3rd normal form (3NF) or Boyce-Codd
normal form (BCNF).
Understanding Mapping
Mapping in mathematics refers to a concept where each element of a set,
called the domain, is paired with an element of another set, known as
the range.
This process can also be understood as a function that "maps" one set to
another.
In simpler terms, for every input from the domain, the function assigns
precisely one output in the range.
For instance,
y = f(x)
In this equation, the function f is a mapping that maps x (element from the
domain) to y (element from the range).
4. Durability: Once a transaction has been committed, it will remain so, even
in the event of power loss, crashes, or errors.
Example: Once you save a document, it’s stored in the database and is
ensured to remain stored even if your computer suddenly crashes.
Each property adds robustness to the system and ensures the data integrity
and reliability of transactions.
DDL stands for Data Definition Language.
DDL is a set of SQL commands used to create, modify, and delete database
structures but not data.
DML stands for Data Manipulation Language.
It is used to retrieve, store, modify, delete, insert and update data in the
database.
In summary, the key difference between DDL and DML is that DDL is used to
manipulate the database structure, while DML is used to manipulate the data
within the database.
1. Lock-based protocols
A simple yet effective strategy is to lock part of a database that a
transaction is accessing. For example, when Transaction A is working on
Data Item X, it locks X. Any other Transaction B cannot access X until
Transaction A releases the lock.
2. Timestamp-based protocols
In these types of protocol, transactions are assigned a timestamp to avoid
conflicts. For instance, if Transaction A's timestamp is earlier than
Transaction B, Transaction A will get precedence over B.
3. Validation-based protocols
With this protocol, transactions are executed without checks, but at commit
time, validation is done to prevent incorrect outcomes. When transaction A
comes to commit, any overlapping transaction B that fails the validation is
rolled back.
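As one concrete instance of the protocols listed above, the sketch below follows the textbook basic timestamp-ordering rule. It is an illustration and not necessarily the exact variant these notes have in mind; the `Item` class and the `read` and `write` helpers are hypothetical.

```python
class Item:
    """A data item tracking the largest timestamps that have read and written it."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(item, ts):
    """Return True if the transaction with timestamp ts may read the item."""
    if ts < item.write_ts:        # a younger transaction already overwrote it
        return False              # the reader must roll back
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    """Return True if the transaction with timestamp ts may write the item."""
    if ts < item.read_ts or ts < item.write_ts:
        return False              # a younger transaction already used the item
    item.write_ts = ts
    return True
```

An operation is allowed only when it arrives in timestamp order with respect to conflicting operations on the same item; otherwise the issuing transaction is rolled back, which gives older transactions precedence.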
Key Points:
Security encompasses measures and controls that ensure confidentiality,
integrity, and availability of data.
dependency.
Example:
Consider a table StudentCourseProfessor with the fields:
StudentID
CourseID
ProfessorID
ProfessorName
StudentID
CourseID
ProfessorID
Professor
ProfessorID
ProfessorName
After the changes, both tables are in 3NF as they adhere to the 2NF rules and
do not have transitive functional dependencies.
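The decomposition can be checked with a short sketch. The rows are hypothetical, and the name `enrolment` for the first table is an assumption, since the notes do not name it.

```python
rows = [  # hypothetical StudentCourseProfessor rows
    {"StudentID": 1, "CourseID": "CS101", "ProfessorID": "P1", "ProfessorName": "Rao"},
    {"StudentID": 2, "CourseID": "CS101", "ProfessorID": "P1", "ProfessorName": "Rao"},
    {"StudentID": 2, "CourseID": "MTH101", "ProfessorID": "P2", "ProfessorName": "Lee"},
]

# Table 1: keeps the key attributes and the reference to the professor.
enrolment = [
    {k: r[k] for k in ("StudentID", "CourseID", "ProfessorID")} for r in rows
]

# Table 2: each professor's name is stored once, keyed by ProfessorID.
professor = {r["ProfessorID"]: r["ProfessorName"] for r in rows}

# Joining the two tables on ProfessorID reproduces the original rows.
rejoined = [dict(e, ProfessorName=professor[e["ProfessorID"]]) for e in enrolment]
```

The professor's name now appears once per professor instead of once per enrolment, and the join shows that no information was lost in the split.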