DBMS Module 2 Notes
1. Lossless Decomposition
2. Lossy Decomposition
Lossless:
The decompositions R1, R2, R3 … Rn of a relation schema R are said to be lossless if their natural join results in the original relation R.
Formally, let R be a relation and R1, R2, R3 … Rn be its decomposition. The decomposition is lossless if:
R1 ⨝ R2 ⨝ R3 ⨝ … ⨝ Rn = R
For a decomposition into two sub-relations R1 and R2, the common attribute(s) R1 ∩ R2 must be a super key/candidate key of at least one of the sub-relations.

Lossy:
The decompositions R1, R2, R3 … Rn of a relation schema R are said to be lossy if their natural join results in extraneous (spurious) tuples being added to the original relation R.
Formally, let R be a relation and R1, R2, R3 … Rn be its decomposition. The decomposition is lossy if:
R ⊂ R1 ⨝ R2 ⨝ R3 ⨝ … ⨝ Rn
For a decomposition into two sub-relations R1 and R2, the common attribute(s) R1 ∩ R2 is not a super key/candidate key of any of the sub-relations.
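The two-relation test above can be checked mechanically with attribute closure. A minimal sketch, assuming single-letter attribute names and FDs written as (LHS, RHS) pairs of frozensets (both are illustration choices, not notation from the notes); it also assumes R1 ∪ R2 = R has already been verified.

```python
# Sketch: lossless-join test for a decomposition of R into two sub-relations.
# Assumption: attributes are single letters, FDs are (lhs, rhs) frozenset pairs.

def fd(lhs, rhs):
    """Build an FD, e.g. fd("AB", "CD") stands for AB -> CD."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure attrs+ under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its whole LHS is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff R1 ∩ R2 is a super key of R1 or of R2 (assumes R1 ∪ R2 = R)."""
    common = set(r1) & set(r2)
    if not common:
        return False  # no common attribute: the join is a cross product
    common_plus = closure(common, fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


F = [fd("AB", "CD"), fd("D", "A")]
print(is_lossless_binary("DA", "BCD", F))  # True: D -> A makes D a key of R1
print(is_lossless_binary("AC", "BC", F))   # False: C+ = {C} is a key of neither
```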
Decomposition into:      1NF      2NF      3NF      BCNF
Lossless:                Always   Always   Always   Always
Dependency preserving:   Always   Always   Always   May or may not be
Imp: A dependency-preserving decomposition can always be achieved for 1NF, 2NF, and 3NF (but not always for BCNF).
In a dependency-preserving decomposition, the original relation is decomposed into smaller relations in such a way that the
resulting relations preserve the functional dependencies of the original relation. This is
important because if the decomposition loses any of the original functional
dependencies, that dependency can no longer be checked within a single table, which can lead to data inconsistencies and anomalies.
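A standard way to test dependency preservation without computing the full closure F+ is sketched below (a sketch, not the only method). For each FD X -> Y, a set `result` starts as X and repeatedly absorbs (result ∩ Ri)+ ∩ Ri for every sub-relation Ri; the FD is preserved if Y ends up inside `result`. The function names are hypothetical, and the two FD sets are assumed to match the worked examples that follow.

```python
# Sketch: dependency-preservation test that never computes F+ explicitly.

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """True iff every FD in fds can be enforced from the sub-relations."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in map(set, decomposition):
                # Only what can be inferred inside Ri may grow result.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False  # this FD is lost by the decomposition
    return True


F1 = [fd("X", "Y"), fd("Y", "Z"), fd("Z", "X")]
F2 = [fd("AB", "CD"), fd("D", "A")]
print(preserves_dependencies(F1, ["XY", "YZ"]))   # True
print(preserves_dependencies(F2, ["DA", "BCD"]))  # False: AB -> CD is lost
```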
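EXAMPLE 1: R(X, Y, Z) with FDs X -> Y, Y -> Z, Z -> X is decomposed into R1(X, Y) and R2(Y, Z). Check whether the decomposition is lossy or lossless and dependency preserving.
SOLUTION:
For Lossy: The common attribute Y is present in both R1 and R2, and Y is a candidate key (Y+ = {Y, Z, X}). Therefore, the decomposition is not lossy.
For Lossless:
R1 ∪ R2 = {X, Y, Z} = R
R1 ∩ R2 = {Y} ≠ ø, and Y is a key of R1 (and of R2)
Hence, the decomposition is lossless.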
Dependency Preserving:
Step 1: Write down all the possible non-trivial dependencies in the child tables R1 and R2.
R1 (X, Y): X -> Y, Y -> X
R2 (Y, Z): Y -> Z, Z -> Y
Step 2: If a dependency is directly present in the given question, it holds (True). Else {take the closure of the determinant (the LHS) w.r.t. the functional dependencies given in the question and check the possibility}.
R1 (X, Y):
X -> Y: directly present (True)
Y -> X: Y+ = {Y, Z, X}. Here, Z is not part of R1, so exclude Z. Hence, Y determines X indirectly (True).
R2 (Y, Z):
Y -> Z: directly present (True)
Z -> Y: Z+ = {Z, X, Y}. Here, X is not part of R2, so exclude X. Hence, Z determines Y indirectly (True).
Step 3: The possible functional dependencies of the decomposition are:
X -> Y, Y -> X, Y -> Z, Z -> Y
Step 4: Check whether all the original dependencies are preserved w.r.t. the possible functional dependencies.
Else {take the closure of the determinant w.r.t. the derived functional dependencies}.
X -> Y: present (preserved)
Y -> Z: present (preserved)
Z -> X: not directly present. Using Z -> Y and Y -> X, Z+ = {Z, Y, X}, so Z -> X is preserved.
Therefore, the decomposition is dependency preserving.
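The step-by-step check in Example 1 can be reproduced with a small brute-force sketch, assuming the given FDs are X -> Y, Y -> Z and Z -> X (the set implied by the closures Y+ and Z+ in the solution). `project_fds` is a hypothetical helper that lists every non-trivial FD holding on a sub-relation (Steps 1 to 3, without minimizing them); the final loop is Step 4. Brute force over subsets is only practical for small schemas.

```python
from itertools import combinations


def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(ri, fds):
    """Steps 1-3: non-trivial X -> A that hold on sub-relation ri (not minimized)."""
    attrs = sorted(ri)
    projected = []
    for size in range(1, len(attrs)):
        for lhs in combinations(attrs, size):
            # Step 2: closure w.r.t. the given FDs, keeping only attributes of ri.
            implied = (closure(lhs, fds) & set(attrs)) - set(lhs)
            for a in sorted(implied):
                projected.append(fd(lhs, a))
    return projected


F = [fd("X", "Y"), fd("Y", "Z"), fd("Z", "X")]
derived = project_fds("XY", F) + project_fds("YZ", F)

# Step 4: every original FD must follow from the derived FDs.
for lhs, rhs in F:
    ok = rhs <= closure(lhs, derived)
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), "preserved" if ok else "not preserved")
```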
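EXAMPLE 2:
R(A, B, C, D) with FDs AB -> CD and D -> A is decomposed into R1(D, A) and R2(B, C, D). Find the candidate keys, check for 3NF and BCNF, check whether the decomposition is lossy or lossless, and check whether it is dependency preserving or not.
SOLUTION:
Given FDs: AB -> CD, D -> A
Candidate Key: The attributes present on the Right Hand Side of the given functional dependencies are A, C, D. B never appears on the RHS, so B must be part of every candidate key.
Assume AB: AB+ = {A, B, C, D}, so AB is a candidate key.
OR, since D -> A, replace A with D: BD+ = {B, D, A, C}, so BD is also a candidate key.
Candidate keys: AB, BD. Prime attributes: A, B, D.
3NF/BCNF: In AB -> CD, AB is a super key. In D -> A, D+ = {D, A}, so D is not a super key and R is not in BCNF. However, A is a prime attribute, so D -> A satisfies 3NF. Hence, R is in 3NF but not in BCNF.
For Lossy: D is the common attribute, and it is a candidate key of R1(D, A) since D -> A. So, the decomposition is not lossy.
For Lossless:
R1 ∪ R2 = {A, B, C, D} = R
R1 ∩ R2 = {D} ≠ ø, and D is a key of R1
Hence, the decomposition is lossless.
Dependency Preserving:
R1 (D, A): D -> A (directly present, True)
R2 (B, C, D): take closures w.r.t. the given FDs and exclude attributes that are not part of R2:
B+ = {B}
C+ = {C}
D+ = {D} (A is not part of R2, so exclude A)
BC+ = {B, C}
BD+ = {A, B, C, D}, which gives BD -> C on R2
CD+ = {C, D} (A excluded)
AB -> CD: w.r.t. the derived dependencies {D -> A, BD -> C}, AB+ = {A, B}, which contains neither C nor D. Hence, AB -> CD is not preserved, and the decomposition is not dependency preserving.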
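The candidate-key and normal-form reasoning of Example 2 can be checked with a brute-force sketch, assuming R(A, B, C, D) with AB -> CD and D -> A as listed in the solution. The function names are hypothetical; trying every attribute subset is exponential, so this only suits small textbook relations. Checking only the given FDs is enough when testing R itself.

```python
from itertools import combinations


def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of attrs (brute force)."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a smaller key is inside, so combo is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append("".join(combo))
    return keys


def bcnf_violations(attrs, fds):
    """Non-trivial FDs whose LHS is not a super key."""
    return [(l, r) for l, r in fds if not r <= l and not closure(l, fds) >= set(attrs)]


def is_3nf(attrs, fds):
    """3NF: LHS is a super key, or every RHS attribute outside the LHS is prime."""
    prime = set("".join(candidate_keys(attrs, fds)))
    return all(r <= l or closure(l, fds) >= set(attrs) or (r - l) <= prime
               for l, r in fds)


F = [fd("AB", "CD"), fd("D", "A")]
print(candidate_keys("ABCD", F))  # ['AB', 'BD']
print(["".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in bcnf_violations("ABCD", F)])  # ['D -> A']
print(is_3nf("ABCD", F))  # True
```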
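4NF:-
A relation “R” is said to be in 4NF if & only if the following conditions are satisfied:
1) R is in BCNF.
2) R contains no non-trivial multivalued dependency X →→ Y unless X is a super key.
MVD ( →→ ) :-
X →→ Y holds in R if, for each value of X, there is a set of Y values that is independent of the values of the remaining attributes of R.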
EXAMPLE:
[Table: rows for Aman (Windows, Apple; English, Hindi) and Mohan (Linux; English, Spanish)]
Now, the table is decomposed into two tables, R1 and R2:
No MVD in table R1 because R1 contains only two attributes.
No MVD in table R2 because R2 contains only two attributes.
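An MVD can be tested directly on a table instance: X →→ Y holds if, whenever two rows agree on X, the row that takes its Y value from the first and its remaining values from the second also exists. A minimal sketch with hypothetical student/course/hobby rows (not the table from the notes); `holds_mvd` is a hypothetical helper.

```python
def holds_mvd(rows, x, y):
    """Check X ->> Y on a list of dict rows (Z = all other columns)."""
    z = [c for c in rows[0] if c not in x and c not in y]

    def proj(row, cols):
        return tuple(row[c] for c in cols)

    present = {(proj(r, x), proj(r, y), proj(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            # Rows that agree on X must allow their Y and Z values to be swapped.
            if proj(t1, x) == proj(t2, x) and (proj(t1, x), proj(t1, y), proj(t2, z)) not in present:
                return False
    return True


# Hypothetical rows: a student's courses and hobbies are independent of each other.
rows = [
    {"student": "S1", "course": "DBMS", "hobby": "Chess"},
    {"student": "S1", "course": "DBMS", "hobby": "Music"},
    {"student": "S1", "course": "OS", "hobby": "Chess"},
    {"student": "S1", "course": "OS", "hobby": "Music"},
]
print(holds_mvd(rows, ["student"], ["course"]))      # True
print(holds_mvd(rows[:3], ["student"], ["course"]))  # False: (S1, OS, Music) is missing
```

Splitting such a table into (student, course) and (student, hobby) removes the MVD, and each two-attribute table can have no non-trivial MVD, which matches the reasoning above.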
5NF:-
A relation “R” is said to be in 5NF if & only if the following conditions are satisfied
simultaneously:
1) R is in 4NF and does not contain any non-trivial join dependency (other than those implied by its candidate keys), and joining the decomposed tables is lossless.
2) 5NF is satisfied when the tables are broken into as many tables as possible, without losing information, in order to
avoid redundancy.
JD:-
Let “R” be a relation schema & R1, R2, ……, Rn be the decomposition of R. R is said to satisfy
the join dependency *(R1, R2, …, Rn) if and only if every legal instance r of R satisfies:
πR1(r) ⨝ πR2(r) ⨝ … ⨝ πRn(r) = r
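EXAMPLE:
Supplier Part Project
S1 P1 J1
S1 P2 J1
S1 P1 J2
Given the above constraints, the table satisfies the join dependency *((Supplier, Part), (Supplier, Project), (Part, Project)), so it is decomposed into the following three tables: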
Supplier Part
S1 P1
S1 P2
S2 P2
Supplier Project
S1 J1
S1 J2
S2 J2
Part Project
P1 J1
P2 J1
P1 J2
P2 J2
Now, these decomposed tables eliminate the redundancy caused by the specific constraints and
join dependencies of the original relation.
When you take the natural join of these tables, you will get back the original table.
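Whether a table satisfies a join dependency can be checked by projecting it onto each schema, joining the projections back, and comparing with the original. A minimal sketch using the three Supplier/Part/Project rows shown above (which on their own do satisfy the JD) plus a hypothetical instance that does not; all function names are hypothetical.

```python
def project(rows, cols):
    """Projection with duplicates removed (relations are sets)."""
    values = {tuple(r[c] for c in cols) for r in rows}
    return [dict(zip(cols, v)) for v in sorted(values)]


def natural_join(left, right):
    if not left or not right:
        return []
    common = [c for c in left[0] if c in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[c] == r[c] for c in common)]


def as_set(rows):
    return {tuple(sorted(r.items())) for r in rows}


def satisfies_jd(rows, schemas):
    """True iff joining the projections onto schemas gives back exactly rows."""
    joined = project(rows, schemas[0])
    for cols in schemas[1:]:
        joined = natural_join(joined, project(rows, cols))
    return as_set(joined) == as_set(rows)


def spj(*triples):
    return [dict(zip(("supplier", "part", "project"), t)) for t in triples]


schemas = [("supplier", "part"), ("supplier", "project"), ("part", "project")]
shown = spj(("S1", "P1", "J1"), ("S1", "P2", "J1"), ("S1", "P1", "J2"))
print(satisfies_jd(shown, schemas))  # True

# Hypothetical instance: the three-way join adds the spurious row (S1, P1, J1).
bad = spj(("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1"))
print(satisfies_jd(bad, schemas))  # False
```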
INCLUSION DEPENDENCE
Imagine you have two lists, one with names of your friends and another with their phone
numbers. Each friend's name is matched with their phone number. Now, if every friend's name in
the first list is also found in the second list along with their phone number, we can say there is an
"inclusion dependence" between the two lists.
So, in simple terms, inclusion dependence means that if something is in one list, it must also be
in another list. It's like saying, "If it's on the friend's name list, it has to be on the phone number
list too."
In a database, it's similar. If you have information in one table, and there's an inclusion
dependence with another table, it means that every piece of information in the first table must
also be in the second table.
A statement in which some columns of one relation are contained in columns of the same or another relation is
known as an Inclusion Dependency. Inclusion dependencies, like functional dependencies,
represent one-to-many relationships. However, inclusion dependencies are more commonly
used to represent relationships between relations.
Let's name the relations R as teacher and S as student, and take the attribute teacher_id, so we
can write:
S[teacher_id] ⊆ R[teacher_id], i.e., student.teacher_id ⊆ teacher.teacher_id
Teacher:
1 Rahul Singh
Student:
1 18
teacher_id is the primary key of the teacher table and a foreign key in the student table, so every
teacher_id value that appears in the student table must also appear in the teacher table.
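Checking an inclusion dependency on actual rows is a subset test on the referenced column. A minimal sketch; only `teacher_id` and the name Rahul Singh come from the notes, while the other column names and values are hypothetical.

```python
def inclusion_holds(child_rows, child_col, parent_rows, parent_col):
    """child[child_col] ⊆ parent[parent_col]: every child value exists in the parent."""
    parent_values = {r[parent_col] for r in parent_rows}
    return all(r[child_col] in parent_values for r in child_rows)


# Hypothetical columns; only teacher_id comes from the notes.
teacher = [{"teacher_id": 1, "name": "Rahul Singh"}]
student = [{"roll_no": 101, "teacher_id": 1}]
print(inclusion_holds(student, "teacher_id", teacher, "teacher_id"))  # True

student.append({"roll_no": 102, "teacher_id": 2})  # teacher 2 does not exist
print(inclusion_holds(student, "teacher_id", teacher, "teacher_id"))  # False
```

A foreign key constraint in SQL enforces exactly this check on every insert and update.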
File Organization: Indexing, Structure of Index Files and Types, Dense and
Sparse Indexing
***INDEXING: Indexing is a very useful technique that helps in optimizing the search time
in database queries. An index table consists of a search key and a pointer.
The index is a type of data structure. It is used to locate and access the data in a database table
quickly.
Primary Indexing:
Clustering Indexing:
SUMMARY:
INDEX STRUCTURE:
The first column of the index is the search key, which contains a copy of the primary key or a
candidate key of the table. The search key values are stored in sorted order so that the
corresponding data can be accessed easily.
The second column of the index is the data reference. It contains a set of pointers holding the
address of the disk block where the value of the particular key can be found.
Basis | Dense Index | Sparse Index
Data pointers pointing to | Each record in the data file | Fewer records in the data file
Ordering | The dense index can be built on ordered as well as unordered fields of the database files. | The sparse index can be built only on the ordered field of the database file.
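The difference between the two index types can be sketched on a small in-memory "data file". Assumptions: a hypothetical file sorted on its key with 3 records per block; a dense index entry points to a (block, slot) position, while a sparse index entry holds the first key of each block, so a lookup finds the right block and then scans it.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file sorted on its key, 3 records per block.
records = [(k, f"rec{k}") for k in (5, 8, 12, 15, 20, 23, 31, 40)]
BLOCK_SIZE = 3
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per record -> (block number, slot in block).
dense = [(key, (b, s)) for b, blk in enumerate(blocks) for s, (key, _) in enumerate(blk)]
# Sparse index: one entry per block -> first key stored in that block.
sparse = [(blk[0][0], b) for b, blk in enumerate(blocks)]


def dense_lookup(key):
    keys = [k for k, _ in dense]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        b, s = dense[i][1]
        return blocks[b][s][1]
    return None


def sparse_lookup(key):
    keys = [k for k, _ in sparse]
    i = bisect_right(keys, key) - 1  # last block whose first key <= key
    if i < 0:
        return None
    for k, value in blocks[sparse[i][1]]:  # sequential scan inside one block
        if k == key:
            return value
    return None


print(len(dense), len(sparse))              # 8 3
print(dense_lookup(23), sparse_lookup(23))  # rec23 rec23
print(dense_lookup(7), sparse_lookup(7))    # None None
```

The sparse lookup depends on the file being sorted on the key; a dense index on an unordered field would instead keep its own entries sorted while the records stay wherever they are.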
Root: A root can have between 2 and P children {here, P is called the order of the tree}.
External Node/ Leaf Node: Nodes with no children are called leaf nodes.
Number of children:   Root   Internal Node
Max                   P      P
Min                   2      ⌈P/2⌉
Example: Consider a B+ Tree in which the maximum number of keys in a node is 5. What is the
order of the tree, and the minimum number of keys in the root node, internal node and leaf node?
A node with P children holds at most P - 1 keys.
Therefore, P - 1 = 5
P = 5 + 1 = 6
Order of Tree, P = 6
Example: The order of a leaf node in a B+ tree is the maximum number of (value, data record
pointer) pairs it can hold. Given that the block size is 1K bytes (1024 bytes), a data record pointer is 7
bytes long, the value field is 9 bytes long and a block pointer is 6 bytes long, what is the order of
the leaf node, i.e., the maximum number of keys?
Order, P = ?
Formula: P × (Key Size + Record Pointer Size) + Block Pointer Size ≤ Block Size
P × (9 + 7) + 6 ≤ 1024
16P + 6 ≤ 1024
16P ≤ 1024 - 6
16P ≤ 1018
P ≤ 1018/16 = 63.625
P = 63 (P must be an integer, so take the floor)
Maximum number of keys in a leaf node = 63
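The leaf-order inequality above reduces to one floor division. A minimal sketch; `leaf_order` is a hypothetical helper name, and it assumes, as in the formula, exactly one block pointer per leaf (the link to the next leaf).

```python
def leaf_order(block_size, key_size, record_ptr_size, block_ptr_size):
    """Largest P with P * (key + record pointer) + block pointer <= block size."""
    return (block_size - block_ptr_size) // (key_size + record_ptr_size)


p = leaf_order(block_size=1024, key_size=9, record_ptr_size=7, block_ptr_size=6)
print(p)  # 63
# Sanity check: 63 pairs fit (1014 bytes), 64 pairs would not (1030 bytes).
print(63 * 16 + 6 <= 1024 < 64 * 16 + 6)  # True
```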
Example: A data file consisting of 1,50,000 student-records is stored on a hard disk with block
size of 4096 bytes. The data file is sorted on the primary key RollNo. The size of a record pointer
for this disk is 7 bytes. Each student-record has a candidate key attribute called ANum of
size 12 bytes. Suppose an index file with records consisting of two fields, ANum value and the
record pointer to the corresponding student record, is built and stored on the same disk. Assume
that the records of data file and index file are not split across disk blocks. The number of blocks
in the index file is ________
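Solution:
Size of one index record = ANum size + record pointer size = 12 + 7 = 19 bytes
Number of index records per block = ⌊4096 / 19⌋ = 215 (records are not split across blocks)
The data file is sorted on RollNo, not on ANum, so the index on ANum must be dense: one index record per student record, i.e., 150000 index records.
Formula:
Number of blocks needed to store all the records = ⌈Total number of records / Number of records per block⌉
Number of blocks needed to store all the records = ⌈150000 / 215⌉ = ⌈697.67⌉ = 698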
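The same arithmetic as a small sketch; `index_blocks` is a hypothetical helper, and it uses integer ceiling division so no floating-point rounding is involved.

```python
def index_blocks(num_entries, block_size, key_size, ptr_size):
    entry_size = key_size + ptr_size       # here: ANum (12) + record pointer (7) = 19 bytes
    per_block = block_size // entry_size   # floor: entries are not split across blocks
    return -(-num_entries // per_block)    # ceiling division


print(4096 // 19)                          # 215
print(index_blocks(150_000, 4096, 12, 7))  # 698
```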
Transactions access data using read and write operations. In order to maintain consistency in a
database, before and after the transaction, certain properties are followed. These are called
***ACID properties.
A: Atomicity: By this, we mean that either the entire transaction takes place at once or doesn’t
happen at all. There is no midway i.e. transactions do not occur partially. Each transaction is
considered as one unit and either runs to completion or is not executed at all.
C: Consistency: This means that integrity constraints must be maintained so that the database is
consistent before and after the transaction. It refers to the correctness of a database.
I: Isolation: This property ensures that multiple transactions can occur concurrently without
leading to an inconsistent database state. Transactions occur independently without
interference. Changes made by a particular transaction are not visible to any other
transaction until that transaction has committed.
D: Durability: This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk and they persist even
if a system failure occurs. These updates now become permanent and are stored in non-volatile
memory. The effects of the transaction, thus, are never lost.
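Atomicity can be observed with Python's built-in `sqlite3` module. A minimal sketch, assuming a hypothetical `account` table whose `CHECK (balance >= 0)` constraint makes an over-large transfer fail halfway; using the connection as a context manager commits on success and rolls back if an exception is raised, so the credit already applied to B is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money as one unit: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) failed on the debit
        return False


print(transfer(conn, "A", "B", 30))   # True
print(transfer(conn, "A", "B", 500))  # False: B's credit is rolled back too
print(dict(conn.execute("SELECT name, balance FROM account ORDER BY name")))  # {'A': 70, 'B': 80}
```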
***Transaction States:
Active State: As discussed in the DBMS transaction introduction, a transaction is a
sequence of operations. While a transaction is in execution, it is said to be in the active state.
Failed State: If a hardware or software failure (e.g., a power failure) occurs while a transaction is
executing, the transaction goes from the active state into the failed state.
Partially Committed State: As we can see in the above diagram, a transaction goes into the
“partially committed” state from the active state after its final operation has been executed.
A transaction contains a number of read and write operations. Once the whole
transaction has executed successfully, it goes into the partially committed state, where
all the read and write operations have been performed on main memory (local memory) instead of
the actual database. This state exists because a transaction can fail during
execution: if the changes were made directly in the actual database instead of local memory,
the database could be left in an inconsistent state after a failure. This state lets us roll back
the changes in case of a failure during execution.
Committed State: If a transaction completes the execution successfully then all the changes
made in the local memory during partially committed state are permanently stored in the
database. You can also see in the above diagram that a transaction goes from partially committed
state to committed state when everything is successful.
Terminated State: This is the last state in the life cycle of a transaction. After entering the
committed state or aborted state, the transaction finally enters into a terminated state where its
life cycle finally comes to an end.
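The life cycle above can be sketched as a small state machine that rejects illegal transitions. Assumption: the Aborted state is only mentioned (not described) in the notes, so the Failed to Aborted to Terminated path follows the usual textbook life cycle; all class and constant names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    TERMINATED = auto()


# Allowed moves; FAILED -> ABORTED is the usual textbook path (assumption).
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: {State.TERMINATED},
    State.ABORTED: {State.TERMINATED},
    State.TERMINATED: set(),
}


class Transaction:
    def __init__(self, name):
        self.name = name
        self.history = [State.ACTIVE]  # every transaction starts in the active state

    @property
    def state(self):
        return self.history[-1]

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: {self.state.name} -> {new_state.name} not allowed")
        self.history.append(new_state)


ok = Transaction("T1")
for s in (State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED):
    ok.move_to(s)

failed = Transaction("T2")
for s in (State.FAILED, State.ABORTED, State.TERMINATED):  # e.g. after a power failure
    failed.move_to(s)

print([s.name for s in ok.history])
print([s.name for s in failed.history])

try:
    Transaction("T3").move_to(State.COMMITTED)  # must pass through partially committed
    illegal_rejected = False
except ValueError:
    illegal_rejected = True
```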
Serial Schedules: In a serial schedule, a transaction is executed completely before the
execution of another transaction starts. In other words, in a serial schedule, a transaction
does not start execution until the currently running transaction has finished execution. This type of
execution of transactions is also known as serial execution.
Example: Consider the following schedule involving two transactions T1 and T2.
This is a serial schedule since the transactions perform serially in the order T1 —> T2
There are two transactions T1 and T2 executing serially one after the other.
In non-serial/parallel schedules, operations of all the transactions are interleaved or mixed with each other.
Consistent
Recoverable
Cascadeless
Strict
FOR DIFFERENT VARIABLES (A, B): No conflict occurs between operations on two different variables.
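Whether an interleaved schedule behaves like a serial one can be sketched with a precedence graph: two operations conflict when they belong to different transactions, touch the same variable, and at least one is a write (so operations on different variables never conflict). The schedule is conflict serializable iff the graph is acyclic, and a topological order gives an equivalent serial order. This sketch uses `graphlib`, available in Python 3.9+; the schedules and function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def precedence_graph(schedule):
    """Map each transaction to the set of transactions that must precede it."""
    preds = {t: set() for t, _, _ in schedule}
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            # Conflict: different transactions, same variable, at least one write.
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                preds[tj].add(ti)  # edge ti -> tj
    return preds


def equivalent_serial_order(schedule):
    """A serial order if the schedule is conflict serializable, else None."""
    try:
        return list(TopologicalSorter(precedence_graph(schedule)).static_order())
    except CycleError:
        return None


s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
print(equivalent_serial_order(s1))  # ['T1', 'T2']
print(equivalent_serial_order(s2))  # None: T1 -> T2 on A, T2 -> T1 on B
```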
Deadlock is a critical challenge in database management systems (DBMS) that can cause system
failure and impact overall performance. The potential for deadlocks increases in a multi-user
environment, where multiple transactions compete for shared resources.
A deadlock in DBMS (Database Management System) occurs when two or more transactions are
stuck in a never-ending or indefinite waiting state. Each transaction needs a resource that another
transaction holds, which is referred to as the locking of resources, thus, creating a deadlock
situation.
The figure shows an example where process P1 holds resource R1 and is requesting R2.
Similarly, P2 holds resource R2 and is requesting R1. Both processes are waiting for each
other to release a resource before they can proceed.
Mutual Exclusion: A resource can be allocated to only one process at a time.
If multiple processes request the same resource simultaneously, deadlock can occur because a process
cannot proceed unless it can lock that resource. In the example that we discussed, cars A
and B got stuck in a condition where only one of them could go through the lane, creating a
mutual exclusion condition.
No Preemption: Resources cannot be forcibly taken away from processes; once a process
holds a resource, it keeps it until it has finished using it.
If process P1 is using resource R1 and a higher-priority process P2 requests
R1, then P1 is not stopped and R1 is not allocated to P2.
Circular wait: Processes form a circular chain of resource dependencies, where each process
waits for a resource held by another process, resulting in a never-ending cycle. We discussed this
condition in the example, where vehicles A and B were “circularly” waiting for each other to
release the lane for them to proceed.
In circular wait, two or more processes wait for resources in a circular order.
Example:
1. Deadlock Prevention in DBMS: To prevent deadlocks before they can happen. For
instance, only one car can cross an intersection at a time.
2. Deadlock Avoidance in DBMS: Similar to having a traffic controller managing the flow of
cars, in DBMS, a system monitors potential deadlocks and decides which processes can
proceed to prevent deadlock situations.
3. Deadlock Ignorance in DBMS: Imagine cars driving without specific rules or precautions.
In DBMS, deadlock ignorance means not taking specific measures. It's like allowing cars to
navigate without traffic lights or controllers and dealing with any issues as they arise.
4. Deadlock Detection in DBMS: If there's a traffic accident, it's detected, and authorities
respond. Similarly, in DBMS, deadlock detection identifies when a deadlock has occurred,
and the system takes action to resolve it.
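Deadlock detection is commonly done with a wait-for graph: an edge Ti -> Tj means Ti is waiting for a resource that Tj holds, and a deadlock exists exactly when the graph contains a cycle. A minimal sketch using depth-first search, built from the P1/P2/R1/R2 situation in the figure described above; the helper names are hypothetical, and choosing a victim to abort is left out.

```python
def build_wait_for(holder_of, requests):
    """Edge Ti -> Tj when Ti requests a resource currently held by Tj."""
    return {t: [holder_of[r] for r in wanted if holder_of.get(r) not in (None, t)]
            for t, wanted in requests.items()}


def find_deadlock(wait_for):
    """Return one cycle in the wait-for graph as a list, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    color, path = {}, []

    def dfs(t):
        color[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:  # back edge: u is on the current path
                return path[path.index(u):]
            if state == WHITE:
                cycle = dfs(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None


# P1 holds R1 and wants R2; P2 holds R2 and wants R1.
graph = build_wait_for({"R1": "P1", "R2": "P2"}, {"P1": ["R2"], "P2": ["R1"]})
print(graph)                                        # {'P1': ['P2'], 'P2': ['P1']}
print(find_deadlock(graph))                         # ['P1', 'P2']
print(find_deadlock({"T1": ["T2"], "T2": ["T3"]}))  # None
```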
The LOG is a sequence of log records, recording all the update activities in the database.
<Ti, Xj, V1, V2>: Transaction Ti has performed a write on data item (variable) Xj; V1 is the old value and V2 is the new value.
IT IS OF TWO TYPES:-
Immediate Update | Deferred Update
In an immediate update, the changes are applied directly/immediately to the database. | In a deferred update, the changes are not applied immediately to the database; they are applied only after the transaction commits.
The log file contains both old as well as new values. | The log file contains only new values.
Represented as: <Ti, Xj, V1, V2> | Represented as: <Ti, Xj, V2>
Example (deferred update log records, <Ti, Xj, V2>):
B = B + 50;
W (B)
<T0 Start>
<T0, A, 550>
<T0, B, 950>
<T0 Commit>
<T1 Start>
<T1, C, 1100>
<T1 Commit>
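If a failure occurs after a log record <Ti, Xj, V1, V2> or <Ti, Xj, V2> has been written, then perform:
REDO(Ti): If the log contains both <Ti Start> and <Ti Commit>, re-apply Ti's updates by setting each Xj to its new value V2.
UNDO(Ti): If the log contains <Ti Start> but no <Ti Commit> (immediate update only), restore each Xj to its old value V1. In a deferred update, an uncommitted transaction never changed the database, so no UNDO is needed.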
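The REDO/UNDO rules for immediate update can be sketched as a pass over the log. Assumptions: log records are encoded as tuples, with ("write", Ti, Xj, V1, V2) standing for <Ti, Xj, V1, V2>; the log and the values on disk are hypothetical. This simple scheme undoes uncommitted transactions (scanning backwards) and then redoes committed ones (scanning forwards); for a deferred-update log, only the REDO pass with new values would be needed.

```python
def recover(log, db):
    """Immediate-update recovery: UNDO uncommitted, then REDO committed transactions."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO: scan backwards, restoring old values (V1) of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _ = rec
            db[item] = old
    # REDO: scan forwards, re-applying new values (V2) of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new = rec
            db[item] = new
    return db


# Hypothetical log; ("write", Ti, Xj, V1, V2) stands for <Ti, Xj, V1, V2>.
log = [
    ("start", "T0"),
    ("write", "T0", "A", 1000, 950),
    ("write", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 700, 600),
    # crash here: <T1 Commit> was never written
]
db_after_crash = {"A": 950, "B": 2000, "C": 600}  # B's new value never reached disk
recovered = recover(log, dict(db_after_crash))
print(recovered)  # {'A': 950, 'B': 2050, 'C': 700}
```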
VIEW TABLE: A view is a logical (virtual) table based on one or more original tables (base tables).
Characteristics:
Views are used for security purposes because they provide encapsulation: users see only the columns and rows that the view exposes.
Types of Views:
1. Simple View: A view formed from one base table. It does not store data.
2. Complex View: A view formed from two or more base tables. It does not store
data.
3. Materialized View: Like a physical table that stores query output/intermediate results. It
stores data. It is mainly used in data warehouses.
UPDATE
DELETE
Aggregate functions (SUM(), MIN(), MAX(), COUNT(), AVG()) | Aggregate functions (SUM(), MIN(), MAX(), COUNT(), AVG())
DISTINCT | DISTINCT
GROUP BY | GROUP BY
HAVING | HAVING
UNION or UNION ALL | UNION or UNION ALL
Sub-query in the SELECT and WHERE clause | Sub-query in the SELECT and WHERE clause
Left Join, Outer Join | Left Join, Outer Join
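Simple and complex views can be tried out with Python's built-in `sqlite3` module. SQLite has no materialized views, so this sketch uses `CREATE TABLE ... AS SELECT` as a stand-in snapshot; that is an emulation (it is never refreshed), not a real materialized view. Table names, columns and rows are hypothetical. After the salary update, the views reflect the change because they store no data, while the snapshot keeps the old totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER,
                  dept_id INTEGER REFERENCES dept(dept_id));
INSERT INTO dept VALUES (1, 'CSE'), (2, 'ECE');
INSERT INTO emp VALUES (1, 'Asha', 50000, 1), (2, 'Ravi', 40000, 1), (3, 'Meena', 45000, 2);

-- Simple view: one base table, hides the salary column.
CREATE VIEW emp_public AS SELECT emp_id, name, dept_id FROM emp;

-- Complex view: two base tables, a join and an aggregate with GROUP BY.
CREATE VIEW dept_payroll AS
  SELECT d.dept_name, COUNT(*) AS staff, SUM(e.salary) AS total
  FROM emp e JOIN dept d ON e.dept_id = d.dept_id
  GROUP BY d.dept_name;

-- Stand-in for a materialized view: a table that stores the query output.
CREATE TABLE dept_payroll_snapshot AS SELECT * FROM dept_payroll;
""")

conn.execute("UPDATE emp SET salary = 60000 WHERE emp_id = 1")

print(conn.execute("SELECT * FROM emp_public ORDER BY emp_id").fetchall())
print(conn.execute("SELECT * FROM dept_payroll ORDER BY dept_name").fetchall())
print(conn.execute("SELECT * FROM dept_payroll_snapshot ORDER BY dept_name").fetchall())
```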
Growing Phase: During this phase, a transaction can obtain (acquire) any number of locks as
required but cannot release any. This phase continues until the transaction has acquired all the locks
it needs and requests no more.
Shrinking Phase: Once the transaction releases its first lock, the Shrinking phase starts. During
this phase, the transaction can release but not acquire any more locks.
Lock Point: The point at which the transaction acquires its last lock, i.e., the exact moment it switches from the Growing phase to the
Shrinking phase (just before it releases its first lock), is termed the lock point.
The primary purpose of the Two-Phase Locking protocol is to ensure conflict serializability, as
the protocol ensures a transaction does not interfere with others in ways that produce inconsistent
results.
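The two-phase rule itself can be sketched as a per-transaction check: once any lock is released, further lock requests are refused. This is a hypothetical class that enforces only the phase rule; it is not a lock manager (no shared/exclusive modes, no waiting between transactions).

```python
class TwoPhaseTransaction:
    """Enforces the two-phase rule for one transaction (no real lock manager)."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # False means the transaction is still growing

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item!r} after releasing a lock")
        self.held.add(item)

    def unlock(self, item):
        if item not in self.held:
            raise ValueError(f"{self.name} does not hold {item!r}")
        self.shrinking = True  # first release: the lock point has been passed
        self.held.remove(item)


t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")    # growing phase
t1.unlock("A")  # shrinking phase starts here
try:
    t1.lock("C")  # violates 2PL
    violation_caught = False
except RuntimeError as err:
    violation_caught = True
    print(err)  # T1: cannot lock 'C' after releasing a lock
```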
Is hardware independent.
Gives users a simple interface to open, read/write records, and close files.
Data Fragmentation: Data fragmentation involves breaking down a database into smaller, more
manageable pieces or fragments. There are several types of data fragmentation:
Horizontal Fragmentation:
Vertical Fragmentation:
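The two fragmentation types can be sketched on a hypothetical employee relation: a horizontal fragment keeps the rows that satisfy a selection predicate, and a vertical fragment keeps some columns plus the key. The original relation is rebuilt by a UNION of horizontal fragments and a JOIN of vertical fragments on the key. Names and rows are illustrative assumptions.

```python
# Hypothetical employee relation.
employees = [
    {"emp_id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"emp_id": 2, "name": "Ravi", "city": "Mumbai", "salary": 40000},
    {"emp_id": 3, "name": "Meena", "city": "Delhi", "salary": 45000},
]


def horizontal(rows, predicate):
    """Horizontal fragment: the rows that satisfy a selection predicate."""
    return [r for r in rows if predicate(r)]


def vertical(rows, cols, key="emp_id"):
    """Vertical fragment: some columns, always keeping the key for rejoining."""
    return [{key: r[key], **{c: r[c] for c in cols}} for r in rows]


delhi = horizontal(employees, lambda r: r["city"] == "Delhi")
others = horizontal(employees, lambda r: r["city"] != "Delhi")
personal = vertical(employees, ["name", "city"])
payroll = vertical(employees, ["salary"])

# Reconstruction: UNION of horizontal fragments, JOIN of vertical fragments on the key.
rebuilt_h = sorted(delhi + others, key=lambda r: r["emp_id"])
by_id = {r["emp_id"]: dict(r) for r in personal}
for r in payroll:
    by_id[r["emp_id"]].update(r)
rebuilt_v = [by_id[k] for k in sorted(by_id)]

print(len(delhi), len(others))                         # 2 1
print(rebuilt_h == employees, rebuilt_v == employees)  # True True
```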
Data Replication: Data replication involves creating and maintaining copies of the same data on
multiple nodes in a distributed database system. Replication provides benefits such as increased
availability and fault tolerance. There are different types of data replication:
Full Replication:
All nodes in the distributed system have a complete copy of the entire database.
Partial Replication:
Selective Replication:
Only specific portions of the database are replicated based on certain criteria.
Advantages:
High Availability: Redundant copies ensure system remains accessible despite node failures.
Improved Performance: Parallel processing and data localization enhance query response
times.
Data Security: Redundancy and replication enhance data security and resilience.
Disadvantages:
Costs: Initial setup and maintenance costs can be higher due to complexity.
Refer Assignment 2
Set 1: Ques 1, 2, 3, 4, 5
Set 2: Ques 4, 5
Set 3: Ques 1, 2, 3
Set 4: Ques 1, 2, 4
Transaction States
ACID properties
Deadlock handling
Distributed Database