DBMS Unit 3 Notes by MultiAtomsPlus (1)
Unit-3
Syllabus
Topics Covered
What is Normalization?
It is a systematic process in database design that organizes data into multiple tables to reduce redundancy and improve data integrity.
It involves dividing large tables into smaller, well-structured tables while maintaining
relationships between them. This process follows a set of rules, called normal forms, that lead to an efficient design free of insertion, update, and deletion anomalies.
Importance of Normalization
Eliminates Redundancy
Improves Data Integrity
Facilitates Scalability
Normalized databases are easier to manage and scale as the amount of data grows.
Example
Unnormalized Table:
Problems:
Table 3 (Instructors):
Table 4 (Enrollments):
A Functional Dependency (FD) is a rule stating that the value of one column (or set of columns) in a table determines the value of another column: any two rows that agree on the first must also agree on the second.
Denoted as X → Y, where:
X (Determinant): The column(s) whose values decide another column.
Y (Dependent): The column whose value depends on X.
StudentID → Name: If you know the StudentID, you can find the Name.
Course → Name: This does not hold, because the same course can be taken by multiple students with different names.
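To make the "agree on X, agree on Y" reading concrete, here is a minimal Python sketch that tests an FD against the rows of one table. The function name `fd_holds` and the sample rows are hypothetical, not part of these notes. Keep in mind that sample data can only show that an FD is violated; whether an FD should hold is decided by the meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in `rows` (a list of dicts).

    The FD holds when any two rows that agree on every lhs column
    also agree on every rhs column.
    """
    seen = {}  # lhs value tuple -> rhs value tuple first paired with it
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant, different dependent values
    return True


# Hypothetical sample rows, for illustration only.
enrollments = [
    {"StudentID": 1, "Name": "Asha", "Course": "DBMS"},
    {"StudentID": 1, "Name": "Asha", "Course": "OS"},
    {"StudentID": 2, "Name": "Ravi", "Course": "DBMS"},
]

print(fd_holds(enrollments, ["StudentID"], ["Name"]))  # True
print(fd_holds(enrollments, ["Course"], ["Name"]))     # False: DBMS maps to Asha and Ravi
```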
Armstrong's axioms are inference rules for deriving every functional dependency (FD) implied by a given set of FDs. There are three main axioms:
1. Reflexivity
Rule: If a set of attributes (Y) is a subset of another set of attributes (X), then X → Y
holds true.
Meaning: Any attribute or group of attributes determines itself and any of its subsets.
{Roll_No, Name} → Name: Since Name is a part (subset) of {Roll_No, Name}, this FD is
valid.
Roll_No → Roll_No: Any column always determines itself.
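2. Augmentation
Rule: If X → Y holds, then XZ → YZ also holds for any set of attributes Z.
Meaning: Adding the same attributes to both sides of a valid FD keeps it valid.
3. Transitivity
Rule: If X → Y and Y → Z hold, then X → Z also holds.
Meaning: Dependencies can be chained together.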
{StudentID, Name} → Name: This is trivial because Name is already part of the
determinant {StudentID, Name}.
StudentID → StudentID: This is also trivial.
StudentID → {Hobby, Skill}, but Hobby and Skill do not depend on each other.
{OrderID, ProductID} → Quantity: Quantity depends on the full composite key, not on OrderID or ProductID alone, so the FD is fully functional.
Given FDs:
1. 𝐴 → 𝐵𝐶
2. 𝐵 → 𝐶
3. 𝐴 → 𝐵
4. 𝐴𝐵 → 𝐶
Step 1: Split RHS of FDs
If any FD has multiple attributes on the right-hand side, split it into multiple FDs
with one attribute on the right side.
𝐴 → 𝐵𝐶 becomes:
𝐴→𝐵
𝐴→𝐶
1. 𝐴 → 𝐵
2. 𝐴 → 𝐶
3. 𝐵 → 𝐶
4. 𝐴 → 𝐵
5. 𝐴𝐵 → 𝐶
Updated FDs (after removing the duplicate 𝐴 → 𝐵):
1. 𝐴 → 𝐵
2. 𝐴 → 𝐶
3. 𝐵 → 𝐶
4. 𝐴𝐵 → 𝐶
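Let’s analyze 𝐴𝐵 → 𝐶: 𝐴 → 𝐶 already holds, so 𝐵 is extraneous on the left side. Reducing 𝐴𝐵 → 𝐶 to 𝐴 → 𝐶 gives an FD that is already in the list, so 𝐴𝐵 → 𝐶 is removed. The remaining FDs are: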
1. 𝐴 → 𝐵
2. 𝐴 → 𝐶
3. 𝐵 → 𝐶
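The steps above look like the standard minimal (canonical) cover procedure. The Python sketch below is one way to code it, assuming single-letter attribute names and FDs stored as pairs of frozensets; the helpers `closure`, `minimal_cover` and `show` are hypothetical names. After its Step 2 it reaches the same three FDs as the list above, and its extra Step 3 also drops 𝐴 → 𝐶, which follows from 𝐴 → 𝐵 and 𝐵 → 𝐶 by transitivity.

```python
def closure(attrs, fds):
    """Attribute closure: everything `attrs` determines under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split right-hand sides and drop exact duplicates.
    split = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in split:
                split.append(fd)

    # Step 2: drop a left-hand attribute if the right side still follows without it.
    reduced = []
    for lhs, rhs in split:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, split):
                lhs.discard(a)
        fd = (frozenset(lhs), rhs)
        if fd not in reduced:
            reduced.append(fd)

    # Step 3: drop FDs that the remaining FDs already imply.
    result = list(reduced)
    for fd in list(result):
        others = [g for g in result if g != fd]
        if fd[1] <= closure(fd[0], others):
            result = others
    return result


def show(fds):
    return ["".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in fds]


# Given FDs from the example above.
F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
F = [(frozenset(lhs), frozenset(rhs)) for lhs, rhs in F]
print(show(minimal_cover(F)))  # ['A -> B', 'B -> C']
```

A minimal cover is not unique in general, because the order in which extraneous attributes and redundant FDs are removed can change the result; for this example every order ends with 𝐴 → 𝐵 and 𝐵 → 𝐶.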
Importance of Normalization:
Minimizes Redundancy: Reduces duplicate data.
Improves Data Integrity: Ensures that data is accurate and consistent.
Simplifies Maintenance: Makes it easier to update data.
Optimizes Updates: Each fact is stored once, so an update touches one row instead of many copies; the trade-off is that reading related data together requires joins.
Step-by-Step Explanation of Normal Forms
Prime Attributes
A prime attribute is an attribute that is part of at least one candidate key of a relation.
Candidate Key: A minimal set of attributes that can uniquely identify every tuple (row) in a
relation.
Non-Prime Attributes
A non-prime attribute is an attribute that is not part of any candidate key.
Example
Consider a relation 𝑅(𝐴, 𝐵, 𝐶, 𝐷) with the following Functional Dependencies (FDs):
1. 𝐴𝐵 →𝐶
2. 𝐶 → 𝐷
Step 1: Find the Candidate Key:
𝐴𝐵 is a candidate key because 𝐴𝐵+ = {𝐴, 𝐵, 𝐶, 𝐷} (𝐴𝐵 → 𝐶, then 𝐶 → 𝐷), while neither 𝐴 nor 𝐵 alone determines all attributes.
Step 2: Identify Prime and Non-Prime Attributes:
Prime Attributes: 𝐴, 𝐵 (because they are part of the candidate key 𝐴𝐵).
Non-Prime Attributes: 𝐶, 𝐷 (because they are not part of any candidate key).
Functional Dependencies:
1. 𝐴𝐵 → 𝐶
2. 𝐶 → 𝐷
Tips
1. Always start by calculating closures.
2. Look for minimal sets of attributes that can uniquely identify all attributes.
3. Drop any attribute whose removal still leaves a superkey; a set with nothing left to drop is a candidate key.
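Following these tips, candidate keys can be found by computing closures of attribute sets, smallest first, and keeping only the minimal ones. Below is a brute-force Python sketch (helper names are hypothetical; it is exponential in the number of attributes, so it only suits small textbook relations), applied to the 𝑅(𝐴, 𝐵, 𝐶, 𝐷) example above:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute that `attrs` determines under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute force: try attribute sets from smallest to largest."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# R(A, B, C, D) with AB -> C and C -> D.
R = set("ABCD")
F = [(set("AB"), set("C")), (set("C"), set("D"))]
keys = candidate_keys(R, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['A', 'B']]
print(sorted(prime))              # ['A', 'B']
print(sorted(R - prime))          # ['C', 'D']
```

Called on the AKTU 2022-23 relation R(A, B, C, D, E) with 𝐴𝐵 → 𝐶, 𝐵 → 𝐸, 𝐶 → 𝐷, the same function also returns 𝐴𝐵 as the only candidate key.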
Example
Table: Student_Subject
1. Student Table
Example
Table: Employee
𝐸𝑚𝑝_𝐼𝐷 → 𝐷𝑒𝑝𝑡_𝐼𝐷
𝐷𝑒𝑝𝑡_𝐼𝐷 → 𝐷𝑒𝑝𝑡_𝑁𝑎𝑚𝑒, 𝐷𝑒𝑝𝑡_𝐿𝑜𝑐𝑎𝑡𝑖𝑜𝑛
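Problem: Dept_Name and Dept_Location depend on Emp_ID only indirectly, through Dept_ID (a transitive dependency), so department details are repeated for every employee in the same department.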
1. Employee Table
2. Department Table
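The Employee example appears to illustrate 3NF. Using the usual definition (for every non-trivial FD 𝑋 → 𝐴, either 𝑋 is a superkey or 𝐴 is a prime attribute), a Python sketch can flag the offending FDs. The helper names are hypothetical, and keys are found by brute force:

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == attributes:
                keys.append(cand)
    return keys


def third_nf_violations(attributes, fds):
    """FDs X -> A where X is not a superkey and A is not a prime attribute."""
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # X is a superkey: allowed in 3NF
        for a in sorted(rhs - lhs):  # skip the trivial part of the FD
            if a not in prime:
                violations.append((sorted(lhs), a))
    return violations


R = {"Emp_ID", "Dept_ID", "Dept_Name", "Dept_Location"}
F = [({"Emp_ID"}, {"Dept_ID"}),
     ({"Dept_ID"}, {"Dept_Name", "Dept_Location"})]
for lhs, a in third_nf_violations(R, F):
    print(lhs, "->", a)
# ['Dept_ID'] -> Dept_Location
# ['Dept_ID'] -> Dept_Name
```

Emp_ID → Dept_ID passes because Emp_ID is the key; the two dependencies on Dept_ID fail, which is why the department details move into their own table. Running the same check on the Department table alone reports no violations.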
Key Terms
2. Verify Functional Dependencies.
For each non-trivial dependency 𝑋 → 𝑌, check whether 𝑋 is a superkey.
3. Decompose if Needed.
If 𝑋 is not a superkey, decompose the table into smaller tables until every determinant is a
superkey.
Example
Table: Student
Functional Dependencies:
1. 𝑅𝑜𝑙𝑙_𝑁𝑜 → 𝑆𝑢𝑏𝑗𝑒𝑐𝑡
2. 𝑆𝑢𝑏𝑗𝑒𝑐𝑡 → 𝑇𝑒𝑎𝑐ℎ𝑒𝑟
1. Student_Subject Table
2. Subject_Teacher Table
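The BCNF procedure above can be sketched in Python: find a non-trivial FD whose left side is not a superkey, then split the table on it. This sketch (hypothetical helper names) performs a single split using 𝑋+; a full decomposition would repeat the check on each new table using the FDs projected onto it, which is omitted here.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(attributes, fds):
    """Return the first non-trivial FD whose left side is not a superkey, else None."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD: always allowed
        if not closure(lhs, fds) >= attributes:
            return lhs, rhs
    return None


def split_on(attributes, fds, violation):
    """One BCNF step on X -> Y: R1 = X+ and R2 = R minus (X+ - X)."""
    lhs, _ = violation
    x_plus = closure(lhs, fds) & attributes
    return x_plus, attributes - (x_plus - lhs)


R = {"Roll_No", "Subject", "Teacher"}
F = [({"Roll_No"}, {"Subject"}), ({"Subject"}, {"Teacher"})]
v = bcnf_violation(R, F)
print(sorted(v[0]), "->", sorted(v[1]))  # ['Subject'] -> ['Teacher']
r1, r2 = split_on(R, F, v)
print(sorted(r1), sorted(r2))            # ['Subject', 'Teacher'] ['Roll_No', 'Subject']
```

The two pieces correspond to the Subject_Teacher and Student_Subject tables above.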
AKTU- 2023-24
Normalization reduces redundancy, improves data integrity, avoids anomalies, and makes
databases easier to maintain.
AKTU- 2022-23
Relation R(A, B, C, D, E)
FDs: 𝐴𝐵 → 𝐶, 𝐵 → 𝐸, 𝐶 → 𝐷
Candidate Key: 𝐴𝐵.
Answer:
Prime Attributes: A, B.
Non-Prime Attributes: C, D, E.
1. 𝐴𝐵 → 𝐶
2. 𝐴 → 𝐷𝐸
3. 𝐵 → 𝐹
4. 𝐹 → 𝐺𝐻
5. 𝐷 → 𝐼𝐽
Attributes: 𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹, 𝐺, 𝐻, 𝐼, 𝐽
Step-by-Step Process:
𝐴𝐵 determines 𝐶.
𝐴 determines 𝐷, 𝐸 and 𝐵 determines 𝐹.
𝐹 determines 𝐺, 𝐻 and 𝐷 determines 𝐼, 𝐽.
By combining these, 𝐴𝐵 determines all other attributes (𝐴𝐵+ = {𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹, 𝐺, 𝐻, 𝐼, 𝐽}).
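The closure computation described above can be checked mechanically. A minimal Python sketch, assuming single-letter attribute names (the helper `closure` is a hypothetical name):

```python
def closure(attrs, fds):
    """Repeatedly apply X -> Y whenever X is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of R(A, ..., J).
F = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

print("".join(sorted(closure("AB", F))))  # ABCDEFGHIJ
print("".join(sorted(closure("A", F))))   # ADEIJ
print("".join(sorted(closure("B", F))))   # BFGH
```

Since neither 𝐴+ nor 𝐵+ contains every attribute, 𝐴𝐵 is minimal and therefore a candidate key. The same output shows the partial dependencies: 𝐴 alone already gives 𝐷 and 𝐸, and 𝐵 alone gives 𝐹.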
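Partial Dependencies: 𝐴 → 𝐷𝐸 and 𝐵 → 𝐹, because 𝐷, 𝐸 and 𝐹 depend on only part of the key 𝐴𝐵. 𝐹 → 𝐺𝐻 and 𝐷 → 𝐼𝐽 are transitive dependencies. Removing both kinds gives these relations: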
Attributes: 𝐴, 𝐷, 𝐸.
Key: 𝐴.
Attributes: 𝐵, 𝐹.
Key: 𝐵.
Attributes: 𝐹, 𝐺, 𝐻.
Key: 𝐹.
Attributes: 𝐷, 𝐼, 𝐽.
Key: 𝐷.
Attributes: 𝐴, 𝐵, 𝐶.
Key: 𝐴𝐵.
An Inclusion Dependency is a constraint that specifies that the values in one set of
columns in a table (relation) must appear in another set of columns in another table
(or the same table). It expresses a subset relationship between attributes.
Student Table
Course Table
The course_id column in Students must match a value in the course_id column in Courses.
This is an Inclusion Dependency.
It ensures that every course a student takes (in the Students table) actually exists in the list of
courses (in the Courses table).
Constraint: π_{course_id}(Student) ⊆ π_{course_id}(Course)
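The subset condition can be tested directly on table contents. A Python sketch with hypothetical rows and a hypothetical helper `inclusion_holds`:

```python
def inclusion_holds(child_rows, child_cols, parent_rows, parent_cols):
    """Check that pi_child_cols(child) is a subset of pi_parent_cols(parent)."""
    child_values = {tuple(row[c] for c in child_cols) for row in child_rows}
    parent_values = {tuple(row[c] for c in parent_cols) for row in parent_rows}
    return child_values <= parent_values


# Hypothetical rows; the notes' Student and Course tables are not reproduced here.
courses = [{"course_id": "C1", "title": "DBMS"},
           {"course_id": "C2", "title": "OS"}]
students = [{"student_id": 1, "course_id": "C1"},
            {"student_id": 2, "course_id": "C2"}]

print(inclusion_holds(students, ["course_id"], courses, ["course_id"]))  # True

students.append({"student_id": 3, "course_id": "C9"})  # C9 is not in Courses
print(inclusion_holds(students, ["course_id"], courses, ["course_id"]))  # False
```

In SQL, this particular inclusion dependency is normally enforced with a FOREIGN KEY constraint on Students.course_id, with course_id declared as the key of Courses.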
Why Is It Important?
1. Keeps Data Consistent: It prevents errors, like assigning a student to a course that
doesn’t exist.
2. Maintains Relationships: It keeps the "connections" between tables intact.
3. Avoids Orphaned Records: If a course is removed, the student records that refer to it can be removed as well (or the deletion blocked), keeping the database clean.
When we split a large table (relation 𝑅) into smaller tables (like 𝑅1 and 𝑅2), lossless join
decomposition ensures that no data is lost when we combine (join) these smaller tables back
to reconstruct the original table.
Why is it Important?
If a decomposition is not lossless, joining the smaller tables produces extra (spurious) rows that were never in the original table, so the original data cannot be reconstructed exactly.
A lossless decomposition guarantees that the original data can always be obtained by a
natural join operation on the decomposed tables.
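The idea of rebuilding the table with a natural join can be tried on sample data. The Python sketch below (hypothetical helpers and rows) projects a table onto two column sets and joins them back. A single instance can only show that a decomposition is lossy; showing that it is lossless for all data needs the conditions below.

```python
def project(rows, cols):
    """Projection: keep only `cols` and drop duplicate rows."""
    return {frozenset((c, row[c]) for c in cols) for row in rows}


def natural_join(r1, r2):
    """Natural join of two relations stored as sets of frozenset((column, value)) rows."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[c] == d2[c] for c in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result


def rejoins_exactly(rows, cols1, cols2):
    """True if projecting onto cols1 and cols2 and joining gives back `rows`."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, cols1), project(rows, cols2)) == original


# Hypothetical rows: one teacher (Rao) teaches two subjects.
rows = [
    {"Roll_No": 1, "Subject": "DBMS", "Teacher": "Rao"},
    {"Roll_No": 2, "Subject": "OS", "Teacher": "Rao"},
]

print(rejoins_exactly(rows, ["Roll_No", "Subject"], ["Subject", "Teacher"]))  # True
print(rejoins_exactly(rows, ["Roll_No", "Teacher"], ["Subject", "Teacher"]))  # False

bad = natural_join(project(rows, ["Roll_No", "Teacher"]),
                   project(rows, ["Subject", "Teacher"]))
print(len(bad))  # 4 rows: 2 real + 2 spurious
```

In the lossy split, Teacher is not a key of either piece, so each roll number pairs with both of Rao's subjects: the join does not lose rows, it adds spurious ones.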
1. Union of Attributes:
The combined attributes of 𝑅1 and 𝑅2 must equal all the attributes of 𝑅.
Mathematically:
Attributes(𝑅1) ∪ Attributes(𝑅2) = Attributes(𝑅)
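2. Intersection of Attributes:
The decomposed tables must share at least one attribute.
Mathematically:
Attributes(𝑅1) ∩ Attributes(𝑅2) ≠ ∅
3. Common Attribute Must Be a Key:
The common attribute(s) must be a superkey of 𝑅1 or of 𝑅2.
Mathematically:
(𝑅1 ∩ 𝑅2) → 𝑅1 or (𝑅1 ∩ 𝑅2) → 𝑅2
This ensures that the common attribute(s) can uniquely identify tuples in at least one of the
decomposed tables.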
AKTU- 2022-23
Relation 𝑅 = (𝑉, 𝑊, 𝑋, 𝑌, 𝑍)
Decomposition (i): 𝑅1 = (𝑉, 𝑊, 𝑋), 𝑅2 = (𝑉, 𝑌, 𝑍)
1. Condition 1:
Union of attributes:
𝑅1 ∪ 𝑅2 = (𝑉, 𝑊, 𝑋) ∪ (𝑉, 𝑌, 𝑍) = (𝑉, 𝑊, 𝑋, 𝑌, 𝑍)
✅ Matches 𝑅.
2. Condition 2:
Intersection of attributes:
𝑅1 ∩ 𝑅2 = (𝑉, 𝑊, 𝑋) ∩ (𝑉, 𝑌, 𝑍) = (𝑉)
✅ Not empty.
3. Condition 3:
Common attribute 𝑉 is a key:
In 𝑅1: 𝑉 → 𝑊𝑋 ✅
✅ Lossless join!
Decomposition (ii): 𝑅1 = (𝑉, 𝑊, 𝑋), 𝑅2 = (𝑋, 𝑌, 𝑍)
1. Condition 1:
Union of attributes:
𝑅1 ∪ 𝑅2 = (𝑉, 𝑊, 𝑋) ∪ (𝑋, 𝑌, 𝑍) = (𝑉, 𝑊, 𝑋, 𝑌, 𝑍)
✅ Matches 𝑅.
2. Condition 2:
Intersection of attributes:
𝑅1 ∩ 𝑅2 = (𝑉, 𝑊, 𝑋) ∩ (𝑋, 𝑌, 𝑍) = (𝑋)
✅ Not empty.
3. Condition 3:
Common attribute 𝑋 is not a key in 𝑅1 or 𝑅2:
𝑋 alone doesn’t determine all attributes in 𝑅1 or 𝑅2.
❌ Not a lossless join (lossy).
Final Answer
Decomposition (i) is a lossless join; decomposition (ii) is not.
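The three conditions can be combined into one check. In this Python sketch the union and intersection tests are direct, and the key test computes the closure of the common attributes. Assumption: the notes show only 𝑉 → 𝑊𝑋 from the question's FDs, so that is the only FD supplied here; the helper names are hypothetical.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r, r1, r2, fds):
    """Check the three conditions for splitting R into R1 and R2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:                      # Condition 1: union covers R
        return False
    common = r1 & r2
    if not common:                        # Condition 2: something in common
        return False
    common_plus = closure(common, fds)    # Condition 3: common part is a key of R1 or R2
    return r1 <= common_plus or r2 <= common_plus


# Assumption: only V -> WX (used in the worked answer) is modelled here;
# the question's full list of FDs is not reproduced in these notes.
F = [("V", "WX")]
print(lossless_binary("VWXYZ", "VWX", "VYZ", F))  # True  (decomposition i)
print(lossless_binary("VWXYZ", "VWX", "XYZ", F))  # False (decomposition ii)
```

If the question's full FD set made 𝑋 determine more attributes, the second result could change, so the check is only as good as the FDs passed in.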
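Trivial MVD: 𝑋 ↠ 𝑌 is trivial if 𝑌 is a subset of 𝑋, or if 𝑋 and 𝑌 together make up the whole table (𝑋 ∪ 𝑌 = 𝑅).
Non-Trivial MVD: 𝑋 ↠ 𝑌 is non-trivial if neither condition holds: 𝑌 is not contained in 𝑋, and 𝑋 ∪ 𝑌 leaves out some columns of the table.
Example:
Trivial: 𝑇𝑒𝑎𝑐ℎ𝑒𝑟 ↠ 𝑇𝑒𝑎𝑐ℎ𝑒𝑟, 𝑆𝑢𝑏𝑗𝑒𝑐𝑡 in a table whose only columns are Teacher and Subject (𝑋 ∪ 𝑌 is the whole table).
Non-Trivial: 𝑇𝑒𝑎𝑐ℎ𝑒𝑟 ↠ 𝑆𝑢𝑏𝑗𝑒𝑐𝑡 in a table that also has another column, such as Committee (Subject is not part of Teacher, and Teacher and Subject do not cover every column).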
What is JD?
JD specifies that a relation 𝑅 can be decomposed into smaller relations 𝑅1, 𝑅2, ..., 𝑅𝑛 such
that the original relation can be perfectly reconstructed (lossless join).
How is it denoted?
𝐽𝐷(𝑅1, 𝑅2, ..., 𝑅𝑛), where each 𝑅𝑖 is a subset of the attributes of 𝑅 and 𝑅1 ∪ 𝑅2 ∪ ... ∪ 𝑅𝑛 = 𝑅.
When is JD Trivial?
If any one of the relations 𝑅1, 𝑅2, ..., 𝑅𝑛 is equal to the entire relation 𝑅, the JD is trivial.
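A JD can be tested on sample data by projecting onto each 𝑅𝑖 and joining all the projections back together. A Python sketch with hypothetical helpers and rows, using the same three-way Student/Course/Teacher split as the 5NF example below:

```python
from functools import reduce


def project(rows, cols):
    """Projection onto `cols`, duplicates removed."""
    return {frozenset((c, row[c]) for c in cols) for row in rows}


def natural_join(r1, r2):
    """Natural join of relations stored as sets of frozenset((column, value)) rows."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[c] == d2[c] for c in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result


def jd_holds(rows, schemas):
    """JD(R1, ..., Rn) holds if joining all projections gives back exactly `rows`."""
    original = {frozenset(row.items()) for row in rows}
    joined = reduce(natural_join, [project(rows, cols) for cols in schemas])
    return joined == original


schemas = [["Student", "Course"], ["Course", "Teacher"], ["Student", "Teacher"]]

# Hypothetical rows, for illustration only.
rows = [
    {"Student": "Asha", "Course": "DBMS", "Teacher": "Rao"},
    {"Student": "Asha", "Course": "OS", "Teacher": "Iyer"},
    {"Student": "Ravi", "Course": "DBMS", "Teacher": "Iyer"},
]
print(jd_holds(rows, schemas))  # False: the join adds (Asha, DBMS, Iyer)

rows.append({"Student": "Asha", "Course": "DBMS", "Teacher": "Iyer"})
print(jd_holds(rows, schemas))  # True
```

With the first three rows, joining the three projections creates the extra row (Asha, DBMS, Iyer), so the JD does not hold; once that row is present, the join returns exactly the table.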
Example of JD:
A table is in 4NF if it is in BCNF (Boyce-Codd Normal Form) and, for every non-trivial
Multivalued Dependency (MVD) 𝑋 ↠ 𝑌, 𝑋 is a superkey.
In simple terms, 4NF eliminates redundancy caused by independent multivalued facts.
Example:
Problem:
The Subject and Committee are independent of each other.
𝑇𝑒𝑎𝑐ℎ𝑒𝑟 ↠ 𝑆𝑢𝑏𝑗𝑒𝑐𝑡 and 𝑇𝑒𝑎𝑐ℎ𝑒𝑟 ↠ 𝐶𝑜𝑚𝑚𝑖𝑡𝑡𝑒𝑒 are independent multivalued dependencies.
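An MVD 𝑋 ↠ 𝑌 holds when, for any two rows that agree on 𝑋, the row that takes 𝑋 and 𝑌 from the first and every other column from the second also exists. A Python sketch of that check, with hypothetical rows because the notes' table is not shown:

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on `rows` (a list of dicts that all share the same columns).

    For every pair t1, t2 agreeing on X, the row that takes X and Y from t1
    and every other column from t2 must also be present.
    """
    cols = list(rows[0])
    present = {tuple(row[c] for c in cols) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[c] == t2[c] for c in x):
                t3 = tuple(t1[c] if c in x or c in y else t2[c] for c in cols)
                if t3 not in present:
                    return False
    return True


# Hypothetical Teacher/Subject/Committee rows.
rows = [
    {"Teacher": "Rao", "Subject": "DBMS", "Committee": "Exam"},
    {"Teacher": "Rao", "Subject": "DBMS", "Committee": "Sports"},
    {"Teacher": "Rao", "Subject": "OS", "Committee": "Exam"},
    {"Teacher": "Rao", "Subject": "OS", "Committee": "Sports"},
]
print(mvd_holds(rows, ["Teacher"], ["Subject"]))      # True
print(mvd_holds(rows[:3], ["Teacher"], ["Subject"]))  # False: (Rao, OS, Sports) missing
```

Because every subject is repeated once per committee, the usual 4NF fix is to split such a table into (Teacher, Subject) and (Teacher, Committee).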
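A table is in 5NF if it is in 4NF and cannot be decomposed further into smaller tables without losing data; more precisely, every join dependency that holds on it is implied by its candidate keys.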
5NF deals with Join Dependencies (JD).
Example:
Problem:
This table has a join dependency: The relationship between Student, Course, and Teacher can
be split into three smaller relationships.
Decomposition into 5NF:
Database design typically follows the normalization approach, but there are alternative
methods that may be more appropriate depending on the application's requirements. These
approaches focus on optimizing database performance, reducing redundancy, and ensuring
data integrity, sometimes diverging from strict normalization principles.
1. Denormalization
What it is: Combines multiple tables into one to make data retrieval faster.
When to use: For read-heavy applications where performance is more important than
reducing redundancy.
Example: Instead of separate Customer and Orders tables, combine them into one table
with all details.
Advantage: Faster reads.
Disadvantage: Data redundancy increases.
4. Graph-Based Design
What it is: Stores data as nodes (items) and edges (relationships).
When to use: For highly interconnected data like social networks.
Example: A User node connected to Friends or Posts nodes.
Advantage: Great for relationship-heavy queries.
Disadvantage: Not ideal for standard tabular data.