1. Database System vs File System
Aspect | Database System | File System
Structure | Centralized, structured (tables, relations) | Decentralized, flat files (text, binary)
Redundancy | Minimal (normalization reduces duplication) | High (same data stored in multiple files)
Access Control | Advanced (user roles, permissions) | Limited (OS-level file permissions)
Querying | Supports complex queries (SQL) | No query language (manual programming needed)
Consistency | ACID properties ensure data integrity | No built-in consistency checks
Key Difference:
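A database system (e.g., MySQL, Oracle) provides structured, secure, and efficient data management,
while a file system (e.g., NTFS, FAT32) is simpler but has no built-in notion of relationships between records, offers only OS-level file permissions, and leaves querying and consistency checks to application code.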
2. Data Model Schema & Instances
Term | Definition | Example
Schema | Blueprint of the database (logical design). Defines tables, attributes, relationships. | Student (roll_no, name, course)
Instance | Snapshot of data at a given time (actual stored data). Changes with updates. | (101, "Alice", "CS"), (102, "Bob", "EE")
Relation | Schema remains fixed; instances vary. | Schema: Employee(id, name) → Instance: (1, "John")
Why Important?
Schema ensures structural integrity.
Instance reflects current data state.
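The schema/instance distinction can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module stands in for the DBMS; the helper names schema and instance are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)")

def schema(conn):
    # Column names of Student, read from SQLite's table metadata.
    return [row[1] for row in conn.execute("PRAGMA table_info(Student)")]

def instance(conn):
    # The rows stored right now, i.e. the current instance.
    return conn.execute(
        "SELECT roll_no, name, course FROM Student ORDER BY roll_no"
    ).fetchall()

schema_before = schema(conn)
conn.execute("INSERT INTO Student VALUES (101, 'Alice', 'CS')")
first_instance = instance(conn)
conn.execute("INSERT INTO Student VALUES (102, 'Bob', 'EE')")
second_instance = instance(conn)
schema_after = schema(conn)
```

The two inserts produce two different instances, while the schema read before and after them is the same list of columns.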
3. Data Independence
Type | Definition | Example
Logical Independence | Changes in schema (e.g., adding a column) don’t affect applications. | Adding email to Student table without breaking apps.
Physical Independence | Changes in storage (e.g., switching from HDD to SSD) don’t affect schema. | Moving a database to cloud storage.
Significance:
Enables scalability and flexibility in DBMS.
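Logical independence can be demonstrated with the email example from the table. A minimal sketch, assuming SQLite through Python's sqlite3 module; the function student_names is a hypothetical stand-in for an existing application query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.execute("INSERT INTO Student VALUES (101, 'Alice', 'CS')")

def student_names(conn):
    # "Application" query: it names its columns explicitly,
    # so columns added later do not change its result.
    return conn.execute(
        "SELECT roll_no, name FROM Student ORDER BY roll_no"
    ).fetchall()

before = student_names(conn)
conn.execute("ALTER TABLE Student ADD COLUMN email TEXT")  # schema change
after = student_names(conn)
```

The query returns the same rows before and after the column is added. A query written as SELECT * would see the new column, which is one reason applications usually list the columns they need.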
4. Entity-Relationship (ER) Model
Component | Definition | Notation
Entity | Real-world object (e.g., Student, Course). | Rectangle (Student)
Attribute | Property of an entity (e.g., roll_no, name). | Oval (name)
Relationship | Association between entities (e.g., "Enrolls"). | Diamond (Enrolls)
Cardinality | Constraints (1:1, 1:N, M:N). | M:N (a student can enroll in many courses, and a course can have many students)
Purpose:
Visualizes database design before implementation.
5. Keys in DBMS
Key Type | Definition | Example
Super Key | Any attribute set that uniquely identifies a record. | {roll_no, name} (if roll_no is unique)
Candidate Key | Minimal super key (no redundant attributes). | roll_no or email (if both are unique)
Primary Key | Chosen candidate key for unique identification. | roll_no (selected as main identifier)
Foreign Key | Links two tables (references primary key). | dept_id in Employee references Department.
Why Important?
Ensures data uniqueness and integrity.
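The super key and candidate key definitions can be checked against sample data. This is a sketch only: keys are really a property of the schema (they must hold for every valid instance), so a check on one sample can refute a key but never prove it. The function names and the sample rows are hypothetical.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Minimal super keys of the sample, smallest attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys

students = [
    {"roll_no": 101, "name": "Alice", "email": "alice@example.com"},
    {"roll_no": 102, "name": "Bob", "email": "bob@example.com"},
    {"roll_no": 103, "name": "Alice", "email": "alice2@example.com"},
]
found = candidate_keys(students, ["roll_no", "name", "email"])
```

Here {roll_no, name} is a super key but not a candidate key, because roll_no alone already identifies each row.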
6. Relational Data Model
Concept | Definition | Example
Relation (Table) | 2D structure with rows (tuples) and columns (attributes). | Employee(id, name, salary)
Tuple (Row) | Single record in a table. | (101, "Alice", 50000)
Attribute (Column) | Field representing a property. | name, salary
Integrity Constraints | Rules (e.g., NOT NULL, UNIQUE). | PRIMARY KEY (id)
Advantage:
Simple, flexible, and mathematically sound (relational algebra).
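Integrity constraints are easiest to understand by watching a DBMS reject bad rows. A hedged sketch using SQLite through Python's sqlite3 module; the Department table, its dept_name column, and the helper rejected are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL UNIQUE)"
)
conn.execute("""
    CREATE TABLE Employee (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  INTEGER,
        dept_id INTEGER REFERENCES Department(dept_id)
    )
""")
conn.execute("INSERT INTO Department VALUES (1, 'Engineering')")
conn.execute("INSERT INTO Employee VALUES (101, 'Alice', 50000, 1)")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

duplicate_key = rejected("INSERT INTO Employee VALUES (101, 'Bob', 40000, 1)")    # entity integrity
missing_name = rejected("INSERT INTO Employee VALUES (102, NULL, 40000, 1)")      # NOT NULL
dangling_fk = rejected("INSERT INTO Employee VALUES (103, 'Carol', 45000, 99)")   # referential integrity
```

All three inserts are refused, so the table never holds a duplicate id, a nameless employee, or a reference to a department that does not exist.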
7. SQL (Structured Query Language)
Category | Commands | Purpose
DDL | CREATE, ALTER, DROP | Define/modify database structure.
DML | SELECT, INSERT, UPDATE, DELETE | Manipulate data.
DCL | GRANT, REVOKE | Control access permissions.
Aggregate Functions | COUNT(), SUM(), AVG() | Compute summary statistics.
Example Query:
SELECT name FROM Employee WHERE salary > 30000;
8. Normalization (1NF, 2NF, 3NF, BCNF)
Normal Form | Rule | Example Fix
1NF | Atomic values (no repeating groups). | Split Phone: "123,456" into one row per phone number (123 and 456 stored as separate rows).
2NF | No partial dependency (all non-key attributes depend on full PK). | Move course_name to a separate table if only dependent on course_id.
3NF | No transitive dependency (non-key attributes depend only on PK). | Move dept_location to a separate Department table if it depends on dept_id (not emp_id).
BCNF | Stricter 3NF (every determinant is a candidate key). | Decompose until every determinant is a candidate key (violations usually involve overlapping candidate keys).
Goal: Eliminate redundancy and anomalies.
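Normal-form checks rest on one routine: computing the closure of an attribute set under the functional dependencies. The sketch below applies it to a hypothetical Employee(emp_id, name, dept_id, dept_location) relation; the dependencies are assumptions chosen to match the 3NF example.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Assumed dependencies: emp_id -> name, dept_id and dept_id -> dept_location
fds = [
    (("emp_id",), ("name", "dept_id")),
    (("dept_id",), ("dept_location",)),
]
all_attrs = {"emp_id", "name", "dept_id", "dept_location"}

emp_closure = closure({"emp_id"}, fds)
dept_closure = closure({"dept_id"}, fds)
```

emp_id determines every attribute, so it is a key. dept_id determines dept_location without being a key, which is the transitive dependency that 3NF removes.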
9. Transactions & ACID Properties
Property | Definition | Example
Atomicity | All operations succeed or none do ("all-or-nothing"). | If a bank transfer fails, roll back both debit/credit.
Consistency | Database remains valid before/after transaction. | A + B = Total must hold before/after transfer.
Isolation | Concurrent transactions don’t interfere. | User X shouldn’t see User Y’s uncommitted changes.
Durability | Committed changes persist even after crashes. | Saved data survives power failure.
Use Case: Banking, inventory systems.
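The bank-transfer example can be run to watch atomicity and consistency together. A minimal sketch, assuming SQLite through Python's sqlite3 module; the Account table, its CHECK constraint, and the functions transfer and balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the debit would overdraw src, so the credit was undone too

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Account"))
```

A transfer of 30 from A to B commits. A transfer of 500 fails on the debit and the already-applied credit is rolled back, so A + B stays at 150 in both cases.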
10. Concurrency Control
Technique | How It Works | Example
Locking (2PL) | Shared (read) / Exclusive (write) locks. | Transaction T1 locks row for update; T2 waits.
Timestamping | Orders transactions by timestamp to avoid conflicts. | Older transaction (T1) gets priority over T2.
Validation (Optimistic) | Checks conflicts before commit. | Allow edits but validate at commit time.
Problem Solved: Prevents dirty reads, lost updates, and phantom reads.
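The shared/exclusive rule in the locking row comes down to a small compatibility check. This toy lock table is an illustrative sketch, not a production lock manager: it has no wait queue or deadlock handling, and the class name LockManager is hypothetical.

```python
class LockManager:
    """Toy lock table: shared (S) locks coexist; an exclusive (X) lock excludes all others."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of transactions holding it)

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may keep or upgrade its own lock.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

    def release_all(self, txn):
        """Release every lock of txn at commit or abort (the shrinking phase in strict 2PL)."""
        for item in list(self.locks):
            _, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]
```

With T1 holding an exclusive lock on a row, T2's request is refused until T1 releases. Afterwards several readers can share the row, but none of them can upgrade to exclusive while another reader still holds it.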
11. Distributed Databases
Concept | Definition | Challenge
Data Fragmentation | Splits data across locations (horizontal/vertical). | Ensuring consistency.
Replication | Copies data to multiple nodes for availability. | Synchronizing updates.
CAP Theorem | Trade-off: Consistency, Availability, Partition Tolerance. | Can’t achieve all three simultaneously.
Example: Google Spanner, Cassandra.
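Horizontal and vertical fragmentation differ in how the original relation is rebuilt: by union of rows in the first case, by a join on the key in the second. A small in-memory sketch; the site names, the sample rows, and the function names are all hypothetical.

```python
employees = [
    {"id": 1, "name": "John", "salary": 40000, "site": "Delhi"},
    {"id": 2, "name": "Alice", "salary": 50000, "site": "Mumbai"},
    {"id": 3, "name": "Bob", "salary": 45000, "site": "Delhi"},
]

def horizontal_fragments(rows, attr):
    """Split by rows: each fragment holds the tuples for one value of attr."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attr], []).append(row)
    return fragments

def vertical_fragment(rows, key, attrs):
    """Split by columns: every fragment keeps the key so it can be rejoined."""
    return [{a: row[a] for a in (key, *attrs)} for row in rows]

def rejoin(left, right, key):
    """Rebuild full tuples from two vertical fragments by matching on key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

by_site = horizontal_fragments(employees, "site")
restored_h = sorted(
    (row for frag in by_site.values() for row in frag), key=lambda r: r["id"]
)

public = vertical_fragment(employees, "id", ["name", "site"])
payroll = vertical_fragment(employees, "id", ["salary"])
restored_v = rejoin(public, payroll, "id")
```

Both reconstructions return the original three rows. Dropping the key from a vertical fragment would make the join impossible.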
12. Recovery Techniques
Method | How It Works | Use Case
Log-Based Recovery | Records changes (undo/redo logs). | Crash recovery: undo incomplete transactions, redo committed ones.
Checkpoints | Saves a consistent state to reduce log scans. | Faster recovery after system failure.
Example:
Undo Log: Reverts incomplete transactions.
Redo Log: Reapplies committed changes.
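The undo and redo passes can be sketched over an in-memory log. This is a simplified illustration that assumes strict schedules (no transaction overwrites another's uncommitted data) and ignores checkpoints; the record layout ("write", txn, item, old, new) is hypothetical.

```python
def recover(db, log):
    """Return the database state after redoing committed and undoing uncommitted writes."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = dict(db)
    # Redo pass: scan forward, reapply new values written by committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, _old, new = rec
            db[item] = new
    # Undo pass: scan backward, restore old values written by uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, item, old, _new = rec
            db[item] = old
    return db

log = [
    ("write", "T1", "A", 100, 70),
    ("write", "T1", "B", 50, 80),
    ("commit", "T1"),
    ("write", "T2", "A", 70, 0),  # T2 never committed
]
# Assumed on-disk state at the crash: T2's write reached disk, T1's write to B did not.
disk_at_crash = {"A": 0, "B": 50}
recovered = recover(disk_at_crash, log)
```

After recovery T1's transfer is fully present and T2's partial write is gone, whatever subset of the writes had reached disk.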
UNIT I: Introduction & ER Modeling
Topic | Explanation
Overview | DBMS is a collection of programs that manage data and provide mechanisms for storage, retrieval, and security. It replaces traditional file systems for better data management.
Database System vs File System | File systems store data in a flat file structure, leading to redundancy and inconsistency. DBMS offers centralized control, data abstraction, and reduced redundancy.
Database System Concept and Architecture | DBMS follows a 3-level architecture: internal (storage), conceptual (logical structure), and external (user view). It ensures data abstraction and independence.
Data Model, Schema, and Instances | Data models define how data is structured; schema is the structure definition, while instances are the actual data at a point in time.
Data Independence | It refers to the capacity to change the schema at one level without affecting the schema at the next higher level; the two types are logical and physical independence.
Database Languages and Interfaces | DBMS supports languages like DDL (schema definition), DML (data manipulation), and query languages. Interfaces include GUI, SQL console, and APIs.
DDL | Data Definition Language defines the database schema using commands like CREATE, ALTER, and DROP.
DML | Data Manipulation Language deals with data operations using INSERT, UPDATE, DELETE, and SELECT queries.
Overall Database Structure | It includes components like users, data, hardware, software, and DBMS utilities forming the complete data environment.
ER Model Concepts | ER models represent real-world entities and relationships using entities, attributes, and relationships.
ER Diagram Notation | Standard ER notation includes rectangles (entities), diamonds (relationships), and ovals (attributes).
Mapping Constraints | It defines the number of entities participating in a relationship: one-to-one, one-to-many, or many-to-many.
Keys (Super, Candidate, Primary) | Super key is any combination uniquely identifying rows. Candidate key is a minimal super key. Primary key is the chosen candidate key.
Generalization | It is the process of combining lower-level entities into a higher-level entity.
Aggregation | It is used when a relationship itself acts as an entity and participates in another relationship.
Reduction of ER Diagram to Tables | ER diagrams are mapped to relational tables using standard rules for entities, attributes, and relationships.
Extended ER Model | Extends ER with concepts like specialization, categorization, and inheritance.
Relationship of Higher Degree | Relationships involving more than two entities are higher-degree relationships (ternary, quaternary).
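One of the standard reduction rules is that a many-to-many relationship becomes its own table, keyed by the primary keys of both participating entities. A sketch using SQLite through Python's sqlite3 module, assuming Enrolls is modeled as many-to-many between Student and Course; the Course columns and the helper courses_of are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Course  (course_id INTEGER PRIMARY KEY, title TEXT);
    -- The M:N relationship becomes a table keyed by both participants.
    CREATE TABLE Enrolls (
        roll_no   INTEGER REFERENCES Student(roll_no),
        course_id INTEGER REFERENCES Course(course_id),
        PRIMARY KEY (roll_no, course_id)
    );
    INSERT INTO Student VALUES (101, 'Alice'), (102, 'Bob');
    INSERT INTO Course  VALUES (1, 'Databases'), (2, 'Networks');
    INSERT INTO Enrolls VALUES (101, 1), (101, 2), (102, 1);
""")

def courses_of(roll_no):
    """Titles of the courses a student is enrolled in, via the relationship table."""
    rows = conn.execute(
        "SELECT c.title FROM Enrolls e JOIN Course c ON c.course_id = e.course_id "
        "WHERE e.roll_no = ? ORDER BY c.title",
        (roll_no,),
    )
    return [title for (title,) in rows]
```

Entities map to tables, attributes to columns, and the diamond to a table whose composite primary key prevents the same enrollment from being recorded twice.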
📗 UNIT II: Relational Data Model and SQL
Topic | Explanation
Relational Data Model Concepts | It represents data in tables (relations), where rows are tuples and columns are attributes. Each table has a unique name.
Integrity Constraints | Rules to maintain data accuracy and consistency; the main types are domain, entity, and referential integrity.
Entity Integrity | Ensures primary key in a table is unique and not NULL.
Referential Integrity | Ensures foreign key values in a table match primary key values in the referenced table.
Key Constraints | Keys uniquely identify records; types include super key, candidate key, and primary key.
Domain Constraints | Values in each column must be from a predefined domain (data type and range).
Relational Algebra | A procedural query language using operations like SELECT, PROJECT, UNION, and JOIN.
Relational Calculus | A non-procedural query language; includes Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC).
Tuple and Domain Calculus | TRC uses variables representing tuples; DRC uses variables representing domain values.
Characteristics of SQL | SQL is a declarative language, easy to learn, supports both DDL and DML, and is standardized.
Advantages of SQL | Simple syntax, supports complex queries, embedded in various languages, and is widely used across platforms.
SQL Data Types and Literals | SQL supports various data types like INT, VARCHAR, DATE, along with constants known as literals.
Types of SQL Commands | DDL, DML, DCL (Data Control Language), and TCL (Transaction Control Language).
SQL Operators and Their Procedure | Operators include arithmetic, comparison, logical, and set operators used in query expressions.
Tables, Views, and Indexes | Tables store data, views are virtual tables, and indexes enhance data retrieval speed.
Queries and Subqueries | Queries retrieve data using SELECT; subqueries are nested SELECTs inside another query.
Aggregate Functions | Functions like COUNT, SUM, AVG, MIN, and MAX summarize data.
Insert, Update, Delete | Used to add, modify, or remove records from tables using INSERT, UPDATE, and DELETE commands.
Joins, Unions, Intersections, Minus | Joins combine rows from tables; UNION, INTERSECT, and MINUS perform set operations.
Cursors | A database object used to retrieve and process each row individually from a result set.
Triggers | Procedures that automatically execute in response to certain events on a table.
Procedures in SQL/PLSQL | Stored procedures are precompiled SQL code blocks for repeated execution and encapsulation.
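The relational algebra operators can be sketched directly on lists of dictionaries, which makes the "procedural" nature of the algebra visible: the query is an explicit sequence of operations. The function names, the sample data, and the dept_name attribute are hypothetical.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Projection: keep only attrs, dropping duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # relations are sets, so duplicates are removed
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

employee = [
    {"id": 101, "name": "Alice", "salary": 50000, "dept_id": 1},
    {"id": 102, "name": "Bob", "salary": 25000, "dept_id": 2},
]
department = [
    {"dept_id": 1, "dept_name": "Engineering"},
    {"dept_id": 2, "dept_name": "Sales"},
]

# Algebra form of: SELECT name FROM Employee WHERE salary > 30000;
well_paid = project(select(employee, lambda t: t["salary"] > 30000), ["name"])
joined = natural_join(employee, department)
```

The SQL statement says what result is wanted; the algebra expression says to select first and project second, which is the kind of plan a query optimizer derives from the declarative form.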
📙 UNIT III: Database Design & Normalization
Topic | Explanation
Functional Dependencies | A relationship where one set of attributes determines another. Fundamental for normalization.
Normal Forms | Structured levels (1NF to BCNF) to reduce redundancy and anomalies in relational databases.
First Normal Form (1NF) | Eliminates repeating groups; ensures atomic values in each cell.
Second Normal Form (2NF) | Removes partial dependency; applicable only when the table is in 1NF and has a composite key.
Third Normal Form (3NF) | Removes transitive dependency; all attributes should depend only on the primary key.
BCNF | A stricter version of 3NF where every determinant must be a candidate key/Super Key.
Inclusion Dependence | Specifies that a set of values in one relation must appear in another; supports referential integrity.
Lossless Join Decomposition | Decomposing tables without losing any information or creating spurious tuples.
Normalization using FD, MVD, JD | Uses Functional, Multivalued, and Join Dependencies to normalize databases beyond BCNF.
Alternative Approaches | Includes ER modeling, heuristic-based design, or bottom-up/top-down design approaches.
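For a decomposition into two relations, the lossless-join property has a simple test: the shared attributes must functionally determine all of one of the two parts. A sketch under assumed dependencies for a hypothetical Employee(emp_id, name, dept_id, dept_location) relation; the function names are illustrative.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: (R1 ∩ R2) must determine all of R1 or all of R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

# Assumed dependencies: emp_id -> name, dept_id and dept_id -> dept_location
fds = [
    (("emp_id",), ("name", "dept_id")),
    (("dept_id",), ("dept_location",)),
]

good = is_lossless(
    {"emp_id", "name", "dept_id"}, {"dept_id", "dept_location"}, fds
)
bad = is_lossless(
    {"emp_id", "name", "dept_location"}, {"dept_id", "dept_location"}, fds
)
```

Splitting on dept_id is lossless because dept_id is a key of the second part. Splitting on dept_location is not, since two departments could share a location and the join would then produce spurious tuples.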
📕 UNIT IV: Transaction and Distributed DBMS
Topic | Explanation
Transaction System | A transaction is a logical unit of work that must be executed fully or not at all to maintain consistency.
Testing of Serializability | Checks whether a concurrent schedule is equivalent to some serial execution, typically by building a precedence graph and testing it for cycles.
Serializability of Schedules | A schedule is serializable if its outcome is equivalent to some serial execution of transactions.
Conflict & View Serializable | Conflict serializability asks whether swapping non-conflicting operations can turn the schedule into a serial one; view serializability requires the same reads-from relationships and final writes as a serial schedule.
Recoverability | Ensures a transaction commits only after every transaction whose writes it has read has committed; cascadeless schedules additionally avoid cascading aborts.
Recovery from Transaction Failures | Techniques include log-based recovery, checkpoints, and undo/redo operations.
Log Based Recovery | Maintains a log of changes made by transactions to support recovery during failures.
Checkpoints | A snapshot of the database state used to speed up recovery by reducing log scanning.
Deadlock Handling | Techniques include wait-die, wound-wait, timeout, or using detection and prevention algorithms.
Distributed Data Storage | In distributed DBMS, data is stored across multiple locations to improve availability and performance.
Concurrency Control | Maintains correctness and isolation in distributed or multi-user environments using locking or timestamps.
Directory System | Maintains metadata about data distribution, locations, and schema in distributed databases.
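Conflict serializability is usually tested with a precedence graph: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, then look for a cycle. A minimal sketch; the schedule encoding as (transaction, operation, item) triples and the function names are assumptions.

```python
def precedence_graph(schedule):
    """schedule: list of (txn, op, item), op in {"R", "W"}, in execution order."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting" or (nxt not in state and visit(nxt)):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

serial_like = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
interleaved = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
```

The first schedule yields only the edge T1 → T2 and is serializable. The second yields both T1 → T2 and T2 → T1, so no serial order can reproduce it.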
📒 UNIT V: Concurrency Control Techniques
Topic | Explanation
Concurrency Control | Ensures correct results for concurrent transactions by preserving isolation and consistency.
Locking Techniques | Use shared/exclusive locks; Two-Phase Locking (2PL) is a popular protocol ensuring serializability.
Time Stamping Protocols | Assigns timestamps to transactions for ordering; avoids conflicts by using read/write rules based on timestamps.
Validation Based Protocol | Checks for conflicts only at the end of transactions, suitable for read-heavy workloads.
Multiple Granularity | Allows locks at various levels (database, table, row) to improve concurrency and reduce locking overhead.
Multi Version Schemes | Maintains multiple versions of data items to allow concurrent reads and writes without conflict.
Recovery with Concurrent Transactions | Uses logs and checkpoints to recover transactions while maintaining consistency across concurrent executions.
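The read/write rules of the basic timestamp-ordering protocol fit in a few lines. This is a sketch of the textbook rules only, without Thomas's write rule or restart logic; the class name and the use of small integers as timestamps are assumptions.

```python
class TimestampOrdering:
    """Basic timestamp-ordering checks; a rejected operation means the transaction rolls back."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> largest timestamp that has written it

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            return False  # a younger transaction already overwrote the value
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False  # a younger transaction already read or wrote the item
        self.write_ts[item] = ts
        return True
```

If a transaction with timestamp 2 reads A, a later write to A by the older transaction 1 is rejected, because accepting it would contradict the timestamp order. No locks are taken, so the protocol cannot deadlock, at the cost of rollbacks.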