Database Management Systems (DBMS) Study Guide
Introduction
A Database Management System (DBMS) is software that provides an interface to interact with databases.
It manages data storage, retrieval, and organization while ensuring data integrity, security, and concurrent
access.
Database Fundamentals
Database Models
1. Hierarchical Model: Tree-like structure with parent-child relationships
2. Network Model: Graph structure in which a record can have multiple parents
3. Relational Model: Data stored in tables with relationships via keys
4. Object-Oriented Model: Data represented as objects with attributes and methods
5. NoSQL Models: Document, Key-Value, Column-family, Graph databases
Three-Schema Architecture
1. External Schema: User view of database
2. Conceptual Schema: Logical structure of entire database
3. Internal Schema: Physical storage structure
Data Independence
Logical Data Independence: Changes in conceptual schema don't affect external schemas
Physical Data Independence: Changes in internal schema don't affect conceptual schema
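As a rough illustration of logical data independence, the sketch below uses Python's built-in sqlite3 module. A view plays the role of an external schema, and the base table stands in for the conceptual schema. The table, view, and column names are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Ada', 5200.0), (2, 'Linus', 4800.0);
    -- External schema: a user view that hides the salary column.
    CREATE VIEW employee_directory AS SELECT id, name FROM employee;
""")

before = conn.execute("SELECT * FROM employee_directory ORDER BY id").fetchall()

# Conceptual schema change: add a column to the base table.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")

# The view still returns the same columns and rows.
after = conn.execute("SELECT * FROM employee_directory ORDER BY id").fetchall()
print(before == after)
```

Programs written against employee_directory keep working after the base table gains a column, which is the property the definition describes.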
Relational Database Model
Key Concepts
Relation: Table with rows (tuples) and columns (attributes)
Domain: Set of possible values for an attribute
Cardinality: Number of tuples in a relation
Degree: Number of attributes in a relation
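To make degree and cardinality concrete, here is a small sketch using Python's sqlite3 module. The student table and its rows are invented for illustration. PRAGMA table_info is SQLite-specific; other systems expose column metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, major TEXT);
    INSERT INTO student VALUES
        (1, 'Ana', 'CS'), (2, 'Ben', 'Math'), (3, 'Chen', 'CS'), (4, 'Dee', 'Physics');
""")

# Degree: number of attributes (columns) in the relation.
degree = len(conn.execute("PRAGMA table_info(student)").fetchall())

# Cardinality: number of tuples (rows) in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(degree, cardinality)
```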
Keys
Super Key: Set of attributes that uniquely identifies tuples
Candidate Key: Minimal super key
Primary Key: Chosen candidate key for unique identification
Foreign Key: Attribute set that references a candidate key (usually the primary key) of another relation or of the same relation
Composite Key: Key consisting of multiple attributes
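The difference between a super key and a candidate key can be sketched in plain Python. The helper names and the sample enrollment rows below are hypothetical. Note the limitation: a sample of rows can only show that an attribute set is not a key. Real keys are declared from the meaning of the data, not inferred from one instance.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        projection = tuple(row[a] for a in attrs)
        if projection in seen:
            return False
        seen.add(projection)
    return True

def candidate_keys(rows, attributes):
    """Minimal super keys for this sample, smallest attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys

enrollment = [
    {"student_id": 1, "course": "DB", "email": "a@x.org", "grade": "A"},
    {"student_id": 1, "course": "OS", "email": "a@x.org", "grade": "B"},
    {"student_id": 2, "course": "DB", "email": "b@x.org", "grade": "A"},
    {"student_id": 2, "course": "OS", "email": "b@x.org", "grade": "A"},
]
attributes = ["student_id", "course", "email", "grade"]
keys = candidate_keys(enrollment, attributes)
print(keys)
```

In this sample both (student_id, course) and (course, email) are minimal, so either could be chosen as the primary key. Both are also composite keys.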
Integrity Constraints
1. Entity Integrity: Primary key cannot be null
2. Referential Integrity: A non-null foreign key value must match an existing key value in the referenced relation
3. Domain Integrity: Attribute values must be from specified domain
4. User-defined Integrity: Business rules and constraints
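The four constraint types can be seen in action with a short sqlite3 sketch. The department and employee tables and the age rule are assumptions made for this example. SQLite leaves foreign key enforcement off by default, so the sketch turns it on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER NOT NULL PRIMARY KEY,            -- entity integrity
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER NOT NULL PRIMARY KEY,            -- entity integrity
        dept_id INTEGER REFERENCES department(dept_id),  -- referential integrity
        age     INTEGER CHECK (age BETWEEN 18 AND 70)    -- domain / business rule
    );
    INSERT INTO department VALUES (10, 'Research');
""")

def try_insert(sql, params):
    """Return None on success, or the constraint error message on failure."""
    try:
        with conn:
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

insert = "INSERT INTO employee VALUES (?, ?, ?)"
ok = try_insert(insert, (1, 10, 30))       # valid row
bad_fk = try_insert(insert, (2, 99, 30))   # department 99 does not exist
bad_age = try_insert(insert, (3, 10, 12))  # age outside the allowed domain
dup_pk = try_insert(insert, (1, 10, 40))   # primary key 1 already used

for label, result in [("ok", ok), ("bad_fk", bad_fk), ("bad_age", bad_age), ("dup_pk", dup_pk)]:
    print(label, "->", result)
```

Only the first insert succeeds. The other three are rejected by the DBMS itself, so no application code has to re-check these rules.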
Entity-Relationship (ER) Model
Components
Entity: Real-world object with attributes
Attribute: Property of an entity
Relationship: Association between entities
Cardinality (of a relationship): One-to-One, One-to-Many, Many-to-Many
ER Diagram Symbols
Rectangle: Entity
Oval: Attribute
Diamond: Relationship
Lines: Connections between components
Enhanced ER Features
Generalization/Specialization: Super-class and sub-class relationships
Aggregation: Treating relationship as higher-level entity
Weak Entity: Entity depending on another entity for identification
Normalization
Purpose
Eliminate data redundancy
Prevent update anomalies
Ensure data integrity
Optimize storage space
Normal Forms
First Normal Form (1NF)
Eliminate repeating groups
Each cell contains single atomic value
All entries in column are of same data type
Second Normal Form (2NF)
Must be in 1NF
Eliminate partial dependencies
All non-key attributes fully dependent on the whole primary key, not on part of a composite key
Third Normal Form (3NF)
Must be in 2NF
Eliminate transitive dependencies
Non-key attributes depend directly on the key, not on other non-key attributes
Boyce-Codd Normal Form (BCNF)
Stronger version of 3NF
For every non-trivial functional dependency, the determinant is a super key
Fourth Normal Form (4NF)
Must be in BCNF
Eliminate non-trivial multi-valued dependencies whose determinant is not a super key
Fifth Normal Form (5NF)
Must be in 4NF
Eliminate join dependencies that are not implied by the candidate keys
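Checking a normal form usually comes down to computing attribute closures under a set of functional dependencies. The sketch below is one simple way to do that in Python and to list BCNF violations. The function names, the relation, and its dependencies are hypothetical teaching examples.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs whose left side is not a super key of relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not relation <= closure(lhs, fds)
    ]

# Assumed example: each instructor teaches one course, and a student
# has one instructor per course.
relation = {"student", "course", "instructor"}
fds = [
    ({"student", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]

print(closure({"student", "course"}, fds) == relation)  # True: a super key
violations = bcnf_violations(relation, fds)
print(violations)
```

This relation is in 3NF but not in BCNF, because instructor determines course without being a super key. Splitting it into (instructor, course) and (student, instructor) removes the violation, at the cost of no longer enforcing the first dependency within a single table.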
Denormalization
Intentionally introducing redundancy for performance
Trade-off: faster reads in exchange for extra storage and more complex updates
SQL (Structured Query Language)
DDL (Data Definition Language)
CREATE: Create database objects
ALTER: Modify database objects
DROP: Delete database objects
TRUNCATE: Remove all records from table while keeping the table definition
DML (Data Manipulation Language)
SELECT: Retrieve data
INSERT: Add new records
UPDATE: Modify existing records
DELETE: Remove records
DCL (Data Control Language)
GRANT: Give privileges to users
REVOKE: Remove privileges from users
TCL (Transaction Control Language)
COMMIT: Save transaction changes
ROLLBACK: Undo transaction changes
SAVEPOINT: Set recovery point within transaction
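The three TCL commands can be tried out with Python's sqlite3 module. This is a minimal sketch: the audit_log table and savepoint name are made up, and the connection is opened with isolation_level=None so that the module does not start transactions on its own.

```python
import sqlite3

# isolation_level=None selects autocommit mode, so the BEGIN, SAVEPOINT,
# and COMMIT statements below are issued exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE audit_log (step TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO audit_log VALUES ('first')")
conn.execute("SAVEPOINT before_second")
conn.execute("INSERT INTO audit_log VALUES ('second')")
conn.execute("ROLLBACK TO SAVEPOINT before_second")  # undoes only 'second'
conn.execute("INSERT INTO audit_log VALUES ('third')")
conn.execute("COMMIT")                               # saves 'first' and 'third'

conn.execute("BEGIN")
conn.execute("INSERT INTO audit_log VALUES ('discarded')")
conn.execute("ROLLBACK")                             # undoes the whole transaction

steps = [row[0] for row in conn.execute("SELECT step FROM audit_log ORDER BY rowid")]
print(steps)  # ['first', 'third']
```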
Advanced SQL Features
Joins: Inner, Left Outer, Right Outer, Full Outer, Cross
Subqueries: Nested SELECT statements
Views: Virtual tables based on queries
Stored Procedures: Named routines of SQL code stored and executed inside the database
Triggers: Automatic execution on database events
Indexes: Improve query performance
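A few of these features can be compared side by side in a short sqlite3 sketch. The customer and orders tables are invented for the example. Only inner and left outer joins are shown, because right and full outer joins are not available in older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.0);
    -- View: a stored query that can be used like a table.
    CREATE VIEW customer_totals AS
        SELECT c.name AS name, COALESCE(SUM(o.total), 0) AS spent
        FROM customer c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id;
""")

# Inner join: only customers that have at least one order.
inner = conn.execute("""
    SELECT c.name, o.id FROM customer c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# Left outer join: every customer, with NULL where no order matches.
left = conn.execute("""
    SELECT c.name, o.id FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# Subquery: customers whose id never appears in orders.
no_orders = conn.execute("""
    SELECT name FROM customer
    WHERE id NOT IN (SELECT customer_id FROM orders)
""").fetchall()

totals = conn.execute("SELECT name, spent FROM customer_totals ORDER BY name").fetchall()
print(inner, left, no_orders, totals, sep="\n")
```

The left join returns one more row than the inner join: the customer with no orders, paired with None.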
Transaction Management
ACID Properties
1. Atomicity: Transaction is all-or-nothing
2. Consistency: Database remains in valid state
3. Isolation: Concurrent transactions don't interfere
4. Durability: Committed changes are permanent
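Atomicity and consistency are easiest to see in a money transfer. The sketch below uses sqlite3 with a hypothetical account table. The connection's context manager commits if the block succeeds and rolls back if an exception escapes. Durability is not demonstrated, since the database lives in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- consistency rule
    );
    INSERT INTO account VALUES (1, 100), (2, 50);
""")

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

first = transfer(conn, 1, 2, 30)    # succeeds
second = transfer(conn, 1, 2, 500)  # debit would go negative, so it is rolled back
final = balances(conn)
print(first, second, final)
```

In the failed transfer the credit to account 2 had already been applied when the debit violated the CHECK constraint. The rollback removes it, so the total across both accounts stays at 150.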
Transaction States
Active: Transaction is executing
Partially Committed: Final statement executed, but changes not yet made permanent
Committed: Transaction successfully completed
Failed: Transaction cannot continue
Aborted: Transaction rolled back
Concurrency Control
Lost Update Problem: One transaction's update overwrites another's
Dirty Read Problem: Reading uncommitted data
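Unrepeatable Read: Re-reading the same row within one transaction returns a different value because another transaction changed it
Phantom Read: Repeating a query within one transaction returns new rows inserted by another transaction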
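The lost update problem can be reproduced without a database by fixing the order in which two transactions read and write. The sketch below is a deterministic simulation in plain Python. The stock counter and the two decrements are invented for the example.

```python
def run_interleaved():
    """Both transactions read before either writes, so one update is lost."""
    db = {"stock": 10}
    t1_read = db["stock"]      # T1 reads 10
    t2_read = db["stock"]      # T2 reads 10
    db["stock"] = t1_read - 3  # T1 writes 7
    db["stock"] = t2_read - 4  # T2 writes 6, overwriting T1's update
    return db["stock"]

def run_serial():
    """T1 finishes completely before T2 starts."""
    db = {"stock": 10}
    db["stock"] = db["stock"] - 3  # T1: 10 -> 7
    db["stock"] = db["stock"] - 4  # T2: 7 -> 3
    return db["stock"]

print(run_interleaved())  # 6
print(run_serial())       # 3
```

The interleaved schedule ends with 6 instead of 3 because T2 based its write on a value that T1 had already replaced.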
Locking Mechanisms
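Shared Lock: Multiple transactions can read the item, but none can write it
Exclusive Lock: One transaction can read and write the item; all others must wait
Two-Phase Locking: Locks are acquired in a growing phase and released in a shrinking phase; no lock is acquired after the first release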
Deadlock: Circular wait for resources
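One common way to detect a deadlock is to build a wait-for graph and look for a cycle. The sketch below does this with a depth-first search in Python. The function name and the transaction labels are hypothetical, and real systems may instead use timeouts or prevention schemes.

```python
def find_deadlock(wait_for):
    """Return the transactions on a cycle in the wait-for graph, or None.

    wait_for maps each transaction to the transactions whose locks it waits on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {txn: WHITE for txn in wait_for}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:  # back edge: circular wait found
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None

deadlocked = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}
healthy = {"T1": ["T2"], "T2": ["T3"], "T3": []}

print(find_deadlock(deadlocked))  # ['T1', 'T2', 'T3']
print(find_deadlock(healthy))     # None
```

Once a cycle is found, a DBMS typically aborts one transaction on it (the victim) so the others can proceed.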
Isolation Levels
1. Read Uncommitted: Lowest isolation; dirty reads, unrepeatable reads, and phantom reads are all possible
2. Read Committed: Prevents dirty reads
3. Repeatable Read: Prevents dirty and unrepeatable reads; phantom reads remain possible
4. Serializable: Prevents dirty reads, unrepeatable reads, and phantom reads; the result matches some serial execution
Database Design Process
Requirements Analysis
Identify user requirements
Define functional and non-functional requirements
Understand business rules and constraints
Conceptual Design
Create ER model
Identify entities, attributes, and relationships
Define cardinalities and constraints
Logical Design
Convert ER model to relational schema
Apply normalization principles
Define integrity constraints
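As an example of converting an ER model to a relational schema, the sketch below maps a many-to-many relationship between two entities to a junction table. The student, course, and enrollment names are hypothetical, and sqlite3 is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- Each entity becomes a relation with its own primary key.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The many-to-many relationship becomes its own relation.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        grade      TEXT,                      -- attribute of the relationship
        PRIMARY KEY (student_id, course_id)   -- composite key
    );
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10, 'A'), (1, 20, 'B'), (2, 10, NULL);
""")

roster = conn.execute("""
    SELECT s.name, c.title
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()

# The composite primary key rejects a second enrollment in the same course.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10, 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(roster, duplicate_rejected)
```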
Physical Design
Choose storage structures
Design indexes for performance
Optimize for specific DBMS
Database Security
Access Control
Authentication: Verify user identity
Authorization: Control user permissions
Role-based Access Control: Assign permissions to roles
Security Threats
SQL Injection: Attacker-supplied input is executed as part of a SQL statement
Data Breach: Unauthorized access to sensitive data
Privilege Escalation: Gaining unauthorized privileges
Security Measures
Input validation and parameterized queries
Encryption of sensitive data
Regular security audits
Backup and recovery procedures
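The SQL injection threat and its usual countermeasure can be shown in a few lines of sqlite3. The users table, the two lookup functions, and the payload are all invented for this sketch. The unsafe version is included only to show what to avoid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, secret TEXT);
    INSERT INTO users VALUES ('alice', 'a-secret'), ('bob', 'b-secret');
""")

def find_unsafe(name):
    # Vulnerable: user input is pasted into the SQL text.
    sql = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(sql).fetchall()

def find_safe(name):
    # Parameterized: the value is bound separately and never parsed as SQL.
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

payload = "nobody' OR '1'='1"
leaked = find_unsafe(payload)  # the WHERE clause becomes always true
blocked = find_safe(payload)   # no user has this literal name

print(len(leaked), blocked)
```

With string concatenation the payload rewrites the query and returns every row. With a bound parameter the same input is treated as an ordinary string and matches nothing.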
NoSQL Databases
Types
1. Document Stores: MongoDB, CouchDB
2. Key-Value Stores: Redis, DynamoDB
3. Column-Family: Cassandra, HBase
4. Graph Databases: Neo4j, Amazon Neptune
CAP Theorem
Consistency: All nodes see same data simultaneously
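Availability: Every request to a non-failing node receives a response
Partition Tolerance: System continues despite network failures
During a network partition, a system must give up either consistency or availability; all three cannot be guaranteed at once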
When to Use NoSQL
Large-scale applications
Flexible schema requirements
Horizontal scaling needs
Real-time web applications
Database Performance
Indexing
B-Tree Indexes: Balanced tree structure
Hash Indexes: Fast equality lookups
Bitmap Indexes: Efficient for low cardinality data
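Clustered vs Non-clustered Indexes: A clustered index determines the physical order of rows; a non-clustered index is a separate structure that points to rows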
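The practical difference between a hash index and an ordered index such as a B-tree is which lookups they support. The sketch below imitates both in plain Python. A dict stands in for the hash index, and a sorted list searched with bisect stands in for the ordered leaf level of a B-tree. It is a simplification: a real B-tree is a multi-level balanced structure.

```python
import bisect

rows = [(17, "q"), (3, "a"), (42, "z"), (8, "c"), (23, "m")]  # (key, payload)

# Hash index: key -> row position. Supports equality lookups only.
hash_index = {key: pos for pos, (key, _) in enumerate(rows)}

# Ordered index: sorted (key, row position) pairs.
ordered_index = sorted((key, pos) for pos, (key, _) in enumerate(rows))
ordered_keys = [key for key, _ in ordered_index]

def lookup_equal(key):
    """Equality lookup through the hash index."""
    pos = hash_index.get(key)
    return None if pos is None else rows[pos]

def lookup_range(low, high):
    """Rows with low <= key <= high, found by binary search on sorted keys."""
    start = bisect.bisect_left(ordered_keys, low)
    end = bisect.bisect_right(ordered_keys, high)
    return [rows[pos] for _, pos in ordered_index[start:end]]

print(lookup_equal(42))     # (42, 'z')
print(lookup_range(8, 23))  # [(8, 'c'), (17, 'q'), (23, 'm')]
```

The hash index cannot answer the range query without scanning every key, which is why ordered indexes are the usual default.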
Query Optimization
Cost-based Optimization: Choose lowest cost execution plan
Rule-based Optimization: Apply predefined rules
Query Execution Plans: Sequence of operations the DBMS chooses to run a query, often viewable as text or a diagram
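Most systems let you inspect the plan the optimizer picked. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module to compare the plan for one query before and after an index is created. The users table and index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.org", f"User {i}") for i in range(1000)],
)

def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

query = "SELECT name FROM users WHERE email = ?"
arg = ("user42@example.org",)

before = plan(query, arg)  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, arg)   # expected to mention idx_users_email

print(before)
print(after)
```

Comparing the two plans is the basic loop of index tuning: find a scan on a frequently filtered column, add an index, and confirm that the plan now uses it.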
Performance Tuning
Analyze query execution plans
Create appropriate indexes
Optimize database design
Monitor system resources
Backup and Recovery
Backup Types
Full Backup: Complete database copy
Incremental Backup: Changes since last backup of any type
Differential Backup: Changes since last full backup
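The three backup types differ only in the cutoff used to decide what counts as changed. The sketch below expresses that in plain Python. The function, the change log, and the integer timestamps are invented for illustration and do not correspond to any particular backup tool.

```python
def changes_to_back_up(kind, change_log, last_full, last_backup):
    """Select the items a backup of the given kind would copy.

    change_log maps item name -> time of last modification.
    last_full is the time of the last full backup; last_backup is the
    time of the most recent backup of any kind.
    """
    if kind == "full":
        return sorted(change_log)  # everything, changed or not
    if kind == "differential":
        cutoff = last_full
    elif kind == "incremental":
        cutoff = last_backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, t in change_log.items() if t > cutoff)

# Assumed timeline: full backup at t=10, incremental backup at t=15.
change_log = {"orders": 5, "customers": 12, "invoices": 18, "products": 2}

full = changes_to_back_up("full", change_log, last_full=10, last_backup=15)
differential = changes_to_back_up("differential", change_log, last_full=10, last_backup=15)
incremental = changes_to_back_up("incremental", change_log, last_full=10, last_backup=15)

print(full)          # ['customers', 'invoices', 'orders', 'products']
print(differential)  # ['customers', 'invoices']
print(incremental)   # ['invoices']
```

Incremental backups are smaller, but a restore needs the full backup plus every incremental taken since. A differential restore needs only the full backup and the latest differential.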
Recovery Techniques
Log-based Recovery: Use transaction logs
Checkpoint-based Recovery: Periodically flush changes to disk so recovery only processes the log written after the last checkpoint
Shadow Paging: Maintain current and shadow pages
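Log-based recovery can be sketched as an undo pass followed by a redo pass over the transaction log. The toy version below is plain Python. The record layout, the function name, and the sample data are assumptions for this example, and it presumes transactions ran under strict locking. Production algorithms also handle checkpoints and log sequence numbers.

```python
def recover(disk, log):
    """Rebuild a consistent state from on-disk values and a transaction log.

    Log records are ("write", txn, key, old_value, new_value) or ("commit", txn).
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(disk)

    # Undo pass: scan backwards, restoring old values of uncommitted writes.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, key, old, _ = rec
            state[key] = old

    # Redo pass: scan forwards, reapplying writes of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, key, _, new = rec
            state[key] = new

    return state

# State at the crash: T1 committed but its write to A never reached disk;
# T2 did not commit but its write to C did reach disk.
disk = {"A": 100, "B": 80, "C": 7}
log = [
    ("write", "T1", "A", 100, 70),
    ("write", "T1", "B", 50, 80),
    ("commit", "T1"),
    ("write", "T2", "C", 1, 7),
]

recovered = recover(disk, log)
print(recovered)  # {'A': 70, 'B': 80, 'C': 1}
```

After recovery the effects of T1 are fully present and those of T2 are fully removed, which restores atomicity and durability after the crash.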
Disaster Recovery
Offsite backups
Replication strategies
Recovery time objective (RTO) and recovery point objective (RPO)
Conclusion
DBMS knowledge is essential for designing efficient, secure, and scalable database systems.
Understanding relational theory, normalization, SQL, transaction management, and modern NoSQL
alternatives provides a comprehensive foundation for database professionals. Regular practice with
different database systems and staying updated with emerging technologies will enhance your database
management skills.