DECAP200_Database Management System
The unit discusses the drawbacks of traditional file processing systems, such as data redundancy,
difficulty in accessing data, and lack of data independence. In contrast, DBMSs offer several advantages,
including reducing redundancy, ensuring data independence, and improving data integrity. A key feature of
DBMSs is their architecture, which includes hardware, software, data, procedures, and database access
language.
Furthermore, the unit introduces the concept of data models in DBMS, such as the hierarchical, network,
entity-relationship, and relational models, emphasizing the relational model as the most common. It also
covers the concept of data independence—both logical and physical—and explains constraints in DBMS,
such as NOT NULL, UNIQUE, PRIMARY KEY, and FOREIGN KEY, which are used to enforce data
integrity.
The unit concludes with an overview of DBMS components and their roles, including database administrators
(DBA), developers, and end-users, and emphasizes the importance of understanding these concepts for
effective database management and design.
1. What is a Database? Explain the Key Concepts Associated with It
A database is a structured collection of data that is stored and managed in a computer system. It is
designed to efficiently store, retrieve, and manipulate data. Databases allow for easy access to vast
amounts of data and help organize information in a way that reduces redundancy and improves the
consistency of data.
Data: Raw facts and figures that can be processed to generate meaningful information.
Database Management System (DBMS): Software that provides an interface for users to interact with
databases and manage data effectively.
Schemas: The structure or blueprint of the database, defining how data is organized.
Tables: Organize data into rows and columns, forming the primary unit of storage in a relational
database.
Queries: Requests made to the database to retrieve or modify data, often using a language like SQL
(Structured Query Language).
Keys: Unique identifiers (like primary keys) used to ensure data integrity and relationships between
tables.
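The following minimal SQL sketch ties these concepts together; all table, column, and key names are
illustrative, not taken from the unit:

CREATE TABLE Department (
    DeptID INT PRIMARY KEY,        -- primary key: uniquely identifies each row
    DeptName VARCHAR(50) NOT NULL
);

CREATE TABLE Employee (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(100),
    DeptID INT,
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)  -- key relating the two tables
);

-- A query retrieves data from the schema:
SELECT Name FROM Employee WHERE DeptID = 10;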
2. List and Explain Various Database System Applications
Database systems are used in a wide variety of applications across industries. Some common
applications include:
Banking: Storing transaction data, account details, and maintaining financial records.
Airlines: Managing flight schedules, reservations, customer data, and ticketing.
Telecommunications: Managing customer data, billing systems, call records, and service plans.
Healthcare: Storing patient records, treatment history, and managing hospital data.
Retail: Tracking inventory, sales data, and customer transactions.
Educational Institutions: Maintaining student records, grades, and class schedules.
These applications utilize databases to store, retrieve, and manipulate data quickly, efficiently, and
securely.
3. What Are the Differences Between File Processing Systems and DBMS?
File processing systems and Database Management Systems (DBMS) are both used to store data, but
they differ in several ways:
Data Redundancy: In file processing systems, each application has its own data storage, leading to
data redundancy. In DBMS, data is centralized, reducing redundancy and inconsistency.
Data Access: File systems require custom application programs to access data, which is time-
consuming and error-prone. DBMS provides standardized interfaces (like SQL) for data access, making
it more efficient.
Data Integrity: File systems do not have built-in mechanisms for maintaining data integrity. DBMS
ensures data consistency through constraints, keys, and ACID properties (Atomicity, Consistency,
Isolation, Durability).
Security: File systems lack centralized security mechanisms, while DBMS provides security features
like user access control and encryption.
Backup and Recovery: File systems require manual intervention for backups. DBMS includes
automated backup and recovery systems to protect data.
4. Write Short Notes on Advantages of Database Management System
Data Integrity: DBMS ensures accuracy and consistency of data through integrity constraints like
primary keys and foreign keys.
Data Redundancy Control: It minimizes duplication by centralizing data storage, ensuring that data is
shared across multiple applications without unnecessary repetition.
Security: DBMS provides robust security mechanisms, such as access control and authentication, to
protect sensitive data.
Efficient Data Access: DBMS allows for fast data retrieval using queries, ensuring efficient use of
resources.
Backup and Recovery: DBMS provides automatic backup systems and recovery mechanisms to
prevent data loss.
Data Independence: DBMS provides abstraction from underlying data structures, making it easier to
change the database schema without affecting the application.
5. Write Short Notes on Disadvantages of Database Management System
Complexity: Implementing and managing a DBMS can be complex, requiring specialized knowledge
and expertise.
Cost: DBMS software can be expensive, and the hardware requirements may be significant.
Performance Overhead: DBMS may introduce performance overhead due to its features like data
integrity checks, which could slow down processing for certain tasks.
Maintenance: Regular maintenance, updates, and patches are required for a DBMS to function
efficiently.
Learning Curve: Database administrators and developers need to be trained in DBMS technologies,
which can take time.
Data independence refers to the ability to change the schema (structure) of a database at one level
without affecting the schema at the higher levels. There are two types of data independence:
Physical Data Independence: This allows changes in the physical storage of data (e.g., changing from
one storage medium to another) without affecting the logical structure or application programs.
Logical Data Independence: This allows changes to the logical schema (e.g., adding new fields or
tables) without affecting the external schema or application programs.
Data independence is important because it reduces the need for redesigning applications when making
changes to the database structure.
There are several types of database languages, which are used to interact with the DBMS:
Data Definition Language (DDL): Defines the structure of the database, including tables, columns,
and relationships (e.g., CREATE, ALTER, DROP).
Data Manipulation Language (DML): Used to retrieve, insert, update, or delete data in the database
(e.g., SELECT, INSERT, UPDATE, DELETE).
Data Control Language (DCL): Controls access to data in the database, including permissions and
security (e.g., GRANT, REVOKE).
Data Query Language (DQL): Focuses on querying data from the database, primarily using the
SELECT statement.
Transaction Control Language (TCL): Manages transactions in the database, ensuring that data
operations are completed successfully or rolled back if necessary (e.g., COMMIT, ROLLBACK).
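As a brief illustration, one representative command from each category is shown below; the table and
user names are hypothetical:

CREATE TABLE Student (StudentID INT PRIMARY KEY, Name VARCHAR(100));  -- DDL
INSERT INTO Student VALUES (1, 'Asha');                               -- DML
SELECT Name FROM Student WHERE StudentID = 1;                         -- DQL
GRANT SELECT ON Student TO report_user;                               -- DCL
COMMIT;                                                               -- TCL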
The Database Administrator (DBA) is responsible for managing and overseeing the operation of the
database system. Their responsibilities include:
Database Design: Designing the database schema, determining the structure of tables, and ensuring
normalization to reduce redundancy.
Performance Monitoring: Monitoring the performance of the DBMS to ensure quick query processing
and addressing any bottlenecks.
Data Security: Ensuring the security of the database by implementing access control mechanisms and
encryption techniques.
Backup and Recovery: Setting up automated backup systems and creating disaster recovery plans to
protect data.
User Management: Managing database users, their roles, and ensuring appropriate permissions.
Maintenance and Upgrades: Performing regular maintenance tasks like updates, patches, and
optimizing the database for better performance.
A data user is an individual or application that interacts with the database to retrieve or modify data.
There are different types of data users:
End Users: These are individuals who directly interact with the database, typically through a user
interface or application. They query the database for information (e.g., customers or employees).
Application Programmers: They design and develop software applications that interact with the
database by writing code that queries or manipulates the database.
Database Administrators (DBAs): As previously mentioned, DBAs manage the database and ensure
its smooth operation.
Data Analysts: These users analyze data, generate reports, and use the data for decision-making
processes.
Internal Level: This is the lowest level that deals with the physical storage of data. It defines how the
data is actually stored in the database and manages data structures like indexes and files.
Conceptual Level: This is the middle level that defines the logical structure of the entire database. It
specifies what data is stored and the relationships among the data, abstracting the complexities of
physical storage.
External Level: This is the highest level that interacts with users and applications. It defines the way
data is presented to the users and how they access and interact with the database.
This architecture ensures separation between the internal storage details and how the data is presented
and accessed by users, supporting data independence.
The unit introduces data modeling, which is a crucial step in developing a database. The Entity-
Relationship (ER) Model is explored as one of the fundamental approaches to designing relational databases.
The ER model helps visualize the database structure through entities, attributes, and relationships. Entities
represent real-world objects (such as employees, departments, or products), attributes define their properties
(like name, ID, or salary), and relationships show how entities are connected (e.g., an employee "works in" a
department).
The unit then discusses constraints in database design, such as key constraints, which ensure unique
identification of records, and participation constraints, which define whether all instances of an entity must
be involved in a relationship. It also covers ER diagrams, which graphically represent database structures.
These diagrams use symbols like rectangles for entities, ovals for attributes, diamonds for relationships, and
lines to connect them.
Another key aspect covered is Extended Entity-Relationship (EER) Modeling, which extends the basic ER
model by incorporating features like generalization, specialization, and aggregation. Generalization
combines multiple entity types into a higher-level entity, whereas specialization creates sub-entities from a
parent entity. Aggregation allows representing relationships between relationships.
The unit concludes by emphasizing the importance of structured database design in ensuring data
consistency, reducing redundancy, and improving database performance. By following a systematic approach,
designers can create databases that are well-organized, scalable, and efficient for real-world applications.
Amazon needs to reorganize its database to effectively track user activities, books, sales, and related
information. Based on the given requirements, the following entities and relationships can be identified:
1. User:
o Attributes: UserID (Primary Key), Name, Password, Email, LastVisitDate, WillingToBeSpammed.
2. Book:
o Attributes: ISBN (Primary Key), Title, AuthorName, PublisherName, Cost.
3. Sale:
o Attributes: SaleID (Primary Key), DateOfSale, UserID (Foreign Key), Address (Street, Number, City,
State, Country, Zip Code), TelephoneNumber, CreditCardNumber.
4. Comment:
o Attributes: CommentID (Primary Key), UserID (Foreign Key), BookID (Foreign Key), Rating (1-10),
CommentContent, HelpfulnessPercentage.
5. WishList:
o Attributes: UserID (Foreign Key), BookID (Foreign Key), DateAdded, PurchasedStatus, BoughtBy (if
purchased by a friend).
Relationships:
1. User and Sale: A user can make multiple purchases (1-to-many relationship).
2. User and Comment: A user can comment on multiple books (1-to-many relationship).
3. User and WishList: A user can have multiple books on their wish-list (1-to-many relationship).
4. Book and Comment: A book can have multiple comments (1-to-many relationship).
5. Book and WishList: A book can appear in multiple wish-lists (1-to-many relationship).
Assumptions:
The Entity Relationship Diagram (ERD) would include these entities and relationships with
appropriate keys (Primary and Foreign) and constraints like ensuring that a user’s email is unique,
comments are associated with valid books, and wish-list items are tracked for purchases.
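A possible DDL sketch for part of this design is shown below; the data types and sizes are assumptions,
while the attribute names follow the entities listed above (Users is used instead of User to avoid a
common reserved word):

CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Name VARCHAR(100),
    Password VARCHAR(64),
    Email VARCHAR(100) UNIQUE,      -- enforces the unique-email constraint
    LastVisitDate DATE,
    WillingToBeSpammed CHAR(1)
);

CREATE TABLE Book (
    ISBN VARCHAR(13) PRIMARY KEY,
    Title VARCHAR(200),
    AuthorName VARCHAR(100),
    PublisherName VARCHAR(100),
    Cost DECIMAL(8, 2)
);

CREATE TABLE Comment (
    CommentID INT PRIMARY KEY,
    UserID INT REFERENCES Users(UserID),     -- comments must come from valid users
    ISBN VARCHAR(13) REFERENCES Book(ISBN),  -- ...and refer to valid books
    Rating INT CHECK (Rating BETWEEN 1 AND 10),
    CommentContent VARCHAR(2000),
    HelpfulnessPercentage INT
);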
1. Customer:
o Attributes: CustomerID (Primary Key), Name, Address, PhoneNumber.
2. Account:
o Attributes: AccountNumber (Primary Key), AccountType (SB, RD, FD), Balance, CustomerID
(Foreign Key).
3. Loan:
o Attributes: LoanID (Primary Key), LoanAmount, LoanType, DateIssued, CustomerID (Foreign Key).
Relationships:
1. Customer and Account: A customer can have one account, but each account is associated with only
one customer (1-to-1 relationship).
2. Customer and Loan: A customer can take out multiple loans, but each loan is linked to one customer
(1-to-many relationship).
Assumptions:
The Entity Relationship Diagram (ERD) would depict these entities, with Customer related to
Account and Loan, enforcing the rules of one account per customer and each loan being tied to a single
customer.
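The 1-to-1 rule between Customer and Account can be enforced with a UNIQUE foreign key, one
common technique; the following sketch uses the entities above with assumed data types:

CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    Address VARCHAR(200),
    PhoneNumber VARCHAR(15)
);

CREATE TABLE Account (
    AccountNumber INT PRIMARY KEY,
    AccountType CHAR(2) CHECK (AccountType IN ('SB', 'RD', 'FD')),
    Balance DECIMAL(12, 2),
    CustomerID INT UNIQUE REFERENCES Customer(CustomerID)  -- UNIQUE makes the relationship 1-to-1
);

CREATE TABLE Loan (
    LoanID INT PRIMARY KEY,
    LoanAmount DECIMAL(12, 2),
    LoanType VARCHAR(20),
    DateIssued DATE,
    CustomerID INT REFERENCES Customer(CustomerID)  -- no UNIQUE here: one customer, many loans
);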
There are several types of users in a DBMS, each with specific roles:
1. End Users: These users directly interact with the database, querying and modifying data for personal or
organizational needs.
2. Database Administrators (DBAs): They manage and maintain the database, ensuring security,
performance, and data integrity. They also handle backups and recovery operations.
3. Application Programmers: These users write applications that interact with the database through SQL
queries, allowing data to be used in business applications.
4. System Analysts: They analyze business requirements and design database systems to meet those
needs, working closely with end users and DBAs.
5. Data Analysts: They use DBMS to generate reports and analyze data to support decision-making
processes.
A weak entity is an entity that cannot be uniquely identified by its own attributes alone and relies on a
strong (or owner) entity for identification. It typically has a partial key that, in combination with the
strong entity, forms the full primary key. For example, in a library system, the entity BookCopy might
be a weak entity because it cannot be uniquely identified without referencing the Book entity, as
multiple copies of the same book exist. The primary key of BookCopy might be a combination of the
BookID (strong entity's ID) and the CopyNumber.
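In SQL, the composite key of a weak entity can be expressed as follows; this is a sketch based on the
BookCopy example above, with assumed data types:

CREATE TABLE Book (
    BookID INT PRIMARY KEY,
    Title VARCHAR(200)
);

CREATE TABLE BookCopy (
    BookID INT REFERENCES Book(BookID),  -- identifying relationship to the strong entity
    CopyNumber INT,                      -- partial key: unique only within one book
    PRIMARY KEY (BookID, CopyNumber)     -- the full key combines both
);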
Specialization and generalization are concepts used to model hierarchical relationships between
entities.
Specialization: This process involves creating subtypes from a supertype based on some distinguishing
characteristics. For example, a Person entity can be specialized into Employee and Customer, where
each subtype has its own attributes.
Generalization: This is the reverse process, where multiple entities are combined into a supertype
based on common features. For example, Employee and Customer entities can be generalized into a
single Person entity that contains common attributes like name and address.
Extended features of Entity-Relationship Diagrams (ERD) include additional components like Weak
Entities, Aggregation, Generalization, and Specialization. These features allow for the representation
of more complex real-world scenarios, improving the abstraction of data relationships. Additionally,
ERDs can include Cardinality (to indicate the number of instances in a relationship) and Participation
Constraints (to show whether an entity’s participation in a relationship is mandatory or optional).
This structure helps in managing large and complex databases while ensuring data independence.
An attribute is a property or characteristic of an entity. For example, for the entity Employee, typical
attributes include EmployeeID, Name, and Salary. Attributes can be categorized as simple or composite,
single-valued or multivalued, stored or derived, and key or non-key attributes.
Entities:
1. Employee:
o Attributes: EmployeeID, Name, Position.
2. Project:
o Attributes: ProjectID, ProjectName, Budget.
Relationships:
1. Employee and Project: An employee is assigned to at least one project. This is a 1-to-many
relationship, as multiple employees can work on the same project, but each employee works on at least
one project.
ER Diagram: The diagram would show Employee and Project entities connected by a relationship.
The Employee entity would have an optional connection to Project (indicating the possibility of no
project if on vacation), with the EmployeeID being a foreign key in the project relationship.
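One way to realize this design in SQL is sketched below; the data types are assumptions, and the
nullable ProjectID column models the optional assignment (employee on vacation) mentioned above:

CREATE TABLE Project (
    ProjectID INT PRIMARY KEY,
    ProjectName VARCHAR(100),
    Budget DECIMAL(12, 2)
);

CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100),
    Position VARCHAR(50),
    ProjectID INT REFERENCES Project(ProjectID)  -- NULL allowed: employee currently unassigned
);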
A significant portion of the unit is dedicated to Relational Calculus, which is a declarative approach to
querying databases. Unlike relational algebra, which specifies the step-by-step operations to retrieve data,
relational calculus defines the desired results based on conditions.
Additionally, the unit discusses Joins, which are essential for combining data from multiple tables. These
include Natural Join, which automatically joins tables based on common attributes, and Conditional Join,
which merges tables based on specified conditions.
The unit also introduces Structured Query Language (SQL) as the standard language for managing
relational databases. It categorizes SQL commands into DDL (Data Definition Language) for schema
creation and modification, DML (Data Manipulation Language) for inserting, updating, and deleting data,
and DCL (Data Control Language) for access control.
Finally, the unit compares the Relational Model with Network and Hierarchical Models, emphasizing the
advantages of relational databases, such as flexibility, scalability, and ease of data retrieval.
Relational algebra is a procedural query language used to query and manipulate relational databases.
It operates on relations (tables) and provides a foundation for relational query languages like SQL.
Relational algebra defines a set of operations that take one or more relations as input and produce a new
relation as output. The main goal of relational algebra is to provide an abstract mechanism for querying
and processing relational data.
1. Select (σ): Selects rows from a relation based on a specified condition. It is similar to the WHERE
clause in SQL.
o Example: σ(age > 25)(Employee) selects employees whose age is greater than 25.
2. Project (π): Projects or selects specific columns from a relation, effectively narrowing down the
number of attributes in the result.
o Example: π(name, age)(Employee) retrieves only the name and age columns from the Employee table.
3. Union (∪): Combines two relations with the same schema, including all distinct rows from both
relations.
o Example: Employee ∪ Manager combines the Employee and Manager relations, eliminating duplicates.
4. Set Difference (−): Returns the rows that are in one relation but not in another.
o Example: Employee − Manager returns employees who are not managers.
5. Cartesian Product (×): Combines every row from one relation with every row from another relation.
o Example: Employee × Department combines every employee with every department.
6. Rename (ρ): Renames a relation or its attributes.
o Example: ρ(D, Department) renames the relation Department to D.
7. Join (⨝): Combines two relations based on a related column, often a foreign key in one relation that
references the primary key in another relation.
o Example: Employee ⨝ Department joins employees with their respective departments.
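Each of these operations has a close SQL counterpart. A rough mapping for the examples above is
sketched below; column names such as DeptID are assumptions, and EXCEPT is called MINUS in some
systems:

SELECT * FROM Employee WHERE age > 25;               -- Select (σ)
SELECT DISTINCT name, age FROM Employee;             -- Project (π); DISTINCT gives set semantics
SELECT * FROM Employee UNION SELECT * FROM Manager;  -- Union (∪); schemas must match
SELECT * FROM Employee EXCEPT SELECT * FROM Manager; -- Set Difference (−)
SELECT * FROM Employee CROSS JOIN Department;        -- Cartesian Product (×)
SELECT * FROM Department AS D;                       -- Rename (ρ)
SELECT * FROM Employee JOIN Department
    ON Employee.DeptID = Department.DeptID;          -- Join (⨝)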
Relational Algebra and Relational Calculus are both query languages used to interact with relational
databases, but they have distinct approaches:
Relational Algebra is a procedural query language, which means that it focuses on describing the
procedure or operations to retrieve the required data. It specifies how to perform operations.
Relational Calculus is a declarative query language, where the user specifies what data they want,
without providing specific steps on how to retrieve it. Relational calculus is more akin to SQL in that
the user focuses on the desired result rather than the process.
While relational algebra operates through set-based operations (like select, project, union, etc.),
relational calculus focuses on logical formulas, typically using quantifiers like for all or there exists.
The expressive power of a query language refers to the range of queries it can express. Despite their
differences in syntax and approach, relational algebra and relational calculus can express the same set of
queries over a relational database, meaning they are equally expressive in terms of computational power.
Select (σ) operation filters rows from a relation based on a given condition. It can be thought of as a
way to extract specific records from a database.
o Example: σ(salary > 50000)(Employee) selects all employees with a salary greater than 50,000.
Project (π) operation selects specific columns from a relation, reducing the result to only the needed
attributes, effectively performing a "vertical" slice of the data.
o Example: π(name, age)(Employee) retrieves only the name and age of employees, removing other
attributes like salary or address.
Together, the select and project operations allow users to refine the dataset by filtering rows and
selecting only necessary attributes, providing a focused and efficient query.
A join is a relational operation that combines records from two or more relations based on a related
attribute, typically a primary key in one table and a foreign key in another.
Conditional Join (θ-join): A join where the relationship between the tables is defined by a condition,
such as equality or any other condition (e.g., greater than or less than).
o Syntax: R ⨝θ S, where θ is a condition, e.g., R.A = S.B.
o Example: Employee ⨝(Employee.DepartmentID = Department.DepartmentID) Department joins
employees with their respective departments.
Natural Join: A type of join that automatically matches columns with the same name in both relations
and joins them on these common attributes.
o Syntax: R ⨝ S (no explicit join condition is written).
o Example: Employee ⨝ Department will automatically join the tables on the common column(s),
such as DepartmentID.
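The two joins above could be written in SQL as follows, assuming both tables share a DepartmentID
column:

-- Conditional (theta) join: the condition is stated explicitly
SELECT * FROM Employee
JOIN Department ON Employee.DepartmentID = Department.DepartmentID;

-- Natural join: matching on the commonly named column is implicit
SELECT * FROM Employee NATURAL JOIN Department;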
Listing Rows: In relational databases, listing rows from a table is done using the SELECT statement.
o Syntax: SELECT column1, column2 FROM table_name WHERE condition;
o Example: SELECT name, salary FROM Employee WHERE age > 30;
Updating Rows: Rows can be updated using the UPDATE statement.
o Syntax: UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;
o Example: UPDATE Employee SET salary = 60000 WHERE EmployeeID = 101;
The Relational Database Model organizes data into tables, known as relations, which consist of rows
(tuples) and columns (attributes). Each table is identified by a unique name, and each row is identified
by a unique key (usually a primary key). Relationships between tables are maintained through foreign
keys. The relational model supports operations like selection, projection, and joins to manipulate and
retrieve data, ensuring data integrity, security, and consistency.
SQL (Structured Query Language) is the standard language used to interact with relational
databases. It allows users to define, manipulate, and query data in relational databases.
Categories of SQL:
SQL commands are grouped into DDL (CREATE, ALTER, DROP), DML (SELECT, INSERT,
UPDATE, DELETE), DCL (GRANT, REVOKE), DQL (SELECT queries), and TCL (COMMIT,
ROLLBACK). A view, created with the DDL command CREATE VIEW, is a common example.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM parent_table
WHERE condition;
Example:
CREATE VIEW EmployeeSalary AS
SELECT Name, Salary
FROM Employee
WHERE Salary > 50000;
Relational Model:
Data is organized into tables (relations), and relationships between data are established using foreign
keys.
Provides flexibility in query construction and is the most widely used model for databases.
Examples: SQL-based systems like MySQL, PostgreSQL.
Network Model:
Data is organized in a graph structure where entities are nodes and relationships are edges.
More complex than relational model and not as widely used today.
Example: CODASYL systems such as IDMS (Integrated Database Management System).
Hierarchical Model:
Data is organized in a tree-like structure, with each record having a single parent, creating a strict
hierarchy.
It is rigid and does not handle many-to-many relationships as easily.
Example: IBM's IMS (Information Management System), or hierarchical data formats like XML.
In conclusion, while the relational model offers simplicity and flexibility, the network and hierarchical
models are less commonly used due to their complexity and limited ability to handle diverse data
relationships.
Unit 04: SQL (DDL)
Unit 4 of the document focuses on Structured Query Language (SQL) and Data Definition Language
(DDL). It begins by introducing SQL as the standard language for interacting with relational databases. SQL
is widely used for defining, manipulating, and retrieving data from databases. The unit provides historical
context, explaining that SQL was initially developed at IBM as SEQUEL and later became the standard
language for database management.
The unit then details DDL (Data Definition Language), which is used for defining and modifying database
structures. The main DDL commands covered include:
CREATE – Used to create database objects like tables, indexes, and views. The syntax for creating a
table involves specifying column names, data types, and constraints.
ALTER – Used to modify existing database structures, such as adding, modifying, or deleting
columns in a table.
DROP – Used to remove database objects, such as tables, permanently.
TRUNCATE – Used to delete all records from a table without affecting its structure.
The document also explores Data Manipulation Language (DML) commands briefly, emphasizing their
role in modifying data rather than defining database structures. The section then transitions into Data Control
Language (DCL) and Transaction Control Language (TCL). DCL commands like GRANT and
REVOKE manage user permissions, while TCL commands like COMMIT, ROLLBACK, and
SAVEPOINT handle transaction management.
Further, the unit introduces SQL constraints, which ensure data integrity. These include NOT NULL,
UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, and DEFAULT.
Lastly, the unit discusses SQL Joins, which allow data retrieval from multiple tables. The major types of
joins covered include inner joins and outer joins (left, right, and full), as well as the natural and
conditional joins introduced in earlier units.
The unit concludes by emphasizing SQL’s role in database management and its ability to provide a structured
approach to defining, manipulating, and securing relational databases.
A query is a request made to a database for the purpose of retrieving or manipulating data. It is written
using a specific query language, such as SQL (Structured Query Language). Queries allow users to
interact with the database by performing operations like retrieving specific information, inserting new
records, updating existing records, or deleting records. A query can be simple or complex, depending
on the requirement and the data involved.
SQL (Structured Query Language) is a standard programming language designed for managing and
manipulating relational databases. It is used to interact with a database system, enabling users to
perform operations such as querying, inserting, updating, and deleting data. SQL is essential for
defining the structure of data (using Data Definition Language), manipulating data (using Data
Manipulation Language), and controlling data access (using Data Control Language).
Declarative Nature: SQL allows users to specify what data they want without having to describe how
to retrieve it.
Data Definition and Data Manipulation: SQL allows users to define the structure of databases and
tables (DDL), and perform data operations like insertion, updating, and deletion (DML).
Data Integrity: SQL supports data integrity constraints such as primary keys, foreign keys, and unique
constraints to ensure accurate and consistent data.
Cross-Platform: SQL can be used across different database systems (e.g., MySQL, Oracle,
PostgreSQL), providing portability.
3. SQL Commands Used to Create and Delete Relations
SQL provides various commands to create and delete relations (tables) within a database:
CREATE TABLE: Used to create a new table within the database, specifying the column names, data
types, and constraints.
o Example:
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Name VARCHAR(100),
City VARCHAR(50)
);
DROP TABLE: Deletes an entire table from the database, including its data and structure.
o Example:
DROP TABLE Customers;
ALTER TABLE: Used to modify the structure of an existing table, such as adding, deleting, or
modifying columns.
o Example:
ALTER TABLE Customers ADD Email VARCHAR(100);
4. Basic Structure of SQL Query
The basic structure of an SQL query follows a fixed sequence of clauses:
SELECT <columns> FROM <tables> [WHERE <condition>] [GROUP BY <columns>] [HAVING
<condition>] [ORDER BY <columns>];
Example:
SELECT Name, Age FROM Employees WHERE Age > 30 ORDER BY Name ASC;
This query retrieves the names and ages of employees who are older than 30, ordered by their names in
ascending order.
5. Set Operations Supported by SQL
SQL supports several set operations to combine the results of two queries. These include:
UNION: Combines the results of two SELECT statements and eliminates duplicates.
o Example:
SELECT Name FROM Employees
UNION
SELECT Name FROM Managers;
INTERSECT: Returns only the common rows between two SELECT statements.
o Example:
SELECT Name FROM Employees
INTERSECT
SELECT Name FROM Managers;
EXCEPT: Returns the rows from the first query that are not present in the second query.
o Example:
SELECT Name FROM Employees
EXCEPT
SELECT Name FROM Managers;
6. Comparison and Logical Operators Supported by SQL
SQL supports various comparison and logical operators to perform conditional checks:
Comparison Operators:
o =: Equal to
o != or <>: Not equal to
o >: Greater than
o <: Less than
o >=: Greater than or equal to
o <=: Less than or equal to
o Example: SELECT * FROM Employees WHERE Salary > 50000;
Logical Operators:
o AND: Combines multiple conditions, returns true if both conditions are true.
o OR: Combines multiple conditions, returns true if any condition is true.
o NOT: Reverses the result of a condition.
o Example: SELECT * FROM Employees WHERE Age > 30 AND Salary > 50000;
7. Ordering the Tuples of a Table in SQL
To order the rows (tuples) of a table in SQL, the ORDER BY clause is used. It sorts the data in either
ascending (ASC) or descending (DESC) order based on one or more columns.
Example:
SELECT Name, Salary FROM Employees ORDER BY Salary DESC;
This query retrieves employee names and their salaries, ordered by salary in descending order.
8. GROUP BY Clause and Aggregate Functions
GROUP BY Clause: Groups rows that have the same values in one or more columns, usually for use
with aggregate functions.
o Example:
SELECT Department, COUNT(*) FROM Employees GROUP BY Department;
Aggregate Functions: Functions that perform calculations on data and return a single result.
o COUNT(): Returns the number of rows.
o SUM(): Returns the sum of a numeric column.
o AVG(): Returns the average value of a numeric column.
o MIN() and MAX(): Return the minimum and maximum values, respectively.
o Example:
SELECT Department, AVG(Salary) FROM Employees GROUP BY Department;
HAVING Clause: Filters results after grouping, similar to WHERE, but used for aggregate functions.
o Example:
SELECT Department, AVG(Salary) FROM Employees GROUP BY Department HAVING
AVG(Salary) > 50000;
9. SQL Commands Used for Modifying the Database
INSERT INTO: Adds new rows to a table.
o Example:
INSERT INTO Employees (Name, Age, Salary) VALUES ('John Doe', 30, 60000);
UPDATE: Modifies existing data in a table.
o Example:
UPDATE Employees SET Salary = 65000 WHERE Name = 'John Doe';
DELETE: Removes rows from a table.
o Example:
DELETE FROM Employees WHERE Name = 'John Doe';
10. SQL Query to Find Distinct Customers and Branch Names in Hyderabad
To find the distinct customers and branch names in Hyderabad where the customers have taken loans,
the query can be structured as follows:
SELECT DISTINCT Customers.Name, Branches.BranchName
FROM Customers
JOIN Loans ON Customers.CustomerID = Loans.CustomerID
JOIN Branches ON Loans.BranchID = Branches.BranchID
WHERE Branches.City = 'Hyderabad';
This query selects the distinct customers' names and the corresponding branch names in Hyderabad
where the customers have taken loans. It uses JOIN to connect the relevant tables and WHERE to filter
the rows by the city 'Hyderabad'.
Unit 5 in the document focuses on SQL Data Manipulation Language (DML) and provides an overview of
the commands and operations used to manage data within a database. The unit introduces essential DML
operations such as INSERT, UPDATE, and DELETE, which allow for modifying the contents of database
tables.
The INSERT command is used to add new rows to a table, with specific syntax for inserting data values into
corresponding columns. The UPDATE statement allows for modifying existing data in one or more columns
of a table, with criteria defined by the WHERE clause to ensure precise updates. The DELETE command is
used to remove rows from a table, also governed by the WHERE clause to avoid unintentional data loss.
A key topic covered in the unit is subqueries, which are queries embedded within another SQL query.
Subqueries can be used in various clauses like WHERE, HAVING, and FROM to filter results based on
other queries. Examples demonstrate the use of subqueries with the INSERT, UPDATE, and DELETE
statements to perform operations on data based on complex conditions.
The unit also discusses constraints, such as NOT NULL, UNIQUE, CHECK, and DEFAULT, which
enforce rules on data to maintain integrity and validity. Additionally, views are introduced as a way to
simplify data retrieval by creating virtual tables based on SQL queries, allowing for more efficient data
management and security by restricting direct access to underlying tables.
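As an illustration of these constraints, consider the following sketch; the table and column names are
hypothetical:

CREATE TABLE Product (
    ProductID INT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,             -- NOT NULL: a value is required
    SKU VARCHAR(20) UNIQUE,                 -- UNIQUE: no two products share a SKU
    Price DECIMAL(8, 2) CHECK (Price > 0),  -- CHECK: enforces a validity rule
    Stock INT DEFAULT 0                     -- DEFAULT: used when no value is supplied
);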
Overall, Unit 5 emphasizes the importance of DML in the day-to-day operations of a database and how it
plays a crucial role in managing and modifying data to meet business requirements.
A subquery in SQL is a query nested within another query, typically within the WHERE, FROM, or
SELECT clause. The purpose of a subquery is to allow the main query to be more efficient by breaking
down complex operations into simpler steps. Subqueries can return a single value (scalar subquery), a
list of values (multi-row subquery), or a table of values (multi-column subquery). The result of the
subquery is used by the main query to filter data, perform calculations, or derive values.
Example:
SELECT Name, Salary
FROM Employees
WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE DepartmentName =
'Sales');
In this example, the subquery (SELECT DepartmentID FROM Departments WHERE
DepartmentName = 'Sales') returns the DepartmentID for the 'Sales' department, which is then used by
the main query to retrieve the employees who belong to that department.
A view in SQL is a virtual table derived from one or more base tables, which is constructed by a query.
It does not store data physically but displays the data from the underlying tables when queried. Views
are used to simplify complex queries, provide data security (by limiting the columns or rows a user can
access), and present a consistent interface for users.
Features of Views:
Simplification: Views can encapsulate complex queries into a simple interface, making it easier for
users to query.
Data Security: By restricting access to certain columns or rows, views can hide sensitive information
from unauthorized users.
Consistency: A view ensures that the result set remains consistent across different users or applications
that need the same data.
Updatable: Some views can be updated if they are based on a single table without complex joins or
aggregations.
Example:
CREATE VIEW EmployeeDetails AS
SELECT Name, Salary, DepartmentID
FROM Employees
WHERE DepartmentID = 1;
This view shows the details of employees from the department with DepartmentID = 1.
3. SQL Commands Used to Create and Delete Relations
In SQL, relations refer to tables in the database. The commands used to create and delete tables
(relations) are:
CREATE TABLE: Used to create a new table in the database with specified columns, data types, and
constraints. Example:
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
Salary DECIMAL(10, 2)
);
DROP TABLE: Used to delete a table and all its data from the database. Example:
DROP TABLE Employees;
ALTER TABLE: Used to modify an existing table, such as adding, deleting, or modifying columns.
Example:
ALTER TABLE Employees ADD Email VARCHAR(100);
4. Basic Structure of SQL Query
The basic structure of an SQL query follows a sequence of clauses to perform an operation, such as
selecting data or modifying the database:
SELECT <columns> FROM <tables> [WHERE <condition>] [GROUP BY <columns>] [HAVING
<condition>] [ORDER BY <columns>];
Example:
SELECT Name, Age
FROM Employees
WHERE Age > 30
ORDER BY Age DESC;
This query selects the names and ages of employees older than 30, ordered by age in descending order.
5. DML Commands Supported by SQL
DML (Data Manipulation Language) commands are used to retrieve, insert, update, and delete data
in a database:
SELECT: Retrieves data from a table. Example:
SELECT Name, Salary FROM Employees;
INSERT INTO: Adds new data to a table. Example:
INSERT INTO Employees (Name, Age, Salary)
VALUES ('John Doe', 28, 50000);
UPDATE: Modifies existing data in a table. Example:
UPDATE Employees
SET Salary = 60000
WHERE EmployeeID = 1;
DELETE: Removes data from a table. Example:
DELETE FROM Employees WHERE EmployeeID = 1;
6. Comparison and Logical Operators Supported by SQL
SQL supports various comparison operators to compare values and logical operators to combine
conditions:
Comparison Operators:
o =: Equal to
o != or <>: Not equal to
o >: Greater than
o <: Less than
o >=: Greater than or equal to
o <=: Less than or equal to
Example:
SELECT Name FROM Employees WHERE Salary > 50000;
Logical Operators:
o AND: Combines multiple conditions and returns true if both conditions are true.
o OR: Combines multiple conditions and returns true if any condition is true.
o NOT: Reverses the result of a condition.
Example:
SELECT Name FROM Employees WHERE Age > 30 AND Salary > 50000;
7. SQL Commands Used for Modifying the Database
INSERT INTO: Adds new records to a table. Example:
INSERT INTO Employees (Name, Age, Salary)
VALUES ('Jane Doe', 25, 55000);
UPDATE: Updates existing records in a table. Example:
UPDATE Employees
SET Salary = 70000
WHERE Name = 'Jane Doe';
DELETE: Deletes records from a table. Example:
DELETE FROM Employees WHERE Name = 'Jane Doe';
8. SQL Query to Find Distinct Customers and Branch Names in Hyderabad Where the
Customers Have Taken Loans
To find the distinct customers and branch names of the branches situated in the city "Hyderabad" where
the customers have taken loans, the query can be written as follows:
SELECT DISTINCT Customers.Name, Branches.BranchName
FROM Customers
JOIN Loans ON Customers.CustomerID = Loans.CustomerID
JOIN Branches ON Loans.BranchID = Branches.BranchID
WHERE Branches.City = 'Hyderabad';
In this query, the JOIN clauses connect the Customers, Loans, and Branches tables through their key
attributes, DISTINCT eliminates duplicate rows, and the WHERE clause restricts the results to branches
located in the city 'Hyderabad'.
Unit 6 in the document discusses Relational Languages with a focus on Tuple Relational Calculus (TRC),
Domain Relational Calculus (DRC), and other querying techniques. The unit explains the significance of
Relational Calculus as a non-procedural query language used to define the desired results based on certain
conditions.
It first introduces Tuple Relational Calculus, which works by specifying a tuple variable that is used to
represent the rows that satisfy the given conditions. The unit elaborates on the semantics of TRC queries,
emphasizing how they are formulated and how they differ from procedural query languages like SQL. It
explains the syntax and the logic behind how tuples are selected based on their attributes and relationships
with other tuples.
Next, the unit delves into Domain Relational Calculus, focusing on its use of domain variables to represent
column values. This differs from TRC as it deals directly with attributes rather than complete tuples. DRC is
illustrated with examples to demonstrate how queries are structured and the use of logical connectors like
AND, OR, and NOT.
The document also covers Query-by-Example (QBE), which is an intuitive method for constructing queries
by providing a visual example rather than writing out a formal query. This section aims to make database
querying more accessible for users who may not be familiar with the technical syntax of query languages.
The unit concludes by discussing aggregate functions, which are used to perform calculations over a set of
tuples, such as SUM, AVG, COUNT, etc. These functions are crucial for summarizing data and are often
used in conjunction with GROUP BY and HAVING clauses to structure queries that retrieve grouped data
with specific conditions.
Relational calculus and relational algebra are two formal query languages used to retrieve information
from relational databases. While relational algebra provides a procedural approach, where the user
specifies a sequence of operations to retrieve data, relational calculus provides a declarative approach,
focusing on what data to retrieve without specifying how to retrieve it. Relational calculus expresses
queries in terms of logical formulas and set theory, allowing users to describe properties of the result
set rather than the step-by-step procedure to obtain it.
Relational calculus is an alternative to relational algebra because it uses a different paradigm for
querying databases. Relational algebra relies on operations like select, project, union, and join, while
relational calculus focuses on defining the set of data through logical expressions. Both approaches are
equivalent in terms of expressive power; however, relational calculus is often preferred for its
simplicity and closer alignment with mathematical logic.
In tuple relational calculus (TRC), queries are written using a tuple variable that represents a row or
tuple in a relation. The syntax involves defining a tuple variable and specifying the conditions that the
tuples must satisfy. The result is the set of tuples that meet the criteria specified in the query.
{ t | P(t) }
Where:
t is a tuple variable.
P(t) is a condition or predicate that the tuple t must satisfy.
Example:
{ t.Name, t.Salary | Employee(t) AND t.Salary > 50000 }
This query retrieves the Name and Salary of employees from the Employee relation where the salary is
greater than 50,000.
3. How Does Tuple Relational Calculus Differ from Domain Relational Calculus?
Tuple relational calculus (TRC) and domain relational calculus (DRC) are both non-procedural query
languages, but they differ in the way queries are expressed.
Tuple Relational Calculus (TRC) uses tuple variables, which refer to entire rows (or tuples) in a
relation. The query describes the properties of the rows or tuples that satisfy the conditions specified.
Domain Relational Calculus (DRC), on the other hand, uses domain variables that refer to the values
in the individual columns of a relation. In DRC, the query is based on the values in columns (or
domains) rather than entire tuples.
In essence, TRC works with tuples as units, while DRC works with individual domain elements
(column values). Both are equivalent in terms of their expressive power but use different forms to
express queries.
In domain relational calculus (DRC), the query is written using domain variables that represent the
values of individual columns. A typical query is structured as follows:
{ <v1, v2, ..., vn> | P(v1, v2, ..., vn) }
Where:
v1, v2, ..., vn are domain variables representing column values.
P(v1, v2, ..., vn) is a predicate that the values must satisfy.
Example:
{ <v1, v2> | Employee(v1, v2) AND v2 > 50000 }
This query retrieves the Name and Salary of employees, assuming an Employee relation with the
attributes (Name, Salary), where the salary is greater than 50,000. Here, v1 and v2 are domain variables
representing the Name and Salary of the employee.
Query by Example (QBE) is called a graphical query language because it allows users to specify
database queries using a graphical interface. In QBE, the user fills out a template or grid (often a table
format), and the system translates this graphical input into SQL queries behind the scenes. The user
does not need to know the underlying query language (e.g., SQL) and can simply select tables,
columns, and conditions through a visual interface.
For instance, in QBE, a user might fill in a table with fields corresponding to columns in the database
and leave blanks to specify the conditions or values that they are looking for. The graphical nature of
QBE makes it user-friendly and accessible, especially for those without a technical background.
SQL (Structured Query Language) is a textual, non-graphical language used to query and manipulate
relational databases. SQL requires users to write queries using commands such as SELECT, INSERT,
UPDATE, and DELETE.
Example of SQL:
SELECT Name, Salary FROM Employees WHERE Salary > 50000;
QBE (Query By Example), on the other hand, is a graphical interface used for querying databases,
where the user fills in a template with values or conditions to be queried. It abstracts the SQL syntax
and allows users to work visually by specifying conditions in a grid-like format.
Example of QBE:
The user might fill in the template with "Salary > 50000" under the Salary column and leave the Name
column empty to indicate all employees with salaries greater than 50,000. The system then generates
the corresponding SQL query.
Thus, the main difference lies in the interaction style: SQL is command-based and requires the user to
understand the syntax, while QBE is graphical and more intuitive for users.
7. How to Use Aggregate Functions in SQL? Give Examples to Illustrate Your Answer
SQL provides several aggregate functions to perform calculations on a set of values and return a
single result. Common aggregate functions include COUNT(), SUM(), AVG(), MIN(), and MAX().
Examples:
COUNT(): Returns the number of rows.
SELECT COUNT(*) FROM Employees WHERE Salary > 50000;
This returns the number of employees with salaries greater than 50,000.
SUM(): Returns the sum of a numeric column.
SELECT SUM(Salary) FROM Employees WHERE Department = 'HR';
This returns the total salary of employees in the HR department.
AVG(): Returns the average value of a numeric column.
SELECT AVG(Salary) FROM Employees;
This returns the average salary of all employees.
MIN() and MAX(): Return the minimum and maximum values in a column.
SELECT MIN(Salary), MAX(Salary) FROM Employees;
8. Entity-Relationship Diagram for Small Computer Business Firm
In this case, the entities and relationships in the small computer business firm can be as follows:
1. Entities:
o Employee: EmployeeNo, Name, Address, PhoneNo, JobTitle, Salary.
o Machine: Model, Specs, Name, QuantityOnHand.
o Part: PartName, Price, QuantityOnHand.
o Supplier: SupplierName, Address, PhoneNo.
o Customer: CustomerName, Address, PhoneNo, CreditLimit (if applicable).
o Sale: SaleDate, Quantity, TotalPrice, CustomerID, EmployeeID.
2. Relationships:
o An Employee assembles Machines.
o Machines consist of Parts.
o Parts are ordered from Suppliers.
o Sales involve Customers and are processed by Employees.
o Customer may have a credit limit if they are a credit customer.
The Entity-Relationship Diagram (ERD) will link these entities with their respective attributes and
define the relationships between them, such as one-to-many relationships between customers and sales
or employees and the machines they assemble.
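A partial DDL sketch for this design is given below; the data types are assumptions, SaleID is an
assumed surrogate key, and CustomerName is used as the customer key only because no CustomerID
appears in the attribute list above:

CREATE TABLE Employee (
    EmployeeNo INT PRIMARY KEY,
    Name VARCHAR(100),
    JobTitle VARCHAR(50),
    Salary DECIMAL(10, 2)
);

CREATE TABLE Customer (
    CustomerName VARCHAR(100) PRIMARY KEY,
    Address VARCHAR(200),
    PhoneNo VARCHAR(15),
    CreditLimit DECIMAL(10, 2)  -- NULL for customers without credit
);

CREATE TABLE Sale (
    SaleID INT PRIMARY KEY,
    SaleDate DATE,
    Quantity INT,
    TotalPrice DECIMAL(12, 2),
    CustomerName VARCHAR(100) REFERENCES Customer(CustomerName),
    EmployeeNo INT REFERENCES Employee(EmployeeNo)  -- the employee who processed the sale
);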
Unit 7 of the document focuses on Relational Database Design, emphasizing the principles and techniques
used to create an efficient and well-structured database. It begins by explaining the importance of
Normalization, a process aimed at reducing data redundancy and improving data integrity. Normalization is
achieved through Normal Forms (NF), including First Normal Form (1NF), Second Normal Form (2NF),
Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), and higher levels such as Fourth Normal
Form (4NF) and Fifth Normal Form (5NF).
The unit explains Functional Dependencies (FDs), which describe relationships between attributes in a
relational database. These dependencies help identify redundant data and serve as the foundation for
normalization. It also covers Multivalued Dependencies (MVDs) and how they affect database design.
The unit further explores Decomposition, a process of breaking down a large relation into smaller, more
manageable tables while preserving lossless join and dependency preservation. This ensures that no
information is lost and that the database remains functional.
Another key aspect discussed is Denormalization, which is sometimes necessary to improve query
performance by reintroducing some redundancy. While normalization focuses on minimizing redundancy,
denormalization is used strategically to enhance database efficiency.
The unit concludes by discussing Database Anomalies, such as insertion, update, and deletion anomalies,
which arise when a database is not properly normalized. It emphasizes that a well-designed relational database
balances efficiency, integrity, and performance.
Data redundancy refers to the unnecessary repetition of data within a database, often resulting in
inefficiencies and various problems. These issues include:
Wasted Storage Space: Storing duplicate copies of the same information increases the overall storage
requirements, leading to inefficient use of resources. For example, storing the same address for multiple
customers in a customer database increases the overall data storage needed.
Data Inconsistency: When redundant data is updated in one place but not in others, it leads to
inconsistencies. For instance, if a customer’s address is stored in several places and one entry is
updated while others are not, this creates discrepancies in the database.
Increased Maintenance Effort: Redundant data requires more effort to maintain and update. Any
modification in the redundant data must be reflected in all the places where it is stored, increasing the
complexity and potential for errors.
Example: A customer database stores the same phone number and address multiple times for various
orders made by the same customer. If the customer’s address changes, every instance of that address
must be updated, or else data inconsistency will occur.
A functional dependency is a relationship between two sets of attributes in a relation where one set of
attributes determines the value of another set. It means that for any two tuples in a relation, if they agree
on the values of one set of attributes, they must also agree on the values of another set.
Example: In a student database, if we have attributes Student_ID and Name, then we can say that
Student_ID → Name because each student’s ID uniquely determines their name. If you know a
student's ID, you can always find their corresponding name.
Relational databases are characterized by several key features that distinguish them from other types of
databases:
Tables (Relations): Data is organized into tables (relations), where each table consists of rows (tuples)
and columns (attributes). Each table represents a distinct entity or relationship.
Primary Key: Each table has a primary key that uniquely identifies each row. This ensures that no two
rows are identical.
Foreign Key: Foreign keys are used to establish relationships between different tables by referencing
primary keys from other tables.
Data Integrity: Relational databases enforce various integrity constraints such as entity integrity (each
row must be unique) and referential integrity (foreign keys must match valid primary keys).
Normalization: Data in relational databases is typically normalized to eliminate redundancy and ensure
that it adheres to normal forms, making it more efficient and easier to maintain.
Structured Query Language (SQL): SQL is used to interact with relational databases for tasks like
querying, updating, and managing data.
To reduce data redundancy, relational databases use techniques such as normalization and
decomposition:
Normalization: This is the process of structuring a database to minimize redundancy by ensuring that
each piece of data is stored in only one place. It involves organizing attributes and dividing tables into
smaller ones according to certain rules (e.g., 1NF, 2NF, 3NF).
Decomposition: This involves breaking large, redundant tables into smaller, more manageable ones to
eliminate repetition. For example, if a table stores both customer information and order details, it can
be decomposed into separate Customer and Order tables.
Example: In a table containing customer and order information, redundancy occurs if the same
customer details (address, phone number) are repeatedly stored with each order. By splitting the table
into two—one for customers and one for orders—the redundancy is reduced.
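The decomposition described above can be sketched in SQL; the column names are assumptions:

-- Before: customer details repeated on every order row
-- Orders(OrderID, CustomerName, Address, Phone, OrderDate, Amount)

-- After decomposition: each customer fact is stored exactly once
CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    Address VARCHAR(200),
    Phone VARCHAR(15)
);

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT REFERENCES Customer(CustomerID),  -- redundancy replaced by a reference
    OrderDate DATE,
    Amount DECIMAL(10, 2)
);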
5. Differences Between Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF)
Both 3NF and BCNF are higher forms of normalization that eliminate redundancy, but they have subtle
differences:
3NF: A relation is in 3NF if it is in 2NF and no transitive dependencies exist, i.e., non-prime attributes
are not dependent on other non-prime attributes.
Example: A table that stores student data (student_id, student_name, student_dob, advisor_id,
advisor_name) violates 3NF because advisor_name depends on student_id only transitively, through
advisor_id. To bring it to 3NF, we separate the advisor details into a different table.
BCNF: A relation is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey;
BCNF is therefore slightly stricter than 3NF.
Example: A table storing books and authors might violate BCNF if the author's name determines the
book title, but the author is not a candidate key.
A relation table is subjected to advanced normalization to achieve a higher degree of data integrity,
reduce redundancy, and eliminate update anomalies such as insertion, deletion, and modification
anomalies. By applying normalization rules such as 1NF, 2NF, 3NF, and BCNF, we ensure that each
piece of data is represented only once, making the database more efficient to update and query.
For example, a relation in 1NF may still contain partial or transitive dependencies, so it is subject to
further normalization into 2NF, 3NF, and BCNF to handle these issues.
7. Define Multivalued Dependencies. Give Examples. Explain How Are They Eliminated?
A multivalued dependency occurs when one attribute determines another set of attributes, but not
necessarily in a one-to-one manner. This kind of dependency causes redundancy because a value in one
column leads to multiple values in other columns.
Example: A table that stores employees and their skills can have multiple skills for each employee. If
Employee_ID →→ Skill (where →→ represents a multivalued dependency), then for each employee,
there will be multiple rows for each skill.
To eliminate this, the relation can be decomposed into two separate relations: one for employee details
and one for skills, so that each relation records only one independent multivalued fact about the
employee. This is the goal of Fourth Normal Form (4NF).
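The decomposition for the employee-skills example could look like this; the column names are
assumptions:

-- Before: Employee(Employee_ID, Name, Skill) repeats the name once per skill

CREATE TABLE Employee (
    Employee_ID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE EmployeeSkill (
    Employee_ID INT REFERENCES Employee(Employee_ID),
    Skill VARCHAR(50),
    PRIMARY KEY (Employee_ID, Skill)  -- each skill is recorded once per employee
);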
8. Disadvantages of Normalization
While normalization helps eliminate redundancy and ensures data integrity, it can also have some
disadvantages:
Performance Issues: Highly normalized databases can result in multiple table joins, which may reduce
query performance, especially in large datasets.
Complex Queries: With multiple tables created to eliminate redundancy, the queries become more
complex, requiring more joins and potentially slower performance.
Over-Normalization: Excessive normalization can lead to overly fragmented data models, which may
hinder the ease of data retrieval and reporting.
Consider, for example, a relation Employee(EmployeeID, Name, Department, Manager) in which
Department determines Manager, creating a transitive dependency. To convert it to 3NF, we would
separate the Manager information into a new table linked by Department.
10. Normalization is the Process of Refining the Design of Relational Tables to Minimize Data
Redundancy
Normalization is the process of organizing the data in a database to minimize redundancy and
dependency. By dividing large tables into smaller, more manageable ones, and establishing
relationships between them, normalization ensures that data is stored only once, reducing the chance of
anomalies. It also makes the database easier to maintain and update, as changes to one attribute need
only be made in one place.
11. A Relation R is Said to be in the First Normal Form (1NF) If and Only If Every Attribute
Contains Atomic Values Only
A relation is in 1NF if it does not contain repeating groups or arrays and all attributes contain atomic
values. This means each field in the table must hold a single value, not a set of values or lists.
Example: Consider the relation Student_ID, Student_Name, Subjects. If the Subjects column contains
multiple values like "Math, Science", it violates 1NF. To convert this into 1NF, we would separate the
subjects into individual rows, so each row contains one subject for a student.
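A sketch of the 1NF-compliant table for this example follows; the data types are assumptions:

-- Violates 1NF: Subjects holds a list such as 'Math, Science'
-- Student(Student_ID, Student_Name, Subjects)

-- In 1NF: one atomic subject value per row
CREATE TABLE StudentSubject (
    Student_ID INT,
    Student_Name VARCHAR(100),  -- later normal forms would move this to a Student table
    Subject VARCHAR(50),
    PRIMARY KEY (Student_ID, Subject)
);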
Unit 08: Transaction Management
Unit 8 of the document focuses on Transaction Management, an essential concept in database systems that
ensures data consistency, reliability, and integrity during concurrent access. It begins by introducing the
concept of a transaction, defining it as a logical unit of work that consists of one or more database operations
such as retrieval, insertion, deletion, and updating of data.
The unit then explores the Transaction States, outlining the different states a transaction can go through,
including Active, Partially Committed, Failed, Aborted, and Committed. Each state represents a different
stage in the execution of a transaction, and the transition between these states is controlled to maintain
database consistency.
A critical aspect discussed is the ACID Properties (Atomicity, Consistency, Isolation, Durability), which are
fundamental in ensuring reliable database transactions:
Atomicity ensures that a transaction is either fully completed or fully rolled back.
Consistency guarantees that the database remains in a valid state before and after the transaction.
Isolation prevents concurrent transactions from interfering with each other.
Durability ensures that committed transactions are permanently recorded in the database even in the
event of a system failure.
The unit also explains Implementation of Atomicity and Durability, where recovery techniques such as
Log-based Recovery and Shadow Paging are introduced. These mechanisms help maintain data integrity in
case of a system crash.
Another key topic covered is Concurrent Execution of Transactions, which improves system performance
by allowing multiple transactions to execute simultaneously. However, concurrency can lead to conflicts such
as Lost Updates, Dirty Reads, and Inconsistent Analysis, which need to be managed through Concurrency
Control Mechanisms.
Finally, the unit discusses Serializability and Recoverability, which are crucial for ensuring correct
transaction execution. Serializability ensures that concurrent transactions produce the same results as if they
were executed sequentially, while recoverability ensures that committed transactions do not depend on
aborted ones.
Overall, Unit 8 highlights the importance of transaction management in database systems, ensuring data
consistency, handling concurrent transactions, and implementing recovery mechanisms.
A transaction in a database system is a logical unit of work that consists of one or more operations,
such as reading or writing data, that must be executed as a single unit. The properties that define
transactions are often referred to as the ACID properties:
Atomicity: This ensures that all operations within a transaction are completed successfully. If any part
of the transaction fails, the entire transaction is aborted and the database is reverted to its original state.
It ensures "all or nothing" behavior.
Consistency: This ensures that a transaction transforms the database from one consistent state to
another. It must adhere to predefined rules, such as constraints, triggers, and other integrity rules.
Isolation: This ensures that the operations of one transaction are isolated from others. Even if multiple
transactions are executed concurrently, the result is as if they were executed sequentially, ensuring no
transaction's intermediate states are visible to others.
Durability: This guarantees that once a transaction is committed, the changes are permanent and will
survive any subsequent system failures. The changes made are written to a durable medium such as a
disk.
Transactions: A transaction is a sequence of database operations that performs a logical unit of work.
Transactions typically involve operations like insert, delete, and update. Each transaction must be
executed in its entirety or not at all to ensure database consistency.
Schedules: A schedule is a sequence of operations from different transactions that may be executed
concurrently. A schedule ensures the correct execution of operations while maintaining the ACID
properties. Schedules can be classified as serial or non-serial. A serial schedule executes transactions
one after the other, while a non-serial schedule interleaves operations from different transactions. The
key goal of scheduling is to ensure that the concurrent execution of transactions does not violate the
consistency of the database.
Lock-based concurrency control is a mechanism used to ensure that transactions are executed in a
way that prevents conflicts and ensures data integrity during concurrent execution. It works by locking
data items to restrict access by other transactions while they are being used. There are two main types
of locks:
Shared Lock (S-lock): This allows multiple transactions to read a data item but prevents any
transaction from writing to it. Multiple transactions can acquire shared locks on the same data item
simultaneously.
Exclusive Lock (X-lock): This prevents any other transaction from reading or writing to a data item.
Only one transaction can acquire an exclusive lock on a data item at any given time.
Lock-based protocols, such as Two-Phase Locking (2PL), require that once a transaction releases a lock, it cannot acquire new locks. This guarantees serializability, although deadlocks can still occur under 2PL and must be handled by separate detection or prevention techniques.
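Many SQL systems expose these two lock modes through row-level locking clauses. A sketch in PostgreSQL-style syntax, assuming a hypothetical accounts table:
-- Shared lock: other transactions may also read, but not modify, the row.
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;
COMMIT;
-- Exclusive lock: blocks other writers and other FOR SHARE/FOR UPDATE readers.
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;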
4. ACID Properties
ACID stands for the four key properties that ensure reliable processing of database transactions:
Atomicity: Ensures that all operations in a transaction are completed; if any operation fails, the
transaction is rolled back to its initial state.
Consistency: Guarantees that a transaction moves the database from one valid state to another,
adhering to all database rules such as constraints and triggers.
Isolation: Ensures that transactions do not interfere with each other. It provides a mechanism to make
transactions appear as if they were executed serially, even if they are executed concurrently.
Durability: Once a transaction is committed, its changes are permanent and are stored in a durable
medium. The data will persist even in the event of a system crash.
These properties are crucial for ensuring the integrity and correctness of the database system, especially
when dealing with concurrent operations.
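As an illustration, a funds transfer between two hypothetical accounts either applies both updates or neither:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- atomicity: both updates become permanent together (durability)
-- Had an error occurred before COMMIT, ROLLBACK would have undone both updates.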
Concurrent execution of transactions offers several benefits:
Improved System Performance: By allowing multiple transactions to run concurrently, the system can achieve better throughput, leading to more efficient utilization of resources like CPU and memory.
Better Resource Utilization: In systems with multiple users, allowing transactions to execute
concurrently helps in keeping the system resources engaged and reduces idle times.
Responsiveness: Concurrency allows the system to remain responsive and handle multiple user
requests simultaneously, which is essential in systems with high workloads.
However, concurrent execution also introduces challenges like the need for proper synchronization,
conflict resolution, and maintaining the consistency of the database.
The transaction state diagram represents the various states a transaction can be in during its life cycle. The typical states are Active, Partially Committed, Committed, Failed, and Aborted. The transaction moves through these states based on its execution and system responses.
In SQL, transactions are controlled using commands like BEGIN TRANSACTION, COMMIT, and
ROLLBACK. Some key characteristics under programmer control include:
Access Modes: These define the type of operations a transaction can perform. The modes include:
o Read-only: Allows a transaction to only read data but not modify it.
o Read-write: Allows a transaction to both read and modify data.
Isolation Levels: The isolation level defines the extent to which a transaction's operations are isolated
from others. Common isolation levels include:
o Read Uncommitted: Transactions can read data modified by other uncommitted transactions.
o Read Committed: Transactions can only read committed data, preventing dirty reads.
o Repeatable Read: Guarantees that if a transaction reads a data item more than once, it sees the same value each time, preventing non-repeatable reads.
o Serializable: The highest isolation level, ensuring complete isolation and preventing other transactions
from accessing data until the current transaction is complete.
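A sketch of setting these characteristics, in PostgreSQL-style syntax (exact syntax varies across database systems, and the table contents are illustrative):
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- isolation level
SET TRANSACTION READ ONLY;                       -- access mode
SELECT name, salary FROM employees WHERE department = 'Sales';
COMMIT;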
A transaction’s state is a reflection of its current execution stage. The states in a transaction life cycle
include:
Active: The transaction is in progress and no decisions have been made yet regarding its final outcome
(commit or abort).
Partially Committed: The transaction has executed its operations but has not yet committed.
Committed: The transaction has completed successfully, and all changes made during the transaction
are permanent.
Failed: The transaction has encountered an error, preventing it from completing successfully.
Aborted: The transaction was rolled back, and the database is restored to its initial state before the
transaction began.
These states help in managing the transaction lifecycle and ensuring that operations are conducted in a
safe, predictable manner.
However, concurrent execution introduces challenges such as data inconsistencies and conflicts, which
must be carefully managed using techniques like locking, transactions, and isolation levels to ensure
correctness and consistency.
Unit 09: Concurrency Control
Unit 9 of the document discusses Concurrency Control, which is crucial for maintaining data consistency
and ensuring that multiple transactions can execute simultaneously without conflicts. It introduces the need
for concurrency control, explaining that simultaneous transactions can lead to issues like lost updates, dirty
reads, and inconsistent data retrieval.
The unit elaborates on Lock-Based Protocols, which use locks to control access to database resources. These
include:
Shared Locks (S-Locks) – Allow multiple transactions to read but not modify data.
Exclusive Locks (X-Locks) – Permit only one transaction to read and write the data at a time.
Additionally, the document covers the Two-Phase Locking (2PL) Protocol, which enforces strict rules to prevent conflicts. This protocol ensures that once a transaction releases a lock, it cannot acquire any new ones, which guarantees serializability; deadlocks, however, remain possible under 2PL and are handled by separate techniques.
Another key topic is Timestamp-Based Protocols, where each transaction is assigned a unique timestamp to
dictate the execution order. These protocols help prevent conflicts by ensuring older transactions are executed
before newer ones when accessing the same data.
The unit also explains Validation-Based Protocols, which validate transactions before committing them.
Transactions go through three phases: Read Phase, Validation Phase, and Write Phase, ensuring they do
not interfere with other transactions before finalizing changes.
Lastly, the document covers Deadlock Handling, a critical issue in concurrency control. It discusses different
deadlock prevention, detection, and recovery techniques, ensuring the system remains responsive and
transactions are processed efficiently.
Overall, Unit 9 emphasizes the importance of concurrency control mechanisms in preventing data corruption
and ensuring efficient multi-user database operations.
Lock-based protocols are used in database systems to manage concurrency control and ensure that
transactions are executed in a way that preserves the integrity and consistency of the database. These
protocols prevent conflicts between transactions by controlling access to data items through locks,
ensuring that multiple transactions can access the data concurrently without violating the database's
consistency.
Shared Lock (S-Lock): This lock allows multiple transactions to read the data but prevents any
transaction from modifying the data. When a transaction holds a shared lock on a data item, other
transactions can also acquire shared locks but cannot write to the data item.
Exclusive Lock (X-Lock): This lock prevents any other transaction from either reading or writing the
data. Only one transaction can hold an exclusive lock on a data item at a time. When a transaction holds
an exclusive lock on a data item, no other transaction can access that item.
These locks help maintain serializability, the highest level of isolation in database transactions, by
ensuring that concurrent transactions do not interfere with one another in a way that would compromise
the database’s integrity.
Validation-based protocols are a type of concurrency control mechanism that operates by validating
transactions before committing them. There are three distinct phases involved in a validation-based
protocol:
1. Read Phase: During this phase, a transaction reads the database and executes its operations, applying its updates to local (private) copies of the data rather than to the database itself; the transaction has not yet been validated.
2. Validation Phase: After the transaction has completed its operations, it enters the validation phase,
where the system checks whether the transaction is in conflict with any other transaction. If no conflicts
are detected, the transaction is allowed to commit; otherwise, it is rolled back.
3. Write Phase: If the transaction passes validation, it proceeds to the write phase where its changes are
permanently written to the database. If the transaction fails validation, it is aborted, and the changes are
discarded.
This protocol ensures that transactions are only committed if they do not violate the consistency of the
database.
Lock-based concurrency control is a method that uses locks to manage access to data in a multi-user
database system. The primary goal of lock-based protocols is to prevent conflicting operations on the
same data by multiple transactions. These protocols work by assigning locks to data items whenever a
transaction wants to read or write to a data item.
The two primary types of locks are shared locks and exclusive locks, as explained earlier. The most
common lock-based protocol is the two-phase locking protocol (2PL), which ensures that transactions
acquire all the locks they need before they release any locks. This ensures that once a transaction starts
releasing locks, no new locks can be acquired, guaranteeing serializability.
Deadlock is a potential issue in lock-based concurrency control. This occurs when two or more
transactions hold locks on resources and are waiting for each other to release locks. Various strategies,
such as deadlock detection, prevention, or avoidance, are used to handle deadlocks in database
systems.
4. ACID Properties
The ACID properties define a set of guarantees that a database management system must adhere to in
order to ensure transaction reliability. These properties are:
Atomicity: This ensures that a transaction is treated as a single unit of work. Either all operations
within the transaction are executed, or none of them are. If a transaction fails, the database is rolled
back to its original state, ensuring no partial updates.
Consistency: This ensures that a transaction takes the database from one consistent state to another.
After a transaction, the database must satisfy all integrity constraints and business rules.
Isolation: This ensures that the operations of one transaction are not visible to other transactions until
the transaction is completed (committed). This prevents interference between concurrent transactions.
Durability: Once a transaction has been committed, its changes are permanent, even if the system
crashes immediately after. This ensures that the data is not lost.
Together, these properties ensure that transactions are executed reliably and that the database remains
in a valid state throughout the process.
5. Why We Need Concurrent Execution of Transactions
Improved Resource Utilization: In multi-user environments, concurrent execution ensures that CPU,
memory, and I/O resources are used efficiently. It avoids idle times and maximizes throughput.
Increased System Throughput: By allowing multiple transactions to be processed simultaneously, the
system can handle more transactions in less time, improving overall performance.
Enhanced User Responsiveness: Concurrent execution allows the system to respond quickly to user
requests without making them wait for other transactions to complete.
Despite its benefits, concurrent execution introduces challenges such as data inconsistency and the need
for careful management of concurrency control mechanisms like locks.
The Strict Two-Phase Locking (Strict 2PL) protocol is a variation of the two-phase locking protocol
that guarantees serializability and ensures no cascading rollbacks. In strict 2PL:
A transaction can hold locks during its execution phase and must release them only after the transaction
has been completed (either committed or aborted).
The protocol enforces that transactions release their locks only after they have committed. This ensures
that if a transaction fails, it does not affect other transactions that have not yet committed, thus avoiding
cascading rollbacks.
Strict 2PL ensures serializability and maintains consistency in the database by ensuring that
transactions are executed in a controlled, isolated manner.
In SQL, the programmer has control over certain transaction characteristics to ensure that transactions
behave in a desired manner:
Access Modes: Access modes define how a transaction interacts with data. Common access modes
include read-only (where transactions can only read data) and read-write (where transactions can read
and modify data).
Isolation Levels: SQL provides different isolation levels to control the visibility of transactions. These
levels include:
o Read Uncommitted: Transactions can read uncommitted data from other transactions, leading to dirty
reads.
o Read Committed: Transactions can only read committed data, preventing dirty reads.
o Repeatable Read: Ensures that if a transaction reads a data item, it will see the same value if it reads it
again.
o Serializable: The highest level of isolation, ensuring that transactions are fully isolated from each
other.
These characteristics allow the programmer to fine-tune the behavior of transactions to meet the
specific needs of the application.
Active: A transaction is in progress and has not yet reached a decision about whether it will commit or
abort.
Partially Committed: After the transaction has executed its operations, it enters the partially
committed state, indicating that it has completed its work but has not yet been committed to the
database.
Committed: Once a transaction has successfully completed and all changes are written to the database,
it enters the committed state. The changes are now permanent.
Failed: If a transaction encounters an error or cannot proceed due to some issue, it enters the failed
state.
Aborted: If a transaction fails or is rolled back, it enters the aborted state. The system undoes the
changes made during the transaction and restores the database to its original state.
Concurrent execution refers to the simultaneous execution of multiple transactions in a system. The
motivation behind concurrent execution includes:
Increased Throughput: By executing multiple transactions concurrently, the system can process more
transactions in a shorter time, improving overall system performance.
Better Utilization of Resources: It maximizes the use of system resources like memory and CPU,
ensuring that they are not sitting idle while waiting for other transactions to complete.
Enhanced Responsiveness: For systems with multiple users, concurrent execution allows for a quicker
response time as users can interact with the system simultaneously without waiting for other tasks to
complete.
However, concurrency also brings challenges like ensuring transaction isolation, preventing conflicts,
and maintaining database consistency.
The Thomas Write Rule is a refinement of timestamp-based concurrency control used in transaction management. It states that a write operation issued by a transaction can be ignored if it is obsolete, i.e., if a younger transaction (one with a later timestamp) has already written the same data item. Instead of aborting the older transaction, the system simply discards its outdated write.
By skipping such obsolete writes, the rule avoids unnecessary aborts and permits some schedules that are view-serializable even though they are not conflict-serializable, improving concurrency while maintaining database consistency.
Unit 10: SQL Data Control Language (DCL) and Transaction Control Language (TCL)
Unit 10 of the document covers SQL Data Control Language (DCL) and Transaction Control Language
(TCL), which are crucial components of database security and transaction management. The unit begins with
an introduction to Structured Query Language (SQL) and its classification into DDL (Data Definition
Language), DML (Data Manipulation Language), DCL (Data Control Language), and TCL
(Transaction Control Language).
The unit then details the two primary DCL commands:
GRANT – Used to provide specific privileges to users, such as SELECT, INSERT, UPDATE, or DELETE.
REVOKE – Used to withdraw previously granted privileges from users.
A comparison between GRANT and REVOKE is also provided to highlight their differences and
applications in access control.
The unit further explores aggregate functions, which are used to perform calculations on a set of values and return a single result. The main aggregate functions covered are COUNT, SUM, AVG, MAX, and MIN.
Next, the unit discusses numeric and character functions in SQL.
The unit concludes by emphasizing the importance of DCL in database security and TCL in ensuring
transactional consistency, reinforcing best practices for managing database access and maintaining data
integrity.
DCL (Data Control Language) commands are SQL commands that control access to data within a
database. They are used to grant or revoke user permissions and privileges on database objects, such as
tables, views, and procedures. These commands ensure that only authorized users can perform specific
operations, enhancing the security and integrity of the database. The two primary DCL commands are:
GRANT: Provides specific privileges to users or roles, allowing them to perform certain operations on
database objects (e.g., SELECT, INSERT, UPDATE, DELETE).
REVOKE: Removes previously granted privileges, thereby restricting a user or role from performing
specific operations on the database objects.
DCL commands are essential for managing user roles and permissions, ensuring that only authorized
individuals can perform actions that impact the database.
The GRANT command in SQL is used to assign specific privileges (such as SELECT, INSERT,
UPDATE, DELETE) on database objects (like tables or views) to users or roles. The basic syntax for
the GRANT command is as follows:
GRANT privilege_type ON object TO user;
For example, to grant a user named "john" the ability to SELECT and UPDATE records from the
"employees" table, the following SQL query would be used:
GRANT SELECT, UPDATE ON employees TO john;
This command allows the user "john" to read (SELECT) and modify (UPDATE) the data in the
"employees" table. Additionally, privileges can be granted with the WITH GRANT OPTION, which
allows the recipient to grant the same privileges to other users. This is useful in scenarios where
administrative control needs to be delegated to users.
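For example, granting with the option to delegate (the user names are illustrative):
GRANT SELECT, UPDATE ON employees TO john WITH GRANT OPTION;
-- john may now pass the SELECT privilege on to another user:
GRANT SELECT ON employees TO mary;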
The REVOKE command is used to remove or withdraw previously granted privileges from a user or
role on database objects. It ensures that a user no longer has access to perform specific operations on a
database object. The syntax for the REVOKE command is as follows:
REVOKE privilege_type ON object FROM user;
For example, if we want to revoke the SELECT privilege from the user "john" on the "employees"
table, the following SQL query would be used:
REVOKE SELECT ON employees FROM john;
This query removes the SELECT privilege from the user "john," meaning they can no longer read data
from the "employees" table. It's important to note that the REVOKE command does not affect
privileges granted through the WITH GRANT OPTION unless explicitly revoked.
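In systems that support the standard CASCADE clause, such dependent grants can be withdrawn explicitly; a sketch:
-- Also revokes any grants john made using his WITH GRANT OPTION privilege.
REVOKE SELECT ON employees FROM john CASCADE;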
4. What are the differences between DDL, DML, and DCL commands?
DDL (Data Definition Language): DDL commands are used to define and manage database structures
such as tables, indexes, views, and schemas. These commands modify the structure of the database and
include operations like CREATE, ALTER, and DROP. DDL commands generally do not affect the
data within the database but focus on its organization and layout.
Example:
CREATE TABLE employees (id INT, name VARCHAR(50), age INT);
DML (Data Manipulation Language): DML commands are used to manipulate data within the
database. These commands deal with the retrieval, insertion, updating, and deletion of data in tables.
The most common DML commands are SELECT, INSERT, UPDATE, and DELETE.
Example:
INSERT INTO employees (id, name, age) VALUES (1, 'Alice', 30);
DCL (Data Control Language): DCL commands are used to control access to data in a database.
These commands manage user permissions and privileges. The primary DCL commands are GRANT
and REVOKE, which grant or revoke specific privileges to or from users.
Example:
GRANT SELECT ON employees TO john;
The key difference is that DDL is concerned with database structure, DML is used for manipulating
data, and DCL is for controlling access to the database.
Aggregate functions are used in SQL to perform calculations on a set of values and return a single
result. These functions are often used with the GROUP BY clause to group rows based on specific
columns. Here are five common aggregate functions:
1. COUNT(): Returns the number of rows that match a specified condition. Example:
SELECT COUNT(*) FROM employees WHERE age > 30;
2. SUM(): Returns the total sum of a numeric column. Example:
SELECT SUM(salary) FROM employees;
3. AVG(): Returns the average value of a numeric column. Example:
SELECT AVG(age) FROM employees;
4. MAX(): Returns the maximum value from a column. Example:
SELECT MAX(salary) FROM employees;
5. MIN(): Returns the minimum value from a column. Example:
SELECT MIN(age) FROM employees;
These functions are useful for summarizing data and performing statistical analysis on a database.
The GROUP BY and ORDER BY clauses are both used to organize the result sets in SQL, but they
serve different purposes:
GROUP BY: This clause is used to group rows that have the same values in specified columns into
aggregated data, such as sums or averages. It is typically used with aggregate functions like COUNT,
SUM, AVG, etc. The grouping of rows happens before any sorting or display.
Example:
SELECT department, AVG(salary) FROM employees GROUP BY department;
ORDER BY: This clause is used to sort the result set based on one or more columns, either in
ascending (ASC) or descending (DESC) order. It does not change how data is grouped but simply sorts
the rows in the result set after they have been selected.
Example:
SELECT name, salary FROM employees ORDER BY salary DESC;
In summary, GROUP BY is used to group data for aggregation, while ORDER BY is used to sort the
result set.
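The two clauses are frequently combined in one query; the sketch below also uses the HAVING clause (not covered above) to filter groups after aggregation:
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department        -- form one group per department
HAVING COUNT(*) > 5        -- keep only sufficiently large groups
ORDER BY avg_salary DESC;  -- sort the aggregated rows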
DCL commands play a critical role in database security and access control. By controlling who can
access the data and what actions they can perform, DCL commands help ensure that the database is
secure from unauthorized access and potential misuse. The GRANT command allows database
administrators to assign specific privileges to users or roles, while the REVOKE command allows
them to remove these privileges when necessary. This allows for fine-grained control over data access,
ensuring that only authorized individuals can perform operations such as reading, inserting, updating,
or deleting data. Proper use of DCL commands helps maintain data integrity, confidentiality, and
security within the database system.
Unit 11: Recovery Systems
Unit 11 of the document focuses on Recovery Systems in database management, discussing methods to
restore a database after a failure. The unit introduces the concept of Crash Recovery, explaining how a
database system must be prepared to handle failures from hardware, software, or environmental causes like
power outages. It emphasizes the need to ensure atomicity (completeness of transactions) and durability
(retention of committed transactions), even after a system crash.
A key method for ensuring data integrity is Log-based Recovery, where changes made by a transaction are
logged to allow for undoing uncommitted transactions and ensuring durability for committed ones. The unit
outlines the ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) model, detailing its
mechanisms for handling recovery through transaction logs and checkpoints.
The unit further explores Buffer Management, which ensures efficient management of memory used to store
data temporarily during transaction processing. Failure with Loss of Non-Volatile Storage is also discussed,
illustrating how data can be recovered even if the system loses permanent storage.
Finally, the unit highlights the importance of Backup and Recovery strategies, noting that proper backups are
essential to ensure high availability and minimize downtime. By explaining these techniques, the unit
emphasizes the crucial role of recovery systems in maintaining data consistency and availability in databases.
1. Define Recovery.
Recovery in the context of database management systems refers to the process of restoring the database
to a consistent state after a failure. Failures could include system crashes, disk failures, or transaction
failures that prevent the system from operating as intended. The goal of recovery is to ensure that the
database reflects all the changes made by committed transactions while undoing the effects of
transactions that were not committed before the failure. Recovery typically involves the use of
transaction logs, backups, and various algorithms to ensure that data is consistent and accurate after a
failure.
2. Describe ARIES.
ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) is a widely used recovery
algorithm in database systems. ARIES provides a robust, efficient, and comprehensive mechanism for
ensuring database consistency, even in the event of system crashes. ARIES is based on the Write-
Ahead Logging (WAL) protocol, meaning that before any data is written to the database, the changes
must first be recorded in the log.
ARIES recovery proceeds in three phases:
Analysis: This phase identifies the transactions that were active at the time of the crash and determines the state of the database, identifying the recovery point.
Redo: In this phase, all operations are redone from the log to ensure that committed transactions are
reflected in the database.
Undo: Transactions that were not committed before the crash are undone using the log, which
guarantees that no partial or incomplete operations are left in the database.
ARIES uses a combination of techniques like logging, checkpoints, and consistent record keeping to
efficiently handle recovery after a crash.
A transaction failure occurs when a transaction cannot complete its intended operations successfully.
This can happen due to various reasons:
Application failure: This may occur if there is a bug in the application that caused the transaction to
fail.
System crash: A failure due to hardware or software issues leading to an unexpected system shutdown
while a transaction is still in progress.
Transaction constraints violation: If a transaction violates integrity constraints (e.g., a foreign key
violation), it cannot commit.
Deadlock: If a transaction cannot proceed because it is waiting for resources held by another
transaction, it may fail after being aborted by the system.
Transaction failure requires recovery mechanisms to undo any changes made by the failed transaction
to maintain database consistency.
System Crash: A system crash occurs when the operating system or database management system
(DBMS) fails due to software errors, hardware malfunctions, or other issues. The database may be
partially updated, but the hardware remains intact. Recovery typically involves redoing committed
transactions and undoing uncommitted ones.
Disk Failure: A disk failure happens when there is physical damage to the storage medium where the
database is stored, such as hard drive failure. This can lead to data corruption or loss. Disk failure
typically requires hardware repair, and recovery may involve restoring the database from a backup or
using a redundant storage system like RAID.
While both are failures, system crashes are typically recoverable through software mechanisms,
whereas disk failures may require hardware repair and possibly complete data restoration from
backups.
Stable-storage refers to a storage medium that can survive system failures, such as power outages or
crashes, and is used for logging and recovery purposes. It is often implemented using a combination of
techniques to ensure durability and persistence. Common implementations include:
Mirrored disks: Two separate disks are used to store data, ensuring that even if one disk fails, the
other holds the data. This redundancy ensures data is not lost in case of hardware failure.
RAID (Redundant Array of Independent Disks): This involves combining multiple disk drives to
improve data redundancy and performance, with several RAID levels offering different trade-offs
between performance and fault tolerance.
Write-Ahead Logging (WAL): In conjunction with the above, write-ahead logs are used to ensure that
data is written to a stable storage before it is updated in the database, providing durability and allowing
recovery from system crashes.
By combining these techniques, stable-storage ensures that database logs and critical data are not lost
during failures, which is crucial for transaction recovery.
Log-Based Recovery is a method used to restore the database to a consistent state after a failure. It
relies on maintaining a transaction log, which records all changes made to the database during
transaction execution. The log ensures that even if the system crashes, it can be used to replay
committed transactions and undo incomplete or failed transactions.
Write-Ahead Logging (WAL): Before modifying the database, a log entry describing the change is
written to a log file. This ensures that in the event of a crash, the changes can be either redone or
undone based on the log.
Redo Phase: After a crash, the log is analyzed to identify the transactions that were committed but
whose changes were not written to the database. These transactions are then reapplied to the database.
Undo Phase: If a transaction was in progress but not committed, its changes are undone using the log
to ensure consistency.
Log-based recovery provides a highly reliable method for handling system crashes and transaction
failures.
Deferred Database Modification: In this approach, updates to the database are not made until the
transaction commits. The changes are only written to the database after the commit, which ensures that
no partial changes are left in case of failure. This is useful for systems that prioritize performance and
simplicity of rollback.
Example:
o A transaction updates the salary of an employee, but this change is not written to the database until the
transaction is committed. If a crash happens before the commit, the changes are discarded.
Immediate Database Modification: In this approach, database updates are written to the database
immediately, even before the transaction commits. However, these changes are not considered
permanent until the transaction commits, and they can be rolled back if the transaction fails.
Example:
o A transaction updates an employee’s salary, and the change is immediately written to the database. If
the transaction fails before the commit, the system rolls back the changes using the log.
The primary difference is when the changes are actually applied to the database, with deferred
modification ensuring no changes are visible until a commit, while immediate modification allows
changes to be written instantly.
8. Write Short Notes On:
(a) Log Record Buffering: Log record buffering refers to temporarily storing log entries in memory
before writing them to disk. This improves performance by reducing disk I/O operations. The log
records are periodically flushed to stable storage to ensure durability in case of system failures.
(b) Database Buffering: Database buffering involves storing frequently accessed data in memory to
improve the performance of read and write operations. It reduces the need to access the slower disk
storage, speeding up transactions and queries.
(c) Checkpoints: A checkpoint is a point in time where the database state is recorded to ensure
consistency during recovery. During a checkpoint, all the modified pages are written to disk, and the
log is updated to reflect the committed transactions up to that point. This reduces the amount of work
required during recovery.
Volatile Storage: Volatile storage requires power to maintain data. When the power is lost, the data is
erased. RAM is a common example of volatile storage, where data is lost when the system shuts down
or crashes.
Non-Volatile Storage: Non-volatile storage retains data even when power is lost. Examples include
hard drives, solid-state drives (SSDs), and flash memory, which store data persistently and can be
recovered after a crash or power failure.
A Remote Backup System is a strategy used to protect data by maintaining a backup copy of the
database in a separate, geographically distant location. In case of a disaster such as fire, flood, or a
major hardware failure, the backup copy ensures that data is not lost. Remote backups can be done over
the internet or private networks, often using cloud storage solutions, ensuring that the data is safe even
if the primary system is compromised. Remote backups provide redundancy and improve data recovery
time in disaster recovery scenarios.
Unit 12: Distributed Databases
Unit 12 of the document focuses on Distributed Databases (DDBMS), a critical concept in modern database
management that enables efficient data distribution across multiple locations. It begins by introducing the
Distributed Database Management System (DDBMS), which manages databases spread across different
physical locations while presenting them as a single unified database to users.
The unit categorizes Types of Distributed Databases into Homogeneous DDBMS, where all databases use
the same software, and Heterogeneous DDBMS, where different database systems are used but are integrated
through middleware. This distinction is essential for understanding compatibility challenges in distributed
systems.
Another crucial topic is Data Replication, where copies of data are stored at multiple sites to enhance
availability and reliability. This leads to Fragmentation, which divides a database into smaller, more
manageable pieces that can be stored across different sites. Fragmentation is further classified into Horizontal
Fragmentation (storing subsets of rows) and Vertical Fragmentation (storing subsets of columns), helping
optimize data retrieval efficiency.
The unit also discusses Distribution Transparency, which ensures that users can access data without needing
to know its physical storage location. This is achieved through Location Transparency (data access without
knowing its location) and Replication Transparency (ensuring consistency among replicated copies).
Additionally, the document covers Database Control in DDBMS, focusing on concurrency control, deadlock
handling, and consistency management to prevent conflicts during distributed transactions. Query
Optimization in Distributed Databases is another key aspect, where the system selects the most efficient
way to execute queries across multiple locations.
The unit concludes by emphasizing the advantages of distributed databases, such as improved availability,
fault tolerance, and scalability, while also discussing the challenges, including synchronization issues,
increased complexity, and security risks.
Advantages:
Improved Reliability: Since the data is distributed across different locations, the failure of one node
does not lead to a complete loss of data, increasing the overall system's reliability and fault tolerance.
Increased Availability: Distributed databases are generally more available because they provide access
to data from multiple locations. Even if one site goes down, other sites may continue to function.
Scalability: As the database grows, more resources (storage or processing power) can be added to the
distributed system, thus improving its performance and handling of larger workloads.
Enhanced Performance: Local data processing can be done at the site closest to the user, leading to
better performance as queries can be processed locally without needing to access distant databases.
Flexibility: Distributed DBMSs support different types of distributed architectures (e.g., centralized,
decentralized), making it flexible to adapt to various business needs.
Disadvantages:
Complexity: Managing a distributed database system is more complex than a centralized one, as it
requires careful coordination, synchronization, and fault tolerance mechanisms.
Security Issues: With data spread across multiple sites, ensuring the security and integrity of the
database becomes more difficult. There is a higher risk of unauthorized access and data breaches.
Network Dependence: Since the system is distributed, it heavily relies on network connectivity.
Network issues, such as latency or outages, can severely affect the performance or availability of the
database.
Synchronization Overhead: Keeping the data synchronized across different nodes can be resource-
intensive and may cause delays or inconsistencies if not managed properly.
2. Features of DDBMS
Data Distribution: The database is distributed across different sites, allowing the system to store data
in multiple locations, often for performance, redundancy, or geographic considerations.
Autonomy: Each site in a distributed database may have control over its local database, but they
cooperate with other sites. Different sites may also be governed by different DBMS systems.
Replication: To increase reliability and availability, data can be replicated across different sites. This
ensures that copies of the same data are available even if a site becomes unavailable.
Transparency: A DDBMS provides various types of transparency, including location transparency
(users do not need to know where data is located) and access transparency (users can access data
without knowing how it is distributed).
Concurrency Control: Distributed systems must handle concurrency control in a way that ensures that
multiple users can access and modify data at the same time without conflict or inconsistency.
Fault Tolerance: DDBMSs are designed to handle failures at individual nodes or in the network. They
ensure that data can be recovered even when some parts of the system fail.
Efficiency: The system should optimize query execution to minimize response time and resource
consumption (e.g., CPU, memory, and bandwidth usage). This involves query optimization techniques
that consider the distributed nature of the data.
Load Balancing: The query processing should distribute the computational workload effectively across
different sites, preventing overloading of any single site.
Minimizing Data Transfer: Since data is spread across multiple sites, it is important to minimize the
amount of data transferred between sites to reduce network traffic and improve response times.
Handling Heterogeneity: Distributed systems may involve different types of databases and systems.
Query processing must handle these differences seamlessly.
Transparency: The user should not be aware of the distribution of data across multiple sites. The
system should present a unified interface to the user, hiding the complexity of distributed data storage
and query execution.
Horizontal fragmentation involves dividing a relation (table) into smaller, more manageable pieces
(fragments) based on rows. Each fragment consists of a subset of rows that satisfy certain conditions.
This allows for more efficient querying when users are interested in subsets of the data.
Vertical fragmentation involves splitting a relation into smaller pieces based on columns. Different
columns are stored in different fragments, and queries that require access to only certain columns can
access those fragments without needing to access the entire relation.
Horizontal fragmentation can be carried out in several ways:
1. Range-Based Fragmentation: Divides the relation based on ranges of values in a particular column. For example, students can be divided based on their year of study.
2. List-Based Fragmentation: Divides the relation into fragments based on a specific list of values. For
example, students can be grouped based on their course names.
3. Hash-Based Fragmentation: Uses a hash function to divide the data into fragments. This method
ensures an even distribution of data across fragments.
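As a rough sketch of these fragmentation styles in plain SQL (a real DDBMS would allocate fragments to sites through its own facilities; all names are illustrative):
-- Horizontal (range-based) fragment: a subset of rows.
CREATE TABLE students_year1 AS
SELECT * FROM students WHERE year_of_study = 1;
-- Vertical fragment: a subset of columns, repeating the key so the
-- original relation can be reconstructed with a join.
CREATE TABLE student_contact AS
SELECT student_id, email, phone FROM students;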
Correctness Criteria for Fragmentation:
Completeness: Every tuple in the relation must appear in at least one fragment, so that no data is lost.
Disjointness: The fragments must be disjoint; no tuple should appear in multiple fragments.
Reconstruction: It must be possible to reconstruct the original relation from its fragments.
A DDBMS provides several forms of distribution transparency:
1. Location Transparency: Users and applications should not need to know the physical location of the data. A query can be issued without specifying where the data resides.
o Example: A query for student records will return data regardless of whether the student’s information
is stored in one or multiple locations.
2. Fragmentation Transparency: Users should not be aware of how data is fragmented (horizontally or
vertically) across different sites.
o Example: A user queries a student’s information, and the system retrieves the data from multiple
fragmented pieces seamlessly.
3. Replication Transparency: Users should not be aware of the replication of data across sites.
o Example: A user accessing student records will see the same data whether it’s replicated across
multiple sites or stored in a single location.
4. Access Transparency: Users should not need to know the details of accessing the data, such as
whether it is distributed or centralized.
o Example: Users interact with the database in the same way regardless of whether the data resides on
one server or is distributed across multiple servers.
5. Concurrency Transparency: The system should ensure that multiple users can concurrently access
and modify data without affecting each other’s operations, ensuring consistency.
o Example: Two users accessing and modifying the same student record simultaneously should not cause
data conflicts or errors.
Distributed Deadlock Prevention: In this approach, the system prevents deadlocks by ensuring that
circular wait conditions cannot occur. It uses a variety of techniques like resource allocation ordering or
requiring transactions to request all resources at once, thus eliminating the possibility of circular wait.
Distributed Deadlock Avoidance: Deadlock avoidance techniques ensure that deadlocks never happen
by analyzing resource allocation patterns and deciding whether granting a resource will lead to a
deadlock situation. The system grants resources only if the resulting state is safe, i.e., it does not lead to
a cycle in the wait-for graph.
Deadlock Detection and Recovery Scheme: A typical deadlock detection scheme in a distributed
system uses a wait-for graph to detect cycles (i.e., deadlocks). Each node in the graph represents a
transaction, and edges represent resource requests. If a cycle is detected, a deadlock is identified, and
one or more transactions involved in the cycle are aborted and rolled back.
Homogeneous Database: In a homogeneous distributed database, all the sites use the same DBMS.
The systems are compatible and can interact seamlessly.
o Example: A company with several offices, each using the same DBMS, such as Oracle, to manage its
data.
Heterogeneous Database: In a heterogeneous distributed database, different sites may use different
DBMSs or different versions of the same DBMS. These systems must be able to interact and
communicate despite differences in data representation, DBMS types, or operating systems.
o Example: One office using MySQL while another uses SQL Server.
Data Integration: Different DBMSs at different sites must be able to exchange data, which may
involve using middleware, such as database gateways or APIs, to handle conversions between formats.
Query Processing: Query processing is more complex due to the differences in query languages and
database structures.
Link Failure: If a link between two sites in a distributed system fails, data cannot be exchanged
between the sites, potentially causing delays or loss of access to certain data. This can lead to
inconsistencies, as different sites may have outdated or incomplete data.
Network Partitioning: Network partitioning occurs when the network splits into disjoint sections,
preventing communication between certain nodes. This can lead to problems like split-brain
scenarios, where the system may mistakenly assume different parts of the database are independent and
may process transactions without coordinating, leading to inconsistency.
Recovery:
Replication: Data can be replicated across different sites, ensuring that even if one site is isolated, data
can be retrieved from other sites.
Quorum-Based Systems: A quorum mechanism can be used to ensure that a majority of sites agree on
the data, preventing inconsistencies during network partitioning.
Query processing in a distributed database typically proceeds through four phases:
1. Parsing: The query is parsed to check for syntax errors and to convert the query into an internal representation, such as a parse tree.
2. Optimization: The query is optimized to find the most efficient execution plan. In distributed systems,
this involves determining the best strategy for accessing distributed data, minimizing data transfer, and
optimizing resource utilization.
3. Execution: The query is executed based on the optimized plan. This phase involves data retrieval and
processing, potentially accessing multiple sites in the distributed database.
4. Result Aggregation: The results from different sites are collected and integrated to form a complete
result set, which is then sent back to the user.
Unit 13: Cloud Databases
Unit 13 of the document focuses on Cloud Databases, covering their fundamental concepts, service models,
risks, and strategic implementation. It begins by defining Cloud Computing as a model that enables on-
demand access to computing resources, such as storage and processing power, over the internet. Cloud
databases are databases that operate in cloud environments, offering scalability, flexibility, and cost
efficiency.
It also discusses the differences between cloud databases and traditional databases, emphasizing aspects
such as data distribution, maintenance, and performance optimization. Cloud databases provide benefits such
as automated backups, high availability, and disaster recovery, which make them a popular choice for
enterprises.
Security is a significant concern in cloud computing, and the document highlights risks associated with
cloud databases, including data breaches, unauthorized access, and compliance challenges. It emphasizes the
importance of cloud computing strategy planning, which involves selecting the appropriate cloud model,
ensuring data security, and optimizing costs.
The unit concludes by discussing the Software-as-a-Service (SaaS) model, which enables businesses to
access database applications via a web interface without requiring local infrastructure. Additionally, it
contrasts cloud computing with distributed computing, explaining how both concepts manage large-scale
data processing but differ in architecture and control.
Overall, Unit 13 underscores the growing role of cloud databases in modern computing, highlighting their
advantages, challenges, and best practices for effective implementation.
Cloud computing refers to the delivery of computing services such as storage, processing power,
databases, networking, software, and analytics over the internet, or "the cloud." Instead of owning and
maintaining physical infrastructure or data centers, businesses and individuals can access these services
on-demand through a cloud service provider. This allows users to scale up or down based on their
needs and only pay for the resources they use, leading to cost efficiency. Cloud computing eliminates
the need for extensive IT infrastructure, reduces operational costs, and increases agility and innovation
by offering services remotely via a network.
Cloud computing offers numerous benefits, making it a preferred choice for businesses and individuals
alike. Some of the key benefits include:
Cost Efficiency: With cloud computing, users can avoid the large upfront costs associated with
purchasing and maintaining physical hardware. Instead, they pay on a pay-as-you-go basis, which
reduces capital expenditures.
Scalability: Cloud services allow users to easily scale up or down based on demand. Whether it's
increasing storage capacity or processing power, cloud resources can be adjusted in real-time to meet
business requirements.
Flexibility: Cloud computing offers access to a wide range of services, enabling businesses to use
exactly what they need without being tied to specific infrastructure. The cloud also supports a variety of
operating systems and software platforms.
Accessibility: Cloud services are accessible from anywhere with an internet connection. This promotes
collaboration and remote working, as employees can access the cloud from multiple devices, whether
they are in the office or working remotely.
Disaster Recovery: Cloud providers offer robust backup and disaster recovery solutions, ensuring that
data is protected and easily recoverable in case of hardware failure or unforeseen incidents.
3) What is a Cloud?
A cloud, in the context of computing, refers to a system of virtualized servers and resources that are
hosted and managed remotely by cloud service providers. These resources, including computing power,
storage, and networking, are delivered over the internet to users or organizations on-demand. The cloud
eliminates the need for on-premises infrastructure, offering scalable and flexible services that can be
accessed anytime and from anywhere, provided there is an internet connection. Clouds can be private,
public, or hybrid, depending on their architecture and use cases.
In cloud computing, several types of data are handled, stored, and processed, depending on the nature
of the application and use case. Some of the most common data types include:
Structured Data: This is highly organized data, typically stored in databases. It follows a specific
format, such as rows and columns in relational databases (SQL). Examples include financial records
and customer information.
Unstructured Data: This refers to data that doesn't have a predefined structure. It may include text,
images, videos, emails, social media content, and documents. Cloud computing platforms can provide
storage and tools for managing such data.
Semi-Structured Data: This data type contains elements of both structured and unstructured data, such
as JSON or XML files. While it may not fit neatly into tables, it still has tags or markers to separate
data elements.
Big Data: Cloud computing is often used to process and analyze large volumes of data, including
structured, unstructured, and semi-structured data, using technologies like Hadoop, NoSQL databases,
and data lakes.
Real-Time Data: This is dynamic data that is constantly being updated and requires real-time
processing. Examples include sensor data from IoT devices, social media feeds, and financial market
data.
Cloud computing architecture is often divided into several layers, each responsible for different aspects
of the system. These layers include:
1. Infrastructure as a Service (IaaS): This is the foundational layer of cloud architecture, providing
virtualized computing resources such as servers, storage, and networking. IaaS allows businesses to
rent infrastructure and scale resources as needed. Examples include Amazon Web Services (AWS) and
Microsoft Azure.
2. Platform as a Service (PaaS): This layer provides a platform that allows developers to build, deploy,
and manage applications without worrying about underlying infrastructure. It includes tools for
application development, databases, and middleware. Examples include Google App Engine and
Heroku.
3. Software as a Service (SaaS): SaaS delivers fully managed software applications over the cloud,
allowing users to access and use them over the internet without installation. Examples include Gmail,
Salesforce, and Microsoft Office 365.
4. Cloud Management Layer: This layer is responsible for managing and orchestrating the deployment,
monitoring, and maintenance of cloud resources across the IaaS, PaaS, and SaaS layers.
5. Security Layer: This layer ensures that cloud infrastructure and services are secure from unauthorized
access, data breaches, and other threats. It includes encryption, firewalls, identity management, and
access control systems.
For large-scale cloud computing, several platforms are used that offer powerful computing capabilities
and resource management. These platforms are designed to scale effectively to meet the demands of
large organizations and enterprises. Some of the most popular platforms for large-scale cloud
computing include:
Amazon Web Services (AWS): AWS is a leading cloud platform offering services across computing,
storage, networking, databases, machine learning, and more. It is widely used for large-scale enterprise
applications and offers scalability and flexibility.
Microsoft Azure: Azure is another major cloud platform that provides a comprehensive set of tools for
building and deploying applications. It offers services similar to AWS and is popular among enterprises
that use Microsoft products.
Google Cloud Platform (GCP): Google Cloud provides high-performance computing, storage, and
analytics services. Its platform is particularly strong in data analytics, AI, and machine learning, and it
is used for large-scale data processing.
IBM Cloud: IBM Cloud offers a combination of IaaS, PaaS, and SaaS services, with a strong focus on
AI, IoT, and hybrid cloud deployments, making it suitable for large-scale enterprise solutions.
Alibaba Cloud: Alibaba Cloud provides a comprehensive suite of services, including compute,
networking, databases, and big data tools. It is widely used in Asia and is expanding globally for
large-scale enterprise applications.
Cloud computing offers different deployment models, each with varying degrees of control, security,
and flexibility. These models are:
1. Public Cloud: In a public cloud, the infrastructure and resources are owned and operated by a cloud
service provider and are made available to the general public. It is a cost-effective option for businesses
that do not require specific security or privacy controls. Examples include AWS, Google Cloud, and
Microsoft Azure.
2. Private Cloud: A private cloud is used exclusively by a single organization, providing more control
over security and customization. It can be hosted on-premises or by a third-party provider. This model
is preferred by businesses with strict regulatory or security requirements.
3. Hybrid Cloud: A hybrid cloud combines both private and public clouds, allowing data and
applications to be shared between them. This model enables businesses to use the public cloud for less-
sensitive workloads while keeping critical applications and data in a private cloud.
4. Community Cloud: A community cloud is shared by several organizations that have common
concerns, such as security, compliance, or data privacy. It allows multiple organizations to benefit from
shared resources while maintaining privacy.
Cloud computing provides several security measures to ensure data protection, confidentiality,
integrity, and availability. Key security aspects include:
Encryption: Data is encrypted both at rest and in transit, ensuring that sensitive information is
protected from unauthorized access.
Access Control: Role-based access control (RBAC) and identity management systems ensure that only
authorized users and systems can access specific data and resources.
Data Backup and Disaster Recovery: Cloud providers offer automated backups and disaster recovery
options, ensuring data is recoverable in case of system failure or accidental loss.
Firewall and Intrusion Detection: Cloud platforms implement firewalls and intrusion
detection/prevention systems to block unauthorized access and monitor malicious activity.
Compliance and Certifications: Cloud providers often comply with industry standards and regulations
such as GDPR, HIPAA, and SOC 2, ensuring data privacy and legal compliance.
Multi-Factor Authentication (MFA): MFA requires users to provide two or more forms of
verification, adding an extra layer of security to prevent unauthorized access.
Secure APIs: APIs used for cloud services are secured with encryption and authentication mechanisms
to prevent misuse and breaches.
The unit explains the structure of a PL/SQL block, which consists of three main sections:
Declarative Section – An optional section, introduced by DECLARE, where variables, cursors, and user-defined exceptions are declared.
Executable Section – The mandatory section, enclosed between BEGIN and END, containing the statements to execute.
Exception-Handling Section – An optional section, introduced by EXCEPTION, where runtime errors are trapped and handled.
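For reference, a minimal anonymous block showing all three sections (the employees table is the same sample table used in the examples that follow):
DECLARE                       -- declarative section
  v_name employees.first_name%TYPE;
BEGIN                         -- executable section
  SELECT first_name INTO v_name FROM employees WHERE employee_id = 101;
  DBMS_OUTPUT.PUT_LINE('Name: ' || v_name);
EXCEPTION                     -- exception-handling section
  WHEN NO_DATA_FOUND THEN
    DBMS_OUTPUT.PUT_LINE('No such employee.');
END;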
The unit also discusses PL/SQL subprograms, including procedures and functions. Procedures are named
PL/SQL blocks that execute specific tasks, while functions return values and can be used within SQL queries.
Another key concept covered is cursors, which enable row-by-row processing of query results. The unit
differentiates between:
Implicit Cursors – Automatically created by PL/SQL when executing SQL statements.
Explicit Cursors – Defined by developers to manually control query execution.
Triggers, which are automatically executed PL/SQL blocks triggered by database events such as INSERT,
UPDATE, or DELETE operations, are also explored. The unit explains two types of triggers:
Row-Level Triggers – Fired once for each row affected by the triggering statement (declared with FOR EACH ROW).
Statement-Level Triggers – Fired once per triggering statement, regardless of how many rows it affects.
The unit concludes by discussing error handling in PL/SQL, which ensures that exceptions are managed
efficiently using built-in exception types like NO_DATA_FOUND and DUP_VAL_ON_INDEX.
Overall, Unit 14 highlights the advantages of PL/SQL, including improved performance, reduced network
traffic, and enhanced security, making it a powerful tool for database programming.
Both %ROWTYPE and TYPE RECORD are used in PL/SQL to define composite variables that can
hold multiple values, but they have different purposes and behaviors.
%ROWTYPE: This is a built-in attribute used to declare a record that can hold an entire row of a table
or view. It automatically assumes the structure (column names and data types) of the table or view. For
example, if you want to fetch and store a full row from the employees table, you can declare a variable
of type employees%ROWTYPE, which will match the exact column types and names of the employees
table.
Example:
DECLARE
  emp_record employees%ROWTYPE;
BEGIN
  SELECT * INTO emp_record FROM employees WHERE employee_id = 101;
END;
TYPE RECORD: This is a user-defined record type, which is explicitly defined by the developer.
Unlike %ROWTYPE, a TYPE RECORD allows you to define a custom structure with specific
variables that do not need to correspond to an actual table or view. You define the record type by
explicitly stating the fields and their respective data types.
Example:
DECLARE
  TYPE emp_record_type IS RECORD (
    emp_id     employees.employee_id%TYPE,
    emp_name   employees.first_name%TYPE,
    emp_salary employees.salary%TYPE
  );
  emp_record emp_record_type;
BEGIN
  SELECT employee_id, first_name, salary
  INTO emp_record.emp_id, emp_record.emp_name, emp_record.emp_salary
  FROM employees
  WHERE employee_id = 101;
END;
The key difference is that %ROWTYPE is directly tied to a database table or view's structure, while
TYPE RECORD is user-defined and can be tailored for any custom structure.
A cursor in PL/SQL is a pointer used to retrieve, manipulate, and manage a set of data returned by a
query. It allows PL/SQL to process data row-by-row, which is useful when performing operations that
involve complex data retrieval or iterative processing. The primary uses of cursors include:
Fetching rows from a database: A cursor is often used to fetch data from the database one row at a
time for further processing.
Processing query results: Cursors help in processing large volumes of data by iterating over the result
set, allowing the developer to perform actions such as calculations or business logic on each row.
Handling multiple rows: When a query returns more than one row, a cursor allows sequential access
to each row, making it easier to work with data that doesn't fit in a single variable.
PL/SQL supports two kinds of cursors:
Explicit Cursor: The developer defines an explicit cursor for complex queries, gaining greater
flexibility to open, fetch from, and close the cursor in a procedural block.
Implicit Cursor: PL/SQL automatically creates an implicit cursor for SQL statements executed inside
a PL/SQL block, with no need for the developer to define one.
Example of an explicit cursor:
DECLARE
  CURSOR emp_cursor IS SELECT employee_id, first_name FROM employees;
  emp_record emp_cursor%ROWTYPE;
BEGIN
  OPEN emp_cursor;
  LOOP
    FETCH emp_cursor INTO emp_record;
    EXIT WHEN emp_cursor%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE('Employee ID: ' || emp_record.employee_id ||
                         ' Name: ' || emp_record.first_name);
  END LOOP;
  CLOSE emp_cursor;
END;
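For contrast, a minimal sketch of an implicit cursor (the department_id column and the 10% raise are illustrative): PL/SQL opens the cursor automatically for the UPDATE, and its SQL%ROWCOUNT attribute reports how many rows were affected.
BEGIN
  -- PL/SQL creates an implicit cursor for this UPDATE automatically
  UPDATE employees SET salary = salary * 1.10 WHERE department_id = 10;
  -- SQL%ROWCOUNT is an attribute of that implicit cursor
  DBMS_OUTPUT.PUT_LINE('Rows updated: ' || SQL%ROWCOUNT);
  ROLLBACK;  -- undo the illustrative change
END;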
A database trigger is a special kind of stored procedure in PL/SQL that is automatically executed or
"fired" in response to certain events or changes in the database. The main uses of triggers are:
Data Integrity and Validation: Triggers can enforce complex business rules and data validation
checks whenever a specific operation (INSERT, UPDATE, DELETE) is performed on a table. For
instance, a trigger might ensure that an employee's salary cannot be negative when an update is made.
Auditing and Logging: Triggers are often used for logging changes to important tables. For example,
whenever a record is updated or deleted, a trigger can store the old values in an audit table for future
reference.
Enforcing Referential Integrity: Triggers can automatically enforce integrity constraints between
tables, such as ensuring that a referenced key exists before allowing an insert or update.
Automatic Calculation: Triggers can be used for automatic calculations. For example, after a sale is
recorded, a trigger might automatically update the inventory table to reflect the new stock levels.
Example of a trigger:
CREATE OR REPLACE TRIGGER emp_salary_check
BEFORE INSERT OR UPDATE ON employees
FOR EACH ROW
BEGIN
  IF :NEW.salary < 0 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
  END IF;
END;
Predefined Exceptions: These are exceptions that PL/SQL has already defined. They are automatically
raised when specific errors occur during the execution of a PL/SQL block. Examples include
NO_DATA_FOUND, TOO_MANY_ROWS, and ZERO_DIVIDE.
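A brief sketch of catching predefined exceptions (the query against the sample employees table is illustrative):
DECLARE
  v_salary employees.salary%TYPE;
BEGIN
  SELECT salary INTO v_salary FROM employees WHERE employee_id = 999;
  DBMS_OUTPUT.PUT_LINE('Salary: ' || v_salary);
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    DBMS_OUTPUT.PUT_LINE('No matching employee.');        -- raised if no row is found
  WHEN TOO_MANY_ROWS THEN
    DBMS_OUTPUT.PUT_LINE('More than one row returned.');  -- raised if several rows match
END;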
User-Defined Exceptions: These are exceptions defined by the user to handle specific error conditions
that are not covered by predefined exceptions. To create a user-defined exception, you define an
exception name and then raise it explicitly in the code when certain conditions are met.
Example:
DECLARE
  insufficient_funds EXCEPTION;
  balance            NUMBER := 100;  -- sample values for illustration
  withdrawal_amount  NUMBER := 500;
BEGIN
  IF balance < withdrawal_amount THEN
    RAISE insufficient_funds;
  END IF;
EXCEPTION
  WHEN insufficient_funds THEN
    DBMS_OUTPUT.PUT_LINE('Insufficient funds for withdrawal.');
END;
In PL/SQL, functions and procedures are reusable blocks of code that can be called within a PL/SQL
anonymous block, other procedures, or functions. They allow for modularization and reuse of code.
Calling a Function: A function is called by specifying its name and passing any necessary arguments.
The return value of the function can be assigned to a variable.
Example:
DECLARE
  result NUMBER;
BEGIN
  result := calculate_bonus(5000, 0.10);  -- calling a function
  DBMS_OUTPUT.PUT_LINE('Bonus: ' || result);
END;
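The function calculate_bonus is not shown in the unit; a minimal sketch of how it might be defined, assuming the bonus is simply the salary multiplied by a rate:
CREATE OR REPLACE FUNCTION calculate_bonus (
  p_salary NUMBER,
  p_rate   NUMBER
) RETURN NUMBER IS
BEGIN
  RETURN p_salary * p_rate;  -- hypothetical bonus formula
END calculate_bonus;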
Calling a Procedure: A procedure is invoked by name inside a PL/SQL block (or with EXECUTE in
SQL*Plus, or the SQL CALL statement). It does not return a value, but it may modify the state of the
database or its OUT parameters.
Example:
DECLARE
  emp_id NUMBER := 101;
BEGIN
  increase_salary(emp_id, 500);  -- calling a procedure
END;
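Similarly, increase_salary is assumed to exist; one possible sketch, assuming it simply adds a fixed amount to the employee's salary:
CREATE OR REPLACE PROCEDURE increase_salary (
  p_emp_id NUMBER,
  p_amount NUMBER
) IS
BEGIN
  UPDATE employees
  SET salary = salary + p_amount      -- hypothetical raise logic
  WHERE employee_id = p_emp_id;
END increase_salary;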
7) Explain Two Virtual Tables Available at the Time of Database Trigger Execution
In the context of database triggers, two virtual tables are available during trigger execution: :NEW and
:OLD. Strictly speaking, these are pseudo-records rather than tables; they reference the values of the
affected row before and after the triggering event.
:NEW: This virtual table holds the new values of the fields in the row that is being inserted or updated.
In an INSERT trigger, :NEW contains the new row being added. In an UPDATE trigger, it contains the
updated values of the modified row.
:OLD: This virtual table holds the original values of the fields in the row before the update or delete
operation. In a DELETE trigger, :OLD contains the values of the row being deleted. In an UPDATE
trigger, it contains the original values before the change.
Example:
CREATE OR REPLACE TRIGGER salary_audit
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
  IF :OLD.salary <> :NEW.salary THEN
    INSERT INTO salary_changes (employee_id, old_salary, new_salary, change_date)
    VALUES (:OLD.employee_id, :OLD.salary, :NEW.salary, SYSDATE);
  END IF;
END;
Here, :OLD and :NEW are used to detect and record changes to an employee's salary.