MCS-207 2024-25
SOLVED ASSIGNMENT
There are five questions in this assignment, which carry 80 marks. The remaining 20 marks are
for viva voce. You may use illustrations and diagrams to enhance the explanations.
Please go through the guidelines regarding assignments given in the Programme Guide
for the format of the presentation. The answer to each part of a question should be
confined to about 300 words. Make suitable assumptions, if any.
Question 1: (Covers Block 1) (4+4+4+4+4=20 Marks)
(a) Explain the three level DBMS architecture with the help of an example. Also, explain
the concept of data independence in the context of database systems with the help of an
example.
Ans. Three-Level DBMS Architecture
1. Internal Level: This is the lowest level, representing how data is physically stored in the
system. It manages data storage, indexing, and memory allocation. For example, data might
be stored as binary files or on disks in blocks.
2. Conceptual Level: This middle layer provides a community view of the entire database,
abstracting away details of physical storage. It defines what data is stored and the
relationships between different data types. For instance, in a college database, tables like
"Students" and "Courses" would be defined, but physical storage details would be hidden.
3. External Level: This topmost level shows how individual users or applications view the
data. Different users can see different views of the same database. For example, a professor
may see only students' academic records, while an administrator might see financial
information.
Data Independence
Data independence allows changes to one level of the DBMS without affecting others. There
are two types:
- Logical Data Independence: Changing the conceptual schema (e.g., adding a field to a table)
without affecting external views.
- Physical Data Independence: Changing the internal schema (e.g., how data is stored)
without altering the conceptual schema.
For example, the physical storage format of student records can be changed, but users
accessing those records via SQL queries remain unaffected.
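As a sketch in SQL (the table and column names are assumed, not from the original), an external view defined over the conceptual schema insulates its users from later changes to the base table:
```
-- Conceptual level: a hypothetical Students table
CREATE TABLE Students (
    Student_ID INT PRIMARY KEY,
    Name       VARCHAR(100),
    Marks      INT
);

-- External level: a professor's view of academic data only.
-- If the internal storage changes, or a new column (say, FeeBalance)
-- is later added to Students, this view and the queries written
-- against it continue to work: data independence in action.
CREATE VIEW AcademicRecords AS
SELECT Student_ID, Name, Marks
FROM Students;
```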
(b) Explain the following terms in the context of a relational model with the help of one
example of each – Super key, Domain, Cartesian Product, Primary Key, Natural join,
Set Intersection, Set Difference operation and referential integrity constraint.
Ans. Relational Model Terms
1. Super Key: A super key is a set of one or more attributes that uniquely identify a tuple
(row) in a table.
Example: In a table Students, a combination of attributes like (Student_ID, Email) can form
a super key because both together uniquely identify each student.
2. Domain: A domain is the set of permissible values that an attribute can take.
Example: In a Students table, the domain of the attribute Age might be the set of integers between 17 and 60.
3. Cartesian Product: This is the combination of all possible pairs of rows from two tables.
Example: If Students has 3 rows and Courses has 2 rows, the Cartesian product of Students × Courses will produce 6 rows.
4. Primary Key: A primary key is a minimal super key that uniquely identifies each row in a
table.
Example: In a Students table, the attribute Student_ID can serve as the primary key because
it uniquely identifies each student.
5. Natural Join: A natural join combines two tables on their common attributes, keeping only one copy of the shared columns.
Example: Joining Students and Courses based on a common column like Student_ID will
merge the tables where Student_ID matches.
6. Set Intersection: This operation returns the rows that are common to both tables.
Example: If Table A and Table B both have common student records, the intersection will return only those common records.
7. Set Difference: This operation returns the rows present in one table but not in the other.
Example: If Table A has students enrolled in Course A, and Table B has students enrolled in
Course B, the difference between Table A and Table B will show students enrolled only in
Course A.
8. Referential Integrity Constraint: This ensures that a foreign key in one table must have a
corresponding value in the referenced primary key of another table.
Example: In a Courses table, the Student_ID (foreign key) must match a valid Student_ID
in the Students table.
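Several of these terms map directly onto SQL declarations; a minimal sketch, with illustrative table names:
```
CREATE TABLE Students (
    Student_ID INT PRIMARY KEY,        -- primary key: a minimal super key
    Email      VARCHAR(100) UNIQUE     -- (Student_ID, Email) is also a super key
);

CREATE TABLE Enrolments (
    Student_ID INT,
    Course_ID  INT,
    PRIMARY KEY (Student_ID, Course_ID),
    -- Referential integrity: every Student_ID here must exist in Students
    FOREIGN KEY (Student_ID) REFERENCES Students(Student_ID)
);
```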
(c) A University maintains the list of the books available in its library using a database
system. In addition, this system is used for issue and return of books to its students. This
database is used to find the following details by the students of the university and the
staff of the library:
• List of the classification number, ISBN number, Title, Author Names, Subject Area of
the books.
• Searching of books using subject area, Title and Author name.
• List of books that are issued to a specific student.
Draw an ER diagram for the library. Specify key attributes and constraints on each entity type and on each relationship type. Note any unspecified requirements and make appropriate assumptions to make the specification complete.
Ans. ER Diagram for Library Database System
The university library system tracks books, students, and book transactions (issue and
return). The entities, relationships, key attributes, and constraints of the ER design are
described below.
Entities:
1. Book:
- Attributes:
- ISBN (Primary Key): Unique identifier for each book.
- Classification Number: A code representing the categorization of the book.
- Title: The name of the book.
- Author Names: Names of the book’s authors.
- Subject Area: The topic the book covers.
- Constraints:
- A book must have one ISBN and one classification number.
- A book can have multiple authors.
2. Student:
- Attributes:
- Student_ID (Primary Key): Unique identifier for each student.
- Name: Name of the student.
- Department: Academic department to which the student belongs.
- Year of Study: Year the student is currently in.
- Constraints:
- A student must have a unique Student_ID.
Relationships:
1. Written_By (between Book and Author):
- An author can write multiple books, and a book can have multiple authors (M:N).
2. Issued_To (between Student and Book):
- Constraints:
- Each book must have a unique ISBN.
- A book can only be issued to one student at a time.
- A student cannot issue more than a predefined number of books (assume 5 books).
This model captures the essential functions of the system: listing, searching, and tracking
book transactions. By enforcing referential integrity, the system ensures that issued books are
properly returned and that all records are consistent.
(d) Design normalised tables in 3NF for the ER diagram drawn in part (c), with the required integrity constraints.
Ans. Normalized Tables in 3NF with Integrity Constraints
Based on the ER diagram for the university library system, the normalized tables in Third
Normal Form (3NF) are:
1. Book
- Attributes:
- ISBN (Primary Key): Unique identifier for each book.
- Classification_Number: Classification code for the book.
- Title: Title of the book.
- Subject_Area: Subject area of the book.
- Constraints:
- ISBN is the primary key.
- Classification_Number, Title, and Subject_Area are fully functionally dependent on ISBN (no partial or transitive dependencies).
2. Author
- Attributes:
- Author_ID (Primary Key): Unique identifier for each author.
- Author_Name: Name of the author.
- Constraints:
- Author_ID is the primary key.
- Author_Name must be unique.
3. Book_Author
- Attributes:
- ISBN (Foreign Key): References Book.ISBN.
- Author_ID (Foreign Key): References Author.Author_ID.
- Constraints:
- Composite primary key (ISBN, Author_ID).
- Ensures that each book-author pair is unique.
4. Student
- Attributes:
- Student_ID (Primary Key): Unique identifier for each student.
- Name: Name of the student.
- Department: Department to which the student belongs.
- Year_of_Study: Academic year of the student.
- Constraints:
- Student_ID is the primary key.
5. Transaction
- Attributes:
- Transaction_ID (Primary Key): Unique identifier for each transaction.
- ISBN (Foreign Key): References Book.ISBN.
- Student_ID (Foreign Key): References Student.Student_ID.
- Issue_Date: Date when the book was issued.
- Return_Date: Date when the book was returned (nullable).
- Constraints:
- Transaction_ID is the primary key.
- ISBN and Student_ID together with Issue_Date ensure accurate tracking of issued books.
Integrity Constraints
- Referential Integrity:
- ISBN in Book_Author and Transaction references Book.ISBN.
- Student_ID in Transaction references Student.Student_ID.
- Uniqueness:
- ISBN and Author_ID combination in Book_Author must be unique.
- Each Student_ID and Transaction_ID must be unique.
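These tables and constraints can be expressed in SQL DDL; a sketch with assumed data types (the Transaction table is renamed Book_Transaction here because TRANSACTION is a reserved word in several SQL dialects):
```
CREATE TABLE Book (
    ISBN                  VARCHAR(20)  PRIMARY KEY,
    Classification_Number VARCHAR(20)  NOT NULL,
    Title                 VARCHAR(200) NOT NULL,
    Subject_Area          VARCHAR(100)
);

CREATE TABLE Author (
    Author_ID   INT PRIMARY KEY,
    Author_Name VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE Book_Author (
    ISBN      VARCHAR(20),
    Author_ID INT,
    PRIMARY KEY (ISBN, Author_ID),                       -- composite key
    FOREIGN KEY (ISBN)      REFERENCES Book(ISBN),       -- referential integrity
    FOREIGN KEY (Author_ID) REFERENCES Author(Author_ID)
);

CREATE TABLE Student (
    Student_ID    INT PRIMARY KEY,
    Name          VARCHAR(100) NOT NULL,
    Department    VARCHAR(100),
    Year_of_Study INT
);

CREATE TABLE Book_Transaction (
    Transaction_ID INT PRIMARY KEY,
    ISBN           VARCHAR(20) NOT NULL,
    Student_ID     INT NOT NULL,
    Issue_Date     DATE NOT NULL,
    Return_Date    DATE,                                 -- nullable until returned
    FOREIGN KEY (ISBN)       REFERENCES Book(ISBN),
    FOREIGN KEY (Student_ID) REFERENCES Student(Student_ID)
);
```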
(e) Explain how the secondary index can be created in a file. Also, explain the
advantages and disadvantages of using secondary indexes. When should you use
secondary indexes? Give reasons in support of your answer.
Ans. Creating a Secondary Index
A secondary index is created to improve query performance for non-primary key attributes in
a file. Here's how it can be created:
1. Select the Attribute: Choose the attribute (non-primary key) that frequently appears in
query conditions.
2. Create the Index: Build a separate index file with entries containing the values of the
chosen attribute and pointers (addresses) to the corresponding records in the main file. For
example, if indexing the Author_Name attribute, the secondary index will list Author_Name
values and their record addresses.
3. Maintain the Index: Update the secondary index whenever records are added, deleted, or
modified in the main file to ensure it remains synchronized.
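In SQL, a secondary index is created with CREATE INDEX; a sketch, assuming the library tables above:
```
-- Secondary index on a non-primary-key attribute.
-- The DBMS builds the index file and keeps it synchronized automatically.
CREATE INDEX idx_author_name ON Author (Author_Name);

-- A search on Author_Name can now use the index instead of a full scan:
SELECT * FROM Author WHERE Author_Name = 'C. J. Date';
```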
Advantages:
- Faster Query Performance: Speeds up searches, especially for non-primary key attributes.
- Improved Sorting and Filtering: Allows efficient sorting and filtering based on non-primary
key attributes.
Disadvantages:
- Increased Storage: Requires additional storage for the index file.
- Performance Overhead: Can slow down insert, update, and delete operations due to the need
to maintain the index.
When to Use Secondary Indexes:
Use secondary indexes when queries frequently search, filter, or sort on non-primary-key attributes, and when the table is large enough that full scans are costly. Reasons in support:
- They enhance performance for specific query patterns.
- They help in optimizing the retrieval of data based on attributes other than the primary key,
improving overall efficiency.
However, avoid excessive indexing, as it can lead to significant maintenance costs and
storage overhead. Use secondary indexes selectively to balance query performance and
system resources.
The primary key for this relation is a composite key: (EnrolNo, CourseCode). This
combination uniquely identifies each record because:
- EnrolNo identifies each student uniquely.
- CourseCode identifies the course in which that student is enrolled; a student has one row per course.
Redundancies:
- ProgrammeName is repeated for each student in the same programme.
- CourseName is repeated for every student enrolled in the same course.
3NF Decomposition
In 3NF, all attributes in each relation are fully functionally dependent on the primary key, and there are no transitive dependencies. The relation is decomposed as follows:
1. Student:
- EnrolNo (Primary Key)
- StudentName
- ProgrammeCode (Foreign Key)
2. Programme:
- ProgrammeCode (Primary Key)
- ProgrammeName
3. Course:
- CourseCode (Primary Key)
- CourseName
4. Enrollment:
- EnrolNo (Foreign Key)
- CourseCode (Foreign Key)
- Grade
with (EnrolNo, CourseCode) as the composite primary key of Enrollment.
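A DDL sketch of this decomposition (data types are assumed):
```
CREATE TABLE Programme (
    ProgrammeCode VARCHAR(10) PRIMARY KEY,
    ProgrammeName VARCHAR(100)
);

CREATE TABLE Student (
    EnrolNo       VARCHAR(10) PRIMARY KEY,
    StudentName   VARCHAR(100),
    ProgrammeCode VARCHAR(10) REFERENCES Programme(ProgrammeCode)
);

CREATE TABLE Course (
    CourseCode VARCHAR(10) PRIMARY KEY,
    CourseName VARCHAR(100)
);

CREATE TABLE Enrollment (
    EnrolNo    VARCHAR(10) REFERENCES Student(EnrolNo),
    CourseCode VARCHAR(10) REFERENCES Course(CourseCode),
    Grade      CHAR(2),
    PRIMARY KEY (EnrolNo, CourseCode)
);
```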
(b) Explain the concept of Multi-valued dependency and Join dependency with the help
of an example of each. Also, explain the 4th Normal Form and 5th Normal form.
Ans. Multi-Valued Dependency (MVD) and Join Dependency
Multi-Valued Dependency
Concept: A multi-valued dependency `A` →→ `B` holds in a relation when, for each value of `A`, there is a set of associated `B` values that is independent of the remaining attributes.
Example:
Consider a relation `R(A, B, C)` where:
- `A` →→ `B`
- `A` →→ `C`
This implies that for each value of `A`, there can be multiple values of `B` and multiple values of `C`, varying independently of each other.
Explanation: For `A = 1`, the values of `B` and `C` can vary independently, showing a multi-valued dependency.
Join Dependency
Concept: A join dependency occurs when a relation can be decomposed into multiple relations such that the original relation can be reconstructed, without any loss of information, by joining these decomposed relations.
Example:
Consider a relation `R(A, B, C)` with join dependency if `R` can be decomposed into `R1(A,
B)` and `R2(A, C)`, and joining `R1` and `R2` on attribute `A` will yield the original relation
`R`.
Decomposition (with an illustrative instance):
- `R1(A, B)` = {(1, X), (1, Y)}
- `R2(A, C)` = {(1, M), (1, N)}
Join:
- `R1` ⨝ `R2` on `A` reconstructs the original relation.
4th Normal Form (4NF)
Concept: A relation is in 4NF if it is in Boyce-Codd Normal Form (BCNF) and has no non-trivial multi-valued dependencies other than those implied by candidate keys.
Example:
Consider a relation `R(A, B, C)` with the multi-valued dependencies `A` →→ `B` and `A` →→ `C`, where `A` is not a superkey (the only candidate key is the entire set of attributes).
Decomposition:
- `R1(A, B)`
- `R2(A, C)`
Each decomposed relation is in 4NF because there are no non-trivial multi-valued
dependencies.
5th Normal Form (5NF)
Concept: A relation is in 5NF (or Project-Join Normal Form) if it is in 4NF and cannot be decomposed further without losing information. It deals with join dependencies, ensuring that every join dependency in the relation is a consequence of the candidate keys.
Example:
Consider a relation `R(A, B, C, D)` with join dependency where:
- `R` can be decomposed into `R1(A, B)`, `R2(B, C)`, and `R3(C, D)`.
Decomposition (with an illustrative instance):
- `R1(A, B)` = {(1, X), (1, Y)}
- `R2(B, C)` = {(X, M), (Y, M)}
- `R3(C, D)` = {(M, N)}
Join: Joining `R1`, `R2`, and `R3` on their common attributes reconstructs the original
relation, demonstrating that it is in 5NF.
(c) Explain the following terms with the help of an example of each – Assertion, Cursor,
Stored Procedure, Triggers.
Ans. Assertion, Cursor, Stored Procedure, and Triggers
1. Assertion
Concept: An assertion is a condition or constraint that must always hold true for a database. It
enforces business rules at the database level.
Example: Consider a database where employees should not have a salary greater than
$100,000. In standard SQL (SQL-92) such an assertion could be written as below; note that most commercial DBMSs do not implement CREATE ASSERTION and enforce such rules with CHECK constraints or triggers instead. The Employee table is assumed:
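```
-- Standard SQL-92 syntax; Employee(Salary) is an assumed table.
CREATE ASSERTION salary_limit
CHECK (NOT EXISTS (
    SELECT * FROM Employee WHERE Salary > 100000
));
```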
2. Cursor
Concept: A cursor is a database object used to retrieve, manipulate, and navigate through a
result set row by row.
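Example: a sketch in PL/SQL-style syntax (the Employee table and its columns are assumed); the cursor loop fetches and processes the result set one row at a time:
```
DECLARE
    CURSOR emp_cur IS
        SELECT EmpName, Salary FROM Employee;   -- assumed table
BEGIN
    FOR emp IN emp_cur LOOP                     -- opens, fetches row by row, closes
        DBMS_OUTPUT.PUT_LINE(emp.EmpName || ': ' || emp.Salary);
    END LOOP;
END;
```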
3. Stored Procedure
Concept: A stored procedure is a named, precompiled set of SQL statements stored in the database. It can accept parameters and is invoked explicitly by applications or users, reducing network traffic and centralizing business logic.
Example: a sketch follows in PL/SQL-style syntax (table and column names are assumed):
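```
-- A sketch in PL/SQL-style syntax; Employee(EmpID, Salary) is assumed.
CREATE OR REPLACE PROCEDURE give_raise (
    p_emp_id IN INT,
    p_amount IN NUMBER
) AS
BEGIN
    UPDATE Employee
    SET Salary = Salary + p_amount
    WHERE EmpID = p_emp_id;
END;
-- Invoked, for example, as: CALL give_raise(101, 5000);
```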
4. Trigger
Concept: A trigger is a stored program that the DBMS executes automatically when a specified event (such as INSERT, UPDATE, or DELETE) occurs on a table, typically to enforce rules or maintain an audit trail.
Example: a sketch in PL/SQL-style syntax (the Employee and Salary_Audit tables are assumed):
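```
-- A sketch; Employee and Salary_Audit are assumed tables.
CREATE OR REPLACE TRIGGER trg_salary_audit
AFTER UPDATE OF Salary ON Employee        -- fires automatically on salary changes
FOR EACH ROW
BEGIN
    INSERT INTO Salary_Audit (EmpID, OldSalary, NewSalary, ChangedOn)
    VALUES (:OLD.EmpID, :OLD.Salary, :NEW.Salary, SYSDATE);
END;
```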
(x) Customers Who Have Not Made Any Credit Transaction Since January 1, 2023
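A sketch of this query, assuming the schema implied by the query-tree question in part (g): Customer(custId, custName), Account(AccountNumber, custId), and Transaction(AccountNumber, DebitORCredit, Amount), plus an assumed TransactionDate column for the date filter:
```
SELECT c.custId, c.custName
FROM Customer c
WHERE c.custId NOT IN (
    SELECT a.custId
    FROM Account a
    JOIN Transaction t ON a.AccountNumber = t.AccountNumber
    WHERE t.DebitORCredit = 'Credit'                 -- credit transactions only
      AND t.TransactionDate >= DATE '2023-01-01'     -- assumed date column
);
```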
These queries cover a range of operations including table creation, data insertion, and
complex data retrieval.
1. Atomicity: This property ensures that a transaction is treated as a single, indivisible unit: either all of its operations take effect or none do. For example, in a funds transfer, the debit and the credit must both succeed; if either fails, the entire transaction is rolled back.
2. Consistency: This property ensures that a transaction brings the database from one valid
state to another. For instance, if a database constraint enforces a rule (say, that the total of
all account balances is preserved), a transaction that transfers funds must respect it,
ensuring database integrity before and after the transaction.
3. Isolation: This property ensures that concurrent transactions do not interfere with each
other. For example, if two transactions simultaneously attempt to update the same account
balance, isolation ensures each transaction is executed in a way that they do not affect each
other's results.
4. Durability: This property ensures that once a transaction is committed, its changes persist
even in the case of a system crash. For example, once a payment is processed and committed,
it remains in the database even if the system shuts down immediately afterward.
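As a concrete sketch (account numbers are illustrative, and some DBMSs write BEGIN TRANSACTION instead of START TRANSACTION):
```
START TRANSACTION;

UPDATE Account SET Balance = Balance - 500
WHERE AccountNumber = 'A101';              -- debit one account

UPDATE Account SET Balance = Balance + 500
WHERE AccountNumber = 'A202';              -- credit the other

COMMIT;  -- atomicity: if anything fails before this, ROLLBACK undoes both updates;
         -- durability: after COMMIT the transfer survives a crash
```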
(b) What are the problems that can be encountered, if the three transactions (given in
Figure 1) are run concurrently? Explain with the help of different transaction schedules
of these transactions.
Ans. When transactions run concurrently without concurrency control, the classic problems are:
- Lost Update: Two transactions read the same value and write back results; one update overwrites and destroys the other.
- Dirty Read (Uncommitted Dependency): A transaction reads a value written by another transaction that later aborts, so it has used data that never officially existed.
- Incorrect Summary / Unrepeatable Read: A transaction computing an aggregate reads some values before and some after another transaction's updates, producing an inconsistent result.
Whether these problems occur depends on how the operations of transactions A, B and C on the shared data items are interleaved in the schedule.
(c) What is 2-Phase locking? Lock and unlock various data items of the transactions,
given in Figure 1, using 2-Phase locking such that no concurrency related problem
occurs, when the transactions A, B and C are executed concurrently.
Ans. 2-Phase Locking Protocol:
The 2-Phase Locking (2PL) protocol is a concurrency control mechanism that ensures
serializability of transactions by dividing the locking process into two distinct phases:
1. Growing Phase: A transaction can acquire any number of locks but cannot release any
locks.
2. Shrinking Phase: A transaction can release locks but cannot acquire any more.
Applying 2PL to the three transactions of Figure 1 (data items X, Y and Z assumed):
1. Transaction A:
- Growing Phase: Lock X, then lock Y.
- Shrinking Phase: Release Y, then release X.
2. Transaction B:
- Growing Phase: Lock Y, then lock Z.
- Shrinking Phase: Release Z, then release Y.
3. Transaction C:
- Growing Phase: Lock X, then lock Z.
- Shrinking Phase: Release Z, then release X.
Locking Order:
1. Transaction A locks X, then Y.
2. Transaction B waits until A releases Y before locking Y, then locks Z.
3. Transaction C waits until A releases X before locking X, then locks Z.
By following this protocol, we ensure that no two transactions will interfere with each other’s
locks in a way that would cause deadlock or inconsistency.
(d) Explain the Log based Recovery with the help of an example. What are Redo and
Undo operations? Why do you need checkpoints? Explain the process of recovery with
checkpoints with the help of an example.
Ans. Log-Based Recovery:
Log-based recovery uses a log file to keep track of all changes made during transactions,
enabling the database to recover to a consistent state after a crash. Each transaction’s
operations are recorded in a log, which includes details about data modifications.
Redo and Undo Operations:
- Redo: Re-applies the changes of committed transactions from the log (writing the new values) so that no committed work is lost after a crash.
- Undo: Rolls back the changes of uncommitted transactions (restoring the old values recorded in the log) so that no partial work remains.
Checkpoint:
A checkpoint is a process where the database saves its state to stable storage, marking a point
where all changes up to that time are committed. This helps minimize recovery time, as only
changes since the last checkpoint need to be processed.
Recovery with Checkpoints (example):
1. Before Crash: A checkpoint is taken. Transactions A and B are committed, but transaction
C is still in progress.
C is still in progress.
2. After Crash:
- Redo: Reapply changes from transactions A and B as they were committed.
- Undo: Roll back changes from transaction C as it was not committed.
This process ensures that the database recovers to a consistent state efficiently.
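Using the common log-record notation <transaction, data item, old value, new value>, with illustrative values, the log for this example might be:
```
<checkpoint>
<T_A, start>
<T_A, X, 100, 150>      -- X changed from 100 to 150
<T_A, commit>
<T_B, start>
<T_B, Y, 20, 80>
<T_B, commit>
<T_C, start>
<T_C, Z, 5, 9>
                        -- system crash here
```
On restart, the recovery manager scans forward from the checkpoint: T_A and T_B have commit records, so their changes are redone (X := 150, Y := 80); T_C has no commit record, so its change is undone (Z := 5).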
(e) Explain with the help of an example, how recovery is performed when Deferred
database modification scheme is used.
Ans. Deferred Database Modification Scheme:
In the deferred database modification scheme, updates made by a transaction are not applied
to the database until the transaction commits. They are recorded in the log and held in a
temporary area, and are written to the actual database only after the transaction completes
successfully.
Example:
1. Transaction T executes; its updates are recorded in the log (only new values are needed,
since the database itself is never modified before commit) but are not yet applied to the
database.
2. Transaction T commits. At this point, the changes are applied to the database, and the log
entries are flushed to ensure the database reflects the committed state.
3. In Case of a Crash:
- If Transaction T has not committed yet, its changes are not applied, and the database
remains in its state before T started.
- If Transaction T had committed, the recovery process re-applies the changes from the log
to ensure all committed updates are present in the database.
Thus, deferred modification ensures that only fully committed transactions affect the
database; since uncommitted changes never reach the database, no undo is needed, which
simplifies recovery.
(f) What is the cost of selection operation, when the index scan method is used? Explain
with the help of an example. Explain the cost of Join operation when Merge-Join
method is used.
Ans. Cost of Selection Operation with Index Scan:
When using an index scan, the cost of a selection operation depends on the index type and the
number of qualifying records. For example, with a B-tree index, the cost includes:
1. Index Lookup Cost: Determining the index entry’s location, which is typically logarithmic
in relation to the number of entries.
2. Access Cost: Reading the actual data records, often constant or linear depending on the
number of matching entries.
Example: Suppose a B-tree index is used to find records where `age = 30` in a table with
10,000 entries. The index lookup cost is `O(log N)`, where `N` is the number of entries
(10,000), and the access cost depends on the number of records found (e.g., 50 records).
Cost of Join Operation with Merge-Join:
Merge-Join requires both relations to be sorted on the join attribute. The cost involves:
1. Sorting Cost: Sorting both relations, which is `O(N log N)` for each relation.
2. Merge Cost: Linear scan of both sorted relations to find matching tuples, which is `O(N +
M)`, where `N` and `M` are the sizes of the two relations.
Example: For two relations, A with 5,000 tuples and B with 8,000 tuples, sorting both
relations would be `O(5,000 log 5,000) + O(8,000 log 8,000)`, and merging would be
`O(5,000 + 8,000)`.
(g) Make the query tree for the following query (assume the database of problem 2(d)).
SELECT c.custName, c.custId, a.AccountNumber, t.DebitORCredit
FROM Customer c, Account a, Transaction t
WHERE c.custId = a.custId AND a.AccountNumber = t.AccountNumber AND Amount
> 10000;
Ans. To create a query tree for the SQL query, follow these steps:
1. Base Relations: Start with the base relations `Customer (c)`, `Account (a)`, and
`Transaction (t)`.
2. Selection Operation: Apply the selection `Amount > 10000` on `Transaction (t)`. This
filters the records in `Transaction` where the amount is greater than 10,000.
3. Join Operations:
- Perform a join between `Account (a)` and the filtered `Transaction (t)` on the condition
`a.AccountNumber = t.AccountNumber`.
- Next, join the result with `Customer (c)` on `c.custId = a.custId`.
```
π (c.custName, c.custId, a.AccountNumber, t.DebitORCredit)
                 |
      ⨝ (c.custId = a.custId)
        /                  \
 Customer (c)     ⨝ (a.AccountNumber = t.AccountNumber)
                     /                \
              Account (a)     σ (Amount > 10000)
                                      |
                               Transaction (t)
```
In this tree:
- π denotes the projection operation.
- ⨝ denotes the join operation.
- σ denotes the selection operation.
Features of an ORDBMS (Object-Relational DBMS):
1. Complex Data Types: Allow attribute values beyond simple atomic types, such as structured (row) types.
2. Collections: Support arrays, sets, and lists of data types. For example, a `STUDENT` type
might have a list of `COURSES`.
3. User-Defined Types (UDTs): Allow users to define new data types that encapsulate both
data and methods. For example, a `CIRCLE` type might have attributes like `radius` and
methods like `calculateArea`.
Key Concepts of an OODBMS (Object-Oriented DBMS):
1. Objects: Represent data as entities with both state (attributes) and behavior (methods).
Each object is an instance of a class.
2. Classes and Inheritance: Define types and hierarchies, where classes can inherit attributes
and methods from parent classes.
Differences between OODBMS and ORDBMS:
1. Data Model:
- OODBMS stores data as objects with encapsulated attributes and methods.
- ORDBMS stores data in a relational format with extensions for complex types.
2. Complex Data Handling:
- OODBMS natively supports complex data structures and relationships.
- ORDBMS extends relational databases with support for complex data types but relies on
traditional relational operations.
3. Query Language:
- OODBMS typically uses object-oriented query languages or extensions to SQL.
- ORDBMS primarily uses SQL with extensions for handling complex types and user-
defined types.
4. Inheritance:
- OODBMS supports class hierarchies and inheritance directly.
- ORDBMS uses table inheritance or type extension features to mimic hierarchical
structures.
(b) Explain the multi-dimensional data model of a data warehouse. Also, define the
concept of decision tree with the help of an example. List any four applications of data
mining.
Ans. Multi-Dimensional Data Model of a Data Warehouse:
The multi-dimensional data model is used in data warehousing to represent data in a way that
is intuitive for analysis and reporting. It organizes data into a structure that allows users to
analyze data from multiple perspectives.
1. Dimensions: Represent the perspectives along which data is analyzed, such as time,
product, or region. Dimensions often carry hierarchies (e.g., day → month → year) that
support drill-down and roll-up.
2. Measures: Represent the quantitative data to be analyzed, such as sales revenue, profit, or
quantity sold. Measures are aggregated along different dimensions.
3. Cubes: The core structure in a multi-dimensional model is the data cube, which stores data
in a multi-dimensional array format. Each cell in the cube contains aggregated data (e.g., total
sales) for a specific combination of dimensions.
Example: A sales data warehouse might have dimensions like `Product`, `Region`, and
`Time`, with measures like `Sales Amount` and `Units Sold`. Users can analyze total sales by
region and product across different time periods.
Decision Tree:
A decision tree is a supervised learning algorithm used for classification and regression tasks.
It splits data into branches based on feature values to make decisions or predictions.
Example: In a decision tree for classifying whether a person should receive a loan:
- Root Node: "Income Level" (e.g., High, Medium, Low).
- Branch 1: High income leads to "Approved" if other criteria are met.
- Branch 2: Medium income leads to further checks, such as "Credit Score".
- Branch 3: Low income typically leads to "Denied".
Applications of Data Mining:
1. Customer Segmentation: Identifying distinct customer groups for targeted marketing and
personalized services.
2. Fraud Detection: Detecting fraudulent transactions or behavior patterns in financial
services.
3. Market Basket Analysis: Analyzing purchase patterns to identify associations between
products, used in recommendation systems.
4. Predictive Maintenance: Predicting equipment failures or maintenance needs in
manufacturing to minimize downtime and costs.
(c) Explain the need of NoSQL databases. Explain the characteristics of any two types of
NoSQL databases.
Ans. Need for NoSQL Databases:
1. Scalability: Easily scale horizontally by adding more servers, unlike relational databases
that scale vertically.
2. Flexibility: Support dynamic schemas, making it easier to adapt to changes in data
structure.
3. High Performance: Optimize for specific use cases like large-scale data retrieval or high-
speed writes.
- Scalability: Easily scale by adding more nodes, handling increased data volume and traffic.
- Fault Tolerance: Improve reliability by replicating data across multiple sites, ensuring
availability even if some nodes fail.
- Geographic Distribution: Allow data to be closer to users or systems, reducing latency and
improving access speed.
Blockchain databases are decentralized, distributed ledgers that use cryptographic techniques
to secure and verify transactions. Key features include:
- Immutability: Once recorded, transactions cannot be altered, ensuring data integrity and
trust.
- Decentralization: Operate across a network of nodes without a central authority, reducing
the risk of a single point of failure.
- Consensus Mechanisms: Use algorithms like Proof of Work or Proof of Stake to agree on
transaction validity, preventing fraud and double-spending.
Blockchain databases are widely used in cryptocurrencies, supply chain management, and
other applications requiring secure, transparent, and tamper-proof record-keeping.