
dbms

important questions on dbms for engineers

Uploaded by

Srijeeta Sen
Copyright
© All Rights Reserved

1.1 Two disadvantages of a DBMS

• High Cost:
Implementing and maintaining a DBMS can be expensive due to software licensing fees,
hardware upgrades, and the need for specialized personnel to manage and administer the
system.
• Complexity:
A DBMS is complex to design, set up, and operate. Properly configuring a database system to
ensure efficiency, security, and reliability requires expertise and can be time-consuming.
1.2 List five ways in which the type declaration system of a language such as Java or C++ differs from
the data definition language used in a database.
1. Purpose
Type Declaration System (Java/C++): Primarily used to define variables, classes, and data
structures within the scope of a program. Its goal is to manage in-memory data efficiently during
program execution.
DDL (Database): Used to define the schema of a database, such as tables, columns, constraints,
and relationships, for persistent storage.
2. Scope
Type Declaration System: Operates within the context of a single application or program runtime.
DDL: Affects the global structure of a database, shared and accessed by multiple applications and
users.
3. Data Lifetime
Type Declaration System: Data typically exists only while the program runs and is stored in
memory.
DDL: Data is stored persistently in a database and remains accessible even after the application
terminates.
4. Data Relationships
Type Declaration System: Relationships between data types (e.g., inheritance, composition) are
defined within the logic of the program.
DDL: Specifies explicit relationships (e.g., primary keys, foreign keys) to enforce data integrity
across tables in the database.
5. Constraints
Type Declaration System: Constraints like data types, access modifiers, or method signatures are
enforced at compile time and pertain to program logic.
DDL: Constraints (e.g., NOT NULL, UNIQUE, CHECK, PRIMARY KEY) are enforced at the database
level, ensuring consistency and correctness of stored data.
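To make point 5 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The account table and its columns are hypothetical and chosen only for illustration. The idea it tries to show is that DDL constraints live in the stored schema, so the database rejects bad rows regardless of which program submits them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: the constraints are part of the stored schema, not of any one program.
conn.execute("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        owner      TEXT    NOT NULL,
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES (1, 'Asha', 100)")

# Each row below breaks one constraint: NOT NULL, CHECK, PRIMARY KEY.
bad_rows = [(2, None, 50), (3, "Ravi", -10), (1, "Meera", 5)]
rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row, str(exc)))

stored = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

All three bad rows end up in rejected, and only the original valid row remains stored.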
1.3 List six major steps that you would take in setting up a database for a particular enterprise.
1. Requirement Analysis
• Identify the goals and objectives of the database.
• Gather detailed requirements from stakeholders to understand what data needs to be
stored, how it will be accessed, and any specific constraints or security needs.
2. Database Design
• Conceptual Design: Create an Entity-Relationship Diagram (ERD) to model the data and
relationships between entities.
• Logical Design: Convert the ERD into a logical schema with tables, fields, and relationships.
• Normalization: Optimize the design to eliminate data redundancy while ensuring efficient
access.
3. Choose a Database Management System (DBMS)
• Select an appropriate DBMS (e.g., MySQL, PostgreSQL, Oracle, MongoDB) based on
scalability, performance, cost, and compatibility with enterprise requirements.
4. Database Implementation
• Create the physical schema by defining tables, columns, data types, and relationships using
the Data Definition Language (DDL) of the chosen DBMS.
• Implement indexes, constraints, and triggers as needed for efficient and secure data
handling.
5. Data Migration
• If applicable, migrate existing data from legacy systems into the new database.
• Perform data cleaning to ensure consistency and accuracy during migration.
6. Testing and Deployment
• Test the database for performance, security, and correctness using test cases and simulated
workloads.
• Deploy the database in a production environment and configure backup mechanisms and
disaster recovery plans.
1.4 Suppose you want to build a video site similar to YouTube. Discuss the relevance of each of these
points to the storage of actual video data, and to metadata about the video, such as title, the
user who uploaded it, tags, and which users viewed it.
1. Storage Requirements
Video Data: Requires significant storage space as video files are typically large. High-resolution
formats like 4K increase the need for scalable and cost-effective storage solutions, such as cloud
storage or distributed file systems.
Metadata: Requires comparatively minimal storage space since it consists of textual and
relational data (e.g., title, tags, and user IDs). This can be efficiently stored in a relational or
NoSQL database.
2. Performance and Retrieval
Video Data: Needs efficient streaming capabilities to deliver videos quickly to users. Content
Delivery Networks (CDNs) are crucial to reduce latency and distribute video content closer to
users.
Metadata: Requires fast read and write operations for searching and updating. An optimized
database design (e.g., indexing on tags or titles) is critical for quick metadata queries.
3. Data Redundancy and Replication
Video Data: Redundancy is essential for fault tolerance. Video files are often replicated across
multiple storage nodes to ensure availability and prevent data loss.
Metadata: Metadata should also be replicated but is less storage-intensive. Replication improves
database reliability and allows quick access to frequently requested metadata.
4. Scalability
Video Data: As the volume of user-uploaded content grows rapidly, scalable storage solutions like
object storage (e.g., AWS S3 or Google Cloud Storage) are vital to handle millions of videos.
Metadata: Needs scalable database solutions (e.g., distributed databases like Cassandra or
PostgreSQL with sharding) to accommodate a growing number of users, videos, and interactions.
5. Consistency
Video Data: Strict consistency is less critical since video delivery relies more on availability.
Temporary delays in propagation between storage nodes are tolerable.
Metadata: Consistency is more important. Users expect accurate information about likes,
comments, views, and other metadata, so the system must ensure these updates propagate
reliably.
6. Security and Access Control
Video Data: Requires secure access to prevent unauthorized viewing or downloading. Encryption
at rest and in transit, along with token-based access control, is essential.
Metadata: Also needs to be secured, as it may contain sensitive user information (e.g., uploader
details). Access control mechanisms should limit access to only authorized users or services.
7. Backup and Disaster Recovery
Video Data: Backups are crucial but can be expensive due to the size of video files. Incremental
backups and reliance on replicated storage systems can help minimize costs.
Metadata: Regular backups of the metadata database are critical to ensure the integrity of user
information, tags, and relationships in case of failures.
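The split between video data and metadata can be sketched as follows. This is only an illustration under stated assumptions: the video and video_view tables, the storage_key column, and the object_store dictionary (a stand-in for an object store or distributed file system) are all hypothetical names, not part of any real service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (
        video_id    INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        uploader    TEXT NOT NULL,
        storage_key TEXT NOT NULL UNIQUE   -- where the file lives, not the file itself
    );
    CREATE TABLE video_view (
        video_id INTEGER NOT NULL REFERENCES video(video_id),
        viewer   TEXT    NOT NULL
    );
""")

# Hypothetical stand-in for bulk storage of the large video files.
object_store = {"videos/1.mp4": b"<binary video data>"}

conn.execute("INSERT INTO video VALUES (1, 'Intro to DBMS', 'asha', 'videos/1.mp4')")
conn.executemany("INSERT INTO video_view VALUES (?, ?)", [(1, "ravi"), (1, "meera")])

# Metadata query: small, structured, answered by the database.
title, key, views = conn.execute("""
    SELECT v.title, v.storage_key, COUNT(w.viewer)
    FROM video v LEFT JOIN video_view w ON w.video_id = v.video_id
    WHERE v.video_id = 1
    GROUP BY v.video_id
""").fetchone()

# Video bytes: fetched from bulk storage using the key held in the metadata.
data = object_store[key]
```

The design choice here is that the database holds only a reference to the file, so the two kinds of data can be scaled, replicated, and backed up separately.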
1.5 Keyword queries used in Web search are quite different from database queries. List key
differences between the two, in terms of the way the queries are specified, and in terms of what
is the result of a query.
1. Query Specification
Keyword Queries (Web Search):
Specified using natural language or a set of keywords.
No strict syntax; users can simply type phrases or words.
Typically unstructured and ambiguous, relying on search engines to interpret intent.
Example: “Best pizza places near me” or “how to learn Python”.
Database Queries:
Structured and formal, using specific query languages like SQL.
Must follow a strict syntax to interact with the database.
Explicitly specifies fields, tables, and conditions.
Example: SELECT name FROM restaurants WHERE location = 'New York' AND cuisine = 'Pizza';
2. Result Format
Keyword Queries (Web Search):
Returns a ranked list of results (e.g., web pages, images, or videos) based on relevance.
Often includes snippets, links, and rich features like maps or answer boxes.
Results are approximate and may not be exact matches.
Database Queries:
Returns structured data, such as rows and columns, that precisely match the query conditions.
Results are deterministic and predictable, providing exact matches based on the query.
3. Underlying Data
Keyword Queries (Web Search):
Operate over large, heterogeneous, and often unstructured datasets like web pages or
documents.
Use indexing, crawling, and ranking algorithms to retrieve results.
May involve advanced techniques like Natural Language Processing (NLP) to interpret intent.
Database Queries:
Operate over structured datasets, stored in tables with predefined schemas.
Depend on indexing and relational logic to retrieve specific records efficiently.
4. User Expertise Required
Keyword Queries (Web Search):
Minimal expertise required; designed for casual users.
Search engines infer user intent and correct errors (e.g., typos, synonyms).
Database Queries:
Requires familiarity with database schema and query languages.
Users must know the exact structure and relationships of the data.
5. Flexibility
Keyword Queries (Web Search):
Flexible and forgiving; returns relevant results even if the query is incomplete, vague, or contains
errors.
Search engines may suggest alternative queries or auto-correct.
Database Queries:
Strict and precise; errors in syntax or missing details lead to query failure.
Requires detailed specification of conditions to retrieve relevant data.
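The contrast can be sketched with a toy example. The keyword_search function below is a deliberately simplistic, hypothetical scoring scheme (counting matching words), not how a real search engine ranks results. The SQL query is the one quoted above, run against a small made-up restaurants table.

```python
import sqlite3

restaurants = [
    ("Luigi's", "New York", "Pizza"),
    ("Taco Casa", "Austin", "Mexican"),
    ("Slice House", "Chicago", "Pizza"),
]

def keyword_search(query, rows):
    """Rank rows by how many query words appear anywhere in them (toy scoring)."""
    words = query.lower().split()
    scored = []
    for row in rows:
        text = " ".join(row).lower()
        score = sum(1 for w in words if w in text)
        if score > 0:
            scored.append((score, row[0]))
    scored.sort(key=lambda pair: -pair[0])  # best match first
    return [name for _, name in scored]

# Keyword query: free text in, ranked approximate matches out.
ranked = keyword_search("best pizza new york", restaurants)

# Database query: strict syntax in, exact matches out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT, location TEXT, cuisine TEXT)")
conn.executemany("INSERT INTO restaurants VALUES (?, ?, ?)", restaurants)
exact = [r[0] for r in conn.execute(
    "SELECT name FROM restaurants WHERE location = 'New York' AND cuisine = 'Pizza'")]
```

The keyword search returns a partial match (a pizza place in Chicago) ranked below the best match, while the SQL query returns only the row that satisfies every condition.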
1.6 List four applications you have used that most likely employed a database system to store
persistent data.
1. Social Media Platforms (e.g., Facebook, Instagram, Twitter):
These platforms use databases to store user profiles, posts, comments, likes, and relationships
between users (e.g., followers and friends).
2. E-commerce Websites (e.g., Amazon, eBay):
Databases are used to store product details, user accounts, transaction histories, reviews, and
inventory information.
3. Banking Applications (e.g., PayPal, Bank Mobile Apps):
Banking systems use databases to manage account information, transaction records, customer
details, and payment processing.
4. Video Streaming Services (e.g., YouTube, Netflix):
Databases store metadata about videos, user accounts, viewing history, recommendations, and
ratings.
1.7 List four significant differences between a file-processing system and a DBMS.
1. Data Redundancy and Consistency
File-Processing System:
Data redundancy is common because the same data might be stored in multiple files.
Maintaining consistency across files is difficult, leading to potential data anomalies.
DBMS:
Minimizes redundancy through normalization and centralized data management.
Enforces consistency using constraints and transaction management.
2. Data Access and Querying
File-Processing System:
Accessing data requires custom programs or scripts for each task.
Querying is inflexible, as there is no standardized query language.
DBMS:
Provides powerful query languages like SQL for flexible and efficient data access.
Supports complex queries and joins without needing custom programming.
3. Data Integrity and Security
File-Processing System:
Lacks built-in mechanisms to enforce data integrity (e.g., foreign key constraints).
Security is rudimentary and depends on file-level permissions.
DBMS:
Ensures data integrity with constraints (e.g., primary key, unique, NOT NULL).
Provides robust security features like role-based access control and encryption.
4. Concurrency and Recovery
File-Processing System:
Poor support for concurrent access, leading to risks of data corruption.
Limited or no mechanisms for recovery in case of failures.
DBMS:
Supports multi-user access with concurrency control mechanisms (e.g., locking, transactions).
Includes robust recovery systems to restore data after crashes or errors.
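As an illustration of point 4, the sketch below shows a transaction being rolled back as a unit, using Python's built-in sqlite3 module. The account table and the transfer function are hypothetical. With plain files, a failure between the two updates would leave the data half-changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "A", "B", 500)  # second UPDATE violates the CHECK constraint
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit to B has already been applied when the debit from A is rejected, yet the rollback undoes it, so the balances reflect only the successful transfer.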
1.8 Explain the concept of physical data independence, and its importance in database systems.
Physical data independence refers to the ability to modify the physical storage structure or
organization of data in a database without affecting the conceptual schema or the application
programs that interact with the database.
The physical schema defines how data is actually stored on disk (e.g., file structures, indexing, or
partitioning).
The conceptual schema defines the logical structure of the database (e.g., tables, relationships,
and constraints) that users and applications work with.
Physical data independence ensures that changes in the storage details (like using a new indexing
technique or optimizing storage structures) do not require rewriting queries or altering
application logic.
Importance of Physical Data Independence
Ease of Maintenance
Database administrators (DBAs) can optimize storage or reorganize files for better performance
without disrupting the functionality of the system.
Improved Performance
Physical data independence allows changes to storage strategies (e.g., adding indexes or
clustering data) to improve query performance without requiring users to adapt to the changes.
Reduced Development Costs
Developers can focus on application logic without worrying about how data is physically stored
or accessed, speeding up development time.
System Scalability
As the database grows, the underlying storage structure can be scaled or modified (e.g.,
transitioning to distributed storage) without impacting the higher-level functionality.
Data Portability
Physical data independence makes it easier to migrate data to different hardware or storage
systems without affecting user interactions or application logic.
Example of Physical Data Independence
Suppose the employees table is initially stored as an unordered file with no index. Later, the DBA
adds a B-tree index on emp_id for faster query processing. Due to physical data independence:
Queries like SELECT * FROM employees WHERE emp_id = 101; will continue to function without
any changes.
Only the physical storage layer is modified, and the logical structure (conceptual schema)
remains intact.
This abstraction is a key feature of database systems, ensuring flexibility and long-term usability.
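The example above can be tried out with a small sketch in Python's built-in sqlite3 module. The table contents and the index name idx_employees_emp_id are made up for illustration. The plan helper simply collects the text SQLite reports from EXPLAIN QUERY PLAN, whose exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(i, f"emp{i}") for i in range(1, 1001)])

query = "SELECT * FROM employees WHERE emp_id = 101"

def plan(conn, sql):
    """Return SQLite's description of how it would execute sql."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

rows_before, plan_before = conn.execute(query).fetchall(), plan(conn, query)

# Physical change only: add a B-tree index. The query text is untouched.
conn.execute("CREATE INDEX idx_employees_emp_id ON employees(emp_id)")

rows_after, plan_after = conn.execute(query).fetchall(), plan(conn, query)
```

The query returns the same row before and after the change. Only the reported plan differs, because the second run can use the new index.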
1.9 List five responsibilities of a database-management system. For each responsibility, explain the
problems that would arise if the responsibility were not discharged.
1. Data Storage and Retrieval
• Responsibility: Efficiently store data and allow for fast, accurate retrieval when queried.
• Problems if not discharged:
  ◦ Difficulty in locating and accessing data, leading to poor performance.
  ◦ Increased time and effort for users to retrieve relevant information.
  ◦ Data might be stored redundantly, consuming excessive storage space.
2. Data Integrity and Consistency
• Responsibility: Ensure that data adheres to predefined rules and constraints (e.g., primary
keys, foreign keys).
• Problems if not discharged:
  ◦ Inconsistent or inaccurate data, such as duplicate records or invalid foreign key
  references.
  ◦ Loss of trust in the database as a reliable source of information.
  ◦ Potential failure of dependent applications due to incorrect or corrupted data.
3. Concurrency Control
• Responsibility: Manage simultaneous access by multiple users to ensure data consistency
and prevent conflicts.
• Problems if not discharged:
  ◦ Conflicting updates, such as two users modifying the same record simultaneously,
  leading to data anomalies.
  ◦ Lost updates or overwritten data.
  ◦ Inconsistent reads, where one user sees the partially applied changes of another.
4. Backup and Recovery
• Responsibility: Provide mechanisms to recover data in case of system failures, hardware
crashes, or accidental deletions.
• Problems if not discharged:
  ◦ Irretrievable loss of data in the event of a failure.
  ◦ Extended downtime while attempting manual recovery.
  ◦ Business disruptions and potential financial losses due to data unavailability.
5. Security and Access Control
• Responsibility: Restrict unauthorized access and ensure sensitive data is protected.
• Problems if not discharged:
  ◦ Unauthorized access to sensitive information, leading to data breaches and privacy
  violations.
  ◦ Malicious or accidental alterations to data by unauthorized users.
  ◦ Non-compliance with legal and regulatory standards, potentially resulting in
  penalties or lawsuits.
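The lost-update problem mentioned under concurrency control can be sketched as follows. This is a single-connection simulation of an unlucky interleaving, not real concurrent execution, and the account table is hypothetical. It contrasts an uncoordinated read-modify-write with updates that the DBMS applies atomically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")

def read_balance():
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# Uncoordinated read-modify-write: both users read 100 before either writes.
seen_by_user1 = read_balance()
seen_by_user2 = read_balance()
conn.execute("UPDATE account SET balance = ? WHERE id = 1", (seen_by_user1 - 30,))
conn.execute("UPDATE account SET balance = ? WHERE id = 1", (seen_by_user2 - 50,))
lost_update_result = read_balance()  # user 1's withdrawal has been overwritten

# Same two withdrawals, each done as a single atomic UPDATE inside the DBMS.
conn.execute("UPDATE account SET balance = 100 WHERE id = 1")
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
correct_result = read_balance()
```

The first sequence leaves a balance of 50, although withdrawals of 30 and 50 from 100 should leave 20, which is what the second sequence produces.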
1.10 List at least two reasons why database systems support data manipulation using a declarative
query language such as SQL, instead of just providing a library of C or C++ functions to carry out
data manipulation.
Here are two key reasons why database systems support data manipulation using a declarative
query language like SQL instead of just providing a library of C or C++ functions:
1. Simplicity and Ease of Use
• Reason: SQL is a high-level, declarative language that allows users to specify what they want
to do with the data, rather than how to perform the operations.
• Explanation: Users can write SQL queries to retrieve, update, or manipulate data with
simple, concise statements. In contrast, using a language like C or C++ would require writing
detailed instructions on how to access and modify data (e.g., looping through records,
handling memory management, etc.), which would be far more complex and error-prone.
• Example: A simple SQL query like SELECT name FROM employees WHERE department = 'HR';
is easier to write and understand than writing equivalent C or C++ code that handles data
retrieval and processing manually.
2. Optimization and Efficiency
• Reason: SQL allows the database system to handle query optimization and efficiently
execute complex queries, which would be challenging to achieve with C or C++ functions.
• Explanation: The DBMS can internally optimize SQL queries, determining the best way to
retrieve or manipulate data, such as using indexes, joins, and other strategies to improve
performance. When using a library of C or C++ functions, the developer would need to
manually handle optimizations, which is error-prone and inefficient.
• Example: The DBMS might automatically choose the best index or method for performing a
join, something that would require careful planning and optimization in a C or C++
application, potentially leading to slower and less efficient code.
By using a declarative language like SQL, the database system abstracts the complexities of data
manipulation and optimization, making it easier for developers to work with databases and for
the DBMS to ensure high performance and correctness.
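A small sketch of the contrast, written in Python rather than C or C++ purely for brevity. The procedural loop spells out how to find the rows, while the SQL statement (the one quoted above, with an ORDER BY added for a stable result) only states what is wanted. The sample rows are made up.

```python
import sqlite3

rows = [("Asha", "HR"), ("Ravi", "Sales"), ("Meera", "HR")]

# Procedural style: the program spells out how to scan and filter.
names_loop = []
for name, department in rows:
    if department == "HR":
        names_loop.append(name)

# Declarative style: state what is wanted; the DBMS decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
names_sql = [r[0] for r in conn.execute(
    "SELECT name FROM employees WHERE department = 'HR' ORDER BY name")]
```

Both approaches find the same two employees. If an index on department were added later, the loop would need rewriting to benefit, while the SQL text would stay the same.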
1.11 What are five main functions of a database administrator?
1. Database Design and Modeling
• Function: The DBA is responsible for designing the database structure, including the schema,
tables, relationships, and constraints.
• Explanation: They work with application developers and business users to determine the
data requirements and create an optimized design. Proper design ensures that data is stored
efficiently, supports all necessary relationships, and adheres to business rules.
• Example: Defining tables for a customer database, determining the relationships between
customers and orders, and applying constraints like primary keys and foreign keys.
2. Performance Tuning and Optimization
• Function: The DBA monitors and optimizes the performance of the database to ensure it
operates efficiently and can handle large volumes of data and queries.
• Explanation: This involves indexing, query optimization, and adjusting system parameters to
ensure fast data retrieval, low latency, and efficient use of resources (e.g., CPU, memory).
• Example: Creating indexes on frequently queried columns or optimizing slow-running SQL
queries.
3. Backup and Recovery Management
• Function: The DBA ensures that data is regularly backed up and can be restored in case of
hardware failures, data corruption, or other disasters.
• Explanation: Regular backups, both full and incremental, are scheduled to prevent data loss.
The DBA also tests the recovery process to ensure that data can be restored quickly and
accurately when needed.
• Example: Setting up weekly full backups and daily incremental backups for a production
database, and ensuring that recovery procedures are in place in case of a system crash.
4. Security Management
• Function: The DBA manages database security to ensure that data is protected from
unauthorized access or breaches.
• Explanation: This involves setting up user roles and permissions, enforcing password policies,
and ensuring encryption for sensitive data. The DBA also monitors database access logs for
suspicious activity.
• Example: Granting read-only access to certain users while giving full access to
administrators, and implementing encryption for storing credit card information.
5. Database Maintenance and Updates
• Function: The DBA is responsible for ongoing database maintenance tasks such as upgrading
the database software, patching security vulnerabilities, and maintaining the health of the
database system.
• Explanation: Regular maintenance ensures that the database software is up-to-date and free
from known security flaws. The DBA must also monitor the database for signs of corruption or
performance degradation and take corrective actions.
• Example: Applying database patches to fix bugs, upgrading the database version for new
features, or cleaning up obsolete data.
1.12 Explain the difference between two-tier and three-tier architectures. Which is better suited
for Web applications? Why?

Feature         | Two-Tier Architecture                                          | Three-Tier Architecture
----------------|----------------------------------------------------------------|-------------------------------------------------------------------
Layers          | Client and Database Server                                     | Client, Application Server, and Database Server
Scalability     | Limited scalability, as both layers are tightly coupled        | Better scalability by separating logic and data layers
Security        | Database is exposed to the client, higher risk                 | Enhanced security with the application server acting as a mediator
Complexity      | Simpler to implement but harder to scale                       | More complex but allows for better separation of concerns
Maintainability | Harder to maintain as logic is tightly coupled with the client | Easier to maintain and update layers independently

Three-tier architecture is better suited for Web applications. Here's why:


1. Scalability: Web applications often serve a large number of concurrent users. Three-tier
architecture allows the application server to handle the business logic, which can be scaled
independently from the database server. This is especially important in high-traffic web
environments where direct client-to-database connections (as in two-tier) could overwhelm
the database.
2. Security: In a three-tier system, the database server is not directly exposed to the client. All
interactions with the database are mediated through the application server, which can
implement proper access controls, validation, and security checks, reducing the risk of
security vulnerabilities.
3. Separation of Concerns: The three-tier model separates the presentation layer (client),
business logic (application server), and data management (database). This modularization
allows for easier maintenance, as changes in one layer (like changing the business logic or
updating the database) do not directly affect other layers.
4. Flexibility: Web applications often need to support a variety of clients (browsers, mobile
devices, etc.), and three-tier architecture allows business logic and data to be processed on
the server side, making the client-side simpler and more lightweight.
5. Maintainability and Upgradability: With three-tier architecture, it’s easier to modify or
update the application server or database server independently of the client interface,
making the system more adaptable to changing requirements over time.
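The mediating role of the application server can be sketched in a few lines. This is a toy, single-process illustration: the users table, the get_public_profile function, and the in-memory database are hypothetical stand-ins for a real application server and database server.

```python
import sqlite3

# Data tier: reachable only from server-side code (in-memory stand-in here).
_db = sqlite3.connect(":memory:")
_db.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, password_hash TEXT)")
_db.execute("INSERT INTO users VALUES (1, 'Asha', 'not-a-real-hash')")

def get_public_profile(user_id):
    """Application tier: validate input, run a fixed parameterized query,
    and return only the fields a client is allowed to see."""
    if not isinstance(user_id, int) or user_id <= 0:
        raise ValueError("user_id must be a positive integer")
    row = _db.execute(
        "SELECT user_id, name FROM users WHERE user_id = ?", (user_id,)).fetchone()
    return None if row is None else {"user_id": row[0], "name": row[1]}

# Client tier: calls the application tier and never sends SQL itself.
profile = get_public_profile(1)
missing = get_public_profile(42)
```

Because the client can only call get_public_profile, it cannot read password_hash or submit arbitrary SQL, which is the security benefit described in point 2.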
1.13 Describe at least 3 tables that might be used to store information in a social-networking
system such as Facebook.
Users Table
Purpose: Stores information about each user of the social network.
Columns:
user_id (Primary Key): A unique identifier for each user.
first_name: The user's first name.
last_name: The user's last name.
email: The user's email address.
password_hash: A hashed version of the user's password.
dob (date of birth): The user's birthdate.
profile_picture_url: URL to the user's profile image.
date_joined: The date when the user created their account.
status: The current status or bio of the user (optional).
location: The user's city or geographical location.
Posts Table
Purpose: Stores posts made by users, including text, images, and any associated metadata (e.g.,
timestamp).
Columns:
post_id (Primary Key): A unique identifier for each post.
user_id (Foreign Key): The ID of the user who made the post, referencing the Users table.
content: The text or content of the post (can include links or text).
image_url: A URL to any image or media attached to the post.
timestamp: The date and time when the post was created.
location: Optional field for tagging the location of the post.
visibility: Defines whether the post is public, private, or restricted to specific users or groups.
Friends Table
Purpose: Stores relationships between users (i.e., friend connections).
Columns:
user_id (Foreign Key): The ID of the first user in the friendship, referencing the Users table.
friend_id (Foreign Key): The ID of the second user in the friendship, also referencing the Users
table.
status: The status of the friendship (e.g., pending, confirmed, blocked).
date_added: The date when the friendship was created.
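The three tables can be sketched as DDL, run here through Python's built-in sqlite3 module. Only a subset of the columns listed above is included to keep the example short. The composite primary key on friends, the CHECK constraints, and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE users (
        user_id       INTEGER PRIMARY KEY,
        first_name    TEXT NOT NULL,
        last_name     TEXT NOT NULL,
        email         TEXT NOT NULL UNIQUE,
        password_hash TEXT NOT NULL,
        date_joined   TEXT NOT NULL
    );
    CREATE TABLE posts (
        post_id   INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES users(user_id),
        content   TEXT,
        timestamp TEXT NOT NULL
    );
    CREATE TABLE friends (
        user_id    INTEGER NOT NULL REFERENCES users(user_id),
        friend_id  INTEGER NOT NULL REFERENCES users(user_id),
        status     TEXT NOT NULL CHECK (status IN ('pending', 'confirmed', 'blocked')),
        date_added TEXT NOT NULL,
        PRIMARY KEY (user_id, friend_id),   -- assumption: one row per ordered pair
        CHECK (user_id <> friend_id)
    );
""")

conn.execute("INSERT INTO users VALUES (1, 'Asha', 'Rao', 'asha@example.com', 'hash1', '2024-01-01')")
conn.execute("INSERT INTO users VALUES (2, 'Ravi', 'Sen', 'ravi@example.com', 'hash2', '2024-01-02')")
conn.execute("INSERT INTO posts VALUES (1, 1, 'Hello!', '2024-01-03 10:00')")
conn.execute("INSERT INTO friends VALUES (1, 2, 'confirmed', '2024-01-04')")

# Posts written by users who have a confirmed friendship row pointing at user 2.
friend_posts = conn.execute("""
    SELECT u.first_name, p.content
    FROM friends f
    JOIN posts p ON p.user_id = f.user_id
    JOIN users u ON u.user_id = p.user_id
    WHERE f.friend_id = 2 AND f.status = 'confirmed'
""").fetchall()
```

The final query shows how the foreign keys tie the three tables together: it joins a friendship row to the posts of the befriended user and to that user's name.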

2.1 Consider the foreign key constraint from the dept_name attribute of instructor to the department
relation. Give examples of inserts and deletes to these relations, which can cause a violation of the
foreign key constraint.

A foreign key constraint ensures that the value of the attribute in a table (the foreign key)
corresponds to a valid value in another table (the referenced table). In this case, the foreign key
constraint connects the dept_name attribute in the instructor table to the department relation
(table), ensuring that each instructor is assigned to a valid department.

Foreign Key Setup:

• instructor table:
  ◦ instructor_id
  ◦ name
  ◦ dept_name (foreign key to department table)

• department table:
  ◦ dept_name (primary key)
  ◦ dept_head
  ◦ budget

The foreign key constraint on instructor.dept_name ensures that every value in instructor.dept_name
must exist as a valid dept_name in the department table.

Example of Foreign Key Constraint Violation:

1. Insertions that Cause a Violation:

When inserting data into the instructor table, a violation occurs if we try to insert a record with a
dept_name that does not exist in the department table.

Example:

• The department table contains:

dept_name                 dept_head      budget
Computer Science          Dr. Smith      50000
Electrical Engineering    Dr. Johnson    60000

• Attempting to insert an instructor with a non-existent department:

INSERT INTO instructor (instructor_id, name, dept_name)
VALUES (1, 'Dr. Alice', 'Mechanical Engineering');

Violation: The dept_name 'Mechanical Engineering' does not exist in the department table, so this
insert would violate the foreign key constraint.

2. Deletions that Cause a Violation:

When deleting data from the department table, a violation occurs if we try to delete a department
that is referenced by one or more instructors in the instructor table.

Example:

• The instructor table contains:

instructor_id    name         dept_name
1                Dr. Alice    Computer Science
2                Dr. Bob      Electrical Engineering

• Attempting to delete the Computer Science department while it is still referenced by an
instructor:

DELETE FROM department
WHERE dept_name = 'Computer Science';

Violation: Since the Computer Science department is referenced by Dr. Alice in the instructor table,
deleting the department would violate the foreign key constraint.

How to Prevent Violations:

By default, the DBMS rejects any insert, delete, or update that would violate the foreign key. For
deletes and updates on the department table, you can instead define referential actions on the
foreign key constraint, so that the DBMS repairs the referencing rows automatically:

1. ON DELETE CASCADE: Deletes all related records in the instructor table if the department is
deleted.

2. ON DELETE SET NULL: Sets the dept_name in the instructor table to NULL if the related
department is deleted. This requires that instructor.dept_name allows NULL values.

3. ON UPDATE CASCADE: Updates the dept_name in the instructor table if the department
name in the department table is updated.

For example:

ALTER TABLE instructor
ADD CONSTRAINT fk_dept_name
FOREIGN KEY (dept_name) REFERENCES department(dept_name)
ON DELETE CASCADE;

This would automatically delete all instructors assigned to a department if the department is
deleted, so the delete succeeds without leaving dangling references. An insert into instructor with
a non-existent dept_name is still rejected.
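The insert and delete violations, and the effect of ON DELETE CASCADE, can be reproduced with a short sketch in Python's built-in sqlite3 module. Two caveats: SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and it does not support ALTER TABLE ... ADD CONSTRAINT, so the foreign key is declared inside CREATE TABLE here. The make_db and violates helpers are hypothetical, and dept_head is omitted for brevity.

```python
import sqlite3

def make_db(on_delete=""):
    """Build a fresh database; on_delete is an optional referential action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget INTEGER)")
    conn.execute(f"""
        CREATE TABLE instructor (
            instructor_id INTEGER PRIMARY KEY,
            name          TEXT,
            dept_name     TEXT REFERENCES department(dept_name) {on_delete}
        )""")
    conn.execute("INSERT INTO department VALUES ('Computer Science', 50000)")
    conn.execute("INSERT INTO instructor VALUES (1, 'Dr. Alice', 'Computer Science')")
    conn.commit()
    return conn

def violates(conn, sql):
    """True if the statement is rejected for breaking an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

strict = make_db()
bad_insert = violates(
    strict, "INSERT INTO instructor VALUES (2, 'Dr. Bob', 'Mechanical Engineering')")
bad_delete = violates(
    strict, "DELETE FROM department WHERE dept_name = 'Computer Science'")

cascading = make_db("ON DELETE CASCADE")
cascade_ok = not violates(
    cascading, "DELETE FROM department WHERE dept_name = 'Computer Science'")
remaining = cascading.execute("SELECT COUNT(*) FROM instructor").fetchone()[0]
```

Without a referential action both statements are rejected. With ON DELETE CASCADE the delete goes through and the referencing instructor row is removed along with the department.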

2.2 Consider the time slot relation. Given that a particular time slot can meet more than once in a
week, explain why day and start time are part of the primary key of this relation, while end time is
not.

The time_slot relation records when each named time slot meets during the week. The primary key
of a relation must uniquely identify each tuple. Because a particular time slot can meet more than
once in a week, the slot identifier alone cannot do this, which is why day and start time are part of
the primary key, while end time is not.

Time Slot Relation Example:

Assume the relation has the following attributes:

• time_slot_id: The identifier of the time slot (e.g., A, B, C).

• day: The day of the week on which a meeting of the time slot occurs (e.g., Monday, Tuesday).

• start_time: The time at which that meeting starts.

• end_time: The time at which that meeting ends.

Primary Key Selection:

The primary key is (time_slot_id, day, start_time). Here's why day and start time are part of the
primary key, while end time is not:

1. Day and Start Time Are Needed for Uniqueness:

  ◦ A time slot that meets several times a week has one tuple per meeting, and all of those
  tuples share the same time_slot_id. For example, slot A might meet on Monday,
  Wednesday, and Friday at 8:00 AM.

  ◦ The day attribute distinguishes meetings of the same slot on different days, and
  start_time distinguishes two meetings of the same slot on the same day.

  ◦ Together, time_slot_id, day, and start_time identify exactly one meeting.

2. End Time Is Not Necessary for Uniqueness:

  ◦ A meeting that starts at a given time on a given day can end at only one time, so
  end_time is determined by (time_slot_id, day, start_time) and adds nothing to
  uniqueness.

  ◦ Including end_time would make the key non-minimal. It would also allow two tuples
  for the same slot, day, and start time with different end times, which would be
  inconsistent data.
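A small sketch of the key choice, using Python's built-in sqlite3 module. It assumes the schema time_slot(time_slot_id, day, start_time, end_time) with primary key (time_slot_id, day, start_time); the time_slot_id attribute and the sample times are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE time_slot (
        time_slot_id TEXT,
        day          TEXT,
        start_time   TEXT,
        end_time     TEXT,
        PRIMARY KEY (time_slot_id, day, start_time)
    )""")

# Slot 'A' meets three times a week: same id, different days.
conn.executemany("INSERT INTO time_slot VALUES (?, ?, ?, ?)", [
    ("A", "Mon", "08:00", "08:50"),
    ("A", "Wed", "08:00", "08:50"),
    ("A", "Fri", "08:00", "08:50"),
])

# A second end time for the same slot, day, and start time is rejected.
try:
    conn.execute("INSERT INTO time_slot VALUES ('A', 'Mon', '08:00', '09:30')")
    conflicting_end_time_rejected = False
except sqlite3.IntegrityError:
    conflicting_end_time_rejected = True

meetings_of_a = conn.execute(
    "SELECT COUNT(*) FROM time_slot WHERE time_slot_id = 'A'").fetchone()[0]
```

Leaving end_time out of the key is what lets the database reject the contradictory fourth row. Had end_time been part of the key, that row would have been accepted.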

2.3 In the instance of instructor, no two instructors have the same name. From this, can we conclude
that name can be used as a super key (or primary key) of instructor?

No, we cannot conclude that name can be used as a superkey (or primary key) of the instructor
table, even if no two instructors have the same name. Here's why:

1. Superkey Definition:

A superkey is a set of one or more attributes (columns) that can uniquely identify every tuple (row)
in a relation (table). A primary key is a minimal superkey, meaning it's a superkey with no
unnecessary attributes (i.e., it cannot be reduced further without losing its ability to uniquely identify
records).
2. Why Name Can't Be Used as a Superkey:

A key is a property of the schema, so it must hold for every legal instance of the relation, not
just for the instance we happen to see. Observing that no two instructors share a name in the
current instance does not establish that:

 No Constraint Is Implied by One Instance: Nothing in the real world prevents two
instructors from having the same name. For example, a second "John Doe" could be hired
into a different department tomorrow, and inserting that tuple would be perfectly legal.
A uniqueness property that a legal insert can break is not a key.

 Minimality Is Not the Issue: name is a single attribute, so it would trivially be minimal
if it were a superkey. It fails because its uniqueness is not guaranteed. A better
candidate for the primary key is an attribute such as instructor_id, which is assigned
by the institution and is guaranteed to be unique.

3. Practical Example:

 Suppose the instructor table initially holds the first two rows below, and a third instructor
is then hired:

instructor_id name dept_name

1 John Doe Computer Science

2 Alice Smith Electrical Eng

3 John Doe Mechanical Eng

 Before the third row is added, every name is unique. After it is added, "John Doe" appears
twice, so name no longer identifies a single tuple, while instructor_id still does.
Therefore, name alone cannot guarantee uniqueness across all legal instances.

4. Better Candidate for Superkey:

A better choice for the primary key is instructor_id, which guarantees uniqueness regardless of the
instructor's name. Any set of attributes that contains instructor_id, such as (instructor_id, name),
is also a superkey, but it is not minimal and therefore is not a candidate key.
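The difference between "unique in the current instance" and "declared as a key" can be sketched with Python's built-in sqlite3 module. The table and rows below are illustrative assumptions; the helper name names_are_unique is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: instructor_id is the declared key; name carries no constraint.
conn.execute(
    "CREATE TABLE instructor ("
    " instructor_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_name TEXT)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [(1, "John Doe", "Computer Science"), (2, "Alice Smith", "Electrical Eng")],
)

def names_are_unique(connection):
    """Return True if no name occurs twice in the current instance."""
    total, distinct = connection.execute(
        "SELECT COUNT(*), COUNT(DISTINCT name) FROM instructor"
    ).fetchone()
    return total == distinct

unique_before = names_are_unique(conn)

# A legal insert: nothing in the schema forbids a second "John Doe".
conn.execute("INSERT INTO instructor VALUES (3, 'John Doe', 'Mechanical Eng')")
unique_after = names_are_unique(conn)

# The declared key, by contrast, is enforced by the DBMS.
try:
    conn.execute("INSERT INTO instructor VALUES (1, 'Jane Roe', 'Physics')")
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

print(unique_before, unique_after, duplicate_id_rejected)
```

Uniqueness of name holds only until the next legal insert, whereas the declared key is checked on every modification.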

2.4 What is the result of first performing the cross product of student and advisor, and then
performing a selection operation on the result with the predicate s_id = ID? (Using the symbolic
notation of relational algebra, this query can be written as σ s_id=ID (student × advisor).)

Let's break down the query step by step, using relational algebra operations. The query is:

σ s_id=ID (student × advisor)

This means we take the cross product of the student and advisor relations and then apply a
selection that keeps only the tuples satisfying s_id = ID. In the textbook's university schema,
student has the attributes (ID, name, dept_name, tot_cred) and advisor has the attributes
(s_id, i_id), where s_id is the ID of a student and i_id is the ID of that student's advisor.

Step-by-Step Explanation:

1. Cross Product (×):

o The cross product (also known as the Cartesian product) of two relations combines
every tuple from the first relation (student) with every tuple from the second
relation (advisor).

o The result of the cross product has the attributes of both relations:

o (ID, name, dept_name, tot_cred, s_id, i_id)

o The number of resulting tuples is the product of the number of tuples in the
student relation and the number of tuples in the advisor relation.

2. Selection (σ):

o The selection keeps only the tuples where s_id (from advisor) equals ID (from
student), that is, where the advisor tuple refers to the student it was paired with.

o Every pairing of a student with an advisor tuple that belongs to a different student
is discarded. What remains pairs each student with their own advisor tuple.

Result of the Query:

 The result contains one tuple for each student who has an advisor, combining the
student's information with the ID of the advisor. It is the same as the join of student and
advisor on the condition s_id = ID.

 The resulting relation has the attributes:

o (ID, name, dept_name, tot_cred, s_id, i_id)

o Since s_id = ID in every resulting tuple, s_id is redundant and could be projected away:

o (ID, name, dept_name, tot_cred, i_id)

 Students who do not have an advisor do not appear in the result, because no advisor tuple
matches their ID.
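The two steps can be tried out with Python's built-in sqlite3 module. This is a sketch that assumes the textbook's university schema, student(ID, name, dept_name, tot_cred) and advisor(s_id, i_id); the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, tot_cred INTEGER);
CREATE TABLE advisor (s_id TEXT PRIMARY KEY, i_id TEXT);
INSERT INTO student VALUES ('00128', 'Zhang', 'Comp. Sci.', 102),
                           ('12345', 'Shankar', 'Comp. Sci.', 32),
                           ('70557', 'Snow', 'Physics', 0);
INSERT INTO advisor VALUES ('00128', '45565'), ('12345', '10101');
""")

# Step 1: cross product, every student paired with every advisor tuple.
cross = conn.execute("SELECT * FROM student, advisor").fetchall()

# Step 2: selection s_id = ID applied to the cross product.
selected = conn.execute(
    "SELECT * FROM student, advisor WHERE s_id = ID ORDER BY ID"
).fetchall()

print(len(cross))
for row in selected:
    print(row)
```

With 3 students and 2 advisor tuples the cross product has 6 rows, and the selection keeps the 2 rows in which a student is paired with their own advisor tuple. The student without an advisor ('70557') drops out.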

2.5
2.6 employee (person name, street, city)

works (person name, company name, salary)

company (company name, city)


Consider the relational database above. Give an expression in the

relational algebra to express each of the following queries:

a. Find the names of all employees who live in city “Miami”.

b. Find the names of all employees whose salary is greater than $100,000.

c. Find the names of all employees who live in “Miami” and whose

salary is greater than $100,000.


2.7 branch (branch name, branch city, assets)

customer (customer name, customer street, customer city)

loan (loan number, branch name, amount)

borrower (customer name, loan number)

account (account number, branch name, balance)

depositor (customer name, account number)

Consider the bank database above. Give an expression in the relational algebra for each of
the following queries.

a. Find the names of all branches located in “Chicago”.

b. Find the names of all borrowers who have a loan in branch “Downtown”.
2.8 For the bank database given above: a. What are the appropriate primary keys? b. Given your
choice of primary keys, identify appropriate foreign keys.

a. Appropriate Primary Keys

A primary key uniquely identifies each record in a table. Here’s the identification of the
primary keys for each table:

1. branch(branch_name, branch_city, assets)

o Primary Key: branch_name

 Reason: The branch_name uniquely identifies each branch in the bank. Each
branch has a unique name, and there’s no need for multiple attributes to
uniquely identify it.

2. customer(customer_name, customer_street, customer_city)

o Primary Key: customer_name

 Reason: Assuming each customer has a unique name in the system (or can
be uniquely identified by name), this would be the primary key. In real-world
systems, it might be more practical to have a customer ID, but here we
assume the name is unique.

3. loan(loan_number, branch_name, amount)

o Primary Key: loan_number

 Reason: The loan_number uniquely identifies each loan, and the
combination of loan_number, branch_name, and amount is not necessary
because each loan has a unique number.

4. borrower(customer_name, loan_number)

o Primary Key: (customer_name, loan_number)


 Reason: The combination of customer_name and loan_number is necessary
because a customer can have multiple loans, and a loan can be associated
with multiple borrowers. So, the pair of attributes ensures uniqueness.

5. account(account_number, branch_name, balance)

o Primary Key: account_number

 Reason: The account_number uniquely identifies each account in the bank.

6. depositor(customer_name, account_number)

o Primary Key: (customer_name, account_number)

 Reason: The combination of customer_name and account_number ensures
uniqueness, as a customer can have multiple accounts, and an account can
have multiple depositors (in case of joint accounts).

b. Appropriate Foreign Keys

Foreign keys establish relationships between tables by referencing primary keys in other
tables. Here’s how we can define the foreign keys:

1. loan(loan_number, branch_name, amount)

o Foreign Key: branch_name

 References: branch(branch_name)

 Reason: The branch_name in the loan table references the branch_name in
the branch table to indicate which branch is associated with the loan.

2. borrower(customer_name, loan_number)

o Foreign Key: customer_name

 References: customer(customer_name)

 Reason: The customer_name in the borrower table references the
customer_name in the customer table, identifying which customer has the
loan.

o Foreign Key: loan_number

 References: loan(loan_number)

 Reason: The loan_number in the borrower table references the
loan_number in the loan table, indicating which loan is associated with the
borrower.

3. account(account_number, branch_name, balance)

o Foreign Key: branch_name

 References: branch(branch_name)

 Reason: The branch_name in the account table references the branch_name
in the branch table to indicate which branch an account belongs to.
4. depositor(customer_name, account_number)

o Foreign Key: customer_name

 References: customer(customer_name)

 Reason: The customer_name in the depositor table references the
customer_name in the customer table to indicate which customer is a
depositor in a particular account.

o Foreign Key: account_number

 References: account(account_number)

 Reason: The account_number in the depositor table references the
account_number in the account table to identify which account a customer
is depositing into.

2.9 Describe the differences in meaning between the terms relation and relation schema.

Aspect | Relation | Relation Schema

Definition | A set of tuples (data entries) in a table | The structure or blueprint of a relation

Represents | Actual data in the database | The design/structure of a table (not data)

Example | A table with actual rows and values | A table definition, listing attributes and types

Context | Instance of a table at a given time | The schema or structure used to define a table

Change | Changes as data is updated or inserted | Changes when the table structure (e.g., attributes) is modified

2.10 employee (person name, street, city)

works (person name, company name, salary)

company (company name, city)

Consider the relational database above. Give an expression in the

relational algebra to express each of the following queries:

a. Find the names of all employees who work for “First Bank Corporation”.

b. Find the names and cities of residence of all employees who work for

“First Bank Corporation”.

c. Find the names, street address, and cities of residence of all employees

who work for “First Bank Corporation” and earn more than $10,000.
2.11 branch (branch name, branch city, assets)

customer (customer name, customer street, customer city)

loan (loan number, branch name, amount)

borrower (customer name, loan number)

account (account number, branch name, balance)

depositor (customer name, account number)

Consider the bank database of Figure 2.15. Give an expression in the relational algebra for
each of the following queries:

a. Find all loan numbers with a loan value greater than $10,000.

b. Find the names of all depositors who have an account with a value

greater than $6,000.

c. Find the names of all depositors who have an account with a value

greater than $6,000 at the “Uptown” branch.


3.1 Suppose you are given a relation grade_points(grade, points), which provides a conversion from
letter grades in the takes relation to numeric scores; for example an “A” grade could be specified to
correspond to 4 points, an “A−” to 3.7 points, a “B+” to 3.3 points, a “B” to 3 points, and so on. The
grade points earned by a student for a course offering (section) is defined as the number of credits
for the course multiplied by the numeric points for the grade that the student received.

Given the above relation, and our university schema, write each of the following queries in SQL. You
can assume for simplicity that no takes tuple has the null value for grade.

a. Find the total grade-points earned by the student with ID 12345, across all courses taken by the
student.

b. Find the grade-point average (GPA) for the above student, that is, the total grade-points divided by
the total credits for the associated courses.

c. Find the ID and the grade-point average of every student.

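In the university schema, credits is an attribute of course rather than takes, so each query joins
course on course_id. ID is a character attribute, so the value is written as a string.

a. SELECT SUM(course.credits * grade_points.points) AS total_grade_points
FROM takes
JOIN course ON takes.course_id = course.course_id
JOIN grade_points ON takes.grade = grade_points.grade
WHERE takes.ID = '12345';

b. SELECT SUM(course.credits * grade_points.points) / SUM(course.credits) AS GPA
FROM takes
JOIN course ON takes.course_id = course.course_id
JOIN grade_points ON takes.grade = grade_points.grade
WHERE takes.ID = '12345';

c. SELECT takes.ID,
SUM(course.credits * grade_points.points) / SUM(course.credits) AS GPA
FROM takes
JOIN course ON takes.course_id = course.course_id
JOIN grade_points ON takes.grade = grade_points.grade
GROUP BY takes.ID;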
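A small worked example of the grade-point arithmetic, run through Python's built-in sqlite3 module. It assumes that credits is stored in course, as in the textbook's university schema; the tables are trimmed to the columns the query needs and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_id TEXT PRIMARY KEY, credits INTEGER);
CREATE TABLE takes (ID TEXT, course_id TEXT, grade TEXT);
CREATE TABLE grade_points (grade TEXT PRIMARY KEY, points REAL);
INSERT INTO course VALUES ('CS-101', 4), ('CS-347', 3);
INSERT INTO takes VALUES ('12345', 'CS-101', 'A'), ('12345', 'CS-347', 'B');
INSERT INTO grade_points VALUES ('A', 4.0), ('B', 3.0);
""")

# Total grade points = sum(credits * points); GPA = total / sum(credits).
total_points, gpa = conn.execute("""
    SELECT SUM(c.credits * g.points),
           SUM(c.credits * g.points) / SUM(c.credits)
    FROM takes t
    JOIN course c ON t.course_id = c.course_id
    JOIN grade_points g ON t.grade = g.grade
    WHERE t.ID = '12345'
""").fetchone()

print(total_points, round(gpa, 2))
```

Here the student earns 4 × 4.0 + 3 × 3.0 = 25 grade points over 7 credits, giving a GPA of 25 / 7, about 3.57.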

3.2 person (driver id, name, address)

car (license, model, year)

accident (report number, date, location)

owns (driver id, license)

participated (report number, license, driver id, damage amount)

Consider the insurance database above, where the primary keys

are underlined. Construct the following SQL queries for this relational

database.

a. Find the total number of people who owned cars that were involved

in accidents in 2009.

b. Add a new accident to the database; assume any values for required

attributes.

c. Delete the Mazda belonging to “John Smith”.

a. Find total number of people who owned cars involved in accidents in 2009:

SELECT COUNT(DISTINCT owns.driver_id) AS total_people

FROM owns

JOIN car ON owns.license = car.license

JOIN participated ON car.license = participated.license

JOIN accident ON participated.report_number = accident.report_number

WHERE YEAR(accident.date) = 2009;


b. Add a new accident to the database:

INSERT INTO accident (report_number, date, location)

VALUES ('R12345', '2009-05-15', 'Main Street and 5th Avenue');

c. Delete the Mazda belonging to “John Smith”:

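DELETE FROM car
WHERE model = 'Mazda'
AND license IN (SELECT o.license
FROM owns o
JOIN person p ON o.driver_id = p.driver_id
WHERE p.name = 'John Smith');

The car relation has no driver_id attribute, so ownership is looked up through owns. If owns and
participated declare foreign keys on license, their matching tuples must be deleted first, or the
foreign keys must be declared with ON DELETE CASCADE.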

3.3 Suppose that we have a relation marks (ID, score) and we wish to assign grades to students based
on the score as follows: grade F if score < 40, grade C if 40 ≤ score < 60, grade B if 60 ≤ score < 80,
and grade A if 80 ≤ score. Write SQL queries to do the following:

a. Display the grade for each student, based on the marks relation.

b. Find the number of students with each grade.

a. Display the grade for each student:

SELECT ID,

score,

CASE

WHEN score < 40 THEN 'F'

WHEN score >= 40 AND score < 60 THEN 'C'

WHEN score >= 60 AND score < 80 THEN 'B'

WHEN score >= 80 THEN 'A'

END AS grade

FROM marks;

b. Find the number of students with each grade:

SELECT

CASE

WHEN score < 40 THEN 'F'

WHEN score >= 40 AND score < 60 THEN 'C'

WHEN score >= 60 AND score < 80 THEN 'B'

WHEN score >= 80 THEN 'A'

END AS grade,

COUNT(*) AS num_students
FROM marks

GROUP BY grade;
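The grade boundaries can be checked on a few sample scores with Python's built-in sqlite3 module. This sketch wraps the CASE expression in a derived table (named graded here, an arbitrary choice) so that the grouping does not rely on referring to a select-list alias in GROUP BY, which not every SQL dialect accepts. The rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (ID TEXT PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?)",
    [("s1", 35), ("s2", 40), ("s3", 59), ("s4", 60), ("s5", 80), ("s6", 95)],
)

# Each WHEN is tested in order, so the lower bounds are implied by the
# earlier branches having failed.
counts = dict(conn.execute("""
    SELECT grade, COUNT(*)
    FROM (SELECT CASE
                   WHEN score < 40 THEN 'F'
                   WHEN score < 60 THEN 'C'
                   WHEN score < 80 THEN 'B'
                   ELSE 'A'
                 END AS grade
          FROM marks) AS graded
    GROUP BY grade
"""))

print(counts)
```

The boundary scores 40, 60, and 80 fall into C, B, and A respectively, matching the ranges in the question.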

3.4 Consider the SQL query:

select distinct p.a1

from p, r1, r2

where p.a1 = r1.a1 or p.a1 = r2.a1

Under what conditions does the preceding query select values of p.a1 that

are either in r1 or in r2? Examine carefully the cases where one of r1 or r2

may be empty.

General Behavior of the Query:

1. The from clause is evaluated first:

o The from clause p, r1, r2 forms the Cartesian product p × r1 × r2. The where clause
is then applied to each tuple of that product, and finally the distinct values of
p.a1 are output.

2. When r1 and r2 are both non-empty:

o Every tuple of p is paired with every tuple of r1 and every tuple of r2. A value of
p.a1 is selected if it matches some r1.a1 or some r2.a1, so the query returns exactly
the values of p.a1 that are in r1 or in r2.

o The DISTINCT keyword ensures that a value of p.a1 appears only once in the output,
even if it matches tuples in both r1 and r2.

Case Analysis When One of r1 or r2 is Empty:

1. When r1 is Empty:

 The Cartesian product p × r1 × r2 is empty, because there is no r1 tuple to pair with.
The where clause has nothing to test, so the result is empty, even for values of p.a1
that are present in r2.

2. When r2 is Empty:

 Symmetrically, the product is empty and so is the result, even for values of p.a1 that
are present in r1.

3. When Both r1 and r2 are Empty:

 The product is empty and the result is empty.

Conclusion:

 The query selects the values of p.a1 that are in r1 or in r2 only when both r1 and r2 are
non-empty. If either one is empty, the result is empty regardless of the contents of the
other relation.
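The empty-relation case is easy to test directly. The sketch below uses Python's built-in sqlite3 module; the helper run and the integer sample values are assumptions made for the demonstration. It runs the query from the exercise against a Cartesian product in which r1, r2, or neither is empty.

```python
import sqlite3

QUERY = """
SELECT DISTINCT p.a1
FROM p, r1, r2
WHERE p.a1 = r1.a1 OR p.a1 = r2.a1
ORDER BY p.a1
"""

def run(p_rows, r1_rows, r2_rows):
    """Load the three single-column relations and run the exercise's query."""
    conn = sqlite3.connect(":memory:")
    for table, rows in (("p", p_rows), ("r1", r1_rows), ("r2", r2_rows)):
        conn.execute(f"CREATE TABLE {table} (a1 INTEGER)")
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(v,) for v in rows])
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result

both_nonempty = run([1, 2, 3], [1], [2])
r2_empty = run([1, 2, 3], [1], [])
r1_empty = run([1, 2, 3], [], [2])

print(both_nonempty, r2_empty, r1_empty)
```

With both relations non-empty the query returns the values 1 and 2. As soon as one of r1 or r2 is empty, the Cartesian product in the from clause has no tuples and the result is empty, even though p.a1 = 1 still appears in r1.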

3.5 branch(branch name, branch city, assets)

customer (customer name, customer street, customer city)

loan (loan number, branch name, amount)

borrower (customer name, loan number)

account (account number, branch name, balance )

depositor (customer name, account number)

Consider the bank database above, where the primary keys are underlined. Construct the following
SQL queries for this relational database.

a. Find all customers of the bank who have an account but not a loan.

b. Find the names of all customers who live on the same street and in

the same city as “Smith”.

c. Find the names of all branches with customers who have an account

in the bank and who live in “Harrison”.

a. Find all customers with an account but no loan:

SELECT DISTINCT c.customer_name

FROM customer c

JOIN depositor d ON c.customer_name = d.customer_name

LEFT JOIN borrower b ON c.customer_name = b.customer_name

WHERE b.loan_number IS NULL;

b. Find all customers who live on the same street and city as "Smith":

SELECT c1.customer_name

FROM customer c1, customer c2

WHERE c1.customer_street = c2.customer_street

AND c1.customer_city = c2.customer_city

AND c2.customer_name = 'Smith';

c. Find the names of all branches with customers who have an account and live in "Harrison":

SELECT DISTINCT b.branch_name


FROM branch b

JOIN account a ON b.branch_name = a.branch_name

JOIN depositor d ON a.account_number = d.account_number

JOIN customer c ON d.customer_name = c.customer_name

WHERE c.customer_city = 'Harrison';

3.6 employee (employee name, street, city)

works (employee name, company name, salary)

company (company name, city)

manages (employee name, manager name)

Consider the employee database of Figure 3.20, where the primary keys are

underlined. Give an expression in SQL for each of the following queries.

a. Find the names and cities of residence of all employees who work for

“First Bank Corporation”.

b. Find the names, street addresses, and cities of residence of all employees who work for “First Bank
Corporation” and earn more than

$10,000.

c. Find all employees in the database who do not work for “First Bank

Corporation”.

d. Find all employees in the database who earn more than each employee

of “Small Bank Corporation”.

e. Assume that the companies may be located in several cities. Find all

companies located in every city in which “Small Bank Corporation”

is located.

f. Find the company that has the most employees.

g. Find those companies whose employees earn a higher salary, on average, than the average salary
at “First Bank Corporation”.

h. Modify the database so that “Jones” now lives in “Newtown”.

i. Give all managers of “First Bank Corporation” a 10 percent raise

unless the salary becomes greater than $100,000; in such cases, give

only a 3 percent raise.

a. Find names and cities of employees working for "First Bank Corporation":
SELECT e.employee_name, e.city

FROM employee e

JOIN works w ON e.employee_name = w.employee_name

WHERE w.company_name = 'First Bank Corporation';

b. Find names, street addresses, and cities of employees working for "First Bank Corporation" and
earning more than $10,000:

SELECT e.employee_name, e.street, e.city

FROM employee e

JOIN works w ON e.employee_name = w.employee_name

WHERE w.company_name = 'First Bank Corporation' AND w.salary > 10000;

c. Find employees not working for "First Bank Corporation":

SELECT e.employee_name

FROM employee e

LEFT JOIN works w ON e.employee_name = w.employee_name

WHERE w.company_name != 'First Bank Corporation' OR w.company_name IS NULL;

d. Find employees earning more than each employee of "Small Bank Corporation":

SELECT e.employee_name

FROM employee e

JOIN works w ON e.employee_name = w.employee_name

WHERE w.salary > ALL (SELECT w2.salary

FROM works w2

WHERE w2.company_name = 'Small Bank Corporation');

e. Find all companies located in every city where "Small Bank Corporation" is located:

SELECT DISTINCT c.company_name

FROM company c

WHERE NOT EXISTS (SELECT 1

FROM company c2

WHERE c2.company_name = 'Small Bank Corporation'

AND c2.city NOT IN (SELECT c3.city

FROM company c3

WHERE c3.company_name = c.company_name));


f. Find the company with the most employees:

SELECT w.company_name

FROM works w

GROUP BY w.company_name

ORDER BY COUNT(w.employee_name) DESC

LIMIT 1;

g. Find companies with higher average salary than "First Bank Corporation":

SELECT w.company_name

FROM works w

GROUP BY w.company_name

HAVING AVG(w.salary) > (SELECT AVG(w2.salary)

FROM works w2

WHERE w2.company_name = 'First Bank Corporation');

h. Modify the database so "Jones" now lives in "Newtown":

UPDATE employee

SET city = 'Newtown'

WHERE employee_name = 'Jones';

i. Give managers of "First Bank Corporation" a 10% raise, or a 3% raise if the 10% raise would take
the salary above $100,000:

UPDATE works
SET salary = CASE
WHEN salary * 1.10 > 100000 THEN salary * 1.03
ELSE salary * 1.10
END
WHERE company_name = 'First Bank Corporation'
AND employee_name IN (SELECT manager_name
FROM manages);

The managers are the values of manager_name in manages. Selecting employee_name from manages would
instead give the raise to the employees who have a manager.
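The conditional raise can be checked on a handful of rows with Python's built-in sqlite3 module. This sketch identifies managers as the people who appear in manages.manager_name; the names and salaries are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE works (employee_name TEXT PRIMARY KEY, company_name TEXT, salary REAL);
CREATE TABLE manages (employee_name TEXT PRIMARY KEY, manager_name TEXT);
INSERT INTO works VALUES ('Ann', 'First Bank Corporation', 50000),
                         ('Bob', 'First Bank Corporation', 95000),
                         ('Cal', 'First Bank Corporation', 40000),
                         ('Dee', 'Small Bank Corporation', 60000),
                         ('Eve', 'Small Bank Corporation', 70000);
-- Cal reports to Ann, Ann reports to Bob, Dee reports to Eve.
INSERT INTO manages VALUES ('Cal', 'Ann'), ('Ann', 'Bob'), ('Dee', 'Eve');
""")

# The CASE is evaluated against the old salary of each updated row.
conn.execute("""
    UPDATE works
    SET salary = CASE
                   WHEN salary * 1.10 > 100000 THEN salary * 1.03
                   ELSE salary * 1.10
                 END
    WHERE company_name = 'First Bank Corporation'
      AND employee_name IN (SELECT manager_name FROM manages)
""")

salaries = dict(conn.execute("SELECT employee_name, salary FROM works"))
print({name: round(value, 2) for name, value in salaries.items()})
```

Ann receives 10% (about 55,000). Bob would exceed 100,000 with a 10% raise, so he receives 3% (about 97,850). Cal is not a manager, and Eve manages at a different company, so neither salary changes.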

3.7 For the above database,

a. Find the names of all employees who work for “First Bank Corporation”.

b. Find all employees in the database who live in the same cities as the
companies for which they work.

c. Find all employees in the database who live in the same cities and on

the same streets as do their managers.

d. Find all employees who earn more than the average salary of all

employees of their company.

e. Find the company that has the smallest payroll.

f. Give all employees of “First Bank Corporation” a 10 percent raise.

g. Give all managers of “First Bank Corporation” a 10 percent raise.

h. Delete all tuples in the works relation for employees of “Small Bank

Corporation”.

a. Find the names of all employees who work for "First Bank Corporation":

SELECT employee_name

FROM works

WHERE company_name = 'First Bank Corporation';

b. Find all employees who live in the same cities as the companies they work for:

SELECT e.employee_name

FROM employee e

JOIN works w ON e.employee_name = w.employee_name

JOIN company c ON w.company_name = c.company_name

WHERE e.city = c.city;

c. Find all employees who live in the same cities and on the same streets as their managers:

SELECT e.employee_name

FROM employee e

JOIN manages m ON e.employee_name = m.employee_name

JOIN employee manager ON m.manager_name = manager.employee_name

WHERE e.city = manager.city

AND e.street = manager.street;

d. Find employees who earn more than the average salary at their company:

SELECT e.employee_name

FROM works w

JOIN employee e ON w.employee_name = e.employee_name


WHERE w.salary > (SELECT AVG(salary)

FROM works

WHERE company_name = w.company_name);

e. Find the company with the smallest payroll:

SELECT company_name

FROM works

GROUP BY company_name

ORDER BY SUM(salary) ASC

LIMIT 1;

f. Give all employees of "First Bank Corporation" a 10% raise:

UPDATE works

SET salary = salary * 1.10

WHERE company_name = 'First Bank Corporation';

g. Give all managers of "First Bank Corporation" a 10% raise:

UPDATE works

SET salary = salary * 1.10

WHERE company_name = 'First Bank Corporation'

AND employee_name IN (SELECT manager_name

FROM manages);

h. Delete all tuples for employees of "Small Bank Corporation" from the works table:

DELETE FROM works

WHERE company_name = 'Small Bank Corporation';

3.8 Consider the query:

select course_id, semester, year, sec_id, avg (tot_cred)

from takes natural join student

where year = 2009

group by course_id, semester, year, sec_id

having count (ID) >= 2;

Explain why joining section as well in the from clause would not change the result.

In the given SQL query, you are performing the following operations:
 select course_id, semester, year, sec_id, avg(tot_cred): You are selecting the course ID,
semester, year, section ID, and the average total credits (tot_cred).

 from takes natural join student: You are joining the takes and student tables using a natural
join. The join matches tuples on the attributes the two tables have in common, which here
is only ID.

 where year = 2009: You filter the rows to only include those for the year 2009.

 group by course_id, semester, year, sec_id: You group the results by course_id, semester,
year, and sec_id.

 having count(ID) >= 2: You only include groups that have at least two students (ID represents
the student identifier).

Joining section in the FROM clause:

In the university schema, section has the attributes (course_id, sec_id, semester, year, building,
room_number, time_slot_id). A natural join with section would therefore match on course_id, sec_id,
semester, and year, which are exactly the attributes that takes shares with section.

Why joining section would not change the result:

1. Foreign Key Guarantees Exactly One Match: The attributes (course_id, sec_id, semester, year)
of takes form a foreign key referencing section, where they are the primary key. Every
takes tuple therefore matches exactly one section tuple, so the join neither removes a
tuple nor duplicates one. These attributes are also part of the primary key of takes, so
none of them can be null.

2. No New Attributes Are Used: The join would add only building, room_number, and
time_slot_id. None of them appears in the select, where, group by, or having clause, and
none shares a name with an attribute of student, so no unintended join condition is
introduced.

3. Groups and Aggregates Are Unchanged: Because the set of joined tuples is the same, each
group contains the same tuples. The values of avg(tot_cred) and count(ID) are therefore
identical, and the same groups pass the having condition.

4. Sections Without Students Do Not Matter: A section that no student takes has no matching
takes tuple, so it contributes nothing to the natural join in either version of the query.
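The claim can be tested on a small instance with Python's built-in sqlite3 module. The sketch assumes the textbook's university schema for student, takes, and section, with made-up rows, and runs the query once without and once with section in the from clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, tot_cred INTEGER);
CREATE TABLE section (course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER,
                      building TEXT, room_number TEXT, time_slot_id TEXT,
                      PRIMARY KEY (course_id, sec_id, semester, year));
CREATE TABLE takes (ID TEXT, course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER,
                    grade TEXT,
                    PRIMARY KEY (ID, course_id, sec_id, semester, year),
                    FOREIGN KEY (course_id, sec_id, semester, year)
                        REFERENCES section (course_id, sec_id, semester, year));
INSERT INTO student VALUES ('s1', 'Ann', 'Comp. Sci.', 10),
                           ('s2', 'Bob', 'Comp. Sci.', 20),
                           ('s3', 'Cal', 'Physics', 30);
INSERT INTO section VALUES ('CS-101', '1', 'Fall', 2009, 'Packard', '101', 'A'),
                           ('CS-101', '1', 'Spring', 2010, 'Packard', '101', 'B');
INSERT INTO takes VALUES ('s1', 'CS-101', '1', 'Fall', 2009, 'A'),
                         ('s2', 'CS-101', '1', 'Fall', 2009, 'B'),
                         ('s3', 'CS-101', '1', 'Spring', 2010, 'A');
""")

TEMPLATE = """
SELECT course_id, semester, year, sec_id, AVG(tot_cred)
FROM {tables}
WHERE year = 2009
GROUP BY course_id, semester, year, sec_id
HAVING COUNT(ID) >= 2
"""

without_section = conn.execute(
    TEMPLATE.format(tables="takes NATURAL JOIN student")).fetchall()
with_section = conn.execute(
    TEMPLATE.format(tables="takes NATURAL JOIN student NATURAL JOIN section")).fetchall()

print(without_section)
print(with_section)
```

Both versions return the single group for the Fall 2009 section with an average of 15 total credits, because each takes tuple matches exactly one section tuple.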

4.1 Write the following queries in SQL:

a. Display a list of all instructors, showing their ID, name, and the number of sections that they have
taught. Make sure to show the number of sections as 0 for instructors who have not taught any
section. Your query should use an outer join, and should not use scalar subqueries.

b. Write the same query as above, but using a scalar subquery, without outer join.

c. Display the list of all course sections offered in Spring 2010, along with the names of the
instructors teaching the section. If a section has more than one instructor, it should appear as many
times in the result as it has instructors. If it does not have any instructor, it should still appear in the
result with the instructor name set to “—”.

d. Display the list of all departments, with the total number of instructors in each department,
without using scalar subqueries. Make sure to correctly handle departments with no instructors.

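a. SELECT i.ID, i.name, COUNT(t.sec_id) AS num_sections

FROM instructor i

LEFT OUTER JOIN teaches t ON i.ID = t.ID

GROUP BY i.ID, i.name;

Each teaches tuple already records one section taught by the instructor, so section does not need
to be joined. Joining section on sec_id alone would match unrelated sections that happen to share
a sec_id and inflate the counts.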

b. SELECT i.ID, i.name,

(SELECT COUNT(*)

FROM teaches t

WHERE t.ID = i.ID) AS num_sections

FROM instructor i;

c. SELECT s.course_id, s.sec_id, s.semester, s.year,

COALESCE(i.name, '—') AS instructor_name

FROM section s

LEFT OUTER JOIN teaches t ON s.course_id = t.course_id AND s.sec_id = t.sec_id
AND s.semester = t.semester AND s.year = t.year

LEFT OUTER JOIN instructor i ON t.ID = i.ID

WHERE s.semester = 'Spring' AND s.year = 2010;

d. SELECT d.dept_name, COUNT(i.ID) AS num_instructors

FROM department d

LEFT OUTER JOIN instructor i ON d.dept_name = i.dept_name

GROUP BY d.dept_name;

4.2 Outer join expressions can be computed in SQL without using the SQL outer join operation. To
illustrate this fact, show how to rewrite each of the following SQL queries without using the outer
join expression.

a. select * from student natural left outer join takes

b. select * from student natural full outer join takes

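The natural join supplies the matching tuples, and each additional select pads the unmatched
tuples of one relation with nulls. The column lists assume the university schema
student(ID, name, dept_name, tot_cred) and takes(ID, course_id, sec_id, semester, year, grade).

a. SELECT * FROM student NATURAL JOIN takes
UNION
SELECT ID, name, dept_name, tot_cred, NULL, NULL, NULL, NULL, NULL
FROM student s
WHERE NOT EXISTS (SELECT 1 FROM takes t WHERE t.ID = s.ID);

b. SELECT * FROM student NATURAL JOIN takes
UNION
SELECT ID, name, dept_name, tot_cred, NULL, NULL, NULL, NULL, NULL
FROM student s
WHERE NOT EXISTS (SELECT 1 FROM takes t WHERE t.ID = s.ID)
UNION
SELECT ID, NULL, NULL, NULL, course_id, sec_id, semester, year, grade
FROM takes t
WHERE NOT EXISTS (SELECT 1 FROM student s WHERE s.ID = t.ID);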
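One way to express the left outer join without any outer join operator is a natural join unioned with the null-padded unmatched students. The sketch below checks that formulation against the built-in operator using Python's sqlite3 module. It assumes the university schema student(ID, name, dept_name, tot_cred) and takes(ID, course_id, sec_id, semester, year, grade); the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (ID TEXT, name TEXT, dept_name TEXT, tot_cred INTEGER);
CREATE TABLE takes (ID TEXT, course_id TEXT, sec_id TEXT, semester TEXT,
                    year INTEGER, grade TEXT);
INSERT INTO student VALUES ('00128', 'Zhang', 'Comp. Sci.', 102),
                           ('70557', 'Snow', 'Physics', 0);
INSERT INTO takes VALUES ('00128', 'CS-101', '1', 'Fall', 2009, 'A');
""")

# Reference result, computed with the outer join operator.
with_outer_join = set(conn.execute(
    "SELECT * FROM student NATURAL LEFT OUTER JOIN takes"))

# Same result from an inner natural join plus null-padded unmatched students.
without_outer_join = set(conn.execute("""
    SELECT * FROM student NATURAL JOIN takes
    UNION
    SELECT ID, name, dept_name, tot_cred, NULL, NULL, NULL, NULL, NULL
    FROM student s
    WHERE NOT EXISTS (SELECT 1 FROM takes t WHERE t.ID = s.ID)
"""))

print(with_outer_join == without_outer_join)
```

The student with no takes tuple ('70557') appears once in both results, padded with nulls for the five takes attributes. The full outer join adds a third branch that pads unmatched takes tuples in the same way.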

4.3 employee (employee name, street, city)

works (employee name, company name, salary)

company (company name, city)

manages (employee name, manager name)

Consider the relational database above. Give an SQL DDL definition

of this database. Identify referential-integrity constraints that should hold,

and include them in the DDL definition.

-- Create the employee table

CREATE TABLE employee (

employee_name VARCHAR(100) PRIMARY KEY, -- Assuming employee names are unique

street VARCHAR(100) NOT NULL,

city VARCHAR(100) NOT NULL

);

-- Create the company table

CREATE TABLE company (

company_name VARCHAR(100) PRIMARY KEY, -- Company names are unique

city VARCHAR(100) NOT NULL

);

-- Create the works table

CREATE TABLE works (

employee_name VARCHAR(100),

company_name VARCHAR(100),
salary DECIMAL(15, 2) NOT NULL,

PRIMARY KEY (employee_name, company_name),

FOREIGN KEY (employee_name) REFERENCES employee(employee_name)

ON DELETE CASCADE ON UPDATE CASCADE,

FOREIGN KEY (company_name) REFERENCES company(company_name)

ON DELETE CASCADE ON UPDATE CASCADE

);

-- Create the manages table

CREATE TABLE manages (

employee_name VARCHAR(100),

manager_name VARCHAR(100),

PRIMARY KEY (employee_name, manager_name),

FOREIGN KEY (employee_name) REFERENCES employee(employee_name)

ON DELETE CASCADE ON UPDATE CASCADE,

FOREIGN KEY (manager_name) REFERENCES employee(employee_name)

ON DELETE CASCADE ON UPDATE CASCADE

);

4.4 SQL provides an n-ary operation called coalesce, which is defined as follows: coalesce (A1, A2, ...,
An) returns the first non-null Ai in the list A1, A2, ..., An, and returns null if all of A1, A2, ..., An are
null.

Let a and b be relations with the schemas A (name, address, title), and B (name, address, salary),
respectively. Show how to express a natural full outer join b using the full outer-join operation with
an on condition and the coalesce operation. Make sure that the result relation does not contain two
copies of the attributes name and address, and that the solution is correct even if some tuples in a
and b have null values for attributes name or address.

SELECT

COALESCE(a.name, b.name) AS name,

COALESCE(a.address, b.address) AS address,

a.title,

b.salary

FROM

a
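FULL OUTER JOIN

b

ON

a.name = b.name AND a.address = b.address;

Because a comparison involving a null value is never true, tuples with a null name or address find
no match and are preserved with nulls on the other side, as a natural full outer join requires.
coalesce then returns whichever copy of name and address is non-null.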

4.5 Under what circumstances would the query

select * from student natural full outer join takes natural full outer join course

include tuples with null values for the title attribute?

Joins are evaluated left to right, so the query computes (student natural full outer join takes)
natural full outer join course. In the university schema, the second join matches on every
attribute name the two sides share, which is course_id and also dept_name (present in both
student and course). Tuples with null values for the title attribute can occur under the
following circumstances:

1. Students who have not taken any course:

o If a student exists in the student relation but has no corresponding entry in the
takes relation, the student's tuple is padded with nulls for the takes attributes,
including course_id. A null course_id matches no course, so title is NULL.

2. Students who take a course outside their own department:

o Because dept_name is also a join attribute, a student-takes tuple matches a course
tuple only if the student's dept_name equals the course's dept_name. When they
differ, the tuple is padded with nulls for the remaining course attributes, so title
is NULL even though the course exists.

3. Courses whose title is itself null:

o If the schema allows title to be null and a course tuple stores a null title, that
tuple appears in the result with the null title.

4. Incomplete or mismatched data:

o A takes tuple whose course_id does not appear in course would also produce a NULL
title, as would a takes tuple with no matching student (its dept_name would be
null). The foreign keys from takes to course and from takes to student rule out
both cases when they are enforced.

Courses that have no students enrolled do not produce a null title. They appear with nulls for the
student and takes attributes, while title is present because it comes from course.

6.9 Describe how to translate join expressions in SQL to relational algebra.

1. Understand the SQL Join Syntax

SQL supports various types of joins, such as:

 Inner Join: Combines rows with matching values in both relations.

 Outer Join (Left, Right, Full): Includes non-matching rows from one or both relations, with
nulls for missing attributes.

 Cross Join: Produces a Cartesian product of two relations.


 Self Join: A join of a relation with itself.

Each of these can be expressed in relational algebra using specific operators.

2. Relational Algebra Operators for Joins

Relational algebra provides the following operators for expressing joins:

 Cartesian Product (×): Produces all possible combinations of tuples from two relations.

 Selection (σ): Filters tuples based on a condition.

 Projection (π): Selects specific attributes from a relation.

 Natural Join (⨝): Combines tuples with common attribute values, automatically matching
attributes with the same name.

 Outer Joins (⟕, ⟖, ⟗): Variants of join that include unmatched tuples with null values for
missing attributes.

3. Translation Rules

Below are specific examples of how to translate SQL join expressions into relational algebra:

a. Inner Join

SQL:

SELECT *

FROM R INNER JOIN S ON R.a = S.b;

Relational Algebra:

σ_{R.a = S.b}(R × S)

 Perform a Cartesian product (R × S) between relations R and S.

 Apply a selection (σ) to filter tuples where R.a = S.b.

Equivalently, as a theta join, which is defined as exactly this selection over the product:

R ⋈_{R.a = S.b} S

b. Natural Join

SQL:

SELECT *

FROM R NATURAL JOIN S;

Relational Algebra:

R⋈S

 A natural join automatically matches tuples where all common attributes have the same
value.

c. Cross Join
SQL:

SELECT *

FROM R CROSS JOIN S;

Relational Algebra:

R × S

 Simply take the Cartesian product of R and S.

d. Left Outer Join

SQL:

SELECT *

FROM R LEFT OUTER JOIN S ON R.a = S.b;

Relational Algebra:

(R ⋈_{R.a = S.b} S) ∪ ((R − π_R(R ⋈_{R.a = S.b} S)) × {(null, ..., null)})

 Perform the theta join R ⋈_{R.a = S.b} S. Here π_R denotes projection onto the attributes of R.

 Add the tuples of R that have no match in S, padded with nulls for S's attributes.

e. Right Outer Join

SQL:

SELECT *

FROM R RIGHT OUTER JOIN S ON R.a = S.b;

Relational Algebra:

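(R ⋈_{R.a = S.b} S) ∪ ({(null, ..., null)} × (S − π_S(R ⋈_{R.a = S.b} S)))

 Similar to the left outer join, but it keeps the non-matching tuples of S, padded with nulls for
R's attributes. Here π_S denotes projection onto the attributes of S.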

f. Full Outer Join

SQL:

SELECT *

FROM R FULL OUTER JOIN S ON R.a = S.b;

Relational Algebra:

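(R ⋈_{R.a = S.b} S) ∪ ((R − π_R(R ⋈_{R.a = S.b} S)) × {(null, ..., null)}) ∪ ({(null, ..., null)} × (S − π_S(R ⋈_{R.a = S.b} S)))

 Combines the left and right outer join results, so unmatched tuples from both R and S are
included, each padded with nulls for the other relation's attributes.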

g. Self Join

SQL:

SELECT *
FROM R AS R1 INNER JOIN R AS R2 ON R1.a = R2.b;

Relational Algebra:

σ_{R1.a = R2.b}(ρ_{R1}(R) × ρ_{R2}(R))

 Use the rename operator (ρ) to treat R as two instances, R1 and R2, then apply the same inner
join logic.

4. Key Considerations

 Common Attributes: Natural joins automatically match all common attributes. For explicit
conditions (e.g., R.a = S.b), use a theta-join (⨝ with a condition).

 Outer Joins: These extend the join operation (natural or theta) to keep unmatched tuples from
one or both sides, padded with nulls.

 Projection: Add projection (π) to select specific columns as needed.
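
The translation rules above can be sketched in a few lines of Python. This is an illustrative model only: relations are represented as lists of dictionaries (so they behave as bags, not sets), and the function names, attribute names, and sample tuples are assumptions made for the example.

```python
from itertools import product

# Sample relations (hypothetical); attribute names are qualified to avoid clashes.
R = [{"R.a": 1, "R.x": "p"}, {"R.a": 2, "R.x": "q"}]
S = [{"S.b": 1, "S.y": "u"}, {"S.b": 3, "S.y": "v"}]


def cartesian(r, s):
    """R x S: every tuple of r combined with every tuple of s."""
    return [{**t, **u} for t, u in product(r, s)]


def select(pred, rel):
    """sigma_pred(rel): keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def theta_join(r, s, pred):
    """Theta join, defined as a selection over the Cartesian product."""
    return select(pred, cartesian(r, s))


def left_outer_join(r, s, pred, s_attrs):
    """Theta join plus the unmatched tuples of r, padded with None (null)."""
    matched = theta_join(r, s, pred)
    padding = {attr: None for attr in s_attrs}
    unmatched = [t for t in r if not any(pred({**t, **u}) for u in s)]
    return matched + [{**t, **padding} for t in unmatched]


def condition(t):
    return t["R.a"] == t["S.b"]


inner = theta_join(R, S, condition)
left = left_outer_join(R, S, condition, ["S.b", "S.y"])
print(inner)
print(left)
```

The inner join keeps only the tuple with R.a = S.b = 1, while the left outer join also keeps the R tuple with R.a = 2, with None in place of S's attributes.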

6.10 Write the following queries in relational algebra, using the university schema.

a. Find the names of all students who have taken at least one Comp. Sci. course.

b. Find the IDs and names of all students who have not taken any course offering before Spring 2009.

c. For each department, find the maximum salary of instructors in that department. You may assume
that every department has at least one instructor.

d. Find the lowest, across all departments, of the per-department maximum salary computed by the
preceding query.

University Schema:

 Student(ID, name, dept_name, tot_cred)

 Instructor(ID, name, dept_name, salary)

 Course(course_id, title, dept_name, credits)

 Section(course_id, sec_id, semester, year, building, room_number, time_slot_id)

 Teaches(ID, course_id, sec_id, semester, year)

 Takes(ID, course_id, sec_id, semester, year, grade)

 Department(dept_name, building, budget)


6.11 Using the university example, write relational-algebra queries to find the course sections taught
by more than one instructor in the following ways:

a. Using an aggregate function.

b. Without using any aggregate functions.


6.12 Consider the following relational schema for a library:

member (memb_no, name, dob)

books (isbn, title, authors, publisher)

borrowed (memb_no, isbn, date)

Write the following queries in relational algebra.

a. Find the names of members who have borrowed any book published by “McGraw-Hill”.

b. Find the name of members who have borrowed all books published by “McGraw-Hill”.

c. Find the name and membership number of members who have borrowed more than five different
books published by “McGraw-Hill”.

d. For each publisher, find the name and membership number of members who have borrowed more
than five books of that publisher.

e. Find the average number of books borrowed per member. Take into account that if a member does
not borrow any books, then that member does not appear in the borrowed relation at all.
6.13 Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations.

Write relational-algebra expressions equivalent to the following domain-relational-calculus
expressions:

a. {< a > | ∃ b (< a, b > ∈ r ∧ b = 17)}

b. {< a, b, c > | < a, b > ∈ r ∧ < a, c > ∈ s}

c. {< a > | ∃ b (< a, b > ∈ r) ∨ ∀ c (∃ d (< d, c > ∈ s) ⇒ < a, c > ∈ s)}

d. {< a > | ∃ c (< a, c > ∈ s ∧ ∃ b1, b2 (< a, b1 > ∈ r ∧ < c, b2 > ∈ r ∧ b1 > b2))}
7.14 Explain the distinctions among the terms primary key, candidate key, and superkey.

In relational database theory, primary key, candidate key, and superkey are all related to the
concept of identifying tuples (rows) uniquely in a relation (table). They differ in their characteristics
and constraints:

1. Superkey

A superkey is any set of attributes (columns) that uniquely identifies a tuple in a relation. It can
consist of one or more attributes, and it may include extra attributes that are not necessary for
uniqueness. In other words, a superkey is a set of attributes that can uniquely identify a row, but it
may contain redundant attributes.

 Example: If we have a relation Employee(ID, Name, SSN), a superkey could be {ID}, {SSN}, or
{ID, Name}. While {ID, Name} still uniquely identifies the tuple, it contains unnecessary extra
attributes (as {ID} alone would be sufficient).

2. Candidate Key

A candidate key is a minimal superkey, meaning it is a superkey with no redundant attributes. Every
candidate key is a superkey, but it does not contain any unnecessary attributes. In other words, a
candidate key is a superkey, but if you remove any attribute from it, it will no longer uniquely identify
the tuple.

 Example: In the Employee relation, if ID and SSN both uniquely identify employees, then
both {ID} and {SSN} are candidate keys. However, {ID, Name} is not a candidate key, because
{ID} alone is sufficient to uniquely identify the tuple.

3. Primary Key

The primary key is one of the candidate keys selected to uniquely identify tuples in a relation. The
primary key is chosen by the database designer and is typically the key that will be used most often
for indexing or joining with other tables. There can only be one primary key in a relation.
 Example: In the Employee relation, if both {ID} and {SSN} are candidate keys, the database
designer would choose one of them to be the primary key. For instance, let's say {ID} is
chosen as the primary key. So, {ID} becomes the primary key, and {SSN} is still a candidate key
but not the primary key.

Key Differences:

 A superkey can have unnecessary attributes but still uniquely identify a tuple.

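 A candidate key is a minimal superkey: it uniquely identifies a tuple, and removing any
attribute from it destroys that property. Minimal does not mean fewest attributes overall; a
relation can have candidate keys of different sizes.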

 A primary key is a selected candidate key that is chosen to uniquely identify tuples in a table,
and only one primary key can exist in a relation.
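
The distinction can be made concrete with a small check. The sketch below tests attribute sets against one sample instance of the Employee relation; the rows are invented for illustration. Keys are properties of the schema, meaning they must hold for every legal instance, so a check over sample data can only refute a key, never prove one. The function names are hypothetical.

```python
from itertools import combinations

# Sample Employee instance (hypothetical data); two employees share a name.
employees = [
    {"ID": 1, "Name": "Alice", "SSN": "111"},
    {"ID": 2, "Name": "Bob", "SSN": "222"},
    {"ID": 3, "Name": "Alice", "SSN": "333"},
]


def is_superkey(attrs, rows):
    """True if no two rows agree on all attributes in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal superkeys of this instance: no proper subset is a superkey."""
    attrs = sorted(rows[0])
    keys = []
    # Visit smaller attribute sets first so supersets of a found key are skipped.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            contains_key = any(set(key) <= set(combo) for key in keys)
            if not contains_key and is_superkey(combo, rows):
                keys.append(combo)
    return keys


print(is_superkey(("ID", "Name"), employees))  # superkey with a redundant attribute
print(is_superkey(("Name",), employees))       # not a superkey: names repeat
print(candidate_keys(employees))
```

Here {ID, Name} is a superkey but not a candidate key, {ID} and {SSN} are the candidate keys, and the designer would pick one of those two as the primary key.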

7.15 We can convert any weak entity set to a strong entity set by simply adding appropriate
attributes. Why, then, do we have weak entity sets?

Weak entity sets exist in an Entity-Relationship (ER) model for a specific reason related to how
entities are represented and related to one another. While it's true that you can convert a weak
entity set into a strong entity set by adding attributes, weak entity sets are still useful in database
modeling because they serve a specific purpose in representing certain types of relationships and
dependencies between entities. Here’s why weak entity sets are important:

1. Dependency on a Strong Entity (Identifying Entity)

A weak entity set is an entity set that cannot be uniquely identified by its own attributes alone.
Instead, it relies on a strong entity set (also known as the owner entity set) for its identification. This
relationship is typically called an identifying relationship. The weak entity set has a partial key that is
combined with the primary key of the strong entity to form a unique identifier.

 Example: Consider a Dependent entity set that represents the dependents of an employee.
The Dependent entity cannot be uniquely identified by its own attributes (such as
Dependent Name or Birth Date), but it can be uniquely identified by combining the
Employee ID (the strong entity) and Dependent Name.

2. Representation of Real-World Relationships

Weak entity sets are used to represent real-world relationships that are inherently dependent on
other entities. Without weak entity sets, these relationships might require redundant or overly
complex structures to express.

 Example: In an Invoice and Invoice Item scenario, the Invoice Item is a weak entity that
cannot exist independently of the Invoice entity. Each Invoice Item can be uniquely identified
only within the context of a specific invoice. Converting the Invoice Item to a strong entity
would require adding the Invoice ID as an explicit attribute of every Invoice Item, duplicating
information that the identifying relationship already captures and complicating the design.

3. Avoiding Redundancy and Ensuring Data Integrity

By using a weak entity set, we avoid redundancy. A weak entity set naturally minimizes the repetition
of attributes that are already included in the identifying (strong) entity. This helps in maintaining data
integrity and a more efficient representation of the relationship between entities.
 Example: If we turn a weak entity into a strong entity by adding the necessary identifying
attributes, we might end up duplicating information that already exists in the strong entity.
For instance, if a Course Enrollment entity is weak and depends on both Student and Course,
converting it into a strong entity would require storing both Student ID and Course ID as part
of the primary key, which could result in redundancy if this relationship is often referenced in
other parts of the system.

4. More Natural Modeling of Relationships

Weak entities reflect the real-world concept that some entities do not exist independently but are
part of a larger context. These types of relationships occur frequently in many domains, such as:

 Parts in an assembly (a part’s identification depends on the assembly in which it’s used).

 Employees in a department (an employee’s identification could depend on the department
they belong to).

 Order items in an order (an order item’s identification depends on the order it belongs to).

5. Simplifying Database Design

Weak entities help simplify database design and prevent unnecessary complexity. Without weak
entity sets, you might need to create additional artificial keys or relationships to capture these
dependent relationships, making the schema more complicated and harder to maintain.
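
The Dependent example can be sketched as tables to show how a weak entity set is identified and how it depends on its owner. This is a minimal illustration using Python's standard sqlite3 module; the table names, column names, and rows are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Weak entity: the partial key dependent_name is unique only per employee.
    CREATE TABLE dependent (
        employee_id INTEGER NOT NULL,
        dependent_name TEXT NOT NULL,
        birth_date TEXT,
        PRIMARY KEY (employee_id, dependent_name),
        FOREIGN KEY (employee_id) REFERENCES employee(employee_id)
            ON DELETE CASCADE
    );
    INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob');
    -- 'Sam' appears under both employees, which the composite key allows.
    INSERT INTO dependent VALUES (1, 'Sam', '2010-01-01'),
                                 (1, 'Kim', '2012-05-05'),
                                 (2, 'Sam', '2015-09-09');
""")

before = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]

# Deleting the owner removes its dependents automatically.
conn.execute("DELETE FROM employee WHERE employee_id = 1")
remaining = conn.execute(
    "SELECT employee_id, dependent_name FROM dependent"
).fetchall()
print(before, remaining)
```

The cascade shows the existence dependency: once employee 1 is deleted, only Bob's dependent remains.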

7.16 Design a database for an automobile company to provide to its dealers to assist them in
maintaining customer records and dealer inventory and to assist sales staff in ordering cars.

Each vehicle is identified by a vehicle identification number (VIN). Each individual vehicle is a
particular model of a particular brand offered by the company (e.g., the XF is a model of the car
brand Jaguar of Tata Motors). Each model can be offered with a variety of options, but an individual
car may have only some (or none) of the available options. The database needs to store information
about models, brands, and options, as well as information about individual dealers, customers, and
cars.

Your design should include an E-R diagram, a set of relational schemas, and a list of constraints,
including primary-key and foreign-key constraints.

1. Entity-Relationship (E-R) Diagram

Entities:

1. Brand

o Attributes: Brand_ID (Primary Key), Brand_Name, Parent_Company

2. Model

o Attributes: Model_ID (Primary Key), Model_Name, Brand_ID (Foreign Key),
Year_Of_Manufacture, Price

3. Option

o Attributes: Option_ID (Primary Key), Option_Name, Option_Price

4. Car
o Attributes: VIN (Primary Key), Model_ID (Foreign Key), Color, Year, Price,
Date_Manufactured

5. Dealer

o Attributes: Dealer_ID (Primary Key), Dealer_Name, Address, City, State, Zipcode,
Contact_Info

6. Customer

o Attributes: Customer_ID (Primary Key), Name, Address, City, State, Zipcode,
Contact_Info

7. Car_Option (Junction table for many-to-many relationship between Car and Option)

o Attributes: VIN (Foreign Key), Option_ID (Foreign Key)

8. Order

o Attributes: Order_ID (Primary Key), Customer_ID (Foreign Key), Dealer_ID (Foreign
Key), Order_Date, Total_Amount

9. Order_Item

o Attributes: Order_ID (Foreign Key), VIN (Foreign Key), Quantity

Relationships:

 Brand to Model: One Brand offers many Models, but each Model belongs to only one Brand.
(1:M)

 Model to Car: One Model can have many Cars, but each Car belongs to only one Model.
(1:M)

 Car to Option: A Car can have many Options, and an Option can be applied to many Cars.
(M:N)

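 Dealer to Car: A Dealer may have many Cars in inventory. Because each Car is a single vehicle
identified by its VIN, it belongs to the inventory of at most one Dealer at a time. (1:M;
implementing this requires a Dealer_ID foreign key in Car.)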

 Customer to Order: A Customer can place multiple Orders, but each Order is placed by one
Customer. (1:M)

 Order to Car: An Order can contain multiple Cars, and a Car can be part of multiple Orders.
(M:N)

2. Relational Schema

The relational schema consists of the following tables:

1. Brand

Attribute Data Type Constraint

Brand_ID INT Primary Key

Brand_Name VARCHAR(255) Not Null


Parent_Company VARCHAR(255) Not Null

2. Model

Attribute Data Type Constraint

Model_ID INT Primary Key

Model_Name VARCHAR(255) Not Null

Brand_ID INT Foreign Key references Brand

Year_Of_Manufacture INT Not Null

Price DECIMAL(10,2) Not Null

3. Option

Attribute Data Type Constraint

Option_ID INT Primary Key

Option_Name VARCHAR(255) Not Null

Option_Price DECIMAL(10,2) Not Null

4. Car

Attribute Data Type Constraint

VIN VARCHAR(17) Primary Key

Model_ID INT Foreign Key references Model

Color VARCHAR(50) Not Null

Year INT Not Null

Price DECIMAL(10,2) Not Null

Date_Manufactured DATE Not Null

5. Car_Option (Junction table)

Attribute Data Type Constraint

VIN VARCHAR(17) Foreign Key references Car

Option_ID INT Foreign Key references Option

Primary Key (VIN, Option_ID) Composite Key

6. Dealer

Attribute Data Type Constraint

Dealer_ID INT Primary Key


Dealer_Name VARCHAR(255) Not Null

Address VARCHAR(255) Not Null

City VARCHAR(100) Not Null

State VARCHAR(100) Not Null

Zipcode VARCHAR(20) Not Null

Contact_Info VARCHAR(100) Not Null

7. Customer

Attribute Data Type Constraint

Customer_ID INT Primary Key

Name VARCHAR(255) Not Null

Address VARCHAR(255) Not Null

City VARCHAR(100) Not Null

State VARCHAR(100) Not Null

Zipcode VARCHAR(20) Not Null

Contact_Info VARCHAR(100) Not Null

8. Order

Attribute Data Type Constraint

Order_ID INT Primary Key

Customer_ID INT Foreign Key references Customer

Dealer_ID INT Foreign Key references Dealer

Order_Date DATE Not Null

Total_Amount DECIMAL(10,2) Not Null

9. Order_Item (Junction table)

Attribute Data Type Constraint

Order_ID INT Foreign Key references Order

VIN VARCHAR(17) Foreign Key references Car

Quantity INT Not Null

Primary Key (Order_ID, VIN) Composite Key

3. Constraints
1. Primary Key Constraints:

o Each table has a primary key to uniquely identify records.

2. Foreign Key Constraints:

o Model.Brand_ID references Brand(Brand_ID)

o Car.Model_ID references Model(Model_ID)

o Car_Option.VIN references Car(VIN)

o Car_Option.Option_ID references Option(Option_ID)

o Order.Customer_ID references Customer(Customer_ID)

o Order.Dealer_ID references Dealer(Dealer_ID)

o Order_Item.Order_ID references Order(Order_ID)

o Order_Item.VIN references Car(VIN)

3. Unique Constraints:

o VIN in the Car table is unique.

o Order_ID and VIN in the Order_Item table together form a composite primary key.

4. Not Null Constraints:

o All foreign keys should be NOT NULL.

o Attributes like Car.Price, Model.Price, Order.Total_Amount, and Quantity should also
be NOT NULL.

5. Check Constraints:

o Car.Price and Model.Price should be non-negative.

o Order.Total_Amount should be non-negative.

o Order_Item.Quantity should be greater than zero.

4. Additional Considerations:

 Cascading Updates/Deletes:

o When a brand is deleted, all models linked to that brand should be deleted or
updated accordingly (cascading delete).

o Similarly, cascading updates or deletes should be applied to ensure consistency when
deleting or updating records in the Car_Option, Order_Item, and Order tables.
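
Part of the schema and its constraints can be sketched as executable DDL. The following is a minimal illustration using Python's standard sqlite3 module, covering only Brand, Model, and Car with fewer columns than the tables above. The SQLite column types, the sample rows, and the helper try_insert are assumptions made for the example. A full implementation would also need to quote or rename the Order table, since ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE brand (
        brand_id INTEGER PRIMARY KEY,
        brand_name TEXT NOT NULL,
        parent_company TEXT NOT NULL
    );
    CREATE TABLE model (
        model_id INTEGER PRIMARY KEY,
        model_name TEXT NOT NULL,
        brand_id INTEGER NOT NULL REFERENCES brand(brand_id) ON DELETE CASCADE,
        price NUMERIC NOT NULL CHECK (price >= 0)
    );
    CREATE TABLE car (
        vin TEXT PRIMARY KEY,
        model_id INTEGER NOT NULL REFERENCES model(model_id),
        color TEXT NOT NULL,
        price NUMERIC NOT NULL CHECK (price >= 0)
    );
    -- Sample rows (hypothetical values)
    INSERT INTO brand VALUES (1, 'Jaguar', 'Tata Motors');
    INSERT INTO model VALUES (10, 'XF', 1, 50000);
    INSERT INTO car VALUES ('VIN00000000000001', 10, 'Black', 52000);
""")


def try_insert(sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


valid = try_insert("INSERT INTO car VALUES ('VIN00000000000002', 10, 'Red', 51000)")
negative_price = try_insert("INSERT INTO car VALUES ('VIN00000000000003', 10, 'Red', -1)")
unknown_model = try_insert("INSERT INTO car VALUES ('VIN00000000000004', 99, 'Red', 100)")
duplicate_vin = try_insert("INSERT INTO car VALUES ('VIN00000000000001', 10, 'Blue', 100)")
print(valid, negative_price, unknown_model, duplicate_vin)
```

Each rejected insert corresponds to one constraint from the list: the check constraint on price, the foreign key to Model, and the primary key on VIN.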

7.17 Design a generalization–specialization hierarchy for a motor vehicle sales company. The
company sells motorcycles, passenger cars, vans, and buses. Justify your placement of attributes at
each level of the hierarchy. Explain why they should not be placed at a higher or lower level.
In this design, we will create a generalization-specialization hierarchy where the top level will be a
general entity for motor vehicles, and the lower levels will specialize into categories like motorcycles,
passenger cars, vans, and buses.

1. General Entity: Motor Vehicle

Motor Vehicle is the general entity, which represents all types of motorized vehicles that the
company sells. The attributes here will capture the common properties shared by all motor vehicles,
regardless of the type.

Attributes of Motor Vehicle:

 VIN (Vehicle Identification Number): Unique identifier for each motor vehicle.

 Make: The manufacturer of the vehicle (e.g., Harley-Davidson, Toyota, etc.).

 Model: The model name or number of the vehicle.

 Year: The manufacturing year of the vehicle.

 Price: The price of the vehicle.

 Engine Type: The type of engine (e.g., internal combustion, electric, hybrid).

 Color: The color of the vehicle.

 Fuel Type: Type of fuel the vehicle uses (e.g., petrol, diesel, electric).

 Transmission Type: The transmission type (e.g., automatic, manual).

These attributes apply to all types of vehicles because they define basic characteristics that are
common across motorcycles, cars, vans, and buses.

2. Specialization Levels (Subclasses)

Now we will specialize the Motor Vehicle entity into more specific vehicle types: Motorcycle,
Passenger Car, Van, and Bus.

Motorcycle

Motorcycles have unique attributes that are not applicable to cars, vans, or buses. The attributes of a
Motorcycle will include those that are specific to motorcycles and are not shared with other vehicle
types.

Additional Attributes for Motorcycle:

 Engine Capacity: The size of the engine (e.g., 500cc, 1000cc).

 Type: Type of motorcycle (e.g., cruiser, sport, touring).

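These attributes are kept at the Motorcycle level because motorcycles are classified and sold by
engine capacity (in cc) and by styles such as sport or cruiser. Other vehicles also have an engine
displacement, so Engine Capacity could move up to the Motor Vehicle level if the company records
it for every vehicle.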

Passenger Car

Passenger cars have their own specialized attributes. These cars are primarily designed for
transporting passengers.
Additional Attributes for Passenger Car:

 Number of Doors: The number of doors the vehicle has (e.g., 2-door, 4-door).

 Seating Capacity: The number of seats available (e.g., 4, 5, 7 seats).

 Trunk Volume: The size of the trunk (for luggage).

These attributes apply to Passenger Cars specifically and should not be placed at the Motor Vehicle
level because not all motor vehicles, such as motorcycles or buses, have attributes like seating
capacity or trunk volume.

Van

Vans are often used for transporting goods or people in bulk. They have their own set of specialized
attributes that differentiate them from cars and motorcycles.

Additional Attributes for Van:

 Cargo Space: The capacity of the cargo area (in cubic feet).

 Sliding Doors: The presence of sliding doors for easier access to the vehicle.

 Number of Seats: Vans may have varying seating arrangements depending on whether they
are used for goods or passengers.

These attributes should be placed in the Van subclass because they are specific to vans and don't
apply to Motorcycles or Passenger Cars.

Bus

Buses are vehicles that carry large numbers of passengers and have specialized features for
transportation on a larger scale.

Additional Attributes for Bus:

 Passenger Capacity: The number of passengers the bus can carry.

 Number of Floors: For double-decker buses, this attribute is relevant.

 Accessibility Features: Features such as wheelchair ramps or spaces for handicapped
passengers.

These attributes are specific to Buses and are not relevant to Motorcycles, Passenger Cars, or Vans.
They deal with the bus's capacity and its unique design, such as number of floors or accessibility
features.

3. Justification of Attribute Placement:

 At the Motor Vehicle level: Attributes like VIN, Make, Model, Year, Price, Fuel Type, and
Engine Type are shared by all vehicle types. These attributes are essential identifiers for any
vehicle, making them applicable at the Motor Vehicle level.

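 At the Motorcycle level: The attributes Engine Capacity and Type are specific to
motorcycles in this design. The Type values (cruiser, sport, touring) are meaningless for
other vehicles, and Engine Capacity is recorded here because it is how motorcycles are
classified. Placed at the Motor Vehicle level, both would be unused or NULL for most cars,
vans, and buses.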
 At the Passenger Car level: The attributes Number of Doors, Seating Capacity, and Trunk
Volume are specific to cars. While other vehicles may have seating capacity, trunk volume
and the number of doors are only meaningful in the context of passenger cars.

 At the Van level: Attributes like Cargo Space and Sliding Doors are crucial for Vans but don't
apply to motorcycles or cars. These attributes describe aspects of the van’s functionality that
are not relevant for the other vehicle types.

 At the Bus level: Passenger Capacity, Number of Floors, and Accessibility Features are all
attributes specific to buses. These features make buses distinct from other vehicle types,
such as motorcycles, which have a very different structure and purpose.
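
The hierarchy maps naturally onto class inheritance, which gives a quick way to see which attributes each level carries. The sketch below uses Python dataclasses with a reduced attribute list. The class names, field names, and sample values are assumptions made for illustration, and the Van subclass is omitted for brevity.

```python
from dataclasses import dataclass, fields


@dataclass
class MotorVehicle:
    """General entity: attributes shared by every vehicle type."""
    vin: str
    make: str
    model: str
    year: int
    price: float


@dataclass
class Motorcycle(MotorVehicle):
    engine_capacity_cc: int
    motorcycle_type: str


@dataclass
class PassengerCar(MotorVehicle):
    number_of_doors: int
    trunk_volume_litres: float


@dataclass
class Bus(MotorVehicle):
    passenger_capacity: int
    number_of_floors: int


def field_names(cls):
    return [f.name for f in fields(cls)]


# Sample instance (hypothetical values)
bike = Motorcycle("VIN001", "SampleMake", "SampleModel", 2020, 9000.0, 883, "cruiser")

shared = field_names(MotorVehicle)
bike_fields = field_names(Motorcycle)
car_fields = field_names(PassengerCar)
print(bike_fields)
```

Every subclass inherits the shared attributes, while an attribute such as engine_capacity_cc exists only where it was declared, mirroring the placement argued for above.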

1.1 Define the following terms: data, database, DBMS, database system, database catalog, program-
data independence, user view, DBA, end user, canned transaction, deductive database system,
persistent object, meta-data, and transaction-processing application.
Data
 Raw facts or figures that have no context or meaning on their own but can be processed to
produce information. For example, numbers like "23" or names like "Alice."
Database
 An organized collection of related data that is stored electronically in a way that allows for
easy access, management, and updating. For example, a customer database storing names,
addresses, and purchase histories.
DBMS (Database Management System)
 Software that provides an interface for users and applications to interact with the database,
enabling data storage, retrieval, and manipulation while ensuring data integrity and security.
Examples include MySQL, PostgreSQL, and Oracle.
Database System
 A system that consists of the database, the DBMS, and the applications that use the DBMS to
perform various tasks like querying or updating data.
Database Catalog
 A repository that stores metadata about the database, such as schema definitions, tables,
columns, data types, constraints, and user permissions. It is used by the DBMS to manage
the database.
Program-Data Independence
 The ability to modify the database schema without having to change the application
programs that access the database. This is achieved by separating the data structure
(schema) from the application logic.
User View
 A subset of the database or an abstraction tailored to the needs of a particular user or group.
For example, an accountant might only see financial data, while an HR employee sees
employee records.
DBA (Database Administrator)
 A person responsible for managing the database system, including tasks such as designing
the schema, monitoring performance, ensuring data security, and performing backups and
recovery.
End User
 A person who directly interacts with the database through applications or query tools. End
users can be casual users (e.g., employees using a report generator) or sophisticated users
(e.g., analysts writing SQL queries).
Canned Transaction
 Predefined database operations or queries that are repeatedly executed by end users, often
through a user-friendly interface. For example, placing an order in an e-commerce system.
Deductive Database System
 A database system that integrates logic programming (e.g., Prolog) with a database. It can
derive new facts and relationships using inference rules and stored data.
Persistent Object
 An object in an object-oriented database that retains its state across multiple sessions and
exists beyond the runtime of the application that created it. For example, a customer object
in an e-commerce system.
Meta-Data
 Data about the data, describing the structure, organization, and constraints of the data in the
database. Metadata includes table names, column names, data types, and relationships.
Transaction-Processing Application
 An application designed to handle a sequence of database operations (transactions) in a way
that ensures data integrity and consistency, even in the event of failures. Examples include
banking systems, e-commerce systems, and ticket-booking systems.
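
The database catalog and meta-data definitions can be illustrated by querying a real catalog. This is a minimal sketch using Python's standard sqlite3 module, in which SQLite's sqlite_master table and table_info pragma play the role of the catalog. The customer table is a made-up example, and other DBMSs expose their catalogs through different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The catalog lists the schema objects the DBMS knows about.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Per-column meta-data: rows are (cid, name, type, notnull, default, pk).
columns = [
    (row[1], row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(customer)")
]

print(table_names)
print(columns)
```

The table holds no rows yet, but its description is already stored and queryable, which is what makes the database self-describing.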

1.2 What four main types of actions involve databases? Briefly discuss each.
1. Data Definition
 Purpose: Defining the structure and organization of data in the database.
 Description: This involves creating, modifying, and deleting database schemas, such as
defining tables, columns, data types, constraints, and relationships.
 Example: Using SQL to create a table:
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    salary DECIMAL(10, 2)
);
2. Data Manipulation
 Purpose: Handling and modifying the data stored in the database.
 Description: Includes inserting new data, updating existing data, deleting data, and retrieving
data. These actions are performed using a query language like SQL.
 Example:
o Insert Data:
INSERT INTO employees (id, name, salary) VALUES (1, 'Alice', 50000);
o Retrieve Data:
SELECT * FROM employees;
3. Data Querying
 Purpose: Retrieving specific information from the database.
 Description: This involves writing queries to filter, aggregate, or analyze the data. Querying is
one of the most common operations performed by users to get meaningful insights or
reports.
 Example:
Retrieve all employees with a salary greater than $40,000:
SELECT name FROM employees WHERE salary > 40000;
4. Transaction Management
 Purpose: Ensuring data consistency and integrity during multiple operations.
 Description: A transaction is a group of one or more operations performed as a single unit. It
ensures the ACID properties (Atomicity, Consistency, Isolation, Durability) are maintained,
even during system failures.
 Example:
Transfer $100 from one account to another:
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
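
The atomicity of the transfer can be demonstrated with a short script. The following is a minimal sketch using Python's standard sqlite3 module. The starting balances, the transfer function, and the rule that an overdraft aborts the transaction are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one unit; any error undoes both updates."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())


transfer(1, 2, 100)
after_success = balances()

try:
    transfer(1, 2, 1000)  # would overdraw account 1
    failed = False
except ValueError:
    failed = True
after_failure = balances()

print(after_success, failed, after_failure)
```

The failed transfer leaves both balances exactly as they were after the successful one, so neither of its two updates is visible on its own.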

1.3 Discuss the main characteristics of the database approach and how it differs from traditional file
systems.
1. Self-Describing Nature
 Database Approach:
o The database contains not only the data but also metadata (data about the
structure, constraints, and schema).
o Metadata is stored in a catalog and makes the system self-describing.
 File Systems:
o Metadata is embedded in application programs, making it less flexible and harder to
manage.
2. Program-Data Independence
 Database Approach:
o Changes to the database schema (e.g., adding columns to a table) do not require
changes to application programs.
o Achieved through a data abstraction layer provided by the DBMS.
 File Systems:
o Data and programs are tightly coupled. Changes to file structure often require
modifying application logic.
3. Data Abstraction
 Database Approach:
o Data is organized at multiple abstraction levels:
 Physical Level: How data is stored.
 Logical Level: What data is stored and the relationships.
 View Level: How users see the data.
o DBMS hides complexities from users.
 File Systems:
o No abstraction; users and developers deal directly with data at the physical level.
4. Support for Multiple Views of Data
 Database Approach:
o Different users can see different subsets or formats of the data depending on their
needs (e.g., an HR team might see employee details, while finance sees payroll data).
 File Systems:
o Data views are fixed, often requiring separate files or additional logic for different
perspectives.
5. Data Sharing and Multiuser Access
 Database Approach:
o Designed for concurrent access by multiple users while maintaining data consistency.
o Uses transaction management and locking mechanisms.
 File Systems:
o Limited support for concurrent access; often requires manual handling to avoid
inconsistencies.
6. Data Integrity and Security
 Database Approach:
o Enforces data integrity using constraints (e.g., foreign keys, unique constraints).
o Offers robust security mechanisms to control access at different levels.
 File Systems:
o Integrity and security mechanisms must be implemented manually, often resulting in
redundancy and errors.
7. Reduction of Data Redundancy and Inconsistency
 Database Approach:
o Centralized control reduces redundant data storage, ensuring consistency.
 File Systems:
o Separate files may lead to duplicate data, increasing redundancy and potential
inconsistencies.
8. Backup and Recovery
 Database Approach:
o Built-in mechanisms for data backup and recovery in case of failure.
 File Systems:
o Limited or no built-in recovery; backup must be managed manually.
9. Complex Querying and Reporting
 Database Approach:
o Allows complex querying through SQL and advanced reporting tools.
 File Systems:
o Requires custom programming for complex queries, which is time-consuming and
error-prone.
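
Two of these characteristics, multiple views and program-data independence, can be shown together in a short example. This is a minimal sketch using Python's standard sqlite3 module. The employees table, the view names hr_view and payroll_view, and the added hire_date column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department TEXT,
        salary INTEGER
    );
    INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 50000),
                                 (2, 'Bob', 'Sales', 42000);
    -- Each user group sees only the columns it needs.
    CREATE VIEW hr_view AS SELECT id, name, department FROM employees;
    CREATE VIEW payroll_view AS SELECT id, salary FROM employees;
""")


def hr_report():
    """Stands in for an application program written against the HR view."""
    return conn.execute("SELECT name, department FROM hr_view ORDER BY id").fetchall()


before = hr_report()

# Schema change: the application code above is not touched.
conn.execute("ALTER TABLE employees ADD COLUMN hire_date TEXT")

after = hr_report()
payroll = conn.execute("SELECT id, salary FROM payroll_view ORDER BY id").fetchall()
print(before, after, payroll)
```

The report returns the same rows before and after the column is added, whereas a program that parsed a fixed-width file record would need to change with the file layout.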

1.4 What are the responsibilities of the DBA and the database designers?
Responsibilities of the Database Administrator (DBA)
The DBA is responsible for the overall management, maintenance, and security of the database
system. Key responsibilities include:

1. Database Design Implementation


 Collaborates with database designers to implement the physical database structure.
 Ensures that the design aligns with performance and storage requirements.

2. Data Security and Integrity
 Sets up access controls to ensure only authorized users can access the database.
 Implements encryption, user authentication, and role-based access management.
 Ensures that data handling complies with legal requirements and organizational policies.

3. Performance Monitoring and Optimization
 Monitors database performance (e.g., query response time, resource usage).
 Tunes queries, indexes, and database configuration to optimize performance.
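A rough sketch of index tuning, using SQLite's EXPLAIN QUERY PLAN to compare the access path before and after an index is added. The orders table, the index name, and the plan helper are assumptions for this example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"cust{i % 100}", i) for i in range(1000)],
)

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT amount FROM orders WHERE customer = 'cust7'"
plan_before = plan(query)  # no index on customer exists yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = plan(query)   # the planner can now use idx_orders_customer
```

Printing plan_before and plan_after shows whether the query scans the whole table or searches through the index, which is the kind of evidence a DBA uses when deciding which indexes to keep.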

4. Backup and Recovery
 Develops and manages data backup strategies to prevent data loss.
 Implements disaster recovery plans to restore data in case of hardware or software failures.

5. Data Availability and Maintenance
 Ensures database uptime and resolves outages quickly.
 Schedules maintenance tasks like indexing, updates, and archiving.

6. Database Upgrades and Patching
 Applies updates and patches to the database management system (DBMS).
 Ensures that upgrades do not disrupt ongoing operations.

7. User Support
 Assists end-users, developers, and analysts with database-related queries and issues.
 Provides training and documentation on database usage.

Responsibilities of the Database Designers
A database designer focuses on designing the logical and physical structure of the database
system. The designer's responsibilities include:

1. Requirements Analysis
 Works with stakeholders to understand data requirements, workflows, and constraints.
 Identifies relationships between data entities and ensures the database meets business
needs.

2. Data Modeling
 Creates conceptual, logical, and physical data models.
 Defines the structure of tables, relationships, keys (primary, foreign, candidate), and
constraints.

3. Normalization
 Applies normalization techniques to reduce data redundancy and maintain consistency.
 Balances normalization with performance considerations to optimize query efficiency.
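The effect of normalization can be sketched with a small example. The employee_flat, department, and employee tables and their rows are hypothetical; the sketch only shows that a department fact stored once needs a single update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: the department location is repeated on every employee row.
    CREATE TABLE employee_flat (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, dept_location TEXT);
    INSERT INTO employee_flat VALUES
        (1, 'Asha', 'Sales', 'Pune'), (2, 'Ravi', 'Sales', 'Pune'), (3, 'Meera', 'IT', 'Delhi');

    -- Normalized: each department fact is stored exactly once.
    CREATE TABLE department (dept TEXT PRIMARY KEY, dept_location TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT REFERENCES department(dept));
    INSERT INTO department SELECT DISTINCT dept, dept_location FROM employee_flat;
    INSERT INTO employee SELECT emp_id, name, dept FROM employee_flat;
""")

# Relocating Sales now changes one department row instead of every Sales employee row.
with conn:
    changed = conn.execute(
        "UPDATE department SET dept_location = 'Mumbai' WHERE dept = 'Sales'"
    ).rowcount

locations = conn.execute("""
    SELECT e.name, d.dept_location
    FROM employee AS e JOIN department AS d ON d.dept = e.dept
    ORDER BY e.emp_id
""").fetchall()
```

The join needed to read the location back is the performance cost that the designer weighs against the reduced redundancy.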

4. Schema Design
 Designs the database schema, specifying tables, columns, data types, constraints, and
relationships.
 Incorporates indexing strategies for efficient data retrieval.

5. Data Integrity Constraints
 Specifies constraints like primary keys, foreign keys, unique constraints, and checks.
 Ensures that data remains valid and consistent.
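A minimal sketch of how a designer might specify primary key, NOT NULL, and CHECK constraints in the schema, assuming a hypothetical product table and sample rows chosen to trigger each rule:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        sku   TEXT PRIMARY KEY,
        name  TEXT NOT NULL,
        price INTEGER NOT NULL CHECK (price > 0)
    )
""")

candidates = [
    ("A1", "Pen", 10),     # valid
    ("A1", "Pencil", 5),   # duplicate primary key
    ("B2", "Eraser", -3),  # violates CHECK (price > 0)
    ("C3", None, 7),       # violates NOT NULL on name
]

rejected = []
for row in candidates:
    try:
        with conn:
            conn.execute("INSERT INTO product VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)  # the DBMS refused the row; the table is unchanged

stored = conn.execute("SELECT sku, name, price FROM product").fetchall()
```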

6. Prototyping and Validation
 Builds prototypes to validate database design with stakeholders.
 Tests the design using sample data and queries to ensure it meets requirements.

7. Collaboration with DBA and Developers
 Works closely with the DBA to implement the physical database.
 Collaborates with application developers to ensure database compatibility with applications.

1.5
