dbms
3. Concurrency Control
Responsibility: Manage simultaneous access by multiple users to ensure data consistency
and prevent conflicts.
Problems if not discharged:
o Conflicting updates, such as two users modifying the same record simultaneously,
leading to data anomalies.
o Lost updates or overwritten data.
o Deadlocks or system crashes caused by uncoordinated access.
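To make the lost-update problem concrete, here is a minimal sketch in Python using the standard sqlite3 module. The account table, its values, and the two simulated users are illustrative assumptions and not part of the notes above; a real DBMS would prevent the anomaly with locking or another concurrency-control protocol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")

def read_balance():
    """Return the current balance of the (hypothetical) account 1."""
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# Uncoordinated access: both users read the same starting balance (100) ...
seen_by_user_a = read_balance()
seen_by_user_b = read_balance()
# ... then each writes back a value computed from its own stale copy.
conn.execute("UPDATE account SET balance = ? WHERE id = 1", (seen_by_user_a + 50,))
conn.execute("UPDATE account SET balance = ? WHERE id = 1", (seen_by_user_b + 30,))
uncoordinated_balance = read_balance()  # user A's deposit has been overwritten

# Coordinated alternative: let the database apply each change atomically.
conn.execute("UPDATE account SET balance = 100 WHERE id = 1")
conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 1")
conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 1")
coordinated_balance = read_balance()

print(uncoordinated_balance, coordinated_balance)
```

In the first half, the second write silently discards the first user's deposit; in the second half, both deposits survive because each update reads and writes the balance in a single statement.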
4. Security Management
Function: The DBA manages database security to ensure that data is protected from
unauthorized access or breaches.
Explanation: This involves setting up user roles and permissions, enforcing password policies,
and ensuring encryption for sensitive data. The DBA also monitors database access logs for
suspicious activity.
Example: Granting read-only access to certain users while giving full access to
administrators, and implementing encryption for storing credit card information.
Complexity: Simpler to implement but harder to scale | More complex but allows for better separation of concerns
2.1 Consider the foreign key constraint from the dept name attribute of instructor to the department
relation. Give examples of inserts and deletes to these relations, which can cause a violation of the
foreign key constraint.
A foreign key constraint ensures that the value of the attribute in a table (the foreign key)
corresponds to a valid value in another table (the referenced table). In this case, the foreign key
constraint connects the dept_name attribute in the instructor table to the department relation
(table), ensuring that each instructor is assigned to a valid department.
instructor table:
o instructor_id
o name
o dept_name (foreign key referencing department)
department table:
o dept_name (primary key)
o dept_head
o budget
The foreign key constraint on instructor.dept_name ensures that every value in instructor.dept_name
must exist as a valid dept_name in the department table.
When inserting data into the instructor table, a violation occurs if we try to insert a record with a
dept_name that does not exist in the department table.
Example: Suppose we insert an instructor whose dept_name is 'Mechanical Engineering'.
Violation: The dept_name 'Mechanical Engineering' does not exist in the department table, so this
insert would violate the foreign key constraint.
When deleting data from the department table, a violation occurs if we try to delete a department
that is referenced by one or more instructors in the instructor table.
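Example: Suppose the instructor table contains a row for Dr. Alice whose dept_name is
'Computer Science', and we try to delete the Computer Science row from the department table.
Violation: Since the Computer Science department is referenced by Dr. Alice in the instructor table,
deleting the department would violate the foreign key constraint.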
To prevent violations, you can define referential actions on the foreign key constraint, such as:
1. ON DELETE CASCADE: Deletes all related records in the instructor table if the department is
deleted.
2. ON DELETE SET NULL: Sets the dept_name in the instructor table to NULL if the related
department is deleted.
3. ON UPDATE CASCADE: Updates the dept_name in the instructor table if the department
name in the department table is updated.
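For example, the foreign key clause in the definition of the instructor table could be written as:
FOREIGN KEY (dept_name) REFERENCES department (dept_name)
ON DELETE CASCADE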
This would automatically delete all instructors assigned to a department if the department is
deleted, preventing a foreign key violation.
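The two violations described above can be reproduced with a short sketch using Python's standard sqlite3 module. The column types, the budget value, and the instructor 'Dr. Bob' are assumptions made for illustration; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget REAL)")
conn.execute("""
    CREATE TABLE instructor (
        instructor_id INTEGER PRIMARY KEY,
        name TEXT,
        dept_name TEXT REFERENCES department (dept_name)
    )
""")
conn.execute("INSERT INTO department VALUES ('Computer Science', 100000)")
conn.execute("INSERT INTO instructor VALUES (1, 'Dr. Alice', 'Computer Science')")

def violates(sql):
    """Return True if running the statement raises an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Insert into instructor with a department that does not exist (hypothetical row).
insert_rejected = violates(
    "INSERT INTO instructor VALUES (2, 'Dr. Bob', 'Mechanical Engineering')"
)
# Delete a department that an instructor still references.
delete_rejected = violates(
    "DELETE FROM department WHERE dept_name = 'Computer Science'"
)
print(insert_rejected, delete_rejected)
```

Because no referential action is declared here, the delete is simply rejected; declaring ON DELETE CASCADE on the foreign key would instead remove Dr. Alice's row together with the department.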
2.2 Consider the time slot relation. Given that a particular time slot can meet more than once in a
week, explain why day and start time are part of the primary key of this relation, while end time is
not.
In a time slot relation (e.g., a table that stores information about class schedules or meetings), the
primary key typically uniquely identifies each record in the relation. Given the scenario where a time
slot can meet more than once in a week, we need to understand why certain attributes like day and
start time are part of the primary key, while end time is not.
course_id: The course or event ID that is scheduled during this time slot.
The primary key of this relation should uniquely identify each time slot. Here’s why day and start
time are part of the primary key, while end time is not:
o Each time slot has an identifier (time_slot_id in the textbook's university schema), but the
identifier alone cannot be the key: a slot that meets more than once a week has one tuple per
meeting, and all of those tuples share the same identifier.
o Day and start time distinguish the individual meetings of a slot. A slot may meet on several
days, and may even meet more than once on the same day, so both attributes are needed
alongside the identifier.
o For example, the same slot could meet on Monday at 10:00 AM, on Wednesday at 10:00 AM,
and again on Monday at 2:00 PM. These are three tuples that differ only in day or start time.
o End time is not part of the primary key because it is determined by the other attributes: a
meeting of a given slot that starts at a particular time on a particular day cannot end at
more than one time.
o Adding end time would also break minimality. The key would still be unique, but it would
contain an attribute that can be removed without losing uniqueness, and it would wrongly
allow two tuples for the same slot, day, and start time with different end times.
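As a minimal sketch of this key choice, the following Python snippet (standard sqlite3 module) declares the primary key on the slot identifier, day, and start time. The table layout follows the textbook's time_slot relation; the slot 'A' and its meeting times are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE time_slot (
        time_slot_id TEXT,
        day          TEXT,
        start_time   TEXT,
        end_time     TEXT,
        PRIMARY KEY (time_slot_id, day, start_time)
    )
""")
# One slot meeting three times a week: the rows differ only in day or start time.
conn.executemany(
    "INSERT INTO time_slot VALUES (?, ?, ?, ?)",
    [
        ("A", "Mon", "10:00", "10:50"),
        ("A", "Wed", "10:00", "10:50"),
        ("A", "Mon", "14:00", "14:50"),
    ],
)
meetings = conn.execute(
    "SELECT COUNT(*) FROM time_slot WHERE time_slot_id = 'A'"
).fetchone()[0]

# The same slot, day, and start time with a second end time is rejected,
# which is exactly why end_time does not need to be in the key.
try:
    conn.execute("INSERT INTO time_slot VALUES ('A', 'Mon', '10:00', '11:50')")
    conflicting_end_accepted = True
except sqlite3.IntegrityError:
    conflicting_end_accepted = False

print(meetings, conflicting_end_accepted)
```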
2.3 In the instance of instructor, no two instructors have the same name. From this, can we conclude
that name can be used as a super key (or primary key) of instructor?
No, we cannot conclude that name can be used as a superkey (or primary key) of the instructor
table, even if no two instructors have the same name. Here's why:
1. Superkey Definition:
A superkey is a set of one or more attributes (columns) that can uniquely identify every tuple (row)
in a relation (table). A primary key is a minimal superkey, meaning it's a superkey with no
unnecessary attributes (i.e., it cannot be reduced further without losing its ability to uniquely identify
records).
2. Why Name Can't Be Used as a Superkey:
A key is a property of the relation schema, not of one particular instance. Declaring a set of
attributes to be a superkey asserts that no legal instance of the relation, now or in the future, may
contain two tuples that agree on those attributes. A single instance can show that an attribute is
not a superkey (by containing a duplicate), but it can never prove that it is one:
Uniqueness may be a coincidence: Nothing in the real world prevents two instructors from
sharing a name. There could be two instructors named "John Doe" in different departments,
even if no such pair exists in the current dataset. Once such an instructor is hired, name no
longer identifies a single tuple.
Minimality is not the issue: A single attribute is trivially minimal, so name fails only on the
uniqueness requirement, which must hold for every possible instance. An attribute such as
instructor_id is a better choice because the enterprise assigns it precisely so that it is
unique for every instructor.
3. Practical Example:
While the name "John Doe" might be unique in the current data, a second instructor with the
same name could join a different department later. Therefore, name alone cannot
guarantee uniqueness across all possible records.
A better candidate for a primary key is instructor_id, which guarantees uniqueness regardless of the
instructor's name.
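The difference between "unique in this instance" and "unique in every legal instance" can be illustrated with a small Python sketch. The rows and the helper is_unique_in_instance are hypothetical and exist only for this illustration.

```python
def is_unique_in_instance(rows, attr):
    """Check whether attr happens to be unique in this particular instance."""
    values = [row[attr] for row in rows]
    return len(values) == len(set(values))

# A hypothetical instance in which no two instructors share a name.
instructor = [
    {"instructor_id": 1, "name": "John Doe", "dept_name": "Physics"},
    {"instructor_id": 2, "name": "Jane Roe", "dept_name": "History"},
]
name_unique_before = is_unique_in_instance(instructor, "name")

# A perfectly legal future insert: another John Doe joins a different department.
instructor.append({"instructor_id": 3, "name": "John Doe", "dept_name": "Biology"})
name_unique_after = is_unique_in_instance(instructor, "name")
id_unique_after = is_unique_in_instance(instructor, "instructor_id")

print(name_unique_before, name_unique_after, id_unique_after)
```

The check on name passes for the first instance and fails after the insert, while instructor_id stays unique, which is why only the latter is a sound key.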
2.4 What is the result of first performing the cross product of student and advisor, and then
performing a selection operation on the result with the predicate s_id = ID? (Using the symbolic
notation of relational algebra, this query can be written as σ_{s_id = ID}(student × advisor).)
Let's break down the query step by step, using relational algebra operations. The query is:
σ_{s_id = ID}(student × advisor)
This means we first form the cross product of the student and advisor relations and then apply a
selection that keeps only the tuples satisfying s_id = ID. In the university schema, ID is the student's
identifier in student(ID, name, dept_name, tot_cred), and s_id is the student identifier stored in
advisor(s_id, i_id), where i_id identifies the advising instructor.
Step-by-Step Explanation:
1. Cross product (×):
o The result has every attribute of both relations:
(ID, name, dept_name, tot_cred, s_id, i_id).
o The number of resulting tuples will be the product of the number of tuples in the
student relation and the number of tuples in the advisor relation.
2. Selection (σ):
o The selection keeps only the tuples where s_id = ID, that is, where the advisor tuple
refers to the same student as the student tuple it was paired with.
o After the selection, each remaining tuple pairs a student with that student's own
advisor tuple, so i_id gives the ID of the instructor who advises the student.
Result:
After the cross product, every student is paired with every advisor tuple; the selection
discards the meaningless pairs. The result contains one tuple for each student-advisor
pairing recorded in advisor, and students without an advisor do not appear.
o Since s_id = ID in every resulting tuple, one of the two columns is redundant, and the
attributes could be simplified to (ID, name, dept_name, tot_cred, i_id).
The query is therefore equivalent to the join of student and advisor on the condition s_id = ID.
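The two steps can be traced on a tiny example. This is a sketch in plain Python with relations as lists of tuples; student is cut down to (ID, name) and the sample values are assumptions chosen for illustration.

```python
from itertools import product

# student(ID, name), reduced to two attributes for brevity
student = [("12345", "Shankar"), ("00128", "Zhang"), ("70557", "Snow")]
# advisor(s_id, i_id): s_id is a student ID, i_id an instructor ID
advisor = [("12345", "10101"), ("00128", "45565")]

# Step 1: cross product pairs every student with every advisor tuple.
cross = list(product(student, advisor))

# Step 2: selection keeps only the pairs where s_id = ID.
selected = [s + a for s, a in cross if a[0] == s[0]]

print(len(cross))
for row in selected:
    print(row)
```

The cross product has 3 × 2 = 6 tuples, but only the two students who have an advisor survive the selection; Snow has no advisor tuple and disappears from the result.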
2.5
2.6 employee (person name, street, city)
b. Find the names of all employees whose salary is greater than $100,000.
c. Find the names of all employees who live in “Miami” and whose
Consider the bank database above. Give an expression in the relational algebra for each of
the following queries.
b. Find the names of all borrowers who have a loan in branch “Downtown”.
2.8 For the database given above: a. What are the appropriate primary keys? b. Given your
choice of primary keys, identify appropriate foreign keys.
A primary key uniquely identifies each record in a table. Here’s the identification of the
primary keys for each table:
Reason: The branch_name uniquely identifies each branch in the bank. Each
branch has a unique name, and there’s no need for multiple attributes to
uniquely identify it.
Reason: Assuming each customer has a unique name in the system (or can
be uniquely identified by name), this would be the primary key. In real-world
systems, it might be more practical to have a customer ID, but here we
assume the name is unique.
4. borrower(customer_name, loan_number)
6. depositor(customer_name, account_number)
Foreign keys establish relationships between tables by referencing primary keys in other
tables. Here’s how we can define the foreign keys:
References: branch(branch_name)
2. borrower(customer_name, loan_number)
References: customer(customer_name)
References: loan(loan_number)
References: branch(branch_name)
References: customer(customer_name)
References: account(account_number)
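The primary and foreign keys listed above can be written down as table definitions. The following is a sketch using Python's standard sqlite3 module; the non-key columns (customer_street, customer_city, amount, balance) and all column types are assumptions based on the usual textbook bank schema, and the customer 'Smith' with loan 'L-17' is a made-up row used only to show the constraint firing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE branch (
    branch_name TEXT PRIMARY KEY, branch_city TEXT, assets REAL);
CREATE TABLE customer (
    customer_name TEXT PRIMARY KEY, customer_street TEXT, customer_city TEXT);
CREATE TABLE loan (
    loan_number TEXT PRIMARY KEY,
    branch_name TEXT REFERENCES branch (branch_name),
    amount REAL);
CREATE TABLE borrower (
    customer_name TEXT REFERENCES customer (customer_name),
    loan_number   TEXT REFERENCES loan (loan_number),
    PRIMARY KEY (customer_name, loan_number));
CREATE TABLE account (
    account_number TEXT PRIMARY KEY,
    branch_name TEXT REFERENCES branch (branch_name),
    balance REAL);
CREATE TABLE depositor (
    customer_name  TEXT REFERENCES customer (customer_name),
    account_number TEXT REFERENCES account (account_number),
    PRIMARY KEY (customer_name, account_number));
""")

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)

# A borrower row that references a missing customer and loan is rejected.
try:
    conn.execute("INSERT INTO borrower VALUES ('Smith', 'L-17')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(tables, orphan_rejected)
```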
2.9 Describe the differences in meaning between the terms relation and relation schema.
Example: Relation: a table with actual rows and values. Relation schema: a table definition, listing
attributes and types.
a. Find the names of all employees who work for “First Bank Corporation”.
b. Find the names and cities of residence of all employees who work for
c. Find the names, street address, and cities of residence of all employees
who work for “First Bank Corporation” and earn more than $10,000.
2.11 branch (branch name, branch city, assets)
Consider the bank database of Figure 2.15. Give an expression in the relational algebra for
each of the following queries:
a. Find all loan numbers with a loan value greater than $10,000.
b. Find the names of all depositors who have an account with a value
c. Find the names of all depositors who have an account with a value
Given the above relation, and our university schema, write each of the following queries in SQL. You
can assume for simplicity that no takes tuple has the null value for grade.
a. Find the total grade-points earned by the student with ID 12345, across all courses taken by the
student.
b. Find the grade-point average (GPA) for the above student, that is, the total grade-points divided by
the total credits for the associated courses.
FROM takes
JOIN grade_points
ON takes.grade = grade_points.grade
b. SELECT
JOIN grade_points
ON takes.grade = grade_points.grade
c. SELECT
takes.ID,
FROM takes
JOIN grade_points
ON takes.grade = grade_points.grade
GROUP BY takes.ID;
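The three grade-point queries can be tried end to end with the sketch below, written in Python with the standard sqlite3 module. It assumes a relation grade_points(grade, points) and a course relation with a credits column, as in the textbook's university schema; takes is reduced to three columns and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_id TEXT PRIMARY KEY, credits INTEGER);
CREATE TABLE grade_points (grade TEXT PRIMARY KEY, points REAL);
CREATE TABLE takes (ID TEXT, course_id TEXT, grade TEXT);
INSERT INTO course VALUES ('CS-101', 4), ('CS-190', 3);
INSERT INTO grade_points VALUES ('A', 4.0), ('B', 3.0);
INSERT INTO takes VALUES
    ('12345', 'CS-101', 'A'), ('12345', 'CS-190', 'B'), ('54321', 'CS-101', 'B');
""")

# Shared FROM clause: each takes row is matched with its grade points and credits.
JOINED = """
FROM takes
JOIN grade_points ON takes.grade = grade_points.grade
JOIN course ON takes.course_id = course.course_id
"""

# a. Total grade points for student 12345.
total_points = conn.execute(
    "SELECT SUM(course.credits * grade_points.points)" + JOINED
    + "WHERE takes.ID = '12345'"
).fetchone()[0]

# b. GPA for the same student: total grade points divided by total credits.
gpa = conn.execute(
    "SELECT SUM(course.credits * grade_points.points) / SUM(course.credits)" + JOINED
    + "WHERE takes.ID = '12345'"
).fetchone()[0]

# c. ID and GPA of every student.
gpa_by_student = conn.execute(
    "SELECT takes.ID, SUM(course.credits * grade_points.points) / SUM(course.credits)"
    + JOINED + "GROUP BY takes.ID ORDER BY takes.ID"
).fetchall()

print(total_points, gpa, gpa_by_student)
```

Because points is stored as a real number, the division in (b) and (c) is a real division rather than an integer one.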
are underlined. Construct the following SQL queries for this relational
database.
a. Find the total number of people who owned cars that were involved
in accidents in 2009.
b. Add a new accident to the database; assume any values for required
attributes.
a. Find total number of people who owned cars involved in accidents in 2009:
FROM owns
WHERE driver_id = (SELECT driver_id FROM person WHERE name = 'John Smith')
AND license = (SELECT license FROM car WHERE model = 'Mazda' AND driver_id = (SELECT driver_id
FROM person WHERE name = 'John Smith'));
3.3 Suppose that we have a relation marks (ID, score) and we wish to assign grades to students based
on the score as follows: grade F if score < 40, grade C if 40 ≤ score < 60, grade B if 60 ≤ score < 80,
and grade A if 80 ≤ score. Write SQL queries to do the following:
a. Display the grade for each student, based on the marks relation.
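SELECT ID,
score,
CASE
WHEN score < 40 THEN 'F'
WHEN score < 60 THEN 'C'
WHEN score < 80 THEN 'B'
ELSE 'A'
END AS grade
FROM marks;
b. Count the number of students with each grade:
SELECT
CASE
WHEN score < 40 THEN 'F'
WHEN score < 60 THEN 'C'
WHEN score < 80 THEN 'B'
ELSE 'A'
END AS grade,
COUNT(*) AS num_students
FROM marks
GROUP BY grade;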
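select distinct p.a1
from p, r1, r2
where p.a1 = r1.a1 or p.a1 = r2.a1
Under what conditions does the preceding query select values of p.a1 that are either in r1 or in r2?
Examine carefully the cases where one of r1 or r2 may be empty.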
1. When both r1 and r2 are non-empty:
o The from clause forms the Cartesian product p × r1 × r2, and the where clause keeps
the combinations in which p.a1 matches r1.a1 or p.a1 matches r2.a1. Every value of
p.a1 that appears in r1 or in r2 is therefore included in the result.
2. When r1 and r2 both have matching rows and there is overlap between them (i.e., some a1
values exist in both r1 and r2):
o The DISTINCT keyword ensures that the result set does not have duplicate p.a1
values, so even if p.a1 matches both r1.a1 and r2.a1, it will appear only once in the
output.
3. When r1 is Empty:
o The Cartesian product p × r1 × r2 contains no tuples at all, because there is no r1
tuple to combine with. The where clause is never evaluated on any row, so the result
is empty even if some values of p.a1 appear in r2.
4. When r2 is Empty:
o The situation is symmetric: the Cartesian product is empty, so the result is empty
even if some values of p.a1 appear in r1.
5. When both r1 and r2 are Empty (or p is Empty):
o The result will be empty because the Cartesian product contains no tuples.
In summary, the query selects the values of p.a1 that are in r1 or in r2 only when both r1 and r2 are
non-empty. If either relation is empty, the result is empty regardless of the contents of the other.
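The effect of an empty relation in the from clause is easy to check. The sketch below uses Python's standard sqlite3 module; the helper query_result and the integer sample values are assumptions made for this illustration.

```python
import sqlite3

def query_result(p, r1, r2):
    """Load three single-column relations and run the query from the exercise."""
    conn = sqlite3.connect(":memory:")
    for name, values in (("p", p), ("r1", r1), ("r2", r2)):
        conn.execute(f"CREATE TABLE {name} (a1 INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    rows = conn.execute("""
        SELECT DISTINCT p.a1
        FROM p, r1, r2
        WHERE p.a1 = r1.a1 OR p.a1 = r2.a1
        ORDER BY p.a1
    """).fetchall()
    conn.close()
    return [row[0] for row in rows]

both_non_empty = query_result(p=[1, 2, 3], r1=[1], r2=[2])
r1_empty = query_result(p=[1, 2, 3], r1=[], r2=[2])
r2_empty = query_result(p=[1, 2, 3], r1=[1], r2=[])

print(both_non_empty, r1_empty, r2_empty)
```

With both relations populated the query returns the values found in r1 or r2; as soon as one of them is empty the result is empty, even though the other relation still contains a matching value.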
Consider the bank database above, where the primary keys are underlined. Construct the following
SQL queries for this relational database.
a. Find all customers of the bank who have an account but not a loan.
b. Find the names of all customers who live on the same street and in
c. Find the names of all branches with customers who have an account
SELECT c.customer_name
FROM customer c
b. Find all customers who live on the same street and city as "Smith":
SELECT c1.customer_name
c. Find the names of all branches with customers who have an account and live in "Harrison":
Consider the employee database of Figure 3.20, where the primary keys are
a. Find the names and cities of residence of all employees who work for
b. Find the names, street addresses, and cities of residence of all employees who work for “First Bank
Corporation” and earn more than
$10,000.
c. Find all employees in the database who do not work for “First Bank
Corporation”.
d. Find all employees in the database who earn more than each employee
e. Assume that the companies may be located in several cities. Find all
is located.
unless the salary becomes greater than $100,000; in such cases, give
a. Find names and cities of employees working for "First Bank Corporation":
SELECT e.employee_name, e.city
FROM employee e
b. Find names, street addresses, and cities of employees working for "First Bank Corporation" and
earning more than $10,000:
FROM employee e
SELECT e.employee_name
FROM employee e
d. Find employees earning more than each employee of "Small Bank Corporation":
SELECT e.employee_name
FROM employee e
FROM works w2
e. Find all companies located in every city where "Small Bank Corporation" is located:
SELECT c.company_name
FROM company c
FROM company c2
FROM company c3
SELECT w.company_name
FROM works w
GROUP BY w.company_name
LIMIT 1;
g. Find companies with higher average salary than "First Bank Corporation":
SELECT w.company_name
FROM works w
GROUP BY w.company_name
FROM works w2
UPDATE employee
i. Give managers of "First Bank Corporation" a 10% raise unless salary exceeds $100,000:
UPDATE works
END
FROM manages m
a. Find the names of all employees who work for “First Bank Corporation”.
b. Find all employees in the database who live in the same cities as the
companies for which they work.
c. Find all employees in the database who live in the same cities and on
d. Find all employees who earn more than the average salary of all
h. Delete all tuples in the works relation for employees of “Small Bank
Corporation”.
a. Find the names of all employees who work for "First Bank Corporation":
SELECT employee_name
FROM works
b. Find all employees who live in the same cities as the companies they work for:
SELECT e.employee_name
FROM employee e
c. Find all employees who live in the same cities and on the same streets as their managers:
SELECT e.employee_name
FROM employee e
d. Find employees who earn more than the average salary at their company:
SELECT e.employee_name
FROM works w
FROM works
SELECT company_name
FROM works
GROUP BY company_name
LIMIT 1;
UPDATE works
UPDATE works
FROM manages m
h. Delete all tuples for employees of "Small Bank Corporation" from the works table:
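select course_id, semester, year, sec_id, avg (tot_cred)
from takes natural join student
where year = 2009
group by course_id, semester, year, sec_id
having count (ID) >= 2;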
Explain why joining section as well in the from clause would not change the result.
In the given SQL query, you are performing the following operations:
select course_id, semester, year, sec_id, avg(tot_cred): You are selecting the course ID,
semester, year, section ID, and the average total credits (tot_cred).
from takes natural join student: You are joining the takes and student tables using a natural
join. This join matches rows on the columns that have the same name in both tables; for
takes and student the only such column is ID, the student identifier.
where year = 2009: You filter the rows to only include those for the year 2009.
group by course_id, semester, year, sec_id: You group the results by course_id, semester,
year, and sec_id.
having count(ID) >= 2: You only include groups that have at least two students (ID represents
the student identifier).
The query uses only attributes of the takes and student tables. Adding section to the from clause
would natural-join it on the attributes it shares with takes: course_id, sec_id, semester, and year.
1. Existing Relationship: Those four attributes together form the primary key of section, and
takes has a foreign key on the same four attributes referencing section. (sec_id alone is not
unique; it only distinguishes sections of the same course in the same semester and year.)
Every takes tuple therefore matches exactly one section tuple.
2. Redundant Information: Because each row matches exactly one section, the join neither
removes rows nor duplicates them. Each group contains the same students as before, so
count(ID) and avg(tot_cred) are unchanged.
3. No Additional Filtering: The additional attributes of section (such as building,
room_number, and time_slot_id) are not referenced in the select, where, group by, or
having clauses. The WHERE clause filters on year, which is already available in takes.
4. Logical Equivalence: Since the join with section preserves every row exactly once and
contributes no attribute that the query uses, the query with section in the FROM clause
returns the same result as the query without it.
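The equivalence can be checked on sample data. The sketch below uses Python's standard sqlite3 module; the tables are reduced versions of the university schema, and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, tot_cred INTEGER);
CREATE TABLE section (
    course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER, building TEXT,
    PRIMARY KEY (course_id, sec_id, semester, year));
CREATE TABLE takes (
    ID TEXT, course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER, grade TEXT,
    PRIMARY KEY (ID, course_id, sec_id, semester, year),
    FOREIGN KEY (course_id, sec_id, semester, year)
        REFERENCES section (course_id, sec_id, semester, year));
INSERT INTO student VALUES
    ('1', 'Ann', 'Comp. Sci.', 30), ('2', 'Bo', 'Comp. Sci.', 50), ('3', 'Cy', 'Physics', 10);
INSERT INTO section VALUES
    ('CS-101', '1', 'Fall', 2009, 'Taylor'), ('CS-190', '1', 'Spring', 2009, 'Taylor');
INSERT INTO takes VALUES
    ('1', 'CS-101', '1', 'Fall', 2009, 'A'),
    ('2', 'CS-101', '1', 'Fall', 2009, 'B'),
    ('3', 'CS-190', '1', 'Spring', 2009, 'A');
""")

# The query from the exercise, with the from clause left as a placeholder.
QUERY = """
SELECT course_id, semester, year, sec_id, AVG(tot_cred)
FROM {tables}
WHERE year = 2009
GROUP BY course_id, semester, year, sec_id
HAVING COUNT(ID) >= 2
"""

without_section = conn.execute(
    QUERY.format(tables="takes NATURAL JOIN student")
).fetchall()
with_section = conn.execute(
    QUERY.format(tables="takes NATURAL JOIN student NATURAL JOIN section")
).fetchall()

print(without_section)
print(with_section)
```

Both variants return the single section that has at least two students; joining section adds nothing because every takes row already has exactly one matching section row.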
a. Display a list of all instructors, showing their ID, name, and the number of sections that they have
taught. Make sure to show the number of sections as 0 for instructors who have not taught any
section. Your query should use an outer join, and should not use scalar subqueries.
b. Write the same query as above, but using a scalar subquery, without outer join.
c. Display the list of all course sections offered in Spring 2010, along with the names of the
instructors teaching the section. If a section has more than one instructor, it should appear as many
times in the result as it has instructors. If it does not have any instructor, it should still appear in the
result with the instructor name set to “—”.
d. Display the list of all departments, with the total number of instructors in each department,
without using scalar subqueries. Make sure to correctly handle departments with no instructors.
FROM instructor i
(SELECT COUNT(*)
FROM teaches t
FROM instructor i;
FROM section s
LEFT OUTER JOIN teaches t ON s.course_id = t.course_id AND s.sec_id = t.sec_id AND s.semester = t.semester AND s.year = t.year
FROM department d
GROUP BY d.dept_name;
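For parts (a) and (b), the outer-join and scalar-subquery formulations can be compared side by side. This is a sketch in Python using the standard sqlite3 module; the tables are reduced to the columns the queries need, and the two instructors and their sections are sample data chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT);
CREATE TABLE teaches (ID TEXT, course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER);
INSERT INTO instructor VALUES ('10101', 'Srinivasan'), ('33456', 'Gold');
INSERT INTO teaches VALUES
    ('10101', 'CS-101', '1', 'Fall', 2009),
    ('10101', 'CS-315', '1', 'Spring', 2010);
""")

# a. Outer join: COUNT over a teaches column ignores the NULLs that pad
#    instructors without sections, so those instructors get 0.
with_outer_join = conn.execute("""
    SELECT i.ID, i.name, COUNT(t.sec_id) AS num_sections
    FROM instructor i
    LEFT OUTER JOIN teaches t ON i.ID = t.ID
    GROUP BY i.ID, i.name
    ORDER BY i.ID
""").fetchall()

# b. Scalar subquery: count the matching teaches rows for each instructor.
with_scalar_subquery = conn.execute("""
    SELECT i.ID, i.name,
           (SELECT COUNT(*) FROM teaches t WHERE t.ID = i.ID) AS num_sections
    FROM instructor i
    ORDER BY i.ID
""").fetchall()

print(with_outer_join)
print(with_scalar_subquery)
```

Using COUNT(*) in the outer-join version would report 1 instead of 0 for an instructor with no sections, because the padded row itself is counted.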
4.2 Outer join expressions can be computed in SQL without using the SQL outer join operation. To
illustrate this fact, show how to rewrite each of the following SQL queries without using the outer
join expression.
FROM student
FROM student
UNION
FROM takes
);
);
employee_name VARCHAR(100),
company_name VARCHAR(100),
salary DECIMAL(15, 2) NOT NULL,
);
employee_name VARCHAR(100),
manager_name VARCHAR(100),
);
4.4 SQL provides an n-ary operation called coalesce, which is defined as follows: coalesce (A1, A2, ...,
An) returns the first non-null Ai in the list A1, A2, ..., An, and returns null if all of A1, A2, ..., An are
null.
Let a and b be relations with the schemas A (name, address, title), and B (name, address, salary),
respectively. Show how to express a natural full outer join b using the full outer-join operation with
an on condition and the coalesce operation. Make sure that the result relation does not contain two
copies of the attributes name and address, and that the solution is correct even if some tuples in a
and b have null values for attributes name or address.
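SELECT
coalesce(a.name, b.name) AS name,
coalesce(a.address, b.address) AS address,
a.title,
b.salary
FROM
a
FULL OUTER JOIN
b
ON
a.name = b.name AND a.address = b.address;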
select * from student natural full outer join takes natural full outer join course
Tuples with null values for the title attribute in the result can occur under the following
circumstances:
o If a student exists in the student relation but does not have a corresponding entry in
the takes relation, the join will include that student's information. However, since
there is no corresponding course entry, attributes from course, such as title, will be
NULL.
o If a course exists in the course relation but does not have any student enrolled (i.e.,
no corresponding entry in takes), the join will include that course's information.
However, attributes from the student relation will be NULL, and the title will still be
present because it comes from course.
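o If a tuple in takes references a course_id that does not exist in the course table, the title
for that row will be NULL. With the foreign key from takes to course enforced this cannot
happen; it could only arise if a course were removed from the course table while still
referenced in takes.
o Because the join is natural, the last join also matches on dept_name, which appears in
both student and course. A student who takes a course offered by a department other
than their own therefore fails to match the course tuple, and the row carries a NULL title.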
o Since a FULL OUTER JOIN is used, all tuples from all relations (student, takes, course)
are included, even if they do not match with corresponding tuples in the other
relations. The unmatched attributes will contain NULL values.
Outer Join (Left, Right, Full): Includes non-matching rows from one or both relations, with
nulls for missing attributes.
Cartesian Product (×): Produces all possible combinations of tuples from two relations.
Natural Join (⨝): Combines tuples with common attribute values, automatically matching
attributes with the same name.
Outer Joins (⟕, ⟖, ⟗): Variants of join that include unmatched tuples with null values for
missing attributes.
3. Translation Rules
Below are specific examples of how to translate SQL join expressions into relational algebra:
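a. Inner Join
SQL:
SELECT *
FROM R INNER JOIN S ON R.a = S.b;
Relational Algebra:
σ_{R.a = S.b}(R × S)
or, equivalently, as a theta-join:
R ⋈_{R.a = S.b} S
b. Natural Join
SQL:
SELECT *
FROM R NATURAL JOIN S;
Relational Algebra:
R ⋈ S
A natural join automatically matches tuples where all common attributes have the same
value.
c. Cross Join
SQL:
SELECT *
FROM R CROSS JOIN S;
Relational Algebra:
R × S
d. Left Outer Join
SQL:
SELECT *
FROM R LEFT OUTER JOIN S ON R.a = S.b;
Relational Algebra:
(R ⋈_{R.a = S.b} S) ∪ ((R − π_R(R ⋈_{R.a = S.b} S)) × {(null, ..., null)})
Add tuples from R that do not match in S, with nulls for S's attributes. Here π_R denotes
projection onto the attributes of R.
e. Right Outer Join
SQL:
SELECT *
FROM R RIGHT OUTER JOIN S ON R.a = S.b;
Relational Algebra:
(R ⋈_{R.a = S.b} S) ∪ ({(null, ..., null)} × (S − π_S(R ⋈_{R.a = S.b} S)))
Add tuples from S that do not match in R, with nulls for R's attributes.
f. Full Outer Join
SQL:
SELECT *
FROM R FULL OUTER JOIN S ON R.a = S.b;
Relational Algebra:
(R ⋈_{R.a = S.b} S) ∪ ((R − π_R(R ⋈_{R.a = S.b} S)) × {(null, ..., null)}) ∪ ({(null, ..., null)} × (S − π_S(R ⋈_{R.a = S.b} S)))
Combines left and right outer join results to include all unmatched tuples from both R and S.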
g. Self Join
SQL:
SELECT *
FROM R AS R1 INNER JOIN R AS R2 ON R1.a = R2.b;
Relational Algebra:
σ_{R1.a = R2.b}(ρ_{R1}(R) × ρ_{R2}(R))
Rename R to two instances (R1 and R2) with the rename operator ρ, then apply the same inner join logic.
4. Key Considerations
Common Attributes: Natural joins automatically match all common attributes. For explicit
conditions (e.g., R.a = S.b), use a theta-join (⨝ with a condition).
Outer Joins: These are extensions of the join operation (natural or theta) that include unmatched
tuples from one or both sides, padded with nulls.
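The way a left outer join is built from a join plus the unmatched tuples padded with nulls can be sketched directly in Python. Relations are lists of tuples, None stands in for null, and the function names theta_join and left_outer_join as well as the sample relations are hypothetical.

```python
def theta_join(r, s, pred):
    """All concatenated pairs (x, y) from r and s that satisfy the predicate."""
    return [x + y for x in r for y in s if pred(x, y)]

def left_outer_join(r, s, pred, s_arity):
    """Theta-join plus every unmatched tuple of r padded with nulls (None)."""
    matched = theta_join(r, s, pred)
    unmatched = [x for x in r if not any(pred(x, y) for y in s)]
    padding = (None,) * s_arity
    return matched + [x + padding for x in unmatched]

# Sample relations R(a, c) and S(b, d); the join condition is R.a = S.b.
R = [(1, "x"), (2, "y")]
S = [(1, "p")]

def same_key(x, y):
    return x[0] == y[0]

inner = theta_join(R, S, same_key)
left = left_outer_join(R, S, same_key, s_arity=2)

print(inner)
print(left)
```

The inner join drops the R tuple with a = 2, while the left outer join keeps it and fills the S columns with None.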
6.10 Write the following queries in relational algebra, using the university schema.
a. Find the names of all students who have taken at least one Comp. Sci. course.
b. Find the IDs and names of all students who have not taken any course offering before Spring 2009.
c. For each department, find the maximum salary of instructors in that department. You may assume
that every department has at least one instructor.
d. Find the lowest, across all departments, of the per-department maximum salary computed by the
preceding query.
University Schema:
a. Find the names of members who have borrowed any book published by “McGraw-Hill”.
b. Find the name of members who have borrowed all books published by “McGraw-Hill”.
c. Find the name and membership number of members who have borrowed more than five different
books published by “McGraw-Hill”.
d. For each publisher, find the name and membership number of members who have borrowed more
than five books of that publisher.
e. Find the average number of books borrowed per member. Take into account that if a member does
not borrow any books, then that member does not appear in the borrowed relation at all.
6.13 Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations.
relational-calculus expressions:
d. {< a > | ∃ c (< a, c > ∈ s ∧ ∃ b1, b2 (< a, b1 > ∈ r ∧ < c, b2 > ∈ r ∧ b1 > b2))}
7.14 Explain the distinctions among the terms primary key, candidate key, and
superkey.
In relational database theory, primary key, candidate key, and superkey are all related to the
concept of identifying tuples (rows) uniquely in a relation (table). They differ in their characteristics
and constraints:
1. Superkey
A superkey is any set of attributes (columns) that uniquely identifies a tuple in a relation. It can
consist of one or more attributes, and it may include extra attributes that are not necessary for
uniqueness. In other words, a superkey is a set of attributes that can uniquely identify a row, but it
may contain redundant attributes.
Example: If we have a relation Employee(ID, Name, SSN), a superkey could be {ID}, {SSN}, or
{ID, Name}. While {ID, Name} still uniquely identifies the tuple, it contains unnecessary extra
attributes (as {ID} alone would be sufficient).
2. Candidate Key
A candidate key is a minimal superkey, meaning it is a superkey with no redundant attributes. Every
candidate key is a superkey, but it does not contain any unnecessary attributes. In other words, a
candidate key is a superkey, but if you remove any attribute from it, it will no longer uniquely identify
the tuple.
Example: In the Employee relation, if ID and SSN both uniquely identify employees, then
both {ID} and {SSN} are candidate keys. However, {ID, Name} is not a candidate key, because
{ID} alone is sufficient to uniquely identify the tuple.
3. Primary Key
The primary key is one of the candidate keys selected to uniquely identify tuples in a relation. The
primary key is chosen by the database designer and is typically the key that will be used most often
for indexing or joining with other tables. There can only be one primary key in a relation.
Example: In the Employee relation, if both {ID} and {SSN} are candidate keys, the database
designer would choose one of them to be the primary key. For instance, let's say {ID} is
chosen as the primary key. So, {ID} becomes the primary key, and {SSN} is still a candidate key
but not the primary key.
Key Differences:
A superkey can have unnecessary attributes but still uniquely identify a tuple.
A candidate key is a minimal superkey, meaning no attribute can be removed from it without
losing the ability to uniquely identify a tuple.
A primary key is a selected candidate key that is chosen to uniquely identify tuples in a table,
and only one primary key can exist in a relation.
7.15 We can convert any weak entity set to a strong entity set by simply adding appropriate
attributes. Why, then, do we have weak entity sets?
Weak entity sets exist in an Entity-Relationship (ER) model for a specific reason related to how
entities are represented and related to one another. While it's true that you can convert a weak
entity set into a strong entity set by adding attributes, weak entity sets are still useful in database
modeling because they serve a specific purpose in representing certain types of relationships and
dependencies between entities. Here’s why weak entity sets are important:
A weak entity set is an entity set that cannot be uniquely identified by its own attributes alone.
Instead, it relies on a strong entity set (also known as the owner entity set) for its identification. This
relationship is typically called an identifying relationship. The weak entity set has a partial key that is
combined with the primary key of the strong entity to form a unique identifier.
Example: Consider a Dependent entity set that represents the dependents of an employee.
The Dependent entity cannot be uniquely identified by its own attributes (such as
Dependent Name or Birth Date), but it can be uniquely identified by combining the
Employee ID (the strong entity) and Dependent Name.
Weak entity sets are used to represent real-world relationships that are inherently dependent on
other entities. Without weak entity sets, these relationships might require redundant or overly
complex structures to express.
Example: In an Invoice and Invoice Item scenario, the Invoice Item is a weak entity that
cannot exist independently of the Invoice entity. Each Invoice Item can be uniquely identified
only within the context of a specific invoice. Converting the Invoice Item to a strong entity
would require duplicating the Invoice ID in every Invoice Item, which introduces redundancy
and complicates the design.
By using a weak entity set, we avoid redundancy. A weak entity set naturally minimizes the repetition
of attributes that are already included in the identifying (strong) entity. This helps in maintaining data
integrity and a more efficient representation of the relationship between entities.
Example: If we turn a weak entity into a strong entity by adding the necessary identifying
attributes, we might end up duplicating information that already exists in the strong entity.
For instance, if a Course Enrollment entity is weak and depends on both Student and Course,
converting it into a strong entity would require storing both Student ID and Course ID as part
of the primary key, which could result in redundancy if this relationship is often referenced in
other parts of the system.
Weak entities reflect the real-world concept that some entities do not exist independently but are
part of a larger context. These types of relationships occur frequently in many domains, such as:
Parts in an assembly (a part’s identification depends on the assembly in which it’s used).
Order items in an order (an order item’s identification depends on the order it belongs to).
Weak entities help simplify database design and prevent unnecessary complexity. Without weak
entity sets, you might need to create additional artificial keys or relationships to capture these
dependent relationships, making the schema more complicated and harder to maintain.
7.16 Design a database for an automobile company to provide to its dealers to assist them in
maintaining customer records and dealer inventory and to assist sales staff in ordering cars.
Each vehicle is identified by a vehicle identification number (VIN). Each individual vehicle is a
particular model of a particular brand offered by the company (e.g., the XF is a model of the car
brand Jaguar of Tata Motors). Each model can be offered with a variety of options, but an individual
car may have only some (or none) of the available options. The database needs to store information
about models, brands, and options, as well as information about individual dealers, customers, and
cars.
Your design should include an E-R diagram, a set of relational schemas, and a list of constraints,
including primary-key and foreign-key constraints.
Entities:
1. Brand
2. Model
3. Option
4. Car
o Attributes: VIN (Primary Key), Model_ID (Foreign Key), Color, Year, Price,
Date_Manufactured
5. Dealer
6. Customer
7. Car_Option (Junction table for many-to-many relationship between Car and Option)
8. Order
9. Order_Item
Relationships:
Brand to Model: One Brand offers many Models, but each Model belongs to only one Brand.
(1:M)
Model to Car: One Model can have many Cars, but each Car belongs to only one Model.
(1:M)
Car to Option: A Car can have many Options, and an Option can be applied to many Cars.
(M:N)
Dealer to Car: A Dealer may have many Cars in inventory, but each individual Car (identified
by its VIN) is held by at most one Dealer at a time. (1:M)
Customer to Order: A Customer can place multiple Orders, but each Order is placed by one
Customer. (1:M)
Order to Car: An Order can contain multiple Cars, and a Car can be part of multiple Orders.
(M:N)
2. Relational Schema
1. Brand
2. Model
3. Option
4. Car
6. Dealer
7. Customer
8. Order
3. Constraints
1. Primary Key Constraints:
3. Unique Constraints:
5. Check Constraints:
4. Additional Considerations:
Cascading Updates/Deletes:
o When a brand is deleted, all models linked to that brand should be deleted or
updated accordingly (cascading delete).
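Part of the design above can be sketched as SQL, run here through Python's sqlite3 module. Only the Car attributes come from the entity list; the attributes of brand, model, and the option table are assumptions for illustration. The option table is called vehicle_option here (a hypothetical name) because OPTION is a reserved word in some SQL dialects. The sketch shows a primary key, foreign keys, a unique constraint, a check constraint, and the cascading delete from brand to model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys

conn.executescript("""
CREATE TABLE brand (
    brand_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE            -- assumed attribute
);
CREATE TABLE model (
    model_id INTEGER PRIMARY KEY,
    brand_id INTEGER NOT NULL
             REFERENCES brand(brand_id) ON DELETE CASCADE,
    name     TEXT NOT NULL                   -- assumed attribute
);
CREATE TABLE car (
    vin               TEXT PRIMARY KEY,
    model_id          INTEGER NOT NULL REFERENCES model(model_id),
    color             TEXT,
    year              INTEGER,
    price             REAL CHECK (price >= 0),
    date_manufactured TEXT
);
CREATE TABLE vehicle_option (
    option_id   INTEGER PRIMARY KEY,
    description TEXT NOT NULL                -- assumed attribute
);
-- Junction table for the M:N relationship between Car and Option.
CREATE TABLE car_option (
    vin       TEXT    NOT NULL REFERENCES car(vin) ON DELETE CASCADE,
    option_id INTEGER NOT NULL REFERENCES vehicle_option(option_id),
    PRIMARY KEY (vin, option_id)
);
""")

conn.execute("INSERT INTO brand VALUES (1, 'Jaguar')")
conn.execute("INSERT INTO model VALUES (10, 1, 'XF')")

# A car that references a non-existent model violates the foreign key.
try:
    conn.execute("INSERT INTO car (vin, model_id) VALUES ('VIN0001', 99)")
    orphan_car_rejected = False
except sqlite3.IntegrityError:
    orphan_car_rejected = True

# Deleting the brand cascades to its models.
conn.execute("DELETE FROM brand WHERE brand_id = 1")
models_left = conn.execute("SELECT COUNT(*) FROM model").fetchone()[0]
```

In this sketch the car-to-model foreign key has no cascade, so a brand whose models still have cars could not be deleted. Whether that is the desired rule is a design decision left open here.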
7.17 Design a generalization–specialization hierarchy for a motor vehicle sales company. The
company sells motorcycles, passenger cars, vans, and buses. Justify your placement of attributes at
each level of the hierarchy. Explain why they should not be placed at a higher or lower level.
In this design, we will create a generalization-specialization hierarchy where the top level will be a
general entity for motor vehicles, and the lower levels will specialize into categories like motorcycles,
passenger cars, vans, and buses.
Motor Vehicle is the general entity, which represents all types of motorized vehicles that the
company sells. The attributes here will capture the common properties shared by all motor vehicles,
regardless of the type.
VIN (Vehicle Identification Number): Unique identifier for each motor vehicle.
Engine Type: The type of engine (e.g., internal combustion, electric, hybrid).
Fuel Type: Type of fuel the vehicle uses (e.g., petrol, diesel, electric).
These attributes apply to all types of vehicles because they define basic characteristics that are
common across motorcycles, cars, vans, and buses.
Now we will specialize the Motor Vehicle entity into more specific vehicle types: Motorcycle,
Passenger Car, Van, and Bus.
Motorcycle
Motorcycles have unique attributes that are not applicable to cars, vans, or buses. The attributes of a
Motorcycle will include those that are specific to motorcycles and are not shared with other vehicle
types.
These attributes are placed at the Motorcycle level because, in this design, engine capacity in cc
and a style such as sport or cruiser are recorded only for motorcycles.
Passenger Car
Passenger cars have their own specialized attributes. These cars are primarily designed for
transporting passengers.
Additional Attributes for Passenger Car:
Number of Doors: The number of doors the vehicle has (e.g., 2-door, 4-door).
These attributes apply to Passenger Cars specifically and should not be placed at the Motor Vehicle
level because not all motor vehicles have them: a motorcycle, for example, has no doors and no
trunk volume to record.
Van
Vans are often used for transporting goods or people in bulk. They have their own set of specialized
attributes that differentiate them from cars and motorcycles.
Cargo Space: The capacity of the cargo area (in cubic feet).
Sliding Doors: The presence of sliding doors for easier access to the vehicle.
Number of Seats: Vans may have varying seating arrangements depending on whether they
are used for goods or passengers.
These attributes should be placed in the Van subclass because they are specific to vans and don't
apply to Motorcycles or Passenger Cars.
Bus
Buses are vehicles that carry large numbers of passengers and have specialized features for
transportation on a larger scale.
These attributes are specific to Buses and are not relevant to Motorcycles, Passenger Cars, or Vans.
They deal with the bus's capacity and its unique design, such as number of floors or accessibility
features.
At the Motor Vehicle level: Attributes like VIN, Make, Model, Year, Price, Fuel Type, and
Engine Type are shared by all vehicle types. VIN identifies each vehicle and the others describe
any vehicle, so they belong at the Motor Vehicle level; placing them lower would repeat them in
every subclass.
At the Motorcycle level: The attributes Engine Capacity and Type are specific to
motorcycles in this design. They should not be at the Motor Vehicle level because Engine
Capacity (in cc) and Type (such as sport or cruiser) are recorded only for motorcycles.
At the Passenger Car level: The attributes Number of Doors, Seating Capacity, and Trunk
Volume are specific to cars. While other vehicles may have seating capacity, trunk volume
and the number of doors are only meaningful in the context of passenger cars.
At the Van level: Attributes like Cargo Space and Sliding Doors are crucial for Vans but don't
apply to motorcycles or cars. These attributes describe aspects of the van’s functionality that
are not relevant for the other vehicle types.
At the Bus level: Passenger Capacity, Number of Floors, and Accessibility Features are all
attributes specific to buses. These features make buses distinct from other vehicle types,
such as motorcycles, which have a very different structure and purpose.
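One common way to map such a hierarchy to tables is one table for the general entity plus one table per specialization that shares its primary key. The sketch below, run through Python's sqlite3 module, shows this for Motor Vehicle, Motorcycle, and Van only. The column names, units, and sample values are assumptions for illustration, and this mapping is one option among several rather than the required one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- General entity: attributes shared by every vehicle type.
CREATE TABLE motor_vehicle (
    vin         TEXT PRIMARY KEY,
    make        TEXT,
    model       TEXT,
    year        INTEGER,
    price       REAL,
    fuel_type   TEXT,
    engine_type TEXT
);
-- Specializations: each row extends exactly one motor_vehicle row.
CREATE TABLE motorcycle (
    vin                TEXT PRIMARY KEY
                       REFERENCES motor_vehicle(vin) ON DELETE CASCADE,
    engine_capacity_cc INTEGER,
    motorcycle_type    TEXT
);
CREATE TABLE van (
    vin               TEXT PRIMARY KEY
                      REFERENCES motor_vehicle(vin) ON DELETE CASCADE,
    cargo_space_cu_ft REAL,
    sliding_doors     INTEGER,   -- 0 or 1
    number_of_seats   INTEGER
);
""")

conn.execute(
    "INSERT INTO motor_vehicle VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("VIN-M1", "ExampleMake", "ExampleModel", 2024, 9000,
     "petrol", "internal combustion"),
)
conn.execute("INSERT INTO motorcycle VALUES ('VIN-M1', 650, 'cruiser')")

# Shared attributes come from the general table, specific ones from the subclass.
row = conn.execute("""
    SELECT v.vin, v.fuel_type, m.engine_capacity_cc, m.motorcycle_type
    FROM motor_vehicle AS v
    JOIN motorcycle AS m ON m.vin = v.vin
""").fetchone()
```

The join shows why the placement matters: common attributes are stored once in motor_vehicle, and a motorcycle row never carries van-only columns such as cargo space.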
1.1 Define the following terms: data, database, DBMS, database system, database catalog, program-
data independence, user view, DBA, end user, canned transaction, deductive database system,
persistent object, meta-data, and transaction-processing application.
Data
Raw facts or figures that have no context or meaning on their own but can be processed to
produce information. For example, numbers like "23" or names like "Alice."
Database
An organized collection of related data that is stored electronically in a way that allows for
easy access, management, and updating. For example, a customer database storing names,
addresses, and purchase histories.
DBMS (Database Management System)
Software that provides an interface for users and applications to interact with the database,
enabling data storage, retrieval, and manipulation while ensuring data integrity and security.
Examples include MySQL, PostgreSQL, and Oracle.
Database System
A system that consists of the database, the DBMS, and the applications that use the DBMS to
perform various tasks like querying or updating data.
Database Catalog
A repository that stores metadata about the database, such as schema definitions, tables,
columns, data types, constraints, and user permissions. It is used by the DBMS to manage
the database.
Program-Data Independence
The ability to modify the database schema without having to change the application
programs that access the database. This is achieved by separating the data structure
(schema) from the application logic.
User View
A subset of the database or an abstraction tailored to the needs of a particular user or group.
For example, an accountant might only see financial data, while an HR employee sees
employee records.
DBA (Database Administrator)
A person responsible for managing the database system, including tasks such as designing
the schema, monitoring performance, ensuring data security, and performing backups and
recovery.
End User
A person who directly interacts with the database through applications or query tools. End
users can be casual users (e.g., employees using a report generator) or sophisticated users
(e.g., analysts writing SQL queries).
Canned Transaction
Predefined database operations or queries that are repeatedly executed by end users, often
through a user-friendly interface. For example, placing an order in an e-commerce system.
Deductive Database System
A database system that integrates logic programming (e.g., Prolog) with a database. It can
derive new facts and relationships using inference rules and stored data.
Persistent Object
An object in an object-oriented database that retains its state across multiple sessions and
exists beyond the runtime of the application that created it. For example, a customer object
in an e-commerce system.
Meta-Data
Data about the data, describing the structure, organization, and constraints of the data in the
database. Metadata includes table names, column names, data types, and relationships.
Transaction-Processing Application
An application designed to handle a sequence of database operations (transactions) in a way
that ensures data integrity and consistency, even in the event of failures. Examples include
banking systems, e-commerce systems, and ticket-booking systems.
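The Database Catalog and Meta-Data definitions above can be made concrete with a small sketch using Python's sqlite3 module. SQLite is used only as a convenient example; other systems expose their catalogs differently. The customer table and its columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, address TEXT)"
)

# sqlite_master is SQLite's catalog table: one row per table, index, view, or trigger.
tables = [
    r[0]
    for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(customer)")]

print(tables)
print(columns)
```

The program never hard-codes the structure it prints. It asks the DBMS for the table and column descriptions, which is what a self-describing database makes possible.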
1.2 What four main types of actions involve databases? Briefly discuss each.
1. Data Definition
Purpose: Defining the structure and organization of data in the database.
Description: This involves creating, modifying, and deleting database schemas, such as
defining tables, columns, data types, constraints, and relationships.
Example: Using SQL to create a table:
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    salary DECIMAL(10, 2)
);
2. Data Manipulation
Purpose: Handling and modifying the data stored in the database.
Description: Includes inserting new data, updating existing data, deleting data, and retrieving
data. These actions are performed using a query language like SQL.
Example:
o Insert Data:
INSERT INTO employees (id, name, salary) VALUES (1, 'Alice', 50000);
o Retrieve Data:
SELECT * FROM employees;
3. Data Querying
Purpose: Retrieving specific information from the database.
Description: This involves writing queries to filter, aggregate, or analyze the data. Querying is
one of the most common operations performed by users to get meaningful insights or
reports.
Example:
Retrieve all employees with a salary greater than $40,000:
SELECT name FROM employees WHERE salary > 40000;
4. Transaction Management
Purpose: Ensuring data consistency and integrity during multiple operations.
Description: A transaction is a group of one or more operations performed as a single unit. It
ensures the ACID properties (Atomicity, Consistency, Isolation, Durability) are maintained,
even during system failures.
Example:
Transfer $100 from one account to another:
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
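The transfer example can be extended to show atomicity: if one step fails, the whole transaction is rolled back. The following is a minimal sketch using Python's sqlite3 module. The CHECK constraint on balance, the transfer helper, and the starting balances are assumptions added for illustration. The credit is deliberately applied before the debit so that the failure happens after one update has already been made.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 50), (2, 0)")


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return True if committed, False if rolled back."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # Fails the CHECK constraint if src would go negative.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # undoes the credit that already ran
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


overdraft_ok = transfer(conn, 1, 2, 100)   # account 1 only holds 50
after_failed = balances(conn)

valid_ok = transfer(conn, 1, 2, 30)
after_valid = balances(conn)
```

After the failed transfer both balances are unchanged, even though the credit to account 2 had already executed inside the transaction. The second transfer commits and both updates become visible together.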
1.3 Discuss the main characteristics of the database approach and how it differs from traditional file
systems.
1. Self-Describing Nature
Database Approach:
o The database contains not only the data but also metadata (data about the
structure, constraints, and schema).
o Metadata is stored in a catalog and makes the system self-describing.
File Systems:
o Metadata is embedded in application programs, making it less flexible and harder to
manage.
2. Program-Data Independence
Database Approach:
o Changes to the database schema (e.g., adding columns to a table) do not require
changes to application programs.
o Achieved through a data abstraction layer provided by the DBMS.
File Systems:
o Data and programs are tightly coupled. Changes to file structure often require
modifying application logic.
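Program-data independence can be illustrated with a small sketch using Python's sqlite3 module. The high_earners function stands in for an application program; the function name, the sample rows, and the added department column are assumptions for illustration. The schema gains a column, and the program runs unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.execute(
    "INSERT INTO employees VALUES (1, 'Alice', 50000), (2, 'Bob', 30000)"
)


def high_earners(conn):
    """Application code: names the columns it needs, not their storage layout."""
    rows = conn.execute(
        "SELECT name FROM employees WHERE salary > 40000 ORDER BY name"
    )
    return [r[0] for r in rows]


before = high_earners(conn)

# Schema change: add a column. The application code above is not modified.
conn.execute("ALTER TABLE employees ADD COLUMN department TEXT")

after = high_earners(conn)
```

With a file-based approach, a program that reads fixed-position records would typically need to be rewritten after the same change, because the record layout is coded into the program itself.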
3. Data Abstraction
Database Approach:
o Data is organized at multiple abstraction levels:
Physical Level: How data is stored.
Logical Level: What data is stored and the relationships.
View Level: How users see the data.
o DBMS hides complexities from users.
File Systems:
o No abstraction; users and developers deal directly with data at the physical level.
4. Support for Multiple Views of Data
Database Approach:
o Different users can see different subsets or formats of the data depending on their
needs (e.g., an HR team might see employee details, while finance sees payroll data).
File Systems:
o Data views are fixed, often requiring separate files or additional logic for different
perspectives.
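Multiple views over the same stored data can be sketched with SQL views, run here through Python's sqlite3 module. The view names hr_view and payroll_view, the department column, and the sample rows are assumptions for illustration. SQLite itself has no user accounts, so this sketch shows only how views present different subsets, not how access to them is restricted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id         INTEGER PRIMARY KEY,
    name       TEXT,
    department TEXT,
    salary     REAL
);
INSERT INTO employees VALUES
    (1, 'Alice', 'Engineering', 50000),
    (2, 'Bob',   'Sales',       42000);

-- HR sees employee details but not salaries.
CREATE VIEW hr_view AS
    SELECT id, name, department FROM employees;

-- Finance sees payroll figures but not personal details.
CREATE VIEW payroll_view AS
    SELECT id, salary FROM employees;
""")

hr_cursor = conn.execute("SELECT * FROM hr_view")
hr_columns = [d[0] for d in hr_cursor.description]

payroll_rows = conn.execute("SELECT * FROM payroll_view ORDER BY id").fetchall()
```

Both views are defined over the single employees table, so there is no second copy of the data to keep in sync, unlike separate files maintained for each group of users.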
5. Data Sharing and Multiuser Access
Database Approach:
o Designed for concurrent access by multiple users while maintaining data consistency.
o Uses transaction management and locking mechanisms.
File Systems:
o Limited support for concurrent access; often requires manual handling to avoid
inconsistencies.
6. Data Integrity and Security
Database Approach:
o Enforces data integrity using constraints (e.g., foreign keys, unique constraints).
o Offers robust security mechanisms to control access at different levels.
File Systems:
o Integrity and security mechanisms must be implemented manually, often resulting in
redundancy and errors.
7. Reduction of Data Redundancy and Inconsistency
Database Approach:
o Centralized control reduces redundant data storage, ensuring consistency.
File Systems:
o Separate files may lead to duplicate data, increasing redundancy and potential
inconsistencies.
8. Backup and Recovery
Database Approach:
o Built-in mechanisms for data backup and recovery in case of failure.
File Systems:
o Limited or no built-in recovery; backup must be managed manually.
9. Complex Querying and Reporting
Database Approach:
o Allows complex querying through SQL and advanced reporting tools.
File Systems:
o Requires custom programming for complex queries, which is time-consuming and
error-prone.
1.4 What are the responsibilities of the DBA and the database designers?
Responsibilities of the Database Administrator (DBA)
The DBA is responsible for the overall management, maintenance, and security of the database
system. Key responsibilities include:
7. User Support
Assists end-users, developers, and analysts with database-related queries and issues.
Provides training and documentation on database usage.
Responsibilities of the Database Designers
Database designers are responsible for identifying the data to be stored and choosing appropriate
structures to represent and store it. Key responsibilities include:
1. Requirements Analysis
Works with stakeholders to understand data requirements, workflows, and constraints.
Identifies relationships between data entities and ensures the database meets business
needs.
2. Data Modeling
Creates conceptual, logical, and physical data models.
Defines the structure of tables, relationships, keys (primary, foreign, candidate), and
constraints.
3. Normalization
Applies normalization techniques to eliminate data redundancy and maintain consistency.
Balances normalization with performance considerations to optimize query efficiency.
4. Schema Design
Designs the database schema, specifying tables, columns, data types, constraints, and
relationships.
Incorporates indexing strategies for efficient data retrieval.
1.5