Dbms Mod1,2 Soln Navathe
MODULE 1
• Data: Raw facts that have meaning (e.g., name, age, marks).
• DBMS (Database Management System): Software used to manage and access databases (e.g., MySQL).
• DBA (Database Administrator): Person responsible for managing and controlling the database.
• DBA:
o Monitoring performance.
• Database Designers:
• Transaction management
• Concurrency control
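• Controlled redundancy: Data deliberately duplicated for efficiency, with the DBMS keeping the copies consistent (e.g., Student_name also stored in GRADE_REPORT to avoid a join, and updated automatically whenever STUDENT changes).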
• Uncontrolled redundancy: Data repeated unnecessarily, causing inconsistency (e.g., same employee
data stored in multiple files).
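Below is a minimal sketch of controlled redundancy using Python's built-in sqlite3 module. It assumes a simplified STUDENT/GRADE_REPORT pair in which Student_name is copied into GRADE_REPORT. The trigger name and sample values are illustrative only. The trigger is what makes the redundancy controlled: the DBMS, not the application, keeps the copy in step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    Student_number INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
-- Student_name is stored redundantly so grade reports can be printed without a join.
CREATE TABLE GRADE_REPORT (
    Student_number INTEGER REFERENCES STUDENT(Student_number),
    Section_identifier INTEGER,
    Student_name TEXT,
    Grade TEXT,
    PRIMARY KEY (Student_number, Section_identifier)
);
-- Controlled: the DBMS propagates every name change to the duplicate copies.
CREATE TRIGGER sync_student_name AFTER UPDATE OF Name ON STUDENT
BEGIN
    UPDATE GRADE_REPORT SET Student_name = NEW.Name
    WHERE Student_number = NEW.Student_number;
END;
""")
conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")
conn.execute("INSERT INTO GRADE_REPORT VALUES (17, 112, 'Smith', 'B')")

# Update only the original; the trigger fixes the redundant copy.
conn.execute("UPDATE STUDENT SET Name = 'Smyth' WHERE Student_number = 17")
copy = conn.execute(
    "SELECT Student_name FROM GRADE_REPORT WHERE Student_number = 17").fetchone()[0]
print(copy)  # Smyth
```

Without the trigger, the same duplication would be uncontrolled: the two copies could silently disagree.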
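1.14 Question: If the name of the CS department changes to CSSE and the corresponding prefix for the course number also changes, identify the columns that need to be updated.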
1.14 (a)
If the CS department name changes to CSSE and the course-number prefix changes, these columns must be
updated:
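• STUDENT.Major: students whose major is CS must change to CSSE.
• COURSE.Department: courses offered by the CS department must change to CSSE.
• COURSE.Course_number: course numbers that start with CS (e.g., CS1310, CS3320, CS3380) must be
updated to the new prefix.
• SECTION.Course_number: SECTION stores the course number, so rows that reference CS courses must be
updated.
• PREREQUISITE.Course_number and PREREQUISITE.Prerequisite_number: both columns hold CS course numbers
(e.g., CS3380 requires CS3320), so they must be updated too.
(Other tables, such as GRADE_REPORT, use Section_identifier and do not need changes.)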
1.14 (b)
Yes. Restructure to avoid multiple updates by factoring department/prefix out of the course-number string
and using keys/foreign keys. Example design (short):
• The course prefix is stored once (in DEPARTMENT as dept_code or generated from it).
• If the department changes name/prefix, you update one place (the DEPARTMENT entry or its code
mapping) and all courses/sections/prerequisites continue to refer by keys — no string updates needed.
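Below is a minimal sketch of this restructuring with Python's sqlite3 module. It assumes a surrogate Dept_id key and splits each course number into two parts: a Prefix stored only in DEPARTMENT, and a numeric part stored in COURSE. These column names are illustrative, not taken from the textbook. SECTION and PREREQUISITE would reference COURSE through (Dept_id, Number) in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (
    Dept_id   INTEGER PRIMARY KEY,     -- surrogate key: never changes
    Dept_name TEXT NOT NULL,
    Prefix    TEXT NOT NULL UNIQUE     -- course-number prefix, stored exactly once
);
CREATE TABLE COURSE (
    Dept_id     INTEGER NOT NULL REFERENCES DEPARTMENT(Dept_id),
    Number      TEXT NOT NULL,         -- numeric part only, e.g. '1310'
    Course_name TEXT,
    PRIMARY KEY (Dept_id, Number)
);
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (1, 'CS', 'CS')")
conn.executemany("INSERT INTO COURSE VALUES (1, ?, ?)", [
    ("1310", "Intro to Computer Science"),
    ("3320", "Data Structures"),
    ("3380", "Database"),
])

def course_numbers(conn):
    # The full course number is derived when queried: Prefix || Number.
    rows = conn.execute(
        "SELECT d.Prefix || c.Number FROM COURSE c "
        "JOIN DEPARTMENT d USING (Dept_id) ORDER BY c.Number")
    return [r[0] for r in rows]

# Renaming the department and its prefix touches one row only.
conn.execute("UPDATE DEPARTMENT SET Dept_name = 'CSSE', Prefix = 'CSSE' WHERE Dept_id = 1")
print(course_numbers(conn))  # ['CSSE1310', 'CSSE3320', 'CSSE3380']
```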
10 marks qs:
2.1. Define the following terms: data model, database schema, database state, internal schema, conceptual
schema, external schema, data independence, DDL, DML, SDL, VDL, query language, host language, data
sublanguage, database utility, catalog, client/server architecture, three-tier architecture, and n-tier architecture.
2.4. Describe the three-schema architecture. Why do we need mappings among schema levels? How do
different schema definition languages support this architecture?
2.12. Think of different users for the database shown in Figure 1.2. What types of applications would each user
need? To which user category would each belong, and what type of interface would each need?
2.13. Choose a database application with which you are familiar. Design a schema and show a sample database
for that application, using the notation of Figures 1.2 and 2.1. What types of additional information and
constraints would you like to represent in the schema? Think of several users of your database, and design a
view for each.
2.14. If you were designing a Web-based system to make airline reservations and sell airline tickets, which
DBMS architecture would you choose from Section 2.5? Why? Why would the other architectures not be a
good choice?
2.15. Consider Figure 2.1. In addition to constraints relating the values of columns in one table to columns in
another table, there are also constraints that impose restrictions on values in a column or a combination of
columns within a table. One such constraint dictates that a column or a group of columns must be unique
across all rows in the table. For example, in the STUDENT table, the Student_number column must be unique
(to prevent two different students from having the same Student_number). Identify the column or the group
of columns in the other tables that must be unique across all rows in the table. ANS:
• Data Model: A set of concepts for describing the structure of a database (e.g., relational model).
• Database Schema: Logical design or blueprint of the database showing tables, fields, and relationships.
• Conceptual Schema: Logical view describing all entities and relationships in the database.
• Data Independence: Ability to change the schema at one level without having to change the schema at the next higher level.
• DDL (Data Definition Language): Used to define the schema (CREATE, ALTER, DROP).
• DML (Data Manipulation Language): Used to manipulate data (INSERT, UPDATE, DELETE, SELECT).
• Client/Server Architecture: The database runs on a server; users interact with it through client applications.
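Three-Schema Architecture:
1. Internal Schema – Describes the physical storage structure of the database.
2. Conceptual Schema – Describes the structure of the whole database for all users, hiding physical storage details.
3. External Schema – Describes user views (different for each group of users).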
Mappings:
• Student: naive user; applications: view grades, register for courses; interface: web form / student portal.
• Instructor: parametric user; applications: enter or update grades, manage sections; interface: input forms.
Tables:
Relationships:
Constraints:
• Return_Date ≥ Issue_Date.
User Views:
Chosen Architecture:
Three-Tier Architecture
Why:
• Separates user interface (web/app), business logic (reservation rules), and database (flight data).
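The three-tier separation can be sketched in plain Python as three layers that only call the layer directly below them. This is a minimal sketch under stated assumptions: the class names, the flight AI101, and the "passenger name required" rule are hypothetical, and sqlite3 stands in for the real database server. The conditional UPDATE makes the seat check and the decrement a single statement, so two concurrent requests cannot both take the last seat.

```python
import sqlite3

# ---- Data tier: the only layer that issues SQL ----
class FlightStore:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE FLIGHT (Flight_no TEXT PRIMARY KEY, Seats_left INTEGER NOT NULL)")
        self.conn.execute("INSERT INTO FLIGHT VALUES ('AI101', 1)")  # one seat left
        self.conn.commit()

    def take_seat(self, flight_no):
        # Decrements only if a seat is still free; rowcount tells us whether it did.
        cur = self.conn.execute(
            "UPDATE FLIGHT SET Seats_left = Seats_left - 1 "
            "WHERE Flight_no = ? AND Seats_left > 0", (flight_no,))
        self.conn.commit()
        return cur.rowcount == 1

# ---- Application tier: reservation rules live here ----
class ReservationService:
    def __init__(self, store):
        self.store = store

    def book(self, flight_no, passenger):
        if not passenger.strip():
            raise ValueError("passenger name is required")  # example business rule
        return self.store.take_seat(flight_no)

# ---- Presentation tier: what a web page handler would call ----
def handle_booking_request(service, flight_no, passenger):
    return "Booked" if service.book(flight_no, passenger) else "Sold out"

service = ReservationService(FlightStore())
first = handle_booking_request(service, "AI101", "R. Sen")
second = handle_booking_request(service, "AI101", "A. Das")
print(first, second)  # Booked Sold out
```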
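• STUDENT: Student_number
• COURSE: Course_number
• SECTION: Section_identifier
• GRADE_REPORT: the (Student_number, Section_identifier) pair (a student cannot have multiple grades for the
same section)
• PREREQUISITE: the (Course_number, Prerequisite_number) pair (the same prerequisite cannot be listed twice
for a course)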
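These uniqueness rules map directly onto PRIMARY KEY declarations. Below is a minimal sketch with Python's sqlite3 module, assuming the column layout of the Figure 1.2 tables (sample values are illustrative). The composite keys make the pair unique, not each column on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER PRIMARY KEY, Class INTEGER, Major TEXT);
CREATE TABLE COURSE (Course_name TEXT, Course_number TEXT PRIMARY KEY,
                     Credit_hours INTEGER, Department TEXT);
CREATE TABLE SECTION (Section_identifier INTEGER PRIMARY KEY, Course_number TEXT,
                      Semester TEXT, Year INTEGER, Instructor TEXT);
-- Composite keys: the pair must be unique, not each column separately.
CREATE TABLE GRADE_REPORT (Student_number INTEGER, Section_identifier INTEGER, Grade TEXT,
                           PRIMARY KEY (Student_number, Section_identifier));
CREATE TABLE PREREQUISITE (Course_number TEXT, Prerequisite_number TEXT,
                           PRIMARY KEY (Course_number, Prerequisite_number));
""")

def accepted(sql, params):
    """Return True if the insert succeeds, False if a uniqueness rule rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ins = "INSERT INTO GRADE_REPORT VALUES (?, ?, ?)"
first_grade = accepted(ins, (17, 112, "B"))
duplicate_grade = accepted(ins, (17, 112, "A"))   # same student, same section
other_section = accepted(ins, (17, 119, "C"))     # same student, different section
print(first_grade, duplicate_grade, other_section)  # True False True
```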
5 marks:
2.2 Main categories of data models and their differences
1. Conceptual / High-level models: Describe data in a way users understand (e.g., ER model).
2. Logical / Representational models: Describe data structure in DBMS (e.g., relational model).
3. Physical / Low-level models: Describe how data is stored in memory and disks.
• Relational Model: Data stored in tables (rows and columns); simple structure, uses SQL.
• Object Model: Data as objects with attributes and methods; supports inheritance and encapsulation.
• XML Model: Data stored as hierarchical XML documents; suited for web data exchange.
• Database Schema: The logical design or structure of the database (tables, fields, relationships). It rarely
changes.
• Database State: The actual content (data) stored in the database at a particular time. It changes
frequently.
Example:
Schema = definition of STUDENT(Name, Roll, Dept)
State = actual rows like (‘Smith’, 17, ‘CS’)
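The distinction can be checked directly in SQLite, which stores each table's schema as data in its catalog table sqlite_master. Below is a minimal sketch using the STUDENT example above (the second row is illustrative). Inserting rows changes the state, but the schema text in the catalog stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Name TEXT, Roll INTEGER PRIMARY KEY, Dept TEXT)")

def schema():
    # The schema is kept in the DBMS catalog (sqlite_master).
    return conn.execute("SELECT sql FROM sqlite_master WHERE name = 'STUDENT'").fetchone()[0]

def state():
    # The state is the set of rows present right now.
    return conn.execute("SELECT * FROM STUDENT ORDER BY Roll").fetchall()

schema_before, state_before = schema(), state()     # empty state
conn.execute("INSERT INTO STUDENT VALUES ('Smith', 17, 'CS')")
conn.execute("INSERT INTO STUDENT VALUES ('Brown', 8, 'CS')")

print(schema() == schema_before)  # True: the schema did not change
print(state_before, state())      # [] [('Brown', 8, 'CS'), ('Smith', 17, 'CS')]
```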
• Logical Data Independence: A change in the conceptual schema does not affect the external schema
(e.g., adding a new field to a table without changing user views).
• Physical Data Independence: A change in physical storage does not affect the conceptual schema
(e.g., moving data from HDD to SSD).
Harder to achieve:
Logical data independence is harder because conceptual changes often affect user views and applications.
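Below is a minimal sketch of logical data independence using Python's sqlite3 module. A view plays the role of one user group's external schema, and adding a column to the base table is the conceptual-level change. The view name CS_STUDENTS and the Email column are illustrative assumptions. The query written against the view returns the same result before and after the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Name TEXT, Roll INTEGER PRIMARY KEY, Dept TEXT);
INSERT INTO STUDENT VALUES ('Smith', 17, 'CS');
-- External schema: the view that one user group sees.
CREATE VIEW CS_STUDENTS AS SELECT Name, Roll FROM STUDENT WHERE Dept = 'CS';
""")

def user_query():
    # An application written only against the external schema.
    return conn.execute("SELECT Name, Roll FROM CS_STUDENTS").fetchall()

before = user_query()
# Change at the conceptual level: add a new column to the base table.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Email TEXT")
after = user_query()
print(before == after)  # True: the view and its users are unaffected
```

The harder cases are removing or splitting tables. Then the view definition (the external/conceptual mapping) must be rewritten, and that is why logical independence is harder to achieve.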
• Procedural DML: The user specifies how to get the data (navigational, record-at-a-time). Examples: Embedded SQL, relational algebra.
• Nonprocedural DML: The user specifies what data to retrieve, not how. Example: SQL SELECT statement.
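The contrast can be imitated in Python with sqlite3. This is only an analogy, assuming a small STUDENT table with illustrative rows. The loop mimics a record-at-a-time (procedural) style, where the program fetches every record and tests each one itself. The single SELECT with a WHERE clause is nonprocedural: it states what is wanted and leaves the access path to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Name TEXT, Roll INTEGER PRIMARY KEY, Dept TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                 [("Smith", 17, "CS"), ("Brown", 8, "CS"), ("Rao", 21, "EE")])

# Procedural style: the program says HOW, navigating record by record.
def cs_names_procedural():
    result = []
    for name, dept in conn.execute("SELECT Name, Dept FROM STUDENT"):  # scan every record
        if dept == "CS":                                               # test each one ourselves
            result.append(name)
    return sorted(result)

# Nonprocedural style: the query says WHAT; the DBMS chooses the access path.
def cs_names_declarative():
    rows = conn.execute("SELECT Name FROM STUDENT WHERE Dept = 'CS' ORDER BY Name")
    return [r[0] for r in rows]

print(cs_names_procedural(), cs_names_declarative())  # ['Brown', 'Smith'] ['Brown', 'Smith']
```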
Example:
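• Two-tier: The client communicates directly with the database server. Example: desktop application (e.g., MS Access).
• Three-tier: Client → Application Server → Database Server. Example: web apps (e.g., online banking).
Difference:
Three-tier adds a middle layer for business logic. This improves security, because clients never access the database directly, and scalability, because the application servers can be replicated independently of the database.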
Additional Functionality:
• Divides application logic into more specialized layers (e.g., business, authentication, analytics).
Example (4-tier):
Client → Web Server → Application Server → Database Server.
5 marks (Unit 3, Exercises 3.1–3.15):
• Helps designers and users communicate without worrying about physical details.
2. Not applicable: e.g., middle name for someone who doesn’t have one.
3. Not available: e.g., grade not yet assigned.
3.3 Definitions
• Attribute Value: Actual data value for an attribute (e.g., ‘John’ for Name).
• Relationship Instance: Association between specific entities (e.g., Student ‘John’ enrolled in ‘DBMS’).
• Composite Attribute: Attribute made up of smaller parts (e.g., Name → First, Last).
• Derived Attribute: Attribute whose value is computed from other attributes (e.g., Age from DOB).
• Key Attribute: Attribute whose value uniquely identifies each entity (e.g., Roll No).
• Value Set (Domain): Set of all possible values an attribute can take.
• Entity Set: All entities of a type that exist in the database at a given time (e.g., all student records).
Difference:
• Value Set / Domain: Defines all possible valid values that an attribute can take (e.g., Name domain = string).
• Relationship Instance: Specific link between entities (e.g., Ravi enrolled in DBMS).
• Need: When the same entity type participates more than once in the same relationship (to avoid
confusion).
1. (min, max) notation: Specifies minimum and maximum number of times an entity participates.
2. Cardinality ratio and participation: Specifies 1:1, 1:N, M:N and whether total or partial participation.
• It can be migrated to one of the participating entity types only if the relationship is 1:1 or 1:N (for 1:N, only to the entity type on the N-side).
• This concept forms the basis of object-oriented and object-relational models, where relationships are
represented as object references.
• Example:
• Owner Entity Type: Entity type on which the weak entity type depends (e.g., Employee).
• Weak Entity Type: Cannot be uniquely identified without owner (e.g., Dependent).
• Partial Key: Attribute(s) that uniquely identify weak entities under the same owner (e.g.,
Dependent_Name).
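When mapped to tables, these three ideas become concrete constraints. Below is a minimal sketch with Python's sqlite3 module, assuming a simplified EMPLOYEE/DEPENDENT pair with illustrative values:
• The weak entity's key is (owner key, partial key).
• The identifying relationship is a foreign key to the owner.
• ON DELETE CASCADE removes dependents together with their owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Name TEXT);
-- Weak entity type: its key = owner's key + partial key.
CREATE TABLE DEPENDENT (
    Essn TEXT NOT NULL REFERENCES EMPLOYEE(Ssn) ON DELETE CASCADE,  -- identifying relationship
    Dependent_name TEXT NOT NULL,                                   -- partial key
    Relationship TEXT,
    PRIMARY KEY (Essn, Dependent_name)
);
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [("111", "Emp A"), ("222", "Emp B")])
# The same partial-key value may appear under different owners.
conn.execute("INSERT INTO DEPENDENT VALUES ('111', 'Alice', 'Daughter')")
conn.execute("INSERT INTO DEPENDENT VALUES ('222', 'Alice', 'Spouse')")

# A dependent cannot exist without an owner...
try:
    conn.execute("INSERT INTO DEPENDENT VALUES ('999', 'Bob', 'Son')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ...and is deleted together with its owner.
conn.execute("DELETE FROM EMPLOYEE WHERE Ssn = '111'")
remaining = conn.execute("SELECT Essn, Dependent_name FROM DEPENDENT").fetchall()
print(orphan_rejected, remaining)  # True [('222', 'Alice')]
```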
Yes.
• Attributes: Ovals
• Relationships: Diamonds
Which combinations of attributes have to be unique for each individual SECTION entity in the UNIVERSITY
database (shown in Figure 3.20) to enforce each of the following miniworld constraints:
a. During a particular semester and year, only one section can use a particular classroom at a particular
DaysTime value. b. During a particular semester and year, an instructor can teach only one section at a
particular DaysTime value. c. During a particular semester and year, the section numbers for sections offered
for the same course must all be different. d. Can you think of any other similar constraints?
To enforce each miniworld constraint, the following attribute combinations must be unique for each SECTION:
a. Only one section can use a particular classroom at a given DaysTime during a semester and year
Unique combination:
(Semester, Year, Building, RoomNo, DaysTime)
Reason:
Prevents two sections from occupying the same classroom at the same time.
b. An instructor can teach only one section at a given DaysTime during a semester and year
Unique combination:
(Semester, Year, Instructor_ID, DaysTime)
Reason:
Ensures that one instructor cannot teach two different classes at the same time.
c. Section numbers for the same course must be different in the same semester and year
Unique combination:
(Course_Code, Semester, Year, SecNo)
Reason:
Prevents duplicate section numbers for the same course within one semester.
d. Other similar constraints:
• A student cannot register for two sections of the same course in one semester.
→ (Student_ID, Course_Code, Semester, Year) must be unique.
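Each of constraints a to c becomes a UNIQUE constraint on SECTION. Below is a minimal sketch with Python's sqlite3 module. It reuses the attribute names from the answer above, and all row values are hypothetical. The first insert succeeds, and each later insert breaks exactly one of the three rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SECTION (
    SecId INTEGER PRIMARY KEY,
    Course_Code TEXT NOT NULL, SecNo INTEGER NOT NULL,
    Semester TEXT NOT NULL, Year INTEGER NOT NULL,
    Building TEXT, RoomNo TEXT, DaysTime TEXT, Instructor_ID TEXT,
    UNIQUE (Semester, Year, Building, RoomNo, DaysTime),   -- (a) one section per room slot
    UNIQUE (Semester, Year, Instructor_ID, DaysTime),      -- (b) one section per instructor slot
    UNIQUE (Course_Code, Semester, Year, SecNo)            -- (c) distinct section numbers per course
)""")

def try_insert(row):
    """Insert a section; return False if a uniqueness rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO SECTION (Course_Code, SecNo, Semester, Year, Building, RoomNo, "
            "DaysTime, Instructor_ID) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert(("CS3380", 1, "Fall", 2024, "ENG", "101", "MWF 9", "I1"))
room_clash = try_insert(("CS3320", 1, "Fall", 2024, "ENG", "101", "MWF 9", "I2"))   # breaks (a)
instr_clash = try_insert(("CS1310", 1, "Fall", 2024, "ENG", "202", "MWF 9", "I1"))  # breaks (b)
secno_clash = try_insert(("CS3380", 1, "Fall", 2024, "ENG", "303", "TT 11", "I3"))  # breaks (c)
print(ok, room_clash, instr_clash, secno_clash)  # True False False False
```

Constraint d (one section per course per student per semester) spans the enrollment relationship, not SECTION alone. It would therefore be enforced on the enrollment table, typically via a trigger or a check in application code.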
Composite and multivalued attributes can be nested to any number of levels. Suppose we want to design an
attribute for a STUDENT entity type to keep track of previous college education.
Such an attribute will have one entry for each college previously attended, and each such entry will be
composed of:
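• college name
• start and end dates
• degree entries (degrees awarded at that college, if any)
• transcript entries (courses completed at that college, if any)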
Furthermore:
• Each degree entry contains the degree name and the month and year the degree was awarded.
• Each transcript entry contains a course name, semester, year, and grade.
Design an attribute to hold this information. Use the conventions in Figure 3.5.
Design an attribute for Previous_College_Education to record each college attended, with nested attributes for
degree and transcript details:
Previous_College_Education = {
(College_Name,
Start_Date,
End_Date,
Degree = {
(Degree_Name, Degree_Month, Degree_Year)
},
Transcript = {
(Course_Name, Semester, Year, Grade)
}
)
}
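The nesting in this notation corresponds to nested records and lists. Below is a minimal sketch in Python dataclasses, assuming the class and field names shown (they are illustrative, not textbook notation):
• Each list stands for a multivalued attribute, shown as { } in the notation.
• Each dataclass stands for a composite attribute, shown as ( ) in the notation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Degree:                 # composite: (Degree_Name, Degree_Month, Degree_Year)
    degree_name: str
    degree_month: int
    degree_year: int

@dataclass
class TranscriptEntry:        # composite: (Course_Name, Semester, Year, Grade)
    course_name: str
    semester: str
    year: int
    grade: str

@dataclass
class CollegeEducation:       # one value of the outer multivalued attribute
    college_name: str
    start_date: str
    end_date: str
    degrees: List[Degree] = field(default_factory=list)              # {Degree}
    transcript: List[TranscriptEntry] = field(default_factory=list)  # {Transcript}

@dataclass
class Student:
    name: str
    previous_college_education: List[CollegeEducation] = field(default_factory=list)

s = Student("Smith")
s.previous_college_education.append(
    CollegeEducation("City College", "2019-08", "2021-05",
                     degrees=[Degree("A.S.", 5, 2021)],
                     transcript=[TranscriptEntry("Calculus I", "Fall", 2019, "A")]))
print(len(s.previous_college_education[0].transcript))  # 1
```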
Explanation:
Show an alternative design for the attribute described in Exercise 3.17 that uses only entity types (including
weak entity types, if needed) and relationship types.
Convert the above attribute into entity types and relationship types for normalization:
Entities:
Relationships:
Explanation:
Consider the ER diagram in Figure 3.21, which shows a simplified schema for an airline reservations system.
Extract from the ER diagram the requirements and constraints that produced this schema. Try to be as precise
as possible in your requirements and constraints specification.
Requirements extracted:
1. Airports: Each airport has a unique Airport_code, Name, City, and State.
6. Leg Instance: Represents a specific occurrence of a leg on a certain Date, with attributes like
No_of_avail_seats.
7. Reservation: Each reservation links a customer (Customer_Name, Cphone) to a seat on a leg instance.
Constraints:
• Each Flight Leg has one Departure and one Arrival Airport.
In Chapters 1 and 2, we discussed the database environment and database users. We can consider many entity
types to describe such an environment, such as DBMS, stored database, DBA, and catalog/data dictionary.
1. Try to specify all the entity types that can fully describe a database system and its environment.
Entities:
Explanation:
This ER schema describes how the DBMS, databases, users, DBA, and hardware/software interact in a
complete database environment.
Note: All the ER diagrams were generated with ChatGPT, but their shapes are wrong. Entities should be drawn in boxes (e.g., DBMS),
relationships in diamonds (e.g., manages), and attributes in ellipses (e.g., name, version). Everything else is correct.
Design an ER schema for keeping track of information about votes taken in the U.S. House of Representatives
during the current two-year congressional session.
• U.S. STATE:
o Name (e.g., ‘Texas’, ‘New York’, ‘California’).
• CONGRESS_PERSON:
o Name.
o District represented.
• BILL:
o Bill_name.
The database also keeps track of how each congressperson voted on each bill (the domain of the Vote attribute is {‘Yes’, ‘No’,
‘Abstain’, ‘Absent’}). Draw an ER schema diagram for this application. State clearly any assumptions you make.
Entities:
• CONGRESS_PERSON: CP_ID, Name, District, Start_Date, Party {Republican, Democrat, Independent, Other}
Relationships:
1. REPRESENTS:
2. SPONSORS:
ER Diagram Summary:
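Below is a minimal relational sketch of the voting part of this schema, using Python's sqlite3 module. It assumes BILL is keyed by Bill_name and keeps only the attributes needed to show the constraints; member and bill values are hypothetical. The M:N relationship VOTES gets a composite key (one vote per member per bill), and its Vote attribute is restricted to the stated domain with a CHECK constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CONGRESS_PERSON (
    CP_ID INTEGER PRIMARY KEY, Name TEXT, District INTEGER,
    Party TEXT CHECK (Party IN ('Republican', 'Democrat', 'Independent', 'Other'))
);
CREATE TABLE BILL (Bill_name TEXT PRIMARY KEY);
-- M:N relationship VOTES; its attribute Vote is restricted to the stated domain.
CREATE TABLE VOTES (
    CP_ID INTEGER NOT NULL REFERENCES CONGRESS_PERSON(CP_ID),
    Bill_name TEXT NOT NULL REFERENCES BILL(Bill_name),
    Vote TEXT NOT NULL CHECK (Vote IN ('Yes', 'No', 'Abstain', 'Absent')),
    PRIMARY KEY (CP_ID, Bill_name)      -- one recorded vote per member per bill
);
""")
conn.execute("INSERT INTO CONGRESS_PERSON VALUES (1, 'Member A', 7, 'Independent')")
conn.executemany("INSERT INTO BILL VALUES (?)", [("Bill X",), ("Bill Y",)])

def record_vote(cp_id, bill, vote):
    try:
        conn.execute("INSERT INTO VOTES VALUES (?, ?, ?)", (cp_id, bill, vote))
        return True
    except sqlite3.IntegrityError:
        return False

results = (record_vote(1, "Bill X", "Yes"),    # valid
           record_vote(1, "Bill X", "No"),     # second vote on the same bill
           record_vote(1, "Bill Y", "Maybe"))  # value outside the domain
print(results)  # (True, False, False)
```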
A database is being constructed to keep track of the teams and games of a sports league. A team has a
number of players, not all of whom participate in each game. It is desired to keep track of:
Assumptions (short):
• Sport: Soccer.
• We must record for each player in a game: the position played and whether they started/substituted,
and the game result for the team.
• Players may belong to only one team at a time (this can be relaxed to allow different teams over different seasons).
• Relationship PLAYS_IN_GAME between PLAYER and GAME with attributes Position, Started (Y/N),
Minutes_Played.
• Relationship PARTICIPATES between TEAM and GAME with attribute Result (Win/Loss/Draw) and role
names Home/Away.
Consider the ER diagram shown in Figure 3.22 for part of a BANK database. Each bank can have multiple
branches, and each branch can have multiple accounts and loans.
a. List the strong (nonweak) entity types in the ER diagram. b. Is there a weak entity type? If so, give its name,
partial key, and identifying relationship. c. What constraints do the partial key and the identifying relationship
of the weak entity type specify in this diagram? d. List the names of all relationship types, and specify the
(min, max) constraint on each participation. e. List concisely the user requirements that led to this ER schema
design. f. Suppose that every customer must have at least one account but is restricted to at most two loans at
a time, and that a bank branch cannot have more than 1,000 loans. How does this show up on the (min, max)
constraints?
a. Strong (nonweak) entity types: BANK, ACCOUNT, LOAN, CUSTOMER.
b. Yes: BANK_BRANCH is a weak entity type.
o Owner: BANK; partial key: Branch_no; identifying relationship: BRANCHES (connects BANK → BANK_BRANCH).
o ACCOUNT and LOAN are regular entity types identified by their own keys, Acct_no and Loan_no.
c. What the partial key and the identifying relationship specify:
• The partial key Branch_no by itself does not identify a branch globally: the same Branch_no value may occur
under different banks. The identifying relationship BRANCHES indicates that each BANK_BRANCH must be
identified together with its owning BANK. So the full key of a branch is (BANK Code, Branch_no).
d. Relationship types and (min,max) participation (derived from the cardinality ratios and participation in the figure):
• BRANCHES (identifying): BANK (1,N), BANK_BRANCH (1,1): each branch belongs to exactly one bank.
• ACCTS: BANK_BRANCH (0,N), ACCOUNT (1,1): each account is held at exactly one branch.
• LOANS: BANK_BRANCH (0,N), LOAN (1,1): each loan is made by exactly one branch.
• A_C: CUSTOMER (0,N), ACCOUNT (1,N): M:N, so a customer may have many accounts and an account may have
many customers (joint accounts).
• L_C: CUSTOMER (0,N), LOAN (1,N): M:N, so a loan may have several borrowers.
e. User requirements that led to this schema:
• A bank has several branches; branch numbers are unique only within a bank.
• Each account and each loan is held at exactly one branch.
• Support joint accounts and multiple customers per loan → M:N between ACCOUNT/LOAN and
CUSTOMER.
• Need to store account balances, loan amounts, and account/loan numbers.
f. If every customer must have ≥1 account, at most 2 loans, and a branch may have at most 1000 loans, this shows up
in the (min,max) constraints as follows:
• CUSTOMER participation in A_C: (1,N), i.e., min 1 (every customer must have at least one account).
• CUSTOMER participation in L_C: (0,2), i.e., a customer may have no loans and at most 2.
• BANK_BRANCH participation in LOANS: (0,1000), i.e., a branch may have at most 1000 loans (and possibly
none).
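Upper bounds such as the 2 in (0,2) and the 1000 in (0,1000) are often enforced with triggers. Below is a minimal sketch with Python's sqlite3 module, assuming a single bank, so that Branch_no alone identifies a branch. The trigger names and sample data are hypothetical. The minimum of 1 account per customer is not shown: a customer and their first account are inserted as separate rows, so that rule needs a check run after both rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (Ssn TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE LOAN (Loan_no TEXT PRIMARY KEY, Branch_no INTEGER NOT NULL, Amount REAL);
CREATE TABLE L_C (Loan_no TEXT, Ssn TEXT, PRIMARY KEY (Loan_no, Ssn));

-- max = 2 in CUSTOMER's (0,2) participation in L_C
CREATE TRIGGER max_two_loans BEFORE INSERT ON L_C
WHEN (SELECT COUNT(*) FROM L_C WHERE Ssn = NEW.Ssn) >= 2
BEGIN SELECT RAISE(ABORT, 'customer already has 2 loans'); END;

-- max = 1000 in BANK_BRANCH's (0,1000) participation in LOANS
CREATE TRIGGER max_1000_loans BEFORE INSERT ON LOAN
WHEN (SELECT COUNT(*) FROM LOAN WHERE Branch_no = NEW.Branch_no) >= 1000
BEGIN SELECT RAISE(ABORT, 'branch already has 1000 loans'); END;
""")

def ok(sql, params):
    """True if the insert is accepted; False if a trigger or key rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:   # RAISE(ABORT, ...) surfaces as a DatabaseError subclass
        return False

conn.execute("INSERT INTO CUSTOMER VALUES ('123', 'Smith')")
loans_ok = [ok("INSERT INTO LOAN VALUES (?, 1, 5000)", (f"L{i}",)) for i in range(1000)]
loan_1001 = ok("INSERT INTO LOAN VALUES (?, 1, 5000)", ("L1000",))
borrower = [ok("INSERT INTO L_C VALUES (?, '123')", (n,)) for n in ("L0", "L1", "L2")]
print(all(loans_ok), loan_1001, borrower)  # True False [True, True, False]
```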
• Assume that an employee may work in up to two departments or may not be assigned to any
department.
• Assume that each department must have one and may have up to three phone numbers.
1. Supply (min, max) constraints on this diagram. State clearly any additional assumptions you make.
2. Under what conditions would the relationship HAS_PHONE be redundant in this example?
Given: Employee may work in up to two departments or may not be assigned to any. Each department must
have 1..3 phone numbers.
(min,max) constraints:
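• Relationship between EMPLOYEE and DEPARTMENT:
o EMPLOYEE side: (0,2): an employee works in up to two departments or in none.
o DEPARTMENT side: (1,N): a department can have many employees (min 1 employee assumed).
• CONTAINS between DEPARTMENT and PHONE:
o DEPARTMENT side: (1,3): each department must have at least one and at most three phone numbers.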
• HAS_PHONE between EMPLOYEE and PHONE (in fig phone is shared with DEPARTMENT via CONTAINS):
o EMPLOYEE to PHONE via HAS_PHONE: employee may have (0,N) phones (personal phones);
phone may be assigned to an employee (0,1) if phone is personal.
• Also redundant if phone numbers are only department phones and employees are always associated
with departments — then employee phone can be resolved via department → phone without separate
HAS_PHONE.
3.25. Supplying Constraints and Analyzing Relationship Type (Figure 3.24)
• Assume that a course may or may not use a textbook, but that a text by definition is a book that is used
in some course.
1. Supply (min, max) constraints on this diagram. State clearly any additional assumptions you make.
2. If we add the relationship ADOPTS, to indicate the textbook(s) that an instructor uses for a course,
should it be a binary relationship between INSTRUCTOR and TEXT, or a ternary relationship among all
three entity types?
3. What (min, max) constraints would you put on the relationship? Why?
• A text is always used by some course (text must be used): TEXT participation in USES → (1,N) (min 1).
• Instructors teach from 2 to 4 courses: INSTRUCTOR participation in TEACHES → (2,4); COURSE side:
each course may be taught by (1,N) instructors (depending on co-teaching).
• ADOPTS captures the fact which instructor uses which text for which course. This is inherently ternary
if different instructors may adopt different texts for the same course or an instructor uses a different
text for different courses. So model as ternary: (INSTRUCTOR, COURSE, TEXT) with attribute(s) such as
Adopt_Year or Notes.
o If instead all instructors of a course use the same text(s), ADOPTS is not needed: the binary relationship
USES (COURSE–TEXT) together with TEACHES (INSTRUCTOR–COURSE) suffices. To allow instructor-specific
adoption, use the ternary relationship.
• COURSE participation in ADOPTS: course may have 0..5 adopted texts → (0,5) relative to course in
ADOPTS.
• TEXT participation: text must be used by at least one (if text definition implies used) → (1,N).
• Use ternary ADOPTS(INSTRUCTOR, COURSE, TEXT) if instructor-level adoption matters. Put cardinalities:
(INSTRUCTOR:0,N), (COURSE:0,5), (TEXT:1,N).
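The reason a ternary relationship is needed can be shown with a small set-based check in Python. It uses hypothetical ADOPTS facts and a hypothetical INSTRUCTOR–TEXT binary relationship named "prefers". The code projects the ternary facts onto the three possible binary relationships, then rebuilds triples from them. The rebuild produces an adoption that never happened, so the binaries alone cannot record which instructor uses which text for which course.

```python
# Hypothetical ADOPTS facts: (instructor, course, text)
adopts = {("I1", "C1", "T1"), ("I2", "C1", "T2"), ("I1", "C2", "T2")}

# The three binary relationships one might use instead of the ternary one.
teaches = {(i, c) for i, c, _ in adopts}   # INSTRUCTOR–COURSE
uses    = {(c, t) for _, c, t in adopts}   # COURSE–TEXT
prefers = {(i, t) for i, _, t in adopts}   # INSTRUCTOR–TEXT (hypothetical name)

# Rebuild triples from the binaries: keep (i, c, t) only if all three pairs are recorded.
rebuilt = {(i, c, t)
           for (i, c) in teaches
           for (c2, t) in uses if c2 == c
           if (i, t) in prefers}

print(sorted(rebuilt - adopts))  # [('I1', 'C1', 'T2')]: a spurious adoption
```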
3.26. Consider an entity type SECTION in a UNIVERSITY database, which describes the section offerings of
courses. The attributes of SECTION are Section_number, Semester, Year, Course_number, Instructor,
Room_no (where section is taught), Building (where section is taught), Weekdays (domain is the possible
combinations of weekdays in which a section can be offered {'MWF', 'M W', 'T T', and so on}), and Hours
(domain is all possible time periods during which sections are offered {'9-9:50 A.M.', '10-10:50 A.M.', ..., '3:30-
4:50 p.m.', '5:30-6:20 p.m.', and so on}). Assume that Section_number is unique for each course within a
particular semester/year combination (that is, if a course is offered multiple times during a particular
semester, its section offerings are numbered 1, 2, 3, and so on). There are several composite keys for section,
and some attributes are components of more than one key. Identify three composite keys, and show how they
can be represented in an ER schema diagram.
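Three composite keys:
1. (Course_number, Semester, Year, Section_number): section numbers are unique for each course within a semester/year.
2. (Semester, Year, Building, Room_no, Weekdays, Hours): only one section can meet in a given room at a given time.
3. (Semester, Year, Instructor, Weekdays, Hours): an instructor teaches only one section at a given time.
In the ER schema diagram, each key is drawn as a separate composite attribute of SECTION with its component
attributes, and each composite attribute is underlined, since ER notation underlines key attributes.
Note: these three keys share attributes (Semester, Year, Weekdays, Hours, etc.). Any of them can be
chosen as the primary key depending on design goals. The others remain uniqueness constraints: for example, (Building,
Room_no, Semester, Year, Weekdays, Hours) must be unique to prevent double-booking a room.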
3.27. Cardinality ratios often dictate the detailed design of a database. The cardinality ratio depends on the
real-world meaning of the entity types involved and is defined by the specific application. For the following
binary relationships, suggest cardinality ratios based on the common-sense meaning of the entity types.
Clearly state any assumptions you make.
For each pair I give a short assumption and the recommended cardinality ratio (format: Entity1 (min,max) —
(min,max) Entity2).
1. STUDENT — SOCIAL_SECURITY_CARD
o Assumption: Each student has at most one SS card; each SS card belongs to exactly one student.
o Cardinality: STUDENT (0..1) — (1..1) SOCIAL_SECURITY_CARD
(Or STUDENT (1,1) if all students must have an SS card.)
2. STUDENT — TEACHER
o Assumption: Students attend many teachers’ classes; a teacher teaches many students.
o Cardinality: STUDENT (0..N) — (0..N) TEACHER (i.e., M:N)
3. CLASSROOM — WALL
o Assumption: A classroom has several walls; a wall belongs to exactly one classroom.
o Cardinality: CLASSROOM (1..N) — (1..1) WALL (classroom → wall is 1:N; wall → classroom is many-to-one)
4. COUNTRY — CURRENT_PRESIDENT
o Assumption: A country has at most one current president; a president (as role) is for one
country.
o Cardinality: COUNTRY (0..1) — (1..1) CURRENT_PRESIDENT
(min 0 if some countries have no president; use (1..1) if always present)
5. COURSE — TEXTBOOK
o Assumption: A course may use 0..many textbooks; a textbook may be used by many courses.
o Cardinality: COURSE (0..N) — (0..N) TEXTBOOK (M:N)
6. ITEM (in an order) — ORDER
o Assumption: An order has many items; an item/product can appear in many different orders.
o Cardinality: ITEM (0..N) — (0..N) ORDER (M:N)
7. STUDENT — CLASS (enrollment)
o Assumption: Students enroll in many classes; a class has many students.
o Cardinality: STUDENT (0..N) — (0..N) CLASS (M:N)
8. CLASS — INSTRUCTOR
o Assumption: Typically each class is taught by one instructor, but an instructor teaches many
classes. (If co-teaching allowed, class→instructor could be M:N.)
o Cardinality (common case): CLASS (1..1) — (0..N) INSTRUCTOR
If co-teaching: CLASS (1..N) — (0..N) INSTRUCTOR (M:N)
9. INSTRUCTOR — OFFICE
o Assumption: Usually one instructor has one office (or none), an office may be assigned to at
most one instructor.
o Cardinality: INSTRUCTOR (0..1) — (0..1) OFFICE (or INSTRUCTOR (1..1) — (1..1)
OFFICE if all instructors must have offices)
10. EBAY_AUCTION_ITEM — EBAY_BID
o Assumption: An item can have many bids; a bid is placed for exactly one item.
o Cardinality: EBAY_AUCTION_ITEM (0..N) — (1..1) EBAY_BID (i.e., 1:N)
3.28. Consider the ER schema for the MOVIES database in Figure 3.25. Assume that MOVIES is a populated
database. ACTOR is used as a generic term and includes actresses. Given the constraints shown in the ER
schema, respond to the following statements with True, False, or Maybe. Assign a response of Maybe to
statements that, although not explicitly shown to be True, cannot be proven False based on the schema.
Justify each answer.
a. There are no actors in this database that have been in no movies. b. There are some actors who have acted
in more than ten movies. c. Some actors have done a lead role in multiple movies. d. A movie can have only a
maximum of two lead actors. e. Every director has been an actor in some movie. f. No producer has ever been
an actor. g. A producer cannot be an actor in some other movie. h. There are movies with more than a dozen
actors. i. Some producers have been a director as well. j. Most movies have one director and one producer. k.
Some movies have one director but several producers. l. There are some actors who have done a lead role,
directed a movie, and produced a movie. m. No movie has a director who also acted in that movie.
• LEAD_ROLE is a relationship between ACTOR and MOVIE with the number 2 shown on the ACTOR side and N on
the MOVIE side. In this notation, the number next to an entity type gives how many of its entities can relate to
one entity on the other side. So a movie can have at most 2 lead actors, and an actor can have lead roles in
many (N) movies.
• ALSO_A_DIRECTOR links ACTOR and DIRECTOR with 1..1 (shows every DIRECTOR corresponds to an
ACTOR, i.e., directors are also actors).
• ACTOR_PRODUCER connects ACTOR and PRODUCER with 1..1 (indicates producers are also actors in
this schema).
• DIRECTS is 1:N from DIRECTOR → MOVIE (each movie has one director; a director may direct many
movies).
• PRODUCES is M:N between PRODUCER and MOVIE (a movie can have many producers, and producers
can produce many movies).
Now answers:
a. There are no actors in this database that have been in no movies. → Maybe
• Justification: Schema doesn't show a mandatory minimum participation constraint requiring every
ACTOR to participate in PERFORMS_IN. Without an explicit minimum (e.g., 1), an actor may exist
without any performances. So we cannot prove "no actors with zero movies".
b. There are some actors who have acted in more than ten movies. → Maybe
• Justification: PERFORMS_IN is M:N with no shown upper bound on the number of movies an actor may
perform in; it allows >10, but schema does not force existence of such actors. So it's possible but not
guaranteed.
c. Some actors have done a lead role in multiple movies. → Maybe
• Justification: The N on the MOVIE side of LEAD_ROLE allows an actor to be the lead in many movies, but
the schema does not guarantee that any actor has actually done so. So the answer is Maybe.
d. A movie can have only a maximum of two lead actors. → True
• Justification: The 2 on the ACTOR side of LEAD_ROLE limits each movie to at most two lead actors.
e. Every director has been an actor in some movie. → True
• Justification: The ALSO_A_DIRECTOR relationship links DIRECTOR and ACTOR with 1..1 participation on
both sides (diagram indicates each DIRECTOR is associated with exactly one ACTOR), meaning every
director is an actor in the schema — so True.
f. No producer has ever been an actor. → False
• Justification: The ACTOR_PRODUCER relationship connects ACTOR and PRODUCER (the diagram
indicates overlap), so producers may also be actors. The schema allows/indicates producers as actors,
so the statement “no producer has ever been an actor” is false.
g. A producer cannot be an actor in some other movie. → False
• Justification: The schema allows a PRODUCER to be associated with ACTOR (via ACTOR_PRODUCER)
and actors perform in movies (PERFORMS_IN), so nothing prohibits a producer from acting in another
movie. So the statement is false.
h. There are movies with more than a dozen actors. → Maybe
• Justification: PERFORMS_IN is M:N, so a movie can have many actors (including more than 12).
The schema allows it but does not guarantee that at least one such movie exists, so Maybe.
i. Some producers have been a director as well. → Maybe
• Justification: The schema allows overlap: producer ↔ actor, director ↔ actor — if the same person is
linked to both roles (possible), then a producer could also be a director. The schema permits it but
doesn't force it -> Maybe.
j. Most movies have one director and one producer. → Maybe
• Justification: The schema shows DIRECTS as 1 director per movie (director side 1) but PRODUCES is
M:N (multiple producers allowed). The word “most” is a data/population-level claim (not enforced by
schema), so the schema cannot determine “most” -> Maybe.
k. Some movies have one director but several producers. → True
• Justification: DIRECTS indicates one director per movie; PRODUCES allows many producers per movie
(M:N). So the schema directly allows such movies -> True.
l. Some actors have done a lead role, directed a movie, and produced a movie. → Maybe
• Justification: The schema allows an actor to be a director and/or producer (via linking relationships)
and to have lead roles. It’s possible for a person to appear in all three roles, but schema does not force
existence. So Maybe.
m. No movie has a director who also acted in that movie. → False
• Justification: Nothing prevents a director from being an actor in the same movie (directors are actors per
schema). The diagram does not forbid a director appearing as an actor in the movie they direct; thus the
statement is false.
Quick exam-writing tips (for these questions)
• For 3.26: Draw the SECTION box, list attributes, then write the 3 composite keys next to it and explain
why each is useful — 3–4 lines will get full marks. Include uniqueness constraints for room/time and
instructor/time.
• For 3.27: Give each pair a short assumption (1 line) and the cardinality in (min,max) form — very high-
yield.
• For 3.28: For each statement give one-word answer (True / False / Maybe) plus a one-line justification
referring to the diagram’s cardinalities or relationship presence.
Given the ER schema for the MOVIES database in Figure 3.25, draw an instance diagram using three movies
that have been released recently. Draw instances of each entity type: MOVIES, ACTORS, PRODUCERS,
DIRECTORS involved; make up instances of the relationships as they exist in reality for those movies.
Example movies: Oppenheimer (2023), Barbie (2023), Everything Everywhere All at Once (2022).
Instances
Relationships:
• Michelle Yeoh and Ke Huy Quan PERFORM_IN EEAAO; Yeoh has LEAD_ROLE.
• Emma Thomas PRODUCES Oppenheimer; Margot Robbie PRODUCES Barbie; Jonathan Wang PRODUCES
EEAAO.
3.30. Illustrating a UML Diagram for Exercise 3.16
Illustrate the UML diagram for Exercise 3.16. Your UML design should observe the following requirements: a. A
student should have the ability to compute his/her GPA and add or drop majors and minors. b. Each
department should be able to add or delete courses and hire or terminate faculty. c. Each instructor should be
able to assign or change a student’s grade for a course.
• In the exam, draw classes with attributes and listed methods (just the method names are usually
enough).
• Add associations and multiplicities. Mention that some methods could be implemented in
helper/service classes (Registrar, DeptOffice) if asked.
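The three UML classes can be sketched in Python to check that the required operations fit together. The following are all illustrative assumptions, not textbook definitions:
• the attribute names;
• the method names (compute_gpa, add_major, drop_major, add_minor, drop_minor, add_course, delete_course, hire_faculty, terminate_faculty, assign_grade);
• the 4-point grade scale.
Assigning and changing a grade are modeled as the same operation, which overwrites the entry.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Assumed 4-point scale; the exercise does not fix one.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

@dataclass
class Student:
    name: str
    majors: Set[str] = field(default_factory=set)
    minors: Set[str] = field(default_factory=set)
    grades: Dict[str, str] = field(default_factory=dict)    # course number -> letter grade
    credits: Dict[str, int] = field(default_factory=dict)   # course number -> credit hours

    def compute_gpa(self) -> float:
        # Credit-weighted average of grade points over graded courses.
        total = sum(self.credits[c] for c in self.grades)
        if total == 0:
            return 0.0
        return sum(GRADE_POINTS[g] * self.credits[c] for c, g in self.grades.items()) / total

    def add_major(self, dept: str): self.majors.add(dept)
    def drop_major(self, dept: str): self.majors.discard(dept)
    def add_minor(self, dept: str): self.minors.add(dept)
    def drop_minor(self, dept: str): self.minors.discard(dept)

@dataclass
class Department:
    name: str
    courses: Set[str] = field(default_factory=set)
    faculty: Set[str] = field(default_factory=set)

    def add_course(self, course: str): self.courses.add(course)
    def delete_course(self, course: str): self.courses.discard(course)
    def hire_faculty(self, name: str): self.faculty.add(name)
    def terminate_faculty(self, name: str): self.faculty.discard(name)

@dataclass
class Instructor:
    name: str

    def assign_grade(self, student: Student, course: str, credit_hours: int, grade: str):
        # Assigning and changing a grade are the same operation: overwrite the entry.
        if grade not in GRADE_POINTS:
            raise ValueError(f"unknown grade {grade!r}")
        student.grades[course] = grade
        student.credits[course] = credit_hours

s = Student("Smith")
s.add_major("CS")
prof = Instructor("Anderson")
prof.assign_grade(s, "CS1310", 4, "A")
prof.assign_grade(s, "MATH2410", 3, "B")
print(round(s.compute_gpa(), 2))  # 3.57
```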
Consider the UNIVERSITY database described in Exercise 3.16. Build the ER schema for this database using a
data modeling tool such as ERwin or Rational Rose.
ER schema same as Fig 3.20; entities: STUDENT, INSTRUCTOR, DEPARTMENT, COURSE, SECTION, COLLEGE.
Relationships: TEACHES, TAKES, OFFERS, EMPLOYS, HAS, CHAIR, DEAN.
Consider a MAIL_ORDER database in which employees take orders for parts from customers. The data
requirements are summarized as follows:
• The mail order company has employees, each identified by a unique employee number, first and last
name, and Zip Code.
• Each customer of the company is identified by a unique customer number, first and last name, and Zip
Code.
• Each part sold by the company is identified by a unique part number, a part name, price, and quantity
in stock.
• Each order placed by a customer is taken by an employee and is given a unique order number. Each
order contains specified quantities of one or more parts. Each order has a date of receipt as well as an
expected ship date. The actual ship date is also recorded.
Design an entity–relationship diagram for the mail order database and build the design using a data modeling
tool such as ERwin or Rational Rose.
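Mapping the MAIL_ORDER ER design to tables is a quick way to test it. The following is a minimal sketch that uses Python's built-in sqlite3 module. It makes these assumptions:

* Column names and the relationship names PLACES, TAKES and CONTAINS are hypothetical.
* The order table is called ORDERS because ORDER is a reserved word in SQL.
* The two N:1 relationships become foreign keys in ORDERS.
* The M:N CONTAINS relationship becomes ORDER_LINE, which carries the ordered quantity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE EMPLOYEE (Eno INTEGER PRIMARY KEY, Fname TEXT, Lname TEXT, Zip TEXT);
CREATE TABLE CUSTOMER (Cno INTEGER PRIMARY KEY, Fname TEXT, Lname TEXT, Zip TEXT);
CREATE TABLE PART (Pno INTEGER PRIMARY KEY, Pname TEXT, Price REAL, Qty_in_stock INTEGER);
-- PLACES (customer) and TAKES (employee) are N:1, so they become foreign keys here.
CREATE TABLE ORDERS (
    Ono INTEGER PRIMARY KEY,
    Cno INTEGER NOT NULL REFERENCES CUSTOMER(Cno),
    Eno INTEGER NOT NULL REFERENCES EMPLOYEE(Eno),
    Received_date TEXT,
    Expected_ship_date TEXT,
    Actual_ship_date TEXT           -- NULL until the order actually ships
);
-- CONTAINS is M:N between ORDERS and PART and carries the ordered quantity.
CREATE TABLE ORDER_LINE (
    Ono INTEGER REFERENCES ORDERS(Ono),
    Pno INTEGER REFERENCES PART(Pno),
    Quantity INTEGER NOT NULL CHECK (Quantity > 0),
    PRIMARY KEY (Ono, Pno)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def order_total(conn: sqlite3.Connection, ono: int) -> float:
    """Sum of quantity * current part price over the lines of one order."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(l.Quantity * p.Price), 0) "
        "FROM ORDER_LINE l JOIN PART p ON p.Pno = l.Pno WHERE l.Ono = ?",
        (ono,),
    ).fetchone()
    return total
```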
Relationships
• Each movie is identified by title and year of release. Each movie has a length in minutes. Each has a
production company, and each is classified under one or more genres (such as horror, action, drama,
and so forth). Each movie has one or more directors and one or more actors appear in it. Each movie
also has a plot outline. Finally, each movie has zero or more quotable quotes, each of which is spoken
by a particular actor appearing in the movie.
• Actors are identified by name and date of birth and appear in one or more movies. Each actor has a
role in the movie.
• Directors are also identified by name and date of birth and direct one or more movies. It is possible for
a director to act in a movie (including one that he or she may also direct).
• Production companies are identified by name and each has an address. A production company
produces one or more movies.
Design an entity–relationship diagram for the movie database and enter the design using a data modeling tool
such as ERwin or Rational Rose.
Entities
• MOVIE(Title, Year, Length, Plot_outline)
• ACTOR(Name, DOB)
• DIRECTOR(Name, DOB)
• PRODUCTION_COMPANY(Name, Address)
• GENRE(GenreName)
Relationships
Consider a CONFERENCE_REVIEW database in which researchers submit their research papers for
consideration. Reviews by reviewers are recorded for use in the paper selection process. The database system
caters primarily to reviewers who record answers to evaluation questions for each paper they review and make
recommendations regarding whether to accept or reject the paper. The data requirements are summarized as
follows:
• Authors of papers are uniquely identified by e-mail id. First and last names are also recorded.
• Each paper is assigned a unique identifier by the system and is described by a title, abstract, and the
name of the electronic file containing the paper.
• A paper may have multiple authors, but one of the authors is designated as the contact author.
• Reviewers of papers are uniquely identified by e-mail address. Each reviewer’s first name, last name,
phone number, affiliation, and topics of interest are also recorded.
• Each paper is assigned between two and four reviewers. A reviewer rates each paper assigned to him
or her on a scale of 1 to 10 in four categories: technical merit, readability, originality, and relevance to
the conference. Finally, each reviewer provides an overall recommendation regarding each paper.
• Each review contains two types of written comments: one to be seen by the review committee only
and the other as feedback to the author(s).
Design an entity–relationship diagram for the CONFERENCE_REVIEW database and build the design using a
data modeling tool such as ERwin or Rational Rose.
Entities
Relationships
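A relational sketch of the CONFERENCE_REVIEW design shows which rules a schema can enforce directly. It makes these assumptions:

* It uses Python's sqlite3 module.
* Table and column names are hypothetical.
* Topics of interest go in a separate REVIEWER_TOPIC table because they are multivalued.

The 1-to-10 ratings become CHECK constraints. The two-to-four-reviewers rule spans several rows, so a query checks it instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE AUTHOR (Email TEXT PRIMARY KEY, Fname TEXT, Lname TEXT);
CREATE TABLE PAPER (
    Paper_id INTEGER PRIMARY KEY,
    Title TEXT, Abstract TEXT, File_name TEXT,
    Contact_author TEXT NOT NULL REFERENCES AUTHOR(Email)
);
CREATE TABLE WRITES (
    Email TEXT REFERENCES AUTHOR(Email),
    Paper_id INTEGER REFERENCES PAPER(Paper_id),
    PRIMARY KEY (Email, Paper_id)
);
CREATE TABLE REVIEWER (Email TEXT PRIMARY KEY, Fname TEXT, Lname TEXT,
                       Phone TEXT, Affiliation TEXT);
CREATE TABLE REVIEWER_TOPIC (              -- multivalued attribute
    Email TEXT REFERENCES REVIEWER(Email),
    Topic TEXT,
    PRIMARY KEY (Email, Topic)
);
-- One row per (reviewer, paper) assignment; each rating must lie in 1..10.
CREATE TABLE REVIEW (
    Reviewer_email TEXT REFERENCES REVIEWER(Email),
    Paper_id INTEGER REFERENCES PAPER(Paper_id),
    Tech_merit INTEGER CHECK (Tech_merit BETWEEN 1 AND 10),
    Readability INTEGER CHECK (Readability BETWEEN 1 AND 10),
    Originality INTEGER CHECK (Originality BETWEEN 1 AND 10),
    Relevance INTEGER CHECK (Relevance BETWEEN 1 AND 10),
    Recommendation TEXT,
    Committee_comments TEXT,       -- seen by the review committee only
    Author_feedback TEXT,          -- sent to the author(s)
    PRIMARY KEY (Reviewer_email, Paper_id)
);
"""

# "Two to four reviewers per paper" spans several rows, so no CHECK can express it;
# this query lists the papers that currently violate it.
BAD_REVIEWER_COUNT = """
SELECT p.Paper_id, COUNT(r.Reviewer_email)
FROM PAPER p LEFT JOIN REVIEW r ON r.Paper_id = p.Paper_id
GROUP BY p.Paper_id
HAVING COUNT(r.Reviewer_email) NOT BETWEEN 2 AND 4
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

This sketch does not enforce the rule that the contact author must be one of the paper's authors. That rule would need a trigger or an application-level check.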
Consider the ER diagram for the AIRLINE database shown in Figure 3.21. Build this design using a data
modeling tool such as ERwin or Rational Rose. (practical qs)
Answer (summary):
Entities: AIRPORT, AIRPLANE_TYPE, AIRPLANE, FLIGHT, FLIGHT_LEG, LEG_INSTANCE, FARE, SEAT, RESERVATION.
Relationships: DEPARTURE_AIRPORT, ARRIVAL_AIRPORT, INSTANCE_OF, ASSIGNED, RESERVES, CAN_LAND,
LEGS, FARES.
Follow Figure 3.21 exactly; the design can be implemented in any data modeling tool.
4.1. What is a subclass? When is a subclass needed in data modeling?
• A subclass is a subgroup (subset) of entities in a superclass that share some additional attributes or
relationships not common to all entities of the superclass.
• It is needed when some entities of a type have special properties or behaviors distinct from others.
Example: In an EMPLOYEE entity, if some are MANAGERS with extra attributes like Bonus, then
MANAGER is a subclass of EMPLOYEE.
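When an EER design is mapped to tables, a subclass such as MANAGER can get its own relation that reuses the superclass key. The following is a minimal sqlite3 sketch with hypothetical attributes (Name, Salary, Bonus). It shows only one of several possible mappings.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Name TEXT, Salary REAL);
    -- Subclass relation: same key as the superclass plus its local attribute.
    CREATE TABLE MANAGER (
        Ssn TEXT PRIMARY KEY REFERENCES EMPLOYEE(Ssn) ON DELETE CASCADE,
        Bonus REAL
    );
    """)
    return conn


# A manager "inherits" Name and Salary through a join on the shared key.
MANAGERS = """
SELECT e.Ssn, e.Name, e.Salary, m.Bonus
FROM MANAGER m JOIN EMPLOYEE e ON e.Ssn = m.Ssn
ORDER BY e.Ssn
"""
```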
• Superclass of a subclass: the higher-level entity type from which a subclass inherits attributes and relationships.
• Superclass/subclass relationship: the connection showing that every entity in a subclass is also a member of its superclass.
• IS-A relationship: another name for the superclass/subclass link (e.g., Manager IS-A Employee).
• Specialization: the process of defining one or more subclasses from an existing entity type.
• Generalization: the reverse process, combining two or more entity types into a single, higher-level superclass.
• Category (union type): a subclass whose members come from different superclasses (e.g., OWNER is a category of PERSON and COMPANY).
• Specific (local) attributes: attributes defined only for the subclass.
• Specific relationships: relationships that apply only to a subclass, not to its superclass.
• Inheritance means a subclass automatically receives all attributes and relationships of its superclass.
• User-defined subclass: formed explicitly by the modeler based on application semantics (e.g., a MANAGER subclass defined manually from EMPLOYEE).
• Predicate-defined subclass: membership is determined automatically by a condition (predicate) on attribute values (e.g., EMPLOYEE entities with Salary > 1,00,000 form the subclass HIGH_PAID_EMPLOYEE).
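A predicate-defined subclass maps naturally to a view because the predicate, not an explicit insertion, decides membership. The following is a minimal sqlite3 sketch. It assumes a hypothetical EMPLOYEE table and writes the threshold 1,00,000 as 100000.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Name TEXT, Salary INTEGER);
    -- Predicate-defined subclass: membership is computed from Salary,
    -- so rows are never inserted into it directly.
    CREATE VIEW HIGH_PAID_EMPLOYEE AS
        SELECT * FROM EMPLOYEE WHERE Salary > 100000;
    """)
    return conn


def high_paid(conn: sqlite3.Connection) -> list[str]:
    return [ssn for (ssn,) in conn.execute(
        "SELECT Ssn FROM HIGH_PAID_EMPLOYEE ORDER BY Ssn")]
```

Raising an employee's salary above the threshold moves that employee into the subclass automatically. Nothing has to be inserted into the view.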
Difference:
• User-defined specialization: defined explicitly by the designer based on real-world categories (e.g., specializing EMPLOYEE into MANAGER, ENGINEER, etc.).
4.6. Discuss the two main types of constraints on specializations and generalizations.
1. Disjointness Constraint
o Determines whether an entity can belong to only one subclass (disjoint) or multiple subclasses
(overlapping).
o Example: CAR and TRUCK disjoint subclasses of VEHICLE; STUDENT and EMPLOYEE may overlap.
2. Completeness Constraint
o Specifies whether all entities in the superclass must belong to a subclass (total) or some may
not (partial).
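o Example: every EMPLOYEE must be either an HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE (total), but not every EMPLOYEE is a MANAGER (partial).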
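If a disjoint, total specialization is stored in a single table, one type column can express both constraints. The following is a minimal sqlite3 sketch with hypothetical column names (Job_type, Pay_scale, Salary).

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- One relation for the superclass and both subclasses.
    -- NOT NULL makes the specialization total (every row is in some subclass);
    -- a single type column makes it disjoint (a row is in at most one subclass).
    CREATE TABLE EMPLOYEE (
        Ssn TEXT PRIMARY KEY,
        Name TEXT,
        Job_type TEXT NOT NULL CHECK (Job_type IN ('HOURLY', 'SALARIED')),
        Pay_scale REAL,     -- meaningful only for HOURLY rows
        Salary REAL         -- meaningful only for SALARIED rows
    );
    """)
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

An overlapping specialization would instead need one Boolean flag per subclass (for example, hypothetical Is_hourly and Is_salaried columns). A partial one would allow the type column to be NULL.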
4.7. What is the difference between a specialization hierarchy and a specialization lattice?
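• Hierarchy: each subclass has only one direct superclass, forming a tree structure.
• Lattice: a subclass may have multiple superclasses (multiple inheritance), forming a lattice structure.
Example: if ENGINEERING_MANAGER is a shared subclass of both ENGINEER and MANAGER, the specialization forms a lattice; if every subclass has a single parent (e.g., MANAGER under EMPLOYEE only), it forms a hierarchy.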
4.8. What is the difference between specialization and generalization? Why not shown separately in
diagrams?
• In EER diagrams, both appear the same visually (same notation for IS-A link).
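• The difference lies only in the direction of design thinking, not in the resulting structure. Specialization is top-down: an entity type is refined into subclasses. Generalization is bottom-up: common features of several entity types are abstracted into a superclass.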
4.9. How does a category differ from a regular shared subclass? What is a category used for? Give examples.
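• A category (union type) is a special kind of subclass whose members come from two or more superclasses that are not necessarily related. Each member belongs to one of these superclasses, so the category is a subset of their union.
• A shared subclass inherits from two or more superclasses that have a common ancestor. Each member must belong to all of them, so the shared subclass is a subset of their intersection.
Use: Categories are used when entities can belong to different, unrelated superclasses, e.g., ownership, membership, or partnership relationships. For example, OWNER is a category of PERSON and COMPANY, while ENGINEERING_MANAGER is a shared subclass of ENGINEER and MANAGER.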
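The superclasses of a category usually have different keys. One common mapping therefore gives the category a surrogate key, which each superclass row may reference. The following is a minimal sqlite3 sketch for OWNER over PERSON and COMPANY with hypothetical columns.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    -- PERSON and COMPANY have different keys, so OWNER gets a surrogate key.
    CREATE TABLE OWNER (Owner_id INTEGER PRIMARY KEY);
    CREATE TABLE PERSON (
        Ssn TEXT PRIMARY KEY,
        Name TEXT,
        Owner_id INTEGER UNIQUE REFERENCES OWNER(Owner_id)  -- NULL: not an owner
    );
    CREATE TABLE COMPANY (
        Cname TEXT PRIMARY KEY,
        Owner_id INTEGER UNIQUE REFERENCES OWNER(Owner_id)
    );
    """)
    return conn


# Each owner is found in whichever superclass references its Owner_id.
OWNER_NAMES = """
SELECT o.Owner_id, COALESCE(p.Name, c.Cname)
FROM OWNER o
LEFT JOIN PERSON p ON p.Owner_id = o.Owner_id
LEFT JOIN COMPANY c ON c.Owner_id = o.Owner_id
ORDER BY o.Owner_id
"""
```

This schema does not stop a person and a company from sharing one Owner_id. Enforcing that would need a trigger.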
4.10. UML terms and their corresponding EER terms
• Qualified association (UML) ↔ relationship with an identifying attribute (EER): uses a qualifier attribute to identify related objects.
4.11. Differences between EER schema diagrams and UML class diagrams
These abstractions help in simplifying complex real-world data into structured models.
4.13. What aggregation feature is missing from the EER model? How can it be enhanced?
• Missing Feature:
The EER model lacks explicit aggregation — the ability to represent a relationship as an entity that can
participate in another relationship.
• Enhancement:
Add higher-order relationships or “relationship-as-entity” modeling (used in UML as
aggregation/composition) to show that one relationship forms part of another.
Example:
PROJECT aggregates EMPLOYEE and TASK relationships; representing this explicitly requires the concept of
aggregation/composition.
4.14. Similarities and differences between conceptual database modeling and knowledge representation
(KR)
In short:
EER focuses on data storage and integrity, KR on reasoning and inference.
• Purpose: KR represents concepts, meanings, and relationships in a domain for knowledge sharing; conceptual (EER) modeling defines data structure for storage and retrieval.
Design an EER schema for a database application that you are interested in. Specify all constraints that should
hold on the database.
• A superclass/subclass relationship
• A category
Relationships (4+):
Superclass/Subclass:
Category:
• MEMBER category combining STUDENT and INSTRUCTOR (both can join events/community).
Constraints:
Consider the BANK ER schema in Figure 3.22, and suppose that it is necessary to keep track of different types
of ACCOUNTS and LOANS:
Both TRANSACTIONS and PAYMENTS include the amount, date, and time.
Modify the BANK schema, using ER and EER concepts of specialization and generalization. State any
assumptions you make about the additional requirements.
New Entities:
Assumptions:
The following narrative describes a simplified version of the organization of Olympic facilities planned for the
summer Olympics. Draw an EER diagram that shows the entity types, attributes, relationships, and
specializations for this application. State any assumptions you make.
• Multisport complexes have areas of the complex designated for each sport with a location indicator
(e.g., center, NE corner, and so on).
• A complex has a location, chief organizing individual, total occupied area, and so on.
• Each complex holds a series of events (e.g., the track stadium may hold many different races).
• For each event there is a planned date, duration, number of participants, number of officials, and so
on.
• A roster of all officials will be maintained together with the list of events each official will be involved
in.
• Different equipment is needed for the events (e.g., goal posts, poles, parallel bars) as well as for
maintenance.
• The two types of facilities (one-sport and multisport) will have different types of information. For each
type, the number of facilities needed is kept, together with an approximate budget.
Entities:
Relationships:
Specialization (of COMPLEX by facility type):
o ONE_SPORT_COMPLEX
o MULTISPORT_COMPLEX
Category:
• RESOURCE category combining EQUIPMENT and FACILITY (both are resources used in events).
Constraints:
4.19. Identify all the important concepts represented in the library database case study described below. In
particular, identify the abstractions of classification (entity types and relationship types), aggregation,
identification, and specialization/generalization. Specify (min, max) cardinality constraints whenever possible.
List details that will affect the eventual design but that have no bearing on the conceptual design. List the
semantic constraints separately. Draw an EER diagram of the library database. Case Study: The Georgia Tech
Library (GTL) has approximately 16,000 members, 100,000 titles, and 250,000 volumes (an average of 2.5
copies per book). About 10% of the volumes are out on loan at any one time. The librarians ensure that the
books that members want to borrow are available when the members want to borrow them. Also, the
librarians must know how many copies of each book are in the library or out on loan at any given time. A
catalog of books is available online that lists books by author, title, and subject area. For each title in the
library, a book description is kept in the catalog; the description ranges from one sentence to several pages.
The reference librarians want to be able to access this description when members request information about a
book. Library staff includes chief librarian, departmental associate librarians, reference librarians, check-out
staff, and library assistants. Books can be checked out for 21 days. Members are allowed to have only five
books out at a time. Members usually return books within three to four weeks. Most members know that they
have one week of grace before a notice is sent to them, so they try to return books before the grace period
ends. About 5% of the members have to be sent reminders to return books. Most overdue books are returned
within a month of the due date. Approximately 5% of the overdue books are either kept or never returned. The
most active members of the library are defined as those who borrow books at least ten times during the year.
The top 1% of membership does 15% of the borrowing, and the top 10% of the membership does 40% of the
borrowing. About 20% of the members are totally inactive in that they are members who never borrow. To
become a member of the library, applicants fill out a form including their SSN, campus and home mailing
addresses, and phone numbers. The librarians issue a numbered, machine-readable card with the member’s
photo on it. This card is good for four years. A month before a card expires, a notice is sent to a member for
renewal. Professors at the institute are considered automatic members. When a new faculty member joins the
institute, his or her information is pulled from the employee records and a library card is mailed to his or her
campus address. Professors are allowed to check out books for three-month intervals and have a two-week
grace period. Renewal notices to professors are sent to their campus address. The library does not lend some
books, such as reference books, rare books, and maps. The librarians must differentiate between books that
can be lent and those that cannot be lent. In addition, the librarians have a list of some books they are
interested in acquiring but cannot obtain, such as rare or out-of-print books and books that were lost or
destroyed but have not been replaced. The librarians must have a system that keeps track of books that cannot
be lent as well as books that they are interested in acquiring. Some books may have the same title; therefore,
the title cannot be used as a means of identification. Every book is identified by its International Standard Book
Number (ISBN), a unique international code assigned to all books. Two books with the same title can have
different ISBNs if they are in different languages or have different bindings (hardcover or softcover). Editions of
the same book have different ISBNs. The proposed database system must be designed to keep track of the
members, the books, the catalog, and the borrowing activity.
Abstraction examples:
• Weak entities: COPY (depends on TITLE), LOAN (depends on MEMBER & COPY)
• ARTIST: Name, DOB, DOD, Country_of_origin, Epoch, Main_style; each artist can have many art objects.
• EXHIBITION: Name, Start_date, End_date; related to the art objects on display.
Step 2: Relationships
Step 3: Specializations
1. Type-based:
ART_OBJECT → PAINTING, SCULPTURE, OTHER_OBJECT
2. Ownership-based:
ART_OBJECT → PERMANENT_COLLECTION, BORROWED_OBJECT
(disjoint, total specialization)
Step 4: Constraints
• Each ART_OBJECT must belong to exactly one type subclass and one ownership subclass.
• Total specialization for ownership-based (every art object is either borrowed or permanent).
Figure 4.12 shows an example of an EER diagram for a small-private-airport database; the database is used to
keep track of airplanes, their owners, airport employees, and pilots. From the requirements for this
database, the following information was collected: Each AIRPLANE has a registration number [Reg#], is of a
particular plane type [OF_TYPE], and is stored in a particular hangar [STORED_IN]. Each PLANE_TYPE has a
model number [Model], a capacity [Capacity], and a weight [Weight]. Each HANGAR has a number [Number],
a capacity [Capacity], and a location [Location]. The database also keeps track of the OWNERs of each plane
[OWNS] and the EMPLOYEEs who have maintained the plane [MAINTAIN]. Each relationship instance in OWNS
relates an AIRPLANE to an OWNER and includes the purchase date [Pdate]. Each relationship instance in
MAINTAIN relates an EMPLOYEE to a service record [SERVICE]. Each plane undergoes service many times;
hence, it is related by [PLANE_SERVICE] to a number of SERVICE records. A SERVICE record includes as
attributes the date of maintenance [Date], the number of hours spent on the work [Hours], and the type of
work done [Work_code]. We use a weak entity type [SERVICE] to represent airplane service, because the
airplane registration number is used to identify a service record. An OWNER is either a person or a
corporation. Hence, we use a union type (category) [OWNER] that is a subset of the union of corporation
[CORPORATION] and person [PERSON] entity types. Both pilots [PILOT] and employees [EMPLOYEE] are
subclasses of PERSON. Each PILOT has specific attributes license number [Lic_num] and restrictions [Restr];
each EMPLOYEE has specific attributes salary [Salary] and shift worked [Shift]. All PERSON entities in the
database have data kept on their Social Security number [Ssn], name [Name], address [Address], and
telephone number [Phone]. For CORPORATION entities, the data kept includes name [Name], address
[Address], and telephone number [Phone]. The database also keeps track of the types of planes each pilot is
authorized to fly [FLIES] and the types of planes each employee can do maintenance work on [WORKS_ON].
Show how the SMALL_AIRPORT EER schema in Figure 4.12 may be represented in UML notation. (Note: We
have not discussed how to represent categories (union types) in UML, so you do not have to map the
categories in this and the following question.)
Show how the UNIVERSITY EER schema in Figure 4.9 may be represented in UML notation.
• Entities → Classes: All EER entities (e.g., PERSON, DEPARTMENT, COURSE) become UML classes.
• Inheritance: The "is-a" relationships are shown using UML inheritance (a line with a hollow triangle
pointing to the superclass).
o FACULTY and STUDENT inherit from PERSON.
o GRAD_STUDENT inherits from STUDENT.
3. Complex Structures
5.1. Definitions
• Domain: the set of all possible valid values an attribute can take (e.g., the domain of “Age” is the integers 0–120).
• Attribute: a named column of a relation that represents a data field (e.g., Name, RollNo).
• Relation schema: the structure or definition of a relation, written as R(A1, A2, …, An), showing its name and attributes.
• Relation state: the actual data, that is, the set of tuples present in a relation at a specific time.
• Relational database state: the current data (the set of relation states) in the database at a specific moment.
5.2. Why are tuples in a relation not ordered?
• A relation is defined as a set of tuples. Sets are unordered collections, so the order of tuples has no meaning.
• Tuple order does not affect query results or data interpretation — each tuple is uniquely identified by
its primary key, not its position.
• Duplicate tuples make retrieval, updates, and integrity checking ambiguous and inefficient.
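Python sets can illustrate these set semantics directly. The sample tuples below are hypothetical. Listing the same tuples in a different order, or repeating one, gives the same relation state.

```python
# A relation state as a Python frozenset of tuples (hypothetical STUDENT rows).
r1 = frozenset({("17", "Smith", "CS"), ("8", "Brown", "CS")})

# Same tuples, listed in another order and with one repeated.
r2 = frozenset([("8", "Brown", "CS"), ("17", "Smith", "CS"), ("8", "Brown", "CS")])

same_state = r1 == r2       # order of listing is irrelevant
tuple_count = len(r2)       # the repeated tuple is stored only once
```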
• Superkey: a set of one or more attributes that can uniquely identify a tuple.
• Key: a minimal superkey; no proper subset of it can uniquely identify a tuple.
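The superkey and key definitions can be checked mechanically against sample data. The following is a minimal Python sketch with hypothetical function names. A single relation state can only disprove a key, because a key is a property of the schema's meaning and not of the rows present at one moment.

```python
from collections.abc import Mapping, Sequence
from itertools import combinations


def is_superkey(rows: Sequence[Mapping], attrs: Sequence[str]) -> bool:
    """True if no two rows of this state agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows: Sequence[Mapping], attrs: Sequence[str]) -> bool:
    """A key is a superkey none of whose proper subsets (including the empty one) is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )
```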
3. Enforce key and entity integrity: no two tuples can have the same primary key value, and the primary key cannot be NULL.
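1. Unknown: the value exists but is not known (e.g., a birth date that was never recorded).
2. Not applicable: the attribute does not apply (e.g., “SpouseName” for an unmarried person).
3. Not yet assigned: the value will be provided later (e.g., marks not yet entered).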
Example:
If Student(DeptNo) references Department(DeptNo), a student cannot belong to a non-existing department.
• A foreign key is an attribute (or set of attributes) in one relation that refers to the primary key of
another relation.
Example:
Student(DeptNo) → foreign key referencing Department(DeptNo).
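The DeptNo example can be tried in SQLite, which enforces foreign keys only after PRAGMA foreign_keys = ON. The following is a minimal sketch with hypothetical table layouts. Note that SQLite accepts a NULL foreign key, because a NULL references nothing.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
    CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, Dname TEXT);
    CREATE TABLE Student (
        RollNo INTEGER PRIMARY KEY,
        Name TEXT,
        DeptNo INTEGER REFERENCES Department(DeptNo)   -- the foreign key
    );
    INSERT INTO Department VALUES (1, 'CS');
    """)
    return conn


def insert_student(conn: sqlite3.Connection, roll, name, dept) -> bool:
    """Return False if the insert violates an integrity constraint (here, the foreign key)."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (roll, name, dept))
        return True
    except sqlite3.IntegrityError:
        return False
```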
Transaction vs. update operation:
• Definition: a transaction is a logical unit of work that may contain one or more operations (insert, update, delete) executed together; an update is a single modification of data (e.g., changing one field).
• Property: a transaction must follow ACID (Atomicity, Consistency, Isolation, Durability); a single update by itself does not guarantee ACID properties.
• Example: transferring money between two accounts involves multiple updates but forms one transaction; changing a single account balance is one update operation.
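The money-transfer example can be sketched with Python's sqlite3 module. The sketch assumes a hypothetical ACCOUNT table whose CHECK constraint forbids negative balances, and it uses explicit BEGIN, COMMIT and ROLLBACK. If the second update fails, the first is rolled back, which is the atomicity part of ACID.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE ACCOUNT (Acc_no INTEGER PRIMARY KEY, "
                 "Balance REAL NOT NULL CHECK (Balance >= 0))")
    conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [(1, 500.0), (2, 100.0)])
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: float) -> bool:
    """Two updates run as one transaction: both take effect or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE ACCOUNT SET Balance = Balance + ? WHERE Acc_no = ?", (amount, dst))
        conn.execute("UPDATE ACCOUNT SET Balance = Balance - ? WHERE Acc_no = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")   # undoes the credit already applied to dst
        return False


def balances(conn: sqlite3.Connection) -> dict[int, float]:
    return dict(conn.execute("SELECT Acc_no, Balance FROM ACCOUNT"))
```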
Question:
Suppose that each of the following update operations is applied directly to the database state shown in Figure
5.6.
Discuss all integrity constraints violated by each operation and ways of enforcing them.
(a)
Insert:
<‘Robert’, ‘F’, ‘Scott’, ‘943775543’, ‘1972-06-21’, ‘2365 Newcastle Rd, Bellaire, TX’, M, 58000, ‘888665555’, 1>
into EMPLOYEE
Check:
Result:
No violation.
All referential and entity constraints satisfied.
Action: Insert allowed.
(b)
Insert:
<‘ProductA’, 4, ‘Bellaire’, 2> into PROJECT
Check:
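Violations:
• Referential integrity: Dnum = 2 does not match any Dnumber in DEPARTMENT (only 1, 4, and 5 exist). Pnumber 4 is not already in use, so the key constraint holds.
Action: Reject the insertion, or change Dnum to an existing department number.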
(c)
Insert:
<‘Production’, 4, ‘943775543’, ‘2007-10-01’> into DEPARTMENT
Check:
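Violations:
• Key constraint: Dnumber 4 already exists (Administration).
• Referential integrity: Mgr_ssn 943775543 does not exist in EMPLOYEE.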
(d)
Insert:
<‘677678989’, NULL, ‘40.0’> into WORKS_ON
Check:
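Violations:
• Entity integrity: Pno is part of the primary key (Essn, Pno), so it cannot be NULL.
• Referential integrity: Essn 677678989 does not exist in EMPLOYEE.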
(e)
Insert:
<‘453453453’, ‘John’, ‘M’, ‘1990-12-12’, ‘Spouse’> into DEPENDENT
Check:
Result:
No violation.
Action: Insert allowed.
(f)
Delete:
WORKS_ON tuples with Essn = ‘333445555’
Check:
Result:
No violation. No other relation references WORKS_ON.
Action: Delete allowed.
(g)
Delete:
EMPLOYEE tuple with Ssn = ‘987654321’ (Jennifer Wallace)
Check:
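Violations:
• Referential integrity: 987654321 is still referenced by EMPLOYEE.Super_ssn (employees 999887777 and 987987987), DEPARTMENT.Mgr_ssn (department 4), WORKS_ON.Essn, and DEPENDENT.Essn.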
(h)
Delete:
PROJECT tuple where Pname = ‘ProductX’ (Pnumber = 1)
Check:
Violations:
• Referential integrity: WORKS_ON tuples with Pno = 1 still reference this project.
Action:
Either delete dependent WORKS_ON tuples first, or use CASCADE DELETE.
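Both options can be compared in SQLite. The following is a minimal sketch with a cut-down PROJECT/WORKS_ON pair and a few sample rows. The on_delete argument is assumed to be either 'RESTRICT' or 'CASCADE'.

```python
import sqlite3


def build_db(on_delete: str) -> sqlite3.Connection:
    """on_delete is assumed to be 'RESTRICT' or 'CASCADE' (inserted into the DDL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
    CREATE TABLE PROJECT (Pnumber INT PRIMARY KEY, Pname TEXT);
    CREATE TABLE WORKS_ON (
        Essn TEXT,
        Pno INT REFERENCES PROJECT(Pnumber) ON DELETE {on_delete},
        Hours REAL,
        PRIMARY KEY (Essn, Pno)
    );
    INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO WORKS_ON VALUES ('123456789', 1, 32.5), ('453453453', 1, 20.0),
                                ('453453453', 2, 20.0);
    """)
    return conn


def delete_project(conn: sqlite3.Connection, pname: str) -> bool:
    """Return True if the delete went through, False if a constraint blocked it."""
    try:
        conn.execute("DELETE FROM PROJECT WHERE Pname = ?", (pname,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With RESTRICT, the delete is rejected while WORKS_ON rows still point at ProductX. With CASCADE, those rows are removed together with the project.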
(i)
Modify:
Mgr_ssn and Mgr_start_date in DEPARTMENT where Dnumber = 5
→ Mgr_ssn = 123456789, Mgr_start_date = 2007-10-01
Check:
Result:
No violation. Ssn 123456789 exists in EMPLOYEE.
Action: Update allowed.
(j)
Modify:
Super_ssn of EMPLOYEE where Ssn = 999887777 → 943775543
Check:
Violation:
• Referential integrity: no EMPLOYEE tuple has Ssn = 943775543, so it cannot be used as Super_ssn.
(k)
Modify:
Hours in WORKS_ON where Essn = 999887777 and Pno = 10 → 5.0
Check:
Result:
No violation.
Action: Update allowed.
Summary Table
Question:
b. Constraints to Check:
1. Key Constraints — primary keys (e.g., Seat_number unique for same flight leg & date).
3. Referential Integrity —
4. Domain Constraints — correct data types (e.g., valid date, phone format).
c. Classify Constraints
Constraint Type
2. FLIGHT_LEG.Departure_airport_code → AIRPORT(Airport_code)
3. FLIGHT_LEG.Arrival_airport_code → AIRPORT(Airport_code)
4. LEG_INSTANCE(Flight_number, Leg_number) → FLIGHT_LEG(Flight_number, Leg_number)
5. LEG_INSTANCE.Airplane_id → AIRPLANE(Airplane_id)
6. FARE.Flight_number → FLIGHT(Flight_number)
7. AIRPLANE.Airplane_type → AIRPLANE_TYPE(Airplane_type_name)
8. CAN_LAND.Airplane_type_name → AIRPLANE_TYPE(Airplane_type_name)
9. CAN_LAND.Airport_code → AIRPORT(Airport_code)
Question:
Relation:
CLASS(Course#, Univ_Section#, Instructor_name, Semester, Building_code, Room#, Time_period, Weekdays,
Credit_hours)
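Candidate keys (each valid only under the stated assumption):
• (Course#, Semester, Univ_Section#): if section numbers are unique only within a course-semester combination.
• (Semester, Building_code, Room#, Time_period, Weekdays): if no two classes share the same room, time, and weekday schedule within a semester. Semester must be included because the same room and time slot are reused every semester.
• (Instructor_name, Semester, Time_period): if an instructor can teach only one class in a given time period per semester.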
Summary:
Usually, Univ_Section# is the primary key (globally unique), while the others are alternate candidate keys
valid under certain scheduling rules.
5.14. Order-Processing Database
Consider the following six relations for an order-processing database application in a company:
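• CUSTOMER(Cust#, Cname, City)
• ORDER(Order#, Odate, Cust#, Ord_amt)
• ORDER_ITEM(Order#, Item#, Qty)
• ITEM(Item#, Unit_price)
• SHIPMENT(Order#, Warehouse#, Ship_date)
• WAREHOUSE(Warehouse#, City)
Here, Ord_amt refers to the total dollar amount of an order; Odate is the date the order was placed; and Ship_date is the date an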
order (or part of an order) is shipped from the warehouse. Assume that an order can be shipped from several
warehouses.
1. Specify the foreign keys for this schema, stating any assumptions you make.
Schema:
CUSTOMER(Cust#, Cname, City)
ORDER(Order#, Odate, Cust#, Ord_amt)
ORDER_ITEM(Order#, Item#, Qty)
ITEM(Item#, Unit_price)
SHIPMENT(Order#, Warehouse#, Ship_date)
WAREHOUSE(Warehouse#, City)
• ORDER.Cust# → CUSTOMER.Cust#
Assume each order must belong to an existing customer.
• ORDER_ITEM.Order# → ORDER.Order#
Assume order items always reference an existing order.
• ORDER_ITEM.Item# → ITEM.Item#
Assume each ordered item exists in the item master.
• SHIPMENT.Order# → ORDER.Order#
Assume shipment always refers to a recorded order.
• SHIPMENT.Warehouse# → WAREHOUSE.Warehouse#
Assume shipments are from known warehouses.
• If partial shipments allowed: track shipped quantities; ensure total shipped ≤ ordered qty.
• If an order may be shipped from multiple warehouses: SHIPMENT may have many rows per Order#.
Consider the following relations for a database that keeps track of business trips of salespersons in a sales
office:
• SALESPERSON(Ssn, Name, Start_year, Dept_no)
• TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)
• EXPENSE(Trip_id, Account#, Amount)
A trip can be charged to one or more accounts. Specify the foreign keys for this schema, stating any
assumptions you make.
Schema:
SALESPERSON(Ssn, Name, Start_year, Dept_no)
TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)
EXPENSE(Trip_id, Account#, Amount)
• TRIP.Ssn → SALESPERSON.Ssn
Assume each trip belongs to an existing salesperson.
• EXPENSE.Trip_id → TRIP.Trip_id
Assume Trip_id uniquely identifies a trip (Trip_id is primary key).
• Trip_id should be unique (or composite key: (Ssn, Trip_id) if Trip_id not global).
Consider the following relations for a database that keeps track of student enrollment in courses and the books
adopted for each course:
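• STUDENT(Ssn, Name, Major, Bdate)
• COURSE(Course#, Cname, Dept)
• ENROLL(Ssn, Course#, Quarter, Grade)
• BOOK_ADOPTION(Course#, Quarter, Book_isbn)
• TEXT(Book_isbn, Book_title, Publisher, Author)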
Specify the foreign keys for this schema, stating any assumptions you make.
Schema:
STUDENT(Ssn, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(Ssn, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_isbn)
TEXT(Book_isbn, Book_title, Publisher, Author)
• ENROLL.Ssn → STUDENT.Ssn
• ENROLL.Course# → COURSE.Course#
• BOOK_ADOPTION.Course# → COURSE.Course#
• BOOK_ADOPTION.Book_isbn → TEXT.Book_isbn
Other constraints
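Consider the following relations for a database that keeps track of automobile sales in a car dealership (OPTION refers
to some optional equipment installed on an automobile):
• CAR(Serial_no, Model, Manufacturer, Price)
• OPTION(Serial_no, Option_name, Price)
• SALE(Salesperson_id, Serial_no, Date, Sale_price)
• SALESPERSON(Salesperson_id, Name, Phone)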
1. First, specify the foreign keys for this schema, stating any assumptions you make.
2. Next, populate the relations with a few sample tuples, and then give an example of an insertion in the
SALE and SALESPERSON relations that violates the referential integrity constraints and of another
insertion that does not.
Schema:
CAR(Serial_no, Model, Manufacturer, Price)
OPTION(Serial_no, Option_name, Price)
SALE(Salesperson_id, Serial_no, Date, Sale_price)
SALESPERSON(Salesperson_id, Name, Phone)
Assumptions:
Sample tuples
CAR
SALESPERSON
SALE
• Insertion into SALESPERSON cannot violate referential integrity (SALESPERSON has no FKs). It can
violate key uniqueness if Salesperson_id already exists:
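Part 2 of the exercise can be tried in SQLite. The following is a minimal sketch that assumes (Salesperson_id, Serial_no) is the key of SALE and uses made-up sample tuples. OPTION is omitted because none of the insertions tested here involve it.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE CAR (Serial_no TEXT PRIMARY KEY, Model TEXT, Manufacturer TEXT, Price REAL);
    CREATE TABLE SALESPERSON (Salesperson_id TEXT PRIMARY KEY, Name TEXT, Phone TEXT);
    CREATE TABLE SALE (
        Salesperson_id TEXT REFERENCES SALESPERSON(Salesperson_id),
        Serial_no TEXT REFERENCES CAR(Serial_no),
        Date TEXT,
        Sale_price REAL,
        PRIMARY KEY (Salesperson_id, Serial_no)
    );
    -- Made-up sample tuples.
    INSERT INTO CAR VALUES ('C1', 'Civic', 'Honda', 24000), ('C2', 'Corolla', 'Toyota', 22000);
    INSERT INTO SALESPERSON VALUES ('S1', 'Jane Doe', '555-0101');
    INSERT INTO SALE VALUES ('S1', 'C1', '2024-03-01', 23500);
    """)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Run one INSERT; report 'ok' or the constraint that rejected it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
```

The foreign key rejects a SALE row for an unknown salesperson such as 'S9', while a row for 'S1' and car 'C2' is accepted. A SALESPERSON row can fail only on its key, for example a second 'S1'.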
Database design often involves decisions about the storage of attributes. For example, a Social Security
number (Ssn) can be stored as one attribute or split into three attributes (one for each of the three hyphen-delineated groups of numbers in a Social Security number: XXX-XX-XXXX). However, Social Security numbers
are usually represented as just one attribute. The decision is based on how the database will be used. This
exercise asks you to think about specific situations where dividing the SSN is useful.
1. Query by region/area code (AAA): If you need frequent queries grouped by the first 3 digits (area or
issuing region), splitting avoids expensive substring operations.
2. Partial indexing / range searches: If you index parts separately (e.g., index on first 3 digits), searches by
area are faster.
3. Validation or formatting: If each part has different validation rules (e.g., group GG must be within
certain ranges), storing separately simplifies checks.
4. Masking/privacy: If you routinely display only last 4 digits, having parts stored separately makes
masking simpler.
5. Internationalization / storage constraints: If some systems store components differently or you need to
store non-digit separators, separate fields help integration.
6. Reporting / aggregate stats: If reports often group by the first/second group, separate attributes make
aggregation simpler.
• Keeping one attribute gives a simpler design with fewer columns, makes uniqueness easier to enforce, and keeps the typical use (lookup by full SSN) simple. Prefer a single attribute unless one of the needs above applies.
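Even if the SSN is kept as one attribute, it can still be split and masked when it is read. The following is a minimal Python sketch with hypothetical function names. It assumes the stored value is in XXX-XX-XXXX form.

```python
import re

# Area (3 digits) - group (2 digits) - serial (4 digits).
SSN_PATTERN = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def split_ssn(ssn: str) -> tuple[str, str, str]:
    """Split 'AAA-GG-SSSS' into its three parts; raise ValueError otherwise."""
    m = SSN_PATTERN.fullmatch(ssn)
    if m is None:
        raise ValueError(f"not in XXX-XX-XXXX form: {ssn!r}")
    return m.group(1), m.group(2), m.group(3)


def mask_ssn(ssn: str) -> str:
    """Show only the last four digits."""
    return "***-**-" + split_ssn(ssn)[2]
```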
5.19. Consider a STUDENT relation in a UNIVERSITY database
Example tuple:
George Shaw 123-45-6789 555-1234 123 Main St, Anytown, CA 94539 555-4321 19 3.75
(a) Identify the critical missing information from Local_phone and Cell_phone.
Answer:
The area code and country code are missing.
Without them, you cannot call someone outside the local area or from another state/province.
(b) Would you store this additional information in the same attribute or as new attributes?
Answer:
Better to add new attributes — e.g.,
Local_area_code, Cell_area_code, Country_code.
This makes it easier to validate, search, and handle international numbers.
Storing all in one field (e.g., “+1-555-1234”) can make querying difficult.
(c) What are the advantages and disadvantages of splitting into (First_name, Middle_name, Last_name)?
Advantages:
Disadvantages:
(d) General guideline for deciding between single and multiple attributes
Answer:
Split an attribute into multiple fields only if:
• The parts are used independently in queries or reports.
Keep as a single field if it’s always used as a unit (e.g., full name for display).
(e) Student can have between 0 and 5 phones — suggest two design options.
Design 1:
Create a separate table:
STUDENT_PHONE(Ssn, Phone_type, Phone_number)
Design 2:
Add multiple attributes:
Phone1, Phone2, Phone3, Phone4, Phone5
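In Design 1, the columns no longer imply the 0-to-5 limit, so it has to be enforced separately. The following is a minimal sqlite3 sketch that uses a trigger. The trigger name and the (Ssn, Phone_number) key are assumptions.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE STUDENT (Ssn TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE STUDENT_PHONE (
        Ssn TEXT REFERENCES STUDENT(Ssn),
        Phone_type TEXT,                 -- e.g. 'cell', 'local'
        Phone_number TEXT,
        PRIMARY KEY (Ssn, Phone_number)
    );
    -- Reject a sixth phone for the same student.
    CREATE TRIGGER max_five_phones
    BEFORE INSERT ON STUDENT_PHONE
    WHEN (SELECT COUNT(*) FROM STUDENT_PHONE WHERE Ssn = NEW.Ssn) >= 5
    BEGIN
        SELECT RAISE(ABORT, 'a student may have at most 5 phones');
    END;
    """)
    return conn
```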
Answer:
Because privacy laws prevent SSN usage, a Student_id (surrogate key) is used.
It’s unique, simple, and system-generated.
(b) If you use last name in the key, what are the problems and solutions?
Problem:
Solutions:
• Use a surrogate key (Student_id) as the primary key.
• If name changes, just update the name field, not the key.
Advantages:
Disadvantages:
• Need extra constraints to ensure real-world uniqueness (e.g., no duplicate students with different IDs).
6.1. How do the relations (tables) in SQL differ from formal relations in Chapter 3? Why does SQL allow
duplicates?
Differences:
1. Duplicates:
3. NULL values:
o Formal model → does not allow NULLs (every value must be atomic).
4. Terminology differences:
o Relation → Table
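The difference in how duplicates are handled is easy to see in SQLite. A table without a key behaves as a multiset, and DISTINCT is needed to get set semantics back. One commonly cited reason SQL keeps duplicates is that removing them costs a sort or hash, and aggregates such as SUM or AVG may need every row. The following is a minimal sketch with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY or UNIQUE constraint: SQL treats the table as a multiset (bag).
conn.execute("CREATE TABLE ENROLL (Ssn TEXT, Course TEXT)")
conn.executemany("INSERT INTO ENROLL VALUES (?, ?)",
                 [("1", "CS1310"), ("1", "CS1310"), ("2", "CS3320")])

bag = conn.execute("SELECT Ssn, Course FROM ENROLL").fetchall()               # keeps duplicates
as_set = conn.execute("SELECT DISTINCT Ssn, Course FROM ENROLL").fetchall()   # set semantics
```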
Numeric types
Character types
6.3. How does SQL implement entity integrity and referential integrity? What are referential triggered
actions?
Entity Integrity:
• Implemented by using:
• PRIMARY KEY(column_name)
Referential Integrity:
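• Implemented by using:
• FOREIGN KEY (column_name) REFERENCES parent_table(column_name)
Options (referential triggered actions):
• ON DELETE or ON UPDATE, followed by SET NULL, CASCADE, or SET DEFAULT; if no action is specified, the violating operation is rejected.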
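The following is a minimal sqlite3 sketch of two referential triggered actions. It uses a cut-down EMPLOYEE/DEPARTMENT pair with hypothetical columns. Renumbering the department cascades the new value to the employee, and deleting the department sets the employee's Dno to NULL.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INT PRIMARY KEY, Dname TEXT);
    CREATE TABLE EMPLOYEE (
        Ssn TEXT PRIMARY KEY,
        Name TEXT,
        Dno INT REFERENCES DEPARTMENT(Dnumber)
            ON DELETE SET NULL      -- keep the employee, department becomes unknown
            ON UPDATE CASCADE       -- renumbering a department is propagated
    );
    INSERT INTO DEPARTMENT VALUES (5, 'Research');
    INSERT INTO EMPLOYEE VALUES ('123456789', 'John Smith', 5);
    """)
    return conn


def dno_of(conn: sqlite3.Connection, ssn: str):
    return conn.execute("SELECT Dno FROM EMPLOYEE WHERE Ssn = ?", (ssn,)).fetchone()[0]
```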
6.4. Describe the four clauses in a simple SQL retrieval query. Which are required and which are optional?
SELECT and FROM are required; WHERE and ORDER BY are optional (shown in brackets).
Syntax:
SELECT [DISTINCT] column_list
FROM table_list
[WHERE condition]
[ORDER BY column_list];
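Here is a small runnable instance of the four clauses. It uses Python's sqlite3 module and a STUDENT table with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INT PRIMARY KEY, Name TEXT, Class INT, Major TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
                 [(17, "Smith", 1, "CS"), (8, "Brown", 2, "CS"), (21, "Lee", 3, "MATH")])

QUERY = """
SELECT DISTINCT Major          -- required: what to return
FROM STUDENT                   -- required: where the rows come from
WHERE Class >= 2               -- optional: row filter
ORDER BY Major                 -- optional: sort order
"""
rows = conn.execute(QUERY).fetchall()
```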
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1.
1. What are the referential integrity constraints that should hold on the schema?
• SECTION.Course_number → COURSE.Course_number
• GRADE_REPORT.Student_number → STUDENT.Student_number
• GRADE_REPORT.Section_identifier → SECTION.Section_identifier
• PREREQUISITE.Course_number → COURSE.Course_number
• PREREQUISITE.Prerequisite_number → COURSE.Course_number
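2. SQL DDL (compact):
CREATE TABLE STUDENT (
  Name VARCHAR(100),
  Student_number INT PRIMARY KEY,
  Class INT,
  Major VARCHAR(50)
);
CREATE TABLE COURSE (
  Course_name VARCHAR(200),
  Course_number VARCHAR(10) PRIMARY KEY,
  Credit_hours INT,
  Department VARCHAR(50)
);
CREATE TABLE SECTION (
  Section_identifier INT PRIMARY KEY,
  Course_number VARCHAR(10) REFERENCES COURSE(Course_number) ON DELETE RESTRICT ON UPDATE CASCADE,
  Semester VARCHAR(20),
  Year INT,
  Instructor VARCHAR(100)
);
CREATE TABLE GRADE_REPORT (
  Student_number INT REFERENCES STUDENT(Student_number) ON DELETE CASCADE,
  Section_identifier INT REFERENCES SECTION(Section_identifier) ON DELETE CASCADE,
  Grade CHAR(2),
  PRIMARY KEY (Student_number, Section_identifier)
);
CREATE TABLE PREREQUISITE (
  Course_number VARCHAR(10) REFERENCES COURSE(Course_number) ON DELETE CASCADE,
  Prerequisite_number VARCHAR(10) REFERENCES COURSE(Course_number) ON DELETE CASCADE,
  PRIMARY KEY (Course_number, Prerequisite_number)
);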
Notes: ON DELETE policies are typical choices — you can tighten (RESTRICT) or loosen (CASCADE) depending
on policy.
Repeat Exercise 6.5, but use the AIRLINE database schema of Figure 5.8.
• LEG_INSTANCE.Airplane_id → AIRPLANE(Airplane_id)
• FARE.Flight_number → FLIGHT(Flight_number)
• AIRPLANE.Airplane_type → AIRPLANE_TYPE(Airplane_type_name)
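DDL (compact):
CREATE TABLE AIRPORT (
  Airport_code CHAR(6) PRIMARY KEY,
  Name VARCHAR(100),
  City VARCHAR(100),
  State VARCHAR(50)
);
CREATE TABLE FLIGHT (
  Flight_number VARCHAR(10) PRIMARY KEY,
  Airline VARCHAR(100),
  Weekdays VARCHAR(20)
);
CREATE TABLE FLIGHT_LEG (
  Flight_number VARCHAR(10) REFERENCES FLIGHT(Flight_number),
  Leg_number INT,
  Departure_airport_code CHAR(6) REFERENCES AIRPORT(Airport_code),
  Scheduled_departure_time TIME,
  Arrival_airport_code CHAR(6) REFERENCES AIRPORT(Airport_code),
  Scheduled_arrival_time TIME,
  PRIMARY KEY (Flight_number, Leg_number)
);
CREATE TABLE AIRPLANE_TYPE (
  Airplane_type_name VARCHAR(50) PRIMARY KEY,
  Max_seats INT,
  Company VARCHAR(100)
);
CREATE TABLE AIRPLANE (
  Airplane_id VARCHAR(20) PRIMARY KEY,
  Total_number_of_seats INT,
  Airplane_type VARCHAR(50) REFERENCES AIRPLANE_TYPE(Airplane_type_name)
);
CREATE TABLE LEG_INSTANCE (
  Flight_number VARCHAR(10),
  Leg_number INT,
  Date DATE,
  Number_of_available_seats INT,
  Airplane_id VARCHAR(20) REFERENCES AIRPLANE(Airplane_id),
  Departure_airport_code CHAR(6) REFERENCES AIRPORT(Airport_code),
  Departure_time TIME,
  Arrival_airport_code CHAR(6) REFERENCES AIRPORT(Airport_code),
  Arrival_time TIME,
  PRIMARY KEY (Flight_number, Leg_number, Date),
  FOREIGN KEY (Flight_number, Leg_number) REFERENCES FLIGHT_LEG(Flight_number, Leg_number)
);
CREATE TABLE FARE (
  Flight_number VARCHAR(10) REFERENCES FLIGHT(Flight_number),
  Fare_code VARCHAR(10),
  Amount DECIMAL(10,2),
  Restrictions VARCHAR(200),
  PRIMARY KEY (Flight_number, Fare_code)
);
CREATE TABLE CAN_LAND (
  Airplane_type_name VARCHAR(50) REFERENCES AIRPLANE_TYPE(Airplane_type_name),
  Airport_code CHAR(6) REFERENCES AIRPORT(Airport_code),
  PRIMARY KEY (Airplane_type_name, Airport_code)
);
CREATE TABLE SEAT_RESERVATION (
  Flight_number VARCHAR(10),
  Leg_number INT,
  Date DATE,
  Seat_number VARCHAR(5),
  Customer_name VARCHAR(200),
  Customer_phone VARCHAR(20),
  PRIMARY KEY (Flight_number, Leg_number, Date, Seat_number),
  FOREIGN KEY (Flight_number, Leg_number, Date) REFERENCES LEG_INSTANCE(Flight_number, Leg_number, Date)
);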
Notes: a seat reservation must reference an existing leg instance (Flight_number, Leg_number, Date). Number_of_available_seats should be checked as a business rule when inserting reservations.
6.7. Referential Integrity Actions for the LIBRARY Database (Figure 6.6)
Choose the appropriate action (reject, cascade, set to NULL, set to default) for each referential integrity
constraint, both for the deletion of a referenced tuple and for the update of a primary key attribute value in a
referenced tuple. Justify your choices.
1. BOOK_AUTHORS.Book_id → BOOK.Book_id
o ON DELETE CASCADE: if a book is removed from the catalog, delete the author entries for that book.
2. BOOK.Publisher_name → PUBLISHER.Name
o ON DELETE SET NULL — if publisher removed, keep book record but null publisher (library still
holds book).
3. BOOK_COPIES.Book_id → BOOK.Book_id
o ON UPDATE CASCADE
4. BOOK_COPIES.Branch_id → LIBRARY_BRANCH.Branch_id
o ON DELETE RESTRICT — do not delete a branch if copies exist (must relocate/clear first).
o ON UPDATE CASCADE
5. BOOK_LOANS.Book_id → BOOK.Book_id
o ON DELETE RESTRICT — do not allow deleting a book that has current loan records (or move to
archive first).
o ON UPDATE CASCADE
6. BOOK_LOANS.Branch_id → LIBRARY_BRANCH.Branch_id
o ON UPDATE CASCADE
7. BOOK_LOANS.Card_no → BORROWER.Card_no
o ON UPDATE CASCADE
Short justification:
• Use CASCADE for child tables that only make sense with the parent (authors, copies).
• Use RESTRICT for entities where deletions would cause loss of important history or orphan loans
(loans, branches, borrowers).
• Use SET NULL for optional references (publisher) so book data remains.
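The three kinds of action chosen above can be checked with a small sketch, assuming Python's sqlite3 module, only the LIBRARY tables the demo needs, simplified columns, and hypothetical rows (book ids 'B1'/'B2', publisher 'P1').

```python
import sqlite3

DDL = """
CREATE TABLE PUBLISHER (Name TEXT PRIMARY KEY);
CREATE TABLE BOOK (
    Book_id TEXT PRIMARY KEY, Title TEXT,
    Publisher_name TEXT REFERENCES PUBLISHER(Name)
        ON DELETE SET NULL ON UPDATE CASCADE);
CREATE TABLE BOOK_AUTHORS (
    Book_id TEXT REFERENCES BOOK(Book_id) ON DELETE CASCADE ON UPDATE CASCADE,
    Author_name TEXT, PRIMARY KEY (Book_id, Author_name));
CREATE TABLE LIBRARY_BRANCH (Branch_id INTEGER PRIMARY KEY, Branch_name TEXT);
CREATE TABLE BOOK_COPIES (
    Book_id TEXT REFERENCES BOOK(Book_id) ON DELETE CASCADE ON UPDATE CASCADE,
    Branch_id INTEGER REFERENCES LIBRARY_BRANCH(Branch_id)
        ON DELETE RESTRICT ON UPDATE CASCADE,
    No_of_copies INTEGER, PRIMARY KEY (Book_id, Branch_id));
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript(DDL)
con.executescript("""
INSERT INTO PUBLISHER VALUES ('P1');
INSERT INTO BOOK VALUES ('B1', 'Book One', 'P1'), ('B2', 'Book Two', 'P1');
INSERT INTO BOOK_AUTHORS VALUES ('B1', 'Author A'), ('B2', 'Author B');
INSERT INTO LIBRARY_BRANCH VALUES (1, 'Sharpstown');
INSERT INTO BOOK_COPIES VALUES ('B2', 1, 3);
""")

# CASCADE: deleting book B1 also deletes its BOOK_AUTHORS rows.
con.execute("DELETE FROM BOOK WHERE Book_id = 'B1'")
authors_left = con.execute("SELECT Book_id FROM BOOK_AUTHORS").fetchall()

# SET NULL: deleting the publisher keeps the book but nulls Publisher_name.
con.execute("DELETE FROM PUBLISHER WHERE Name = 'P1'")
publisher_of_b2 = con.execute(
    "SELECT Publisher_name FROM BOOK WHERE Book_id = 'B2'").fetchone()[0]

# RESTRICT: a branch that still holds copies cannot be deleted.
try:
    con.execute("DELETE FROM LIBRARY_BRANCH WHERE Branch_id = 1")
    branch_delete_rejected = False
except sqlite3.IntegrityError:
    branch_delete_rejected = True
```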
Write appropriate SQL DDL statements for declaring the LIBRARY relational database schema of Figure 6.6.
Specify the keys and referential triggered actions.
Address VARCHAR(300),
Phone VARCHAR(20)
);
Title VARCHAR(300),
Publisher_name VARCHAR(200),
);
Book_id VARCHAR(50),
Author_name VARCHAR(200),
);
CREATE TABLE LIBRARY_BRANCH (
Branch_name VARCHAR(200),
Address VARCHAR(300)
);
Book_id VARCHAR(50),
Branch_id INT,
No_of_copies INT,
);
Name VARCHAR(200),
Address VARCHAR(300),
Phone VARCHAR(20)
);
Book_id VARCHAR(50),
Branch_id INT,
Card_no INT,
Date_out DATE,
Due_date DATE,
);
1. How can the key and foreign key constraints be enforced by the DBMS?
3. Can the constraint checks be executed efficiently when updates are applied to the database?
• Primary keys / unique keys: enforced by maintaining unique indexes (B-tree or hash). Inserting or
updating a key checks the index for duplicates.
• Foreign keys: enforced by checking the referenced table (usually via index on referenced PK) when
inserting/updating/deleting child or parent. Many DBMSs maintain internal structures to check
existence quickly.
• Triggered actions: performed automatically by DBMS engine (cascading updates/deletes) or via user-
defined triggers where DBMS lacks FK features.
Difficulty to implement:
• Not difficult conceptually — core DBMS functionality. Implementation requires careful concurrency
control and locking to avoid race conditions.
Performance / Efficiency:
• With proper indexes on PKs and FKs, checks are efficient (logarithmic lookup).
• Bulk updates may be expensive if many cascading actions occur, but overall enforcement is efficient in
production systems.
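A hypothetical sketch of the index-based checks described above: a Python dict stands in for the unique index on the primary key, so each key or foreign-key check is a single lookup. The `Table` class is illustrative only; a dict is a hash index (average constant time) rather than a B-tree, and the sketch ignores concurrency and locking.

```python
class Table:
    """Hypothetical table whose rows dict doubles as the primary-key index."""

    def __init__(self, name, key):
        self.name, self.key = name, key
        self.rows = {}  # primary-key value -> row

    def insert(self, row, fks=()):
        k = row[self.key]
        if k in self.rows:  # key constraint: one index probe
            raise ValueError(f"duplicate key {k!r} in {self.name}")
        for column, parent in fks:  # referential constraint: probe parent's PK index
            value = row[column]
            if value is not None and value not in parent.rows:
                raise ValueError(f"{column}={value!r} not found in {parent.name}")
        self.rows[k] = row

def rejected(table, row, fks=()):
    """True if the insert violates a key or foreign-key constraint."""
    try:
        table.insert(row, fks)
        return False
    except ValueError:
        return True

department = Table("DEPARTMENT", "Dnumber")
employee = Table("EMPLOYEE", "Ssn")
department.insert({"Dnumber": 5, "Dname": "Research"})
employee.insert({"Ssn": "123456789", "Dno": 5}, fks=[("Dno", department)])

dup = rejected(employee, {"Ssn": "123456789", "Dno": 5}, [("Dno", department)])
orphan = rejected(employee, {"Ssn": "999999999", "Dno": 7}, [("Dno", department)])
```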
a. Retrieve the names of all employees in department 5 who work more than 10 hours per week on the
ProductX project. b. List the names of all employees who have a dependent with the same first name as
themselves. c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.
a) Question:
Retrieve names of all employees in department 5 who work more than 10 hours/week on the ProductX
project.
SQL:
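SELECT E.Fname, E.Minit, E.Lname
FROM EMPLOYEE E
WHERE E.Dno = 5
AND E.Ssn IN (SELECT W.Essn
FROM WORKS_ON W JOIN PROJECT P ON W.Pno = P.Pnumber
WHERE P.Pname = 'ProductX' AND W.Hours > 10);
Result:
Fname | Minit | Lname
------+-------+---------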
John | B | Smith
Joyce | A | English
b) Question:
List the names of all employees who have a dependent with the same first name as themselves.
SQL:
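SELECT E.Fname, E.Minit, E.Lname
FROM EMPLOYEE E
WHERE EXISTS (SELECT 1
FROM DEPENDENT D
WHERE D.Essn = E.Ssn AND D.Dependent_name = E.Fname);
Result: no rows (in Figure 5.6 no employee has a dependent with the same first name).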
c) Question:
Find the names of all employees who are directly supervised by ‘Franklin Wong’.
SQL:
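SELECT E.Fname, E.Minit, E.Lname
FROM EMPLOYEE E
WHERE E.Super_ssn = (SELECT S.Ssn
FROM EMPLOYEE S
WHERE S.Fname = 'Franklin' AND S.Lname = 'Wong');
Who is Franklin Wong? Franklin T Wong has Ssn = 333445555 (from Figure 5.6).
Result rows:
Fname  | Minit | Lname
-------+-------+-------------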
John | B | Smith
Ramesh | K | Narayan
Joyce | A | English
6.11. Specify the updates of Exercise 3.11 using SQL update commands.
I don’t have the text of Exercise 3.11 in front of me. If you paste that exercise here I’ll convert each required
update precisely to SQL.
Meanwhile, here are examples of common update operations expressed in SQL so you can see the pattern and
adapt to your specific Exercise 3.11:
UPDATE STUDENT
UPDATE SECTION
If you paste Exercise 3.11 here I’ll rewrite those exact updates in SQL for you.
Schema used:
STUDENT(Student_number, Name, Class, Major)
COURSE(Course_number, Course_name, Credit_hours, Department)
SECTION(Section_identifier, Course_number, Semester, Year, Instructor)
GRADE_REPORT(Student_number, Section_identifier, Grade)
PREREQUISITE(Course_number, Prerequisite_number)
SELECT Name
FROM STUDENT
WHERE Class = 4
(b) Retrieve names of all courses taught by Professor King in 2007 and 2008.
FROM COURSE C
(c) For each section taught by Professor King, retrieve course number, semester, year, and number of
students who took the section.
SELECT S.Course_number,
S.Semester,
S.Year,
COUNT(G.Student_number) AS num_students
FROM SECTION S
LEFT JOIN GRADE_REPORT G
ON S.Section_identifier = G.Section_identifier
WHERE S.Instructor = 'King'
GROUP BY S.Section_identifier, S.Course_number, S.Semester, S.Year;
(d) Retrieve the name and transcript of each senior student (Class = 4) majoring in CS.
(Transcript: course name, course number, credit hours, semester, year, grade)
SELECT ST.Name,
C.Course_number,
C.Course_name,
C.Credit_hours,
S.Semester,
S.Year,
G.Grade
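FROM STUDENT ST
JOIN GRADE_REPORT G ON ST.Student_number = G.Student_number
JOIN SECTION S ON G.Section_identifier = S.Section_identifier
JOIN COURSE C ON S.Course_number = C.Course_number
WHERE ST.Class = 4
AND ST.Major = 'CS';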
(a) Insert new student <'Johnson', 25, 1, 'Math'> — assuming columns are (Name, Student_number, Class,
Major):
UPDATE STUDENT
SET Class = 2
(c) Insert new course < 'Knowledge Engineering', 'cs4390', 3, 'cs' >:
(d) Delete the record for the student whose name is 'Smith' and whose student number is 17:
Choice: Simple Online Bookstore — concise schema + example queries + suggested indexes.
DDL (concise):
Name VARCHAR(200),
City VARCHAR(100)
);
Title VARCHAR(300),
Publisher VARCHAR(200),
Price DECIMAL(8,2)
);
ISBN VARCHAR(20),
Author VARCHAR(200),
);
CREATE TABLE ORDERS (
CustID INT,
Odate DATE,
TotalAmt DECIMAL(10,2),
);
OrderID INT,
ISBN VARCHAR(20),
Qty INT,
PriceEach DECIMAL(8,2),
);
2. Customer order history: SELECT O.OrderID, O.Odate, I.ISBN, I.Qty FROM ORDERS O JOIN ORDER_ITEM I
USING (OrderID) WHERE O.CustID = ?;
3. Top-selling books: SELECT ISBN, SUM(Qty) FROM ORDER_ITEM GROUP BY ISBN ORDER BY SUM(Qty)
DESC;
Indexes to create:
(a) What happens when DELETE EMPLOYEE WHERE Lname = 'Borg' is run on Figure 5.6 state?
• The DBMS finds employee(s) with Lname='Borg' (from Figure 5.6 that is James E Borg, Ssn =
888665555). Because of ON DELETE CASCADE, the DBMS:
2. Also deletes any employee rows whose Super_ssn = 888665555 (i.e., all direct subordinates of
Borg).
3. That cascading delete continues recursively: if those deleted employees are supervisors of
others, their subordinates are deleted as well.
• So a single delete may remove a whole subtree of employees who ultimately report (directly or
indirectly) to Borg.
o ON DELETE SET NULL → subordinates remain but their Super_ssn becomes NULL (no
supervisor); or
• Use CASCADE only if you truly intend to delete the entire reporting subtree when deleting a supervisor
— rare in HR scenarios.
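The recursive effect of ON DELETE CASCADE on Super_ssn can be modelled outside the DBMS. This sketch, an illustration rather than DBMS code, uses the Super_ssn values of Figure 5.6 and a hypothetical `cascade_delete` helper that performs a breadth-first walk of the reporting tree, which is what the chain of cascaded deletes amounts to.

```python
from collections import deque

# Super_ssn of each employee in Figure 5.6 (None = no supervisor).
SUPER_SSN = {
    "123456789": "333445555",  # John Smith
    "333445555": "888665555",  # Franklin Wong
    "999887777": "987654321",  # Alicia Zelaya
    "987654321": "888665555",  # Jennifer Wallace
    "666884444": "333445555",  # Ramesh Narayan
    "453453453": "333445555",  # Joyce English
    "987987987": "987654321",  # Ahmad Jabbar
    "888665555": None,         # James Borg
}

def cascade_delete(root, super_ssn):
    """Return every Ssn removed when `root` is deleted under ON DELETE CASCADE:
    the root plus all of its direct and indirect subordinates."""
    deleted, queue = {root}, deque([root])
    while queue:
        boss = queue.popleft()
        for ssn, sup in super_ssn.items():
            if sup == boss and ssn not in deleted:  # direct subordinate of a deleted row
                deleted.add(ssn)
                queue.append(ssn)
    return deleted

removed = cascade_delete("888665555", SUPER_SSN)
```

Deleting Borg removes all eight employees, because every other employee reports to him directly or indirectly.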
(Works in many DBMSs: PostgreSQL, SQLite, and Oracle support CREATE TABLE ... AS SELECT ...; MySQL supports CREATE TABLE ... SELECT ...)
Fname VARCHAR(50),
Minit CHAR(1),
Lname VARCHAR(50),
Bdate DATE,
Address VARCHAR(300),
Sex CHAR(1),
Salary DECIMAL(10,2),
Super_ssn CHAR(9),
Dno INT
);
SELECT Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno
FROM EMPLOYEE;
A full SQL SELECT query can include up to six clauses in this order:
1. SELECT –
o Specifies which attributes or expressions to display in the result.
o Can include aggregate functions, expressions, and the DISTINCT keyword.
o Required clause.
2. FROM –
o Lists the tables (and joins) from which the data is retrieved.
o May include aliases and JOIN conditions.
o Required clause.
3. WHERE –
o Filters rows based on a condition (comparison, logical operators, IN, BETWEEN, LIKE, etc.).
o Optional.
4. GROUP BY –
o Groups rows with the same values in one or more columns for aggregate functions (SUM, AVG,
etc.).
o Optional.
5. HAVING –
o Applies a condition to groups formed by GROUP BY.
o Similar to WHERE, but used for aggregate conditions.
o Optional.
6. ORDER BY –
o Sorts the final result set by one or more columns, ascending or descending.
o Optional.
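A small sketch of the clause list above, assuming Python's sqlite3 module and a reduced EMPLOYEE table (Lname, Salary, Dno) taken from Figure 5.6. The first query uses only the two required clauses; the second uses all six.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER, Dno INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("Smith", 30000, 5), ("Wong", 40000, 5), ("Zelaya", 25000, 4),
    ("Wallace", 43000, 4), ("Narayan", 38000, 5), ("English", 25000, 5),
    ("Jabbar", 25000, 4), ("Borg", 55000, 1)])

# Only SELECT and FROM are required.
minimal = con.execute("SELECT Lname FROM EMPLOYEE").fetchall()

# All six clauses; conceptually evaluated as
# FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY.
full = con.execute("""
    SELECT Dno, COUNT(*) AS n, AVG(Salary) AS avg_sal  -- what to return
    FROM EMPLOYEE                                        -- source table
    WHERE Salary > 25000                                 -- filter rows
    GROUP BY Dno                                         -- form groups
    HAVING AVG(Salary) > 40000                           -- filter groups
    ORDER BY Dno                                         -- sort the result
""").fetchall()
```

WHERE drops the 25000 salaries before grouping; HAVING then discards department 5, whose remaining average is 36000.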
1. In comparison operators:
o Any comparison with NULL (e.g., =, >, <, <>) yields UNKNOWN, not TRUE or FALSE.
o Rows with UNKNOWN in a WHERE condition are excluded from the result.
o Example:
o WHERE Salary > NULL -- always UNKNOWN
o To test NULLs, use IS NULL or IS NOT NULL.
2. In aggregate functions:
o COUNT, SUM, AVG, MIN, and MAX ignore NULL values (except COUNT(*) which counts all rows).
o Example: If some salaries are NULL, AVG(Salary) averages only non-NULL salaries.
3. In grouping attributes:
o Rows with NULL values in grouping columns are treated as equal (all NULLs form one group).
o Example:
o GROUP BY Dept_no
— All tuples with Dept_no IS NULL belong to a single group labeled NULL.
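The three NULL rules above can be observed directly. This sketch assumes Python's sqlite3 module and a hypothetical table EMP with made-up rows, two of which have a NULL salary or department.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (Name TEXT, Salary INTEGER, Dept_no INTEGER)")
con.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    ("A", 30000, 1), ("B", None, 1), ("C", 50000, None), ("D", 10000, None)])

# 1. Comparison with NULL is UNKNOWN, so no row satisfies the WHERE condition.
gt_null = con.execute("SELECT COUNT(*) FROM EMP WHERE Salary > NULL").fetchone()[0]
# IS NULL is the correct test.
is_null = con.execute("SELECT COUNT(*) FROM EMP WHERE Salary IS NULL").fetchone()[0]

# 2. COUNT(*) counts rows; COUNT(Salary) and AVG(Salary) skip the NULL salary.
count_star, count_salary, avg_salary = con.execute(
    "SELECT COUNT(*), COUNT(Salary), AVG(Salary) FROM EMP").fetchone()

# 3. All NULL Dept_no values fall into a single group.
groups = dict(con.execute("SELECT Dept_no, COUNT(*) FROM EMP GROUP BY Dept_no").fetchall())
```

AVG(Salary) here is (30000 + 50000 + 10000) / 3 = 30000, not a division by 4.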
7.4. Discuss how each of the following constructs is used in SQL, and discuss its options and uses (brief).
a. Nested queries (subqueries)
c. Aggregate functions and grouping
• What: SUM, AVG, COUNT, MIN, and MAX operate on sets of rows. GROUP BY groups rows before aggregation; HAVING filters groups.
• Options: grouping by one or many attributes; GROUPING SETS, ROLLUP, and CUBE (support varies by DBMS).
• Useful for: totals, averages, and counts per group (e.g., salary by department).
d. Triggers
e. Assertions and how they differ from triggers
• What: CREATE ASSERTION (SQL standard) expresses global constraints across the database. Rarely implemented in commercial DBMSs.
• Difference: Assertions are declarative global constraints checked by the DBMS; triggers are procedural and tied to specific tables and events. An assertion must hold at all times, which can make it expensive to check; a trigger runs only on specific events and can take actions.
f. The WITH clause
• What: Defines named subqueries (temporary result sets) used by the main query.
• Options: non-recursive and recursive (WITH RECURSIVE).
• Useful for: clarity, reuse of subquery results, and writing recursive queries (hierarchies).
i. Schema change commands
• What: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, DROP INDEX, etc.
• Options: ALTER TABLE can add or drop columns, add constraints, and rename columns (DBMS-specific syntax).
• Useful for: evolving the database structure; data migration and existing constraints must be considered.
7.5. SQL queries on COMPANY (Figure 5.5) & results on Figure 5.6
(a) For each department whose average employee salary is more than $30,000, retrieve the department
name and the number of employees working for that department.
SQL:
• Dept 1 (Headquarters) — employees: James Borg (salary 55000) → avg = 55000 → qualifies, count = 1
• Dept 4 (Administration) — Alicia(25000), Jennifer(43000), Ahmad(25000) → avg =
(25000+43000+25000)/3 = 31000 → qualifies, count = 3
• Dept 5 (Research) — John(30000), Franklin(40000), Ramesh(38000), Joyce(25000) → avg = 33250 →
qualifies, count = 4
Result rows:
Dname | num_employees
--------------+--------------
Headquarters | 1
Administration| 3
Research | 4
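The hand computation above can be checked with a sketch that assumes Python's sqlite3 module and reduced EMPLOYEE/DEPARTMENT tables holding the Figure 5.6 values. The query is one way to write 7.5(a); HAVING is needed because the condition is on an aggregate.

```python
import sqlite3

EMPLOYEES = [  # (Lname, Sex, Salary, Dno) from Figure 5.6
    ("Smith", "M", 30000, 5), ("Wong", "M", 40000, 5),
    ("Zelaya", "F", 25000, 4), ("Wallace", "F", 43000, 4),
    ("Narayan", "M", 38000, 5), ("English", "F", 25000, 5),
    ("Jabbar", "M", 25000, 4), ("Borg", "M", 55000, 1),
]
DEPARTMENTS = [("Research", 5), ("Administration", 4), ("Headquarters", 1)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Sex TEXT, Salary INTEGER, Dno INTEGER)")
con.execute("CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)", DEPARTMENTS)

# HAVING filters whole groups on an aggregate, which WHERE cannot do.
rows = con.execute("""
    SELECT D.Dname, COUNT(*) AS num_employees
    FROM DEPARTMENT D JOIN EMPLOYEE E ON D.Dnumber = E.Dno
    GROUP BY D.Dname
    HAVING AVG(E.Salary) > 30000
    ORDER BY D.Dname
""").fetchall()
```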
(b) Suppose we want the number of male employees in each department making more than $30,000,
rather than all employees (as in 7.5a). Can we specify this query in SQL? Why or why not?
Two ways:
1. Filter then group (only departments with at least one male >30k):
• This returns counts only for departments that have qualifying males. Departments with zero qualifying
males will be absent.
SELECT D.Dname,
SUM(CASE WHEN E.Sex='M' AND E.Salary>30000 THEN 1 ELSE 0 END) AS
male_highpaid_count
FROM DEPARTMENT D
LEFT JOIN EMPLOYEE E ON D.Dnumber = E.Dno
GROUP BY D.Dname;
So results (if you use conditional aggregation and include all departments):
Dname | male_highpaid_count
--------------+---------------------
Headquarters | 1
Administration| 0
Research | 2
Why it’s possible: SQL supports WHERE filtering and conditional aggregation (CASE) so this is easily
expressible.
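Both ways described above can be run side by side. This sketch assumes Python's sqlite3 module and the same reduced Figure 5.6 tables; it shows that filtering first drops Administration, while conditional aggregation keeps it with a count of 0.

```python
import sqlite3

EMPLOYEES = [  # (Lname, Sex, Salary, Dno) from Figure 5.6
    ("Smith", "M", 30000, 5), ("Wong", "M", 40000, 5),
    ("Zelaya", "F", 25000, 4), ("Wallace", "F", 43000, 4),
    ("Narayan", "M", 38000, 5), ("English", "F", 25000, 5),
    ("Jabbar", "M", 25000, 4), ("Borg", "M", 55000, 1),
]
DEPARTMENTS = [("Research", 5), ("Administration", 4), ("Headquarters", 1)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Sex TEXT, Salary INTEGER, Dno INTEGER)")
con.execute("CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", EMPLOYEES)
con.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)", DEPARTMENTS)

# Way 1: filter, then group (departments with no qualifying male disappear).
filter_then_group = con.execute("""
    SELECT D.Dname, COUNT(*)
    FROM DEPARTMENT D JOIN EMPLOYEE E ON D.Dnumber = E.Dno
    WHERE E.Sex = 'M' AND E.Salary > 30000
    GROUP BY D.Dname ORDER BY D.Dname
""").fetchall()

# Way 2: conditional aggregation over all departments.
conditional = con.execute("""
    SELECT D.Dname,
           SUM(CASE WHEN E.Sex = 'M' AND E.Salary > 30000 THEN 1 ELSE 0 END)
    FROM DEPARTMENT D LEFT JOIN EMPLOYEE E ON D.Dnumber = E.Dno
    GROUP BY D.Dname ORDER BY D.Dname
""").fetchall()
```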
7.6. (Queries on the database in Figure 1.2)
Given schema (from Fig 1.2): STUDENT(Student_number, Name, Class, Major), COURSE, SECTION,
GRADE_REPORT(Student_number, Section_identifier, Grade), etc.
(a) Retrieve the names and major departments of all straight-A students (students who have a grade of A in
all their courses).
SQL:
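SELECT S.Name, S.Major
FROM STUDENT S
WHERE NOT EXISTS (
SELECT 1
FROM GRADE_REPORT G
WHERE G.Student_number = S.Student_number
AND G.Grade <> 'A'
);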
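Explanation: For each student, ensure there is no grade that is not 'A'. (Students with no grades at all are also returned, because the NOT EXISTS condition is then vacuously true; to exclude them, add AND EXISTS (SELECT 1 FROM GRADE_REPORT G WHERE G.Student_number = S.Student_number).) On the Figure 1.2 data the result is empty: Smith has grades B and C, and Brown has a B.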
(b) Retrieve the names and major departments of all students who do not have a grade of A in any of their
courses.
SQL:
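SELECT S.Name, S.Major
FROM STUDENT S
WHERE NOT EXISTS (
SELECT 1
FROM GRADE_REPORT G
WHERE G.Student_number = S.Student_number
AND G.Grade = 'A'
);
Result: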
• Smith (Student_number = 17, Major = CS) — student 17 has grades B and C only.
(Student 8 has some A’s, so excluded.)
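The two NOT EXISTS queries can be tried on the Figure 1.2 grades, assuming Python's sqlite3 module. One hypothetical student (Jones, number 30, no grades) is added purely to show the vacuous-truth case mentioned in (a); that student is not in Figure 1.2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER, Major TEXT)")
con.execute("CREATE TABLE GRADE_REPORT (Student_number INTEGER, Section_identifier INTEGER, Grade TEXT)")
con.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    ("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS"),
    ("Jones", 30, 1, "Math"),  # hypothetical student with no grades
])
con.executemany("INSERT INTO GRADE_REPORT VALUES (?, ?, ?)", [
    (17, 112, "B"), (17, 119, "C"),
    (8, 85, "A"), (8, 92, "A"), (8, 102, "B"), (8, 135, "A")])

NO_NON_A = """NOT EXISTS (SELECT 1 FROM GRADE_REPORT G
               WHERE G.Student_number = S.Student_number AND G.Grade <> 'A')"""
HAS_GRADE = """EXISTS (SELECT 1 FROM GRADE_REPORT G
               WHERE G.Student_number = S.Student_number)"""
NO_A = """NOT EXISTS (SELECT 1 FROM GRADE_REPORT G
           WHERE G.Student_number = S.Student_number AND G.Grade = 'A')"""

def query(condition):
    sql = f"SELECT S.Name, S.Major FROM STUDENT S WHERE {condition} ORDER BY S.Name"
    return con.execute(sql).fetchall()

straight_a_plain = query(NO_NON_A)                        # (a) without the guard
straight_a_guarded = query(f"{NO_NON_A} AND {HAS_GRADE}")  # (a) with the guard
no_a = query(NO_A)                                        # (b)
```

Without the guard, the grade-less student counts as straight-A; with it, nobody qualifies. Query (b) also returns a student with no grades, for the same vacuous reason.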
7.7. (Nested queries on COMPANY schema — Figure 5.5 — use Figure 5.6 data)
(a) Retrieve the names of all employees who work in the department that has the employee with the
highest salary among all employees.
SQL:
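SELECT E.Fname, E.Minit, E.Lname
FROM EMPLOYEE E
WHERE E.Dno = (
SELECT Dno
FROM EMPLOYEE
ORDER BY Salary DESC
LIMIT 1
);
(LIMIT is DBMS-specific; standard SQL uses FETCH FIRST 1 ROW ONLY.)
Result (the highest-paid employee, James E Borg, is the only employee in department 1):
James | E | Borg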
(b) Retrieve the names of all employees whose supervisor's supervisor has Ssn = '888665555'.
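SELECT E.Fname, E.Minit, E.Lname
FROM EMPLOYEE E
WHERE E.Super_ssn IN (SELECT S.Ssn
FROM EMPLOYEE S
WHERE S.Super_ssn = '888665555');
888665555 (James Borg) directly supervises Franklin Wong (333445555) and Jennifer Wallace (987654321), so the result is every employee supervised by Wong or Wallace:
• Supervised by Wong: John B Smith, Ramesh K Narayan, Joyce A English
• Supervised by Wallace: Alicia J Zelaya, Ahmad V Jabbar
So output rows:
John B Smith
Ramesh K Narayan
Joyce A English
Alicia J Zelaya
Ahmad V Jabbar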
(c) Retrieve the names of employees who make at least $10,000 more than the employee who is paid the
least in the company.
SQL:
SELECT E.Fname, E.Minit, E.Lname
FROM EMPLOYEE E
WHERE E.Salary >= (SELECT MIN(Salary) FROM EMPLOYEE) + 10000;
Figure 5.6 data: min salary = 25000. So threshold = 35000. Employees with salary >= 35000 are:
Result:
Franklin T Wong
Jennifer S Wallace
Ramesh K Narayan
James E Borg
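The three nested queries of 7.7 can be checked against the Figure 5.6 employees with a sketch that assumes Python's sqlite3 module and a reduced EMPLOYEE table. It also confirms that (b) returns five employees, because Borg supervises both Wong and Wallace.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE EMPLOYEE (Fname TEXT, Minit TEXT, Lname TEXT, Ssn TEXT,
                                      Salary INTEGER, Super_ssn TEXT, Dno INTEGER)""")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?, ?)", [  # Figure 5.6
    ("John", "B", "Smith", "123456789", 30000, "333445555", 5),
    ("Franklin", "T", "Wong", "333445555", 40000, "888665555", 5),
    ("Alicia", "J", "Zelaya", "999887777", 25000, "987654321", 4),
    ("Jennifer", "S", "Wallace", "987654321", 43000, "888665555", 4),
    ("Ramesh", "K", "Narayan", "666884444", 38000, "333445555", 5),
    ("Joyce", "A", "English", "453453453", 25000, "333445555", 5),
    ("Ahmad", "V", "Jabbar", "987987987", 25000, "987654321", 4),
    ("James", "E", "Borg", "888665555", 55000, None, 1)])

def lnames(sql):
    return [row[0] for row in con.execute(sql)]

# (a) employees in the department of the highest-paid employee
q_a = lnames("""SELECT E.Lname FROM EMPLOYEE E WHERE E.Dno =
    (SELECT Dno FROM EMPLOYEE ORDER BY Salary DESC LIMIT 1)""")
# (b) employees whose supervisor's supervisor is 888665555
q_b = lnames("""SELECT E.Lname FROM EMPLOYEE E WHERE E.Super_ssn IN
    (SELECT S.Ssn FROM EMPLOYEE S WHERE S.Super_ssn = '888665555')
    ORDER BY E.Lname""")
# (c) employees earning at least 10000 more than the minimum salary
q_c = lnames("""SELECT E.Lname FROM EMPLOYEE E
    WHERE E.Salary >= (SELECT MIN(Salary) FROM EMPLOYEE) + 10000
    ORDER BY E.Lname""")
```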
7.8. Specify the following views in SQL on the COMPANY schema (Figure 5.5)
(a) A view that has the department name, manager name, and manager salary for every department.
M.Salary AS ManagerSalary
FROM DEPARTMENT D
(b) A view that has the employee name, supervisor name, and employee salary for each employee who
works in the 'Research' department.
CREATE VIEW Research_Employees AS
E.Salary
FROM EMPLOYEE E
(c) A view that has the project name, controlling department name, number of employees, and total hours
worked per week on the project for each project.
D.Dname AS ControllingDept,
COALESCE(SUM(W.Hours),0) AS TotalHours
FROM PROJECT P
(d) A view same as (c) but only for projects with more than one employee working on it.
D.Dname AS ControllingDept,
COALESCE(SUM(W.Hours),0) AS TotalHours
FROM PROJECT P
FROM EMPLOYEE
GROUP BY Dno;
State which of the following queries/updates would be allowed on the view, and show the corresponding
base-relation query/result on the Figure 5.6 database.
D | C | Total_s | Average_s
--+---+---------+----------
1 | 1 | 55000 | 55000
4 | 3 | 93000 | 31000
5 | 4 | 133000 | 33250
(b)
SELECT D, C
FROM DEPT_SUMMARY
WHERE Total_s > 100000;
Evaluation: From the computed rows above only Dno = 5 has Total_s > 100000.
Result:
D|C
--+--
5|4
• This is an aggregated view (GROUP BY). Updates (INSERT/UPDATE/DELETE) on such views are not
allowed (not updatable) because there is no one-to-one mapping to underlying base tuples. If you
attempted an update, the DBMS would reject it (unless you create INSTEAD OF triggers to handle it).
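A sketch of the DEPT_SUMMARY view on the Figure 5.6 salaries, assuming Python's sqlite3 module. Note that SQLite treats every view as read-only unless INSTEAD OF triggers are defined, so the rejected update here illustrates the outcome rather than the general updatability rules of other DBMSs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Salary INTEGER, Dno INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [  # Figure 5.6 salaries
    (30000, 5), (40000, 5), (25000, 4), (43000, 4),
    (38000, 5), (25000, 5), (25000, 4), (55000, 1)])
con.execute("""CREATE VIEW DEPT_SUMMARY (D, C, Total_s, Average_s) AS
               SELECT Dno, COUNT(*), SUM(Salary), AVG(Salary)
               FROM EMPLOYEE GROUP BY Dno""")

summary = con.execute("SELECT * FROM DEPT_SUMMARY ORDER BY D").fetchall()
big = con.execute("SELECT D, C FROM DEPT_SUMMARY WHERE Total_s > 100000").fetchall()

# An aggregate row has no single base tuple to change, so the update fails.
try:
    con.execute("UPDATE DEPT_SUMMARY SET C = 10 WHERE D = 5")
    update_rejected = False
except sqlite3.DatabaseError:
    update_rejected = True
```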
Operation Purpose
UNION (∪) Combine tuples from two union-compatible relations, removing duplicates.
DIFFERENCE (−) Return tuples in one relation but not in the other.
CARTESIAN PRODUCT (×) Combine every tuple of one relation with every tuple of another.
JOIN (⋈) Combine related tuples from two relations based on a condition.
DIVISION (÷) Find tuples in one relation that are related to all tuples in another.
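A minimal sketch of the operations in the table, modelling relations as Python sets of tuples (set semantics, no attribute names). The relations R and S are made up; the WORKS_ON pairs are a subset of Figure 5.6. The helpers `theta_join` and `divide` are illustrative names, not a library API.

```python
from itertools import product

R = {("a", 1), ("b", 2)}
S = {("b", 2), ("c", 3)}

union = R | S                                  # R ∪ S (duplicates removed)
difference = R - S                             # R − S
cartesian = {r + s for r, s in product(R, S)}  # R × S: every pairing

def theta_join(r, s, cond):
    """R ⋈_cond S, i.e. σ_cond(R × S)."""
    return {x + y for x, y in product(r, s) if cond(x, y)}

def divide(r, s):
    """R(X, Y) ÷ S(Y): the X values related to every Y value in S."""
    return {x for x, _ in r if all((x, y) in r for y in s)}

# (Essn, Pno) pairs: a subset of WORKS_ON from Figure 5.6
works_on = {("123456789", 1), ("123456789", 2), ("333445555", 2), ("333445555", 3)}
joined = theta_join(R, S, lambda x, y: x[1] < y[1])
on_both_1_and_2 = divide(works_on, {1, 2})
```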
Why required:
The UNION, INTERSECTION, and DIFFERENCE operators compare tuples position-wise.
Hence the two relations must have the same degree, and each pair of corresponding attributes must have the same domain; otherwise the results are meaningless.
8.4. Types of Inner Join & Why Theta Join is Required
1. Theta Join (⋈θ) – Join where condition θ can use any comparison operator (=, <, >, ≤, ≥, ≠).
3. Natural Join (⋈, also written *) – Equi-join that automatically joins on all attributes with the same names and keeps only one copy of each join attribute (the redundant columns are removed).
• The foreign key attribute(s) in one relation refers to the primary key in another.
• Joins are typically done on this relationship (e.g., Employee.Dno = Department.Dnumber) to combine
related data.
Hence, foreign keys define meaningful join conditions between tables.
The FUNCTION operation applies built-in functions or aggregate operations (SUM, AVG, COUNT, MIN, MAX) to
attribute values in a relation.
Use:
• OUTER JOIN: Returns matching tuples plus non-matching tuples from one or both relations.
• LEFT OUTER JOIN: Keeps all tuples from the left relation, with NULLs in the right-side attributes where there is no match.
• RIGHT OUTER JOIN: Keeps all tuples from the right relation, with NULLs in the left-side attributes where there is no match.
• FULL OUTER JOIN: Keeps all tuples from both relations, filling NULLs for missing matches.
• OUTER UNION merges tuples with different but partially matching attributes, filling missing values with
NULLs.
• Relational algebra: procedural; specifies how to get the result (a sequence of operations).
• Relational calculus: non-procedural; specifies what result is desired, not how to get it.
Similarity: Both describe the same set of queries; they are relationally equivalent in power.
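Tuple relational calculus:
• Range relation: Relation from which a tuple variable takes its values.
• Expression: {t | COND(t)}, the set of all tuples t for which the condition COND(t) is true.
Domain relational calculus:
• Expression: {<x₁,…,xₙ> | COND(x₁,…,xₙ)}, where x₁,…,xₙ are domain variables ranging over attribute domains.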
An expression is safe if it produces a finite result that can be derived from existing database values.
Unsafe expressions could refer to infinite domains (e.g., {x | ¬(x ∈ EMPLOYEE)}).
SQL only allows safe expressions.
A query language is relationally complete if it can express all queries that can be formulated in relational
algebra.
Examples of relationally complete languages:
10 marks qs:
8.3.
Question: Discuss some types of queries for which renaming of attributes is necessary in order to specify the
query unambiguously.
Answer (short):
Renaming attributes (using AS in SQL or ρ in relational algebra) is required in queries such as:
1. Self-join — when joining a table to itself you must rename one copy so attributes (e.g., Employee.Ssn)
don’t conflict.
o Example: find employees and their supervisors: join EMPLOYEE as E and EMPLOYEE as S on
E.Super_ssn = S.Ssn — you need distinct names for E.Name and S.Name.
2. Join of relations with the same attribute names: when two relations share attribute names but you want to keep both (e.g., COURSE.Course_number and PREREQUISITE.Course_number), rename one of them to avoid ambiguity.
3. Union / set operations with different attribute names — before applying UNION you may need to
rename so column positions/types match.
4. Projection with expressions — when you compute expressions (Salary*1.1) you should give them a
name: SELECT Salary*1.1 AS NewSalary.
5. Nested queries / derived tables — when using a subquery in FROM you must name (alias) the derived
table and its attributes to reference them in outer query.
8.16.
Question: Specify the following queries on the COMPANY relational database schema (Figure 5.5) using
relational operators. Also show the result of each query as it would apply to the database state in Figure 5.6.
I’ll give a relational-algebra expression (brief) for each part, then the result computed from Figure 5.6.
Notation:
• EMP, DEPT, WORKS_ON, PROJECT, DEPENDENT are the relation names.
(a) Retrieve the names of all employees in department 5 who work more than 10 hours per week on the
ProductX project.
Relational-algebra expression:
π_{Fname,Minit,Lname}( σ_{Dno=5}(EMP) ⋈_{Ssn=Essn} σ_{Hours>10}(WORKS_ON) ⋈_{Pno=Pnumber} σ_{Pname='ProductX'}(PROJECT) )
Result:
John B Smith
Joyce A English
(b) List the names of all employees who have a dependent with the same first name as themselves.
Relational algebra:
π_{E.Fname,E.Minit,E.Lname}( EMP ⋈_{Ssn=Essn ∧ Fname=Dependent_name} DEPENDENT )
Compute on Figure 5.6: compare each employee's first name with the names of that employee's own dependents. There are no matches, so the result is empty.
(c) Find the names of all employees who are directly supervised by ‘Franklin Wong’.
Relational algebra:
S = σ_{Fname='Franklin' ∧ Lname='Wong'}(EMP) ;
Compute on Figure 5.6: Franklin Wong has Ssn = 333445555. Employees with Super_ssn = 333445555 are Ssns
123456789 (John), 666884444 (Ramesh), 453453453 (Joyce).
Result:
John B Smith
Ramesh K Narayan
Joyce A English
(d) For each project, list the project name and the total hours per week (by all employees) spent on the
project.
• Reorganization (20): 10.0 + 15.0 + NULL = 25.0 (SUM ignores the NULL hours value)
Result rows:
ProductX | 52.5
ProductY | 37.5
ProductZ | 50.0
Computerization| 55.0
Reorganization| 25.0
Newbenefits | 55.0
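The per-project totals can be recomputed from the Figure 5.6 WORKS_ON rows with a sketch that assumes Python's sqlite3 module. It confirms ProductY = 7.5 + 20.0 + 10.0 = 37.5 and that SUM skips Borg's NULL hours on Reorganization.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PROJECT (Pname TEXT, Pnumber INTEGER)")
con.execute("CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours REAL)")
con.executemany("INSERT INTO PROJECT VALUES (?, ?)", [
    ("ProductX", 1), ("ProductY", 2), ("ProductZ", 3),
    ("Computerization", 10), ("Reorganization", 20), ("Newbenefits", 30)])
con.executemany("INSERT INTO WORKS_ON VALUES (?, ?, ?)", [  # Figure 5.6
    ("123456789", 1, 32.5), ("123456789", 2, 7.5), ("666884444", 3, 40.0),
    ("453453453", 1, 20.0), ("453453453", 2, 20.0), ("333445555", 2, 10.0),
    ("333445555", 3, 10.0), ("333445555", 10, 10.0), ("333445555", 20, 10.0),
    ("999887777", 30, 30.0), ("999887777", 10, 10.0), ("987987987", 10, 35.0),
    ("987987987", 30, 5.0), ("987654321", 30, 20.0), ("987654321", 20, 15.0),
    ("888665555", 20, None)])

totals = dict(con.execute("""
    SELECT P.Pname, SUM(W.Hours)   -- SUM ignores NULL hours
    FROM PROJECT P JOIN WORKS_ON W ON P.Pnumber = W.Pno
    GROUP BY P.Pname
""").fetchall())
```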
(e) Retrieve the names of all employees who work on every project.
Compute on Figure 5.6: projects set = {1,2,3,10,20,30}. No employee appears on all six project numbers, so the result is empty.
(f) Retrieve the names of all employees who do not work on any project.
Relational algebra:
(g) For each department, retrieve the department name and the average salary of all employees working in
that department.
Result:
Headquarters | 55000
Administration | 31000
Research | 33250
(h) Retrieve the average salary of all female employees.
Relational algebra:
Result: 31000
(i) Find names & addresses of employees who work on at least one project located in Houston but whose
department has no location in Houston.
Approach / RA (brief):
1. ProjHouston = σ_{Plocation='Houston'}(PROJECT)
Compute with Figure 5.6 data: Projects in Houston: ProductZ (P3) and Reorganization (P20). Employees working on those projects: P3 → Essn 666884444 and 333445555; P20 → Essn 333445555, 987654321, 888665555.
Combined employees: {666884444 (Ramesh, Dno 5), 333445555 (Franklin, Dno 5), 987654321 (Jennifer, Dno 4), 888665555 (James, Dno 1)}.
Departments that have Houston in DEPT_LOCATIONS: Dnumber 1 and 5 (Dept 5 has Bellaire, Sugarland, Houston). So the employees whose department has no location in Houston are those with Dno ∉ {1,5}, which leaves Dno 4 (Administration). Among the candidate employees above, only 987654321 (Jennifer) has Dno = 4.
Result: Jennifer S Wallace, 291 Berry, Bellaire, TX
(j) List the last names of all department managers who have no dependents.
Relational algebra:
MgrSSNs = π_{Mgr_ssn}(DEPARTMENT)
Result: Borg
8.17.
Question: For the AIRLINE schema (Figure 5.8) specify queries in relational algebra.
I’ll give relational-algebra formulas for each part. I do not have a full instance of the AIRLINE data (you
showed the schema earlier but no instance with flight rows), so I will not produce numeric results unless you
provide the instance. If you want results, upload the airline instance (the data). For now I give RA expressions.
RA expression sketch:
FirstLeg = σ_{Leg_number=1}(FLIGHT_LEG)
(b) List flight numbers and weekdays of flights or legs that depart from iah and arrive in lax.
(c) List flight number, departure airport code, scheduled departure time, arrival airport code, scheduled
arrival time, and weekdays of all flights or flight legs that depart from some airport in the city of Houston
and arrive in some airport in the city of Los Angeles.
RA sketch:
RA:
Result = σ_{Flight_number='co197'}(FARE)
(e) Retrieve the number of available seats for flight number co197 on 2009-10-09.
RA (use LEG_INSTANCE):
σ_{Flight_number='co197' ∧ Date='2009-10-09'}(LEG_INSTANCE)
(If you want per-leg, remove SUM and project Leg_number and Number_of_available_seats.)
Relations:
(a) How many copies of the book titled The Lost Tribe are owned by branch 'Sharpstown'?
RA:
π_{No_of_copies} (
⋈_{BOOK_COPIES.Branch_id = LB.Branch_id}
( σ_{Branch_name='Sharpstown'}(LIBRARY_BRANCH) AS LB )
(b) How many copies of The Lost Tribe are owned by each branch?
RA:
BC = BOOK_COPIES ⋈ BOOK
(c) Retrieve names of borrowers who do not have any books checked out.
RA (anti-join):
AllBorrowers = BORROWER
π_{Name}( Result )
(Or π_{Name}( BORROWER ⋈_{not exists BOOK_LOANS.Card_no = BORROWER.Card_no} ))
(d) For each book loaned from Sharpstown and whose Due_date = today, retrieve book title, borrower
name, borrower address.
(e) For each branch, branch name and total number of books loaned out from that branch.
RA:
(f) Names, addresses, and number of books checked out for borrowers with >5 books checked out.
RA:
BORROWER ⋈ CountPerBorrower
(g) For each book authored (or coauthored) by Stephen King, retrieve the title and the number of copies
owned by branch 'Central'.
RA:
(a) List Order# and Ship_date for orders shipped from Warehouse# = W2.
RA:
(b) Warehouse information from which customer named Jose Lopez was supplied (produce Order#,
Warehouse#).
RA:
Simpler:
π_{O.Order#, S.Warehouse#}
RA:
(d) Orders that were not shipped within 30 days of ordering (Ship_date > Odate + 30).
(e) Order# for orders that were shipped from all warehouses that the company has in New York.
RA (division pattern):
RA:
π_{TRIP.*}( Res )
RA:
π_{Ssn}( σ_{To_city='Honolulu'}(TRIP) )
RA:
Or
(a) Number of courses taken by all students named John Smith in Winter 2009 (Quarter = 'W09').
RA:
(b) Textbooks (Course#, Book_isbn, Book_title) for CS courses that used > 2 books.
RA:
(c) Departments that have all their adopted books published by 'Pearson Publishing'.
RA:
Better: Dept s.t. set of Book_isbn for that dept ⊆ set of Book_isbn with Publisher='Pearson'
(Implementation: use grouping and checking counts: if count total adopted books = count adopted books
where publisher='Pearson'.)
8.22 — Table T1 and T2 (Figure 8.15)
T1 (P,Q,R):
(10, a, 5)
(15, b, 8)
(25, a, 6)
T2 (A,B,C):
(10, b, 6)
(25, c, 3)
(10, b, 5)
Matches P=10 with T2 A=10 (two rows), and P=25 with A=25.
(10, a, 5, 10, b, 6)
(10, a, 5, 10, b, 5)
(25, a, 6, 25, c, 3)
Matches where Q = B: T1 Q values are a, b, a; T2 B values b, c, b. Matching pairs: T1 row (15,b,8) matches T2
rows (10,b,6) and (10,b,5).
Result:
(15, b, 8, 10, b, 6)
(15, b, 8, 10, b, 5)
(c) & (d): in the list as given these repeat (a) and (b), so their results are the same as above. Note that a natural join of T1 and T2 would not be empty: since T1 and T2 share no attribute names, a natural join degenerates to the Cartesian product T1 × T2 (3 × 3 = 9 tuples).
(e) T1 ∪ T2
T1 and T2 are union compatible: both have degree 3, and corresponding attributes have the same domains (P and A numeric, Q and B letters, R and C numeric). Attribute names do not need to match; by convention the result takes the attribute names of the first relation, (P, Q, R). No tuple appears in both relations, so the union contains all six tuples:
(10,a,5)
(15,b,8)
(25,a,6)
(10,b,6)
(25,c,3)
(10,b,5)
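(f) T1 ⋈_{T1.P = T2.A AND T1.R = T2.C} T2
• T1 (10,a,5) matches T2 (10,b,5) because P = A = 10 and R = C = 5. It does not match T2 (10,b,6), since R = 5 ≠ C = 6, and no other pair satisfies both conditions.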
Result:
(10, a, 5, 10, b, 5)
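The 8.22 results can be checked mechanically. This sketch represents T1 and T2 as Python lists of tuples and uses a hypothetical nested-loop `theta_join` helper; it reproduces (a), (b), (e), and (f), and the 9-tuple Cartesian product that a natural join degenerates to here.

```python
from itertools import product

T1 = [(10, "a", 5), (15, "b", 8), (25, "a", 6)]  # attributes (P, Q, R)
T2 = [(10, "b", 6), (25, "c", 3), (10, "b", 5)]  # attributes (A, B, C)

def theta_join(r, s, cond):
    """Concatenate each pair (t1, t2) satisfying cond, in nested-loop order."""
    return [t1 + t2 for t1, t2 in product(r, s) if cond(t1, t2)]

join_a = theta_join(T1, T2, lambda t1, t2: t1[0] == t2[0])  # P = A
join_b = theta_join(T1, T2, lambda t1, t2: t1[1] == t2[1])  # Q = B
join_f = theta_join(T1, T2, lambda t1, t2: t1[0] == t2[0] and t1[2] == t2[2])  # P = A AND R = C
union_e = sorted(set(T1) | set(T2))   # same degree and matching domains
no_common_attrs = theta_join(T1, T2, lambda t1, t2: True)  # natural join with no shared names
```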
(a) For salesperson Jane Doe, list Serial#, Manufacturer, Sale_price for all cars she sold.
RA:
Jane = σ_{Name='Jane Doe'}(SALESPERSON)
RA:
AllCars = CAR
Equivalent:
(c) Meaning of left outer join SALESPERSON ⟕ SALE (do not change order):
• A left outer join between SALESPERSON (left) and SALE (right) returns all SALESPERSON rows; for
salespersons with matching rows in SALE, it returns those combined rows; for salespersons with no
sales it returns the salesperson row with NULL in the sale columns.
• Example: If S03 (new salesperson) exists in SALESPERSON but has made no sales, SALESPERSON ⟕
SALE will contain an output row with S03 and NULL for Sale attributes. This is useful to list all
salespersons and their sales (if any).
(d) Write a RA query using selection and one set operation and explain.
Example RA:
Explanation: Returns Serial numbers for cars that were sold but have no options (set difference between sold
cars and cars-with-options).
8.24 — Convert queries a,b,c,e,f,i,j of Exercise 8.16 into tuple calculus and domain calculus
You asked for seven queries from 8.16. I will give concise tuple calculus (TRC) and domain calculus (DRC) forms
for each. (I keep notation compact: tuple variables t, s, w, p, d, dep, prj etc.)
Recall 8.16 queries (a,b,c,e,f,i,j) — I restate each, then TRC and DRC.
TRC:
{ <FN,MI,LN> |
8.16(b) — employees who have a dependent with same first name as themselves.
TRC:
DRC:
{ <FN,MI,LN> |
∃SSN ( EMP(SSN,FN,MI,LN,...) ∧
TRC:
{ <e.Fname,e.Minit,e.Lname> |
DRC:
{ <FN,MI,LN> |
TRC:
{ <e.Fname,e.Minit,e.Lname> |
DRC:
{ <FN,MI,LN> |
∃SSN ( EMP(SSN,FN,MI,LN,...) ∧
TRC:
{ <e.Fname,e.Minit,e.Lname> |
DRC:
{ <FN,MI,LN> |
}
8.16(i) — names & addresses of employees who work on ≥1 project located in Houston but whose
department has no location in Houston.
TRC:
{ <e.Fname, e.Address> |
TRC:
{ <m.Lname> |
DRC: analogous.
8.25 — Convert 8.17 (airline RA queries) parts a–d into tuple and domain relational calculus
I’ll restate each and give TRC / DRC.
8.17(a) — For each flight, list flight number, departure airport for first leg, arrival airport for last leg.
TRC:
FLIGHT(f) ∧
DRC: Replace tuple variables by domain variables (FlightNo, LegNo1, DepA, ...); quantifiers similarly express
existence and max leg number.
8.17(b) — Flight numbers & weekdays of flights or legs that depart iah and arrive lax.
TRC:
{ <fl.Flight_number, f.Weekdays> |
8.17(c) — flight number, dep/arr airport and scheduled times and weekdays for legs departing from any
airport in Houston and arriving to any in Los Angeles.
TRC:
TRC:
TRC:
(If aggregate total across legs required, use grouping/aggregation; otherwise list per leg.)
8.26. Specify queries c, d, and f of Exercise 8.18 in both tuple and domain relational calculus.
{ <bo.Name> |
Domain relational calculus (DRC) — domain vars Card, Name, Addr, Phone:
{ <Name> |
(d) For each book loaned from Sharpstown and whose Due_date = TODAY, retrieve Title, borrower Name,
borrower Address.
TRC:
∧ br.Branch_name = 'Sharpstown'
∧ bl.Branch_id = br.Branch_id
∧ bl.Book_id = book.Book_id
∧ bl.Card_no = bo.Card_no
∧ bl.Due_date = TODAY
∃Book_id,Pub,BrId,Card,DO,DD,BrName,BrAddr (
∧ DD = TODAY
(f) For borrowers who have more than 5 books checked out, retrieve Name, Address, and number-of-books.
TRC (use counting expressed as existential of distinct loan tuples — standard safe way):
BORROWER(bo) ∧
∃bl1,bl2,bl3,bl4,bl5,bl6 (
∧ bl1.Book_id ≠ bl2.Book_id ∧ bl1.Book_id ≠ bl3.Book_id ∧ ... (all pairwise distinct book-instance identities)
∧ n = 6 OR n = (actual count) -- notionally n is the count; TRC is verbose for exact aggregate
}
(Practical note: expressing exact counts in pure TRC is cumbersome; you normally extend calculus with
aggregate operators or use a counting construct. The TRC above shows the “>5” test by proving existence of 6
distinct loan tuples.)
(Again: standard DRC does not support COUNT; this uses a counting extension.)
Note: Pure TRC/DRC cannot express aggregates compactly without extensions — the usual approach is to
either exhibit existence of k distinct tuples (as above) or to assume a counting/aggregate extension.
8.27. In a TRC query with n tuple variables, what is the typical minimum number of join conditions? Why?
Effect of fewer join conditions?
Answer (short):
• Typical minimum: n − 1 join conditions (assuming the n relations are meant to be joined into one connected result).
Reason: to connect n relations into one connected join graph you need at least n − 1 edges (join conditions), just as a tree spanning n nodes has n − 1 edges.
• Effect of fewer join conditions: with fewer than n − 1 join conditions the join graph is disconnected, so the result is a Cartesian product of the disconnected pieces. This produces much larger result sets and usually incorrect or semantically meaningless answers. In short: missing join conditions lead to unintended Cartesian products and a large performance cost.
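To make the n − 1 argument concrete, this Python sketch (a hypothetical `join_components` helper, not from the text) treats each tuple variable as a node and each join condition as an edge, then counts connected components with union-find. One component means one connected join; k components mean the result is a Cartesian product of k separately joined pieces.

```python
def join_components(n, join_conditions):
    """Count connected components of the join graph over tuple variables 0..n-1.

    join_conditions: (i, j) pairs, one per condition linking variables i and j.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the component representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in join_conditions:
        parent[find(i)] = find(j)  # merge the two components
    return len({find(x) for x in range(n)})

# 4 tuple variables joined in a chain: n - 1 = 3 conditions give one component.
print(join_components(4, [(0, 1), (1, 2), (2, 3)]))  # 1
# Only 2 conditions: two components, so the result is their Cartesian product.
print(join_components(4, [(0, 1), (2, 3)]))  # 2
```

Each condition merges at most two components, so going from n components down to 1 needs at least n − 1 conditions.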
8.28. Rewrite DRC queries that followed Q0 in Section 8.7 in abbreviated notation (Q0A style)
The queries that follow Q0 in Section 8.7 are not reproduced in these notes, so only the rewriting method is given: place constants directly in the argument positions of each relation predicate and drop the domain variables (and their ∃ quantifiers) that were needed only to state those equalities; every variable not in the target list is implicitly existentially quantified. For example, Q0 (birth date and address of John B. Smith) becomes
Q0A: { <u,v> | EMPLOYEE('John', 'B', 'Smith', t, u, v, w, x, y, z) }
8.29. Tuple relational calculus expression for the “works on at least those projects that 123456789 works on”
query
Problem restatement: Return SSNs e.Ssn of employees e such that for every project x on which employee
123456789 works, e also works on x. Using the equivalence rules provided: (∀x)(P→Q) ≡ ¬∃x (P ∧ ¬Q) and (IF P
THEN Q) ≡ (¬P OR Q).
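WORKS_ON tuples have attributes (Essn, Pno, Hours); PROJECT(x) has key x.Pnumber.
TRC (direct):
{ <e.Ssn> |
EMPLOYEE(e) ∧ ¬∃x (
PROJECT(x) ∧
∃w ( WORKS_ON(w) ∧ w.Essn = '123456789' ∧ w.Pno = x.Pnumber ) ∧
¬∃v ( WORKS_ON(v) ∧ v.Essn = e.Ssn ∧ v.Pno = x.Pnumber )
)
}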
Explanation: There does not exist a project x such that 123456789 works on x but e does not — i.e. e works on
at least all projects 123456789 does.
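Derivation: the requirement is (∀x)((PROJECT(x) ∧ P(x)) → Q(x)), where P(x) = “123456789 works on x” and Q(x) = “e works on x”. By (IF P THEN Q) ≡ (¬P OR Q) this is (∀x)(¬(PROJECT(x) ∧ P(x)) ∨ Q(x)); applying (∀x)F ≡ ¬∃x ¬F and De Morgan gives ¬∃x (PROJECT(x) ∧ P(x) ∧ ¬Q(x)), which is the ¬∃ form used above.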
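The ¬∃ form can be checked by brute force. This Python sketch uses hypothetical employees and projects (not the Figure 5.6 state) and evaluates the TRC expression literally: an employee qualifies when no project exists that 123456789 works on but the employee does not.

```python
# Hypothetical toy data (not the Figure 5.6 state).
EMPLOYEE_SSNS = ["123456789", "222222222", "333333333", "444444444"]
PROJECT_NUMBERS = [1, 2, 3]
WORKS_ON = {  # (Essn, Pno) pairs; Hours is irrelevant to this query and omitted
    ("123456789", 1), ("123456789", 2),
    ("222222222", 1), ("222222222", 2), ("222222222", 3),
    ("333333333", 1),
}

def works_on(ssn, pno):
    # EXISTS w (WORKS_ON(w) AND w.Essn = ssn AND w.Pno = pno)
    return (ssn, pno) in WORKS_ON

def covers_target(target="123456789"):
    """{ e.Ssn | EMPLOYEE(e) AND NOT EXISTS x (PROJECT(x) AND target works on x AND NOT e works on x) }"""
    return [
        e for e in EMPLOYEE_SSNS
        if not any(works_on(target, x) and not works_on(e, x) for x in PROJECT_NUMBERS)
    ]

print(covers_target())  # ['123456789', '222222222']
```

Employee 123456789 always qualifies, since the inner condition can never hold for e = 123456789 itself.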
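8.30. Show how to specify each of the following relational algebra operations in both tuple and domain relational calculus.
For each, a short TRC form and DRC form are given. Notation: a relation written R(A,B,C) has attributes A, B, C; the schema of S is given in each part.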
(a) σ_{A = C}( R(A,B,C) ): selection of the tuples whose A value equals their C value.
TRC:
{ t | R(t) ∧ t.A = t.C }
DRC:
{ <a,b,c> | R(a,b,c) ∧ a = c }
(b) π_{A,B}( R(A,B,C) ): projection onto A and B.
TRC:
{ <t.A, t.B> | R(t) }
DRC:
{ <a,b> | ∃c ( R(a,b,c) ) }
(c) Natural join R(A,B,C) * S(C,D,E): an equijoin on C that keeps a single copy of C.
TRC:
{ <r.A, r.B, r.C, s.D, s.E> | R(r) ∧ S(s) ∧ r.C = s.C }
DRC:
{ <a,b,c,d,e> | R(a,b,c) ∧ S(c,d,e) }
(d) R(A,B,C) ∪ S(A,B,C) (union)
TRC:
{ t | R(t) ∨ S(t) }
DRC:
{ <a,b,c> | R(a,b,c) ∨ S(a,b,c) }
(e) R ∩ S (intersection)
TRC:
{ t | R(t) ∧ S(t) }
DRC:
{ <a,b,c> | R(a,b,c) ∧ S(a,b,c) }
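(f) R(A,B,C) = S(A,B,C): equality is a truth value rather than a relation, so it is written as a closed formula with no target list. To return tuples, the answer is simply R (equivalently S) when the formula is true.
TRC:
¬∃t ( (R(t) ∧ ¬S(t)) ∨ (S(t) ∧ ¬R(t)) )
DRC:
∀a ∀b ∀c ( R(a,b,c) ↔ S(a,b,c) )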
(g) Cartesian product R(A,B,C) × S(D,E,F)
TRC:
{ <r.A, r.B, r.C, s.D, s.E, s.F> | R(r) ∧ S(s) }
DRC:
{ <a,b,c,d,e,f> | R(a,b,c) ∧ S(d,e,f) }
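(h) Division R(A,B) ÷ S(A), where R has attributes A, B and S has attribute A: return the B values that are associated in R with every A value in S.
TRC:
{ <r.B> |
R(r) ∧ ∀s ( S(s) → ∃u ( R(u) ∧ u.A = s.A ∧ u.B = r.B ) ) }
DRC:
{ <b> |
∃a0 ( R(a0,b) ) ∧ ∀a ( S(a) → R(a,b) ) }
The conjunct ∃a0 ( R(a0,b) ) keeps the DRC expression safe by restricting b to values that occur in R.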
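Division in domain calculus says: b qualifies if it occurs in R and R(a,b) holds for every a in S. That reads almost word for word as Python set operations. This is a sketch on made-up data; `divide` is a hypothetical helper, with `all(...)` standing in for ∀ and the candidate set standing in for the safety conjunct.

```python
def divide(R, S):
    """R(A,B) ÷ S(A) following { b | EXISTS a0 R(a0,b) AND FORALL a (S(a) -> R(a,b)) }."""
    candidates = {b for (_a, b) in R}  # EXISTS a0 R(a0, b): only B values that occur in R
    # all(...) over S plays the role of FORALL a (S(a) -> R(a, b))
    return {b for b in candidates if all((a, b) in R for a in S)}

R = {("x", 1), ("y", 1), ("x", 2), ("z", 2)}  # hypothetical R(A, B)
S = {"x", "y"}                                 # hypothetical S(A)
print(divide(R, S))  # {1}
```

With an empty S every B value in R qualifies, because ∀a (S(a) → R(a,b)) is then vacuously true.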
8.31. Suggest extensions to relational calculus to express: (a) aggregates & grouping; (b) outer joins; (c)
recursive closure queries.
Short suggestions:
(a) Aggregates and grouping:
o Extend the calculus with aggregate operators (COUNT, SUM, AVG, MIN, MAX) and a GROUP BY construct. Syntax example (TRC, since t is a tuple variable): Num = COUNT({ t | R(t) ∧ t.attr = value }). Alternatively, add grouped aggregate functions such as COUNT_GRP(group-attributes, expr) and allow HAVING-like predicates on their results.
o Semantically, this requires either second-order constructs (quantifying over sets of tuples) or built-in aggregate primitives.
(b) Outer joins:
o Add nullable semantics, so that a tuple with no matching partner still appears in the result, padded with NULLs. For example, the left outer join of R(A,B,C) with S(C,D,E) becomes
{ <r.A, r.B, r.C, s.D, s.E> | R(r) ∧ S(s) ∧ r.C = s.C } ∪ { <r.A, r.B, r.C, NULL, NULL> | R(r) ∧ ¬∃s ( S(s) ∧ r.C = s.C ) }
Alternatively, add an OUTER_JOIN(R, S, on) operator that returns the matching tuples plus the non-matching ones, with NULLs in the attributes from the other side.
(c) Recursive closure:
o Add a fixed-point (recursion) operator, e.g. TC(R, startAttr, endAttr), similar in spirit to SQL's WITH RECURSIVE. Formally, include a least-fixed-point operator lfp(F); the transitive closure of a binary relation R is then lfp(T ↦ R ∪ (T ∘ R)), where T ∘ R = { <x,z> | ∃y ( T(x,y) ∧ R(y,z) ) }. An operator TC_x_y(R) defined this way returns every reachable pair <x,y>.
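A least fixed point can be computed by naive iteration: start from R and keep adding compositions with R until nothing new appears. The Python sketch below (a hypothetical `transitive_closure` helper) assumes a finite binary relation given as a set of pairs, which guarantees termination.

```python
def transitive_closure(R):
    """Least fixed point of T = R ∪ (T ∘ R), computed by naive iteration."""
    T = set(R)
    while True:
        # T ∘ R: pairs (x, z) such that (x, y) is in T and (y, z) is in R for some y
        step = {(x, z) for (x, y) in T for (y2, z) in R if y == y2}
        new_T = T | step
        if new_T == T:  # fixed point reached: nothing new is derivable
            return T
        T = new_T

edges = {(1, 2), (2, 3), (3, 4)}  # hypothetical binary relation
print(sorted(transitive_closure(edges)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Real systems usually use semi-naive evaluation (joining only the newly added pairs each round); the naive loop is kept here because it mirrors the lfp definition directly.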
8.32. Nested queries on Figure 5.5 (COMPANY) — specify using nested queries + show results on Figure 5.6.
These are the same nested queries answered earlier; (a) and (c) are repeated here in concise nested relational-algebra style, with the results computed on Figure 5.6.
(a) Names of employees who work in the department that has the employee with highest salary.
Result (Figure 5.6): Highest salary = 55000 (James Borg), Dept = 1 → employees in dept 1: James E Borg.
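RA nested:
π_{Fname,Minit,Lname}( EMPLOYEE ⋈_{Dno = D} ρ_{(D)}( π_{Dno}( EMPLOYEE ⋈_{Salary = M} ρ_{(M)}( ℑ_{MAX Salary}(EMPLOYEE) ) ) ) )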
(c) Employees who make at least $10,000 more than the employee with the least salary.
RA nested:
π_{Fname,Minit,Lname}( σ_{Salary ≥ M + 10000}( EMPLOYEE × ρ_{(M)}( ℑ_{MIN Salary}(EMPLOYEE) ) ) )
Figure 5.6: min=25000 → threshold 35000 → employees: Franklin T Wong, Jennifer S Wallace, Ramesh K
Narayan, James E Borg.
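Both nested queries can be reproduced in Python by computing the inner aggregate first and then filtering the outer relation. The rows below are transcribed from the EMPLOYEE state in Figure 5.6 (name, salary and Dno only) and should be checked against the figure; the printed results agree with the answers given above.

```python
# (Fname, Minit, Lname, Salary, Dno), transcribed from Figure 5.6 (COMPANY EMPLOYEE).
EMPLOYEE = [
    ("John", "B", "Smith", 30000, 5),
    ("Franklin", "T", "Wong", 40000, 5),
    ("Alicia", "J", "Zelaya", 25000, 4),
    ("Jennifer", "S", "Wallace", 43000, 4),
    ("Ramesh", "K", "Narayan", 38000, 5),
    ("Joyce", "A", "English", 25000, 5),
    ("Ahmad", "V", "Jabbar", 25000, 4),
    ("James", "E", "Borg", 55000, 1),
]

def name(e):
    return f"{e[0]} {e[1]} {e[2]}"

# (a) inner query: department(s) of the top earner(s); outer query: everyone in them
max_sal = max(e[3] for e in EMPLOYEE)
top_depts = {e[4] for e in EMPLOYEE if e[3] == max_sal}
query_a = [name(e) for e in EMPLOYEE if e[4] in top_depts]

# (c) inner query: minimum salary; outer query: salary >= minimum + 10000
min_sal = min(e[3] for e in EMPLOYEE)
query_c = [name(e) for e in EMPLOYEE if e[3] >= min_sal + 10000]

print(query_a)  # ['James E Borg']
print(query_c)  # ['Franklin T Wong', 'Jennifer S Wallace', 'Ramesh K Narayan', 'James E Borg']
```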
• ¬(P(x) ∨ Q(x)) → (¬P(x) ∧ ¬Q(x)): true. By De Morgan's law, ¬(P ∨ Q) is logically equivalent to ¬P ∧ ¬Q, so the implication holds in both directions.
• (∃x)(P(x)) → (∀x)(P(x)): false. The existence of some x satisfying P does not imply that P holds for every x.
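Both answers can be sanity-checked mechanically. This Python sketch enumerates every truth assignment for the De Morgan case and, for the quantifier case, searches every predicate P over a small hypothetical domain {0, 1, 2} for a counterexample. A finite search can refute an implication but cannot prove a quantified one in general; the De Morgan case is purely propositional, so its truth table is complete.

```python
from itertools import product

DOMAIN = [0, 1, 2]  # small hypothetical domain for x

def implies(p, q):
    return (not p) or q

# (1) NOT (P OR Q) -> (NOT P AND NOT Q): check all four truth assignments.
de_morgan_holds = all(
    implies(not (p or q), (not p) and (not q))
    for p, q in product([False, True], repeat=2)
)

# (2) (EXISTS x) P(x) -> (FORALL x) P(x): try every predicate P over DOMAIN.
counterexamples = []
for values in product([False, True], repeat=len(DOMAIN)):
    P = dict(zip(DOMAIN, values))
    if not implies(any(P.values()), all(P.values())):
        counterexamples.append(P)

print(de_morgan_holds)     # True
print(counterexamples[0])  # {0: False, 1: False, 2: True}
```

Every predicate that is true for some but not all elements is a counterexample, which is 6 of the 8 predicates on this domain.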