
Université d'Ottawa / University of Ottawa

Faculté de génie / Faculty of Engineering

École de Science Informatique et Génie Électrique / School of Electrical Engineering and Computer Science

CSI3130 and CSI3530 – Final Examination


Professor: Iluju Kiringa

December 11, 2024

Duration: 3 hrs

Instructions:
Closed book; no other aid allowed except a letter-sized cheat sheet written on both sides. Write
your answers in the space provided.

Family name : ____________SOLUTIONS______________________________________

First name : __________________________________________________

Student number : ________________________________________________

There are 5 questions and a total of 100 points. There are 14 pages, including this title page.

I – Query Processing /10
II – Query Optimization /20
III – Transactions and Concurrency Control /30
IV – Recovery /30
V – Big Data /10

Total /100
QUESTION I. [Query Processing: 10%]

(A) [5%] Describe the hash-based algorithm for projection with duplicate elimination.

Suppose that the projection to process is ΠA(r). Suppose we have B blocks at our disposal in the buffer
pool. The algorithm has two phases, namely the partitioning phase and the duplicate elimination phase.

In the partitioning phase, we use 1 input buffer block and B-1 output buffer blocks. The relation r is
read into the input buffer block, one block at a time. For each tuple in the input buffer block, project
out the attributes that are not in the projection list A, and then apply a hash function h to the
combination of the remaining attributes. Each tuple t is written to output buffer i such that h(t) = i.
At the end of the partitioning phase, we have B-1 partitions, each one containing tuples that share the
same hash value.

Duplicate elimination phase: For each of the B-1 partitions do the following:
1. Read in the partition one block at a time. Hash each tuple by applying a hash function h2 (≠ h)
to the combination of all of its fields and insert it into an in-memory hash table. A new
tuple that hashes to the same value as some existing tuple is discarded if it is a duplicate
of the existing one.
2. After reading in the entire partition, write the tuples in the in-memory hash table into the result
file. Then reinitialize the in-memory hash table to prepare for the next partition.
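As an illustration, here is a minimal in-memory sketch of the two phases in Python. The names hash_project, r, A, and B are stand-ins for this sketch, Python stands in for block-at-a-time I/O, and the built-in hash plays the roles of both h and h2:

def hash_project(r, A, B=4):
    # Partitioning phase: keep only the attributes in A, then route each
    # projected tuple to one of the B-1 partitions with hash function h.
    partitions = [[] for _ in range(B - 1)]
    for tup in r:
        projected = tuple(tup[a] for a in A)
        partitions[hash(projected) % (B - 1)].append(projected)

    # Duplicate elimination phase: duplicates always land in the same
    # partition, so one in-memory hash table (a Python set) per partition
    # suffices to keep exactly one copy of each tuple.
    result = []
    for part in partitions:
        seen = set()
        for t in part:
            if t not in seen:      # discard duplicates of existing tuples
                seen.add(t)
                result.append(t)
    return result

r = [{"A": 1, "B": "x"}, {"A": 1, "B": "y"}, {"A": 2, "B": "z"}]
print(hash_project(r, ["A"]))      # [(1,), (2,)] (order depends on h)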

(B) [5%] Why is the optimization heuristic “Distribute selections over joins” a good thing to do?

The heuristic “Distribute selections over joins” is a good thing to do because it pushes selections
below joins, so that they are applied as early as possible to the join arguments. This reduces the size
of the join arguments, and hence the cost of computing the joins.
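For example, assuming a schema in which B is an attribute of s only:

σs.B>17(r ⋈ s) ≡ r ⋈ σs.B>17(s)

The right-hand side shrinks s before it enters the join, so the join processes fewer tuples.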



QUESTION II. [Query Optimization: 20%]

(A) [View Maintenance: 10%] Given the materialized view v = r - s (Set difference). When a set of
tuples ir is inserted in r, or is is inserted in s, what should be done? And when dr is deleted from r,
or ds is deleted from s, what should be done?

Given the materialized view v = r − s, when a set of tuples ir is inserted in r, we add to v those
tuples of ir that are not also in s. When a set of tuples is is inserted in s, each tuple of is that is
also in v is removed from v. When a set of tuples dr is deleted from r, we delete from v those tuples
of dr that are present in v. When a set of tuples ds is deleted from s, we add to v those tuples of ds
that are also present in r.

Alternative formulation:

Given view v = r − s, when a tuple is inserted in r, we check if it is present in s, and if not we add
it to v. When a tuple is deleted from r, we delete it from v if present. When a tuple is inserted in s,
we delete it from v if present. When a tuple is deleted from s, we check if it is present in r, and if
so we add it to v.
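A minimal set-semantics sketch of these rules in Python (r, s, and v are Python sets; the function names simply mirror the four cases above, and i_s is used because "is" is a Python keyword):

def insert_r(r, s, v, ir):
    r |= ir
    v |= (ir - s)           # add inserted r-tuples not present in s

def insert_s(r, s, v, i_s):
    s |= i_s
    v -= i_s                # any v-tuple now matched in s leaves v

def delete_r(r, s, v, dr):
    r -= dr
    v -= dr                 # deleted r-tuples can no longer be in v

def delete_s(r, s, v, ds):
    s -= ds
    v |= (ds & r)           # r-tuples no longer blocked by s enter v

r, s = {1, 2, 3}, {3}
v = r - s                   # materialize: {1, 2}
insert_s(r, s, v, {2})
delete_s(r, s, v, {3})
print(v == r - s, v)        # True {1, 3}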



(B) [5%] Suppose I want to provide an iterator interface to the external sorting. Give a short
pseudocode that outlines what should be done in the open(), getnext() and close() functions.

Let M denote the number of blocks in the main-memory buffer available for sorting. For
simplicity, we assume that fewer than M runs are created in the run creation phase, so the merge
completes in a single pass. The pseudocode for the iterator functions open, next, and close is
shown below:

Sort::open()
begin
repeat
read M blocks of the relation r;
sort the in-memory part of the relation;
write the sorted data to a run file ri
until the end of the relation r
read one block of each of the N run files r1, ..., rN into a buffer block in memory;
done := false;
end

boolean Sort::next()
begin
if the buffer block of any run ri is empty and not end-of-file(ri)
begin
read the next block of ri (if any) into the buffer block;
end
if all buffer blocks are empty
return false;
choose the first tuple (in sort order) among the buffer blocks;
write the tuple to the output buffer;
delete the tuple from the buffer block and increment its pointer;
return true;
end

Sort::close()
begin
clear all the N runs from main memory and disk;
end
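For concreteness, here is a small runnable Python sketch of the same iterator. The class and method names are illustrative, a record count per run stands in for the M buffer blocks, and heapq.merge plays the role of keeping one buffer block per run:

import heapq
import itertools
import tempfile

def read_run(f):
    # Stream one record per line from a run file (one "buffer block" worth).
    for line in f:
        yield int(line)

class ExternalSort:
    def __init__(self, records, M):
        self.records = iter(records)
        self.M = M                  # records per in-memory run, standing in for M blocks
        self.run_files = []
        self.merged = iter(())

    def open(self):
        # Run creation phase: read M records, sort in memory, spill to a run file.
        while True:
            chunk = list(itertools.islice(self.records, self.M))
            if not chunk:
                break
            chunk.sort()
            f = tempfile.TemporaryFile(mode="w+t")
            f.writelines(f"{x}\n" for x in chunk)
            f.seek(0)
            self.run_files.append(f)
        # Merge setup: heapq.merge holds only one record per run in memory,
        # the analogue of one buffer block per run file.
        self.merged = heapq.merge(*(read_run(f) for f in self.run_files))

    def get_next(self):
        # Return the next record in sort order, or None when all runs are exhausted.
        return next(self.merged, None)

    def close(self):
        # Clear all the runs.
        for f in self.run_files:
            f.close()
        self.run_files = []

it = ExternalSort([5, 3, 9, 1, 7, 2, 8], M=3)
it.open()
while (x := it.get_next()) is not None:
    print(x)                        # 1 2 3 5 7 8 9, one record per call
it.close()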



(C) [5%] Consider the query

select A, C
from r, s
where s.B > 17 and r.C < some (select C
from t
where t.A = r.A)

Show how to decorrelate this query using the multiset version of the semi join operation (Use a
relational algebra expression).

The solution can be written in relational algebra as follows:

ΠA,C(σs.B>17(r × s) ⋉θ2 t), where θ2 = (r.C < t.C ∧ t.A = r.A).



QUESTION III. [Transactions and Concurrency Control: 30%]

(A) [10%] Consider three transactions, the first is transferring $100 from account A to account B,
the second is computing 20% interests on accounts A and B, and the third is transferring $50
from account B to account A. Show a schedule which is not serial, but is conflict serializable.

Here is a sample schedule that is not serial, but is conflict serializable (it is conflict
equivalent to the serial order T3, T1, T2):

T1                      T2                      T3
                                                read(B)
                                                B := B - 50
                                                write(B)
                                                read(A)
                                                A := A + 50
                                                write(A)
                                                commit
read(A)
A := A - 100
write(A)
                        read(A)
                        temp := A * 0.2
                        A := A + temp
                        write(A)
read(B)
B := B + 100
write(B)
commit
                        read(B)
                        temp := B * 0.2
                        B := B + temp
                        write(B)
                        commit



(B) [10%] Consider the following four schedules:

S1:  T1          T2          T3
     read(X)
                 write(X)
     write(X)
                             read(X)
     commit
                             commit

S2:  T1          T2
     read(X)
                 read(X)
     write(X)
                 write(X)
     abort
                 commit

S3:  T1          T2          T3
     read(X)
     write(X)
     commit
                 write(X)
                 commit
                             read(X)
                             commit

S4:  T1          T2          T3
     read(X)
                             read(Y)
                             write(X)
                 read(X)
                 read(Y)
     commit
                 commit
                             commit



For each one of them, state whether it is conflict serializable, recoverable or avoids cascading
aborts. If the answer is “NO”, say briefly why. Use the table and space below to answer.

     CONFLICT SERIALIZABLE    RECOVERABLE    AVOIDS CASCADING ABORTS

S1   NO (*)                   YES            NO (*****)
S2   NO (**)                  YES            YES
S3   YES                      YES            YES
S4   YES                      NO (***)       NO (****)

Justifications:

(*) – Precedence (serializability) graph: r1(X) precedes w2(X) (T1 → T2) and w2(X) precedes w1(X)
(T2 → T1), so the graph has a cycle.
(**) – Precedence (serializability) graph: r1(X) precedes w2(X) (T1 → T2) and r2(X) precedes w1(X)
(T2 → T1), so the graph has a cycle.
(***) – T2 reads X after T3 wrote X, with no commit of T3 in between, and T2 commits before T3.
(****) – Similar reason as in (***): T2 reads X written by the uncommitted T3.
(*****) – Similar reason as in (***): T3 reads X after T1 wrote X, with no commit of T1 in between.
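The precedence-graph test is mechanical enough to sketch in a few lines of Python. The schedules below use the column assignments shown above (commits are omitted since they do not conflict), and the helper names are illustrative:

def precedence_edges(schedule):
    # Ti -> Tj whenever an action of Ti conflicts with a later action of Tj
    # (same item, different transactions, at least one write).
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if xi == xj and ti != tj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    # Depth-first search for a back edge in the precedence graph.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    def dfs(u):
        color[u] = GRAY
        for v in graph.get(u, []):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and dfs(v)):
                return True          # back edge or cycle below -> cycle
        color[u] = BLACK
        return False
    return any(color.get(u, WHITE) == WHITE and dfs(u) for u in graph)

S1 = [("T1", "r", "X"), ("T2", "w", "X"), ("T1", "w", "X"), ("T3", "r", "X")]
S3 = [("T1", "r", "X"), ("T1", "w", "X"), ("T2", "w", "X"), ("T3", "r", "X")]
print(has_cycle(precedence_edges(S1)))  # True  -> S1 not conflict serializable
print(has_cycle(precedence_edges(S3)))  # False -> S3 conflict serializable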



(C) [10%] Consider the granularity hierarchy below:

Using the multiple granularity locking protocol of Section 18.3 in the textbook, say what each of
the following transactions needs to lock, and in which mode:

1. T1 reads the entire database

T1 needs to lock the node DB in S mode.

2. T2 reads record rb1 in file Fb

T2 needs to lock the nodes DB, A1, and Fb in IS mode (and in that order), and finally lock node
rb1 in S mode.

3. T3 reads all the records in file Fc

T3 needs to lock the nodes DB and A1 in IS mode (and in that order), and finally lock node Fc in
S mode.

4. T4 modifies record ra1 in file Fa

T4 needs to lock the nodes DB, A1, and Fa in IX mode (and in that order), and finally lock node
ra1 in X mode.
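These four answers all follow the same root-to-leaf pattern, which a small Python sketch can make explicit. The helper lock_path and the path lists are illustrative, not textbook code, and SIX is omitted for brevity:

S, X, IS, IX = "S", "X", "IS", "IX"

def lock_path(path, mode):
    # To lock path[-1] in S or IS mode, every ancestor must first be locked
    # in an intention mode (IS here); for X or IX, ancestors need IX.
    # Locks are requested root-to-leaf, as the protocol requires.
    intent = IS if mode in (S, IS) else IX
    return [(node, intent) for node in path[:-1]] + [(path[-1], mode)]

# T2 reads record rb1 in file Fb:
print(lock_path(["DB", "A1", "Fb", "rb1"], S))
# [('DB', 'IS'), ('A1', 'IS'), ('Fb', 'IS'), ('rb1', 'S')]

# T4 modifies record ra1 in file Fa:
print(lock_path(["DB", "A1", "Fa", "ra1"], X))
# [('DB', 'IX'), ('A1', 'IX'), ('Fa', 'IX'), ('ra1', 'X')]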



QUESTION IV. [Recovery: 30%]

(A) [10% -- Short Questions on ARIES]

(1) If, at the beginning of the Analysis Phase, a page is not in the checkpoint's modified pages
table, will we need to apply redo records to it? What for?
We may still need to apply redo records to the page: the page may have been modified after the
checkpoint was taken, in which case the Analysis phase adds it to the Dirty Page Table when it
encounters the corresponding update records. The resulting RecLSNs and PageLSNs determine which
updates can be skipped, and therefore impact the optimization of the REDO phase of ARIES.

(2) What is LastLSN and how is it used to avoid unnecessary REDOs?


LastLSN is the LSN of the last action of a transaction. It is NOT used to avoid unnecessary
REDOs. To avoid unnecessary REDOs, RecLSN and PageLSN are the data structures that are
used in the REDO phase of ARIES.

(3) What are the following policies:


no-force: updated blocks need not be written to disk when a transaction commits.
steal: blocks containing the updates of uncommitted transactions can be written to disk even before
those transactions commit, which lets other transactions reuse the buffer frames those blocks
occupied.
Write-ahead logging: before a block of data in main memory is output to the database, all log
records pertaining to the data in that block must first have been output to stable storage.
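A toy Python sketch of how a buffer manager enforces the write-ahead rule under steal/no-force; the class, the record layout, and the LSN values are assumptions for illustration:

class BufferManager:
    def __init__(self):
        self.stable_log = []     # log records already on stable storage
        self.log_tail = []       # log records still in memory

    def log(self, record):
        self.log_tail.append(record)

    def flush_log(self, up_to_lsn):
        # Move every log record with LSN <= up_to_lsn to stable storage.
        while self.log_tail and self.log_tail[0][0] <= up_to_lsn:
            self.stable_log.append(self.log_tail.pop(0))

    def output_page(self, page):
        # WAL: all log records for this page reach stable storage first.
        # The page may belong to an uncommitted transaction (steal), and
        # nothing forces this call at commit time (no-force).
        self.flush_log(page["pageLSN"])
        page["on_disk"] = True   # stand-in for the actual disk write

bm = BufferManager()
bm.log((7578, "T148", "page 2800: 30 -> 60"))
page2800 = {"pageLSN": 7578, "on_disk": False}
bm.output_page(page2800)         # forces LSN 7578 to stable storage first
print(bm.stable_log)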



(B) [20% -- ARIES] Assume the instance of the Log illustrated in Figure 19.10 in the
textbook (Given below):

Suppose that there is no crash after log record 7571 in Figure 19.10, but the system continues
its work by logging as follows:

7578: <T148, 2800.1, 30, 60>
7577: <T147, 4894.1, 90, 60>
7576: <T148, 4894.1, 40, 90>
7575: <T148 begin>
7574: <T147, 2390.4, 60, 30>
7573: <T147, 2390.4, 90, 60>
7572: <T147 begin>

Now suppose that the crash occurs right after log record 7578:< T148, 2800.1, 30, 60> is
written out, but before Page 2800 is written to disk. Run the Analysis, REDO and UNDO phases
of the Aries algorithm by doing the following:
• Show what happens during the ANALYSIS phase by determining the RedoLSN, as well as
the contents of the Transaction Table and the Dirty Page Table at the end of the
Analysis phase.
• Perform REDO by taking the appropriate actions.
• Perform UNDO by extending the log appropriately.



Answer to the ARIES Question IV B:

ANALYSIS Phase:
Smallest RecLSN = 7564; hence RedoLSN = 7564.
• Scan the log forward starting at LSN 7568, the last checkpoint LSN, where the Transaction Table (TT) has T145 in it
and the Dirty Page Table (DPT) has two entries
• At LSN 7569, add Transaction T146 to the TT with LastLSN 7569
• At LSN 7570, update LastLSN of T146 to 7570 and add PageID 2390 to DPT with PageLSN 7570 and RecLSN 7570
• At LSN 7571, remove T146 from the TT
• At LSN 7572, add Transaction T147 to the TT with LastLSN 7572
• At LSN 7573, update LastLSN of T147 to 7573 (PageID 2390 is already in the DPT, so no new entry is added)
• At LSN 7574, update LastLSN of T147 to 7574 and update PageLSN of PageID 2390 to 7574
• At LSN 7575, add Transaction T148 to the TT with LastLSN 7575
• At LSN 7576, update LastLSN of T148 to 7576 and update PageLSN of PageID 4894 to 7576
• At LSN 7577, update LastLSN of T147 to 7577 and update PageLSN of PageID 4894 to 7577
• At LSN 7578, update LastLSN of T148 to 7578 and add PageID 2800 to the DPT with PageLSN 7578 and RecLSN 7578.
At this point, the TT and DPT are as shown below (PageID 4894, already in the DPT at the checkpoint,
keeps its RecLSN 7564 while its PageLSN advances to 7577):

Transaction Table          Dirty Page Table

transID   lastLSN          pageID   pageLSN   recLSN
T145      7567             4894     7577      7564
T147      7577             7200     7565      7565
T148      7578             2390     7574      7570
                           2800     7578      7578
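As a cross-check, the Analysis pass over the log tail can be simulated in a few lines of Python. The record layout is an assumption for this sketch; the checkpoint state is the one from Figure 19.10 used above:

# State restored from the checkpoint at LSN 7568:
txn_table = {"T145": 7567}                      # transID -> lastLSN
dirty_pages = {4894: 7564, 7200: 7565}          # pageID -> recLSN

log_tail = [
    ("begin",  7569, "T146", None),
    ("update", 7570, "T146", 2390),
    ("commit", 7571, "T146", None),
    ("begin",  7572, "T147", None),
    ("update", 7573, "T147", 2390),
    ("update", 7574, "T147", 2390),
    ("begin",  7575, "T148", None),
    ("update", 7576, "T148", 4894),
    ("update", 7577, "T147", 4894),
    ("update", 7578, "T148", 2800),
]

for kind, lsn, txn, page in log_tail:
    if kind == "commit":
        txn_table.pop(txn, None)        # completed txns leave the table
    else:
        txn_table[txn] = lsn            # begin/update: track lastLSN
    if kind == "update" and page not in dirty_pages:
        dirty_pages[page] = lsn         # first post-checkpoint dirtying

redo_lsn = min(dirty_pages.values())    # where the REDO pass will start
print(txn_table)    # {'T145': 7567, 'T147': 7577, 'T148': 7578}
print(dirty_pages)  # {4894: 7564, 7200: 7565, 2390: 7570, 2800: 7578}
print(redo_lsn)     # 7564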

REDO Phase: This phase starts at the RedoLSN 7564 discovered in the Analysis phase. I omit the details. To
simplify things, assume that all actions are redone. Redone actions are not logged.
UNDO Phase: From the Transaction Table, we now know that the transactions that were active at the time of the
crash are T145, T147, and T148. These transactions must be undone in the UNDO phase by scanning the log
backwards, all the way to the oldest log entry of the losing transactions, following the ARIES UNDO algorithm;
we start with T148, which has the largest LastLSN. I omit the details of the tracing of the algorithm, the
result of which is the following extension to the log (the new entries are LSNs 7579 through 7588):
7588: <T145 abort>
7587: <T145, 4894.1, 20>
7586: <T145, 4894.1, 40>
7585: <T147 abort>
7584: <T147, 2390.4, 90>
7583: <T147, 2390.4, 60>
7582: <T148 abort>
7581: <T148, 4894.1, 40>
7580: <T147, 4894.1, 90>
7579: <T148, 2800.1, 30>
7578: <T148, 2800.1, 30, 60>
7577: <T147, 4894.1, 90, 60>
7576: <T148, 4894.1, 40, 90>
7575: <T148 begin>
7574: <T147, 2390.4, 60, 30>
7573: <T147, 2390.4, 90, 60>
7572: <T147 begin>
… [OLDER PORTION OF THE LOG]
QUESTION V. [Big Data: 10%]

Consider the diagram above that depicts the parallel processing of a MapReduce job. In the space
below, describe how such parallel processing works, from the reading of the input files to the result in
the output files. Be sure to state what happens at each step and what the roles of the User Program,
the Master, and the map and reduce functions are.



Answer to the Big Data Question V

The goal of the MapReduce paradigm is to enable parallel processing of the overall task. The overall
job is divided into map tasks (Map i in the diagram) assigned to multiple machines to be executed in
parallel. Similarly, reduce tasks (Reduce j in the diagram) are assigned to multiple machines to be
executed in parallel. The map and reduce tasks are executed as follows:

Step 1: Input and setup. The overall input is divided into partitions (Part i in the diagram) that may be
entire files or portions of files. A distributed file system is used to store the partitions and parallelize
input. Several partitions can be assigned to a single map task. A Master node gets copies of the map()
and reduce() code provided by a User Program and sends them to the map and reduce tasks, respectively.
Step 2: Mapping and the role of the map function. The map tasks execute the map code and write the output
data to local files stored on the machines where the tasks run, after sorting and partitioning it by
reduce key values. Each map task node creates a separate file for each reduce task.
Step 3: Reducing and the role of the reduce function. Reduce tasks fetch the output of the previous
step across the network. The files fetched by a given reduce task from the various map tasks are
sorted and merged to ensure that all occurrences of a given reduce key are kept together in the sorted
file. Finally, the sorted files are fed to the reduce() functions. Reduce task nodes parallelize file
output across multiple machines using a distributed file system.
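A single-process Python sketch of this flow, for a hypothetical word-count job (real systems distribute the map and reduce tasks across machines under the Master's coordination; the names and the sample input below are assumptions):

from collections import defaultdict

def map_fn(text):
    # map(): emit (key, value) pairs from one input partition.
    for word in text.split():
        yield (word, 1)

def reduce_fn(key, values):
    # reduce(): combine all values sharing one reduce key.
    return (key, sum(values))

# Step 1: the input, already split into partitions (Part i).
partitions = {"part1": "big data big", "part2": "data big"}

# Step 2: each partition could be handled by a separate map task.
intermediate = []
for text in partitions.values():
    intermediate.extend(map_fn(text))

# Shuffle/sort: group all occurrences of each reduce key together,
# the analogue of the sort-and-merge done at the reduce task nodes.
groups = defaultdict(list)
for key, value in sorted(intermediate):
    groups[key].append(value)

# Step 3: each key group could go to a separate reduce task.
print([reduce_fn(k, vs) for k, vs in groups.items()])
# [('big', 3), ('data', 2)]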

