Lecture 10: Linked Lists
15-122: Principles of Imperative Computation (Spring 2020)
Frank Pfenning, Rob Simmons, André Platzer, Iliano Cervesato
In this lecture we discuss the use of linked lists to implement the stack and
queue interfaces that were introduced in the last lecture. The linked list
implementation of stacks and queues allows us to handle work lists of any
length.
This fits as follows with respect to our learning goals:
Computational Thinking: We discover that arrays contain implicit information,
namely the indices of elements, which can be made explicit as
the addresses of the nodes of a linked list. We also encounter the notion
of trade-off, as arrays and linked lists have different advantages
and drawbacks and yet achieve similar purposes.
Algorithms and Data Structures: We explore linked lists, a data structure
used pervasively in Computer Science, and examine some basic algo-
rithms about them.
Programming: We see that programming algorithms for linked lists can
be tricky, which exposes once more the power of stating and checking
invariants. We use linked lists to implement stacks and queues.
1 Linked Lists
Linked lists are a common alternative to arrays in the implementation of
data structures. Each item in a linked list contains a data element of some
type and a pointer to the next item in the list. It is easy to insert and delete
elements in a linked list, which are not natural operations on arrays, since
arrays have a fixed size. On the other hand, access to an element in the
middle of the list is usually O(n), where n is the length of the list.
An item in a linked list consists of a struct containing the data element
and a pointer to another linked list. In C0 we have to commit to the type
of element that is stored in the linked list. We will refer to this data as
having type elem, with the expectation that there will be a type definition
elsewhere telling C0 what elem is supposed to be. Keeping this in mind
ensures that none of the code actually depends on what type is chosen.
These considerations give rise to the following definition:
struct list_node {
  elem data;
  struct list_node* next;
};
typedef struct list_node list;
This definition is an example of a recursive type. A struct of this type
contains a pointer to another struct of the same type, and so on. We usually
use the special element of type t*, namely NULL, to indicate that we have
reached the end of the list. Sometimes (as will be the case for our use of
linked lists in stacks and queues), we can avoid the explicit use of NULL and
obtain more elegant code. The type definition is there to create the type
name list, which stands for struct list_node, so that a pointer to a list
node will be list*. We could also have written these two statements in the
other order, to make better use of the type definition:
typedef struct list_node list;
struct list_node {
  elem data;
  list* next;
};
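To make the definition concrete, here is a minimal sketch that builds a NULL-terminated list and walks it. It is written in plain C rather than C0 so that it compiles with a standard compiler: calloc stands in for C0's alloc, and elem is assumed to be int. The helper names list_cons and list_length are hypothetical and not part of the lecture's code.

```c
#include <assert.h>
#include <stdlib.h>

typedef int elem;                 // assumption: elements are ints

typedef struct list_node list;
struct list_node {
  elem data;
  list* next;
};

// Hypothetical helper: allocate a node holding x in front of tail.
list* list_cons(elem x, list* tail) {
  list* p = calloc(1, sizeof(list));  // zero-initialized, like C0's alloc
  assert(p != NULL);                  // give up if allocation fails
  p->data = x;
  p->next = tail;
  return p;
}

// Hypothetical helper: count the nodes of a NULL-terminated list.
int list_length(list* l) {
  int n = 0;
  for (list* p = l; p != NULL; p = p->next) n++;
  return n;
}
```

For instance, list_cons(1, list_cons(2, list_cons(3, NULL))) builds a three-node list holding 1, 2, 3, and list_length follows three next pointers before reaching NULL.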
There are some restrictions on recursive types. For example, a declaration
such as
struct infinite {
  int x;
  struct infinite next;
};
would be rejected by the C0 compiler because it would require an infinite
amount of space. The general rule is that a struct can be recursive, but
the recursion must occur beneath a pointer or array type, whose values are
addresses. This allows a finite representation for values of the struct type.
We don’t introduce any general operations on lists; let’s wait and see
what we need where they are used. Linked lists as we use them here are
a concrete type which means we do not construct an interface and a layer of
abstraction around them. When we use them we know about and exploit
their precise internal structure. This is in contrast to abstract types such as
queues or stacks whose implementation is hidden behind an interface, ex-
porting only certain operations. This limits what clients can do, but it al-
lows the author of a library to improve its implementation without having
to worry about breaking client code. Concrete types are cast into concrete
once and for all.
2 List segments
A lot of the operations we’ll perform in the next few lectures are on segments
of lists: a series of nodes starting at start and ending at end.
Recursively:
bool is_segment(list* start, list* end) {
  if (start == NULL) return false;
  if (start == end) return true;
  return is_segment(start->next, end);
}
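The same check can also be phrased as a loop. The following is a sketch in plain C, assuming elem is int; the name is_segment_iter is hypothetical. Like the recursive version, it is not guaranteed to terminate if the list contains a cycle that never reaches end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int elem;                 // assumption: elements are ints
typedef struct list_node list;
struct list_node {
  elem data;
  list* next;
};

// Walk from start; succeed as soon as we stand on end,
// fail if we fall off the list by reaching NULL first.
bool is_segment_iter(list* start, list* end) {
  for (list* p = start; p != NULL; p = p->next) {
    if (p == end) return true;
  }
  return false;
}
```

Note that a segment from a node to itself counts as a valid (empty) segment, while NULL is never a valid start.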
In code:
bool is_acyclic(list* start) {
  if (start == NULL) return true;
  list* h = start->next;         // hare
  list* t = start;               // tortoise
  while (h != t) {
    if (h == NULL || h->next == NULL) return true;
    h = h->next->next;
    //@assert t != NULL;  // faster hare hits NULL quicker
    t = t->next;
  }
  //@assert h == t;
  return false;
}
A few points about this code: in the condition inside the loop we exploit
the short-circuiting evaluation of the logical or ‘||’ so we only follow the
next pointer for h when we know it is not NULL. Guarding against trying to
dereference a NULL pointer is an extremely important consideration when
writing pointer manipulation code such as this. The access to h->next and
h->next->next is guarded by the NULL checks in the if statement.
This algorithm is a variation of what has been called the tortoise and the
hare and is due to Floyd 1967.
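To see both outcomes, the following sketch transcribes is_acyclic into plain C (assuming elem is int) and runs it on a hypothetical three-node list, first NULL-terminated and then with its last node pointing back to the first. The function acyclic_demo is an illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int elem;                 // assumption: elements are ints
typedef struct list_node list;
struct list_node {
  elem data;
  list* next;
};

bool is_acyclic(list* start) {
  if (start == NULL) return true;
  list* h = start->next;          // hare: two steps per iteration
  list* t = start;                // tortoise: one step per iteration
  while (h != t) {
    if (h == NULL || h->next == NULL) return true;
    h = h->next->next;
    assert(t != NULL);            // the hare would have hit NULL first
    t = t->next;
  }
  return false;                   // hare caught the tortoise: a cycle
}

// Hypothetical demo: true if both checks come out as expected.
bool acyclic_demo(void) {
  list c = {3, NULL};
  list b = {2, &c};
  list a = {1, &b};
  bool straight = is_acyclic(&a); // 1 -> 2 -> 3 -> NULL
  c.next = &a;                    // 1 -> 2 -> 3 -> back to 1
  bool looped = is_acyclic(&a);
  return straight && !looped;
}
```

On the cyclic list the hare laps the tortoise after two iterations, both standing on the node holding 3.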
We also want both front and back not to be NULL so that the queue conforms to the
picture, with one element already allocated even if the queue is empty; the
is_segment function we already wrote enforces this.
bool is_queue(queue* Q) {
  return Q != NULL
      && is_acyclic(Q->front)
      && is_segment(Q->front, Q->back);
}
To check if the queue is empty we just compare its front and back. If
they are equal, the queue is empty; otherwise it is not. We require that we
are being passed a valid queue. Generally, when working with a data struc-
ture, we should always require and ensure that its invariants are satisfied
in the pre- and post-conditions of the functions that manipulate it. Inside
the function, we will generally temporarily violate the invariants.
bool queue_empty(queue* Q)
//@requires is_queue(Q);
{
  return Q->front == Q->back;
}
To obtain a new empty queue, we just allocate a list struct and point both
front and back of the new queue to this struct. We do not initialize the list
element because its contents are irrelevant, according to our representation.
That said, it is good practice to always initialize memory if we care about
its contents, even if it happens to be the same as the default value placed
there.
queue* queue_new()
//@ensures is_queue(\result);
//@ensures queue_empty(\result);
{
  queue* Q = alloc(queue);       // Create header
  list* dummy = alloc(list);     // Create dummy node
  Q->front = dummy;              // Point front
  Q->back = dummy;               //   and back to dummy node
  return Q;
}
To enqueue something, that is, add a new item to the back of the queue,
we just write the data into the extra element at the back, create a new back
element, and make sure the pointers are updated correctly. You should
draw yourself a diagram before you write this kind of code. Here is a
before-and-after diagram for inserting 3 into a list. The new or updated
items are dashed in the second diagram.
In code:
void enq(queue* Q, elem x)
//@requires is_queue(Q);
//@ensures is_queue(Q);
{
  list* new_dummy = alloc(list); // Create a new dummy node
  Q->back->data = x;             // Store x in old dummy node
  Q->back->next = new_dummy;
  Q->back = new_dummy;
}
Finally, we have the dequeue operation. For that, we only need to
change the front pointer, but first we have to save the dequeued element
in a temporary variable so we can return it later. In diagrams:
And in code:
elem deq(queue* Q)
//@requires is_queue(Q);
//@requires !queue_empty(Q);
//@ensures is_queue(Q);
{
  elem x = Q->front->data;
  Q->front = Q->front->next;
  return x;
}
Let’s verify that our pointer dereferencing operations are safe. We have
Q->front->data
which entails two pointer dereferences. We know is_queue(Q) from the
precondition of the function. Recall:
bool is_queue(queue* Q) {
  return Q != NULL
      && is_acyclic(Q->front)
      && is_segment(Q->front, Q->back);
}
We see that Q->front is okay, because by the first test we know that Q != NULL
if the precondition holds. By the third test, is_segment(Q->front, Q->back),
we see that both Q->front and Q->back are not null, and we can therefore
dereference them.
We also make the assignment Q->front = Q->front->next. Why does
this preserve the invariant? Because we know that the queue is not empty
(second precondition of deq) and therefore Q->front != Q->back. Be-
cause Q->front to Q->back is a valid non-empty segment, Q->front->next
cannot be null.
An interesting point about the dequeue operation is that we do not explicitly
deallocate the first element. If the interface is respected there cannot
be another pointer to the item at the front of the queue, so it becomes
unreachable: no operation of the remainder of the running program could
ever refer to it. This means that the garbage collector of the C0 runtime
system will recycle this list item when it runs short of space.
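The queue operations can be exercised end to end with the following sketch in plain C. Several things here are assumptions rather than the lecture's code: elem is taken to be int, the header struct is reconstructed from the fields front and back used in the text, calloc stands in for alloc, and assert stands in for the //@requires contract. Because C has no garbage collector, this version of deq frees the dequeued node explicitly, which is only safe because no other pointer to it exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef int elem;                 // assumption: elements are ints
typedef struct list_node list;
struct list_node {
  elem data;
  list* next;
};

// Assumed header layout, based on the fields used in the text.
typedef struct queue_header queue;
struct queue_header {
  list* front;
  list* back;
};

bool queue_empty(queue* Q) { return Q->front == Q->back; }

queue* queue_new(void) {
  queue* Q = calloc(1, sizeof(queue));
  list* dummy = calloc(1, sizeof(list));
  assert(Q != NULL && dummy != NULL);
  Q->front = dummy;               // front and back share the dummy node
  Q->back = dummy;
  return Q;
}

void enq(queue* Q, elem x) {
  list* new_dummy = calloc(1, sizeof(list));
  assert(new_dummy != NULL);
  Q->back->data = x;              // fill the old dummy node
  Q->back->next = new_dummy;
  Q->back = new_dummy;
}

elem deq(queue* Q) {
  assert(!queue_empty(Q));        // stands in for the precondition
  list* old = Q->front;
  elem x = old->data;
  Q->front = old->next;
  free(old);                      // no garbage collector in C
  return x;
}
```

Enqueuing 1, 2, 3 and then dequeuing three times returns the elements in the same order, leaving front and back pointing at the same dummy node again.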
bool is_stack(stack* S) {
  return S != NULL
      && is_acyclic(S->top)
      && is_segment(S->top, S->floor);
}
Popping from a stack requires taking an item from the front of the
linked list, which is much like dequeuing.
elem pop(stack* S)
//@requires is_stack(S);
//@requires !stack_empty(S);
//@ensures is_stack(S);
{
  elem x = S->top->data;
  S->top = S->top->next;
  return x;
}
To push an element onto the stack, we create a new list item, set its data
field and then its next field to the current top of the stack — the opposite
end of the linked list from the queue. Finally, we need to update the top
field of the stack to point to the new list item. While this is simple, it is still
a good idea to draw a diagram. We go from
to
In code:
void push(stack* S, elem x)
//@requires is_stack(S);
//@ensures is_stack(S);
{
  list* p = alloc(list);         // Allocate a new top node
  p->data = x;
  p->next = S->top;
  S->top = p;
}
The client-side type stack_t is defined as a pointer to a stack_header:
typedef stack* stack_t;
This completes the implementation of stacks.
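As a quick check of the last-in, first-out behavior, here is a sketch of the stack in plain C. It rests on assumptions not spelled out in this section: elem is int, the header has exactly the fields top and floor, an empty stack is one whose top equals floor (a dummy node), and stack_new and stack_empty are written by analogy with the queue. Since C has no garbage collector, pop frees the removed node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef int elem;                 // assumption: elements are ints
typedef struct list_node list;
struct list_node {
  elem data;
  list* next;
};

// Assumed header layout: top of the stack and a dummy floor node.
typedef struct stack_header stack;
struct stack_header {
  list* top;
  list* floor;
};

bool stack_empty(stack* S) { return S->top == S->floor; }

stack* stack_new(void) {
  stack* S = calloc(1, sizeof(stack));
  list* dummy = calloc(1, sizeof(list));
  assert(S != NULL && dummy != NULL);
  S->top = dummy;
  S->floor = dummy;
  return S;
}

void push(stack* S, elem x) {
  list* p = calloc(1, sizeof(list));
  assert(p != NULL);
  p->data = x;
  p->next = S->top;               // new node sits in front of the old top
  S->top = p;
}

elem pop(stack* S) {
  assert(!stack_empty(S));        // stands in for the precondition
  list* old = S->top;
  elem x = old->data;
  S->top = old->next;
  free(old);                      // no garbage collector in C
  return x;
}
```

Pushing 1, 2, 3 and popping three times yields 3, 2, 1; unlike the queue, both operations work at the same end of the list.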
6 Sharing
We observed in the last section that the floor pointer of a stack_header
structure is unused other than for checking that a stack is empty. This
suggests a simpler representation, where the empty stack is represented by a
NULL top pointer and we do without the floor pointer. This yields the
following declarations
typedef struct stack_header stack;
struct stack_header {
  list* top;
};
bool is_stack(stack* S) {
  return S != NULL && is_acyclic(S->top);
}
and pictorial representation of a stack:
But, then, why have a header at all? Can’t we define the stack simply to be
the linked list pointed to by top instead?
Eliminating the header would lead to a redesign of the interface and
therefore to changes in the code that the client writes. Specifically,
2. More dramatically, we need to change the type of push and pop. Con-
sider performing the operation push(S, 4) where S contains the ad-
dress of the stack from the caller’s perspective:
where p is a pointer to the newly allocated list node. Note that the
stack has not changed from the point of view of the caller! In fact,
from the caller’s standpoint, S still points to the node containing 3.
The only way for the caller to access the updated stack is that the
pointer p be given back to it. Thus, push must now return the updated
stack. Therefore, we need to change its prototype to
stack_t push(stack_t S, elem x);
The same holds for pop, with a twist: pop already returns the value
at the top of the stack. It now needs to return both this value and the
updated stack.
With such header-less stacks, the client has the illusion that push and pop
produce a new stack each time they are invoked. However, the underlying
linked lists share many of the same elements. Consider performing the
following operations on the stack S above:
stack_t S1 = push(S, 4);
stack_t S2 = push(S, 5);
This yields the following memory layout:
All three stacks share nodes 3, 2 and 1. Observe furthermore that the second
call to push operated on S, which remained unchanged after the first call.
At this point, a pop on S would result in a fourth stack, say S3, which points
to node 2.
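The sharing just described can be observed directly in a small sketch in plain C. The names are hypothetical: hstack plays the role of the header-less stack_t (a different name is used only to avoid clashing with a system type of that name on some platforms), elem is assumed to be int, and struct pop_result is one possible way of returning both the popped value and the updated stack. Because nodes may be shared between several stacks, this pop deliberately frees nothing.

```c
#include <assert.h>
#include <stdlib.h>

typedef int elem;                 // assumption: elements are ints
typedef struct list_node list;
struct list_node {
  elem data;
  list* next;
};

// Header-less stack: a pointer to the top node, NULL when empty.
typedef list* hstack;

// Returns the updated stack; the argument S itself is left unchanged.
hstack push(hstack S, elem x) {
  list* p = calloc(1, sizeof(list));
  assert(p != NULL);
  p->data = x;
  p->next = S;
  return p;
}

// Hypothetical result type: the popped value plus the remaining stack.
struct pop_result {
  elem value;
  hstack rest;
};

struct pop_result pop(hstack S) {
  assert(S != NULL);              // cannot pop the empty stack
  struct pop_result r = { S->data, S->next };
  return r;                       // nothing is freed: nodes may be shared
}
```

Starting from a stack S holding 3, 2, 1, the calls push(S, 4) and push(S, 5) produce two new top nodes whose next fields both point to the node S points to, and pop(S) yields 3 together with a pointer to the node holding 2.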
Sharing is an efficient approach to maintaining multiple versions of a
data structure as a sequence of operations is performed on them. Sharing is
not without its perils, however. As an exercise, consider an implementation
of queues such that enq and deq return to their caller a pair of pointers
to the front and back of the underlying linked list (maybe packaged in a
struct). A carefully chosen series of enq and deq operations will break the
queue (or more precisely its representation invariant).
7 Exercises
Exercise 1 (sample solution on page 21). Define the function
bool is_sum(list* start, list* end, int sum);
that checks that the sum of all nodes in the segment from start to end is equal
to sum. You may assume that the data contained in each node is an integer. How
should it behave if the segment is empty?
Exercise 5 (sample solution on page 23). The function ith(l,i) you defined
in an earlier exercise works just like an array access A[i], except that it does
so on a linked list. Using it and other functions you wrote for previous exercises,
implement a version of binary search that operates on list segments. For simplicity,
you may assume that the type elem of data elements has been defined to be int.
Here’s the function prototype.
Exercise 7. Consider what would happen if we pop an element from the empty
stack when contracts are not checked in the linked list implementation. When
does an error arise?
Exercise 10 (sample solution on page 23). Here’s a simple idea to check that a
linked list is acyclic: first, we make a copy p of the start pointer. Then when we
advance p we run through an auxiliary loop to check if its next element is already
in the list. The code would be something like this:
bool is_acyclic(list* start) {
  for (list* p = start; p != NULL; p = p->next)
  //@loop_invariant is_segment(start, p);
  {
    if (p == NULL) return true;
Exercise 11. We say “on the ith iteration of our naive is_segment loop, we know
that we can get from start to p by following exactly i pointers.” Write a func-
tion is_reachable_in(list* start, list* end, int numsteps); this
function should return true if we can get from start to end in exactly numsteps
steps. Use this function as a loop invariant for is_segment.
Exercise 12. What happens when we swap the order of the lines in the enq func-
tion and why?
Sample Solutions
Solution of exercise 1 The sum of all the elements in a list (or array) seg-
ment would be defined recursively as the first element plus the sum of
the rest of the segment. Then, it is natural to define the sum of an empty
segment to be zero. Thus, is_sum(start, end, n) on an empty segment
would return true exactly when n == 0.
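One way to turn this recursive description into code is sketched below in plain C, assuming elem is int and that the segment includes start but excludes end, as in the other segment functions. The caller is assumed to pass a valid segment, and the values are assumed small enough that the subtraction does not overflow (C0 integers wrap around, whereas signed overflow in C is undefined).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int elem;                 // assumption: elements are ints
typedef struct list_node list;
struct list_node {
  elem data;
  list* next;
};

// Assumes is_segment(start, end) holds.
bool is_sum(list* start, list* end, int sum) {
  if (start == end) return sum == 0;   // empty segment sums to zero
  // First element plus the sum of the rest must equal sum.
  return is_sum(start->next, end, sum - start->data);
}
```

On a segment holding 1, 2, 3 the call succeeds for sum 6 only, and on an empty segment it succeeds for sum 0 only.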
Solution of exercise 2
int lseg_len(list* start, list* end)
//@requires is_segment(start, end);
{
  int n = 0;
  for (list* p = start; p != end; p = p->next)
  //@loop_invariant p != NULL;
  {
    n++;
  }
  return n;
}
Solution of exercise 3
Solution of exercise 4
bool is_in_lseg(int x, list* start, list* end)
//@requires is_segment(start, end);
{
  for (list* p = start; p != end; p = p->next)
  //@loop_invariant p != NULL;
  {
    if (p->data == x) return true;
  }
  return false;
}
  int x = start->data;
  for (list* p = start->next; p != end; p = p->next)
  //@loop_invariant p != NULL;
  {
    if (x > p->data) return false;
    x = p->data;
  }
  return true;
}
Solution of exercise 5
int lseg_binsearch(int x, list* start, list* end)
//@requires is_sorted_lseg(start, end);
/*@ensures (\result == -1 && !is_in_lseg(x, start, end))
        || (0 <= \result && \result < lseg_len(start, end) &&
            ith(start, \result) == x); @*/
{
  int lo = 0;
  int hi = lseg_len(start, end);
Solution of exercise 10 The code does not work when the input list is a
self-loop, as in the following example:
int main() {
  list* a = alloc(list);
  a->next = a;                   // self loop
  assert(is_acyclic(a));
  return 0;
}