
Contents

Articles
Lamport timestamps
Matrix clocks
Vector clock
Application checkpointing

References
Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses
License

Lamport timestamps
Determining the order of events is difficult in a distributed computer system, since different nodes or processes will
typically not be perfectly synchronized. The Lamport timestamp is a simple algorithm that provides a partial ordering of
events with minimal overhead, and it conceptually provides a starting point for the more advanced vector clock
method.
Distributed algorithms such as resource synchronization often depend on some method of ordering events to
function. For example, consider a system with two processes and a disk. The processes send messages to each other,
and also send messages to the disk requesting access. The disk grants access in the order the messages were sent.
Now, imagine process 1 sends a message to the disk asking for access to write, and then sends a message to process
2 asking it to read. Process 2 receives the message, and as a result sends its own message to the disk. Now, due to
some timing delay, the disk receives both messages at the same time: how does it determine which message
happened-before the other? A logical clock algorithm provides a mechanism to determine facts about the order of
such events.
Leslie Lamport invented a simple mechanism by which the happened-before ordering can be captured numerically.
A Lamport logical clock is a monotonically incrementing software counter maintained in each process.
It follows some simple rules:
1. A process increments its counter before each event in that process;
2. When a process sends a message, it includes its counter value with the message;
3. On receiving a message, the receiver process sets its counter to be greater than the maximum of its own value and
the received value before it considers the message received.
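In code, these rules amount to only a few lines. The following is a minimal, illustrative sketch; the class and method names are assumptions made for the example, not part of the algorithm's definition:

import threading

class LamportClock:
    def __init__(self):
        self.time = 0
        self._lock = threading.Lock()

    def tick(self):
        # Rule 1: increment the counter before each local event.
        with self._lock:
            self.time += 1
            return self.time

    def send(self):
        # Rules 1 and 2: sending is an event, so tick first,
        # then attach the counter value to the outgoing message.
        return self.tick()

    def receive(self, received_time):
        # Rule 3: advance past the maximum of our value and the received value.
        with self._lock:
            self.time = max(self.time, received_time) + 1
            return self.time
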
Conceptually, this logical clock can be thought of as a clock that only has meaning in relation to messages moving
between processes. When a process receives a message, it resynchronizes its logical clock with that sender.
Considerations: For every two events a and b occurring in the same process, and C(x) being the timestamp for a
certain event x, it is necessary that C(a) never equal C(b).
Therefore it is necessary that:
1. The logical clock be set so that there is a minimum of one clock "tick" (increment of the counter) between events a
and b;
2. In a multiprocess or multithreaded environment, it might be necessary to attach the process ID (PID) or any other
unique ID to the timestamp so that it is possible to differentiate between events a and b which may occur
simultaneously in different processes.
Implications: A Lamport clock may be used to create a partial causal ordering of events between processes. Given a
logical clock following these rules, the following relation is true: if a → b then C(a) < C(b), where → means
"happened before."
This relation only goes one way, and is called the clock consistency condition: if one event comes before another, then
that event's logical clock comes before the other's. The strong clock consistency condition, which is two-way, can be
obtained by other techniques such as vector clocks. Using only a simple Lamport clock, only a partial causal
ordering can be inferred from the clock.
However, via the contrapositive, it's true that ¬(C(a) < C(b)) implies ¬(a → b). So, for example, if C(a) ≥ C(b),
then a cannot have happened before b.
Another way of putting this is that C(a) < C(b) means that a may have happened before b or at the same time as b,
but a did not happen after b.
Nevertheless, Lamport timestamps can be used to create a total ordering of events in a distributed system by using
some arbitrary mechanism to break ties (e.g. the ID of the process). The caveat is that this ordering is artifactual and
cannot be depended on to imply a causal relationship.
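For illustration only, one such tie-breaking scheme can be sketched as follows; the (timestamp, process ID) pairing is an assumed convention, not mandated by the text:

def total_order_key(timestamp, process_id):
    # Compare by Lamport timestamp first; break ties with the process ID.
    # The resulting total order is artifactual and implies no causality.
    return (timestamp, process_id)

events = [(3, "P2"), (3, "P1"), (1, "P2"), (2, "P1")]
print(sorted(events, key=lambda e: total_order_key(*e)))
# -> [(1, 'P2'), (2, 'P1'), (3, 'P1'), (3, 'P2')]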

Lamport's logical clock in distributed systems


• In a distributed system, it is not possible in practice to synchronize time across entities (typically thought of as
processes) within the system; hence, the entities can use the concept of a logical clock based on the events
through which they communicate.
• If two entities do not exchange any messages, then they probably do not need to share a common clock; events
occurring on those entities are termed concurrent events.
• Among the processes on the same local machine we can order the events based on the local clock of the system.
• When two entities communicate by message passing, then the send event is said to 'happen before' the receive
event, and the logical order can be established among the events.
• A distributed system is said to have a partial order if a partial order relationship can be established among the
events in the system. If 'totality' can be established, i.e., a causal relationship among all events in the system, then
the system is said to have a total order.

See also
• Vector clocks

References
• Leslie Lamport (1978). "Time, clocks, and the ordering of events in a distributed system" [1]. Communications of
the ACM 21 (7): 558–565. doi:10.1145/359545.359563.

References
[1] http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf

Matrix clocks
A matrix clock is a mechanism for capturing chronological and causal relationships in a distributed system.
Matrix clocks are a generalization of the notion of vector clocks.[1] A matrix clock maintains a vector of the vector
clocks for each communicating host.
Every time a message is exchanged, the sending host sends not only what it knows about the global state of time, but
also the state of time that it received from other hosts.
This allows establishing a lower bound on what other hosts know, and is useful in applications such as
checkpointing and garbage collection.
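As an informal illustration, a matrix clock for n hosts can be maintained as in the minimal sketch below; the class and method names are assumptions for the example, not an implementation from the cited paper:

class MatrixClock:
    # m[i][j] = what this host knows about host i's knowledge of host j's clock.
    # Row m[self_id] is this host's own vector clock.

    def __init__(self, n_hosts, self_id):
        self.n = n_hosts
        self.id = self_id
        self.m = [[0] * n_hosts for _ in range(n_hosts)]

    def send(self):
        # Tick our own clock and ship the whole matrix with the message.
        self.m[self.id][self.id] += 1
        return [row[:] for row in self.m]

    def receive(self, sender_id, sender_matrix):
        # Absorb everything the sender knew, element-wise.
        for i in range(self.n):
            for j in range(self.n):
                self.m[i][j] = max(self.m[i][j], sender_matrix[i][j])
        # Our own vector clock additionally absorbs the sender's vector clock.
        for j in range(self.n):
            self.m[self.id][j] = max(self.m[self.id][j], sender_matrix[sender_id][j])
        self.m[self.id][self.id] += 1

    def lower_bound(self, j):
        # Every host is known to have seen at least this many events of host j;
        # this is the bound used for checkpointing and garbage collection.
        return min(self.m[i][j] for i in range(self.n))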

References
[1] Drummond, Lúcia M. A.; Barbosa, Valmir C. (2003). "On reducing the complexity of matrix clocks". Parallel Computing 29 (7): 895–905.
doi:10.1016/S0167-8191(03)00066-8.

Vector clock
Vector clocks are an algorithm for generating a partial ordering of events in a distributed system and detecting
causality violations. Just as in Lamport timestamps, interprocess messages contain the state of the sending process's
logical clock. A vector clock of a system of N processes is an array/vector of N logical clocks, one clock per process;
a local "smallest possible values" copy of the global clock array is kept in each process, with the following rules for
clock updates:
• Initially all clocks are zero.
• Each time a process experiences an internal event, it increments its own logical clock in the vector by one.
• Each time a process prepares to send a message, it increments its own logical clock in the vector by one and then
sends its entire vector along with the message being sent.
• Each time a process receives a message, it increments its own logical clock in the vector by one and updates each
element in its vector by taking the maximum of the value in its own vector clock and the value in the vector in the
received message (for every element).

[Figure: Example of a system of vector clocks]
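
These update rules can be sketched in a few lines of Python; the class and method names below are assumptions for illustration:

class VectorClock:
    def __init__(self, n_processes, pid):
        self.pid = pid
        self.clock = [0] * n_processes   # initially all clocks are zero

    def internal_event(self):
        # Internal event: tick only our own component.
        self.clock[self.pid] += 1

    def send(self):
        # Tick, then attach a copy of the whole vector to the message.
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, message_clock):
        # Tick, then take the element-wise maximum with the received vector.
        self.clock[self.pid] += 1
        self.clock = [max(own, msg) for own, msg in zip(self.clock, message_clock)]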

The vector clocks algorithm was independently developed by Colin Fidge [1] and Friedemann Mattern in 1988. [2] [3]

Partial ordering property


Vector clocks allow for the partial causal ordering of events. Defining the following:
• Let VC(x) denote the vector clock of event x, and VC(x)[z] its component for process z.
• VC(x) < VC(y) if and only if VC(x)[z] is less than or equal to VC(y)[z] for all indices z, and there exists an index
z' such that VC(x)[z'] is strictly less than VC(y)[z'].
• Let x → y denote that event x happened before event y. It is defined as: if x → y, then VC(x) < VC(y).
Properties:
• If VC(a) < VC(b), then a → b
• Antisymmetry: if VC(a) < VC(b), then ¬(VC(b) < VC(a))
• Transitivity: if VC(a) < VC(b) and VC(b) < VC(c), then VC(a) < VC(c); or, if a → b and b → c, then a → c
Relation with other orders:
• Let RT(x) be the real time when event x occurs. If VC(a) < VC(b), then RT(a) < RT(b)
• Let C(x) be the Lamport timestamp of event x. If VC(a) < VC(b), then C(a) < C(b)
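As an illustration of the partial-order test (a sketch assuming clocks are plain lists of equal length; the function names are made up for the example), the comparison and a concurrency check can be written as:

def vc_less(a, b):
    # VC(a) < VC(b): every component <= and at least one strictly <.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def concurrent(a, b):
    # Neither clock dominates the other: the events are causally unrelated.
    return not vc_less(a, b) and not vc_less(b, a)

print(vc_less([1, 2, 0], [2, 2, 1]))     # True
print(concurrent([1, 0, 0], [0, 1, 0]))  # True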

Other mechanisms
• Torres-Rojas and Ahamad developed Plausible Clocks in 1999,[4] a mechanism that takes less space than vector
clocks but that, in some cases, will totally order events that are causally concurrent.
• Almeida et al. introduced Interval Tree Clocks in 2008.[5] This mechanism generalizes vector clocks and allows
operation in dynamic environments where the identities and number of processes in the computation are not
known in advance.

See also
• Lamport timestamps
• Version vector
• Matrix clocks

References
[1] http://sky.scitech.qut.edu.au/~fidgec/
[2] Colin J. Fidge (February 1988). "Timestamps in Message-Passing Systems That Preserve the Partial Ordering"
(http://sky.scitech.qut.edu.au/~fidgec/Publications/fidge88a.pdf). In K. Raymond (Ed.). pp. 56–66. Retrieved 2009-02-13.
[3] Mattern, F. (October 1988), "Virtual Time and Global States of Distributed Systems", in Cosnard, M., Proc. Workshop on Parallel and
Distributed Algorithms, Chateau de Bonas, France: Elsevier, pp. 215–226
[4] Torres-Rojas, Francisco; Ahamad, Mustaque (1999), "Plausible clocks: constant size logical clocks for distributed systems", Distributed
Computing (Springer Verlag) 12 (4): 179–195, doi:10.1007/s004460050065
[5] Almeida, Paulo; Baquero, Carlos; Fonte, Victor (2008), "Interval Tree Clocks: A Logical Clock for Dynamic Systems"
(http://gsd.di.uminho.pt/members/cbm/ps/itc2008.pdf), Principles of Distributed Systems, Lecture Notes in Computer Science 5401,
Springer-Verlag, pp. 259–274, doi:10.1007/978-3-540-92221-6

External links
• Vector clock implementation in Erlang (http://bitbucket.org/justin/riak/src/tip/apps/riak/src/vclock.erl)
• Timestamp-based vector clock implementation in Erlang (http://github.com/cliffmoon/dynomite/blob/master/elibs/vector_clock.erl)
• Explanation of Vector clocks (http://blog.basho.com/2010/01/29/why-vector-clocks-are-easy/)
• Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems (http://net.pku.edu.cn/~course/cs501/2008/reading/a_tour_vc.html)
• Why Vector Clocks are Hard (http://blog.basho.com/2010/04/05/why-vector-clocks-are-hard/)

Application checkpointing
Checkpointing is a technique for inserting fault tolerance into computing systems. It basically consists of storing a
snapshot of the current application state, and later using it to restart the execution in case of failure.
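As a toy, user-space illustration of the idea (the file name, state layout, and helper functions below are assumptions made for the example, not part of any particular package):

import os
import pickle

CHECKPOINT_FILE = "state.ckpt"   # assumed name, for illustration only

def save_checkpoint(state):
    # Write atomically so a crash mid-save cannot corrupt the last checkpoint.
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT_FILE)

def load_checkpoint(default):
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE, "rb") as f:
            return pickle.load(f)
    return default

# A long-running loop that can resume after a failure:
state = load_checkpoint({"next_item": 0, "partial_sum": 0})
for i in range(state["next_item"], 1_000_000):
    state["partial_sum"] += i
    state["next_item"] = i + 1
    if i % 100_000 == 0:
        save_checkpoint(state)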

Technique properties
There are many different points of view and techniques for achieving application checkpointing. Depending on the
specific implementation, a tool can be classified as having several properties:
• Amount of state saved: This property refers to the abstraction level used by the technique to analyze an
application. It can range from seeing each application as a black box, hence storing all application data, to
selecting only specific relevant parts of the data in order to achieve a more efficient and portable operation.
• Automation level: The effort needed to achieve fault tolerance through the use of a specific checkpointing
solution.
• Portability: Whether or not the saved state can be used on different machines to restart the application.
• System architecture: How the checkpointing technique is implemented: inside a library, by the compiler, or at
the operating-system level.
Each design decision made affects the properties and efficiency of the final product. For instance, deciding to store
the entire application state will allow for a more straightforward implementation, since no analysis of the application
will be needed, but it will prevent the generated state files from being portable, due to a number of non-portable
structures (such as the application stack or heap) being stored along with application data.

Use in distributed shared memory systems


In distributed shared memory systems, checkpointing is a technique that helps tolerate errors that would otherwise
lose the work of long-running applications. The main property that checkpointing techniques must provide in such
systems is preserving system consistency in case of failure. There are two main approaches to checkpointing
in such systems: coordinated checkpointing, in which all cooperating processes work together to establish a coherent
checkpoint, and communication-induced (also called dependency-induced) independent checkpointing.
It must be stressed that simply forcing processes to checkpoint their state at fixed time intervals is not sufficient to
ensure global consistency. Even if we postulate the existence of a global clock, the checkpoints made by different
processes still may not form a consistent state. The need for establishing a consistent state may force other processes
to roll back to their checkpoints, which in turn may cause other processes to roll back to even earlier checkpoints,
which in the most extreme case may mean that the only consistent state found is the initial state (the so-called
domino effect).

In the coordinated checkpointing approach, processes must ensure that their checkpoints are consistent. This is
usually achieved by some kind of two-phase commit protocol algorithm. In communication induced checkpointing,
each process checkpoints its own state independently whenever this state is exposed to other processes (that is, for
example whenever a remote process reads the page written to by the local process).
The system state may be saved either locally, in stable storage, or in a distant node's memory.
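For a concrete feel of the coordinated approach, here is a minimal coordinator-side sketch in the style of a two-phase commit; the process interface (the prepare/commit/abort methods) is an assumption made purely for illustration:

def coordinated_checkpoint(processes):
    # Phase 1: ask every process to take a tentative checkpoint and to hold
    # or log outgoing application messages until the outcome is decided.
    ready = [p.prepare_checkpoint() for p in processes]

    # Phase 2: commit only if every process succeeded; otherwise discard.
    if all(ready):
        for p in processes:
            p.commit_checkpoint()   # tentative checkpoint becomes permanent
        return True
    for p in processes:
        p.abort_checkpoint()        # fall back to the previous stable checkpoint
    return False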

Practical implementations for Linux/Unix


A number of practical checkpointing packages have been developed for the Linux/Unix family of operating systems.
These checkpointing packages may be divided into two classes: those which operate in user space, examples of
which include the checkpointing package used by Condor [1] and the portable checkpointing library [2] developed by
the University of Tennessee, and those which operate in the kernel. User-space checkpointing packages are highly
portable and can typically be compiled and run on any modern Unix (e.g. Linux, FreeBSD [3], OpenBSD [4],
Darwin [5], etc.). In contrast, kernel-based checkpointing packages such as Chpox [6] and the checkpointing
algorithms developed for the MOSIX [7] cluster computing environment tend to be highly operating-system
dependent. Most kernel-based checkpointing packages developed to date run under either the 2.4 or 2.6 subfamilies
of the Linux kernel [8] on i686 architectures.

Cryopid
Modern checkpointing packages such as Cryopid are capable of checkpointing a process pod, that is, a parent
process and all its associated children, and of dealing with file system abstractions such as sockets and pipes (FIFOs)
in addition to regular files. In the case of Cryopid, there is also provision to roll all dynamic libraries, open files,
sockets and FIFOs associated with the process into the checkpoint. This is very useful when the checkpointed
process is to be restarted in a heterogeneous environment (e.g. the machine on which the checkpoint is restarted has
libraries and a file system which differ from the host on which the process was checkpointed). The original Cryopid [9]
appears to be unmaintained and does not compile on recent (2.6.22) Linux x86-64 kernels.
Cryopid is, however, still maintained and is now available via the SourceForge project Cryopid2 [10]. This version of
Cryopid will compile on all 32-bit Linux kernels up to 2.6.27. Work is in hand to get Cryopid2 working
on 64-bit kernels. The Cryopid2 package extends Bernard Blackham's original Cryopid package in a number of
significant ways. For example, it allows the state of Linux real-time signals to be preserved when a checkpoint is
taken, and it is also capable of inter-operating with ssh via a portal daemon in order to implement full process
migration (of the process and of any associated pod processes) between Linux hosts. Cryopid2 also has the capability to roll up its
environment (e.g. the bodies of open files) into the checkpoints it produces. This facilitates the migration of
processes onto foreign hosts which present an arbitrary file system environment to an inbound migrating process.
Pipes are preserved in a similar manner: their contents are drained into the migrating process prior to migration
(so they form part of the checkpoint) and written back into the kernel of the new host when execution is resumed.
Cryopid2 is inter-operable with the P3 [11] Organic computing environment, which uses its services for both
persistence and process migration.

DMTCP
DMTCP [12] (Distributed MultiThreaded Checkpointing) is a tool for transparently checkpointing the state of an
arbitrary group of programs spread across many machines and connected by sockets. It does not modify the user's
program or the operating system.
Among the applications supported by DMTCP are Open MPI, MATLAB, Python, Perl, and many other programming
and shell scripting languages. With the use of TightVNC, it can also checkpoint and restart X Window
applications, as long as they do not use extensions (e.g. no OpenGL, no video). Among the Linux features supported
by DMTCP are open file descriptors, pipes, sockets, signal handlers, process id and thread id virtualization (ensuring
old pids and tids continue to work upon restart), ptys, fifos, process group ids, session ids, terminal attributes, and
mmap/mprotect (including mmap-based shared memory). See the QUICK-START file of the distribution for further
details.
DMTCP is also the basis for URDB [13], the Universal Reversible Debugger. URDB is still experimental.
Nevertheless, it currently adds reversibility to gdb, MATLAB, Python (pdb), and Perl (perl -d). It also supports
reverse expression watchpoints, a form of temporal search within a process lifetime.

OpenVZ
The OpenVZ kernel has the ability to checkpoint and restart a virtual private server (VPS), i.e. a set of processes and all
the data structures associated with those processes (opened files, sockets, IPC objects, network connections, etc.).
The primary use of checkpointing is "live migration", a move of a VPS from one physical server to another without the
need to shut it down and restart it. OpenVZ supports checkpointing on x86, x86-64 and IA-64 architectures.

References
• E.N. Elnozahy, L. Alvisi, Y-M. Wang, and D.B. Johnson, "A survey of rollback-recovery protocols in
message-passing systems", ACM Comput. Surv., vol. 34, no. 3, pp. 375-408, 2002.
• The Home of Checkpointing Packages [14]
• Yibei Ling, Jie Mi, Xiaola Lin: A Variational Calculus Approach to Optimal Checkpoint Placement. IEEE Trans.
Computers 50(7): 699-708 (2001)
• R.E. Ahmed, R.C. Frazier, and P.N. Marinos, " Cache-Aided Rollback Error Recovery (CARER) Algorithms for
Shared-Memory Multiprocessor Systems", IEEE 20th International Symposium on Fault-Tolerant Computing
(FTCS-20), Newcastle upon Tyne, UK, June 26-28, 1990, pp. 82-88.
[1] http://www.cs.wisc.edu/condor/
[2] http://www.cs.utk.edu/~plank/ckp.html
[3] http://www.freebsd.org/
[4] http://www.openbsd.org/
[5] http://www.opensource.apple.com/
[6] http://directory.fsf.org/all/chpox.html
[7] http://www.mosix.org/
[8] http://www.kernel.org/
[9] http://cryopid.berlios.de/
[10] http://sourceforge.net/projects/cryopid2
[11] http://sourceforge.net/projects/pupsp3
[12] http://dmtcp.sourceforge.net
[13] http://urdb.sourceforge.net
[14] http://www.checkpointing.org/

Article Sources and Contributors


Lamport timestamps  Source: http://en.wikipedia.org/w/index.php?oldid=382044605  Contributors: 4nT0, Ascorbic, CarlHewitt, ChrisCork, Fjzpwq1385, HappyCamper, Jadedcrypto, Jajmon,
Javifs, Jpbowen, Jrauser, Kbdank71, LLORT, Mattford63, Meghanac, Miym, Nadalle, Oyp, Raul654, Rpvdk, Ruud Koot, Svdb, TubularWorld, Utopianheaven, Who, Yworo, 17 anonymous edits

Matrix clocks  Source: http://en.wikipedia.org/w/index.php?oldid=333089990  Contributors: Dialectric, Marokwitz, Miym, Oyp, PamD, SarekOfVulcan, WaysToEscape

Vector clock  Source: http://en.wikipedia.org/w/index.php?oldid=383886140  Contributors: Andreas Kaufmann, Argv0, Codahale, Ej, Ericbg05, Finlay McWalter, Itistoday, Marokwitz,
MikeHearn, Nae'blis, Nitroshockwave, RHaworth, Rafat ahmad ali, Rchandra, Rjwilmsi, Rs-leo, Ruud Koot, Sander, Svdb, Utopianheaven, Xemal, 39 anonymous edits

Application checkpointing  Source: http://en.wikipedia.org/w/index.php?oldid=381031649  Contributors: Aktsu, Betacommand, Brossow, CharlesDexterWard, Cmh, Crazyvas, Dakart,
Debresser, Edward, Euchiasmus, Gazpacho, Grammaticus Repairo, Gregbard, Jerryobject, JonHarder, K001, Karya0, Khym Chanur, MathiasRav, Mortense, Schapel, Silly rabbit, Snowmanradio,
Svick, Szopen, TheParanoidOne, 51 anonymous edits

Image Sources, Licenses and Contributors


Image:Vector Clock.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Vector_Clock.svg  License: GNU Free Documentation License  Contributors: Nae'blis

License
Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/
