Articles: Lamport Timestamps 1 Matrix Clocks 3 Vector Clock 3 Application Checkpointing 5
Articles
Lamport timestamps
Matrix clocks
Vector clock
Application checkpointing
References
Article Sources and Contributors
Image Sources, Licenses and Contributors
Article Licenses
License
Lamport timestamps
Determining the order of events is difficult in a distributed computer system, since different nodes or processes will
typically not be perfectly synchronized. The Lamport timestamp algorithm is a simple algorithm that provides a partial
ordering of events with minimal overhead, and conceptually provides a starting point for the more advanced vector
clock method.
Distributed algorithms such as resource synchronization often depend on some method of ordering events to
function. For example, consider a system with two processes and a disk. The processes send messages to each other,
and also send messages to the disk requesting access. The disk grants access in the order the messages were sent.
Now, imagine process 1 sends a message to the disk asking for access to write, and then sends a message to process
2 asking it to read. Process 2 receives the message, and as a result sends its own message to the disk. Now, due to
some timing delay, the disk receives both messages at the same time: how does it determine which message
happened-before the other? A logical clock algorithm provides a mechanism to determine facts about the order of
such events.
Leslie Lamport invented a simple mechanism by which the happened-before ordering can be captured numerically.
A Lamport logical clock is a monotonically increasing software counter maintained in each process.
It follows some simple rules:
1. A process increments its counter before each event in that process;
2. When a process sends a message, it includes its counter value with the message;
3. On receiving a message, the receiver process sets its counter to be greater than the maximum of its own value and
the received value before it considers the message received.
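The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name `LamportClock` and its method names are assumptions made for the example, and messages are modelled simply as the integer timestamp they carry.

```python
class LamportClock:
    """Logical clock for a single process (illustrative sketch)."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        # Rule 1: increment the counter before each local event.
        self.counter += 1
        return self.counter

    def send(self):
        # Rule 2: sending is an event; the counter value travels with the message.
        return self.tick()

    def receive(self, received):
        # Rule 3: move past both the local value and the received value.
        self.counter = max(self.counter, received) + 1
        return self.counter


p1, p2 = LamportClock(), LamportClock()
p1.tick()                    # local event on p1, counter becomes 1
stamp = p1.send()            # p1 sends a message stamped 2
p2_time = p2.receive(stamp)  # p2 jumps to max(0, 2) + 1 = 3
```

Using `max(...) + 1` on receipt is one simple way to satisfy "greater than the maximum"; any larger value would also satisfy the rule.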
Conceptually, this logical clock can be thought of as a clock that only has meaning in relation to messages moving
between processes. When a process receives a message, it resynchronizes its logical clock with that sender.
Considerations: For every two events a and b occurring in the same process, and C(x) being the timestamp for a
certain event x, it is necessary that C(a) never equal C(b).
Therefore it is necessary that:
1. The logical clock be set so that there is minimum of one clock "tick" (increment of the counter) between events a
and b;
2. In a multiprocess or multithreaded environment, it might be necessary to attach the process ID (PID) or any other
unique ID to the timestamp so that it is possible to differentiate between events a and b which may occur
simultaneously in different processes.
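One way to attach a unique ID, sketched here under the assumption that each process has an integer ID, is to treat a timestamp as a (counter, pid) pair and compare pairs lexicographically. The `Stamp` type and the sample values are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    counter: int  # Lamport counter value at the event
    pid: int      # hypothetical unique process ID, used only to break ties


# Two events share counter value 3 but come from different processes.
events = [Stamp(3, 2), Stamp(3, 1), Stamp(1, 2), Stamp(2, 1)]

# Tuples compare field by field, so equal counters are ordered by pid.
ordered = sorted(events)
```

With this comparison no two events from different processes ever carry an equal timestamp, although the order chosen between equal counters is arbitrary.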
Implications: A Lamport clock may be used to create a partial causal ordering of events between processes. Given a
logical clock following these rules, the following relation is true: if a → b then C(a) < C(b), where → means
"happened before."
This relation only goes one way, and is called the clock consistency condition: if one event comes before another, then
that event's logical clock comes before the other's. The strong clock consistency condition, which is two way, can be
obtained by other techniques such as vector clocks. Using only a simple Lamport clock, only a partial causal
ordering can be inferred from the clock.
However, via the contrapositive, it's true that C(a) ≮ C(b) implies a ↛ b. So, for example, if C(a) ≥ C(b)
then a cannot have occurred before b.
Another way of putting this is that C(a) < C(b) means that a may have happened before or at the same time as b,
but a did not happen after b.
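A small worked example may help show why the clock values alone cannot establish causality. The scenario is hypothetical: two processes that never exchange messages, each with its own counter.

```python
# Two processes that never communicate (hypothetical scenario).
clock_p1 = 0
clock_p2 = 0

clock_p1 += 1   # event a on process 1
C_a = clock_p1

clock_p2 += 1   # an unrelated event on process 2
clock_p2 += 1   # event b on process 2
C_b = clock_p2

# The timestamp of a is smaller than that of b, yet no message links the
# two processes, so a cannot have influenced b: the events are concurrent.
smaller_timestamp = C_a < C_b
```

Here the smaller timestamp rules out b having happened before a, but it does not show that a happened before b.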
Nevertheless, Lamport timestamps can be used to create a total ordering of events in a distributed system by using
some arbitrary mechanism to break ties (e.g. the ID of the process). The caveat is that this ordering is artifactual and
cannot be depended on to imply a causal relationship.
See also
• Vector clocks
References
• Leslie Lamport (1978). "Time, clocks, and the ordering of events in a distributed system"
(http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf). Communications of the ACM 21 (7): 558–565.
doi:10.1145/359545.359563.
Matrix clocks
A matrix clock is a mechanism for capturing chronological and causal relationships in a distributed system.
Matrix clocks are a generalization of the notion of vector clocks.[1] A matrix clock maintains a vector of the vector
clocks for each communicating host.
Every time a message is exchanged, the sending host sends not only what it knows about the global state of time, but
also the state of time that it received from other hosts.
This allows each host to establish a lower bound on what the other hosts know, which is useful in applications such as
checkpointing and garbage collection.
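The following sketch shows one common formulation of a matrix clock. The update rules are not spelled out in the text above, so they should be read as an assumption: each host keeps one row per host, its own row acts as its vector clock, and the names `MatrixClock` and `known_by_all` are invented for the example.

```python
class MatrixClock:
    """Matrix clock for host `pid` in a system of `n` hosts (sketch)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.n = n
        # rows[j] is this host's latest knowledge of host j's vector clock;
        # rows[pid] is this host's own vector clock.
        self.rows = [[0] * n for _ in range(n)]

    def send(self):
        # Sending is an event: advance our own entry, then ship the whole matrix.
        self.rows[self.pid][self.pid] += 1
        return self.pid, [row[:] for row in self.rows]

    def receive(self, sender, matrix):
        # Merge everything the sender knew about every host.
        for j in range(self.n):
            for k in range(self.n):
                self.rows[j][k] = max(self.rows[j][k], matrix[j][k])
        # Our own vector also absorbs the sender's vector, then ticks.
        own = self.rows[self.pid]
        for k in range(self.n):
            own[k] = max(own[k], matrix[sender][k])
        own[self.pid] += 1

    def known_by_all(self, k):
        # Lower bound on how much of host k's history every host has seen.
        return min(row[k] for row in self.rows)


a, b, c = MatrixClock(0, 3), MatrixClock(1, 3), MatrixClock(2, 3)
b.receive(*a.send())          # a -> b
before = b.known_by_all(0)    # b cannot yet tell whether c has heard from a
c.receive(*b.send())          # b -> c, carrying what b learned from a
after = c.known_by_all(0)     # c now knows every host has seen a's first event
```

The minimum over a column is the lower bound mentioned above: information about events at or below that bound is known everywhere, so, for example, log entries describing them could be discarded.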
References
[1] Drummond, Lúcia M. A.; Barbosa, Valmir C. (2003). "On reducing the complexity of matrix clocks". Parallel Computing 29 (7): 895–905.
doi:10.1016/S0167-8191(03)00066-8.
Vector clock
The vector clock algorithm generates a partial ordering of events in a distributed system and detects
causality violations. Just as in Lamport timestamps, interprocess messages contain the state of the sending process's
logical clock. A vector clock of a system of N processes is an array/vector of N logical clocks, one clock per process;
a local "smallest possible values" copy of the global clock-array is kept in each process, with the following rules for
clock updates:
• Initially all clocks are zero.
• Each time a process experiences an internal event, it increments its own logical clock in the vector by one.
• Each time a process prepares to send a message, it increments its own logical clock in the vector by one
and then sends its entire vector along with the message being sent.
• Each time a process receives a message, it increments its own logical clock in the vector by one and updates
each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector
in the received message (for every element).
Figure: Example of a system of vector clocks
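The update rules listed above can be sketched as follows. The class and function names are assumptions made for the example. The comparison helpers `happened_before` and `concurrent` are not part of the rules above; they use the usual element-wise comparison of vectors and are included only to show how the clocks are typically read.

```python
class VectorClock:
    """Vector clock for process `pid` in a system of `n` processes (sketch)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n          # initially all clocks are zero

    def internal_event(self):
        self.v[self.pid] += 1

    def send(self):
        # Increment our own entry, then send a copy of the entire vector.
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, other):
        # Increment our own entry and take the element-wise maximum.
        self.v[self.pid] += 1
        self.v = [max(x, y) for x, y in zip(self.v, other)]


def happened_before(u, v):
    # u precedes v if no entry of u exceeds v's and the vectors differ.
    return all(x <= y for x, y in zip(u, v)) and u != v


def concurrent(u, v):
    return u != v and not happened_before(u, v) and not happened_before(v, u)


p0, p1, p2 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
m = p0.send()            # [1, 0, 0]
p2.internal_event()      # p2.v is [0, 0, 1], unrelated to the message
p1.receive(m)            # p1.v becomes [1, 1, 0]

send_before_receive = happened_before(m, p1.v)
unrelated = concurrent(m, p2.v)
```

Unlike a Lamport timestamp, the comparison here works in both directions: two vectors that are not ordered either way identify events with no causal link.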
The vector clocks algorithm was independently developed by Colin Fidge [1] and Friedemann Mattern in 1988. [2] [3]
Other mechanisms
• In 1999, Torres-Rojas and Ahamad developed Plausible Clocks[4], a mechanism that takes less space than vector
clocks but that, in some cases, will totally order events that are causally concurrent.
• In 2008, Almeida et al. introduced Interval Tree Clocks[5]. This mechanism generalizes vector clocks and allows
operation in dynamic environments when the identities and number of processes in the computation are not known
in advance.
See also
• Lamport timestamps
• Version vector
• Matrix clocks
References
[1] http://sky.scitech.qut.edu.au/~fidgec/
[2] Colin J. Fidge (February 1988). "Timestamps in Message-Passing Systems That Preserve the Partial Ordering"
(http://sky.scitech.qut.edu.au/~fidgec/Publications/fidge88a.pdf). In K. Raymond (Ed.). pp. 56–66. Retrieved 2009-02-13.
[3] Mattern, F. (October 1988), "Virtual Time and Global States of Distributed Systems", in Cosnard, M., Proc. Workshop on Parallel and
Distributed Algorithms, Chateau de Bonas, France: Elsevier, pp. 215–226
[4] Torres-Rojas, Francisco; Ahamad, Mustaque (1999), "Plausible clocks: constant size logical clocks for distributed systems", Distributed
Computing (Springer Verlag) 12 (4): 179–195, doi:10.1007/s004460050065
[5] Almeida, Paulo; Baquero, Carlos; Fonte, Victor (2008), "Interval Tree Clocks: A Logical Clock for Dynamic Systems"
(http://gsd.di.uminho.pt/members/cbm/ps/itc2008.pdf), Principles of Distributed Systems, 5401, Springer-Verlag, Lecture Notes in
Computer Science, pp. 259–274, doi:10.1007/978-3-540-92221-6
External links
• Vector clock implementation in Erlang (http://bitbucket.org/justin/riak/src/tip/apps/riak/src/vclock.erl)
• Timestamp-based vector clock implementation in Erlang
(http://github.com/cliffmoon/dynomite/blob/master/elibs/vector_clock.erl)
• Explanation of Vector clocks (http://blog.basho.com/2010/01/29/why-vector-clocks-are-easy/)
• Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems
(http://net.pku.edu.cn/~course/cs501/2008/reading/a_tour_vc.html)
• Why Vector Clocks are Hard (http://blog.basho.com/2010/04/05/why-vector-clocks-are-hard/)
Application checkpointing
Checkpointing is a technique for adding fault tolerance to computing systems. It consists of storing a
snapshot of the current application state and later using it to restart execution in case of failure.
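As a rough illustration of the idea, the sketch below checkpoints a toy computation at the application level. It assumes the whole state fits in a small dictionary that can be serialized with `pickle`; the function names, the file name `state.ckpt` and the `crash_at` parameter (used to simulate a failure) are all hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    # Write to a temporary file and rename it, so that a crash during the
    # write never leaves a half-written checkpoint at `path`.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path, default):
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, limit, crash_at=None):
    # Resume from the last snapshot if one exists, otherwise start fresh.
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < limit:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(state, path)
    return state["total"]


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "state.ckpt")
    try:
        run(ckpt, 10, crash_at=5)       # fails part-way through
    except RuntimeError:
        pass
    resumed_from = load_checkpoint(ckpt, None)["i"]
    result = run(ckpt, 10)              # restarts from the snapshot
```

The second call does not repeat the first five iterations; it continues from the state stored by the last successful checkpoint.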
Technique properties
There are many different points of view and techniques for achieving application checkpointing. Depending on the
specific implementation, a tool can be classified as having several properties:
• Amount of state saved: This property refers to the abstraction level used by the technique to analyze an
application. It can range from seeing each application as a black box, hence storing all application data, to
selecting specific relevant cores of data in order to achieve a more efficient and portable operation.
• Automation level: The effort needed to achieve fault tolerance through the use of a specific
checkpointing solution.
• Portability: Whether or not the saved state can be used on different machines to restart the application.
• System architecture: How the checkpointing technique is implemented: inside a library, by the compiler, or at
operating system level.
Each design decision affects the properties and efficiency of the final product. For instance, deciding to store
the entire application state allows for a more straightforward implementation, since no analysis of the application
is needed, but it sacrifices the portability of the generated state files, because a number of non-portable structures
(such as the application stack or heap) are stored along with application data.
In the coordinated checkpointing approach, processes must ensure that their checkpoints are consistent. This is
usually achieved by some kind of two-phase commit protocol. In communication-induced checkpointing,
each process checkpoints its own state independently whenever this state is exposed to other processes (for
example, whenever a remote process reads a page written to by the local process).
The system state may be saved either locally, in stable storage, or in a distant node's memory.
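The coordinated approach described above can be sketched as an in-memory simulation of a two-phase protocol. This is a simplified illustration under several assumptions: the `Process` class, its `healthy` flag (used to simulate a participant that cannot checkpoint) and the function `coordinated_checkpoint` are invented for the example, and handling of messages in flight between the two phases is left out.

```python
class Process:
    def __init__(self, name, state, healthy=True):
        self.name = name
        self.state = state
        self.healthy = healthy   # hypothetical flag simulating a failed participant
        self.tentative = None    # checkpoint taken in phase 1, not yet permanent
        self.committed = None    # last checkpoint agreed on by all processes

    def prepare(self):
        # Phase 1: take a tentative checkpoint and vote.
        if not self.healthy:
            return False
        self.tentative = dict(self.state)
        return True

    def commit(self):
        # Phase 2 (all voted yes): the tentative checkpoint becomes permanent.
        self.committed, self.tentative = self.tentative, None

    def abort(self):
        # Phase 2 (someone voted no): discard it and keep the previous checkpoint.
        self.tentative = None


def coordinated_checkpoint(processes):
    votes = [p.prepare() for p in processes]
    if all(votes):
        for p in processes:
            p.commit()
        return True
    for p in processes:
        p.abort()
    return False


procs = [Process("p1", {"x": 1}), Process("p2", {"x": 2})]
first = coordinated_checkpoint(procs)    # both vote yes, checkpoint commits

procs[0].state["x"] = 10
procs[1].healthy = False
second = coordinated_checkpoint(procs)   # p2 votes no, round is aborted
```

After the aborted round every process still holds the checkpoint from the first round, so the set of committed checkpoints remains mutually consistent.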
Cryopid
Modern checkpointing packages such as Cryopid are capable of checkpointing a process pod, that is, a parent
process and all its associated children, and of dealing with file system abstractions such as sockets and pipes (FIFOs)
in addition to regular files. In the case of Cryopid, there is also provision to roll all dynamic libraries, open files,
sockets and FIFOs associated with the process into the checkpoint. This is very useful when the checkpointed
process is to be restarted in a heterogeneous environment (e.g. the machine on which the checkpoint is restarted has
libraries and a file system which differ from those of the host on which the process was checkpointed). Note that the
original Cryopid [9] seems to be unmaintained and does not compile on recent (2.6.22) Linux x86-64 kernels.
Note that Cryopid is still maintained and is now available via the SourceForge project Cryopid2 [10]. This version of
Cryopid will compile on all Linux kernels up to 2.6.27 for 32-bit kernels. Work is under way to get Cryopid2 working
on 64-bit kernels. The Cryopid2 package extends Bernard Blackham's original Cryopid package in a number of
significant ways. For example, it allows the state of Linux real-time signals to be preserved when a checkpoint is
taken, and it can inter-operate with ssh via a portal daemon in order to implement full migration of a process
(and of any associated pod processes) between Linux hosts. Cryopid2 also has the capability to roll up its
environment (e.g. the bodies of open files) into the checkpoints it produces. This facilitates the migration of
processes onto foreign hosts which present an arbitrary file system environment to an inbound migrating process.
Pipes are also preserved in a similar manner: their contents are read into the migrating process prior to migration
(so they form part of the checkpoint) and written back into the kernel of the new host when execution is resumed.
Cryopid2 is inter-operable with the P3 [11] Organic computing environment, which uses its services for both
persistence and process migration.
DMTCP
DMTCP [12] (Distributed MultiThreaded Checkpointing) is a tool for transparently checkpointing the state of an
arbitrary group of programs spread across many machines and connected by sockets. It modifies neither the user's
program nor the operating system.
Among the applications supported by DMTCP are Open MPI, MATLAB, Python, Perl, and many programming
languages and shell scripting languages. With the use of TightVNC, it can also checkpoint and restart X Window
applications, as long as they do not use extensions (e.g.: no OpenGL, no video). Among the Linux features supported
by DMTCP are open file descriptors, pipes, sockets, signal handlers, process id and thread id virtualization (ensure
old pids and tids continue to work upon restart), ptys, fifos, process group ids, session ids, terminal attributes, and
mmap/mprotect (including mmap-based shared memory). See the QUICK-START file of the distribution for further
details.
DMTCP is also the basis for URDB [13], the Universal Reversible Debugger. URDB is still experimental.
Nevertheless, it currently adds reversibility to gdb, MATLAB, Python (pdb), and Perl (perl -d). It also supports
reverse expression watchpoints, a form of temporal search within a process lifetime.
OpenVZ
The OpenVZ kernel is able to checkpoint and restart a virtual private server (VPS), i.e. a set of processes and all
the data structures associated with those processes (opened files, sockets, IPC objects, network connections, etc.).
The primary use of checkpointing is "live migration", moving a VPS from one physical server to another without
needing to shut it down and restart it. OpenVZ supports checkpointing on x86, x86-64 and IA-64 architectures.
References
• E.N. Elnozahy, L. Alvisi, Y-M. Wang, and D.B. Johnson, "A survey of rollback-recovery protocols in
message-passing systems", ACM Comput. Surv., vol. 34, no. 3, pp. 375-408, 2002.
• The Home of Checkpointing Packages [14]
• Yibei Ling, Jie Mi, Xiaola Lin: A Variational Calculus Approach to Optimal Checkpoint Placement. IEEE Trans.
Computers 50(7): 699-708 (2001)
• R.E. Ahmed, R.C. Frazier, and P.N. Marinos, "Cache-Aided Rollback Error Recovery (CARER) Algorithms for
Shared-Memory Multiprocessor Systems", IEEE 20th International Symposium on Fault-Tolerant Computing
(FTCS-20), Newcastle upon Tyne, UK, June 26-28, 1990, pp. 82-88.
[1] http://www.cs.wisc.edu/condor/
[2] http://www.cs.utk.edu/~plank/ckp.html
[3] http://www.freebsd.org/
[4] http://www.openbsd.org/
[5] http://www.opensource.apple.com/
[6] http://directory.fsf.org/all/chpox.html
[7] http://www.mosix.org/
[8] http://www.kernel.org/
[9] http://cryopid.berlios.de/
[10] http://sourceforge.net/projects/cryopid2
[11] http://sourceforge.net/projects/pupsp3
[12] http://dmtcp.sourceforge.net
[13] http://urdb.sourceforge.net
[14] http://www.checkpointing.org/
Article Sources and Contributors
Matrix clocks Source: http://en.wikipedia.org/w/index.php?oldid=333089990 Contributors: Dialectric, Marokwitz, Miym, Oyp, PamD, SarekOfVulcan, WaysToEscape
Vector clock Source: http://en.wikipedia.org/w/index.php?oldid=383886140 Contributors: Andreas Kaufmann, Argv0, Codahale, Ej, Ericbg05, Finlay McWalter, Itistoday, Marokwitz,
MikeHearn, Nae'blis, Nitroshockwave, RHaworth, Rafat ahmad ali, Rchandra, Rjwilmsi, Rs-leo, Ruud Koot, Sander, Svdb, Utopianheaven, Xemal, 39 anonymous edits
Application checkpointing Source: http://en.wikipedia.org/w/index.php?oldid=381031649 Contributors: Aktsu, Betacommand, Brossow, CharlesDexterWard, Cmh, Crazyvas, Dakart,
Debresser, Edward, Euchiasmus, Gazpacho, Grammaticus Repairo, Gregbard, Jerryobject, JonHarder, K001, Karya0, Khym Chanur, MathiasRav, Mortense, Schapel, Silly rabbit, Snowmanradio,
Svick, Szopen, TheParanoidOne, 51 anonymous edits
Image Sources, Licenses and Contributors
License
Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/