We talked about optimistic concurrency previously, more specifically in the context of StampedLock's tryOptimisticRead().
In this post we're going to give a few more examples of what can go wrong when using
tryOptimisticRead() and optimistic concurrency in general.
Don't get me wrong: I think it's great that Java has an RW-lock providing this kind of API. It's just that most developers are not aware of the dangers of using it.
I personally find optimistic concurrency extremely useful when used in a highly controlled environment, i.e. inside the code of a data structure that you designed and implemented and that nobody else is going to touch. That is unlikely if you work for a large organization (as I do), where your code is visible to others and other people will change it over time. This is problematic for optimistic concurrency because it is very hard to reason about, which means it is very easy to make a small change that causes a nasty bug.
One of the arguments that users of optimistic concurrency make (at least in Java) is that you can catch all kinds of exceptions and retry the operation when one is thrown. The problem is that the code called between
tryOptimisticRead() and validate() may throw exceptions as part of its normal behavior, and then you have no way to distinguish an "expected" exception from an "unexpected" one caused by some weird re-ordering of code:
Result example1() { // A read-only (optimistic) method
    long stamp = sl.tryOptimisticRead();
    Result result = null;
    try {
        result = mayThrowAnException(); // throwing is a "normal" behavior
    } catch (Exception e) {
        // Oops, an exception was thrown... should I recompute result or not?
    }
    if (!sl.validate(stamp)) {
        stamp = sl.readLock();
        try {
            result = mayThrowAnException();
        } finally {
            sl.unlockRead(stamp);
        }
    }
    return result;
}
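One discipline that can disambiguate in simple cases is to consult validate() inside the catch block itself: if the stamp is still valid, no writer interfered, so the exception is the method's genuine behavior; if the stamp is invalid, the exception may be an artifact of torn state and the whole read should be retried under the lock. Here is a minimal sketch of that idea; the class, fields, and mayThrowAnException() stand-in are all hypothetical, not from any real API:

```java
import java.util.concurrent.locks.StampedLock;

class OptimisticReader {
    private final StampedLock sl = new StampedLock();
    private long sharedValue = 42;

    // Stand-in for a read that may throw as part of its normal behavior.
    private long mayThrowAnException() {
        if (sharedValue < 0) throw new IllegalStateException("negative value");
        return sharedValue;
    }

    long readSafely() {
        long stamp = sl.tryOptimisticRead();
        try {
            long result = mayThrowAnException();
            if (sl.validate(stamp)) return result; // clean optimistic read
        } catch (RuntimeException e) {
            // Trust the exception only if no writer interfered; otherwise it
            // may be an artifact of reading torn, inconsistent state.
            if (sl.validate(stamp)) throw e;
        }
        stamp = sl.readLock(); // inconclusive: retry pessimistically
        try {
            return mayThrowAnException();
        } finally {
            sl.unlockRead(stamp);
        }
    }
}
```

This only helps when the exception escapes your own code before validate() is reached; it does nothing for exceptions swallowed deeper inside the called code.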
But let's say it's not a completely generic piece of code and you have some control over it. Even then, there is still the issue that some re-ordering can cause the code to enter an infinite loop. Now what?
Certain variable re-orderings can cause an infinite loop, as in the following example, where x starts at 0 and y starts at 1, and both are incremented by another thread inside the write-lock. The reads (or writes) of x and y can be re-ordered, the variable delta can end up being zero, and the for() loop goes into an infinite cycle.
void incrementXY() { // an exclusively locked method
    long stamp = sl.writeLock();
    try {
        sharedX++; // Starts at zero. Can be re-ordered with the increment below
        sharedY++; // Starts at one. Can be re-ordered with the increment above
    } finally {
        sl.unlockWrite(stamp);
    }
}
void example2() { // A read-only (optimistic) method
    long stamp = sl.tryOptimisticRead();
    long x = sharedX; // Starts at 0
    long y = sharedY; // Starts at 1
    long delta = y - x; // Oops, this can be zero, or negative, or whatever
    for (long i = 0; i < 1000; i += delta) {
        doSomeReading(); // To infinity and beyond!
    }
    if (!sl.validate(stamp)) {
        stamp = sl.readLock();
        try {
            doSomeReading();
        } finally {
            sl.unlockRead(stamp);
        }
    }
}
Notice that using a regular read-lock would mean that delta would always be 1 and there would be no infinite loop.
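For this specific shape of bug there is a defensive discipline that helps: call validate() before the optimistically read values are allowed to drive any control flow, and only loop on values taken from a consistent snapshot. A minimal sketch, assuming the same sharedX/sharedY fields as above (the class name is made up):

```java
import java.util.concurrent.locks.StampedLock;

class XYCounters {
    private final StampedLock sl = new StampedLock();
    private long sharedX = 0; // always incremented together with sharedY
    private long sharedY = 1;

    void incrementXY() { // an exclusively locked method
        long stamp = sl.writeLock();
        try {
            sharedX++;
            sharedY++;
        } finally {
            sl.unlockWrite(stamp);
        }
    }

    long readDelta() {
        long stamp = sl.tryOptimisticRead();
        long x = sharedX;
        long y = sharedY;
        if (!sl.validate(stamp)) { // validate BEFORE the values drive any logic
            stamp = sl.readLock();
            try {
                x = sharedX;
                y = sharedY;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return y - x; // always 1: both reads came from a consistent snapshot
    }
}
```

The catch, of course, is that this only works when the critical section is a plain copy of fields; the moment any computation sits between the reads and the validate(), you are back in trouble.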
Well, to fix this, you can wait for a certain timeout. But this raises the question of how long to wait, and whether that depends on the machine you're running on. You went through all this trouble because someone told you that
tryOptimisticRead()/validate() was faster than
readLock()/unlockRead(), yet your code now has to wait for some arbitrarily chosen (hard-coded) timer to expire... and you think that is going to be faster? Yeah, right.
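To make the objection concrete, here is roughly what that timeout boilerplate looks like: a deadline guards the optimistic loop, and if it fires (or validation fails) all the work is redone under the read lock. Everything here, including the 10 ms budget, is an arbitrary illustration:

```java
import java.util.concurrent.locks.StampedLock;

class TimedOptimisticLoop {
    private final StampedLock sl = new StampedLock();
    private long sharedLimit = 3;

    long countUpToLimit() {
        long stamp = sl.tryOptimisticRead();
        long limit = sharedLimit;
        long deadline = System.nanoTime() + 10_000_000L; // arbitrary 10 ms budget
        long count = 0;
        for (long i = 0; i < limit; i++) {
            if (System.nanoTime() > deadline) break; // escape hatch for a torn 'limit'
            count++;
        }
        if (!sl.validate(stamp) || count != limit) {
            stamp = sl.readLock(); // redo everything pessimistically
            try {
                count = 0;
                for (long i = 0; i < sharedLimit; i++) {
                    count++;
                }
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return count;
    }
}
```

Note how much machinery this adds to what was supposed to be the "fast" path, and how the deadline check itself costs a clock read per iteration.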
Ok, but let's say you're willing to take the hit and add to your code the boilerplate to handle all the exceptions, and to time out if the read-only critical section goes into an infinite loop (both of which are a major pain, and it would make my eyes bleed to write such code, so I'm not going to). After all this hassle, you've got safe optimistic read-only code... right?
... nope, not really.
Suppose that in the example above the doSomeReading() method actually allocates memory. And suppose that there is no re-ordering, but the thread calling example3() goes to sleep after reading sharedX, and in the meantime sharedY is incremented multiple times... like one million times:
Result example3() { // A read-only (optimistic) method
    long stamp = sl.tryOptimisticRead();
    long x = sharedX; // Reads this to be 1
    // Thread sleeps here
    long y = sharedY; // Reads this to be 1 million
    long delta = y - x; // Oops, this can be a really large number
    Widget[] array = new Widget[(int) delta]; // Suppose there is enough memory for this
    for (int i = 0; i < delta; i++) {
        array[i] = new Widget(); // There might not be enough memory for 1M Widget instances... out of memory?
    }
    Result result = null;
    if (!sl.validate(stamp)) {
        stamp = sl.readLock();
        try {
            result = someReadOnlyFunctionThatThrowsAnException();
        } finally {
            sl.unlockRead(stamp);
        }
    }
    return result;
}
By allocating a large number of Widget instances, you've just consumed all the memory in the JVM (or the system). Even though you catch the exception and abort, and the GC will eventually clean up the mess, it may be too late: other threads of this application (or even other applications running on the same JVM) may now fail because there isn't enough memory to allocate anything.
Your whole app goes down in a blaze of glory, maybe bringing down with it all the other apps in the JVM, and in an extreme situation it can take the OS down as well (unlikely in the case of Java/JVM, but not so unlikely in other languages that have a GC but run natively, like the D language).
As far as I know, there is no way to protect efficiently against this issue, but if you know a way, please use the comments below this post ;)
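The closest thing to a mitigation I can offer is partial: validate before the snapshot is allowed to size any allocation, and sanity-bound the derived value even then. This defuses the torn-read-driven allocation in example 3, but it is not a general fix, because it only works when you control the code and know what a "plausible" value looks like. A sketch, with made-up names and an arbitrary bound:

```java
import java.util.concurrent.locks.StampedLock;

class BoundedAllocator {
    private static final int MAX_BATCH = 4096; // application-specific sanity bound
    private final StampedLock sl = new StampedLock();
    private long sharedX = 0;
    private long sharedY = 1;

    long[] readBatch() {
        long stamp = sl.tryOptimisticRead();
        long x = sharedX;
        long y = sharedY;
        if (!sl.validate(stamp)) { // validate before the values size anything
            stamp = sl.readLock();
            try {
                x = sharedX;
                y = sharedY;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        long delta = y - x;
        // Even a validated snapshot gets a sanity check before it is
        // allowed to drive an allocation.
        if (delta < 0 || delta > MAX_BATCH) {
            throw new IllegalStateException("implausible delta: " + delta);
        }
        return new long[(int) delta];
    }
}
```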
In case you think this example is contrived (it is), there are examples of this kind of thing happening in production systems where optimistic concurrency is used, like in this presentation by Brent Hall at CppCon last year.
In that particular case, it was a transactional memory with optimistic reads that had an issue very similar to example 3 above.
And the biggest reason of all not to use optimistic concurrency is: none of the above! It's reasoning.
Concurrency is hard, everybody knows that. The whole point of doing things like immutability, (pessimistic) transactional memory, and linearizable data structures is to have something that is eas(ier) to reason about.
Optimistic concurrency requires us to reason in terms of relaxed consistency (like relaxed atomics in the C++ memory model), and our pitiful linear brains are not made for that; anyone saying otherwise is either a genius (I mean, above Einstein level) or has no idea what they're talking about.
Most software developers work in groups (companies, startups, universities, research labs, whatever), and even if
you can reason about such code, it is unlikely that everyone else in your team will be able to understand it and change it in the future.
You have to ask yourself these questions:
Even if the code does what you think it does, how are you going to convince the rest of the team of the correctness of your code?
Is everyone in your team (i.e. the weakest team member) able to understand this code well enough that they can safely change it?
Unless the code is extremely simple and your team is made of a group of outstanding individuals, this is unlikely.
But hey, it's your code, so do what you think is best ;)