We always strive for performance, meaning that the compiler will apply tons of modifications to your code to make it fast. These can really get in the way of correctness, or at least of what your think should happen. We’ve seen an example of this in the context of standard concurrency (the double-checked locking problem) and of lockfree programming (the ABA bug).
The Silently Shifting Semicolon is a research article whose introduction
section has a great description of certain issues with correctness and performance. The
article is
authored by D. Marino, T. Millstein, M. Musuvathi, S. Narayanasamy,
and A. Singh, and was published in the 1st Summit on Advances in Programming
Languages, SNAPL 2015, May 3-6, 2015, Asilomar, California, USA
The introduction proceeds as a conversation between a CS
student and a CS professor. Below I quote/paraphrase/explain some of the main
points of the introduction section of The Silently Shifting Semicolon.
It should be understood that every statement below references that article and is
not my own. The writing in the article is great and very entertaining, and
my summarizing of it below does not do it justice. I highly recommend you read the original,
but in class we will go through my summary only.
Off to a good start
Alice is a bright and enthusiastic freshman in a programming languages (PL) course and
has learned that “Modern programming languages enforce simple yet powerful
abstractions that hide the complexities of the machine and make programming
easier and safer”. One of the fundamental concepts is
sequential composition:
Soon after, Alice learns the parallel composition concept:
In the above, C can be done at any time, but B must follow A.
Alice writes her first parallel program in Java:
The problem is that once in a while, Alice’s program throws a null-pointer
exception!!
Alice is baffled and talks to her professor:
Alice: This program sometimes gives me a null-pointer exception, can you help?
Prof: Easy, there is a bug in your program: You are assuming that A is always executed before B.
Alice: A bug? You taught me that A;B means A followed by B!
Prof: You see Alice, the compiler reorders instructions and, at runtime, the hardware can also execute instructions out of order, all for optimization reasons. So sometimes B is executed before A.
Like Alice at this point you should really feel that your entire world
has collapsed and that you can no longer write code!
Alice: Optimizations? Caches? but, programming languages are supposed to hide all these details from me.
Prof: They are supposed to, and they do. The problem is that you have a data race in your program.
Alice: A what?
Prof: Let me explain. In a multithreaded program, there
exists a partial order among instructions called the happens-before
relation. A program is said to have a data race if two instructions access
the same memory location, one of them is a write, and they are not ordered
by the happens-before relation.
At this point Alice needs to think for a while and then:
Alice: …I think I got it. I just need to ensure that a happens-before
ordering exists between two writes to the same variable, and between a
write and a read. Then I will not have data races. Which means I can
operate under the illusion that semicolons represent sequential
composition: A; B will mean A and then B.
Prof: Exactly Alice! You nailed it.
Alice: but… doesn’t this mean that Java is unsafe, since it does
not protect the abstraction of the semicolon? If I inadvertently introduce
a data race then my program goes haywire in unpredictable ways.
Prof: You are right…. but… protecting the semantics of semicolons would make programs run too
slowly.
Alice: How slowly?
Prof: We don’t know exactly.
Experiments suggest an average overhead of 26% for a set of Java benchmarks…
Alice: 26%??? That seems entirely acceptable to me if that ensures correctness!
Prof: Yes, but Programming Language design is a
delicate balance between programmability and performance. We have to ensure
that the utility of an abstraction justifies its cost. For instance, taking
the effort to protect semicolons will not eliminate the possibility of
concurrency errors. You need to avoid data races anyway…
Alice:… So, my program is just broken?
Prof: Your program is correct in a language where semicolons means sequential composition,
which is not quite the case in Java. But, it will all work if you
declared your cooked variable as volatile.
It turns out that since Java 5, accesses to a volatile variable CANNOT be reordered
by the JVM. Therefore, another use of volatile is to guarantee the happens-before
relationship (see the “Java Threads, Synchronization”
module!).
Alice: Vola-what??? So, you want
me to mark every data race in my program with this volatile annotation.
sigh I guess this is all fine as long as there are tools to help
novice programmers like me to detect and remove data races.
Prof: You wish. This is still actively being researched.
Alice: So… with no tool support, I have
to always assume that my programs contain data races. G-r-e-a-t. So,
really, what does the semicolon mean?
Prof: Attempts have been made to formalize its semantic and to
ensure some basic sanity properties, but doing
so while safely allowing the optimizations we desire is still an open
problem.
Alice: Ouch! How in the world are you programming with these
semicolons then when you don’t even know what they mean?
Prof: Luckily, it doesn’t seem to matter in practice.
Programs just seem to run ok even though there is the potential for
unpredictable behavior!
Alice: What? I can’t believe this! You taught me that modern programming languages enforce
simple and powerful abstractions, but apparently performance trumps
everything else and we’re in a madhouse. I quit! I was a double-major anyway!
The moral of the story is that “achieving sane semantics in Java requires programmers to meticulously avoid data races with little language support for doing so.” The Silently Shifting Semicolon article then proposes that
“programming languages should guarantee sequential composition, and it should not be possible for programs to expose the effects of compiler and hardware reorderings.”
If this were the case, then Alice would never have quit the major :)