I recently diagnosed the root cause of a concurrency bug, CR 6822370,
and thought it sufficiently interesting to share the details. (CR 6822370 actually represents a
cluster of bugs that are now thought to be related by a common underlying issue.)
Briefly, we have a lost wakeup bug in the native C++ Parker::park() platform-specific
infrastructure code that implements java.util.concurrent.LockSupport.park().
The lost wakeup arises from a race, and the race in turn stems from architectural
reordering that the hardware is free to perform because memory barrier instructions are missing.
The lost wakeup may manifest as various 'hangs' or instances of progress failure.
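
To make the failure mode concrete, here is a minimal sketch of a permit-style parker. It is not the HotSpot source: SketchParker, its members, and demo_handoff() are hypothetical names, it assumes only one thread ever parks on a given parker, and it uses C++ std::atomic with the default sequentially consistent ordering to stand in for the explicit barrier instructions. The comment in park() describes one way a missing barrier could lose a wakeup; treat it as an illustration under those assumptions rather than a description of the actual code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, simplified parker. Holds at most one permit.
class SketchParker {
public:
    // Consume the permit if present, otherwise block until one is posted.
    void park() {
        // Fast path: take the permit without locking. The sequentially
        // consistent exchange doubles as a full barrier. With a plain
        // "counter = 0" store and no barrier, the hardware could let the
        // caller's later load of its wait condition complete before the
        // store becomes visible; the late store of 0 could then wipe out a
        // permit posted by unpark(), and that wakeup would be lost.
        if (counter_.exchange(0) > 0) {
            return;
        }
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form re-checks the permit and tolerates spurious wakeups.
        cond_.wait(lock, [this] { return counter_.load() > 0; });
        counter_.store(0);  // sequentially consistent store, also a full barrier
    }

    // Post the permit and wake the parked thread, if any.
    void unpark() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            counter_.store(1);
        }
        cond_.notify_one();
    }

private:
    std::atomic<int> counter_{0};
    std::mutex mutex_;
    std::condition_variable cond_;
};

// Hypothetical driver: a waiter parks until `ready` is set. A stale permit is
// posted first, so the waiter's first park() returns early and it must
// re-check the condition, which is the window where ordering matters.
bool demo_handoff() {
    SketchParker parker;
    std::atomic<bool> ready{false};

    parker.unpark();  // stale permit left over from "earlier" activity

    std::thread waiter([&] {
        while (!ready.load()) {
            parker.park();
        }
    });

    ready.store(true);  // publish the condition first...
    parker.unpark();    // ...then post the permit
    waiter.join();      // would hang here if the wakeup were lost
    return ready.load();
}
```

Calling demo_handoff() from main() should return true once the waiter has observed the flag. The point of the design is that the waiter always re-checks its condition after park() returns, and that check is only safe if the write that consumed the permit is ordered before it.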
In an earlier post I mentioned that one goal of the new introductory curriculum at Carnegie Mellon is to teach parallelism as the general case of computing, rather than an esoteric, specialized subject for advanced students. Many people are incredulous when I tell them this, because it immediately conjures in their mind the myriad complexities…