Interlocked operations were originally conceived as a low level synchronization mechanism for shared memory symmetric multiprocessor systems. In multiprocessor systems, shared memory is an extremely efficient way to transfer data between processes and threads. A way had to be found to prevent atomicity problems when two or more processors tried to use the same piece of memory. Almost all processors introduced recently support interlocked operations to allow this. These are operations whereby a processor can read a value from memory, modify it and then write it back atomically, whilst ensuring that no other processors access the same memory, and the processor performing the operation is not interrupted. Win32 provides the following interlocked operations:
Overhead is reduced because just a couple of CPU instructions are required to enter and leave the lock, provided a thread does not have to wait. If threads have to wait for any appreciable time, then CPU is wasted, so they are only useful for implementing small critical sections. Spin locks are useful when enforcing critical sections that are themselves part of synchronization structures. Shared data inside synchronization primitives or schedulers is often protected by locks of this sort: the locks are often necessary because OS level synchronization primitives cannot be used to implement OS level synchronization primitives. Spin locks have all the same concurrency problems as mutexes, with the particular quirk that cyclic acquisition results not in deadlock, but in livelock. This is a slightly worse situation than deadlock because although the "blocked" threads are not executing any useful code, they are running around an infinite loop, using up CPU and degrading the performance of the entire system. Spin locks should not be used as semaphores to "suspend" a thread.
The important thing about this set of operations is that, given certain provisos, a thread switch can occur at any point in time, and this still remains thread safe. The first increment of the lock is a straight register indirect increment. The value is always in memory, and the increment is atomic. We then read the value of the lock into a local vairable. This is not atomic. The value read in to the local variable may be different from the result of the increment. However, the really cunning thing about this is that because the increment is performed before the read operation, thread conflicts that occur will always mean that the value read is too high instead of too low: thread conflicts result in a conservative estimate of whether the lock is free.
It is often useful to write operations like this in assembler, so as to be totally sure that the correct values are being left in memory, and not cached in registers. As it turns out, under Delphi 4 at least, by passing the lock as a var parameter, and including the local variable, the Delphi compiler generates correct code which will work on uniprocessor machines. On multiprocessor machines, register indirect increments and decrements are not atomic. This has been solved in the hand coded assembler version by adding the lock prefix in front of instructions that manipulate the lock. This prefix instructs a processor to lock the memory bus exclusively for the entire duration of the instruction, thus making these operations atomic.
The bad news is that although this is correct in theory, the Win32 virtual machine does not allow user level processes to execute instructions with the lock prefix. Programmers intending to actually use this mechanism should use it only in code with ring 0 privileges. Another problem is that since this version of the spin lock does not call Sleep, it is possible for threads to monopolise the processor whilst waiting for the lock, something that is guaranteed to bring the machine to a grinding halt.
MessageWaits. When Delphi applications are waiting for threads to exit, the main VCL thread is permanently blocked. This is a potentially problematic situation, because the VCL thread cannot process messages. Win32 provides a MsgWaitForMultipleObjects function to get around this. A thread performing a message wait is blocked either until the synchronization objects become signalled, or a message is placed in the threads message queue. This means that you can get the main VCL thread in an application to wait for worker threads whilst also allowing it to respond to windows messages. A good article on the subject can be found at: http://www.midnightbeach.com/jon/pubs/MsgWaits/MsgWaits.html.
Signal and wait. Windows NT and Win2K allow a program to atomically signal one synchronization object and wait on another. This gets around situations where deadlock problems exist if too many synchronization objects are acquired, but releasing locks in order to acquire other locks leaves holes in the locking scheme. Readers are advised to recall chapter 7, and in particular the problem encountered when trying to lock multiple objects in the correct order. Use of an atomic signal and wait removes the need for optimistic concurrency control in such situations.
© Martin Harvey
2000.