Although the OmniThreadLibrary treats communication as a superior approach to locking, there are still times when using “standard” synchronization primitives such as a critical section is unavoidable. As the standard Delphi/Windows approach to locking is low-level, OmniThreadLibrary builds on it and improves it in some significant ways. All these improvements are collected in the OtlSync unit and are described in the following sections. The only exception is the waitable value class/interface, which is declared in the OtlCommon unit.
This part of the book assumes that you have a basic understanding of locking. If you are new to the topic, you should first read the appropriate chapters from one of the books mentioned in the introduction.
OmniThreadLibrary simplifies sharing critical sections between a task owner and a task with the use of the WithLock method. High-level tasks can access this method through the task configuration block.
I have always held the opinion that locks should be as granular as possible. Putting many small locks around many unrelated pieces of code is better than using one giant lock for everything. However, programmers frequently use one or a few locks because managing many critical sections can be a bother.
Delphi implements critical section support with a TCriticalSection class which must be created and destroyed in the code. (There is also a TRTLCriticalSection record, but it is only supported on Windows.) OmniThreadLibrary extends this implementation with an IOmniCriticalSection interface, which you only have to create. The compiler will make sure that it is destroyed automatically at the appropriate place.
IOmniCriticalSection uses TCriticalSection internally [8]. It acts just as a proxy that calls TCriticalSection functions. Besides that, it provides additional functionality by counting the number of times a critical section has been acquired, which can help a lot while debugging. This counter can be read through a property of the interface.
Another TCriticalSection extension found in the OmniThreadLibrary is the TOmniCS record. It allows you to use a critical section by declaring a record in an appropriate place.
With TOmniCS, locking can be as simple as declaring the record and wrapping the protected code in an Acquire/Release pair.
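A minimal sketch of such usage, assuming OmniThreadLibrary is on the unit search path. The counter, the worker procedure and the RunWorkers helper are made up for this illustration; only the TOmniCS record with its Acquire and Release methods comes from the library.

```pascal
program OmniCSSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  OtlSync; // assumes OmniThreadLibrary is on the search path

var
  counterLock: TOmniCS; // only declared; never created or destroyed explicitly
  counter: integer;     // shared data protected by counterLock

// Hypothetical worker: increments the shared counter under the lock.
procedure IncrementCounter(count: integer);
var
  i: integer;
begin
  for i := 1 to count do begin
    counterLock.Acquire;
    try
      Inc(counter);
    finally
      counterLock.Release;
    end;
  end;
end;

// Runs numThreads workers to completion and returns the final counter value.
function RunWorkers(numThreads, countPerThread: integer): integer;
var
  threads: array of TThread;
  i: integer;
begin
  counter := 0;
  SetLength(threads, numThreads);
  for i := 0 to High(threads) do begin
    threads[i] := TThread.CreateAnonymousThread(
      procedure
      begin
        IncrementCounter(countPerThread);
      end);
    threads[i].FreeOnTerminate := false; // waited for and freed below
    threads[i].Start;
  end;
  for i := 0 to High(threads) do begin
    threads[i].WaitFor;
    threads[i].Free;
  end;
  Result := counter;
end;

begin
  Writeln(RunWorkers(4, 100000));
end.
```

Plain TThread workers are used here only to keep the sketch short; the same Acquire/Release pair would be used from OmniThreadLibrary tasks.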
TOmniCS is implemented as a record with one private field holding the IOmniCriticalSection interface. The Release method merely calls the Release method on the internal interface, while the Acquire method is trickier as it has to initialize the ocsSync field first.
The initialization uses a global critical section to synchronize access to the code that should not be executed from two threads at once.
TOmniCS is a great simplification of the critical section concept, but it still requires you to declare a separate locking entity. If this locking entity is only used to synchronize access to a specific instance (be it an object, record, interface or even a simple type), it is often better to declare a variable/field of type Locked<T>, which combines any type with a critical section.
With Locked<T>, the example from the TOmniCS section can be rewritten as follows.
The interesting fact to notice is that although lockedIntf is declared as a variable of type Locked<IGpIntegerList>, it can be initialized and used as if it were of type IGpIntegerList. This is accomplished by providing Implicit operators for conversion from Locked<T> to T and back. The Delphi compiler is (sadly) not smart enough to use this conversion operator in some cases, so you will still sometimes have to use the provided Value property. For example, you’d have to do it to release a wrapped object. (In the example above we have wrapped an interface and the compiler itself handled the destruction.)
Besides the standard Acquire and Release methods, Locked<T> also implements methods used for pessimistic initialization, which is described later in this chapter, and two almost identical methods called Locked which allow you to execute a code segment (a procedure, a method or an anonymous method) while the critical section is acquired. (In other words, you can be assured that the code passed to the Locked method is executed by only one thread at a time, provided that all code in the program properly locks access to the shared variable.)
There is an alternative built into Delphi since 2009 which provides functionality similar to Locked<T>: TMonitor. In modern Delphis, every object can be locked by using the System.TMonitor.Enter function and unlocked by using System.TMonitor.Exit. The example above could be rewritten to use TMonitor with little work.
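A minimal sketch of TMonitor-based locking that uses only the Delphi RTL. The shared list, the AddItems worker and the RunWorkers helper are hypothetical names chosen for this illustration; they do not reproduce the book's example.

```pascal
program MonitorSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

var
  sharedList: TStringList; // any object instance can serve as its own lock

// Hypothetical worker: appends count items while holding the object's monitor.
procedure AddItems(count: integer);
var
  i: integer;
begin
  for i := 1 to count do begin
    System.TMonitor.Enter(sharedList);
    try
      sharedList.Add(IntToStr(i));
    finally
      System.TMonitor.Exit(sharedList);
    end;
  end;
end;

// Runs numThreads workers to completion and returns the final item count.
function RunWorkers(numThreads, countPerThread: integer): integer;
var
  threads: array of TThread;
  i: integer;
begin
  sharedList := TStringList.Create;
  try
    SetLength(threads, numThreads);
    for i := 0 to High(threads) do begin
      threads[i] := TThread.CreateAnonymousThread(
        procedure
        begin
          AddItems(countPerThread);
        end);
      threads[i].FreeOnTerminate := false; // waited for and freed below
      threads[i].Start;
    end;
    for i := 0 to High(threads) do begin
      threads[i].WaitFor;
      threads[i].Free;
    end;
    Result := sharedList.Count;
  finally
    FreeAndNil(sharedList);
  end;
end;

begin
  Writeln(RunWorkers(4, 1000));
end.
```

Note that nothing in the declaration of sharedList says that it is meant to be locked; the convention lives only in the code that uses it.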
A reasonable question to ask is, therefore, why implement Locked<T> at all. Why is TMonitor not good enough? There are plenty of reasons for that.
TMonitor was buggy since its inception [9][10] (although that was fixed a few years later).
TMonitor doesn’t convey your intentions. Just by looking at the variable/field declaration you wouldn’t know that the entity is supposed to be used in a thread-safe manner. Using Locked<T>, however, explicitly declares your intent.
TMonitor.Enter/Exit doesn’t work with interfaces, records and primitive types.
On the positive side, TMonitor is faster than a critical section.
A typical situation in a multi-threaded program is a multiple readers/exclusive writer scenario. It occurs when there are multiple reader threads which can operate on the same object simultaneously, but must be locked out when an exclusive writer thread wants to make changes to this object. Delphi already implements a synchronizer for this scenario (TMultiReadExclusiveWriteSynchronizer [11][12] from SysUtils), but it is quite a heavyweight object which you can use in many ways. For situations when the probability of collision [13] is low and, especially, when the object is not locked for a long period, a TOmniMREW synchronizer will give you better performance.
To use the TOmniMREW synchronizer, a reader must call EnterReadLock before reading the object and ExitReadLock when it doesn’t need the object anymore. Similarly, a writer must call the corresponding write-lock methods before and after modifying the object.
TryEnterWriteLock [3.07.6] and its read-lock counterpart try to enter a write or read lock. If the lock cannot be acquired in timeout_ms milliseconds, the functions return a failure status.
I’d like to stress again the importance of not locking an object for a long time when using TOmniMREW. The Enter methods wait in a tight loop while waiting to get access, which can quickly use lots of CPU time if the probability of collisions is high. (Collisions typically occur more often if an object is locked for extensive periods of time.)
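A minimal sketch of the reader and writer sides, assuming OmniThreadLibrary is on the unit search path. EnterReadLock, ExitReadLock and TryEnterWriteLock are named in the text; the EnterWriteLock/ExitWriteLock names are assumed by symmetry, and the integer timeout parameter of TryEnterWriteLock is likewise an assumption. The shared value and the three helper routines are hypothetical.

```pascal
program MREWSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  OtlSync; // assumes OmniThreadLibrary is on the search path

var
  dataLock: TOmniMREW;  // only declared; assumed to need no explicit setup
  sharedValue: integer; // data protected by dataLock

// Reader side: many threads may hold the read lock at the same time.
function ReadValue: integer;
begin
  dataLock.EnterReadLock;
  try
    Result := sharedValue;
  finally
    dataLock.ExitReadLock;
  end;
end;

// Writer side: exclusive access; keep the locked region as short as possible
// because waiting threads spin in a tight loop.
procedure WriteValue(value: integer);
begin
  dataLock.EnterWriteLock; // name assumed by symmetry with EnterReadLock
  try
    sharedValue := value;
  finally
    dataLock.ExitWriteLock; // name assumed by symmetry with ExitReadLock
  end;
end;

// Writer side with a timeout: gives up instead of spinning indefinitely.
function TryWriteValue(value, timeout_ms: integer): boolean;
begin
  Result := dataLock.TryEnterWriteLock(timeout_ms);
  if Result then
    try
      sharedValue := value;
    finally
      dataLock.ExitWriteLock;
    end;
end;

begin
  WriteValue(1);
  Writeln(ReadValue);
  if TryWriteValue(2, 10) then
    Writeln(ReadValue)
  else
    Writeln('write lock not acquired');
end.
```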
Because of an optimized implementation that favours speed over safety, you’ll get a cryptic access violation error if the TOmniMREW instance is destroyed while a read or write lock is taken. To be clear, this is a programming error; you should never destroy a synchronization object while it holds a lock. It’s just that the error displayed will not make it very clear what you are doing wrong.
For example, the following test code fragment will cause an access violation.
Sometimes you want to instruct background tasks to stop whatever they are doing and quit. Typically, this happens when the program is shutting down. Programs using “standard” multi-threaded programming (i.e. TThread) solve this problem each in their own way, typically by using boolean flags or Windows events.
To make task cancellation simpler and more standardized, OmniThreadLibrary introduces a cancellation token. A cancellation token is an instance of the IOmniCancellationToken interface and implements functionality very similar to the Windows event synchronization primitive.
By default, a cancellation token is in a cleared (inactive) state. To signal it, the code calls the Signal method. A signalled token can later be returned to the cleared state.
The task can check the cancellation token’s state by calling the IsSignalled method or by waiting (using WaitForSingleObject or any of its variants) on the Handle property. The wait will succeed when the cancellation token is signalled.
An important part of the cancellation token implementation is that the same token can be shared between multiple tasks. To cancel all tasks, the code only has to call Signal once (provided that other parts of the program don’t clear the token).
Cancellation tokens are used in low-level and high-level multi-threading. Low-level multi-threading uses the CancelWith method to pass a cancellation token around, while high-level multi-threading uses the task configuration block.
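A minimal sketch of one token shared by several workers, assuming OmniThreadLibrary is on the unit search path and that the token is created with the CreateOmniCancellationToken factory function (an assumption, as the factory is not named in the text). To keep the sketch short, plain TThread workers capture the token directly instead of receiving it through CancelWith or a task configuration block; the CancelAll helper is hypothetical.

```pascal
program CancellationSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.SyncObjs,
  OtlSync; // assumes OmniThreadLibrary is on the search path

// Starts numWorkers threads that poll one shared token, signals the token
// once and returns the number of workers that noticed the cancellation.
function CancelAll(numWorkers: integer): integer;
var
  token: IOmniCancellationToken;
  threads: array of TThread;
  stopped: integer;
  i: integer;
begin
  token := CreateOmniCancellationToken; // starts in the cleared state
  stopped := 0;
  SetLength(threads, numWorkers);
  for i := 0 to High(threads) do begin
    threads[i] := TThread.CreateAnonymousThread(
      procedure
      begin
        while not token.IsSignalled do
          Sleep(1); // stands in for one unit of real work
        TInterlocked.Increment(stopped);
      end);
    threads[i].FreeOnTerminate := false; // waited for and freed below
    threads[i].Start;
  end;
  Sleep(50);    // let the workers run for a moment
  token.Signal; // a single call cancels every worker sharing the token
  for i := 0 to High(threads) do begin
    threads[i].WaitFor;
    threads[i].Free;
  end;
  Result := stopped;
end;

begin
  Writeln(CancelAll(4));
end.
```

A worker that spends most of its time blocked could wait on the token's Handle with WaitForSingleObject instead of polling IsSignalled.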
Cancellation is demonstrated in examples
The communication framework in the OmniThreadLibrary works asynchronously (you cannot know when a task or owner will receive and process the message). Most of the time that functions great, but sometimes you have to process messages synchronously (that is, you want to wait until the task processes the message) because otherwise the code gets too complicated. For those situations, OmniThreadLibrary offers a waitable value, TOmniWaitableValue, which is also exposed as an interface.
The usage pattern is simple. The caller creates an object or interface of that type, sends it to another thread (typically via Task.Comm.Send) and calls the WaitFor method. The task receives the message, does the processing and calls Signal to signal completion or Signal(some_data) to signal completion and return data. At that point, WaitFor returns and the caller can read the data from the waitable value.
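A minimal sketch of this pattern, assuming OmniThreadLibrary is on the unit search path. TOmniWaitableValue, Signal(some_data) and WaitFor are named in the text; the parameterless constructor, the Value property used to read the result and its AsInteger accessor are assumptions. For brevity the waitable value is captured by a plain TThread worker instead of being sent via Task.Comm.Send, and the AskWorker helper is hypothetical.

```pascal
program WaitableValueSketch;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows,
  System.SysUtils,
  System.Classes,
  OtlCommon; // assumes OmniThreadLibrary is on the search path

// Hands a request to a worker thread and blocks until the worker signals
// completion; returns the data passed along with the signal.
function AskWorker(input: integer): integer;
var
  reply: TOmniWaitableValue;
  worker: TThread;
begin
  reply := TOmniWaitableValue.Create;
  try
    worker := TThread.CreateAnonymousThread(
      procedure
      begin
        // "processing" is just a multiplication in this sketch
        reply.Signal(input * 2); // signal completion and return data
      end);
    worker.FreeOnTerminate := false; // waited for and freed below
    worker.Start;
    if reply.WaitFor(INFINITE) then
      Result := reply.Value.AsInteger // Value/AsInteger are assumed names
    else
      Result := -1; // the wait failed
    worker.WaitFor;
    worker.Free;
  finally
    reply.Free;
  end;
end;

begin
  Writeln(AskWorker(21));
end.
```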
A practical example should clarify this explanation. The two methods below are taken from the OtlThreadPool unit.
When code wants to cancel a thread-pooled task, it calls the Cancel function. This function sends the Cancel message to the worker task and passes along the ID of the task to be cancelled and a TOmniWaitableValue object. Then it waits for the object to become signalled.
The Cancel method in the worker task processes the message, does lots of complicated work (removed for clarity) and at the end calls the Signal method on the TOmniWaitableValue object to signal completion and return a boolean value.
Soon after Signal is called, WaitFor in the caller code exits and TOmniThreadPool.Cancel retrieves the result from the waitable value.
A semaphore is a counting synchronisation object that starts at some value (typically greater than 0) which usually represents a number of available resources of some kind. To allocate a semaphore, one waits on it. If the semaphore count is greater than zero, the semaphore is signalled, wait will succeed and semaphore count gets decremented by one. [Of course, all of this occurs atomically.] If the semaphore count is zero, the semaphore is not signalled and wait will block until the timeout or until some other thread releases the semaphore, which increments the semaphore’s count and puts it into the signalled state.
While semaphores are implemented in the Windows kernel and Delphi wraps them in a pretty object, TSemaphore, Windows doesn’t support a useful variation on the theme: an inverse semaphore, also known as a countdown event.
Inverse semaphore differs from a normal semaphore by getting signalled when the count drops to zero. This allows another thread to execute a blocking wait that will succeed only when the semaphore’s count is zero. Why is that good, you’ll ask? Because it simplifies resource exhaustion detection. If you wait on an inverse semaphore and this semaphore becomes signalled, then you know that the resource is fully used.
The inverse semaphore is implemented by the TOmniResourceCount class, which is also exposed as an interface. The initial resource count is passed to the constructor or to the factory function that creates the interface.
Allocate will block if this count is zero (and will unblock automatically when the count becomes greater than zero); otherwise it will decrement the count. The new value of the resource count is returned as a function result.
TryAllocate is a safer version of Allocate taking a timeout parameter (which may be set to INFINITE) and returning success/fail status as a function result.
Release increments the count and unblocks waiting Allocate calls. The new resource count (potentially already incorrect at the moment the caller sees it) is returned as the result.
Finally, there is a Handle property exposing a handle which is signalled when the resource count is zero and unsignalled otherwise.
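A minimal single-threaded sketch of these semantics, assuming OmniThreadLibrary is on the unit search path and that the constructor takes the initial count as its only parameter. The IsExhausted and ExhaustedAfterAllocating helpers are hypothetical; they only poll the Handle property with a zero timeout.

```pascal
program ResourceCountSketch;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows,
  System.SysUtils,
  OtlSync; // assumes OmniThreadLibrary is on the search path

// True when the inverse semaphore's handle is signalled, which according to
// the description above means that the resource count has dropped to zero.
function IsExhausted(resources: TOmniResourceCount): boolean;
begin
  Result := WaitForSingleObject(resources.Handle, 0) = WAIT_OBJECT_0;
end;

// Allocates all resources and reports whether exhaustion is then signalled.
function ExhaustedAfterAllocating(initialCount: cardinal): boolean;
var
  resources: TOmniResourceCount;
  i: cardinal;
begin
  resources := TOmniResourceCount.Create(initialCount);
  try
    for i := 1 to initialCount do
      resources.Allocate; // does not block while the count is above zero
    Result := IsExhausted(resources);
  finally
    resources.Free;
  end;
end;

var
  resources: TOmniResourceCount;
begin
  resources := TOmniResourceCount.Create(2);
  try
    Writeln('exhausted after create: ', IsExhausted(resources));
    Writeln('Allocate returned ', resources.Allocate);
    Writeln('Allocate returned ', resources.Allocate);
    Writeln('exhausted after two allocations: ', IsExhausted(resources));
    Writeln('Release returned ', resources.Release);
    Writeln('exhausted after release: ', IsExhausted(resources));
  finally
    resources.Free;
  end;
end.
```

In a real program the handle would typically be passed to a blocking wait in another thread, which then wakes up as soon as the last resource is taken.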
Initializing an object in a multi-threaded world is not a problem, as long as the object is initialized before it is shared. To put it simply: everything is fine if we can initialize the object first and then pass it to multiple tasks. [14]
In most cases, this is not a problem, but sometimes we want to use a shared global object in multiple tasks. In that case, the first task that wants to use the object will have to create it. While this may look like a weird approach to programming, it is a legitimate programming pattern, called lazy initialization.
The reason behind this weirdness is that sometimes we don’t know in advance whether an object (or some part of a composite object) will be used at all. If the probability that the object will be used is low enough, it may be a good idea not to initialize it in advance, as that would take some time and use some memory (or maybe even lots of memory).
Additionally, there may not be a good place to call the initialization. A good example is the TOmniCS record where we want to do an implicit initialization the first time the Acquire method is called. As this record is usually just declared as a variable/field and not explicitly initialized, there is no better place to call the initialization code than from the Acquire method.
This part of the book will explain two well-known approaches to shared initialization – a pessimistic initialization and an optimistic initialization. There’s also a third approach – busy-wait – which you can read more about on my blog.
The difference between the two approaches is visible from the following pseudo-code.
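A compilable sketch of both patterns, using only the Delphi RTL. The shared TStringList, the InitLock critical section and the procedure names are chosen for this illustration. AtomicallyTestAndStore is implemented here as a thin wrapper around TInterlocked.CompareExchange, which is one possible way to realize the atomic test-and-store step.

```pascal
program LazyInitSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.SyncObjs;

var
  Shared: TStringList;        // lazily created shared object
  InitLock: TCriticalSection; // used only by the pessimistic variant

// Stores newValue into target only if target still equals oldValue.
// Returns True if the store happened. Implemented with a compare-and-swap.
function AtomicallyTestAndStore(var target: TStringList;
  oldValue, newValue: TStringList): boolean;
begin
  Result := TInterlocked.CompareExchange(pointer(target),
    pointer(newValue), pointer(oldValue)) = pointer(oldValue);
end;

procedure OptimisticInitializer;
var
  temp: TStringList;
begin
  if not assigned(Shared) then begin
    temp := TStringList.Create; // may turn out to be a duplicate
    if not AtomicallyTestAndStore(Shared, nil, temp) then
      temp.Free; // another thread won the race; discard our copy
  end;
end;

procedure PessimisticInitializer;
begin
  if not assigned(Shared) then begin // test
    InitLock.Acquire;                // lock
    try
      if not assigned(Shared) then   // retest
        Shared := TStringList.Create;
    finally
      InitLock.Release;
    end;
  end;
end;

var
  first: TStringList;
begin
  InitLock := TCriticalSection.Create;
  try
    OptimisticInitializer;  // creates the shared object
    first := Shared;
    OptimisticInitializer;  // does nothing, Shared is already assigned
    PessimisticInitializer; // does nothing, Shared is already assigned
    Writeln('same instance kept: ', first = Shared);
  finally
    FreeAndNil(Shared);
    FreeAndNil(InitLock);
  end;
end.
```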
An optimistic initializer assumes that there’s hardly a chance of initialization being called from two tasks at the same time. Under this assumption, it is fastest to initialize the object (in the code above, the initialization is represented by creation of the shared object) and then atomically copy this object into the shared field/variable. The (nonexisting) AtomicallyTestAndStore method compares the old value of Shared with nil and stores the new object into Shared if Shared is nil. It does all this in a way that prevents the code from being executed from two threads at the same time. If AtomicallyTestAndStore fails (returns False), another task has already modified the Shared variable and we must destroy the temporary resource.
The advantage of this approach is that there is no locking so we don’t have to create an additional critical section. Only CPU-level bus locking is used to implement AtomicallyTestAndStore. The disadvantage is that duplicate objects may be created at some point.
A pessimistic initializer assumes that there’s a significant probability of initialization being called from two tasks at the same time and uses an additional critical section to lock access to the initialization part. A test, lock, retest pattern is used for performance reasons: the code first checks whether the shared object is initialized, then (if it is not) locks the critical section and retests the shared object, as another task could have initialized it in the meantime.
The advantage of this approach is that only a single object is created. The disadvantage is that we must manage an additional critical section that will be used for locking.
It is unclear which approach is better. Although locking slows the application more than micro-locking, creating duplicate resources may slow it down even more. On the other hand, a pessimistic initializer requires an additional lock, but that won’t make much difference if you don’t create millions of shared objects. In most cases the initialization code will be called rarely and the complexity of the initializer will not change the program performance in any meaningful way, so the choice of initializer will mainly be a matter of personal taste.
While pessimistic initialization doesn’t present any problems for a skilled programmer, it is bothersome as we must manage an additional locking object. (Typically that will be a critical section.) To simplify the code and to make it more intentional, OmniThreadLibrary introduces a Locked<T> type which wraps any type (the type of your shared object) and a critical section.
An instance of the Locked<T> type contains two fields: one holding your data (FValue) and another containing a critical section. Locked<T> provides two helper functions (both named Initialize) which implement the pessimistic initialization pattern.
The first version accepts a factory function which creates the object. The code implements the test, lock, retest pattern explained previously in this section.
Another version, implemented only in Delphi 2010 and newer, doesn’t require a factory function but calls the default (parameter-less) constructor. This is only possible if the T type represents a class. Actually, this method simply calls the other version and provides a special factory method which traverses the extended RTTI information, selects an appropriate constructor and executes it to create the shared object.
Locked<T> also implements the Acquire and Release methods, which use the built-in critical section to implement synchronization.
Optimistic initialization is supported with the Atomic<T> class, which is much simpler than the pessimistic Locked<T>. As in Locked<T>, there are two Initialize functions, one creating the object using a user-provided factory function and another using RTTI to call the default parameter-less constructor. We’ll only examine the former.
The code first checks if the storage is already initialized by using a weird cast which assumes that T is pointer-sized. This is a safe assumption because Atomic<T> only supports T being a class or an interface.
Next the code checks whether the shared object and the temporary variable are properly aligned. This should in most cases not present a problem as all ‘normal’ fields (not stored in packed record types) should always be appropriately aligned.
After that, the factory function is called to create an object. Then InterlockedCompareExchangePointer is called. It takes three parameters: a destination address, an exchange data and a comparand. The functionality of the code can be represented by the following pseudo-code:
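A plain Delphi model of these semantics. It is deliberately not atomic and must not be used for synchronization; it only shows which value is stored and which is returned. The CompareExchangeModel and FirstStoreWins names are made up for this sketch.

```pascal
program CasModelSketch;

{$APPTYPE CONSOLE}

// Models the result of a compare-and-exchange on a pointer. The real
// function performs these steps as one indivisible CPU operation.
function CompareExchangeModel(var destination: pointer;
  exchange, comparand: pointer): pointer;
begin
  Result := destination;     // the old value is always returned
  if destination = comparand then
    destination := exchange; // stored only when the comparison succeeds
end;

// Walks through both scenarios; returns True if the first store succeeded
// and the second attempt left the storage untouched.
function FirstStoreWins: boolean;
var
  storage: pointer;
  a, b: integer;
begin
  storage := nil;
  // Scenario 1: storage is nil, so the new value is stored, nil is returned.
  Result := CompareExchangeModel(storage, @a, nil) = nil;
  Result := Result and (storage = @a);
  // Scenario 2: storage is already set, so it is left intact and its
  // current (non-nil) value is returned.
  Result := Result and (CompareExchangeModel(storage, @b, nil) = @a);
  Result := Result and (storage = @a);
end;

begin
  Writeln('first store wins: ', FirstStoreWins);
end.
```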
The trick here is that this code is all executed inside the CPU, atomically. The CPU ensures that the destination value is not modified (by another CPU) during the execution of the code. It is hard to understand (interlocked functions always make my mind twirl in circles) but basically it reduces to two scenarios:
If storage is nil (the old, uninitialized value of storage), storage is set to the new object (tmpT) and nil is returned.
Otherwise, storage is not modified and its current value is returned.
In other words, InterlockedCompareExchangePointer either stores the new value in storage and returns nil, or does nothing, leaves the already initialized storage intact and returns something other than nil.
At the end, the code handles two specific cases. If T is an interface type and initialization was successful, the temporary value in tmpT must be replaced with nil. Otherwise two variables (storage and tmpT) would own an interface with a reference count of 1, which would cause big problems.
If T is a class type and initialization was not successful, the temporary value stored in tmpT must be destroyed.
Initialize returns the same shared object twice: once in the storage parameter and once as the function result. This allows us to write space-efficient initializers like in the example below, taken from the OtlParallel unit.
When you are initializing an interface and a new instance of the implementing object is created by calling the default constructor Create, you can use the two-parameter version of Atomic to simplify the code. [3.06] This is only supported in Delphi XE and newer.
For example, if the shared object is stored in shared: IMyInterface and is created by calling TMyInterface.Create, you can initialize it via:
A common scenario in parallel programming is that the program has to wait for something to happen. The occurrence of that something is usually signalled with an event.
On Windows, this is usually accomplished by calling one of the functions from the WaitForMultipleObjects family. While they are powerful and quite simple to use, they also have a big limitation: one can only wait for up to 64 events at the same time.
Windows also offers a RegisterWaitForSingleObject API call which can be used to circumvent this limitation. It is, however, quite complicated to use. To simplify the programmer’s life, OmniThreadLibrary introduces a TWaitFor class which allows the code to wait for any number of events.
To use TWaitFor, create an instance of this class and pass it an array of handles either as a constructor parameter or by calling the SetHandles method. All handles must be created with the CreateEvent Windows function.
You can then wait for any (WaitAny) or all (WaitAll) events to become signalled. In both cases the Signalled array is filled with information about signalled (set) events. The Signalled property is an array of THandleInfo records, each of which contains only one field: an index (into the handles array) of the signalled event.
For example, if you want to wait for two events and then react to them, use the following approach:
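A minimal sketch of such a wait, assuming OmniThreadLibrary is on the unit search path. TWaitFor, WaitAny and the Signalled array are named in the text; the waAwaited result constant, its unqualified visibility, the Index field name and the timeout parameter of WaitAny are assumptions. The WaitForEither and SecondEventIndex helpers are hypothetical.

```pascal
program WaitForSketch;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows,
  System.SysUtils,
  OtlSync; // assumes OmniThreadLibrary is on the search path

// Waits until either event is signalled. Returns the index (into the handle
// array) of the first reported event, or -1 if the wait did not succeed.
function WaitForEither(handle1, handle2: THandle;
  timeout_ms: cardinal): integer;
var
  waiter: TWaitFor;
begin
  Result := -1;
  waiter := TWaitFor.Create([handle1, handle2]);
  try
    if waiter.WaitAny(timeout_ms) = waAwaited then // assumed constant name
      Result := waiter.Signalled[0].Index;         // assumed field name
  finally
    FreeAndNil(waiter);
  end;
end;

// Creates two events, signals only the second and waits for either of them.
function SecondEventIndex: integer;
var
  event1, event2: THandle;
begin
  event1 := CreateEvent(nil, false, false, nil);
  event2 := CreateEvent(nil, false, false, nil);
  try
    SetEvent(event2);
    Result := WaitForEither(event1, event2, 1000);
  finally
    CloseHandle(event2);
    CloseHandle(event1);
  end;
end;

begin
  Writeln('signalled event index: ', SecondEventIndex);
end.
```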
You don’t have to recreate TWaitFor for each wait operation; it is perfectly ok to call WaitXXX functions repeatedly on the same object. It is also fine to change the array of handles between two WaitXXX calls by calling the SetHandles method.
The WaitAny method also comes in a variant which processes Windows messages, I/O completion routines and APC calls. The flags parameters of that variant are the same as the corresponding parameters of the Windows API function it wraps.
The use of the
TWaitFor is shown in demo
The TOmniLockManager<K> class solves a specific problem: how to synchronize access to entities of any type. It is similar to TMonitor, except that it works on all types, not just on objects.
The following requirements are implemented in TOmniLockManager<K>:
The usage pattern is similar to TMonitor.Enter/Exit. The code calls Lock to get exclusive access to a key and calls Unlock to release the key back to public use.
Locks are counted per thread. When the number of Unlock calls in one thread matches the number of Lock calls, the key is unlocked. (In other words, if you call Lock twice with the same key, you also have to call Unlock twice to release that key.)
The public TOmniLockManager<K> class implements this functionality. The Lock function returns False if it does not lock the key in the specified timeout. Timeouts 0 and INFINITE are supported.
There’s also a LockUnlock function which returns an interface that automatically unlocks the key when it is released. This interface also implements an Unlock function which unlocks the key.
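A minimal single-threaded sketch of the Lock/Unlock pairing, assuming OmniThreadLibrary is on the unit search path. Lock, Unlock and the timeout argument are described in the text; the parameterless constructor call and the exact parameter order of Lock (key first, then timeout in milliseconds) are assumptions. The LockTwice and ReentrantLockWorks helpers are hypothetical.

```pascal
program LockManagerSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  OtlSync; // assumes OmniThreadLibrary is on the search path

// Locks the same key twice from the calling thread and releases it twice.
// Returns True if both Lock calls succeeded.
function LockTwice(manager: TOmniLockManager<integer>;
  key: integer): boolean;
begin
  Result := false;
  if manager.Lock(key, 0) then // timeout 0: do not wait
    try
      if manager.Lock(key, 0) then // same thread, same key
        try
          Result := true;
        finally
          manager.Unlock(key); // matches the second Lock
        end;
    finally
      manager.Unlock(key); // matches the first Lock; key is now free
    end;
end;

// Creates a lock manager keyed by integers and runs the check above.
function ReentrantLockWorks: boolean;
var
  manager: TOmniLockManager<integer>;
begin
  manager := TOmniLockManager<integer>.Create; // assumed constructor form
  try
    Result := LockTwice(manager, 42);
  finally
    manager.Free;
  end;
end;

begin
  Writeln('locked twice: ', ReentrantLockWorks);
end.
```

The key type is arbitrary; an integer is used here only because it cannot be locked with TMonitor.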
A practical example of using the lock manager is shown in demo
For debugging purposes, OmniThreadLibrary implements the TOmniSingleThreadUseChecker record. It gives the programmer a simple way to make sure that some code is always executed from the same thread.
Using it is simple: first declare a variable/field of type TOmniSingleThreadUseChecker in a context that has to be checked and then call the checking method of that variable (DebugCheck or its always-enabled counterpart) whenever you want to check that some part of the code was not used from more than one thread.
The difference between the two is that DebugCheck can be disabled during compilation. It implements the check only if the conditional symbol OTL_CheckThreadSafety is defined. Otherwise, DebugCheck contains no code and does not affect the execution speed.
In cases where you use such an object from more than one thread (for example, if you use it from a task and then from the task controller after the task terminates) you can call AttachToCurrentThread to associate the checker with the current thread.