1. Introduction to multi-threading

You may have heard about threads and multi-threading already, but you may not exactly know what those words mean. The precise definition is not important for this story. What is important are the general ideas. If you want to read more, start with the entry in Wikipedia. If you are well versed in these topics, you can freely skip ahead to the next chapter, Introduction to OTL.

When you start a program, the operating system creates internal entity called a process. It contains almost everything that an operating system associates with the started program - allocated memory, open file and window handles, sockets and so on. A process, however, does not contain the information related to the CPU state. That is part of a thread.

A thread encapsulates the state of the CPU registers, thread stack and thread local variables. Every process contains one thread - the main thread. Operating system creates it together with a process when it starts a program.

Modern operating systems allow multiple threads of execution inside a process. This is achieved by creating additional background threads in the code. To create a thread we can use operating system primitives (such as CreateThread on Windows), low-level run-time library functionality as Delphi’s TThread, or higher-level concepts such as tasks and abstractions (also known as patterns).

On a multi-CPU machine these threads can execute in parallel, one on each core. A typical computer system, however, runs much more threads than it has CPUs and that’s where multi-threading kicks in.

A multi-threading is not a recent invention. It appeared decades before multiprocessor boards became available to the general public. In a typical multi-threading system the operating system allows a thread to run for some time (typically few milliseconds) and then stores the thread’s state (registers etc) into the operating system’s internal data structures and activates some other thread. By quickly switching between multiple threads the system gives an illusion of running many threads in many programs in parallel.

1.1 Multi-threading as a source of problems

Multi-threading is a powerful concept, but it also brings many problems. They are all consequence of the same fact – that threads are not isolated.

An operating system works hard to isolate one process from another. This prevents one process from accessing the memory belonging to another, which is an important security feature. It also makes sure that a bug in one process won’t affect any other process in the system. If you think that’s nothing special, you are not old enough. Some of us had to work on Windows 3 which had no such isolation and we still remember how one badly behaved application could bring down the whole system.

Since threads are not isolated, each thread can access all the memory memory belonging to a process. This gives us extreme power and flexibility. Starting multiple threads provides all the threads with access to the same data. It is also a source of all problems. If multiple threads don’t only read from the shared data but write to it, you’ll get into trouble.

It should be obvious that one thread must not modify data while another thread is reading from it. It is harder to enforce this in practice. We can protect access to shared data by using locking (critical sections, spinlocks, TMonitor …) or interlocked operations, but they will slow the program down and they can introduce new problems (deadlocks).

That is why I prefer data copying, communications and aggregation over data sharing. Modern CPUs are fast and they can easily copy few tens or hundreds of bytes that are then sent to the thread so it can work on its own copy of the data and not on shared data. This approach is the basis of OmniThreadLibrary and you’ll see it throughout the book.

To write better code, you should know which operations that are completely safe in a single-threaded world will get you in trouble when you are multi-threading.

1.1.1 Reading and writing shared data

Let’s start with simple variable access. One thread is reading from a variable and another is writing to it.

1 var 
2   a,b: integer; 
4 // thread 1 
5 a := 42; 
6 // thread 2 
7 b := a; 

This looks safe. Variable b will contain either 42 or previous data stored in a. Well, that is not entirely true. b may also contain a mixture of both.

Depending on how the a is stored in memory (how it is aligned) the processor will access it either with one or two operations. If two operations are used, a read operation may partially overlap the write operation. For example, the code may read the first part of a, then a gets fully updated and only then the code reads the second part of a.

This code behaves “as expected” (doesn’t cause problems) if the address of a gives remainder of 0 when divided by 4. We say that a is 4-aligned. Delphi in most cases makes all integers 4-aligned. Sadly, in some occasions, especially when structured types – classes and records – are nested, variables are not well-aligned.

The simplest way to be sure that a value is correctly aligned is to use the TOmniAlignedInt32 record provided by the OmniThreadLibrary. It makes sure that the underlying data storage is correctly aligned and can as such be accessed atomically.

OmniThreadLibrary also implements TOmniAlignedInt64 type which creates 8-aligned int64 data. This data may be accessed atomically only from 64-bit code.

1.1.2 Modifying shared data

Another example of typical code that doesn’t work correctly in a multi-threaded environment is modifying the same data from two threads. In a typical program this data modification is hidden inside an operation we think of as atomic while in reality it isn’t.

A standard example of this problem is implemented with two threads, one incrementing the shared value and other decrementing it.

 1 var 
 2   data: integer; 
 4 data := 0; 
 6 // thread 1 
 7 for i := 1 to 100000 do 
 8   Inc(data); 
 9 // thread 2 
10 for i := 1 to 100000 do 
11   Dec(data); 

In a single-threaded world, the value of data is 0 when this pseudo-code finishes. In a multi-threaded scenario, however, it will lie somewhere between -100000 and 100000 – and that’s all we can say.

The problem with this code is that Inc and Dec are not atomic operations. Rather, they are implemented inside the CPU as a series of three operations – read data from memory, increment (or decrement) data, and write data to memory.

One way to fix such code is to use already mentioned TOmniAlignedInt32 which implements atomic Increment and Decrement operations.

 1 var 
 2   data: TOmniAlignedInt32; 
 4 data := 0; 
 6 // thread 1 
 7 for i := 1 to 100000 do 
 8   data.Increment; 
 9 // thread 2 
10 for i := 1 to 100000 do 
11   data.Decrement; 

Another option is to use interlocked operations from Delphi’s System.SyncObjs unit.

 1 var 
 2   data: integer; 
 4 data := 0; 
 6 // thread 1 
 7 for i := 1 to 100000 do 
 8   TInterlocked.Increment(data); 
 9 // thread b 
10 for i := 1 to 100000 do 
11   TInterlocked.Decrement(data); 

The third option is to protect increment/decrement operations by wrapping them in a critical section. You could, for example, use OmniThreadLibrary’s TOmniCS.

 1 var 
 2   data: integer; 
 3   lock: TOmniCS; 
 5 data := 0; 
 7 // thread 1 
 8 for i := 1 to 100000 do begin 
 9   lock.Acquire; 
10   data.Increment(data); 
11   lock.Release; 
12 end; 
14 // thread 2 
15 for i := 1 to 100000 do begin 
16   lock.Acquire; 
17   data.Decrement(data); 
18   lock.Release; 
19 end; 

Using interlocked operations is a bit slower than normal arithmetics and locking with critical sections is even slower. That’s one reason I prefer data copying and aggregation over locking.

The next example shows how the increment/decrement example could be rewritten with these principles in mind.

 1 var 
 2   data: integer; 
 4 // thread 1 
 5 var 
 6   tempData: integer; 
 8 tempData := 0;   
 9 for i := 1 to 100000 do 
10   Inc(tempData); 
11 send tempData to main thread 
13 // thread 2 
14 var 
15   tempData: integer; 
16 tempData := 0;   
17 for i := 1 to 100000 do 
18   Dec(tempData); 
19 send tempData to main thread 
21 // main thread 
22 data := result from thread 1 + result from thread 2 

This approach doesn’t manipulate shared data which makes it run faster than interlocked and locking solutions. It does, however, require a communication mechanism that will send data to and from a thread. Different parts of this book will deal with that.

1.1.3 Writes masquerading as reads

Another source of problems are shared objects. If we share an object between threads and then execute methods of that object in different threads we have to know exactly whether these operations are all thread-safe (that is, whether they can be executed in parallel from multiple threads).

A simple example, which I saw in real code, shares a stream between two threads. The worker thread is reading from the stream and doing something with the data (it doesn’t matter what). The main thread is monitoring the progress by tracking stream’s Position and updating a progress bar on the user interface.

In pseudo-code, these two threads are executing following operations on a shared stream.

1 var 
2   data: TStream; 
4 // worker thread 
5 data.Read(...); 
6 // main thread 
7 UpdatePosition(data.Position / data.Size);   

The problem here is easy to overlook as we perceive all operations on the shared stream as reading. In reality, this is not so. Let’s look at them in more detail.

Read reads data from a stream’s current position and then updates that current position.

Position just reads the current position.

Size is the worst of them all. Internally it is implemented as three Seek operations.

1 Pos := Seek(0, soCurrent); 
2 Result := Seek(0, soEnd); 
3 Seek(Pos, soBeginning); 

The problem here is that Seek modifies the current position. By calling Size in one thread we can change the current position just as the other thread starts reading from the stream. That will cause wrong data to be read from the stream.

The morale here is simple. Sharing data between thread is always dangerous.