On The Topic Of Multi-Threading
Multi-threading is one of those things which initially has this magical sound to it, and as long as you don’t pay too close attention to it, it does keep its fairy-dust glimmer. You may even end up using it once or twice without ever trying to write a high-performance application. The real point of this article is to discuss some of the finer points of multi-threading, so we’ll be skipping the high-level languages such as Java and C#, as well as fancy high-level, cross-platform libraries such as Qt, and instead dive straight into the messy, low-level stuff. Do mind your step 🙂
The primary language used for writing high-performance, multi-threaded applications is C++. Since C++ lacks native multi-threading support (at least until C++ 2011 becomes official and is fully implemented), one has to pick from a variety of threading APIs, and which one you pick will depend on the target platform. If you only ever run the application on a single operating system, and never plan to port it, you can just go with whatever native API your OS provides, be it Windows, Linux or OS X. This is not a bad choice, as any other library you can pick will just build upon this native threading API.
If you’re like me, however, and would like to make porting your applications to other platforms as easy as a simple recompile, then you have to pick a library which offers this portability. There are various ones out there, including Intel Threading Building Blocks (TBB) and Boost Thread, and they each come with their own advantages and disadvantages. Intel TBB is the more high-level one of the two, using abstractions to take away many of the gritty details of multi-threading, whereas Boost Thread (BT from here on) is quite low-level, making you do the resource management yourself.
The crucial detail when designing a multi-threaded application is the need to balance efficiency of execution against maintainability. While I haven’t used Intel TBB myself yet, it seems like it could make things a lot easier to build, and as long as its abstractions keep working, that’s fine. It has to be hell to debug if something goes wrong, though. With a low-level library such as BT there are far fewer layers between you and the system, making maintenance and debugging easier. In theory, of course.
For BT, launching a thread is as easy as:
MyClass worker(1); // our freshly initialized class
boost::thread myThread(worker); // insert into new thread instance
That’s it. Where TBB might be easier is in managing a large number of threads, but that’s something you have to look at for yourself.
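For a slightly fuller picture, here is a minimal sketch of what such a worker class could look like. It uses std::thread from C++ 2011 rather than Boost Thread, on the assumption that the two interfaces are close enough for this purpose (std::thread was modelled on Boost’s), so that it builds without Boost installed. The body of MyClass, its result pointer and the run_worker helper are made up for illustration; they are not taken from any real project.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for MyClass: any copyable object with an
// operator() can be handed to a thread.
class MyClass {
public:
    MyClass(int id, std::atomic<int>* result) : id_(id), result_(result) {}

    // The new thread calls this operator on its own copy of the object.
    void operator()() const { result_->store(id_ * 2); }

private:
    int id_;
    std::atomic<int>* result_;  // shared slot the worker reports back through
};

// Launches one worker, waits for it to finish, and returns what it wrote.
int run_worker(int id) {
    std::atomic<int> result(0);
    MyClass worker(id, &result);   // our freshly initialized class
    std::thread myThread(worker);  // worker is copied into the new thread, which starts at once
    assert(myThread.joinable());   // a started thread must be joined (or detached)
    myThread.join();               // wait for completion before reading the result
    return result.load();
}
```

Note that the thread works on a copy of worker, which is why the result travels back through a pointer instead of a member variable. Calling run_worker(1) starts a thread, joins it and hands back the value the worker stored.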
As I pointed out earlier, in the end it is always the platform’s native threading API which is being used for the actual threading. The approaches these APIs take aren’t too dissimilar: a new thread is created and is given a task to run. So-called mutexes, spinlocks and other structures are then used to ensure that a single piece of data that is shared by multiple threads isn’t accessed simultaneously, as this could lead to undesirable behaviour, data corruption and crashes. At this level things are still quite easy to understand.
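As an illustration of the mutex idea, here is a minimal sketch in which several threads increment one shared counter. It again assumes std::thread and std::mutex from C++ 2011 instead of Boost Thread; the function name count_with_mutex and its parameters are invented for this example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Increments a shared counter from several threads. The mutex ensures that
// only one thread at a time performs the read-modify-write on `counter`.
long count_with_mutex(int thread_count, int increments_per_thread) {
    assert(thread_count > 0 && increments_per_thread >= 0);

    long counter = 0;          // the shared piece of data
    std::mutex counter_mutex;  // guards every access to `counter`

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&counter, &counter_mutex, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);  // released at scope exit
                ++counter;
            }
        });
    }
    for (std::thread& th : threads) th.join();  // all writers are done after this
    return counter;
}
```

With the lock in place, count_with_mutex(4, 10000) always adds up to 40000. Remove the lock_guard line and the increments from different threads can overwrite each other, which is exactly the kind of data corruption described above.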
The part where it gets messy is when you move on to the actual implementation in the hardware which makes this possible. Before the arrival of multi-core processors, multi-threading on a typical single-processor machine truly was an illusion, as there were never two tasks active at the same time. Instead the OS’s task scheduler would swap out tasks, giving each a time slice to do its thing before its state was saved to the stack again and another task’s state was restored. With multi-core processors two or more tasks can be active simultaneously, yet if you look at, for example, the statistics provided by the task manager of your operating system, and particularly the number of active threads, you’ll see that it’s far higher than the number of cores in your system. For me it’s above 1,000 active threads as I write this.
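To see the gap between cores and threads from inside a program, a small sketch like the following can help. It assumes C++ 2011’s std::thread; core_count and run_oversubscribed are hypothetical helper names. It asks the system how many hardware threads it has and then deliberately starts a multiple of that number, relying on the scheduler to time-slice them.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Number of hardware threads, or 1 if the implementation cannot tell
// (hardware_concurrency() is allowed to return 0 in that case).
unsigned core_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Starts `threads_per_core` threads for every core and returns how many of
// them ran to completion.
unsigned run_oversubscribed(unsigned threads_per_core) {
    assert(threads_per_core > 0);
    const unsigned total = core_count() * threads_per_core;

    std::atomic<unsigned> finished(0);
    std::vector<std::thread> threads;
    threads.reserve(total);
    for (unsigned i = 0; i < total; ++i) {
        threads.emplace_back([&finished] { ++finished; });  // trivial task
    }
    for (std::thread& t : threads) t.join();
    return finished.load();
}
```

Every one of the threads gets to finish even though there are more of them than cores; the price is that they take turns, which is where the task scheduler comes in.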
Task-switching is thus still a very common practice, and we run into the first hurdle when it comes to reliable multi-threading: the OS’s task scheduler. As described earlier, it’s the piece of code which determines which task gets to run and in which order. Countless approaches to task scheduling exist, each suited to particular scenarios. In an embedded, real-time OS such as QNX the emphasis is on exact time slices and timing, so that any scheduled task runs on time and for exactly as long as it has to. For a desktop OS such as Windows there’s no such need and scheduling is far looser. It’s a pretty chaotic environment anyway, so if a task doesn’t run for exactly 100 ms, few will notice.
So in essence your threads will be competing with all the other threads which are active at that time. Don’t count on exact timing, and expect some of your threads to be waiting on results from other threads, depending on the design you’re using. On a hardware level, threading is more smoke and mirrors than the clean and pristine world of software would have us believe. The task scheduler can make poor choices when placing your threads, reducing performance significantly. Moving a thread to another core means that the data it had gathered in the old core’s L1 and L2 caches stays behind, so the thread has to start all over again with cold caches on its new core. Similarly, task switching on a single core will ‘pollute’ the caches with data your thread doesn’t need, also leading to performance reductions.
One can lock threads to a single core to prevent such problems, but whether it’s the right choice depends again on the situation. It’s a good idea to try multiple approaches and see what works best. Use accurate timing methods and perform multiple runs (at least 5 or so) to rule out any glitches and to ensure you get useful data to base your decision on. Repeat this for every platform and, in the case of Linux and similar OSes which allow you to swap out the task scheduler, for each task scheduler the application will be deployed with.
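A rough sketch of both ingredients, pinning and repeated timing, could look like this. The pinning part assumes Linux and uses the GNU extension pthread_setaffinity_np; on any other platform this sketch simply reports failure, and you would substitute that platform’s own call. The names pin_current_thread and time_runs are made up for the example.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

#if defined(__linux__)
#include <pthread.h>
#include <sched.h>
#endif

// Pins the calling thread to one core. Linux-only in this sketch; elsewhere
// it does nothing and returns false.
bool pin_current_thread(int core) {
#if defined(__linux__)
    if (core < 0 || core >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
#else
    (void)core;
    return false;
#endif
}

// Runs `task` `runs` times and returns each run's wall-clock time in
// milliseconds, measured with a monotonic clock.
std::vector<double> time_runs(const std::function<void()>& task, int runs) {
    assert(runs > 0);
    std::vector<double> millis;
    millis.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        task();
        auto stop = std::chrono::steady_clock::now();
        millis.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return millis;
}
```

The idea is to call time_runs with your workload at least 5 times without pinning, then again with pin_current_thread called at the start of each worker thread, and compare the two sets of numbers rather than a single measurement. Returning every run instead of an average makes it easier to spot the odd glitch.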
Of course, we are talking about high-performance multi-threading here. If you are just running a processing task in a thread so as not to disrupt the UI thread, then by all means just use Qt’s threading functionality or something similar, even if its abstractions and sometimes poor documentation can make it more of a headache than the low-level approach.
Next article should be on the Android game project again. Until then,