High speed applications – parallelism in .NET part 1

This is the first in a series of posts about parallelism.

What we will cover in this series:

  • Some history and what parallelism is
  • Multitasking and multithreading
  • Styles/types of multitasking
  • Context switching and renegade thread methods – why Thread.Sleep is bad practice.
  • Creating our very own little thread pool.
  • The .NET thread pool, using ThreadPool.QueueUserWorkItem
  • Data consistency, thread safety, locking and Interlocked
  • The .NET thread pool using Task Parallel Library (TPL) – System.Threading.Tasks
    • Scheduling / Running tasks
    • Continuations
    • Why Task.Wait / WaitAny / WaitAll is as bad as Thread.Sleep
    • Task.WhenAll / WhenAny
    • Task.Delay
    • Cancellation
    • IProgress<T>
  • Using the .NET thread pool with System.Threading.Tasks.Parallel
    • Parallel.For
    • Parallel.ForEach
  • How tasks and continuations can be so much easier with async / await
  • UI threads
  • Data contexts
  • Debugging multithreaded applications

Please bear with me here in the beginning. Some of the things might be challenging when you read about them for the first time, but they will make more sense later on. Even better is that much of the dirty work is already done for you behind the scenes by Windows and the .NET framework. Still, I believe it is an advantage to know the story behind it and how it works, so you can write more efficient code.

Synonyms and acronyms

Program, application and process are more or less different names for the same thing. There are differences, and if at least one reader out there requests it, I will write a post just about what they are. But in short, a process is a unit containing executable code loaded into (virtual) memory. A process does not need to have a user interface, and a process can host multiple other programs and applications, which is the case with web server processes. A program/application usually has some sort of interface for either computers or users.

OS = Operating system. The system running and controlling a computer, providing APIs for actually doing stuff, providing multitasking and multithreading, security and much more. Microsoft Windows, Linux, macOS, iOS and Android are examples of operating systems.

UI = User interface

IBM = International Business Machines – a company that was really big in the computer industry back in the day, with PCs, servers, their own OS etc.

Multicore processor = one physical processor with several cores, each of which acts as its own processor. A multicore processor can execute several instructions in parallel.

Execution path = A series of instructions, executing in sequence.

Virtual core = The processor exposes virtual cores by creating two virtual cores out of one physical core. Intel's Hyper-Threading is an example of this. The processor has its own multitasking where it switches context between the two cores/threads when one of them is waiting for memory access or the like.

Some history, what is parallelism and why we need it…

My (parents’) first PC was an IBM PS/2 with an Intel 80286 10 MHz processor (CPU) and 1 MB RAM. Like all PCs at the time, it had one core – one concurrent execution path. It could do one thing at a time. Back then the operating system was MS-DOS, which had no support for multitasking / multithreading (explained later).

A processor core can carry out one instruction at a time. It cannot be interrupted during this single instruction. The smallest unit of time for a processor is a clock cycle. The processor clock frequency determines the number of clock cycles per second. The higher the frequency, the more clock cycles per second, which means faster execution. A clock frequency of 10 megahertz (MHz) is 10 million clock cycles per second.

Clock cycles and instructions

The green squares are clock cycles, the red rectangles are instructions.
Different instructions take different numbers of clock cycles. Since the 80286, processors have been improved in a lot of different ways. Instructions that previously took 3 clock cycles now take only one. For those instructions, the processor is 3 times faster. If all instructions in the image above took 1 clock cycle instead of 1-3, the processor would be twice as fast (since the average cycles per instruction in the image is 2). Adding new instructions to a processor which combine common patterns of instructions and execute them in 1 clock cycle is another way to execute code faster at the same clock frequency. I will not go into too much more detail here.

Increasing the clock frequency makes the processor execute more instructions in less time – more clock cycles per second means more instructions per second. The laptop I’m writing this post on has a maximum frequency of 2.8 GHz and it has 4 cores (8 virtual cores with hyperthreading). That is 4 parallel execution paths running at 2.8 GHz. When there is no need to run at 2.8 GHz it will slow down to consume less energy.

There are faster processors available, but with the current processor architecture it is getting more difficult to increase the clock frequency.

If you are digging a hole in the ground and you have the best tools available, you are as strong as you will ever be and as skilled in digging as you will ever be, the best way to dig the hole faster is to have a friend help you out. The two of you will dig the hole faster, or maybe one of you can dig, and the other can transport the dirt away from the dig site.

If you are just one girl/guy digging the hole, you will still need to transport the dirt away, get refreshments and wipe the sweat from your brow, which means that you will have to carry out many tasks.

Having multiple cores in your system increases the workforce. If you were 4 girls/guys, maybe two of you could dig the hole, one could transport the dirt away, and one could carry out other tasks like bringing beverages.

Parallelism

Parallelism is doing things in parallel. Some reasons for parallelism are:

  • A bigger task can finish quicker by splitting it up into smaller tasks that can be carried out by multiple cores in parallel.
  • Several different tasks can execute in parallel.
  • Have the OS tasks execute on one core and your tasks execute on the others.

Multitasking basics – multithreading and context switching

Before multicore processors were readily available, there was still a need for multiple tasks to be carried out more or less simultaneously. You would be pretty irritated if you couldn’t do anything with your computer while downloading or copying a file…

[Image: copying a file]

You would be even more annoyed if the UI and the entire computer would freeze during the download.

Now fortunately, it is possible to multitask with one core.

Threads

A thread is like a virtual execution path. If your application (process) is running through a sequence of small tasks, it uses one thread. Multitasking with one core means chopping up the work of all the running processes/threads into smaller tasks (with or without their consent) and executing each small task one at a time.
[Image: multiple processes sharing a single core]

In this example, every thread gets approximately the same amount of execution time. Switching very quickly between them makes it seem like all three programs are executing at the same time. The black line(s) represent the actual processor core execution path.

Context switching

Switching between threads is called context switching. It is managed by the operating system. A process can have more than one thread: if process 1 has 2 threads while processes 2 and 3 have 1 each, 4 threads are sharing the processor. When a thread is ready to execute (to use processing power), it is scheduled for execution.

The primary styles of context switching/multitasking

Cooperative multitasking

The operating system switches threads when the thread or process currently executing notifies the OS that now is a good time to switch. When copying a file, waiting for data to be read from disk is a good time to switch, so when the thread calls the OS API for reading from disk, the OS initiates the read and then switches context (if any other threads are scheduled). In an operating system relying only on cooperative multitasking, a malfunctioning or malicious process can seize control of the processor and block the entire system.

Preemptive multitasking

Used in all modern operating systems. The operating system decides when it’s a good time to switch context – either because it knows that the thread is waiting for something, like data to be read from disk, because the thread has been executing for a set amount of time, or because a thread with higher priority is ready to run (scheduled).

Preemptive multitasking works best when the threads also cooperate with the operating system, as in cooperative multitasking. When reading data from disk, the thread initiates the read and the OS knows that the thread will not need to execute until a certain I/O event is triggered.

When context switching is taking over your system

Switching context is expensive – or time consuming, if you will – and the reasons for this are beyond the scope of this post. You should not be afraid to use multiple threads in your programs, but make sure you need them and make them cooperative. Avoid renegade context switching thread methods.

A renegade context switching thread method

Let’s create a simple console application with an absurd number of threads in a thread pool, causing a lot of context switching.
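The original code listing isn’t reproduced here, so below is a minimal sketch of what such a renegade program might look like. The queue, the names and the structure are illustrative assumptions; the 200-thread count matches the figure discussed below.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class RenegadeExample
{
    // The work queue stays empty in this example – the threads just poll it.
    static readonly ConcurrentQueue<int> _workQueue = new ConcurrentQueue<int>();

    static void Main()
    {
        // An absurd number of threads, all running the same renegade method.
        for (int i = 0; i < 200; i++)
        {
            new Thread(RenegadeThreadMethod) { IsBackground = true }.Start();
        }
        Console.ReadKey(); // keep the process alive until a key is pressed
    }

    static void RenegadeThreadMethod()
    {
        while (true)
        {
            int item;
            if (_workQueue.TryDequeue(out item))
            {
                // Carry out some work if there is any (there never is here).
            }

            // Tell the OS we won't need the processor for at least 1 ms.
            // With 200 threads doing this, context switching goes through the roof.
            Thread.Sleep(1);
        }
    }
}
```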

Each thread checks the queue, carries out some work if there is any (which there isn’t), and then sleeps for at least 1 ms. .NET and Windows know that the thread will not need to do anything for a millisecond, which in most cases causes a context switch. When running this program on my quad core 2.8 GHz laptop, no single process takes more than 1% CPU, but the total CPU usage is 26%. Stopping it, it’s back to ~4%.

[Images: Task Manager – no single process over 1% CPU; total CPU usage at 26%]

(The process is not even taking 1% CPU so it’s not visible in the Task Manager “top list” above).

The context switching of the 200 threads consumes 22% of the system capacity. If you fire up Task Manager on your machine, showing all processes and their threads, you probably have more than 200 threads all in all. Imagine if all of these threads were written as renegade context switching thread methods. Even though the threads kind of cooperate with the OS by sleeping, the result is ridiculous.

The reason for creating this example is that even though the number of threads in the process is exaggerated, I’ve seen quite a lot of thread methods looking like this, not to mention that I’ve written thread methods like this myself more than once in the past.

We could make the method even worse by omitting Thread.Sleep and having the threads consume all available processing power, but that wouldn’t be nice at all, would it?

Improving the example

  1. First of all, it is not likely that we need more threads than the number of virtual cores in the system. Having more threads than virtual cores doing the same work will cause more context switching and reduce the overall performance. (The sketch after this list uses Environment.ProcessorCount threads.)

    The overhead is now reduced to 1-2%, but this is still too much, especially since we can assume there are other processes running in the system doing the exact same thing. Also, wouldn’t we rather have those 1-2% available when we need them? On a busy web server, those 2 percent could equal maybe 50 requests per second, or on a not so busy web server, they could allow for hosting another web site.
  2. There is a much better collection for this kind of work. Actually, there is almost always a better solution than using Thread.Sleep. How about using BlockingCollection<T>? (A minimal sketch follows after this list.)

    BlockingCollection<T> will tell the OS that the thread does not need to be scheduled for execution until the event inside the BlockingCollection<T> is triggered for that thread. If one item is added to the collection, one thread gets scheduled, and as soon as there is available processing power, the value is taken from the collection. If there are more items in the collection than threads waiting, all the threads will be scheduled.

    The process is now not using any processing power at all unless something is posted to the collection, or a key is pressed in the console window.

    BlockingCollection<T> is also thread safe, which means that it will work and keep data consistent even when multiple threads are accessing it simultaneously. All collections in the System.Collections.Concurrent namespace are thread safe. More about consistency later on.
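Below is a minimal sketch of the improved version, combining both improvements in the list above: no more threads than virtual cores, and BlockingCollection<T> instead of polling with Thread.Sleep. As before, the original code isn’t reproduced here, so the item count and names are illustrative assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class ImprovedExample
{
    static readonly BlockingCollection<int> _workQueue = new BlockingCollection<int>();

    static void Main()
    {
        // Fill the collection first. Moving this loop to after the threads
        // are created makes the item distribution much more random (see below).
        for (int i = 0; i < 100; i++)
        {
            _workQueue.Add(i);
        }

        // Improvement 1: no more threads than virtual cores.
        for (int i = 0; i < Environment.ProcessorCount; i++)
        {
            new Thread(WorkerThreadMethod) { IsBackground = true }.Start();
        }

        Console.ReadKey();
    }

    static void WorkerThreadMethod()
    {
        // Improvement 2: GetConsumingEnumerable blocks – without using any
        // processing power – until an item is available, so the OS doesn't
        // schedule the thread at all while the collection is empty.
        foreach (int item in _workQueue.GetConsumingEnumerable())
        {
            Console.WriteLine("Item {0} taken by thread {1}",
                item, Thread.CurrentThread.ManagedThreadId);
        }
    }
}
```

GetConsumingEnumerable stands in here for whatever consuming loop the original used; calling Take in a loop would work just as well.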

[Image: console output showing which thread takes each item at startup]

Starting the improved example, we can see that the first 14 items in the collection are taken by thread 13, which is the first one started. Then four items are taken by thread 14, thread 10 takes 8, and then the items taken by the threads even out. The main reason that thread 13 gets a head start is that it takes some time to start a thread. Instead of starting a thread when we need it, we should have the thread ready. When we are looking for high performance software to carry out concurrent tasks, we should use a thread pool. Creating and disposing resources that could be reused/recycled is a waste of processing power. If you move the loop adding integers to the collection to after the threads are created, their execution will be much more random.

Ok great, we’ve created a thread pool for handling one particular task in parallel. If this is a pipeline kind of application this is a great start, but…

What if our application needs to do more than one particular task in parallel? And what if, yeah, what if there is already a generic thread pool we can use that adapts to its utilization, so we can avoid creating threads ourselves? You guessed it: .NET has a generic thread pool we can use to schedule tasks.
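As a quick preview (an illustrative snippet, not this post’s own example), handing work to that pool looks like this:

```csharp
using System;
using System.Threading;

class ThreadPoolPreview
{
    static void Main()
    {
        // Queue a piece of work on the built-in .NET thread pool –
        // no thread creation, no manual recycling.
        ThreadPool.QueueUserWorkItem(state =>
            Console.WriteLine("Running on a pool thread"));

        Console.ReadKey(); // give the pool thread time to run
    }
}
```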

We will cover the thread pool in the next post in the series.

Cheers
Erik Bergman

Check out part 2 of the series here
