2013-11-21

There Is No Thread

This is an essential truth of async in its purest form: There is no thread.

The objectors to this truth are legion. "No," they cry, "if I am awaiting an operation, there must be a thread that is doing the wait! It's probably a thread pool thread. Or an OS thread! Or something with a device driver..."

Heed not those cries. If the async operation is pure, then there is no thread.

The skeptical are not convinced. Let us humor them.

We shall trace an asynchronous operation all the way to the hardware, paying particular attention to the .NET portion and the device driver portion. We'll have to simplify this description by leaving out some of the middle-layer details, but we shall not stray from the truth.

Consider a generic "write" operation (to a file, network stream, USB toaster, whatever). Our code is simple:

private async void Button_Click(object sender, RoutedEventArgs e)
{
  byte[] data = ...
  await myDevice.WriteAsync(data, 0, data.Length);
}

We already know that the UI thread is not blocked during the await. Question: Is there another thread that must sacrifice itself on the Altar of Blocking so that the UI thread may live?

Take my hand. We shall dive deep.

First stop: the library (e.g., entering the BCL code). Let us assume that WriteAsync is implemented using the standard P/Invoke asynchronous I/O system in .NET, which is based on overlapped I/O. So, this starts a Win32 overlapped I/O operation on the device's underlying HANDLE.
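
For those who wish to see this first stop from the caller's side, here is a minimal sketch. It assumes the device is an ordinary file and uses FileStream with FileOptions.Asynchronous, which is how .NET requests an overlapped handle on Windows. The class name, method name, and temp file name are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class OverlappedWriteSketch
{
    // Writes data to a file that was opened for asynchronous I/O.
    // Returns whether the stream reports itself as asynchronous.
    public static async Task<bool> WriteAsync(string path, byte[] data)
    {
        // FileOptions.Asynchronous asks for an overlapped handle on Windows.
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                           FileShare.None, bufferSize: 4096,
                                           options: FileOptions.Asynchronous))
        {
            // The data is larger than the stream's buffer, so the write
            // is handed to the OS rather than merely copied into memory.
            await stream.WriteAsync(data, 0, data.Length);
            return stream.IsAsync;
        }
    }

    static async Task Main()
    {
        // Hypothetical file name, used only for this sketch.
        string path = Path.Combine(Path.GetTempPath(), "no-thread-sketch.bin");
        bool isAsync = await WriteAsync(path, new byte[64 * 1024]);
        Console.WriteLine($"Opened for async I/O: {isAsync}");
        Console.WriteLine($"Bytes on disk: {new FileInfo(path).Length}");
        File.Delete(path);
    }
}
```

Without FileOptions.Asynchronous, the same WriteAsync call still compiles and runs, but the stream may fall back to doing synchronous I/O on a thread pool thread, which is not the pure case this article describes.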

The OS then turns to the device driver and asks it to begin the write operation. It does so by first constructing an object that represents the write request; this is called an I/O Request Packet (IRP).

The device driver receives the IRP and issues a command to the device to write out the data. If the device supports Direct Memory Access (DMA), this can be as simple as writing the buffer address to a device register. That's all the device driver can do; it marks the IRP as "pending" and returns to the OS.

The core of the truth is found here: the device driver is not allowed to block while processing an IRP. This means that if the IRP cannot be completed immediately, then it must be processed asynchronously. This is true even for synchronous APIs! At the device driver level, all (non-trivial) requests are asynchronous.

To quote the Tomes of Knowledge, "Regardless of the type of I/O request, internally I/O operations issued to a driver on behalf of the application are performed asynchronously".

With the IRP "pending", the OS returns to the library, which returns an incomplete task to the button click event handler, which suspends the async method, and the UI thread continues executing.
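
The return trip up the stack can be sketched without any device at all. In the sketch below, a TaskCompletionSource is an assumption standing in for the pending IRP: a task that nobody has completed yet. The names are hypothetical. It shows the async method handing an incomplete task back to its caller, and the caller carrying on.

```csharp
using System;
using System.Threading.Tasks;

static class PendingTaskSketch
{
    // Hypothetical stand-in for a library write method. It awaits a task
    // that represents the device's eventual completion.
    public static async Task<int> WriteAsync(Task<int> deviceCompletion)
    {
        // If deviceCompletion is not complete, the method is suspended
        // here and control returns to the caller with an incomplete task.
        int bytesWritten = await deviceCompletion;
        return bytesWritten;
    }

    static void Main()
    {
        // Stands in for the IRP marked "pending".
        var pendingRequest = new TaskCompletionSource<int>();

        Task<int> write = WriteAsync(pendingRequest.Task);

        // We are already back in the caller; nothing is blocked.
        Console.WriteLine($"Completed right after the call: {write.IsCompleted}"); // False

        // Later, "the device finishes" and the request is completed.
        pendingRequest.SetResult(1024);
        Console.WriteLine($"Bytes written: {write.Result}"); // 1024
    }
}
```

Between the call and SetResult, the only thing that exists is the task object and the suspended method's state. No thread is parked anywhere on its behalf.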

We have followed the request down into the abyss of the system, right out to the physical device.

The write operation is now "in flight". How many threads are processing it?

None.

There is no device driver thread, OS thread, BCL thread, or thread pool thread that is processing that write operation. There is no thread.
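
The skeptical may prefer to count for themselves. The sketch below is a rough experiment, not a proof: it starts many operations at once and compares the process's thread count before and while they are in flight. It uses Task.Delay, which is timer-based rather than device I/O, as an assumed stand-in for a pure asynchronous operation. The class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

static class InFlightSketch
{
    // Number of OS threads currently in this process.
    static int ProcessThreadCount()
    {
        using (var process = Process.GetCurrentProcess())
            return process.Threads.Count;
    }

    // Starts `operations` delays at once and reports the thread count
    // before they started and while they were all in flight.
    public static async Task<(int Before, int During, TimeSpan Elapsed)> RunAsync(
        int operations, int delayMs)
    {
        int before = ProcessThreadCount();
        var stopwatch = Stopwatch.StartNew();

        Task[] inFlight = Enumerable.Range(0, operations)
                                    .Select(_ => Task.Delay(delayMs))
                                    .ToArray();

        int during = ProcessThreadCount();
        await Task.WhenAll(inFlight);
        return (before, during, stopwatch.Elapsed);
    }

    static async Task Main()
    {
        var result = await RunAsync(operations: 1000, delayMs: 500);
        Console.WriteLine($"Threads before: {result.Before}");
        Console.WriteLine($"Threads with 1000 operations in flight: {result.During}");
        Console.WriteLine($"All completed after {result.Elapsed.TotalMilliseconds:F0} ms");
    }
}
```

If each operation needed a thread of its own to wait on it, the second count would be roughly a thousand higher than the first. The runtime may start a helper thread or two along the way, so expect the counts to be close rather than identical.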

Now, let us follow the response from the land of kernel daemons back to the world of mortals.

Some time after the write request started, the device finishes writing. It notifies the CPU via an interrupt.

The device driver's Interrupt Service Routine (ISR) responds to the interrupt. An interrupt is a CPU-level event, temporarily seizing control of the CPU away from whatever thread was running. You could think of an ISR as "borrowing" the currently-running thread, but I prefer to think of ISRs as executing at such a low level that the concept of "thread" doesn't exist - so they come in "beneath" all threads, so to speak.

Anyway, the ISR is properly written, so all it does is tell the device "thank you for the interrupt" and queue a Deferred Procedure Call (DPC).

When the CPU is done being bothered by interrupts, it will get around to its DPCs. DPCs also execute at a level so low that to speak of "threads" is not quite right; like ISRs, DPCs execute directly on the CPU, "beneath" the threading system.

The DPC takes the IRP representing the write request and marks it as "complete". However, that "completion" status only exists at the OS level; the process has its own memory space that must be notified. So the OS queues a special-kernel-mode Asynchronous Procedure Call (APC) to the thread owning the HANDLE.

Since the library/BCL is using the standard P/Invoke overlapped I/O system, it has already registered the handle with the I/O Completion Port (IOCP), which is part of the thread pool. So an I/O thread pool thread is borrowed briefly to execute the APC, which notifies the task that it's complete.

The task has captured the UI context, so it does not resume the async method directly on the thread pool thread. Instead, it queues the continuation of that method onto the UI context, and the UI thread will resume executing that method when it gets around to it.
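
This last hop can be sketched in a console program. SingleThreadContext below is a hypothetical stand-in for a UI context: a queue of callbacks pumped by one thread. A TaskCompletionSource is completed from a thread pool thread, playing the part of the I/O thread, and the await resumes on the pumping thread because the context was captured. A real UI framework supplies its own context; this is only an illustration of the mechanism.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a UI context: one thread pumps a queue.
sealed class SingleThreadContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object State)> _queue =
        new BlockingCollection<(SendOrPostCallback Callback, object State)>();

    // Continuations of awaits that captured this context arrive here.
    public override void Post(SendOrPostCallback d, object state) => _queue.Add((d, state));

    // Pumps queued callbacks on the calling thread until `task` completes.
    public void RunUntil(Task task)
    {
        task.ContinueWith(_ => _queue.CompleteAdding(), TaskScheduler.Default);
        foreach (var (callback, state) in _queue.GetConsumingEnumerable())
            callback(state);
    }
}

static class ContextCaptureSketch
{
    public static (int UiThread, int CompletionThread, int ResumedThread) Run()
    {
        var context = new SingleThreadContext();
        SynchronizationContext.SetSynchronizationContext(context);

        int uiThread = Environment.CurrentManagedThreadId;
        int completionThread = -1, resumedThread = -1;
        var pendingRequest = new TaskCompletionSource<bool>();

        async Task HandlerAsync()
        {
            await pendingRequest.Task; // captures `context`
            resumedThread = Environment.CurrentManagedThreadId;
        }

        Task handler = HandlerAsync();

        // A thread pool thread plays the I/O thread that completes the task.
        Task.Run(() =>
        {
            completionThread = Environment.CurrentManagedThreadId;
            pendingRequest.SetResult(true);
        });

        context.RunUntil(handler); // the "UI message loop"
        SynchronizationContext.SetSynchronizationContext(null);
        return (uiThread, completionThread, resumedThread);
    }

    static void Main()
    {
        var ids = Run();
        Console.WriteLine($"UI thread: {ids.UiThread}");
        Console.WriteLine($"Task completed on thread: {ids.CompletionThread}");
        Console.WriteLine($"Method resumed on thread: {ids.ResumedThread}");
    }
}
```

The thread that completes the task and the thread that resumes the method are different: the completing thread only queues the continuation and moves on.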

So, we see that there was no thread while the request was in flight. When the request completed, various threads were "borrowed" or had work briefly queued to them. This work is usually on the order of a millisecond or so (e.g., the APC running on the thread pool) down to a microsecond or so (e.g., the ISR). But there is no thread that was blocked, just waiting for that request to complete.

Now, the path that we followed was the "standard" path, somewhat simplified. There are countless variations, but the core truth remains the same.

The idea that "there must be a thread somewhere processing the asynchronous operation" is not the truth.

Free your mind. Do not try to find this "async thread" — that's impossible. Instead, only try to realize the truth:

There is no thread.

10 comments:

  1. Awesome post! Thanks for explaining it so clearly!

  2. Amazing explanation of such a difficult and complicated concept! Thank you!

  3. Very enlightening! Thank you Stephen.

  4. A must-read for any asynchronous code developer. I keep linking to it on SO. Thanks Stephen!

  5. Great article, though I am compelled to Well Actually you on one point:

    "DPCs also execute at a level so low that to speak of "threads" is not quite right; like ISRs, DPCs execute directly on the CPU, "beneath" the threading system."

    This isn't strictly true. While ISRs don't particularly have a concept of thread (a kernel developer would call it "Arbitrary context"), DPCs specifically *do* execute in the context of a thread. Now, *which* thread, is the question! They are run before the next scheduled thread has a chance to do anything - they are part of the scheduler. Since kernel memory is mapped into every process, what they need is always available.

    That word up there, "context" is super important - a thread is, at its simplest, a context in which to run code. Async operations *jump contexts* as they are being processed, they're not tied to a specific context (i.e. a thread).

    It's better to think of an async operation as a 'message' that gets passed around, and that message changes forms several times (i.e. from a system call => IRP => APC => UI Dispatcher Queue item). Every time this message is processed, it ends up changing forms.

    Replies
    1. Thanks for your comment; I love the "message" description!

      But I'm not sure I agree with you re DPCs (not that it really matters; the details of semantics aren't that important in this case). My reasoning is:
      - A thread does not have a DPC queue, but CPUs do. A DPC can be scheduled to a specific CPU, but not a specific thread.
      - DPCs are executed (at IRQL DISPATCH_LEVEL) when the CPU IRQL is transitioning from a higher level to DISPATCH_LEVEL or lower (e.g., PASSIVE_LEVEL). So they execute before normal thread code can resume.
      - While a DPC does execute with a thread context in the narrowest sense (CONTEXT), it does not execute with a valid thread context in the broader sense (able to use the security context of the current thread). So, while they may execute in a thread context, they must be written to run in an arbitrary thread context.
      - A DPC is still very constrained in the code it can run as compared to code running as a part of a thread. In particular, no page faults are allowed.
      - Interrupts are masked while a DPC is running.

      Also, these quotes from MS:
      - "The system schedules all threads to run at IRQLs below DISPATCH_LEVEL." ("Scheduling, Thread Context, and IRQL", http://msdn.microsoft.com/en-us/windows/hardware/gg487402.aspx)
      - "IRQLs at or above DISPATCH_LEVEL are processor specific... IRQLs below DISPATCH_LEVEL are thread specific." (ibid)
      - "Code that is running at PASSIVE_LEVEL is considered to be working on behalf of the current thread." (ibid)
      - "DPCs ... are always called ... in an arbitrary thread context." (ibid)
      - "Before a processor returns to processing threads, it executes all of the DPCs in its queue." ("CPU Analysis", http://msdn.microsoft.com/en-us/library/windows/hardware/jj679884.aspx)

      I came across these quotes while trying to find out whether a DPC actually counts against the current thread's quantum. I was unable to find a definitive answer. :(

  6. Hi Stephen,

    Awesome post. I still have one doubt:

    "Since the library/BCL is using the standard P/Invoke overlapped I/O system, it has already registered the handle with the I/O Completion Port (IOCP), which is part of the thread pool. So an I/O thread pool thread is borrowed briefly to execute the APC, which notifies the task that it's complete."

    What do you mean by an I/O thread pool thread? Do you mean one of the threads that are registered in the I/O completion port?

    Replies
    1. Yes, it's one of the I/O threads in the thread pool. The ThreadPool keeps a number of threads registered in its IOCP; these are different than the worker threads that most people associate with the ThreadPool. The ThreadPool manages both worker threads and I/O threads.

    2. Should the article's title instead be: "There is no (worker) thread." ;) But GREAT Article. LOVE IT.
