Introduction to Dataflow, Part 2
Last time, we learned some basic concepts in the TPL Dataflow library. Today, let’s look at some blocks in more detail.
BufferBlock: A Queue
One of the simplest blocks is just a basic FIFO buffer, BufferBlock<T>. The data that comes in is the data that goes out.
With a block this simple, you might wonder why you would even need it. BufferBlock is useful on its own as an async-compatible queue. It’s also useful in a dataflow mesh when combined with different options (such as throttling) that we’ll cover in next week’s post.
And, of course, it’s a great block to start playing with when you’re learning TPL Dataflow.
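To make the "async-compatible queue" idea concrete, here is a minimal sketch. It assumes C# top-level statements (C# 9 or later) and that the System.Threading.Tasks.Dataflow library is available to the project (on .NET Framework that means adding the NuGet package). The variable names are just for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// A BufferBlock<int> used as a simple async-compatible FIFO queue.
var queue = new BufferBlock<int>();

// Producer: SendAsync asynchronously waits if the block cannot accept the item yet.
for (int i = 1; i <= 3; i++)
    await queue.SendAsync(i);
queue.Complete(); // Signal that no more items will arrive.

// Consumer: read items in the order they were sent until the block is drained.
var received = new List<int>();
while (await queue.OutputAvailableAsync())
    received.Add(await queue.ReceiveAsync());

Console.WriteLine(string.Join(", ", received)); // 1, 2, 3
```

With a single consumer like this one, checking OutputAvailableAsync before each ReceiveAsync is a simple way to stop cleanly once the block has been completed and emptied.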
ActionBlock: Foreach
Possibly even simpler than BufferBlock, ActionBlock<T> is just an input buffer combined with a processing task, which executes a delegate for each input item. Conceptually, it’s like running a “foreach” loop over the data passing through the block.
A very useful feature of ActionBlock is that its delegate may be async. By default, the ActionBlock will run the delegate to completion for one data item at a time. (We’ll take a look next week at how to change these defaults.)
ActionBlock does not provide any output data items. Its input items are “pushed” to its own delegate, not to another block. As such, it represents the end of a dataflow mesh (unless your delegate posts or sends the data to another block, but that would be unusual).
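Here is a small sketch of an ActionBlock with an async delegate, under the same assumptions as any standalone snippet (C# top-level statements, the Dataflow library referenced). The Task.Delay call is only a stand-in for real asynchronous work, and the names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var processed = new List<string>();

// The delegate may be async. With the default options it runs to completion
// for one item at a time, so the plain List<string> is not accessed concurrently.
var printer = new ActionBlock<string>(async item =>
{
    await Task.Delay(10); // Stand-in for real asynchronous work.
    processed.Add(item);
    Console.WriteLine($"Processed {item}");
});

printer.Post("a");
printer.Post("b");
printer.Post("c");

printer.Complete();       // No more input will be posted.
await printer.Completion; // Wait until every queued item has been handled.
```

Because the block has no output, awaiting its Completion task is the usual way to know that the "foreach" has finished.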
NullTarget: /dev/null
OK, NullTarget<T> (which you get by calling DataflowBlock.NullTarget<T>()) has got to be the simplest block. It just accepts all data items and ignores them.
So why would you use it? Imagine you have a BufferBlock linked to an ActionBlock, but you applied a filter when you called LinkTo. If a data item came along not matching the filter, the ActionBlock would refuse to take it, but then the BufferBlock would hold onto it. The data item would stick there, gumming up the whole system. An easy way to fix this is to link the BufferBlock to a second block (NullTarget), which would get any leftover data items (the ones rejected by the ActionBlock), and ignore them.
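The scenario above might look something like the following sketch. The "even numbers only" filter is made up for the example, and I'm also setting PropagateCompletion on the first link (my choice for this sketch) so that completing the buffer eventually completes the ActionBlock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var evens = new List<int>();

var buffer = new BufferBlock<int>();
var evenHandler = new ActionBlock<int>(n => evens.Add(n));

var propagate = new DataflowLinkOptions { PropagateCompletion = true };

// First link: only items matching the filter (even numbers) go to the ActionBlock.
buffer.LinkTo(evenHandler, propagate, n => n % 2 == 0);

// Second link: anything the filter rejected falls through to NullTarget and is discarded.
buffer.LinkTo(DataflowBlock.NullTarget<int>());

for (int i = 1; i <= 6; i++)
    buffer.Post(i);
buffer.Complete();

await evenHandler.Completion;
Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6
```

If you remove the NullTarget link, the very first item (1) is rejected by the filter and stays at the front of the buffer, so nothing behind it can move and the await never finishes.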
TransformBlock: Select
TransformBlock<TInput, TOutput> is like a LINQ Select method: conceptually, it is a one-to-one mapping for data items.
You define the mapping function yourself in a delegate. Like ActionBlock, this delegate may be async. Also like ActionBlock, TransformBlock will only process one item at a time by default.
Unlike ActionBlock, TransformBlock does provide an output. So it actually has two buffers (data that has not been processed, and data that has been processed) and two tasks (one to process the data, and one to push data to the next block).
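As a rough illustration of the Select analogy, this sketch maps each int to a string with an async delegate and then reads the results straight from the block's output buffer. The mapping itself and the names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// One-to-one mapping, like LINQ's Select: each int becomes exactly one string.
var toText = new TransformBlock<int, string>(async n =>
{
    await Task.Delay(10); // Stand-in for real asynchronous work.
    return $"Item {n}";
});

for (int i = 1; i <= 3; i++)
    toText.Post(i);
toText.Complete();

// Read from the output buffer until the block has nothing left to produce.
var results = new List<string>();
while (await toText.OutputAvailableAsync())
    results.Add(await toText.ReceiveAsync());

Console.WriteLine(string.Join(", ", results)); // Item 1, Item 2, Item 3
```

In a real mesh you would more likely link the TransformBlock to another block with LinkTo; receiving directly just keeps the sketch short.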
TransformManyBlock: SelectMany
TransformManyBlock<TInput, TOutput> is very similar to TransformBlock, except it’s a one-to-n mapping for data items. So it’s like LINQ’s SelectMany, where a single input item may result in zero, one, or any number of output items. The results of this mapping are “flattened”, just like LINQ’s SelectMany.
Again, you define the mapping function in a delegate, which may be async. And TransformManyBlock also processes only one input item at a time by default.
TransformManyBlock has a similar internal structure to TransformBlock: two buffers and two tasks. The only real difference between the two is that the mapping delegate returns a collection of items, which are inserted individually into the output buffer.
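The flattening behavior is easiest to see with a small example. This sketch splits sentences into words, so one input can produce several outputs or none at all. The word-splitting delegate is just an illustration I picked, not anything specific to the library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// One-to-n mapping, like LINQ's SelectMany: each sentence yields zero or more words.
var splitter = new TransformManyBlock<string, string>(
    sentence => sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

splitter.Post("hello dataflow world"); // Three output items.
splitter.Post("");                     // Zero output items.
splitter.Post("bye");                  // One output item.
splitter.Complete();

// The output buffer holds individual words, not arrays of words.
var words = new List<string>();
while (await splitter.OutputAvailableAsync())
    words.Add(await splitter.ReceiveAsync());

Console.WriteLine(string.Join(", ", words)); // hello, dataflow, world, bye
```

Note that the consumer never sees the arrays returned by the delegate; each word arrives as its own data item.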
Advanced Block Types
The blocks described above are a good starting point for playing around with TPL Dataflow, but the library offers much more (which I won’t be covering in these intro posts):
WriteOnceBlock<T> - Memorizes its first data item and passes out copies of it as its output. Ignores all other data items.
BatchBlock<T> - Groups a certain number of sequential data items into collections of data items.
BroadcastBlock<T> - Passes out copies of data items as its output. This block is just like BufferBlock except that a BufferBlock will only send a particular data item to a single block; BroadcastBlock will copy the item and send the copies to every block that it’s linked to.
JoinBlock<T1, T2> and JoinBlock<T1, T2, T3> - Collects two or three inputs and combines them into a Tuple.
BatchedJoinBlock<T1, T2> and BatchedJoinBlock<T1, T2, T3> - Collects a certain number of total items from two or three inputs and groups them into a Tuple of collections of data items.
Please read the official “Introduction to TPL Dataflow” document for more details on these block types; that document covers information like the option for greedy behavior, which is important for some batching and joining scenarios. Finally, if you’re using the advanced blocks, I also recommend hanging out on the TPL Dataflow forum.
Update (2014-12-01): For more details, see Chapter 4 in my Concurrency Cookbook.