Last time, we learned some basic concepts in the TPL Dataflow library. Today, let’s look at some blocks in more detail.
BufferBlock: A Queue
One of the simplest blocks is just a basic FIFO buffer, BufferBlock<T>. The data that comes in is the data that goes out.
With a block this simple, you might wonder why you would even need it.
BufferBlock is useful on its own as an async-compatible queue. It's also useful in a dataflow mesh when combined with different options (such as throttling) that we'll cover in next week's post.
And, of course, it’s a great block to start playing with when you’re learning TPL Dataflow.
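As a rough illustration of BufferBlock used on its own as an async-compatible queue, here is a minimal C# sketch. It assumes .NET 6 or later (for top-level statements) and a project that references System.Threading.Tasks.Dataflow; the variable names are only illustrative. One task produces items with SendAsync and then calls Complete, while the consumer drains the block with OutputAvailableAsync and ReceiveAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// A BufferBlock used on its own as an async-compatible FIFO queue.
var queue = new BufferBlock<int>();

// Producer: SendAsync completes once the block has accepted the item.
var producer = Task.Run(async () =>
{
    for (int i = 0; i < 5; i++)
        await queue.SendAsync(i);
    queue.Complete(); // signal that no more items are coming
});

// Consumer: OutputAvailableAsync returns false once the block is
// completed and empty; items come out in the order they went in.
var received = new List<int>();
while (await queue.OutputAvailableAsync())
    received.Add(await queue.ReceiveAsync());

await producer; // surface any exception from the producer
Console.WriteLine(string.Join(", ", received)); // 0, 1, 2, 3, 4
```

Because there is only one consumer here, checking OutputAvailableAsync before calling ReceiveAsync is safe; with several competing consumers, another one could take the item in between.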
Possibly even simpler than BufferBlock is ActionBlock<T>, which is just an input buffer combined with a processing task that executes a delegate for each input item. Conceptually, it's like running a "foreach" loop over the data passing through the block.
A very useful feature of ActionBlock is that its delegate may be async. By default, ActionBlock runs the delegate to completion for one data item at a time before starting on the next. (We'll look at how to change these defaults next week.)
ActionBlock does not provide any output data items. They are “pushed” to its own delegate, not to another block. As such, it represents the end of a dataflow mesh (unless your delegate posts or sends the data to another block, but that would be unusual).
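The following sketch, under the same assumptions (.NET 6 or later, System.Threading.Tasks.Dataflow referenced), shows an ActionBlock with an async delegate. Task.Delay is only a placeholder for real asynchronous work. With the default options, each item's delegate runs to completion before the next item starts, so the items are handled in the order they were posted. Calling Complete and then awaiting Completion is one common way to wait for the block to drain.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var processed = new List<string>();

// The delegate is async; by default the block finishes it for one item
// before starting the next, so only one delegate runs at a time.
var printer = new ActionBlock<string>(async item =>
{
    await Task.Delay(10); // placeholder for real asynchronous work
    processed.Add(item);
    Console.WriteLine($"Processed {item}");
});

printer.Post("a");
printer.Post("b");
printer.Post("c");

// Tell the block no more data is coming, then wait for it to drain.
printer.Complete();
await printer.Completion;
```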
NullTarget<T> has got to be the simplest block. It just accepts all data items and ignores them.
So why would you use it? Imagine you have a BufferBlock linked to an ActionBlock, but you applied a filter when you called LinkTo. If a data item came along that didn't match the filter, the ActionBlock would refuse to take it, but the BufferBlock would hold onto it. The data item would stick there, gumming up the whole system: because a BufferBlock offers its items in order, every later item would be stuck behind it. An easy way to fix this is to link the BufferBlock to a second block (NullTarget) after the filtered link; it gets any leftover data items (the ones rejected by the ActionBlock) and ignores them.
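Here is a minimal sketch of that scenario, again assuming .NET 6 or later and System.Threading.Tasks.Dataflow. The library exposes the null target through the static DataflowBlock.NullTarget<T>() method rather than as a class you construct. The filtered link is created first, so the odd numbers it rejects fall through to the null target.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var buffer = new BufferBlock<int>();
var evens = new List<int>();
var evenCollector = new ActionBlock<int>(n => evens.Add(n));

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

// Only even numbers pass the filter on this link.
buffer.LinkTo(evenCollector, linkOptions, n => n % 2 == 0);

// Links are offered items in the order they were created, so anything the
// filtered link rejects (the odd numbers) goes here and is discarded.
buffer.LinkTo(DataflowBlock.NullTarget<int>());

for (int i = 1; i <= 6; i++)
    buffer.Post(i);

buffer.Complete();
await evenCollector.Completion;
Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6
```

If you remove the NullTarget link, the number 1 stays at the head of the BufferBlock, so the buffer never empties, completion is never propagated, and the final await never finishes.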
TransformBlock<TInput, TOutput> is like a LINQ Select method: conceptually, it is a one-to-one mapping for data items.
You define the mapping function yourself in a delegate. Like ActionBlock, this delegate may be async. Also like ActionBlock, TransformBlock will only process one item at a time by default.
TransformBlock does provide an output. So it actually has two buffers (data that has not been processed, and data that has been processed) and two tasks (one to process the data, and one to push data to the next block).
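A minimal sketch of a TransformBlock feeding an ActionBlock, under the same assumptions (.NET 6 or later, System.Threading.Tasks.Dataflow referenced). The async mapping delegate, with Task.Yield standing in for real asynchronous work, turns each int into exactly one string. PropagateCompletion on the link lets completion flow from the TransformBlock to the ActionBlock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// One input produces exactly one output, like LINQ's Select.
var square = new TransformBlock<int, string>(async n =>
{
    await Task.Yield(); // stand-in for real asynchronous work
    return $"{n} squared is {n * n}";
});

var results = new List<string>();
var collect = new ActionBlock<string>(s => results.Add(s));

// The TransformBlock's output buffer pushes results to the linked block.
square.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var value in new[] { 1, 2, 3 })
    square.Post(value);

square.Complete();
await collect.Completion;
results.ForEach(Console.WriteLine);
```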
TransformManyBlock<TInput, TOutput> is very similar to TransformBlock, except it's a one-to-n mapping for data items. So it's like LINQ's SelectMany, where a single input item may result in zero, one, or any number of output items. The results of this mapping are "flattened", just like LINQ's SelectMany.
Again, you define the mapping function in a delegate, which may be async.
TransformManyBlock also processes only one input item at a time by default.
TransformManyBlock has a similar internal structure to TransformBlock: two buffers and two tasks. The only real difference between the two is that the mapping delegate returns a collection of items, which are inserted individually into the output buffer.
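The sketch below, under the same assumptions, uses a TransformManyBlock to split lines of text into words. Each input string produces zero or more output strings, which reach the next block one at a time. The delegate here is synchronous; a delegate returning Task<IEnumerable<TOutput>> is also accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// One input produces zero or more outputs, like LINQ's SelectMany.
var splitWords = new TransformManyBlock<string, string>(
    line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

var words = new List<string>();
var collect = new ActionBlock<string>(w => words.Add(w));
splitWords.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

splitWords.Post("hello dataflow");
splitWords.Post("");              // produces zero outputs
splitWords.Post("one two three");

splitWords.Complete();
await collect.Completion;
Console.WriteLine(string.Join(" | ", words)); // hello | dataflow | one | two | three
```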
Advanced Block Types
The blocks described above are a good starting point for playing around with TPL Dataflow, but the library offers much more (which I won’t be covering in these intro posts):
WriteOnceBlock<T> - Memorizes its first data item and passes out copies of it as its output. Ignores all other data items.
BatchBlock<T> - Groups a certain number of sequential data items into collections of data items.
BroadcastBlock<T> - Passes out copies of data items as its output. This block is just like BufferBlock, except that a BufferBlock will only send a particular data item to a single block, while a BroadcastBlock will copy the item and send the copies to every block that it's linked to.
JoinBlock<T1, T2, T3> - Collects two or three inputs and combines them into a Tuple.
BatchedJoinBlock<T1, T2, T3> - Collects a certain number of total items from two or three inputs and groups them into a Tuple of collections of data items.
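To make the BufferBlock/BroadcastBlock difference concrete, here is a minimal sketch (same assumptions: .NET 6 or later, System.Threading.Tasks.Dataflow referenced) that links each kind of block to two ActionBlocks. RunAsync is a hypothetical helper written just for this comparison, and passing an identity cloning function to BroadcastBlock is a choice made for int values, which are copied by value anyway.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: links the source to two collectors, posts 1..3,
// and returns what each collector received.
async Task<(List<int> First, List<int> Second)> RunAsync(IPropagatorBlock<int, int> source)
{
    var first = new List<int>();
    var second = new List<int>();
    var a = new ActionBlock<int>(n => first.Add(n));
    var b = new ActionBlock<int>(n => second.Add(n));
    var options = new DataflowLinkOptions { PropagateCompletion = true };
    source.LinkTo(a, options);
    source.LinkTo(b, options);

    for (int i = 1; i <= 3; i++)
        source.Post(i);
    source.Complete();
    await Task.WhenAll(a.Completion, b.Completion);
    return (first, second);
}

// BufferBlock: each item goes to a single target.
var fromBuffer = await RunAsync(new BufferBlock<int>());

// BroadcastBlock: every linked target is offered a copy of each item.
var fromBroadcast = await RunAsync(new BroadcastBlock<int>(n => n));

Console.WriteLine($"BufferBlock:    [{string.Join(", ", fromBuffer.First)}] / [{string.Join(", ", fromBuffer.Second)}]");
Console.WriteLine($"BroadcastBlock: [{string.Join(", ", fromBroadcast.First)}] / [{string.Join(", ", fromBroadcast.Second)}]");
```

With the BufferBlock, the first linked ActionBlock accepts every item, so the second one receives nothing; with the BroadcastBlock, both ActionBlocks receive 1, 2, and 3.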
Please read the official "Introduction to TPL Dataflow" document for more details on these block types; it covers topics such as the option for greedy behavior, which is important for some batching and joining scenarios. Finally, if you're using the advanced blocks, I also recommend hanging out on the TPL Dataflow forum.
Update (2014-12-01): For more details, see Chapter 4 in my Concurrency Cookbook.