2009-04-30

Message Framing

(This post is part of the TCP/IP .NET Sockets FAQ)

The Problem

One of the most common beginner mistakes for people designing protocols for TCP/IP is that they assume that message boundaries are preserved. For example, they assume a single "Send" will result in a single "Receive".

Some TCP/IP documentation is partially to blame. Many people read about how TCP/IP preserves packets - splitting them up when necessary and re-ordering and re-assembling them on the receiving side. This is perfectly true; however, a single "Send" does not send a single packet.

Local machine (loopback) testing reinforces this misunderstanding, because when client and server are on the same machine they usually communicate quickly enough that single "sends" do in fact correspond to single "receives". Unfortunately, this is only a coincidence.

This problem usually manifests itself when attempting to deploy a solution to the Internet (increasing latency between client and server) or when trying to send larger amounts of data (requiring fragmentation). Unfortunately, at this point, the project is usually in its final stages, and sometimes the application protocol has even been published!

True story: I once worked for a company that developed custom client/server software. The original communications code had made this common mistake. However, the installations were all on dedicated networks with high-end hardware, so the underlying problem only happened very rarely. When it did, the operators would just chalk it up to "that buggy Windows OS" or "another network glitch" and reboot. One of my tasks at this company was to change the communication to include a lot more information; of course, this caused the problem to manifest regularly, and the entire application protocol had to be changed to fix it. The truly amazing thing is that this software had been used in countless 24x7 automation systems for 20 years; it was fundamentally broken and no one noticed.

The Solution, Part 1 - Understanding

First, one must understand the abstraction of TCP/IP. From the application's perspective, TCP operates on streams of data, never packets. Repeat this mantra three times: "TCP does not operate on packets of data. TCP operates on streams of data."

There is no way to send a packet of data over TCP; that function call does not exist. Rather, there are two streams in a TCP connection: an incoming stream and an outgoing stream. One may read from the incoming stream by calling a "receive" method, and one may write to the outgoing stream by calling a "send" method. If one side calls "send" to send 5 bytes, and then calls "send" to send 5 more bytes, then there are 10 bytes that are placed in the outgoing stream. The receiving side may decide to read them one at a time from its receiving stream if it so wishes (calling "receive" 10 times), or it may wait for all 10 bytes to arrive and then read them all at once with a single call to "receive".
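
The 5-plus-5-byte scenario above can be sketched as follows. This is an illustration only, written in Python for brevity rather than C#; the `sendall` and `recv` calls play the role of Send and Receive on the .NET Socket class. A local socket pair stands in for a real TCP connection, and the 3-byte read size is an arbitrary choice.

```python
import socket

# socketpair() returns two connected stream sockets; here it stands in
# for a TCP connection between a client and a server.
sender, receiver = socket.socketpair()

# Two separate "send" calls of 5 bytes each...
sender.sendall(b"hello")
sender.sendall(b"world")
sender.close()  # closing lets the reader see end-of-stream

# ...but the receiver just sees a stream of 10 bytes. It may read them in
# chunks of any size; nothing marks where one send ended and the next began.
chunks = []
while True:
    chunk = receiver.recv(3)  # ask for at most 3 bytes
    if not chunk:             # empty result: the stream has ended
        break
    chunks.append(chunk)
receiver.close()

stream = b"".join(chunks)
print(stream)  # the 10 bytes in order, with no message boundaries
```

How the 10 bytes are split across the individual reads is up to the operating system and the network; only their order is guaranteed.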

Sending data to the TCP stream is rather easy; all one has to do is call "send", and the appropriate bytes are queued to the outgoing stream. Receiving data from the TCP stream is a bit more tricky, because the "receive N bytes" operation will wait until at least one byte and at most N bytes arrive on the incoming stream before it returns. Note that the "receive N bytes" operation will complete even if it doesn't read all N bytes, giving the application a chance to act on partial data while the rest of the data bytes are in transit. In the real world, very few programs can process partial receives; almost all programs need a buffer to store partial receives until they have enough data to do meaningful work.
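
One way to picture the "buffer for partial receives" is a small helper that keeps reading until a known number of bytes has arrived. The sketch below is in Python rather than C#, and `recv_exact` is a hypothetical helper name, not part of any library; the same loop can be written around Socket.Receive.

```python
import socket


def recv_exact(sock, count):
    """Keep calling recv until exactly `count` bytes have been collected."""
    buffer = bytearray()
    while len(buffer) < count:
        # recv returns at least 1 and at most the requested number of bytes,
        # so a single call may deliver only part of what we need.
        chunk = sock.recv(count - len(buffer))
        if not chunk:
            raise ConnectionError("stream closed before all bytes arrived")
        buffer.extend(chunk)
    return bytes(buffer)


# Usage: two sends on one side, one logical 8-byte read on the other.
left, right = socket.socketpair()
left.sendall(b"abc")
left.sendall(b"defgh")
data = recv_exact(right, 8)
left.close()
right.close()
print(data)
```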

To repeat: TCP operates on streams, not on packets. However, most application protocols are based on the idea of "messages"; for example, a client may send a "Lookup X" message to the server, and the server will respond with an "X Data" or "X Not Found" message. Since TCP operates on streams, one must design a "message framing" protocol that will wrap the messages sent back and forth.

The Solution, Part 2 - Design

There are two approaches commonly used for message framing: length prefixing and delimiters.

Length prefixing prepends each message with the length of that message. The format (and length) of the length prefix must be explicitly stated; "4-byte signed little-endian" (i.e., "int" in C#) is a common choice. To send a message, the sending side first converts the message to a byte array and then sends the length of the byte array followed by the byte array itself.
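
As a rough sketch of the sending side (in Python rather than C#; `frame_message` is a hypothetical name), the "4-byte signed little-endian" prefix corresponds to the `"<i"` format of Python's `struct` module:

```python
import struct


def frame_message(payload):
    """Prefix payload with its length as a 4-byte signed little-endian int."""
    return struct.pack("<i", len(payload)) + payload


# Convert the message to bytes first, then frame it.
framed = frame_message("Lookup X".encode("utf-8"))
print(framed)
```

The framed bytes can then be written to the socket with a single send; the receiver does not care how many sends were used.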

Receiving a length-prefixed message is harder, because of the possibility of partial receives. First, one must read the length of the message into a buffer until the buffer is full (e.g., if using "4-byte signed little-endian", this buffer is 4 bytes). Then one allocates a second buffer and reads the data into that buffer. When the second buffer is full, then a single message has arrived, and one goes back to reading the length of the next message.
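
The two-buffer procedure described above might look like the following sketch. It is written in Python rather than C#, the class and method names (`LengthPrefixDecoder`, `feed`) are illustrative assumptions, and it deliberately omits the maximum-size check discussed in the security note.

```python
import struct


class LengthPrefixDecoder:
    """Reassembles length-prefixed messages from arbitrary chunks of a stream."""

    PREFIX_SIZE = 4  # "4-byte signed little-endian"

    def __init__(self):
        self._prefix = bytearray()  # first buffer: the length prefix
        self._body = None           # second buffer: allocated once the length is known
        self._expected = 0

    def feed(self, data):
        """Consume one received chunk; return the list of messages it completed."""
        messages = []
        pos = 0
        while pos < len(data):
            if self._body is None:
                # Still filling the length buffer.
                need = self.PREFIX_SIZE - len(self._prefix)
                self._prefix += data[pos:pos + need]
                pos += min(need, len(data) - pos)
                if len(self._prefix) < self.PREFIX_SIZE:
                    break  # prefix incomplete; wait for more data
                (self._expected,) = struct.unpack("<i", self._prefix)
                if self._expected < 0:
                    raise ValueError("negative message length")
                self._prefix.clear()
                self._body = bytearray()
            # Filling the message buffer.
            need = self._expected - len(self._body)
            self._body += data[pos:pos + need]
            pos += min(need, len(data) - pos)
            if len(self._body) == self._expected:
                messages.append(bytes(self._body))
                self._body = None  # go back to reading the next length
        return messages


# Usage: two framed messages delivered 3 bytes at a time.
wire = struct.pack("<i", 5) + b"hello" + struct.pack("<i", 3) + b"abc"
decoder = LengthPrefixDecoder()
received = []
for i in range(0, len(wire), 3):
    received.extend(decoder.feed(wire[i:i + 3]))
print(received)
```

Feeding the decoder whatever each receive happens to return keeps the framing logic independent of how the stream was chunked in transit.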

Delimiters are more complex to get right. When sending, any delimiter characters in the data must be replaced, usually with an escaping function. The receiving code cannot predict the incoming message size, so it must append all received data onto the end of a receiving buffer, growing the buffer as necessary. When a delimiter is found, the receiving side can apply an unescaping function to the receiving buffer to get the message. If the messages will never contain delimiters, then one may skip the escaping/unescaping functions.
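
A minimal sketch of delimiter framing, again in Python rather than C#. The newline delimiter, the backslash escape byte, and all names here are assumptions chosen for illustration; a real protocol would define its own.

```python
DELIMITER = b"\n"


def escape(payload):
    # Escape the escape byte first so the two replacements cannot collide.
    return payload.replace(b"\\", b"\\\\").replace(b"\n", b"\\n")


def unescape(data):
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 1] == b"\\" and i + 1 < len(data):
            nxt = data[i + 1:i + 2]
            out += b"\n" if nxt == b"n" else nxt  # "\n" -> newline, "\\" -> backslash
            i += 2
        else:
            out += data[i:i + 1]
            i += 1
    return bytes(out)


class DelimitedDecoder:
    """Appends received data to a growing buffer and splits on the delimiter."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        self._buffer += data
        messages = []
        while True:
            end = self._buffer.find(DELIMITER)
            if end < 0:
                return messages  # no complete message yet; keep the leftovers
            messages.append(unescape(bytes(self._buffer[:end])))
            del self._buffer[:end + 1]


# Usage: two messages, one containing the delimiter and one containing
# the escape byte, delivered 4 bytes at a time.
wire = escape(b"line one\nline two") + DELIMITER + escape(b"C:\\temp") + DELIMITER
decoder = DelimitedDecoder()
received = []
for i in range(0, len(wire), 4):
    received.extend(decoder.feed(wire[i:i + 4]))
print(received)
```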

A Brief Security Note

Whether using length-prefixing or delimiters, one must include code to prevent denial of service attacks. Length-prefixed readers can be given a huge message size; delimiting readers can be given a huge amount of data without delimiters. Either of these may result in an OutOfMemoryException, so one must include a maximum message size "sanity check" in the socket reading code.
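
The sanity check itself is small. In the Python sketch below, the 1 MiB cap and the function names are arbitrary, hypothetical choices; the right limit depends on the application protocol.

```python
import struct

MAX_MESSAGE_SIZE = 1024 * 1024  # hypothetical cap of 1 MiB


def parse_length_prefix(prefix):
    """Decode a 4-byte signed little-endian length and reject absurd values."""
    (length,) = struct.unpack("<i", prefix)
    if length < 0 or length > MAX_MESSAGE_SIZE:
        # Refuse before allocating anything based on the declared length.
        raise ValueError(f"refusing message of declared length {length}")
    return length


def check_delimited_buffer(buffered_bytes):
    """For delimiter framing: give up if no delimiter shows up within the cap."""
    if buffered_bytes > MAX_MESSAGE_SIZE:
        raise ValueError("no delimiter seen within the maximum message size")


print(parse_length_prefix(struct.pack("<i", 10)))
```

On failure, the usual reaction is to close the connection, since the stream can no longer be trusted to be in sync.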

The Solution, Part 3 - Code

A code sample for using length-prefixing is in its own blog post at http://blog.stephencleary.com/2009/04/sample-code-length-prefix-message.html.

Another decent code example of length prefixing is on Jon Cole's blog, although he assumes all the messages are just ASCII strings.

Yet another example of length prefixing is in the Nito.Async library: the Nito.Async.Sockets.SocketPacketProtocol class can be used to send or receive length-prefixed binary messages. It is written to use the Nito.Async socket classes, but the same code concepts translate well to the .NET Socket class.


23 comments:

  1. Hi

    What about UDP? Does it use streams like TCP? Do you have any articles or links about it?

    Thank you

  2. UDP does not use streams; it uses packets, so message framing is not always necessary. With UDP, however, you have to deal with issues such as unreliable delivery, re-ordering, and hard (possibly changing) maximum packet sizes.

    I don't have any articles on UDP, and I don't really plan on writing any. TCP is low-level enough to be confusing to many programmers.

    The books referenced in my post on TCP/IP resources (http://nitoprograms.blogspot.com/2009/05/tcpip-resources.html) do have UDP information.

  3. Again, an excellent article. I will repeat the mantra 5 more times. Thanks a lot.

  4. Thanks for this. I was already doing this in my code, but you have given me a clearer understanding of what I'm doing!

    Much easier to design/visualise now

  5. "Send" is not always a simple operation either. There's a reason why it returns the amount of data sent -- it's quite possible for the send to fragment as well (if the hardware buffer gets full, for example), necessitating a retry on the sending side (and you have to be careful to start the retry at the appropriate point in the outgoing data array). Send is only slightly simpler than receive (no need to worry about allocating/growing the buffer or dealing with situations where you read more than needed and have to hold on to the excess for the next read call).

    And on another note, length specifiers are usually pretty redundant for anything other than string and array lengths (and even then, string lengths are only really needed in binary streams; text streams typically just run a newline-delimited protocol). It's much simpler to just specify in the protocol documentation that an integer in a particular part of the data stream is 4 bytes, signed, little-endian; then the implementers can handle it however they need, and the protocol itself is significantly simplified (not to mention reducing the number of bytes needing to travel the tubes).

  6. You didn't mention resilience as a solution. Designing your serializer (or XML reader and so forth) so that bytes are 'pushed' into it, instead of it reading them, means you don't need to care about framing.

  7. My solution is to ask the receiving host to send back an integer to confirm that the message has been received. On the server side:

    for (i = 0; i < npass; i++) {
        nbytes = send(fd, buf, size_buf, 0);
        /* check nbytes here; expect nbytes == size_buf */
        nbytes = recv(fd, &idone, 4, 0);
        /* check nbytes; it should be equal to 4 */
    }

    On the client side:

    for (i = 0; i < npass; i++) {
        nbytes = recv(fd, buf, size_buf, 0);
        /* check nbytes here; expect nbytes == size_buf */
        nbytes = send(fd, &idone, 4, 0);
        /* check nbytes; it should be equal to 4 */
    }

    Juefu

  8. @jonathan: Resilience only works if you have a concept of a natural "end" (so it works fine for XML but you have to design it into other forms of serialization). I have found that code for this approach is usually more complex than basic framing. Not every platform has SAX-style XML parsers available.

  9. @Juefu: that is not a solution to the message framing problem. e.g., it is possible for "nbytes" to not be 4 where you expect it to be 4.

  10. Stephen,

    You are right. Based on my tests, the back-and-forth communication between server and client does help to stabilize the data transfer ('stabilize' meaning that no data is lost during socket streaming), but only if the data size for each send/receive call is small, for example 1024 bytes. When I tried to transfer 6000 bytes each time, I got errors. I'm not sure about the best solution for this problem.

    I wrote a class to perform data exchange between server and client. My method is similar to the prefix method since each time I send a short message about the information about a big array that I will send to the client. Then in the class I compute the number of passes I need to call 'send', etc.

    So far I am satisfied with the performance. Does anyone want to test my class? I can share it by putting it on my Google documents.

    Juefu

  11. @Juefu: Data is not lost in TCP/IP transfers; it just gets transferred in a stream rather than messages. One solution for this problem is message framing, which is what my blog post is about.

  12. Stephen,

    Thanks for the clarification. Network programming is totally new to me. I think that we are talking about different problems. It seems that the purpose of message framing is to mark the breaks between messages so that the receiver can extract useful information from the received stream. Is that right? In contrast, my problem is how to avoid data loss in TCP/IP transfer. My tests showed that the function 'send' couldn't send a big chunk of data at a time. I had to split it into pieces and make many send calls. In addition, there should be some blocking mechanism to make sure each call is completed without errors before the next call of send. My trick was to ask the receiver to send back a confirmation, which seemed to work for me. But now I am concerned since it is not guaranteed to be right, as you suggested.

    I was trying to develop a class for my parallel computing applications, which actually needs message framing. Typically in my application, the server (parent) reads data from disk and sends data and instructions to clients (children), and then collects results from the children. I don't want to use MPI since it is not robust in handling individual host failure. Do you have any suggestions for this kind of problem? By the way, I am working on a Linux system. Are you working on Windows? Thank you for your help.

    Juefu

    Replies
    1. Sorry for replying so late - your message got caught in Google's spam filters...

      Message framing is used to break up the TCP/IP stream into messages. TCP/IP itself guarantees that no data will be "lost." If you lose data, the connection is broken. This is true on all platforms (Windows/Linux/whatever). TCP/IP will split up the stream into packets on the wire and re-assemble them in the correct order automatically on the other side. There's no length restriction on TCP/IP data. Perhaps you're using UDP/IP?

  13. Stephen,

    Thanks for your mantra, an important help.
    What experience do you have with transmitting data streams over unreliable connections, e.g. a WLAN, where sometimes there is a connection loss and a reconnection of client and server is necessary? The target is the same: transfer messages over an unreliable connection by handling a TCP stream with a small protocol (length-prefixed or with delimiters). But now there is an additional level: client and server have to decide after how much time they assume a connection loss and reconnect. Have you dealt with problems like this?
    Markus

    Replies
    1. I've developed some systems like this, where a command/response has to survive underlying connection loss and reconnection. Conceptually, it's just an abstraction layer.

      The systems I did were quite complex due to timeouts and error recovery requirements. In modern-day code (particularly with async/await), I think this kind of system would be much simpler.

      Idempotent commands really help in situations like this.

  14. How do you check whether the UDP socket on the other side has shut down or stopped? Since UDP is unreliable, it is difficult to know if the other side has stopped working or has shut down.

    Replies
    1. If you need the concept of a "connection" complete with detection of dropped connections, then use TCP. Otherwise, you'll just end up re-inventing TCP over UDP, which can get really ugly.

  15. I am talking to a piece of test equipment which is already using a message delimiter at the end of each message. So I can't implement your message length solution (which I love and use often by the way). Do you have an example of a message-delimiter solution?

    Replies
    1. I do not have a message delimiter example.

      I'd start off by partial-reading a large amount off the socket (the maximum message size, if there's a maximum), splitting by the delimiter, and saving any leftovers for next time. If you don't have a maximum message size, then you'll also need to handle growing your buffer if necessary.

      In .NET, I find the string methods (Split, Substring) more useful than the binary methods (Array.Copy, etc), so if your protocol is something simple like ASCII (many delimiter-based protocols are), I'd start with a conversion to string and then do the message framing. But don't do this if your protocol is any kind of Unicode (e.g., UTF-8 or UTF-16) or binary.

      If you do have tight timing constraints (e.g., you're reading a continuous stream of messages), then you may find my ArraySegments library useful (it's on NuGet). This makes it easier to minimize the amount of copying done (and memory garbage created).
