I ran into this problem the other day, and thought it would serve as a good case study.
Put simply, I want to write an ASP.NET WebAPI action that will download a bunch of URLs and generate a zip on the fly, without storing any files in memory, and without blocking any threads on I/O.
Downloading a Simple Stream
It’s pretty straightforward to download a simple stream in WebAPI; there’s built-in support for that.
First, we (asynchronously) send the HTTP request and get the stream back (at this point, the file is not in memory). Next, we create an HttpResponseMessage that will use that stream as its content (StreamContent). Finally, we set a few headers so that this file downloads nicely if a browser makes the request. If you hit this API in a browser, it will download the README.md file indirectly through the WebAPI.
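The steps above might look roughly like the following sketch. The controller name, the example.com URL, and the generic content type are assumptions made for illustration; only HttpClient, StreamContent, and HttpResponseMessage come from the description.

```csharp
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using System.Web.Http;

public class DownloadController : ApiController
{
    // Hypothetical source URL; substitute the file you want to proxy.
    private const string SourceUrl = "https://example.com/README.md";

    private static readonly HttpClient Client = new HttpClient();

    public async Task<HttpResponseMessage> Get()
    {
        // Send the request and get the body as a stream; the file is not buffered here.
        var stream = await Client.GetStreamAsync(SourceUrl);

        // Use that stream as the response content.
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StreamContent(stream)
        };

        // Headers that make a browser treat the response as a file download.
        response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        response.Content.Headers.ContentDisposition =
            new ContentDispositionHeaderValue("attachment") { FileName = "README.md" };
        return response;
    }
}
```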
Constructing a Zip File on the Fly Using MemoryStream
Let’s extend this example to have our WebAPI download multiple files, and combine them into a single zip file which is then downloaded by the user.
We can do this as such (using the excellent
Here we’re downloading two files and combining them on-the-fly into a single zip which is downloaded by the user. First, we create the MemoryStream that the zip library will write to. Then, we download all the files asynchronously, and add them to the zip archive. Finally, we rewind the MemoryStream containing the zip file and send it to the browser using good ol’ StreamContent.
The problem with this approach is the MemoryStream. Storing the zip archive in the MemoryStream (as you may infer from the name) means that we’re building up the entire zip file in memory. The code is asynchronously downloading (using HttpClient), and WebAPI will asynchronously send it to the browser (using StreamContent), but we are holding the entire zip in memory in the meantime.
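As a rough sketch, the MemoryStream approach could look like this. It assumes the DotNetZip package (Ionic.Zip namespace) for ZipOutputStream; the controller name, entry names, and example.com URLs are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using System.Web.Http;
using Ionic.Zip; // assumption: DotNetZip supplies ZipOutputStream

public class BufferedZipController : ApiController
{
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical entry names mapped to hypothetical source URLs.
    private static readonly Dictionary<string, string> Files = new Dictionary<string, string>
    {
        { "first.md", "https://example.com/first.md" },
        { "second.md", "https://example.com/second.md" },
    };

    public async Task<HttpResponseMessage> Get()
    {
        // The whole archive accumulates in this stream: the drawback discussed above.
        var archive = new MemoryStream();

        // Second argument true: keep the MemoryStream open after the zip stream is disposed.
        using (var zip = new ZipOutputStream(archive, true))
        {
            foreach (var file in Files)
            {
                zip.PutNextEntry(file.Key);
                using (var source = await Client.GetStreamAsync(file.Value))
                {
                    await source.CopyToAsync(zip);
                }
            }
        }

        // Rewind so StreamContent sends the archive from the beginning.
        archive.Position = 0;

        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StreamContent(archive)
        };
        response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        response.Content.Headers.ContentDisposition =
            new ContentDispositionHeaderValue("attachment") { FileName = "archive.zip" };
        return response;
    }
}
```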
There is a way to build the zip file while it is being streamed to the client. This is possible because the zip file format puts its table of contents (the central directory) at the end of the file, so entries can be written one after another as their data arrives.
To use this kind of dynamic streaming, we can’t use StreamContent. What we really want is to write to the output stream directly. With ASP.NET MVC, we could use HttpResponse.OutputStream to grab the output stream directly and write to it (not ideal from a design standpoint, but it would work). This is not an option in ASP.NET WebAPI.
However, ASP.NET WebAPI does have a response type that acts as a “callback” that allows us to write directly to the output stream after we return from the controller action method. Its name is PushStreamContent.
Constructing a Zip File on the Fly Using PushStreamContent
I think of PushStreamContent as just a “callback”. When ASP.NET has sent the headers and is ready to send the actual content, it just invokes the callback that we give to PushStreamContent. Using this technique, our code looks like this:
With the callback approach, we can write the zip archive directly to the output stream.
The actual sequence of operations is this:
- ASP.NET calls our controller action.
- We build the list of filenames to download, and return an HttpResponseMessage with a status code, a callback, and some HTTP headers.
- ASP.NET starts sending the response to the client; it sends the status code and HTTP headers first.
- Then, when ASP.NET sends the response body to the client, it invokes our callback method.
- Our PushStreamContent callback starts writing a zip file (directly to the response body), downloading the files asynchronously one at a time and adding them to the zipped content.
- When our callback returns, ASP.NET completes the response.
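The sequence above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: it assumes the DotNetZip package (Ionic.Zip namespace) for ZipOutputStream and a Web API version whose PushStreamContent accepts a Task-returning callback. The controller name, entry names, and example.com URLs are hypothetical.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Web.Http;
using Ionic.Zip; // assumption: DotNetZip supplies ZipOutputStream

public class StreamingZipController : ApiController
{
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical entry names mapped to hypothetical source URLs.
    private static readonly Dictionary<string, string> Files = new Dictionary<string, string>
    {
        { "first.md", "https://example.com/first.md" },
        { "second.md", "https://example.com/second.md" },
    };

    public HttpResponseMessage Get()
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            // The callback does not run here; ASP.NET invokes it when it is
            // ready to send the response body.
            Content = new PushStreamContent(async (outputStream, content, context) =>
            {
                // Disposing the zip stream finishes the archive and closes
                // outputStream, which tells ASP.NET the body is complete.
                using (var zip = new ZipOutputStream(outputStream))
                {
                    foreach (var file in Files)
                    {
                        // Start a new entry, then copy the download into it.
                        zip.PutNextEntry(file.Key);
                        using (var source = await Client.GetStreamAsync(file.Value))
                        {
                            await source.CopyToAsync(zip);
                        }
                    }
                }
            })
        };

        response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        response.Content.Headers.ContentDisposition =
            new ContentDispositionHeaderValue("attachment") { FileName = "archive.zip" };
        return response;
    }
}
```

Note that the action itself is no longer async: it only wires up the callback and returns, and the downloads happen later inside the callback.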
This approach has some really nice advantages:
- All I/O is asynchronous. At no time are any threads blocked waiting to read the source files from their URLs, nor are any threads blocked waiting to write to the output response stream.
- The zip file is not held in memory. It is streamed directly to the client, compressing on-the-fly.
- In fact, for large files, not even a single file is read entirely into memory. Each file is individually compressed on-the-fly.
It’s interesting to think about how this API will start a download of the zip file, and the zip is already streaming when it may not even have started downloading the other source files!
Drawbacks to PushStreamContent
Well, it’s not all rainbows and unicorns, of course.
The primary drawback to PushStreamContent is error handling. Since an HTTP response always starts with sending the status code (like 200 OK) and the headers, by the time our PushStreamContent callback is invoked, it’s too late to notify the client of an error. So, what happens if our callback throws an exception?
Based on my testing, it appears that ASP.NET will abort the connection. All the browsers I tested correctly interpreted it as a generic “network error”.
The problem is, ASP.NET can’t go back in time and change the status code or response headers. So it’s not possible to get any kind of detailed error information to the client if there’s a problem in the PushStreamContent callback. The best you can do is just log the error on the server side. This is something to keep in mind when using PushStreamContent.
Update from the comments: as Paleta points out, you also must close the output stream when using PushStreamContent. In our sample code, ZipOutputStream closes the output stream for us, but if you forget to close the output stream, ASP.NET will not do it for you!
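One way to keep both points in mind (log failures on the server, and always close the output stream) is a small wrapper around the callback. The helper below is hypothetical, not part of WebAPI, and logging through System.Diagnostics.Trace is just an assumption; swap in whatever logger you use.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, shown only to illustrate the pattern.
public static class SafePushContent
{
    public static PushStreamContent Create(Func<Stream, Task> writeBody)
    {
        return new PushStreamContent(async (outputStream, content, context) =>
        {
            try
            {
                await writeBody(outputStream);
            }
            catch (Exception ex)
            {
                // Headers are already sent, so the client cannot be told what
                // went wrong; record the details on the server instead.
                Trace.TraceError("Streaming response failed: {0}", ex);
                throw;
            }
            finally
            {
                // ASP.NET will not close the stream for us.
                outputStream.Close();
            }
        });
    }
}
```

A controller would then assign something like `SafePushContent.Create(async stream => { /* write the zip to stream */ })` to the response's Content.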
A fully-working solution for ASP.NET 4.6 is available on GitHub.