Returning Arrow tables over HTTP is relatively straightforward. The Arrow IPC streaming format is repurposed to the HTTP use case and you can return each serialized IPC batch as a chunk of the HTTP response — Transfer-Encoding: chunked — to enable streaming clients.

Arrow IPC streams have a MIME type registered with IANA so you can also set the appropriate Content-Type header on your HTTP responses.

1
Content-Type: application/vnd.apache.arrow.stream

You can find Python examples of this in the arrow-experiments repository: https://github.com/apache/arrow-experiments/tree/main/http/get_simple/python.

Things get trickier when you try to support compression. Arrow IPC streams are not compressed by default and not all client implementations of the IPC protocol can parse streams with compressed buffers.

We will get back to IPC compression in a moment, but first let’s see how we can achieve compression of the data by compressing the HTTP response body itself.

Compressing HTTP Responses

All clients that can handle compressed HTTP responses can benefit from this. A server can look at the Accept-Encoding header of the request and decide to compress the response body if the client supports it. Chrome, Firefox, and most modern browsers support gzip and deflate compression as well as modern codecs like br (Brotli) and zstd (Zstandard)1.

Web browsers and HTTP client libraries will automatically decompress the response body if the Content-Encoding header is set to the codec used to compress the response body.

1
Content-Encoding: gzip

When Chrome makes an HTTP request, it sets the Accept-Encoding: gzip, deflate, br, zstd header to indicate that it can handle compressed responses in any of these formats. If you’re using a library to make HTTP requests, make sure it sets this header correctly so the server can compress the response body without worrying about compatibility with the client.

Using Arrow IPC Compression

The buffers placed in a serialized Arrow IPC stream can be compressed using the LZ4 or Zstandard codecs. Arrow IPC compression has nothing to do with the HTTP protocol — the compression and decompression is done by the Arrow library itself as it parses the stream. Note, though, that not all Arrow implementations support compression.2

To maximize the chances of compression without compatibility issues, you can ask the clients to communicate their compression preferences in the request headers. This can be achieved by passing parameters to the MIME type in the Accept request header.3

1
Accept: application/vnd.apache.arrow.stream; codecs="lz4, zstd"

This way, the server can decide to compress the IPC stream buffers using LZ4 or Zstandard without fear that client won’t be able to decode it. If the client doesn’t specify a codec, the server can disable compression altogether, fall back to HTTP compression (based on the Accept-Encoding header), or choose a compression codec at the risk of incompatibility.

You can find examples of this in Python at https://github.com/apache/arrow-experiments/pull/35.

To make troubleshooting easier, the server may indicate which compression codec was used to compress the IPC stream, even though the Arrow IPC stream reader picks up which codec was used from the data stream itself.

1
Content-Type: application/vnd.apache.arrow.stream; codecs=lz4

A Note on Double-Compression

Applying both IPC buffer and HTTP compression to the same data is not recommended. The extra CPU overhead of decompressing the data twice is not worth any possible gains that double compression might bring. If compression ratios are unambiguously more important than reducing CPU overhead, then a different compression algorithm4 that optimizes for that can be chosen.

Conclusion

Returning Arrow tables over HTTP is a powerful way to share data between applications. Unsurprisingly, columnar formats like Arrow are very friendly to compression and serialization. This is why Arrow works well as a network transfer format. As tooling evolves you can expect to see more and more Arrow tables being shared over HTTP instead of CSV or JSON.

  1. Zstdandard is not supported on Safari as of 2024. 

  2. Check the implementation status table in the Arrow documentation to see if the Arrow implementation you’re using supports buffer compression. 

  3. This is similar to clients requesting video streams by specifying the container format and the codecs they support like this Accept: video/webm; codecs="vp8, vorbis". 

  4. Compression codecs can be instantiated with different compression levels trading off compression ratio for speed.Â