Or: Websockets for Fun and Profit
What Is This Hyper Text Thing Anyway?
HTTP (Hypertext Transfer Protocol) was originally built to provide a resilient means of conveying data across an unreliable network of unreliable networks, and it arguably accomplishes this very well. It is now one of the most widely adopted and implemented application protocols in the world.
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol that can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
The standard has recently been updated to HTTP/2, which looks to improve upon HTTP/1.1 by supporting multiplexing of requests and responses, and by minimising protocol overhead through compression of HTTP headers.
So Why Do We Need Anything Else?
HTTP is great for delivering single, stateless Request/Response messages. However, in some cases what we actually want is full-duplex, stateful communication over a network socket. If messages are passing both ways between client and server, and they need to be conveyed quickly with a minimum of overhead, then HTTP is not an efficient transport.
We can mimic full-duplex bi-directional communication over HTTP using certain approaches that come under the umbrella term COMET, such as:
- Long-Polling, where the client issues a request that the server holds open until it has a message to deliver
- HTTP Streaming, where the server keeps a single response open and pushes chunks of data down it
Of these, we found that Long-Polling gave the best trade-off between browser support and efficiency of use. However, this still meant that in order to push a relatively small message back to a web client, the HTTP overhead was significant.
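To make that overhead concrete, here is a sketch of the long-polling loop (the `poll` callback stands in for an HTTP request that the server holds open until it has data; the names are illustrative, not a real API):

```javascript
// Sketch of a long-polling loop. `poll` stands in for an HTTP request that
// the server holds open until a message is available.
async function longPoll(poll, onMessage, maxPolls = Infinity) {
  for (let i = 0; i < maxPolls; i++) {
    const message = await poll(); // resolves when the server finally responds
    onMessage(message);           // handle the pushed message
    // ...then immediately issue the next request, incurring the full HTTP
    // overhead (headers, possibly a new connection) for every message.
  }
}
```

Each delivered message costs a complete request/response cycle, which is exactly the overhead described above.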
Enter the Websocket…
The Websocket protocol (standardised as RFC 6455) provides full-duplex communication over a single, long-lived TCP connection, along with:
- Connection negotiation
- Same-Origin policy enforcement
- Works across existing HTTP infrastructure
- Message-based communication and efficient message framing
- Subprotocol negotiation and extensibility
The Websocket protocol provides a means of chopping up large messages into multiple frames. A message will be split into one or more frames whether it contains text or binary data. Each frame carries framing data that describes attributes of the frame, such as the length of the payload and whether it contains text or binary data.
This diagram shows how an individual frame is composed:
Diagram from the IETF WebSocket Protocol RFC
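A minimal decoder for the fixed part of that layout might look like the following sketch (it ignores the RSV extension bits and any masking key, so it is not a full RFC 6455 parser):

```javascript
// Sketch: decode the fixed part of a WebSocket frame header (RFC 6455).
// Ignores the RSV extension bits and any masking key; not a full parser.
function decodeFrameHeader(buf) {
  const fin = (buf[0] & 0x80) !== 0;    // FIN: is this the final fragment?
  const opcode = buf[0] & 0x0f;         // 0x1 = text, 0x2 = binary, 0x0 = continuation
  const masked = (buf[1] & 0x80) !== 0; // client-to-server frames must set this
  let payloadLength = buf[1] & 0x7f;    // 0-125 fits in these 7 bits...
  let offset = 2;                       // where the payload (or masking key) starts
  if (payloadLength === 126) {
    // ...126 means the next 2 bytes hold a 16-bit length...
    payloadLength = (buf[2] << 8) | buf[3];
    offset = 4;
  } else if (payloadLength === 127) {
    // ...127 means the next 8 bytes hold a 64-bit length.
    payloadLength = Number(
      new DataView(buf.buffer, buf.byteOffset + 2, 8).getBigUint64(0)
    );
    offset = 10;
  }
  return { fin, opcode, masked, payloadLength, offset };
}
```

For instance, the header bytes 0x81 0x05 describe a final, unmasked text frame with a 5-byte payload.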
For example, if the server sends a single 500KB message, the client will only receive an onmessage event when all 500KB has been received and the message re-constructed.
Websockets provide a means of conveying messages, but place no constraints on their format. Other than a single bit flag indicating whether the content is text or binary, the content of a message can be anything.
Websockets do, however, provide a mechanism for client and server to negotiate the format of the message payload, which allows multiple versions of an application protocol to be offered. The client provides a list of supported subprotocols, from which the server can choose the best option and return its choice to the client.
If the subprotocol negotiation is successful, the onopen callback is called on the client. The application can check the protocol property on the WebSocket instance to get the server's chosen protocol version.
However, if the server does not support any of the protocols provided by the client, the WebSocket handshake does not complete: the client's onerror callback is called and the connection is ended.
The Bad News
Websockets are susceptible to head-of-line blocking. Messages are split into one or more frames, and the frames of a message are delivered in order, but frames from different messages cannot be interleaved with each other. This means that a large message can block other messages from being delivered.
It is important to consider this when sending large payloads in either direction – if you are relying on small messages containing time-sensitive updates, then any large message will prevent these from being delivered until the large message has completed.
One of the ways that we tried to minimise the impact of this was to look at applying compression to messages.
Eat Me, Drink Me
What HTTP provides, along with many other benefits, is compression: most HTTP response bodies are typically encoded using a compression algorithm such as gzip.
However, Websockets in their vanilla form provide no such mechanism – but they do provide extensibility. There are two different optional extensions for Websockets that provide compression:
- deflate-frame (also known as x-webkit-deflate-frame)
  - This applies compression to each frame individually
  - This is considered deprecated in favour of:
- permessage-deflate
  - This applies compression to the entire message before it is sliced up into frames
  - This is the currently preferred compression option and is supported in Chrome and Firefox
This table shows Websocket and compression support across a selection of popular browsers:
In order for compression to be enabled for the session, the browser must advertise that it supports these extensions. This is done using Websocket-specific headers as part of the initial HTTP handshake:

```http
GET https://websocket.server.org/socket/ HTTP/1.1
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: ...
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
```
In this handshake, the browser advertises support for the permessage-deflate extension, along with client_max_window_bits, which means that it supports customising the size of the “sliding window” used by the LZ77 algorithm.
And in return the server responds to state that it will also use that extension, and that the client should use a 2^15-byte (32KB) window size.
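The server's side of that negotiation might look like this (a sketch of the relevant response headers only; the accept key is elided):

```http
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: ...
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=15
```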
Coping with Rejection
We made a conscious decision that if a client does not advertise support for either compression extension then we shall reject the connection. The client can then decide to connect over a different transport (typically HTTP Long-Polling), which does support compression. In this way we can ensure that we are sending optimal data payloads to the client without consuming excessive amounts of data.
In this example, a browser that does not support compression but does support Websockets (such as IE11) tries to open a Websocket connection, which is rejected with a 400 Bad Request response to indicate that the request is not valid.
In this case we ensure that if the Websocket connection fails with a 400 Bad Request then we revert back to HTTP Long-Polling.
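Server-side, the decision boils down to inspecting the extensions header in the upgrade request. A minimal sketch (the helper name is ours, not a library API):

```javascript
// Return true if the client's handshake advertised either Websocket
// compression extension; otherwise the server responds 400 Bad Request.
function clientSupportsCompression(headers) {
  const extensions = headers['sec-websocket-extensions'] || '';
  return /\b(permessage-deflate|x-webkit-deflate-frame)\b/.test(extensions);
}
```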
What we learned:
- Websockets are faster and have lower overhead than emulating bi-directional communication over HTTP, for both client and server.
- However, if you’re sending anything more than the smallest of messages then you should ensure that you’re using compression.
- If your client doesn’t support websocket compression, then you’re probably better off staying on HTTP using COMET techniques.