Bi-Directional Communications for the Web

Or: Websockets for Fun and Profit

 

What Is This Hyper Text Thing Anyway?

HTTP (Hypertext Transfer Protocol) was originally built to provide a resilient means of conveying data across an unreliable network of unreliable networks, and it arguably accomplishes this very well. It is now one of the most widely adopted and implemented application protocols in the world.

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol that can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

RFC 2616: HTTP/1.1, June 1999

HTTP and its semantics now underpin a host of other protocols (such as HLS and MPEG-DASH for delivering video, WebDAV for authoring, plus many more) and architectural styles, such as REST.

The standard has recently been updated to HTTP/2, which looks to improve upon HTTP/1.1 by multiplexing requests and responses, and by minimising protocol overhead through compression of HTTP headers.

So Why Do We Need Anything Else?

HTTP is great for delivering single, stateless Request/Response messages. However, in some cases what we actually want is full-duplex, stateful communication over a network socket. If messages are passing both ways between client and server, and they need to be conveyed quickly with a minimum of overhead, then HTTP is not an efficient transport method.

We can mimic full-duplex bi-directional communication over HTTP using certain approaches that come under the umbrella term COMET, such as:

Of these, we found that Long-Polling gave the best trade-off between browser support and efficiency of use. However, this still meant that in order to push a relatively small message back to a web client, the HTTP overhead was significant.

Enter the Websocket…

The Websocket Protocol offers a native, low-overhead implementation of full-duplex bi-directional communication. Supported in all modern browsers, it offers a high-level, message-based JavaScript API. It is the closest thing to a raw TCP socket you can get in the browser, but it abstracts away many of the complexities behind a straightforward API. It also provides additional functionality such as:

  • Connection negotiation
  • Same-Origin policy enforcement
  • Interoperability with existing HTTP infrastructure
  • Message-based communication and efficient message framing
  • Subprotocol negotiation and extensibility

Message Framing

The Websocket protocol provides a means of chopping up large messages into multiple frames. A message is split into one or more frames, whether it is text or binary data. Each frame carries framing data that describes attributes of the frame, such as the length of the payload and whether it contains text or binary data.

This diagram shows how an individual frame is composed:

[Diagram: structure of a WebSocket frame, from the IETF WebSocket Protocol RFC]
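To make the framing data more concrete, here is a minimal sketch of reading a frame header by hand, following the layout in RFC 6455 section 5.2. The function name parseFrameHeader and the shape of the returned object are our own choices for illustration; browsers do this internally and never expose it to script.

```javascript
// Parse the header of a single WebSocket frame (RFC 6455, section 5.2).
// `bytes` is a Uint8Array that holds at least the complete header.
// parseFrameHeader is a hypothetical helper, not part of any browser API.
function parseFrameHeader(bytes) {
  const fin = (bytes[0] & 0x80) !== 0;     // is this the final frame of the message?
  const opcode = bytes[0] & 0x0f;          // 0x0 continuation, 0x1 text, 0x2 binary
  const masked = (bytes[1] & 0x80) !== 0;  // frames sent by a client are masked
  let payloadLength = bytes[1] & 0x7f;     // 0-125, or 126/127 as escape values
  let offset = 2;

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (payloadLength === 126) {
    // The real length follows as a 16-bit big-endian integer.
    payloadLength = view.getUint16(offset);
    offset += 2;
  } else if (payloadLength === 127) {
    // The real length follows as a 64-bit big-endian integer.
    // Number() loses precision above 2^53, which is fine for a sketch.
    payloadLength = Number(view.getBigUint64(offset));
    offset += 8;
  }
  if (masked) offset += 4; // skip the 4-byte masking key

  return { fin, opcode, masked, payloadLength, headerLength: offset };
}

// The unmasked text frame "Hello" used as an example in RFC 6455.
const hello = Uint8Array.of(0x81, 0x05, 0x48, 0x65, 0x6c, 0x6c, 0x6f);
console.log(parseFrameHeader(hello));
```

Note how small the header is: a short server-to-client message costs only two bytes of framing.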

The Websocket JavaScript API exposed in the browser abstracts the framing of messages away from the client, so that only whole messages are sent or received. The client does not need to worry about buffering, or constructing messages from frames.

As an example, if the server sends a 500 KB message, the Websocket JavaScript API will only fire the onmessage event once all 500 KB have been received and the message has been reconstructed.
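The reassembly that the API performs on our behalf can be sketched roughly as follows. This is a simplified illustration, not how any particular browser implements it: MessageAssembler is a hypothetical class, and payloads are modelled as strings, whereas a real implementation buffers bytes and decodes text only once the message is complete.

```javascript
// Buffers frame payloads and delivers one whole message when the FIN frame arrives.
// MessageAssembler is a hypothetical name used only for this illustration.
class MessageAssembler {
  constructor(onmessage) {
    this.onmessage = onmessage; // called once per complete message
    this.parts = [];            // payloads of the frames received so far
  }

  // frame: { fin: boolean, payload: string }
  push(frame) {
    this.parts.push(frame.payload);
    if (frame.fin) {
      const message = this.parts.join('');
      this.parts = [];
      this.onmessage(message);
    }
  }
}

// Three frames arrive, but the callback fires only once, with the whole message.
const assembler = new MessageAssembler((message) => console.log('message:', message));
assembler.push({ fin: false, payload: 'Hel' });
assembler.push({ fin: false, payload: 'lo, ' });
assembler.push({ fin: true, payload: 'world' });
```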

Subprotocol Negotiation

Websockets provide a means of conveying information, but place no constraints on the format of those messages. Other than the frame opcode, which indicates whether the content is text or binary, the content of the messages can be anything.

Websockets do provide a mechanism for applications to negotiate the format of the message payload between client and server, which allows multiple versions of an application protocol to be offered. The client provides a list of supported subprotocols, from which the server chooses the best option and returns it to the client.
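On the server side, the choice can be as simple as walking a preference list. The sketch below assumes the server receives the raw Sec-WebSocket-Protocol header value as a string. The function name selectSubprotocol and the protocol name zap-protocol-v2 are made up for illustration.

```javascript
// Pick the first subprotocol in the server's preference order that the client offered.
// offeredHeader: the value of the client's Sec-WebSocket-Protocol header.
// supported: the server's subprotocols, most preferred first.
// Returns null when there is no protocol in common.
function selectSubprotocol(offeredHeader, supported) {
  const offered = offeredHeader
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
  for (const candidate of supported) {
    if (offered.includes(candidate)) return candidate;
  }
  return null;
}

// 'zap-protocol-v2' is a hypothetical newer version of the protocol.
console.log(selectSubprotocol('zap-protocol-v1, zap-protocol-v2',
                              ['zap-protocol-v2', 'zap-protocol-v1']));
```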

Here is an example which illustrates the JavaScript API, and shows the negotiation from a client's point of view:

[Figure: subprotocol negotiation using the JavaScript WebSocket API]

 

If the subprotocol negotiation is successful, the onopen callback is called on the client. The application can check the protocol property on the WebSocket instance to find out which protocol version the server selected.

However, if the server does not support any of the protocols offered by the client, the WebSocket handshake does not complete: the client's onerror callback is called and the connection is closed.
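A client-side negotiation along these lines might look like the sketch below. The WebSocket constructor, the protocol property and the onopen and onerror callbacks are the standard browser API. The openSocket wrapper, its handlers argument and the zap-protocol-v2 name are assumptions made for illustration, and the constructor is passed in as a parameter only so that the logic can be exercised without a real server.

```javascript
// Open a socket, offering several subprotocols in order of preference.
// handlers: { onNegotiated(protocol), onFailed() } is a hypothetical callback object.
// WebSocketImpl defaults to the browser's WebSocket constructor.
function openSocket(url, protocols, handlers, WebSocketImpl = globalThis.WebSocket) {
  const ws = new WebSocketImpl(url, protocols);

  // Called once the handshake, including subprotocol negotiation, has succeeded.
  // ws.protocol holds the single subprotocol chosen by the server.
  ws.onopen = () => handlers.onNegotiated(ws.protocol);

  // Called when the handshake fails, for example because no protocol was agreed.
  ws.onerror = () => handlers.onFailed();

  return ws;
}

// Usage in a browser (not executed here):
// openSocket('wss://websocket.server.org/socket/',
//            ['zap-protocol-v2', 'zap-protocol-v1'],
//            {
//              onNegotiated: (protocol) => console.log('speaking', protocol),
//              onFailed: () => console.log('negotiation failed'),
//            });
```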

The Bad News

Websockets are susceptible to head-of-line blocking. Messages are split into one or more frames and the frames are delivered in order, but frames from different messages cannot be interleaved with each other. This means that a large message can block other messages from being delivered.

It is important to consider this when sending large payloads in either direction – if you are relying on small messages containing time-sensitive updates, then any large message will prevent these from being delivered until the large message has completed.
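A small back-of-the-envelope model shows the effect. It assumes a link with a fixed throughput and messages that are sent strictly one after another. The function name deliveryTimes and the numbers used are illustrative only, not measurements.

```javascript
// Given messages queued in order and a fixed link speed, return the time (in ms)
// at which each message has been fully delivered. Messages cannot overtake each other.
function deliveryTimes(messages, bytesPerMs) {
  let clock = 0;
  return messages.map(({ name, bytes }) => {
    clock += bytes / bytesPerMs; // the link is busy until this message completes
    return { name, deliveredAtMs: clock };
  });
}

const large = { name: 'large', bytes: 500000 };
const small = { name: 'small', bytes: 1000 };

// Large message queued first: the small update waits behind it.
console.log(deliveryTimes([large, small], 1000));
// → [ { name: 'large', deliveredAtMs: 500 }, { name: 'small', deliveredAtMs: 501 } ]

// Small message queued first: it arrives almost immediately.
console.log(deliveryTimes([small, large], 1000));
// → [ { name: 'small', deliveredAtMs: 1 }, { name: 'large', deliveredAtMs: 501 } ]
```

In this model the time-sensitive update is delayed from 1 ms to 501 ms purely because of its position in the queue, which is why reducing the size of large messages matters.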

One of the ways that we tried to minimise the impact of this was to look at applying compression to messages.

Eat Me, Drink Me

Along with many other benefits, HTTP provides compression. Typically, HTTP response bodies are encoded using a compression algorithm such as gzip.

Websockets in their vanilla form provide no such mechanism, but they do provide extensibility. There are two optional extensions for Websockets that provide compression:

  • x-webkit-deflate-frame
    • This applies compression to each frame individually
    • This is considered deprecated in favour of:
  • permessage-deflate
    • This applies compression to the entire message before being sliced up into frames
    • This is the currently preferred compression option and is supported in Chrome and Firefox
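To give a feel for what permessage-deflate does to a message before it is framed, here is a rough sketch using Node's built-in zlib module. It follows the procedure described in RFC 7692: compress with raw DEFLATE, finish with a sync flush, and drop the trailing four bytes 00 00 ff ff. The function names are our own, and the sketch compresses each message independently, which corresponds to the no_context_takeover behaviour; real implementations usually keep a streaming compressor per connection.

```javascript
const zlib = require('node:zlib');

// The four bytes that a DEFLATE sync flush appends, which the extension strips.
const TAIL = Buffer.from([0x00, 0x00, 0xff, 0xff]);

// Compress one message payload (hypothetical helper, one-shot, no shared context).
function compressMessage(payload) {
  const deflated = zlib.deflateRawSync(payload, {
    finishFlush: zlib.constants.Z_SYNC_FLUSH,
  });
  // Remove the trailing 00 00 ff ff when present.
  const hasTail = deflated.subarray(deflated.length - TAIL.length).equals(TAIL);
  return hasTail ? deflated.subarray(0, deflated.length - TAIL.length) : deflated;
}

// Reverse the process: restore the tail, then inflate.
function decompressMessage(compressed) {
  return zlib.inflateRawSync(Buffer.concat([compressed, TAIL]), {
    finishFlush: zlib.constants.Z_SYNC_FLUSH,
  });
}

// A repetitive JSON payload, typical of application updates, compresses well.
const payload = Buffer.from(JSON.stringify({ update: 'price', value: 42 }).repeat(50));
const compressed = compressMessage(payload);
console.log(payload.length, 'bytes before,', compressed.length, 'bytes after');
```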

Browser Support

This table shows Websocket and compression support across a selection of popular browsers:

[Table: Websocket and compression support by browser]

 

Negotiation

In order for compression to be enabled for the session, the browser must advertise that it supports these extensions. This is done using Websocket-specific headers as part of the initial HTTP handshake:

GET https://websocket.server.org/socket/ HTTP/1.1
Host: websocket.server.org
Connection: Upgrade
Upgrade: websocket
Origin: https://www.server.org
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: YbRtcdt+XsuQEgaksQar/g==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Protocol: zap-protocol-v1
  
HTTP/1.1 101 Switching Protocols
connection: Upgrade
sec-websocket-extensions: permessage-deflate; server_no_context_takeover; client_max_window_bits=15
sec-websocket-protocol: zap-protocol-v1
upgrade: websocket
sec-websocket-accept: KZ3q5Gs4+31H86aDlC8AMBAIcFU=

In this handshake, the browser advertises support for the permessage-deflate extension, along with client_max_window_bits, which means that it supports customising the size of the “sliding window” used by the LZ77 algorithm:

Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

In return, the server responds to state that it will also support that extension, and that the client should use a 15-bit window, that is 2^15 bytes (32 KB):

sec-websocket-extensions: permessage-deflate; server_no_context_takeover; client_max_window_bits=15
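The header value is a comma-separated list of extensions, each followed by semicolon-separated parameters. A minimal parser, assuming no quoted parameter values (which the grammar allows but this sketch ignores), could look like this. The names parseExtensions and windowSizeBytes are hypothetical.

```javascript
// Parse a Sec-WebSocket-Extensions header value into a list of
// { name, params } objects. Parameters without a value are stored as true.
// Simplification: quoted-string parameter values are not handled.
function parseExtensions(header) {
  return header.split(',').map((offer) => {
    const [name, ...rawParams] = offer.split(';').map((part) => part.trim());
    const params = {};
    for (const raw of rawParams) {
      const [key, value] = raw.split('=');
      params[key.trim()] = value === undefined ? true : value.trim();
    }
    return { name, params };
  });
}

// The LZ77 window size in bytes for a given *_max_window_bits value.
function windowSizeBytes(bits) {
  return 2 ** Number(bits);
}

const [extension] = parseExtensions(
  'permessage-deflate; server_no_context_takeover; client_max_window_bits=15'
);
console.log(extension.name);                                          // → permessage-deflate
console.log(windowSizeBytes(extension.params.client_max_window_bits)); // → 32768
```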

 

Coping with Rejection

We made a conscious decision that if a client does not advertise support for either compression extension then we reject the connection. The client can then decide to connect over a different transport (typically HTTP Long-Polling), which does support compression. In this way we can ensure that we are sending optimal data payloads to the client without consuming excessive amounts of data.

In this example, a browser that does not support compression but does support Websockets (such as IE11) tries to open a Websocket connection, which is rejected with a 400 Bad Request response to indicate that the request is not valid.

GET https://websocket.server.org/socket/ HTTP/1.1
Host: websocket.server.org
Connection: Upgrade
Upgrade: Websocket
Origin: websocket.server.org
Sec-WebSocket-Protocol: zap-protocol-v1
Sec-WebSocket-Key: lwg3gWRwcNr0pOvumDRbTA==
Sec-WebSocket-Version: 13

HTTP/1.1 400 Bad Request
content-length: 0
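The server-side rule behind this exchange can be sketched as a single function. This is an illustration of the decision only, not our production code: it assumes the request headers arrive as an object with lower-cased names (as Node's http module provides them), and the names handshakeStatus and COMPRESSION_EXTENSIONS are made up.

```javascript
// The two compression extensions discussed above.
const COMPRESSION_EXTENSIONS = ['permessage-deflate', 'x-webkit-deflate-frame'];

// Decide how to answer a Websocket upgrade request.
// headers: request headers keyed by lower-cased name (an assumption of this sketch).
// Returns 101 to accept the upgrade, or 400 to reject it.
function handshakeStatus(headers) {
  const offered = (headers['sec-websocket-extensions'] || '')
    .split(',')
    .map((offer) => offer.split(';')[0].trim().toLowerCase());
  const supportsCompression = offered.some((name) => COMPRESSION_EXTENSIONS.includes(name));
  return supportsCompression ? 101 : 400;
}

// A request without any Sec-WebSocket-Extensions header, as in the capture above.
console.log(handshakeStatus({ upgrade: 'Websocket', 'sec-websocket-version': '13' })); // → 400
```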

In this case we ensure that if the Websocket connection fails with a 400 Bad Request then we fall back to HTTP Long-Polling.
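On the client, the fallback might be wired up roughly as below. One caveat: the browser WebSocket API does not expose the HTTP status code of a failed handshake to script, so this sketch treats any close that happens before the connection opened as a rejection. The connectWithFallback wrapper and the startLongPolling callback are hypothetical names, and the constructor is injectable only to make the logic testable.

```javascript
// Try a Websocket first and switch to another transport if the handshake is rejected.
// startLongPolling: a hypothetical callback that starts the HTTP Long-Polling transport.
// WebSocketImpl defaults to the browser's WebSocket constructor.
function connectWithFallback(url, protocols, startLongPolling, WebSocketImpl = globalThis.WebSocket) {
  const ws = new WebSocketImpl(url, protocols);
  let opened = false;

  ws.onopen = () => {
    opened = true;
  };

  // A close event without a preceding open event means the handshake never
  // completed, which is what the client observes when the server answers 400.
  ws.onclose = () => {
    if (!opened) startLongPolling();
  };

  return ws;
}

// Usage in a browser (not executed here):
// connectWithFallback('wss://websocket.server.org/socket/', ['zap-protocol-v1'],
//                     () => console.log('falling back to long-polling'));
```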

Conclusion

What we learned:

  • Websockets are faster and have lower overhead than emulating bi-directional communication over HTTP, for both client and server.
  • However, if you’re sending anything more than the smallest of messages then you should ensure that you’re using compression.
  • If your client doesn’t support Websocket compression, then you’re probably better off staying on HTTP using COMET techniques.
