XMLHttpRequest

Browser APIs and Protocols, Chapter 15

Introduction

XMLHttpRequest (XHR) is a browser-level API that enables the client to script data transfers via JavaScript. XHR made its first debut in Internet Explorer 5, became one of the key technologies behind the Asynchronous JavaScript and XML (AJAX) revolution, and is now a fundamental building block of nearly every modern web application.

XMLHTTP changed everything. It put the "D" in DHTML. It allowed us to asynchronously get data from the server and preserve document state on the client… The Outlook Web Access (OWA) team’s desire to build a rich Win32 like application in a browser pushed the technology into IE that allowed AJAX to become a reality.

Jim Van Eaton, Outlook Web Access: A catalyst for web evolution

Prior to XHR, the web page had to be refreshed to send or fetch any state updates between the client and server. With XHR, this workflow could be done asynchronously and under full control of the application JavaScript code. XHR is what enabled us to make the leap from building pages to building interactive web applications in the browser.

However, the power of XHR is not only that it enabled asynchronous communication within the browser, but also that it made it simple. XHR is an application API provided by the browser, which is to say that the browser automatically takes care of all the low-level connection management, protocol negotiation, formatting of HTTP requests, and much more:

Free from worrying about all the low-level details, our applications can focus on the business logic of initiating requests, managing their progress, and processing returned data from the server. The combination of a simple API and its ubiquitous availability across all the browsers makes XHR the "Swiss Army knife" of networking in the browser.

As a result, nearly every networking use case (scripted downloads, uploads, streaming, and even real-time notifications) can and have been built on top of XHR. Of course, this doesn’t mean that XHR is the most efficient transport in each case—in fact, as we will see, far from it—but it is nonetheless often used as a fallback transport for older clients, which may not have access to newer browser networking APIs. With that in mind, let’s take a closer look at the latest capabilities of XHR, its use cases, and performance do’s and don’ts.

An exhaustive analysis of the full XHR API and its capabilities is outside the scope of our discussion—our focus is on performance! Refer to the official W3C standard for an overview of the XMLHttpRequest API.

§Brief History of XHR

Despite its name, XHR was never intended to be tied to XML specifically. The XML prefix is a vestige of a decision to ship the first version of what became known as XHR as part of the MSXML library in Internet Explorer 5:

This was the good-old-days when critical features were crammed in just days before a release…I realized that the MSXML library shipped with IE and I had some good contacts over in the XML team who would probably help out—I got in touch with Jean Paoli who was running that team at the time and we pretty quickly struck a deal to ship the thing as part of the MSXML library. Which is the real explanation of where the name XMLHTTP comes from—the thing is mostly about HTTP and doesn’t have any specific tie to XML other than that was the easiest excuse for shipping it so I needed to cram XML into the name.

Alex Hopmann, The story of XMLHTTP

Mozilla modeled its own implementation of XHR against Microsoft’s and exposed it via the XMLHttpRequest interface. Safari, Opera, and other browsers followed, and XHR became a de facto standard in all major browsers—hence the name and why it stuck. In fact, the official W3C Working Draft specification for XHR was only published in 2006, well after XHR came into widespread use!

However, despite its popularity and key role in the AJAX revolution, the early versions of XHR provided limited capabilities: text-based-only data transfers, restricted support for handling uploads, and inability to handle cross-domain requests. To address these shortcomings, the "XMLHttpRequest Level 2" draft was published in 2008, which added a number of new features:

In 2011, "XMLHttpRequest Level 2" specification was merged with the original XMLHttpRequest working draft. Hence, while you will often find references to XHR version or level 1 and 2, these distinctions are no longer relevant; today, there is only one, unified XHR specification. In fact, all the new XHR2 features and capabilities are offered via the same XMLHttpRequest API: same interface, more features.

New XHR2 features are now supported by all the modern browsers; see caniuse.com/xhr2. Hence, whenever we refer to XHR, we are implicitly referring to the XHR2 standard.

§Cross-Origin Resource Sharing (CORS)

XHR is a browser-level API that automatically handles myriad low-level details such as caching, handling redirects, content negotiation, authentication, and much more. This serves a dual purpose. First, it makes the application APIs much easier to work with, allowing us to focus on the business logic. But, second, it allows the browser to sandbox and enforce a set of security and policy constraints on the application code.

The XHR interface enforces strict HTTP semantics on each request: the application supplies the data and URL, and the browser formats the request and handles the full lifecycle of each connection. Similarly, while the XHR API allows the application to add custom HTTP headers (via the setRequestHeader() method), there are a number of protected headers that are off-limits to application code:

The browser will refuse to override any of the unsafe headers, which guarantees that the application cannot impersonate a fake user-agent, user, or the origin from where the request is being made. In fact, protecting the origin header is especially important, as it is the key piece of the "same-origin policy" applied to all XHR requests.

An "origin" is defined as a triple of application protocol, domain name, and port number—e.g., (http, example.com, 80) and (https, example.com, 443) are considered as different origins. For more details see The Web Origin Concept.

The motivation for the same-origin policy is simple: the browser stores user data, such as authentication tokens, cookies, and other private metadata, which cannot be leaked across different applications—e.g., without the same origin sandbox an arbitrary script on example.com could access and manipulate users’ data on thirdparty.com!

To address this specific problem, early versions of XHR were restricted to same-origin requests only, where the requesting origin had to match the origin of the requested resource: an XHR initiated from example.com could request another resource only from the same example.com origin. Alternatively, if the same origin precondition failed, then the browser would simply refuse to initiate the XHR request and raise an error.

However, while necessary, the same-origin policy also places severe restrictions on the usefulness of XHR: what if the server wants to offer a resource to a script running in a different origin? That’s where "Cross-Origin Resource Sharing" (CORS) comes in! CORS provides a secure opt-in mechanism for client-side cross-origin requests:

// script origin: (http, example.com, 80)
var xhr = new XMLHttpRequest();
xhr.open('GET', '/resource.js'); 
xhr.onload = function() { ... };
xhr.send();

var cors_xhr = new XMLHttpRequest();
cors_xhr.open('GET', 'http://thirdparty.com/resource.js'); 
cors_xhr.onload = function() { ... };
cors_xhr.send();
  1. Same-origin XHR request

  2. Cross-origin XHR request

CORS requests use the same XHR API, with the only difference that the URL to the requested resource is associated with a different origin from where the script is being executed: in the previous example, the script is executed from (http, example.com, 80), and the second XHR request is accessing resource.js from (http, thirdparty.com, 80).

The opt-in authentication mechanism for the CORS request is handled at a lower layer: when the request is made, the browser automatically appends the protected Origin HTTP header, which advertises the origin from where the request is being made. In turn, the remote server is then able to examine the Origin header and decide if it should allow the request by returning an Access-Control-Allow-Origin header in its response:

=> Request
GET /resource.js HTTP/1.1
Host: thirdparty.com
Origin: http://example.com 
...

<= Response
HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://example.com 
...
  1. Origin header is automatically set by the browser.

  2. Opt-in header is set by the server.

In the preceding example, thirdparty.com decided to opt into cross-origin resource sharing with example.com by returning an appropriate access control header in its response. Alternatively, if it wanted to disallow access, it could simply omit the Access-Control-Allow-Origin header, and the client’s browser would automatically fail the sent request.

If the third-party server is not CORS aware, then the client request will fail, as the client always verifies the presence of the opt-in header. As a special case, CORS also allows the server to return a wildcard (Access-Control-Allow-Origin: *) to indicate that it allows access from any origin. However, think twice before enabling this policy!

With that, we are all done, right? Turns out, not quite, as CORS takes a number of additional security precautions to ensure that the server is CORS aware:

To enable cookies and HTTP authentication, the client must set an extra property (withCredentials) on the XHR object when making the request, and the server must also respond with an appropriate header (Access-Control-Allow-Credentials) to indicate that it is knowingly allowing the application to include private user data. Similarly, if the client needs to write or read custom HTTP headers or wants to use a "non-simple method" for the request, then it must first ask for permission from the third-party server by issuing a preflight request:

=> Preflight request
OPTIONS /resource.js HTTP/1.1 
Host: thirdparty.com
Origin: http://example.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: My-Custom-Header
...

<= Preflight response
HTTP/1.1 200 OK 
Access-Control-Allow-Origin: http://example.com
Access-Control-Allow-Methods: GET, POST, PUT
Access-Control-Allow-Headers: My-Custom-Header
...

(actual HTTP request) 

  1. Preflight OPTIONS request to verify permissions

  2. Successful preflight response from third-party origin

  3. Actual CORS request

The official W3C CORS specification defines when and where a preflight request must be used: "simple" requests can skip it, but there are a variety of conditions that will trigger it and add a minimum of a full roundtrip of network latency to verify permissions. The good news is, once a preflight request is made, it can be cached by the client to avoid the same verification on each request.

CORS is supported by all modern browsers; see caniuse.com/cors. For a deep dive on various CORS policies and implementation refer to the official W3C standard.

§Downloading Data with XHR

XHR can transfer both text-based and binary data. In fact, the browser offers automatic encoding and decoding for a variety of native data types, which allows the application to pass these types directly to XHR to be properly encoded, and vice versa, for the types to be automatically decoded by the browser:

ArrayBuffer

Fixed-length binary data buffer

Blob

Binary large object of immutable data

Document

Parsed HTML or XML document

JSON

JavaScript object representing a simple data structure

Text

A simple text string

Either the browser can rely on the HTTP content-type negotiation to infer the appropriate data type (e.g., decode an application/json response into a JSON object), or the application can explicitly override the data type when initiating the XHR request:

var xhr = new XMLHttpRequest();
xhr.open('GET', '/images/photo.webp');
xhr.responseType = 'blob'; 

xhr.onload = function() {
  if (this.status == 200) {
    var img = document.createElement('img');
    img.src = window.URL.createObjectURL(this.response); 
    img.onload = function() {
        window.URL.revokeObjectURL(this.src); 
    }
    document.body.appendChild(img);
  }
};

xhr.send();
  1. Set return data type to blob

  2. Create unique object URI from blob and set as image source

  3. Release the object URI once image is loaded

Note that we are transferring an image asset in its native format, without relying on base64 encoding, and adding an image element to the page without relying on data URIs. There is no network transmission overhead or encoding overhead when handling the received binary data in JavaScript! XHR API allows us to script efficient, dynamic applications, regardless of the data type, right from JavaScript.

The blob interface is part of the HTML5 File API and acts as an opaque reference for an arbitrary chunk of data (binary or text). By itself, a blob reference has limited functionality: you can query its size, MIME type, and split it into smaller blobs. However, its real role is to serve as an efficient interchange mechanism between various JavaScript APIs.

§Uploading Data with XHR

Uploading data via XHR is just as simple and efficient for all data types. In fact, the code is effectively the same, with the only difference that we also pass in a data object when calling send() on the XHR request. The rest is handled by the browser:

var xhr = new XMLHttpRequest();
xhr.open('POST','/upload');
xhr.onload = function() { ... };
xhr.send("text string"); 

var formData = new FormData(); 
formData.append('id', 123456);
formData.append('topic', 'performance');

var xhr = new XMLHttpRequest();
xhr.open('POST', '/upload');
xhr.onload = function() { ... };
xhr.send(formData); 

var xhr = new XMLHttpRequest();
xhr.open('POST', '/upload');
xhr.onload = function() { ... };
var uInt8Array = new Uint8Array([1, 2, 3]); 
xhr.send(uInt8Array.buffer); 
  1. Upload a simple text string to the server

  2. Create a dynamic form via FormData API

  3. Upload multipart/form-data object to the server

  4. Create a typed array (ArrayBuffer) of unsigned, 8-bit integers

  5. Upload chunk of bytes to the server

The XHR send() method accepts one of DOMString, Document, FormData, Blob, File, or ArrayBuffer objects, automatically performs the appropriate encoding, sets the appropriate HTTP content-type, and dispatches the request. Need to send a binary blob or upload a file provided by the user? Simple: grab a reference to the object and pass it to XHR. In fact, with a little extra work, we can also split a large file into smaller chunks:

var blob = ...; 

const BYTES_PER_CHUNK = 1024 * 1024; 
const SIZE = blob.size;

var start = 0;
var end = BYTES_PER_CHUNK;

while(start < SIZE) { 
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/upload');
  xhr.onload = function() { ... };

  xhr.setRequestHeader('Content-Range', start+'-'+end+'/'+SIZE); 
  xhr.send(blob.slice(start, end)); 

  start = end;
  end = start + BYTES_PER_CHUNK;
}
  1. An arbitrary blob of data (binary or text)

  2. Set chunk size to 1 MB

  3. Iterate over provided data in 1MB increments

  4. Advertise the uploaded range of data (start-end/total)

  5. Upload 1 MB slice of data via XHR

XHR does not support request streaming, which means that we must provide the full payload when calling send(). However, this example illustrates a simple application workaround: the file is split and uploaded in chunks via multiple XHR requests. This implementation pattern is by no means a replacement for a true request streaming API, but it is nonetheless a viable solution for some applications.

Slicing large file uploads is a good technique to provide more robust API where connectivity is unstable or intermittent—e.g., if a chunk fails due to dropped connection, the application can retry, or resume the upload later instead of restarting the full transfer from the start.

§Monitoring Download and Upload Progress

Network connectivity can be intermittent, and latency and bandwidth are highly variable. So how do we know if an XHR request has succeeded, timed out, or failed? The XHR object provides a convenient API for listening to progress events (Table 15-1), which indicate the current status of the request.

Event type Description Times fired
loadstart Transfer has begun once
progress Transfer is in progress zero or more
error Transfer has failed zero or once
abort Transfer is terminated zero or once
load Transfer is successful zero or once
loadend Transfer has finished once
Table 15-1. XHR progress events

Each XHR transfer begins with a loadstart and finishes with a loadend event, and in between, one or more additional events are fired to indicate the status of the transfer. Hence, to monitor progress the application can register a set of JavaScript event listeners on the XHR object:

var xhr = new XMLHttpRequest();
xhr.open('GET','/resource');
xhr.timeout = 5000; 

xhr.addEventListener('load', function() { ... }); 
xhr.addEventListener('error', function() { ... }); 

var onProgressHandler = function(event) {
  if(event.lengthComputable) {
    var progress = (event.loaded / event.total) * 100; 
    ...
  }
}

xhr.upload.addEventListener('progress', onProgressHandler); 
xhr.addEventListener('progress', onProgressHandler); 
xhr.send();
  1. Set request timeout to 5,000 ms (default: no timeout)

  2. Register callback for successful request

  3. Register callback for failed request

  4. Compute transfer progress

  5. Register callback for upload progress events

  6. Register callback for download progress events

Either the load or error event will fire once to indicate the final status of the XHR transfer, whereas the progress event can fire any number of times and provides a convenient API for tracking transfer status: we can compare the loaded attribute against total to estimate the amount of transferred data.

To estimate the amount of transferred data, the server must provide a content length in its response: we can’t estimate progress of chunked transfers, since by definition, the total size of the response is unknown.

Also, XHR requests do not have a default timeout, which means that a request can be "in progress" indefinitely. As a best practice, always set a meaningful timeout for your application and handle the error!

§Streaming Data with XHR

In some cases an application may need or want to process a stream of data incrementally: upload the data to the server as it becomes available on the client, or process the downloaded data as it arrives from the server. Unfortunately, while this is an important use case, today there is no simple, efficient, cross-browser API for XHR streaming:

Streaming has never been an official use case within the official XHR specification. As a result, short of manually splitting an upload into smaller, individual XHRs, there is no API for streaming data from client to server. Similarly, while the XHR2 specification does provide some ability to read a partial response from the server, the implementation is inefficient and very limited. That’s the bad news.

The good news is that there is hope on the horizon! Lack of streaming support as a first-class use case for XHR is a well-recognized limitation, and there is work in progress to address the problem:

Web applications should have the ability to acquire and manipulate data in a wide variety of forms, including as a sequence of data made available over time. This specification defines the basic representation for Streams, errors raised by Streams, and programmatic ways to read and create Streams.

W3C Streams API

The combination of XHR and Streams API will enable efficient XHR streaming in the browser. However, the Streams API is still under active discussion, and is not yet available in any browser. So, with that, we’re stuck, right? Well, not quite. As we noted earlier, streaming uploads with XHR is not an option, but we do have limited support for streaming downloads with XHR:

var xhr = new XMLHttpRequest();
xhr.open('GET', '/stream');
xhr.seenBytes = 0;

xhr.onreadystatechange = function() { 
  if(xhr.readyState > 2) {
    var newData = xhr.responseText.substr(xhr.seenBytes); 
    // process newData

    xhr.seenBytes = xhr.responseText.length; 
  }
};

xhr.send();
  1. Subscribe to state and progress notifications

  2. Extract new data from partial response

  3. Update processed byte offset

This example will work in most modern browsers. However, performance is not great, and there are a large number of implementation caveats and gotchas:

In short, currently, XHR streaming is neither efficient nor convenient, and to make matters worse, the lack of a common specification also means that the implementations differ from browser to browser. As a result, at least until the Streams API is available, XHR is not a good fit for streaming.

No need to despair! While XHR may not meet the criteria, we do have other transports that are optimized for the streaming use case: Server-Sent Events offers a convenient API for streaming text-based data from server to client, and WebSocket offers efficient, bidirectional streaming for both binary and text-based data.

§Real-Time Notifications and Delivery

XHR enables a simple and efficient way to synchronize client updates with the server: whenever necessary, an XHR request is dispatched by the client to update the appropriate data on the server. However, the same problem, but in reverse, is much harder. If data is updated on the server, how does the server notify the client?

HTTP does not provide any way for the server to initiate a new connection to the client. As a result, to receive real-time notifications, the client must either poll the server for updates or leverage a streaming transport to allow the server to push new notifications as they become available. Unfortunately, as we saw in the preceding section, support for XHR streaming is limited, which leaves us with XHR polling.

"Real-time" has different meanings for different applications: some applications demand submillisecond overhead, while others may be just fine with delays measured in minutes. To determine the optimal transport, first define clear latency and overhead targets for your application!

§Polling with XHR

One of the simplest strategies to retrieve updates from the server is to have the client do a periodic check: the client can initiate a background XHR request on a periodic interval (poll the server) to check for updates. If new data is available on the server, then it is returned in the response, and otherwise the response is empty.

Polling is simple to implement but frequently is also very inefficient. The choice of the polling interval is critical: long polling intervals translate to delayed delivery of updates, whereas short intervals result in unnecessary traffic and high overhead both for the client and the server. Let’s consider the simplest possible example:

function checkUpdates(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = function() { ... }; 
  xhr.send();
}

setInterval(function() { checkUpdates('/updates') }, 60000); 
  1. Process received updates from server

  2. Issue an XHR request every 60 seconds

  • Each XHR request is a standalone HTTP request, and on average, HTTP incurs ~800 bytes of overhead (without HTTP cookies) for request/response headers.

  • Periodic checks work well when data arrives at predictable intervals. Unfortunately, predictable arrival rates are an exception, not the norm. Consequently, periodic polling will introduce additional latency delays between the message being available on the server and its delivery to the client.

  • Unless thought through carefully, polling often becomes a costly performance anti-pattern on wireless networks; see Eliminate Periodic and Inefficient Data Transfers. Waking up the radio consumes a lot of battery power!

What is the optimal polling interval? There isn’t one. The frequency depends on the requirements of the application, and there is an inherent trade-off between efficiency and message latency. As a result, polling is a good fit for applications where polling intervals are long, new events are arriving at a predictable rate, and the transferred payloads are large. This combination offsets the extra HTTP overhead and minimizes message delivery delays.

§Long-Polling with XHR

The challenge with periodic polling is that there is potential for many unnecessary and empty checks. With that in mind, what if we made a slight modification (Figure 15-1) to the polling workflow: instead of returning an empty response when no updates are available, could we keep the connection idle until an update is available?

Figure 15-1. Polling (left) vs. long-polling (right) latency
Figure 15-1. Polling (left) vs. long-polling (right) latency

The technique of leveraging a long-held HTTP request ("a hanging GET") to allow the server to push data to the browser is commonly known as "Comet." However, you may also encounter it under other names, such as "reverse AJAX," "AJAX push," and "HTTP push."

By holding the connection open until an update is available (long-polling), data can be sent immediately to the client once it becomes available on the server. As a result, long-polling offers the best-case scenario for message latency, and it also eliminates empty checks, which reduces the number of XHR requests and the overall overhead of polling. Once an update is delivered, the long-poll request is finished and the client can issue another long-poll request and wait for the next available message:

function checkUpdates(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = function() { 
    ...
    checkUpdates('/updates'); 
  };
  xhr.send();
}

checkUpdates('/updates'); 
  1. Process received updates and open new long-poll XHR

  2. Issue long-poll request for next update (and loop forever)

  3. Issue initial long-poll XHR request

With that, is long-polling always a better choice than periodic polling? Unless the message arrival rate is known and constant, long-polling will always deliver better message latency. If that’s the primary criteria, then long-polling is the winner.

On the other hand, the overhead discussion requires a more nuanced view. First, note that each delivered message still incurs the same HTTP overhead; each new message is a standalone HTTP request. However, if the message arrival rate is high, then long-polling will issue more XHR requests than periodic polling!

Long-polling dynamically adapts to the message arrival rate by minimizing message latency, which is a behavior you may or may not want. If there is some tolerance for message latency, then polling may be a more efficient transport—e.g., if the update rate is high, then polling provides a simple "message aggregation" mechanism, which can reduce the number of requests and improve battery life on mobile handsets.

In practice, not all messages have the same priority or latency requirements. As a result, you may want to consider a mix of strategies: aggregate low-priority updates on the server, and trigger immediate delivery for high priority updates; see Nagle and Efficient Server Push.

§XHR Use Cases and Performance

XMLHttpRequest is what enabled us to make the leap from building pages to building interactive web applications in the browser. First, it enabled asynchronous communication within the browser, but just as importantly, it also made the process simple. Dispatching and controlling a scripted HTTP request takes just a few lines of JavaScript code, and the browser handles all the rest:

As such, XHR is a versatile and a high-performance transport for any transfers that follow the HTTP request-response cycle. Need to fetch a resource that requires authentication, should be compressed while in transfer, and should be cached for future lookups? The browser takes care of all of this and more, allowing us to focus on the application logic!

However, XHR also has its limitations. As we saw, streaming has never been an official use case in the XHR standard, and the support is limited: streaming with XHR is neither efficient nor convenient. Different browsers have different behaviors, and efficient binary streaming is impossible. In short, XHR is not a good fit for streaming.

Similarly, there is no one best strategy for delivering real-time updates with XHR. Periodic polling incurs high overhead and message latency delays. Long-polling delivers low latency but still has the same per-message overhead; each message is its own HTTP request. To have both low latency and low overhead, we need XHR streaming!

As a result, while XHR is a popular mechanism for "real-time" delivery, it may not be the best-performing transport for the job. Modern browsers support both simpler and more efficient options, such as Server-Sent Events and WebSocket. Hence, unless you have a specific reason why XHR polling is required, use them.