These are some of the ways that one can use boomerang. This will help us make sure the library actually supports all possible use cases.
We need the ability to measure the time the user thinks it took to load a page. This is typically the time between the user clicking a link (or entering a URL into the browser) and the page becoming usable.
While it’s easy enough to note the time when a user clicks a link on a page we control, it’s not easy to tell the exact moment when a user types a URL into the browser and hits enter. We’ll therefore concentrate only on link clicks from pages that we control.
Breaking this into four separate use cases:
User clicks a link on a page we control and the page is usable when onload fires
User types in a URL, or clicks a bookmark or a link on a page we don’t control; our page is usable when onload fires, and the browser supports the Navigation Timing API (IE9+, Chrome, Firefox 7+).
See HOWTO #1a
User clicks a link on a page we control and the page is usable at some developer-determined point
User types in a URL, or clicks a bookmark or a link on a page we don’t control; our page is usable at some developer-determined point, and the browser supports the Navigation Timing API (IE9+, Chrome, Firefox 7+). Browsers without this API cannot supply a start time for this case.
See HOWTO #1b
In all these cases the browser may not fire an event at the exact moments when page load was initiated or completed, so the library will need to expose methods/events that the web developer can invoke when needed.
See HOWTO #2.
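The mechanics of the first use case can be sketched as follows. This is an illustration, not boomerang's actual implementation: it assumes the page we control records a timestamp at link click (e.g. in a cookie), and the next page subtracts it from the time the page became usable. The helper name `readStartCookie` is hypothetical.

```javascript
// Compute perceived load time from the timestamp recorded when the user
// clicked a link on the previous page, and the timestamp at which this
// page became usable (onload, or a developer-determined point).
function perceivedLoadTime(clickTs, usableTs) {
  if (typeof clickTs !== "number" || usableTs < clickTs) {
    // No trustworthy start time (direct navigation, clock skew, etc.);
    // fall back to the Navigation Timing API where available.
    return null;
  }
  return usableTs - clickTs;
}
```

On a page we control, a click handler would store `Date.now()` in a cookie; on the next page one would call something like `perceivedLoadTime(readStartCookie(), Date.now())` when the page is usable.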
Since users browse the web over different types of internet connections, it’s not always possible to aggregate page load times across users and get a single number that’s statistically representative of all of them. Knowing the user’s bandwidth, however, allows us to aggregate data from similar users and treat those numbers as representative for users with that type of network connection.
See HOWTO #3.
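A minimal sketch of the arithmetic involved, assuming bandwidth is estimated by timing downloads of resources of known size (the sampling and download code is omitted; only the reduction over `(bytes, ms)` samples is shown):

```javascript
// Estimate bandwidth in kbps from timed downloads of known-size resources.
// Each sample is { bytes: <downloaded size>, ms: <download duration> }.
function bandwidthKbps(samples) {
  var rates = samples.map(function (s) {
    return (s.bytes * 8) / s.ms; // bits per millisecond == kilobits per second
  }).sort(function (a, b) { return a - b; });
  var mid = Math.floor(rates.length / 2);
  // Use the median: more robust than the mean against TCP slow-start
  // and one-off congestion outliers.
  return rates.length % 2 ? rates[mid] : (rates[mid - 1] + rates[mid]) / 2;
}
```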
Many sites are composed of separate components loaded from different back-end services. For example, on my home page, I have widgets from dopplr, upcoming, twitter, delicious and flickr. While the overall page load time matters most, since that is what users perceive as my page’s performance, it can also be useful and instructive to measure the load time of each component separately, as part of the measurement for the entire page. This lets us break a page’s load time into the load time of each of its components, and carry out statistical analysis of each component in the context of its containing page.
To do this, the library needs to expose additional timers that the page developer can set. These timers should be beaconed back along with the overall page load time.
See HOWTO #4.
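A toy version of such a timer set is sketched below. boomerang exposes calls along these lines (see HOWTO #4 for the actual API); the class here exists only to make the start/end bookkeeping concrete, and the clock is injectable so the sketch is testable.

```javascript
// A named set of timers whose results would ride along on the page beacon.
function TimerSet(now) {
  this.now = now || Date.now; // injectable clock, for testing
  this.start = {};            // name -> start timestamp
  this.timers = {};           // name -> measured duration in ms
}
TimerSet.prototype.startTimer = function (name) {
  this.start[name] = this.now();
};
TimerSet.prototype.endTimer = function (name) {
  if (name in this.start) {
    this.timers[name] = this.now() - this.start[name];
  }
};
```

A page developer would call `startTimer("t_flickr")` just before kicking off the flickr widget and `endTimer("t_flickr")` in its completion callback; all named timers are then beaconed together with the overall page load time.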
Network latency is one of the largest contributors to page download time. HTTP latency is typically higher than simple ping time because establishing a TCP connection and making an HTTP request takes a few round trips. There’s also overhead in HTTP request and response headers that doesn’t contribute to actual page content.
Measuring a user’s HTTP latency provides a good indication of the impact that parallelising component downloads can have on overall page load time.
See HOWTO #3.
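A back-of-the-envelope model of the point made above, that measured latency indicates how much parallelising downloads can help. All numbers are illustrative; the model assumes downloads happen in serialized "waves" of up to a given parallelism, each wave paying one latency plus one transfer time:

```javascript
// Rough total time to fetch nComponents resources, where up to maxParallel
// downloads proceed at once and each wave costs latency + transfer time.
function totalDownloadMs(nComponents, latencyMs, perComponentTransferMs, maxParallel) {
  var waves = Math.ceil(nComponents / maxParallel);
  return waves * (latencyMs + perComponentTransferMs);
}
```

With 6 components at 100ms latency and 50ms transfer each, two-way parallelism costs roughly 450ms while six-way parallelism costs roughly 150ms, which is why knowing the user's latency makes the benefit quantifiable.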
When doing performance analysis, it is often useful to run an A/B test that compares the performance of two or more different page designs. Since these pages might all have the same URL, it becomes necessary to add some additional information to the data beaconed back that identifies the page from a series of tests.
The library should provide a method for the page developer to tag each page with additional information.
See HOWTO #5.
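One common way to do the tagging is to assign each user a stable test bucket and attach it to the beacon. The bucketing function below is illustrative (the hash is a simple string hash, and the tagging call is only the kind of thing the library should expose, not its confirmed API; see HOWTO #5 for the real method):

```javascript
// Pick a stable A/B bucket from a user identifier, so repeat views by the
// same user land in the same variant.
function abBucket(userId, variants) {
  var h = 0;
  for (var i = 0; i < userId.length; i++) {
    h = (h * 31 + userId.charCodeAt(i)) >>> 0; // simple 32-bit string hash
  }
  return variants[h % variants.length];
}
```

The page would then tag its beacon with the chosen bucket, e.g. an `ab_test=B` name/value pair, so that beacons from pages sharing a URL can still be separated by design variant during analysis.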
DNS latency tells us how long a DNS lookup takes. This is affected by various factors, including your web server’s DNS configuration and TTL, and how an end user’s ISP’s DNS servers are configured. Measuring this latency indicates how many domains we can safely look up on a single page: there’s a trade-off between the cost of additional DNS lookups and the download parallelism that multiple domains bring us. HTTP latency and DNS latency together can tell us where this trade-off point lies for our users.
See HOWTO #8.
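One way to isolate DNS time (an assumption about the measurement technique, not necessarily what boomerang does; see HOWTO #8) is to fetch a tiny resource from a freshly generated hostname under a wildcard DNS domain, forcing a cold lookup, then fetch it again now that the name is cached, and attribute the difference to DNS:

```javascript
// Attribute the difference between a cold-lookup fetch and a cached-name
// fetch to DNS resolution. Timings are supplied by the caller.
function dnsLatencyMs(coldFetchMs, warmFetchMs) {
  var d = coldFetchMs - warmFetchMs;
  return d > 0 ? d : 0; // clamp: network noise can make the warm fetch slower
}
```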
Client-side sampling won’t be as accurate as server-side sampling, since we don’t share state across clients; however, we can still get a reasonably random sample.
See TODO #1.
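The idea can be sketched in a couple of lines: each client independently decides whether to beacon, so with sample rate r roughly a fraction r of all page views report, with no cross-client coordination. The random source is injectable here purely so the sketch is testable.

```javascript
// Decide, independently on each client, whether this page view should beacon.
function shouldSample(rate, rand) {
  return (rand || Math.random)() < rate;
}
```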
If you have a URL entry point, chances are that someone will abuse it, either intentionally or accidentally.
One possibility is that some script kiddie notices the beacon URL on your site and starts hitting it with a whole bunch of different parameter combinations to try and exploit your server, or if nothing else, to try and DoS you.
boomerang should be able to protect against these problems.
See HOWTO #7.
Modern browsers include support for the Navigation Timing API that contains a lot of performance timing information related to page loading. The NavigationTiming plugin for boomerang collects this information and adds it to the beacon.
See HOWTO #9.
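The values this plugin reports are deltas between the millisecond timestamps in `window.performance.timing`. A sketch of the arithmetic, written as a pure function over a timing-like object so it's clear which fields are subtracted (the attribute names are from the W3C Navigation Timing specification; the set of deltas chosen here is illustrative):

```javascript
// Derive common page-load metrics from a Navigation-Timing-like object.
function navTimingDeltas(t) {
  return {
    dns:      t.domainLookupEnd - t.domainLookupStart,          // DNS lookup
    tcp:      t.connectEnd - t.connectStart,                    // TCP connect
    ttfb:     t.responseStart - t.requestStart,                 // time to first byte
    domReady: t.domContentLoadedEventStart - t.navigationStart, // DOM ready
    loadTime: t.loadEventStart - t.navigationStart              // full page load
  };
}
```

In a browser one would pass `window.performance.timing` directly; the resulting deltas are what get added to the beacon.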
The latest code and docs are available at github.com/lognormal/boomerang