Thursday 7 August 2008

Web Browser Prefetching

A succinct description can be found in Mozilla's FAQ: "Link prefetching is a browser mechanism, which utilizes browser idle time to download or prefetch documents that the user might visit in the near future. A web page provides a set of prefetching hints to the browser, and after the browser is finished loading the page, it begins silently prefetching specified documents and stores them in its cache. When the user visits one of the prefetched documents, it can be served up quickly out of the browser's cache." With this in mind, there could be scenarios where URLs that the user never chose to visit are identified in internet history records. For this to happen there are a couple of fundamental requirements:
  1. A web page contains a prefetch link
  2. The web browser is set to act upon a prefetch link
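To check the first requirement against a saved copy of a page, something like the following could be used. It is only a minimal sketch: the function and class names are my own, and it assumes the hint is an HTML link tag with a rel value of prefetch or next (the two relation types described in Mozilla's FAQ). It does not look at HTTP Link: headers or meta tags.

```python
from html.parser import HTMLParser


class PrefetchLinkFinder(HTMLParser):
    """Collect href values of <link> tags whose rel contains 'prefetch' or 'next'."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. rel="next prefetch"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if ("prefetch" in rel_tokens or "next" in rel_tokens) and attr.get("href"):
            self.hints.append(attr["href"])


def find_prefetch_hints(html):
    """Return the list of URLs a page asks the browser to prefetch (hypothetical helper)."""
    finder = PrefetchLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hints


if __name__ == "__main__":
    page = '<head><link rel="prefetch" href="http://www.example.com/"></head>'
    print(find_prefetch_hints(page))
```

An empty list only means no link-tag hint was found in that copy of the page; it says nothing about whether the browser was set to act on hints.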
For a quick test it's possible to use gemal's psyched site, but for a more real-world example I used Google and Firefox. Since March 2005 Google has included a prefetch hint for the first result of a Google search, which ruffled a few webmasters' feathers over the fear of false hits skewing their stats (prefetch requests from Firefox clients can be identified by the X-moz: prefetch header). Interestingly, none of the links to Google pages explaining their prefetching work anymore. Firefox has prefetching enabled by default and, as far as I know, it can only be turned off by going to about:config and setting network.prefetch-next to false. I've not yet looked at IE or any of the additional plugins and tools that could also make use of prefetching.
I used the neat Firefox add-on HTTPFox to view the activity relating to the test.
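On the server side, or when looking at request headers captured by a tool, the X-moz header is the only marker mentioned here for separating prefetches from real visits. A minimal sketch of that check follows; the function name and the plain dict of headers are assumptions for illustration, not part of any particular web server or log format.

```python
def is_prefetch_request(headers):
    """Return True if the request headers carry the 'X-moz: prefetch' marker.

    `headers` is assumed to be a plain dict of header name -> value.
    Header names are compared case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "x-moz" and value.strip().lower() == "prefetch":
            return True
    return False


if __name__ == "__main__":
    request_headers = {"Host": "www.example.com", "X-moz": "prefetch"}
    print(is_prefetch_request(request_headers))
```

This only helps where the request headers were recorded; it is not something that can be applied to a history record after the fact.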

The Test

I tried a few Google searches to see if the browser (Firefox 3) would then prefetch the first result, but it wasn't working consistently.

Looking at the source of the Google results page showed that a prefetch link wasn't always inserted. After a bit more digging, it appears that Google only inserts a prefetch link when the first result is a simple host name (e.g. www.microsoft.com).

I don't know when or if this has always been the case.
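To make "simple host name" concrete, here is a rough sketch of the kind of test that would match what I observed. This is my own guess at the rule, not Google's documented logic, and the function name is hypothetical: a URL counts as simple if it has a host and no path beyond "/", no query string and no fragment.

```python
from urllib.parse import urlsplit


def is_simple_host_url(url):
    """Return True if `url` is just a host name, e.g. www.microsoft.com (assumed rule)."""
    # Add a scheme when missing so urlsplit treats the text as a host, not a path
    parts = urlsplit(url if "://" in url else "http://" + url)
    return (
        bool(parts.hostname)
        and parts.path in ("", "/")
        and not parts.query
        and not parts.fragment
    )


if __name__ == "__main__":
    for candidate in ("www.microsoft.com",
                      "http://www.microsoft.com/en/us/default.aspx"):
        print(candidate, is_simple_host_url(candidate))
```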

A search for microsoft (funnily enough) gives Microsoft's website as the first hit. Shortly after the Google results page had loaded, a GET request appeared for www.microsoft.com, which was then redirected to http://www.microsoft.com/en/us/default.aspx. A few times I saw an aborted request, shown in HTTPFox as text/html (NS_BINDING_ABORTED). I suspect that this could be the result of Firefox discarding the prefetch hint.

Just to confirm that this is recorded in internet history records, I did an internet history search in EnCase. It showed the Google search and the subsequent caching of the Microsoft page, with no obvious sign that the Microsoft record was the result of the prefetch rather than of the user selecting the link.