Google Ajax Crawler


Rack middleware adhering to the Google Ajax Crawling Scheme. It uses a headless browser to render JS-heavy pages and serves a DOM snapshot of the rendered state to a requesting search engine.
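To make the idea concrete, here is a minimal sketch of how a Rack middleware could decide between serving a snapshot and passing the request through. It is not the gem's implementation: the class name `SnapshotMiddlewareSketch` and the snapshot block are hypothetical, and the real crawler renders the page in a headless browser instead of returning a fixed string.

```ruby
# Hypothetical sketch, not the gem's code: serve a snapshot only when the
# request carries the _escaped_fragment_ query parameter used by the scheme.
class SnapshotMiddlewareSketch
  def initialize(app, &snapshot)
    @app = app
    @snapshot = snapshot # stand-in for the headless browser rendering step
  end

  def call(env)
    query = env['QUERY_STRING'].to_s
    crawler_request = query.split('&').any? { |pair| pair.start_with?('_escaped_fragment_=') }

    if crawler_request
      # Return the rendered markup as text/html
      [200, { 'Content-Type' => 'text/html' }, [@snapshot.call(env)]]
    else
      # Ordinary visitors get the normal application response
      @app.call(env)
    end
  end
end

app = lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ['app']] }
middleware = SnapshotMiddlewareSketch.new(app) { |env| '<html>snapshot</html>' }

crawler_response = middleware.call('QUERY_STRING' => '_escaped_fragment_=test')
visitor_response = middleware.call('QUERY_STRING' => '')
```

Requests without the parameter reach the wrapped app unchanged, so regular visitors are unaffected.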

Details of the scheme can be found at:



gem install google_ajax_crawler

In your

require 'google_ajax_crawler'

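use GoogleAjaxCrawler::Crawler do |config|
  # The page counts as loaded once the "loading" element is gone.
  # Assumes the driver exposes a Capybara session as driver.page.
  config.page_loaded_test = lambda { |driver| driver.page.evaluate_script('document.getElementById("loading") == null') }
end

# Rack response bodies must respond to #each, so the string is wrapped in an array
app = lambda { |env| [200, {'Content-Type' => 'text/plain'}, ['b']] }
run app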


In the examples folder, each driver has a rackup file, which can be launched:

rackup examples/[driver_name].ru

Then open a browser to http://localhost:9292/#!test and view source. This is how a search engine will see your page. NOTE: don't look at the markup through a web inspector, as it will most likely display DOM elements rendered on the fly by JS.

Change the URL to http://localhost:9292/?_escaped_fragment_=test and view source again to see how the DOM state has been captured.
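The relationship between the two URLs above can be sketched in a few lines of Ruby. The helper name `escaped_fragment_url` is hypothetical and not part of the gem. It uses `ERB::Util.url_encode`, which escapes more characters than the scheme strictly requires; that is assumed to be harmless here because the value is unescaped again on the receiving side.

```ruby
require 'erb'

# Hypothetical helper: maps a hashbang URL to the _escaped_fragment_ form
# that a crawler would request instead.
def escaped_fragment_url(url)
  base, fragment = url.split('#!', 2)
  return url if fragment.nil? # no hashbang, nothing to rewrite

  # Append to an existing query string if there is one
  separator = base.include?('?') ? '&' : '?'
  "#{base}#{separator}_escaped_fragment_=#{ERB::Util.url_encode(fragment)}"
end

puts escaped_fragment_url('http://localhost:9292/#!test')
```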

Configuration Options


page_loaded_test tells the crawler when your page has finished loading / rendering. Deciding that a page has completed rendering can depend on a number of application-specific factors (e.g. all ajax requests have responses, certain content has been displayed, or no loaders / spinners are visible on the page). The page loaded test lets you specify when the crawler should treat the page as finished and return a snapshot of the rendered DOM at that time.

The current crawler driver is passed to the lambda to allow querying of the current page's dom state.

A good pattern is to test your page state in a JS function that returns a boolean and is accessible from the window context, e.g.

use GoogleAjaxCrawler::Crawler do |config|
  # Assumes the driver exposes a Capybara session as driver.page
  config.page_loaded_test = lambda { |driver| driver.page.evaluate_script('myApp.isPageLoaded()') }
end


The max time the crawler should wait before returning a response


The configured google ajax crawler driver used to query the current page state. Presently there is only one driver, CapybaraWebkit (now taking pull requests!).


How often (in seconds) to test the page state with the configured page_loaded_test
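How a maximum wait time and a polling interval interact with the page loaded test can be sketched as a simple loop. This is an illustration only: the method name `wait_until_loaded` and its keyword arguments are hypothetical, and the gem's own loop may be structured differently.

```ruby
# Hypothetical sketch: call the test repeatedly until it returns true
# or the maximum wait time has elapsed.
def wait_until_loaded(timeout:, poll_interval:, &page_loaded_test)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    return true if page_loaded_test.call # page reports it is ready
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep poll_interval # wait before testing the page state again
  end
end

# Simulate a page that reports "loaded" on the third check
attempts = 0
loaded = wait_until_loaded(timeout: 1, poll_interval: 0.01) do
  attempts += 1
  attempts >= 3
end
```

A shorter interval notices a finished page sooner at the cost of more script evaluations; the timeout bounds how long a crawler request can be held open when the test never passes.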


Which response headers should be returned with the DOM snapshot. The default headers specify the content-type text/html.


All free - use, modify, fork to your heart's content. See LICENSE.txt for further details.