Ruby Readability

Ruby Readability is a tool for extracting the primary readable content of a webpage. It is a Ruby port of arc90's readability project.


Command line:

(sudo) gem install ruby-readability


Bundler:

gem "ruby-readability", :require => 'readability'


require 'rubygems'
require 'readability'
require 'open-uri'

source = open('').read
puts Readability::Document.new(source).content


You may provide options to Readability::Document.new, including:

  • :tags: the base whitelist of tags to sanitize, defaults to %w[div p];
  • :remove_empty_nodes: remove <p> tags that have no text content; also removes <p> tags that contain only images;
  • :attributes: whitelist of allowed attributes;
  • :debug: provide debugging output, defaults false;
  • :encoding: if the page is of a known encoding, you can specify it; if left unspecified, the encoding will be guessed (only in Ruby 1.9.x). If you wish to disable guessing, supply :do_not_guess_encoding => true;
  • :html_headers: in Ruby 1.9.x these will be passed to the guess_html_encoding gem to aid with guessing the HTML encoding;
  • :ignore_image_format: for use with #images. For example: :ignore_image_format => ["gif", "png"];
  • :min_image_height: set a minimum image height for #images;
  • :min_image_width: set a minimum image width for #images.
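
As a rough sketch of how several of the options above might be passed together: the option names come from the list, while the values, the sample HTML, and the constant names READABILITY_OPTIONS and SAMPLE_HTML are illustrative assumptions. The LoadError guard is there only so the snippet still runs when the gem is missing.

```ruby
# Option names come from the list above; the values are illustrative assumptions.
READABILITY_OPTIONS = {
  :tags               => %w[div p],   # the documented default whitelist
  :attributes         => %w[href],    # keep only href attributes
  :remove_empty_nodes => true,        # drop <p> tags with no text content
  :encoding           => 'UTF-8',     # known encoding, so no guessing is needed
  :debug              => false
}

# A small inline page, so the sketch does not depend on the network.
SAMPLE_HTML = <<-HTML
<html><body>
  <div>
    <p>Readability looks for the block of markup that holds the main text
    of a page, and it scores longer paragraphs with commas, like this one,
    more highly than short navigation fragments.</p>
  </div>
</body></html>
HTML

begin
  require 'readability'
  puts Readability::Document.new(SAMPLE_HTML, READABILITY_OPTIONS).content
rescue LoadError
  warn 'ruby-readability is not installed; run: gem install ruby-readability'
end
```

Very short pages may score too low to yield much content, so a real article is a better test input than this sample.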

Command Line Tool

Readability comes with a command-line tool for experimentation in bin/readability.

Usage: readability [options] URL
    -d, --debug                      Show debug output
    -i, --images                     Keep images and links
    -h, --help                       Show this message


You can get a list of images in the content area with Document#images. This feature requires that the fastimage gem be installed.

rbody = Readability::Document.new(body, :tags => %w[div p img a], :attributes => %w[src href], :remove_empty_nodes => false)
rbody.images
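
The image-related options from the options list can be combined with Document#images. The following is a sketch under assumptions: IMAGE_OPTIONS and content_images are hypothetical names, and the size thresholds are made-up values rather than recommended settings.

```ruby
# Option names are from this README; the formats and thresholds are illustrative.
IMAGE_OPTIONS = {
  :tags                => %w[div p img a],  # keep <img> and <a> in the output
  :attributes          => %w[src href],     # keep the attributes they need
  :remove_empty_nodes  => false,            # do not drop <p> tags that only hold images
  :ignore_image_format => %w[gif png],      # skip these formats
  :min_image_width     => 200,              # assumed value
  :min_image_height    => 100               # assumed value
}

# Hypothetical helper: returns what Document#images reports for +body+,
# or nil when the required gems are not installed.
def content_images(body, options = IMAGE_OPTIONS)
  require 'readability'
  require 'fastimage'
  Readability::Document.new(body, options).images
rescue LoadError
  warn 'Install ruby-readability and fastimage to use Document#images'
  nil
end
```

Checking image dimensions may require fetching the images, so expect this call to be slower than plain content extraction.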

Related Projects

  • newspaper is an advanced news extraction, article extraction, and content curation library for Python.

Potential Issues

If you're on a Mac and are getting segmentation faults, see the discussion at and consider updating your version of libxml2. Version 2.7.8 of libxml2, installed with brew, worked for me:

gem install nokogiri -- --with-xml2-include=/usr/local/Cellar/libxml2/2.7.8/include/libxml2 --with-xml2-lib=/usr/local/Cellar/libxml2/2.7.8/lib --with-xslt-dir=/usr/local/Cellar/libxslt/1.1.26

Or if you're using bundler and Rails 3, you can run this command to make bundler always globally build nokogiri this way:

bundle config build.nokogiri -- --with-xml2-include=/usr/local/Cellar/libxml2/2.7.8/include/libxml2 --with-xml2-lib=/usr/local/Cellar/libxml2/2.7.8/lib --with-xslt-dir=/usr/local/Cellar/libxslt/1.1.26


This code is under the Apache License 2.0. See

Ruby port by cantino, starrhorne, libc, and iterationlabs. Special thanks to fizx and marcosinger.
