xp

A Ruby gem that adds methods to the String class to make scraping HTML/XML documents easy.

Installation

$ gem install xp

Usage

$ curl -s 'https://news.ycombinator.com' | xp --text '//td[@class="title"]/a'

Or, using a CSS selector:

$ curl -s 'https://news.ycombinator.com' | xp --text 'td.title > a'
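
To see what the XPath form selects, here is a small sketch that runs the same expression with REXML (bundled with Ruby) instead of xp. The markup and URLs are made up for illustration, and xp itself may resolve selectors differently. Note that the attribute test needs the `@` prefix: `@class` refers to the attribute, while a bare `class` would look for a child element named `class`.

```ruby
require 'rexml/document'

# Hypothetical markup standing in for a fetched page (not real Hacker News HTML).
markup = <<~XML
  <table>
    <tr><td class="title"><a href="https://example.com/a">First story</a></td></tr>
    <tr><td class="subtext"><a href="https://example.com/u">someuser</a></td></tr>
    <tr><td class="title"><a href="https://example.com/b">Second story</a></td></tr>
  </table>
XML

doc = REXML::Document.new(markup)

# Select every <a> that is a direct child of a <td> whose class attribute is "title".
titles = REXML::XPath.match(doc, '//td[@class="title"]/a').map(&:text)

puts titles
```

Only the links inside the cells with class "title" are returned; the link in the "subtext" cell is skipped.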

The gem can also be used in Ruby scripts by requiring it: require 'xp'.

Example

The following one-liner downloads all Dribbble shots from the Dribbble home page:

'https://dribbble.com/'.css('.dribbble-link img').xpath('//img/@src').map { |link| link.text.download }

API

xp adds the following methods to the String class:

| Method | Return type | Remarks |
| --- | --- | --- |
| to_nokogiri | Nokogiri::XML::Document | Converts a URL or a page source to a Nokogiri object. |
| css(selector) | String | Filters a URL or HTML string based on the CSS selector. |
| xpath(selector) | String | Filters a URL or HTML string based on the XPath selector. |
| download(location: 'downloads', name: nil) | String | Downloads the URL in the string; can be customized via the optional parameters. |
| page_source(user_agent_alias: :mac_firefox, user_agent: nil) | String | Gets the page source of a URL; can be customized via the optional parameters. |
| url? | Boolean | Checks whether the current string is a URL. |
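
Several of these methods accept either a URL or a page source, so they need some way to tell the two apart. The sketch below shows one way such a check could be written with Ruby's standard URI library. It is only an illustration: `looks_like_url?` is a hypothetical helper, not the gem's `url?` implementation, and the rule it uses (an absolute http or https URI with a host) is an assumption.

```ruby
require 'uri'

# Hypothetical helper, not part of xp: returns true when the string parses
# as an absolute http(s) URI with a non-empty host, false otherwise.
def looks_like_url?(string)
  uri = URI.parse(string)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  # Raw HTML and other non-URI text usually fails to parse at all.
  false
end

puts looks_like_url?('https://news.ycombinator.com')
puts looks_like_url?('<html><body>Hello</body></html>')
```

A check like this lets a single method decide whether to fetch the string as an address or parse it directly as markup.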