xp
Ruby gem that adds methods to the String class for easy scraping of HTML/XML documents.
Installation
$ gem install xp
Usage
$ curl -s 'https://news.ycombinator.com' | xp --text '//td[@class="title"]/a'
OR
$ curl -s 'https://news.ycombinator.com' | xp --text 'td.title > a'
The gem can also be used in Ruby scripts by requiring it: require 'xp'.
Example
The following one-liner downloads all Dribbble shots from the Dribbble home page:
'https://dribbble.com/'.css('.dribbble-link img').xpath('//img/@src').map { |link| link.text.download }
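The one-liner calls download with its default parameters (location: 'downloads', name: nil, as listed in the API table). The sketch below illustrates one way a destination path could be derived from those parameters. It is not the gem's code: `destination_path` is a hypothetical helper, and falling back to the last segment of the URL path when no name is given is an assumption.

```ruby
require 'uri'

# Hypothetical helper, not part of xp: builds the local path a download
# could be written to. The keyword defaults mirror the documented signature
# download(location: 'downloads', name: nil).
def destination_path(url, location: 'downloads', name: nil)
  # Assumption: without an explicit name, reuse the last segment of the URL path.
  file_name = name || File.basename(URI.parse(url).path)
  File.join(location, file_name)
end

default_path = destination_path('https://example.com/shots/cat.png')
custom_path  = destination_path('https://example.com/shots/cat.png',
                                location: 'tmp', name: 'pet.png')
```

Under these assumptions, `default_path` lands in the downloads directory under the image's own file name, while `custom_path` uses the caller's directory and name.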
API
xp adds the following methods to the String class:
| Method | Return type | Remarks |
|---|---|---|
| to_nokogiri | Nokogiri::XML::Document | Converts a URL or a page source to a Nokogiri object. |
| css(selector) | String | Filters a URL or HTML string based on the CSS selector. |
| xpath(selector) | String | Filters a URL or HTML string based on the XPath selector. |
| download(location: 'downloads', name: nil) | String | Downloads the URL in the string; can be customized via the optional parameters. |
| page_source(user_agent_alias: :mac_firefox, user_agent: nil) | String | Gets the page source of a URL; can be customized via the optional parameters. |
| url? | Boolean | Checks whether the current string is a URL. |
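
Several of these methods accept either a URL or a page source, so the string has to be classified first. The following is a minimal sketch of such a check using only Ruby's standard library. It is not the gem's implementation of url?: the name `looks_like_url?` is hypothetical, and restricting the check to absolute http(s) URLs with a host is an assumption.

```ruby
require 'uri'

# Hypothetical helper, not the gem's url? method: returns true only when the
# string parses as an absolute http or https URI that has a host.
def looks_like_url?(text)
  uri = URI.parse(text.strip)
  # URI::HTTPS is a subclass of URI::HTTP, so one check covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  # Markup such as "<html>...</html>" is not a parseable URI.
  false
end

hn_is_url   = looks_like_url?('https://news.ycombinator.com')
html_is_url = looks_like_url?('<html><body>hi</body></html>')
word_is_url = looks_like_url?('hello')
```

A check of this kind lets one method decide whether to fetch the page first or to parse the string directly as markup.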