Raev

Raev is a Ruby gem for fetching, parsing, and normalizing metadata from websites. It was extracted from http://promoterapp.com.

Install

gem install raev

or add the following line to your Gemfile:

gem 'raev'

and run bundle install from your shell.

Usage

Get the domain name of a URL, without the www. subdomain.

Raev.url("http://indiegames.com/2011/05/c418_minecraft_volume_alpha.html").base
# => "indiegames.com"
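For readers curious what this kind of lookup involves, here is a minimal sketch using only Ruby's standard library. The helper name `base_domain` is hypothetical and is not part of Raev; the gem's own implementation may handle more cases.

```ruby
require "uri"

# Hypothetical helper, not part of Raev: returns the host of a URL
# with a single leading "www." removed, or nil if the URL has no host.
def base_domain(url)
  host = URI.parse(url).host
  return nil if host.nil?

  host.downcase.sub(/\Awww\./, "")
end

base_domain("http://www.polygon.com/features") # => "polygon.com"
```

Only a leading www. is stripped; other subdomains are left in place.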

Remove UTM analytics parameters from a URL.

Raev.url("http://ipodtouchlab.com/2011/01/iphone-ipad-app-sale-20110117.html?utm_campaign=touch_lab_bot&utm_medium=twitter&utm_source=am6_feedtweet").clean
# => "http://ipodtouchlab.com/2011/01/iphone-ipad-app-sale-20110117.html"
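The same cleanup can be approximated with the standard library alone. The sketch below assumes that "UTM parameters" means every query parameter whose name starts with utm_; `strip_utm` is a hypothetical name, not a Raev method.

```ruby
require "uri"

# Hypothetical helper, not part of Raev: drops every query parameter
# whose name starts with "utm_" and keeps the others in their original order.
def strip_utm(url)
  uri = URI.parse(url)
  return url if uri.query.nil?

  kept = URI.decode_www_form(uri.query).reject { |name, _value| name.start_with?("utm_") }
  # Setting the query to nil removes the trailing "?" when nothing is left.
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

strip_utm("http://example.com/a?id=7&utm_source=x") # => "http://example.com/a?id=7"
```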

Resolve a shortened or proxied URL.

Raev.url("http://sbn.to/WRgXfl").resolved
# => "http://www.polygon.com/features/2013/3/25/4128022/gdc-gathering-of-game-makers"
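Resolving a short link generally means following HTTP redirects until a non-redirect response is reached. The sketch below shows one way to do that with Net::HTTP; it is not Raev's implementation. The names `resolve_url` and `HEAD_FETCHER`, the use of HEAD requests, and the hop limit of 5 are all assumptions made for illustration.

```ruby
require "net/http"
require "uri"

# Default fetcher (assumption): sends a HEAD request and returns the
# redirect target as a URI, or nil when the response is not a redirect.
HEAD_FETCHER = lambda do |uri|
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  location = response["location"]
  if response.is_a?(Net::HTTPRedirection) && location
    URI.join(uri.to_s, location) # Location may be relative
  end
end

# Hypothetical helper, not part of Raev: follows at most `limit` redirects
# and returns the last URL reached as a string.
def resolve_url(url, limit: 5, fetcher: HEAD_FETCHER)
  uri = URI.parse(url)
  limit.times do
    target = fetcher.call(uri)
    return uri.to_s if target.nil?

    uri = target
  end
  uri.to_s
end
```

The fetcher is passed in as a parameter so the redirect-following loop can be exercised without network access. The hop limit guards against redirect loops.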

Resolve a shortened or proxied URL and remove UTM analytics parameters.

Raev.url("http://feedproxy.google.com/~r/fingergaming/~3/nBkNwBLq-U8/").resolved_and_clean
# => "http://www.gamasutra.com/topic/smartphone-tablet/fg/2011/01/21/zynga-acquires-drop7-developer-areacode/"

Fetch the Twitter handle from a URL.

Raev.url("http://www.polygon.com").twitter
# => "polygon"
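How Raev finds the handle is not documented here. One common signal is the Twitter Card tag `<meta name="twitter:site" content="@handle">`. The sketch below reads that tag from an HTML string you have already fetched. `twitter_handle` is a hypothetical name, and the regex-based parsing is a simplification that a real HTML parser would handle more robustly.

```ruby
# Hypothetical helper, not part of Raev: extracts the handle from a
# <meta name="twitter:site"> tag, without the leading "@".
def twitter_handle(html)
  html.scan(/<meta\b[^>]*>/i).each do |tag|
    # Collect the tag's quoted attributes into a hash with lowercase keys.
    attrs = tag.scan(/([\w:-]+)\s*=\s*["']([^"']*)["']/).to_h { |key, value| [key.downcase, value] }
    next unless attrs["name"].to_s.downcase == "twitter:site"

    handle = attrs["content"].to_s.strip.delete_prefix("@")
    return handle unless handle.empty?
  end
  nil
end

twitter_handle('<meta name="twitter:site" content="@polygon">') # => "polygon"
```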

Fetch the RSS feed from a URL.

Raev.url("http://www.polygon.com").feed
# => "http://www.polygon.com/rss/index.xml"
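Feeds are usually advertised in a page's head through a `<link rel="alternate">` tag with an RSS or Atom MIME type. The sketch below illustrates that autodiscovery convention on an HTML string; it is not Raev's own code. `discover_feed` and `FEED_TYPES` are hypothetical names, and the regex parsing is a simplification.

```ruby
# MIME types treated as feeds in this sketch (assumption).
FEED_TYPES = ["application/rss+xml", "application/atom+xml"].freeze

# Hypothetical helper, not part of Raev: returns the href of the first
# <link rel="alternate"> tag that advertises an RSS or Atom feed.
def discover_feed(html)
  html.scan(/<link\b[^>]*>/i).each do |tag|
    # Collect the tag's quoted attributes into a hash with lowercase keys.
    attrs = tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/).to_h { |key, value| [key.downcase, value] }
    next unless attrs["rel"].to_s.downcase == "alternate"
    next unless FEED_TYPES.include?(attrs["type"].to_s.downcase)

    return attrs["href"]
  end
  nil
end
```

The returned href may be relative, in which case it would still need to be joined with the page URL.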

Fetch the headline from a URL. Double spaces are removed.

Raev.url("http://www.polygon.com/e3-2013/2013/6/14/4429126/the-indie-eight-ps4").headline
# => "The Indie Eight: Polygon talks with the showcase indies launching on PS4"

Normalize an author name. Capitalizes the name, strips whitespace, ignores email addresses, and removes silly nicknames in quotes. Returns nil for empty strings and for non-names such as Editor or Staff.

Raev.normalize_author("[email protected] (Andreas)")
# => "Andreas"

Raev.normalize_author("andreas")
# => "Andreas"

Raev.normalize_author("Andreas 'Pixelate' Zecher")
# => "Andreas Zecher"

Raev.normalize_author("Editor")
# => nil

Raev.normalize_author(" ")
# => nil
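To make the rules above concrete, here is a rough re-implementation in plain Ruby. It is only a sketch: `normalize_author_sketch` and the `NON_NAMES` list are hypothetical, the email address in the usage line is a placeholder, and Raev's actual rules may differ.

```ruby
# Words treated as non-names in this sketch (assumed list).
NON_NAMES = %w[editor staff].freeze

# Hypothetical re-implementation for illustration; not Raev's own code.
def normalize_author_sketch(raw)
  name = raw.to_s
  name = name.gsub(/\S+@\S+/, " ")                       # drop email addresses
  name = name.gsub(/(?<!\S)(['"])[^'"]*\1(?!\S)/, " ")   # drop quoted nicknames
  name = name.gsub(/[()]/, " ")                          # drop leftover parentheses
  # Splitting on whitespace also strips and collapses extra spaces.
  words = name.split.map { |word| word[0].upcase + word[1..-1] }
  return nil if words.empty?

  result = words.join(" ")
  NON_NAMES.include?(result.downcase) ? nil : result
end

normalize_author_sketch("andreas@example.com (Andreas)") # => "Andreas"
```

Only the first letter of each word is upcased, so names such as McDonald keep their inner capitals. Quotes are removed only when they wrap a whole word, which leaves names like O'Brien intact.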
