LinkHum

LinkHum (aka "Links Humana") is a URL auto-linker for user-entered text. It tries hard to do the most reasonable thing even in complex cases.

It is useful for sites that accept plain-text user input.

Features:

  • auto-links URLs;
  • very accurate detection of punctuation inside and outside of URLs;
  • an extensive test suite of complex (yet real-life) texts with URLs;
  • customizable behavior.

NB: the original algorithm was written by squadette, and the test cases were provided by users of Mokum. This gem just packages it (on behalf of the original author).

Install

[sudo] gem install linkhum

Or add it to your Gemfile:

gem 'linkhum'

And then run:

bundle install

Usage

As simple as:

LinkHum.urlify("Please look at http://github.com/zverok/linkhum, it's awesome!")
# => "Please look at <a href='http://github.com/zverok/linkhum'>http://github.com/zverok/linkhum</a>, it's awesome!"

Showcase

# Doesn't touch punctuation outside the URL:
LinkHum.urlify('http://slashdot.org, or http://lwn.net? They say, "just http://google.com"')
# => "<a href='http://slashdot.org'>http://slashdot.org</a>, or <a href='http://lwn.net'>http://lwn.net</a>? They say, \"just <a href='http://google.com'>http://google.com</a>\""

# But keeps it when it's inside the URL:
LinkHum.urlify('Watch this: https://www.youtube.com/watch?v=Q9Dv4Hmf_O8')
# => "Watch this: <a href='https://www.youtube.com/watch?v=Q9Dv4Hmf_O8'>https://www.youtube.com/watch?v=Q9Dv4Hmf_O8</a>"

# Understands parentheses:
LinkHum.urlify("It's a movie: https://en.wikipedia.org/wiki/Hours_(2013_film) It's just parens: (https://www.youtube.com/watch?v=Q9Dv4Hmf_O8)")
# => "It's a movie: <a href='https://en.wikipedia.org/wiki/Hours_(2013_film)'>https://en.wikipedia.org/wiki/Hours_(2013_film)</a> It's just parens: (<a href='https://www.youtube.com/watch?v=Q9Dv4Hmf_O8'>https://www.youtube.com/watch?v=Q9Dv4Hmf_O8</a>)"

# URL shortening:
LinkHum.urlify("It's too long: http://www.booking.com/searchresults.ru.html?sid=28c7356c8d0fb6d81de3a45eff97e0fe;dcid=4;bb_asr=2&class_interval=1&csflt=%7B%7D&dest_id=-2167973&dest_type=city&group_adults=2&group_children=0&idf=1&label_click=undef&no_rooms=1&offset=0&review_score_group=empty&score_min=0&si=ai%2Cco%2Cci%2Cre%2Cdi&src=index&ss=Lisbon%2C%20Lisbon%20Region%2C%20Portugal&ss_raw=Lisbon&ssb=empty")
# => "It's too long: <a href='http://www.booking.com/searchresults.ru.html?sid=28c7356c8d0fb6d81de3a45eff97e0fe;dcid=4;bb_asr=2&class_interval=1&csflt=%7B%7D&dest_id=-2167973&dest_type=city&group_adults=2&group_children=0&idf=1&label_click=undef&no_rooms=1&offset=0&review_score_group=empty&score_min=0&si=ai,co,ci,re,di&src=index&ss=Lisbon,%20Lisbon%20Region,%20Portugal&ss_raw=Lisbon&ssb=empty'>http://www.booking.com/searchresults.ru.html?sid=28c7356c8d0f...</a>"

# It's customizable:
LinkHum.urlify(
  "It's too long: http://www.booking.com/searchresults.ru.html?sid=28c7356c8d0fb6d81de3a45eff97e0fe;dcid=4;bb_asr=2&class_interval=1&csflt=%7B%7D&dest_id=-2167973&dest_type=city&group_adults=2&group_children=0&idf=1&label_click=undef&no_rooms=1&offset=0&review_score_group=empty&score_min=0&si=ai%2Cco%2Cci%2Cre%2Cdi&src=index&ss=Lisbon%2C%20Lisbon%20Region%2C%20Portugal&ss_raw=Lisbon&ssb=empty",
  max_length: 20)
# =>

# International domains and Non-ASCII paths:
LinkHum.urlify("Domain: http://www.詹姆斯.com/, and path: https://ru.wikipedia.org/wiki/Эффект_Даннинга_—_Крюгера")
# => "Domain: <a href='http://www.詹姆斯.com/'>http://www.詹姆斯.com/</a>, and path: <a href='https://ru.wikipedia.org/wiki/%D0%AD%D1%84%D1%84%D0%B5%D0%BA%D1%82_%D0%94%D0%B0%D0%BD%D0%BD%D0%B8%D0%BD%D0%B3%D0%B0_%E2%80%94_%D0%9A%D1%80%D1%8E%D0%B3%D0%B5%D1%80%D0%B0'>https://ru.wikipedia.org/wiki/Эффект_Даннинга_—_Крюгера</a>"

# Look, ma, no XSS!
LinkHum.urlify('http://example.com/foo?">here.</a><script>window.alert("wow");</script>')
# => "<a href='http://example.com/foo?%22%3Ehere.%3C/a%3E%3Cscript%3Ewindow.alert(%22wow%22);%3C/script%3E'>http://example.com/foo?\">here.</a><script>window.alert(\"wow\")...</a>"
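The punctuation and parentheses cases in the showcase can be approximated with a simple trailing-character heuristic. The sketch below is not LinkHum's actual algorithm: `trim_url_tail` and `TRAILING_PUNCTUATION` are hypothetical names introduced for illustration, and the function assumes a URL candidate has already been cut out of the text at whitespace.

```ruby
# Hypothetical helper, not part of LinkHum: given a whitespace-delimited URL
# candidate, drop trailing characters that most likely belong to the sentence.
TRAILING_PUNCTUATION = ['.', ',', '!', '?', ':', ';', '"', "'"].freeze

def trim_url_tail(candidate)
  url = candidate.dup
  loop do
    last = url[-1]
    if TRAILING_PUNCTUATION.include?(last)
      url.chop!
    elsif last == ')' && url.count(')') > url.count('(')
      # An unmatched closing paren belongs to the surrounding text.
      url.chop!
    else
      break
    end
  end
  url
end

puts trim_url_tail('http://slashdot.org,')
# prints http://slashdot.org
puts trim_url_tail('https://en.wikipedia.org/wiki/Hours_(2013_film)')
# prints https://en.wikipedia.org/wiki/Hours_(2013_film)
puts trim_url_tail('https://www.youtube.com/watch?v=Q9Dv4Hmf_O8)')
# prints https://www.youtube.com/watch?v=Q9Dv4Hmf_O8
```

Balanced parentheses stay in the URL, while a closing parenthesis without a partner is treated as part of the surrounding text. Real-life input has many more edge cases, which is what the gem's test suite is for.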

Customization

On the fly

Custom URL params:

LinkHum.urlify("http://oursite.com/posts/12345 has been mentioned at http://cnn.com") do |uri|
  uri.host == 'oursite.com' ? {} : {target: '_blank'}
end
# => "<a href='http://oursite.com/posts/12345'>http://oursite.com/posts/12345</a> has been mentioned at <a href='http://cnn.com' target='_blank'>http://cnn.com</a>"

The block receives an instance of Addressable::URI and should return a hash of additional link attributes. You can use it to open external links in a new tab, to style them differently (Wikipedia-style), or to add special icons for links to YouTube, Wikipedia, and Google. It's up to you.
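If the attribute logic grows, it may be convenient to keep it in a lambda that can be tested on its own. The following is a minimal sketch under assumptions: `OWN_HOSTS` and `LINK_ATTRS` are hypothetical names, the `class` attribute is only an example, and the stdlib `URI` is used as a stand-in for Addressable::URI so the snippet runs without the gem.

```ruby
require 'uri'

# Hypothetical list of hosts that count as "our own" site.
OWN_HOSTS = %w[oursite.com www.oursite.com].freeze

# Returns extra <a> attributes for a parsed URI. Only #host is used, which
# both the stdlib URI classes and Addressable::URI provide.
LINK_ATTRS = lambda do |uri|
  host = uri.host.to_s
  return {} if OWN_HOSTS.include?(host)

  attrs = {target: '_blank'}
  attrs[:class] = 'wikipedia' if host.end_with?('wikipedia.org')
  attrs
end

p LINK_ATTRS.call(URI.parse('http://oursite.com/posts/12345'))
p LINK_ATTRS.call(URI.parse('http://cnn.com'))
p LINK_ATTRS.call(URI.parse('https://en.wikipedia.org/wiki/Ruby'))
```

With the gem loaded, the same lambda could presumably be passed as the block, for example `LinkHum.urlify(text, &LINK_ATTRS)`.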

Define your own LinkHum

class MyLinks < LinkHum
  def link_attrs(uri)
    {target: '_blank'} unless uri.host == 'oursite.com'
  end
end

MyLinks.urlify("http://oursite.com/posts/12345 has been mentioned at http://cnn.com")
# => "<a href='http://oursite.com/posts/12345'>http://oursite.com/posts/12345</a> has been mentioned at <a href='http://cnn.com' target='_blank'>http://cnn.com</a>"

You can also define special strings that should become links on your site:

class MyLinks < LinkHum
  special /@(\S+)\b/ do |username|
    "http://oursite/users/#{username}"
  end
end

MyLinks.urlify("Hey, @jude!")
# => "Hey, <a href='http://oursite/users/jude'>@jude</a>!"

# Returning nil or false from the block means no replacement:
class MyLinksConditional < LinkHum
  special /@(\S+)\b/ do |username|
    "http://oursite/users/#{username}" if User.where(name: username).exists?
  end
end

MyLinksConditional.urlify("So, our @dude and @unknownguy walk into a bar...")
# => "So, our <a href='http://oursite/users/dude'>@dude</a> and @unknownguy walk into a bar..."

Some special gotchas:

  • in version 0.0.2, you can define any number of specials, but it's totally up to you to have non-conflicting, clearly distinguished patterns;
  • the block receives values by the same logic as String#scan: the whole match when the pattern has no groups, otherwise the captured groups:
class AllSymbols < LinkHum
  special /@\S+\b/ do |username|
    p username
    nil
  end
end
AllSymbols.urlify('@dude')
# Receives "@dude"

class SelectedPart < LinkHum
  special /@(\S+)\b/ do |username|
    p username
    nil
  end
end
SelectedPart.urlify('@dude')
# Receives "dude"

class SeveralArgs < LinkHum
  special(/@(\S+)_(\S+)\b/) do |first, second|
    p first, second
    nil
  end
end
SeveralArgs.urlify('@cool_dude')
# Receives "cool", "dude"
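The String#scan behavior these specials mirror can be checked in plain Ruby, without LinkHum. This snippet only demonstrates the standard library; the variable names are illustrative.

```ruby
# No groups in the pattern: scan returns the whole match.
whole = '@dude'.scan(/@\S+\b/)
# One group: scan returns only the captured part, one array per match.
one = '@dude'.scan(/@(\S+)\b/)
# Several groups: every match yields all of its captures.
many = '@cool_dude'.scan(/@(\S+)_(\S+)\b/)

p whole # => ["@dude"]
p one   # => [["dude"]]
p many  # => [["cool", "dude"]]
```

Note that plain scan wraps even a single capture in an array; judging by the examples above, LinkHum hands the captures to the block as separate arguments, so a one-group special receives the string itself.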

"Parse only" mode

If your requirements for constructing the resulting string are more complicated than LinkHum's default behavior, you can use its parse method to split the string into tokens and process them yourself. All the URL-detection goodness and specials are still available:

class MyParser < LinkHum
  # You don't need rendering blocks for your specials
  # Second argument is special's name, it is optional
  special /@(\S+)\b/, :username
  special /\#(\S+)\b/, :tag
end

MyParser.parse("Here is @dude. He is #cute. Is he on http://facebook.com?")
# => [
#   {type: :text    , content: 'Here is '},
#   {type: :username, content: '@dude', captures: ['dude']},
#   {type: :text    , content: '. He is '},
#   {type: :tag     , content: '#cute', captures: ['cute']},
#   {type: :text    , content: '. Is he on '},
#   {type: :url     , content: 'http://facebook.com'},
#   {type: :text    , content: '?'}
# ]
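What you do with the tokens is up to you. As a rough sketch, the array above could be rendered like this; `TOKENS` is copied from the output shown, while `escape_html`, `render_token`, the `/users/` and `/tags/` paths, and the extra attributes are all hypothetical choices made for illustration.

```ruby
# Token list as returned by MyParser.parse in the example above.
TOKENS = [
  {type: :text,     content: 'Here is '},
  {type: :username, content: '@dude', captures: ['dude']},
  {type: :text,     content: '. He is '},
  {type: :tag,      content: '#cute', captures: ['cute']},
  {type: :text,     content: '. Is he on '},
  {type: :url,      content: 'http://facebook.com'},
  {type: :text,     content: '?'}
].freeze

HTML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
}.freeze

# Minimal HTML escaping so user text cannot break out of the markup.
def escape_html(str)
  str.gsub(/[&<>"']/, HTML_ESCAPES)
end

# Turns one token into an HTML fragment; unknown types fall back to plain text.
def render_token(token)
  text = escape_html(token[:content])
  case token[:type]
  when :username
    "<a class='user' href='/users/#{escape_html(token[:captures].first)}'>#{text}</a>"
  when :tag
    "<a class='tag' href='/tags/#{escape_html(token[:captures].first)}'>#{text}</a>"
  when :url
    "<a href='#{text}' rel='nofollow'>#{text}</a>"
  else
    text
  end
end

html = TOKENS.map { |token| render_token(token) }.join
puts html
```

Because every token carries its type, the same loop could just as well build JSON for a client-side renderer or collect the mentioned usernames for notifications.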

Credits

  • squadette -- author of original code;
  • users of Mokum -- testing and advising (you can now see LinkHum at work online at Mokum);
  • zverok -- gemifying, documenting and writing specs.

Contributing

Just the usual fork, change, pull request process.

Development

  • Don't forget to run rspec after making any changes (and write specs for them, of course!);
  • It's preferred to run bundle exec dokaz to check that the README is written correctly, and bundle exec dokaz -fshow to see exactly what the code from the README outputs.

License

MIT