Class: URIChunk

Inherits:
Chunk::Abstract
Includes:
URI::REGEXP::PATTERN
Defined in:
app/models/chunks/uri.rb

Overview

This wiki chunk matches arbitrary URIs, using patterns from the Ruby URI modules. It parses out a variety of fields that renderers could use to format links in various ways (shortening domain names, hiding email addresses). It also matches email addresses and domains such as host.com.au that lack a scheme (http://), adding the scheme as required.

The heuristic used to match a URI is designed to err on the side of caution. That is, it is more likely to leave a URI unlinked than to accidentally autolink something that is not a URI. The reasoning is that it is easier to force a URI link by prefixing ‘http://’ to it than it is to escape an incorrectly marked-up non-URI.

I’m using a part of the [ISO 3166-1 Standard][iso3166] for country name suffixes. The generic names are from www.bnoack.com/data/countrycode2.html.

[iso3166]: http://geotags.com/iso3166/

Constant Summary

GENERIC =
'(?:aero|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org)'
COUNTRY =
'(?:au|at|be|ca|ch|de|dk|fr|hk|in|ir|it|jp|nl|no|pt|ru|se|sw|tv|tw|uk|us)'
TLDS =

These are needed; otherwise HOST will match almost anything.

"\\.(?:#{GENERIC}|#{COUNTRY})"
USERINFO =

Redefine USERINFO so that it must have non-zero length

"(?:[#{UNRESERVED};:&=+$,]|#{ESCAPED})+"
URI_ENDING =

Pattern of characters that may legally end a URI but are excluded from the match, to avoid interfering with some Textile markup (images: !URI!) and other trailing punctuation, e.g. (wiki.com/).

'[)!]'
URI_PATTERN =

The basic URI expression as a string

"(?:(#{SCHEME})://)?" +    # Optional scheme://              (\1|\8)
"(?:(#{USERINFO})@)?" +    # Optional userinfo@              (\2|\9)
"(#{HOSTNAME}#{TLDS})" +   # Mandatory host eg, HOST.com.au  (\3|\10)
"(?::(#{PORT}))?" +        # Optional :port                  (\4|\11)
"(#{ABS_PATH})?" +         # Optional absolute path          (\5|\12)
"(?:\\?(#{QUERY}))?" +     # Optional ?query                 (\6|\13)
"(?:\\#(#{FRAGMENT}))?"    # Optional #fragment              (\7|\14)

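To illustrate why TLDS matters, here is a minimal sketch of a host check that only accepts names ending in one of the listed suffixes. GENERIC, COUNTRY and TLDS are copied from the constants above; SIMPLE_HOSTNAME and autolink_candidate? are hypothetical names introduced for this example, and SIMPLE_HOSTNAME is a simplified stand-in for the HOSTNAME pattern that the class takes from URI::REGEXP::PATTERN.

```ruby
# Constants copied from the Constant Summary.
GENERIC = '(?:aero|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org)'
COUNTRY = '(?:au|at|be|ca|ch|de|dk|fr|hk|in|ir|it|jp|nl|no|pt|ru|se|sw|tv|tw|uk|us)'
TLDS    = "\\.(?:#{GENERIC}|#{COUNTRY})"

# ASSUMPTION: a simplified hostname pattern, not the one from the URI library.
SIMPLE_HOSTNAME = '(?:[a-zA-Z0-9][a-zA-Z0-9-]*\\.)*[a-zA-Z0-9][a-zA-Z0-9-]*'

# Anchored so the whole string must be a hostname followed by a known suffix.
HOST_RE = Regexp.new("\\A#{SIMPLE_HOSTNAME}#{TLDS}\\z")

# Hypothetical helper: true when the text looks like a linkable bare host.
def autolink_candidate?(text)
  HOST_RE.match?(text)
end

autolink_candidate?('wiki.example.com.au') # accepted: ends in .au
autolink_candidate?('readme.txt')          # rejected: .txt is not in TLDS
```

Without the suffix requirement, dotted words such as file names would qualify as hosts, which is the over-matching the cautious heuristic is meant to avoid.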
Instance Attribute Summary

Attributes inherited from Chunk::Abstract

#text

Class Method Summary

Instance Method Summary

Methods inherited from Chunk::Abstract

#mask, #post_mask, #pre_mask, #revert

Constructor Details

#initialize(match_data) ⇒ URIChunk

Returns a new instance of URIChunk.



# File 'app/models/chunks/uri.rb', line 55

def initialize(match_data)
  super(match_data)
  # Since the URI_PATTERN is tried twice, there are two sets of
  # groups, one from \1 to \7 and the second from \8 to \14.
  # The fields are set by which ever group matches.
  @scheme   = match_data[1] || match_data[8]
  @user     = match_data[2] || match_data[9]
  @host     = match_data[3] || match_data[10]
  @port     = match_data[4] || match_data[11]
  @path     = match_data[5] || match_data[12]
  @query    = match_data[6] || match_data[13]
  @fragment = match_data[7] || match_data[14]

  # If there is no scheme, add an appropriate one, otherwise
  # set the URI to the matched text.
  @text_scheme = scheme
  @uri = (scheme ? match_data[0] : nil)
  @scheme = scheme || (user ? 'mailto' : 'http')
  @delimiter = (scheme == 'mailto' ? ':' : '://')
  @uri ||= scheme + @delimiter + match_data[0]

  # Build up the link text. Schemes are omitted unless explicitly given.
  @link_text = ''
  @link_text << "#{@scheme}#{@delimiter}" if @text_scheme
  @link_text << "#{@user}@" if @user
  @link_text << "#{@host}" if @host
  @link_text << ":#{@port}" if @port
  @link_text << "#{@path}" if @path
  @link_text << "?#{@query}" if @query
end
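The scheme-defaulting rules in the constructor can be sketched in isolation. The helper below is hypothetical (resolve_uri is not part of URIChunk); its matched, scheme and user arguments stand in for match_data[0] and the captured groups, and it follows the same order of decisions as the code above.

```ruby
# Hypothetical helper mirroring the constructor's defaulting rules.
def resolve_uri(matched, scheme: nil, user: nil)
  text_scheme = scheme                    # was a scheme written in the text?
  scheme ||= (user ? 'mailto' : 'http')   # userinfo without a scheme implies email
  delimiter = (scheme == 'mailto' ? ':' : '://')
  # An explicit scheme means the matched text is already a complete URI.
  uri = text_scheme ? matched : "#{scheme}#{delimiter}#{matched}"
  { scheme: scheme, uri: uri }
end

resolve_uri('example.com')                  # bare host gets http://
resolve_uri('bob@example.com', user: 'bob') # bare address gets mailto:
resolve_uri('https://example.com', scheme: 'https')
```

Note that the link text is built separately: it shows the scheme only when one was typed, so a bare address is displayed as written even though its href gains a mailto: prefix.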

Instance Attribute Details

#fragment ⇒ Object (readonly)

Returns the value of attribute fragment.



# File 'app/models/chunks/uri.rb', line 53

def fragment
  @fragment
end

#host ⇒ Object (readonly)

Returns the value of attribute host.



# File 'app/models/chunks/uri.rb', line 53

def host
  @host
end

#link_text ⇒ Object (readonly)

Returns the value of attribute link_text.



# File 'app/models/chunks/uri.rb', line 53

def link_text
  @link_text
end

#path ⇒ Object (readonly)

Returns the value of attribute path.



# File 'app/models/chunks/uri.rb', line 53

def path
  @path
end

#port ⇒ Object (readonly)

Returns the value of attribute port.



# File 'app/models/chunks/uri.rb', line 53

def port
  @port
end

#query ⇒ Object (readonly)

Returns the value of attribute query.



# File 'app/models/chunks/uri.rb', line 53

def query
  @query
end

#scheme ⇒ Object (readonly)

Returns the value of attribute scheme.



# File 'app/models/chunks/uri.rb', line 53

def scheme
  @scheme
end

#uri ⇒ Object (readonly)

Returns the value of attribute uri.



# File 'app/models/chunks/uri.rb', line 53

def uri
  @uri
end

#user ⇒ Object (readonly)

Returns the value of attribute user.



# File 'app/models/chunks/uri.rb', line 53

def user
  @user
end

Class Method Details

.pattern ⇒ Object



# File 'app/models/chunks/uri.rb', line 44

def self.pattern()
  # This pattern first tries to match the URI_PATTERN that ends with 
  # punctuation that is a valid URI character (eg, ')', '!'). If
  # such a match occurs, there should be no backtracking (hence the ?> ). 
  # If the string cannot match a URI ending with URI_ENDING, then a second
  # attempt is tried.
  Regexp.new("(?>#{URI_PATTERN}(?=#{URI_ENDING}))|#{URI_PATTERN}", Regexp::EXTENDED, 'N')
end
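The two-attempt structure of this pattern can be tried out on a much smaller expression. In the sketch below, TOY_URI is a deliberately tiny stand-in for URI_PATTERN (an assumption made for illustration, as are TOY_PATTERN and first_uri), and the encoding argument is left out. Because the lookahead sits inside the atomic group, the first alternative settles on a match that stops just before a closing ')' or '!'; if no such ending exists, the plain second alternative is used.

```ruby
# ASSUMPTION: a toy URI pattern; ')' and '!' are allowed inside it,
# just as they are legal characters in a real URI.
TOY_URI    = 'https?://[a-z0-9./()!-]+'
URI_ENDING = '[)!]'

# Same shape as URIChunk.pattern: atomic first attempt, plain fallback.
TOY_PATTERN = Regexp.new("(?>#{TOY_URI}(?=#{URI_ENDING}))|#{TOY_URI}")

# Hypothetical helper: the first URI-like match in the text, or nil.
def first_uri(text)
  match = TOY_PATTERN.match(text)
  match && match[0]
end

first_uri('See (http://wiki.com/) now')        # closing ')' stays outside the match
first_uri('!http://example.com/logo.png!')     # Textile image marker '!' stays outside
first_uri('Go to http://example.com/page now') # no special ending: fallback matches
```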

Instance Method Details

#escaped_text ⇒ Object

If there is no hostname in the URI, do not render it. It probably contains only the scheme, e.g. ‘something:’.



# File 'app/models/chunks/uri.rb', line 96

def escaped_text() ( host.nil? ? @uri : nil )  end

#unmask(content) ⇒ Object

If the text should be escaped then don’t keep this chunk. Otherwise only keep this chunk if it was substituted back into the content.



# File 'app/models/chunks/uri.rb', line 89

def unmask(content) 
  return nil if escaped_text
  return self if content.sub!( Regexp.new(mask(content)), "<a href=\"#{uri}\">#{link_text}</a>" )
end