Class: Linkify

Inherits:
Object
  • Object
Includes:
LinkifyRe
Defined in:
lib/linkify-it-rb/index.rb

Defined Under Namespace

Classes: Match

Constant Summary

TLDS_2CH_SRC_RE =

RE pattern for 2-character tlds (autogenerated by ./support/tlds_2char_gen.js)

'a[cdefgilmnoqrstuwxz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvwxyz]|d[ejkmoz]|e[cegrstu]|f[ijkmor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdeghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eosuw]|s[abcdeghijklmnortuvxyz]|t[cdfghjklmnortvwz]|u[agksyz]|v[aceginu]|w[fs]|y[et]|z[amw]'
TLDS_DEFAULT =

DON’T try to make PRs with changes. Extend TLDs with LinkifyIt.tlds() instead

'biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф'.split('|')
DEFAULT_OPTIONS =
{
  fuzzyLink: true,
  fuzzyEmail: true,
  fuzzyIP: false
}
DEFAULT_SCHEMAS =
{
  'http:' => {
    validate: lambda do |text, pos, obj|
      tail = text.slice(pos..-1)

      if (!obj.re[:http])
        # compile lazily, because "host"-containing variables can change on tlds update.
        obj.re[:http] = Regexp.new('^\\/\\/' + obj.re[:src_auth] + obj.re[:src_host_port_strict] + obj.re[:src_path], 'i')
      end
      if obj.re[:http] =~ tail
        return tail.match(obj.re[:http])[0].length
      end
      return 0
    end
  },
  'https:' =>  'http:',
  'ftp:' =>    'http:',
  '//' =>      {
    validate: lambda do |text, pos, obj|
      tail = text.slice(pos..-1)

      if (!obj.re[:no_http])
        # compile lazily, because "host"-containing variables can change on tlds update.
        obj.re[:no_http] = Regexp.new(
          '^' +
          obj.re[:src_auth] +
          # Don't allow single-level domains, because of false positives like '//test'
          # with code comments
          '(?:localhost|(?:(?:' + obj.re[:src_domain] + ')\\.)+' + obj.re[:src_domain_root] + ')' +
          obj.re[:src_port] +
          obj.re[:src_host_terminator] +
          obj.re[:src_path],
          'i'
        )
      end

      if (obj.re[:no_http] =~ tail)
        # should not be `://` & `///`, that protects from errors in protocol name
        return 0 if (pos >= 3 && text[pos - 3] == ':')
        return 0 if (pos >= 3 && text[pos - 3] == '/')
        return tail.match(obj.re[:no_http])[0].length
      end
      return 0
    end
  },
  'mailto:' => {
    validate: lambda do |text, pos, obj|
      tail = text.slice(pos..-1)

      if (!obj.re[:mailto])
        obj.re[:mailto] = Regexp.new('^' + obj.re[:src_email_name] + '@' + obj.re[:src_host_strict], 'i')
      end
      if (obj.re[:mailto] =~ tail)
        return tail.match(obj.re[:mailto])[0].length
      end
      return 0
    end
  }
}

Constants included from LinkifyRe

LinkifyRe::SRC_ANY, LinkifyRe::SRC_AUTH, LinkifyRe::SRC_CC, LinkifyRe::SRC_DOMAIN, LinkifyRe::SRC_DOMAIN_ROOT, LinkifyRe::SRC_EMAIL_NAME, LinkifyRe::SRC_HOST, LinkifyRe::SRC_HOST_PORT_STRICT, LinkifyRe::SRC_HOST_STRICT, LinkifyRe::SRC_HOST_TERMINATOR, LinkifyRe::SRC_IP4, LinkifyRe::SRC_P, LinkifyRe::SRC_PORT, LinkifyRe::SRC_PSEUDO_LETTER, LinkifyRe::SRC_XN, LinkifyRe::SRC_Z, LinkifyRe::SRC_Z_CC, LinkifyRe::SRC_Z_P_CC, LinkifyRe::TEXT_SEPARATORS, LinkifyRe::TPL_EMAIL_FUZZY, LinkifyRe::TPL_HOST_FUZZY, LinkifyRe::TPL_HOST_FUZZY_STRICT, LinkifyRe::TPL_HOST_FUZZY_TEST, LinkifyRe::TPL_HOST_NO_IP_FUZZY, LinkifyRe::TPL_HOST_PORT_FUZZY_STRICT, LinkifyRe::TPL_HOST_PORT_NO_IP_FUZZY_STRICT

Instance Attribute Summary

Instance Method Summary

Methods included from LinkifyRe

#build_re, #re_src_path

Constructor Details

#initialize(schemas = {}, options = {}) ⇒ Linkify

new LinkifyIt(schemas, options)

  • schemas (Object): Optional. Additional schemas to validate (prefix/validator)

Creates a new linkifier instance with optional additional schemas. In this Ruby port, instances are always created with `Linkify.new`; to pass `options`, `schemas` must be passed as well (use `{}` if there are none).

By default understands:

  • `http(s)://…`, `ftp://…`, `mailto:…` & `//…` links

  • “fuzzy” links and emails (example.com, foo@bar.com).

`schemas` is a Hash, where each key/value pair describes a protocol/rule:

  • __key__ - link prefix (usually a protocol name with `:` at the end, `skype:` for example). `linkify-it` makes sure that the prefix is not preceded by an alphanumeric character or symbol. Only whitespace and punctuation are allowed.

  • __value__ - rule to check the tail after the link prefix

    • String - just an alias to an existing rule

    • Hash

      • validate - validator lambda (should return the matched length on success), or a `Regexp`.

      • normalize - optional lambda to normalize the text & url of a matched result (for example, for @twitter mentions).

`options`:

  • __fuzzyLink__ - recognize URLs without the `http(s):` prefix. Default `true`.

  • __fuzzyIP__ - allow IPs in fuzzy links above. Can conflict with some texts like version numbers. Default `false`.

  • __fuzzyEmail__ - recognize emails without the `mailto:` prefix. Default `true`.




# File 'lib/linkify-it-rb/index.rb', line 304

def initialize(schemas = {}, options = {})
  schemas = {} unless schemas

  # not needed
  # if (!(this instanceof LinkifyIt)) {
  #   return new LinkifyIt(schemas, options);
  # }

  # not needed, if you want to pass options, then must also pass schemas
  # if options.empty?
  #   if (isOptionsObj(schemas)) {
  #     options = schemas;
  #     schemas = {};
  #   }
  # }

  @__opts__           = DEFAULT_OPTIONS.merge(options)

  # Cache last tested result. Used to skip repeating steps on next `match` call.
  @__index__          = -1
  @__last_index__     = -1 # Next scan position
  @__schema__         = ''
  @__text_cache__     = ''

  @__schemas__        = {}.merge!(DEFAULT_SCHEMAS).merge!(schemas)
  @__compiled__       = {}

  @__tlds__           = TLDS_DEFAULT
  @__tlds_replaced__  = false

  @re                 = {}

  @bypass_normalizer  = false   # only used in testing scenarios

  compile
end
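
As a rough illustration of the schema shape described in the constructor notes, the sketch below builds a schema Hash for a hypothetical `@` mention rule and calls its lambdas directly, without going through the library. The names `MENTION_SCHEMA`, `FakeMatch` and `mention_length`, the 15-character limit, and the URL format are assumptions made for this example; only the `validate` (text, pos, obj) and `normalize` (match, obj) call shapes are taken from the listings on this page.

```ruby
# Hypothetical stand-in for the library's Match object (illustration only).
FakeMatch = Struct.new(:schema, :raw, :text, :url)

# A schema definition in the documented shape:
# key = link prefix, value = Hash with :validate and optional :normalize.
MENTION_SCHEMA = {
  '@' => {
    # Receives the full text, the offset just after the prefix, and the
    # linkifier instance. Returns the matched tail length, or 0 on failure.
    validate: lambda do |text, pos, _obj|
      tail = text.slice(pos..-1)
      m = /\A[a-zA-Z0-9_]{1,15}/.match(tail)
      m ? m[0].length : 0
    end,
    # Receives the match and the linkifier instance; rewrites the url.
    # The URL format below is an assumption for the example.
    normalize: lambda do |match, _obj|
      match.url = 'https://twitter.com/' + match.url.sub(/\A@/, '')
    end
  }
}.freeze

# Convenience wrapper so the validator can be tried in isolation.
def mention_length(text, pos)
  MENTION_SCHEMA['@'][:validate].call(text, pos, nil)
end
```

Calling `mention_length('hi @ruby_lang!', 4)` checks the tail that starts right after the `@`. A Hash of this shape is what the `schemas` argument and `#add` are documented to accept, although this sketch does not exercise the library itself.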

Instance Attribute Details

#__compiled__ ⇒ Object

Returns the value of attribute __compiled__.



# File 'lib/linkify-it-rb/index.rb', line 4

def __compiled__
  @__compiled__
end

#__index__ ⇒ Object

Returns the value of attribute __index__.



# File 'lib/linkify-it-rb/index.rb', line 4

def __index__
  @__index__
end

#__last_index__ ⇒ Object

Returns the value of attribute __last_index__.



# File 'lib/linkify-it-rb/index.rb', line 4

def __last_index__
  @__last_index__
end

#__schema__ ⇒ Object

Returns the value of attribute __schema__.



# File 'lib/linkify-it-rb/index.rb', line 4

def __schema__
  @__schema__
end

#__text_cache__ ⇒ Object

Returns the value of attribute __text_cache__.



# File 'lib/linkify-it-rb/index.rb', line 4

def __text_cache__
  @__text_cache__
end

#bypass_normalizer ⇒ Object

Returns the value of attribute bypass_normalizer.



# File 'lib/linkify-it-rb/index.rb', line 5

def bypass_normalizer
  @bypass_normalizer
end

#re ⇒ Object

Returns the value of attribute re.



# File 'lib/linkify-it-rb/index.rb', line 5

def re
  @re
end

Instance Method Details

#add(schema, definition) ⇒ Object

chainable LinkifyIt#add(schema, definition)

  • schema (String): rule name (fixed pattern prefix)

  • definition (String|RegExp|Object): schema definition

Add new rule definition. See constructor description for details.




# File 'lib/linkify-it-rb/index.rb', line 348

def add(schema, definition)
  @__schemas__[schema] = definition
  compile
  return self
end

#compile ⇒ Object

Schemas compiler. Build regexps.




# File 'lib/linkify-it-rb/index.rb', line 117

def compile
  @re = build_re(@__opts__)

  # Define dynamic patterns
  tlds = @__tlds__.dup

  onCompile

  tlds.push(TLDS_2CH_SRC_RE) if (!@__tlds_replaced__)
  tlds.push(@re[:src_xn])

  @re[:src_tlds]         = tlds.join('|')
  @re[:email_fuzzy]      = Regexp.new(@re[:tpl_email_fuzzy].gsub('%TLDS%', @re[:src_tlds]), true)
  @re[:link_fuzzy]       = Regexp.new(@re[:tpl_link_fuzzy].gsub('%TLDS%', @re[:src_tlds]), true)
  @re[:link_no_ip_fuzzy] = Regexp.new(@re[:tpl_link_no_ip_fuzzy].gsub('%TLDS%', @re[:src_tlds]), true)
  @re[:host_fuzzy_test]  = Regexp.new(@re[:tpl_host_fuzzy_test].gsub('%TLDS%', @re[:src_tlds]), true)

  #
  # Compile each schema
  #

  aliases = []

  @__compiled__ = {} # Reset compiled data

  # Raise a descriptive error for a malformed schema definition.
  # A lambda stored in a local variable must be invoked with .call.
  schemaError = lambda do |name, val|
    raise ArgumentError, "(LinkifyIt) Invalid schema \"#{name}\": #{val.inspect}"
  end

  @__schemas__.each do |name, val|

    # skip disabled methods
    next if val.nil?

    compiled = { validate: nil, link: nil }

    @__compiled__[name] = compiled

    if (val.is_a? Hash)
      if (val[:validate].is_a? Regexp)
        compiled[:validate] = createValidator(val[:validate])
      elsif (val[:validate].is_a? Proc)
        compiled[:validate] = val[:validate]
      else
        schemaError.call(name, val)
      end

      if (val[:normalize].is_a? Proc)
        compiled[:normalize] = val[:normalize]
      elsif (!val[:normalize])
        compiled[:normalize] = createNormalizer()
      else
        schemaError.call(name, val)
      end
      next
    end

    if (val.is_a? String)
      aliases.push(name)
      next
    end

    schemaError.call(name, val)
  end

  #
  # Compile postponed aliases
  #

  aliases.each do |an_alias|
    if (!@__compiled__[@__schemas__[an_alias]])
      # Silently fail on missed schemas to avoid errors on disable.
      # schemaError.call(an_alias, @__schemas__[an_alias])
    else
      @__compiled__[an_alias][:validate]  = @__compiled__[@__schemas__[an_alias]][:validate]
      @__compiled__[an_alias][:normalize] = @__compiled__[@__schemas__[an_alias]][:normalize]
    end
  end

  #
  # Fake record for guessed links
  #
  @__compiled__[''] = { validate: nil, normalize: createNormalizer }

  #
  # Build schema condition, and filter disabled & fake schemas
  #
  slist = @__compiled__.select {|name, val| name.length > 0 && !val.nil? }.keys.map {|str| escapeRE(str)}.join('|')

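  # (?!_) cause 1.5x slowdown
  # Ruby has no "global" regexp flag; repeated scanning is done with
  # Regexp#match(text, pos), so every pattern here only needs IGNORECASE.
  @re[:schema_test]   = Regexp.new('(^|(?!_)(?:[><\uff5c]|' + @re[:src_XPCc] + '))(' + slist + ')', Regexp::IGNORECASE)
  @re[:schema_search] = Regexp.new('(^|(?!_)(?:[><\uff5c]|' + @re[:src_XPCc] + '))(' + slist + ')', Regexp::IGNORECASE)

  @re[:pretest]       = Regexp.new(
                            '(' + @re[:schema_test].source + ')|' +
                            '(' + @re[:host_fuzzy_test].source + ')|' + '@', Regexp::IGNORECASE)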

  #
  # Cleanup
  #

  resetScanCache
end

#createNormalizer ⇒ Object




# File 'lib/linkify-it-rb/index.rb', line 108

def createNormalizer()
  return lambda do |match, obj|
    obj.normalize(match)
  end
end

#createValidator(re) ⇒ Object




# File 'lib/linkify-it-rb/index.rb', line 99

def createValidator(re)
  return lambda do |text, pos, obj|
    tail = text.slice(pos..-1)

    (re =~ tail) ? tail.match(re)[0].length : 0
  end
end
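
One detail worth keeping in mind when supplying a `Regexp` validator: in Ruby, `^` matches at the start of every line, while `\A` matches only at the start of the string. The sketch below re-creates the wrapping idea from `createValidator` under a hypothetical name (`build_validator`) and compares the two anchors; the constants `LINE_ANCHORED` and `STRING_ANCHORED` are likewise made up for the example.

```ruby
# Wrap a Regexp in a lambda that reports how many characters match in the
# tail starting at `pos` (0 when nothing matches).
def build_validator(re)
  lambda do |text, pos, _obj|
    m = re.match(text.slice(pos..-1))
    m ? m[0].length : 0
  end
end

# ^ matches at the beginning of any line inside the tail.
LINE_ANCHORED   = build_validator(/^\d+/)
# \A matches only at the very beginning of the tail.
STRING_ANCHORED = build_validator(/\A\d+/)

single_line = 'id:42'
multi_line  = "id:\n42"

puts LINE_ANCHORED.call(single_line, 3, nil)
puts STRING_ANCHORED.call(single_line, 3, nil)
puts LINE_ANCHORED.call(multi_line, 3, nil)
puts STRING_ANCHORED.call(multi_line, 3, nil)
```

With the multi-line input, the `^` version reports a match that does not start at `pos`, so a pattern meant to validate the text exactly at the given offset is probably safer with `\A`.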

#escapeRE(str) ⇒ Object




# File 'lib/linkify-it-rb/index.rb', line 88

def escapeRE(str)
  # Ruby does not expand "$&" in a replacement string, so use a block
  return str.gsub(/[\.\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/) { |ch| "\\#{ch}" }
end
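
To show why the prefixes need escaping before they are joined into one alternation, here is a small standalone sketch. `escape_prefix` and `prefix_alternation` are hypothetical names, and the `c++:` prefix is invented purely to exercise the metacharacters.

```ruby
# Escape regexp metacharacters in a schema prefix so that it can be
# embedded literally in a larger pattern.
def escape_prefix(str)
  str.gsub(/[\.\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/) { |ch| "\\#{ch}" }
end

# Join several prefixes into a single capturing alternation.
def prefix_alternation(prefixes)
  Regexp.new('(' + prefixes.map { |p| escape_prefix(p) }.join('|') + ')')
end

re = prefix_alternation(['c++:', 'http:'])
m  = re.match('see c++:thing')
puts m[1] if m
```

Ruby's built-in `Regexp.escape` covers the same need and escapes a somewhat larger set of characters; the block form above simply mirrors the character class used in the listing.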

#match(text) ⇒ Object

LinkifyIt#match(text) -> Array|nil

Returns an array of found link descriptions, or `nil` on failure. We strongly recommend calling [[LinkifyIt#test]] first, for best speed.

##### Result match description

  • __schema__ - link schema, can be empty for fuzzy links, or `//` for protocol-neutral links.

  • __index__ - offset of matched text

  • __lastIndex__ - index of the next char after the match end

  • __raw__ - matched text

  • __text__ - normalized text

  • __url__ - link, generated from matched text
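
The offsets in the result refer to the full input text even though scanning proceeds over a shrinking tail. The following sketch shows that bookkeeping in isolation, using a plain `Regexp` in place of the linkifier; `Found` and `scan_all` are hypothetical names, and only three of the fields listed above are modelled.

```ruby
# Minimal stand-in for one result entry.
Found = Struct.new(:index, :last_index, :raw)

# Repeatedly find the first hit in the remaining tail, record it with
# offsets shifted back into the coordinates of the full text, then cut
# the tail just after the hit and continue.
def scan_all(text, pattern)
  shift  = 0
  tail   = text
  result = []

  while (m = pattern.match(tail))
    break if m.end(0) == 0 # guard: an empty match at offset 0 would loop forever
    result << Found.new(shift + m.begin(0), shift + m.end(0), m[0])
    shift += m.end(0)
    tail   = tail.slice(m.end(0)..-1)
  end

  result.empty? ? nil : result
end

hits = scan_all('a http://x.io b http://y.io', %r{http://\S+})
hits.each { |h| puts "#{h.index}..#{h.last_index} #{h.raw}" } if hits
```

The explicit `result.empty?` check matters in Ruby because `0` and empty arrays are truthy, unlike in JavaScript.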




# File 'lib/linkify-it-rb/index.rb', line 480

def match(text)
  shift  = 0
  result = []

  # Try to take previous element from cache, if .test() called before
  if (@__index__ >= 0 && @__text_cache__ == text)
    result.push(Match.createMatch(self, shift))
    shift = @__last_index__
  end

  # Cut head if cache was used
  tail = shift ? text.slice(shift..-1) : text

  # Scan string until end reached
  while (self.test(tail))
    result.push(Match.createMatch(self, shift))

    tail   = tail.slice(@__last_index__..-1)
    shift += @__last_index__
  end

  # 0 is truthy in Ruby, so test for emptiness explicitly
  return result unless result.empty?

  return nil
end

#normalize(match) ⇒ Object

LinkifyIt#normalize(match)

Default normalizer (if the schema does not define its own).




# File 'lib/linkify-it-rb/index.rb', line 544

def normalize(match)
  return if @bypass_normalizer

  # Do minimal possible changes by default. Need to collect feedback prior
  # to move forward https://github.com/markdown-it/linkify-it/issues/1

  match.url = "http://#{match.url}" if match.schema.empty?

  if (match.schema == 'mailto:' && !(/^mailto\:/i =~ match.url))
    match.url = 'mailto:' + match.url
  end
end

#onCompile ⇒ Object

LinkifyIt#onCompile()

Override to modify basic RegExp-s.




# File 'lib/linkify-it-rb/index.rb', line 561

def onCompile
end

#pretest(text) ⇒ Object

LinkifyIt#pretest(text) -> Boolean

Very quick check that can give false positives. Returns true if a link MAY exist. Can be used as a speed optimization when you need to check that a link does NOT exist.




# File 'lib/linkify-it-rb/index.rb', line 443

def pretest(text)
  return !(@re[:pretest] =~ text).nil?
end
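
The pre-check idea can be sketched independently of the library: a cheap pattern that may say "maybe" too often, but never says "no" when the expensive check would say "yes". Everything below (`CHEAP_HINT`, `maybe_has_link?`, `has_link?`, `links_present?` and both patterns) is a simplified assumption for illustration, not the library's actual regexps.

```ruby
# Cheap hint: fires on anything that could possibly be part of a link.
CHEAP_HINT = %r{://|@|\.[a-z]{2}}i

# Stand-in for the expensive check (here just a stricter regexp).
FULL_CHECK = %r{\bhttps?://[^\s/]+\.[a-z]{2,}}i

def maybe_has_link?(text)
  !(CHEAP_HINT =~ text).nil?
end

def has_link?(text)
  !(FULL_CHECK =~ text).nil?
end

# Run the expensive check only when the cheap one leaves the door open.
def links_present?(text)
  maybe_has_link?(text) && has_link?(text)
end

puts links_present?('plain text')
puts links_present?('go to http://example.com now')
```

Since `FULL_CHECK` always requires `://`, every text it accepts also passes `CHEAP_HINT`, which is what makes it safe to skip the full check when the hint fails.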

#resetScanCache ⇒ Object




# File 'lib/linkify-it-rb/index.rb', line 93

def resetScanCache
  @__index__      = -1
  @__text_cache__ = ''
end

#set(options) ⇒ Object

chainable LinkifyIt#set(options)

  • options (Object): { fuzzyLink|fuzzyEmail|fuzzyIP: true|false }

Set recognition options for links without schema.




# File 'lib/linkify-it-rb/index.rb', line 360

def set(options)
  @__opts__.merge!(options)
  return self
end

#test(text) ⇒ Object

LinkifyIt#test(text) -> Boolean

Searches for a linkifiable pattern and returns `true` on success or `false` on failure.




# File 'lib/linkify-it-rb/index.rb', line 369

def test(text)
  # Reset scan cache
  @__text_cache__ = text
  @__index__      = -1

  # 0 is truthy in Ruby, so check for an empty string explicitly
  return false if text.empty?

  # try to scan for link with schema - that's the most simple rule
  if @re[:schema_test] =~ text
    re = @re[:schema_search]
    lastIndex = 0
    while ((m = re.match(text, lastIndex)) != nil)
      lastIndex = m.end(0)
      len       = testSchemaAt(text, m[2], lastIndex)
      if len > 0
        @__schema__     = m[2]
        @__index__      = m.begin(0) + m[1].length
        @__last_index__ = m.begin(0) + m[0].length + len
        break
      end
    end
  end

  # guess schemaless links
  if (@__opts__[:fuzzyLink] && @__compiled__['http:'])
    tld_pos = text.index(@re[:host_fuzzy_test])
    if !tld_pos.nil?
      # if tld is located after found link - no need to check fuzzy pattern
      if (@__index__ < 0 || tld_pos < @__index__)
        if ((ml = text.match(@__opts__[:fuzzyIP] ? @re[:link_fuzzy] : @re[:link_no_ip_fuzzy])) != nil)

          shift = ml.begin(0) + ml[1].length

          if (@__index__ < 0 || shift < @__index__)
            @__schema__     = ''
            @__index__      = shift
            @__last_index__ = ml.begin(0) + ml[0].length
          end
        end
      end
    end
  end

  # guess schemaless emails
  if (@__opts__[:fuzzyEmail] && @__compiled__['mailto:'])
    at_pos = text.index('@')
    if !at_pos.nil?
      # We can't skip this check, because these cases are possible:
      # 192.168.1.1@gmail.com, my.in@example.com
      if ((me = text.match(@re[:email_fuzzy])) != nil)

        shift = me.begin(0) + me[1].length
        nextc = me.begin(0) + me[0].length

        if (@__index__ < 0 || shift < @__index__ ||
            (shift == @__index__ && nextc > @__last_index__))
          @__schema__     = 'mailto:'
          @__index__      = shift
          @__last_index__ = nextc
        end
      end
    end
  end

  return @__index__ >= 0
end

#testSchemaAt(text, schema, pos) ⇒ Object

LinkifyIt#testSchemaAt(text, name, position) -> Number

  • text (String): text to scan

  • name (String): rule (schema) name

  • position (Number): text offset to check from

Similar to [[LinkifyIt#test]] but checks only specific protocol tail exactly at given position. Returns length of found pattern (0 on fail).




# File 'lib/linkify-it-rb/index.rb', line 456

def testSchemaAt(text, schema, pos)
  # If not supported schema check requested - terminate
  if (!@__compiled__[schema.downcase])
    return 0
  end
  return @__compiled__[schema.downcase][:validate].call(text, pos, self)
end

#tlds(list, keepOld) ⇒ Object

chainable LinkifyIt#tlds(list [, keepOld]) -> this

  • list (Array): list of tlds

  • keepOld (Boolean): merge with the current list if `true` (`false` by default)

Load (or merge) a new tlds list. These are used for fuzzy links (without a prefix) to avoid false positives. By default this algorithm is used:

  • hostnames with any 2-letter root zone are ok.

  • biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф are ok.

  • encoded (`xn--…`) root zones are ok.

If the list is replaced, then an exact match for 2-character root zones will be checked.




# File 'lib/linkify-it-rb/index.rb', line 524

def tlds(list, keepOld = false)
  list = list.is_a?(Array) ? list : [ list ]

  if (!keepOld)
    @__tlds__ = list.dup
    @__tlds_replaced__ = true
    compile
    return self
  end

  # Build a new array: @__tlds__ may still reference the TLDS_DEFAULT constant
  @__tlds__ = (@__tlds__ + list).sort.uniq.reverse

  compile
  return self
end
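
The effect of replacing versus merging the list can be sketched by rebuilding the TLD alternation the way `#compile` assembles it: the explicit list, then the generic 2-letter pattern unless the list was replaced, then the punycode pattern. The constants below are abbreviated stand-ins, and `tld_source` and `fuzzy_host?` are hypothetical helpers; the host pattern is far simpler than the library's templates.

```ruby
# Abbreviated stand-ins for the real patterns (assumptions for brevity).
TWO_CHAR_TLDS = 'd[ejkmoz]|u[agksyz]'    # shortened 2-letter zone pattern
PUNYCODE_TLD  = 'xn--[a-z0-9\\-]{1,59}'  # plays the role of re[:src_xn]

# Assemble the alternation that would be substituted for %TLDS%.
def tld_source(list, replaced)
  parts = list.dup
  parts.push(TWO_CHAR_TLDS) unless replaced
  parts.push(PUNYCODE_TLD)
  parts.join('|')
end

# Very rough host check: one label followed by an accepted root zone.
def fuzzy_host?(host, list, replaced)
  re = Regexp.new('\\A[a-z0-9-]+\\.(?:' + tld_source(list, replaced) + ')\\z',
                  Regexp::IGNORECASE)
  !(re =~ host).nil?
end

puts fuzzy_host?('example.de', %w[com org], false) # default-style list
puts fuzzy_host?('example.de', %w[com], true)      # replaced list
puts fuzzy_host?('example.onion', %w[onion], true) # custom zone
```

Under these assumptions, a replaced list stops accepting arbitrary 2-letter zones, which matches the note above about exact matching after replacement.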