Class: Linkify

Inherits:

Object

Object
Linkify


Includes: LinkifyRe

Defined in: lib/linkify-it-rb/index.rb

Defined Under Namespace

Classes: Match

Constant Summary

TLDS_DEFAULT =

DON'T try to make PRs with changes. Extend TLDs with LinkifyIt#tlds instead.

'biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф'.split('|')

DEFAULT_SCHEMAS =

{
  'http:' => {
    validate: lambda do |text, pos, obj|
      tail = text.slice(pos..-1)

      # Compile lazily, because "host"-containing variables can change on tlds update.
      # \A anchors at the start of the tail; in Ruby, ^ would also match after a newline.
      obj.re[:http] ||= Regexp.new('\\A\\/\\/' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)

      m = obj.re[:http].match(tail)
      m ? m[0].length : 0
    end
  },
  'https:' =>  'http:',
  'ftp:' =>    'http:',
  '//' =>      {
    validate: lambda do |text, pos, obj|
      tail = text.slice(pos..-1)

      # Compile lazily, because "host"-containing variables can change on tlds update.
      obj.re[:no_http] ||= Regexp.new('\\A' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)

      m = obj.re[:no_http].match(tail)
      return 0 unless m

      # should not be `://`, that protects from errors in protocol name
      return 0 if pos >= 3 && text[pos - 3] == ':'

      m[0].length
    end
  },
  'mailto:' => {
    validate: lambda do |text, pos, obj|
      tail = text.slice(pos..-1)

      obj.re[:mailto] ||= Regexp.new('\\A' + LinkifyRe::SRC_EMAIL_NAME + '@' + LinkifyRe::SRC_HOST_STRICT, Regexp::IGNORECASE)

      m = obj.re[:mailto].match(tail)
      m ? m[0].length : 0
    end
  }
}

Constants included from LinkifyRe

LinkifyRe::SRC_ANY, LinkifyRe::SRC_AUTH, LinkifyRe::SRC_CC, LinkifyRe::SRC_DOMAIN, LinkifyRe::SRC_DOMAIN_ROOT, LinkifyRe::SRC_EMAIL_NAME, LinkifyRe::SRC_HOST, LinkifyRe::SRC_HOST_PORT_STRICT, LinkifyRe::SRC_HOST_STRICT, LinkifyRe::SRC_HOST_TERMINATOR, LinkifyRe::SRC_IP4, LinkifyRe::SRC_P, LinkifyRe::SRC_PATH, LinkifyRe::SRC_PORT, LinkifyRe::SRC_PSEUDO_LETTER, LinkifyRe::SRC_PSEUDO_LETTER_NON_D, LinkifyRe::SRC_XN, LinkifyRe::SRC_Z, LinkifyRe::SRC_Z_CC, LinkifyRe::SRC_Z_P_CC, LinkifyRe::TPL_EMAIL_FUZZY, LinkifyRe::TPL_HOST_FUZZY, LinkifyRe::TPL_HOST_FUZZY_STRICT, LinkifyRe::TPL_HOST_FUZZY_TEST, LinkifyRe::TPL_HOST_PORT_FUZZY_STRICT, LinkifyRe::TPL_LINK_FUZZY

Instance Attribute Summary

#__compiled__ ⇒ Object

Returns the value of attribute __compiled__.
#__index__ ⇒ Object

Returns the value of attribute __index__.
#__last_index__ ⇒ Object

Returns the value of attribute __last_index__.
#__schema__ ⇒ Object

Returns the value of attribute __schema__.
#__text_cache__ ⇒ Object

Returns the value of attribute __text_cache__.
#bypass_normalizer ⇒ Object

Returns the value of attribute bypass_normalizer.
#re ⇒ Object

Returns the value of attribute re.

Instance Method Summary

#add(schema, definition) ⇒ Object

chainable LinkifyIt#add(schema, definition) - schema (String): rule name (fixed pattern prefix) - definition (String|RegExp|Object): schema definition.
#compile ⇒ Object

Schemas compiler.
#createNormalizer ⇒ Object

Returns the default normalizer lambda, which delegates to #normalize.
#createValidator(re) ⇒ Object

Wraps a Regexp in a validator lambda that returns the matched length (0 on fail).
#escapeRE(str) ⇒ Object

Escapes regexp metacharacters in str.
#initialize(schemas = {}) ⇒ Linkify constructor

new LinkifyIt(schemas) - schemas (Object): Optional.
#match(text) ⇒ Object

LinkifyIt#match(text) -> Array|nil.
#normalize(match) ⇒ Object

LinkifyIt#normalize(match).
#pretest(text) ⇒ Object

LinkifyIt#pretest(text) -> Boolean.
#resetScanCache ⇒ Object

Resets the cached scan state (__index__ and __text_cache__).
#test(text) ⇒ Object

LinkifyIt#test(text) -> Boolean.
#testSchemaAt(text, schema, pos) ⇒ Object

LinkifyIt#testSchemaAt(text, name, position) -> Number - text (String): text to scan - name (String): rule (schema) name - position (Number): text offset to check from.
#tlds(list, keepOld = false) ⇒ Object

chainable LinkifyIt#tlds(list [, keepOld]) -> this - list (Array): list of tlds - keepOld (Boolean): merge with current list if `true` (`false` by default).

Constructor Details

#initialize(schemas = {}) ⇒ `Linkify`

new LinkifyIt(schemas)

schemas (Object): Optional. Additional schemas to validate (prefix/validator)

Creates a new linkifier instance with optional additional schemas.

By default understands:

- `http(s)://…`, `ftp://…`, `mailto:…` & `//…` links
- "fuzzy" links and emails (example.com, [email protected]).

`schemas` is a Hash, where each key/value pair describes a protocol/rule:

__key__ - link prefix (usually a protocol name with `:` at the end, `skype:` for example). `linkify-it` makes sure that the prefix is not preceded by an alphanumeric character or symbol. Only whitespace and punctuation are allowed.
__value__ - rule to check the tail after the link prefix
- String - just an alias to an existing rule
- Hash
  - validate - validator function (should return the matched length on success), or `Regexp`.
  - normalize - optional function to normalize text & url of the matched result (for example, for @twitter mentions).
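
As a rough illustration of the schema shape described above, the sketch below defines a hypothetical `@` rule for Twitter-style mentions and calls its lambdas directly. The username pattern, the `MentionMatch` struct, and the URL prefix are assumptions made for this example; only the `validate`/`normalize` keys and the lambda arguments follow the description above.

```ruby
# Hypothetical stand-in for a match object, used only in this sketch.
MentionMatch = Struct.new(:schema, :url, :text)

# Assumed username rule for illustration: 1 to 15 word characters.
MENTION_RE = /\A[A-Za-z0-9_]{1,15}/

MENTION_SCHEMA = {
  '@' => {
    # Receives the full text and the offset just after the '@' prefix;
    # returns the length of the matched tail, or 0 when nothing matches.
    validate: lambda do |text, pos, _linkifier|
      m = MENTION_RE.match(text.slice(pos..-1))
      m ? m[0].length : 0
    end,
    # Rewrites the url of an accepted match in place.
    normalize: lambda do |match, _linkifier|
      match.url = 'https://twitter.com/' + match.url.sub(/\A@/, '')
    end
  }
}

rule  = MENTION_SCHEMA['@']
text  = 'ping @ruby_lang!'
start = text.index('@') + 1
len   = rule[:validate].call(text, start, nil)

raw   = text[start - 1, len + 1] # the prefix plus the validated tail
found = MentionMatch.new('@', raw, raw)
rule[:normalize].call(found, nil)

puts len       # 9
puts found.url # https://twitter.com/ruby_lang
```

A hash of this shape is what the constructor, or `#add` with a single key and value, is described as accepting.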

# File 'lib/linkify-it-rb/index.rb', line 264

def initialize(schemas = {})
  # Cache last tested result. Used to skip repeating steps on next `match` call.
  @__index__          = -1
  @__last_index__     = -1 # Next scan position
  @__schema__         = ''
  @__text_cache__     = ''

  @__schemas__        = DEFAULT_SCHEMAS.merge(schemas)
  @__compiled__       = {}

  # Copy the default list so that later updates never mutate the TLDS_DEFAULT constant.
  @__tlds__           = TLDS_DEFAULT.dup
  @__tlds_replaced__  = false

  @re                 = {}

  @bypass_normalizer  = false   # only used in testing scenarios

  compile
end

Instance Attribute Details

#__compiled__ ⇒ `Object`

Returns the value of attribute __compiled__.

# File 'lib/linkify-it-rb/index.rb', line 4

def __compiled__
  @__compiled__
end

#__index__ ⇒ `Object`

Returns the value of attribute __index__.

# File 'lib/linkify-it-rb/index.rb', line 4

def __index__
  @__index__
end

#__last_index__ ⇒ `Object`

Returns the value of attribute __last_index__.

# File 'lib/linkify-it-rb/index.rb', line 4

def __last_index__
  @__last_index__
end

#__schema__ ⇒ `Object`

Returns the value of attribute __schema__.

# File 'lib/linkify-it-rb/index.rb', line 4

def __schema__
  @__schema__
end

#__text_cache__ ⇒ `Object`

Returns the value of attribute __text_cache__.

# File 'lib/linkify-it-rb/index.rb', line 4

def __text_cache__
  @__text_cache__
end

#bypass_normalizer ⇒ `Object`

Returns the value of attribute bypass_normalizer.




# File 'lib/linkify-it-rb/index.rb', line 5

def bypass_normalizer
  @bypass_normalizer
end

#re ⇒ `Object`

Returns the value of attribute re.




# File 'lib/linkify-it-rb/index.rb', line 5

def re
  @re
end

Instance Method Details

#add(schema, definition) ⇒ `Object`

chainable LinkifyIt#add(schema, definition)

schema (String): rule name (fixed pattern prefix)
definition (String|RegExp|Object): schema definition

Add new rule definition. See constructor description for details.

# File 'lib/linkify-it-rb/index.rb', line 296

def add(schema, definition)
  @__schemas__[schema] = definition
  compile
  return self
end

#compile ⇒ `Object`

Schemas compiler. Build regexps.

# File 'lib/linkify-it-rb/index.rb', line 89

def compile
  @re = { src_xn: LinkifyRe::SRC_XN }

  # Define dynamic patterns
  tlds = @__tlds__.dup
  tlds.push('[a-z]{2}') if (!@__tlds_replaced__)
  tlds.push(@re[:src_xn])

  @re[:src_tlds] = tlds.join('|')
  @re[:email_fuzzy]      = Regexp.new(LinkifyRe::TPL_EMAIL_FUZZY.gsub('%TLDS%', @re[:src_tlds]), true)
  @re[:link_fuzzy]       = Regexp.new(LinkifyRe::TPL_LINK_FUZZY.gsub('%TLDS%', @re[:src_tlds]), true)
  @re[:host_fuzzy_test]  = Regexp.new(LinkifyRe::TPL_HOST_FUZZY_TEST.gsub('%TLDS%', @re[:src_tlds]), true)

  #
  # Compile each schema
  #

  aliases = []

  @__compiled__ = {} # Reset compiled data

  schemaError = lambda do |name, val|
    # val may be any object, so use inspect rather than String concatenation.
    raise ArgumentError, "(LinkifyIt) Invalid schema \"#{name}\": #{val.inspect}"
  end

  @__schemas__.each do |name, val|

    # skip disabled methods
    next if val.nil?

    compiled = { validate: nil, link: nil }

    @__compiled__[name] = compiled

    if val.is_a?(Hash)
      if val[:validate].is_a?(Regexp)
        compiled[:validate] = createValidator(val[:validate])
      elsif val[:validate].is_a?(Proc)
        compiled[:validate] = val[:validate]
      else
        # schemaError is a lambda, so it has to be invoked with .call
        schemaError.call(name, val)
      end

      if val[:normalize].is_a?(Proc)
        compiled[:normalize] = val[:normalize]
      elsif !val[:normalize]
        compiled[:normalize] = createNormalizer
      else
        schemaError.call(name, val)
      end
      next
    end

    if val.is_a?(String)
      aliases.push(name)
      next
    end

    schemaError.call(name, val)
  end

  #
  # Compile postponed aliases
  #

  aliases.each do |an_alias|
    target = @__compiled__[@__schemas__[an_alias]]

    # Silently skip aliases whose target schema is missing, to avoid errors on disable.
    next unless target

    @__compiled__[an_alias][:validate]  = target[:validate]
    @__compiled__[an_alias][:normalize] = target[:normalize]
  end

  #
  # Fake record for guessed links
  #
  @__compiled__[''] = { validate: nil, normalize: createNormalizer }

  #
  # Build schema condition, and filter disabled & fake schemas
  #
  slist = @__compiled__.select {|name, val| name.length > 0 && !val.nil? }.keys.map {|str| escapeRE(str)}.join('|')

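  # (?!_) cause 1.5x slowdown
  # Ruby has no "g" (global) flag; #test walks the text with Regexp#match(text, pos) instead.
  @re[:schema_test]   = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC + '))(' + slist + ')', Regexp::IGNORECASE)
  @re[:schema_search] = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC + '))(' + slist + ')', Regexp::IGNORECASE)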

  @re[:pretest]       = Regexp.new(
                            '(' + @re[:schema_test].source + ')|' +
                            '(' + @re[:host_fuzzy_test].source + ')|' + '@', 'i')

  #
  # Cleanup
  #

  resetScanCache
end
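
The two-pass structure of the compiler (full rules first, aliases afterwards) can be sketched in isolation. The schema table, the placeholder validator, and the `prefix_re` name below are hypothetical; the sketch only mirrors the order of operations in the listing above, not the library's internals.

```ruby
# Hypothetical schema table: Hash values are full rules, String values are aliases.
schemas = {
  'http:'  => { validate: ->(text, pos) { text.length - pos } },
  'https:' => 'http:',
  'ftp:'   => 'http:',
  'gone:'  => 'missing:' # alias to a rule that does not exist
}

compiled = {}
aliases  = []

# First pass: compile full rules and remember alias names for later.
schemas.each do |name, definition|
  compiled[name] = { validate: nil }
  case definition
  when Hash   then compiled[name][:validate] = definition[:validate]
  when String then aliases << name
  end
end

# Second pass: point each alias at its target's validator, if the target exists.
aliases.each do |name|
  target = compiled[schemas[name]]
  compiled[name][:validate] = target[:validate] if target
end

# Build a case-insensitive alternation of the escaped rule names.
prefix_re = Regexp.new(
  '(' + compiled.keys.map { |key| Regexp.escape(key) }.join('|') + ')',
  Regexp::IGNORECASE
)

puts prefix_re.match('see HTTPS://example.com')[1] # HTTPS:
```

Deferring aliases to a second pass means a rule can be aliased regardless of the order in which the keys were added.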

#createNormalizer ⇒ `Object`

# File 'lib/linkify-it-rb/index.rb', line 80

def createNormalizer()
  return lambda do |match, obj|
    obj.normalize(match)
  end
end

#createValidator(re) ⇒ `Object`

# File 'lib/linkify-it-rb/index.rb', line 71

def createValidator(re)
  return lambda do |text, pos, obj|
    tail = text.slice(pos..-1)

    # Match once and reuse the MatchData.
    m = re.match(tail)
    m ? m[0].length : 0
  end
end
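
A minimal standalone sketch of the same factory idea follows. `build_validator` and the digit pattern are hypothetical names chosen for this example, and the sketch assumes the pattern is anchored with `\A` so that only text starting exactly at `pos` counts.

```ruby
# Wrap a Regexp in a lambda with the (text, pos, linkifier) validator signature.
def build_validator(re)
  lambda do |text, pos, _linkifier|
    m = re.match(text.slice(pos..-1))
    m ? m[0].length : 0
  end
end

digits = build_validator(/\A\d+/)

puts digits.call('tel:12345 home', 4, nil) # 5
puts digits.call('tel:none', 4, nil)       # 0
```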

#escapeRE(str) ⇒ `Object`




# File 'lib/linkify-it-rb/index.rb', line 60

def escapeRE(str)
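  # Prefix every regexp metacharacter with a backslash. JavaScript's "$&" is not a
  # back-reference in Ruby replacement strings, so the block form is used instead.
  return str.gsub(/[\.\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/) { |ch| "\\#{ch}" }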
end
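
For comparison, here is a small sketch of escaping with the block form of `String#gsub` next to Ruby's built-in `Regexp.escape`. The `escape_with_block` helper and the sample strings are made up for this example.

```ruby
# Characters that carry special meaning in a regexp source string.
SPECIAL = /[.?*+^$\[\]\\(){}|\-]/

# Prefix each special character with a single backslash.
def escape_with_block(str)
  str.gsub(SPECIAL) { |ch| "\\#{ch}" }
end

sample  = 'c++ (v1.0)'
escaped = escape_with_block(sample)

puts escaped                             # c\+\+ \(v1\.0\)
puts Regexp.new(escaped).match?(sample)  # true
puts Regexp.escape('a.b|c')              # a\.b\|c
```

`Regexp.escape` also escapes whitespace, so the two approaches agree on punctuation but not necessarily on every input.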

#match(text) ⇒ `Object`

LinkifyIt#match(text) -> Array|nil

Returns an array of found link descriptions, or `nil` on fail. We strongly suggest using LinkifyIt#test first, for best speed.

##### Result match description

__schema__ - link schema, can be empty for fuzzy links, or `//` for protocol-neutral links.
__index__ - offset of matched text
__lastIndex__ - index of next char after match end
__raw__ - matched text
__text__ - normalized text
__url__ - link, generated from matched text

# File 'lib/linkify-it-rb/index.rb', line 418

def match(text)
  shift  = 0
  result = []

  # Try to take previous element from cache, if .test() called before
  if (@__index__ >= 0 && @__text_cache__ == text)
    result.push(Match.createMatch(self, shift))
    shift = @__last_index__
  end

  # Cut head if cache was used
  tail = shift > 0 ? text.slice(shift..-1) : text

  # Scan string until end reached
  while (self.test(tail))
    result.push(Match.createMatch(self, shift))

    tail   = tail.slice(@__last_index__..-1)
    shift += @__last_index__
  end

  # An empty Array is truthy in Ruby, so test for emptiness explicitly.
  return nil if result.empty?

  return result
end
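
The offset bookkeeping in the scan loop (match in the remaining tail, then add the accumulated shift) can be shown with a simplified stand-in. `scan_all`, the `Found` struct, and the digit pattern are hypothetical; a plain Regexp takes the place of `#test` here.

```ruby
# Hypothetical result record with absolute offsets into the original text.
Found = Struct.new(:index, :last_index, :raw)

def scan_all(text, pattern)
  results = []
  shift   = 0    # how much of the original text has been cut off so far
  tail    = text

  while (m = pattern.match(tail))
    break if m[0].empty? # guard against patterns that match nothing

    # Offsets in the tail become absolute by adding the shift.
    results << Found.new(shift + m.begin(0), shift + m.end(0), m[0])

    shift += m.end(0)
    tail   = tail.slice(m.end(0)..-1)
  end

  results.empty? ? nil : results
end

hits = scan_all('a1 bb22 c333', /\d+/)
hits.each { |hit| puts "#{hit.raw} at #{hit.index}...#{hit.last_index}" }
```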

#normalize(match) ⇒ `Object`

LinkifyIt#normalize(match)

Default normalizer (if schema does not define its own).

# File 'lib/linkify-it-rb/index.rb', line 482

def normalize(match)
  return if @bypass_normalizer
  
  # Do minimal possible changes by default. Need to collect feedback prior
  # to move forward https://github.com/markdown-it/linkify-it/issues/1

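  # Fuzzy links carry an empty schema, and an empty String is truthy in Ruby.
  match.url = 'http://' + match.url if match.schema.nil? || match.schema.empty?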

  if (match.schema == 'mailto:' && !(/^mailto\:/i =~ match.url))
    match.url = 'mailto:' + match.url
  end
end
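
The default normalization rules can be tried out on their own with a stand-in match object. `LinkMatch` and `normalize_link` are hypothetical names for this sketch, which assumes that fuzzy links carry an empty schema string.

```ruby
# Hypothetical stand-in for a match object with just the fields used here.
LinkMatch = Struct.new(:schema, :url)

def normalize_link(match)
  # Schemaless (fuzzy) links get an explicit http:// prefix.
  match.url = 'http://' + match.url if match.schema.nil? || match.schema.empty?

  # Fuzzy emails get a mailto: prefix unless one is already present.
  if match.schema == 'mailto:' && match.url !~ /\Amailto:/i
    match.url = 'mailto:' + match.url
  end

  match
end

puts normalize_link(LinkMatch.new('', 'example.com')).url         # http://example.com
puts normalize_link(LinkMatch.new('mailto:', 'me@site.org')).url  # mailto:me@site.org
```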

#pretest(text) ⇒ `Object`

LinkifyIt#pretest(text) -> Boolean

Very quick check that can give false positives. Returns true if a link may exist. Can be used for speed optimization when you need to check that a link does NOT exist.




# File 'lib/linkify-it-rb/index.rb', line 381

def pretest(text)
  return !(@re[:pretest] =~ text).nil?
end
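
The idea of a cheap pre-check in front of a stricter scan can be sketched as follows. Both patterns and the `find_link` helper are simplified assumptions for this example and are much looser than the library's compiled regexps.

```ruby
# Hypothetical patterns: CHEAP may over-report, STRICT decides.
CHEAP  = /https?:|@|\.[a-z]{2,}/i
STRICT = %r{https?://\S+}i

def find_link(text)
  return nil unless CHEAP.match?(text) # fast rejection for most plain text

  m = STRICT.match(text)
  m && m[0]
end

puts find_link('see http://example.com now').inspect # "http://example.com"
puts find_link('plain words only').inspect           # nil
puts find_link('meet me @ noon').inspect             # nil (pre-check false positive)
```

A false positive from the pre-check only costs one extra strict scan, while a negative result skips that scan entirely.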

#resetScanCache ⇒ `Object`

# File 'lib/linkify-it-rb/index.rb', line 65

def resetScanCache
  @__index__      = -1
  @__text_cache__ = ''
end

#test(text) ⇒ `Object`

LinkifyIt#test(text) -> Boolean

Searches for a linkifiable pattern and returns `true` on success or `false` on fail.

# File 'lib/linkify-it-rb/index.rb', line 307

def test(text)
  # Reset scan cache
  @__text_cache__ = text
  @__index__      = -1

  return false if text.empty?
  
  # try to scan for link with schema - that's the most simple rule
  if @re[:schema_test] =~ text
    re = @re[:schema_search]
    lastIndex = 0
    while ((m = re.match(text, lastIndex)) != nil)
      lastIndex = m.end(0)
      len       = testSchemaAt(text, m[2], lastIndex)
      if len > 0
        @__schema__     = m[2]
        @__index__      = m.begin(0) + m[1].length
        @__last_index__ = m.begin(0) + m[0].length + len
        break
      end
    end
  end

  # guess schemaless links
  if (@__compiled__['http:'])
    tld_pos = text.index(@re[:host_fuzzy_test])
    if !tld_pos.nil?
      # if tld is located after found link - no need to check fuzzy pattern
      if (@__index__ < 0 || tld_pos < @__index__)
        if ((ml = text.match(@re[:link_fuzzy])) != nil)

          shift = ml.begin(0) + ml[1].length

          if (@__index__ < 0 || shift < @__index__)
            @__schema__     = ''
            @__index__      = shift
            @__last_index__ = ml.begin(0) + ml[0].length
          end
        end
      end
    end
  end

  # guess schemaless emails
  if (@__compiled__['mailto:'])
    at_pos = text.index('@')
    if !at_pos.nil?
      # We can't skip this check, because these cases are possible:
      # [email protected], [email protected]
      if ((me = text.match(@re[:email_fuzzy])) != nil)

        shift = me.begin(0) + me[1].length
        nextc = me.begin(0) + me[0].length

        if (@__index__ < 0 || shift < @__index__ ||
            (shift == @__index__ && nextc > @__last_index__))
          @__schema__     = 'mailto:'
          @__index__      = shift
          @__last_index__ = nextc
        end
      end
    end
  end

  return @__index__ >= 0
end
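
The way the three passes compete for the final result can be summarized in a small sketch: an earlier start offset wins, and for fuzzy emails a tie on the start offset is broken in favor of the longer match. `Candidate` and `better?` are hypothetical names, and this is a reading of the listing above rather than library code.

```ruby
Candidate = Struct.new(:schema, :index, :last_index)

# Returns true when `challenger` should replace `current`.
def better?(current, challenger, prefer_longer_on_tie: false)
  return true if current.nil?
  return true if challenger.index < current.index

  prefer_longer_on_tie &&
    challenger.index == current.index &&
    challenger.last_index > current.last_index
end

schema_hit = Candidate.new('http:', 10, 30)
fuzzy_hit  = Candidate.new('', 4, 15)
email_hit  = Candidate.new('mailto:', 4, 22)

puts better?(schema_hit, fuzzy_hit)                             # true
puts better?(fuzzy_hit, email_hit, prefer_longer_on_tie: true)  # true
puts better?(fuzzy_hit, email_hit)                              # false
```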

#testSchemaAt(text, schema, pos) ⇒ `Object`

LinkifyIt#testSchemaAt(text, name, position) -> Number

text (String): text to scan
name (String): rule (schema) name
position (Number): text offset to check from

Similar to LinkifyIt#test, but checks only a specific protocol tail exactly at the given position. Returns the length of the found pattern (0 on fail).

# File 'lib/linkify-it-rb/index.rb', line 394

def testSchemaAt(text, schema, pos)
  # If not supported schema check requested - terminate
  if (!@__compiled__[schema.downcase])
    return 0
  end
  return @__compiled__[schema.downcase][:validate].call(text, pos, self)
end

#tlds(list, keepOld = false) ⇒ `Object`

chainable LinkifyIt#tlds(list [, keepOld]) -> this

list (Array): list of tlds
keepOld (Boolean): merge with current list if `true` (`false` by default)

Load (or merge) a new tlds list. These are used for fuzzy links (without prefix) to avoid false positives. By default this algorithm is used:

- hostnames with any 2-letter root zone are ok.
- biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф are ok.
- encoded (`xn--…`) root zones are ok.

If the list is replaced, then an exact match for 2-char root zones will be checked.

# File 'lib/linkify-it-rb/index.rb', line 462

def tlds(list, keepOld = false)
  list = list.is_a?(Array) ? list : [ list ]

  if keepOld
    # Build a new Array instead of using concat, so the previous list is not mutated in place.
    @__tlds__ = (@__tlds__ + list).sort.uniq.reverse
  else
    @__tlds__ = list.dup
    @__tlds_replaced__ = true
  end

  compile
  return self
end
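
The effect of replacing versus keeping the default list can be sketched with a simplified host pattern. `XN_SOURCE`, `tld_source`, and `host_re` are hypothetical stand-ins for this example; the real templates in LinkifyRe are considerably more involved.

```ruby
# Assumed stand-in for the encoded root zone pattern; not the LinkifyRe value.
XN_SOURCE = 'xn--[a-z0-9\\-]{1,59}'

# Join the TLD list into an alternation. The generic 2-letter rule is only
# included while the default list has not been replaced.
def tld_source(tlds, replaced)
  parts = tlds.dup
  parts.push('[a-z]{2}') unless replaced
  parts.push(XN_SOURCE)
  parts.join('|')
end

# Simplified whole-string host check built from the alternation.
def host_re(tlds, replaced)
  Regexp.new('\\A[a-z0-9-]+\\.(?:' + tld_source(tlds, replaced) + ')\\z',
             Regexp::IGNORECASE)
end

defaults = host_re(%w[com org], false)
replaced = host_re(%w[onion], true)

puts defaults.match?('example.de')   # true, via the 2-letter rule
puts replaced.match?('example.de')   # false, 'de' is not in the new list
puts replaced.match?('hidden.onion') # true
```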
