Class: Natter::Parser

Inherits: Object

Defined in: lib/natter/parser.rb

Overview

Public: The parser is the main workhorse, responsible for deriving the intent from an utterance.

Instance Attribute Summary

#known_utterances ⇒ Object readonly

Read access to the Hash containing known utterances.
#rules ⇒ Object readonly

Read access to the Hash containing known rules.

Instance Method Summary

#add_rule(rule) ⇒ Object

Public: Adds a regex-based Rule to the parser.
#add_rules(rules) ⇒ Object

Public: Adds one or more regex-based Rules to the parser.
#add_utterance(example) ⇒ Object

Public: Adds a pre-computed utterance/intent pair to the parser.
#determine_confidences(intents) ⇒ Object

Internal: Determines the confidence of each intent in the passed array and then sorts them based on the calculated confidence values.
#expand_contractions(text) ⇒ Object

Expand the contractions within this string.
#init_contractions ⇒ Object

Private: Initialise the @contractions Hash.
#initialize ⇒ Parser constructor

A new instance of Parser.
#intent_from_match(rule, m) ⇒ Object

Internal: Converts a positive regex match into an Intent object.
#parse(text, use_cache = true) ⇒ Object

Public: Analyse an utterance and return any matching intents.
#purify(t) ⇒ Object

Internal: Tidies up the passed string to remove unnecessary characters and replace ambiguous phrases such as contractions.
#strip_trailing_punctuation(t) ⇒ Object

Internal: Removes trailing ‘?’ and ‘!’ from the passed string.

Constructor Details

#initialize ⇒ `Parser`

Returns a new instance of Parser.

# File 'lib/natter/parser.rb', line 10

def initialize
  @known_utterances = Hash.new # key = utterance, value = Intent
  @contractions = init_contractions # key = contraction, value = expansion
  @intent_cache = Hash.new # key = utterance, value = Intent
  @rules = Hash.new # key = rule regex pattern, value = Rule object
end

Instance Attribute Details

#known_utterances ⇒ `Object` (readonly)

Read access to the Hash containing known utterances.




# File 'lib/natter/parser.rb', line 8

def known_utterances
  @known_utterances
end

#rules ⇒ `Object` (readonly)

Read access to the Hash containing known rules.




# File 'lib/natter/parser.rb', line 6

def rules
  @rules
end

Instance Method Details

#add_rule(rule) ⇒ `Object`

Public: Adds a regex-based Rule to the parser.

rule - The Natter::Rule to add.

Raises:

(ArgumentError)

# File 'lib/natter/parser.rb', line 20

def add_rule(rule)
  raise ArgumentError, "Expected Natter::Rule but got `#{rule}`" unless rule.is_a?(Rule)
  if @rules.has_key?(rule.pattern)
    raise ArgumentError, "Regex pattern already defined by " +\
    "#{@rules[rule.pattern].identifier}: #{rule.pattern}"
  end
  # Make sure that this rule's owning skill is capitalised
  rule.skill.capitalize!
  @rules[rule.pattern] = rule
end
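
The duplicate check above relies on Hash key equality. The source shows only that the pattern is used as a Hash key and passed to String#match, so the following is a minimal sketch that assumes Rule#pattern returns a Regexp. Under that assumption, two patterns count as duplicates when their source and options are equal. Symbols stand in for Rule objects here.

```ruby
# Assumption: patterns are Regexp objects. Symbols stand in for Rule objects.
rules = {}
rules[/\Ahello\z/] = :greeting

same_source   = rules.key?(/\Ahello\z/)    # true: source and options are equal
other_options = rules.key?(/\Ahello\z/i)   # false: the /i option differs
other_source  = rules.key?(/\Ahi\z/)       # false: different source
```

Note that a case-insensitive variant of an existing pattern is treated as a new rule, so both could match the same utterance.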

#add_rules(rules) ⇒ `Object`

Public: Adds one or more regex-based Rules to the parser. A convenience method.

rules - Either a Natter::Rule or an array of Natter::Rules.

# File 'lib/natter/parser.rb', line 35

def add_rules(rules)
  if rules.kind_of?(Array)
    rules.each { |rule| add_rule(rule) }
  else
    add_rule(rules)
  end
end

#add_utterance(example) ⇒ `Object`

Public: Adds a pre-computed utterance/intent pair to the parser. Use this when one or more specific utterances map to a predetermined intent. This saves overhead because no regex processing is required: these utterances are checked before the regex rules. Multiple examples can be added at once. Adding an utterance that already exists overwrites the old one.

example - A Hash where:

key   = A single utterance or array of utterances
value = Natter::Intent

Examples

add_utterance('hello' => Intent.new('greeting'))
add_utterance(['what time is it', 'what is the time'] => Intent.new('currentTime'))
add_utterance(
  'night night' => Intent.new('goodnight'),
  'lock the door' => Intent.new('lock')
)

Returns nothing.

Raises:

(ArgumentError)

# File 'lib/natter/parser.rb', line 64

def add_utterance(example)
  raise ArgumentError, "Expected {utterance => Intent} or {[utterances] => Intent}" unless example.is_a?(Hash)
  example.map do |utterance, intent|
    if utterance.kind_of?(Array)
      utterance.each { |phrase| @known_utterances[phrase] = intent }
    else
      @known_utterances[utterance] = intent
    end
  end
end
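
To illustrate how array keys fan out into one entry per phrase, here is a minimal sketch that uses a plain Hash in place of the parser's internal table. DemoIntent is a hypothetical stand-in for Natter::Intent, and Array() is used as a shorthand for the Array check in the method above.

```ruby
# Hypothetical stand-in for Natter::Intent; only a name is needed here.
DemoIntent = Struct.new(:name)

known = {}
examples = {
  'hello' => DemoIntent.new('greeting'),
  ['what time is it', 'what is the time'] => DemoIntent.new('currentTime')
}

# Array keys fan out to one entry per phrase; plain string keys map directly.
examples.each do |utterance, intent|
  Array(utterance).each { |phrase| known[phrase] = intent }
end

known.keys
# => ["hello", "what time is it", "what is the time"]
known['what is the time'].name
# => "currentTime"
```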

#determine_confidences(intents) ⇒ `Object`

Internal: Determines the confidence of each intent in the passed array and then sorts them based on the calculated confidence values. When more than one intent matches, each intent's confidence is its share of the total entity count across all matches, so the intent with the most entities ranks first. A single match gets a confidence of 1.0, and if no match has any entities the confidence is split equally between them.

intents - An array of Intent objects.

Returns an array of Intent objects sorted by descending confidence. The confidence of each Intent is set in place.

# File 'lib/natter/parser.rb', line 132

def determine_confidences(intents)
  # Handle where there's only one matching intent
  if intents.length == 1
    intents[0].confidence = 1.0
    return intents
  end

  # First determine the total number of entities in any of the intents
  total = 0
  intents.each { |i| total += i.entities.length }

  if total == 0
    # Edge case: all matching intents contain no entities.
    # Assign equal confidence to all intents
    result = intents.map do |i|
      i.confidence = 1.0/intents.length
      i # return this intent from the map
    end
  else
    result = intents.map do |i|
      i.confidence = i.entities.length.to_f/total
      i # return this intent from the map
    end
  end

  # Sort the array by descending confidence values
  result.sort_by { |i| i.confidence }.reverse
end
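
As a worked example of the proportional scoring, the sketch below reimplements the same arithmetic on a hypothetical stand-in for Natter::Intent (ScoredIntent) and a hypothetical helper name (score_by_entity_share). It is not the library's code, only an illustration of the numbers it produces.

```ruby
# Hypothetical stand-in for Natter::Intent with only the fields needed here.
ScoredIntent = Struct.new(:name, :entities, :confidence)

# Each intent's confidence is its share of the total entity count, with an
# equal split when no intent has any entities.
def score_by_entity_share(intents)
  total = intents.sum { |i| i.entities.length }
  intents.each do |i|
    i.confidence = total.zero? ? 1.0 / intents.length : i.entities.length.to_f / total
  end
  intents.sort_by { |i| -i.confidence }
end

intents = [
  ScoredIntent.new('play', ['artist'], 0),
  ScoredIntent.new('play_album', ['artist', 'album', 'year'], 0)
]
ranked = score_by_entity_share(intents)
ranked.map { |i| [i.name, i.confidence] }
# => [["play_album", 0.75], ["play", 0.25]]
```

With four entities in total, the three-entity intent receives 3/4 and the one-entity intent receives 1/4.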

#expand_contractions(text) ⇒ `Object`

Expand the contractions within this string.

Examples

expand_contractions("i'm hot")
# => "i am hot"

# File 'lib/natter/parser.rb', line 318

def expand_contractions(text)
  result = ''
  text.strip.split(' ').each do |word|
    result = result + @contractions.fetch(word, word) + ' '
  end
  return result.strip
end
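
Because the lookup is an exact match on each whitespace-separated token, capitalisation and attached punctuation both prevent expansion. The sketch below shows this with a three-entry subset of the contraction table; the lambda name expand is illustrative only.

```ruby
# A three-entry subset of the contraction table, for illustration only.
contractions = { "i'm" => "i am", "what's" => "what is", "don't" => "do not" }

# Same token-by-token lookup as the method above, written with map/join.
expand = lambda do |text|
  text.strip.split(' ').map { |word| contractions.fetch(word, word) }.join(' ')
end

expand.call("what's the time")   # => "what is the time"
expand.call("I'm hot")           # => "I'm hot" (capitalised key is not in the table)
expand.call("don't, please")     # => "don't, please" (trailing comma defeats the lookup)
```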

#init_contractions ⇒ `Object`

Private: Initialise the @contractions Hash. Only needs doing once. OPTIMISE: Perhaps move these values to an editable text file?

# File 'lib/natter/parser.rb', line 223

def init_contractions
  {
    "that's" => "that is",
    "aren't" => "are not",
    "can't" => "can not",
    "could've" => "could have",
    "couldn't" => "could not",
    "didn't" => "did not",
    "doesn't" => "does not",
    "don't" => "do not",
    "dunno" => "do not know",
    "gonna" => "going to",
    "gotta" => "got to",
    "hadn't" => "had not",
    "hasn't" => "has not",
    "haven't" => "have not",
    "he'd" => "he had",
    "he'll" => "he will",
    "he's" => "he is",
    "how'd" => "how would",
    "how'll" => "how will",
    "how're" => "how are",
    "how's" => "how is",
    "i'd" => "i would",
    "i'll" => "i will",
    "i'm" => "i am",
    "i've" => "i have",
    "isn't" => "is not",
    "it'd" => "it would",
    "it'll" => "it will",
    "it's" => "it is",
    "mightn't" => "might not",
    "might've" => "might have",
    "mustn't" => "must not",
    "must've" => "must have",
    "ol'" => "old",
    "oughtn't" => "ought not",
    "shan't" => "shall not",
    "she'd" => "she would",
    "she'll" => "she will",
    "she's" => "she is",
    "should've" => "should have",
    "shouldn't" => "should not",
    "somebody's" => "somebody is",
    "someone'll" => "someone will",
    "someone's" => "someone is",
    "something'll" => "something will",
    "something's" => "something is",
    "that'll" => "that will",
    "that'd" => "that would",
    "there'd" => "there had",
    "there's" => "there is",
    "they'd" => "they would",
    "they'll" => "they will",
    "they're" => "they are",
    "they've" => "they have",
    "wasn't" => "was not",
    "we'd" => "we had",
    "we'll" => "we will",
    "we're" => "we are",
    "we've" => "we have",
    "weren't" => "were not",
    "what'd" => "what did",
    "what'll" => "what will",
    "what're" => "what are",
    "what's" => "what is",
    "what've" => "what have",
    "when's" => "when is",
    "where'd" => "where did",
    "where's" => "where is",
    "where've" => "where have",
    "who'd" => "who would",
    "who'll" => "who will",
    "who's" => "who is",
    "why'd" => "why did",
    "why're" => "why are",
    "why's" => "why is",
    "won't" => "will not",
    "won't've" => "will not have",
    "would've" => "would have",
    "wouldn't" => "would not",
    "you'd" => "you would",
    "you'll" => "you will",
    "you're" => "you are",
    "you've" => "you have"
  }
end

#intent_from_match(rule, m) ⇒ `Object`

Internal: Converts a positive regex match into an Intent object. Note that the confidence is set to 0 as it will be determined later.

rule - The Rule defining this intent.
m    - The positive regex match.

Returns an Intent, or nil if the named captures do not line up with the entities defined by the rule.

# File 'lib/natter/parser.rb', line 168

def intent_from_match(rule, m)
  if m.named_captures.empty?
    # No capture groups found. Double-check the rule doesn't need any entities
    if rule.entities.empty?
      return Intent.new(rule.name, rule.skill, 0)
    else
      # Expected at least one entity. This can't be a valid match then
      return nil
    end
  else
    # Found some entities. Check they match up with the rule
    intent = Intent.new(rule.name, rule.skill, 0)
    rule.entities.each do |entity|
      if m.named_captures.has_key?(entity.name)
        e = Entity.new(entity.name, entity.type, m.named_captures[entity.name].strip)
        intent.entities << e
      else
        # The rule defines an entity that has no matching named capture
        # group, so this cannot be a valid match
        return nil
      end
    end
    if intent.entities.length != m.named_captures.length
      # Found some entity matches but not all
      return nil
    else
      return intent
    end
  end
end
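
The method reads entity values from MatchData#named_captures, which returns a Hash keyed by group name. The pattern below is hypothetical and not taken from Natter; it only shows the shape of the data the method works with.

```ruby
# Hypothetical pattern with two named groups; not taken from Natter itself.
pattern = /\Aplay (?<track>.+) by (?<artist>.+)\z/
m = "play yesterday by the beatles".match(pattern)

captures = m.named_captures
# => {"track"=>"yesterday", "artist"=>"the beatles"}

# A pattern with no named groups yields an empty Hash, which is the case the
# method checks first.
no_groups = "hello".match(/\Ahello\z/).named_captures
# => {}
```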

#parse(text, use_cache = true) ⇒ `Object`

Public: Analyse an utterance and return any matching intents.

text      - The natural language string to analyse.
use_cache - If true, check a cache of previously returned utterance/intent pairs and return the cached result rather than re-parsing (default: true).

Returns a single Intent when the text matches a known utterance, an array of Intents sorted by confidence when one or more rules match, or nil if the intent cannot be determined.

Raises:

(ArgumentError)

# File 'lib/natter/parser.rb', line 84

def parse(text, use_cache = true)
  raise ArgumentError, "Cannot parse thin air!" unless text.length > 0

  # Store the original string for later
  original = text

  # Tidy up the string for parsing
  utterance = purify(original)

  if @known_utterances.has_key?(utterance)
    return @known_utterances[utterance]
  end

  if use_cache && @intent_cache.has_key?(utterance)
    return @intent_cache[utterance]
  end

  intents = []
  @rules.each do |pattern, rule|
    m = utterance.match(rule.pattern)
    if m == nil
      next
    else
      intent = intent_from_match(rule, m)
      if intent then intents << intent end
    end
  end

  if intents.empty? then return nil end

  # Calculate the confidence of each intent
  intents = determine_confidences(intents)

  # Cache the matches
  @intent_cache[utterance] = intents

  return intents
end
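
The lookup order in the method can be sketched as follows: known utterances first, then the cache, then the rules. This is a simplified illustration that uses Symbols in place of Intent objects and a hypothetical rule set; it skips purification and confidence scoring.

```ruby
known = { 'hello' => :greeting }                      # exact-match table, checked first
cache = {}                                            # results of earlier rule matches
rules = { /\Awhat time is it\z/ => :current_time }    # hypothetical rule set

lookup = lambda do |utterance, use_cache = true|
  return known[utterance] if known.key?(utterance)
  return cache[utterance] if use_cache && cache.key?(utterance)

  matches = rules.select { |pattern, _| utterance.match?(pattern) }.values
  return nil if matches.empty?

  # As in the method above, results are cached even when use_cache is false.
  cache[utterance] = matches
end

from_known = lookup.call('hello')             # => :greeting
from_rules = lookup.call('what time is it')   # => [:current_time]
no_match   = lookup.call('open sesame')       # => nil
```

As in the real method, a known utterance yields a single value while a rule match yields an array, so callers need to handle both shapes as well as nil.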

#purify(t) ⇒ `Object`

Internal: Tidies up the passed string to remove unnecessary characters and replace ambiguous phrases such as contractions.

t - The string to purify.

Examples

str = "what're you doing?!"
str = purify(str)
# => "what are you doing"

# File 'lib/natter/parser.rb', line 209

def purify(t)
  t = expand_contractions(t)
  t = strip_trailing_punctuation(t)
end

#strip_trailing_punctuation(t) ⇒ `Object`

Internal: Removes trailing ‘?’ and ‘!’ from the passed string.

t - The string from which to remove superfluous trailing punctuation.




# File 'lib/natter/parser.rb', line 217

def strip_trailing_punctuation(t)
  t.sub(/[?!]+\z/, '')
end
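
The pattern only removes a run of '?' and '!' anchored at the very end of the string. A short sketch with the same regex, wrapped in an illustrative lambda named strip:

```ruby
# Same regex as the method above, wrapped in a lambda for quick experiments.
strip = ->(t) { t.sub(/[?!]+\z/, '') }

strip.call("what are you doing?!")  # => "what are you doing"
strip.call("really? yes!")          # => "really? yes"
strip.call("stop.")                 # => "stop."
```

Punctuation in the middle of the string and other trailing characters such as '.' are left untouched.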
