Class: Natter::Parser

Inherits: Object
Defined in: lib/natter/parser.rb

Overview

Public: The parser is the main workhorse, responsible for deriving the intent from an utterance.


Constructor Details

#initialize ⇒ Parser

Returns a new instance of Parser.



# File 'lib/natter/parser.rb', line 10

def initialize
  @known_utterances = Hash.new # key = utterance, value = Intent
  @contractions = init_contractions # key = contraction, value = expansion
  @intent_cache = Hash.new # key = utterance, value = Intent
  @rules = Hash.new # key = rule regex pattern, value = Rule object
end

Instance Attribute Details

#known_utterances ⇒ Object (readonly)

Read access to the Hash containing known utterances.



# File 'lib/natter/parser.rb', line 8

def known_utterances
  @known_utterances
end

#rules ⇒ Object (readonly)

Read access to the Hash containing known rules.



# File 'lib/natter/parser.rb', line 6

def rules
  @rules
end

Instance Method Details

#add_rule(rule) ⇒ Object

Public: Adds a regex-based Rule to the parser.

rule - The Natter::Rule to add.

Raises:

  • (ArgumentError) if rule is not a Natter::Rule, or if another rule already uses the same regex pattern.


# File 'lib/natter/parser.rb', line 20

def add_rule(rule)
  raise ArgumentError, "Expected Natter::Rule but got `#{rule}`" unless rule.is_a?(Rule)
  if @rules.has_key?(rule.pattern)
    raise ArgumentError, "Regex pattern already defined by " +\
    "#{@rules[rule.pattern].identifier}: #{rule.pattern}"
  end
  # Make sure that this rule's owning skill is capitalised
  rule.skill.capitalize!
  @rules[rule.pattern] = rule
end
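A minimal sketch of the two checks above, for illustration only. It uses a hypothetical `Rule` Struct in place of Natter::Rule, whose constructor is not documented on this page, and reports the clashing rule's `name` where the real method uses `identifier`. It also makes a side effect of `capitalize!` visible: the caller's skill String is modified in place, every character after the first is downcased, and a frozen String would raise FrozenError.

```ruby
# Hypothetical stand-in for Natter::Rule: only the fields add_rule touches.
Rule = Struct.new(:name, :skill, :pattern)

class RuleRegistry
  attr_reader :rules

  def initialize
    @rules = {} # key = regex pattern, value = Rule
  end

  # Mirrors the checks in Parser#add_rule: type check, then duplicate pattern.
  def add_rule(rule)
    raise ArgumentError, "Expected Rule but got `#{rule}`" unless rule.is_a?(Rule)
    if @rules.key?(rule.pattern)
      raise ArgumentError, "Regex pattern already defined by #{@rules[rule.pattern].name}: #{rule.pattern}"
    end
    rule.skill.capitalize! # mutates the caller's String in place
    @rules[rule.pattern] = rule
  end
end

registry = RuleRegistry.new
skill = +"homeAutomation" # unary + guarantees an unfrozen String
registry.add_rule(Rule.new("lights_on", skill, /turn on the lights/))
puts skill # capitalize! also downcased the "A"

begin
  # Same pattern as an existing rule, so this is rejected.
  registry.add_rule(Rule.new("lamp_on", +"home", /turn on the lights/))
rescue ArgumentError => e
  puts e.message
end
```

Because duplicate detection is a Hash key lookup, two Regexp patterns clash only when Ruby considers them equal (identical source and options), not when they merely match the same text.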

#add_rules(rules) ⇒ Object

Public: Adds one or more regex-based Rules to the parser. A convenience wrapper around #add_rule.

rules - Either a Natter::Rule or an array of Natter::Rules.



# File 'lib/natter/parser.rb', line 35

def add_rules(rules)
  if rules.kind_of?(Array)
    rules.each { |rule| add_rule(rule) }
  else
    add_rule(rules)
  end
end

#add_utterance(example) ⇒ Object

Public: Adds pre-computed utterance/intent pairs to the parser. Use this when one or more specific utterances map to a predetermined intent: no regex matching is needed, which saves overhead, and these utterances are checked before the regex rules. Several examples can be added in one call, and adding an utterance that already exists overwrites the old one. Because #parse purifies its input before this lookup (contractions expanded, trailing '?' and '!' removed), keys should be supplied in that purified form.

example - A Hash where:

key   = A single utterance or array of utterances
value = Natter::Intent

Examples

add_utterance('hello' => Intent.new('greeting'))
add_utterance(['what time is it', 'what is the time'] => Intent.new('currentTime'))
add_utterance(
  'night night' => Intent.new('goodnight'),
  'lock the door' => Intent.new('lock')
)

Returns nothing.

Raises:

  • (ArgumentError) if example is not a Hash.


# File 'lib/natter/parser.rb', line 64

def add_utterance(example)
  raise ArgumentError, "Expected {utterance => Intent} or {[utterances] => Intent}" unless example.is_a?(Hash)
  example.each do |utterance, intent| # iterate for side effects only
    if utterance.kind_of?(Array)
      utterance.each { |phrase| @known_utterances[phrase] = intent }
    else
      @known_utterances[utterance] = intent
    end
  end
end
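The lookup above and #parse work on different forms of the text: #parse purifies its input first, while add_utterance stores keys exactly as given. The sketch below reproduces that interaction with a tiny hypothetical contraction table and Symbols standing in for Intent objects; its `purify` mirrors the one documented further down but is not the library's code.

```ruby
# Hypothetical subset of the parser's contraction table.
CONTRACTIONS = { "what's" => "what is", "it's" => "it is" }.freeze

# Stand-in for Parser#purify: expand contractions, then drop trailing ?/!.
def purify(text)
  expanded = text.split.map { |word| CONTRACTIONS.fetch(word, word) }.join(" ")
  expanded.sub(/[?!]+\z/, "")
end

known_utterances = {}
# Stored exactly as typed: contains a contraction and a question mark.
known_utterances["what's the time?"] = :current_time
# Stored in purified form.
known_utterances[purify("what time is it?")] = :current_time

# Like #parse, purify the incoming text before the Hash lookup.
lookup = ->(text) { known_utterances[purify(text)] }

p lookup.call("what's the time?") # prints nil: the raw key is never hit
p lookup.call("what time is it?") # prints :current_time
```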

#determine_confidences(intents) ⇒ Object

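Internal: Assigns a confidence to each intent in the passed array and sorts the intents by it. A single intent gets a confidence of 1.0. Otherwise each intent's confidence is its share of the total number of entities across all the intents, on the premise that the intent with the most entities is most likely the best match; if none of the intents has any entities, the confidence is split equally between them.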

intents - An array of Intent objects.

Returns an array of Intent objects sorted by descending confidence. The confidence of each Intent in the passed array is set in place; the passed array itself is not reordered.



# File 'lib/natter/parser.rb', line 132

def determine_confidences(intents)
  # Handle where there's only one matching intent
  if intents.length == 1
    intents[0].confidence = 1.0
    return intents
  end

  # First determine the total number of entities in any of the intents
  total = 0
  intents.each { |i| total += i.entities.length }

  if total == 0
    # Edge case: all matching intents contain no entities.
    # Assign equal confidence to all intents
    result = intents.map do |i|
      i.confidence = 1.0/intents.length
      i # return this intent from the map
    end
  else
    result = intents.map do |i|
      i.confidence = i.entities.length.to_f/total
      i # return this intent from the map
    end
  end

  # Sort the array by descending confidence values
  result.sort_by { |i| i.confidence }.reverse
end
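A worked example of the proportional scheme, as a self-contained sketch. `Intent` here is a hypothetical Struct with only the fields the calculation needs, and the sort negates the confidence instead of calling `reverse`, which only changes the order of intents with equal confidence.

```ruby
# Hypothetical stand-in for Natter::Intent with just the fields used here.
Intent = Struct.new(:name, :entities, :confidence)

# Same proportional scheme as Parser#determine_confidences.
def determine_confidences(intents)
  if intents.length == 1
    intents[0].confidence = 1.0
    return intents
  end

  total = intents.sum { |i| i.entities.length }
  intents.each do |i|
    i.confidence =
      if total.zero?
        1.0 / intents.length           # no entities anywhere: split evenly
      else
        i.entities.length.to_f / total # share of all matched entities
      end
  end
  intents.sort_by { |i| -i.confidence } # descending confidence
end

ranked = determine_confidences([
  Intent.new("play", ["song"], nil),
  Intent.new("play_artist", ["song", "artist"], nil),
  Intent.new("stop", [], nil)
])
ranked.each { |i| puts format("%-12s %.3f", i.name, i.confidence) }
```

With 2, 1 and 0 entities (3 in total), the intents receive 2/3, 1/3 and 0, so play_artist ranks first. An intent with no entities always scores 0 whenever any other matching intent has entities.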

#expand_contractions(text) ⇒ Object

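Expands known contractions in text, word by word, and returns the result. Words are split on whitespace and looked up verbatim in @contractions, whose keys are all lowercase with a straight apostrophe, so words such as "I'm" or "it's," are left unchanged.

Examples

t = "i'm hot"
expand_contractions(t)

# => "i am hot"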


# File 'lib/natter/parser.rb', line 318

def expand_contractions(text)
  # Split on whitespace, replace each known contraction, then re-join with single spaces
  text.split.map { |word| @contractions.fetch(word, word) }.join(' ')
end
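The word-by-word lookup has a few consequences worth seeing concretely. This sketch reimplements the method against a small hypothetical subset of the contraction table (the real table is built by #init_contractions).

```ruby
# Tiny hypothetical subset of the parser's contraction table.
CONTRACTIONS = { "i'm" => "i am", "it's" => "it is", "don't" => "do not" }.freeze

# Word-by-word lookup, equivalent to Parser#expand_contractions.
def expand_contractions(text)
  text.split.map { |word| CONTRACTIONS.fetch(word, word) }.join(" ")
end

p expand_contractions("i'm  hot")    # "i am hot": runs of spaces collapse
p expand_contractions("I'm hot")     # "I'm hot": keys are lowercase
p expand_contractions("don't, stop") # "don't, stop": punctuation blocks the match
p expand_contractions("i’m hot")     # "i’m hot": a curly apostrophe is not '
```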

#init_contractions ⇒ Object

Private: Builds the Hash used to initialise @contractions (key = contraction, value = expansion). Only needs doing once. OPTIMISE: Perhaps move these values to an editable text file?



# File 'lib/natter/parser.rb', line 223

def init_contractions
  {
    "that's" => "that is",
    "aren't" => "are not",
    "can't" => "can not",
    "could've" => "could have",
    "couldn't" => "could not",
    "didn't" => "did not",
    "doesn't" => "does not",
    "don't" => "do not",
    "dunno" => "do not know",
    "gonna" => "going to",
    "gotta" => "got to",
    "hadn't" => "had not",
    "hasn't" => "has not",
    "haven't" => "have not",
    "he'd" => "he had",
    "he'll" => "he will",
    "he's" => "he is",
    "how'd" => "how would",
    "how'll" => "how will",
    "how're" => "how are",
    "how's" => "how is",
    "i'd" => "i would",
    "i'll" => "i will",
    "i'm" => "i am",
    "i've" => "i have",
    "isn't" => "is not",
    "it'd" => "it would",
    "it'll" => "it will",
    "it's" => "it is",
    "mightn't" => "might not",
    "might've" => "might have",
    "mustn't" => "must not",
    "must've" => "must have",
    "ol'" => "old",
    "oughtn't" => "ought not",
    "shan't" => "shall not",
    "she'd" => "she would",
    "she'll" => "she will",
    "she's" => "she is",
    "should've" => "should have",
    "shouldn't" => "should not",
    "somebody's" => "somebody is",
    "someone'll" => "someone will",
    "someone's" => "someone is",
    "something'll" => "something will",
    "something's" => "something is",
    "that'll" => "that will",
    "that'd" => "that would",
    "there'd" => "there had",
    "there's" => "there is",
    "they'd" => "they would",
    "they'll" => "they will",
    "they're" => "they are",
    "they've" => "they have",
    "wasn't" => "was not",
    "we'd" => "we had",
    "we'll" => "we will",
    "we're" => "we are",
    "we've" => "we have",
    "weren't" => "were not",
    "what'd" => "what did",
    "what'll" => "what will",
    "what're" => "what are",
    "what's" => "what is",
    "what've" => "what have",
    "when's" => "when is",
    "where'd" => "where did",
    "where's" => "where is",
    "where've" => "where have",
    "who'd" => "who would",
    "who'll" => "who will",
    "who's" => "who is",
    "why'd" => "why did",
    "why're" => "why are",
    "why's" => "why is",
    "won't" => "will not",
    "won't've" => "will not have",
    "would've" => "would have",
    "wouldn't" => "would not",
    "you'd" => "you would",
    "you'll" => "you will",
    "you're" => "you are",
    "you've" => "you have"
  }
end
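One way the OPTIMISE note could be addressed, sketched under assumptions: the file format (one `contraction = expansion` pair per line, with `#` comments) and the `load_contractions` helper are hypothetical, not part of Natter. Keys are downcased to match the lowercase lookups in #expand_contractions.

```ruby
require "stringio"

# Hypothetical loader. Assumed format: one "contraction = expansion" pair
# per line; blank lines and lines starting with '#' are ignored.
def load_contractions(io)
  io.each_line.with_object({}) do |line, table|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    contraction, expansion = line.split("=", 2).map(&:strip)
    raise ArgumentError, "Malformed line: #{line}" if expansion.nil? || expansion.empty?
    table[contraction.downcase] = expansion
  end
end

# StringIO stands in for an opened file here.
sample = StringIO.new(<<~TEXT)
  # contraction = expansion
  that's = that is
  won't've = will not have

  ol' = old
TEXT

contractions = load_contractions(sample)
p contractions
```

In the parser, `init_contractions` could then return something like `File.open(path) { |f| load_contractions(f) }` for a configurable `path`, which is likewise hypothetical.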

#intent_from_match(rule, m) ⇒ Object

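Internal: Converts a positive regex match into an Intent object. The match is accepted only if the pattern's named capture groups correspond exactly to the entities defined by the rule. The confidence is set to 0 here because it is determined later.

rule - The Rule defining this intent.
m    - The positive regex match (MatchData).

Returns an Intent, or nil if the named captures and the rule's entities do not match up.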



# File 'lib/natter/parser.rb', line 168

def intent_from_match(rule, m)
  if m.named_captures.empty?
    # No capture groups found. Double-check the rule doesn't need any entities
    if rule.entities.empty?
      return Intent.new(rule.name, rule.skill, 0)
    else
      # Expected at least one entity. This can't be a valid match then
      return nil
    end
  else
    # Found some entities. Check they match up with the rule
    intent = Intent.new(rule.name, rule.skill, 0)
    rule.entities.each do |entity|
      if m.named_captures.has_key?(entity.name)
        e = Entity.new(entity.name, entity.type, m.named_captures[entity.name].strip)
        intent.entities << e
      else
        # The rule defines an entity that has no matching named capture
        # group in the pattern
        return nil
      end
    end
    if intent.entities.length != m.named_captures.length
      # The pattern has named capture groups that the rule does not
      # define as entities
      return nil
    else
      return intent
    end
  end
end
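The sketch below expresses the same acceptance rule as a set comparison between capture-group names and declared entity names. `Rule` and `Intent` are hypothetical Structs, entity values are returned as a Hash, and the handling of optional groups is a design choice of the sketch: Ruby's `MatchData#named_captures` reports a group that did not participate with a nil value, which the method above would pass to `strip` and fail with NoMethodError.

```ruby
# Hypothetical stand-ins for Natter::Rule and Natter::Intent.
Rule   = Struct.new(:name, :entity_names, :pattern)
Intent = Struct.new(:name, :entities)

# Accept the match only when the named capture groups and the rule's
# declared entities are exactly the same set of names.
def intent_from_match(rule, m)
  captures = m.named_captures # {"group name" => String or nil}
  return nil unless captures.keys.sort == rule.entity_names.sort
  # An optional group that did not participate captures nil; treat that
  # as "no match" rather than calling #strip on nil.
  return nil if captures.values.any?(&:nil?)

  Intent.new(rule.name, captures.transform_values(&:strip))
end

timer = Rule.new("set_timer", ["duration"], /set a timer for (?<duration>.+)/)
p intent_from_match(timer, timer.pattern.match("set a timer for 5 minutes "))

# The pattern also captures "room", but the rule only declares "duration".
bad = Rule.new("bad", ["duration"], /for (?<duration>\w+) in (?<room>\w+)/)
p intent_from_match(bad, bad.pattern.match("for ten in kitchen")) # prints nil

# An optional group that does not match is still listed, with a nil value.
p(/lights(?: in (?<room>\w+))?/.match("lights").named_captures)
```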

#parse(text, use_cache = true) ⇒ Object

Public: Analyse an utterance and return any matching intents.

text      - The natural language String to analyse.
use_cache - If true, return a previously computed result for this
            utterance from the cache rather than re-parsing
            (default: true).

Returns the Intent registered for a known utterance; otherwise an array of matching Intents sorted by descending confidence, or nil if no rule matches.

Raises:

  • (ArgumentError) if text is empty.


# File 'lib/natter/parser.rb', line 84

def parse(text, use_cache = true)
  raise ArgumentError, "Cannot parse thin air!" unless text.length > 0

  # Store the original string for later
  original = text

  # Tidy up the string for parsing
  utterance = purify(original)

  if @known_utterances.has_key?(utterance)
    return @known_utterances[utterance]
  end

  if use_cache && @intent_cache.has_key?(utterance)
    return @intent_cache[utterance]
  end

  intents = []
  @rules.each do |pattern, rule|
    m = utterance.match(rule.pattern)
    if m == nil
      next
    else
      intent = intent_from_match(rule, m)
      if intent then intents << intent end
    end
  end

  if intents.empty? then return nil end

  # Calculate the confidence of each intent
  intents = determine_confidences(intents)

  # Cache the matches
  @intent_cache[utterance] = intents

  return intents
end
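A miniature of the lookup order in #parse, as a hypothetical sketch: intents are Symbols, rules are a Hash of Regexp to Symbol, `purify` is reduced to stripping trailing punctuation, and confidence scoring is left out. A counter shows that a repeated utterance is served from the cache instead of re-scanning the rules.

```ruby
# Hypothetical miniature of Parser#parse's lookup order (not Natter's code).
class MiniParser
  attr_reader :rule_scans

  def initialize(known_utterances, rules)
    @known_utterances = known_utterances # utterance => intent
    @rules = rules                       # Regexp => intent
    @intent_cache = {}                   # utterance => [intents]
    @rule_scans = 0                      # counts full rule scans, for illustration
  end

  def parse(text, use_cache = true)
    raise ArgumentError, "Cannot parse thin air!" if text.empty?
    utterance = text.sub(/[?!]+\z/, "") # stand-in for purify

    # 1. Exact known utterances win, and return a single intent.
    return @known_utterances[utterance] if @known_utterances.key?(utterance)
    # 2. Previously computed rule matches.
    return @intent_cache[utterance] if use_cache && @intent_cache.key?(utterance)

    # 3. Scan every rule and collect all matches as an array.
    @rule_scans += 1
    intents = @rules.select { |pattern, _| pattern.match?(utterance) }.values
    return nil if intents.empty? # misses are not cached

    @intent_cache[utterance] = intents
  end
end

parser = MiniParser.new({ "hello" => :greeting },
                        { /\blights?\b/ => :lights, /\bon\b/ => :switch_on })
p parser.parse("hello!")
p parser.parse("turn the lights on")
p parser.parse("turn the lights on?") # same purified text, served from cache
p parser.rule_scans
```

Two behaviours carry over from the real method: a known utterance returns a single intent while rule matches return an array, and a miss returns nil without being cached, so it is re-scanned on every call.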

#purify(t) ⇒ Object

Internal: Normalises the passed string for matching by expanding contractions and removing any trailing '?' and '!' characters.

t - The string to purify.

Examples

str = "what're you doing?!"
str = purify(str)

# => "what are you doing"


# File 'lib/natter/parser.rb', line 209

def purify(t)
  t = expand_contractions(t)
  t = strip_trailing_punctuation(t)
end

#strip_trailing_punctuation(t) ⇒ Object

Internal: Removes any run of trailing '?' and '!' characters from the passed string. Returns a new String; t itself is not modified.

t - The string from which to remove superfluous trailing punctuation.



# File 'lib/natter/parser.rb', line 217

def strip_trailing_punctuation(t)
  t.sub(/[?!]+\z/, '')
end
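Because the pattern is anchored with `\z` (the absolute end of the string), only a final run of '?' and '!' is removed. A standalone copy of the method shows the edge cases:

```ruby
# Same regex as Parser#strip_trailing_punctuation: one or more '?' or '!'
# anchored with \z, the absolute end of the string.
def strip_trailing_punctuation(t)
  t.sub(/[?!]+\z/, "")
end

p strip_trailing_punctuation("are you there?!?") # "are you there"
p strip_trailing_punctuation("what?! no way")    # unchanged: not trailing
p strip_trailing_punctuation("lights on.")       # unchanged: only ? and ! are removed
p strip_trailing_punctuation("hello!\n")         # unchanged: \z does not skip a newline
```

Unlike `\Z`, `\z` does not allow a trailing newline, so "hello!\n" is left untouched; inside #purify this cannot arise, because #expand_contractions has already split and re-joined the text on whitespace.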