Class: Natter::Parser
- Inherits: Object
- Defined in: lib/natter/parser.rb
Overview
Public: The parser is the main workhorse, responsible for deriving the intent from an utterance.
Instance Attribute Summary
- #known_utterances ⇒ Object (readonly)
  Read access to the Hash containing known utterances.
- #rules ⇒ Object (readonly)
  Read access to the Hash containing known rules.
Instance Method Summary
- #add_rule(rule) ⇒ Object
  Public: Adds a regex-based Rule to the parser.
- #add_rules(rules) ⇒ Object
  Public: Adds one or more regex-based Rules to the parser.
- #add_utterance(example) ⇒ Object
  Public: Adds a pre-computed utterance/intent pair to the parser.
- #determine_confidences(intents) ⇒ Object
  Internal: Determines the confidence of each intent in the passed array and then sorts them based on the calculated confidence values.
- #expand_contractions(text) ⇒ Object
  Expands the contractions within the passed string.
- #init_contractions ⇒ Object
  Private: Initialise the @contractions Hash.
- #initialize ⇒ Parser (constructor)
  Returns a new instance of Parser.
- #intent_from_match(rule, m) ⇒ Object
  Internal: Converts a positive regex match into an Intent object.
- #parse(text, use_cache = true) ⇒ Object
  Public: Analyse an utterance and return any matching intents.
- #purify(t) ⇒ Object
  Internal: Tidies up the passed string to remove unnecessary characters and replace ambiguous phrases such as contractions.
- #strip_trailing_punctuation(t) ⇒ Object
  Internal: Removes trailing ‘?’ and ‘!’ from the passed string.
Constructor Details
#initialize ⇒ Parser
Returns a new instance of Parser.
    # File 'lib/natter/parser.rb', line 10
    def initialize
      @known_utterances = Hash.new # key = utterance, value = Intent
      @contractions = init_contractions # key = contraction, value = expansion
      @intent_cache = Hash.new # key = utterance, value = Intent
      @rules = Hash.new # key = rule regex pattern, value = Rule object
    end
Instance Attribute Details
#known_utterances ⇒ Object (readonly)
Read access to the Hash containing known utterances.
    # File 'lib/natter/parser.rb', line 8
    def known_utterances
      @known_utterances
    end
#rules ⇒ Object (readonly)
Read access to the Hash containing known rules.
    # File 'lib/natter/parser.rb', line 6
    def rules
      @rules
    end
Instance Method Details
#add_rule(rule) ⇒ Object
Public: Adds a regex-based Rule to the parser.
rule - The Natter::Rule to add.
    # File 'lib/natter/parser.rb', line 20
    def add_rule(rule)
      raise ArgumentError, "Expected Natter::Rule but got `#{rule}`" unless rule.is_a?(Rule)
      if @rules.has_key?(rule.pattern)
        raise ArgumentError, "Regex pattern already defined by " +\
          "#{@rules[rule.pattern].identifier}: #{rule.pattern}"
      end
      # Make sure that this rule's owning skill is capitalised
      rule.skill.capitalize!
      @rules[rule.pattern] = rule
    end
#add_rules(rules) ⇒ Object
Public: Adds one or more regex-based Rules to the parser. A convenience method.
rules - Either a Natter::Rule or an array of Natter::Rules.
    # File 'lib/natter/parser.rb', line 35
    def add_rules(rules)
      if rules.kind_of?(Array)
        rules.each { |rule| add_rule(rule) }
      else
        add_rule(rules)
      end
    end
#add_utterance(example) ⇒ Object
Public: Adds a pre-computed utterance/intent pair to the parser. Use this when one or more specific utterances map to a predetermined intent; it saves overhead because no regex processing is required. These utterances are evaluated before the regex rules. Multiple examples can be added at once. Adding an utterance that already exists overwrites the old one.
example - A Hash where:
key = A single utterance or array of utterances
value = Natter::Intent
Examples
    add_utterance('hello' => Intent.new('greeting'))
    add_utterance(['what time is it', 'what is the time'] => Intent.new('currentTime'))
    add_utterance(
      'night night' => Intent.new('goodnight'),
      'lock the door' => Intent.new('lock')
    )
Returns nothing.
    # File 'lib/natter/parser.rb', line 64
    def add_utterance(example)
      raise ArgumentError, "Expected {utterance => Intent} or {[utterances] => Intent}" unless example.is_a?(Hash)
      example.map do |utterance, intent|
        if utterance.kind_of?(Array)
          utterance.each { |phrase| @known_utterances[phrase] = intent }
        else
          @known_utterances[utterance] = intent
        end
      end
    end
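The flattening of single and Array keys into one lookup table can be tried without the gem. The following is only a minimal sketch: `flatten_examples` is a hypothetical helper, not part of Natter, and plain Symbols stand in for Natter::Intent objects.

```ruby
# Hypothetical stand-in for Parser#add_utterance: merges examples into a
# lookup table, expanding Array keys into one entry per phrase.
def flatten_examples(table, example)
  raise ArgumentError, "Expected a Hash" unless example.is_a?(Hash)

  example.each do |utterance, intent|
    # Array() wraps a single String and leaves an Array untouched.
    Array(utterance).each { |phrase| table[phrase] = intent }
  end
  table
end

table = {}
flatten_examples(table, { 'hello' => :greeting })
flatten_examples(table, { ['what time is it', 'what is the time'] => :current_time })
flatten_examples(table, { 'hello' => :salutation }) # overwrites the earlier entry

p table
```

Because each phrase is stored as an exact Hash key, lookups only succeed when the purified utterance matches the stored phrase character for character.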
#determine_confidences(intents) ⇒ Object
Internal: Determines the confidence of each intent in the passed array and then sorts them based on the calculated confidence values. Basically, if we have more than one intent then whichever intent has the greatest number of entities is likely to be the best match.
intents - An array of Intent objects.
Returns an array of Intent objects sorted by descending confidence. Sets the confidence of each Intent in the passed array in place.
    # File 'lib/natter/parser.rb', line 132
    def determine_confidences(intents)
      # Handle where there's only one matching intent
      if intents.length == 1
        intents[0].confidence = 1.0
        return intents
      end

      # First determine the total number of entities in any of the intents
      total = 0
      intents.each { |i| total += i.entities.length }

      if total == 0
        # Edge case: all matching intents contain no entities.
        # Assign equal confidence to all intents
        result = intents.map do |i|
          i.confidence = 1.0/intents.length
          i # return this intent from the map
        end
      else
        result = intents.map do |i|
          i.confidence = i.entities.length.to_f/total
          i # return this intent from the map
        end
      end

      # Sort the array by descending confidence values
      result.sort_by { |i| i.confidence }.reverse
    end
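To make the arithmetic concrete, here is a small standalone sketch of the same rule. `SketchIntent` and `rank_by_entities` are hypothetical names used for illustration; they are not part of Natter, and entities are plain Strings rather than Entity objects.

```ruby
# Stand-in for Natter::Intent, holding only what the calculation needs.
SketchIntent = Struct.new(:name, :entities, :confidence)

# Each intent's confidence is its share of all captured entities, or an
# equal share when no intent captured any entity at all.
def rank_by_entities(intents)
  total = intents.sum { |i| i.entities.length }
  intents.each do |i|
    i.confidence = total.zero? ? 1.0 / intents.length : i.entities.length.to_f / total
  end
  intents.sort_by { |i| -i.confidence }
end

RANKED = rank_by_entities([
  SketchIntent.new('weather', ['today']),
  SketchIntent.new('forecast', ['today', 'london', 'celsius'])
])
RANKED.each { |i| puts "#{i.name}: #{i.confidence}" }
# forecast: 0.75
# weather: 0.25
```

One consequence of this scheme is that an intent with no entities receives a confidence of 0.0 whenever a competing intent captured at least one entity.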
#expand_contractions(text) ⇒ Object
Expands the contractions within the passed string. Words are split on whitespace and looked up in the @contractions Hash, whose keys are lower case.
Examples
    expand_contractions("i'm hot")
    # => "i am hot"
    # File 'lib/natter/parser.rb', line 318
    def expand_contractions(text)
      result = ''
      text.strip.split(' ').each do |word|
        result = result + @contractions.fetch(word, word) + ' '
      end
      return result.strip
    end
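Because the lookup is an exact, word-by-word Hash fetch, capitalised words and words with attached punctuation fall through unchanged. The sketch below illustrates this with a two-entry excerpt of the table; `SKETCH_CONTRACTIONS` and `expand_words` are hypothetical names, not part of Natter.

```ruby
# A tiny excerpt of the contractions table; keys are lower case, as in
# Parser#init_contractions.
SKETCH_CONTRACTIONS = { "i'm" => "i am", "what's" => "what is" }.freeze

# Same word-by-word lookup as Parser#expand_contractions.
def expand_words(text)
  text.strip.split(' ').map { |word| SKETCH_CONTRACTIONS.fetch(word, word) }.join(' ')
end

puts expand_words("i'm hot")           # i am hot
puts expand_words("I'm hot")           # I'm hot (capitalised key is not found)
puts expand_words("I'm hot".downcase)  # i am hot
puts expand_words("what's?")           # what's? (punctuation is part of the word)
```

Callers who want case-insensitive expansion would presumably need to downcase the utterance before handing it to the parser.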
#init_contractions ⇒ Object
Private: Initialise the @contractions Hash. Only needs doing once. OPTIMISE: Perhaps move these values to an editable text file?
    # File 'lib/natter/parser.rb', line 223
    def init_contractions
      {
        "that's" => "that is",
        "aren't" => "are not",
        "can't" => "can not",
        "could've" => "could have",
        "couldn't" => "could not",
        "didn't" => "did not",
        "doesn't" => "does not",
        "don't" => "do not",
        "dunno" => "do not know",
        "gonna" => "going to",
        "gotta" => "got to",
        "hadn't" => "had not",
        "hasn't" => "has not",
        "haven't" => "have not",
        "he'd" => "he had",
        "he'll" => "he will",
        "he's" => "he is",
        "how'd" => "how would",
        "how'll" => "how will",
        "how're" => "how are",
        "how's" => "how is",
        "i'd" => "i would",
        "i'll" => "i will",
        "i'm" => "i am",
        "i've" => "i have",
        "isn't" => "is not",
        "it'd" => "it would",
        "it'll" => "it will",
        "it's" => "it is",
        "mightn't" => "might not",
        "might've" => "might have",
        "mustn't" => "must not",
        "must've" => "must have",
        "ol'" => "old",
        "oughtn't" => "ought not",
        "shan't" => "shall not",
        "she'd" => "she would",
        "she'll" => "she will",
        "she's" => "she is",
        "should've" => "should have",
        "shouldn't" => "should not",
        "somebody's" => "somebody is",
        "someone'll" => "someone will",
        "someone's" => "someone is",
        "something'll" => "something will",
        "something's" => "something is",
        "that'll" => "that will",
        "that'd" => "that would",
        "there'd" => "there had",
        "there's" => "there is",
        "they'd" => "they would",
        "they'll" => "they will",
        "they're" => "they are",
        "they've" => "they have",
        "wasn't" => "was not",
        "we'd" => "we had",
        "we'll" => "we will",
        "we're" => "we are",
        "we've" => "we have",
        "weren't" => "were not",
        "what'd" => "what did",
        "what'll" => "what will",
        "what're" => "what are",
        "what's" => "what is",
        "what've" => "what have",
        "when's" => "when is",
        "where'd" => "where did",
        "where's" => "where is",
        "where've" => "where have",
        "who'd" => "who would",
        "who'll" => "who will",
        "who's" => "who is",
        "why'd" => "why did",
        "why're" => "why are",
        "why's" => "why is",
        "won't" => "will not",
        "won't've" => "will not have",
        "would've" => "would have",
        "wouldn't" => "would not",
        "you'd" => "you would",
        "you'll" => "you will",
        "you're" => "you are",
        "you've" => "you have"
      }
    end
#intent_from_match(rule, m) ⇒ Object
Internal: Converts a positive regex match into an Intent object. Note that the confidence is set to 0 as it will be determined later.
rule - The Rule defining this intent.
m - The positive regex match.
Returns an Intent, or nil if the named captures in the match do not line up with the entities defined by the rule.
    # File 'lib/natter/parser.rb', line 168
    def intent_from_match(rule, m)
      if m.named_captures.empty?
        # No capture groups found. Double-check the rule doesn't need any entities
        if rule.entities.empty?
          return Intent.new(rule.name, rule.skill, 0)
        else
          # Expected at least one entity. This can't be a valid match then
          return nil
        end
      else
        # Found some entities. Check they match up with the rule
        intent = Intent.new(rule.name, rule.skill, 0)
        rule.entities.each do |entity|
          if m.named_captures.has_key?(entity.name)
            e = Entity.new(entity.name, entity.type, m.named_captures[entity.name].strip)
            intent.entities << e
          else
            # Found a named capture group that doesn't match an entity defined
            # in the rule
            return nil
          end
        end
        if intent.entities.length != m.named_captures.length
          # Found some entity matches but not all
          return nil
        else
          return intent
        end
      end
    end
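The check above relies on Ruby's MatchData#named_captures, which returns a Hash keyed by capture group name. The following sketch shows the same idea in isolation; the pattern, `EXPECTED_ENTITIES` and `entities_from` are hypothetical and chosen only for illustration, and a plain Hash stands in for the Intent and Entity objects.

```ruby
# Hypothetical rule: a pattern plus the entity names it is expected to capture.
PATTERN = /\Aset a timer for (?<duration>\d+) (?<unit>minutes|seconds)\z/
EXPECTED_ENTITIES = %w[duration unit].freeze

# Returns a Hash of entity name => captured text, or nil when the match is
# missing or does not supply exactly the expected entities.
def entities_from(utterance)
  m = utterance.match(PATTERN)
  return nil if m.nil?

  captures = m.named_captures
  return nil unless EXPECTED_ENTITIES.all? { |name| captures.key?(name) }
  return nil unless captures.length == EXPECTED_ENTITIES.length

  captures.transform_values(&:strip)
end

p entities_from('set a timer for 10 minutes')
p entities_from('set an alarm')
```

Requiring both conditions means a rule only produces an intent when its declared entities and the pattern's named groups agree one to one.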
#parse(text, use_cache = true) ⇒ Object
Public: Analyse an utterance and return any matching intents.
text - The natural language string to analyse.
use_cache - If true then we will check a cache of previously returned
            utterance/intent pairs to return rather than re-parsing
            (default: true).
Returns an Intent, an array of Intents or nil if the intent cannot be determined.
    # File 'lib/natter/parser.rb', line 84
    def parse(text, use_cache = true)
      raise ArgumentError, "Cannot parse thin air!" unless text.length > 0

      # Store the original string for later
      original = text

      # Tidy up the string for parsing
      utterance = purify(original)

      if @known_utterances.has_key?(utterance)
        return @known_utterances[utterance]
      end

      if use_cache && @intent_cache.has_key?(utterance)
        return @intent_cache[utterance]
      end

      intents = []
      @rules.each do |pattern, rule|
        m = utterance.match(rule.pattern)
        if m == nil
          next
        else
          intent = intent_from_match(rule, m)
          if intent then intents << intent end
        end
      end

      if intents.empty? then return nil end

      # Calculate the confidence of each intent
      intents = determine_confidences(intents)

      # Cache the matches
      @intent_cache[utterance] = intents

      return intents
    end
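The lookup order (known utterances first, then the cache, then the rules) can be seen in a stripped-down sketch. `SketchParser` is a hypothetical miniature, not the Natter class: it assumes the utterance is already purified, uses Symbols instead of Intent objects, and skips confidence scoring.

```ruby
# Hypothetical miniature of the lookup order in Parser#parse.
class SketchParser
  attr_reader :rule_scans

  def initialize(known, rules)
    @known = known   # utterance => intent
    @rules = rules   # Regexp => intent
    @cache = {}      # utterance => [intents]
    @rule_scans = 0  # counts how often the rules were scanned
  end

  def parse(utterance, use_cache = true)
    return @known[utterance] if @known.key?(utterance)
    return @cache[utterance] if use_cache && @cache.key?(utterance)

    @rule_scans += 1
    intents = @rules.select { |pattern, _| utterance.match?(pattern) }.values
    return nil if intents.empty?

    @cache[utterance] = intents
  end
end

parser = SketchParser.new({ 'hello' => :greeting }, { /\bweather\b/ => :weather })
p parser.parse('hello')          # :greeting
p parser.parse('weather today')  # [:weather]
p parser.parse('weather today')  # [:weather], served from the cache
p parser.rule_scans              # 1
```

As in the documented method, a known utterance yields a single intent while a rule match yields an array, so callers need to handle both shapes as well as nil.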
#purify(t) ⇒ Object
Internal: Tidies up the passed string to remove unnecessary characters and replace ambiguous phrases such as contractions.
t - The string to purify.
Examples
    str = "what're you doing?!"
    str = purify(str)
    # => "what are you doing"
    # File 'lib/natter/parser.rb', line 209
    def purify(t)
      t = expand_contractions(t)
      t = strip_trailing_punctuation(t)
    end
#strip_trailing_punctuation(t) ⇒ Object
Internal: Removes trailing ‘?’ and ‘!’ from the passed string.
t - The string from which to remove superfluous trailing punctuation.
    # File 'lib/natter/parser.rb', line 217
    def strip_trailing_punctuation(t)
      t.sub(/[?!]+\z/, '')
    end
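The `\z` anchor matches only at the very end of the string, so the substitution removes a final run of '?' and '!' and nothing else. A quick standalone check, using a hypothetical top-level `strip_trailing` that wraps the same expression:

```ruby
# The same substitution as Parser#strip_trailing_punctuation.
def strip_trailing(text)
  text.sub(/[?!]+\z/, '')
end

puts strip_trailing('what are you doing?!')  # what are you doing
puts strip_trailing('really? yes!')          # really? yes
puts strip_trailing('hello.')                # hello.
puts strip_trailing("ok?\n").inspect         # "ok?\n"
```

Interior punctuation, full stops, and punctuation followed by trailing whitespace or a newline are all left in place.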