Class: Answerific::Miner

Inherits:
Object
Defined in:
lib/answerific/miner.rb

Overview

Miner bot that answers questions by extracting information from the web. Currently only supports Google Search.


Instance Method Details

#answer(question) ⇒ Object

Answers `question` by querying Google. `question` is expected to be lowercase and to contain only alphanumeric characters (i.e. to have been preprocessed by Answerific::Bot.preprocess); it is cleaned again with #preprocess before parsing.

Returns a string containing the response, or nil if none is found.



# File 'lib/answerific/miner.rb', line 11

def answer(question)
  return nil if !question || question.empty?
  p 'Answering ' + question
  mine(parse(preprocess(question)))
end
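To make the data flow of #answer concrete, here is a minimal offline sketch of the same steps (clean, rearrange, split into sentences, keep sentences containing every query word) run on hard-coded snippets instead of a live Google search. `offline_answer`, `rearrange` and `split_sentences` are hypothetical simplifications, not Answerific's API; in particular the sketch returns the first matching sentence rather than a random one, and only treats the first word of the question as a cue word.

```ruby
# Hypothetical, simplified re-implementation of the Miner pipeline that runs on
# canned text snippets instead of Google results.

WH_WORDS = %w(who where when why what which how).freeze
YES_NO_WORDS = %w(am are is was were have has do does did can could should may).freeze

# Lowercase and keep only letters, digits and spaces (like Miner#clean).
def clean(input)
  input.downcase.gsub(/[^0-9a-z ]/, '').strip
end

# Rearrange the question into a query, based on its first word.
def rearrange(question)
  words = question.split(' ')
  if WH_WORDS.include?(words.first)
    (words.drop(2) + [words[1]]).compact.join(' ')        # wh: move the verb to the end
  elsif YES_NO_WORDS.include?(words.first)
    words.drop(1).join(' ')                               # yes-no: drop the verb
  else
    question.sub(/\A(tell me|i want to know)/, '').strip  # declarative
  end
end

# Split at '.', '?' or '!' after a digit or two letters (like Miner#split_at_dot).
def split_sentences(text)
  text.split(/([0-9]|[a-z]{2})[.?!] ?/i).each_slice(2).map(&:join)
end

def offline_answer(question, snippets)
  return nil if question.nil? || question.empty?
  query_words = rearrange(clean(question)).split(' ')
  return nil if query_words.empty?
  sentences = snippets.flat_map { |s| split_sentences(s.downcase) }
  # First sentence containing every query word (substring test), or nil.
  sentences.find { |s| query_words.all? { |w| s.include?(w) } }
end

snippets = ['The Kuiper belt is a region beyond Neptune. It holds many small bodies.']
puts offline_answer('Where is the Kuiper belt?', snippets)
# prints: the kuiper belt is a region beyond neptune
```

The wh-question is rewritten to "the kuiper belt is", and only the first sentence contains all four of those words.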

#broad_question_type(question) ⇒ Object

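Classifies `question` as 'wh', 'yes-no' or 'declarative'; 'declarative' is the fallback when the question starts with neither a wh-word nor a yes-no verb.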



# File 'lib/answerific/miner.rb', line 125

def broad_question_type(question)
  return 'wh' if is_wh_question question
  return 'yes-no' if is_yes_no_question question
  return 'declarative'
end
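A self-contained sketch of the classification performed by #broad_question_type, #is_wh_question and #is_yes_no_question. It assumes the cue words should match as whole words: the `\b` after the alternation keeps inputs such as 'however ...' or 'island ...' from being read as 'how' or 'is' questions, which a bare prefix match would do. The helper name `starts_with_any?` is hypothetical.

```ruby
WH_WORDS = %w(who where when why what which how).freeze
YES_NO_WORDS = %w(am are is was were have has do does did can could should may).freeze

# True if `question` begins with one of `words` as a whole word.
# \A anchors at the start of the string; \b requires a word boundary after the cue word.
def starts_with_any?(question, words)
  /\A#{Regexp.union(words)}\b/.match?(question)
end

def broad_question_type(question)
  return 'wh' if starts_with_any?(question, WH_WORDS)
  return 'yes-no' if starts_with_any?(question, YES_NO_WORDS)
  'declarative'
end

p broad_question_type('what is pluto')                  # "wh"
p broad_question_type('does pluto have moons')          # "yes-no"
p broad_question_type('tell me what is pluto')          # "declarative"
p broad_question_type('island nations of the pacific')  # "declarative"; a prefix match without \b says "yes-no"
```

Regex alternation backtracks, so 'does' is still recognised even though 'do' appears earlier in the list and fails the `\b` check.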

#clean(input) ⇒ Object

Cleans the string `input`: downcases it, removes every character that is not a letter, digit or space, and strips leading and trailing whitespace.



# File 'lib/answerific/miner.rb', line 151

def clean(input)
  ret = input.downcase
  ret.gsub(/[^0-9a-z ]/i, '').strip
end
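A quick illustration of what #clean does to typical input (the method is copied so the snippet runs on its own). Removing a character does not insert a space, so contractions and hyphenated words collapse into a single token.

```ruby
# Copy of Miner#clean: lowercase, drop everything except letters, digits and spaces.
def clean(input)
  ret = input.downcase
  ret.gsub(/[^0-9a-z ]/i, '').strip
end

p clean("  What's the distance to Pluto?! ")  # "whats the distance to pluto"
p clean('Pluto-Charon system')                # "plutocharon system"
```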

#clean_google_result(string) ⇒ Object

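Cleans a Google result snippet: unescapes HTML entities, downcases it, and removes incomplete sentences (text ending in '...'), HTML tags, dates and newlines.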



# File 'lib/answerific/miner.rb', line 158

def clean_google_result(string)
  string = CGI.unescapeHTML(string)
  string
  .downcase
  .gsub(/[^\.]+\.{3,}/, '')                 # remove incomplete sentences
  .gsub(/<("[^"]*"|'[^']*'|[^'">])*>/, '')  # html tags
  .gsub(/\w{3} \d{1,2}, \d{4} \.{3} /, '')  # dates (jan 27, 2015 ... )
  .gsub("\n",'')                            # new lines
  .strip
end
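The snippet below runs the same chain of substitutions on a made-up result snippet (the sample text is an assumption, not real Google output). It shows the order in which the rules fire: here the leading date is already removed by the incomplete-sentence rule, because that rule deletes any period-free run of text that ends in '...'.

```ruby
require 'cgi'

# Copy of Miner#clean_google_result, runnable on its own.
def clean_google_result(string)
  string = CGI.unescapeHTML(string)
  string
    .downcase
    .gsub(/[^\.]+\.{3,}/, '')                 # remove incomplete sentences ending in "..."
    .gsub(/<("[^"]*"|'[^']*'|[^'">])*>/, '')  # html tags
    .gsub(/\w{3} \d{1,2}, \d{4} \.{3} /, '')  # dates (jan 27, 2015 ... )
    .gsub("\n", '')                           # new lines
    .strip
end

snippet = "Jan 27, 2015 ... Pluto is a &lt;b&gt;dwarf&lt;/b&gt; planet.\nIt orbits the Sun ..."
p clean_google_result(snippet)  # "pluto is a dwarf planet."
```

The trailing "It orbits the Sun ..." is dropped as an incomplete sentence, and the unescaped `<b>` tags are then stripped.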

#is_wh_question(question) ⇒ Object

Returns true if question starts with a wh-question word



# File 'lib/answerific/miner.rb', line 132

def is_wh_question(question)
  wh_words = %w(who where when why what which how)
  return /\A#{Regexp.union(*wh_words)}\b/ === question
end

#is_yes_no_question(question) ⇒ Object

Returns true if question starts with a yes-no question expression



# File 'lib/answerific/miner.rb', line 138

def is_yes_no_question(question)
  yes_no_words = %w(am are is was were have has do does did can could should may)
  return /\A#{Regexp.union(*yes_no_words)}\b/ === question
end

#mine(query) ⇒ Object

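Searches Google for `query`, cleans the content of each result with #clean_google_result, and returns the response chosen by #process_google_results.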



# File 'lib/answerific/miner.rb', line 45

def mine(query)
  results = []

  Google::Search::Web.new(query: query).each do |r|
    results << clean_google_result(r.content)
  end

  process_google_results(results, query)
end
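#mine only needs two things from Google::Search::Web: that the search can be iterated, and that every item responds to `content`. The hypothetical sketch below takes the hits and the two processing steps as arguments, so the same loop can be exercised offline (for example in tests) with stand-in objects. `SearchHit`, `mine_from` and the simplified lambdas are illustrations, not part of Answerific.

```ruby
# Hypothetical stand-in for a search result item: mine only reads #content.
SearchHit = Struct.new(:content)

# Hypothetical offline variant of Miner#mine: the hits are passed in instead of
# fetched, and the cleaning and selection steps are passed as callables.
def mine_from(hits, query, cleaner:, processor:)
  results = hits.map { |hit| cleaner.call(hit.content) }
  processor.call(results, query)
end

hits = [SearchHit.new('Pluto is a dwarf planet.'), SearchHit.new('Mars is red.')]
cleaner = ->(s) { s.downcase.strip }                                       # simplified stand-in
processor = ->(results, query) { results.find { |r| r.include?(query) } }  # simplified stand-in

p mine_from(hits, 'dwarf planet', cleaner: cleaner, processor: processor)
# "pluto is a dwarf planet."
```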

#parse(question) ⇒ Object

Rearranges `question` into a query for a search engine, according to its broad type (see #broad_question_type).



# File 'lib/answerific/miner.rb', line 57

def parse(question)
  type = broad_question_type question
  parsed = ''

  case type
  when 'wh'
    parsed = parse_wh_question question
  when 'yes-no'
    parsed = parse_yes_no_question question
  when 'declarative'
    parsed = parse_declarative_question question
  end

  return parsed
end

#parse_declarative_question(question) ⇒ Object

Returns `question` without its leading declarative expression (e.g. 'tell me'). Example:

question: 'tell me what is Pluto'
returns : 'what is Pluto'


# File 'lib/answerific/miner.rb', line 118

def parse_declarative_question(question)
  # lowercase, to match input that has been through #preprocess
  declarative_expressions = ['tell me', 'i want to know']
  return question.gsub(/^#{Regexp.union(*declarative_expressions)}/, '').strip
end

#parse_wh_question(question) ⇒ Object

Parses the wh-question `question` by removing the wh-word and moving the main verb to the end. Assumptions:

* the wh-word is at the beginning
* the main verb follows the wh-word
    (TODO: not accurate for which/whose, but should be OK for the others)

TODO: consider verb permutations.
TODO: take the wh-word into account, e.g. 'where is the sun' => 'the sun is [located]'.

Example:

question: 'where is the Kuiper belt'
returns : 'the Kuiper belt is'


# File 'lib/answerific/miner.rb', line 83

def parse_wh_question(question)
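  words = question.split ' '
  # drop the wh-word and the verb, then append the verb; compact guards
  # against one-word questions such as 'why', which have no verb
  parsed = words.drop(2) << words[1]
  parsed.compact.join " "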
end
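A runnable copy of the corrected #parse_wh_question, exercised on the documented example and on inputs too short to contain a verb. The original `words[2..-1] << words[1]` raises `NoMethodError` for a one-word question such as 'why', because `words[2..-1]` is nil when the array has fewer than two elements; `drop(2)` always returns an array, and `compact` discards the nil verb.

```ruby
# Copy of the corrected Miner#parse_wh_question.
def parse_wh_question(question)
  words = question.split ' '
  # drop the wh-word and the verb, then append the verb; compact guards
  # against one-word questions such as 'why', which have no verb
  parsed = words.drop(2) << words[1]
  parsed.compact.join " "
end

p parse_wh_question('where is the Kuiper belt')  # "the Kuiper belt is"
p parse_wh_question('who won')                   # "won"
p parse_wh_question('why')                       # ""
```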

#parse_yes_no_question(question) ⇒ Object

Returns `question` without the leading yes-no verb. Example:

question: 'is pluto closer to the sun than saturn'
returns : 'pluto closer to the sun than saturn'


# File 'lib/answerific/miner.rb', line 109

def parse_yes_no_question(question)
  words = question.split ' '
  return words[1..-1].join ' '
end

#preprocess(input) ⇒ Object

Returns `input` cleaned with #clean.



# File 'lib/answerific/miner.rb', line 146

def preprocess(input)
  clean(input)
end

#process_google_results(results, query) ⇒ Object

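Selects the sentences of `results` that match `query` (see #select_responses) and returns one of them (see #select_best_response), or nil if none match.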



# File 'lib/answerific/miner.rb', line 19

def process_google_results(results, query)
  candidates = select_responses(results, query)
  select_best_response(candidates)
end

#select_best_response(responses) ⇒ Object

Returns a single response from `responses`, or nil if `responses` is empty. TODO: decide how to select the best response; for now a random one is returned (`Array#sample`).



# File 'lib/answerific/miner.rb', line 26

def select_best_response(responses)
  responses.sample
end
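Because #select_best_response uses `Array#sample`, repeated calls on the same candidates can return different sentences, and an empty candidate list yields nil, which is how #answer ends up returning nil. If reproducible output is wanted (e.g. in tests), `Array#sample` accepts a seeded `random:` generator, and `Array#first` is a deterministic alternative. The candidate strings are made up.

```ruby
candidates = ['pluto is a dwarf planet', 'pluto orbits the sun']

# Random choice: either candidate may come back on any call.
p candidates.sample

# An empty list gives nil, for sample and first alike.
p [].sample  # nil
p [].first   # nil

# A seeded generator makes the random choice repeatable across runs.
same = candidates.sample(random: Random.new(42)) == candidates.sample(random: Random.new(42))
p same  # true

# Deterministic alternative: always the first candidate.
p candidates.first  # "pluto is a dwarf planet"
```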

#select_responses(results, query) ⇒ Object

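Splits each of `results` into sentences (see #split_at_dot) and returns the sentences that contain every word of `query`. The test is a substring match, so a query word such as 'is' also matches inside 'this'.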



# File 'lib/answerific/miner.rb', line 31

def select_responses(results, query)
  sentences = results.map { |r| split_at_dot(r) }.flatten
  query_words = query.split ' '

  # Keep only the sentences that contain every word of the search query
  selected = sentences.select do |sentence|
    query_words.all? { |w| sentence.include? w }  # contains all query words
  end

  return selected
end
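The filter in #select_responses uses `String#include?`, i.e. a substring test. The sketch below contrasts it with a hypothetical whole-word variant (`contains_all_words?` is not part of Answerific): with the query words ['pluto', 'is'], a sentence that only contains 'is' inside 'this' passes the substring test but not the whole-word one.

```ruby
# The check select_responses applies to each sentence: every query word
# must occur somewhere in the sentence, possibly inside a longer word.
def contains_all_substrings?(sentence, query_words)
  query_words.all? { |w| sentence.include?(w) }
end

# Hypothetical stricter variant: every query word must appear as a whole word.
def contains_all_words?(sentence, query_words)
  (query_words - sentence.split(' ')).empty?
end

query_words = 'pluto is'.split(' ')
sentences = ['this page mentions pluto', 'pluto is a dwarf planet']

p sentences.select { |s| contains_all_substrings?(s, query_words) }
# ["this page mentions pluto", "pluto is a dwarf planet"]
p sentences.select { |s| contains_all_words?(s, query_words) }
# ["pluto is a dwarf planet"]
```

The whole-word version would in turn miss inflected forms (e.g. 'planets' for 'planet'), so which behaviour is preferable depends on the queries.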

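#split_at_dot(string) ⇒ Object

Splits `string` into sentences at '.', '?' or '!' when the punctuation follows a digit or two letters, so single-letter abbreviations such as 'u.s.' do not cause a split. The punctuation mark and at most one following space are dropped.
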
# File 'lib/answerific/miner.rb', line 169

def split_at_dot(string)
  # matches a digit or two letters followed by '.', '?' or '!' and an optional space
  re = /([0-9]|[a-z]{2})[\.\?!] ?/i
  string.split(re).each_slice(2).map(&:join)
end
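A runnable copy of #split_at_dot on a made-up sentence, showing which punctuation marks cause a split. Because `String#split` is given a regexp with a capture group, the captured characters (the digit or two letters before the punctuation) come back as separate elements, and `each_slice(2).map(&:join)` glues each of them back onto the text before it.

```ruby
def split_at_dot(string)
  # a digit or two letters, then '.', '?' or '!', then an optional space
  re = /([0-9]|[a-z]{2})[\.\?!] ?/i
  string.split(re).each_slice(2).map(&:join)
end

text = 'the u.s. launched it in 2006. it reached pluto! great'
p text.split(/([0-9]|[a-z]{2})[\.\?!] ?/i)
# ["the u.s. launched it in 200", "6", "it reached plu", "to", "great"]
p split_at_dot(text)
# ["the u.s. launched it in 2006", "it reached pluto", "great"]
```

The same rule means a sentence ending in a single-letter word (e.g. 'plan b.') is not split either.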