Class: Answerific::Miner
- Inherits: Object (Object > Answerific::Miner)
- Defined in: lib/answerific/miner.rb
Overview
Miner bot that answers questions by extracting information from the web. It currently supports only Google Search.
Instance Method Summary
- #answer(question) ⇒ Object
  Answers `question` by querying Google. Assumes `question` is lowercase and contains only alphanumeric characters (i.e. has been preprocessed by Answerific::Bot.preprocess). Returns a string containing the response, or nil if none is found.
- #broad_question_type(question) ⇒ Object
  Returns the broad type of `question`: 'wh', 'yes-no' or 'declarative'.
- #clean(input) ⇒ Object
  Cleans the string `input` by lowercasing it and removing non-alphanumeric characters.
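- #clean_google_result(string) ⇒ Object
  Normalizes a Google result snippet by unescaping HTML entities, lowercasing it, and removing HTML tags, truncated sentences, date prefixes and line breaks.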
- #is_wh_question(question) ⇒ Object
  Returns true if `question` starts with a wh-question word.
- #is_yes_no_question(question) ⇒ Object
  Returns true if `question` starts with a yes-no question expression.
- #mine(query) ⇒ Object
  Searches Google for `query` and returns the best matching sentence from the results.
- #parse(question) ⇒ Object
  Rearranges `question` into a query for search engines, based on its broad question type.
- #parse_declarative_question(question) ⇒ Object
  Returns `question` without the leading declarative expression. Example: question: 'tell me what is Pluto' returns: 'what is Pluto'.
- #parse_wh_question(question) ⇒ Object
  Parses the wh-question `question` by removing the wh-word and moving the main verb to the end. Assumptions: the wh-word is at the beginning and the main verb follows it (TODO not accurate for which/whose but should be ok for the others). TODO consider verb permutations; TODO consider the wh-word (where is the sun => the sun is [located]). Example: question: 'where is the Kuiper belt' returns: 'the Kuiper belt is'.
- #parse_yes_no_question(question) ⇒ Object
  Returns `question` without the leading yes-no verb. Example: question: 'is pluto closer to the sun than saturn' returns: 'pluto closer to the sun than saturn'.
- #preprocess(input) ⇒ Object
  Returns cleaned `input`.
- #process_google_results(results, query) ⇒ Object
  Selects the sentences from `results` that contain every word of `query` and returns one of them.
- #select_best_response(responses) ⇒ Object
  Returns a single response from the list of responses. TODO how to select the best? For now it returns a random one (`responses.sample`).
- #select_responses(results, query) ⇒ Object
  Returns the sentences from `results` that contain all the words in `query`.
- #split_at_dot(string) ⇒ Object
  Splits `string` into sentences at a '.', '?' or '!' that follows a digit or two letters.
Instance Method Details
#answer(question) ⇒ Object
Answers `question` by querying Google. Assumes `question` is lowercase and contains only alphanumeric characters
(i.e. has been preprocessed by Answerific::Bot.preprocess); the method also runs #preprocess itself before parsing.
Returns a string containing the response, or nil if none is found.
```ruby
# File 'lib/answerific/miner.rb', line 11
def answer(question)
  # Guard before logging: 'Answering ' + nil would raise a TypeError.
  return nil if !question || question.empty?
  p 'Answering ' + question
  mine(parse(preprocess(question)))
end
```
#broad_question_type(question) ⇒ Object
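Returns the broad type of `question`: 'wh', 'yes-no', or 'declarative' (the fallback when neither of the other checks matches).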
```ruby
# File 'lib/answerific/miner.rb', line 125
def broad_question_type(question)
  return 'wh' if is_wh_question question
  return 'yes-no' if is_yes_no_question question
  return 'declarative'
end
```
#clean(input) ⇒ Object
Cleans the string `input`: lowercases it, removes every character that is not a letter, digit or space, and strips leading and trailing whitespace.
```ruby
# File 'lib/answerific/miner.rb', line 151
def clean(input)
  ret = input.downcase
  ret.gsub(/[^0-9a-z ]/i, '').strip
end
```
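This standalone sketch copies the method body out of the class, so it runs without the gem and shows what survives cleaning. After lowercasing, only a-z, 0-9 and spaces are kept. Apostrophes are dropped, and so are non-ASCII letters such as the é in 'café'.

```ruby
# Standalone copy of Answerific::Miner#clean for experimentation
# (redefined at top level here; not the gem's public API).
def clean(input)
  # Lowercase first, then drop every character that is not a-z, 0-9 or a space.
  input.downcase.gsub(/[^0-9a-z ]/, '').strip
end

puts clean("  What's the distance to Pluto? ")  # prints: whats the distance to pluto
puts clean('Café!')                             # prints: caf
```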
#clean_google_result(string) ⇒ Object
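Normalizes a Google result snippet: unescapes HTML entities, lowercases it, and removes truncated sentences, HTML tags, date prefixes and line breaks.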
```ruby
# File 'lib/answerific/miner.rb', line 158
def clean_google_result(string)
  string = CGI.unescapeHTML(string) # CGI is from the standard library (require 'cgi')
  string
    .downcase
    .gsub(/[^\.]+\.{3,}/, '')                 # remove incomplete sentences ending in "..."
    .gsub(/<("[^"]*"|'[^']*'|[^'">])*>/, '')   # html tags
    .gsub(/\w{3} \d{1,2}, \d{4} \.{3} /, '')   # dates (e.g. "jan 27, 2015 ... ")
    .gsub("\n", ' ')                           # new lines (a space keeps adjacent words apart)
    .strip
end
```
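This standalone sketch applies the same cleaning chain to a made-up result snippet; the input string is invented for illustration. It differs from the listing in one deliberate way: line breaks become a space instead of being deleted. The assumption is that words on either side of a break should stay separate for the later word matching.

```ruby
require 'cgi'

# Standalone sketch of Answerific::Miner#clean_google_result.
def clean_google_result(string)
  CGI.unescapeHTML(string)                     # "&amp;" -> "&"
    .downcase
    .gsub(/[^\.]+\.{3,}/, '')                  # drop text cut off with "..."
    .gsub(/<("[^"]*"|'[^']*'|[^'">])*>/, '')   # strip HTML tags such as <b>
    .gsub(/\w{3} \d{1,2}, \d{4} \.{3} /, '')   # strip "jan 27, 2015 ... " prefixes
    .gsub("\n", ' ')                           # line breaks become spaces (sketch's choice)
    .strip
end

snippet = 'Jan 27, 2015 ... <b>Pluto</b> is a dwarf planet &amp; a Kuiper belt object.'
puts clean_google_result(snippet)
# prints: pluto is a dwarf planet & a kuiper belt object.
```

In this input the "incomplete sentence" rule already removes the date prefix, because the date ends in "...".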
#is_wh_question(question) ⇒ Object
Returns true if `question` starts with a wh-question word (who, where, when, why, what, which, how).
```ruby
# File 'lib/answerific/miner.rb', line 132
def is_wh_question(question)
  wh_words = %w(who where when why what which how)
  # \b stops words such as "however" or "whatever" from counting as wh-words
  return /^#{Regexp.union(*wh_words)}\b/ === question
end
```
#is_yes_no_question(question) ⇒ Object
Returns true if `question` starts with a yes-no question expression (am, are, is, was, were, have, has, do, does, did, can, could, should, may).
```ruby
# File 'lib/answerific/miner.rb', line 138
def is_yes_no_question(question)
  yes_no_words = %w(am are is was were have has do does did can could should may)
  # \b stops "dogs" (do), "island" (is) or "cancer" (can) from matching
  return /^#{Regexp.union(*yes_no_words)}\b/ === question
end
```
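The two predicates and #broad_question_type can be sketched together. This version is an assumption, not the gem's code. It adds a word boundary (`\b`) after the keyword, so a sentence such as 'dogs are mammals' is not classified as a "do ..." yes-no question. `WH_WORDS`, `YES_NO_WORDS` and `starts_with_any?` are hypothetical names introduced for the sketch.

```ruby
# Hypothetical keyword lists mirroring the ones in the methods above.
WH_WORDS     = %w(who where when why what which how).freeze
YES_NO_WORDS = %w(am are is was were have has do does did can could should may).freeze

# Hypothetical helper: true if `question` begins with one of `words` as a whole word.
def starts_with_any?(question, words)
  # \A anchors at the start of the string; \b requires the keyword to end there.
  /\A#{Regexp.union(words)}\b/.match?(question)
end

def broad_question_type(question)
  return 'wh' if starts_with_any?(question, WH_WORDS)
  return 'yes-no' if starts_with_any?(question, YES_NO_WORDS)
  'declarative'
end

p broad_question_type('where is the kuiper belt')  # "wh"
p broad_question_type('is pluto a planet')         # "yes-no"
p broad_question_type('dogs are mammals')          # "declarative"
```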
#mine(query) ⇒ Object
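Searches Google for `query`, cleans each result snippet, and returns the best matching sentence (via process_google_results).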
```ruby
# File 'lib/answerific/miner.rb', line 45
def mine(query)
  results = []
  Google::Search::Web.new(query: query).each do |r|
    results << clean_google_result(r.content)
  end
  process_google_results(results, query)
end
```
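#mine calls Google directly, which makes it hard to exercise offline. The sketch below follows the same flow but takes the search backend as a callable. The `search` parameter is hypothetical and not part of the gem. The sketch also simplifies cleaning and sentence splitting, and it returns the first candidate instead of a random one.

```ruby
# Offline sketch of the mine flow: search -> clean -> split -> select -> pick.
# `search` is a hypothetical stand-in for Google::Search::Web: any callable
# that takes the query and returns an array of snippet strings.
def mine(query, search)
  snippets   = search.call(query).map { |s| s.downcase.strip }    # simplified cleaning
  sentences  = snippets.flat_map { |s| s.split(/(?<=[.?!])\s+/) }  # simplified splitting
  words      = query.split(' ')
  candidates = sentences.select { |s| words.all? { |w| s.include?(w) } }
  candidates.first # deterministic pick; the gem's select_best_response uses sample
end

fake_search = lambda do |_query|
  ['Pluto is a dwarf planet. It orbits beyond Neptune.',
   'Ceres is the largest object in the asteroid belt.']
end

p mine('pluto is', fake_search)      # "pluto is a dwarf planet."
p mine('saturn rings', fake_search)  # nil
```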
#parse(question) ⇒ Object
Rearranges `question` into a statement-like query for search engines, dispatching on its broad type (wh, yes-no or declarative).
```ruby
# File 'lib/answerific/miner.rb', line 57
def parse(question)
  type = broad_question_type question
  parsed = ''

  case type
  when 'wh'
    parsed = parse_wh_question question
  when 'yes-no'
    parsed = parse_yes_no_question question
  when 'declarative'
    parsed = parse_declarative_question question
  end

  return parsed
end
```
#parse_declarative_question(question) ⇒ Object
Returns `question` without the leading declarative expression ('tell me' or 'I want to know'). Example:
question: 'tell me what is Pluto'
returns : 'what is Pluto'
```ruby
# File 'lib/answerific/miner.rb', line 118
def parse_declarative_question(question)
  # Lowercase: `question` has already been downcased by clean,
  # so 'I want to know' would never match.
  declarative_expressions = [ 'tell me', 'i want to know' ]
  return question.gsub(/^#{Regexp.union(*declarative_expressions)}/, '').strip
end
```
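#clean lowercases the input before #parse runs, so the declarative expressions need to be lowercase as well. Otherwise 'I want to know' never matches. Note that making the outer regexp case-insensitive would not help. An interpolated `Regexp.union` carries its own `(?-mix:...)` flags, which switch case-insensitivity off inside the group. Below is a standalone sketch with lowercased expressions; `DECLARATIVE_EXPRESSIONS` is a hypothetical name.

```ruby
# Hypothetical constant: the expressions, lowercased to match cleaned input.
DECLARATIVE_EXPRESSIONS = ['tell me', 'i want to know'].freeze

def parse_declarative_question(question)
  # Regexp.union escapes each string; \A anchors at the start of the input.
  question.sub(/\A#{Regexp.union(DECLARATIVE_EXPRESSIONS)}/, '').strip
end

p parse_declarative_question('tell me what is pluto')           # "what is pluto"
p parse_declarative_question('i want to know who found pluto')  # "who found pluto"
```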
#parse_wh_question(question) ⇒ Object
Parses the wh-question `question` by removing the wh-word and moving the main verb to the end.
TODO consider verb permutations.
TODO consider the wh-word: where is the sun => the sun is [located].
Assumptions:
* wh-word is at the beginning
* main verb follows the wh-word
  (TODO not accurate for which/whose but should be ok for the others)
Example:
question: 'where is the Kuiper belt'
returns : 'the Kuiper belt is'
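```ruby
# File 'lib/answerific/miner.rb', line 83
def parse_wh_question(question)
  words = question.split ' '
  # A one-word question ('why') makes words[2..-1] nil and would raise NoMethodError.
  return question if words.size < 2
  parsed = words[2..-1] << words[1]
  parsed.join " "
end
```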
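A standalone copy of the method, run on a few inputs, shows the intended rearrangement. It also shows the limitation noted in the TODO. In a 'which' question the word after the wh-word is usually a noun, so the noun gets moved to the end instead of the verb. The early return for one-word input was added for the sketch.

```ruby
# Standalone sketch of parse_wh_question: drop the wh-word, move the next word to the end.
def parse_wh_question(question)
  words = question.split(' ')
  return question if words.size < 2 # nothing to rearrange (sketch's guard)
  (words[2..-1] << words[1]).join(' ')
end

p parse_wh_question('where is the kuiper belt')  # "the kuiper belt is"
p parse_wh_question('who discovered pluto')      # "pluto discovered"
p parse_wh_question('which planet is largest')   # "is largest planet" (noun moved, not verb)
p parse_wh_question('why')                       # "why"
```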
#parse_yes_no_question(question) ⇒ Object
Returns `question` without the leading yes-no verb. Example:
question: 'is pluto closer to the sun than saturn'
returns : 'pluto closer to the sun than saturn'
```ruby
# File 'lib/answerific/miner.rb', line 109
def parse_yes_no_question(question)
  words = question.split ' '
  return words[1..-1].join ' '
end
```
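A standalone copy of the method shows that only the first word is dropped. It adds a fallback for the empty string, which is an assumption of the sketch. The remaining words keep their order, so 'does pluto have moons' becomes the query 'pluto have moons'.

```ruby
# Standalone sketch of parse_yes_no_question; `|| []` covers the empty string,
# for which words[1..-1] is nil.
def parse_yes_no_question(question)
  words = question.split(' ')
  (words[1..-1] || []).join(' ')
end

p parse_yes_no_question('is pluto closer to the sun than saturn')  # "pluto closer to the sun than saturn"
p parse_yes_no_question('does pluto have moons')                    # "pluto have moons"
p parse_yes_no_question('')                                         # ""
```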
#preprocess(input) ⇒ Object
Returns cleaned `input`.
```ruby
# File 'lib/answerific/miner.rb', line 146
def preprocess(input)
  clean(input)
end
```
#process_google_results(results, query) ⇒ Object
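Selects the sentences in `results` that contain every word of `query` (select_responses), then picks one of them (select_best_response).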
```ruby
# File 'lib/answerific/miner.rb', line 19
def process_google_results(results, query)
  candidates = select_responses(results, query)
  select_best_response(candidates)
end
```
#select_best_response(responses) ⇒ Object
Returns a single response from the list of responses, or nil if the list is empty. TODO how to select the best? For now it returns a random one (`responses.sample`), not the first one.
```ruby
# File 'lib/answerific/miner.rb', line 26
def select_best_response(responses)
  responses.sample
end
```
#select_responses(results, query) ⇒ Object
Splits each entry of `results` into sentences and returns the sentences that contain all the words in `query` (a substring match, so 'is' also matches inside 'this').
```ruby
# File 'lib/answerific/miner.rb', line 31
def select_responses(results, query)
  sentences = results.flat_map { |r| split_at_dot(r) }
  query_words = query.split ' '

  # Select the responses, only keeping the sentences that contain the search query
  selected = sentences.select do |sentence|
    query_words.all? { |w| sentence.include? w } # contains all query words
  end

  return selected
end
```
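`String#include?` matches substrings, so a query word such as 'is' is also found inside 'this' or 'mission'. The sketch below contrasts that behaviour with a whole-word variant. Both functions take already-split sentences rather than raw results. `select_responses_whole_words` is a hypothetical alternative, not part of the gem.

```ruby
# Substring matching, as in select_responses (input: already-split sentences).
def select_responses(sentences, query)
  words = query.split(' ')
  sentences.select { |s| words.all? { |w| s.include?(w) } }
end

# Hypothetical whole-word variant: compare against the sentence's tokens instead.
def select_responses_whole_words(sentences, query)
  words = query.split(' ')
  sentences.select do |s|
    tokens = s.split(' ')
    words.all? { |w| tokens.include?(w) }
  end
end

sentences = ['pluto is a dwarf planet', 'this mission passed pluto in 2015']
p select_responses(sentences, 'pluto is')              # both sentences ("is" found inside "this")
p select_responses_whole_words(sentences, 'pluto is')  # ["pluto is a dwarf planet"]
```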
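#split_at_dot(string) ⇒ Object
Splits `string` into sentences at a '.', '?' or '!' that follows a digit or two letters. The punctuation is dropped from the returned sentences.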
```ruby
# File 'lib/answerific/miner.rb', line 169
def split_at_dot(string)
  # matches NUM. or ALPHAALPHA.
  re = /([0-9]|[a-z]{2})[\.\?!] ?/i
  string.split(re).each_slice(2).map(&:join)
end
```
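A standalone copy of the method illustrates the split. The pattern contains a capture group, so `String#split` keeps the captured digit or letter pair in the result array. `each_slice(2).map(&:join)` then glues that pair back onto the preceding text, and the punctuation itself is discarded. A single letter before a dot, as in 'e.g.', does not trigger a split.

```ruby
# Standalone copy of split_at_dot. split returns [text, captured, text, captured, ...];
# pairing and joining re-attaches each captured digit/letters to its sentence.
def split_at_dot(string)
  re = /([0-9]|[a-z]{2})[.?!] ?/i
  string.split(re).each_slice(2).map(&:join)
end

p split_at_dot('pluto is small. it is far away! is it cold?')
# ["pluto is small", "it is far away", "is it cold"]
p split_at_dot('launched in 2006. arrived in 2015.')
# ["launched in 2006", "arrived in 2015"]
p split_at_dot('e.g. pluto')
# ["e.g. pluto"]
```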