Class: Spacy::Language
- Inherits: Object
  - Object
    - Spacy::Language
- Defined in: lib/ruby-spacy.rb
Overview
See also the spaCy Python API documentation for Language.
Instance Attribute Summary
- #py_nlp ⇒ Object (readonly)
  A Python Language instance accessible via PyCall.
- #spacy_nlp_id ⇒ String (readonly)
  An identifier string that can be used to refer to the Python Language object inside PyCall::exec or PyCall::eval.
Instance Method Summary
- #get_lexeme(text) ⇒ Object
  A utility method to get a Python Lexeme object.
- #initialize(model = "en_core_web_sm") ⇒ Language (constructor)
  Creates a language model instance, which is conventionally referred to by a variable named nlp.
- #matcher ⇒ Matcher
  Generates a matcher for the current language model.
- #method_missing(name, *args) ⇒ Object
  Methods defined in Python but not wrapped in ruby-spacy can be called through this dynamic method handling mechanism.
- #most_similar(vector, n) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>
  Returns the n lexemes whose vectors are most similar to the given word vector.
- #pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
  Utility function to batch process many texts.
- #pipe_names ⇒ Array<String>
  A utility method to list pipeline components.
- #read(text) ⇒ Doc
  Reads and analyzes the given text.
- #vocab(text) ⇒ Lexeme
  Returns a Ruby Lexeme object.
- #vocab_string_lookup(id) ⇒ Object
  A utility method to look up the vocabulary string for the given id.
Constructor Details
#initialize(model = "en_core_web_sm") ⇒ Language
Creates a language model instance, which is conventionally referred to by a variable named nlp.
```ruby
# File 'lib/ruby-spacy.rb', line 219
def initialize(model = "en_core_web_sm")
  @spacy_nlp_id = "nlp_#{model.object_id}"
  PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
  @py_nlp = PyCall.eval(@spacy_nlp_id)
end
```
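The constructor derives a Python variable name from the Ruby object_id of the model string, loads the model under that name with PyCall.exec, and then fetches the loaded object with PyCall.eval. The pure-Ruby sketch below reproduces only the string-building step so it runs without Python; the helper name load_statement_for is hypothetical and not part of ruby-spacy.

```ruby
# Hypothetical helper (not part of ruby-spacy): mirrors how
# Spacy::Language#initialize builds the Python variable name and the
# statement it passes to PyCall.exec. Nothing is executed in Python here.
def load_statement_for(model)
  nlp_id = "nlp_#{model.object_id}"
  statement = "import spacy; #{nlp_id} = spacy.load('#{model}')"
  [nlp_id, statement]
end

nlp_id, statement = load_statement_for("en_core_web_sm")
puts nlp_id     # "nlp_" followed by the string's object_id
puts statement
```

Because the name embeds an object_id, distinct model string objects normally map to distinct Python globals, which is what lets spacy_nlp_id be interpolated into later PyCall.eval calls.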
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called through this dynamic method handling mechanism: the call is forwarded to the underlying Python Language object (py_nlp) with the same name and arguments.
```ruby
# File 'lib/ruby-spacy.rb', line 301
def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
```
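The forwarding idea can be illustrated without Python. In this sketch FakePyNlp is a hypothetical stand-in for the PyCall-wrapped spaCy object, and DelegatingLanguage is not ruby-spacy's class. As a design choice it also defines respond_to_missing?, which the original does not, so that respond_to? reports the delegated methods.

```ruby
# Hypothetical stand-in for the Python Language object; a real ruby-spacy
# instance would hold a PyCall-wrapped spaCy object instead.
class FakePyNlp
  def pipe_names
    ["tok2vec", "tagger"]
  end
end

# Sketch of the forwarding pattern behind Spacy::Language#method_missing.
class DelegatingLanguage
  def initialize(py_nlp)
    @py_nlp = py_nlp
  end

  # Forward unknown calls to the wrapped object when it can handle them;
  # otherwise fall back to the normal NoMethodError via super.
  def method_missing(name, *args, &block)
    if @py_nlp.respond_to?(name)
      @py_nlp.send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @py_nlp.respond_to?(name) || super
  end
end

nlp = DelegatingLanguage.new(FakePyNlp.new)
p nlp.pipe_names                 # prints ["tok2vec", "tagger"]
p nlp.respond_to?(:pipe_names)   # prints true
```

The respond_to? guard suits plain Ruby receivers; with a real PyCall object, forwarding unconditionally, as ruby-spacy does, may be preferable so that PyCall itself reports missing Python attributes.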
Instance Attribute Details
#py_nlp ⇒ Object (readonly)
Returns a Python Language instance accessible via PyCall.
```ruby
# File 'lib/ruby-spacy.rb', line 215
def py_nlp
  @py_nlp
end
```
#spacy_nlp_id ⇒ String (readonly)
Returns an identifier string that can be used to refer to the Python Language object inside PyCall::exec or PyCall::eval.
```ruby
# File 'lib/ruby-spacy.rb', line 212
def spacy_nlp_id
  @spacy_nlp_id
end
```
Instance Method Details
#get_lexeme(text) ⇒ Object
A utility method to get a Python Lexeme object.
```ruby
# File 'lib/ruby-spacy.rb', line 257
def get_lexeme(text)
  @py_nlp.vocab[text]
end
```
#matcher ⇒ Matcher
Generates a matcher for the current language model.
```ruby
# File 'lib/ruby-spacy.rb', line 233
def matcher
  Matcher.new(@py_nlp)
end
```
#most_similar(vector, n) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>
Returns the n lexemes whose vectors are most similar to the given word vector. Each result hash holds the lexeme's key (hash id), its text, its row in the vector table (best_row), and the similarity score.
```ruby
# File 'lib/ruby-spacy.rb', line 271
def most_similar(vector, n)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: n)
  key_texts = PyCall.eval("[[str(n), #{@spacy_nlp_id}.vocab[n].text] for n in #{py_result[0][0].tolist}]")
  keys = key_texts.map{|kt| kt[0]}
  texts = key_texts.map{|kt| kt[1]}
  best_rows = PyCall::List.(py_result[1])[0]
  scores = PyCall::List.(py_result[2])[0]

  results = []
  n.times do |i|
    results << {key: keys[i].to_i, text: texts[i], best_row: best_rows[i], score: scores[i]}
  end
  results
end
```
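most_similar receives keys, best rows and scores from spaCy as separate arrays and then stitches them into one hash per neighbor. The pure-Ruby sketch below shows that final assembly step with zip and map; build_similarity_results is a hypothetical helper, and the input arrays are made-up illustrative values, not output from any model.

```ruby
# Hypothetical helper (not part of ruby-spacy): combines the parallel arrays
# that most_similar extracts from spaCy's result into one hash per neighbor.
# Keys arrive as strings (most_similar builds them with str() in Python),
# so they are converted back with to_i, as the original does.
def build_similarity_results(keys, texts, best_rows, scores)
  keys.zip(texts, best_rows, scores).map do |key, text, best_row, score|
    { key: key.to_i, text: text, best_row: best_row, score: score }
  end
end

# Illustrative inputs only; real values come from the loaded model's vectors.
results = build_similarity_results(
  ["101", "202"], ["dog", "puppy"], [7, 3], [1.0, 0.8]
)
results.each { |r| p r }
```

Using zip with map replaces the n.times loop and the manual accumulator; note that it yields one hash per key rather than exactly n hashes, which only matters if spaCy returns fewer rows than requested.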
#pipe(texts, disable: [], batch_size: 50) ⇒ Array<Doc>
Utility function to batch process many texts.
```ruby
# File 'lib/ruby-spacy.rb', line 292
def pipe(texts, disable: [], batch_size: 50)
  docs = []
  PyCall::List.(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
    docs << Doc.new(@py_nlp, py_doc: py_doc)
  end
  docs
end
```
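pipe hands the whole text list to Python's nlp.pipe, which does the batching according to batch_size and leaves out the components named in disable; the resulting generator is materialized with PyCall::List and each Python doc is wrapped in a Ruby Doc. The sketch below shows only the wrapping step, written with map; SketchDoc and wrap_docs are hypothetical, and plain strings stand in for Python Doc objects.

```ruby
# Hypothetical stand-in for ruby-spacy's Doc wrapper; the real class is
# constructed as Doc.new(py_nlp, py_doc: py_doc) inside Language#pipe.
class SketchDoc
  attr_reader :nlp, :py_doc

  def initialize(nlp, py_doc:)
    @nlp = nlp
    @py_doc = py_doc
  end
end

# Same wrapping step as Language#pipe, using map instead of
# building an array with each and <<.
def wrap_docs(nlp, py_docs)
  py_docs.map { |py_doc| SketchDoc.new(nlp, py_doc: py_doc) }
end

# Plain strings stand in for the Python Doc objects yielded by nlp.pipe.
docs = wrap_docs(:fake_nlp, ["first text", "second text"])
docs.each { |doc| puts doc.py_doc }
```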
#pipe_names ⇒ Array<String>
A utility method to list pipeline components.
```ruby
# File 'lib/ruby-spacy.rb', line 246
def pipe_names
  pipe_array = []
  PyCall::List.(@py_nlp.pipe_names).each do |pipe|
    pipe_array << pipe
  end
  pipe_array
end
```
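pipe_names copies the Python list of component names into a plain Ruby Array. The pure-Ruby sketch below shows that the each-and-append loop produces the same Array as Enumerable#to_a on any Enumerable. FakePyList is a hypothetical stand-in, the component names are illustrative, and whether PyCall::List exposes to_a in a given PyCall version is an assumption worth checking.

```ruby
# Hypothetical stand-in for a PyCall::List: any object that includes
# Enumerable and defines each.
class FakePyList
  include Enumerable

  def initialize(items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end
end

py_pipe_names = FakePyList.new(["tok2vec", "tagger", "parser"])

# The loop used by pipe_names: copy every element into a new Array.
manual = []
py_pipe_names.each { |pipe| manual << pipe }

# Enumerable#to_a performs the same copy in one call.
via_to_a = py_pipe_names.to_a
p manual == via_to_a   # prints true
```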
#read(text) ⇒ Doc
Reads and analyzes the given text, returning a Doc.
```ruby
# File 'lib/ruby-spacy.rb', line 227
def read(text)
  Doc.new(py_nlp, text: text)
end
```
#vocab(text) ⇒ Lexeme
Returns a Ruby Lexeme object wrapping the Python lexeme for the given text.
```ruby
# File 'lib/ruby-spacy.rb', line 264
def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end
```
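get_lexeme and vocab perform the same vocab[text] lookup; the difference is that get_lexeme returns the raw Python Lexeme while vocab wraps it in a Ruby Lexeme. The sketch below illustrates that raw-versus-wrapped choice in pure Ruby. RawLexeme, SketchLexeme, their attributes, and the Hash acting as a vocabulary are all hypothetical and do not describe the actual fields of ruby-spacy's Lexeme class.

```ruby
# Hypothetical stand-in for a Python Lexeme returned by nlp.vocab[text].
RawLexeme = Struct.new(:text, :is_alpha)

# Hypothetical Ruby wrapper: copies a commonly used field into Ruby and
# keeps the raw object reachable through py_lexeme.
class SketchLexeme
  attr_reader :py_lexeme, :text

  def initialize(py_lexeme)
    @py_lexeme = py_lexeme
    @text = py_lexeme.text
  end
end

# Hypothetical vocabulary: like spaCy's vocab, a lookup creates the entry
# on first access and returns the same object afterwards.
vocab = Hash.new do |hash, text|
  hash[text] = RawLexeme.new(text, text.match?(/\A[[:alpha:]]+\z/))
end

raw = vocab["apple"]                        # analogous to get_lexeme("apple")
wrapped = SketchLexeme.new(vocab["apple"])  # analogous to vocab("apple")
puts wrapped.text
```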
#vocab_string_lookup(id) ⇒ Object
A utility method to look up the vocabulary string for the given id (an integer hash value).
```ruby
# File 'lib/ruby-spacy.rb', line 240
def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
end
```
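Because vocab_string_lookup interpolates id directly into a Python expression, it is meant for integer hash ids: a string such as "apple" would be inserted unquoted and evaluated as a Python name. The sketch below builds the same expression but first coerces id with Kernel#Integer, which rejects non-integer input before anything reaches PyCall.eval. vocab_lookup_expression is a hypothetical helper, and the id and variable name are arbitrary values, not real spaCy hashes.

```ruby
# Hypothetical helper (not part of ruby-spacy): builds the Python expression
# that vocab_string_lookup passes to PyCall.eval, after coercing id to an
# Integer so only integer hash values are interpolated.
def vocab_lookup_expression(nlp_id, id)
  hash_value = Integer(id) # raises ArgumentError or TypeError for non-integers
  "#{nlp_id}.vocab.strings[#{hash_value}]"
end

# Arbitrary identifier and id, used only to show the resulting string.
expr = vocab_lookup_expression("nlp_1234", 1234567890123456789)
puts expr  # prints nlp_1234.vocab.strings[1234567890123456789]
```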