Class: Spacy::Language
- Inherits: Object
  - Object
  - Spacy::Language
- Defined in: lib/ruby-spacy.rb
Overview
See also the spaCy Python API documentation for Language.
Instance Attribute Summary
-
#py_nlp ⇒ Object
readonly
A Python Language instance accessible via PyCall.
-
#spacy_nlp_id ⇒ String
readonly
An identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.
Instance Method Summary
-
#get_lexeme(text) ⇒ Object
A utility method to get a Python Lexeme object.
-
#initialize(model = "en_core_web_sm") ⇒ Language
constructor
Creates a language model instance, which is conventionally referred to by a variable named nlp.
-
#matcher ⇒ Matcher
Generates a matcher for the current language model.
-
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism....
-
#most_similar(vector, n) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>
Returns the n lexemes whose vector representations are most similar to the given word vector.
-
#pipe_names ⇒ Array<String>
A utility method to list pipeline components.
-
#read(text) ⇒ Object
Reads and analyzes the given text.
-
#tokenizer ⇒ Object
A utility method to get the tokenizer Python object.
-
#vocab_string_lookup(id) ⇒ Object
A utility method to look up the vocabulary item with the given id.
Constructor Details
#initialize(model = "en_core_web_sm") ⇒ Language
Creates a language model instance, which is conventionally referred to by a variable named nlp.
    # File 'lib/ruby-spacy.rb', line 489
    def initialize(model = "en_core_web_sm")
      @spacy_nlp_id = "nlp_#{model.object_id}"
      PyCall.exec("import spacy; from spacy.tokens import Span; from spacy.matcher import Matcher; from spacy import displacy")
      PyCall.exec("#{@spacy_nlp_id} = spacy.load('#{model}')")
      @py_nlp = PyCall.eval(@spacy_nlp_id)
    end
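The constructor works by generating Python source as Ruby strings and passing them to PyCall. As a minimal sketch that runs without PyCall or spaCy installed, the snippet below rebuilds only those strings. The helper `python_statements_for` is hypothetical and not part of ruby-spacy; it exists purely to make the naming scheme visible.

```ruby
# Hypothetical helper (not part of ruby-spacy): returns the Python-side
# variable name and the statements the constructor would pass to PyCall.exec.
def python_statements_for(model = "en_core_web_sm")
  nlp_id = "nlp_#{model.object_id}" # same naming scheme as the constructor
  statements = [
    "import spacy; from spacy.tokens import Span; from spacy.matcher import Matcher; from spacy import displacy",
    "#{nlp_id} = spacy.load('#{model}')"
  ]
  [nlp_id, statements]
end

nlp_id, statements = python_statements_for("en_core_web_sm")
puts nlp_id      # the name later passed to PyCall.eval
puts statements  # the Python code that would be executed
```

Because the name is derived from `object_id`, each String object passed in should yield its own Python-side variable, which is presumably what lets several models coexist in one Python session.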
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism....
    # File 'lib/ruby-spacy.rb', line 561
    def method_missing(name, *args)
      @py_nlp.send(name, *args)
    end
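To illustrate the delegation pattern in isolation, here is a small sketch that needs neither PyCall nor spaCy. `FakeBackend` and `Wrapper` are hypothetical stand-ins for the Python object and for Spacy::Language. The `respond_to_missing?` method and the `public_send` guard are additions for illustration and do not appear in the source listing.

```ruby
# Hypothetical stand-in for the wrapped Python object.
class FakeBackend
  def pipe_names
    ["tok2vec", "tagger"]
  end
end

# Minimal wrapper that forwards calls it does not define itself,
# in the same spirit as Spacy::Language forwarding to @py_nlp.
class Wrapper
  def initialize(backend)
    @backend = backend
  end

  # Forward unknown methods to the backend; fall back to the default
  # NoMethodError when the backend does not know the method either.
  def method_missing(name, *args, &block)
    return super unless @backend.respond_to?(name)

    @backend.public_send(name, *args, &block)
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @backend.respond_to?(name, include_private) || super
  end
end

wrapper = Wrapper.new(FakeBackend.new)
p wrapper.pipe_names
p wrapper.respond_to?(:pipe_names)
```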
Instance Attribute Details
#py_nlp ⇒ Object (readonly)
Returns a Python Language instance accessible via PyCall.
    # File 'lib/ruby-spacy.rb', line 485
    def py_nlp
      @py_nlp
    end
#spacy_nlp_id ⇒ String (readonly)
Returns an identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.
    # File 'lib/ruby-spacy.rb', line 482
    def spacy_nlp_id
      @spacy_nlp_id
    end
Instance Method Details
#get_lexeme(text) ⇒ Object
A utility method to get a Python Lexeme object.
    # File 'lib/ruby-spacy.rb', line 534
    def get_lexeme(text)
      text = text.gsub("'", "\'")
      py_lexeme = PyCall.eval("#{@spacy_nlp_id}.vocab['#{text}']")
      return py_lexeme
    end
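In the listing as printed, the replacement string `"\'"` is written in double quotes, where Ruby reads it as a plain single quote, so the `gsub` call appears to return the text unchanged. The sketch below demonstrates this and shows one possible way to escape text for a single-quoted Python literal. The helper `escape_for_python_single_quotes` is hypothetical, not part of ruby-spacy, and is not a complete Python escaper (it ignores newlines, for example).

```ruby
# In a double-quoted Ruby string, "\'" is just a single quote.
puts "\'" == "'"

# Hypothetical helper (not part of ruby-spacy): prefix backslashes and single
# quotes with a backslash so the text can sit inside '...' in Python source.
# The block form of gsub inserts its return value literally, which avoids the
# special meaning of \' inside gsub replacement strings.
def escape_for_python_single_quotes(text)
  text.gsub(/[\\']/) { |ch| "\\#{ch}" }
end

escaped = escape_for_python_single_quotes("it's")
puts escaped
puts "vocab['#{escaped}']" # the expression that would be handed to PyCall.eval
```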
#matcher ⇒ Matcher
Generates a matcher for the current language model.
504 505 506 |
# File 'lib/ruby-spacy.rb', line 504 def matcher Matcher.new(@spacy_nlp_id) end |
#most_similar(vector, n) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>
Returns the n lexemes whose vector representations are most similar to the given word vector.
    # File 'lib/ruby-spacy.rb', line 543
    def most_similar(vector, n)
      vec_array = Numpy.asarray([vector])
      py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: n)
      key_texts = PyCall.eval("[[str(n), #{@spacy_nlp_id}.vocab[n].text] for n in #{py_result[0][0].tolist()}]")
      keys = key_texts.map{|kt| kt[0]}
      texts = key_texts.map{|kt| kt[1]}
      best_rows = PyCall::List.(py_result[1])[0]
      scores = PyCall::List.(py_result[2])[0]

      results = []
      n.times do |i|
        results << {key: keys[i].to_i, text: texts[i], best_row: best_rows[i], score: scores[i]}
      end

      results
    end
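The final loop of most_similar turns four parallel arrays into one hash per lexeme. The sketch below shows that assembly step on its own, using `zip` instead of an index loop. All sample values are made up for illustration and do not come from a real model.

```ruby
# Made-up sample values standing in for what the Python call returns.
keys      = ["101", "202", "303"]
texts     = ["cat", "dog", "fox"]
best_rows = [7, 3, 9]
scores    = [0.91, 0.87, 0.80]

# Combine the parallel arrays into one hash per lexeme, using the same
# hash keys as the method above (:key, :text, :best_row, :score).
results = keys.zip(texts, best_rows, scores).map do |key, text, row, score|
  { key: key.to_i, text: text, best_row: row, score: score }
end

p results.first
```

One difference from the index loop: `zip` yields one entry per element of `keys`, so the output length follows the data rather than the requested n.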
#pipe_names ⇒ Array<String>
A utility method to list pipeline components.
    # File 'lib/ruby-spacy.rb', line 517
    def pipe_names
      pipe_array = []
      PyCall::List.(@py_nlp.pipe_names).each do |pipe|
        pipe_array << pipe
      end
      pipe_array
    end
#read(text) ⇒ Object
Reads and analyzes the given text.
    # File 'lib/ruby-spacy.rb', line 498
    def read(text)
      Doc.new(@spacy_nlp_id, text)
    end
#tokenizer ⇒ Object
A utility method to get the tokenizer Python object.
    # File 'lib/ruby-spacy.rb', line 527
    def tokenizer
      return PyCall.eval("#{@spacy_nlp_id}.tokenizer")
    end
#vocab_string_lookup(id) ⇒ Object
A utility method to look up the vocabulary item with the given id.
    # File 'lib/ruby-spacy.rb', line 511
    def vocab_string_lookup(id)
      PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
    end