Class: Spacy::Matcher
- Inherits:
- Object
- Object
- Spacy::Matcher
- Defined in:
- lib/ruby-spacy.rb
Overview
See also the spaCy Python API documentation for Matcher.
Instance Attribute Summary
-
#py_matcher ⇒ Object
readonly
A Python Matcher instance accessible via PyCall.
-
#spacy_matcher_id ⇒ String
readonly
An identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.
Instance Method Summary
-
#add(text, pattern) ⇒ Object
Adds a label string and a text pattern.
-
#initialize(nlp_id) ⇒ Matcher
constructor
Creates a Matcher instance.
-
#match(doc) ⇒ Array<Hash{:match_id => Integer, :start_index => Integer, :end_index => Integer}>
Executes the match against a doc.
Constructor Details
#initialize(nlp_id) ⇒ Matcher
Creates a Spacy::Matcher instance.
# File 'lib/ruby-spacy.rb', line 446
def initialize(nlp_id)
  @spacy_matcher_id = "doc_#{nlp_id}_matcher"
  PyCall.exec("#{@spacy_matcher_id} = Matcher(#{nlp_id}.vocab)")
  @py_matcher = PyCall.eval(@spacy_matcher_id)
end
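The constructor derives both the Python variable name and the Python statement from nlp_id by plain string interpolation. The following is a minimal sketch of that step in pure Ruby, without PyCall; the helper name matcher_setup and the sample identifier "nlp_en" are hypothetical and only illustrate the strings that would be passed to PyCall.exec and PyCall.eval.

```ruby
# Hypothetical helper (not part of ruby-spacy) that mirrors the string
# interpolation performed in Spacy::Matcher#initialize.
def matcher_setup(nlp_id)
  # Name of the Python-side variable that will hold the Matcher.
  matcher_id = "doc_#{nlp_id}_matcher"
  # Python statement that the constructor hands to PyCall.exec.
  statement = "#{matcher_id} = Matcher(#{nlp_id}.vocab)"
  [matcher_id, statement]
end

# "nlp_en" is an assumed identifier chosen for illustration.
matcher_id, statement = matcher_setup("nlp_en")
puts matcher_id
puts statement
```

Because the identifier is interpolated directly into Python source, nlp_id presumably has to name a Python variable that already exists when the constructor runs.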
Instance Attribute Details
#py_matcher ⇒ Object (readonly)
Returns a Python Matcher instance accessible via PyCall.
# File 'lib/ruby-spacy.rb', line 442
def py_matcher
  @py_matcher
end
#spacy_matcher_id ⇒ String (readonly)
Returns an identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.
# File 'lib/ruby-spacy.rb', line 439
def spacy_matcher_id
  @spacy_matcher_id
end
Instance Method Details
#add(text, pattern) ⇒ Object
Adds a label string and a text pattern.
# File 'lib/ruby-spacy.rb', line 455
def add(text, pattern)
  @py_matcher.add(text, pattern)
end
#match(doc) ⇒ Array<Hash{:match_id => Integer, :start_index => Integer, :end_index => Integer}>
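Executes the match against a doc and returns one hash per match. The end_index value is inclusive: the method subtracts 1 from the end position reported by the Python Matcher.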
# File 'lib/ruby-spacy.rb', line 462
def match(doc)
  str_results = PyCall.eval("#{@spacy_matcher_id}(#{doc.spacy_doc_id})").to_s
  s = StringScanner.new(str_results[1..-2])
  results = []
  while s.scan_until(/(\d+), (\d+), (\d+)/)
    next unless s.matched
    triple = s.matched.split(", ")
    match_id = triple[0].to_i
    start_index = triple[1].to_i
    end_index = triple[2].to_i - 1
    results << {match_id: match_id, start_index: start_index, end_index: end_index}
  end
  results
end
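The parsing loop in #match can be tried in isolation, without Python. The sketch below is a stand-alone approximation, assuming the stringified result looks like a Python list of (match_id, start, end) tuples; the function name parse_matches and the sample string (including its match id value) are hypothetical.

```ruby
require "strscan"

# Hypothetical stand-alone version of the parsing loop in Spacy::Matcher#match.
# It takes the stringified Python result instead of calling PyCall.
def parse_matches(str_results)
  # Drop the surrounding "[" and "]" of the Python list representation.
  s = StringScanner.new(str_results[1..-2])
  results = []
  # Each iteration consumes the next "id, start, end" triple.
  while s.scan_until(/(\d+), (\d+), (\d+)/)
    results << {
      match_id: s[1].to_i,
      start_index: s[2].to_i,
      # The Python end position is exclusive; subtracting 1 makes it inclusive.
      end_index: s[3].to_i - 1
    }
  end
  results
end

# Assumed sample input; the match id is an arbitrary illustrative value.
sample = "[(15578876784678163569, 0, 3), (15578876784678163569, 5, 6)]"
parse_matches(sample).each { |m| p m }
```

For the sample input, the two hashes carry end_index values of 2 and 5, one less than the end positions in the string. An empty result string "[]" yields an empty array.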