Class: Spacy::Doc
Overview
See also the spaCy Python API documentation for Doc.
Instance Attribute Summary
-
#py_doc ⇒ Object
readonly
A Python Doc instance accessible via PyCall.
-
#spacy_doc_id ⇒ String
readonly
An identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.
-
#spacy_nlp_id ⇒ String
readonly
An identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.
-
#text ⇒ String
readonly
A text string of the document.
Instance Method Summary
-
#[](range) ⇒ Object
Returns a span if given a range object; returns a token if given an integer representing a position in the doc.
-
#displacy(style: "dep", compact: false) ⇒ String
Visualize the document in one of two styles: dep (dependencies) or ent (named entities).
-
#each ⇒ Object
Iterates over the tokens in the doc, yielding each one as a Token instance.
-
#ents ⇒ Array<Span>
Returns an array of spans representing named entities.
-
#initialize(nlp_id, text) ⇒ Doc
constructor
Creates a new instance of Doc.
-
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.
-
#noun_chunks ⇒ Array<Span>
Returns an array of spans representing noun chunks.
-
#retokenize(start_index, end_index, attributes = {}) ⇒ Object
Retokenizes the text merging a span into a single token.
-
#retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {}) ⇒ Object
Retokenizes the text splitting the specified token.
-
#sents ⇒ Array<Span>
Returns an array of spans representing sentences.
-
#similarity(other) ⇒ Float
Returns a semantic similarity estimate.
-
#span(range_or_start, optional_size = nil) ⇒ Span
Returns a span of the specified range within the doc.
-
#to_s ⇒ String
String representation of the document.
-
#tokens ⇒ Array<Token>
Returns an array of tokens contained in the doc.
Constructor Details
#initialize(nlp_id, text) ⇒ Doc
Creates a new instance of Spacy::Doc.
# File 'lib/ruby-spacy.rb', line 285

def initialize(nlp_id, text)
  @text = text
  @spacy_nlp_id = nlp_id
  @spacy_doc_id = "doc_#{text.object_id}"
  quoted = text.gsub('"', '\"')
  PyCall.exec(%Q[text_#{text.object_id} = """#{quoted}"""])
  PyCall.exec("#{@spacy_doc_id} = #{nlp_id}(text_#{text.object_id})")
  @py_doc = PyCall.eval(@spacy_doc_id)
end
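The quoting step in the constructor can be tried in isolation. The following is a minimal sketch, not part of ruby-spacy: python_text_assignment is a hypothetical helper that only builds the Python source string the first PyCall.exec call would receive, with a fixed variable name in place of the object_id-based one.

```ruby
# Hypothetical helper mirroring the first PyCall.exec call in #initialize.
# It only builds the Python source string; nothing is sent to Python.
def python_text_assignment(var_name, text)
  # Prefix every double quote with a backslash so the text cannot
  # terminate the Python triple-quoted literal early.
  quoted = text.gsub('"', '\"')
  %Q[#{var_name} = """#{quoted}"""]
end

puts python_text_assignment("text_1", 'She said "hi"')
# prints: text_1 = """She said \"hi\""""
```

Without the escaping, a text ending in a double quote would produce four consecutive quotes and close the Python literal one character too early.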
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.
# File 'lib/ruby-spacy.rb', line 430

def method_missing(name, *args)
  @py_doc.send(name, *args)
end
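The forwarding pattern can be illustrated without Python. This is a sketch under stated assumptions: FakePyDoc is a stand-in for the PyCall proxy, DocSketch is a hypothetical wrapper, and the respond_to_missing? override is an addition that does not appear in the excerpt above.

```ruby
# Stand-in for the Python Doc object; in ruby-spacy this would be a PyCall proxy.
FakePyDoc = Struct.new(:lang_, :has_vector)

# Hypothetical wrapper that forwards unknown messages, as Spacy::Doc does.
class DocSketch
  def initialize(py_doc)
    @py_doc = py_doc
  end

  # Any method not defined on the wrapper is sent to the wrapped object.
  def method_missing(name, *args)
    @py_doc.send(name, *args)
  end

  # Assumption: not shown in the source above; keeps respond_to? consistent
  # with what method_missing can actually handle.
  def respond_to_missing?(name, include_private = false)
    @py_doc.respond_to?(name, include_private) || super
  end
end

doc = DocSketch.new(FakePyDoc.new("en", true))
puts doc.lang_                      # forwarded to the wrapped object
puts doc.respond_to?(:has_vector)   # true, via respond_to_missing?
```

A message that the wrapped object does not understand still raises NoMethodError, raised from the wrapped object rather than from the wrapper.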
Instance Attribute Details
#py_doc ⇒ Object (readonly)
Returns a Python Doc instance accessible via PyCall.
# File 'lib/ruby-spacy.rb', line 271

def py_doc
  @py_doc
end
#spacy_doc_id ⇒ String (readonly)
Returns an identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.
# File 'lib/ruby-spacy.rb', line 268

def spacy_doc_id
  @spacy_doc_id
end
#spacy_nlp_id ⇒ String (readonly)
Returns an identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.
# File 'lib/ruby-spacy.rb', line 265

def spacy_nlp_id
  @spacy_nlp_id
end
#text ⇒ String (readonly)
Returns a text string of the document.
# File 'lib/ruby-spacy.rb', line 274

def text
  @text
end
Instance Method Details
#[](range) ⇒ Object
Returns a span if given a range object; returns a token if given an integer representing a position in the doc.
# File 'lib/ruby-spacy.rb', line 405

def [](range)
  if range.is_a?(Range)
    py_span = @py_doc[range]
    return Span.new(self, start_index: py_span.start, end_index: py_span.end - 1)
  else
    return Token.new(@py_doc[range])
  end
end
#displacy(style: "dep", compact: false) ⇒ String
Visualize the document in one of two styles: dep (dependencies) or ent (named entities).
# File 'lib/ruby-spacy.rb', line 425

def displacy(style: "dep", compact: false)
  PyCall.eval("displacy.render(#{@spacy_doc_id}, style='#{style}', options={'compact': #{compact.to_s.capitalize}}, jupyter=False)")
end
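The Python expression that #displacy evaluates can be inspected on its own. A minimal sketch, assuming a hypothetical helper displacy_source that builds the same string but never calls PyCall; the document identifier "doc_1" is made up for illustration.

```ruby
# Hypothetical helper mirroring the string built in #displacy.
# Nothing is evaluated in Python here.
def displacy_source(doc_id, style: "dep", compact: false)
  # Ruby's true/false become Python's True/False.
  py_bool = compact.to_s.capitalize
  "displacy.render(#{doc_id}, style='#{style}', options={'compact': #{py_bool}}, jupyter=False)"
end

puts displacy_source("doc_1", style: "ent", compact: true)
# prints: displacy.render(doc_1, style='ent', options={'compact': True}, jupyter=False)
```

The capitalize step matters because Python does not recognize lowercase true or false as boolean literals.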
#each ⇒ Object
Iterates over the tokens in the doc, yielding each one as a Token instance.
# File 'lib/ruby-spacy.rb', line 344

def each
  PyCall::List.(@py_doc).each do |py_token|
    yield Token.new(py_token)
  end
end
#ents ⇒ Array<Span>
Returns an array of spans representing named entities.
# File 'lib/ruby-spacy.rb', line 394

def ents
  # so that ents can be "each"-ed in Ruby
  ent_array = []
  PyCall::List.(@py_doc.ents).each do |ent|
    ent_array << ent
  end
  ent_array
end
#noun_chunks ⇒ Array<Span>
Returns an array of spans representing noun chunks.
# File 'lib/ruby-spacy.rb', line 372

def noun_chunks
  chunk_array = []
  py_chunks = PyCall::List.(@py_doc.noun_chunks)
  py_chunks.each do |py_chunk|
    chunk_array << Span.new(self, start_index: py_chunk.start, end_index: py_chunk.end - 1)
  end
  chunk_array
end
#retokenize(start_index, end_index, attributes = {}) ⇒ Object
Retokenizes the text merging a span into a single token.
# File 'lib/ruby-spacy.rb', line 300

def retokenize(start_index, end_index, attributes = {})
  py_attrs = PyCall::Dict.(attributes)
  PyCall.exec(<<PY)
with #{@spacy_doc_id}.retokenize() as retokenizer:
    retokenizer.merge(#{@spacy_doc_id}[#{start_index} : #{end_index + 1}], attrs=#{py_attrs})
PY
  @py_doc = PyCall.eval(@spacy_doc_id)
end
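The index handling in #retokenize is easy to miss: end_index is inclusive on the Ruby side, while the Python slice is end-exclusive, hence the + 1. The sketch below assumes a hypothetical helper merge_source that only renders the Python source; py_attrs is passed as an already-rendered Python dict literal rather than a PyCall::Dict.

```ruby
# Hypothetical helper mirroring the heredoc in #retokenize.
# It renders the Python source text and never calls PyCall.
def merge_source(doc_id, start_index, end_index, py_attrs = "{}")
  <<~PY
    with #{doc_id}.retokenize() as retokenizer:
        retokenizer.merge(#{doc_id}[#{start_index} : #{end_index + 1}], attrs=#{py_attrs})
  PY
end

# Merge tokens 2 and 3 (inclusive) into a single token.
puts merge_source("doc_1", 2, 3)
```

Here the rendered slice is doc_1[2 : 4], which covers the tokens at positions 2 and 3.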
#retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {}) ⇒ Object
Retokenizes the text splitting the specified token.
# File 'lib/ruby-spacy.rb', line 314

def retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {})
  py_attrs = PyCall::Dict.(attributes)
  py_split_array = PyCall::List.(split_array)
  PyCall.exec(<<PY)
with #{@spacy_doc_id}.retokenize() as retokenizer:
    heads = [(#{@spacy_doc_id}[#{pos_in_doc}], #{head_pos_in_split}), #{@spacy_doc_id}[#{ancestor_pos}]]
    attrs = #{py_attrs}
    split_array = #{py_split_array}
    retokenizer.split(#{@spacy_doc_id}[#{pos_in_doc}], split_array, heads=heads, attrs=attrs)
PY
  @py_doc = PyCall.eval(@spacy_doc_id)
end
#sents ⇒ Array<Span>
Returns an array of spans representing sentences.
# File 'lib/ruby-spacy.rb', line 383

def sents
  sentence_array = []
  py_sentences = PyCall::List.(@py_doc.sents)
  py_sentences.each do |py_sent|
    sentence_array << Span.new(self, start_index: py_sent.start, end_index: py_sent.end - 1)
  end
  sentence_array
end
#similarity(other) ⇒ Float
Returns a semantic similarity estimate.
# File 'lib/ruby-spacy.rb', line 417

def similarity(other)
  PyCall.eval("#{@spacy_doc_id}.similarity(#{other.spacy_doc_id})")
end
#span(range_or_start, optional_size = nil) ⇒ Span
Returns a span of the specified range within the doc.
Call this method in one of two ways: Doc#span(range) or Doc#span(start_pos, size_of_span).
# File 'lib/ruby-spacy.rb', line 355

def span(range_or_start, optional_size = nil)
  if optional_size
    start_index = range_or_start
    temp = tokens[start_index ... start_index + optional_size]
  else
    start_index = range_or_start.first
    range = range_or_start
    temp = tokens[range]
  end

  end_index = start_index + temp.size - 1

  Span.new(self, start_index: start_index, end_index: end_index)
end
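The index arithmetic behind the two call forms can be checked with a plain array standing in for the token list. A sketch only: span_bounds is a hypothetical helper that returns the inclusive [start_index, end_index] pair instead of building a Span.

```ruby
# Hypothetical helper reproducing the index arithmetic of #span on a plain array.
# Returns [start_index, end_index], with end_index inclusive.
def span_bounds(tokens, range_or_start, optional_size = nil)
  if optional_size
    start_index = range_or_start
    temp = tokens[start_index...start_index + optional_size]
  else
    start_index = range_or_start.first
    temp = tokens[range_or_start]
  end
  [start_index, start_index + temp.size - 1]
end

tokens = %w[I like Tokyo very much]

p span_bounds(tokens, 1..2)   # range form           => [1, 2]
p span_bounds(tokens, 1, 2)   # start and size form  => [1, 2]
p span_bounds(tokens, 3, 10)  # oversized size       => [3, 4]
```

Because the end index is derived from the size of the slice actually returned, a size that runs past the end of the token list is clamped to the last token.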
#to_s ⇒ String
String representation of the document.
# File 'lib/ruby-spacy.rb', line 329

def to_s
  @text
end