Class: Spacy::Language

Inherits:
Object
Defined in:
lib/ruby-spacy.rb

Overview

See also the spaCy Python API documentation for Language.

Constructor Details

#initialize(model = "en_core_web_sm") ⇒ Language

Creates a language model instance, which is conventionally referred to by a variable named nlp.

Parameters:

  • model (String) (defaults to: "en_core_web_sm")

    The name of a language model installed on the system

# File 'lib/ruby-spacy.rb', line 489

def initialize(model = "en_core_web_sm")
  @spacy_nlp_id = "nlp_#{model.object_id}"
  PyCall.exec("import spacy; from spacy.tokens import Span; from spacy.matcher import Matcher; from spacy import displacy")
  PyCall.exec("#{@spacy_nlp_id} = spacy.load('#{model}')")
  @py_nlp = PyCall.eval(@spacy_nlp_id)
end
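The constructor works by naming a global variable inside the embedded Python interpreter and then running Python source through PyCall. The sketch below reproduces only the string-building half in plain Ruby, so it runs without Python or spaCy installed; the helper name `build_init_statements` is hypothetical and is not part of ruby-spacy.

```ruby
# Hypothetical helper: builds the Python source strings that
# Spacy::Language#initialize passes to PyCall.exec, without executing them.
def build_init_statements(model = "en_core_web_sm")
  # The identifier is derived from the Ruby object_id of the model string,
  # so separate calls typically yield distinct Python global names.
  nlp_id = "nlp_#{model.object_id}"
  imports = "import spacy; from spacy.tokens import Span; " \
            "from spacy.matcher import Matcher; from spacy import displacy"
  load = "#{nlp_id} = spacy.load('#{model}')"
  { id: nlp_id, statements: [imports, load] }
end

init = build_init_statements("en_core_web_sm")
puts init[:id]              # "nlp_" followed by an integer
puts init[:statements].last # the spacy.load statement using that name
```

Once the second statement has run, the same identifier can be used in any later `PyCall.eval` string to reach the loaded pipeline, which is what `spacy_nlp_id` exposes.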

Dynamic Method Handling

This class handles dynamic methods through the method_missing method.

#method_missing(name, *args) ⇒ Object

Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism....

# File 'lib/ruby-spacy.rb', line 561

def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
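The mechanism is ordinary Ruby delegation: any message the wrapper does not define is forwarded to the wrapped object. Below is a minimal pure-Ruby sketch, assuming a plain Ruby object stands in for the PyCall-wrapped Python `Language`; `FakePyNlp` and `DelegatingWrapper` are hypothetical names. Unlike the code above, it also forwards blocks and defines `respond_to_missing?`, so that `respond_to?` agrees with what `method_missing` will actually handle.

```ruby
# Hypothetical stand-in for the Python Language object.
class FakePyNlp
  def pipe_names
    ["tok2vec", "tagger", "parser"]
  end

  def meta
    { "lang" => "en" }
  end
end

# Hypothetical wrapper illustrating the forwarding pattern.
class DelegatingWrapper
  def initialize(target)
    @target = target
  end

  # Forward unknown messages (with arguments and block) to the target;
  # fall back to super (NoMethodError) if the target cannot handle them.
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

nlp = DelegatingWrapper.new(FakePyNlp.new)
p nlp.pipe_names         # prints ["tok2vec", "tagger", "parser"]
p nlp.respond_to?(:meta) # prints true
```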

Instance Attribute Details

#py_nlp ⇒ Object (readonly)

Returns a Python Language instance accessible via PyCall.

Returns:

  • (Object)

    a Python Language instance accessible via PyCall

# File 'lib/ruby-spacy.rb', line 485

def py_nlp
  @py_nlp
end

#spacy_nlp_id ⇒ String (readonly)

Returns an identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval.

Returns:

  • (String)

    an identifier string that can be used when referring to the Python object inside PyCall::exec or PyCall::eval

# File 'lib/ruby-spacy.rb', line 482

def spacy_nlp_id
  @spacy_nlp_id
end

Instance Method Details

#get_lexeme(text) ⇒ Object

A utility method to get a Python Lexeme object.

Parameters:

  • text (String)

    A text string representing a lexeme

Returns:

  • (Object)

    Python Lexeme object

# File 'lib/ruby-spacy.rb', line 534

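def get_lexeme(text)
  # Escape backslashes and single quotes so that text can be embedded
  # in the single-quoted Python string literal below. (In Ruby, "\'" is
  # just "'", so the replacement must be built explicitly.)
  text = text.gsub(/['\\]/) { |char| "\\#{char}" }
  PyCall.eval("#{@spacy_nlp_id}.vocab['#{text}']")
end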
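Because `get_lexeme` interpolates `text` into Python source, any single quote or backslash in the input has to be escaped before the string reaches `PyCall.eval`. Note that in a Ruby double-quoted string `"\'"` is simply `"'"`, so a replacement written that way adds no backslash. A pure-Ruby sketch of an escaping helper follows; `py_single_quote_literal` is a hypothetical name, and it deliberately handles only quotes and backslashes.

```ruby
# Hypothetical helper: turns a Ruby string into a Python single-quoted
# string literal by escaping backslashes and single quotes.
# Other characters (e.g. newlines) are not handled in this sketch.
def py_single_quote_literal(text)
  # The block form inserts its return value literally, so "\\#{char}"
  # yields a real backslash followed by the matched character.
  escaped = text.gsub(/['\\]/) { |char| "\\#{char}" }
  "'#{escaped}'"
end

puts py_single_quote_literal("don't") # prints 'don\'t'
```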

#matcher ⇒ Matcher

Generates a matcher for the current language model.

Returns:

  • (Matcher)

    A Matcher instance bound to this language model

# File 'lib/ruby-spacy.rb', line 504

def matcher
  Matcher.new(@spacy_nlp_id)
end

#most_similar(vector, n) ⇒ Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>

Returns the n lexemes whose vectors are most similar to the given word vector.

Parameters:

  • vector (Object)

    A vector representation of a word (the word need not exist in the vocabulary)

  • n (Integer)

    The number of lexemes to return

Returns:

  • (Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>)

    An array of hashes, each containing the key, text, best_row (row index in the vectors table), and similarity score of a lexeme

# File 'lib/ruby-spacy.rb', line 543

def most_similar(vector, n)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: n)
  key_texts = PyCall.eval("[[str(n), #{@spacy_nlp_id}.vocab[n].text] for n in #{py_result[0][0].tolist()}]")
  keys = key_texts.map{|kt| kt[0]}
  texts = key_texts.map{|kt| kt[1]}
  best_rows = PyCall::List.(py_result[1])[0]
  scores = PyCall::List.(py_result[2])[0]

  results = []
  n.times do |i|
    results << {key: keys[i].to_i, text: texts[i], best_row: best_rows[i], score: scores[i]}
  end

  results
end
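On the Python side, spaCy's `Vectors.most_similar` ranks the vectors table by cosine similarity and returns parallel arrays of keys, row indices, and scores, which the method above zips into hashes. The pure-Ruby sketch below imitates that ranking over a tiny hand-made table so the result shape can be seen without spaCy. The table contents and the helper `toy_most_similar` are hypothetical, and the real implementation works on a NumPy matrix rather than Ruby arrays.

```ruby
# Cosine similarity between two equal-length numeric arrays.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.(a) * norm.(b))
end

# Hypothetical toy vectors table: the array index plays the role of the row.
TABLE = [
  { text: "cat", vector: [1.0, 0.0] },
  { text: "dog", vector: [0.9, 0.1] },
  { text: "car", vector: [0.0, 1.0] }
].freeze

# Returns the n rows most similar to query, highest score first, using the
# same hash shape as Spacy::Language#most_similar (minus :key).
def toy_most_similar(query, n)
  TABLE.each_with_index
       .map { |row, i| { text: row[:text], best_row: i, score: cosine(query, row[:vector]) } }
       .max_by(n) { |h| h[:score] }
end

# Lists "cat" first, then "dog".
toy_most_similar([1.0, 0.0], 2).each do |r|
  puts format("%-4s row=%d score=%.3f", r[:text], r[:best_row], r[:score])
end
```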

#pipe_names ⇒ Array<String>

A utility method to list pipeline components.

Returns:

  • (Array<String>)

    An array of text strings representing pipeline components

# File 'lib/ruby-spacy.rb', line 517

def pipe_names
  pipe_array = []
  PyCall::List.(@py_nlp.pipe_names).each do |pipe|
    pipe_array << pipe
  end
  pipe_array
end

#read(text) ⇒ Doc

Reads and analyzes the given text.

Parameters:

  • text (String)

    A text to be read and analyzed

Returns:

  • (Doc)

    A Doc instance holding the analysis results

# File 'lib/ruby-spacy.rb', line 498

def read(text)
  Doc.new(@spacy_nlp_id, text)
end

#tokenizer ⇒ Object

A utility method to get the tokenizer Python object.

Returns:

  • (Object)

    Python Tokenizer object

# File 'lib/ruby-spacy.rb', line 527

def tokenizer
  return PyCall.eval("#{@spacy_nlp_id}.tokenizer")
end

#vocab_string_lookup(id) ⇒ String

A utility method to look up the string associated with the given ID in the vocabulary's string store.

Parameters:

  • id (Integer)

    A hash value (ID) of a string in the vocabulary

Returns:

  • (String)

    The string that the ID maps to

# File 'lib/ruby-spacy.rb', line 511

def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
end
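`vocab.strings` is spaCy's StringStore, a two-way mapping in which indexing with a string gives its integer hash and indexing with a hash gives the string back. The following toy model of that lookup in plain Ruby is a sketch: `ToyStringStore` is a hypothetical name, and `Zlib.crc32` is only a stand-in because it ships with Ruby (spaCy uses its own 64-bit hash), so the IDs it produces will not match spaCy's.

```ruby
require "zlib"

# Hypothetical, simplified stand-in for spaCy's StringStore.
class ToyStringStore
  def initialize
    @by_id = {}
  end

  # Register a string and return its ID.
  def add(text)
    id = Zlib.crc32(text)
    @by_id[id] = text
    id
  end

  # Integer key: return the stored string (KeyError if unknown).
  # String key: return the ID the string hashes to.
  def [](key)
    key.is_a?(Integer) ? @by_id.fetch(key) : Zlib.crc32(key)
  end
end

store = ToyStringStore.new
id = store.add("coffee")
puts store[id]               # prints coffee
puts store["coffee"] == id   # prints true
```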