Class: Natto::MeCabNode

Inherits:
MeCabStruct
Defined in:
lib/natto/struct.rb

Overview

MeCabNode is a wrapper for the MeCab mecab_node_t struct, which holds a parsed node.

Values for the MeCab node attributes may be obtained by using the following Symbols as keys to the layout associative array of FFI::Struct members.

  • :prev - pointer to previous node
  • :next - pointer to next node
  • :enext - pointer to the node which ends at the same position
  • :bnext - pointer to the node which starts at the same position
  • :rpath - pointer to the right path; nil in MECAB_ONE_BEST mode
  • :lpath - pointer to the left path; nil in MECAB_ONE_BEST mode
  • :surface - surface string; length may be obtained with length/rlength members
  • :feature - feature string
  • :id - unique node id
  • :length - length of surface form
  • :rlength - length of the surface form including white space before the morph
  • :rcAttr - right attribute id
  • :lcAttr - left attribute id
  • :posid - part-of-speech id
  • :char_type - character type
  • :stat - node status; 0 (NOR), 1 (UNK), 2 (BOS), 3 (EOS), 4 (EON)
  • :isbest - 1 if this node is the best node
  • :alpha - forward accumulative log summation, only with marginal probability flag
  • :beta - backward accumulative log summation, only with marginal probability flag
  • :prob - marginal probability, only with marginal probability flag
  • :wcost - word cost
  • :cost - best accumulative cost from bos node to this node

Usage

An instance of MeCabNode is yielded to the block used with MeCab#parse, where the above-mentioned node attributes may be accessed by name.

nm = Natto::MeCab.new

nm.parse('卓球なんて死ぬまでの暇つぶしだよ。') do |n| 
  puts "#{n.surface}\t#{n.cost}" if n.is_nor? 
end
卓球     2874
なんて    4398
死ぬ     9261
まで     9386
の       10007
暇つぶし 13324
だ       15346
よ       14396
。       10194

Although the Symbol for a MeCab node member can also be used to index into the FFI::Struct layout associative array, please use the attribute accessors instead. For :surface and :feature, MeCab returns raw bytes, which natto converts into a String using the default external encoding (Encoding.default_external).
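
The byte-to-String conversion described above can be illustrated without MeCab installed. The following is a minimal sketch, not natto code: `raw` is a hypothetical stand-in for the bytes MeCab hands back, and UTF-8 is hard-coded only to keep the example deterministic, whereas natto uses Encoding.default_external.

```ruby
# Hypothetical stand-in for the raw bytes of a surface string (not a natto call).
raw = "卓球".b   # binary (ASCII-8BIT) copy of the UTF-8 bytes

# Re-tag the same bytes with a text encoding; the bytes themselves are unchanged.
# UTF-8 is an assumption made for this sketch.
decoded = raw.dup.force_encoding(Encoding::UTF_8)

puts raw.bytesize            # number of bytes in the raw data
puts decoded.length          # number of characters once the encoding is known
puts decoded.valid_encoding? # whether the bytes are valid in that encoding
```

Because force_encoding only changes the encoding tag, bytes that are not valid in the chosen encoding are not repaired, so checking valid_encoding? is a cheap safeguard when the dictionary charset and the Ruby default may differ.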

Constant Summary

NOR_NODE =

Normal MeCab node defined in the dictionary, cf. stat.

0
UNK_NODE =

Unknown MeCab node not defined in the dictionary, cf. stat.

1
BOS_NODE =

Virtual node representing the beginning of the sentence, cf. stat.

2
EOS_NODE =

Virtual node representing the end of the sentence, cf. stat.

3
EON_NODE =

Virtual node representing the end of an N-Best MeCab node list, cf. stat.

4
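
When logging or debugging, it can help to turn the numeric stat value into a readable label. The sketch below mirrors the constant values listed above; `STAT_NAMES` and `stat_name` are hypothetical helpers introduced for illustration and are not part of natto.

```ruby
# Hypothetical lookup table mirroring the constants above (not part of natto).
STAT_NAMES = {
  0 => :NOR,  # normal node, defined in the dictionary
  1 => :UNK,  # unknown node, not defined in the dictionary
  2 => :BOS,  # virtual beginning-of-sentence node
  3 => :EOS,  # virtual end-of-sentence node
  4 => :EON   # virtual end of an N-Best node list
}.freeze

# Translate a numeric stat value into a label, failing loudly on anything else.
def stat_name(stat)
  STAT_NAMES.fetch(stat) { raise ArgumentError, "unexpected stat: #{stat}" }
end

puts stat_name(0)
puts stat_name(3)
```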

Instance Attribute Summary

Instance Method Summary

Methods inherited from MeCabStruct

#method_missing

Constructor Details

#initialize(nptr) ⇒ MeCabNode

Initializes this node instance. Sets the MeCab feature value for this node.

Parameters:

  • nptr (FFI::Pointer)

    pointer to MeCab node



# File 'lib/natto/struct.rb', line 244

def initialize(nptr)
  super(nptr)
  @pointer = nptr

  if self[:feature]
    @feature = self[:feature].force_encoding(Encoding.default_external)
  end
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method of the class Natto::MeCabStruct.
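
The general pattern behind such name-based access can be sketched as follows. This is an illustration of the technique under stated assumptions, not natto's actual implementation: `StructLike` is a hypothetical class, and a plain Hash stands in for the FFI::Struct layout.

```ruby
# Hypothetical stand-in: a Hash plays the role of the FFI::Struct layout.
class StructLike
  def initialize(members)
    @members = members
  end

  # Hash-style access by Symbol, similar in spirit to FFI::Struct#[].
  def [](key)
    @members.fetch(key)
  end

  # Route unknown zero-argument calls to the member of the same name;
  # anything else falls through to the default NoMethodError.
  def method_missing(name, *args)
    return @members[name] if args.empty? && @members.key?(name)
    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @members.key?(name) || super
  end
end

node = StructLike.new(stat: 0, cost: 2874)
puts node.cost   # same value as node[:cost]
```

Defining respond_to_missing? alongside method_missing is the usual Ruby convention, so that respond_to? gives truthful answers for the dynamically handled names.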

Instance Attribute Details

#feature ⇒ String

Returns corresponding feature value.

Returns:

  • (String)

    corresponding feature value.



# File 'lib/natto/struct.rb', line 203

def feature
  @feature
end

#pointer ⇒ FFI::Pointer (readonly)

Returns pointer to MeCab node struct.

Returns:

  • (FFI::Pointer)

    pointer to MeCab node struct.



# File 'lib/natto/struct.rb', line 205

def pointer
  @pointer
end

#surface ⇒ String

Returns the surface value of the morpheme.

Returns:

  • (String)

    surface value of the morpheme.



# File 'lib/natto/struct.rb', line 201

def surface
  @surface
end

Instance Method Details

#inspect ⇒ String

Overrides Object#inspect.

Returns:

  • (String)

    encoded object id, underlying FFI pointer, stat, surface, and feature



# File 'lib/natto/struct.rb', line 273

def inspect
  self.to_s
end

#is_bos? ⇒ Boolean

Returns true if this is a virtual MeCab node representing the beginning of the sentence.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 291

def is_bos?
  self.stat == BOS_NODE
end

#is_eon? ⇒ Boolean

Returns true if this is a virtual MeCab node representing the end of the node list.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 303

def is_eon?
  self.stat == EON_NODE
end

#is_eos? ⇒ Boolean

Returns true if this is a virtual MeCab node representing the end of the sentence.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 297

def is_eos?
  self.stat == EOS_NODE 
end

#is_nor? ⇒ Boolean

Returns true if this is a normal MeCab node found in the dictionary.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 279

def is_nor?
  self.stat == NOR_NODE
end

#is_unk? ⇒ Boolean

Returns true if this is an unknown MeCab node not found in the dictionary.

Returns:

  • (Boolean)


# File 'lib/natto/struct.rb', line 285

def is_unk?
  self.stat == UNK_NODE
end
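
The status predicates above lend themselves to sorting nodes into dictionary words and unknown words. The following sketch assumes nothing about a MeCab installation: `StubNode` is a hypothetical stand-in that models only surface and stat, and the sample nodes are invented for illustration.

```ruby
# Hypothetical stand-in for MeCabNode; only surface and stat are modelled.
StubNode = Struct.new(:surface, :stat) do
  def is_nor?; stat == 0; end
  def is_unk?; stat == 1; end
  def is_bos?; stat == 2; end
  def is_eos?; stat == 3; end
end

# Invented sample data: a BOS node, one known word, one unknown word, an EOS node.
nodes = [
  StubNode.new('', 2),
  StubNode.new('卓球', 0),
  StubNode.new('xyzzy', 1),
  StubNode.new('', 3)
]

# Split off the normal nodes, then pick the unknown ones out of the remainder.
known, rest = nodes.partition(&:is_nor?)
unknown = rest.select(&:is_unk?)

puts known.map(&:surface).inspect
puts unknown.map(&:surface).inspect
```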

#to_s ⇒ String

Returns human-readable details for the MeCab node. Overrides Object#to_s.

  • encoded object id
  • underlying FFI pointer to MeCab Node
  • stat (node type: NOR, UNK, BOS/EOS, EON)
  • surface
  • feature

Returns:

  • (String)

    encoded object id, underlying FFI pointer, stat, surface, and feature



# File 'lib/natto/struct.rb', line 262

def to_s
   [ super.chop,
     "@pointer=#{@pointer},",
     "stat=#{self[:stat]},", 
     "@surface=\"#{self.surface}\",",
     "@feature=\"#{self.feature}\">" ].join(' ')
end
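
The super.chop idiom used in to_s can be tried in isolation. In the sketch below, `Tagged` is a hypothetical class that inherits directly from Object rather than from MeCabStruct, so only the string-building technique carries over, not the exact output of natto.

```ruby
# Hypothetical class illustrating the super.chop pattern; not natto code.
class Tagged
  def initialize(surface, feature)
    @surface = surface
    @feature = feature
  end

  def to_s
    # Object#to_s yields "#<Tagged:0x...>"; chop drops the final ">"
    # so extra fields can be appended before the bracket is closed again.
    [ super.chop,
      "@surface=\"#{@surface}\",",
      "@feature=\"#{@feature}\">" ].join(' ')
  end

  # Reuse the same text for inspect, as MeCabNode#inspect does.
  def inspect
    to_s
  end
end

t = Tagged.new('卓球', '名詞')
puts t
```

Delegating inspect to to_s keeps the output seen in irb and in log messages identical, which is presumably why MeCabNode does the same.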