chawan
A cup for chasen that provides an easy-to-use interface for extracting words from Japanese text
Methods
* Chawan.parse(text)
parses the given text with the current analyzer; the default analyzer is :mecab
* Chawan.analyzer(xxx) (same as Chawan[xxx], Chawan.xxx)
selects the analyzer to use
Class
* Chawan::Node (Chawan.parse returns an array of Chawan::Node)
#category : part of speech
#word : text
#attributes : hash of attribute keys and values
Example
Chawan.parse('本日は晴天なり')
=> [<名詞: '本日'>, <助詞: 'は'>, <名詞: '晴天'>, <助動詞: 'なり'>]
Chawan.parse('本日は晴天なり').select{|node| node.category == '名詞'}.join
=> "本日晴天"
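The filtering shown above can be tried without an analyzer installed. The following is a minimal sketch, assuming a stand-in node type: SketchNode, SAMPLE_NODES, noun_text and category_counts are hypothetical names for illustration and are not part of Chawan. The nodes are built by hand to mirror the parse result above.

```ruby
# Hypothetical stand-in for Chawan::Node; the real class may differ.
SketchNode = Struct.new(:category, :word, :attributes)

# Hand-built nodes mirroring the parse result shown above (no analyzer is run).
SAMPLE_NODES = [
  SketchNode.new('名詞', '本日', {}),
  SketchNode.new('助詞', 'は', {}),
  SketchNode.new('名詞', '晴天', {}),
  SketchNode.new('助動詞', 'なり', {}),
].freeze

# Keep only the nouns and concatenate their text.
def noun_text(nodes)
  nodes.select { |node| node.category == '名詞' }.map(&:word).join
end

# Count how many nodes fall under each part of speech.
def category_counts(nodes)
  nodes.group_by(&:category).transform_values(&:size)
end
```

With real nodes, the same helpers would presumably take the array returned by Chawan.parse, since only #category and #word are used.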
Analyzer
A parser engine is called an 'analyzer'.
Available analyzers are:
* mecab : (default)
* chasen
Chawan[:mecab].parse('test')
=> [<名詞: 'test'>]
# same as
# Chawan.mecab.parse('test')
# Chawan.analyzer(:mecab).parse('test')
# Chawan.parse('test') # default analyzer is :mecab
Chawan[:chasen].parse('test')
=> [<記号: 't'>, <記号: 'e'>, <記号: 's'>, <記号: 't'>]
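The three equivalent ways of selecting an analyzer can be pictured with a small dispatch module. This is only a sketch of one possible design, not the actual Chawan source: CupSketch, its Analyzer class, and the :whole and :chars engines are hypothetical stand-ins that split text trivially instead of calling mecab or chasen.

```ruby
# Hypothetical sketch of analyzer selection; not the actual Chawan source.
module CupSketch
  # Stand-in engines: plain lambdas instead of real morphological analyzers.
  ANALYZERS = {
    whole: ->(text) { [text] },     # returns the text as a single token
    chars: ->(text) { text.chars }, # returns one token per character
  }.freeze

  DEFAULT = :whole

  # Wraps an engine so that every analyzer answers #parse.
  class Analyzer
    attr_reader :name

    def initialize(name, engine)
      @name = name
      @engine = engine
    end

    def parse(text)
      @engine.call(text)
    end
  end

  class << self
    # CupSketch.analyzer(:chars)
    def analyzer(name)
      key = name.to_sym
      engine = ANALYZERS.fetch(key) { raise ArgumentError, "unknown analyzer: #{name}" }
      Analyzer.new(key, engine)
    end

    # CupSketch[:chars]
    alias_method :[], :analyzer

    # CupSketch.parse(text) uses the default analyzer.
    def parse(text)
      analyzer(DEFAULT).parse(text)
    end

    # CupSketch.chars
    def method_missing(name, *args)
      return analyzer(name) if args.empty? && ANALYZERS.key?(name)
      super
    end

    def respond_to_missing?(name, include_private = false)
      ANALYZERS.key?(name) || super
    end
  end
end
```

Routing every form through a single analyzer method keeps the lookup and the error handling in one place; the bracket and method-style calls are then thin aliases.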
Required
* UTF-8
* 'mecab' unix command (and its path)
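Both requirements can be checked up front. The sketch below is not part of Chawan; find_executable_on_path and utf8_text? are hypothetical helper names, and it assumes 'mecab' is meant to be found through the PATH environment variable.

```ruby
# Returns the full path of the first matching executable on PATH, or nil.
def find_executable_on_path(command, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, command)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# True when the string is tagged as UTF-8 and its bytes are valid UTF-8.
def utf8_text?(text)
  text.encoding == Encoding::UTF_8 && text.valid_encoding?
end
```

For example, find_executable_on_path('mecab') returning nil would suggest that the command is not installed or not on the PATH.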
Todo
* gateway interface to Chawan#parse such as grep, noun, ...
* use Open3 rather than backquotes for executing Unix commands
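As an illustration of the second item, here is a minimal sketch of running an external command through Ruby's standard Open3 library instead of backquotes. It is not Chawan's current code; run_command is a hypothetical helper name. Passing the command and its arguments separately avoids shell interpolation of the input text, and stderr and the exit status are captured rather than lost.

```ruby
require 'open3'

# Runs a command with the given arguments, feeding `input` on stdin.
# Returns stdout; raises if the command exits with a non-zero status.
def run_command(command, *args, input: '')
  stdout, stderr, status = Open3.capture3(command, *args, stdin_data: input)
  raise "#{command} failed: #{stderr.strip}" unless status.success?
  stdout
end
```

An analyzer could then be invoked as run_command('mecab', input: text), assuming the command reads the text to analyze from stdin.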
Author
maiha@wota.jp