Rudachi-rb

A Ruby wrapper for Sudachi, a Japanese morphological analyzer.
(A version of rudachi for Ruby; rudachi itself targets JRuby.)

Text

Rudachi::TextParser.parse('東京都へ行く')
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

File

File.open('input.txt', 'w') { |f| f << '東京都へ行く' }
Rudachi::FileParser.parse('input.txt')
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

IO

require 'stringio'

Rudachi::StreamParser.parse(StringIO.new('東京都へ行く'))
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"
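The parsers above return a single tab-separated string: one line per token (surface form, part-of-speech tags, normalized form), closed by an EOS line. A minimal sketch of turning that string into structured tokens follows. Token and parse_sudachi_output are hypothetical names used for illustration, not part of Rudachi, and the sample string is copied from the output shown above.

```ruby
# Hypothetical value object for one analyzed token (not part of Rudachi).
Token = Struct.new(:surface, :pos, :normalized_form)

# Parse Sudachi-style output: one "surface\tpos\tnormalized" line per token,
# with an "EOS" line closing each sentence.
def parse_sudachi_output(output)
  output.each_line(chomp: true)
        .reject { |line| line.empty? || line == 'EOS' }
        .map do |line|
          surface, pos, normalized = line.split("\t", 3)
          # The part-of-speech column is itself a comma-separated list.
          Token.new(surface, pos.to_s.split(','), normalized)
        end
end

# Sample copied from the output shown above.
sample = "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\n" \
         "へ\t助詞,格助詞,*,*,*,*\tへ\n" \
         "行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

parse_sudachi_output(sample).each do |token|
  puts "#{token.surface} (#{token.pos.first})"
end
```

With the sample above this prints one line per token, for example 東京都 (名詞) for the first one.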

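With some options

Options mirror Sudachi's command-line flags. In this example, o names the output file and m selects the split mode.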

Rudachi::TextParser.new(o: 'output.txt', m: 'A').parse('東京都へ行く')
File.read('output.txt')
=> "東京\t名詞,固有名詞,地名,一般,*,*\t東京\n都\t名詞,普通名詞,一般,*,*,*\t都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS"
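To see what the m option changes, it can help to compare only the surface forms of the two results. The sketch below does that with plain string handling; surfaces is a hypothetical helper, not part of Rudachi, and both strings are copied from the outputs shown in this README.

```ruby
# Hypothetical helper (not part of Rudachi): list the surface forms
# found in a Sudachi-style output string, skipping the EOS marker.
def surfaces(output)
  output.each_line(chomp: true)
        .reject { |line| line.empty? || line == 'EOS' }
        .map { |line| line.split("\t").first }
end

# Output without options, copied from the examples above.
default_output = "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\n" \
                 "へ\t助詞,格助詞,*,*,*,*\tへ\n" \
                 "行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

# Output with m: 'A', copied from the example above.
mode_a_output = "東京\t名詞,固有名詞,地名,一般,*,*\t東京\n" \
                "都\t名詞,普通名詞,一般,*,*,*\t都\n" \
                "へ\t助詞,格助詞,*,*,*,*\tへ\n" \
                "行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS"

p surfaces(default_output) # => ["東京都", "へ", "行く"]
p surfaces(mode_a_output)  # => ["東京", "都", "へ", "行く"]
```

Mode A splits 東京都 into 東京 and 都, while the call without options keeps it as one token.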

Requirements

For JRuby, use rudachi instead.

Installation

  1. Install the Sudachi JAR and dictionary
Install the Sudachi JAR file
$ wget https://github.com/WorksApplications/Sudachi/releases/download/v0.5.3/sudachi-0.5.3-executable.zip
$ unzip sudachi-0.5.3-executable.zip
$ ls sudachi-0.5.3
LICENSE-2.0.txt  README.md  javax.json-1.1.jar  jdartsclone-1.2.0.jar  licenses  sudachi-0.5.3.jar  sudachi.json  sudachi_fulldict.json
Install the Sudachi dictionary
$ wget http://sudachi.s3-website-ap-northeast-1.amazonaws.com/sudachidict/sudachi-dictionary-latest-full.zip
$ unzip -j -d sudachi-dictionary-latest-full sudachi-dictionary-latest-full.zip
$ mv sudachi-dictionary-latest-full/system_full.dic sudachi-dictionary-latest-full/system_core.dic
$ ls sudachi-dictionary-latest-full
LEGAL  LICENSE-2.0.txt  system_core.dic
  2. Install Rudachi
# Gemfile
gem 'rudachi-rb'

Then run bundle install.

  3. Initialize Rudachi
require 'rudachi/rb'

Rudachi.configure do |config|
  config.jar_path = 'sudachi-0.5.3/sudachi-0.5.3.jar'
end

Rudachi::Option.configure do |config|
  config.p = 'sudachi-dictionary-latest-full'
end
  4. Done! Try it out
Rudachi::TextParser.parse('こんにちは世界')
=> "こんにちは\t感動詞,一般,*,*,*,*\t今日は\n世界\t名詞,普通名詞,一般,*,*,*\t世界\nEOS\n"
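If parsing fails after these steps, a wrong path is a likely cause. The sketch below checks that the files produced by the installation steps exist before Rudachi is configured. missing_sudachi_files is a hypothetical helper, not part of Rudachi, and the dictionary file name system_core.dic is an assumption taken from the directory listing in step 1.

```ruby
# Hypothetical helper (not part of Rudachi): return the expected files
# that are missing. The dictionary name defaults to system_core.dic,
# as in the directory listing from step 1 (an assumption).
def missing_sudachi_files(jar_path, dict_dir, dict_name = 'system_core.dic')
  [jar_path, File.join(dict_dir, dict_name)].reject { |path| File.file?(path) }
end

# Paths as used in the installation steps above.
missing = missing_sudachi_files(
  'sudachi-0.5.3/sudachi-0.5.3.jar',
  'sudachi-dictionary-latest-full'
)
warn "Missing Sudachi files: #{missing.join(', ')}" unless missing.empty?
```

Running this from the directory where the archives were unpacked prints nothing when both files are in place.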