# KvgCharacterRecognition
The KvgCharacterRecognition module contains a CJK character recognition engine that uses pattern (template) matching to recognize handwritten character patterns independently of stroke order and stroke number. An input pattern has the format [stroke1, stroke2, ...], where each stroke is an array of points in the format [[x1, y1], [x2, y2], ...]. The templates are built from the SVG data of the KanjiVG project.
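To make the input format concrete, here is a minimal sketch of a two-stroke pattern written as nested arrays. The coordinates are invented for illustration, and `valid_point?` and `valid_pattern?` are hypothetical helpers that are not part of the gem; they only check the nesting described above.

```ruby
# Illustrative data only: a rough cross shape drawn with two strokes.
# A stroke is an array of [x, y] points; a pattern is an array of strokes.
horizontal = [[50, 150], [100, 150], [150, 150], [200, 150], [250, 150]]
vertical   = [[150, 50], [150, 100], [150, 150], [150, 200], [150, 250]]
pattern    = [horizontal, vertical]

# Hypothetical helper (not part of the gem): is this a numeric [x, y] pair?
def valid_point?(point)
  point.is_a?(Array) && point.size == 2 && point.all? { |c| c.is_a?(Numeric) }
end

# Hypothetical helper (not part of the gem): checks the
# [stroke1, stroke2, ...] / [[x1, y1], [x2, y2], ...] nesting.
def valid_pattern?(pattern)
  return false unless pattern.is_a?(Array) && !pattern.empty?

  pattern.all? do |stroke|
    stroke.is_a?(Array) && !stroke.empty? && stroke.all? { |point| valid_point?(point) }
  end
end

puts valid_pattern?(pattern)
```

Because recognition is described as free of stroke order and stroke number, swapping `horizontal` and `vertical` in `pattern` should describe the same character.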
The engine recognizes an input pattern in three steps:
- Preprocessing: The data points are smoothed, normalized, interpolated and downsampled.
- Feature Extraction: A smoothed heatmap, significant points and directional feature densities are used as features. A heatmap divides the input pattern into a grid of small cells and stores the number of data points in each cell. Significant points are the start and end points of a stroke and the points on a curve or edge. Directional feature densities are introduced in the paper "On-line Recognition of Freely Handwritten Japanese Character Using Directional Feature Density".
- Matching: The significant points are used for a coarse recognition step that filters out template patterns whose distance to the input pattern is large. Next, a mixed distance score of directional feature density and smoothed heatmap is calculated.

## Installation
Add this line to your application's Gemfile:
gem 'kvg_character_recognition'
And then execute:
$ bundle
Or install it yourself as:
$ gem install kvg_character_recognition
## Usage
1. Create a database (e.g. using sqlite3 data.db)
2. Set up the characters table in the database and populate it with KanjiVG templates from the XML release
require 'kvg_character_recognition'
KvgCharacterRecognition::Database.setup
KvgCharacterRecognition::Database.populate_from_xml "kanjivg-20150615-2.xml"
3. Recognition
Use an input field of size 300x300 for the best recognition accuracy. The input pattern in the example is the character
## Configuration
You can try out different parameters to adapt the extracted features to your input settings, e.g. a different sample rate or size. Don't forget to redo the whole database step after changing the configuration.
# this is the default configuration
config = {
  size: 109, # fixed canvas size of kanjivg data
  downsample_interval: 4,
  interpolate_distance: 0.8,
  direction_grid: 15,
  smoothed_heatmap_grid: 20,
  significant_points_heatmap_grid: 3
}
# from hash
KvgCharacterRecognition.configure(config)
# from yaml file
KvgCharacterRecognition.configure_with(path_to_yml)
# configure database with yml
# TODO why is postgres slower than sqlite?
KvgCharacterRecognition.configure_database(path_to_yml)
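To give a feel for what parameters such as interpolate_distance, downsample_interval and the heatmap grid sizes could control, here is a rough sketch of interpolation, downsampling and an unsmoothed heatmap. It is not the gem's implementation: the helper names `interpolate`, `downsample` and `heatmap` are hypothetical, the points are assumed to be already normalized to the `size` canvas, and the smoothing step is left out.

```ruby
# Illustrative only: these helpers are NOT the gem's implementation.

# Add points along each segment so that neighbouring points are
# at most `distance` apart (linear interpolation).
def interpolate(stroke, distance)
  return [] if stroke.empty?

  result = [stroke.first]
  stroke.each_cons(2) do |(x1, y1), (x2, y2)|
    length = Math.hypot(x2 - x1, y2 - y1)
    steps = [(length / distance).ceil, 1].max
    (1..steps).each do |i|
      t = i.to_f / steps
      result << [x1 + (x2 - x1) * t, y1 + (y2 - y1) * t]
    end
  end
  result
end

# Keep every `interval`-th point and always keep the final point.
def downsample(points, interval)
  return [] if points.empty?

  kept = points.each_slice(interval).map(&:first)
  kept << points.last unless (points.size - 1) % interval == 0
  kept
end

# Count the points that fall into each cell of a grid x grid raster
# laid over a size x size canvas (rows are y, columns are x).
def heatmap(points, size, grid)
  cell = size.to_f / grid
  counts = Array.new(grid) { Array.new(grid, 0) }
  points.each do |x, y|
    col = (x / cell).floor.clamp(0, grid - 1)
    row = (y / cell).floor.clamp(0, grid - 1)
    counts[row][col] += 1
  end
  counts
end

# Values taken from the default configuration above.
stroke = [[0, 0], [108, 0]]
dense  = interpolate(stroke, 0.8)
sparse = downsample(dense, 4)
map    = heatmap(sparse, 109, 20)
puts "#{dense.size} points after interpolation, #{sparse.size} after downsampling"
puts "points counted in heatmap: #{map.flatten.sum}"
```

In this sketch a smaller interpolate_distance or downsample_interval leaves more points per stroke, and a larger grid gives a finer but sparser heatmap. Whichever values are used, the stored templates and the input need to be processed with the same settings, which is why the database step has to be redone after a change.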
## Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/kvg_character_recognition.
## License
The gem is available as open source under the terms of the MIT License.