Class: Guesslanguage

Inherits: Object
Defined in: lib/deplate/guesslanguage.rb

Overview

This class is ported from, and based on, the following Python recipe:

  • Title: Guess language of text using ZIP

  • Submitter: Dirk Holtwick

  • Last Updated: 2004/12/07

  • Version no: 1.2

  • Category: Algorithms

  • aspn.activestate.com/ASPN/Cookbook/Python/Recipe/355807

  • www.heise.de/newsticker/data/wst-28.01.02-003/

  • xxx.uni-augsburg.de/format/cond-mat/0108530


Constructor Details

#initialize ⇒ Guesslanguage



# File 'lib/deplate/guesslanguage.rb', line 20

def initialize
    @data = []
end

Instance Method Details

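#guess(part) ⇒ Object

Returns the <name> of the registered corpus that best matches <part>. This is a convenience wrapper around #guess_with_diff that discards the score.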



# File 'lib/deplate/guesslanguage.rb', line 52

def guess(part)
    diff, lang = guess_with_diff(part)
    lang
end

#guess_with_diff(part) ⇒ Object

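Compares the text <part> with every registered corpus and returns a two-element array [diff, name]. diff is the number of extra compressed bytes that <part> adds to the best-matching corpus, divided by the length of <part>; name is whatever was passed as <name> when that corpus was registered.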



# File 'lib/deplate/guesslanguage.rb', line 39

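def guess_with_diff(part)
    what = nil
    diff = nil
    for name, corpus, ziplen in @data
        # Extra compressed bytes needed to encode <part> after <corpus>.
        nz = zip(corpus + part).size - ziplen
        # Keep the corpus with the smallest overhead seen so far.
        if diff.nil? or nz < diff
            what = name
            diff = nz
        end
    end
    # Normalize the overhead by the length of <part>.
    return [diff.to_f/part.size, what]
end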
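
The measure computed inside the loop can be tried in isolation. The following is a minimal sketch, not part of Guesslanguage: the helper name overhead_per_char and both sample corpora are invented for illustration, and the one-shot Zlib::Deflate.deflate call stands in for the class's own zip method.

```ruby
require 'zlib'

# Hypothetical helper, not part of Guesslanguage: the number of extra
# compressed bytes that <part> costs when appended to <corpus>,
# divided by the length of <part>.
def overhead_per_char(corpus, part)
  base     = Zlib::Deflate.deflate(corpus).size
  combined = Zlib::Deflate.deflate(corpus + part).size
  (combined - base).to_f / part.size
end

# Invented toy corpora; real ones should be much longer.
english = 'the quick brown fox jumps over the lazy dog near the river bank '
german  = 'der schnelle braune fuchs springt ueber den faulen hund am fluss '
sample  = 'the quick brown fox jumps over the lazy dog'

puts overhead_per_char(english, sample)
puts overhead_per_char(german, sample)
```

The first figure should come out smaller than the second, because deflate can encode the sample as a back-reference into the English corpus but has to spell it out after the German one. guess_with_diff picks the corpus with the smallest such overhead.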

#register(name, corpus) ⇒ Object

Registers a text as the corpus for a language or author. <name> is returned unchanged by #guess, so it may be a label, a function, or any other object you need to handle the result.



# File 'lib/deplate/guesslanguage.rb', line 31

def register(name, corpus)
    ziplen = zip(corpus).size
    @data << [name, corpus, ziplen]
end
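
A usage sketch covering both kinds of <name>. It assumes a compact copy of the class inlined from the listings on this page, so that it runs without deplate installed; the copy uses each instead of for but is meant to behave the same. The constants ENGLISH and GERMAN and the sample strings are invented.

```ruby
require 'zlib'

# Compact copy of the class documented on this page, inlined so the
# snippet is self-contained.
class Guesslanguage
  def initialize
    @data = []
  end

  def zip(text)
    Zlib::Deflate.new.deflate(text, Zlib::FINISH)
  end

  def register(name, corpus)
    @data << [name, corpus, zip(corpus).size]
  end

  def guess_with_diff(part)
    what = diff = nil
    @data.each do |name, corpus, ziplen|
      nz = zip(corpus + part).size - ziplen
      what, diff = name, nz if diff.nil? || nz < diff
    end
    [diff.to_f / part.size, what]
  end

  def guess(part)
    guess_with_diff(part).last
  end
end

# Invented toy corpora; real ones should be much longer.
ENGLISH = 'the quick brown fox jumps over the lazy dog near the river bank '
GERMAN  = 'der schnelle braune fuchs springt ueber den faulen hund am fluss '

# Plain labels as names: guess returns the label.
by_label = Guesslanguage.new
by_label.register(:en, ENGLISH)
by_label.register(:de, GERMAN)
puts by_label.guess('the quick brown fox')

# Callables as names: guess hands back the lambda, the caller invokes it.
by_handler = Guesslanguage.new
by_handler.register(->(text) { "English: #{text}" }, ENGLISH)
by_handler.register(->(text) { "Deutsch: #{text}" }, GERMAN)
sample = 'den faulen hund'
puts by_handler.guess(sample).call(sample)
```

Because register stores <name> as-is, the class never calls it. Dispatching on the result is left to the caller, as in the last line.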

#zip(text) ⇒ Object

Returns <text> compressed with zlib's deflate, as a binary string.



# File 'lib/deplate/guesslanguage.rb', line 24

def zip(text)
    Zlib::Deflate.new.deflate(text, Zlib::FINISH)
end
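
A short sketch of what this method returns, assuming zip is copied to the top level so it can be called without the class. The sample string is invented.

```ruby
require 'zlib'

# Same call as in the listing above: a fresh deflate stream, finished
# in a single step.
def zip(text)
  Zlib::Deflate.new.deflate(text, Zlib::FINISH)
end

repetitive = 'abcabcabc' * 50   # 450 bytes, highly redundant
packed     = zip(repetitive)

# Redundant input shrinks considerably.
puts packed.size < repetitive.size
# Inflating the result restores the original bytes.
puts Zlib::Inflate.inflate(packed) == repetitive
# The one-shot class method yields the same compressed bytes.
puts packed == Zlib::Deflate.deflate(repetitive)
```

Only the size of the compressed string matters to register and guess_with_diff. The compressed bytes themselves are never stored or decompressed.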