Class: RSemantic::Corpus

Inherits:
Object
Defined in:
lib/rsemantic/corpus.rb

Constructor Details

#initialize(documents = [], options = {}) ⇒ Corpus

TODO document options

Parameters:

  • documents (Array<Document>) (defaults to: [])

    The documents to index

  • options (Hash) (defaults to: {})


# File 'lib/rsemantic/corpus.rb', line 10

def initialize(documents = [], options = {})
  @documents = documents
  @options   = options
  @search    = nil
end

Instance Attribute Details

#documents ⇒ Array<Document> (readonly)

Returns:

  • (Array<Document>)

    The documents held by this corpus


# File 'lib/rsemantic/corpus.rb', line 4

def documents
  @documents
end

Instance Method Details

#add_document(document) ⇒ void

Also known as: <<

This method returns an undefined value.

Adds a new document to the index.

Parameters:

  • document (Document)

    The document to add


# File 'lib/rsemantic/corpus.rb', line 20

def add_document(document)
  @documents << document
  document.corpora << self
end
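
The two-way link that add_document maintains can be sketched without the gem. SketchDocument and SketchCorpus below are hypothetical stand-ins, not RSemantic classes; only the method body and the << alias are taken from the listing above.

```ruby
# Hypothetical stand-ins for illustration; not the real RSemantic classes.
class SketchDocument
  attr_reader :text, :corpora

  def initialize(text)
    @text    = text
    @corpora = [] # corpora this document has been added to
  end
end

class SketchCorpus
  attr_reader :documents

  def initialize(documents = [])
    @documents = documents
  end

  # Same body as the listing above: store the document and
  # register this corpus on the document.
  def add_document(document)
    @documents << document
    document.corpora << self
  end
  alias_method :<<, :add_document
end

corpus = SketchCorpus.new
doc    = SketchDocument.new("latent semantic analysis")
corpus << doc

puts corpus.documents.include?(doc) # true
puts doc.corpora.include?(corpus)   # true
```

Because the method is documented as returning an undefined value, chaining such as corpus << a << b should not be relied on. Note also that the constructor listing only assigns @documents, so documents passed to it are not registered through add_document.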

#build_index ⇒ void

This method returns an undefined value.

Builds the index, which must exist before you can search for words or find related documents.

The index is not updated when documents are added, so call this method again after adding new documents.



# File 'lib/rsemantic/corpus.rb', line 32

def build_index
  @search = RSemantic::Search.new(@documents.map(&:text), @options)
end
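
Why a rebuild is needed follows from the listing: build_index passes @documents.map(&:text) to the search object, and map produces a new array, so documents added later are not part of that array. A minimal sketch of the effect, with a hypothetical SnapshotIndex standing in for RSemantic::Search (whose internals are not shown here):

```ruby
# Hypothetical stand-in for RSemantic::Search: it only remembers
# the texts it was built from.
class SnapshotIndex
  attr_reader :texts

  def initialize(texts)
    @texts = texts
  end
end

# Hypothetical minimal document: anything that responds to #text.
SnapshotDoc = Struct.new(:text)

class SnapshotCorpus
  attr_reader :index

  def initialize(documents = [])
    @documents = documents
    @index     = nil
  end

  def add_document(document)
    @documents << document
  end

  # Mirrors the shape of build_index above: map creates a new array,
  # so the index holds a snapshot of the texts at build time.
  def build_index
    @index = SnapshotIndex.new(@documents.map(&:text))
  end
end

corpus = SnapshotCorpus.new([SnapshotDoc.new("first text")])
corpus.build_index
corpus.add_document(SnapshotDoc.new("second text"))

stale_size = corpus.index.texts.size # still 1: the new document is not indexed
corpus.build_index
fresh_size = corpus.index.texts.size # 2 after rebuilding

puts "before rebuild: #{stale_size}, after rebuild: #{fresh_size}"
```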

#find_keywords(document, num = 5) ⇒ Object

Not yet implemented: the method body contains only a TODO, so calling it returns nil.



# File 'lib/rsemantic/corpus.rb', line 52

def find_keywords(document, num = 5)
  # TODO allow limiting keywords to words that occur in this document

end


#find_related_document(document) ⇒ Object

# File 'lib/rsemantic/corpus.rb', line 45

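def find_related_document(document)
  # Pair each result with the document at the same position in @documents
  @search.related(@documents.index(document)).map.with_index { |result, index|
    # Use a separate name so the `document` argument is not reassigned
    other = @documents[index]
    RSemantic::SearchResult.new(other, result)
  }.sort
end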

#search(*words) ⇒ Object



# File 'lib/rsemantic/corpus.rb', line 36

def search(*words)
  # TODO raise if no index built yet
  results = @search.search(words)
  results.map.with_index { |result, index|
    document = @documents[index]
    RSemantic::SearchResult.new(document, result)
  }.sort
end
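
Both #search and #find_related_document rely on the same idiom: the search object yields one value per document, in document order, and map.with_index pairs each value with the document at the same position. The sketch below shows that idiom in isolation. ScoredResult is a hypothetical stand-in for RSemantic::SearchResult, and its highest-score-first ordering is an assumption, since the real class's comparison is not shown on this page. The documents and scores are made up.

```ruby
# Hypothetical stand-in for RSemantic::SearchResult.
# Assumption: results compare by score, highest first.
ScoredResult = Struct.new(:document, :score) do
  include Comparable

  def <=>(other)
    other.score <=> score
  end
end

documents = ["cats and dogs", "stock markets", "dog training"]
scores    = [0.8, 0.1, 0.9] # one made-up score per document, in document order

# Pair each score with the document at the same index, then sort.
results = scores.map.with_index { |score, index|
  ScoredResult.new(documents[index], score)
}.sort

results.each { |r| puts format("%.1f  %s", r.score, r.document) }
```

The pairing only works if the scores and @documents stay aligned, which is another reason to rebuild the index after adding documents.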

#to_s ⇒ Object



# File 'lib/rsemantic/corpus.rb', line 57

def to_s
  "#<%s %d documents, @options=%s>" % [self.class.name, @documents.size, @options.inspect]
end
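
The format string can be tried on its own to see what the summary looks like. The values below are made up; an empty options hash is used because Hash#inspect output for non-empty hashes differs between Ruby versions.

```ruby
# Same format string as the listing above, applied to made-up values.
summary = "#<%s %d documents, @options=%s>" % ["RSemantic::Corpus", 2, {}.inspect]
puts summary
```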