Class: Document

Inherits:
Object
Defined in:
lib/picolena/templates/app/models/document.rb

Overview

The Document class retrieves information from the filesystem and from the index for any given document.

Constructor Details

#initialize(path) ⇒ Document

Returns a new instance of Document.



# File 'lib/picolena/templates/app/models/document.rb', line 6

def initialize(path)
  # Expand the path so that @complete_path is always absolute.
  @complete_path=File.expand_path(path)
  validate_existence_of_file
  validate_in_indexed_directory
end
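
As a rough illustration of why the path is expanded first, the sketch below shows how File.expand_path turns a relative path into an absolute one. The directory and file names are made up for the example, and the two validate_* helpers are not shown on this page, so they are not reproduced here.

```ruby
# File.expand_path resolves a relative path against a base directory
# (the current working directory when no base is given).
# Both the path and the base directory below are hypothetical.
def absolute_path_for(path, base_dir = Dir.pwd)
  File.expand_path(path, base_dir)
end

# On a Unix-like system, ".." segments are collapsed as well:
puts absolute_path_for("docs/../docs/plan.odt", "/srv/files") # "/srv/files/docs/plan.odt"

# An already absolute path comes back unchanged.
puts absolute_path_for("/tmp/a.txt")                          # "/tmp/a.txt"
```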

Instance Attribute Details

#complete_path ⇒ Object (readonly)

Returns the value of attribute complete_path.



# File 'lib/picolena/templates/app/models/document.rb', line 3

def complete_path
  @complete_path
end

#matching_content ⇒ Object

Returns the value of attribute matching_content.



# File 'lib/picolena/templates/app/models/document.rb', line 4

def matching_content
  @matching_content
end

#score ⇒ Object

Returns the value of attribute score.



# File 'lib/picolena/templates/app/models/document.rb', line 4

def score
  @score
end

Class Method Details

.default_fields_for(complete_path) ⇒ Object

Fields that are shared between every document.



# File 'lib/picolena/templates/app/models/document.rb', line 95

def self.default_fields_for(complete_path)
  {
    :complete_path      => complete_path,
    :probably_unique_id => complete_path.base26_hash,
    :filename           => File.basename(complete_path),
    :basename           => File.basename(complete_path, File.extname(complete_path)).gsub(/_/,' '),
    :filetype           => File.extname(complete_path),
    :modified           => File.mtime(complete_path).strftime("%Y%m%d%H%M%S")
  }
end
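
To make the File-based fields concrete, here is a minimal sketch that computes them for a temporary file. The helper name file_fields_for is hypothetical, and :probably_unique_id is left out because base26_hash is a Picolena extension to String that is not shown on this page.

```ruby
require "tempfile"

# Hypothetical helper mirroring the File-based fields listed above.
def file_fields_for(complete_path)
  {
    :filename => File.basename(complete_path),
    :basename => File.basename(complete_path, File.extname(complete_path)).gsub(/_/, ' '),
    :filetype => File.extname(complete_path),
    :modified => File.mtime(complete_path).strftime("%Y%m%d%H%M%S")
  }
end

# Create a throwaway file whose name starts with "annual_report" and ends in ".pdf".
Tempfile.create(["annual_report", ".pdf"]) do |file|
  fields = file_fields_for(file.path)
  puts fields[:filetype]  # the extension, including the leading dot
  puts fields[:basename]  # extension removed, underscores replaced with spaces
  puts fields[:modified]  # 14 digits: year, month, day, hour, minute, second
end
```

Storing the modification time as a fixed-width digit string keeps it sortable as plain text, which is presumably why this format was chosen for the index.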

Instance Method Details

#alias_path ⇒ Object

End users do not necessarily need to know where documents are stored internally. An alias path can be specified in config/indexed_directories.yml.

For example, with:

"/media/wiki_dump/" : "http://www.mycompany.com/wiki/"

The document

"/media/wiki_dump/organigram.odp"

will be displayed as:

"http://www.mycompany.com/wiki/organigram.odp"


# File 'lib/picolena/templates/app/models/document.rb', line 35

def alias_path
  original_dir=indexed_directory
  alias_dir=Picolena::IndexedDirectories[original_dir]
  dirname.sub(original_dir,alias_dir)
end
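
The substitution described above can be sketched with a plain Hash standing in for Picolena::IndexedDirectories. The constant INDEXED_DIRECTORIES and the helper aliased are hypothetical names used only for this illustration. The method shown above applies the substitution to dirname, whereas this sketch applies it to a full path so that it matches the prose example.

```ruby
# Hypothetical stand-in for Picolena::IndexedDirectories: a Hash that maps
# each indexed directory to its public alias.
INDEXED_DIRECTORIES = {
  "/media/wiki_dump/" => "http://www.mycompany.com/wiki/"
}

# Replace the indexed directory prefix of +path+ with its alias.
# Returns the path unchanged when no indexed directory matches.
def aliased(path)
  original_dir, alias_dir = INDEXED_DIRECTORIES.find { |dir, _| path.start_with?(dir) }
  return path unless original_dir
  path.sub(original_dir, alias_dir)
end

puts aliased("/media/wiki_dump/organigram.odp")
# "http://www.mycompany.com/wiki/organigram.odp"
```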

#basename ⇒ Object

Returns filename without extension

"buildings.odt" => "buildings"


# File 'lib/picolena/templates/app/models/document.rb', line 21

def basename
  filename.chomp(extname)
end
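
A small standalone sketch of the same idea, using File.extname to find the extension and String#chomp to remove it. The helper name strip_extension is hypothetical.

```ruby
# Remove the extension (if any) from the end of a filename.
def strip_extension(filename)
  filename.chomp(File.extname(filename))
end

puts strip_extension("buildings.odt")   # "buildings"
puts strip_extension("archive.tar.gz")  # "archive.tar" (only the last extension is removed)
puts strip_extension("README")          # "README" (no extension, nothing to remove)
```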

#cached ⇒ Object

Cache à la Google. Returns content as it was at the time it was indexed.



# File 'lib/picolena/templates/app/models/document.rb', line 63

def cached
  from_index[:content]
end

#content ⇒ Object

Retrieves content as it is now.



# File 'lib/picolena/templates/app/models/document.rb', line 57

def content
  PlainTextExtractor.extract_content_from(complete_path)
end

#filename ⇒ Object



# File 'lib/picolena/templates/app/models/document.rb', line 17

alias_method :filename, :basename
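
Ruby's alias_method copies the method definition that exists at the moment it is called, so an alias keeps its behaviour even if the original name is redefined later. That may explain how filename can be aliased to basename at line 17 while #basename (line 21) calls filename. The sketch below assumes an earlier basename built on File.basename. That earlier definition is not shown on this page, and the class name SketchDocument is hypothetical.

```ruby
class SketchDocument
  def initialize(path)
    @path = path
  end

  # Assumed earlier definition (not shown on this page).
  def basename
    File.basename(@path)
  end

  # filename now refers to the definition above.
  alias_method :filename, :basename

  # Redefining basename afterwards does not affect filename.
  def basename
    filename.chomp(File.extname(@path))
  end
end

doc = SketchDocument.new("/tmp/buildings.odt")
puts doc.filename  # "buildings.odt"
puts doc.basename  # "buildings"
```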

#highlighted_cache(raw_query) ⇒ Object



# File 'lib/picolena/templates/app/models/document.rb', line 67

def highlighted_cache(raw_query)
  #TODO: Report to Ferret. Highlight should accept :key and not only :doc_id.
  Indexer.index.highlight(Query.extract_from(raw_query), doc_id,
                          :field => :content, :excerpt_length => :all,
                          :pre_tag => "<<", :post_tag => ">>"
  ).first
end

#language ⇒ Object

Returns language.



# File 'lib/picolena/templates/app/models/document.rb', line 90

def language
  from_index[:language]
end

#mtime ⇒ Object



# File 'lib/picolena/templates/app/models/document.rb', line 85

def mtime
  from_index[:modified].to_i
end

#pretty_date ⇒ Object

Returns the document's last modification date as recorded when it was indexed. Useful for knowing how old a document is and which version the cache corresponds to.



# File 'lib/picolena/templates/app/models/document.rb', line 77

def pretty_date
  from_index[:modified].sub(/(\d{4})(\d{2})(\d{2})\d{6}/,'\1-\2-\3')
end

#pretty_mtime ⇒ Object



# File 'lib/picolena/templates/app/models/document.rb', line 81

def pretty_mtime
  from_index[:modified].sub(/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/,'\1-\2-\3 \4:\5:\6')
end
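
The three date-related methods (mtime, pretty_date and pretty_mtime) all work on the 14-digit "%Y%m%d%H%M%S" string stored in the :modified field. The sketch below applies the same expressions to a sample string. The sample timestamp and the standalone helper names format_date and format_mtime are made up for the illustration.

```ruby
# Keep only the date part: "YYYYMMDDhhmmss" -> "YYYY-MM-DD".
def format_date(modified)
  modified.sub(/(\d{4})(\d{2})(\d{2})\d{6}/, '\1-\2-\3')
end

# Keep date and time: "YYYYMMDDhhmmss" -> "YYYY-MM-DD hh:mm:ss".
def format_mtime(modified)
  modified.sub(/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/, '\1-\2-\3 \4:\5:\6')
end

modified = "20080417093005"    # hypothetical value of from_index[:modified]
puts format_date(modified)     # "2008-04-17"
puts format_mtime(modified)    # "2008-04-17 09:30:05"
puts modified.to_i             # 20080417093005
```

Note that to_i yields the digits as an integer, not a Unix epoch timestamp. The value is still suitable for comparing or sorting documents by age.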

#probably_unique_id ⇒ Object

Returns an id for this document. This id is used in controllers to build tiny URLs. Since it is a base26 hash of the absolute filename, it can only be “probably unique”. For a very large number of indexed documents, it would be wise to increase HashLength in config/custom/picolena.rb.



# File 'lib/picolena/templates/app/models/document.rb', line 45

def probably_unique_id
  @probably_unique_id||=complete_path.base26_hash
end
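
The implementation of base26_hash is not shown on this page. Purely as an illustration of what a fixed-length base26 hash could look like, the sketch below digests the path with MD5 and writes the result using the letters a to z. The algorithm, the constant HASH_LENGTH and its value are all assumptions, not Picolena's actual code.

```ruby
require "digest/md5"

HASH_LENGTH = 10 # hypothetical value; Picolena reads HashLength from its configuration

# Hypothetical base26 hash: digest the string, then write the resulting
# integer in base 26 with the letters a-z, keeping +length+ letters.
def base26_hash(string, length = HASH_LENGTH)
  number  = Digest::MD5.hexdigest(string).to_i(16)
  letters = String.new
  length.times do
    number, remainder = number.divmod(26)
    letters << ("a".ord + remainder).chr
  end
  letters
end

id = base26_hash("/media/wiki_dump/organigram.odp")
puts id         # ten lowercase letters
puts id.length  # 10
```

With ten letters there are 26^10 (about 1.4 × 10^14) possible ids, so collisions are unlikely but not impossible. A longer hash lowers the odds further, at the cost of longer URLs.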

#supported? ⇒ Boolean

Returns true if and only if a PlainTextExtractor has been defined that can convert the document to plain text.

Document.new("presentation.pdf").supported? => true
Document.new("presentation.some_weird_extension").supported? => false

Returns:

  • (Boolean)


# File 'lib/picolena/templates/app/models/document.rb', line 52

def supported?
  PlainTextExtractor.supported_extensions.include?(self.ext_as_sym)
end
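
A self-contained sketch of the same check follows. The extension list is hypothetical (in Picolena it comes from PlainTextExtractor.supported_extensions), and ext_as_sym is assumed to return the extension without its dot as a Symbol, since its definition is not shown on this page.

```ruby
# Hypothetical list of supported extensions.
SUPPORTED_EXTENSIONS = [:pdf, :odt, :txt]

# Assumed behaviour of ext_as_sym: the extension without its dot, as a Symbol.
def ext_as_sym(path)
  File.extname(path).delete(".").downcase.to_sym
end

# True when the file's extension is in the supported list.
def supported?(path)
  SUPPORTED_EXTENSIONS.include?(ext_as_sym(path))
end

puts supported?("presentation.pdf")                   # true
puts supported?("presentation.some_weird_extension")  # false
```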