Class: Document

Overview

Representation of a document in the Solr database

This class provides an ActiveRecord-like model object for documents hosted in the RLetters Solr backend. Class-level methods handle single-document retrieval and document searching; instance-level methods and attributes expose the data returned by Solr.

Methods included from Serializers::OpenURL

#to_openurl_params

Methods included from Serializers::RIS

included, #to_ris

Methods included from Serializers::RDF

included, #to_rdf, #to_rdf_n3, #to_rdf_xml, #to_rdf_xml_node

Methods included from Serializers::MODS

included, #to_mods

Methods included from Serializers::MARC

#author_to_marc, included, #to_marc, #to_marc21, #to_marc_json, #to_marc_xml

Methods included from Serializers::EndNote

included, #to_endnote

Methods included from Serializers::CSL

#to_csl, #to_csl_entry

Methods included from Serializers::BibTex

included, #to_bibtex

Constructor Details

#initialize(attributes = {}) ⇒ Document

Set all attributes and create author lists

This constructor copies in all attributes, as well as splitting the authors value into author_list and formatted_author_list.

Parameters:

  • attributes (Hash) (defaults to: {})

    attributes for this document



# File 'app/models/document.rb', line 225

def initialize(attributes = {})
  super

  # Split out the author list and format it
  self.author_list = authors.split(',').map { |a| a.strip } unless authors.nil?
  unless author_list.nil?
    self.formatted_author_list = author_list.map do |a|
      BibTeX::Names.parse(a)[0]
    end
  end
end
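
The author-splitting step of the constructor can be tried in isolation. The following is a minimal sketch in plain Ruby; `split_authors` is a hypothetical helper name (not part of Document), and the `BibTeX::Names.parse` step is deliberately left out so the example runs without extra gems.

```ruby
# Hypothetical helper mirroring the first half of Document#initialize:
# split a comma-delimited authors string into a list of trimmed names.
def split_authors(authors)
  return nil if authors.nil?

  authors.split(',').map(&:strip)
end

p split_authors('Alice Smith, Bob Jones ,Carol White')
p split_authors(nil)
```

Because the split happens on every comma, a name written as "Smith, Alice" would be broken into two entries, so the authors value is presumably expected in "First Last" order.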

Class Attribute Details

.serializers ⇒ Hash

Registration for all serializers

This variable is a hash of hashes. For its format, see the documentation for register_serializer.

Examples:

See if there is a serializer loaded for JSON

Document.serializers.has_key? :json

Returns:

  • (Hash)

    the serializer registry



# File 'app/models/document.rb', line 122

def serializers
  @serializers
end
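
The "hash of hashes" shape of the registry can be illustrated with a stand-in class. This is a sketch under assumptions: `SketchDocument` is a hypothetical name that reproduces only the registry logic, and the lambda uses `JSON.generate` on a plain Hash in place of a real Document serializer.

```ruby
require 'json'

# Hypothetical stand-in for Document; only the serializer registry is modeled.
class SketchDocument
  class << self
    attr_accessor :serializers

    # Store one entry per MIME type key; each entry is itself a hash.
    def register_serializer(key, name, method, docs)
      self.serializers ||= {}
      serializers[key] = { name: name, method: method, docs: docs }
    end
  end
end

SketchDocument.register_serializer(:json, 'JSON',
                                   ->(doc) { JSON.generate(doc) },
                                   'http://www.json.org/')

entry = SketchDocument.serializers[:json]
p SketchDocument.serializers.key?(:json)
p entry[:name]
p entry[:method].call({ 'title' => 'Example' })
```

Each registry entry carries the human-readable name, the callable that produces the serialized String, and a documentation URL, so a caller can look up a format by key and invoke its `:method` on a document.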

Instance Attribute Details

#author_list ⇒ Array<String>

Returns the document's authors, in an array

Returns:

  • (Array<String>)

    the document's authors, in an array



# File 'app/models/document.rb', line 98

class Document
  # Make this class act like an ActiveRecord model, though it's not backed by
  # the database (it's in Solr).
  include ActiveModel::Model

  attr_accessor :shasum, :doi, :license, :license_url, :authors,
                :author_list, :formatted_author_list, :title, :journal,
                :year, :volume, :number, :pages, :fulltext, :term_vectors

  # The shasum attribute is the only required one
  validates :shasum, presence: true
  validates :shasum, length: { is: 40 }
  validates :shasum, format: { with: /\A[a-fA-F\d]+\z/ }

  class << self
    # Registration for all serializers
    #
    # This variable is a hash of hashes.  For its format, see the documentation
    # for register_serializer.
    #
    # @api public
    # @return [Hash] the serializer registry
    # @example See if there is a serializer loaded for JSON
    #   Document.serializers.has_key? :json
    attr_accessor :serializers

    # Register a serializer
    #
    # @api public
    # @return [undefined]
    # @param [Symbol] key the MIME type key for this serializer, as defined
    #   in config/initializers/mime_types.rb
    # @param [String] name the human-readable name of this serializer format
    # @param [Proc] method a method which accepts a Document object as a
    #   parameter and returns the serialized document as a String
    # @param [String] docs a URL pointing to documentation for this method
    # @example Register a serializer for JSON
    #   Document.register_serializer(:json,
    #                                'JSON',
    #                                lambda { |doc| doc.to_json },
    #                                'http://www.json.org/')
    def register_serializer(key, name, method, docs)
      Document.serializers ||= { }
      Document.serializers[key] = { name: name, method: method, docs: docs }
    end
  end

  # Serialization methods
  include Serializers::BibTex
  include Serializers::CSL
  include Serializers::EndNote
  include Serializers::MARC
  include Serializers::MODS
  include Serializers::RDF
  include Serializers::RIS
  include Serializers::OpenURL

  # Return a document (just bibliographic data) by SHA-1 checksum
  #
  # @api public
  # @param [String] shasum SHA-1 checksum of the document to be retrieved
  # @param [Hash] options see +Solr::Connection.search+ for specification
  # @return [Document] the document requested
  # @raise [Solr::ConnectionError] thrown if there is an error querying Solr
  # @raise [ActiveRecord::RecordNotFound] thrown if no matching document can
  #   be found
  # @example Look up the document with ID '1234567890abcdef1234'
  #   doc = Document.find('1234567890abcdef1234')
  def self.find(shasum, options = {})
    result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                     defType: 'lucene' }))
    fail ActiveRecord::RecordNotFound if result.num_hits != 1
    result.documents[0]
  end

  # Return a document (bibliographic data and full text) by SHA-1 checksum
  #
  # @api public
  # @param [String] shasum SHA-1 checksum of the document to be retrieved
  # @param [Hash] options see +Solr::Connection.search+ for specification
  # @return [Document] the document requested, including full text
  # @raise [Solr::ConnectionError] thrown if there is an error querying Solr
  # @raise [ActiveRecord::RecordNotFound] thrown if no matching document can
  #   be found
  # @example Get the full text of the document with ID '1234567890abcdef1234'
  #   text = Document.find_with_fulltext('1234567890abcdef1234').fulltext
  def self.find_with_fulltext(shasum, options = {})
    result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                     defType: 'lucene',
                                                     tv: 'true',
                                                     fl: Solr::Connection::DEFAULT_FIELDS_FULLTEXT }))
    fail ActiveRecord::RecordNotFound if result.num_hits != 1
    result.documents[0]
  end

  # @return [String] the starting page of this document, if it can be parsed
  def start_page
    return '' if pages.blank?
    pages.split('-')[0]
  end

  # @return [String] the ending page of this document, if it can be parsed
  def end_page
    return '' if pages.blank?
    parts = pages.split('-')
    return '' if parts.length <= 1

    spage = parts[0]
    epage = parts[-1]

    # Check for range strings like "1442-7"
    if spage.length > epage.length
      ret = spage
      ret[-epage.length..-1] = epage
    else
      ret = epage
    end
    ret
  end

  # Set all attributes and create author lists
  #
  # This constructor copies in all attributes, as well as splitting the
  # +authors+ value into +author_list+ and +formatted_author_list+.
  #
  # @api public
  # @param [Hash] attributes attributes for this document
  def initialize(attributes = {})
    super

    # Split out the author list and format it
    self.author_list = authors.split(',').map { |a| a.strip } unless authors.nil?
    unless author_list.nil?
      self.formatted_author_list = author_list.map do |a|
        BibTeX::Names.parse(a)[0]
      end
    end
  end
end
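
The abbreviated-range handling in end_page (page strings like "1442-7") can be sketched as a standalone function. `expand_end_page` is a hypothetical name, and the blank check is written with plain Ruby rather than ActiveSupport's `blank?` so that the example runs on its own.

```ruby
# Hypothetical standalone version of the end_page logic.
def expand_end_page(pages)
  return '' if pages.nil? || pages.strip.empty?

  parts = pages.split('-')
  return '' if parts.length <= 1

  spage = parts[0]
  epage = parts[-1]
  return epage if spage.length <= epage.length

  # Abbreviated range: keep the leading digits of the start page and
  # replace its tail with the digits given for the end page.
  spage[0...-epage.length] + epage
end

p expand_end_page('1442-7')
p expand_end_page('12-15')
p expand_end_page('300')
```

When the end page has fewer digits than the start page, the missing leading digits are borrowed from the start page, so "1442-7" yields "1447". A value with no hyphen has no end page and yields an empty string.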

#authors ⇒ String

Returns the document's authors, in a comma-delimited list

Returns:

  • (String)

    the document's authors, in a comma-delimited list




#doi ⇒ String

Returns the DOI (Digital Object Identifier) of this document

Returns:

  • (String)

    the DOI (Digital Object Identifier) of this document




#formatted_author_list ⇒ Array<Hash>

Returns the document's authors, split into name parts, in an array

Returns:

  • (Array<Hash>)

    the document's authors, split into name parts, in an array




#fulltext ⇒ String (readonly)

Returns the full text of this document. May be nil if the query type used to retrieve the document does not provide the full text.

Returns:

  • (String)

    the full text of this document. May be nil if the query type used to retrieve the document does not provide the full text.



98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
# File 'app/models/document.rb', line 98

class Document
  # Make this class act like an ActiveRecord model, though it's not backed by
  # the database (it's in Solr).
  include ActiveModel::Model

  attr_accessor :shasum, :doi, :license, :license_url, :authors,
                :author_list, :formatted_author_list, :title, :journal,
                :year, :volume, :number, :pages, :fulltext, :term_vectors

  # The shasum attribute is the only required one
  validates :shasum, presence: true
  validates :shasum, length: { is: 40 }
  validates :shasum, format: { with: /\A[a-fA-F\d]+\z/ }

  class << self
    # Registration for all serializers
    #
    # This variable is a hash of hashes.  For its format, see the documentation
    # for register_serializer.
    #
    # @api public
    # @return [Hash] the serializer registry
    # @example See if there is a serializer loaded for JSON
    #   Document.serializers.has_key? :json
    attr_accessor :serializers

    # Register a serializer
    #
    # @api public
    # @return [undefined]
    # @param [Symbol] key the MIME type key for this serializer, as defined
    #   in config/initializers/mime_types.rb
    # @param [String] name the human-readable name of this serializer format
    # @param [Proc] method a method which accepts a Document object as a
    #   parameter and returns the serialized document as a String
    # @param [String] docs a URL pointing to documentation for this method
    # @example Register a serializer for JSON
    #   Document.register_serializer (:json,
    #                                 'JSON',
    #                                 lambda { |doc| doc.to_json },
    #                                 'http://www.json.org/')
    def register_serializer(key, name, method, docs)
      Document.serializers ||= { }
      Document.serializers[key] = { name: name, method: method, docs: docs }
    end
  end

  # Serialization methods
  include Serializers::BibTex
  include Serializers::CSL
  include Serializers::EndNote
  include Serializers::MARC
  include Serializers::MODS
  include Serializers::RDF
  include Serializers::RIS
  include Serializers::OpenURL

  # Return a document (just bibliographic data) by SHA-1 checksum
  #
  # @api public
  # @param [String] shasum SHA-1 checksum of the document to be retrieved
  # @param [Hash] options see +Solr::Connection.search+ for specification
  # @return [Document] the document requested
  # @raise [Solr::ConnectionError] thrown if there is an error querying Solr
  # @raise [ActiveRecord::RecordNotFound] thrown if no matching document can
  #   be found
  # @example Look up the document with ID '1234567890abcdef1234'
  #   doc = Document.find('1234567890abcdef1234')
  def self.find(shasum, options = {})
    result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                     defType: 'lucene' }))
    fail ActiveRecord::RecordNotFound if result.num_hits != 1
    result.documents[0]
  end

  # Return a document (bibliographic data and full text) by SHA-1 checksum
  #
  # @api public
  # @param [String] shasum SHA-1 checksum of the document to be retrieved
  # @param [Hash] options see +Solr::Connection.search+ for specification
  # @return [Document] the document requested, including full text
  # @raise [Solr::ConnectionError] thrown if there is an error querying Solr
  # @raise [ActiveRecord::RecordNotFound] thrown if no matching document can
  #   be found
  # @example Get the full tet of the document with ID '1234567890abcdef1234'
  #   text = Document.find_with_fulltext('1234567890abcdef1234').fulltext
  def self.find_with_fulltext(shasum, options = {})
    result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                     defType: 'lucene',
                                                     tv: 'true',
                                                     fl: Solr::Connection::DEFAULT_FIELDS_FULLTEXT }))
    fail ActiveRecord::RecordNotFound if result.num_hits != 1
    result.documents[0]
  end

  # @return [String] the starting page of this document, if it can be parsed
  def start_page
    return '' if pages.blank?
    pages.split('-')[0]
  end

  # @return [String] the ending page of this document, if it can be parsed
  def end_page
    return '' if pages.blank?
    parts = pages.split('-')
    return '' if parts.length <= 1

    spage = parts[0]
    epage = parts[-1]

    # Check for range strings like "1442-7"
    if spage.length > epage.length
      ret = spage
      ret[-epage.length..-1] = epage
    else
      ret = epage
    end
    ret
  end

  # Set all attributes and create author lists
  #
  # This constructor copies in all attributes, as well as splitting the
  # +authors+ value into +author_list+ and +formatted_author_list+.
  #
  # @api public
  # @param [Hash] attributes attributes for this document
  def initialize(attributes = {})
    super

    # Split out the author list and format it
    self.author_list = authors.split(',').map { |a| a.strip } unless authors.nil?
    unless author_list.nil?
      self.formatted_author_list = author_list.map do |a|
        BibTeX::Names.parse(a)[0]
      end
    end
  end
end
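
The author handling in the constructor above can be tried in isolation. The following is a minimal sketch, not the RLetters implementation: `split_authors` is a hypothetical helper that reproduces only the comma-splitting step, and it leaves out the `BibTeX::Names.parse` call so that it runs without the bibtex-ruby gem.

```ruby
# Hypothetical helper mirroring the first step of Document#initialize:
# split a comma-separated authors string into a list of trimmed names.
# Returns nil when no authors value is present, as the constructor leaves
# author_list unset in that case.
def split_authors(authors)
  return nil if authors.nil?

  authors.split(',').map(&:strip)
end

author_list = split_authors('Alice Smith,  Bob Jones ')
puts author_list.inspect
```

Because the split is on every comma, a value written as "Last, First" would be broken into two entries, so this approach presumably relies on the authors field storing names in "First Last" order.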

#fulltext ⇒ Object

Returns the value of attribute fulltext



# File 'app/models/document.rb', line 103

def fulltext
  @fulltext
end

#journal ⇒ String

Returns the journal in which this document was published

Returns:

  • (String)

    the journal in which this document was published



# File 'app/models/document.rb', line 98

class Document
  # Make this class act like an ActiveRecord model, though it's not backed by
  # the database (it's in Solr).
  include ActiveModel::Model

  attr_accessor :shasum, :doi, :license, :license_url, :authors,
                :author_list, :formatted_author_list, :title, :journal,
                :year, :volume, :number, :pages, :fulltext, :term_vectors

  # The shasum attribute is the only required one
  validates :shasum, presence: true
  validates :shasum, length: { is: 40 }
  validates :shasum, format: { with: /\A[a-fA-F\d]+\z/ }

  class << self
    # Registration for all serializers
    #
    # This variable is a hash of hashes.  For its format, see the documentation
    # for register_serializer.
    #
    # @api public
    # @return [Hash] the serializer registry
    # @example See if there is a serializer loaded for JSON
    #   Document.serializers.has_key? :json
    attr_accessor :serializers

    # Register a serializer
    #
    # @api public
    # @return [undefined]
    # @param [Symbol] key the MIME type key for this serializer, as defined
    #   in config/initializers/mime_types.rb
    # @param [String] name the human-readable name of this serializer format
    # @param [Proc] method a method which accepts a Document object as a
    #   parameter and returns the serialized document as a String
    # @param [String] docs a URL pointing to documentation for this method
    # @example Register a serializer for JSON
    #   Document.register_serializer(:json,
    #                                'JSON',
    #                                lambda { |doc| doc.to_json },
    #                                'http://www.json.org/')
    def register_serializer(key, name, method, docs)
      Document.serializers ||= { }
      Document.serializers[key] = { name: name, method: method, docs: docs }
    end
  end

  # Serialization methods
  include Serializers::BibTex
  include Serializers::CSL
  include Serializers::EndNote
  include Serializers::MARC
  include Serializers::MODS
  include Serializers::RDF
  include Serializers::RIS
  include Serializers::OpenURL

  # Return a document (just bibliographic data) by SHA-1 checksum
  #
  # @api public
  # @param [String] shasum SHA-1 checksum of the document to be retrieved
  # @param [Hash] options see +Solr::Connection.search+ for specification
  # @return [Document] the document requested
  # @raise [Solr::ConnectionError] thrown if there is an error querying Solr
  # @raise [ActiveRecord::RecordNotFound] thrown if no matching document can
  #   be found
  # @example Look up the document with ID '1234567890abcdef1234'
  #   doc = Document.find('1234567890abcdef1234')
  def self.find(shasum, options = {})
    result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                     defType: 'lucene' }))
    fail ActiveRecord::RecordNotFound if result.num_hits != 1
    result.documents[0]
  end

  # Return a document (bibliographic data and full text) by SHA-1 checksum
  #
  # @api public
  # @param [String] shasum SHA-1 checksum of the document to be retrieved
  # @param [Hash] options see +Solr::Connection.search+ for specification
  # @return [Document] the document requested, including full text
  # @raise [Solr::ConnectionError] thrown if there is an error querying Solr
  # @raise [ActiveRecord::RecordNotFound] thrown if no matching document can
  #   be found
  # @example Get the full text of the document with ID '1234567890abcdef1234'
  #   text = Document.find_with_fulltext('1234567890abcdef1234').fulltext
  def self.find_with_fulltext(shasum, options = {})
    result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                     defType: 'lucene',
                                                     tv: 'true',
                                                     fl: Solr::Connection::DEFAULT_FIELDS_FULLTEXT }))
    fail ActiveRecord::RecordNotFound if result.num_hits != 1
    result.documents[0]
  end

  # @return [String] the starting page of this document, if it can be parsed
  def start_page
    return '' if pages.blank?
    pages.split('-')[0]
  end

  # @return [String] the ending page of this document, if it can be parsed
  def end_page
    return '' if pages.blank?
    parts = pages.split('-')
    return '' if parts.length <= 1

    spage = parts[0]
    epage = parts[-1]

    # Check for range strings like "1442-7"
    if spage.length > epage.length
      ret = spage
      ret[-epage.length..-1] = epage
    else
      ret = epage
    end
    ret
  end

  # Set all attributes and create author lists
  #
  # This constructor copies in all attributes, as well as splitting the
  # +authors+ value into +author_list+ and +formatted_author_list+.
  #
  # @api public
  # @param [Hash] attributes attributes for this document
  def initialize(attributes = {})
    super

    # Split out the author list and format it
    self.author_list = authors.split(',').map { |a| a.strip } unless authors.nil?
    unless author_list.nil?
      self.formatted_author_list = author_list.map do |a|
        BibTeX::Names.parse(a)[0]
      end
    end
  end
end
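
The serializer registry in the listing above is a class-level hash of hashes keyed by MIME type symbol. The sketch below shows how such a registry could be filled and queried. `SketchDocument`, the `:upcase` key, and the example URL are assumptions made for illustration and are not part of RLetters.

```ruby
# Hypothetical stand-in for Document, reduced to the serializer registry.
class SketchDocument
  class << self
    # Hash of hashes: { key => { name:, method:, docs: } }
    attr_accessor :serializers

    # Store one serializer entry under the given key, creating the
    # registry on first use.
    def register_serializer(key, name, method, docs)
      self.serializers ||= {}
      serializers[key] = { name: name, method: method, docs: docs }
    end
  end
end

# Register an illustrative serializer that upper-cases its argument.
SketchDocument.register_serializer(:upcase,
                                   'Upper case',
                                   ->(doc) { doc.to_s.upcase },
                                   'http://example.com/')

entry = SketchDocument.serializers[:upcase]
puts entry[:name]                  # prints "Upper case"
puts entry[:method].call('title')  # prints "TITLE"
```

Storing the serializer as a callable keeps the registry independent of any particular format module: a caller only needs to look up the key and invoke `:method` with a document.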

#license ⇒ String

Returns the human-readable name of the document's license

Returns:

  • (String)

    the human-readable name of the document's license



# File 'app/models/document.rb', line 98


#license_url ⇒ String

Returns a URL referencing the document's license terms

Returns:

  • (String)

    a URL referencing the document's license terms



# File 'app/models/document.rb', line 98


#number ⇒ String

Returns the journal issue number in which this document was published

Returns:

  • (String)

    the journal issue number in which this document was published



# File 'app/models/document.rb', line 98


#pages ⇒ String

Returns the page numbers in the journal of this document, in the format 'start-end'

Returns:

  • (String)

    the page numbers in the journal of this document, in the format 'start-end'
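
The 'start-end' format also covers abbreviated ranges such as "1442-7", which the class's `start_page` and `end_page` methods expand. The following is a minimal standalone sketch of that logic, assuming plain functions that take the pages string as an argument. It replaces ActiveSupport's `blank?` with an explicit nil/empty check so that it runs on plain Ruby, and it builds a new string instead of modifying the start page in place.

```ruby
# Standalone sketch of the page-range parsing; the function signatures here
# are illustrative (the real methods read the pages attribute).
def start_page(pages)
  return '' if pages.nil? || pages.strip.empty?

  pages.split('-')[0]
end

def end_page(pages)
  return '' if pages.nil? || pages.strip.empty?

  parts = pages.split('-')
  return '' if parts.length <= 1

  spage = parts[0]
  epage = parts[-1]

  if spage.length > epage.length
    # Abbreviated range such as "1442-7": keep the leading digits of the
    # start page and replace its tail with the end-page digits.
    spage[0...(spage.length - epage.length)] + epage
  else
    epage
  end
end

puts start_page('1442-7')  # prints "1442"
puts end_page('1442-7')    # prints "1447"
puts end_page('100-110')   # prints "110"
```

A value with no hyphen, such as "55", yields an empty end page, since there is no range to parse.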



# File 'app/models/document.rb', line 98


#shasum ⇒ String

Returns the SHA-1 checksum of this document

Returns:

  • (String)

    the SHA-1 checksum of this document

Raises:

  • (RecordInvalid)

    if the SHA-1 checksum is missing (validates :presence)

  • (RecordInvalid)

    if the SHA-1 checksum is not 40 characters (validates :length)

  • (RecordInvalid)

    if the SHA-1 checksum contains invalid characters (validates :format)
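
The three conditions above can be checked without ActiveModel. This is a minimal sketch under that assumption: `valid_shasum?` and `SHASUM_FORMAT` are hypothetical names, and the regular expression is the one used in the class's `validates :shasum, format:` rule.

```ruby
require 'digest'

# Same pattern as the format validation: hexadecimal characters only.
SHASUM_FORMAT = /\A[a-fA-F\d]+\z/

# Hypothetical helper combining the presence, length, and format checks.
def valid_shasum?(shasum)
  return false if shasum.nil? || shasum.empty?   # presence

  shasum.length == 40 &&                         # length
    shasum.match?(SHASUM_FORMAT)                 # format
end

checksum = Digest::SHA1.hexdigest('example document')
puts valid_shasum?(checksum)                 # prints "true"
puts valid_shasum?('1234567890abcdef1234')   # prints "false" (only 20 characters)
```

Note that the 20-character ID used in the `find` examples elsewhere on this page would not pass the length check; a real SHA-1 hex digest is always 40 characters.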



# File 'app/models/document.rb', line 98

class Document
  # Make this class act like an ActiveRecord model, though it's not backed by
  # the database (it's in Solr).
  include ActiveModel::Model

  attr_accessor :shasum, :doi, :license, :license_url, :authors,
                :author_list, :formatted_author_list, :title, :journal,
                :year, :volume, :number, :pages, :fulltext, :term_vectors

  # The shasum attribute is the only required one
  validates :shasum, presence: true
  validates :shasum, length: { is: 40 }
  validates :shasum, format: { with: /\A[a-fA-F\d]+\z/ }

  class << self
    # Registration for all serializers
    #
    # This variable is a hash of hashes.  For its format, see the documentation
    # for register_serializer.
    #
    # @api public
    # @return [Hash] the serializer registry
    # @example See if there is a serializer loaded for JSON
    #   Document.serializers.has_key? :json
    attr_accessor :serializers

    # Register a serializer
    #
    # @api public
    # @return [undefined]
    # @param [Symbol] key the MIME type key for this serializer, as defined
    #   in config/initializers/mime_types.rb
    # @param [String] name the human-readable name of this serializer format
    # @param [Proc] method a method which accepts a Document object as a
    #   parameter and returns the serialized document as a String
    # @param [String] docs a URL pointing to documentation for this method
    # @example Register a serializer for JSON
    #   Document.register_serializer(:json,
    #                                'JSON',
    #                                lambda { |doc| doc.to_json },
    #                                'http://www.json.org/')
    def register_serializer(key, name, method, docs)
      Document.serializers ||= { }
      Document.serializers[key] = { name: name, method: method, docs: docs }
    end
  end

  # Serialization methods
  include Serializers::BibTex
  include Serializers::CSL
  include Serializers::EndNote
  include Serializers::MARC
  include Serializers::MODS
  include Serializers::RDF
  include Serializers::RIS
  include Serializers::OpenURL

  # Return a document (just bibliographic data) by SHA-1 checksum
  #
  # @api public
  # @param [String] shasum SHA-1 checksum of the document to be retrieved
  # @param [Hash] options see +Solr::Connection.search+ for specification
  # @return [Document] the document requested
  # @raise [Solr::ConnectionError] thrown if there is an error querying Solr
  # @raise [ActiveRecord::RecordNotFound] thrown if no matching document can
  #   be found
  # @example Look up the document with ID '1234567890abcdef1234'
  #   doc = Document.find('1234567890abcdef1234')
  def self.find(shasum, options = {})
    result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                     defType: 'lucene' }))
    fail ActiveRecord::RecordNotFound if result.num_hits != 1
    result.documents[0]
  end

  # Return a document (bibliographic data and full text) by SHA-1 checksum
  #
  # @api public
  # @param [String] shasum SHA-1 checksum of the document to be retrieved
  # @param [Hash] options see +Solr::Connection.search+ for specification
  # @return [Document] the document requested, including full text
  # @raise [Solr::ConnectionError] thrown if there is an error querying Solr
  # @raise [ActiveRecord::RecordNotFound] thrown if no matching document can
  #   be found
  # @example Get the full text of the document with ID '1234567890abcdef1234'
  #   text = Document.find_with_fulltext('1234567890abcdef1234').fulltext
  def self.find_with_fulltext(shasum, options = {})
    result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                     defType: 'lucene',
                                                     tv: 'true',
                                                     fl: Solr::Connection::DEFAULT_FIELDS_FULLTEXT }))
    fail ActiveRecord::RecordNotFound if result.num_hits != 1
    result.documents[0]
  end

  # @return [String] the starting page of this document, if it can be parsed
  def start_page
    return '' if pages.blank?
    pages.split('-')[0]
  end

  # @return [String] the ending page of this document, if it can be parsed
  def end_page
    return '' if pages.blank?
    parts = pages.split('-')
    return '' if parts.length <= 1

    spage = parts[0]
    epage = parts[-1]

    # Check for range strings like "1442-7"
    if spage.length > epage.length
      ret = spage
      ret[-epage.length..-1] = epage
    else
      ret = epage
    end
    ret
  end

  # Set all attributes and create author lists
  #
  # This constructor copies in all attributes, as well as splitting the
  # +authors+ value into +author_list+ and +formatted_author_list+.
  #
  # @api public
  # @param [Hash] attributes attributes for this document
  def initialize(attributes = {})
    super

    # Split out the author list and format it
    self.author_list = authors.split(',').map { |a| a.strip } unless authors.nil?
    unless author_list.nil?
      self.formatted_author_list = author_list.map do |a|
        BibTeX::Names.parse(a)[0]
      end
    end
  end
end

#term_vectors ⇒ Hash

Note:

This attribute may be nil, if the query type requested from the Solr server does not return term vectors.

Term vectors for this document

The Solr server returns a list of information for each term in every document. The following data is provided (based on Solr server configuration):

  • :tf, term frequency: the number of times this term appears in the given document

  • :offsets, term offsets: the start and end character offsets for this word within fulltext. Note that these offsets can be complicated by string encoding issues, so be careful when using them.

  • :positions, term positions: the position of this word (counted in number of words) within fulltext. Note that these positions rely on the precise way in which Solr splits words, which is specified by Unicode UAX #29.

  • :df, document frequency: the number of documents in the collection that contain this word

  • :tfidf, term frequency-inverse document frequency: equal to (term frequency / number of words in this document) * log(size of collection / document frequency). A measure of how “significant” or “important” a given word is within a document, which gives high weight to words that occur frequently in a given document but do not occur in other documents.
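The :tfidf formula above can be written out directly. This is a sketch under stated assumptions: `tfidf` is a hypothetical helper, all counts are made-up inputs, and the natural logarithm (`Math.log`) is assumed because the description does not name a base.

```ruby
# Sketch of the tf-idf formula described above:
#   (term frequency / words in document) * log(collection size / document frequency)
# The logarithm base is an assumption; Math.log is the natural logarithm.
def tfidf(term_frequency, words_in_document, collection_size, document_frequency)
  (term_frequency.to_f / words_in_document) *
    Math.log(collection_size.to_f / document_frequency)
end

# A term appearing 5 times in a 1,000-word document, found in 10 of 10,000 documents
rare = tfidf(5, 1000, 10_000, 10)

# The same counts for a term found in every document: log(1) is 0, so the score is 0
common = tfidf(5, 1000, 10_000, 10_000)

puts rare > common
```

The second call shows why words that occur in every document carry no weight, however often they appear in a single document.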

Examples:

Get the frequency of the term 'general' in this document

doc.term_vectors['general'][:tf]

Returns:

  • (Hash)

    term vector information. The hash contains the following keys:

    term_vectors['word']
    term_vectors['word'][:tf] = Integer
    term_vectors['word'][:offsets] = Array<Range>
    term_vectors['word'][:offsets][0] = Range
    # ...
    term_vectors['word'][:positions] = Array<Integer>
    term_vectors['word'][:positions][0] = Integer
    # ...
    term_vectors['word'][:df] = Float
    term_vectors['word'][:tfidf] = Float
    term_vectors['otherword']
    # ...
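To make the hash layout above concrete, here is a small sketch that walks a hand-built hash of the same shape. The data is hypothetical, the offsets are assumed to be end-exclusive ranges into fulltext, and `term_frequency` is an illustrative helper that guards against the nil case mentioned in the note.

```ruby
# Hypothetical data in the documented shape; a real Document would
# receive these values from Solr.
fulltext = 'general relativity is a general theory'
term_vectors = {
  'general' => {
    tf: 2,
    offsets: [0...7, 24...31],  # assumed end-exclusive character ranges
    positions: [0, 4],          # word indexes within fulltext
    df: 1.0,
    tfidf: 0.0
  }
}

# Illustrative helper: term_vectors may be nil if the query did not
# request term vectors, and a word may be absent from the hash.
def term_frequency(term_vectors, word)
  return 0 if term_vectors.nil?
  term_vectors.dig(word, :tf) || 0
end

# Recover each occurrence of the word from fulltext via its offsets
words = term_vectors['general'][:offsets].map { |range| fulltext[range] }

puts term_frequency(term_vectors, 'general')
puts term_frequency(nil, 'general')
puts words.inspect
```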



#title ⇒ String

Returns the title of this document

Returns:

  • (String)

    the title of this document




#volume ⇒ String

Returns the journal volume number in which this document was published

Returns:

  • (String)

    the journal volume number in which this document was published




#year ⇒ String

Returns the year in which this document was published

Returns:

  • (String)

    the year in which this document was published




Class Method Details

.find(shasum, options = {}) ⇒ Document

Return a document (just bibliographic data) by SHA-1 checksum

Examples:

Look up the document with ID '1234567890abcdef1234'

doc = Document.find('1234567890abcdef1234')

Parameters:

  • shasum (String)

    SHA-1 checksum of the document to be retrieved

  • options (Hash) (defaults to: {})

    see Solr::Connection.search for specification

Returns:

  • (Document)

    the document requested

Raises:

  • (Solr::ConnectionError)

    thrown if there is an error querying Solr

  • (ActiveRecord::RecordNotFound)

    thrown if no matching document can be found



# File 'app/models/document.rb', line 166

def self.find(shasum, options = {})
  result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                   defType: 'lucene' }))
  fail ActiveRecord::RecordNotFound if result.num_hits != 1
  result.documents[0]
end
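One detail of the listing above is worth spelling out: because the fixed keys are passed as the argument to `Hash#merge`, they take precedence over any same-named keys in the caller's options. The sketch below isolates that merge in plain Ruby; `find_params` is a hypothetical helper, and the `rows` key is only an illustrative option, not a documented parameter of Solr::Connection.search.

```ruby
# Hypothetical helper reproducing only the parameter merge used by find.
# No Solr connection is involved.
def find_params(shasum, options = {})
  options.merge({ q: "shasum:#{shasum}", defType: 'lucene' })
end

# A caller-supplied :q is overridden; unrelated keys pass through untouched.
params = find_params('1234567890abcdef1234', { q: 'ignored', rows: 1 })
puts params.inspect
```

In other words, callers can add parameters through options but cannot change the query or the query parser that find uses.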

.find_with_fulltext(shasum, options = {}) ⇒ Document

Return a document (bibliographic data and full text) by SHA-1 checksum

Examples:

Get the full text of the document with ID '1234567890abcdef1234'

text = Document.find_with_fulltext('1234567890abcdef1234').fulltext

Parameters:

  • shasum (String)

    SHA-1 checksum of the document to be retrieved

  • options (Hash) (defaults to: {})

    see Solr::Connection.search for specification

Returns:

  • (Document)

    the document requested, including full text

Raises:

  • (Solr::ConnectionError)

    thrown if there is an error querying Solr

  • (ActiveRecord::RecordNotFound)

    thrown if no matching document can be found



# File 'app/models/document.rb', line 184

def self.find_with_fulltext(shasum, options = {})
  result = Solr::Connection.search(options.merge({ q: "shasum:#{shasum}",
                                                   defType: 'lucene',
                                                   tv: 'true',
                                                   fl: Solr::Connection::DEFAULT_FIELDS_FULLTEXT }))
  fail ActiveRecord::RecordNotFound if result.num_hits != 1
  result.documents[0]
end

.register_serializer(key, name, method, docs) ⇒ undefined

Register a serializer

Examples:

Register a serializer for JSON

Document.register_serializer(:json,
                             'JSON',
                             lambda { |doc| doc.to_json },
                             'http://www.json.org/')

Parameters:

  • key (Symbol)

    the MIME type key for this serializer, as defined in config/initializers/mime_types.rb

  • name (String)

    the human-readable name of this serializer format

  • method (Proc)

    a method which accepts a Document object as a parameter and returns the serialized document as a String

  • docs (String)

    a URL pointing to documentation for this method

Returns:

  • (undefined)


# File 'app/models/document.rb', line 139

def register_serializer(key, name, method, docs)
  Document.serializers ||= { }
  Document.serializers[key] = { name: name, method: method, docs: docs }
end
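
As a rough illustration of how a registered entry might later be looked up and invoked, the sketch below reproduces the registry on a stand-in class. SketchDocument, its title attribute, and the JSON lambda are assumptions made for the example; only the shape of the stored hash (name, method, docs) comes from the listing above.

```ruby
require 'json'

# Stand-in class for illustration only; the real Document has many more members.
class SketchDocument
  class << self
    attr_accessor :serializers

    # Same storage layout as the listing above: one hash per MIME type key.
    def register_serializer(key, name, method, docs)
      self.serializers ||= {}
      serializers[key] = { name: name, method: method, docs: docs }
    end
  end

  attr_reader :title

  def initialize(title)
    @title = title
  end
end

SketchDocument.register_serializer(:json,
                                   'JSON',
                                   ->(doc) { JSON.generate({ title: doc.title }) },
                                   'http://www.json.org/')

# Look up the entry by key and call the stored Proc with a document.
entry = SketchDocument.serializers[:json]
output = entry[:method].call(SketchDocument.new('Sample'))
puts "#{entry[:name]}: #{output}"
```

Storing a Proc rather than a method name keeps the registry independent of how each serializer is implemented.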

Instance Method Details

#end_pageString

Returns the ending page of this document, if it can be parsed

Returns:

  • (String)

    the ending page of this document, if it can be parsed



# File 'app/models/document.rb', line 200

def end_page
  return '' if pages.blank?
  parts = pages.split('-')
  return '' if parts.length <= 1

  spage = parts[0]
  epage = parts[-1]

  # Check for range strings like "1442-7"
  if spage.length > epage.length
    ret = spage
    ret[-epage.length..-1] = epage
  else
    ret = epage
  end
  ret
end
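
The abbreviated-range handling can be tried outside of Rails with the standalone sketch below. expand_end_page is a hypothetical free function, not the Document method itself. It replaces blank? with a plain nil/empty check and builds a new string instead of modifying the starting page in place.

```ruby
# Standalone sketch of the page-range expansion; not the Document method itself.
def expand_end_page(pages)
  return '' if pages.nil? || pages.strip.empty?

  parts = pages.split('-')
  return '' if parts.length <= 1

  spage = parts[0]
  epage = parts[-1]

  # For "1442-7" the ending page is shorter than the starting page, so it
  # replaces only the trailing digits of the starting page.
  if spage.length > epage.length
    spage[0...(spage.length - epage.length)] + epage
  else
    epage
  end
end

puts expand_end_page('1442-7')   # abbreviated range
puts expand_end_page('100-110')  # full range
puts expand_end_page('42')       # single page, so there is no ending page
```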

#start_pageString

Returns the starting page of this document, if it can be parsed

Returns:

  • (String)

    the starting page of this document, if it can be parsed



# File 'app/models/document.rb', line 194

def start_page
  return '' if pages.blank?
  pages.split('-')[0]
end