Class: RDF::Microdata::Reader

Inherits:
Reader
  • Object
Includes:
Expansion, Util::Logger
Defined in:
lib/rdf/microdata/reader.rb,
lib/rdf/microdata/reader/nokogiri.rb

Overview

A Microdata parser in Ruby

Based on processing rules, amended with the following:

Defined Under Namespace

Modules: Nokogiri
Classes: Registry

Constant Summary

URL_PROPERTY_ELEMENTS =
%w(a area audio embed iframe img link object source track video)
DEFAULT_REGISTRY =
File.expand_path(File.join(File.dirname(__FILE__), "..", "..", "..", "etc", "registry.json"))
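The sketch below illustrates how the two constants behave. The install directory is a hypothetical stand-in for File.dirname(__FILE__), and the membership check is only a guess at how the element list might be consulted; the constant name suggests these are the elements whose property value is taken from a URL attribute. The path shown assumes a POSIX file system.

```ruby
# Values copied from the constant summary above.
URL_PROPERTY_ELEMENTS = %w(a area audio embed iframe img link object source track video)

# Hypothetical install location standing in for File.dirname(__FILE__).
reader_dir = "/gems/rdf-microdata/lib/rdf/microdata"

# Three ".." segments climb out of lib/rdf/microdata to the gem root,
# so the registry is looked up in <gem root>/etc/registry.json.
default_registry = File.expand_path(
  File.join(reader_dir, "..", "..", "..", "etc", "registry.json")
)

puts default_registry

# Assumed usage: test whether an element name is in the list.
puts URL_PROPERTY_ELEMENTS.include?("img")   # true
puts URL_PROPERTY_ELEMENTS.include?("span")  # false
```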

Instance Attribute Summary

Instance Method Summary

Methods included from Expansion

#expand, #rule

Constructor Details

#initialize(input = $stdin, options = {}) {|reader| ... } ⇒ reader

Initializes the Microdata reader instance.

Parameters:

  • input (Nokogiri::HTML::Document, Nokogiri::XML::Document, IO, File, String) (defaults to: $stdin)

    the input stream to read

  • options (Hash{Symbol => Object}) (defaults to: {})

    any additional options

Options Hash (options):

  • :encoding (Encoding) — default: Encoding::UTF_8

    the encoding of the input stream (Ruby 1.9+)

  • :validate (Boolean) — default: false

    whether to validate the parsed statements and values

  • :canonicalize (Boolean) — default: false

    whether to canonicalize parsed literals

  • :intern (Boolean) — default: true

    whether to intern all parsed URIs

  • :base_uri (#to_s) — default: nil

    the base URI to use when resolving relative URIs

  • :registry (#to_s)

    location of the registry to load; falls back to DEFAULT_REGISTRY when omitted

Yields:

  • (reader)

    self

Yield Parameters:

  • reader (RDF::Reader)

Yield Returns:

  • (void)

    ignored

Raises:

  • (Error)

    Raises RDF::ReaderError when validating



# File 'lib/rdf/microdata/reader.rb', line 167

def initialize(input = $stdin, options = {}, &block)
  super do
    @library = :nokogiri

    require "rdf/microdata/reader/#{@library}"
    @implementation = Nokogiri
    self.extend(@implementation)

    input.rewind if input.respond_to?(:rewind)
    initialize_html(input, options) rescue log_fatal($!.message, exception: RDF::ReaderError)

    log_error("Empty document") if root.nil?
    log_error(doc_errors.map(&:message).uniq.join("\n")) if !doc_errors.empty?

    log_debug(@doc, "library = #{@library}")

    # Load registry
    begin
      registry_uri = options[:registry] || DEFAULT_REGISTRY
      log_debug(@doc, "registry = #{registry_uri.inspect}")
      Registry.load_registry(registry_uri)
    rescue JSON::ParserError => e
      log_fatal("Failed to parse registry: #{e.message}", exception: RDF::ReaderError) if (root.nil? && validate?)
    end
    
    if block_given?
      case block.arity
        when 0 then instance_eval(&block)
        else block.call(self)
      end
    end
  end
end
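The block handling at the end of the constructor dispatches on block arity: a block that takes no parameters is evaluated in the context of the reader, while any other block receives the reader as an argument. A minimal self-contained sketch of that pattern follows; SketchReader and its label attribute are hypothetical and not part of rdf-microdata.

```ruby
# Hypothetical stand-in for a reader; only the block dispatch is modelled.
class SketchReader
  attr_reader :label

  def initialize(label, &block)
    @label = label
    return unless block_given?

    case block.arity
    when 0 then instance_eval(&block) # block runs with self set to the reader
    else block.call(self)             # reader is passed as the block argument
    end
  end
end

seen = []

# One-parameter block: the reader arrives as `r`.
SketchReader.new("explicit") { |r| seen << r.label }

# Zero-parameter block: `label` resolves against the reader itself.
SketchReader.new("implicit") { seen << label }

p seen
```

Both forms reach the same object; the zero-parameter form is shorter but changes what self refers to inside the block, which may surprise callers who rely on their own instance methods there.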

Instance Attribute Details

#implementation ⇒ Module (readonly)

Returns the HTML implementation module for this reader instance.

Returns:

  • (Module)

    Returns the HTML implementation module for this reader instance.



# File 'lib/rdf/microdata/reader.rb', line 25

def implementation
  @implementation
end

Instance Method Details

#base_uri ⇒ Hash{Symbol => RDF::URI}

Returns the base URI determined by this reader.

Examples:

reader.prefixes[:dc]  #=> RDF::URI('http://purl.org/dc/terms/')

Returns:

  • (Hash{Symbol => RDF::URI})

Since:

  • 0.3.0



# File 'lib/rdf/microdata/reader.rb', line 35

def base_uri
  @options[:base_uri]
end

#each_statement {|statement| ... }

This method returns an undefined value.

Iterates the given block for each RDF statement in the input.

Reads to graph and performs expansion if required.

Yields:

  • (statement)

Yield Parameters:

  • statement (RDF::Statement)


# File 'lib/rdf/microdata/reader.rb', line 209

def each_statement(&block)
  if block_given?
    @callback = block

    # parse
    parse_whole_document(@doc, base_uri)

    if validate? && log_statistics[:error]
      raise RDF::ReaderError, "Errors found during processing"
    end
  end
  enum_for(:each_statement)
end
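The listing above parses the whole document first and raises only afterwards, and only when validation is on and at least one error was logged. The sketch below shows that "record errors, fail at the end" pattern in isolation. CountingParser, SketchReaderError, and the rule that nil items count as errors are all assumptions made for illustration, and the sketch uses an early return for the no-block case rather than the trailing enum_for of the listing.

```ruby
# Hypothetical stand-in for RDF::ReaderError.
class SketchReaderError < StandardError; end

# Hypothetical parser: nil entries play the role of logged errors.
class CountingParser
  def initialize(items, validate: false)
    @items = items
    @validate = validate
    @errors = 0
  end

  def each_item
    return enum_for(:each_item) unless block_given?

    @items.each do |item|
      if item.nil?
        @errors += 1 # record the problem and keep parsing
        next
      end
      yield item
    end

    # Fail only after the full pass, and only when validating.
    raise SketchReaderError, "Errors found during processing" if @validate && @errors > 0
  end
end

lenient = CountingParser.new(["a", nil, "b"]).each_item.to_a

strict_seen = []
strict_failed = false
begin
  CountingParser.new(["a", nil, "b"], validate: true).each_item { |i| strict_seen << i }
rescue SketchReaderError
  strict_failed = true
end

p lenient, strict_seen, strict_failed
```

Note that in the validating case the block has already seen every valid item by the time the error is raised, so callers that need all-or-nothing behaviour would have to buffer results themselves.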

#each_triple {|subject, predicate, object| ... }

This method returns an undefined value.

Iterates the given block for each RDF triple in the input.

Yields:

  • (subject, predicate, object)

Yield Parameters:

  • subject (RDF::Resource)
  • predicate (RDF::URI)
  • object (RDF::Value)


# File 'lib/rdf/microdata/reader.rb', line 231

def each_triple(&block)
  if block_given?
    each_statement do |statement|
      block.call(*statement.to_triple)
    end
  end
  enum_for(:each_triple)
end
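As the listing shows, each_triple is a thin wrapper: it iterates statements and splats each one into subject, predicate, and object. A self-contained sketch of the same delegation is given below. SketchStatement and TripleSource are hypothetical stand-ins, plain strings replace RDF terms, and the sketch returns the enumerator early when no block is given, which differs slightly from the listing.

```ruby
# Hypothetical statement type exposing to_triple like RDF::Statement does.
SketchStatement = Struct.new(:subject, :predicate, :object) do
  def to_triple
    [subject, predicate, object]
  end
end

# Hypothetical source of statements; only the iteration API is modelled.
class TripleSource
  def initialize(statements)
    @statements = statements
  end

  def each_statement(&block)
    return enum_for(:each_statement) unless block_given?

    @statements.each(&block)
  end

  # Delegates to each_statement and splats each statement into three values.
  def each_triple(&block)
    return enum_for(:each_triple) unless block_given?

    each_statement { |statement| block.call(*statement.to_triple) }
  end
end

source = TripleSource.new([
  SketchStatement.new("s1", "p1", "o1"),
  SketchStatement.new("s2", "p2", "o2")
])

subjects = []
source.each_triple { |subject, _predicate, _object| subjects << subject }

# Without a block, an Enumerator is returned and can be chained.
triples = source.each_triple.to_a

p subjects, triples
```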