Class: RDF::Reader Abstract

Inherits:
Object
Extended by:
Enumerable, Util::Aliasing::LateBound
Includes:
Enumerable, Readable, Util::Logger
Defined in:
lib/rdf/reader.rb

Overview

This class is abstract.

The base class for RDF parsers.

Examples:

Loading an RDF reader implementation

require 'rdf/ntriples'

Iterating over known RDF reader classes

RDF::Reader.each { |klass| puts klass.name }

Obtaining an RDF reader class

RDF::Reader.for(:ntriples)     #=> RDF::NTriples::Reader
RDF::Reader.for("etc/doap.nt")
RDF::Reader.for(file_name:      "etc/doap.nt")
RDF::Reader.for(file_extension: "nt")
RDF::Reader.for(content_type:   "application/n-triples")

Instantiating an RDF reader class

RDF::Reader.for(:ntriples).new($stdin) { |reader| ... }

Parsing RDF statements from a file

RDF::Reader.open("etc/doap.nt") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

Parsing RDF statements from a string

data = StringIO.new(File.read("etc/doap.nt"))
RDF::Reader.for(:ntriples).new(data) do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

See Also:

Direct Known Subclasses

NTriples::Reader

Methods included from Util::Aliasing::LateBound

alias_method

Methods included from Enumerable

#dump, #each_graph, #each_object, #each_predicate, #each_quad, #each_subject, #each_term, #enum_graph, #enum_object, #enum_predicate, #enum_quad, #enum_statement, #enum_subject, #enum_term, #enum_triple, #graph_names, #has_graph?, #has_object?, #has_predicate?, #has_quad?, #has_statement?, #has_subject?, #has_term?, #has_triple?, #invalid?, #method_missing, #objects, #predicates, #project_graph, #quads, #respond_to_missing?, #statements, #subjects, #supports?, #terms, #to_a, #to_hash, #to_set, #triples, #validate!

Methods included from Countable

#count, #empty?

Methods included from Readable

#readable?

Methods included from Util::Logger

#log_debug, #log_depth, #log_error, #log_fatal, #log_info, #log_recover, #log_recovering?, #log_statistics, #log_warn, #logger

Constructor Details

#initialize(input = $stdin, options = {}) {|reader| ... } ⇒ Reader

Initializes the reader.

Options Hash (options):

  • :encoding (Encoding) — default: Encoding::UTF_8

    the encoding of the input stream

  • :validate (Boolean) — default: false

    whether to validate the parsed statements and values

  • :canonicalize (Boolean) — default: false

    whether to canonicalize parsed literals

  • :intern (Boolean) — default: true

    whether to intern all parsed URIs

  • :prefixes (Hash) — default: Hash.new

    the prefix mappings to use (not supported by all readers)

  • :base_uri (#to_s) — default: nil

    the base URI to use when resolving relative URIs (not supported by all readers)

Yields:

  • (reader)

    self

Yield Parameters:

  • reader (RDF::Reader)

Yield Returns:

  • (void)

    ignored
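The Boolean defaults listed above interact with Ruby's `||=` operator, which the constructor source uses to apply them. The following is a minimal sketch of plain Ruby semantics only; `apply_defaults` and `apply_defaults_keeping_false` are hypothetical helpers, not part of the library, and the sketch makes no claim about how any released version behaves.

```ruby
# Hypothetical helper mirroring the `||=` defaulting style of the constructor.
def apply_defaults(options)
  opts = options.dup
  opts[:validate] ||= false
  opts[:intern]   ||= true   # `||=` also replaces an explicit false
  opts
end

# Hypothetical alternative: merge over a defaults hash so that an
# explicit false supplied by the caller survives.
def apply_defaults_keeping_false(options)
  { validate: false, intern: true }.merge(options)
end

with_or_equals = apply_defaults(intern: false)
with_merge     = apply_defaults_keeping_false(intern: false)

puts with_or_equals[:intern].inspect
puts with_merge[:intern].inspect
```

With `||=`, a default of `true` cannot be switched off by passing `false`, because `false ||= true` evaluates to `true`. Defaults of `false` are unaffected.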



# File 'lib/rdf/reader.rb', line 240

def initialize(input = $stdin, options = {}, &block)
  @options = options.dup
  @options[:validate]     ||= false
  @options[:canonicalize] ||= false
  @options[:intern]       ||= true
  @options[:prefixes]     ||= Hash.new
  @options[:base_uri]     ||= input.base_uri if input.respond_to?(:base_uri)

  @input = case input
    when String then StringIO.new(input)
    else input
  end

  if block_given?
    case block.arity
      when 0 then instance_eval(&block)
      else block.call(self)
    end
  end
end
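The constructor above accepts a block in two styles, chosen by the block's arity. The sketch below reproduces only that dispatch in a stand-alone class; `SketchReader` is a hypothetical name used for illustration and is not the library's class.

```ruby
require 'stringio'

# Hypothetical stand-in that copies only the input wrapping and
# block dispatch from the listing above.
class SketchReader
  attr_reader :input

  def initialize(input = $stdin, options = {}, &block)
    @options = options.dup
    # A String is wrapped so that it can be read like a stream.
    @input = input.is_a?(String) ? StringIO.new(input) : input

    if block_given?
      case block.arity
      when 0 then instance_eval(&block) # block runs with self == reader
      else block.call(self)             # reader is passed as an argument
      end
    end
  end
end

yielded = nil
SketchReader.new("data") { |reader| yielded = reader.class }

evaluated = nil
SketchReader.new("data") { evaluated = self.class }

puts yielded
puts evaluated
```

A block that declares a parameter receives the reader explicitly; a block without parameters is evaluated in the reader's own context, so reader methods can be called without a receiver.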

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class RDF::Enumerable.

Instance Attribute Details

#options ⇒ Hash (readonly)

Any additional options for this reader.

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 266

def options
  @options
end

Class Method Details

.each {|klass| ... } ⇒ Enumerator

Enumerates known RDF reader classes.

Yields:

  • (klass)

Yield Parameters:

  • klass (Class)


# File 'lib/rdf/reader.rb', line 52

def self.each(&block)
  @@subclasses.each(&block)
end
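The listing above enumerates a class-level `@@subclasses` collection. How that collection is populated is not shown here; the sketch below assumes an `inherited` hook, which is a common Ruby idiom for such registries. `RegistryReader` and its subclasses are hypothetical names.

```ruby
# Hypothetical base class with a subclass registry.
class RegistryReader
  @@subclasses = []

  # Assumption: subclasses register themselves when they are defined.
  def self.inherited(child)
    super
    @@subclasses << child
  end

  # Same shape as the listing above: delegate to the registry.
  def self.each(&block)
    @@subclasses.each(&block)
  end
end

class RegistryNTriplesReader < RegistryReader; end
class RegistryTurtleReader < RegistryReader; end

names = []
RegistryReader.each { |klass| names << klass.name }
puts names.inspect
```

Because `Array#each` returns an enumerator when called without a block, `RegistryReader.each` does the same, which matches the documented `Enumerator` return type.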

.for(format) ⇒ Class .for(filename) ⇒ Class .for(options = {}) ⇒ Class

Finds an RDF reader class based on the given criteria.

If the reader class has a defined format, use that.

Overloads:

  • .for(format) ⇒ Class

    Finds an RDF reader class based on a symbolic name.

  • .for(filename) ⇒ Class

    Finds an RDF reader class based on a file name.

  • .for(options = {}) ⇒ Class

    Finds an RDF reader class based on various options.

    Options Hash (options):

    • :file_name (String, #to_s) — default: nil
    • :file_extension (Symbol, #to_sym) — default: nil
    • :content_type (String, #to_s) — default: nil
    • :sample (String) — default: nil

      A sample of input used for format detection. If no format is found, or more than one is found, and a sample is available, format detection is run against the sample and the first matching format is used.

    Yield Returns:

    • (String)

      another way to provide a sample; the block allows the sample to be retrieved lazily.



# File 'lib/rdf/reader.rb', line 90

def self.for(options = {}, &block)
  options = options.merge(has_reader: true) if options.is_a?(Hash)
  if format = self.format || Format.for(options, &block)
    format.reader
  end
end
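The lookup rules described above (symbolic name, file name, file extension, content type, and a lazily supplied sample) can be sketched without the library. Everything below is hypothetical: `SKETCH_FORMATS`, its detection lambdas, and `sketch_format_for` only illustrate the selection order and are not the `RDF::Format` implementation.

```ruby
# Hypothetical format table; the detectors are deliberately crude.
SKETCH_FORMATS = [
  { name: :ntriples, extension: "nt",  content_type: "application/n-triples",
    detect: ->(sample) { sample.lstrip.start_with?("<") } },
  { name: :turtle,   extension: "ttl", content_type: "text/turtle",
    detect: ->(sample) { sample.include?("@prefix") } }
].freeze

def sketch_format_for(criteria = {})
  criteria = { name: criteria }      if criteria.is_a?(Symbol)
  criteria = { file_name: criteria } if criteria.is_a?(String)

  ext = criteria[:file_extension]
  ext ||= File.extname(criteria[:file_name].to_s).delete_prefix(".") if criteria[:file_name]

  matches = SKETCH_FORMATS.select do |format|
    format[:name] == criteria[:name] ||
      (ext && format[:extension] == ext.to_s) ||
      format[:content_type] == criteria[:content_type]
  end
  return matches.first[:name] if matches.size == 1

  # Zero or several candidates: only now ask for a sample (lazy block).
  sample = criteria[:sample] || (yield if block_given?)
  return matches.first && matches.first[:name] unless sample

  candidates = matches.empty? ? SKETCH_FORMATS : matches
  found = candidates.find { |format| format[:detect].call(sample) }
  found && found[:name]
end

block_called = false
by_extension = sketch_format_for(file_extension: "nt") { block_called = true; "" }
by_sample    = sketch_format_for({}) { "@prefix dc: <http://purl.org/dc/terms/> ." }

puts by_extension.inspect
puts by_sample.inspect
```

The block is only invoked when the other criteria do not identify exactly one format, which is the point of supplying the sample lazily.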

.format(klass = nil) ⇒ Class Also known as: format_class

Retrieves the RDF serialization format class for this reader class.



# File 'lib/rdf/reader.rb', line 101

def self.format(klass = nil)
  if klass.nil?
    Format.each do |format|
      if format.reader == self
        return format
      end
    end
    nil # not found
  end
end

.open(filename, format: nil, **options) {|reader| ... } ⇒ Object

Note:

A reader returned via this method may not be readable depending on the processing model of the specific reader, as the file is only open during the scope of open. The reader is intended to be accessed through a block.

Parses input from the given file name or URL.

Examples:

Parsing RDF statements from a file

RDF::Reader.open("etc/doap.nt") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

Yields:

  • (reader)

Yield Parameters:

  • reader (RDF::Reader)

Yield Returns:

  • (void)

    ignored

Raises:

  • (FormatError)

    if no reader class can be found for the given format or file

# File 'lib/rdf/reader.rb', line 182

def self.open(filename, format: nil, **options, &block)
  Util::File.open_file(filename, options) do |file|
    format_options = options.dup
    format_options[:content_type] ||= file.content_type if file.respond_to?(:content_type)
    format_options[:file_name] ||= filename
    options[:encoding] ||= file.encoding if file.respond_to?(:encoding)
    options[:filename] ||= filename
    reader = self.for(format || format_options) do
      # Return a sample from the input file
      sample = file.read(1000)
      file.rewind
      sample
    end
    if reader
      reader.new(file, options, &block)
    else
      raise FormatError, "unknown RDF format: #{format_options.inspect}\nThis may be resolved with a require of the 'linkeddata' gem."
    end
  end
end
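The block passed to `self.for` in the listing above reads a sample and then rewinds, so that the reader later starts from the beginning of the input. A minimal stand-alone sketch of that sample-then-rewind step, using a `StringIO` in place of the opened file (the sample data and the `take_sample` name are illustrative assumptions):

```ruby
require 'stringio'

content = "<http://example.org/s> <http://example.org/p> \"o\" .\n" * 3
file = StringIO.new(content)

# Read up to 1000 bytes for format detection, then restore the position.
take_sample = lambda do
  sample = file.read(1000)
  file.rewind
  sample
end

sample = take_sample.call
puts sample.bytesize
puts file.pos
```

Without the `rewind`, a reader created on the same stream would start after the sampled bytes and silently skip the beginning of the input.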

.options ⇒ Array<RDF::CLI::Option>

Options suitable for automatic Reader provisioning.



# File 'lib/rdf/reader.rb', line 115

def self.options
  [
    RDF::CLI::Option.new(
      symbol: :canonicalize,
      datatype: TrueClass,
      on: ["--canonicalize"],
      description: "Canonicalize input/output.") {true},
    RDF::CLI::Option.new(
      symbol: :encoding,
      datatype: Encoding,
      on: ["--encoding ENCODING"],
      description: "The encoding of the input stream.") {|arg| Encoding.find arg},
    RDF::CLI::Option.new(
      symbol: :intern,
      datatype: TrueClass,
      on: ["--intern"],
      description: "Intern all parsed URIs.") {true},
    RDF::CLI::Option.new(
      symbol: :prefixes,
      datatype: Hash,
      multiple: true,
      on: ["--prefixes PREFIX,PREFIX"],
      description: "A comma-separated list of prefix:uri pairs.") do |arg|
        arg.split(',').inject({}) do |memo, pfxuri|
          pfx,uri = pfxuri.split(':', 2)
          memo.merge(pfx.to_sym => RDF::URI(uri))
        end
    end,
    RDF::CLI::Option.new(
      symbol: :base_uri,
      datatype: RDF::URI,
      on: ["--uri URI"],
      description: "Base URI of input file, defaults to the filename.") {|arg| RDF::URI(arg)},
    RDF::CLI::Option.new(
      symbol: :validate,
      datatype: TrueClass,
      on: ["--validate"],
      description: "Validate input file.") {true},
  ]
end
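The `--prefixes` option in the listing above turns a comma-separated list of `prefix:uri` pairs into a Hash. The sketch below repeats that parsing with plain strings instead of `RDF::URI` values so that it runs without the library; `parse_prefixes` is a hypothetical helper name.

```ruby
# Hypothetical helper with the same split/inject logic as the option block,
# but keeping URIs as plain strings.
def parse_prefixes(arg)
  arg.split(',').inject({}) do |memo, pfxuri|
    # Limit of 2: split only on the first colon, since URIs contain colons.
    pfx, uri = pfxuri.split(':', 2)
    memo.merge(pfx.to_sym => uri)
  end
end

parsed = parse_prefixes("dc:http://purl.org/dc/terms/,foaf:http://xmlns.com/foaf/0.1/")
puts parsed.inspect
```

Note that the pairs are separated on commas first, so a URI that itself contains a comma would not survive this format.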

.to_sym ⇒ Symbol

Returns a symbol appropriate to use with RDF::Reader.for().



# File 'lib/rdf/reader.rb', line 206

def self.to_sym
  self.format.to_sym
end

Instance Method Details

#base_uri ⇒ RDF::URI

Returns the base URI determined by this reader.

Examples:

reader.base_uri  #=> an RDF::URI, or nil when no :base_uri option was set

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 276

def base_uri
  RDF::URI(@options[:base_uri]) if @options[:base_uri]
end

#canonicalize? ⇒ Boolean

Returns true if parsed values should be canonicalized.

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 529

def canonicalize?
  @options[:canonicalize]
end

#close Also known as: close!

This method returns an undefined value.

Closes the input stream, after which an IOError will be raised for further read attempts.

If the input stream is already closed, does nothing.



# File 'lib/rdf/reader.rb', line 416

def close
  @input.close unless @input.closed?
end
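The guarded close in the listing above can be tried on a plain `StringIO`, which stands in here for the reader's input stream. This is a sketch of the stream behaviour only; `close_once` is a hypothetical name.

```ruby
require 'stringio'

input = StringIO.new("data")

# Same guard as the listing above: closing is skipped if already closed.
close_once = -> { input.close unless input.closed? }

close_once.call
close_once.call # second call is a no-op

error = begin
  input.read
  nil
rescue IOError => e
  e
end

puts input.closed?
puts error.class
```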

#each_statement {|statement| ... } #each_statement ⇒ Enumerator Also known as: each

This method returns an undefined value.

Iterates the given block for each RDF statement.

If no block was given, returns an enumerator.

Statements are yielded in the order that they are read from the input stream.

Overloads:

  • #each_statement {|statement| ... }

    This method returns an undefined value.

    Yields:

    • (statement)

      each statement

    Yield Parameters:

    • statement (RDF::Statement)

    Yield Returns:

    • (void)

      ignored

Raises:

See Also:



# File 'lib/rdf/reader.rb', line 351

def each_statement(&block)
  if block_given?
    begin
      loop { block.call(read_statement) }
    rescue EOFError => e
      rewind rescue nil
    end
  end
  enum_for(:each_statement)
end
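The iteration pattern in the listing above (loop until `EOFError`, then rewind, and return an enumerator) can be sketched with a line-based stand-in. `LineReader` is hypothetical and treats each input line as one "statement"; it is not a real RDF parser.

```ruby
require 'stringio'

# Hypothetical reader: one line of input counts as one statement.
class LineReader
  def initialize(text)
    @input = StringIO.new(text)
  end

  # StringIO#readline raises EOFError at end of input.
  def read_statement
    @input.readline.chomp
  end

  def each_statement(&block)
    if block_given?
      begin
        loop { block.call(read_statement) }
      rescue EOFError
        @input.rewind # make the input readable again for the next pass
      end
    end
    enum_for(:each_statement)
  end
end

reader = LineReader.new("a\nb\nc\n")

seen = []
reader.each_statement { |statement| seen << statement }

# Without a block, the returned enumerator drives the same method.
again = reader.each_statement.to_a

puts seen.inspect
puts again.inspect
```

`Kernel#loop` only rescues `StopIteration`, so the `EOFError` raised at the end of input reaches the `rescue` clause and ends the iteration. The rewind is what allows the second pass over the same input.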

#each_triple {|subject, predicate, object| ... } #each_triple ⇒ Enumerator

This method returns an undefined value.

Iterates the given block for each RDF triple.

If no block was given, returns an enumerator.

Triples are yielded in the order that they are read from the input stream.

Overloads:

  • #each_triple {|subject, predicate, object| ... }

    This method returns an undefined value.

    Yields:

    • (subject, predicate, object)

      each triple

    Yield Parameters:

    • subject (RDF::Resource)
    • predicate (RDF::URI)
    • object (RDF::Term)

    Yield Returns:

    • (void)

      ignored

See Also:



# File 'lib/rdf/reader.rb', line 385

def each_triple(&block)
  if block_given?
    begin
      loop { block.call(*read_triple) }
    rescue EOFError => e
      rewind rescue nil
    end
  end
  enum_for(:each_triple)
end

#encoding ⇒ Encoding

Returns the encoding of the input stream.



# File 'lib/rdf/reader.rb', line 504

def encoding
  case @options[:encoding]
  when String, Symbol
    Encoding.find(@options[:encoding].to_s)
  when Encoding
    @options[:encoding]
  else
    @options[:encoding] ||= Encoding.find(self.class.format.content_encoding.to_s)
  end
end
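The listing above accepts the `:encoding` option as a name (String or Symbol) or as an `Encoding` object. A stand-alone sketch of that normalization follows; `resolve_encoding` is a hypothetical helper, and its `fallback` argument stands in for the format's content encoding, which is not available outside the library.

```ruby
# Hypothetical helper mirroring the case expression in the listing above.
def resolve_encoding(value, fallback: "utf-8")
  case value
  when String, Symbol then Encoding.find(value.to_s)
  when Encoding       then value
  else Encoding.find(fallback) # assumption: stands in for the format default
  end
end

from_string = resolve_encoding("utf-8")
from_symbol = resolve_encoding(:"iso-8859-1")
from_object = resolve_encoding(Encoding::US_ASCII)
from_nil    = resolve_encoding(nil)

puts [from_string, from_symbol, from_object, from_nil].inspect
```

`Encoding.find` raises `ArgumentError` for an unknown name, so a misspelled encoding surfaces when the method is first called rather than when the option is passed.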

#fail_object (protected)

This method returns an undefined value.

Raises an "expected object" parsing error on the current line.

Raises:

  • (RDF::ReaderError)

# File 'lib/rdf/reader.rb', line 495

def fail_object
  log_error("Expected object (found: #{current_line.inspect})", lineno: lineno, exception: RDF::ReaderError)
end

#fail_predicate (protected)

This method returns an undefined value.

Raises an "expected predicate" parsing error on the current line.

Raises:

  • (RDF::ReaderError)

# File 'lib/rdf/reader.rb', line 486

def fail_predicate
  log_error("Expected predicate (found: #{current_line.inspect})", lineno: lineno, exception: RDF::ReaderError)
end

#fail_subject (protected)

This method returns an undefined value.

Raises an "expected subject" parsing error on the current line.

Raises:

  • (RDF::ReaderError)

# File 'lib/rdf/reader.rb', line 477

def fail_subject
  log_error("Expected subject (found: #{current_line.inspect})", lineno: lineno, exception: RDF::ReaderError)
end

#intern? ⇒ Boolean

Returns true if parsed URIs should be interned.

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 538

def intern?
  @options[:intern]
end

#lineno ⇒ Integer

The line number currently being processed. For formats that can associate a generated Statement with a particular line of the input, this value reflects that line number.



# File 'lib/rdf/reader.rb', line 424

def lineno
  @input.lineno
end
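Since `lineno` delegates to the input stream, its value follows how many lines have been read from that stream. A small sketch with `StringIO`, used here as a stand-in for the reader's input:

```ruby
require 'stringio'

input = StringIO.new("first\nsecond\nthird\n")

before = input.lineno # nothing read yet
input.gets
input.gets
after = input.lineno  # two lines consumed

puts before
puts after
```

A reader that consumes its input in larger chunks rather than line by line would therefore not get a meaningful value from this delegation.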

#prefix(name, uri) ⇒ RDF::URI #prefix(name) ⇒ RDF::URI Also known as: prefix!

Defines the given named URI prefix for this reader.

Examples:

Defining a URI prefix

reader.prefix :dc, RDF::URI('http://purl.org/dc/terms/')

Returning a URI prefix

reader.prefix(:dc)    #=> RDF::URI('http://purl.org/dc/terms/')


# File 'lib/rdf/reader.rb', line 324

def prefix(name, uri = nil)
  name = name.to_s.empty? ? nil : (name.respond_to?(:to_sym) ? name.to_sym : name.to_s.to_sym)
  uri.nil? ? prefixes[name] : prefixes[name] = uri
end
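The combined getter and setter in the listing above can be exercised outside the library. The sketch below keeps the same ternary logic but stores plain strings instead of `RDF::URI` values; `PrefixHolder` is a hypothetical class.

```ruby
# Hypothetical holder reproducing the prefix lookup/assignment logic.
class PrefixHolder
  def prefixes
    @prefixes ||= {}
  end

  def prefix(name, uri = nil)
    # An empty name maps to the nil key (the default prefix).
    name = name.to_s.empty? ? nil : name.to_sym
    uri.nil? ? prefixes[name] : prefixes[name] = uri
  end
end

holder = PrefixHolder.new
assigned  = holder.prefix(:dc, "http://purl.org/dc/terms/") # setter form
looked_up = holder.prefix("dc")                             # getter form
holder.prefix("", "http://example.org/")                    # empty prefix

puts assigned
puts looked_up
puts holder.prefixes.keys.inspect
```

Names are normalized to symbols, so `"dc"` and `:dc` refer to the same entry, and an empty name is stored under the `nil` key.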

#prefixes ⇒ Hash{Symbol => RDF::URI}

Returns the URI prefixes currently defined for this reader.

Examples:

reader.prefixes[:dc]  #=> RDF::URI('http://purl.org/dc/terms/')

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 288

def prefixes
  @options[:prefixes] ||= {}
end

#prefixes=(prefixes) ⇒ Hash{Symbol => RDF::URI}

Defines the given URI prefixes for this reader.

Examples:

reader.prefixes = {
  dc: RDF::URI('http://purl.org/dc/terms/'),
}

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 303

def prefixes=(prefixes)
  @options[:prefixes] = prefixes
end

#read_statement ⇒ RDF::Statement (protected)

This method is abstract.

Reads a statement from the input stream.

Raises:

  • (NotImplementedError)

    unless implemented in subclass



# File 'lib/rdf/reader.rb', line 458

def read_statement
  Statement.new(*read_triple)
end

#read_triple ⇒ Array(RDF::Term) (protected)

This method is abstract.

Reads a triple from the input stream.

Raises:

  • (NotImplementedError)

    unless implemented in subclass



# File 'lib/rdf/reader.rb', line 468

def read_triple
  raise NotImplementedError, "#{self.class}#read_triple" # override in subclasses
end
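Taken together, `read_statement` and `read_triple` form a template-method pair: the base class builds a statement from whatever `read_triple` returns, and a subclass supplies `read_triple`. The sketch below illustrates that relationship with hypothetical classes and a `Struct` in place of `RDF::Statement`. The methods are left public here for demonstration, whereas the documented ones are protected.

```ruby
# Hypothetical stand-in for RDF::Statement.
SketchStatement = Struct.new(:subject, :predicate, :object)

class AbstractTripleReader
  def initialize(lines)
    @lines = lines.dup
  end

  # Default implementation builds on read_triple, as in the listing above.
  def read_statement
    SketchStatement.new(*read_triple)
  end

  def read_triple
    raise NotImplementedError, "#{self.class}#read_triple" # override in subclasses
  end
end

# Hypothetical subclass: three whitespace-separated terms per line.
class WhitespaceTripleReader < AbstractTripleReader
  def read_triple
    line = @lines.shift or raise EOFError
    line.split(' ', 3)
  end
end

statement = WhitespaceTripleReader.new(["s p o"]).read_statement

base_error = begin
  AbstractTripleReader.new([]).read_statement
  nil
rescue NotImplementedError => e
  e
end

puts statement.to_a.inspect
puts base_error.message
```

`NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it; it has to be named explicitly as above.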

#rewind Also known as: rewind!

This method returns an undefined value.

Rewinds the input stream to the beginning of input.



# File 'lib/rdf/reader.rb', line 402

def rewind
  @input.rewind
end

#to_sym ⇒ Symbol

Returns a symbol appropriate to use with RDF::Reader.for().



# File 'lib/rdf/reader.rb', line 213

def to_sym
  self.class.to_sym
end

#valid? ⇒ Boolean

Note:

This parses the full input and is valid only within the reader block. Use Reader.new(input, validate: true) if you intend to capture the result.

Examples:

Checking whether input is valid

RDF::NTriples::Reader.new("!!invalid input??") do |reader|
  reader.valid? # => false
end

See Also:



# File 'lib/rdf/reader.rb', line 443

def valid?
  super && !log_statistics[:error]
rescue ArgumentError, RDF::ReaderError => e
  log_error(e.message)
  false
end

#validate? ⇒ Boolean

Returns true if parsed statements and values should be validated.

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 520

def validate?
  @options[:validate]
end