Class: HexaPDF::Revisions

Inherits: Object

Includes: Enumerable

Defined in: lib/hexapdf/revisions.rb

Overview

Manages the revisions of a PDF document.

A PDF document has one revision when it is created. Later, new revisions are added when changes are made. This allows for adding information/content to a PDF file without changing the original content.

The order of the revisions is important. In HexaPDF the oldest revision always has index 0 and the newest revision the highest index. This is also the order in which the revisions get written.
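The ordering rule can be pictured with a plain Ruby array. This is only a minimal sketch, not HexaPDF code: `revision_order` is a hypothetical helper, and the symbols stand in for HexaPDF::Revision objects.

```ruby
# Hypothetical helper: summarizes the ordering of a list of revisions.
# The array elements stand in for HexaPDF::Revision objects.
def revision_order(revisions)
  {
    oldest: revisions[0],    # index 0 is always the oldest revision
    newest: revisions[-1],   # the highest index holds the newest revision
    # Revisions are written in the same order, oldest first.
    write_order: revisions.each_with_index.map {|rev, index| "#{index}: #{rev}" },
  }
end

order = revision_order([:original, :first_update, :second_update])
```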

See: PDF1.7 s7.5.6, HexaPDF::Revision

Instance Attribute Summary

#parser ⇒ Object readonly

The Parser instance used for reading the initial revisions.

Class Method Summary

.from_io(document, io) ⇒ Object

Loads all revisions for the document from the given IO and returns the created Revisions object.

Instance Method Summary

#add ⇒ Object

Adds a new empty revision to the document and returns it.

#current ⇒ Object

Returns the current revision.

#delete(index_or_rev) ⇒ Object

Deletes a revision from the document, either by index or by specifying the revision object itself.

#each(&block) ⇒ Object

Iterates over all revisions from the oldest to the current one.

#initialize(document, initial_revisions: nil, parser: nil) ⇒ Revisions constructor

Creates a new revisions object for the given PDF document.

#merge(range = 0..-1) ⇒ Object

Merges the revisions specified by the given range into one.

#revision(index) ⇒ Object (also: #[])

Returns the revision at the specified index.

#size ⇒ Object

Returns the number of HexaPDF::Revision objects managed by this object.

Constructor Details

#initialize(document, initial_revisions: nil, parser: nil) ⇒ `Revisions`

Creates a new revisions object for the given PDF document.

Options:

initial_revisions: An array of revisions that should initially be used. If this option is not specified, a single empty revision is added.
parser: The parser with which the initial revisions were read. If this option is not specified even though the document was read from an IO stream, some parts may not work, like incremental writing.

# File 'lib/hexapdf/revisions.rb', line 124

def initialize(document, initial_revisions: nil, parser: nil)
  @document = document
  @parser = parser

  @revisions = []
  if initial_revisions
    @revisions += initial_revisions
  else
    add
  end
end

Instance Attribute Details

#parser ⇒ `Object` (readonly)

The Parser instance used for reading the initial revisions.

# File 'lib/hexapdf/revisions.rb', line 110

def parser
  @parser
end

Class Method Details

.from_io(document, io) ⇒ `Object`

Loads all revisions for the document from the given IO and returns the created Revisions object.

If the io object is nil, an empty Revisions object is returned.

# File 'lib/hexapdf/revisions.rb', line 63

def from_io(document, io)
  return new(document) if io.nil?

  parser = Parser.new(io, document)
  object_loader = lambda {|xref_entry| parser.load_object(xref_entry) }

  revisions = []
  begin
    xref_section, trailer = parser.load_revision(parser.startxref_offset)
    revisions << Revision.new(document.wrap(trailer, type: :XXTrailer),
                              xref_section: xref_section, loader: object_loader)
    seen_xref_offsets = {parser.startxref_offset => true}

    while (prev = revisions[0].trailer.value[:Prev]) &&
        !seen_xref_offsets.key?(prev)
      # PDF1.7 s7.5.5 states that :Prev needs to be indirect, Adobe's reference 3.4.4 says it
      # should be direct. Adobe's POV is followed here. Same with :XRefStm.
      xref_section, trailer = parser.load_revision(prev)
      seen_xref_offsets[prev] = true

      stm = revisions[0].trailer.value[:XRefStm]
      if stm && !seen_xref_offsets.key?(stm)
        stm_xref_section, = parser.load_revision(stm)
        xref_section.merge!(stm_xref_section)
        seen_xref_offsets[stm] = true
      end

      revisions.unshift(Revision.new(document.wrap(trailer, type: :XXTrailer),
                                     xref_section: xref_section, loader: object_loader))
    end
  rescue HexaPDF::MalformedPDFError
    reconstructed_revision = parser.reconstructed_revision
    unless revisions.empty?
      reconstructed_revision.trailer.data.value = revisions.last.trailer.data.value
    end
    revisions << reconstructed_revision
  end

  document.version = parser.file_header_version rescue '1.0'
  new(document, initial_revisions: revisions, parser: parser)
end
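The core of the loop above is a backwards walk along the :Prev entries, with a guard against offsets that have already been visited. The following is a minimal sketch of that idea only, under the assumption that trailers can be modelled as a Hash from offset to trailer dictionary. `revision_offsets` and the sample data are hypothetical and not part of HexaPDF.

```ruby
# Hypothetical model: each key is a cross-reference offset, each value is the
# trailer dictionary found at that offset.
SAMPLE_TRAILERS = {
  900 => {Prev: 500},
  500 => {Prev: 100},
  100 => {Prev: 900},   # malformed: points back to the newest section
}.freeze

# Returns the offsets of all revisions, oldest first, starting from the
# offset given by startxref and following :Prev until it is missing or
# refers to an offset that was already seen.
def revision_offsets(trailers, startxref)
  offsets = [startxref]
  seen = {startxref => true}
  while (prev = trailers.fetch(offsets[0])[:Prev]) && !seen.key?(prev)
    seen[prev] = true
    offsets.unshift(prev)   # older sections go to the front
  end
  offsets
end

ordered = revision_offsets(SAMPLE_TRAILERS, 900)
```

Prepending each older section keeps the oldest revision at index 0, and the `seen` hash stops the walk when a damaged file contains a :Prev cycle.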

Instance Method Details

#add ⇒ `Object`

Adds a new empty revision to the document and returns it.

# File 'lib/hexapdf/revisions.rb', line 153

def add
  if @revisions.empty?
    trailer = {}
  else
    trailer = current.trailer.value.dup
    trailer.delete(:Prev)
    trailer.delete(:XRefStm)
  end

  rev = Revision.new(@document.wrap(trailer, type: :XXTrailer))
  @revisions.push(rev)
  rev
end
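As the method body shows, the trailer of a new revision starts as a copy of the current trailer without the :Prev and :XRefStm entries. A minimal sketch of just that step, using plain hashes instead of wrapped HexaPDF objects (`trailer_for_new_revision` is a hypothetical name):

```ruby
# Hypothetical helper mirroring how the trailer for a new revision is derived.
# current_trailer is a plain Hash here, or nil if there is no revision yet.
def trailer_for_new_revision(current_trailer)
  return {} if current_trailer.nil?

  trailer = current_trailer.dup   # shallow copy, the original stays untouched
  trailer.delete(:Prev)           # offsets of older sections do not carry over
  trailer.delete(:XRefStm)
  trailer
end

current = {Size: 10, Root: "1 0 R", Prev: 500, XRefStm: 600}
fresh = trailer_for_new_revision(current)
```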

#current ⇒ `Object`

Returns the current revision.

# File 'lib/hexapdf/revisions.rb', line 143

def current
  @revisions.last
end

#delete(index_or_rev) ⇒ `Object`

:call-seq:

revisions.delete(index)    -> rev or nil
revisions.delete(rev)      -> rev or nil

Deletes a revision from the document, either by index or by specifying the revision object itself.

Returns the deleted revision object, or nil if the index was out of range or no matching revision was found.

Regarding the index: The oldest revision has index 0 and the current revision the highest index!

# File 'lib/hexapdf/revisions.rb', line 179

def delete(index_or_rev)
  if @revisions.length == 1
    raise HexaPDF::Error, "A document must have at least one revision, can't delete last one"
  elsif index_or_rev.kind_of?(Integer)
    @revisions.delete_at(index_or_rev)
  else
    @revisions.delete(index_or_rev)
  end
end
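The return values follow from Array#delete_at and Array#delete, both of which return nil when nothing is removed. The sketch below imitates only that logic. `RevisionList` is a hypothetical stand-in that holds strings instead of revisions and raises ArgumentError where HexaPDF raises HexaPDF::Error.

```ruby
# Hypothetical stand-in that mimics only the deletion logic shown above.
class RevisionList
  def initialize(*revs)
    @revisions = revs
  end

  def delete(index_or_rev)
    if @revisions.length == 1
      raise ArgumentError, "at least one revision must remain"
    elsif index_or_rev.kind_of?(Integer)
      @revisions.delete_at(index_or_rev)   # nil if the index is out of range
    else
      @revisions.delete(index_or_rev)      # nil if no equal element is found
    end
  end

  def size
    @revisions.size
  end
end

list = RevisionList.new("oldest", "middle", "current")
removed = list.delete(0)   # removes the oldest entry
```

Because Array#delete_at accepts negative indices, `delete(-1)` would remove the current revision in this sketch.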

#each(&block) ⇒ `Object`

:call-seq:

revisions.each {|rev| block }   -> revisions
revisions.each                  -> Enumerator

Iterates over all revisions from the oldest to the current one.

# File 'lib/hexapdf/revisions.rb', line 214

def each(&block)
  return to_enum(__method__) unless block_given?
  @revisions.each(&block)
  self
end
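Defining #each this way is what makes the included Enumerable methods work on the collection. A minimal sketch of the same pattern on a hypothetical `OrderedRevisions` class that holds plain symbols:

```ruby
# Hypothetical stand-in showing the each/Enumerable pattern used above.
class OrderedRevisions
  include Enumerable

  def initialize(*revs)
    @revisions = revs
  end

  def each(&block)
    # Without a block, hand back an Enumerator over this same method.
    return to_enum(__method__) unless block_given?
    @revisions.each(&block)
    self
  end
end

revs = OrderedRevisions.new(:oldest, :newer, :current)
names = revs.map(&:to_s)   # Enumerable#map is built on #each
enum = revs.each           # no block given, so an Enumerator is returned
```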

#merge(range = 0..-1) ⇒ `Object`

:call-seq:

revisions.merge(range = 0..-1)    -> revisions

Merges the revisions specified by the given range into one. Objects from newer revisions overwrite those from older ones.

# File 'lib/hexapdf/revisions.rb', line 194

def merge(range = 0..-1)
  @revisions[range].reverse.each_cons(2) do |rev, prev_rev|
    prev_rev.trailer.value.replace(rev.trailer.value)
    rev.each do |obj|
      if obj.data != prev_rev.object(obj)&.data
        prev_rev.delete(obj.oid, mark_as_free: false)
        prev_rev.add(obj)
      end
    end
  end
  _first, *other = *@revisions[range]
  other.each {|rev| @revisions.delete(rev) }
  self
end
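The effect of merging can be illustrated with hashes. This is a simplified sketch under the assumption that a revision is just a Hash from object number to value. `merge_revisions` is a hypothetical function and ignores trailers and free entries.

```ruby
# Hypothetical model: each revision is a Hash mapping object numbers to values.
# Folds all revisions in the range into the oldest one of that range, so that
# entries from newer revisions overwrite those from older ones.
def merge_revisions(revisions, range = 0..-1)
  selected = revisions[range]
  return revisions if selected.nil? || selected.empty?

  target, *newer = selected
  newer.each {|rev| target.merge!(rev) }   # applied oldest to newest
  # Remove the merged revisions by identity, not by equality.
  revisions.delete_if {|rev| newer.any? {|merged| merged.equal?(rev) } }
  revisions
end

revisions = [{1 => "a", 2 => "b"}, {2 => "B"}, {1 => "A", 3 => "c"}]
merge_revisions(revisions)
```

With the default range, only one revision remains, and it holds the newest version of every object.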

#revision(index) ⇒ `Object` Also known as: []

Returns the revision at the specified index.

# File 'lib/hexapdf/revisions.rb', line 137

def revision(index)
  @revisions[index]
end

#size ⇒ `Object`

Returns the number of HexaPDF::Revision objects managed by this object.

# File 'lib/hexapdf/revisions.rb', line 148

def size
  @revisions.size
end
