Class: ActiveFedora::OmDatastream

Inherits: Datastream

Inheritance chain: ActiveFedora::OmDatastream < Datastream < Rubydora::Datastream < Object
Includes: OM::XML::Document, OM::XML::TerminologyBasedSolrizer

Defined in: lib/active_fedora/om_datastream.rb

Direct Known Subclasses

NokogiriDatastream, QualifiedDublinCoreDatastream, SimpleDatastream

Instance Attribute Summary

#internal_solr_doc ⇒ Object

Returns the value of attribute internal_solr_doc.

Attributes inherited from Datastream

#digital_object, #last_modified

Class Method Summary

.default_attributes ⇒ Object
.from_xml(xml, tmpl = nil) ⇒ Object

Creates an instance of this class from XML content. Careful: if you call this from a constructor, be sure to pass something (e.g. self) as tmpl; otherwise you will get an infinite loop.
.xml_template ⇒ Object

Instance Method Summary

#autocreate? ⇒ Boolean
#content=(new_content) ⇒ Object
#content_changed? ⇒ Boolean
#datastream_content ⇒ Object
#find_by_terms(*termpointer) ⇒ Object
#from_solr(solr_doc) ⇒ Object

** Experimental **.
#generate_solr_symbol(base, data_type) ⇒ Object
#get_values(field_key, default = []) ⇒ Object
#get_values_from_solr(*term_pointer) ⇒ Array

** Experimental ** Called by get_values when this datastream has been initialized through from_solr (via ActiveFedora::Base.load_instance_from_solr).
#has_solr_name?(name, solr_doc = Hash.new) ⇒ Boolean

** Experimental **.
#is_hierarchical_term_pointer?(*term_pointer) ⇒ Boolean

** Experimental ** Example: [:image, {:title_set=>1}, :title] returns true; [:image, :title_set, :title] returns false.
#local_or_remote_content(ensure_fetch = true) ⇒ Object
#metadata? ⇒ Boolean

Indicates that this datastream has metadata content.
#ng_xml ⇒ Object
#ng_xml=(new_xml) ⇒ Object
#ng_xml_changed? ⇒ Boolean

We don’t want content eagerly loaded by the proxy, so this class implements the methods that define_attribute_methods would otherwise generate.
#ng_xml_doesnt_change! ⇒ Object
#ng_xml_will_change! ⇒ Object

We don’t want content eagerly loaded by the proxy, so this class implements the methods that define_attribute_methods would otherwise generate.
#om_term_values ⇒ Object
#om_update_values ⇒ Object
#term_values(*term_pointer) ⇒ Object

Overrides OM::XML::term_values so that values can be lazy-loaded from Solr if this datastream was initialized using from_solr.
#to_xml(xml = nil) ⇒ Object
#update_indexed_attributes(params = {}, opts = {}) ⇒ Object

Updates field values within the current datastream using #update_values, which is a wrapper for OM::TermValueOperators#update_values. Ignores any fields from params that this datastream’s Terminology doesn’t recognize.
#update_values(params = {}) ⇒ Object

Updates values in the datastream’s XML. This wraps OM::TermValueOperators#update_values so that it raises an error if the datastream was loaded from Solr, since datastreams loaded that way should be read-only.
#xml_loaded ⇒ Object

Methods inherited from Datastream

#create, #initialize, #inspect, #profile_from_hash, #save, #serialize!, #solrize_profile, #to_param, #to_solr, #validate_content_present

Constructor Details

This class inherits a constructor from ActiveFedora::Datastream

Instance Attribute Details

#internal_solr_doc ⇒ `Object`

Returns the value of attribute internal_solr_doc.



# File 'lib/active_fedora/om_datastream.rb', line 20

def internal_solr_doc
  @internal_solr_doc
end

Class Method Details

.default_attributes ⇒ `Object`



# File 'lib/active_fedora/om_datastream.rb', line 22

def self.default_attributes
  super.merge(:controlGroup => 'M', :mimeType => 'text/xml')
end

.from_xml(xml, tmpl = nil) ⇒ `Object`

Creates an instance of this class from XML content. Careful: if you call this from a constructor, be sure to pass something (e.g. self) as tmpl. Otherwise, you will get an infinite loop!

# File 'lib/active_fedora/om_datastream.rb', line 30

def self.from_xml(xml, tmpl=nil)
  tmpl = self.new if tmpl.nil?  ## This path is used only for unit testing (e.g. MarpaDCDatastream.from_xml(fixture("data.xml")) )

  if !xml.present?
    tmpl.ng_xml = self.xml_template
  elsif xml.kind_of?(Nokogiri::XML::Node) || xml.kind_of?(Nokogiri::XML::Document)
    tmpl.ng_xml = xml
  else
    tmpl.ng_xml = Nokogiri::XML::Document.parse(xml)
  end

  tmpl.ng_xml_doesnt_change!

  return tmpl
end
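
Type checks such as the kind_of? branch in from_xml are sensitive to how Ruby parses a method call without parentheses: everything up to the end of the expression becomes the argument, so `a.kind_of? B || c` is read as `a.kind_of?(B || c)`. In from_xml the outcome is the same either way, because Nokogiri::XML::Document is a subclass of Nokogiri::XML::Node. The sketch below uses core classes (not Nokogiri) purely to illustrate the parsing difference.

```ruby
value = 5

# Without parentheses the whole `String || ...` expression is the argument.
# `String || anything` evaluates to String, so only String is tested.
unparenthesized = value.kind_of? String || value.kind_of?(Integer)

# With parentheses each check runs on its own and the results are OR-ed.
parenthesized = value.kind_of?(String) || value.kind_of?(Integer)

puts unparenthesized
puts parenthesized
```

Writing the parentheses explicitly keeps both class checks visible to the reader and to the interpreter.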

.xml_template ⇒ `Object`



# File 'lib/active_fedora/om_datastream.rb', line 46

def self.xml_template
  Nokogiri::XML::Document.parse("<xml/>")
end

Instance Method Details

#autocreate? ⇒ `Boolean`



# File 'lib/active_fedora/om_datastream.rb', line 103

def autocreate?
  changed_attributes.has_key? :profile
end

#content=(new_content) ⇒ `Object`

# File 'lib/active_fedora/om_datastream.rb', line 111

def content=(new_content)
  ng_xml_will_change! unless EquivalentXml.equivalent?(datastream_content, new_content)
  @ng_xml = Nokogiri::XML::Document.parse(new_content)
  super(@ng_xml.to_s)
end

#content_changed? ⇒ `Boolean`

# File 'lib/active_fedora/om_datastream.rb', line 117

def content_changed?
  return false if !xml_loaded
  super
end

#datastream_content ⇒ `Object`



# File 'lib/active_fedora/om_datastream.rb', line 107

def datastream_content
  @datastream_content ||= Nokogiri::XML(super).to_xml  {|config| config.no_declaration}.strip
end

#find_by_terms(*termpointer) ⇒ `Object`



# File 'lib/active_fedora/om_datastream.rb', line 376

def find_by_terms(*termpointer)
  super
end

#from_solr(solr_doc) ⇒ `Object`

** Experimental **

This method is called by ActiveFedora::Base.load_instance_from_solr to initialize a Nokogiri datastream’s values from a Solr document. It merely sets internal_solr_doc to the document passed in; any later calls to get_values then read from the Solr document on demand instead of from the XML stored in Fedora. Use it for read-only purposes only, in cases where you want to improve performance by getting data from Solr instead of Fedora.

See ActiveFedora::Base.load_instance_from_solr and get_values_from_solr for more information.

# File 'lib/active_fedora/om_datastream.rb', line 156

def from_solr(solr_doc)
  #just initialize internal_solr_doc since any value retrieval will be done via lazy loading on this doc on-demand
  @internal_solr_doc = solr_doc
end

#generate_solr_symbol(base, data_type) ⇒ `Object`



# File 'lib/active_fedora/om_datastream.rb', line 294

def generate_solr_symbol(base, data_type)
  ActiveFedora::SolrService.solr_name(base.to_sym, type: data_type)
end

#get_values(field_key, default = []) ⇒ `Object`



# File 'lib/active_fedora/om_datastream.rb', line 371

def get_values(field_key,default=[])
  term_values(*field_key)
end

#get_values_from_solr(*term_pointer) ⇒ `Array`

** Experimental ** This method is called by get_values if this datastream has been initialized by calling the from_solr method via ActiveFedora::Base.load_instance_from_solr. It retrieves values from a preinitialized @internal_solr_doc instead of XML. This makes the datastream read-only; the method is not intended to be used in any other case.

Values are retrieved from the @internal_solr_doc on-demand instead of via xml preloaded into memory.

A term_pointer is passed in and if it contains hierarchical indexes it will detect which solr field values need to be returned.

Example 1 (non-hierarchical term_pointer):

term_pointer = [:image, :title_set, :title]

Returns value of "image_title_set_title_t" in @internal_solr_doc

Example 2 (hierarchical term_pointer that contains one or more indexes):

term_pointer = [:image, {:title_set=>1}, :title]

relevant xml:  
      <image>
        <title_set>
          <title>Title 1</title>
        </title_set>
      </image>
      <image>
        <title_set>
          <title>Title 2</title>
        </title_set>
        <title_set>
          <title>Title 3</title>
        </title_set>
      </image>

Repeating element nodes are indexed and will be stored in solr as follows:
  image_0_title_set_0_title_t = "Title 1"
  image_1_title_set_0_title_t = "Title 2"
  image_1_title_set_1_title_t = "Title 3"

Even though no image element index is specified, only the second image element has two title_set elements so the expected return value is
  ["Title 3"]

While loading from solr the xml hierarchy is not immediately apparent, so we must first detect how many image elements with a title_set element exist
and then check which of those elements have a second title_set element.

As this nokogiri datastream is indexed in solr, a value at each level in the tree will be stored independently and therefore 
if 'image_0_title_set_0_title_t' exists in solr 'image_0_title_set_t' will also exist in solr.  
So, we will build up the relevant solr names incrementally for a given term_pointer.  The last element in the
solr_name will not contain an index.

It then will do the following:
  Because no index is supplied for :image it will detect which indexes exist in solr
     image_0_title_set_t   (found key and add 'image_0_title_set' to base solr_name list)
     image_1_title_set_t   (found key and add 'image_1_title_set' to base solr_name list)
     image_2_title_set_t   (not found and stop checking indexes for image)
  After iteration 1:
     bases = ["image_0_title_set","image_1_title_set"]

  Two image nodes were found and next sees index of 1 supplied for title_set so just uses index of 1 building off bases found in previous iteration
     image_0_title_set_1_title_t (not found remove 'image_0_title_set' from base solr_name list)
     image_1_title_set_1_title_t (found and replace 'image_1_title_set' with new base 'image_1_title_set_1_title') 

  After iteration 2:
     bases = ["image_1_title_set_1_title"]
  It always looks ahead one element so we check if any elements are after title.  There are not any other elements so we are done iterating.
     returns @internal_solr_doc["image_1_title_set_1_title_t"]

# File 'lib/active_fedora/om_datastream.rb', line 228

def get_values_from_solr(*term_pointer)
  values = []
  solr_doc = @internal_solr_doc
  return values if solr_doc.nil?
  term = self.class.terminology.retrieve_term(*OM.pointers_to_flat_array(term_pointer, false))
  #check if hierarchical term pointer
  if is_hierarchical_term_pointer?(*term_pointer)
     # if we are hierarchical need to detect all possible node values that exist
     # we do this by building up the possible solr names parent by parent and/or child by child
     # if an index is supplied for any node in the pointer it will be used
     # otherwise it will include all nodes and indexes that exist in solr
     bases = []
     #add first item in term_pointer as start of bases
     # then iterate through possible nodes that might exist
     term_pointer.first.kind_of?(Hash) ? bases << term_pointer.first.keys.first : bases << term_pointer.first
     for i in 1..(term_pointer.length-1)
       #iterate in reverse so that we can modify the bases array while iterating
       (bases.length-1).downto(0) do |j|
         current_last = (term_pointer[i].kind_of?(Hash) ? term_pointer[i].keys.first : term_pointer[i])
         if (term_pointer[i-1].kind_of?(Hash))
           #just use index supplied instead of trying possibilities
           index = term_pointer[i-1].values.first
           solr_name_base = OM::XML::Terminology.term_hierarchical_name({bases[j]=>index},current_last)
           solr_name = generate_solr_symbol(solr_name_base, term.type)
           bases.delete_at(j)
           #insert the new solr name base if found
           bases.insert(j,solr_name_base) if has_solr_name?(solr_name,solr_doc)
         else
           #detect how many nodes exist
           index = 0
           current_base = bases[j]
           bases.delete_at(j)
           solr_name_base = OM::XML::Terminology.term_hierarchical_name({current_base=>index},current_last)
           solr_name = generate_solr_symbol(solr_name_base, term.type)
           #check for indexes that exist until we find all nodes
           while has_solr_name?(solr_name,solr_doc) do
             #only reinsert if it exists
             bases.insert(j,solr_name_base)
             index = index + 1
             solr_name_base = OM::XML::Terminology.term_hierarchical_name({current_base=>index},current_last)
             solr_name = generate_solr_symbol(solr_name_base, term.type)
           end
         end
       end
     end

     #all existing applicable solr_names have been found and we can now grab all values and build up our value array
     bases.each do |base|
       field_name = generate_solr_symbol(base.to_sym, term.type)
       value = (solr_doc[field_name].nil? ? solr_doc[field_name.to_s]: solr_doc[field_name])
       unless value.nil?
         value.is_a?(Array) ? values.concat(value) : values << value
       end
     end
  else
     #this is not hierarchical and we can simply look for the solr name created using the terms without any indexes
     generic_field_name_base = OM::XML::Terminology.term_generic_name(*term_pointer)
     generic_field_name = generate_solr_symbol(generic_field_name_base, term.type)
     value = (solr_doc[generic_field_name].nil? ? solr_doc[generic_field_name.to_s]: solr_doc[generic_field_name])
     unless value.nil?
       value.is_a?(Array) ? values.concat(value) : values << value
     end
  end
  values
end
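
The walkthrough for Example 2 can be reproduced against a plain Hash. The following is a simplified sketch, not the library's implementation: it hard-codes the `_t` suffix, skips the Terminology lookup, assumes String keys, and uses placeholder values for the intermediate `title_set` fields. The names `SOLR_DOC`, `solr_key`, and `hierarchical_values` are hypothetical.

```ruby
# Hypothetical Solr document matching the walkthrough. Intermediate fields
# only need to exist, so they hold placeholder values here.
SOLR_DOC = {
  "image_0_title_set_t"         => "(placeholder)",
  "image_1_title_set_t"         => "(placeholder)",
  "image_0_title_set_0_title_t" => "Title 1",
  "image_1_title_set_0_title_t" => "Title 2",
  "image_1_title_set_1_title_t" => "Title 3"
}.freeze

# Assumed naming rule: base name plus a fixed "_t" suffix.
def solr_key(base)
  "#{base}_t"
end

def hierarchical_values(solr_doc, term_pointer)
  # Normalize every step to a [name, index_or_nil] pair.
  steps = term_pointer.map { |step| step.is_a?(Hash) ? step.first : [step, nil] }

  bases = [steps.first[0].to_s]
  steps.each_cons(2) do |(_parent, parent_index), (child, _child_index)|
    bases = bases.flat_map do |base|
      if parent_index
        # An index was supplied: try only that one.
        name = "#{base}_#{parent_index}_#{child}"
        solr_doc.key?(solr_key(name)) ? [name] : []
      else
        # No index supplied: probe 0, 1, 2, ... until a key is missing.
        found = []
        index = 0
        while solr_doc.key?(solr_key("#{base}_#{index}_#{child}"))
          found << "#{base}_#{index}_#{child}"
          index += 1
        end
        found
      end
    end
  end

  # Collect the values stored under every surviving base name.
  bases.flat_map { |base| Array(solr_doc[solr_key(base)]) }
end

p hierarchical_values(SOLR_DOC, [:image, { title_set: 1 }, :title])
```

With the pointer `[:image, {:title_set=>1}, :title]` the surviving base is `image_1_title_set_1_title`, which matches the final step of the walkthrough.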

#has_solr_name?(name, solr_doc = Hash.new) ⇒ `Boolean`

** Experimental **



# File 'lib/active_fedora/om_datastream.rb', line 302

def has_solr_name?(name, solr_doc=Hash.new)
  !solr_doc[name].nil? || !solr_doc[name.to_s].nil?
end
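
The check looks the name up as given and then again as a String, so a Symbol name matches a String-keyed document, but a String name does not match a Symbol-keyed one. A minimal sketch with the same expression, written as a free-standing method (`solr_name_present?` is a hypothetical name) and a made-up document:

```ruby
# Same expression as has_solr_name?, outside the class for illustration.
def solr_name_present?(name, solr_doc = {})
  !solr_doc[name].nil? || !solr_doc[name.to_s].nil?
end

# Made-up document mixing String and Symbol keys.
doc = { "title_t" => ["A title"], :author_t => ["An author"] }

by_symbol_with_string_key = solr_name_present?(:title_t, doc)   # String fallback applies
by_symbol_with_symbol_key = solr_name_present?(:author_t, doc)  # direct hit
by_string_with_symbol_key = solr_name_present?("author_t", doc) # no Symbol fallback
```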

#is_hierarchical_term_pointer?(*term_pointer) ⇒ `Boolean`

** Experimental **

Example:

[:image, {:title_set=>1}, :title] return true
[:image, :title_set, :title]      return false

# File 'lib/active_fedora/om_datastream.rb', line 311

def is_hierarchical_term_pointer?(*term_pointer)
  if term_pointer.length>1
    term_pointer.each do |pointer|
      if pointer.kind_of?(Hash)
        return true
      end
    end
  end
  return false
end
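
As a quick check of the rule, here is an equivalent one-line formulation under a hypothetical name (`hierarchical_pointer?`): a pointer counts as hierarchical only if it has more than one step and at least one step is a Hash carrying an index.

```ruby
# Equivalent predicate written with Enumerable#any? (sketch, not library code).
def hierarchical_pointer?(*term_pointer)
  term_pointer.length > 1 && term_pointer.any? { |step| step.is_a?(Hash) }
end

indexed   = hierarchical_pointer?(:image, { title_set: 1 }, :title)
unindexed = hierarchical_pointer?(:image, :title_set, :title)
single    = hierarchical_pointer?({ image: 0 })
```

Note that a single-step pointer is never hierarchical, even when that step is a Hash.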

#local_or_remote_content(ensure_fetch = true) ⇒ `Object`

# File 'lib/active_fedora/om_datastream.rb', line 98

def local_or_remote_content(ensure_fetch = true)
  @content = to_xml if ng_xml_changed? || autocreate?
  super
end

#metadata? ⇒ `Boolean`

Indicates that this datastream has metadata content.



# File 'lib/active_fedora/om_datastream.rb', line 94

def metadata?
  true
end

#ng_xml ⇒ `Object`

# File 'lib/active_fedora/om_datastream.rb', line 50

def ng_xml 
  @ng_xml ||= begin
  if new?
    ## Load up the template
    self.class.xml_template
  else
    Nokogiri::XML::Document.parse(datastream_content)
  end
  end
end
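
The `@ng_xml ||= begin ... end` idiom defers building the document until first access, which is also what lets xml_loaded answer by checking whether the instance variable exists. A minimal sketch of that pattern with plain Strings in place of Nokogiri documents; `LazyDocument` and its methods are hypothetical names:

```ruby
class LazyDocument
  attr_reader :load_count

  def initialize(stored_xml = nil)
    @stored_xml = stored_xml
    @load_count = 0
    # @document is deliberately not assigned here.
  end

  def new?
    @stored_xml.nil?
  end

  # Built on first access only; later calls reuse the memoized value.
  def document
    @document ||= begin
      @load_count += 1
      new? ? "<xml/>" : @stored_xml
    end
  end

  # True once #document has been called at least once.
  def loaded?
    instance_variable_defined?(:@document)
  end
end

doc = LazyDocument.new("<fields><title>Example</title></fields>")
puts doc.loaded?
doc.document
puts doc.loaded?
```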

#ng_xml=(new_xml) ⇒ `Object`

# File 'lib/active_fedora/om_datastream.rb', line 61

def ng_xml=(new_xml)
  # before we set ng_xml, we load the datastream so we know if the new value differs.
  local_or_remote_content(true)

  case new_xml 
  when Nokogiri::XML::Document
    self.content=new_xml.to_xml
  when  Nokogiri::XML::Node 
    ## Cast a fragment to a document
    self.content=new_xml.to_s
  when String 
    self.content=new_xml
  else
    raise TypeError, "You passed a #{new_xml.class} into the ng_xml of the #{self.dsid} datastream. NokogiriDatastream.ng_xml= only accepts Nokogiri::XML::Document, Nokogiri::XML::Element, Nokogiri::XML::Node, or raw XML (String) as inputs."
  end
end

#ng_xml_changed? ⇒ `Boolean`

We don’t want content eagerly loaded by the proxy, so this class implements the methods that define_attribute_methods would otherwise generate.



# File 'lib/active_fedora/om_datastream.rb', line 88

def ng_xml_changed?
  changed_attributes.has_key? 'ng_xml'
end

#ng_xml_doesnt_change! ⇒ `Object`



# File 'lib/active_fedora/om_datastream.rb', line 83

def ng_xml_doesnt_change!
  changed_attributes.delete('ng_xml')
end

#ng_xml_will_change! ⇒ `Object`

We don’t want content eagerly loaded by the proxy, so this class implements the methods that define_attribute_methods would otherwise generate.



# File 'lib/active_fedora/om_datastream.rb', line 79

def ng_xml_will_change!
  changed_attributes['ng_xml'] = nil
end
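
Taken together, ng_xml_will_change!, ng_xml_doesnt_change!, and ng_xml_changed? form a small hand-written dirty-tracking protocol over changed_attributes. The sketch below mirrors the three method bodies on a stand-alone class (`TrackedDocument` is a hypothetical name, and the Hash stands in for the changed_attributes provided by the real class):

```ruby
class TrackedDocument
  def initialize
    @changed_attributes = {}
  end

  # Mark ng_xml as dirty. Only the key matters, so the value stays nil.
  def ng_xml_will_change!
    @changed_attributes['ng_xml'] = nil
  end

  # Clear the dirty flag, e.g. after loading content that matches storage.
  def ng_xml_doesnt_change!
    @changed_attributes.delete('ng_xml')
  end

  def ng_xml_changed?
    @changed_attributes.has_key?('ng_xml')
  end
end

doc = TrackedDocument.new
doc.ng_xml_will_change!
puts doc.ng_xml_changed?
doc.ng_xml_doesnt_change!
puts doc.ng_xml_changed?
```

Because the flag is the presence of a key rather than a stored old value, marking a change never needs to read the current XML.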

#om_term_values ⇒ `Object`

# File 'lib/active_fedora/om_datastream.rb', line 17

alias_method(:om_term_values, :term_values)

#om_update_values ⇒ `Object`

# File 'lib/active_fedora/om_datastream.rb', line 18

alias_method(:om_update_values, :update_values)

#term_values(*term_pointer) ⇒ `Object`

Overrides OM::XML::term_values so that values can be lazy-loaded from Solr if this datastream was initialized using from_solr.

# File 'lib/active_fedora/om_datastream.rb', line 397

def term_values(*term_pointer)
  if @internal_solr_doc
    #lazy load values from solr on demand
    get_values_from_solr(*term_pointer)
  else
    om_term_values(*term_pointer)
  end
end

#to_xml(xml = nil) ⇒ `Object`

# File 'lib/active_fedora/om_datastream.rb', line 122

def to_xml(xml = nil)
  xml = self.ng_xml if xml.nil?
  ng_xml = self.ng_xml
  if ng_xml.respond_to?(:root) && ng_xml.root.nil? && self.class.respond_to?(:root_property_ref) && !self.class.root_property_ref.nil?
    ng_xml = self.class.generate(self.class.root_property_ref, "")
    if xml.root.nil?
      xml = ng_xml
    end
  end

  unless xml == ng_xml || ng_xml.root.nil?
    if xml.kind_of?(Nokogiri::XML::Document)
        xml.root.add_child(ng_xml.root)
    elsif xml.kind_of?(Nokogiri::XML::Node)
        xml.add_child(ng_xml.root)
    else
        raise "You can only pass instances of Nokogiri::XML::Node into this method.  You passed in #{xml}"
    end
  end
  
  return xml.to_xml {|config| config.no_declaration}.strip
end

#update_indexed_attributes(params = {}, opts = {}) ⇒ `Object`

Updates field values within the current datastream using #update_values, which is a wrapper for OM::TermValueOperators#update_values. Ignores any fields from params that this datastream’s Terminology doesn’t recognize.

Example:

@mods_ds.update_indexed_attributes( {[{":person"=>"0"}, "role"]=>{"0"=>"role1", "1"=>"role2", "2"=>"role3"} })
=> {"person_0_role"=>{"0"=>"role1", "1"=>"role2", "2"=>"role3"}}

@mods_ds.to_xml # (the following is an approximation)
<mods>
  <mods:name type="person">
    <mods:role>
      <mods:roleTerm>role1</mods:roleTerm>
    </mods:role>
    <mods:role>
      <mods:roleTerm>role2</mods:roleTerm>
    </mods:role>
    <mods:role>
      <mods:roleTerm>role3</mods:roleTerm>
    </mods:role>
  </mods:name>
</mods>

# File 'lib/active_fedora/om_datastream.rb', line 347

def update_indexed_attributes(params={}, opts={})    
  if self.class.terminology.nil?
    raise "No terminology is set for this NokogiriDatastream class.  Cannot perform update_indexed_attributes"
  end
  # remove any fields from params that this datastream doesn't recognize    
  # make sure to make a copy of params so not to modify hash that might be passed to other methods
  current_params = params.clone
  current_params.delete_if do |term_pointer,new_values| 
    if term_pointer.kind_of?(String)
      logger.warn "WARNING: #{dsid} ignoring {#{term_pointer.inspect} => #{new_values.inspect}} because #{term_pointer.inspect} is a String (only valid OM Term Pointers will be used).  Make sure your html has the correct field_selector tags in it."
      true
    else
      !self.class.terminology.has_term?(*OM.destringify(term_pointer))
    end
  end

  result = {}
  unless current_params.empty?
    result = update_values( current_params )
  end
  
  return result
end
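
The filtering step can be tried in isolation. This is a rough sketch under stated assumptions: `KNOWN_TERMS` is a hypothetical stand-in for the class terminology, `flatten_pointer` is a simplified substitute for OM.destringify plus has_term?, and the logger warning is omitted.

```ruby
# Hypothetical terminology: the only term pointers this sketch recognizes.
KNOWN_TERMS = [[:person, :role], [:title]].freeze

# Reduce a pointer such as [{person: 0}, :role] to [:person, :role].
def flatten_pointer(term_pointer)
  term_pointer.map { |step| step.is_a?(Hash) ? step.keys.first.to_sym : step.to_sym }
end

def recognized_params(params)
  # Work on a copy so the caller's Hash is left untouched.
  current_params = params.clone
  current_params.delete_if do |term_pointer, _new_values|
    # String keys are never valid pointers; unknown pointers are dropped too.
    term_pointer.is_a?(String) || !KNOWN_TERMS.include?(flatten_pointer(term_pointer))
  end
end

params = {
  [{ person: 0 }, :role] => { "0" => "role1", "1" => "role2" },
  [:publisher]           => "Unknown Press",
  "person_0_role"        => "ignored"
}
filtered = recognized_params(params)
p filtered.keys
```

Only the recognized pointer survives, and the original `params` still holds all three entries for any other datastream that needs them.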

#update_values(params = {}) ⇒ `Object`

Updates values in the datastream’s XML. This wraps OM::TermValueOperators#update_values so that it raises an error if the datastream was loaded from Solr, since datastreams loaded that way should be read-only.

Examples:

Updating multiple values with a Hash of Term pointers and values

ds.update_values( {[{":person"=>"0"}, "role", "text"]=>{"0"=>"role1", "1"=>"role2", "2"=>"role3"}, [{:person=>1}, :family_name]=>"Andronicus", [{"person"=>"1"},:given_name]=>["Titus"],[{:person=>1},:role,:text]=>["otherrole1","otherrole2"] } )
=> {"person_0_role_text"=>{"0"=>"role1", "1"=>"role2", "2"=>"role3"}, "person_1_role_text"=>{"0"=>"otherrole1", "1"=>"otherrole2"}}

# File 'lib/active_fedora/om_datastream.rb', line 386

def update_values(params={})
  if @internal_solr_doc
    raise "No update performed, this object was initialized via Solr instead of Fedora and is therefore read-only.  Please utilize ActiveFedora::Base.find to first load object via Fedora instead."
  else
    ng_xml_will_change!
    result = om_update_values(params)
    return result
  end
end
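
The interaction between from_solr, term_values, and update_values can be sketched with a plain Hash standing in for both the XML and the Solr document. `SketchDatastream` is a hypothetical class, and its single-key lookups are a simplification of real term pointers:

```ruby
class SketchDatastream
  def initialize(values = {})
    @values = values          # stands in for the XML document
    @internal_solr_doc = nil
  end

  # Only stores the document; nothing is read until a value is requested.
  def from_solr(solr_doc)
    @internal_solr_doc = solr_doc
  end

  # Reads from the Solr document when present, otherwise from the XML stand-in.
  def term_values(key)
    source = @internal_solr_doc || @values
    Array(source[key])
  end

  # Refuses to write once the instance has been initialized from Solr.
  def update_values(params = {})
    raise "read-only: initialized via Solr instead of Fedora" if @internal_solr_doc
    @values.merge!(params)
  end
end

ds = SketchDatastream.new("title" => "Old title")
ds.update_values("title" => "New title")
p ds.term_values("title")

ds.from_solr("title" => ["Indexed title"])
p ds.term_values("title")
```

After from_solr, any call to update_values raises, which is the read-only guarantee described above.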

#xml_loaded ⇒ `Object`



# File 'lib/active_fedora/om_datastream.rb', line 406

def xml_loaded
  instance_variable_defined? :@ng_xml
end
