Class: Jcsv::Dimension

Inherits:
Object
Defined in:
lib/dimensions.rb

Overview

Class Dimension keeps track of all data dimensions in a CSV file. A data dimension is similar to a mathematical dimension such as x, y or z. In principle, every data item should be associated with exactly one set of dimension values. For example, if our data has an employee ID, then the ID column defines a dimension on the data, since every employee has exactly one ID and every ID is associated with exactly one employee. As another example, suppose we have data from a medical experiment run on a set of patients for 4 weeks, in which each patient was given either a medicine or a placebo. The data could have columns labeled “Patient Index”, “Week”, “Type of Medicine” and “Blood Sample”. Some entries would be:

“Patient Index”   “Week”   “Type of Medicine”   “Blood Sample”

1                 1        Placebo              xxxx
1                 2        Placebo              xxxx
2                 1        med1                 xxxx
2                 2        med1                 xxxx

“Patient Index”, “Week” and “Type of Medicine” are three dimensions of this data and, taken together, uniquely identify each entry; that is, they act like a database key. Since they form a key, no two lines of data may have the same values in all three dimensions.
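The key property described above can be checked with a few lines of plain Ruby. This is only an illustrative sketch: the helper name `duplicate_keys` and the in-memory `rows` array are assumptions made for the example and are not part of Jcsv.

```ruby
# Rows from the example table; the "Blood Sample" values are placeholders.
rows = [
  { "Patient Index" => 1, "Week" => 1, "Type of Medicine" => "Placebo", "Blood Sample" => "xxxx" },
  { "Patient Index" => 1, "Week" => 2, "Type of Medicine" => "Placebo", "Blood Sample" => "xxxx" },
  { "Patient Index" => 2, "Week" => 1, "Type of Medicine" => "med1",    "Blood Sample" => "xxxx" },
  { "Patient Index" => 2, "Week" => 2, "Type of Medicine" => "med1",    "Blood Sample" => "xxxx" }
]
dimensions = ["Patient Index", "Week", "Type of Medicine"]

# Hypothetical helper: returns the dimension tuples that occur on more than one row.
def duplicate_keys(rows, dimensions)
  rows.group_by { |row| row.values_at(*dimensions) }
      .select { |_key, group| group.size > 1 }
      .keys
end

duplicates = duplicate_keys(rows, dimensions)
if duplicates.empty?
  puts "The dimensions form a key: no tuple is repeated."
else
  puts "Repeated dimension tuples: #{duplicates.inspect}"
end
```

If a second line with patient 1, week 1 and Placebo were added, the helper would report that tuple, which is the situation the key rule forbids.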

CSV files are not ideal for maintaining dimensions, so reading dimensions from a CSV file requires some rules.

Constructor Details

#initialize(dim_name) ⇒ Dimension


dim_name is the dimension name.




# File 'lib/dimensions.rb', line 67

def initialize(dim_name)
  @name = dim_name
  @frozen = false
  @next_value = 0
  @max_value = 0
  @labels = Hash.new 
end

Instance Attribute Details

#current_value ⇒ Object (readonly)

Returns the value of attribute current_value.



# File 'lib/dimensions.rb', line 58

def current_value
  @current_value
end

#frozen ⇒ Object (readonly)

Returns the value of attribute frozen.



# File 'lib/dimensions.rb', line 57

def frozen
  @frozen
end

#index(label) ⇒ Object





# File 'lib/dimensions.rb', line 61

def index
  @index
end

#labels ⇒ Object (readonly)

Returns the value of attribute labels.



# File 'lib/dimensions.rb', line 60

def labels
  @labels
end

#name ⇒ Object (readonly)

Returns the value of attribute name.



# File 'lib/dimensions.rb', line 56

def name
  @name
end

#next_value ⇒ Object (readonly)

Returns the value of attribute next_value.



# File 'lib/dimensions.rb', line 59

def next_value
  @next_value
end

Instance Method Details

#[](label) ⇒ Object





# File 'lib/dimensions.rb', line 158

def [](label)
  index(label)
end

#add_label(label) ⇒ Object


Adds a label to this dimension and keeps track of its index. Labels are indexed starting at 0, in the order in which they are first seen, and all labels in a dimension are distinct. When add_label is called with a label:

  • if the label is new, it is added and given the next index;

  • if the label already exists, it is accepted only when its index repeats the last index read, is the next index in sequence, or goes back to 0. That is, if the last index read is 5, then the next index is either 5, 6 or 0. Any other label raises a "Missing data" error;

  • the first time the index goes back to 0, the dimension becomes frozen and no more labels can be added to it. After this point, add_label must always be called with labels in the same order as before.

Note that the source shown here does not return the index itself: it returns true on the call that first wraps back to index 0 (the call that freezes the dimension) and false on ordinary calls.




# File 'lib/dimensions.rb', line 98

def add_label(label)
  
  if (@labels.has_key?(label))
    # Just read one more line with the same label.  No problem, keep reading
    if (@labels[label] == @current_value)
      
    elsif (@labels[label] == @next_value)
      # Reading next label
      @current_value = @next_value
      @next_value = (@next_value + 1) % (@max_value + 1)
    elsif (@labels[label] < @current_value && @labels[label] == 0)
      # reached the last label and going back to the first one
      reset
      return true
    else
      # Label read is out of order.  Expected value is either 0 (starting over) or
      # the next value.  Although we raise an exception, we allow the calling method
      # to catch the exception and let the program still run.
      expected_value = (@labels[label] < @current_value)? 0 : @next_value
      reset if @labels[label] < @current_value
      @current_value = @labels[label] + 1
      @next_value = @current_value + 1
      raise "Missing data: next expected label was '#{@labels.key(expected_value)}' but read '#{label}'."
    end
  else
    @current_value = @labels[label] = @next_value
    @next_value += 1
    # Trying to add a label when the dimension is frozen raises an exception
    raise "Dimension '#{@name}' is frozen when adding label '#{label}'." if frozen
  end

  false
  
end
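The ordering rules can be tried out with a simplified stand-in. The class below is a sketch written for illustration only: the name `DimensionSketch` is hypothetical, it is not the Jcsv source, and unlike the method above it returns the label's index instead of true or false so the accepted sequence is easy to see.

```ruby
# Hypothetical stand-in for the label-ordering rules; not the Jcsv implementation.
class DimensionSketch
  attr_reader :labels, :frozen

  def initialize(name)
    @name = name
    @labels = {}      # label => index, in first-seen order
    @current = nil    # index of the last label read
    @frozen = false
  end

  # Returns the index of +label+, raising if the label arrives out of order.
  def add_label(label)
    idx = @labels[label]
    if idx.nil?
      # New label: only allowed while the dimension is not frozen.
      raise "Dimension '#{@name}' is frozen when adding label '#{label}'." if @frozen
      idx = @labels[label] = @labels.size
    elsif idx == @current || idx == @current + 1
      # Same label again, or the next label in sequence: nothing to check.
    elsif idx == 0 && @current == @labels.size - 1
      # Wrapped from the last label back to the first: freeze the dimension.
      @frozen = true
    else
      raise "Missing data: label '#{label}' read out of order."
    end
    @current = idx
  end
end

dim = DimensionSketch.new("Week")
indexes = %w[w1 w1 w2 w3 w1 w2].map { |label| dim.add_label(label) }
p indexes      # => [0, 0, 1, 2, 0, 1]
p dim.frozen   # => true
```

After the wrap from "w3" back to "w1" the sketch is frozen, so adding "w4" would raise, as would reading "w1" again right after "w2".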

#reset ⇒ Object





# File 'lib/dimensions.rb', line 137

def reset
  if !@frozen
    @frozen = true
    @max_value = @current_value
    @current_value = 0
    @next_value = 1
  end
end
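Once reset has recorded @max_value, add_label advances @next_value with `(@next_value + 1) % (@max_value + 1)`, so the expected index cycles through the known labels. The standalone snippet below only illustrates that arithmetic; the local variables mirror the instance variables by name, and the value 2 for `max_value` assumes a dimension with three labels.

```ruby
max_value  = 2   # assumed: index of the last label seen before the freeze (three labels)
next_value = 1   # the value reset assigns to @next_value

sequence = []
6.times do
  sequence << next_value
  # Same update add_label applies after the dimension is frozen.
  next_value = (next_value + 1) % (max_value + 1)
end

p sequence   # => [1, 2, 0, 1, 2, 0]
```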

#size ⇒ Object Also known as: length





# File 'lib/dimensions.rb', line 79

def size
  @labels.size
end