Class: Fech::Filing

Inherits:
Object
  • Object
Defined in:
lib/fech/filing.rb

Overview

Fech::Filing downloads an Electronic Filing given its ID and searches its rows by row type. Using a child Translator object, the data in each row is automatically mapped at runtime into a labeled Hash. Additional Translations may be added to change the way that data is mapped and cleaned.

Constant Summary

FIRST_V3_FILING = 11850

First filing number using the version >= 3.00 format. Note that there are plenty of pre-v3 filings after this one, so readable? still needs to be checked.

Constructor Details

#initialize(filing_id, opts = {}) ⇒ Filing

Creates a new Filing object. The download directory defaults to the system’s temp folder.

Parameters:

  • download_dir (String)

    override the directory where files should be downloaded.

  • translate (Symbol, Array)

    a list of built-in translation sets to use



# File 'lib/fech/filing.rb', line 22

def initialize(filing_id, opts={})
  @filing_id    = filing_id
  @download_dir = opts[:download_dir] || Dir.tmpdir
  @translator   = Fech::Translator.new(:include => opts[:translate])
  @quote_char   = opts[:quote_char] || '"'
  @csv_parser   = opts[:csv_parser] || Fech::Csv
  @resaved      = false
  @customized   = false
end

Instance Attribute Details

#download_dir ⇒ Object

Returns the value of attribute download_dir.

# File 'lib/fech/filing.rb', line 15

def download_dir
  @download_dir
end

#filing_id ⇒ Object

Returns the value of attribute filing_id.

# File 'lib/fech/filing.rb', line 15

def filing_id
  @filing_id
end

#translator ⇒ Object

Returns the value of attribute translator.

# File 'lib/fech/filing.rb', line 15

def translator
  @translator
end

Class Method Details

.download_all(download_dir) ⇒ Object

Downloads ALL the filings.

Because the zip files are deleted after extraction (to save space), rerunning this is safe but repeats the entire download. Update operations should instead download single filings one at a time, starting from the filing number after the latest one already downloaded.

This takes a very long time to run: on the order of an hour or two, depending on your bandwidth.

WARNING: As of July 9, 2012, this downloads 536964 files (25.8 GB), into one directory. This means that the download directory will break bash file globbing (so e.g. ls and rm *.fec will not work). If you want to get all of it, make sure to download only to a dedicated FEC filings directory.



# File 'lib/fech/filing.rb', line 51

def self.download_all download_dir
  `cd #{download_dir} && ftp -a ftp.fec.gov:/FEC/electronic/*.zip`
  `cd #{download_dir} && for z in *.zip; do unzip -o $z && rm $z; done`
  Dir[File.join(download_dir, '*.fec')].count
end
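
The update strategy described above (fetch single filings, starting from the number after the latest local one) could be sketched as follows. This is a minimal sketch and not part of Fech: next_filing_id is a hypothetical helper, and it assumes the files are named <filing_id>.fec, as #file_name produces.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Fech): returns the filing number that
# follows the highest-numbered .fec file already present in dir, or
# `default` when the directory holds no filings yet.
def next_filing_id(dir, default = 1)
  ids = Dir[File.join(dir, '*.fec')].map { |path| File.basename(path, '.fec').to_i }
  ids.empty? ? default : ids.max + 1
end

# Demonstration against a throwaway directory with three empty filings.
Dir.mktmpdir do |dir|
  %w[11850 11851 11900].each { |id| File.write(File.join(dir, "#{id}.fec"), '') }
  puts next_filing_id(dir)   # prints 11901
end
```

Each new ID could then be passed to Fech::Filing.new(id, :download_dir => dir).download. How to find the newest filing number available on the server is not covered by this page.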

.for_all(options = {}) ⇒ Object

Runs the passed block on every downloaded .fec file. Pass the same options hash as you would to Fech::Filing.new, e.g. for_all(:download_dir => Rails.root.join('db', 'data', 'fec', 'filings'), :csv_parser => Fech::CsvDoctor, …) {|filing| … }. Calling filing.download inside the block is unnecessary, since the files are already on disk.

Note that if there are a lot of files (e.g. after download_all), just listing them to prepare for this will take several seconds.

Special option: :from => integer or :from => range restricts processing to filing numbers starting from, or falling within, the argument.

Raises:

  • (ArgumentError)


# File 'lib/fech/filing.rb', line 64

def self.for_all options = {}
  options[:download_dir] ||= Dir.tmpdir
  from = options.delete :from
  raise ArgumentError, ":from must be Integer or Range" if from and !(from.is_a?(Integer) or from.is_a?(Range))
  # .sort{|x| x.scan/\d+/.to_i } # should be no need to spend time on sort, since the file system should already do that
  Dir[File.join(options[:download_dir], '*.fec')].each do |file|
    n = file.scan(/(\d+)\.fec/)[0][0].to_i
    if from.is_a? Integer
      next unless n >= from
    elsif from.is_a? Range
      next unless n.in? from
    end
    yield Fech::Filing.new(n, options)
  end
end
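
The :from filter can be illustrated in isolation. The listing calls n.in?(from), which is provided by ActiveSupport rather than core Ruby; the sketch below uses Range#cover? instead so that it runs without Rails. filing_selected? is a hypothetical helper and the paths are made-up sample data.

```ruby
# Hypothetical helper mirroring the :from check in for_all, core Ruby only.
def filing_selected?(n, from)
  case from
  when nil     then true            # no :from given: process everything
  when Integer then n >= from       # open-ended lower bound
  when Range   then from.cover?(n)  # bounded window
  else raise ArgumentError, ":from must be Integer or Range"
  end
end

# Extract the filing number from each path, as for_all does with scan.
paths = ['/tmp/11849.fec', '/tmp/11850.fec', '/tmp/12000.fec']
ids = paths.map { |path| path[/(\d+)\.fec\z/, 1].to_i }

p ids.select { |n| filing_selected?(n, 11850) }         # [11850, 12000]
p ids.select { |n| filing_selected?(n, 11000..11900) }  # [11849, 11850]
```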

.map_for(row_type, opts = {}) ⇒ Object

Returns the column names for the given row type and version, in the order they appear in row data.

Parameters:

  • row_type (String, Regexp)

    representation of the row desired

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :version (String, Regexp)

    representation of the version desired



# File 'lib/fech/filing.rb', line 198

def self.map_for(row_type, opts={})
  Fech::Mappings.for_row(row_type, opts)
end

Instance Method Details

#amendment? ⇒ Boolean

Whether this filing amends a previous filing or not.

Returns:

  • (Boolean)


# File 'lib/fech/filing.rb', line 213

def amendment?
  !amends.nil?
end

#amends ⇒ Object

Returns the filing ID of the earlier filing that this one amends, or nil if this is an original filing. The :report_id field in the HDR line references the amended filing.

# File 'lib/fech/filing.rb', line 220

def amends
  header[:report_id]
end

#custom_file_path ⇒ Object

The file path where custom versions of a filing are to be saved.

# File 'lib/fech/filing.rb', line 282

def custom_file_path
  File.join(download_dir, "fech_#{file_name}")
end

#delimiter ⇒ String

Returns the delimiter used in the filing’s version.

Returns:

  • (String)

    the delimiter used in the filing’s version: a comma for versions below 6, otherwise the ASCII file separator ("\034")



# File 'lib/fech/filing.rb', line 346

def delimiter
  filing_version.to_f < 6 ? "," : "\034"
end
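
As a stand-alone illustration of the version check, the sketch below reproduces the same comparison outside the class. delimiter_for is a hypothetical helper, not part of Fech. The escape "\034" is octal for ASCII code 28, the file separator control character.

```ruby
# Hypothetical stand-alone version of the check in #delimiter.
def delimiter_for(version)
  version.to_f < 6 ? "," : "\034"
end

p delimiter_for("5.3")       # ","
p delimiter_for("8.0").ord   # 28
```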

#download ⇒ Object

Saves the filing data from the FEC website to #file_path in the download directory and returns the Filing itself, so calls can be chained.

# File 'lib/fech/filing.rb', line 34

def download
  File.open(file_path, 'w') do |file|
    file << open(filing_url).read
  end
  self
end

#each_row(opts = {}) {|Array| ... } ⇒ Object

Iterates over the Filing’s lines, yielding each one as a row.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :with_index (Boolean)

    yield both the item and its index

Yields:

  • (Array)

    a row of the filing, split by the delimiter from #delimiter



# File 'lib/fech/filing.rb', line 321

def each_row(opts={}, &block)
  unless File.exists?(file_path)
    raise "File #{file_path} does not exist. Try invoking the .download method on this Filing object."
  end

  # If this is an F99, we need to parse it differently.
  resave_f99_contents if ['F99', '"F99"'].include? form_type

  c = 0
  @csv_parser.parse_row(@customized ? custom_file_path : file_path, :col_sep => delimiter, :quote_char => @quote_char, :skip_blanks => true) do |row|
    if opts[:with_index]
      yield [row, c]
      c += 1
    else
      yield row
    end
  end
end

#each_row_with_index(&block) ⇒ Object

Wrapper around #each_row that includes indexes.

# File 'lib/fech/filing.rb', line 341

def each_row_with_index(&block)
  each_row(:with_index => true, &block)
end

#file_contents ⇒ Object

The raw contents of the Filing, returned as a File opened for reading.

# File 'lib/fech/filing.rb', line 266

def file_contents
  File.open(file_path, 'r')
end

#file_name ⇒ Object

The file name of the Filing: its filing ID with a .fec extension.

# File 'lib/fech/filing.rb', line 310

def file_name
  "#{filing_id}.fec"
end

#file_path ⇒ Object

The location of the Filing on the file system.

# File 'lib/fech/filing.rb', line 261

def file_path
  File.join(download_dir, file_name)
end

#filing_url ⇒ Object

The URL from which the Filing is downloaded.

# File 'lib/fech/filing.rb', line 314

def filing_url
  "http://query.nictusa.com/dcdev/posted/#{filing_id}.fec"
end

#filing_version ⇒ Object

The version of the FEC software used to generate this Filing.

# File 'lib/fech/filing.rb', line 234

def filing_version
  @filing_version ||= parse_filing_version
end

#fix_f99_contents ⇒ Object

Handle the contents of F99s by removing the [BEGINTEXT] and [ENDTEXT] delimiters and putting the text content onto the same line as the summary.

# File 'lib/fech/filing.rb', line 290

def fix_f99_contents
  @customized = true
  content = file_contents.read
  regex = /\n\[BEGINTEXT\]\n(.*?)\[ENDTEXT\]\n/mi # some use eg [EndText]
  match = content.match(regex)
  if match
    repl = match[1].gsub(/"/, '""')
    content.gsub(regex, "#{delimiter}\"#{repl}\"")
  else
    content
  end
end
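
The effect of this rewrite is easier to see on a small string. The sketch below applies the same regular expression to made-up sample content; inline_f99_text is a hypothetical helper, not part of Fech. It differs from the listing in one respect: it uses the block form of gsub, so each match is quoted separately and backslashes in the text are not treated as replacement escapes.

```ruby
# Same pattern as in the listing; /i tolerates variants such as [EndText].
F99_TEXT = /\n\[BEGINTEXT\]\n(.*?)\[ENDTEXT\]\n/mi

# Hypothetical stand-in for fix_f99_contents that works on a String.
def inline_f99_text(content, delimiter)
  content.gsub(F99_TEXT) do
    # Double any embedded quotes so the text survives as one quoted CSV cell.
    text = Regexp.last_match(1).gsub('"', '""')
    "#{delimiter}\"#{text}\""
  end
end

# Made-up sample content for illustration only.
sample = "HDR,FEC,5.3\nF99,C00000001,Example Committee\n" \
         "[BEGINTEXT]\nWe said \"hello\".\n[EndText]\n"

puts inline_f99_text(sample, ",")
```

The free text ends up as one extra quoted field appended to the F99 summary line, which is what allows the regular row parser to read it.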

#form_type ⇒ Object

Determine the form type of the filing before it’s been parsed. This is needed for the F99 special case.



# File 'lib/fech/filing.rb', line 273

def form_type
  file_contents.lines.each_with_index do |row, index|
    next if index == 0
    return row.split(delimiter).first
  end
end

#hash_zip(keys, values) ⇒ Fech::Mapped, Hash

Combines an array of keys and an array of values into a Fech::Mapped object, a type of Hash.

Parameters:

  • keys (Array)

    the desired keys for the new hash

  • values (Array)

    the desired values for the new hash

Returns:

  • (Fech::Mapped, Hash)

# File 'lib/fech/filing.rb', line 229

def hash_zip(keys, values)
  Fech::Mapped.new(self, values.first).merge(Hash[*keys.zip(values).flatten])
end
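
The pairing idiom used here can be tried with a plain Hash. This is a sketch under stated assumptions: zip_to_hash is a hypothetical helper, the field names and values are invented, and the Fech::Mapped wrapper is left out.

```ruby
# Hypothetical helper using the same pairing idiom as hash_zip,
# but building a plain Hash instead of a Fech::Mapped.
def zip_to_hash(keys, values)
  Hash[*keys.zip(values).flatten]
end

keys   = [:form_type, :filer_id, :amount]
values = ["SA11", "C00000001", "250.00"]

p zip_to_hash(keys, values)

# zip pads with nil when values run short and ignores surplus values.
p zip_to_hash(keys, ["SA11"])
```

Because of the flatten call, a value that is itself an Array would break the key/value pairing; keys.zip(values).to_h avoids that if nested values ever need to be supported.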

#header(opts = {}) ⇒ Hash

Access the header (first) line of the filing, containing information about the filing’s version and metadata about the software used to file it.

Returns:

  • (Hash)

    a hash that assigns labels to the values of the filing’s header row



# File 'lib/fech/filing.rb', line 83

def header(opts={})
  each_row do |row|
    return parse_row?(row)
  end
end

#map(row, opts = {}) ⇒ Object

Maps a raw row to a labeled hash following any rules given in the filing’s Translator based on its version and row type. Finds the correct map for a given row, performs any matching Translations on the individual values, and returns either the entire dataset, or just those fields requested.

Parameters:

  • row (Array)

    a raw row of the filing, whose first element is the row type

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :include (Array)

    list of field names that should be included in the returned hash



# File 'lib/fech/filing.rb', line 149

def map(row, opts={})
  data = Fech::Mapped.new(self, row.first)
  full_row_map = map_for(row.first)
  
  # If specific fields were asked for, return only those
  if opts[:include]
    row_map = full_row_map.select { |k| opts[:include].include?(k) }
  else
    row_map = full_row_map
  end
  
  # Inserts the row into data, performing any specified preprocessing
  # on individual cells along the way
  row_map.each_with_index do |field, index|
    value = row[full_row_map.index(field)]
    translator.get_translations(:row => row.first,
        :version => filing_version, :action => :convert,
        :field => field).each do |translation|
      # User's Procs should be given each field's value as context
      value = translation[:proc].call(value)
    end
    data[field] = value
  end
  
  # Performs any specified group preprocessing / combinations
  combinations = translator.get_translations(:row => row.first,
        :version => filing_version, :action => :combine)
  row_hash = hash_zip(row_map, row) if combinations
  combinations.each do |translation|
    # User's Procs should be given the entire row as context
    value = translation[:proc].call(row_hash)
    field = translation[:field].source.gsub(/[\^\$]*/, "").to_sym
    data[field] = value
  end
  
  data
end
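
The per-field part of this method (select the requested columns, look each value up by its position in the full column list, then run it through any conversion procs) can be sketched without Fech. Everything here is an assumption for illustration: map_row, the column names, the sample row, and the converters hash, which stands in for the :convert translations returned by the Translator. The :combine step is omitted.

```ruby
# Hypothetical stand-in for the :convert half of Fech::Filing#map.
#   row        - raw Array of cell values
#   columns    - full, ordered list of column names for this row type
#   only       - optional subset of column names to return (like :include)
#   converters - { column_name => [procs] } applied in order to the value
def map_row(row, columns, only: nil, converters: {})
  wanted = only ? columns.select { |name| only.include?(name) } : columns
  wanted.each_with_object({}) do |field, data|
    # Look the value up via the full column list so positions stay correct
    # even when only a subset of fields was requested.
    value = row[columns.index(field)]
    converters.fetch(field, []).each { |conv| value = conv.call(value) }
    data[field] = value
  end
end

columns    = [:form_type, :filer_id, :amount]
row        = ["SA11", "C00000001", "12.50"]
converters = { :amount => [->(v) { v.to_f }] }

p map_row(row, columns, only: [:form_type, :amount], converters: converters)
```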

#map_for(row_type) ⇒ Object

Returns the column names for the given row type and the filing’s version, in the order they appear in row data.

Parameters:

  • row_type (String, Regexp)

    representation of the row desired



# File 'lib/fech/filing.rb', line 190

def map_for(row_type)
  mappings.for_row(row_type)
end

#mappings ⇒ Object

Gets or creates the Mappings instance for this filing_version.

# File 'lib/fech/filing.rb', line 256

def mappings
  @mapping ||= Fech::Mappings.new(filing_version)
end

#parse_filing_version ⇒ Object

Pulls out the version number from the header line. Must parse this line manually, since we don’t know the version yet, and thus the delimiter type is still a mystery.



# File 'lib/fech/filing.rb', line 241

def parse_filing_version
  first = File.open(file_path).first
  if first.index("\034").nil?
    @csv_parser.parse(first).flatten[2]
  else
    @csv_parser.parse(first, :col_sep => "\034").flatten[2]
  end
end
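
The detection step can be sketched on its own: look for the file separator in the header line, fall back to a comma, and take the third field. version_from_header is a hypothetical helper and the header lines are invented samples. Unlike the listing, which hands the line to the configured CSV parser, this sketch uses a plain split and therefore ignores quoting.

```ruby
FS = "\034" # ASCII 28, the file separator used by newer filings

# Hypothetical helper: picks the delimiter from the header line itself,
# then returns the third field, where the listing expects the version.
def version_from_header(line)
  sep = line.include?(FS) ? FS : ","
  line.chomp.split(sep)[2]
end

version = version_from_header("HDR#{FS}FEC#{FS}8.0#{FS}ExampleSoftware\n")
puts version              # prints 8.0
puts version.to_i >= 3    # prints true (the same test as readable?)
puts version_from_header("HDR,FEC,5.00,ExampleSoftware\n")  # prints 5.00
```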

#parse_row?(row, opts = {}) ⇒ Boolean

Decides what to do with a given row. If the row’s type matches the desired type (:parse_if), or if no type was specified, the row is run through #map. If :raw is true, the flat, unmapped data array is returned instead. A row that does not match returns false.

Parameters:

  • row (Array)

    a raw row of the filing, whose first element is the row type

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :include (Array)

    list of field names that should be included in the returned hash

Returns:

  • (Boolean)


# File 'lib/fech/filing.rb', line 131

def parse_row?(row, opts={})
  # Always parse, unless :parse_if is given and does not match row
  if opts[:parse_if].nil? || \
      Fech.regexify(opts[:parse_if]).match(row.first.downcase)
    opts[:raw] ? row : map(row, opts)
  else
    false
  end
end

#readable? ⇒ Boolean

Only FEC format 3.00 and later is supported.

Returns:

  • (Boolean)

# File 'lib/fech/filing.rb', line 251

def readable?
  filing_version.to_i >= 3
end

#resave_f99_contents ⇒ Object

Resave the “fixed” version of an F99.

# File 'lib/fech/filing.rb', line 304

def resave_f99_contents
  return true if @resaved
  File.open(custom_file_path, 'w') { |f| f.write(fix_f99_contents) }
  @resaved = true
end

#rows_like(row_type, opts = {}) {|Hash| ... } ⇒ Array

Access all lines of the filing that match a given row type. Will return an Array of all available lines if called directly, or will yield the mapped rows one by one if a block is passed.

Parameters:

  • row_type (String, Regexp)

    a partial or complete name of the type of row desired

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :raw (Boolean)

    whether to return each row as a raw array that has not been mapped to column names

  • :include (Array)

    list of field names that should be included in the returned hash

Yields:

  • (Hash)

    each matched row’s data, as either a mapped hash or raw array

Returns:

  • (Array)

    the complete set of mapped hashes for matched lines



# File 'lib/fech/filing.rb', line 110

def rows_like(row_type, opts={}, &block)
  data = []
  each_row do |row|
    value = parse_row?(row, opts.merge(:parse_if => row_type))
    next if value == false
    if block_given?
      yield value
    else
      data << value if value
    end
  end
  block_given? ? nil : data
end
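
The collect-or-yield behaviour described above can be sketched with plain arrays. select_rows is a hypothetical helper, the rows are invented samples, and the pattern must already be a Regexp (the listing first passes the row type through Fech.regexify, whose behaviour is not shown on this page). As in parse_row?, the row type is downcased before matching.

```ruby
# Hypothetical stand-in for the matching loop in rows_like, without mapping.
def select_rows(rows, pattern)
  matches = []
  rows.each do |row|
    next unless pattern.match?(row.first.downcase)
    if block_given?
      yield row          # stream matches one by one
    else
      matches << row     # or collect them for the caller
    end
  end
  block_given? ? nil : matches
end

rows = [
  ["HDR", "FEC", "8.0"],
  ["F3", "C00000001"],
  ["SA11AI", "C00000001", "250.00"],
  ["SB17", "C00000001", "75.00"]
]

p select_rows(rows, /^sa/).length                      # 1
select_rows(rows, /^s[ab]/) { |row| puts row.first }   # prints SA11AI, then SB17
```

Passing a block avoids building the full result array, which matters for filings with many rows.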

#summary ⇒ Hash

Access the summary (second) line of the filing, containing aggregate and top-level information about the filing.

Returns:

  • (Hash)

    a hash that assigns labels to the values of the filing’s summary row



# File 'lib/fech/filing.rb', line 92

def summary
  each_row_with_index do |row, index|
    next if index == 0
    return parse_row?(row)
  end
end

#translate {|t| ... } ⇒ Object

Yields the filing’s Translator to the block if one is given; otherwise returns the Translator.

Yields:

  • (t)

    a reference to the filing’s Translator

Yield Parameters:

  • t (Fech::Translator)

# File 'lib/fech/filing.rb', line 204

def translate(&block)
  if block_given?
    yield translator
  else
    translator
  end
end