Class: Remi::Extractor::S3File

Inherits:
FileSystem
Defined in:
lib/remi/data_subjects/s3_file.rb

Overview

S3 file extractor. Used to extract files from Amazon S3.

Examples:


class MyJob < Remi::Job
  source :some_file do
    extractor Remi::Extractor::S3File.new(
      bucket: 'my-awesome-bucket',
      remote_path: 'some_file-',
      most_recent_only: true
    )
    parser Remi::Parser::CsvFile.new(
      csv_options: {
        headers: true,
        col_sep: '|'
      }
    )
  end
end

job = MyJob.new
job.some_file.df
# => #<Daru::DataFrame:70153153438500 @name = 4c59cfdd-7de7-4264-8666-83153f46a9e4 @size = 3>
#                    id       name
#          0          1     Albert
#          1          2      Betsy
#          2          3       Camu
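
The combination of remote_path and most_recent_only in the example can be pictured with a small stand-alone sketch. It assumes that remote_path acts as a key prefix (as the body of #all_entries! suggests) and that most_recent_only keeps the single newest matching entry. The real selection is done by the inherited FileSystem methods and may also depend on #pattern and #most_recent_by. ListedFile and most_recent_with_prefix are hypothetical names, not part of Remi.

```ruby
# Hypothetical stand-in for one listed S3 object; not part of Remi.
ListedFile = Struct.new(:key, :last_modified)

# Keep only keys that start with the prefix, then take the newest one.
def most_recent_with_prefix(files, prefix)
  files.select { |f| f.key.start_with?(prefix) }.max_by(&:last_modified)
end

files = [
  ListedFile.new('some_file-20160101.csv', Time.utc(2016, 1, 1)),
  ListedFile.new('some_file-20160102.csv', Time.utc(2016, 1, 2)),
  ListedFile.new('unrelated.csv', Time.utc(2016, 1, 3))
]

picked = most_recent_with_prefix(files, 'some_file-')
puts picked.key
```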

Instance Attribute Summary

Attributes inherited from FileSystem

#group_by, #local_path, #most_recent_by, #most_recent_only, #pattern, #remote_path

Attributes inherited from Remi::Extractor

#logger

Instance Method Summary

Methods inherited from FileSystem

#entries, #matching_entries, #most_recent_matching_entry, #most_recent_matching_entry_in_group

Constructor Details

#initialize(*args, **kargs, &block) ⇒ S3File

Returns a new instance of S3File.

Parameters:

  • bucket_name (String)

    Name of the S3 bucket containing the files (the example above passes it as the bucket: keyword)



# File 'lib/remi/data_subjects/s3_file.rb', line 34

def initialize(*args, **kargs, &block)
  super
  init_s3_file(*args, **kargs, &block)
end

Instance Method Details

#all_entries ⇒ Array<Extractor::FileSystemEntry>

Returns the (memoized) list of objects in the bucket/prefix.

Returns:

  • (Array<Extractor::FileSystemEntry>)

    (Memoized) list of objects in the bucket/prefix


# File 'lib/remi/data_subjects/s3_file.rb', line 51

def all_entries
  @all_entries ||= all_entries!
end
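
The relationship between the memoized reader and its bang counterpart can be illustrated in isolation. This is a minimal sketch with a hypothetical MemoizedListing class that only counts simulated remote listings; it does not touch S3 or Remi.

```ruby
# Hypothetical stand-in for the extractor; not the Remi class itself.
class MemoizedListing
  attr_reader :remote_calls

  def initialize
    @remote_calls = 0
  end

  # Bang variant: always performs the (simulated) remote listing.
  def all_entries!
    @remote_calls += 1
    ['some_file-20160101.csv', 'some_file-20160102.csv']
  end

  # Memoized variant: lists once, then reuses the cached array.
  def all_entries
    @all_entries ||= all_entries!
  end
end

listing = MemoizedListing.new
listing.all_entries
listing.all_entries            # served from the cache, no new listing
calls_after_cached_reads = listing.remote_calls
listing.all_entries!           # the bang call always lists again
calls_after_bang = listing.remote_calls
```

In this sketch, calling the bang method does not refresh the cached value, which matches the shape of the two methods shown here: only #all_entries assigns @all_entries.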

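#all_entries! ⇒ Array<Extractor::FileSystemEntry>

Returns the list of objects in the bucket/prefix, fetched fresh on every call.

Returns:

  • (Array<Extractor::FileSystemEntry>)

    List of objects in the bucket/prefix
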

# File 'lib/remi/data_subjects/s3_file.rb', line 56

def all_entries!
  # S3 does not track anything like a create time, so use last modified for both
  bucket.objects(prefix: @remote_path.to_s).map do |entry|
    Extractor::FileSystemEntry.new(
      pathname: entry.key,
      create_time: entry.last_modified,
      modified_time: entry.last_modified,
      raw: entry
    )
  end
end
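
The mapping above can be tried without AWS credentials using stand-in objects. In this sketch FakeObject, Entry, and entries_for are hypothetical: FakeObject only mimics the key and last_modified readers used above, Entry mimics the keyword arguments passed to FileSystemEntry, and the prefix filter is done locally rather than by S3.

```ruby
# Hypothetical stand-ins; neither struct is part of Remi or the AWS SDK.
FakeObject = Struct.new(:key, :last_modified)
Entry = Struct.new(:pathname, :create_time, :modified_time, :raw, keyword_init: true)

# Filter by key prefix, then wrap each object in an entry.
def entries_for(objects, prefix)
  objects.select { |obj| obj.key.start_with?(prefix) }.map do |obj|
    Entry.new(
      pathname: obj.key,
      create_time: obj.last_modified,   # no separate create time is available
      modified_time: obj.last_modified,
      raw: obj
    )
  end
end

objects = [
  FakeObject.new('some_file-a.csv', Time.utc(2016, 1, 1)),
  FakeObject.new('other.csv', Time.utc(2016, 1, 2))
]

mapped = entries_for(objects, 'some_file-')
```

Because both timestamps come from last_modified, any selection that sorts on create time behaves the same as one that sorts on modified time for S3 sources.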

#extract ⇒ Array<String>

Called to extract files from the source filesystem.

Returns:

  • (Array<String>)

    An array of paths to local copies of the extracted files



# File 'lib/remi/data_subjects/s3_file.rb', line 41

def extract
  entries.map do |entry|
    local_file = File.join(@local_path, entry.name)
    logger.info "Downloading #{entry.pathname} from S3 to #{local_file}"
    File.open(local_file, 'wb') { |file| entry.raw.get(response_target: file) }
    local_file
  end
end
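
The download loop can be exercised locally with stand-ins that write to a temporary directory. This is a sketch under assumptions: FakeRaw, FakeEntry, and download_all are hypothetical, FakeRaw#get only imitates writing an object's body to the response_target file handle, and entry.name is assumed to be the basename of the key.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the S3 object held in entry.raw.
FakeRaw = Struct.new(:body) do
  # Mimics streaming the object's contents into an open file handle.
  def get(response_target:)
    response_target.write(body)
  end
end

# Hypothetical stand-in for a FileSystemEntry.
FakeEntry = Struct.new(:pathname, :raw) do
  # Assumption: the entry name is the basename of the S3 key.
  def name
    File.basename(pathname)
  end
end

# Same shape as the method above, minus logging.
def download_all(entries, local_path)
  entries.map do |entry|
    local_file = File.join(local_path, entry.name)
    File.open(local_file, 'wb') { |file| entry.raw.get(response_target: file) }
    local_file
  end
end

entries = [FakeEntry.new('incoming/some_file-a.csv', FakeRaw.new("id|name\n1|Albert\n"))]
local_dir = Dir.mktmpdir
paths = download_all(entries, local_dir)
```

Since the local filename is built from the entry name alone, two keys with the same basename under different prefixes would write to the same local file in this sketch.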

#s3_client ⇒ Aws::S3::Client

Returns the S3 client used.

Returns:

  • (Aws::S3::Client)

    The S3 client used



# File 'lib/remi/data_subjects/s3_file.rb', line 69

def s3_client
  @s3_client ||= Aws::S3::Client.new
end