Class: Mbox2CSV::MboxParser
- Inherits: Object
- Defined in: lib/mbox2csv.rb
Overview
Main class for parsing MBOX files, saving email data/statistics to CSV, and (optionally) extracting selected attachment types to disk.
Instance Method Summary
- #extract_attachments(extract: true, filetypes: [], output_folder: 'attachments') ⇒ Integer
  Extracts attachments of the selected file types from the MBOX file into a folder.
- #initialize(mbox_file, csv_file, stats_csv_file, recipient_stats_csv_file) ⇒ MboxParser (constructor)
  Initializes the MboxParser with paths for the input MBOX file, the output CSV file, and the CSV files for sender and recipient statistics.
- #parse ⇒ Object
  Parses the MBOX file and writes the email data to the specified CSV file.
Constructor Details
#initialize(mbox_file, csv_file, stats_csv_file, recipient_stats_csv_file) ⇒ MboxParser
Initializes the MboxParser with paths for the input MBOX file, the output CSV file, and the CSV files for sender and recipient statistics. It also creates a senders/ folder in the current working directory if one does not already exist.
    # File 'lib/mbox2csv.rb', line 18
    def initialize(mbox_file, csv_file, stats_csv_file, recipient_stats_csv_file)
      @mbox_file = mbox_file
      @csv_file = csv_file
      @statistics = EmailStatistics.new
      @stats_csv_file = stats_csv_file
      @recipient_stats_csv_file = recipient_stats_csv_file
      @senders_folder = 'senders/'
      FileUtils.mkdir_p(@senders_folder) # Create the senders folder if it doesn't exist
    end
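A minimal usage sketch based only on the signatures documented on this page. It assumes the gem is installed and loadable as `mbox2csv` (inferred from the file path above). The helper name `convert_mailbox` and the output file names are made up for illustration. Per the constructor source, creating a parser also creates a `senders/` folder in the current working directory.

```ruby
begin
  require 'mbox2csv' # assumption: the gem is installed under this name
rescue LoadError
  warn 'mbox2csv is not installed; the example below will be skipped'
end

# Hypothetical wrapper: parse a mailbox to CSV, then pull out PDF and JPG attachments.
# Returns the Integer reported by #extract_attachments, or nil if the gem is unavailable.
def convert_mailbox(mbox_path, out_dir)
  return nil unless defined?(Mbox2CSV::MboxParser)

  parser = Mbox2CSV::MboxParser.new(
    mbox_path,
    File.join(out_dir, 'emails.csv'),          # email data
    File.join(out_dir, 'sender_stats.csv'),    # sender statistics
    File.join(out_dir, 'recipient_stats.csv')  # recipient statistics
  )
  parser.parse
  parser.extract_attachments(
    filetypes: %w[pdf jpg],
    output_folder: File.join(out_dir, 'attachments')
  )
end
```

Calling `convert_mailbox('inbox.mbox', 'out')` would then leave the three CSV files and an `attachments` folder under `out`, assuming `inbox.mbox` exists and `out` is writable.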
Instance Method Details
#extract_attachments(extract: true, filetypes: [], output_folder: 'attachments') ⇒ Integer
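Extracts attachments of the selected file types from the MBOX file into a folder and returns the number of files written. Returns 0 immediately when extract is false. Any StandardError raised inside the method is rescued, printed, and turned into a return value of 0; this includes the ArgumentError raised for an empty filetypes list, so that error never reaches the caller.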
    # File 'lib/mbox2csv.rb', line 65
    def extract_attachments(extract: true, filetypes: [], output_folder: 'attachments')
      return 0 unless extract

      wanted_exts = Array(filetypes).map { |e| e.to_s.downcase.sub(/\A\./, '') }.uniq
      raise ArgumentError, "filetypes must not be empty when extract: true" if wanted_exts.empty?

      FileUtils.mkdir_p(output_folder)
      total_written = 0
      total_lines = File.foreach(@mbox_file).inject(0) { |c, _| c + 1 }
      = ProgressBar.create(title: "Extracting Attachments", total: total_lines, format: "%t: |%B| %p%%")

      File.open(@mbox_file, 'r') do |mbox|
        buffer = ""
        mbox.each_line do |line|
          .increment
          if line.start_with?("From ")
            total_written += (buffer, wanted_exts, output_folder) unless buffer.empty?
            buffer = ""
          end
          buffer << line
        end
        total_written += (buffer, wanted_exts, output_folder) unless buffer.empty?
      end

      puts "Attachment extraction completed. #{total_written} file(s) saved to #{output_folder}"
      total_written
    rescue => e
      puts "Error extracting attachments: #{e.}"
      0
    end
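The `filetypes` argument is normalized before use: each entry is converted to a string, downcased, stripped of one leading dot, and de-duplicated. The sketch below isolates that expression from the listing above so its effect on a few inputs is visible; `normalize_exts` is a hypothetical name used only here.

```ruby
# Same expression as in #extract_attachments, wrapped in a helper for illustration.
# `normalize_exts` is not part of Mbox2CSV.
def normalize_exts(filetypes)
  Array(filetypes).map { |e| e.to_s.downcase.sub(/\A\./, '') }.uniq
end

p normalize_exts(['.PDF', 'pdf', :jpg]) # => ["pdf", "jpg"]
p normalize_exts('.Docx')               # => ["docx"]
p normalize_exts(nil)                   # => []
```

So `filetypes: ['.PDF', 'pdf']` and `filetypes: 'pdf'` select the same attachments, while `nil` or `[]` produces an empty list, which is the case the method rejects.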
#parse ⇒ Object
Parses the MBOX file and writes the email data to the specified CSV file. It also saves sender and recipient statistics to separate CSV files. A progress bar is displayed during the parsing process.
    # File 'lib/mbox2csv.rb', line 31
    def parse
      total_lines = File.foreach(@mbox_file).inject(0) { |c, _line| c + 1 }
      = ProgressBar.create(title: "Parsing Emails", total: total_lines, format: "%t: |%B| %p%%")

      CSV.open(@csv_file, 'w') do |csv|
        csv << ['From', 'To', 'Subject', 'Date', 'Body']

        File.open(@mbox_file, 'r') do |mbox|
          buffer = ""
          mbox.each_line do |line|
            .increment
            if line.start_with?("From ")
              process_email_block(buffer, csv) unless buffer.empty?
              buffer = ""
            end
            buffer << line
          end
          process_email_block(buffer, csv) unless buffer.empty?
        end
      end

      puts "Parsing completed. Data saved to #{@csv_file}"
      @statistics.save_sender_statistics(@stats_csv_file)
      @statistics.save_recipient_statistics(@recipient_stats_csv_file)
    rescue => e
      puts "Error processing MBOX file: #{e.}"
    end
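Both #parse and #extract_attachments split the mailbox the same way: lines accumulate in a buffer, and a line beginning with "From " flushes the buffer as one message. The sketch below reproduces only that splitting loop on an in-memory sample. `split_mbox` and the sample addresses are hypothetical, and the sketch collects the blocks in an array instead of handing them to a processing method.

```ruby
require 'stringio'

# Hypothetical helper for illustration; not part of Mbox2CSV.
# Splits an IO containing mbox data into one string per message.
def split_mbox(io)
  messages = []
  buffer = +""
  io.each_line do |line|
    # A "From " separator line starts a new message, so flush the previous one.
    if line.start_with?("From ")
      messages << buffer unless buffer.empty?
      buffer = +""
    end
    buffer << line
  end
  # The last message has no separator after it, so flush it explicitly.
  messages << buffer unless buffer.empty?
  messages
end

sample = <<~MBOX
  From alice@example.com Mon Jan  1 00:00:00 2024
  Subject: first

  Hello.
  From bob@example.com Tue Jan  2 00:00:00 2024
  Subject: second

  Hi again.
MBOX

blocks = split_mbox(StringIO.new(sample))
puts blocks.length      # => 2
puts blocks[1].lines[1] # => Subject: second
```

One consequence of this approach is that an unescaped body line starting with "From " is treated as the start of a new message. Many mbox writers escape such lines as ">From ", in which case the split is unaffected.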