OSV
OSV is a high-performance CSV parser for Ruby, implemented in Rust. It wraps BurntSushi's excellent csv-rs crate.
It provides a simple interface for reading CSV files with support for both hash-based and array-based row formats.
The array-based mode is faster than the hash-based mode, so if you don't need the hash keys, use the array-based mode.
Installation
Add this line to your application's Gemfile:
gem 'osv'
And then execute:
bundle install
Or install it directly:
gem install osv
Usage
Reading CSV Files
require 'osv'
# Basic usage - each row as a hash
OSV.for_each("data.csv") do |row|
  puts row["name"] # => "John"
  puts row["age"] # => "25"
end
# Return an enumerator instead of using a block
rows = OSV.for_each("data.csv")
rows.each { |row| puts row["name"] }
# High-performance array mode
OSV.for_each("data.csv", result_type: :array) do |row|
  puts row[0] # First column
  puts row[1] # Second column
end
Input Sources
# From a file path
OSV.for_each("data.csv") { |row| puts row["name"] }
# From a gzipped file path
OSV.for_each("data.csv.gz") { |row| puts row["name"] }
# From an IO object
File.open("data.csv") { |file| OSV.for_each(file) { |row| puts row["name"] } }
# From a string, wrapped in a StringIO (needs require 'stringio')
data = StringIO.new("name,age\nJohn,25")
OSV.for_each(data) { |row| puts row["name"] }
Configuration Options
OSV.for_each("data.csv",
  # Input formatting
  has_headers: true, # First row contains headers (default: true)
  col_sep: ",", # Column separator (default: ",")
  quote_char: '"', # Quote character (default: '"')
  # Output formatting
  result_type: :hash, # :hash or :array (hash is default)
  nil_string: nil, # String to interpret as nil when parsing (default: nil)
  # Parsing behavior
  flexible: false, # Allow varying number of fields (default: false)
  trim: :all, # Whether to trim whitespace. Options are :all, :headers, or :fields (default: nil)
  buffer_size: 1024, # Number of rows to buffer in memory (default: 1024)
  ignore_null_bytes: false, # Boolean specifying if null bytes should be ignored (default: false)
  lossy: false, # Boolean specifying if invalid UTF-8 characters should be replaced with a replacement character (default: false)
)
Available Options
- has_headers: Boolean indicating if the first row contains headers (default: true)
- col_sep: String specifying the field separator (default: ",")
- quote_char: String specifying the quote character (default: "\"")
- nil_string: String that should be interpreted as nil (default: nil)
  - by default, empty strings are interpreted as empty strings
  - if you want to interpret empty strings as nil, set this to an empty string
- buffer_size: Integer specifying the number of rows to buffer in memory (default: 1024)
- result_type: String or Symbol specifying the output format ("hash" or "array" or :hash or :array; default: :hash)
- flexible: Boolean specifying if rows may have a varying number of fields (default: false)
- trim: String or Symbol specifying the trim mode ("all" or "headers" or "fields" or :all or :headers or :fields; default: nil)
- ignore_null_bytes: Boolean specifying if null bytes should be ignored (default: false)
- lossy: Boolean specifying if invalid UTF-8 characters should be replaced with a replacement character (default: false)
When has_headers is false, hash keys will be generated as "c0", "c1", etc.
Requirements
- Ruby >= 3.1.0
- Rust toolchain (for installation from source)
Performance
This library is faster than the standard Ruby CSV library. It's also faster than any other CSV gem I've been able to find.
Here are some unscientific benchmarks. The code is in benchmark/comparison_benchmark.rb.
1,000,000 records