Module: ParseKit
- Defined in: lib/parsekit.rb, lib/parsekit/error.rb, lib/parsekit/parser.rb, lib/parsekit/version.rb
Overview
ParseKit is a Ruby document-parsing toolkit with PDF and OCR support.
Defined Under Namespace
Classes: Parser
Constant Summary
- SUPPORTED_FORMATS
  Supported file formats and their extensions:

```ruby
SUPPORTED_FORMATS = {
  pdf: ['.pdf'],
  docx: ['.docx'],
  xlsx: ['.xlsx'],
  xls: ['.xls'],
  pptx: ['.pptx'],
  png: ['.png'],
  jpeg: ['.jpg', '.jpeg'],
  tiff: ['.tiff', '.tif'],
  bmp: ['.bmp'],
  json: ['.json'],
  xml: ['.xml', '.html'],
  text: ['.txt', '.md', '.csv']
}.freeze
```

- VERSION = "0.1.2"
Class Method Summary
- .detect_format(filename) ⇒ Symbol
  Detect file format from filename/extension.
- .native_version ⇒ String
  Get the native library version.
- .parse(input, options = {}) ⇒ String
  Convenience method to parse input directly (for text).
- .parse_bytes(data, options = {}) ⇒ String
  Parse binary data.
- .supported_formats ⇒ Array<String>
  Get supported file formats.
- .supports_file?(path) ⇒ Boolean
  Check if a file format is supported.
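`detect_format` walks the SUPPORTED_FORMATS table from the Constant Summary format by format. A caller doing many lookups might instead precompute a reverse index from extension to format. The sketch below copies the constant so it runs without the gem; `EXTENSION_INDEX` and `format_for_extension` are hypothetical names, not part of ParseKit.

```ruby
# Standalone copy of ParseKit::SUPPORTED_FORMATS as documented.
SUPPORTED_FORMATS = {
  pdf: ['.pdf'], docx: ['.docx'], xlsx: ['.xlsx'], xls: ['.xls'],
  pptx: ['.pptx'], png: ['.png'], jpeg: ['.jpg', '.jpeg'],
  tiff: ['.tiff', '.tif'], bmp: ['.bmp'], json: ['.json'],
  xml: ['.xml', '.html'], text: ['.txt', '.md', '.csv']
}.freeze

# Hypothetical reverse index: extension string => format symbol.
EXTENSION_INDEX = SUPPORTED_FORMATS
  .flat_map { |format, extensions| extensions.map { |ext| [ext, format] } }
  .to_h
  .freeze

# Hash lookup instead of a scan; falls back to :unknown as detect_format does.
def format_for_extension(ext)
  EXTENSION_INDEX.fetch(ext.downcase, :unknown)
end

p format_for_extension(".JPG")  # prints :jpeg
p format_for_extension(".html") # prints :xml
p format_for_extension(".zip")  # prints :unknown
```

Because no extension appears under two formats in the table, the index returns the same answer as the first-match scan in `detect_format`.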
Class Method Details
.detect_format(filename) ⇒ Symbol
Detect file format from filename/extension
```ruby
# File 'lib/parsekit.rb', line 72

def detect_format(filename)
  return :unknown if filename.nil? || filename.empty?

  ext = File.extname(filename).downcase
  return :unknown if ext.empty?

  SUPPORTED_FORMATS.each do |format, extensions|
    return format if extensions.include?(ext)
  end

  :unknown
end
```
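After the nil/empty guard, everything in `detect_format` depends on `File.extname`, which looks only at the last extension of the basename and treats a leading dot as part of the name. This standard-library-only sketch shows the extension string `detect_format` would compare against the table for a few made-up filenames; the format noted in each comment follows from the SUPPORTED_FORMATS table.

```ruby
# Extension that detect_format would look up, i.e. File.extname(name).downcase
samples = {
  "Report.PDF"     => File.extname("Report.PDF").downcase,     # ".pdf" -> :pdf
  "scan.final.TIF" => File.extname("scan.final.TIF").downcase, # ".tif" -> :tiff (last extension only)
  "archive.tar.gz" => File.extname("archive.tar.gz").downcase, # ".gz"  -> :unknown
  ".md"            => File.extname(".md").downcase,            # ""     -> :unknown (leading dot is not an extension)
  "README"         => File.extname("README").downcase          # ""     -> :unknown
}

samples.each { |name, ext| puts "#{name.inspect} -> #{ext.inspect}" }
```

In particular, a file literally named `.md` is reported as `:unknown`, and matching is case-insensitive because of the `downcase`.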
.native_version ⇒ String
Get the native library version
```ruby
# File 'lib/parsekit.rb', line 87

def native_version
  version
rescue StandardError
  "unknown"
end
```
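`native_version` wraps a call to `version` in a method-level `rescue StandardError`. `version` is not defined in this listing; presumably the native extension supplies it, but that is an assumption. Since `NameError` (raised for a missing method) is itself a `StandardError`, the fallback covers an absent method as well as one that raises. The stand-in modules below are hypothetical and unrelated to ParseKit; they only demonstrate both paths.

```ruby
# Stub where `version` exists and succeeds.
module HealthyNative
  def self.version
    "1.2.3"
  end

  def self.native_version
    version
  rescue StandardError
    "unknown"
  end
end

# Stub with no `version` method: the bare call raises NameError,
# which the StandardError rescue catches.
module BrokenNative
  def self.native_version
    version
  rescue StandardError
    "unknown"
  end
end

puts HealthyNative.native_version # prints 1.2.3
puts BrokenNative.native_version  # prints unknown
```

The trade-off of this design is that a real failure inside the native call is reported only as `"unknown"`, without the original error.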
.parse(input, options = {}) ⇒ String
Convenience method to parse input directly (for text)
```ruby
# File 'lib/parsekit.rb', line 42

def parse(input, options = {})
  Parser.new(options).parse(input)
end
```
.parse_bytes(data, options = {}) ⇒ String
Parse binary data
```ruby
# File 'lib/parsekit.rb', line 50

def parse_bytes(data, options = {})
  # Convert string to bytes if needed
  byte_data = data.is_a?(String) ? data.bytes : data
  Parser.new(options).parse_bytes(byte_data)
end
```
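The first step of `parse_bytes` normalizes its input: a Ruby String becomes an Array of Integer byte values via `String#bytes`, while anything else is handed to the parser unchanged. A minimal standalone sketch of just that step, using a hypothetical helper name `to_byte_array`:

```ruby
# Hypothetical helper mirroring the normalization in ParseKit.parse_bytes:
# a String becomes an Array of byte values (Integers 0..255);
# any other object, such as an existing byte Array, is returned as-is.
def to_byte_array(data)
  data.is_a?(String) ? data.bytes : data
end

p to_byte_array("%PDF")       # prints [37, 80, 68, 70]
p to_byte_array("\u00e9")     # prints [195, 169] (UTF-8 bytes, not characters)
p to_byte_array([0x89, 0x50]) # prints [137, 80] (already an Array)
```

For a file on disk, `File.binread` returns a binary String, so a call such as `ParseKit.parse_bytes(File.binread(path))` (with `path` a placeholder) takes the `String#bytes` branch. The intermediate Array holds one Integer per input byte, so it is considerably larger than the String it came from, which may matter for large documents.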
.supported_formats ⇒ Array<String>
Get supported file formats
```ruby
# File 'lib/parsekit.rb', line 58

def supported_formats
  Parser.supported_formats
end
```
.supports_file?(path) ⇒ Boolean
Check if a file format is supported
```ruby
# File 'lib/parsekit.rb', line 65

def supports_file?(path)
  Parser.new.supports_file?(path)
end
```