Class: FormatParser::PDFParser

Inherits:
Object
Includes:
IOUtils
Defined in:
lib/parsers/pdf_parser.rb

Constant Summary

PDF_MARKER =

The first 9 bytes of a PDF should match this pattern, according to:

https://stackoverflow.com/questions/3108201/detect-if-pdf-file-is-correct-header-pdf

There are exceptions, however, which are not handled for now.

/%PDF-[12]\.[0-8]{1}/
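To make the pattern concrete, here is a minimal sketch that runs a local copy of the regular expression against a few sample headers. The names `HEADER_PATTERN` and `pdf_header?` are hypothetical and exist only for this illustration; they are not part of FormatParser. Note that the pattern is unanchored, so it can match at any offset within the bytes it is given.

```ruby
# Local copy of the documented pattern, so the snippet runs without the gem.
HEADER_PATTERN = /%PDF-[12]\.[0-8]{1}/

# Hypothetical helper: true when the pattern occurs anywhere in the bytes.
def pdf_header?(bytes)
  bytes.match?(HEADER_PATTERN)
end

samples = [
  "%PDF-1.4\n",  # typical header: major version 1, minor version 4
  "%PDF-2.0\n",  # major version 2 is accepted as well
  "%PDF-1.9\n",  # minor version 9 is outside the 0..8 range
  "%PDF-3.0\n",  # major version 3 is outside [12]
  " %PDF-1.7",   # one leading byte: still matches, the pattern is unanchored
  "GIF89a..."    # unrelated data
]

samples.each do |header|
  puts "#{header.inspect}: #{pdf_header?(header)}"
end
```

The `{1}` quantifier is redundant (a character class already matches exactly one character), so `[0-8]` alone would behave the same way.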
PDF_CONTENT_TYPE =
'application/pdf'

Constants included from IOUtils

IOUtils::INTEGER_DIRECTIVES

Instance Method Summary

Methods included from IOUtils

#read_bytes, #read_fixed_point, #read_int, #safe_read, #safe_skip, #skip_bytes

Instance Method Details

#call(io) ⇒ Object



# File 'lib/parsers/pdf_parser.rb', line 16

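def call(io)
  # Wrap the caller's IO in FormatParser's constrained IO interface.
  io = FormatParser::IOConstraint.new(io)

  # Read the header bytes and bail out unless they contain the PDF marker.
  header = safe_read(io, 9)
  return unless header =~ PDF_MARKER

  FormatParser::Document.new(format: :pdf, content_type: PDF_CONTENT_TYPE)
rescue FormatParser::IOUtils::InvalidRead
  # The header could not be read, so report no match.
  nil
end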
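The control flow of `#call` can be tried without the gem using the standalone sketch below. Everything in it is a stand-in: `ShortRead`, `read_exactly`, and `sniff_pdf` are hypothetical names, and `read_exactly` only approximates `safe_read` under the assumption (suggested by the `rescue` clause) that a short read raises. The real method returns a `FormatParser::Document`; the sketch returns a plain Hash instead.

```ruby
require 'stringio'

# Hypothetical stand-in for FormatParser::IOUtils::InvalidRead.
class ShortRead < StandardError; end

# Hypothetical approximation of safe_read: raise when fewer than n bytes
# are available (assumed behaviour, not taken from the library source).
def read_exactly(io, n)
  data = io.read(n)
  raise ShortRead, "wanted #{n} bytes" if data.nil? || data.bytesize < n
  data
end

# Mirrors the steps of #call: read 9 bytes, test the marker, build a result.
def sniff_pdf(io)
  header = read_exactly(io, 9)
  return unless header =~ /%PDF-[12]\.[0-8]/

  { format: :pdf, content_type: 'application/pdf' }
rescue ShortRead
  nil
end

p sniff_pdf(StringIO.new("%PDF-1.5\n1 0 obj\n"))  # recognised header
p sniff_pdf(StringIO.new("%PDF"))                 # too short to read 9 bytes
p sniff_pdf(StringIO.new("PK\x03\x04 not a pdf")) # readable, but no marker
```

Both failure modes (a short read and a non-matching header) produce `nil`, so a caller cannot tell them apart from the return value alone.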

#likely_match?(filename) ⇒ Boolean

Returns:

  • (Boolean): truthy if the filename ends in .pdf or .ai, ignoring case


# File 'lib/parsers/pdf_parser.rb', line 12

def likely_match?(filename)
  filename =~ /\.(pdf|ai)$/i
end
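As an illustration of the filename check, the sketch below applies the same regular expression to a few names. `likely_pdf_name?` is a hypothetical helper written for this example; it adds `!!` to coerce the result to `true` or `false`, because `=~` on its own returns the match offset or `nil` rather than a strict Boolean.

```ruby
# Hypothetical helper using the same pattern as #likely_match?, with the
# result coerced to a strict Boolean for easier inspection.
def likely_pdf_name?(filename)
  !!(filename =~ /\.(pdf|ai)$/i)
end

%w[report.pdf REPORT.PDF logo.ai notes.pdf.txt pdf].each do |name|
  puts "#{name}: #{likely_pdf_name?(name)}"
end
```

In Ruby, `$` anchors at the end of a line rather than the end of the string, so a name containing a newline right after the extension would also pass; `\z` would be the stricter anchor if that matters.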