Module: PDFTDX

Defined in:
lib/pdftdx.rb,
lib/pdftdx/parser.rb,
lib/pdftdx/version.rb

Overview

PDF TDX Module

Defined Under Namespace

Modules: Parser

Constant Summary

VERSION = '1.2.1'

Version

Class Method Summary

Class Method Details

.extract_data(pdf_file) ⇒ Array

Extract Data from PDF

Parameters:

  • pdf_file (String)

    Path to a PDF file

Returns:

  • (Array)

    An array of tables. Each table is a hash with an optional header (:head) and the table data (:data). The data is either a single array of rows or a hash of sub-tables (arrays of rows) keyed by name. Each row is an array of table cells.

    Example:

      [{ head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'],
         data: { 'System' => [['Machine OS', 'Win32', 'Linux', 'MacOS'],
                              ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']] } }]
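Because the table data can take two shapes (a plain array of rows, or a hash of named sub-tables), calling code usually has to normalize it before iterating. The following is a minimal sketch of one way to do that. The helper `each_table_row` is hypothetical and not part of PDFTDX, and the sample value is copied from the documented example rather than produced by a real PDF.

```ruby
# Hypothetical helper (not part of PDFTDX): walks the documented return
# shape and yields [table_index, sub_table_name, row] for every row.
def each_table_row(tables)
  return enum_for(:each_table_row, tables) unless block_given?

  tables.each_with_index do |table, index|
    data = table[:data]
    # :data is either a single array of rows or a hash of named sub-tables;
    # wrap the plain-array case so both shapes can be iterated the same way.
    sub_tables = data.is_a?(Hash) ? data : { nil => data }
    sub_tables.each do |name, rows|
      rows.each { |row| yield index, name, row }
    end
  end
end

# Sample value copied from the documented example. With the gem installed,
# it would instead come from a call such as
#   tables = PDFTDX.extract_data('report.pdf')  # placeholder path
tables = [
  {
    head: ['trauma.eresse.net', 'durjaya.dooba.io', 'suessmost.eresse.net'],
    data: {
      'System' => [
        ['Machine OS', 'Win32', 'Linux', 'MacOS'],
        ['IP Address', '10.0.232.48', '10.0.232.134', '10.0.232.108']
      ]
    }
  }
]

each_table_row(tables) do |index, name, row|
  label, *values = row
  puts "table #{index} / #{name || '(unnamed)'}: #{label} = #{values.join(', ')}"
end
```

In this sketch the first cell of each row is treated as a label, which matches the documented example but is an assumption about the input PDF, not something the library guarantees.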



# File 'lib/pdftdx.rb', line 18

def self.extract_data pdf_file

  # Dump PDF Data
  page_data = Pdftohtml.convert pdf_file

  # Process Page Data
  PDFTDX::Parser.process page_data
end