Module: PDFTDX
- Defined in: lib/pdftdx.rb, lib/pdftdx/parser.rb, lib/pdftdx/version.rb
Overview
PDF TDX Module
Defined Under Namespace
Modules: Parser
Constant Summary
- VERSION = '1.2.2'
  Version
Class Method Summary
- .extract_data(pdf_file) ⇒ Array
  Extract Data from PDF: Converts a PDF file to HTML format and then extracts anything that looks like tabular data.
Class Method Details
.extract_data(pdf_file) ⇒ Array
Extract Data from PDF: Converts a PDF file to HTML format and then extracts anything that looks like tabular data.
    # File 'lib/pdftdx.rb', line 20
    def self.extract_data pdf_file
      # Dump PDF Data
      page_data = Pdftohtml.convert pdf_file
      # Process Page Data
      PDFTDX::Parser.process page_data
    end
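As a rough illustration of how a caller might use .extract_data, the sketch below wraps it in a small helper. The helper name extract_tables, its extractor: keyword and the file name report.pdf are hypothetical and not part of pdftdx. This page only states that the method returns an Array, so the sketch makes no assumption about what the elements look like.

```ruby
# Hypothetical helper (not part of pdftdx): checks that the path points to a
# regular file before calling the extractor, and always returns an Array.
# The default extractor calls PDFTDX.extract_data, which requires the gem to
# be loaded first (require 'pdftdx'); the lambda is only evaluated when called.
def extract_tables(pdf_file, extractor: ->(f) { PDFTDX.extract_data(f) })
  # Assumption: a missing file is treated as "no data" rather than an error.
  return [] unless File.file?(pdf_file)

  # Array() leaves an Array unchanged and turns nil into [].
  Array(extractor.call(pdf_file))
end

# Example call (assumes the pdftdx gem is installed and report.pdf exists):
#   require 'pdftdx'
#   tables = extract_tables('report.pdf')
#   puts "Extracted #{tables.length} top-level entries"
```

Passing the extractor as a keyword argument keeps the helper testable without a real PDF or the gem installed. Callers who prefer an exception on a missing file can drop the File.file? guard.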