Module: Textminer
- Extended by: Configuration
- Defined in:
  lib/textminer/mined.rb,
  lib/textminer.rb,
  lib/textminer/miner.rb,
  lib/textminer/request.rb,
  lib/textminer/version.rb,
  lib/textminer/response.rb
Overview
Textminer::Miner
Class to give back text mining object
Defined Under Namespace
Classes: Mined, Miner, Request, Response
Constant Summary
- VERSION = "0.1.5"
Class Method Summary
- .extract(path) ⇒ Object
  Thin layer around pdf-reader gem’s PDF::Reader.
- .fetch(url) ⇒ Mined
  Get full text.
- .search(doi: nil, member: nil, filter: nil, limit: nil, options: nil) ⇒ Array
  Search for papers and get full text links.
Methods included from Configuration
Class Method Details
.extract(path) ⇒ Object
Thin layer around the pdf-reader gem’s PDF::Reader.
Used internally by fetch to parse PDFs.
# File 'lib/textminer.rb', line 140
def self.extract(path)
  rr = PDF::Reader.new(path)
  rr.pages.map { |page| page.text }.join("\n")
end
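The page-joining step in extract can be tried without a PDF on hand. The following is a minimal sketch, not part of the gem: FakePage, FakeReader, and join_page_text are hypothetical stand-ins that mimic only the pages and text calls used in the source above.

```ruby
# Hypothetical stand-ins for PDF::Reader and its page objects. They respond
# to the same two calls the source relies on: reader.pages and page.text.
FakePage = Struct.new(:text)
FakeReader = Struct.new(:pages)

# Same logic as the body of Textminer.extract, but taking a reader-like
# object instead of a file path so it runs without the pdf-reader gem.
def join_page_text(reader)
  reader.pages.map(&:text).join("\n")
end

reader = FakeReader.new([FakePage.new("Title page"), FakePage.new("Body text")])
full_text = join_page_text(reader)
puts full_text
```

Each page's text ends up on its own line, separated by a single newline. With the real gem, the reader would presumably come from PDF::Reader.new(path), as in the source listing.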
.fetch(url) ⇒ Mined
Get full text
Works easily for open access papers. For closed-access (non-OA) papers, use Crossref’s Text and Data Mining (TDM) service, which requires authentication and a pre-authorized IP address. To get a key, sign up for the TDM service at apps.crossref.org/clickthrough/researchers. The only publishers taking part at this time are Elsevier and Wiley.
Returns a Mined object with methods for getting the URL requested and the file path, and for parsing the plain text or XML or extracting text from the PDF.
# File 'lib/textminer.rb', line 120
def self.fetch(url)
  Miner.new(url).perform
end
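The delegation in fetch (build an object from the URL, then call perform to get a result object back) can be sketched offline. Everything below is an assumption made for illustration: SketchMiner, SketchMined, kind, and sketch_fetch are hypothetical names, and the extension-based dispatch is only a guess at how plain text, XML, and PDF might be told apart. The gem's actual Miner and Mined classes may work differently.

```ruby
require "uri"

# Hypothetical stand-in for the Mined result: it keeps the URL requested and
# a file path, and guesses a format from the file extension.
SketchMined = Struct.new(:url, :path) do
  def kind
    case File.extname(path).downcase
    when ".pdf" then :pdf
    when ".xml" then :xml
    else :plain
    end
  end
end

# Hypothetical stand-in for Miner: holds the URL; perform returns the result.
class SketchMiner
  def initialize(url)
    @url = url
  end

  def perform
    # No download happens here; the sketch only derives a file name from the URL.
    path = File.basename(URI.parse(@url).path)
    SketchMined.new(@url, path)
  end
end

# Mirrors the shape of Textminer.fetch: construct, then perform.
def sketch_fetch(url)
  SketchMiner.new(url).perform
end

mined = sketch_fetch("https://example.org/articles/paper.pdf")
puts mined.url
puts mined.path
puts mined.kind
```

Keeping fetch as a one-line wrapper leaves the work in a single class and gives callers a short entry point.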