Class: Pdfcrowd::PdfToHtmlClient

Inherits:
Object
  • Object
show all
Defined in:
lib/pdfcrowd.rb

Overview

Conversion from PDF to HTML.

Instance Method Summary collapse

Constructor Details

#initialize(user_name, api_key) ⇒ PdfToHtmlClient

Constructor for the Pdfcrowd API client.

  • user_name - Your username at Pdfcrowd.

  • api_key - Your API key.



5141
5142
5143
5144
5145
5146
5147
5148
5149
5150
# File 'lib/pdfcrowd.rb', line 5141

def initialize(user_name, api_key)
    @helper = ConnectionHelper.new(user_name, api_key)
    @fields = {
        'input_format'=>'pdf',
        'output_format'=>'html'
    }
    @file_id = 1
    @files = {}
    @raw_data = {}
end

Instance Method Details

#convertFile(file) ⇒ Object

Convert a local file.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • Returns - Byte array containing the conversion output.



5206
5207
5208
5209
5210
5211
5212
5213
# File 'lib/pdfcrowd.rb', line 5206

def convertFile(file)
    if (!(File.file?(file) && !File.zero?(file)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file, "convertFile", "pdf-to-html", "The file must exist and not be empty.", "convert_file"), 470);
    end
    
    @files['file'] = file
    @helper.post(@fields, @files, @raw_data)
end

#convertFileToFile(file, file_path) ⇒ Object

Convert a local file and write the result to a local file.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.



5232
5233
5234
5235
5236
5237
5238
5239
5240
5241
5242
5243
5244
5245
5246
5247
5248
5249
5250
# File 'lib/pdfcrowd.rb', line 5232

def convertFileToFile(file, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertFileToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_file_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertFileToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_file_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertFileToStream(file, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertFileToStream(file, out_stream) ⇒ Object

Convert a local file and write the result to an output stream.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • out_stream - The output stream that will contain the conversion output.



5219
5220
5221
5222
5223
5224
5225
5226
# File 'lib/pdfcrowd.rb', line 5219

def convertFileToStream(file, out_stream)
    if (!(File.file?(file) && !File.zero?(file)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file, "convertFileToStream::file", "pdf-to-html", "The file must exist and not be empty.", "convert_file_to_stream"), 470);
    end
    
    @files['file'] = file
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertRawData(data) ⇒ Object

Convert raw data.

  • data - The raw content to be converted.

  • Returns - Byte array with the output.



5256
5257
5258
5259
# File 'lib/pdfcrowd.rb', line 5256

def convertRawData(data)
    @raw_data['file'] = data
    @helper.post(@fields, @files, @raw_data)
end

#convertRawDataToFile(data, file_path) ⇒ Object

Convert raw data to a file.

  • data - The raw content to be converted.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.



5274
5275
5276
5277
5278
5279
5280
5281
5282
5283
5284
5285
5286
5287
5288
5289
5290
5291
5292
# File 'lib/pdfcrowd.rb', line 5274

def convertRawDataToFile(data, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertRawDataToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_raw_data_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertRawDataToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_raw_data_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertRawDataToStream(data, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertRawDataToStream(data, out_stream) ⇒ Object

Convert raw data and write the result to an output stream.

  • data - The raw content to be converted.

  • out_stream - The output stream that will contain the conversion output.



5265
5266
5267
5268
# File 'lib/pdfcrowd.rb', line 5265

def convertRawDataToStream(data, out_stream)
    @raw_data['file'] = data
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertStream(in_stream) ⇒ Object

Convert the contents of an input stream.

  • in_stream - The input stream with source data.

  • Returns - Byte array containing the conversion output.



5298
5299
5300
5301
# File 'lib/pdfcrowd.rb', line 5298

def convertStream(in_stream)
    @raw_data['stream'] = in_stream.read
    @helper.post(@fields, @files, @raw_data)
end

#convertStreamToFile(in_stream, file_path) ⇒ Object

Convert the contents of an input stream and write the result to a local file.

  • in_stream - The input stream with source data.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.



5316
5317
5318
5319
5320
5321
5322
5323
5324
5325
5326
5327
5328
5329
5330
5331
5332
5333
5334
# File 'lib/pdfcrowd.rb', line 5316

def convertStreamToFile(in_stream, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertStreamToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_stream_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertStreamToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_stream_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertStreamToStream(in_stream, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertStreamToStream(in_stream, out_stream) ⇒ Object

Convert the contents of an input stream and write the result to an output stream.

  • in_stream - The input stream with source data.

  • out_stream - The output stream that will contain the conversion output.



5307
5308
5309
5310
# File 'lib/pdfcrowd.rb', line 5307

def convertStreamToStream(in_stream, out_stream)
    @raw_data['stream'] = in_stream.read
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertUrl(url) ⇒ Object

Convert a PDF.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • Returns - Byte array containing the conversion output.



5156
5157
5158
5159
5160
5161
5162
5163
# File 'lib/pdfcrowd.rb', line 5156

def convertUrl(url)
    unless /(?i)^https?:\/\/.*$/.match(url)
        raise Error.new(Pdfcrowd.create_invalid_value_message(url, "convertUrl", "pdf-to-html", "The supported protocols are http:// and https://.", "convert_url"), 470);
    end
    
    @fields['url'] = url
    @helper.post(@fields, @files, @raw_data)
end

#convertUrlToFile(url, file_path) ⇒ Object

Convert a PDF and write the result to a local file.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.



5182
5183
5184
5185
5186
5187
5188
5189
5190
5191
5192
5193
5194
5195
5196
5197
5198
5199
5200
# File 'lib/pdfcrowd.rb', line 5182

def convertUrlToFile(url, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertUrlToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_url_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertUrlToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_url_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertUrlToStream(url, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertUrlToStream(url, out_stream) ⇒ Object

Convert a PDF and write the result to an output stream.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • out_stream - The output stream that will contain the conversion output.



5169
5170
5171
5172
5173
5174
5175
5176
# File 'lib/pdfcrowd.rb', line 5169

def convertUrlToStream(url, out_stream)
    unless /(?i)^https?:\/\/.*$/.match(url)
        raise Error.new(Pdfcrowd.create_invalid_value_message(url, "convertUrlToStream::url", "pdf-to-html", "The supported protocols are http:// and https://.", "convert_url_to_stream"), 470);
    end
    
    @fields['url'] = url
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#getConsumedCreditCountObject

Get the number of credits consumed by the last conversion.

  • Returns - The number of credits.



5487
5488
5489
# File 'lib/pdfcrowd.rb', line 5487

def getConsumedCreditCount()
    return @helper.getConsumedCreditCount()
end

#getDebugLogUrlObject

Get the URL of the debug log for the last conversion.

  • Returns - The link to the debug log.



5472
5473
5474
# File 'lib/pdfcrowd.rb', line 5472

def getDebugLogUrl()
    return @helper.getDebugLogUrl()
end

#getJobIdObject

Get the job id.

  • Returns - The unique job identifier.



5493
5494
5495
# File 'lib/pdfcrowd.rb', line 5493

def getJobId()
    return @helper.getJobId()
end

#getOutputSizeObject

Get the size of the output in bytes.

  • Returns - The count of bytes.



5505
5506
5507
# File 'lib/pdfcrowd.rb', line 5505

def getOutputSize()
    return @helper.getOutputSize()
end

#getPageCountObject

Get the number of pages in the output document.

  • Returns - The page count.



5499
5500
5501
# File 'lib/pdfcrowd.rb', line 5499

def getPageCount()
    return @helper.getPageCount()
end

#getRemainingCreditCountObject

Get the number of conversion credits available in your account. This method can only be called after a call to one of the convertXtoY methods. The returned value can differ from the actual count if you run parallel conversions. The special value 999999 is returned if the information is not available.

  • Returns - The number of credits.



5481
5482
5483
# File 'lib/pdfcrowd.rb', line 5481

def getRemainingCreditCount()
    return @helper.getRemainingCreditCount()
end

#getVersionObject

Get the version details.

  • Returns - API version, converter version, and client version.



5511
5512
5513
# File 'lib/pdfcrowd.rb', line 5511

def getVersion()
    return "client " + CLIENT_VERSION + ", API v2, converter " + @helper.getConverterVersion()
end

#isZippedOutputObject

A helper method to determine if the output file is a zip archive. The output of the conversion may be either an HTML file or a zip file containing the HTML and its external assets.

  • Returns - True if the conversion output is a zip file, otherwise False.



5412
5413
5414
# File 'lib/pdfcrowd.rb', line 5412

def isZippedOutput()
    @fields.fetch('image_mode', '') == 'separate' || @fields.fetch('css_mode', '') == 'separate' || @fields.fetch('font_mode', '') == 'separate' || @fields.fetch('force_zip', false) == true
end

#setAuthor(author) ⇒ Object

Set the HTML author. The author from the input PDF is used by default.

  • author - The HTML author.

  • Returns - The converter object.



5447
5448
5449
5450
# File 'lib/pdfcrowd.rb', line 5447

def setAuthor(author)
    @fields['author'] = author
    self
end

#setCssMode(mode) ⇒ Object

Specifies where the style sheets are stored.

  • mode - The style sheet storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



5388
5389
5390
5391
5392
5393
5394
5395
# File 'lib/pdfcrowd.rb', line 5388

def setCssMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setCssMode", "pdf-to-html", "Allowed values are embed, separate.", "set_css_mode"), 470);
    end
    
    @fields['css_mode'] = mode
    self
end

#setDebugLog(value) ⇒ Object

Turn on the debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the getDebugLogUrl method or available in conversion statistics.

  • value - Set to true to enable the debug logging.

  • Returns - The converter object.



5465
5466
5467
5468
# File 'lib/pdfcrowd.rb', line 5465

def setDebugLog(value)
    @fields['debug_log'] = value
    self
end

#setFontMode(mode) ⇒ Object

Specifies where the fonts are stored.

  • mode - The font storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



5401
5402
5403
5404
5405
5406
5407
5408
# File 'lib/pdfcrowd.rb', line 5401

def setFontMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setFontMode", "pdf-to-html", "Allowed values are embed, separate.", "set_font_mode"), 470);
    end
    
    @fields['font_mode'] = mode
    self
end

#setForceZip(value) ⇒ Object

Enforces the zip output format.

  • value - Set to true to get the output as a zip archive.

  • Returns - The converter object.



5420
5421
5422
5423
# File 'lib/pdfcrowd.rb', line 5420

def setForceZip(value)
    @fields['force_zip'] = value
    self
end

#setHttpProxy(proxy) ⇒ Object

A proxy server used by Pdfcrowd conversion process for accessing the source URLs with HTTP scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.

  • proxy - The value must have format DOMAIN_OR_IP_ADDRESS:PORT.

  • Returns - The converter object.



5528
5529
5530
5531
5532
5533
5534
5535
# File 'lib/pdfcrowd.rb', line 5528

def setHttpProxy(proxy)
    unless /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/.match(proxy)
        raise Error.new(Pdfcrowd.create_invalid_value_message(proxy, "setHttpProxy", "pdf-to-html", "The value must have format DOMAIN_OR_IP_ADDRESS:PORT.", "set_http_proxy"), 470);
    end
    
    @fields['http_proxy'] = proxy
    self
end

#setHttpsProxy(proxy) ⇒ Object

A proxy server used by Pdfcrowd conversion process for accessing the source URLs with HTTPS scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.

  • proxy - The value must have format DOMAIN_OR_IP_ADDRESS:PORT.

  • Returns - The converter object.



5541
5542
5543
5544
5545
5546
5547
5548
# File 'lib/pdfcrowd.rb', line 5541

def setHttpsProxy(proxy)
    unless /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/.match(proxy)
        raise Error.new(Pdfcrowd.create_invalid_value_message(proxy, "setHttpsProxy", "pdf-to-html", "The value must have format DOMAIN_OR_IP_ADDRESS:PORT.", "set_https_proxy"), 470);
    end
    
    @fields['https_proxy'] = proxy
    self
end

#setImageMode(mode) ⇒ Object

Specifies where the images are stored.

  • mode - The image storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



5375
5376
5377
5378
5379
5380
5381
5382
# File 'lib/pdfcrowd.rb', line 5375

def setImageMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setImageMode", "pdf-to-html", "Allowed values are embed, separate.", "set_image_mode"), 470);
    end
    
    @fields['image_mode'] = mode
    self
end

#setKeywords(keywords) ⇒ Object

Associate keywords with the HTML document. Keywords from the input PDF are used by default.

  • keywords - The string containing the keywords.

  • Returns - The converter object.



5456
5457
5458
5459
# File 'lib/pdfcrowd.rb', line 5456

def setKeywords(keywords)
    @fields['keywords'] = keywords
    self
end

#setPdfPassword(password) ⇒ Object

Password to open the encrypted PDF file.

  • password - The input PDF password.

  • Returns - The converter object.



5340
5341
5342
5343
# File 'lib/pdfcrowd.rb', line 5340

def setPdfPassword(password)
    @fields['pdf_password'] = password
    self
end

#setPrintPageRange(pages) ⇒ Object

Set the page range to print.

  • pages - A comma separated list of page numbers or ranges.

  • Returns - The converter object.



5362
5363
5364
5365
5366
5367
5368
5369
# File 'lib/pdfcrowd.rb', line 5362

def setPrintPageRange(pages)
    unless /^(?:\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*,\s*)*\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*$/.match(pages)
        raise Error.new(Pdfcrowd.create_invalid_value_message(pages, "setPrintPageRange", "pdf-to-html", "A comma separated list of page numbers or ranges.", "set_print_page_range"), 470);
    end
    
    @fields['print_page_range'] = pages
    self
end

#setProxy(host, port, user_name, password) ⇒ Object

Specifies an HTTP proxy that the API client library will use to connect to the internet.

  • host - The proxy hostname.

  • port - The proxy port.

  • user_name - The username.

  • password - The password.

  • Returns - The converter object.



5576
5577
5578
5579
# File 'lib/pdfcrowd.rb', line 5576

def setProxy(host, port, user_name, password)
    @helper.setProxy(host, port, user_name, password)
    self
end

#setRetryCount(count) ⇒ Object

Specifies the number of automatic retries when the 502 HTTP status code is received. The 502 status code indicates a temporary network issue. This feature can be disabled by setting to 0.

  • count - Number of retries.

  • Returns - The converter object.



5585
5586
5587
5588
# File 'lib/pdfcrowd.rb', line 5585

def setRetryCount(count)
    @helper.setRetryCount(count)
    self
end

#setScaleFactor(factor) ⇒ Object

Set the scaling factor (zoom) for the main page area.

  • factor - The percentage value. Must be a positive integer number.

  • Returns - The converter object.



5349
5350
5351
5352
5353
5354
5355
5356
# File 'lib/pdfcrowd.rb', line 5349

def setScaleFactor(factor)
    if (!(Integer(factor) > 0))
        raise Error.new(Pdfcrowd.create_invalid_value_message(factor, "setScaleFactor", "pdf-to-html", "Must be a positive integer number.", "set_scale_factor"), 470);
    end
    
    @fields['scale_factor'] = factor
    self
end

#setSubject(subject) ⇒ Object

Set the HTML subject. The subject from the input PDF is used by default.

  • subject - The HTML subject.

  • Returns - The converter object.



5438
5439
5440
5441
# File 'lib/pdfcrowd.rb', line 5438

def setSubject(subject)
    @fields['subject'] = subject
    self
end

#setTag(tag) ⇒ Object

Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.

  • tag - A string with the custom tag.

  • Returns - The converter object.



5519
5520
5521
5522
# File 'lib/pdfcrowd.rb', line 5519

def setTag(tag)
    @fields['tag'] = tag
    self
end

#setTitle(title) ⇒ Object

Set the HTML title. The title from the input PDF is used by default.

  • title - The HTML title.

  • Returns - The converter object.



5429
5430
5431
5432
# File 'lib/pdfcrowd.rb', line 5429

def setTitle(title)
    @fields['title'] = title
    self
end

#setUseHttp(value) ⇒ Object

Specifies if the client communicates over HTTP or HTTPS with Pdfcrowd API. Warning: Using HTTP is insecure as data sent over HTTP is not encrypted. Enable this option only if you know what you are doing.

  • value - Set to true to use HTTP.

  • Returns - The converter object.



5555
5556
5557
5558
# File 'lib/pdfcrowd.rb', line 5555

def setUseHttp(value)
    @helper.setUseHttp(value)
    self
end

#setUserAgent(agent) ⇒ Object

Set a custom user agent HTTP header. It can be useful if you are behind a proxy or a firewall.

  • agent - The user agent string.

  • Returns - The converter object.



5564
5565
5566
5567
# File 'lib/pdfcrowd.rb', line 5564

def setUserAgent(agent)
    @helper.setUserAgent(agent)
    self
end