Class: Pdfcrowd::PdfToHtmlClient

Inherits:
Object
Defined in:
lib/pdfcrowd.rb

Overview

Conversion from PDF to HTML.


Constructor Details

#initialize(user_name, api_key) ⇒ PdfToHtmlClient

Constructor for the Pdfcrowd API client.

  • user_name - Your username at Pdfcrowd.

  • api_key - Your API key.



# File 'lib/pdfcrowd.rb', line 5302

def initialize(user_name, api_key)
    @helper = ConnectionHelper.new(user_name, api_key)
    @fields = {
        'input_format'=>'pdf',
        'output_format'=>'html'
    }
    @file_id = 1
    @files = {}
    @raw_data = {}
end

Instance Method Details

#convertFile(file) ⇒ Object

Convert a local file.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • Returns - Byte array containing the conversion output.



# File 'lib/pdfcrowd.rb', line 5367

def convertFile(file)
    if (!(File.file?(file) && !File.zero?(file)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file, "convertFile", "pdf-to-html", "The file must exist and not be empty.", "convert_file"), 470);
    end
    
    @files['file'] = file
    @helper.post(@fields, @files, @raw_data)
end
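
The input check at the top of convertFile (repeated in convertFileToStream) accepts a path only if it names an existing regular file with a non-zero size. The stand-alone sketch below restates that guard so it can be tried without the gem. The name valid_input_file? is hypothetical.

```ruby
require "tempfile"

# Same guard as convertFile/convertFileToStream: an existing regular file
# whose size is not zero.
def valid_input_file?(path)
  File.file?(path) && !File.zero?(path)
end

Tempfile.create(["input", ".pdf"]) do |f|
  puts valid_input_file?(f.path)  # the new temp file is empty, so false
  f.write("%PDF-1.7\n")
  f.flush
  puts valid_input_file?(f.path)  # now non-empty, so true
end

puts valid_input_file?("/no/such/file.pdf")  # missing, so false
```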

#convertFileToFile(file, file_path) ⇒ Object

Convert a local file and write the result to a local file.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If a ZIP file is generated, the file path must have a zip or ZIP extension.



# File 'lib/pdfcrowd.rb', line 5393

def convertFileToFile(file, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertFileToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_file_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertFileToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_file_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertFileToStream(file, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end
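
convertFileToFile and the other *ToFile methods follow the same steps. They open the output, convert into it, and delete the file if an Error is raised. Exceptions of other classes skip that cleanup, so the handle is not closed and a partial file can remain.

The following sketch shows one way to write the open/convert/delete pattern so that the file is always closed and is removed on any StandardError. The helper write_or_remove is hypothetical and is not part of the gem.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: yields an open binary file and deletes it if the block
# raises, so a failed conversion never leaves a partial output file behind.
def write_or_remove(path)
  File.open(path, "wb") { |io| yield io }  # block form always closes the file
rescue StandardError
  FileUtils.rm_f(path)  # remove whatever was written; ignore if absent
  raise                 # re-raise the original exception
end

out = File.join(Dir.tmpdir, "example.html")
write_or_remove(out) { |io| io.write("<html></html>") }
puts File.read(out)
```

With the client, the block body would be a call such as client.convertFileToStream("input.pdf", io).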

#convertFileToStream(file, out_stream) ⇒ Object

Convert a local file and write the result to an output stream.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • out_stream - The output stream that will contain the conversion output.



# File 'lib/pdfcrowd.rb', line 5380

def convertFileToStream(file, out_stream)
    if (!(File.file?(file) && !File.zero?(file)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file, "convertFileToStream::file", "pdf-to-html", "The file must exist and not be empty.", "convert_file_to_stream"), 470);
    end
    
    @files['file'] = file
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertRawData(data) ⇒ Object

Convert raw data.

  • data - The raw content to be converted.

  • Returns - Byte array containing the conversion output.



# File 'lib/pdfcrowd.rb', line 5417

def convertRawData(data)
    @raw_data['file'] = data
    @helper.post(@fields, @files, @raw_data)
end

#convertRawDataToFile(data, file_path) ⇒ Object

Convert raw data to a file.

  • data - The raw content to be converted.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If a ZIP file is generated, the file path must have a zip or ZIP extension.



# File 'lib/pdfcrowd.rb', line 5435

def convertRawDataToFile(data, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertRawDataToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_raw_data_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertRawDataToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_raw_data_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertRawDataToStream(data, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end
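
isOutputTypeValid itself is not listed on this page, so its exact behavior is not shown. A hypothetical predicate that encodes only the documented rule could look like the sketch below. The rule is: the path must not be empty, and it needs a zip or ZIP extension when the output is zipped. The zipped: keyword stands in for the result of isZippedOutput.

```ruby
# Hypothetical: encodes only the documented rule; the gem's isOutputTypeValid
# may check more, or check differently.
def output_path_ok?(path, zipped:)
  return false if path.nil? || path.empty?
  return true unless zipped            # not zipped: no documented extension rule
  File.extname(path).casecmp?(".zip")  # zipped output: .zip or .ZIP
end

puts output_path_ok?("report.html", zipped: false)  # true
puts output_path_ok?("report.ZIP", zipped: true)    # true
puts output_path_ok?("report.html", zipped: true)   # false
```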

#convertRawDataToStream(data, out_stream) ⇒ Object

Convert raw data and write the result to an output stream.

  • data - The raw content to be converted.

  • out_stream - The output stream that will contain the conversion output.



# File 'lib/pdfcrowd.rb', line 5426

def convertRawDataToStream(data, out_stream)
    @raw_data['file'] = data
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertStream(in_stream) ⇒ Object

Convert the contents of an input stream.

  • in_stream - The input stream with source data.

  • Returns - Byte array containing the conversion output.



# File 'lib/pdfcrowd.rb', line 5459

def convertStream(in_stream)
    @raw_data['stream'] = in_stream.read
    @helper.post(@fields, @files, @raw_data)
end
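
convertStream and convertStreamToStream drain the input with in_stream.read. As a result, any object that responds to read can be passed, and the entire source is held in memory as one String before upload.

The sketch uses StringIO to show that contract. The name read_source is hypothetical. For a real PDF on disk, open it in binary mode ("rb") so that no newline translation occurs on Windows.

```ruby
require "stringio"

# Same contract as convertStream: the source is drained with #read, so the
# whole document is loaded into memory as a single String.
def read_source(in_stream)
  in_stream.read
end

source = StringIO.new("%PDF-1.7\n")
data = read_source(source)
puts data.bytesize  # 9
```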

#convertStreamToFile(in_stream, file_path) ⇒ Object

Convert the contents of an input stream and write the result to a local file.

  • in_stream - The input stream with source data.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If a ZIP file is generated, the file path must have a zip or ZIP extension.



# File 'lib/pdfcrowd.rb', line 5477

def convertStreamToFile(in_stream, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertStreamToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_stream_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertStreamToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_stream_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertStreamToStream(in_stream, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertStreamToStream(in_stream, out_stream) ⇒ Object

Convert the contents of an input stream and write the result to an output stream.

  • in_stream - The input stream with source data.

  • out_stream - The output stream that will contain the conversion output.



# File 'lib/pdfcrowd.rb', line 5468

def convertStreamToStream(in_stream, out_stream)
    @raw_data['stream'] = in_stream.read
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertUrl(url) ⇒ Object

Convert a PDF available at a URL.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • Returns - Byte array containing the conversion output.



# File 'lib/pdfcrowd.rb', line 5317

def convertUrl(url)
    unless /(?i)^https?:\/\/.*$/.match(url)
        raise Error.new(Pdfcrowd.create_invalid_value_message(url, "convertUrl", "pdf-to-html", "The supported protocols are http:// and https://.", "convert_url"), 470);
    end
    
    @fields['url'] = url
    @helper.post(@fields, @files, @raw_data)
end
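
convertUrl and convertUrlToStream accept the URL when it matches `/(?i)^https?:\/\/.*$/`. In Ruby, `^` and `$` match at line boundaries, so a multi-line string passes as long as one of its lines starts with http:// or https://.

The sketch compares that pattern with a stricter whole-string variant anchored with `\A` and `\z`. The stricter pattern is a suggestion for callers who pre-validate input, not what the gem uses.

```ruby
# The pattern used by convertUrl and convertUrlToStream.
LIBRARY_URL_RE = /(?i)^https?:\/\/.*$/
# A stricter variant anchored to the whole string.
STRICT_URL_RE = %r{\Ahttps?://\S+\z}i

puts LIBRARY_URL_RE.match?("HTTPS://example.com/doc.pdf")  # true
puts LIBRARY_URL_RE.match?("ftp://example.com/doc.pdf")    # false
# ^ and $ are line anchors, so this multi-line string still matches:
puts LIBRARY_URL_RE.match?("ftp://x\nhttp://example.com")  # true
puts STRICT_URL_RE.match?("ftp://x\nhttp://example.com")   # false
```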

#convertUrlToFile(url, file_path) ⇒ Object

Convert a PDF available at a URL and write the result to a local file.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If a ZIP file is generated, the file path must have a zip or ZIP extension.



# File 'lib/pdfcrowd.rb', line 5343

def convertUrlToFile(url, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertUrlToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_url_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertUrlToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_url_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertUrlToStream(url, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertUrlToStream(url, out_stream) ⇒ Object

Convert a PDF available at a URL and write the result to an output stream.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • out_stream - The output stream that will contain the conversion output.



# File 'lib/pdfcrowd.rb', line 5330

def convertUrlToStream(url, out_stream)
    unless /(?i)^https?:\/\/.*$/.match(url)
        raise Error.new(Pdfcrowd.create_invalid_value_message(url, "convertUrlToStream::url", "pdf-to-html", "The supported protocols are http:// and https://.", "convert_url_to_stream"), 470);
    end
    
    @fields['url'] = url
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#getConsumedCreditCount ⇒ Object

Get the number of credits consumed by the last conversion.

  • Returns - The number of credits.



# File 'lib/pdfcrowd.rb', line 5648

def getConsumedCreditCount()
    return @helper.getConsumedCreditCount()
end

#getDebugLogUrl ⇒ Object

Get the URL of the debug log for the last conversion.

  • Returns - The link to the debug log.



# File 'lib/pdfcrowd.rb', line 5633

def getDebugLogUrl()
    return @helper.getDebugLogUrl()
end

#getJobId ⇒ Object

Get the job ID.

  • Returns - The unique job identifier.



# File 'lib/pdfcrowd.rb', line 5654

def getJobId()
    return @helper.getJobId()
end

#getOutputSize ⇒ Object

Get the size of the output in bytes.

  • Returns - The count of bytes.



# File 'lib/pdfcrowd.rb', line 5666

def getOutputSize()
    return @helper.getOutputSize()
end

#getPageCount ⇒ Object

Get the number of pages in the output document.

  • Returns - The page count.



# File 'lib/pdfcrowd.rb', line 5660

def getPageCount()
    return @helper.getPageCount()
end

#getRemainingCreditCount ⇒ Object

Get the number of conversion credits available in your account. This method can only be called after a call to one of the convertXtoY methods. The returned value can differ from the actual count if you run parallel conversions. The special value 999999 is returned if the information is not available.

  • Returns - The number of credits.



# File 'lib/pdfcrowd.rb', line 5642

def getRemainingCreditCount()
    return @helper.getRemainingCreditCount()
end
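
999999 is a sentinel rather than a real balance, so it can help to translate it before using the value. The sketch below is minimal. The helper remaining_credits_or_nil is hypothetical; in practice its argument would be the return value of getRemainingCreditCount.

```ruby
# 999999 is the documented "not available" value of getRemainingCreditCount.
UNKNOWN_CREDITS = 999_999

# Hypothetical helper: nil when the balance is unknown, the count otherwise.
def remaining_credits_or_nil(count)
  count == UNKNOWN_CREDITS ? nil : count
end

puts remaining_credits_or_nil(1250).inspect     # 1250
puts remaining_credits_or_nil(999_999).inspect  # nil
```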

#getVersion ⇒ Object

Get the version details.

  • Returns - API version, converter version, and client version.



# File 'lib/pdfcrowd.rb', line 5672

def getVersion()
    return "client " + CLIENT_VERSION + ", API v2, converter " + @helper.getConverterVersion()
end
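
getVersion returns a single comma-separated string. The sketch rebuilds that format from placeholder values and parses it back. It assumes the format stays exactly as the method constructs it. Both helper names are hypothetical.

```ruby
# Builds the same format as getVersion; the version numbers are placeholders.
def version_string(client_version, converter_version)
  "client #{client_version}, API v2, converter #{converter_version}"
end

# Hypothetical parser, assuming the format above does not change.
def parse_version(str)
  m = str.match(/\Aclient (\S+), API (\S+), converter (.+)\z/)
  m && { client: m[1], api: m[2], converter: m[3] }
end

s = version_string("1.0.0", "24.04")
puts s
p parse_version(s)
```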

#isZippedOutput ⇒ Object

A helper method to determine if the output file is a zip archive. The output of the conversion may be either an HTML file or a zip file containing the HTML and its external assets.

  • Returns - true if the conversion output is a zip file, otherwise false.



# File 'lib/pdfcrowd.rb', line 5573

def isZippedOutput()
    @fields.fetch('image_mode', '') == 'separate' || @fields.fetch('css_mode', '') == 'separate' || @fields.fetch('font_mode', '') == 'separate' || @fields.fetch('force_zip', false) == true
end
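
isZippedOutput returns true in two cases: when any of image_mode, css_mode or font_mode is 'separate', or when force_zip is exactly true. The sketch restates that logic over a plain Hash so it can be run without the gem. The name zipped_output? is hypothetical.

The comparison is case-sensitive, while setCssMode, setFontMode and setImageMode accept values in any case. Passing lowercase values keeps the two consistent.

```ruby
# Pure-Ruby restatement of isZippedOutput; the keys match the gem's @fields keys.
def zipped_output?(fields)
  %w[image_mode css_mode font_mode].any? { |k| fields.fetch(k, "") == "separate" } ||
    fields.fetch("force_zip", false) == true
end

puts zipped_output?({ "css_mode" => "embed" })       # false
puts zipped_output?({ "image_mode" => "separate" })  # true
puts zipped_output?({ "force_zip" => true })         # true
puts zipped_output?({ "font_mode" => "SEPARATE" })   # false: exact comparison
```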

#setAuthor(author) ⇒ Object

Set the HTML author. The author from the input PDF is used by default.

  • author - The HTML author.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5608

def setAuthor(author)
    @fields['author'] = author
    self
end
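
setAuthor, like every other setter on this class, returns self, so configuration calls can be chained. ChainableFields below is a hypothetical class that only illustrates the pattern.

```ruby
# Each setter stores a field and returns self, as setAuthor, setTitle, etc. do.
class ChainableFields
  attr_reader :fields

  def initialize
    @fields = {}
  end

  def set_author(author)
    @fields["author"] = author
    self
  end

  def set_title(title)
    @fields["title"] = title
    self
  end
end

f = ChainableFields.new.set_author("Jane Doe").set_title("Annual Report")
p f.fields
```

With the real client, the equivalent is client.setAuthor("Jane Doe").setTitle("Annual Report").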

#setCssMode(mode) ⇒ Object

Specifies where the style sheets are stored.

  • mode - The style sheet storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5549

def setCssMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setCssMode", "pdf-to-html", "Allowed values are embed, separate.", "set_css_mode"), 470);
    end
    
    @fields['css_mode'] = mode
    self
end
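
setCssMode, setFontMode and setImageMode all validate against the same case-insensitive pattern. The sketch is a hypothetical helper with two differences:

- It uses a similar check, but anchored to the whole string with `\A` and `\z`.
- It lowercases the value, so later exact comparisons such as the one in isZippedOutput see 'separate' rather than 'Separate'.

```ruby
# Similar to the pattern used by setCssMode, setFontMode and setImageMode,
# but anchored to the whole string.
MODE_RE = /\A(embed|separate)\z/i

# Hypothetical helper: validate, then lowercase to the canonical spelling.
def normalize_mode(mode)
  raise ArgumentError, "Allowed values are embed, separate." unless MODE_RE.match?(mode)
  mode.downcase
end

puts normalize_mode("Separate")  # separate
```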

#setDebugLog(value) ⇒ Object

Turn on debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the getDebugLogUrl method or found in the conversion statistics.

  • value - Set to true to enable the debug logging.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5626

def setDebugLog(value)
    @fields['debug_log'] = value
    self
end

#setFontMode(mode) ⇒ Object

Specifies where the fonts are stored.

  • mode - The font storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5562

def setFontMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setFontMode", "pdf-to-html", "Allowed values are embed, separate.", "set_font_mode"), 470);
    end
    
    @fields['font_mode'] = mode
    self
end

#setForceZip(value) ⇒ Object

Enforces the zip output format.

  • value - Set to true to get the output as a zip archive.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5581

def setForceZip(value)
    @fields['force_zip'] = value
    self
end

#setHttpProxy(proxy) ⇒ Object

A proxy server used by the Pdfcrowd conversion process to access source URLs with the HTTP scheme. It can help circumvent regional restrictions or provide limited access to your intranet.

  • proxy - The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5689

def setHttpProxy(proxy)
    unless /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/.match(proxy)
        raise Error.new(Pdfcrowd.create_invalid_value_message(proxy, "setHttpProxy", "pdf-to-html", "The value must have format DOMAIN_OR_IP_ADDRESS:PORT.", "set_http_proxy"), 470);
    end
    
    @fields['http_proxy'] = proxy
    self
end
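
setHttpProxy and setHttpsProxy validate the value with the same pattern. Read closely, the host part needs at least one dot, so a bare name such as localhost is rejected. There is also no room for a scheme or credentials. The sketch runs the pattern from the source against a few sample values.

```ruby
# The host:port pattern used by setHttpProxy and setHttpsProxy.
PROXY_RE = /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/

%w[proxy.example.com:8080 10.0.0.1:3128 localhost:3128 proxy.example.com].each do |v|
  puts "#{v} -> #{PROXY_RE.match?(v)}"
end
```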

#setHttpsProxy(proxy) ⇒ Object

A proxy server used by the Pdfcrowd conversion process to access source URLs with the HTTPS scheme. It can help circumvent regional restrictions or provide limited access to your intranet.

  • proxy - The value must have the format DOMAIN_OR_IP_ADDRESS:PORT.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5702

def setHttpsProxy(proxy)
    unless /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/.match(proxy)
        raise Error.new(Pdfcrowd.create_invalid_value_message(proxy, "setHttpsProxy", "pdf-to-html", "The value must have format DOMAIN_OR_IP_ADDRESS:PORT.", "set_https_proxy"), 470);
    end
    
    @fields['https_proxy'] = proxy
    self
end

#setImageMode(mode) ⇒ Object

Specifies where the images are stored.

  • mode - The image storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5536

def setImageMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setImageMode", "pdf-to-html", "Allowed values are embed, separate.", "set_image_mode"), 470);
    end
    
    @fields['image_mode'] = mode
    self
end

#setKeywords(keywords) ⇒ Object

Associate keywords with the HTML document. Keywords from the input PDF are used by default.

  • keywords - The string containing the keywords.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5617

def setKeywords(keywords)
    @fields['keywords'] = keywords
    self
end

#setPdfPassword(password) ⇒ Object

Password to open the encrypted PDF file.

  • password - The input PDF password.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5501

def setPdfPassword(password)
    @fields['pdf_password'] = password
    self
end

#setPrintPageRange(pages) ⇒ Object

Set the page range to print.

  • pages - A comma-separated list of page numbers or ranges.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5523

def setPrintPageRange(pages)
    unless /^(?:\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*,\s*)*\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*$/.match(pages)
        raise Error.new(Pdfcrowd.create_invalid_value_message(pages, "setPrintPageRange", "pdf-to-html", "A comma separated list of page numbers or ranges.", "set_print_page_range"), 470);
    end
    
    @fields['print_page_range'] = pages
    self
end
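
The page-range pattern is dense, so the sketch rebuilds the same expression from a named item sub-pattern and checks a few inputs. The meaning is unchanged, because `-` is literal outside a character class and the original's `\-` therefore matches the same thing.

Open-ended forms such as 3- and -4 are accepted by the pattern. How the service interprets them is not stated on this page.

```ruby
# One item: a page ("5"), a range ("3-7"), or an open-ended range ("-4", "3-").
PAGE_ITEM = /(?:\d+|(?:\d*\s*-\s*\d+)|(?:\d+\s*-\s*\d*))/
# Zero or more "item," prefixes followed by a final item, with optional spaces.
PAGE_RANGE_RE = /^(?:\s*#{PAGE_ITEM}\s*,\s*)*\s*#{PAGE_ITEM}\s*$/

["1-3,5", "3-", "-4", " 2 , 7-9 ", "1,,2", "-", ""].each do |pages|
  puts "#{pages.inspect} -> #{PAGE_RANGE_RE.match?(pages)}"
end
```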

#setProxy(host, port, user_name, password) ⇒ Object

Specifies an HTTP proxy that the API client library will use to connect to the internet.

  • host - The proxy hostname.

  • port - The proxy port.

  • user_name - The username.

  • password - The password.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5737

def setProxy(host, port, user_name, password)
    @helper.setProxy(host, port, user_name, password)
    self
end
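
setProxy configures how the client library itself connects to Pdfcrowd. setHttpProxy and setHttpsProxy, by contrast, are used by the conversion service when it fetches source URLs.

The sketch shows how the same four values map onto Ruby's standard Net::HTTP. Whether ConnectionHelper builds its connection this way is an assumption. The host names and credentials are placeholders, and no network connection is opened.

```ruby
require "net/http"

# Proxy host, port, user name and password passed to Net::HTTP.new;
# creating the object does not connect anywhere.
http = Net::HTTP.new("api.example.com", 443,
                     "proxy.example.com", 8080, "proxyuser", "secret")

puts http.proxy?         # true
puts http.proxy_address  # proxy.example.com
puts http.proxy_port     # 8080
```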

#setRetryCount(count) ⇒ Object

Specifies the number of automatic retries when a 502 or 503 HTTP status code is received. These status codes indicate a temporary network issue. Retries can be disabled by setting the count to 0.

  • count - Number of retries.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5746

def setRetryCount(count)
    @helper.setRetryCount(count)
    self
end
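
The retry rule can be sketched as a loop: retry while the status is 502 or 503 and retries remain, so a count of 0 means a single attempt. with_retries is hypothetical. The gem may differ in details not shown here, such as delays between attempts.

```ruby
TRANSIENT_STATUSES = [502, 503].freeze

# Hypothetical sketch: the block performs one attempt and returns its HTTP
# status; transient statuses are retried up to `retries` extra times.
def with_retries(retries)
  attempt = 0
  loop do
    status = yield attempt
    return status unless TRANSIENT_STATUSES.include?(status) && attempt < retries
    attempt += 1
  end
end

responses = [503, 502, 200]
puts with_retries(3) { |n| responses[n] }  # 200
```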

#setScaleFactor(factor) ⇒ Object

Set the scaling factor (zoom) for the main page area.

  • factor - The percentage value. Must be a positive integer.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5510

def setScaleFactor(factor)
    if (!(Integer(factor) > 0))
        raise Error.new(Pdfcrowd.create_invalid_value_message(factor, "setScaleFactor", "pdf-to-html", "Must be a positive integer number.", "set_scale_factor"), 470);
    end
    
    @fields['scale_factor'] = factor
    self
end
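
setScaleFactor converts its argument with Integer(factor). A non-numeric string therefore raises Ruby's ArgumentError rather than the library's Error with code 470.

If you would rather check the value before calling the setter, a hypothetical predicate built on Integer(value, exception: false) avoids the exception. That keyword requires Ruby 2.6 or later.

```ruby
# Hypothetical predicate: reports invalid input as false instead of raising.
def positive_integer?(value)
  Integer(value, exception: false)&.positive? || false
end

puts positive_integer?(150)    # true
puts positive_integer?("25")   # true
puts positive_integer?(0)      # false
puts positive_integer?("abc")  # false
```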

#setSubject(subject) ⇒ Object

Set the HTML subject. The subject from the input PDF is used by default.

  • subject - The HTML subject.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5599

def setSubject(subject)
    @fields['subject'] = subject
    self
end

#setTag(tag) ⇒ Object

Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.

  • tag - A string with the custom tag.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5680

def setTag(tag)
    @fields['tag'] = tag
    self
end
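
Tags longer than 32 characters are cut off, so you can compute the stored value ahead of time. The hypothetical helper below assumes the limit counts characters, as the description says, and truncates with String#[].

```ruby
# Hypothetical client-side mirror of the documented 32-character cut-off.
MAX_TAG_LENGTH = 32

def effective_tag(tag)
  tag[0, MAX_TAG_LENGTH]
end

puts effective_tag("invoice-batch-2024-q3-customer-acme-corp")
```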

#setTitle(title) ⇒ Object

Set the HTML title. The title from the input PDF is used by default.

  • title - The HTML title.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5590

def setTitle(title)
    @fields['title'] = title
    self
end

#setUseHttp(value) ⇒ Object

Specifies whether the client communicates with the Pdfcrowd API over HTTP or HTTPS. Warning: using HTTP is insecure because data sent over HTTP is not encrypted. Enable this option only if you know what you are doing.

  • value - Set to true to use HTTP.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5716

def setUseHttp(value)
    @helper.setUseHttp(value)
    self
end

#setUserAgent(agent) ⇒ Object

Set a custom user agent HTTP header. It can be useful if you are behind a proxy or a firewall.

  • agent - The user agent string.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 5725

def setUserAgent(agent)
    @helper.setUserAgent(agent)
    self
end