Class: Pdfcrowd::PdfToHtmlClient

Inherits:
Object
  • Object
show all
Defined in:
lib/pdfcrowd.rb

Overview

Conversion from PDF to HTML.

Instance Method Summary collapse

Constructor Details

#initialize(user_name, api_key) ⇒ PdfToHtmlClient

Constructor for the Pdfcrowd API client.

  • user_name - Your username at Pdfcrowd.

  • api_key - Your API key.



5415
5416
5417
5418
5419
5420
5421
5422
5423
5424
# File 'lib/pdfcrowd.rb', line 5415

def initialize(user_name, api_key)
    @helper = ConnectionHelper.new(user_name, api_key)
    @fields = {
        'input_format'=>'pdf',
        'output_format'=>'html'
    }
    @file_id = 1
    @files = {}
    @raw_data = {}
end

Instance Method Details

#convertFile(file) ⇒ Object

Convert a local file.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • Returns - Byte array containing the conversion output.



5480
5481
5482
5483
5484
5485
5486
5487
# File 'lib/pdfcrowd.rb', line 5480

def convertFile(file)
    if (!(File.file?(file) && !File.zero?(file)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file, "convertFile", "pdf-to-html", "The file must exist and not be empty.", "convert_file"), 470);
    end
    
    @files['file'] = file
    @helper.post(@fields, @files, @raw_data)
end

#convertFileToFile(file, file_path) ⇒ Object

Convert a local file and write the result to a local file.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.



5506
5507
5508
5509
5510
5511
5512
5513
5514
5515
5516
5517
5518
5519
5520
5521
5522
5523
5524
# File 'lib/pdfcrowd.rb', line 5506

def convertFileToFile(file, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertFileToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_file_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertFileToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_file_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertFileToStream(file, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertFileToStream(file, out_stream) ⇒ Object

Convert a local file and write the result to an output stream.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • out_stream - The output stream that will contain the conversion output.



5493
5494
5495
5496
5497
5498
5499
5500
# File 'lib/pdfcrowd.rb', line 5493

def convertFileToStream(file, out_stream)
    if (!(File.file?(file) && !File.zero?(file)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file, "convertFileToStream::file", "pdf-to-html", "The file must exist and not be empty.", "convert_file_to_stream"), 470);
    end
    
    @files['file'] = file
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertRawData(data) ⇒ Object

Convert raw data.

  • data - The raw content to be converted.

  • Returns - Byte array with the output.



5530
5531
5532
5533
# File 'lib/pdfcrowd.rb', line 5530

def convertRawData(data)
    @raw_data['file'] = data
    @helper.post(@fields, @files, @raw_data)
end

#convertRawDataToFile(data, file_path) ⇒ Object

Convert raw data to a file.

  • data - The raw content to be converted.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.



5548
5549
5550
5551
5552
5553
5554
5555
5556
5557
5558
5559
5560
5561
5562
5563
5564
5565
5566
# File 'lib/pdfcrowd.rb', line 5548

def convertRawDataToFile(data, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertRawDataToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_raw_data_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertRawDataToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_raw_data_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertRawDataToStream(data, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertRawDataToStream(data, out_stream) ⇒ Object

Convert raw data and write the result to an output stream.

  • data - The raw content to be converted.

  • out_stream - The output stream that will contain the conversion output.



5539
5540
5541
5542
# File 'lib/pdfcrowd.rb', line 5539

def convertRawDataToStream(data, out_stream)
    @raw_data['file'] = data
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertStream(in_stream) ⇒ Object

Convert the contents of an input stream.

  • in_stream - The input stream with source data.

  • Returns - Byte array containing the conversion output.



5572
5573
5574
5575
# File 'lib/pdfcrowd.rb', line 5572

def convertStream(in_stream)
    @raw_data['stream'] = in_stream.read
    @helper.post(@fields, @files, @raw_data)
end

#convertStreamToFile(in_stream, file_path) ⇒ Object

Convert the contents of an input stream and write the result to a local file.

  • in_stream - The input stream with source data.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.



5590
5591
5592
5593
5594
5595
5596
5597
5598
5599
5600
5601
5602
5603
5604
5605
5606
5607
5608
# File 'lib/pdfcrowd.rb', line 5590

def convertStreamToFile(in_stream, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertStreamToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_stream_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertStreamToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_stream_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertStreamToStream(in_stream, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertStreamToStream(in_stream, out_stream) ⇒ Object

Convert the contents of an input stream and write the result to an output stream.

  • in_stream - The input stream with source data.

  • out_stream - The output stream that will contain the conversion output.



5581
5582
5583
5584
# File 'lib/pdfcrowd.rb', line 5581

def convertStreamToStream(in_stream, out_stream)
    @raw_data['stream'] = in_stream.read
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertUrl(url) ⇒ Object

Convert a PDF.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • Returns - Byte array containing the conversion output.



5430
5431
5432
5433
5434
5435
5436
5437
# File 'lib/pdfcrowd.rb', line 5430

def convertUrl(url)
    unless /(?i)^https?:\/\/.*$/.match(url)
        raise Error.new(Pdfcrowd.create_invalid_value_message(url, "convertUrl", "pdf-to-html", "The supported protocols are http:// and https://.", "convert_url"), 470);
    end
    
    @fields['url'] = url
    @helper.post(@fields, @files, @raw_data)
end

#convertUrlToFile(url, file_path) ⇒ Object

Convert a PDF and write the result to a local file.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • file_path - The output file path. The string must not be empty. The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.



5456
5457
5458
5459
5460
5461
5462
5463
5464
5465
5466
5467
5468
5469
5470
5471
5472
5473
5474
# File 'lib/pdfcrowd.rb', line 5456

def convertUrlToFile(url, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertUrlToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_url_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertUrlToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_url_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertUrlToStream(url, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertUrlToStream(url, out_stream) ⇒ Object

Convert a PDF and write the result to an output stream.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • out_stream - The output stream that will contain the conversion output.



5443
5444
5445
5446
5447
5448
5449
5450
# File 'lib/pdfcrowd.rb', line 5443

def convertUrlToStream(url, out_stream)
    unless /(?i)^https?:\/\/.*$/.match(url)
        raise Error.new(Pdfcrowd.create_invalid_value_message(url, "convertUrlToStream::url", "pdf-to-html", "The supported protocols are http:// and https://.", "convert_url_to_stream"), 470);
    end
    
    @fields['url'] = url
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#getConsumedCreditCountObject

Get the number of credits consumed by the last conversion.

  • Returns - The number of credits.



5792
5793
5794
# File 'lib/pdfcrowd.rb', line 5792

def getConsumedCreditCount()
    return @helper.getConsumedCreditCount()
end

#getDebugLogUrlObject

Get the URL of the debug log for the last conversion.

  • Returns - The link to the debug log.



5777
5778
5779
# File 'lib/pdfcrowd.rb', line 5777

def getDebugLogUrl()
    return @helper.getDebugLogUrl()
end

#getJobIdObject

Get the job id.

  • Returns - The unique job identifier.



5798
5799
5800
# File 'lib/pdfcrowd.rb', line 5798

def getJobId()
    return @helper.getJobId()
end

#getOutputSizeObject

Get the size of the output in bytes.

  • Returns - The count of bytes.



5810
5811
5812
# File 'lib/pdfcrowd.rb', line 5810

def getOutputSize()
    return @helper.getOutputSize()
end

#getPageCountObject

Get the number of pages in the output document.

  • Returns - The page count.



5804
5805
5806
# File 'lib/pdfcrowd.rb', line 5804

def getPageCount()
    return @helper.getPageCount()
end

#getRemainingCreditCountObject

Get the number of conversion credits available in your account. This method can only be called after a call to one of the convertXtoY methods. The returned value can differ from the actual count if you run parallel conversions. The special value 999999 is returned if the information is not available.

  • Returns - The number of credits.



5786
5787
5788
# File 'lib/pdfcrowd.rb', line 5786

def getRemainingCreditCount()
    return @helper.getRemainingCreditCount()
end

#getVersionObject

Get the version details.

  • Returns - API version, converter version, and client version.



5816
5817
5818
# File 'lib/pdfcrowd.rb', line 5816

def getVersion()
    return "client " + CLIENT_VERSION + ", API v2, converter " + @helper.getConverterVersion()
end

#isZippedOutputObject

A helper method to determine if the output file is a zip archive. The output of the conversion may be either an HTML file or a zip file containing the HTML and its external assets.

  • Returns - True if the conversion output is a zip file, otherwise False.



5717
5718
5719
# File 'lib/pdfcrowd.rb', line 5717

def isZippedOutput()
    @fields.fetch('image_mode', '') == 'separate' || @fields.fetch('css_mode', '') == 'separate' || @fields.fetch('font_mode', '') == 'separate' || @fields.fetch('force_zip', false) == true
end

#setAuthor(author) ⇒ Object

Set the HTML author. The author from the input PDF is used by default.

  • author - The HTML author.

  • Returns - The converter object.



5752
5753
5754
5755
# File 'lib/pdfcrowd.rb', line 5752

def setAuthor(author)
    @fields['author'] = author
    self
end

#setCssMode(mode) ⇒ Object

Specifies where the style sheets are stored.

  • mode - The style sheet storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



5684
5685
5686
5687
5688
5689
5690
5691
# File 'lib/pdfcrowd.rb', line 5684

def setCssMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setCssMode", "pdf-to-html", "Allowed values are embed, separate.", "set_css_mode"), 470);
    end
    
    @fields['css_mode'] = mode
    self
end

#setDebugLog(value) ⇒ Object

Turn on the debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the getDebugLogUrl method or available in conversion statistics.

  • value - Set to true to enable the debug logging.

  • Returns - The converter object.



5770
5771
5772
5773
# File 'lib/pdfcrowd.rb', line 5770

def setDebugLog(value)
    @fields['debug_log'] = value
    self
end

#setDpi(dpi) ⇒ Object

Set the output graphics DPI.

  • dpi - The DPI value.

  • Returns - The converter object.



5649
5650
5651
5652
# File 'lib/pdfcrowd.rb', line 5649

def setDpi(dpi)
    @fields['dpi'] = dpi
    self
end

#setFontMode(mode) ⇒ Object

Specifies where the fonts are stored.

  • mode - The font storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



5697
5698
5699
5700
5701
5702
5703
5704
# File 'lib/pdfcrowd.rb', line 5697

def setFontMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setFontMode", "pdf-to-html", "Allowed values are embed, separate.", "set_font_mode"), 470);
    end
    
    @fields['font_mode'] = mode
    self
end

#setForceZip(value) ⇒ Object

Enforces the zip output format.

  • value - Set to true to get the output as a zip archive.

  • Returns - The converter object.



5725
5726
5727
5728
# File 'lib/pdfcrowd.rb', line 5725

def setForceZip(value)
    @fields['force_zip'] = value
    self
end

#setHttpProxy(proxy) ⇒ Object

A proxy server used by Pdfcrowd conversion process for accessing the source URLs with HTTP scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.

  • proxy - The value must have format DOMAIN_OR_IP_ADDRESS:PORT.

  • Returns - The converter object.



5833
5834
5835
5836
5837
5838
5839
5840
# File 'lib/pdfcrowd.rb', line 5833

def setHttpProxy(proxy)
    unless /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/.match(proxy)
        raise Error.new(Pdfcrowd.create_invalid_value_message(proxy, "setHttpProxy", "pdf-to-html", "The value must have format DOMAIN_OR_IP_ADDRESS:PORT.", "set_http_proxy"), 470);
    end
    
    @fields['http_proxy'] = proxy
    self
end

#setHttpsProxy(proxy) ⇒ Object

A proxy server used by Pdfcrowd conversion process for accessing the source URLs with HTTPS scheme. It can help to circumvent regional restrictions or provide limited access to your intranet.

  • proxy - The value must have format DOMAIN_OR_IP_ADDRESS:PORT.

  • Returns - The converter object.



5846
5847
5848
5849
5850
5851
5852
5853
# File 'lib/pdfcrowd.rb', line 5846

def setHttpsProxy(proxy)
    unless /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/.match(proxy)
        raise Error.new(Pdfcrowd.create_invalid_value_message(proxy, "setHttpsProxy", "pdf-to-html", "The value must have format DOMAIN_OR_IP_ADDRESS:PORT.", "set_https_proxy"), 470);
    end
    
    @fields['https_proxy'] = proxy
    self
end

#setImageFormat(image_format) ⇒ Object

Specifies the format for the output images.

  • image_format - The image format. Allowed values are png, jpg, svg.

  • Returns - The converter object.



5671
5672
5673
5674
5675
5676
5677
5678
# File 'lib/pdfcrowd.rb', line 5671

def setImageFormat(image_format)
    unless /(?i)^(png|jpg|svg)$/.match(image_format)
        raise Error.new(Pdfcrowd.create_invalid_value_message(image_format, "setImageFormat", "pdf-to-html", "Allowed values are png, jpg, svg.", "set_image_format"), 470);
    end
    
    @fields['image_format'] = image_format
    self
end

#setImageMode(mode) ⇒ Object

Specifies where the images are stored.

  • mode - The image storage mode. Allowed values are embed, separate, none.

  • Returns - The converter object.



5658
5659
5660
5661
5662
5663
5664
5665
# File 'lib/pdfcrowd.rb', line 5658

def setImageMode(mode)
    unless /(?i)^(embed|separate|none)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setImageMode", "pdf-to-html", "Allowed values are embed, separate, none.", "set_image_mode"), 470);
    end
    
    @fields['image_mode'] = mode
    self
end

#setKeywords(keywords) ⇒ Object

Associate keywords with the HTML document. Keywords from the input PDF are used by default.

  • keywords - The string containing the keywords.

  • Returns - The converter object.



5761
5762
5763
5764
# File 'lib/pdfcrowd.rb', line 5761

def setKeywords(keywords)
    @fields['keywords'] = keywords
    self
end

#setPdfPassword(password) ⇒ Object

Password to open the encrypted PDF file.

  • password - The input PDF password.

  • Returns - The converter object.



5614
5615
5616
5617
# File 'lib/pdfcrowd.rb', line 5614

def setPdfPassword(password)
    @fields['pdf_password'] = password
    self
end

#setPrintPageRange(pages) ⇒ Object

Set the page range to print.

  • pages - A comma separated list of page numbers or ranges.

  • Returns - The converter object.



5636
5637
5638
5639
5640
5641
5642
5643
# File 'lib/pdfcrowd.rb', line 5636

def setPrintPageRange(pages)
    unless /^(?:\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*,\s*)*\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*$/.match(pages)
        raise Error.new(Pdfcrowd.create_invalid_value_message(pages, "setPrintPageRange", "pdf-to-html", "A comma separated list of page numbers or ranges.", "set_print_page_range"), 470);
    end
    
    @fields['print_page_range'] = pages
    self
end

#setProxy(host, port, user_name, password) ⇒ Object

Specifies an HTTP proxy that the API client library will use to connect to the internet.

  • host - The proxy hostname.

  • port - The proxy port.

  • user_name - The username.

  • password - The password.

  • Returns - The converter object.



5881
5882
5883
5884
# File 'lib/pdfcrowd.rb', line 5881

def setProxy(host, port, user_name, password)
    @helper.setProxy(host, port, user_name, password)
    self
end

#setRetryCount(count) ⇒ Object

Specifies the number of automatic retries when the 502 or 503 HTTP status code is received. The status code indicates a temporary network issue. This feature can be disabled by setting to 0.

  • count - Number of retries.

  • Returns - The converter object.



5890
5891
5892
5893
# File 'lib/pdfcrowd.rb', line 5890

def setRetryCount(count)
    @helper.setRetryCount(count)
    self
end

#setScaleFactor(factor) ⇒ Object

Set the scaling factor (zoom) for the main page area.

  • factor - The percentage value. Must be a positive integer number.

  • Returns - The converter object.



5623
5624
5625
5626
5627
5628
5629
5630
# File 'lib/pdfcrowd.rb', line 5623

def setScaleFactor(factor)
    if (!(Integer(factor) > 0))
        raise Error.new(Pdfcrowd.create_invalid_value_message(factor, "setScaleFactor", "pdf-to-html", "Must be a positive integer number.", "set_scale_factor"), 470);
    end
    
    @fields['scale_factor'] = factor
    self
end

#setSplitLigatures(value) ⇒ Object

Converts ligatures, two or more letters combined into a single glyph, back into their individual ASCII characters.

  • value - Set to true to split ligatures.

  • Returns - The converter object.



5710
5711
5712
5713
# File 'lib/pdfcrowd.rb', line 5710

def setSplitLigatures(value)
    @fields['split_ligatures'] = value
    self
end

#setSubject(subject) ⇒ Object

Set the HTML subject. The subject from the input PDF is used by default.

  • subject - The HTML subject.

  • Returns - The converter object.



5743
5744
5745
5746
# File 'lib/pdfcrowd.rb', line 5743

def setSubject(subject)
    @fields['subject'] = subject
    self
end

#setTag(tag) ⇒ Object

Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.

  • tag - A string with the custom tag.

  • Returns - The converter object.



5824
5825
5826
5827
# File 'lib/pdfcrowd.rb', line 5824

def setTag(tag)
    @fields['tag'] = tag
    self
end

#setTitle(title) ⇒ Object

Set the HTML title. The title from the input PDF is used by default.

  • title - The HTML title.

  • Returns - The converter object.



5734
5735
5736
5737
# File 'lib/pdfcrowd.rb', line 5734

def setTitle(title)
    @fields['title'] = title
    self
end

#setUseHttp(value) ⇒ Object

Specifies if the client communicates over HTTP or HTTPS with Pdfcrowd API. Warning: Using HTTP is insecure as data sent over HTTP is not encrypted. Enable this option only if you know what you are doing.

  • value - Set to true to use HTTP.

  • Returns - The converter object.



5860
5861
5862
5863
# File 'lib/pdfcrowd.rb', line 5860

def setUseHttp(value)
    @helper.setUseHttp(value)
    self
end

#setUserAgent(agent) ⇒ Object

Set a custom user agent HTTP header. It can be useful if you are behind a proxy or a firewall.

  • agent - The user agent string.

  • Returns - The converter object.



5869
5870
5871
5872
# File 'lib/pdfcrowd.rb', line 5869

def setUserAgent(agent)
    @helper.setUserAgent(agent)
    self
end