Class: Pdfcrowd::PdfToHtmlClient

Inherits: Object
Defined in: lib/pdfcrowd.rb

Overview

Conversion from PDF to HTML.


Constructor Details

#initialize(user_name, api_key) ⇒ PdfToHtmlClient

Constructor for the Pdfcrowd API client.

  • user_name - Your username at Pdfcrowd.

  • api_key - Your API key.



# File 'lib/pdfcrowd.rb', line 4450

def initialize(user_name, api_key)
    @helper = ConnectionHelper.new(user_name, api_key)
    @fields = {
        'input_format'=>'pdf',
        'output_format'=>'html'
    }
    @file_id = 1
    @files = {}
    @raw_data = {}
end

Instance Method Details

#convertFile(file) ⇒ Object

Convert a local file.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • Returns - Byte array containing the conversion output.



# File 'lib/pdfcrowd.rb', line 4515

def convertFile(file)
    if (!(File.file?(file) && !File.zero?(file)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file, "convertFile", "pdf-to-html", "The file must exist and not be empty.", "convert_file"), 470);
    end
    
    @files['file'] = file
    @helper.post(@fields, @files, @raw_data)
end
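convertFile (and convertFileToStream) raise an Error unless `File.file?(file) && !File.zero?(file)` holds. The sketch below reproduces that precondition as a standalone check you could run before calling the client; the helper name `convertible_input?` is hypothetical and not part of the library.

```ruby
require "tmpdir"

# Hypothetical helper mirroring the precondition in convertFile:
# the path must name a regular file that is not empty.
def convertible_input?(path)
  File.file?(path) && !File.zero?(path)
end

Dir.mktmpdir do |dir|
  empty = File.join(dir, "empty.pdf")
  File.write(empty, "")
  full = File.join(dir, "doc.pdf")
  File.write(full, "%PDF-1.4")

  puts convertible_input?(full)                          # true
  puts convertible_input?(empty)                         # false (zero bytes)
  puts convertible_input?(dir)                           # false (a directory)
  puts convertible_input?(File.join(dir, "missing.pdf")) # false (does not exist)
end
```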

#convertFileToFile(file, file_path) ⇒ Object

Convert a local file and write the result to a local file.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • file_path - The output file path. The string must not be empty. The converter generates either an HTML file or a ZIP file; if a ZIP file is generated, the file path must have a ZIP or zip extension.



# File 'lib/pdfcrowd.rb', line 4541

def convertFileToFile(file, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertFileToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_file_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertFileToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_file_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertFileToStream(file, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end
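convertFileToFile (like convertRawDataToFile, convertStreamToFile and convertUrlToFile) opens the output file in binary mode, writes the conversion result into it, and deletes the partially written file if a Pdfcrowd Error is raised. The sketch below shows the same clean-up pattern as a standalone helper; `write_or_remove` is a hypothetical name, and unlike the library code it rescues any StandardError and relies on File.open with a block to close the file.

```ruby
require "fileutils"
require "tmpdir"

# Sketch of the write-then-clean-up pattern: the block writes to the
# opened file; if it raises, the partial file is removed and the
# exception is re-raised. File.open with a block closes the file
# even when the block raises.
def write_or_remove(file_path)
  File.open(file_path, "wb") { |out| yield out }
rescue StandardError
  FileUtils.rm_f(file_path)
  raise
end

Dir.mktmpdir do |dir|
  ok = File.join(dir, "ok.html")
  write_or_remove(ok) { |out| out.write("<html></html>") }
  puts File.exist?(ok)   # true

  bad = File.join(dir, "bad.html")
  begin
    write_or_remove(bad) do |out|
      out.write("partial")
      raise IOError, "conversion failed" # stands in for a failed conversion
    end
  rescue IOError => e
    puts e.message         # conversion failed
  end
  puts File.exist?(bad)  # false
end
```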

#convertFileToStream(file, out_stream) ⇒ Object

Convert a local file and write the result to an output stream.

  • file - The path to a local file to convert. The file must exist and not be empty.

  • out_stream - The output stream that will contain the conversion output.



# File 'lib/pdfcrowd.rb', line 4528

def convertFileToStream(file, out_stream)
    if (!(File.file?(file) && !File.zero?(file)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file, "convertFileToStream::file", "pdf-to-html", "The file must exist and not be empty.", "convert_file_to_stream"), 470);
    end
    
    @files['file'] = file
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertRawData(data) ⇒ Object

Convert raw data.

  • data - The raw content to be converted.

  • Returns - Byte array containing the conversion output.



# File 'lib/pdfcrowd.rb', line 4565

def convertRawData(data)
    @raw_data['file'] = data
    @helper.post(@fields, @files, @raw_data)
end

#convertRawDataToFile(data, file_path) ⇒ Object

Convert raw data and write the result to a local file.

  • data - The raw content to be converted.

  • file_path - The output file path. The string must not be empty. The converter generates either an HTML file or a ZIP file; if a ZIP file is generated, the file path must have a ZIP or zip extension.



# File 'lib/pdfcrowd.rb', line 4583

def convertRawDataToFile(data, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertRawDataToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_raw_data_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertRawDataToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_raw_data_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertRawDataToStream(data, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertRawDataToStream(data, out_stream) ⇒ Object

Convert raw data and write the result to an output stream.

  • data - The raw content to be converted.

  • out_stream - The output stream that will contain the conversion output.



# File 'lib/pdfcrowd.rb', line 4574

def convertRawDataToStream(data, out_stream)
    @raw_data['file'] = data
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertStream(in_stream) ⇒ Object

Convert the contents of an input stream.

  • in_stream - The input stream with source data.

  • Returns - Byte array containing the conversion output.



# File 'lib/pdfcrowd.rb', line 4607

def convertStream(in_stream)
    @raw_data['stream'] = in_stream.read
    @helper.post(@fields, @files, @raw_data)
end

#convertStreamToFile(in_stream, file_path) ⇒ Object

Convert the contents of an input stream and write the result to a local file.

  • in_stream - The input stream with source data.

  • file_path - The output file path. The string must not be empty. The converter generates either an HTML file or a ZIP file; if a ZIP file is generated, the file path must have a ZIP or zip extension.



# File 'lib/pdfcrowd.rb', line 4625

def convertStreamToFile(in_stream, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertStreamToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_stream_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertStreamToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_stream_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertStreamToStream(in_stream, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertStreamToStream(in_stream, out_stream) ⇒ Object

Convert the contents of an input stream and write the result to an output stream.

  • in_stream - The input stream with source data.

  • out_stream - The output stream that will contain the conversion output.



# File 'lib/pdfcrowd.rb', line 4616

def convertStreamToStream(in_stream, out_stream)
    @raw_data['stream'] = in_stream.read
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#convertUrl(url) ⇒ Object

Convert a PDF located at a URL.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • Returns - Byte array containing the conversion output.



# File 'lib/pdfcrowd.rb', line 4465

def convertUrl(url)
    unless /(?i)^https?:\/\/.*$/.match(url)
        raise Error.new(Pdfcrowd.create_invalid_value_message(url, "convertUrl", "pdf-to-html", "The supported protocols are http:// and https://.", "convert_url"), 470);
    end
    
    @fields['url'] = url
    @helper.post(@fields, @files, @raw_data)
end
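convertUrl and convertUrlToStream accept a URL only if it matches `/(?i)^https?:\/\/.*$/`. The snippet below copies that pattern into a small hypothetical predicate, `accepted_url?`, to show which inputs pass. Note that in Ruby `^` and `$` match at line boundaries, not only at the ends of the string.

```ruby
# Pattern copied from convertUrl; (?i) makes the scheme check
# case-insensitive.
URL_PATTERN = /(?i)^https?:\/\/.*$/

def accepted_url?(url)
  !URL_PATTERN.match(url).nil?
end

puts accepted_url?("https://example.com/report.pdf") # true
puts accepted_url?("HTTP://example.com/report.pdf")  # true
puts accepted_url?("ftp://example.com/report.pdf")   # false
puts accepted_url?("example.com/report.pdf")         # false
puts accepted_url?("ftp://x\nhttp://y")              # true: ^ also matches right after a newline
```

If you validate URLs yourself before calling the client, anchoring with `\A` and `\z` instead of `^` and `$` checks the whole string.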

#convertUrlToFile(url, file_path) ⇒ Object

Convert a PDF located at a URL and write the result to a local file.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • file_path - The output file path. The string must not be empty. The converter generates either an HTML file or a ZIP file; if a ZIP file is generated, the file path must have a ZIP or zip extension.



# File 'lib/pdfcrowd.rb', line 4491

def convertUrlToFile(url, file_path)
    if (!(!file_path.nil? && !file_path.empty?))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertUrlToFile::file_path", "pdf-to-html", "The string must not be empty.", "convert_url_to_file"), 470);
    end
    
    if (!(isOutputTypeValid(file_path)))
        raise Error.new(Pdfcrowd.create_invalid_value_message(file_path, "convertUrlToFile::file_path", "pdf-to-html", "The converter generates an HTML or ZIP file. If ZIP file is generated, the file path must have a ZIP or zip extension.", "convert_url_to_file"), 470);
    end
    
    output_file = open(file_path, "wb")
    begin
        convertUrlToStream(url, output_file)
        output_file.close()
    rescue Error => why
        output_file.close()
        FileUtils.rm(file_path)
        raise
    end
end

#convertUrlToStream(url, out_stream) ⇒ Object

Convert a PDF located at a URL and write the result to an output stream.

  • url - The address of the PDF to convert. The supported protocols are http:// and https://.

  • out_stream - The output stream that will contain the conversion output.



# File 'lib/pdfcrowd.rb', line 4478

def convertUrlToStream(url, out_stream)
    unless /(?i)^https?:\/\/.*$/.match(url)
        raise Error.new(Pdfcrowd.create_invalid_value_message(url, "convertUrlToStream::url", "pdf-to-html", "The supported protocols are http:// and https://.", "convert_url_to_stream"), 470);
    end
    
    @fields['url'] = url
    @helper.post(@fields, @files, @raw_data, out_stream)
end

#getConsumedCreditCount ⇒ Object

Get the number of credits consumed by the last conversion.

  • Returns - The number of credits.



# File 'lib/pdfcrowd.rb', line 4796

def getConsumedCreditCount()
    return @helper.getConsumedCreditCount()
end

#getDebugLogUrl ⇒ Object

Get the URL of the debug log for the last conversion.

  • Returns - The link to the debug log.



# File 'lib/pdfcrowd.rb', line 4781

def getDebugLogUrl()
    return @helper.getDebugLogUrl()
end

#getJobId ⇒ Object

Get the job ID.

  • Returns - The unique job identifier.



# File 'lib/pdfcrowd.rb', line 4802

def getJobId()
    return @helper.getJobId()
end

#getOutputSize ⇒ Object

Get the size of the output in bytes.

  • Returns - The count of bytes.



# File 'lib/pdfcrowd.rb', line 4814

def getOutputSize()
    return @helper.getOutputSize()
end

#getPageCount ⇒ Object

Get the number of pages in the output document.

  • Returns - The page count.



# File 'lib/pdfcrowd.rb', line 4808

def getPageCount()
    return @helper.getPageCount()
end

#getRemainingCreditCount ⇒ Object

Get the number of conversion credits available in your account. This method can only be called after a call to one of the convertXtoY methods. The returned value can differ from the actual count if you run parallel conversions. The special value 999999 is returned if the information is not available.

  • Returns - The number of credits.



# File 'lib/pdfcrowd.rb', line 4790

def getRemainingCreditCount()
    return @helper.getRemainingCreditCount()
end
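Because getRemainingCreditCount returns the special value 999999 when the information is not available, it can help to turn that sentinel into `nil` before using the number. The helper below is a hypothetical sketch that assumes the method returns an Integer; in practice you would pass it the result of `client.getRemainingCreditCount()` after a conversion. Remember that the value may still be stale if you run conversions in parallel.

```ruby
# Sentinel documented for getRemainingCreditCount: the value 999999
# means the information is not available.
CREDITS_UNAVAILABLE = 999999

# Hypothetical helper: map the sentinel to nil so it cannot be mistaken
# for a real balance (assumes an Integer argument).
def remaining_credits_or_nil(count)
  count == CREDITS_UNAVAILABLE ? nil : count
end

p remaining_credits_or_nil(1250)   # 1250
p remaining_credits_or_nil(999999) # nil
```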

#getVersion ⇒ Object

Get the version details.

  • Returns - API version, converter version, and client version.



# File 'lib/pdfcrowd.rb', line 4820

def getVersion()
    return "client " + CLIENT_VERSION + ", API v2, converter " + @helper.getConverterVersion()
end

#isZippedOutput ⇒ Object

A helper method to determine if the output file is a zip archive. The output of the conversion may be either an HTML file or a zip file containing the HTML and its external assets.

  • Returns - true if the conversion output is a zip file, otherwise false.



# File 'lib/pdfcrowd.rb', line 4721

def isZippedOutput()
    @fields.fetch('image_mode', '') == 'separate' || @fields.fetch('css_mode', '') == 'separate' || @fields.fetch('font_mode', '') == 'separate' || @fields.fetch('force_zip', false) == true
end
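isZippedOutput is useful for choosing the output file name, since a ZIP result requires a ZIP or zip extension; for example, `client.isZippedOutput() ? "out.zip" : "out.html"`. The sketch below copies its condition onto a plain Hash so the decision can be tested in isolation; `zipped_output?` is a hypothetical name. Because the comparison is against the exact lowercase string 'separate', a mode passed as "SEPARATE" (which the mode setters accept case-insensitively) is not counted, so passing lowercase values keeps isZippedOutput consistent with the modes you set.

```ruby
# Same condition as isZippedOutput, evaluated on a plain Hash of fields.
def zipped_output?(fields)
  fields.fetch('image_mode', '') == 'separate' ||
    fields.fetch('css_mode', '') == 'separate' ||
    fields.fetch('font_mode', '') == 'separate' ||
    fields.fetch('force_zip', false) == true
end

p zipped_output?({})                            # false
p zipped_output?({ 'image_mode' => 'embed' })   # false
p zipped_output?({ 'css_mode' => 'separate' })  # true
p zipped_output?({ 'force_zip' => true })       # true
p zipped_output?({ 'font_mode' => 'SEPARATE' }) # false: exact lowercase match only
```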

#setAuthor(author) ⇒ Object

Set the HTML author. The author from the input PDF is used by default.

  • author - The HTML author.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4756

def setAuthor(author)
    @fields['author'] = author
    self
end
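Every setter in this class stores its value in the `@fields` hash and returns the converter object, so calls can be chained, e.g. `client.setAuthor("Jane Doe").setTitle("Quarterly report")`. The stand-in class below is hypothetical and not part of the pdfcrowd gem; it only mimics that pattern so it can run without credentials or network access.

```ruby
# Hypothetical stand-in mimicking how the setters record options in a
# @fields hash and return self to allow chaining.
class MiniConverter
  attr_reader :fields

  def initialize
    @fields = { 'input_format' => 'pdf', 'output_format' => 'html' }
  end

  def setAuthor(author)
    @fields['author'] = author
    self
  end

  def setTitle(title)
    @fields['title'] = title
    self
  end
end

c = MiniConverter.new.setAuthor("Jane Doe").setTitle("Quarterly report")
p c.fields['author'] # "Jane Doe"
p c.fields['title']  # "Quarterly report"
```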

#setCssMode(mode) ⇒ Object

Specifies where the style sheets are stored.

  • mode - The style sheet storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4697

def setCssMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setCssMode", "pdf-to-html", "Allowed values are embed, separate.", "set_css_mode"), 470);
    end
    
    @fields['css_mode'] = mode
    self
end
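setCssMode, setFontMode and setImageMode all validate their argument with the pattern `/(?i)^(embed|separate)$/`. The snippet below copies it to show that the check is case-insensitive, while the value itself is stored exactly as passed.

```ruby
# Pattern copied from setCssMode (setFontMode and setImageMode use the same one).
MODE_PATTERN = /(?i)^(embed|separate)$/

%w[embed separate Separate inline].each do |mode|
  puts "#{mode}: #{MODE_PATTERN.match?(mode)}"
end
# embed: true
# separate: true
# Separate: true
# inline: false
```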

#setDebugLog(value) ⇒ Object

Turn on debug logging. Details about the conversion are stored in the debug log. The URL of the log can be obtained from the getDebugLogUrl method or found in the conversion statistics.

  • value - Set to true to enable the debug logging.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4774

def setDebugLog(value)
    @fields['debug_log'] = value
    self
end

#setFontMode(mode) ⇒ Object

Specifies where the fonts are stored.

  • mode - The font storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4710

def setFontMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setFontMode", "pdf-to-html", "Allowed values are embed, separate.", "set_font_mode"), 470);
    end
    
    @fields['font_mode'] = mode
    self
end

#setForceZip(value) ⇒ Object

Forces the output to be a zip archive.

  • value - Set to true to get the output as a zip archive.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4729

def setForceZip(value)
    @fields['force_zip'] = value
    self
end

#setHttpProxy(proxy) ⇒ Object

A proxy server used by the Pdfcrowd conversion process to access source URLs with the HTTP scheme. It can help circumvent regional restrictions or provide limited access to your intranet.

  • proxy - The value must have format DOMAIN_OR_IP_ADDRESS:PORT.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4837

def setHttpProxy(proxy)
    unless /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/.match(proxy)
        raise Error.new(Pdfcrowd.create_invalid_value_message(proxy, "setHttpProxy", "pdf-to-html", "The value must have format DOMAIN_OR_IP_ADDRESS:PORT.", "set_http_proxy"), 470);
    end
    
    @fields['http_proxy'] = proxy
    self
end
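setHttpProxy and setHttpsProxy configure a proxy used by Pdfcrowd's servers when fetching source URLs (for the proxy your own client uses, see setProxy). The value must match the pattern below, copied from the method. As the examples show, a bare host name without a dot, a URL scheme, or a missing port are all rejected, while a dotted IPv4 address passes because each octet is read as a label.

```ruby
# Pattern copied from setHttpProxy and setHttpsProxy.
PROXY_PATTERN = /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/

[
  "proxy.example.com:8080",        # accepted
  "10.0.0.1:3128",                 # accepted
  "localhost:8080",                # rejected: at least one dot is required
  "http://proxy.example.com:8080", # rejected: no scheme allowed
  "proxy.example.com"              # rejected: the port is mandatory
].each { |proxy| puts "#{proxy} -> #{PROXY_PATTERN.match?(proxy)}" }
```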

#setHttpsProxy(proxy) ⇒ Object

A proxy server used by the Pdfcrowd conversion process to access source URLs with the HTTPS scheme. It can help circumvent regional restrictions or provide limited access to your intranet.

  • proxy - The value must have format DOMAIN_OR_IP_ADDRESS:PORT.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4850

def setHttpsProxy(proxy)
    unless /(?i)^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z0-9]{1,}:\d+$/.match(proxy)
        raise Error.new(Pdfcrowd.create_invalid_value_message(proxy, "setHttpsProxy", "pdf-to-html", "The value must have format DOMAIN_OR_IP_ADDRESS:PORT.", "set_https_proxy"), 470);
    end
    
    @fields['https_proxy'] = proxy
    self
end

#setImageMode(mode) ⇒ Object

Specifies where the images are stored.

  • mode - The image storage mode. Allowed values are embed, separate.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4684

def setImageMode(mode)
    unless /(?i)^(embed|separate)$/.match(mode)
        raise Error.new(Pdfcrowd.create_invalid_value_message(mode, "setImageMode", "pdf-to-html", "Allowed values are embed, separate.", "set_image_mode"), 470);
    end
    
    @fields['image_mode'] = mode
    self
end

#setKeywords(keywords) ⇒ Object

Associate keywords with the HTML document. Keywords from the input PDF are used by default.

  • keywords - The string containing the keywords.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4765

def setKeywords(keywords)
    @fields['keywords'] = keywords
    self
end

#setPdfPassword(password) ⇒ Object

Password to open the encrypted PDF file.

  • password - The input PDF password.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4649

def setPdfPassword(password)
    @fields['pdf_password'] = password
    self
end

#setPrintPageRange(pages) ⇒ Object

Set the page range to print.

  • pages - A comma-separated list of page numbers or ranges.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4671

def setPrintPageRange(pages)
    unless /^(?:\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*,\s*)*\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*$/.match(pages)
        raise Error.new(Pdfcrowd.create_invalid_value_message(pages, "setPrintPageRange", "pdf-to-html", "A comma separated list of page numbers or ranges.", "set_print_page_range"), 470);
    end
    
    @fields['print_page_range'] = pages
    self
end
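setPrintPageRange checks the `pages` string against the pattern below (copied from the method). The examples show the accepted syntax: single pages, ranges, ranges with an open start or end, and whitespace around items. The pattern checks syntax only; it does not verify that ranges are ascending or that the pages exist in the document.

```ruby
# Pattern copied from setPrintPageRange.
PAGE_RANGE_PATTERN = /^(?:\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*,\s*)*\s*(?:\d+|(?:\d*\s*\-\s*\d+)|(?:\d+\s*\-\s*\d*))\s*$/

{
  "1,3-5"    => "single page and a range",
  "-3"       => "range without a start",
  "7-"       => "range without an end",
  "2, 4 - 6" => "whitespace around items",
  "3-1"      => "descending range, still syntactically valid",
  "1,,2"     => "empty item",
  "a-b"      => "non-numeric"
}.each do |range, note|
  puts "#{range.inspect}: #{PAGE_RANGE_PATTERN.match?(range)} (#{note})"
end
```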

#setProxy(host, port, user_name, password) ⇒ Object

Specifies an HTTP proxy that the API client library will use to connect to the internet.

  • host - The proxy hostname.

  • port - The proxy port.

  • user_name - The username.

  • password - The password.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4885

def setProxy(host, port, user_name, password)
    @helper.setProxy(host, port, user_name, password)
    self
end

#setRetryCount(count) ⇒ Object

Specifies the number of automatic retries when the HTTP status code 502 is received. The 502 status code indicates a temporary network issue. Set the count to 0 to disable automatic retries.

  • count - Number of retries.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4894

def setRetryCount(count)
    @helper.setRetryCount(count)
    self
end
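The retry count is delegated to the connection helper, whose code is not shown here. The sketch below only illustrates the documented idea under an assumption: `count` means extra attempts after the first one, made only while the response status is 502. The method name `with_502_retries` is hypothetical; this is not the library's implementation.

```ruby
# Conceptual sketch of "retry on HTTP 502". The block performs one
# attempt (it receives the attempt number) and returns an HTTP status.
# Returns [final_status, attempts_made].
def with_502_retries(retry_count)
  attempts = 0
  loop do
    attempts += 1
    status = yield(attempts)
    return [status, attempts] unless status == 502 && attempts <= retry_count
  end
end

statuses = [502, 502, 200]
status, attempts = with_502_retries(2) { |n| statuses[n - 1] }
puts "status=#{status} after #{attempts} attempts" # status=200 after 3 attempts
```

With a count of 0 the first 502 is returned immediately, which matches the documented way of disabling the feature.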

#setScaleFactor(factor) ⇒ Object

Set the scaling factor (zoom) for the main page area.

  • factor - The percentage value. Must be a positive integer.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4658

def setScaleFactor(factor)
    if (!(Integer(factor) > 0))
        raise Error.new(Pdfcrowd.create_invalid_value_message(factor, "setScaleFactor", "pdf-to-html", "Must be a positive integer number.", "set_scale_factor"), 470);
    end
    
    @fields['scale_factor'] = factor
    self
end
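setScaleFactor validates its argument with `Integer(factor) > 0`. Kernel#Integer accepts integers and integer strings, truncates floats, and raises ArgumentError or TypeError for other input, so such values surface as plain Ruby exceptions rather than as the Pdfcrowd Error with code 470. The hypothetical helper below reports which outcome each input produces. Note also that the method stores the original `factor`, not the converted Integer, so a float such as 1.5 passes the check but is stored as 1.5; how the API treats such a value is not specified here.

```ruby
# Reports how the Integer(factor) > 0 check in setScaleFactor treats
# an input: :valid, :invalid, or the exception class Integer() raises.
def scale_factor_check(factor)
  Integer(factor) > 0 ? :valid : :invalid
rescue ArgumentError, TypeError => e
  e.class
end

p scale_factor_check(150)   # :valid
p scale_factor_check("150") # :valid
p scale_factor_check(0)     # :invalid
p scale_factor_check(1.5)   # :valid (Integer(1.5) truncates to 1)
p scale_factor_check("1.5") # ArgumentError
p scale_factor_check(nil)   # TypeError
```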

#setSubject(subject) ⇒ Object

Set the HTML subject. The subject from the input PDF is used by default.

  • subject - The HTML subject.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4747

def setSubject(subject)
    @fields['subject'] = subject
    self
end

#setTag(tag) ⇒ Object

Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.

  • tag - A string with the custom tag.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4828

def setTag(tag)
    @fields['tag'] = tag
    self
end

#setTitle(title) ⇒ Object

Set the HTML title. The title from the input PDF is used by default.

  • title - The HTML title.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4738

def setTitle(title)
    @fields['title'] = title
    self
end

#setUseHttp(value) ⇒ Object

Specifies whether the client communicates with the Pdfcrowd API over HTTP or HTTPS. Warning: using HTTP is insecure because data sent over HTTP is not encrypted. Enable this option only if you know what you are doing.

  • value - Set to true to use HTTP.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4864

def setUseHttp(value)
    @helper.setUseHttp(value)
    self
end

#setUserAgent(agent) ⇒ Object

Set a custom user agent HTTP header. It can be useful if you are behind a proxy or a firewall.

  • agent - The user agent string.

  • Returns - The converter object.



# File 'lib/pdfcrowd.rb', line 4873

def setUserAgent(agent)
    @helper.setUserAgent(agent)
    self
end