Class: HTML::Proofer::UrlValidator
- Inherits:
-
Object
- Object
- HTML::Proofer::UrlValidator
- Includes:
- Utils
- Defined in:
- lib/html/proofer/url_validator.rb
Instance Attribute Summary collapse
-
#external_urls ⇒ Object
Returns the value of attribute external_urls.
-
#hydra ⇒ Object
Returns the value of attribute hydra.
-
#logger ⇒ Object
Returns the value of attribute logger.
Instance Method Summary collapse
- #add_failed_tests(filenames, desc, status = nil) ⇒ Object
-
#check_hash_in_2xx_response(href, effective_url, response, filenames) ⇒ Object
Even though the response was a success, we may have been asked to check if the hash on the URL exists on the page.
- #clean_url(href) ⇒ Object
-
#external_link_checker(external_urls) ⇒ Object
Proofer runs faster if we pull out all the external URLs and run the checks at the end.
- #handle_timeout(href, filenames, response_code) ⇒ Object
- #hash?(url) ⇒ Boolean
-
#initialize(logger, external_urls, options, typhoeus_opts, hydra_opts) ⇒ UrlValidator
constructor
A new instance of UrlValidator.
- #queue_request(method, href, filenames) ⇒ Object
- #response_handler(response, filenames) ⇒ Object
- #run ⇒ Object
- #url_processor(external_urls) ⇒ Object
Methods included from Utils
Constructor Details
#initialize(logger, external_urls, options, typhoeus_opts, hydra_opts) ⇒ UrlValidator
Returns a new instance of UrlValidator.
12 13 14 15 16 17 18 19 |
# File 'lib/html/proofer/url_validator.rb', line 12 def initialize(logger, external_urls, , typhoeus_opts, hydra_opts) @logger = logger @external_urls = external_urls @failed_tests = [] = @hydra = Typhoeus::Hydra.new(hydra_opts) @typhoeus_opts = typhoeus_opts end |
Instance Attribute Details
#external_urls ⇒ Object
Returns the value of attribute external_urls.
10 11 12 |
# File 'lib/html/proofer/url_validator.rb', line 10 def external_urls @external_urls end |
#hydra ⇒ Object
Returns the value of attribute hydra.
10 11 12 |
# File 'lib/html/proofer/url_validator.rb', line 10 def hydra @hydra end |
#logger ⇒ Object
Returns the value of attribute logger.
10 11 12 |
# File 'lib/html/proofer/url_validator.rb', line 10 def logger @logger end |
Instance Method Details
#add_failed_tests(filenames, desc, status = nil) ⇒ Object
119 120 121 122 123 124 125 |
# File 'lib/html/proofer/url_validator.rb', line 119 def add_failed_tests(filenames, desc, status = nil) if filenames.nil? @failed_tests << CheckRunner::Issue.new('', desc, nil, status) else filenames.each { |f| @failed_tests << CheckRunner::Issue.new(f, desc, nil, status) } end end |
#check_hash_in_2xx_response(href, effective_url, response, filenames) ⇒ Object
Even though the response was a success, we may have been asked to check if the hash on the URL exists on the page
96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 |
# File 'lib/html/proofer/url_validator.rb', line 96 def check_hash_in_2xx_response(href, effective_url, response, filenames) return if [:only_4xx] return unless [:check_external_hash] return unless (hash = hash?(href)) body_doc = create_nokogiri(response.body) # user-content is a special addition by GitHub. xpath = %(//*[@name="#{hash}"]|//*[@id="#{hash}"]) if URI.parse(href).host.match(/github\.com/i) xpath << %(|//*[@name="user-content-#{hash}"]|//*[@id="user-content-#{hash}"]) end return unless body_doc.xpath(xpath).empty? add_failed_tests filenames, "External link #{href} failed: #{effective_url} exists, but the hash '#{hash}' does not", response.code end |
#clean_url(href) ⇒ Object
62 63 64 |
# File 'lib/html/proofer/url_validator.rb', line 62 def clean_url(href) Addressable::URI.parse(href).normalize end |
#external_link_checker(external_urls) ⇒ Object
Proofer runs faster if we pull out all the external URLs and run the checks at the end. Otherwise, we’re halting the consuming process for every file during the check_directory_of_files process.
In addition, sorting the list lets libcurl keep connections to the same hosts alive.
Finally, we’ll first make a HEAD request, rather than GETing all the contents. If the HEAD fails, we’ll fall back to GET, as some servers are not configured for HEAD. If we’ve decided to check for hashes, we must do a GET–HEAD is not an option.
36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
# File 'lib/html/proofer/url_validator.rb', line 36 def external_link_checker(external_urls) external_urls = Hash[external_urls.sort] count = external_urls.length check_text = "#{count} " << (count == 1 ? 'external link' : 'external links') logger.log :info, :blue, "Checking #{check_text}..." Ethon.logger = logger # log from Typhoeus/Ethon url_processor(external_urls) logger.log :debug, :yellow, "Running requests for all #{hydra.queued_requests.size} external URLs..." hydra.run end |
#handle_timeout(href, filenames, response_code) ⇒ Object
114 115 116 117 |
# File 'lib/html/proofer/url_validator.rb', line 114 def handle_timeout(href, filenames, response_code) return if [:only_4xx] add_failed_tests filenames, "External link #{href} failed: got a time out", response_code end |
#hash?(url) ⇒ Boolean
127 128 129 130 131 |
# File 'lib/html/proofer/url_validator.rb', line 127 def hash?(url) URI.parse(url).fragment rescue URI::InvalidURIError nil end |
#queue_request(method, href, filenames) ⇒ Object
66 67 68 69 70 |
# File 'lib/html/proofer/url_validator.rb', line 66 def queue_request(method, href, filenames) request = Typhoeus::Request.new(href, @typhoeus_opts.merge({ :method => method })) request.on_complete { |response| response_handler(response, filenames) } hydra.queue request end |
#response_handler(response, filenames) ⇒ Object
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 |
# File 'lib/html/proofer/url_validator.rb', line 72 def response_handler(response, filenames) effective_url = response.[:effective_url] href = response.request.base_url.to_s method = response.request.[:method] response_code = response.code debug_msg = "Received a #{response_code} for #{href}" debug_msg << " in #{filenames.join(' ')}" unless filenames.nil? logger.log :debug, :yellow, debug_msg if response_code.between?(200, 299) check_hash_in_2xx_response(href, effective_url, response, filenames) elsif response.timed_out? handle_timeout(href, filenames, response_code) elsif method == :head queue_request(:get, href, filenames) else return if [:only_4xx] && !response_code.between?(400, 499) # Received a non-successful http response. add_failed_tests filenames, "External link #{href} failed: #{response_code} #{response.return_message}", response_code end end |
#run ⇒ Object
21 22 23 24 |
# File 'lib/html/proofer/url_validator.rb', line 21 def run external_link_checker(external_urls) @failed_tests end |
#url_processor(external_urls) ⇒ Object
51 52 53 54 55 56 57 58 59 60 |
# File 'lib/html/proofer/url_validator.rb', line 51 def url_processor(external_urls) external_urls.each_pair do |href, filenames| href = clean_url(href) if hash?(href) && [:check_external_hash] queue_request(:get, href, filenames) else queue_request(:head, href, filenames) end end end |