Class: Webby::LinkValidator

Inherits:
Object
Defined in:
lib/webby/link_validator.rb

Overview

The Webby LinkValidator class is used to validate the hyperlinks of all the HTML files in the output directory. By default, only links to other pages in the output directory are checked. However, setting the :external flag to true will cause hyperlinks to external web sites to be validated as well.
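The following is a minimal sketch of how a single link might be classified under these rules. The helper name `link_category` and the `external:` keyword argument are hypothetical; they only mirror the branching that `validate_uri` performs on a parsed URI. Relative URIs are always checked, openable absolute URIs are checked only when the external flag is on, and anything else cannot be validated.

```ruby
require 'uri'
require 'open-uri' # adds a public #open to URI::HTTP, URI::HTTPS and URI::FTP

# Hypothetical helper: report how a link would be treated, mirroring the
# three branches of LinkValidator#validate_uri.
def link_category(href, external: false)
  uri = URI.parse(href)
  if uri.relative?
    :internal                       # resolved against the output directory
  elsif uri.respond_to?(:open)
    external ? :external : :skipped # fetched only when :external is true
  else
    :unverifiable                   # e.g. mailto: links only get a warning
  end
end

link_category('/about/index.html')                   # => :internal
link_category('http://example.com/')                 # => :skipped
link_category('http://example.com/', external: true) # => :external
link_category('mailto:someone@example.com')          # => :unverifiable
```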

Constructor Details

#initialize(opts = {}) ⇒ LinkValidator

call-seq:

LinkValidator.new( opts = {} )

Creates a new LinkValidator object. The only supported option is the :external flag. When it is set to true, the link validator also checks links to external websites by opening a connection to each remote site and downloading the page specified in the hyperlink. Use with caution: every distinct external URI results in a network request.

# File 'lib/webby/link_validator.rb', line 32

def initialize( opts = {} )
  @log = Logging::Logger[self]

  glob = ::File.join(::Webby.site.output_dir, '**', '*.html')
  @files = Dir.glob(glob).sort
  @attr_rgxp = %r/\[@(\w+)\]$/o

  @validate_externals = opts.getopt(:external, false)
  @valid_uris = ::Webby.site.valid_uris.flatten
  @invalid_uris = []
end
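The file list built by the constructor can be reproduced in isolation. This sketch assumes a throwaway output directory created with `Dir.mktmpdir` in place of `::Webby.site.output_dir`; the helper name `html_files` is hypothetical. It shows that the `'**'` glob also matches files at the top level and that non-HTML files are ignored.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: collect every .html file under an output directory,
# sorted, using the same glob the constructor uses to build @files.
def html_files(output_dir)
  Dir.glob(File.join(output_dir, '**', '*.html')).sort
end

listed = Dir.mktmpdir do |out|
  FileUtils.mkdir_p(File.join(out, 'blog'))
  %w[index.html blog/post.html blog/notes.txt].each do |f|
    File.write(File.join(out, f), '<html></html>')
  end
  # Strip the temporary prefix so the result is easy to read.
  html_files(out).map { |f| f.sub("#{out}/", '') }
end

p listed # => ["blog/post.html", "index.html"]
```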

Instance Attribute Details

#validate_externals ⇒ Object

Returns the value of attribute validate_externals.

# File 'lib/webby/link_validator.rb', line 21

def validate_externals
  @validate_externals
end

Class Method Details

.validate(opts = {}) ⇒ Object

A shortcut that instantiates a new link validator with the given options and runs the validations.

# File 'lib/webby/link_validator.rb', line 17

def self.validate( opts = {} )
  new(opts).validate
end

Instance Method Details

#check_file(fn) ⇒ Object

Check the given file (identified by its filename, fn) by iterating through all the configured XPath expressions and validating that the hyperlinks they select are valid.

# File 'lib/webby/link_validator.rb', line 55

def check_file( fn )
  @log.info "validating #{fn}"

  dir = ::File.dirname(fn)
  @doc = Hpricot(::File.read(fn))

  ::Webby.site.xpaths.each do |xpath|
    @attr_name = nil

    @doc.search(xpath).each do |element|
      @attr_name ||= @attr_rgxp.match(xpath)[1]
      uri = URI.parse(element.get_attribute(@attr_name))
      validate_uri(uri, dir)
    end
  end
  @doc = @attr_name = nil
end
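The attribute to read from each matched element is taken from the trailing `[@attr]` predicate of the XPath expression. A standalone sketch of that step follows; the XPath strings are hypothetical examples standing in for `::Webby.site.xpaths`, and the regular expression is the one stored in `@attr_rgxp`.

```ruby
require 'uri'

# Same pattern as @attr_rgxp in the constructor: capture the attribute
# name from the trailing [@attr] predicate of an XPath expression.
ATTR_RGXP = %r/\[@(\w+)\]$/

# Hypothetical example XPaths (stand-ins for Webby.site.xpaths).
xpaths = ['//a[@href]', '//img[@src]', '//link[@href]']
names  = xpaths.map { |xp| ATTR_RGXP.match(xp)[1] }
p names # => ["href", "src", "href"]

# The attribute value is then parsed into a URI object.
uri = URI.parse('../css/site.css#top')
p [uri.relative?, uri.path, uri.fragment] # => [true, "../css/site.css", "top"]
```

Note that `URI.parse` raises `URI::InvalidURIError` for malformed values (for example an href containing a space), and `check_file` does not rescue that error.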

#validate ⇒ Object

Iterate over all the HTML files in the output directory and validate the hyperlinks.

# File 'lib/webby/link_validator.rb', line 47

def validate
  @files.each {|fn| check_file fn}
end

#validate_anchor(uri, doc) ⇒ Object

Validate that the anchor fragment of the URI exists in the given document. The document is an Hpricot document object.

Returns true if the anchor exists in the document, and false if it does not or if the URI has no fragment. A missing anchor is also logged as an error.

# File 'lib/webby/link_validator.rb', line 139

def validate_anchor( uri, doc )
  return false if uri.fragment.nil?

  anchor = '#' + uri.fragment
  if doc.at(anchor).nil?
    @log.error "invalid URI '#{uri.to_s}'"
    false
  else true end
end
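To illustrate the anchor check without Hpricot, the sketch below replaces the CSS lookup `doc.at('#' + fragment)` with a regular-expression scan for `id` attributes. This is a simplifying assumption: a real HTML parser handles quoting and markup edge cases that this regex does not. The helper name `anchor_exists?` is hypothetical.

```ruby
# Hypothetical, simplified stand-in for `doc.at('#' + fragment)`: collect
# id attribute values with a regex instead of parsing the HTML.
def anchor_exists?(html, fragment)
  return false if fragment.nil? # no fragment: nothing to check, as in the original
  ids = html.scan(/(?<![\w-])id\s*=\s*["']([^"']+)["']/).flatten
  ids.include?(fragment)
end

page = '<h2 id="install">Install</h2><p id="usage">...</p>'
p anchor_exists?(page, 'usage') # => true
p anchor_exists?(page, 'faq')   # => false
p anchor_exists?(page, nil)     # => false
```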

#validate_uri(uri, dir) ⇒ Object

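Validate that the page the URI refers to actually exists. The directory of the current page being processed is needed in order to resolve relative paths.

If the URI is relative (it has no scheme), the output directory is searched for the corresponding page: a path beginning with "/" is resolved against the root of the output directory, any other path against the directory of the current page, and a path without a file extension is assumed to name a directory containing index.html. If the URI also has a fragment, the anchor is checked in the target page. If the URI is absolute (it has a scheme such as http) and can be opened, the remote server is contacted and the page requested from it; this only takes place if the LinkValidator was created with the :external flag set to true. Any other URI, such as a mailto: link, is reported with a warning that it could not be validated.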
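External URIs are remembered in two lists, so each one is opened at most once per run. The sketch below isolates that caching behavior. The class `ExternalChecker` and the injected `opener` lambda are hypothetical stand-ins for the validator's instance variables and for `uri.open`, chosen so the example needs no network access. The original rescues `Exception`; this sketch rescues only `StandardError`.

```ruby
# Hypothetical sketch of the external-link cache: a URI that succeeded or
# failed once is not opened again.
class ExternalChecker
  attr_reader :attempts

  def initialize(opener)
    @opener   = opener # stand-in for uri.open
    @valid    = []
    @invalid  = []
    @attempts = 0
  end

  # Returns true if the URI could be opened, false otherwise.
  def check(uri)
    return true  if @valid.include?(uri)
    return false if @invalid.include?(uri)

    @attempts += 1
    begin
      @opener.call(uri)
      @valid << uri
      true
    rescue StandardError
      @invalid << uri
      false
    end
  end
end

opener = lambda do |uri|
  raise IOError, 'unreachable' if uri.include?('broken')
end

checker = ExternalChecker.new(opener)
results = %w[http://example.com/ http://broken.example/
             http://example.com/ http://broken.example/].map { |u| checker.check(u) }
p results          # => [true, false, true, false]
p checker.attempts # => 2
```

Arrays keep the sketch close to the original; a `Set` would make the membership tests constant-time on sites with many links.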

# File 'lib/webby/link_validator.rb', line 83

def validate_uri( uri, dir )
  # for relative URIs, we can see if the file exists in the output folder
  if uri.relative?
    return validate_anchor(uri, @doc) if uri.path.empty?

    path = if uri.path =~ %r/^\//
        ::File.join(::Webby.site.output_dir, uri.path)
      else
        ::File.join(dir, uri.path)
      end
    path = ::File.join(path, 'index.html') if ::File.extname(path).empty?

    uri_str = path.dup
    (uri_str << '#' << uri.fragment) if uri.fragment
    return if @valid_uris.include? uri_str

    if test ?f, path
      valid = if uri.fragment
          validate_anchor(uri, Hpricot(::File.read(path)))
        else true end
      @valid_uris << uri_str if valid
    else
      @log.error "invalid URI '#{uri.to_s}'"
    end

  # if the URI responds to the open method, then try to access the URI
  elsif uri.respond_to? :open
    return unless @validate_externals
    return if @valid_uris.include? uri.to_s

    if @invalid_uris.include? uri.to_s
      @log.error "could not open URI '#{uri.to_s}'"
      return
    end

    begin
      uri.open {|_| nil}
      @valid_uris << uri.to_s
    rescue Exception
      @log.error "could not open URI '#{uri.to_s}'"
      @invalid_uris << uri.to_s
    end

  # otherwise, post a warning that the URI could not be validated
  else
    return if @valid_uris.include? uri.to_s
    @log.warn "could not validate URI '#{uri.to_s}'"
  end
end
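The path resolution in the relative-URI branch can be exercised on its own. This is a sketch of that branch only; the helper name `local_target` and the directory names are hypothetical. It returns the same string the method uses as its cache key, the resolved file path plus any fragment.

```ruby
require 'uri'

# Hypothetical helper mirroring the relative-URI branch of validate_uri:
# "/"-prefixed paths resolve against the output directory, other paths
# against the current page's directory, and extension-less paths get
# "index.html" appended. The fragment, if any, is kept in the cache key.
def local_target(uri, output_dir, page_dir)
  path = if uri.path.start_with?('/')
           File.join(output_dir, uri.path)
         else
           File.join(page_dir, uri.path)
         end
  path = File.join(path, 'index.html') if File.extname(path).empty?
  uri.fragment ? "#{path}##{uri.fragment}" : path
end

p local_target(URI.parse('/about/'), 'output', 'output/blog')
# => "output/about/index.html"
p local_target(URI.parse('post.html#comments'), 'output', 'output/blog')
# => "output/blog/post.html#comments"
```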