Class: SiteInspector::Domain

Inherits:
Object
Defined in:
lib/site-inspector/domain.rb

Constructor Details

#initialize(host) ⇒ Domain

Returns a new instance of Domain.



# File 'lib/site-inspector/domain.rb', line 7

def initialize(host)
  host = host.downcase
  host = host.sub(/^https?:/, '')
  host = host.sub(%r{^/+}, '')
  host = host.sub(/^www\./, '')
  uri = Addressable::URI.parse "//#{host}"
  @host = uri.host
end
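
The normalization steps above can be tried in isolation. The following is a minimal sketch, not the library's code: `normalize_host` is a hypothetical helper name, and it assumes the standard library's `URI` as a stand-in for `Addressable::URI` so that it runs without extra gems.

```ruby
require 'uri'

# Hypothetical helper that repeats the constructor's steps.
# Assumption: stdlib URI is used here in place of Addressable::URI.
def normalize_host(host)
  host = host.downcase
  host = host.sub(/^https?:/, '')  # drop a leading scheme
  host = host.sub(%r{^/+}, '')     # drop leading slashes
  host = host.sub(/^www\./, '')    # drop a leading "www."
  URI.parse("//#{host}").host      # keep only the host part
end

puts normalize_host('HTTPS://WWW.Example.com/path') # prints "example.com"
```

Because the `www.` prefix is removed here, the domain can later build both the root and the www variants from the same bare host.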

Instance Attribute Details

#host ⇒ Object (readonly)

Returns the value of attribute host.



# File 'lib/site-inspector/domain.rb', line 5

def host
  @host
end

Instance Method Details

#canonical_endpoint ⇒ Object



# File 'lib/site-inspector/domain.rb', line 25

def canonical_endpoint
  @canonical_endpoint ||= begin
    prefetch
    endpoints.find do |e|
      e.https? == canonically_https? && e.www? == canonically_www?
    end
  end
end
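
To illustrate the selection, here is a small sketch that picks one of four endpoints from a pair of booleans. `CanonicalDemoEndpoint` and `pick_canonical` are hypothetical stand-ins for illustration only; the real method derives the two booleans from `canonically_https?` and `canonically_www?` after prefetching.

```ruby
# Hypothetical stand-in for SiteInspector::Endpoint, exposing only
# the two predicates that canonical_endpoint compares.
CanonicalDemoEndpoint = Struct.new(:url, :https, :www, keyword_init: true) do
  def https?
    https
  end

  def www?
    www
  end
end

CANONICAL_DEMO_ENDPOINTS = [
  CanonicalDemoEndpoint.new(url: 'https://example.com',     https: true,  www: false),
  CanonicalDemoEndpoint.new(url: 'https://www.example.com', https: true,  www: true),
  CanonicalDemoEndpoint.new(url: 'http://example.com',      https: false, www: false),
  CanonicalDemoEndpoint.new(url: 'http://www.example.com',  https: false, www: true)
].freeze

# Same comparison as the find block above: exactly one of the four
# endpoints matches any given (https, www) pair.
def pick_canonical(endpoints, canonically_https:, canonically_www:)
  endpoints.find do |e|
    e.https? == canonically_https && e.www? == canonically_www
  end
end

chosen = pick_canonical(CANONICAL_DEMO_ENDPOINTS, canonically_https: true, canonically_www: false)
puts chosen.url # prints "https://example.com"
```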

#canonically_https? ⇒ Boolean

A domain is “canonically” at https if:

* at least one of its https endpoints is live and
  doesn't have an invalid hostname
* both http endpoints are either down or redirect *somewhere*
* at least one http endpoint redirects immediately to
  an *internal* https endpoint

This is meant to affirm situations like:

http:// -> http://www -> https://
https:// -> http:// -> https://www

and meant to avoid affirming situations like:

http:// -> http://non-www
http://www -> http://non-www

or:

http:// -> 200, http://www -> https://www

It allows a site to be canonically HTTPS if the cert has a valid hostname but invalid chain issues.

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 156

def canonically_https?
  # Does any endpoint respond?
  return false unless up?

  # At least one of its https endpoints is live and doesn't have an invalid hostname
  return false unless https?

  # Both http endpoints are down
  return true if endpoints.select(&:http?).all? { |e| !e.up? }

  # at least one http endpoint redirects immediately to https
  endpoints.select(&:http?).any? { |e| e.redirect&.https? }
end
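
The decision order in the method body can be exercised without any network access. This is a sketch under stated assumptions: `HttpsDemoEndpoint`, its `valid_cert` flag, and `demo_canonically_https?` are hypothetical stand-ins, and the certificate check is reduced to a single boolean.

```ruby
# Hypothetical stand-in endpoint. `redirect` is another endpoint or nil;
# `valid_cert` stands in for the real certificate hostname check.
HttpsDemoEndpoint = Struct.new(:scheme, :up, :valid_cert, :redirect, keyword_init: true) do
  def https?
    scheme == 'https'
  end

  def http?
    scheme == 'http'
  end

  def up?
    up
  end
end

# Follows the same order of checks as the method above.
def demo_canonically_https?(endpoints)
  return false unless endpoints.any?(&:up?)
  return false unless endpoints.any? { |e| e.https? && e.up? && e.valid_cert }

  http = endpoints.select(&:http?)
  return true if http.none?(&:up?)

  http.any? { |e| e.redirect&.https? }
end

secure = HttpsDemoEndpoint.new(scheme: 'https', up: true, valid_cert: true)
redirecting = [
  secure,
  HttpsDemoEndpoint.new(scheme: 'http', up: true, redirect: secure),
  HttpsDemoEndpoint.new(scheme: 'http', up: false)
]
plain_http = [
  secure,
  HttpsDemoEndpoint.new(scheme: 'http', up: true),
  HttpsDemoEndpoint.new(scheme: 'http', up: true)
]

puts demo_canonically_https?(redirecting) # prints "true"
puts demo_canonically_https?(plain_http)  # prints "false"
```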

#canonically_www? ⇒ Boolean

A domain is “canonically” at www if:

* at least one of its www endpoints responds, and either
* both root endpoints are down, or
* at least one root endpoint redirects immediately to
  an *internal* www endpoint

This is meant to affirm situations like:

http:// -> https:// -> https://www
https:// -> http:// -> https://www

and meant to avoid affirming situations like:

http:// -> http://non-www,
http://www -> http://non-www

or like:

https:// -> 200, http:// -> http://www

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 125

def canonically_www?
  # Does any endpoint respond?
  return false unless up?

  # Does at least one www endpoint respond?
  return false unless www?

  # Are both root endpoints down?
  return true if endpoints.select(&:root?).all? { |e| !e.up? }

  # Does either root endpoint redirect to a www endpoint?
  endpoints.select(&:root?).any? { |e| e.redirect&.www? }
end
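
As a worked case, the chain `http:// -> https:// -> https://www` from the description can be modelled with plain objects. This is only a sketch: `WwwDemoEndpoint`, `www_chain_endpoints`, and `demo_canonically_www?` are hypothetical names, and www detection is simplified to a URL prefix check.

```ruby
# Hypothetical stand-in endpoint. `redirect` is the endpoint it
# immediately redirects to, or nil.
WwwDemoEndpoint = Struct.new(:url, :up, :redirect, keyword_init: true) do
  def www?
    url.include?('://www.')
  end

  def root?
    !www?
  end

  def up?
    up
  end
end

# Follows the same order of checks as the method above.
def demo_canonically_www?(endpoints)
  return false unless endpoints.any?(&:up?)
  return false unless endpoints.any? { |e| e.www? && e.up? }

  roots = endpoints.select(&:root?)
  return true if roots.none?(&:up?)

  roots.any? { |e| e.redirect&.www? }
end

# Builds the chain http:// -> https:// -> https://www.
def www_chain_endpoints
  https_www  = WwwDemoEndpoint.new(url: 'https://www.example.com', up: true)
  https_root = WwwDemoEndpoint.new(url: 'https://example.com', up: true, redirect: https_www)
  http_root  = WwwDemoEndpoint.new(url: 'http://example.com', up: true, redirect: https_root)
  http_www   = WwwDemoEndpoint.new(url: 'http://www.example.com', up: false)
  [https_root, https_www, http_root, http_www]
end

puts demo_canonically_www?(www_chain_endpoints) # prints "true"
```

In this model the result is true because the HTTPS root endpoint redirects straight to a www endpoint, even though the HTTP root reaches it only indirectly.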

#defaults_https? ⇒ Boolean

We can say that a canonically HTTPS site “defaults” to HTTPS even if it doesn’t strictly enforce it (e.g. the www subdomain first redirects to the HTTP root before reaching the HTTPS root).

TODO: not implemented.

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 96

def defaults_https?
  raise 'Not implemented. Halp?'
end

#downgrades_https? ⇒ Boolean

HTTPS is “downgraded” if both:

  • HTTPS is supported, and

  • The ‘canonical’ endpoint gets an immediate internal redirect to HTTP.

TODO: the redirect must be internal.

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 106

def downgrades_https?
  return false unless https?

  canonical_endpoint.redirect? && canonical_endpoint.redirect.http?
end

#endpoints ⇒ Object



# File 'lib/site-inspector/domain.rb', line 16

def endpoints
  @endpoints ||= [
    Endpoint.new("https://#{host}", domain: self),
    Endpoint.new("https://www.#{host}", domain: self),
    Endpoint.new("http://#{host}", domain: self),
    Endpoint.new("http://www.#{host}", domain: self)
  ]
end
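
The order of the four endpoints matters, because `to_h` later addresses them by index (0 to 3). A minimal sketch of the URLs produced for a host, using a hypothetical `endpoint_urls` helper that builds strings only and creates no `Endpoint` objects:

```ruby
# Hypothetical helper: returns the four URLs in the same order as
# the endpoints array above (https root, https www, http root, http www).
def endpoint_urls(host)
  %w[https http].flat_map do |scheme|
    ["#{scheme}://#{host}", "#{scheme}://www.#{host}"]
  end
end

puts endpoint_urls('example.com')
```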

#enforces_https? ⇒ Boolean

HTTPS is enforced if one of the HTTPS endpoints is “up”, and if both HTTP endpoints are either:

* down, or
* redirect immediately to HTTPS.

This is different than whether a domain is “canonically” HTTPS.

  • an HTTP redirect can go to HTTPS on another domain, as long as it’s immediate.

  • a domain with an invalid cert can still be enforcing HTTPS.

TODO: need to ensure the redirect immediately goes to HTTPS.

TODO: don’t need to require that the HTTPS cert is valid for this purpose.

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 85

def enforces_https?
  return false unless https?

  endpoints.select(&:http?).all? { |e| !e.up? || e.redirect&.https? }
end
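
The difference from the canonical check is that every HTTP endpoint must be down or redirect to HTTPS, not just one of them. The sketch below shows that with hypothetical stand-ins (`EnforceDemoEndpoint`, `demo_enforces_https?`); certificate validation is left out for brevity, so "HTTPS is supported" is reduced to "an HTTPS endpoint is up".

```ruby
# Hypothetical stand-in endpoint. `redirect` is another endpoint or nil.
EnforceDemoEndpoint = Struct.new(:scheme, :up, :redirect, keyword_init: true) do
  def http?
    scheme == 'http'
  end

  def https?
    scheme == 'https'
  end

  def up?
    up
  end
end

# Assumption: certificate validity is ignored in this sketch.
def demo_enforces_https?(endpoints)
  return false unless endpoints.any? { |e| e.https? && e.up? }

  endpoints.select(&:http?).all? { |e| !e.up? || e.redirect&.https? }
end

secure = EnforceDemoEndpoint.new(scheme: 'https', up: true)
strict = [
  secure,
  EnforceDemoEndpoint.new(scheme: 'http', up: true, redirect: secure),
  EnforceDemoEndpoint.new(scheme: 'http', up: false)
]
lax = [
  secure,
  EnforceDemoEndpoint.new(scheme: 'http', up: true, redirect: secure),
  EnforceDemoEndpoint.new(scheme: 'http', up: true) # serves content over HTTP
]

puts demo_enforces_https?(strict) # prints "true"
puts demo_enforces_https?(lax)    # prints "false"
```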

#government? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 34

def government?
  require 'gman'
  Gman.valid? host
end

#hsts? ⇒ Boolean

Is HSTS enabled on the canonical endpoint?

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 185

def hsts?
  canonical_endpoint.hsts&.enabled?
end

#hsts_preload_ready? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 193

def hsts_preload_ready?
  return false unless hsts_subdomains?

  endpoints.find { |e| e.root? && e.https? }.hsts.preload_ready?
end

#hsts_subdomains? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 189

def hsts_subdomains?
  endpoints.find { |e| e.root? && e.https? }.hsts.include_subdomains?
end

#https? ⇒ Boolean

HTTPS is “supported” (different than “canonical” or “enforced”) if:

  • Either of the HTTPS endpoints is listening, and doesn’t have an invalid hostname.

TODO: needs to allow an invalid chain.

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 67

def https?
  endpoints.any? { |e| e.https? && e.up? && e.https.valid? }
end

#inspect ⇒ Object



# File 'lib/site-inspector/domain.rb', line 203

def inspect
  "#<SiteInspector::Domain host=\"#{host}\">"
end

#prefetch ⇒ Object

We know most API calls to the domain model are going to require that the root of all four endpoints be requested. Rather than process them in serial, let’s grab them in parallel and cache the results to speed up later calls.



# File 'lib/site-inspector/domain.rb', line 211

def prefetch
  endpoints.each do |endpoint|
    request = Typhoeus::Request.new(endpoint.uri, SiteInspector.typhoeus_defaults)
    SiteInspector.hydra.queue(request)
  end
  SiteInspector.hydra.run
end
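
The idea of issuing all requests at once and caching the results can be sketched with the standard library alone. Note that this swaps in plain Ruby threads for Typhoeus and its Hydra queue, so it illustrates the pattern rather than the library's mechanism; `prefetch_all` and the `fetcher` callable are hypothetical, and no network request is made.

```ruby
# Hypothetical helper: runs one fetch per URL concurrently and
# returns a url => result cache. `fetcher` is any callable that
# takes a URL; here it stands in for an HTTP request.
def prefetch_all(urls, fetcher)
  cache = {}
  lock = Mutex.new

  threads = urls.map do |url|
    Thread.new do
      result = fetcher.call(url)
      lock.synchronize { cache[url] = result } # guard the shared hash
    end
  end

  threads.each(&:join) # wait until every fetch has finished
  cache
end

fake_fetch = ->(url) { "response for #{url}" }
cache = prefetch_all(%w[https://example.com http://example.com], fake_fetch)
puts cache['https://example.com'] # prints "response for https://example.com"
```

Later lookups read from the cache instead of repeating the request, which is the effect the description above aims for.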

#redirect ⇒ Object

The first endpoint that is an external redirect.



# File 'lib/site-inspector/domain.rb', line 180

def redirect
  endpoints.find(&:external_redirect?)
end

#redirect? ⇒ Boolean

A domain redirects if

  1. At least one endpoint is an external redirect, and

  2. All endpoints are either down or an external redirect

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 173

def redirect?
  return false unless redirect

  endpoints.all? { |e| !e.up? || e.external_redirect? }
end

#responds? ⇒ Boolean

Does any endpoint respond to HTTP?

TODO: needs to allow an invalid chain.

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 46

def responds?
  endpoints.any?(&:responds?)
end

#root? ⇒ Boolean

Can you connect without www?

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 57

def root?
  endpoints.any? { |e| e.root? && e.up? }
end

#to_h(options = {}) ⇒ Object

Converts the domain to a hash

By default, it only returns domain-wide information and information about the canonical endpoint

It will also pass options along to each endpoint’s to_h method

options:

'all' - return information about all endpoints (the implementation checks the string key options['all'])

Returns a complete hash of the domain’s information



# File 'lib/site-inspector/domain.rb', line 230

def to_h(options = {})
  prefetch

  hash = {
    host: host,
    up: up?,
    responds: responds?,
    www: www?,
    root: root?,
    https: https?,
    enforces_https: enforces_https?,
    downgrades_https: downgrades_https?,
    canonically_www: canonically_www?,
    canonically_https: canonically_https?,
    redirect: redirect?,
    hsts: hsts?,
    hsts_subdomains: hsts_subdomains?,
    hsts_preload_ready: hsts_preload_ready?,
    canonical_endpoint: canonical_endpoint.to_h(options)
  }

  if options['all']
    hash[:endpoints] = {
      https: {
        root: endpoints[0].to_h(options),
        www: endpoints[1].to_h(options)
      },
      http: {
        root: endpoints[2].to_h(options),
        www: endpoints[3].to_h(options)
      }
    }
  end

  hash
end
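
The method body reads `options['all']`, a string key. With a plain Ruby Hash, string and symbol keys are distinct, so the short sketch below shows which form of the option the lookup would see. `include_all_endpoints?` is a hypothetical helper that repeats only that lookup; it does not call the library.

```ruby
# Hypothetical helper repeating the lookup used in to_h.
def include_all_endpoints?(options)
  options['all'] ? true : false
end

puts include_all_endpoints?('all' => true) # prints "true"
puts include_all_endpoints?(all: true)     # prints "false" (symbol key is not matched)
puts include_all_endpoints?({})            # prints "false"
```

If the options hash were a type with indifferent key access, both forms would match; that is not something the code shown here establishes.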

#to_json(*_args) ⇒ Object



# File 'lib/site-inspector/domain.rb', line 267

def to_json(*_args)
  to_h.to_json
end

#to_s ⇒ Object



# File 'lib/site-inspector/domain.rb', line 199

def to_s
  host
end

#up? ⇒ Boolean

Does any endpoint return a 2xx or 3xx response code?

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 40

def up?
  endpoints.any?(&:up?)
end

#www? ⇒ Boolean

Can you connect to www?

TODO: These weren’t present before, and may not be useful.

Returns:

  • (Boolean)


# File 'lib/site-inspector/domain.rb', line 52

def www?
  endpoints.any? { |e| e.www? && e.up? }
end