Class: Wgit::Url

Inherits:
String
Includes:
Assertable
Defined in:
lib/wgit/url.rb

Overview

Class modeling/serialising a web based HTTP URL.

Can be an internal/relative link e.g. "about.html" or an absolute URL e.g. "http://www.google.co.uk". Is a subclass of String and uses URI and addressable/uri internally for parsing.

Most of the methods in this class return new Wgit::Url instances making the method calls chainable e.g. url.omit_base.omit_fragment etc. The methods also try to be idempotent where possible.
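
As a rough illustration of the chainable, idempotent style described above, here is a minimal stand-alone sketch. ChainableUrl is a hypothetical String subclass written for this example, not part of Wgit; it only mirrors the slash-stripping methods documented further down.

```ruby
# Hypothetical stand-in for Wgit::Url, for illustration only.
class ChainableUrl < String
  # Returns a new instance without one leading slash, or self if there is none.
  def omit_leading_slash
    start_with?('/') ? ChainableUrl.new(self[1..-1]) : self
  end

  # Returns a new instance without one trailing slash, or self if there is none.
  def omit_trailing_slash
    end_with?('/') ? ChainableUrl.new(chop) : self
  end

  # Chains the two methods above; each call returns a ChainableUrl.
  def omit_slashes
    omit_leading_slash.omit_trailing_slash
  end
end

url   = ChainableUrl.new('/about/')
once  = url.omit_slashes  # => "about"
twice = once.omit_slashes # => "about" (nothing left to strip)
url                       # => "/about/" (the receiver is not modified)
```

Because every step returns the same class, calls can be chained in any order, and repeating a call leaves the value unchanged.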

Constant Summary

Constants included from Assertable

Assertable::DEFAULT_DUCK_FAIL_MSG, Assertable::DEFAULT_REQUIRED_KEYS_MSG, Assertable::DEFAULT_TYPE_FAIL_MSG, Assertable::NON_ENUMERABLE_MSG

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Assertable

#assert_arr_types, #assert_required_keys, #assert_respond_to, #assert_types

Constructor Details

#initialize(url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil) ⇒ Url

Initializes a new instance of Wgit::Url which models a web based HTTP URL.

Parameters:

  • url_or_obj (String, Wgit::Url, #fetch#[])

    Is either a String based URL or an object representing a Database record e.g. a MongoDB document/object.

  • crawled (Boolean) (defaults to: false)

    Whether the HTML of the URL's web page has been crawled. Only used if url_or_obj is a String.

  • date_crawled (Time) (defaults to: nil)

    Should only be provided if crawled is true. A suitable object can be returned from Wgit::Utils.time_stamp. Only used if url_or_obj is a String.

  • crawl_duration (Float) (defaults to: nil)

    Should only be provided if crawled is true. The duration of the crawl for this Url (in seconds). Only used if url_or_obj is a String.

Raises:

  • (StandardError)

    If url_or_obj is an Object with missing methods.



# File 'lib/wgit/url.rb', line 45

def initialize(
  url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil
)
  # Init from a URL String.
  if url_or_obj.is_a?(String)
    url = url_or_obj.to_s
  # Else init from a Hash like object e.g. database object.
  else
    obj = url_or_obj
    assert_respond_to(obj, :fetch)

    url            = obj.fetch('url') # Should always be present.
    crawled        = obj.fetch('crawled', false)
    date_crawled   = obj.fetch('date_crawled', nil)
    crawl_duration = obj.fetch('crawl_duration', nil)
  end

  @uri            = Addressable::URI.parse(url)
  @crawled        = crawled
  @date_crawled   = date_crawled
  @crawl_duration = crawl_duration

  super(url)
end
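
The two construction paths above can be sketched without Wgit as follows. This is a minimal illustration under stated assumptions: url_fields is a hypothetical helper that only reproduces the branching (String versus #fetch-able record) and returns the resolved values as a Hash; it does not parse the URL.

```ruby
# Hypothetical helper, not part of Wgit: resolves the constructor arguments.
def url_fields(url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil)
  if url_or_obj.is_a?(String)
    # String input: the keyword arguments are used as given.
    url = url_or_obj.to_s
  else
    # Record input: the keyword arguments are ignored and read from the record.
    raise 'obj must respond to #fetch' unless url_or_obj.respond_to?(:fetch)

    url            = url_or_obj.fetch('url') # Raises KeyError if missing.
    crawled        = url_or_obj.fetch('crawled', false)
    date_crawled   = url_or_obj.fetch('date_crawled', nil)
    crawl_duration = url_or_obj.fetch('crawl_duration', nil)
  end

  { url: url, crawled: crawled, date_crawled: date_crawled,
    crawl_duration: crawl_duration }
end

from_string = url_fields('http://example.com', crawled: true)
from_record = url_fields({ 'url' => 'http://example.com', 'crawled' => true })
from_string == from_record # => true
```

Note that a plain Ruby Hash with String keys satisfies the #fetch requirement, which is why a database document can be passed straight in.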

Instance Attribute Details

#crawl_duration ⇒ Object

The duration of the crawl for this Url (in seconds).



# File 'lib/wgit/url.rb', line 29

def crawl_duration
  @crawl_duration
end

#crawled ⇒ Object Also known as: crawled?

Whether the Url has been crawled. A custom crawled= method is provided by this class.



# File 'lib/wgit/url.rb', line 23

def crawled
  @crawled
end

#date_crawled ⇒ Object

The timestamp of when this Url was crawled.



# File 'lib/wgit/url.rb', line 26

def date_crawled
  @date_crawled
end

Class Method Details

.parse(obj) ⇒ Wgit::Url

Initialises a new Wgit::Url instance from a String or subclass of String e.g. Wgit::Url. Any other obj type will raise an error.

If obj is already a Wgit::Url then it will be returned as is to maintain its state. Otherwise, a new Wgit::Url is instantiated and returned. This differs from Wgit::Url.new which always instantiates a new Wgit::Url.

Note: Only use this method if you are allowing obj to be either a String or a Wgit::Url whose state you want to preserve e.g. when passing a URL to a crawl method which might redirect (calling Wgit::Url#replace). If you're sure of the type or don't care about preserving the state of the Wgit::Url, use Wgit::Url.new instead.

Parameters:

  • obj (Object)

    The object to parse, which #is_a?(String).

Returns:

  • (Wgit::Url)

    obj itself if it is already a Wgit::Url, otherwise a new Wgit::Url.

Raises:

  • (StandardError)

    If obj.is_a?(String) is false.



# File 'lib/wgit/url.rb', line 86

def self.parse(obj)
  raise 'Can only parse if obj#is_a?(String)' unless obj.is_a?(String)

  # Return a Wgit::Url as is to avoid losing state e.g. date_crawled etc.
  obj.is_a?(Wgit::Url) ? obj : new(obj)
end
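
The difference between parse and new can be sketched with a small stand-in class. StatefulUrl below is hypothetical and exists only for this example; it carries a single crawled attribute to show what "preserving state" means.

```ruby
# Hypothetical stand-in for Wgit::Url, for illustration only.
class StatefulUrl < String
  attr_accessor :crawled

  # Same shape as the method above: return an existing instance untouched.
  def self.parse(obj)
    raise 'Can only parse if obj#is_a?(String)' unless obj.is_a?(String)

    obj.is_a?(StatefulUrl) ? obj : new(obj)
  end
end

existing = StatefulUrl.new('http://example.com')
existing.crawled = true

same = StatefulUrl.parse(existing) # The very same object, state intact.
copy = StatefulUrl.new(existing)   # A new object; instance variables are not copied.

same.equal?(existing) # => true
copy.crawled          # => nil
```

This is why parse is the safer choice when the caller may hand over a URL that already carries crawl metadata.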

.parse?(obj) ⇒ Wgit::Url

Returns a Wgit::Url instance from Wgit::Url.parse, or nil if obj cannot be parsed successfully e.g. the String is invalid.

Use this method when you can't guarantee that obj is parsable as a URL. See Wgit::Url.parse for more information.

Parameters:

  • obj (Object)

    The object to parse, which #is_a?(String).

Returns:

  • (Wgit::Url, nil)

    A Wgit::Url instance or nil (if obj is invalid).

Raises:

  • (StandardError)

    If obj.is_a?(String) is false.



# File 'lib/wgit/url.rb', line 102

def self.parse?(obj)
  parse(obj)
rescue Addressable::URI::InvalidURIError
  Wgit.logger.debug("Wgit::Url.parse?('#{obj}') exception: \
Addressable::URI::InvalidURIError")
  nil
end

Instance Method Details

#absolute? ⇒ Boolean Also known as: is_absolute?

Returns true if self is an absolute Url; false if relative.

Returns:

  • (Boolean)

    True if absolute, false if relative.



# File 'lib/wgit/url.rb', line 205

def absolute?
  @uri.absolute?
end

#concat(other) ⇒ Wgit::Url Also known as: +

Concatenates self and other, returning the result as a new Url. Self is not modified. An error is raised if other is not relative.

Parameters:

Returns:

  • (Wgit::Url)

    self + separator + other, separator depends on other.



# File 'lib/wgit/url.rb', line 234

def concat(other)
  other = Wgit::Url.new(other)
  raise 'other must be relative' unless other.relative?

  other = other.omit_leading_slash
  separator = %w[# ? .].include?(other[0]) ? '' : '/'

  # We use to_s below to call String#+, not Wgit::Url#+ (alias for concat).
  concatted = omit_trailing_slash.to_s + separator.to_s + other.to_s

  Wgit::Url.new(concatted)
end
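
The separator rule in the method above can be tried on plain Strings. This is a minimal sketch: join_url is a hypothetical helper that copies only the slash handling and separator choice, without the relative check or the Wgit::Url wrapping.

```ruby
# Hypothetical helper, not part of Wgit: joins a base and a relative part.
def join_url(base, other)
  other = other.delete_prefix('/') # Drop one leading slash, if any.
  # Fragments, queries and dot-prefixed parts attach directly; paths get a '/'.
  separator = %w[# ? .].include?(other[0]) ? '' : '/'

  base.delete_suffix('/') + separator + other
end

join_url('http://example.com/', '/about')        # => "http://example.com/about"
join_url('http://example.com/search', '?q=ruby') # => "http://example.com/search?q=ruby"
join_url('http://example.com/page', '#top')      # => "http://example.com/page#top"
```

Stripping the slashes on both sides first means the result never contains a doubled slash at the join point.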

#fragment? ⇒ Boolean Also known as: is_fragment?

Returns true if self is a URL fragment e.g. #top etc. Note this shouldn't be used to determine if self contains a fragment.

Returns:

  • (Boolean)

    True if self is a fragment, false otherwise.



# File 'lib/wgit/url.rb', line 626

def fragment?
  start_with?('#')
end

#index? ⇒ Boolean Also known as: is_index?

Returns true if self equals '/' a.k.a. index.

Returns:

  • (Boolean)

    True if self equals '/', false otherwise.



# File 'lib/wgit/url.rb', line 633

def index?
  self == '/'
end

#inspect ⇒ String

Overrides String#inspect to distinguish this Url from a String.

Returns:

  • (String)

    A short textual representation of this Url.



# File 'lib/wgit/url.rb', line 123

def inspect
  "#<Wgit::Url url=\"#{self}\" crawled=#{@crawled}>"
end

#invalid? ⇒ Boolean

Returns true if self is an invalid (e.g. relative) HTTP URL. See Wgit::Url#valid? for the inverse (and more information).

Returns:

  • (Boolean)

    True if invalid, otherwise false.



# File 'lib/wgit/url.rb', line 225

def invalid?
  !valid?
end

#make_absolute(doc) ⇒ Wgit::Url

Returns an absolute form of self within the context of doc. Doesn't modify the receiver.

If self is absolute then it's returned as is, making this method idempotent. Otherwise the doc's <base> element is used as the base if present, falling back to doc.url; the base is then concatted with self.

Typically used to build an absolute link obtained from a document.

Examples:

link = Wgit::Url.new('/favicon.png')
doc  = Wgit::Document.new('http://example.com')

link.make_absolute(doc) # => "http://example.com/favicon.png"

Parameters:

  • doc (Wgit::Document)

    The doc whose base Url is concatted with self.

Returns:

Raises:

  • (StandardError)

    If doc isn't a Wgit::Document or if doc.base_url raises an Exception.



# File 'lib/wgit/url.rb', line 275

def make_absolute(doc)
  assert_type(doc, Wgit::Document)
  raise 'Cannot make absolute when Document @url is not valid' \
  unless doc.url.valid?

  return prefix_scheme(doc.url.to_scheme&.to_sym) if scheme_relative?

  absolute? ? self : doc.base_url(link: self).concat(self)
end

#normalize ⇒ Wgit::Url

Normalizes/escapes self and returns a new Wgit::Url. Self isn't modified. This should be used before GET'ing the url, in case it has IRI chars.

Returns:

  • (Wgit::Url)

    An escaped version of self.



# File 'lib/wgit/url.rb', line 251

def normalize
  Wgit::Url.new(@uri.normalize.to_s)
end

#omit(*components) ⇒ Wgit::Url

Omits the given URL components from self and returns a new Wgit::Url.

Calls Addressable::URI#omit underneath and creates a new Wgit::Url from the output. See the Addressable::URI docs for more information.

Parameters:

  • components (*Symbol)

    One or more Symbols representing the URL components to omit. The following components are supported: :scheme, :user, :password, :userinfo, :host, :port, :authority, :path, :query, :fragment.

Returns:

  • (Wgit::Url)

    Self's URL value with the given components omitted.



# File 'lib/wgit/url.rb', line 522

def omit(*components)
  omitted = @uri.omit(*components)
  Wgit::Url.new(omitted.to_s)
end

#omit_base ⇒ Wgit::Url

Returns a new Wgit::Url with the base (scheme and host) removed e.g. Given http://google.com/search?q=something#about, search?q=something#about is returned. If relative and base isn't present then self is returned. Leading and trailing slashes are always stripped from the return value.

Returns:

  • (Wgit::Url)

    Self containing everything after the base.



# File 'lib/wgit/url.rb', line 561

def omit_base
  base_url = to_base
  omit_base = base_url ? gsub(base_url, '') : self

  return self if ['', '/'].include?(omit_base)

  Wgit::Url.new(omit_base).omit_slashes
end

#omit_fragment ⇒ Wgit::Url

Returns a new Wgit::Url with the fragment portion removed e.g. Given http://google.com/search#about, http://google.com/search is returned. Self is returned as is if no fragment is present. A URL consisting of only a fragment e.g. '#about' will return an empty URL. This method assumes that the fragment is correctly placed at the very end of the URL.

Returns:

  • (Wgit::Url)

    Self with the fragment portion removed.



# File 'lib/wgit/url.rb', line 607

def omit_fragment
  fragment = to_fragment
  omit_fragment = fragment ? gsub("##{fragment}", '') : self

  Wgit::Url.new(omit_fragment)
end
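
The same removal can be sketched on plain Strings using Ruby's standard URI library in place of the Addressable::URI object that Wgit holds internally. The helper name without_fragment is hypothetical.

```ruby
require 'uri'

# Hypothetical helper, not part of Wgit: strips "#fragment" from a URL String.
def without_fragment(url)
  fragment = URI.parse(url).fragment # The fragment text without the leading '#'.
  fragment ? url.gsub("##{fragment}", '') : url
end

without_fragment('http://google.com/search#about') # => "http://google.com/search"
without_fragment('http://google.com/search')       # => "http://google.com/search"
without_fragment('#about')                         # => ""
```

The parsed fragment does not include the '#', which is why it is added back before the substitution.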

#omit_leading_slash ⇒ Wgit::Url

Returns a new Wgit::Url containing self without a leading slash. Is idempotent: if there's no leading slash, self is returned as is.

Returns:

  • (Wgit::Url)

    Self without a leading slash.



# File 'lib/wgit/url.rb', line 532

def omit_leading_slash
  start_with?('/') ? Wgit::Url.new(self[1..-1]) : self
end

#omit_origin ⇒ Wgit::Url

Returns a new Wgit::Url with the origin (base + port) removed e.g. Given http://google.com:81/search?q=something#about, search?q=something#about is returned. If relative and base isn't present then self is returned. Leading and trailing slashes are always stripped from the return value.

Returns:

  • (Wgit::Url)

    Self containing everything after the origin.



# File 'lib/wgit/url.rb', line 576

def omit_origin
  origin = to_origin
  omit_origin = origin ? gsub(origin, '') : self

  return self if ['', '/'].include?(omit_origin)

  Wgit::Url.new(omit_origin).omit_slashes
end

#omit_query ⇒ Wgit::Url

Returns a new Wgit::Url with the query string portion removed e.g. Given http://google.com/search?q=hello, http://google.com/search is returned. Self is returned as is if no query string is present. A URL consisting of only a query string e.g. '?q=hello' will return an empty URL.

Returns:

  • (Wgit::Url)

    Self with the query string portion removed.



# File 'lib/wgit/url.rb', line 592

def omit_query
  query = to_query
  omit_query_string = query ? gsub("?#{query}", '') : self

  Wgit::Url.new(omit_query_string)
end

#omit_slashes ⇒ Wgit::Url

Returns a new Wgit::Url containing self without a leading or trailing slash. Is idempotent: if there are no slashes to strip, self is returned as is.

Returns:

  • (Wgit::Url)

    Self without leading or trailing slashes.



# File 'lib/wgit/url.rb', line 550

def omit_slashes
  omit_leading_slash
    .omit_trailing_slash
end

#omit_trailing_slash ⇒ Wgit::Url

Returns a new Wgit::Url containing self without a trailing slash. Is idempotent: if there's no trailing slash, self is returned as is.

Returns:

  • (Wgit::Url)

    Self without a trailing slash.



# File 'lib/wgit/url.rb', line 541

def omit_trailing_slash
  end_with?('/') ? Wgit::Url.new(chop) : self
end

#prefix_scheme(scheme = :http) ⇒ Wgit::Url

Returns a new Wgit::Url with the given scheme/protocol prefixed. Doesn't modify the receiver. If self is already absolute (has a scheme), self is returned as is; the method is therefore idempotent.

Parameters:

  • scheme (Symbol) (defaults to: :http)

    Either :http or :https.

Returns:



# File 'lib/wgit/url.rb', line 290

def prefix_scheme(scheme = :http)
  unless %i[http https].include?(scheme)
    raise "scheme must be :http or :https, not :#{scheme}"
  end

  return self if absolute? && !scheme_relative?

  separator = scheme_relative? ? '' : '//'
  Wgit::Url.new("#{scheme}:#{separator}#{self}")
end
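
The three cases handled above (already absolute, scheme-relative, bare host) can be sketched on plain Strings. This is an approximation: with_scheme is a hypothetical helper, and the regular expression used to detect an existing scheme is an assumption standing in for Wgit's absolute? check.

```ruby
# Hypothetical helper, not part of Wgit: prefixes a scheme onto a URL String.
def with_scheme(url, scheme = :http)
  unless %i[http https].include?(scheme)
    raise ArgumentError, "scheme must be :http or :https, not :#{scheme}"
  end

  # Assumption: a leading "name://" means the URL already has a scheme.
  return url if url.match?(%r{\A[a-z][a-z0-9+.-]*://}i)

  # A scheme-relative URL already starts with '//', so only "scheme:" is added.
  separator = url.start_with?('//') ? '' : '//'
  "#{scheme}:#{separator}#{url}"
end

with_scheme('example.com')           # => "http://example.com"
with_scheme('//example.com', :https) # => "https://example.com"
with_scheme('https://example.com')   # => "https://example.com"
```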

#query? ⇒ Boolean Also known as: is_query?

Returns true if self is a URL query string e.g. ?q=hello etc. Note this shouldn't be used to determine if self contains a query.

Returns:

  • (Boolean)

    True if self is a query string, false otherwise.



# File 'lib/wgit/url.rb', line 618

def query?
  start_with?('?')
end

#relative?(opts = {}) ⇒ Boolean Also known as: is_relative?

Returns true if self is a relative Url; false if absolute.

An absolute URL must have a scheme prefix e.g. 'http://', otherwise the URL is regarded as being relative (regardless of whether it's valid or not). The only exception is if an opts arg is provided and self is a page belonging to that arg type e.g. host; then the link is relative.

Examples:

url = Wgit::Url.new('http://example.com/about')

url.relative? # => false
url.relative?(host: 'http://example.com') # => true

Parameters:

  • opts (Hash) (defaults to: {})

    The options with which to check relativity. Only one opts param should be provided. The provided opts param Url must be absolute and be prefixed with a scheme. Consider using the output of Wgit::Url#to_origin which should work (unless it's nil).

Options Hash (opts):

  • :origin

    The Url whose origin is compared with self's e.g. http://www.google.com:81.

  • :host

    The Url whose host is compared with self's e.g. www.google.com.

  • :domain

    The Url whose domain is compared with self's e.g. google.com.

  • :brand

    The Url whose brand is compared with self's e.g. google.

Returns:

  • (Boolean)

    True if relative, false if absolute.

Raises:

  • (StandardError)

    If self is invalid (e.g. empty) or an invalid opts param has been provided.



# File 'lib/wgit/url.rb', line 167

def relative?(opts = {})
  defaults = { origin: nil, host: nil, domain: nil, brand: nil }
  opts = defaults.merge(opts)
  raise 'Url (self) cannot be empty' if empty?

  return false if scheme_relative?
  return true if @uri.relative?

  # Self is absolute but may be relative to the opts param e.g. host.
  opts.select! { |_k, v| v }
  raise "Provide only one of: #{defaults.keys}" if opts.length > 1

  return false if opts.empty?

  type, url = opts.first
  url = Wgit::Url.new(url)
  if url.invalid?
    raise "Invalid opts param value, it must be absolute, containing a \
protocol scheme and domain (e.g. http://example.com): #{url}"
  end

  case type
  when :origin # http://www.google.com:81
    to_origin == url.to_origin
  when :host   # www.google.com
    to_host   == url.to_host
  when :domain # google.com
    to_domain == url.to_domain
  when :brand  # google
    to_brand  == url.to_brand
  else
    raise "Unknown opts param: :#{type}, use one of: #{defaults.keys}"
  end
end

#replace(new_url) ⇒ String

Overrides String#replace, setting both @uri and the String value of self to new_url.

Parameters:

Returns:

  • (String)

    The new URL value once set.



# File 'lib/wgit/url.rb', line 131

def replace(new_url)
  @uri = Addressable::URI.parse(new_url)

  super(new_url)
end

#scheme_relative? ⇒ Boolean Also known as: is_scheme_relative?

Returns true if self starts with '//', a.k.a. a scheme/protocol relative path.

Returns:

  • (Boolean)

    True if self starts with '//', false otherwise.



# File 'lib/wgit/url.rb', line 641

def scheme_relative?
  start_with?('//')
end

#to_addressable_uri ⇒ Addressable::URI

Returns the Addressable::URI object for this URL.

Returns:

  • (Addressable::URI)

    The Addressable::URI object of self.



# File 'lib/wgit/url.rb', line 320

def to_addressable_uri
  @uri
end

#to_base ⇒ Wgit::Url? Also known as: base

Returns only the base of this URL e.g. the protocol scheme and host combined.

Returns:



# File 'lib/wgit/url.rb', line 400

def to_base
  return nil unless @uri.scheme && @uri.host

  base = "#{@uri.scheme}://#{@uri.host}"
  Wgit::Url.new(base)
end

#to_brand ⇒ Wgit::Url? Also known as: brand

Returns a new Wgit::Url containing just the brand of this URL e.g. Given http://www.google.co.uk/about.html, google is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the brand or nil.



# File 'lib/wgit/url.rb', line 390

def to_brand
  domain = to_domain
  domain ? Wgit::Url.new(domain.split('.').first) : nil
end

#to_domain ⇒ Wgit::Url? Also known as: domain

Returns a new Wgit::Url containing just the domain of this URL e.g. Given http://www.google.co.uk/about.html, google.co.uk is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the domain or nil.



# File 'lib/wgit/url.rb', line 367

def to_domain
  domain = @uri.domain
  domain ? Wgit::Url.new(domain) : nil
end

#to_endpoint ⇒ Wgit::Url Also known as: endpoint

Returns the endpoint of this URL e.g. the bit after the host with any slashes included. For example: Wgit::Url.new("http://www.google.co.uk/about.html/").to_endpoint returns "/about.html/". See Wgit::Url#to_path if you don't want the slashes.

Returns:

  • (Wgit::Url)

    Endpoint of self e.g. /about.html/. For a URL without an endpoint, / is returned.



# File 'lib/wgit/url.rb', line 440

def to_endpoint
  endpoint = @uri.path
  endpoint = '/' + endpoint unless endpoint.start_with?('/')
  Wgit::Url.new(endpoint)
end

#to_extension ⇒ Wgit::Url? Also known as: extension

Returns a new Wgit::Url containing just the file extension of this URL's path e.g. Given http://google.com/about.html, html is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the extension string or nil.



# File 'lib/wgit/url.rb', line 486

def to_extension
  path = to_path
  return nil unless path

  segs = path.split('.')
  segs.length > 1 ? Wgit::Url.new(segs.last) : nil
end

#to_fragment ⇒ Wgit::Url? Also known as: fragment

Returns a new Wgit::Url containing just the fragment string of this URL (without the leading '#') e.g. Given http://google.com#about, about is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the fragment string or nil.



# File 'lib/wgit/url.rb', line 477

def to_fragment
  fragment = @uri.fragment
  fragment ? Wgit::Url.new(fragment) : nil
end

#to_h ⇒ Hash

Returns a Hash containing this Url's instance vars excluding @uri. Used when storing the URL in a Database e.g. MongoDB etc.

Returns:

  • (Hash)

    self's instance vars as a Hash.



# File 'lib/wgit/url.rb', line 305

def to_h
  h = Wgit::Utils.to_h(self, ignore: ['@uri'])
  Hash[h.to_a.insert(0, ['url', self])] # Insert url at position 0.
end

#to_host ⇒ Wgit::Url? Also known as: host

Returns a new Wgit::Url containing just the host of this URL e.g. Given http://www.google.co.uk/about.html, www.google.co.uk is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the host or nil.



# File 'lib/wgit/url.rb', line 344

def to_host
  host = @uri.host
  host ? Wgit::Url.new(host) : nil
end

#to_origin ⇒ Wgit::Url? Also known as: origin

Returns only the origin of this URL e.g. the protocol scheme, host and port combined. For http://localhost:3000/api, http://localhost:3000 gets returned. If there's no port present, then to_base is returned.

Returns:

  • (Wgit::Url, nil)

    The origin of self or nil.



# File 'lib/wgit/url.rb', line 412

def to_origin
  return nil unless to_base
  return to_base unless to_port

  Wgit::Url.new("#{to_base}:#{to_port}")
end

#to_password ⇒ Wgit::Url? Also known as: password

Returns a new Wgit::Url containing just the password string of this URL e.g. for a URL whose userinfo is me:pass1, pass1 is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the password string or nil.



# File 'lib/wgit/url.rb', line 507

def to_password
  password = @uri.password
  password ? Wgit::Url.new(password) : nil
end

#to_path ⇒ Wgit::Url? Also known as: path

Returns the path of this URL e.g. the bit after the host without slashes. For example: Wgit::Url.new("http://www.google.co.uk/about.html/").to_path returns "about.html". See Wgit::Url#to_endpoint if you want the slashes.

Returns:

  • (Wgit::Url, nil)

    Path of self e.g. about.html or nil.



# File 'lib/wgit/url.rb', line 425

def to_path
  path = @uri.path
  return nil if path.nil? || path.empty?
  return Wgit::Url.new('/') if path == '/'

  Wgit::Url.new(path).omit_slashes
end
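
The difference between to_path and to_endpoint comes down to slash handling, which can be sketched with Ruby's standard URI library standing in for Addressable::URI. Both helper names below are hypothetical.

```ruby
require 'uri'

# Hypothetical helper: the path with slashes kept, '/' if there is no path.
def endpoint_of(url)
  path = URI.parse(url).path
  path.start_with?('/') ? path : "/#{path}"
end

# Hypothetical helper: the path with outer slashes stripped, nil if no path.
def path_of(url)
  path = URI.parse(url).path
  return nil if path.nil? || path.empty?
  return '/' if path == '/'

  path.delete_prefix('/').delete_suffix('/')
end

endpoint_of('http://www.google.co.uk/about.html/') # => "/about.html/"
path_of('http://www.google.co.uk/about.html/')     # => "about.html"
endpoint_of('http://www.google.co.uk')             # => "/"
path_of('http://www.google.co.uk')                 # => nil
```

The last two calls show the asymmetry: a URL with no path still has an endpoint of '/', but its path is nil.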

#to_port ⇒ Wgit::Url? Also known as: port

Returns a new Wgit::Url containing just the port of this URL e.g. Given http://www.google.co.uk:443/about.html, '443' is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the port or nil.



# File 'lib/wgit/url.rb', line 353

def to_port
  port = @uri.port

  # @uri.port defaults port to 80/443 if missing, so we check for :#{port}.
  return nil unless port
  return nil unless include?(":#{port}")

  Wgit::Url.new(port.to_s)
end
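
The "was the port actually written in the URL?" check can be sketched with Ruby's standard URI library, which is used here as a stand-in for the Addressable::URI object above and which does fill in a default port of 80 or 443. The helper name explicit_port is hypothetical.

```ruby
require 'uri'

# Hypothetical helper, not part of Wgit: returns the port only if it
# appears literally in the URL String, otherwise nil.
def explicit_port(url)
  port = URI.parse(url).port # 80/443 by default for http/https URLs.
  return nil unless port
  return nil unless url.include?(":#{port}")

  port.to_s
end

explicit_port('http://example.com/about')      # => nil
explicit_port('http://example.com:3000/about') # => "3000"
explicit_port('https://example.com:443/')      # => "443"
```

In this sketch the include? test scans the whole String, so it is a simple heuristic rather than a strict check of the authority part.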

#to_query ⇒ Wgit::Url? Also known as: query

Returns a new Wgit::Url containing just the query string of this URL e.g. Given http://google.com?q=foo&bar=1, 'q=foo&bar=1' is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the query string or nil.



# File 'lib/wgit/url.rb', line 450

def to_query
  query = @uri.query
  query ? Wgit::Url.new(query) : nil
end

#to_query_hash(symbolize_keys: false) ⇒ Hash<String | Symbol, String> Also known as: query_hash

Returns a Hash containing just the query string parameters of this URL e.g. Given http://google.com?q=ruby, { 'q' => 'ruby' } is returned.

Parameters:

  • symbolize_keys (Boolean) (defaults to: false)

    The returned Hash keys will be Symbols if true, Strings otherwise.

Returns:

  • (Hash<String | Symbol, String>)

    Containing the query string params or empty if the URL doesn't contain any query parameters.



# File 'lib/wgit/url.rb', line 462

def to_query_hash(symbolize_keys: false)
  query_str = to_query
  return {} unless query_str

  query_str.split('&').each_with_object({}) do |param, hash|
    k, v = param.split('=')
    k = k.to_sym if symbolize_keys
    hash[k] = v
  end
end
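
The splitting logic above works on any query String, so it can be tried in isolation. In this minimal sketch, query_hash is a hypothetical helper that takes the query String directly instead of calling to_query.

```ruby
# Hypothetical helper, not part of Wgit: turns "a=1&b=2" into a Hash.
def query_hash(query_str, symbolize_keys: false)
  return {} if query_str.nil? || query_str.empty?

  query_str.split('&').each_with_object({}) do |param, hash|
    k, v = param.split('=') # v is nil when the param has no '='.
    k = k.to_sym if symbolize_keys
    hash[k] = v
  end
end

query_hash('q=ruby&page=2')                       # => {"q"=>"ruby", "page"=>"2"}
query_hash('q=ruby&page=2', symbolize_keys: true) # => {:q=>"ruby", :page=>"2"}
query_hash(nil)                                   # => {}
```

Values stay as raw Strings in this sketch: nothing is percent-decoded or converted to a number.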

#to_scheme ⇒ Wgit::Url? Also known as: scheme

Returns a new Wgit::Url containing just the scheme of this URL e.g. Given http://www.google.co.uk, http is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the scheme or nil.



# File 'lib/wgit/url.rb', line 335

def to_scheme
  scheme = @uri.scheme
  scheme ? Wgit::Url.new(scheme) : nil
end

#to_sub_domain ⇒ Wgit::Url? Also known as: sub_domain

Returns a new Wgit::Url containing just the sub domain of this URL e.g. Given http://scripts.dev.google.com, scripts.dev is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the sub domain or nil.



# File 'lib/wgit/url.rb', line 376

def to_sub_domain
  return nil unless to_host

  dot_domain = ".#{to_domain}"
  return nil unless include?(dot_domain)

  sub_domain = to_host.sub(dot_domain, '')
  Wgit::Url.new(sub_domain)
end
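
The sub domain is whatever remains of the host once ".domain" is removed. A minimal sketch on plain Strings follows; sub_domain_of is a hypothetical helper that takes the host and domain as arguments, since working out the registrable domain is left to Addressable::URI in the method above.

```ruby
# Hypothetical helper, not part of Wgit: host minus ".domain", or nil.
def sub_domain_of(host, domain)
  dot_domain = ".#{domain}"
  return nil unless host.include?(dot_domain) # No sub domain present.

  host.sub(dot_domain, '')
end

sub_domain_of('scripts.dev.google.com', 'google.com') # => "scripts.dev"
sub_domain_of('www.google.co.uk', 'google.co.uk')     # => "www"
sub_domain_of('google.com', 'google.com')             # => nil
```

Prefixing the domain with a dot is what makes the bare-domain case return nil rather than an empty String.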

#to_uri ⇒ URI::HTTP, URI::HTTPS Also known as: uri

Returns a normalised URI object for this URL.

Returns:

  • (URI::HTTP, URI::HTTPS)

    The URI object of self.



# File 'lib/wgit/url.rb', line 313

def to_uri
  URI(normalize)
end

#to_url ⇒ Wgit::Url Also known as: url

Returns self.

Returns:



# File 'lib/wgit/url.rb', line 327

def to_url
  self
end

#to_user ⇒ Wgit::Url? Also known as: user

Returns a new Wgit::Url containing just the username string of this URL e.g. for a URL whose userinfo is me:pass1, me is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the user string or nil.



# File 'lib/wgit/url.rb', line 498

def to_user
  user = @uri.user
  user ? Wgit::Url.new(user) : nil
end

#valid? ⇒ Boolean Also known as: is_valid?

Returns true if self is a valid and absolute HTTP URL. Self should always be crawlable if this method returns true.

Returns:

  • (Boolean)

    True if valid, absolute and crawlable, otherwise false.



# File 'lib/wgit/url.rb', line 213

def valid?
  return false if relative?
  return false unless to_origin && to_domain
  return false unless URI::DEFAULT_PARSER.make_regexp.match(normalize)

  true
end