Class: ContentUrls
- Inherits: Object
  - Object
    - ContentUrls
- Defined in:
  lib/content_urls.rb,
  lib/content_urls/version.rb,
  lib/content_urls/parsers/css_parser.rb,
  lib/content_urls/parsers/html_parser.rb,
  lib/content_urls/parsers/java_script_parser.rb
Overview
ContentUrls parses various file types (HTML, CSS, JavaScript, …) for URLs and provides methods for iterating through URLs and changing URLs.
Defined Under Namespace
Modules: Version
Classes: CssParser, HtmlParser, JavaScriptParser, StyleParser
Class Method Summary
- .base_url(content, type) ⇒ String
  Returns base URL found in the content, if available.
- .rewrite_each_url(content, type, &block) ⇒ Object
  Rewrites each URL in the content by calling the supplied block with each URL.
- .to_absolute(url, base_url) ⇒ Object
  Converts a relative URL to an absolute URL using base_url (for example, the content’s original location or the href attribute of an HTML document’s base tag).
- .urls(content, type, options = {}) ⇒ Array
  Returns the URLs found in the content.
Class Method Details
.base_url(content, type) ⇒ String
Returns base URL found in the content, if available.
# File 'lib/content_urls.rb', line 73
def self.base_url(content, type)
  base = nil
  if (parser = get_parser(type))
    if (parser.respond_to?(:base))
      base = parser.base(content)
    end
  end
  base
end
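The listing only asks a parser for a base URL when that parser responds to :base. The following is a minimal sketch of that duck-typed dispatch. HtmlLikeParser, CssLikeParser, and base_url_sketch are hypothetical names invented for illustration, and the regular expression is a simplification, not how the gem's HtmlParser works.

```ruby
# Hypothetical parsers for illustration only; these are not the gem's classes.
class HtmlLikeParser
  # Returns the href of the first <base> tag, or nil when there is none.
  def self.base(content)
    content[/<base\s[^>]*href="([^"]*)"/i, 1]
  end
end

class CssLikeParser
  # Deliberately defines no .base method.
end

# Mirrors the respond_to? check in the listing: parsers without
# a base method simply yield nil.
def base_url_sketch(parser, content)
  parser.respond_to?(:base) ? parser.base(content) : nil
end

html = '<head><base href="http://example.com/docs/"></head>'
p base_url_sketch(HtmlLikeParser, html)  # "http://example.com/docs/"
p base_url_sketch(CssLikeParser, html)   # nil
```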
.rewrite_each_url(content, type, &block) ⇒ Object
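Rewrites each URL in the content by calling the supplied block with each URL. The block's return value replaces the URL; if the block returns nil, the URL is left unchanged. The method returns the content.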
# File 'lib/content_urls.rb', line 95
def self.rewrite_each_url(content, type, &block)
  if (parser = get_parser(type))
    parser.rewrite_each_url(content) do |url|
      replacement = yield url
      (replacement.nil? ? url : replacement)
    end
  end
  content
end
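The block contract in the listing (a nil result keeps the original URL) can be tried without the gem. The sketch below uses a hypothetical rewrite_hrefs helper that only handles double-quoted href attributes through a regular expression. It illustrates the contract under that assumption and is not a substitute for the gem's parsers.

```ruby
# Hypothetical helper: rewrites href="..." values only.
# Each URL is yielded to the block; a nil result keeps the URL as it was.
def rewrite_hrefs(content)
  content.gsub(/href="([^"]*)"/) do
    url = Regexp.last_match(1)
    replacement = yield url
    %(href="#{replacement.nil? ? url : replacement}")
  end
end

html = '<a href="/about">About</a> <a href="http://example.com/">Home</a>'

# Make root-relative links absolute; return nil to leave other URLs alone.
result = rewrite_hrefs(html) do |url|
  url.start_with?('/') ? "http://example.com#{url}" : nil
end

puts result
```

With the gem loaded, the corresponding call would presumably take the same kind of block, with the content type passed as the second argument.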
.to_absolute(url, base_url) ⇒ Object
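Converts a relative URL to an absolute URL using base_url (for example, the content’s original location or the href attribute of an HTML document’s base tag). A trailing anchor (#fragment) is removed before the URL is resolved, an empty path in the result is replaced with '/', and nil is returned when url is nil.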
# File 'lib/content_urls.rb', line 111
def self.to_absolute(url, base_url)
  return nil if url.nil?
  url = URI.encode(URI.decode(url.to_s.gsub(/#[a-zA-Z0-9_-]*$/,''))) # remove anchor
  absolute = URI(base_url).merge(url)
  absolute.path = '/' if absolute.path.empty?
  absolute.to_s
end
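URI.encode and URI.decode were removed in Ruby 3.0, so the listing will not run unchanged on current Ruby versions. The sketch below reproduces the resolution steps with the standard library only. The name to_absolute_sketch is hypothetical, and the sketch skips the decode/encode normalization entirely, so it assumes the incoming URL is already properly escaped.

```ruby
require 'uri'

# Hypothetical stand-in for the method in the listing; not the gem's code.
# Assumption: url is already escaped, so no decode/encode pass is applied.
def to_absolute_sketch(url, base_url)
  return nil if url.nil?
  # Drop a trailing anchor, using the same pattern as the listing.
  without_anchor = url.to_s.sub(/#[a-zA-Z0-9_-]*$/, '')
  absolute = URI(base_url).merge(without_anchor)
  # Normalize an empty path to the root path.
  absolute.path = '/' if absolute.path.empty?
  absolute.to_s
end

puts to_absolute_sketch('../a/b.html#top', 'http://example.com/x/y/index.html')
# prints http://example.com/x/a/b.html
```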
.urls(content, type, options = {}) ⇒ Array
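Returns the URLs found in the content. Two options are recognized: :use_base_url (default false) resolves the URLs against the base URL found in the content, and :content_url (default nil) supplies the content's own location, against which that base is resolved. When no absolute base results, the URLs are returned as they appear in the content.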
# Example: parse content obtained from a robot
require 'net/http'
require 'content_urls'

response = Net::HTTP.get_response(URI('http://example.com/sample-1'))
puts "URLs found at http://example.com/sample-1:"
ContentUrls.urls(response.body, response.content_type).each do |url|
  puts "  #{url}"
end
# => [a list of URLs found in the content located at http://example.com/sample-1]
# File 'lib/content_urls.rb', line 39
def self.urls(content, type, options = {})
  options = {
    :use_base_url => false,
    :content_url => nil,
  }.merge(options)
  urls = []
  if (parser = get_parser(type))
    base = base_url(content, type) if options[:use_base_url]
    base = '' if URI(base || '').relative?
    if options[:content_url]
      content_url = URI(options[:content_url]) rescue ''
      content_url = '' if URI(content_url).relative?
      base = URI.join(content_url, base)
    end
    if URI(base).relative?
      parser.urls(content).each { |url| urls << url }
    else
      parser.urls(content).each { |url| urls << URI.join( base, url).to_s }
    end
  end
  urls
end
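The final branch of the listing either returns the URLs as found or joins each one to an absolute base with URI.join. A minimal, standard-library-only sketch of that step follows. resolve_all is a hypothetical helper name, and the list of found URLs is made-up sample data standing in for a parser's output.

```ruby
require 'uri'

# Hypothetical helper mirroring the last branch of the listing:
# with a relative (or missing) base, URLs are returned untouched;
# with an absolute base, each URL is joined to it.
def resolve_all(urls, base)
  return urls.dup if base.nil? || URI(base).relative?
  urls.map { |url| URI.join(base, url).to_s }
end

# Sample data standing in for what a parser might find.
found = ['style.css', '/img/logo.png', 'http://other.example/x.js']

p resolve_all(found, 'http://example.com/docs/index.html')
p resolve_all(found, '')
```

The first call resolves the two relative entries against the base and leaves the already absolute URL alone. The second call returns the list unchanged because an empty base is relative.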