Class: RUtilAnts::URLCache::URLHandlers::HTTP
- Inherits:
- Object
- RUtilAnts::URLCache::URLHandlers::HTTP
- Defined in:
- lib/rUtilAnts/URLHandlers/HTTP.rb
Overview
Handler of HTTP URLs
Class Method Summary
-
.getMatchingRegexps ⇒ Object
Get a list of regexps matching the URL to get to this handler.
Instance Method Summary
-
#getContent(iFollowRedirections) ⇒ Object
Get the content of the URL.
-
#getCorrespondingFileBaseName ⇒ Object
Get a corresponding file base name.
-
#getCRC ⇒ Object
Get the current CRC of the URL.
-
#getServerID ⇒ Object
Get the server ID.
-
#initialize(iURL) ⇒ HTTP
Constructor.
Constructor Details
#initialize(iURL) ⇒ HTTP
Constructor
Parameters:
-
iURL (String): The URL that this handler will manage
    # File 'lib/rUtilAnts/URLHandlers/HTTP.rb', line 29

    def initialize(iURL)
      @URL = iURL
      lURLMatch = iURL.match(/^(http|https):\/\/([^\/]*)\/(.*)$/)
      if (lURLMatch == nil)
        lURLMatch = iURL.match(/^(http|https):\/\/(.*)$/)
      end
      if (lURLMatch == nil)
        logBug "URL #{iURL} was identified as an http like, but it appears to be false."
      else
        @URLProtocol, @URLServer, @URLPath = lURLMatch[1..3]
      end
    end
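The two-step match in the constructor can be tried in isolation. The following is a minimal sketch, not the library's code: `split_http_url` is a hypothetical helper name and the example.com URLs are placeholders. It illustrates that when only the fallback pattern matches (a URL with no `/` after the server), there are just two capture groups, so the path part ends up `nil`.

```ruby
# Hypothetical helper mirroring the constructor's two-step match.
# Returns the captured groups as an array, or nil if nothing matches.
def split_http_url(url)
  # First try: protocol, server, and a path after the first slash.
  match = url.match(/^(http|https):\/\/([^\/]*)\/(.*)$/)
  # Fallback: protocol and server only (no slash after the server).
  match = url.match(/^(http|https):\/\/(.*)$/) if match.nil?
  return nil if match.nil?
  # With the fallback pattern this range yields only two elements.
  match[1..3]
end

# Destructuring pads missing elements with nil, as in the constructor.
protocol, server, path = split_http_url('http://www.example.com/dir/page.html')
bare_protocol, bare_server, bare_path = split_http_url('https://www.example.com')
```

In the second call `bare_path` is `nil`, which is worth keeping in mind for any method that later calls string methods on the path.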
Class Method Details
.getMatchingRegexps ⇒ Object
Get a list of regexps matching the URL to get to this handler
Return:
-
list<Regexp>: The list of regexps matching URLs from this handler
    # File 'lib/rUtilAnts/URLHandlers/HTTP.rb', line 19

    def self.getMatchingRegexps
      return [
        /^(http|https):\/\/.*$/
      ]
    end
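A list of regexps like this one is typically used to pick a handler for a given URL. The sketch below assumes a simple name-to-regexps registry; `example_registry` and `find_handler_name` are hypothetical names for illustration, and the actual lookup performed by URLCache may differ.

```ruby
# Hypothetical registry: handler name => regexps accepted by that handler.
def example_registry
  {
    'HTTP' => [/^(http|https):\/\/.*$/]
  }
end

# Return the name of the first handler whose regexps match the URL,
# or nil when no handler claims it.
def find_handler_name(url, registry)
  registry.each do |name, regexps|
    return name if regexps.any? { |regexp| regexp.match(url) }
  end
  nil
end

handler_name = find_handler_name('https://www.example.com/index.html', example_registry)
```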
Instance Method Details
#getContent(iFollowRedirections) ⇒ Object
Get the content of the URL
Parameters:
-
iFollowRedirections (Boolean): Whether to follow redirections while accessing the content
Return:
-
Integer: Type of content returned
-
Object: The content, depending on the type previously returned:
** Exception if CONTENT_ERROR: The corresponding error
** String if CONTENT_REDIRECT: The new URL
** String if CONTENT_STRING: The real content
** String if CONTENT_LOCALFILENAME: The name of the local file storing the content
** String if CONTENT_LOCALFILENAME_TEMPORARY: The name of the temporary local file storing the content
    # File 'lib/rUtilAnts/URLHandlers/HTTP.rb', line 82

    def getContent(iFollowRedirections)
      rContentFormat = nil
      rContent = nil

      begin
        require 'net/http'
        Net::HTTP.start(@URLServer) do |iHTTPConnection|
          # Some websites filter out the default user agent (commons.mediawiki.org for example). Set another one.
          lResponse = iHTTPConnection.request_get("/#{@URLPath}", {'User-Agent' => 'RUtilAnts'})
          if ((iFollowRedirections) and
              (lResponse.is_a?(Net::HTTPRedirection)))
            # We access the file through a new URL
            rContent = lResponse['location']
            lNewURLMatch = rContent.match(/^(ftp|ftps|http|https):\/\/(.*)$/)
            if (lNewURLMatch == nil)
              if (rContent[0..0] == '/')
                rContent = "#{@URLProtocol}://#{@URLServer}#{rContent}"
              else
                rContent = "#{@URLProtocol}://#{@URLServer}/#{File.dirname(@URLPath)}/#{rContent}"
              end
            end
            rContentFormat = CONTENT_REDIRECT
          elsif (lResponse.is_a?(Net::HTTPOK))
            # We have the web page
            rContent = lResponse.body
            rContentFormat = CONTENT_STRING
          else
            # An error occurred
            rContent = RuntimeError.new("Access error to #{@URL}: #{lResponse.code}.")
            rContentFormat = CONTENT_ERROR
          end
        end
      rescue Exception
        rContent = $!
        rContentFormat = CONTENT_ERROR
      end

      return rContentFormat, rContent
    end
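The redirect branch of getContent turns the `location` header into an absolute URL in three cases: already absolute, server-relative (leading `/`), or relative to the current path. The sketch below reproduces that logic as a standalone function so it can be tried without a network connection; `resolve_redirect` is a hypothetical name and the URLs are placeholders.

```ruby
# Hypothetical standalone version of the redirect resolution.
# protocol, server and path play the role of @URLProtocol, @URLServer, @URLPath.
def resolve_redirect(protocol, server, path, location)
  # Already an absolute URL: use it unchanged.
  return location if location.match(/^(ftp|ftps|http|https):\/\/(.*)$/)
  if location[0..0] == '/'
    # Server-relative: keep protocol and server, replace the whole path.
    "#{protocol}://#{server}#{location}"
  else
    # Path-relative: resolve against the directory of the current path.
    "#{protocol}://#{server}/#{File.dirname(path)}/#{location}"
  end
end

absolute = resolve_redirect('http', 'www.example.com', 'dir/page.html', 'https://other.example.org/x')
server_relative = resolve_redirect('http', 'www.example.com', 'dir/page.html', '/other.html')
path_relative = resolve_redirect('http', 'www.example.com', 'dir/page.html', 'next.html')
```

Note that `File.dirname` returns `"."` for a path without a directory part, so a path-relative redirect from a top-level page yields a URL containing `/./`.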
#getCorrespondingFileBaseName ⇒ Object
Get a corresponding file base name. The result must preserve the file extension, since the name can be used for further processing.
Return:
-
String: The file name
    # File 'lib/rUtilAnts/URLHandlers/HTTP.rb', line 64

    def getCorrespondingFileBaseName
      # TODO: Handle the case where there is no base name (ie. www.google.com instead of www.google.com/index.html)
      # Check that extension has no characters following the URL (#, ? and ;)
      return getValidFileName(File.basename(@URLPath.gsub(/^([^#\?;]*).*$/,'\1')))
    end
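The extension-preserving step can be illustrated on its own. This is a minimal sketch under stated assumptions: `base_name_from_path` is a hypothetical name, and the call to `getValidFileName` is left out because that helper is defined elsewhere in the library and its behavior is not shown here.

```ruby
# Hypothetical helper: strip everything from the first '#', '?' or ';'
# onward, then keep only the last path component.
def base_name_from_path(url_path)
  # Group 1 captures the part before any fragment, query or parameter marker.
  trimmed = url_path.gsub(/^([^#\?;]*).*$/, '\1')
  File.basename(trimmed)
end

with_query = base_name_from_path('images/photo.png?size=large#top')
with_params = base_name_from_path('dir/archive.tar.gz;v=2')
```

Trimming before taking the base name keeps the extension (`.png`, `.tar.gz`) at the end of the result instead of leaving query or fragment text after it.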
#getCRC ⇒ Object
Get the current CRC of the URL
Return:
-
Integer: The CRC
    # File 'lib/rUtilAnts/URLHandlers/HTTP.rb', line 54

    def getCRC
      # We consider HTTP URLs to be definitive: CRCs will never change.
      return 0
    end
#getServerID ⇒ Object
Get the server ID
Return:
-
String: The server ID
    # File 'lib/rUtilAnts/URLHandlers/HTTP.rb', line 46

    def getServerID
      return "#{@URLProtocol}://#{@URLServer}"
    end