Class: PrettyProxy
- Inherits: Rack::Proxy (Object → Rack::Proxy → PrettyProxy)
- Defined in: lib/pretty_proxy.rb
Overview
The PrettyProxy class aggregates and validates the configuration of a proxy based on simple, pretty-URL-oriented rewriting rules. It is also a Rack app, and offers an abstract method for rewriting the responses returned by the proxy. (X)HTML responses are rewritten so that their hyperlinks point to the proxy version of each page, when one exists.
If you want a Rack app to use the proxy to point to another path of the same app, you have to run the server in multithreaded mode; otherwise requests to the proxy end in a deadlock. The proxy requests the original page, but the server doesn't respond because it is waiting for the proxy request to finish, and the proxy request can't finish because it needs the original page. The result is a timeout error.
What this class can't do (but may do in the future): smart handling of 3xx responses; chunked transfer encoding (the chunks are concatenated in the proxy and the transfer-encoding header is removed); content encodings other than deflate and gzip; exception classes carrying more than a message.
The exception classes (except Error) inherit from Error, and Error inherits from ArgumentError. For now they are empty and only carry a message.
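For reference, the hierarchy described above can be reproduced in a few lines. This is a standalone sketch, not the gem's source: it uses a hypothetical `PrettyProxySketch` module instead of the real `PrettyProxy` class (which also needs Rack::Proxy), and assumes each class is otherwise empty, as stated.

```ruby
# Hypothetical namespace standing in for PrettyProxy (no Rack::Proxy needed).
module PrettyProxySketch
  # Base class; inheriting ArgumentError means `rescue ArgumentError`
  # also catches every error of this family.
  class Error < ArgumentError; end

  # Invalid constructor arguments.
  class ConfigError < Error; end

  # Problems while proxying, e.g. an unknown content-encoding.
  class ProxyError < Error; end
end

begin
  raise PrettyProxySketch::ConfigError, 'invalid proxy_path'
rescue ArgumentError => e
  puts "#{e.class}: #{e.message}" # prints "PrettyProxySketch::ConfigError: invalid proxy_path"
end
```

Because Error descends from ArgumentError, code that already rescues ArgumentError around construction also catches ConfigError.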
Glossary:
‘a valid proxy url/path’: the path (or the path of the url) starts with the proxy_path and is followed by an original_path.
‘in(side)/out(side) the proxy control’: a url is inside the proxy control if its path starts with an original_path and its scheme, port and host are the same as those of the original_domain; otherwise it is outside.
CHANGELOG:
3.0.0
* return a String from unproxify_url (no longer a URI)
because this is an API change (and can break code) the major
version is now 3; if you don't use this method you can safely upgrade
* depends on the addressable gem
* correctly handles URIs without a scheme (but with a host)
like '//duckduckgo.com/' (spec added for that)
Defined Under Namespace
Classes: ConfigError, Error, ProxyError
Instance Attribute Summary
- #original_domain ⇒ Object
Return a clone of the internal value.
- #original_paths ⇒ Object
Return a clone of the internal value (always a Set, no matter what is passed to initialize).
- #proxy_path ⇒ Object
Return a clone of the internal value.
Instance Method Summary
- #call(env) ⇒ Object
Make this class a Rack app.
- #initialize(proxy_path, original_domain, original_paths) ⇒ PrettyProxy (constructor)
Create a new PrettyProxy instance or raise a ConfigError.
- #inside_proxy_control?(uri) ⇒ Boolean
Check if the URI::HTTP(S) is a page that can be accessed through the proxy.
- #point_to_a_proxy_page?(hyperlink, proxy_domain) ⇒ Boolean
Take a URL and the proxy domain (scheme, host and port) and return whether the URL points to a valid proxy page.
- #proxify_html(html, proxy_url) ⇒ String
Take an (X)HTML document and apply proxify_hyperlink to the ‘href’ attribute of each ‘a’ element.
- #proxify_hyperlink(hyperlink, proxy_page_url) ⇒ String
Take a hyperlink and the URL of the proxy page (not the original page) it comes from, and return the rewritten hyperlink.
- #rewrite_env(env) ⇒ Hash{String => String}
Turn the Rack environment hash of a request for the proxy version of a page into a request for the original page.
- #rewrite_response(triplet, requested_to_proxy_env, rewritten_env) ⇒ Array<(Integer, Hash{String => String}, #each)>
Mainly apply proxify_html to the body of the response if it is HTML.
- #same_domain_as_original?(uri) ⇒ Boolean
Check if the #scheme, #host and #port of the argument are equal to those of the original_domain.
- #sugared_rewrite_response(triplet, requested_to_proxy_env, rewritten_env) ⇒ Array<(Integer, Hash{String => String}, String)> (abstract)
Return a modified copy of the first argument.
- #unproxify_url(url) ⇒ String
Take a proxy URL and return the original URL behind the proxy.
- #valid_path_for_proxy?(absolute_path) ⇒ Boolean
Check if the absolute path begins with the proxy_path and is followed by an element of original_paths.
Constructor Details
#initialize(proxy_path, original_domain, original_paths) ⇒ PrettyProxy
See the spec file pretty_proxy_spec for examples and the complete definition of invalid arguments.
Create a new PrettyProxy instance or raise a ConfigError. The arguments are cloned.
```ruby
# File 'lib/pretty_proxy.rb', line 90
def initialize(proxy_path, original_domain, original_paths)
  Utils.validate_proxy_path(proxy_path)
  Utils.validate_original_domain_and_paths(original_domain, original_paths)

  @proxy_path = proxy_path.clone
  @original_domain = Addressable::URI.parse(original_domain.clone)
  @original_paths = Set.new
  if original_paths.respond_to? :each
    original_paths.each { | value | @original_paths << value.clone }
  else
    @original_paths << original_paths.clone
  end
end
```
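The normalization at the end of the constructor (any `#each`-able value, or a single path, becomes a Set of clones) can be tried on its own. This is a minimal sketch using only the standard library; `normalize_original_paths` is a hypothetical helper, and the validation done by the gem's `Utils` module is left out.

```ruby
require 'set'

# Hypothetical helper mirroring the end of PrettyProxy#initialize:
# a single path or any #each-able collection becomes a Set of cloned strings.
def normalize_original_paths(original_paths)
  paths = Set.new
  if original_paths.respond_to?(:each)
    original_paths.each { |value| paths << value.clone }
  else
    paths << original_paths.clone
  end
  paths
end

normalize_original_paths('/')                            # a single String
normalize_original_paths(['/about', '/blog/', '/about']) # an Array with a duplicate
```

Since String does not respond to `#each` (from Ruby 1.9 on), a bare string is treated as one path, and duplicates collapse because the result is a Set.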
Instance Attribute Details
#original_domain ⇒ Object
Return a clone of the internal value.
```ruby
# File 'lib/pretty_proxy.rb', line 111
[:proxy_path, :original_domain, :original_paths].each do | reader |
  define_method(reader) { instance_variable_get("@#{reader.to_s}").clone }
end
```
#original_paths ⇒ Object
Return a clone of the internal value (always a Set, no matter what is passed to initialize). Defined by the same reader code as #original_domain.
#proxy_path ⇒ Object
Return a clone of the internal value. Defined by the same reader code as #original_domain.
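The effect of returning clones can be seen with a hypothetical stand-in class that uses the same `define_method` pattern. It is a sketch that does not load the gem; `ClonedReaders` is not part of PrettyProxy.

```ruby
# Hypothetical stand-in class using the same reader pattern as PrettyProxy.
class ClonedReaders
  def initialize(proxy_path, original_paths)
    @proxy_path = proxy_path.clone
    @original_paths = original_paths.clone
  end

  # Each reader returns a clone, so callers can't replace or grow the
  # internal objects through the returned value.
  [:proxy_path, :original_paths].each do |reader|
    define_method(reader) { instance_variable_get("@#{reader}").clone }
  end
end

config = ClonedReaders.new('/proxy/', ['/about'])
config.proxy_path << 'oops'        # mutates only the returned clone
config.original_paths << '/secret' # same here
```

Note that `clone` is shallow: mutating a string *inside* the returned collection (for example `config.original_paths.first << 'x'`) would still reach the internal object.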
Instance Method Details
#call(env) ⇒ Object
Make this class a Rack app. It is overridden to pass both the original Rack environment (the request to the proxy) and the rewritten env (modified to request the original page) on to rewrite_response. If you don't know the parameters and return value of this method, please read http://rack.rubyforge.org/doc/SPEC.html.
```ruby
# File 'lib/pretty_proxy.rb', line 364
def call(env)
  # in theory we only need to pass on the rewritten_env; any info from the
  # original env could be passed as an application-specific env variable
  # (example: env['app_name.original_path'] = env['PATH_INFO']),
  # but to avoid making that a common idiom we pass the original env too
  rewritten_env = rewrite_env(env)
  rewrite_response(perform_request(rewritten_env), env, rewritten_env)
end
```
#inside_proxy_control?(uri) ⇒ Boolean
Check if the URI::HTTP(S) is a page that can be accessed through the proxy.
```ruby
# File 'lib/pretty_proxy.rb', line 380
def inside_proxy_control?(uri)
  # to_s: for an empty path (e.g. 'http://example.com') ''[1..-1] is nil
  same_domain_as_original?(uri) &&
    valid_path_for_proxy?(@proxy_path + uri.path[1..-1].to_s)
end
```
#point_to_a_proxy_page?(hyperlink, proxy_domain) ⇒ Boolean
Take a URL and the proxy domain (scheme, host and port) and return whether the URL points to a valid proxy page.
```ruby
# File 'lib/pretty_proxy.rb', line 405
def point_to_a_proxy_page?(hyperlink, proxy_domain)
  Utils.same_domain?(hyperlink, proxy_domain) &&
    valid_path_for_proxy?(hyperlink.path)
end
```
#proxify_html(html, proxy_url) ⇒ String
Take an (X)HTML document and apply proxify_hyperlink to the ‘href’ attribute of each ‘a’ element.
```ruby
# File 'lib/pretty_proxy.rb', line 207
def proxify_html(html, proxy_url)
  parsed_html = nil
  # If you parse XHTML as HTML with Nokogiri and call to_s afterwards,
  # the markup can be messed up.
  #
  # Example: <meta name="description" content="not important" />
  # becomes <meta name="description" content="not important" >
  # To avoid this we parse a document that is valid XML as XML,
  # and otherwise as HTML.
  begin
    # this also isn't a great way to do this: the bare rescue below
    # silences any StandardError, not only XML parse errors.
    # (STRICT is 0, so the & yields strict parsing, which raises on
    # invalid XML and triggers the HTML fallback.)
    parsed_html = Nokogiri::XML::Document.parse(
      html, nil, nil,
      Nokogiri::XML::ParseOptions::DEFAULT_XML &
        Nokogiri::XML::ParseOptions::STRICT &
        Nokogiri::XML::ParseOptions::DTDVALID
    )
  rescue
    parsed_html = Nokogiri::HTML(html)
  end
  # skip anchors without an href (e.g. <a name="top">), which would
  # otherwise pass nil to proxify_hyperlink
  parsed_html.css('a[href]').each do | hyperlink |
    hyperlink['href'] = proxify_hyperlink(hyperlink['href'], proxy_url)
  end
  parsed_html.to_s
end
```
#proxify_hyperlink(hyperlink, proxy_page_url) ⇒ String
Take a hyperlink and the URL of the proxy page (not the original page) it comes from, and return the rewritten hyperlink. If the page pointed to by the hyperlink is inside the proxy control, the rewritten hyperlink will point to the proxified version; otherwise it will point to the original version.
```ruby
# File 'lib/pretty_proxy.rb', line 161
def proxify_hyperlink(hyperlink, proxy_page_url)
  hyperlink = Addressable::URI.parse(hyperlink.clone)
  proxy_page_url = Addressable::URI.parse(proxy_page_url)
  # this is URI relative ('//duckduckgo.com', '/path', '../path')
  if hyperlink.relative?
    absolute_hyperlink = Addressable::URI.parse(unproxify_url(proxy_page_url))
                                         .join(hyperlink)
    if inside_proxy_control? absolute_hyperlink
      # this is path relative ('../path', 'path', but not '//duckduckgo.com' or '/path')
      if Pathname.new(hyperlink.path).relative?
        if point_to_a_proxy_page?(absolute_hyperlink, proxy_page_url)
          # in the case of a relative path in the original page that points
          # to a proxy page, and the proxy page is inside the proxy control,
          # we have to use the absolute_hyperlink or the page will be double
          # proxified. Example: ../proxy/content in http://example.com/proxy/content,
          # with original_path as '/', is http://example.com/proxy/proxy/content
          hyperlink = absolute_hyperlink
        end
      else
        hyperlink.path = @proxy_path[0..-2] + absolute_hyperlink.path
        hyperlink.host = proxy_page_url.host if hyperlink.host
        hyperlink.port = proxy_page_url.port if hyperlink.port
      end
    else
      hyperlink = absolute_hyperlink
    end
  else
    # the hyperlink is absolute
    if inside_proxy_control? hyperlink
      # if it points to the proxy itself we don't double-proxify
      unless point_to_a_proxy_page?(hyperlink, proxy_page_url)
        hyperlink = proxify_uri(hyperlink, proxy_page_url)
      end
    end
  end
  hyperlink.to_s
end
```
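The two checks at the top of proxify_hyperlink (first "no scheme", then "path not starting with '/'") can be illustrated with the standard library. This sketch uses `URI` and `Pathname` rather than `Addressable::URI`, assuming the two agree that `relative?` means "no scheme" for these inputs; `hyperlink_kind` and its symbols are hypothetical names for the cases described in the comments.

```ruby
require 'uri'
require 'pathname'

# Hypothetical classifier for the branches of proxify_hyperlink.
def hyperlink_kind(href)
  uri = URI.parse(href)
  # A scheme ('https:') makes the link absolute.
  return :absolute unless uri.relative?
  # No scheme: either the path is relative ('../path', 'path') or it is
  # absolute, possibly with a host ('/path', '//duckduckgo.com/').
  Pathname.new(uri.path).relative? ? :path_relative : :uri_relative
end

%w[https://duckduckgo.com/ //duckduckgo.com/ /path ../path path].each do |href|
  puts "#{href} -> #{hyperlink_kind(href)}"
end
```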
#rewrite_env(env) ⇒ Hash{String => String}
Turn the Rack environment hash of a request for the proxy version of a page into a request for the original page. As in Rack::Proxy, #call uses it to request the original page before calling rewrite_response on the response. If you want to use your own rewrite rules, it is probably wiser to subclass Rack::Proxy instead of this class; the main purpose of this class is to implement and enforce these rules for you.
```ruby
# File 'lib/pretty_proxy.rb', line 243
def rewrite_env(env)
  env = env.clone
  url_requested_to_proxy = Rack::Request.new(env).url
  # Using URI, and not Addressable::URI because the port value is incorrect in the last
  unproxified_url = Addressable::URI.parse(unproxify_url(url_requested_to_proxy))

  if env['HTTP_HOST']
    env['HTTP_HOST'] = unproxified_url.host
  end
  env['SERVER_NAME'] = unproxified_url.host
  env['SERVER_PORT'] = unproxified_url.inferred_port.to_s

  if env['SCRIPT_NAME'].empty? && !env['PATH_INFO'].empty?
    env['PATH_INFO'] = unproxified_url.path
  end
  if !env['SCRIPT_NAME'].empty? && env['PATH_INFO'].empty?
    env['SCRIPT_NAME'] = unproxified_url.path
  end

  # Seriously, I don't know how to split the unproxified url again,
  # so PATH_INFO will hold the full path
  if (!env['SCRIPT_NAME'].empty? && !env['PATH_INFO'].empty?) ||
     (env['SCRIPT_NAME'].empty? && env['PATH_INFO'].empty?)
    env['PATH_INFO'] = unproxified_url.path
    env['SCRIPT_NAME'] = ''
  end

  env['REQUEST_PATH'] = unproxified_url.path
  env['REQUEST_URI'] = unproxified_url.path
  env
end
```
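A simplified sketch of what rewrite_env does to the environment hash, using a plain Hash and the standard `URI` instead of Rack and Addressable. `retarget_env` is hypothetical: it takes the already-unproxified URL as an argument and only covers the outcome where the full path ends up in PATH_INFO with an empty SCRIPT_NAME.

```ruby
require 'uri'

# Hypothetical, simplified retargeting of a Rack-style env Hash.
def retarget_env(env, original_url)
  url = URI.parse(original_url)
  env = env.dup                                   # never mutate the caller's hash
  env['HTTP_HOST'] = url.host if env['HTTP_HOST']
  env['SERVER_NAME'] = url.host
  env['SERVER_PORT'] = url.port.to_s              # URI infers 80/443 when omitted
  env['SCRIPT_NAME'] = ''                         # full path goes to PATH_INFO
  env['PATH_INFO'] = url.path
  env['REQUEST_PATH'] = url.path
  env['REQUEST_URI'] = url.path
  env                                             # other keys (QUERY_STRING...) kept
end

proxy_env = {
  'HTTP_HOST' => 'proxy.test:8080', 'SERVER_NAME' => 'proxy.test',
  'SERVER_PORT' => '8080', 'SCRIPT_NAME' => '',
  'PATH_INFO' => '/proxy/content', 'QUERY_STRING' => 'q=1'
}
original_env = retarget_env(proxy_env, 'http://example.com/content?q=1')
```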
#rewrite_response(triplet, requested_to_proxy_env, rewritten_env) ⇒ Array<(Integer, Hash{String => String}, #each)>
Mainly apply proxify_html to the body of the response if it is HTML. Raise an error if the ‘content-encoding’ is anything other than deflate, gzip or identity. Update the ‘content-length’ header (if present) to the bytesize of the new body. Remove the ‘transfer-encoding’ header if it is chunked, and respond as if the response were not chunked. This method is inherited from Rack::Proxy, but the original takes only the first parameter (the triplet); this version also takes the Rack env of the request to the proxy and the rewritten Rack env as second and third parameters, respectively.
```ruby
# File 'lib/pretty_proxy.rb', line 292
def rewrite_response(triplet, requested_to_proxy_env, rewritten_env)
  status, headers, body = triplet
  content_type = headers['content-type']
  return triplet unless %r{text/html} =~ content_type ||
                        %r{application/xhtml\+xml} =~ content_type

  # the #each method of body can't be called twice, but we need to call it
  # here and it is called again after this method returns, so we fake the
  # body with an array of one string; we can't return a string (even if it
  # responds to #each), see http://rack.rubyforge.org/doc/SPEC.html
  # (section 'The Body')
  page = ''
  body.each do | chunk |
    page << chunk
  end

  case headers['content-encoding']
  when 'gzip' then page = Zlib::GzipReader.new(StringIO.new(page)).read
  when 'deflate' then page = Zlib::Inflate.inflate(page)
  when 'identity' then page = page
  when nil then page = page
  else
    fail ProxyError, 'unknown content-encoding, only encodings known are gzip, deflate and identity'
  end

  page = proxify_html(page, Rack::Request.new(requested_to_proxy_env).url)
  status, headers, page = sugared_rewrite_response([status, headers, page],
                                                   requested_to_proxy_env,
                                                   rewritten_env)

  case headers['content-encoding']
  when 'gzip'
    # compress into a fresh, empty binary buffer so that only the gzip
    # data ends up in page (writing over a copy of the page would leave
    # its uncompressed tail after the compressed bytes)
    gzip_buffer = StringIO.new(''.b)
    gzip_stream = Zlib::GzipWriter.new(gzip_buffer)
    gzip_stream.write page
    gzip_stream.close
    page = gzip_buffer.string
  when 'deflate' then page = Zlib::Deflate.deflate(page)
  end

  headers['content-length'] = page.bytesize.to_s if headers['content-length']

  # TODO: find a way to make the code work with chunked encoding
  if 'chunked' == headers['transfer-encoding']
    headers.delete('transfer-encoding')
    headers['content-length'] = page.bytesize.to_s
  end

  [status, headers, [page]]
end
```
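The decode, rewrite, re-encode round trip at the heart of rewrite_response can be exercised with Ruby's standard `zlib`. This is a sketch under assumptions: the helper names are hypothetical, the unknown-encoding case raises ArgumentError (the ancestor of the gem's ProxyError) so the block runs without the gem, and a plain `sub` stands in for proxify_html.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: decode a body according to its content-encoding.
def decode_body(page, content_encoding)
  case content_encoding
  when 'gzip' then Zlib::GzipReader.new(StringIO.new(page)).read
  when 'deflate' then Zlib::Inflate.inflate(page)
  when 'identity', nil then page
  else raise ArgumentError, "unknown content-encoding: #{content_encoding}"
  end
end

# Hypothetical helper: re-encode a (rewritten) body with the same encoding.
def encode_body(page, content_encoding)
  case content_encoding
  when 'gzip'
    buffer = StringIO.new(''.b)          # fresh, empty binary buffer
    gzip = Zlib::GzipWriter.new(buffer)
    gzip.write(page)
    gzip.finish                          # writes the gzip footer, leaves buffer open
    buffer.string
  when 'deflate' then Zlib::Deflate.deflate(page)
  else page
  end
end

html = '<a href="/content">content</a>'
%w[gzip deflate identity].each do |content_encoding|
  body = encode_body(html, content_encoding)
  # a plain substitution stands in for proxify_html here
  rewritten = decode_body(body, content_encoding).sub('/content', '/proxy/content')
  reencoded = encode_body(rewritten, content_encoding)
  puts "#{content_encoding}: content-length #{reencoded.bytesize}"
end
```

The length printed is the byte size of the re-encoded body, which is the value that belongs in content-length, not the size of the decoded HTML.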
#same_domain_as_original?(uri) ⇒ Boolean
Check if the #scheme, #host and #port of the argument are equal to those of the original_domain.
```ruby
# File 'lib/pretty_proxy.rb', line 375
def same_domain_as_original?(uri)
  Utils.same_domain?(@original_domain, uri)
end
```
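The comparison can be approximated with the standard `URI` library. This sketch is an assumption about what `Utils.same_domain?` checks, based only on the description above (scheme, host, port); the real helper may differ, for example in how it treats host case.

```ruby
require 'uri'

# Hypothetical approximation of a scheme/host/port comparison.
def same_domain?(a, b)
  a = URI.parse(a)
  b = URI.parse(b)
  a.scheme == b.scheme &&
    a.host&.downcase == b.host&.downcase &&
    a.port == b.port # URI fills in 80/443 when the port is omitted
end

same_domain?('http://example.com/', 'http://example.com:80/about') # default port inferred
same_domain?('http://example.com/', 'https://example.com/')        # scheme differs
```

In this sketch a scheme-relative link such as '//example.com/' has no scheme (and no inferred port), so it never matches.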
#sugared_rewrite_response(triplet, requested_to_proxy_env, rewritten_env) ⇒ Array<(Integer, Hash{String => String}, String)>
This method is called only for (X)HTML responses, after they are decompressed and their hyperlinks proxified, and before they are compressed again and the new content-length is calculated.
The body of the triplet is a String and not an object that responds to #each; the same has to be true of the return value. Return a modified clone of the response; don't change the argument.
Returns a modified copy of the first argument (the default implementation returns the argument unchanged).
```ruby
# File 'lib/pretty_proxy.rb', line 355
def sugared_rewrite_response(triplet, requested_to_proxy_env, rewritten_env)
  triplet
end
```
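A sketch of overriding the hook. To keep it runnable without the gem, `HookDefault` stands in for PrettyProxy (in an application the subclass would inherit from PrettyProxy itself); `BannerProxy` and the banner markup are hypothetical.

```ruby
# Stand-in for PrettyProxy's default hook, so the sketch runs without the gem.
class HookDefault
  def sugared_rewrite_response(triplet, _requested_to_proxy_env, _rewritten_env)
    triplet
  end
end

# Hypothetical subclass: in an application this would be `< PrettyProxy`.
class BannerProxy < HookDefault
  BANNER = '<p>Viewed through the proxy</p>'.freeze

  def sugared_rewrite_response(triplet, _requested_to_proxy_env, _rewritten_env)
    status, headers, page = triplet
    # Build new objects instead of mutating the argument; the body is a String.
    new_page = page.sub(/<body[^>]*>/) { |tag| tag + BANNER }
    [status, headers.dup, new_page]
  end
end

response = [200, { 'content-type' => 'text/html' }, '<html><body><p>Hi</p></body></html>']
result = BannerProxy.new.sugared_rewrite_response(response, {}, {})
```

The override does not touch content-length: rewrite_response recomputes it (when the header is present) after this hook returns.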
#unproxify_url(url) ⇒ String
Take a proxy URL and return the original URL behind the proxy, preserving the query and fragment, if any. For rewriting a request, see rewrite_env.
```ruby
# File 'lib/pretty_proxy.rb', line 135
def unproxify_url(url)
  url = Addressable::URI.parse(url.clone)

  unless valid_path_for_proxy? url.path
    fail ProxyError, "'#{url.to_s}' isn't inside the proxy control, it can't be unproxified"
  end

  url.site = @original_domain.site
  url.path = url.path.slice((@proxy_path.size-1)..-1)

  url.to_s
rescue Addressable::URI::InvalidURIError
  raise ArgumentError, "the url argument isn't a valid uri"
end
```
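The path arithmetic can be checked with a stdlib-only sketch. `unproxify` is hypothetical: it assumes proxy_path ends with '/', checks only the proxy_path prefix (not original_paths, unlike valid_path_for_proxy?), takes the original site as a plain string, and raises ArgumentError instead of ProxyError.

```ruby
require 'uri'

# Hypothetical stdlib-only sketch of the unproxify_url arithmetic.
# Assumes proxy_path ends with '/', e.g. '/proxy/'.
def unproxify(url, proxy_path, original_site)
  uri = URI.parse(url)
  unless uri.path.start_with?(proxy_path)
    raise ArgumentError, "'#{url}' isn't inside the proxy control"
  end
  # Slicing from proxy_path.size - 1 keeps proxy_path's trailing '/'
  # as the leading '/' of the original path: '/proxy/content' -> '/content'.
  result = original_site.chomp('/') + uri.path[(proxy_path.size - 1)..-1]
  result += "?#{uri.query}" if uri.query       # preserve the query...
  result += "##{uri.fragment}" if uri.fragment # ...and the fragment
  result
end

unproxify('http://proxy.test:8080/proxy/content?q=1#top', '/proxy/', 'http://example.com')
# => "http://example.com/content?q=1#top"
```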
#valid_path_for_proxy?(absolute_path) ⇒ Boolean
Check if the absolute path begins with the proxy_path and is followed by an element of original_paths.
```ruby
# File 'lib/pretty_proxy.rb', line 387
def valid_path_for_proxy?(absolute_path)
  return false unless absolute_path.start_with?(@proxy_path)

  path_without_proxy_prefix = absolute_path[(@proxy_path.size-1)..-1]

  @original_paths.any? do | original_path |
    # if we don't test this, '/about' and '/about_us' will match
    if original_path.end_with? '/'
      path_without_proxy_prefix.start_with? original_path
    else
      path_without_proxy_prefix == original_path ||
        path_without_proxy_prefix.start_with?("#{original_path}/")
    end
  end
end
```
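A standalone copy of the rule, with the configuration passed as arguments (a hypothetical signature), makes the boundary cases easy to test. The cases below assume proxy_path '/proxy/' and original_paths '/about' and '/blog/'.

```ruby
require 'set'

# Standalone copy of the valid_path_for_proxy? rule (hypothetical signature).
def valid_path_for_proxy?(absolute_path, proxy_path, original_paths)
  return false unless absolute_path.start_with?(proxy_path)
  rest = absolute_path[(proxy_path.size - 1)..-1] # keeps the leading '/'
  original_paths.any? do |original_path|
    if original_path.end_with?('/')
      rest.start_with?(original_path)
    else
      # exact match, or a sub-path; '/about_us' must not match '/about'
      rest == original_path || rest.start_with?("#{original_path}/")
    end
  end
end

paths = Set['/about', '/blog/']
valid_path_for_proxy?('/proxy/about/team', '/proxy/', paths)
```

An original path with a trailing '/' matches only paths below it: '/proxy/blog' is rejected while '/proxy/blog/post' is accepted, and '/proxy/about_us' is rejected because '/about' has no trailing '/'.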