Class: Arachni::Parser
- Includes:
- UI::Output, Utilities
- Defined in:
- lib/arachni/parser.rb
Overview
Analyzer class.
Analyzes HTML code, extracting forms, links and cookies depending on user options.
It grabs all element attributes, not just URLs and variables. All URLs are converted to absolute form and URLs outside the domain are ignored.
Forms
Form analysis uses both regular expressions and the Nokogiri parser in order to be able to handle badly written HTML code, such as unclosed tags and tag overlaps.
In order to ease audits, in addition to parsing forms into data structures like "select" and "option", all auditable inputs are put under the "auditable" key.
Links
Links are extracted using the Nokogiri parser.
Cookies
Cookies are extracted from the HTTP headers and parsed by WEBrick::Cookie.
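The extraction step can be sketched with plain Ruby (a hypothetical stand-in, not the Parser's code, which delegates the real work to WEBrick::Cookie; the header value below is made up for the example):

```ruby
# Minimal sketch of what cookie extraction recovers from a Set-Cookie
# response header: the name/value pair plus any trailing attributes.
def parse_set_cookie( raw )
  pairs = raw.split( ';' ).map( &:strip )
  name, value = pairs.shift.split( '=', 2 )

  attributes = pairs.map do |pair|
    k, v = pair.split( '=', 2 )
    [k.downcase, v || true]   # flag attributes like HttpOnly become `true`
  end.to_h

  { 'name' => name, 'value' => value }.merge( attributes )
end

cookie = parse_set_cookie( 'session=deadbeef; path=/; HttpOnly' )
# => { "name" => "session", "value" => "deadbeef", "path" => "/", "httponly" => true }
```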
Defined Under Namespace
Modules: Extractors
Instance Attribute Summary collapse
-
#opts ⇒ Options
readonly
Options instance.
-
#url ⇒ String
The url of the page.
Instance Method Summary collapse
-
#base ⇒ String
Base href if there is one.
-
#cookies ⇒ Array<Element::Cookie>
Extracts cookies from the HTTP headers and the response body.
- #doc ⇒ Object
-
#forms(html = nil) ⇒ Array<Element::Form>
Extracts forms from HTML document.
-
#headers ⇒ Hash
Returns a list of valid auditable HTTP header fields.
-
#initialize(res, opts = Options) ⇒ Parser
constructor
Instantiates the Analyzer class with user options.
-
#link_vars(url) ⇒ Hash
Extracts variables and their values from a link.
-
#links(html = nil) ⇒ Array<Element::Link>
Extracts links from HTML document.
-
#page ⇒ Page
(also: #run)
Runs the Analyzer and extracts forms, links and cookies.
-
#path_in_domain?(url) ⇒ Bool
True if URL is within domain limits, false if not.
-
#paths ⇒ Array<String>
Array of distinct links to follow.
- #text? ⇒ Boolean
-
#to_absolute(relative_url) ⇒ String
Converts a relative URL to an absolute one.
Methods included from Utilities
#cookie_encode, #cookies_from_document, #cookies_from_file, #cookies_from_response, #exception_jail, #exclude_path?, #extract_domain, #form_decode, #form_encode, #form_parse_request_body, #forms_from_document, #forms_from_response, #get_path, #hash_keys_to_str, #html_decode, #html_encode, #include_path?, #links_from_document, #links_from_response, #normalize_url, #page_from_response, #page_from_url, #parse_query, #parse_set_cookie, #parse_url_vars, #path_too_deep?, #remove_constants, #seed, #skip_path?, #uri_decode, #uri_encode, #uri_parse, #uri_parser, #url_sanitize
Methods included from UI::Output
#debug?, #debug_off, #debug_on, #disable_only_positives, #flush_buffer, #mute, #muted?, old_reset_output_options, #only_positives, #only_positives?, #print_bad, #print_debug, #print_debug_backtrace, #print_debug_pp, #print_error, #print_error_backtrace, #print_info, #print_line, #print_ok, #print_status, #print_verbose, #reroute_to_file, #reroute_to_file?, reset_output_options, #set_buffer_cap, #uncap_buffer, #unmute, #verbose, #verbose?
Constructor Details
#initialize(res, opts = Options) ⇒ Parser
Instantiates the Analyzer class with user options.
# File 'lib/arachni/parser.rb', line 101

def initialize( res, opts = Options )
    @opts = opts

    if res.is_a? Array
        @secondary_responses = res[1..-1]
        @secondary_responses.compact! if @secondary_responses
        res = res.shift
    end

    @code = res.code
    self.url = res.effective_url
    @html = res.body
    @response = res
    @response_headers = res.headers_hash

    @doc   = nil
    @paths = nil
end
Instance Attribute Details
#opts ⇒ Options (readonly)
Options instance
# File 'lib/arachni/parser.rb', line 93

def opts
    @opts
end
#url ⇒ String
Returns the url of the page.
# File 'lib/arachni/parser.rb', line 86

def url
    @url
end
Instance Method Details
#base ⇒ String
Returns base href if there is one.
# File 'lib/arachni/parser.rb', line 366

def base
    @base ||= begin
        doc.search( '//base[@href]' ).first['href']
    rescue
    end
end
#cookies ⇒ Array<Element::Cookie>
Extracts cookies from the HTTP headers and the response body.
# File 'lib/arachni/parser.rb', line 345

def cookies
    ( Cookie.from_document( @url, doc ) |
        Cookie.from_headers( @url, @response_headers ) )
end
#doc ⇒ Object
# File 'lib/arachni/parser.rb', line 250

def doc
    return @doc if @doc
    @doc = Nokogiri::HTML( @html ) if text? rescue nil
end
#forms(html = nil) ⇒ Array<Element::Form>
Extracts forms from the HTML document.
# File 'lib/arachni/parser.rb', line 285

def forms( html = nil )
    return [] if !text? && !html

    f = Form.from_document( @url, html || doc )

    if @secondary_responses
        @secondary_responses.each do |response|
            next if response.body.to_s.empty?

            Form.from_document( @url, response.body ).each do |form2|
                f.each do |form|
                    next if form.auditable.keys.sort != form2.auditable.keys.sort

                    form.auditable.each do |k, v|
                        if v != form2.auditable[k] &&
                            form.field_type_for( k ) == 'hidden'
                            form.nonce_name = k
                        end
                    end
                end
            end
        end
    end

    f
end
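The nonce-detection pass in #forms can be sketched in isolation: given the auditable inputs of the same form taken from two separate responses, a hidden field whose value changed between fetches is treated as an anti-CSRF nonce. (The names below are hypothetical; the real code operates on Element::Form objects.)

```ruby
# Stand-alone sketch of the hidden-field comparison in #forms: a hidden
# input whose value differs between two fetches of the same page is
# almost certainly a one-time token (nonce).
def detect_nonce( inputs_a, inputs_b, hidden_fields )
  # only compare forms with the same set of input names
  return nil if inputs_a.keys.sort != inputs_b.keys.sort

  inputs_a.each do |name, value|
    return name if value != inputs_b[name] && hidden_fields.include?( name )
  end
  nil
end

first  = { 'user' => '', 'token' => 'a1b2' }
second = { 'user' => '', 'token' => 'ff09' }

detect_nonce( first, second, ['token'] ) # => "token"
```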
#headers ⇒ Hash
Returns a list of valid auditable HTTP header fields.
It’s more of a placeholder method; it doesn’t actually analyze anything. It’s a long shot that any of these will be vulnerable, but better safe than sorry.
# File 'lib/arachni/parser.rb', line 264

def headers
    {
        'Accept'          => 'text/html,application/xhtml+xml,application' +
            '/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Accept-Language' => 'en-gb,en;q=0.5',
        'Accept-Encoding' => 'gzip;q=1.0,deflate;q=0.6,identity;q=0.3',
        'From'            => @opts.authed_by || '',
        'User-Agent'      => @opts.user_agent || '',
        'Referer'         => @url,
        'Pragma'          => 'no-cache'
    }.map { |k, v| Header.new( @url, { k => v } ) }
end
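The shape of the result matters here: the flat header hash is mapped into one element per field, so each header can be audited in isolation. A minimal sketch with plain hashes standing in for Element::Header objects:

```ruby
# Sketch of the #headers transformation: one auditable element per
# header field, rather than a single element holding all of them.
defaults = {
  'Referer' => 'http://example.com/',
  'Pragma'  => 'no-cache'
}

header_elements = defaults.map { |k, v| { k => v } }
# => [{ "Referer" => "http://example.com/" }, { "Pragma" => "no-cache" }]
```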
#link_vars(url) ⇒ Hash
Extracts variables and their values from a link.
# File 'lib/arachni/parser.rb', line 336

def link_vars( url )
    Link.parse_query_vars( url )
end
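Query-variable extraction can be sketched with Ruby's stdlib (a stand-in for illustration only; the real code delegates to Link.parse_query_vars):

```ruby
require 'uri'

# Minimal sketch of pulling auditable variables out of a link's
# query string.
def link_vars( url )
  query = URI( url ).query
  return {} if query.nil? || query.empty?
  URI.decode_www_form( query ).to_h
end

link_vars( 'http://example.com/search?q=test&page=2' )
# => { "q" => "test", "page" => "2" }
```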
#links(html = nil) ⇒ Array<Element::Link>
Extracts links from the HTML document.
# File 'lib/arachni/parser.rb', line 317

def links( html = nil )
    return [] if !text? && !html

    if !(vars = link_vars( @url )).empty? || @response.redirection?
        [Link.new( @url, vars )]
    else
        []
    end | Link.from_document( @url, html || doc )
end
#page ⇒ Page Also known as: run
Runs the Analyzer and extracts forms, links and cookies.
# File 'lib/arachni/parser.rb', line 153

def page
    req_method = 'get'
    begin
        req_method = @response.request.method.to_s
    rescue
    end

    self_link = Link.new( @url, inputs: link_vars( @url ) )

    # non text files won't contain any auditable elements
    if !text?
        return Page.new(
            code:             @code,
            url:              @url,
            method:           req_method,
            query_vars:       self_link.auditable,
            body:             @html,
            request_headers:  @response.request.headers,
            response_headers: @response_headers,
            text:             false
        )
    end

    # extract cookies from the response
    response_cookies = cookies

    # make a list of the response cookie names
    cookie_names = response_cookies.map { |c| c.name }

    from_jar = []

    # if there's a Netscape cookiejar file load cookies from it but only new ones,
    # i.e. only if they weren't in the response
    if @opts.cookie_jar
        from_jar |= cookies_from_file( @url, @opts.cookie_jar )
            .reject { |c| cookie_names.include?( c.name ) }
    end

    # if we somehow have any runtime configuration cookies load them too
    # but only if they haven't already been seen
    if @opts.cookies && !@opts.cookies.empty?
        from_jar |= @opts.cookies.reject { |c| cookie_names.include?( c.name ) }
    end

    # grab cookies from the HTTP cookiejar and filter out old ones, as usual
    from_http_jar = HTTP.instance.cookie_jar.cookies.reject do |c|
        cookie_names.include?( c.name )
    end

    # these cookies are to be audited and thus are dirty and anarchistic
    # so they have to contain even cookies completely irrelevant to the
    # current page, i.e. it contains all cookies that have been observed
    # from the beginning of the scan
    cookies_to_audit = ( response_cookies | from_jar | from_http_jar ).map do |c|
        dc = c.dup
        dc.action = @url
        dc
    end

    Page.new(
        code:             @code,
        url:              @url,
        query_vars:       self_link.auditable,
        method:           req_method,
        body:             @html,
        request_headers:  @response.request.headers,
        response_headers: @response_headers,
        document:         doc,

        # all paths seen in the page
        paths:            paths,
        forms:            forms,

        # all href attributes from 'a' elements
        links:            links | [self_link],

        cookies:          cookies_to_audit,
        headers:          headers,

        # this is the page cookiejar, each time the page is to be audited
        # by a module the cookiejar of the HTTP class will be updated
        # with the cookies specified here
        cookiejar:        response_cookies | from_jar,

        text:             true
    )
end
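The cookie merging in #page follows a precedence rule worth spelling out: cookies set by the response win, and jar or runtime cookies are only added when the response did not set a cookie with the same name. A minimal sketch with plain hashes standing in for Element::Cookie objects:

```ruby
# Sketch of the reject-by-name-then-union merge used when building the
# page cookiejar: response cookies take precedence over jar cookies.
response_cookies = [{ name: 'session', value: 'fresh' }]
jar_cookies      = [{ name: 'session', value: 'stale' },
                    { name: 'tracker', value: 'xyz' }]

seen   = response_cookies.map { |c| c[:name] }
merged = response_cookies |
         jar_cookies.reject { |c| seen.include?( c[:name] ) }
# => "session" keeps the response value "fresh"; only "tracker" is appended
```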
#path_in_domain?(url) ⇒ Bool
Returns true if URL is within domain limits, false if not.
# File 'lib/arachni/parser.rb', line 144

def path_in_domain?( url )
    super( url, @url )
end
#paths ⇒ Array<String>
Array of distinct links to follow.
# File 'lib/arachni/parser.rb', line 355

def paths
    return @paths unless @paths.nil?

    @paths = []
    return @paths if !doc

    @paths = run_extractors
end
#text? ⇒ Boolean
# File 'lib/arachni/parser.rb', line 244

def text?
    type = @response.content_type
    return false if !type
    type.to_s.substring?( 'text' )
end
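The check gates all parsing: only responses whose Content-Type mentions "text" are handed to Nokogiri. A stdlib sketch (String#substring? is an Arachni helper; String#include? behaves the same way here, and the method name below is hypothetical):

```ruby
# Sketch of the #text? gate: parse only when the Content-Type
# advertises a textual body.
def text_response?( content_type )
  return false if !content_type
  content_type.to_s.include?( 'text' )
end

text_response?( 'text/html; charset=utf-8' )  # => true
text_response?( 'application/pdf' )           # => false
text_response?( nil )                         # => false
```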
#to_absolute(relative_url) ⇒ String
Converts a relative URL to an absolute one.
# File 'lib/arachni/parser.rb', line 130

def to_absolute( relative_url )
    if url = base
        base_url = url
    else
        base_url = @url
    end
    super( relative_url, base_url )
end
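The resolution rule is: use the page's base href when one is declared, otherwise the page's own URL. A stdlib sketch of that behavior (the signature here is made up for illustration; the real method takes only the relative URL and reads the rest from parser state):

```ruby
require 'uri'

# Sketch of #to_absolute: resolve against <base href> when present,
# otherwise against the page URL.
def to_absolute( relative_url, page_url, base_href = nil )
  URI.join( base_href || page_url, relative_url ).to_s
end

to_absolute( 'about.html', 'http://example.com/deep/page.html' )
# => "http://example.com/deep/about.html"
to_absolute( 'about.html', 'http://example.com/deep/page.html',
             'http://example.com/' )
# => "http://example.com/about.html"
```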