Class: Arachni::Parser

Inherits:

Object


Includes: UI::Output, Utilities

Defined in: lib/arachni/parser.rb

Overview

Analyzer class

Analyzes HTML code, extracting forms, links and cookies depending on user options.

It grabs all element attributes, not just URLs and variables. All URLs are converted to absolute ones, and URLs outside the domain are ignored.
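
As a rough illustration of the URL handling described above, the sketch below resolves hrefs against the page URL and drops those that point at another host. `absolute_in_domain` is a hypothetical helper, not part of Arachni, and the exact-host comparison is an assumption; the real domain check may be more permissive (for example with subdomains).

```ruby
require 'uri'

# Hypothetical helper (not Arachni API): resolves +href+ against +page_url+
# and returns the absolute URL, or nil if it is invalid or on another host.
def absolute_in_domain( page_url, href )
    absolute = URI.join( page_url, href )

    # Assumption: "in domain" means an exact host match.
    return nil if absolute.host != URI( page_url ).host

    absolute.to_s
rescue URI::Error
    nil
end

page  = 'http://example.com/dir/index.html'
hrefs = ['about.html', '/contact?x=1', 'http://other.test/']

# Keep only the links that resolved and stayed on the page's host.
kept = hrefs.map { |href| absolute_in_domain( page, href ) }.compact
```

Here `kept` holds the two same-host URLs in absolute form, while the link to `other.test` is discarded.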

Forms

Form analysis uses both regular expressions and the Nokogiri parser so that it can handle badly written HTML, such as unclosed or overlapping tags.

In order to ease audits, in addition to parsing forms into data structures like “select” and “option”, all auditable inputs are put under the “auditable” key.

Links

Links are extracted using the Nokogiri parser.

Cookies

Cookies are extracted from the HTTP headers and parsed by WEBrick::Cookie.

Author:

Tasos “Zapotek” Laskos <[email protected]>

Defined Under Namespace

Modules: Extractors

Instance Attribute Summary

#opts ⇒ Options readonly

Options instance.
#url ⇒ String

The URL of the page.

Instance Method Summary

#base ⇒ String

Base href if there is one.
#cookies ⇒ Array<Element::Cookie>

Extracts cookies from the HTTP headers and the response body.
#doc ⇒ Object
#forms(html = nil) ⇒ Array<Element::Form>

Extracts forms from HTML document.
#headers ⇒ Hash

Returns a list of valid auditable HTTP header fields.
#initialize(res, opts = Options) ⇒ Parser constructor

Instantiates Analyzer class with user options.
#link_vars(url) ⇒ Hash

Extracts variables and their values from a link.
#links(html = nil) ⇒ Array<Element::Link>

Extracts links from HTML document.
#page ⇒ Page (also: #run)

Runs the Analyzer and extracts forms, links and cookies.
#path_in_domain?(url) ⇒ Bool

True if URL is within domain limits, false if not.
#paths ⇒ Array<String>

Array of distinct links to follow.
#text? ⇒ Boolean
#to_absolute(relative_url) ⇒ String

Converts a relative URL to an absolute one.

Methods included from Utilities

#cookie_encode, #cookies_from_document, #cookies_from_file, #cookies_from_response, #exception_jail, #exclude_path?, #extract_domain, #form_decode, #form_encode, #form_parse_request_body, #forms_from_document, #forms_from_response, #get_path, #hash_keys_to_str, #html_decode, #html_encode, #include_path?, #links_from_document, #links_from_response, #normalize_url, #page_from_response, #page_from_url, #parse_query, #parse_set_cookie, #parse_url_vars, #path_too_deep?, #remove_constants, #seed, #skip_path?, #uri_decode, #uri_encode, #uri_parse, #uri_parser, #url_sanitize

Methods included from UI::Output

#debug?, #debug_off, #debug_on, #disable_only_positives, #flush_buffer, #mute, #muted?, old_reset_output_options, #only_positives, #only_positives?, #print_bad, #print_debug, #print_debug_backtrace, #print_debug_pp, #print_error, #print_error_backtrace, #print_info, #print_line, #print_ok, #print_status, #print_verbose, #reroute_to_file, #reroute_to_file?, reset_output_options, #set_buffer_cap, #uncap_buffer, #unmute, #verbose, #verbose?

Constructor Details

#initialize(res, opts = Options) ⇒ `Parser`

Instantiates Analyzer class with user options.

Parameters:

res (Typhoeus::Response, Array<Typhoeus::Response>)
opts (Options) (defaults to: Options)

# File 'lib/arachni/parser.rb', line 101

def initialize( res, opts = Options )
    @opts = opts

    if res.is_a? Array
        @secondary_responses = res[1..-1]
        @secondary_responses.compact! if @secondary_responses
        res = res.shift
    end

    @code     = res.code
    self.url  = res.effective_url
    @html     = res.body
    @response = res

    @response_headers = res.headers_hash

    @doc   = nil
    @paths = nil
end
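
The constructor accepts either a single response or an array whose first element is the primary response and whose remaining elements are secondary ones. A minimal sketch of that split follows; `split_responses` is a hypothetical name, and unlike the `res.shift` call above it leaves the caller's array untouched.

```ruby
# Hypothetical helper (not Arachni API): returns [primary, secondary],
# where secondary is nil when a single response was given.
def split_responses( res )
    return [res, nil] unless res.is_a?( Array )

    primary, *rest = res

    # Drop nil entries, mirroring the compact! call in the constructor.
    [primary, rest.compact]
end

# Symbols stand in for response objects in this sketch.
primary, secondary = split_responses( [:first, nil, :second] )
```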

Instance Attribute Details

#opts ⇒ `Options` (readonly)

Options instance

Returns:

(Options)



# File 'lib/arachni/parser.rb', line 93

def opts
  @opts
end

#url ⇒ `String`

Returns the URL of the page.

Returns:

(String) —

the URL of the page



# File 'lib/arachni/parser.rb', line 86

def url
  @url
end

Instance Method Details

#base ⇒ `String`

Returns base href if there is one.

Returns:

(String) —

base href if there is one

# File 'lib/arachni/parser.rb', line 366

def base
    @base ||= begin
        doc.search( '//base[@href]' ).first['href']
    rescue
    end
end

#cookies ⇒ `Array<Element::Cookie>`

Extracts cookies from the HTTP headers and the response body

Returns:

(Array<Element::Cookie>)

# File 'lib/arachni/parser.rb', line 345

def cookies
    ( Cookie.from_document( @url, doc ) |
      Cookie.from_headers( @url, @response_headers ) )
end

#doc ⇒ `Object`

# File 'lib/arachni/parser.rb', line 250

def doc
    return @doc if @doc
    @doc = Nokogiri::HTML( @html ) if text? rescue nil
end

#forms(html = nil) ⇒ `Array<Element::Form>`

Extracts forms from HTML document

Parameters:

html (String) (defaults to: nil)

Returns:

(Array<Element::Form>) —

array of forms

# File 'lib/arachni/parser.rb', line 285

def forms( html = nil )
    return [] if !text? && !html

    f = Form.from_document( @url, html || doc )

    if @secondary_responses
        @secondary_responses.each do |response|
            next if response.body.to_s.empty?

            Form.from_document( @url, response.body ).each do |form2|
                f.each do |form|
                    next if form.auditable.keys.sort != form2.auditable.keys.sort
                    form.auditable.each do |k, v|
                        if v != form2.auditable[k] && form.field_type_for( k ) == 'hidden'
                            form.nonce_name = k
                        end
                    end
                end
            end
        end
    end

    f
end
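
The loop above compares each form with its counterpart in a secondary response: when two forms share the same field names and a hidden field's value differs between them, that field is treated as a nonce. The sketch below reproduces that comparison with plain data; `SketchForm` and `mark_nonce` are hypothetical stand-ins, not `Element::Form` or its API.

```ruby
# Hypothetical stand-in for a parsed form: field values, field types and
# the name of the detected nonce field (nil until one is found).
SketchForm = Struct.new( :inputs, :types, :nonce_name )

# Marks a hidden field of +form+ as a nonce if its value differs in +other+.
def mark_nonce( form, other )
    # Only compare forms that have exactly the same field names.
    return form if form.inputs.keys.sort != other.inputs.keys.sort

    form.inputs.each do |name, value|
        if value != other.inputs[name] && form.types[name] == 'hidden'
            form.nonce_name = name
        end
    end

    form
end

types  = { 'user' => 'text', 'token' => 'hidden' }
first  = SketchForm.new( { 'user' => '', 'token' => 'a1b2' }, types )
second = SketchForm.new( { 'user' => '', 'token' => 'c3d4' }, types )

mark_nonce( first, second )
```

After the call, `first.nonce_name` is `'token'`, because that hidden field changed between the two responses while `user` did not.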

#headers ⇒ `Hash`

Returns a list of valid auditable HTTP header fields.

It’s more of a placeholder method; it doesn’t actually analyze anything. It’s a long shot that any of these will be vulnerable, but better safe than sorry.

Returns:

(Hash) —

HTTP header fields

# File 'lib/arachni/parser.rb', line 264

def headers
    {
        'Accept'          => 'text/html,application/xhtml+xml,application' +
            '/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Accept-Language' => 'en-gb,en;q=0.5',
        'Accept-Encoding' => 'gzip;q=1.0,deflate;q=0.6,identity;q=0.3',
        'From'       => @opts.authed_by  || '',
        'User-Agent' => @opts.user_agent || '',
        'Referer'    => @url,
        'Pragma'     => 'no-cache'
    }.map { |k, v| Header.new( @url, { k => v } ) }
end

#link_vars(url) ⇒ `Hash`

Extracts variables and their values from a link

Parameters:

url (String)

Returns:

(Hash) —

name=>value pairs
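
No source listing is shown for this method, so the following is only a sketch of what extracting name=>value pairs from a link's query string can look like with Ruby's standard `uri` library. `sketch_link_vars` is a hypothetical name, and the real implementation may differ.

```ruby
require 'uri'

# Hypothetical helper (not Arachni API): returns the query parameters of
# +url+ as a Hash, or an empty Hash if there are none or the URL is invalid.
def sketch_link_vars( url )
    query = URI( url ).query
    return {} if query.nil? || query.empty?

    # decode_www_form yields [name, value] pairs with percent-decoding applied.
    URI.decode_www_form( query ).to_h
rescue URI::Error, ArgumentError
    {}
end

vars = sketch_link_vars( 'http://example.com/search?q=ruby&page=2' )
```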

#links(html = nil) ⇒ `Array<Element::Link>`

Extracts links from HTML document

Parameters:

html (String) (defaults to: nil)

Returns:

(Array<Element::Link>) —

array of links

# File 'lib/arachni/parser.rb', line 317

def links( html = nil )
    return [] if !text? && !html

    if !(vars = link_vars( @url )).empty? || @response.redirection?
        [Link.new( @url, vars )]
    else
        []
    end | Link.from_document( @url, html || doc )
end

#page ⇒ `Page` Also known as: run

Runs the Analyzer and extracts forms, links and cookies

Returns:

(Page)

# File 'lib/arachni/parser.rb', line 153

def page
    req_method = 'get'
    begin
        req_method = @response.request.method.to_s
    rescue
    end

    self_link = Link.new( @url, inputs: link_vars( @url ) )

    # non text files won't contain any auditable elements
    if !text?
        return Page.new(
            code:             @code,
            url:              @url,
            method:           req_method,
            query_vars:       self_link.auditable,
            body:             @html,
            request_headers:  @response.request.headers,
            response_headers: @response_headers,
            text:             false
        )
    end

    # extract cookies from the response
    c_cookies = cookies

    # make a list of the response cookie names
    cookie_names = c_cookies.map { |c| c.name }

    from_jar = []

    # if there's a Netscape cookiejar file load cookies from it but only new ones,
    # i.e. only if they weren't in the response
    if @opts.cookie_jar
        from_jar |= cookies_from_file( @url, @opts.cookie_jar )
            .reject { |c| cookie_names.include?( c.name ) }
    end

    # if we somehow have any runtime configuration cookies load them too
    # but only if they haven't already been seen
    if @opts.cookies && !@opts.cookies.empty?
        from_jar |= @opts.cookies.reject { |c| cookie_names.include?( c.name ) }
    end

    # grab cookies from the HTTP cookiejar and filter out old ones, as usual
    from_http_jar = HTTP.instance.cookie_jar.cookies.reject do |c|
        cookie_names.include?( c.name )
    end

    # these cookies are to be audited and thus are dirty and anarchistic
    # so they have to contain even cookies completely irrelevant to the
    # current page, i.e. it contains all cookies that have been observed
    # from the beginning of the scan
    cookies_to_be_audited = (c_cookies | from_jar | from_http_jar).map do |c|
        dc = c.dup
        dc.action = @url
        dc
    end

    Page.new(
        code:             @code,
        url:              @url,
        query_vars:       self_link.auditable,
        method:           req_method,
        body:             @html,

        request_headers:  @response.request.headers,
        response_headers: @response_headers,

        document:         doc,

        # all paths seen in the page
        paths:            paths,
        forms:            forms,

        # all href attributes from 'a' elements
        links:            links | [self_link],

        cookies:          cookies_to_be_audited,
        headers:          headers,

        # this is the page cookiejar, each time the page is to be audited
        # by a module the cookiejar of the HTTP class will be updated
        # with the cookies specified here
        cookiejar:        c_cookies | from_jar,

        text:             true
    )
end
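
The cookie handling above follows one rule throughout: cookies from the response win, and cookies from other sources (cookie-jar file, runtime options, HTTP cookie jar) are added only if their names were not seen in the response. A minimal sketch of that merge, with the hypothetical names `SketchCookie` and `merge_new_cookies` standing in for the real classes:

```ruby
# Hypothetical stand-in for Element::Cookie.
SketchCookie = Struct.new( :name, :value )

# Returns the response cookies plus any cookies from +other_sources+
# whose names do not already appear in the response.
def merge_new_cookies( response_cookies, *other_sources )
    seen = response_cookies.map( &:name )

    extras = other_sources.flatten.reject { |c| seen.include?( c.name ) }

    # Array#| also removes exact duplicates across the extra sources.
    response_cookies | extras
end

from_response = [SketchCookie.new( 'session', 'abc' )]
from_file     = [SketchCookie.new( 'session', 'old' ), SketchCookie.new( 'lang', 'en' )]
from_http_jar = [SketchCookie.new( 'lang', 'en' ), SketchCookie.new( 'theme', 'dark' )]

merged = merge_new_cookies( from_response, from_file, from_http_jar )
```

The stale `session` cookie from the file is dropped in favor of the one in the response, and the duplicate `lang` cookie appears only once.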

#path_in_domain?(url) ⇒ `Bool`

Returns true if URL is within domain limits, false if not.

Parameters:

url (String) —

to check

Returns:

(Bool) —

true if URL is within domain limits, false if not



# File 'lib/arachni/parser.rb', line 144

def path_in_domain?( url )
    super( url, @url )
end

#paths ⇒ `Array<String>`

Array of distinct links to follow

Returns:

(Array<String>)

# File 'lib/arachni/parser.rb', line 355

def paths
  return @paths unless @paths.nil?
  @paths = []
  return @paths if !doc

  @paths = run_extractors
end

#text? ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/arachni/parser.rb', line 244

def text?
    type = @response.content_type
    return false if !type
    type.to_s.substring?( 'text' )
end
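
In other words, a response counts as text whenever its content type contains the string `text`. The sketch below shows the same check in plain Ruby, assuming that `substring?` behaves like `String#include?`; `sketch_text?` is a hypothetical name.

```ruby
# Hypothetical helper (not Arachni API): true if the content type mentions
# 'text', false for nil or for types such as application/json.
def sketch_text?( content_type )
    return false if content_type.nil?

    content_type.to_s.include?( 'text' )
end

html_is_text = sketch_text?( 'text/html; charset=utf-8' )
json_is_text = sketch_text?( 'application/json' )
```

Note that under this rule types such as `application/json` are not treated as text, so pages served with them yield no forms or links.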

#to_absolute(relative_url) ⇒ `String`

Converts a relative URL to an absolute one.

Returns:

(String) —

absolute URL

# File 'lib/arachni/parser.rb', line 130

def to_absolute( relative_url )
    if url = base
        base_url = url
    else
        base_url = @url
    end
    super( relative_url, base_url )
end
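
The method above prefers the document's base href over the page URL when resolving relative URLs. A minimal sketch of that precedence using the standard `uri` library follows; `sketch_to_absolute` is a hypothetical name, and the inherited `to_absolute` may normalize URLs differently.

```ruby
require 'uri'

# Hypothetical helper (not Arachni API): resolves +relative_url+ against
# +base_href+ when one is given, otherwise against +page_url+.
def sketch_to_absolute( relative_url, page_url, base_href = nil )
    URI.join( base_href || page_url, relative_url ).to_s
end

page = 'http://example.com/app/page.html'

without_base = sketch_to_absolute( 'img/a.png', page )
with_base    = sketch_to_absolute( 'img/a.png', page, 'http://example.com/static/' )
```

With no base href the image resolves under `/app/`, while the base href redirects it to `/static/`.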
