Class: Arachni::Parser

Inherits: Object

Object
Arachni::Parser

Includes: Module::Utilities

Defined in: lib/parser/parser.rb, lib/parser/page.rb, lib/parser/elements.rb

Overview

Analyzer class

Analyzes HTML code, extracting forms, links and cookies depending on the user options.

It grabs all element attributes, not just URLs and variables. All URLs are converted to absolute URLs, and URLs outside the domain are ignored.

Forms

Form analysis uses both regular expressions and the Nokogiri parser in order to handle badly written HTML, such as unclosed and overlapping tags.

To ease audits, in addition to parsing forms into data structures like “select” and “option”, all auditable inputs are put under the “auditable” key.

Links

Links are extracted using the Nokogiri parser.

Cookies

Cookies are extracted from the HTTP response headers, where they are parsed by WEBrick::Cookie, and from <meta http-equiv="Set-Cookie"> tags in the document.

@author: Tasos “Zapotek” Laskos

<[email protected]>
<[email protected]>

@version: 0.2

Defined Under Namespace

Modules: Element

Classes: Page

Instance Attribute Summary

#opts ⇒ Options readonly

Options instance.
#url ⇒ String

The url of the page.

Instance Method Summary

#cookies ⇒ Array<Element::Cookie>

Extracts cookies from the HTTP headers.
#doc ⇒ Object
#exclude?(url) ⇒ Boolean
#extract_domain(url) ⇒ String

Extracts the domain from a URI object.
#forms(html = nil) ⇒ Array<Element::Form>

TODO: Add support for radio buttons.
#headers ⇒ Hash

Returns a list of valid auditable HTTP header fields.
#in_domain?(uri) ⇒ Boolean

Returns true if uri is in the same domain as the page, false otherwise.
#include?(url) ⇒ Boolean
#initialize(opts, res) ⇒ Parser constructor

Constructor. Instantiates the Analyzer class with user options.
#link_vars(link) ⇒ Hash

Extracts variables and their values from a link.
#links ⇒ Array<Element::Link>

Extracts links from HTML document.
#merge_with_cookiejar(cookies) ⇒ Array<Element::Cookie>

Merges ‘cookies’ with the cookiejar and returns it as an array.
#merge_with_cookiestore(cookies) ⇒ Object
#run ⇒ Page

Runs the Analyzer and extracts forms, links and cookies.
#to_absolute(link) ⇒ String

Converts relative URL link into an absolute URL based on the location of the page.

Methods included from Module::Utilities

#exception_jail, #get_path, #normalize_url, #read_file, #seed

Constructor Details

#initialize(opts, res) ⇒ `Parser`

Constructor. Instantiates the Analyzer class with user options.

Parameters:

opts (Options)
res: the HTTP response, from which effective_url, body and headers_hash are read

# File 'lib/parser/parser.rb', line 68

def initialize( opts, res )
    @opts = opts

    @url  = res.effective_url
    @html = res.body
    @response_headers = res.headers_hash
end

Instance Attribute Details

#opts ⇒ `Options` (readonly)

Options instance

Returns:

(Options)




# File 'lib/parser/parser.rb', line 60

def opts
  @opts
end

#url ⇒ `String`

Returns the url of the page.

Returns:

(String) —

the url of the page




# File 'lib/parser/parser.rb', line 53

def url
  @url
end

Instance Method Details

#cookies ⇒ `Array<Element::Cookie>`

Extracts cookies from the HTTP headers.

Parameters:

headers (String) —

HTTP headers
html (String) —

the HTML code of the page

Returns:

(Array<Element::Cookie>) —

of cookies

# File 'lib/parser/parser.rb', line 343

def cookies

    cookies_arr = []
    cookies     = []

    begin
        doc.search( "//meta[@http-equiv]" ).each {
            |elem|

            next if elem['http-equiv'].downcase != 'set-cookie'
            k, v = elem['content'].split( ';' )[0].split( '=', 2 )
            cookies_arr << Element::Cookie.new( @url, { 'name' => k, 'value' => v } )
        }
    rescue
    end

    # don't ask me why....
    if @response_headers.to_s.substring?( 'set-cookie' )
        begin
            cookies << WEBrick::Cookie.parse_set_cookies( @response_headers['Set-Cookie'].to_s )
            cookies << WEBrick::Cookie.parse_set_cookies( @response_headers['set-cookie'].to_s )
        rescue
            return cookies_arr
        end
    end

    cookies.flatten.uniq.each_with_index {
        |cookie, i|
        cookies_arr[i] = Hash.new

        cookie.instance_variables.each {
            |var|
            value = cookie.instance_variable_get( var ).to_s
            value.strip!

            key = normalize_name( var )
            val = value.gsub( /[\"\\\[\]]/, '' )

            next if val == seed
            cookies_arr[i][key] = val
        }

        # cookies.reject!{ |cookie| cookie['name'] == cookies_arr[i]['name'] }

        cookies_arr[i] = Element::Cookie.new( @url, cookies_arr[i] )
    }
    cookies_arr.flatten!
    return cookies_arr
end
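The handling of meta tags in the listing above relies on two chained splits. The following is a minimal sketch of that step in isolation; `parse_meta_cookie` is a hypothetical helper name, and the `strip` calls and the nil guard are additions that the original code does not perform.

```ruby
# Hypothetical helper (not part of Arachni): splits the content of a
# <meta http-equiv="Set-Cookie"> tag into a name/value pair, following
# the same split logic as the #cookies listing.
def parse_meta_cookie(content)
  # Keep only the first "name=value" segment; attributes such as
  # "path=/" come after the first semicolon and are discarded.
  pair = content.to_s.split(';').first
  return nil if pair.nil? || !pair.include?('=')

  # Limit the split to 2 parts so '=' characters inside the value survive.
  name, value = pair.split('=', 2)

  # Stripping whitespace is an addition made for this sketch.
  { 'name' => name.strip, 'value' => value.strip }
end

cookie = parse_meta_cookie('session=abc=123; path=/; HttpOnly')
puts "#{cookie['name']} -> #{cookie['value']}"
```

Limiting the second split to two parts is what keeps values that contain `=` (for example Base64 padding) intact.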

#doc ⇒ `Object`

# File 'lib/parser/parser.rb', line 125

def doc
  return @doc if @doc
  @doc = Nokogiri::HTML( @html ) if @html rescue nil
end

#exclude?(url) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/parser/parser.rb', line 489

def exclude?( url )
    @opts.exclude.each {
        |pattern|
        return true if url.to_s =~ pattern
    }

    return false
end

#extract_domain(url) ⇒ `String`

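Extracts the domain (the last two labels of the host name) from a URI object. As implemented, it returns false when the URI has no host and true when the host consists of a single label.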

Parameters:

url (URI)

Returns:

(String)

# File 'lib/parser/parser.rb', line 478

def extract_domain( url )

    if !url.host then return false end

    splits = url.host.split( /\./ )

    if splits.length == 1 then return true end

    splits[-2] + "." + splits[-1]
end
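To make the last-two-labels heuristic concrete, here is a small standalone sketch. `naive_domain` is a hypothetical name, and unlike the listing above it returns nil for a missing host and the host itself for single-label hosts, which is an assumption made here for readability rather than the documented behaviour.

```ruby
require 'uri'

# Hypothetical standalone take on the heuristic used by #extract_domain:
# keep only the last two dot-separated labels of the host.
def naive_domain(uri)
  return nil unless uri.host          # relative URIs carry no host

  labels = uri.host.split('.')
  return uri.host if labels.length == 1  # e.g. "localhost"

  labels.last(2).join('.')
end

puts naive_domain(URI('http://www.example.com/index.php'))
puts naive_domain(URI('http://localhost/'))
# Multi-part public suffixes are a known weak spot of this heuristic:
# the call below yields the suffix, not the registrable domain.
puts naive_domain(URI('http://shop.example.co.uk/'))
```

The last call illustrates why the heuristic can treat unrelated sites under the same two-label suffix as one domain.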

#forms(html = nil) ⇒ `Array<Element::Form>`

TODO: Add support for radio buttons.

Extracts forms from HTML document

Parameters:

html (String) (defaults to: nil)

Returns:

(Array<Element::Form>) —

array of forms

#headers ⇒ `Hash`

Returns a list of valid auditable HTTP header fields.

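This is more of a placeholder method; it doesn’t actually analyze anything. It’s a long shot that any of these will be vulnerable, but better safe than sorry. Although the return type is documented as Hash, the implementation collects the fields into an array of Element::Header objects.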

Returns:

(Hash) —

HTTP header fields

# File 'lib/parser/parser.rb', line 193

def headers( )
    headers_arr  = []
    {
        'accept'          => 'text/html,application/xhtml+xml,application' +
            '/xml;q=0.9,*/*;q=0.8',
        'accept-charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'accept-language' => 'en-gb,en;q=0.5',
        'accept-encoding' => 'gzip;q=1.0,deflate;q=0.6,identity;q=0.3',
        'from'       => @opts.authed_by || '',
        'user-agent' => @opts.user_agent || '',
        'referer'    => @url,
        'pragma'     => 'no-cache'
    }.each {
        |k,v|
        headers_arr << Element::Header.new( @url, { k => v } )
    }

    return headers_arr
end

#in_domain?(uri) ⇒ `Boolean`

Returns true if uri is in the same domain as the page, false otherwise.

Returns:

(Boolean)

# File 'lib/parser/parser.rb', line 461

def in_domain?( uri )
    curi = URI.parse( normalize_url( uri.to_s ) )

    if( @opts.follow_subdomains )
        return extract_domain( curi ) ==  extract_domain( URI( @url.to_s ) )
    end

    return curi.host == URI.parse( normalize_url( @url.to_s ) ).host
end
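The two comparison modes above can be sketched without the rest of the class. `same_scope?` and its `follow_subdomains:` keyword are hypothetical stand-ins for the method and for `@opts.follow_subdomains`; the sketch also skips the `normalize_url` step, so it assumes already well-formed URLs.

```ruby
require 'uri'

# Hypothetical sketch of the scope check: strict host equality by default,
# last-two-labels comparison when subdomains are followed.
def same_scope?(candidate, page_url, follow_subdomains: false)
  a = URI.parse(candidate).host
  b = URI.parse(page_url).host
  return false if a.nil? || b.nil?   # relative or host-less URLs

  return a == b unless follow_subdomains

  a.split('.').last(2) == b.split('.').last(2)
end

page = 'http://www.example.com/'
puts same_scope?('http://blog.example.com/post', page)
puts same_scope?('http://blog.example.com/post', page, follow_subdomains: true)
```

With the flag off, `blog.example.com` and `www.example.com` count as different hosts; with it on, both reduce to `example.com`.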

#include?(url) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/parser/parser.rb', line 498

def include?( url )
    return true if @opts.include.empty?

    @opts.include.each {
        |pattern|
        return true if url.to_s =~ pattern
    }
    return false
end
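Both `exclude?` and `include?` walk a list of regular expressions. The sketch below shows one way the two filters might be combined; `ScopeOptions` and `in_scope?` are hypothetical names, and how the crawler actually combines the two checks is not shown on this page, so the combination rule is an assumption.

```ruby
# Hypothetical stand-in for the relevant part of the Options instance.
ScopeOptions = Struct.new(:include, :exclude)

# Assumed combination rule: a URL is in scope when it matches at least
# one include pattern (or no include patterns are set) and matches no
# exclude pattern.
def in_scope?(url, opts)
  included = opts.include.empty? || opts.include.any? { |re| url.to_s.match?(re) }
  excluded = opts.exclude.any? { |re| url.to_s.match?(re) }
  included && !excluded
end

opts = ScopeOptions.new([/\/app\//], [/logout/])
puts in_scope?('http://example.com/app/profile', opts)
puts in_scope?('http://example.com/app/logout', opts)
```

`Enumerable#any?` gives the same early-exit behaviour as the explicit `each` with `return true` in the listings above.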

#link_vars(link) ⇒ `Hash`

Extracts variables and their values from a link

Parameters:

link (String)

Returns:

(Hash) —

name=>value pairs
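The source of `link_vars` is not reproduced on this page, so the following is only a sketch of what extracting name=>value pairs from a link can look like with the standard library. `query_vars` is a hypothetical function, and the choice to return an empty Hash for unparsable links is an assumption.

```ruby
require 'uri'

# Hypothetical sketch: returns the query-string variables of a link
# as a Hash of name => value pairs.
def query_vars(link)
  query = URI.parse(link.to_s).query
  return {} if query.nil? || query.empty?

  # decode_www_form yields [name, value] pairs with percent-decoding
  # applied; to_h keeps the last value when a name repeats.
  URI.decode_www_form(query).to_h
rescue URI::InvalidURIError
  {}   # assumption: malformed links yield no variables
end

vars = query_vars('http://example.com/search?id=1&name=foo%20bar')
vars.each { |name, value| puts "#{name} = #{value}" }
```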

#links ⇒ `Array<Element::Link>`

Extracts links from HTML document

Parameters:

html (String)

Returns:

(Array<Element::Link>) —

of links

#merge_with_cookiejar(cookies) ⇒ `Array<Element::Cookie>`

Merges ‘cookies’ with the cookiejar and returns it as an array

Parameters:

cookies (Array<Hash>)

Returns:

(Array<Element::Cookie>) —

the merged cookies

# File 'lib/parser/parser.rb', line 168

def merge_with_cookiejar( cookies )
    return cookies if !@opts.cookies

    @opts.cookies.each_pair {
        |name, value|
        cookies << Element::Cookie.new( @url,
            {
                'name'    => name,
                'value'   => value
            } )
    }

    return cookies
end
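One detail worth noting in the listing above is that `<<` appends to the array that was passed in, so the caller's array grows as well. The sketch below shows a non-mutating variant for comparison; `JarCookie` and `merge_with_jar` are hypothetical names, with the Struct standing in for Element::Cookie and the `jar` argument standing in for `@opts.cookies`.

```ruby
# Hypothetical stand-in for Element::Cookie.
JarCookie = Struct.new(:name, :value)

# Non-mutating variant: returns a new array and leaves the input alone.
def merge_with_jar(cookies, jar)
  return cookies if jar.nil?

  cookies + jar.map { |name, value| JarCookie.new(name, value) }
end

page_cookies = [JarCookie.new('sid', 'abc')]
merged = merge_with_jar(page_cookies, { 'lang' => 'en' })

puts merged.map(&:name).join(', ')
puts page_cookies.length   # the input array still holds a single cookie
```

Returning a new array means repeated calls do not keep appending the same jar cookies to a shared array.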

#merge_with_cookiestore(cookies) ⇒ `Object`

# File 'lib/parser/parser.rb', line 130

def merge_with_cookiestore( cookies )

    @cookiestore ||= []

    if @cookiestore.empty?
        @cookiestore = cookies
    else
        tmp = {}
        @cookiestore.each {
            |cookie|
            tmp.merge!( cookie.simple )
        }

        cookies.each {
            |cookie|
            tmp.merge!( cookie.simple )
        }

        @cookiestore = tmp.map {
            |name, value|
            Element::Cookie.new( @url, {
                'name'    => name,
                'value'   => value
            } )
        }
    end

    return @cookiestore

end
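The store merge above deduplicates cookies by name, with later values overriding earlier ones. A minimal sketch of that idea follows; `StoredCookie` and `merge_by_name` are hypothetical, and the shape of `simple` (a one-entry name => value Hash) is inferred from how the listing uses it rather than taken from the Element::Cookie source.

```ruby
# Hypothetical stand-in for Element::Cookie.
StoredCookie = Struct.new(:name, :value) do
  # Assumed shape of Element::Cookie#simple: a one-entry Hash.
  def simple
    { name => value }
  end
end

# Folds both lists into one Hash keyed by cookie name, so a cookie seen
# later replaces an earlier cookie with the same name.
def merge_by_name(store, incoming)
  merged = (store + incoming).each_with_object({}) do |cookie, acc|
    acc.merge!(cookie.simple)
  end
  merged.map { |name, value| StoredCookie.new(name, value) }
end

store    = [StoredCookie.new('sid', 'old'), StoredCookie.new('lang', 'en')]
incoming = [StoredCookie.new('sid', 'new')]

merge_by_name(store, incoming).each { |c| puts "#{c.name}=#{c.value}" }
```

Because a Ruby Hash keeps insertion order, an updated cookie stays at the position where its name first appeared.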

#run ⇒ `Page`

Runs the Analyzer and extracts forms, links and cookies

Returns:

(Page)

# File 'lib/parser/parser.rb', line 81

def run

    # non text files won't contain any auditable elements
    type = Arachni::HTTP.content_type( @response_headers )
    if type.is_a?( String) && !type.substring?( 'text' )
        return Page.new( {
            :url         => @url,
            :query_vars  => link_vars( @url ),
            :html        => @html,
            :headers     => [],
            :response_headers     => @response_headers,
            :forms       => [],
            :links       => [],
            :cookies     => [],
            :cookiejar   => []
        } )
    end


    cookies_arr = cookies
    cookies_arr = merge_with_cookiejar( cookies_arr.flatten.uniq )

    jar = {}
    jar = @opts.cookies = Arachni::HTTP.parse_cookiejar( @opts.cookie_jar ) if @opts.cookie_jar

    preped = {}
    cookies_arr.each{ |cookie| preped.merge!( cookie.simple ) }

    jar = preped.merge( jar )

    return Page.new( {
        :url         => @url,
        :query_vars  => link_vars( @url ),
        :html        => @html,
        :headers     => headers(),
        :response_headers     => @response_headers,
        :forms       => @opts.audit_forms ? forms() : [],
        :links       => @opts.audit_links ? links() : [],
        :cookies     => merge_with_cookiestore( merge_with_cookiejar( cookies_arr ) ),
        :cookiejar   => jar
    } )

end
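Two small decisions in `run` are easy to miss: the early return for non-text responses and the precedence given to cookie-jar values. Both are sketched below under assumptions: `skip_parsing?` is a hypothetical helper, and `String#include?` is used in place of the `substring?` call from the listing.

```ruby
# Hypothetical helper: true when the content type is known and is not
# textual, which is the case in which #run returns an empty Page.
def skip_parsing?(content_type)
  content_type.is_a?(String) && !content_type.include?('text')
end

puts skip_parsing?('image/png')
puts skip_parsing?('text/html; charset=utf-8')
puts skip_parsing?(nil)   # unknown type: parsing goes ahead

# Precedence of the cookie jar: Hash#merge lets the argument win, so a
# jar value replaces a page cookie with the same name.
page_cookies = { 'sid' => 'from-page', 'theme' => 'dark' }
jar_cookies  = { 'sid' => 'from-jar' }
cookiejar    = page_cookies.merge(jar_cookies)

puts cookiejar['sid']
```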

#to_absolute(link) ⇒ `String`

Converts relative URL link into an absolute URL based on the location of the page

Parameters:

link (String)

Returns:

(String)

# File 'lib/parser/parser.rb', line 429

def to_absolute( link )

    begin
        if URI.parse( link ).host
            return link
        end
    rescue Exception => e
        return nil if link.nil?
        #      return link
    end

    # remove anchor
    link = URI.encode( link.to_s.gsub( /#[a-zA-Z0-9_-]*$/, '' ) )

    begin
        relative = URI(link)
        url = URI.parse( @url )

        absolute = url.merge(relative)

        absolute.path = '/' if absolute.path.empty?
    rescue Exception => e
        return
    end

    return absolute.to_s
end
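The listing above calls `URI.encode`, which was removed in Ruby 3.0, so it will not run unchanged on current interpreters. The following sketch resolves a relative link with `URI.join` instead; `absolutize` is a hypothetical function, it performs no percent-encoding of the link, and unlike the original it also adds a root path to absolute links that lack one.

```ruby
require 'uri'

# Hypothetical sketch: resolves a link against the URL of the page.
def absolutize(link, base_url)
  return nil if link.nil?

  # Drop a trailing fragment ("#anchor") before resolving.
  cleaned  = link.to_s.sub(/#[^#]*\z/, '')
  absolute = URI.join(base_url, cleaned)

  # Give bare hosts a root path, as the original listing does.
  absolute.path = '/' if absolute.is_a?(URI::HTTP) && absolute.path.empty?
  absolute.to_s
rescue URI::Error
  nil   # assumption: unparsable links are reported as nil
end

base = 'http://example.com/dir/page.html'
puts absolutize('../img/logo.png#top', base)
puts absolutize('other.html?x=1', base)
puts absolutize('http://other.org', base)
```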
