Class: Arachni::Parser

Inherits:
Object
Includes:
UI::Output, Utilities
Defined in:
lib/arachni/parser.rb

Overview

Analyzer class

Analyzes HTML code extracting forms, links and cookies depending on user opts.

It grabs all element attributes, not just URLs and variables. All URLs are converted to absolute ones, and URLs outside the domain are ignored.
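The URL handling described above can be sketched with Ruby's standard `uri` library. This is only an illustration: the helper name `in_domain_absolute_urls` is hypothetical, and the exact-host comparison is an assumption that may be stricter or looser than the domain check the parser really applies.

```ruby
require 'uri'

# Hypothetical helper (not part of Arachni): resolve every href against the
# page URL, then keep only the URLs whose host equals the page's host.
def in_domain_absolute_urls(page_url, hrefs)
  page_host = URI.parse(page_url).host

  hrefs.map { |href| URI.join(page_url, href) }
       .select { |uri| uri.host == page_host }
       .map(&:to_s)
end

urls = in_domain_absolute_urls(
  'http://example.com/dir/page.html',
  ['about.html', '/contact', 'http://other.example.org/x']
)
```

Here `urls` holds the two absolute in-domain URLs; the link to `other.example.org` is dropped.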

Forms

Form analysis uses both regular expressions and the Nokogiri parser so that it can handle badly written HTML, such as unclosed or overlapping tags.

To ease audits, in addition to parsing forms into data structures like “select” and “option”, all auditable inputs are put under the “auditable” key.

Links

Links are extracted using the Nokogiri parser.

Cookies

Cookies are extracted from the HTTP headers and parsed by WEBrick::Cookie.

Defined Under Namespace

Modules: Extractors

Instance Attribute Summary

Instance Method Summary

Methods included from Utilities

#cookie_encode, #cookies_from_document, #cookies_from_file, #cookies_from_response, #exception_jail, #exclude_path?, #extract_domain, #form_decode, #form_encode, #form_parse_request_body, #forms_from_document, #forms_from_response, #get_path, #hash_keys_to_str, #html_decode, #html_encode, #include_path?, #links_from_document, #links_from_response, #normalize_url, #page_from_response, #page_from_url, #parse_query, #parse_set_cookie, #parse_url_vars, #path_too_deep?, #remove_constants, #seed, #skip_path?, #uri_decode, #uri_encode, #uri_parse, #uri_parser, #url_sanitize

Methods included from UI::Output

#debug?, #debug_off, #debug_on, #disable_only_positives, #flush_buffer, #mute, #muted?, old_reset_output_options, #only_positives, #only_positives?, #print_bad, #print_debug, #print_debug_backtrace, #print_debug_pp, #print_error, #print_error_backtrace, #print_info, #print_line, #print_ok, #print_status, #print_verbose, #reroute_to_file, #reroute_to_file?, reset_output_options, #set_buffer_cap, #uncap_buffer, #unmute, #verbose, #verbose?

Constructor Details

#initialize(res, opts = Options) ⇒ Parser

Instantiates the parser with the HTTP response (or responses) to analyze and the user options.

Parameters:

  • res (Typhoeus::Response, Array<Typhoeus::Response>)
  • opts (Options) (defaults to: Options)


# File 'lib/arachni/parser.rb', line 101

def initialize( res, opts = Options )
    @opts = opts

    if res.is_a? Array
        @secondary_responses = res[1..-1]
        @secondary_responses.compact! if @secondary_responses
        res = res.shift
    end

    @code     = res.code
    self.url  = res.effective_url
    @html     = res.body
    @response = res

    @response_headers = res.headers_hash

    @doc   = nil
    @paths = nil
end
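As a rough illustration of how the constructor treats an array argument, the sketch below separates the primary response from the secondary ones. `FakeResponse` and `split_responses` are hypothetical stand-ins, not Typhoeus or Arachni names.

```ruby
# Hypothetical stand-in for an HTTP response object.
FakeResponse = Struct.new(:code, :body)

# Returns [primary, secondary]; secondary is nil when a single response is given.
def split_responses(res)
  secondary = nil

  if res.is_a?(Array)
    # Everything after the first element, with nil entries removed.
    secondary = (res[1..-1] || []).compact
    # Unlike Array#shift in the constructor, #first leaves the caller's array intact.
    res = res.first
  end

  [res, secondary]
end

primary, secondary = split_responses(
  [FakeResponse.new(200, 'a'), nil, FakeResponse.new(200, 'b')]
)
```

The secondary responses matter later: `#forms` compares them with the primary response to spot values that change between requests.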

Instance Attribute Details

#opts ⇒ Options (readonly)

Options instance

Returns:

  • (Options)



# File 'lib/arachni/parser.rb', line 93

def opts
  @opts
end

#url ⇒ String

Returns the URL of the page.

Returns:

  • (String)

    the url of the page



# File 'lib/arachni/parser.rb', line 86

def url
  @url
end

Instance Method Details

#base ⇒ String

Returns the base href, if there is one.

Returns:

  • (String)

    base href if there is one



# File 'lib/arachni/parser.rb', line 366

def base
    @base ||= begin
        doc.search( '//base[@href]' ).first['href']
    rescue
    end
end

#cookies ⇒ Array<Element::Cookie>

Extracts cookies from the HTTP headers and the response body.

Returns:

  • (Array<Element::Cookie>)



# File 'lib/arachni/parser.rb', line 345

def cookies
    ( Cookie.from_document( @url, doc ) |
      Cookie.from_headers( @url, @response_headers ) )
end

#doc ⇒ Object



# File 'lib/arachni/parser.rb', line 250

def doc
    return @doc if @doc
    @doc = Nokogiri::HTML( @html ) if text? rescue nil
end

#forms(html = nil) ⇒ Array<Element::Form>

Extracts forms from an HTML document.

Parameters:

  • html (String) (defaults to: nil)

Returns:

  • (Array<Element::Form>)



# File 'lib/arachni/parser.rb', line 285

def forms( html = nil )
    return [] if !text? && !html

    f = Form.from_document( @url, html || doc )

    if @secondary_responses
        @secondary_responses.each do |response|
            next if response.body.to_s.empty?

            Form.from_document( @url, response.body ).each do |form2|
                f.each do |form|
                    next if form.auditable.keys.sort != form2.auditable.keys.sort
                    form.auditable.each do |k, v|
                        if v != form2.auditable[k] && form.field_type_for( k ) == 'hidden'
                            form.nonce_name = k
                        end
                    end
                end
            end
        end
    end

    f
end
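The nested loops above amount to a nonce check: if the same form is fetched twice and a hidden field's value differs, that field is flagged. A minimal sketch of that idea follows, using a hypothetical `SketchForm` struct in place of the real form element (the `types` hash stands in for `field_type_for`).

```ruby
# Hypothetical stand-in for a parsed form: auditable inputs, their field
# types, and the name of the detected nonce field (nil until one is found).
SketchForm = Struct.new(:auditable, :types, :nonce_name)

# Flags a hidden input whose value differs between two fetches of the same form.
def mark_nonce!(form, refetched)
  # Only compare forms that expose the same set of input names.
  return form if form.auditable.keys.sort != refetched.auditable.keys.sort

  form.auditable.each do |name, value|
    if value != refetched.auditable[name] && form.types[name] == 'hidden'
      form.nonce_name = name
    end
  end

  form
end

types  = { 'user' => 'text', 'token' => 'hidden' }
first  = SketchForm.new({ 'user' => '', 'token' => 'abc123' }, types)
second = SketchForm.new({ 'user' => '', 'token' => 'zzz999' }, types)

mark_nonce!(first, second)
```

After the call, `first.nonce_name` is `'token'`, because only the hidden field changed between the two fetches.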

#headers ⇒ Array<Header>

Returns a list of valid auditable HTTP header fields, each wrapped in a Header element.

It’s more of a placeholder method; it doesn’t actually analyze anything. It’s a long shot that any of these will be vulnerable, but better safe than sorry.

Returns:

  • (Array<Header>)

    HTTP header fields



# File 'lib/arachni/parser.rb', line 264

def headers
    {
        'Accept'          => 'text/html,application/xhtml+xml,application' +
            '/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Accept-Language' => 'en-gb,en;q=0.5',
        'Accept-Encoding' => 'gzip;q=1.0,deflate;q=0.6,identity;q=0.3',
        'From'       => @opts.authed_by  || '',
        'User-Agent' => @opts.user_agent || '',
        'Referer'    => @url,
        'Pragma'     => 'no-cache'
    }.map { |k, v| Header.new( @url, { k => v } ) }
end
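Because the method body calls `map` on a hash literal, the caller receives an array with one element per header field rather than the hash itself. The sketch below shows that shape with a hypothetical `SketchHeader` struct and a shortened field list; it is not the real Header element.

```ruby
# Hypothetical stand-in for an auditable header element.
SketchHeader = Struct.new(:url, :inputs)

def sketch_headers(url, user_agent)
  {
    'User-Agent' => user_agent || '',
    'Referer'    => url,
    'Pragma'     => 'no-cache'
  }.map { |name, value| SketchHeader.new(url, { name => value }) }
end

elements = sketch_headers('http://example.com/', nil)
```

Each element carries a single name/value pair, so every header can be audited independently of the others.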

#link_vars(url) ⇒ Hash

Extracts variables and their values from a link

Parameters:

Returns:

  • (Hash)

    name=>value pairs

See Also:



# File 'lib/arachni/parser.rb', line 336

def link_vars( url )
    Link.parse_query_vars( url )
end
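The real work is delegated to `Link.parse_query_vars`, whose implementation is not shown here. As an approximation only, the same name=>value extraction can be done with the standard `uri` library; `sketch_link_vars` is a hypothetical name.

```ruby
require 'uri'

# Hypothetical approximation of query-variable extraction.
def sketch_link_vars(url)
  query = URI.parse(url).query
  return {} if query.nil? || query.empty?

  # decode_www_form yields [name, value] pairs; to_h keeps the last value
  # when a name is repeated.
  URI.decode_www_form(query).to_h
end

vars = sketch_link_vars('http://example.com/search?q=ruby&page=2')
```

A URL without a query string yields an empty hash, which is the case `#links` tests for with `empty?`.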

#links(html = nil) ⇒ Array<Element::Link>

Extracts links from an HTML document.

Parameters:

  • html (String) (defaults to: nil)

Returns:

  • (Array<Element::Link>)



# File 'lib/arachni/parser.rb', line 317

def links( html = nil )
    return [] if !text? && !html

    if !(vars = link_vars( @url )).empty? || @response.redirection?
        [Link.new( @url, vars )]
    else
        []
    end | Link.from_document( @url, html || doc )
end
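The `if ... end | ...` expression above is an array union: the page's own URL becomes a link when it carries query variables or the response was a redirect, and it is then merged with the document's links without duplicates. A sketch with a hypothetical `SketchLink` struct, where the `redirected` flag stands in for `@response.redirection?`:

```ruby
require 'uri'

# Hypothetical stand-in for a link element; Struct gives value-based equality,
# which Array#| relies on to drop duplicates.
SketchLink = Struct.new(:url, :inputs)

def sketch_links(page_url, redirected, document_links)
  vars = URI.decode_www_form(URI.parse(page_url).query.to_s).to_h

  self_link = (!vars.empty? || redirected) ? [SketchLink.new(page_url, vars)] : []
  self_link | document_links
end

doc_links = [
  SketchLink.new('http://example.com/?id=1', { 'id' => '1' }),
  SketchLink.new('http://example.com/a', {})
]
with_query = sketch_links('http://example.com/?id=1', false, doc_links)
```

`with_query` has two entries, not three, since the page's self link equals the first document link.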

#page ⇒ Page

Also known as: run

Runs the analysis and extracts forms, links and cookies.

Returns:

  • (Page)



# File 'lib/arachni/parser.rb', line 153

def page
    req_method = 'get'
    begin
        req_method = @response.request.method.to_s
    rescue
    end

    self_link = Link.new( @url, inputs: link_vars( @url ) )

    # non text files won't contain any auditable elements
    if !text?
        return Page.new(
            code:             @code,
            url:              @url,
            method:           req_method,
            query_vars:       self_link.auditable,
            body:             @html,
            request_headers:  @response.request.headers,
            response_headers: @response_headers,
            text:             false
        )
    end

    # extract cookies from the response
    c_cookies = cookies

    # make a list of the response cookie names
    cookie_names = c_cookies.map { |c| c.name }

    from_jar = []

    # if there's a Netscape cookiejar file load cookies from it but only new ones,
    # i.e. only if they weren't in the response
    if @opts.cookie_jar
        from_jar |= cookies_from_file( @url, @opts.cookie_jar )
            .reject { |c| cookie_names.include?( c.name ) }
    end

    # if we somehow have any runtime configuration cookies load them too
    # but only if they haven't already been seen
    if @opts.cookies && !@opts.cookies.empty?
        from_jar |= @opts.cookies.reject { |c| cookie_names.include?( c.name ) }
    end

    # grab cookies from the HTTP cookiejar and filter out old ones, as usual
    from_http_jar = HTTP.instance.cookie_jar.cookies.reject do |c|
        cookie_names.include?( c.name )
    end

    # these cookies are to be audited and thus are dirty and anarchistic
    # so they have to contain even cookies completely irrelevant to the
    # current page, i.e. it contains all cookies that have been observed
    # from the beginning of the scan
    cookies_to_be_audited = (c_cookies | from_jar | from_http_jar).map do |c|
        dc = c.dup
        dc.action = @url
        dc
    end

    Page.new(
        code:             @code,
        url:              @url,
        query_vars:       self_link.auditable,
        method:           req_method,
        body:             @html,

        request_headers:  @response.request.headers,
        response_headers: @response_headers,

        document:         doc,

        # all paths seen in the page
        paths:            paths,
        forms:            forms,

        # all href attributes from 'a' elements
        links:            links | [self_link],

        cookies:          cookies_to_be_audited,
        headers:          headers,

        # this is the page cookiejar, each time the page is to be audited
        # by a module the cookiejar of the HTTP class will be updated
        # with the cookies specified here
        cookiejar:        c_cookies | from_jar,

        text:             true
    )
end
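The cookie handling in `#page` follows one rule: cookies set by the response win, and cookies from other sources are added only if their names were not seen in the response. The sketch below reproduces that merge with a hypothetical `SketchCookie` struct and plain arrays in place of the cookie-jar file, runtime options and HTTP cookie jar.

```ruby
# Hypothetical stand-in for a cookie element.
SketchCookie = Struct.new(:name, :value, :action)

def merge_cookies(page_url, response_cookies, jar_cookies, http_jar_cookies)
  seen = response_cookies.map(&:name)

  # Keep only cookies whose names the response did not set.
  from_jar      = jar_cookies.reject { |c| seen.include?(c.name) }
  from_http_jar = http_jar_cookies.reject { |c| seen.include?(c.name) }

  # Cookies to audit are copies pointed at the current page, so the
  # originals are left unmodified.
  to_audit = (response_cookies | from_jar | from_http_jar).map do |c|
    copy = c.dup
    copy.action = page_url
    copy
  end

  { cookiejar: response_cookies | from_jar, to_audit: to_audit }
end

response = [SketchCookie.new('session', 'new')]
jar      = [SketchCookie.new('session', 'old'), SketchCookie.new('lang', 'en')]
http_jar = [SketchCookie.new('theme', 'dark'), SketchCookie.new('lang', 'en')]

merged = merge_cookies('http://example.com/', response, jar, http_jar)
```

Note the asymmetry this preserves: the page cookiejar contains only response and jar cookies, while the audit list also includes everything seen in the HTTP cookie jar.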

#path_in_domain?(url) ⇒ Bool

Returns true if URL is within domain limits, false if not.

Parameters:

Returns:

  • (Bool)

    true if URL is within domain limits, false if not



# File 'lib/arachni/parser.rb', line 144

def path_in_domain?( url )
    super( url, @url )
end

#paths ⇒ Array<String>

Array of distinct links to follow

Returns:

  • (Array<String>)



# File 'lib/arachni/parser.rb', line 355

def paths
  return @paths unless @paths.nil?
  @paths = []
  return @paths if !doc

  @paths = run_extractors
end
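`#paths` is memoized: the extractors run at most once, and a missing document yields an empty array. The class below sketches that pattern. `SketchPaths` is hypothetical, and its regex-based `run_extractors` is only a placeholder for the real extractor modules, which are not shown in this section.

```ruby
class SketchPaths
  attr_reader :extractor_runs

  def initialize(doc)
    @doc            = doc
    @paths          = nil
    @extractor_runs = 0
  end

  def paths
    return @paths unless @paths.nil?

    @paths = []
    return @paths unless @doc

    @paths = run_extractors
  end

  private

  # Placeholder extractor: collects distinct href attribute values.
  def run_extractors
    @extractor_runs += 1
    @doc.scan(/href="([^"]+)"/).flatten.uniq
  end
end

parser = SketchPaths.new('<a href="/a"></a><a href="/a"></a><a href="/b"></a>')
first_call  = parser.paths
second_call = parser.paths
```

The second call returns the cached array without invoking the extractor again.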

#text? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/arachni/parser.rb', line 244

def text?
    type = @response.content_type
    return false if !type
    type.to_s.substring?( 'text' )
end
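The check treats a response as text whenever its content type contains the substring "text". A sketch with plain strings, assuming `substring?` behaves like `String#include?` (it is not a core Ruby method, so it presumably comes from an extension):

```ruby
# Hypothetical equivalent of the check above, taking the content type directly.
def sketch_text?(content_type)
  return false unless content_type

  content_type.to_s.include?('text')
end

html  = sketch_text?('text/html; charset=utf-8')
image = sketch_text?('image/png')
```

Under this rule, types such as `application/json` or `application/xhtml+xml` are not considered text, since neither contains that substring.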

#to_absolute(relative_url) ⇒ String

Converts a relative URL to an absolute one.

Returns:

  • (String)



# File 'lib/arachni/parser.rb', line 130

def to_absolute( relative_url )
    if url = base
        base_url = url
    else
        base_url = @url
    end
    super( relative_url, base_url )
end
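The method prefers the document's base href over the page URL when resolving a relative URL. The inherited `to_absolute` is not shown here, so the sketch below substitutes `URI.join` from the standard library; `sketch_to_absolute` is a hypothetical name.

```ruby
require 'uri'

# Resolves relative_url against the base href if present, else the page URL.
def sketch_to_absolute(relative_url, page_url, base_href = nil)
  URI.join(base_href || page_url, relative_url).to_s
end

without_base = sketch_to_absolute('img/a.png', 'http://example.com/blog/post.html')
with_base    = sketch_to_absolute('img/a.png', 'http://example.com/blog/post.html',
                                  'http://cdn.example.com/assets/')
```

The two results differ in both host and directory, which is why ignoring a `<base>` element would produce wrong paths for every relative link on the page.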