Class: Arachni::Spider
- Includes:
- UI::Output, Utilities
- Defined in:
- lib/arachni/spider.rb
Overview
Crawls the target webapp until there are no new paths left.
Instance Attribute Summary
- #opts ⇒ Arachni::Options readonly
-
#redirects ⇒ Array<String> readonly
URLs that caused redirects.
Instance Method Summary
-
#done? ⇒ TrueClass, FalseClass
True if the crawl is done, false otherwise.
-
#fancy_sitemap ⇒ Hash<Integer, String>
List of crawled URLs with their HTTP codes.
-
#idle? ⇒ TrueClass, FalseClass
True if the queue is empty and no requests are pending, false otherwise.
-
#initialize(opts = Options.instance) ⇒ Spider
constructor
Instantiates the Spider class with user options.
-
#on_complete(&block) ⇒ Object
Sets blocks to be called once the crawler is done.
-
#on_each_page(&block) ⇒ Object
Sets blocks to be called every time a page is visited.
-
#on_each_response(&block) ⇒ Object
Sets blocks to be called every time a response is received.
-
#paths ⇒ Array<String>
Working paths, paths that haven’t yet been followed.
-
#pause ⇒ TrueClass
Pauses the system on a best effort basis.
-
#paused? ⇒ Bool
True if the system is paused, false otherwise.
-
#push(paths) ⇒ Bool
Pushes new paths for the crawler to follow; if the crawler has finished, it will be awakened when new paths are pushed.
-
#resume ⇒ TrueClass
Resumes the system on a best effort basis.
-
#run(pass_pages_to_block = true, &block) ⇒ Array<String>
Runs the Spider and passes the requested object to the block.
-
#sitemap ⇒ Array<String>
List of crawled URLs.
- #url ⇒ Object
Methods included from Utilities
#cookie_encode, #cookies_from_document, #cookies_from_file, #cookies_from_response, #exception_jail, #exclude_path?, #extract_domain, #form_decode, #form_encode, #form_parse_request_body, #forms_from_document, #forms_from_response, #get_path, #hash_keys_to_str, #html_decode, #html_encode, #include_path?, #links_from_document, #links_from_response, #normalize_url, #page_from_response, #page_from_url, #parse_query, #parse_set_cookie, #parse_url_vars, #path_in_domain?, #path_too_deep?, #remove_constants, #seed, #skip_path?, #to_absolute, #uri_decode, #uri_encode, #uri_parse, #uri_parser, #url_sanitize
Methods included from UI::Output
#debug?, #debug_off, #debug_on, #disable_only_positives, #flush_buffer, #mute, #muted?, old_reset_output_options, #only_positives, #only_positives?, #print_bad, #print_debug, #print_debug_backtrace, #print_debug_pp, #print_error, #print_error_backtrace, #print_info, #print_line, #print_ok, #print_status, #print_verbose, #reroute_to_file, #reroute_to_file?, reset_output_options, #set_buffer_cap, #uncap_buffer, #unmute, #verbose, #verbose?
Constructor Details
#initialize(opts = Options.instance) ⇒ Spider
Instantiates the Spider class with user options.
# File 'lib/arachni/spider.rb', line 46

def initialize( opts = Options.instance )
    @opts = opts

    @sitemap   = {}
    @redirects = []
    @paths     = []
    @visited   = Set.new

    @on_each_page_blocks     = []
    @on_each_response_blocks = []
    @on_complete_blocks      = []

    @pass_pages       = true
    @pending_requests = 0

    seed_paths
end
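A minimal usage sketch (the target URL is a placeholder):

require 'arachni'

# Configure the global options that the Spider reads by default.
opts     = Arachni::Options.instance
opts.url = 'http://example.com/'

spider = Arachni::Spider.new( opts )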
Instance Attribute Details
#opts ⇒ Arachni::Options (readonly)
# File 'lib/arachni/spider.rb', line 36

def opts
    @opts
end
Instance Method Details
#done? ⇒ TrueClass, FalseClass
Returns true if the crawl is done, false otherwise.
# File 'lib/arachni/spider.rb', line 196

def done?
    idle? || limit_reached?
end
#fancy_sitemap ⇒ Hash<Integer, String>
Returns list of crawled URLs with their HTTP codes.
# File 'lib/arachni/spider.rb', line 82

def fancy_sitemap
    @sitemap
end
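A short illustrative sketch, assuming the returned hash maps each crawled URL to the HTTP status code it responded with:

spider.fancy_sitemap.each do |url, code|
    puts "[#{code}] #{url}"
end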
#idle? ⇒ TrueClass, FalseClass
Returns true if the queue is empty and no requests are pending, false otherwise.
# File 'lib/arachni/spider.rb', line 202

def idle?
    @paths.empty? && @pending_requests == 0
end
#on_complete(&block) ⇒ Object
Sets blocks to be called once the crawler is done.
# File 'lib/arachni/spider.rb', line 166

def on_complete( &block )
    fail 'Block is mandatory!' if !block_given?
    @on_complete_blocks << block
    self
end
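For example, a sketch that registers completion callbacks; since the method returns self, registrations can be chained:

spider.on_complete { puts 'Crawl finished.' }
      .on_complete { puts "#{spider.sitemap.size} URLs in the sitemap." }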
#on_each_page(&block) ⇒ Object
Sets blocks to be called every time a page is visited.
# File 'lib/arachni/spider.rb', line 144

def on_each_page( &block )
    fail 'Block is mandatory!' if !block_given?
    @on_each_page_blocks << block
    self
end
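A sketch of a per-page callback; the block receives an Arachni::Page for every visited URL:

spider.on_each_page do |page|
    puts "Visited: #{page.url}"
end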
#on_each_response(&block) ⇒ Object
Sets blocks to be called every time a response is received.
# File 'lib/arachni/spider.rb', line 155

def on_each_response( &block )
    fail 'Block is mandatory!' if !block_given?
    @on_each_response_blocks << block
    self
end
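A sketch of a per-response callback, useful when the raw response is enough and building a full Page would be wasteful; it assumes the Typhoeus-style response object exposes #code and #effective_url:

spider.on_each_response do |res|
    puts "#{res.code} => #{res.effective_url}"
end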
#paths ⇒ Array<String>
Returns the working paths, i.e. paths that haven't yet been followed. You'll actually get a copy of the working paths and not the actual object itself; if you want to add more paths use #push.
# File 'lib/arachni/spider.rb', line 72

def paths
    @paths.clone
end
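To illustrate the copy semantics (placeholder URLs):

spider.paths << 'http://example.com/ignored'     # mutates only the copy, no effect
spider.push( [ 'http://example.com/followed' ] ) # the supported way to add paths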
#pause ⇒ TrueClass
Pauses the system on a best effort basis and returns true.
# File 'lib/arachni/spider.rb', line 207

def pause
    @pause = true
end
#paused? ⇒ Bool
Returns true if the system is paused, false otherwise.
# File 'lib/arachni/spider.rb', line 218

def paused?
    @pause ||= false
end
#push(paths) ⇒ Bool
Pushes new paths for the crawler to follow; if the crawler has finished, it will be awakened when new paths are pushed.
The paths will be sanitized and normalized (cleaned up and converted to absolute ones).
# File 'lib/arachni/spider.rb', line 183

def push( paths )
    paths = dedup( paths )
    return false if paths.empty?

    @paths |= paths
    @paths.uniq!

    # REVIEW: This may cause segfaults, Typhoeus::Hydra doesn't like threads.
    #Thread.new { run } if idle? # wake up the crawler

    true
end
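A sketch of feeding extra entry points to the crawler; paths are normalized to absolute ones as described above, and false is returned if nothing new survives de-duplication:

spider.push( [ '/admin', 'http://example.com/login' ] )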
#resume ⇒ TrueClass
Resumes the system on a best effort basis and returns true.
# File 'lib/arachni/spider.rb', line 212

def resume
    @pause = false
    true
end
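A sketch of best-effort pause/resume control from a separate thread, assuming the crawl itself was started elsewhere:

Thread.new do
    sleep 10
    spider.pause          # takes effect at the crawler's next checkpoint
    puts spider.paused?   # => true
    sleep 5
    spider.resume         # => true
end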
#run(pass_pages_to_block = true, &block) ⇒ Array<String>
Runs the Spider and passes the requested object to the block.
# File 'lib/arachni/spider.rb', line 95

def run( pass_pages_to_block = true, &block )
    return if !@opts.crawl?

    # options could have changed so reseed
    seed_paths

    if block_given?
        pass_pages_to_block ? on_each_page( &block ) : on_each_response( &block )
    end

    while !done?
        wait_if_paused

        while !done? && url = @paths.shift
            wait_if_paused

            visit( url ) do |res|
                obj = if pass_pages_to_block
                    Page.from_response( res, @opts )
                else
                    Parser.new( res, @opts )
                end

                if @on_each_response_blocks.any?
                    call_on_each_response_blocks( res )
                end

                if @on_each_page_blocks.any?
                    call_on_each_page_blocks( pass_pages_to_block ? obj :
                        Page.from_response( res, @opts ) )
                end

                push( obj.paths )
            end
        end

        http.run
    end

    http.run

    call_on_complete_blocks
    sitemap
end
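Putting it together, a minimal end-to-end sketch (placeholder URL, options configured as in the constructor example above):

spider  = Arachni::Spider.new
sitemap = spider.run do |page|
    puts "Crawled: #{page.url}"
end

puts "#{sitemap.size} URLs discovered."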
#sitemap ⇒ Array<String>
Returns list of crawled URLs.
# File 'lib/arachni/spider.rb', line 77

def sitemap
    @sitemap.keys
end
#url ⇒ Object
# File 'lib/arachni/spider.rb', line 64

def url
    @opts.url
end