Class: Arachni::URI

Inherits:
Object
Extended by:
Arachni::UI::Output, Utilities
Includes:
Arachni::UI::Output, Utilities
Defined in:
lib/arachni/uri.rb,
lib/arachni/uri/scope.rb

Overview

The URI class automatically normalizes the URLs it is passed to parse while maintaining compatibility with Ruby’s URI core class.

It also provides cached helper class methods (cached to keep latency low) for common operations such as parsing, normalization, conversion to absolute URLs, and encoding and decoding.

Author:

Defined Under Namespace

Classes: Error, Scope

Constant Summary

CACHE_SIZES =
{
    parse:       2_500,

    normalize:   2_500,
    to_absolute: 2_500,

    encode:      1_000,
    decode:      1_000,

    scope:       1_000
}
CACHE =
{
    parser: ::URI::Parser.new
}
QUERY_CHARACTER_CLASS =
Addressable::URI::CharacterClasses::QUERY.sub( '\\&', '' )
VALID_SCHEMES =
Set.new(%w(http https))
PARTS =
%w(scheme userinfo host port path query)
TO_ABSOLUTE_PARTS =
%w(scheme userinfo host port)
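
The `CACHE[__method__].fetch( key ) { ... }` calls in the class methods suggest one size-bounded cache per operation, sized by `CACHE_SIZES`. The cache classes themselves are not shown on this page, so the following is only a minimal sketch of that pattern. `TinyLRU` and its least-recently-used eviction policy are assumptions, not Arachni's implementation.

```ruby
# Hypothetical size-bounded cache; relies on Ruby hashes preserving insertion order.
class TinyLRU
  def initialize(max_size)
    @max_size = max_size
    @store    = {}
  end

  # Returns the cached value for +key+, computing it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      # Re-insert so the key becomes the most recently used one.
      value = @store.delete(key)
      return @store[key] = value
    end

    value = yield
    @store[key] = value
    # Evict the oldest entry once the cache outgrows its limit.
    @store.shift if @store.size > @max_size
    value
  end

  def include?(key)
    @store.key?(key)
  end
end

cache = TinyLRU.new(2)
calls = 0
2.times { cache.fetch('a') { calls += 1; 'A' } } # the block runs only once
cache.fetch('b') { 'B' }
cache.fetch('c') { 'C' }                         # pushes 'a' out of the cache
```

A bounded cache keeps repeated lookups cheap while capping memory use, which matters when a scan touches many distinct URLs.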

Class Method Summary

Instance Method Summary

Methods included from Arachni::UI::Output

debug?, debug_level_1?, debug_level_2?, debug_level_3?, debug_level_4?, debug_off, debug_on, disable_only_positives, included, mute, muted?, only_positives, only_positives?, print_bad, print_debug, print_debug_backtrace, print_debug_level_1, print_debug_level_2, print_debug_level_3, print_debug_level_4, print_error, print_error_backtrace, print_exception, print_info, print_line, print_ok, print_status, print_verbose, reroute_to_file, reroute_to_file?, reset_output_options, unmute, verbose?, verbose_on

Methods included from Utilities

available_port, available_port_mutex, bytes_to_kilobytes, bytes_to_megabytes, caller_name, caller_path, cookie_decode, cookie_encode, cookies_from_file, cookies_from_parser, cookies_from_response, exception_jail, exclude_path?, follow_protocol?, form_decode, form_encode, forms_from_parser, forms_from_response, full_and_absolute_url?, generate_token, get_path, hms_to_seconds, html_decode, html_encode, include_path?, links_from_parser, links_from_response, normalize_url, page_from_response, page_from_url, parse_set_cookie, path_in_domain?, path_too_deep?, port_available?, rand_port, random_seed, redundant_path?, regexp_array_match, remove_constants, request_parse_body, seconds_to_hms, skip_page?, skip_path?, skip_resource?, skip_response?, uri_decode, uri_encode, uri_parse, uri_parse_query, uri_parser, uri_rewrite

Constructor Details

#initialize(url) ⇒ URI

Note:

Will discard the fragment component, if there is one.

Returns a new instance of URI.

Parameters:



# File 'lib/arachni/uri.rb', line 414

def initialize( url )
    @data = self.class.fast_parse( url )

    fail Error, 'Failed to parse URL.' if !@data

    PARTS.each do |part|
        instance_variable_set( "@#{part}", @data[part.to_sym] )
    end

    reset_userpass
end

Class Method Details

._load(url) ⇒ Object



# File 'lib/arachni/uri.rb', line 786

def self._load( url )
    new url
end

.decode(string) ⇒ String

URL decodes a string.

Parameters:

Returns:



# File 'lib/arachni/uri.rb', line 105

def decode( string )
    CACHE[__method__].fetch( string ) do
        s = Addressable::URI.unencode( string )

        if s
            s.recode!
            s.gsub!( '+', ' ' )
        end

        s
    end
end
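
For readers without Addressable at hand, roughly the same decoding (percent-escapes plus `+` as a space) can be tried with Ruby's standard library. This is only an approximation: it skips the `recode!` step and the cache, and `decode_form_value` is a hypothetical name.

```ruby
require 'uri'

# Hypothetical stand-in for .decode, built on the standard library.
# URI.decode_www_form_component decodes %XX escapes and turns '+' into a space.
def decode_form_value(string)
  URI.decode_www_form_component(string)
end

decoded = decode_form_value('name%3DJohn+Doe%21')
```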

.encode(string, good_characters = nil) ⇒ String

URL encodes a string.

Parameters:

  • string (String)
  • good_characters (String, Regexp) (defaults to: nil)

    Class of characters to allow. If a String is passed, it should be formatted as a regexp (for `Regexp.new`).

Returns:

  • (String)

    Encoded string.



# File 'lib/arachni/uri.rb', line 90

def encode( string, good_characters = nil )
    CACHE[__method__].fetch [string, good_characters] do
        s = Addressable::URI.encode_component(
            *[string, good_characters].compact
        )
        s.recode!
        s
    end
end
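
To illustrate what a "class of characters to allow" means in practice, here is a minimal sketch that percent-encodes everything outside a given character class. `percent_encode` and `UNRESERVED` are hypothetical names, and the code does not use Addressable, so its results may differ from `.encode` in edge cases.

```ruby
# Hypothetical default: the body of a regexp character class listing safe characters.
UNRESERVED = 'A-Za-z0-9\-._~'

# Percent-encodes every character that falls outside +good_characters+.
def percent_encode(string, good_characters = UNRESERVED)
  unsafe = Regexp.new("[^#{good_characters}]")
  string.gsub(unsafe) do |char|
    # Encode each byte of the (possibly multi-byte) character as %XX.
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

strict  = percent_encode('a b&c')               # space and '&' are escaped
relaxed = percent_encode('a b&c', 'A-Za-z0-9&') # '&' is allowed through
```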

.fast_parse(url) ⇒ Hash

Parses and normalizes the URL using less resources than the parser in Ruby’s URI library. The fragment and any path parameters are discarded.

Parameters:

Returns:

  • (Hash)

    URL components (frozen):

    * `:scheme` -- HTTP or HTTPS
    * `:userinfo` -- `username:password`
    * `:host`
    * `:port`
    * `:path`
    * `:query`
    


# File 'lib/arachni/uri.rb', line 156

def fast_parse( url )
    return if !url || url.empty?
    return if url.start_with?( '#' )

    durl = url.downcase
    return if durl.start_with?( 'javascript:' ) ||
        durl.start_with?( 'data:' )

    # One to rip apart.
    url = url.dup

    # Remove the fragment if there is one.
    url.sub!( /#.*/, '' )

    # One for reference.
    c_url = url

    components = {
        scheme:   nil,
        userinfo: nil,
        host:     nil,
        port:     nil,
        path:     nil,
        query:    nil
    }

    begin
        # Parsing the URL in its schemeless form is trickier, so we
        # fake it, pass a valid scheme to get through the parsing and
        # then remove it at the other end.
        if (schemeless = url.start_with?( '//' ))
            url.insert 0, 'http:'
        end

        # url.recode!
        url = html_decode( url )

        dupped_url = url.dup
        has_path = true

        splits = url.split( ':' )
        if !splits.empty? && VALID_SCHEMES.include?( splits.first.downcase )

            splits = url.split( '://', 2 )
            components[:scheme] = splits.shift
            components[:scheme].downcase! if components[:scheme]

            if (url = splits.shift)
                userinfo_host, url =
                    url.to_s.split( '?' ).first.to_s.split( '/', 2 )

                url    = url.to_s
                splits = userinfo_host.to_s.split( '@', 2 )

                if splits.size > 1
                    components[:userinfo] = splits.first
                end

                if !splits.empty?
                    splits = splits.last.split( '/', 2 )

                    splits = splits.first.split( ':', 2 )
                    if splits.size == 2
                        host = splits.first

                        if splits.last && !splits.last.empty?
                            components[:port] = splits.last.to_i
                        end

                        if components[:port] == 80
                            components[:port] = nil
                        end
                    else
                        host = splits.last
                    end

                    if (components[:host] = host)
                        components[:host].downcase!
                    end
                else
                    has_path = false
                end
            else
                has_path = false
            end
        end

        if has_path
            splits = url.split( '?', 2 )
            if (components[:path] = splits.shift)
                if components[:scheme]
                    components[:path] = "/#{components[:path]}"
                end

                components[:path].gsub!( /\/+/, '/' )

                # Remove path params
                components[:path].sub!( /\;.*/, '' )

                if components[:path]
                    components[:path] =
                        encode( decode( components[:path] ),
                                Addressable::URI::CharacterClasses::PATH ).dup

                    components[:path].gsub!( ';', '%3B' )
                end
            end

            if c_url.include?( '?' ) &&
                !(query = dupped_url.split( '?', 2 ).last).empty?

                components[:query] = (query.split( '&', -1 ).map do |pair|
                    encode( decode( pair ), QUERY_CHARACTER_CLASS )
                end).join( '&' )
            end
        end

        if schemeless
            components.delete :scheme
        end

        components[:path] ||= components[:scheme] ? '/' : nil

        components
    rescue => e
        print_debug "Failed to parse '#{c_url}'."
        print_debug "Error: #{e}"
        print_debug_backtrace( e )

        nil
    end
end
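
The individual clean-up steps are easier to follow in isolation. The sketch below applies the same string operations that appear in the listing to made-up input; the sample values are assumptions chosen for illustration, and the encoding and HTML-decoding steps are left out.

```ruby
# Remove the fragment, as fast_parse does before anything else.
without_fragment = 'http://example.com/a?x=1#top'.sub(/#.*/, '')

# Tidy up a path the way the listing does.
path = '//a///b;jsessionid=1/c'
path = path.gsub(/\/+/, '/') # collapse repeated slashes
path = path.sub(/;.*/, '')   # drop path parameters and everything after them

# Host and port are separated with a limited split, then the host is downcased.
host, port = 'Example.com:8080'.split(':', 2)
host = host.downcase
port = port.to_i
```

Note that the path-parameter pattern is greedy, so everything from the first `;` to the end of the path is removed.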

.full_and_absolute?(url) ⇒ Bool

Returns `true` if the URL is full and absolute, `false` otherwise.

Parameters:

  • url (String)

    URL to check.

Returns:

  • (Bool)

    `true` if the URL is full and absolute, `false` otherwise.



# File 'lib/arachni/uri.rb', line 401

def full_and_absolute?( url )
    return false if url.to_s.empty?

    parsed = parse( url.to_s )
    return false if !parsed

    parsed.absolute?
end

.normalize(url) ⇒ String

Note:

This method’s results are cached for performance reasons. If you plan on doing something destructive with its return value, duplicate it first, because other references to it may exist elsewhere.

Uses parse to parse and normalize the URL and then converts it to a common String format.

Parameters:

Returns:

  • (String)

    Normalized URL (frozen).



# File 'lib/arachni/uri.rb', line 348

def normalize( url )
    return if !url || url.empty?

    cache = CACHE[__method__]

    url   = url.to_s.strip
    c_url = url.dup

    begin
        if (v = cache[url]) && v == :err
            return
        elsif v
            return v
        end

        cache[c_url] = parse( url ).to_s.freeze
    rescue => e
        print_debug "Failed to normalize '#{c_url}'."
        print_debug "Error: #{e}"
        print_debug_backtrace( e )

        cache[c_url] = :err
        nil
    end
end
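
The listing caches failures as well as successes, using `:err` as a sentinel so that a URL that failed once is not processed again. A minimal sketch of that idea follows, with a plain Hash as the cache; `expensive_normalize`, `cached_normalize`, `NORMALIZE_CACHE` and `ATTEMPTS` are hypothetical names, and the "normalization" here is just a downcase.

```ruby
NORMALIZE_CACHE = {}
ATTEMPTS        = Hash.new(0) # counts how often the real work is attempted

# Hypothetical normalizer: downcases, and rejects strings containing spaces.
def expensive_normalize(url)
  ATTEMPTS[url] += 1
  raise ArgumentError, 'spaces are not allowed' if url.include?(' ')
  url.downcase
end

def cached_normalize(url)
  return if !url || url.empty?

  url    = url.strip
  cached = NORMALIZE_CACHE[url]
  return nil    if cached == :err # a previous attempt failed, do not retry
  return cached if cached

  NORMALIZE_CACHE[url] = expensive_normalize(url).freeze
rescue ArgumentError
  NORMALIZE_CACHE[url] = :err
  nil
end

good  = cached_normalize(' HTTP://EXAMPLE.COM/ ')
bad_1 = cached_normalize('http://exa mple.com/')
bad_2 = cached_normalize('http://exa mple.com/') # served from the :err entry
```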

.parse(url) ⇒ Object

Note:

This method’s results are cached for performance reasons. If you plan on doing something destructive with its return value, duplicate it first, because other references to it may exist elsewhere.

Cached version of #initialize. Use this method if the same URL is likely to be parsed more than once.

See Also:



# File 'lib/arachni/uri.rb', line 126

def parse( url )
    return url if !url || url.is_a?( Arachni::URI )

    CACHE[__method__].fetch url do
        begin
            new( url )
        rescue => e
            print_debug "Failed to parse '#{url}'."
            print_debug "Error: #{e}"
            print_debug_backtrace( e )
            nil
        end
    end
end
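
The note about duplicating cached results can be shown with a small sketch. Here a plain Hash stands in for the cache and a `Struct` for the parsed URL; `PARSE_CACHE`, `Parsed` and `cached_parse` are hypothetical names used only for illustration.

```ruby
PARSE_CACHE = {}
Parsed      = Struct.new(:host, :path)

# Hypothetical cached parser: every caller asking for the same URL gets the same object.
def cached_parse(url)
  PARSE_CACHE[url] ||= begin
    host, path = url.sub(%r{\Ahttps?://}, '').split('/', 2)
    Parsed.new(host, "/#{path}")
  end
end

shared = cached_parse('http://example.com/a')
copy   = shared.dup
copy.path = '/b' # safe: only the copy changes, the cached object is untouched
```

Mutating `shared` directly would silently change what every later `cached_parse` call for that URL returns.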

.parse_query(url) ⇒ Hash

Extracts inputs from a URL query.

Parameters:

Returns:



# File 'lib/arachni/uri.rb', line 389

def parse_query( url )
    parsed = parse( url )
    return {} if !parsed

    parse( url ).query_parameters
end

.parser ⇒ URI::Parser

Returns cached URI parser.

Returns:

  • (URI::Parser)

    cached URI parser



# File 'lib/arachni/uri.rb', line 77

def parser
    CACHE[__method__]
end

.rewrite(url, rules = Arachni::Options.scope.url_rewrites) ⇒ String

Returns the rewritten URL.

Parameters:

  • url (String)
  • rules (Hash<Regexp => String>) (defaults to: Arachni::Options.scope.url_rewrites)

    Regular expression and substitution pairs.

Returns:



# File 'lib/arachni/uri.rb', line 380

def rewrite( url, rules = Arachni::Options.scope.url_rewrites )
    parse( url ).rewrite( rules ).to_s
end

.to_absolute(relative, reference = Options.instance.url.to_s) ⇒ String

Note:

This method’s results are cached for performance reasons. If you plan on doing something destructive with its return value, duplicate it first, because other references to it may exist elsewhere.

Normalizes and converts a `relative` URL to an absolute one by merging it with a `reference` URL.

Pretty much a cached version of #to_absolute.

Parameters:

  • relative (String)
  • reference (String) (defaults to: Options.instance.url.to_s)

    Absolute url to use as a reference.

Returns:

  • (String)

    Absolute URL (frozen).



# File 'lib/arachni/uri.rb', line 304

def to_absolute( relative, reference = Options.instance.url.to_s )
    return normalize( reference ) if !relative || relative.empty?
    key = [relative, reference].hash

    cache = CACHE[__method__]
    begin
        if (v = cache[key]) && v == :err
            return
        elsif v
            return v
        end

        parsed_ref = parse( reference )

        if relative.start_with?( '//' )
            # Scheme-less URLs are expensive to parse so let's resolve
            # the issue here.
            relative = "#{parsed_ref.scheme}:#{relative}"
        end

        parsed = parse( relative )

        # Doesn't contain anything of interest (javascript: or fragment only),
        # return the ref.
        return parsed_ref.to_s if !parsed

        cache[key] = parsed.to_absolute( parsed_ref ).to_s.freeze
    rescue
        cache[key] = :err
        nil
    end
end

Instance Method Details

#==(other) ⇒ Object



# File 'lib/arachni/uri.rb', line 433

def ==( other )
    to_s == other.to_s
end

#_dump(_) ⇒ Object



# File 'lib/arachni/uri.rb', line 782

def _dump( _ )
    to_s
end
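
`#_dump` and `._load` are the hooks Ruby's `Marshal` uses for custom serialization: the instance is reduced to a String and rebuilt from it. The round trip can be sketched with a small stand-in class; `Link` is a hypothetical name, not part of Arachni.

```ruby
# Hypothetical class using the same _dump/_load pair as the listings.
class Link
  attr_reader :url

  def initialize(url)
    @url = url
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_level)
    @url
  end

  # Called by Marshal.load with the String produced by _dump.
  def self._load(url)
    new(url)
  end
end

restored = Marshal.load(Marshal.dump(Link.new('http://example.com/')))
```

Serializing only the string form keeps the dumped data small and lets the constructor rebuild all derived state.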

#absolute? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/arachni/uri.rb', line 437

def absolute?
    !!@scheme
end

#domain ⇒ String

Returns `domain_name.tld`.

Returns:

  • (String)

    `domain_name.tld`



# File 'lib/arachni/uri.rb', line 594

def domain
    return if !host
    return @domain if @domain
    return @domain = host if ip_address?

    s = host.split( '.' )
    return @domain = s.first if s.size == 1
    return @domain = host    if s.size == 2

    @domain = s[1..-1].join( '.' )
end
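
The rule in the listing is simple: IP addresses and hosts with one or two labels are returned as-is, otherwise the first label is dropped. A standalone sketch of that logic, without the memoization, might look as follows; `domain_of` is a hypothetical helper name.

```ruby
require 'ipaddr'

# Same check as #ip_address?: IPAddr.new raises for anything that is not an IP.
def ip_address?(host)
  !(IPAddr.new(host) rescue nil).nil?
end

# Standalone version of the #domain logic, without the memoization.
def domain_of(host)
  return host if ip_address?(host)

  labels = host.split('.')
  return host if labels.size <= 2

  # Drop only the first label; deeper subdomains keep their inner labels.
  labels[1..-1].join('.')
end

plain  = domain_of('www.example.com')
nested = domain_of('a.b.example.com')
```

Because only the first label is removed, a host such as `a.b.example.com` yields `b.example.com` rather than `example.com`.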

#dup ⇒ Object



# File 'lib/arachni/uri.rb', line 773

def dup
    i = self.class.allocate
    instance_variables.each do |iv|
        next if !(v = instance_variable_get( iv ))
        i.instance_variable_set iv, (v.dup rescue v)
    end
    i
end

#hash ⇒ Object



# File 'lib/arachni/uri.rb', line 790

def hash
    to_s.hash
end

#host ⇒ Object



# File 'lib/arachni/uri.rb', line 695

def host
    @host
end

#host=(h) ⇒ Object



# File 'lib/arachni/uri.rb', line 699

def host=( h )
    @to_s          = nil
    @up_to_port    = nil
    @without_query = nil
    @domain        = nil

    @host = h
end

#ip_address? ⇒ Boolean

Returns `true` if the URI contains an IP address, `false` otherwise.

Returns:

  • (Boolean)

    `true` if the URI contains an IP address, `false` otherwise.



# File 'lib/arachni/uri.rb', line 625

def ip_address?
    !(IPAddr.new( host ) rescue nil).nil?
end

#password ⇒ Object



# File 'lib/arachni/uri.rb', line 676

def password
    @password
end

#path ⇒ Object



# File 'lib/arachni/uri.rb', line 708

def path
    @path
end

#path=(p) ⇒ Object



# File 'lib/arachni/uri.rb', line 712

def path=( p )
    @up_to_path         = nil
    @resource_name      = nil
    @resource_extension = nil
    @without_query      = nil
    @to_s               = nil

    @path = p
end

#persistent_hash ⇒ Object



# File 'lib/arachni/uri.rb', line 794

def persistent_hash
    to_s.persistent_hash
end

#port ⇒ Object



# File 'lib/arachni/uri.rb', line 680

def port
    @port
end

#port=(p) ⇒ Object



# File 'lib/arachni/uri.rb', line 684

def port=( p )
    @without_query = nil
    @to_s          = nil

    if p
        @port = p.to_i
    else
        @port = nil
    end
end

#query ⇒ Object



# File 'lib/arachni/uri.rb', line 629

def query
    @query
end

#query=(q) ⇒ Object



# File 'lib/arachni/uri.rb', line 633

def query=( q )
    @to_s             = nil
    @without_query    = nil
    @query_parameters = nil

    q = q.to_s
    q = nil if q.empty?

    @query = q
end

#query_parameters ⇒ Hash

Returns the inputs extracted from the URL query.

Returns:

  • (Hash)

    Extracted inputs from a URL query.



# File 'lib/arachni/uri.rb', line 646

def query_parameters
    q = self.query
    return {} if q.to_s.empty?

    @query_parameters ||= begin
        q.split( '&' ).inject( {} ) do |h, pair|
            name, value = pair.split( '=', 2 )
            h[::URI.decode( name.to_s )] = ::URI.decode( value.to_s )
            h
        end
    end
end
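
The query-splitting logic can be tried outside the class with a small sketch. `query_to_hash` is a hypothetical name. The listing calls `::URI.decode`, which is not available on Ruby 3.0 and later, so this sketch swaps in `URI.decode_www_form_component`; unlike `URI.decode`, that method also turns `+` into a space.

```ruby
require 'uri'

# Hypothetical standalone take on #query_parameters, without the memoization.
def query_to_hash(query)
  return {} if query.to_s.empty?

  query.split('&').each_with_object({}) do |pair, params|
    # Limit the split so that '=' characters inside the value survive.
    name, value = pair.split('=', 2)
    params[URI.decode_www_form_component(name.to_s)] =
      URI.decode_www_form_component(value.to_s)
  end
end

params = query_to_hash('q=hello%20world&page=2&debug')
```

A parameter without a value, such as `debug` above, ends up with an empty string as its value.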

#relative? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/arachni/uri.rb', line 441

def relative?
    !absolute?
end

#resource_extension ⇒ String?

Returns the extension of the URI’s resource name, `nil` if there is none.

Returns:

  • (String, nil)

    The extension of the URI’s resource name, `nil` if there is none.



# File 'lib/arachni/uri.rb', line 553

def resource_extension
    name = resource_name.to_s
    return if !name.include?( '.' )

    @resource_extension ||= name.split( '.' ).last
end

#resource_name ⇒ String

Returns the name of the resource.

Returns:

  • (String)

    Name of the resource.



# File 'lib/arachni/uri.rb', line 547

def resource_name
    @resource_name ||= path.split( '/' ).last
end
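
Both `#resource_name` and `#resource_extension` boil down to a `split` followed by `last`. The sketch below runs the same operations on made-up paths, which are assumptions chosen for illustration.

```ruby
path = '/downloads/archive.tar.gz'

resource_name      = path.split('/').last          # last path segment
resource_extension = resource_name.split('.').last # text after the final dot

# A trailing slash is ignored by split, so the last directory name is returned.
dir_name = '/downloads/'.split('/').last
```

Only the text after the final dot counts as the extension, so `archive.tar.gz` yields `gz`.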

#rewrite(rules = Arachni::Options.scope.url_rewrites) ⇒ URI

Returns the rewritten URL.

Parameters:

  • rules (Hash<Regexp => String>) (defaults to: Arachni::Options.scope.url_rewrites)

    Regular expression and substitution pairs.

Returns:

  • (URI)

    Rewritten URL.



# File 'lib/arachni/uri.rb', line 611

def rewrite( rules = Arachni::Options.scope.url_rewrites )
    as_string = self.to_s

    rules.each do |args|
        if (rewritten = as_string.gsub( *args )) != as_string
            return Arachni::URI( rewritten )
        end
    end

    self.dup
end
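
The rule loop can be exercised on plain strings. In the sketch below, `rewrite_url` is a hypothetical helper and the rules are made-up examples of the `Hash<Regexp => String>` shape described above; it returns Strings rather than `Arachni::URI` objects.

```ruby
# Hypothetical standalone version of the rule loop: the first rule that changes
# the URL wins, and an unchanged copy is returned when nothing matches.
def rewrite_url(url, rules)
  rules.each do |pattern, substitution|
    rewritten = url.gsub(pattern, substitution)
    return rewritten if rewritten != url
  end

  url.dup
end

rules = {
  %r{/articles/(\d+)} => '/articles.php?id=\1',
  %r{/users/(\w+)}    => '/profile.php?name=\1'
}

rewritten = rewrite_url('http://example.com/articles/12', rules)
untouched = rewrite_url('http://example.com/about', rules)
```

Since the loop returns on the first rule that changes the string, rule order matters when patterns overlap.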

#scheme ⇒ Object



# File 'lib/arachni/uri.rb', line 722

def scheme
    @scheme
end

#scheme=(s) ⇒ Object



# File 'lib/arachni/uri.rb', line 726

def scheme=( s )
    @up_to_port    = nil
    @without_query = nil
    @to_s          = nil

    @scheme = s
end

#scope ⇒ Scope

Returns:



# File 'lib/arachni/uri.rb', line 427

def scope
    # We could have several identical URLs in play at any given time and
    # they will all have the same scope.
    CACHE[:scope].fetch( self ){ Scope.new( self ) }
end

#seed_in_host? ⇒ Bool

Returns `true` if the scan #seed is included in the domain, `false` otherwise.

Returns:

  • (Bool)

    `true` if the scan #seed is included in the domain, `false` otherwise.



# File 'lib/arachni/uri.rb', line 531

def seed_in_host?
    host.to_s.include?( Utilities.random_seed )
end

#to_absolute(reference) ⇒ Object



# File 'lib/arachni/uri.rb', line 535

def to_absolute( reference )
    dup.to_absolute!( reference )
end

#to_absolute!(reference) ⇒ Arachni::URI

Converts self into an absolute URL using `reference` to fill in the missing data.

Parameters:

Returns:



# File 'lib/arachni/uri.rb', line 453

def to_absolute!( reference )
    if !reference.is_a?( self.class )
        reference = self.class.new( reference.to_s )
    end

    TO_ABSOLUTE_PARTS.each do |part|
        next if send( part )

        ref_part = reference.send( "#{part}" )
        next if !ref_part

        send( "#{part}=", ref_part )
    end

    base_path = reference.path.split( %r{/+}, -1 )
    rel_path  = path.split( %r{/+}, -1 )

    # RFC2396, Section 5.2, 6), a)
    base_path << '' if base_path.last == '..'
    while (i = base_path.index( '..' ))
        base_path.slice!( i - 1, 2 )
    end

    if (first = rel_path.first) && first.empty?
        base_path.clear
        rel_path.shift
    end

    # RFC2396, Section 5.2, 6), c)
    # RFC2396, Section 5.2, 6), d)
    rel_path.push('') if rel_path.last == '.' || rel_path.last == '..'
    rel_path.delete('.')

    # RFC2396, Section 5.2, 6), e)
    tmp = []
    rel_path.each do |x|
        if x == '..' &&
            !(tmp.empty? || tmp.last == '..')
            tmp.pop
        else
            tmp << x
        end
    end

    add_trailer_slash = !tmp.empty?
    if base_path.empty?
        base_path = [''] # keep '/' for root directory
    elsif add_trailer_slash
        base_path.pop
    end

    while (x = tmp.shift)
        if x == '..'
            # RFC2396, Section 4
            # a .. or . in an absolute path has no special meaning
            base_path.pop if base_path.size > 1
        else
            # if x == '..'
            #   valid absolute (but abnormal) path "/../..."
            # else
            #   valid absolute path
            # end
            base_path << x
            tmp.each {|t| base_path << t}
            add_trailer_slash = false
            break
        end
    end

    base_path.push('') if add_trailer_slash
    @path = base_path.join('/')

    self
end
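
The path merging above follows the reference-resolution rules cited in the comments. To get a feel for the expected results without Arachni, the same kind of resolution can be tried with `URI.join` from Ruby's standard library. This is a stand-in for illustration only; Arachni's own output may differ in edge cases, and the URLs are made up.

```ruby
require 'uri'

base = 'http://example.com/a/b/index.html'

sibling    = URI.join(base, 'page.html').to_s                # same directory
parent     = URI.join(base, '../c').to_s                     # one directory up
rooted     = URI.join(base, '/top').to_s                     # absolute path
schemeless = URI.join(base, '//cdn.example.com/lib.js').to_s # inherits the scheme
```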

#to_s ⇒ String

Returns:



# File 'lib/arachni/uri.rb', line 735

def to_s
    @to_s ||= begin
        s = ''

        if @scheme
            s << @scheme
            s << '://'
        end

        if @userinfo
            s << @userinfo
            s << '@'
        end

        if @host
            s << @host

            if @port
                if (@scheme == 'http' && @port != 80) ||
                    (@scheme == 'https' && @port != 443)

                    s << ':'
                    s << @port.to_s
                end
            end
        end

        s << @path.to_s

        if @query
            s << '?'
            s << @query
        end

        s
    end
end
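
`#to_s` memoizes its result in `@to_s`, and the setters shown on this page reset that variable so the string is rebuilt after a change. The miniature below sketches the pattern; `MiniURL` and `DEFAULT_PORTS` are hypothetical names, and the class handles far fewer cases than `Arachni::URI`.

```ruby
# Hypothetical miniature of the pattern: memoize the string, reset it in setters.
class MiniURL
  DEFAULT_PORTS = { 'http' => 80, 'https' => 443 }.freeze

  attr_reader :port

  def initialize(scheme, host, port, path)
    @scheme = scheme
    @host   = host
    @port   = port
    @path   = path
  end

  def port=(port)
    @to_s = nil # invalidate the memoized string
    @port = port ? port.to_i : nil
  end

  def to_s
    @to_s ||= begin
      s = +"#{@scheme}://#{@host}"
      # Leave out the port when it is the default one for the scheme.
      s << ":#{@port}" if @port && @port != DEFAULT_PORTS[@scheme]
      s << @path.to_s
    end
  end
end

url    = MiniURL.new('http', 'example.com', 80, '/index')
first  = url.to_s
url.port = 8080
second = url.to_s
```

Without the reset in the setter, `second` would still be the stale string computed before the port changed.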

#up_to_path ⇒ String

Returns the URL up to its path component (no resource name, query, fragment, etc.).

Returns:

  • (String)

    The URL up to its path component (no resource name, query, fragment, etc).



# File 'lib/arachni/uri.rb', line 562

def up_to_path
    return if !path

    @up_to_path ||= begin
        uri_path = path.dup
        uri_path = File.dirname( uri_path ) if !File.extname( path ).empty?

        uri_path << '/' if uri_path[-1] != '/'

        up_to_port + uri_path
    end
end
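
The directory logic relies on `File.extname` to decide whether the last segment is a resource. A standalone sketch of just that part follows; `directory_of` is a hypothetical helper that works on the path only and leaves out the scheme, host and port prefix.

```ruby
# Hypothetical helper mirroring the directory logic of #up_to_path.
def directory_of(path)
  # Only strip the last segment when it looks like a file (has an extension).
  dir = File.extname(path).empty? ? path.dup : File.dirname(path)
  dir += '/' unless dir.end_with?('/')
  dir
end

with_file    = directory_of('/a/b/index.php')
without_file = directory_of('/a/b')
```

A last segment without an extension is treated as a directory, so `/a/b` becomes `/a/b/` rather than `/a/`.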

#up_to_port ⇒ String

Returns the scheme, host and port only.

Returns:

  • (String)

    Scheme, host & port only.



# File 'lib/arachni/uri.rb', line 577

def up_to_port
    @up_to_port ||= begin
        uri_str = "#{scheme}://#{host}"

        if port && (
            (scheme == 'http' && port != 80) ||
                (scheme == 'https' && port != 443)
        )
            uri_str << ':' + port.to_s
        end

        uri_str
    end
end

#user ⇒ Object



# File 'lib/arachni/uri.rb', line 672

def user
    @user
end

#userinfo ⇒ Object



# File 'lib/arachni/uri.rb', line 668

def userinfo
    @userinfo
end

#userinfo=(ui) ⇒ Object



# File 'lib/arachni/uri.rb', line 659

def userinfo=( ui )
    @without_query = nil
    @to_s          = nil

    @userinfo = ui
ensure
    reset_userpass
end

#without_query ⇒ String

Returns the URL up to and including its resource component (without the query, fragment, etc.).

Returns:

  • (String)

    The URL up to and including its resource component (without the query, fragment, etc.).



# File 'lib/arachni/uri.rb', line 541

def without_query
    @without_query ||= to_s.split( '?', 2 ).first.to_s
end