Module: URI

Defined in:
lib/uri.rb,
lib/uri/ftp.rb,
lib/uri/http.rb,
lib/uri/ldap.rb,
lib/uri/https.rb,
lib/uri/ldaps.rb,
lib/uri/mailto.rb,
lib/uri/common.rb,
lib/uri/generic.rb

Overview

uri/common.rb

Author

Akira Yamada <akira@ruby-lang.org>

Revision

$Id: common.rb 27285 2010-04-10 22:05:02Z naruse $

License

You can redistribute it and/or modify it under the same terms as Ruby.

Constant Summary

Class Method Summary

Methods included from Escape

escape, unescape

Class Method Details

.decode_www_form(str, enc = Encoding::UTF_8) ⇒ Object

Decodes URL-encoded form data from the given str.

This decodes application/x-www-form-urlencoded data and returns an array of key-value pairs (each pair is a two-element array). Internally, it uses URI.decode_www_form_component.

The charset hack is not currently supported because the mapping from a given charset to a Ruby encoding is not yet clear. See also www.w3.org/TR/html5/syntax.html#character-encodings-0

This follows www.w3.org/TR/html5/forms.html#url-encoded-form-data

ary = URI.decode_www_form("a=1&a=2&b=3")
p ary                         #=> [['a', '1'], ['a', '2'], ['b', '3']]
p ary.assoc('a').last         #=> '1'
p ary.assoc('b').last         #=> '3'
p ary.reverse.assoc('a').last #=> '2'
p Hash[ary]                   #=> {"a"=>"2", "b"=>"3"}

See URI.decode_www_form_component, URI.encode_www_form
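Because repeated keys come back as separate pairs (and Hash[ary] keeps only the last value), callers often group the result themselves. The following is a minimal sketch of one way to do that; the sample query string and the choice to collect values into arrays are illustrative assumptions, not something the library does for you.

```ruby
require 'uri'

# Decode a query string into [key, value] pairs; '+' and %XX are decoded.
pairs = URI.decode_www_form("q=ruby+uri&tag=a&tag=b&x=%41")
p pairs # => [["q", "ruby uri"], ["tag", "a"], ["tag", "b"], ["x", "A"]]

# Group values by key so that no value of a repeated key is lost.
params = Hash.new { |h, k| h[k] = [] }
pairs.each { |key, value| params[key] << value }

p params["tag"] # => ["a", "b"]
p params["q"]   # => ["ruby uri"]
```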



# File 'lib/uri/common.rb', line 826

def self.decode_www_form(str, enc=Encoding::UTF_8)
  return [] if str.empty?
  unless /\A#{WFKV_}*=#{WFKV_}*(?:[;&]#{WFKV_}*=#{WFKV_}*)*\z/o =~ str
    raise ArgumentError, "invalid data of application/x-www-form-urlencoded (#{str})"
  end
  ary = []
  $&.scan(/([^=;&]+)=([^;&]*)/) do
    ary << [decode_www_form_component($1, enc), decode_www_form_component($2, enc)]
  end
  ary
end

.decode_www_form_component(str, enc = Encoding::UTF_8) ⇒ Object

Decodes the given str of URL-encoded form data.

In addition to %XX sequences, this decodes + to SP (a space).

See URI.encode_www_form_component, URI.decode_www_form

Raises:

  • (ArgumentError) if str contains a % that is not followed by two hexadecimal digits.
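A short sketch of the decoding rules described above, using made-up inputs: + becomes a space, %XX is accepted with either upper- or lower-case hex digits (as the lookup table in the source suggests), the result is tagged with the requested encoding, and a stray % raises ArgumentError.

```ruby
require 'uri'

# '+' becomes a space; %XX is accepted with upper- or lower-case hex digits.
plain = URI.decode_www_form_component("a+b%2Bc%2fd")
p plain # => "a b+c/d"

# The decoded bytes are tagged with the requested encoding (UTF-8 by default).
word = URI.decode_www_form_component("%E3%81%82")
p word.encoding # => #<Encoding:UTF-8>

# A '%' that is not followed by two hex digits raises ArgumentError.
rejected =
  begin
    URI.decode_www_form_component("100%")
    false
  rescue ArgumentError
    true
  end
p rejected # => true
```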


# File 'lib/uri/common.rb', line 756

def self.decode_www_form_component(str, enc=Encoding::UTF_8)
  if TBLDECWWWCOMP_.empty?
    256.times do |i|
      h, l = i>>4, i&15
      TBLDECWWWCOMP_['%%%X%X' % [h, l]] = i.chr
      TBLDECWWWCOMP_['%%%x%X' % [h, l]] = i.chr
      TBLDECWWWCOMP_['%%%X%x' % [h, l]] = i.chr
      TBLDECWWWCOMP_['%%%x%x' % [h, l]] = i.chr
    end
    TBLDECWWWCOMP_['+'] = ' '
    TBLDECWWWCOMP_.freeze
  end
  raise ArgumentError, "invalid %-encoding (#{str})" unless /\A(?:%\h\h|[^%]+)*\z/ =~ str
  str.gsub(/\+|%\h\h/, TBLDECWWWCOMP_).force_encoding(enc)
end

.encode_www_form(enum) ⇒ Object

Generates URL-encoded form data from the given enum.

This generates application/x-www-form-urlencoded data, as defined in HTML5, from the given Enumerable object.

This internally uses URI.encode_www_form_component(str).

This does not convert the encodings of the given items, so convert them before calling this method if you want to send data in an encoding other than the original one, or data with mixed encodings. (Strings encoded in an HTML5 ASCII-incompatible encoding are converted to UTF-8.)

This does not handle files. To send a file, use multipart/form-data.

This follows www.w3.org/TR/html5/forms.html#url-encoded-form-data

See URI.encode_www_form_component, URI.decode_www_form
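A minimal sketch with illustrative data, assuming string keys and values: an array of pairs keeps order and repeated keys, a Hash also works because it yields [key, value] pairs, and decoding the output gives back the original pairs.

```ruby
require 'uri'

# An array of [key, value] pairs preserves order and allows repeated keys.
query = URI.encode_www_form([["q", "ruby uri"], ["tag", "a&b"], ["tag", "c"]])
p query # => "q=ruby+uri&tag=a%26b&tag=c"

# A Hash works too, since it is an Enumerable that yields [key, value] pairs.
from_hash = URI.encode_www_form("page" => "2", "sort" => "name")
p from_hash # => "page=2&sort=name"

# Decoding the result gives back the original pairs.
round_trip = URI.decode_www_form(query)
p round_trip # => [["q", "ruby uri"], ["tag", "a&b"], ["tag", "c"]]
```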



# File 'lib/uri/common.rb', line 789

def self.encode_www_form(enum)
  str = nil
  enum.each do |k,v|
    if str
      str << '&'
    else
      str = nil.to_s
    end
    str << encode_www_form_component(k)
    str << '='
    str << encode_www_form_component(v)
  end
  str
end

.encode_www_form_component(str) ⇒ Object

Encodes the given str as URL-encoded form data.

This leaves *, -, ., 0-9, A-Z, _, and a-z unchanged, converts SP to +, and converts every other byte to %XX.

This follows www.w3.org/TR/html5/forms.html#url-encoded-form-data

See URI.decode_www_form_component, URI.encode_www_form
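The three rules above can be checked with a few sample strings (chosen for illustration). Note that a literal % and characters such as / and ~ fall outside the unreserved set, so they are escaped too.

```ruby
require 'uri'

# Unreserved characters (*, -, ., 0-9, A-Z, _, a-z) pass through unchanged.
kept = URI.encode_www_form_component("a-b_c.d*e")
# A space becomes '+'; every other byte becomes %XX with upper-case hex.
amp = URI.encode_www_form_component("Ruby & Rails")
misc = URI.encode_www_form_component("100%/~")
# Non-ASCII text is escaped byte by byte from its UTF-8 representation.
accent = URI.encode_www_form_component("caf\u00e9")

p [kept, amp, misc, accent]
# => ["a-b_c.d*e", "Ruby+%26+Rails", "100%25%2F%7E", "caf%C3%A9"]
```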



# File 'lib/uri/common.rb', line 732

def self.encode_www_form_component(str)
  if TBLENCWWWCOMP_.empty?
    256.times do |i|
      TBLENCWWWCOMP_[i.chr] = '%%%02X' % i
    end
    TBLENCWWWCOMP_[' '] = '+'
    TBLENCWWWCOMP_.freeze
  end
  str = str.to_s
  if HTML5ASCIIINCOMPAT.include?(str.encoding)
    str = str.encode(Encoding::UTF_8)
  else
    str = str.dup
  end
  str.force_encoding(Encoding::ASCII_8BIT)
  str.gsub!(/[^*\-.0-9A-Z_a-z]/, TBLENCWWWCOMP_)
  str.force_encoding(Encoding::US_ASCII)
end

.extract(str, schemes = nil, &block) ⇒ Object

Synopsis

URI::extract(str[, schemes][,&blk])

Args

str

String to extract URIs from.

schemes

Limits URI matching to specific schemes.

Description

Extracts URIs from a string. If a block is given, iterates through all matched URIs and returns nil; otherwise, returns an array of the matches.

Usage

require "uri"

URI.extract("text here http://foo.example.org/bla and here mailto:test@example.com and here also.")
# => ["http://foo.example.org/bla", "mailto:test@example.com"]
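A slightly fuller sketch, using a made-up sample text, that exercises both return modes and the schemes filter described above:

```ruby
require 'uri'

# Made-up sample text containing two URIs.
text = "docs at http://www.example.org/doc and files at ftp://ftp.example.org/pub"

# Without a block: an array of every URI-like match.
all = URI.extract(text)
p all # => ["http://www.example.org/doc", "ftp://ftp.example.org/pub"]

# The schemes argument limits matching to the listed schemes.
ftp_only = URI.extract(text, ["ftp"])
p ftp_only # => ["ftp://ftp.example.org/pub"]

# With a block: each match is yielded and the return value is nil.
seen = []
result = URI.extract(text) { |uri| seen << uri }
p result # => nil
```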


# File 'lib/uri/common.rb', line 680

def self.extract(str, schemes = nil, &block)
  DEFAULT_PARSER.extract(str, schemes, &block)
end

.join(*str) ⇒ Object

Synopsis

URI::join(str[, str, ...])

Args

str

String(s) to join.

Description

Joins URIs, resolving each argument as a reference relative to the result of joining the arguments before it.

Usage

require 'uri'

p URI.join("http://localhost/","main.rbx")
# => #<URI::HTTP:0x2022ac02 URL:http://localhost/main.rbx>
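The following sketch, with made-up example.com URLs, shows how relative references are resolved against the base: a bare name replaces the last path segment, .. steps up one level, an absolute path replaces the whole path, and extra arguments are applied from left to right.

```ruby
require 'uri'

base = "http://example.com/a/b/c"

# A relative reference replaces the last path segment of the base.
sibling = URI.join(base, "d").to_s
# Dot segments are resolved.
parent = URI.join(base, "../d").to_s
# An absolute path replaces the whole path.
absolute = URI.join(base, "/x").to_s
# More than two arguments are joined from left to right.
chained = URI.join("http://example.com/", "docs/", "index.html").to_s

p [sibling, parent, absolute, chained]
# => ["http://example.com/a/b/d", "http://example.com/a/d", "http://example.com/x", "http://example.com/docs/index.html"]
```

Note the trailing slash in "docs/": without it, "index.html" would replace "docs" instead of being appended below it.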


# File 'lib/uri/common.rb', line 652

def self.join(*str)
  DEFAULT_PARSER.join(*str)
end

.parse(uri) ⇒ Object

Synopsis

URI::parse(uri_str)

Args

uri_str

String with URI.

Description

Creates an instance of one of the URI subclasses from the string.

Raises

URI::InvalidURIError

Raised if the given URI is not valid.

Usage

require 'uri'

uri = URI.parse("http://www.ruby-lang.org/")
p uri
# => #<URI::HTTP:0x202281be URL:http://www.ruby-lang.org/>
p uri.scheme
# => "http"
p uri.host
# => "www.ruby-lang.org"
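A sketch of reading more components and handling URI::InvalidURIError. The URLs are illustrative, and try_parse is a hypothetical helper, not part of the URI module.

```ruby
require 'uri'

uri = URI.parse("https://www.ruby-lang.org:8443/en/?lang=en#top")
# The scheme selects the subclass; the port is converted to an Integer.
p uri.class    # => URI::HTTPS
p uri.port     # => 8443
p uri.path     # => "/en/"
p uri.query    # => "lang=en"
p uri.fragment # => "top"

# When no port is given, the scheme's default port is used.
default_port = URI.parse("https://www.ruby-lang.org/").port
p default_port # => 443

# Hypothetical helper: returns nil instead of raising on malformed input.
def try_parse(str)
  URI.parse(str)
rescue URI::InvalidURIError
  nil
end

bad = try_parse("http://exa mple.com/") # a space is not allowed in a URI
p bad # => nil
```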


# File 'lib/uri/common.rb', line 627

def self.parse(uri)
  DEFAULT_PARSER.parse(uri)
end

.regexp(schemes = nil) ⇒ Object

Synopsis

URI::regexp([match_schemes])

Args

match_schemes

Array of schemes. If given, the resulting regexp matches only URIs whose scheme is one of the match_schemes.

Description

Returns a Regexp object that matches URI-like strings. The Regexp object returned by this method contains an arbitrary number of capture groups (parentheses). Never rely on their number.

Usage

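require 'uri'

# sample input; any String works
html_string = '<a href="http://www.ruby-lang.org/">Ruby</a> <a href="ftp://ftp.ruby-lang.org/pub/">FTP</a>'

# extract first URI from html_string
html_string.slice(URI.regexp)

# remove all ftp URIs
html_string.gsub(URI.regexp(['ftp']), '')

# You should not rely on the number of parentheses
html_string.scan(URI.regexp) do |*matches|
  p $&
end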
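A self-contained variant of the usage above with checkable results. The sample text is an assumption. Collecting $& inside the scan block yields each whole match regardless of how many capture groups the regexp contains, and gsub (rather than sub) removes every ftp URI, not just the first.

```ruby
require 'uri'

# Made-up sample text; URIs are separated by spaces.
text = "see http://a.example/1 and ftp://b.example/2 and https://c.example/3"

# First URI-like substring in the text.
first = text[URI.regexp]
p first # => "http://a.example/1"

# Collect whole matches with $& instead of relying on the capture groups.
web = []
text.scan(URI.regexp(["http", "https"])) { web << $& }
p web # => ["http://a.example/1", "https://c.example/3"]

# Remove every ftp URI.
without_ftp = text.gsub(URI.regexp(["ftp"]), "")
p without_ftp # => "see http://a.example/1 and  and https://c.example/3"
```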


# File 'lib/uri/common.rb', line 715

def self.regexp(schemes = nil)
  DEFAULT_PARSER.make_regexp(schemes)
end

.scheme_list ⇒ Object



# File 'lib/uri/common.rb', line 540

def self.scheme_list
  @@schemes
end
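The source above simply returns @@schemes, which is presumably the registry consulted when choosing a URI subclass. Assuming it maps upper-case scheme names to the classes registered by files such as lib/uri/http.rb and lib/uri/https.rb, a lookup might look like this:

```ruby
require 'uri'

schemes = URI.scheme_list
# Keys are upper-case scheme names; values are the URI subclasses for them.
p schemes["HTTP"]  # => URI::HTTP
p schemes["HTTPS"] # => URI::HTTPS

# The same classes come back from URI.parse for those schemes.
p URI.parse("http://example.com/").class == schemes["HTTP"] # => true
```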

.split(uri) ⇒ Object

Synopsis

URI::split(uri)

Args

uri

String with URI.

Description

Splits the string into the following parts and returns an array with the result:

* Scheme
* Userinfo
* Host
* Port
* Registry
* Path
* Opaque
* Query
* Fragment

Usage

require 'uri'

p URI.split("http://www.ruby-lang.org/")
# => ["http", nil, "www.ruby-lang.org", nil, nil, "/", nil, nil, nil]
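A sketch that destructures the nine parts in the order listed above, using an illustrative URI with every hierarchical component present. Note that split returns raw strings, so the port is "8080" rather than an Integer, and an opaque URI such as mailto: puts everything after the scheme into the opaque slot.

```ruby
require 'uri'

# Destructure the nine components in the documented order.
scheme, userinfo, host, port, registry, path, opaque, query, fragment =
  URI.split("http://user@www.example.com:8080/doc?lang=en#top")

p [scheme, userinfo, host, port] # => ["http", "user", "www.example.com", "8080"]
p [registry, path, opaque]       # => [nil, "/doc", nil]
p [query, fragment]              # => ["lang=en", "top"]

# For an opaque URI, the part after the scheme lands in the opaque slot.
mail = URI.split("mailto:test@example.com")
p mail # => ["mailto", nil, nil, nil, nil, nil, "test@example.com", nil, nil]
```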


# File 'lib/uri/common.rb', line 592

def self.split(uri)
  DEFAULT_PARSER.split(uri)
end