Class: Mechanize

Inherits:
Object
Defined in:
lib/mechanize.rb,
lib/mechanize/cookie.rb,
lib/mechanize/version.rb,
lib/mechanize/cookie_jar.rb

Overview

The Mechanize library is used for automating interactions with a website. It can follow links and submit forms. Form fields can be populated and submitted. A history of URLs is maintained and can be queried.

Example

require 'mechanize'
require 'logger'

agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari'

page = agent.get "http://www.google.com/"
search_form = page.form_with :name => "f"
search_form.field_with(:name => "q").value = "Hello"

search_results = agent.submit search_form
puts search_results.body

Issues with mechanize

If you think you have found a bug in mechanize but aren't sure, please file a ticket at github.com/sparklemotion/mechanize/issues.

Here are some common problems you may experience with mechanize:

Problems connecting to SSL sites

Mechanize defaults to validating SSL certificates using the default CA certificates for your platform. At this time, Windows users do not have integration between the OS default CA certificates and OpenSSL. #cert_store explains how to download and use Mozilla's CA certificates to allow SSL sites to work.

Problems with content-length

Some sites return an incorrect content-length value. Unlike a browser, mechanize raises an error when the content-length header does not match the response length since it does not know if there was a connection problem or if the mismatch is a server bug.

The error raised, Mechanize::ResponseReadError, can be converted to a parsed Page, File, etc. depending upon the content-type:

agent = Mechanize.new
uri = URI 'http://example/invalid_content_length'

begin
  page = agent.get uri
rescue Mechanize::ResponseReadError => e
  page = e.force_parse
end
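
The mismatch described above can be pictured with a small standalone check. This is only a sketch: `content_length_matches?` is a hypothetical helper, not part of Mechanize, and the library performs its own check while reading the response.

```ruby
# Hypothetical helper (not a Mechanize API): compares the declared
# Content-Length with the number of bytes actually received.
def content_length_matches?(headers, body)
  declared = headers['content-length']
  return true if declared.nil?            # nothing to compare against
  Integer(declared, 10) == body.bytesize
end

headers = { 'content-length' => '5' }
puts content_length_matches?(headers, 'hello')  # body is complete
puts content_length_matches?(headers, 'hel')    # body is shorter than declared
```

A short body alone cannot tell a dropped connection from a server that sent the wrong header, which is why the decision to call force_parse is left to the caller.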

Defined Under Namespace

Modules: CookieCMethods, CookieDeprecated, CookieIMethods, CookieJarIMethods, ElementMatcher, Parser, Prependable

Classes: ChunkedTerminationError, ContentTypeError, Cookie, CookieJar, DirectorySaver, Download, ElementNotFoundError, Error, File, FileConnection, FileRequest, FileResponse, FileSaver, Form, HTTP, Headers, History, Image, Page, PluggableParser, RedirectLimitReachedError, RedirectNotGetOrHeadError, ResponseCodeError, ResponseReadError, RobotsDisallowedError, TestCase, UnauthorizedError, UnsupportedSchemeError, Util, XmlFile

Constant Summary

AGENT_ALIASES =

Supported User-Agent aliases for use with user_agent_alias=. The description in parentheses is for informational purposes and is not part of the alias name.

  • Linux Firefox (43.0 on Ubuntu Linux)

  • Linux Konqueror (3)

  • Linux Mozilla

  • Mac Firefox (43.0)

  • Mac Mozilla

  • Mac Safari (9.0 on OS X 10.11.2)

  • Mac Safari 4

  • Mechanize (default)

  • Windows Chrome

  • Windows IE 6

  • Windows IE 7

  • Windows IE 8

  • Windows IE 9

  • Windows IE 10 (Windows 8 64bit)

  • Windows IE 11 (Windows 8.1 64bit)

  • Windows Edge

  • Windows Mozilla

  • Windows Firefox (43.0)

  • iPhone (iOS 9.1)

  • iPad (iOS 9.1)

  • Android (5.1.1)

Example:

agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
{
  'Mechanize' => "Mechanize/#{VERSION} Ruby/#{ruby_version} (http://github.com/sparklemotion/mechanize/)",
  'Linux Firefox' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0',
  'Linux Konqueror' => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',
  'Linux Mozilla' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
  'Mac Firefox' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0',
  'Mac Mozilla' => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401',
  'Mac Safari 4' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; de-at) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10',
  'Mac Safari' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
  'Windows Chrome' => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.125 Safari/537.36',
  'Windows IE 6' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
  'Windows IE 7' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
  'Windows IE 8' => 'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
  'Windows IE 9' => 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
  'Windows IE 10' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)',
  'Windows IE 11' => 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
  'Windows Edge' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586',
  'Windows Mozilla' => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6',
  'Windows Firefox' => 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
  'iPhone' => 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B5110e Safari/601.1',
  'iPad' => 'Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1',
  'Android' => 'Mozilla/5.0 (Linux; Android 5.1.1; Nexus 7 Build/LMY47V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.76 Safari/537.36',
}
VERSION =
"2.8.2"

Constructor Details

#initialize(connection_name = 'mechanize') {|_self| ... } ⇒ Mechanize

Creates a new mechanize instance. If a block is given, the created instance is yielded to the block for setting up pre-connection state such as SSL parameters or proxies:

agent = Mechanize.new do |a|
  a.proxy_addr = 'proxy.example'
  a.proxy_port = 8080
end

If you need segregated SSL connections, give each agent a unique name. Otherwise the connections will be shared. This is particularly important if you are using certificates.

agent_1 = Mechanize.new 'conn1'
agent_2 = Mechanize.new 'conn2'

Yields:

  • (_self)

Yield Parameters:

  • _self (Mechanize)

    the object that the method was called on


# File 'lib/mechanize.rb', line 191

def initialize(connection_name = 'mechanize')
  @agent = Mechanize::HTTP::Agent.new(connection_name)
  @agent.context = self
  @log = nil

  # attr_accessors
  @agent.user_agent = AGENT_ALIASES['Mechanize']
  @watch_for_set    = nil
  @history_added    = nil

  # attr_readers
  @pluggable_parser = PluggableParser.new

  @keep_alive_time  = 0

  # Proxy
  @proxy_addr = nil
  @proxy_port = nil
  @proxy_user = nil
  @proxy_pass = nil

  @html_parser = self.class.html_parser

  @default_encoding = nil
  @force_default_encoding = false

  # defaults
  @agent.max_history = 50

  yield self if block_given?

  @agent.set_proxy @proxy_addr, @proxy_port, @proxy_user, @proxy_pass
end
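
The ordering in the constructor matters: the block runs before the proxy settings are handed to the underlying agent, so values assigned inside the block take effect. The sketch below reproduces only that control flow. `SketchAgent` and `applied_proxy` are hypothetical names used for illustration.

```ruby
# Hypothetical stand-in class; only the control flow mirrors #initialize.
class SketchAgent
  attr_accessor :proxy_addr, :proxy_port
  attr_reader :applied_proxy

  def initialize
    @proxy_addr = nil
    @proxy_port = nil

    yield self if block_given?   # the caller adjusts settings here

    # Settings are read only after the block has returned.
    @applied_proxy = [@proxy_addr, @proxy_port]
  end
end

agent = SketchAgent.new do |a|
  a.proxy_addr = 'proxy.example'
  a.proxy_port = 8080
end
p agent.applied_proxy
```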

Class Attribute Details

.html_parser ⇒ Object

Default HTML parser for all mechanize instances

Mechanize.html_parser = Nokogiri::XML

# File 'lib/mechanize.rb', line 624

def html_parser
  @html_parser
end

.log ⇒ Object

Default logger for all mechanize instances

Mechanize.log = Logger.new $stderr

# File 'lib/mechanize.rb', line 631

def log
  @log
end

Instance Attribute Details

#agent ⇒ Object (readonly)

:section: Utilities


# File 'lib/mechanize.rb', line 1235

def agent
  @agent
end

#default_encoding ⇒ Object

A default encoding name used when parsing HTML. When set, it is used after any other encoding. The default is nil.


# File 'lib/mechanize.rb', line 639

def default_encoding
  @default_encoding
end

#force_default_encoding ⇒ Object

Overrides the encodings given by the HTTP server and the HTML page with the default_encoding when set to true.


# File 'lib/mechanize.rb', line 645

def force_default_encoding
  @force_default_encoding
end

#history_added ⇒ Object

Callback which is invoked with the page that was added to history.


# File 'lib/mechanize.rb', line 306

def history_added
  @history_added
end

#html_parser ⇒ Object

The HTML parser to be used when parsing documents


# File 'lib/mechanize.rb', line 650

def html_parser
  @html_parser
end

#keep_alive_time ⇒ Object

HTTP/1.0 keep-alive time. This is no longer supported by mechanize, as it now uses net-http-persistent, which only supports HTTP/1.1 persistent connections.


# File 'lib/mechanize.rb', line 657

def keep_alive_time
  @keep_alive_time
end

#pluggable_parser ⇒ Object (readonly)

The pluggable parser maps a response Content-Type to a parser class. The registered Content-Type may be either a full content type like 'image/png' or a media type like 'text'. See Mechanize::PluggableParser for further details.

Example:

agent.pluggable_parser['application/octet-stream'] = Mechanize::Download

# File 'lib/mechanize.rb', line 669

def pluggable_parser
  @pluggable_parser
end

#proxy_addr ⇒ Object (readonly)

The HTTP proxy address


# File 'lib/mechanize.rb', line 674

def proxy_addr
  @proxy_addr
end

#proxy_pass ⇒ Object (readonly)

The HTTP proxy password


# File 'lib/mechanize.rb', line 679

def proxy_pass
  @proxy_pass
end

#proxy_port ⇒ Object (readonly)

The HTTP proxy port


# File 'lib/mechanize.rb', line 684

def proxy_port
  @proxy_port
end

#proxy_user ⇒ Object (readonly)

The HTTP proxy username


# File 'lib/mechanize.rb', line 689

def proxy_user
  @proxy_user
end

#watch_for_set ⇒ Object

The value of watch_for_set is passed to pluggable parsers for retrieved content


# File 'lib/mechanize.rb', line 1074

def watch_for_set
  @watch_for_set
end

Class Method Details

.inherited(child) ⇒ Object

:nodoc:


# File 'lib/mechanize.rb', line 150

def self.inherited(child) # :nodoc:
  child.html_parser = html_parser
  child.log = log
  super
end

.start ⇒ Object

Creates a new Mechanize instance and yields it to the given block.

After the block executes, the instance is cleaned up. This includes closing all open connections.

Mechanize.start do |m|
  m.get("http://example.com")
end

# File 'lib/mechanize.rb', line 166

def self.start
  instance = new
  yield(instance)
ensure
  instance.shutdown
end
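
The cleanup guarantee comes from the method-level ensure clause, which runs even when the block raises. The following sketch shows the same pattern on a hypothetical `SketchSession` class. The `if instance` guard is an addition for the sketch and is not present in the source above.

```ruby
# Hypothetical stand-in class; illustrates ensure-based cleanup only.
class SketchSession
  attr_reader :closed

  def initialize
    @closed = false
  end

  def shutdown
    @closed = true
  end

  def self.start
    instance = new
    yield instance
  ensure
    # Runs whether the block returned normally or raised.
    instance.shutdown if instance
  end
end

session = nil
begin
  SketchSession.start do |s|
    session = s
    raise 'boom'
  end
rescue RuntimeError => e
  puts "rescued: #{e.message}"
end
puts session.closed
```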

Instance Method Details

#add_auth(uri, user, password, realm = nil, domain = nil) ⇒ Object

Adds the credentials user and password for uri. If realm is set, the credentials are used only for that realm. If realm is not set, the credentials become the default for any realm on that URI.

domain and realm are exclusive as NTLM does not follow RFC 2617. If domain is given it is only used for NTLM authentication.


# File 'lib/mechanize.rb', line 724

def add_auth uri, user, password, realm = nil, domain = nil
  @agent.add_auth uri, user, password, realm, domain
end

#auth(user, password, domain = nil) ⇒ Object Also known as: basic_auth

NOTE: These credentials will be used as a default for any challenge, which exposes your password to malicious servers. Calling this method prints a warning. This method is deprecated and will be removed in mechanize 3.

Sets the user and password as the default credentials to be used for HTTP authentication for any server. The domain is used for NTLM authentication.


# File 'lib/mechanize.rb', line 701

def auth user, password, domain = nil
  caller.first =~ /(.*?):(\d+).*?$/

  warn <<-WARNING
At #{$1} line #{$2}

Use of #auth and #basic_auth are deprecated due to a security vulnerability.

  WARNING

  @agent.add_default_auth user, password, domain
end
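
The warning names the call site by matching the first caller frame against a regular expression. As a rough illustration of what that pattern extracts, the sketch below applies the same expression to a sample frame string. `call_site` is a hypothetical helper and the frame text is made up.

```ruby
# Same pattern as in #auth: a lazy path match, then ":" and the line number.
CALLER_PATTERN = /(.*?):(\d+).*?$/

# Hypothetical helper: returns [file, line] or nil when nothing matches.
def call_site(frame)
  match = CALLER_PATTERN.match(frame)
  match && [match[1], Integer(match[2], 10)]
end

p call_site("app/fetch.rb:42:in `login'")
p call_site('no line information')
```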

#back ⇒ Object

Equivalent to the browser back button. Returns the previous page visited.


# File 'lib/mechanize.rb', line 232

def back
  @agent.history.pop
end

#ca_file ⇒ Object

Path to an OpenSSL server certificate file


# File 'lib/mechanize.rb', line 1084

def ca_file
  @agent.ca_file
end

#ca_file=(ca_file) ⇒ Object

Sets the certificate file used for SSL connections


# File 'lib/mechanize.rb', line 1091

def ca_file= ca_file
  @agent.ca_file = ca_file
end

#cert ⇒ Object

An OpenSSL client certificate or the path to a certificate file.


# File 'lib/mechanize.rb', line 1098

def cert
  @agent.certificate
end

#cert=(cert) ⇒ Object

Sets the OpenSSL client certificate cert to the given path or certificate instance


# File 'lib/mechanize.rb', line 1106

def cert= cert
  @agent.certificate = cert
end

#cert_store ⇒ Object

An OpenSSL certificate store for verifying server certificates. This defaults to the default certificate store for your system.

If your system does not ship with a default set of certificates you can retrieve a copy of the set from Mozilla here: curl.haxx.se/docs/caextract.html

(Note that this set does not have an HTTPS download option so you may wish to use the firefox-db2pem.sh script to extract the certificates from a local install to avoid man-in-the-middle attacks.)

After downloading or generating a cacert.pem from the above link you can create a certificate store from the pem file like this:

cert_store = OpenSSL::X509::Store.new
cert_store.add_file 'cacert.pem'

And have mechanize use it with:

agent.cert_store = cert_store

# File 'lib/mechanize.rb', line 1132

def cert_store
  @agent.cert_store
end
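
A store can also combine the platform defaults with a downloaded bundle. The sketch below uses only the standard openssl library. `build_cert_store` is a hypothetical helper and 'cacert.pem' is an assumed file location, which is skipped when the file is absent.

```ruby
require 'openssl'

# Hypothetical helper: platform defaults plus an optional PEM bundle.
def build_cert_store(bundle_path = nil)
  store = OpenSSL::X509::Store.new
  store.set_default_paths   # load the platform's default CA locations, if any
  store.add_file(bundle_path) if bundle_path && File.exist?(bundle_path)
  store
end

cert_store = build_cert_store('cacert.pem')
puts cert_store.class
```

The resulting object can then be assigned with agent.cert_store = cert_store as described above.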

#cert_store=(cert_store) ⇒ Object

Sets the OpenSSL certificate store to cert_store.

See also #cert_store


# File 'lib/mechanize.rb', line 1141

def cert_store= cert_store
  @agent.cert_store = cert_store
end

#certificate ⇒ Object

What is this?

Why is it different from #cert?


# File 'lib/mechanize.rb', line 1150

def certificate # :nodoc:
  @agent.certificate
end

#click(link) ⇒ Object

If the parameter is a string, finds the button or link with the value of the string on the current page and clicks it. Otherwise, clicks the Mechanize::Page::Link object passed in. Returns the page fetched.


# File 'lib/mechanize.rb', line 333

def click link
  case link
  when Page::Link then
    referer = link.page || current_page()
    if @agent.robots
      if (referer.is_a?(Page) and referer.parser.nofollow?) or
         link.rel?('nofollow') then
        raise RobotsDisallowedError.new(link.href)
      end
    end
    if link.noreferrer?
      href = @agent.resolve(link.href, link.page || current_page)
      referer = Page.new
    else
      href = link.href
    end
    get href, [], referer
  when String, Regexp then
    if real_link = page.link_with(:text => link)
      click real_link
    else
      button = nil
      # Note that this will not work if we have since navigated to a different page.
      # Should rather make each button aware of its parent form.
      form = page.forms.find do |f|
        button = f.button_with(:value => link)
        button.is_a? Form::Submit
      end
      submit form, button if form
    end
  when Form::Submit, Form::ImageButton then
    # Note that this will not work if we have since navigated to a different page.
    # Should rather make each button aware of its parent form.
    form = page.forms.find do |f|
      f.buttons.include?(link)
    end
    submit form, link if form
  else
    referer = current_page()
    href = link.respond_to?(:href) ? link.href :
      (link['href'] || link['src'])
    get href, [], referer
  end
end

#conditional_requests ⇒ Object

Are If-Modified-Since conditional requests enabled?


# File 'lib/mechanize.rb', line 731

def conditional_requests
  @agent.conditional_requests
end

#conditional_requests=(enabled) ⇒ Object

Disables If-Modified-Since conditional requests (enabled by default)


# File 'lib/mechanize.rb', line 738

def conditional_requests= enabled
  @agent.conditional_requests = enabled
end

#content_encoding_hooks ⇒ Object

A list of hooks to call before reading response header 'content-encoding'.

The hook is called with the agent making the request, the URI of the request, the response, and an IO containing the response body.


# File 'lib/mechanize.rb', line 299

def content_encoding_hooks
  @agent.content_encoding_hooks
end

#cookie_jar ⇒ Object

A Mechanize::CookieJar which stores cookies


# File 'lib/mechanize.rb', line 745

def cookie_jar
  @agent.cookie_jar
end

#cookie_jar=(cookie_jar) ⇒ Object

Replaces the cookie jar with cookie_jar


# File 'lib/mechanize.rb', line 752

def cookie_jar= cookie_jar
  @agent.cookie_jar = cookie_jar
end

#cookies ⇒ Object

Returns a list of cookies stored in the cookie jar.


# File 'lib/mechanize.rb', line 759

def cookies
  @agent.cookie_jar.to_a
end

#current_page ⇒ Object Also known as: page

Returns the latest page loaded by Mechanize


# File 'lib/mechanize.rb', line 239

def current_page
  @agent.current_page
end

#delete(uri, query_params = {}, headers = {}) ⇒ Object

DELETE uri with query_params, and setting headers:

query_params is formatted into a query string using Mechanize::Util.build_query_string, which see.

delete('http://example/', {'q' => 'foo'}, {})

# File 'lib/mechanize.rb', line 427

def delete(uri, query_params = {}, headers = {})
  page = @agent.fetch(uri, :delete, headers, query_params)
  add_to_history(page)
  page
end

#download(uri, io_or_filename, parameters = [], referer = nil, headers = {}) ⇒ Object

GETs uri and writes it to io_or_filename without recording the request in the history. If io_or_filename does not respond to #write it will be used as a file name. parameters, referer and headers are used as in #get.

By default, if the Content-type of the response matches a Mechanize::File or Mechanize::Page parser, the response body will be loaded into memory before being saved. See #pluggable_parser for details on changing this default.

For alternate ways of downloading files see Mechanize::FileSaver and Mechanize::DirectorySaver.


# File 'lib/mechanize.rb', line 392

def download uri, io_or_filename, parameters = [], referer = nil, headers = {}
  page = transact do
    get uri, parameters, referer, headers
  end

  io = if io_or_filename.respond_to? :write then
         io_or_filename
       else
         ::File.open(io_or_filename, 'wb')
       end

  case page
  when Mechanize::File then
    io.write page.body
  else
    body_io = page.body_io

    until body_io.eof? do
      io.write body_io.read 16384
    end
  end

  page
ensure
  io.close if io and not io_or_filename.respond_to? :write
end
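
The choice between an IO and a file name is made by duck typing on #write, and only a handle opened by the method itself is closed afterwards. A minimal standalone sketch of that behaviour follows. `write_body` is a hypothetical helper and no HTTP request is made.

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical helper mirroring how #download picks its output target.
def write_body(body, io_or_filename)
  io = if io_or_filename.respond_to?(:write)
         io_or_filename
       else
         File.open(io_or_filename, 'wb')
       end
  io.write body
ensure
  # Close only handles opened here; a caller-supplied IO stays open.
  io.close if io && !io_or_filename.respond_to?(:write)
end

buffer = StringIO.new
write_body('payload', buffer)
puts buffer.string

Dir.mktmpdir do |dir|
  path = File.join(dir, 'out.bin')
  write_body('payload', path)
  puts File.binread(path)
end
```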

#follow_meta_refresh ⇒ Object

Follow HTML meta refresh and HTTP Refresh headers. If set to :anywhere meta refresh tags outside of the head element will be followed.


# File 'lib/mechanize.rb', line 767

def follow_meta_refresh
  @agent.follow_meta_refresh
end

#follow_meta_refresh=(follow) ⇒ Object

Controls following of HTML meta refresh and HTTP Refresh headers in responses.


# File 'lib/mechanize.rb', line 775

def follow_meta_refresh= follow
  @agent.follow_meta_refresh = follow
end

#follow_meta_refresh_self ⇒ Object

Follow an HTML meta refresh and HTTP Refresh headers that have no “url=” in the content attribute.

Defaults to false to prevent infinite refresh loops.


# File 'lib/mechanize.rb', line 785

def follow_meta_refresh_self
  @agent.follow_meta_refresh_self
end

#follow_meta_refresh_self=(follow) ⇒ Object

Alters the following of HTML meta refresh and HTTP Refresh headers that point to the same page.


# File 'lib/mechanize.rb', line 793

def follow_meta_refresh_self= follow
  @agent.follow_meta_refresh_self = follow
end

#get(uri, parameters = [], referer = nil, headers = {}) {|page| ... } ⇒ Object

GET the uri with the given request parameters, referer and headers.

The referer may be a URI or a page.

parameters is formatted into a query string using Mechanize::Util.build_query_string, which see.

Yields:

  • (page)


# File 'lib/mechanize.rb', line 442

def get(uri, parameters = [], referer = nil, headers = {})
  method = :get

  referer ||=
    if uri.to_s =~ %r{\Ahttps?://}
      Page.new
    else
      current_page || Page.new
    end

  # FIXME: Huge hack so that using a URI as a referer works.  I need to
  # refactor everything to pass around URIs but still support
  # Mechanize::Page#base
  unless Mechanize::Parser === referer then
    referer = if referer.is_a?(String) then
                Page.new URI(referer)
              else
                Page.new referer
              end
  end

  # fetch the page
  headers ||= {}
  page = @agent.fetch uri, method, headers, parameters, referer
  add_to_history(page)
  yield page if block_given?
  page
end
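
When no referer is passed, the source above picks one based on the shape of the URI: an absolute http or https URI gets a blank page, while anything else is resolved against the current page. The sketch below isolates that decision. `default_referer` is a hypothetical helper, and the symbol :blank stands in for Page.new.

```ruby
# Hypothetical helper; :blank stands in for Page.new.
def default_referer(uri, current_page)
  if uri.to_s =~ %r{\Ahttps?://}
    :blank                    # absolute URI: no referer context needed
  else
    current_page || :blank    # relative URI: resolve against current page
  end
end

p default_referer('http://example/', :current)
p default_referer('/about', :current)
p default_referer('/about', nil)
```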

#get_file(url) ⇒ Object

GET url and return only its contents


# File 'lib/mechanize.rb', line 474

def get_file(url)
  get(url).body
end

#gzip_enabled ⇒ Object

Is gzip compression of responses enabled?


# File 'lib/mechanize.rb', line 800

def gzip_enabled
  @agent.gzip_enabled
end

#gzip_enabled=(enabled) ⇒ Object

Disables HTTP/1.1 gzip compression (enabled by default)


# File 'lib/mechanize.rb', line 807

def gzip_enabled=enabled
  @agent.gzip_enabled = enabled
end

#head(uri, query_params = {}, headers = {}) {|page| ... } ⇒ Object

HEAD uri with query_params and headers:

query_params is formatted into a query string using Mechanize::Util.build_query_string, which see.

head('http://example/', {'q' => 'foo'}, {})

Yields:

  • (page)


# File 'lib/mechanize.rb', line 486

def head(uri, query_params = {}, headers = {})
  page = @agent.fetch uri, :head, headers, query_params

  yield page if block_given?

  page
end

#history ⇒ Object

The history of this mechanize run


# File 'lib/mechanize.rb', line 248

def history
  @agent.history
end

#idle_timeout ⇒ Object

Connections that have not been used in this many seconds will be reset.


# File 'lib/mechanize.rb', line 814

def idle_timeout
  @agent.idle_timeout
end

#idle_timeout=(idle_timeout) ⇒ Object

Sets the idle timeout to idle_timeout. The default timeout is 5 seconds. If you experience “too many connection resets”, reducing this value may help.


# File 'lib/mechanize.rb', line 822

def idle_timeout= idle_timeout
  @agent.idle_timeout = idle_timeout
end

#ignore_bad_chunking ⇒ Object

When set to true mechanize will ignore an EOF during chunked transfer encoding so long as at least one byte was received. Be careful when enabling this as it may cause data loss.

Net::HTTP does not inform mechanize of where in the chunked stream the EOF occurred. Usually it is after the last-chunk but before the terminating CRLF (invalid termination) but it may occur earlier. In the second case your response body may be incomplete.


# File 'lib/mechanize.rb', line 836

def ignore_bad_chunking
  @agent.ignore_bad_chunking
end

#ignore_bad_chunking=(ignore_bad_chunking) ⇒ Object

When set to true mechanize will ignore an EOF during chunked transfer encoding. See ignore_bad_chunking for further details


# File 'lib/mechanize.rb', line 844

def ignore_bad_chunking= ignore_bad_chunking
  @agent.ignore_bad_chunking = ignore_bad_chunking
end

#keep_alive ⇒ Object

Are HTTP/1.1 keep-alive connections enabled?


# File 'lib/mechanize.rb', line 851

def keep_alive
  @agent.keep_alive
end

#keep_alive=(enable) ⇒ Object

Disable HTTP/1.1 keep-alive connections if enable is set to false. If you are experiencing “too many connection resets” errors, setting this to false will eliminate them.

You should first investigate reducing idle_timeout.


# File 'lib/mechanize.rb', line 862

def keep_alive= enable
  @agent.keep_alive = enable
end

#key ⇒ Object

An OpenSSL private key or the path to a private key


# File 'lib/mechanize.rb', line 1157

def key
  @agent.private_key
end

#key=(key) ⇒ Object

Sets the OpenSSL client key to the given path or key instance. If a path is given, the path must contain an RSA key file.


# File 'lib/mechanize.rb', line 1165

def key= key
  @agent.private_key = key
end

#log ⇒ Object

The current logger. If no logger has been set Mechanize.log is used.


# File 'lib/mechanize.rb', line 869

def log
  @log || Mechanize.log
end

#log=(logger) ⇒ Object

Sets the logger used by this instance of mechanize


# File 'lib/mechanize.rb', line 876

def log= logger
  @log = logger
end

#max_file_buffer ⇒ Object

Responses larger than this will be written to a Tempfile instead of stored in memory. The default is 100,000 bytes.

A value of nil disables creation of Tempfiles.


# File 'lib/mechanize.rb', line 886

def max_file_buffer
  @agent.max_file_buffer
end

#max_file_buffer=(bytes) ⇒ Object

Sets the maximum size of a response body that will be stored in memory to bytes. A value of nil causes all response bodies to be stored in memory.

Note that for Mechanize::Download subclasses, the maximum buffer size multiplied by the number of pages stored in history (controlled by #max_history) is an approximate upper limit on the amount of memory Mechanize will use. By default, Mechanize can use up to ~5MB to store response bodies for non-File and non-Page (HTML) responses.

See also the discussion under #max_history=


# File 'lib/mechanize.rb', line 903

def max_file_buffer= bytes
  @agent.max_file_buffer = bytes
end
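
The ~5MB figure follows from the two documented defaults. As a quick check of the arithmetic (the variable names are illustrative only):

```ruby
max_file_buffer = 100_000   # bytes, the documented default
max_history     = 50        # pages, the documented default

# Approximate upper bound on in-memory response bodies kept in history.
upper_bound = max_file_buffer * max_history
puts "#{upper_bound} bytes (~#{upper_bound / 1_000_000.0} MB)"
```

Raising either setting scales the bound linearly, so the two values are worth tuning together.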

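#max_history ⇒ Object

Maximum number of items allowed in the history. The default setting is 50 pages. Note that the size of the history multiplied by the maximum response body size (see #max_file_buffer=) is an approximate upper limit on the memory used for stored response bodies.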


# File 'lib/mechanize.rb', line 257

def max_history
  @agent.history.max_size
end

#max_history=(length) ⇒ Object

Sets the maximum number of items allowed in the history to length.

Setting the maximum history length to nil will make the history size unlimited. Take care when doing this: mechanize stores response bodies in memory for pages and in the temporary files directory for other responses. For a long-running mechanize program this can be quite large.

See also the discussion under #max_file_buffer=


# File 'lib/mechanize.rb', line 271

def max_history= length
  @agent.history.max_size = length
end

#open_timeout ⇒ Object

Length of time to wait until a connection is opened in seconds


# File 'lib/mechanize.rb', line 910

def open_timeout
  @agent.open_timeout
end

#open_timeout=(open_timeout) ⇒ Object

Sets the connection open timeout to open_timeout


# File 'lib/mechanize.rb', line 917

def open_timeout= open_timeout
  @agent.open_timeout = open_timeout
end

#parse(uri, response, body) ⇒ Object

Parses the body of the response from uri using the pluggable parser that matches its content type


# File 'lib/mechanize.rb', line 1241

def parse uri, response, body
  content_type = nil

  unless response['Content-Type'].nil?
    data, = response['Content-Type'].split ';', 2
    content_type, = data.downcase.split ',', 2 unless data.nil?
  end

  parser_klass = @pluggable_parser.parser content_type

  unless parser_klass <= Mechanize::Download then
    body = case body
           when IO, Tempfile, StringIO then
             body.read
           else
             body
           end
  end

  parser_klass.new uri, response, body, response.code do |parser|
    parser.mech = self if parser.respond_to? :mech=

    parser.watch_for_set = @watch_for_set if
      @watch_for_set and parser.respond_to?(:watch_for_set=)
  end
end
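
Before looking up a parser, the method reduces the Content-Type header to a bare, lower-case media type by dropping parameters after ';' and anything after a ','. The sketch below isolates that normalization. `media_type` is a hypothetical helper name.

```ruby
# Hypothetical helper reproducing the header normalization in #parse.
def media_type(content_type_header)
  return nil if content_type_header.nil?

  data, = content_type_header.split(';', 2)   # drop parameters such as charset
  return nil if data.nil?                     # header was an empty string

  type, = data.downcase.split(',', 2)         # keep only the first listed type
  type
end

p media_type('Text/HTML; charset=UTF-8')
p media_type('text/html, text/plain')
p media_type(nil)
```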

#pass ⇒ Object

OpenSSL client key password


# File 'lib/mechanize.rb', line 1172

def pass
  @agent.pass
end

#pass=(pass) ⇒ Object

Sets the client key password to pass


# File 'lib/mechanize.rb', line 1179

def pass= pass
  @agent.pass = pass
end

#post(uri, query = {}, headers = {}) ⇒ Object

POST to the given uri with the given query.

query is processed using Mechanize::Util.each_parameter (which see), and then encoded into an entity body. If any IO/FileUpload object is specified as a field value the “enctype” will be multipart/form-data, or application/x-www-form-urlencoded otherwise.

Examples:

agent.post 'http://example.com/', "foo" => "bar"

agent.post 'http://example.com/', [%w[foo bar]]

agent.post('http://example.com/', "<message>hello</message>",
           'Content-Type' => 'application/xml')

# File 'lib/mechanize.rb', line 511

def post(uri, query = {}, headers = {})
  return request_with_entity(:post, uri, query, headers) if String === query

  node = {}
  # Create a fake form
  class << node
    def search(*args); []; end
  end
  node['method'] = 'POST'
  node['enctype'] = 'application/x-www-form-urlencoded'

  form = Form.new(node)

  Mechanize::Util.each_parameter(query) { |k, v|
    if v.is_a?(IO)
      form.enctype = 'multipart/form-data'
      ul = Form::FileUpload.new({'name' => k.to_s},::File.basename(v.path))
      ul.file_data = v.read
      form.file_uploads << ul
    elsif v.is_a?(Form::FileUpload)
      form.enctype = 'multipart/form-data'
      form.file_uploads << v
    else
      form.fields << Form::Field.new({'name' => k.to_s},v)
    end
  }
  post_form(uri, form, headers)
end
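
The enctype decision can be sketched in isolation. `enctype_for` is a hypothetical helper that covers only the IO check from the source above. The Form::FileUpload branch is omitted so that the example runs without the library. Note that the check is is_a?(IO), so a StringIO value is treated as an ordinary field.

```ruby
require 'stringio'

# Hypothetical helper: multipart only when some value is a real IO.
def enctype_for(params)
  upload = params.any? { |_name, value| value.is_a?(IO) }
  upload ? 'multipart/form-data' : 'application/x-www-form-urlencoded'
end

puts enctype_for('foo' => 'bar')
puts enctype_for([%w[foo bar]])
File.open(File::NULL) { |file| puts enctype_for('upload' => file) }
puts enctype_for('upload' => StringIO.new('data'))
```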

#post_connect_hooks ⇒ Object

A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.


# File 'lib/mechanize.rb', line 312

def post_connect_hooks
  @agent.post_connect_hooks
end

#pre_connect_hooks ⇒ Object

A list of hooks to call before retrieving a response. Hooks are called with the agent and the request to be performed.


# File 'lib/mechanize.rb', line 320

def pre_connect_hooks
  @agent.pre_connect_hooks
end

#pretty_print(q) ⇒ Object

:nodoc:


# File 'lib/mechanize.rb', line 1268

def pretty_print(q) # :nodoc:
  q.object_group(self) {
    q.breakable
    q.pp cookie_jar
    q.breakable
    q.pp current_page
  }
end

#put(uri, entity, headers = {}) ⇒ Object

PUT to uri with entity, and setting headers:

put('http://example/', 'new content', {'Content-Type' => 'text/plain'})

# File 'lib/mechanize.rb', line 545

def put(uri, entity, headers = {})
  request_with_entity(:put, uri, entity, headers)
end

#read_timeout ⇒ Object

Length of time to wait for data from the server


# File 'lib/mechanize.rb', line 924

def read_timeout
  @agent.read_timeout
end

#read_timeout=(read_timeout) ⇒ Object

Sets the timeout for each chunk of data read from the server to read_timeout. A single request may read many chunks of data.


# File 'lib/mechanize.rb', line 932

def read_timeout= read_timeout
  @agent.read_timeout = read_timeout
end

#redirect_ok ⇒ Object Also known as: follow_redirect?

Controls how mechanize deals with redirects. The following values are allowed:

:all, true

All 3xx redirects are followed (default)

:permanent

Only 301 Moved Permanently redirects are followed

false

No redirects are followed


# File 'lib/mechanize.rb', line 944

def redirect_ok
  @agent.redirect_ok
end
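
The allowed values listed above can be read as a small decision table. The helper below is only an illustration of that table, not the library's implementation; the method name redirect_followed? is hypothetical.

```ruby
# Illustrative only: maps a redirect_ok value and an HTTP status code to a
# follow / do-not-follow decision, following the documented values.
def redirect_followed?(policy, status)
  # Only 3xx responses are redirects in the first place.
  return false unless (300..399).cover?(status)

  case policy
  when :all, true then true          # every 3xx redirect is followed
  when :permanent then status == 301 # only 301 Moved Permanently
  else false                         # false: no redirects are followed
  end
end

puts redirect_followed?(:all, 302)
puts redirect_followed?(:permanent, 302)
```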

#redirect_ok=(follow) ⇒ Object Also known as: follow_redirect=

Sets the mechanize redirect handling policy. See redirect_ok for allowed values.


# File 'lib/mechanize.rb', line 954

def redirect_ok= follow
  @agent.redirect_ok = follow
end

#redirection_limit ⇒ Object

Maximum number of redirections to follow


# File 'lib/mechanize.rb', line 963

def redirection_limit
  @agent.redirection_limit
end

#redirection_limit=(limit) ⇒ Object

Sets the maximum number of redirections to follow to limit


# File 'lib/mechanize.rb', line 970

def redirection_limit= limit
  @agent.redirection_limit = limit
end

#request_headers ⇒ Object

A hash of custom request headers that will be sent on every request


# File 'lib/mechanize.rb', line 983

def request_headers
  @agent.request_headers
end

#request_headers=(request_headers) ⇒ Object

Replaces the custom request headers that will be sent on every request with request_headers


# File 'lib/mechanize.rb', line 991

def request_headers= request_headers
  @agent.request_headers = request_headers
end

#request_with_entity(verb, uri, entity, headers = {}) ⇒ Object

Makes an HTTP request to uri using HTTP method verb. entity is used as the request body, if allowed.


# File 'lib/mechanize.rb', line 553

def request_with_entity(verb, uri, entity, headers = {})
  cur_page = current_page || Page.new

  log.debug("query: #{ entity.inspect }") if log

  headers = {
    'Content-Type' => 'application/octet-stream',
    'Content-Length' => entity.size.to_s,
  }.update headers

  page = @agent.fetch uri, verb, headers, [entity], cur_page
  add_to_history(page)
  page
end
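
The header handling in the method above relies on Hash#update, so caller-supplied headers take precedence over the two defaults. A minimal sketch of just that merge, using the entity and header values from the put example as stand-ins:

```ruby
entity = 'new content'

# Defaults built the same way as in request_with_entity.
defaults = {
  'Content-Type'   => 'application/octet-stream',
  'Content-Length' => entity.size.to_s,
}

# Hash#update overwrites the defaults with any caller-supplied values.
headers = defaults.update('Content-Type' => 'text/plain')

puts headers.inspect
```

Since String#size counts characters rather than bytes, a caller sending a multibyte entity may want to pass an explicit Content-Length.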

#reset ⇒ Object

Clears history and cookies.


# File 'lib/mechanize.rb', line 1292

def reset
  @agent.reset
end

#resolve(link) ⇒ Object

Resolve the full path of a link / uri


# File 'lib/mechanize.rb', line 976

def resolve link
  @agent.resolve link
end

#retry_change_requests ⇒ Object

Retry POST and other non-idempotent requests. See RFC 2616 9.1.2.


# File 'lib/mechanize.rb', line 998

def retry_change_requests
  @agent.retry_change_requests
end

#retry_change_requests=(retry_change_requests) ⇒ Object

When setting retry_change_requests to true you are stating that, for all the URLs you access with mechanize, making POST and other non-idempotent requests is safe and will not cause data duplication or other harmful results.

If you are experiencing “too many connection resets” errors you should instead investigate reducing the idle_timeout or disabling keep_alive connections.


# File 'lib/mechanize.rb', line 1012

def retry_change_requests= retry_change_requests
  @agent.retry_change_requests = retry_change_requests
end

#robots ⇒ Object

Will /robots.txt files be obeyed?


# File 'lib/mechanize.rb', line 1019

def robots
  @agent.robots
end

#robots=(enabled) ⇒ Object

When enabled mechanize will retrieve and obey robots.txt files


# File 'lib/mechanize.rb', line 1027

def robots= enabled
  @agent.robots = enabled
end

#scheme_handlers ⇒ Object

The handlers for HTTP and other URI protocols.


# File 'lib/mechanize.rb', line 1034

def scheme_handlers
  @agent.scheme_handlers
end

#scheme_handlers=(scheme_handlers) ⇒ Object

Replaces the URI scheme handler table with scheme_handlers


# File 'lib/mechanize.rb', line 1041

def scheme_handlers= scheme_handlers
  @agent.scheme_handlers = scheme_handlers
end

#set_proxy(address, port, user = nil, password = nil) ⇒ Object

Sets the proxy address at port with an optional user and password


# File 'lib/mechanize.rb', line 1280

def set_proxy address, port, user = nil, password = nil
  @proxy_addr = address
  @proxy_port = port
  @proxy_user = user
  @proxy_pass = password

  @agent.set_proxy address, port, user, password
end

#shutdown ⇒ Object

Shuts down this session by clearing browsing state and closing all persistent connections.


# File 'lib/mechanize.rb', line 1300

def shutdown
  reset
  @agent.shutdown
end

#ssl_version ⇒ Object

SSL version to use.


# File 'lib/mechanize.rb', line 1186

def ssl_version
  @agent.ssl_version
end

#ssl_version=(ssl_version) ⇒ Object

Sets the SSL version to use to ssl_version without client/server negotiation.


# File 'lib/mechanize.rb', line 1194

def ssl_version= ssl_version
  @agent.ssl_version = ssl_version
end

#submit(form, button = nil, headers = {}) ⇒ Object

Submits form with an optional button.

Without a button:

page = agent.get('http://example.com')
agent.submit(page.forms.first)

With a button:

agent.submit(page.forms.first, page.forms.first.buttons.first)

# File 'lib/mechanize.rb', line 580

def submit(form, button = nil, headers = {})
  form.add_button_to_query(button) if button

  case form.method.upcase
  when 'POST'
    post_form(form.action, form, headers)
  when 'GET'
    get(form.action.gsub(/\?[^\?]*$/, ''),
        form.build_query,
        form.page,
        headers)
  else
    raise ArgumentError, "unsupported method: #{form.method.upcase}"
  end
end
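
For GET forms, the method above strips any existing query string from the form action before passing the freshly built query along. The pattern can be tried in isolation; the URLs below are made-up examples.

```ruby
# Same pattern used by submit for GET forms: removes the trailing "?..." part.
strip_query = /\?[^\?]*$/

action = 'http://example.com/search?lang=en'
base = action.gsub(strip_query, '')

# An action without a query string is left unchanged.
plain = 'http://example.com/search'.gsub(strip_query, '')

puts base
puts plain
```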

#transact ⇒ Object

Runs given block, then resets the page history as it was before. self is given as a parameter to the block. Returns the value of the block.


# File 'lib/mechanize.rb', line 600

def transact
  history_backup = @agent.history.dup
  begin
    yield self
  ensure
    @agent.history = history_backup
  end
end
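
The snapshot-and-restore behaviour can be seen without a network connection by using a stand-in object. HistoryKeeper below is a hypothetical class that models only the history; its transact has the same shape as the method above.

```ruby
# Stand-in for an agent: only models the history that transact protects.
class HistoryKeeper
  attr_accessor :history

  def initialize
    @history = []
  end

  # Same shape as transact above: snapshot, yield, always restore.
  def transact
    backup = @history.dup
    begin
      yield self
    ensure
      @history = backup
    end
  end
end

keeper = HistoryKeeper.new
keeper.history << 'http://example/start'

# Inside the block the history grows; afterwards it is rolled back.
result = keeper.transact do |k|
  k.history << 'http://example/detour'
  k.history.size
end

puts result
puts keeper.history.inspect
```

Because the restore happens in an ensure clause, the history is also rolled back when the block raises.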

#user_agent ⇒ Object

The identification string for the client initiating a web request


# File 'lib/mechanize.rb', line 1048

def user_agent
  @agent.user_agent
end

#user_agent=(user_agent) ⇒ Object

Sets the User-Agent used by mechanize to user_agent. See also user_agent_alias


# File 'lib/mechanize.rb', line 1056

def user_agent= user_agent
  @agent.user_agent = user_agent
end

#user_agent_alias=(name) ⇒ Object

Set the user agent for the Mechanize object based on the given name.

See also AGENT_ALIASES


# File 'lib/mechanize.rb', line 1065

def user_agent_alias= name
  self.user_agent = AGENT_ALIASES[name] ||
    raise(ArgumentError, "unknown agent alias #{name.inspect}")
end
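
The lookup above uses the "value || raise" idiom, so an unknown name fails immediately with an ArgumentError. A self-contained sketch of the same idiom follows; SAMPLE_ALIASES and lookup_alias are hypothetical stand-ins, and the real AGENT_ALIASES table is not reproduced here.

```ruby
# Hypothetical alias table standing in for AGENT_ALIASES.
SAMPLE_ALIASES = { 'Example Browser' => 'ExampleBrowser/1.0' }.freeze

# Returns the mapped string, or raises when the name is not in the table.
def lookup_alias(name)
  SAMPLE_ALIASES[name] ||
    raise(ArgumentError, "unknown agent alias #{name.inspect}")
end

found = lookup_alias('Example Browser')

# An unknown alias raises; capture the message for inspection.
message = begin
  lookup_alias('Nope')
rescue ArgumentError => e
  e.message
end

puts found
puts message
```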

#verify_callback ⇒ Object

A callback for additional certificate verification. See OpenSSL::SSL::SSLContext#verify_callback

The callback can be used for debugging or to ignore errors by always returning true. Specifying nil uses the default method that was valid when the SSLContext was created


# File 'lib/mechanize.rb', line 1206

def verify_callback
  @agent.verify_callback
end
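
A debugging callback along the lines described above might look like the following sketch. It follows the OpenSSL convention of receiving the pre-verification result and the store context; the name logging_callback is an assumption, and the lambda is attached to a bare SSLContext here only so the example runs without mechanize.

```ruby
require 'openssl'

# Hypothetical callback: reports failures but keeps OpenSSL's own verdict,
# so verification is never weakened.
logging_callback = lambda do |preverify_ok, store_context|
  warn "verification failed: #{store_context.error_string}" unless preverify_ok
  preverify_ok
end

# Attached to a plain SSLContext for illustration; with mechanize the same
# lambda would be assigned through verify_callback=.
context = OpenSSL::SSL::SSLContext.new
context.verify_callback = logging_callback
```

Returning true unconditionally would ignore certificate errors, which is rarely appropriate outside of debugging.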

#verify_callback=(verify_callback) ⇒ Object

Sets the OpenSSL certificate verification callback


# File 'lib/mechanize.rb', line 1213

def verify_callback= verify_callback
  @agent.verify_callback = verify_callback
end

#verify_mode ⇒ Object

The OpenSSL server certificate verification method. The default is OpenSSL::SSL::VERIFY_PEER and certificate verification uses the default system certificates. See also cert_store


# File 'lib/mechanize.rb', line 1222

def verify_mode
  @agent.verify_mode
end

#verify_mode=(verify_mode) ⇒ Object

Sets the OpenSSL server certificate verification method.


# File 'lib/mechanize.rb', line 1229

def verify_mode= verify_mode
  @agent.verify_mode = verify_mode
end

#visited?(url) ⇒ Boolean Also known as: visited_page

Returns a visited page for the url passed in, otherwise nil

Returns:

  • (Boolean)

# File 'lib/mechanize.rb', line 278

def visited? url
  url = url.href if url.respond_to? :href

  @agent.visited_page url
end