Module: Microget

Extended by:
Microget
Included in:
Microget
Defined in:
lib/microget.rb

Overview

A no-nonsense, pedal-to-the-metal unbuffered HTTP streaming client for doing GETs of large bodies, fast.

Constant Summary

VERSION =
'1.0.0'
HEADER_LIMIT =
1024 * 64
HEADER_SEPARATOR =
"\r\n\r\n"
STATUS_PAT =

Matches a status line such as "HTTP/1.1 200 OK"

/HTTP\/([\d\.]+) (\d+) (.+)$/
SOCKET_TIMEOUT =

Number of seconds after which the connection to the server is assumed to have died

60 * 5
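
To make the roles of HEADER_SEPARATOR and STATUS_PAT more concrete, here is a minimal sketch of how a raw response buffer could be split into a status code, a header Hash and leftover body bytes. The helper name `split_response` and its keyword arguments are hypothetical and not part of Microget; the defaults simply mirror the two constants listed above. The module's own parsing code is not shown on this page and may differ.

```ruby
# Hypothetical helper, not part of Microget. The keyword defaults mirror
# HEADER_SEPARATOR and STATUS_PAT from the constant summary.
def split_response(raw, separator: "\r\n\r\n", status_pat: /HTTP\/([\d\.]+) (\d+) (.+)$/)
  # Everything before the first blank line is the header block,
  # everything after it already belongs to the body.
  head, body = raw.split(separator, 2)
  status_line, *header_lines = head.split("\r\n")

  match = status_pat.match(status_line)
  raise "Malformed status line: #{status_line.inspect}" unless match

  headers = header_lines.each_with_object({}) do |line, hash|
    name, value = line.split(/:\s*/, 2)
    hash[name] = value
  end

  # Capture group 2 of the pattern is the numeric status code.
  [match[2].to_i, headers, body.to_s]
end

raw = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello"
status, headers, body = split_response(raw)
```

A real client would also cap the amount of data scanned for the separator, which is presumably what HEADER_LIMIT is for.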

Instance Method Summary

Instance Method Details

#get_status_headers_and_body_socket(uri, request_headers: {}) ⇒ Array<Numeric, Hash, Socket>

Executes a GET request to the given URI with the given headers.

Reads the status line and the response headers, then parses them into a numeric status code and a Hash of headers. Once that is done, it returns the socket so that the caller can read the body. The caller is responsible for closing the socket when done.

Parameters:

  • uri (String)

    the full URI of the request

  • request_headers (Hash) (defaults to: {})

    all the request headers to send with the request

Returns:

  • (Array<Numeric, Hash, Socket>)

    the HTTP status code, the header hash and the socket the body can be read from



# File 'lib/microget.rb', line 24

def get_status_headers_and_body_socket(uri, request_headers: {})
  uri = URI(uri.to_s)
  raise ('Only plain HTTP is supported (%s)' % uri) unless uri.scheme == 'http'
  raise "Unknown host" unless uri.host
  
  socket = TCPSocket.open(uri.host, uri.port || 80)
  socket.write("GET #{uri.request_uri} HTTP/1.1\r\n")
  
  # AWS signs the Host: header, so introducing port 80 into it "just because" is a bad idea
  if uri.port && uri.port.to_i != 80
    socket.write("Host: %s:%d\r\n" % [uri.host, uri.port])
  else
    socket.write("Host: %s\r\n" % uri.host)
  end
  socket.write("Connection: close\r\n") # Do not request keepalive
  
  # Write all the request headers
  request_headers.each { |k, v| socket.write("%s: %s\r\n" % [k,v]) }
  
  # Terminate the request
  socket.write("\r\n")

  # First read anything that might be related to the headers, up to and including \r\n\r\n.
  # Once that one is encountered - stash the remaining part we have read, and parse the headers
  headers_buf = read_ahead_headers(socket)
  status_code, header_hash = parse_status_and_headers(headers_buf)
  [status_code, header_hash, socket]
end
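
The Host header rule in the listing above can be isolated as a small sketch. The helper name `host_header_line` is hypothetical and not part of Microget; it only reproduces the branch that leaves out the port when it is 80, using the standard `uri` library.

```ruby
require 'uri'

# Hypothetical helper, not part of Microget: builds the Host header line
# the same way the listing does, omitting the port when it is 80.
def host_header_line(uri)
  uri = URI(uri.to_s)
  if uri.port && uri.port.to_i != 80
    "Host: %s:%d\r\n" % [uri.host, uri.port]
  else
    "Host: %s\r\n" % uri.host
  end
end

default_port_line = host_header_line('http://example.com/file.bin')
custom_port_line  = host_header_line('http://example.com:8080/file.bin')
```

Here `default_port_line` is `"Host: example.com\r\n"` and `custom_port_line` is `"Host: example.com:8080\r\n"`. Keeping the default port out of the header matters when a service signs the Host value, as the comment in the listing notes for AWS.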

#perform_get(uri, request_headers: {}, chunk_size: 1024 * 1024 * 5) {|Array<Numeric, Hash, String>| ... } ⇒ Numeric

Executes a GET request to the given URI. Will yield the status, header hash and a chunk of the body to the given block.

The socket will be read from as long as the block given to the method returns a truthy value. Once the block returns a falsy value (or the HTTP response has been read completely), the method stops reading and returns the number of body bytes it has read.

Parameters:

  • uri (String)

    the full URI of the request

  • request_headers (Hash) (defaults to: {})

    all the request headers to send with the request

  • chunk_size (Numeric) (defaults to: 1024 * 1024 * 5)

    the maximum number of bytes to request from the socket in each read_nonblock call

Yields:

  • (Array<Numeric, Hash, String>)

    the status code, the header hash and the chunk of the body data read.

Returns:

  • (Numeric)

    the total number of body bytes read from the socket



# File 'lib/microget.rb', line 65

def perform_get(uri, request_headers: {}, chunk_size: 1024 * 1024 * 5)
  status_code, header_hash, socket = get_status_headers_and_body_socket(uri, request_headers: request_headers)
  body_bytes_received = 0
  # ...and then just read the body, without any buffering, using a non-blocking read
  while !socket.eof?
    begin
      data = socket.read_nonblock(chunk_size)
      body_bytes_received += data.bytesize
      return body_bytes_received unless yield(status_code, header_hash, data)
    rescue IO::WaitReadable
      IO.select([socket], nil, nil, SOCKET_TIMEOUT)
      retry
    end
  end
  
  # Yield the status and headers once with an empty response
  # so that the client gets at least something.
  if body_bytes_received.zero?
    yield(status_code, header_hash, '')
  end
  
  body_bytes_received
ensure
  socket.close if socket && !socket.closed?
end
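
The read loop above can be tried without a network connection. The following is a minimal sketch of the same pattern against an `IO.pipe`; the helper name `read_in_chunks` and its `timeout` parameter are hypothetical and not part of Microget. Unlike the listing, this sketch raises when `IO.select` times out, which is an assumption about how a caller might want to treat a dead connection.

```ruby
# Hypothetical helper, not part of Microget: reads from any IO in chunks
# until EOF, or until the block returns a falsy value.
def read_in_chunks(io, chunk_size: 1024, timeout: 5)
  bytes_read = 0
  until io.eof?
    begin
      data = io.read_nonblock(chunk_size)
      bytes_read += data.bytesize
      # Stop early, but still report how much was read so far.
      return bytes_read unless yield(data)
    rescue IO::WaitReadable
      # Four-argument form: readers, writers, error IOs, timeout in seconds.
      ready = IO.select([io], nil, nil, timeout)
      raise "No data within #{timeout} seconds" unless ready
      retry
    end
  end
  bytes_read
end

reader, writer = IO.pipe
writer.write('a' * 10)
writer.close

chunks = []
total = read_in_chunks(reader, chunk_size: 4) do |chunk|
  chunks << chunk
  true # keep reading
end
reader.close
```

Returning a falsy value from the block ends the loop after the current chunk, which mirrors how a caller of perform_get can abort a download midway.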