Module: Microget
Overview
A no-nonsense, pedal-to-the-metal unbuffered HTTP streaming client for doing GETs of large bodies, fast.
Constant Summary
- VERSION = '1.0.0'
- HEADER_LIMIT = 1024 * 64
- HEADER_SEPARATOR = "\r\n\r\n"
- STATUS_PAT = /HTTP\/([\d\.]+) (\d+) (.+)$/
  Matches a status line such as "HTTP/1.1 200 OK".
- SOCKET_TIMEOUT = 60 * 5
  Time, in seconds, after which the connection to the server is assumed to have died.
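To illustrate how HEADER_SEPARATOR and STATUS_PAT could be used together, here is a minimal sketch that splits a raw response buffer at the separator and parses the status line and headers. The constants are copied from the list above so that the snippet runs on its own. The method name sketch_parse_head is hypothetical, and this is not necessarily how Microget's own parsing is implemented.

```ruby
# Copied from the constant list above so the snippet is standalone
HEADER_SEPARATOR = "\r\n\r\n"
STATUS_PAT = /HTTP\/([\d\.]+) (\d+) (.+)$/

# Hypothetical helper (not part of Microget): splits a raw response buffer into
# a numeric status code, a Hash of headers and whatever followed the separator.
def sketch_parse_head(buffer)
  head, leftover = buffer.split(HEADER_SEPARATOR, 2)
  status_line, *header_lines = head.to_s.split("\r\n")

  match = STATUS_PAT.match(status_line.to_s)
  raise ArgumentError, "Malformed status line: #{status_line.inspect}" unless match

  headers = header_lines.each_with_object({}) do |line, hash|
    name, value = line.split(":", 2)
    hash[name.strip] = value.to_s.strip
  end

  # match[1] is the HTTP version, match[2] the status code, match[3] the reason phrase
  [match[2].to_i, headers, leftover.to_s]
end

status, headers, leftover = sketch_parse_head(
  "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
)
status                     # => 200
headers["Content-Length"]  # => "5"
leftover                   # => "hello"
```

The status line is split off at "\r\n" before matching because `.` in a Ruby regular expression also matches a carriage return, so matching against a line that still ends in "\r\n" would leave a trailing "\r" in the reason phrase.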
Instance Method Summary
- #get_status_headers_and_body_socket(uri, request_headers: {}) ⇒ Array<Numeric, Hash, Socket>
  Executes a GET request to the given URI with the given headers.
- #perform_get(uri, request_headers: {}, chunk_size: 1024 * 1024 * 5) {|Array<Numeric, Hash, String>| ... } ⇒ Numeric
  Executes a GET request to the given URI.
Instance Method Details
#get_status_headers_and_body_socket(uri, request_headers: {}) ⇒ Array<Numeric, Hash, Socket>
Executes a GET request to the given URI with the given headers.
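Reads the status line and the response headers, and parses them into a numeric status code and a Hash of headers. Once that is done, it returns the status code, the header Hash and the socket, so that the caller can read the body from the socket. The caller is responsible for closing the socket when done. Only plain HTTP is supported: a URI with any other scheme, or without a host, raises an exception.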
    # File 'lib/microget.rb', line 24

    def get_status_headers_and_body_socket(uri, request_headers: {})
      uri = URI(uri.to_s)
      raise('Only plain HTTP is supported (%s)' % uri) unless uri.scheme == 'http'
      raise "Unknown host" unless uri.host

      socket = TCPSocket.open(uri.host, uri.port || 80)
      socket.write("GET #{uri.request_uri} HTTP/1.1\r\n")

      # AWS signs the Host: header, so introducing port 80 into it "just because" is a bad idea
      if uri.port && uri.port.to_i != 80
        socket.write("Host: %s:%d\r\n" % [uri.host, uri.port])
      else
        socket.write("Host: %s\r\n" % uri.host)
      end
      socket.write("Connection: close\r\n") # Do not request keepalive

      # Write all the request headers
      request_headers.each { |k, v| socket.write("%s: %s\r\n" % [k, v]) }

      # Terminate the request
      socket.write("\r\n")

      # First read anything that might be related to the headers, up to and including \r\n\r\n.
      # Once that one is encountered - stash the remaining part we have read, and parse the headers
      headers_buf = read_ahead_headers(socket)
      status_code, header_hash = parse_status_and_headers(headers_buf)
      [status_code, header_hash, socket]
    end
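To make the bytes that go over the wire easier to see, the sketch below assembles the same request as a single string instead of writing it to a socket piece by piece. It follows the order used in the listing (request line, Host, Connection, caller-supplied headers, blank line) and the same rule of leaving port 80 out of the Host header. The method name sketch_request_text is hypothetical and is not part of Microget.

```ruby
require 'uri'

# Hypothetical helper (not part of Microget): returns the request text that
# get_status_headers_and_body_socket writes to the socket.
def sketch_request_text(uri, request_headers: {})
  uri = URI(uri.to_s)
  lines = ["GET #{uri.request_uri} HTTP/1.1"]

  # The port is only included in the Host header when it is not 80
  host = uri.port && uri.port.to_i != 80 ? "#{uri.host}:#{uri.port}" : uri.host
  lines << "Host: #{host}"
  lines << "Connection: close"

  # Caller-supplied headers follow, in insertion order
  request_headers.each { |name, value| lines << "#{name}: #{value}" }

  # Every line ends in CRLF, and an empty line terminates the request
  lines.join("\r\n") + "\r\n\r\n"
end

puts sketch_request_text(
  "http://example.com:8080/big.bin?part=1",
  request_headers: { "Range" => "bytes=0-1023" }
)
```

Note that the header names and values are written as given, so the caller is responsible for passing values that contain no line breaks.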
#perform_get(uri, request_headers: {}, chunk_size: 1024 * 1024 * 5) {|Array<Numeric, Hash, String>| ... } ⇒ Numeric
Executes a GET request to the given URI. Yields the status code, the header hash and a chunk of the body to the given block, once for every chunk read. If the response has an empty body, the block is called once with an empty string as the chunk.
The socket will be read from as long as the block given to the method returns a truthy value. Once the block returns a falsy value (or the HTTP response has been read completely), the method closes the socket and returns the number of body bytes it has read.
    # File 'lib/microget.rb', line 65

    def perform_get(uri, request_headers: {}, chunk_size: 1024 * 1024 * 5)
      status_code, header_hash, socket = get_status_headers_and_body_socket(uri, request_headers: request_headers)
      body_bytes_received = 0

      # ...and then just read the body, without any buffering, using a non-blocking read
      while !socket.eof?
        begin
          data = socket.read_nonblock(chunk_size)
          body_bytes_received += data.bytesize
          # Stop as soon as the block returns a falsy value, still reporting the bytes read so far
          return body_bytes_received unless yield(status_code, header_hash, data)
        rescue IO::WaitReadable
          # IO.select takes (read, write, error, timeout), so the timeout must be the fourth argument
          IO.select([socket], nil, nil, SOCKET_TIMEOUT)
          retry
        end
      end

      # Yield the status and headers once with an empty response
      # so that the client gets at least something.
      if body_bytes_received.zero?
        yield(status_code, header_hash, '')
      end

      body_bytes_received
    ensure
      socket.close if socket && !socket.closed?
    end
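The read loop can be tried without a network connection. The sketch below applies the same pattern (check for EOF, do a non-blocking read, wait with IO.select when nothing is available yet) to a pipe instead of a socket. The method name sketch_read_body, the small chunk size and the 5 second timeout are assumptions made for the demo and are not part of Microget.

```ruby
# Hypothetical helper (not part of Microget): reads an IO in chunks until EOF,
# or until the block returns a falsy value, and returns the number of bytes read.
def sketch_read_body(io, chunk_size: 1024, timeout: 5)
  bytes_received = 0
  until io.eof?
    begin
      data = io.read_nonblock(chunk_size)
      bytes_received += data.bytesize
      # A falsy block result stops the loop early
      return bytes_received unless yield(data)
    rescue IO::WaitReadable
      # Nothing to read yet: wait until the IO is readable or the timeout (in seconds) expires
      IO.select([io], nil, nil, timeout)
      retry
    end
  end
  bytes_received
end

# Demo on a pipe: 3000 bytes are written, then the writing end is closed
reader, writer = IO.pipe
writer.write("x" * 3000)
writer.close

total = sketch_read_body(reader) { |chunk| true } # keep reading until EOF
reader.close
puts total
```

Returning false from the block after the first chunk, for example once enough of a file has been inspected, ends the read early and returns only the bytes consumed up to that point.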