Class: WebServerUid
- Inherits: Object (hierarchy: Object, WebServerUid)
- Includes: Comparable
- Defined in: lib/web_server_uid.rb, lib/web_server_uid/version.rb
Overview
A WebServerUid represents a UID token, as issued by web servers like Apache (mod_uid, www.lexa.ru/programs/mod-uid-eng.html) or nginx (http_userid_module, nginx.org/en/docs/http/ngx_http_userid_module.html).
(Note that while this is called a “UID”, it is almost certainly better understood as a “browser ID”, because it is unique to each browser and very unlikely to be managed in the same way as any “current user” concept you have.)
UID tokens can be very useful when tracking visitors to your site, and more so than just setting a unique cookie from your Rails app, for exactly one reason: your front-end web server can issue and set the cookie directly, so you can get the UID logged on the very first request a visitor makes to your site. That request is often a really critical one, since it tells you how they got there in the first place (the HTTP referer) and which page they first viewed (the landing page).
So, generally, you’ll want to do this:
- Turn on mod_uid or http_userid_module.
- Add the UID to the logs – in nginx, you’ll want to log both $uid_got and $uid_set, to handle both the case where you’ve already seen the browser before and the case where you haven’t.
- In your Rails application,
Constant Summary collapse
- BASE64_ALPHABET =
This contains all Base64 characters from all possible variants of Base64, according to en.wikipedia.org/wiki/Base64 – this is so that we accept Base64-encoded UID cookies, no matter what their source.
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/-_\\.:!"
- BASE64_PADDING =
This is, similarly, all characters that can be used as Base64 padding
"=-"
- BASE64_REGEX =
This is a Regexp that matches any valid Base64 data
Regexp.new("^[#{BASE64_ALPHABET}]+[#{BASE64_PADDING}]*$")
- RAW_BINARY_LENGTH =
How long is the raw binary data required to be (in bytes) after we decode it?
16
- DEFAULT_ALLOWED_EXTRA_BINARY_DATA =
By default, how much extra binary data (in bytes) should we allow?
1
- VERSION =
"1.0.2"
Class Method Summary collapse
-
.from_base64(b, options = { }) ⇒ Object
Creates a new instance from a base64 string; see #initialize for more details.
-
.from_binary(b, options = { }) ⇒ Object
Creates a new instance from a binary string; see #initialize for more details.
-
.from_header(s, expected_name) ⇒ Object
Given a string like “st_brid=0100007FE7D7F35241946D1E02030303”, and the expected name of the ID cookie (e.g., st_brid), returns a WebServerUid if one is found, and nil otherwise.
-
.from_hex(h, options = { }) ⇒ Object
Creates a new instance from a hex string; see #initialize for more details.
-
.generate(options = { }) ⇒ Object
Generates a brand-new instance, from scratch.
Instance Method Summary collapse
-
#<=>(other) ⇒ Object
This, plus Comparable, implements all the equality and comparison operators we could ever need.
-
#binary_components ⇒ Object
Returns an Array of length 4; each component will be a single, four-byte Integer, in big-endian byte order, representing the underlying UID.
-
#cookie_version_number ⇒ Object
The version number of the cookie – the LSB of the sequencer_component.
-
#eql?(other) ⇒ Boolean
…well, except for this one.
-
#extra_binary_data ⇒ Object
Returns any extra binary data that was supplied (and successfully ignored) past the end of the input string.
-
#hash ⇒ Object
Let’s make sure we hash ourselves correctly, so we, well, work inside a Hash.
-
#initialize(raw_data, type, options = { }) ⇒ WebServerUid
constructor
Creates a new WebServerUid object.
- #inspect ⇒ Object
-
#issue_time ⇒ Object
This is the “issue time” – the time at which the UID was generated, as a Un*x epoch time – as an integer.
-
#issue_time_as_time ⇒ Object
This is the issue time, as a Time object.
-
#pid ⇒ Object
As explained above, this is just the PID itself from the third component.
-
#pid_component ⇒ Object
This is the “process ID” component – the third four bytes.
-
#sequencer ⇒ Object
The actual sequencer value.
-
#sequencer_as_hex ⇒ Object
The sequencer value, as a six-digit hex string, which is a much easier way of looking at it (since it’s oddly defined to start at 0x030303).
-
#sequencer_component ⇒ Object
This is the “sequencer” component – the last four bytes, which contains both a cookie version number (the LSB) and a sequence number (the three MSBs).
-
#service_number ⇒ Object
This is the “service number” – the first four bytes of the UID.
-
#service_number_as_ip ⇒ Object
Returns the “service number” as an IPAddr object; you can call #to_s on this to get a string in dotted notation.
-
#to_base64_string ⇒ Object
Returns the Base64-encoded variant of the UID – exactly the string that ends up in a cookie in client browsers.
-
#to_binary_string ⇒ Object
Returns a pure-binary string for this UID.
-
#to_hex_string ⇒ Object
Returns the hex-encoded variant of the UID – exactly the string that nginx logs to disk or puts in a header created with $uid_got, etc.
- #to_s ⇒ Object
Constructor Details
#initialize(raw_data, type, options = { }) ⇒ WebServerUid
Creates a new WebServerUid object. raw_data must be a String, in one of the following formats:
- Hex-encoded – the format nginx renders them in logs; e.g., 0100007FE7D7F35241946D1E02030303. This is a hex encoding of four little-endian four-byte integers underneath.
- Base64-encoded – the format of the actual cookie in client browsers; e.g., fwAAAVLz1+cebZRBAwMDAgS=. This is a Base64 encoding of four big-endian four-byte integers.
- Raw binary – the hex-decoded or Base64-decoded version of above; e.g., \x01\x00\x00\x7F\xE7\xD7\xF3RA\x94m\x1E\x02\x03\x03\x03. This is expected to be four big-endian four-byte integers.
…and type must be the corresponding format – one of :binary, :hex, or :base64. (It is not possible to guess the format 100% reliably from the inbound raw_data, since raw binary can happen to look like one of the others.) (type can also be :generated, for exclusive use of .generate, above.)
options can contain:
- :max_allowed_extra_binary_data
  If more data is present in the input string than is necessary for the UID to be parsed, this determines how much extra is allowed before an exception is raised; this defaults to 1, since, if you use nginx’s userid_mark directive, you’ll get exactly that character in the Base64 at the end, and this will translate to extra data.
# File 'lib/web_server_uid.rb', line 161
def initialize(raw_data, type, options = { })
  raise ArgumentError, "Type must be one of :binary, :hex, or :base64, not #{type.inspect}" unless [ :binary, :hex, :base64, :generated ].include?(type)
  @input_type = type

  @binary_components = case type
  when :hex then
    @raw_binary_data = [ raw_data ].pack("H*")
    @raw_binary_data.unpack("VVVV")
  when :base64 then
    @raw_binary_data = Base64.decode64(raw_data)
    @raw_binary_data.unpack("NNNN")
  when :binary, :generated then
    @raw_binary_data = raw_data
    @raw_binary_data.unpack("NNNN")
  else
    raise "wrong type: #{type.inspect}; need to add support for it?"
  end

  @extra_binary_data = @raw_binary_data[RAW_BINARY_LENGTH..-1]
  @raw_binary_data = @raw_binary_data[0..(RAW_BINARY_LENGTH - 1)]

  if @raw_binary_data.length < RAW_BINARY_LENGTH
    raise ArgumentError, "This UID cookie does not appear to be long enough; its raw binary data is of length #{@raw_binary_data.length}, which is less than #{RAW_BINARY_LENGTH.inspect}: #{raw_data.inspect} (became #{@raw_binary_data.inspect})"
  end

  if @extra_binary_data.length > (options[:max_allowed_extra_binary_data] || DEFAULT_ALLOWED_EXTRA_BINARY_DATA)
    raise ArgumentError, "This UID cookie has #{@extra_binary_data.length} bytes of extra binary data at the end: #{@raw_binary_data.inspect} adds #{@extra_binary_data.inspect}"
  end
end
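The byte-order difference between the hex and Base64 formats is easiest to see by decoding both sample strings by hand. The following is a minimal sketch using only core Ruby; the helper names components_from_hex and components_from_base64 are hypothetical and are not part of the gem, and the lenient "m" unpack directive stands in for Base64.decode64.

```ruby
# Hypothetical helpers (not part of WebServerUid) mirroring the pack/unpack
# calls used in #initialize.

# Hex form: decode the hex digits to bytes, then read four little-endian
# 32-bit integers ("V").
def components_from_hex(hex)
  [hex].pack("H*").unpack("VVVV")
end

# Base64 form: decode leniently, read four big-endian 32-bit integers ("N")
# from the first raw_length bytes, and return whatever follows as extra data.
def components_from_base64(base64, raw_length = 16)
  decoded = base64.unpack1("m")
  [decoded[0, raw_length].unpack("NNNN"), decoded[raw_length..-1]]
end

from_hex = components_from_hex("0100007FE7D7F35241946D1E02030303")
from_base64, extra = components_from_base64("fwAAAVLz1+cebZRBAwMDAgS=")

puts from_hex.map { |c| "%08X" % c }.join(" ")
puts from_base64 == from_hex
puts extra.bytesize
```

Both sample strings decode to the same four integers; the Base64 sample additionally carries one trailing byte, which is the kind of extra data that :max_allowed_extra_binary_data is meant to tolerate.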
Class Method Details
.from_base64(b, options = { }) ⇒ Object
Creates a new instance from a base64 string; see #initialize for more details. Nicely returns nil if passed nil.
# File 'lib/web_server_uid.rb', line 51
def from_base64(b, options = { })
  new(b, :base64, options) if b
end
.from_binary(b, options = { }) ⇒ Object
Creates a new instance from a binary string; see #initialize for more details. Nicely returns nil if passed nil.
# File 'lib/web_server_uid.rb', line 46
def from_binary(b, options = { })
  new(b, :binary, options) if b
end
.from_header(s, expected_name) ⇒ Object
Given a string like “st_brid=0100007FE7D7F35241946D1E02030303”, and the expected name of the ID cookie (e.g., st_brid), returns a WebServerUid if one is found, and nil otherwise. Also returns nil if input is nil. This is the exact format you get in a request.env header if you have lines like these in your nginx config:
proxy_set_header X-Nginx-Browser-ID-Got $uid_got;
proxy_set_header X-Nginx-Browser-ID-Set $uid_set;
This is just a simple little method to make your parsing a bit easier.
# File 'lib/web_server_uid.rb', line 63
def from_header(s, expected_name)
  if s && s =~ /#{expected_name}\s*\=\s*([0-9A-F]{32})/i
    from_hex($1)
  end
end
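To illustrate the parsing step on its own, here is a small sketch that extracts just the 32-digit hex string from such a header value. The helper extract_uid_hex is hypothetical (not part of the gem), and the use of Regexp.escape on the cookie name is an addition that the original method does not make.

```ruby
# Hypothetical helper: returns the 32-digit hex UID from a "name=HEX" header
# value, or nil when the value is nil or does not contain a UID.
def extract_uid_hex(header_value, expected_name)
  return nil if header_value.nil?

  # Regexp.escape is an assumption added here so that cookie names containing
  # regex metacharacters are matched literally.
  match = /#{Regexp.escape(expected_name)}\s*=\s*([0-9A-F]{32})/i.match(header_value)
  match && match[1]
end

p extract_uid_hex("st_brid=0100007FE7D7F35241946D1E02030303", "st_brid")
p extract_uid_hex("-", "st_brid")
p extract_uid_hex(nil, "st_brid")
```

In a Rack-based application, a header set with proxy_set_header X-Nginx-Browser-ID-Got would normally be read from request.env under the key HTTP_X_NGINX_BROWSER_ID_GOT.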
.from_hex(h, options = { }) ⇒ Object
Creates a new instance from a hex string; see #initialize for more details. Nicely returns nil if passed nil.
# File 'lib/web_server_uid.rb', line 41
def from_hex(h, options = { })
  new(h, :hex, options) if h
end
.generate(options = { }) ⇒ Object
Generates a brand-new instance, from scratch. This follows exactly the algorithm in nginx-1.5.10:
- The first four bytes are the local IP address (entire if IPv4, four LSB if IPv6);
- The next four bytes are the current time, as a Unix epoch time;
- The next two bytes are a function of the start time of the process, but LSBs in microseconds;
- The next two bytes are the PID of the process;
- The next three bytes are a sequence value, starting at 0x030303;
- The last byte is 2, for version 2.
options can contain:
- :ip_address
  Must be an IPAddr object to use as the IP address of this machine, in lieu of autodetection (see #find_local_ip_address, below).
# File 'lib/web_server_uid.rb', line 82
def generate(options = { })
  # Yes, global variables. What what?
  #
  # Well, in certain cases (like under Rails), this class may get unloaded and reloaded. (Yes, it's in a gem, so
  # theoretically this shouldn't happen, but we want to be really, really careful.) Because we need to be really
  # sure to maintain uniqueness, we use global variables, which, unlike class variables, won't get reset if this
  # class gets loaded or unloaded
  $_web_server_uid_start_value ||= ((Time.now.usec / 20) << 16) | (Process.pid & 0xFFFF)
  $_web_server_uid_sequencer ||= 0x030302
  $_web_server_uid_sequencer += 1
  $_web_server_uid_sequencer &= 0xFFFFFF

  extra = options.keys - [ :ip_address ]
  if extra.length > 0
    raise ArgumentError, "Unknown keys: #{extra.inspect}"
  end

  ip_address = if options[:ip_address]
    IPAddr.new(options[:ip_address])
  else
    find_local_ip_address
  end

  components = [
    ip_address.to_i & 0xFFFFFFFF,
    Time.now.to_i,
    $_web_server_uid_start_value,
    ($_web_server_uid_sequencer << 8) | 0x2
  ]

  binary = components.pack("NNNN")
  new(binary, :generated)
end
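The layout described above can be sketched with fixed inputs, which makes the result reproducible. This is only an illustration of how the four components are assembled and packed; build_uid_binary and its keyword arguments are hypothetical names, and the input values are chosen to reproduce the sample UID used elsewhere on this page.

```ruby
require "ipaddr"

# Hypothetical helper: assembles the four 32-bit components in the order
# described above and packs them big-endian ("N"), as .generate does.
def build_uid_binary(ip:, time:, start_value:, sequencer:, version: 2)
  components = [
    ip.to_i & 0xFFFFFFFF,                             # IP address (low 32 bits)
    time.to_i,                                        # issue time, Unix epoch seconds
    start_value & 0xFFFFFFFF,                         # start-time bits and PID
    ((sequencer & 0xFFFFFF) << 8) | (version & 0xFF)  # sequencer and version byte
  ]
  components.pack("NNNN")
end

binary = build_uid_binary(
  ip: IPAddr.new("127.0.0.1"),
  time: 0x52F3D7E7,
  start_value: 0x1E6D9441,
  sequencer: 0x030303
)

# Re-pack little-endian to get the hex form that nginx logs.
puts binary.unpack("NNNN").pack("VVVV").unpack1("H*").upcase
```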
Instance Method Details
#<=>(other) ⇒ Object
This, plus Comparable, implements all the equality and comparison operators we could ever need.
# File 'lib/web_server_uid.rb', line 192
def <=>(other)
  other_components = other.binary_components
  binary_components.each_with_index do |our_component, index|
    other_component = other_components[index]
    out = our_component <=> other_component
    return out unless out == 0
  end
  0
end
#binary_components ⇒ Object
Returns an Array of length 4; each component will be a single, four-byte Integer, in big-endian byte order, representing the underlying UID.
# File 'lib/web_server_uid.rb', line 246
def binary_components
  @binary_components
end
#cookie_version_number ⇒ Object
The version number of the cookie – the LSB of the sequencer_component.
# File 'lib/web_server_uid.rb', line 308
def cookie_version_number
  @binary_components[3] & 0xFF
end
#eql?(other) ⇒ Boolean
…well, except for this one. ;)
# File 'lib/web_server_uid.rb', line 213
def eql?(other)
  self == other
end
#extra_binary_data ⇒ Object
Returns any extra binary data that was supplied (and successfully ignored) past the end of the input string.
# File 'lib/web_server_uid.rb', line 251
def extra_binary_data
  @extra_binary_data
end
#hash ⇒ Object
Let’s make sure we hash ourselves correctly, so we, well, work inside a Hash. :)
# File 'lib/web_server_uid.rb', line 218
def hash
  binary_components.hash
end
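Why #<=>, #eql? and #hash are all defined becomes clearer with a stripped-down stand-in. In the sketch below, TinyUid is a hypothetical class (not part of the gem) that follows the same pattern; it uses Array#<=> in place of the explicit loop in WebServerUid#<=>, which should compare element by element in the same way.

```ruby
# Hypothetical stand-in class illustrating the <=> / eql? / hash trio.
class TinyUid
  include Comparable
  attr_reader :binary_components

  def initialize(*components)
    @binary_components = components
  end

  # Comparable derives ==, <, <=, >, >= and between? from this method.
  def <=>(other)
    binary_components <=> other.binary_components
  end

  # Hash uses eql? and hash (not ==) to decide whether two keys are the same.
  def eql?(other)
    self == other
  end

  def hash
    binary_components.hash
  end
end

a = TinyUid.new(1, 2, 3, 4)
b = TinyUid.new(1, 2, 3, 4)
c = TinyUid.new(1, 2, 3, 5)

puts a == b                  # equality comes from Comparable
puts a < c                   # ordering comes from Comparable
puts({ a => :seen }[b].inspect) # lookup works because eql? and hash agree
```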
#inspect ⇒ Object
# File 'lib/web_server_uid.rb', line 206
def inspect
  to_s
end
#issue_time ⇒ Object
This is the “issue time” – the time at which the UID was generated, as a Un*x epoch time – as an integer.
# File 'lib/web_server_uid.rb', line 267
def issue_time
  @binary_components[1]
end
#issue_time_as_time ⇒ Object
This is the issue time, as a Time object.
# File 'lib/web_server_uid.rb', line 272
def issue_time_as_time
  Time.at(issue_time)
end
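As a worked example, the second component of the sample UID 0100007FE7D7F35241946D1E02030303 is 0x52F3D7E7. The sketch below converts it the same way #issue_time_as_time does; the helper name issue_time_to_utc is hypothetical, and the conversion to UTC is added here only so the printed value does not depend on the local time zone.

```ruby
# Hypothetical helper: turns the epoch-seconds component into a UTC Time.
def issue_time_to_utc(issue_time)
  Time.at(issue_time).utc
end

issued = issue_time_to_utc(0x52F3D7E7)
puts issued.strftime("%Y-%m-%d")
```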
#pid ⇒ Object
As explained above, this is just the PID itself from the third component.
# File 'lib/web_server_uid.rb', line 286
def pid
  pid_component & 0xFFFF
end
#pid_component ⇒ Object
This is the “process ID” component – the third four bytes. While this is documented as simply being the process ID of the server process, realistically, servers add more entropy to avoid collisions (and because PIDs are often only two bytes long). Nginx sets the top two bytes to the two least-significant bytes of the current time in microseconds, for example. So we have #pid_component, here, that returns the whole thing, and #pid that returns just the actual PID.
# File 'lib/web_server_uid.rb', line 281
def pid_component
  @binary_components[2]
end
#sequencer ⇒ Object
The actual sequencer value.
# File 'lib/web_server_uid.rb', line 297
def sequencer
  sequencer_component >> 8
end
#sequencer_as_hex ⇒ Object
The sequencer value, as a six-digit hex string, which is a much easier way of looking at it (since it’s oddly defined to start at 0x030303).
# File 'lib/web_server_uid.rb', line 303
def sequencer_as_hex
  "%06x" % sequencer
end
#sequencer_component ⇒ Object
This is the “sequencer” component – the last four bytes, which contains both a cookie version number (the LSB) and a sequence number (the three MSBs).
# File 'lib/web_server_uid.rb', line 292
def sequencer_component
  @binary_components[3]
end
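The masks and shifts behind #pid, #sequencer, #sequencer_as_hex and #cookie_version_number can be tried on the sample UID's third and fourth components (0x1E6D9441 and 0x03030302). This is a sketch only; split_pid_component and split_sequencer_component are hypothetical helpers, not methods of the gem.

```ruby
# Hypothetical helper: splits the third component into its upper 16 bits
# (start-time entropy) and its lower 16 bits (the PID).
def split_pid_component(pid_component)
  { start_bits: pid_component >> 16, pid: pid_component & 0xFFFF }
end

# Hypothetical helper: splits the fourth component into the three-byte
# sequencer (upper 24 bits) and the cookie version (lowest byte).
def split_sequencer_component(sequencer_component)
  sequencer = sequencer_component >> 8
  {
    sequencer: sequencer,
    sequencer_hex: "%06x" % sequencer,
    cookie_version: sequencer_component & 0xFF
  }
end

p split_pid_component(0x1E6D9441)
p split_sequencer_component(0x03030302)
```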
#service_number ⇒ Object
This is the “service number” – the first four bytes of the UID. Typically, this is the IP address of the server that generated the UID.
# File 'lib/web_server_uid.rb', line 257
def service_number
  @binary_components[0]
end
#service_number_as_ip ⇒ Object
Returns the “service number” as an IPAddr object; you can call #to_s on this to get a string in dotted notation.
# File 'lib/web_server_uid.rb', line 262
def service_number_as_ip
  IPAddr.new(service_number, Socket::AF_INET)
end
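For the sample UID, the service number is 0x7F000001. A minimal sketch of the same conversion, using only the standard library (service_number_to_ip is a hypothetical helper name):

```ruby
require "ipaddr"
require "socket"

# Hypothetical helper: interprets a 32-bit integer as an IPv4 address.
def service_number_to_ip(service_number)
  IPAddr.new(service_number, Socket::AF_INET)
end

puts service_number_to_ip(0x7F000001).to_s
```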
#to_base64_string ⇒ Object
Returns the Base64-encoded variant of the UID – exactly the string that ends up in a cookie in client browsers.
This will be identical for two equivalent UIDs, no matter what representations they were parsed from.
# File 'lib/web_server_uid.rb', line 233
def to_base64_string
  Base64.encode64(@binary_components.pack("NNNN"))
end
#to_binary_string ⇒ Object
Returns a pure-binary string for this UID.
This will be identical for two equivalent UIDs, no matter what representations they were parsed from.
# File 'lib/web_server_uid.rb', line 240
def to_binary_string
  @binary_components.pack("NNNN")
end
#to_hex_string ⇒ Object
Returns the hex-encoded variant of the UID – exactly the string that nginx logs to disk or puts in a header created with $uid_got, etc.
This will be identical for two equivalent UIDs, no matter what representations they were parsed from.
# File 'lib/web_server_uid.rb', line 226
def to_hex_string
  @binary_components.pack("VVVV").bytes.map { |b| "%02X" % b }.join("")
end
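The three output forms differ only in byte order and encoding, which the following sketch shows for the sample UID's components. The helper names uid_hex, uid_base64 and uid_binary are hypothetical; note that this sketch uses the "m0" pack directive, which omits the trailing newline that Base64.encode64 appends.

```ruby
# Hypothetical helpers mirroring the three to_*_string methods.

# Hex form: little-endian bytes, rendered as uppercase hex digits.
def uid_hex(components)
  components.pack("VVVV").unpack1("H*").upcase
end

# Binary form: big-endian bytes.
def uid_binary(components)
  components.pack("NNNN")
end

# Base64 form: big-endian bytes, Base64-encoded without a trailing newline.
def uid_base64(components)
  [uid_binary(components)].pack("m0")
end

components = [0x7F000001, 0x52F3D7E7, 0x1E6D9441, 0x03030302]
puts uid_hex(components)
puts uid_base64(components)
puts uid_binary(components).bytesize
```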
#to_s ⇒ Object
# File 'lib/web_server_uid.rb', line 202
def to_s
  "<#{self.class.name} from #{@input_type}: #{to_hex_string}>"
end