Class: WebServerUid

Inherits:
Object
Includes:
Comparable
Defined in:
lib/web_server_uid.rb,
lib/web_server_uid/version.rb

Overview

A WebServerUid represents a UID token, as issued by web servers like Apache (mod_uid, www.lexa.ru/programs/mod-uid-eng.html) or nginx (http_userid_module, nginx.org/en/docs/http/ngx_http_userid_module.html).

(Note that while this is called a “UID”, it is almost certainly better understood as a “browser ID”, because it is unique to each browser and very unlikely to be managed in the same way as any “current user” concept you have.)

UID tokens can be very useful when tracking visitors to your site, and more so than just setting a unique cookie from your Rails app, for exactly one reason: since your front-end web server can issue and set the cookie directly, it means that you can get the UID logged on the very first request visitors make to your site – which is often a really critical one, since it tells you how they got there in the first place (the HTTP referer) and which page they first viewed (the landing page).

So, generally, you’ll want to do this:

  • Turn on mod_uid or http_userid_module.

  • Add the UID to the logs – in nginx, you’ll want to log both $uid_got and $uid_set, to handle both the case where you’ve already seen the browser before and the case where you haven’t.

  • In your Rails application,

Constant Summary

BASE64_ALPHABET =

This contains all Base64 characters from all possible variants of Base64, according to en.wikipedia.org/wiki/Base64 – this is so that we accept Base64-encoded UID cookies, no matter what their source.

"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/-_\\.:!"
BASE64_PADDING =

This is, similarly, all characters that can be used as Base64 padding

"=-"
BASE64_REGEX =

This is a Regexp that matches any valid Base64 data

Regexp.new("^[#{BASE64_ALPHABET}]+[#{BASE64_PADDING}]*$")
RAW_BINARY_LENGTH =

How long is the raw binary data required to be (in bytes) after we decode it?

16
DEFAULT_ALLOWED_EXTRA_BINARY_DATA =

By default, how much extra binary data (in bytes) should we allow?

1
VERSION =
"1.0.2"
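
As a rough illustration of how these constants fit together, the sketch below rebuilds the regex and applies the length rules to a candidate cookie value. The helper name plausible_uid_cookie? is hypothetical and not part of this class; it only mirrors the checks described above, and it uses String#unpack1("m") for lenient Base64 decoding rather than whatever the class does internally.

```ruby
# Constants copied from the summary above so the sketch is self-contained.
BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/-_\\.:!"
BASE64_PADDING = "=-"
BASE64_REGEX = Regexp.new("^[#{BASE64_ALPHABET}]+[#{BASE64_PADDING}]*$")
RAW_BINARY_LENGTH = 16
DEFAULT_ALLOWED_EXTRA_BINARY_DATA = 1

# Hypothetical helper: does this string look like a Base64 UID cookie?
def plausible_uid_cookie?(value, max_extra = DEFAULT_ALLOWED_EXTRA_BINARY_DATA)
  return false unless value.is_a?(String) && value =~ BASE64_REGEX

  # "m" is the lenient Base64 directive for unpack.
  decoded_length = value.unpack1("m").bytesize
  decoded_length >= RAW_BINARY_LENGTH &&
    decoded_length <= RAW_BINARY_LENGTH + max_extra
end

puts plausible_uid_cookie?("fwAAAVLz1+cebZRBAwMDAgS=") # 17 decoded bytes
puts plausible_uid_cookie?("fwAAAQ==")                 # only 4 decoded bytes
puts plausible_uid_cookie?("not a cookie")             # fails the regex
```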

Constructor Details

#initialize(raw_data, type, options = { }) ⇒ WebServerUid

Creates a new WebServerUid object. raw_data must be a String, in one of the following formats:

  • Hex-encoded – the format nginx renders them in logs; e.g., 0100007FE7D7F35241946D1E02030303. This is a hex encoding of four little-endian four-byte integers underneath.

  • Base64-encoded – the format of the actual cookie in client browsers; e.g., fwAAAVLz1+cebZRBAwMDAgS=. This is a Base64 encoding of four big-endian four-byte integers.

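  • Raw binary – the Base64-decoded bytes of the above; e.g., \x7F\x00\x00\x01R\xF3\xD7\xE7\x1Em\x94A\x03\x03\x03\x02. This is expected to be four big-endian four-byte integers. (Hex-decoding the nginx log format instead yields the same four integers in little-endian byte order, which :binary will not interpret correctly.)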

…and type must be the corresponding format – one of :binary, :hex, or :base64. (It is not possible to guess the format 100% reliably from the inbound raw_data, since raw binary can happen to look like one of the others.) (type can also be :generated, for exclusive use of .generate.)

options can contain:

:max_allowed_extra_binary_data

If more data is present in the input string than is necessary for the UID to be parsed, this determines how much extra is allowed before an exception is raised; this defaults to 1, since, if you use nginx’s userid_mark directive, you’ll get exactly that character in the Base64 at the end, and this will translate to extra data.

Raises:

  • (ArgumentError)


# File 'lib/web_server_uid.rb', line 161

def initialize(raw_data, type, options = { })
  raise ArgumentError, "Type must be one of :binary, :hex, or :base64, not #{type.inspect}" unless [ :binary, :hex, :base64, :generated ].include?(type)
  @input_type = type

  @binary_components = case type
  when :hex then
    @raw_binary_data = [ raw_data ].pack("H*")
    @raw_binary_data.unpack("VVVV")
  when :base64 then
    @raw_binary_data = Base64.decode64(raw_data)
    @raw_binary_data.unpack("NNNN")
  when :binary, :generated then
    @raw_binary_data = raw_data
    @raw_binary_data.unpack("NNNN")
  else
    raise "wrong type: #{type.inspect}; need to add support for it?"
  end

  @extra_binary_data = @raw_binary_data[RAW_BINARY_LENGTH..-1]
  @raw_binary_data = @raw_binary_data[0..(RAW_BINARY_LENGTH - 1)]

  if @raw_binary_data.length < RAW_BINARY_LENGTH
    raise ArgumentError, "This UID cookie does not appear to be long enough; its raw binary data is of length #{@raw_binary_data.length}, which is less than #{RAW_BINARY_LENGTH.inspect}: #{raw_data.inspect} (became #{@raw_binary_data.inspect})"
  end

  if @extra_binary_data.length > (options[:max_allowed_extra_binary_data] || DEFAULT_ALLOWED_EXTRA_BINARY_DATA)
    raise ArgumentError, "This UID cookie has #{@extra_binary_data.length} bytes of extra binary data at the end: #{@raw_binary_data.inspect} adds #{@extra_binary_data.inspect}"
  end
end
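
To make the byte-order difference between the hex and Base64 formats concrete, here is a minimal sketch that decodes the two example strings from the constructor description using only String#unpack and Array#pack. It does not call WebServerUid itself; the variable names are illustrative only.

```ruby
hex    = "0100007FE7D7F35241946D1E02030303" # as nginx logs it
base64 = "fwAAAVLz1+cebZRBAwMDAgS="         # as the browser cookie carries it

# Hex: decode to bytes, then read four little-endian 32-bit integers.
from_hex = [hex].pack("H*").unpack("VVVV")

# Base64: decode leniently, then read four big-endian 32-bit integers.
decoded     = base64.unpack1("m")
from_base64 = decoded.unpack("NNNN")

# Anything past the first 16 bytes is "extra" data (for example, the
# byte contributed by nginx's userid_mark character).
extra = decoded[16..-1]

puts from_hex.map { |c| "%08X" % c }.join(" ")
puts from_base64.map { |c| "%08X" % c }.join(" ")
puts "extra bytes: #{extra.bytesize}"
```

Both lines of components should be identical, which is why the two input formats need different unpack directives.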

Class Method Details

.from_base64(b, options = { }) ⇒ Object

Creates a new instance from a base64 string; see #initialize for more details. Nicely returns nil if passed nil.



# File 'lib/web_server_uid.rb', line 51

def from_base64(b, options = { })
  new(b, :base64, options) if b
end

.from_binary(b, options = { }) ⇒ Object

Creates a new instance from a binary string; see #initialize for more details. Nicely returns nil if passed nil.



# File 'lib/web_server_uid.rb', line 46

def from_binary(b, options = { })
  new(b, :binary, options) if b
end

.from_header(s, expected_name) ⇒ Object

Given a string like “st_brid=0100007FE7D7F35241946D1E02030303”, and the expected name of the ID cookie (e.g., st_brid), returns a WebServerUid if one is found, and nil otherwise. Also returns nil if input is nil. This is the exact format you get in a request.env header if you have lines like these in your nginx config:

proxy_set_header X-Nginx-Browser-ID-Got $uid_got;
proxy_set_header X-Nginx-Browser-ID-Set $uid_set;

This is just a simple little method to make your parsing a bit easier.



# File 'lib/web_server_uid.rb', line 63

def from_header(s, expected_name)
  if s && s =~ /#{expected_name}\s*\=\s*([0-9A-F]{32})/i
    from_hex($1)
  end
end
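
A sketch of how the two proxied headers might be consulted from a Rack-style environment follows. The helper extract_uid_hex is hypothetical (it reimplements the regex match shown above so the example runs without the gem), and the env hash is a stand-in for request.env; Rack exposes a header such as X-Nginx-Browser-ID-Got under the key HTTP_X_NGINX_BROWSER_ID_GOT.

```ruby
# Hypothetical stand-in for the matching step of .from_header: returns the
# 32-character hex UID, or nil if the header is missing or malformed.
def extract_uid_hex(header_value, expected_name)
  return nil unless header_value

  match = header_value.match(/#{Regexp.escape(expected_name)}\s*=\s*([0-9A-F]{32})/i)
  match && match[1]
end

# Simulated first visit: nginx has just set the cookie, so only the
# "Set" header carries a value.
env = {
  "HTTP_X_NGINX_BROWSER_ID_GOT" => "",
  "HTTP_X_NGINX_BROWSER_ID_SET" => "st_brid=0100007FE7D7F35241946D1E02030303"
}

# Prefer the UID the browser sent; fall back to the one just issued.
uid_hex = extract_uid_hex(env["HTTP_X_NGINX_BROWSER_ID_GOT"], "st_brid") ||
          extract_uid_hex(env["HTTP_X_NGINX_BROWSER_ID_SET"], "st_brid")

puts uid_hex
```

With the gem loaded, the same fallback would presumably be written with WebServerUid.from_header in place of the helper.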

.from_hex(h, options = { }) ⇒ Object

Creates a new instance from a hex string; see #initialize for more details. Nicely returns nil if passed nil.



# File 'lib/web_server_uid.rb', line 41

def from_hex(h, options = { })
  new(h, :hex, options) if h
end

.generate(options = { }) ⇒ Object

Generates a brand-new instance, from scratch. This follows exactly the algorithm in nginx-1.5.10:

  • The first four bytes are the local IP address (entire if IPv4, four LSB if IPv6);

  • The next four bytes are the current time, as a Unix epoch time;

  • The next two bytes are derived from the microsecond part of the process start time;

  • The next two bytes are the PID of the process;

  • The next three bytes are a sequence value, starting at 0x030303;

  • The last byte is 2, for version 2.

options can contain:

:ip_address

Must be an IPAddr object to use as the IP address of this machine, in lieu of autodetection (see #find_local_ip_address, below).



# File 'lib/web_server_uid.rb', line 82

def generate(options = { })
  # Yes, global variables. What what?
  #
  # Well, in certain cases (like under Rails), this class may get unloaded and reloaded. (Yes, it's in a gem, so
  # theoretically this shouldn't happen, but we want to be really, really careful.) Because we need to be really
  # sure to maintain uniqueness, we use global variables, which, unlike class variables, won't get reset if this
  # class gets loaded or unloaded
  $_web_server_uid_start_value ||= ((Time.now.usec / 20) << 16) | (Process.pid & 0xFFFF)
  $_web_server_uid_sequencer ||= 0x030302
  $_web_server_uid_sequencer += 1
  $_web_server_uid_sequencer &= 0xFFFFFF

  extra = options.keys - [ :ip_address ]
  if extra.length > 0
    raise ArgumentError, "Unknown keys: #{extra.inspect}"
  end

  ip_address = if options[:ip_address]
    IPAddr.new(options[:ip_address])
  else
    find_local_ip_address
  end

  components = [
    ip_address.to_i & 0xFFFFFFFF,
    Time.now.to_i,
    $_web_server_uid_start_value,
    ($_web_server_uid_sequencer << 8) | 0x2
  ]

  binary = components.pack("NNNN")
  new(binary, :generated)
end
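
The layout described above can be sketched with fixed inputs so that the result is reproducible. The function build_uid_binary and its keyword arguments are hypothetical; it only mirrors the packing step of .generate and leaves out the global sequencer and IP autodetection.

```ruby
require "ipaddr"

# Hypothetical helper: assemble the four 32-bit components and pack them
# big-endian, following the layout listed above.
def build_uid_binary(ip:, issued_at:, start_value:, sequencer:)
  components = [
    ip.to_i & 0xFFFFFFFF,                 # service number (IP address)
    issued_at.to_i,                       # issue time, Unix epoch seconds
    start_value & 0xFFFFFFFF,             # start-time bits << 16 | PID
    ((sequencer & 0xFFFFFF) << 8) | 0x02  # 3-byte sequence, version 2
  ]
  components.pack("NNNN")
end

binary = build_uid_binary(
  ip: IPAddr.new("127.0.0.1"),
  issued_at: Time.at(0x52F3D7E7),
  start_value: (0x1E6D << 16) | 0x9441,
  sequencer: 0x030303
)

# Render the way nginx logs it: same integers, little-endian, upper-case hex.
hex = binary.unpack("NNNN").pack("VVVV").unpack1("H*").upcase
puts hex
```

These inputs were chosen to reproduce the hex example used in the constructor description.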

Instance Method Details

#<=>(other) ⇒ Object

This, plus Comparable, implements all the equality and comparison operators we could ever need.



# File 'lib/web_server_uid.rb', line 192

def <=>(other)
  other_components = other.binary_components
  binary_components.each_with_index do |our_component, index|
    other_component = other_components[index]
    out = our_component <=> other_component
    return out unless out == 0
  end
  0
end

#binary_components ⇒ Object

Returns an Array of length 4; each component will be a single, four-byte Integer, in big-endian byte order, representing the underlying UID.

# File 'lib/web_server_uid.rb', line 246

def binary_components
  @binary_components
end

#cookie_version_number ⇒ Object

The version number of the cookie – the LSB of the sequencer_component.

# File 'lib/web_server_uid.rb', line 308

def cookie_version_number
  @binary_components[3] & 0xFF
end

#eql?(other) ⇒ Boolean

Comparable, together with #<=>, supplies every equality and comparison operator except this one, so it is defined explicitly in terms of ==.

Returns:

  • (Boolean)

# File 'lib/web_server_uid.rb', line 213

def eql?(other)
  self == other
end

#extra_binary_data ⇒ Object

Returns any extra binary data that was supplied past the end of the UID itself in the input string (and was successfully ignored).

# File 'lib/web_server_uid.rb', line 251

def extra_binary_data
  @extra_binary_data
end

#hash ⇒ Object

Let’s make sure we hash ourselves correctly, so we, well, work inside a Hash. :)

# File 'lib/web_server_uid.rb', line 218

def hash
  binary_components.hash
end
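
The interplay of #<=>, #eql?, and #hash can be seen in a small stand-in class. MiniUid below is hypothetical and far simpler than WebServerUid, but it defines the same three methods in the same way, which is what lets equal values collapse to a single Hash key.

```ruby
# Hypothetical stand-in that compares and hashes by its component array.
class MiniUid
  include Comparable

  attr_reader :components

  def initialize(components)
    @components = components
  end

  # Array#<=> compares element by element, like the loop in #<=> above.
  def <=>(other)
    components <=> other.components
  end

  # Comparable does not define eql?, so delegate it to ==.
  def eql?(other)
    self == other
  end

  # Equal component arrays must produce equal hash codes.
  def hash
    components.hash
  end
end

a = MiniUid.new([1, 2, 3, 4])
b = MiniUid.new([1, 2, 3, 4])
c = MiniUid.new([1, 2, 3, 5])

visits = Hash.new(0)
[a, b, c].each { |uid| visits[uid] += 1 }

puts visits.size # a and b share one key
puts a < c
```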

#inspect ⇒ Object

# File 'lib/web_server_uid.rb', line 206

def inspect
  to_s
end

#issue_time ⇒ Object

This is the “issue time” – the time at which the UID was generated, as a Un*x epoch time – as an integer.

# File 'lib/web_server_uid.rb', line 267

def issue_time
  @binary_components[1]
end

#issue_time_as_time ⇒ Object

This is the issue time, as a Time object.

# File 'lib/web_server_uid.rb', line 272

def issue_time_as_time
  Time.at(issue_time)
end

#pid ⇒ Object

This is just the PID itself, taken from the low two bytes of the third component; see #pid_component for the full value.

# File 'lib/web_server_uid.rb', line 286

def pid
  pid_component & 0xFFFF
end

#pid_component ⇒ Object

This is the “process ID” component – the third four bytes. While this is documented as simply being the process ID of the server process, realistically, servers add more entropy to avoid collisions (and because PIDs are often only two bytes long). Nginx sets the top two bytes to the two least-significant bytes of the current time in microseconds, for example. So we have #pid_component, here, that returns the whole thing, and #pid that returns just the actual PID.

# File 'lib/web_server_uid.rb', line 281

def pid_component
  @binary_components[2]
end

#sequencer ⇒ Object

The actual sequencer value.

# File 'lib/web_server_uid.rb', line 297

def sequencer
  sequencer_component >> 8
end

#sequencer_as_hex ⇒ Object

The sequencer value, as a six-character (three-byte) hex string, which is a much easier way of looking at it (since it’s oddly defined to start at 0x030303).

# File 'lib/web_server_uid.rb', line 303

def sequencer_as_hex
  "%06x" % sequencer
end

#sequencer_component ⇒ Object

This is the “sequencer” component – the last four bytes, which contains both a cookie version number (the LSB) and a sequence number (the three MSBs).

# File 'lib/web_server_uid.rb', line 292

def sequencer_component
  @binary_components[3]
end

#service_number ⇒ Object

This is the “service number” – the first four bytes of the UID, as an Integer. Typically, this is the IP address of the server that generated the UID.

# File 'lib/web_server_uid.rb', line 257

def service_number
  @binary_components[0]
end

#service_number_as_ip ⇒ Object

Returns the “service number” as an IPAddr object; you can call #to_s on this to get a string in dotted notation.

# File 'lib/web_server_uid.rb', line 262

def service_number_as_ip
  IPAddr.new(service_number, Socket::AF_INET)
end
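
The accessors above all slice the same four integers. The following sketch applies the same bit operations to a literal components array (the values correspond to the hex example from the constructor description), without going through WebServerUid; the local variable names are illustrative.

```ruby
require "ipaddr"
require "socket"

# Four big-endian 32-bit components, as #binary_components would return.
components = [0x7F000001, 0x52F3D7E7, 0x1E6D9441, 0x03030302]

service_ip     = IPAddr.new(components[0], Socket::AF_INET).to_s
issue_time     = Time.at(components[1]).utc
pid            = components[2] & 0xFFFF   # low two bytes of the third component
sequencer      = components[3] >> 8       # three most-significant bytes
sequencer_hex  = "%06x" % sequencer
cookie_version = components[3] & 0xFF     # least-significant byte

puts service_ip
puts issue_time
puts pid
puts sequencer_hex
puts cookie_version
```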

#to_base64_string ⇒ Object

Returns the Base64-encoded variant of the UID – the string that ends up in a cookie in client browsers. (Because this uses Base64.encode64, the result carries a trailing newline; strip it before comparing against a raw cookie value.)

This will be identical for two equivalent UIDs, no matter what representations they were parsed from.

# File 'lib/web_server_uid.rb', line 233

def to_base64_string
  Base64.encode64(@binary_components.pack("NNNN"))
end

#to_binary_string ⇒ Object

Returns a pure-binary string for this UID.

This will be identical for two equivalent UIDs, no matter what representations they were parsed from.

# File 'lib/web_server_uid.rb', line 240

def to_binary_string
  @binary_components.pack("NNNN")
end

#to_hex_string ⇒ Object

Returns the hex-encoded variant of the UID – exactly the string that nginx logs to disk or puts in a header created with $uid_got, etc.

This will be identical for two equivalent UIDs, no matter what representations they were parsed from.

# File 'lib/web_server_uid.rb', line 226

def to_hex_string
  @binary_components.pack("VVVV").bytes.map { |b| "%02X" % b }.join("")
end
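
For comparison, the three output encodings can be reproduced from a components array with Array#pack alone. This is a sketch, not the class's own code path: the "m" directive is what Base64.encode64 uses and appends a newline, while "m0" is the strict variant without one.

```ruby
components = [0x7F000001, 0x52F3D7E7, 0x1E6D9441, 0x03030302]

# Binary form: four big-endian 32-bit integers.
binary = components.pack("NNNN")

# Base64 form, with and without the trailing newline.
base64_with_newline = [binary].pack("m")
base64_strict       = [binary].pack("m0")

# Hex form: the same integers packed little-endian, upper-case hex.
hex = components.pack("VVVV").unpack1("H*").upcase

puts base64_with_newline.inspect
puts base64_strict
puts hex
```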

#to_s ⇒ Object

# File 'lib/web_server_uid.rb', line 202

def to_s
  "<#{self.class.name} from #{@input_type}: #{to_hex_string}>"
end