HasUnpublishedPassword

What is it?

This is a Ruby gem which performs offline checks against the Have I Been Pwned (HIBP) master list of leaked passwords.

It can be used to ensure that your users are not using credentials which have previously been leaked.

The checks are performed using a pre-built cuckoo filter.
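For readers unfamiliar with cuckoo filters, here is a toy version in plain Ruby. It is only a sketch of the general technique: the gem's real filter is a native Rust component, and the class name `ToyCuckooFilter`, the bucket size, the 16-bit fingerprints and the use of SHA1 for bucket selection are all assumptions made for illustration.

```ruby
require 'digest'

# Toy cuckoo filter for illustration only; the gem itself uses a native Rust
# implementation whose parameters and hashing may differ.
class ToyCuckooFilter
  BUCKET_SIZE = 4   # fingerprints stored per bucket
  MAX_KICKS   = 500 # evictions to try before declaring the filter full

  def initialize(bucket_count)
    # A power-of-two bucket count keeps alt_index its own inverse.
    unless bucket_count.positive? && (bucket_count & (bucket_count - 1)).zero?
      raise ArgumentError, 'bucket_count must be a power of two'
    end

    @mask = bucket_count - 1
    @buckets = Array.new(bucket_count) { [] }
  end

  # Returns true on success, false if no free slot was found. On failure the
  # last evicted fingerprint is dropped, so size the filter generously.
  def add(item)
    fp, i1 = fingerprint_and_index(item)
    i2 = alt_index(i1, fp)
    return true if store(i1, fp) || store(i2, fp)

    # Both candidate buckets are full: evict a resident fingerprint and move
    # it to its own alternate bucket, repeating until something fits.
    i = [i1, i2].sample
    MAX_KICKS.times do
      slot = rand(BUCKET_SIZE)
      fp, @buckets[i][slot] = @buckets[i][slot], fp
      i = alt_index(i, fp)
      return true if store(i, fp)
    end
    false
  end

  # May return a false positive, but never a false negative for an item
  # that was added successfully.
  def has?(item)
    fp, i1 = fingerprint_and_index(item)
    @buckets[i1].include?(fp) || @buckets[alt_index(i1, fp)].include?(fp)
  end

  private

  def store(index, fp)
    return false if @buckets[index].size >= BUCKET_SIZE

    @buckets[index] << fp
    true
  end

  # Fingerprint comes from the first 2 digest bytes, index from the next 4.
  def fingerprint_and_index(item)
    fp, hash = Digest::SHA1.digest(item).unpack('nN')
    [fp, hash & @mask]
  end

  # The alternate bucket depends only on the current bucket and fingerprint,
  # so a stored fingerprint can be relocated without the original item.
  def alt_index(index, fp)
    index ^ (Digest::SHA1.digest([fp].pack('n')).unpack1('N') & @mask)
  end
end

filter = ToyCuckooFilter.new(1024)
filter.add(Digest::SHA1.hexdigest('password'))
filter.has?(Digest::SHA1.hexdigest('password')) # => true
```

Because only short fingerprints are stored, a lookup can report a password that was never added. That is where the false positive rates quoted in this README come from.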

Status

I just threw this together and haven't used it in production yet.

I've pre-built filters from three datasets (found in the hibp-cuckoo-filter gem):

  • The 'top 560k passwords' filter is 992 KB, with a 1.7% false positive rate. Uses 1 MB of RAM once loaded.
  • The 'top 5.6m passwords' filter is 9.6 MB, with a 2.1% false positive rate. Uses 8 MB of RAM once loaded.
  • The 'top 56m passwords' filter is 80 MB, with a 2.6% false positive rate. Uses 67 MB of RAM once loaded.

On my recent MacBook Pro, checking a single password against the largest filter takes about 3.2 microseconds.

Installation

Add this line to your application's Gemfile:

gem 'has_unpublished_password'
gem 'hibp-cuckoo-filter' # optional data files, if you don't want to build your own.

And then execute:

$ bundle

Or install it yourself as:

$ gem install has_unpublished_password

Usage

Configuration

Add an initializer (e.g. config/initializers/has_unpublished_password.rb):

HasUnpublishedPassword.configure do |config|
  # Valid values: :small, :medium, :large
  config.filter = :large
end

Validation

validates :password, never_leaked_to_hibp: true

Low level usage

require 'digest'

# Load a serialized filter, then look up the SHA1 hex digest of a password.
filter = HasUnpublishedPassword.import('serialized.json.gz')
filter.has?(Digest::SHA1.hexdigest('password')) # => true
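If you call the filter from several places, a small wrapper keeps the hashing step in one spot. The sketch below is not part of the gem: `leaked?` and `FakeFilter` are hypothetical names, and the only assumption about a real filter is that it responds to `has?` with a SHA1 hex digest, as in the example above. The stand-in filter exists purely so the snippet runs without any data files.

```ruby
require 'digest'
require 'set'

# Hypothetical helper, not part of the gem: hashes a plaintext password the
# same way as the low level example and asks the filter about the digest.
def leaked?(filter, password)
  filter.has?(Digest::SHA1.hexdigest(password))
end

# Stand-in for a real filter, backed by an exact set of digests.
class FakeFilter
  def initialize(passwords)
    @digests = Set.new(passwords.map { |p| Digest::SHA1.hexdigest(p) })
  end

  def has?(digest)
    @digests.include?(digest)
  end
end

fake = FakeFilter.new(%w[password 123456])
leaked?(fake, 'password')                     # => true
leaked?(fake, 'correct horse battery staple') # => false
```

With a real filter, remember that a `true` result can be a false positive, so it is better suited to rejecting a password choice than to proving a leak.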

Development

Native component

The native component is written in Rust.

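To update it to a new version, cross-compile it in release mode for each platform and copy the resulting library into nativeext. With --target, cargo writes its output under target/<target-triple>/release:

cd ../rust-cuckoofilter/cabi

# Linux build (shared library)
cargo build --target=x86_64-unknown-linux-gnu --release
cp target/x86_64-unknown-linux-gnu/release/libcuckoofilter_cabi.so ../../has_unpublished_password/nativeext/x86_64/

# macOS build (dynamic library)
cargo build --target=x86_64-apple-darwin --release
cp target/x86_64-apple-darwin/release/libcuckoofilter_cabi.dylib ../../has_unpublished_password/nativeext/x86_64/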

Building the filter

First, download the master list from HIBP (I used the 'ordered by frequency' list) and decompress it.

Then, run data/prepare.sh <path-to-master-list-file>.

This takes quite a while; it periodically prints how many lines it has completed.

This process writes its result to serialized.json.gz when complete.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/danielheath/has_unpublished_password. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the HasUnpublishedPassword project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

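Sizing notes

A bit array indexed by 24 bits has 2^24 entries, which is 2 MB. About 1/3 of those entries will be filled by 5 million rows.

Take 6 separate 24-bit sequences (6 arrays, 12 MB in total). That yields a collision chance of about (1/3)^6, or roughly 0.14%. That's promising.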
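The arithmetic behind these notes can be checked with a few lines of Ruby. This is a sketch under assumptions that the notes do not spell out: each 24-bit sequence indexes its own bit array of 2^24 bits, rows land uniformly at random, and the 6 arrays are independent. All constant and variable names are illustrative.

```ruby
BITS_PER_ARRAY = 2**24     # one bit per possible 24-bit value
ROWS           = 5_000_000 # rows inserted into each array
ARRAYS         = 6         # number of 24-bit sequences taken per row

bytes_per_array = BITS_PER_ARRAY / 8       # 2_097_152 bytes (2 MiB)
total_bytes     = bytes_per_array * ARRAYS # 12_582_912 bytes (12 MiB)

# Upper bound on the filled fraction: every row sets a distinct bit.
fill_upper = ROWS.to_f / BITS_PER_ARRAY

# Expected filled fraction when rows land uniformly at random, so some collide.
fill_expected = 1 - Math.exp(-ROWS.to_f / BITS_PER_ARRAY)

# A false positive needs a hit in every array (assumed independent).
rate_with_third = (1.0 / 3)**ARRAYS     # using the rounded 1/3 fill from the notes
rate_expected   = fill_expected**ARRAYS # using the expected fill instead

puts format('per array: %d bytes, total: %d bytes', bytes_per_array, total_bytes)
puts format('fill: at most %.1f%%, expected %.1f%%', fill_upper * 100, fill_expected * 100)
puts format('collision chance: %.3f%% (1/3 fill), %.3f%% (expected fill)',
            rate_with_third * 100, rate_expected * 100)
```

The rounded 1/3 fill gives (1/3)^6, about 0.137%, which lines up with the figure in the notes. The expected fill is a little lower, so the rounded estimate is on the conservative side.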