Cryptosphere Identity Format (CIF)

Pronounced "sif", like the beginning of "sift".

A certificate format for the Cryptosphere. We have elected not to use ASN.1-derived formats like X.509, and instead use a novel certificate format (Cue obligatory XKCD comic).

This repository provides both the home of the format and a reference implementation in Ruby.

Rationale

The existing public key infrastructure has a number of known issues:

The goal of a new certificate format should be to address all of these points, with special attention paid to the third: designing a format that satisfies security concerns at a linguistic level.

Our design will consider the Security Applications of Formal Language Theory.

Improvements

We propose the following to address the above problems:

  • A simple design that builds on existing standards (including JSON)
  • A human-readable format that can be viewed in any file viewer or editor
  • A format that learns the lessons of LANGSEC, with a formal grammar that is unambiguous and easy to implement

Full Recognition Before Processing

Linguistic Underpinnings

To understand the design choices of CIF from a linguistic perspective, we have to examine one of the most fundamental parts of language theory: the Chomsky Hierarchy. Languages, be they the natural languages we speak, the programming languages humans use, or the instruction set architectures that our CPUs execute, fall into four fundamental categories:

  • Regular: regular expressions. Can understand sequential patterns. Can't count
  • Context-free: can understand tree structures, but can't use symbols within what it's processing to help further understand what's being described
  • Context-sensitive: interprets portions of what's being processed to control subsequent processing
  • Recursively enumerable (Turing complete): capable of unbounded computation

We will select a format that is context-sensitive. At first glance this might not satisfy LANGSEC's requirements:

Context Free Or Regular

We will not be building a "weird machine", however. We will use a very simple format with built-in restrictions that will hopefully make even the most skeptical LANGSEC scrutinizer happy.

Our grammar will be context-sensitive because it includes a length prefix. That's the weirdest part about it. The length prefix will also be bounded, providing a maximum message length, and thus a guaranteed end to any computation. Some may see a maximum length on input documents as a weakness. We see it as a strength.

Even better, we're not going to invent anything new. We're merely going to synthesize existing ideas.

Self-Delimiting Strings

A self-delimiting string is a simple idea: you read a length prefix, then read exactly that many bytes of arbitrary data. When you're done, you can interpret the remaining data however you wish.

Some examples of self-delimiting strings are:

  • netstrings: Dan Bernstein's string format. Uses a decimal prefix of unbounded size, supporting arbitrary-length documents
  • git pkt-lines: Format used by the git protocol. Uses a fixed 4-byte prefix of hex digits, representing a 16-bit value that counts the prefix itself. Payloads (prefix excluded) can be a maximum of 65516 bytes (or 65520 bytes with prefix).
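
To make the first of these concrete, here is a minimal Ruby sketch of reading one netstring, whose wire form is decimal length digits, a colon, the payload, and a trailing comma (so `5:hello,` carries `hello`). The method name `read_netstring` is hypothetical and is not part of the `cif` gem.

```ruby
# Hypothetical helper (not part of the cif gem): parse one netstring from the
# front of a string and return [payload, remaining_data].
def read_netstring(data)
  data = data.b # work on raw bytes
  colon = data.index(":")
  raise ArgumentError, "missing length prefix" if colon.nil?

  digits = data[0, colon]
  # Decimal digits only; no leading zeros unless the length is exactly "0"
  unless digits.match?(/\A(0|[1-9][0-9]*)\z/)
    raise ArgumentError, "malformed length prefix"
  end

  length = digits.to_i
  comma_at = colon + 1 + length
  # The payload and its trailing comma must both be present
  raise ArgumentError, "truncated payload" if comma_at >= data.bytesize
  raise ArgumentError, "missing trailing comma" unless data[comma_at] == ","

  [data[colon + 1, length], data[(comma_at + 1)..-1]]
end
```

Calling `read_netstring("5:hello,rest")` returns `["hello", "rest"]`: the payload is recognized in full, and the leftover bytes are handed back for the caller to interpret.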

We will be using git pkt-lines to frame our certificates. The size limitation presents some problems, but we will work around them, and hopefully end up in a better place for doing so from a language-theoretic perspective.
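
The framing itself can be sketched in a few lines of Ruby. This follows pkt-line framing as git's protocol documentation describes it: the four hex digits give the total length of the line, prefix included, and git caps that total at 65520 bytes. The helper names and constants below are hypothetical and are not the `cif` gem's API. Git's special short lengths, such as the `0000` flush packet, are simply rejected here, and accepting only lowercase hex digits is a strictness choice made for this sketch.

```ruby
# Hypothetical helpers (not the cif gem's API) sketching pkt-line framing.
PKT_PREFIX_SIZE = 4                              # four hex digits
PKT_MAX_SIZE    = 65_520                         # prefix + payload (git's cap)
PKT_MAX_PAYLOAD = PKT_MAX_SIZE - PKT_PREFIX_SIZE # 65516 bytes

# Prepend the length prefix; the encoded length counts the prefix itself.
def pkt_line_encode(payload)
  payload = payload.b
  if payload.bytesize > PKT_MAX_PAYLOAD
    raise ArgumentError, "payload exceeds #{PKT_MAX_PAYLOAD} bytes"
  end
  format("%04x", payload.bytesize + PKT_PREFIX_SIZE).b + payload
end

# Read one pkt-line from the front of data; return [payload, remaining_data].
def pkt_line_decode(data)
  data = data.b
  prefix = data[0, PKT_PREFIX_SIZE]
  # Assumption: lowercase hex only, to keep the accepted language small
  unless prefix.match?(/\A[0-9a-f]{4}\z/)
    raise ArgumentError, "malformed length prefix"
  end

  total = prefix.to_i(16)
  # Lengths below 4 (including the 0000 flush packet) are not handled here
  raise ArgumentError, "unsupported length" if total < PKT_PREFIX_SIZE
  raise ArgumentError, "length exceeds maximum" if total > PKT_MAX_SIZE
  raise ArgumentError, "truncated packet" if data.bytesize < total

  [data[PKT_PREFIX_SIZE, total - PKT_PREFIX_SIZE], data[total..-1]]
end
```

For example, `pkt_line_encode("foobar\n")` yields `"000bfoobar\n"`, and decoding that string returns the original payload with an empty remainder. Because the prefix is bounded, the decoder knows the most it will ever need to read before it has looked at any payload bytes.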

Installation

Add this line to your application's Gemfile:

gem 'cif'

And then execute:

$ bundle

Or install it yourself as:

$ gem install cif

Contributing

  • Fork this repository on GitHub
  • Make your changes and send us a pull request
  • If we like them we'll merge them

License

All project documentation is provided under the Creative Commons Attribution 3.0 Unported license.

Ruby source code Copyright (c) 2013 Tony Arcieri. Distributed under the MIT License. See LICENSE.txt for further details.