unibits | Reveal the Unicode

Ruby library and CLI command that visualizes various Unicode and ASCII/single-byte encodings in the terminal:

  • Makes analyzing encodings easier
  • Helps you with debugging strings
  • Highlights invalid/special/blank bytes/characters/codepoints
  • Supports UTF-8, UTF-16LE/UTF-16BE, UTF-32LE/UTF-32BE, ISO-8859-X, Windows-125X, IBMX, CP85X, macX, TIS-620/Windows-874, KOI8-R/KOI8-U, 7-Bit ASCII/GB1988, and arbitrary BINARY data

Color Coding

Each byte of the given string is highlighted using the following mechanism (characters -> codepoints):

  • Red for invalid bytes
  • Light blue for blanks
  • Blue for control characters
  • Pink for non-control formatting characters
  • Green for marks (Unicode only)
  • Orange for unassigned codepoints
  • Lighter orange for unassigned codepoints which are also ignorable
  • Random color for all other codepoints
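
The categories above can be approximated in plain Ruby with Unicode property classes. The following is a minimal sketch, not unibits' actual implementation: the `classify` helper, its symbol names, and the order of the checks are assumptions made for illustration, and ignorable unassigned codepoints are not distinguished. Results for unassigned codepoints depend on the Unicode version bundled with your Ruby.

```ruby
# Hypothetical helper (not part of unibits): maps a single character to a
# rough category similar to the color classes listed above.
def classify(char)
  # Must come first: regex matching raises on strings with broken encoding.
  return :invalid    unless char.valid_encoding?
  return :blank      if char.match?(/\A[[:space:]]\z/)
  return :control    if char.match?(/\A\p{Cc}\z/)
  return :format     if char.match?(/\A\p{Cf}\z/)
  return :mark       if char.match?(/\A\p{M}\z/)
  return :unassigned if char.match?(/\A\p{Cn}\z/)
  :other
end

# Letter, space, NUL, zero width joiner, combining acute accent,
# an unassigned codepoint, and a stray continuation byte.
["a", " ", "\u0000", "\u200D", "\u0301", "\u0378", "\x80"].each do |char|
  puts format("%-12s %s", char.inspect, classify(char))
end
```

Checking blanks before control characters means that tab and newline count as blanks here; whether unibits orders the checks the same way is not confirmed by this document.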

The same colors are used in the higher-level companion tool uniscribe.

Setup

Make sure Ruby is installed and that installing gems works. Then run:

$ gem install unibits

Usage

Pass the string to debug to unibits:

From CLI

$ unibits "🌫 Idio﻿syncrätic ℜսᖯʏ"

From Ruby

require 'unibits/kernel_method'
unibits "🌫 Idio﻿syncrätic ℜսᖯʏ"

Advanced Options

unibits accepts the following options:

  • encoding (e): The encoding of the given string (uses the string's default encoding if none given)
  • convert (c): An encoding the string should be converted to before visualizing it
  • stats: Whether to show a short stats header (default: true). On the CLI, deactivate it with --no-stats
  • wide-ambiguous: Treat characters of ambiguous width as 2 columns wide instead of 1
  • width (w): Set a custom column width. If not set, unibits retrieves the width from the terminal or falls back to 80
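
The difference between encoding and convert can be illustrated with Ruby's core String methods. This sketch assumes that encoding corresponds to reinterpreting the existing bytes (`force_encoding`) and that convert corresponds to transcoding them (`encode`). The `prepare` helper is hypothetical and not part of unibits.

```ruby
# Hypothetical helper (not part of unibits): applies the two options in the
# order described above, using only core String methods.
def prepare(string, encoding: nil, convert: nil)
  prepared = string.dup                          # never mutate the caller's string
  prepared.force_encoding(encoding) if encoding  # relabel the bytes, no conversion
  prepared = prepared.encode(convert) if convert # transcode into the target encoding
  prepared
end

text = "\u00E4" # "ä", stored as the two UTF-8 bytes C3 A4

p prepare(text, encoding: "binary").bytes                     # => [195, 164]
p prepare(text, encoding: "utf-8", convert: "utf-16le").bytes # => [228, 0]
```

Relabeling keeps the bytes untouched, while converting produces a new byte sequence, which is why the two options can yield different visualizations of the same input.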

Examples of Valid Encodings

UTF-8

CLI: $ unibits -e utf-8 -c utf-8 "🌫 Idio﻿syncrätic ℜսᖯʏ"

Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-8'

Screenshot UTF-8

UTF-16LE

CLI: $ unibits -e utf-8 -c utf-16le "🌫 Idio﻿syncrätic ℜսᖯʏ"

Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-16le'

Screenshot UTF-16LE

UTF-32BE

CLI: $ unibits -e utf-8 -c utf-32be "🌫 Idio﻿syncrätic ℜսᖯʏ"

Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-32be'

Screenshot UTF-32BE
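
To see why the three Unicode encodings above produce different byte patterns, it can help to print the raw bytes of a single character. The sketch below uses only Ruby's core `String#encode`. The `hex_bytes` helper is hypothetical and only illustrates the byte layouts, not how unibits renders them.

```ruby
# Hypothetical helper (not part of unibits): hex dump of a string after
# transcoding it to the named encoding.
def hex_bytes(string, encoding_name)
  string.encode(encoding_name).bytes.map { |byte| format("%02X", byte) }.join(" ")
end

fog = "\u{1F32B}" # the fog emoji that opens the example string

%w[UTF-8 UTF-16LE UTF-32BE].each do |name|
  puts "#{name.ljust(8)} #{hex_bytes(fog, name)}"
end
# UTF-8    F0 9F 8C AB
# UTF-16LE 3C D8 2B DF
# UTF-32BE 00 01 F3 2B
```

The UTF-16LE line shows a surrogate pair (D83C DF2B) stored low byte first, because U+1F32B lies outside the Basic Multilingual Plane.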

BINARY

CLI: $ unibits -e binary "🌫 Idio﻿syncrätic ℜսᖯʏ"

Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'binary'

Screenshot BINARY

ASCII

CLI: $ unibits -e utf-8 -c ascii "ascii"

Ruby: unibits "ascii", encoding: 'utf-8', convert: 'ascii'

Screenshot ASCII

Examples of Invalid Encodings

UTF-8

Example in Ruby: unibits "unexpected \x80 | not enough \xF0\x9F\x8C | overlong \xE0\x81\x81 | surrogate \xED\xA0\x80 | too large \xF5\x8F\xBF\xBF"

Screenshot invalid UTF-8
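
Ruby itself can report which bytes of such a string are malformed, which is handy for cross-checking what the visualization highlights. This is a minimal sketch using core String methods. The `invalid_bytes` helper is hypothetical, and it does not claim to mirror how unibits detects invalid bytes.

```ruby
# Hypothetical helper (not part of unibits): collects every byte that Ruby
# does not accept as part of a well-formed character in the string's encoding.
def invalid_bytes(string)
  string.each_char.reject(&:valid_encoding?).flat_map(&:bytes)
end

broken = "unexpected \x80 | surrogate \xED\xA0\x80"

p broken.valid_encoding?                              # => false
p invalid_bytes(broken).map { |b| format("%02X", b) } # => ["80", "ED", "A0", "80"]
```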

ASCII

Example in Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'ascii'

Screenshot invalid ASCII
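
The reason the example above is invalid can be reproduced in plain Ruby: once a UTF-8 string is relabeled as 7-bit ASCII, every byte above 0x7F is out of range. The sketch below is an illustration only, and the `non_ascii_bytes` helper is hypothetical.

```ruby
# Hypothetical helper (not part of unibits): bytes that cannot be 7-bit ASCII.
def non_ascii_bytes(string)
  string.bytes.select { |byte| byte > 0x7F }
end

text     = "\u00E4tic" # "ätic"; the "ä" occupies the UTF-8 bytes C3 A4
as_ascii = text.dup.force_encoding("US-ASCII")

p as_ascii.valid_encoding?  # => false
p non_ascii_bytes(as_ascii) # => [195, 164]
p "ascii".dup.force_encoding("US-ASCII").valid_encoding? # => true
```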

Notes

More info

Related gems

Lots of thanks to @damienklinnert for the motivation and inspiration required to build this! 🎆

Copyright (C) 2017-2020 Jan Lelis https://janlelis.com. Released under the MIT license.