A syntax highlighting a library for Ruby.

This fork is maintained and version 1.1.0 has been published from it. However, there's currently none or not much new development going on here and the original author, @jamis, recommends using CodeRay, over this library.


This is a simple syntax highlighting library for Ruby. It is a naive syntax analysis tool, meaning that it does not “understand” the syntaxes of the languages it processes, but merely does some semi-intelligent pattern matching.


There are primarily two uses for the Syntax library:

# Convert text from a supported syntax to a supported highlight format (like HTML). # Tokenize text in a supported syntax and process the tokens directly.

Highlighting a supported syntax

require 'syntax/convertors/html'

convertor = Syntax::Convertors::HTML.for_syntax "ruby"
puts convertor.convert( "file.rb" ) )

The above snippet will emit HTML, using spans and CSS to indicate the different highlight “groups”. (Sample CSS files are included in the “data” directory.)

Tokenize text

require 'syntax'

tokenizer = Syntax.load "ruby"
tokenizer.tokenize( "file.rb" ) ) do |token|
  puts "group(#{}, #{token.instruction}) lexeme(#{token})"

Tokenizing is straightforward process. Each time a new token is discovered by the tokenizer, it is yielded to the given block.

  • is the lexical group to which the token belongs. Each supported syntax may have it's own set of lexical groups.

  • token.instruction is an instruction used to determine how this token should be treated. It will be :none for normal tokens, :region_open if the token starts a nested region, and :region_close if it closes the last opened region.

  • token is itself a subclass of String, so you can use it just as you would a string. It represents the lexeme that was actually parsed.