Introduction
RubyBreaker is a dynamic type documentation tool written in pure Ruby. It provides the framework for dynamically instrumenting a Ruby program to monitor objects during executions and document the observed type information. In other words, RubyBreaker "breaks" Ruby code out of its obscurity and wildness (as in "code breaking" or "horse breaking") by auto-documenting type information. The type documentation generated by RubyBreaker is also an executable Ruby code that can be used as an input to subsequent analyses.
The primary goal of RubyBreaker is to assign a type signature to every method in selected modules and classes. A type signature is written in the RubyBreaker Type Annotation Language which resembles the documentation style used in Ruby API Doc. Manual code change is not required if used in Rakefile and is kept minimal if otherwise. Overall, this tool should help Ruby programmers document their code more rigorously and effectively.
Current limitations are:
- Auto-documentation of block arguments (inherent)
- Parametric polymorphic types
- RDoc or YARD documentation support
To contribute to the project, visit RubyBreaker's GitHub page and RubyGems page. RubyBreaker RDoc can be found in here.
Requirements
Ruby 1.9.x and TreeTop 1.x
If the most recent Ruby 1.9 is installed on the computer, it will probably work. If TreeTop is not installed, use RubyGems or download from the following URL: TreeTop
Installation
It is as simple as running the following command:
$ gem install rubybreaker
Tutorial
This tutorial will describe the basic usage of the tool, the RubyBreaker Type Annotation Language, and the RubyBreaker Type System.
Usage
RubyBreaker takes advantage of test cases that already come with the source program. It is recommended that RubyBreaker is run as a Rake task, which requires a minimum code change in the Rakefile and no code change in the source program. If not used as a Rake task, it requires a minimum code change in each test case or the source program but should not affect the development process much. Let's briefly see how RubyBreaker can be run directly as a command-line program to understand the general concept of the tool. We will explain how to use RubyBreaker in a Rakefile later.
$ rubybreaker -v prog.rb
This runs RubyBreaker in verbose mode on prog.rb
. Note that RubyBreaker
will actually run prog.rb
(by simply require
ing the program file).
Somewhere in the program, there has to be a program entry point to
indicate where the monitoring of objects starts. Let's assume prog.rb
as the following:
require "rubybreaker" # required if using "ruby" instead
class A
def foo(x)
x.to_s
end
end
class B
def (y,z)
y.foo(z)
end
end
RubyBreaker.run(A, B)
A.new.foo(1)
This example will show how A#foo
method is given a type by RubyBreaker.
After running rubybreaker -v prog.rb
, the following output will be
generated and saved into prog.rubybreaker.rb
.
# This file is auto-generated by RubyBreaker
require "rubybreaker"
class A
typesig("foo(fixnum[to_s]) -> string")
end
Here, the typesig
method call registers foo
as a method type that takes
an object that has Fixnum#to_s
method and returns a String
. This
method is made available by importing rubybreaker
. Now, assume that an
additional code, B.new.bar(A.new,1)
, is added at the end of prog.rb
. The
subsequent run will generate the following result:
# This file is auto-generated by RubyBreaker
require "rubybreaker"
class A
typesig("foo(fixnum[to_s]) -> string")
end
class B
typesig("bar(a[foo], fixnum[to_s]) -> string")
end
Keep in mind that RubyBreaker is designed to gather type information based on the actual execution of the source program. This means the program should be equipped with test cases that have a reasonable program path coverage. Additionally, RubyBreaker assumes that test runs are correct and the program behaves correctly (for those test runs) as intended by the programmer. This assumption is not a strong requirement, but is necessary to obtain precise and accurate type information.
Using Ruby Unit Testing Framework
Instead of manually inserting the entry point indicator into the program, you can take advantage of Ruby's built-in testing framework. This is preferred to modifying the source program directly, especially for the long term program maintainability. But no worries! This method is as simple as the previous one.
require "test/unit"
require "rubybreaker" # This should come after test/unit.
class TestClassA < Test::Unit::TestCase
def setup()
RubyBreaker.breakable(Class1, Class2, ...)
...
end
# ...tests!...
end
That's it! The only requirements are to indicate to RubyBreaker which modules
and classes to "break" and to place require rubybreaker
after
require test/unit
.
Using RSpec
The requirement is same for RSpec but use before
instead of setup
to
specify which modules and classes to "break".
require "rspec"
require "rubybreaker"
describe "TestClassA Test"
before { RubyBreaker.breakable(Class1, Class2, ...) }
...
# ...tests!...
end
Using Rakefile
By running RubyBreaker along with the Rakefile, you can avoid modifying the
source program at all. (You no longer need to import rubybreaker
in the
test cases neither.) Therefore, this is the recommended way to use
RubyBreaker. The following code snippet describes how it can be done:
require "rubybreaker/task"
...
desc "Run RubyBreaker"
Rake::RubyBreakerTestTask.new(:"rubybreaker") do |t|
t.libs << "lib"
t.test_files = ["test/foo/tc_foo1.rb"]
# ...Other test task options..
t.rubybreaker_opts << "-v" # run in verbose mode
t.breakable = ["Class1", "Class2", ...] # specify what to monitor
end
Note that RubyBrakerTestTask
can simply replace your TestTask
block in
Rakefile. In fact, the former is a subclass of the latter and includes all
features supported by the latter. The only additional options are
rubybreaker_opts
which is RubyBreaker's command-line options and
breakable
which specifies which modules and classes to monitor. Since
Class1
and Class2
are not recognized by this Rakefile, you must use
string literals to specify modules and classes (and with full namespace).
If this is the route you are taking, there needs no editing of the source program whatsoever. This task will take care of instrumenting the specified modules and classes at proper moments.
Type Annotation
The annotation language used in RubyBreaker resembles the method documentation used by Ruby Standard Library Doc. Each type signature defines a method type using the name, argument types, block type, and return type. But, let us consider a simple case where there is one argument type and a return type.
class A
...
typesig("foo(fixnum) -> string")
end
In RubyBreaker, a type signature is recognized by the meta-class level
method typesig
which takes a string as an argument. This string is the
actual type signature written in the Ruby Type Annotation Language. This
language is designed to reflect the common documentation practice used by
RubyDoc. It starts with the name of the method. In the above example, foo
is currently being given a type. The rest of the signature takes a typical
method type symbol, (x) -> y
where x
is the argument type and y
is the
return type. In the example shown above, the method takes a Fixnum
object
and returns a String
object. Note that these types are in lowercase,
indicating they are objects and not modules or classes themselves.
There are several types that represent an object: nominal, duck, fusion, nil, 'any', 'or', optional, variable-length, and block. Each type signature itself represents a method type or a method list type (explained below).
Nominal Type
This is the simplest and most intuitive way to represent an object. For
instance, fixnum
is an object of type Fixnum
. Use lower-case letters and
underscores instead of camelized name. MyClass
, for example would be
my_class
in RubyBreaker type signatures. There is no particular
reason for this convention other than it is the common practice used in
RubyDoc. Use /
to indicate the namespace delimiter ::
. For example,
NamspaceA::ClassB
would be represented by namespace_a/class_b
in
a RubyBreaker type signature.
Self Type
This type is similar to the nominal type but is referring to the current
object--that is, the receiver of the method being typed. RubyBreaker will
auto-document the return type as a self type if the return value is the same
as the receiver of that call. It is also recommended to use this type over
a nominal type (if the return value is self
) since it depicts more
precise return type.
Duck Type
This type is inspired by the Ruby Language's duck typing, "if it
walks like a duck and quacks like a duck, it must be a duck." Using this
type, an object can be represented simply by a list of method names. For
example [walks, quacks]
is an object that has walks
and quacks
methods. Note that these method names do not reveal any type
information for themselves.
Fusion Type
Duck type is very flexible but can be too lenient when trying to restrict
the type of an object. RubyBreaker provides a type called the fusion type
which lists method names but with respect to a nominal type. For
example, fixnum[to_f, to_s]
represents an object that has methods to_f
and to_s
whose types are same as those of Fixnum
. This is more
restrictive (precise) than [to_f, to_s]
because the two methods must have
the same types as to_f
and to_s
methods, respectively, in Fixnum
.
Nil Type
A nil type represents a value of nil and is denoted by nil
.
Any Type
RubyBreaker also provides a way to represent an object that is compatible with
any type. This type is denoted by ?
. Use caution with this type because
it should be only used for an object that requires an arbitrary yet most
specific type--that is, ?
is a subtype of any other type, but any
other type is not a subtype of ?
. This becomes a bit complicated for
method or block argument types because of their contra-variance
characteristic. Please refer to the section Subtyping.
Or Type
Any above types can be "or"ed together, using ||
, to represent an object
that can be either one or the other. It does not represent an object that
has to be both (which is not supported by RubyBreaker).
Optional Argument Type and Variable-Length Argument Type
Another useful features of Ruby are the optional argument type and the
variable-length argument type. The former represents an argument that has a
default value (and therefore does not have to be provided). The latter
represents zero or more arguments of the same type. These are denoted by
suffices, ?
and *
, respectively.
Block Type
One of the Ruby's prominent features is the block argument. It allows
the caller to pass in a piece of code to be executed inside the callee. This
code block can be executed by the Ruby construct, yield
, or by directly
calling the call
method of the block object. In RubyBreaker, this type can
be respresented by curly brackets. For instance, {|fixnum,string| ->
string}
represents a block that takes two arguments--one Fixnum
and one
String
--and returns a String
.
RubyBreaker does supports nested blocks as Ruby 1.9 finally allows them.
However, keep in mind that RubyBreaker cannot automatically document the
block types due to yield
being a language construct rather than a method,
which means it cannot be captured by meta-programming!
Method Type and Method List Types
Method type is similar to the block type, but it represents an actual method and not a block object. It is the "root" type that the type annotation language supports, along with method list types. Method list type is a collection of method types to represent more than one type information for the given method. Why would this type be needed? Consider the following Ruby code:
def foo(x)
case x
when Fixnum
1
when String
"1"
end
end
There is no way to document the type of foo
without using a method list
type. Let's try to give a method type to foo
without a method list. The
closest we can come up with would be foo(fixnum or string) -> fixnum and
string
. But RubyBreaker does not have the "and" type in the type annotation
language because it gives me an headache! (By the way, it needs to be an
"and" type because the caller must handle both Fixnum
and String
return
values.)
It is a dilemma because Ruby programmers actually enjoy using this kind of
dynamic type checks in their code. To alleviate this headache, RubyBreaker
supports the method list type to represent different scenarios depending on
the argument types. Thus, the foo
method shown above can be given the
following method list type:
typesig("foo(fixnum) -> fixnum")
typesig("foo(string) -> string")
These two type signatures simply tell RubyBreaker that foo
has two method
types--one for a Fixnum
argument and another for a String
argument.
Depending on the argument type, the return type is determined. In this
example, a Fixnum
is returned when the argument is also a Fixnum
and a
String
is returned when the argument is also a String
. When
automatically documenting such a type, RubyBreaker looks for the (subtyping)
compatibility between the return types and "promote" the method type to a
method list type by spliting the type signature into two (or more in
subsequent "promotions").
Type System
RubyBreaker comes with its own type system to auto-document the type information. Each method in a "breakable" module is dynamically instrumented to be monitored during runtime. This monitoring code observes the types of the arguments, block, and return value of each method. Once this information is gathered, RubyBreaker will compare it to the information gathered so far. If these two types are "compatiable", RubyBreaker will choose more general type of the two. Otherwise, RubyBreaker will use the method list type to accommodate two "incompatible" types.
Subtyping and Subclassing
RubyBreaker uses subtyping to choose one from the two "compatible" types.
Two types are "compatible" if one is subtype of another. This means that the
subtype can be represented using the supertype instead. This is why
RubyBrekaer chooses the latter to document both types. RubyBreaker relies on
subclassing of Ruby to determine a subtyping relationship between two types.
For example, Fixnum
is considered to be subtype of Numeric
since the
former is subclass of the latter. (Strictly speaking, Fixnum
is not really
subtype of Numeric
because some methods are overriden in Fixnum
with
method types that are not subtype of the counterparts in Numeric
. But,
RubyBreaker is lenient and considers them compatible--that is, Numeric
can
represent any Fixnum
.
Pluggable Type System (Advanced)
Yes, RubyBreaker was designed with the replaceable type system in mind. In other words, anyone can write his own type system and plug it into RubyBreaker. Technical documentation coming soon...
Acknowledgment
The term, "Fusion Type," is first coined by Professor Michael W. Hicks at University of Maryland and represents an object using a structural type with respect to a nominal type.
Copyright
Copyright (c) 2012 Jong-hoon (David) An. All Rights Reserved.