Module: CGenerator

Defined in: lib/cgen/cgen.rb

Overview

The CGenerator module is a framework for dynamically generating C extensions. It is a bit like Perl’s inline but intended for a different purpose: managing incremental, structured additions to C source files, and compiling the code and loading the library just in time for execution. Whereas inline helps you write a C extension, CGenerator helps you write a Ruby program that generates C extensions. To put it another way, this is a Ruby interface to the Ruby C API.

The original use of CGenerator was as the back end of a compiler for mathematical expressions in C-like syntax involving limited Ruby subexpressions. In that case, CGenerator allowed the compiler to think about the syntax and semantics of the input expressions without having to worry about the high-level structure of the generated .c and .h files.

One potential use is quick-turnaround development and testing of C code, possibly using Ruby as a driver environment; the library under construction needn’t be Ruby-specific. If SWIG didn’t support Ruby, this framework could be the starting point for a program that generates wrapper code for existing libraries. Finally, a Ruby package that includes C extensions could benefit from being able to use Ruby code to dynamically specify the contents and control the build process during installation.

The CGenerator framework consists of two base classes, Accumulator and Template. Think of accumulators as blanks in a form and templates as the form around the blanks, except that accumulators and templates can nest within each other. The base classes have subclasses which hierarchically decompose the information managed by the framework. This hierarchy is achieved by inheritance along the parent attribute, which is secondary to subclass inheritance.

Templates

The main template in the CGenerator module is Library. It has accumulators for such constructs as including header files, declaring variables, declaring Ruby symbols, declaring classes, defining functions, and defining structs. Some accumulators, such as those for adding function and struct definitions, return a new template each time they are called. Those templates, in turn, have their own accumulators for structure members, function arguments, declarations, initialization, scope, etc.

Library templates

A Library corresponds to one main C source file and one shared C library (.so or .dll). It manages the Init_library code (including registration of methods), as well as user-specified declaration and initialization in the scope of the .c file and its corresponding .h file. All files generated in the process of building the library are kept in a directory with the same name as the library. Additional C files in this directory will be compiled and linked to the library.

Each library is the root of a template containment hierarchy, and it alone has a #commit method. After client code has sent all desired fragments to the accumulators, calling the commit method uses the structure imposed by the sub-templates of the library to join the fragments into two strings, one for the .h file and one for the .c file. Then each string is written to the corresponding file (only if the string is different from the current file contents), and the library is compiled (if necessary) and loaded.
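The "write only if different" step can be pictured with a minimal sketch. The helper name write_if_changed is hypothetical and is not part of cgen's API; it only illustrates why committing an unchanged library leaves the files untouched, so make sees nothing to rebuild.

```ruby
require 'tmpdir'

# Hypothetical helper (not cgen's API): write +contents+ to +path+ only
# if the file is missing or its current contents differ.
# Returns true if the file was written, false if it was left untouched.
def write_if_changed(path, contents)
  return false if File.exist?(path) && File.read(path) == contents
  File.write(path, contents)
  true
end

Dir.mktmpdir do |dir|
  c_file = File.join(dir, "sample_lib.c")
  source = "int answer(void) { return 42; }\n"

  first  = write_if_changed(c_file, source)                     # file is created
  second = write_if_changed(c_file, source)                     # identical: not rewritten
  third  = write_if_changed(c_file, source + "/* edited */\n")  # changed: rewritten

  p [first, second, third]
end
```

Because an untouched file keeps its modification time, a make-based build has no reason to recompile it.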

Function templates

Function templates are used to define the functions in a Library. The base class, CGenerator::Function, generates a function (by default, static) in the library without registering it with Ruby in any way.

The CGenerator::RubyFunction templates define the function as above and also register the function as an instance method, module function, or singleton method of a specified class or module, or as a global function (a private method of Kernel).

Client code does not instantiate these templates directly, but instead uses the library’s #define accumulator methods, which return the new template.

The function template for the library’s initialization function can be accessed using the library’s #init_library_function method, although direct access to this template is typically not needed. (Use the library’s #setup method to write code to the #init_library_function.)

Struct templates

A struct template generates a typedef for a C struct. It can be external, in which case it is written to the .h file. It has a #declare accumulator for adding data members.

Accumulators

Accumulators are a way of defining a hierarchical structure and populating it with data in such a way that the data can be serialized to a string at any point during the process without side effects. Templates are Accumulators which contain other accumulators and have convenience methods for accessing them from client code.

Accumulators can be fairly unstructured: they just accumulate in sequence whatever is sent to them, possibly with some filtering, and the accumulated items may include other accumulators. Templates are usually more structured. In general, only Templates can be parents; other accumulators set the #parent of each accumulated item to be the accumulator’s #parent, simplifying the #parent hierarchy.

Accumulators are responsible for the format of each accumulated item, for joining the items to form a string when requested to do so, and for doing any necessary preprocessing on the items (e.g., discarding duplicates).

From the point of view of client code, accumulators are methods for “filling in the blanks” in templates. Client code doesn’t access the accumulator object directly, only through a method on the template. For example:

lib.declare :global_int_array =>
              'int global_int_array[100]',
            :name =>
              'char *name'

is used to access the “declare” accumulator of the library (which is actually delegated to a file template).

Providing a key for each declaration (in the example, the keys are symbols, but they can be any hash keys) helps CGenerator reject repeated declarations. (Redundancy checking by simple string comparison is inadequate, because it would allow two declarations of different types, but the same name, or two declarations with insignificant whitespace differences.)
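The effect of keys can be sketched in a few lines of plain Ruby. The class KeyedDeclarations below is a hypothetical stand-in, not cgen's implementation, and it assumes that a repeated key is silently ignored; how cgen itself treats a repeated key is not shown here.

```ruby
# Hypothetical stand-in for a keyed accumulator (not cgen's implementation).
# Each declaration is stored under a caller-supplied key; a key that has
# already been used is ignored, whatever the declaration text looks like.
class KeyedDeclarations
  def initialize
    @items = {}
  end

  # Accepts a hash of key => C declaration; the first use of a key wins.
  def declare(decls)
    decls.each { |key, text| @items[key] = text unless @items.key?(key) }
    self
  end

  # Joins the stored declarations, one per line, appending semicolons.
  def to_s
    @items.values.map { |text| "#{text};" }.join("\n")
  end
end

decls = KeyedDeclarations.new
decls.declare :name  => 'char *name'
decls.declare :name  => 'char  *name'   # whitespace variant: same key, ignored
decls.declare :name  => 'int name'      # different type, same key, ignored
decls.declare :count => 'int count'
puts decls
```

A comparison of the raw strings would have accepted all three :name declarations, which is exactly the failure the keys are meant to prevent.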

The basic Accumulator class adds fragments to an array in sequence. When converted to a string with #to_s, it joins the fragments with newline separators. These behaviors change as needed in the subclasses. Note that the accumulated items need not all be strings; they need only respond to to_s.
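As a rough illustration of that basic behavior, here is a miniature accumulator written from the description above. MiniAccumulator and its add method are hypothetical names, not cgen's classes or API.

```ruby
# Hypothetical miniature of the behavior described above
# (not cgen's Accumulator class).
class MiniAccumulator
  def initialize
    @items = []
  end

  # Appends fragments in the order they arrive.
  def add(*fragments)
    @items.concat(fragments)
    self
  end

  # Converts every item with to_s and joins the results with newlines.
  # A nested accumulator is just another item, so it is serialized in place.
  def to_s
    @items.map(&:to_s).join("\n")
  end
end

includes = MiniAccumulator.new.add('#include <math.h>')
file     = MiniAccumulator.new.add(includes, 'static int counter;')

includes.add('#include <stdio.h>')   # added later, still lands in the include block
puts file
```

Since to_s only reads the stored items, it can be called repeatedly, before or after further additions, to get a snapshot of the current state.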

Return values of accumulators are not very consistent: in general, an accumulator returns whatever is needed for the caller to continue working with the thing that was just accumulated. It might be a template which supports some other accumulators, or it might be a string which can be inserted in C code.

Some accumulators take existing Ruby objects as an argument. These accumulators typically return, as a Ruby symbol, the C identifier that has been defined or declared to refer to that Ruby object. This can be interpolated into C code to refer to the Ruby object from C.

Note about argument order: Since hashes are unordered, passing a hash of key-value pairs to #declare or similar methods will not preserve the textual ordering. Internally, cgen sorts this hash into an array of pairs so that at least the result is deterministic, reducing recompilation. One can force an argument order by using an array of pairs.

lib.declare [[:global_int_array,
               'int global_int_array[100]'],
             [:name,
               'char *name']]

Alternately, simply break the declaration into multiple declares.
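The difference between the two argument forms can be shown in plain Ruby. The helper normalize_pairs is hypothetical and only mimics the behavior described above (a hash is sorted for determinism, an array of pairs is kept as given); the key cgen actually sorts by is an assumption here.

```ruby
# Hypothetical helper: turn the argument of a declare-style call into
# an ordered list of [key, declaration] pairs.
def normalize_pairs(arg)
  case arg
  when Hash
    # Sort by the string form of the key so the result is deterministic.
    arg.sort_by { |key, _decl| key.to_s }
  when Array
    arg   # caller-specified order is kept
  else
    raise ArgumentError, "expected a Hash or an Array of pairs"
  end
end

as_hash  = { :name => 'char *name',
             :global_int_array => 'int global_int_array[100]' }
as_pairs = [[:name, 'char *name'],
            [:global_int_array, 'int global_int_array[100]']]

p normalize_pairs(as_hash).map(&:first)    # keys in sorted order
p normalize_pairs(as_pairs).map(&:first)   # keys in the order written
```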

C code output

Format

Some effort is made to generate readable code. Relative tabbing within code fragments is preserved. One goal of CGenerator is producing Ruby extensions that can be saved and distributed with little or no modification (as opposed to just created and loaded on the fly).

Use of C identifiers

CGenerator attempts to generate C identifiers in non-conflicting ways… (prove some nice property)

Usage

Create a library with:

lib = CGenerator::Library.new "my_lib_name"

The name must be an identifier: /[A-Za-z0-9_]*/.

It is useful to keep a reference to lib around to send define and declare messages to.

Example

require 'cgen'

lib = CGenerator::Library.new "sample_lib"

class Point; end

lib.declare_extern_struct(:point).instance_eval {
  # make it extern so we can see it from another lib
  declare :x => "double x"
  declare :y => "double y"
}

lib.define_c_global_function(:new_point).instance_eval {
  arguments "x", "y"        # 'VALUE' is assumed
  declare :p => "point *p"
  declare :result => "VALUE result"
      # semicolons are added automatically
  body %{
    result = Data_Make_Struct(#{lib.declare_class Point}, point, 0, free, p);
    p->x = NUM2DBL(x);
    p->y = NUM2DBL(y);

//  might want to do something like this, too:
//  rb_funcall(result, #{lib.declare_symbol :initialize}, 0);
  }
  returns "result"
      # can put a return statement in the body, if preferred
}

for var in [:x, :y]   # metaprogramming in C!
  lib.define_c_method(Point, var).instance_eval {
    declare :p => "point *p"
    body %{
      Data_Get_Struct(self, point, p);
    }
    returns "rb_float_new(p->#{var})"
  }
end

# A utility function, available to other C files
lib.define_c_function("distance").instance_eval {
  arguments "point *p1", "point *p2"
  return_type "double"
  scope :extern
  returns "sqrt(pow(p1->x - p2->x, 2) + pow(p1->y - p2->y, 2))"
  include "<math.h>"
  # The include accumulator call propagates up the parent
  # hierarchy until something handles it. In this case,
  # the Library lib handles it by adding an include
  # directive to the .c file. This allows related, but
  # separate aspects of the C source to be handled in
  # the same place in the Ruby code. We could also have
  # called include directly on lib.
}

lib.define_c_method(Point, :distance).instance_eval {
  # no name conflict between this "distance" and the previous one,
  # because "method" and "Point" are both part of the C identifier
  # for this method
  arguments "other"
  declare :p => "point *p"
  declare :q => "point *q"
  body %{
    Data_Get_Struct(self, point, p);
    Data_Get_Struct(other, point, q);
  }
  returns "rb_float_new(distance(p, q))"
}

lib.commit # now you can use the new definitions

p1 = new_point(1, 2)
puts "p1: x is #{p1.x}, y is #{p1.y}"

p2 = new_point(5, 8)
puts "p2: x is #{p2.x}, y is #{p2.y}"

puts "distance from p1 to p2 is #{p1.distance p2}"

Output is:

p1: x is 1.0, y is 2.0
p2: x is 5.0, y is 8.0
distance from p1 to p2 is 7.211102551

That’s a lot of code to do a simple operation, compared with an Inline-style construct. CGenerator’s value shows up with more complex tasks. The sample.rb file extends this example.

Notes

My first Ruby extension was built with this module. That speaks well of the elegance, simplicity, and utter coolness of Ruby and its extension architecture. Thanks matz!

Some accumulators, like declare_symbol and declare_class, operate by default on the file scope, even if called on a method definition, so the declarations are shared across the library. This reduces redundancy with no disadvantage. (In general, accumulator calls propagate first thru the inheritance hierarchy and then thru the parent Template hierarchy.)

Note that accumulators can nest within accumulators, because #to_s is applied recursively. This is very useful (see Library#initialize for example). This defines a many-to-one dataflow pattern among accumulators. A one-to-many dataflow pattern arises when a method calls several accumulators, as in #define_c_method and kin.

CGenerator makes no attempt to check for C syntax errors in code supplied to the accumulators.

It may help to think of templates as heterogeneous collections, like structs, and accumulators as homogeneous collections, like arrays.

The containment hierarchy is represented by the #parent accessor in Accumulators and Templates. It provides a secondary inheritance of calls to accumulators. (As a result, doing #include at the function level adds an #include directive at the file level.)
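One way to picture this secondary inheritance is a method_missing hook that forwards unknown calls to the parent. The classes Node, FileNode, and FunctionNode below are hypothetical and are not cgen's classes; whether cgen uses method_missing or another mechanism is not stated here, so treat this as a sketch of the idea only.

```ruby
# Hypothetical miniature of the #parent mechanism (not cgen's classes).
# Ruby first looks for the method in the node's own class hierarchy;
# only if that fails is the call forwarded to the parent node.
class Node
  attr_reader :parent

  def initialize(parent = nil)
    @parent = parent
  end

  def method_missing(name, *args, &block)
    if parent && parent.respond_to?(name)
      parent.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    (!parent.nil? && parent.respond_to?(name, include_private)) || super
  end
end

class FileNode < Node
  attr_reader :includes

  def initialize(parent = nil)
    super
    @includes = []
  end

  # Handled at file level; duplicate headers are discarded.
  def include(header)
    @includes << header unless @includes.include?(header)
    self
  end
end

class FunctionNode < Node; end

file = FileNode.new
fn   = FunctionNode.new(file)
fn.include "<math.h>"   # FunctionNode has no #include: forwarded to file
fn.include "<math.h>"   # duplicate, discarded by the file node
p file.includes
```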

The basic Template and Accumulator class are more general than C source, or even strings. The Module#inherit method is also reusable.

CGenerator does not allow more than one C function with the same name. This could be changed fairly easily.

You can subclass Library and override #extconf to do more complex processing than just #create_makefile.

Calling a #to_s method on an accumulator more than once has no unexpected side effects. It can be called at any time for a snapshot of the whole library or of a subtemplate.

CGenerator is probably not very efficient, so it may not be useful with large amounts of C code.

Library#commit will try to commit even if already committed (in which case it raises a CommitError) or if the lib is empty. Use #committed? and #empty? to check for these cases. (Should these checks, or just the latter, be automatic?)

Library#commit first reads the .c and .h file and checks for changes. If there are none, it doesn’t write to the file. If neither file gets written to, make won’t need to compile them…

CGenerator generates header file entries for any non-static functions or data. This can be used for communication between files in the library without using Ruby calls, and to provide an API for other C libraries and executables.

Accumulator#inspect is a nice hierarchy-aware inspector.

To do

Automatically generate inner and outer functions as in Michael Neumann’s cplusruby. Similarly, should there be another field to refer to a struct of function pointers, so that C code can call the inner functions without funcall?

Try CFLAGS for optimization: with gcc, -march, -O3, -fomit-frame-pointer

Rename to something less generic (cgen -> “sagehen”?)

Option to target ruby/ext dir: generate MANIFEST file, but no Makefile. What about depend? how to generate it in installation-independent format?

Let user set dir to build in, rather than rely on chdir, which is not thread safe.

Instead of using an external program to makedepend, do it manually based on include operations? (Might have to do this anyway on mswin.)

Investigate Tiny C: Linux only, but fast, and libtcc allows dynamic codegen. Maybe best used in “develop” mode, rather than for production code (no -O).

Option in define_c_method to make method private/protected (see rb_define_private/protected_method in intern.h).

Optimization: declare_symbol and declare_module should do less work if the declaration has already been done.

Macros, e.g. something for rb_funcall that does the declare_class for you.

Extend c_array_args features to rb_array_args and fixed length arglists.

Exception if modify descendant of Library after committed. Accumulators always notify parent before admitting any changes.

Freeze data structures after commit?

More wrappers: define_class, globals (in C and in Ruby), funcalls, iterators, argument type conversion, etc. Really, all of Chapter 17 of Thomas & Hunt.

Make commit happen automatically when the first call is made to a method in the library. (Use alias, maybe, since method_missing won’t work: it won’t let you override.)

Finer granularity in accumulators. For example, #init could take a (lvalue, rvalue) pair, which would allow it to detect initialization of the same var with different values.

make this into Inline for ruby:

Module#define_c_method (“name”) { … }

(use instance_eval, so that accumulators can be used in the block?) The main drawback is that no Library is specified, so where does it go? (Actually, CShadow solves this problem, if you don’t mind having a struct as overhead.)

investigate unloading a .so/.dll. Or: maybe rb_define_* can be called again, but in a different library (append version number to the lib name, but not to the dir name). See Ruby/DL in RAA. (See dln.c, eval.c, ruby.c in ruby source. It all seems possible, but a bit of work.)

parser/generator for (at first) simple ruby code, like ‘@x.y’: one option would be to use init to define a Proc and use setup to call the Proc

check ANSI, check w/ other compilers

Improve space recognition in the tab routines (check for \t and handle intelligently, etc.).

Formalize the relation between templates and accumulators. Make it easier for client code to use its own templates and accumulators.

Double-ended accumulators (add_at_end vs. add_at_beginning, or push vs. unshift).

Automatically load/link other dynamic or static libs in the same dir. For static, use ‘have_library’ in #extconf; see p.185 of pickaxe. For dynamic, use Ruby/DL from RAA, or just require. (Currently, this can be done manually by subclassing and overriding extconf.)

More thorough checking for assert_uncommitted. Currently, just a few top-level methods (Library#commit and some of the define methods) check if the library has already been committed. Ideally, #commit would freeze all accumulators. But then the problem is how to report a freeze exception in a way that makes clear that the problem is really with commit.

Defined Under Namespace

Modules: KeyAccumulator, SetAccumulator

Classes: Accumulator, CFile, CFragment, Function, GlobalFunction, Library, Method, MethodPrototype, ModuleFunction, Prototype, RubyFunction, SingletonMethod, Structure, Template

Constant Summary

VERSION = '0.16.11'

OpName = {
  '<'   => :op_lt,
  '<='  => :op_le,
  '>'   => :op_gt,
  '>='  => :op_ge,
  '=='  => :op_eqeq
}

Class Method Summary

.make_c_name(s) ⇒ Object

Generates a unique C identifier from the given Ruby identifier, which may include /[@$?!]/, ‘::’, and even ‘.’.

.translate_ruby_identifier(s) ⇒ Object

Class Method Details

.make_c_name(s) ⇒ Object

Generates a unique C identifier from the given Ruby identifier, which may include /[@$?!]/, ‘::’, and even ‘.’. (Some special globals are not yet supported: $: and $-I, for example.)

It is unique in the sense that distinct Ruby identifiers map to distinct C identifiers. (Not completely checked. Might fail for some really obscure cases.)

# File 'lib/cgen/cgen.rb', line 2009

def CGenerator.make_c_name s
  s = s.to_s
  OpName[s] || translate_ruby_identifier(s)
end

.translate_ruby_identifier(s) ⇒ Object

# File 'lib/cgen/cgen.rb', line 2014

def CGenerator.translate_ruby_identifier(s)
  # For uniqueness, we use a single '_' to indicate our subs
  # and translate pre-existing '_' to '__'
  # It should be possible to write another method which 
  # converts the output back to the original.
  c_name = s.gsub(/_/, '__')


  # Ruby identifiers can include prefix $, @, or @@, or suffix ?, !, or =
  # and they can be [] or []=
  c_name.gsub!(/\$/, 'global_')
  c_name.gsub!(/@/, 'attr_')
  c_name.gsub!(/\?/, '_query')
  c_name.gsub!(/!/, '_bang')
  c_name.gsub!(/=/, '_equals')
  c_name.gsub!(/::/, '_')
  c_name.gsub!(/\[\]/, '_brackets')

  # so that some Ruby expressions can be associated with a name,
  # we allow '.' in the str. Eventually, handle more Ruby exprs.
  c_name.gsub!(/\./, '_dot_')

  # we should also make an attempt to encode special globals
  # like $: and $-I

  unless c_name =~ /\A[A-Za-z_]\w*\z/
    raise SyntaxError,
      "Cgen's encoding cannot handle #{s.inspect}; " +
      "best try is #{c_name.inspect}."
  end

  c_name.intern
end
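To see what the translation produces without loading cgen, the substitutions listed above can be replayed in a standalone sketch. The names sketch_c_name, OP_NAMES, and SUBSTITUTIONS are hypothetical; the table copies the patterns and their order from the listing above, and the sketch makes no claim about behavior beyond those lines.

```ruby
# Standalone replay of the substitutions shown above (hypothetical names;
# this is not the library's own method).
OP_NAMES = {
  '<' => :op_lt, '<=' => :op_le, '>' => :op_gt, '>=' => :op_ge, '==' => :op_eqeq
}

# Applied in this order: existing '_' is doubled first, so a single '_'
# in the result always marks a substitution.
SUBSTITUTIONS = [
  [/_/, '__'], [/\$/, 'global_'], [/@/, 'attr_'], [/\?/, '_query'],
  [/!/, '_bang'], [/=/, '_equals'], [/::/, '_'], [/\[\]/, '_brackets'],
  [/\./, '_dot_']
]

def sketch_c_name(ruby_name)
  s = ruby_name.to_s
  return OP_NAMES[s] if OP_NAMES.key?(s)

  c_name = SUBSTITUTIONS.inject(s) { |acc, (pattern, repl)| acc.gsub(pattern, repl) }
  unless c_name =~ /\A[A-Za-z_]\w*\z/
    raise SyntaxError, "cannot encode #{s.inspect}; best try is #{c_name.inspect}"
  end
  c_name.intern
end

%w{ empty? save! name= @x $stdout my_var A::B []= @x.y <= }.each do |name|
  puts "#{name.ljust(8)} -> #{sketch_c_name(name)}"
end

# Special globals such as $: are rejected, as the comments above note.
begin
  sketch_c_name("$:")
rescue SyntaxError => e
  puts "rejected: #{e.message}"
end
```

For instance, my_var becomes my__var while A::B becomes A_B, so the two kinds of underscore remain distinguishable in the generated C identifier.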
