Invisible Unicorns

Bilingual Programming: Ruby and C

I really like the C programming language, probably for silly reasons. It was one of the first languages I really learned, as opposed to merely used, because it was what I was taught in my first year of university. Since then, I've used it for a number of little projects when I want to write relatively "low-level" code. Rust is a much nicer language, but I have a weird affinity for the, for lack of a better word, crappiness of C. That being said, at the other end of the spectrum, I also really enjoy using Ruby. Today I'm going to talk about combining C and Ruby, for fun and very little profit.

I recently wrote a small utility called "dym" (for "Did You Mean...?") that does fuzzy matching on strings. I wanted something that would suggest what you might have meant if you mistyped a command in a shell script. The utility takes the list of all valid commands and tells you which one is "closest" to your mistyped command. I wrote dym in C because it mostly "thinks" in terms of bytes, and C tends to be relatively good at dealing with bytes (though I admit I initially had some issues with array bounds and memory). Recently I extracted some of the main functionality (e.g., finding the Damerau-Levenshtein edit distance between two strings) into a small static library, libdym.a, because I wanted to be able to use it elsewhere. As an experiment, I decided to see if I could make it work in Ruby.
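libdym's own dym_dl_edist() isn't shown in this post. To give a rough idea of what that kind of function computes, here is a minimal sketch of one common variant: the optimal string alignment (OSA) distance, sometimes called restricted Damerau-Levenshtein. It uses the usual dynamic-programming table. Insertions, deletions, substitutions, and swaps of two adjacent bytes each count as one edit. The name osa_edist and its -1 error return are hypothetical. libdym may implement the unrestricted Damerau-Levenshtein distance instead, which can give smaller results for some inputs: for "ca" vs. "abc", OSA gives 3 while the unrestricted version gives 2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three ints */
static int min3(int a, int b, int c)
{
	int m = a < b ? a : b;
	return m < c ? m : c;
}

/*
 * Hypothetical sketch: optimal string alignment distance (restricted
 * Damerau-Levenshtein) between two NUL-terminated byte strings.
 * Insertions, deletions, substitutions, and swaps of two adjacent
 * bytes each cost 1. Returns -1 if memory allocation fails.
 */
int osa_edist(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	size_t n = strlen(a), m = strlen(b);
	size_t cols = m + 1;

	/*
	 * (n + 1) x (m + 1) table, row-major: d[i * cols + j] is the distance
	 * between the first i bytes of a and the first j bytes of b.
	 */
	int *d = malloc((n + 1) * cols * sizeof *d);
	if (d == NULL)
		return -1;

	for (size_t i = 0; i <= n; i++)
		d[i * cols] = (int)i;      /* delete all i bytes of a */
	for (size_t j = 0; j <= m; j++)
		d[j] = (int)j;             /* insert all j bytes of b */

	for (size_t i = 1; i <= n; i++) {
		for (size_t j = 1; j <= m; j++) {
			int cost = a[i - 1] == b[j - 1] ? 0 : 1;
			int best = min3(d[(i - 1) * cols + j] + 1,          /* deletion */
			                d[i * cols + (j - 1)] + 1,          /* insertion */
			                d[(i - 1) * cols + (j - 1)] + cost); /* substitution */

			/* Transposition: a ends in "xy" where b ends in "yx" */
			if (i > 1 && j > 1 &&
			    a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
				int t = d[(i - 2) * cols + (j - 2)] + 1;
				if (t < best)
					best = t;
			}
			d[i * cols + j] = best;
		}
	}

	int result = d[n * cols + m];
	free(d);
	return result;
}
```

The full table uses memory proportional to the product of the two lengths. That is harmless for command names. A version that only keeps the last three rows would be leaner if that ever mattered.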

Ruby provides a C API that can be used to write "extensions" to Ruby in C. I figured dym was a good candidate to test this with, since it really only needs to export one fairly simple function. What I want to do is implement, in Ruby, the ability to calculate the edit distance between two strings using libdym. The first thing I did, as you might expect, was search for how to write a Ruby C extension. I found a number of good articles, and after reading them I was ready to start.

The first thing we need to do is decide how we want Ruby to use our C code. I want to add an edit_distance method to Ruby's built-in String class (which is easy to do, since Ruby lets you reopen any class, even built-in ones). In the spirit of keeping things "simple," I'm only going to expose the Damerau-Levenshtein edit distance rather than letting the caller pick an algorithm, in the same way that Ruby's Set picks a reasonable implementation for you. Now that we know what we want, we can start working on our extension.

Before we write any new C code, we need to create an extconf.rb file which serves as something like a "pre-Makefile" for the extension. Our version of this file looks like the following (annotations added):

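```ruby
# Require the library used to generate Makefiles
require 'mkmf'

# Link against libdym, checking that it provides dym_dl_edist();
# stop with an error instead of generating a broken Makefile
have_library('dym', 'dym_dl_edist') or abort 'libdym (with dym_dl_edist) not found'

# Create a Makefile for the C extension
create_makefile 'dym'
```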

That's it for the basic version of that file. Now we need to create the actual C extension (dym.c). Let's build it up in stages. The first thing we need is an "init" function, which Ruby calls when the extension is loaded. It must be named Init_EXTNAME, where EXTNAME is the name of our extension ("dym" in this case), so ours is Init_dym. Within this function we create a Ruby module called "DYM" and give it an edist method that will calculate the Damerau-Levenshtein edit distance.

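```c
#include <stdlib.h>
#include <string.h>
#include "ruby.h"

/* Defined below; declared here so Init_dym can refer to it */
static VALUE rbdym_dl_edist(VALUE self, VALUE s1, VALUE s2);

void Init_dym(void)
{
	/* VALUE is Ruby's object type */
	VALUE mod = rb_define_module("DYM");

	/*
	 * Create a method in the module "mod", with the name "edist", which
	 * calls the C function rbdym_dl_edist (which we will write next), and
	 * takes 2 arguments.
	 */
	rb_define_method(mod, "edist", rbdym_dl_edist, 2);
}
```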

That's all we need to tell Ruby how to use our module. Now we need to write the function that wraps our C code such that Ruby can understand it.

```c
/* Defined below */
static char *rstr2cstr(VALUE str);

static VALUE rbdym_dl_edist(VALUE self, VALUE s1, VALUE s2)
{
	int dist;
	char *cstr1;
	char *cstr2;
	VALUE rb_dist;

	/* Make sure that both arguments are strings */
	if (!RB_TYPE_P(s1, T_STRING) || !RB_TYPE_P(s2, T_STRING)) {
		return Qnil;
	}

	/*
	 * Convert the Ruby strings into C strings (this isn't very efficient,
	 * but the library interface is what it is right now).
	 */
	cstr1 = rstr2cstr(s1);
	cstr2 = rstr2cstr(s2);

	if (cstr1 == NULL || cstr2 == NULL) {
		/* free(NULL) is a no-op, so this is safe if only one failed */
		free(cstr1);
		free(cstr2);
		return Qnil;
	}

	/*
	 * Calculate the Damerau-Levenshtein edit distance between the two
	 * strings, as an integer, and convert it to a numeric Ruby value.
	 */
	dist = dym_dl_edist(cstr1, cstr2);
	rb_dist = INT2NUM(dist);

	/* Clean up our dynamically-allocated C strings */
	free(cstr1);
	free(cstr2);

	/* Return the edit distance as a Ruby value */
	return rb_dist;
}
```

```c
/* Convert a Ruby string to a (dynamically-allocated) C string */
static char *rstr2cstr(VALUE str)
{
	size_t len;
	char *cstr;

	if (!RB_TYPE_P(str, T_STRING)) {
		return NULL;
	}

	len = RSTRING_LEN(str);
	/* len + 1 zeroed bytes, so the copy is always NUL-terminated */
	cstr = calloc(len + 1, 1);
	if (cstr == NULL) {
		return NULL;
	}
	strncpy(cstr, RSTRING_PTR(str), len);

	return cstr;
}
```
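Ruby strings carry an explicit length and may contain NUL bytes, which is why rstr2cstr() copies into a zeroed buffer of len + 1 bytes instead of trusting RSTRING_PTR() to be a C string. The copying step doesn't depend on Ruby, so here it is pulled out into a stand-alone sketch that can be tested on its own. The function name counted_to_cstr is hypothetical. The test shows one consequence worth knowing: strncpy() stops at the first NUL byte, so a Ruby string with an embedded "\0" is silently truncated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, Ruby-free version of the copying step in rstr2cstr():
 * turn a length-counted buffer (what RSTRING_PTR()/RSTRING_LEN() hand
 * you) into a newly allocated, NUL-terminated C string. The caller
 * must free() the result. Returns NULL if allocation fails.
 */
char *counted_to_cstr(const char *ptr, size_t len)
{
	assert(ptr != NULL);

	/* calloc zeroes all len + 1 bytes, so the result is always terminated */
	char *cstr = calloc(len + 1, 1);
	if (cstr == NULL)
		return NULL;

	/* strncpy stops early at a NUL byte in ptr; later bytes stay zero */
	strncpy(cstr, ptr, len);
	return cstr;
}
```

If truncation matters, Ruby's StringValueCStr() macro could be an alternative. It hands back a pointer into the Ruby string itself, so there is no copy to free, and it raises ArgumentError when the string contains a NUL byte.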

And that's it! Our C extension is done. To get this into a form where Ruby can use it, we now need to run the following commands:

```sh
$ ruby extconf.rb
$ make
```

To avoid typing those commands every time, I wrote a Rakefile that runs them:

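```ruby
require 'rbconfig'

# The compiled extension: dym.so on Linux, dym.bundle on macOS, etc.
EXT = "dym.#{RbConfig::CONFIG['DLEXT']}"

task default: [EXT]

# Regenerate the Makefile whenever extconf.rb changes
file 'Makefile' => ['extconf.rb'] do
  ruby 'extconf.rb'
end

# Rebuild the extension whenever the Makefile or the C source changes
file EXT => ['Makefile', 'dym.c'] do
  sh 'make'
end

task :clean do
  rm_f 'dym.o'
  rm_f EXT
  rm_f 'mkmf.log'
  rm_f 'Makefile'
end
```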

Now we can just run rake to prepare the extension.

The final thing we need to do is write the Ruby code (I put it in rbdym.rb) that actually uses the C extension. It looks like this:

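```ruby
require './dym'

class String
  include DYM

  # Damerau-Levenshtein edit distance between this string and str
  def edit_distance(str)
    edist(self, str)
  end

  # Return the string in strings closest to this one, or nil if none is
  # fewer than self.length edits away
  def closest_match(strings)
    # Sort so that ties go to the alphabetically first candidate
    strings = strings.sort
    closest_dist = self.length
    closest = nil
    strings.each do |str|
      dist = edist self, str
      if dist < closest_dist
        closest_dist = dist
        closest = str
      end
    end
    closest
  end
end
```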

We can now use our C extension! This is what it's all been leading up to:

```
$ irb
irb(main):001:0> require './rbdym'
irb(main):002:0> s = 'test'
irb(main):003:0> s.edit_distance 'tsey'
=> 2
```

This edit distance of 2, by the way, is the result of changing the last letter and swapping e/s.
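Spelled out step by step, that's one transposition followed by one substitution:

```latex
\begin{aligned}
\texttt{test} &\to \texttt{tset} && \text{transpose the adjacent } \texttt{e}, \texttt{s} \quad (+1)\\
\texttt{tset} &\to \texttt{tsey} && \text{substitute } \texttt{t} \to \texttt{y} \quad (+1)\\
d_{\mathrm{DL}}(\texttt{test}, \texttt{tsey}) &= 1 + 1 = 2
\end{aligned}
```

Plain Levenshtein distance has no transposition operation, so the same pair costs 3 there: two substitutions for the e/s swap plus one for t/y. That single-edit treatment of swapped letters is exactly what makes Damerau-Levenshtein a good fit for typos.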

So that's it; we've successfully built a Ruby extension in C. We're officially bilingual!