Invisible Unicorns

Bilingual Programming: Ruby and C

I really like the C programming language, probably for silly reasons. It was one of the first languages I really learned (as opposed to merely used), because it was what I was taught in my first year of university. Since then, I've used it for a number of little projects whenever I want to write relatively "low-level" code. Rust is a much nicer language, but I have a weird affinity for the, for lack of a better word, crappiness of C. At the other end of the spectrum, I also really enjoy using Ruby. Today I'm going to talk about combining C and Ruby, for fun and very little profit.

I recently wrote a small utility called "dym" (for "Did You Mean...?") that does fuzzy matching on strings. I wanted something for shell scripts that would suggest what you probably meant when you mistyped a command: given the list of all valid commands, it tells you which one is "closest" to what you typed. I wrote dym in C because it mostly "thinks" in terms of bytes, and C is relatively good at dealing with bytes (though I admit I initially had some issues with array bounds and memory). Recently I extracted the core functionality (e.g. computing the Damerau-Levenshtein edit distance between two strings) into a small static library, libdym.a, so that it could be reused elsewhere. As an experiment, I decided to see if I could make it work in Ruby.
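libdym's internals aren't part of this post, so as a point of reference, here is a minimal pure-Ruby sketch of the restricted form of Damerau-Levenshtein distance (often called "optimal string alignment"). The method name osa_distance is hypothetical, and it is an assumption that this is the variant dym_dl_edist computes; libdym may implement the unrestricted variant, which gives a smaller answer for some inputs (e.g. "ca" vs "abc" is 3 here but 2 under unrestricted Damerau-Levenshtein). Like dym, it compares bytes rather than characters.

```ruby
# Restricted Damerau-Levenshtein ("optimal string alignment") distance.
# Illustrative sketch only: this is NOT libdym's implementation.
# Works on bytes, like dym, rather than on characters.
def osa_distance(s1, s2)
  a = s1.bytes
  b = s2.bytes
  # d[i][j] = distance between the first i bytes of a and the first j bytes of b
  d = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (0..a.size).each { |i| d[i][0] = i } # delete all i bytes
  (0..b.size).each { |j| d[0][j] = j } # insert all j bytes

  (1..a.size).each do |i|
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [
        d[i - 1][j] + 1,        # deletion
        d[i][j - 1] + 1,        # insertion
        d[i - 1][j - 1] + cost  # substitution (or match)
      ].min
      # Swapping two adjacent bytes counts as a single edit
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end
  d[a.size][b.size]
end

puts osa_distance('test', 'tsey') # prints 2
```

The full table costs O(len1 × len2) memory; keeping only the last three rows would be enough, but the full table is easier to read.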

Ruby provides a C API that can be used to write "extensions" to Ruby in C. I figured dym was a good candidate to test this with, since it really only needs to export one fairly simple function. What I want is to be able to calculate, from Ruby, the edit distance between two strings using libdym. The first thing I did, as you might expect, was search for how to write a Ruby C extension. I found a number of good articles, and after reading them I was ready to start.

The first thing we need to do is decide how Ruby should use our C code. I want to add an edit_distance method to Ruby's built-in String class, which is easy to do because Ruby lets you reopen any class, even built-in ones. In the spirit of keeping things simple, the method will always use the Damerau-Levenshtein edit distance rather than letting the caller choose an algorithm, in the same way that Ruby's Set data structure picks a reasonable implementation for you. Now that we know what we want, we can start working on our extension.
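Reopening a built-in class and mixing a module into it takes only a few lines. Here is a minimal sketch of that mechanism, using a hypothetical Shout module in place of the DYM module the extension will define:

```ruby
# Hypothetical module standing in for the DYM module the C extension defines
module Shout
  # A module's instance methods become instance methods of any class
  # that includes it; inside the method, self is the receiving String.
  def shout
    upcase + '!'
  end
end

# Reopen the built-in String class and mix the module in
class String
  include Shout
end

puts 'hello'.shout # prints "HELLO!"
```

This is the same pattern the final Ruby code uses with include DYM.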

Before we write any new C code, we need to create an extconf.rb file, which serves as a kind of "pre-Makefile" for the extension. Our version of this file looks like this (annotations added):

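# Require mkmf, Ruby's library for generating Makefiles
require 'mkmf'

# Allow --with-dym-dir=PATH (or --with-dym-include/--with-dym-lib) to point
# at libdym when it isn't installed where the compiler already looks
dir_config 'dym'

# Link against libdym and check that it provides dym_dl_edist();
# stop with an error rather than generate a Makefile that can't link
have_library('dym', 'dym_dl_edist') or abort 'libdym with dym_dl_edist() not found'

# Create a Makefile for the C extension named "dym"
create_makefile 'dym'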

That's it for the basic version of that file. Now we need to create the actual C extension, dym.c, which we'll build up in stages. The first thing we need is an "init" function, which Ruby calls when the extension is first loaded. It must be named Init_EXTNAME, where EXTNAME is the name of our extension ("dym" in this case), so ours is Init_dym. Within this function we create a Ruby module called "DYM" and give it an edist method that calculates the Damerau-Levenshtein edit distance.

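#include <stdlib.h>  /* calloc, free */
#include <string.h>  /* memcpy */
#include "ruby.h"

/*
 * libdym's edit distance function, declared by hand here; the exact
 * signature is an assumption based on how it is called below.
 */
int dym_dl_edist(const char *s1, const char *s2);

/* Forward declarations for the functions defined further down in dym.c */
static VALUE rbdym_dl_edist(VALUE self, VALUE s1, VALUE s2);
static char *rstr2cstr(VALUE str);

/* Ruby calls Init_<extension name> when the extension is first required */
void Init_dym(void)
{
	/* VALUE is Ruby's object type */
	VALUE mod = rb_define_module("DYM");

	/*
	 * Create a method in the module "mod", with the name "edist", which
	 * calls the C function rbdym_dl_edist (which we will write next), and
	 * takes 2 arguments.
	 */
	rb_define_method(mod, "edist", RUBY_METHOD_FUNC(rbdym_dl_edist), 2);
}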

That's all Ruby needs to know about our module. Next, we write the function that wraps our C code so that Ruby can call it.

/* Ruby method DYM#edist(s1, s2): returns the edit distance, or nil */
static VALUE rbdym_dl_edist(VALUE self, VALUE s1, VALUE s2)
{
	int dist;
	char *cstr1;
	char *cstr2;
	VALUE rb_dist;

	/* Return nil unless both arguments are strings */
	if (!RB_TYPE_P(s1, T_STRING) || !RB_TYPE_P(s2, T_STRING)) {
		return Qnil;
	}

	/*
	 * Convert the Ruby strings into C strings (this isn't very efficient,
	 * but the library interface is what it is right now).
	 */
	cstr1 = rstr2cstr(s1);
	cstr2 = rstr2cstr(s2);

	/* If either allocation failed, free the other (free(NULL) is a no-op) */
	if (cstr1 == NULL || cstr2 == NULL) {
		free(cstr1);
		free(cstr2);
		return Qnil;
	}

	/*
	 * Calculate the Damerau-Levenshtein edit distance between the two
	 * strings as an integer, and convert it to a numeric Ruby value.
	 */
	dist = dym_dl_edist(cstr1, cstr2);
	rb_dist = INT2NUM(dist);

	/* Clean up our dynamically-allocated C strings */
	free(cstr2);
	free(cstr1);

	/* Return the edit distance as a Ruby value */
	return rb_dist;
}

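/*
 * Convert a Ruby string to a dynamically-allocated, NUL-terminated C
 * string. Returns NULL if str isn't a String or allocation fails; the
 * caller must free() the result.
 */
static char *rstr2cstr(VALUE str)
{
	size_t len;
	char *cstr;

	if (!RB_TYPE_P(str, T_STRING)) {
		return NULL;
	}

	/* RSTRING_LEN is the length in bytes, not characters */
	len = (size_t)RSTRING_LEN(str);

	/* calloc zero-fills, so the copy below is already NUL-terminated */
	cstr = calloc(len + 1, 1);
	if (cstr == NULL) {
		return NULL;
	}
	memcpy(cstr, RSTRING_PTR(str), len);

	return cstr;
}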
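rstr2cstr copies RSTRING_LEN(str) bytes, so what reaches libdym is the string's raw bytes rather than its characters. For ASCII text the two are the same; for UTF-8 text they differ, so a single accented character can count as more than one edit. Plain Ruby (no extension needed) shows the difference:

```ruby
s = 'héllo'

# String#length counts characters; String#bytesize counts bytes,
# which is what RSTRING_LEN reports on the C side.
puts s.length      # prints 5
puts s.bytesize    # prints 6 ("é" is two bytes in UTF-8)
p s.bytes.first(3) # prints [104, 195, 169]
```

One consequence, assuming libdym works on raw bytes as described earlier: the cutoff that closest_match uses (self.length) counts characters, while the distances it compares against count bytes, and the two only line up for ASCII strings.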

And that's it! Our C extension is done. To compile it into a shared library that Ruby can load, we run the following commands:

$ ruby extconf.rb
$ make

To avoid typing those every time, I wrote a small Rakefile that runs them for me:

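require 'rbconfig'

# Compiled extension name: dym.so on Linux, dym.bundle on macOS
EXT = "dym.#{RbConfig::CONFIG['DLEXT']}"

task default: [EXT]

# Regenerate the Makefile whenever extconf.rb changes
file 'Makefile' => ['extconf.rb'] do
  sh 'ruby extconf.rb' # sh (unlike backticks) fails the task on error
end

# Rebuild the extension whenever the Makefile or the C source changes
file EXT => ['Makefile', 'dym.c'] do
  sh 'make'
end

task :clean do
  rm_f ['dym.o', EXT, 'mkmf.log', 'Makefile']
end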

Now running rake builds the extension (redoing only the steps whose inputs have changed), and rake clean removes the generated files.

Finally, we need the Ruby code that actually uses the C extension. I saved it as rbdym.rb, which is the file the irb session below loads:

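# Load the compiled extension (dym.so, or dym.bundle on macOS)
require './dym'

class String
  # Mix in DYM so that strings respond to edist(s1, s2)
  include DYM

  # Damerau-Levenshtein edit distance between this string and str
  def edit_distance(str)
    edist(self, str)
  end

  # Return the string in `strings` closest to this one, or nil if none is
  # closer than self.length edits. Sorting first means ties go to the
  # alphabetically first candidate.
  def closest_match(strings)
    closest_dist = self.length
    closest = nil
    strings.sort.each do |str|
      dist = edist(self, str)
      if dist < closest_dist
        closest_dist = dist
        closest = str
      end
    end
    closest
  end
end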

We can now use our C extension! This is what it's all been leading up to:

$ irb
irb(main):001:0> require './rbdym'
irb(main):002:0> s = 'test'
irb(main):003:0> s.edit_distance 'tsey'
=> 2

This edit distance of 2, by the way, comes from swapping the adjacent e and s (a transposition, which Damerau-Levenshtein distance counts as a single edit) and changing the final t to y.
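To see why the swap costs one edit rather than two, here is the usual recurrence for the restricted ("optimal string alignment") form of Damerau-Levenshtein distance, where d_{i,j} is the distance between the first i characters of a and the first j characters of b, with d_{i,0} = i and d_{0,j} = j. Whether libdym uses exactly this form is an assumption, but for this pair the unrestricted form gives the same answer, since no single edit turns "test" into "tsey".

$$
d_{i,j} = \min \begin{cases}
d_{i-1,j} + 1 & \text{(deletion)} \\
d_{i,j-1} + 1 & \text{(insertion)} \\
d_{i-1,j-1} + [a_i \neq b_j] & \text{(substitution or match)} \\
d_{i-2,j-2} + 1 & \text{if } i, j > 1,\ a_i = b_{j-1},\ a_{i-1} = b_j \text{ (transposition)}
\end{cases}
$$

For a = "test" and b = "tsey":

$$
\begin{aligned}
d_{1,1} &= 0 && (t = t) \\
d_{3,3} &= d_{1,1} + 1 = 1 && (\text{"es" vs. "se": one transposition}) \\
d_{4,4} &= d_{3,3} + 1 = 2 && (t \neq y)
\end{aligned}
$$

Without the transposition case (plain Levenshtein distance), the same pair comes out as 3: e to s, s to e, and t to y.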

So that's it; we've successfully built a Ruby extension in C. We're officially bilingual!