Bilingual Programming: Ruby and C
David 2020-08-05
I really like the C programming language, probably for silly reasons. It was one of the first languages I really learned vs. merely used because it was what I was taught in my first year of university. Since then, I've used it for a number of little projects when I want to write relatively "low-level" code. Rust is a much nicer language, but I have a weird affinity for the, for lack of a better word, crappiness of C. That being said, on the other end of the spectrum, I also really enjoy using Ruby. Today I'm going to talk about combining C and Ruby, for fun and very little profit.
I recently wrote a small utility called "dym" (for "Did You Mean...?") that
does fuzzy-matching on strings. I wanted something that would suggest what you
might have meant if you typed a command incorrectly when using shell scripts.
The utility would take the list of all valid commands and tell you which one
was "closest" to your mistyped command. I wrote dym in C because it mostly
"thinks" in terms of bytes and C tends to be relatively good at dealing with
bytes (though I admit I had some issues with array bounds and memory,
initially). Recently I extracted some of the main functionality (e.g. finding
the Damerau-Levenshtein edit distance between two strings) into a small
libdym.a library so that it could be reused elsewhere. As an experiment, I
decided to see if I could make it work in Ruby.
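To make "edit distance" concrete before any C gets involved, here is a minimal pure-Ruby sketch of the restricted form of Damerau-Levenshtein (often called optimal string alignment). It is only an illustration: osa_distance is a hypothetical name, it works on characters rather than bytes, and it is not how libdym implements dym_dl_edist.

```ruby
# Hypothetical helper, not part of dym: restricted Damerau-Levenshtein
# (optimal string alignment) distance between two strings.
def osa_distance(a, b)
  a = a.chars
  b = b.chars

  # d[i][j] holds the distance between the first i chars of a
  # and the first j chars of b.
  d = Array.new(a.length + 1) { Array.new(b.length + 1, 0) }
  (0..a.length).each { |i| d[i][0] = i }
  (0..b.length).each { |j| d[0][j] = j }

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [
        d[i - 1][j] + 1,        # deletion
        d[i][j - 1] + 1,        # insertion
        d[i - 1][j - 1] + cost  # substitution (or match)
      ].min

      # Swapping two adjacent characters counts as a single edit.
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end

  d[a.length][b.length]
end

puts osa_distance('test', 'tsey') # prints 2
```

A plain Levenshtein distance would charge two edits for every swapped pair, which is why the transposition rule matters for typos.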
Ruby has a C API that can be used to create "extensions" to Ruby in C. I
figured dym was a good candidate to test this with, since it really only needs
to export one fairly simple function. What I want to do is to implement, in
Ruby, the ability to calculate the edit distance between two strings using
libdym. The first thing I did, as you might expect, was search for how to write
a Ruby C extension. I found a number of good articles, and at that point I was
pretty much ready to start.
The first thing we need to do is decide how we want Ruby to use our C code. I
want to add an edit_distance method to Ruby's built-in String class (which is
easy to do since Ruby is extremely flexible). I just want to use the
Damerau-Levenshtein edit distance, rather than allowing the algorithm to be
selected, in the spirit of making things "simple," in the same way that Ruby's
Set data structure picks a reasonable algorithm for you. So, now that we know
what we want, we can start working on our extension.
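For anyone who hasn't seen it, "adding a method to String" just means reopening the class. Here is a tiny sketch of the idea, using a made-up vowel_count method (purely hypothetical, nothing to do with dym) so it runs without any extension:

```ruby
# Reopen Ruby's built-in String class and add a new instance method.
# vowel_count is a hypothetical example method.
class String
  def vowel_count
    # String#count with a character set counts matching characters.
    downcase.count('aeiou')
  end
end

puts 'bilingual'.vowel_count # prints 4
```

Every existing and future string picks up the new method immediately, which is exactly the behaviour I want for edit_distance.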
Before we write any new C code, we need to create an extconf.rb file, which
serves as something like a "pre-Makefile" for the extension. Our version of
this file looks like the following (annotations added):
# Require the library used to generate Makefiles
require 'mkmf'
# Make sure we link libdym and make sure that has dym_dl_edist() in it
have_library 'dym', 'dym_dl_edist'
# Create a Makefile for the C extension
create_makefile 'dym'
That's it for the basic version of that file. Now we need to create the actual
C extension. Let's build that up in stages. The first thing we need is an
"init" function. The init function needs to be named Init_EXTNAME, where
EXTNAME is the name of our extension ("dym" in this case), because that is the
symbol Ruby looks up when it loads the compiled library. Within this function
we create a Ruby module called "DYM" and we give it an edist method that will
calculate the Damerau-Levenshtein edit distance.
#include <stdlib.h> /* calloc, free */
#include <string.h> /* strncpy */
#include "ruby.h"
#include "dym.h"    /* assumed name of libdym's header (declares dym_dl_edist) */

/* Forward declaration of the wrapper function (which we will write next) */
static VALUE rbdym_dl_edist(VALUE self, VALUE s1, VALUE s2);

void Init_dym(void)
{
    /* VALUE is Ruby's object type */
    VALUE mod = rb_define_module("DYM");

    /*
     * Create a method in the module "mod", with the name "edist", which
     * calls the C function rbdym_dl_edist (which we will write next), and
     * takes 2 arguments.
     */
    rb_define_method(mod, "edist", rbdym_dl_edist, 2);
}
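If the C API calls feel opaque, it may help to see roughly what they correspond to on the Ruby side. The sketch below is my own approximation, not generated code: the body is a placeholder, and Probe is a hypothetical class that exists only to show the include.

```ruby
# Rough Ruby equivalent of the init function: a module named DYM with an
# instance method edist that takes two arguments.
module DYM
  def edist(s1, s2)
    # Placeholder: in the real extension this is where the C code runs.
    raise NotImplementedError, 'implemented in C by the extension'
  end
end

# Hypothetical class, only here to demonstrate mixing the module in.
class Probe
  include DYM
end

puts DYM.instance_method(:edist).arity # prints 2
puts Probe.new.respond_to?(:edist)     # prints true
puts DYM.respond_to?(:edist)           # prints false
```

The last line is the part worth noticing: rb_define_method creates an instance method, so edist is only callable on objects whose class includes DYM, not on the module itself.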
That's all we need to tell Ruby how to use our module. Now we need to write the function that wraps our C code such that Ruby can understand it.
/* Forward declaration of the string conversion helper defined below */
static char *rstr2cstr(VALUE str);

static VALUE rbdym_dl_edist(VALUE self, VALUE s1, VALUE s2)
{
    int dist;
    char *cstr1;
    char *cstr2;
    VALUE rb_dist;

    /* Make sure that both arguments are strings */
    if (!RB_TYPE_P(s1, T_STRING) || !RB_TYPE_P(s2, T_STRING)) {
        return Qnil;
    }

    /*
     * Convert the Ruby strings into C strings (this isn't very efficient,
     * but the library interface is what it is right now).
     */
    cstr1 = rstr2cstr(s1);
    cstr2 = rstr2cstr(s2);
    if (cstr1 == NULL || cstr2 == NULL) {
        /* free(NULL) is a no-op, so this is safe if only one failed */
        free(cstr1);
        free(cstr2);
        return Qnil;
    }

    /*
     * Calculate the Damerau-Levenshtein edit distance between the two
     * strings, as an integer, and convert it to a numeric Ruby value.
     */
    dist = dym_dl_edist(cstr1, cstr2);
    rb_dist = INT2NUM(dist);

    /* Clean up our dynamically-allocated C strings */
    free(cstr2);
    free(cstr1);

    /* Return the edit distance as a Ruby value */
    return rb_dist;
}

/* Convert a Ruby string to a (dynamically-allocated) C string */
static char *rstr2cstr(VALUE str)
{
    size_t len;
    char *cstr;

    if (!RB_TYPE_P(str, T_STRING)) {
        return NULL;
    }

    /* RSTRING_LEN is the length in bytes, not characters */
    len = RSTRING_LEN(str);

    /* calloc zero-fills, so the copy below is always NUL-terminated */
    cstr = calloc(len + 1, 1);
    if (cstr == NULL) {
        return NULL;
    }

    strncpy(cstr, RSTRING_PTR(str), len);
    return cstr;
}
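One consequence of handing Ruby strings to C this way is worth spelling out: RSTRING_LEN counts bytes, and C string functions stop at the first NUL byte. A small Ruby sketch shows what the C side would see (c_string_view is a hypothetical helper of mine, not part of dym):

```ruby
# Hypothetical helper: the part of a Ruby string that a NUL-terminated
# C string would expose, i.e. the bytes up to the first NUL.
def c_string_view(str)
  # String#b gives a binary copy; 'Z*' unpacks up to the first NUL byte.
  str.b.unpack1('Z*')
end

word = "na\u00EFve"    # "naive" with a two-byte i-diaeresis in UTF-8
puts word.length       # prints 5 (characters)
puts word.bytesize     # prints 6 (bytes, which is what RSTRING_LEN reports)

puts c_string_view("ab\0cd") # prints ab
```

So, assuming the library compares byte by byte, a single accented character can count as more than one edit, and anything after an embedded NUL is invisible to the distance calculation.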
And that's it! Our C extension is done. To get this into a form where Ruby can use it, we now need to run the following commands:
$ ruby extconf.rb
$ make
To avoid retyping those commands every time, though, I wrote up a Rakefile to run them:
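task default: ['dym.so']

# Regenerate the Makefile whenever extconf.rb changes
file 'Makefile' => ['extconf.rb'] do
  # sh echoes the command and fails the task if the command fails
  sh 'ruby extconf.rb'
end

file 'dym.so' => ['Makefile', 'dym.c'] do
  sh 'make'
end

task :clean do
  rm_f 'dym.o'
  rm_f 'dym.so'
  rm_f 'mkmf.log'
  rm_f 'Makefile'
end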
Now we can just run rake to prepare the extension.
The final thing that we need to do is to write the Ruby code that actually uses the C extension. That code lives in rbdym.rb and looks like the following:
# Load the compiled C extension (dym.so)
require './dym'

class String
  include DYM

  def edit_distance(str)
    edist(self, str)
  end

  def closest_match(strings)
    strings = strings.sort
    # Only accept candidates whose distance is below our own length
    closest_dist = self.length
    closest = nil
    strings.each do |str|
      dist = edist self, str
      if dist < closest_dist
        closest_dist = dist
        closest = str
      end
    end
    closest
  end
end
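The loop in closest_match can also be written as "score everything, take the minimum." Here is a sketch of that shape which runs without the compiled extension. Both names are hypothetical: closest_candidate takes the distance function as an argument, and positional_distance is a crude stand-in (it just counts mismatched positions), not the Damerau-Levenshtein distance.

```ruby
# Hypothetical helper: pick the candidate closest to `word`.
# `distance` is any callable taking two strings and returning an Integer.
def closest_candidate(word, candidates, distance)
  return nil if candidates.empty?

  # Pair each candidate with its score; arrays compare element by element,
  # so ties on distance fall back to alphabetical order.
  best_dist, best = candidates.map { |c| [distance.call(word, c), c] }.min

  # Same cutoff as closest_match: reject anything at least as far away
  # as the word is long.
  best_dist < word.length ? best : nil
end

# Stand-in distance for the demo only: mismatched positions plus any
# extra length in the second string.
def positional_distance(a, b)
  mismatches = a.chars.zip(b.chars).count { |x, y| x != y }
  mismatches + [b.length - a.length, 0].max
end

puts closest_candidate('statsu', %w[status stash commit],
                       method(:positional_distance)) # prints status
```

With the extension loaded, the callable could simply wrap edit_distance instead of the stand-in.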
We can now use our C extension! This is what it's all been leading up to:
$ irb
irb(main):001:0> require './rbdym'
irb(main):002:0> s = 'test'
irb(main):003:0> s.edit_distance 'tsey'
=> 2
This edit distance of 2, by the way, is the result of swapping the adjacent e and s (one transposition) and changing the last letter (one substitution).
So that's it; we've successfully built a Ruby extension in C. We're officially bilingual!