JSON Parser for MRuby or how to define classes and exceptions in C


MRuby is progressing quite rapidly; it is now possible to implement a real REPL (see GitHub). So it is time to continue my efforts to integrate MRuby into ArangoDB. ArangoDB currently uses JavaScript to implement transactions and to enrich the database with small Actionlets, i.e. functions that take an HTTP request and produce an HTTP response. Additionally, there is a shell that allows you to administer the server.

I want to be able to use MRuby for these jobs as well. The easiest part is the interactive shell. It requires the following components:

  • a JSON parser and stringifier
  • an HTTP client
  • a REPL

MRuby now comes with a simple REPL. However, there are still some issues with parsing and readline support, so for the time being I stick to the approach described in my last post.

In principle, the JSON parser could be implemented in Ruby itself using one of the known JSON packages. As ArangoDB comes with a parser, I will use the built-in one. The same is presumably true for the HTTP client.

In this blog post I want to describe what I’ve done to integrate the JSON parser into MRuby. I hope that will help others to write C/C++ extensions.

The last post describes how to integrate C code into MRuby. Let me go into more detail.

I need a class ArangoJson with a method parse which takes a string as argument, parses the JSON structures and returns a Ruby hash.

  void TRI_InitMRUtils (MR_state_t* mrs) {
    struct RClass *rcl;
 
    rcl = mrb_define_class(&mrs->_mrb, "ArangoJson", mrs->_mrb.object_class);
 
    mrb_define_class_method(&mrs->_mrb, rcl, "parse", MR_JsonParse, ARGS_REQ(1));
  }

mrb_define_class defines a new class called ArangoJson which is a sub-class of Object. mrb_define_class_method defines a new C method in this class which takes 1 argument (but see below).
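For readers more at home in Ruby than in the C API, the two calls correspond roughly to the following Ruby source. This is only a sketch of the Ruby-visible result: the body of parse is a placeholder, since the real work is done by the C function MR_JsonParse.

```ruby
# Ruby-level picture of what the two C calls set up.
class ArangoJson             # mrb_define_class(..., "ArangoJson", object_class)
  def self.parse(text)       # mrb_define_class_method(..., "parse", ..., ARGS_REQ(1))
    # Placeholder body: in the shell this method is backed by MR_JsonParse.
    raise NotImplementedError, "implemented in C"
  end
end

ArangoJson.superclass             # Object
ArangoJson.respond_to?(:parse)    # true
ArangoJson.method(:parse).arity   # 1
```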

  static mrb_value MR_JsonParse (mrb_state* mrb, mrb_value self) {
    char* errmsg;
    char* s;
    size_t l;
    TRI_json_t* json;
 
    mrb_get_args(mrb, "s", &s, &l);
 
    if (s == NULL) {
      return mrb_nil_value();
    }
 
    json = TRI_Json2String(TRI_UNKNOWN_MEM_ZONE, s, &errmsg);
 
    if (json == NULL) {
      ...;
    }
 
    return MR_ObjectJson(mrb, json);
  }

mrb_get_args extracts a string argument from the argument list and stores the result in s. As far as I can see, you do not need to free the returned string.

Everything works as expected:

  fceller@kendenich:~/ArangoDB> ./arangoirb
  arangoirb> ArangoJson.parse '{ "a" : 1 }'
  {"a"=>1.0}

Strangely, if you pass a number instead of a string, the program crashes:

  arangoirb> ArangoJson.parse 1
  Segmentation fault: 11

Similarly, passing no argument does not raise an exception. So I am not sure what the ARGS_REQ parameter actually means. I hope the MRuby team can shed some light on it.
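Until the C side validates its input, one possible workaround is to put a check in front of the method from Ruby. The following is a minimal sketch in standard Ruby, not something I have tried inside MRuby: the ArangoJson class here is a stand-in so the example runs on its own, and the name unchecked_parse is made up for illustration.

```ruby
# Stand-in for the C-backed class, so that the sketch runs in plain Ruby.
class ArangoJson
  def self.parse(text)
    "parsed: #{text}"
  end
end

# Reopen the singleton class and guard the original method.
class << ArangoJson
  # Keep the original implementation reachable under a new (hypothetical) name.
  alias_method :unchecked_parse, :parse

  def parse(*args)
    unless args.length == 1
      raise ArgumentError, "wrong number of arguments (#{args.length} for 1)"
    end

    unless args[0].is_a?(String)
      raise TypeError, "expected String, got #{args[0].class}"
    end

    unchecked_parse(args[0])
  end
end
```

With this in place, a call such as ArangoJson.parse 1 raises a TypeError in Ruby instead of reaching the C function at all.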

The conversion function MR_ObjectJson is straightforward and shows how to use the various data types in MRuby.

  static mrb_value MR_ObjectJson (mrb_state* mrb, TRI_json_t const* json) {
    switch (json->_type) {
      case TRI_JSON_UNUSED:
        return mrb_nil_value();
 
      case TRI_JSON_NULL:
        return mrb_nil_value();
 
      case TRI_JSON_BOOLEAN:
        return json->_value._boolean ? mrb_true_value() : mrb_false_value();
 
      case TRI_JSON_NUMBER:
        return mrb_float_value(json->_value._number);
 
      case TRI_JSON_STRING:
        return mrb_str_new(mrb, json->_value._string.data, json->_value._string.length - 1);
 
      case TRI_JSON_ARRAY: {
        size_t n;
        size_t i;
        mrb_value a;
        TRI_json_t* sub;
        mrb_value key;
        mrb_value val;
 
        n = json->_value._objects._length;
        a = mrb_hash_new_capa(mrb, n);
 
        for (i = 0;  i + 1 < n;  i += 2) {
          sub = (TRI_json_t*) TRI_AtVector(&json->_value._objects, i);
 
          if (sub->_type != TRI_JSON_STRING) {
            continue;
          }
 
          key = mrb_str_new(mrb, sub->_value._string.data, sub->_value._string.length - 1);
          sub = (TRI_json_t*) TRI_AtVector(&json->_value._objects, i + 1);
          val = MR_ObjectJson(mrb, sub);
 
          mrb_hash_set(mrb, a, key, val);
        }
 
        return a;
      }
 
      case TRI_JSON_LIST: {
        size_t n;
        size_t i;
        mrb_value a;
        TRI_json_t* elm;
        mrb_value val;
 
        n = json->_value._objects._length;
        a = mrb_ary_new_capa(mrb, n);
 
        for (i = 0;  i < n;  ++i) {
          elm = (TRI_json_t*) TRI_AtVector(&json->_value._objects, i);
          val = MR_ObjectJson(mrb, elm);
 
          mrb_ary_set(mrb, a, i, val);
        }
 
        return a;
      }
    }
 
    return mrb_nil_value();
  }
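The TRI_JSON_ARRAY branch is the least obvious one: the loop reads the vector as a flat sequence of key, value, key, value, ... and steps through it two entries at a time, skipping any pair whose key is not a string. The same idea, sketched in Ruby with made-up sample data:

```ruby
# Flat key/value layout, as the C loop reads it (sample data is invented).
flat = ["a", 1.0, 42, "skipped", "b", [true, nil]]

hash = {}

# Walk the entries in pairs, like the C loop with "i += 2".
flat.each_slice(2) do |key, val|
  next unless key.is_a?(String)   # mirrors the TRI_JSON_STRING check
  hash[key] = val
end

hash   # {"a"=>1.0, "b"=>[true, nil]}
```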

This leaves the error handling. I wanted my own exception class. Luckily, it is quite easy to define:

  void TRI_InitMRUtils (MR_state_t* mrs) {
    mrs->_arangoError = mrb_define_class(&mrs->_mrb, "ArangoError", mrs->_mrb.eStandardError_class);
  }

The only problem is where to store the defined class. mrb_state has no custom data pointer, or at least I did not find one. So I used the following trick: instead of using the mrb_state as returned by mrb_open, I use a slightly larger data structure with my extensions at the end.

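  #include <stdlib.h>
  #include <string.h>

  typedef struct MR_state_s {
    struct mrb_state _mrb;        /* must stay first, so MR_state_t* can be cast to mrb_state* */
    struct RClass* _arangoError;
  }
  MR_state_t;

  MR_state_t* MR_create () {
    MR_state_t* mrs;
    mrb_state* mrb;

    /* allocate on the heap, so the state outlives this function */
    mrs = (MR_state_t*) malloc(sizeof(MR_state_t));

    if (mrs == NULL) {
      return NULL;
    }

    mrb = mrb_open();

    /* copy the freshly opened state into the front of the larger structure */
    memcpy(&mrs->_mrb, mrb, sizeof(mrb_state));

    mrs->_arangoError = NULL;

    TRI_InitMRUtils(mrs);

    return mrs;
  }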

With this in place, an exception with two custom instance variables, @error_num and @error_message, can be created as follows:

  mrb_value MR_ArangoError (mrb_state* mrb, int errNum, char const* errMessage) {
    MR_state_t* mrs;
    mrb_value exc;
    mrb_value val;
    mrb_sym id;
 
    mrs = (MR_state_t*) mrb;
    exc = mrb_exc_new(mrb, mrs->_arangoError, errMessage, strlen(errMessage));
 
    id = mrb_intern(mrb, "@error_num");
    val = mrb_fixnum_value(errNum);
    mrb_iv_set(mrb, exc, id, val);
 
    id = mrb_intern(mrb, "@error_message");
    val = mrb_str_new(mrb, errMessage, strlen(errMessage));
    mrb_iv_set(mrb, exc, id, val);
 
    return exc;
  }

Because Ruby classes are open rather than closed, one can now reopen the class defined in C and add the getters in Ruby.

  arangoirb> class ArangoError
    def error_num
      return @error_num
    end
    def error_message
      return @error_message
    end
  end
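To see the pieces working together, here is a sketch in plain Ruby of how such an exception could be raised and rescued. The ArangoError class and the helper make_arango_error are stand-ins for what the C code sets up (the helper name and the error number 600 are made up), and attr_reader is used as a shorter way to get the same two getters.

```ruby
# Stand-in for the class created in C as a sub-class of StandardError.
class ArangoError < StandardError
end

# Reopen the class and add the getters; attr_reader generates
# error_num and error_message, equivalent to the hand-written methods.
class ArangoError
  attr_reader :error_num, :error_message
end

# Hypothetical Ruby counterpart of MR_ArangoError: build the exception
# and attach the two instance variables.
def make_arango_error(num, message)
  exc = ArangoError.new(message)
  exc.instance_variable_set(:@error_num, num)
  exc.instance_variable_set(:@error_message, message)
  exc
end

begin
  raise make_arango_error(600, "cannot parse json")   # error number is invented
rescue ArangoError => e
  caught = [e.error_num, e.error_message, e.message]
end

caught   # [600, "cannot parse json", "cannot parse json"]
```

Because ArangoError derives from StandardError, a bare rescue clause would catch it as well.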

What remains to be done to finish the MRuby shell:

  • implement the “require” function to load Ruby files
  • add a wrapper to the HTTP client

About Frank Celler

Frank is both an entrepreneur and a backend developer who has been developing mostly memory databases for two decades. He is the lead developer of ArangoDB and co-founder of triAGENS. Try to challenge Frank by asking him questions on C, C++ and MRuby. Besides, Frank organizes Cologne’s NoSQL group and NoSQL conferences.