crisp - A(nother) LISP Interpreter. Part 2: Objects and Parsing

Tags: software-dev · C

In this series…

Intro

This is my second post about crisp, my (WIP) lisp interpreter.

As of the end of the last post, we have a tokenisation function which takes a string and chunks it into important bits of syntax: parens, atoms, and symbols. Next, we need to transform this list into a tree describing the flow of data through the program, as a lisp interpreter would see it: the Abstract Syntax Tree. In lisp, this is rather straight forward as the syntax enforces a tree-like structure, so we can also directly execute this AST as a tree of lisp data and functions.

In this post I will show my version of a lisp parser which takes the tokens and transforms into AST. The AST nodes will actually be lisp data itself which can be directly evaluated. The form of the node presents our first problem: how should lisp data be represented (in our c backend)?

The next task is forming the tree (composed of lisp data nodes) from the tokens, correctly fitting data into lists and sub-lists.

Representing lisp data in c

Our lisp data can be represented as an object which can be an atom (-> value), a symbol (-> a variable, which evals to an atom/value), or a list. c has no object form, but it does have structures and linked lists. We can create a list as a singly linked list, so that each element in the list points to the object it holds and to the next element in the list.

struct LispListElement {
  LispObject *value;
  LispListElement *next;
};

The object then contains the data, whether that is another list, an atomistic value, or a symbol. This is another struct:

struct LispObject {
  LispAtom *value_atom;
  char *value_symbol;
  LispListElement *value_list;
};

Then, the atom. A lisp atom is either a number or a string. Numbers are floating point or integer which complicates matters, but they should be freely inter-convertible.

struct LispAtom {
  double value_float;
  long value_int;
  char *value_string;
};

This is the start of my “types.h” file, a header containing information on each type.

Parsing a list of tokens into a tree

Taking the list of tokens and converting to a tree is actually very simple. In c, this mostly comes down to an accounting problem. We need a stack to hold the open lists, and we need to either count items in a list, or vary the size of lists as we add them. We’re using singly linked lists; so we can grow or shrink at will. The stack we can implement as a dyamic array, resized as we add lists to the stack.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include "parse.h"
#include "ast.h"
#include "object.h"
#include "exception.h"
#include "debug.h"
#include "list.h"



LispListElement *parse(char tokens, int n_tokens)
{
  LispListElement *root = LispList_new_element();
  LispObject *current_list = NULL, *new = NULL, open_lists= NULL;
  int n_open_lists = 0;

  for (int i = 0; i < n_tokens; i++) {

    if (strcmp(tokens[i], "(") == 0) {
      new = LispObject_new_list();

      if (current_list != NULL) {
        open_lists = realloc(open_lists, (++n_open_lists)*sizeof(LispObject *));
        open_lists[n_open_lists-1] = current_list;
        LispList_add_object_to_list(current_list->value_list, new);
      }
      else {
        LispList_add_object_to_list(root, new);
      }

      current_list = new;
    }
    else if (strcmp(tokens[i], ")") == 0) {

      if (n_open_lists) {
        current_list = open_lists[(--n_open_lists)];
        open_lists = realloc(open_lists, n_open_lists*sizeof(LispObject *));
      }
      else {
        current_list = NULL;
        open_lists = NULL;
      }

    }
    else {
      new = LispObject_new_guess_type(tokens[i]);

      if (current_list != NULL) {
        LispList_add_object_to_list(current_list->value_list, new);
      }
      else {
        LispList_add_object_to_list(root, new);
      }
    }

  }

  free(open_lists);
  return root;
}

Continue reading


Questions? Comments? Get in touch on Twitter!