crisp - A(nother) LISP Interpreter. Part 3: C++, objects, polymorphism, and exceptions
03 Dec 2019In this series…
Why?
The c
language is beautiful and stark. It is powerful, but lacks tools. If you want to do a lot in c
, you need to write a lot. In addition, there are no c
dev jobs going round where I am, so I’m moving to C++
. C++
has a lot of nice things built in, classes are great, templates could make writing the LispObject stuff easier and there are things like std::shared_ptr
which could negate the need for a separate garbage collector.
So.. C++
?
Yeah, C++
! Hello std::forward_list
instead of DIY linked list,
std::stack
instead of dynamic array. std::string
and
std::stringstream
over char *
and snprintf
and soooo many streams!
C++
has classes; definable structures with methods, constructors, and (importantly) destructors. This replaces the plain structs I used for everything in c
: for the objects, lists, atoms and so on. What’s even better, is that classes can be derived from one another, using polymorphs. So an object’s purpose could be fulfilled by different objects of slightly different type. We can have an object of atomic integer type, or of list type, both fit into an element of a list. We could use a polymorphic class to do this.
Next steps
I have started re-writing the interpreter in C++
from the ground
up. I’ll keep to the same lisp-y syntax and the program will do the
same jobs, but under the hood everything will be represented by an
object. I’ll start with the tokeniser and parser, where I will need
to implement LispObject
, LispAtom
, LispList
, and
LispEnvironment
then build up the evaluator and flesh out the
standard library.
As of this post, the interpreter is pretty much at the state where I
left it in c
, minus a lot of the standard library.
Tokenising a string
I converted the tokenise function over to C++
, making small
changes, but nothing major:
#define IS_WHITESPACE(C) ((C == '\n') || (C == ' ') || (C == '\t'))
#define ADD_TO_TOKENS(KW) \
new_token = new LispToken(KW);\
if (rv == NULL) {\
rv = new_token;\
}\
else {\
current_token->next = new_token;\
}\
current_token = new_token;
LispToken *tokenise(std::string input)
{
char ch, nch;
unsigned long i = 0;
std::stringstream kw_or_name_builder;
bool in_quote = 0, add_close_parens_on_break = 0, add_close_parens_on_parens = 0;
int parens_level = 0;
LispToken *rv = NULL, *current_token = NULL, *new_token = NULL;
for (i = 0, ch = input[0], nch=input[1]; i < input.length(); ch = input[++i], nch=input[i+1]) {
if (input[i] == ';') {
for (;input[i] != '\n' && i < input.length(); i++);
continue;
}
// if breaking char: space, newline, or parens
if (( IS_WHITESPACE(ch) || (ch == ')') || (ch == '(') || (ch == '\'')) && !in_quote) {
// finish reading keyword or name
if (kw_or_name_builder.str().length()) {
ADD_TO_TOKENS(kw_or_name_builder.str());
kw_or_name_builder.str("");
if (add_close_parens_on_break) {
add_close_parens_on_break = 0;
ADD_TO_TOKENS(")");
}
}
// TODO switch-case
// action needed on breaking char?
if (ch == '(') {
ADD_TO_TOKENS("(");
parens_level++;
}
if (ch == ')') {
ADD_TO_TOKENS(")");
parens_level--;
if (add_close_parens_on_parens) {
add_close_parens_on_parens = 0;
ADD_TO_TOKENS(")");
}
}
else if (ch == '\'') {
ADD_TO_TOKENS("(");
ADD_TO_TOKENS("quote");
if (nch == '('){
//debug_message("NEXT CHAR IS '('; quote list\n");
add_close_parens_on_parens = 1;
}
else if (IS_WHITESPACE(nch)) {
//error
//debug_message("NEXT CHAR IS WHITE SPACE! ERROR");
//Exception_raise("SyntaxError", "tokenise", NULL, "single quote should be before a list or other object.");
}
else {
add_close_parens_on_break = 1;
//debug_message("NEXT CHAR IS '('; quote kw\n");
}
}
}
else {
if (ch == '"')
in_quote = !in_quote;
kw_or_name_builder << ch;
}
}
if (kw_or_name_builder.str().length()) {
ADD_TO_TOKENS(kw_or_name_builder.str());
kw_or_name_builder.str();
if (add_close_parens_on_break) {
add_close_parens_on_break = 0;
ADD_TO_TOKENS(")");
}
}
return rv;
}
Main difference is the use of std::string
and std::stringstream
to builder up strings. This means that there’s none of the
accoutning of string length or memory size needed in the c
implementation.
Started at the end? You might be interested in reading the first two entries.
Questions? Comments? Get in touch on Twitter!