Designing a Programming Language: Part II

Welcome back to our series on creating your own programming language! In this article, we’ll be learning more about designing a language. I’ll be outlining a simple language I’ll call “Pangolin”, after the oddly adorable animal, and we’ll discuss the syntax and decisions that go into every step of the language. Separately, there will be interpreters for Pangolin available in the GitHub repository linked at the end of the article. These interpreters, written in several languages, will feature comments explaining how it works and how to adapt the techniques it describes for making your own language.

As a refresher, an interpreter is a program that runs your programming language. It has to:

Read the code of the program it will run
Turn that code, which is just a text file, into structured data called the abstract syntax tree
Use the abstract syntax tree to run, or evaluate, the program

In other words, an interpreter converts the code into something the computer can understand and execute.
So let’s start by talking more about how to design a language. After all, there are a lot of choices to make!
Most programming languages you might’ve seen before have a lot of different features. Things like:

While loops for repeating an action until something changes
For loops for repeating an action a number of times
Variables for storing data
If statements for making choices
Objects or arrays for storing complex data
Functions for structuring code you’ll run again and again
Modules or namespaces to make writing library code easier
Types to ensure that data is used correctly

But do we need all of these things in a programming language? Why do most languages have them in some form?
So in general, programming languages are what we call “Turing complete”. That means that they are powerful enough to describe anything a computer could even theoretically do. In fact, you’ve probably seen so many Turing complete languages it might be hard to imagine what a language that isn’t Turing complete would even look like!

Imagine having a language where you don’t have functions, while loops, or for loops. Instead, you can write exactly one action on each line of code. The number of steps your program can run is exactly the lines of code you write. This means you couldn’t make things like operating systems that have to run indefinitely, or user interfaces in webpages or games.

The easiest way to make a language Turing complete is to include some kind of while loop or to have functions. You basically just need some way to do an action over and over and over again until some condition is met. You don’t even need both of these, just one or the other is good enough. The lambda calculus, arguably the oldest programming language even though it predated computers, had only functions. The original version of Fortran only had loops.

If everything beyond functions and variables doesn’t increase the power of the programming language, then why is the syntax of most programming languages so complicated? It’s a matter of usability.

Just because you can do everything with functions doesn’t mean you’d actually want to. It’s similar to why we don’t use assembly language, the most basic language that the CPU can understand, for all our programming. Most programming languages are equivalent in power but not equivalent in usability.

So our first goal here is to create a simple, yet usable, programming language.
Our first version of Pangolin, called PangolinV1, is going to have:

While loops for iteration
If expressions for branching
Functions
Variables
Numbers
A method for printing output

Now we’re going to outline the design choices of PangolinV1 and talk about what the other possibilities are!

General Design Choices

In PangolinV1, all executed code returns a value, like in Ruby. There’s no distinction between expressions and statements, like in JavaScript or Python, where some code returns values and some code doesn’t. The only values in PangolinV1 are numbers and functions.
PangolinV1’s syntax doesn’t care about whitespace. (var x 10) and

(var
x
10)

both mean the same thing. Other languages, like Python and Haskell, require you to have a particular indentation, and whitespace is important.

Creating, assigning, and using variables

Creating

(var x 10)

Assigning

(set x 3)

Using

Here are the rules for variables in PangolinV1:

you have to declare a variable before you use it
variables have to be declared with an initial value; they can’t be undefined
the value returned by declaring a variable is the value of the variable
the value returned by assigning a value to a variable is the new value of the variable
no parentheses are needed around a variable use

If expressions

(if cond code1 code2)

For if expressions the condition is evaluated first: if it is a positive number, then execute the first option. If it’s 0, or a negative number, then execute the second option. If it evaluates to something not a number, like a function, then the program should crash immediately. The value returned by the if is the value returned by the code it ends up executing

Sequences of code

(block
code1
code2
…
coden)

Sequences of expressions are started with block, which turns a group of expressions into a single one. Each expression is executed in turn and then the result returned by the “block” is the value of the last expression evaluated.

While loops

(while cond
code)

While loops first evaluate their condition and, like the if-expression above, if the condition is a number greater than 0 the body of the loop runs and the loop starts back over. If the value of the condition is a number equal to or less than 0, then the body is skipped. If the condition evaluates to a function, then the program should crash. The value returned by the while loop is 0.

The value returned at the end of the while loop is very arbitrary. It easily could have been anything, like the number 42 or a function that prints the first ten prime numbers.
It could even return something more useful, like the number of times the loop ran before it finished! Another choice we made is that you could have while loops where the body of the loop runs once no matter what. These were common in older languages and were generally called “do-while” loops.

Operations on numbers

Addition and subtraction do what you’d expect, but are written like

(+ 2 3)

(- 10 100)

The comparison operators are < and =. In

(< x y)

if x is less than y then (< x y) returns 1, otherwise it returns 0. What should happen if you pass in a function instead of a number to <? Should that just automatically return 0? Should it cause the program to crash? For Pangolin, I chose to have it crash because I personally think if you compare a function and a number together then some kind of mistake happened and the program should stop so you can debug it. There are a lot of people who would disagree with that, though. Maybe you’re one of them!

Similarly, = returns 1 if both numbers are the same and 0 if they’re not, and if something that isn’t a number is passed to = then the program crashes immediately.

Function declarations and calls

Declaring

(function (arg1 arg2 ...) body)

Calling

(fn arg1 arg2 arg3 ...)

Functions in Pangolin are always unnamed. To create a named function, you need to assign a function value to a variable, like

(var id (function (x) x))

When functions are called all their arguments are evaluated first, then the values are handed to the function.

There’s a really interesting design decision here, though, that might not be obvious at first! If a variable is passed to a function, what does that mean? Since variables are like boxes with data in them, does giving a variable to a function mean that you’re giving the function the entire box to play around with, or are you handing the function what’s in the box? In other words, if you have the following program should it print out 1 or 2?

(block
(var x 1)
(var fn (function (a) (set a (+ a 1)))
(fn x)
(print x))

This is the difference between what computer scientists call “call-by-value” and “call-by-reference” in a programming language. Call-by-reference is when you’re giving the function access to “the container” and call-by-value is when you’re only giving the function access to “the value” of the variable. In a call-by-reference language, the program will print out 2 because the function fn can change the contents of the variable. In a call-by-value language, it will print out 1 because the function only gets the contents of the variable. In Pangolin, we’ve chosen our functions to operate as call-by-value.

In Pangolin, like most languages, when you create variables inside of a function those variables are only visible within the function.

Printing: You can print values with the built in print function.

(print 10)

Right now, print only prints the word function when you pass it a function, but often in the Lisp family of languages printing a function actually shows you the code of the function! That’s a really useful feature for debugging, and one you might want to try in your own languages.

So that’s our summary of Pangolin and some of the design choices that we had to make to design a language. There are a lot more small details to consider than you might think!

I hope you check out the PangolinV1 interpreters and follow the instructions there to try designing your own language.

Next time we’ll be talking about adding more features and kinds of data to Pangolin and how this opens up even more choices.

This is one part of a series on how people create programming languages. Read Part I.

Learn More

Pangolin Code Repository

https://github.com/clarissalittler/creating-languages

Lambda Calculus

http://www.cs.unc.edu/~stotts/723/Lambda/overview.html

Fortran Reference

A reference on Fortran 77, a version of Fortran that has call-by-reference functions and is a good glimpse into what older languages looked like
http://web.stanford.edu/class/me200c/tutorial_77/

Forth

PangolinV1 is meant to look like a simple, modern, language. Forth is an example of just how different a programming language can look while still being Turing complete
http://thinking-forth.sourceforge.net/

Pharo

Another very different kind of programming language is Pharo, which has a common ancestor with Scratch!
http://pharo.org/

How To Create a Programming Language

https://kidscodecs.com/build-a-programming-language-1/
https://kidscodecs.com/designing-programming-language-part-ii/