Computers are amazing pieces of engineering, but they’re useless if we can’t feasibly instruct them on what to do. It’s no surprise, then, that for almost as long as we’ve had computers, we’ve had some kind of special language to write in that makes programming possible.
There are dozens of popular programming languages today, and countless more that people have created for fun, for research, or as experiments. If you’re interested in creating your own programming language, this series of articles will help you gather ideas and get started. Designing and implementing a language isn’t as hard as you might think, but it is a big topic with a number of steps to work through.
In this first article, we’ll talk about what goes into making a language and get you started thinking about what your language will look like.
Design versus Implementation
First, I want to point out that there’s a difference between designing and implementing a programming language. An implementation of a programming language is a program that turns the text of programs written in that language into actions on the computer. There are two main kinds of implementations: interpreters and compilers.
Both compilers and interpreters start the same way. They read the program that you want to run from a file and then turn it into a representation of the program as data.
What happens next is different, though. An interpreter takes this representation of the program and executes it immediately. A compiler, on the other hand, transforms the program into another language. Often, a compiler turns a program into the low-level instructions for the processor itself: either the actual machine code or the human-readable step just above it, called assembly.
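The difference can be sketched with a tiny arithmetic language in Python. Both functions below start from the same tree-shaped representation of `2 + 3 * 4`: `interpret` computes the answer right away, while `compile_expr` translates the program into another language, here the instructions of a made-up stack machine. The node classes (`Num`, `Add`, `Mul`) and the `PUSH`/`ADD`/`MUL` instruction set are assumptions for illustration, not real assembly.

```python
from dataclasses import dataclass


# Hypothetical node classes for a tiny language of integer arithmetic.
@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


def interpret(e) -> int:
    """Interpreter: walk the tree and compute the result immediately."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return interpret(e.left) + interpret(e.right)
    if isinstance(e, Mul):
        return interpret(e.left) * interpret(e.right)
    raise TypeError(f"unknown node: {e!r}")


def compile_expr(e) -> list:
    """Compiler: translate the tree into instructions for a made-up stack machine."""
    if isinstance(e, Num):
        return [f"PUSH {e.value}"]
    if isinstance(e, Add):
        return compile_expr(e.left) + compile_expr(e.right) + ["ADD"]
    if isinstance(e, Mul):
        return compile_expr(e.left) + compile_expr(e.right) + ["MUL"]
    raise TypeError(f"unknown node: {e!r}")


program = Add(Num(2), Mul(Num(3), Num(4)))  # 2 + 3 * 4
print(interpret(program))                   # 14
print("\n".join(compile_expr(program)))     # PUSH 2, PUSH 3, PUSH 4, MUL, ADD
```

Running those five instructions on such a stack machine would also leave 14 on the stack. The compiler's job is to produce a translation that preserves the program's meaning, not to compute the answer itself.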
We need to define a few terms now. Reading the program text and turning it into that data representation is called parsing, and the code that does this is called the parser. The tree-shaped representation of the code as data is called the abstract syntax tree (AST).
For example, imagine we’re writing an interpreter for Python, in Python. That might sound silly, but it happens more often than you’d think. Our program might look like
for i in range(0,10): print(i)
but internally the interpreter might represent this program as something like
forLoop = For(Var("i"), Function("range",Int(0),Int(10)), BuiltIn("print",Var("i")))
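where For, Var, Function, Int, and BuiltIn are classes that the interpreter writer had to create. You can picture the structure of the program as a tree: the For node sits at the root, and its three children are the loop variable Var("i"), the call to range with its two Int arguments, and the loop body, the BuiltIn call to print.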
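For comparison, Python’s own parser is available from the standard-library `ast` module. A quick sketch of parsing the same one-line loop shows a tree with the same overall shape, although the node names differ from the classes in the example above. For instance, `print` is an ordinary function call (a `Call` node) rather than a special built-in node.

```python
import ast

# Parse the same one-line program with Python's built-in parser.
tree = ast.parse("for i in range(0,10): print(i)")
loop = tree.body[0]  # the first (and only) statement in the module

print(type(loop).__name__)         # For
print(loop.target.id)              # i
print(loop.iter.func.id)           # range
print(loop.body[0].value.func.id)  # print
print(ast.dump(loop, indent=2))    # the full subtree, pretty-printed
```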
Then, once the interpreter has created this AST, it executes the code by taking apart the objects that represent the program and carrying out the corresponding actions. A for-loop becomes something that runs repeatedly. Variables become stored values that can be retrieved later. Built-in operations like print write output to the console. In this case the translation is a little trivial, because it’s easy to turn Python concepts into Python code. But if you were writing a Python interpreter in a language like Haskell, a Python loop would instead be executed by a Haskell function.
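To make the execution step concrete, here is a minimal sketch of a tree-walking interpreter for the example program, written in Python. The class names follow the example above, but their fields and the `evaluate` and `execute` functions are hypothetical; only `range` and `print` are supported, and variables live in a plain dictionary.

```python
class Int:
    """An integer literal, e.g. Int(0)."""
    def __init__(self, value):
        self.value = value


class Var:
    """A reference to a variable by name, e.g. Var("i")."""
    def __init__(self, name):
        self.name = name


class Function:
    """A call to a named function; this sketch only knows "range"."""
    def __init__(self, name, *args):
        self.name = name
        self.args = args


class BuiltIn:
    """A built-in operation; this sketch only knows "print"."""
    def __init__(self, name, *args):
        self.name = name
        self.args = args


class For:
    """A for-loop: bind var to each value of iterable, then run body."""
    def __init__(self, var, iterable, body):
        self.var = var
        self.iterable = iterable
        self.body = body


def evaluate(node, env):
    """Compute the value of an expression node."""
    if isinstance(node, Int):
        return node.value
    if isinstance(node, Var):
        return env[node.name]  # look up a previously stored value
    if isinstance(node, Function):
        args = [evaluate(arg, env) for arg in node.args]
        if node.name == "range":
            return range(*args)
        raise NameError(f"unknown function: {node.name}")
    raise TypeError(f"not an expression: {node!r}")


def execute(node, env, output):
    """Carry out a statement node; printed lines are also collected in output."""
    if isinstance(node, For):
        for value in evaluate(node.iterable, env):
            env[node.var.name] = value  # store the loop variable
            execute(node.body, env, output)
    elif isinstance(node, BuiltIn):
        if node.name != "print":
            raise NameError(f"unknown built-in: {node.name}")
        args = [evaluate(arg, env) for arg in node.args]
        line = " ".join(str(arg) for arg in args)
        print(line)          # write to the console
        output.append(line)  # keep a copy so callers can inspect it
    else:
        raise TypeError(f"not a statement: {node!r}")


program = For(Var("i"), Function("range", Int(0), Int(10)), BuiltIn("print", Var("i")))
printed = []
execute(program, {}, printed)  # prints 0 through 9, one per line
```

Splitting the work into `evaluate` for expressions (which produce values) and `execute` for statements (which have effects) is one common way to organize such an interpreter; supporting a new construct means adding a class and a matching branch.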
Now, to actually make a language, I recommend starting with an interpreter. It’s generally less complicated than a compiler, because an interpreter only needs to execute the code, instead of figuring out how to translate it into a different language that still does what you want.
Here are the basic steps, in the order I recommend, for writing a programming language:
- Designing your language
- Creating the AST for the language
- Writing the code to execute the AST
- Choosing what the language should look like
- Writing the parser
We’ll preview two of these steps in this article, designing your language and choosing what it should look like, and the rest will be in future installments.
Designing your language
Actually coming up with a language is the place to let your creativity shine. Start thinking about what would be cool or weird or interesting to see in a programming language.
You’ll need to think, though, about some basic questions of how the language will work:
- How do you iterate in your language? That is, how do you execute the same steps multiple times?
- How do you make choices in your language? You’ll need some way of making decisions about when something should happen. Most languages do this with some form of if-statement.
- What kind of data will you have: numbers, strings, lists, etc.?
- How will functions work?
- Do you want to be able to create concurrent threads?
- Are there any languages you’re inspired by?
- Are there any languages that you almost love but wish you could fix?
There are even more design decisions to think about, but those are good starting points.
Choosing what your language should look like
You may already have a picture in your head of how your language will look. The way a language looks when written out is called its syntax. I recommend writing code by hand to figure out what syntax you’d like to use: if you don’t like the way your language looks, it’ll be a lot harder to actually use it. A good approach is to pick the syntax of a language you like and build on it. After all, many languages borrowed the syntax of C and tweaked it a little, so you can borrow from languages you like too.
It’s a good idea to “fake” some programs in your new language with pen & paper and then make notes about what they should do.
If you spend some time thinking about these two topics and planning things out, you’ll be ready to start writing code next time. Until then, check the further reading for some other tutorials on writing interpreters.