I always wanted to write a compiler, but I was intimidated by what I would need to know. Compiler theory is not the most accessible area of programming. If you look on Amazon, the authoritative text on the subject is colloquially referred to as “The Dragon Book” . Dragons. It’s like they’re trying to scare you away!
I knew it couldn’t be as difficult as it was made out to be. With a little research, and one journey down the wrong path behind me, I eventually came across a source of material that made compilers accessible to the DIY programmer.
I’ve been picking up Go lately, and I wanted to apply it to a project that I could sink my teeth into. I’ve written plenty of services, micro and otherwise, so even though I knew Golang was good at that, I wanted to do something different. Since it is sometimes referred to as a 21st Century C, I figured it would be neat to attempt to write a compiler in it.
Being lazy, the first thing I did was google around to see if someone had already done it for me. And of course, they had! I found a tutorial series that uses Go to write a compiler .
It was a good starting point. I went through the first series of posts, and started to get a handle on the fundamentals: lexical scanning, parsing, abstract syntax trees, and emitting code. There was just one problem. This particular compiler didn’t output assembly. It was making C code.
A compiler that takes a high level language as input and outputs another high level language as output is not what I think of as a compiler. If that’s a compiler, then what’s a decompiler? There’s a whole rabbit hole of internet pedantry you can tumble into here, but I left that behind and continued searching for something to generate assembly. Thanks to this always informative site I found what I was looking for.
In 1988, Jack Crenshaw started a 16 part series that explains how to write a Compiler. He does it with very few assumptions of experience from the reader. It would take until 1995 for him to finish writing the 16th chapter and the series is not completed. But what it covers in those 16 tutorials is the approach I was looking for: start from scratch. Jack wrote the code in Turbo Pascal, and the assembly that it output was for the Motorola 68000 CPU . You can’t ask for a more distinguished processor than that. This was the brain inside the original Apple Macintosh! The Sega Genesis! The Commodore Amiga! The Atari ST! And even SNK’s Neo-Geo.
I figured that it was not going to be too hard to port to Go. Crenshaw’s code is written in procedural style, as it was written before object-oriented hit it big. His code is very cleanly abstracted, which makes it a relatively small conversion. What I came up with is very-much-not-idiomatic-go, but it does map (basically) line for line with the original Turbo Pascal.
If you’re thinking of writing your own compiler in Go, I don’t recommend just pulling down my repo and running it. You’ll get the most benefit if you work through Jack’s original text, and try to convert the code yourself. My source should be handy if you get stuck on anything.
A few notes on my implementation:
- Turbo Pascal has a handy feature to read a single character of input without needing you to hit enter. I wish Go had something similar out of the box. I eventually found the excellent termbox-go library. It does key scans, and it works decently cross platform as well. The only problem is that it takes over control of your entire console. Termbox is really a fully-fledged terminal i/o solution, and it excels at position specific character output. If you want to plot terminal apps with grid precision, this is the library for you. But if you want to punch out normal writes and writelns to stdout, it’s not going to play nice. So I did something a bit hacky, and wrote a buffered line output abstraction. This hooks in nicely with Turbo’s write and writeln statements, but let there be no misunderstanding: I know it is a kludgey hack and has very rough edge cases that have been ignored. Don’t try resizing your terminal after starting, for example. Pay no attention to the man behind the curtain, kids.
- Mostly I have written up my code samples to represent the finished state of logic as it is at the end of each tutorial. But there’s a few times where part way through a chapter, Jack will ask you to save your work, and open up a fresh copy to work on a tangent. Chapter 9 is a good example, where there are different versions to parse the original style language, and a C style, semicolon required language. In cases like this I’ve made two alternate versions of the chapters. If you’re reading along with the original tutorials, these splits should make sense. I’ve also added a comment at the top of each source file when I’ve branched off for a reason.
- Also at the top of each source file is a sample test. These are not meant to be exhaustive cases. In principal, I should have wrote these when I was initially converting the code, to save myself some time when I came back later to double-check my work. In practice, I skipped all that and then spent way more time later than I should in testing. When I eventually got them right, I annotated the source files. Consider my laziness a gift to you.
If you’ve gotten this far into the article, you are probably interested in having a look at the code. For now, I’ve posted it up onto github. In the case that I need to relocate the code you’ll always be able to find a link from here. All problems in programming can be solved with another level of abstraction.
As for what’s next, I’d like to hear from you. Jack never culminated the series with a completed compiler, but I think there’s enough information in the tutorials to build something fully-featured. If you’re interested in that, let me know. Another approach would be to update the codebase to an idomatic Go style. While the procedural style is easy to read, Go makes it very easy to write the abstract syntax tree data structures that are missing in Turbo Pascal. Covering this would probably cross the gap between a more traditional tree based parser. Again, if that appeals to you, let me know. You can reach me on twitter and byemail