I’ve always had an interest in compiler-compilers, though usually from afar. At university, I did a project to make YACC output Pascal code instead of C. Luckily, I only had to work on the back end of YACC. I’ve written a few programming languages of my own, and even got a fair way into my own Pascal compiler, but after that horrible experience I mostly stayed away from compiler-compilers. Instead, I wrote the syntax analyzer myself, as a set of routines driven by conditions on the input token. I found this a much easier way to understand what I was doing than writing BNF.
I’m not saying Treetop is the answer to my prayers, but it’s a little closer to them than YACC was. (The answer to my prayers would allow me to put the grammar code and the code that uses the grammar in the same file, and be fully interpretive…) A problem with Treetop, though, is that it is notoriously short on documentation. Even the rdoc files are mostly empty. And the examples are far too simple to be useful templates.
I’m hoping this tutorial will give you a better introduction to Treetop grammars. It assumes you know a little bit about Ruby (e.g. you have read the first few chapters of The Pragmatic Programmer’s Guide), and also that you have Ruby installed on your machine.
We are going to build a simple grammar to parse a list of email addresses, the kind that usually appear at the top of those chain emails. Why you would want to do that we’d rather not know, but the exercise does at least provide us with a simple “language” that allows us to explore the characteristics of Treetop.
Here is our sample data that we want to end up understanding:
(A note, in case one of these is your email address: I recommend you get out of the business of selling replica watches or pharmaceutical drugs online. There’s no money in it. Really.)
How do we express our understanding? How about a simple list of email addresses and names, one per line:
We are going to use Treetop and Ruby to do this.