Part 5 of 10: A Match made in Patterns

Let's take another look at the email list sample that we want our program to understand:

"Jena L. Dovie" <jdovie_qs@agora.bungi.com>, <marleen_df@acg-aos.com>;  Charmain Lashunda  <c.lashunda_mc@promero.com>; "Traci Shauna"   <traci_shaunaxp@cs.com>

We might as well start at the beginning, now that we've matched the quote. Let's look at the first email name:

"Jena L. Dovie"

If I asked you, "Where is the actual email name?", you would probably say something like "It's whatever is between the double quotes." And you'd be right. So let's call the email name together with its quotes the "enclosed email name", because it's enclosed by the double quotes.

Our enclosed email name could be matched by a rule that looks like this:

The email name enclosed in double quotes

But now we need to understand something fundamental about computer syntax checkers. They are like those poor people who provide simultaneous translation at the UN, who have to start translating before they know what the speaker is really going to say. Computer parsers like the one created by Treetop start at the beginning of your text and make decisions as they go along, without knowing what comes next.

So here is a clearer definition that's closer to the way Treetop parsers try to understand your text:

A double-quote (")

followed by the email name

followed by another double-quote (")

Save your double_quote.treetop program as email_list.treetop and edit it so that it reads as follows:

# a Treetop Grammar to parse email lists
#
 
grammar EmailList
  rule full_email_address
    '"' email_name '"'
  end
end

(Yes, there is something wrong with this program. I'm sure you're smart enough to see it right off. However, let's use this opportunity to see how Treetop tells us about errors.)

Save your program talk_to_me.rb as parse_email_list.rb and change the lines that load the Treetop grammar so that they load our new email_list grammar:

Treetop.load 'email_list'
puts 'Loaded email list grammar with no problems...'
 
parser = EmailListParser.new

Save and run your parse_email_list.rb program. Notice that it loads correctly. (If you followed my instructions exactly. If not, you know what to fix...)

Enter the following text into your test program:

"Jena L. Dovie"

You get an error message something like this:

(eval):43:in `_nt_full_email_address': undefined local variable or method `_nt_email_name' for #<EmailListParser:0x1005b0580> (NameError)
from /Library/Ruby/Gems/1.8/gems/treetop-1.4.4/lib/treetop/runtime/compiled_parser.rb:18:in `send'
from /Library/Ruby/Gems/1.8/gems/treetop-1.4.4/lib/treetop/runtime/compiled_parser.rb:18:in `parse'
from parse_email_list.rb:17

As usual, not very helpful. But let's focus on the key information in the first line of the error message, namely the two names _nt_full_email_address and _nt_email_name:

(eval):43:in `_nt_full_email_address': undefined local variable or method `_nt_email_name' for #<EmailListParser:0x1005b0580> (NameError)

So what Treetop is trying to tell us, in its arcane way, is that the rule full_email_address (Treetop puts the prefix _nt_ on the method it generates for each rule) refers to the rule email_name, but we never defined any such rule!
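To see why the complaint is about a missing method rather than a missing rule, here is a minimal plain-Ruby sketch of the same situation. It does not use Treetop at all: the class ToyEmailListParser and the helper missing_rule_error are made up for illustration, and the method bodies are not what Treetop really generates. The only thing borrowed from the error message is the naming pattern, one _nt_ method per rule.

```ruby
# Hypothetical stand-in for a generated parser; NOT Treetop's real output.
class ToyEmailListParser
  # Plays the role of the method generated for "rule full_email_address".
  def _nt_full_email_address
    # The rule body mentions email_name, so the method calls _nt_email_name.
    # Nothing defines that method, because no email_name rule was written.
    _nt_email_name
  end
end

# Runs the toy parser and returns the error it raises (nil if none).
def missing_rule_error
  ToyEmailListParser.new._nt_full_email_address
  nil
rescue NameError => e
  e
end

error = missing_rule_error
puts error.class  # NameError
puts error.name   # _nt_email_name
```

Ruby only notices the problem when the method is actually called, which matches what we saw: the grammar loaded without complaint and the error appeared only once we asked the parser to parse something.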

That's easy to fix:

# a Treetop Grammar to parse email lists
#
 
grammar EmailList
  rule full_email_address
    '"' email_name '"'
  end
  rule email_name
    [^"]*
  end
end

Oops. Completely new concept here. What we have entered for email_name is called a Regular Expression. Regular expressions match patterns of text, so they're like Treetop rules but much more condensed. The email name pattern [^"]* can be expressed as the following rule:

Match any character except the double-quote

And keep matching as long as possible
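If it helps to see the same steps outside of Treetop, here is a rough sketch using StringScanner from Ruby's standard library. It is only an analogy for how the grammar walks through the text from left to right, not how Treetop is implemented. The helper name match_enclosed_name is made up, and the final check that the whole input was consumed is an assumption of this sketch.

```ruby
require 'strscan'

# Hypothetical helper: returns the text between the quotes, or nil on failure.
def match_enclosed_name(text)
  scanner = StringScanner.new(text)
  return nil unless scanner.scan(/"/)  # a double-quote
  name = scanner.scan(/[^"]*/)         # any characters except ", as many as possible
  return nil unless scanner.scan(/"/)  # followed by another double-quote
  return nil unless scanner.eos?       # assumption: nothing may be left over
  name
end

p match_enclosed_name('"Jena L. Dovie"')  # "Jena L. Dovie"
p match_enclosed_name('"Jena L. Dovie')   # nil (the closing quote is missing)
```

Each scan call either consumes some text and moves forward or fails on the spot, which is the "decide as you go" behaviour described earlier.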

Some other useful regular expressions you might need in Treetop grammars:

[a-zA-Z]: match any letter in the range a to z or A to Z (e.g. match 'a', 'e', 'W', 'Z', but don't match '%', '3' or '*').
.*: match any character, any number of them (even none at all!).
[.]*: match any number of periods, e.g. '', '.', '..', '...'.
c+: match at least 1 c, e.g. 'c', 'cc', 'ccc'.
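You can try these patterns without writing a grammar at all. The sketch below uses Ruby's own Regexp class, not Treetop, so treat it as a way to get a feel for the notation; Treetop's matching may differ in details. The helper full_match? is made up for this example and wraps each pattern so that the whole string has to match.

```ruby
# Hypothetical helper: true if the entire text matches the pattern.
# \A and \z anchor the match to the start and end of the string.
def full_match?(pattern, text)
  !(Regexp.new("\\A(?:#{pattern})\\z") =~ text).nil?
end

puts full_match?('[a-zA-Z]', 'W')   # true
puts full_match?('[a-zA-Z]', '3')   # false
puts full_match?('.*', '')          # true (none at all is fine)
puts full_match?('[.]*', '...')     # true
puts full_match?('c+', 'ccc')       # true
puts full_match?('c+', '')          # false (needs at least one c)
puts full_match?('[^"]*', 'Jena L. Dovie')  # true
```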

If this is your first encounter with Regular Expressions, accept that it's going to take you a while to digest them. You can read about Regular Expressions at Regular-Expressions.info. There is even a tutorial there you can try.

Before we go on, let's fire up our test program and see whether our grammar, with its regular expression for email_name, works. Save the modified grammar and run your test program. Enter the following text when prompted:

"Jena L. Dovie"

Yes! It understands! But understands what? That's for our next tutorial.
