adamcrussell

Parse::Yapp - Introductory Example

Over the course of the past year or so I have made use of Parse::Yapp for several solutions to the Perl Weekly Challenge. This most recent time, for Challenge 039, I thought that it may be helpful to others to dive deeper into this extremely useful module. My hope is to illustrate the use of Parse::Yapp with examples while taking note of the features being used, both in this blog post and in future postings. I am aiming for a slightly simpler and practical minded example based treatment than what is provided by the current module docs which, to the extent in which they document many of the modules's features, are already quite excellent.

To start, let's look at a specific problem, to implement a Reverse Polish Notation calculator.

Recall that Reverse Polish Notation, otherwise known as postfix notation, is simply the positioning of the operator after the operands. For example, in infix notation we would write 10 + 8 and in postfix notation we would write 10 8 +.

Here we are only concerned with the basic four functions (+, -, *, /).

The code for this is a small script (ch-2.pl) to create some examples and call the parser and the parser definition (Rpn.yp).

ch-2.pl The test script.
ch-2.pl The test script.
Rpn.yp The RPN calculator grammar.
Rpn.yp The RPN calculator grammar.

Sample Run

$ perl -I. ch-2.pl
18
-48
80
26
4

What I Did

  1. Define a grammar in Rpn.yp. Unless you really prefer writing your lexing functions elsewhere include them in this file as well.
  2. Execute yapp -m Rpn Rpn.yp. This generates the parser and saves it to a file named Rpn.pm. Specifying an output filename with the -m option will allow you to give the generated parser a name which may make more sense then the default.
  3. Write a wrapper to the parser, as I have done in ch-2.pl. Mine here is very small and straightforward since we're just parsing several strings. Larger projects will need more code to, say, stream larger bodies of text.
  4. The use of -I. in the example above is so that Perl can find Rpn.pm, which happens to be in the same directory I am running the command in (the current working directory). You may have your Rpn.pm (or other generated module) located somewhere else.

The RPN Grammar in Detail

Header

Everything before the first %% is the header section. In Rpn.yp we have just a single line in the header section %token NUMBER which declares a token. In this case NUMBER can be thought of as a label for a whole group of tokens defined in the lexer. Look at line 24. There we can see that a regular expression is defined which identifies numbers and sends them to the parser. NUMBER can be used for any value returned on this line, this is much more convenient than trying to identify which combinations of digits we might want to accept! Technically this %token line is optional, since Yapp does not require this sort of explicit token declaration, however its use certainly helps readability of your grammar.

Rules

The next section, after the first %% is the rules section. This is the heart of the grammar, where we define what we understand to be the definition of the postfix rules are implemented.

Each rule is given a name or label, followed by a : and one or right hand side rule definitions separated by a | and terminated with a ;. Once defined the rule name can be used on the right hand side. In this way rules have a recursive definition.

On line 4 we define a line as being either nothing more than a newline (\n) or a rpn followed by a newline. But what is this rpn? Look to line 7. There the rpn rule is defined as either a NUMBER or an rpn followed by another rpn, followed by an operator. We have four different lines, one for each operator. 

After the right hand side of the rule we define actions which are blocks of Perl code which do something according to the rules that have just been defined. For the four operator rules we take the two tokens and perform the expected operation, adding them, subtracting the, etc. We refer to the tokens by their position in the idiomatic Perl argument array @_. We can see, then, that we can think of each action as a subroutine with its own arguments. Inside @_ we can expect to find $_[1] to $_[n] to be the parameters matching the tokens in the rule with $_[0] saved for a reference to the parser object itself. What happens to the value of this action? You can consider them saved as the value of the left hand side! In this way we can see that just with our simple recursively defined rules we can evaluate a somewhat complicated looking postfix expression such as 2 2 + 3 3 + + 1 7 - + ! The answer is 4, by the way. 

Note that our top level rule has the action of printing the answer once it sees a newline.

Footer

The final section, everything after the second %% is the footer section. This section is technically optional, however, unless your flexing function is particularly complicated it makes sense to usually place it here along with the error reporting and parse functions. sub lexer is our lexer function. The data is stored in a hash under the {INPUT} key. We loop over the input identifying tokens with regular expressions and sending them to the parser. Each token is sent as a pair where the first element is a name or label (remember NUMBER?) and the second is the value of the token. By using s// we are removing the token from the input, reducing it at each pass. When the input is finally empty, having been reduced in this way, the lexer function completes.

sub error is a simple error reporting function. We can customize the error reporting here, perhaps in an effort to provide something useful to the user so they can correct some syntax error. This is optional and if no function is provided for this then any errors will report the default Parser error.

sub parse is the entry point for the parser. See ch-2.pl where it is called by the test script. We can see that it gets passed in an input string, sets it to the value of the {INPUT} key, and calls YYParse (the actual parser generated by Yapp) with references to the lexer and error functions. Finally, the result from YYParse is returned. 

Notes

Implementing an RPN calculator for the Perl Weekly Challenge provided what I considered to be the material for an easily digestible introductory example to Parse::Yapp. As other examples arise I will make additional posts here to further add to the body of documentation.

Comments for this post were locked by the author