ly -- a parser generator

shortly

or simply, ly: a short way to lex/yacc

source

Overview

ly is an easy, short way to create a parser. You may have come across lex and yacc and even used them to write parsers for languages. ly is a much easier way to write parsers that keeps the expressiveness, reliability and performance of lex and yacc, because ly translates your specification to lex and yacc.

There are some advantages of using ly over methods such as recursive descent and ad-hoc parsing:

An Example

e.ly
1  / arithmetic
2  .skip  [\ \r\n\t]
3  .type  int
4  s:     e;
5  e:     e OP:[-+*/] t | t;
6  t:     LIT:[0-9]+ ;

That’s it! That is your grammar and scanner defined.

Line 1 is a comment; line 2 tells the lexer to skip whitespace; line 3 tells ly the type of the actions and to generate a scanner called elex instead of the default yylex. The token value produced by a ly-generated scanner is always a string (char*): the text that matched the rule. We must provide yylex() ourselves, but it can call elex() to get the token and its value. elex() puts the token value in elval.

You may have noticed the initial '\' in the .skip directive. Wherever you want a space in a regular expression, you must escape it, because ly can’t (yet) parse regexes properly. ly considers any string starting with a symbol to be a regex, and considers the regex to end at the first whitespace. So the '\' escapes the whitespace, and the .skip regex is effectively [ \r\n\t].
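The termination rule described above can be sketched in C. This is only an illustration of the stated rule (a regex ends at the first whitespace that is not escaped with a backslash); the function name regex_token_len is hypothetical and is not taken from ly's source.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of ly: returns the length of the regex
   token starting at s. The token ends at the first whitespace character
   that is not preceded by a backslash, or at the end of the string. */
size_t regex_token_len(const char *s)
{
    size_t i = 0;
    assert(s != NULL);
    while (s[i] != '\0') {
        if (s[i] == '\\' && s[i + 1] != '\0') {
            i += 2;            /* skip the backslash and the escaped character */
            continue;
        }
        if (isspace((unsigned char)s[i]))
            break;             /* unescaped whitespace ends the regex */
        i++;
    }
    return i;
}
```

Under this rule the escaped form [\ \r\n\t] is read as one 10-character token, while the unescaped form would be cut off right after the opening bracket.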

Actions speak louder than Words

Here are the actions and yylex function defined:

e.c
#line 392 "ly.far"
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include "eh.h"
#define CS(x,y) case x:y;break
/* yylex wraps the ly-generated elex() and converts elval for yacc */
int yylex()
{int tok=elex();
 {switch(tok)
  {CS(LIT,yylval=atoi(elval));
   CS(OP,yylval=*elval);
 }}
 return tok;
}
/* s: e; prints the final value */
S1(s1)
{
  printf("%d\n",x);
}
#define CASE(a,b) case a:return  x b z;
/* e: e OP t; applies the operator y to x and z */
S3(e1)
{switch(y)
 {CASE('+',+)
  CASE('-',-)
  CASE('*',*)
  CASE('/',/)
}}
S1(e2){return x;}
S1(t1){return x;}
int main(int c,char**v)
{
  return yyparse();
}

First note how yylex() is written. In the case of a literal, LIT, it calls atoi(); in the case of an OP, it takes the operator’s ASCII code. Either way the result is saved in yylval.

To make the parser actually evaluate the expression, ’actions’ for each rule are required. Note how the function names are s1, e1, e2 and t1. You can see that the names are encoded as the rule name (say, "s") followed by the rule number (say, "1"). So the non-terminal e has two rules which are named e1 and e2.
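The naming scheme is simple enough to express in a line of C. The helper below is purely illustrative (action_name is a hypothetical name, not something ly provides); it only shows how a non-terminal and the 1-based number of one of its rules combine into an action name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of ly: writes the action name for the
   given non-terminal and 1-based rule number into buf, e.g. "e" and 2
   give "e2". Returns the value returned by snprintf. */
int action_name(char *buf, size_t size, const char *nonterminal, int rule)
{
    assert(buf != NULL && nonterminal != NULL && rule >= 1);
    return snprintf(buf, size, "%s%d", nonterminal, rule);
}
```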

e.c is compiled and linked with the lex and yacc source generated by ly. Here is an example of it working:

echo 1+2*3 | ./e
9

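Note that we get the "cheap calculator" result, 9 (the "scientific calculator" result is 7). The grammar has a single rule for all four operators, so they share one precedence level and associate to the left: the input is evaluated as (1+2)*3.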
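To see why the answer is 9 without running lex and yacc, here is a small stand-alone C sketch that evaluates an expression strictly left to right, which is what the rule e: e OP t | t amounts to. It is not generated by ly and does not use its actions; the function name eval_left_to_right is made up for this illustration, and malformed input is not diagnosed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Illustrative only: evaluates integer literals joined by - + * /
   strictly left to right, with no operator precedence. Whitespace is
   skipped, as the .skip directive does. Division by zero is ignored
   rather than reported. */
int eval_left_to_right(const char *s)
{
    int acc = 0, have_acc = 0;
    char op = 0;
    assert(s != NULL);
    while (*s != '\0') {
        if (isspace((unsigned char)*s)) { s++; continue; }
        if (isdigit((unsigned char)*s)) {
            int lit = 0;
            while (isdigit((unsigned char)*s))
                lit = lit * 10 + (*s++ - '0');
            if (!have_acc) { acc = lit; have_acc = 1; continue; }
            switch (op) {           /* apply the pending operator */
            case '+': acc += lit; break;
            case '-': acc -= lit; break;
            case '*': acc *= lit; break;
            case '/': if (lit != 0) acc /= lit; break;
            }
        } else {
            op = *s++;              /* remember the operator for the next literal */
        }
    }
    return acc;
}
```

Calling eval_left_to_right("1+2*3") gives 9, matching the parser above. Getting 7 would need a second grammar level so that * and / bind tighter than + and -.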

Headings and Paragraphs

This document was marked up and processed by a parser I wrote using ly. To demonstrate this practical application, here’s a very simple wiki markup parser:

hp.ly
1  / a heading and paragraph parser
2  s:     hp;
3  hp:    hp p | hp HD:^=+.*\n | ;
4  SP:    [\ \t]+ ;
5  p:     words ?NLNL:{SP}*\n{SP}*\n[\ \t\n]* ;
6  words: words {SP}+|{SP}*\n{SP}* word | word;
7  word:  [^\ \t\n]+ ;

Notice line 4: this declares a lex definition. Definitions are macros that are called using {}. Note {SP} is used a few times: it’s hard to get the whitespace rules right for a language like this.

Note also the rule for ’p’ on line 5. The NLNL token is prefixed with a ’?’. This means the action ignores the token and so the action has only one parameter.

Also, semicolons are separated from regexes by a space so that ly doesn’t think ’;’ is part of the regex.

hp.c
#line 432 "ly.far"
#include "hph.h"
#include "u.h"
S1(s1){O("%s",x);}
C2(hp1)
S2(hp2){I i=1;for(;y[i]=='=';i++);R Os("%s<h%d>%s</h%d>\n\n",x,i,y+i,i);}
C0(hp3)
S1(p1){R Os("<p>%s</p>\n",x);}
C3(words1)C1(words2)C1(word1)
extern FILE* yyin;
main(I c,C**v)
{
 if(c>1)yyin=fopen(v[1],"r");
 yyparse();
}

Note that action p1 takes only one parameter. You can find the unfamiliar functions Os, O and R in u.h and u.c in the source tarball.
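For readers who do not want to open u.h first, the core of the hp2 action can be restated in plain C. This is a sketch under assumptions, not the author's code: heading_to_html is a hypothetical name, it writes into a caller-supplied buffer instead of using Os, and it leaves out the accumulated text x that hp2 prepends.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative restatement of hp2: hd is the text matched by HD
   (^=+.*\n), for example "==Introduction\n". The number of leading '='
   characters is the heading level. The newline matched by the regex
   stays inside the element. Returns the heading level. */
int heading_to_html(char *out, size_t size, const char *hd)
{
    int level;
    assert(out != NULL && hd != NULL);
    level = (int)strspn(hd, "=");   /* count the leading '=' characters */
    snprintf(out, size, "<h%d>%s</h%d>\n\n", level, hd + level, level);
    return level;
}
```

Because the HD regex includes the trailing \n, the closing tag ends up on its own line.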

test.hp
=ly
==Introduction
ly is a short way of writing a parser

as you will

The test input above produces this output:

./hp test.hp
<h1>ly
</h1>

<h2>Introduction
</h2>

<p>ly is a short way of writing a parser</p>
<p>as you will</p>

Building

Run ./configure and then make ly e hp to build ly, the simple calculator (e) and the heading and paragraph parser (hp).

Platforms