Revision History | |
---|---|
Revision 0.2.1 | 2009-01-19 | The initial version of the tutorial for the ETL framework version 0.2.1. |
Abstract
This document describes how to create a simple calculator language using ETL. The tutorial introduces most of the extensibility constructs of the ETL grammar definition language. The tutorial assumes ETL parser version 0.2.1.
First, you need to download the archive with the parser from the project page on SourceForge: http://sf.net/projects/etl . Unpack the archive to any directory (preferably one whose path does not contain spaces) and follow the installation instructions. It is recommended to generate the xmlout sample files, since this tutorial links to them in a number of places in order to show the ASTs for the samples.
To start working with the framework you need just the etl-parser.jar library. This jar file has no external dependencies; it uses java.util.logging as its logging framework.

Note that all Java sources, grammar files, and sample files used in this tutorial are included in the ETL package.

You can also browse the HTML and XML files that represent the ETL sources for this tutorial in the directory ../xmlout/calculator, provided you have downloaded the xmlout version of the package or generated the XML and HTML files according to the installation instructions.
In this part of the tutorial we will consider a minimal grammar that works.
Let's create a very simple grammar that contains only a print statement, where the only supported expressions are positive numbers and string literals. Let's place the grammar in the file calculator-basic-0_2_1.g.etl [1]:
Example 1. calculator-basic-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorBasic {
3: namespace default t = "http://etl.sf.net/samples/calculator";
4: context default Expressions {
5: op composite NumberLiteral(f) {
6: @ value = float | integer;
7: };
8: op composite StringValue(f) {
9: ^ t:StringLiteral {
10: @ value = string(quote='"');
11: };
12: };
13: statement PrintStatement {
14: % print;
15: @ values += list , {
16: expression;
17: };
18: };
19: statement ExpressionStatement {
20: @ value = expression;
21: };
22: };
23: };
The current convention is for ETL grammars to end with the extension .g.etl and for ETL sources to end with the extension .{language-specific-part}.etl. We use the extension .c.etl for sources that conform to the calculator grammars. However, the ETL parser does not enforce any naming convention for grammars or for sources conforming to them.
The first thing to note is that ETL shares a common lexical syntax across all languages that can be defined using it. The lexical level contains numbers (integers and floats), strings, identifiers, and graphic tokens [2]. Keywords in ETL are not global as in other languages; they are normal tokens, like identifiers, that play the role of a keyword in a specific place in the grammar but can be used freely elsewhere.
In addition to the lexical syntax, ETL languages share a common phrase syntax. The main elements of the phrase syntax are blocks and segments. A segment is a sequence of significant tokens [3] and blocks, terminated with a semicolon. A block is a sequence of segments enclosed in braces. An ETL source is a sequence of segments.
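As an illustration of how little machinery the phrase level needs, the following toy Java sketch (not part of the ETL API; it ignores strings and comments for brevity) splits a source into top-level segments by tracking block depth:

```java
import java.util.ArrayList;
import java.util.List;

/** A toy illustration of the phrase layer: splitting a source into top-level segments. */
public class PhraseSketch {
    /** Hypothetical helper; real ETL also handles strings, comments, and error recovery. */
    static List<String> topLevelSegments(String source) {
        List<String> segments = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current block nesting depth
        for (char c : source.toCharArray()) {
            current.append(c);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (c == ';' && depth == 0) {
                // a semicolon outside any block terminates a segment
                segments.add(current.toString().trim());
                current.setLength(0);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        List<String> s = topLevelSegments("print 1; statement X { a; b; };");
        System.out.println(s.size());  // 2
        System.out.println(s.get(1));  // statement X { a; b; };
    }
}
```

Note that a block together with its trailing semicolon is part of the enclosing segment, which is why the second segment above is reported as a single unit.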
Above the phrase level, ETL has a term syntax. The term syntax is defined using grammars; a grammar maps the parsed phrase layer to an AST consisting of objects and properties. An object type is a pair of a namespace and a name (just as in the case of an XML element). A property is identified by its name. Properties are of two kinds: single-value and multiple-value. The ETL term parser maps the structure returned by the phrase parser to an object tree that is directly usable by tools. The resulting object model is quite simple and flexible, and there are ready-made parsers that map it to JavaBeans, EMF, and Java objects with public fields. It is also easy to create a custom AST parser that maps to other object structures such as JSON or MOF, since at the core of the framework is an event parser that works as a kind of iterator, simultaneously iterating over the events returned by the lexer, phrase parser, and term parser.
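The object model described here is simple enough to sketch directly. The following illustrative class (not an actual ETL API type; all names are hypothetical) shows the shape a custom AST parser might target, with single-value and multiple-value properties corresponding to the two property assignment kinds used in grammars:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of a generic term-object model (not an ETL API class). */
public class TermObjectSketch {
    final String namespace; // like an XML namespace URI
    final String name;      // the object name within the namespace
    final Map<String, Object> singleValued = new HashMap<String, Object>();
    final Map<String, List<Object>> multiValued = new HashMap<String, List<Object>>();

    TermObjectSketch(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    /** Single-value property (corresponds to '=' in grammar definitions). */
    void set(String property, Object value) {
        singleValued.put(property, value);
    }

    /** Multiple-value property (corresponds to '+=' in grammar definitions). */
    void add(String property, Object value) {
        List<Object> list = multiValued.get(property);
        if (list == null) {
            list = new ArrayList<Object>();
            multiValued.put(property, list);
        }
        list.add(value);
    }
}
```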
The ETL grammar syntax is itself defined using ETL, and it is likewise organized into segments and blocks. Like most other ETL files, the grammar starts with a document type declaration. In some cases the document type declaration may be missing, and in that case the ETL parser uses the default grammar. In this particular case the grammar is identified by a public identifier, which has the same syntax and semantics as in XML. The mapping for the grammar's public identifier -//IDN etl.sf.net//ETL//Grammar 0.2.1 is implemented in the parser. For other public identifiers an entity resolver should be used; the Apache XML catalog resolver can be used to build resolution based on a number of catalog formats. It is also possible to specify a system identifier, and we will get back to this topic in later parts of the tutorial. The document type declaration starts with the doctype keyword, and the doctype statement may occur only as the first statement in an ETL source.
The second line is the start of the grammar declaration. The grammar declaration starts with the grammar keyword, followed by a fully qualified name of the grammar, which uses a syntax similar to that of a fully qualified Java class name. This name is later available in events returned by the parser, so it is possible to recognize that a source belongs to a specific grammar; it does not otherwise affect the parsing process. Any name can be specified here, but it is recommended to follow Java naming conventions. The curly brace starts a block with the grammar contents.
The first thing to note is that the grammar content is unordered: the top-level grammar members may appear in any order without changing their semantics. However, it is good style to put grammar-wide declarations at the start of the grammar.
The first line inside our grammar is a namespace declaration. Like in XML, a namespace declaration creates a mapping between a prefix and a URI. The prefix is used later in object creation expressions. This particular namespace is designated as the default one.
The namespace declaration is followed by the context declaration. The context declaration starts with the context keyword, optionally followed by the modifier default or abstract. The modifier abstract is covered later in this tutorial. The modifier default specifies which context is used for the top level of a source that conforms to the grammar if no context is specified in the document type declaration. If no context is declared as default in the grammar, the context has to be specified in the doctype statement or set in the parser. We have exactly one context in the grammar, and we designate it as the default one. The modifier default in our sample grammar is followed by the context name. The context name has grammar scope and is used to refer to the context from different parts of the grammar. Information about the current context is also available from parser events, and it can be used by ETL processing tools.
The context contains four statements, each creating a new declaration. The first one is an operator declaration that declares a numeric literal. The keyword op specifies that an operator is being declared. The keyword composite specifies that the operator is defined using the free-form syntax in its body. This keyword is followed by the operator name, which is unique within the context. If a default namespace is specified and the operator does not contain an object creation expression at the root level, this name is used to construct the object name for the operator. The name also plays an important role in the extensibility scenarios supported by ETL. In the parentheses after the operator name comes the operator type specification. The type f means that the operator is a primary operator that has neither a left nor a right argument. We will return to operator types in later sections of the tutorial.
The operator declarations form the expression syntax for the context. Each operator declaration introduces an infix, prefix, suffix, or primary operator. If the context defines expressions, at least one primary operator must be present in the context, or it will not be possible to build any expression.
The primary operator is the simplest operator kind. It has only the operator expression, which is wrapped into the operator's object (in our case the operator's object is specified implicitly). The content of our operator definition contains a property assignment statement (or let statement) starting with @. The @ is followed by a property name and an assignment operator. There are two assignment operators used in ETL grammars: = (single-value assignment) and += (multiple-value assignment). The multiple-value assignment should be used if the property in the underlying model is a multi-value property (like a list or an array); the single-value assignment should be used if the property is expected to hold a single value.
On the right-hand side of the property statement is a syntax expression. Syntax expressions in ETL consume tokens from the stream and yield objects, properties, and values. This particular expression consumes either an ETL integer or an ETL float and yields it as a value. If this operator matches, a logical AST node will be produced with the object name t:NumberLiteral and the single property value containing the text of the integer or float token.
The next operator definition defines a string literal. Unlike NumberLiteral, it defines the root object for the value explicitly. This is done when either the namespace differs from the default namespace, or the name of the object does not match the name of the definition (as was done in this case, for no particular reason). The operator matches a string value delimited with double quotes. It is recommended to use the shorter form when possible; however, the full form shown in the case of StringLiteral is always available.
ETL expressions are not directly usable; they must be contained in statements, which are organized into blocks. The next member of the context defines the print statement. Like an operator, a statement maps to a single object, whose name is in our case inferred just like the operator's object name.
The block contained in the statement definition consists of several segments. Each segment of the statement is evaluated sequentially when the statement is matched. Two kinds of statements are allowed here: a let statement (discussed above in the operator's description) and an expression statement.
The first segment in the statement definition contains the expression % print. This expression matches the identifier print and yields no value. The % acts as an escape here, specifying that the next token is a keyword of the grammar being created. Several keywords can be specified in sequence, and they can be interleaved with blocks, which may contain syntax expressions as well.
There are a few things to note in the next segment as well. The assignment operator in the let statement has the form +=, meaning that multiple values are expected. The immediate syntax expression to match is the list operator, which matches values in its content separated by the separator that comes right after the list keyword (in our case a comma). The elements of the list are specified by the block, which contains the syntax expression expression. The form used in this statement means any expression defined using the operators of this context. It is possible to refer to expressions of another context, like expression(Expressions); this form has an effect similar to expression without arguments in this case, but it generally behaves differently in the presence of extensibility constructs. For this grammar, the expression will match either a number literal or a string literal. We will extend the grammar with additional elements in later sections of the tutorial.
The last definition is the ExpressionStatement statement; this is a simple statement that contains a single expression. When the statement is executed, it evaluates the expression.
This is a small grammar that will serve as a basis for the rest of the tutorial. It will be extended with additional elements in later sections, but it can already be used to run some samples (for example, converting positive numbers from bases 2..36 to decimal form, or converting Unicode escapes to characters).
An ETL file that matches this small grammar is actually very simple. Let's place the following text in the file sample-basic.c.etl located in the same directory as the grammar file.
Example 2. sample-basic.c.etl
1: doctype "calculator-basic-0_2_1.g.etl";
2: print "zeroes: ", 24#0#, " ", 0.0;
3: print "ones: ", 1, " ", 1.0, " ", 10E-1;
4: print "halves: ", 0.5, " ", 5.0E-1, " ", 2#0.001#e2;
5: print "dozens: ", 12#10#, " ", 3#1_1_0#, " ", 0.12e2, " ", 36#C#;
The sample is meant to print numbers in the specified forms, interleaved with string literals that are printed as is. The numbers on each line have the same numeric value but are written in different forms. Note that ETL uses Ada-like based integers, and the last line contains several examples. Also, in ETL it is possible to use underscores in numbers to separate logical groups.
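Since ETL based integers follow the Ada convention base#digits#, the "dozens" line can be checked with plain Java: Integer.parseInt accepts an arbitrary radix, and the grouping underscores just have to be stripped first. This snippet is only an illustration and does not use the ETL library:

```java
public class BasedIntegers {
    public static void main(String[] args) {
        // 12#10# : the digits "10" read in base 12
        System.out.println(Integer.parseInt("10", 12));                    // 12
        // 3#1_1_0# : underscores only group digits, so the value is "110" in base 3
        System.out.println(Integer.parseInt("1_1_0".replace("_", ""), 3)); // 12
        // 36#C# : the digit "C" in base 36
        System.out.println(Integer.parseInt("C", 36));                     // 12
        // 24#0# from the "zeroes" line: zero in any base
        System.out.println(Integer.parseInt("0", 24));                     // 0
    }
}
```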
You can also check the sample AST. Note that the AST is clickable, provided you have JavaScript enabled. You can click on object and property names on the left side to have the corresponding text highlighted, or on text on the right side to have the innermost object or property for that position highlighted.
ETL has a built-in error recovery strategy. If a syntax error happens in a statement, parsing of the statement is aborted until the end of the segment. Errors at the lexical and phrase levels do not automatically cause syntax errors by themselves, but since lexical errors are treated as whitespace, the parser will likely be confused by the next token. As a result of parsing such a segment, an incomplete object tree will be built.
But even in the case of lexical and phrase level errors, the parser is able to build at least a partial object tree, which can be used to find further semantic errors.
Let's consider some erroneous sources and see how they are parsed, to observe the error recovery in process. You can check the sources using the etl2html script, which can be installed according to the installation instructions.
Example 3. calculator-basic-with-errors.g.etl
1: // This file demonstrates the error recovery strategy in ETL.
2: // Errors here are intentional.
3: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
4: grammar calculator.CalculatorBasic {
5: namespace default t = "http://etl.sf.net/samples/calculator";
6: context default Expressions {
7: op composite NumberLiteral(f) {
8: @ value = float | integer;
9: };
10: op composite StringValue(f) {
11: ^ t:StringLiteral {
12: @ value = string(quote='");
13: };
14: };
15: statement PrintStatement {
16: % print
17: @ values += list , {
18: expression;
19: };
20: };
21: statement ExpressionStatement {
22: @ value = expression
23: };
24: };
25: };
The table after the source summarizes the errors and their positions. Let's look at the generated errors. The first error is on line 12 and relates to the missing closing quote in the string. The lexer returns an error saying that a newline is not expected in this kind of string. Then the phrase parser is unable to find a semicolon before the end of the block and complains about it as well, but it generates an event saying that the segment is finished. The term parser still expects a string, and when it finds out that the segment has ended, it reports a syntax error.
The next error is the missing semicolon after the keyword print on line 16. The term parser encounters the symbol @, which can occur only at the beginning of a statement according to the grammar syntax. The term parser skips until the end of the segment, including blocks nested in this segment. The skipped tokens are highlighted in red in the sample. Since there is a semicolon before the end of the block, the phrase parser has nothing to complain about.
The last error is the missing semicolon at the end of the block on line 22. This error is detected by the phrase parser, which reports the error but also reports that the segment has ended. Since the segment has ended where the term parser expected it to end, the term parser does not complain and just forwards the phrase parser error to the client code.
Despite these errors, most of the AST is built anyway, as you can see here. Compare it with the AST of the correct grammar. As you can see, only information from the erroneous nodes is missing.
Let's now see what happens if we try to use a grammar that contains errors:
Example 4. sample-basic-with-errors.c.etl
1: // This file demonstrates the error recovery strategy in ETL.
2: // Errors here are intentional.
3: doctype "calculator-basic-with-errors.g.etl";
4: print "zeroes: ", 24#0, " " , 0.0 , " '', 0;
5: print "ones: ', 1, " ", 1.0, " ", 10E-1 {a;a};
6: print "halves: ", 0.5, " ", 5.0E-1, " ", 2#0.001#e2;
7: print "dozens: ", 12#10#, " ", 3#1_1_0#, " ", 0.12e2, " ", 36#C {{
All grammar errors are reported when reading the source, but they are classified as grammar errors: the position information supplied with each error points to the grammar source.
The last grammar error is reported at the end of the doctype directive, saying that the specified grammar could not be used. When an error with the grammar happens, the term parser falls back to the default grammar, which is able to accept any ETL source. Because of this, we will not see syntax errors in the source: the default grammar treats all tokens the same, and you can see that print is not highlighted as an error. But phrase errors and lexical errors are still detected, and we have made plenty of them in the source.
You can also check the sample AST to validate this.
Let's now write the code that supports this simple sample. We will use the simplest AST parser from the ETL library, FieldTermParser.
To simplify mapping, it is recommended to put classes related to the same namespace into the same package and to use the class name as the object name. The ETL AST parsers allow some degree of customization, so it is possible to deviate from these recommendations.
The field term parser assumes there is some base class that holds basic information like the start and end positions of the node in the source code, and structural information like the parent-child relationship. For this tutorial we do not need the parent-child relationship, but information about positions might be useful for error reporting. So let's create a Node class with a location field that holds position information. The parser can either store positions as fields of the type TextPos, scatter them among fields of integer types, or store position information as a single location field of the type SourceLocation (it is also possible to customize processing in other ways). We will go for the last variant, and the resulting class looks like the following. Note that the class and the field have public access; otherwise the parser would not be able to access them using reflection.
Example 5. Node.java
1 package calculator.interpreter.ast;
2
3 import net.sf.etl.parsers.SourceLocation;
4
5 /**
6 * The basic node class
7 */
8 public abstract class Node {
9 /** the position of the node */
10 public SourceLocation location;
11 }
12
Then let's define a simple class Expression with the abstract method Object eval(Environment env). Ignore the env parameter in this section of the tutorial; we will define the corresponding class later. In this section of the tutorial it is unused, and it could be an empty class.
Example 6. Expression.java
1 package calculator.interpreter.ast;
2
3 import calculator.interpreter.Environment;
4
5 /**
6 * The base class for all expressions.
7 */
8 public abstract class Expression extends Node {
9 /**
10 * Evaluate the value of the expression.
11 *
12 * @param env the evaluation environment
13 * @return the resulting value
14 */
15 public abstract Object eval(Environment env);
16 }
17
The classes StringLiteral and NumberLiteral have a similar value field that is shared between them. Let's create the abstract class Literal to hold this shared state.
Example 7. Literal.java
1 package calculator.interpreter.ast;
2
3 /**
4 * The literal expression
5 */
6 public abstract class Literal extends Expression {
7 /** The literal value */
8 public String value;
9 }
10
Then we just need to implement the method eval() in both classes. This is trivial using the utility class LiteralUtils from the parser library, which supports parsing number and string tokens.
Example 8. StringLiteral.java
1 package calculator.interpreter.ast;
2
3 import net.sf.etl.parsers.LiteralUtils;
4 import calculator.interpreter.Environment;
5
6 /**
7 * The string literal
8 */
9 public class StringLiteral extends Literal {
10 @Override
11 public Object eval(Environment env) {
12 return LiteralUtils.parseString(value);
13 }
14 }
15
Note that for the sake of simplicity, we use only double numbers here.
Example 9. NumberLiteral.java
1 package calculator.interpreter.ast;
2
3 import net.sf.etl.parsers.LiteralUtils;
4 import calculator.interpreter.Environment;
5
6 /**
7 * The number literal
8 */
9 public class NumberLiteral extends Literal {
10 @Override
11 public Object eval(Environment env) {
12 return LiteralUtils.parseDouble(value);
13 }
14 }
15
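The based floating-point literals from the sample (such as 2#0.001#e2) are also handled by LiteralUtils. As a rough, stdlib-only illustration of what such parsing involves, here is a simplified sketch; it is not the real LiteralUtils implementation, and it ignores signs and malformed input:

```java
public class BasedDoubleSketch {
    /** Parses a based literal of the shape base#digits[.digits]#[eN], e.g. "2#0.001#e2". */
    static double parseBasedDouble(String token) {
        int h1 = token.indexOf('#');
        int h2 = token.lastIndexOf('#');
        int base = Integer.parseInt(token.substring(0, h1));
        String digits = token.substring(h1 + 1, h2).replace("_", "");
        int exponent = 0;
        if (h2 + 1 < token.length()) {
            exponent = Integer.parseInt(token.substring(h2 + 2)); // skip the 'e' or 'E'
        }
        int dot = digits.indexOf('.');
        String mantissa = dot < 0 ? digits : digits.substring(0, dot) + digits.substring(dot + 1);
        int fractionDigits = dot < 0 ? 0 : digits.length() - dot - 1;
        // the value is the mantissa (in the given base) scaled by base^(exponent - fractionDigits)
        return Long.parseLong(mantissa, base) * Math.pow(base, exponent - fractionDigits);
    }

    public static void main(String[] args) {
        System.out.println(parseBasedDouble("2#0.001#e2")); // 0.5
        System.out.println(parseBasedDouble("12#10#"));     // 12.0
    }
}
```

For 2#0.001#e2 the mantissa 0.001 in base 2 is 0.125, and the exponent e2 scales it by 2^2, giving 0.5 as on the "halves" line of the sample.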
Let's now implement the statement classes, starting with the Statement abstract class. Right now we have two statement types, but we will add more in later sections of the tutorial. There are fields annotations and documentation; ignore these fields for now, we will cover them in later sections. The class Statement also has the abstract method Environment eval(Environment env, Cell result). We will discuss the result parameter later. The argument value env should be returned as is if the statement does nothing with the environment (and our print and expression statements are such statements).
Example 10. Statement.java
1 package calculator.interpreter.ast;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.Cell;
5 import calculator.interpreter.Environment;
6 import calculator.interpreter.ast.meta.Annotation;
7 import calculator.interpreter.ast.meta.DocumentationLine;
8
9 /**
10 * The base statement class
11 */
12 public abstract class Statement extends Node {
13 /** the annotations associated with statement */
14 public final LinkedList<Annotation> annotations = new LinkedList<Annotation>();
15 /** the documentation lines associated with statement */
16 public final LinkedList<DocumentationLine> documentation = new LinkedList<DocumentationLine>();
17
18 /**
19 * @return true if the statement has metadata
20 */
21 public boolean hasMetadata() {
22 return !annotations.isEmpty() || !documentation.isEmpty();
23 }
24
25 /**
26 * The evaluate statement
27 *
28 * @param env the evaluation environment
29 * @param result the evaluation result (the cell might be null, in that case
30 * the result must be discarded).
31 * @return new context (if environment is not changed returns {@code env}
32 * argument)
33 */
34 public abstract Environment eval(Environment env, Cell result);
35 }
36
The print statement should be able to reference multiple child nodes. The FieldTermParser currently supports the ArrayList and LinkedList collection types; other collection types might be supported in future versions of the parser. The implementation of PrintStatement is quite simple: it just evaluates all its arguments, prints the results using the System.out.print(o) method, and then prints a new line.
Example 11. PrintStatement.java
1 package calculator.interpreter.ast;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.Cell;
5 import calculator.interpreter.Environment;
6
7 /**
8 * The print statement
9 */
10 public class PrintStatement extends Statement {
11 /** the values to print */
12 public final LinkedList<Expression> values = new LinkedList<Expression>();
13
14 @Override
15 public Environment eval(Environment env, Cell result) {
16 for (Expression e : values) {
17 System.out.print(e.eval(env));
18 }
19 System.out.println();
20 return env;
21 }
22 }
23
The class ExpressionStatement has an even simpler structure, as it contains only a single expression. This expression is simply evaluated by the statement, and the result is stored into the result cell (if the caller has provided a Cell object for it).
Example 12. ExpressionStatement.java
1 package calculator.interpreter.ast;
2
3 import calculator.interpreter.Cell;
4 import calculator.interpreter.Environment;
5
6 /**
7 * The expression statement. The result of the statement execution is the value
8 * of the wrapped expression.
9 */
10 public class ExpressionStatement extends Statement {
11 /** the expression to be interpreted */
12 public Expression value;
13
14 @Override
15 public Environment eval(Environment env, Cell result) {
16 Object rc = value.eval(env);
17 if (result != null) {
18 result.value = rc;
19 }
20 return env;
21 }
22 }
23
Now that we have defined all AST classes for the basic grammar, we need to read an actual source statement by statement and evaluate the statements.
ETL grammars are created in a way that maps each segment to at most one object, so any source code is mapped to a sequence of objects. The AST parser works in a model that allows reading such a sequence of objects, with objects and properties mapped according to the conventions of the particular AST parser kind.
So our interpreter reads these statements one by one and executes them.
Example 13. Interpreter.java
1 package calculator.interpreter;
2
3 import java.util.LinkedList;
4 import net.sf.etl.parsers.ErrorInfo;
5 import net.sf.etl.parsers.StandardGrammars;
6 import net.sf.etl.parsers.TermParser;
7 import net.sf.etl.parsers.TermParserFactory;
8 import net.sf.etl.parsers.TermToken;
9 import net.sf.etl.parsers.beans.FieldTermParser;
10 import net.sf.etl.parsers.utils.AbstractTreeParser.PositionPolicy;
11 import org.xml.sax.InputSource;
12 import calculator.interpreter.ast.Node;
13 import calculator.interpreter.ast.Statement;
14
15 /**
16 * The base class for the interpreter.
17 */
18 public class Interpreter {
19 /**
20 * The list that contains errors that are accumulated during the parsing
21 * process.
22 */
23 protected final LinkedList<String> errors = new LinkedList<String>();
24 /**
25 * If true interpreter runs in interactive mode
26 */
27 protected boolean interactive = false;
28
29 /**
30 * The main interpreter loop: it reads a sequence of statements from the AST
31 * parser {@code statements} and executes them one by one. If an error is
32 * encountered, it is printed and the entire top level statement is skipped
33 * (it is assumed the user will enter a corrected version of the statement
34 * after receiving an error about the previous input).
35 *
36 * @param statements the AST parser that returns a sequence of statements
37 * @param env the initial environment
38 * @return the resulting environment of the interpretation
39 * @see #enlistError(TermToken)
40 * @see #printErrors()
41 */
42 protected Environment interpret(FieldTermParser<Node> statements,
43 Environment env) {
44 final Cell lastResult = new Cell();
45 while (statements.hasNext()) {
46 Node current = statements.next();
47 if (printErrors()) {
48 printPrompt();
49 continue;
50 }
51 if (current instanceof Statement) {
52 Statement s = (Statement) current;
53 lastResult.value = null;
54 try {
55 env = s.eval(env, lastResult);
56 processResult(lastResult);
57 } catch (EvalException ex) {
58 System.err.println("EVAL ERROR: " + ex.getMessage());
59 } catch (Exception ex) {
60 System.err.println("Executing the statement of the type "
61 + s.getClass().getName() + " at " + s.location.toShortString()
62 + " resulted in the exception.");
63 ex.printStackTrace();
64 } finally {
65 printPrompt();
66 }
67 } else {
68 throw new IllegalArgumentException(
69 "The input is not a valid calculator input file: "
70 + current.getClass().getName());
71 }
72 }
73 printErrors();
74 return env;
75 }
76
77 /**
78 * Process the last result
79 *
80 * @param lastResult the last result holder
81 */
82 protected void processResult(Cell lastResult) {
83 if (interactive) {
84 System.out.println("Result: " + lastResult.value);
85 }
86 }
87
88 /**
89 * Print prompt on the console
90 */
91 public void printPrompt() {
92 if (interactive) {
93 System.out.print("eval> ");
94 System.out.flush();
95 }
96 }
97
98 /**
99 * Print the errors encountered during parsing
100 *
101 * (the errors are taken from the {@code errors} field)
102 * @return true if some errors were detected
103 */
104 protected boolean printErrors() {
105 if (!errors.isEmpty()) {
106 System.err.println("Errors in the input, the statement is skipped.");
107 for (String e : errors) {
108 System.err.println(e);
109 }
110 errors.clear();
111 return true;
112 }
113 return false;
114 }
115
116 /**
117 * Enlist error that happens during parsing an object or when checking if the
118 * next object exists. The enlisted errors are later printed using
119 * {@link #printErrors()} method in the interpreter loop. This method is
120 * called from the method
121 * {@link FieldTermParser#handleErrorFromParser(TermToken)}.
122 *
123 * @param errorToken the error token
124 */
125 protected void enlistError(TermToken errorToken) {
126 ErrorInfo ei = errorToken.errorInfo();
127 String text = errorToken.kind().name() + " "
128 + ei.location().toShortString() + ": " + ei.message();
129 errors.add(text);
130 }
131
132 /**
133 * Create a tree parser from the event parser. This parser converts calculator text into a
134 * sequence of {@link Statement}-s located at the top level of the grammar.
135 *
136 * @param parser the event parser
137 * @return the AST parser
138 */
139 protected FieldTermParser<Node> createTreeParser(TermParser parser) {
140 FieldTermParser<Node> rc = new FieldTermParser<Node>(parser,
141 Interpreter.class.getClassLoader()) {
142 @Override
143 protected void handleErrorFromParser(TermToken errorToken) {
144 enlistError(errorToken);
145 }
146 };
147 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator",
148 "calculator.interpreter.ast");
149 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/arith",
150 "calculator.interpreter.ast.arith");
151 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/logic",
152 "calculator.interpreter.ast.logic");
153 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/vars",
154 "calculator.interpreter.ast.vars");
155 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/meta",
156 "calculator.interpreter.ast.meta");
157 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/lambda",
158 "calculator.interpreter.ast.lambda");
159 rc.ignoreNamespace(StandardGrammars.DOCTYPE_NS);
160 rc.setPosPolicy(PositionPolicy.SOURCE_LOCATION);
161 rc.setAbortOnDefaultGrammar(true);
162 return rc;
163 }
164
165 /**
166 * Interpret the single source (file or stdin)
167 *
168 * @param factory the preconfigured factory for the term parser
169 * @param source the source to interpret
170 * @param env the initial environment to interpret
171 * @return the resulting environment of the source execution
172 */
173 public Environment interpretSource(TermParserFactory factory,
174 InputSource source, Environment env) {
175 TermParser parser = factory.newTermParser();
176 parser.parse(source);
177 try {
178 parser.advance();
179 FieldTermParser<Node> code = createTreeParser(parser);
180 return interpret(code, env);
181 } finally {
182 parser.close();
183 }
184 }
185 }
186
The interpreter entry point is the interpretSource(...) method. The factory parameter is used to create a parser instance. Like XML parser factory classes, the factory allows setting parameters that affect the parsing process. Usually a single factory is used to create parsers for the same type of source. The inputSource parameter allows specifying what should be parsed in a flexible way. The last parameter, env, is not used for now.
The interpreter loop is implemented in the interpret(...) method. The field term parser is used as a kind of iterator: statements are read from it and executed one by one. If a statement throws an exception, the exception is printed and execution continues. The exception EvalException is treated specially: no stack trace is printed for it. It is thrown when the error is due to invalid user input and it is enough to show a message to the user. If execution was successful, the result of the executed statement is passed to the processResult(...) method. If the variable printExecutionResults is set to true, the execution results are printed out.
The input error handling model is very simple and was created to support the interactive mode; it relies on the built-in error recovery support that we discussed earlier. For other usages (like code generation) it might be preferable to read the source entirely and not to process it further at all if there are any syntax errors.
The field errors contains a list of errors that were accumulated during the parsing process. The errors are collected by the method enlistError(...), which just formats the error text and adds it to the error list. Note that if a syntax error is encountered, the tree might be constructed incompletely, so some properties that are deterministically set by the grammar might contain null. Therefore, if a statement contains errors (checked by the method printErrors()), it is skipped and the errors are printed. However, you still need to enter a semicolon to terminate the current segment in the input and press newline; the prompt for the next statement will not appear otherwise.
The method createTreeParser(...) creates a field term parser from the event term parser. Our model objects for the interpreter are already created according to the convention required by the class FieldTermParser, so very little effort is needed to customize the parser.
There are a few things to note about this parser. The created parser extends FieldTermParser and uses Node as the base class for all AST nodes. The first constructor argument is an event term parser instance, as in the base class. The second argument is a classloader that must be able to load all AST classes; naturally, the classloader of the class Interpreter works just fine. In more complex scenarios (for example, if different parts of the AST are loaded by different classloaders), class loading can get really complicated and you might want to override the getObjectClass(...) method. The overridden method handleErrorFromParser is invoked if the event parser reports an error during construction of the objects (this method just delegates to enlistError(...)). Note that up to this point, the functionality of the parser has been customized in a completely generic way, without implementing any new classes.
The code that follows tunes the parser to the needs of the interpreter. The method mapNamespaceToPackage maps a namespace declared in the grammar to the Java package where the AST classes reside. The fully qualified Java class name is produced by appending the class name to the package name. It is also possible to specify a mapping for a particular combination of name and namespace, but since we have chosen Java class names that match the grammar object names, we do not need to do that now. Note that we have used a crystal ball here to define the namespace mapping for all namespaces that we will use later in the tutorial; if in a real project you are short on crystal balls, you could as well use a configuration file or allow plugins to contribute the namespace-to-package mapping.
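The effect of mapNamespaceToPackage can be pictured with a small stand-alone sketch. The NamespaceMapping class below is a hypothetical illustration, not the actual FieldTermParser internals: the fully qualified class name is simply the mapped package plus the grammar object name.

```java
import java.util.HashMap;
import java.util.Map;

public class NamespaceMapping {
    private final Map<String, String> nsToPackage = new HashMap<>();

    public void mapNamespaceToPackage(String namespace, String packageName) {
        nsToPackage.put(namespace, packageName);
    }

    /** Resolve a grammar object to a fully qualified Java class name. */
    public String resolveClassName(String namespace, String objectName) {
        String pkg = nsToPackage.get(namespace);
        if (pkg == null) {
            throw new IllegalStateException("No package mapped for namespace: " + namespace);
        }
        return pkg + "." + objectName;
    }

    public static void main(String[] args) {
        NamespaceMapping m = new NamespaceMapping();
        m.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/arith",
                "calculator.interpreter.ast.arith");
        // the grammar object Plus resolves to calculator.interpreter.ast.arith.Plus
        System.out.println(m.resolveClassName(
                "http://etl.sf.net/samples/calculator/arith", "Plus"));
    }
}
```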
The method ignoreNamespace(...) tells the parser that we are not interested in receiving objects from the specified namespace. The document type instruction is represented by its own object, and we do not want to provide a mapping for it.
The method setPosPolicy(PositionPolicy.SOURCE_LOCATION) specifies the policy for mapping positional information to the AST (if no predefined policy suits your AST, you could write a custom one).
The method setAbortOnDefaultGrammar(true) requires a little explanation. As discussed in the section Error Recovery, the ETL TermParser is always able to parse the entire source (unless there is an IO error), since it must also support text editing scenarios, which are bound to produce incorrect text from time to time. So if there is an error in the document type declaration, or there is a problem with loading or compiling the referenced grammar, a default grammar that can parse any ETL source is used instead. This grammar creates objects in the namespace defined by the constant StandardGrammars.DEFAULT_NS. In most cases, AST parsers are not interested in objects from this namespace. We could ignore them, as we have done for the document type namespace, but then the ETL parser would read the entire source in the method hasNext(...). This method call provides a better alternative: it specifies that if an object from the default namespace is encountered, it should be treated as EOF, so no further parsing is performed.
With the Interpreter class implemented, we have almost all the pieces in place. The class InterpreterBasic implements the rest of the mundane work of creating the interpreter. The only piece specific to ETL is the creation of the TermParserFactory in the method start(...). A default grammar is set on the factory; this grammar is used when the first segment of the source does not start with the token doctype. For example, if standard input is used as the source and you type expressions right away, the default grammar will be used. We allow customizing the default grammar in a limited way using the -Dcalc.lang={language level} virtual machine argument. The default grammar is then the grammar /calculator/calculator-{language level}-0_2_1.g.etl located somewhere on the application classpath. If the property is not specified, the basic grammar is used.
There is also the method initialEnvironment(...); currently it is just a placeholder for future extensions.
Now it is possible to run our extremely basic calculator. It accepts as an application argument the name of a file that contains a set of print statements. It can also be used without arguments, in which case standard input is used.
Example 14. RunnerBasic.java
package calculator.interpreter;

import java.io.File;
import java.net.URL;

import net.sf.etl.parsers.TermParserFactory;

import org.xml.sax.InputSource;

/**
 * The runner for sources for the basic calculator language.
 */
public class RunnerBasic {
    /** The name of the file to execute */
    protected String filename;
    /** The configured term parser factory */
    protected TermParserFactory factory;
    /** The interpreter */
    protected Interpreter interpreter;

    /**
     * Prints usage information and exits
     */
    public void usage() {
        System.err.println("Calculator Interpreter (basic version). "
                + "(c) Constantine Plotnikov, 2008");
        System.err.println("Usage: java " + getClass().getName() + " [<file>]");
        System.err.println("  <file> - the file to be interpreted. "
                + "If absent, stdin will be used.");
        System.exit(1);
    }

    /**
     * Interpreter entry point. The method parses arguments and starts the
     * interpretation process for the source based on the arguments.
     *
     * @param args application arguments
     */
    public void start(String[] args) {
        interpreter = createInterpreter();
        parseArgs(args);
        createFactory();
        try {
            interpreter.interpretSource(factory, getInput(filename),
                    initialEnvironment());
        } catch (Exception e) {
            System.err
                    .println("Problem with running parser on interpreting the code.");
            e.printStackTrace();
            System.exit(1);
        }
    }

    /**
     * Create factory instance
     */
    protected void createFactory() {
        factory = TermParserFactory.newInstance();
        factory.setDefaultGrammar(getDefaultGrammarPublicId(), getDefaultGrammar(),
                null);
    }

    /**
     * @return public identifier for the default grammar
     */
    protected String getDefaultGrammarPublicId() {
        return null;
    }

    /**
     * @return a new interpreter instance
     */
    public Interpreter createInterpreter() {
        return new Interpreter();
    }

    /**
     * @return the initial environment for the interpreter
     */
    protected Environment initialEnvironment() {
        return new Environment();
    }

    /**
     * Parse arguments and initialize interpreter state
     *
     * @param args arguments to parse
     */
    protected void parseArgs(String[] args) {
        if (args.length > 1) {
            usage();
        }
        filename = args.length == 1 ? args[0] : null;
        if (filename == null) {
            interpreter.interactive = true;
            System.out.println("Running interpreter in interactive mode.");
            interpreter.printPrompt();
        }
    }

    /**
     * @return the default grammar to use in case the grammar is not specified
     */
    protected String getDefaultGrammar() {
        String lang = getLanguageLevel();
        URL grammar = getClass().getResource(
                "/calculator/grammars/calculator-" + lang + "-0_2_1.g.etl");
        return grammar == null ? null : grammar.toString();
    }

    /**
     * @return the current language level
     */
    protected String getLanguageLevel() {
        return System.getProperty("calc.lang", getDefaultLanguageLevel());
    }

    /**
     * @return the default language level
     */
    protected String getDefaultLanguageLevel() {
        return "basic";
    }

    /**
     * Create an input source based on whether the input file is present.
     *
     * @param filename the filename or null if stdin should be used
     * @return the created input source
     * @throws Exception if there is a problem with creating an input source
     */
    protected static InputSource getInput(String filename) throws Exception {
        InputSource source;
        if (filename == null) {
            source = new InputSource(System.in);
            source.setSystemId("<stdin>");
        } else {
            source = new InputSource(new File(filename).toURI().toURL().toString());
        }
        return source;
    }

    /**
     * Application entry point.
     *
     * @param args application arguments
     */
    public static void main(String[] args) {
        new RunnerBasic().start(args);
    }
}
The calculator is currently quite dumb: it is only able to print the numbers given to it. But it can already understand numbers in bases up to 36. Let's add arithmetic to it.
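The samples use ETL's based number literals of the form base#digits#, where the digits may contain _ separators (fractional based literals like 8#0.1# also exist but are not covered here). As a rough illustration of how the integer form decodes, here is a small stand-alone sketch; the decode helper is hypothetical and not part of the ETL API.

```java
public class BasedNumbers {
    /** Decode an ETL-style based integer literal such as "16#10000#". */
    static long decode(String literal) {
        int firstHash = literal.indexOf('#');
        int lastHash = literal.lastIndexOf('#');
        if (firstHash < 0 || lastHash <= firstHash) {
            throw new IllegalArgumentException("Not a based literal: " + literal);
        }
        // the part before the first '#' is the base, 2..36
        int radix = Integer.parseInt(literal.substring(0, firstHash));
        if (radix < 2 || radix > 36) {
            throw new IllegalArgumentException("Base must be in 2..36: " + radix);
        }
        // underscores are digit separators and carry no value
        String digits = literal.substring(firstHash + 1, lastHash).replace("_", "");
        return Long.parseLong(digits, radix);
    }

    public static void main(String[] args) {
        System.out.println(decode("16#10000#")); // 65536
        System.out.println(decode("3#1_0#"));    // 3
    }
}
```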
Instead of simply modifying the file calculator-basic-0_2_1.g.etl, we will take a more complex route: we will create a new grammar that extends our basic grammar (it will be placed in the file calculator-arith-0_2_1.g.etl).
Example 15. calculator-arith-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorArith {
3: include "calculator-basic-0_2_1.g.etl";
4: namespace t = "http://etl.sf.net/samples/calculator";
5: namespace default a = "http://etl.sf.net/samples/calculator/arith";
6: context default Expressions {
7: op composite Identity(f) {
8: ^t:Identity {
9: % (;
10: @ value = expression;
11: % );
12: };
13: };
14: op UnaryMinus(fy, 200, -) {
15: @ value = right;
16: };
17: op UnaryPlus(fy, 200, +) {
18: @ value = right;
19: };
20: op Power(xfy, 300, **) {
21: @ first = left;
22: @ second = right;
23: };
24: op Multiply(yfx, 400, *) {
25: @ first = left;
26: @ second = right;
27: };
28: op Divide(yfx, 400, /) {
29: @ first = left;
30: @ second = right;
31: };
32: op Remainder(yfx, 400, %) {
33: @ first = left;
34: @ second = right;
35: };
36: op Plus(yfx, 500, +) {
37: @ first = left;
38: @ second = right;
39: };
40: op Minus(yfx, 500, -) {
41: @ first = left;
42: @ second = right;
43: };
44: };
45: };
The overall grammar structure is similar to what we defined in the section Basic Grammar. There are two namespace declarations, and the default namespace of this grammar differs from that of the basic grammar.
Another new element here is the include statement. The statement specifies that all contexts and definitions from the specified grammar should be included in this grammar. If the including grammar already has a definition with the same name, the definition from the included grammar is not added.
Note that contexts also support include operations, and all grammar include statements are executed before context include statements. We will give an example of this in the section Introducing Variables.
It is also possible to include multiple grammars. If two included grammars contain definitions with the same name, this results in a conflict (but if it is the same definition reachable through multiple include paths, there is no conflict). Such a conflict can be resolved by creating a definition with the same name in the including grammar; that definition might integrate features of the definitions from both parent grammars.
Thanks to the include statement, we do not have to copy definitions from the basic grammar.
There are no built-in expression parentheses in ETL, since programming language designers use parentheses in many creative ways. They have to be defined explicitly using a primary operator, but this is quite simple. The Identity operator does just this. It is a primary operator, just like the operators NumberLiteral and StringValue. Since we want to define this operator in a non-default namespace, we have to specify the root object explicitly.
This grammar provides the prefix operators + (unary plus) and - (unary minus), and the infix operators ** (power), * (multiply), / (divide), % (remainder), + (plus), and - (minus). These operators have the usual precedence levels, so a + b ** -c * d is parsed as a + ((b ** (-c)) * d). Now let's look at the details of the operator definition. The parentheses right after the operator name contain the operator specification.
[4]
There may be from one to three elements in the operator specification. The second element is the precedence. The precedence of an expression is the precedence of its outermost operator. The precedence of primary operators is zero by default, so it is not specified explicitly in our examples.
The first element of the operator specification is the associativity specification, given as a pattern. The letter f stands for the operator itself. The letter x means an expression of precedence strictly less than the precedence of the operator (so the operator is non-associative on that side). The letter y means an expression of precedence less than or equal to the precedence of the operator (so the operator is associative on that side). If there is no letter on the left or right of f in the pattern, the operator has no left or right parameter. So the associativity f means a primary operator with no arguments at all; fy means a prefix right-associative operator with one argument on the right (like unary minus); and yfx means an infix left-associative operator (like plus and minus).
This operator notation is quite powerful and allows defining quite complex expressions. But what makes it more interesting is that it is extensible: it is easy to add a new operator at an existing or new precedence level when extending a grammar, and we do so a number of times in our samples.
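To make the f/x/y patterns concrete, here is a stand-alone precedence-climbing sketch for the operators above (this is an illustration, not the ETL parser's actual implementation). The y side admits operands of equal precedence, the x side only strictly smaller ones; the class and its string-building output are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** A tiny Prolog-style operator-precedence parser illustrating fy/xfy/yfx. */
public class OpPrecedence {
    private final List<String> tokens = new ArrayList<>();
    private int pos;

    OpPrecedence(String src) {
        // crude tokenizer: identifiers, "**", and single-character operators
        for (int i = 0; i < src.length(); ) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; }
            else if (Character.isLetterOrDigit(c)) {
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                tokens.add(src.substring(i, j)); i = j;
            } else if (c == '*' && i + 1 < src.length() && src.charAt(i + 1) == '*') {
                tokens.add("**"); i += 2;
            } else { tokens.add(String.valueOf(c)); i++; }
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    /** Parse an expression whose outermost operator precedence is at most max. */
    String parse(int max) {
        String left;
        String t = peek();
        if (("+".equals(t) || "-".equals(t)) && 200 <= max) { // unary ops: fy, 200
            pos++;
            left = "(" + t + parse(200) + ")";                // y: right side <= 200
        } else {
            left = tokens.get(pos++);                         // primary, precedence 0
        }
        while (true) {
            String op = peek();
            int prec = "**".equals(op) ? 300
                     : "*".equals(op) || "/".equals(op) || "%".equals(op) ? 400
                     : "+".equals(op) || "-".equals(op) ? 500 : -1;
            if (prec < 0 || prec > max) break;
            pos++;
            // xfy (**): right side <= prec; yfx (the rest): right side < prec
            String right = parse("**".equals(op) ? prec : prec - 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    public static void main(String[] args) {
        // prints (a + ((b ** (-c)) * d)), matching the parse described above
        System.out.println(new OpPrecedence("a + b ** -c * d").parse(Integer.MAX_VALUE));
    }
}
```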
The third element is available only for non-composite operators. It specifies the operator text; only one token is allowed here. Composite operators can have operator text consisting of any valid syntax expression, but that text is contained in the body of the operator. In the later sections of the tutorial we will define quite a few weird operators.
The body of a simple operator definition contains only mappings for the left and right arguments. The syntax expressions right and left denote the right and left arguments of the operator, respectively. The argument mappings may appear in any order (for composite operators they can be freely interleaved with the syntax of the operator), but they must appear at the top level of the block if no root object is used, or directly under the top object if one is used. There must be exactly one mapping for each argument the operator can have.
In our sample grammars, most unary operators map their argument to the property value, and infix operators use the properties first and second. However, in real grammars it might make sense to use descriptive names instead of referring to operand positions.
The file sample-arith.c.etl demonstrates what is possible with the newly defined constructs:
Example 16. sample-arith.c.etl
1: doctype "calculator-arith-0_2_1.g.etl";
2: print "zeroes: ", 2*3#10#-3 * 2 ** 3 / 4, " ", +1 + -1;
3: print "dozens: ", 12#100# ** (1/2), " ", 3#1_0# * 16#10000# ** +8#0.1#;
You could also check the sample AST.
Since the AST classes for our new primary, infix, and prefix operators share common properties, we will use base classes with these properties already implemented.
Example 17. InfixOp.java
package calculator.interpreter.ast;

/**
 * Base class for infix operators
 */
public abstract class InfixOp extends Expression {
    /** The first (left) operand of the operator */
    public Expression first;
    /** The second (right) operand of the operator */
    public Expression second;
}
Example 18. UnaryOp.java
package calculator.interpreter.ast;

/**
 * The unary operation
 */
public abstract class UnaryOp extends Expression {
    /** The operand of the unary operation */
    public Expression value;
}
The implementation of the identity operator is trivial.
Example 19. Identity.java
1 package calculator.interpreter.ast;
2
3 import calculator.interpreter.Environment;
4
5 /**
6 * The identity operator that just returns what is enclosed.
7 */
8 public class Identity extends UnaryOp {
9
10 @Override
11 public Object eval(Environment env) {
12 return value.eval(env);
13 }
14 }
15
Because many operations work only with numeric values, we can use common code that checks the arguments. The class EvalUtils contains utility members for working with the different value types. Its primary function is to throw an exception in the right form when an unexpected value is encountered. Note that a real language should allow extending the operators in some flexible way, but type classes, multimethods, and similar mechanisms are not the subject of this tutorial.
Example 20. EvalUtils.java
package calculator.interpreter;

import net.sf.etl.parsers.SourceLocation;

import calculator.interpreter.ast.Expression;
import calculator.interpreter.ast.Node;

/**
 * Evaluation utilities
 */
public class EvalUtils {
    /**
     * Evaluate expression to double
     *
     * @param owner the owner
     * @param env   the evaluation environment
     * @param ex    the expression to evaluate
     * @return the double value
     */
    public static double toDouble(Node owner, Environment env, Expression ex) {
        Object o = ex.eval(env);
        if (o instanceof Double) {
            return ((Double) o).doubleValue();
        } else {
            throw new EvalException(ex, "The node type "
                    + owner.getClass().getSimpleName() + " expects expression at "
                    + ex.location.toShortString()
                    + " to be of type double, and it evaluated to " + o
                    + (o != null ? " : " + o.getClass().getName() : ""));
        }
    }

    /**
     * Evaluate expression to boolean
     *
     * @param owner the owner
     * @param env   the evaluation environment
     * @param ex    the expression to evaluate
     * @return the boolean value
     */
    public static boolean toBoolean(Node owner, Environment env, Expression ex) {
        Object o = ex.eval(env);
        if (o instanceof Boolean) {
            return ((Boolean) o).booleanValue();
        } else {
            throw new EvalException(ex, "The node type "
                    + owner.getClass().getSimpleName() + " expects expression at "
                    + ex.location.toShortString()
                    + " to be of type boolean, and it evaluated to " + o
                    + (o != null ? " : " + o.getClass().getName() : ""));
        }
    }

    /**
     * Return the value cast to string or throw {@link EvalException}.
     *
     * @param location the location of the expression
     * @param role     the role of the expression
     * @param value    the value to cast
     * @return the cast value
     */
    public static String expectString(SourceLocation location, String role,
            Object value) {
        if (value instanceof String) {
            return (String) value;
        } else {
            throw new EvalException(location, "Expecting " + role + " and found "
                    + value + " : " + (value == null ? null : value.getClass()));
        }
    }
}
With this utility, implementing operators that require only float arguments is really simple.
Example 21. UnaryMinus.java
1 package calculator.interpreter.ast.arith;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.EvalUtils;
5 import calculator.interpreter.ast.UnaryOp;
6
7 /**
8 * Unary minus
9 */
10 public class UnaryMinus extends UnaryOp {
11
12 @Override
13 public Object eval(Environment env) {
14 return -EvalUtils.toDouble(this, env, value);
15 }
16 }
17
Example 22. Power.java
1 package calculator.interpreter.ast.arith;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.EvalUtils;
5 import calculator.interpreter.ast.InfixOp;
6
7 /**
8 * Power operator
9 */
10 public class Power extends InfixOp {
11
12 @Override
13 public Object eval(Environment env) {
14 double f = EvalUtils.toDouble(this, env, first);
15 double s = EvalUtils.toDouble(this, env, second);
16 return Math.pow(f, s);
17 }
18 }
19
The Divide, Remainder, Minus, and UnaryPlus operators are implemented in a similar way.
The Plus operator is a bit more complex, since we would like to support it for string concatenation as well.
Example 23. Plus.java
1 package calculator.interpreter.ast.arith;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.EvalException;
5 import calculator.interpreter.EvalUtils;
6 import calculator.interpreter.ast.InfixOp;
7
8 /**
9 * Plus operator
10 */
11 public class Plus extends InfixOp {
12 @Override
13 public Object eval(Environment env) {
14 Object f = first.eval(env);
15 if (f instanceof String) {
16 return ((String) f) + second.eval(env);
17 } else if (f instanceof Double) {
18 double s = EvalUtils.toDouble(this, env, second);
19 return ((Double) f) + s;
20 } else {
21 throw new EvalException(first,
22 "Unsupported type of operand for plus operator: " + f);
23 }
24 }
25 }
26
It is also possible to implement the multiply operator for both strings and numbers.
Example 24. Multiply.java
package calculator.interpreter.ast.arith;

import calculator.interpreter.Environment;
import calculator.interpreter.EvalException;
import calculator.interpreter.EvalUtils;
import calculator.interpreter.ast.InfixOp;

/**
 * Multiply operation
 */
public class Multiply extends InfixOp {
    @Override
    public Object eval(Environment env) {
        Object f = first.eval(env);
        if (f instanceof String) {
            String str = (String) f;
            double s = EvalUtils.toDouble(this, env, second);
            if (s > 1024 || s < 0) {
                throw new EvalException(second,
                        "The string could be multiplied by non-negative "
                                + "numbers smaller than 1024 and the second "
                                + "argument evaluated to " + s);
            }
            int n = (int) s;
            StringBuilder rc = new StringBuilder(str.length() * n);
            for (int i = 0; i < n; i++) {
                rc.append(str);
            }
            return rc.toString();
        } else if (f instanceof Double) {
            double s = EvalUtils.toDouble(this, env, second);
            return ((Double) f) * s;
        } else {
            throw new EvalException(first,
                    "Unsupported type of operand for multiply operator: " + f);
        }
    }
}
No particular effort is required to run the calculator with the extensions we have just provided. If the source contains the doctype declaration, the correct grammar is picked automatically. The default grammar (for example, for the interactive scenario) can be changed by setting the system property calc.lang to the value "arith".
In its current form, values, once calculated, cannot be reused later. In this section we will add variables to the language. This section also describes some definition-related constructs of the grammar definition language.
Example 25. calculator-vars-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorVars {
3: include "calculator-arith-0_2_1.g.etl";
4: namespace t = "http://etl.sf.net/samples/calculator";
5: namespace m = "http://etl.sf.net/samples/calculator/meta";
6: namespace default v = "http://etl.sf.net/samples/calculator/vars";
7: context abstract MetaInformation {
8: def ExpressionDef {
9: expression(Expressions);
10: };
11: def NameDef {
12: ^ v:Name {
13: {
14: @ literal = identifier;
15: } | {
16: @ quoted = string(prefix=Q|q, quote="'");
17: };
18: };
19: };
20: documentation Documentation {
21: @ documentation += doclines wrapper m:DocumentationLine.text;
22: };
23: attributes Annotations {
24: @ annotations += % @ {
25: ^ m:Annotation {
26: @ name = ref(NameDef);
27: % ( {
28: @ arguments += list , {ref(ExpressionDef);}?;
29: } % ) ?;
30: };
31: } *;
32: };
33: };
34: context default Expressions {
35: include MetaInformation;
36: def ExpressionDef {
37: expression;
38: };
39: statement VarStatement {
40: @ type = token(var) | token(let);
41: @ name = ref(NameDef);
42: % = {
43: @ value = expression;
44: }?;
45: };
46: op composite Name(f) {
47: ref(NameDef);
48: };
49: def SequenceDef {
50: ^ t:Sequence {
51: @ statements += block;
52: };
53: };
54: op composite Sequence(f) {
55: ref(SequenceDef);
56: };
57: op Assignment(xfy, 2000, =) {
58: @ first = left;
59: @ second = right;
60: };
61: statement Help {
62: % help;
63: @ variable = ref(NameDef);
64: };
65: statement IncludeStatement {
66: ^ t:IncludeStatement {
67: % include;
68: @ systemId = string(quote='"');
69: };
70: };
71: };
72: };
In this grammar we have two contexts instead of one. In addition to the context Expressions, there is the context MetaInformation. The context MetaInformation is abstract: it is purely a holder for definitions and cannot be referenced in any way except by include directives. This context is included into the context Expressions. The context include operation is executed after all grammar include operations, and context include directives are themselves inherited through grammar includes. The meaning of the context include operation is similar to the grammar include operation: all definitions from the included context are copied into the including context unless redefined.
This grammar demonstrates the use of a new definition type. The keyword def starts a fragment definition. A reference to a fragment is literally substituted by the content of the corresponding fragment definition. Fragment definitions are used in two roles: as reusable fragments of syntax and as extension points for the context. A fragment can be referenced from other syntax expressions by name using the operator ref({fragment name}). Fragments can reference each other, but since their semantics is literal substitution, they cannot do so recursively.
The important reusable fragment in this grammar is NameDef. It represents the name of a variable and is used in the different places where a variable name can occur. The expression identifier within the property literal matches any identifier. However, there is also an alternative mapping that uses the property quoted, defined as a prefixed string with the prefixes q or Q (a prefixed string is an identifier immediately followed by a string; in our case only single-quoted strings with the prefixes q and Q are recognized as names, like Q'a' or q'let'). As a result, the NameDef fragment matches either an identifier or a single-quoted string prefixed with Q or q. The same object v:Name is created in both cases, but different properties of the object are used.
This is done to solve an important problem with extensibility. Since ETL supports evolution of languages, identifiers that were not previously used as keywords may become keywords as the language evolves or as extensions are used. A quoted name makes it possible to refer to a name even in a context where a keyword with the same text is available. By introducing the quoted form of names into our grammar, we will be able to use new constructs together with old names. Note that it is still bad style to use names that are known to match some keyword, and thus the syntax for them can be somewhat less convenient than for normal names.
The quoted form of names also solves another problem: different languages support different syntaxes for identifiers. For example, Java allows "$" in identifiers, but ETL does not. Quoted names solve this integration problem by allowing such names to be specified as well.
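To illustrate the two name forms, here is a tiny stand-alone sketch of extracting the name text. The nameText helper is hypothetical; in the tutorial the two forms actually land in the separate properties literal and quoted of the Name object.

```java
public class Names {
    /**
     * Extract the name text from either form accepted by NameDef:
     * a plain identifier, or a quoted name such as q'let' or Q'a'.
     */
    static String nameText(String token) {
        if ((token.startsWith("q'") || token.startsWith("Q'")) && token.endsWith("'")) {
            // strip the one-letter prefix and the surrounding single quotes
            return token.substring(2, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(nameText("counter")); // counter
        System.out.println(nameText("q'let'"));  // let
    }
}
```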
In our sample we have the definition ExpressionDef, which defines a syntax fragment that serves as an extension point in this context. This fragment is later used by the ref(ExpressionDef) expression. The definition is redefined in the context Expressions to refer to "this" context instead of referring to the context by name.
The documentation definition is also a new element in this grammar. It specifies how documentation comments are mapped. ETL supports C#-style documentation comments, which are a sequence of line comments starting with ///. The documentation definition must contain a single multivalue property assignment statement with the doclines expression on the right-hand side. The doclines expression matches all documentation comments that come before a statement. By default, each comment is mapped to a single value token, but in most cases the tokens are wanted in object form in order to carry positional information. The text wrapper m:DocumentationLine.text after the token doclines specifies that each documentation line should be wrapped into the object DocumentationLine, with the value mapped to the property text. The definition applies to all statements in the context; however, in this tutorial we simply ignore documentation for all statements except variable definitions.
The attributes definition also applies to all statements in the context. It is executed as if its text were specified at the beginning of each statement in the context, inside the explicit or implicit top-level object of the statement. The primary use of this definition kind is to define a syntax for structured meta-information (like .NET attributes or Java annotations). This particular definition allows Java-style annotations. Future versions of ETL may provide mechanisms for disabling attributes for some statements.
The VarStatement defines a variable. A variable has a name and an optional value, and it is either of the multiple-assignment kind (the "var" form) or the single-assignment kind (the "let" form). Note the use of the token(...) syntax expression: it matches and yields the token specified in the parentheses. In this case either of the values "let" and "var" is assigned to the property type. The value expression is optional (note that the syntax operator ? applies to both the block and =), so the value can be assigned later.
The primary operator Name is an expression which evaluates to the value assigned to the variable. Note that here we used NameDef directly, to simplify the object model and to demonstrate the usage of definitions as top level objects. A possible alternative would have been to wrap it into an additional object, to make a better distinction between the name concept and this specific usage of the name.
The infix operator Assignment is a bit tricky to define. On the left hand side we want to allow only names for now. On the right hand side we want to allow any expression, including assignment. However, we want to allow assignment only in top level expressions. Because of the last constraint we give the assignment the high precedence level 2000. But this implies that the left hand side could be any expression except assignment as well. We will need to check this during evaluation or semantic analysis. Also note that the assignment operator is right-associative, as it is in C or Java.
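This right-associativity matches Java itself, where a = b = 5 parses as a = (b = 5) and an assignment expression yields the assigned value. A small standalone illustration (not part of the calculator sources):

```java
// Demonstrates right-associativity of assignment in Java: the chained
// form a = b = 5 is evaluated as a = (b = 5).
class AssocDemo {
    static int chained() {
        int a, b;
        a = b = 5;      // b is assigned first, then its value flows into a
        return a + b;   // both variables are 5
    }
}
```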
The operator Sequence allows using blocks in expressions. The value of a sequence expression is the value of the last statement (or null if there are no statements). So we can write the number 12 in a complex way like {let a = 2; a*a;} * 3;. The block defines a new lexical scope in our language. So variables defined in it are not available outside of the block, while variables from the outer scope are available inside the block. Like the expression syntax expression that we have used in the fragment ExpressionDef, the block syntax expression takes an optional argument in the form of a context name. If a context name is not explicitly specified for the block syntax expression, the block takes statements from the current context.
The statement Help is supposed to print interesting information about a variable (like its documentation and annotations). The statement IncludeStatement is supposed to include the referenced source. We will use it in order not to duplicate definitions from the vars-level prelude in the later level preludes. But it could also be used to organize sources into multiple modules.
Note that from a formal point of view we have introduced several ambiguous constructs in this grammar, but they are resolved automatically by ETL.
The important point about ETL is that ETL grammars are LL(1) from the point of view of source verification (from the point of view of AST building, things are somewhat more complicated, since the ETL parser does some trickery to support operators correctly). This restricts what the ETL parser can do to resolve conflicts.
The first ambiguity is between the general expression statement and statements like let, var, help, and print. All these statements might start with an identifier now. However, the expression statement starts with an unspecified identifier, while the statement VarStatement starts with either the let or the var identifier. In ETL the most specific alternative is selected. If there are two alternatives, one with a generic identifier and one that exactly matches the keyword, the more specific one will be chosen. However, if we had introduced two statements that start with a generic identifier, the grammar compiler would have detected a conflict and the grammar would not have been compiled.
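The "most specific alternative wins" rule can be pictured with a simplified, hypothetical dispatcher (the statement names mirror this tutorial's grammar; the dispatch logic itself is only an illustration, not how the ETL parser is implemented):

```java
// Hypothetical sketch: exact keyword matches are preferred over the
// generic-identifier alternative (the expression statement).
class StatementDispatch {
    static String classify(String firstToken) {
        switch (firstToken) {
            case "let":
            case "var":   return "VarStatement";        // exact keyword match
            case "help":  return "Help";                // exact keyword match
            default:      return "ExpressionStatement"; // generic identifier
        }
    }
}
```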
Considering the unpredictability of the language evolution process (particularly if users are allowed to innovate too), it is better to provide ways in the defined grammars to resolve conflicts between keyword tokens and tokens used as user-supplied names. The quoted form of the name provided in our grammar is one of the possible ways.
The second ambiguous construct is Annotation. After the annotation name there are optional parentheses with annotation arguments. Because of them, an expression like @Test (2)-1; could evaluate to different values if understood differently. However, ETL uses a greedy parsing strategy in all elements. So the part @Test (2) will be understood as the annotation with the name Test and the single argument 2. It might be better not to introduce ambiguous constructs into the grammar at all. For example, we could have made the parentheses after the annotation name mandatory rather than optional.
The AST classes are created in the same way as before. But now we finally make use of some elements that we have previously added for future use. The fields documentation and annotations of the Statement class are finally used to store documentation and annotations respectively.
The Environment class (which could have been an empty class in the previous sections of the tutorial) implements the lexical scope for variables. The scope is represented as a linked list of variables. While this implementation is quite inefficient, it is very simple and does the job for the purposes of the tutorial.
Example 26. Environment.java
1 package calculator.interpreter;
2
3 import net.sf.etl.parsers.SourceLocation;
4
5 /**
6 * The evaluation environment
7 */
8 public class Environment {
9 /** the previous environment or null for top one */
10 private final Environment previous;
11 /** the variable */
12 private final Variable var;
13
14 /**
15 * The constructor for the context
16 *
17 * @param previous the previous context (might be null)
18 * @param var the variable
19 */
20 public Environment(Environment previous, Variable var) {
21 super();
22 this.previous = previous;
23 this.var = var;
24 }
25
26 /**
27 * The constructor for initial environment
28 */
29 public Environment() {
30 var = null;
31 previous = null;
32 }
33
34 /**
35 * Find a variable
36 *
37 * @param name the name to lookup
38 * @return the variable or null if it was not found.
39 */
40 public Variable find(String name) {
41 for (Environment env = this; env.previous != null; env = env.previous) {
42 if (env.var.name.equals(name)) {
43 return env.var;
44 }
45 }
46 return null;
47 }
48
49 /**
50 * Find variable or fail with exception
51 *
52 * @param location the location of the node that requested the variable
53 * @param name the name to lookup
54 * @return the found variable
55 * @throws EvalException if variable is not found
56 */
57 public Variable findDefined(SourceLocation location, String name) {
58 Variable rc = find(name);
59 if (rc == null) {
60 throw new EvalException(location, "No variable with " + name
61 + " is defined.");
62 }
63 return rc;
64 }
65
66 /**
67 * Check if the variable is free
68 *
69 * @param location the location of the check
70 * @param name the name to check
71 */
72 public void checkFree(SourceLocation location, String name) {
73 Variable previousDef = find(name);
74 if (previousDef != null) {
75 throw new EvalException(location, "The name '" + name
76 + "' has been already defined at "
77 + previousDef.defineLocation.toShortString());
78 }
79 }
80 }
81
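The shadowing behavior of this linked-list scope can be illustrated with a minimal standalone model (the class and field names here are simplified stand-ins, not the tutorial's actual Environment API):

```java
import java.util.Objects;

// Minimal standalone model of a linked-list scope: each frame holds one
// binding and a link to the enclosing frame.
final class Scope {
    final Scope outer;   // enclosing scope, null for the root
    final String name;   // name bound in this frame
    final Object value;  // bound value

    Scope(Scope outer, String name, Object value) {
        this.outer = outer;
        this.name = name;
        this.value = value;
    }

    // Walk the chain from innermost to outermost; the first match wins,
    // which is exactly how inner definitions shadow outer ones.
    Object find(String wanted) {
        for (Scope s = this; s != null; s = s.outer) {
            if (Objects.equals(s.name, wanted)) {
                return s.value;
            }
        }
        return null; // not defined in any enclosing scope
    }
}
```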
The class
Variable
encapsulates all operations on the variable. For the purposes of
error reporting quite a lot of information is stored here.
Example 27. Variable.java
1 package calculator.interpreter;
2
3 import java.util.List;
4 import calculator.interpreter.ast.Node;
5 import net.sf.etl.parsers.SourceLocation;
6
7 /**
8 * The variable
9 */
10 public class Variable {
11 /** the name of variable to be defined */
12 public final String name;
14 /** the true value indicates that this name is a mutable variable */
15 public final boolean isVar;
16 /** the true value indicates that this name has been bound at least once */
16 public boolean isBound;
17 /** the value of the variable */
18 public Object value;
19 /** the location where variable is defined */
20 public final SourceLocation defineLocation;
21 /** the location where variable is set */
22 public SourceLocation bindLocation;
23 /** the documentation associated with binding (might be null) */
24 public final String documentation;
25 /** the annotation values (might be null) */
26 public final List<AnnotationValue> annotations;
27
28 /**
29 *The constructor for the variable
30 *
31 * @param previous the previous context (might be null)
32 * @param name the name of variable or constant
33 * @param isVar if true, the name could be rebound after it was bound
34 * @param defineLocation the location where the name was defined
35 * @param documentation the documentation associated with the variable
36 * @param annotations the list of annotation values
37 */
38 public Variable(String name, boolean isVar, SourceLocation defineLocation,
39 String documentation, List<AnnotationValue> annotations) {
40 super();
41 this.name = name;
42 this.isVar = isVar;
43 this.defineLocation = defineLocation;
44 this.documentation = documentation;
45 this.annotations = annotations;
46 }
47
48 /**
49 * Get the value bound to the name
50 *
51 * @param node the node that requested the bind operation
52 * @return the current value of the binding
53 */
54 public Object get(Node node) {
55 if (!isBound) {
56 throw new EvalException(node, "The name " + name + " defined at "
57 + defineLocation.toShortString() + " has not been bound yet.");
58 }
59 return value;
60 }
61
62 /**
63 * Bind the variable with the value
64 *
65 * @param location the location of the node that requested the bind operation
66 * @param value the value to bind
67 * @return the value argument
68 */
69 public Object bind(SourceLocation location, Object value) {
70 if (!isBound) {
71 isBound = true;
72 } else if (!isVar) {
73 throw new EvalException(location,
74 "The value has been already bound to constant " + name + " at "
75 + bindLocation);
76 }
77 bindLocation = location;
78 this.value = value;
79 return value;
80 }
81
82 /**
83 * @return the help string for the variable
84 */
85 public String help() {
86 StringBuilder b = new StringBuilder();
87 b.append("------ ");
88 b.append(isVar ? "Variable " : "Constant ").append(name)
89 .append(" ------\n");
90 b.append("Defined at ").append(defineLocation.toShortString()).append('\n');
91 if (isBound) {
92 b.append("Bound at ").append(bindLocation.toShortString()).append(
93 " to value ").append(value).append('\n');
94 }
95 if (documentation == null) {
96 b.append("(No documentation defined)\n");
97 } else {
98 b.append(documentation);
99 }
100 if (annotations == null || annotations.size() == 0) {
101 b.append("(No annotations defined)\n");
102 } else {
103 for (AnnotationValue v : annotations) {
104 b.append(v).append('\n');
105 }
106 }
107 return b.toString();
108 }
109 }
110
The class AnnotationValue represents an evaluated form of the annotation. It also allows launching a processor for the annotation, which might modify the variable state. This class is mostly introduced to demonstrate how annotations could be added to the language.
Example 28. AnnotationValue.java
1 package calculator.interpreter;
2
3 import java.util.List;
4 import net.sf.etl.parsers.SourceLocation;
5
6 /**
7 * The value of annotation
8 */
9 public class AnnotationValue {
10 /** the location for annotation in the source */
11 public final SourceLocation location;
12 /** the annotation name */
13 public final String name;
14 /** the annotation arguments */
15 public final List<Object> arguments;
16
17 /**
18 * The constructor
19 *
20 * @param location the location of the annotation in the source
21 * @param name the annotation name
22 * @param arguments the annotation arguments
23 */
24 public AnnotationValue(SourceLocation location, String name,
25 List<Object> arguments) {
26 super();
27 this.location = location;
28 this.name = name;
29 this.arguments = arguments;
30 }
31
32 /**
33 * Process an annotation
34 *
35 * @param env the processing environment (includes the variable)
36 * @param var the variable to process
37 */
38 @SuppressWarnings("unchecked")
39 public void process(Environment env, Variable var) {
40 String className = "calculator.interpreter.annotations." + name
41 + "Annotation";
42 try {
43 Class<? extends Processor> c = (Class<? extends Processor>) Class
44 .forName(className);
45 Processor p = c.newInstance();
46 p.process(this, env, var);
47 } catch (ClassNotFoundException e) {
48 // ignore the problem
49 } catch (Exception e) {
50 throw new EvalException(location,
51 "Processor instance failed to be created: " + e);
52 }
53 }
54
55 @Override
56 public String toString() {
57 StringBuilder rc = new StringBuilder();
58 rc.append('@').append(name).append('(');
59 boolean isFirst = true;
60 for (Object v : arguments) {
61 if (isFirst) {
62 isFirst = false;
63 } else {
64 rc.append(", ");
65 }
66 rc.append(v);
67 }
68 rc.append(')');
69 return rc.toString();
70 }
71
72 /**
73 * A processor for annotations on the variables
74 */
75 public interface Processor {
76 /**
77 * Process annotation (typically by assigning some value to the variable)
78 *
79 * @param annotation an annotation to process
80 * @param env the environment
81 * @param var the variable to which annotation is attached
82 */
83 void process(AnnotationValue annotation, Environment env, Variable var);
84 }
85 }
86
The class JavaConstantAnnotation is an example annotation that assigns the value of a static field of some Java class to the annotated variable.
[5]
Example 29. JavaReflectionProcessor.java
1 package calculator.interpreter.annotations;
2
3 import java.lang.reflect.Field;
4 import calculator.interpreter.AnnotationValue;
5 import calculator.interpreter.EvalException;
6 import calculator.interpreter.EvalUtils;
7
8 /**
9 * Base class for annotation processors based on reflection
10 */
11 public abstract class JavaReflectionProcessor implements
12 AnnotationValue.Processor {
13
14 /**
15 * Get class for reflection annotation
16 *
17 * @param annotation the annotation value
18 * @param pos position within annotation
19 * @return the loaded class for annotation
20 */
21 protected static Class<?> getClass(AnnotationValue annotation, int pos) {
22 String className = EvalUtils.expectString(annotation.location,
23 "the class name (the argument " + pos + ")", annotation.arguments
24 .get(pos));
25 if ("double".equals(className)) {
26 return double.class;
27 } else if ("float".equals(className)) {
28 return float.class;
29 } else if ("char".equals(className)) {
30 return char.class;
31 } else if ("long".equals(className)) {
32 return long.class;
33 } else if ("int".equals(className)) {
34 return int.class;
35 } else if ("short".equals(className)) {
36 return short.class;
37 } else if ("byte".equals(className)) {
38 return byte.class;
39 }
40 try {
41 return Class.forName(className);
42 } catch (ClassNotFoundException e) {
43 throw new EvalException(annotation.location, "No class found: "
44 + className);
45 }
46 }
47
48 /**
49 * Get field for the class
50 *
51 * @param annotation the annotation
52 * @param c the class
53 * @param pos the position within annotation
54 * @return the field object
55 */
56 protected static Field getField(AnnotationValue annotation, Class<?> c,
57 int pos) {
58 String fieldName = EvalUtils.expectString(annotation.location,
59 "the field name (the argument " + pos + ")", annotation.arguments
60 .get(pos));
61 try {
62 return c.getField(fieldName);
63 } catch (NoSuchFieldException e) {
64 throw new EvalException(annotation.location, "No field " + fieldName
65 + " found in " + c.getName());
66 } catch (Exception ex) {
67 throw new EvalException(annotation.location, "Unable to access the field "
68 + fieldName + " in " + c.getName() + ": " + ex);
69 }
70 }
71 }
72
Example 30. JavaConstantAnnotation.java
1 package calculator.interpreter.annotations;
2
3 import java.lang.reflect.Field;
4 import calculator.interpreter.AnnotationValue;
5 import calculator.interpreter.Environment;
6 import calculator.interpreter.EvalException;
7 import calculator.interpreter.Variable;
8
9 /**
10 * Annotation that binds value to the static constant defined in some java
11 * class. The annotation has the following arguments:
12 * <ol>
13 * <li>Fully qualified Java class name (like "java.lang.Math")</li>
14 * <li>A constant name with class (like "PI")</li>
15 * </ol>
16 */
17 public class JavaConstantAnnotation extends JavaReflectionProcessor {
18 /** {@inheritDoc} */
19 public void process(AnnotationValue annotation, Environment env, Variable var) {
20 if (annotation.arguments.size() != 2) {
21 throw new EvalException(annotation.location,
22 "Wrong number of the arguments for annotation: expected 2 and found "
23 + annotation.arguments);
24 }
25 Class<?> c = getClass(annotation, 0);
26 Field f = getField(annotation, c, 1);
27 try {
28 Object value = f.get(null);
29 var.bind(annotation.location, value);
30 } catch (EvalException ex) {
31 throw ex;
32 } catch (Exception ex) {
33 throw new EvalException(annotation.location, "Unable to access the field "
34 + f.getName() + " in " + c.getName() + ": " + ex);
35 }
36 }
37 }
38
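The core of this annotation is ordinary Java reflection: resolve a class by its fully qualified name and read a public static field from it. A standalone sketch of just that step (ConstantReader is a hypothetical helper name, not part of the tutorial's sources):

```java
import java.lang.reflect.Field;

// Standalone sketch: resolve a class by name and read a public static
// field reflectively, as JavaConstantAnnotation does internally.
class ConstantReader {
    static Object read(String className, String fieldName) {
        try {
            Class<?> c = Class.forName(className);
            Field f = c.getField(fieldName);
            return f.get(null); // null receiver: the field must be static
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(
                "Cannot read " + className + "." + fieldName, e);
        }
    }
}
```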
The class VarStatement defines a variable or constant in the environment. Note that variables and constants could be introduced in unbound form. They cannot be read until they are bound later using the assignment operator. Also, the annotations have a chance to bind the variable to a value before the value of the expression is assigned.
Example 31. VarStatement.java
1 package calculator.interpreter.ast.vars;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.AnnotationValue;
5 import calculator.interpreter.Cell;
6 import calculator.interpreter.Environment;
7 import calculator.interpreter.Variable;
8 import calculator.interpreter.ast.Expression;
9 import calculator.interpreter.ast.Statement;
10 import calculator.interpreter.ast.meta.Annotation;
11 import calculator.interpreter.ast.meta.DocumentationLine;
12
13 /**
14 * The variable definition statement
15 */
16 public class VarStatement extends Statement {
17 /** the type of the statement ("var" or "let") */
18 public String type;
19 /** the definition name */
20 public Name name;
21 /** the statement value (optional) */
22 public Expression value;
23
24 @Override
25 public Environment eval(Environment env, Cell result) {
26 env.checkFree(location, name.name());
27 boolean isVar = "var".equals(type);
28 String docs;
29 if (!this.documentation.isEmpty()) {
30 StringBuilder b = new StringBuilder();
31 for (DocumentationLine l : this.documentation) {
32 b.append(l.text).append('\n');
33 }
34 docs = b.toString();
35 } else {
36 docs = null;
37 }
38 LinkedList<AnnotationValue> evaluatedAnnotations = new LinkedList<AnnotationValue>();
39 for (Annotation a : this.annotations) {
40 evaluatedAnnotations.add(a.eval(env));
41 }
42 Variable var = new Variable(name.name(), isVar, location, docs,
43 evaluatedAnnotations);
44 Environment rc = new Environment(env, var);
45 for (AnnotationValue a1 : var.annotations) {
46 a1.process(rc, var);
47 }
48 if (value != null) {
49 var.bind(this.location, value.eval(env));
50 }
51 if (var.isBound && result != null) {
52 result.value = var.value;
53 }
54 return rc;
55 }
56 }
57
The Assignment operator is used to assign a value to a variable. Note that ETL is unable to express that only names are allowed on the left hand side. So we have to check ourselves that the left hand side is a name.
Example 32. Assignment.java
1 package calculator.interpreter.ast.vars;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.EvalException;
5 import calculator.interpreter.ast.InfixOp;
6
7 /**
8 * Execute assignment operator
9 */
10 public class Assignment extends InfixOp {
11 @Override
12 public Object eval(Environment env) {
13 String name;
14 if (first instanceof Name) {
15 name = ((Name) first).name();
16 } else {
17 throw new EvalException(first,
18 "Left hand side of assignment must be an identifier");
19 }
20 Object value = second.eval(env);
21 return env.findDefined(location, name).bind(location, value);
22 }
23 }
24
The Sequence class represents the block operator. It just executes its statements one by one. Note that because an expression cannot modify the parent environment, the lexical scoping is implemented more or less automatically. Also, the sequence operator does not handle exceptions. If the sequence operator fails, the entire statement fails as well. Note that an empty sequence results in null.
Example 33. Sequence.java
1 package calculator.interpreter.ast;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.Cell;
5 import calculator.interpreter.Environment;
6
7 /**
8 * A sequence of the statements to evaluate
9 */
10 public class Sequence extends Expression {
11 /** statements to evaluate */
12 public final LinkedList<Statement> statements = new LinkedList<Statement>();
13
14 @Override
15 public Object eval(Environment env) {
16 Cell result = new Cell();
17 for (Statement s : statements) {
18 result.value = null;
19 env = s.eval(env, result);
20 }
21 return result.value;
22 }
23 }
24
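The last-value semantics of the sequence rule can be checked against a minimal standalone model (SeqModel is a simplified stand-in that ignores environment threading and the Cell class):

```java
import java.util.List;
import java.util.function.Supplier;

// Minimal model of the sequence rule: evaluate statements in order and
// keep only the last value; an empty sequence yields null.
class SeqModel {
    static Object evalAll(List<Supplier<Object>> statements) {
        Object result = null; // empty sequence yields null
        for (Supplier<Object> s : statements) {
            result = s.get(); // only the last value survives
        }
        return result;
    }
}
```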
The IncludeStatement class interprets statements from another source in the current environment. So they are run as if they were executed in place of the include directive. But it does not make sense to include a module several times in the same scope, since included modules are supposed to create new definitions, and redefining them is an error. So we implement the check whether the module has already been included in the class InterpreterVars. Note that the include statement assumes that the runner RunnerVars or its subclass is used, and it will fail with an error otherwise.
Example 34. IncludeStatement.java
1 package calculator.interpreter.ast;
2
3 import java.net.URI;
4 import java.net.URISyntaxException;
5 import net.sf.etl.parsers.LiteralUtils;
6 import net.sf.etl.parsers.ParserException;
7 import org.xml.sax.InputSource;
8 import calculator.interpreter.Cell;
9 import calculator.interpreter.Environment;
10 import calculator.interpreter.EvalException;
11 import calculator.interpreter.RunnerVars;
12
13 /**
14 * The include statement
15 */
16 public class IncludeStatement extends Statement {
17 /** the system identifier to be included */
18 public String systemId;
19
20 @Override
21 public Environment eval(Environment env, Cell result) {
22 RunnerVars r = (RunnerVars) env.findDefined(this.location,
23 RunnerVars.RUNNER_VAR_NAME).get(this);
24 InputSource source = createInputSource();
25 try {
26 return r.createInterpreter().interpretSource(r.getFactory(), source, env);
27 } catch (ParserException e) {
28 throw new EvalException(this, "The source " + source.getSystemId()
29 + "(public '" + source.getPublicId() + "') could not be opened: " + e);
30 }
31 }
32
33 /**
34 * @return a input source for the include directive
35 */
36 protected InputSource createInputSource() {
37 return new InputSource(resolveSystemId());
38 }
39
40 /**
41 * @return the system id resolved with respect to the current source location
42 */
43 protected String resolveSystemId() {
44 String mySystemId = location.systemId();
45 String resolvedSystemId;
46 if (!mySystemId.startsWith("<")) {
47 try {
48 URI myUri = new URI(mySystemId);
49 resolvedSystemId = myUri.resolve(LiteralUtils.parseString(systemId))
50 .toString();
51 } catch (URISyntaxException e) {
52 throw new EvalException(this, "Problem with parsing URIs " + e);
53 }
54 } else {
55 resolvedSystemId = systemId;
56 }
57 return resolvedSystemId;
58 }
59 }
60
The remaining AST classes DocumentationLine, Annotation, Name, and Help are trivial to implement.
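As a rough sketch of why these classes are trivial, a Name expression only has to look its identifier up in the current scope and return the bound value (simplified here: a plain map stands in for the tutorial's Environment, and NameSketch is a hypothetical name, not the actual class):

```java
import java.util.Map;

// Hypothetical, simplified sketch of the Name primary operator:
// evaluating a name yields the value currently bound to it.
class NameSketch {
    final String name;

    NameSketch(String name) {
        this.name = name;
    }

    Object eval(Map<String, Object> scope) {
        if (!scope.containsKey(name)) {
            // mirrors the "No variable with ... is defined" error above
            throw new RuntimeException("No variable with " + name + " is defined.");
        }
        return scope.get(name);
    }
}
```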
The sources conforming to the grammar defined in this section can be run with our basic runner. The language level just needs to be set to the value vars.
But considering that we now can create definitions, it might make sense to have some definitions (like the constants E or PI) loaded before executing the code. In this section we will add support for loading a language specific prelude before executing the file provided on the command line. The runner below does just this.
Example 35. RunnerVars.java
1 package calculator.interpreter;
2
3 import java.net.URL;
4 import net.sf.etl.parsers.SourceLocation;
5 import net.sf.etl.parsers.TermParserFactory;
6 import net.sf.etl.parsers.TextPos;
7 import org.xml.sax.InputSource;
8
9 /**
10 * The runner for language level "vars" and higher.
11 */
12 public class RunnerVars extends RunnerBasic {
13 /** the variable name where runner is stored */
14 public static final String RUNNER_VAR_NAME = "$runner";
15 /** If true, the prelude is loaded in the initial context. */
16 protected boolean loadPrelude = true;
17 /** If not null, the string will be used as a default language */
18 protected String defaultLanguage = null;
19
20 @Override
21 protected String getDefaultLanguageLevel() {
22 return defaultLanguage != null ? defaultLanguage : "vars";
23 }
24
25 @Override
26 protected Environment initialEnvironment() {
27 Environment context = super.initialEnvironment();
28 SourceLocation nowhere = new SourceLocation(TextPos.START, TextPos.START,
29 "<system>");
30 Variable runnerVar = new Variable(RUNNER_VAR_NAME, false, nowhere, null,
31 null);
32 runnerVar.bind(nowhere, this);
33 context = new Environment(context, runnerVar);
34 if (loadPrelude) {
35 String prelude = getPrelude();
36 if (prelude != null) {
37 InputSource source = new InputSource(prelude);
38 context = createInterpreter().interpretSource(factory, source, context);
39 }
40 }
41 return context;
42 }
43
44 @Override
45 public Interpreter createInterpreter() {
46 return new InterpreterVars();
47 }
48
49 /**
50 * @return the default grammar to use in the case if the grammar is not
51 * specified
52 */
53 protected String getPrelude() {
54 String lang = getLanguageLevel();
55 URL prelude = getClass().getResource(
56 "/calculator/grammars/prelude-" + lang + ".c.etl");
57 return prelude == null ? null : prelude.toString();
58 }
59
60 @Override
61 protected void parseArgs(String[] args) {
62 int i = 0;
63 while (i < args.length && args[i].startsWith("-")) {
64 i = parseOption(args, i);
65 }
66 if (i < args.length - 1) {
67 usage();
68 }
69 super.parseArgs(i == args.length - 1 ? new String[] { args[i] }
70 : new String[0]);
71 }
72
73 /**
74 * Parse single option
75 *
76 * @param args the arguments to parse
77 * @param i the current position
78 * @return the position after option
79 */
80 protected int parseOption(String[] args, int i) {
81 if (args[i].equals("-e")) {
82 loadPrelude = false;
83 i++;
84 } else if (args[i].equals("-l")) {
85 if (i == args.length - 1) {
86 usage();
87 } else {
88 defaultLanguage = args[i + 1];
89 i += 2;
90 }
91 } else {
92 usage();
93 }
94 return i;
95 }
96
97 @Override
98 public void usage() {
99 System.err.println("Calculator Interpreter (vars version). "
100 + "(c) Constantine Plotnikov, 2008");
101 System.err.println("Usage: java " + getClass().getName() + " "
102 + optionList() + " [<file>]");
103 System.err.println(" <file> - the file to be interpreted. "
104 + "If absent, stdin will be used.");
105 optionUsage();
106 System.exit(1);
107 }
108
109 /**
110 * The usage string for options
111 */
112 protected void optionUsage() {
113 System.err.println(" -e - if specified, the prelude is not loaded.");
114 System.err.println(" -l <lang level> - the default language level "
115 + "(the default value is 'vars').");
116 }
117
118 /**
119 * @return the list of options (for command line usage)
120 */
121 protected String optionList() {
122 return "[-e] [-l <lang level>]";
123 }
124
125 /**
126 * Application entry point.
127 *
128 * @param args application arguments
129 */
130 public static void main(String[] args) {
131 new RunnerVars().start(args);
132 }
133
134 /**
135 * @return a factory for term parsers
136 */
137 public TermParserFactory getFactory() {
138 return factory;
139 }
140 }
141
Note that the runner creates an interpreter of a different type. This interpreter does nothing except check whether the source has already been visited, in order to avoid cyclic and duplicate includes.
Example 36. InterpreterVars.java
1 package calculator.interpreter;
2
3 import net.sf.etl.parsers.SourceLocation;
4 import net.sf.etl.parsers.TextPos;
5 import net.sf.etl.parsers.beans.FieldTermParser;
6 import calculator.interpreter.ast.Node;
7
8 /**
9 * The vars-specific interpreter. The only thing that it does is ensure that
10 * files are processed only once, using the real resolved system identifier.
11 */
12 public class InterpreterVars extends Interpreter {
13 @Override
14 protected Environment interpret(FieldTermParser<Node> statements,
15 Environment env) {
16 String systemId = statements.getSystemId();
17 String visitedVar = "$interpreted " + systemId;
18 if (env.find(visitedVar) != null) {
19 return env;
20 }
21 env = new Environment(env, new Variable(visitedVar, false,
22 new SourceLocation(TextPos.START, TextPos.START, systemId), null, null));
23 return super.interpret(statements, env);
24 }
25 }
26
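The visited-source check above records each resolved system identifier in the environment chain; the same idea can be modeled more directly with a set (IncludeGuard is a simplified stand-in, not part of the tutorial's sources, which use environment variables instead):

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the "visited source" check used to avoid cyclic
// and duplicate includes: interpret a given systemId only once.
class IncludeGuard {
    private final Set<String> visited = new HashSet<>();

    // returns true only the first time a given systemId is seen
    boolean firstVisit(String systemId) {
        return visited.add(systemId);
    }
}
```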
With that runner we could have the following prelude, which uses constants from Java classes via the class JavaConstantAnnotation.
Example 37. prelude-vars.c.etl
1: doctype "calculator-vars-0_2_1.g.etl";
2: // This file contains some useful constants. This file is loaded
3: // when language level is "vars".
4:
5: /// The constant PI
6: @JavaConstant("java.lang.Math","PI")
7: let PI;
8: /// The constant E
9: @JavaConstant("java.lang.Math","E")
10: let E;
11: /// Not a number
12: @JavaConstant("java.lang.Double","NaN")
13: let NaN;
14: /// Positive infinity
15: @JavaConstant("java.lang.Double","POSITIVE_INFINITY")
16: let POSITIVE_INFINITY;
17: /// Negative infinity
18: @JavaConstant("java.lang.Double","NEGATIVE_INFINITY")
19: let NEGATIVE_INFINITY;
20: /// Maximum positive value
21: @JavaConstant("java.lang.Double","MAX_VALUE")
22: let MAX_NUMBER;
23: /// Minimum positive value
24: @Q'JavaConstant'("java.lang.Double","MIN_VALUE")
25: let q'MIN_NUMBER';
26: /// Maximum integer value
27: let MAX_INTEGER = 16#7FFF_FFFF#;
28: /// Minimum integer value
29: let MIN_INTEGER = -16#8000_0000#;
30: /// The null constant
31: let null = {};
You can also check the prelude AST.
After that we can use the constants defined in the prelude in any source, like the following:
Example 38. sample-vars.c.etl
1: doctype "calculator-vars-0_2_1.g.etl";
2:
3: print "PI ** E: ", Q'PI' ** E;
4: print "E ** PI: ", E ** PI;
5: /// The dozen constant defined in a complex way
6: let dozen = {let a = {let one = 1; one + one;}; a * a;} * 3;
7: help dozen;
8: help PI;
9: let a;
10: a = 4;
11: help a;
12: // Using keyword as a name. Quoting is only required
13: // where it could happen as a keyword, but possible everywhere.
14: let let;
15: q'let' = 2;
16: let Q'print';
17: // quoting of left hand side is required, but in expression
18: // is not since "let" is not a keyword in the expression context.
19: // the last in the sum is also "let" but with
20: // all unicode escapes used.
21: Q'print' = let + q'let' + Q'\x6c\u0065\U000074;';
22: // in the sample below quoting is not required
23: // since "print" and "let" are not keywords in this context
24: print "used keyword vars: ", let * print;
25: help let;
26: help q'print';
27: // the $runner is the system variable defined by the runner
28: help q'$runner';
29:
You can also check the sample AST.
Note that the sample uses the quoted form of names to access values; this includes places where the usage of a plain identifier would have caused a syntax error.
Now let's add logic and control flow expressions to the language.
Example 39. calculator-logic-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorLogic {
3: include "calculator-vars-0_2_1.g.etl";
4: namespace t = "http://etl.sf.net/samples/calculator";
5: namespace default l = "http://etl.sf.net/samples/calculator/logic";
6: context default Expressions {
7: import matchSelectors = Selectors;
8: import selectSelectors = Selectors;
9:
10: op composite BooleanLiteral(f) {
11: ^ t:BooleanLiteral {
12: @ value = token(true) | token(false);
13: };
14: };
15: op LessThen(xfx, 1200, <) {
16: @ first = left;
17: @ second = right;
18: };
19: op LessOrEqual(xfx, 1200, <=) {
20: @ first = left;
21: @ second = right;
22: };
23: op MoreThen(xfx, 1200, >) {
24: @ first = left;
25: @ second = right;
26: };
27: op MoreOrEqual(xfx, 1200, >=) {
28: @ first = left;
29: @ second = right;
30: };
31: op Equal(xfx, 1300, ==) {
32: @ first = left;
33: @ second = right;
34: };
35: op NotEqual(xfx, 1300, !=) {
36: @ first = left;
37: @ second = right;
38: };
39: op composite RegExLiteral(f) {
40: ^t:RegExLiteral {
41: @ value = string(prefix=R, quote='"');
42: };
43: };
44: op Matches(xfx, 1300, ==~) {
45: @ first = left;
46: @ second = right;
47: };
48: op LogicalNot(fy, 1400, !) {
49: @ value = right;
50: };
51: op ConditionalAnd(yfx, 1500, &&) {
52: @ first = left;
53: @ second = right;
54: };
55: op ConditionalOr(yfx, 1600, ||) {
56: @ first = left;
57: @ second = right;
58: };
59: op composite IfExpression(f, 1700) {
60: % if % (;
61: @ condition = expression;
62: %);
63: @ thenPart = expression(precedence = 1700);
64: % else {
65: @ elsePart = expression(precedence = 1700);
66: } ?;
67: };
68: op composite IfExpressionCStyle(xfy,1700) {
69: ^ l:IfExpression {
70: @ condition = left;
71: % ?;
72: @ thenPart = expression;
73: % :;
74: @ elsePart = right;
75: };
76: };
77: op composite Match(xf, 1700) {
78: @ value = left;
79: % match;
80: @ selectors += block(matchSelectors);
81: };
82: op composite Select(f) {
83: % select;
84: @ selectors += block(selectSelectors);
85: };
86: statement WhileStatement{
87: % while;
88: @ label = identifier?;
89: % (;
90: @ condition = expression;
91: % );
92: @ body = expression;
93: };
94: statement BreakStatement {
95: % break;
96: @ label = identifier?;
97: };
98: statement ContinueStatement {
99: % continue;
100: @ label = identifier?;
101: };
102: statement AssertStatement {
103: % assert;
104: @ condition = expression;
105: % : {
106: @ message = expression;
107: } ?;
108: };
109: };
110: context Selectors {
111: import expressions = Expressions;
112: statement CasePart {
113: @ selector = expression(expressions);
114: % =>;
115: @ value = expression(expressions);
116: };
117: statement ElsePart {
118: % else;
119: @ value = expression(expressions);
120: };
121: };
122: };
The simplest logical expression is a boolean literal, which can be
true or false; the definition
BooleanLiteral
specifies exactly that. The literal definition is followed by relational
operators. In our grammar they are non-associative, and comparison
operators are assigned a lower precedence level (1200) than equality
operators (1300), so comparisons bind more tightly.
The definition
RegExLiteral
demonstrates how to specify prefixed strings. Note that, unlike the
plain string literal, we will pass the content of the string to the
regular expression pattern as is, so backslashes do not need to be
escaped twice. The only characters that ETL requires to be escaped
in strings are the backslash character and the closing quote
character; any other validation has to be performed by the processors
of the language. Conveniently, these two characters are escaped in
the same way in regular expressions. The operator
Matches
allows matching a string against a regular expression.
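As a rough illustration (this helper class and its names are assumptions for this tutorial text, not part of the shipped sources), an interpreter for this grammar could hand the raw text between the quotes of an R"..." literal straight to java.util.regex, since the two ETL escapes already mean the same thing to the regex engine:

```java
import java.util.regex.Pattern;

public class MatchesSketch {
    // Hypothetical helper: the raw text between the quotes of R"..." is
    // passed to the regex engine untouched; the only ETL escapes (the
    // doubled backslash and the escaped closing quote) already mean the
    // same thing in java.util.regex syntax, so no translation is needed.
    public static boolean matches(String value, String rawRegExBody) {
        return Pattern.matches(rawRegExBody, value);
    }

    public static void main(String[] args) {
        // mirrors: assert "\\" ==~ R"\\";
        System.out.println(matches("\\", "\\\\"));            // true
        // mirrors: assert "\\abba\\" ==~ R".\w{4,}.";
        System.out.println(matches("\\abba\\", ".\\w{4,}.")); // true
        // mirrors: assert ! "\\a+ba\\" ==~ R".\w{4,}.";
        System.out.println(matches("\\a+ba\\", ".\\w{4,}.")); // false
    }
}
```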
The operator
IfExpression
defines a conditional expression that looks mostly like Java's if
statement. The then-part is any expression at the level of conditions
and above (for an assignment we would need parentheses), and blocks
are supported because we have the
Sequence
operator in our grammar. The else part is optional; because of this
we could not define this operator as an
fy
operator and had to emulate right-argument processing in the
definition, which is why the expression part is defined with the
same precedence as the operator itself. The next definition,
IfExpressionCStyle
, defines a C-like conditional operator. Note that it maps to the same
object as the operator
IfExpression
. It is actually recommended to use the same mapping in cases
where syntax constructs express the same concept in syntactically
different ways.
The next two operators,
Match
and
Select
, make use of a new extensibility construct. They use the syntax
expressions
block(matchSelectors)
and
block(selectSelectors)
respectively. This construct matches a block containing statements
from the referenced context. However, there is no such context in the
grammar. At the beginning of the context
Expressions
we had the definitions
import matchSelectors = Selectors;
and
import selectSelectors = Selectors;
. Such a definition specifies a local name for the context Selectors
within the current context. This mapping can be redefined during
grammar include or context include operations just like other
definitions. So if we write
import matchSelectors = MySelectors;
in a context that includes the context
Expressions
through grammar or context include, the context MySelectors will be
used for the match body. Also, because we have two independent
definitions, we are able to redefine them independently. It is
recommended to name the local name after the role that the
referred context plays in the current context. While the syntax of
the body is the same for the operators
Match
and
Select
, the semantics differ. The select operator evaluates each
selector
expression and, if it is true, evaluates and returns the corresponding
value
expression. The match operator matches the left operand against the
selector expression (so it acts like Java's switch).
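The difference can be modeled in plain Java (the class, record, and method names below are illustrative assumptions, not tutorial interpreter code): select treats each selector as a boolean guard, while match compares the matched value against each selector, and in both cases the chosen value expression is evaluated lazily.

```java
import java.util.List;
import java.util.function.Supplier;

public class SelectorSketch {
    // one "selector => value" pair; the value stays unevaluated until chosen
    record Case<T>(T selector, Supplier<Object> value) {}

    // select: each selector is a boolean guard
    public static Object select(List<Case<Boolean>> cases, Supplier<Object> elsePart) {
        for (Case<Boolean> c : cases) {
            if (c.selector()) {
                return c.value().get();
            }
        }
        return elsePart.get();
    }

    // match: the matched value is compared against each selector
    public static Object match(Object left, List<Case<Object>> cases, Supplier<Object> elsePart) {
        for (Case<Object> c : cases) {
            if (left.equals(c.selector())) {
                return c.value().get();
            }
        }
        return elsePart.get();
    }

    // a sign function expressed with select, in the spirit of the samples
    public static Object signOf(double x) {
        return select(
                List.of(new Case<>(x < 0, () -> -1),
                        new Case<>(x > 0, () -> 1)),
                () -> 0);
    }

    public static void main(String[] args) {
        System.out.println(signOf(12));   // 1
        System.out.println(signOf(-0.5)); // -1
        System.out.println(signOf(0));    // 0
    }
}
```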
The context
Selectors
used by these two operators contains the
CasePart
statement and the
ElsePart
statement. The ElsePart is supposed to appear as the last member of
the block, but ETL currently provides no means to express this
constraint, so this condition should be checked at the semantic level
of the language. This context also imports the context
Expressions
and uses the syntax operator
expression
in the form
expression(expressions)
.
The statements
WhileStatement
,
BreakStatement
, and
ContinueStatement
do just what they do in C or Java. The only difference is that the
label for the while statement goes after the keyword while.
The statement
AssertStatement
behaves mostly like Java's assert statement. It checks the condition
and throws an exception if the condition does not hold. The exception
will include the message that is specified after the
:
(colon) character.
Implementing the AST classes for this part of the tutorial is mostly trivial, so we consider only the somewhat tricky classes.
The class
WhileStatement
is actually quite simple to implement if we use lexical scopes
and exceptions to implement the break and continue statements. Note
that a variable that is addressable from expressions only in a quoted
form is created to support the loop.
Example 40. WhileStatement.java
1 package calculator.interpreter.ast.logic;
2
3 import calculator.interpreter.Cell;
4 import calculator.interpreter.Environment;
5 import calculator.interpreter.EvalUtils;
6 import calculator.interpreter.Variable;
7 import calculator.interpreter.ast.Expression;
8 import calculator.interpreter.ast.Statement;
9
10 /**
11 * The while loop statement
12 */
13 public class WhileStatement extends Statement {
14 /** The label for the statement (might be null) */
15 public String label;
16 /** The condition for the statement */
17 public Expression condition;
18 /** The body of the statement */
19 public Expression body;
20
21 @Override
22 public Environment eval(Environment env, Cell result) {
23 final Object token = new Object();
24 Variable unnamedVar = new Variable(tokenName(null), true, location, null,
25 null);
26 unnamedVar.bind(location, token);
27 Environment bodyContext = new Environment(env, unnamedVar);
28 Variable namedVar;
29 if (label != null) {
30 namedVar = new Variable(tokenName(label), true, location, null, null);
31 namedVar.bind(location, token);
32 bodyContext = new Environment(bodyContext, namedVar);
33 } else {
34 namedVar = null;
35 }
36 try {
37 while (EvalUtils.toBoolean(this, env, condition)) {
38 try {
39 body.eval(bodyContext);
40 } catch (LoopGotoException ex) {
41 if (ex.token != token) {
42 throw ex;
43 }
44 if (ex.isBreak) {
45 break;
46 }
47 }
48 }
49 } finally {
50 unnamedVar.bind(location, null);
51 if (namedVar != null) {
52 namedVar.bind(location, null);
53 }
54 }
55 return env;
56 }
57
58 /**
59 * Generate token name for loop
60 *
61 * @param label the loop label to use (or null for token without label)
62 * @return the token name
63 */
64 public static String tokenName(String label) {
65 return "$loop token " + (label == null ? "" : label);
66 }
67
68 /**
69 * The exception that is used to break or continue the loop
70 */
71 @SuppressWarnings("serial")
72 public static class LoopGotoException extends RuntimeException {
73 /** the token for exception */
74 public final Object token;
75 /** if true, break requested, otherwise continue */
76 public final boolean isBreak;
77
78 /**
79 * The constructor
80 *
81 * @param token the token
82 * @param isBreak the break/continue flag
83 */
84 public LoopGotoException(Object token, boolean isBreak) {
85 super();
86 this.token = token;
87 this.isBreak = isBreak;
88 }
89 }
90 }
91
After this, the break and continue statements can be implemented on top of a common base class.
Example 41. LoopGotoStatement.java
1 package calculator.interpreter.ast.logic;
2
3 import calculator.interpreter.Cell;
4 import calculator.interpreter.Environment;
5 import calculator.interpreter.EvalException;
6 import calculator.interpreter.Variable;
7 import calculator.interpreter.ast.Statement;
8
9 /**
10 * The loop goto statement
11 */
12 public abstract class LoopGotoStatement extends Statement {
13 /** the label of loop (may be null) */
14 public String label;
15
16 @Override
17 public Environment eval(Environment env, Cell result) {
18 Variable var = env.find(WhileStatement.tokenName(label));
19 if (var == null) {
20 throw new EvalException(this, "The break and continue statements "
21 + "must happen only in the lexical scope of the loop.");
22 }
23 if (var.value == null) {
24 throw new EvalException(this, "The break and continue statements "
25 + "must happen only while the loop is active.");
26 }
27 throw new WhileStatement.LoopGotoException(var.value, isBreak());
28 }
29
30 /**
31 * @return true if it is break statement, false if continue
32 */
33 protected abstract boolean isBreak();
34 }
35
The
AssertStatement
is a bit more interesting from the implementation point of view. Its
logic is quite trivial, but for an assert statement it is useful to
print the asserted text, and at runtime we already have the AST
rather than the text. However, if an object implements the
TokenCollector
interface, the
FieldTermParser
will notify the object about every token that occurs in its scope.
We can use this functionality to accumulate the text of the entire
assert statement and then select the part relevant to the
condition using positional information from the statement and
the condition.
Example 42. AssertStatement.java
1 package calculator.interpreter.ast.logic;
2
3 import net.sf.etl.parsers.TermToken;
4 import net.sf.etl.parsers.beans.TokenCollector;
5 import calculator.interpreter.Cell;
6 import calculator.interpreter.Environment;
7 import calculator.interpreter.EvalException;
8 import calculator.interpreter.ast.Expression;
9 import calculator.interpreter.ast.Statement;
10
11 /**
12 * The assert statement
13 */
14 public class AssertStatement extends Statement implements TokenCollector {
15 /** The condition to assert */
16 public Expression condition;
17 /** The message to use in case of assertion failure */
18 public Expression message;
19 /** The text of the assert statement */
20 private StringBuilder textBuilder = new StringBuilder();
21 /** The text of the condition */
22 private String conditionText;
23
24 @Override
25 public Environment eval(Environment env, Cell result) {
26 Object c = condition.eval(env);
27 if (!(c instanceof Boolean)) {
28 throw new EvalException(this, "The condition " + conditionText() + " at "
29 + condition.location.toShortString()
30 + " did not evaluated to the boolean: " + c);
31 }
32 if (!(Boolean) c) {
33 throw new EvalException(this, "The assertion " + conditionText()
34 + " failed" + (message == null ? "." : ": " + message.eval(env)));
35 }
36 return env;
37 }
38
39 /**
40 * @return the text of condition
41 */
42 private String conditionText() {
43 if (conditionText == null) {
44 int start = (int) (condition.location.start().offset() - location.start()
45 .offset());
46 int end = (int) (condition.location.end().offset() - location.start()
47 .offset());
48 conditionText = textBuilder.substring(start, end);
49 textBuilder = null;
50 }
51 return conditionText;
52 }
53
54 /** {@inheritDoc} */
55 public void collect(TermToken token) {
56 if (token.hasLexicalToken()) {
57 textBuilder.append(token.token().token().text());
58 }
59 }
60 }
61
We can use the same runner as for the "vars" language level, just with the logic language level specified.
The logic prelude simply includes the vars prelude.
Example 43. prelude-logic.c.etl
1: doctype "calculator-logic-0_2_1.g.etl";
2: include "prelude-vars.c.etl";
And the sample that conforms to this grammar and prelude would be:
Example 44. sample-logic.c.etl
1: doctype "calculator-logic-0_2_1.g.etl";
2:
3: let dozen1 = {
4: var rc = 0;
5: var n = 12;
6: while outer (true) {
7: while(true) {
8: if(n < 1) {
9: break outer;
10: } else {
11: n = n - 1;
12: rc = rc + 1;
13: continue outer;
14: };
15: };
16: };
17: rc;
18: };
19: assert dozen1 == 12;
20: let dozen2 = {
21: var a = 420;
22: var b = 96;
23: while(a != 0) {
24: if(a < b) {
25: let t = a;
26: a = b;
27: b = t;
28: };
29: a = a - b;
30: };
31: b;
32: };
33:
34: assert dozen2 == 12;
35: assert "\\" ==~ R"\\";
36: assert "\\abba\\" ==~ R".\w{4,}.";
37: assert ! "\\a+ba\\" ==~ R".\w{4,}.";
38: assert "1111" == "\x31\u0031\U31;\U0000031;";
39: assert (if(1==2) 11) == null;
40: assert (if(false) 11 else 11 + 11) == 22;
41: assert (if(true) 11 else 11 + 11) == 11;
42:
You could also check the sample AST.
To have more fun with calculations in our calculator, we need to
support functions; otherwise
sin
,
cos
, and others would have to be introduced in some very ad hoc manner.
To simplify the explanation, let's split our extension of the grammar
into two parts. The first part contains only the core constructs.
Example 45. calculator-lambda-basic-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar abstract calculator.CalculatorLambdaBasic {
3: include "calculator-logic-0_2_1.g.etl";
4: namespace v = "http://etl.sf.net/samples/calculator/vars";
5: namespace default l = "http://etl.sf.net/samples/calculator/lambda";
6: context abstract LetRecDefs {
7: def ExpressionDef {
8: // redefine it to refer to the correct expression
9: expression(Expressions);
10: };
11: def ParameterList {
12: @ parameters += list , {
13: ^l:Parameter {
14: modifiers {
15: @ type = modifier var;
16: };
17: @ name = ref(NameDef);
18: };
19: } ?;
20: };
21: def LetRecAssignDef {
22: ^v:LetRecPart {
23: @ name = ref(NameDef);
24: % =;
25: @ value = ref(ExpressionDef);
26: };
27: };
28: };
29: context default Expressions {
30: include LetRecDefs;
31: import letRecContent = LetRecContent;
32:
33: op composite Lambda(fy, 1950) {
34: % \;
35: ref(ParameterList) | %( { ref(ParameterList); } % );
36: % =>;
37: @ value = right;
38: };
39: op composite Apply(yf, 100) {
40: @ function = left;
41: % (;
42: @ arguments += list , {
43: expression;
44: } ?;
45: % );
46: };
47: statement LetRec {
48: ^ v:LetRec {
49: % letrec;
50: @ definitions += ref(LetRecAssignDef) | {
51: block(letRecContent);
52: };
53: };
54: };
55: };
56: context LetRecContent {
57: include LetRecDefs;
58: include MetaInformation;
59: import expressions = Expressions;
60:
61: def ExpressionDef {
62: expression(expressions);
63: };
64: statement LetPart {
65: ref(LetRecAssignDef);
66: };
67: };
68: };
Note that this grammar has the "abstract" modifier. This means that
the grammar cannot be used directly; it exists only to be included
into some other grammar. Abstract grammars are a nice way of
providing a set of reusable definitions or of refactoring grammars
into simpler parts (and we use them here for the latter purpose).
Also note that we use the same namespace prefix
l
as we used for the logic grammar definitions, but map it to a
different namespace. Since namespace declarations are local to the
grammar, this in no way affects definitions included from the logic
grammar. Still, it is recommended to use different prefixes for
different namespaces in groups of related grammars, to avoid
confusion when a human reads the grammar.
The primary element introduced in this grammar is the lambda expression
defined by the operator
Lambda
. We have placed lambda expressions between the conditional operator
and the assignment operator. So it will not be possible to perform an
assignment in the value part of a lambda expression without
parentheses, but it will be possible to assign a lambda expression to
a variable without parentheses. The list of parameters can optionally
be enclosed in parentheses. The "var" modifier on a parameter of
the lambda abstraction specifies that the parameter is
mutable in the body of the lambda abstraction, which comes in handy
in some situations.
The operator
Apply
allows invoking a lambda abstraction with zero or more arguments. Note
that in our language we will support currying and uncurrying. So
the invocation of the two-argument function
max()(1)(2)
will have the same semantics as
max(1,2)
(but not the same as
max(1)(2)()
, as the latter will try to use 2 as a function and thus will fail).
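The application semantics just described can be sketched in Java (the Fn interface, the apply helper, and MAX are illustrative assumptions, not the tutorial's interpreter code): arguments are fed to a curried function one at a time, and a zero-argument application of a function simply returns the function, while applying a non-function fails.

```java
import java.util.List;

public class CurrySketch {
    // a one-argument function value in the sketched object model
    public interface Fn { Object apply1(Object arg); }

    public static Object apply(Object fn, List<Object> args) {
        if (!(fn instanceof Fn)) {
            // max(1)(2)() fails here: 2 is not a function
            throw new IllegalStateException("not a function: " + fn);
        }
        Object current = fn;
        for (Object a : args) {
            if (!(current instanceof Fn f)) {
                throw new IllegalStateException("not a function: " + current);
            }
            current = f.apply1(a); // feed arguments one at a time (currying)
        }
        return current; // zero arguments: the function comes back unchanged
    }

    // a curried two-argument max over integers
    public static final Fn MAX = x -> (Fn) (y ->
            ((Integer) x) < ((Integer) y) ? y : x);

    public static void main(String[] args) {
        // max()(1)(2) and max(1,2) produce the same result
        Object a = apply(apply(apply(MAX, List.of()), List.of(1)), List.of(2));
        Object b = apply(MAX, List.of(1, 2));
        System.out.println(a + " " + b); // 2 2
    }
}
```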
Along with lambda expressions, it is useful to introduce a recursive form of the let construct. Strictly speaking it is not required, since we could already define an unbound let and bind it later. But having such a statement expresses the intention more clearly and allows for more convenient syntax. The letrec construct will have two basic forms:
// single var form
letrec name = {the expression that uses name};
// multiple var form
letrec {
  name1 = {the expression that uses name1 ... nameN};
  name2 = {the expression that uses name1 ... nameN};
  ...
  nameN = {the expression that uses name1 ... nameN};
};
The first form is simply a shortcut for the one-element second form. The second form is itself equivalent to the following:
let name1;
let name2;
...
let nameN;
name1 = {the expression that uses name1 ... nameN};
name2 = {the expression that uses name1 ... nameN};
...
nameN = {the expression that uses name1 ... nameN};
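Why the desugaring works can be seen in plain Java (this sketch is an illustration, not tutorial code): declare the mutable bindings first, then assign closures that read the bindings at call time, so every definition can see all the others. Here one-element arrays play the role of the calculator's variable cells, and the standard mutually recursive odd/even pair is the example:

```java
import java.util.function.IntPredicate;

public class LetRecSketch {
    // returns { isOdd(n), isEven(n) } computed via the letrec desugaring
    public static boolean[] check(int n) {
        IntPredicate[] isOdd = new IntPredicate[1];   // let is_odd;
        IntPredicate[] isEven = new IntPredicate[1];  // let is_even;
        // the closures capture the cells, not the (still null) values
        isOdd[0] = x -> x != 0 && isEven[0].test(x - 1);  // is_odd = {...};
        isEven[0] = x -> x == 0 || isOdd[0].test(x - 1);  // is_even = {...};
        return new boolean[] { isOdd[0].test(n), isEven[0].test(n) };
    }

    public static void main(String[] args) {
        boolean[] r = check(7);
        System.out.println(r[0] + " " + r[1]); // true false
    }
}
```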
We define the reusable parts for letrec and lambdas in the
context
LetRecDefs
. Note that because context include happens after grammar include,
we do not need to redefine the fragment
ExpressionDef
in the expression context again. The definition from the logic
grammar will be included into the context
Expressions
and will override the definition that would otherwise come from the
context
LetRecDefs
.
The context
LetRecContent
simply reuses definitions from
LetRecDefs
and converts them to statements. But there is an interesting
extensibility aspect here. We include the contexts
LetRecDefs
and
MetaInformation
(defined in the logic grammar). Both contexts contain the
fragment
ExpressionDef
. The definitions are the same, and the definition from either
context would have worked for us. But since these contexts are
independent, there is a definition conflict, and we have to resolve
it by creating a definition with the same name in the
LetRecContent
context.
Let's now add new constructs that extend the grammar with other forms of function invocation, allowing new control constructs to be defined as is done in Scala or Groovy (or in plenty of other programming languages that support lambda abstractions).
Example 46. calculator-lambda-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorLambda {
3: include "calculator-lambda-basic-0_2_1.g.etl";
4: namespace t = "http://etl.sf.net/samples/calculator/vars";
5: namespace v = "http://etl.sf.net/samples/calculator/vars";
6: namespace default l = "http://etl.sf.net/samples/calculator/lambda";
7: context abstract LetRecDefs {
8: def LetRecFunctionDef {
9: ^ v:LetRecPart {
10: % to;
11: @ name = ref(NameDef);
12: @ value = ^l:Lambda {
13: % (;
14: ref(ParameterList);
15: % ) % =>;
16: @ value = ref(ExpressionDef);
17: };
18: };
19: };
20: };
21: context default Expressions {
22: include LetRecDefs;
23: import letRecContent = LetRecContent;
24:
25: op composite LambdaNoArgs(fy, 1950) {
26: ^ l:Lambda {
27: % \>;
28: @ value = right;
29: };
30: };
31: def SequenceDef {
32: ^ l:LambdaSequence {
33: @statements += block;
34: };
35: };
36: statement BlockParameters {
37: % ?;
38: ref(ParameterList);
39: };
40: statement Function {
41: ^v:LetRec {
42: @ definitions += ref(LetRecFunctionDef);
43: };
44: };
45: op composite InfixOperatorAppy(yfx, 1150) {
46: ^ l:Apply {
47: @ arguments += left;
48: % #;
49: @ function = expression(precedence = 99);
50: @ arguments += right;
51: };
52: };
53: op composite PrefixOperatorAppy(fy, 1150) {
54: ^ l:Apply {
55: % #;
56: @ function = expression(precedence = 99);
57: @ arguments += right;
58: };
59: };
60: op composite ApplyBlock(xf, 1175) {
61: ^ l:Apply {
62: @ function = left;
63: @ arguments += ref(SequenceDef);
64: };
65: };
66: op composite Lazy(f, 1950) {
67: ^ l:Apply {
68: @ function = token(&) wrapper t:Name.literal;
69: @ arguments += ^ l:Lambda {
70: @value = expression(precedence=1949);
71: };
72: };
73: };
74: op Dereference(fy, 100, *) {
75: @ value = right;
76: };
77: op composite NotEqualDeref(xfx, 1300) {
78: ^ l:Apply {
79: @ arguments += left;
80: @ function = token(!*=) wrapper t:Name.literal;
81: @ arguments += right;
82: };
83: };
84: op composite EqualDeref(xfx, 1300) {
85: ^ l:Apply {
86: @ arguments += left;
87: @ function = token(=*=) wrapper t:Name.literal;
88: @ arguments += right;
89: };
90: };
91: };
92: context LetRecContent {
93: statement FunctionPart {
94: ref(LetRecFunctionDef);
95: };
96: };
97: };
The first thing that we introduce here is a short form for a lambda
abstraction without arguments, the operator
LambdaNoArgs
. The definition just maps to the same object as the operator
Lambda
from the included grammar.
Another thing that we introduce is the block form of the lambda
abstraction; for it, we indirectly redefine the operator
Sequence
. It closely resembles what has been proposed for Java closures,
except that we prefix the parameters with a question mark and put
them in a separate statement, rather than detecting the keyword
=>
after the parameter text, in order to keep the grammar deterministic.
While we cannot use the ETL grammar to require that the statement
BlockParameters
appears only at the beginning of the block, we will check this during
evaluation. If there is no
BlockParameters
statement at the beginning of the block, we revert to the old
behavior. Since this behavior differs from that of the old
Sequence operator, we redefine the Sequence operator to map to a
different object than in the vars grammar. To do this, we just need to
redefine the fragment SequenceDef, and the new lambda blocks will be
supported everywhere we have used it.
Another thing we introduce in this grammar is syntactic
sugar for defining functions. The contexts
LetRecDefs
and
LetRecContent
each contain a new member to support this update. We also use a new
definition to define the statement
Function
in the context
Expressions
. Note that we do not introduce new AST objects here; these
constructs are mapped to the existing
LetRec
and
Lambda
objects. Such an approach is supported by ETL, but it should be used
only when the new construct is completely logically equivalent to some
existing construct. With the definition as provided in the
tutorial, it is not possible to distinguish between the two forms by
examining the AST. If we wanted to perform some special
documentation or annotation processing related to the function
definition, we would have problems implementing it.
We also introduce syntactic sugar for applying a function to one or
two arguments in infix form (the operator
InfixOperatorAppy
) and prefix form (the operator
PrefixOperatorAppy
). This approach is less flexible than in Scala, as we have to
prefix our operator with the symbol "#", but it still works. We have
placed these operators between the arithmetic and relational
operators, so a logical expression has to be put in parentheses when
used with these operators. Note that we can specify not only function
names, but any functional expression of precedence less than 100 (this
allows expressions in parentheses, lambda blocks, and names). This
expression is evaluated before its arguments, so lexical evaluation
order is not kept here. The final piece of application sugar is the
block application operator
ApplyBlock
; it has a greater precedence than the infix and prefix apply forms.
Another thing in our grammar is support for lazy
values. To support them we introduce the operator
Lazy
and the operator
Dereference
. The dereference operator invokes its argument if it is a zero-argument
function (and then reapplies itself to the result);
otherwise it just returns the argument. So the dereference operator
can resolve lazy values even when they resolve to other lazy values.
From the logical point of view, lazy function creates a zero argument function from another zero-argument function that invokes the function passed as argument only once and remembers the result. And with exclusion of exception handling (that we do not have in our language), we could implement support for lazy values in the language like the following:
let lazy = {? var f;
  assert f != null;
  var value;
  {?;
    if(f != null) {
      value = f();
      f = null;
    };
    value;
  };
};
val lazySum = #lazy {?; 1 + *lazy(\> 2 + 3);};
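The same lazy/dereference pair can be sketched in Java (an illustration with assumed names, not tutorial code): lazy wraps a computation in a memoizing zero-argument function, and deref keeps invoking zero-argument functions until it reaches a plain value, so lazy-of-lazy also resolves.

```java
import java.util.function.Supplier;

public class LazySketch {
    // wrap f in a supplier that invokes it at most once and caches the result
    public static Supplier<Object> lazy(Supplier<Object> f) {
        return new Supplier<Object>() {
            Supplier<Object> pending = f; // dropped after the first call
            Object value;
            public Object get() {
                if (pending != null) {
                    value = pending.get();
                    pending = null;
                }
                return value;
            }
        };
    }

    // keep invoking until a non-function value is reached
    public static Object deref(Object v) {
        while (v instanceof Supplier<?> s) {
            v = s.get();
        }
        return v;
    }

    public static void main(String[] args) {
        Object nested = lazy(() -> lazy(() -> 2 + 3 + 7));
        System.out.println(deref(nested)); // 12
    }
}
```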
However, it is somewhat inconvenient to write a lambda abstraction
every time we want to create a lazy value. So we defined the operator
lazy as mapping to a function defined in the prelude, but wrapped its
right argument into a lambda abstraction. This trick only works with
the right argument of the operator. Also note that we use
&
as the function name in our grammar. This demonstrates another trick:
our name has quoted and literal forms. We use the literal form to refer
to the function in the operator definition. In the function
definition we could not use the ampersand as a function name directly,
but we could use it in a quoted form. We also use the quoted
form to refer to the lazy function directly, without automatically
wrapping the right argument into a lambda abstraction.
With these syntax extensions we can define new control constructs, as can be seen in the prelude and sample sources.
The final two operators demonstrate a simpler case of mapping
operators to functions. The operators
NotEqualDeref
and
EqualDeref
define dereferencing versions of the equality operators. But instead
of implementing new node types in Java code, we just define them as a
mapping to a function invocation with the operator text as the name.
We then define the corresponding functions at the beginning of the
prelude. This is how operators like plus could have
been defined if we had started with lambda abstractions
[6]
. In a similar way, users could also extend our lambda grammar
and introduce their own simple operators mapped to functions, without
adding new AST node types. Using the lambda abstraction trick that
we used for lazy values, we could do quite funny things in the
language itself.
Example 47. prelude-lambda.c.etl
1: doctype "calculator-lambda-0_2_1.g.etl";
2: include "prelude-logic.c.etl";
3:
4: /// The construct lazy value from the lambda abstraction
5: ///
6: /// @param f the function
7: /// @return the function that remembers the result of invoking
8: /// the function without arguments
9: let q'&' = {? var f;
10: assert f != null;
11: var value;
12: {?;
13: if(f != null) {
14: value = f();
15: f = null;
16: };
17: value;
18: };
19: };
20: /// The implementation of NotEqualDeref operator
21: to q'!*='(x, y) => * x != * y;
22: /// The implementation of EqualDeref operator
23: to q'=*='(x, y) => * x == * y;
24:
25: /// The maximum value (works for comparable objects)
26: to max(x, y) => x < y ? y : x;
27: /// The minimum value (works for comparable objects)
28: to min(x, y) => x < y ? x : y;
29:
30: /// Ensure that the number is in the range of integer values.
31: /// @param x the number to check
32: /// @return itself (to allow variable arguments)
33: to assertIntRange(x) => {
34: assert MIN_INTEGER <= x && x <= MAX_INTEGER : "The value is "+x;
35: assertIntRange;
36: };
37:
38: /// Ensure that the number is an integer (there is no fractional part).
39: /// @param x the number to check
40: /// @return itself (to allow variable arguments)
41: to assertInt(x) => {
42: assertIntRange(x);
43: assert x % 1 == 0 : "The value is "+x;
44: assertInt;
45: };
46:
47: /// Ensure that the number is a non-negative integer.
48: /// @param x the number to check
49: /// @return itself (to allow variable arguments)
50: to assertNonNegativeInt(x) => {
51: assertInt(x);
52: assert x >= 0 : "The value is "+x;
53: assertNonNegativeInt;
54: };
55:
56: /// The factorial value
57: ///
58: /// @param n the non-negative integer value
59: /// @return the factorial result
60: let factorial = {?n;
61: assertNonNegativeInt(n);
62: to fac(x) => x match { 0 => 1; else x * fac(x-1); };
63: fac(n);
64: };
65:
66: letrec {
67: /// Odd number check
68: ///
69: /// @param x a non-negative integer number
70: /// @return true for odd numbers, false for even ones
71: is_odd = {?x; assertNonNegativeInt(x); x != 0 && is_even(x-1);};
72: /// Even number check
73: ///
74: /// @param x a non-negative integer number
75: /// @return false for odd numbers, true for even ones
76: to is_even(x) => {assertNonNegativeInt(x); x == 0 || is_odd(x-1);};
77: };
78:
79: /// Repeats execution of the one-argument function from low to high,
80: /// adding +1 on each step until the resulting value is no longer lower
81: /// than high. The example below will print the numbers 1.5, 2.5, 3.5, and 4.5.
82: /// {@example
83: /// 1.5 #upto 5 {?i;
84: /// print i;
85: /// }; }
86: ///
87: /// If the low value is not less than the high value, the cycle is never executed.
88: ///
89: /// @param low the start value for the loop (must be in integer range)
90: /// @param high the limit value for the loop (must be in integer range)
91: /// @param f the closure to use for the loop.
92: /// If it returns false, the cycle is aborted,
93: /// all other values are ignored.
94: /// @return null value
95: to upto(var low, high, f) => {
96: assertIntRange(low, high);
97: while(low<high && f(low) != false) {
98: low = low + 1;
99: };
100: };
101:
102: /// Repeats execution of the one-argument function from start to limit,
103: /// adding step on each step until the resulting value passes
104: /// limit (the direction depends on step's sign).
105: /// {@example
106: /// for(5,1,-0.125) {?i;
107: /// print i;
108: /// }; }
109: ///
110: /// If the start value has already passed the limit in the step's direction, the cycle is never executed.
111: ///
112: /// @param start the start value for the loop (start/step must be in integer range)
113: /// @param limit the limit value for the loop (limit/step must be in integer range)
114: /// @param step the loop step (must not be 0)
115: /// @param f the closure to use for the loop.
116: /// If it returns false, the cycle is aborted,
117: /// all other values are ignored.
118: /// @return null value
119: to for(var start, limit, step, f) => {
120: assert step != 0;
121: assertIntRange(start/step, start/step);
122: while((step > 0 ? start<limit : start>limit) && f(start) != false) {
123: start = start + step;
124: };
125: };
126:
127: /// Convert Java number to integer
128: @JavaMethod("java.lang.Number", "intValue")
129: let toInt;
130:
131: /// Convert a value to string
132: @JavaMethod("java.lang.Object", "toString")
133: let str;
134:
135: @JavaMethod("java.lang.Math", "abs", "double")
136: let abs;
137:
138: @JavaMethod("java.lang.Math", "sin", "double")
139: let sin;
140:
141: @JavaMethod("java.lang.Math", "cos", "double")
142: let cos;
143:
144: /// Sign of the number
145: ///
146: /// @param x the floating point number to check
147: /// @return -1, 0, or 1 depending on the sign of the number
148: let sign = \x => select { x < 0 => -1; x > 0 => 1; else 0; };
You could also check the prelude AST.
Example 48. sample-lambda.c.etl
1: doctype "calculator-lambda-0_2_1.g.etl";
2:
3: /// The special marker value used to stop computation in sum function
4: let STOP = \x => x;
5: /// Sum arguments up to stop marker like {@code sum(0,1,2,3,STOP)}
6: ///
7: /// @return the function that accumulates results.
8: let sum = {?;
9: var sum = 0;
10: to acc(x) => if( x == STOP) {sum;} else {sum = sum + x; acc;};
11: };
12: /// Greatest common divisor
13: ///
14: /// @param a the positive integer value
15: /// @param b the positive integer value
16: /// @return the greatest common divisor
17: let gcd = {? var q'a', var b;
18: assertInt(a,b);
19: assert a > 0 && b > 0;
20: if(a < b) {
21: let t = a;
22: a = q'b';
23: b = t;
24: };
25: while(b != 0) {
26: a = a % b;
27: let t = a;
28: a = b;
29: b = t;
30: };
31: a;
32: };
33:
34:
35: assert 11 !*= {?; 12;};
36: assert (\> 12) =*= (& 12);
37: assert 12 == * (\()=> \> & max(10)(12));
38: assert 12 == * (\()=> \> #q'&' {?;1+10+1;});
39: assert 12 == #abs 1-13;
40: assert 12 == sum(0,1,2,3,3,2,1,0,STOP);
41: assert 12 == factorial(4) / 2;
42: assert 12 == 420 #gcd 140 + 4;
43: assert 12 == {
44: var s = 0;
45: 0 #upto 4 {?i;
46: s = s + i;
47: };
48: s * 2;
49: };
50: assert 12 == {
51: var s = 0;
52: var n = 0;
53: for(14, 9.9, -0.25) {?i;
54: s = s + i;
55: n = n + 1;
56: };
57: s / n;
58: };
59: assert 1 == #sign 12;
60: assert -1 == #sign -MIN_NUMBER;
61: assert 0 == sign(12-12);
62: assert 12 ==~ {?n; sign(n) == 1;};
63: assert ! 12 ==~ (\n => sign(n) == -1);
You could also check the sample AST.
The core class behind the syntax elements provided by this grammar is the class Function.
Example 49. Function.java
1 package calculator.interpreter;
2
3 import java.util.ArrayList;
4 import java.util.Collections;
5 import java.util.List;
6 import net.sf.etl.parsers.SourceLocation;
7
8 /**
9 * The function abstract class
10 */
11 public abstract class Function implements Matcher {
12 /**
13 * @return amount of arguments for this function (the function should not
14 * change its arity during the lifetime of this object)
15 */
16 public abstract int arity();
17
18 /**
19 * Apply the function, the amount of arguments must match the arity. This
20 * method should not be invoked directly, use
21 * {@link Function#apply(SourceLocation, Object, Object[])} instead.
22 *
23 * @param location the location that requested invocation
24 * @param args the function arguments, the implementers must not change this
25 * list.
26 * @return a result of function invocation
27 */
28 public abstract Object apply(SourceLocation location, List<Object> args);
29
30 /** {@inheritDoc} */
31 public boolean match(SourceLocation location, Object value) {
32 Object r = Function.apply(location, this, Collections.singletonList(value));
33 return r instanceof Boolean && ((Boolean) r).booleanValue();
34 }
35
36 /**
37 * Apply the function, if the amount of arguments is insufficient, return the
38 * curried function, otherwise return the invocation result. If there are too
39 * much of the arguments, it first applies function with available arguments
40 * and if the result is a function, it tries to apply it with rest of the
41 * arguments.
42 *
43 * @param location the location of the node that requested evaluation
44 * @param function the function to invoke (checked to be instance of
45 * {@link Function})
46 * @param args the function arguments
47 * @return the invocation result
48 * @throws EvalException if there are too much of the arguments or if the
49 * argument {@code f} is not the function.
50 */
51 public static Object apply(SourceLocation location, Object function,
52 List<Object> args) {
53 int n = args.size();
54 int i = 0;
55 boolean firstApply = true;
56 while ((i < n || n == 0 && firstApply) && function instanceof Function) {
57 if (firstApply) {
58 firstApply = false;
59 }
60 Function f = (Function) function;
61 int a = f.arity();
62 assert a >= 0;
63 if (i + a > n) {
64 function = new Function.CurriedFunction(f, args.subList(i, n));
65 } else {
66 function = f.apply(location, args.subList(i, i + a));
67 }
68 i += a;
69 }
70 if (i < n) {
71 throw new EvalException(location, "The function required since "
72 + (n - i) + " arguments are remaining "
73 + "and found the value or the class "
74 + (function == null ? null : function.getClass().getName()));
75 }
76 return function;
77 }
78
79 /**
80 * The curried function
81 */
82 static class CurriedFunction extends Function {
83 /** the wrapped function */
84 final Function wrapped;
85 /** the fixed arguments of the function */
86 final List<Object> fixedArguments;
87
88 /**
89 * The constructor for the curried function
90 *
91 * @param wrapped function
92 * @param fixedArguments fixed arguments
93 */
94 public CurriedFunction(Function wrapped, List<Object> fixedArguments) {
95 super();
96 this.wrapped = wrapped;
97 this.fixedArguments = new ArrayList<Object>(fixedArguments);
98 }
99
100 /** {@inheritDoc} */
101 public Object apply(SourceLocation location, List<Object> args) {
102 ArrayList<Object> a = new ArrayList<Object>(fixedArguments.size()
103 + args.size());
104 a.addAll(fixedArguments);
105 a.addAll(args);
106 return apply(location, wrapped, a);
107 }
108
109 /** {@inheritDoc} */
110 public int arity() {
111 return wrapped.arity() - fixedArguments.size();
112 }
113 }
114 }
115
This is the abstract base class for all functions; it supports currying and uncurrying during function invocation. The class is terribly inefficient, but it does its job for the tutorial.
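The curry/uncurry behaviour of Function.apply can be illustrated with a self-contained sketch. The Fn interface and all names below are stand-ins invented for illustration, not the tutorial's classes; the loop mirrors the logic above without the location tracking:

```java
import java.util.ArrayList;
import java.util.List;

public class CurrySketch {
    /** A function that knows its own argument count. */
    interface Fn {
        int arity();
        Object apply(List<Object> args);
    }

    /** Two-argument integer addition, used for the demonstration. */
    static final Fn ADD = new Fn() {
        public int arity() { return 2; }
        public Object apply(List<Object> a) {
            return (Integer) a.get(0) + (Integer) a.get(1);
        }
    };

    /**
     * Apply 'function' to 'args'. Too few arguments: return a curried
     * function that waits for the rest. Too many: apply with what the
     * function takes, then feed the surplus to the result.
     */
    static Object apply(Object function, List<Object> args) {
        int n = args.size();
        int i = 0;
        boolean firstApply = true;
        while ((i < n || n == 0 && firstApply) && function instanceof Fn) {
            firstApply = false;
            final Fn f = (Fn) function;
            final int a = f.arity();
            if (i + a > n) {
                // not enough arguments: remember the ones we have
                final List<Object> fixed = new ArrayList<>(args.subList(i, n));
                function = new Fn() {
                    public int arity() { return f.arity() - fixed.size(); }
                    public Object apply(List<Object> rest) {
                        List<Object> all = new ArrayList<>(fixed);
                        all.addAll(rest);
                        return CurrySketch.apply(f, all);
                    }
                };
            } else {
                function = f.apply(args.subList(i, i + a));
            }
            i += a;
        }
        if (i < n) {
            throw new IllegalStateException((n - i) + " surplus arguments");
        }
        return function;
    }

    public static void main(String[] args) {
        System.out.println(apply(ADD, List.of(1, 2)));   // 3
        Object inc = apply(ADD, List.of(1));             // curried: one argument left
        System.out.println(apply(inc, List.of(41)));     // 42
    }
}
```

Applying ADD to a single argument yields a curried function of arity 1; applying that to the remaining argument completes the call, just as CurriedFunction does above.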
To interface with Java, we will also support the annotation JavaMethodAnnotation, which binds a variable to a function that uses reflection to invoke a static or instance Java method:
Example 50. JavaMethodAnnotation.java
1 package calculator.interpreter.annotations;
2
3 import java.lang.reflect.Method;
4 import java.lang.reflect.Modifier;
5 import java.util.List;
6 import net.sf.etl.parsers.SourceLocation;
7 import calculator.interpreter.AnnotationValue;
8 import calculator.interpreter.Environment;
9 import calculator.interpreter.EvalException;
10 import calculator.interpreter.EvalUtils;
11 import calculator.interpreter.Function;
12 import calculator.interpreter.Variable;
13
14 /**
15 * The java method annotation
16 */
17 public class JavaMethodAnnotation extends JavaReflectionProcessor {
18 /** {@inheritDoc} */
19 public void process(AnnotationValue annotation, Environment env, Variable var) {
20 if (annotation.arguments.size() < 2) {
21 throw new EvalException(annotation.location,
22 "Wrong number of the arguments for annotation expected at least 2 and found "
23 + annotation.arguments);
24 }
25 Class<?> c = getClass(annotation, 0);
26 String methodName = EvalUtils.expectString(annotation.location,
27 "the field name (the argument " + 1 + ")", annotation.arguments.get(1));
28 Class<?>[] parameterTypes = new Class[annotation.arguments.size() - 2];
29 for (int i = 0; i < annotation.arguments.size() - 2; i++) {
30 parameterTypes[i] = getClass(annotation, i + 2);
31 }
32 try {
33 Method method = c.getMethod(methodName, parameterTypes);
34 var.bind(annotation.location, new MethodFunction(method));
35 } catch (Exception e) {
36 throw new EvalException(annotation.location, "The method " + methodName
37 + " cannot be located: " + e);
38 }
39 }
40
41 /**
42 * The method based function
43 */
44 private static class MethodFunction extends Function {
45 /** the method to invoke */
46 private final Method method;
47 /** the arity of the method */
48 private final int arity;
49 /** the static modifier */
50 private final boolean isStatic;
51
52 /**
53 * The constructor
54 *
55 * @param method the wrapped method
56 */
57 public MethodFunction(Method method) {
58 this.method = method;
59 isStatic = Modifier.isStatic(method.getModifiers());
60 arity = method.getParameterTypes().length + (isStatic ? 0 : 1);
61 }
62
63 @Override
64 public Object apply(SourceLocation location, List<Object> args) {
65 assert args.size() == arity;
66 Object[] a = (isStatic ? args : args.subList(1, arity)).toArray();
67 Object t = isStatic ? null : args.get(0);
68 try {
69 return method.invoke(t, a);
70 } catch (Exception e) {
71 throw new EvalException(location, "The method " + method
72 + " cannot be invoked: " + e);
73 }
74 }
75
76 @Override
77 public int arity() {
78 return arity;
79 }
80
81 }
82 }
83
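The invocation convention used by MethodFunction — the receiver of an instance method is passed as the first argument, which is why non-static methods get arity + 1 — can be sketched with the JDK reflection API alone. The lookup helper below is hypothetical and only wraps the checked exceptions; the tutorial's getClass helper additionally maps primitive type names such as "double" to the corresponding Class objects:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.List;

public class ReflectCallSketch {
    /** Look a method up by class name (hypothetical helper for the sketch). */
    static Method lookup(String className, String name, Class<?>... types) {
        try {
            return Class.forName(className).getMethod(name, types);
        } catch (Exception e) {
            throw new RuntimeException("The method " + name + " cannot be located", e);
        }
    }

    /**
     * Invoke the method; for an instance method the receiver is taken
     * from the first argument, mirroring MethodFunction.apply above.
     */
    static Object call(Method m, List<Object> args) {
        boolean isStatic = Modifier.isStatic(m.getModifiers());
        Object target = isStatic ? null : args.get(0);
        Object[] a = (isStatic ? args : args.subList(1, args.size())).toArray();
        try {
            return m.invoke(target, a);
        } catch (Exception e) {
            throw new RuntimeException("The method " + m + " cannot be invoked", e);
        }
    }

    public static void main(String[] args) {
        // static: the binding made by @JavaMethod("java.lang.Math", "abs", "double")
        System.out.println(call(lookup("java.lang.Math", "abs", double.class),
                List.of(-12.0)));   // 12.0
        // instance: the binding made by @JavaMethod("java.lang.Object", "toString")
        System.out.println(call(lookup("java.lang.Object", "toString"),
                List.of(42)));      // 42
    }
}
```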
In the implementation of the operator Lambda we just create an implementation of this interface and return it as the value of the operation. We keep the lexical scope for the function, so it is able to use all definitions available from that scope, and we add the parameter names, bound to the argument values, to the scope. This way we naturally support recursive invocations.
Example 51. Lambda.java
1 package calculator.interpreter.ast.lambda;
2
3 import java.util.Iterator;
4 import java.util.LinkedList;
5 import java.util.List;
6 import net.sf.etl.parsers.SourceLocation;
7 import calculator.interpreter.Environment;
8 import calculator.interpreter.Function;
9 import calculator.interpreter.Variable;
10 import calculator.interpreter.ast.Expression;
11
12 /**
13 * Lambda expression
14 */
15 public class Lambda extends Expression {
16 /** the parameters of the lambda abstraction */
17 public final LinkedList<Parameter> parameters = new LinkedList<Parameter>();
18 /** the body of the lambda abstraction */
19 public Expression value;
20
21 @Override
22 public Object eval(Environment env) {
23 for (Parameter p : parameters) {
24 env.checkFree(p.location, p.name.name());
25 }
26 return new LambdaFunction(env);
27 }
28
29 /**
30 * The function implementation over lambda expression
31 */
32 private class LambdaFunction extends Function {
33 /** The environment where lambda expression was created */
34 final Environment environment;
35
36 /**
37 * The constructor for the function
38 *
39 * @param env the environment where expression is created
40 */
41 public LambdaFunction(Environment env) {
42 super();
43 this.environment = env;
44 }
45
46 @Override
47 public Object apply(SourceLocation location, List<Object> args) {
48 assert arity() == args.size();
49 Environment env = environment;
50 Iterator<Object> ia = args.iterator();
51 for (Iterator<Parameter> i = parameters.iterator(); i.hasNext();) {
52 Parameter p = i.next();
53 Object arg = ia.next();
54 Variable var = new Variable(p.name.name(), "var".equals(p.type),
55 p.location, null, null);
56 var.bind(location, arg);
57 env = new Environment(env, var);
58 }
59 return value.eval(env);
60 }
61
62 @Override
63 public int arity() {
64 return parameters.size();
65 }
66 }
67 }
68
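The environment-chain idea that LambdaFunction relies on can be shown in isolation. The Env class below is a simplified stand-in invented for illustration, not the tutorial's Environment class (whose actual constructor takes an environment and a Variable): every binding adds a link to the chain, and lookup walks outward, so a lambda that keeps its creation environment sees every definition visible there, including its own name once that name is bound in an enclosing environment:

```java
import java.util.Objects;

// Simplified stand-in for the environment chain (not the tutorial's
// Environment class): each binding extends the chain, lookup walks
// outward through the parents.
class Env {
    final Env parent;
    final String name;
    Object value;

    Env(Env parent, String name, Object value) {
        this.parent = parent;
        this.name = name;
        this.value = value;
    }

    Object lookup(String n) {
        for (Env e = this; e != null; e = e.parent) {
            if (Objects.equals(e.name, n)) {
                return e.value;
            }
        }
        throw new IllegalStateException("unbound: " + n);
    }
}
```

Each parameter binding in LambdaFunction.apply corresponds to one new link in such a chain.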
In the operator LambdaSequence we just delegate all the hard work to the lambda operator. This is actually the expected way to provide extensions over existing constructs in ETL: instead of creating a macro, the transformations are done over objects during compilation or interpretation. However, it is possible to introduce a kind of macro facility; this may be covered in a future version of the tutorial.
Example 52. LambdaSequence.java
1 package calculator.interpreter.ast.lambda;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.ast.Sequence;
5
6 /**
7 * The sequence that might include the parameter statement
8 */
9 public class LambdaSequence extends Sequence {
10 /** converted expression */
11 private Lambda converted;
12
13 @Override
14 public Object eval(Environment env) {
15 if (statements.isEmpty() || !(statements.get(0) instanceof BlockParameters)) {
16 return super.eval(env);
17 } else {
18 if (converted == null) {
19 BlockParameters params = (BlockParameters) statements.get(0);
20 Lambda l = new Lambda();
21 l.location = location;
22 l.parameters.addAll(params.parameters);
23 Sequence s = new Sequence();
24 s.location = location;
25 s.statements.addAll(statements.subList(1, statements.size()));
26 l.value = s;
27 converted = l;
28 }
29 return converted.eval(env);
30 }
31 }
32 }
33
The letrec statement just expands itself into a sequence of VarStatement and ExpressionStatement instances on the first execution. So the statement works just like a macro, but it is implemented on the AST level.
Example 53. LetRec.java
1 package calculator.interpreter.ast.vars;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.Cell;
5 import calculator.interpreter.Environment;
6 import calculator.interpreter.ast.ExpressionStatement;
7 import calculator.interpreter.ast.Statement;
8
9 /**
10 * The letrec allows defining mutually recursive definitions. It is equivalent
11 * to a sequence of unbounded let statements followed by assignment.
12 */
13 public class LetRec extends Statement {
14 /** definitions from letrec */
15 public final LinkedList<LetRecPart> definitions = new LinkedList<LetRecPart>();
16 /** true means that letrec parts has been already transformed */
17 private boolean converted = false;
18 /** transformed statements */
19 private final LinkedList<Statement> statements = new LinkedList<Statement>();
20
21 @Override
22 public Environment eval(Environment env, Cell result) {
23 if (!converted) {
24 for (LetRecPart p : definitions) {
25 VarStatement s = new VarStatement();
26 s.location = p.location;
27 Statement metadataSource = p.hasMetadata() ? p
28 : this.hasMetadata() ? this : null;
29 if (metadataSource != null) {
30 s.annotations.addAll(metadataSource.annotations);
31 s.documentation.addAll(metadataSource.documentation);
32 }
33 s.name = p.name;
34 statements.add(s);
35 }
36 for (LetRecPart p : definitions) {
37 ExpressionStatement s = new ExpressionStatement();
38 s.location = p.location;
39 Assignment a = new Assignment();
40 a.location = p.location;
41 a.first = p.name;
42 a.second = p.value;
43 s.value = a;
44 statements.add(s);
45 }
46 converted = true;
47 }
48 for (Statement s : statements) {
49 if (result != null)
50 result.value = null;
51 env = s.eval(env, result);
52 }
53 return env;
54 }
55 }
56
The class LetRecPart is just a data holder for the parts of the letrec statement.
Example 54. LetRecPart.java
1 package calculator.interpreter.ast.vars;
2
3 import calculator.interpreter.Cell;
4 import calculator.interpreter.Environment;
5 import calculator.interpreter.EvalException;
6 import calculator.interpreter.ast.Expression;
7 import calculator.interpreter.ast.Statement;
8
9 /**
10 * The part of letrec statement
11 */
12 public class LetRecPart extends Statement {
13 /** the name part */
14 public Name name;
15 /** the value part */
16 public Expression value;
17
18 @Override
19 public Environment eval(Environment env, Cell result) {
20 throw new EvalException(this, "This statement is available only as "
21 + "a part of letrec statement and should not be "
22 + "evaluated directly");
23 }
24 }
25
The implementation of the other operators is even more trivial.
Up to this point, our samples have referred to the calculator grammars and included sources using relative URLs. This approach has an obvious problem: the reference is context dependent. If we move the file around, we need to change the reference. It also causes problems in a typical plugin situation, where grammars and reusable sources are located in unexpected places.
Absolute URLs will not solve the problem either. Absolute local file URLs are system dependent and mean different things on different computers. Network URLs like http or ftp not only require network availability, but also make us rely on some party to keep these servers online and to keep the files at the specified locations.
Another way to refer to the grammars is the default grammar mechanism that we have already implemented in our interpreter. But this mechanism has disadvantages as well. Only one grammar can be specified as the default, so we are unable to make backward incompatible changes to the grammar: old sources would have to conform to the new grammar too. This is actually a big deal. ETL allows introducing backward incompatible changes into a language when extending its grammar, which allows language design mistakes to be fixed gracefully, and you are bound to make some if you are working hard enough. Using the default grammar mechanism takes this power away from the language designer. So this mechanism is best reserved for short-lived text (like stdin).
In the XML world this problem is solved using XML catalogs [7], URIs, and public identifiers. ETL is designed to reuse this mechanism for grammars, and it is possible to use it for our language as well (for the include statement).
For the catalog implementation we will use the resolver library from the Apache XML commons project. This library is redistributed within the ETL package, as it is used by the etl2xml utilities, and it should be added to the project classpath. However, the catalog library is used only through the interface EntityResolver, so it is possible to provide other implementations of this interface.
First we need to define the catalog file. A catalog file that includes the new grammar and prelude files will look like the following:
Example 55. catalog.xml
1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE catalog
3 PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
4 "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
5 <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
6 prefer="public">
7 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar basic//EN"
8 uri="calculator-basic-0_2_1.g.etl"/>
9 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar arith//EN"
10 uri="calculator-arith-0_2_1.g.etl"/>
11 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar vars//EN"
12 uri="calculator-vars-0_2_1.g.etl"/>
13 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar logic//EN"
14 uri="calculator-logic-0_2_1.g.etl"/>
15 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar lambda//EN"
16 uri="calculator-lambda-0_2_1.g.etl"/>
17 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar catalog//EN"
18 uri="calculator-catalog-0_2_1.g.etl"/>
19 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar//EN"
20 uri="calculator-catalog-0_2_1.g.etl"/>
21 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude vars//EN"
22 uri="prelude-vars.c.etl"/>
23 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude logic//EN"
24 uri="prelude-logic.c.etl"/>
25 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude lambda//EN"
26 uri="prelude-lambda.c.etl"/>
27 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude catalog//EN"
28 uri="prelude-catalog.c.etl"/>
29 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude//EN"
30 uri="prelude-catalog.c.etl"/>
31 </catalog>
The only thing that we add to the grammar in this section is an include directive that supports public identifiers in addition to system identifiers. Note also that we refer to the lambda grammar using a public identifier.
Example 56. calculator-catalog-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorCatalog {
3: include public "-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar lambda//EN";
4: namespace default t = "http://etl.sf.net/samples/calculator";
5: context default Expressions {
6: statement IncludeStatement {
7: ^t:IncludeStatementCatalog {
8: % include;
9: {
10: @ systemId = string(quote='"');
11: % public {
12: @ publicId = string(quote='"');
13: }?;
14: } | {
15: % public {
16: @ publicId = string(quote='"');
17: };
18: };
19: };
20: };
21: };
22: };
23:
The only new definition is an extended version of the statement IncludeStatement. It now supports public identifiers in addition to system identifiers. With this statement defined, we are now able to refer to the calculator files like the following:
Example 57. prelude-catalog.c.etl
1: doctype "urn:publicid:-:IDN+etl.sf.net:Calculator+Sample+0.2.1+Grammar:EN";
2: include public "-//IDN etl.sf.net//Calculator Sample Prelude lambda//EN";
3:
You could also check the prelude AST to see how our changes in the grammar affect the include statement.
Note that we have referred to the grammar using the URN form of the public identifier as a system identifier. It is possible to map arbitrary URIs using catalogs as well, and "urn:publicid" is a special form of URI that is equivalent to a public identifier. However, it is recommended to use public identifiers explicitly where possible.
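The transcription between the two forms is mechanical and is defined by RFC 3151. A small sketch that covers the identifiers used in this tutorial (the class and method names are invented for illustration; a complete implementation must also handle a few more reserved characters, for example a bare '/'):

```java
public class PublicIdUrn {
    /**
     * Transcribe a public identifier into its urn:publicid: form
     * (RFC 3151): "//" becomes ":", "::" becomes ";", spaces become
     * "+", and bare occurrences of the separator characters are
     * percent-encoded.
     */
    static String toUrn(String publicId) {
        String s = publicId.trim().replaceAll("\\s+", " ");
        s = s.replace("%", "%25")       // the escape character goes first
             .replace("+", "%2B")
             .replace("//", "\u0000")   // protect the separator pairs...
             .replace("::", "\u0001")
             .replace(":", "%3A")       // ...then escape bare occurrences
             .replace(";", "%3B")
             .replace("\u0000", ":")    // "//" becomes ":"
             .replace("\u0001", ";")    // "::" becomes ";"
             .replace("'", "%27")
             .replace("?", "%3F")
             .replace("#", "%23")
             .replace(" ", "+");        // spaces become "+"
        return "urn:publicid:" + s;
    }

    public static void main(String[] args) {
        // matches the doctype used in prelude-catalog.c.etl above
        System.out.println(toUrn(
            "-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar//EN"));
        // urn:publicid:-:IDN+etl.sf.net:Calculator+Sample+0.2.1+Grammar:EN
    }
}
```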
In this final part of the tutorial we can run all the previous sources:
Example 58. sample-catalog.c.etl
1: doctype public "-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar//EN";
2: // run all samples
3: include "sample-basic.c.etl";
4: include "sample-arith.c.etl";
5: // since vars conflicts with logic over variable name, load it in the separate context
6: { include "sample-vars.c.etl"; };
7: include "sample-logic.c.etl";
8: include "sample-lambda.c.etl";
9: help is_odd;
10: help q'!*=';
You could also check the sample AST.
Note that the grammar is referenced using a public identifier.
The only new AST class is the new include statement, and by itself it needs very little additional work, as we left a very handy extension point in the original statement.
Example 59. IncludeStatementCatalog.java
1 package calculator.interpreter.ast;
2
3 import net.sf.etl.parsers.LiteralUtils;
4 import org.xml.sax.InputSource;
5
6 /**
7 * Redefined include statement that uses public id as well
8 */
9 public class IncludeStatementCatalog extends IncludeStatement {
10 /** The public identifier */
11 public String publicId;
12
13 @Override
14 protected InputSource createInputSource() {
15 String resolvedSystemId = systemId == null ? null : resolveSystemId();
16 InputSource input = new InputSource();
17 input.setSystemId(resolvedSystemId);
18 input.setPublicId(LiteralUtils.parseString(publicId));
19 return input;
20 }
21 }
22
We just handle the case of a missing system identifier and specify the public identifier if it is available. We let the ETL parser handle the public identifier.
The new include statement relies on the parser being configured with an entity resolver; the runner RunnerCatalog just provides it.
Example 60. RunnerCatalog.java
1 package calculator.interpreter;
2
3 import java.net.URL;
4 import org.apache.xml.resolver.CatalogManager;
5 import org.apache.xml.resolver.tools.CatalogResolver;
6
7 /**
8 * The runner that understands XML catalogs using entity resolver library
9 */
10 public class RunnerCatalog extends RunnerVars {
11 /**
12 * The collection of catalog files.
13 */
14 private final StringBuilder catalogs = new StringBuilder();
15 /**
16 * if true, the grammars catalog on the class path will be attempted to be
17 * loaded.
18 */
19 private boolean loadSystemCatalog = true;
20 /**
21 * if true, only public identifiers will be used for the default grammars.
22 */
23 private boolean publicIdOnly = false;
24 /**
25 * The catalog resolver
26 */
27 private CatalogResolver resolver;
28
29 @Override
30 protected void parseArgs(String[] args) {
31 super.parseArgs(args);
32 // add grammars catalog if it is on classpath and it is the file
33 if (loadSystemCatalog) {
34 URL systemCatalog = getClass().getResource(
35 "/calculator/grammars/catalog.xml");
36 if (systemCatalog == null) {
37 System.err.println("The system catalog is not on the classpath");
38 usage();
39 }
40 if (!"file".equals(systemCatalog.getProtocol())) {
41 System.err.println("Only the file protocol is suported: "
42 + systemCatalog);
43 usage();
44 }
45 if (catalogs.length() != 0) {
46 catalogs.append(';');
47 }
48 catalogs.append(systemCatalog.getPath());
49 }
50 CatalogManager mgr = CatalogManager.getStaticManager();
51 mgr.setCatalogFiles(catalogs.toString());
52 mgr.setPreferPublic(true);
53 resolver = new CatalogResolver(mgr);
54 }
55
56 @Override
57 protected String getDefaultGrammar() {
58 return publicIdOnly ? null : super.getDefaultGrammar();
59 }
60
61 @Override
62 protected String getDefaultGrammarPublicId() {
63 return "-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar "
64 + getLanguageLevel() + "//EN";
65 }
66
67 /**
68 * @return the resolver for the catalogs
69 */
70 public CatalogResolver resolver() {
71 return resolver;
72 }
73
74 @Override
75 protected int parseOption(String[] args, int i) {
76 if ("-c".equals(args[i])) {
77 // the catalog file on the command line
78 if (i + 1 >= args.length) {
79 System.err.println("Missing argument for -c option");
80 usage();
81 }
82 if (catalogs.length() != 0) {
83 catalogs.append(';');
84 }
85 catalogs.append(args[i + 1]);
86 return i + 2;
87 }
88 if ("--no-system-catalog".equals(args[i])) {
89 // disable loading of the system grammar catalog
90 loadSystemCatalog = false;
91 return i + 1;
92 }
93 if ("--no-system-default-grammar".equals(args[i])) {
94 // disable loading of the system grammar catalog
95 publicIdOnly = true;
96 return i + 1;
97 }
98 return super.parseOption(args, i);
99 }
100
101 @Override
102 protected String getDefaultLanguageLevel() {
103 return "catalog";
104 }
105
106 @Override
107 protected String optionList() {
108 return super.optionList()
109 + " [-c <catalog file>] [--no-system-catalog] [--no-system-default-grammar]";
110 }
111
112 @Override
113 protected void optionUsage() {
114 super.optionUsage();
115 System.err.println(" -c <catalog file> - load catalog file");
116 System.err.println(" --no-system-catalog - disable loading"
117 + " system catalog");
118 System.err.println(" --no-system-default-grammar - disable providing "
119 + "system id for the default grammar");
120 }
121
122 @Override
123 protected void createFactory() {
124 super.createFactory();
125 factory.setEntityResolver(resolver);
126 }
127
128 /**
129 * Application entry point.
130 *
131 * @param args application arguments
132 */
133 public static void main(String[] args) {
134 new RunnerCatalog().start(args);
135 }
136 }
137
The resolver itself is created in the method parseArgs(...). We also add a catalog with the calculator grammars if it is on the classpath. Note that the resolver library only supports catalogs on the file system, so if you package the calculator application in a jar file, loading of the system grammar catalog will stop working; in that case you will need to provide the catalog on the command line explicitly. When creating a tool that supports language extensions, it is very likely that a custom resolver has to be created that is aware of the plugin architecture. For example, Eclipse has implemented its own XML catalog support for editing XML files.
In the method createFactory() we set the created resolver on the parser factory, so it will be associated with each new parser created using this factory instance. The resolver will be used both for the ETL sources to be parsed and for the grammar files (as a grammar file is an ETL file).
Now we can even use only public identifiers for specifying the default grammar. The method getDefaultGrammarPublicId() returns the public identifier for the default grammar, and the method getDefaultGrammar() can now be forced to return null, in case we want to check that public identifiers are actually used.
If you run this runner, it will complain that it cannot locate the file CatalogManager.properties. The message is harmless, but you could put the following file on the classpath:
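A minimal CatalogManager.properties along the following lines is enough to silence the warning. Treat this as a sketch rather than the exact file from the package; the property names are the ones documented for the Apache resolver library:

```properties
# Minimal CatalogManager.properties for the Apache XML resolver.
# The presence of this file on the classpath silences the warning.
catalogs=
relative-catalogs=true
static-catalog=yes
catalog-class-name=org.apache.xml.resolver.Resolver
prefer=public
# Uncomment to see the messages about the resolution process:
# verbosity=10
```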
Note that if you uncomment the line with the property verbosity=10, you will be able to see messages about the resolution process.
The public and system identifiers play an important role in language evolution. The grammar is a mapping from the input text file to the AST, and the language can change in two aspects:
The allowed set of inputs can change, by introducing new operators and keywords and by removing old ones from the language.
The mapping from tokens to the AST can change, so input that was previously valid can have a different meaning to the tools.
The grammar can also be changed in a way that affects neither of these aspects. For example, new abstract contexts and grammars can be introduced, or other syntax expressions with the same meaning can be used. A grammar can also be changed from using the grammar include directive with the previous version of the grammar to direct inclusion of the definitions. The grammar definition language itself can change as well, since ETL is still evolving.
There is little problem if such changes happen during the private evolution of the language, when both the language and the sources that conform to it are changed in parallel. However, if the grammar is released into the wild, the users would like to be sure that the meaning of the sources they have created does not change behind their backs. For example, the addition of a new operator or statement that uses a specific identifier as a keyword (for example "until") could cause an error in sources that use this identifier as a name.
System and public identifiers provide a way around this problem. The changed grammar can be referred to by a new identifier, so some sources will use the old grammars and other sources will use the new ones. This does not imply that old grammars should be supported indefinitely, but if support for a grammar is dropped and the grammar is removed, the user will receive an error message that the grammar could not be located. Alternatively, if we drop support for the grammar on the AST level, the user will receive an error message saying that the grammar maps to an unsupported AST object. The user then has to visit the file that uses the obsolete grammar, change the public identifier, and adjust the source to conform to the new grammar (or, if feeling lucky, just do a global search and replace of the doctype instructions).
It is also possible to create a tool that upgrades old grammar syntax to a new one. During the development of the ETL parser version 0.2.1 there were some changes to the grammar definition compared with the version 0.2.0. To upgrade the sample grammars, an XSLT script was written that replaces old syntax constructs with new ones; this script is included in the distribution of the ETL framework. Considering that the changes were mostly cosmetic, XSLT was enough. It would also be possible to write a tool that does smarter things, for example replacing names that match new keywords with some quoted form.
Public identifiers are generally more convenient than system ones for distributing multiple grammars. If there were no changes between the "1.0b2", "1.0rc", and "1.0" versions of the grammar, the public identifiers that correspond to these versions could simply be mapped to the same grammar file using the catalog file.
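With hypothetical identifiers (the names below are invented for illustration, not part of the calculator sample), such a mapping might look like this in the catalog file:

```xml
<!-- three released version identifiers, all resolving to the same grammar -->
<public publicId="-//IDN example.org//My Language 1.0b2 Grammar//EN"
        uri="my-language-1_0.g.etl"/>
<public publicId="-//IDN example.org//My Language 1.0rc Grammar//EN"
        uri="my-language-1_0.g.etl"/>
<public publicId="-//IDN example.org//My Language 1.0 Grammar//EN"
        uri="my-language-1_0.g.etl"/>
```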
In this tutorial we have defined a simple calculator language that demonstrates typical grammar language features, and we have mapped it to a simple AST. However, some features of the grammar language remain uncovered, and you will need to study the definition of the ETL grammar and the other sample grammars.
As you have possibly noticed, we have used concepts natural to language design. We have added new grammars, contexts, operators, statements, and fragments. We did not need to translate these concepts into an EBNF-like notation; the ETL grammar compiler does this work for us. We also did not have to think about how to update the productions of derived grammars when we changed something: we just redefined what we needed to change. This is what makes ETL a higher-level grammar definition language than most other language definition frameworks.
Note that while our sample language works, we have used the simplest possible implementation we could get away with. We also have not defined some trivial parts, like the collection types, since they would bring few new things from the point of view of the syntax. So please do not consider this tutorial to be a tutorial on programming language implementation.
Another important uncovered aspect is how to support macros using ETL. It is possible to create a limited macro facility using ETL means, but it requires jumping through some hoops and will be covered in a future version of the tutorial.
[1] There are also extensibility constructs that are not covered in this tutorial. Consult the ETL specification, the ETL grammar definition, and the samples that come with the ETL parser for more information.
[2] Actually, the lexical level picked up a lot of ideas from Prolog, Java, and even Ada, so you will likely find a lot of familiar lexical elements.
[3] A significant token is anything except white space and comments (documentation comments, however, are considered significant).
[4] Those who are familiar with the Prolog programming language surely already guessed the source of inspiration for this notation.
[5] It might have been done in a simpler way, but in this way we make some use of the annotations.
[6] This shows how important choosing the right core abstractions is when designing a language. However, ETL allows defining languages that are not computationally complete, so you do not have to create a complete programming language when defining a DSL.
[7] It is strongly advised to read the article if you are not familiar with XML catalogs, as it introduces some terms that are not explained here.