Revision History | |
---|---|
Revision 0.2.1 | 2009-01-19 | The initial version of the tutorial for the ETL framework version 0.2.1. |
Abstract
This document describes how to create a simple calculator language using ETL. The tutorial introduces most of the extensibility constructs of the ETL grammar definition language. The tutorial assumes ETL parser version 0.2.1.
First, you need to download the archive with the parser from the project page on SourceForge: http://sf.net/projects/etl . Unpack the archive to any directory (preferably one whose path does not contain spaces) and follow the installation instructions. It is recommended to generate the xmlout sample files, since this tutorial links to them in a number of places in order to show the ASTs for the samples.
To start working with the framework you need just the etl-parser.jar library. This jar file has no external dependencies; it uses java.util.logging as its logging framework.

Note that all Java sources, grammar files, and sample files used in this tutorial are included in the ETL package.

You can also browse the HTML and XML files that represent the ETL sources for this tutorial in the directory ../xmlout/calculator, provided you have downloaded the xmlout version of the package or generated the XML and HTML files according to the installation instructions.
In this part of the tutorial we will consider a minimal grammar that works.
Let's create a very simple grammar that contains only a print statement, where the only supported expressions are positive numbers and string literals. Let's place the grammar in the file calculator-basic-0_2_1.g.etl [1]:
Example 1. calculator-basic-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorBasic {
3: namespace default t = "http://etl.sf.net/samples/calculator";
4: context default Expressions {
5: op composite NumberLiteral(f) {
6: @ value = float | integer;
7: };
8: op composite StringValue(f) {
9: ^ t:StringLiteral {
10: @ value = string(quote='"');
11: };
12: };
13: statement PrintStatement {
14: % print;
15: @ values += list , {
16: expression;
17: };
18: };
19: statement ExpressionStatement {
20: @ value = expression;
21: };
22: };
23: };
The current convention is for ETL grammars to end with the extension .g.etl and for ETL sources to end with the extension .{language-specific-part}.etl. We use the extension .c.etl for sources that conform to the calculator grammars. However, the ETL parser does not enforce any naming convention for grammars or for sources conforming to them.
The first thing to note is that ETL shares a common lexical syntax across all languages that can be defined using it. The lexical level contains numbers (integers and floats), strings, identifiers, and graphic tokens [2]. Keywords in ETL are not global as in other languages; they are normal tokens, like identifiers, that play the role of a keyword in a specific place in the grammar but can be used freely elsewhere.
In addition to the lexical syntax, ETL languages share a common phrase syntax. The main elements of the phrase syntax are blocks and segments. A segment is a sequence of significant tokens [3] and blocks, terminated with a semicolon. A block is a sequence of segments enclosed in braces. An ETL source is a sequence of segments.
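As an illustration of how little machinery the phrase level needs, the following toy Java sketch (not part of the ETL API; it ignores strings and comments for brevity) splits a source into top-level segments by tracking block depth:

```java
import java.util.ArrayList;
import java.util.List;

/** A toy illustration of the phrase layer: splitting a source into top-level segments. */
public class PhraseSketch {
    /** Hypothetical helper; real ETL also handles strings, comments, and error recovery. */
    static List<String> topLevelSegments(String source) {
        List<String> segments = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current block nesting depth
        for (char c : source.toCharArray()) {
            current.append(c);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (c == ';' && depth == 0) {
                // a semicolon outside any block terminates a segment
                segments.add(current.toString().trim());
                current.setLength(0);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        List<String> s = topLevelSegments("print 1; statement X { a; b; };");
        System.out.println(s.size());  // 2
        System.out.println(s.get(1));  // statement X { a; b; };
    }
}
```

Note that a block together with its trailing semicolon is part of the enclosing segment, which is why the second segment above is reported as a single unit.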
Above the phrase level, ETL has a term syntax. The term syntax is defined using grammars; a grammar maps the parsed phrase layer to an AST consisting of objects and properties. An object type is a pair of a namespace and a name (just as in the case of an XML element). A property is identified by its name. Properties are of two kinds: single-value and multiple-value. The ETL term parser maps the structure returned by the phrase parser to an object tree that is directly usable by tools. The resulting object model is quite simple and flexible, and there are ready-made parsers that map it to JavaBeans, EMF, and Java objects with public fields. It is also easy to create a custom AST parser that maps to other object structures such as JSON or MOF, since at the core of the framework is an event parser that works as a kind of iterator, simultaneously iterating over the events returned by the lexer, phrase parser, and term parser.
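The object model described here is simple enough to sketch directly. The following illustrative class (not an actual ETL API type; all names are hypothetical) shows the shape a custom AST parser might target, with single-value and multiple-value properties corresponding to the two property assignment kinds used in grammars:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of a generic term-object model (not an ETL API class). */
public class TermObjectSketch {
    final String namespace; // like an XML namespace URI
    final String name;      // the object name within the namespace
    final Map<String, Object> singleValued = new HashMap<String, Object>();
    final Map<String, List<Object>> multiValued = new HashMap<String, List<Object>>();

    TermObjectSketch(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    /** Single-value property (corresponds to '=' in grammar definitions). */
    void set(String property, Object value) {
        singleValued.put(property, value);
    }

    /** Multiple-value property (corresponds to '+=' in grammar definitions). */
    void add(String property, Object value) {
        List<Object> list = multiValued.get(property);
        if (list == null) {
            list = new ArrayList<Object>();
            multiValued.put(property, list);
        }
        list.add(value);
    }
}
```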
The ETL grammar syntax is itself defined using ETL, and it is likewise organized into segments and blocks. Like most other ETL files, the grammar starts with a document type declaration. In some cases the document type declaration may be missing, and in that case the ETL parser uses the default grammar. In this particular case the grammar is identified by a public identifier, which has the same syntax and semantics as in XML. The mapping for the grammar's public identifier -//IDN etl.sf.net//ETL//Grammar 0.2.1 is implemented in the parser. For other public identifiers an entity resolver should be used; the Apache XML catalog resolver can be used to build resolution based on a number of catalog formats. It is also possible to specify a system identifier, and we will get back to this topic in later parts of the tutorial. The document type declaration starts with the doctype keyword, and the doctype statement may occur only as the first statement in an ETL source.
The second line is the start of the grammar declaration. The grammar declaration starts with the grammar keyword, followed by a fully qualified name of the grammar, which uses a syntax similar to that of a fully qualified Java class name. This name is later available in events returned by the parser, so it is possible to recognize that a source belongs to a specific grammar; it does not otherwise affect the parsing process. Any name can be specified here, but it is recommended to follow Java naming conventions. The curly brace starts a block with the grammar contents.
The first thing to note is that the grammar content is unordered: the top-level grammar members may appear in any order without changing their semantics. However, it is good style to put grammar-wide declarations at the start of the grammar.
The first line inside our grammar is a namespace declaration. Like in XML, a namespace declaration creates a mapping between a prefix and a URI. The prefix is used later in object creation expressions. This particular namespace is designated as the default one.
The namespace declaration is followed by the context declaration. The context declaration starts with the context keyword, optionally followed by the modifier default or abstract. The modifier abstract is covered later in this tutorial. The modifier default specifies which context is used for the top level of a source that conforms to the grammar if no context is specified in the document type declaration. If no context is declared as default in the grammar, the context has to be specified in the doctype statement or set in the parser. We have exactly one context in the grammar, and we designate it as the default one. The modifier default in our sample grammar is followed by the context name. The context name has grammar scope and is used to refer to the context from different parts of the grammar. Information about the current context is also available from parser events, and it can be used by ETL processing tools.
The context contains four statements, each creating a new declaration. The first one is an operator declaration that declares a numeric literal. The keyword op specifies that an operator is being declared. The keyword composite specifies that the operator is defined using the free-form syntax in its body. This keyword is followed by the operator name, which is unique within the context. If a default namespace is specified and the operator does not contain an object creation expression at the root level, this name is used to construct the object name for the operator. The name also plays an important role in the extensibility scenarios supported by ETL. In the parentheses after the operator name comes the operator type specification. The type f means that the operator is a primary operator that has neither a left nor a right argument. We will return to operator types in later sections of the tutorial.
The operator declarations form the expression syntax for the context. Each operator declaration introduces an infix, prefix, suffix, or primary operator. If the context defines expressions, at least one primary operator must be present in the context, or it will not be possible to build any expression.
The primary operator is the simplest operator kind. It has only the operator expression, which is wrapped into the operator's object (in our case the operator's object is specified implicitly). The content of our operator definition contains a property assignment statement (or let statement) starting with @. The @ is followed by a property name and an assignment operator. There are two assignment operators used in ETL grammars: = (single-value assignment) and += (multiple-value assignment). The multiple-value assignment should be used if the property in the underlying model is a multi-value property (like a list or an array); the single-value assignment should be used if the property is expected to hold a single value.
On the right-hand side of the property statement is a syntax expression. Syntax expressions in ETL consume tokens from the stream and yield objects, properties, and values. This particular expression consumes either an ETL integer or an ETL float and yields it as a value. If this operator matches, a logical AST node will be produced with the object name t:NumberLiteral and the single property value containing the text of the integer or float token.
The next operator definition defines a string literal. Unlike NumberLiteral, it defines the root object for the value explicitly. This is done when either the namespace differs from the default namespace, or the name of the object does not match the name of the definition (as was done in this case, for no particular reason). The operator matches a string value delimited with double quotes. It is recommended to use the shorter form when possible; however, the full form shown in the case of StringLiteral is always available.
ETL expressions are not directly usable; they must be contained in statements, which are organized into blocks. The next member of the context defines the print statement. Like an operator, a statement maps to a single object, whose name is in our case inferred just like the operator's object name.
The block contained in the statement definition consists of several segments. Each segment of the statement is evaluated sequentially when the statement is matched. Two kinds of statements are allowed here: a let statement (discussed above in the operator's description) and an expression statement.
The first segment in the statement definition contains the expression % print. This expression matches the identifier print and yields no value. The % acts as an escape here, specifying that the next token is a keyword of the grammar being created. Several keywords can be specified in sequence, and they can be interleaved with blocks, which may contain syntax expressions as well.
There are a few things to note in the next segment as well. The assignment operator in the let statement has the form +=, meaning that multiple values are expected. The immediate syntax expression to match is the list operator, which matches values in its content separated by the separator that comes right after the list keyword (in our case a comma). The elements of the list are specified by the block, which contains the syntax expression expression. The form used in this statement means any expression defined using the operators of this context. It is possible to refer to expressions of another context, like expression(Expressions); this form has an effect similar to expression without arguments in this case, but it generally behaves differently in the presence of extensibility constructs. For this grammar, the expression will match either a number literal or a string literal. We will extend the grammar with additional elements in later sections of the tutorial.
The last definition is the ExpressionStatement statement; this is a simple statement that contains a single expression. When the statement is executed, it evaluates the expression.
This is a small grammar that will serve as a basis for the rest of the tutorial. It will be extended with additional elements in later sections, but it can already be used to run some samples (for example, converting positive numbers from bases 2..36 to decimal form, or converting Unicode escapes to characters).
An ETL file that matches this small grammar is actually very simple. Let's place the following text in the file sample-basic.c.etl located in the same directory as the grammar file.
Example 2. sample-basic.c.etl
1: doctype "calculator-basic-0_2_1.g.etl";
2: print "zeroes: ", 24#0#, " ", 0.0;
3: print "ones: ", 1, " ", 1.0, " ", 10E-1;
4: print "halves: ", 0.5, " ", 5.0E-1, " ", 2#0.001#e2;
5: print "dozens: ", 12#10#, " ", 3#1_1_0#, " ", 0.12e2, " ", 36#C#;
The sample is meant to print numbers in the specified forms, interleaved with string literals that are printed as is. The numbers on each line have the same numeric value but are written in different forms. Note that ETL uses Ada-like based integers, and the last line contains several examples. Also, in ETL it is possible to use underscores in numbers to separate logical groups.
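Since ETL based integers follow the Ada convention base#digits#, the "dozens" line can be checked with plain Java: Integer.parseInt accepts an arbitrary radix, and the grouping underscores just have to be stripped first. This snippet is only an illustration and does not use the ETL library:

```java
public class BasedIntegers {
    public static void main(String[] args) {
        // 12#10# : the digits "10" read in base 12
        System.out.println(Integer.parseInt("10", 12));                    // 12
        // 3#1_1_0# : underscores only group digits, so the value is "110" in base 3
        System.out.println(Integer.parseInt("1_1_0".replace("_", ""), 3)); // 12
        // 36#C# : the digit "C" in base 36
        System.out.println(Integer.parseInt("C", 36));                     // 12
        // 24#0# from the "zeroes" line: zero in any base
        System.out.println(Integer.parseInt("0", 24));                     // 0
    }
}
```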
You can also check the sample AST. Note that the AST is clickable, provided you have JavaScript enabled. You can click on object and property names on the left side to have the corresponding text highlighted, or on text on the right side to have the innermost object or property for that position highlighted.
ETL has a built-in error recovery strategy. If a syntax error happens in a statement, parsing of the statement is aborted until the end of the segment. Errors at the lexical and phrase levels do not automatically cause syntax errors by themselves, but since lexical errors are treated as whitespace, the parser will likely be confused by the next token. As a result of parsing such a segment, an incomplete object tree will be built.
But even in the case of lexical and phrase level errors, the parser is able to build at least a partial object tree, which can be used to find further semantic errors.
Let's consider some erroneous sources and see how they are parsed, to observe the error recovery in process. You can check the sources using the etl2html script, which can be installed according to the installation instructions.
Example 3. calculator-basic-with-errors.g.etl
1: // This file demonstrates the error recovery strategy in ETL.
2: // Errors here are intentional.
3: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
4: grammar calculator.CalculatorBasic {
5: namespace default t = "http://etl.sf.net/samples/calculator";
6: context default Expressions {
7: op composite NumberLiteral(f) {
8: @ value = float | integer;
9: };
10: op composite StringValue(f) {
11: ^ t:StringLiteral {
12: @ value = string(quote='");
13: };
14: };
15: statement PrintStatement {
16: % print
17: @ values += list , {
18: expression;
19: };
20: };
21: statement ExpressionStatement {
22: @ value = expression
23: };
24: };
25: };
The table after the source summarizes the errors and their positions. Let's look at the generated errors. The first error is on line 12 and relates to the missing closing quote in the string. The lexer returns an error saying that a newline is not expected in this kind of string. Then the phrase parser is unable to find a semicolon before the end of the block and complains about it as well, but it generates an event saying that the segment is finished. The term parser still expects a string, and when it finds out that the segment has ended, it reports a syntax error.
The next error is the missing semicolon after the keyword print on line 16. The term parser encounters the symbol @, which can occur only at the beginning of a statement according to the grammar syntax. The term parser skips until the end of the segment, including blocks nested in this segment. The skipped tokens are highlighted in red in the sample. Since there is a semicolon before the end of the block, the phrase parser has nothing to complain about.
The last error is the missing semicolon at the end of the block on line 22. This error is detected by the phrase parser, which reports the error but also reports that the segment has ended. Since the segment has ended where the term parser expected it to end, the term parser does not complain and just forwards the phrase parser error to the client code.
Despite these errors, most of the AST is built anyway, as you can see here. Compare it with the AST of the correct grammar. As you can see, only information from the erroneous nodes is missing.
Let's now see what happens if we try to use a grammar that contains errors:
Example 4. sample-basic-with-errors.c.etl
1: // This file demonstrates the error recovery strategy in ETL.
2: // Errors here are intentional.
3: doctype "calculator-basic-with-errors.g.etl";
4: print "zeroes: ", 24#0, " " , 0.0 , " '', 0;
5: print "ones: ', 1, " ", 1.0, " ", 10E-1 {a;a};
6: print "halves: ", 0.5, " ", 5.0E-1, " ", 2#0.001#e2;
7: print "dozens: ", 12#10#, " ", 3#1_1_0#, " ", 0.12e2, " ", 36#C {{
All grammar errors are reported when reading the source, but they are classified as grammar errors: the position information supplied with each error points to the grammar source.
The last grammar error is reported at the end of the doctype directive, saying that the specified grammar could not be used. When an error with the grammar happens, the term parser falls back to the default grammar, which is able to accept any ETL source. Because of this, we will not see syntax errors in the source: the default grammar treats all tokens the same, and you can see that print is not highlighted as an error. But phrase errors and lexical errors are still detected, and we have made plenty of them in the source.
You can also check the sample AST to validate this.
Let's now write the code that supports this simple sample. We will use the simplest AST parser from the ETL library, FieldTermParser.
To simplify mapping, it is recommended to put classes related to the same namespace into the same package and to use the class name as the object name. The ETL AST parsers allow some degree of customization, so it is possible to deviate from these recommendations.
The field term parser assumes there is some base class that holds basic information like the start and end positions of the node in the source code, and structural information like the parent-child relationship. For this tutorial we do not need the parent-child relationship, but information about positions might be useful for error reporting. So let's create a Node class with a location field that holds position information. The parser can either store positions as fields of the type TextPos, scatter them among fields of integer types, or store position information as a single location field of the type SourceLocation (it is also possible to customize processing in other ways). We will go for the last variant, and the resulting class looks like the following. Note that the class and the field have public access; otherwise the parser would not be able to access them using reflection.
Example 5. Node.java
1 package calculator.interpreter.ast;
2
3 import net.sf.etl.parsers.SourceLocation;
4
5 /**
6 * The basic node class
7 */
8 public abstract class Node {
9 /** the position of the node */
10 public SourceLocation location;
11 }
12
Then let's define a simple class Expression with the abstract method Object eval(Environment env). Ignore the env parameter in this section of the tutorial; we will define the corresponding class later. In this section of the tutorial it is unused, and it could be an empty class.
Example 6. Expression.java
1 package calculator.interpreter.ast;
2
3 import calculator.interpreter.Environment;
4
5 /**
6 * The base class for all expressions.
7 */
8 public abstract class Expression extends Node {
9 /**
10 * Evaluate the value of the expression.
11 *
12 * @param env the evaluation environment
13 * @return the resulting value
14 */
15 public abstract Object eval(Environment env);
16 }
17
The classes StringLiteral and NumberLiteral have a similar value field that is shared between them. Let's create the abstract class Literal to hold this shared state.
Example 7. Literal.java
1 package calculator.interpreter.ast;
2
3 /**
4 * The literal expression
5 */
6 public abstract class Literal extends Expression {
7 /** The literal value */
8 public String value;
9 }
10
Then we just need to implement the method eval() in both classes. This is trivial using the utility class LiteralUtils from the parser library, which supports parsing number and string tokens.
Example 8. StringLiteral.java
1 package calculator.interpreter.ast;
2
3 import net.sf.etl.parsers.LiteralUtils;
4 import calculator.interpreter.Environment;
5
6 /**
7 * The string literal
8 */
9 public class StringLiteral extends Literal {
10 @Override
11 public Object eval(Environment env) {
12 return LiteralUtils.parseString(value);
13 }
14 }
15
Note that for the sake of simplicity, we use only double numbers here.
Example 9. NumberLiteral.java
1 package calculator.interpreter.ast;
2
3 import net.sf.etl.parsers.LiteralUtils;
4 import calculator.interpreter.Environment;
5
6 /**
7 * The number literal
8 */
9 public class NumberLiteral extends Literal {
10 @Override
11 public Object eval(Environment env) {
12 return LiteralUtils.parseDouble(value);
13 }
14 }
15
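The based floating-point literals from the sample (such as 2#0.001#e2) are also handled by LiteralUtils. As a rough, stdlib-only illustration of what such parsing involves, here is a simplified sketch; it is not the real LiteralUtils implementation, and it ignores signs and malformed input:

```java
public class BasedDoubleSketch {
    /** Parses a based literal of the shape base#digits[.digits]#[eN], e.g. "2#0.001#e2". */
    static double parseBasedDouble(String token) {
        int h1 = token.indexOf('#');
        int h2 = token.lastIndexOf('#');
        int base = Integer.parseInt(token.substring(0, h1));
        String digits = token.substring(h1 + 1, h2).replace("_", "");
        int exponent = 0;
        if (h2 + 1 < token.length()) {
            exponent = Integer.parseInt(token.substring(h2 + 2)); // skip the 'e' or 'E'
        }
        int dot = digits.indexOf('.');
        String mantissa = dot < 0 ? digits : digits.substring(0, dot) + digits.substring(dot + 1);
        int fractionDigits = dot < 0 ? 0 : digits.length() - dot - 1;
        // the value is the mantissa (in the given base) scaled by base^(exponent - fractionDigits)
        return Long.parseLong(mantissa, base) * Math.pow(base, exponent - fractionDigits);
    }

    public static void main(String[] args) {
        System.out.println(parseBasedDouble("2#0.001#e2")); // 0.5
        System.out.println(parseBasedDouble("12#10#"));     // 12.0
    }
}
```

For 2#0.001#e2 the mantissa 0.001 in base 2 is 0.125, and the exponent e2 scales it by 2^2, giving 0.5 as on the "halves" line of the sample.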
Let's now implement the statement classes, starting with the Statement abstract class. Right now we have two statement types, but we will add more in later sections of the tutorial. There are fields annotations and documentation; ignore these fields for now, we will cover them in later sections. The class Statement also has the abstract method Environment eval(Environment env, Cell result). We will discuss the result parameter later. The argument value env should be returned as is if the statement does nothing with the environment (and our print and expression statements are such statements).
Example 10. Statement.java
1 package calculator.interpreter.ast;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.Cell;
5 import calculator.interpreter.Environment;
6 import calculator.interpreter.ast.meta.Annotation;
7 import calculator.interpreter.ast.meta.DocumentationLine;
8
9 /**
10 * The base statement class
11 */
12 public abstract class Statement extends Node {
13 /** the annotations associated with statement */
14 public final LinkedList<Annotation> annotations = new LinkedList<Annotation>();
15 /** the documentation lines associated with statement */
16 public final LinkedList<DocumentationLine> documentation = new LinkedList<DocumentationLine>();
17
18 /**
19 * @return true if the statement has metadata
20 */
21 public boolean hasMetadata() {
22 return !annotations.isEmpty() || !documentation.isEmpty();
23 }
24
25 /**
26 * The evaluate statement
27 *
28 * @param env the evaluation environment
29 * @param result the evaluation result (the cell might be null, in that case
30 * the result must be discarded).
31 * @return new context (if environment is not changed returns {@code env}
32 * argument)
33 */
34 public abstract Environment eval(Environment env, Cell result);
35 }
36
The print statement should be able to reference multiple child nodes. The FieldTermParser currently supports the ArrayList and LinkedList collection types; other collection types might be supported in future versions of the parser. The implementation of PrintStatement is quite simple: it just evaluates all its arguments, prints the results using the System.out.print(o) method, and then prints a new line.
Example 11. PrintStatement.java
1 package calculator.interpreter.ast;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.Cell;
5 import calculator.interpreter.Environment;
6
7 /**
8 * The print statement
9 */
10 public class PrintStatement extends Statement {
11 /** the values to print */
12 public final LinkedList<Expression> values = new LinkedList<Expression>();
13
14 @Override
15 public Environment eval(Environment env, Cell result) {
16 for (Expression e : values) {
17 System.out.print(e.eval(env));
18 }
19 System.out.println();
20 return env;
21 }
22 }
23
The class ExpressionStatement has an even simpler structure, as it contains only a single expression. This expression is simply evaluated by the statement, and the result is stored into the result cell (if the caller has provided a Cell object for it).
Example 12. ExpressionStatement.java
1 package calculator.interpreter.ast;
2
3 import calculator.interpreter.Cell;
4 import calculator.interpreter.Environment;
5
6 /**
7 * The expression statement. The result of the statement execution is the value
8 * of the wrapped expression.
9 */
10 public class ExpressionStatement extends Statement {
11 /** the expression to be interpreted */
12 public Expression value;
13
14 @Override
15 public Environment eval(Environment env, Cell result) {
16 Object rc = value.eval(env);
17 if (result != null) {
18 result.value = rc;
19 }
20 return env;
21 }
22 }
23
Now that we have defined all AST classes for the basic grammar, we need to read an actual source statement by statement and evaluate the statements.
ETL grammars are created in a way that maps each segment to at most one object, so any source code is mapped to a sequence of objects. The AST parser works in a model that allows reading such a sequence of objects, with objects and properties mapped according to the conventions of the particular AST parser kind.
So our interpreter reads these statements one by one and executes them.
Example 13. Interpreter.java
1 package calculator.interpreter;
2
3 import java.util.LinkedList;
4 import net.sf.etl.parsers.ErrorInfo;
5 import net.sf.etl.parsers.StandardGrammars;
6 import net.sf.etl.parsers.TermParser;
7 import net.sf.etl.parsers.TermParserFactory;
8 import net.sf.etl.parsers.TermToken;
9 import net.sf.etl.parsers.beans.FieldTermParser;
10 import net.sf.etl.parsers.utils.AbstractTreeParser.PositionPolicy;
11 import org.xml.sax.InputSource;
12 import calculator.interpreter.ast.Node;
13 import calculator.interpreter.ast.Statement;
14
15 /**
16 * The base class for the interpreter.
17 */
18 public class Interpreter {
19 /**
20 * The list that contains errors that are accumulated during the parsing
21 * process.
22 */
23 protected final LinkedList<String> errors = new LinkedList<String>();
24 /**
25 * If true interpreter runs in interactive mode
26 */
27 protected boolean interactive = false;
28
29 /**
30 * The main interpreter loop: it reads a sequence of statements from the AST
31 * parser {@code statements} and executes them one by one. If an error is
32 * encountered, it is printed and the entire top level statement is skipped
33 * (it is assumed the user will enter a corrected version of the statement
34 * after receiving an error about the previous input).
35 *
36 * @param statements the AST parser that returns a sequence of statements
37 * @param env the initial environment
38 * @return the resulting environment of the interpretation
39 * @see #enlistError(TermToken)
40 * @see #printErrors()
41 */
42 protected Environment interpret(FieldTermParser<Node> statements,
43 Environment env) {
44 final Cell lastResult = new Cell();
45 while (statements.hasNext()) {
46 Node current = statements.next();
47 if (printErrors()) {
48 printPrompt();
49 continue;
50 }
51 if (current instanceof Statement) {
52 Statement s = (Statement) current;
53 lastResult.value = null;
54 try {
55 env = s.eval(env, lastResult);
56 processResult(lastResult);
57 } catch (EvalException ex) {
58 System.err.println("EVAL ERROR: " + ex.getMessage());
59 } catch (Exception ex) {
60 System.err.println("Executing the statement of the type "
61 + s.getClass().getName() + " at " + s.location.toShortString()
62 + " resulted in the exception.");
63 ex.printStackTrace();
64 } finally {
65 printPrompt();
66 }
67 } else {
68 throw new IllegalArgumentException(
69 "The input is not a valid calculator input file: "
70 + current.getClass().getName());
71 }
72 }
73 printErrors();
74 return env;
75 }
76
77 /**
78 * Process the last result
79 *
80 * @param lastResult the last result holder
81 */
82 protected void processResult(Cell lastResult) {
83 if (interactive) {
84 System.out.println("Result: " + lastResult.value);
85 }
86 }
87
88 /**
89 * Print prompt on the console
90 */
91 public void printPrompt() {
92 if (interactive) {
93 System.out.print("eval> ");
94 System.out.flush();
95 }
96 }
97
98 /**
99 * Print the errors encountered during parsing
100 *
101 * (the errors are taken from the {@code errors} field)
102 * @return true if some errors were detected
103 */
104 protected boolean printErrors() {
105 if (!errors.isEmpty()) {
106 System.err.println("Errors in the input, the statement is skipped.");
107 for (String e : errors) {
108 System.err.println(e);
109 }
110 errors.clear();
111 return true;
112 }
113 return false;
114 }
115
116 /**
117 * Enlist error that happens during parsing an object or when checking if the
118 * next object exists. The enlisted errors are later printed using
119 * {@link #printErrors()} method in the interpreter loop. This method is
120 * called from the method
121 * {@link FieldTermParser#handleErrorFromParser(TermToken)}.
122 *
123 * @param errorToken the error token
124 */
125 protected void enlistError(TermToken errorToken) {
126 ErrorInfo ei = errorToken.errorInfo();
127 String text = errorToken.kind().name() + " "
128 + ei.location().toShortString() + ": " + ei.message();
129 errors.add(text);
130 }
131
132 /**
133 * Create a tree parser from the event parser. This parser converts calculator text into a
134 * sequence of {@link Statement}-s located at the top level of the grammar.
135 *
136 * @param parser the event parser
137 * @return the AST parser
138 */
139 protected FieldTermParser<Node> createTreeParser(TermParser parser) {
140 FieldTermParser<Node> rc = new FieldTermParser<Node>(parser,
141 Interpreter.class.getClassLoader()) {
142 @Override
143 protected void handleErrorFromParser(TermToken errorToken) {
144 enlistError(errorToken);
145 }
146 };
147 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator",
148 "calculator.interpreter.ast");
149 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/arith",
150 "calculator.interpreter.ast.arith");
151 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/logic",
152 "calculator.interpreter.ast.logic");
153 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/vars",
154 "calculator.interpreter.ast.vars");
155 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/meta",
156 "calculator.interpreter.ast.meta");
157 rc.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/lambda",
158 "calculator.interpreter.ast.lambda");
159 rc.ignoreNamespace(StandardGrammars.DOCTYPE_NS);
160 rc.setPosPolicy(PositionPolicy.SOURCE_LOCATION);
161 rc.setAbortOnDefaultGrammar(true);
162 return rc;
163 }
164
165 /**
166 * Interpret the single source (file or stdin)
167 *
168 * @param factory the preconfigured factory for the term parser
169 * @param source the source to interpret
170 * @param env the initial environment to interpret
171 * @return the resulting environment of the source execution
172 */
173 public Environment interpretSource(TermParserFactory factory,
174 InputSource source, Environment env) {
175 TermParser parser = factory.newTermParser();
176 parser.parse(source);
177 try {
178 parser.advance();
179 FieldTermParser<Node> code = createTreeParser(parser);
180 return interpret(code, env);
181 } finally {
182 parser.close();
183 }
184 }
185 }
186
The interpreter entry point is the interpretSource(...) method. The factory parameter is used to create a parser instance. Like XML parser factory classes, the factory allows setting parameters that affect the parsing process. Usually a single factory is used to create parsers for the same type of source. The inputSource parameter allows specifying what should be parsed in a flexible way. The last parameter, env, is not used for now.
The interpreter loop is implemented in the interpret(...) method. The field term parser is used as a kind of iterator: statements are read from it and executed one by one. If a statement throws an exception, the exception is printed and execution continues. The exception EvalException is treated specially: no stack trace is printed for it. It is thrown when the error is due to invalid user input and it is enough to show a message to the user. If execution was successful, the result of the executed statement is passed to the processResult(...) method. If the variable printExecutionResults is set to true, the execution results are printed out.
The input error handling model is very simple and was created to support the interactive mode; it relies on the built-in error recovery support that we discussed earlier. For other usages (like code generation) it might be preferable to read the source entirely and not to process it further at all if there are any syntax errors.
The field errors contains a list of errors that were accumulated during the parsing process. The errors are collected by the method enlistError(...), which just formats the error text and adds it to the error list. Note that if a syntax error is encountered, the tree might be constructed incompletely, so some properties that are deterministically set by the grammar might contain null. Therefore, if a statement contains errors (checked by the method printErrors()), it is skipped and the errors are printed. However, you still need to enter a semicolon to terminate the current segment in the input and press newline; the prompt for the next statement will not appear otherwise.
The method createTreeParser(...) creates a field term parser from the event term parser. Our model objects for the interpreter are already created according to the convention required by the class FieldTermParser, so very little effort is needed to customize the parser.
There are a few things to note about this parser. The created parser extends FieldTermParser and uses Node as the base class for all AST nodes. The first constructor argument is an event term parser instance, as in the base class. The second argument is a classloader that must be able to load all AST classes; naturally, the classloader of the class Interpreter works just fine. In more complex scenarios (for example, if different parts of the AST are loaded by different classloaders), class loading can get really complicated and you might want to override the getObjectClass(...) method. The overridden method handleErrorFromParser is invoked if the event parser reports an error during construction of the objects (this method just delegates to enlistError(...)). Note that up to this point, the functionality of the parser has been customized in a completely generic way, without implementing any new classes.
The code that follows tunes the parser to the needs of the interpreter. The method mapNamespaceToPackage maps a namespace declared in the grammar to the Java package where the AST classes reside. The fully qualified Java class name is produced by appending the class name to the package name. It is also possible to specify a mapping for a particular combination of name and namespace, but since we have chosen Java class names that match the grammar object names, we do not need to do that now. Note that we have used a crystal ball here to define the namespace mapping for all namespaces that we will use later in the tutorial; if in a real project you are short on crystal balls, you could as well use a configuration file or allow plugins to contribute the namespace-to-package mapping.
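The effect of mapNamespaceToPackage can be pictured with a small stand-alone sketch. The NamespaceMapping class below is a hypothetical illustration, not the actual FieldTermParser internals: the fully qualified class name is simply the mapped package plus the grammar object name.

```java
import java.util.HashMap;
import java.util.Map;

public class NamespaceMapping {
    private final Map<String, String> nsToPackage = new HashMap<>();

    public void mapNamespaceToPackage(String namespace, String packageName) {
        nsToPackage.put(namespace, packageName);
    }

    /** Resolve a grammar object to a fully qualified Java class name. */
    public String resolveClassName(String namespace, String objectName) {
        String pkg = nsToPackage.get(namespace);
        if (pkg == null) {
            throw new IllegalStateException("No package mapped for namespace: " + namespace);
        }
        return pkg + "." + objectName;
    }

    public static void main(String[] args) {
        NamespaceMapping m = new NamespaceMapping();
        m.mapNamespaceToPackage("http://etl.sf.net/samples/calculator/arith",
                "calculator.interpreter.ast.arith");
        // the grammar object Plus resolves to calculator.interpreter.ast.arith.Plus
        System.out.println(m.resolveClassName(
                "http://etl.sf.net/samples/calculator/arith", "Plus"));
    }
}
```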
The method ignoreNamespace(...) tells the parser that we are not interested in receiving objects from the specified namespace. The document type instruction is represented by its own object, and we do not want to provide a mapping for it.
The method setPosPolicy(PositionPolicy.SOURCE_LOCATION) specifies the policy for mapping positional information to the AST (if no predefined policy suits your AST, you could write a custom one).
The method setAbortOnDefaultGrammar(true) requires a little explanation. As discussed in the section Error Recovery, the ETL TermParser is always able to parse the entire source (unless there is an IO error), since it must also support text editing scenarios, which are bound to produce incorrect text from time to time. So if there is an error in the document type declaration, or there is a problem with loading or compiling the referenced grammar, a default grammar that can parse any ETL source is used instead. This grammar creates objects in the namespace defined by the constant StandardGrammars.DEFAULT_NS. In most cases, AST parsers are not interested in objects from this namespace. We could ignore them, as we have done for the document type namespace, but then the ETL parser would read the entire source in the method hasNext(...). This method call provides a better alternative: it specifies that if an object from the default namespace is encountered, it should be treated as EOF, so no further parsing is performed.
With the Interpreter class implemented, we have almost all the pieces in place. The class InterpreterBasic implements the rest of the mundane work of creating the interpreter. The only piece specific to ETL is the creation of the TermParserFactory in the method start(...). A default grammar is set on the factory; this grammar is used when the first segment of the source does not start with the token doctype. For example, if standard input is used as the source and you type expressions right away, the default grammar will be used. We allow customizing the default grammar in a limited way using the -Dcalc.lang={language level} virtual machine argument. The default grammar is then the grammar /calculator/calculator-{language level}-0_2_1.g.etl located somewhere on the application classpath. If the property is not specified, the basic grammar is used.
There is also the method initialEnvironment(...); currently it is just a placeholder for future extensions.
Now it is possible to run our extremely basic calculator. It accepts as an application argument the name of a file that contains a set of print statements. It can also be used without arguments, in which case standard input is used.
Example 14. RunnerBasic.java
package calculator.interpreter;

import java.io.File;
import java.net.URL;

import net.sf.etl.parsers.TermParserFactory;

import org.xml.sax.InputSource;

/**
 * The runner for sources for the basic calculator language.
 */
public class RunnerBasic {
    /** The name of the file to execute */
    protected String filename;
    /** The configured term parser factory */
    protected TermParserFactory factory;
    /** The interpreter */
    protected Interpreter interpreter;

    /**
     * Prints usage information and exits
     */
    public void usage() {
        System.err.println("Calculator Interpreter (basic version). "
                + "(c) Constantine Plotnikov, 2008");
        System.err.println("Usage: java " + getClass().getName() + " [<file>]");
        System.err.println("  <file> - the file to be interpreted. "
                + "If absent, stdin will be used.");
        System.exit(1);
    }

    /**
     * Interpreter entry point. The method parses arguments and starts the
     * interpretation process for the source based on the arguments.
     *
     * @param args application arguments
     */
    public void start(String[] args) {
        interpreter = createInterpreter();
        parseArgs(args);
        createFactory();
        try {
            interpreter.interpretSource(factory, getInput(filename),
                    initialEnvironment());
        } catch (Exception e) {
            System.err
                    .println("Problem with running parser on interpreting the code.");
            e.printStackTrace();
            System.exit(1);
        }
    }

    /**
     * Create factory instance
     */
    protected void createFactory() {
        factory = TermParserFactory.newInstance();
        factory.setDefaultGrammar(getDefaultGrammarPublicId(), getDefaultGrammar(),
                null);
    }

    /**
     * @return public identifier for the default grammar
     */
    protected String getDefaultGrammarPublicId() {
        return null;
    }

    /**
     * @return a new interpreter instance
     */
    public Interpreter createInterpreter() {
        return new Interpreter();
    }

    /**
     * @return the initial environment for the interpreter
     */
    protected Environment initialEnvironment() {
        return new Environment();
    }

    /**
     * Parse arguments and initialize interpreter state
     *
     * @param args arguments to parse
     */
    protected void parseArgs(String[] args) {
        if (args.length > 1) {
            usage();
        }
        filename = args.length == 1 ? args[0] : null;
        if (filename == null) {
            interpreter.interactive = true;
            System.out.println("Running interpreter in interactive mode.");
            interpreter.printPrompt();
        }
    }

    /**
     * @return the default grammar to use in case the grammar is not specified
     */
    protected String getDefaultGrammar() {
        String lang = getLanguageLevel();
        URL grammar = getClass().getResource(
                "/calculator/grammars/calculator-" + lang + "-0_2_1.g.etl");
        return grammar == null ? null : grammar.toString();
    }

    /**
     * @return the current language level
     */
    protected String getLanguageLevel() {
        return System.getProperty("calc.lang", getDefaultLanguageLevel());
    }

    /**
     * @return the default language level
     */
    protected String getDefaultLanguageLevel() {
        return "basic";
    }

    /**
     * Create an input source based on whether the input file is present.
     *
     * @param filename the filename or null if stdin should be used
     * @return the created input source
     * @throws Exception if there is a problem with creating an input source
     */
    protected static InputSource getInput(String filename) throws Exception {
        InputSource source;
        if (filename == null) {
            source = new InputSource(System.in);
            source.setSystemId("<stdin>");
        } else {
            source = new InputSource(new File(filename).toURI().toURL().toString());
        }
        return source;
    }

    /**
     * Application entry point.
     *
     * @param args application arguments
     */
    public static void main(String[] args) {
        new RunnerBasic().start(args);
    }
}
The calculator is currently quite dumb: it is only able to print the numbers given to it. But it can already understand numbers in bases up to 36. Let's add arithmetic to it.
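The samples use ETL's based number literals of the form base#digits#, where the digits may contain _ separators (fractional based literals like 8#0.1# also exist but are not covered here). As a rough illustration of how the integer form decodes, here is a small stand-alone sketch; the decode helper is hypothetical and not part of the ETL API.

```java
public class BasedNumbers {
    /** Decode an ETL-style based integer literal such as "16#10000#". */
    static long decode(String literal) {
        int firstHash = literal.indexOf('#');
        int lastHash = literal.lastIndexOf('#');
        if (firstHash < 0 || lastHash <= firstHash) {
            throw new IllegalArgumentException("Not a based literal: " + literal);
        }
        // the part before the first '#' is the base, 2..36
        int radix = Integer.parseInt(literal.substring(0, firstHash));
        if (radix < 2 || radix > 36) {
            throw new IllegalArgumentException("Base must be in 2..36: " + radix);
        }
        // underscores are digit separators and carry no value
        String digits = literal.substring(firstHash + 1, lastHash).replace("_", "");
        return Long.parseLong(digits, radix);
    }

    public static void main(String[] args) {
        System.out.println(decode("16#10000#")); // 65536
        System.out.println(decode("3#1_0#"));    // 3
    }
}
```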
Instead of simply modifying the file calculator-basic-0_2_1.g.etl, we will take a more complex route: we will create a new grammar that extends our basic grammar (it will be placed in the file calculator-arith-0_2_1.g.etl).
Example 15. calculator-arith-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorArith {
3: include "calculator-basic-0_2_1.g.etl";
4: namespace t = "http://etl.sf.net/samples/calculator";
5: namespace default a = "http://etl.sf.net/samples/calculator/arith";
6: context default Expressions {
7: op composite Identity(f) {
8: ^t:Identity {
9: % (;
10: @ value = expression;
11: % );
12: };
13: };
14: op UnaryMinus(fy, 200, -) {
15: @ value = right;
16: };
17: op UnaryPlus(fy, 200, +) {
18: @ value = right;
19: };
20: op Power(xfy, 300, **) {
21: @ first = left;
22: @ second = right;
23: };
24: op Multiply(yfx, 400, *) {
25: @ first = left;
26: @ second = right;
27: };
28: op Divide(yfx, 400, /) {
29: @ first = left;
30: @ second = right;
31: };
32: op Remainder(yfx, 400, %) {
33: @ first = left;
34: @ second = right;
35: };
36: op Plus(yfx, 500, +) {
37: @ first = left;
38: @ second = right;
39: };
40: op Minus(yfx, 500, -) {
41: @ first = left;
42: @ second = right;
43: };
44: };
45: };
The overall grammar structure is similar to what we defined in the section Basic Grammar. There are two namespace declarations, and the default namespace of this grammar differs from that of the basic grammar.
Another new element here is the include statement. The statement specifies that all contexts and definitions from the specified grammar should be included in this grammar. If the including grammar already has a definition with the same name, the definition from the included grammar is not added.
Note that contexts also support include operations, and all grammar include statements are executed before context include statements. We will give an example of this in the section Introducing Variables.
It is also possible to include multiple grammars. If two included grammars contain definitions with the same name, this results in a conflict (but if it is the same definition reachable through multiple include paths, there is no conflict). Such a conflict can be resolved by creating a definition with the same name in the including grammar; that definition might integrate features of the definitions from both parent grammars.
Thanks to the include statement, we do not have to copy definitions from the basic grammar.
There are no built-in expression parentheses in ETL, since programming language designers use parentheses in many creative ways. They have to be defined explicitly using a primary operator, but this is quite simple. The Identity operator does just this. It is a primary operator, just like the operators NumberLiteral and StringValue. Since we want to define this operator in a non-default namespace, we have to specify the root object explicitly.
This grammar provides the prefix operators + (unary plus) and - (unary minus), and the infix operators ** (power), * (multiply), / (divide), % (remainder), + (plus), and - (minus). These operators have the usual precedence levels, so a + b ** -c * d is parsed as a + ((b ** (-c)) * d). Now let's look at the details of the operator definition. The parentheses right after the operator name contain the operator specification.
[4]
There may be from one to three elements in the operator specification. The second element is the precedence. The precedence of an expression is the precedence of its outermost operator. The precedence of primary operators is zero by default, so it is not specified explicitly in our examples.
The first element of the operator specification is the associativity specification, given as a pattern. The letter f stands for the operator itself. The letter x means an expression of precedence strictly less than the precedence of the operator (so the operator is non-associative on that side). The letter y means an expression of precedence less than or equal to the precedence of the operator (so the operator is associative on that side). If there is no letter on the left or right of f in the pattern, the operator has no left or right parameter. So the associativity f means a primary operator with no arguments at all; fy means a prefix right-associative operator with one argument on the right (like unary minus); and yfx means an infix left-associative operator (like plus and minus).
This operator notation is quite powerful and allows defining quite complex expressions. But what makes it more interesting is that it is extensible: it is easy to add a new operator at an existing or new precedence level when extending a grammar, and we do so a number of times in our samples.
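To make the f/x/y patterns concrete, here is a stand-alone precedence-climbing sketch for the operators above (this is an illustration, not the ETL parser's actual implementation). The y side admits operands of equal precedence, the x side only strictly smaller ones; the class and its string-building output are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** A tiny Prolog-style operator-precedence parser illustrating fy/xfy/yfx. */
public class OpPrecedence {
    private final List<String> tokens = new ArrayList<>();
    private int pos;

    OpPrecedence(String src) {
        // crude tokenizer: identifiers, "**", and single-character operators
        for (int i = 0; i < src.length(); ) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; }
            else if (Character.isLetterOrDigit(c)) {
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                tokens.add(src.substring(i, j)); i = j;
            } else if (c == '*' && i + 1 < src.length() && src.charAt(i + 1) == '*') {
                tokens.add("**"); i += 2;
            } else { tokens.add(String.valueOf(c)); i++; }
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    /** Parse an expression whose outermost operator precedence is at most max. */
    String parse(int max) {
        String left;
        String t = peek();
        if (("+".equals(t) || "-".equals(t)) && 200 <= max) { // unary ops: fy, 200
            pos++;
            left = "(" + t + parse(200) + ")";                // y: right side <= 200
        } else {
            left = tokens.get(pos++);                         // primary, precedence 0
        }
        while (true) {
            String op = peek();
            int prec = "**".equals(op) ? 300
                     : "*".equals(op) || "/".equals(op) || "%".equals(op) ? 400
                     : "+".equals(op) || "-".equals(op) ? 500 : -1;
            if (prec < 0 || prec > max) break;
            pos++;
            // xfy (**): right side <= prec; yfx (the rest): right side < prec
            String right = parse("**".equals(op) ? prec : prec - 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    public static void main(String[] args) {
        // prints (a + ((b ** (-c)) * d)), matching the parse described above
        System.out.println(new OpPrecedence("a + b ** -c * d").parse(Integer.MAX_VALUE));
    }
}
```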
The third element is available only for non-composite operators. It specifies the operator text; only one token is allowed here. Composite operators can have operator text consisting of any valid syntax expression, but that text is contained in the body of the operator. In the later sections of the tutorial we will define quite a few weird operators.
The body of a simple operator definition contains only mappings for the left and right arguments. The syntax expressions right and left denote the right and left arguments of the operator, respectively. The argument mappings may appear in any order (for composite operators they can be freely interleaved with the syntax of the operator), but they must appear at the top level of the block if no root object is used, or directly under the top object if one is used. There must be exactly one mapping for each argument the operator can have.
In our sample grammars, most unary operators map their argument to the property value, and infix operators use the properties first and second. However, in real grammars it might make sense to use descriptive names instead of referring to operand positions.
The file sample-arith.c.etl demonstrates what is possible with the newly defined constructs:
Example 16. sample-arith.c.etl
1: doctype "calculator-arith-0_2_1.g.etl";
2: print "zeroes: ", 2*3#10#-3 * 2 ** 3 / 4, " ", +1 + -1;
3: print "dozens: ", 12#100# ** (1/2), " ", 3#1_0# * 16#10000# ** +8#0.1#;
You could also check the sample AST.
Since the AST classes for our new primary, infix, and prefix operators share common properties, we will use base classes with these properties already implemented.
Example 17. InfixOp.java
package calculator.interpreter.ast;

/**
 * Base class for infix operators
 */
public abstract class InfixOp extends Expression {
    /** The first (left) operand of the operator */
    public Expression first;
    /** The second (right) operand of the operator */
    public Expression second;
}
Example 18. UnaryOp.java
package calculator.interpreter.ast;

/**
 * The unary operation
 */
public abstract class UnaryOp extends Expression {
    /** The operand of the unary operation */
    public Expression value;
}
The implementation of the identity operator is trivial.
Example 19. Identity.java
1 package calculator.interpreter.ast;
2
3 import calculator.interpreter.Environment;
4
5 /**
6 * The identity operator that just returns what is enclosed.
7 */
8 public class Identity extends UnaryOp {
9
10 @Override
11 public Object eval(Environment env) {
12 return value.eval(env);
13 }
14 }
15
Because many operations work only with numeric values, we can use common code that checks the arguments. The class EvalUtils contains utility members for working with the different value types. Its primary function is to throw an exception in the right form when an unexpected value is encountered. Note that a real language should allow extending the operators in some flexible way, but type classes, multimethods, and similar mechanisms are not the subject of this tutorial.
Example 20. EvalUtils.java
package calculator.interpreter;

import net.sf.etl.parsers.SourceLocation;

import calculator.interpreter.ast.Expression;
import calculator.interpreter.ast.Node;

/**
 * Evaluation utilities
 */
public class EvalUtils {
    /**
     * Evaluate expression to double
     *
     * @param owner the owner
     * @param env   the evaluation environment
     * @param ex    the expression to evaluate
     * @return the double value
     */
    public static double toDouble(Node owner, Environment env, Expression ex) {
        Object o = ex.eval(env);
        if (o instanceof Double) {
            return ((Double) o).doubleValue();
        } else {
            throw new EvalException(ex, "The node type "
                    + owner.getClass().getSimpleName() + " expects expression at "
                    + ex.location.toShortString()
                    + " to be of type double, and it evaluated to " + o
                    + (o != null ? " : " + o.getClass().getName() : ""));
        }
    }

    /**
     * Evaluate expression to boolean
     *
     * @param owner the owner
     * @param env   the evaluation environment
     * @param ex    the expression to evaluate
     * @return the boolean value
     */
    public static boolean toBoolean(Node owner, Environment env, Expression ex) {
        Object o = ex.eval(env);
        if (o instanceof Boolean) {
            return ((Boolean) o).booleanValue();
        } else {
            throw new EvalException(ex, "The node type "
                    + owner.getClass().getSimpleName() + " expects expression at "
                    + ex.location.toShortString()
                    + " to be of type boolean, and it evaluated to " + o
                    + (o != null ? " : " + o.getClass().getName() : ""));
        }
    }

    /**
     * Return the value cast to string or throw {@link EvalException}.
     *
     * @param location the location of the expression
     * @param role     the role of the expression
     * @param value    the value to cast
     * @return the cast value
     */
    public static String expectString(SourceLocation location, String role,
            Object value) {
        if (value instanceof String) {
            return (String) value;
        } else {
            throw new EvalException(location, "Expecting " + role + " and found "
                    + value + " : " + (value == null ? null : value.getClass()));
        }
    }
}
With this utility, implementing operators that require only float arguments is really simple.
Example 21. UnaryMinus.java
1 package calculator.interpreter.ast.arith;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.EvalUtils;
5 import calculator.interpreter.ast.UnaryOp;
6
7 /**
8 * Unary minus
9 */
10 public class UnaryMinus extends UnaryOp {
11
12 @Override
13 public Object eval(Environment env) {
14 return -EvalUtils.toDouble(this, env, value);
15 }
16 }
17
Example 22. Power.java
1 package calculator.interpreter.ast.arith;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.EvalUtils;
5 import calculator.interpreter.ast.InfixOp;
6
7 /**
8 * Power operator
9 */
10 public class Power extends InfixOp {
11
12 @Override
13 public Object eval(Environment env) {
14 double f = EvalUtils.toDouble(this, env, first);
15 double s = EvalUtils.toDouble(this, env, second);
16 return Math.pow(f, s);
17 }
18 }
19
The Divide, Remainder, Minus, and UnaryPlus operators are implemented in a similar way.
The Plus operator is a bit more complex, since we would like to support it for string concatenation as well.
Example 23. Plus.java
1 package calculator.interpreter.ast.arith;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.EvalException;
5 import calculator.interpreter.EvalUtils;
6 import calculator.interpreter.ast.InfixOp;
7
8 /**
9 * Plus operator
10 */
11 public class Plus extends InfixOp {
12 @Override
13 public Object eval(Environment env) {
14 Object f = first.eval(env);
15 if (f instanceof String) {
16 return ((String) f) + second.eval(env);
17 } else if (f instanceof Double) {
18 double s = EvalUtils.toDouble(this, env, second);
19 return ((Double) f) + s;
20 } else {
21 throw new EvalException(first,
22 "Unsupported type of operand for plus operator: " + f);
23 }
24 }
25 }
26
It is also possible to implement the multiply operator for both strings and numbers.
Example 24. Multiply.java
package calculator.interpreter.ast.arith;

import calculator.interpreter.Environment;
import calculator.interpreter.EvalException;
import calculator.interpreter.EvalUtils;
import calculator.interpreter.ast.InfixOp;

/**
 * Multiply operation
 */
public class Multiply extends InfixOp {
    @Override
    public Object eval(Environment env) {
        Object f = first.eval(env);
        if (f instanceof String) {
            String str = (String) f;
            double s = EvalUtils.toDouble(this, env, second);
            if (s > 1024 || s < 0) {
                throw new EvalException(second,
                        "The string could be multiplied by non-negative "
                                + "numbers smaller than 1024 and the second "
                                + "argument evaluated to " + s);
            }
            int n = (int) s;
            StringBuilder rc = new StringBuilder(str.length() * n);
            for (int i = 0; i < n; i++) {
                rc.append(str);
            }
            return rc.toString();
        } else if (f instanceof Double) {
            double s = EvalUtils.toDouble(this, env, second);
            return ((Double) f) * s;
        } else {
            throw new EvalException(first,
                    "Unsupported type of operand for multiply operator: " + f);
        }
    }
}
No particular effort is required to run the calculator with the extensions we have just provided. If the source contains the doctype declaration, the correct grammar is picked automatically. The default grammar (for example, for the interactive scenario) can be changed by setting the system property calc.lang to the value "arith".
In its current form, values, once calculated, cannot be reused later. In this section we will add variables to the language. This section also describes some definition-related constructs of the grammar definition language.
Example 25. calculator-vars-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorVars {
3: include "calculator-arith-0_2_1.g.etl";
4: namespace t = "http://etl.sf.net/samples/calculator";
5: namespace m = "http://etl.sf.net/samples/calculator/meta";
6: namespace default v = "http://etl.sf.net/samples/calculator/vars";
7: context abstract MetaInformation {
8: def ExpressionDef {
9: expression(Expressions);
10: };
11: def NameDef {
12: ^ v:Name {
13: {
14: @ literal = identifier;
15: } | {
16: @ quoted = string(prefix=Q|q, quote="'");
17: };
18: };
19: };
20: documentation Documentation {
21: @ documentation += doclines wrapper m:DocumentationLine.text;
22: };
23: attributes Annotations {
24: @ annotations += % @ {
25: ^ m:Annotation {
26: @ name = ref(NameDef);
27: % ( {
28: @ arguments += list , {ref(ExpressionDef);}?;
29: } % ) ?;
30: };
31: } *;
32: };
33: };
34: context default Expressions {
35: include MetaInformation;
36: def ExpressionDef {
37: expression;
38: };
39: statement VarStatement {
40: @ type = token(var) | token(let);
41: @ name = ref(NameDef);
42: % = {
43: @ value = expression;
44: }?;
45: };
46: op composite Name(f) {
47: ref(NameDef);
48: };
49: def SequenceDef {
50: ^ t:Sequence {
51: @ statements += block;
52: };
53: };
54: op composite Sequence(f) {
55: ref(SequenceDef);
56: };
57: op Assignment(xfy, 2000, =) {
58: @ first = left;
59: @ second = right;
60: };
61: statement Help {
62: % help;
63: @ variable = ref(NameDef);
64: };
65: statement IncludeStatement {
66: ^ t:IncludeStatement {
67: % include;
68: @ systemId = string(quote='"');
69: };
70: };
71: };
72: };
In this grammar we have two contexts instead of one. In addition to the context Expressions, there is the context MetaInformation. The context MetaInformation is abstract: it is purely a holder for definitions and cannot be referenced in any way except by include directives. This context is included into the context Expressions. The context include operation is executed after all grammar include operations, and context include directives are themselves inherited through grammar includes. The meaning of the context include operation is similar to the grammar include operation: all definitions from the included context are copied into the including context unless redefined.
This grammar demonstrates the use of a new definition type. The keyword def starts a fragment definition. A reference to a fragment is literally substituted by the content of the corresponding fragment definition. Fragment definitions are used in two roles: as reusable fragments of syntax and as extension points for the context. A fragment can be referenced from other syntax expressions by name using the operator ref({fragment name}). Fragments can reference each other, but since their semantics is literal substitution, they cannot do so recursively.
The important reusable fragment in this grammar is NameDef. It represents the name of a variable and is used in the different places where a variable name can occur. The expression identifier within the property literal matches any identifier. However, there is also an alternative mapping that uses the property quoted, defined as a prefixed string with the prefixes q or Q (a prefixed string is an identifier immediately followed by a string; in our case only single-quoted strings with the prefixes q and Q are recognized as names, like Q'a' or q'let'). As a result, the NameDef fragment matches either an identifier or a single-quoted string prefixed with Q or q. The same object v:Name is created in both cases, but different properties of the object are used.
This is done to solve an important problem with extensibility. Since ETL supports evolution of languages, identifiers that were not previously used as keywords may become keywords as the language evolves or as extensions are used. A quoted name makes it possible to refer to a name even in a context where a keyword with the same text is available. By introducing the quoted form of names into our grammar, we will be able to use new constructs together with old names. Note that it is still bad style to use names that are known to match some keyword, and thus the syntax for them can be somewhat less convenient than for normal names.
The quoted form of names also solves another problem: different languages support different syntaxes for identifiers. For example, Java allows "$" in identifiers, but ETL does not. Quoted names solve this integration problem by allowing such names to be specified as well.
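To illustrate the two name forms, here is a tiny stand-alone sketch of extracting the name text. The nameText helper is hypothetical; in the tutorial the two forms actually land in the separate properties literal and quoted of the Name object.

```java
public class Names {
    /**
     * Extract the name text from either form accepted by NameDef:
     * a plain identifier, or a quoted name such as q'let' or Q'a'.
     */
    static String nameText(String token) {
        if ((token.startsWith("q'") || token.startsWith("Q'")) && token.endsWith("'")) {
            // strip the one-letter prefix and the surrounding single quotes
            return token.substring(2, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(nameText("counter")); // counter
        System.out.println(nameText("q'let'"));  // let
    }
}
```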
In our sample we have the definition ExpressionDef, which defines a syntax fragment that serves as an extension point in this context. This fragment is later used by the ref(ExpressionDef) expression. The definition is redefined in the context Expressions to refer to "this" context instead of referring to the context by name.
The documentation definition is also a new element in this grammar. It specifies how documentation comments are mapped. ETL supports C#-style documentation comments, which are a sequence of line comments starting with ///. The documentation definition must contain a single multivalue property assignment statement with the doclines expression on the right-hand side. The doclines expression matches all documentation comments that come before a statement. By default, each comment is mapped to a single value token, but in most cases the tokens are wanted in object form in order to carry positional information. The text wrapper m:DocumentationLine.text after the token doclines specifies that each documentation line should be wrapped into the object DocumentationLine, with the value mapped to the property text. The definition applies to all statements in the context; however, in this tutorial we simply ignore documentation for all statements except variable definitions.
The attributes definition also applies to all statements in the context. It is executed as if its text were specified at the beginning of each statement in the context, inside the explicit or implicit top-level object of the statement. The primary use of this definition kind is to define a syntax for structured meta-information (like .NET attributes or Java annotations). This particular definition allows Java-style annotations. Future versions of ETL may provide mechanisms for disabling attributes for some statements.
The VarStatement defines a variable. A variable has a name and an optional value, and it is either of the multiple-assignment kind (the "var" form) or the single-assignment kind (the "let" form). Note the use of the token(...) syntax expression: it matches and yields the token specified in the parentheses. In this case either of the values "let" and "var" is assigned to the property type. The value expression is optional (note that the syntax operator ? applies to both the block and =), so the value can be assigned later.
The primary operator Name is an expression which evaluates to the value assigned to the variable. Note that here we used NameDef directly, to simplify the object model and to demonstrate the usage of definitions as top level objects. A possible alternative would have been to wrap it into an additional object, to make a better distinction between the name concept and this specific usage of the name.
The infix operator Assignment is a bit tricky to define. On the left hand side we want to allow only names for now. On the right hand side we want to allow any expression, including assignment. However, we want to allow assignment only in top level expressions. Because of the last constraint we give the assignment the high precedence level 2000. But this implies that the left hand side could be any expression except assignment as well. We will need to check this during evaluation or semantic analysis. Also note that the assignment operator is right-associative, as it is in C or Java.
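This right-associativity matches Java itself, where a = b = 5 parses as a = (b = 5) and an assignment expression yields the assigned value. A small standalone illustration (not part of the calculator sources):

```java
// Demonstrates right-associativity of assignment in Java: the chained
// form a = b = 5 is evaluated as a = (b = 5).
class AssocDemo {
    static int chained() {
        int a, b;
        a = b = 5;      // b is assigned first, then its value flows into a
        return a + b;   // both variables are 5
    }
}
```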
The operator Sequence allows using blocks in expressions. The value of a sequence expression is the value of the last statement (or null if there are no statements). So we can write the number 12 in a complex way like {let a = 2; a*a;} * 3;. The block defines a new lexical scope in our language. So variables defined in it are not available outside of the block, while variables from the outer scope are available inside the block. Like the expression syntax expression that we have used in the fragment ExpressionDef, the block syntax expression takes an optional argument in the form of a context name. If a context name is not explicitly specified for the block syntax expression, the block takes statements from the current context.
The statement Help is supposed to print interesting information about a variable (like its documentation and annotations). The statement IncludeStatement is supposed to include the referenced source. We will use it in order not to duplicate definitions from the vars-level prelude in the later level preludes. But it could also be used to organize sources into multiple modules.
Note that from a formal point of view we have introduced several ambiguous constructs in this grammar, but they are resolved automatically by ETL.
The important point about ETL is that ETL grammars are LL(1) from the point of view of source verification (from the point of view of AST building, things are somewhat more complicated, since the ETL parser does some trickery to support operators correctly). This restricts what the ETL parser can do to resolve conflicts.
The first ambiguity is between the general expression statement and statements like let, var, help, and print. All these statements might start with an identifier now. However, the expression statement starts with an unspecified identifier, while the statement VarStatement starts with either the let or the var identifier. In ETL the most specific alternative is selected. If there are two alternatives, one with a generic identifier and one that exactly matches the keyword, the more specific one will be chosen. However, if we had introduced two statements that start with a generic identifier, the grammar compiler would have detected a conflict and the grammar would not have been compiled.
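The "most specific alternative wins" rule can be pictured with a simplified, hypothetical dispatcher (the statement names mirror this tutorial's grammar; the dispatch logic itself is only an illustration, not how the ETL parser is implemented):

```java
// Hypothetical sketch: exact keyword matches are preferred over the
// generic-identifier alternative (the expression statement).
class StatementDispatch {
    static String classify(String firstToken) {
        switch (firstToken) {
            case "let":
            case "var":   return "VarStatement";        // exact keyword match
            case "help":  return "Help";                // exact keyword match
            default:      return "ExpressionStatement"; // generic identifier
        }
    }
}
```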
Considering the unpredictability of the language evolution process (particularly if users are allowed to innovate too), it is better to provide ways in the defined grammars to resolve conflicts between keyword tokens and tokens used as user-supplied names. The quoted form of the name provided in our grammar is one of the possible ways.
The second ambiguous construct is Annotation. After the annotation name there are optional parentheses with annotation arguments. Because of them, an expression like @Test (2)-1; could evaluate to different values if understood differently. However, ETL uses a greedy parsing strategy in all elements. So the part @Test (2) will be understood as the annotation with the name Test and the single argument 2. It might be better not to introduce ambiguous constructs into the grammar at all. For example, we could have made the parentheses after the annotation name mandatory rather than optional.
The AST classes are created in the same way as before. But now we finally make use of some elements that we have previously added for future use. The fields documentation and annotations of the Statement class are finally used to store documentation and annotations respectively.
The Environment class (which could have been an empty class in the previous sections of the tutorial) implements the lexical scope for variables. The scope is represented as a linked list of variables. While this implementation is quite inefficient, it is very simple and does the job for the purposes of the tutorial.
Example 26. Environment.java
1 package calculator.interpreter;
2
3 import net.sf.etl.parsers.SourceLocation;
4
5 /**
6 * The evaluation environment
7 */
8 public class Environment {
9 /** the previous environment or null for top one */
10 private final Environment previous;
11 /** the variable */
12 private final Variable var;
13
14 /**
15 * The constructor for the context
16 *
17 * @param previous the previous context (might be null)
18 * @param var the variable
19 */
20 public Environment(Environment previous, Variable var) {
21 super();
22 this.previous = previous;
23 this.var = var;
24 }
25
26 /**
27 * The constructor for initial environment
28 */
29 public Environment() {
30 var = null;
31 previous = null;
32 }
33
34 /**
35 * Find a variable
36 *
37 * @param name the name to lookup
38 * @return the variable or null if it was not found.
39 */
40 public Variable find(String name) {
41 for (Environment env = this; env.previous != null; env = env.previous) {
42 if (env.var.name.equals(name)) {
43 return env.var;
44 }
45 }
46 return null;
47 }
48
49 /**
50 * Find variable or fail with exception
51 *
52 * @param location the location of the node that requested the variable
53 * @param name the name to lookup
54 * @return the found variable
55 * @throws EvalException if variable is not found
56 */
57 public Variable findDefined(SourceLocation location, String name) {
58 Variable rc = find(name);
59 if (rc == null) {
60 throw new EvalException(location, "No variable with " + name
61 + " is defined.");
62 }
63 return rc;
64 }
65
66 /**
67 * Check if the variable is free
68 *
69 * @param location the location of the check
70 * @param name the name to check
71 */
72 public void checkFree(SourceLocation location, String name) {
73 Variable previousDef = find(name);
74 if (previousDef != null) {
75 throw new EvalException(location, "The name '" + name
76 + "' has been already defined at "
77 + previousDef.defineLocation.toShortString());
78 }
79 }
80 }
81
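The shadowing behavior of this linked-list scope can be illustrated with a minimal standalone model (the class and field names here are simplified stand-ins, not the tutorial's actual Environment API):

```java
import java.util.Objects;

// Minimal standalone model of a linked-list scope: each frame holds one
// binding and a link to the enclosing frame.
final class Scope {
    final Scope outer;   // enclosing scope, null for the root
    final String name;   // name bound in this frame
    final Object value;  // bound value

    Scope(Scope outer, String name, Object value) {
        this.outer = outer;
        this.name = name;
        this.value = value;
    }

    // Walk the chain from innermost to outermost; the first match wins,
    // which is exactly how inner definitions shadow outer ones.
    Object find(String wanted) {
        for (Scope s = this; s != null; s = s.outer) {
            if (Objects.equals(s.name, wanted)) {
                return s.value;
            }
        }
        return null; // not defined in any enclosing scope
    }
}
```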
The class
Variable
encapsulates all operations on the variable. For the purposes of
error reporting quite a lot of information is stored here.
Example 27. Variable.java
1 package calculator.interpreter;
2
3 import java.util.List;
4 import calculator.interpreter.ast.Node;
5 import net.sf.etl.parsers.SourceLocation;
6
7 /**
8 * The variable
9 */
10 public class Variable {
11 /** the name of variable to be defined */
12 public final String name;
14 /** the true value indicates that this name is a mutable variable */
15 public final boolean isVar;
16 /** the true value indicates that this name has been bound at least once */
16 public boolean isBound;
17 /** the value of the variable */
18 public Object value;
19 /** the location where variable is defined */
20 public final SourceLocation defineLocation;
21 /** the location where variable is set */
22 public SourceLocation bindLocation;
23 /** the documentation associated with binding (might be null) */
24 public final String documentation;
25 /** the annotation values (might be null) */
26 public final List<AnnotationValue> annotations;
27
28 /**
29 *The constructor for the variable
30 *
31 * @param previous the previous context (might be null)
32 * @param name the name of variable or constant
33 * @param isVar if true, the name could be rebound after it was bound
34 * @param defineLocation the location where the name was defined
35 * @param documentation the documentation associated with the variable
36 * @param annotations the list of annotation values
37 */
38 public Variable(String name, boolean isVar, SourceLocation defineLocation,
39 String documentation, List<AnnotationValue> annotations) {
40 super();
41 this.name = name;
42 this.isVar = isVar;
43 this.defineLocation = defineLocation;
44 this.documentation = documentation;
45 this.annotations = annotations;
46 }
47
48 /**
49 * Get the value bound to the name
50 *
51 * @param node the node that requested the bind operation
52 * @return the current value of the binding
53 */
54 public Object get(Node node) {
55 if (!isBound) {
56 throw new EvalException(node, "The name " + name + " defined at "
57 + defineLocation.toShortString() + " has not been bound yet.");
58 }
59 return value;
60 }
61
62 /**
63 * Bind the variable with the value
64 *
65 * @param location the location of the node that requested the bind operation
66 * @param value the value to bind
67 * @return the value argument
68 */
69 public Object bind(SourceLocation location, Object value) {
70 if (!isBound) {
71 isBound = true;
72 } else if (!isVar) {
73 throw new EvalException(location,
74 "The value has been already bound to constant " + name + " at "
75 + bindLocation);
76 }
77 bindLocation = location;
78 this.value = value;
79 return value;
80 }
81
82 /**
83 * @return the help string for the variable
84 */
85 public String help() {
86 StringBuilder b = new StringBuilder();
87 b.append("------ ");
88 b.append(isVar ? "Variable " : "Constant ").append(name)
89 .append(" ------\n");
90 b.append("Defined at ").append(defineLocation.toShortString()).append('\n');
91 if (isBound) {
92 b.append("Bound at ").append(bindLocation.toShortString()).append(
93 " to value ").append(value).append('\n');
94 }
95 if (documentation == null) {
96 b.append("(No documentation defined)\n");
97 } else {
98 b.append(documentation);
99 }
100 if (annotations == null || annotations.size() == 0) {
101 b.append("(No annotations defined)\n");
102 } else {
103 for (AnnotationValue v : annotations) {
104 b.append(v).append('\n');
105 }
106 }
107 return b.toString();
108 }
109 }
110
The class AnnotationValue represents an evaluated form of the annotation. It also allows launching a processor for the annotation, which might modify the variable state. This class is mostly introduced to demonstrate how annotations could be added to the language.
Example 28. AnnotationValue.java
1 package calculator.interpreter;
2
3 import java.util.List;
4 import net.sf.etl.parsers.SourceLocation;
5
6 /**
7 * The value of annotation
8 */
9 public class AnnotationValue {
10 /** the location for annotation in the source */
11 public final SourceLocation location;
12 /** the annotation name */
13 public final String name;
14 /** the annotation arguments */
15 public final List<Object> arguments;
16
17 /**
18 * The constructor
19 *
20 * @param location the location of the annotation in the source
21 * @param name the annotation name
22 * @param arguments the annotation arguments
23 */
24 public AnnotationValue(SourceLocation location, String name,
25 List<Object> arguments) {
26 super();
27 this.location = location;
28 this.name = name;
29 this.arguments = arguments;
30 }
31
32 /**
33 * Process an annotation
34 *
35 * @param env the processing environment (includes the variable)
36 * @param var the variable to process
37 */
38 @SuppressWarnings("unchecked")
39 public void process(Environment env, Variable var) {
40 String className = "calculator.interpreter.annotations." + name
41 + "Annotation";
42 try {
43 Class<? extends Processor> c = (Class<? extends Processor>) Class
44 .forName(className);
45 Processor p = c.newInstance();
46 p.process(this, env, var);
47 } catch (ClassNotFoundException e) {
48 // ignore the problem
49 } catch (Exception e) {
50 throw new EvalException(location,
51 "Processor instance failed to be created: " + e);
52 }
53 }
54
55 @Override
56 public String toString() {
57 StringBuilder rc = new StringBuilder();
58 rc.append('@').append(name).append('(');
59 boolean isFirst = true;
60 for (Object v : arguments) {
61 if (isFirst) {
62 isFirst = false;
63 } else {
64 rc.append(", ");
65 }
66 rc.append(v);
67 }
68 rc.append(')');
69 return rc.toString();
70 }
71
72 /**
73 * A processor for annotations on the variables
74 */
75 public interface Processor {
76 /**
77 * Process annotation (typically by assigning some value to the variable)
78 *
79 * @param annotation an annotation to process
80 * @param env the environment
81 * @param var the variable to which annotation is attached
82 */
83 void process(AnnotationValue annotation, Environment env, Variable var);
84 }
85 }
86
The class JavaConstantAnnotation is an example annotation that assigns the value of a static field of some Java class to the annotated variable.
[5]
Example 29. JavaReflectionProcessor.java
1 package calculator.interpreter.annotations;
2
3 import java.lang.reflect.Field;
4 import calculator.interpreter.AnnotationValue;
5 import calculator.interpreter.EvalException;
6 import calculator.interpreter.EvalUtils;
7
8 /**
9 * Base class for annotation processors based on reflection
10 */
11 public abstract class JavaReflectionProcessor implements
12 AnnotationValue.Processor {
13
14 /**
15 * Get class for reflection annotation
16 *
17 * @param annotation the annotation value
18 * @param pos position within annotation
19 * @return the loaded class for annotation
20 */
21 protected static Class<?> getClass(AnnotationValue annotation, int pos) {
22 String className = EvalUtils.expectString(annotation.location,
23 "the class name (the argument " + pos + ")", annotation.arguments
24 .get(pos));
25 if ("double".equals(className)) {
26 return double.class;
27 } else if ("float".equals(className)) {
28 return float.class;
29 } else if ("char".equals(className)) {
30 return char.class;
31 } else if ("long".equals(className)) {
32 return long.class;
33 } else if ("int".equals(className)) {
34 return int.class;
35 } else if ("short".equals(className)) {
36 return short.class;
37 } else if ("byte".equals(className)) {
38 return byte.class;
39 }
40 try {
41 return Class.forName(className);
42 } catch (ClassNotFoundException e) {
43 throw new EvalException(annotation.location, "No class found: "
44 + className);
45 }
46 }
47
48 /**
49 * Get field for the class
50 *
51 * @param annotation the annotation
52 * @param c the class
53 * @param pos the position within annotation
54 * @return the field object
55 */
56 protected static Field getField(AnnotationValue annotation, Class<?> c,
57 int pos) {
58 String fieldName = EvalUtils.expectString(annotation.location,
59 "the field name (the argument " + pos + ")", annotation.arguments
60 .get(pos));
61 try {
62 return c.getField(fieldName);
63 } catch (NoSuchFieldException e) {
64 throw new EvalException(annotation.location, "No field " + fieldName
65 + " found in " + c.getName());
66 } catch (Exception ex) {
67 throw new EvalException(annotation.location, "Unable to access the field "
68 + fieldName + " in " + c.getName() + ": " + ex);
69 }
70 }
71 }
72
Example 30. JavaConstantAnnotation.java
1 package calculator.interpreter.annotations;
2
3 import java.lang.reflect.Field;
4 import calculator.interpreter.AnnotationValue;
5 import calculator.interpreter.Environment;
6 import calculator.interpreter.EvalException;
7 import calculator.interpreter.Variable;
8
9 /**
10 * Annotation that binds value to the static constant defined in some java
11 * class. The annotation has the following arguments:
12 * <ol>
13 * <li>Fully qualified Java class name (like "java.lang.Math")</li>
14 * <li>A constant name with class (like "PI")</li>
15 * </ol>
16 */
17 public class JavaConstantAnnotation extends JavaReflectionProcessor {
18 /** {@inheritDoc} */
19 public void process(AnnotationValue annotation, Environment env, Variable var) {
20 if (annotation.arguments.size() != 2) {
21 throw new EvalException(annotation.location,
22 "Wrong number of the arguments for annotation: expected 2 and found "
23 + annotation.arguments);
24 }
25 Class<?> c = getClass(annotation, 0);
26 Field f = getField(annotation, c, 1);
27 try {
28 Object value = f.get(null);
29 var.bind(annotation.location, value);
30 } catch (EvalException ex) {
31 throw ex;
32 } catch (Exception ex) {
33 throw new EvalException(annotation.location, "Unable to access the field "
34 + f.getName() + " in " + c.getName() + ": " + ex);
35 }
36 }
37 }
38
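The core of this annotation is ordinary Java reflection: resolve a class by its fully qualified name and read a public static field from it. A standalone sketch of just that step (ConstantReader is a hypothetical helper name, not part of the tutorial's sources):

```java
import java.lang.reflect.Field;

// Standalone sketch: resolve a class by name and read a public static
// field reflectively, as JavaConstantAnnotation does internally.
class ConstantReader {
    static Object read(String className, String fieldName) {
        try {
            Class<?> c = Class.forName(className);
            Field f = c.getField(fieldName);
            return f.get(null); // null receiver: the field must be static
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(
                "Cannot read " + className + "." + fieldName, e);
        }
    }
}
```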
The class VarStatement defines a variable or constant in the environment. Note that variables and constants could be introduced in unbound form. They cannot be read until they are bound later using the assignment operator. Also, the annotations have a chance to bind the variable to a value before the value of the expression is assigned.
Example 31. VarStatement.java
1 package calculator.interpreter.ast.vars;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.AnnotationValue;
5 import calculator.interpreter.Cell;
6 import calculator.interpreter.Environment;
7 import calculator.interpreter.Variable;
8 import calculator.interpreter.ast.Expression;
9 import calculator.interpreter.ast.Statement;
10 import calculator.interpreter.ast.meta.Annotation;
11 import calculator.interpreter.ast.meta.DocumentationLine;
12
13 /**
14 * The variable definition statement
15 */
16 public class VarStatement extends Statement {
17 /** the type of the statement ("var" or "let") */
18 public String type;
19 /** the definition name */
20 public Name name;
21 /** the statement value (optional) */
22 public Expression value;
23
24 @Override
25 public Environment eval(Environment env, Cell result) {
26 env.checkFree(location, name.name());
27 boolean isVar = "var".equals(type);
28 String docs;
29 if (!this.documentation.isEmpty()) {
30 StringBuilder b = new StringBuilder();
31 for (DocumentationLine l : this.documentation) {
32 b.append(l.text).append('\n');
33 }
34 docs = b.toString();
35 } else {
36 docs = null;
37 }
38 LinkedList<AnnotationValue> evaluatedAnnotations = new LinkedList<AnnotationValue>();
39 for (Annotation a : this.annotations) {
40 evaluatedAnnotations.add(a.eval(env));
41 }
42 Variable var = new Variable(name.name(), isVar, location, docs,
43 evaluatedAnnotations);
44 Environment rc = new Environment(env, var);
45 for (AnnotationValue a1 : var.annotations) {
46 a1.process(rc, var);
47 }
48 if (value != null) {
49 var.bind(this.location, value.eval(env));
50 }
51 if (var.isBound && result != null) {
52 result.value = var.value;
53 }
54 return rc;
55 }
56 }
57
The Assignment operator is used to assign a value to a variable. Note that ETL is unable to express that only names are allowed on the left hand side. So we have to check ourselves that the left hand side is a name.
Example 32. Assignment.java
1 package calculator.interpreter.ast.vars;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.EvalException;
5 import calculator.interpreter.ast.InfixOp;
6
7 /**
8 * Execute assignment operator
9 */
10 public class Assignment extends InfixOp {
11 @Override
12 public Object eval(Environment env) {
13 String name;
14 if (first instanceof Name) {
15 name = ((Name) first).name();
16 } else {
17 throw new EvalException(first,
18 "Left hand side of assignment must be an identifier");
19 }
20 Object value = second.eval(env);
21 return env.findDefined(location, name).bind(location, value);
22 }
23 }
24
The Sequence class represents the block operator. It just executes its statements one by one. Note that because an expression cannot modify the parent environment, the lexical scoping is implemented more or less automatically. Also, the sequence operator does not handle exceptions. If the sequence operator fails, the entire statement fails as well. Note that an empty sequence results in null.
Example 33. Sequence.java
1 package calculator.interpreter.ast;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.Cell;
5 import calculator.interpreter.Environment;
6
7 /**
8 * A sequence of the statements to evaluate
9 */
10 public class Sequence extends Expression {
11 /** statements to evaluate */
12 public final LinkedList<Statement> statements = new LinkedList<Statement>();
13
14 @Override
15 public Object eval(Environment env) {
16 Cell result = new Cell();
17 for (Statement s : statements) {
18 result.value = null;
19 env = s.eval(env, result);
20 }
21 return result.value;
22 }
23 }
24
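The last-value semantics of the sequence rule can be checked against a minimal standalone model (SeqModel is a simplified stand-in that ignores environment threading and the Cell class):

```java
import java.util.List;
import java.util.function.Supplier;

// Minimal model of the sequence rule: evaluate statements in order and
// keep only the last value; an empty sequence yields null.
class SeqModel {
    static Object evalAll(List<Supplier<Object>> statements) {
        Object result = null; // empty sequence yields null
        for (Supplier<Object> s : statements) {
            result = s.get(); // only the last value survives
        }
        return result;
    }
}
```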
The IncludeStatement class interprets statements from another source in the current environment. So they are run as if they were executed in place of the include directive. But it does not make sense to include a module several times in the same scope, since included modules are supposed to create new definitions, and redefining them is an error. So we implement the check whether the module has already been included in the class InterpreterVars. Note that the include statement assumes that the runner RunnerVars or its subclass is used, and it will fail with an error otherwise.
Example 34. IncludeStatement.java
1 package calculator.interpreter.ast;
2
3 import java.net.URI;
4 import java.net.URISyntaxException;
5 import net.sf.etl.parsers.LiteralUtils;
6 import net.sf.etl.parsers.ParserException;
7 import org.xml.sax.InputSource;
8 import calculator.interpreter.Cell;
9 import calculator.interpreter.Environment;
10 import calculator.interpreter.EvalException;
11 import calculator.interpreter.RunnerVars;
12
13 /**
14 * The include statement
15 */
16 public class IncludeStatement extends Statement {
17 /** the system identifier to be included */
18 public String systemId;
19
20 @Override
21 public Environment eval(Environment env, Cell result) {
22 RunnerVars r = (RunnerVars) env.findDefined(this.location,
23 RunnerVars.RUNNER_VAR_NAME).get(this);
24 InputSource source = createInputSource();
25 try {
26 return r.createInterpreter().interpretSource(r.getFactory(), source, env);
27 } catch (ParserException e) {
28 throw new EvalException(this, "The source " + source.getSystemId()
29 + "(public '" + source.getPublicId() + "') could not be opened: " + e);
30 }
31 }
32
33 /**
34 * @return a input source for the include directive
35 */
36 protected InputSource createInputSource() {
37 return new InputSource(resolveSystemId());
38 }
39
40 /**
41 * @return the system id resolved with respect to the current source location
42 */
43 protected String resolveSystemId() {
44 String mySystemId = location.systemId();
45 String resolvedSystemId;
46 if (!mySystemId.startsWith("<")) {
47 try {
48 URI myUri = new URI(mySystemId);
49 resolvedSystemId = myUri.resolve(LiteralUtils.parseString(systemId))
50 .toString();
51 } catch (URISyntaxException e) {
52 throw new EvalException(this, "Problem with parsing URIs " + e);
53 }
54 } else {
55 resolvedSystemId = systemId;
56 }
57 return resolvedSystemId;
58 }
59 }
60
The remaining AST classes DocumentationLine, Annotation, Name, and Help are trivial to implement.
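As a rough sketch of why these classes are trivial, a Name expression only has to look its identifier up in the current scope and return the bound value (simplified here: a plain map stands in for the tutorial's Environment, and NameSketch is a hypothetical name, not the actual class):

```java
import java.util.Map;

// Hypothetical, simplified sketch of the Name primary operator:
// evaluating a name yields the value currently bound to it.
class NameSketch {
    final String name;

    NameSketch(String name) {
        this.name = name;
    }

    Object eval(Map<String, Object> scope) {
        if (!scope.containsKey(name)) {
            // mirrors the "No variable with ... is defined" error above
            throw new RuntimeException("No variable with " + name + " is defined.");
        }
        return scope.get(name);
    }
}
```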
The sources conforming to the grammar defined in this section can be run with our basic runner. The language level just needs to be set to the value vars.
But considering that we now can create definitions, it might make sense to have some definitions (like the constants E or PI) loaded before executing the code. In this section we will add support for loading a language specific prelude before executing the file provided on the command line. The runner below does just this.
Example 35. RunnerVars.java
1 package calculator.interpreter;
2
3 import java.net.URL;
4 import net.sf.etl.parsers.SourceLocation;
5 import net.sf.etl.parsers.TermParserFactory;
6 import net.sf.etl.parsers.TextPos;
7 import org.xml.sax.InputSource;
8
9 /**
10 * The runner for language level "vars" and higher.
11 */
12 public class RunnerVars extends RunnerBasic {
13 /** the variable name where runner is stored */
14 public static final String RUNNER_VAR_NAME = "$runner";
15 /** If true, the prelude is loaded in the initial context. */
16 protected boolean loadPrelude = true;
17 /** If not null, the string will be used as a default language */
18 protected String defaultLanguage = null;
19
20 @Override
21 protected String getDefaultLanguageLevel() {
22 return defaultLanguage != null ? defaultLanguage : "vars";
23 }
24
25 @Override
26 protected Environment initialEnvironment() {
27 Environment context = super.initialEnvironment();
28 SourceLocation nowhere = new SourceLocation(TextPos.START, TextPos.START,
29 "<system>");
30 Variable runnerVar = new Variable(RUNNER_VAR_NAME, false, nowhere, null,
31 null);
32 runnerVar.bind(nowhere, this);
33 context = new Environment(context, runnerVar);
34 if (loadPrelude) {
35 String prelude = getPrelude();
36 if (prelude != null) {
37 InputSource source = new InputSource(prelude);
38 context = createInterpreter().interpretSource(factory, source, context);
39 }
40 }
41 return context;
42 }
43
44 @Override
45 public Interpreter createInterpreter() {
46 return new InterpreterVars();
47 }
48
49 /**
50 * @return the default grammar to use in the case if the grammar is not
51 * specified
52 */
53 protected String getPrelude() {
54 String lang = getLanguageLevel();
55 URL prelude = getClass().getResource(
56 "/calculator/grammars/prelude-" + lang + ".c.etl");
57 return prelude == null ? null : prelude.toString();
58 }
59
60 @Override
61 protected void parseArgs(String[] args) {
62 int i = 0;
63 while (i < args.length && args[i].startsWith("-")) {
64 i = parseOption(args, i);
65 }
66 if (i < args.length - 1) {
67 usage();
68 }
69 super.parseArgs(i == args.length - 1 ? new String[] { args[i] }
70 : new String[0]);
71 }
72
73 /**
74 * Parse single option
75 *
76 * @param args the arguments to parse
77 * @param i the current position
78 * @return the position after option
79 */
80 protected int parseOption(String[] args, int i) {
81 if (args[i].equals("-e")) {
82 loadPrelude = false;
83 i++;
84 } else if (args[i].equals("-l")) {
85 if (i == args.length - 1) {
86 usage();
87 } else {
88 defaultLanguage = args[i + 1];
89 i += 2;
90 }
91 } else {
92 usage();
93 }
94 return i;
95 }
96
97 @Override
98 public void usage() {
99 System.err.println("Calculator Interpreter (vars version). "
100 + "(c) Constantine Plotnikov, 2008");
101 System.err.println("Usage: java " + getClass().getName() + " "
102 + optionList() + " [<file>]");
103 System.err.println(" <file> - the file to be interpreted. "
104 + "If absent, stdin will be used.");
105 optionUsage();
106 System.exit(1);
107 }
108
109 /**
110 * The usage string for options
111 */
112 protected void optionUsage() {
113 System.err.println(" -e - if specified, the prelude is not loaded.");
114 System.err.println(" -l <lang level> - the default language level "
115 + "(the default value is 'vars').");
116 }
117
118 /**
119 * @return the list of options (for command line usage)
120 */
121 protected String optionList() {
122 return "[-e] [-l <lang level>]";
123 }
124
125 /**
126 * Application entry point.
127 *
128 * @param args application arguments
129 */
130 public static void main(String[] args) {
131 new RunnerVars().start(args);
132 }
133
134 /**
135 * @return a factory for term parsers
136 */
137 public TermParserFactory getFactory() {
138 return factory;
139 }
140 }
141
Note that the runner creates an interpreter of a different type. This interpreter does nothing except check whether the source has already been visited, in order to avoid cyclic and duplicate includes.
Example 36. InterpreterVars.java
1 package calculator.interpreter;
2
3 import net.sf.etl.parsers.SourceLocation;
4 import net.sf.etl.parsers.TextPos;
5 import net.sf.etl.parsers.beans.FieldTermParser;
6 import calculator.interpreter.ast.Node;
7
8 /**
9 * The vars-specific interpreter. The only thing that it does is ensure that
10 * files are processed only once, using the real resolved system identifier.
11 */
12 public class InterpreterVars extends Interpreter {
13 @Override
14 protected Environment interpret(FieldTermParser<Node> statements,
15 Environment env) {
16 String systemId = statements.getSystemId();
17 String visitedVar = "$interpreted " + systemId;
18 if (env.find(visitedVar) != null) {
19 return env;
20 }
21 env = new Environment(env, new Variable(visitedVar, false,
22 new SourceLocation(TextPos.START, TextPos.START, systemId), null, null));
23 return super.interpret(statements, env);
24 }
25 }
26
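The visited-source check above records each resolved system identifier in the environment chain; the same idea can be modeled more directly with a set (IncludeGuard is a simplified stand-in, not part of the tutorial's sources, which use environment variables instead):

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the "visited source" check used to avoid cyclic
// and duplicate includes: interpret a given systemId only once.
class IncludeGuard {
    private final Set<String> visited = new HashSet<>();

    // returns true only the first time a given systemId is seen
    boolean firstVisit(String systemId) {
        return visited.add(systemId);
    }
}
```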
With that runner we could have the following prelude, which uses constants from Java classes via the class JavaConstantAnnotation.
Example 37. prelude-vars.c.etl
1: doctype "calculator-vars-0_2_1.g.etl";
2: // This file contains some useful constants. This file is loaded
3: // when language level is "vars".
4:
5: /// The constant PI
6: @JavaConstant("java.lang.Math","PI")
7: let PI;
8: /// The constant E
9: @JavaConstant("java.lang.Math","E")
10: let E;
11: /// Not a number
12: @JavaConstant("java.lang.Double","NaN")
13: let NaN;
14: /// Positive infinity
15: @JavaConstant("java.lang.Double","POSITIVE_INFINITY")
16: let POSITIVE_INFINITY;
17: /// Negative infinity
18: @JavaConstant("java.lang.Double","NEGATIVE_INFINITY")
19: let NEGATIVE_INFINITY;
20: /// Maximum positive value
21: @JavaConstant("java.lang.Double","MAX_VALUE")
22: let MAX_NUMBER;
23: /// Minimum positive value
24: @Q'JavaConstant'("java.lang.Double","MIN_VALUE")
25: let q'MIN_NUMBER';
26: /// Maximum integer value
27: let MAX_INTEGER = 16#7FFF_FFFF#;
28: /// Minimum integer value
29: let MIN_INTEGER = -16#8000_0000#;
30: /// The null constant
31: let null = {};
You can also check the prelude AST.
After that we can use the constants defined in the prelude in any source, like the following:
Example 38. sample-vars.c.etl
1: doctype "calculator-vars-0_2_1.g.etl";
2:
3: print "PI ** E: ", Q'PI' ** E;
4: print "E ** PI: ", E ** PI;
5: /// The dozen constant defined in a complex way
6: let dozen = {let a = {let one = 1; one + one;}; a * a;} * 3;
7: help dozen;
8: help PI;
9: let a;
10: a = 4;
11: help a;
12: // Using keyword as a name. Quoting is only required
13: // where it could happen as a keyword, but possible everywhere.
14: let let;
15: q'let' = 2;
16: let Q'print';
17: // quoting of left hand side is required, but in expression
18: // is not since "let" is not a keyword in the expression context.
19: // the last in the sum is also "let" but with
20: // all unicode escapes used.
21: Q'print' = let + q'let' + Q'\x6c\u0065\U000074;';
22: // in the sample below quoting is not required
23: // since "print" and "let" are not keywords in this context
24: print "used keyword vars: ", let * print;
25: help let;
26: help q'print';
27: // the $runner is the system variable defined by the runner
28: help q'$runner';
29:
You can also check the sample AST.
Note that the sample uses the quoted form of names to access values; this includes places where the usage of a plain identifier would have caused a syntax error.
Now let's add logic and control flow expressions to the language.
Example 39. calculator-logic-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorLogic {
3: include "calculator-vars-0_2_1.g.etl";
4: namespace t = "http://etl.sf.net/samples/calculator";
5: namespace default l = "http://etl.sf.net/samples/calculator/logic";
6: context default Expressions {
7: import matchSelectors = Selectors;
8: import selectSelectors = Selectors;
9:
10: op composite BooleanLiteral(f) {
11: ^ t:BooleanLiteral {
12: @ value = token(true) | token(false);
13: };
14: };
15: op LessThen(xfx, 1200, <) {
16: @ first = left;
17: @ second = right;
18: };
19: op LessOrEqual(xfx, 1200, <=) {
20: @ first = left;
21: @ second = right;
22: };
23: op MoreThen(xfx, 1200, >) {
24: @ first = left;
25: @ second = right;
26: };
27: op MoreOrEqual(xfx, 1200, >=) {
28: @ first = left;
29: @ second = right;
30: };
31: op Equal(xfx, 1300, ==) {
32: @ first = left;
33: @ second = right;
34: };
35: op NotEqual(xfx, 1300, !=) {
36: @ first = left;
37: @ second = right;
38: };
39: op composite RegExLiteral(f) {
40: ^t:RegExLiteral {
41: @ value = string(prefix=R, quote='"');
42: };
43: };
44: op Matches(xfx, 1300, ==~) {
45: @ first = left;
46: @ second = right;
47: };
48: op LogicalNot(fy, 1400, !) {
49: @ value = right;
50: };
51: op ConditionalAnd(yfx, 1500, &&) {
52: @ first = left;
53: @ second = right;
54: };
55: op ConditionalOr(yfx, 1600, ||) {
56: @ first = left;
57: @ second = right;
58: };
59: op composite IfExpression(f, 1700) {
60: % if % (;
61: @ condition = expression;
62: %);
63: @ thenPart = expression(precedence = 1700);
64: % else {
65: @ elsePart = expression(precedence = 1700);
66: } ?;
67: };
68: op composite IfExpressionCStyle(xfy,1700) {
69: ^ l:IfExpression {
70: @ condition = left;
71: % ?;
72: @ thenPart = expression;
73: % :;
74: @ elsePart = right;
75: };
76: };
77: op composite Match(xf, 1700) {
78: @ value = left;
79: % match;
80: @ selectors += block(matchSelectors);
81: };
82: op composite Select(f) {
83: % select;
84: @ selectors += block(selectSelectors);
85: };
86: statement WhileStatement{
87: % while;
88: @ label = identifier?;
89: % (;
90: @ condition = expression;
91: % );
92: @ body = expression;
93: };
94: statement BreakStatement {
95: % break;
96: @ label = identifier?;
97: };
98: statement ContinueStatement {
99: % continue;
100: @ label = identifier?;
101: };
102: statement AssertStatement {
103: % assert;
104: @ condition = expression;
105: % : {
106: @ message = expression;
107: } ?;
108: };
109: };
110: context Selectors {
111: import expressions = Expressions;
112: statement CasePart {
113: @ selector = expression(expressions);
114: % =>;
115: @ value = expression(expressions);
116: };
117: statement ElsePart {
118: % else;
119: @ value = expression(expressions);
120: };
121: };
122: };
The simplest logical expression is a boolean literal, which can be
true or false; the definition
BooleanLiteral
specifies exactly that. The literal definition is followed by relational
operators. In our grammar they are non-associative, and comparison
operators are assigned a lower precedence level (1200) than equality
operators (1300), so comparisons bind more tightly.
The definition
RegExLiteral
demonstrates how to specify prefixed strings. Note that, unlike the
plain string literal, we will pass the content of the string to the
regular expression pattern as is, so backslashes do not need to be
escaped twice. The only characters that ETL requires to be escaped
in strings are the backslash character and the closing quote
character; any other validation has to be performed by the processors
of the language. Conveniently, these two characters are escaped in
the same way in regular expressions. The operator
Matches
allows matching a string against a regular expression.
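As a rough illustration (this helper class and its names are assumptions for this tutorial text, not part of the shipped sources), an interpreter for this grammar could hand the raw text between the quotes of an R"..." literal straight to java.util.regex, since the two ETL escapes already mean the same thing to the regex engine:

```java
import java.util.regex.Pattern;

public class MatchesSketch {
    // Hypothetical helper: the raw text between the quotes of R"..." is
    // passed to the regex engine untouched; the only ETL escapes (the
    // doubled backslash and the escaped closing quote) already mean the
    // same thing in java.util.regex syntax, so no translation is needed.
    public static boolean matches(String value, String rawRegExBody) {
        return Pattern.matches(rawRegExBody, value);
    }

    public static void main(String[] args) {
        // mirrors: assert "\\" ==~ R"\\";
        System.out.println(matches("\\", "\\\\"));            // true
        // mirrors: assert "\\abba\\" ==~ R".\w{4,}.";
        System.out.println(matches("\\abba\\", ".\\w{4,}.")); // true
        // mirrors: assert ! "\\a+ba\\" ==~ R".\w{4,}.";
        System.out.println(matches("\\a+ba\\", ".\\w{4,}.")); // false
    }
}
```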
The operator
IfExpression
defines a conditional expression that looks mostly like Java's if
statement. The then-part is any expression at the level of conditions
and above (for an assignment we would need parentheses), and blocks
are supported because we have the
Sequence
operator in our grammar. The else part is optional; because of this
we could not define this operator as an
fy
operator and had to emulate right-argument processing in the
definition, which is why the expression part is defined with the
same precedence as the operator itself. The next definition,
IfExpressionCStyle
, defines a C-like conditional operator. Note that it maps to the same
object as the operator
IfExpression
. It is actually recommended to use the same mapping in cases
where syntax constructs express the same concept in syntactically
different ways.
The next two operators,
Match
and
Select
, make use of a new extensibility construct. They use the syntax
expressions
block(matchSelectors)
and
block(selectSelectors)
respectively. This construct matches a block containing statements
from the referenced context. However, there is no such context in the
grammar. At the beginning of the context
Expressions
we had the definitions
import matchSelectors = Selectors;
and
import selectSelectors = Selectors;
. Such a definition specifies a local name for the context Selectors
within the current context. This mapping can be redefined during
grammar include or context include operations just like other
definitions. So if we write
import matchSelectors = MySelectors;
in a context that includes the context
Expressions
through grammar or context include, the context MySelectors will be
used for the match body. Also, because we have two independent
definitions, we are able to redefine them independently. It is
recommended to name the local name after the role that the
referred context plays in the current context. While the syntax of
the body is the same for the operators
Match
and
Select
, the semantics differ. The select operator evaluates each
selector
expression and, if it is true, evaluates and returns the corresponding
value
expression. The match operator matches the left operand against the
selector expression (so it acts like Java's switch).
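The difference can be modeled in plain Java (the class, record, and method names below are illustrative assumptions, not tutorial interpreter code): select treats each selector as a boolean guard, while match compares the matched value against each selector, and in both cases the chosen value expression is evaluated lazily.

```java
import java.util.List;
import java.util.function.Supplier;

public class SelectorSketch {
    // one "selector => value" pair; the value stays unevaluated until chosen
    record Case<T>(T selector, Supplier<Object> value) {}

    // select: each selector is a boolean guard
    public static Object select(List<Case<Boolean>> cases, Supplier<Object> elsePart) {
        for (Case<Boolean> c : cases) {
            if (c.selector()) {
                return c.value().get();
            }
        }
        return elsePart.get();
    }

    // match: the matched value is compared against each selector
    public static Object match(Object left, List<Case<Object>> cases, Supplier<Object> elsePart) {
        for (Case<Object> c : cases) {
            if (left.equals(c.selector())) {
                return c.value().get();
            }
        }
        return elsePart.get();
    }

    // a sign function expressed with select, in the spirit of the samples
    public static Object signOf(double x) {
        return select(
                List.of(new Case<>(x < 0, () -> -1),
                        new Case<>(x > 0, () -> 1)),
                () -> 0);
    }

    public static void main(String[] args) {
        System.out.println(signOf(12));   // 1
        System.out.println(signOf(-0.5)); // -1
        System.out.println(signOf(0));    // 0
    }
}
```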
The context
Selectors
used by these two operators contains the
CasePart
statement and the
ElsePart
statement. The ElsePart is supposed to appear as the last member of
the block, but ETL currently provides no means to express this
constraint, so this condition should be checked at the semantic level
of the language. This context also imports the context
Expressions
and uses the syntax operator
expression
in the form
expression(expressions)
.
The statements
WhileStatement
,
BreakStatement
, and
ContinueStatement
do just what they do in C or Java. The only difference is that the
label for the while statement goes after the keyword while.
The statement
AssertStatement
behaves mostly like Java's assert statement. It checks the condition
and throws an exception if the condition does not hold. The exception
will include the message that is specified after the
:
(colon) character.
Implementing the AST classes for this part of the tutorial is mostly trivial, so we consider only the somewhat tricky classes.
The class
WhileStatement
is actually quite simple to implement if we use lexical scopes
and exceptions to implement the break and continue statements. Note
that a variable that is addressable from expressions only in a quoted
form is created to support the loop.
Example 40. WhileStatement.java
1 package calculator.interpreter.ast.logic;
2
3 import calculator.interpreter.Cell;
4 import calculator.interpreter.Environment;
5 import calculator.interpreter.EvalUtils;
6 import calculator.interpreter.Variable;
7 import calculator.interpreter.ast.Expression;
8 import calculator.interpreter.ast.Statement;
9
10 /**
11 * The while loop statement
12 */
13 public class WhileStatement extends Statement {
14 /** The label for the statement (might be null) */
15 public String label;
16 /** The condition for the statement */
17 public Expression condition;
18 /** The body of the statement */
19 public Expression body;
20
21 @Override
22 public Environment eval(Environment env, Cell result) {
23 final Object token = new Object();
24 Variable unnamedVar = new Variable(tokenName(null), true, location, null,
25 null);
26 unnamedVar.bind(location, token);
27 Environment bodyContext = new Environment(env, unnamedVar);
28 Variable namedVar;
29 if (label != null) {
30 namedVar = new Variable(tokenName(label), true, location, null, null);
31 namedVar.bind(location, token);
32 bodyContext = new Environment(bodyContext, namedVar);
33 } else {
34 namedVar = null;
35 }
36 try {
37 while (EvalUtils.toBoolean(this, env, condition)) {
38 try {
39 body.eval(bodyContext);
40 } catch (LoopGotoException ex) {
41 if (ex.token != token) {
42 throw ex;
43 }
44 if (ex.isBreak) {
45 break;
46 }
47 }
48 }
49 } finally {
50 unnamedVar.bind(location, null);
51 if (namedVar != null) {
52 namedVar.bind(location, null);
53 }
54 }
55 return env;
56 }
57
58 /**
59 * Generate token name for loop
60 *
61 * @param label the loop label to use (or null for token without label)
62 * @return the token name
63 */
64 public static String tokenName(String label) {
65 return "$loop token " + (label == null ? "" : label);
66 }
67
68 /**
69 * The exception that is used to break or continue the loop
70 */
71 @SuppressWarnings("serial")
72 public static class LoopGotoException extends RuntimeException {
73 /** the token for exception */
74 public final Object token;
75 /** if true, break requested, otherwise continue */
76 public final boolean isBreak;
77
78 /**
79 * The constructor
80 *
81 * @param token the token
82 * @param isBreak the break/continue flag
83 */
84 public LoopGotoException(Object token, boolean isBreak) {
85 super();
86 this.token = token;
87 this.isBreak = isBreak;
88 }
89 }
90 }
91
After this, the break and continue statements can be implemented on top of a common base class.
Example 41. LoopGotoStatement.java
1 package calculator.interpreter.ast.logic;
2
3 import calculator.interpreter.Cell;
4 import calculator.interpreter.Environment;
5 import calculator.interpreter.EvalException;
6 import calculator.interpreter.Variable;
7 import calculator.interpreter.ast.Statement;
8
9 /**
10 * The loop goto statement
11 */
12 public abstract class LoopGotoStatement extends Statement {
13 /** the label of loop (may be null) */
14 public String label;
15
16 @Override
17 public Environment eval(Environment env, Cell result) {
18 Variable var = env.find(WhileStatement.tokenName(label));
19 if (var == null) {
20 throw new EvalException(this, "The break and continue statements "
21 + "must happen only in the lexical scope of the loop.");
22 }
23 if (var.value == null) {
24 throw new EvalException(this, "The break and continue statements "
25 + "must happen only while the loop is active.");
26 }
27 throw new WhileStatement.LoopGotoException(var.value, isBreak());
28 }
29
30 /**
31 * @return true if it is break statement, false if continue
32 */
33 protected abstract boolean isBreak();
34 }
35
The
AssertStatement
is a bit more interesting from the implementation point of view. Its
logic is quite trivial, but for an assert statement it is useful to
print the asserted text, and at runtime we already have the AST
rather than the text. However, if an object implements the
TokenCollector
interface, the
FieldTermParser
will notify the object about every token that occurs in its scope.
We can use this functionality to accumulate the text of the entire
assert statement and then select the part relevant to the
condition using positional information from the statement and
the condition.
Example 42. AssertStatement.java
1 package calculator.interpreter.ast.logic;
2
3 import net.sf.etl.parsers.TermToken;
4 import net.sf.etl.parsers.beans.TokenCollector;
5 import calculator.interpreter.Cell;
6 import calculator.interpreter.Environment;
7 import calculator.interpreter.EvalException;
8 import calculator.interpreter.ast.Expression;
9 import calculator.interpreter.ast.Statement;
10
11 /**
12 * The assert statement
13 */
14 public class AssertStatement extends Statement implements TokenCollector {
15 /** The condition to assert */
16 public Expression condition;
17 /** The message to use in case of assertion failure */
18 public Expression message;
19 /** The text of the assert statement */
20 private StringBuilder textBuilder = new StringBuilder();
21 /** The text of the condition */
22 private String conditionText;
23
24 @Override
25 public Environment eval(Environment env, Cell result) {
26 Object c = condition.eval(env);
27 if (!(c instanceof Boolean)) {
28 throw new EvalException(this, "The condition " + conditionText() + " at "
29 + condition.location.toShortString()
30 + " did not evaluated to the boolean: " + c);
31 }
32 if (!(Boolean) c) {
33 throw new EvalException(this, "The assertion " + conditionText()
34 + " failed" + (message == null ? "." : ": " + message.eval(env)));
35 }
36 return env;
37 }
38
39 /**
40 * @return the text of condition
41 */
42 private String conditionText() {
43 if (conditionText == null) {
44 int start = (int) (condition.location.start().offset() - location.start()
45 .offset());
46 int end = (int) (condition.location.end().offset() - location.start()
47 .offset());
48 conditionText = textBuilder.substring(start, end);
49 textBuilder = null;
50 }
51 return conditionText;
52 }
53
54 /** {@inheritDoc} */
55 public void collect(TermToken token) {
56 if (token.hasLexicalToken()) {
57 textBuilder.append(token.token().token().text());
58 }
59 }
60 }
61
We can use the same runner as for the "vars" language level, just with the logic language level specified.
The logic prelude simply includes the vars prelude.
Example 43. prelude-logic.c.etl
1: doctype "calculator-logic-0_2_1.g.etl";
2: include "prelude-vars.c.etl";
And the sample that conforms to this grammar and prelude would be:
Example 44. sample-logic.c.etl
1: doctype "calculator-logic-0_2_1.g.etl";
2:
3: let dozen1 = {
4: var rc = 0;
5: var n = 12;
6: while outer (true) {
7: while(true) {
8: if(n < 1) {
9: break outer;
10: } else {
11: n = n - 1;
12: rc = rc + 1;
13: continue outer;
14: };
15: };
16: };
17: rc;
18: };
19: assert dozen1 == 12;
20: let dozen2 = {
21: var a = 420;
22: var b = 96;
23: while(a != 0) {
24: if(a < b) {
25: let t = a;
26: a = b;
27: b = t;
28: };
29: a = a - b;
30: };
31: b;
32: };
33:
34: assert dozen2 == 12;
35: assert "\\" ==~ R"\\";
36: assert "\\abba\\" ==~ R".\w{4,}.";
37: assert ! "\\a+ba\\" ==~ R".\w{4,}.";
38: assert "1111" == "\x31\u0031\U31;\U0000031;";
39: assert (if(1==2) 11) == null;
40: assert (if(false) 11 else 11 + 11) == 22;
41: assert (if(true) 11 else 11 + 11) == 11;
42:
You could also check the sample AST.
To have more fun with calculations in our calculator, we need to
support functions; otherwise
sin
,
cos
, and others would have to be introduced in some very ad hoc manner.
To simplify the explanation, let's split our extension of the grammar
into two parts. The first part contains only the core constructs.
Example 45. calculator-lambda-basic-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar abstract calculator.CalculatorLambdaBasic {
3: include "calculator-logic-0_2_1.g.etl";
4: namespace v = "http://etl.sf.net/samples/calculator/vars";
5: namespace default l = "http://etl.sf.net/samples/calculator/lambda";
6: context abstract LetRecDefs {
7: def ExpressionDef {
8: // redefine it to refer to the correct expression
9: expression(Expressions);
10: };
11: def ParameterList {
12: @ parameters += list , {
13: ^l:Parameter {
14: modifiers {
15: @ type = modifier var;
16: };
17: @ name = ref(NameDef);
18: };
19: } ?;
20: };
21: def LetRecAssignDef {
22: ^v:LetRecPart {
23: @ name = ref(NameDef);
24: % =;
25: @ value = ref(ExpressionDef);
26: };
27: };
28: };
29: context default Expressions {
30: include LetRecDefs;
31: import letRecContent = LetRecContent;
32:
33: op composite Lambda(fy, 1950) {
34: % \;
35: ref(ParameterList) | %( { ref(ParameterList); } % );
36: % =>;
37: @ value = right;
38: };
39: op composite Apply(yf, 100) {
40: @ function = left;
41: % (;
42: @ arguments += list , {
43: expression;
44: } ?;
45: % );
46: };
47: statement LetRec {
48: ^ v:LetRec {
49: % letrec;
50: @ definitions += ref(LetRecAssignDef) | {
51: block(letRecContent);
52: };
53: };
54: };
55: };
56: context LetRecContent {
57: include LetRecDefs;
58: include MetaInformation;
59: import expressions = Expressions;
60:
61: def ExpressionDef {
62: expression(expressions);
63: };
64: statement LetPart {
65: ref(LetRecAssignDef);
66: };
67: };
68: };
Note that this grammar has the "abstract" modifier. This means that
the grammar cannot be used directly; it exists only to be included
into some other grammar. Abstract grammars are a nice way of
providing a set of reusable definitions or of refactoring grammars
into simpler parts (and we use them here for the latter purpose).
Also note that we use the same namespace prefix
l
as we used for the logic grammar definitions, but map it to a
different namespace. Since namespace declarations are local to the
grammar, this in no way affects definitions included from the logic
grammar. Still, it is recommended to use different prefixes for
different namespaces in groups of related grammars, to avoid
confusion when a human reads the grammar.
The primary element introduced in this grammar is the lambda expression
defined by the operator
Lambda
. We have placed lambda expressions between the conditional operator
and the assignment operator. So it will not be possible to perform an
assignment in the value part of a lambda expression without
parentheses, but it will be possible to assign a lambda expression to
a variable without parentheses. The list of parameters can optionally
be enclosed in parentheses. The "var" modifier on a parameter of
the lambda abstraction specifies that the parameter is
mutable in the body of the lambda abstraction, which comes in handy
in some situations.
The operator
Apply
allows invoking a lambda abstraction with zero or more arguments. Note
that in our language we will support currying and uncurrying. So
the invocation of the two-argument function
max()(1)(2)
will have the same semantics as
max(1,2)
(but not the same as
max(1)(2)()
, as the latter will try to use 2 as a function and thus will fail).
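The application semantics just described can be sketched in Java (the Fn interface, the apply helper, and MAX are illustrative assumptions, not the tutorial's interpreter code): arguments are fed to a curried function one at a time, and a zero-argument application of a function simply returns the function, while applying a non-function fails.

```java
import java.util.List;

public class CurrySketch {
    // a one-argument function value in the sketched object model
    public interface Fn { Object apply1(Object arg); }

    public static Object apply(Object fn, List<Object> args) {
        if (!(fn instanceof Fn)) {
            // max(1)(2)() fails here: 2 is not a function
            throw new IllegalStateException("not a function: " + fn);
        }
        Object current = fn;
        for (Object a : args) {
            if (!(current instanceof Fn f)) {
                throw new IllegalStateException("not a function: " + current);
            }
            current = f.apply1(a); // feed arguments one at a time (currying)
        }
        return current; // zero arguments: the function comes back unchanged
    }

    // a curried two-argument max over integers
    public static final Fn MAX = x -> (Fn) (y ->
            ((Integer) x) < ((Integer) y) ? y : x);

    public static void main(String[] args) {
        // max()(1)(2) and max(1,2) produce the same result
        Object a = apply(apply(apply(MAX, List.of()), List.of(1)), List.of(2));
        Object b = apply(MAX, List.of(1, 2));
        System.out.println(a + " " + b); // 2 2
    }
}
```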
Along with lambda expressions, it is useful to introduce a recursive form of the let construct. Strictly speaking it is not required, since we could already define an unbound let and bind it later. But having such a statement expresses the intention more clearly and allows for more convenient syntax. The letrec construct will have two basic forms:
// single var form
letrec name = {the expression that uses name};
// multiple var form
letrec {
  name1 = {the expression that uses name1 ... nameN};
  name2 = {the expression that uses name1 ... nameN};
  ...
  nameN = {the expression that uses name1 ... nameN};
};
The first form is simply a shortcut for the one-element second form. The second form is itself equivalent to the following:
let name1;
let name2;
...
let nameN;
name1 = {the expression that uses name1 ... nameN};
name2 = {the expression that uses name1 ... nameN};
...
nameN = {the expression that uses name1 ... nameN};
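Why the desugaring works can be seen in plain Java (this sketch is an illustration, not tutorial code): declare the mutable bindings first, then assign closures that read the bindings at call time, so every definition can see all the others. Here one-element arrays play the role of the calculator's variable cells, and the standard mutually recursive odd/even pair is the example:

```java
import java.util.function.IntPredicate;

public class LetRecSketch {
    // returns { isOdd(n), isEven(n) } computed via the letrec desugaring
    public static boolean[] check(int n) {
        IntPredicate[] isOdd = new IntPredicate[1];   // let is_odd;
        IntPredicate[] isEven = new IntPredicate[1];  // let is_even;
        // the closures capture the cells, not the (still null) values
        isOdd[0] = x -> x != 0 && isEven[0].test(x - 1);  // is_odd = {...};
        isEven[0] = x -> x == 0 || isOdd[0].test(x - 1);  // is_even = {...};
        return new boolean[] { isOdd[0].test(n), isEven[0].test(n) };
    }

    public static void main(String[] args) {
        boolean[] r = check(7);
        System.out.println(r[0] + " " + r[1]); // true false
    }
}
```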
We define the reusable parts for letrec and lambdas in the
context
LetRecDefs
. Note that because context include happens after grammar include,
we do not need to redefine the fragment
ExpressionDef
in the expression context again. The definition from the logic
grammar will be included into the context
Expressions
and will override the definition that would otherwise come from the
context
LetRecDefs
.
The context
LetRecContent
simply reuses definitions from
LetRecDefs
and converts them to statements. But there is an interesting
extensibility aspect here. We include the contexts
LetRecDefs
and
MetaInformation
(defined in the logic grammar). Both contexts contain the
fragment
ExpressionDef
. The definitions are the same, and the definition from either
context would have worked for us. But since these contexts are
independent, there is a definition conflict, and we have to resolve
it by creating a definition with the same name in the
LetRecContent
context.
Let's now add new constructs that extend the grammar with other forms of function invocation, allowing new control constructs to be defined as is done in Scala or Groovy (or in plenty of other programming languages that support lambda abstractions).
Example 46. calculator-lambda-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorLambda {
3: include "calculator-lambda-basic-0_2_1.g.etl";
4: namespace t = "http://etl.sf.net/samples/calculator/vars";
5: namespace v = "http://etl.sf.net/samples/calculator/vars";
6: namespace default l = "http://etl.sf.net/samples/calculator/lambda";
7: context abstract LetRecDefs {
8: def LetRecFunctionDef {
9: ^ v:LetRecPart {
10: % to;
11: @ name = ref(NameDef);
12: @ value = ^l:Lambda {
13: % (;
14: ref(ParameterList);
15: % ) % =>;
16: @ value = ref(ExpressionDef);
17: };
18: };
19: };
20: };
21: context default Expressions {
22: include LetRecDefs;
23: import letRecContent = LetRecContent;
24:
25: op composite LambdaNoArgs(fy, 1950) {
26: ^ l:Lambda {
27: % \>;
28: @ value = right;
29: };
30: };
31: def SequenceDef {
32: ^ l:LambdaSequence {
33: @statements += block;
34: };
35: };
36: statement BlockParameters {
37: % ?;
38: ref(ParameterList);
39: };
40: statement Function {
41: ^v:LetRec {
42: @ definitions += ref(LetRecFunctionDef);
43: };
44: };
45: op composite InfixOperatorAppy(yfx, 1150) {
46: ^ l:Apply {
47: @ arguments += left;
48: % #;
49: @ function = expression(precedence = 99);
50: @ arguments += right;
51: };
52: };
53: op composite PrefixOperatorAppy(fy, 1150) {
54: ^ l:Apply {
55: % #;
56: @ function = expression(precedence = 99);
57: @ arguments += right;
58: };
59: };
60: op composite ApplyBlock(xf, 1175) {
61: ^ l:Apply {
62: @ function = left;
63: @ arguments += ref(SequenceDef);
64: };
65: };
66: op composite Lazy(f, 1950) {
67: ^ l:Apply {
68: @ function = token(&) wrapper t:Name.literal;
69: @ arguments += ^ l:Lambda {
70: @value = expression(precedence=1949);
71: };
72: };
73: };
74: op Dereference(fy, 100, *) {
75: @ value = right;
76: };
77: op composite NotEqualDeref(xfx, 1300) {
78: ^ l:Apply {
79: @ arguments += left;
80: @ function = token(!*=) wrapper t:Name.literal;
81: @ arguments += right;
82: };
83: };
84: op composite EqualDeref(xfx, 1300) {
85: ^ l:Apply {
86: @ arguments += left;
87: @ function = token(=*=) wrapper t:Name.literal;
88: @ arguments += right;
89: };
90: };
91: };
92: context LetRecContent {
93: statement FunctionPart {
94: ref(LetRecFunctionDef);
95: };
96: };
97: };
The first thing that we introduce here is a short form for a lambda
abstraction without arguments, the operator
LambdaNoArgs
. The definition just maps to the same object as the operator
Lambda
from the included grammar.
Another thing that we introduce is the block form of the lambda
abstraction; for it, we indirectly redefine the operator
Sequence
. It closely resembles what has been proposed for Java closures,
except that we prefix the parameters with a question mark and put
them in a separate statement, rather than detecting the keyword
=>
after the parameter text, in order to keep the grammar deterministic.
While we cannot use the ETL grammar to require that the statement
BlockParameters
appears only at the beginning of the block, we will check this during
evaluation. If there is no
BlockParameters
statement at the beginning of the block, we revert to the old
behavior. Since this behavior differs from that of the old
Sequence operator, we redefine the Sequence operator to map to a
different object than in the vars grammar. To do this, we just need to
redefine the fragment SequenceDef, and the new lambda blocks will be
supported everywhere we have used it.
Another thing we introduce in this grammar is syntactic
sugar for defining functions. The contexts
LetRecDefs
and
LetRecContent
each contain a new member to support this update. We also use a new
definition to define the statement
Function
in the context
Expressions
. Note that we do not introduce new AST objects here; these
constructs are mapped to the existing
LetRec
and
Lambda
objects. Such an approach is supported by ETL, but it should be used
only when the new construct is completely logically equivalent to some
existing construct. With the definition as provided in the
tutorial, it is not possible to distinguish between the two forms by
examining the AST. If we wanted to perform some special
documentation or annotation processing related to the function
definition, we would have problems implementing it.
We also introduce syntactic sugar for applying a function to one or
two arguments in infix form (the operator
InfixOperatorAppy
) and prefix form (the operator
PrefixOperatorAppy
). This approach is less flexible than in Scala, as we have to
prefix our operator with the symbol "#", but it still works. We have
placed these operators between the arithmetic and relational
operators, so a logical expression has to be put in parentheses when
used with these operators. Note that we can specify not only function
names, but any functional expression of precedence less than 100 (this
allows expressions in parentheses, lambda blocks, and names). This
expression is evaluated before its arguments, so lexical evaluation
order is not kept here. The final piece of application sugar is the
block application operator
ApplyBlock
; it has a greater precedence than the infix and prefix apply forms.
Another thing in our grammar is support for lazy
values. To support them we introduce the operator
Lazy
and the operator
Dereference
. The dereference operator invokes its argument if it is a zero-argument
function (and then reapplies itself to the result);
otherwise it just returns the argument. So the dereference operator
can resolve lazy values even when they resolve to other lazy values.
From the logical point of view, lazy function creates a zero argument function from another zero-argument function that invokes the function passed as argument only once and remembers the result. And with exclusion of exception handling (that we do not have in our language), we could implement support for lazy values in the language like the following:
let lazy = {? var f;
  assert f != null;
  var value;
  {?;
    if(f != null) {
      value = f();
      f = null;
    };
    value;
  };
};
val lazySum = #lazy {?; 1 + *lazy(\> 2 + 3);};
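The same lazy/dereference pair can be sketched in Java (an illustration with assumed names, not tutorial code): lazy wraps a computation in a memoizing zero-argument function, and deref keeps invoking zero-argument functions until it reaches a plain value, so lazy-of-lazy also resolves.

```java
import java.util.function.Supplier;

public class LazySketch {
    // wrap f in a supplier that invokes it at most once and caches the result
    public static Supplier<Object> lazy(Supplier<Object> f) {
        return new Supplier<Object>() {
            Supplier<Object> pending = f; // dropped after the first call
            Object value;
            public Object get() {
                if (pending != null) {
                    value = pending.get();
                    pending = null;
                }
                return value;
            }
        };
    }

    // keep invoking until a non-function value is reached
    public static Object deref(Object v) {
        while (v instanceof Supplier<?> s) {
            v = s.get();
        }
        return v;
    }

    public static void main(String[] args) {
        Object nested = lazy(() -> lazy(() -> 2 + 3 + 7));
        System.out.println(deref(nested)); // 12
    }
}
```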
However, it is somewhat inconvenient to write a lambda abstraction
every time we want to create a lazy value. So we defined the operator
lazy as mapping to a function defined in the prelude, but wrapped its
right argument into a lambda abstraction. This trick only works with
the right argument of the operator. Also note that we use
&
as the function name in our grammar. This demonstrates another trick:
our name has quoted and literal forms. We use the literal form to refer
to the function in the operator definition. In the function
definition we could not use the ampersand as a function name directly,
but we could use it in a quoted form. We also use the quoted
form to refer to the lazy function directly, without automatically
wrapping the right argument into a lambda abstraction.
With these syntax extensions we can define new control constructs, as can be seen in the prelude and sample sources.
The final two operators demonstrate a simpler case of mapping
operators to functions. The operators
NotEqualDeref
and
EqualDeref
define dereferencing versions of the equality operators. But instead
of implementing new node types in Java code, we just define them as a
mapping to a function invocation with the operator text as the name.
We then define the corresponding functions at the beginning of the
prelude. This is how operators like plus could have
been defined if we had started with lambda abstractions
[6]
. In a similar way, users could also extend our lambda grammar
and introduce their own simple operators mapped to functions, without
adding new AST node types. Using the lambda abstraction trick that
we used for lazy values, we could do quite funny things in the
language itself.
Example 47. prelude-lambda.c.etl
1: doctype "calculator-lambda-0_2_1.g.etl";
2: include "prelude-logic.c.etl";
3:
4: /// The construct lazy value from the lambda abstraction
5: ///
6: /// @param f the function
7: /// @return the function that remembers the result of invoking
8: /// the function without arguments
9: let q'&' = {? var f;
10: assert f != null;
11: var value;
12: {?;
13: if(f != null) {
14: value = f();
15: f = null;
16: };
17: value;
18: };
19: };
20: /// The implementation of NotEqualDeref operator
21: to q'!*='(x, y) => * x != * y;
22: /// The implementation of EqualDeref operator
23: to q'=*='(x, y) => * x == * y;
24:
25: /// The maximum value (works for comparable objects)
26: to max(x, y) => x < y ? y : x;
27: /// The minimum value (works for comparable objects)
28: to min(x, y) => x < y ? x : y;
29:
30: /// Ensure that the number is in the range of integer values.
31: /// @param x the number to check
32: /// @return itself (to allow variable arguments)
33: to assertIntRange(x) => {
34: assert MIN_INTEGER <= x && x <= MAX_INTEGER : "The value is "+x;
35: assertIntRange;
36: };
37:
38: /// Ensure that the number is an integer (there is no fractional part).
39: /// @param x the number to check
40: /// @return itself (to allow variable arguments)
41: to assertInt(x) => {
42: assertIntRange(x);
43: assert x % 1 == 0 : "The value is "+x;
44: assertInt;
45: };
46:
47: /// Ensure that the number is a non-negative integer.
48: /// @param x the number to check
49: /// @return itself (to allow variable arguments)
50: to assertNonNegativeInt(x) => {
51: assertInt(x);
52: assert x >= 0 : "The value is "+x;
53: assertNonNegativeInt;
54: };
55:
56: /// The factorial value
57: ///
58: /// @param n the non-negative integer value
59: /// @return the factorial result
60: let factorial = {?n;
61: assertNonNegativeInt(n);
62: to fac(x) => x match { 0 => 1; else x * fac(x-1); };
63: fac(n);
64: };
65:
66: letrec {
67: /// Odd number check
68: ///
69: /// @param x a non-negative integer number
70: /// @return true for odd numbers, false for even ones
71: is_odd = {?x; assertNonNegativeInt(x); x != 0 && is_even(x-1);};
72: /// Even number check
73: ///
74: /// @param x a non-negative integer number
75: /// @return false for odd numbers, true for even ones
76: to is_even(x) => {assertNonNegativeInt(x); x == 0 || is_odd(x-1);};
77: };
78:
79: /// Repeats execution of the one-argument function from low to high,
80: /// adding +1 on each step until the resulting value is no longer lower
81: /// than high. The example below will print the numbers 1.5, 2.5, 3.5, and 4.5.
82: /// {@example
83: /// 1.5 #upto 5 {?i;
84: /// print i;
85: /// }; }
86: ///
87: /// If the low value is not less than the high value, the cycle is never executed.
88: ///
89: /// @param low the start value for the loop (must be in integer range)
90: /// @param high the limit value for the loop (must be in integer range)
91: /// @param f the closure to use for the loop.
92: /// If it returns false, the cycle is aborted,
93: /// all other values are ignored.
94: /// @return null value
95: to upto(var low, high, f) => {
96: assertIntRange(low, high);
97: while(low<high && f(low) != false) {
98: low = low + 1;
99: };
100: };
101:
102: /// Repeats execution of the one-argument function from start to limit,
103: /// adding step on each step until the resulting value passes
104: /// limit (the direction depends on step's sign).
105: /// {@example
106: /// for(5,1,-0.125) {?i;
107: /// print i;
108: /// }; }
109: ///
110: /// If the start value has already passed the limit in the step's direction, the cycle is never executed.
111: ///
112: /// @param start the start value for the loop (start/step must be in integer range)
113: /// @param limit the limit value for the loop (limit/step must be in integer range)
114: /// @param step the loop step (must not be 0)
115: /// @param f the closure to use for the loop.
116: /// If it returns false, the cycle is aborted,
117: /// all other values are ignored.
118: /// @return null value
119: to for(var start, limit, step, f) => {
120: assert step != 0;
121: assertIntRange(start/step, start/step);
122: while((step > 0 ? start<limit : start>limit) && f(start) != false) {
123: start = start + step;
124: };
125: };
126:
127: /// Convert Java number to integer
128: @JavaMethod("java.lang.Number", "intValue")
129: let toInt;
130:
131: /// Convert a value to string
132: @JavaMethod("java.lang.Object", "toString")
133: let str;
134:
135: @JavaMethod("java.lang.Math", "abs", "double")
136: let abs;
137:
138: @JavaMethod("java.lang.Math", "sin", "double")
139: let sin;
140:
141: @JavaMethod("java.lang.Math", "cos", "double")
142: let cos;
143:
144: /// Sign of the number
145: ///
146: /// @param x the floating point number to check
147: /// @return -1, 0, or 1 depending on the sign of the number
148: let sign = \x => select { x < 0 => -1; x > 0 => 1; else 0; };
You could also check the prelude AST.
Example 48. sample-lambda.c.etl
1: doctype "calculator-lambda-0_2_1.g.etl";
2:
3: /// The special marker value used to stop computation in sum function
4: let STOP = \x => x;
5: /// Sum arguments up to stop marker like {@code sum(0,1,2,3,STOP)}
6: ///
7: /// @return the function that accumulates results.
8: let sum = {?;
9: var sum = 0;
10: to acc(x) => if( x == STOP) {sum;} else {sum = sum + x; acc;};
11: };
12: /// Greatest common divisor
13: ///
14: /// @param a the positive integer value
15: /// @param b the positive integer value
16: /// @return the greatest common divisor
17: let gcd = {? var q'a', var b;
18: assertInt(a,b);
19: assert a > 0 && b > 0;
20: if(a < b) {
21: let t = a;
22: a = q'b';
23: b = t;
24: };
25: while(b != 0) {
26: a = a % b;
27: let t = a;
28: a = b;
29: b = t;
30: };
31: a;
32: };
33:
34:
35: assert 11 !*= {?; 12;};
36: assert (\> 12) =*= (& 12);
37: assert 12 == * (\()=> \> & max(10)(12));
38: assert 12 == * (\()=> \> #q'&' {?;1+10+1;});
39: assert 12 == #abs 1-13;
40: assert 12 == sum(0,1,2,3,3,2,1,0,STOP);
41: assert 12 == factorial(4) / 2;
42: assert 12 == 420 #gcd 140 + 4;
43: assert 12 == {
44: var s = 0;
45: 0 #upto 4 {?i;
46: s = s + i;
47: };
48: s * 2;
49: };
50: assert 12 == {
51: var s = 0;
52: var n = 0;
53: for(14, 9.9, -0.25) {?i;
54: s = s + i;
55: n = n + 1;
56: };
57: s / n;
58: };
59: assert 1 == #sign 12;
60: assert -1 == #sign -MIN_NUMBER;
61: assert 0 == sign(12-12);
62: assert 12 ==~ {?n; sign(n) == 1;};
63: assert ! 12 ==~ (\n => sign(n) == -1);
You could also check the sample AST.
The core class behind the syntax elements provided by this grammar is the class Function.
Example 49. Function.java
1 package calculator.interpreter;
2
3 import java.util.ArrayList;
4 import java.util.Collections;
5 import java.util.List;
6 import net.sf.etl.parsers.SourceLocation;
7
8 /**
9 * The function abstract class
10 */
11 public abstract class Function implements Matcher {
12 /**
13 * @return amount of arguments for this function (the function should not
14 * change its arity during the lifetime of this object)
15 */
16 public abstract int arity();
17
18 /**
19 * Apply the function, the amount of arguments must match the arity. This
20 * method should not be invoked directly, use
21 * {@link Function#apply(SourceLocation, Object, Object[])} instead.
22 *
23 * @param location the location that requested invocation
24 * @param args the function arguments, the implementers must not change this
25 * list.
26 * @return a result of function invocation
27 */
28 public abstract Object apply(SourceLocation location, List<Object> args);
29
30 /** {@inheritDoc} */
31 public boolean match(SourceLocation location, Object value) {
32 Object r = Function.apply(location, this, Collections.singletonList(value));
33 return r instanceof Boolean && ((Boolean) r).booleanValue();
34 }
35
36 /**
37 * Apply the function, if the amount of arguments is insufficient, return the
38 * curried function, otherwise return the invocation result. If there are too
39 * much of the arguments, it first applies function with available arguments
40 * and if the result is a function, it tries to apply it with rest of the
41 * arguments.
42 *
43 * @param location the location of the node that requested evaluation
44 * @param function the function to invoke (checked to be instance of
45 * {@link Function})
46 * @param args the function arguments
47 * @return the invocation result
48 * @throws EvalException if there are too much of the arguments or if the
49 * argument {@code f} is not the function.
50 */
51 public static Object apply(SourceLocation location, Object function,
52 List<Object> args) {
53 int n = args.size();
54 int i = 0;
55 boolean firstApply = true;
56 while ((i < n || n == 0 && firstApply) && function instanceof Function) {
57 if (firstApply) {
58 firstApply = false;
59 }
60 Function f = (Function) function;
61 int a = f.arity();
62 assert a >= 0;
63 if (i + a > n) {
64 function = new Function.CurriedFunction(f, args.subList(i, n));
65 } else {
66 function = f.apply(location, args.subList(i, i + a));
67 }
68 i += a;
69 }
70 if (i < n) {
71 throw new EvalException(location, "The function required since "
72 + (n - i) + " arguments are remaining "
73 + "and found the value or the class "
74 + (function == null ? null : function.getClass().getName()));
75 }
76 return function;
77 }
78
79 /**
80 * The curried function
81 */
82 static class CurriedFunction extends Function {
83 /** the wrapped function */
84 final Function wrapped;
85 /** the fixed arguments of the function */
86 final List<Object> fixedArguments;
87
88 /**
89 * The constructor for the curried function
90 *
91 * @param wrapped function
92 * @param fixedArguments fixed arguments
93 */
94 public CurriedFunction(Function wrapped, List<Object> fixedArguments) {
95 super();
96 this.wrapped = wrapped;
97 this.fixedArguments = new ArrayList<Object>(fixedArguments);
98 }
99
100 /** {@inheritDoc} */
101 public Object apply(SourceLocation location, List<Object> args) {
102 ArrayList<Object> a = new ArrayList<Object>(fixedArguments.size()
103 + args.size());
104 a.addAll(fixedArguments);
105 a.addAll(args);
106 return apply(location, wrapped, a);
107 }
108
109 /** {@inheritDoc} */
110 public int arity() {
111 return wrapped.arity() - fixedArguments.size();
112 }
113 }
114 }
115
This is the abstract base class for all functions; it supports currying and uncurrying during function invocation. The class is terribly inefficient, but it does its job for the tutorial.
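The curry/uncurry behaviour of Function.apply can be illustrated with a self-contained sketch. The Fn interface and all names below are stand-ins invented for illustration, not the tutorial's classes; the loop mirrors the logic above without the location tracking:

```java
import java.util.ArrayList;
import java.util.List;

public class CurrySketch {
    /** A function that knows its own argument count. */
    interface Fn {
        int arity();
        Object apply(List<Object> args);
    }

    /** Two-argument integer addition, used for the demonstration. */
    static final Fn ADD = new Fn() {
        public int arity() { return 2; }
        public Object apply(List<Object> a) {
            return (Integer) a.get(0) + (Integer) a.get(1);
        }
    };

    /**
     * Apply 'function' to 'args'. Too few arguments: return a curried
     * function that waits for the rest. Too many: apply with what the
     * function takes, then feed the surplus to the result.
     */
    static Object apply(Object function, List<Object> args) {
        int n = args.size();
        int i = 0;
        boolean firstApply = true;
        while ((i < n || n == 0 && firstApply) && function instanceof Fn) {
            firstApply = false;
            final Fn f = (Fn) function;
            final int a = f.arity();
            if (i + a > n) {
                // not enough arguments: remember the ones we have
                final List<Object> fixed = new ArrayList<>(args.subList(i, n));
                function = new Fn() {
                    public int arity() { return f.arity() - fixed.size(); }
                    public Object apply(List<Object> rest) {
                        List<Object> all = new ArrayList<>(fixed);
                        all.addAll(rest);
                        return CurrySketch.apply(f, all);
                    }
                };
            } else {
                function = f.apply(args.subList(i, i + a));
            }
            i += a;
        }
        if (i < n) {
            throw new IllegalStateException((n - i) + " surplus arguments");
        }
        return function;
    }

    public static void main(String[] args) {
        System.out.println(apply(ADD, List.of(1, 2)));   // 3
        Object inc = apply(ADD, List.of(1));             // curried: one argument left
        System.out.println(apply(inc, List.of(41)));     // 42
    }
}
```

Applying ADD to a single argument yields a curried function of arity 1; applying that to the remaining argument completes the call, just as CurriedFunction does above.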
To interface with Java, we will also support the annotation JavaMethodAnnotation, which binds a variable to a function that uses reflection to invoke a static or instance Java method:
Example 50. JavaMethodAnnotation.java
1 package calculator.interpreter.annotations;
2
3 import java.lang.reflect.Method;
4 import java.lang.reflect.Modifier;
5 import java.util.List;
6 import net.sf.etl.parsers.SourceLocation;
7 import calculator.interpreter.AnnotationValue;
8 import calculator.interpreter.Environment;
9 import calculator.interpreter.EvalException;
10 import calculator.interpreter.EvalUtils;
11 import calculator.interpreter.Function;
12 import calculator.interpreter.Variable;
13
14 /**
15 * The java method annotation
16 */
17 public class JavaMethodAnnotation extends JavaReflectionProcessor {
18 /** {@inheritDoc} */
19 public void process(AnnotationValue annotation, Environment env, Variable var) {
20 if (annotation.arguments.size() < 2) {
21 throw new EvalException(annotation.location,
22 "Wrong number of the arguments for annotation expected at least 2 and found "
23 + annotation.arguments);
24 }
25 Class<?> c = getClass(annotation, 0);
26 String methodName = EvalUtils.expectString(annotation.location,
27 "the field name (the argument " + 1 + ")", annotation.arguments.get(1));
28 Class<?>[] parameterTypes = new Class[annotation.arguments.size() - 2];
29 for (int i = 0; i < annotation.arguments.size() - 2; i++) {
30 parameterTypes[i] = getClass(annotation, i + 2);
31 }
32 try {
33 Method method = c.getMethod(methodName, parameterTypes);
34 var.bind(annotation.location, new MethodFunction(method));
35 } catch (Exception e) {
36 throw new EvalException(annotation.location, "The method " + methodName
37 + " cannot be located: " + e);
38 }
39 }
40
41 /**
42 * The method based function
43 */
44 private static class MethodFunction extends Function {
45 /** the method to invoke */
46 private final Method method;
47 /** the arity of the method */
48 private final int arity;
49 /** the static modifier */
50 private final boolean isStatic;
51
52 /**
53 * The constructor
54 *
55 * @param method the wrapped method
56 */
57 public MethodFunction(Method method) {
58 this.method = method;
59 isStatic = Modifier.isStatic(method.getModifiers());
60 arity = method.getParameterTypes().length + (isStatic ? 0 : 1);
61 }
62
63 @Override
64 public Object apply(SourceLocation location, List<Object> args) {
65 assert args.size() == arity;
66 Object[] a = (isStatic ? args : args.subList(1, arity)).toArray();
67 Object t = isStatic ? null : args.get(0);
68 try {
69 return method.invoke(t, a);
70 } catch (Exception e) {
71 throw new EvalException(location, "The method " + method
72 + " cannot be invoked: " + e);
73 }
74 }
75
76 @Override
77 public int arity() {
78 return arity;
79 }
80
81 }
82 }
83
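The invocation convention used by MethodFunction — the receiver of an instance method is passed as the first argument, which is why non-static methods get arity + 1 — can be sketched with the JDK reflection API alone. The lookup helper below is hypothetical and only wraps the checked exceptions; the tutorial's getClass helper additionally maps primitive type names such as "double" to the corresponding Class objects:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.List;

public class ReflectCallSketch {
    /** Look a method up by class name (hypothetical helper for the sketch). */
    static Method lookup(String className, String name, Class<?>... types) {
        try {
            return Class.forName(className).getMethod(name, types);
        } catch (Exception e) {
            throw new RuntimeException("The method " + name + " cannot be located", e);
        }
    }

    /**
     * Invoke the method; for an instance method the receiver is taken
     * from the first argument, mirroring MethodFunction.apply above.
     */
    static Object call(Method m, List<Object> args) {
        boolean isStatic = Modifier.isStatic(m.getModifiers());
        Object target = isStatic ? null : args.get(0);
        Object[] a = (isStatic ? args : args.subList(1, args.size())).toArray();
        try {
            return m.invoke(target, a);
        } catch (Exception e) {
            throw new RuntimeException("The method " + m + " cannot be invoked", e);
        }
    }

    public static void main(String[] args) {
        // static: the binding made by @JavaMethod("java.lang.Math", "abs", "double")
        System.out.println(call(lookup("java.lang.Math", "abs", double.class),
                List.of(-12.0)));   // 12.0
        // instance: the binding made by @JavaMethod("java.lang.Object", "toString")
        System.out.println(call(lookup("java.lang.Object", "toString"),
                List.of(42)));      // 42
    }
}
```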
In the implementation of the operator Lambda we just create an implementation of this interface and return it as the value of the operation. We keep the lexical scope for the function, so it is able to use all definitions available from that scope, and we add the parameter names, bound to the argument values, to the scope. This way we naturally support recursive invocations.
Example 51. Lambda.java
1 package calculator.interpreter.ast.lambda;
2
3 import java.util.Iterator;
4 import java.util.LinkedList;
5 import java.util.List;
6 import net.sf.etl.parsers.SourceLocation;
7 import calculator.interpreter.Environment;
8 import calculator.interpreter.Function;
9 import calculator.interpreter.Variable;
10 import calculator.interpreter.ast.Expression;
11
12 /**
13 * Lambda expression
14 */
15 public class Lambda extends Expression {
16 /** the parameters of the lambda abstraction */
17 public final LinkedList<Parameter> parameters = new LinkedList<Parameter>();
18 /** the body of the lambda abstraction */
19 public Expression value;
20
21 @Override
22 public Object eval(Environment env) {
23 for (Parameter p : parameters) {
24 env.checkFree(p.location, p.name.name());
25 }
26 return new LambdaFunction(env);
27 }
28
29 /**
30 * The function implementation over lambda expression
31 */
32 private class LambdaFunction extends Function {
33 /** The environment where lambda expression was created */
34 final Environment environment;
35
36 /**
37 * The constructor for the function
38 *
39 * @param env the environment where expression is created
40 */
41 public LambdaFunction(Environment env) {
42 super();
43 this.environment = env;
44 }
45
46 @Override
47 public Object apply(SourceLocation location, List<Object> args) {
48 assert arity() == args.size();
49 Environment env = environment;
50 Iterator<Object> ia = args.iterator();
51 for (Iterator<Parameter> i = parameters.iterator(); i.hasNext();) {
52 Parameter p = i.next();
53 Object arg = ia.next();
54 Variable var = new Variable(p.name.name(), "var".equals(p.type),
55 p.location, null, null);
56 var.bind(location, arg);
57 env = new Environment(env, var);
58 }
59 return value.eval(env);
60 }
61
62 @Override
63 public int arity() {
64 return parameters.size();
65 }
66 }
67 }
68
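The environment-chain idea that LambdaFunction relies on can be shown in isolation. The Env class below is a simplified stand-in invented for illustration, not the tutorial's Environment class (whose actual constructor takes an environment and a Variable): every binding adds a link to the chain, and lookup walks outward, so a lambda that keeps its creation environment sees every definition visible there, including its own name once that name is bound in an enclosing environment:

```java
import java.util.Objects;

// Simplified stand-in for the environment chain (not the tutorial's
// Environment class): each binding extends the chain, lookup walks
// outward through the parents.
class Env {
    final Env parent;
    final String name;
    Object value;

    Env(Env parent, String name, Object value) {
        this.parent = parent;
        this.name = name;
        this.value = value;
    }

    Object lookup(String n) {
        for (Env e = this; e != null; e = e.parent) {
            if (Objects.equals(e.name, n)) {
                return e.value;
            }
        }
        throw new IllegalStateException("unbound: " + n);
    }
}
```

Each parameter binding in LambdaFunction.apply corresponds to one new link in such a chain.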
In the operator LambdaSequence we just delegate all the hard work to the lambda operator. This is actually the expected way to provide extensions over existing constructs in ETL: instead of creating a macro, the transformations are done over objects during compilation or interpretation. However, it is possible to introduce a kind of macro facility; this may be covered in a future version of the tutorial.
Example 52. LambdaSequence.java
1 package calculator.interpreter.ast.lambda;
2
3 import calculator.interpreter.Environment;
4 import calculator.interpreter.ast.Sequence;
5
6 /**
7 * The sequence that might include the parameter statement
8 */
9 public class LambdaSequence extends Sequence {
10 /** converted expression */
11 private Lambda converted;
12
13 @Override
14 public Object eval(Environment env) {
15 if (statements.isEmpty() || !(statements.get(0) instanceof BlockParameters)) {
16 return super.eval(env);
17 } else {
18 if (converted == null) {
19 BlockParameters params = (BlockParameters) statements.get(0);
20 Lambda l = new Lambda();
21 l.location = location;
22 l.parameters.addAll(params.parameters);
23 Sequence s = new Sequence();
24 s.location = location;
25 s.statements.addAll(statements.subList(1, statements.size()));
26 l.value = s;
27 converted = l;
28 }
29 return converted.eval(env);
30 }
31 }
32 }
33
The letrec statement just expands itself into a sequence of VarStatement and ExpressionStatement instances on the first execution. So the statement works just like a macro, but it is implemented on the AST level.
Example 53. LetRec.java
1 package calculator.interpreter.ast.vars;
2
3 import java.util.LinkedList;
4 import calculator.interpreter.Cell;
5 import calculator.interpreter.Environment;
6 import calculator.interpreter.ast.ExpressionStatement;
7 import calculator.interpreter.ast.Statement;
8
9 /**
10 * The letrec allows defining mutually recursive definitions. It is equivalent
11 * to a sequence of unbounded let statements followed by assignment.
12 */
13 public class LetRec extends Statement {
14 /** definitions from letrec */
15 public final LinkedList<LetRecPart> definitions = new LinkedList<LetRecPart>();
16 /** true means that letrec parts has been already transformed */
17 private boolean converted = false;
18 /** transformed statements */
19 private final LinkedList<Statement> statements = new LinkedList<Statement>();
20
21 @Override
22 public Environment eval(Environment env, Cell result) {
23 if (!converted) {
24 for (LetRecPart p : definitions) {
25 VarStatement s = new VarStatement();
26 s.location = p.location;
27 Statement metadataSource = p.hasMetadata() ? p
28 : this.hasMetadata() ? this : null;
29 if (metadataSource != null) {
30 s.annotations.addAll(metadataSource.annotations);
31 s.documentation.addAll(metadataSource.documentation);
32 }
33 s.name = p.name;
34 statements.add(s);
35 }
36 for (LetRecPart p : definitions) {
37 ExpressionStatement s = new ExpressionStatement();
38 s.location = p.location;
39 Assignment a = new Assignment();
40 a.location = p.location;
41 a.first = p.name;
42 a.second = p.value;
43 s.value = a;
44 statements.add(s);
45 }
46 converted = true;
47 }
48 for (Statement s : statements) {
49 if (result != null)
50 result.value = null;
51 env = s.eval(env, result);
52 }
53 return env;
54 }
55 }
56
The class LetRecPart is just a data holder for the parts of the letrec statement.
Example 54. LetRecPart.java
1 package calculator.interpreter.ast.vars;
2
3 import calculator.interpreter.Cell;
4 import calculator.interpreter.Environment;
5 import calculator.interpreter.EvalException;
6 import calculator.interpreter.ast.Expression;
7 import calculator.interpreter.ast.Statement;
8
9 /**
10 * The part of letrec statement
11 */
12 public class LetRecPart extends Statement {
13 /** the name part */
14 public Name name;
15 /** the value part */
16 public Expression value;
17
18 @Override
19 public Environment eval(Environment env, Cell result) {
20 throw new EvalException(this, "This statement is available only as "
21 + "a part of letrec statement and should not be "
22 + "evaluated directly");
23 }
24 }
25
The implementation of the other operators is even more trivial.
Up to this point, our samples have referred to the calculator grammars and included sources using relative URLs. This approach has an obvious problem: the reference is context dependent. If we move the file around, we need to change the reference. It also causes problems in a typical plugin situation, where grammars and reusable sources are located in unexpected places.
Absolute URLs will not solve the problem either. Absolute local file URLs are system dependent and mean different things on different computers. Network URLs like http or ftp not only require network availability, but also make us rely on some party to keep these servers online and to keep the files at the specified locations.
Another way to refer to the grammars is the default grammar mechanism that we have already implemented in our interpreter. But this mechanism has disadvantages as well. Only one grammar can be specified as the default, so we are unable to make backward incompatible changes to the grammar: old sources would have to conform to the new grammar too. This is actually a big deal. ETL allows introducing backward incompatible changes into a language when extending its grammar, which allows language design mistakes to be fixed gracefully, and you are bound to make some if you are working hard enough. Using the default grammar mechanism takes this power away from the language designer. So this mechanism is best reserved for short-lived text (like stdin).
In the XML world this problem is solved using XML catalogs [7], URIs, and public identifiers. ETL is designed to reuse this mechanism for grammars, and it is possible to use it for our language as well (for the include statement).
For the catalog implementation we will use the resolver library from the Apache XML commons project. This library is redistributed within the ETL package, as it is used by the etl2xml utilities, and it should be added to the project classpath. However, the catalog library is used only through the interface EntityResolver, so it is possible to provide other implementations of this interface.
First we need to define the catalog file. A catalog file that includes the new grammar and prelude files will look like the following:
Example 55. catalog.xml
1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE catalog
3 PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
4 "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
5 <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
6 prefer="public">
7 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar basic//EN"
8 uri="calculator-basic-0_2_1.g.etl"/>
9 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar arith//EN"
10 uri="calculator-arith-0_2_1.g.etl"/>
11 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar vars//EN"
12 uri="calculator-vars-0_2_1.g.etl"/>
13 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar logic//EN"
14 uri="calculator-logic-0_2_1.g.etl"/>
15 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar lambda//EN"
16 uri="calculator-lambda-0_2_1.g.etl"/>
17 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar catalog//EN"
18 uri="calculator-catalog-0_2_1.g.etl"/>
19 <public publicId="-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar//EN"
20 uri="calculator-catalog-0_2_1.g.etl"/>
21 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude vars//EN"
22 uri="prelude-vars.c.etl"/>
23 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude logic//EN"
24 uri="prelude-logic.c.etl"/>
25 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude lambda//EN"
26 uri="prelude-lambda.c.etl"/>
27 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude catalog//EN"
28 uri="prelude-catalog.c.etl"/>
29 <public publicId="-//IDN etl.sf.net//Calculator Sample Prelude//EN"
30 uri="prelude-catalog.c.etl"/>
31 </catalog>
The only thing that we add to the grammar in this section is an include directive that supports public identifiers in addition to system identifiers. Note also that we refer to the lambda grammar using a public identifier.
Example 56. calculator-catalog-0_2_1.g.etl
1: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
2: grammar calculator.CalculatorCatalog {
3: include public "-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar lambda//EN";
4: namespace default t = "http://etl.sf.net/samples/calculator";
5: context default Expressions {
6: statement IncludeStatement {
7: ^t:IncludeStatementCatalog {
8: % include;
9: {
10: @ systemId = string(quote='"');
11: % public {
12: @ publicId = string(quote='"');
13: }?;
14: } | {
15: % public {
16: @ publicId = string(quote='"');
17: };
18: };
19: };
20: };
21: };
22: };
23:
The only new definition is an extended version of the statement IncludeStatement. It now supports public identifiers in addition to system identifiers. With this statement defined, we are now able to refer to the calculator files like the following:
Example 57. prelude-catalog.c.etl
1: doctype "urn:publicid:-:IDN+etl.sf.net:Calculator+Sample+0.2.1+Grammar:EN";
2: include public "-//IDN etl.sf.net//Calculator Sample Prelude lambda//EN";
3:
You could also check the prelude AST to see how our changes in the grammar affect the include statement.
Note that we have referred to the grammar using the URN form of the public identifier as a system identifier. It is possible to map arbitrary URIs using catalogs as well, and "urn:publicid" is a special form of URI that is equivalent to a public identifier. However, it is recommended to use public identifiers explicitly where possible.
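The transcription between the two forms is mechanical and is defined by RFC 3151. A small sketch that covers the identifiers used in this tutorial (the class and method names are invented for illustration; a complete implementation must also handle a few more reserved characters, for example a bare '/'):

```java
public class PublicIdUrn {
    /**
     * Transcribe a public identifier into its urn:publicid: form
     * (RFC 3151): "//" becomes ":", "::" becomes ";", spaces become
     * "+", and bare occurrences of the separator characters are
     * percent-encoded.
     */
    static String toUrn(String publicId) {
        String s = publicId.trim().replaceAll("\\s+", " ");
        s = s.replace("%", "%25")       // the escape character goes first
             .replace("+", "%2B")
             .replace("//", "\u0000")   // protect the separator pairs...
             .replace("::", "\u0001")
             .replace(":", "%3A")       // ...then escape bare occurrences
             .replace(";", "%3B")
             .replace("\u0000", ":")    // "//" becomes ":"
             .replace("\u0001", ";")    // "::" becomes ";"
             .replace("'", "%27")
             .replace("?", "%3F")
             .replace("#", "%23")
             .replace(" ", "+");        // spaces become "+"
        return "urn:publicid:" + s;
    }

    public static void main(String[] args) {
        // matches the doctype used in prelude-catalog.c.etl above
        System.out.println(toUrn(
            "-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar//EN"));
        // urn:publicid:-:IDN+etl.sf.net:Calculator+Sample+0.2.1+Grammar:EN
    }
}
```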
In this final part of the tutorial we can run all the previous sources:
Example 58. sample-catalog.c.etl
1: doctype public "-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar//EN";
2: // run all samples
3: include "sample-basic.c.etl";
4: include "sample-arith.c.etl";
5: // since vars conflicts with logic over variable name, load it in the separate context
6: { include "sample-vars.c.etl"; };
7: include "sample-logic.c.etl";
8: include "sample-lambda.c.etl";
9: help is_odd;
10: help q'!*=';
You could also check the sample AST.
Note that the grammar is referenced using a public identifier.
The only new AST class is the new include statement, and by itself it needs very little additional work, as we left a very handy extension point in the original statement.
Example 59. IncludeStatementCatalog.java
1 package calculator.interpreter.ast;
2
3 import net.sf.etl.parsers.LiteralUtils;
4 import org.xml.sax.InputSource;
5
6 /**
7 * Redefined include statement that uses public id as well
8 */
9 public class IncludeStatementCatalog extends IncludeStatement {
10 /** The public identifier */
11 public String publicId;
12
13 @Override
14 protected InputSource createInputSource() {
15 String resolvedSystemId = systemId == null ? null : resolveSystemId();
16 InputSource input = new InputSource();
17 input.setSystemId(resolvedSystemId);
18 input.setPublicId(LiteralUtils.parseString(publicId));
19 return input;
20 }
21 }
22
We just handle the case of a missing system identifier and specify the public identifier if it is available. We let the ETL parser handle the public identifier.
The new include statement relies on the parser being configured with an entity resolver; the runner RunnerCatalog just provides it.
Example 60. RunnerCatalog.java
1 package calculator.interpreter;
2
3 import java.net.URL;
4 import org.apache.xml.resolver.CatalogManager;
5 import org.apache.xml.resolver.tools.CatalogResolver;
6
7 /**
8 * The runner that understands XML catalogs using entity resolver library
9 */
10 public class RunnerCatalog extends RunnerVars {
11 /**
12 * The collection of catalog files.
13 */
14 private final StringBuilder catalogs = new StringBuilder();
15 /**
16 * if true, the grammars catalog on the class path will be attempted to be
17 * loaded.
18 */
19 private boolean loadSystemCatalog = true;
20 /**
21 * if true, only public identifiers will be used for the default grammars.
22 */
23 private boolean publicIdOnly = false;
24 /**
25 * The catalog resolver
26 */
27 private CatalogResolver resolver;
28
29 @Override
30 protected void parseArgs(String[] args) {
31 super.parseArgs(args);
32 // add grammars catalog if it is on classpath and it is the file
33 if (loadSystemCatalog) {
34 URL systemCatalog = getClass().getResource(
35 "/calculator/grammars/catalog.xml");
36 if (systemCatalog == null) {
37 System.err.println("The system catalog is not on the classpath");
38 usage();
39 }
40 if (!"file".equals(systemCatalog.getProtocol())) {
41 System.err.println("Only the file protocol is suported: "
42 + systemCatalog);
43 usage();
44 }
45 if (catalogs.length() != 0) {
46 catalogs.append(';');
47 }
48 catalogs.append(systemCatalog.getPath());
49 }
50 CatalogManager mgr = CatalogManager.getStaticManager();
51 mgr.setCatalogFiles(catalogs.toString());
52 mgr.setPreferPublic(true);
53 resolver = new CatalogResolver(mgr);
54 }
55
56 @Override
57 protected String getDefaultGrammar() {
58 return publicIdOnly ? null : super.getDefaultGrammar();
59 }
60
61 @Override
62 protected String getDefaultGrammarPublicId() {
63 return "-//IDN etl.sf.net//Calculator Sample 0.2.1 Grammar "
64 + getLanguageLevel() + "//EN";
65 }
66
67 /**
68 * @return the resolver for the catalogs
69 */
70 public CatalogResolver resolver() {
71 return resolver;
72 }
73
74 @Override
75 protected int parseOption(String[] args, int i) {
76 if ("-c".equals(args[i])) {
77 // the catalog file on the command line
78 if (i + 1 >= args.length) {
79 System.err.println("Missing argument for -c option");
80 usage();
81 }
82 if (catalogs.length() != 0) {
83 catalogs.append(';');
84 }
85 catalogs.append(args[i + 1]);
86 return i + 2;
87 }
88 if ("--no-system-catalog".equals(args[i])) {
89 // disable loading of the system grammar catalog
90 loadSystemCatalog = false;
91 return i + 1;
92 }
93 if ("--no-system-default-grammar".equals(args[i])) {
94 // disable loading of the system grammar catalog
95 publicIdOnly = true;
96 return i + 1;
97 }
98 return super.parseOption(args, i);
99 }
100
101 @Override
102 protected String getDefaultLanguageLevel() {
103 return "catalog";
104 }
105
106 @Override
107 protected String optionList() {
108 return super.optionList()
109 + " [-c <catalog file>] [--no-system-catalog] [--no-system-default-grammar]";
110 }
111
112 @Override
113 protected void optionUsage() {
114 super.optionUsage();
115 System.err.println(" -c <catalog file> - load catalog file");
116 System.err.println(" --no-system-catalog - disable loading"
117 + " system catalog");
118 System.err.println(" --no-system-default-grammar - disable providing "
119 + "system id for the default grammar");
120 }
121
122 @Override
123 protected void createFactory() {
124 super.createFactory();
125 factory.setEntityResolver(resolver);
126 }
127
128 /**
129 * Application entry point.
130 *
131 * @param args application arguments
132 */
133 public static void main(String[] args) {
134 new RunnerCatalog().start(args);
135 }
136 }
137
The resolver itself is created in the method parseArgs(...). We also add a catalog with the calculator grammars if it is on the classpath. Note that the resolver library only supports catalogs on the file system, so if you package the calculator application in a jar file, loading of the system grammar catalog will stop working; in that case you will need to provide the catalog on the command line explicitly. When creating a tool that supports language extensions, it is very likely that a custom resolver has to be created that is aware of the plugin architecture. For example, Eclipse has implemented its own XML catalog support for editing XML files.
In the method createFactory() we set the created resolver on the parser factory, so it will be associated with each new parser created using this factory instance. The resolver will be used both for the ETL sources to be parsed and for the grammar files (as a grammar file is an ETL file).
Now we can even use only public identifiers for specifying the default grammar. The method getDefaultGrammarPublicId() returns the public identifier for the default grammar, and the method getDefaultGrammar() can now be forced to return null, in case we want to check that public identifiers are actually used.
If you run this runner, it will complain that it cannot locate the file CatalogManager.properties. The message is harmless, but you could put the following file on the classpath:
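A minimal CatalogManager.properties along the following lines is enough to silence the warning. Treat this as a sketch rather than the exact file from the package; the property names are the ones documented for the Apache resolver library:

```properties
# Minimal CatalogManager.properties for the Apache XML resolver.
# The presence of this file on the classpath silences the warning.
catalogs=
relative-catalogs=true
static-catalog=yes
catalog-class-name=org.apache.xml.resolver.Resolver
prefer=public
# Uncomment to see the messages about the resolution process:
# verbosity=10
```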
Note that if you uncomment the line with the property verbosity=10, you will be able to see messages about the resolution process.
The public and system identifiers play an important role in language evolution. The grammar is a mapping from the input text file to the AST, and the language can change in two aspects:
The allowed set of inputs can change, by introducing new operators and keywords and by removing old ones from the language.
The mapping from tokens to the AST can change, so input that was previously valid can have a different meaning to the tools.
The grammar can also be changed in a way that affects neither of these aspects. For example, new abstract contexts and grammars can be introduced, or other syntax expressions with the same meaning can be used. A grammar can also be changed from using the grammar include directive with the previous version of the grammar to direct inclusion of the definitions. The grammar definition language itself can change as well, since ETL is still evolving.
There is little problem if such changes happen during the private evolution of the language, when both the language and the sources that conform to it are changed in parallel. However, if the grammar is released into the wild, the users would like to be sure that the meaning of the sources they have created does not change behind their backs. For example, the addition of a new operator or statement that uses a specific identifier as a keyword (for example "until") could cause an error in sources that use this identifier as a name.
System and public identifiers provide a way around this problem. The changed grammar can be referred to by a new identifier, so some sources will use the old grammars and other sources will use the new ones. This does not imply that old grammars should be supported indefinitely, but if support for a grammar is dropped and the grammar is removed, the user will receive an error message that the grammar could not be located. Alternatively, if we drop support for the grammar on the AST level, the user will receive an error message saying that the grammar maps to an unsupported AST object. The user then has to visit the file that uses the obsolete grammar, change the public identifier, and adjust the source to conform to the new grammar (or, if feeling lucky, just do a global search and replace of the doctype instructions).
It is also possible to create a tool that upgrades old grammar syntax to a new one. During the development of the ETL parser version 0.2.1 there were some changes to the grammar definition compared with the version 0.2.0. To upgrade the sample grammars, an XSLT script was written that replaces old syntax constructs with new ones; this script is included in the distribution of the ETL framework. Considering that the changes were mostly cosmetic, XSLT was enough. It would also be possible to write a tool that does smarter things, for example replacing names that match new keywords with some quoted form.
Public identifiers are generally more convenient than system ones for distributing multiple grammars. If there were no changes between the "1.0b2", "1.0rc", and "1.0" versions of the grammar, the public identifiers that correspond to these versions could simply be mapped to the same grammar file using the catalog file.
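With hypothetical identifiers (the names below are invented for illustration, not part of the calculator sample), such a mapping might look like this in the catalog file:

```xml
<!-- three released version identifiers, all resolving to the same grammar -->
<public publicId="-//IDN example.org//My Language 1.0b2 Grammar//EN"
        uri="my-language-1_0.g.etl"/>
<public publicId="-//IDN example.org//My Language 1.0rc Grammar//EN"
        uri="my-language-1_0.g.etl"/>
<public publicId="-//IDN example.org//My Language 1.0 Grammar//EN"
        uri="my-language-1_0.g.etl"/>
```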
In this tutorial we have defined a simple calculator language that demonstrates typical grammar language features, and we have mapped it to a simple AST. However, some features of the grammar language remain uncovered, and you will need to study the definition of the ETL grammar and the other sample grammars.
As you have possibly noticed, we have used concepts natural to language design. We have added new grammars, contexts, operators, statements, and fragments. We did not need to translate these concepts into an EBNF-like notation; the ETL grammar compiler does this work for us. We also did not have to think about how to update the productions of derived grammars when we changed something: we just redefined what we needed to change. This is what makes ETL a higher-level grammar definition language than most other language definition frameworks.
Note that while our sample language works, we have used the simplest possible implementation we could get away with. We also have not defined some trivial parts, like the collection types, since they would bring few new things from the point of view of the syntax. So please do not consider this tutorial to be a tutorial on programming language implementation.
Another important uncovered aspect is how to support macros using ETL. It is possible to create a limited macro facility using ETL means, but it requires jumping through some hoops and will be covered in a future version of the tutorial.
[1] There are also extensibility constructs that are not covered in this tutorial. Consult the ETL specification, the ETL grammar definition, and the samples that come with the ETL parser for more information.
[2] Actually, the lexical level picked up a lot of ideas from Prolog, Java, and even Ada, so you will likely find a lot of familiar lexical elements.
[3] A significant token is anything except white space and comments (documentation comments, however, are considered significant).
[4] Those who are familiar with the Prolog programming language surely already guessed the source of inspiration for this notation.
[5] It might have been done in a simpler way, but in this way we make some use of the annotations.
[6] This shows how important choosing the right core abstractions is when designing a language. However, ETL allows defining languages that are not computationally complete, so you do not have to create a complete programming language when defining a DSL.
[7] It is strongly advised to read the article if you are not familiar with XML catalogs, as it introduces some terms that are not explained here.