Design Log

Constantine Plotnikov

Revision History
Revision 0.2.1, 2009-01-19

Added changes for the version 0.2.1.

Revision 0.2.0, 2006-02-05

This is the first version of the document.

Abstract

This document describes design alternatives and rejected design choices for the ETL meta-language.


Table of Contents

1. Why not macros?
2. Recursion
3. Rejected Phrase Syntaxes
3.1. Lisp/Scheme
3.2. Dylan
3.3. Python
3.4. Scripting
4. Token and Include Wrappers
5. Context Include from Designated Grammar
6. Attributes and Documentation
Bibliography

1. Why not macros?

Most current language extensibility work focuses on macros. This precludes alternative tools for the languages. Some tools are interested in the deep semantics of the source; such tools need constructs to be expanded into more primitive ones. Others, like text editors, are only interested in the surface syntax, and expansion into deeper constructs would only confuse them. However, without expansion it is often impossible to check the syntax for correctness.

2. Recursion

The language consciously limits recursive forms to expressions and blocks only. It would have been possible to introduce other recursive constructs too; for example, the def construct might have had invocation semantics instead of replace semantics.

The current intuition about other constructs is that they would have made the language so rich that it would become possible to define new constructs in a non-uniform way. We would therefore be back in the hell of languages that look completely unlike each other.

3. Rejected Phrase Syntaxes

At some point of the language design, alternative phrase syntaxes were considered. The sections below list the rejected alternatives.

3.1. Lisp/Scheme

This is the oldest implemented extensible phrase syntax available. It has many advantages, but also as many disadvantages. Among them is that it is too difficult to control the structure of the program: it is too easy to miss a parenthesis. Syntax-aware editors help, but only up to a point, and the luxury of a syntax-aware editor is sometimes missing, particularly in code generation tasks.

Another problem is that infix syntax is somewhat beneficial in many programming constructs, like property navigation and arithmetic, because it makes it easier to be aware of the context of an operation.

3.2. Dylan

The biggest problem is the impossibility of reliably detecting the start of a block, so error recovery is extremely difficult. Another significant problem is that it supported only the start/end style of statements: all statements without blocks have to be hard-coded into the grammar.

3.3. Python

The Python phrase syntax is extremely attractive. It is the most friendly to touch-typing of all the considered syntaxes. It also allows quite compact statements with a minimum of additional symbols. This phrase syntax has even been implemented in one of the internal prototypes.

The biggest disadvantage of this syntax is that it is difficult to use blocks in expressions, because the end of a statement always matches the end of a line. There is also the problem that if a statement has more than two sections, it has to span more than one line. For a discussion of this issue see [LDINJSP].

Despite the fact that the Python syntax has not been chosen as the phrase syntax, its ideas greatly contributed to the project. For example, error recovery on the block level and the phrase level was inspired by the Python line syntax and the Python error recovery policy.

3.4. Scripting

Error recovery/formatting problem

The scripting variation of the current syntax was mostly inspired by the E programming language. Its rules look quite simple. However, there is a problem with error recovery. The semicolon in the current syntax clearly designates the end of a statement, and in case of an error the parser has to skip until it. In the scripting syntax it is not clear whether a line is a continuation of the previous statement or a new one. E's rule is that it is a continuation if the expression has not yet finished being parsed; however, this rule is formulated on the term layer rather than on the phrase level. There is an alternative rule in Python: a line is a continuation if it is inside brackets. But this rule somewhat limits bracket usage and makes the syntax very sensitive to mistakes related to brackets.

There is also a rule of fallback continuation with the \ character before a new line. This rule is one of the major things that make the experience with make files so horrible.

4. Token and Include Wrappers

The parser logically works as a function that takes the input as an argument and returns an AST.

Tokens are mapped to simple values in the AST, like strings, integers, or even enumerations. There are also objects that have properties holding these simple values. Objects might also have properties that hold other objects.

When this is straightforwardly mapped to the object models of programming languages, there is a problem: position information can be attached only to objects, not to primitive values.
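
To illustrate the problem, here is a minimal Java sketch of such an object model (the class and field names are invented for this illustration and do not come from the actual ETL implementation): the literal node can carry its source position, but the primitive value stored in its property cannot.

// Invented names; illustration of the problem only.
final class SourcePosition {
    final int line;
    final int column;

    SourcePosition(int line, int column) {
        this.line = line;
        this.column = column;
    }
}

// An AST object can carry its own position...
final class IntegerLiteral {
    SourcePosition position; // position of the whole literal node
    int value;               // ...but the primitive value has no place
                             // to record where it came from.
}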

This led to the design of token wrappers, which are a shorter form of a more verbose construct. For example, there is the following construct:

integer wrapper xj:IntegerLiteral.value

That is a shorthand for the following:

^ xj:IntegerLiteral {
  @ value = integer;
}

However, this is possibly a misplaced workaround. The position information is reported by the term parser for primitive values too; it is the specific object model that cannot represent it. So possibly we should just make tokens a primitive in the model, and require that if tokens should be wrapped into objects to represent position in the object model, it be done at the mapping level rather than by the parser, because the parser already returns all the required information.

Also, it looks like in some cases we might not care whether a token is wrapped or not, and usage of an unwrapped token would be faster.
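
A minimal sketch of that alternative, assuming a hypothetical Token value type (the names below are invented for illustration and are not taken from the ETL object model): the parser hands over tokens that already carry their position, and the mapping layer decides whether to keep them or to unwrap them into plain primitives.

// Hypothetical sketch of "token as a primitive in the model".
final class Token {
    final String text;   // lexical text of the token
    final int line;      // start position reported by the term parser
    final int column;

    Token(String text, int line, int column) {
        this.text = text;
        this.line = line;
        this.column = column;
    }

    int asInteger() {
        return Integer.parseInt(text);
    }
}

// The mapping layer may then choose either representation:
final class WrappedIntegerLiteral {
    Token value;         // keeps the position, costs an extra object
}

final class UnwrappedIntegerLiteral {
    int value;           // faster when the position is not needed
}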

A somewhat similar problem exists with includes. It is unreasonable to expect that the class statement will be implemented as a subclass of the abstract method statement. So currently a wrapper that wraps the class statement might be specified on the include from the method content context into the classifier context in the EJ grammar.

However, this is a fix for a limitation of the object model, and it is not clear whether this fix is well placed.

5. Context Include from Designated Grammar

At one time the context include construct allowed the following form:

grammar A {
	import B = "other.g.etl";
	context c {
		include e from B;
	};
};

It looked like a nice construct at first. The problem was that it was hard to define clean include semantics for it, particularly how it would interact with grammar and context includes.

This construct has been eliminated in favor of the future suppress/rename constructs that allow achieving the same goals in a cleaner way. However, it might be possible to return to this issue in the future, when the include semantics is well defined.

6. Attributes and Documentation

Attribute and documentation statements insert values into some attribute inside the normal statement. However, these objects act as prefix operators on statements rather than on expressions, so the more logical way would be to generate objects like AttributedStatement and DocumentedStatement.

The primary problem with such an approach is usability. The user needs to retrieve the attributes and documentation associated with a statement, not a statement associated with attributes or documentation.
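
A rough Java sketch of the two shapes (the class names are invented for this comparison only): in the "prefix operator" model the wrapper owns the statement, while in the current model the statement owns its attributes and documentation, so the usual navigation from statement to documentation is a direct field access.

// Invented class names; only the shape of the two models is sketched.

// "Prefix operator" model: the wrapper owns the statement, so the user
// has to reach the statement through the wrapper.
final class DocumentedStatement {
    String documentation;
    Object statement;    // the wrapped statement
}

// Current model: the statement owns its attributes and documentation,
// so they can be retrieved directly from the statement.
final class Statement {
    java.util.List<String> attributes = new java.util.ArrayList<>();
    String documentation;
}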

One of the hacks currently implemented is fallback statements. These statements become active if there is an error in the attributes or documentation comments. However, the hack is quite dirty from the implementation point of view; it is yet another special case from the point of view of the LL(1) grammar.

If there is a statement with an empty look-ahead, it is chosen as the fallback. Otherwise an arbitrary statement is chosen.
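
A minimal sketch of that selection rule, assuming a hypothetical StatementDefinition type with a precomputed look-ahead set (these names are not from the actual grammar compiler):

import java.util.List;
import java.util.Set;

// Hypothetical types; the real grammar compiler may look quite different.
final class StatementDefinition {
    final String name;
    final Set<String> lookAhead; // tokens that select this statement

    StatementDefinition(String name, Set<String> lookAhead) {
        this.name = name;
        this.lookAhead = lookAhead;
    }
}

final class FallbackSelector {
    // Prefer a statement with an empty look-ahead set; otherwise
    // pick an arbitrary one (here simply the first).
    static StatementDefinition chooseFallback(List<StatementDefinition> statements) {
        for (StatementDefinition s : statements) {
            if (s.lookAhead.isEmpty()) {
                return s;
            }
        }
        return statements.isEmpty() ? null : statements.get(0);
    }
}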

Bibliography

[LDINJSP] Guido van Rossum. Language Design Is Not Just Solving Puzzles. http://www.artima.com/weblogs/viewpost.jsp?thread=147358 . 2006-02-10.