Features Planned for Future of ETL

Constantine Plotnikov

<constantine.plotnikov@gmail.com>

Revision History
Revision 0.2.1	2009-01-19
Updated Section 2.1, “Optimization” to reflect changes in version 0.2.1.
Revision 0.2.0	2006-02-05
This is the first release of the document.

Abstract

This article describes features planned for future releases.

Table of Contents

1. Language Features

1.1. Unicode Support
1.2. Selective disabling/enabling annotations
1.3. Primary operators
1.4. Fixed values
1.5. Modifiers and All syntax operators
1.6. Processing Instructions
1.7. "#!" - comment
1.8. Abstract Definitions and Imports
1.9. Structured Contexts
1.10. Nested context
1.11. Extensibility Constructs

2. Runtime features

2.1. Optimization
2.2. IDE Support
2.3. Better Error Recovery
2.4. Better Command Line Utilities

1. Language Features

1.1. Unicode Support

The current version of the parser has the following defects with respect to Unicode:

Only ASCII characters are supported in identifiers, graphics, brackets, and quotes. The range should be extended to all applicable Unicode characters. Ignorable characters should also be supported, and they should not affect the matching of keywords.
For identifiers, brackets, and quotes, support would be quite straightforward (except for different open and close quotes in strings).
For graphics, the valid set of characters is yet to be decided.
Surrogate characters are not supported in strings.
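
The following is only an illustrative sketch in Python, not code from the ETL parser. It assumes that "ignorable" characters are the Unicode format characters (general category Cf); the helper names strip_ignorable, is_keyword, and utf16_units are hypothetical. It shows keyword matching that is unaffected by ignorable characters, and why characters outside the Basic Multilingual Plane need surrogate-aware handling in a UTF-16 based implementation.

```python
import unicodedata


def strip_ignorable(text):
    """Remove format characters (general category Cf), such as U+00AD SOFT HYPHEN."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def is_keyword(token, keywords):
    """Compare a token with the keyword set after dropping ignorable characters."""
    return strip_ignorable(token) in keywords


def utf16_units(text):
    """Number of UTF-16 code units; a character outside the BMP takes two (a surrogate pair)."""
    return len(text.encode("utf-16-le")) // 2


# "im<SOFT HYPHEN>port" is still recognized as the keyword "import".
print(is_keyword("im\u00adport", {"import"}))
# One ASCII letter plus U+1D11E: code point count versus UTF-16 code unit count.
print(len("a\U0001D11E"), utf16_units("a\U0001D11E"))
```

A parser that indexes strings by UTF-16 code unit has to treat the two units of a surrogate pair as one character, which is the gap the last item in the list refers to.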

1.2. Selective disabling/enabling annotations

Currently annotations are disabled or enabled for all statements in a context. However, for some statements annotations might be useless. In that case it might make sense to allow disabling annotations for just these statements.

1.3. Primary operators

The composite "f" operators are planned to be renamed to primary operators, and this will be reflected in the syntax.

1.4. Fixed values

It is planned to add the ability to insert value tokens into the event stream that do not correspond to any text in the input stream. This would allow AST nodes to be reused more widely (for example, Java's "conditional and" and "conditional or" operators would be expressible using the "if" operator).
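
As a rough illustration of the idea, the sketch below models an event stream in Python. The Token class, its event kinds, and the function names are hypothetical and do not reflect the actual ETL event model. It shows how a fixed "false" or "true" value, which has no counterpart in the source text, lets both conditional operators be reported as the same "If" node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str            # "start", "end", or "value" (hypothetical event kinds)
    text: str
    fixed: bool = False  # True when the token has no counterpart in the source text


def conditional_and(left, right):
    """Events for `left && right`, expressed as: if left then right else false."""
    return [
        Token("start", "If"),
        Token("value", left),
        Token("value", right),
        Token("value", "false", fixed=True),  # inserted by the grammar, not read from input
        Token("end", "If"),
    ]


def conditional_or(left, right):
    """Events for `left || right`, expressed as: if left then true else right."""
    return [
        Token("start", "If"),
        Token("value", left),
        Token("value", "true", fixed=True),   # inserted by the grammar, not read from input
        Token("value", right),
        Token("end", "If"),
    ]


for token in conditional_and("a", "b"):
    print(token)
```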

1.5. Modifiers and All syntax operators

Another planned feature is checking the multiplicity of modifiers in the parser, so that there could be at most one modifier for each single-assignment variable and only one modifier with a specified value.

It is also planned to introduce an "all" operator that matches its content in any order, but each part only once.

Both features require adding state to the parser in order to avoid an exponential number of states.
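
To see why extra state helps: a pure state machine would need a distinct state for every subset of parts already matched (2^n subsets for n parts), whereas a matcher that carries a set of seen parts needs only one loop. The Python sketch below is a simplified illustration with hypothetical names, not the planned implementation; it assumes parts are single tokens and that each part may occur at most once.

```python
def match_all(tokens, parts):
    """Match leading tokens against `parts` in any order, each part at most once.

    Returns the set of matched parts and the number of tokens consumed.
    """
    seen = set()   # the extra parser state: which parts have been matched so far
    consumed = 0
    for token in tokens:
        if token not in parts:
            break  # the first token that is not a part ends the match
        if token in seen:
            raise ValueError(f"duplicate part: {token!r}")
        seen.add(token)
        consumed += 1
    return seen, consumed


modifiers = {"public", "static", "final"}
print(match_all(["static", "public", "void", "main"], modifiers))
```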

1.6. Processing Instructions

Block comments of the form /*? ?*/ and line comments that start with //? will be treated as processing instructions. A processing instruction will start with a name (an ETL identifier). Processing instruction names that start with etl will be reserved. The processing instruction etl (which must be the first element in the file, possibly after a #! comment) will be used to specify the version of the specification that is used and the character set of the file.
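
A minimal sketch of recognizing such comments is shown here in Python. It assumes an ASCII-only identifier pattern for the name, and the body syntax used in the example (version="0.3.0") is purely hypothetical, since the text does not define it. The function name is also an assumption.

```python
import re

# Name = simplified ASCII identifier (assumption); the rest of the comment is the body.
_LINE_PI = re.compile(r"//\?\s*([A-Za-z_][A-Za-z0-9_]*)\s*(.*)")
_BLOCK_PI = re.compile(r"/\*\?\s*([A-Za-z_][A-Za-z0-9_]*)\s*(.*?)\s*\?\*/", re.DOTALL)


def parse_processing_instruction(comment):
    """Return (name, body, reserved) for a processing instruction, or None for a plain comment."""
    text = comment.strip()
    for pattern in (_LINE_PI, _BLOCK_PI):
        match = pattern.fullmatch(text)
        if match:
            name, body = match.group(1), match.group(2).strip()
            # Names starting with "etl" are reserved.
            return name, body, name.startswith("etl")
    return None


print(parse_processing_instruction('//? etl version="0.3.0"'))
print(parse_processing_instruction("/*? myTool some data ?*/"))
print(parse_processing_instruction("// an ordinary comment"))
```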

1.7. "#!" - comment

The character sequence #! at the beginning of the file will be treated as a line comment to support creating scripts on Unix-like platforms.

1.8. Abstract Definitions and Imports

Abstract contexts usually have extension points that should be implemented by the including context. Two kinds of extension points are usually used: imports and defs.

One of the planned features is support for abstract imports and definitions. A context that includes such an abstract context will have an error if these definitions and imports are not redefined.

The abstract def and import constructs will have no body and may appear only in an abstract context:

   import abstract expressions;
   def abstract MethodModifiers;

All abstract imports and definitions must be resolved in a non-abstract context.

This feature would be most useful for developers of grammars, who will be told when they have forgotten to implement a required extension point.
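
The check could be sketched as follows. This is a simplified Python model with hypothetical class and field names, not the grammar compiler's data structures; it assumes that a concrete definition anywhere in the include graph resolves an abstract one with the same name.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    name: str
    abstract: bool = False
    includes: list = field(default_factory=list)        # included Context objects
    abstract_members: set = field(default_factory=set)  # "def abstract X" / "import abstract X"
    members: set = field(default_factory=set)           # concrete defs and imports


def unresolved(context):
    """Abstract members visible from `context` that have no concrete definition."""
    required, provided = set(), set()
    stack, visited = [context], set()
    while stack:
        ctx = stack.pop()
        if id(ctx) in visited:
            continue
        visited.add(id(ctx))
        required |= ctx.abstract_members
        provided |= ctx.members
        stack.extend(ctx.includes)
    return required - provided


def check(context):
    """Abstract contexts may leave members unresolved; non-abstract contexts may not."""
    if context.abstract:
        return []
    return [f"{context.name}: abstract '{name}' is not defined"
            for name in sorted(unresolved(context))]


base = Context("Base", abstract=True, abstract_members={"expressions", "MethodModifiers"})
impl = Context("Impl", includes=[base], members={"expressions"})
print(check(impl))
```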

1.9. Structured Contexts

Another important feature that did not make it into the current release is support for structured contexts. It is often desirable to allow statements to occur only in a certain order within a context. For example, in the top-level context of Java it is desirable to allow the package statement first, then imports, and then definitions. The switch statement is another example: the default construct in a switch should follow the case constructs.

To allow this, a notion of statement precedence will be introduced. Statements are sorted by precedence. Statements of the same precedence can be mixed together. There will also be statements that have the precedence "any". These statements may occur at any precedence level. An example of an "any" statement is the blank fallback statement.
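
The ordering rule can be illustrated with a small Python sketch. It assumes that precedence levels are integers and that lower levels come first; the ANY marker, the function name, and the precedence table are hypothetical.

```python
ANY = None  # marker for statements that may occur at any precedence level


def first_misplaced(statements, precedence):
    """Return the index of the first statement that breaks the ordering, or None."""
    current = None
    for index, kind in enumerate(statements):
        level = precedence[kind]
        if level is ANY:
            continue  # "any" statements never affect or violate the order
        if current is not None and level < current:
            return index
        current = level
    return None


# Hypothetical precedence table for the top-level context of Java.
java_top_level = {"package": 0, "import": 1, "class": 2, "interface": 2, "blank": ANY}

print(first_misplaced(["package", "import", "blank", "import", "class", "interface"],
                      java_top_level))
print(first_misplaced(["import", "package"], java_top_level))
```

Statements of the same level (class and interface here) can be mixed freely, while a package statement after an import is reported as misplaced.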

1.10. Nested context

Some contexts use a dependent context to implement some features. These dependent contexts usually need to refer to imports in the primary context or to the primary context itself. An example is the switch statement in the samples.

To simplify work with such utility contexts, it is currently planned to implement nested contexts. These nested contexts will inherit imports and defs from the parent context. The child context will be able to refer to the parent context using the ".." import name. When the parent context is included into some other context, the child context is automatically included as well, and there will be an include relationship between similarly named nested contexts.

There may be some restrictions on what nested contexts will be able to do.

This feature would be just a convenient shortcut. It is possible to achieve the same in another way, but it is somewhat inconvenient.

1.11. Extensibility Constructs

It is planned to add the following minor extensibility constructs to the language.

1.11.1. Suppress

The suppress construct will look like the following:

   suppress Name; 
   suppress Name from IncludedGrammar; 
   suppress Name from IncludedGrammar.includedContextName;

The construct will suppress a definition that is defined in the specified grammar and context.

The last form in the examples is available only for definitions and imports in a context. The other forms are available for grammar imports and contexts.

1.11.2. Rename

The rename construct will look like the following:

   rename Name as NewName; 
   rename Name from IncludedGrammar as NewName; 
   rename Name from IncludedGrammar.includedContextName as NewName;

The construct will rename a definition from an included grammar or context to another name.

The last form in the examples is available only for definitions and imports in a context. The other forms are available for grammar imports and contexts.

2. Runtime features

2.1. Optimization

The following optimizations are planned:

Peephole optimization. Currently the compiler generates a state machine in which choices transfer control to other choices with practically the same set of alternatives. It is planned to implement a peephole optimization in which a choice refers directly to the state referred to by the next choice, without repeating the intermediate choice. This process will also remove the nop states that are currently added to the state machine to simplify the compilation process.
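
A toy version of this optimization is sketched here in Python. The state encoding is hypothetical and much simpler than the compiler's real state machine. It assumes that a choice only inspects the current token without consuming it (so a second choice on the same token can be skipped), and that there are no cycles consisting only of nop states. Unreachable states are left in place.

```python
# Hypothetical state encoding:
#   ("nop", next)   ("choice", {token: next})   ("action", name, next)   ("stop",)

def resolve(states, target, token=None):
    """Follow nop states, and choices on the same token, to the state that does real work."""
    seen = set()
    while target not in seen:
        seen.add(target)
        state = states[target]
        if state[0] == "nop":
            target = state[1]
        elif state[0] == "choice" and token is not None and token in state[1]:
            target = state[1][token]
        else:
            break
    return target


def peephole(states, start):
    """Return (optimized states, new start state) with nops removed and choices short-circuited."""
    optimized = {}
    for name, state in states.items():
        if state[0] == "nop":
            continue  # every reference to it is redirected below
        if state[0] == "choice":
            optimized[name] = ("choice", {tok: resolve(states, nxt, tok)
                                          for tok, nxt in state[1].items()})
        elif state[0] == "action":
            optimized[name] = ("action", state[1], resolve(states, state[2]))
        else:
            optimized[name] = state
    return optimized, resolve(states, start)


machine = {
    "s0": ("choice", {"a": "s1", "b": "s2"}),
    "s1": ("choice", {"a": "n1", "c": "s3"}),
    "n1": ("nop", "s3"),
    "s2": ("action", "reportB", "n1"),
    "s3": ("stop",),
}
print(peephole(machine, "s0"))
```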

2.2. IDE Support

Currently grammars, once compiled, live in the grammar cache forever. This is suitable behavior for most command line tools, which work for a limited time and usually with a limited set of grammars. However, GUI and server tools might require mechanisms to evict grammars from the cache in order to save memory and to use the most current version of a grammar.

It is planned to provide a pluggable grammar cache mechanism and an implementation of it that is compatible with a typical IDE plug-in architecture. This implementation will remove grammars when the plug-ins that provide them are stopped and will remove from the cache grammars that have changed in the project file system. Also, non-system grammars will be evicted from the cache after a certain timeout.
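
One possible shape for such a cache is sketched here in Python. The class name, method names, and the compile callback are hypothetical and are not part of any existing ETL API. Explicit invalidation stands in for the "plug-in stopped" and "file changed" notifications, and non-system grammars expire after an idle timeout.

```python
import time


class EvictingGrammarCache:
    """Cache of compiled grammars with explicit invalidation and idle-timeout eviction."""

    def __init__(self, compile_grammar, timeout_seconds, clock=time.monotonic):
        self._compile = compile_grammar
        self._timeout = timeout_seconds
        self._clock = clock
        self._entries = {}  # grammar id -> (compiled grammar, last access time, is_system)

    def get(self, grammar_id, is_system=False):
        """Return the compiled grammar, compiling it if it is absent or was evicted."""
        self.evict_expired()
        entry = self._entries.get(grammar_id)
        compiled = entry[0] if entry else self._compile(grammar_id)
        self._entries[grammar_id] = (compiled, self._clock(), is_system)
        return compiled

    def invalidate(self, grammar_id):
        """Call when the providing plug-in stops or the grammar file changes."""
        self._entries.pop(grammar_id, None)

    def evict_expired(self):
        """Drop non-system grammars that have not been used within the timeout."""
        now = self._clock()
        for grammar_id, (_, last_used, is_system) in list(self._entries.items()):
            if not is_system and now - last_used > self._timeout:
                del self._entries[grammar_id]


cache = EvictingGrammarCache(lambda gid: f"compiled:{gid}", timeout_seconds=60)
print(cache.get("project/Sample.g.etl"))
```

Passing the clock as a parameter keeps the eviction policy testable without real waiting.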

2.3. Better Error Recovery

Smarter error recovery is planned. It will try to use keywords, list separators, and block starts during the recovery process.
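
The text does not specify the recovery algorithm. As one possible reading, the Python sketch below uses classic panic-mode recovery: after an error, tokens are skipped until one of the chosen synchronization tokens is found. The function name and the token set are hypothetical.

```python
def recover(tokens, position, sync_tokens):
    """Skip from `position` to the next synchronization token.

    Returns the index of that token, or len(tokens) if none is found.
    """
    while position < len(tokens) and tokens[position] not in sync_tokens:
        position += 1
    return position


# Hypothetical synchronization set: a keyword, a list separator, and a block start.
sync = {"if", ",", "{"}
tokens = ["foo", "@", "@", ",", "bar"]
resume_at = recover(tokens, 1, sync)
print(resume_at, tokens[resume_at:])
```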

2.4. Better Command Line Utilities

The current command line utilities are quite inconsistent and inconvenient to use. The next version will have more consistent support for XML catalogs and better handling of command line parameters. Currently only limited documentation is available in the installation instructions.