DSLs are a hot topic; a good introduction can be found at this link.
It is useful to contrast ETL with two other technologies that allow the creation of domain-specific languages:
XML is a similar framework for creating domain-specific languages; however, it is not well suited to defining languages based on statements and expressions. For an example of an attempt to define such a language, see http://www.o-xml.org/spec/langspec.html (they actually cheat a bit, because they use an expression language in some places). The resulting language is so cumbersome that it is practically unusable. On the other hand, XML has been quite successful for DSLs oriented towards document construction (for example DocBook) or towards machine-processable data exchange (Web Services, although people dislike XML's verbosity in this area).
ETL does not directly compete with XML, because it does well the things for which XML is too verbose to be useful as a surface syntax, and it has disadvantages in areas where XML shines. For example, ETL is unsuitable for writing articles like this one. For web services, an ETL parser is unjustifiably complex, and easy reading and writing by humans is not important in that area (although it is still required that the text can be analyzed, and XML provides this).
The Eclipse GMF framework has a somewhat similar purpose, but it is targeted at the creation of graphical domain-specific languages. For some purposes such frameworks have been very successful (UML, ER diagrams). For others they are not so good; for example, a diagram for a complex BPEL process quickly gets out of control. However, the XML representation is not much better.
There are gray areas where several approaches are applicable. In those areas it might be useful to have several views of the same concept.
Note that statement/expression-oriented languages are created very often. Examples include:
Relax NG Compact Syntax (http://www.relaxng.org/compact-20021121.html)
Entity Catalogs (http://www.oasis-open.org/specs/tr9401.html)
The Drools language (http://drools.codehaus.org/) (it uses XML as the basis for its DSL, and the resulting language is difficult to read).
Hibernate mapping language (uses XML).
New research programming languages (for example BitC, http://www.coyotos.org/docs/bitc/spec.html). Another interesting link is http://blog.intentionalsoftware.com/intentional_software/2005/12/computer_langua.html.
Antlr and yacc grammar definition syntaxes.
Currently there are three basic approaches to creating support tools for a DSL:
Base the language on XML.
Use UML or other graphical tools and generate code from the model.
Create a custom grammar and generate parsers using LL or LR parser generators.
If approach 1 is selected, the resulting language is usually too cumbersome to be written directly by humans. Even schema-aware editors do not help much: they do reduce typing, but even reading requires too much effort.
With approach 2, things are sometimes better. However, it is quite difficult to update the model: the user has to use the mouse rather than the keyboard, and this is significantly slower for most users.
When the constructs expressed by the language are very high-level, this can still be better from a long-term point of view, but it significantly raises the barrier at which the tools become useful. Another significant problem is accessibility: visually impaired people will have trouble using visual tools, while other users (for example movement-impaired people) may have trouble with tools that require a lot of typing.
With approach 3 it is very easy to start: one can define a grammar for a language that suits the problem domain, and even generate a parser from the grammar using tools like yacc and ANTLR. However, it is very hard to move beyond this point.
Developers who take approach 3 face the following problems later on:
Generated parsers work well on correct source, but error recovery is very difficult and not automated. Because LL and LR grammars are very flexible, it is possible to define almost any language, and this precludes a generic, meaningful error recovery policy. By default, generated parsers usually stop at the first error in the source.
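For illustration, the usual hand-rolled workaround is so-called panic-mode recovery: report the error, then skip tokens until a plausible synchronization point such as a semicolon or a closing brace. The sketch below only outlines that technique; the Lexer, TokenType and SyntaxError types are invented for the example and do not belong to any real tool.

import java.util.ArrayList;
import java.util.List;

/** A minimal sketch of "panic mode" error recovery in a hand-written parser. */
final class RecoveringParser {
    /** Minimal token model assumed for the example; not part of any real tool. */
    enum TokenType { SEMICOLON, RBRACE, EOF, OTHER }
    interface Lexer { TokenType current(); void advance(); }
    static final class SyntaxError extends Exception {
        SyntaxError(String message) { super(message); }
    }

    private final Lexer lexer;
    private final List<String> errors = new ArrayList<>();

    RecoveringParser(Lexer lexer) { this.lexer = lexer; }

    void parseStatements() {
        while (lexer.current() != TokenType.EOF) {
            try {
                parseStatement();              // normal path
            } catch (SyntaxError e) {
                errors.add(e.getMessage());    // report, do not abort the whole parse
                skipToSynchronizationPoint();  // resume at a plausible statement boundary
            }
        }
    }

    /** Skip tokens until something that usually ends a statement. */
    private void skipToSynchronizationPoint() {
        while (lexer.current() != TokenType.EOF) {
            TokenType t = lexer.current();
            lexer.advance();
            if (t == TokenType.SEMICOLON || t == TokenType.RBRACE) return;
        }
    }

    private void parseStatement() throws SyntaxError {
        // language-specific; would throw SyntaxError on unexpected input
        lexer.advance();
    }
}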
Incremental parsers, which are required by editors (for example in Eclipse), are difficult to develop; they are usually coded manually for each language.
Automatically generated parsers support only a limited set of usage modes. For example, a parser might build an AST or execute actions when some construct is detected; the actions are usually hard-coded into the grammar. Also, generated parsers usually work in the push model (which in the XML world corresponds to the SAX API) rather than the pull model (which corresponds to the StAX API); the two are contrasted below.
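The difference between the two models can be seen directly in the XML APIs mentioned above. In the push model (SAX) the parser drives the application through callbacks; in the pull model (StAX) the application asks the parser for the next event. Both fragments below use the standard Java XML APIs; the tiny document is just a placeholder.

import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PushVersusPull {
    static final String DOC = "<order><item/></order>";

    public static void main(String[] args) throws Exception {
        // Push model (SAX): the parser calls back into application code.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                System.out.println("push: " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(DOC)), handler);

        // Pull model (StAX): application code asks the parser for the next event.
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(DOC));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                System.out.println("pull: " + reader.getLocalName());
            }
        }
    }
}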
Parser generators produce code that has to be compiled and then used. This code is usually strongly coupled with a specific version of the runtime; it is quite common that two ANTLR-generated parsers cannot live together because of a version incompatibility between their runtimes. Compare this with the XML world, where different parsers provide the same external interface and use the same grammar definition files (DTD and XML Schema), which can be updated on the fly.
It is impossible for the user of a compiler to update the grammar recognized by its parser in order to use some minor extension of the language. For example, a user cannot add support for the C# "using" statement to a Java compiler. Another possible example is the "foreach" statement and enumerations, which were missing until Java 1.5. Such a tool extension requires two things:
Adding support for the new constructs to the parser. This is very hard with current parser technologies, particularly considering that many production-quality parsers are written manually in order to support error recovery.
Adding support to the compiler. This is the much easier part, because the technology for extending transformation components is well developed; compilers often use it internally to transform high-level constructs into more primitive forms, as illustrated below. This is just a question of a compiler plug-in architecture.
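The Java 5 "foreach" statement mentioned above is a good example of this split: the parser change could only be made by the compiler vendor, while the compiler handles the construct by lowering it to a more primitive form. The snippet below shows the high-level statement and, roughly, the iterator loop it is reduced to (plain Java, nothing framework-specific).

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ForeachLowering {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");

        // High-level construct added in Java 5: the enhanced "for" statement.
        for (String name : names) {
            System.out.println(name);
        }

        // Roughly what the compiler lowers it to: an explicit Iterator loop,
        // which is all that earlier versions of the language could express.
        for (Iterator<String> it = names.iterator(); it.hasNext(); ) {
            String name = it.next();
            System.out.println(name);
        }
    }
}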
This could look like a minor issue: why would one want to change the compiler? But it looks that way only because we have become used to the restriction. It disables user innovation at the language level.
At the code library level there is a lot of innovation, because users can take reusable libraries and develop custom code based on them. The best ideas from custom code are then selected and integrated back into reusable libraries.
However, at the language level there is much less innovation by compiler users. Creating a new dialect of a language takes too much effort, and a new feature has to be used in at least three projects before it can be considered reusable; most of the time it is economically unfeasible to invest in a new tool chain. If code were written under the same constraints, we would not be able to define even simple procedures and would have to ask our library providers for new ones; all custom code would be one big main method.
The only people who have full power to innovate are compiler developers. Everyone else has to live with the restrictions laid down by them.
I think that one of the reasons for the popularity of XML is that it allowed easy innovation at the syntax level, together with a ready-to-use tool chain that supported the process.
There is no standard, portable syntax for defining grammars. Each tool uses its own tool-specific language, which makes it impossible to publish executable specifications: the specification of a language has to be transformed into a tool-specific one.
Languages do not mix well. To create a new language that combines features of two, one has to build a tool chain almost from the ground up. For example, there are the Java and SQL languages, and there is also a hybrid language named SQLJ. It has not gained much support because it requires a separate tool chain: its own compiler and its own editor support, and for the editor a new parser has to be written that can work with errors in both the SQL and the Java code.
Another problem is that the languages are very different. SQL code does not look like Java code, and programmers have to switch between different rules for literal values (for example, number literals are different in SQL, and escape sequences do not look like the Java ones) and other syntax peculiarities.
There are no reusable or standard components shared between languages. When a new language is designed, the language designer cannot take some standard module like "OASIS Common Arithmetic Syntax" and reuse it in the language. Such a thing is possible for XML: the authors of the WS-Security grammar reused the XML Signature and XML Encryption grammars.
I think that there are several reasons for these problems.
Languages are defined using constructs at too low a level. When we read a language specification, it talks about statements, operators, parts of statements, and names. However, when we read the grammar, even in the same specification, we see productions and tokens.
Because of this, when we want to introduce a new statement or operator into the language, we have to map it onto updates of the productions. This is routine and error-prone work, and an obvious place for automation; see the sketch below.
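As a purely illustrative example, here is what that mapping looks like in a typical hand-written (or hand-maintained) parser; all names are invented. Introducing one new statement forces coordinated edits in several production-level places at once, which is exactly the routine work that could be automated.

/** Hypothetical fragment of a hand-written parser; adding one statement
 *  ("repeat ... until ...") means touching every marked place below. */
final class MiniParser {
    enum Keyword { IF, WHILE, RETURN /* 1: a new REPEAT keyword would be added here */ }

    void parseStatement() {
        switch (currentKeyword()) {
            case IF:     parseIf();     break;
            case WHILE:  parseWhile();  break;
            case RETURN: parseReturn(); break;
            // 2: a new case for REPEAT would be added here
            default:     parseExpressionStatement();
        }
    }

    // 3: a new production-level method parseRepeat() would be written by hand,
    //    keeping it consistent with the lexer, the AST classes and the
    //    error-recovery code; this is the routine, error-prone part of the work.
    private void parseIf() { /* ... */ }
    private void parseWhile() { /* ... */ }
    private void parseReturn() { /* ... */ }
    private void parseExpressionStatement() { /* ... */ }
    private Keyword currentKeyword() { return Keyword.RETURN; /* placeholder */ }
}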
Basic low-level constructs like blocks and literal values are designed completely independently, and the rules differ between languages. If such languages are mixed together, it is difficult for the language user to understand what is allowed and what is not. It is also difficult to mix languages because of the reserved-word concept: what is a reserved word in one context may have no significance in another.
There are no common, standard interfaces for a language tool chain. For XML there is a standard tool chain that includes editors, parsers, and transformation components. DSL creators either have to use XML, which has a ready tool chain, or they have to reproduce such a tool chain themselves, and producing a tool chain is a very heavy investment.
To summarize the points above, it is believed that these problems of generic parser generators are caused by the fact that the task being solved is too broad. If we reduce the scope of the tool, we can gain the advantages of a common, generic tool chain. After all, this has been done with XML before.
To address these problems, a domain-specific language construction framework named ETL has been created. It is designed with the following principles in mind (in no particular order):
The framework should work with plain text.
It should be possible to use generic text editors that are not aware of ETL to edit ETL texts.
Usage of special hidden markup in the text should not be required.
The framework should not restrict editing operations; in particular, it should be possible to save and load incorrect text in the editor as an intermediate state.
It should be possible to define a language that is reasonably easy to read and write. As a dog-food test, the grammar definition language should be written in itself and should be parsed by the same parser pipeline.
It should use high-level constructs like operators and statements, because these constructs are what is actually used in the language definition process. Translation of these constructs into an executable parser should be the task of the tool chain rather than the language designer.
It should be possible to define a new language that extends an existing language with new operators and statements.
It should be possible to define and reuse language modules.
It should be possible to redefine existing operators and statements when a language is extended.
It should be possible to define languages independently and to combine them; i.e. things like the "OASIS Common Arithmetic Syntax" mentioned above should become possible.
It should have common structural principles for organizing source code, at least at the lexical and phrase levels.
It should be possible to create a common, reusable tool chain that contains the following components (we already have this for XML):
A generic parser with a working error recovery model and a syntax construct identification model. The parser should be useful in different situations such as editors, interpreters/shell scripts, compilers, and code analyzers. Note that editors and compilers put very heavy requirements on error recovery: basically, the source should always parse, and errors should be reported.
A generic editor that can be specialized for a specific language, uses the generic parser, and provides basic services like outline and syntax highlighting. This is believed to be possible with planned additions to the parser, but it has not yet been tested.
The framework should be suitable for defining different classes of languages based on statements and expressions, including:
Imperative
Scripting
Functional
Rule-based
Such languages should be definable in a way that is more or less natural to them.
Note that it is not a requirement to be able to define existing languages, except by sheer luck. The framework is targeted at the construction of new languages; supporting old languages would have pushed the framework back into the production/token area. This is similar to XML: it is not possible to define the RTF syntax using XML, but it is possible to define an XML language for documents.
The framework should follow MDA principles. From a logical point of view, the parser output should be a model that can be fed further into model processing tools.
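To make this last principle a bit more concrete, one possible shape for such an output, shown here with invented names rather than the actual ETL API, is a pull stream of structural events that model-processing tools can consume.

/** Invented illustration of "parser output as a model": a pull stream of
 *  structural events. The names below are hypothetical, not the ETL API. */
interface TermEventReader {
    enum Kind { OBJECT_START, OBJECT_END, PROPERTY_START, PROPERTY_END, VALUE, EOF }
    Kind next();
    String name();   // object/property name for START events, token text for VALUE
}

final class ModelBuilder {
    /** Consume the event stream and count objects, as a model tool might. */
    static int countObjects(TermEventReader reader) {
        int objects = 0;
        for (TermEventReader.Kind k = reader.next(); k != TermEventReader.Kind.EOF; k = reader.next()) {
            if (k == TermEventReader.Kind.OBJECT_START) {
                objects++;
            }
        }
        return objects;
    }
}

A model builder, a compiler back end, or an outline view in an editor could all be driven from the same kind of stream.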
The project is more or less successful in following these principles. With any luck, ETL will be a viable replacement for XML in domains where XML is currently misused only because it has good tool support.
Note that the resulting language definition framework is able to do more in the tool chain for domain-specific languages precisely because the set of allowed languages is restricted. The allowed languages are a strict subset of the LL(1) languages, and not even all LL(1) languages are supported; the languages must follow the rules of the lexical and phrase levels.
Tools like ANTLR and yacc can handle a much larger set of languages, but for that larger set far fewer common tools can be developed, because those languages have much less in common. And because they have so little in common, it is not possible to merge them cleanly.