Revision History | |
---|---|
Revision 0.2.1 | 2009-01-19 |
The specification finally carries much more meat than before. EBNF is provided to specify some layers, and a definition of the grammar language in itself is included in the document. | |
Revision 0.2.0 | 2006-02-05 |
This is the first draft of the specification. ETL now uses itself to define grammar. |
Abstract
This specification describes the syntax and semantics of the ETL meta-language. This language is intended to provide a framework for the creation of human-readable and human-writable languages that can be extended and combined together.
The language consists of three layers: Lexical, Phrase, and Syntax. Each layer delimits the underlying layer and annotates objects of the underlying layer.
Below these layers is a primitive layer provided by the underlying runtime. It produces a sequence of characters from some data source.
On the lexical level, the character stream is translated into a sequence of tokens.
On the phrase level, the stream of tokens is translated into blocks, segments, and annotated tokens. Tokens are annotated as belonging to one of the following classes:
Ignorable
Control
Significant
Only significant tokens have to be considered by other levels during parsing. The others are just passed through.
On the syntax level, source code is mapped to an abstract syntax tree.
Unlike most other syntax definition frameworks, this level uses the notions of statements and operators rather than some form of BNF.
There are the following reasons for this:
The abstractions of statement and operator have been used by language designers for a long time. However, these abstractions are not directly expressed by meta-languages.
Using higher-level constructs directly provides more high-level extension points. It is possible to add new statements and operators rather than new productions that would have to be integrated with existing languages.
The lexical level is quite traditional. The current definition of the lexical level is incomplete with respect to Unicode: outside of strings and comments, only ASCII characters are supported.
The lexical level is fixed and cannot be extended by the grammar writer. [1]
Another feature is that there are no keywords on the lexical level. Whether a specific token is a keyword or not is context dependent. Keywords are treated as local rather than global.
Both UNIX and DOS new line styles are supported.
Tab, space, and new line characters outside of strings and comments are considered white space tokens. A conforming parser may merge individual characters into bigger tokens.
Other kinds of Unicode spaces and new lines will be supported in the future.
Comments may be of three kinds:
Block comments are traditional C++/Java/C# comments that start with /* and end with */. Nested block comments are not allowed.
Line comments are traditional C++/Java/C# line comments that start with // and end at the end of the line.
Documentation comments are traditional C# documentation comments. They are a specialization of line comments and are treated as line comments in places where documentation comments are not expected. Documentation comments start with /// and last until the end of the line.
A single format for documentation comments has been selected to make different languages defined with the ETL framework consistent. The C# comment format has an advantage over the Java format in that it allows any text inside comments. In Java documentation comments it is not possible to use block comments or documentation comments in sample code.
ETL recognizes the traditional bracket kinds [], (), and {}. The biggest difference from C-like languages is that '[' can be directly followed by a graphics token, and ']' can be prefixed by a graphics token. The text [++i++] will be parsed as three tokens: [++, i, and ++]. So it is good style to always put spaces after an opening square bracket and before a closing one. The graphics suffixes and prefixes make it easy to introduce brackets with custom semantics into the language.
All brackets are singleton characters. They always consist of a single character from the stream.
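The square-bracket rule can be illustrated with a small tokenizer. This is only a sketch in Python, not the reference lexer: the names `GRAPHICS`, `TOKEN`, and `tokenize` are hypothetical, strings, numbers, and comments are left out, and the graphics character set is taken from the graphics token description in this chapter.

```python
import re

# Graphics characters as listed in this specification.
GRAPHICS = "~+-%^&*|<=:?!>.@/\\$"
G = "[" + re.escape(GRAPHICS) + "]"

# Order matters: bracket forms are tried before plain graphics tokens.
TOKEN = re.compile(
    rf"(?P<open>\[{G}*)"                     # '[' with optional graphics suffix
    rf"|(?P<close>{G}*\])"                   # ']' with optional graphics prefix
    rf"|(?P<ident>[A-Za-z_][A-Za-z_0-9]*)"   # identifier
    rf"|(?P<graphics>{G}+)"                  # plain graphics token
    rf"|(?P<bracket>[(){{}}])"               # round and curly brackets
    rf"|(?P<space>\s+)"                      # white space (dropped below)
)


def tokenize(text):
    """Split text into tokens, dropping white space (simplified sketch)."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "space":
            tokens.append(match.group())
        pos = match.end()
    return tokens
```

With this sketch, `tokenize("[++i++]")` yields the three tokens described above, while `tokenize("[ ++i++ ]")` separates the brackets from the operators, which is why the spaced style is recommended.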
The definition of strings is closest to the C and Java tradition. The biggest difference is that, by themselves, char and string literals do not differ.
The current version supports only two kinds of quotes for strings: " and ' [2].
A string can be prefixed by an identifier, as in UTF8"Some text". The two-letter prefixes that start with the upper case or lower case letter Q (Unicode code points U+0051 and U+0071) are reserved for future use in Unicode support.
The lexer also supports multiline strings that can include newline characters directly in the text. A multiline string starts with a sequence of three quote characters and ends with a sequence of three quotes of the same type. TODO backslash semantics.
Numbers are borrowed from the Ada programming language almost as is. There are two formats for numbers: decimal and based.
Decimal numbers are like those in other languages. However, a floating point number must have digits on both sides of the "." character.
Based numbers allow using any base from 2 to 36 inclusive. In based numbers the exponent is a decimal number that specifies the power of the base by which the mantissa is multiplied. For example, 2#1#E10 is the floating point number 1024, and 36#10.0#E-1 is the floating point number 1.
Numbers can have underscore symbols in the mantissa. They are ignored when the value is evaluated, and can be used to improve the readability of a number, for example: 16#7FFF_FFFF#.
Numbers are divided into two classes: integers and floating point numbers. Floating point numbers are those that contain '.' in the mantissa or have an exponent specified.
Numbers can have an optional suffix that is an identifier. The suffix cannot start with an upper case or lower case letter "E", to avoid conflict with the exponent specification, and it cannot start with an underscore character, to prevent confusion with the separator of number parts. The suffix was introduced in order to support the typed numeric constants used in C and Java. For example: 36#XYZ#ul, 36#XYZ#i32.
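The evaluation rules for numbers can be checked with a short Python sketch. It is an illustration under assumptions, not the normative lexer: the regular expression `NUMBER` and the function `parse_number` are hypothetical names, and the exact placement of underscores is not validated. Exact rational arithmetic is used so that the examples from the text come out exactly.

```python
import re
from fractions import Fraction

# base#mantissa# or a plain decimal mantissa, then optional exponent and suffix.
NUMBER = re.compile(
    r"(?:(?P<base>[0-9]+)#(?P<based>[0-9A-Za-z_]+(?:\.[0-9A-Za-z_]+)?)#"
    r"|(?P<decimal>[0-9][0-9_]*(?:\.[0-9][0-9_]*)?))"
    r"(?:[Ee](?P<exponent>[+-]?[0-9]+))?"
    r"(?P<suffix>[A-DF-Za-df-z][A-Za-z0-9_]*)?"  # must not start with E, e, or _
)


def parse_number(text):
    """Return (value, suffix); value is a float or an int as described above."""
    match = NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    base = int(match["base"]) if match["base"] else 10
    if not 2 <= base <= 36:
        raise ValueError(f"unsupported base: {base}")
    # Underscores are ignored when the value is evaluated.
    mantissa = (match["based"] or match["decimal"]).replace("_", "")
    whole, _, fraction = mantissa.partition(".")
    value = Fraction(int(whole, base))
    for position, digit in enumerate(fraction, start=1):
        value += Fraction(int(digit, base), base ** position)
    # A '.' in the mantissa or an exponent makes the number floating point.
    is_float = "." in mantissa or match["exponent"] is not None
    if match["exponent"] is not None:
        value *= Fraction(base) ** int(match["exponent"])
    return (float(value) if is_float else int(value)), match["suffix"] or ""
```

For instance, `parse_number("2#1#E10")` evaluates 1 times 2 to the power 10, and `parse_number("36#XYZ#ul")` returns the integer value together with the suffix `ul`.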
Identifiers are quite typical. Currently an identifier must start with a letter or underscore and continue with letters, underscores, or digits.
The name of this category and its definition are borrowed from Prolog. A graphics token is a non-empty sequence of the following characters: ~+-%^&*|<=:?!>.@/\$. Such tokens are typically used to define operators. Note that backslash does not have any special meaning in a graphics token.
There are two singleton characters that do not fall into the category of graphics: semicolon and comma.
This section provides an aggregate view of the productions of the lexical level.
[1] This is not as big a problem as it looks. Basic tokens like strings, identifiers, and numbers are repeated from language to language.
[2] The character ` is considered to be graphics, as it is used as graphics in most languages. The prefixed strings give the ability to have a lot of different string tokens anyway.
The notion of the phrase layer was initially borrowed from Dylan. Then the idea of what such syntax should do was significantly affected by Python line syntax.
On the phrase level a forest of blocks and segments is built. A source is a sequence of ignorable tokens and segments. A segment is a sequence of tokens and blocks terminated by a semicolon. A block is a sequence of segments and tokens enclosed in curly brackets.
Ignorable tokens can be ignored while parsing terms. Tokens in this category are white space, new lines, and comments (except documentation comments).
Control tokens are tokens that designate the start and end of blocks and segments. Because they are processed by the phrase parser, they can be ignored on the term parser level.
Significant tokens are the tokens parsed by term parsers. These are tokens like identifiers, numbers, and strings. Documentation tokens are also considered significant tokens.
For example, consider the following text.
{ a ;};
a {b;} c;
/// a
a;
The parser interprets the text above as follows.
There are three segments on the top level.
The first segment consists of one block with an ignorable white space token and a single nested segment with the significant token a and the white space token that follows it. Semicolons and braces are reported as control tokens.
The next segment is similar, but it has significant and ignorable tokens in addition to the block.
The last segment starts with a documentation token, which is followed by white space.
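The interpretation above can be reproduced with a small phrase-level sketch in Python. It is an illustration only: `annotate` and `build_forest` are hypothetical names, the tokenizer is far cruder than the real lexical layer, and treating tokens left before a closing brace as a final unterminated segment is an assumption of this sketch.

```python
import re

TOKEN = re.compile(
    r"(?P<doc>///[^\n]*)"        # documentation comment: significant
    r"|(?P<comment>//[^\n]*)"    # line comment: ignorable
    r"|(?P<space>\s+)"           # white space and new lines: ignorable
    r"|(?P<control>[{};])"       # block and segment delimiters: control
    r"|(?P<word>[^\s{};]+)"      # everything else: significant
)
CLASSES = {"doc": "significant", "word": "significant",
           "comment": "ignorable", "space": "ignorable", "control": "control"}


def annotate(text):
    """Return (token class, token text) pairs for the whole text."""
    return [(CLASSES[m.lastgroup], m.group()) for m in TOKEN.finditer(text)]


def build_forest(tokens, pos=0, in_block=False):
    """Return (segments, next position); a block is {"block": [segments]}."""
    segments, current = [], []
    while pos < len(tokens):
        kind, text = tokens[pos]
        pos += 1
        if kind == "ignorable":
            continue                      # passed through, not part of the tree
        if kind == "significant":
            current.append(text)
        elif text == ";":
            segments.append(current)      # semicolon terminates the segment
            current = []
        elif text == "{":
            block, pos = build_forest(tokens, pos, True)
            current.append({"block": block})
        else:                             # "}"
            if not in_block:
                raise SyntaxError("unexpected '}'")
            if current:                   # assumption: keep unterminated tail
                segments.append(current)
            return segments, pos
    if in_block:
        raise SyntaxError("unclosed block")
    if current:
        segments.append(current)
    return segments, pos
```

Applied to the example text, the sketch produces three top-level segments: one holding a block, one with tokens around a block, and one that starts with the documentation token.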
Phrase grammar
A grammar can be considered a mapping from source to an abstract syntax tree that can be represented by the following model:
AST Object Model
The tree should be well formed.
Properties with the same name should have the same interpretation within the context of the same object type. For example, a simple property may not also be a list property in the context of the same object type, and if it contains objects in one context, it cannot contain values in another. It is a grammar error if the grammar can produce a tree that is not well formed. A processor may fail to detect this error.
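One way to picture the well-formedness constraint is as a consistency check over every place where a grammar assigns a property. The Python sketch below assumes a hypothetical flattened representation, a list of `(object_type, property, is_list, content)` tuples with `content` being either `"object"` or `"value"`; neither this representation nor the name `find_conflicts` comes from the specification.

```python
def find_conflicts(usages):
    """Return (object type, property) pairs that are used inconsistently."""
    first_seen = {}
    conflicts = []
    for object_type, prop, is_list, content in usages:
        key = (object_type, prop)
        shape = (is_list, content)
        # The first usage fixes the interpretation; later ones must agree.
        if first_seen.setdefault(key, shape) != shape and key not in conflicts:
            conflicts.append(key)
    return conflicts
```

The same property name may still be interpreted differently under different object types; only clashes within one object type are reported.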
The AST model is only one of the possible views of the object model. Such a view can be useful for tools that are primarily interested in significant information from the source.
This section specifies the document object model from the point of view of the client APIs. How this document object model is built from source code is specified in the chapter Term Layer Grammar Language. Note that if we remove all nodes except objects, properties, and values, what remains is the AST model, and it should also follow the well-formedness constraint.
This chapter specifies the grammar language using EBNF and plain text. The specification also features a definition of the ETL grammar language using the ETL grammar language itself.
Parsers on the term layer delimit the stream of tokens from the phrase parser by objects and properties. Another way to look at the term parsing process is that the parser maps source code to an AST.
The AST is assumed to consist of objects and properties. Object types are designated by name and namespace. Properties are designated by name.
The AST structure is closely related to the tree produced by the phrase layer. A sequence of segments on the source level or block level is always described by some grammar and by some context.
A statement declaration in the grammar describes the syntax of a single segment.
A grammar defines a mapping from a sequence of tokens to an AST.
The top level element of the grammar is the grammar object. The grammar consists of contexts. Each context has syntax definitions, fragments, and imports.
Syntax definitions define the syntax constructs available in that context. There are two major classes of syntax definitions: statements and operators. Each statement describes the syntax of one segment. Operators describe the syntax of expressions. The syntax of expressions is formed in a modular way, just like in Prolog. Almost the entire operator level is borrowed from Prolog. The only major addition is composite operators, which allow more syntax constructs to appear in the grammar.
Top Level
A context contains syntax definitions. Like grammars, contexts may be included into each other and imported from each other. A context contains three kinds of definitions. Operators are used to define expressions; they are either simple or composite. Simple operators are just like Prolog's. Composite operators allow complex expressions in the place of the operator, so complex operators like the Java new operator can be defined. Statements define what can appear at block level. Among other things, an expression statement may be defined. The context may also contain reusable blocks of syntax.
The context includes all definitions inherited through grammar and context include operations. The context can add new definitions to the set of inherited definitions. It is also possible to redefine inherited definitions. Removing a definition is not directly possible. However, it is possible to create a "def" with the same name as a definition in the parent context; this removes the statement or operator.
Context Definition
This definition specifies a statement for the context. Each segment in the source should match one of the statements. The root syntax expression in the statement must be an object creation expression.
Statement definition
This definition specifies the mapping for attributes in this context. There can be only one attributes definition per context.
Attributes are a standard prefix for all statements in this context. They allow defining constructs like Java annotations and C# attributes. The attributes behave as if they were inserted into the object creation construct inside the statement as its first element. However, because they are common to all statements, this does not cause a conflict.
Because attributes are assumed to be defined inside an object context, they should specify the property to which they are mapped.
Attributes definition
This definition specifies the mapping for documentation in this context. There can be only one documentation definition per context.
If there is no documentation definition in the context, documentation comments are treated as normal line comments, so they will not be seen by parsers that look only for AST events.
Documentation definition
This definition specifies an infix, prefix, or suffix operator. It is also possible to specify primary expressions using this construct.
The associativity is specified as it is done in Prolog.
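The Prolog-style associativity rule ("y" accepts an argument of the same precedence, "x" only one of lesser precedence) can be sketched for infix operators in Python. This is an illustration, not the ETL term parser: `parse` and `parse_expression` are hypothetical names, operands are single tokens, prefix, suffix, and composite operators are left out, and the precedence 900 for "=" is an assumed value chosen for the example (500 for "-" follows the Minus example in the grammar listing).

```python
def parse_expression(tokens, ops, pos, max_prec):
    """Parse an expression of precedence <= max_prec starting at pos."""
    if pos >= len(tokens) or tokens[pos] in ops:
        raise SyntaxError("operand expected")
    # A primary operand has precedence 0.
    left, left_prec, pos = tokens[pos], 0, pos + 1
    while pos < len(tokens):
        op = tokens[pos]
        if op not in ops:
            raise SyntaxError(f"operator expected, got {op!r}")
        prec, kind = ops[op]
        # "y" admits the same precedence, "x" only a strictly lesser one.
        left_max = prec if kind[0] == "y" else prec - 1
        if prec > max_prec or left_prec > left_max:
            break
        right_max = prec if kind[2] == "y" else prec - 1
        right, pos = parse_expression(tokens, ops, pos + 1, right_max)
        left, left_prec = (op, left, right), prec
    return left, pos


def parse(tokens, ops):
    """Parse the whole token list into nested (operator, left, right) tuples."""
    top = max(prec for prec, _ in ops.values())
    tree, pos = parse_expression(tokens, ops, 0, top)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


# Hypothetical operator table: "+" and "-" are yfx, "=" is xfy.
OPS = {"+": (500, "yfx"), "-": (500, "yfx"), "=": (900, "xfy")}
```

With this table, `parse("x + y - z".split(), OPS)` groups to the left and `parse("a = b = c".split(), OPS)` groups to the right, matching the yfx and xfy behaviour described for C operators.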
Operator definition
This definition specifies a reusable fragment that can be used in statement definitions and other fragment definitions.
Fragment definition
ETL syntax expressions are evaluated with respect to some token stream. An expression may consume some tokens and yield some AST-related events. Some expressions do both.
All token expressions consume a single token of the specified kind and yield the token text as a value event.
Fragment definitions are defined with the "def" statement. Note that the referenced fragment is included into the definition, so it is not possible to refer to fragments recursively.
A fragment reference expression has the same effect as if the text of the referenced fragment definition had been written in place of the fragment reference. The only difference is that the fragment definition uses the namespace declarations from the grammar where it is defined.
/// String token definition
def String {
    string(quote="\"") | string(quote='\'');
};
/// This definition provides way of referencing other grammars.
def GrammarRef {
    {
        @ systemId = ref(String);
        % public {
            @ publicId = ref(String);
        }?;
    } | {
        % public {
            @ publicId = ref(String);
        };
    };
};
/// Grammar include
statement Include {
    ^ g:GrammarInclude {
        % include {
            ref(GrammarRef);
        };
    };
};
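Because a fragment reference behaves like textual inclusion, expanding references is a simple substitution that must reject cycles. The Python sketch below assumes a hypothetical in-memory form in which each fragment is a list of items and a reference is the tuple `("ref", name)`; the function name `expand` and this representation are illustrative only, and namespace handling is ignored.

```python
def expand(name, fragments, active=()):
    """Inline all fragment references reachable from the named fragment."""
    if name in active:
        # Inclusion is textual, so a recursive reference can never terminate.
        chain = " -> ".join(active + (name,))
        raise ValueError(f"recursive fragment reference: {chain}")
    result = []
    for item in fragments[name]:
        if isinstance(item, tuple) and item[0] == "ref":
            result.extend(expand(item[1], fragments, active + (name,)))
        else:
            result.append(item)
    return result
```

In the example above, expanding GrammarRef would replace each ref(String) with the content of the String definition, while a fragment that refers to itself, directly or through other fragments, is reported as an error.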
Operand expressions differ from most other kinds of syntax expressions. They neither consume tokens nor generate values at the place where they are defined. They are used to specify properties for the left and right arguments of operations. This syntax expression is allowed only in the top level object and it should not be wrapped.
Currently it is possible to use these expressions anywhere in the top level object. However, future versions of the specification might limit the valid places to the first and last statements of the operation object.
This expression consumes documentation lines and yields them as values. It may appear only in the context of a documentation definition.
Like other expressions that directly yield values, the doclines expression supports wrappers.
This section provides the text of the ETL grammar language defined using the grammar language itself. It is expected that parsers will actually use this definition (possibly stripped of comments) during the parsing process. It is expected that a simplified bootstrap parser will read this grammar (for example, error recovery is not needed for such a parser since it will only read a correct grammar), and then a compiled parser is used to parse all other grammars.
Example A.1. grammar-0_2_1.g.etl
// Reference ETL Parser for Java
// Copyright (c) 2000-2009 Constantine A Plotnikov
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without restriction,
// including without limitation the rights to use, copy, modify, merge,
// publish, distribute, sublicense, and/or sell copies of the Software,
// and to permit persons to whom the Software is furnished to do so,
// subject to the following conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
// BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
// ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
// CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
/// This is a grammar for 0.2.1 syntax defined using 0.2.1 syntax.
///
/// This is a definition for the grammar language itself.
/// This grammar is actually used for parsing other grammars.
/// The text of this specific grammar itself is parsed using bootstrap
/// parser, then the grammar is compiled using normal grammar compilation
/// path.
///
/// The parsing model is AST building. The parser tries to match syntax
/// constructs and creates AST according to specified AST constructs.
/// AST is assumed to contain objects and properties. So AST is directly
/// mappable to object models like C#, JavaBeans, EMOF, MOF, and EMF.
///
/// Properties are identified by name and objects are identified by
/// namespace URI and name. The object identification idea is borrowed from XMI
/// and it is even possible to generate XMI-file without prior knowledge of
/// metamodel.
///
/// There are two kinds of syntax constructs: expressions and statements.
///
/// Expression model is borrowed from Prolog. Operator has been borrowed from
/// Prolog almost as is. Each operator has precedence and associativity.
/// Associativity has format "AfA" where A can be "", "x", or "y". Blank
/// specifies that there is no argument at this place. The "y" matches
/// expression of the same precedence and the "x" matches expression of lesser
/// precedence. For example yfx operator is "+" and "-" from C, x+y-z is parsed
/// as (x+y)-z. Example of xfy operator is assignment operator from C. a=b=c
/// is parsed as a=(b=c). The yfy operator is any associative. If .. is a yfy
/// operator of the same level as "+", then a + b + c .. a + b + c will be
/// ((a+b)+c)..((a+b)+c)
///
/// Also "f" operators have special semantics, they appear on level 0 and
/// are primary operators. They have neither left nor right part.
///
/// Operators can be simple or composite. Simple operators have just a token
/// specified. See the definition of "|" and "?" operators in this grammar.
/// Composite operators allow more complex syntax constructs. Composite
/// operators are usually used to define primary level of the grammar. However
/// they can be used to specify non primary operators too. Java method
/// invocation and array access operators are examples of this. Composite
/// operator can use all syntax expressions used in statements.
///
/// Statement defines content of segment returned from term parser. The
/// statement is defined using generic constructs like pattern, lists, choice,
/// and tokens.
///
/// @author const
grammar net.sf.etl.grammars.Grammar {
    namespace default g = "http://etl.sf.net/etl/grammar/0.2.1";

    /// This abstract context contains definition used across this grammar.
    context abstract Base {

        /// String token definition. The two types of strings are understood
        /// by the grammar language and they have the same semantics.
        def String {
            string(quote="\"") | string(quote='\'');
        };

        /// Documentation mapping definition. This mapping is used
        /// by all statements in the grammar.
        documentation Documentation {
            @ documentation += doclines wrapper g:DocumentationLine.text;
        };

        /// Definition of object name expression. This reusable fragment
        /// is used in places where object name is required. The used prefix
        /// is defined by {@link #GrammarContent.Namespace} definition.
        def ObjectNameDef {
            ^ g:ObjectName {
                @ prefix = identifier;
                % :;
                @ name = identifier;
            };
        };

        /// Definition for wrapper section fragment. This fragment is attached
        /// to syntax expressions that match and produce individual tokens.
        /// The fragment specification causes this token to be wrapped
        /// into the specified object and property.
        def WrapperDef {
            % wrapper {
                ref(WrapperObject);
            };
        };

        /// Definition for wrapper specification fragment. Wrapper
        /// is usually attached to tokens. When token matches, its value
        /// is wrapped into specified object and property.
        def WrapperObject {
            ^ g:Wrapper {
                @ object = ref(ObjectNameDef);
                % .;
                @ property = identifier;
            };
        };
    };



    /// This is base mapping syntax context. Mapping context might
    /// contain blank statements and let statements.
    context abstract BaseMappingSyntax {
        include Base;

        /// Let statement. It is used to define mapping from syntax
        /// to property of the object. The statement matches expression
        /// after "=" or "+=" and yields property assignment. All objects
        /// or values that are encountered in the property are assigned
        /// to property of top object specified by name property.
        ///
        /// The "+=" version assumes list property, the "=" version assumes
        /// property with upper multiplicity equal to 1.
        /// {@example #CompositeOperatorSyntax
        /// @ name = identifier; // match single identifier
        /// // match non-empty sequence of numbers separated by comma
        /// @ numbers += list , { integer | float;};
        /// }
        statement Let {
            % @;
            @ name = identifier;
            @ operator = token(+=) | token(=);
            @ expression = expression;
        };

        /// This is blank statement. It is used to attach attributes
        /// and documentation comments. This may be used for example
        /// for attaching annotations after last statement.
        statement BlankSyntaxStatement {
        };
    };

    /// This is base syntax context. This context might contain object
    /// specification in addition to let statement.
    context abstract BaseSyntax {
        include BaseMappingSyntax;

        /// Utility definition used in different parts of the syntax
        /// It allows to specify a block consisting of syntax statements.
        /// It matches the specified statements in the sequence.
        /// {@example #CompositeSyntax
        /// {% let; @name = identifier; % =; @value = expression;};
        /// }
        def SequenceDef {
            ^ g:Sequence {
                @ syntax += block;
            };
        };

        /// Object expression. It is used to specify context of parsing.
        ///
        /// The expression matches its content, and creates a context object
        /// all properties that are directly or indirectly specified in the
        /// content will be assumed to be specified in context of this object
        /// unless new object directive is encountered.
        ///
        /// It is an error to specify value or object generators inside object
        /// without property layer.
        /// {@example #CompositeSyntax
        /// ^ t:Ref {% ref % (; @name = identifier; % );};
        /// }
        op composite ObjectOp(f) {
            % ^;
            @ name = ref(ObjectNameDef);
            @ syntax = ref(SequenceDef);
        };
    };


    /// This is base syntax context.
    context abstract BaseCompositeSyntax {
        include BaseSyntax;

        /// Expression statement. This statement is a container for expression.
        /// The statement has the same semantics as expression contained in it.
        statement ExpressionStatement {
            @ syntax = expression;
        };
    };


    /// This syntax is used inside documentation statement.
    context DocumentationSyntax {
        include BaseMappingSyntax;
        /// This is doclines expression. It matches sequence of documentation lines.
        /// {@example #DocumentationSyntax;
        /// @ documentation += doclines wrapper xj:DocumentationLine.text;
        /// };
        op composite DoclinesOp(f) {
            % doclines;
            @ wrapper = ref(WrapperDef)?;
        };
    };

    /// This context specifies syntax for simple operations
    context SimpleOpSyntax {
        include BaseCompositeSyntax;

        /// This expression matches left operand in expression. It is used
        /// in let expression to specify property to which left operand
        /// of operator should be assigned.
        /// {@example #ContextContent
        /// op Minus(500, yfx, -) {
        /// @ minuend = left;
        /// @ subtrahend = right;
        /// };
        /// };
        op composite Left(f) {
            ^ g:OperandOp {
                @ position = token(left);
            };
        };

        /// This expression matches right operand in expression. It is used
        /// in let expression to specify property to which right operand
        /// of operator should be assigned.
        op composite Right(f) {
            ^ g:OperandOp {
                @ position = token(right);
            };
        };
    };

    /// This context contains definition of primitive syntax operators
    context abstract CompositeOperatorsSyntax {
        include BaseCompositeSyntax;

        /// Choice operator. It matches one of two alternatives. It is
        /// an error if both alternatives match an empty sequence
        /// or might start with the same token. Note that it is not error
        /// if one alternative starts with generic token kind (for example
        /// string quoted with double quote), and another one starts with
        /// specific token like token "my string".
        op ChoiceOp(xfy,300,|) {
            @ options += left; @ options += right;
        };

        /// First choice operator. It tries to match the first
        /// alternative then the second one. This operator never
        /// produces conflicts even if the second alternative matches
        /// the first one.
        op FirstChoiceOp(xfy,200,/) {
            @ first = left; @ second = right;
        };

        /// This operator matches empty sequence of tokens or its operand.
        op OptionalOp(yf,100,?) {
            @ syntax = left;
        };

        /// This operation matches non empty sequence of specified operand.
        op OneOrMoreOp(yf,100,+) {
            @ syntax = left;
        };

        /// This operation is composition of optional and one or more operators.
        op ZeroOrMoreOp(yf,100,*) {
            @ syntax = left;
        };

    };

    /// This context defines expressions that might happen in context
    /// of modifiers expressions.
    context ModifiersSyntax {
        include BaseMappingSyntax;

        /// This is modifier specification. It can contain optional wrapper.
        op composite ModifierOp(f) {
            % modifier;
            @ value = token;
            @ wrapper = ref(WrapperDef)?;
        };
    };

    /// Free form composite syntax
    context CompositeSyntax {
        include CompositeOperatorsSyntax;

        /// A keyword definition statement. It could happen only
        /// as part of {@link #PatternOp}
        def KeywordStmtDef {
            ^ g:KeywordStatement {
                % % {
                    @ text = token;
                };
            };
        };


        /// This is a sequence of keywords and blocks separated by white spaces.
        /// It is used to define literal syntax patterns in the grammar. Keywords
        /// are just parsed and are not reported to the parser. Contents of the
        /// blocks is a sequence of syntax expressions and it is passed through
        /// to the root sequence. Note that two blocks must be separated by one
        /// or more keyword.
        op composite PatternOp(f) {
            ^ g:Sequence {
                @ syntax += {
                    {
                        ref(KeywordStmtDef);
                        block?;
                    }+ | {
                        block;
                        {
                            ref(KeywordStmtDef);
                            block?;
                        }*;
                    };
                };
            };
        };

        /// Reference to definition in this context or in included context.
        /// The expression is replaced with content of original definition.
        /// Recursion is not allowed to be created using references.
        op composite RefOp(f) {
            % ref % ( {
                @ name = identifier;
            } % ) ;
        };

        /// Block reference. The statement matches block that contains
        /// statements of the specified context. If no context is specified,
        /// reference to current context is assumed. Block produces possibly
        /// empty sequence of objects. And it should happen in context
        /// of list property.
        op composite BlockRef(f) {
            % block;
            % ( {
                @ context = identifier ;
            } % ) ?;
        };

        /// This is reusable fragment used to specify expression precedence
        def ExpressionPrecedenceDef {
            % precedence % = {
                @ precedence = integer;
            };
        };
        /// Expression reference. This reference matches expression from
        /// specified context and of specified precedence. If context is omitted,
        /// current context is assumed. The expression production always
        /// produces a single object as result if parsing is successful.
        op composite ExpressionRef(f) {
            % expression;
            % ( {
                {
                    ref(ExpressionPrecedenceDef);
                } | {
                    @ context = identifier;
                    % , {
                        ref(ExpressionPrecedenceDef);
                    }?;
                };
            } % ) ?;
        };


        /// This construct matches sequence separated by the specified
        /// separator. This construct is just useful shortcut. The separator
        /// can be any specific token. The expression
        /// {@example #CompositeSyntax
        /// list , {
        /// ref(Something);
        /// };
        /// }
        /// is equivalent to
        /// {@example #CompositeSyntax
        /// {
        /// ref(Something);
        /// % , {
        /// ref(Something);
        /// }*;
        /// };
        /// }
        op composite ListOp(f) {
            % list {
                @ separator = token;
                @ syntax = ref(SequenceDef);
            };
        };


        /// This construct matches set of modifiers. This construct
        /// matches any number of modifiers in any order. Each modifier
        /// matches and produces its text as a value. Wrapper specified
        /// for modifiers construct applies to all modifiers inside it
        /// unless overridden by modifier.
        op composite ModifiersOp(f) {
            % modifiers;
            @ wrapper = ref(WrapperDef)?;
            @ modifiers += block(ModifiersSyntax);
        };

        /// This construct matches any token or token specified in brackets.
        /// It produces a value of its text. If no token is specified,
        /// the construct matches any significant token with exception of
        /// documentation comment. See this grammar for numerous examples of its
        /// usage (including this definition).
        ///
        /// Optional wrapper causes wrapping value produced by this expression
        /// into specified wrapper.
        op composite TokenOp(f) {
            % token {
                % ( {
                    @ value = token;
                } % ) ?;
            };
            @ wrapper = ref(WrapperDef)?;
        };
435: /// This operator matches string with specified quote kind.
436: /// The quote must be specified. The operator produces matched text
437: /// as a value.
438: ///
439: /// The operator optionally supports prefixed and multiline strings.
440: /// Only strings that match the specific prefix could be specified.
441: ///
442: /// Optional wrapper causes wrapping value produced by this expression
443: /// into specified wrapper.
444: op composite StringOp(f) {
445: % string % ( {
446: % prefix % = {
447: @ prefix += list | {
448: identifier;
449: };
450: } % , ?;
451: % quote % = {
452: @ quote = ref(String);
453: };
454: % , % multiline % = {
455: @ multiline = token(true);
456: }?;
457: } % );
458: @ wrapper = ref(WrapperDef)?;
459: };
460:
461: /// This operator matches any identifier. The operator produces matched text
462: /// as a value.
463: ///
464: /// Optional wrapper causes wrapping value produced by this expression
465: /// into specified wrapper.
466: op composite IdentifierOp(f) {
467: % identifier;
468: @ wrapper = ref(WrapperDef)?;
469: };
470:
471:
472: /// This operator matches integer without suffix or with specified suffix.
473: /// The operator produces matched text as a value.
474: ///
475: /// Optional wrapper causes wrapping value produced by this expression
476: /// into specified wrapper.
477: op composite IntegerOp(f) {
478: % integer {
479: % ( {
480: % suffix % = {
481: @ suffix += list | {
482: identifier;
483: };
484: }?;
485: } % ) ?;
486: @ wrapper = ref(WrapperDef)?;
487: };
488: };
489:
490:
491: /// This operator matches float without suffix or with specified suffix.
492: /// The operator produces matched text as a value.
493: ///
494: /// Optional wrapper causes wrapping value produced by this expression
495: /// into specified wrapper.
496: op composite FloatOp(f) {
497: % float;
498: % ( {
499: % suffix % = {
500: @ suffix += list | {
501: identifier;
502: };
503: }? ;
504: } % ) ?;
505: @ wrapper = ref(WrapperDef)?;
506: };
507:
508:
509: /// This operator matches any graphics token.
510: /// The operator produces matched text as a value.
511: ///
512: /// Optional wrapper causes wrapping value produced by this expression
513: /// into specified wrapper.
514: op composite GraphicsOp(f) {
515: % graphics;
516: @ wrapper = ref(WrapperDef)?;
517: };
518:
519: };
520:
521: /// Composite operator syntax.
522: /// Note that this definition is oversimplified. There is an additional
523: /// constraint that "left" and "right" expressions may occur only at the top
524: /// level. The construct will possibly be adjusted later.
525: context CompositeOpSyntax {
526: include SimpleOpSyntax;
527: include CompositeSyntax;
528: };
529:
530: /// This context defines content of context statement. So it defines itself.
531: context ContextContent {
532: include Base;
533:
534: /// This is blank statement. It is used to attach attributes
535: /// and documentation comments.
536: statement BlankContextStatement {
537: };
538:
539:
540: /// Operator associativity definition. It matches any valid
541: /// associativity.
542: def OpAssociativity {
543: token(f) | token(xf) | token(yf) | token(xfy) |
544: token(xfx) | token(yfx) | token(fx) | token(fy) | token(yfy);
545: };
546:
547:
548: /// Operator definition. There are two kinds of operators: simple and
549: /// composite.
550: ///
551: /// If the operator definition does not contain a single object creation
552: /// expression it is assumed to have a content wrapped in the object
553: /// creation expression with default namespace and operator name as an
554: /// object name.
555: statement OperatorDefinition {
556: % op;
557: modifiers wrapper g:Modifier.value {
558: @ isComposite = modifier composite;
559: };
560: @ name = identifier;
561: % ( {
562: @ associativity = ref(OpAssociativity);
563: % , {
564: @ precedence = integer;
565: % , {
566: @ text = token;
567: } % ) {
568: @ syntax += block(SimpleOpSyntax);
569: } | % ) {
570: @ syntax += block(CompositeOpSyntax);
571: };
572: } | % ) {
573: @ syntax += block(CompositeOpSyntax);
574: };
575: };
576: };
577:
578: /// Attributes definition. Attributes can be applied only to
579: /// statements. To apply them to expressions, define a composite
580: /// operator that uses the same syntax. Such an operator and the attributes
581: /// declaration can share syntax through a def statement.
582: statement Attributes {
583: % attributes;
584: @ name = identifier;
585: @ syntax += block(CompositeSyntax);
586: };
587:
588: /// Statement definition. Statement attempts to match entire segment.
589: /// If statement matches part of segment and there are some
590: /// unmatched significant tokens left, it is a syntax error.
591: ///
592: /// If the statement definition does not contain a single object creation
593: /// expression it is assumed to have a content wrapped in the object
594: /// creation expression with default namespace and statement name as an
595: /// object name.
596: statement Statement {
597: % statement;
598: @ name = identifier;
599: @ syntax += block(CompositeSyntax);
600: };
601:
602: /// Documentation syntax. It matches documentation comments before
603: /// start of grammar. The definition is used to specify property
604: /// where documentation is put.
605: statement DocumentationSyntax {
606: % documentation;
607: @ name = identifier;
608: @ syntax += block(DocumentationSyntax);
609: };
610:
611: /// A fragment definition. It is used to define reusable parts of the
612: /// syntax. References to definitions are replaced with content of the
613: /// definition, so it is an error for definition to refer to itself
614: /// through ref construct.
615: statement Def {
616: % def;
617: @ name = identifier;
618: @ syntax += block(CompositeSyntax);
619: };
620:
621: /// The include operation causes all definitions, except redefined
622: /// ones, to be included in this context. It is an error if two definitions
623: /// are available using different paths. If a wrapper chain is specified,
624: /// the statements will be wrapped into the specified chain.
625: statement ContextInclude {
626: % include;
627: @ contextName = identifier;
628: @ wrappers += % wrapper {
629: list / {
630: ref(WrapperObject);
631: };
632: }?;
633: };
634:
635: /// Import operation makes context referenceable from this context or
636: /// allows redefinition of context reference.
637: statement ContextImport {
638: % import;
639: @ localName = identifier;
640: % = {
641: @ contextName = identifier;
642: % from {
643: @ grammarName = identifier;
644: }?;
645: };
646: };
647: };
648:
649: /// This context defines grammar content.
650: context GrammarContent {
651: include Base;
652:
653: /// This definition provides way of referencing other grammars.
654: def GrammarRef {
655: {
656: @ systemId = ref(String);
657: % public {
658: @ publicId = ref(String);
659: }?;
660: } | {
661: % public {
662: @ publicId = ref(String);
663: };
664: };
665: };
666:
667: /// This is blank statement. It is used to attach attributes
668: /// and documentation comments.
669: statement BlankGrammarStatement {
670: };
671:
672:
673: /// This is an include statement. Include causes all contexts from
674: /// the included grammar to be added to the current grammar. The definitions
675: /// from the included grammar are added only if the current grammar does not
676: /// have definitions with the same name.
677: ///
678: /// Grammar imports and context imports also follow this inclusion rule.
679: /// It is an error to include two different non-shadowed definitions by
680: /// different include paths.
681: statement GrammarInclude {
682: % include;
683: ref(GrammarRef);
684: };
685:
686: /// This is a grammar import statement. The statement allows contexts of this
687: /// grammar to import contexts from the specified grammar.
688: statement GrammarImport {
689: % import {
690: @ name = identifier;
691: } % = {
692: ref(GrammarRef);
693: };
694: };
695:
696:
697: /// Namespace declaration is used to declare namespace prefix. The
698: /// prefix declaration is local to grammar and is not inherited in
699: /// the case of grammar include.
700: ///
701: /// The namespace can have a default modifier. This namespace will
702: /// be used along with operator or statement name in case when
703: /// there are several children in the definition or when the only
704: /// child is not an object creation expression.
705: statement Namespace {
706: % namespace;
707: modifiers wrapper g:Modifier.value {
708: @ defaultModifier = modifier default;
709: };
710: @ prefix = identifier;
711: % = ;
712: @ uri = ref(String);
713: };
714:
715: /// Context definition. This definition is used to define context.
716: /// Context may be default and abstract. Abstract contexts
717: /// cannot be used for parsing and are used only in context include.
718: /// Abstract contexts may be imported only by abstract contexts.
719: ///
720: /// Default context is a context that is used to parse source when
721: /// no context is specified in doctype.
722: statement Context {
723: % context;
724: modifiers wrapper g:Modifier.value {
725: @ abstractModifier = modifier abstract;
726: @ defaultModifier = modifier default;
727: };
728: @ name = identifier;
729: @ content += block(ContextContent);
730: };
731: };
732:
733: /// This context contains definition of grammar construct itself
734: context default GrammarSource {
735: include Base;
736:
737: /// This is blank statement. It is used to attach attributes
738: /// and documentation comments. It is ignored during grammar
739: /// compilation.
740: statement BlankTopLevel {
741: };
742:
743: /// Grammar statement. It defines grammar. Grammar name is purely
744: /// informative and is used in reported events to identify grammar
745: /// by logical name rather than by the URI that happens to be the current grammar
746: /// location.
747: ///
748: /// Grammar can be abstract; in that case it cannot be instantiated
749: /// and referenced from doctype. It can be only included into other
750: /// grammars.
751: statement Grammar {
752: % grammar;
753: modifiers wrapper g:Modifier.value {
754: @ abstractModifier = modifier abstract;
755: };
756: @ name += list . {identifier;};
757: @ content += block(GrammarContent);
758: };
759: };
760: };
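The GrammarInclude comment in the listing above gives a shadowing rule: an included definition is added only when the current grammar has no definition of the same name, and it is an error when two different non-shadowed definitions arrive by different include paths. The following Python fragment is only a sketch of that rule, not the reference parser's implementation. The names `resolve_includes` and `IncludeConflict` are hypothetical, and definitions are modelled as plain strings naming their origin, on the assumption that the same definition reached by two paths is not a conflict.

```python
class IncludeConflict(Exception):
    """Two different definitions with one name arrived via different includes."""


def resolve_includes(local_defs, included):
    """Merge definitions following the include rule sketched above.

    local_defs: dict mapping name -> definition in the current grammar.
    included:   list of dicts, one per include path, name -> definition.
    Definitions are compared with ==; equal values stand for "the same
    definition reached by two paths" (an assumption of this sketch).
    """
    inherited = {}
    for defs in included:
        for name, definition in defs.items():
            if name in local_defs:
                continue  # shadowed by the current grammar, never a conflict
            if name in inherited and inherited[name] != definition:
                raise IncludeConflict(name)
            inherited[name] = definition
    result = dict(local_defs)
    result.update(inherited)
    return result
```

With two includes that both supply a different `Expr`, the call succeeds only when the current grammar defines its own `Expr`, which shadows both included ones.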
This section provides the grammar for the doctype directive. Parsers normally hard-code this grammar, but they should still generate events exactly as specified in this section.
Example B.1. doctype.g.etl
1: // Reference ETL Parser for Java
2: // Copyright (c) 2000-2007 Constantine A Plotnikov
3: //
4: // Permission is hereby granted, free of charge, to any person
5: // obtaining a copy of this software and associated documentation
6: // files (the "Software"), to deal in the Software without restriction,
7: // including without limitation the rights to use, copy, modify, merge,
8: // publish, distribute, sublicense, and/or sell copies of the Software,
9: // and to permit persons to whom the Software is furnished to do so,
10: // subject to the following conditions:
11: //
12: // The above copyright notice and this permission notice shall be
13: // included in all copies or substantial portions of the Software.
14: //
15: // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16: // EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
17: // MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18: // NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
19: // BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
20: // ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
21: // CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22: // SOFTWARE.
23: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
24:
25: /// This is a grammar for doctype declaration. The doctype can be encountered
26: /// as the first statement in the source code of an ETL-based language. This
27: /// grammar is hard-coded in the parser for obvious reasons. So this file is
28: /// for information only.
29: ///
30: /// Note that this grammar does not support documentation comments. The mapping
31: /// for these comments differs between different contexts and there is no
32: /// universal mapping that is suitable for all.
33: ///
34: /// <author>const</author>
35: grammar net.sf.etl.grammars.DoctypeDeclaration {
36: namespace dc = "http://etl.sf.net/etl/doctype/0.2.1";
37:
38: /// This is the only context in the grammar
39: context default DoctypeContext {
40:
41: /// A definition for string used in the grammar. Two kinds of string are allowed.
42: /// <example>
43: /// 'aaa'
44: /// "aaa"
45: /// </example>
46: def String {
47: string(quote="\"") | string(quote='\'');
48: };
49:
50: /// A doctype statement that declares grammar associated
51: /// with the file. The doctype statement is an obvious rip-off of XML doctype.
52: /// Inline grammar is not supported yet.
53: ///
54: /// System identifier or public identifier or both might be used.
55: /// <example>
56: /// doctype public "-//IDN etl.sf.net/ETL/Grammar 0.2";
57: /// doctype "http://etl.sf.net/2005/etl/grammar.g.etl" public '-//IDN etl.sf.net/ETL/Grammar 0.2';
58: /// doctype 'mygrammar.g.etl';
59: /// </example>
60: statement DoctypeStatement {
61: ^ dc:DoctypeDeclaration {
62: % doctype {
63: {
64: @ systemId = ref(String);
65: % public {
66: @ publicId = ref(String);
67: }?;
68: } | {
69: % public {
70: @ publicId = ref(String);
71: };
72: };
73: % context {
74: @ context = ref(String);
75: }?;
76: };
77: };
78: };
79: };
80: };
The grammar specified in this section should be used to parse a source when the grammar for that source is not available or was compiled with errors. This grammar is able to parse any source without producing syntax errors.
Example C.1. default.g.etl
1: // Reference ETL Parser for Java
2: // Copyright (c) 2000-2009 Constantine A Plotnikov
3: //
4: // Permission is hereby granted, free of charge, to any person
5: // obtaining a copy of this software and associated documentation
6: // files (the "Software"), to deal in the Software without restriction,
7: // including without limitation the rights to use, copy, modify, merge,
8: // publish, distribute, sublicense, and/or sell copies of the Software,
9: // and to permit persons to whom the Software is furnished to do so,
10: // subject to the following conditions:
11: //
12: // The above copyright notice and this permission notice shall be
13: // included in all copies or substantial portions of the Software.
14: //
15: // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16: // EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
17: // MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18: // NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
19: // BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
20: // ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
21: // CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22: // SOFTWARE.
23: doctype public "-//IDN etl.sf.net//ETL//Grammar 0.2.1";
24:
25: /// This is a default grammar. It is used if one of the following happens:
26: /// <ul>
27: /// <li>Doctype directive is missing or it has invalid syntax and default
28: /// grammar is not specified for parser.</li>
29: /// <li>Grammar referenced by doctype statement cannot be located.</li>
30: /// <li>Grammar is located but failed to be parsed because of IO error
31: /// or it is invalid (some syntax or semantic errors).</li>
32: /// </ul>
33: ///
34: /// Note that this grammar is hard-coded and it is provided here just for
35: /// informational purposes.
36: ///
37: /// <author>const</author>
38: grammar net.sf.etl.grammars.DefaultGrammar {
39: namespace default d = "http://etl.sf.net/etl/default/0.2.1";
40:
41: /// The only context in this grammar
42: context default DefaultContext {
43:
44: /// Documentation mapping definition.
45: documentation DefaultDocumentation {
46: @ documentation += doclines wrapper d:DefaultDocumentationLine.text;
47: };
48:
49: /// Default statement that matches anything
50: statement DefaultStatement {
51: @ content += {
52: {
53: ^ d:DefaultBlock { @ content += block; };
54: } | {
55: ^ d:DefaultTokens { @ values += token+; };
56: };
57: }*;
58: };
59: };
60: };
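DefaultStatement above maps every segment to a sequence of two kinds of objects: DefaultTokens for a run of tokens and DefaultBlock for a block, whose content is parsed in the same context again. The Python fragment below sketches the resulting tree shape. It is an illustration under assumptions, not the parser's event model: a segment is modelled as a list whose items are token texts (strings) or blocks (lists of nested segments), consecutive tokens are assumed to be collected greedily into one DefaultTokens object, and the function name `default_statement` is hypothetical.

```python
def default_statement(segment):
    """Group one segment into DefaultTokens and DefaultBlock objects.

    segment: list whose items are either token texts (str) or blocks,
             where a block is a list of nested segments.
    """
    content = []
    run = []  # consecutive tokens collected for one DefaultTokens object
    for item in segment:
        if isinstance(item, str):
            run.append(item)
            continue
        if run:
            content.append({"type": "DefaultTokens", "values": run})
            run = []
        # A block holds statements of the same (default) context.
        content.append({
            "type": "DefaultBlock",
            "content": [default_statement(s) for s in item],
        })
    if run:
        content.append({"type": "DefaultTokens", "values": run})
    return {"type": "DefaultStatement", "content": content}
```

Because every item is either a token or a block, the function has no failure case, which mirrors the statement that this grammar can parse any source.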