The Camlp4 library provides a module ``Grammar'' and a syntax
extension file ``pa_extend.cmo'' to program grammars. All
details on modules, types and functions are described in
chapter 7.

Grammars are created with the function ``Grammar.create''. It takes a
lexer as parameter. A good candidate is the function ``Plexer.make'',
but the user can write his own lexer, provided that it is of type
``Token.lexer''.

Entries are created with the function ``Grammar.Entry.create''. The
first parameter is the grammar,
the second one is a string used to name the entry in error messages
and in entries dump. Entries are created empty, i.e. raising
``Stream.Error'' when called. An entry is composed of entry
precedence levels, the first one being the least precedent and the
last one the most.

An entry is used by applying the function ``Grammar.Entry.parse''. In
case of syntax error, the
exception ``Stream.Error'' is raised, encapsulated by
the exception ``Stdpp.Exc_located'', giving the location
of the error.
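As a minimal sketch of these steps (the entry name "expr" and the
helper function are for illustration only):

   let gram = Grammar.create (Plexer.make ())
   let expr = Grammar.Entry.create gram "expr"

   (* Parsing with a still empty entry raises Stdpp.Exc_located
      encapsulating Stream.Error. *)
   let parse_string s = Grammar.Entry.parse expr (Stream.of_string s)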
Entries are extended with the function ``Grammar.extend''. But since
its interface is quite complicated
and must be used with appropriate type constraints, the Camlp4
library provides a file, named ``pa_extend.cmo'', compatible
with ``pa_o.cmo'' and ``pa_r.cmo'' which creates a new
instruction doing this work.

This instruction is ``EXTEND'', which has the following format:

   EXTEND
     { GLOBAL : global-list ; }
     entry : { position } extension ;
     ...
     entry : { position } extension ;
   END
EXTEND, GLOBAL and END are keywords.
There are some other keywords in this instruction, all in uppercase.

The GLOBAL option lists the entries that are global to the ``EXTEND''
instruction. The other entries are created locally. By default, all
entries are global and must correspond to entry variables visible at
the ``EXTEND'' instruction point.

The optional position tells where the extension must be inserted:

   FIRST: The extension is inserted at the beginning of the
   precedence levels.
   LAST: The extension is inserted at the end of the precedence
   levels.
   BEFORE label: The extension is inserted before the precedence
   level so labelled.
   AFTER label: The extension is inserted after the precedence level
   so labelled.
   LEVEL label: The extension is inserted at the precedence level so
   labelled.

Only LEVEL extends already existing levels: the other cases create
new levels.

The extension is a list of levels separated by ``|'', each with an
optional label and an optional associativity:

   [ { label } { associativity } level-rules
   | ...
   | { label } { associativity } level-rules ]
The associativity is LEFTA, RIGHTA or NONA for respectively left,
right and no associativity; the default is left associativity.

The level-rules are a list of rules separated by ``|'':

   [ { pattern = } symbol ; ... { pattern = } symbol { -> action }
   | ...
   | { pattern = } symbol ; ... { pattern = } symbol { -> action } ]
In the actions, the variable ``loc'' is bound to the source location
of the rule. The action part is optional; by default it is the value
``()''.
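Here is a hedged sketch of such an extension for the entry ``expr''
created earlier, assuming ``pa_extend.cmo'' is loaded and the entry
computes integers; the level labels "plus", "mult" and "simple" are
made up for the example:

   EXTEND
     expr:
       [ "plus" LEFTA
         [ x = SELF; "+"; y = SELF -> x + y
         | x = SELF; "-"; y = SELF -> x - y ]
       | "mult" LEFTA
         [ x = SELF; "*"; y = SELF -> x * y ]
       | "simple"
         [ x = INT -> int_of_string x
         | "("; x = SELF; ")" -> x ] ];
   END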
A symbol can be, among other things, an entry name, a token
constructor, or a keyword, i.e. a string. Two composite symbols are
also available:

   LIST0 and LIST1, whose syntax is:
      LISTx symbol1 { SEP symbol2 }
   representing a list (possibly empty for LIST0, and with at least
   one element for LIST1) of symbol1, whose elements are optionally
   separated by symbol2. The type is t1 list where t1 is the type of
   symbol1 (the result of the optional symbol2 is lost).

   OPT followed by a symbol, meaning this symbol or nothing. The type
   is t option where t is the type of the symbol.
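For instance, here is a hedged sketch using both, with a hypothetical
entry ``definition'' (created like ``expr'' above) whose rule accepts
an optional "inline" keyword and a comma-separated argument list:

   EXTEND
     definition:
       [ [ inline = OPT "inline"; f = IDENT;
           "("; args = LIST0 expr SEP ","; ")" ->
             (* inline : string option, f : string, args : int list *)
             (inline <> None, f, args) ] ];
   END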
Rules can be deleted with the function ``Grammar.delete_rule''. But, like for
``Grammar.extend'', it is not documented. One must use the
instruction ``DELETE_RULE'', generating a call to this
function. This instruction is a syntax extension, loaded together with
the instruction ``EXTEND'' by the file
``pa_extend.cmo''.

The format of ``DELETE_RULE'' is:

   DELETE_RULE
     entry : symbol ; ... symbol
   END
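For example, deleting the addition rule from the sketch given earlier
would read as follows; the symbols must be repeated exactly as in the
rule's definition:

   DELETE_RULE
     expr: SELF; "+"; SELF
   END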
The tokens returned by the lexer are of type Token.t, which is
actually (string * string), the first string being a
constructor (which must be an identifier starting with an uppercase
letter) and the second string the value.

For example, a lexer reading identifiers can use the constructor
IDENT and put the identifier value in the second string: reading
"foo", it returns ("IDENT", "foo"). Similarly, if your lexer
reads integers, you can use INT as constructor and the string
representation of the integer as value, e.g. ("INT", "32").

In the EXTEND statement, you can use as symbol a
constructor with a specific value, e.g:
IDENT "bar"
INT "32"
which recognizes only the identifier "bar" or only the integer
32. Another possible symbol is the constructor alone, which recognizes
any value of this constructor. It is useful to assign it to a pattern
identifier, to use it in the action part of the rule:
p = IDENT
i = INT
Notice that you can use any name for your constructors, provided they
are identifiers starting with an uppercase letter, and not in conflict
with the predefined symbols in the EXTEND statement which are:
SELF, NEXT, LIST0, LIST1, OPT.
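As a hedged illustration, the sketched ``expr'' grammar could be
extended with variables by inserting a rule at the made-up level
"simple", with a hypothetical lookup function:

   EXTEND
     expr: LEVEL "simple"
       [ [ p = IDENT ->
             (* lookup_variable : string -> int, hypothetical *)
             lookup_variable p ] ];
   END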
If you use "" (the empty string) as constructor, you can use the second string
directly in the rules. It is the case in our examples for the
operators "+", "*" and so on.(Token.t * Token.location) from a character stream. The type
Token.t is defined above and Token.location is a couple
of integers giving the input location (the first character of the input
stream having position 0).

If your lexer does not compute locations, you can return (0, 0) as
the location; it does not prevent the system from working, since the
location is used only in error messages.

The module Token provides a function to create a lexer function from an
ocamllex lexing function (see the interface of module
Token). Moreover, this function takes care of the location
stuff.
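For instance, assuming a hypothetical ocamllex-generated module
Mylexer whose entry point token has type Lexing.lexbuf -> Token.t:

   (* Build the main lexing function from the ocamllex entry point;
      Token.lexer_func_of_ocamllex handles the location bookkeeping. *)
   let func = Token.lexer_func_of_ocamllex Mylexer.token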
The lexer parameter of the Grammar.create function is a record
of type Token.lexer, with the following fields:

   func: it is the main lexing function. It is called once
when calling Grammar.Entry.parse with the input character
stream. The simplest way to create this function is to apply
Token.lexer_func_of_parser to your stream parser lexer, or
Token.lexer_func_of_ocamllex to your ocamllex lexing function.

   using: it is a function taking a token pattern as
parameter. When the EXTEND statement first scans all symbols in all
rules, it calls this function with the token patterns encountered (a
token pattern is of type (string * string)). This function
allows you to check the pattern (that the constructor is among the
ones the lexer generates: raise an exception if not) and enter
keywords (if you have keywords) in your keyword table (if you use a
keyword table).

   removing: like using, but called when a rule is
   removed.

   tparse: tells how a token pattern is parsed. It is called each
time the grammar machinery has to compare the input token to a token
pattern of a rule. This function must return ``Some'' with a specific
parser, or ``None'' for the standard token parsing, which is, for the
pattern (p_con, p_prm):

   if p_prm = "" then
     parser [< '(con, prm) when con = p_con >] -> prm
   else
     parser [< '(con, prm) when con = p_con && prm = p_prm >] -> prm
   text: gives the name of a token pattern to be used in
   error messages. You can use the function named lexer_text
   provided in the module Token.
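Putting the fields together, a hand-made lexer could be sketched like
this; next_token and check_keyword stand for hypothetical functions
of your own:

   (* A sketch of a complete Token.lexer record, assuming next_token
      is a stream parser returning (Token.t * Token.location) values
      and check_keyword validates constructors and records keywords. *)
   let lexer =
     { Token.func = Token.lexer_func_of_parser next_token;
       Token.using = check_keyword;
       Token.removing = (fun _ -> ()); (* nothing to undo on removal *)
       Token.tparse = (fun _ -> None); (* standard token parsing *)
       Token.text = Token.lexer_text }

   let gram = Grammar.create lexer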