This directory contains data needed by Bison.

# Directory Content

## Skeletons

Bison skeletons: the general shapes of the different parser kinds, which are
specialized for specific grammars by the bison program.

Currently, the supported skeletons are:

- yacc.c
  It used to be named bison.simple: it corresponds to C Yacc-compatible
  LALR(1) parsers.

- lalr1.cc
  Produces a C++ parser class.

- lalr1.java
  Produces a Java parser class.

- glr.c
  A Generalized LR C parser based on Bison's LALR(1) tables.

- glr.cc
  A Generalized LR C++ parser. Actually a C++ wrapper around glr.c.

These skeletons are the only ones supported by the Bison team. Because the
interface between skeletons and the bison program is not finished, *we are
not bound to it*. In particular, Bison is not mature enough for us to
consider that "foreign skeletons" are supported.

## m4sugar

This directory contains M4sugar, sort of an extended library for M4, which
is used by Bison to instantiate the skeletons.

## xslt

This directory contains XSLT programs that transform Bison's XML output into
various formats.

- bison.xsl
  A library of routines used by the other XSLT programs.

- xml2dot.xsl
  Conversion into GraphViz's dot format.

- xml2text.xsl
  Conversion into text.

- xml2xhtml.xsl
  Conversion into XHTML.

# Implementation Notes About the Skeletons

"Skeleton" in Bison parlance means "backend": a skeleton is fed by the bison
executable with LR tables, facts about the symbols, etc., and it generates
the output (say parser.cc, parser.hh, location.hh, etc.). Skeletons are only
in charge of generating the parser and its auxiliary files; they do not
generate the XML output, the parser.output reports, nor the graphical
rendering.

The bits of information passed from bison to the backend are named
"muscles". Muscles are passed to M4 via its standard input, as a set of m4
definitions. To see them, use `--trace=muscles`.

Except for muscles, whose names are generated by bison, the skeletons have
no constraint at all on the macro names: there is no technical or
theoretical limitation, and as long as you generate the output, you can do
what you want. However, it would of course be a bad idea if, say, the C and
C++ skeletons used different approaches and had completely different
implementations. That would be a maintenance nightmare.

Below, we document some of the macros that we use in several of the
skeletons. If you write a new skeleton, please implement them for your
language. Overall, be sure to follow the same patterns as the existing
skeletons.

## Vocabulary

We use "formal arguments", or "formals" for short, to denote the declared
parameters of a function (e.g., `int argc, const char **argv`). Yes, this
is somewhat contradictory with `param` in the `%param` directives.

We use "effective arguments", or "args" for short, to denote the values
passed in function calls (e.g., `argc, argv`).

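To make the two terms concrete, here is a small C sketch. The functions `count_options` and `forward` are hypothetical and exist only for illustration; they are not part of any skeleton.

```c
/* Hypothetical helper, for illustration only.
   Its formals are "int argc, const char **argv".  */
int count_options (int argc, const char **argv)
{
  int res = 0;
  /* Count the arguments that start with a dash, skipping argv[0].  */
  for (int i = 1; i < argc; ++i)
    if (argv[i][0] == '-')
      ++res;
  return res;
}

/* Another hypothetical function.  In the call below, "argc, argv" are
   the args (effective arguments) passed to count_options.  */
int forward (int argc, const char **argv)
{
  return count_options (argc, argv);
}
```

For instance, calling `forward` with the four strings `bison`, `-d`, `gram.y`, and `--trace=muscles` returns 2.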
## Symbols

### `b4_symbol(NUM, FIELD)`

In order to unify the handling of the various aspects of symbols (tag, type
name, whether terminal, etc.), the bison executable defines one macro per
(symbol, field), where field can be `has_id`, `id`, etc.: see
`prepare_symbol_definitions()` in `src/output.c`.

NUM can be:

- `empty` to denote the "empty" pseudo-symbol when it exists,
- `eof`, `error`, or `undef`,
- a symbol number.

FIELD can be:

- `has_id`: 0 or 1
  Whether the symbol has an `id`.

- `id`: string (e.g., `exp`, `NUM`, or `TOK_NUM` with api.token.prefix)
  If `has_id`, the name of the token kind (prefixed by api.token.prefix if
  defined), otherwise empty. Guaranteed to be usable as a C identifier.
  This is used to define the token kind (i.e., the enum used by the return
  value of yylex). Should be named `token_kind`.

- `tag`: string
  A human-readable representation of the symbol. Can be `'foo'`,
  `'foo.id'`, `'"foo"'` etc.

- `code`: integer
  The token code associated with the token kind `id`.
  The external number as used by yylex. Can be the ASCII code for a
  character token, some number chosen by bison, or some user number in the
  case of `%token FOO <NUM>`. Corresponds to `yychar` in `yacc.c`.

- `is_token`: 0 or 1
  Whether this is a terminal symbol.

- `kind_base`: string (e.g., `YYSYMBOL_exp`, `YYSYMBOL_NUM`)
  The base of the symbol kind, i.e., the enumerator of this symbol (token or
  nonterminal) which is mapped to its `number`.

- `kind`: string
  Same as `kind_base`, but possibly with a prefix in some languages. E.g.,
  EOF's `kind_base` and `kind` are `YYSYMBOL_YYEOF` in C, but are
  `S_YYEOF` and `symbol_kind::S_YYEOF` in C++.

- `number`: integer
  The code associated with the `kind`.
  The internal number (computed from the external number by yytranslate).
  Corresponds to yytoken in yacc.c. This is the same number that serves as
  key in `b4_symbol(NUM, FIELD)`.

  In bison, symbols are first assigned increasing numbers in order of
  appearance (but tokens first, then nterms). After grammar reduction,
  unused nterms are then renumbered to appear last (i.e., first tokens, then
  used nterms and finally unused nterms). This final number NUM is the one
  contained in this field, and it is the one used as key in
  `b4_symbol(NUM, FIELD)`.

  The code of the rule actions, however, is emitted before we know which
  symbols are unused, so it uses the original numbers. To avoid confusion,
  it actually uses "orig NUM" instead of just "NUM". bison also emits
  definitions for `b4_symbol(orig NUM, number)` that map from original
  numbers to the new ones. `b4_symbol` also resolves `orig NUM` for the
  other fields, i.e., `b4_symbol(orig 42, tag)` returns the tag of the
  symbol whose original number was 42.

- `has_type`: 0, 1
  Whether the symbol has a semantic value.

- `type_tag`: string
  When api.value.type=union, the generated name for the union member:
  `yytype_INT` etc. for symbols that have an `id`, otherwise `yytype_1` etc.

- `type`: string
  If it has a semantic value, its type tag, or, if variants are used,
  its type.
  In the case of api.value.type=union, type is the real type (e.g., int).

- `slot`: string
  If it has a semantic value, the name of the union member (i.e., it resolves
  to either `type_tag` or `type`). It would be better to fix our mess and
  always use `type` for the true type of the member, and `type_tag` for the
  name of the union member.

- `has_printer`: 0, 1
- `printer`: string
- `printer_file`: string
- `printer_line`: integer
- `printer_loc`: location
  If the symbol has a printer, everything about it.

- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`, `destructor_loc`
  Likewise.

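The renumbering described under `number` can be sketched in C as follows. This is a simplified model, not bison's own code: `renumber_symbols` is a hypothetical name, the `used` array stands in for the outcome of grammar reduction, and we assume that each group (tokens, used nterms, unused nterms) keeps its original relative order.

```c
#include <stdbool.h>

/* Hypothetical sketch of the renumbering (not bison's actual code).
   Symbols 0 .. ntokens-1 are tokens and keep their numbers; nonterminals
   ntokens .. nsyms-1 are renumbered so that used ones come first and
   unused ones last, each group keeping its original relative order.
   USED is indexed by original symbol number (token entries are ignored).
   On return, ORIG_TO_NEW[i] is the final number of the symbol whose
   original number is i: the kind of mapping that
   "b4_symbol(orig NUM, number)" exposes.  */
void renumber_symbols (int ntokens, int nsyms, const bool *used,
                       int *orig_to_new)
{
  int next = 0;
  /* Tokens come first and are never renumbered.  */
  for (int i = 0; i < ntokens; ++i)
    orig_to_new[i] = next++;
  /* Then the nonterminals that survived grammar reduction...  */
  for (int i = ntokens; i < nsyms; ++i)
    if (used[i])
      orig_to_new[i] = next++;
  /* ...and finally the unused ones.  */
  for (int i = ntokens; i < nsyms; ++i)
    if (!used[i])
      orig_to_new[i] = next++;
}
```

For instance, with three tokens and nonterminals 3, 4 and 5 where only 3 is unused, this model maps original numbers 4 and 5 to 3 and 4, and original number 3 to 5.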
### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`

Expansion of `$$`, `$1`, `$<TYPE-TAG>3`, etc.

The semantic value from a given VAL.

- `VAL`: some semantic value storage (typically a union), e.g., `yylval`.
- `SYMBOL-NUM`: the symbol number from which we extract the type tag.
- `TYPE-TAG`: the type tag forced by the user with `<TYPE-TAG>`.

The result can be used safely: it is put in parentheses to avoid nasty
precedence issues.

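The case analysis of `b4_symbol_value` can be modeled by a small C function that builds the text of the expansion. This is a hypothetical sketch based on our reading of the C skeletons: we assume the expansion is `(VAL.MEMBER)` when a type tag is forced or when the symbol has a semantic value, and plain `VAL` otherwise. The name `symbol_value` and its parameters are illustrative; instead of `SYMBOL-NUM`, the model takes the symbol's `slot` field directly.

```c
#include <stdio.h>
#include <string.h>

/* Returns nonzero if S is a non-empty string, mimicking m4_ifval.  */
static int is_nonempty (const char *s)
{
  return s != NULL && strlen (s) != 0;
}

/* Hypothetical model of b4_symbol_value (VAL, [SYMBOL-NUM], [TYPE-TAG]):
   writes into BUF the text the macro would expand to.
   - SLOT: the symbol's union member (its `slot` field), or empty/NULL
     if the symbol has no semantic value.
   - TYPE_TAG: the user-forced tag, as in $<TYPE-TAG>3 (may be empty).  */
void symbol_value (char *buf, size_t size, const char *val,
                   const char *slot, const char *type_tag)
{
  if (is_nonempty (type_tag))
    /* A forced <TYPE-TAG> wins over the symbol's own type.  */
    snprintf (buf, size, "(%s.%s)", val, type_tag);
  else if (is_nonempty (slot))
    /* The symbol has a semantic value: select its member.  */
    snprintf (buf, size, "(%s.%s)", val, slot);
  else
    /* No type information: the storage itself.  */
    snprintf (buf, size, "%s", val);
}
```

With `yylval` and a symbol whose slot is `ival`, the model yields `(yylval.ival)`; forcing `<dval>` yields `(yylval.dval)` instead.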
### `b4_lhs_value(SYMBOL-NUM, [TYPE])`

Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.

### `b4_rhs_data(RULE-LENGTH, POS)`

The data corresponding to the symbol `#POS`, where the current rule has
`RULE-LENGTH` symbols on the RHS.

### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`

Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
on the RHS.

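To see how `b4_lhs_value` and `b4_rhs_value` fit together, consider the rule `exp: exp '+' exp { $$ = $1 + $3; }` with a `%union` member `ival`. In C parsers generated from yacc.c, the semantic value stack pointer `yyvsp` typically points at the last RHS symbol, so symbol `#POS` of a rule with `RULE-LENGTH` symbols sits at `yyvsp[POS - RULE-LENGTH]`, and the action usually comes out as `(yyval.ival) = (yyvsp[-2].ival) + (yyvsp[0].ival);`. The sketch below replays that reduction by hand; `value_type`, `RHS_DATA`, and `reduce_add` are hypothetical names, not skeleton macros.

```c
/* Hypothetical stand-in for YYSTYPE, as if declared with
   %union { int ival; }.  */
typedef union
{
  int ival;
} value_type;

/* Models b4_rhs_data (RULE-LENGTH, POS) for a stack whose top (YYVSP)
   holds the last RHS symbol: symbol #POS is at YYVSP[POS - LEN].  */
#define RHS_DATA(yyvsp, len, pos) ((yyvsp)[(pos) - (len)])

/* Reduces "exp: exp '+' exp { $$ = $1 + $3; }" by hand.
   The local yyval plays the role of b4_lhs_value ($$); the two
   RHS_DATA uses play the role of b4_rhs_value for $1 and $3.  */
value_type reduce_add (const value_type *yyvsp)
{
  value_type yyval;
  /* Same shape as (yyval.ival) = (yyvsp[-2].ival) + (yyvsp[0].ival).  */
  (yyval.ival) = (RHS_DATA (yyvsp, 3, 1).ival) + (RHS_DATA (yyvsp, 3, 3).ival);
  return yyval;
}
```

With 40 and 2 as the values of `$1` and `$3`, `reduce_add` called on a pointer to the top of the stack yields 42.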
<!--

Local Variables:
mode: markdown
fill-column: 76
ispell-dictionary: "american"
End:

Copyright (C) 2002, 2008-2015, 2018-2021 Free Software Foundation, Inc.

This file is part of GNU Bison.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

-->