So much of UNIX's tooling are domain-specific languages to parse, and to make those easier to write in C they (eventually) created YACC which GNU rewrote as Bison. Arguably all commandline all commandline programs are domain-specific languages! It is even used by Gettext (where in any other language I would've written a Shunting Yard or Pratt parser...), and most languages have equivalents!

Today I'll discuss Bison's frontend, & should only take a couple days to finish this.


The first thing Bison does is retrieve $BISON_PROGRAM_NAME if present into argv[0] & a global, followed by initializing internationalization, quoting as per $LC_CTYPE, on-exit cleanup, glyphicons table (preferably Unicode), deduplicating hashmap for strings, "muscles" table + obstack allocation, line truncation data & tables for how seriously to treat different kinds of errors, & obstack/flag for lexing.

Then parses extensive commandline flags preprocessing to extract colouroptions first.


Bison's commandline parsing integrates the internal "complaints" infrastructure with special line-numbering handling, populating the tables I described earlier amongst other globals. Afterwords it validates there's a single extra commandline arg which it deduplicates & stores in a global & the "file_name" interpretor-var. As always --help & --version handled specially.

With extensive & optional time-profiling & some profiling bitflags it parses the input file exitting upon fatal error, ...


... garbage collect the AST, compute derivitives & nullability from it, convert into a statemachine, compute lookahead sets, detect conflicts & complain about them, & compile the output code! I'll dig into these more later...

Compilation is done by allocating arrays for precomputed counts populated with the aid of some temp arrays bucketsorting actions, output additional warnings, & if desired:

1. Computes output filenames with appropriate extensions


2. Output a file reporting rules marked as useless, states marked as conflicting, all useful grammar rules, terminals & all rules referencing them, same for non-terminals, & the state machine
3. Iterates over each state generating a nontrivial label & outputting a graphviz node followed by corresponding edges with reduction in red, all surrounded by a graphviz header & footer. Seperate internal module abstracts graphviz
4. Outputs the grammar & statemachine as XML without an XML serializer


5. For HTML it runs that XML file through bundled XSLT file via a configuable XSLT processor!
6. Unconditionally (as long as there weren't any errors) frees lookahead tokens.
7. Conditionally again outputs the statemachine to C in multiple passes I'll describe later.

From there it frees all the various data it has allocated whilst applying minor fixes keeping a backup file.

6/6 Fin for today! Will continue tomorrow with parsing & optimizing!

Sign in to participate in the conversation

For people who care about, support, or build Free, Libre, and Open Source Software (FLOSS).