YAPP XSLT is a lexical scanner and recursive descent parser generator, implemented in XSLT. No language extensions or non-standard features are used apart from the nodeset() function. Grammars are expressed in XML form and transformed by the generator stylesheet into another XSLT. A lexical scanner may also be generated from the same grammar.
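For readers unfamiliar with the technique, the scanner-plus-recursive-descent structure that such a generator emits can be sketched in a few lines. This is only an illustration in Python, not XSLT, and not YAPP's actual output: the arithmetic grammar and the names `scan`, `Parser`, and `parse` are all assumptions chosen for the example.

```python
import re

def scan(text):
    """Lexical scanner: split the input into (kind, value) tokens."""
    tokens = []
    for lexeme in re.findall(r"[0-9]+|\S", text):
        if "0" <= lexeme[0] <= "9":
            tokens.append(("NUM", int(lexeme)))
        elif lexeme in "+-*/()":
            tokens.append(("OP", lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    tokens.append(("EOF", None))  # sentinel so the parser never runs off the end
    return tokens

class Parser:
    # Hypothetical grammar, one method per rule:
    #   expr   -> term (("+" | "-") term)*
    #   term   -> factor (("*" | "/") factor)*
    #   factor -> NUM | "(" expr ")"
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, value = self.advance()
        if kind == "NUM":
            return value
        if (kind, value) == ("OP", "("):
            inner = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")

def parse(text):
    """Scan and parse the text, evaluating the expression as it is recognized."""
    parser = Parser(scan(text))
    result = parser.expr()
    if parser.peek() != ("EOF", None):
        raise SyntaxError("trailing input")
    return result
```

Each grammar rule maps to one method that calls the methods for the rules it refers to, which is why the approach lends itself to mechanical generation from a grammar description.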
This project started from my frustration that I could not find any simple, portable XML parser to use inside my tools (see CONDOR for example). Consider the well-known Xerces C++ library: the complete Xerces project is 53 MB! (11 MB compressed in a zip file.) I am currently developing many small tools, and I use XML as the standard format for all my input/output configuration and data files. The source code of my small tools is usually around 600 KB.
A grammar for Haskell, close to the specification in the Haskell report, is given. This is especially interesting, as many rules given in the report are hard to implement.
Grammatica is a C# and Java parser generator (compiler compiler). It improves upon similar tools (like yacc and ANTLR) by creating well-commented and readable source code, by having automatic error recovery and detailed error messages, and by supporting testing and debugging of grammars without generating source code.
This is an insanely long and gnarly essay about implementing, then optimizing, the low-level bits of a pure-Ruby XML parser. If you obsess about XML reading, deterministic finite automata, or Ruby code optimization, you may find some part of it interesting.