PoWderiZer: parsing with zippers

This is yet another parser, but with some unique features.

The core algorithm is about 200 lines of code (excluding extensions and type signatures). Which makes it suitable for learning about parsing and easy to extend (which I did). It is directly taken from the Parsing with Zippers (Functional Pearl). I recommend to read it - it is quite short and explains algorithm pretty well. Or watch this video.
It can handle left recursion and ambiguous grammars. Which makes it easier to use for newbies. One doesn't need to bend their grammar in order to fit restrictions of an algorithm.
Extensions to the core algorithm add ability to express shape of the desired tree without need to massage it after parsing. Which makes it trully portable grammar - where is other parsers may need additional augmentation (A in ABNF) in hosting language.
Can work with lexer (array of tokens as input) and without (array of chars, or string), so called scannerless parser.

As of now this is rather proof of concept of the idea. Parser works, but it speed is probably not the best.

Backstory

I was looking for approachable parser. I read couple of papers about parsing and it is quite hard to understand what is happening there.

Then I found Parsing with Derivatives (A Functional Pearl). This algorithm was easy to follow. I managed to write my first parser. I learned a lot about parsers and formal grammars. Unfortunately algorithm is quite slow and doesn't produce tree in the shape that I want. I left it there

Later I found another paper Parsing with Zippers (Functional Pearl), which is the twist to the first paper. It promised to be more performant and I figured I can manipulate tree shape.

So I started my implementation. I wanted to build it from the ground up - to make sure I understand every bit. I implemented zippers, I implemented vizualization. Then I started experimenting with different extensions, like Kleene star, negative lookahed, ordered choice. I got carried away a bit.

After all I created small digital garden about parsing. See:

I had an idea about tree shape manipulation and adding PEG operators (negative lookahead, ordered choice) when I was implementing first version of PwZ (back in 2023). But recently I discovered Instaparse - which has all the same ideas. So I tried to compile it to JS. It works. But due to clojure runtime library weight is more than 200kb.

Fast forward to the present day. I decided to write cleaner version from scratch, omit the bits I'm not sure about and publish as a library.

Grammar

The parser itslef is in the category of interpreters (oposite to parser generators). It's API is based on parser combinators (traditional for functional language parsers). Though DSL (parser combinators) has some caveats - that is why they are not exposed and instead one needs to write grammar to use parser.

Grammar itself is variation of regular extensions to BNF. Or you can call it Chomsky CFG with regular extensions.

Classical BNF (Backus–Naur form):

<S> ::= <S> "a" | ""

Chomsky CFG (context free grammar):

S -> S "a" | ""

Grammar used by this library:

S = S "a" | ""

(space) - concatenation
| - unordered choice
"a" - token aka string, terminal, character, lexeme
S - symbol aka variable, non-terminal

And regular extensions (extensions comming from regular expressions)

a* - Kleene star. Any number of a
a+ - Kleene "plus". One or more a
a? - Kleene "question". Zero or one a
a{...} - quantifiers
#"[a-z]" - regular expression

And tree manipulation extension

<a> - the same as in instaparse
[a] - collapse all underlying nodes in one node with all strings concatenated

Tree-shape manipulation

Simplest example. Grammar: S = "a". Input: a

It produces exactly one node with tag S and value a.

Now let's try sequences (fixed length). Input aaa

`S = "a" "a" "a"`	`S = A A A; A = "a"`	`S = A A A; <A> = "a"`

Pay attention there are no tags on `a`s

Sequences unlimited length. Input aaa

`S = S "a" \| ""`	`S = "a" S \| ""`	`S = "a"*`

Delimited list

For example, list of arguments in function call foo(a,a,a). Input: a,a,a

`S = A ("," A)*; A = "a"`	`S = A (<","> A)*; A = "a"`	`S = (A (<","> A)*)?; A = "a"`

		Allows zero items in the list

Pay attention how * and <> allows precisely recreate tree structure without need to explicitly remove some nodes or flatten tree.

Quoted string

Quite primitive implementation - doesn't support escape characters, like \", \\, etc.

`string = <"\""> (#"[^\"]")* <"\"">`	`string = [<"\""> (#"[^\"]")* <"\"">]`

JSON example

Full JSON grammar except string (it is not according spec)

<json> = ws value ws
<value> = null | true | false | array | object | number | string
null = <"null">
true = <"true">
false = <"false">
array = <"["> ws (value (ws <","> ws value)*)? ws <"]">
object = <"{"> ws (member (ws <","> ws member)*)? ws <"}">
member = string ws <":"> ws value
number = [integer fraction? exponent?]
onenine = #"[1-9]"
digit = "0" | onenine
integer = "-"? ("0" | onenine digit*)
fraction = "." digit+
exponent = ("E" | "e") ("+" | "-")? digit+
string = <"\""> [#"[^\"]"*] <"\"">
<ws> = <#"\\s"*>

Parse tree for {"i": 1, "s": "a", "a": [], "f": -0.1E+1, "o": {}}

Issues and future improvements

There is no reporting for syntax errors. It either can parse or can't, it doesn't give a clue why it can't parse. Which is the biggest pain point right now
I want to investigate idea of macros (or functions), like in Rosie
I wonder if it is possible to implement other operators, like negative lookahead, ordered choice, backreferences, limited negation.
What about disambiguation filters?

Packages

powderizer - main package
demo - demo

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
docs		docs
examples		examples
packages		packages
.gitignore		.gitignore
Readme.md		Readme.md
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
tsconfig.json		tsconfig.json
turbo.json		turbo.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PoWderiZer: parsing with zippers

Backstory

Grammar

Tree-shape manipulation

Delimited list

Quoted string

JSON example

Issues and future improvements

Packages

About

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

stereobooster/powderizer

Folders and files

Latest commit

History

Repository files navigation

PoWderiZer: parsing with zippers

Backstory

Grammar

Tree-shape manipulation

Delimited list

Quoted string

JSON example

Issues and future improvements

Packages

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors 2

Uh oh!

Languages