J. Functional Programming 8 (4): 437–444, July 1998.

Printed in the United Kingdom

© 1998 Cambridge University Press

FUNCTIONAL PEARL
Monadic parsing in Haskell
GRAHAM HUTTON
University of Nottingham, Nottingham, UK

ERIK MEIJER
University of Utrecht, Utrecht, The Netherlands

1 Introduction
This paper is a tutorial on defining recursive descent parsers in Haskell. In the spirit
of one-stop shopping, the paper combines material from three areas into a single
source. The three areas are functional parsers (Burge, 1975; Wadler, 1985; Hutton,
1992; Fokker, 1995), the use of monads to structure functional programs (Wadler,
1990, 1992a, 1992b), and the use of special syntax for monadic programs in Haskell
(Jones, 1995; Peterson et al., 1996). More specifically, the paper shows how to define
monadic parsers using do notation in Haskell.
Of course, recursive descent parsers defined by hand lack the efficiency of bottom-
up parsers generated by machine (Aho et al., 1986; Mogensen, 1993; Gill and
Marlow, 1995). However, for many research applications, a simple recursive descent
parser is perfectly sufficient. Moreover, while parser generators typically offer a
fixed set of combinators for describing grammars, the method described here is
completely extensible: parsers are first-class values, and we have the full power of
Haskell available to define new combinators for special applications. The method is
also an excellent illustration of the elegance of functional programming.
The paper is targeted at the level of a good undergraduate student who is familiar
with Haskell, and has completed a grammars and parsing course. Some knowledge
of functional parsers would be useful, but no experience with monads is assumed.
A Haskell library derived from the paper is available on the web from:
http://www.cs.nott.ac.uk/Department/Staff/gmh/bib.html#pearl

2 A type for parsers

We begin by defining a type for parsers:
    newtype Parser a = Parser (String -> [(a,String)])
That is, a parser is a function that takes a string of characters as its argument,
and returns a list of results. The convention is that the empty list of results denotes
failure of a parser, and that non-empty lists denote success. In the case of success,
each result is a pair whose first component is a value of type a produced by parsing
and processing a prefix of the argument string, and whose second component is the
unparsed suffix of the argument string. Returning a list of results allows us to build
parsers for ambiguous grammars, with many results being returned if the argument
string can be parsed in many different ways.
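
To make the list-of-results convention concrete, the following is a minimal sketch rather than part of the library. The names failure and anyPrefix are hypothetical, introduced only for illustration: the first always returns the empty list, and the second is deliberately ambiguous, returning one result for every non-empty prefix of its argument.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

-- Deconstructor: run a parser on an argument string.
parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

-- Hypothetical example: a parser that always fails (empty result list).
failure :: Parser a
failure = Parser (\_ -> [])

-- Hypothetical example: an ambiguous parser that accepts every non-empty
-- prefix of the argument, giving one (prefix, suffix) pair per split.
anyPrefix :: Parser String
anyPrefix = Parser (\cs -> [splitAt n cs | n <- [1 .. length cs]])
```

Here parse anyPrefix "abc" gives [("a","bc"),("ab","c"),("abc","")], while parse anyPrefix "" gives [], which by the convention above denotes failure.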

3 A monad of parsers
The first parser we define is item, which successfully consumes the first character if
the argument string is non-empty, and fails otherwise:
    item :: Parser Char
    item = Parser (\cs -> case cs of
                            ""     -> []
                            (c:cs) -> [(c,cs)])
Next we define two combinators that reflect the monadic nature of parsers. In
Haskell, the notion of a monad is captured by a built-in class definition:
    class Monad m where
       return :: a -> m a
       (>>=)  :: m a -> (a -> m b) -> m b
That is, a type constructor m is a member of the class Monad if it is equipped with
return and (>>=) functions of the specified types. The type constructor Parser can
be made into an instance of the Monad class as follows:
    instance Monad Parser where
       return a = Parser (\cs -> [(a,cs)])
       p >>= f  = Parser (\cs -> concat [parse (f a) cs' |
                                    (a,cs') <- parse p cs])
The parser return a succeeds without consuming any of the argument string, and
returns the single value a. The (>>=) operator is a sequencing operator for parsers.
Using a deconstructor function for parsers defined by parse (Parser p) = p, the
parser p >>= f first applies the parser p to the argument string cs to give a list of
results of the form (a,cs'), where a is a value and cs' is a string. For each such
pair, f a is a parser which is applied to the string cs'. The result is a list of lists,
which is then concatenated to give the final list of results.
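
The instance above follows the class definition of the time. As a hedged sketch, assuming a recent GHC in which Functor and Applicative are superclasses of Monad, the same parser monad might be set up as follows. The Functor and Applicative instances and the example parser twoItems are additions made for illustration and are not taken from the library.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

item :: Parser Char
item = Parser (\cs -> case cs of
                        ""      -> []
                        (c:cs') -> [(c, cs')])

-- Assumed additions: recent versions of base require these two instances.
instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a    = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [ (f a, cs'') | (f, cs')  <- parse pf cs
                                           , (a, cs'') <- parse pa cs' ])

-- Sequencing exactly as in the text; return defaults to pure.
instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

-- Hypothetical example: consume two characters and pair them.
twoItems :: Parser (Char, Char)
twoItems = item >>= \a -> item >>= \b -> return (a, b)
```

With these definitions, parse twoItems "abc" gives [(('a','b'),"c")], and parse twoItems "a" gives [] because the second item fails.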
The return and (>>=) functions for parsers satisfy some simple laws:
    return a >>= f             =  f a
    p >>= return               =  p
    p >>= (\a -> (f a >>= g))  =  (p >>= (\a -> f a)) >>= g
In fact, these laws must hold for any monad, not just the special case of parsers.
The laws assert that – modulo the fact that the right argument to (>>=) involves
a binding operation – return is a left and right unit for (>>=), and that (>>=) is
associative. The unit laws allow some parsers to be simplified, and the associativity
law allows parentheses to be eliminated in repeated sequencings.
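
The laws can be spot-checked by running both sides on sample argument strings. The sketch below does this for one choice of parsers. It is a test on particular inputs, not a proof, and it assumes a recent GHC (hence the extra Functor and Applicative instances). The continuations f and g and the function lawsHoldOn are hypothetical names chosen for illustration.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

instance Functor Parser where
  fmap h p = Parser (\cs -> [(h a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a    = Parser (\cs -> [(a, cs)])
  ph <*> pa = Parser (\cs -> [ (h a, cs'') | (h, cs')  <- parse ph cs
                                           , (a, cs'') <- parse pa cs' ])

instance Monad Parser where
  p >>= k = Parser (\cs -> concat [parse (k a) cs' | (a, cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of { "" -> []; (c:cs') -> [(c, cs')] })

-- Hypothetical continuations used only to exercise the laws.
f :: Char -> Parser String
f c = item >>= \d -> return [c, d]

g :: String -> Parser Int
g s = return (length s)

-- Compare both sides of each law on one argument string.
lawsHoldOn :: String -> Bool
lawsHoldOn cs = and
  [ parse (return 'x' >>= f) cs == parse (f 'x') cs              -- left unit
  , parse (item >>= return) cs == parse item cs                  -- right unit
  , parse (item >>= (\a -> f a >>= g)) cs
      == parse ((item >>= f) >>= g) cs                           -- associativity
  ]
```

For instance, all lawsHoldOn ["", "a", "abc"] evaluates to True.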

4 The do notation
A typical parser built using (>>=) has the following structure:
    p1 >>= \a1 ->
    p2 >>= \a2 ->
    ...
    pn >>= \an ->
    f a1 a2 ... an
Such a parser has a natural operational reading: apply parser p1 and call its result
value a1; then apply parser p2 and call its result value a2; ...; then apply parser
pn and call its result value an; and finally, combine all the results by applying a
semantic action f. For most parsers, the semantic action will be of the form return
(g a1 a2 ... an) for some function g, but this is not true in general. For example,
it may be necessary to parse more of the argument string before a result can be
returned, as is the case for the chainl1 combinator defined later on.
Haskell provides a special syntax for defining parsers of the above shape, allowing
them to be expressed in the following, more appealing, form:
    do a1 <- p1
       a2 <- p2
       ...
       an <- pn
       f a1 a2 ... an
This notation can also be used on a single line if preferred, by making use of
braces and semicolons, in the following manner:
    do {a1 <- p1; a2 <- p2; ...; an <- pn; f a1 a2 ... an}
In fact, the do notation in Haskell can be used with any monad, not just parsers. The
subexpressions ai <- pi are called generators, since they generate values for the
variables ai. In the special case when we are not interested in the values produced
by a generator ai <- pi, the generator can be abbreviated simply by pi.
Example. A parser that consumes three characters, throws away the second character,
and returns the other two as a pair, can be defined as follows:
    p :: Parser (Char,Char)
    p = do {c <- item; item; d <- item; return (c,d)}
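
To see what the notation abbreviates, the example can be written both ways and the results compared. This is a sketch assuming a recent GHC. The names pDo and pBind are hypothetical; pBind spells out the (>>=) form that the do expression stands for, with (>>) used where the result of a generator is discarded.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a    = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [ (f a, cs'') | (f, cs')  <- parse pf cs
                                           , (a, cs'') <- parse pa cs' ])

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of { "" -> []; (c:cs') -> [(c, cs')] })

-- The example from the text, in do notation.
pDo :: Parser (Char, Char)
pDo = do { c <- item; item; d <- item; return (c, d) }

-- The same parser with the sequencing written out explicitly.
pBind :: Parser (Char, Char)
pBind = item >>= \c -> item >> (item >>= \d -> return (c, d))
```

Both parse pDo "abcdef" and parse pBind "abcdef" give [(('a','c'),"def")], and both fail on arguments with fewer than three characters.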

5 Choice combinators
We now define two combinators that extend the monadic nature of parsers. In
Haskell, the notion of a monad with a zero, and a monad with a zero and a plus are
captured by two built-in class definitions:
    class Monad m => MonadZero m where
       zero :: m a

    class MonadZero m => MonadPlus m where
       (++) :: m a -> m a -> m a
That is, a type constructor m is a member of the class MonadZero if it is a member
of the class Monad, and if it is also equipped with a value zero of the specified type.
In a similar way, the class MonadPlus builds upon the class MonadZero by adding
a (++) operation of the specified type. The type constructor Parser can be made
into instances of these two classes as follows:
    instance MonadZero Parser where
       zero = Parser (\cs -> [])

    instance MonadPlus Parser where
       p ++ q = Parser (\cs -> parse p cs ++ parse q cs)
The parser zero fails for all argument strings, returning no results. The (++)
operator is a (non-deterministic) choice operator for parsers. The parser p ++ q
applies both parsers p and q to the argument string, and appends their lists of results.
The zero and (++) operations for parsers satisfy some simple laws:
    zero ++ p      =  p
    p ++ zero      =  p
    p ++ (q ++ r)  =  (p ++ q) ++ r
These laws must in fact hold for any monad with a zero and a plus. The laws assert
that zero is a left and right unit for (++), and that (++) is associative. For the
special case of parsers, it can also be shown that – modulo the binding involved with
(>>=) – zero is the left and right zero for (>>=), that (>>=) distributes through
(++) on the right, and (provided we ignore the order of results returned by parsers)
that (>>=) also distributes through (++) on the left:
    zero >>= f                 =  zero
    p >>= const zero           =  zero
    (p ++ q) >>= f             =  (p >>= f) ++ (q >>= f)
    p >>= (\a -> f a ++ g a)   =  (p >>= f) ++ (p >>= g)
The zero laws allow some parsers to be simplified, and the distribution laws allow
the efficiency of some parsers to be improved.
Parsers built using (++) return many results if the argument string can be parsed
in many different ways. In practice, we are normally only interested in the first
result. For this reason, we define a (deterministic) choice operator (+++) that has
the same behaviour as (++), except that at most one result is returned:
    (+++) :: Parser a -> Parser a -> Parser a
    p +++ q = Parser (\cs -> case parse (p ++ q) cs of
                               []     -> []
                               (x:xs) -> [x])

All the laws given above for (++) also hold for (+++). Moreover, for the case of
(+++), the precondition of the left distribution law is automatically satisfied.
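
In recent versions of the base library there is no MonadZero class, and (++) is reserved for lists. One possible rendering, given here as a sketch under that assumption, uses the Alternative class, with empty playing the role of zero and (<|>) the role of (++). The parsers one and two are hypothetical examples chosen to overlap on the same argument.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus)

newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a    = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [ (f a, cs'') | (f, cs')  <- parse pf cs
                                           , (a, cs'') <- parse pa cs' ])

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

-- Assumed modern counterpart of the zero and (++) of the text.
instance Alternative Parser where
  empty   = Parser (\_ -> [])
  p <|> q = Parser (\cs -> parse p cs ++ parse q cs)

instance MonadPlus Parser  -- mzero and mplus default to empty and (<|>)

-- Deterministic choice: keep at most the first result.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser (\cs -> case parse (p <|> q) cs of
                           []    -> []
                           (x:_) -> [x])

item :: Parser Char
item = Parser (\cs -> case cs of { "" -> []; (c:cs') -> [(c, cs')] })

-- Hypothetical overlapping parsers: one character versus two.
one, two :: Parser String
one = do { a <- item; return [a] }
two = do { a <- item; b <- item; return [a, b] }
```

Here parse (one <|> two) "abc" gives both results, [("a","bc"),("ab","c")], whereas parse (one +++ two) "abc" gives only [("a","bc")].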
The item parser consumes single characters unconditionally. To allow conditional
parsing, we define a combinator sat that takes a predicate, and yields a parser that
consumes a single character if it satisfies the predicate, and fails otherwise:

    sat :: (Char -> Bool) -> Parser Char
    sat p = do {c <- item; if p c then return c else zero}

Example. A parser for specific characters can be defined as follows:

    char :: Char -> Parser Char
    char c = sat (c ==)

In a similar way, by supplying suitable predicates to sat, we can define parsers for
digits, lower-case letters, upper-case letters, and so on.
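
For instance, the character classes just mentioned might be sketched as follows, using predicates from Data.Char. The names digitChar, lowerChar and upperChar are hypothetical (chosen so as not to clash with the digit parser of the later example), and the Functor and Applicative instances are assumptions needed only for a recent GHC.

```haskell
import Data.Char (isDigit, isLower, isUpper)

newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a    = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [ (f a, cs'') | (f, cs')  <- parse pf cs
                                           , (a, cs'') <- parse pa cs' ])

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

zero :: Parser a
zero = Parser (\_ -> [])

item :: Parser Char
item = Parser (\cs -> case cs of { "" -> []; (c:cs') -> [(c, cs')] })

sat :: (Char -> Bool) -> Parser Char
sat p = do { c <- item; if p c then return c else zero }

-- Hypothetical names: one parser per character class.
digitChar, lowerChar, upperChar :: Parser Char
digitChar = sat isDigit
lowerChar = sat isLower
upperChar = sat isUpper
```

For example, parse digitChar "7up" gives [('7',"up")], while parse lowerChar "7up" gives [].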

6 Recursion combinators
A number of useful parser combinators can be defined recursively. Most of these
combinators can in fact be defined for arbitrary monads with a zero and a plus, but
for clarity they are defined below for the special case of parsers.

• Parse a specific string:

    string :: String -> Parser String
    string ""     = return ""
    string (c:cs) = do {char c; string cs; return (c:cs)}

• Parse repeated applications of a parser p; the many combinator permits zero
  or more applications of p, while many1 permits one or more:

    many :: Parser a -> Parser [a]
    many p = many1 p +++ return []

    many1 :: Parser a -> Parser [a]
    many1 p = do {a <- p; as <- many p; return (a:as)}

• Parse repeated applications of a parser p, separated by applications of a parser
  sep whose result values are thrown away:

    sepby :: Parser a -> Parser b -> Parser [a]
    p `sepby` sep = (p `sepby1` sep) +++ return []

    sepby1 :: Parser a -> Parser b -> Parser [a]
    p `sepby1` sep = do a <- p
                        as <- many (do {sep; p})
                        return (a:as)

• Parse repeated applications of a parser p, separated by applications of a parser
  op whose result value is an operator that is assumed to associate to the left,
  and which is used to combine the results from the p parsers:

    chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
    chainl p op a = (p `chainl1` op) +++ return a

    chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
    p `chainl1` op = do {a <- p; rest a}
                     where
                        rest a = (do f <- op
                                     b <- p
                                     rest (f a b))
                                 +++ return a

Combinators chainr and chainr1 that assume the parsed operators associate
to the right can be defined in a similar manner.
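
As an illustration of the last remark, one possible definition of chainr1 is sketched below next to chainl1. The chainr1 shown is our own guess at a suitable definition, not necessarily the one in the accompanying library. The parsers digit and minus are hypothetical helpers, and the Functor and Applicative instances are assumptions for a recent GHC.

```haskell
import Data.Char (isDigit, ord)

newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a    = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [ (f a, cs'') | (f, cs')  <- parse pf cs
                                           , (a, cs'') <- parse pa cs' ])

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

-- Deterministic choice, written directly in terms of result lists.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser (\cs -> take 1 (parse p cs ++ parse q cs))

-- Consume one character satisfying the predicate, or fail.
sat :: (Char -> Bool) -> Parser Char
sat ok = Parser (\cs -> case cs of
                          (c:cs') | ok c -> [(c, cs')]
                          _              -> [])

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do { a <- p; rest a }
  where
    rest a = (do { f <- op; b <- p; rest (f a b) }) +++ return a

-- Sketch: the right operand is itself a whole right-associated chain.
chainr1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainr1` op = do { a <- p; rest a }
  where
    rest a = (do { f <- op; b <- p `chainr1` op; return (f a b) }) +++ return a

-- Hypothetical helpers for the demonstration.
digit :: Parser Int
digit = do { c <- sat isDigit; return (ord c - ord '0') }

minus :: Parser (Int -> Int -> Int)
minus = do { _ <- sat ('-' ==); return (-) }
```

Subtraction makes the difference visible: parse (digit `chainl1` minus) "8-3-2" gives [(3,"")], that is (8-3)-2, whereas parse (digit `chainr1` minus) "8-3-2" gives [(7,"")], that is 8-(3-2).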

7 Lexical combinators
Traditionally, parsing is preceded by a lexical phase that transforms the
argument string into a sequence of tokens. However, the lexical phase can be
avoided by defining suitable combinators. In this section we define combinators to
handle the use of space between tokens in the argument string. Combinators to
handle other lexical issues such as comments and keywords can easily be defined
too.

• Parse a string of spaces, tabs, and newlines:

    space :: Parser String
    space = many (sat isSpace)

• Parse a token using a parser p, throwing away any trailing space:

    token :: Parser a -> Parser a
    token p = do {a <- p; space; return a}

• Parse a symbolic token:

    symb :: String -> Parser String
    symb cs = token (string cs)

• Apply a parser p, throwing away any leading space:

    apply :: Parser a -> String -> [(a,String)]
    apply p = parse (do {space; p})
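
The same lexical conventions carry over to other combinator libraries. As a sketch, the block below restates token, symb and apply on top of Text.ParserCombinators.ReadP from the base library, which is a different parser type from the one defined in this paper. The parsers wordList and allWords are hypothetical examples. Note that ReadP is non-deterministic, so sepBy returns every partial parse unless the end of input is demanded.

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP
  (ReadP, eof, munch1, readP_to_S, sepBy, skipSpaces, string)

-- Run p, then discard any trailing white space.
token :: ReadP a -> ReadP a
token p = do { a <- p; skipSpaces; return a }

-- A symbolic token: a fixed string followed by optional white space.
symb :: String -> ReadP String
symb = token . string

-- Discard leading white space, then run p.
apply :: ReadP a -> String -> [(a, String)]
apply p = readP_to_S (do { skipSpaces; p })

-- Hypothetical example: alphabetic words separated by commas.
wordList :: ReadP [String]
wordList = token (munch1 isAlpha) `sepBy` symb ","

-- Demand the end of input, so that only the complete parse remains.
allWords :: ReadP [String]
allWords = do { ws <- wordList; eof; return ws }
```

With these definitions, apply allWords " foo , bar ,baz " gives [(["foo","bar","baz"],"")], regardless of how the white space is distributed around the commas.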

8 Example
We illustrate the combinators defined in this article with a simple example. Consider
the standard grammar for arithmetic expressions built up from single digits using
the operators +, -, * and /, together with parentheses (Aho et al., 1986):
    expr   ::= expr addop term | term
    term   ::= term mulop factor | factor
    factor ::= digit | ( expr )
    digit  ::= 0 | 1 | ... | 9

    addop  ::= + | -
    mulop  ::= * | /
Using the chainl1 combinator to implement the left-recursive production rules for
expr and term, this grammar can be directly translated into a Haskell program that
parses expressions and evaluates them to their integer value:
    expr  :: Parser Int
    addop :: Parser (Int -> Int -> Int)
    mulop :: Parser (Int -> Int -> Int)

    expr   = term `chainl1` addop
    term   = factor `chainl1` mulop
    factor = digit +++ do {symb "("; n <- expr; symb ")"; return n}
    digit  = do {x <- token (sat isDigit); return (ord x - ord '0')}

    addop = do {symb "+"; return (+)} +++ do {symb "-"; return (-)}
    mulop = do {symb "*"; return (*)} +++ do {symb "/"; return (div)}
For example, evaluating apply expr " 1 - 2 * 3 + 4 " gives the singleton list
of results [(-1,"")], which is the desired behaviour.
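
For readers who wish to run the example, the pieces can be collected into a single module. The version below is a sketch that assumes a recent GHC. It therefore adds Functor and Applicative instances, replaces the MonadZero and MonadPlus instances by plain functions zero and (+++), and binds unused results to _ explicitly. The combinators themselves follow the definitions given in the text.

```haskell
import Data.Char (isDigit, isSpace, ord)

newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a    = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [ (f a, cs'') | (f, cs')  <- parse pf cs
                                           , (a, cs'') <- parse pa cs' ])

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

zero :: Parser a
zero = Parser (\_ -> [])

-- Deterministic choice: at most the first result of either parser.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser (\cs -> take 1 (parse p cs ++ parse q cs))

item :: Parser Char
item = Parser (\cs -> case cs of { "" -> []; (c:cs') -> [(c, cs')] })

sat :: (Char -> Bool) -> Parser Char
sat p = do { c <- item; if p c then return c else zero }

string :: String -> Parser String
string ""     = return ""
string (c:cs) = do { _ <- sat (c ==); _ <- string cs; return (c:cs) }

many, many1 :: Parser a -> Parser [a]
many p  = many1 p +++ return []
many1 p = do { a <- p; as <- many p; return (a:as) }

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do { a <- p; rest a }
  where rest a = (do { f <- op; b <- p; rest (f a b) }) +++ return a

space :: Parser String
space = many (sat isSpace)

token :: Parser a -> Parser a
token p = do { a <- p; _ <- space; return a }

symb :: String -> Parser String
symb = token . string

apply :: Parser a -> String -> [(a, String)]
apply p = parse (do { _ <- space; p })

-- The expression evaluator of this section.
expr, term, factor, digit :: Parser Int
expr   = term `chainl1` addop
term   = factor `chainl1` mulop
factor = digit +++ do { _ <- symb "("; n <- expr; _ <- symb ")"; return n }
digit  = do { x <- token (sat isDigit); return (ord x - ord '0') }

addop, mulop :: Parser (Int -> Int -> Int)
addop = do { _ <- symb "+"; return (+) } +++ do { _ <- symb "-"; return (-) }
mulop = do { _ <- symb "*"; return (*) } +++ do { _ <- symb "/"; return div }
```

Loaded into GHCi, apply expr " 1 - 2 * 3 + 4 " gives [(-1,"")] as stated above, and apply expr "(1 + 2) * 3" gives [(9,"")].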

Acknowledgements
Thanks are due to Luc Duponcheel, Benedict Gaster, Mark P. Jones, Colin Taylor
and Philip Wadler for their useful comments on the many drafts of this article.

References
Aho, A., Sethi, R. and Ullman, J. (1986) Compilers – Principles, Techniques and Tools.
Addison-Wesley.
Burge, W. H. (1975) Recursive Programming Techniques. Addison-Wesley.
Fokker, J. (1995) Functional parsers. Lecture Notes of the Baastad Spring School on Functional
Programming.
Gill, A. and Marlow, S. (1995) Happy: the parser generator for Haskell. University of Glasgow.
Hutton, G. (1992) Higher-order functions for parsing. J. Functional Programming, 2(3),
323–343.
Jones, M. P. (1995) A system of constructor classes: overloading and implicit higher-order
polymorphism. J. Functional Programming, 5(1), 1–35.
Mogensen, T. (1993) Ratatosk: A parser generator and scanner generator for Gofer. University
of Copenhagen (DIKU).
Peterson, J. et al. (1996) The Haskell Language Report, version 1.3. Research report
YALEU/DCS/RR-1106, Yale University.
Wadler, P. (1985) How to replace failure by a list of successes. Proc. Conf. on Functional
Programming and Computer Architecture. Springer-Verlag.
Wadler, P. (1990) Comprehending monads. Proc. ACM Conf. on Lisp and Functional
Programming.
Wadler, P. (1992a) The essence of functional programming. Proc. Principles of Programming
Languages.
Wadler, P. (1992b) Monads for functional programming. In: Broy, M. (ed.), Proc.
Marktoberdorf Summer School on Program Design Calculi. Springer-Verlag.

https://doi.org/10.1017/S0956796898003050 Published online by Cambridge University Press
