A Parser Combinator in PureScript (part 2/2)

Posted on July 1, 2019 by Riccardo

In the previous post we’ve created the basic building blocks of a parser combinator library. Now it’s time to add some more fancy stuff.

Choice and Failure

Up until now, each parser either parsed successfully or failed. We’d like to have a parser that can try different parsers before giving up and failing.

Luckily, math already did the dirty job for us. In fact, we can use a couple of typeclasses: Alt and Plus.

Alt captures the intuition of a choice between alternatives. It has only one member alt or infix <|>:

In other words, it tries p1 and falls back to p2 in case of failure. For example

Plusrepresents failure:

In the previous post, we represented failure in the fail parser. Turns out we could have used empty in its place:

Putting it together

Attempt Parsing Multiple Times

It would be awesome to be able to use one parser as many times as it can succeed. Let’s say we wanted to match any number of spaces. Then we could write

In other words, we parse any character, if it’s a space we recursively call spaces. Otherwise, we noop by putting back the character in the string.

Or similarly

we either succeed at parsing a space and recursively call spaces' or we get out successfully.

Those work

However, they look ugly. But guess what? Math has our back once again. As a matter of fact, we can use Data.List.many.

To use many, we need to satisfy a couple of constraints. In fact, the signature is

If we substitute Parser in place of f then we get

That means we need the following instances: Alternative Parser and Lazy (Parser (List a)).

Alternative is easy pease because it has no members to implement. In particular, for a type to be an Alternative, it only needs to be both Applicative and Plus. Since Parser implements those, we can just write

Attempting to parse an indefinite amount of times requires laziness for the same reason we need laziness to have an infinite list (check the mutual recursive definition of many and some to dig deeper).

In Haskell that is not a problem because the language is lazy by default. In PureScript we need to use Lazy:

That means now we can write

We discard the spaces (_ <- many $ char ' ') because we are not interested in a list of spaces, we just want to consume them:

The best part is that now we can also use some to have a parser that needs at least one match to succeed:

The int Parser

Just for fun, let’s put everything together into a parser that parses Ints.

At the top we define int as

In other words, an int is either a negative or a natural. nat is implemented as follows

And neg as follows

Et voilà

The Whole Code

module Main where

import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Data.Maybe
import Data.Tuple
import Data.String.Yarn hiding (fromString)
import Data.List
import Data.Char.Unicode
import Data.String.CodeUnits
import Control.Alt
import Control.Plus
import Control.Alternative
import Control.Lazy (class Lazy)
import Data.Int

newtype Parser a = Parser (String -> Maybe (Tuple a String))

runParser :: forall a. Parser a -> String -> Maybe (Tuple a String)
runParser (Parser g) s = g s

instance functorParser :: Functor Parser where
    -- map :: forall a b. (a -> b) -> f a -> f b
    map g f = Parser (\s -> case runParser f s of
                              Just (Tuple v s') -> Just $ Tuple (g v) s'
                              Nothing           -> Nothing)

instance applyParser :: (Functor Parser) => Apply Parser where
    -- apply :: forall a b. f (a -> b) -> f a -> f b
    apply fg f = Parser (\s -> case runParser fg s of
                                 Just (Tuple g s') ->
                                   case runParser f s' of
                                     Just (Tuple v s'') -> Just $ Tuple (g v) s''
                                     Nothing            -> Nothing
                                 Nothing           -> Nothing)

instance applicativeParser :: (Apply Parser) => Applicative Parser where
    -- pure :: forall a. a -> f a
    pure x = Parser (\s -> Just $ Tuple x s)

instance bindParser :: (Apply Parser) => Bind Parser where
    -- bind :: forall a b. m a -> (a -> m b) -> m b
    bind m g = Parser (\s -> case runParser m s of
                               Just (Tuple v s') -> runParser (g v) s'
                               Nothing           -> Nothing)

instance monadParser :: (Bind Parser) => Monad Parser

instance altParser :: (Functor Parser) => Alt Parser where
    -- alt :: forall a. f a -> f a -> f a
    alt p1 p2 = Parser (\s -> case runParser p1 s of
                                Just (Tuple v s') -> Just $ Tuple v s'
                                Nothing           -> runParser p2 s)

instance plusParser :: (Alt Parser) => Plus Parser where
    -- empty :: forall a. f a
    empty = Parser (\_ -> Nothing)

instance alternativeParser :: (Applicative Parser, Plus Parser) => Alternative Parser

instance lazyParser :: Lazy (Parser (List a)) where
    -- defer :: (Unit -> l) -> l
    defer g = Parser (\s -> runParser (g unit) s)

fail :: forall a. Parser a
fail = Parser (\_ -> Nothing)

anyChar :: Parser Char
anyChar = Parser (\s -> case toChars s :: List Char of
                          Nil       -> Nothing
                          Cons x xs -> Just $ Tuple x $ fromChars xs)

sat :: (Char -> Boolean) -> Parser Char
sat pred = do
    c <- anyChar
    if pred c then pure c else fail

digit :: Parser Char
digit = sat isDigit

lower :: Parser Char
lower = sat isLower

upper :: Parser Char
upper = sat isUpper

letter :: Parser Char
letter = sat isAlpha

alphanum :: Parser Char
alphanum = sat isAlphaNum

char :: Char -> Parser Char
char c = sat ((==) c)

string :: String -> Parser String
string s =
  map fromChars $ case toChars s :: List Char of
       Nil       -> pure Nil
       Cons x xs -> do
          _ <- char x
          _ <- string $ fromChars xs
          pure $ Cons x xs

spaces :: Parser Unit
spaces = do
  c <- anyChar
  if c == ' '
    then spaces
    else Parser (\s -> Just (Tuple unit $ fromChars [c] <> s))

spaces' :: Parser Unit
spaces' = (char ' ' >>= \_ -> spaces') <|> pure unit

spaces0 :: Parser Unit
spaces0 = do
  _ <- many $ char ' '
  pure unit

spaces1 :: Parser Unit
spaces1 = do
  _ <- some $ char ' '
  pure unit

int :: Parser Int
int =
  neg <|> nat

nat :: Parser Int
nat = do
  xs <- some digit
  case fromString $ fromChars xs of
    Just n -> pure n
    Nothing -> empty

neg :: Parser Int
neg = do
  _ <- char '-'
  negate <$> nat

main :: Effect Unit
main = do
  logShow $ runParser anyChar "string"
  -- (Just (Tuple 's' "tring"))

  logShow $ runParser (char 's') "string"
  -- (Just (Tuple 's' "tring"))

  logShow $ runParser (char 'Z') "string"
  -- Nothing

  logShow $ runParser digit "3tring"
  -- (Just (Tuple '3' "tring"))

  logShow $ runParser (string "stri") "string"
  -- (Just (Tuple "stri" "ng"))

  logShow $ runParser (string "ZZZ") "string"
  -- Nothing

  logShow $ runParser (char 'Z' <|> char 's') "string"
  -- (Just (Tuple 's' "tring"))

  logShow $ runParser (empty :: Parser Unit) "string"
  -- Nothing

  logShow $ runParser (empty <|> anyChar) "string"
  -- (Just (Tuple 's' "tring"))

  logShow $ runParser (anyChar <|> empty) "string"
  -- (Just (Tuple 's' "tring"))

  logShow $ runParser spaces "   string"
  -- (Just (Tuple Unit "string"))

  logShow $ runParser spaces "string"
  -- (Just (Tuple Unit "string"))

  logShow $ runParser spaces' "   string"
  -- (Just (Tuple Unit "string"))

  logShow $ runParser spaces' "string"
  -- (Just (Tuple Unit "string"))

  logShow $ runParser spaces0 "   string"
  -- (Just (Tuple Unit "string"))

  logShow $ runParser spaces0 "string"
  -- (Just (Tuple Unit "string"))

  logShow $ runParser spaces1 "   string"
  -- (Just (Tuple Unit "string"))

  logShow $ runParser spaces1 "string"
  -- Nothing

  logShow $ runParser int "123 string"
  -- (Just (Tuple 123 " string"))

  logShow $ runParser int "-123 string"
  -- (Just (Tuple -123 " string"))

  logShow $ runParser int "string"
  -- Nothing

Outro

Special thanks to Tom for writing “Fantas, Eel, and Specification 10: Alt, Plus, and Alternative” and to Typeclassopedia. Those two provide clear explanations of the intuitions behind Alternative. Also, thanks a bunch to @monoidmusician and @kadblas from fpchat who helped me understand Lazy.