Custom Markdown in Pandoc

Posted on June 8, 2020 by Riccardo

The architecture of Pandoc is well described in the documentation:

Pandoc consists of a set of readers and writers. When converting a document from one format to another, text is parsed by a reader into pandoc’s intermediate representation of the document—an “abstract syntax tree” or AST—which is then converted by the writer into the target format.

For example, let's take a Markdown file:

```haskell
x = 1
```

Pandoc translates it to the following AST:

stack install pandoc
stack exec -- pandoc -s -f markdown -t native file.markdown

# Pandoc
#   (Meta {unMeta = fromList []})
#   [CodeBlock ("",["haskell"],[]) "x = 1"]

CodeBlock is one of the value constructors of the Block type:

data Block
    = CodeBlock Attr Text
--  | ...

type Attr = (Text, [Text], [(Text, Text)])
--           ^ Id
--                 ^ Classes
--                         ^ Key-Value pairs
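
To make the shape of Attr concrete, here is a small sketch that pattern matches on the triple. The helpers codeClasses and hasClass are names made up for this illustration; only Block and CodeBlock come from pandoc-types (Text.Pandoc.Definition).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Pandoc.Definition (Block (..))

-- Hypothetical helper: the classes attached to a code block,
-- or an empty list for any other kind of block.
codeClasses :: Block -> [Text]
codeClasses (CodeBlock (_ident, classes, _keyValues) _code) = classes
codeClasses _ = []

-- Hypothetical helper: does this block carry the given class?
hasClass :: Text -> Block -> Bool
hasClass cls block = cls `elem` codeClasses block

main :: IO ()
main = print (hasClass "haskell" (CodeBlock ("", ["haskell"], []) "x = 1"))
```

For the Markdown file above, the class list is ["haskell"] because the language after the opening fence ends up in the second component of Attr.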

Pandoc allows changing the AST before the output document is written.

One way to achieve that is by using a filter:

#!/usr/bin/env runghc

{-# LANGUAGE OverloadedStrings #-}

import Text.Pandoc
import Text.Pandoc.JSON

main :: IO ()
main = toJSONFilter transform

-- Replace the body of every code block, keeping its attributes.
transform :: Block -> Block
transform (CodeBlock attr _) = CodeBlock attr "y = 2"
transform x = x

I'm using runghc in this case to make sure the version of Pandoc used in the filter is the same as the one invoked on the command line:

stack exec -- pandoc -s -f markdown -t native --filter filter.hs file.markdown

# Pandoc
#   (Meta {unMeta = fromList []})
#   [CodeBlock ("",["haskell"],[]) "y = 2"]

Notice that the code changed from x = 1 to y = 2.
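
The filter above rewrites every code block it finds. A possible variation, sketched here on the assumption that only blocks tagged with the haskell class should change, checks the class list in Attr before rewriting. The guard and the replacement text are purely illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.Pandoc.JSON

main :: IO ()
main = toJSONFilter transform

-- Rewrite only code blocks tagged with the "haskell" class;
-- every other block passes through unchanged.
transform :: Block -> Block
transform (CodeBlock attr@(_, classes, _) _)
  | "haskell" `elem` classes = CodeBlock attr "y = 2"
transform x = x
```

It can be run with the same --filter invocation as before. Code blocks in other languages fall through to the last equation and stay untouched.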

The same can be achieved in pure Haskell code:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (pack, unpack)
import Text.Pandoc
import Text.Pandoc.Walk (walk)

main :: IO ()
main = do
  file <- readFile "./file.markdown"
  result <- runIO $ do
    doc <- readMarkdown (def {readerExtensions = pandocExtensions}) (pack file)
    let transformed = walk transform doc
    writeNative (def {writerExtensions = pandocExtensions}) transformed
  handleError result >>= putStrLn . unpack

-- Replace the body of every code block, keeping its attributes.
transform :: Block -> Block
transform (CodeBlock attr _) = CodeBlock attr "y = 2"
transform x = x

stack runghc pure-haskell.hs

# [CodeBlock ("",["haskell"],[]) "y = 2"]

With Hakyll, similar transformations can be achieved by using pandocCompilerWithTransform or pandocCompilerWithTransformM.

That is how tweetable pull quotes are implemented on this blog. In particular, the following Markdown

```pullquote
If we are not open to silly ideas, then why even bother with a creative activity in the first place?
```

In a recent workshop I attended one rule was "all ideas are brilliant". Yes, at first an idea could be raw, maybe even silly. However, by thinking outside the box, it may turn into something innovative. More often than not though, it will be a draw in the blank. Still, if we are not open to silly ideas, then why even bother with a creative activity in the first place? In fact, if you always do what you've always done, you'll always get what you've always got.

becomes

[Image: screenshot of a pull quote from a blog post]
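
The blog's actual implementation is not shown here, so the following is only a minimal sketch of how such a transformation might be wired into Hakyll. The names pullQuote and postCompiler, the "posts/*" pattern, and the choice of a Div with a pullquote class as output are all assumptions made for illustration. The tweet link is left out.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Hakyll
import Text.Pandoc.Definition (Block (..), Inline (..))
import Text.Pandoc.Walk (walk)

-- Hypothetical transform: turn a code block tagged "pullquote"
-- into a Div with that class, holding the quote as one paragraph.
pullQuote :: Block -> Block
pullQuote (CodeBlock (_, classes, _) content)
  | "pullquote" `elem` classes =
      Div ("", ["pullquote"], []) [Para [Str content]]
pullQuote x = x

-- Hypothetical compiler: apply the transform to the whole document.
postCompiler :: Compiler (Item String)
postCompiler =
  pandocCompilerWithTransform
    defaultHakyllReaderOptions
    defaultHakyllWriterOptions
    (walk pullQuote)

-- Assumed site layout: posts live under posts/ and render to HTML.
main :: IO ()
main = hakyll $
  match "posts/*" $ do
    route (setExtension "html")
    compile postCompiler
```

The resulting Div can then be styled with CSS. A transform that needs effects inside the Compiler monad would use pandocCompilerWithTransformM instead.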


If you want to dig deeper, the Pandoc documentation on filters is a great read on the topic.
