How to Create a Pollen Markup Alternative in 61 Lines


Let’s apply ISML, the stupidest markup language in the world, towards writing an alternative to Pollen markup in 61 lines of code.

Why? Because this is a great exercise that will teach you fun things to do in Racket.

The markup language we’ll create doesn’t have to use Racket or Pollen, but can. Additionally, the control character for our markup language is easy to configure based on the rules of ISML.

Our ISML parser

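For reference, here are the rules, as the parser below implements them:

  1. The first character of a document is the control character.
  2. Every other character is literal text, unless it is the control character.
  3. Each later control character is dropped, and the next datum is read with a reader procedure (Racket’s read by default) before the parser goes back to reading text.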

Here’s a parser implementation. Save it to my-isml.rkt.

#lang racket

(provide read-isml)

(define (accrete-chars remaining [found '()])
  (let ([next (and (not (eq? remaining '())) (car remaining))])
    (if (char? next)
        (accrete-chars (cdr remaining) (cons next found))
        (values (apply string found)
                remaining))))

(define (accrete-content content [found '()])
  (if (eq? content '()) found
      (if (char? (car content))
          (let-values ([(str remaining) (accrete-chars content)])
            (accrete-content remaining (cons str found)))
          (accrete-content (cdr content) (cons (car content) found)))))

(define (read-isml in [read-controlled read])
  (define control-char (read-char in))
  (define (read-next state reader)
    (define datum (reader in))
    (if (eof-object? datum)
        state
        (if (eq? reader read-char)
            (if (char=? control-char datum)
                (read-next state read-controlled)
                (read-next (cons datum state) reader))
            (read-next (cons datum state) read-char))))
  (if (eof-object? control-char) '()
      (accrete-content (read-next '() read-char))))

Now you can express a list like '("jimminii " (b (i "christmassss"))) as ٭jimminii ٭(b (i "christmassss")). The first ٭ only declares the control character, and the space before the second ٭ stays in the string. You can replace ٭ with any other character, but since you can’t escape the control character, I just leave it as something weird.
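The parser leans on one property of Racket ports: read-char and read both pick up wherever the last reader stopped. Here is a minimal sketch of that hand-off using a string port. The input text and the names in, text-before, datum, and text-after are made up for the illustration and are not part of my-isml.rkt.

```racket
#lang racket/base

;; A port remembers its position, so read-char and read can take turns
;; consuming the same input.
(define in (open-input-string "hi (b (i \"there\"))!"))

;; Arguments are evaluated left to right, so this pulls three characters
;; off the port in order.
(define text-before
  (string (read-char in) (read-char in) (read-char in)))

;; read consumes exactly one datum and stops after its closing paren.
(define datum (read in))

;; The next character is whatever followed the datum.
(define text-after (read-char in))
```

read-isml does the same thing in a loop: it stays on read-char until it sees the control character, calls the other reader once, then switches back.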

Now let’s look at Pollen Markup. This is a clean and brain-friendly markup language that interfaces with Pollen’s preprocessor. You can write procedures to represent elements, and then call those procedures while expressing them as annotations for text. Its control character is the lozenge (◊).

Here’s some Pollen markup. Save it to doc.pm.

#lang pollen
◊(require racket/string)
◊(define sparkles string-upcase)

oh my god I'm just so ◊sparkles{pretty} todayyy

The control character is configured differently, but the premise is the same. Here’s the equivalent ISML document. Save it to doc.isml.

٭
٭(require racket/string)
٭(define sparkles string-upcase)

oh my god I'm just so ٭(sparkles "pretty") todayy

Two differences stand out:

  1. Pollen documents are programs that compute a finished document. ISML doesn’t do anything. If you run the Pollen markup you’ll get a document in the form of X-expressions. If you run the ISML then it will just fall over.
  2. Pollen markup supports the same bracket conventions as Scribble, which in this example lets you apply sparkles to a string using one pair of curly braces. The ISML parser I shared doesn’t do that, so we settle for the S-expression form. It can be upgraded to use the same conventions later if you wish.

How do we close this gap?

Racket Integration

Let’s add some code to “run” an ISML document as if it were Racket. Copy this code to the end of my-isml.rkt, and add run-isml to your provide.

(provide read-isml run-isml)

; ...

(define (run-isml in)
  (define ns (make-base-namespace))
  (for/list ([expr (in-list (read-isml in))])
    (eval expr ns)))

run-isml will run an ISML document as a program. It does this by applying eval to each item in its contents. The strings will evaluate to themselves, and every expression after a control character will turn into the result of eval applied to that expression.

In Racket, eval runs in the context of a namespace. It needs one to look up the values of variables you declare in a program. A “base” namespace is just one that has all the bindings you’d expect in the racket/base language, which is plenty for our needs. We create a new base namespace using make-base-namespace on every call because we don’t want the definitions from one document to clash with the names in another.
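To see why a fresh namespace per document matters, here is a small sketch that evaluates a definition in one base namespace and then looks for it in another. The names ns-a, ns-b, and greeting are made up for the illustration.

```racket
#lang racket/base

;; Two independent namespaces, as if two documents were being run.
(define ns-a (make-base-namespace))
(define ns-b (make-base-namespace))

;; "Document A" defines a name.
(eval '(define greeting "hello from A") ns-a)

;; The definition is visible where it was made...
(define from-a (eval 'greeting ns-a))

;; ...and unbound everywhere else, so looking it up raises an error.
(define from-b
  (with-handlers ([exn:fail? (λ (e) 'unbound)])
    (eval 'greeting ns-b)))
```

Here from-a is the string and from-b is 'unbound, which is the isolation run-isml gets by calling make-base-namespace each time.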

While I’m ahead, I should of course warn you to never use this code on untrusted user input. For simplicity, we are assuming that any ol’ thing you give to this procedure is fair game to run with your privileges. Security is another article, but if you want to learn more, you can read up on Racket security guards and sandboxes.

Let’s run the two documents now. Pollen uses its #lang to process content, so you can just run it like a script.

$ racket doc.pm
'(root "oh my god I'm just so " "PRETTY" " todayyy" "\n")

ISML is not using a #lang, so we’ll instead call our procedure using our file: (call-with-input-file "doc.isml" run-isml).

> (require "my-isml.rkt")
> (call-with-input-file "doc.isml" run-isml)
'("\n" #<void> "\n" #<void> "\n\noh my god I'm just so " "PRETTY" " todayy\n")

No, it’s fine. This actually makes sense. Some expressions like (require ...) result in void values, which appeared in our output. Let’s filter those out now.

(define (run-isml in)
  (define ns (make-base-namespace))
  (filter (negate void?)
    (for/list ([expr (in-list (read-isml in))])
      (eval expr ns))))

Let’s run that again.

> (call-with-input-file "doc.isml" run-isml)
'("\n" "\n" "\n\noh my god I'm just so " "PRETTY" " todayy\n")

There’s still a difference in how Pollen and our parser treat whitespace. Pollen trims contiguous whitespace and provides a root tag to make the resulting document follow XML rules. The ISML parser we use keeps all of the whitespace it can. I’m not going to cover cleaning up the whitespace because you probably know how to do that if you read this far.
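If you do want a starting point, here is one rough sketch: drop the strings that hold nothing but whitespace and wrap what is left in a root tag. This is not how Pollen does it, and the names whitespace-only? and tidy are made up for the illustration.

```racket
#lang racket/base
(require racket/string)

;; True for strings made only of whitespace, such as "\n" or "  ".
(define (whitespace-only? v)
  (and (string? v) (string=? (string-trim v) "")))

;; Drop whitespace-only strings and wrap the rest in a root tag.
;; Non-string items and strings with real content pass through untouched.
(define (tidy items)
  (cons 'root
        (for/list ([item (in-list items)]
                   #:unless (whitespace-only? item))
          item)))

;; The output of run-isml from above, fed through the cleanup.
(define tidied
  (tidy '("\n" "\n" "\n\noh my god I'm just so " "PRETTY" " todayy\n")))
```

The leading and trailing newlines inside the surviving strings are left alone here. Trimming those without gluing words together takes a bit more care, which is why this stays a sketch.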

All that matters is that we can see that our code ran, and we get PRETTY in both examples. We’re well on our way.

What about errors?

Errors? The code is perfect. I don’t know what you’re talking about.

For simplicity we’ll use with-handlers to print expressions that were being evaluated at the time of error, but not the location of the expression in the original document. We’ll then re-raise the exception to terminate the loop early.

(define (run-isml in)
  (define ns (make-base-namespace))
  (filter (negate void?)
    (for/list ([expr (in-list (read-isml in))])
      (with-handlers
        ([exn?
          (λ (e)
            (printf "~a~nexpr: ~v~n"
                    (exn-message e)
                    expr)
            (raise e))])
        (eval expr ns)))))

I’d recommend just creating a new exception with the extra information and raising that instead, but this will do for now.
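That recommendation could look something like the sketch below. It assumes a new exception type that carries the offending expression. The names exn:fail:isml and eval/context are made up here and do not appear in the final module.

```racket
#lang racket/base

;; An exception that remembers which expression was being evaluated.
;; Subtyping exn:fail means existing exn:fail? handlers still catch it.
(struct exn:fail:isml exn:fail (expr) #:transparent)

;; Evaluate expr in ns. On failure, raise a new exception that keeps the
;; original message and continuation marks and adds the expression.
(define (eval/context expr ns)
  (with-handlers
    ([exn:fail?
      (λ (e)
        (raise (exn:fail:isml
                (format "~a\n  expr: ~v" (exn-message e) expr)
                (exn-continuation-marks e)
                expr)))])
    (eval expr ns)))

;; A caller can now get at the failing expression directly.
(define failed-expr
  (with-handlers ([exn:fail:isml? exn:fail:isml-expr])
    (eval/context '(car 5) (make-base-namespace))))
```

The caller decides what to print, and nothing is written to standard output as a side effect.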

Other #langs? In MY markup?

One of the nice features in my project Polyglot was that you could mix #langs in a single source file. Let’s cover how to add that feature here.

Add this before run-isml in your code:

(require syntax/modread)

(define (read-module in)
  (with-module-reading-parameterization
    (λ () (check-module-form (read-syntax (object-name in) in)
                             'ignored
                             "not a #lang module form"))))

(define (read-module/string str)
  (read-module (open-input-string str)))

(define (replace-module-name mod-form [id (gensym)])
  (datum->syntax mod-form
                 (list-set (syntax->datum mod-form) 1 id)))

These procedures let you read Racket modules from character ports, read-module in particular. with-module-reading-parameterization sounds freakishly complicated, but it isn’t. All it does is set options that make Racket assume it’s about to read a new module, as it would from a file. We then hand whatever was read to check-module-form, which gives back code that will declare a module when evaluated.

read-module/string is the same thing, but you can pass Racket code as a string to it (again, untrusted input is a no-no here). replace-module-name is just a utility that we can use to rename the modules before we run them. This will be important in a minute.
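Here is a standalone sketch of the whole trip: read a module from a string, rename it, declare it in a namespace, and pull a value out. The module source, the module name tiny, and the name result are made up for the illustration, and the current namespace is set explicitly here because no surrounding eval does it for us.

```racket
#lang racket/base
(require racket/list syntax/modread)

;; Same shape as read-module/string in the article.
(define (read-module/string str)
  (define in (open-input-string str))
  (with-module-reading-parameterization
    (λ () (check-module-form (read-syntax (object-name in) in)
                             'ignored
                             "inline module"))))

;; Hypothetical module source, used only for this demonstration.
(define source
  "#lang racket/base\n(provide answer)\n(define answer 42)\n")

(define mod-stx (read-module/string source))

;; Swap the reader-chosen module name for one we control.
(define renamed
  (datum->syntax mod-stx (list-set (syntax->datum mod-stx) 1 'tiny)))

(define ns (make-base-namespace))

(define result
  (parameterize ([current-namespace ns])
    (eval-syntax (expand renamed)) ; declares the module tiny in ns
    (namespace-require ''tiny)     ; instantiates it, imports its provides
    (eval 'answer)))
```

The rename is what makes (require 'tiny) possible afterwards. Without it you would be stuck with whatever name the reader picked.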

Let’s modify our doc.isml file to hold a Racket module.

٭٭(register-module 'my-pollen #<<END
#lang pollen
...did you just flex on me?
END
)
٭(require 'my-pollen)
٭doc

We embedded Pollen markup in our markup. This means you can embed any Racket DSL. This is what Polyglot was about, and now it’s what you are about. You should feel tingly.

The ٭doc is an artifact of Pollen. Pollen markup modules provide a doc identifier bound to the output of that document. When we ran the module on the command line, it was printed instead.

So what about that #<< thing? That starts a herestring, which is just a way to express a multi-line string between some tags (END in this case). END must sit on its own line, and can’t even share it with the closing ). Notice that I am calling a procedure called register-module, which doesn’t exist yet. We’re going to use our namespace from earlier to hold this procedure and forward its arguments to our reading procedures from earlier.

Modify run-isml as follows:

(define (run-isml in)
  (define (register-module id str)
    (eval-syntax (expand (replace-module-name (read-module/string str) id)) ns))
  (define ns (make-base-namespace))
  (namespace-set-variable-value! 'register-module register-module #t ns #t)
  (filter (negate void?)
    (for/list ([expr (in-list (read-isml in))])
      (with-handlers
        ([exn?
          (λ (e)
            (printf "~a~nexpr: ~v~n"
                    (exn-message e)
                    expr)
            (raise e))])
        (eval expr ns)))))

Here we declare register-module and inject it for our documents’ use via namespace-set-variable-value!. register-module can then be used to read a module from a provided string and rename the module according to the user’s preference.
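The injection step works for any value, not just register-module. A minimal sketch, assuming a made-up helper called shout:

```racket
#lang racket/base

(define ns (make-base-namespace))

;; An ordinary procedure defined in the host program.
(define (shout str)
  (string-append (string-upcase str) "!"))

;; Bind the symbol shout inside ns to that procedure. The #t asks for the
;; identifier itself to be mapped to the new variable.
(namespace-set-variable-value! 'shout shout #t ns)

;; Code evaluated in ns can now call it by name.
(define shouted (eval '(shout "pretty") ns))
```

The procedure keeps its closure, which is how register-module can still see ns and the reading procedures even though the document never defined them.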

And with that, Pollen markup is our markup.

> (call-with-input-file "doc.isml" run-isml)
'("\n\n" "\n" "...did you just flex on me?" "\n")

One Last Thing

Before I wrap up here, let’s add one more formal to run-isml:

(define (run-isml in [preval '#f])
  (define (register-module id str)
    (eval-syntax (expand (replace-module-name (read-module/string str) id)) ns))
  (define ns (make-base-namespace))
  (namespace-set-variable-value! 'register-module register-module #t ns #t)
  (eval preval ns)
  (filter (negate void?)
    (for/list ([expr (in-list (read-isml in))])
      (with-handlers
        ([exn?
          (λ (e)
            (printf "~a~nexpr: ~v~n"
                    (exn-message e)
                    expr)
            (raise e))])
        (eval expr ns)))))

Here, preval gives you a chance to run some code using the ISML namespace before the document program itself runs. You can use that to define common identifiers or require a library without needing to type it every time in all of your ISML documents.
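Since preval is a single datum, wrapping several forms in begin is one way to pass more than one. The sketch below shows only the mechanism, with a bare namespace standing in for the one run-isml creates. The contents of preval and the name sparkled are made up for the illustration.

```racket
#lang racket/base

;; A prelude: one datum holding a require and a definition.
(define preval
  '(begin
     (require racket/string)
     (define sparkles string-upcase)))

;; Stand-in for the namespace inside run-isml.
(define ns (make-base-namespace))

;; What run-isml does first...
(eval preval ns)

;; ...so that document expressions can rely on the prelude.
(define sparkled (eval '(sparkles "pretty") ns))
```

With the real procedure, the same datum would go in as the second argument, as in (call-with-input-file "doc.isml" (λ (in) (run-isml in preval))), and doc.isml could then drop its own require and define lines.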

All Together Now

Here’s our final module, weighing in at 61 lines (including blanks).

#lang racket
(provide read-isml run-isml)
(require syntax/modread)

(define (accrete-chars remaining [found '()])
  (let ([next (and (not (eq? remaining '())) (car remaining))])
    (if (char? next)
        (accrete-chars (cdr remaining) (cons next found))
        (values (apply string found)
                remaining))))

(define (accrete-content content [found '()])
  (if (eq? content '()) found
      (if (char? (car content))
          (let-values ([(str remaining) (accrete-chars content)])
            (accrete-content remaining (cons str found)))
          (accrete-content (cdr content) (cons (car content) found)))))

(define (read-isml in [read-controlled read])
  (define control-char (read-char in))
  (define (read-next state reader)
    (define datum (reader in))
    (if (eof-object? datum)
        state
        (if (eq? reader read-char)
            (if (char=? control-char datum)
                (read-next state read-controlled)
                (read-next (cons datum state) reader))
            (read-next (cons datum state) read-char))))
  (if (eof-object? control-char) '()
      (accrete-content (read-next '() read-char))))

(define (read-module in)
  (with-module-reading-parameterization
    (λ () (check-module-form (read-syntax (object-name in) in)
                             'ignored
                             "not a #lang module form"))))

(define (read-module/string str)
  (read-module (open-input-string str)))

(define (replace-module-name mod-form [id (gensym)])
  (datum->syntax mod-form
                 (list-set (syntax->datum mod-form) 1 id)))

(define (run-isml in [preval '#f])
  (define (register-module id str)
    (eval-syntax (expand (replace-module-name (read-module/string str) id)) ns))
  (define ns (make-base-namespace))
  (namespace-set-variable-value! 'register-module register-module #t ns #t)
  (eval preval ns)
  (filter (negate void?)
    (for/list ([expr (in-list (read-isml in))])
      (with-handlers
        ([exn?
          (λ (e)
            (printf "~a~nexpr: ~v~n"
                    (exn-message e)
                    expr)
            (raise e))])
        (eval expr ns)))))

Conclusion

In this article I showed you how ISML, a barebones markup language, can integrate with Racket. I took it further by showing you how your markup can express Racket modules of a different #lang inline for later expansion. The entire implementation is 61 lines of code (including blanks), meaning that you can whip up an incredibly powerful prose language at a moment’s notice.
