Extension Points - 3 Years Later

UPDATE: 2017-12-05 smondet pointed out that the extension didn’t work quite like the original: ppx_getenv should fetch the environment variable at compile time rather than at runtime. The extension and the snippets in this post have been updated to reflect this.

It’s been over 3 years since whitequark’s blog post marked the beginning of a Cambrian explosion of ppx extensions. My not completely accurate estimate (opam search ppx_ | tail -n +3 | wc -l) gives me at least 80 packages. A lot has happened since then that has improved the quality of life for users and developers of ppx rewriters. This blog post from Jane Street details the major advances:

  • The addition of ppx drivers, which greatly enhanced the performance and usability of ppx’s. There’s now only a single preprocessing pass, and there’s no longer a need to serialize/deserialize the parse tree for every ppx rewriter. Drivers are also quite handy for debugging and testing (as we’ll see later).

  • If you use ppx_core to define your extension and ppx_driver as your driver, preprocessing is even faster because all ppx_core rewriters are applied in a single pass. Furthermore, ppx_core is a Swiss army knife of ppx rewriter construction. It has a bunch of handy features such as an AST pattern language, a safe attribute handling system, a typo checking system, and more.

There are a couple of other important improvements to the stack that go unmentioned in that article:

  • ocaml-migrate-parsetree (omp) decouples ppx rewriters from the version of ocaml being used. Now, ppx users no longer have to worry about their ppx extensions working on a new version of the compiler. Conversely, ppx authors no longer have to worry about supporting multiple versions of the parse tree.

  • jbuilder is a new build system that makes using omp and packaging your own ppx rewriters trivial.

In this blog post, I’d like to tie all these advances together in a small practical demonstration by reimplementing whitequark’s original ppx_getenv rewriter, which served as a starting point for other ppx’s. In the process, I’d like to show off the improvements these advances make possible, and encourage their wider adoption.

Writing the ppx

Since the source code for the rewriter is so brief, I will simply replicate it here and explain what’s novel from the original ppx_getenv.

open Ppx_core

let name = "getenv"

let expand ~loc ~path:_ (env : string) =
  match Caml.Sys.getenv env with
  | s -> [%expr Some ([%e Ast_builder.Default.estring s ~loc])]
  | exception Not_found -> [%expr None]

let ext =
  Extension.declare
    name
    Extension.Context.expression
    Ast_pattern.(single_expr_payload (estring __))
    expand

let () = Ppx_driver.register_transformation name ~extensions:[ext]

First let’s review the familiar quasi quotations:

match Caml.Sys.getenv env with
| s -> [%expr Some ([%e Ast_builder.Default.estring s ~loc])]
| exception Not_found -> [%expr None]

This is almost the same as before, but there are some subtle differences. First, it comes from a different package - ppx_metaquot. Second, it expects a loc argument to exist in the lexical scope where the quotation is inserted. In our example, the location comes from the ~loc labeled argument in expand.

Next, we declare the extension using Ppx_core:

let ext =
  Extension.declare
    name
    Extension.Context.expression
    Ast_pattern.(single_expr_payload (estring __))
    expand

We define the name of the extension node this applies to (getenv), the kind of AST fragment it applies to (expressions), the pattern its payload must match, and finally the function that will transform our expression node. Providing the name up front prevents us from accidentally declaring or using two extensions with the same name. Also, when users of this extension mistype the extension name, ppx_driver will offer helpful suggestions.

But the most interesting part is of course the pattern itself, which roughly says: match the payload in [%getenv payload] as an expression that is a string constant. This automatically extracts the string and passes it to our expand function, and gives us good error handling when the payload isn’t what we’d expect.

This primitive example doesn’t really show off the full power of this pattern DSL, which offers alternation/combination of patterns, capturing the location, matching on lists, tuples, and other goodies. I’ll prepare better examples in another blog post.
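To make the division of labour between the pattern and expand more concrete, here is a toy model that uses only the standard library. It is a sketch under loud assumptions: the expr type and the match_string_payload and rewrite functions are hypothetical stand-ins invented for this illustration, not Ppx_core’s types or API, and the real machinery reports located errors rather than returning a result.

```ocaml
(* Toy stand-in for the parse tree. These are NOT the real Parsetree
   types; they exist only to make the sketch self-contained. *)
type expr =
  | String_const of string              (* "HOME" *)
  | Int_const of int                    (* 42 *)
  | Construct of string * expr option   (* Some e, None *)

(* Hypothetical analogue of
   Ast_pattern.(single_expr_payload (estring __)):
   accept exactly one expression, and only if it is a string constant. *)
let match_string_payload (payload : expr list) : (string, string) result =
  match payload with
  | [ String_const s ] -> Ok s
  | [ _ ] -> Error "expected a string constant"
  | _ -> Error "expected a single expression"

(* Same shape as expand in the post: the lookup runs while rewriting,
   and only the resulting constant ends up in the output tree. *)
let expand (env : string) : expr =
  match Sys.getenv env with
  | s -> Construct ("Some", Some (String_const s))
  | exception Not_found -> Construct ("None", None)

(* Pattern first, then expansion; a bad payload never reaches expand. *)
let rewrite (payload : expr list) : (expr, string) result =
  match match_string_payload payload with
  | Ok env -> Ok (expand env)
  | Error msg -> Error msg
```

The point of the split is that expand only ever sees an already validated string, which is why the real expand in this post can take (env : string) directly instead of inspecting the payload itself.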

Packaging a Rewriter

Using jbuilder, creating a ppx rewriter is pretty trivial. All it takes is adding a (kind ppx_rewriter) field. If your rewriter has runtime dependencies for the code it generates, just add them to the (ppx_runtime_libraries (...)) list. In our case, our rewriter only requires specifying the kind:

(library
 ((name ppx_getenv2)
   (public_name ppx_getenv2)
   (wrapped false)
   (kind ppx_rewriter) ;; kind specified here
   (libraries (ppx_core ppx_driver))
   (preprocess (pps (ppx_metaquot)))))

The above looks deceptively simple, but it accomplishes quite a lot for us under the hood. First, the preprocess line will cause jbuilder to construct a driver for us that makes it quite easy to see our preprocessed code. This is quite handy if you’re not sure what effect the ppx is having on your source (ppx_metaquot in our case):

(* $ _build/default/.ppx/ppx_metaquot/ppx.exe src/ppx_getenv2.ml *)

(* ... output has been truncated ...*)
let expand ~loc  ~path:_  (env : string) =
  let env = Ast_builder.Default.estring env ~loc  in
  {
    pexp_desc =
      (Pexp_match
        ({
            pexp_desc =
              (Pexp_apply
                ({
                    pexp_desc =
                      (Pexp_ident
                        { txt = (Ldot ((Lident "Sys"), "getenv")); loc });
                    pexp_loc = loc;
                    pexp_attributes = []
                  }, [(Nolabel, env)]));
            pexp_loc = loc;
            pexp_attributes = []
          },
(*...*)

jbuilder also takes care to generate a correct META file for us, one that will work for users of findlib (sometimes called classical ppx) and also for users who construct drivers to preprocess their code. The runtime dependencies of code generated by our rewriter will be handled transparently for us. To give one example where this matters: if ppx_deriving_yojson used jbuilder for packaging, then users wouldn’t have to remember to add ppx_deriving_yojson.runtime whenever they used that rewriter.

Testing the Rewriter

This is where the driver stuff pays off again. It’s quite easy to write tests for a preprocessor using a simple diff tool by comparing the results of the preprocessed source to what is expected [1].

(executable
 ((name pp)
  (modules (pp))
  (libraries (ppx_getenv2 ppx_driver))))

(rule
 ((targets (test.result))
  (deps (test.ml))
  (action (run ./pp.exe --impl ${<} -o ${@}))))

(alias
 ((name runtest)
  (deps (test.result test.expected))
  (action (run diff -dEbBt test.result test.expected))))

The source for pp.ml is just a trivial manual reconstruction of a driver:

let () = Ppx_driver.standalone ()

Of course jbuilder also lets us use our ppx rewriter directly when compiling an executable:

(executable
 ((name test)
  (modules (test))
  (preprocess (pps (ppx_getenv2)))))

(alias
 ((name runtest)
  (deps (test.exe))
  (action (run ${<}))))

This is a useful test to make sure that our preprocessed code type checks and that the runtime dependencies of our rewriter are specified correctly.
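For illustration, a test.ml for the executable stanza above could look like the following minimal sketch. The post does not show the actual file, so its contents are an assumption, as is the choice of HOME as the variable. The only thing taken from the post is that [%getenv "..."] is replaced at compile time by Some applied to a string constant, or by None.

```ocaml
(* test.ml: hypothetical contents, not taken from the original project.
   Needs the ppx_getenv2 rewriter from this post; it will not compile
   without it because [%getenv ...] would be left unexpanded. *)
let () =
  (* Replaced during preprocessing by [Some "<value>"] or [None],
     depending on the environment of the build, not of the run. *)
  match [%getenv "HOME"] with
  | Some home -> print_endline ("HOME at compile time: " ^ home)
  | None -> print_endline "HOME was not set at compile time"
```

Because the lookup happens during preprocessing, changing HOME after the build should have no effect on what the executable prints.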

Conclusion

The full source for this project is available here if you’d like to experiment or use this as a starting point. I will try to keep it updated as the ppx stack evolves. Note that this blog post ignores a huge part of the ppx ecosystem by omitting the two deriving frameworks: ppx_type_conv and ppx_deriving. It will take a separate blog post to do either of those justice.

[1] Kudos to Drup for finding the optimal set of diff flags. My set of flags in this post is the subset that works on both OSX and GNU diff.
