Extension Points - 3 Years Later¶
UPDATE: 2017-12-05 smondet pointed out that the extension doesn’t work quite like the original. ppx_getenv should fetch the environment variable at compile time rather than at runtime. The extension and the snippets in this post have been updated to reflect this.
It’s been over 3 years since whitequark’s blog post
marked the beginning of a Cambrian explosion of ppx extensions. My not
completely accurate estimate (opam search ppx_ | tail -n +3 | wc -l) gives
me at least 80 packages. A lot has happened since then that has improved the quality of
life for users and developers of ppx rewriters. This blog post
from Jane Street details the major advances:
- The addition of ppx drivers, which greatly enhanced the performance and usability of ppx’s. There’s now only a single preprocessing pass, and there’s no longer a need to serialize/deserialize the parse tree for every ppx rewriter. Drivers are also quite handy for debugging and testing (as we’ll see later). 
- If you use ppx_core to define your extension and ppx_driver as your driver, preprocessing is even faster because the driver applies all ppx_core rewriters in a single pass. Furthermore, ppx_core is a Swiss Army knife for constructing ppx rewriters. It has a bunch of handy features such as an AST pattern language, a safe attribute handling system, a typo checking system, and more. 
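To illustrate the first point, here is a toy sketch of the difference between classic ppx and a driver. It assumes a made-up ast type (a list of strings) in place of the real parse tree, and uses Marshal as a stand-in for the serialization that happens between separate rewriter processes. The names classic_pipeline and driver_pipeline are hypothetical and are not part of ppx_driver.

```ocaml
(* Toy stand-in for a parse tree; the real one is Parsetree.structure. *)
type ast = string list

(* In this sketch a rewriter is just a function from tree to tree. *)
type rewriter = ast -> ast

(* Classic ppx: every rewriter is its own process, so the tree is
   serialized and deserialized around each one. Marshal models that
   round trip here. *)
let classic_pipeline (rewriters : rewriter list) (tree : ast) : ast =
  List.fold_left
    (fun tree rewrite ->
      let bytes = Marshal.to_string tree [] in
      let (reloaded : ast) = Marshal.from_string bytes 0 in
      rewrite reloaded)
    tree rewriters

(* Driver: all rewriters are linked into one executable and applied
   in memory, with no round trips in between. *)
let driver_pipeline (rewriters : rewriter list) (tree : ast) : ast =
  List.fold_left (fun tree rewrite -> rewrite tree) tree rewriters

let () =
  let upcase = List.map String.uppercase_ascii in
  let tag = List.map (fun item -> item ^ "!") in
  let tree = [ "let x = 1"; "let y = 2" ] in
  (* Both pipelines compute the same result; only the cost differs. *)
  assert (
    classic_pipeline [ upcase; tag ] tree
    = driver_pipeline [ upcase; tag ] tree)
```

The result is identical either way. What the driver saves is the per-rewriter process start and serialization work.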
There are a couple of other important improvements to the stack unmentioned in that article:
- ocaml-migrate-parsetree (omp) decouples ppx rewriters from the version of OCaml being used. Now, ppx users no longer have to worry about their ppx extensions working on a new version of the compiler. Conversely, ppx authors no longer have to worry about supporting multiple versions of the parse tree. 
- jbuilder is a new build system that makes using omp and packaging your own ppx rewriters trivial. 
In this blog post, I’d like to tie all these advances together in a small practical demonstration by reimplementing whitequark’s original ppx_getenv rewriter, which served as a starting point for other ppx’s. In the process, I’d like to show off the improvements they make possible and encourage their wider adoption.
Writing the ppx¶
Since the source code for the rewriter is so brief, I will simply replicate it here and explain what’s new compared to the original ppx_getenv.
open Ppx_core
let name = "getenv"
let expand ~loc ~path:_ (env : string) =
  match Caml.Sys.getenv env with
  | s -> [%expr Some ([%e Ast_builder.Default.estring s ~loc])]
  | exception Not_found -> [%expr None]
let ext =
  Extension.declare
    name
    Extension.Context.expression
    Ast_pattern.(single_expr_payload (estring __))
    expand
let () = Ppx_driver.register_transformation name ~extensions:[ext]
First let’s review the familiar quasi quotations:
match Caml.Sys.getenv env with
| s -> [%expr Some ([%e Ast_builder.Default.estring s ~loc])]
| exception Not_found -> [%expr None]
This is almost the same as before, but there are some subtle differences. First,
it comes from a different package - ppx_metaquot. Second, it expects a loc
argument to exist in the lexical scope where the quotation is inserted. In our
example, the location comes from the ~loc labeled argument in expand.
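To make the compile-time behavior concrete, here is a minimal sketch that models the quotation with a toy expression type instead of the real Parsetree. The type expr, its constructors, and to_source are made up for illustration and are not part of ppx_core or ppx_metaquot. Only the shape of expand mirrors the rewriter above.

```ocaml
(* Toy stand-in for Parsetree.expression, covering only what we need. *)
type expr =
  | String_lit of string (* a string constant such as "abc" *)
  | Some_of of expr      (* Some <expr> *)
  | None_                (* None *)

(* Same shape as the real expand: the environment is read when the
   rewriter runs, and the value is embedded as a literal. *)
let expand (env : string) : expr =
  match Sys.getenv env with
  | s -> Some_of (String_lit s)
  | exception Not_found -> None_

(* Render the toy tree back to OCaml source text. *)
let rec to_source = function
  | String_lit s -> Printf.sprintf "%S" s
  | Some_of e -> Printf.sprintf "Some (%s)" (to_source e)
  | None_ -> "None"

(* Prints the code that would replace the extension node for HOME. *)
let () = print_endline (to_source (expand "HOME"))
```

The output depends on the environment the program runs in, which is the point: the lookup happens when the rewriter runs, and only a literal ends up in the generated code.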
Next, we declare the extension using Ppx_core:
let ext =
  Extension.declare
    name
    Extension.Context.expression
    Ast_pattern.(single_expr_payload (estring __))
    expand
We define the name of the extension node this extension applies to (getenv), the
kind of AST fragment it applies to (expressions), the pattern its payload
must match, and finally, the function which will transform our expression node.
Providing the name of the node up-front for the extension prevents us from
accidentally declaring or using two extensions that apply to the same name.
Also, when users of this extension mistype the extension name, ppx_driver will
offer helpful suggestions.
But the most interesting part of course is the pattern itself, which roughly
says: match any payload in [%getenv payload] that is a single expression
consisting of a string constant. This automatically extracts the string, passes it to our
expand function, and gives us good error handling when the payload isn’t what
we’d expect.
This primitive example doesn’t really show off the full power of this pattern DSL, which offers alternation/combination of patterns, capturing the location, matching on lists, tuples, and other goodies. I’ll prepare better examples in another blog post.
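As a rough mental model of what single_expr_payload (estring __) accepts, here is a toy matcher written against a made-up payload type. Nothing here is Ppx_core’s API or implementation: expr, payload, and match_single_string are hypothetical names, and the error messages are invented.

```ocaml
(* Toy expressions: just enough variety to have non-matching cases. *)
type expr =
  | String_lit of string
  | Int_lit of int
  | Ident of string

(* Toy payload: a sequence of expressions, or something else entirely
   (the real payload can also be a type, a pattern, or a signature). *)
type payload =
  | Exprs of expr list
  | Other

(* Accept exactly one expression that is a string constant and hand
   back the captured string, like the __ hole does in the real DSL. *)
let match_single_string (p : payload) : (string, string) result =
  match p with
  | Exprs [ String_lit s ] -> Ok s
  | Exprs [ _ ] -> Error "expected a string constant"
  | Exprs _ -> Error "expected exactly one expression"
  | Other -> Error "expected an expression payload"

let () =
  assert (match_single_string (Exprs [ String_lit "HOME" ]) = Ok "HOME");
  assert (
    match_single_string (Exprs [ Ident "home" ])
    = Error "expected a string constant")
```

In the real DSL the captured string is passed straight to expand, and a mismatch becomes a compile error reported at the location of the extension node.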
Packaging a Rewriter¶
Using jbuilder, creating a ppx rewriter is pretty trivial. All it takes is
adding a (kind ppx_rewriter) field to the library stanza. If your rewriter has runtime dependencies for
the code it generates, just add them to the (ppx_runtime_libraries (...)) list.
In our case, our rewriter only requires specifying the kind:
(library
 ((name ppx_getenv2)
   (public_name ppx_getenv2)
   (wrapped false)
   (kind ppx_rewriter) ;; kind specified here
   (libraries (ppx_core ppx_driver))
   (preprocess (pps (ppx_metaquot)))))
The above looks deceptively simple, but it accomplishes quite a lot for us under
the hood. First, the preprocess line will cause jbuilder to construct a
driver for us that will make it quite easy for us to see our preprocessed code.
This is quite handy if you’re not sure what effect the ppx is having on your
source (ppx_metaquot in our case):
(* $ _build/default/.ppx/ppx_metaquot/ppx.exe src/ppx_getenv2.ml *)
(* ... output has been truncated ...*)
let expand ~loc  ~path:_  (env : string) =
  let env = Ast_builder.Default.estring env ~loc  in
  {
    pexp_desc =
      (Pexp_match
        ({
            pexp_desc =
              (Pexp_apply
                ({
                    pexp_desc =
                      (Pexp_ident
                        { txt = (Ldot ((Lident "Sys"), "getenv")); loc });
                    pexp_loc = loc;
                    pexp_attributes = []
                  }, [(Nolabel, env)]));
            pexp_loc = loc;
            pexp_attributes = []
          },
(*...*)
jbuilder also takes care to generate a correct META file for us. One that
will work for users of findlib (sometimes called classical ppx), and also
users who construct drivers to preprocess their code. The runtime dependencies
of code generated by our rewriter will be handled transparently for us. To give
one example where this matters, if ppx_deriving_yojson used jbuilder for
packaging then users wouldn’t have to remember to add
ppx_deriving_yojson.runtime whenever they used that rewriter.
Testing the Rewriter¶
This is where the driver stuff pays off again. It’s quite easy to write tests for a preprocessor using a simple diff tool by comparing the results of the preprocessed source to what is expected (see footnote 1).
(executable
 ((name pp)
  (modules (pp))
  (libraries (ppx_getenv2 ppx_driver))))
(rule
 ((targets (test.result))
  (deps (test.ml))
  (action (run ./pp.exe --impl ${<} -o ${@}))))
(alias
 ((name runtest)
  (deps (test.result test.expected))
  (action (run diff -dEbBt test.result test.expected))))
The source for pp.ml is just a trivial manual reconstruction of a driver:
let () = Ppx_driver.standalone ()
Of course jbuilder also lets us use our ppx rewriter directly when compiling an executable:
(executable
 ((name test)
  (modules (test))
  (preprocess (pps (ppx_getenv2)))))
(alias
 ((name runtest)
  (deps (test.exe))
  (action (run ${<}))))
This is a useful test to make sure that our preprocessed code type checks and that the runtime dependencies of our rewriter are specified correctly.
Conclusion¶
The full source for this project is available here if you’d like to experiment or use this as a starting point. I will try to keep it updated as the ppx stack evolves. Note that this blog post ignores a huge part of the ppx ecosystem by omitting the two deriving frameworks: ppx_type_conv and ppx_deriving. It will take a separate blog post to do either of those justice.
- 1: Kudos to Drup for finding the optimal set of diff flags. My set of flags in this post is the subset that works on both OS X and GNU diff.