.. post:: 2017-12-04
   :tags: OCaml, Ppx, jbuilder

.. _extension_points:

Extension Points - 3 Years Later
================================

UPDATE: 2017-12-05 smondet_ pointed out that the extension doesn't work quite
like the original. ppx_getenv_ should fetch the environment variable at compile
time rather than at runtime. The extension and the snippets in this post have
been updated to reflect this.

It's been over 3 years since `whitequark's blog post `_ marked the beginning of
a Cambrian explosion of ppx extensions. My not completely accurate estimate
(``opam search ppx_ | tail -n +3 | wc -l``) gives me at least 80 packages.

A lot has happened since then to improve the quality of life for users and
developers of ppx rewriters. `This blog post `_ from Jane Street details the
major advances:

* The addition of ppx drivers, which greatly enhanced the performance and
  usability of ppx rewriters. There's now only a single preprocessing pass, and
  there's no longer a need to serialize/deserialize the parse tree for every
  ppx rewriter. Drivers are also quite handy for debugging and testing (as
  we'll see later).

* If you use ppx_core_ to define your extension and ppx_driver_ as your driver,
  preprocessing is even faster because the driver applies all ppx_core_
  rewriters in a single pass. Furthermore, ppx_core_ is a Swiss Army knife for
  constructing ppx rewriters. It has a bunch of handy features such as an AST
  pattern language, a safe attribute handling system, a typo checking system,
  and more.

There are a couple of other important improvements to the stack that the
article doesn't mention:

* `ocaml-migrate-parsetree (omp) `_ decouples ppx rewriters from the version of
  OCaml being used. Now, ppx users no longer have to worry about their ppx
  extensions working on a new version of the compiler. Conversely, ppx authors
  no longer have to worry about supporting multiple versions of the parse tree.
* jbuilder_ is a new build system that makes using omp and packaging your own
  ppx rewriters trivial.

In this blog post, I'd like to tie all these advances together in a small
practical demonstration by reimplementing whitequark's original ppx_getenv_
rewriter, which served as a starting point for other ppx rewriters. In the
process, I'd like to show off the improvements these advances make possible and
encourage their wider adoption.

Writing the ppx
---------------

Since the source code for the rewriter is so brief, I will simply reproduce it
here and explain what's new compared to the original ppx_getenv_.

.. code-block:: ocaml

   open Ppx_core

   (* The extension node name, as in [%getenv ...] *)
   let name = "getenv"

   (* Look up the variable while rewriting, i.e. at compile time *)
   let expand ~loc ~path:_ (env : string) =
     match Caml.Sys.getenv env with
     | s -> [%expr Some ([%e Ast_builder.Default.estring s ~loc])]
     | exception Not_found -> [%expr None]

   let ext =
     Extension.declare
       name
       Extension.Context.expression
       Ast_pattern.(single_expr_payload (estring __))
       expand

   let () = Ppx_driver.register_transformation name ~extensions:[ext]

First let's review the familiar quasi quotations:

.. code-block:: ocaml

   match Caml.Sys.getenv env with
   | s -> [%expr Some ([%e Ast_builder.Default.estring s ~loc])]
   | exception Not_found -> [%expr None]

This is almost the same as before, but there are some subtle differences.
First, it comes from a different package: ppx_metaquot_. Second, it expects a
``loc`` value to exist in the lexical scope where the quotation is inserted. In
our example, the location comes from the ``~loc`` labeled argument of
``expand``.

Next, we declare the extension using Ppx_core:

.. code-block:: ocaml

   let ext =
     Extension.declare
       name
       Extension.Context.expression
       Ast_pattern.(single_expr_payload (estring __))
       expand

We define the name of the extension node (``getenv``), the kind of AST fragment
it applies to (expressions), the pattern its payload must match, and finally
the function that transforms the expression node.

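
To make the division of labour between the pattern and the expander concrete,
here is a toy model in plain OCaml. It is only a sketch and does not use the
Ppx_core API: the ``expr`` type and the ``match_payload``, ``expand`` and
``rewrite`` functions are hypothetical stand-ins for the real parse tree, for
``Ast_pattern``, for our expander, and for what declaring the extension wires
together.

.. code-block:: ocaml

   (* Toy model of the flow above. This is NOT the Ppx_core API: the expr
      type and the functions below are hypothetical stand-ins. *)
   type expr =
     | Str of string   (* a string constant *)
     | Int of int      (* some other kind of payload *)
     | Some_ of expr   (* stands for the quotation producing Some ... *)
     | None_           (* stands for the quotation producing None *)

   (* Plays the role of the pattern: accept only a string constant and
      extract its contents. *)
   let match_payload = function
     | Str s -> Ok s
     | _ -> Error "string constant expected"

   (* Plays the role of expand: the lookup happens while rewriting. *)
   let expand env =
     match Sys.getenv env with
     | s -> Some_ (Str s)
     | exception Not_found -> None_

   (* Ties the pattern and the expander together. *)
   let rewrite payload =
     match match_payload payload with
     | Ok env -> Ok (expand env)
     | Error msg -> Error msg

   let () =
     match rewrite (Int 42) with
     | Error msg -> print_endline ("rejected: " ^ msg)
     | Ok _ -> print_endline "rewritten"

The point of the split is that the expander never sees a malformed payload: a
non-string payload is rejected before ``expand`` runs, which is roughly what the
real pattern buys us (with proper locations and error messages on top).
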

Providing the name of the node up-front prevents us from accidentally declaring
or using two extensions that apply to the same node. Also, when users of this
extension mistype its name, ppx_driver_ will offer helpful suggestions.

But the most interesting part is of course the pattern itself, which roughly
says: the ``payload`` in ``[%getenv payload]`` must be an expression that is a
string constant. The pattern automatically extracts the string and passes it to
our ``expand`` function, and gives us good error handling when the payload
isn't what we expect.

This primitive example doesn't really show off the full power of the pattern
DSL, which offers alternation/combination of patterns, location capture,
matching on lists and tuples, and other goodies. I'll prepare better examples
in another blog post.

Packaging a Rewriter
--------------------

Using jbuilder, creating a ppx rewriter is pretty trivial. All it takes is
adding a ``(kind ppx_rewriter)`` field. If your rewriter has runtime
dependencies for the code it generates, just add them to the
``(ppx_runtime_libraries (...))`` list. In our case, the rewriter only requires
specifying the ``kind``:

.. code-block:: lisp

   (library
    ((name ppx_getenv2)
     (public_name ppx_getenv2)
     (wrapped false)
     (kind ppx_rewriter) ;; kind specified here
     (libraries (ppx_core ppx_driver))
     (preprocess (pps (ppx_metaquot)))))

The above looks deceptively simple, but it accomplishes quite a lot for us
under the hood. First, the ``preprocess`` line causes jbuilder to construct a
driver for us, which makes it easy to see our preprocessed code. This is quite
handy if you're not sure what effect a ppx is having on your source
(ppx_metaquot_ in our case):

.. code-block:: ocaml

   (* $ _build/default/.ppx/ppx_metaquot/ppx.exe src/ppx_getenv2.ml *)
   (* ... output has been truncated ...*)
   let expand ~loc ~path:_ (env : string) =
     let env = Ast_builder.Default.estring env ~loc in
     { pexp_desc =
         (Pexp_match
            ({ pexp_desc =
                 (Pexp_apply
                    ({ pexp_desc =
                         (Pexp_ident { txt = (Ldot ((Lident "Sys"), "getenv")); loc });
                       pexp_loc = loc;
                       pexp_attributes = [] },
                     [(Nolabel, env)]));
               pexp_loc = loc;
               pexp_attributes = [] },
     (*...*)

jbuilder also takes care of generating a correct ``META`` file for us, one that
works both for users of findlib (sometimes called *classical ppx*) and for
users who construct drivers to preprocess their code. The runtime dependencies
of the code generated by our rewriter are handled transparently. To give one
example where this matters: if ppx_deriving_yojson_ used jbuilder for
packaging, users wouldn't have to remember to add
``ppx_deriving_yojson.runtime`` whenever they used that rewriter.

Testing the Rewriter
--------------------

This is where the driver stuff pays off again. It's quite easy to write tests
for a preprocessor with a simple diff tool, by comparing the preprocessed
source to the expected result [#]_.

.. code-block:: lisp

   (executable
    ((name pp)
     (modules (pp))
     (libraries (ppx_getenv2 ppx_driver))))

   (rule
    ((targets (test.result))
     (deps (test.ml))
     (action (run ./pp.exe --impl ${<} -o ${@}))))

   (alias
    ((name runtest)
     (deps (test.result test.expected))
     (action (run diff -dEbBt test.result test.expected))))

The source for ``pp.ml`` is just a trivial manual reconstruction of a driver:

.. code-block:: ocaml

   let () = Ppx_driver.standalone ()

Of course, jbuilder also lets us use our ppx rewriter directly when compiling
an executable:

.. code-block:: lisp

   (executable
    ((name test)
     (modules (test))
     (preprocess (pps (ppx_getenv2)))))

   (alias
    ((name runtest)
     (deps (test.exe))
     (action (run ${<}))))

This is a useful test to make sure that our preprocessed code type checks and
that the runtime dependencies of our rewriter are specified correctly.


Conclusion
----------

The full source for this project is available `here `_ if you'd like to
experiment or use it as a starting point. I will try to keep it updated as the
ppx stack evolves.

Note that this blog post ignores a huge part of the ppx ecosystem by omitting
the 2 deriving frameworks: ppx_type_conv_ and ppx_deriving_. It will take a
separate blog post to do either of those justice.

.. [#] Kudos to Drup for finding the optimal set of `diff flags`__. My set of
   flags in this post is the subset that works on both OSX and GNU diff.

__ https://github.com/Drup/pumping/blob/master/test/jbuild#L15/

.. _ppx_core: https://github.com/janestreet/ppx_core
.. _ppx_driver: https://github.com/janestreet/ppx_driver
.. _ppx_type_conv: https://github.com/janestreet/ppx_type_conv
.. _ppx_metaquot: https://github.com/janestreet/ppx_metaquot
.. _jbuilder: https://github.com/janestreet/jbuilder
.. _ppx_deriving: https://github.com/ocaml-ppx/ppx_deriving/
.. _ppx_getenv: https://github.com/ocaml-ppx/ppx_getenv/
.. _ppx_getenv2: https://github.com/rgrinberg/ppx_getenv2/
.. _ppx_deriving_yojson: https://github.com/ocaml-ppx/ppx_deriving_yojson
.. _smondet: https://github.com/smondet