Deriving Slowly

There’s been some recent grumbling about the usability of ppx in OCaml, and instead of just letting it slide again, I’ve decided to do something constructive about this and write a little tutorial on how to write a deriving plugin.

Why deriving? Well, some fairly experienced devs have expressed the difficulty in writing such a plugin. That is particularly surprising to me since I think it’s one of the simpler API’s in ppxlib. I do not think that ppx programming is simple, but writing deriving plugins does not introduce much new complexity.

I assume the reader is comfortable with the material in this blog post by Nathan Rebours. There’s quite a lot of material that one can cover, so I will skip or skim through material that is already explained in that post.

Theory

The Ppxlib.Deriving is the entry point for writing your own derivers. There’s a single function add that allows one to register a deriver for types, exceptions, and open types. 1

val add
 :  ?str_type_decl:(structure, rec_flag * type_declaration list) Generator.t
 -> ?str_type_ext :(structure, type_extension                  ) Generator.t
 -> ?str_exception:(structure, type_exception                  ) Generator.t
 -> ?sig_type_decl:(signature, rec_flag * type_declaration list) Generator.t
 -> ?sig_type_ext :(signature, type_extension                  ) Generator.t
 -> ?sig_exception:(signature, type_exception                  ) Generator.t
 -> ?extension:(loc:Location.t -> path:string -> core_type -> expression)
 -> string
 -> t

Notice the str & sig symmetry present in the arguments. The former is for generating the code, while the latter is for the type signatures. Our problem now transforms into creating an appropriate Generator.t. There are two ways to create this value:

I’ll only cover the latter in this tutorial, but the adding arguments is simple as well.

Here’s the signature for make_noarg:

val make_noarg
  :  ?attributes:Attribute.packed list
  -> ?deps:deriver list
  -> (loc:Location.t -> path:string -> 'input_ast -> 'output_ast)
  -> ('output_ast, 'input_ast) t

The main argument to focus on is the callback. This is the function that will generate parsetree fragments based on the type declarations or signatures. The callback will also get location information loc along with the file path of the source. The other argument that is worth noting ?deps. This is useful when you’d like your code generators to run in a particular order.

Hello World

Let’s start by writing a trivial deriver. This deriver will simply output the type’s name & path in a submodule Info_<type_name>

open Base
open Ppxlib

(* Generate a module name Info_t from type [t] *)
let module_name_of_type t =
  let type_name = t.ptype_name.txt in
  { t.ptype_name with txt = "Info_" ^ type_name  }

let str_gen ~loc ~path (_rec, t) =
  (* All nodes created using this Ast module will use [loc] by default *)
  let (module Ast) = Ast_builder.make loc in
  (* we are silently dropping mutually recursive definitions to keep things
     brief *)
  let t = List.hd_exn t in
  let info_module =
    let expr =
      (* we are using this ppxlib function to generate a full name for the type
         that includes the type variable *)
      let name =
        core_type_of_type_declaration t
        |> string_of_core_type
      in
      Ast.pmod_structure (
        [%str
          let path = [%e Ast.estring path]
          let name = [%e Ast.estring name]
        ])
    in
    let name = module_name_of_type t in
    Ast.module_binding ~name ~expr
    |> Ast.pstr_module
  in
  [info_module]

Here we’re using ast_builder and metaquot to generate a Parsetree fragment for the Info_<type_name> module. I’ll explain code generation in more detail in a later section.

Similarly, we can automatically generate the signature for this generated submodule:

let sig_gen ~loc ~path:_ (_rec, t) =
  let (module Ast) = Ast_builder.make loc in
  (* we are silently dropping mutually recursive definitions to keep things
    brief *)
  let t = List.hd_exn t in
  let name = module_name_of_type t in
  let type_ =
    let sig_ =
      [%sig:
        val path : string
        val name : string
      ]
    in
    Ast.pmty_signature sig_
  in
  Ast.module_declaration ~name ~type_
  |> Ast.psig_module
  |> List.return

We turn these functions into generators and create the hello_world plugin:

let name = "hello_world"

let () =
  let str_type_decl = Deriving.Generator.make_noarg str_gen in
  let sig_type_decl = Deriving.Generator.make_noarg sig_gen in
  Deriving.add name ~str_type_decl ~sig_type_decl
  |> Deriving.ignore

Finally, here’s some example code that demonstrates the functionality:

type 'a my_typ =
  { foo : int
  ; bar : 'a
  } [@@deriving hello_world]

let () =
  let open Info_my_typ in
  Format.eprintf "path: %s@.name: %s@." path name

This generator is also used in the mli for the test.

To tie all of this together, a couple of dune files have to be written. They’re very typical so I will not include them here. But they will be available in the associated github repo. I’ll just remind that derivers should have (kind ppx_deriver) in their library stanza. Why is this necessary as opposed to ppx_rewriter? It’s so that dune knows to generate the appropriate META for ppx_deriving to load your plugins.

As a side note, I plan to discuss the current situation with ppx_deriving vs ppxlib in another post. The current situation where we have two libraries to accomplish basically the same thing isn’t ideal and we should think of a plan to converge.

Next Steps

To give an idea of how to write plugins, I’ve developed a few trivial plugins. While they aren’t useful, they’re easy to understand and hopefully contain instructive value. The plugins are:

  • tuple an automatic converter from records to tuples

  • poly a converter from normal to polymorphic variant constructors

  • is_constr a generator for predicates of the form:

let is_constr = function
  | Constr _ -> true
  | _ -> false

They are all available in the repository accompanying this blog post, and I will be referring to snippets from these plugins in the next sections.

Inspecting Type Declarations

To generate interesting code, one first has to extract relevant information from user written types. The parsetree is self contained in a single module and should be open as a reference when writing a plugin. In the end, most of the work is really just accessing the appropriate data in this data structure. I’ll give a quick tour of the types one commonly inspects to write a deriving plugin.

The bread and butter of a deriver usually consists of dealing with a type_declaration 2:

and type_declaration =
    {
     ptype_name: string loc;
     ptype_params: (core_type * variance) list;
           (* ('a1,...'an) t; None represents  _*)
     ptype_cstrs: (core_type * core_type * Location.t) list;
           (* ... constraint T1=T1'  ... constraint Tn=Tn' *)
     ptype_kind: type_kind;
     ptype_private: private_flag;   (* = private ... *)
     ptype_manifest: core_type option;  (* = T *)
     ptype_attributes: attributes;   (* ... [@@id1] [@@id2] *)
     ptype_loc: Location.t;
    }

Most of these fields are quite straight forward and well documented. The ptype_name contains the name and the location, using the ubiquitous loc:

type 'a loc = 'a Location.loc = {
  txt : 'a;
  loc : Location.t;
}

Now let me comment on the ptype_kind field. This is the field most derivers will inspect to dispatch their code generator.

The ptype_kind tells us what kind of type we’re dealing with:

and type_kind =
  | Ptype_abstract
  | Ptype_variant of constructor_declaration list
  | Ptype_record of label_declaration list (* Invariant: non-empty list *)
  | Ptype_open

I’ll cover the Ptype_variant and Ptype_record constructors here which predictably correpond to Records and Variants.

Records

Every record contains a list of label_declaration’s:

and label_declaration =
    {
     pld_name: string loc;
     pld_mutable: mutable_flag;
     pld_type: core_type;
     pld_loc: Location.t;
     pld_attributes: attributes; (* l : T [@id1] [@id2] *)
    }

The name and the type are the fields one commonly works with. A common operation is to turn field names into accessors. This is done by converting them to long identifiers:

let lident_of_field field =
  (* We are reusing the locations of the field declarations for the
     accesses. *)
  Ast_builder.Default.Located.lident
    ~loc:field.pld_name.loc
    field.pld_name.txt

Variants

Variants are just a list of constructor_declaration’s:

and constructor_declaration =
    {
     pcd_name: string loc;
     pcd_args: constructor_arguments;
     pcd_res: core_type option;
     pcd_loc: Location.t;
     pcd_attributes: attributes; (* C of ... [@id1] [@id2] *)
    }

and constructor_arguments =
  | Pcstr_tuple of core_type list
  | Pcstr_record of label_declaration list

(*
  | C of T1 * ... * Tn     (res = None,    args = Pcstr_tuple [])
  | C: T0                  (res = Some T0, args = [])
  | C: T1 * ... * Tn -> T0 (res = Some T0, args = Pcstr_tuple)
  | C of {...}             (res = None,    args = Pcstr_record)
  | C: {...} -> T0         (res = Some T0, args = Pcstr_record)
  | C of {...} as t        (res = None,    args = Pcstr_record)
*)

Where the argument to the constructor is:

  • An empty list - corresponds to simple enums

  • A list with least one element - corresponds to constructor arguments

  • An inline record - Present since 4.03

There’s also an optional return type annotation for GADT constructors. These are not covered in this post.

In the poly deriver, I loop over all the constructor arguments to later generate the pattern that captures all the arguments and later uses them to construct the polymorphic variant:

let (arg_pat, arg_expr) =
  match constructor.pcd_args with
  | Pcstr_record _ ->
    Location.raise_errorf ~loc:constructor.pcd_name.loc
      "inline records aren't supported"
  | Pcstr_tuple args ->
    List.mapi args ~f:(fun i _ ->
        let var = "x" ^ Int.to_string i in
        let pat = Ast.ppat_var { txt = var; loc } in
        let expr = Ast.evar var in
        (pat, expr))
    |> List.unzip
in

Generating Code

There are two main approaches for generating code:

  • Constructing parsetree nodes directly with ast_builder. This is very simple but also very manual. It requires one to constantly refer to the parsetree documentation to know which AST nodes are necessary for your code.

  • Constructing AST fragments with quotations and anti quotations. This is the most readable approach and should always be favored whenever it’s feasible.

Ast_builder

This module is quite simple, and requires minimal documentation apart from the parsetree itself. It essentially provides a function for every single AST node one may construct. That is mainly superior to constructing the values directly, because it breaks less often when the parsetree changes.

One also has two different ways of providing locations:

  • Via a labelled ~loc argument. This is the style of the Ast_builder.Default module.

  • By partially applying the location to all functions via the Ast_builder.make function. This function returns a first class module with where all relevant functions have the ~loc argument partially applied.

This module is automatically generated from the Parsetree and that makes it simple conceptually. However, it’s a real pain in the neck to work with because browsing the mli is no longer convenient. The most practical option is to browse the odoc generated docs.

Here’s an example of generating the code for converting records into tuples. We are aiming to generate code that looks like:

let tuple_t { x ; y } = (x, y)

So after we get an idea of what the parsetree should look like with $ ocamlc -dparsetree, we can generate the pattern on the left:

let record_pat =
  let fields = List.map fields ~f:(fun field ->
      let pattern = Ast.pvar field.pld_name.txt in
      let field_id = lident_of_field field in
      (field_id, pattern))
  in
  Ast.ppat_record fields Closed
in

The tuple expression on the right:

let tuple_expr =
  List.map fields ~f:(fun field ->
      Ast.pexp_ident (lident_of_field field))
  |> Ast.pexp_tuple
in

And glue them together in a function:

let fun_ =
  let f_name =
    let type_name = t.ptype_name.txt in
    "tuple_" ^ type_name
  in
  let pat = Ast.pvar f_name in
  let expr = Ast.pexp_fun Nolabel None record_pat tuple_expr in
  [Ast.value_binding ~pat ~expr]
  |> Ast.pstr_value Nonrecursive
in
[fun_]

metaquot

This is a much more usable way of generating code. Rather than constructing AST nodes directly, you write code fragments directly (quotations), and substitute dynamically constructed chunks into the fragments using anti-quotations. The catch is that there is a limited amount places you may insert anti quotations, and one must also remember the various forms for expressions, patterns, types, etc.

In practice, metaquot is most useful when you have a fairly static code template that you’d like to instantiate. Here’s an example from is_constr where metaquot is used to generate a function that will test for a particular constructor:

[%expr
  function
  | [%p pat] -> true
  | _        -> false
]

Some things to note:

The location comes from the loc found in the current lexical scope.

[%expr ..] is used to generate values of type expression. There are other quotations for patterns, types, signature, structures.

[%p ..] is an anti quotation used to insert a pattern generated from the constructor we are generating the code for. Here is the snippet used to generate the pattern we substitute:

let name =
  { constr.pcd_name with
    txt = Longident.Lident constr.pcd_name.txt
  }
in
let pat =
  match constr.pcd_args with
  | Pcstr_tuple [] -> []
  | Pcstr_tuple (_::_)
  | Pcstr_record _ -> [Ast.ppat_any]
in
Ast.ppat_construct name (Ast.ppat_tuple_opt pat)

We are either generating a $constr or a $constr _ pattern depending on whether the variant constructor has an argument.

Metaquot is also used simplify generating the signature for the is_constr functions. Since these always return a boolean, we can use a quotation for the type signature:

let type_ = [%type: [%t type_] -> bool] in

and type_ is straight forward to build using the name from the declaration:

let type_ =
  let name = { t.ptype_name with txt = Longident.Lident t.ptype_name.txt } in
  Ast.ptyp_constr name [] in

What else?

The subject of generating code is quite rich, and I haven’t covered some important topics:

  • Dealing with type generic types and GADT’s

  • Dealing with polymorphic variants, objects, classes, exceptions, or open types.

  • Allow code generators to be used without type definitions. These are usually called expanders.

  • Writing AST traversals using the OO API.

Once I have a better idea of what interest users, I’ll surely revisit some of these topics.

1

Unfortunately, it’s not possible to generate code from module signatures. This is one of the few features that are available in ppx_deriving.

2

I use the 4.08 version of the parsetree for illustration

Comments

comments powered by Disqus