.. post:: 2019-08-20 :author: Rudi Grinberg :tags: OCaml, Ppx, ppxlib, deriving Deriving Slowly =============== There's been some recent grumbling about the usability of ppx in OCaml, and instead of just letting it slide again, I've decided to do something constructive about this and write a little tutorial on how to write a ``deriving`` plugin. Why ``deriving``? Well, some fairly experienced devs have expressed the difficulty in writing such a plugin. That is particularly surprising to me since I think it's one of the simpler API's in ppxlib_. I do not think that ppx programming is simple, but writing deriving plugins does not introduce much new complexity. I assume the reader is comfortable with the material in this `blog post`_ by Nathan Rebours. There's quite a lot of material that one can cover, so I will skip or skim through material that is already explained in that post. Theory ------ The `Ppxlib.Deriving`_ is the entry point for writing your own derivers. There's a single function ``add`` that allows one to `register` a deriver for types, exceptions, and open types. [#deriving_module]_ .. code-block:: ocaml val add : ?str_type_decl:(structure, rec_flag * type_declaration list) Generator.t -> ?str_type_ext :(structure, type_extension ) Generator.t -> ?str_exception:(structure, type_exception ) Generator.t -> ?sig_type_decl:(signature, rec_flag * type_declaration list) Generator.t -> ?sig_type_ext :(signature, type_extension ) Generator.t -> ?sig_exception:(signature, type_exception ) Generator.t -> ?extension:(loc:Location.t -> path:string -> core_type -> expression) -> string -> t Notice the ``str`` & ``sig`` symmetry present in the arguments. The former is for generating the code, while the latter is for the type signatures. Our problem now transforms into creating an appropriate ``Generator.t``. There are two ways to create this value: * `Deriving.Generator.make`_: which accepts arguments and flags in the application of the deriver: e.g. ``[@@deriving ~foo { bar }]``. * `Deriving.Generator.make_noarg`_: which does not accept any flags or arguments. I'll only cover the latter in this tutorial, but the adding arguments is simple as well. Here's the signature for ``make_noarg``: .. code-block:: ocaml val make_noarg : ?attributes:Attribute.packed list -> ?deps:deriver list -> (loc:Location.t -> path:string -> 'input_ast -> 'output_ast) -> ('output_ast, 'input_ast) t The main argument to focus on is the callback. This is the function that will generate parsetree fragments based on the type declarations or signatures. The callback will also get location information ``loc`` along with the file ``path`` of the source. The other argument that is worth noting ``?deps``. This is useful when you'd like your code generators to run in a particular order. Hello World ----------- Let's start by writing a trivial deriver. This deriver will simply output the type's name & path in a submodule ``Info_`` .. code-block:: ocaml open Base open Ppxlib (* Generate a module name Info_t from type [t] *) let module_name_of_type t = let type_name = t.ptype_name.txt in { t.ptype_name with txt = "Info_" ^ type_name } let str_gen ~loc ~path (_rec, t) = (* All nodes created using this Ast module will use [loc] by default *) let (module Ast) = Ast_builder.make loc in (* we are silently dropping mutually recursive definitions to keep things brief *) let t = List.hd_exn t in let info_module = let expr = (* we are using this ppxlib function to generate a full name for the type that includes the type variable *) let name = core_type_of_type_declaration t |> string_of_core_type in Ast.pmod_structure ( [%str let path = [%e Ast.estring path] let name = [%e Ast.estring name] ]) in let name = module_name_of_type t in Ast.module_binding ~name ~expr |> Ast.pstr_module in [info_module] Here we're using `ast_builder`_ and `metaquot`_ to generate a Parsetree fragment for the ``Info_`` module. I'll explain code generation in more detail in a later section. Similarly, we can automatically generate the signature for this generated submodule: .. code-block:: ocaml let sig_gen ~loc ~path:_ (_rec, t) = let (module Ast) = Ast_builder.make loc in (* we are silently dropping mutually recursive definitions to keep things brief *) let t = List.hd_exn t in let name = module_name_of_type t in let type_ = let sig_ = [%sig: val path : string val name : string ] in Ast.pmty_signature sig_ in Ast.module_declaration ~name ~type_ |> Ast.psig_module |> List.return We turn these functions into generators and create the ``hello_world`` plugin: .. code-block:: ocaml let name = "hello_world" let () = let str_type_decl = Deriving.Generator.make_noarg str_gen in let sig_type_decl = Deriving.Generator.make_noarg sig_gen in Deriving.add name ~str_type_decl ~sig_type_decl |> Deriving.ignore Finally, here's some example code that demonstrates the functionality: .. code-block:: ocaml type 'a my_typ = { foo : int ; bar : 'a } [@@deriving hello_world] let () = let open Info_my_typ in Format.eprintf "path: %s@.name: %s@." path name This generator is also used in the mli_ for the test. To tie all of this together, a couple of dune files have to be written. They're very typical so I will not include them here. But they will be available in the associated `github repo`_. I'll just remind that derivers should have ``(kind ppx_deriver)`` in their ``library`` stanza. Why is this necessary as opposed to ``ppx_rewriter``? It's so that dune knows to generate the appropriate META for ppx_deriving_ to load your plugins. As a side note, I plan to discuss the current situation with ppx_deriving_ vs ppxlib_ in another post. The current situation where we have two libraries to accomplish basically the same thing isn't ideal and we should think of a plan to converge. Next Steps ---------- To give an idea of how to write plugins, I've developed a few trivial plugins. While they aren't useful, they're easy to understand and hopefully contain instructive value. The plugins are: * `tuple`_ an automatic converter from records to tuples * `poly`_ a converter from normal to polymorphic variant constructors * `is_constr`_ a generator for predicates of the form: .. code-block:: ocaml let is_constr = function | Constr _ -> true | _ -> false They are all available in the repository accompanying this blog post, and I will be referring to snippets from these plugins in the next sections. Inspecting Type Declarations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To generate interesting code, one first has to extract relevant information from user written types. The parsetree_ is self contained in a single module and should be open as a reference when writing a plugin. In the end, most of the work is really just accessing the appropriate data in this data structure. I'll give a quick tour of the types one commonly inspects to write a deriving plugin. The bread and butter of a deriver usually consists of dealing with a type_declaration_ [#408]_: .. code-block:: ocaml and type_declaration = { ptype_name: string loc; ptype_params: (core_type * variance) list; (* ('a1,...'an) t; None represents _*) ptype_cstrs: (core_type * core_type * Location.t) list; (* ... constraint T1=T1' ... constraint Tn=Tn' *) ptype_kind: type_kind; ptype_private: private_flag; (* = private ... *) ptype_manifest: core_type option; (* = T *) ptype_attributes: attributes; (* ... [@@id1] [@@id2] *) ptype_loc: Location.t; } Most of these fields are quite straight forward and well documented. The ``ptype_name`` contains the name and the location, using the ubiquitous loc_: .. code-block:: ocaml type 'a loc = 'a Location.loc = { txt : 'a; loc : Location.t; } Now let me comment on the ``ptype_kind`` field. This is the field most derivers will inspect to dispatch their code generator. The ``ptype_kind`` tells us what kind of type we're dealing with: .. code-block:: ocaml and type_kind = | Ptype_abstract | Ptype_variant of constructor_declaration list | Ptype_record of label_declaration list (* Invariant: non-empty list *) | Ptype_open I'll cover the ``Ptype_variant`` and ``Ptype_record`` constructors here which predictably correpond to Records and Variants. Records ^^^^^^^ Every record contains a list of label_declaration_'s: .. code-block:: ocaml and label_declaration = { pld_name: string loc; pld_mutable: mutable_flag; pld_type: core_type; pld_loc: Location.t; pld_attributes: attributes; (* l : T [@id1] [@id2] *) } The name and the type are the fields one commonly works with. A common operation is to turn field names into accessors. This is done by converting them to long identifiers: .. code-block:: ocaml let lident_of_field field = (* We are reusing the locations of the field declarations for the accesses. *) Ast_builder.Default.Located.lident ~loc:field.pld_name.loc field.pld_name.txt Variants ^^^^^^^^ Variants are just a list of constructor_declaration_'s: .. code-block:: ocaml and constructor_declaration = { pcd_name: string loc; pcd_args: constructor_arguments; pcd_res: core_type option; pcd_loc: Location.t; pcd_attributes: attributes; (* C of ... [@id1] [@id2] *) } and constructor_arguments = | Pcstr_tuple of core_type list | Pcstr_record of label_declaration list (* | C of T1 * ... * Tn (res = None, args = Pcstr_tuple []) | C: T0 (res = Some T0, args = []) | C: T1 * ... * Tn -> T0 (res = Some T0, args = Pcstr_tuple) | C of {...} (res = None, args = Pcstr_record) | C: {...} -> T0 (res = Some T0, args = Pcstr_record) | C of {...} as t (res = None, args = Pcstr_record) *) Where the argument to the constructor is: * An empty list - corresponds to simple enums * A list with least one element - corresponds to constructor arguments * An inline record - Present since 4.03 There's also an optional return type annotation for GADT constructors. These are not covered in this post. In the poly_ deriver, I loop over all the constructor arguments to later generate the pattern that captures all the arguments and later uses them to construct the polymorphic variant: .. code-block:: ocaml let (arg_pat, arg_expr) = match constructor.pcd_args with | Pcstr_record _ -> Location.raise_errorf ~loc:constructor.pcd_name.loc "inline records aren't supported" | Pcstr_tuple args -> List.mapi args ~f:(fun i _ -> let var = "x" ^ Int.to_string i in let pat = Ast.ppat_var { txt = var; loc } in let expr = Ast.evar var in (pat, expr)) |> List.unzip in Generating Code ~~~~~~~~~~~~~~~ There are two main approaches for generating code: * Constructing parsetree nodes directly with ast_builder_. This is very simple but also very manual. It requires one to constantly refer to the parsetree documentation to know which AST nodes are necessary for your code. * Constructing AST fragments with quotations and anti quotations. This is the most readable approach and should always be favored whenever it's feasible. Ast_builder ^^^^^^^^^^^ This module is quite simple, and requires minimal documentation apart from the parsetree itself. It essentially provides a function for every single AST node one may construct. That is mainly superior to constructing the values directly, because it breaks less often when the parsetree changes. One also has two different ways of providing locations: * Via a labelled ``~loc`` argument. This is the style of the `Ast_builder.Default`_ module. * By partially applying the location to all functions via the `Ast_builder.make`_ function. This function returns a first class module with where all relevant functions have the ``~loc`` argument partially applied. This module is automatically generated from the Parsetree and that makes it simple conceptually. However, it's a real pain in the neck to work with because browsing the mli is no longer convenient. The most practical option is to browse the odoc generated docs. Here's an example of generating the code for converting records into tuples. We are aiming to generate code that looks like: .. code-block:: ocaml let tuple_t { x ; y } = (x, y) So after we get an idea of what the parsetree should look like with ``$ ocamlc -dparsetree``, we can generate the pattern on the left: .. code-block:: ocaml let record_pat = let fields = List.map fields ~f:(fun field -> let pattern = Ast.pvar field.pld_name.txt in let field_id = lident_of_field field in (field_id, pattern)) in Ast.ppat_record fields Closed in The tuple expression on the right: .. code-block:: ocaml let tuple_expr = List.map fields ~f:(fun field -> Ast.pexp_ident (lident_of_field field)) |> Ast.pexp_tuple in And glue them together in a function: .. code-block:: ocaml let fun_ = let f_name = let type_name = t.ptype_name.txt in "tuple_" ^ type_name in let pat = Ast.pvar f_name in let expr = Ast.pexp_fun Nolabel None record_pat tuple_expr in [Ast.value_binding ~pat ~expr] |> Ast.pstr_value Nonrecursive in [fun_] metaquot ^^^^^^^^ This is a much more usable way of generating code. Rather than constructing AST nodes directly, you write code fragments directly (quotations), and substitute dynamically constructed chunks into the fragments using anti-quotations. The catch is that there is a limited amount places you may insert anti quotations, and one must also remember the various forms for expressions, patterns, types, etc. In practice, metaquot is most useful when you have a fairly static code template that you'd like to instantiate. Here's an example from is_constr_ where metaquot is used to generate a function that will test for a particular constructor: .. code-block:: ocaml [%expr function | [%p pat] -> true | _ -> false ] Some things to note: The location comes from the ``loc`` found in the current lexical scope. ``[%expr ..]`` is used to generate values of type ``expression``. There are other quotations for patterns, types, signature, structures. ``[%p ..]`` is an anti quotation used to insert a pattern generated from the constructor we are generating the code for. Here is the snippet used to generate the pattern we substitute: .. code-block:: ocaml let name = { constr.pcd_name with txt = Longident.Lident constr.pcd_name.txt } in let pat = match constr.pcd_args with | Pcstr_tuple [] -> [] | Pcstr_tuple (_::_) | Pcstr_record _ -> [Ast.ppat_any] in Ast.ppat_construct name (Ast.ppat_tuple_opt pat) We are either generating a ``$constr`` or a ``$constr _`` pattern depending on whether the variant constructor has an argument. Metaquot is also used simplify generating the signature for the ``is_constr`` functions. Since these always return a boolean, we can use a quotation for the type signature: .. code-block:: ocaml let type_ = [%type: [%t type_] -> bool] in and ``type_`` is straight forward to build using the name from the declaration: .. code-block:: ocaml let type_ = let name = { t.ptype_name with txt = Longident.Lident t.ptype_name.txt } in Ast.ptyp_constr name [] in What else? ---------- The subject of generating code is quite rich, and I haven't covered some important topics: * Dealing with type generic types and GADT's * Dealing with polymorphic variants, objects, classes, exceptions, or open types. * Allow code generators to be used without type definitions. These are usually called expanders. * Writing AST traversals using the OO API. Once I have a better idea of what interest users, I'll surely revisit some of these topics. .. _ppx_deriving: https://github.com/ocaml-ppx/ppx-deriving .. _ppxlib: https://github.com/ocaml-ppx/ppxlib .. _blog post: https://tarides.com/blog/2019-05-09-an-introduction-to-ocaml-ppx-ecosystem.html .. _github repo: https://github.com/rgrinberg/deriving-slowly .. _type_declaration: https://github.com/ocaml/ocaml/blob/4.08.1/parsing/parsetree.mli#L423-L435 .. _is_constr: https://github.com/rgrinberg/blob/master/deriving-slowly/deriving-is-constr/src/deriving_is_constr.ml .. _poly: https://github.com/rgrinberg/blob/master/deriving-slowly/deriving-poly/src/deriving_poly.ml .. _tuple: https://github.com/rgrinberg/blob/master/deriving-slowly/deriving-tuple/src/deriving_tuple.ml .. _loc: https://github.com/ocaml/ocaml/blob/4.08.1/parsing/asttypes.mli#L54-L57 .. _ast_builder: https://github.com/ocaml-ppx/ppxlib/blob/0.8.1/src/ast_builder.mli .. _Ast_builder.make: https://github.com/ocaml-ppx/ppxlib/blob/0.8.1/src/ast_builder.mli#L97 .. _Ast_builder.Default: https://github.com/ocaml-ppx/ppxlib/blob/0.8.1/src/ast_builder.mli#L79-L88 .. _metaquot: https://github.com/ocaml-ppx/ppxlib/blob/0.8.1/metaquot/ppxlib_metaquot.ml .. _label_declaration: https://github.com/ocaml/ocaml/blob/4.08.1/parsing/parsetree.mli#L454-L461 .. _constructor_declaration: https://github.com/ocaml/ocaml/blob/4.08.1/parsing/parsetree.mli#L469-L476 .. _parsetree: https://github.com/ocaml/ocaml/blob/4.08.1/parsing/parsetree.mli .. _Ppxlib.Deriving: https://github.com/ocaml-ppx/ppxlib/blob/0.8.1/src/deriving.mli .. _Deriving.Generator.make: https://github.com/ocaml-ppx/ppxlib/blob/0.8.1/src/deriving.mli#L47-L52 .. _Deriving.Generator.make_noarg: https://github.com/ocaml-ppx/ppxlib/blob/0.8.1/src/deriving.mli#L54-L58 .. _mli: https://github.com/rgrinberg/deriving-slowly/blob/master/deriving-hello/test/hello_world_test.mli .. [#deriving_module] Unfortunately, it's not possible to generate code from module signatures. This is one of the few features that are available in ppx_deriving_. .. [#408] I use the 4.08 version of the parsetree for illustration