Free Monads in the Wild - OCaml Edition¶

OCaml programmers don’t seem to resort to free monads much. I like to imagine that this is the case because we’re a practical bunch. But it could simply be that this technique like other monads is a bit heavyweight syntactically, let alone the performance implications it might have.

Regardless, the pattern does seem to occur from time to time. It’s worth pointing it out when it happens, to raise awareness for it. I know I still fail to see opportunities when it is useful. At least I managed to recognize this pattern when I encountered it in opam’s source code.

OPAM’s Job Module¶

OPAM has a convenient process management module called OpamProcess. The workhorse in it is the Job.Op sub module. Where it defines the following primitive for running jobs:

type 'a job =
  | Done of 'a
  | Run of command * (result -> 'a job)

where command, and result are both just mundane records that include details such as the actual command to run, arguments, the environment, cwd, status code returned, stdout, stderr, etc. OpamProcess even includes a val run : command -> result function that lets you go from one type to the other by just executing the command.

The module also contains a few handy combinators for manipulating these jobs:

val (@@>): command -> (result -> 'a job) -> 'a job
val (@@+): 'a job -> ('a -> 'b job) -> 'b job
val (@@|): 'a job -> ('a -> 'b) -> 'b job

OPAM’s source contains lengthy documentation about these functions. But an experienced reader will notice that this is just map, bind, and a smart constructor for Run.

As an example, let’s use this module to write a simple job that will recursively find all ml files in your cwd, and put them in mls.tar. A bit contrived, but bear with me.

open OpamProcess

let tar_mls =
  let open Job.Op in
  (* passing dir is required for stdout to be saved. This will create
     temp files that opam will clean on its own *)
  let command cmd args = command ~verbose:false ~dir:"." ~name:cmd cmd args in
  let all_mls =
    (command "find" ["." ; "-iname" ; "*.ml"])
    @@> (fun r -> Done r.r_stdout) in
  all_mls @@+ (fun mls ->
      (command "tar" (["-cf" ; "mls.tar"] @ mls))
      @@> (fun r -> Done r.r_code)
    )

Kind of ugly, although I blame that mostly on opam’s selection of combinator names. A more standard set of combinators, say >>= and >>| for bind and map respectively would make this code shed a lot of indentation and useless names. A few more seemingly absent functions are these staples from Control.Monad:

val sequence : 'a job list -> 'a list job
val mapM : ('a -> 'b job) -> 'a list -> 'b list job
val foldM : ('b -> 'a -> 'b job) -> 'b -> 'a list -> 'b job

The functions would improve usability quite a lot, but OpamProcess isn’t a part of any standard lib. So it’s understandable why it wants to just cater to its own internal use cases.

Anyhow, let’s run this (at your own risk!) to convince ourselves this does something. Note that running the job requires calling val run : 'a Op.Job -> 'a like this:

let () = ignore (Job.run tar_mls)

What’s the point of this ceremony? We could have just as easily written this directly by running commands and inspecting their output. What we’re gaining here is the separation of the computational content from the effects that it produces. Put another way, this is the essence of the interpreter pattern. We build up a pure computation, and use an interpreter (in this case run) to force the effects. One of the main benefits is the ability to implement different interpreters for the same computation.

Consider:

val dry_run : 'a Op.job -> 'a

This will execute our computation without actually running any commands but will instead pass a dummy result to subsequent commands.

The possibilities are endless here, you can have interpreters that log commands and results, inject environment variables, run every command with nice, etc. Testability is another strong feature of this technique. It should be easy to “mock” run every command and return some dummy output in anticipation of a particular result.

Generalizing to Free Monads¶

First, let me put up a disclaimer that I’m definitely the wrong person to teach the theory and fundamentals of free monads. I have neither the pedagogical skill nor the deep understanding required for this. But I do hope that this post will whet one’s appetite to look further into this subject.

Let’s quickly recall what a free monad is. A free monad is a monad construction out of a functor. Where functor refers to the haskell functor type class or the following module signature - not be confused with a parameterized module. I will try to clarify this unfortunate overlap of terms throughout this post.

module type Functor = sig
  type 'a t
  val map : ('a -> 'b) -> 'a t -> 'b t
end

to construct a free monad out of a functor, we can use an OCaml functor. A free monad is of course a monad so we’ll define the appropriate bind (>>=) and return operations. Although I will construct a val join : 'a t t -> 'a t and a val map : ('a -> 'b) -> 'a t -> 'b t instead since in my opinion, the definition is simpler.

module Free (F : Functor) = struct
  type 'a t =
    | Return of 'a
    | Free of ('a t) F.t

  let return a = Return a

  let rec map f = function
    | Return a -> Return (f a)
    | Free a -> Free (F.map (map f) a)

  let rec join = function
    | Return a -> a
    | Free f -> Free (F.map join f)

  let lift f = Free (F.map return f)

  (* The following definition is standard for any monad: *)
  let (>>=) t f = join (map t f)
  (* Infix map in the style of core *)
  let (>>|) t f = map f t
end

Here’s the associated functor (the type class or module signature) instance:

module JobF = struct
  type 'a t = Run of command * (result -> 'a)
  let map f (Run (c, k)) = Run (c, fun res -> f (k res))
end

You can convince yourself with simple substitution that Free(JobF).t is equivalent to 'a Job.Op.t. We’ll write this OCaml functor application as:

module J = struct
  include Free(JobF)
  let run c = lift (JobF.Run (c, fun i -> i))
end

The module is called J to avoid conflicts with Job and the run constructor will be very useful for us.

let _ =
  let tar_mls =
    let open J in
    run (command "find" ["." ; "-iname" ; "*.ml"])
    >>| (fun r -> r.r_stdout) >>= fun mls ->
    run (command "tar" (["-cf" ; "mls.tar"] @ mls))
  in
  Lwt_main.run (run_lwt tar_mls)

For completeness, let’s write our own interpreter for JobF. This is mostly just a demonstration to show how it’s done. But we’ll also make our own interpreter execute the commands with Lwt. Decoupling your programs from a particular implementation is a worthy goal in OCaml. So it’s worth to advertise as many techniques as possible for accomplishing this.

Here’s my hastily written implementation. (An Async implementation is an exercise for the reader)

let rec run_lwt : 'a J.t -> 'a Lwt.t = function
  | J.Return a -> Lwt.return a
  | J.Free (JobF.Run (cmd, k)) ->
    let (cmd, args) =
      (* Gross but necessary because command is abstract *)
      match Str.split (Str.regexp "[ \t]+") (string_of_command cmd) with
      | c::args -> (c, Array.of_list (c::args))
      | _ -> failwith "invalid command" in
    let p = Lwt_process.open_process_full (cmd, args) in
    let code_signal =
      p # status >|= function
      | Unix.WEXITED c -> (c, None)
      | Unix.WSTOPPED c
      | Unix.WSIGNALED c -> (256, Some c) in
    let read_stream s = Lwt_stream.to_list (Lwt_io.read_lines s) in
    let stdout = read_stream (p # stdout) in
    let stderr = read_stream (p # stderr) in
    let result =
      stdout >>= fun r_stdout ->
      stderr >>= fun r_stderr ->
      code_signal >|= fun (r_code, r_signal) ->
      { r_code
      ; r_signal
      ; r_duration = 0.0 (* TODO for the reader *)
      ; r_info = []
      ; r_stdout
      ; r_stderr
      ; r_cleanup = [] } in
    Lwt_io.close p # stdin >>= fun () ->
    Lwt.map k result >>= run_lwt

This should be enough to show one of the advantages of free monads. The ability to write your program in a way that separates effects from your computation gives us the opportunity to reinterpret these effects in different ways. This isn’t the only benefit of using free monads, and the example cited here is one of the simplest functors you can imagine, but it’s still nice to see how even this is useful in the context of real software.