.. post:: 2016-07-22 :tags: OCaml, Monads, OPAM, Free Monads :author: Rudi Grinberg Free Monads in the Wild - OCaml Edition ======================================== OCaml programmers don't seem to resort to free monads much. I like to imagine that this is the case because we're a practical bunch. But it could simply be that this technique like other monads is a bit heavyweight syntactically, let alone the performance implications it might have. Regardless, the pattern does seem to occur from time to time. It's worth pointing it out when it happens, to raise awareness for it. I know I still fail to see opportunities when it is useful. At least I managed to recognize this pattern when I encountered it in opam's source code. OPAM's Job Module ----------------- OPAM has a convenient process management module called ``OpamProcess``. The workhorse in it is the ``Job.Op`` sub module. Where it defines the following primitive for running jobs: .. code-block:: ocaml type 'a job = | Done of 'a | Run of command * (result -> 'a job) where ``command``, and ``result`` are both just mundane records that include details such as the actual command to run, arguments, the environment, cwd, status code returned, stdout, stderr, etc. ``OpamProcess`` even includes a ``val run : command -> result`` function that lets you go from one type to the other by just executing the command. The module also contains a few handy combinators for manipulating these jobs: .. code-block:: ocaml val (@@>): command -> (result -> 'a job) -> 'a job val (@@+): 'a job -> ('a -> 'b job) -> 'b job val (@@|): 'a job -> ('a -> 'b) -> 'b job OPAM's source contains lengthy documentation about these functions. But an experienced reader will notice that this is just ``map``, ``bind``, and a smart constructor for ``Run``. As an example, let's use this module to write a simple job that will recursively find all ml files in your cwd, and put them in ``mls.tar``. A bit contrived, but bear with me. .. code-block:: ocaml open OpamProcess let tar_mls = let open Job.Op in (* passing dir is required for stdout to be saved. This will create temp files that opam will clean on its own *) let command cmd args = command ~verbose:false ~dir:"." ~name:cmd cmd args in let all_mls = (command "find" ["." ; "-iname" ; "*.ml"]) @@> (fun r -> Done r.r_stdout) in all_mls @@+ (fun mls -> (command "tar" (["-cf" ; "mls.tar"] @ mls)) @@> (fun r -> Done r.r_code) ) Kind of ugly, although I blame that mostly on opam's selection of combinator names. A more standard set of combinators, say ``>>=`` and ``>>|`` for ``bind`` and ``map`` respectively would make this code shed a lot of indentation and useless names. A few more seemingly absent functions are these staples from ``Control.Monad``: .. code-block:: ocaml val sequence : 'a job list -> 'a list job val mapM : ('a -> 'b job) -> 'a list -> 'b list job val foldM : ('b -> 'a -> 'b job) -> 'b -> 'a list -> 'b job The functions would improve usability quite a lot, but ``OpamProcess`` isn't a part of any standard lib. So it's understandable why it wants to just cater to its own internal use cases. Anyhow, let's run this (at your own risk!) to convince ourselves this does something. Note that running the job requires calling ``val run : 'a Op.Job -> 'a`` like this: .. code-block:: ocaml let () = ignore (Job.run tar_mls) What's the point of this ceremony? We could have just as easily written this directly by running commands and inspecting their output. What we're gaining here is the separation of the computational content from the effects that it produces. Put another way, this is the essence of the interpreter pattern. We build up a pure computation, and use an interpreter (in this case ``run``) to force the effects. One of the main benefits is the ability to implement different interpreters for the same computation. Consider: .. code-block:: ocaml val dry_run : 'a Op.job -> 'a This will execute our computation without actually running any commands but will instead pass a dummy result to subsequent commands. The possibilities are endless here, you can have interpreters that log commands and results, inject environment variables, run every command with ``nice``, etc. Testability is another strong feature of this technique. It should be easy to "mock" run every command and return some dummy output in anticipation of a particular result. Generalizing to Free Monads --------------------------- First, let me put up a disclaimer that I'm definitely the wrong person to teach the theory and fundamentals of free monads. I have neither the pedagogical skill nor the deep understanding required for this. But I do hope that this post will whet one's appetite to look further into this subject. Let's quickly recall what a free monad is. A free monad is a monad construction out of a functor. Where functor refers to the `haskell` functor type class or the following module signature - not be confused with a parameterized module. I will try to clarify this unfortunate overlap of terms throughout this post. .. code-block:: ocaml module type Functor = sig type 'a t val map : ('a -> 'b) -> 'a t -> 'b t end to construct a free monad out of a functor, we can use an `OCaml` functor. A free monad is of course a monad so we'll define the appropriate ``bind`` (``>>=``) and ``return`` operations. Although I will construct a ``val join : 'a t t -> 'a t`` and a ``val map : ('a -> 'b) -> 'a t -> 'b t`` instead since in my opinion, the definition is simpler. .. code-block:: ocaml module Free (F : Functor) = struct type 'a t = | Return of 'a | Free of ('a t) F.t let return a = Return a let rec map f = function | Return a -> Return (f a) | Free a -> Free (F.map (map f) a) let rec join = function | Return a -> a | Free f -> Free (F.map join f) let lift f = Free (F.map return f) (* The following definition is standard for any monad: *) let (>>=) t f = join (map t f) (* Infix map in the style of core *) let (>>|) t f = map f t end Here's the associated functor (the type class or module signature) instance: .. code-block:: ocaml module JobF = struct type 'a t = Run of command * (result -> 'a) let map f (Run (c, k)) = Run (c, fun res -> f (k res)) end You can convince yourself with simple substitution that ``Free(JobF).t`` is equivalent to ``'a Job.Op.t``. We'll write this `OCaml` functor application as: .. code-block:: ocaml module J = struct include Free(JobF) let run c = lift (JobF.Run (c, fun i -> i)) end The module is called ``J`` to avoid conflicts with ``Job`` and the ``run`` constructor will be very useful for us. .. code-block:: ocaml let _ = let tar_mls = let open J in run (command "find" ["." ; "-iname" ; "*.ml"]) >>| (fun r -> r.r_stdout) >>= fun mls -> run (command "tar" (["-cf" ; "mls.tar"] @ mls)) in Lwt_main.run (run_lwt tar_mls) For completeness, let's write our own interpreter for ``JobF``. This is mostly just a demonstration to show how it's done. But we'll also make our own interpreter execute the commands with Lwt. Decoupling your programs from a particular implementation is a worthy goal in OCaml. So it's worth to advertise as many techniques as possible for accomplishing this. Here's my hastily written implementation. (An Async implementation is an exercise for the reader) .. code-block:: ocaml let rec run_lwt : 'a J.t -> 'a Lwt.t = function | J.Return a -> Lwt.return a | J.Free (JobF.Run (cmd, k)) -> let (cmd, args) = (* Gross but necessary because command is abstract *) match Str.split (Str.regexp "[ \t]+") (string_of_command cmd) with | c::args -> (c, Array.of_list (c::args)) | _ -> failwith "invalid command" in let p = Lwt_process.open_process_full (cmd, args) in let code_signal = p # status >|= function | Unix.WEXITED c -> (c, None) | Unix.WSTOPPED c | Unix.WSIGNALED c -> (256, Some c) in let read_stream s = Lwt_stream.to_list (Lwt_io.read_lines s) in let stdout = read_stream (p # stdout) in let stderr = read_stream (p # stderr) in let result = stdout >>= fun r_stdout -> stderr >>= fun r_stderr -> code_signal >|= fun (r_code, r_signal) -> { r_code ; r_signal ; r_duration = 0.0 (* TODO for the reader *) ; r_info = [] ; r_stdout ; r_stderr ; r_cleanup = [] } in Lwt_io.close p # stdin >>= fun () -> Lwt.map k result >>= run_lwt This should be enough to show one of the advantages of free monads. The ability to write your program in a way that separates effects from your computation gives us the opportunity to reinterpret these effects in different ways. This isn't the only benefit of using free monads, and the example cited here is one of the simplest functors you can imagine, but it's still nice to see how even this is useful in the context of real software. Where to Go from Here? ---------------------- Gabriel Gonzales' posts about Free Monads are the best introduction to the subject: http://www.haskellforall.com/2012/07/purify-code-using-free-monads.html http://www.haskellforall.com/2012/06/you-could-have-invented-free-monads.html John A De Goe's post is an excellent resource to learn how to use free monads in practice: http://degoes.net/articles/modern-fp ocaml-free from NicolasT is a small but pretty complete library for Free monads. It also includes support for Free Applicatives a bonus. https://github.com/NicolasT/ocaml-free