.. post:: 2016-07-22
   :tags: OCaml, Monads, OPAM, Free Monads
   :author: Rudi Grinberg

Free Monads in the Wild - OCaml Edition
========================================

OCaml programmers don't seem to resort to free monads much. I like to imagine
that this is the case because we're a practical bunch. But it could simply be
that this technique like other monads is a bit heavyweight syntactically, let
alone the performance implications it might have.

Regardless, the pattern does seem to occur from time to time. It's worth
pointing it out when it happens, to raise awareness for it. I know I still fail
to see opportunities when it is useful. At least I managed to recognize this
pattern when I encountered it in opam's source code.

OPAM's Job Module
-----------------

OPAM has a convenient process management module called ``OpamProcess``. The
workhorse in it is the ``Job.Op`` sub module. Where it defines the following
primitive for running jobs:

.. code-block:: ocaml

   type 'a job =
     | Done of 'a
     | Run of command * (result -> 'a job)

where ``command``, and ``result`` are both just mundane records that include
details such as the actual command to run, arguments, the environment, cwd,
status code returned, stdout, stderr, etc. ``OpamProcess`` even includes a ``val
run : command -> result`` function that lets you go from one type to the other
by just executing the command.

The module also contains a few handy combinators for manipulating these jobs:

.. code-block:: ocaml

   val (@@>): command -> (result -> 'a job) -> 'a job
   val (@@+): 'a job -> ('a -> 'b job) -> 'b job
   val (@@|): 'a job -> ('a -> 'b) -> 'b job

OPAM's source contains lengthy documentation about these functions. But an
experienced reader will notice that this is just ``map``, ``bind``, and a smart
constructor for ``Run``.

As an example, let's use this module to write a simple job that will recursively
find all ml files in your cwd, and put them in ``mls.tar``. A bit contrived, but
bear with me.

.. code-block:: ocaml

   open OpamProcess

   let tar_mls =
     let open Job.Op in
     (* passing dir is required for stdout to be saved. This will create
        temp files that opam will clean on its own *)
     let command cmd args = command ~verbose:false ~dir:"." ~name:cmd cmd args in
     let all_mls =
       (command "find" ["." ; "-iname" ; "*.ml"])
       @@> (fun r -> Done r.r_stdout) in
     all_mls @@+ (fun mls ->
         (command "tar" (["-cf" ; "mls.tar"] @ mls))
         @@> (fun r -> Done r.r_code)
       )

Kind of ugly, although I blame that mostly on opam's selection of combinator
names. A more standard set of combinators, say ``>>=`` and ``>>|`` for ``bind``
and ``map`` respectively would make this code shed a lot of indentation and
useless names. A few more seemingly absent functions are these staples from
``Control.Monad``:

.. code-block:: ocaml

   val sequence : 'a job list -> 'a list job
   val mapM : ('a -> 'b job) -> 'a list -> 'b list job
   val foldM : ('b -> 'a -> 'b job) -> 'b -> 'a list -> 'b job

The functions would improve usability quite a lot, but ``OpamProcess`` isn't a
part of any standard lib. So it's understandable why it wants to just cater to
its own internal use cases.

Anyhow, let's run this (at your own risk!) to convince ourselves this does
something. Note that running the job requires calling ``val run : 'a Op.Job ->
'a`` like this:

.. code-block:: ocaml

   let () = ignore (Job.run tar_mls)

What's the point of this ceremony? We could have just as easily written this
directly by running commands and inspecting their output. What we're gaining
here is the separation of the computational content from the effects that it
produces. Put another way, this is the essence of the interpreter pattern. We
build up a pure computation, and use an interpreter (in this case ``run``) to
force the effects. One of the main benefits is the ability to implement
different interpreters for the same computation.

Consider:

.. code-block:: ocaml

   val dry_run : 'a Op.job -> 'a

This will execute our computation without actually running any commands but will
instead pass a dummy result to subsequent commands.

The possibilities are endless here, you can have interpreters that log commands
and results, inject environment variables, run every command with ``nice``, etc.
Testability is another strong feature of this technique. It should be easy to
"mock" run every command and return some dummy output in anticipation of a
particular result.

Generalizing to Free Monads
---------------------------

First, let me put up a disclaimer that I'm definitely the wrong person to teach
the theory and fundamentals of free monads. I have neither the pedagogical skill
nor the deep understanding required for this. But I do hope that this post will
whet one's appetite to look further into this subject.

Let's quickly recall what a free monad is. A free monad is a monad construction
out of a functor. Where functor refers to the `haskell` functor type class or
the following module signature - not be confused with a parameterized module. I
will try to clarify this unfortunate overlap of terms throughout this post.

.. code-block:: ocaml

   module type Functor = sig
     type 'a t
     val map : ('a -> 'b) -> 'a t -> 'b t
   end

to construct a free monad out of a functor, we can use an `OCaml` functor. A
free monad is of course a monad so we'll define the appropriate ``bind``
(``>>=``) and ``return`` operations. Although I will construct a ``val join : 'a
t t -> 'a t`` and a ``val map : ('a -> 'b) -> 'a t -> 'b t`` instead since in my
opinion, the definition is simpler.

.. code-block:: ocaml

   module Free (F : Functor) = struct
     type 'a t =
       | Return of 'a
       | Free of ('a t) F.t

     let return a = Return a

     let rec map f = function
       | Return a -> Return (f a)
       | Free a -> Free (F.map (map f) a)

     let rec join = function
       | Return a -> a
       | Free f -> Free (F.map join f)

     let lift f = Free (F.map return f)

     (* The following definition is standard for any monad: *)
     let (>>=) t f = join (map t f)
     (* Infix map in the style of core *)
     let (>>|) t f = map f t
   end

Here's the associated functor (the type class or module signature) instance:

.. code-block:: ocaml

   module JobF = struct
     type 'a t = Run of command * (result -> 'a)
     let map f (Run (c, k)) = Run (c, fun res -> f (k res))
   end

You can convince yourself with simple substitution that ``Free(JobF).t`` is
equivalent to ``'a Job.Op.t``. We'll write this `OCaml` functor application as:

.. code-block:: ocaml

   module J = struct
     include Free(JobF)
     let run c = lift (JobF.Run (c, fun i -> i))
   end

The module is called ``J`` to avoid conflicts with ``Job`` and the ``run``
constructor will be very useful for us.


.. code-block:: ocaml

   let _ =
     let tar_mls =
       let open J in
       run (command "find" ["." ; "-iname" ; "*.ml"])
       >>| (fun r -> r.r_stdout) >>= fun mls ->
       run (command "tar" (["-cf" ; "mls.tar"] @ mls))
     in
     Lwt_main.run (run_lwt tar_mls)

For completeness, let's write our own interpreter for ``JobF``. This is mostly
just a demonstration to show how it's done. But we'll also make our own
interpreter execute the commands with Lwt. Decoupling your programs from a
particular implementation is a worthy goal in OCaml. So it's worth to advertise
as many techniques as possible for accomplishing this.

Here's my hastily written implementation. (An Async implementation is an
exercise for the reader)

.. code-block:: ocaml

   let rec run_lwt : 'a J.t -> 'a Lwt.t = function
     | J.Return a -> Lwt.return a
     | J.Free (JobF.Run (cmd, k)) ->
       let (cmd, args) =
         (* Gross but necessary because command is abstract *)
         match Str.split (Str.regexp "[ \t]+") (string_of_command cmd) with
         | c::args -> (c, Array.of_list (c::args))
         | _ -> failwith "invalid command" in
       let p = Lwt_process.open_process_full (cmd, args) in
       let code_signal =
         p # status >|= function
         | Unix.WEXITED c -> (c, None)
         | Unix.WSTOPPED c
         | Unix.WSIGNALED c -> (256, Some c) in
       let read_stream s = Lwt_stream.to_list (Lwt_io.read_lines s) in
       let stdout = read_stream (p # stdout) in
       let stderr = read_stream (p # stderr) in
       let result =
         stdout >>= fun r_stdout ->
         stderr >>= fun r_stderr ->
         code_signal >|= fun (r_code, r_signal) ->
         { r_code
         ; r_signal
         ; r_duration = 0.0 (* TODO for the reader *)
         ; r_info = []
         ; r_stdout
         ; r_stderr
         ; r_cleanup = [] } in
       Lwt_io.close p # stdin >>= fun () ->
       Lwt.map k result >>= run_lwt

This should be enough to show one of the advantages of free monads. The ability
to write your program in a way that separates effects from your computation
gives us the opportunity to reinterpret these effects in different ways. This
isn't the only benefit of using free monads, and the example cited here is one
of the simplest functors you can imagine, but it's still nice to see how even
this is useful in the context of real software.

Where to Go from Here?
----------------------

Gabriel Gonzales' posts about Free Monads are the best introduction to the
subject:

http://www.haskellforall.com/2012/07/purify-code-using-free-monads.html

http://www.haskellforall.com/2012/06/you-could-have-invented-free-monads.html

John A De Goe's post is an excellent resource to learn how to use free
monads in practice:

http://degoes.net/articles/modern-fp

ocaml-free from NicolasT is a small but pretty complete library for Free monads.
It also includes support for Free Applicatives a bonus.

https://github.com/NicolasT/ocaml-free