.. post:: 2017-07-25
   :tags: OCaml, Monads, Generators

Monadic Generators in OCaml
===========================

Generators are one of those features that have been heavily popularized by
today's fashionable languages such as Python and ES6. So it's not the first time
I've encountered programmers who are curious about OCaml bring up with questions
along the lines of:

* Why doesn't OCaml support generators?
* Are there alternatives to generators in OCaml?
* How can I emulate generators in OCaml?

In this post, I'd like to briefly address these questions.

Where are the Generators?
-------------------------

I can only speculate but it seems self evident, OCaml doesn't have native
support for generators because it doesn't really need it. There's quite a few
satisfactory ways of emulating them to the point where the strong justification
for adding a feature to the language just isn't there.

With that being said, it's well known that people are working on algebraic
effects for OCaml and it's my understanding that it will be possible to
implement generators in direct style which would be indistinguishable from
direct support in the compiler. So this feature is effectively a strict superset
of generators and should settle this issue once and for all.

Alternatives to Generators
--------------------------

Maybe I'm just old fashioned, but I find myself rarely needing generators. I
usually find the bog standard `lazy lists` to be more convenient to work with.
Where my implementation of choice the one is the one that comes with `containers
<https://github.com/c-cube/ocaml-containers/blob/master/src/iter/CCLazy_list.mli>`_.

Here's a simple lazy list of primes:

.. code-block:: ocaml

   (* save to file.ml and run as $ utop file.ml *)
   #require "containers.iter";;

   module Llist = CCLazy_list

   let primes =
     let open Llist in
     let rec next_prime n =
       try
         for i=2 to int_of_float (sqrt (float_of_int n)) do
           if n mod i = 0 then raise_notrace Exit
         done;
         Cons (n, lazy (next_prime (n + 1)))
       with Exit -> next_prime (n + 1) in
     Lazy.from_val @@ Cons (2, lazy (next_prime 3))

   let () =
     let n = 100_000 in
     let primes = Llist.length (Llist.take n primes) in
     Printf.printf "Found: %d primes\n" primes

OK, maybe this isn't the prettiest code, but I've picked a bad example. Lazy
lists usually work much better. Promise.

The other candidate I usually reach for is the `gen` type:

.. code-block:: ocaml

   type 'a gen = unit -> 'a option

Which comes with its own useful `gen <https://github.com/c-cube/gen>`_ library.
The main advantage of this type is that it's structural. We can consume and
produce gens without ever depending on a library. ocaml-re uses this to great
effect `here <https://github.com/ocaml/ocaml-re/blob/master/lib/re.mli#L127>`_
for example.

The disadvantage of this type is that implementing generators usually involves
ugly mutable code. Oh well, if I had a problem with this kind of mutation, I'd
be using Haskell.

.. code-block:: ocaml

   let primes () =
     let prime = ref 1 in
     let rec loop () =
       incr prime;
       try
         for i=2 to int_of_float (sqrt (float_of_int !prime)) do
           if !prime mod i = 0 then raise_notrace Exit
         done;
         Some !prime
       with Exit -> loop ()
     in loop

   let () =
     let prime_gen = primes () in
     for _ = 0 to 100_00 do
       assert (prime_gen () != None);
     done;
     print_endline "There's more than 100_000 primes"

Note the type of this function:

.. code-block::  ocaml

   val primes : unit -> (unit -> int option)

We must not forget to create a new generator whenever we want to iterate over
our generator from scratch.

Really Faking Generators
------------------------

While the approaches above were similar, they're still not capturing the essence
of generators - the ability to `yield` control back to the caller along with a
value anywhere inside a computation. How do we approach this problem? Like every
good functional programmer: with Monads!

Here's the signature we're aiming to implement:

.. code-block:: ocaml

   module type Gen_intf = sig
     type ('result, 'yield) t
     val return : 'a -> ('a, _) t
     val yield : 's -> (unit, 's) t
     val (>>=) : ('a, 's) t -> ('a -> ('b, 's) t) -> ('b, 's) t
     val iter : ('result, 'a) t -> f:('a -> unit) -> 'result
   end

With this signature, we're modeling generators as computations that can produce
a final ``'result``` value and possibly many intermediate ``'yield'`` values
along the way. These computations can of course be sequenced (the essence of
monads), and finally executed to produce the final a result along with running a
callback for every yielded value.

Representing this behavior with a type we arrive at a generator:

.. code-block:: ocaml

   type ('result, 'yield) t =
     | Done of 'result
     | Yield of (unit -> 'yield * ('result, 'yield) t)

A constructor for terminating the generator and continuing it along with a
yielded value. Note that we need to guard every yield behind a thunk. Otherwise
we will find it quite hard to pause our generator mid way.

Since this a monad instance,

.. code-block:: ocaml

  let return a = Done a

  let rec (>>=) x f =
    match x with
    | Done r -> f r
    | Yield t -> Yield (fun () ->
        let (y, next) = t () in
        (y, next >>= f)
      )

where ``return`` trivial and ``(>>=)`` is simply sequencing a generator along
with a computation that produces a generator from the result.

If you really want to get the hang of it, try implementing ``join`` from
scratch:

.. code-block:: ocaml

   val join : (('a, 's) t, 's) -> ('a, 's) t

Now, finally we need ``yield`` to generate values:

.. code-block:: ocaml

   let yield r = Yield (fun () -> (r, return ()))

Let's briefly contrast this to our ``return`` above. While ``return`` created a
generator that yielded no values and produced some result, ``yield`` creates a
generator that yields a single value and produces a trivial result.

With this kit, we can construct generators:

.. code-block:: ocaml

   (* assuming that all these functions are in Gen*)
   let primes : (unit, int) Gen.t =
     let open Gen in
     let rec loop n =
       try
         for i=2 to int_of_float (sqrt (float_of_int n)) do
           if n mod i = 0 then raise_notrace Exit
         done;
         yield n >>= fun () ->
         loop (n + 1)
       with
         Exit ->
         loop (n + 1)
     in loop 2

We're almost done here, all that remains is a way to run a generator:

.. code-block:: ocaml

   let rec iter x ~f =
     match x with
     | Done s -> s
     | Yield t ->
       let (y, x) = t () in
       f y;
       iter x ~f

While this is a useful function in general, it doesn't quite work in our case
since our generator doesn't terminate.

No matter, we can have a variant that will only consume up to a constant number
of yields:

.. code-block:: ocaml

   let take
     : type a b. (a, b) t -> int -> b list
     = fun t n ->
       let rec loop acc t n =
         match t, n with
         | Done _, _
         | _, 0 -> List.rev acc
         | Yield t, n ->
           let (a, next) = t () in
           loop (a::acc) next (n - 1)
       in loop [] t n

As an exercise, one can also implement a generator-returning version of this:

.. code-block:: ocaml

   val take : ('a, 'b) t -> int -> (unit, 'b) t

And finally:

.. code-block:: ocaml

   ignore (Gen.take primes 100_000)

What Else?
----------

First of all, let me first mention that this entire treatment is following Kwang
Yul Seo's original `exposition
<https://kseo.github.io/posts/2017-01-25-write-your-own-stream-processing-library-part1.html>`_.
Apart from the usual porting to OCaml, I took the liberty to get rid of the
transformer layer in my OCaml port. Transformers aren't necessary to thread
effects while yielding in OCaml, and a naive implementation would raise some
stack safety issues that would distract us from the topic at hand.

I should also note that a variant of all this work in OCaml can always be found
in some obscure Jane Street library. This time, it's not obscure at all but the
ubiquitous `Sequence.Generator
<https://github.com/janestreet/base/blob/master/src/sequence.mli#L402>`_ module
in `Base <https://github.com/janestreet/base>`_. Their approach is more
sophisticated as it uses the `Stream Fusion
<https://code.haskell.org/~dons/papers/icfp088-coutts.pdf>`_ step type to
construct the generator. Also, they have a solution to the problem of composing
monads with their sequence type. Which is similar to the problem I've avoided
with making ``yield`` play nicely with other binds.

Finally, I should note that the most honest to goodness emulation is actually
done with the infamous delimcc in this `blog post
<https://monkey.org/~marius/implementing-python-style-generators-with-delimited-continuations.html>`_.
But it would take me a lot of guts before I'd use delimcc in production, so it's
not worth dwelling on.

Happy generating!