.. post:: 2014-04-11
   :tags: OCaml, Opium
   :author: Rudi Grinberg

Middleware in Opium
===================

In my `previous
post <http://rgrinberg.com/blog/2014/04/04/introducing-opium/>`__ I
introduced Opium in a beginner-friendly way. In this post I'll try
to show something that's a little more interesting to experienced OCaml
programmers, or to those who are well versed in protocols such as Rack,
WSGI, and Ring, from Ruby, Python, and Clojure respectively.

Traditional Middleware
----------------------

First, let's start with some history. I'll be recounting from memory so
I apologize for any inaccuracies. Back in the stone ages of web
application development, a big problem was (and still is) creating
reusable standalone components: for example, caching, authentication,
and static pages. One solution to this problem was invented by the Python
community (WSGI) and popularized by the Ruby community (Rack). Since
then, it has caught on like wildfire and has been ported to almost every
other language. I attribute Rack's success to its extreme simplicity. In
Rack, an application is an object that takes an environment (an
all-encompassing dictionary, usually called ``env``, that includes the
HTTP request among other things) and returns a tuple of three elements:
the status code, the headers, and the body of the response. Translating
this to OCaml, a Rack application is just:

.. code-block:: ocaml

    type env = (string, string) Hashtbl.t
    (* In Ruby, the env hash is not restricted to string values, of course. They
    can be any values. Ignore this restriction for now. Or pretend that we're
    doing some gross Obj.magic hack. Ruby does the same anyway ;) *)
    type status = int
    type body = string
    type header = (string, string) Hashtbl.t
    type application = env -> status * header * body

    (* Actually I'm simplifying. In reality, body in Rack is more similar to: *)
    type body = (string -> unit) -> unit

Accepting Rack's proposition that applications are simply functions that
return that three-element tuple, how do we create reusable components? In
Rack the solution was to create so-called "middleware". Middleware is an
object with a call method (of type application) that is constructed by
passing it the next application down the middleware chain. A very literal
translation of a typical Rack middleware to OCaml gives:

.. code-block:: ocaml

    class my_middleware app =
      object
        method call env =
           if go_down_chain_condition
           then app#call env
           else (failwith "fix", failwith "me", failwith "please")
      end

One can imagine chaining multiple middleware to provide various
functionality. For example, a middleware could check the credentials in
the headers of the request (found in env) and decide whether to let the
request proceed to the next step or to return a "not authorized" status.

Middleware in Opium
-------------------

Of course, nobody sane writes code like that in OCaml. So if we get rid
of the classes and store state using closures instead of instance
variables (if necessary) we'd get just a function of type:

.. code-block:: ocaml

    type middleware = application -> env -> status * header * body
    (* which can be simplified to *)
    type middleware = application -> application

In this viewpoint, middleware is simply a higher order function that
transforms an application to another.
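
To make that concrete, here is a minimal sketch of two such functions and
how they chain. It uses only the standard library; the names
``require_token`` and ``add_header``, the integer status, and the list of
header pairs are assumptions made for illustration, not part of Rack or
Opium.

.. code-block:: ocaml

    (* Simplified Rack-style types, chosen only for this sketch. *)
    type env = (string, string) Hashtbl.t
    type status = int
    type header = (string * string) list
    type body = string
    type application = env -> status * header * body
    type middleware = application -> application

    (* Hypothetical check: call the next application only when the
       "HTTP_AUTHORIZATION" entry of env matches the expected token. *)
    let require_token (token : string) : middleware =
      fun app env ->
        match Hashtbl.find_opt env "HTTP_AUTHORIZATION" with
        | Some t when String.equal t token -> app env
        | _ -> (401, [], "not authorized")

    (* Hypothetical post-processing: add a header to whatever the wrapped
       application returned. *)
    let add_header (name : string) (value : string) : middleware =
      fun app env ->
        let status, headers, body = app env in
        (status, (name, value) :: headers, body)

    let hello : application = fun _env -> (200, [], "hello")

    (* Chaining is plain function application. The last middleware applied
       is the outermost one, so it sees the request first. *)
    let app : application =
      hello |> require_token "secret" |> add_header "X-Powered-By" "sketch"

Here ``add_header`` wraps ``require_token``, which wraps ``hello``, so even
the "not authorized" response picks up the extra header.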

OK, so is this good enough for OCaml? No. Putting my statically
typed functional programming hat back on, I can poke a few holes in this
approach.

-  env is mutable. In fact, middleware is encouraged to treat it as
   request local storage and pass information between middleware. This
   means that env may not be the same before and after calling a
   middleware.

-  env offers no encapsulation. Middleware can easily pry at each
   other's internals. Sometimes this is necessary, but many times it's
   not. Ring in Clojure offers namespaced keywords as the keys to the
   env hash, but this is only a gentleman's agreement.

-  env offers no type safety. In Rack, doing ``env['xxx']`` is like trying
   to pull a rabbit out of a hat. There's no guarantee that the value
   obtained will be of a certain type.
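
A small sketch of how these problems show up with the string-keyed
hash table from earlier. The two "middleware" functions and the
``"user.id"`` key are made up for illustration.

.. code-block:: ocaml

    type env = (string, string) Hashtbl.t

    (* A hypothetical middleware that stores the current user's id for
       later stages. *)
    let store_user_id (env : env) : unit = Hashtbl.replace env "user.id" "42"

    (* Another hypothetical middleware that happens to reuse the same key
       for something else. Nothing stops it. *)
    let clobber (env : env) : unit = Hashtbl.replace env "user.id" "guest"

    (* The reader can only hope the entry is still there and still parses
       as an int. *)
    let user_id (env : env) : int option =
      Option.bind (Hashtbl.find_opt env "user.id") int_of_string_opt

After ``store_user_id`` the lookup succeeds, but once ``clobber`` has run
on the same table the very same lookup quietly returns ``None``.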

Universal Maps
~~~~~~~~~~~~~~

All of this points us towards using something other than an untyped hash
table for the environment hash. But what do we use in OCaml if we want
an openly extensible, heterogeneous, immutable map? We use Core's
``Univ_map``. I won't go into the details of how it works, but I'll say
that a univ map supports the following two operations:

.. code-block:: ocaml

    val find : Univ_map.t -> 'a Univ_map.Key.t -> 'a option
    val add : Univ_map.t -> 'a Univ_map.Key.t -> 'a -> Univ_map.t

In addition, we can create new ``Univ_map.Key.t`` values, each of which
is associated with the type of the value that the key extracts:

.. code-block:: ocaml

    val create: name:string -> ('a -> Sexp.t) -> 'a Univ_map.Key.t 

If you'd like to know more, I recommend the following resources: [2], [3],
[4].
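
For a rough feel of the idea, here is a toy universal map built with the
standard library and extensible variants. This is only a sketch and is not
how Core implements ``Univ_map``; the module name ``Umap`` and the function
``create_key`` are hypothetical, and there is no sexp support.

.. code-block:: ocaml

    module Umap : sig
      type t
      type 'a key
      val empty : t
      val create_key : unit -> 'a key
      val add : t -> 'a key -> 'a -> t
      val find : t -> 'a key -> 'a option
    end = struct
      (* Every key adds its own private constructor to this open type. *)
      type binding = ..

      type 'a key = {
        id : int;                        (* position in the underlying map *)
        inject : 'a -> binding;          (* wrap a value for storage *)
        project : binding -> 'a option;  (* unwrap it, if it is ours *)
      }

      module M = Map.Make (Int)

      type t = binding M.t

      let empty = M.empty
      let next_id = ref 0

      let create_key (type a) () : a key =
        let module K = struct type binding += B of a end in
        incr next_id;
        { id = !next_id;
          inject = (fun x -> K.B x);
          project = (function K.B x -> Some x | _ -> None) }

      (* Returns a new map; the old one is left untouched. *)
      let add t k v = M.add k.id (k.inject v) t

      let find t k =
        match M.find_opt k.id t with
        | Some b -> k.project b
        | None -> None
    end

    (* Keys carry the type of the value they store. *)
    let user_key : string Umap.key = Umap.create_key ()
    let hits_key : int Umap.key = Umap.create_key ()

    let env = Umap.add (Umap.add Umap.empty user_key "alice") hits_key 3

The type of ``Umap.find env user_key`` is ``string option`` and that of
``Umap.find env hits_key`` is ``int option``, so no cast is needed at the
call site, and code without access to a key cannot read or overwrite its
binding.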

In Opium, we throw away Rack's env and simply put everything under the
same umbrella and call it a Request. Similarly, an Opium response will
subsume the three-element response tuple. ``env`` will then be the
extensible part of a request/response, which gives us:

.. code-block:: ocaml

    type request = {
      request: Cohttp.Request.t;
      env: Univ_map.t;
    }

    type response = {
      code: Code.status_code;
      headers: Cohttp.Header.t;
      body: Cohttp_async.Body.t;
      env: Univ_map.t;
    }

Now middleware is able to:

-  store stuff in env and always know the type of a value pulled out of
   env.

-  encapsulate its private data by not exposing its ``env`` keys in the
   public interface.

And of course, there are no side effects in sight.

Examples
--------

Enough theory crafting, let's build some middleware. Let's start with a
trivial example. First of all, our middleware need not even use ``env``
at all. For example here's a middleware that uppercases the body:

.. code-block:: ocaml

    open Core.Std
    open Async.Std
    open Opium.Std

    let uppercase =
      let filter handler req =
        handler req >>| fun response ->
        response
        |> Response.body
        (* this trick is only available with the latest cohttp. 
           you can manually unpack to a string and then repack however *)
        |> Cohttp_async.Body.map ~f:String.uppercase
        |> Field.fset Response.Fields.body response
      in
      Rock.Middleware.create ~name:(Info.of_string "uppercaser") ~filter

    let _ = App.empty
            |> middleware uppercase
            |> get "/hello" (fun req -> `String ("Hello World") |> respond')
            |> App.cmd_name "Uppercaser"
            |> App.command
            |> Command.run

As you can tell, a middleware knows of 2 bits of information:

-  ``handler : Request.t -> Response.t Deferred.t``. This is the
   application request handler.

-  ``req : Request.t``. The current request.

In our example, the middleware runs the handler and returns a response
with the uppercased body. But of course a general middleware doesn't
have to run the handler at all. It can also change the request before
feeding it to the handler, or it can simply add a logging message and let
the handler proceed with the request.
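
Those three shapes can be sketched with a deliberately simplified,
synchronous model. The ``request`` and ``response`` records below are
stand-ins invented for this example (no Async, no Cohttp), and none of the
function names come from Opium.

.. code-block:: ocaml

    (* Stand-ins for Request.t and Response.t. *)
    type request = { meth : string; path : string }
    type response = { code : int; body : string }
    type handler = request -> response
    type filter = handler -> handler

    (* Never runs the handler for paths starting with /admin. *)
    let block_admin : filter =
      fun handler req ->
        if String.length req.path >= 6 && String.sub req.path 0 6 = "/admin"
        then { code = 403; body = "forbidden" }
        else handler req

    (* Rewrites the request before handing it to the handler. *)
    let strip_trailing_slash : filter =
      fun handler req ->
        let n = String.length req.path in
        let path =
          if n > 1 && req.path.[n - 1] = '/' then String.sub req.path 0 (n - 1)
          else req.path
        in
        handler { req with path }

    (* Records a message, then lets the handler proceed untouched. *)
    let logger (log : string -> unit) : filter =
      fun handler req ->
        log ("request for " ^ req.path);
        handler req

    let echo : handler =
      fun req -> { code = 200; body = req.meth ^ " " ^ req.path }

    let app : handler =
      echo |> strip_trailing_slash |> logger print_endline |> block_admin

In the real thing the handler returns a ``Deferred.t``, so the same filters
would sequence their work with ``>>|`` or ``>>=`` instead of calling the
handler directly.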

You can tell that middleware is flexible, but to make it do something
more interesting, you must be able to store stuff alongside the
request/response. As I've mentioned before, the ``env`` bag is the
perfect place for that.

Here's another common use case. Suppose we'd like our webapp to
automatically authenticate users that provide their credentials using
the `HTTP
Basic <http://en.wikipedia.org/wiki/Basic_access_authentication>`__
scheme. For example, we can export a function like:

.. code-block:: ocaml

    val user : Request.t -> user option

Here's how we could implement that:

.. code-block:: ocaml

    open Core.Std
    open Async.Std
    open Opium.Std

    type user = {
      username: string;
      (* ... *)
    } with sexp

    (* My convention is to stick the keys inside an Env sub module. By
       not exposing this module in the mli we prevent the user or other
       middleware from meddling with our values without going through our
       interface *)
    module Env = struct
      let key : user Univ_map.Key.t = Univ_map.Key.create "user" <:sexp_of<user>>
    end

    (*
       Usually middleware gets its own module so the middleware constructor function
       is usually shortened to m. For example, [Auth.m] is obvious enough.

       The auth param (auth : username:string -> password:string -> user option)
       would represent our database model. E.g. it would do some lookup in the db
       and fetch the user.
    *)
    let m auth =
      let filter handler req =
        match req |> Request.headers |> Cohttp.Header.get_authorization with
        | None ->
          (* could redirect here, but we return user as an option type *)
          handler req
        | Some (Cohttp.Auth.Basic (username, password)) ->
          match auth ~username ~password with
          | None -> failwith "TODO: bad username/password pair"
          | Some user -> (* we have a user. let's add him to req *)
            let env = Univ_map.add_exn (Request.env req) Env.key user in
            let req = Field.fset Request.Fields.env req env in
            handler req
      in
      Rock.Middleware.create ~name:(Info.of_string "http basic auth") ~filter

    let user req = Univ_map.find (Request.env req) Env.key

The middleware above might be basic as well, but it should give you an
idea of how to create a wide variety of middleware: for example, serving
static pages, caching, throttling, routing, etc.

There are of course downsides that I've yet to solve in Opium.
The main downside is that middleware does not commute. This means that
it must be executed serially for every request. Much worse, however,
is that middleware order will affect application behaviour. In fact, a
common source of bugs in Rack is executing middleware in the wrong
order. One obvious solution to this is to explicitly specify dependencies
between middleware. Unfortunately it's pretty ugly and heavyweight, and
it makes middleware less composable.
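
A tiny illustration of the ordering problem, using a toy model where a
handler is just a function from a string to a string. The types and both
middleware are made up for this example.

.. code-block:: ocaml

    type handler = string -> string
    type middleware = handler -> handler

    (* Uppercases whatever the wrapped handler produced. *)
    let uppercase : middleware =
      fun handler req -> String.uppercase_ascii (handler req)

    (* Appends a marker to whatever the wrapped handler produced. *)
    let sign : middleware =
      fun handler req -> handler req ^ " -- signed"

    let app : handler = fun name -> "hello " ^ name

    (* Same middleware, two different orders. *)
    let a = uppercase (sign app) "opium"  (* "HELLO OPIUM -- SIGNED" *)
    let b = sign (uppercase app) "opium"  (* "HELLO OPIUM -- signed" *)

Both stacks contain the same two pieces, yet ``a`` and ``b`` differ, and
nothing in the types warns us about it.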

Final Note: After writing the first version of Opium I realized I was
copying the core of Twitter's Finagle/Finatra. I don't mention them
here, however, since Finagle is a lot more general and I don't know much
about it other than this excellent
`paper <http://monkey.org/~marius/funsrv.pdf>`__. I did end up borrowing
their terminology for the lower level plumbing though.

[1] `Rack Spec <https://github.com/rack/rack/blob/master/SPEC>`__.

[2] `Ring Spec <https://github.com/mmcgrana/ring/blob/master/SPEC>`__. A
huge improvement over Rack in my opinion. Rock could be thought of as a
typed Ring.

[3] If you're coming from the Haskell WAI world, I believe ``env`` would
be called the ``vault``. In fact, the way Rock is designed should be
very similar to the old WAI (< 3.0), except that WAI uses IO instead of
Deferred.t.

[4] `mixtbl <https://github.com/mjambon/mixtbl>`__. An implementation
without Core. It's also a hashtable instead of a map.

[5] `Haskell
Vault <http://apfelmus.nfshost.com/blog/2011/09/04-vault.html>`__. Same
concept but implemented in the Haskell world.

[6] `Universal Type <https://blogs.janestreet.com/rethinking-univ/>`__. If
you'd like to dig deeper to see how the basis for such a map can be
implemented.