Virtual Libraries

Last week I finally slogged through the last remaining issues required to implement Virtual Libraries, and since it’s quite a milestone (slated for dune 1.7), I’d like to share my excitement for this feature. The feature itself is nothing new to old-timers, but I think that dune now manages to present it in a much more polished way. Besides, OCaml is enjoying a recent uptick in new users, so perhaps a little review might be useful. I find that having at least a high-level understanding of how a build feature works always helps in using it effectively.

The Linking Hack

The Linking Hack is an age-old trick [1] for parameterizing libraries without using functors. The idea is quite simple:

  • Define one or more interfaces by writing .mli files. We call these interfaces virtual modules, and the library that contains them a virtual library.
  • An implementation for a virtual library is another library that defines an .ml implementation module for every virtual module in the virtual library.
  • Generic library code is written against the virtual library.
  • The selection of implementations (for the virtual libraries) is delayed until building executables.

The advantage is clear: we no longer have to commit to particular implementations of interfaces until we need to build a concrete executable. We can continue to write generic code for libraries and are only required to provide implementations when linking executables.

Sand Washing the Linking Hack

While the idea is clear, there are some non-trivial difficulties in making this work in practice. As usual, Dune’s philosophy is to sweep all these low-level details under the rug, and provide users with a high-level API.

The motivating example to showcase the feature is a trivial file IO library with unix and node bindings. It’s a trivial example, but it shows how one could eventually write an entire cross-platform IO library this way.

Defining Virtual Libraries

To define a virtual library, it’s enough to mark at least one module as virtual:

(library
 (name fileio)
 (virtual_modules v))

The virtual module, v.mli, is:

val read : string -> string
val write : string -> string -> unit

While our main module, fileio.ml, is:

include V
let copy p1 p2 =
  let contents = read p1 in
  write p2 contents

Note that we are able to add implementation code to a virtual library. In Java’s terms, a virtual library is more like an abstract class than an interface, because it can provide a partial implementation as well.
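
To make the abstract class analogy concrete, here is a minimal sketch of the same shape written with OCaml's own object system. It is an illustration only and says nothing about how dune implements the feature; the class names fileio and mem_fileio are made up for this example.

```ocaml
(* Illustration only: the abstract-class analogy, using OCaml objects. *)
class virtual fileio = object (self)
  (* The "virtual module" part: declared here, defined elsewhere. *)
  method virtual read : string -> string
  method virtual write : string -> string -> unit

  (* The partial implementation, written against the virtual methods. *)
  method copy p1 p2 = self#write p2 (self#read p1)
end

(* A hypothetical concrete subclass that keeps "files" in a hash table. *)
class mem_fileio = object
  inherit fileio
  val files : (string, string) Hashtbl.t = Hashtbl.create 16
  method read p = Hashtbl.find files p
  method write p contents = Hashtbl.replace files p contents
end
```

The difference is that with objects the choice of subclass is made in the code, while with a virtual library it is made when the executable is linked.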

Defining Implementations

To define an implementation, we need to mark a library as providing an implementation for a particular virtual library:

;; cat fileio_unix/dune
(library
 (name fileio_unix)
 (implements fileio))

And for jsoo:

;; cat fileio_jsoo/dune
(library
 (name fileio_jsoo)
 (libraries js_of_ocaml)
 (preprocess (pps js_of_ocaml-ppx))
 (modes byte)
 (implements fileio))

There’s nothing interesting in the implementation code itself, so I won’t comment on it.

Unix implementation:

let read p =
  let in_ = open_in p in
  let contents = really_input_string in_ (in_channel_length in_) in
  close_in in_;
  contents

let write p contents =
  let out = open_out p in
  output_string out contents;
  close_out out

Jsoo implementation:

open Js_of_ocaml

class type buffer = object
  method toString : Js.js_string Js.t Js.meth
end

module Fs = struct
  class type fs = object
    method readFileSync : Js.js_string Js.t -> buffer Js.t Js.meth
    method writeFileSync : Js.js_string Js.t -> Js.js_string Js.t -> unit Js.meth
  end

  let require_module s =
    Js.Unsafe.fun_call
      (Js.Unsafe.js_expr "require")
      [|Js.Unsafe.inject (Js.string s)|]

  let fs : fs Js.t = require_module "fs"
end

let read p =
  (Fs.fs##readFileSync (Js.string p))
  ## toString
  |> Js.to_string

let write path contents =
  Fs.fs##writeFileSync (Js.string path) (Js.string contents)

Generic Code

Now that we have the virtual library and the two implementations defined, we can write generic code.

Here’s a very simple program that tests the copying functionality:

let run () =
  Fileio.write "foo1" "hello world";
  Fileio.copy "foo1" "foo2";
  let foo1 = Fileio.read "foo1" in
  let foo2 = Fileio.read "foo2" in
  print_endline (Printf.sprintf "foo1 = foo2 is %b" (foo1 = foo2))

Which we’ll build with the following dune file:

(library
 (name main)
 (libraries fileio))

Finally, our two executables can be assembled with:

(rule (with-stdout-to jsoo_run.ml (echo "Main.run ();;\n")))
(rule (copy jsoo_run.ml unix_run.ml))

(executable
 (name jsoo_run)
 (modules jsoo_run)
 (libraries main fileio_jsoo))

(executable
 (name unix_run)
 (modules unix_run)
 (libraries main fileio_unix))

It’s a bit annoying to have to go through an intermediate library, but executables cannot contain backend-independent code.

Finally, we may run the executables with the following alias:

(alias
 (name default)
 (deps ./exe/jsoo_run.bc.js)
 (action
  (progn
   (echo "Unix Implementation\n")
   (run ./exe/unix_run.exe)
   (echo "Jsoo Implementation\n")
   (run %{bin:node} %{deps}))))

We test it all out with $ dune build.

The full source for the example above is on GitHub.

External Implementations

In the example above, all implementations of the virtual library are provided upfront in the same package. In fact, dune’s virtual libraries allow for external implementations installed separately from the virtual library. This important generalization gives the feature another degree of flexibility: the ability to implement new platforms without coordinating with upstream.
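
As a sketch of what such an external implementation could look like, imagine a third party shipping an in-memory backend for tests. The library name fileio_mem and the code below are hypothetical and not part of the example repository; the only real requirement is that the module matches the virtual module's interface.

```ocaml
(* v.ml of a hypothetical external library `fileio_mem`.
   It has to provide exactly what v.mli declares: read and write. *)

(* "Files" live in a table keyed by path; nothing touches the disk. *)
let files : (string, string) Hashtbl.t = Hashtbl.create 16

let read p =
  match Hashtbl.find_opt files p with
  | Some contents -> contents
  | None -> raise (Sys_error (p ^ ": No such file or directory"))

let write p contents = Hashtbl.replace files p contents
```

Its dune file would carry an (implements fileio) field just like the two stanzas shown earlier, and an executable would list it in place of fileio_unix or fileio_jsoo.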

Generalizing to Variants

With a little bit of imagination we can see how useful this feature can be for writing cross platform code. In fact, the main motivation for this feature was to create an abstraction mechanism for the different backends of mirage. One could easily imagine writing a virtual library for a concern such as date/time handling and implementations for backends such as xen, unix, jsoo, etc.

While this approach looks promising, it will quickly run into boilerplate problems. If we imagine parameterizing multiple libraries in this manner, we’ll have to write quite a bit of boilerplate to link actual executables. For example:

(executable
 (name server)
 (libraries a b c ... time.unix fs.unix crypto.unix ...))

We actually depend on libraries a, b, c, but we need to select implementations for all virtual libraries in the transitive closure of dependencies.

We already prepared a solution for this problem: a lightweight tagging mechanism for implementations called variants. Each implementation can be tagged with a variant:

(library
 (implements fs)
 (variant unix))

(library
 (implements time)
 (variant unix))

...

Now we can define executables by selecting a whole group of implementations with a single tag:

(executable
 (name server)
 (libraries a b c)
 (variants unix))

This is much neater. For now these so-called variants are still vaporware, but they’re on the roadmap, as they’re an important usability improvement. As always, feedback is welcome.

Contrasting with Functors

The experienced reader will immediately notice that this feature is quite similar to functors. In fact, the following dictionary is basically all that’s required for translating between the two features:

Virtual Libraries                            | Functors
Virtual Library                              | Module Signature
Library that depends on a Virtual Library    | Functor
Implementation                               | Functor Application

In fact, this duality has a long history [2]. I’ll just quickly highlight the main advantages and disadvantages of both approaches:

  • Virtual libraries are a much more lightweight parameterization mechanism. Using functors for large scale parameterizations quickly descends into a combinatorial explosion of sharing constraints and towers of functor applications, etc. Virtual libraries don’t have any of this extra glue code.
  • Virtual libraries impose a type-class-like uniqueness constraint on implementations: only a single implementation per virtual library can be linked into an executable. While this is a limitation, the guarantee can also be useful sometimes.
  • You lose out on some optimizations when using virtual libraries. In particular, ocamlopt relies on the presence of .cmx files to inline across modules. Obviously, virtual modules have no such .cmx files. Hence you do lose out on some performance when compiling against virtual libraries. I’m not aware of any other details regarding performance, but I wouldn’t be surprised if some other optimizations are affected.
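
For comparison, here is a rough sketch of the fileio example written with functors, following the dictionary above. The names Make, Mem, Store1 and Store2 are made up for this illustration, and the in-memory backend is a stand-in for a real one.

```ocaml
(* Plays the role of the virtual module v.mli. *)
module type V = sig
  val read : string -> string
  val write : string -> string -> unit
end

(* Plays the role of the fileio library: generic code over any V. *)
module Make (Backend : V) = struct
  include Backend
  let copy p1 p2 = write p2 (read p1)
end

(* A hypothetical generative functor: each application builds a fresh
   in-memory backend with its own table. *)
module Mem () : V = struct
  let files : (string, string) Hashtbl.t = Hashtbl.create 16
  let read p = Hashtbl.find files p
  let write p contents = Hashtbl.replace files p contents
end

(* Functor application plays the role of choosing an implementation. *)
module Store1 = Mem ()
module Store2 = Mem ()
module A = Make (Store1)
module B = Make (Store2)
```

Note that A and B coexist in the same program with independent backends, which is exactly what the uniqueness constraint of virtual libraries rules out. The price is that every client of Make has to thread the backend through explicitly.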

Conclusion

I’ve omitted some implementation details such as private modules and findlib compatibility. I invite the reader to refer to the manual for a more complete description.

Virtual libraries have been in the works for quite a while and it took us quite a bit of iteration on the design to have something simple and usable [3]. The feature is quite young, and we’d like to hear your feedback to refine this feature further.

[1] I attempted to find the definitive source for who introduced this linking hack, but could not get a definitive answer. Personally, I first learned of it from @dbuenzli’s libraries. For example see the mtime library.
[2] Big Functors are an old idea that generalizes virtual libraries. Alas it requires patching the compiler.
[3] Starting from https://github.com/ocaml/dune/issues/136
