Content addressed derivations
Introduction
Floating content addressed derivations (from now CA derivations) is an experimental feature which substantially change how the hashes in the store paths are calculated. Indeed, normally derivations are input addressed i.e. the outputs store paths depends only on the derivation inputs, instead with CA derivations they depend on the content of the outputs.
This has two main advantages:
- The so-called "early cutoff", namely the ability of Nix to stop a build if the build outputs would be something already built.
For example suppose you add a comment in an Haskell source, at this point Nix will rebuild the component depending on this source but since the output will be the same (adding a comment is an "output-invariant" change for
ghc
) every other component that depends on that will not be rebuilt. - Users of the same Nix store does not need to trust each other when using substituters.
You can find more information in the ca-derivations page on the wiki (and in the other resources linked there).
Usage
Enable CA derivations in your system
First of all your Nix installation must support the ca-derivations
experimental feature, this can done by adding the following in your nix.conf
:
experimental-features = ca-derivations
Or if you use NixOS:
nix.extraOptions = ''
experimental-features = ca-derivations
'';
Enable CA derivations in your project
At this point you can pass a new module to project'
that tells haskell.nix
to build every component in the project as CA derivation.
haskell-nix.project' {
# ...
modules = [{
contentAddressed = true;
# packages.project-name.components.exes.executable.contentAddressed = true;
}];
};
Optionally you can also specify which components you don't want to be content addressed.
Known problems
Limitation of the current CA derivations implementation
As explained in the RFC 62
The current implementation has a naive approach that just forbids fetching a path if the local system has a different realisation for the same drv output. This approach is simple and correct, but it's possible that it might not be good-enough in practice as it can result in a totally useless binary cache in some pathological cases.
For example, suppose that your machine builds a derivation A
producing an output A.out
in your store and that after that a CI machine builds the same derivation A
but producing a different output A.out'
and populating a cache with this output.
At this point, if you need to build a derivation B
that depends on A
, since you already have the realisation A.out
in your local store and you can't get B.out
from the cache and you will end up building B
even if one of its realisation is in the cache.
This means that, in some cases, enabling CA derivations would lead to more rebuilds than not having it.
Hydra
Hydra currently doesn't support CA derivations, efforts are being made in this direction.
GHC is not deterministic
Currently ghc
is determinstic only disabling the parallel building i.e. passing -j1
. Here the upstream issue.
Having a deterministic ghc
would be a dream since it will automatically fix all the pathological cases about substituters discussed above and would allow haskell.nix
to parallel build even when using CA derivations.