Reorganize for monorepo structure: - Move content/, templates/, static/, site.toml → sites/nrd.sh/ - Frees root for sukr docs site - Build with: sukr -c sites/nrd.sh/site.toml
187 lines
10 KiB
Markdown
187 lines
10 KiB
Markdown
---
|
||
title: Standard Action
|
||
description: Do it once, do it right.
|
||
taxonomies:
|
||
tags:
|
||
- std
|
||
- nix
|
||
- devops
|
||
- github actions
|
||
author: Tim D
|
||
authorGithub: nrdxp
|
||
authorImage: https://avatars.githubusercontent.com/u/34083928?v=4
|
||
authorTwitter: nrdxp52262
|
||
date: "2022-12-09"
|
||
category: dev
|
||
extra:
|
||
read_time: true
|
||
repo_view: true
|
||
---
|
||
|
||
## CI Should be Simple
|
||
|
||
As promised in the [last post](./std), I'd like to expand a bit more on what we've
|
||
been working on recently concerning Nix & Standard in CI.
|
||
|
||
At work, our current GH action setup is rather _ad hoc_, and the challenge of optimizing that path
|
||
around Nix’s strengths lay largely untapped for nearly a year now. Standard has helped somewhat
|
||
to get things organized, but there has been a ton of room for improvement in the way tasks are
|
||
scheduled and executed in CI.
|
||
|
||
[Standard Action][action] is our answer. We have taken the last several months of brainstorming
|
||
off and on as time allows, experimenting to find a path that is versatile enough to be useful
|
||
in the general case, yet powerful enough for organizations who need extra capacity. So without
|
||
any further stalling, let's get into it!
|
||
|
||
## The Gist
|
||
|
||
The goal is simple, we want a CI system that only does work once and shares the result from there.
|
||
If it has been built or evaled before, then we want to share the results from the previous run
|
||
rather than start from scratch.
|
||
|
||
It is also useful to have some kind of metadata about our actions, which we can use to build
|
||
matrices of task runners to accomplish our goals. This also allows us to schedule builds on
|
||
multiple OS trivially, for example.
|
||
|
||
Task runners shouldn't have to care about Nix evaluation at all, they should just be able to get
|
||
to work doing whatever they need to do. If they have access to already reified derivations, they
|
||
can do that.
|
||
|
||
So how can we accomplish this? Isolate the evaluation to its own dedicated "discovery" phase, and
|
||
share the resulting /nix/store and a json list describing each task and its target derivations.
|
||
|
||
From there it's just a matter of opimizing the details based on your usecase, and to that end we
|
||
have a few optional inputs for things like caching and remote building, if you are so inclined.
|
||
|
||
But you can do everything straight on the runner too, if you just need the basics.
|
||
|
||
## How it Works
|
||
|
||
Talking is fine, but code is better. To that end, feel free to take a look at my own personal CI
|
||
for my NixOS system and related packages: [nrdxp/nrdos/ci.yml][nrdos].
|
||
|
||
What is actually evaluated during the discovery phase is determined directly in the
|
||
[flake.nix][ci-api].
|
||
|
||
I am not doing anything fancy here at the moment, just some basic package builds, but that is
|
||
enough to illustrate what's happening. You can get a quick visual by look at the summary of
|
||
a given run: [nrdxp/nrdos#3644114900](https://github.com/nrdxp/nrdos/actions/runs/3644114900).
|
||
|
||
You could have any number of matrices here, one for publishing OCI images, one for publishing
|
||
documentation, one for running deployments against a target environment, etc, etc.
|
||
|
||
Notice in this particular example that CI exited in 2 minutes. That's because everything
|
||
represented by these builds is already cached in the specified action input `cache`, so no work is
|
||
required, we simply report that the artifacts already exist and exit quickly.
|
||
|
||
There is a run phase that typically starts after this build step which runs the Standard action,
|
||
but since the "build" actions only duty is building, it is also skipped here.
|
||
|
||
This is partially enabled by use of the GH action cache. The cache key is set using the following
|
||
format: [divnix/std-action/discover/action.yml#key][key]. Coupled with the guarantees nix already
|
||
gives us, this is enough to ensure the evaluation will only be used on runners using a matching OS,
|
||
on a matching architecture and the exact revision of the current run.
|
||
|
||
This is critical for runners to ensure they get an exact cache hit on start, that way they pick
|
||
up where the discovery job left off and begin their build work immediately, acting directly
|
||
on their target derivation file instead of doing any more evaluation.
|
||
|
||
## Caching & Remote Builds
|
||
|
||
Caching is also a first class citizen, and even in the event that a given task fails (even
|
||
discovery itself), any of its nix dependencies built during the process leading up to that failure
|
||
will be cached, making sure no nix build _or_ evaluation is ever repeated. The user doesn't have
|
||
to set a cache, but if they do, they can be rest assured their results will be well cached, we
|
||
make a point to cache the entire build time closure, and not just the runtime closure, which is
|
||
important for active developement in projects using a shared cache.
|
||
|
||
The builds themselves can also be handed off to a more powerful dedicated remote builder. The
|
||
action handles remote builds using the newer and more efficient remote store build API, and when
|
||
coupled with a special purpose service such as [nixbuild.net](https://nixbuild.net), which your
|
||
author is already doing, it becomes incredibly powerful.
|
||
|
||
To get started, you can run all your builds directly on the action runner, and if that becomes
|
||
a burden, there is a solid path available if and when you need to split out your build phase to a
|
||
dedicated build farm.
|
||
|
||
## Import from What?
|
||
|
||
This next part is a bit of an aside, so feel free to skip, but the process outlined above just so
|
||
happened to solve an otherwise expensive problem for us at work, outlining how thinking through
|
||
these problems carefully has helped us improve our process.
|
||
|
||
IOG in general is a bit unique in the Nix community as one of the few heavy users of Nix’s IFD
|
||
feature via our [haskell.nix][haskell] project. For those unaware, IFD stands for
|
||
"import from derivation" and happens any time the contents of some file from one derivations output
|
||
path is read into another during evaluation, say to read a lock file and generate fetch actions.
|
||
|
||
This gives us great power, but comes at a cost, since the evaluator has to stop and build the
|
||
referenced path if it does not already exist in order to be able to read from it.
|
||
|
||
For this reason, this feature is banned from inclusion in nixpkgs, and so the tooling used there
|
||
(Hydra, _et al._) is not necessarily a good fit for projects that do make use of IFD to some extent.
|
||
|
||
So what can be done? Many folks would love to improve the performance of the evaluator itself, your
|
||
author included. The current Nix evaluator is single threaded, so there is plenty of room for
|
||
splitting this burden across threads, and especially in the case of IFD, it could theoretically
|
||
speed things up a great deal.
|
||
|
||
However, improving the evaluator performance itself is actually a bit of a red herring as far as
|
||
we are concerned here. What we really want to ensure is that we never pay the cost of any given Nix
|
||
workload more than once, no matter how long it takes. Then we can ensure we are only ever
|
||
building on what has already been done; an additive process if you will. Without careful
|
||
consideration of this principle beforehand, even a well optimized evaluator would be wasting cycles
|
||
doing the same evals over and over. There is the nix flake evalulation cache, but it comes with
|
||
a few [caveats][4279] on its own and so doesn't currently solve our problem either.
|
||
|
||
To give you some numbers, to run a fresh eval of my current project at work takes 35 minutes from a
|
||
clean /nix/store, but with a popullated /nix/store from a previous run it takes only 2.5 minutes.
|
||
Some of the savings is eaten up by data transfer and compression, but the net savings are still
|
||
massive.
|
||
|
||
I have already begun brainstorming ways we could elimnate that transfer cost entirely by introducing
|
||
an optional, dedicated [evaluation store](https://github.com/divnix/std-action/issues/10) for those
|
||
who would benefit from it. With that, there is no transfer cost at all during discovery, and the
|
||
individual task runners only have to pull the derivations for their particular task, instead of the
|
||
entire /nix/store produced by discovery, saving a ton of time in our case.
|
||
|
||
Either way, this is a special case optimization, and for those who are content to stick with the
|
||
default of using the action cache to share evaluation results, it should more than suffice in the
|
||
majority of cases.
|
||
|
||
## Wrap Up
|
||
|
||
So essentially, we make due with what we have in terms of eval performance, focus on ensuring we
|
||
never do the same work twice, and if breakthroughs are made in the Nix evaluator upstream at some
|
||
point in the future, great, but we don't have to wait around for it, we can minimize our burden
|
||
right now by thinking smart. After all, we are not doing Nix evaluations just for the sake of it,
|
||
but to get meaningful work done, and doing new and interesting work is always better than repeating
|
||
old tasks because we failed to strategize correctly.
|
||
|
||
If we do ever need to migrate to a more complex CI system, these principles themeselves are all
|
||
encapsulated in a few fairly minimal shell scripts and could probably be ported to other
|
||
systems without incredible effort. Feel free to take a look at the source to see what's really
|
||
goin on: [divnix/std-action](https://github.com/divnix/std-action).
|
||
|
||
There are some places where we could use some [help][7437] from [upstream][2946], but even then, the
|
||
process is efficient enough to be a massive improvement, both for my own personal setup, and for
|
||
work.
|
||
|
||
As I mentioned in the previous post though, Standard isn't just about convenience or performance,
|
||
but arguable the most important aspect is to assist us in being _thorough_. To ensure all
|
||
our tasks are run, all our artifacts are cached and all our images are published is no small feat
|
||
without something like Standard to help us automate away the tedium, and thank goodness for that.
|
||
|
||
For comments or questions, please feel free to drop by the official Standard [Matrix Room][matrix]
|
||
as well to track progress as it comes in. Until next time...
|
||
|
||
[action]: https://github.com/divnix/std-action
|
||
[haskell]: https://github.com/input-output-hk/haskell.nix
|
||
[nrdos]: https://github.com/nrdxp/nrdos/blob/master/.github/workflows/ci.yml
|
||
[key]: https://github.com/divnix/std-action/blob/6ed23356cab30bd5c1d957d45404c2accb70e4bd/discover/action.yml#L37
|
||
[7437]: https://github.com/NixOS/nix/issues/7437
|
||
[3946]: https://github.com/NixOS/nix/issues/3946#issuecomment-1344612074
|
||
[4279]: https://github.com/NixOS/nix/issues/4279#issuecomment-1343723345
|
||
[matrix]: https://matrix.to/#/#std-nix:matrix.org
|
||
[ci-api]: https://github.com/nrdxp/nrdos/blob/66149ed7fdb4d4d282cfe798c138cb1745bef008/flake.nix#L66-L68
|