posts: add post for std-action
src/pages/blog/std-action.md
@@ -0,0 +1,178 @@
---
layout: $/layouts/post.astro
title: Standard Action
description: Do it once, do it right.
tags:
- std
- nix
- devops
- GitHub
- actions
author: Tim D
authorGithub: nrdxp
date: 2022-12-09
---

## CI Should be Simple

As promised in the [last post](./std), I'd like to expand a bit more on what we've
been working on recently concerning Nix & Standard in CI.

At work, our current GH action setup is rather _ad hoc_, and the opportunity to optimize that path
around Nix's strengths has gone largely untapped for nearly a year now. Standard has helped somewhat
to get things organized, but there has been a ton of room for improvement in the way tasks are
scheduled and executed in CI.

[Standard Action][action] is our answer. We have spent the last several months brainstorming
off and on as time allows, experimenting to find a path that is versatile enough to be useful
in the general case, yet powerful enough for organizations that need extra capacity. So without
any further stalling, let's get into it!

## The Gist

The goal is simple: we want a CI system that only does work once and shares the result from there.
If something has been built or evaluated before, then we want to share the results from the
previous run rather than start from scratch.

It is also useful to have some kind of metadata about our actions, which we can use to build
matrices of task runners to accomplish our goals. This also allows us to schedule builds on
multiple operating systems trivially, for example.

Task runners shouldn't have to care about Nix evaluation at all; they should just be able to get
to work doing whatever they need to do. If they have access to already reified derivations, they
can do that.

So how can we accomplish this? Isolate the evaluation to its own dedicated "discovery" phase, and
share the resulting `/nix/store` and a JSON list describing each task and its target derivations.
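
As a rough illustration of that hand-off, the sketch below groups a discovery-style JSON list into
one matrix per action type. The field names (`action`, `name`, `drv`), the store paths and the
sample data are all assumptions made for this example; they are not the actual schema emitted by
Standard Action.

```python
import json
from collections import defaultdict

# Hypothetical discovery output: one entry per task. The keys "action",
# "name" and "drv" are illustrative only, not std-action's real schema.
DISCOVERY_JSON = """
[
  {"action": "build", "name": "hello", "drv": "/nix/store/aaa-hello.drv"},
  {"action": "build", "name": "world", "drv": "/nix/store/bbb-world.drv"},
  {"action": "publish", "name": "image", "drv": "/nix/store/ccc-image.drv"}
]
"""


def matrices_from_discovery(raw: str) -> dict[str, list[dict]]:
    """Group discovered tasks by action so each group can feed one job matrix."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for task in json.loads(raw):
        grouped[task["action"]].append({"name": task["name"], "drv": task["drv"]})
    return dict(grouped)


matrices = matrices_from_discovery(DISCOVERY_JSON)
```

Each runner in a matrix then only needs the derivation path it was handed; it never has to
evaluate any Nix code itself.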

From there it's just a matter of optimizing the details based on your use case, and to that end we
have a few optional inputs for things like caching and remote building, if you are so inclined.

But you can do everything straight on the runner too, if you just need the basics.

## How it Works

Talking is fine, but code is better. To that end, feel free to take a look at my own personal CI
for my NixOS system and related packages: [nrdxp/nrdos/ci.yml][nrdos].

What is actually evaluated during the discovery phase is determined directly in the
[flake.nix][ci-api].

I am not doing anything fancy here at the moment, just some basic package builds, but that is
enough to illustrate what's happening. You can get a quick visual by looking at the summary of
a given run: [nrdxp/nrdos#3644114900](https://github.com/nrdxp/nrdos/actions/runs/3644114900).

You could have any number of matrices here: one for publishing OCI images, one for publishing
documentation, one for running deployments against a target environment, etc.

Notice in this particular example that CI exited in 2 minutes. That's because everything
represented by these builds is already present in the cache specified by the action input `cache`,
so no work is required; we simply report that the artifacts already exist and exit quickly.
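
To make the "nothing to do" case concrete, here is a small sketch that splits a task's targets into
those already present in a cache and those that still need building. The function name, the set
standing in for the cache and the store paths are hypothetical; the real action queries an actual
Nix cache rather than an in-memory set.

```python
def plan_builds(targets: list[str], cached: set[str]) -> tuple[list[str], list[str]]:
    """Split targets into (already cached, still to build), preserving order."""
    hits = [t for t in targets if t in cached]
    misses = [t for t in targets if t not in cached]
    return hits, misses


# Illustrative store paths only.
targets = ["/nix/store/aaa-hello", "/nix/store/bbb-world"]
cache_contents = {"/nix/store/aaa-hello", "/nix/store/bbb-world"}

hits, misses = plan_builds(targets, cache_contents)
# When `misses` is empty the job can report success and exit immediately.
```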

This is partially enabled by use of the GH action cache. The cache key is set using the following
format: [divnix/std-action/discover/action.yml#key][key]. Coupled with the guarantees Nix already
gives us, this is enough to ensure the evaluation will only be reused on runners with a matching OS,
a matching architecture and the exact revision of the current run.
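
The idea behind such a key can be sketched as follows. The exact format lives in the linked
`action.yml`; the `discovery-` prefix, the field order and the sample values here are assumptions
for illustration only.

```python
def discovery_cache_key(runner_os: str, runner_arch: str, revision: str) -> str:
    """Build a key that only matches for the same OS, architecture and revision.

    The "discovery-" prefix and the ordering are hypothetical, not the
    format used by std-action itself.
    """
    return f"discovery-{runner_os}-{runner_arch}-{revision}"


key_a = discovery_cache_key("Linux", "X64", "abc123")
key_b = discovery_cache_key("Linux", "X64", "def456")
```

Because the revision is part of the key, a runner either gets the evaluation for exactly the commit
it is working on or gets no hit at all; a stale result from another commit can never be picked up.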

This is critical for runners to ensure they get an exact cache hit on start; that way they pick
up where the discovery job left off and begin their build work immediately, acting directly
on their target derivation file instead of doing any more evaluation.

## Caching & Remote Builds

Caching is also a first-class citizen. Even in the event that a given task fails (even
discovery itself), any of its Nix dependencies built during the process leading up to that failure
will be cached, making sure no Nix build _or_ evaluation is ever repeated. The user doesn't have
to set a cache, but if they do, they can rest assured their results will be well cached. We
make a point to cache the entire build-time closure, and not just the runtime closure, which is
important for active development in projects using a shared cache.
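
The difference between the two closures can be shown with a toy dependency graph. The package names
and edges below are made up for the example, and real closures are computed by Nix over store
paths, not by code like this.

```python
def closure(root: str, edges: dict[str, set[str]]) -> set[str]:
    """Return every node reachable from `root`, including `root` itself."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


# Hypothetical graphs: what the built output references at runtime, versus
# everything needed to produce it (compiler, sources, and so on).
RUNTIME_REFS = {"app": {"libc"}, "libc": set()}
BUILD_INPUTS = {
    "app": {"libc", "gcc", "src"},
    "gcc": {"libc"},
    "libc": set(),
    "src": set(),
}

runtime_closure = closure("app", RUNTIME_REFS)
build_closure = closure("app", BUILD_INPUTS)
build_only = build_closure - runtime_closure
```

In this toy model `build_only` holds the compiler and the sources. Caching those as well means a
colleague who changes `app` does not have to rebuild or refetch its build tools first.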

The builds themselves can also be handed off to a more powerful dedicated remote builder. The
action handles remote builds using the newer and more efficient remote store build API, and when
coupled with a special-purpose service such as [nixbuild.net](https://nixbuild.net), which your
author is already doing, it becomes incredibly powerful.

To get started, you can run all your builds directly on the action runner, and if that becomes
a burden, there is a solid path available if and when you need to split out your build phase to a
dedicated build farm.

## Import from What?

This next part is a bit of an aside, so feel free to skip it, but the process outlined above just so
happened to solve an otherwise expensive problem for us at work, illustrating how thinking through
these problems carefully has helped us improve our process.

IOG in general is a bit unique in the Nix community as one of the few heavy users of Nix's IFD
feature via our [haskell.nix][haskell] project. For those unaware, IFD stands for
"import from derivation" and happens any time the contents of some file from one derivation's output
path are read into another during evaluation, say to read a lock file and generate fetch actions.

This gives us great power, but comes at a cost, since the evaluator has to stop and build the
referenced path if it does not already exist in order to be able to read from it.

For this reason, this feature is banned from inclusion in nixpkgs, and so the tooling used there
(Hydra, _et al._) is not necessarily a good fit for projects that do make use of IFD to some extent.

So what can be done? Many folks would love to improve the performance of the evaluator itself, your
author included. The current Nix evaluator is single-threaded, so there is plenty of room for
splitting this burden across threads, and especially in the case of IFD, it could theoretically
speed things up a great deal.

However, improving the evaluator performance itself is actually a bit of a red herring as far as
we are concerned here. What we really want to ensure is that we never pay the cost of any given Nix
workload more than once, no matter how long it takes. Then we can ensure we are only ever
building on what has already been done; an additive process if you will. Without careful
consideration of this principle beforehand, even a well-optimized evaluator would be wasting cycles
doing the same evals over and over. There is the Nix flake evaluation cache, but it comes with
a few [caveats][4279] of its own and so doesn't currently solve our problem either.

To give you some numbers, a fresh eval of my current project at work takes 35 minutes from a
clean `/nix/store`, but with a populated `/nix/store` from a previous run it takes only 2.5 minutes.
Some of the savings are eaten up by data transfer and compression, but the net savings are still
massive.
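
To put those numbers side by side, here is a tiny worked calculation. The 35 and 2.5 minute figures
come from the paragraph above; the transfer overhead passed to `net_savings` is a made-up parameter,
since the actual transfer and compression time is not stated here.

```python
FRESH_EVAL_MIN = 35.0  # eval from a clean /nix/store (figure from the text)
WARM_EVAL_MIN = 2.5    # eval with a populated /nix/store (figure from the text)


def net_savings(transfer_min: float) -> float:
    """Minutes saved per run once a hypothetical transfer overhead is paid."""
    return FRESH_EVAL_MIN - (WARM_EVAL_MIN + transfer_min)


speedup = FRESH_EVAL_MIN / WARM_EVAL_MIN  # ratio before any transfer overhead
best_case = net_savings(0.0)              # no transfer cost at all
with_overhead = net_savings(10.0)         # assuming 10 minutes of transfer
```

Under these assumptions, sharing the store stays worthwhile for any transfer overhead below
32.5 minutes per run.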

I have already begun brainstorming ways we could eliminate that transfer cost entirely by introducing
an optional, dedicated [evaluation store](https://github.com/divnix/std-action/issues/10) for those
who would benefit from it. With that, there would be no transfer cost at all during discovery, and
the individual task runners would only have to pull the derivations for their particular task,
instead of the entire `/nix/store` produced by discovery, saving a ton of time in our case.

Either way, this is a special-case optimization, and for those who are content to stick with the
default of using the action cache to share evaluation results, it should more than suffice in the
majority of cases.

## Wrap Up

So essentially, we make do with what we have in terms of eval performance and focus on ensuring we
never do the same work twice. If breakthroughs are made in the Nix evaluator upstream at some
point in the future, great, but we don't have to wait around for them; we can minimize our burden
right now by thinking smart. After all, we are not doing Nix evaluations just for the sake of it,
but to get meaningful work done, and doing new and interesting work is always better than repeating
old tasks because we failed to strategize correctly.

If we do ever need to migrate to a more complex CI system, these principles themselves are all
encapsulated in a few fairly minimal shell scripts and could probably be ported to other
systems without incredible effort. Feel free to take a look at the source to see what's really
going on: [divnix/std-action](https://github.com/divnix/std-action).

There are some places where we could use some [help][7437] from [upstream][3946], but even then, the
process is efficient enough to be a massive improvement, both for my own personal setup, and for
work.

As I mentioned in the previous post though, Standard isn't just about convenience or performance;
arguably its most important aspect is to assist us in being _thorough_. Ensuring all
our tasks are run, all our artifacts are cached and all our images are published is no small feat
without something like Standard to help us automate away the tedium, and thank goodness for that.

For comments or questions, please feel free to drop by the official Standard [Matrix Room][matrix],
where you can also track progress as it comes in. Until next time...

[action]: https://github.com/divnix/std-action
[haskell]: https://github.com/input-output-hk/haskell.nix
[nrdos]: https://github.com/nrdxp/nrdos/blob/master/.github/workflows/ci.yml
[key]: https://github.com/divnix/std-action/blob/6ed23356cab30bd5c1d957d45404c2accb70e4bd/discover/action.yml#L37
[7437]: https://github.com/NixOS/nix/issues/7437
[3946]: https://github.com/NixOS/nix/issues/3946#issuecomment-1344612074
[4279]: https://github.com/NixOS/nix/issues/4279#issuecomment-1343723345
[matrix]: https://matrix.to/#/#std-nix:matrix.org
[ci-api]: https://github.com/nrdxp/nrdos/blob/66149ed7fdb4d4d282cfe798c138cb1745bef008/flake.nix#L66-L68