productize CI/CD on TFGrid #28
todo
VM's needed
Requirements
optional
questions
remark
ideas
Should we consider zeroCI? It was working natively on the grid with a nice interface.
Of course we should make it part of the package. People can select which one they deploy. Let's do a demo again.
@despiegk I think this is amazing. One question concerning this:
"create manual specific for this usecase"
Do you want a whole new mdbook, or do we simply add a section to the TF Manual? (info_grid)
For now, we could quickly add a new section to the manual on zeroCI (with the demo). What do you think?
The usual way of doing this is using Docker's `buildx` to handle QEMU and do the cross compiling inside a container. This should just work inside our VMs, with no need to emulate an entire VM.
For GitHub Actions, the runners (VMs) only need to be able to connect to GitHub via an outbound HTTPS connection (see here). Gitea Actions probably works the same way.
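
To make the `buildx` point concrete, here is a minimal sketch of a helper that assembles a multi-platform build command. The function name, image tag and platform list are hypothetical; only the `docker buildx build` flags `--platform`, `--tag` and `--push` are taken from the Docker CLI. It assumes QEMU emulation is already set up in the VM, and it only builds the command rather than running it.

```python
import shlex


def buildx_command(image_tag, platforms, context=".", push=False):
    """Return the argv for a multi-platform `docker buildx build`.

    `image_tag`, `platforms` and `context` are illustrative inputs,
    not values taken from this issue.
    """
    if not platforms:
        raise ValueError("at least one target platform is required")
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # e.g. linux/amd64,linux/arm64
        "--tag", image_tag,
    ]
    if push:
        # Multi-platform images are usually pushed straight to a registry.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    argv = buildx_command("example/app:latest", ["linux/amd64", "linux/arm64"])
    # Print the command for inspection; pass `argv` to subprocess.run to execute it.
    print(shlex.join(argv))
```
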
It's worth noting though that Github does allocate IP addresses to their runner VMs, and optionally static ones. I guess these are there for management of the runners as needed.
The obvious place we can compete is on price. Github's runners are marked up significantly as a convenience product.
Here's a post that not only covers price but a couple other potential benefits and does some comparison against hosting runners on AWS: https://www.linkedin.com/pulse/how-we-saved-15k-month-github-actions-part-1-trunkio
The figures given in the post for AWS machines are for spot pricing, not on demand, but the difference is still significant (price per minute for the base level machine of 2vcpu and 8gb ram):
I didn't find any services yet specifically offering just hosted Actions runners for price comparison. This might be part of the offering of some CI/CD services though.
I did some research and testing of deploying Github and Gitea actions runners in Grid VMs.
For Github, dealing with runner tokens is somewhat complicated and leads to some limitations I describe toward the end.
The situation for Gitea seems better however. As of a recent change, runner registration tokens are now reusable on Gitea until revoked.
Here are my summarized findings about how runners work (some aspects are Github specific, but most probably apply to Gitea as well):
- A `runner` is a process that can execute a single workflow job at a time. Multiple runners can exist on a single machine, sharing all available resources of the machine
- … `runs-on` field of the job. All self hosted runners are labeled `self-hosted` and other labels can be added (runners must match all labels of a job)
Some Gitea specific notes
There are currently no API endpoints or webhooks for managing actions or runners in Gitea.
Some work is ongoing though:
https://github.com/go-gitea/gitea/issues/23796
https://github.com/go-gitea/gitea/issues/25572
https://github.com/go-gitea/gitea/issues/26370
There also seems to be no option for one time use or ephemeral runners. So scaling down runners while being sure not to interrupt any ongoing jobs could be a challenge on Gitea.
Given all that, here's a possible approach:
- … `small`, `medium`, `large`, etc)
- … `4cpu8gbram50gbssd` for example, where the user can adjust the quantities freely. This is nice because no additional config needs to be passed to the manager up front; it can respond dynamically to whatever capacity requests the user adds to their jobs)
Caveats (Github only)
This means that repo specific runners need a level of access that we probably shouldn't be asking users for. Org level on the other hand is doable. This isn't a huge limitation, but overall it requires additional config from the user and additional code in the manager app to refresh the runner registration tokens every hour.
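
As a rough illustration of the hourly refresh, the manager could check the token's expiry before each registration. This is only a sketch: the helper names and the five minute safety margin are assumptions. The endpoint path is GitHub's REST route for org level registration tokens (requested with POST), and its response carries `token` and `expires_at` fields.

```python
from datetime import datetime, timedelta, timezone

GITHUB_API = "https://api.github.com"


def registration_token_url(org):
    """URL to POST to for a fresh org level runner registration token."""
    return f"{GITHUB_API}/orgs/{org}/actions/runners/registration-token"


def needs_refresh(expires_at, now, margin=timedelta(minutes=5)):
    """Return True if the token expires within `margin` of `now`.

    `expires_at` is the ISO 8601 string from the API response;
    `now` must be a timezone aware datetime. The margin is a guess.
    """
    # Older Python versions do not parse a trailing "Z", so normalise it.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return now >= expiry - margin


if __name__ == "__main__":
    print(registration_token_url("example-org"))  # hypothetical org name
    print(needs_refresh("2024-01-01T12:00:00Z", datetime.now(timezone.utc)))
```
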
See also this discussion.
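
The capacity label idea from the approach above can be sketched as follows. This is a hedged illustration, not the manager's actual design: the function names and the exact label grammar (`<n>cpu<n>gbram<n>gbssd`) are assumptions inferred from the single `4cpu8gbram50gbssd` example. It also shows the rule that a runner must carry every label a job asks for.

```python
import re

# Assumed grammar, inferred from the example label "4cpu8gbram50gbssd".
CAPACITY_LABEL = re.compile(
    r"^(?P<cpu>\d+)cpu(?P<ram_gb>\d+)gbram(?P<ssd_gb>\d+)gbssd$"
)


def parse_capacity_label(label):
    """Return {'cpu', 'ram_gb', 'ssd_gb'} as ints, or None if not a capacity label."""
    match = CAPACITY_LABEL.match(label)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def capacity_request(job_labels):
    """Return the first capacity request found among a job's labels, if any."""
    for label in job_labels:
        capacity = parse_capacity_label(label)
        if capacity is not None:
            return capacity
    return None


def runner_matches(job_labels, runner_labels):
    """A runner is eligible only if it has all of the job's labels."""
    return set(job_labels) <= set(runner_labels)


if __name__ == "__main__":
    job = ["self-hosted", "4cpu8gbram50gbssd"]
    print(capacity_request(job))
    print(runner_matches(job, ["self-hosted", "linux", "4cpu8gbram50gbssd"]))
```

With this shape, the manager needs no size table up front: it reads the request off the job's labels and deploys a VM to match.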
Here's a project offering Github runners. It is self hostable under the GNU Affero license:
https://www.ubicloud.com/use-cases/github-actions
Their managed service starts at $0.0008 per minute, so 10x less than Github but still more than the Grid.
Source code repo too:
https://github.com/ubicloud/ubicloud
Maybe we can learn something by studying it.
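
For a feel of what the per minute gap means over a month, here is a small worked calculation. The Ubicloud rate is the one quoted above; the Github rate of $0.008 is only inferred from the "10x less" remark, and the 10,000 minutes per month is a made up workload. No Grid price is included because none is stated here.

```python
from decimal import Decimal

# Per minute prices in USD for the base runner size.
PRICE_PER_MINUTE = {
    "github": Decimal("0.008"),    # inferred from "10x less than Github"
    "ubicloud": Decimal("0.0008"), # quoted starting price
}


def monthly_cost(provider, minutes):
    """Cost in USD of running `minutes` of jobs on `provider`."""
    return PRICE_PER_MINUTE[provider] * minutes


def savings(minutes, baseline="github", alternative="ubicloud"):
    """USD saved by moving `minutes` of jobs from baseline to alternative."""
    return monthly_cost(baseline, minutes) - monthly_cost(alternative, minutes)


if __name__ == "__main__":
    minutes = 10_000  # hypothetical monthly usage
    for provider in PRICE_PER_MINUTE:
        print(provider, monthly_cost(provider, minutes))
    print("savings", savings(minutes))
```
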
Here are a few others:
https://depot.dev/
https://buildjet.com/
https://www.warpbuild.com/
Here's another open source solution, for autoscaling runners on AWS: https://github.com/philips-labs/terraform-aws-github-runner
Some other notes and thoughts from my research:
So I think we need to consider our positioning (Zos may be secure, but any residual data on farmers' disks is not necessarily) and whether we can reach a sufficient market to make this worthwhile.
I think we can increase the appeal of Grid hosted runners by providing two optional features to protect user data:
These approaches both have the advantage that any sensitive data that might be processed during runner jobs is inaccessible after the node is powered down. That means there's no chance of anyone harvesting user data from the node disks.
For encryption there are a number of options, with these looking most promising:
- `gocryptfs` - this is a user space option that can be deployed on top of any filesystem, meaning it works in our micro VMs out of the box. Seems performance is good (if their own benchmarks are to be believed)
- `fscrypt` - this is using the native encryption capabilities of `ext4`, for example. Won't work in a micro VM because the kernel we supply doesn't have proper support
- `dm-crypt` - perhaps the best performance, but works on a full block device so is a bit less flexible (I didn't test it yet)
As for RAM disks, there are also a few options. There's a comparison here. The `brd` kernel module seems not an option again in our micro VMs, but `tmpfs` should be fine I think (assuming no swap).
The remaining challenge would be to ensure that each type of runner is only placing user data into the encrypted store, while ideally avoiding use of the encrypted area for non sensitive data.
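
The "assuming no swap" condition for `tmpfs` can be checked from inside the VM. Below is a minimal sketch, with hypothetical function names, that works on the text of `/proc/swaps` and `/proc/mounts` (standard Linux files) so it can be tried without touching a real system. It only confirms that a directory sits on `tmpfs` and that no swap device is active; it says nothing about other ways data could leak.

```python
def active_swap_devices(proc_swaps_text):
    """Return the swap devices listed in /proc/swaps (first line is a header)."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    return [line.split()[0] for line in lines[1:]]


def filesystem_type(proc_mounts_text, path):
    """Return the filesystem type of the mount that contains `path`.

    Picks the longest matching mount point; later entries win ties,
    since they are mounted on top. Mount points with spaces (escaped
    as \\040 in /proc/mounts) are not handled in this sketch.
    """
    best = None  # (mount_point, fstype)
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best[1] if best else None


def is_volatile_store(proc_mounts_text, proc_swaps_text, path):
    """True if `path` is on tmpfs and nothing can be swapped out to disk."""
    on_tmpfs = filesystem_type(proc_mounts_text, path) == "tmpfs"
    return on_tmpfs and not active_swap_devices(proc_swaps_text)


if __name__ == "__main__":
    with open("/proc/mounts") as mounts, open("/proc/swaps") as swaps:
        # "/dev/shm" is just an example path that is commonly tmpfs.
        print(is_volatile_store(mounts.read(), swaps.read(), "/dev/shm"))
```
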
Investigating the possibility of using encryption for runner VMs, the following points become apparent:
- … `ext4`, `btrfs`, `zfs`, or `xfs` (there are some tradeoffs among these, but in general they are all better than the alternatives, which are the universally compatible `fuse-overlayfs` and `vfs` drivers)
- … `dm-crypt`) or native filesystem encryption (`fscrypt`) are thus essential to providing good Docker performance with encryption
With this in mind, I already have an example full VM image, built entirely in Docker, that can support kernel based encryption schemes. It's also configured with a solution to write all changes to the root filesystem into an encrypted overlay.
This solution raises a few more points:
- … `GITHUB_WORKSPACE` by default
we will implement this, actually we are working on it for our own purposes, it will be using hero with indeed actrunner, gitea, ...
we will focus on gitea first
being done differently
Good to know Thabet. Please write the new url/issue/story to track this if possible. Thanks.