Cloud + CI/CD lifecycle
SARC deploys the same stack to AWS, Azure, GCP, and local k3d from one repo via TARGET_CLOUD. Each cloud, and each CI/CD pipeline that feeds it, can be turned off and back on without losing the ability to rebuild. There are three levels of “off”, from cheapest-to-resume to fully removed.
The full operator guide lives in the repo at docs/CLOUD-LIFECYCLE.md. This page is the summary.
Three levels of off
| Level | What it does | Billing | Reversibility |
|---|---|---|---|
| Pause | Scale nodes to 0 (EKS/GKE, control plane stays) or az aks stop (AKS, control plane and nodes deallocated). | Control plane only (EKS/GKE) | Fast: just cluster-start-<cloud>, 3-5 min to Ready |
| Teardown | terraform destroy the whole infra/<cloud> stack. | Zero | Rebuild via just bootstrap-<cloud>, ~30 min |
| Disable-not-delete | For out-of-band artifacts (e.g. a GitLab-integration service account + token): gcloud ... disable. | Negligible | gcloud ... enable |
Pick the lowest level that meets the goal. Pause is the default for routine weekend/overnight cost saving; the per-cloud stop/start recipes exist so you do not have to tear down and rebuild.
Pause / resume a cluster
    just clusters-stop-all      # stop AWS + Azure + GCP + k3d (fail-tolerant)
    just clusters-start-all     # start every cluster before a demo
    just clusters-status        # power state of all four clusters

    just cluster-stop-<cloud>   # aws | azure | gcp | k3d
    just cluster-start-<cloud>

- AWS EKS: nodes scaled to 0; control plane still billed.
- Azure AKS: az aks stop deallocates control plane + nodes (cheapest pause).
- GCP GKE: node pools to 0; regional control plane still billed.
- k3d: local, no cloud cost.
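The per-cloud behaviour listed above could be sketched as a dry-run helper that prints the provider command a stop recipe might wrap. This is an assumption, not the repo's actual justfile: the source does not show what the recipes run, and the cluster, node group, node pool, resource group, and region names are hypothetical placeholders read from environment variables.

```shell
# Dry-run sketch: print (never run) the provider command that pausing each
# cloud might wrap. Every name below is a hypothetical placeholder.
CLUSTER="${CLUSTER:-sarc-demo}"
RESOURCE_GROUP="${RESOURCE_GROUP:-sarc-demo-rg}"
NODEGROUP="${NODEGROUP:-default}"
NODE_POOL="${NODE_POOL:-default-pool}"
REGION="${REGION:-us-central1}"
MAX_NODES="${MAX_NODES:-3}"

pause_cmd() {
  case "$1" in
    aws)
      # EKS: scale the managed node group to 0 (maxSize must stay >= 1).
      echo "aws eks update-nodegroup-config --cluster-name $CLUSTER --nodegroup-name $NODEGROUP --scaling-config minSize=0,maxSize=$MAX_NODES,desiredSize=0"
      ;;
    azure)
      # AKS: stop control plane and nodes together.
      echo "az aks stop --name $CLUSTER --resource-group $RESOURCE_GROUP"
      ;;
    gcp)
      # GKE: resize one node pool to 0; the control plane keeps running.
      echo "gcloud container clusters resize $CLUSTER --node-pool $NODE_POOL --num-nodes 0 --region $REGION --quiet"
      ;;
    k3d)
      # k3d: stop the local cluster containers.
      echo "k3d cluster stop $CLUSTER"
      ;;
    *)
      echo "unknown cloud: $1 (expected aws | azure | gcp | k3d)" >&2
      return 1
      ;;
  esac
}
```

Calling pause_cmd azure prints the az aks stop line for review. Printing instead of executing keeps the sketch safe to run without cloud credentials.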
Teardown / rebuild
    cd infra/<cloud>
    terraform destroy        # frees the LoadBalancer first, then cluster, network, IAM, secrets
    just bootstrap-<cloud>   # rebuild from clean state, ~30 min

A Kubernetes Service of type LoadBalancer (ingress-nginx) creates a cloud load balancer out of band. In SARC the ingress-nginx release is Terraform-managed, so a normal terraform destroy uninstalls it first and frees the LB in the correct order before the network is deleted.

OpenShift (ROSA HCP) is teardown-only: it has no stop/start lifecycle.
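The teardown and rebuild commands above can be wrapped in a small guard so a destroy is previewed before it runs. The following is a minimal sketch under stated assumptions: the function names run, valid_cloud, teardown, and rebuild, the DRY_RUN variable, and the aws | azure | gcp set are all hypothetical. It uses Terraform's -chdir global option in place of cd infra/<cloud>.

```shell
# run: print the command in dry-run mode (the default here), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# valid_cloud: accept only clouds assumed to have an infra/<cloud> stack.
valid_cloud() {
  case "$1" in
    aws|azure|gcp) return 0 ;;
    *) echo "expected aws | azure | gcp, got '$1'" >&2; return 2 ;;
  esac
}

# teardown: destroy the whole stack for one cloud.
teardown() {
  valid_cloud "$1" || return 2
  run terraform -chdir="infra/$1" destroy
}

# rebuild: bootstrap the cloud again from clean state.
rebuild() {
  valid_cloud "$1" || return 2
  run just "bootstrap-$1"
}
```

With the default DRY_RUN=1, teardown gcp only prints the command. Setting DRY_RUN=0 would execute it, so the destructive path needs an explicit opt-in.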
Disable / enable CI/CD per platform and env
Nothing in the cloud-deploy path fires automatically (all cloud deploys are manual/dispatch), so disabling CI/CD is mostly about the few scheduled or push-triggered jobs.
- GitLab: the GitHub + ADO mirror sync runs 2x/day from mirror-sync.yml; the kill switch is the project CI/CD variable DISABLE_MIRRORS=true. The Azure + GCP terraform templates are pinned to when: never; AWS terraform apply is when: manual.
- GitHub Actions: cloud deploys use workflow_dispatch (only run when invoked). Disable a workflow with gh workflow disable <wf.yml>; disable a specific cloud/env by protecting or removing its <cloud>-karc-<env> GitHub Environment or its OIDC secret.
- Azure DevOps: pipelines toggle Disabled in the UI; environments (sarc-azure-*) are operator-bound in the ADO Library.
- ArgoCD: production auto-sync is disabled by policy. Pause an app with argocd app set <app> --sync-policy none; re-enable by restoring the policy.
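To illustrate the GitLab kill switch, here is a minimal sketch of how a scheduled job script could honor DISABLE_MIRRORS. How mirror-sync.yml actually checks the variable is not shown in the source (it may use pipeline rules instead), so the functions mirrors_enabled and mirror_sync are hypothetical.

```shell
# mirrors_enabled: succeed unless the kill switch is set to exactly "true".
mirrors_enabled() {
  [ "${DISABLE_MIRRORS:-false}" != "true" ]
}

# mirror_sync: skip cleanly (exit status 0) when the kill switch is on,
# so a disabled sync does not show up as a failed scheduled job.
mirror_sync() {
  if ! mirrors_enabled; then
    echo "mirror sync skipped: DISABLE_MIRRORS=true"
    return 0
  fi
  # Placeholder for the real push to the GitHub and ADO mirrors.
  echo "mirror sync would run here"
}
```

Treating an unset variable as "enabled" means removing the variable restores the default behaviour, which matches the "unset DISABLE_MIRRORS" step used to bring a cloud back.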
Park a whole cloud, recoverably
- terraform destroy the cloud (or just cluster-stop-<cloud> if returning soon).
- Disable-not-delete any out-of-band integration service account + secret.
- GitLab: set DISABLE_MIRRORS=true if replicas should not refresh.
- GitHub: gh workflow disable the cloud’s deploy workflows, or protect its Environments.
- ArgoCD: set the cloud’s apps to --sync-policy none.
Reverse each step to bring it back: just bootstrap-<cloud>, re-enable the SA + secret, unset DISABLE_MIRRORS, gh workflow enable, restore ArgoCD sync.
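The park and restore sequences can be written down as a printable checklist, which makes the pairing between each step and its undo explicit. This is a sketch, not a script from the repo: park_steps and unpark_steps are hypothetical names, the workflow and app arguments are placeholders, and steps whose exact command the source leaves elided are emitted as comments rather than guessed.

```shell
# park_steps: print the ordered checklist for parking one cloud.
# Args: <cloud> <deploy-workflow-file> <argocd-app> (all placeholders).
park_steps() {
  cloud="$1"; workflow="$2"; app="$3"
  printf '%s\n' \
    "terraform -chdir=infra/$cloud destroy" \
    "# disable (do not delete) the out-of-band service account + secret" \
    "# set the GitLab CI/CD variable DISABLE_MIRRORS=true" \
    "gh workflow disable $workflow" \
    "argocd app set $app --sync-policy none"
}

# unpark_steps: print the matching undo for each parking step, in the
# order the restore sentence above lists them.
unpark_steps() {
  cloud="$1"; workflow="$2"; app="$3"
  printf '%s\n' \
    "just bootstrap-$cloud" \
    "# re-enable the service account + secret" \
    "# unset the GitLab CI/CD variable DISABLE_MIRRORS" \
    "gh workflow enable $workflow" \
    "# restore the previous sync policy on $app"
}
```

For example, park_steps gcp deploy-gcp.yml sarc-gcp prints five lines to review or paste, where deploy-gcp.yml and sarc-gcp are made-up names. The ArgoCD restore stays a comment because the policy to restore depends on what the app had before it was paused.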