Cloud + CI/CD lifecycle

SARC deploys the same stack to AWS, Azure, GCP, and local k3d from one repo via TARGET_CLOUD. Each cloud, and each CI/CD pipeline that feeds it, can be turned off and back on without losing the ability to rebuild. There are three levels of “off”, from cheapest-to-resume to fully removed.

The full operator guide lives in the repo at docs/CLOUD-LIFECYCLE.md. This page is the summary.

| Level | What it does | Billing | Reversibility |
| --- | --- | --- | --- |
| Pause | Scale nodes to 0 (EKS/GKE) or az aks stop (AKS). Control plane stays. | Control plane only | Instant: just cluster-start-<cloud>, 3-5 min to Ready |
| Teardown | terraform destroy the whole infra/<cloud> stack. | Zero | Rebuild via just bootstrap-<cloud>, ~30 min |
| Disable-not-delete | For out-of-band artifacts (e.g. a GitLab-integration service account + token): gcloud ... disable. | Negligible | gcloud ... enable |

Pick the lowest level that meets the goal. Pause is the default for routine weekend/overnight cost saving; the per-cloud stop/start recipes exist so you do not have to tear down and rebuild.

just clusters-stop-all      # stop AWS + Azure + GCP + k3d (fail-tolerant)
just clusters-start-all     # start every cluster before a demo
just clusters-status        # power state of all four clusters
just cluster-stop-<cloud>   # <cloud>: aws | azure | gcp | k3d
just cluster-start-<cloud>

  • AWS EKS: nodes scaled to 0; control plane still billed.
  • Azure AKS: az aks stop deallocates control plane + nodes (cheapest pause).
  • GCP GKE: node pools to 0; regional control plane still billed.
  • k3d: local, no cloud cost.
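
The "fail-tolerant" behaviour of clusters-stop-all can be pictured as a loop that keeps going when one cloud fails, for example because that cloud is already torn down. The following is a minimal sketch, not the recipe from the repo: the function name stop_all_clusters and the JUST_BIN override are hypothetical and exist only so the loop can be dry-run.

```shell
# Hypothetical sketch of a fail-tolerant "stop everything" loop.
# JUST_BIN is an assumed override hook; set it to `echo` for a dry run.
JUST_BIN="${JUST_BIN:-just}"

stop_all_clusters() {
  failed=""
  for cloud in aws azure gcp k3d; do
    # A failure on one cloud must not prevent the others from stopping.
    if ! "$JUST_BIN" "cluster-stop-${cloud}"; then
      failed="${failed} ${cloud}"
    fi
  done
  if [ -n "$failed" ]; then
    # Report every cloud that failed once the loop has finished.
    echo "stop failed for:${failed}" >&2
    return 1
  fi
  return 0
}

# Dry run (prints the recipe names instead of invoking just):
#   JUST_BIN=echo; stop_all_clusters
```

Collecting failures and reporting them at the end means a single unreachable cloud does not leave the remaining clusters running and billing.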

To tear a cloud down completely and rebuild it later:
cd infra/<cloud>
terraform destroy # frees the LoadBalancer first, then cluster, network, IAM, secrets
just bootstrap-<cloud> # rebuild from clean state, ~30 min

A Kubernetes Service of type LoadBalancer (here, ingress-nginx) creates a cloud load balancer out of band, and a load balancer that is still present can block deletion of the network underneath it. In SARC the ingress-nginx release is Terraform-managed, so a normal terraform destroy uninstalls the release first and frees the LB before the network is deleted.

OpenShift (ROSA HCP) is teardown-only: it has no stop/start lifecycle.
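
Before a teardown it can be useful to see which Services would own a cloud load balancer. The sketch below is an optional pre-check, not part of the SARC tooling: the function name list_loadbalancer_services is hypothetical, and it assumes the default column order of kubectl get svc -A --no-headers (namespace, name, type, ...).

```shell
# Reads `kubectl get svc -A --no-headers` output on stdin and prints
# namespace/name for every Service whose TYPE column is LoadBalancer.
list_loadbalancer_services() {
  awk '$3 == "LoadBalancer" { print $1 "/" $2 }'
}

# Usage against a live cluster (needs kubectl and a current context):
#   kubectl get svc -A --no-headers | list_loadbalancer_services
```

Any entry that is not installed by a Terraform-managed release would be a candidate for manual cleanup before terraform destroy.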

Disable / enable CI/CD per platform and env

Nothing in the cloud-deploy path fires automatically — all cloud deploys are manual/dispatch — so disabling CI/CD is mostly about the few scheduled or push-triggered jobs.

  • GitLab: the GitHub + ADO mirror sync runs twice a day from mirror-sync.yml; the kill switch is the project CI/CD variable DISABLE_MIRRORS=true. The Azure and GCP Terraform templates are pinned to when: never; AWS terraform apply is when: manual.
  • GitHub Actions: cloud deploys use workflow_dispatch, so they run only when invoked. Disable a workflow with gh workflow disable <wf.yml>; disable a specific cloud/env by protecting or removing its <cloud>-karc-<env> GitHub Environment or its OIDC secret.
  • Azure DevOps: set a pipeline to Disabled in the UI; environments (sarc-azure-*) are operator-bound in the ADO Library.
  • ArgoCD: production auto-sync is disabled by policy. Pause an app with argocd app set <app> --sync-policy none; re-enable by restoring the previous sync policy.
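
How mirror-sync.yml consumes DISABLE_MIRRORS is not shown on this page. One plausible shape, offered only as an assumption, is a guard at the top of the job script that exits early when the variable is set. The function names mirrors_enabled and run_mirror_sync are hypothetical.

```shell
# Assumed kill-switch guard: true unless DISABLE_MIRRORS is exactly "true".
mirrors_enabled() {
  [ "${DISABLE_MIRRORS:-}" != "true" ]
}

# Placeholder job body: skips cleanly when the kill switch is set.
run_mirror_sync() {
  if ! mirrors_enabled; then
    echo "DISABLE_MIRRORS=true, skipping mirror sync"
    return 0
  fi
  # The real mirror push to GitHub and ADO would go here.
  echo "mirror sync would run here"
}
```

Returning success on skip keeps the scheduled pipeline green while the mirrors are paused, so a disabled mirror does not read as a failure.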

To take a cloud and its pipelines fully offline, work through these steps in order:

  1. terraform destroy the cloud (or just cluster-stop-<cloud> if returning soon).
  2. Disable-not-delete any out-of-band integration service account + secret.
  3. GitLab: set DISABLE_MIRRORS=true if replicas should not refresh.
  4. GitHub: gh workflow disable the cloud’s deploy workflows, or protect its Environments.
  5. ArgoCD: set the cloud’s apps to --sync-policy none.

Reverse each step to bring it back: just bootstrap-<cloud>, re-enable the SA + secret, unset DISABLE_MIRRORS, gh workflow enable, restore ArgoCD sync.
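
The offline steps and their reversal can be written down as a plan printer that emits the commands without running them. This is a sketch only: print_lifecycle_plan is a hypothetical helper, the workflow file and ArgoCD app name are caller-supplied placeholders rather than names from the repo, and the steps that this page leaves elided (the gcloud disable/enable call, the GitLab variable change) are printed as manual reminders.

```shell
# Hypothetical helper: prints, without executing, the command sequence for
# taking one cloud offline or bringing it back online.
# Arguments: <offline|online> <cloud> <workflow-file> <argocd-app>
print_lifecycle_plan() {
  action="$1" cloud="$2" workflow="$3" app="$4"
  case "$action" in
    offline)
      echo "(cd infra/${cloud} && terraform destroy)"
      echo "# manual: disable the out-of-band service account + secret"
      echo "# manual: set DISABLE_MIRRORS=true in GitLab if replicas should not refresh"
      echo "gh workflow disable ${workflow}"
      echo "argocd app set ${app} --sync-policy none"
      ;;
    online)
      echo "just bootstrap-${cloud}"
      echo "# manual: re-enable the service account + secret"
      echo "# manual: unset DISABLE_MIRRORS in GitLab"
      echo "gh workflow enable ${workflow}"
      echo "# manual: restore the previous ArgoCD sync policy for ${app}"
      ;;
    *)
      echo "usage: print_lifecycle_plan <offline|online> <cloud> <workflow> <app>" >&2
      return 2
      ;;
  esac
}

# Example with placeholder names:
#   print_lifecycle_plan offline aws deploy-aws.yml sarc-aws
```

Printing the plan first gives the operator a chance to review the order before anything destructive runs.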