Multi-cloud architecture

A single environment variable, TARGET_CLOUD, drives every cloud-specific decision. Set it once; everything composes:

| TARGET_CLOUD | Terraform stack | kubectl auth | Helm overlay | Kosli env naming |
| --- | --- | --- | --- | --- |
| aws | infra/aws/ | aws eks update-kubeconfig | values-aws.yaml | aws-karc-<env> |
| azure | infra/azure/ | az aks get-credentials | values-azure.yaml | azure-karc-<env> |
| gcp | infra/gcp/ | gcloud container clusters get-credentials | values-gcp.yaml | gcp-karc-<env> |
| k3d | infra/k3d/ | k3d kubeconfig merge | values-k3d.yaml | k3d-karc-<env> |
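As an illustration of how one variable can drive every lookup, here is a minimal Python sketch of the table above. The `CloudTarget` class and the `resolve_target` and `kosli_env_name` helpers are hypothetical names invented for this example; only the table values come from this page, and the repository's real wiring may look different.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudTarget:
    """One row of the TARGET_CLOUD table (hypothetical container type)."""
    terraform_dir: str
    kubeconfig_cmd: str
    helm_overlay: str
    kosli_prefix: str


# Values copied from the table; the dict itself is illustrative.
TARGETS = {
    "aws": CloudTarget("infra/aws/", "aws eks update-kubeconfig",
                       "values-aws.yaml", "aws-karc"),
    "azure": CloudTarget("infra/azure/", "az aks get-credentials",
                         "values-azure.yaml", "azure-karc"),
    "gcp": CloudTarget("infra/gcp/", "gcloud container clusters get-credentials",
                       "values-gcp.yaml", "gcp-karc"),
    "k3d": CloudTarget("infra/k3d/", "k3d kubeconfig merge",
                       "values-k3d.yaml", "k3d-karc"),
}


def resolve_target(cloud=None):
    """Return the settings for `cloud`, defaulting to the TARGET_CLOUD env var."""
    name = cloud if cloud is not None else os.environ.get("TARGET_CLOUD", "")
    if name not in TARGETS:
        raise ValueError(
            f"unsupported TARGET_CLOUD {name!r}; expected one of {sorted(TARGETS)}"
        )
    return TARGETS[name]


def kosli_env_name(cloud, env):
    """Build the Kosli environment name, e.g. aws-karc-dev."""
    return f"{resolve_target(cloud).kosli_prefix}-{env}"
```

Calling `kosli_env_name("aws", "dev")` yields `aws-karc-dev`, and an unknown value such as `openshift` is rejected, which matches the rule that OpenShift is not a TARGET_CLOUD value.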

OpenShift (ROSA HCP) is a fifth observable target but not a TARGET_CLOUD value. It is enabled per tenant through the TenantConfig.openshiftApiUrl and TenantConfig.openshiftToken columns, and access is read-only: SARC observes the OpenShift CMDB (ClusterOperators, Routes, BuildConfigs, Builds, ImageStreams, ImageStreamTags) but does not deploy karc-portal onto it.
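The per-tenant gate can be pictured as a simple presence check on the two columns. The sketch below is an assumption about the shape of that check, not the portal's actual code: the `TenantConfig` dataclass here is a stand-in that borrows only the two column names from this page, and `openshift_enabled` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional

# Resource kinds listed on this page as observed (read-only).
OBSERVED_KINDS = (
    "ClusterOperators", "Routes", "BuildConfigs",
    "Builds", "ImageStreams", "ImageStreamTags",
)


@dataclass
class TenantConfig:
    """Stand-in for the real tenant record; only the two columns matter here."""
    openshiftApiUrl: Optional[str] = None
    openshiftToken: Optional[str] = None


def openshift_enabled(cfg: TenantConfig) -> bool:
    """OpenShift observation is on only when both columns are populated."""
    return bool(cfg.openshiftApiUrl and cfg.openshiftToken)
```

A tenant with neither column set simply never sees the OpenShift surfaces, regardless of which TARGET_CLOUD the portal itself runs on.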

Same Helm chart. Same portal code. Same kosli-reporter. Same compliance pipeline. Same ServiceNow integration. Same ArgoCD setup. Same karc-{dev,qa,prod} namespace shape. Same compliance frameworks + controls.

What differs per cloud:

  • Identity-as-a-service primitive: IRSA (AWS), Workload Identity (Azure), Workload Identity Federation (GCP), plain k8s Secret (k3d)
  • Container registry: ECR (AWS), ACR (Azure), Artifact Registry (GCP), local registry (k3d)
  • Secrets store: Secrets Manager (AWS), Key Vault (Azure), Secret Manager (GCP), kubectl create secret (k3d)
  • Cost ingest source: Cost Explorer (AWS), Cost Management (Azure), Cloud Billing (GCP), Kubecost (any)
  • Load balancer: ELB+nip.io (AWS demo), Azure Load Balancer, GCP Cloud Load Balancing, traefik+localtest.me (k3d)

All of these are encapsulated in the per-cloud Terraform stack and the per-cloud values-*.yaml overlay. The portal code doesn’t know which cloud it’s running on.
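To make the identity row concrete: each managed cloud binds a Kubernetes ServiceAccount to a cloud identity through a provider-specific annotation. The sketch below lists the standard annotation keys for IRSA, Azure Workload Identity, and GKE Workload Identity. How the karc-portal chart actually sets them is not shown on this page, so the `service_account_annotations` helper is purely illustrative.

```python
# Standard ServiceAccount annotation keys used by each provider's
# workload-identity mechanism. The helper around them is hypothetical.
IDENTITY_ANNOTATION = {
    "aws": "eks.amazonaws.com/role-arn",            # IRSA: IAM role ARN
    "azure": "azure.workload.identity/client-id",   # managed identity client ID
    "gcp": "iam.gke.io/gcp-service-account",        # GCP service account email
}


def service_account_annotations(cloud: str, identity: str) -> dict:
    """Return the annotation a per-cloud overlay would place on a ServiceAccount.

    k3d has no cloud identity provider, so it gets no annotation and relies
    on plain Kubernetes Secrets instead.
    """
    if cloud == "k3d":
        return {}
    if cloud not in IDENTITY_ANNOTATION:
        raise ValueError(f"unknown cloud {cloud!r}")
    return {IDENTITY_ANNOTATION[cloud]: identity}
```

Because the difference is confined to one annotation in the overlay, the workload itself can stay cloud-agnostic, which is the point the paragraph above makes.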

What composes outside the per-cloud overlay

  • infra/modules/cluster-bootstrap/ — shared Helm installs (ingress-nginx, cert-manager, external-secrets, Kosli reporter CronJob) used by every cloud
  • scripts/bootstrap-secrets-*.sh — per-cloud, but call the same shared logic
  • gitops/argocd/install.sh — per-cloud values overlays for ArgoCD itself
  • apps/karc-portal/ — single Helm chart, varying values per cloud

As of 2026-05-19, each cloud’s Terraform stack is run by a different CI system:

  • AWS terraform — GitLab CI (the main pipeline)
  • Azure terraform — Azure DevOps
  • GCP terraform — GitHub Actions

This isn’t because the code is different — it’s because the customer engagement model varies (different orgs may be ahead on different CIs for different clouds). Each CI runs the same Terraform code with the right backend.
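One way to picture "same code, right backend" is that every CI system issues the same Terraform invocation against a different stack directory. The sketch below builds such a command line. The `terraform_init_argv` helper and the backend-config file name in the usage note are hypothetical; `-chdir`, `-input=false`, and `-backend-config` are standard Terraform CLI options.

```python
SUPPORTED = ("aws", "azure", "gcp", "k3d")


def terraform_init_argv(cloud, backend_config=None):
    """Build the `terraform init` argv for one per-cloud stack.

    `backend_config` is an optional path to a backend settings file; the
    file name is whatever the calling CI pipeline chooses.
    """
    if cloud not in SUPPORTED:
        raise ValueError(f"unknown cloud {cloud!r}")
    argv = ["terraform", f"-chdir=infra/{cloud}", "init", "-input=false"]
    if backend_config:
        argv.append(f"-backend-config={backend_config}")
    return argv
```

For example, `terraform_init_argv("azure", "ci.backend.hcl")` produces a command that GitLab CI, Azure DevOps, or GitHub Actions could run unchanged; only the arguments differ between pipelines.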

If you’re contributing a feature that touches infrastructure, the parity test is:

  1. Does the change land equally in all four values-*.yaml overlays?
  2. Does it work on at least k3d (local) + one cloud you have access to?
  3. Does the per-cloud terraform module need a sibling change in the other three?

Open the MR with a “tested on: k3d, aws” (or whichever clouds) note. Reviewers verify parity before merging.
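The first parity question can be partly automated. The following sketch takes the list of files changed in an MR and reports which of the four overlays were left untouched. It is an assumption about how such a check might look, not an existing script in the repository; the `overlays_missing_from_change` name and the example paths are hypothetical, and matching is done on file name only because the overlay location is not stated here.

```python
from pathlib import PurePosixPath

CLOUDS = ("aws", "azure", "gcp", "k3d")


def overlays_missing_from_change(changed_files):
    """Return the clouds whose values-<cloud>.yaml was not touched.

    If the change touches no overlay at all, parity is not in question
    and an empty list is returned.
    """
    touched = {PurePosixPath(path).name for path in changed_files}
    expected = {f"values-{cloud}.yaml": cloud for cloud in CLOUDS}
    if not any(name in touched for name in expected):
        return []
    return [cloud for name, cloud in expected.items() if name not in touched]


# Hypothetical MR that only updated two overlays.
changed = [
    "apps/karc-portal/values-aws.yaml",
    "apps/karc-portal/values-k3d.yaml",
    "apps/karc-portal/templates/deployment.yaml",
]
print(overlays_missing_from_change(changed))  # ['azure', 'gcp']
```

A non-empty result is only a prompt for the reviewer: some changes legitimately apply to a single cloud, so the check flags a question rather than failing the MR outright.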

AWS

Cluster: EKS (managed) with a managed node group for the workload + a separate node group for ingress.

Identity: IRSA (IAM Roles for Service Accounts) wired through OIDC. The Kosli reporter, the external-secrets controller, and the portal all assume per-service IAM roles — no long-lived AWS keys anywhere in-cluster.

Registries + secrets: ECR for images, AWS Secrets Manager for tenant + integration secrets (Kosli token, ServiceNow OAuth, GitHub PAT), surfaced via external-secrets.

Network: VPC with private subnets + VPC endpoints for ECR / S3 / Secrets Manager so the cluster can reach AWS services without a NAT path.

CI ownership: GitLab CI runs the AWS terraform stack.

Demo URL pattern: https://<service>.<lb-ip>.nip.io (nip.io wildcard DNS — instant per-cluster URL without zone setup).
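The URL pattern is plain string composition: nip.io resolves any hostname that embeds an IPv4 address back to that address. A minimal sketch, assuming the caller already knows the load balancer's IPv4 address (the `demo_url` helper is a hypothetical name):

```python
import ipaddress


def demo_url(service: str, lb_ip: str) -> str:
    """Build https://<service>.<lb-ip>.nip.io for a demo cluster.

    Raises ValueError (ipaddress.AddressValueError) if lb_ip is not a
    valid IPv4 address.
    """
    ip = ipaddress.IPv4Address(lb_ip)
    return f"https://{service}.{ip}.nip.io"


print(demo_url("portal", "203.0.113.10"))
# https://portal.203.0.113.10.nip.io
```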

Cost band: ~120–150 USD / month for the standing demo cluster; cheaper if you spin it up on demand.

Azure

Cluster: AKS with a system nodepool + a workload nodepool.

Identity: Workload Identity (the AKS-native flow) backed by the Microsoft Entra OIDC provider. Same pattern as IRSA: per-service managed identities, no static credentials.

Registries + secrets: ACR for images, Azure Key Vault for secrets, surfaced via external-secrets.

CI ownership: Azure DevOps runs the Azure terraform stack (Azure-only parallel CI on top of the GitLab source-of-truth).

Cost band: comparable to AWS; AKS control plane is free, nodes are the cost.

GCP

Cluster: GKE Autopilot (no node management) or Standard, depending on the engagement.

Identity: Workload Identity Federation. The cluster’s Kubernetes service accounts (KSAs) federate to GCP service accounts via OIDC, without keys.

Registries + secrets: Artifact Registry for images, Secret Manager for secrets, surfaced via external-secrets.

CI ownership: GitHub Actions runs the GCP terraform stack.

Cost band: Autopilot is more predictable on cost; Standard is cheaper but needs sizing care.

k3d (local)

Cluster: k3d (k3s in Docker) for hands-on evaluation without any cloud cost.

Identity: Local secrets held as plain Kubernetes Secrets (created with kubectl create secret) for tenant + integration data. The Kosli + ServiceNow integrations still point at the real upstream services (the same way a cloud install would): what’s local is the cluster, not the SaaS dependencies.

Bring-up: single command (just demo-up-k3d) brings up the cluster + ingress + cert-manager + podtato-head + the portal in ~5 minutes.

Cost: zero (cloud) — just whatever your laptop draws.

Best for: developer onboarding, demo rehearsal without burning a cloud cluster, contributors testing changes against a real cluster locally.

OpenShift (ROSA HCP)

Posture: OpenShift (ROSA HCP) is an observable target, not one of the four TARGET_CLOUD values. The portal renders OpenShift-specific surfaces (Routes, BuildConfigs, ImageStreams) read-only when configured per-tenant, which is useful for organisations that run OpenShift alongside the main cloud target.

Auth: a paste-in Service Account bearer token in tenant settings.
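A read-only observer needs nothing more than GET requests carrying the bearer token. The sketch below builds (but does not send) such a request for the resource kinds this page lists. The API group paths are the standard OpenShift ones, but the `list_request` helper and the example URL are hypothetical, and this is not a description of how the portal is actually implemented.

```python
import urllib.request

# Cluster-wide list endpoints for the observed resource kinds.
API_PATHS = {
    "ClusterOperators": "/apis/config.openshift.io/v1/clusteroperators",
    "Routes": "/apis/route.openshift.io/v1/routes",
    "BuildConfigs": "/apis/build.openshift.io/v1/buildconfigs",
    "Builds": "/apis/build.openshift.io/v1/builds",
    "ImageStreams": "/apis/image.openshift.io/v1/imagestreams",
    "ImageStreamTags": "/apis/image.openshift.io/v1/imagestreamtags",
}


def list_request(api_url: str, token: str, kind: str) -> urllib.request.Request:
    """Build a GET request that lists `kind` using a bearer token.

    Only GET is ever constructed, which keeps the client read-only.
    """
    if kind not in API_PATHS:
        raise ValueError(f"unsupported kind {kind!r}")
    url = api_url.rstrip("/") + API_PATHS[kind]
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the result to `urllib.request.urlopen` would perform the call. Whether the token can actually list each kind depends on the RBAC granted to the ServiceAccount it was issued for.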