Infrastructure as Code with Terraform: Managing Multi-Cloud Deployments
Running infrastructure across AWS and GCP simultaneously is not a theoretical exercise for many teams. Whether it’s leveraging GCP’s ML platform alongside AWS’s mature networking stack, or satisfying client requirements that mandate a specific cloud, multi-cloud is often a business reality rather than an architectural preference. Terraform is the most practical tool for managing this without losing your mind, because one language and one plan/apply workflow cover both providers.
Module Structure for Multi-Cloud
The key architectural decision is how to organize modules when the same logical resource exists on different providers. Resist the temptation to abstract away cloud differences behind a single generic module. A “generic compute” module that wraps both EC2 and GCE with conditional logic becomes unmaintainable fast: every provider-specific feature adds another branch, and its inputs grow toward the union of both APIs.
Instead, create provider-specific modules with a consistent interface pattern. A Kubernetes cluster module for GKE and another for EKS should both accept similar input variables like node count, machine type, and network CIDR, but their internal implementation stays provider-native. The calling code in your environment compositions chooses which module to invoke based on the target cloud.
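As a sketch of that consistent interface, assuming a GKE module at modules/gcp/gke, its variables.tf might declare the three inputs named above. A hypothetical EKS module would declare the same names so callers never rename arguments when switching clouds. The variable names and validation rules are illustrative, not a required convention.

```hcl
# modules/gcp/gke/variables.tf (hypothetical module)
# The EKS module is assumed to declare identical inputs; only each module's
# main.tf differs (google_container_node_pool vs aws_eks_node_group).

variable "node_count" {
  description = "Number of worker nodes in the default node pool."
  type        = number

  validation {
    condition     = var.node_count >= 1
    error_message = "node_count must be at least 1."
  }
}

variable "machine_type" {
  description = "Provider-native instance type, e.g. e2-standard-4 on GCP or m6i.xlarge on AWS."
  type        = string
}

variable "network_cidr" {
  description = "CIDR range for the cluster network."
  type        = string

  validation {
    # can() turns an invalid-CIDR error from cidrhost into false.
    condition     = can(cidrhost(var.network_cidr, 0))
    error_message = "network_cidr must be a valid CIDR block."
  }
}
```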
Structure your repository with a clear separation: modules/aws/, modules/gcp/, and modules/shared/ for truly cloud-agnostic resources, such as DNS records hosted with a third-party DNS provider or monitoring configurations. Environment definitions live under environments/ and compose these modules together.
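A hypothetical environment composition under that layout might look like the following, with one root module per cloud and component so each gets its own state. The paths, project ID, and values are assumptions for illustration.

```hcl
# environments/prod/gcp/gke/main.tf (hypothetical root module; its state
# key would be terraform/prod/gcp/gke.tfstate)

provider "google" {
  project = "example-prod-project" # hypothetical project ID
  region  = "us-central1"
}

module "gke" {
  # Provider-native module. The EKS call in environments/prod/aws/eks/
  # would look the same apart from the source path and AWS machine types.
  source = "../../../../modules/gcp/gke"

  node_count   = 3
  machine_type = "e2-standard-4"
  network_cidr = "10.20.0.0/16"
}
```

The choice of cloud is made entirely by the source path; the argument list stays stable, which keeps PR diffs between the AWS and GCP environments easy to compare.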
State Management That Won’t Bite You
Remote state is non-negotiable for team workflows. For multi-cloud setups, I store Terraform state in the primary cloud’s object storage, typically S3 with DynamoDB locking (Terraform 1.10 and later can instead lock with a lock file in the bucket itself via the S3 backend’s use_lockfile setting). Using a single state backend simplifies access control and avoids the circular dependency of needing GCP infrastructure to store state that provisions GCP infrastructure.
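The state bucket and lock table have to exist before any backend can point at them, so they are usually created once from a small bootstrap configuration that keeps its own state locally. A minimal sketch, assuming AWS provider v4 or later (which splits bucket settings into separate resources) and hypothetical bucket, table, and region values:

```hcl
# bootstrap/main.tf (hypothetical): applied once, with local state.

provider "aws" {
  region = "us-east-1" # hypothetical region
}

resource "aws_s3_bucket" "tf_state" {
  bucket = "example-org-terraform-state" # hypothetical; bucket names are global
}

# Versioning lets you recover an earlier state file after a bad write.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest with the AWS-managed KMS key for S3.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The S3 backend's DynamoDB locking expects a string hash key named LockID.
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-locks" # hypothetical table name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```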
Split state files by environment and by cloud provider. A single state file for all infrastructure becomes a bottleneck when multiple engineers run plans simultaneously, and a failed apply on one provider shouldn’t lock state for the other. Your backend configuration should produce paths like terraform/prod/aws/networking.tfstate and terraform/prod/gcp/gke.tfstate.
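Backend blocks cannot reference variables, so one common way to get per-environment, per-provider paths is partial backend configuration: fixed settings live in the block and the key is supplied at init time. A sketch for the GKE stack, with hypothetical bucket and lock table names:

```hcl
# environments/prod/gcp/gke/backend.tf (hypothetical path)
# State for the GCP stack still lives in S3: one backend for every stack.
terraform {
  backend "s3" {
    bucket         = "example-org-terraform-state" # hypothetical bucket
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"             # hypothetical lock table
    encrypt        = true
    # "key" is deliberately omitted and passed at init time, e.g.
    #   terraform init -backend-config="key=terraform/prod/gcp/gke.tfstate"
  }
}
```

Because the DynamoDB lock entry is derived from the bucket and key, each state file gets its own lock, so a stuck GKE apply never blocks a plan against terraform/prod/aws/networking.tfstate.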
Use terraform_remote_state data sources sparingly to share outputs between state files. Overuse creates tight coupling between independent stacks, and every consumer needs read access to the producer’s entire state file. For cross-cloud references, such as passing a GCP service account’s unique ID into an AWS IAM trust policy, consider writing outputs to a shared parameter store or reading Terraform Cloud workspace outputs with the tfe_outputs data source.
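One way to pass a value across clouds without terraform_remote_state is to have the producing stack write it to AWS Systems Manager Parameter Store and the consuming stack read it back. The sketch below assumes the GCP stack also configures the AWS provider, uses hypothetical account, parameter, and role names, and keys the trust policy on the service account’s numeric unique ID through the accounts.google.com:sub condition key; verify that key against AWS’s web identity federation documentation for your token type before relying on it.

```hcl
# --- GCP stack (producer) ---
resource "google_service_account" "workload" {
  account_id   = "aws-caller" # hypothetical account ID
  display_name = "Workload that calls AWS APIs"
}

# Publish the ID that AWS will match on; the parameter name is hypothetical.
resource "aws_ssm_parameter" "workload_sa_id" {
  name  = "/prod/gcp/workload-sa-unique-id"
  type  = "String"
  value = google_service_account.workload.unique_id
}

# --- AWS stack (consumer) ---
data "aws_ssm_parameter" "workload_sa_id" {
  name = "/prod/gcp/workload-sa-unique-id"
}

data "aws_iam_policy_document" "trust_gcp" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = ["accounts.google.com"]
    }

    # Only Google-issued tokens whose subject is this service account match.
    condition {
      test     = "StringEquals"
      variable = "accounts.google.com:sub"
      values   = [data.aws_ssm_parameter.workload_sa_id.value]
    }
  }
}

resource "aws_iam_role" "gcp_workload" {
  name               = "gcp-workload" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.trust_gcp.json
}
```

The two stacks now share only a parameter name, so either one can be refactored or re-planned without reading the other’s state.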
Variables and Environment Parity
Define a tfvars file per environment per cloud. The pattern environments/prod/aws.tfvars and environments/staging/gcp.tfvars keeps configuration explicit. Avoid deeply nested variable objects. Flat variables with clear names like gke_node_pool_machine_type are easier to grep, review in PRs, and override in CI pipelines.
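For instance, a prod GCP variables file in that layout might contain nothing but flat, greppable assignments. The values and the variable names other than gke_node_pool_machine_type are illustrative.

```hcl
# environments/prod/gcp.tfvars
# Each name must be declared with a variable block in the root module.
gcp_project_id             = "example-prod-project" # hypothetical
gcp_region                 = "us-central1"
gke_node_pool_machine_type = "e2-standard-4"
gke_node_pool_node_count   = 3
gke_network_cidr           = "10.20.0.0/16"

# Preferred over a nested object such as:
#   gke = { node_pool = { machine_type = "e2-standard-4", node_count = 3 } }
# which is harder to grep and can only be overridden as a whole object.
```

Loaded with -var-file, a CI job can still override one value by adding -var 'gke_node_pool_node_count=5' after the -var-file flag, since later command-line options take precedence.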
Use locals blocks to compute derived values rather than pushing that logic into variable defaults. A local that calculates subnet CIDRs from a base network range is more maintainable than asking every caller to manually compute non-overlapping subnets.
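A minimal sketch of that idea using Terraform’s built-in cidrsubnet function; the base range, subnet names, and the newbits value of 4 are assumptions.

```hcl
variable "base_cidr" {
  description = "Base range for the VPC."
  type        = string
  default     = "10.20.0.0/16"
}

locals {
  # Append new names at the end: each subnet's CIDR depends on its index,
  # so reordering or inserting shifts every range after it.
  subnet_names = ["public-a", "public-b", "private-a", "private-b"]

  # cidrsubnet(base, 4, i) carves the i-th /20 out of a /16 (16 + 4 = 20).
  subnet_cidrs = {
    for i, name in local.subnet_names : name => cidrsubnet(var.base_cidr, 4, i)
  }
}

output "subnet_cidrs" {
  value = local.subnet_cidrs
}
```

With the default base range this yields public-a = 10.20.0.0/20, public-b = 10.20.16.0/20, private-a = 10.20.32.0/20, and private-b = 10.20.48.0/20, and callers only ever supply base_cidr.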
CI/CD Pipeline Integration
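Terraform in CI requires careful sequencing. The pipeline should run terraform fmt -check -recursive and terraform validate as fast-fail steps (validate needs initialized providers, so run terraform init -backend=false first if the backend isn’t configured yet), followed by terraform plan -out=tfplan to save the plan to a file. The apply step should run terraform apply tfplan against that exact file and never re-run planning, so what gets applied is exactly what was reviewed.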
For multi-cloud repos, parallelize plans across providers but serialize applies within each provider: AWS networking must exist before EKS can be provisioned, and the GCP VPC must exist before GKE. Terraform has no knowledge of dependencies between separate state files, so encode this ordering explicitly in your pipeline DAG.
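The ordering exists because later stacks consume earlier stacks’ outputs. In a hypothetical EKS root module, the dependency on networking is a terraform_remote_state read, which errors if the networking state lacks the expected outputs and returns stale values if networking changes are still unapplied. Within a single provider this is a reasonable use of remote state. The paths and the vpc_cidr and private_subnet_ids outputs are assumptions.

```hcl
# environments/prod/aws/eks/main.tf (hypothetical)
# Reads outputs from the networking stack's state. Nothing here makes
# Terraform apply networking first; the pipeline has to do that.
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "example-org-terraform-state" # hypothetical bucket
    key    = "terraform/prod/aws/networking.tfstate"
    region = "us-east-1"
  }
}

module "eks" {
  source = "../../../../modules/aws/eks" # hypothetical module path

  node_count   = 3
  machine_type = "m6i.xlarge"
  # Both outputs are assumed to be exported by the networking stack.
  network_cidr = data.terraform_remote_state.networking.outputs.vpc_cidr
  subnet_ids   = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```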
Pin provider versions aggressively. A minor version bump in the AWS provider has broken production plans more than once. Set tight version constraints in each root module’s required_providers block, commit the .terraform.lock.hcl file that terraform init generates, and update provider versions in dedicated PRs with full plan review.
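A sketch of tight constraints for both providers. The version numbers are hypothetical placeholders; the pessimistic operator with three components is what limits upgrades to patch releases.

```hcl
terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.40.0" allows only 5.40.x patch releases, so moving to 5.41
      # requires an explicit edit in a reviewed PR.
      version = "~> 5.40.0" # hypothetical version
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.20.0" # hypothetical version
    }
  }
}
```

After changing a constraint, terraform init -upgrade rewrites .terraform.lock.hcl. Running terraform providers lock with one -platform flag per OS your team and CI use (for example linux_amd64 and darwin_arm64) records checksums for all of them, so the same lock file works on laptops and runners.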
Drift Detection and Compliance
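Schedule periodic terraform plan runs in CI that compare actual infrastructure against your configuration. Running plan with -detailed-exitcode makes this scriptable: exit code 0 means no changes, 1 means an error, and 2 means the plan contains changes, which on a fully applied stack indicates drift. Pipe any drift to Slack or your alerting system. Cloud consoles make it easy for someone to click a checkbox that creates drift, and catching it within hours keeps it from compounding into conflicts that force manual state surgery.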
For compliance requirements, validate plans before apply with Sentinel policies (available in Terraform Cloud and Terraform Enterprise) or with Open Policy Agent via conftest, run against the JSON that terraform show -json tfplan produces. Rules like “no public S3 buckets” or “all GCE instances must have OS Login enabled” catch misconfigurations before they reach infrastructure.
Handling Secrets
Never store secrets in variable files or commit them to the repository. Keep them in provider-specific secret managers, AWS Secrets Manager or GCP Secret Manager, and reference them with data sources at plan time. This does not keep them out of state: Terraform writes values read through data sources into state in plaintext, so restrict state file access. Encrypt your state backend’s storage bucket and limit IAM access to the pipeline service account and a break-glass role.
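A minimal sketch of the data-source pattern, assuming a hypothetical Secrets Manager secret that stores a JSON object with username and password keys, and a hypothetical RDS module whose password input is declared sensitive. Sensitive handling only redacts CLI output; the decoded values still land in state, which is why the bucket and IAM controls matter.

```hcl
# Read the secret at plan time instead of committing it to a .tfvars file.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/app/db-credentials" # hypothetical secret name
}

locals {
  # secret_string is marked sensitive and that marking carries through
  # jsondecode, so plan output shows (sensitive value). The value is still
  # stored in plaintext in the state file.
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}

module "app_db" {
  source = "../../../../modules/aws/rds" # hypothetical module

  master_username = local.db_credentials.username
  # Inside the module, declare this variable with sensitive = true.
  master_password = local.db_credentials.password
}
```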
Conclusion
Multi-cloud Terraform is manageable when you keep modules provider-native, split state aggressively, and enforce guardrails through CI. The complexity is real, but the alternative of managing two separate infrastructure workflows with divergent patterns is worse. Invest in the module structure and pipeline discipline upfront, and the operational overhead stays linear rather than exponential as your infrastructure grows.
