How to Manage Terraform State in a Large Team
Published on 19 Mar 2026 by Adam Lloyd-Jones
To manage Terraform state in a large team without conflict, you should move away from manual local execution and shift toward a modular, file-based isolation strategy managed by a centralized CI/CD system.
1. Shift from Workspaces to File Layout Isolation
While Terraform CLI workspaces allow for isolated tests on the same code, they are often considered unsuitable for isolating critical environments like staging from production. A major friction point with workspaces is that they are not visible in the code itself; a module deployed in one workspace looks identical to one in ten others, making it easy for developers to lose track of their context and accidentally run apply in the wrong environment.
The solution is Isolation via File Layout:
- Separate Folders: Define each environment (e.g., `/stage`, `/prod`) and each component (e.g., `/vpc`, `/data-storage`) in its own dedicated directory.
- Explicit Context: This approach makes it immediately clear which environment a developer is working in just by looking at their current directory.
- Distinct Backends: Each folder should have its own backend configuration, potentially in separate AWS accounts, to provide “bulkheads” that prevent a mistake in staging from impacting production.
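As a rough illustration of the layout described above, the sketch below shows what the backend block for one component in one environment might look like. The bucket name, region, and state key are assumptions made up for this example, and the directory names simply reuse the ones mentioned in the list.

```hcl
# Hypothetical repository layout (one state file per environment/component):
#
#   stage/vpc/main.tf
#   stage/data-storage/main.tf
#   prod/vpc/main.tf
#   prod/data-storage/main.tf

# File: stage/vpc/main.tf
terraform {
  backend "s3" {
    # Assumed bucket name; ideally this bucket lives in the staging AWS account.
    bucket  = "example-tfstate-stage"
    key     = "stage/vpc/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # S3-native state locking (Terraform 1.10 or later).
    # Older setups typically set dynamodb_table instead.
    use_lockfile = true
  }
}

# prod/vpc/main.tf would carry the same block, but pointing at a different
# bucket (for example "example-tfstate-prod" in the production account)
# and the key "prod/vpc/terraform.tfstate".
```

Because the backend lives in the folder itself, the path you are standing in tells you which state file an apply will touch.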
2. Resolving CI/CD Lock Collisions
State lock collisions in parallel pipelines are common when multiple jobs attempt to modify the same state file simultaneously.
- Lock Timeout: Configure your CI/CD pipelines to run `terraform apply` with the `-lock-timeout=<TIME>` parameter (e.g., `-lock-timeout=10m`). This instructs Terraform to wait for a set period for the lock to be released rather than failing the build immediately.
- Decompose Monoliths: Splitting a monolithic state into smaller, independent components (e.g., separating the network from the database) naturally reduces lock contention because parallel jobs are likely to target different state files.
3. Establishing an Audit Trail (Who and Why)
If you are deploying from local laptops, you lose visibility into the history of changes.
- Deploy via CI Server: Run all production deployments from a centralized CI/CD server, such as GitHub Actions, Azure Pipelines, or a specialized TACOS (Terraform Automation and Collaboration Software) platform like Terraform Cloud, now called HCP Terraform.
- VCS as the Source of Truth: Storing Terraform code in Version Control (Git) captures the entire history of infrastructure changes in the commit log. A well-written commit message provides the “why,” while the Git history provides the “who” and “when”.
- Atlantis: Consider using Atlantis, which automatically runs `terraform plan` on every pull request and posts the output as a comment. This integrates the “diff” directly into the peer review process, ensuring changes are visible and approved before they are applied.
4. Managing State Sprawl and Dependencies
While splitting a monolithic state is recommended to improve performance and security, it does introduce dependency management challenges.
- Data Source Lookups (Preferred): Before relying on `terraform_remote_state`, check if you can use provider-specific data sources to look up resources. For example, instead of reading a network state file to get a VPC ID, use the `aws_vpc` data source to look it up by tags. This reduces tight coupling between projects.
- Remote State with Care: When you must use `terraform_remote_state`, treat it as a read-only dependency. Use the `defaults` parameter to set fallback values for new projects that don’t have outputs yet, turning a hard dependency into a soft one.
- Terragrunt: To prevent “remote state sprawl” from becoming unmanageable, use Terragrunt. It allows you to define backend configurations once in a root file and automatically inherit them across dozens of modules, keeping your configurations DRY (Don’t Repeat Yourself) while maintaining folder-based state isolation.
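The two lookup styles above can be sketched side by side. This is only an illustration: the tag value, bucket, key, and output name `vpc_id` are assumptions, not values from any real project.

```hcl
# Preferred: ask the AWS provider for the VPC by tag.
data "aws_vpc" "main" {
  tags = {
    Name = "stage-vpc" # hypothetical tag value
  }
}

# Fallback: read another component's outputs from its state, read-only.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-tfstate-stage" # assumed bucket
    key    = "stage/vpc/terraform.tfstate"
    region = "us-east-1"
  }

  # Used when the state is empty or does not yet have this output.
  defaults = {
    vpc_id = ""
  }
}

locals {
  vpc_id_from_lookup = data.aws_vpc.main.id
  vpc_id_from_state  = data.terraform_remote_state.network.outputs.vpc_id
}
```

For the Terragrunt point, a root configuration might look roughly like the following. The file name `root.hcl` and the bucket are assumptions; a team using separate buckets or accounts per environment would need to vary `bucket` accordingly. The `extra_arguments` block is an optional addition that passes a lock timeout to every command that takes a state lock.

```hcl
# File: root.hcl at the repository root (hypothetical name and location)
remote_state {
  backend = "s3"

  # Write a backend.tf into each module so Terraform picks up the backend.
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket  = "example-tfstate" # assumed bucket
    region  = "us-east-1"
    encrypt = true

    # Each folder gets its own state key, derived from its path.
    key = "${path_relative_to_include()}/terraform.tfstate"
  }
}

terraform {
  # Wait for a held lock instead of failing immediately.
  extra_arguments "retry_lock" {
    commands  = get_terraform_commands_that_need_locking()
    arguments = ["-lock-timeout=10m"]
  }
}

# Each module folder (e.g. stage/vpc/terragrunt.hcl) would then only need:
#
#   include "root" {
#     path = find_in_parent_folders("root.hcl")
#   }
```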
Summary of Recommendations
| Friction Point | Recommended Action |
|---|---|
| Wrong Workspace | Switch to File Layout Isolation (separate folders for /stage and /prod). |
| Lock Collisions | Implement -lock-timeout and split the monolith into smaller components. |
| Lack of Audit | Deploy only from CI/CD; use Atlantis to link plans to PR comments. |
| Sprawl/Coupling | Use Data Source lookups instead of remote state where possible. |
| DRY Backends | Use Terragrunt to manage many small state files from a single config. |
