Hey everyone! If you’ve been in the DevOps or Cloud Engineering space for a while, you’ve probably seen it all: from a single, monstrous main.tf file that tries to define an entire universe, to copy-pasting code across projects until you can’t tell which Kubernetes cluster belongs to dev and which one is the prod money-maker.
We’ve all been there. You start with a simple project, and it works. Then business asks you to spin up a staging environment. Then a UAT environment. Soon, you’re drowning in duplicated code, and a simple change (like a Kubernetes version upgrade) requires updating five different places. That’s not just messy; it’s a recipe for disaster.
Over the years, I’ve learned that writing good Infrastructure as Code (IaC) is a lot like writing good application code. It’s all about patterns, reusability, and modularity. In this multi-part series, I’ll walk you through the design patterns I use to build robust, scalable, and easy-to-manage Terraform projects for AWS.
Today, we’re starting with the absolute foundation for taming complexity: The Module Pattern.
What’s Wrong With a Giant main.tf?
Imagine you’re building a car engine. You could try to assemble every single piston, wire, and bolt in one go, right on the factory floor. It might work, but what happens when you want to build a slightly different engine for a different car model? You’d have to start from scratch or painstakingly copy your first creation.
A monolithic main.tf file for an EKS cluster is that chaotic assembly. It mixes the what (an EKS control plane, IAM roles, node groups, security groups) with the where and why (this is for the dev environment with t3.medium nodes, this is for the prod app with m5.large nodes).
This approach has several major flaws:
- It’s not DRY (Don’t Repeat Yourself): Creating a new environment means copying hundreds of lines of complex IAM and networking code.
- High cognitive load: Understanding the cluster setup requires deciphering a huge, interconnected block of code.
- High blast radius: A small typo in an IAM policy could cripple your entire cluster, and if you’ve copy-pasted, that vulnerability exists in every environment.
We can do better. Instead of assembling the engine piece by piece every time, let’s build pre-fabricated components, like a complete fuel injection system or a pre-wired ignition module. In Terraform, we call these modules.
The Module Pattern: Your IaC Building Blocks
A Terraform module is a self-contained package of Terraform configurations that are managed as a group. Think of it as a function in a programming language. It takes some inputs (variables), performs some actions (creates resources), and provides some outputs.
For something like an EKS cluster, a module is a lifesaver. It can encapsulate all the boilerplate resources needed to get a cluster up and running:
- The EKS cluster control plane itself.
- The complex IAM roles and policies for the cluster and its nodes.
- The managed node groups, including their launch templates and auto-scaling configurations.
You then call this single module from your environment-specific code, passing in values like the cluster name, version, and instance types.
A Practical Example: A Reusable EKS Cluster Module
Let’s build a simplified version. A common best practice is to have a modules directory in your repository where you store your custom, reusable modules. Our initial project structure looks like this:
terraform-project/
├── modules/
│   └── aws-eks-cluster/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
└── environments/
    └── dev/
        └── main.tf
Inside modules/aws-eks-cluster/, we define the module’s API (variables.tf), its logic (main.tf), and its return values (outputs.tf).
(For brevity, the full code for the EKS module is omitted here. It defines resources like aws_eks_cluster, aws_eks_node_group, and their associated IAM roles.)
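To make the module’s interface concrete, here is a minimal sketch of what its variables.tf and outputs.tf might expose. The variable names mirror the ones used in the examples below, but the exact signature (and the internal resource name aws_eks_cluster.this referenced in the outputs) is illustrative rather than a definitive implementation:

# modules/aws-eks-cluster/variables.tf
variable "cluster_name" {
  description = "Name of the EKS cluster."
  type        = string
}

variable "cluster_version" {
  description = "Kubernetes version for the control plane."
  type        = string
}

variable "vpc_id" {
  description = "VPC in which the cluster is created."
  type        = string
}

variable "subnet_ids" {
  description = "Subnets for the control plane and node groups."
  type        = list(string)
}

variable "node_group_instance_types" {
  description = "Instance types for the managed node group."
  type        = list(string)
}

variable "node_group_desired_size" {
  description = "Desired number of worker nodes."
  type        = number
}

variable "tags" {
  description = "Tags applied to all resources created by the module."
  type        = map(string)
  default     = {}
}

# modules/aws-eks-cluster/outputs.tf
output "cluster_name" {
  description = "Name of the created EKS cluster."
  value       = aws_eks_cluster.this.name
}

output "cluster_endpoint" {
  description = "API server endpoint of the EKS control plane."
  value       = aws_eks_cluster.this.endpoint
}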
Patterns for Environment Variables
Okay, we have our reusable EKS module. Now for the most important part: how do we feed it the right configuration for each environment (dev, prod, etc.) in a way that is clean, scalable, and easy to manage?
Let’s look at the evolution of passing variables, from the basic approach to the recommended best practice.
Method 1: In-line Arguments
The most straightforward way is to hardcode the values directly in the module block inside your environment’s main.tf.
# environments/dev/main.tf
module "eks_cluster" {
  source = "../../modules/aws-eks-cluster"

  # --- In-line arguments ---
  cluster_name              = "my-app-dev-cluster"
  node_group_instance_types = ["t3.medium"]
  node_group_desired_size   = 2
  # ... other variables
}
This is fine for a quick test, but it doesn’t scale. It mixes configuration with logic, making the file hard to read and forcing you to hunt for values when you need to make a change.
Method 2: Using .tfvars Files (The Standard Way)
A much better pattern is to separate your configuration values from your resource logic. A .tfvars file is a simple text file for variable assignments. You create one for each environment.
1. Define Root Variables: First, in your environment directory (environments/dev/), create a variables.tf file to declare the variables this environment will accept.
# environments/dev/variables.tf
variable "node_group_instance_types" {
  description = "Instance types for the EKS node group."
  type        = list(string)
}
# ... other variable definitions
2. Create an Environment .tfvars File: Now, create a dev.tfvars file in the same directory to provide the values.
# environments/dev/dev.tfvars
node_group_instance_types = ["t3.medium"]
node_group_desired_size = 2
Your prod environment would have its own prod.tfvars with different values, like ["m5.large"].
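For instance, a production file might look like this (the desired node count here is illustrative):

# environments/prod/prod.tfvars
node_group_instance_types = ["m5.large"]
node_group_desired_size   = 3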
3. Update main.tf and Apply: Your main.tf now uses these variables, becoming generic.
# environments/dev/main.tf
module "eks_cluster" {
  source = "../../modules/aws-eks-cluster"

  # Pass variables from the root module to the child module
  node_group_instance_types = var.node_group_instance_types
  # ...
}
You then apply the specific configuration from the command line: terraform apply -var-file="dev.tfvars". This cleanly separates the “what” from the “how.”
Method 3: Using locals for Derived Values
What if you want to enforce a consistent naming convention? Instead of defining cluster_name in every .tfvars file, you can derive it using locals. Locals are like named constants within your configuration.
Create a locals.tf file in your environment directory.
# environments/dev/locals.tf
locals {
  # Base variables
  environment = "dev"
  project     = "my-app"

  # Derived value to enforce naming conventions
  cluster_name = "${local.project}-${local.environment}-cluster"

  # Centralized map of tags
  common_tags = {
    Environment = title(local.environment)
    Project     = local.project
  }
}
Now, your main.tf can use this local value, ensuring consistency: cluster_name = local.cluster_name.
Putting It All Together: The Recommended Structure
For a truly robust and readable project, you should combine Methods 2 and 3.
- Use .tfvars files for the raw inputs that change between environments (instance sizes, counts).
- Use locals to enforce conventions and derive values (names, tags).
Here is the complete, recommended file structure and workflow for your dev environment:
Project Structure:
terraform-project/
├── modules/
│   └── aws-eks-cluster/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
└── environments/
    └── dev/
        ├── main.tf       # Orchestrates the modules
        ├── variables.tf  # Defines input variables for the environment
        ├── locals.tf     # Defines naming conventions and common tags
        └── dev.tfvars    # Sets the actual values for dev
dev.tfvars (The Raw Config):
# environments/dev/dev.tfvars
instance_types = ["t3.medium"]
node_count = 2
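These raw inputs need matching declarations in the environment’s variables.tf. A minimal sketch (the descriptions are illustrative):

# environments/dev/variables.tf
variable "instance_types" {
  description = "Instance types for the EKS node group."
  type        = list(string)
}

variable "node_count" {
  description = "Desired number of nodes in the node group."
  type        = number
}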
locals.tf (The Conventions):
# environments/dev/locals.tf
locals {
  environment = "dev"
  project     = "my-app"

  cluster_name = "${local.project}-${local.environment}-cluster"

  common_tags = {
    Environment = title(local.environment)
    Project     = local.project
    ManagedBy   = "Terraform"
  }
}
main.tf (The Orchestrator):
# environments/dev/main.tf
module "eks_cluster" {
  source = "../../modules/aws-eks-cluster"

  # Values from locals
  cluster_name = local.cluster_name
  tags         = local.common_tags

  # Values from the .tfvars file
  node_group_instance_types = var.instance_types
  node_group_desired_size   = var.node_count

  # Other values like VPC and subnets
  vpc_id     = data.aws_vpc.selected.id
  subnet_ids = data.aws_subnets.private.ids
}
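The data.aws_vpc and data.aws_subnets references above assume the environment looks up an existing network rather than creating one here. A minimal sketch of those data sources, with illustrative tag filters:

# environments/dev/data.tf
data "aws_vpc" "selected" {
  tags = {
    Name = "my-app-dev-vpc"
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.selected.id]
  }

  tags = {
    Tier = "private"
  }
}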
This pattern gives you the best of all worlds: a clean separation of concerns, enforced consistency, and environment configurations that are simple and easy to audit.
What’s Next in Part 2?
We’ve now built a reusable module and established a robust pattern for configuring it across multiple environments. This is a massive step towards professional-grade IaC.
But there’s still a critical piece of the puzzle missing: State Management. Where does Terraform store the state file for each environment? How do we prevent developers from accidentally running dev changes against the prod state?
In Part 2, we’ll explore patterns for remote state backends (using S3, of course!) and introduce tools like Terragrunt to keep our environment configurations even more DRY.
Stay tuned, and happy building! Feel free to leave your questions in the comments, and I’d be glad to connect on LinkedIn.
Disclaimer: Parts of this article were drafted with the help of an AI assistant. The technical concepts, code examples, and overall structure were directed, curated, and verified by the author to ensure technical accuracy and reflect real-world experience.