Infrastructure as Code: Terraform in production

Terraform is the de facto standard tool for Infrastructure as Code. Yet across hundreds of organizations, we keep seeing the same failures:

Corrupted state files → loss of control over the infrastructure
Plaintext secrets in repositories
Undetected drift (real infrastructure ≠ Terraform configuration)
No tests → breaking changes in production

This article is a pragmatic guide to running Terraform in production without regrets.

Terraform architecture: a reference layout

Forget the "hello world" tutorials. Here is a robust layout:

terraform/
├── modules/
│   ├── kubernetes/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── tests/
│   ├── database/
│   ├── networking/
│   └── security/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── production/
├── shared/
│   └── vpc.tf
└── tests/
    ├── terraform_test.go
    └── fixtures/

Principles:

Modules: reusable code (e.g. a Kubernetes module)
Environments: dev/staging/prod, each with its own configuration
Shared: common resources (VPC, security groups)
Tests: validation before anything reaches production

Terraform state: the sensitive core

The state file is Terraform's database: it maps your code to the real infrastructure.

resource "aws_instance" "api" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.medium"
  tags = {
    Name = "api-server"
  }
}

# The state file contains:
# aws_instance.api:
#   id = i-1234567890abcdef0
#   public_ip = 203.0.113.5
#   ... (200+ attributes)

Problem 1: unversioned local state = disaster

# ❌ BAD: local state
terraform {
  # no backend configured = "local" backend, state stored in the working directory
}
# Risk: someone deletes the directory = the state is lost entirely

# ✓ GOOD: centralized state
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "eu-central-2"     # AWS Zurich region
    encrypt        = true
    dynamodb_table = "terraform-lock"   # state locking (prevents concurrent applies)
  }
}

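State checklist:

- Stored in S3/Azure Blob/GCS (not locally)
- Encryption at rest (AES256)
- Bucket versioning enabled (history of previous states)
- Regular backups
- Locking enabled: a DynamoDB table with the S3 backend (recent Terraform versions can also lock in S3 itself with use_lockfile = true)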

Problem 2: plaintext secrets = breach

# ❌ BAD: plaintext secret
resource "aws_db_instance" "postgres" {
  allocated_storage    = 20
  storage_type         = "gp2"
  engine               = "postgres"
  engine_version       = "14.0"
  instance_class       = "db.t2.micro"
  db_name              = "mydb"
  username             = "admin"
  password             = "SecurePassword123!"  # ❌ Plaintext in the code!
  skip_final_snapshot  = false
}

# The password is committed to Git and also stored in plaintext in the state
# → any audit will flag it, and anyone with repository access can read it

# ✓ GOOD: secret kept out of the code
resource "aws_db_instance" "postgres" {
  allocated_storage    = 20
  storage_type         = "gp2"
  engine               = "postgres"
  engine_version       = "14.0"
  instance_class       = "db.t2.micro"
  db_name              = var.db_name
  username             = var.db_user
  password             = var.db_password  # Not in the code, but still stored in plaintext in the state
  skip_final_snapshot  = false
}

# variables.tf
variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true  # Redacted from plan/apply output (still stored in the state)
  # No default: the value must be supplied
}

# .gitignore
# Never commit tfvars files that hold secrets
*.tfvars
!*.example.tfvars
# Local state files and the provider cache
*.tfstate
*.tfstate.*
.terraform/

Pass secrets in from CI/CD:

# CI/CD pipeline
terraform apply \
  -var="db_password=${DB_PASSWORD}" \
  -var="api_key=${VAULT_API_KEY}"

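# DB_PASSWORD and VAULT_API_KEY: masked secret variables defined in the CI system.
# To keep them off the command line, Terraform also reads TF_VAR_<name>
# environment variables, e.g. export TF_VAR_db_password="${DB_PASSWORD}"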

Or read them from Vault:

terraform {
  required_providers {
    vault = {
      source  = "hashicorp/vault"
      version = "~> 4.0"
    }
  }
}

provider "vault" {
  address = var.vault_addr
}

data "vault_generic_secret" "db_password" {
  path = "secret/data/database/prod"
}

resource "aws_db_instance" "postgres" {
  # ... other arguments as above
  password = data.vault_generic_secret.db_password.data["password"]
  # The value read from Vault is still written to the state
}

Secrets checklist:

- Never in plaintext in the code
- Never committed to Git
- Variables marked sensitive = true
- Supplied via CI/CD or Vault
- State encrypted and access-restricted, since it still contains the secrets

Terraform modules: the DRY principle

Duplicating Terraform code is tempting. Resist it.

# ❌ BAD: duplication
resource "aws_security_group" "api_prod" {
  name = "api-prod-sg"
  vpc_id = aws_vpc.production.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "api_staging" {
  name = "api-staging-sg"
  vpc_id = aws_vpc.staging.id
  # ... repeats the same ingress blocks verbatim
}

# ✓ GOOD: reusable module
# modules/security_group/main.tf
variable "name" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "allowed_ports" {
  type = list(number)
  default = [80, 443]
}

resource "aws_security_group" "main" {
  name   = var.name
  vpc_id = var.vpc_id

  dynamic "ingress" {
    for_each = var.allowed_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}

output "security_group_id" {
  value = aws_security_group.main.id
}

# environments/production/main.tf
module "api_sg_prod" {
  source = "../../modules/security_group"

  name          = "api-prod-sg"
  vpc_id        = aws_vpc.production.id
  allowed_ports = [80, 443, 8080]
}

# environments/staging/main.tf
module "api_sg_staging" {
  source = "../../modules/security_group"

  name          = "api-staging-sg"
  vpc_id        = aws_vpc.staging.id
  allowed_ports = [80, 443]  # Different from production: no 8080
}
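The dynamic "ingress" block generates one nested ingress block per element of allowed_ports, with ingress.value bound to the current element. The Go analogy below (not Terraform code) makes that expansion explicit by producing the rules the module would render for production. The ingressRule type is a hypothetical mirror of the block's arguments:

```go
package main

import "fmt"

// ingressRule mirrors the arguments of one generated ingress block.
type ingressRule struct {
	FromPort   int
	ToPort     int
	Protocol   string
	CIDRBlocks []string
}

// expandIngress does what `dynamic "ingress"` does in the module:
// one rule per port, with from_port = to_port = the current element.
func expandIngress(allowedPorts []int) []ingressRule {
	rules := make([]ingressRule, 0, len(allowedPorts))
	for _, port := range allowedPorts {
		rules = append(rules, ingressRule{
			FromPort:   port,
			ToPort:     port,
			Protocol:   "tcp",
			CIDRBlocks: []string{"0.0.0.0/0"},
		})
	}
	return rules
}

func main() {
	// allowed_ports = [80, 443, 8080] in environments/production/main.tf
	for _, r := range expandIngress([]int{80, 443, 8080}) {
		fmt.Printf("ingress %s %d-%d from %v\n", r.Protocol, r.FromPort, r.ToPort, r.CIDRBlocks)
	}
}
```

Because the port list is the module's only varying input here, staging and production differ by one line instead of by a copied resource block.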

Testing Terraform: before production

# Install Terratest and testify (inside an initialized Go module: go mod init ...)
go get github.com/gruntwork-io/terratest/modules/terraform
go get github.com/stretchr/testify/assert

// terraform/tests/terraform_test.go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestTerraformKubernetesModule(t *testing.T) {
	t.Parallel()

	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/kubernetes",
		Vars: map[string]interface{}{
			"cluster_name": "test-cluster",
			"node_count":   3,
			"node_type":    "t2.medium",
		},
	}

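	// Destroy the test resources at the end, even if an assertion fails
	defer terraform.Destroy(t, terraformOptions)

	// terraform init, then terraform apply
	terraform.InitAndApply(t, terraformOptions)

	// Assertions on the module outputs (the module must declare cluster_name and node_count)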
	clusterName := terraform.Output(t, terraformOptions, "cluster_name")
	assert.Equal(t, "test-cluster", clusterName)

	nodeCount := terraform.OutputInt(t, terraformOptions, "node_count")
	assert.Equal(t, 3, nodeCount)
}

Run:

cd terraform/tests
go test -v -timeout 30m  # real infrastructure easily exceeds go test's default 10-minute timeout
# Output:
# === RUN   TestTerraformKubernetesModule
# --- PASS: TestTerraformKubernetesModule (120s)

Run it in CI:

# .gitlab-ci.yml
test:terraform:
  stage: test
  script:
    - terraform init -input=false
    - terraform validate
    - terraform plan -input=false -out=tfplan
    - go test -v -timeout 30m ./tests
  artifacts:
    paths:
      - tfplan

Testing checklist:

- terraform validate passes in CI
- terraform plan runs in CI and is checked for breaking changes
- Terratest for critical modules
- Mandatory plan review (MR/PR before any apply)

Drift detection: staying in sync

Drift = real infrastructure ≠ Terraform configuration.

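Example: someone changes an instance type or opens a port in the AWS console. Terraform knows nothing about it until the next plan, and the next apply silently reverts the manual change. Changes made inside a server (over SSH, for instance) are invisible to Terraform altogether: it only sees what the provider's API returns.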

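# Detect drift without changing anything
terraform plan -refresh-only   # shows what changed outside Terraform
terraform plan                 # shows what the next apply would do about it

# Output (abridged):
# Note: Objects have changed outside of Terraform
# aws_instance.api will be updated in-place
#   ~ instance_type = "t2.large" -> "t2.medium"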

Automate detection:

# Cron: hourly drift detection (plan -detailed-exitcode exits with 2 when the plan is not empty)
0 * * * * cd /terraform && terraform plan -detailed-exitcode -input=false -no-color > /tmp/drift.txt 2>&1; [ $? -eq 2 ] && mail -s "Drift detected" ops@company.ch < /tmp/drift.txt
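terraform plan -detailed-exitcode makes drift detection scriptable. It exits with:
- 0 when there are no changes,
- 1 on error,
- 2 when the plan is not empty (drift, or code changes not yet applied).

Below is a Go sketch of a scheduled check built on it. It assumes terraform is on the PATH and /terraform is an initialized working directory. classifyPlanExit is a hypothetical helper:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// classifyPlanExit interprets the exit code of `terraform plan -detailed-exitcode`.
func classifyPlanExit(code int) (drift bool, err error) {
	switch code {
	case 0:
		return false, nil // infrastructure matches the configuration
	case 2:
		return true, nil // non-empty plan: drift or pending changes
	default:
		return false, fmt.Errorf("terraform plan failed (exit code %d)", code)
	}
}

func main() {
	cmd := exec.Command("terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color")
	cmd.Dir = "/terraform"
	out, runErr := cmd.CombinedOutput()

	code := 0
	var exitErr *exec.ExitError
	if errors.As(runErr, &exitErr) {
		code = exitErr.ExitCode()
	} else if runErr != nil {
		fmt.Fprintln(os.Stderr, "cannot run terraform:", runErr)
		os.Exit(1)
	}

	drift, err := classifyPlanExit(code)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Stderr.Write(out)
		os.Exit(1)
	}
	if drift {
		fmt.Printf("Drift detected:\n%s", out)
		os.Exit(2) // let the scheduler or CI raise the alert
	}
	fmt.Println("No drift")
}
```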

Or with Atlantis (a bot that runs Terraform from pull requests):

# atlantis.yaml
version: 3
automerge: false  # never merge automatically
projects:
- name: kubernetes
  dir: terraform/environments/production
  workflow: default
  autoplan:
    when_modified: ["*.tf", "*.tfvars"]
    enabled: true
  apply_requirements: [approved]  # apply only once the PR is approved
  # (the server config must allow this override via allowed_overrides)

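Atlantis runs terraform plan automatically when a pull request modifies the project and posts the result as a PR comment. It runs terraform apply when someone comments "atlantis apply".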

Multi-environment: structure

terraform/
├── environments/
│   ├── dev/
│   │   ├── terraform.tfvars
│   │   │   api_replicas = 1
│   │   │   db_instance_type = "db.t2.micro"
│   │   │   enable_backup = false
│   │   └── main.tf (calls the modules)
│   ├── staging/
│   │   ├── terraform.tfvars
│   │   │   api_replicas = 2
│   │   │   db_instance_type = "db.t3.small"
│   │   │   enable_backup = true
│   │   └── main.tf
│   └── production/
│       ├── terraform.tfvars
│       │   api_replicas = 5
│       │   db_instance_type = "db.t3.large"
│       │   enable_backup = true
│       │   backup_retention_days = 30
│       └── main.tf

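Deploy per environment. These tfvars files hold sizing values, not secrets, so they belong in Git (whitelist them if your .gitignore excludes *.tfvars):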

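# Dev (after a one-time terraform init in each directory)
cd terraform/environments/dev
terraform apply -var-file=terraform.tfvars  # terraform.tfvars would also be loaded automatically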

# Staging
cd ../staging
terraform apply -var-file=terraform.tfvars

# Production
cd ../production
terraform apply -var-file=terraform.tfvars

Common gotchas

Gotcha 1: unintended resource deletion

# Existing resource
resource "aws_db_instance" "postgres" {
  allocated_storage = 20
  # ...
}

# Someone deletes this block (or changes an argument that forces replacement)
# terraform apply → destroys the production database!

Solution: protect critical resources:

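resource "aws_db_instance" "postgres" {
  # ...
  deletion_protection = true  # enforced by AWS, even if this block is removed

  lifecycle {
    prevent_destroy = true  # enforced by Terraform while this block exists
  }
}

# Now any plan that would destroy or replace the database fails:
# Error: Instance cannot be destroyed
# Caveat: deleting the whole resource block also deletes prevent_destroy;
# deletion_protection is what still blocks the deletion on the AWS side.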

Gotcha 2: ignoring changes made outside Terraform

resource "aws_instance" "api" {
  ami           = "ami-12345"
  instance_type = "t2.medium"

  tags = {
    Environment = "production"
  }
}

# Someone adds a tag manually (AWS console)
# tags: {
#   Environment = "production"
#   Backup = "daily"   # ← Added manually
# }

# terraform plan detects the difference, and the next apply removes the tag:
# aws_instance.api will be updated in-place

Solution: ignore manual tag changes:

resource "aws_instance" "api" {
  lifecycle {
    ignore_changes = [tags]
  }
}

# Terraform now ignores manually added tags, but also tag changes made in the code.
# To ignore a single key only: ignore_changes = [tags["Backup"]]

Gotcha 3: ordering issues (implicit vs. explicit dependencies)

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_security_group" "web" {
  vpc_id = aws_vpc.main.id  # Terraform infers dependency
}

# Terraform knows the SG depends on the VPC and creates them in that order.
# Good.

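# The same holds here: referencing aws_security_group.elb.id is enough for
# Terraform to create the ELB security group before this rule. No race condition.
resource "aws_security_group_rule" "web_http" {
  type                     = "ingress"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  security_group_id        = aws_security_group.web.id
  source_security_group_id = aws_security_group.elb.id
}

# depends_on is for dependencies Terraform cannot see in the code, e.g. an
# application that calls S3 at boot through a policy attached to the instance role
resource "aws_instance" "web" {
  ami                  = "ami-12345"
  instance_type        = "t2.medium"
  iam_instance_profile = aws_iam_instance_profile.web.name
  depends_on           = [aws_iam_role_policy.web_s3]  # nothing references the policy directly
}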

Terraform production checklist

- State in S3/Blob with encryption, versioning and locking
- Secrets externalized (Vault or CI variables), never in plaintext in the code
- Reusable modules for common resources
- Tests (terraform validate + Terratest)
- Mandatory plan review in CI (nobody applies without a review)
- Automated drift detection
- Structured multi-environment layout (dev/staging/prod)
- Critical resources protected (prevent_destroy)
- CHANGELOG maintained (what changed, and why)

Conclusion

Done right, Terraform gives you reproducible, auditable, tested infrastructure.

Steps to get started:

Set up the modules + environments structure
Secure the state (centralized S3 backend)
Handle secrets properly (Vault)
Tests in CI
Mandatory plan review

Hidora helps design and maintain Terraform at scale.
