Manual deployments are a recipe for configuration drift and 3 AM emergencies. After accidentally deleting a production deployment with a typo'd kubectl command, I embraced GitOps fully. This article covers deploying ArgoCD with detailed explanations of GitOps concepts, integrating with Infisical for secret management, and building CI/CD pipelines that have saved me countless hours.
Understanding GitOps
What is GitOps?
GitOps treats Git as the single source of truth for your infrastructure. Instead of manually running commands to deploy applications, you commit changes to Git and automation handles the deployment. Think of it like this: Git becomes your control panel, and tools like ArgoCD are the robots that make reality match what's in Git.
Key GitOps Concepts:
- Declarative Configuration: You describe the desired state ("I want 3 replicas") rather than imperative commands ("create 3 replicas")
- Version Control: Every change is tracked in Git history
- Automated Reconciliation: Tools continuously ensure cluster matches Git
- Pull-based Deployment: Cluster pulls changes from Git (more secure than pushing)
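To make "automated reconciliation" concrete, here is a minimal sketch of a single reconciliation pass in plain shell. The reconcile function and its inputs are hypothetical stand-ins, not part of ArgoCD: the first argument plays the role of the replica count declared in Git, the second the count observed in the cluster.

```shell
#!/bin/sh
# Conceptual sketch of one reconciliation pass.
# reconcile DESIRED ACTUAL is a hypothetical helper, not an ArgoCD command:
# it compares the declared state with the observed state and prints the
# action a controller would take to close the gap.
reconcile() {
    desired=$1
    actual=$2
    if [ "$actual" -lt "$desired" ]; then
        echo "scale up by $((desired - actual))"
    elif [ "$actual" -gt "$desired" ]; then
        echo "scale down by $((actual - desired))"
    else
        echo "in sync"
    fi
}

reconcile 3 1   # someone scaled the deployment down by hand
reconcile 3 3   # cluster already matches Git
```

A real controller runs this comparison continuously and for every field of every resource, but the idea is the same: you only ever edit the desired value, never the action.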
Why GitOps Changes Everything
Before GitOps, my workflow was:
- Edit YAML files locally
- Run kubectl apply -f and hope for the best (a manual command to apply changes)
- Forget what I changed last week
- Panic when something breaks
Now with GitOps:
- All changes go through Git PR review (Pull Request - proposed changes)
- ArgoCD automatically syncs desired state from Git to cluster
- Full audit trail of who changed what and when
- Easy rollback to any previous state (just revert the Git commit)
- Cluster rebuilds from scratch in minutes
The Power of Git History:
# See what changed and when
git log --oneline
# Revert a bad change
git revert abc123
# ArgoCD automatically applies the revert!
Installing ArgoCD
What is ArgoCD?
ArgoCD is a GitOps continuous delivery tool for Kubernetes. It watches your Git repositories and automatically deploys changes to your cluster. Think of it as a bridge between Git and Kubernetes that keeps them in sync.
Deploy ArgoCD with High Availability
# argocd-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: argocd # Dedicated namespace for ArgoCD components
labels:
# Pod Security Standards - controls what pods can do
pod-security.kubernetes.io/enforce: baseline # Enforces basic security
pod-security.kubernetes.io/audit: restricted # Logs violations of strict rules
pod-security.kubernetes.io/warn: restricted # Warns about strict violations
# Create the ArgoCD namespace first
kubectl apply -f argocd-namespace.yaml
# Install ArgoCD with HA (High Availability) configuration
# HA means multiple replicas for reliability
# Note: Check latest releases at github.com/argoproj/argo-cd/releases
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.1.3/manifests/ha/install.yaml
# Wait for each component to be ready
# rollout status watches deployment until all pods are running
kubectl -n argocd rollout status deployment/argocd-server # Web UI and API
kubectl -n argocd rollout status deployment/argocd-repo-server # Git repository cache
kubectl -n argocd rollout status deployment/argocd-applicationset-controller # Manages app sets
# Verify all pods are running
kubectl -n argocd get pods
# Should see multiple pods all in Running state
Configure ArgoCD for Production
# argocd-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-cm
namespace: argocd
data:
url: "https://argocd.homelab.example"
# Disable anonymous access for security
users.anonymous.enabled: "false"
# Disable built-in admin after setting up SSO or local users
# accounts.admin.enabled: "false" # Uncomment after SSO setup
# Enable GPG signature verification for commits (supply chain security)
gpg.enabled: "true"
# Multi-controller support: unique instance label if running multiple Argo CDs
# application.instanceLabelKey: gitops.argocd.io/instance
# Session security: keep sessions reasonably short
# Note: If you see "unknown key" in logs, your version doesn't support this (defaults to 24h anyway)
users.session.duration: "24h" # Version-dependent, safe to omit
# Health checks for common CRDs (cleaner dashboards)
resource.customizations.health.ceph.rook.io_CephCluster: |
hs = {}
if obj.status ~= nil and obj.status.phase == "Ready" then
hs.status = "Healthy"
hs.message = "CephCluster is Ready"
else
hs.status = "Progressing"
hs.message = "CephCluster not Ready"
end
return hs
resource.customizations.health.cert-manager.io_Certificate: |
hs = {}
if obj.status ~= nil and obj.status.conditions ~= nil then
for _, c in ipairs(obj.status.conditions) do
if c.type == "Ready" and c.status == "True" then
hs.status = "Healthy"; hs.message = "Certificate Ready"; return hs
end
end
end
hs.status = "Progressing"; hs.message = "Waiting for Ready"; return hs
# Resource customizations - handle special cases
# Using split-key format (group_Kind) avoids YAML-in-YAML complexity
# Format: resource.customizations.<action>.<group>_<Kind>
resource.customizations.ignoreDifferences.admissionregistration.k8s.io_MutatingWebhookConfiguration: |
jqPathExpressions:
- '.webhooks[].clientConfig.caBundle' # Ignore ALL webhook cert bundles
resource.customizations.ignoreDifferences.admissionregistration.k8s.io_ValidatingWebhookConfiguration: |
jqPathExpressions:
- '.webhooks[].clientConfig.caBundle'
# Optional: Ignore common drift from HPAs and Services
resource.customizations.ignoreDifferences.apps_Deployment: |
jsonPointers:
- /spec/replicas # HPA manages this
resource.customizations.ignoreDifferences._Service: |
jsonPointers:
- /spec/clusterIP # Kubernetes assigns this
# Exclude resources from sync - ArgoCD won't manage these
resource.exclusions: |
- apiGroups:
- cilium.io
kinds:
- CiliumIdentity # Cilium creates these dynamically, don't sync
clusters:
- "*"
---
# argocd-rbac-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  # RBAC (Role-Based Access Control) policy
  policy.default: role:readonly # Default users can only read
  policy.csv: |
    # Policy rules in CSV format. Keep comments on their own lines so they
    # cannot end up inside the last field of a rule.
    # p = policy rule: role, resource, action, object, effect
    # Actions are verbs: get|create|update|delete|sync|override
    # Admins can do anything with apps
    p, role:admin, applications, *, */*, allow
    # Argo CD v3: explicit update and delete
    p, role:admin, applications, update/*, */*, allow
    p, role:admin, applications, delete/*, */*, allow
    # Admins can manage clusters, repos, certs, and projects
    p, role:admin, clusters, *, *, allow
    p, role:admin, repositories, *, *, allow
    p, role:admin, certificates, *, *, allow
    p, role:admin, projects, *, *, allow
    # Group argocd-admins has the admin role
    g, argocd-admins, role:admin
Apply the configuration:
# Apply both ConfigMaps with our custom settings
kubectl apply -f argocd-cm.yaml -f argocd-rbac-cm.yaml
# Optional: Apply performance tuning and security hardening
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-cmd-params-cm
namespace: argocd
data:
controller.status.processors: "20" # upstream default is 20; raise for many apps
controller.operation.processors: "10" # upstream default is 10; tune to cluster size
reposerver.parallelism.limit: "20" # concurrent manifest generations
server.insecure: "false" # Disable plain HTTP, TLS-only
EOF
# Optional: Add GPG keys for commit verification
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-gpg-keys-cm
namespace: argocd
data:
# Add your organization's signing keys here
# your-key.asc: |
# -----BEGIN PGP PUBLIC KEY BLOCK-----
# ...key content...
# -----END PGP PUBLIC KEY BLOCK-----
EOF
# Or if they're in one file separated by ---:
# kubectl apply -f argocd-configs.yaml
# Restart ALL ArgoCD components to load new configuration
# Different components read different config keys
# Note: the application controller is a StatefulSet in the upstream manifests
kubectl -n argocd rollout restart \
  deploy/argocd-server \
  deploy/argocd-repo-server \
  statefulset/argocd-application-controller
# Watch the restart progress
kubectl -n argocd rollout status deploy/argocd-server
Expose ArgoCD via LoadBalancer
What's a LoadBalancer?
A LoadBalancer Service gets an external IP address that routes traffic to your pods. In our homelab, Cilium's L2 announcement provides this IP from our defined pool.
First, ensure Cilium has L2 announcements enabled and create the LoadBalancer IP pool:
# cilium-loadbalancer-pool.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
name: default
spec:
blocks:
- cidr: 192.168.0.200/29 # Adjust to your network
serviceSelector:
matchLabels:
lb-pool: default
Note: Cilium must be deployed with L2 announcements enabled (l2announcements.enabled=true) for LoadBalancer services to work in bare metal environments.
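Before adjusting the CIDR to your network, it helps to see exactly which addresses a block covers so it does not collide with your DHCP range. The following is a small sketch in POSIX shell; cidr_range and int_to_ip are hypothetical helper names, and the script only does the arithmetic, it does not talk to Cilium.

```shell
#!/bin/sh
# int_to_ip N: print a 32-bit integer as a dotted-quad IPv4 address.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# cidr_range A.B.C.D/P: print "<first address> <last address> <address count>".
cidr_range() {
    ip=${1%/*}        # address part, e.g. 192.168.0.200
    prefix=${1#*/}    # prefix length, e.g. 29
    old_ifs=$IFS
    IFS=.
    set -- $ip        # split the dotted quad into $1..$4
    IFS=$old_ifs
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    size=$(( 1 << (32 - prefix) ))
    first=$(( addr & ~(size - 1) ))   # clear the host bits
    last=$(( first + size - 1 ))
    echo "$(int_to_ip "$first") $(int_to_ip "$last") $size"
}

cidr_range 192.168.0.200/29
```

For the pool above this prints 192.168.0.200 192.168.0.207 8, so the block spans .200 through .207.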
Then patch the existing ArgoCD service to use LoadBalancer:
# Apply the IP pool first
kubectl apply -f cilium-loadbalancer-pool.yaml
# Patch the existing ArgoCD server service to LoadBalancer
kubectl -n argocd patch svc argocd-server -p '{"spec":{"type":"LoadBalancer"}}'
# Label the service to use our IP pool
kubectl -n argocd label svc argocd-server lb-pool=default
# Get the LoadBalancer IP (may take a minute to assign)
kubectl -n argocd get svc argocd-server
# Output explained:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
argocd-server LoadBalancer 10.96.50.100 192.168.0.202 80:30901/TCP,443:31890/TCP
# Internal IP Your LAN IP HTTP:NodePort,HTTPS:NodePort
# Access ArgoCD at: https://192.168.0.202
# Note: You'll get a certificate warning - that's expected for now
# In Part 6, we'll add Traefik ingress with trusted certificates
# Optional: Preserve real client IPs (if your L2/BGP supports it)
# kubectl -n argocd patch svc argocd-server -p '{"spec":{"externalTrafficPolicy":"Local"}}'
# Optional: Remove unused HTTP port since we're TLS-only
# Method 1: Test and remove by index (if http is first)
kubectl -n argocd patch svc argocd-server --type json \
-p '[{"op":"test","path":"/spec/ports/0/name","value":"http"},{"op":"remove","path":"/spec/ports/0"}]'
# Method 2: Order-agnostic removal using jq
kubectl -n argocd get svc argocd-server -o json | \
jq 'del(.spec.ports[] | select(.name=="http"))' | \
kubectl apply -f -
Secret Management with Infisical
Why Not Store Secrets in Git?
Git history is permanent - once a secret is committed, it's visible forever even if deleted. Anyone with repository access can see all secrets. Instead, we store secrets in a secure vault (Infisical) and reference them from our manifests.
What is Infisical?
Infisical is an open-source secret management platform. It stores secrets encrypted and provides them to applications at runtime. Think of it as a secure password manager for your applications.
Install Infisical Operator
# Add Infisical's official Helm repository (Cloudsmith)
# Helm repositories are like app stores for Kubernetes
helm repo add infisical 'https://dl.cloudsmith.io/public/infisical/helm-charts/helm/charts/'
helm repo update # Refresh available charts
# Install the Infisical operator
# Operator = controller that manages Infisical secrets in Kubernetes
# --namespace / --create-namespace: install in a dedicated namespace, creating it if needed
# --version: chart version with secretsScope support
helm install infisical-operator infisical/secrets-operator \
  --namespace infisical \
  --create-namespace \
  --version 0.11.1
# Note: Check latest releases if using newer versions - CRD fields may evolve
# Verify fields with: kubectl explain infisicalsecrets.spec --recursive
# Verify installation
kubectl -n infisical get pods
# Should see infisical-operator pod running
Configure Infisical Authentication
Setting up Machine Identity:
Machine identities are service accounts for applications. Unlike user accounts, they're designed for automated systems. You create these in the Infisical dashboard.
# infisical-auth.yaml
apiVersion: v1
kind: Secret
metadata:
name: infisical-auth
namespace: argocd
stringData: # stringData allows plain text (stored base64-encoded in etcd; enable etcd encryption for at-rest security)
identity-id: "YOUR_IDENTITY_ID" # From Infisical dashboard
clientId: "YOUR_CLIENT_ID" # From Infisical dashboard; universalAuth reads the clientId key
clientSecret: "YOUR_CLIENT_SECRET" # From Infisical dashboard; universalAuth reads the clientSecret key
# Key names can differ between operator versions; check the operator docs for yours
# SECURITY WARNING: Never commit this file with real values!
# Create the secret out-of-band instead:
# kubectl -n argocd create secret generic infisical-auth \
# --from-literal=identity-id=... \
# --from-literal=clientId=... \
# --from-literal=clientSecret=...
---
ArgoCD Repository Credentials
Choose your credential pattern:
- Use repo-creds for org/domain-wide access (one secret covers many repos)
- Use repository for single repo access (more granular control)
Create a template secret that Infisical will populate:
# gitlab-repo-creds.yaml
apiVersion: v1
kind: Secret
metadata:
name: gitlab-repo-creds
namespace: argocd
labels:
argocd.argoproj.io/secret-type: repo-creds # repo-creds applies to all matching repos
stringData:
type: git
url: https://gitlab.com/your-org # Matches all repos under this org
username: not-used # GitLab uses tokens as passwords
password: "<INFISICAL_WILL_INJECT>" # Infisical operator will replace this
Then create the InfisicalSecret to manage it:
# infisical-gitlab-secret.yaml
apiVersion: secrets.infisical.com/v1alpha1
kind: InfisicalSecret
metadata:
name: gitlab-repo-creds-manager
namespace: argocd
spec:
hostAPI: https://app.infisical.com/api
authentication:
universalAuth:
credentialsRef:
secretName: infisical-auth
secretNamespace: argocd
secretsScope:
projectSlug: homelab
envSlug: prod
secretsPath: /argocd
managedKubeSecretReferences:
- secretName: gitlab-repo-creds
secretNamespace: argocd
How This Works:
- The repo-creds type means one credential covers all repos under gitlab.com/your-org
- Infisical operator watches for InfisicalSecret resources
- It fetches the token from Infisical using the machine identity
- Updates the gitlab-repo-creds secret with the actual token
- ArgoCD uses this credential for any repo matching the URL pattern
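The "matching the URL pattern" step is a prefix match. Here is a simplified model of it in shell, assuming a plain string-prefix comparison; creds_match is a hypothetical helper, and ArgoCD's real matching may normalize URLs before comparing.

```shell
#!/bin/sh
# creds_match REPO_URL CRED_URL: succeed when the repository URL starts with
# the credential template URL. Hypothetical helper, plain prefix comparison.
creds_match() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

cred_url="https://gitlab.com/your-org"
for repo in \
    "https://gitlab.com/your-org/k8s-manifests" \
    "https://gitlab.com/other-org/app"
do
    if creds_match "$repo" "$cred_url"; then
        echo "$repo -> uses gitlab-repo-creds"
    else
        echo "$repo -> no matching credential template"
    fi
done
```

Under this model a sibling group such as your-org-tools would match as well, so choose the template URL with that in mind.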
Project Structure
Organizing for GitOps:
A well-organized repository makes GitOps manageable. Here's the structure I use:
k8s-manifests/ # Root of your GitOps repository
├── bootstrap/ # Initial cluster setup
│ ├── argocd/ # ArgoCD itself
│ │ └── kustomization.yaml # Kustomize configuration
│ └── infisical/ # Secret management
│ └── kustomization.yaml
├── infrastructure/ # Cluster-wide services
│ ├── cert-manager/ # SSL certificate management
│ ├── cilium/ # CNI networking
│ ├── rook-ceph/ # Storage
│ └── traefik/ # Ingress controller
├── applications/ # Your actual applications
│ ├── production/ # Production environment
│ │ ├── app1/ # Each app in its folder
│ │ └── app2/
│ └── staging/ # Staging environment
│ ├── app1/ # Same apps, different config
│ └── app2/
├── clusters/ # Cluster-specific configuration
│ └── homelab/ # Your cluster name
│ ├── apps.yaml # Application definitions
│ └── infrastructure.yaml # Infrastructure definitions
└── .gitlab-ci.yml # CI/CD pipeline configuration
Why This Structure:
- Separation of concerns: Infrastructure vs applications
- Environment isolation: Staging and production separate
- Single source of truth: One place for all configurations
- Easy navigation: Logical folder structure
App of Apps Pattern
What is App of Apps?
Instead of managing dozens of individual applications, you create one "parent" application that manages all others. It's like having a manager that oversees multiple teams - you only need to talk to the manager.
Create AppProjects First
Note: Applications reference projects for access control. Create these before deploying apps:
# app-projects.yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
name: infrastructure
namespace: argocd
spec:
description: Infrastructure applications
sourceRepos:
- https://gitlab.com/your-org/k8s-manifests # Specific repo (more secure)
- https://charts.bitnami.com/bitnami # Helm charts
- https://charts.rook.io/release # Rook-Ceph charts
destinations:
- namespace: 'argocd'
server: https://kubernetes.default.svc
- namespace: 'infisical'
server: https://kubernetes.default.svc
- namespace: 'cert-manager'
server: https://kubernetes.default.svc
- namespace: 'kube-system'
server: https://kubernetes.default.svc
- namespace: 'rook-ceph'
server: https://kubernetes.default.svc
- namespace: 'traefik'
server: https://kubernetes.default.svc
- namespace: 'monitoring'
server: https://kubernetes.default.svc
- namespace: 'prometheus'
server: https://kubernetes.default.svc
# Alternative for homelab simplicity (less secure):
# - namespace: '*'
# server: https://kubernetes.default.svc
clusterResourceWhitelist:
- group: '' # Core resources
kind: 'Namespace'
- group: 'apiextensions.k8s.io' # CRDs for infrastructure apps
kind: 'CustomResourceDefinition'
- group: 'cilium.io'
kind: '*'
- group: 'cert-manager.io'
kind: '*'
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
name: applications
namespace: argocd
spec:
description: Business applications
sourceRepos:
- https://gitlab.com/your-org/k8s-manifests
- https://charts.bitnami.com/bitnami
destinations:
- namespace: 'databases'
server: https://kubernetes.default.svc
- namespace: 'production'
server: https://kubernetes.default.svc
- namespace: 'staging'
server: https://kubernetes.default.svc
namespaceResourceWhitelist:
- group: '*'
kind: '*'
orphanedResources:
warn: true # Warn about resources not tracked in Git
# Apply projects first
kubectl apply -f app-projects.yaml
# clusters/homelab/infrastructure.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application # ArgoCD Application resource
metadata:
name: infrastructure
namespace: argocd
labels:
gitops.argocd.io/project: infrastructure # Label for CI/CD pipeline selectors
finalizers: # Ensures resources are cleaned up properly on deletion
- resources-finalizer.argocd.argoproj.io
spec:
project: infrastructure # ArgoCD project (for RBAC)
source:
repoURL: https://gitlab.com/your-org/k8s-manifests # Git repository
targetRevision: main # Branch to track
path: infrastructure # Folder in repo
destination:
server: https://kubernetes.default.svc # Deploy to same cluster
syncPolicy:
automated: # Automatic synchronization settings
prune: true # Delete resources not in Git
selfHeal: true # Fix drift automatically
allowEmpty: false # Don't sync if path is empty
syncOptions:
- CreateNamespace=true # Create namespaces if needed
- PrunePropagationPolicy=foreground # Delete dependencies first
- PruneLast=true # Delete parent resources last
- ServerSideApply=true # Reduce patch conflicts
retry:
limit: 5 # Retry failed syncs 5 times
backoff:
duration: 5s # Start with 5 second wait
factor: 2 # Double wait time each retry
maxDuration: 3m # Max 3 minutes between retries
revisionHistoryLimit: 10 # Keep 10 previous versions for rollback
---
# clusters/homelab/apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: applications
namespace: argocd
labels:
gitops.argocd.io/project: applications # Label for CI/CD pipeline selectors
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: applications
source:
repoURL: https://gitlab.com/your-org/k8s-manifests
targetRevision: main
path: applications/production
destination:
server: https://kubernetes.default.svc
syncPolicy:
automated:
prune: false # Don't auto-delete apps
selfHeal: true
syncOptions:
- CreateNamespace=true
- ServerSideApply=true # Reduce patch conflicts
revisionHistoryLimit: 10
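The retry settings in infrastructure.yaml above (duration 5s, factor 2, maxDuration 3m, limit 5) describe an exponential backoff. This sketch computes the resulting wait times, assuming each wait is the previous one multiplied by the factor and capped at the maximum; backoff_schedule is a hypothetical helper, not an ArgoCD command.

```shell
#!/bin/sh
# backoff_schedule BASE FACTOR CAP LIMIT: print the wait (in seconds) before
# each retry, assuming delay = BASE * FACTOR^(retry - 1), capped at CAP.
backoff_schedule() {
    delay=$1
    factor=$2
    cap=$3
    limit=$4
    attempt=1
    out=""
    while [ "$attempt" -le "$limit" ]; do
        if [ "$delay" -gt "$cap" ]; then
            delay=$cap
        fi
        out="$out${out:+ }$delay"
        delay=$((delay * factor))
        attempt=$((attempt + 1))
    done
    echo "$out"
}

# duration: 5s, factor: 2, maxDuration: 3m (180s), limit: 5
backoff_schedule 5 2 180 5
```

With these values the waits are 5, 10, 20, 40, and 80 seconds, so the 3-minute cap never comes into play; it only matters if you raise the limit or the factor.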
Deploy the App of Apps:
# Apply the parent applications
kubectl apply -f clusters/homelab/infrastructure.yaml
kubectl apply -f clusters/homelab/apps.yaml
# Get ArgoCD initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
# Login to ArgoCD CLI (install with: brew install argocd)
# --insecure is needed for self-signed certificates
argocd login 192.168.0.202 --username admin --password <password> --insecure
# SECURITY: After setting up SSO or changing admin password, delete the initial secret
# argocd account update-password # Change password first
# kubectl -n argocd delete secret argocd-initial-admin-secret
# Watch sync status
argocd app list # List all applications
argocd app sync infrastructure # Manually trigger sync
argocd app get infrastructure # Detailed status
# Or use the Web UI at https://192.168.0.202
CI/CD Pipeline with GitLab
What's CI/CD?
- CI (Continuous Integration): Automatically test and validate code changes
- CD (Continuous Deployment): Automatically deploy validated changes
Here's my production pipeline that validates and deploys:
# .gitlab-ci.yml
stages: # Pipeline stages run in order
- validate # Check YAML syntax
- test # Security and compatibility checks
- deploy # Deploy to cluster
variables:
ARGOCD_SERVER: argocd.homelab.example
# ARGOCD_TOKEN set in GitLab CI/CD variables (synced from Infisical)
# Validate YAML syntax
validate:yaml:
stage: validate
image: alpine:latest # Lightweight Linux container
before_script:
- apk add --no-cache yamllint # Install YAML linter
script:
- yamllint -c .yamllint . # Check all YAML files
rules:
- if: $CI_MERGE_REQUEST_ID # Only run on pull requests
# Validate Kubernetes manifests
validate:k8s:
stage: validate
image: alpine:latest
before_script:
- apk add --no-cache curl tar
# Download and install kubeconform (validates K8s YAML - kubeval successor)
- curl -L -o /tmp/kubeconform.tar.gz https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz
- tar -xzf /tmp/kubeconform.tar.gz -C /usr/local/bin kubeconform
- chmod +x /usr/local/bin/kubeconform
script:
# Validate all YAML files recursively with strict mode
# Pin to your cluster version to reduce false positives
- export K8S_VERSION=${K8S_VERSION:-1.29.0}
- kubeconform -kubernetes-version $K8S_VERSION -strict -summary -ignore-missing-schemas -recursive .
rules:
- if: $CI_MERGE_REQUEST_ID # Only on pull requests
# Security scanning with Kubesec
security:scan:
stage: test
image: alpine:latest
before_script:
- apk add --no-cache curl jq # curl for HTTP, jq for JSON
script:
- | # Multi-line script
for file in $(find . -name "*.yaml" -type f); do
echo "Scanning $file for security issues"
# Send file to Kubesec API for scanning
# Note: This uses a public API - consider self-hosted scanner for sensitive repos
curl -X POST --data-binary @"$file" https://v2.kubesec.io/scan | jq .
# Kubesec checks for security issues like:
# - Running as root
# - Missing security contexts
# - Privileged containers
done
rules:
- if: $CI_MERGE_REQUEST_ID
# Dry-run with ArgoCD
test:dryrun:
  stage: test
  image: quay.io/argoproj/argocd:v3.1.3 # ArgoCD CLI image (upstream publishes to quay.io)
  script:
    - set -euo pipefail # Fail fast on errors
    # Show what would change without actually changing
    # First hard-refresh, then diff
    - argocd app get applications --server $ARGOCD_SERVER --auth-token $ARGOCD_TOKEN --hard-refresh
    # argocd app diff exits 0 (no diff), 1 (diff found), or 2 (error),
    # so only an exit code above 1 should fail the job
    - |
      rc=0
      argocd app diff applications --server $ARGOCD_SERVER --auth-token $ARGOCD_TOKEN > argocd-diff.txt || rc=$?
      cat argocd-diff.txt
      [ "$rc" -le 1 ]
  artifacts:
    when: always
    paths:
      - argocd-diff.txt # Keep diff for MR review
    expire_in: 1 week
  rules:
    - if: $CI_MERGE_REQUEST_ID # Only on pull requests
# Deploy to production
deploy:production:
  stage: deploy
  image: quay.io/argoproj/argocd:v3.1.3
  script:
    - |
      echo "🚀 Syncing ArgoCD applications..."
      # app sync doesn't support --project, use label selector
      # --server: ArgoCD server URL, --auth-token: authentication
      # --prune: remove deleted resources, --retry-limit: retry 3 times on failure
      argocd app sync -l gitops.argocd.io/project=applications \
        --server $ARGOCD_SERVER \
        --auth-token $ARGOCD_TOKEN \
        --prune \
        --retry-limit 3
      # Wait for sync to complete (up to 300 seconds = 5 minutes)
      argocd app wait -l gitops.argocd.io/project=applications \
        --server $ARGOCD_SERVER \
        --auth-token $ARGOCD_TOKEN \
        --timeout 300
  environment:
    name: production
    url: https://argocd.homelab.example # Link in GitLab UI
  rules:
    - if: $CI_COMMIT_BRANCH == "main" # Only deploy from main branch
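# Rollback on failure
rollback:auto:
  stage: deploy
  image: quay.io/argoproj/argocd:v3.1.3
  script:
    - |
      echo "⚠️ Deployment failed, rolling back..."
      # Rollback to previous version (defaults to previous)
      # Note: Argo CD refuses a rollback while automated sync is enabled on the app
      argocd app rollback applications \
        --server $ARGOCD_SERVER \
        --auth-token $ARGOCD_TOKEN
  # on_failure fires when a job in an EARLIER stage fails; to react to
  # deploy:production itself, run this job in a stage that comes after deploy
  when: on_failure
  rules:
    - if: $CI_COMMIT_BRANCH == "main"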
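One detail worth knowing for the dry-run job: argocd app diff uses its exit code to report the result (0 when there is no diff, 1 when a diff is found, 2 on a general error), so a script running with set -e has to tell "changes found" apart from "command failed". A minimal sketch of that decision follows; classify_diff_exit and fake_diff are hypothetical names, and fake_diff only stands in for the real CLI so the example runs anywhere.

```shell
#!/bin/sh
# classify_diff_exit CODE: map the exit code of `argocd app diff` to a
# pipeline decision. Hypothetical helper, not part of the argocd CLI.
classify_diff_exit() {
    case "$1" in
        0) echo "no changes" ;;
        1) echo "changes found" ;;
        *) echo "error" ;;
    esac
}

# Stand-in for the real CLI call. In a pipeline this would be something like:
#   argocd app diff applications > argocd-diff.txt || rc=$?
fake_diff() { return "$1"; }

for code in 0 1 2; do
    rc=0
    fake_diff "$code" || rc=$?
    echo "exit $rc: $(classify_diff_exit "$rc")"
done
```

Capturing the code with `|| rc=$?` keeps set -e from aborting the job on exit code 1, which is the normal case for a merge request that changes something.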
Application Deployment Example
Deploying with Helm Charts:
Helm charts are pre-packaged Kubernetes applications. Instead of writing all YAML yourself, you use a template and customize it with values.
# applications/production/postgresql/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: postgresql
namespace: argocd
labels:
gitops.argocd.io/project: applications # For CI/CD selectors
spec:
project: applications # ArgoCD project for permissions
source:
repoURL: https://charts.bitnami.com/bitnami # Helm chart repository
chart: postgresql # Chart name
targetRevision: 12.12.10 # Chart version
helm:
releaseName: postgresql # Name for this installation
values: | # Custom configuration
auth:
database: homelab
existingSecret: postgresql-secret # Uses secret from Infisical
secretKeys:
adminPasswordKey: postgres-password
userPasswordKey: postgres-password
replicationPasswordKey: postgres-replication-password
primary:
persistence:
enabled: true
storageClass: ceph-block-fast
size: 50Gi
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 2000m
memory: 2Gi
metrics:
enabled: true
serviceMonitor:
enabled: true
destination:
server: https://kubernetes.default.svc
namespace: databases
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
Managing Secrets with Infisical
How Secret Sync Works:
- Define secrets in Infisical dashboard
- Create InfisicalSecret resource in Kubernetes
- Infisical operator fetches and creates Kubernetes secret
- Application uses the Kubernetes secret
# applications/production/postgresql/infisical-secret.yaml
apiVersion: secrets.infisical.com/v1alpha1
kind: InfisicalSecret
metadata:
name: postgresql-secret
namespace: databases # Where the secret will be created
spec:
hostAPI: https://app.infisical.com/api
authentication:
universalAuth:
credentialsRef:
secretName: infisical-auth # Auth credentials
secretNamespace: argocd
secretsScope:
projectSlug: homelab # Infisical project
envSlug: prod # Environment
secretsPath: /databases/postgresql # Path in Infisical
managedKubeSecretReferences:
- secretName: postgresql-secret # K8s secret to create
secretNamespace: databases # In this namespace
# The Bitnami chart expects these keys in the secret:
# - postgres-password (admin password)
# - password (user password, if creating custom user)
# - postgres-replication-password (replication password)
# You've overridden these via auth.secretKeys above—ensure your Infisical secret matches
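A quick way to catch a mismatch between the keys the chart expects and the keys Infisical actually synced is to compare the two lists. This is a small sketch with a hypothetical missing_keys helper; the list of present keys is hard-coded here, and in practice you might fill it from something like `kubectl -n databases get secret postgresql-secret -o json | jq -r '.data | keys | join(" ")'`.

```shell
#!/bin/sh
# missing_keys "REQUIRED..." "PRESENT...": print the required key names that
# are absent from the present list. Both arguments are space-separated lists.
missing_keys() {
    missing=""
    for key in $1; do
        case " $2 " in
            *" $key "*) ;;                                  # key is present
            *) missing="$missing${missing:+ }$key" ;;       # key is absent
        esac
    done
    echo "$missing"
}

# Keys referenced by auth.secretKeys in the PostgreSQL Application
required="postgres-password postgres-replication-password"

# Illustrative value only: pretend Infisical synced a single key
present="postgres-password"

echo "missing: $(missing_keys "$required" "$present")"
```

An empty result means every key the chart references exists in the Kubernetes secret.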
Sync Strategies
Choosing the Right Strategy:
Not all applications should auto-sync. Critical infrastructure needs manual approval, while development apps can sync automatically.
# Critical infrastructure - manual sync
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rook-ceph # Storage is critical - don't auto-update
spec:
  # No "automated" block: syncs only run on a manual 'argocd app sync' command
  # (automated is an object, so "automated: false" is not valid)
  syncPolicy: {}
---
# Development apps - auto sync everything
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: dev-app
spec:
syncPolicy:
automated:
prune: true # Delete removed resources
selfHeal: true # Fix any manual changes
allowEmpty: false # Don't sync if Git path is empty
---
# Production apps - careful auto sync
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: prod-app
spec:
syncPolicy:
automated:
prune: false # Never auto-delete in production
selfHeal: true # Fix drift from desired state
syncOptions:
- ApplyOutOfSyncOnly=true # Don't touch unchanged resources
- RespectIgnoreDifferences=true # Honor ignoreDifferences settings
Multi-Environment Management
What is Kustomize?
Kustomize lets you customize Kubernetes resources without templates. You define a base configuration and then apply patches for different environments. It's like having a base recipe and adding different spices for different tastes.
# applications/base/app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources: # Base resources used by all environments
- deployment.yaml
- service.yaml
- ingress.yaml
---
# applications/staging/app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base/app # Start with base configuration
patchesStrategicMerge:
- deployment-patch.yaml # Apply staging-specific changes
configMapGenerator: # Create ConfigMap with environment variables
- name: app-config
literals:
- environment=staging
- debug=true # Enable debug in staging
---
# applications/production/app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base/app # Same base as staging
patchesStrategicMerge:
- deployment-patch.yaml # Production-specific changes
configMapGenerator:
- name: app-config
literals:
- environment=production
- debug=false # No debug in production
replicas: # Override replica count for production
- name: app
count: 3 # Run 3 replicas in production (vs 1 in staging)
Monitoring ArgoCD
Prometheus Metrics
# argocd-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
name: argocd-metrics
namespace: argocd
labels:
app.kubernetes.io/name: argocd-metrics # ServiceMonitor matches this label
spec:
ports:
- name: metrics
port: 8082 # Application controller metrics port
protocol: TCP
targetPort: 8082
selector:
app.kubernetes.io/name: argocd-application-controller # Selects controller pods
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: argocd-metrics
namespace: argocd
spec:
selector:
matchLabels:
app.kubernetes.io/name: argocd-metrics # Matches Service label, not pod label
namespaceSelector:
matchNames: ["argocd"]
endpoints:
- port: metrics
interval: 30s
Metrics Endpoints:
- Application Controller:
:8082/metrics(app health, sync status) - API Server:
:8083/metrics(API requests) - Repo Server:
:8084/metrics(Git operations) - ApplicationSet Controller:
:8080/metrics(appset operations)
Complete Metrics Setup
To monitor all ArgoCD components (server:8083, repo:8084, appset:8080), create Services and ServiceMonitors:
# argocd-metrics-complete.yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-server-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-server-metrics
spec:
  selector:
    app.kubernetes.io/name: argocd-server
  ports:
    - name: metrics
      port: 8083
      targetPort: 8083
---
apiVersion: v1
kind: Service
metadata:
  name: argocd-repo-server-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-repo-server-metrics
spec:
  selector:
    app.kubernetes.io/name: argocd-repo-server
  ports:
    - name: metrics
      port: 8084
      targetPort: 8084
---
apiVersion: v1
kind: Service
metadata:
  name: argocd-applicationset-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-applicationset-metrics
spec:
  selector:
    app.kubernetes.io/name: argocd-applicationset-controller
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics
  namespaceSelector:
    matchNames: ["argocd"]
  endpoints:
    - port: metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-repo-server-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-repo-server-metrics
  namespaceSelector:
    matchNames: ["argocd"]
  endpoints:
    - port: metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-applicationset-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-applicationset-metrics
  namespaceSelector:
    matchNames: ["argocd"]
  endpoints:
    - port: metrics
      interval: 30s
Note: Ensure your Prometheus Operator is configured to watch ServiceMonitors in the argocd namespace.
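Before wiring up Prometheus, it can help to confirm that each endpoint actually serves metrics. The sketch below is one way to do that, not part of the original setup: `metrics_port` and `check_metrics` are hypothetical helper names, and `check_metrics` assumes `kubectl` and `curl` are installed and that the Services above exist.

```shell
#!/bin/sh
# metrics_port (hypothetical helper): map an ArgoCD component to the
# metrics port listed under "Metrics Endpoints".
metrics_port() {
  case "$1" in
    application-controller)    echo 8082 ;;
    server)                    echo 8083 ;;
    repo-server)               echo 8084 ;;
    applicationset-controller) echo 8080 ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# check_metrics <service> <port> (hypothetical helper): port-forward the
# Service briefly and print how many metric sample lines it returns.
# Needs kubectl, curl, and access to the cluster.
check_metrics() {
  kubectl -n argocd port-forward "svc/$1" "$2:$2" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # give the port-forward a moment to establish
  curl -s "http://localhost:$2/metrics" | grep -vc '^#'
  kill "$pf_pid"
}
```

For example, `check_metrics argocd-repo-server-metrics "$(metrics_port repo-server)"` should print a non-zero count if the repo server endpoint is reachable.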
Key Metrics to Watch
# Application health status - how many apps are unhealthy?
sum(argocd_app_info{health_status!="Healthy"})
# Alert if > 0
# Sync operations over the last hour - how busy is ArgoCD?
# (increase() gives a count per window; rate() would give a per-second value)
sum(increase(argocd_app_sync_total[1h]))
# Normal: 10-50/hour, High: >100/hour
# Application count by sync status - what's out of sync?
sum by (sync_status) (argocd_app_info)
# Should mostly show "Synced"
# Git requests over the last hour - how many Git operations?
sum(increase(argocd_git_request_total[1h]))
# Alert if the count is unusually high (may indicate issues)
# Apps seen OutOfSync at any point in the last 15 minutes (the trailing ":" makes this a subquery)
# For "continuously OutOfSync for 15 minutes", put the inner expression in an alert rule with for: 15m
max_over_time((sum by (name) (argocd_app_info{sync_status="OutOfSync"}))[15m:]) > 0
Security Hardening
Network Policies
Lock down the ArgoCD namespace with strict network policies:
# argocd-network-policies.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-default-deny
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Ingress","Egress"]
---
# The default-deny policy above also blocks traffic between ArgoCD components
# (server, repo-server, Redis, Dex), so allow pods in this namespace to reach each other
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-allow-intra-namespace
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Ingress","Egress"]
  ingress:
    - from:
        - podSelector: {}
  egress:
    - to:
        - podSelector: {}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-server-https-ingress
  namespace: argocd
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: argocd-server
  policyTypes: ["Ingress"]
  ingress:
    - ports:
        # NetworkPolicy ports match the pod's port, not the Service port.
        # argocd-server listens on 8080 by default (the Service maps 443 to it).
        - port: 8080
          protocol: TCP
      # Optionally restrict to your LAN:
      # from: [{ ipBlock: { cidr: 192.168.0.0/16 } }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-allow-egress-api-git
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: default
        - ipBlock: { cidr: 0.0.0.0/0 } # Git providers
      ports:
        - port: 443
          protocol: TCP
        # Uncomment if using Git over SSH:
        # - port: 22
        #   protocol: TCP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-allow-dns
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns # CoreDNS only
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP # DNS fallback
GitLab Webhook Configuration
Enable instant sync on Git push (instead of polling):
# Create webhook secret
WEBHOOK_SECRET=$(openssl rand -base64 32)
kubectl -n argocd patch secret argocd-secret \
--type merge -p "{\"stringData\":{\"webhook.gitlab.secret\":\"$WEBHOOK_SECRET\"}}"
echo "GitLab Webhook Secret: $WEBHOOK_SECRET"
In GitLab → Project → Settings → Webhooks:
- URL: https://argocd.homelab.example/api/webhook
- Secret Token: (use the generated secret)
- Triggers: Push events, Merge request events
- SSL verification: Disable temporarily (until Part 6 adds trusted certs)
Note: GitLab will fail SSL verification with self-signed certs. Temporarily disable SSL verification now; Part 6's checklist includes re-enabling it after cert-manager setup.
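GitLab sends the configured secret token with every webhook request in the `X-Gitlab-Token` header, so the value stored in `argocd-secret` must match it exactly. As a small sketch, the patch payload can be built with a helper instead of hand-escaped quotes; `webhook_patch` is a hypothetical name, not an ArgoCD or GitLab command.

```shell
#!/bin/sh
# webhook_patch (hypothetical helper): print the JSON merge patch that
# stores a GitLab webhook secret in argocd-secret.
webhook_patch() {
  printf '{"stringData":{"webhook.gitlab.secret":"%s"}}\n' "$1"
}

# Usage (needs openssl, kubectl, and cluster access):
#   secret=$(openssl rand -hex 32)   # hex output needs no JSON escaping
#   kubectl -n argocd patch secret argocd-secret --type merge -p "$(webhook_patch "$secret")"
```

The helper only formats the string; it assumes the secret contains no characters that need JSON escaping, which holds for hex and base64 output.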
OIDC SSO Configuration
Replace local users with GitLab SSO:
# argocd-cm.yaml additions
data:
  oidc.config: |
    name: GitLab
    issuer: https://gitlab.com
    clientID: $oidc.gitlab.clientId
    clientSecret: $oidc.gitlab.clientSecret
    requestedScopes: ["openid","profile","email","groups"]
    requestedIDTokenClaims:
      groups:
        essential: true
  # Disable the built-in admin account after SSO works
  admin.enabled: "false"
# Store OIDC credentials (get from GitLab Applications)
kubectl -n argocd patch secret argocd-secret --type merge -p \
'{"stringData":{"oidc.gitlab.clientId":"<client-id>","oidc.gitlab.clientSecret":"<client-secret>"}}'
Ensure your GitLab users are in a group named argocd-admins to match the RBAC policy.
Test SSO is working:
argocd account get-user-info --server $ARGOCD_SERVER --auth-token $ARGOCD_TOKEN
# Expect groups to include: argocd-admins
Disaster Recovery
Backup ArgoCD Configuration
# Create a date-stamped backup directory
BACKUP_DIR="backup/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
cd "$BACKUP_DIR"
# Backup all applications
for app in $(argocd app list -o name); do
  echo "Backing up $app"
  # Some CLI versions print names as <namespace>/<name>; keep only the name for the file
  argocd app get "$app" -o yaml > "${app##*/}.yaml"
done
# Backup ArgoCD projects (for RBAC and app grouping)
kubectl get appprojects -n argocd -o yaml > projects.yaml
# Backup repository credentials (both single-repo and org-wide patterns)
kubectl get secrets -n argocd \
-l 'argocd.argoproj.io/secret-type in (repository,repo-creds)' \
-o yaml > repos.yaml
# Backup ArgoCD configuration
kubectl -n argocd get cm argocd-cm argocd-rbac-cm -o yaml > argocd-configs.yaml
# Backup additional configs (performance, GPG keys)
kubectl -n argocd get cm argocd-cmd-params-cm argocd-gpg-keys-cm -o yaml > argocd-extra-configs.yaml 2>/dev/null || true
# Backup AppProjects CRD (needed for disaster recovery on fresh clusters)
kubectl get crd appprojects.argoproj.io -o yaml > crd-appprojects.yaml
# Create restore script
cat > restore.sh << 'EOF'
#!/bin/bash
echo "Restoring ArgoCD configuration..."
kubectl apply -f projects.yaml
kubectl apply -f repos.yaml
for file in *.yaml; do
  if [[ $file != "projects.yaml" && $file != "repos.yaml" ]]; then
    kubectl apply -f "$file"
  fi
done
EOF
chmod +x restore.sh
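The backup also contains a CRD and ConfigMaps, and on a fresh cluster the order in which files are applied can matter. The sketch below shows one possible dependency-aware ordering, assuming the file names produced by the backup commands above; `restore_order` is a hypothetical helper, and the chosen order is an assumption rather than an ArgoCD requirement.

```shell
#!/bin/sh
# restore_order <dir> (hypothetical helper): print the backup files in <dir>
# in a dependency-aware order: CRD, ConfigMaps, projects, repo credentials,
# then everything else (the Application manifests).
restore_order() {
  for f in crd-appprojects.yaml argocd-configs.yaml argocd-extra-configs.yaml \
           projects.yaml repos.yaml; do
    if [ -f "$1/$f" ]; then
      echo "$f"
    fi
  done
  for path in "$1"/*.yaml; do
    f=${path##*/}
    case "$f" in
      crd-appprojects.yaml|argocd-configs.yaml|argocd-extra-configs.yaml|projects.yaml|repos.yaml)
        ;;  # already printed above
      *)
        if [ -f "$path" ]; then
          echo "$f"
        fi
        ;;
    esac
  done
}
```

Inside a backup directory, `restore_order . | while read -r f; do kubectl apply -f "$f"; done` would apply the files in that order.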
Restore from Backup
# Navigate to backup directory
cd backup/20240315 # Use your backup date
# Run the restore script
./restore.sh
# Or manually restore:
# 1. Restore projects first (defines permissions)
kubectl apply -f projects.yaml
# 2. Restore repository credentials
kubectl apply -f repos.yaml
# 3. Restore all applications
for file in *.yaml; do
  if [[ $file != "projects.yaml" && $file != "repos.yaml" ]]; then
    kubectl apply -f "$file"
  fi
done
# 4. Trigger sync for all apps (using label selector)
argocd app sync -l gitops.argocd.io/project=applications
# 5. Verify restoration
argocd app list
Common Patterns and Best Practices
1. Progressive Delivery
apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ApplyOutOfSyncOnly=true
    managedNamespaceMetadata:
      labels:
        environment: production
      annotations:
        team: platform
2. Resource Hooks and Sync Waves
What are Sync Waves?
Sync waves control deployment order: resources in lower-numbered waves are applied first, and resources without the annotation default to wave 0. Use them to order dependencies:
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-2" # CRDs first
---
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1" # Namespace/RBAC
---
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "0" # Core infrastructure
---
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1" # Applications
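To see the ordering rule in isolation, the wave numbers above can be sorted numerically. This is only an illustration of "lower numbers first", not how ArgoCD is implemented (it also orders by sync phase, kind, and name); `wave_order` and the resource names in the usage example are hypothetical.

```shell
#!/bin/sh
# wave_order (hypothetical helper): read "<wave> <resource>" lines on stdin
# and print the resources in ascending wave order, lowest wave first.
wave_order() {
  sort -n -k1,1 | cut -d' ' -f2-
}

# Usage:
#   printf '%s\n' '1 Deployment/app' '-2 CRD/widgets' '0 Deployment/ingress' '-1 Namespace/apps' | wave_order
```

With that input, the CRD comes out first and the application Deployment last, matching the annotations shown above.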
What are Hooks?
Hooks are resources that run at specific points in the sync process. Common uses: database migrations before app deployment, cache warming after deployment.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: schema-migration- # Unique name each run
  annotations:
    # Hook annotations tell ArgoCD when to run this
    argocd.argoproj.io/hook: PreSync # Run before main sync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded # Delete job when done
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: migrate/migrate
          # Kubernetes expands $(VAR), not $VAR, in command/args; DATABASE_URL must be
          # defined in this container's env (for example from a Secret)
          command: ["migrate", "-path", "/migrations", "-database", "$(DATABASE_URL)", "up"]
      restartPolicy: Never # Don't restart if migration fails
# Hook types:
# - PreSync: Before syncing resources
# - Sync: During sync (waves)
# - PostSync: After successful sync
# - SyncFail: If sync fails
3. Ignore Differences
Why Ignore Differences?
Some fields are managed by Kubernetes or other controllers, not Git. Ignoring these prevents ArgoCD from constantly showing "OutOfSync".
apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas # HPA changes this, don't sync from Git
    - group: "" # Core API group (no group name)
      kind: Service
      jsonPointers:
        - /spec/clusterIP # Kubernetes assigns this, not in Git
# Common fields to ignore:
# - /metadata/resourceVersion # Kubernetes version tracking
# - /status # Status is current state, not desired
# - /spec/replicas # When using HPA
# - webhook certificates # Auto-generated
Troubleshooting
Application Stuck in Syncing
# 1. Check detailed sync status
argocd app get <app-name>
# Look for: Sync Status, Health Status, and any error messages
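# 2. Check controller logs for errors
# (the controller runs as a StatefulSet in current releases; older installs use deployment/argocd-application-controller)
kubectl -n argocd logs statefulset/argocd-application-controller --tail=50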
# Common errors:
# - "repository not accessible" = Git credentials issue
# - "rate limit" = Too many Git API calls
# - "timeout" = Resource taking too long to become healthy
# 3. Force refresh (re-read Git, compare with cluster)
argocd app get <app-name> --refresh
# 4. Hard refresh (delete repo cache, re-clone)
argocd app get <app-name> --hard-refresh
# 5. If still stuck, terminate operation
argocd app terminate-op <app-name>
# 6. Manual sync with force apply
argocd app sync <app-name> --force # May delete and re-create resources that cannot be patched; use with care
Out of Sync but Nothing Changed
# 1. See exactly what ArgoCD thinks is different
argocd app diff <app-name>
# Shows + (additions), - (deletions), ~ (changes)
# Common causes and fixes:
# Cause 1: Admission webhooks adding defaults
# Example: Webhook adds sidecar containers
# Fix: Add to ignoreDifferences in Application
# Cause 2: Controllers modifying resources
# Example: HPA changing replica count
# Fix: Ignore /spec/replicas for that Deployment
# Cause 3: Server-side defaults
# Example: imagePullPolicy defaults to "IfNotPresent"
# Fix: Explicitly set in your manifests
# Cause 4: Kubernetes API conversions
# Example: memory: 1Gi becomes memory: 1073741824
# Fix: Use consistent units in manifests
# Last resort - force the sync (may delete and re-create resources; the diff returns if the cause remains):
argocd app sync <app-name> --force
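For Cause 4, it helps to confirm that two quantities really are the same value in different units before changing a manifest. A minimal sketch of the arithmetic follows; `to_bytes` is a hypothetical helper that only handles the Ki, Mi, and Gi suffixes with whole-number values.

```shell
#!/bin/sh
# to_bytes (hypothetical helper): convert a Kubernetes binary-suffix
# quantity (Ki, Mi, Gi) to bytes; other input is printed unchanged.
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;
  esac
}

# Usage:
#   to_bytes 1Gi    # prints 1073741824
```

If `to_bytes` gives the same number for the Git value and the live value, the diff is only a unit conversion, and writing the manifest in the unit the cluster reports should clear it.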
Repository Connection Issues
# 1. List all configured repositories
argocd repo list
# STATUS column shows connection state
# 2. Test specific repository
argocd repo get <repo-url>
# Shows detailed connection info and last error
# 3. Common fixes:
# Fix: Update expired token
# (GitLab uses the personal access token as the password, so the username is not used;
#  --upsert updates the entry if it already exists)
argocd repo add <repo-url> \
  --username not-used \
  --password <new-token> \
  --upsert
# Fix: SSH key authentication
argocd repo add git@<gitlab-host>:org/repo.git \
  --ssh-private-key-path ~/.ssh/id_rsa
# Fix: Self-signed certificates
argocd repo add https://gitlab.internal/repo \
--insecure-skip-server-verification
# 4. Verify fix worked
argocd app get <app-name> --refresh
Production Readiness Checklist
Before considering your GitOps setup production-ready:
- [ ] Every child Application has the label gitops.argocd.io/project for CI/CD selectors
- [ ] Cilium L2 announcements are enabled before using LoadBalancer services
- [ ] Initial admin secret deleted after setting up SSO or rotating password
- [ ] Built-in admin disabled in argocd-cm after SSO/password rotation
- [ ] Server-side apply enabled on parent Applications to reduce conflicts
- [ ] Git webhooks configured (push/MR) with secret for instant sync
- [ ] NetworkPolicies applied in argocd namespace for zero-trust
- [ ] Performance tuning applied if managing >20 applications
- [ ] All metrics endpoints monitored for complete observability
- [ ] Repository credentials use repo-creds for org-wide access
- [ ] Backup scripts tested including restore procedure
- [ ] AppProject destinations include all required namespaces
What's Next
With GitOps operational, your deployments are now declarative and auditable. In Part 6, we'll add ingress and certificate management with Traefik and cert-manager, exposing your applications securely to the world.
Key Takeaways
- GitOps provides single source of truth - Git becomes your desired state
- Never store secrets in Git - Use Infisical or similar secret management
- App of Apps pattern scales well - Manage hundreds of apps easily
- CI/CD validation prevents disasters - Catch issues before production
- Different sync strategies for different apps - Not everything should auto-sync
References
- ArgoCD Documentation: https://argo-cd.readthedocs.io/en/stable/
- GitOps Principles: https://www.gitops.tech/
- Infisical Documentation: https://infisical.com/docs
- Kustomize Documentation: https://kustomize.io/
- GitLab CI/CD: https://docs.gitlab.com/ee/ci/
Continue to Part 6: Ingress and Certificate Management →