Part 5: GitOps with ArgoCD

Manual deployments are a recipe for configuration drift and 3 AM emergencies. After accidentally deleting a production deployment with a typo'd kubectl command, I embraced GitOps fully. This article covers deploying ArgoCD with detailed explanations of GitOps concepts, integrating with Infisical for secret management, and building CI/CD pipelines that have saved me countless hours.

Understanding GitOps

What is GitOps?

GitOps treats Git as the single source of truth for your infrastructure. Instead of manually running commands to deploy applications, you commit changes to Git and automation handles the deployment. Think of it like this: Git becomes your control panel, and tools like ArgoCD are the robots that make reality match what's in Git.

Key GitOps Concepts:

Declarative Configuration: You describe the desired state ("I want 3 replicas") rather than issuing imperative commands ("create 3 replicas")
Version Control: Every change is tracked in Git history
Automated Reconciliation: Tools continuously compare the cluster against Git and correct any drift
Pull-based Deployment: The cluster pulls changes from Git, so no outside system needs cluster credentials (more secure than pushing)
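
To make the reconciliation idea concrete, here is a toy sketch in shell. It is not how ArgoCD is implemented: two plain files stand in for Git (desired state) and the cluster (live state), and the function name reconcile is made up for illustration.

```shell
#!/bin/sh
# Toy reconciliation step: files stand in for Git (desired) and the cluster (live).
# Usage: reconcile DESIRED_FILE LIVE_FILE
reconcile() {
  desired=$1
  live=$2
  if [ -f "$live" ] && cmp -s "$desired" "$live"; then
    echo "Synced"
  else
    cp "$desired" "$live"   # "apply" the desired state
    echo "OutOfSync: applied desired state"
  fi
}

# Demo: Git says 3 replicas, the cluster currently has 1.
workdir=$(mktemp -d)
echo "replicas: 3" > "$workdir/desired.yaml"
echo "replicas: 1" > "$workdir/live.yaml"
reconcile "$workdir/desired.yaml" "$workdir/live.yaml"   # drift detected and fixed
reconcile "$workdir/desired.yaml" "$workdir/live.yaml"   # nothing left to do
rm -rf "$workdir"
```

The first call reports the drift and overwrites the live file; the second finds nothing to do. A real controller runs this comparison continuously and per resource, but the loop has the same shape.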

Why GitOps Changes Everything

Before GitOps, my workflow was:

Edit YAML files locally
Run kubectl apply -f (a manual command) and hope for the best
Forget what I changed last week
Panic when something breaks

Now with GitOps:

All changes go through Git PR review (Pull Request - proposed changes)
ArgoCD automatically syncs desired state from Git to cluster
Full audit trail of who changed what and when
Easy rollback to any previous state (just revert the Git commit)
Cluster rebuilds from scratch in minutes

The Power of Git History:

# See what changed and when
git log --oneline
# Revert a bad change
git revert abc123
# ArgoCD automatically applies the revert!

Installing ArgoCD

What is ArgoCD?
ArgoCD is a GitOps continuous delivery tool for Kubernetes. It watches your Git repositories and automatically deploys changes to your cluster. Think of it as a bridge between Git and Kubernetes that keeps them in sync.

Deploy ArgoCD with High Availability

# argocd-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: argocd  # Dedicated namespace for ArgoCD components
  labels:
    # Pod Security Standards - controls what pods can do
    pod-security.kubernetes.io/enforce: baseline   # Enforces basic security
    pod-security.kubernetes.io/audit: restricted   # Logs violations of strict rules
    pod-security.kubernetes.io/warn: restricted    # Warns about strict violations

# Create the ArgoCD namespace first
kubectl apply -f argocd-namespace.yaml

# Install ArgoCD with HA (High Availability) configuration
# HA runs multiple replicas of each component for reliability
# Note: the HA manifests use pod anti-affinity, so plan for at least 3 nodes
# Note: Check latest releases at github.com/argoproj/argo-cd/releases
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.1.3/manifests/ha/install.yaml

# Wait for each component to be ready
# rollout status blocks until all pods of the workload are running
kubectl -n argocd rollout status deployment/argocd-server          # Web UI and API
kubectl -n argocd rollout status deployment/argocd-repo-server     # Clones repos and renders manifests
kubectl -n argocd rollout status deployment/argocd-applicationset-controller  # Manages app sets
kubectl -n argocd rollout status statefulset/argocd-application-controller    # Reconciles apps (a StatefulSet)

# Verify all pods are running
kubectl -n argocd get pods
# Should see multiple pods all in Running state

Configure ArgoCD for Production

# argocd-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  url: "https://argocd.homelab.example"

  # Disable anonymous access for security
  users.anonymous.enabled: "false"

  # Disable built-in admin after setting up SSO or local users
  # accounts.admin.enabled: "false"  # Uncomment after SSO setup

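  # GPG signature verification for commits (supply chain security)
  # Note: as far as I can tell, this is not a documented argocd-cm key. Verification is
  # on by default and is enforced per AppProject through spec.signatureKeys.
  gpg.enabled: "true"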

  # Multi-controller support: unique instance label if running multiple Argo CDs
  # application.instanceLabelKey: gitops.argocd.io/instance

  # Session security: keep sessions reasonably short
  # Note: If you see "unknown key" in logs, your version doesn't support this (defaults to 24h anyway)
  users.session.duration: "24h"  # Version-dependent, safe to omit

  # Health checks for common CRDs (cleaner dashboards)
  resource.customizations.health.ceph.rook.io_CephCluster: |
    hs = {}
    if obj.status ~= nil and obj.status.phase == "Ready" then
      hs.status = "Healthy"
      hs.message = "CephCluster is Ready"
    else
      hs.status = "Progressing"
      hs.message = "CephCluster not Ready"
    end
    return hs
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for _, c in ipairs(obj.status.conditions) do
        if c.type == "Ready" and c.status == "True" then
          hs.status = "Healthy"; hs.message = "Certificate Ready"; return hs
        end
      end
    end
    hs.status = "Progressing"; hs.message = "Waiting for Ready"; return hs

  # Resource customizations - handle special cases
  # Using split-key format (group_Kind) avoids YAML-in-YAML complexity
  # Format: resource.customizations.<action>.<group>_<Kind>
  resource.customizations.ignoreDifferences.admissionregistration.k8s.io_MutatingWebhookConfiguration: |
    jqPathExpressions:
    - '.webhooks[].clientConfig.caBundle'  # Ignore ALL webhook cert bundles
  resource.customizations.ignoreDifferences.admissionregistration.k8s.io_ValidatingWebhookConfiguration: |
    jqPathExpressions:
    - '.webhooks[].clientConfig.caBundle'

  # Optional: Ignore common drift from HPAs and Services
  resource.customizations.ignoreDifferences.apps_Deployment: |
    jsonPointers:
    - /spec/replicas  # HPA manages this
  resource.customizations.ignoreDifferences._Service: |
    jsonPointers:
    - /spec/clusterIP  # Kubernetes assigns this

  # Exclude resources from sync - ArgoCD won't manage these
  resource.exclusions: |
    - apiGroups:
      - cilium.io
      kinds:
      - CiliumIdentity  # Cilium creates these dynamically, don't sync
      clusters:
      - "*"
---
# argocd-rbac-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  # RBAC (Role-Based Access Control) policy
  policy.default: role:readonly  # Default users can only read
  policy.csv: |  # Policy rules in CSV format
    # p = policy rule: role, resource, action, object, effect
    # Actions are verbs: get|create|update|delete|sync|override
    # Comments go on their own lines: a trailing comment would be read as part of the rule
    # Admins can do anything with apps
    p, role:admin, applications, *, */*, allow
    # Argo CD v3 explicit update and delete on app sub-resources
    p, role:admin, applications, update/*, */*, allow
    p, role:admin, applications, delete/*, */*, allow
    # Admins can manage clusters, repos, certs and projects
    p, role:admin, clusters, *, *, allow
    p, role:admin, repositories, *, *, allow
    p, role:admin, certificates, *, *, allow
    p, role:admin, projects, *, *, allow
    # Group argocd-admins has admin role
    g, argocd-admins, role:admin

Apply the configuration:

# Apply both ConfigMaps with our custom settings
kubectl apply -f argocd-cm.yaml -f argocd-rbac-cm.yaml

# Optional: Apply performance tuning and security hardening
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  controller.status.processors: "20"     # 20 is the upstream default; raise if you run many apps
  controller.operation.processors: "10"  # 10 is the upstream default; tune to cluster size
  reposerver.parallelism.limit: "20"     # concurrent manifest generations
  server.insecure: "false"               # Disable plain HTTP, TLS-only
EOF

# Optional: Add GPG keys for commit verification
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-gpg-keys-cm
  namespace: argocd
data:
  # Add your organization's signing keys here
  # your-key.asc: |
  #   -----BEGIN PGP PUBLIC KEY BLOCK-----
  #   ...key content...
  #   -----END PGP PUBLIC KEY BLOCK-----
EOF
# Or, if you keep all of these ConfigMaps in one file separated by ---:
# kubectl apply -f argocd-configs.yaml

# Restart ALL ArgoCD components to load new configuration
# Different components read different config keys
kubectl -n argocd rollout restart \
  deploy/argocd-server \
  deploy/argocd-repo-server \
  statefulset/argocd-application-controller  # the controller is a StatefulSet, not a Deployment

# Watch the restart progress
kubectl -n argocd rollout status deploy/argocd-server

Expose ArgoCD via LoadBalancer

What's a LoadBalancer?
A LoadBalancer Service gets an external IP address that routes traffic to your pods. In our homelab, Cilium's L2 announcement provides this IP from our defined pool.

First, ensure Cilium has L2 announcements enabled and create the LoadBalancer IP pool:

# cilium-loadbalancer-pool.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: default
spec:
  blocks:
    - cidr: 192.168.0.200/29  # Adjust to your network
  serviceSelector:
    matchLabels:
      lb-pool: default

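Note: Cilium must be deployed with L2 announcements enabled (l2announcements.enabled=true) for LoadBalancer services to work in bare metal environments. Cilium also needs a CiliumL2AnnouncementPolicy that selects the service before it will answer ARP requests for the assigned IP.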
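
A /29 is a small pool, so it helps to know exactly which addresses it covers before reserving them on your router. The helper below is a sketch of the arithmetic; pool_range and to_quad are names I made up, and it assumes a shell with 64-bit arithmetic (bash, dash and busybox sh on 64-bit systems all qualify).

```shell
#!/bin/sh
# to_quad N: print a 32-bit integer as a dotted-quad IPv4 address.
to_quad() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# pool_range CIDR: print the first and last address covered by an IPv4 CIDR block.
pool_range() {
  ip=${1%/*}        # e.g. 192.168.0.200
  prefix=${1#*/}    # e.g. 29
  # Split the dotted quad into four positional parameters
  old_ifs=$IFS; IFS=.
  set -- $ip
  IFS=$old_ifs
  addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  size=$(( 1 << (32 - prefix) ))
  first=$(( addr & ~(size - 1) ))   # clear the host bits
  last=$(( first + size - 1 ))
  echo "$(to_quad "$first") - $(to_quad "$last") ($size addresses)"
}

pool_range 192.168.0.200/29
# prints: 192.168.0.200 - 192.168.0.207 (8 addresses)
```

So the pool hands out 192.168.0.200 through 192.168.0.207. Keep that range out of your DHCP server's lease pool.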

Then patch the existing ArgoCD service to use LoadBalancer:

# Apply the IP pool first
kubectl apply -f cilium-loadbalancer-pool.yaml

# Patch the existing ArgoCD server service to LoadBalancer
kubectl -n argocd patch svc argocd-server -p '{"spec":{"type":"LoadBalancer"}}'

# Label the service to use our IP pool
kubectl -n argocd label svc argocd-server lb-pool=default

# Get the LoadBalancer IP (may take a minute to assign)
kubectl -n argocd get svc argocd-server

# Output explained:
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)
argocd-server   LoadBalancer   10.96.50.100   192.168.0.202   80:30901/TCP,443:31890/TCP
#                              Internal IP     Your LAN IP     HTTP:NodePort,HTTPS:NodePort

# Access ArgoCD at: https://192.168.0.202
# Note: You'll get a certificate warning - that's expected for now
# In Part 6, we'll add Traefik ingress with trusted certificates

# Optional: Preserve real client IPs (if your L2/BGP supports it)
# kubectl -n argocd patch svc argocd-server -p '{"spec":{"externalTrafficPolicy":"Local"}}'

# Optional: Remove unused HTTP port since we're TLS-only (pick ONE of the two methods)
# Method 1: Test and remove by index (if http is first)
kubectl -n argocd patch svc argocd-server --type json \
  -p '[{"op":"test","path":"/spec/ports/0/name","value":"http"},{"op":"remove","path":"/spec/ports/0"}]'

# Method 2: Order-agnostic removal using jq
kubectl -n argocd get svc argocd-server -o json | \
  jq 'del(.spec.ports[] | select(.name=="http"))' | \
  kubectl apply -f -

Secret Management with Infisical

Why Not Store Secrets in Git?
Git history is effectively permanent: once a secret is committed, it stays in the history even after you delete the file, and anyone with repository access can read it. Instead, we store secrets in a secure vault (Infisical) and reference them from our manifests.

What is Infisical?
Infisical is an open-source secret management platform. It stores secrets encrypted and provides them to applications at runtime. Think of it as a secure password manager for your applications.

Install Infisical Operator

# Add Infisical's official Helm repository (Cloudsmith)
# Helm repositories are like app stores for Kubernetes
helm repo add infisical 'https://dl.cloudsmith.io/public/infisical/helm-charts/helm/charts/'
helm repo update  # Refresh available charts

# Install the Infisical operator
# Operator = controller that manages Infisical secrets in Kubernetes
# --namespace / --create-namespace: install into a dedicated namespace, creating it if needed
# --version: pins a chart version with secretsScope support
# (a comment after a trailing backslash breaks the command, so the notes live up here)
helm install infisical-operator infisical/secrets-operator \
  --namespace infisical \
  --create-namespace \
  --version 0.11.1

# Note: Check latest releases if using newer versions - CRD fields may evolve
# Verify fields with: kubectl explain infisicalsecrets.spec --recursive

# Verify installation
kubectl -n infisical get pods
# Should see infisical-operator pod running

Configure Infisical Authentication

Setting up Machine Identity:
Machine identities are service accounts for applications. Unlike user accounts, they're designed for automated systems. You create these in the Infisical dashboard.

# infisical-auth.yaml
apiVersion: v1
kind: Secret
metadata:
  name: infisical-auth
  namespace: argocd
stringData:  # stringData accepts plain text (stored base64-encoded in etcd; enable etcd encryption for at-rest security)
  # The operator's universalAuth reads the keys clientId and clientSecret
  clientId: "YOUR_CLIENT_ID"          # From Infisical dashboard
  clientSecret: "YOUR_CLIENT_SECRET"  # From Infisical dashboard

# SECURITY WARNING: Never commit this file with real values!
# Create the secret out-of-band instead:
# kubectl -n argocd create secret generic infisical-auth \
#   --from-literal=clientId=... \
#   --from-literal=clientSecret=...
---
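
It is worth seeing for yourself why "stored base64-encoded" is not a security property. The sketch below round-trips a value through base64, which is what Kubernetes does with Secret data. The function names are mine, and it assumes a base64 binary that accepts -d (GNU coreutils and busybox both do).

```shell
#!/bin/sh
# Kubernetes stores Secret values base64-encoded. That is an encoding,
# not encryption: anyone who can read the Secret can reverse it.
encode_secret() { printf '%s' "$1" | base64; }
decode_secret() { printf '%s' "$1" | base64 -d; }

encoded=$(encode_secret "YOUR_CLIENT_SECRET")
echo "stored:  $encoded"
echo "decoded: $(decode_secret "$encoded")"
```

This is why RBAC on Secrets and etcd encryption at rest matter, and why the real values belong in Infisical rather than in a manifest.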

ArgoCD Repository Credentials

Choose your credential pattern:

Use repo-creds for org/domain-wide access (one secret covers many repos)
Use repository for single repo access (more granular control)

Create a template secret that Infisical will populate:

# gitlab-repo-creds.yaml
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-repo-creds
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repo-creds  # repo-creds applies to all matching repos
stringData:
  type: git
  url: https://gitlab.com/your-org               # Matches all repos under this org
  username: not-used                             # GitLab uses tokens as passwords
  password: "<INFISICAL_WILL_INJECT>"           # Infisical operator will replace this

Then create the InfisicalSecret to manage it:

# infisical-gitlab-secret.yaml
apiVersion: secrets.infisical.com/v1alpha1
kind: InfisicalSecret
metadata:
  name: gitlab-repo-creds-manager
  namespace: argocd
spec:
  hostAPI: https://app.infisical.com/api
  authentication:
    universalAuth:
      credentialsRef:
        secretName: infisical-auth
        secretNamespace: argocd
  secretsScope:
    projectSlug: homelab
    envSlug: prod
    secretsPath: /argocd
  managedKubeSecretReferences:
  - secretName: gitlab-repo-creds
    secretNamespace: argocd

How This Works:

The repo-creds type means one credential covers all repos under gitlab.com/your-org
Infisical operator watches for InfisicalSecret resources
It fetches the token from Infisical using the machine identity
Updates the gitlab-repo-creds secret with the actual token
ArgoCD uses this credential for any repo matching the URL pattern
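
The URL matching in the last step is essentially a prefix check. Here is a simplified sketch of that idea; creds_match is a hypothetical helper, and the real ArgoCD logic also normalizes URLs, which this does not attempt.

```shell
#!/bin/sh
# creds_match REPO_URL CREDS_URL: succeed if the credential template's URL
# is a prefix of the repository URL (simplified; no URL normalization).
creds_match() {
  case "$1" in
    "$2"*) return 0 ;;
    *)     return 1 ;;
  esac
}

for repo in \
  https://gitlab.com/your-org/k8s-manifests \
  https://gitlab.com/your-org/another-repo \
  https://gitlab.com/someone-else/repo
do
  if creds_match "$repo" https://gitlab.com/your-org; then
    echo "match:    $repo"
  else
    echo "no match: $repo"
  fi
done
```

The first two repositories match and the third does not. Note that a pure prefix check would also match https://gitlab.com/your-org-other/repo, so keep the template URL as specific as you can.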

Project Structure

Organizing for GitOps:
A well-organized repository makes GitOps manageable. Here's the structure I use:

k8s-manifests/                    # Root of your GitOps repository
├── bootstrap/                    # Initial cluster setup
│   ├── argocd/                  # ArgoCD itself
│   │   └── kustomization.yaml   # Kustomize configuration
│   └── infisical/               # Secret management
│       └── kustomization.yaml
├── infrastructure/              # Cluster-wide services
│   ├── cert-manager/           # SSL certificate management
│   ├── cilium/                 # CNI networking
│   ├── rook-ceph/             # Storage
│   └── traefik/               # Ingress controller
├── applications/              # Your actual applications
│   ├── production/           # Production environment
│   │   ├── app1/            # Each app in its folder
│   │   └── app2/
│   └── staging/             # Staging environment
│       ├── app1/           # Same apps, different config
│       └── app2/
├── clusters/               # Cluster-specific configuration
│   └── homelab/           # Your cluster name
│       ├── apps.yaml      # Application definitions
│       └── infrastructure.yaml  # Infrastructure definitions
└── .gitlab-ci.yml         # CI/CD pipeline configuration

Why This Structure:

Separation of concerns: Infrastructure vs applications
Environment isolation: Staging and production separate
Single source of truth: One place for all configurations
Easy navigation: Logical folder structure

App of Apps Pattern

What is App of Apps?
Instead of managing dozens of individual applications, you create one "parent" application that manages all others. It's like having a manager that oversees multiple teams - you only need to talk to the manager.

Create AppProjects First

Note: Applications reference projects for access control. Create these before deploying apps:

# app-projects.yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: infrastructure
  namespace: argocd
spec:
  description: Infrastructure applications
  sourceRepos:
  - https://gitlab.com/your-org/k8s-manifests  # Specific repo (more secure)
  - https://charts.bitnami.com/bitnami         # Helm charts
  - https://charts.rook.io/release             # Rook-Ceph charts
  destinations:
  - namespace: 'argocd'
    server: https://kubernetes.default.svc
  - namespace: 'infisical'
    server: https://kubernetes.default.svc
  - namespace: 'cert-manager'
    server: https://kubernetes.default.svc
  - namespace: 'kube-system'
    server: https://kubernetes.default.svc
  - namespace: 'rook-ceph'
    server: https://kubernetes.default.svc
  - namespace: 'traefik'
    server: https://kubernetes.default.svc
  - namespace: 'monitoring'
    server: https://kubernetes.default.svc
  - namespace: 'prometheus'
    server: https://kubernetes.default.svc
  # Alternative for homelab simplicity (less secure):
  # - namespace: '*'
  #   server: https://kubernetes.default.svc
  clusterResourceWhitelist:
  - group: ''  # Core resources
    kind: 'Namespace'
  - group: 'apiextensions.k8s.io'  # CRDs for infrastructure apps
    kind: 'CustomResourceDefinition'
  - group: 'cilium.io'
    kind: '*'
  - group: 'cert-manager.io'
    kind: '*'
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: applications
  namespace: argocd
spec:
  description: Business applications
  sourceRepos:
  - https://gitlab.com/your-org/k8s-manifests
  - https://charts.bitnami.com/bitnami
  destinations:
  - namespace: 'databases'
    server: https://kubernetes.default.svc
  - namespace: 'production'
    server: https://kubernetes.default.svc
  - namespace: 'staging'
    server: https://kubernetes.default.svc
  namespaceResourceWhitelist:
  - group: '*'
    kind: '*'
  orphanedResources:
    warn: true  # Warn about resources not tracked in Git

# Apply projects first
kubectl apply -f app-projects.yaml

# clusters/homelab/infrastructure.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application  # ArgoCD Application resource
metadata:
  name: infrastructure
  namespace: argocd
  labels:
    gitops.argocd.io/project: infrastructure  # Label for CI/CD pipeline selectors
  finalizers:  # Ensures resources are cleaned up properly on deletion
    - resources-finalizer.argocd.argoproj.io
spec:
  project: infrastructure  # ArgoCD project (for RBAC)
  source:
    repoURL: https://gitlab.com/your-org/k8s-manifests  # Git repository
    targetRevision: main                                 # Branch to track
    path: infrastructure                                 # Folder in repo
  destination:
    server: https://kubernetes.default.svc              # Deploy to same cluster
  syncPolicy:
    automated:              # Automatic synchronization settings
      prune: true          # Delete resources not in Git
      selfHeal: true       # Fix drift automatically
      allowEmpty: false    # Refuse to sync (and prune everything) if the path renders no resources
    syncOptions:
    - CreateNamespace=true                    # Create namespaces if needed
    - PrunePropagationPolicy=foreground       # Delete dependents before their owner
    - PruneLast=true                         # Prune only after everything else has synced
    - ServerSideApply=true                   # Reduce patch conflicts
    retry:
      limit: 5             # Retry failed syncs 5 times
      backoff:
        duration: 5s       # Start with 5 second wait
        factor: 2          # Double wait time each retry
        maxDuration: 3m    # Max 3 minutes between retries
  revisionHistoryLimit: 10  # Keep 10 previous versions for rollback
---
# clusters/homelab/apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: applications
  namespace: argocd
  labels:
    gitops.argocd.io/project: applications  # Label for CI/CD pipeline selectors
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: applications
  source:
    repoURL: https://gitlab.com/your-org/k8s-manifests
    targetRevision: main
    path: applications/production
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: false  # Don't auto-delete apps
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true  # Reduce patch conflicts
  revisionHistoryLimit: 10
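
The retry block on the infrastructure Application is easier to reason about with the numbers written out. This sketch assumes the wait before retry n is duration × factor^(n-1), capped at maxDuration. That matches how I read the settings, but treat the formula as an assumption; backoff_schedule is a made-up helper.

```shell
#!/bin/sh
# backoff_schedule LIMIT DURATION_S FACTOR MAX_S
# Print the wait (in seconds) before each retry, assuming
# wait = duration * factor^(attempt-1), capped at maxDuration.
backoff_schedule() {
  limit=$1; delay=$2; factor=$3; max=$4
  attempt=1
  while [ "$attempt" -le "$limit" ]; do
    if [ "$delay" -gt "$max" ]; then
      delay=$max
    fi
    echo "retry $attempt: wait ${delay}s"
    delay=$(( delay * factor ))
    attempt=$(( attempt + 1 ))
  done
}

# limit: 5, duration: 5s, factor: 2, maxDuration: 3m (180s)
backoff_schedule 5 5 2 180
```

With these values the waits are 5, 10, 20, 40 and 80 seconds, so the 3 minute cap never kicks in. It only matters if you raise the limit to 7 or more.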

Deploy the App of Apps:

# Apply the parent applications
kubectl apply -f clusters/homelab/infrastructure.yaml
kubectl apply -f clusters/homelab/apps.yaml

# Get ArgoCD initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d

# Login to ArgoCD CLI (install with: brew install argocd)
# --insecure is needed for self-signed certificates
argocd login 192.168.0.202 --username admin --password <password> --insecure

# SECURITY: After setting up SSO or changing admin password, delete the initial secret
# argocd account update-password  # Change password first
# kubectl -n argocd delete secret argocd-initial-admin-secret

# Watch sync status
argocd app list                  # List all applications
argocd app sync infrastructure   # Manually trigger sync
argocd app get infrastructure    # Detailed status

# Or use the Web UI at https://192.168.0.202

CI/CD Pipeline with GitLab

What's CI/CD?

CI (Continuous Integration): Automatically test and validate code changes
CD (Continuous Deployment): Automatically deploy validated changes

Here's my production pipeline that validates and deploys:

# .gitlab-ci.yml
stages:           # Pipeline stages run in order
  - validate      # Check YAML syntax
  - test         # Security and compatibility checks
  - deploy       # Deploy to cluster

variables:
  ARGOCD_SERVER: argocd.homelab.example
  # ARGOCD_TOKEN set in GitLab CI/CD variables (synced from Infisical)

# Validate YAML syntax
validate:yaml:
  stage: validate
  image: alpine:latest      # Lightweight Linux container
  before_script:
    - apk add --no-cache yamllint  # Install YAML linter
  script:
    - yamllint -c .yamllint .      # Check all YAML files
  rules:
    - if: $CI_MERGE_REQUEST_ID     # Only run on merge requests

# Validate Kubernetes manifests
validate:k8s:
  stage: validate
  image: alpine:latest
  before_script:
    - apk add --no-cache curl tar
    # Download and install kubeconform (validates K8s YAML - kubeval successor)
    - curl -L -o /tmp/kubeconform.tar.gz https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz
    - tar -xzf /tmp/kubeconform.tar.gz -C /usr/local/bin kubeconform
    - chmod +x /usr/local/bin/kubeconform
  script:
    # Validate all YAML files recursively with strict mode
    # Pin to your cluster version to reduce false positives
    - export K8S_VERSION=${K8S_VERSION:-1.29.0}
    - kubeconform -kubernetes-version $K8S_VERSION -strict -summary -ignore-missing-schemas -recursive .
  rules:
    - if: $CI_MERGE_REQUEST_ID  # Only on merge requests

# Security scanning with Kubesec
security:scan:
  stage: test
  image: alpine:latest
  before_script:
    - apk add --no-cache curl jq  # curl for HTTP, jq for JSON
  script:
    - |  # Multi-line script
      for file in $(find . -name "*.yaml" -type f); do
        echo "Scanning $file for security issues"
        # Send file to Kubesec API for scanning
        # Note: This uses a public API - consider self-hosted scanner for sensitive repos
        curl -X POST --data-binary @"$file" https://v2.kubesec.io/scan | jq .
        # Kubesec checks for security issues like:
        # - Running as root
        # - Missing security contexts
        # - Privileged containers
      done
  rules:
    - if: $CI_MERGE_REQUEST_ID

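# Dry-run with ArgoCD
test:dryrun:
  stage: test
  image: quay.io/argoproj/argocd:v3.1.3  # ArgoCD CLI image (the official image is published on quay.io)
  script:
    - set -euo pipefail  # Fail fast on errors
    # Show what would change without actually changing
    # First hard-refresh, then diff the live state against this commit
    - argocd app get applications --server "$ARGOCD_SERVER" --auth-token "$ARGOCD_TOKEN" --hard-refresh
    - |
      # argocd app diff exits 1 when it finds differences, so capture the exit code
      # instead of letting "set -e" fail the job on a perfectly normal diff
      # --revision assumes the MR commit is reachable in the repo ArgoCD tracks
      rc=0
      argocd app diff applications --server "$ARGOCD_SERVER" --auth-token "$ARGOCD_TOKEN" \
        --revision "$CI_COMMIT_SHA" | tee argocd-diff.txt || rc=$?
      [ "$rc" -le 1 ]  # 0 = in sync, 1 = differences found, anything else = real error
  artifacts:
    when: always
    paths:
      - argocd-diff.txt  # Keep diff for MR review
    expire_in: 1 week
  rules:
    - if: $CI_MERGE_REQUEST_ID  # Only on merge requests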

# Deploy to production
deploy:production:
  stage: deploy
  image: quay.io/argoproj/argocd:v3.1.3
  script:
    - |
      echo "🚀 Syncing ArgoCD applications..."
      # Select apps by the label set on the Application manifests
      # --prune removes resources deleted from Git; --retry-limit retries failed syncs
      # (a comment after a trailing backslash breaks the command, so the notes live up here)
      argocd app sync -l gitops.argocd.io/project=applications \
        --server "$ARGOCD_SERVER" \
        --auth-token "$ARGOCD_TOKEN" \
        --prune \
        --retry-limit 3

      # Wait for sync to complete (up to 5 minutes)
      argocd app wait -l gitops.argocd.io/project=applications \
        --server "$ARGOCD_SERVER" \
        --auth-token "$ARGOCD_TOKEN" \
        --timeout 300  # 300 seconds = 5 minutes
  environment:
    name: production
    url: https://argocd.homelab.example  # Link in GitLab UI
  rules:
    - if: $CI_COMMIT_BRANCH == "main"   # Only deploy from main branch

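# Rollback on failure
rollback:auto:
  stage: .post  # Built-in final stage, so on_failure can see a failed deploy:production
  image: quay.io/argoproj/argocd:v3.1.3
  script:
    - |
      echo "⚠️ Deployment failed, rolling back..."
      # Rollback to previous version (defaults to previous)
      # Note: Argo CD rejects rollback while automated sync is enabled on the app;
      # with auto-sync on, reverting the Git commit is the reliable way back
      argocd app rollback applications \
        --server "$ARGOCD_SERVER" \
        --auth-token "$ARGOCD_TOKEN"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: on_failure  # Runs only if a job in an earlier stage failed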
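
One detail of the dry-run job that is easy to trip over: argocd app diff uses its exit code to report whether it found differences, so a non-zero exit is not always a failure. The sketch below assumes the CLI's documented codes (0 = no diff, 1 = diff found, 2 = error); classify_diff_exit is a hypothetical helper, not part of the CLI.

```shell
#!/bin/sh
# classify_diff_exit CODE: turn an `argocd app diff` exit code into a verdict.
# Assumes the CLI's documented codes: 0 = no diff, 1 = diff found, 2 = error.
classify_diff_exit() {
  case "$1" in
    0) echo "in sync" ;;
    1) echo "changes pending" ;;
    *) echo "error"; return 1 ;;
  esac
}

# Typical use in a CI script (not run here):
#   rc=0; argocd app diff applications > argocd-diff.txt || rc=$?
#   classify_diff_exit "$rc"
classify_diff_exit 1
```

Only the error case returns non-zero, so a merge request that changes manifests still gets a green dry-run job with the diff attached.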

Application Deployment Example

Deploying with Helm Charts:
Helm charts are pre-packaged Kubernetes applications. Instead of writing all YAML yourself, you use a template and customize it with values.

# applications/production/postgresql/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: postgresql
  namespace: argocd
  labels:
    gitops.argocd.io/project: applications  # For CI/CD selectors
spec:
  project: applications  # ArgoCD project for permissions
  source:
    repoURL: https://charts.bitnami.com/bitnami  # Helm chart repository
    chart: postgresql                             # Chart name
    targetRevision: 12.12.10                     # Chart version
    helm:
      releaseName: postgresql  # Name for this installation
      values: |               # Custom configuration
        auth:
          database: homelab
          existingSecret: postgresql-secret  # Uses secret from Infisical
          secretKeys:
            adminPasswordKey: postgres-password
            userPasswordKey: postgres-password
            replicationPasswordKey: postgres-replication-password
        primary:
          persistence:
            enabled: true
            storageClass: ceph-block-fast
            size: 50Gi
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 2Gi
        metrics:
          enabled: true
          serviceMonitor:
            enabled: true
  destination:
    server: https://kubernetes.default.svc
    namespace: databases
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

Managing Secrets with Infisical

How Secret Sync Works:

Define secrets in Infisical dashboard
Create InfisicalSecret resource in Kubernetes
Infisical operator fetches and creates Kubernetes secret
Application uses the Kubernetes secret

# applications/production/postgresql/infisical-secret.yaml
apiVersion: secrets.infisical.com/v1alpha1
kind: InfisicalSecret
metadata:
  name: postgresql-secret
  namespace: databases  # Where the secret will be created
spec:
  hostAPI: https://app.infisical.com/api
  authentication:
    universalAuth:
      credentialsRef:
        secretName: infisical-auth       # Auth credentials
        secretNamespace: argocd
  secretsScope:
    projectSlug: homelab                 # Infisical project
    envSlug: prod                        # Environment
    secretsPath: /databases/postgresql   # Path in Infisical
  managedKubeSecretReferences:
  - secretName: postgresql-secret        # K8s secret to create
    secretNamespace: databases          # In this namespace

# With the auth.secretKeys overrides above, the chart reads these keys from the secret:
# - postgres-password               (admin password, reused here as the user password)
# - postgres-replication-password   (replication password)
# Chart defaults, if you drop the overrides: postgres-password, password, replication-password
# Make sure the secret names under /databases/postgresql in Infisical match

Sync Strategies

Choosing the Right Strategy:
Not all applications should auto-sync. Critical infrastructure needs manual approval, while development apps can sync automatically.

# Critical infrastructure - manual sync
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rook-ceph  # Storage is critical - don't auto-update
spec:
  syncPolicy: {}  # No "automated" block: syncs only via a manual 'argocd app sync' (or the UI)

---
# Development apps - auto sync everything
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-app
spec:
  syncPolicy:
    automated:
      prune: true        # Delete removed resources
      selfHeal: true     # Fix any manual changes
      allowEmpty: false  # Don't sync if Git path is empty

---
# Production apps - careful auto sync
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prod-app
spec:
  syncPolicy:
    automated:
      prune: false       # Never auto-delete in production
      selfHeal: true     # Fix drift from desired state
    syncOptions:
    - ApplyOutOfSyncOnly=true         # Don't touch unchanged resources
    - RespectIgnoreDifferences=true   # Honor ignoreDifferences settings

Multi-Environment Management

What is Kustomize?
Kustomize lets you customize Kubernetes resources without templates. You define a base configuration and then apply patches for different environments. It's like having a base recipe and adding different spices for different tastes.

# applications/base/app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:  # Base resources used by all environments
  - deployment.yaml
  - service.yaml
  - ingress.yaml

---
# applications/staging/app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base/app  # Start with base configuration

patches:  # patchesStrategicMerge is deprecated in Kustomize v5
  - path: deployment-patch.yaml  # Apply staging-specific changes

configMapGenerator:  # Create ConfigMap with environment variables
  - name: app-config
    literals:
      - environment=staging
      - debug=true  # Enable debug in staging

---
# applications/production/app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base/app  # Same base as staging

patches:  # patchesStrategicMerge is deprecated in Kustomize v5
  - path: deployment-patch.yaml  # Production-specific changes

configMapGenerator:
  - name: app-config
    literals:
      - environment=production
      - debug=false  # No debug in production

replicas:  # Override replica count for production
  - name: app
    count: 3  # Run 3 replicas in production (vs 1 in staging)
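
The overlays above reference a deployment-patch.yaml without showing one. A minimal sketch of what a production patch might contain, assuming the base Deployment is named app (as the replicas override implies) and its container is also named app; the container name and the resource values are hypothetical:

```yaml
# applications/production/app/deployment-patch.yaml (hypothetical contents)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app            # Must match the Deployment name in the base
spec:
  template:
    spec:
      containers:
        - name: app    # Strategic merge matches containers by name (assumed name)
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 512Mi
```

Only the fields listed in the patch are overridden; everything else comes from the base unchanged.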

Monitoring ArgoCD

Prometheus Metrics

# argocd-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-metrics  # ServiceMonitor matches this label
spec:
  ports:
    - name: metrics
      port: 8082        # Application controller metrics port
      protocol: TCP
      targetPort: 8082
  selector:
    app.kubernetes.io/name: argocd-application-controller  # Selects controller pods
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics  # Matches Service label, not pod label
  namespaceSelector:
    matchNames: ["argocd"]
  endpoints:
  - port: metrics
    interval: 30s

Metrics Endpoints:

Application Controller: :8082/metrics (app health, sync status)
API Server: :8083/metrics (API requests)
Repo Server: :8084/metrics (Git operations)
ApplicationSet Controller: :8080/metrics (appset operations)

Complete Metrics Setup

To monitor all ArgoCD components (server:8083, repo:8084, appset:8080), create Services and ServiceMonitors:

# argocd-metrics-complete.yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-server-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-server-metrics
spec:
  selector:
    app.kubernetes.io/name: argocd-server
  ports:
    - name: metrics
      port: 8083
      targetPort: 8083
---
apiVersion: v1
kind: Service
metadata:
  name: argocd-repo-server-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-repo-server-metrics
spec:
  selector:
    app.kubernetes.io/name: argocd-repo-server
  ports:
    - name: metrics
      port: 8084
      targetPort: 8084
---
apiVersion: v1
kind: Service
metadata:
  name: argocd-applicationset-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-applicationset-metrics
spec:
  selector:
    app.kubernetes.io/name: argocd-applicationset-controller
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics
  namespaceSelector:
    matchNames: ["argocd"]
  endpoints:
    - port: metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-repo-server-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-repo-server-metrics
  namespaceSelector:
    matchNames: ["argocd"]
  endpoints:
    - port: metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-applicationset-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-applicationset-metrics
  namespaceSelector:
    matchNames: ["argocd"]
  endpoints:
    - port: metrics
      interval: 30s

Note: Ensure your Prometheus Operator is configured to watch ServiceMonitors in the argocd namespace.
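
What "configured to watch" means depends on how Prometheus was installed. As a rough sketch, these are the two selector fields on the Prometheus custom resource that control which ServiceMonitors are picked up; the resource name and the monitoring namespace are assumptions:

```yaml
# Fragment of a Prometheus custom resource (name and namespace are hypothetical)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceMonitorNamespaceSelector: {}  # Empty selector: look in all namespaces, including argocd
  serviceMonitorSelector: {}           # Empty selector: accept ServiceMonitors with any labels
```

If your installation restricts serviceMonitorSelector to specific labels, add those labels to the ServiceMonitors above instead of widening the selector.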

Key Metrics to Watch

# Application health status - how many apps are unhealthy?
sum(argocd_app_info{health_status!="Healthy"})
# Alert if > 0

# Sync operations per hour - how busy is ArgoCD?
# (increase() counts events over the window; rate() would return a per-second value)
sum(increase(argocd_app_sync_total[1h]))
# Normal: 10-50/hour, High: >100/hour

# Application count by sync status - what's out of sync?
sum by (sync_status) (argocd_app_info)
# Should mostly show "Synced"

# Git requests per hour - how many Git operations?
sum(increase(argocd_git_request_total[1h]))
# Alert if the count is unusually high (may indicate issues)

# Apps currently OutOfSync (alert-worthy)
# Pair with "for: 15m" in an alerting rule to catch apps stuck for >15 minutes
sum by (name) (argocd_app_info{sync_status="OutOfSync"}) > 0
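
To turn these queries into alerts, a PrometheusRule along the following lines could work. This is a sketch under assumptions: the rule name, severity labels, and durations are hypothetical, and your Prometheus Operator may need extra labels on the resource before it picks the rule up:

```yaml
# argocd-alerts.yaml (hypothetical file name)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppOutOfSync
          expr: sum by (name) (argocd_app_info{sync_status="OutOfSync"}) > 0
          for: 15m   # Only fire if the condition holds for 15 minutes straight
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD app {{ $labels.name }} has been OutOfSync for 15 minutes"
        - alert: ArgoCDAppUnhealthy
          expr: sum by (name) (argocd_app_info{health_status!="Healthy"}) > 0
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD app {{ $labels.name }} is not Healthy"
```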

Security Hardening

Network Policies

Lock down the ArgoCD namespace with strict network policies:

# argocd-network-policies.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-default-deny
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Ingress","Egress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-server-https-ingress
  namespace: argocd
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: argocd-server
  policyTypes: ["Ingress"]
  ingress:
  - ports:
    - port: 8080  # argocd-server's container port; NetworkPolicy matches the pod port, not the Service's 443
      protocol: TCP
    # Optionally restrict to your LAN:
    # from: [{ ipBlock: { cidr: 192.168.0.0/16 } }]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-allow-egress-api-git
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: default
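    - ipBlock: { cidr: 0.0.0.0/0 }  # Git providers
    ports:
    - port: 443
      protocol: TCP
    # The Kubernetes API server often listens on 6443 behind its Service; adjust to your cluster:
    - port: 6443
      protocol: TCP
    # Uncomment if using Git over SSH:
    # - port: 22
    #   protocol: TCP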
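---
# Without this, default-deny also blocks ArgoCD's own components from
# reaching each other (server -> repo-server, controller -> Redis, ...)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-allow-intra-namespace
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Ingress","Egress"]
  ingress:
  - from:
    - podSelector: {}  # Any pod in the argocd namespace
  egress:
  - to:
    - podSelector: {}
---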
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-allow-dns
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns  # CoreDNS only
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP  # DNS fallback
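
One side effect of default-deny: Prometheus can no longer scrape the ArgoCD metrics ports. A sketch of an extra policy that allows it, assuming Prometheus runs in a namespace called monitoring (adjust the label to wherever yours lives):

```yaml
# argocd-allow-metrics-scrape.yaml (hypothetical file name)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-allow-metrics-scrape
  namespace: argocd
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring  # Assumed Prometheus namespace
    ports:
    - port: 8082  # Application controller metrics
      protocol: TCP
    - port: 8083  # API server metrics
      protocol: TCP
    - port: 8084  # Repo server metrics
      protocol: TCP
    # ApplicationSet metrics use 8080, which is also argocd-server's API port.
    # If you need it, add a separate policy scoped to the applicationset pods.
```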

GitLab Webhook Configuration

Enable instant sync on Git push (instead of polling):

# Create webhook secret
WEBHOOK_SECRET=$(openssl rand -base64 32)
kubectl -n argocd patch secret argocd-secret \
  --type merge -p "{\"stringData\":{\"webhook.gitlab.secret\":\"$WEBHOOK_SECRET\"}}"

echo "GitLab Webhook Secret: $WEBHOOK_SECRET"

In GitLab → Project → Settings → Webhooks:

URL: https://argocd.homelab.example/api/webhook
Secret Token: (use the generated secret)
Triggers: Push events, Merge request events
SSL verification: Disable temporarily (until Part 6 adds trusted certs)

Note: GitLab will fail SSL verification with self-signed certs. Temporarily disable SSL verification now; Part 6's checklist includes re-enabling it after cert-manager setup.

OIDC SSO Configuration

Replace local users with GitLab SSO:

# argocd-cm.yaml additions
data:
  oidc.config: |
    name: GitLab
    issuer: https://gitlab.com
    clientID: $oidc.gitlab.clientId
    clientSecret: $oidc.gitlab.clientSecret
    requestedScopes: ["openid","profile","email","groups"]
    requestedIDTokenClaims:
      groups:
        essential: true
  # Disable the built-in admin account after SSO works
  admin.enabled: "false"

# Store OIDC credentials (get from GitLab Applications)
kubectl -n argocd patch secret argocd-secret --type merge -p \
'{"stringData":{"oidc.gitlab.clientId":"<client-id>","oidc.gitlab.clientSecret":"<client-secret>"}}'

Ensure your GitLab users are in a group named argocd-admins to match the RBAC policy.
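
For reference, a group-to-role mapping in argocd-rbac-cm generally looks like the sketch below. Treat it as an assumption about the shape of the policy rather than the exact one used in this series; the readonly default in particular is my choice here:

```yaml
# argocd-rbac-cm.yaml (sketch; adapt to your existing RBAC policy)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly   # Everyone else gets read-only access (assumed default)
  policy.csv: |
    g, argocd-admins, role:admin
  scopes: '[groups]'              # Read group membership from the "groups" claim
```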

Test SSO is working:

argocd account get-user-info --server $ARGOCD_SERVER --auth-token $ARGOCD_TOKEN
# Expect groups to include: argocd-admins

Disaster Recovery

Backup ArgoCD Configuration

# Create backup directory
mkdir -p backup/$(date +%Y%m%d)  # Date-stamped folder
cd backup/$(date +%Y%m%d)

# Backup all applications
for app in $(argocd app list -o name); do
  echo "Backing up $app"
  # Names may be printed as <namespace>/<name>; strip the prefix for the filename
  argocd app get "$app" -o yaml > "${app##*/}.yaml"
done

# Backup ArgoCD projects (for RBAC and app grouping)
kubectl get appprojects -n argocd -o yaml > projects.yaml

# Backup repository credentials (both single-repo and org-wide patterns)
kubectl get secrets -n argocd \
  -l 'argocd.argoproj.io/secret-type in (repository,repo-creds)' \
  -o yaml > repos.yaml

# Backup ArgoCD configuration
kubectl -n argocd get cm argocd-cm argocd-rbac-cm -o yaml > argocd-configs.yaml

# Backup additional configs (performance, GPG keys)
kubectl -n argocd get cm argocd-cmd-params-cm argocd-gpg-keys-cm -o yaml > argocd-extra-configs.yaml 2>/dev/null || true

# Backup AppProjects CRD (needed for disaster recovery on fresh clusters)
kubectl get crd appprojects.argoproj.io -o yaml > crd-appprojects.yaml

# Create restore script
cat > restore.sh << 'EOF'
#!/bin/bash
echo "Restoring ArgoCD configuration..."
kubectl apply -f projects.yaml
kubectl apply -f repos.yaml
for file in *.yaml; do
  if [[ $file != "projects.yaml" && $file != "repos.yaml" ]]; then
    kubectl apply -f $file
  fi
done
EOF
chmod +x restore.sh

Restore from Backup

# Navigate to backup directory
cd backup/20240315  # Use your backup date

# Run the restore script
./restore.sh

# Or manually restore:
# 1. Restore projects first (defines permissions)
kubectl apply -f projects.yaml

# 2. Restore repository credentials
kubectl apply -f repos.yaml

# 3. Restore all applications
for file in *.yaml; do
  if [[ $file != "projects.yaml" && $file != "repos.yaml" ]]; then
    kubectl apply -f $file
  fi
done

# 4. Trigger sync for all apps (using label selector)
argocd app sync -l gitops.argocd.io/project=applications

# 5. Verify restoration
argocd app list

Common Patterns and Best Practices

1. Progressive Delivery

apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
    - ApplyOutOfSyncOnly=true
    managedNamespaceMetadata:
      labels:
        environment: production
      annotations:
        team: platform

2. Resource Hooks and Sync Waves

What are Sync Waves?
Sync waves control deployment order. Lower numbers deploy first. Use for dependencies:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-2"  # CRDs first
---
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"  # Namespace/RBAC
---
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # Core infrastructure
---
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # Applications

What are Hooks?
Hooks are resources that run at specific points in the sync process. Common uses: database migrations before app deployment, cache warming after deployment.

apiVersion: batch/v1
kind: Job
metadata:
  generateName: schema-migration-  # Unique name each run
  annotations:
    # Hook annotations tell ArgoCD when to run this
    argocd.argoproj.io/hook: PreSync  # Run before main sync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # Delete job when done
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: migrate/migrate
        # Kubernetes expands $(VAR), not $VAR; DATABASE_URL must be defined in this container's env
        command: ["migrate", "-path", "/migrations", "-database", "$(DATABASE_URL)", "up"]
      restartPolicy: Never  # Don't restart if migration fails

# Hook types:
# - PreSync: Before syncing resources
# - Sync: During sync (waves)
# - PostSync: After successful sync
# - SyncFail: If sync fails
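
The PreSync example covers migrations; the "after deployment" case looks much the same. A minimal sketch of a PostSync smoke test, where the Job name, the image choice, and the health URL are all hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: smoke-test-  # Unique name each run
  annotations:
    argocd.argoproj.io/hook: PostSync  # Run after all resources are synced and healthy
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # Keep failed Jobs for debugging
spec:
  backoffLimit: 1  # Retry once, then report the hook as failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: smoke-test
        image: curlimages/curl
        # Hypothetical health endpoint; --fail makes curl exit non-zero on HTTP errors
        command: ["curl", "--fail", "--silent", "--show-error", "http://app.default.svc.cluster.local/healthz"]
```

If the Job fails, the sync operation is marked as failed, so a broken deployment shows up in ArgoCD instead of passing silently.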

3. Ignore Differences

Why Ignore Differences?
Some fields are managed by Kubernetes or other controllers, not Git. Ignoring these prevents ArgoCD from constantly showing "OutOfSync".

apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
    - /spec/replicas  # HPA changes this, don't sync from Git
  - group: ""  # Core API group (no group name)
    kind: Service
    jsonPointers:
    - /spec/clusterIP  # Kubernetes assigns this, not in Git

# Common fields to ignore:
# - /metadata/resourceVersion  # Kubernetes version tracking
# - /status  # Status is current state, not desired
# - /spec/replicas  # When using HPA
# - webhook certificates  # Auto-generated
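
jsonPointers only work for fixed paths. For injected sidecars or auto-generated webhook certificates, jqPathExpressions can select the field by content instead. A sketch, assuming an Istio-style sidecar named istio-proxy (substitute whatever your admission webhook injects):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jqPathExpressions:
    # Ignore a container injected by an admission webhook (assumed name)
    - .spec.template.spec.containers[] | select(.name == "istio-proxy")
  - group: admissionregistration.k8s.io
    kind: MutatingWebhookConfiguration
    jqPathExpressions:
    # Ignore CA bundles that a controller fills in at runtime
    - .webhooks[]?.clientConfig.caBundle
  syncPolicy:
    syncOptions:
    - RespectIgnoreDifferences=true  # Also leave these fields alone during sync
```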

Troubleshooting

Application Stuck in Syncing

# 1. Check detailed sync status
argocd app get <app-name>
# Look for: Sync Status, Health Status, and any error messages

# 2. Check controller logs for errors
kubectl -n argocd logs statefulset/argocd-application-controller --tail=50  # The controller runs as a StatefulSet
# Common errors:
# - "repository not accessible" = Git credentials issue
# - "rate limit" = Too many Git API calls
# - "timeout" = Resource taking too long to become healthy

# 3. Force refresh (re-read Git, compare with cluster)
argocd app get <app-name> --refresh

# 4. Hard refresh (delete repo cache, re-clone)
argocd app get <app-name> --hard-refresh

# 5. If still stuck, terminate operation
argocd app terminate-op <app-name>

# 6. Manual sync with specific options
argocd app sync <app-name> --force  # Caution: force apply may delete and recreate resources

Out of Sync but Nothing Changed

# 1. See exactly what ArgoCD thinks is different
argocd app diff <app-name>
# Prints a diff between the live cluster state and the desired state from Git

# Common causes and fixes:

# Cause 1: Admission webhooks adding defaults
# Example: Webhook adds sidecar containers
# Fix: Add to ignoreDifferences in Application

# Cause 2: Controllers modifying resources
# Example: HPA changing replica count
# Fix: Ignore /spec/replicas for that Deployment

# Cause 3: Server-side defaults
# Example: imagePullPolicy defaults to "IfNotPresent"
# Fix: Explicitly set in your manifests

# Cause 4: Kubernetes API conversions
# Example: memory: 1Gi becomes memory: 1073741824
# Fix: Use consistent units in manifests

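# Quick fix - sync anyway (caution: --force can delete and recreate resources,
# and the diff comes back unless the root cause above is fixed):
argocd app sync <app-name> --force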

Repository Connection Issues

# 1. List all configured repositories
argocd repo list
# STATUS column shows connection state

# 2. Test specific repository
argocd repo get <repo-url>
# Shows detailed connection info and last error

# 3. Common fixes:

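# Fix: Update expired token
# --username: GitLab uses tokens as passwords, so the value is not used
# --password: personal access token
# --upsert:   update the entry if it already exists
# (Comments cannot follow a trailing backslash, so they are listed here instead)
argocd repo add <repo-url> \
  --username not-used \
  --password <new-token> \
  --upsert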

# Fix: SSH key authentication
argocd repo add git@<gitlab-host>:org/repo.git \
  --ssh-private-key-path ~/.ssh/id_rsa

# Fix: Self-signed certificates
argocd repo add https://gitlab.internal/repo \
  --insecure-skip-server-verification

# 4. Verify fix worked
argocd app get <app-name> --refresh

Production Readiness Checklist

Before considering your GitOps setup production-ready:

[ ] Every child Application has the label gitops.argocd.io/project for CI/CD selectors
[ ] Cilium L2 announcements are enabled before using LoadBalancer services
[ ] Initial admin secret deleted after setting up SSO or rotating password
[ ] Built-in admin disabled in argocd-cm after SSO/password rotation
[ ] Server-side apply enabled on parent Applications to reduce conflicts
[ ] Git webhooks configured (push/MR) with secret for instant sync
[ ] NetworkPolicies applied in argocd namespace for zero-trust
[ ] Performance tuning applied if managing >20 applications
[ ] All metrics endpoints monitored for complete observability
[ ] Repository credentials use repo-creds for org-wide access
[ ] Backup scripts tested including restore procedure
[ ] AppProject destinations include all required namespaces

What's Next

With GitOps operational, your deployments are now declarative and auditable. In Part 6, we'll add ingress and certificate management with Traefik and cert-manager, exposing your applications securely to the world.

Key Takeaways

GitOps provides single source of truth - Git becomes your desired state
Never store secrets in Git - Use Infisical or similar secret management
App of Apps pattern scales well - Manage hundreds of apps easily
CI/CD validation prevents disasters - Catch issues before production
Different sync strategies for different apps - Not everything should auto-sync

References

ArgoCD Documentation: https://argo-cd.readthedocs.io/en/stable/
GitOps Principles: https://www.gitops.tech/
Infisical Documentation: https://infisical.com/docs
Kustomize Documentation: https://kustomize.io/
GitLab CI/CD: https://docs.gitlab.com/ee/ci/

Continue to Part 6: Ingress and Certificate Management →