Part 3: Cilium CNI - Advanced Networking and Load Balancing

Six hours. That's how long my cluster was partially broken during a Flannel to Cilium migration. This article covers the proper way to install Cilium from the start, configure native L2 load balancing (replacing MetalLB), and implement network policies. I'll explain every networking concept clearly so you understand not just the "how" but the "why" behind each configuration.

Understanding CNI (Container Network Interface)

Before diving into Cilium, let's understand what a CNI does:

What is a CNI?

A CNI plugin provides networking for pods in Kubernetes:

Assigns IP addresses to each pod
Enables pod-to-pod communication across nodes
Implements network policies for security
Optionally handles Service load balancing (Cilium can replace kube-proxy for this; DNS itself is served by CoreDNS)

Without a CNI, your pods can't talk to each other - that's why nodes show "NotReady" after Part 2.

Common CNI Options

Flannel: Simple, uses VXLAN overlay, limited features
Calico: Uses BGP routing, good for large clusters
Weave: Mesh networking, easy but slower (no longer maintained since Weaveworks shut down in 2024)
Cilium: eBPF-based, feature-rich, our choice

Why Cilium Over Other CNIs

After running Flannel, Calico, and finally Cilium in production, here's why Cilium wins:

eBPF Technology Advantage

What is eBPF?
eBPF (extended Berkeley Packet Filter) lets Cilium run networking code directly in the Linux kernel, like having a Formula 1 engine instead of a regular car engine.

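Traditional CNI + kube-proxy (iptables):

Packet → netfilter/iptables chains (in the kernel) → Destination
         (rules are walked one after another; thousands of them in a busy cluster)

Cilium (eBPF):

Packet → eBPF program attached to the network device (in the kernel) → Destination
         (hash-map lookups, so the cost stays flat as the number of Services grows)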

Results in my testing:

10-20% better throughput (data transfer speed)
32% lower latency (response time)
37% less CPU usage

Native Load Balancing

Traditional Setup: Kubernetes + CNI + MetalLB (3 components)
With Cilium: Just Kubernetes + Cilium (2 components)

Cilium includes load balancing built-in:

L2 mode: Announces IPs via ARP (what we'll use)
BGP mode: For advanced routing (datacenter style)
No extra components means less to break

Observability with Hubble

Hubble is Cilium's observability platform - think of it as X-ray vision for your network:

See traffic flows between pods visually
Debug connection issues with packet-level detail
Inspect L7 traffic such as HTTP and DNS requests without modifying applications
Track security policy violations in real-time

Enterprise Security Features

Network Policies: Control which pods can talk to each other
WireGuard Encryption: Optional encryption using WireGuard protocol
Identity-Based Security: Policies based on labels, not just IPs
Layer 7 Filtering: Block specific HTTP paths or methods

Pre-Installation Requirements

Before installing Cilium, let's verify your cluster is ready:

# Check kernel version (Linux kernel 5.10+ recommended for full eBPF functionality)
talosctl --nodes 192.168.0.11 read /proc/sys/kernel/osrelease
# Or check every node at once in the KERNEL-VERSION column:
# kubectl get nodes -o wide

# Expected output (exact version depends on your Talos release):
# 6.1.0-talos  # Version 5.10+ recommended for all eBPF features
# Older kernels have limited eBPF support

# Verify all nodes are waiting for CNI (NotReady is expected!)
kubectl get nodes

# Expected output - all nodes NotReady:
NAME          STATUS     ROLES           AGE   VERSION
talos-cp-01   NotReady   control-plane   10m   v1.34.0
talos-cp-02   NotReady   control-plane   8m    v1.34.0
talos-cp-03   NotReady   control-plane   6m    v1.34.0
talos-wrk-01  NotReady   worker          4m    v1.34.0
talos-wrk-02  NotReady   worker          4m    v1.34.0

# Confirm no other CNI is installed
# grep searches for text patterns in output
kubectl get pods -n kube-system | grep -E "flannel|calico|weave|cilium"

# Should return nothing - empty output is good!
# (If you've already installed Cilium, seeing cilium-... pods here is expected)

Troubleshooting Pre-Installation Issues:

# If nodes are not appearing at all:
kubectl get nodes -o wide
# Look for connection or certificate issues

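# If you see old CNI pods:
kubectl delete -n kube-system ds/kube-flannel-ds  # Remove Flannel
kubectl delete -n kube-system ds/calico-node      # Remove Calico
# Deleting the DaemonSet does not remove the CNI config files it left in /etc/cni/net.d.
# On Talos, also make sure the machine config has cluster.network.cni.name: none;
# otherwise Talos deploys Flannel for you.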

# Verify control plane is actually ready:
# Note: componentstatuses is deprecated since k8s 1.19
# Use the API server healthz endpoints instead:
kubectl get --raw='/readyz?verbose'
# All checks should show "ok"

# Check API server liveness:
kubectl get --raw='/livez?verbose'
# Should show "ok" for all checks

Why These Checks Matter

Kernel version: eBPF works best with modern kernels
- 5.10+ recommended; newer kernels unlock DSR/Host-Reachable Services/Bandwidth Manager
- Check your Cilium chart's feature matrix and Talos kernel for exact support
NotReady nodes: Confirms nodes are waiting for CNI (expected state)
No existing CNI: Installing multiple CNIs causes routing conflicts and packet loss
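If you want to script the 5.10 guideline, you can compare the kernel release string with a small helper. The string might come from the KERNEL-VERSION column of kubectl get nodes -o wide. This is a minimal sketch that assumes release strings start with MAJOR.MINOR. The function name kernel_at_least is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) if a kernel release string such as "6.1.0-talos"
# is at least MIN_MAJOR.MIN_MINOR; return 2 if the string can't be parsed.
# kernel_at_least is a hypothetical helper, not part of talosctl or kubectl.
kernel_at_least() {
  local release=$1 min_major=$2 min_minor=$3
  if [[ $release =~ ^([0-9]+)\.([0-9]+) ]]; then
    local major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}
    # Newer major version, or same major with a new-enough minor version
    (( major > min_major || (major == min_major && minor >= min_minor) ))
  else
    return 2
  fi
}

# Example release strings (the first matches the expected output above)
for r in 6.1.0-talos 5.10.0 5.4.0-150-generic; do
  if kernel_at_least "$r" 5 10; then
    echo "$r: OK for Cilium's eBPF features"
  else
    echo "$r: older than 5.10, expect reduced eBPF support"
  fi
done
```

The helper compares only the major and minor numbers. That is enough for a "5.10 or newer" rule, because patch levels and distro suffixes like -talos do not change which eBPF features are available at that granularity.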

Installing Cilium CLI

First, install the Cilium CLI tool on your workstation. This tool helps manage and troubleshoot Cilium:

# Get the latest stable version number
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)

# Detect your CPU architecture (Intel/AMD or ARM)
CLI_ARCH=amd64  # Default for Intel/AMD
case "$(uname -m)" in
  aarch64|arm64) CLI_ARCH=arm64 ;;  # Linux reports aarch64; macOS on Apple Silicon reports arm64
esac

# Detect OS (linux or darwin for macOS)
OS=$(uname -s | tr '[:upper:]' '[:lower:]')

# Download the CLI and its checksum file
# -L: Follow redirects
# --fail: Exit on HTTP errors
# --remote-name-all: Save with original filenames
curl -L --fail --remote-name-all \
  https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-${OS}-${CLI_ARCH}.tar.gz{,.sha256sum}

# Verify the download isn't corrupted (works on Linux and macOS)
if command -v sha256sum >/dev/null 2>&1; then
  sha256sum --check cilium-${OS}-${CLI_ARCH}.tar.gz.sha256sum
else
  shasum -a 256 -c cilium-${OS}-${CLI_ARCH}.tar.gz.sha256sum
fi

# Extract to system binary location
# x: extract, z: gzip compressed, v: verbose, f: file, C: change to directory
sudo tar xzvf cilium-${OS}-${CLI_ARCH}.tar.gz -C /usr/local/bin

# Clean up downloaded files
rm cilium-${OS}-${CLI_ARCH}.tar.gz{,.sha256sum}

# Verify installation worked
cilium version --client

# Expected output:
# cilium-cli: v0.16.x (git-sha)

Installing Helm (Required for Production)

Helm is Kubernetes' package manager. We'll use it to install Cilium with custom configuration:

# Install Helm if you don't have it
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Verify Helm installation
helm version

# Expected output:
# version.BuildInfo{Version:"v3.x.x", ...}

Cilium Configuration

Here's my battle-tested Cilium configuration with detailed explanations of every setting:

Understanding the Configuration

Before we dive in, let's understand key networking concepts:

CIDR (10.244.0.0/16): Defines an IP range. The /16 means the first 16 bits are fixed, giving us 65,536 IP addresses (10.244.0.0 to 10.244.255.255)
Pod CIDR: The IP range assigned to pods
Service CIDR: The IP range for Kubernetes services
VIP: Virtual IP that floats between control planes
MTU: Maximum Transmission Unit - largest packet size (like envelope size for network data)
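The CIDR arithmetic above fits in a few lines of bash. A /N prefix leaves 32 - N host bits, so the block holds 2^(32 - N) addresses. This is a minimal IPv4-only sketch, and the helper names are hypothetical. It reproduces the 65,536 figure for 10.244.0.0/16 and also covers a per-node /24 slice and Kubernetes' default 10.96.0.0/12 Service range.

```shell
#!/usr/bin/env bash
# Sketch: size and address range of an IPv4 CIDR block using bash integer math.
# Helper names (ip_to_int, int_to_ip, cidr_info) are hypothetical, IPv4 only.
ip_to_int() {            # "10.244.0.0" -> 32-bit integer
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

int_to_ip() {            # 32-bit integer -> dotted quad
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

cidr_info() {            # "10.244.0.0/16" -> size plus first and last address
  local ip=${1%/*} prefix=${1#*/}
  local size=$(( 1 << (32 - prefix) ))                         # 2^(host bits)
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF )) # keep network bits
  local first=$(( $(ip_to_int "$ip") & mask ))
  echo "$1: $size addresses, $(int_to_ip "$first") - $(int_to_ip $(( first + size - 1 )))"
}

cidr_info 10.244.0.0/16   # 10.244.0.0/16: 65536 addresses, 10.244.0.0 - 10.244.255.255
cidr_info 10.244.0.0/24   # 10.244.0.0/24: 256 addresses, 10.244.0.0 - 10.244.0.255
cidr_info 10.96.0.0/12    # 10.96.0.0/12: 1048576 addresses, 10.96.0.0 - 10.111.255.255
```

The /24 line shows what a single node's podCIDR looks like when Kubernetes hands out slices of the /16 pod range. That works out to 256 slices of 256 addresses each.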

The Complete Configuration File

Create this configuration file; the comments explain each setting. Before installing, confirm that loadBalancer.mode, loadBalancer.acceleration, and bpf.lbAlgorithmAnnotation exist in your chart version (the helm show values check in Step 2 covers this) to avoid "unknown field" errors.

# cilium-values.yaml
# Eviction Protection - Prevent Cilium from being killed on resource pressure
priorityClassName: system-node-critical  # Agent gets highest priority
                                         # (operator priority is set in the single operator
                                         # block below; a YAML key may only appear once)

cluster:
  name: homelab-cluster  # Your cluster name (can be anything)
  id: 1                  # Unique ID if running multiple clusters
                        # Why: Prevents clusters from interfering

# IP Address Management (IPAM) Configuration
ipam:
  mode: kubernetes  # Let Kubernetes assign pod IPs
                   # Why: Integrates with Kubernetes' IP management
                   # Note: clusterPoolIPv4PodCIDRList is NOT used in kubernetes mode
                   # Pod CIDRs are managed by Kubernetes itself via node allocations
                   # With ipam.mode: kubernetes, ensure the node CIDR allocator is enabled
                   # in kube-controller-manager and --cluster-cidr matches your Pod CIDR
                   # (Talos sets this when you define podSubnets)
                   # Non-Talos: kube-controller-manager flags must include:
                   # --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16
                   # Verify each node has a podCIDR:
                   # kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

# Talos-specific settings (as in the Talos guide for deploying Cilium)
# Talos does not let pods load kernel modules (so no SYS_MODULE) and mounts cgroup v2 itself
securityContext:
  capabilities:
    ciliumAgent: [CHOWN, KILL, NET_ADMIN, NET_RAW, IPC_LOCK, SYS_ADMIN, SYS_RESOURCE, DAC_OVERRIDE, FOWNER, SETGID, SETUID]
    cleanCiliumState: [NET_ADMIN, SYS_ADMIN, SYS_RESOURCE]
cgroup:
  autoMount:
    enabled: false     # Don't mount cgroup v2; Talos already provides it
  hostRoot: /sys/fs/cgroup

# Replace kube-proxy with eBPF (major performance boost)
kubeProxyReplacement: true      # Valid values: true|false. The old strict|probe|partial
                                # values were removed in Cilium 1.15 and no longer work
                                # Requires kube-proxy to be absent (Talos: cluster.proxy.disabled: true)
k8sServiceHost: 192.168.0.200   # Your control plane VIP from Part 2
k8sServicePort: 6443            # Kubernetes API port (always 6443)
                                # Why: Cilium needs to reach API server
                                # Note for Talos: You can also use KubePrism (localhost:7445)
                                # k8sServiceHost: localhost
                                # k8sServicePort: 7445

# eBPF (Extended Berkeley Packet Filter) Settings
bpf:
  masquerade: true      # Hide pod IPs behind node IPs for external traffic
                       # Why: Required for SNAT of pod traffic to external networks
  lbAlgorithmAnnotation: true  # Allow per-Service LB algorithm override

# Host access to ClusterIP services
# With kubeProxyReplacement: true, socket-level load balancing (socketLB) is enabled
# by default, so nodes and hostNetwork pods can already reach ClusterIP Services.
# Older charts used enableHostReachableServices / hostServices for this.

# Routing Configuration
routingMode: native             # Direct routing (no overlay/tunnel)
                               # Why: Better performance than VXLAN
                               # Note: Native routing requires L2 adjacency or correct L3 routes to each node's PodCIDR
                               # If nodes span VLANs/routers, use BGP mode or add static routes
autoDirectNodeRoutes: true     # Auto-configure routes between nodes
ipv4NativeRoutingCIDR: 10.244.0.0/16  # Pod CIDR(s) for native routing & direct node routes
                                      # Why: Enables native routing for these IPs (no overlay)
# (The old "tunnel: disabled" key was removed in Cilium 1.15; routingMode: native replaces it)
endpointRoutes:
  enabled: true                # Create per-endpoint routes
                              # Why: More efficient packet routing

# Network packet size
mtu: 1500  # MTU of the underlying network (standard Ethernet)
          # Omit it to let Cilium auto-detect the MTU from the node's interface
          # Why: Must match your network's MTU; lower it only if the path is smaller
          # Note: Cilium derives pod route MTUs from this value and subtracts
          # tunnel/WireGuard overhead itself, so keep the real link MTU here
          # even if you enable WireGuard later

# Hubble - Network observability (like Wireshark for Kubernetes)
hubble:
  enabled: true        # Turn on network visibility
  relay:
    enabled: true
    replicas: 2        # Run 2 copies for high availability
                      # Why: Don't lose visibility if one fails
    priorityClassName: system-cluster-critical  # Protect from eviction
    resources:         # Set resource requests to avoid starvation
      requests: { cpu: 50m, memory: 128Mi }
      limits:   { cpu: 500m, memory: 512Mi }
    tls:
      server:
        enabled: false # Keep simple until Part 6; enable when you add certs
  ui:
    enabled: true      # Web UI for viewing network flows
    replicas: 1
    priorityClassName: system-cluster-critical  # Protect from eviction
    resources:         # Minimal resources for UI
      requests: { cpu: 20m, memory: 64Mi }
      limits:   { cpu: 200m, memory: 256Mi }
    ingress:
      enabled: false   # We'll configure ingress in Part 6
  metrics:
    enabled:           # What network events to track
      - dns:query      # DNS lookups
      - drop          # Dropped packets (important!)
      - tcp           # TCP connections
      - flow          # General network flows
      - icmp          # Ping traffic
      - http          # HTTP requests
    serviceMonitor:
      enabled: false   # Enable when we add Prometheus

# L2 (Layer 2) Load Balancer - Replaces MetalLB
l2announcements:
  enabled: true        # Use L2 (ARP) mode for load balancing
                       # The interfaces that announce are chosen by the CiliumL2AnnouncementPolicy
                       # (its interfaces regex), not by a Helm value.
                       # CRITICAL: that regex must match your actual interface!
                       # Check with: ip link show
                       # Single interface: '^eth0$'
                       # Mixed hosts: '^(eth0|ens.*|enp[0-9s]+)$'
                       # The interface must also be one Cilium manages (auto-detected, or listed in devices)
externalIPs:
  enabled: true        # Needed for the policy's externalIPs: true to take effect

# Load Balancer Advanced Settings
loadBalancer:
  mode: dsr           # Direct Server Return - bypasses load balancer for responses
                     # Why: Better performance for large responses
                     # Note: DSR preserves source IP but requires proper return routing
                     # DSR requires clients can reach backend node IPs directly (return path bypasses LB)
                     # If you see asymmetric routing/dropped replies, try mode: snat or
                     # set Service's externalTrafficPolicy: Local
  acceleration: native  # XDP acceleration: runs the load balancer in the NIC driver
                        # Needs an XDP-capable driver; set to disabled if the agent
                        # reports that the device does not support XDP
  # Note: For L4 load-balancing algorithm, use per-Service annotation:
  # service.cilium.io/lb-algorithm: maglev|random|round_robin
  # (Enable with bpf.lbAlgorithmAnnotation=true)
  # Also can override mode per-Service:
  # service.cilium.io/lb-mode: dsr|hybrid|snat

# Security Features (start simple, add later)
encryption:
  enabled: false      # Set true for production (15% performance cost)
  type: wireguard    # Modern VPN-style encryption
  nodeEncryption: false  # Encrypt node-to-node traffic

# Network Policy Enforcement
policyEnforcementMode: default  # How strictly to enforce policies
                               # Options: default, always, never

# Identity allocation (fast policy warmup after restarts)
identityAllocationMode: crd  # Store identities as CRDs for persistence

# Cilium Operator (manages Cilium)
operator:
  replicas: 2         # Run 2 for high availability
  priorityClassName: system-cluster-critical  # Operator is cluster-critical

# Cilium Agent resources (top-level in Helm chart)
resources:           # Resource limits for cilium-agent pods
  requests:          # Minimum resources needed
    cpu: 100m        # 0.1 CPU cores
    memory: 128Mi    # 128 MB RAM
  limits:            # Maximum resources allowed
    cpu: 1000m       # 1 CPU core
    memory: 1Gi      # 1 GB RAM
# Note: For operator/relay/UI resources, use operator.resources,
# hubble.relay.resources, hubble.ui.resources
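MTU figures like 1420 come from the fixed overhead WireGuard adds to every packet. This sketch does the arithmetic and assumes the standard WireGuard data-message framing. That framing is an outer IP header (20 bytes over IPv4, 40 over IPv6), an 8-byte UDP header, a 16-byte WireGuard header and a 16-byte authentication tag. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inner MTU left after WireGuard encapsulation, assuming standard framing:
#   outer IP header (20 B IPv4 / 40 B IPv6) + UDP 8 B
#   + WireGuard data header 16 B + Poly1305 auth tag 16 B
# wireguard_inner_mtu is a hypothetical helper name.
wireguard_inner_mtu() {
  local link_mtu=$1 outer=${2:-ipv4}
  local ip_header=20
  if [ "$outer" = "ipv6" ]; then ip_header=40; fi
  echo $(( link_mtu - ip_header - 8 - 16 - 16 ))
}

echo "IPv4 underlay: $(wireguard_inner_mtu 1500)"        # 1440 (60 bytes of overhead)
echo "IPv6 underlay: $(wireguard_inner_mtu 1500 ipv6)"   # 1420 (80 bytes of overhead)
```

1420 is the conservative choice because it also fits when the outer packets travel over IPv6. This is why it is the value you will most often see quoted for WireGuard.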

Installation Process

Time to install Cilium and bring your cluster to life!

Step 1: Add Cilium Helm Repository

# Add the Cilium repository to Helm
helm repo add cilium https://helm.cilium.io/

# Update repository information
helm repo update

# Expected output:
# ...Successfully got an update from the "cilium" chart repository

Step 2: Install Cilium

# Create the configuration file first
cat > cilium-values.yaml <<'EOF'
[paste the configuration from above]
EOF

# Pick the chart version: the latest release, or set CILIUM_CHART_VERSION yourself
CILIUM_CHART_VERSION=${CILIUM_CHART_VERSION:-$(helm search repo cilium/cilium --versions | awk 'NR==2 {print $2}')}
# Or pin to a specific version for consistency:
# CILIUM_CHART_VERSION=${CILIUM_CHART_VERSION:-1.18.1}
helm show chart cilium/cilium --version "$CILIUM_CHART_VERSION" | grep appVersion
# appVersion: 1.18.1 ← verify before installing so you track the latest GA

# Verify your values match the chart version (prevents "unknown field" errors):
helm show values cilium/cilium --version "$CILIUM_CHART_VERSION" | less

# Verify kube-proxy is truly gone (required for kube-proxy replacement):
kubectl -n kube-system get ds/kube-proxy || echo "kube-proxy not present (good)"
# If kube-proxy exists, disable it before installing (on Talos: cluster.proxy.disabled: true
# in the machine config). Moving a live cluster off kube-proxy needs more care;
# follow Cilium's kube-proxy-free migration guide in that case.

# Install Cilium using Helm
# --version: the chart version selected above
# --namespace: install in kube-system (where the CNI belongs)
# --values: use our custom configuration
helm upgrade --install cilium cilium/cilium \
  --version "$CILIUM_CHART_VERSION" \
  --namespace kube-system \
  --values cilium-values.yaml

# Expected output:
# NAME: cilium
# LAST DEPLOYED: [timestamp]
# NAMESPACE: kube-system
# STATUS: deployed

Step 3: Monitor the Installation

# Watch Cilium pods starting up
# DaemonSet means one pod per node
kubectl -n kube-system rollout status daemonset/cilium

# Expected output:
# Waiting for daemon set "cilium" rollout to finish: 0 of 5 updated...
# Waiting for daemon set "cilium" rollout to finish: 3 of 5 updated...
# daemon set "cilium" successfully rolled out

# Check all Cilium components
kubectl -n kube-system get pods -l app.kubernetes.io/part-of=cilium

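# Expected output (abridged: there is one cilium-xxxxx agent pod per node, five in this
# cluster, plus one cilium-envoy-xxxxx pod per node when the Envoy DaemonSet is enabled,
# which is the default since Cilium 1.16):
NAME                               READY   STATUS    RESTARTS
cilium-7xkg9                       1/1     Running   0
cilium-8xkg2                       1/1     Running   0
cilium-operator-6f9cbd4d7c-2nvpt   1/1     Running   0
cilium-operator-6f9cbd4d7c-xrzdr   1/1     Running   0
hubble-relay-7d4d6cb8c5-kzb4n      1/1     Running   0
hubble-ui-64d4995d57-g5zhj         2/2     Running   0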

Step 4: Verify Nodes Are Now Ready

This is the moment of truth - your nodes should transition from NotReady to Ready:

# Check node status
kubectl get nodes

# Expected output - all nodes Ready!
NAME          STATUS   ROLES           AGE   VERSION
talos-cp-01   Ready    control-plane   20m   v1.34.0
talos-cp-02   Ready    control-plane   18m   v1.34.0
talos-cp-03   Ready    control-plane   16m   v1.34.0
talos-wrk-01  Ready    worker          14m   v1.34.0
talos-wrk-02  Ready    worker          14m   v1.34.0

# If still NotReady after 2-3 minutes, check Cilium logs:
kubectl -n kube-system logs -l app.kubernetes.io/name=cilium-agent --tail=50 || \
kubectl -n kube-system logs -l k8s-app=cilium --tail=50
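On a larger cluster it helps to list only the nodes that are still NotReady. This is a minimal sketch that assumes the default column layout of kubectl get nodes --no-headers (NAME STATUS ROLES AGE VERSION). The function name not_ready_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the names of nodes whose STATUS is not Ready.
# Expects `kubectl get nodes --no-headers` output on stdin.
# not_ready_nodes is a hypothetical helper name.
not_ready_nodes() {
  # "Ready,SchedulingDisabled" (a cordoned node) still counts as Ready
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}

# Offline example using rows in the same format as the output above
printf '%s\n' \
  'talos-cp-01   Ready                      control-plane   20m   v1.34.0' \
  'talos-wrk-01  NotReady                   worker          14m   v1.34.0' \
  'talos-wrk-02  Ready,SchedulingDisabled   worker          14m   v1.34.0' | not_ready_nodes
# Live cluster:
# kubectl get nodes --no-headers | not_ready_nodes
```

If you would rather block until everything is up than poll, kubectl wait --for=condition=Ready node --all --timeout=300s does the waiting for you.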

Configuring L2 Load Balancer

Now let's set up load balancing. This replaces MetalLB with Cilium's native functionality.

Important: If you previously ran MetalLB, make sure it's removed/disabled to avoid ARP conflicts.

# Clean up MetalLB if it was installed:
kubectl delete ns metallb-system --ignore-not-found
# If you used CRDs:
kubectl get crd | grep metallb | awk '{print $1}' | xargs -r kubectl delete crd

# If traffic doesn't move after switching from MetalLB, flush ARP on the client:
# Linux
sudo ip neigh flush all
# macOS
sudo arp -d -a  # flush all entries

Understanding L2 Load Balancing

What is L2 (Layer 2) Load Balancing?

L2 refers to the Data Link layer of networking. In simple terms:

Your router uses ARP (Address Resolution Protocol) to find devices
L2 load balancing makes Cilium respond to ARP requests for service IPs
This gives your services external IPs that work on your local network

Think of it like this:

Service needs external IP (e.g., 192.168.0.201)
Router asks "who has 192.168.0.201?"
Cilium responds "I do!"
Traffic flows to your service

IPv6 Note: For dual-stack clusters, L2 announcements use NDP (Neighbor Discovery Protocol) for IPv6. Create an IPv6 pool and set loadBalancerIPs: true to cover both protocols. Enable IPv6 forwarding on nodes with sysctl -w net.ipv6.conf.all.forwarding=1. On Talos, which has no shell, set net.ipv6.conf.all.forwarding under machine.sysctls in the machine config instead.

Step 1: Create IP Address Pool

First, define which IP addresses Cilium can hand out to services:

# cilium-lb-ipam-pool.yaml
# Note: Using v2alpha1 for Cilium v1.18; check docs for your version
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: default-pool
spec:
  blocks:
    - start: 192.168.0.201    # First IP to hand out
      stop: 192.168.0.210     # Last IP to hand out
                             # Why: Gives us 10 IPs for LoadBalancer services
                             # IMPORTANT: Ensure the pool (192.168.0.201–.210) does NOT
                             # overlap with DHCP scopes or static assignments, or you'll
                             # see intermittent ARP flaps.
  serviceSelector:
    matchLabels:
      lb-pool: default        # Services can request this pool with a label
                             # Why: Allows multiple pools for different purposes
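Before applying the pool, you can check two things: that a start/stop block holds the number of addresses you expect (start and stop are both inclusive), and that it does not collide with your router's DHCP scope. This is a minimal IPv4 sketch. The helper names are hypothetical, and the DHCP range in it is a made-up example; replace it with the scope your router actually uses.

```shell
#!/usr/bin/env bash
# Sketch: count addresses in a start/stop block and check it against a DHCP scope.
# Helper names are hypothetical; IPv4 only.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

pool_size() {            # inclusive count from start ($1) to stop ($2)
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

ranges_overlap() {       # exit 0 if [start1,end1] and [start2,end2] share any address
  local s1 e1 s2 e2
  s1=$(ip_to_int "$1"); e1=$(ip_to_int "$2")
  s2=$(ip_to_int "$3"); e2=$(ip_to_int "$4")
  (( s1 <= e2 && s2 <= e1 ))
}

echo "pool size: $(pool_size 192.168.0.201 192.168.0.210)"   # pool size: 10

# HYPOTHETICAL DHCP scope for illustration; copy the real one from your router
DHCP_START=192.168.0.100
DHCP_END=192.168.0.199
if ranges_overlap 192.168.0.201 192.168.0.210 "$DHCP_START" "$DHCP_END"; then
  echo "WARNING: LB pool overlaps the DHCP scope"
else
  echo "LB pool is clear of the DHCP scope"
fi
```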

Step 2: Configure L2 Announcement Policy

Tell Cilium how to announce these IPs on your network:

# l2-announcement-policy.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: default-l2-policy
spec:
  interfaces:
    - '^(eth0|ens.*|enp[0-9s]+)$'  # Regex matching your interface(s)
                                   # CRITICAL: Must match your actual interface!
                                   # Check with: ip link show
  externalIPs: true          # Announce manually assigned external IPs
  loadBalancerIPs: true      # Announce LoadBalancer service IPs
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux # Which nodes should announce
                             # Why: All nodes can announce for redundancy

Step 3: Apply the Configuration

# Create both resources
kubectl apply -f cilium-lb-ipam-pool.yaml
kubectl apply -f l2-announcement-policy.yaml

# Verify IP pool was created
kubectl get ciliumloadbalancerippools

# Expected output:
NAME           DISABLED   CONFLICTING   IPS AVAILABLE   AGE
default-pool   false      False         10              30s
#                                       ^^ 10 IPs available

# Check the announcement policy
kubectl get ciliuml2announcementpolicies

# Expected output:
NAME                AGE
default-l2-policy   30s

Troubleshooting Interface Names

If load balancing doesn't work, the interface name is usually wrong:

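# Find your actual interface name (on a Linux host with a shell)
ip link show | grep -E "^[0-9]+:"
# Talos nodes have no shell; list their links through the Talos API instead:
# talosctl --nodes 192.168.0.11 get links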

# Example output:
# 1: lo: <LOOPBACK,UP,LOWER_UP>
# 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
# 3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP>

# In this case, use "eth0"
# For Proxmox VMs, it might be "ens18"
# For Ubuntu, often "enp0s3" or similar
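To see which of your interface names the policy regex actually selects, run them through the pattern before applying it. This sketch assumes that grep -E (POSIX extended regular expressions) treats this simple pattern the same way Cilium's Go regular-expression engine does. That holds here, but it is worth rechecking if you use Go-only syntax such as \d. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: which interface names does the L2 policy regex select?
# Assumption: grep -E matches this simple pattern the same way Go's regexp does.
PATTERN='^(eth0|ens.*|enp[0-9s]+)$'

matches_l2_regex() {     # hypothetical helper: exit 0 if $1 matches PATTERN
  printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

# Replace these with the names from your own link listing
for ifname in lo eth0 eth1 ens18 enp0s3 docker0 cilium_host; do
  if matches_l2_regex "$ifname"; then
    echo "match:    $ifname"
  else
    echo "no match: $ifname"
  fi
done
```

Note that eth1 does not match: the eth0 branch is an exact name. Names like enp1s0f0 also fall outside enp[0-9s]+, because f is not in that character class.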

Verifying Cilium Health

Let's thoroughly verify Cilium is working correctly:

Check Overall Status

# Run Cilium status check
cilium status --wait

# Expected output (the art is Cilium's logo!):
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK (Cilium 1.16+ runs Envoy as a separate DaemonSet by default; "disabled" means it is embedded in the agent)
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled (normal - single cluster)
                                            # Multi-cluster? You'll add ClusterMesh in a later part

# What each component does:
# - Cilium: Main networking agent on each node
# - Operator: Cluster-wide tasks such as LoadBalancer IP allocation (LB IPAM) and garbage collection
# - Hubble Relay: Collects network observability data

# Verify each node has a podCIDR assigned (required for kubernetes IPAM mode)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# All nodes should show a CIDR (e.g., 10.244.0.0/24). If any are blank, fix your
# controller-manager or Talos podSubnets configuration before proceeding.
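With more nodes, scanning that output by eye gets error-prone. This minimal sketch reads the tab-separated name/podCIDR lines that the jsonpath query prints and lists any node with an empty podCIDR. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flag nodes with no podCIDR, reading "name<TAB>podCIDR" lines on stdin
# (the format printed by the jsonpath query above).
# nodes_missing_podcidr is a hypothetical helper name.
nodes_missing_podcidr() {
  awk -F'\t' 'NF == 0 { next } $2 == "" { print $1 }'
}

# Offline example: talos-wrk-02 has no podCIDR assigned
printf 'talos-cp-01\t10.244.0.0/24\ntalos-wrk-02\t\n' | nodes_missing_podcidr
# Live cluster:
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}' | nodes_missing_podcidr
```

Empty output means every node has a podCIDR.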

Run Connectivity Tests

Cilium includes comprehensive network tests:

# This runs comprehensive network tests
# Takes about 5 minutes to complete
cilium connectivity test --collect-sysdump-on-failure

# If the connectivity test complains about missing CRDs, check that Cilium's CRDs exist:
kubectl get crd | grep cilium.io
# The operator registers them on startup; if none are listed, check the cilium-operator logs

# You'll see tests like:
# ✅ [pod-to-pod] Testing connectivity...
# ✅ [pod-to-service] Testing service connectivity...
# ✅ [pod-to-external] Testing external connectivity...
# ✅ [network-policy] Testing policy enforcement...

# Final output will show all tests passed
# (exact count varies by Cilium version)

# If any test fails, it will show:
# ❌ Test "pod-to-pod" failed: connection timeout
# This helps identify specific network issues

Verify Node Connectivity

# List all nodes as Cilium sees them (agent CLI - runs inside the agent pod)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg node list

# Portable fallback if exec into ds/cilium is flaky: pick one agent pod explicitly
POD=$(kubectl -n kube-system get pods -l app.kubernetes.io/name=cilium-agent -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec -it "$POD" -c cilium-agent -- cilium-dbg node list

# Expected output: one row per node, with columns such as
# Name, IPv4 Address, Endpoint CIDR and Source
# Each of the five nodes should appear with its node IP (192.168.0.11-15)
# and its podCIDR (e.g. 10.244.0.0/24)

# Cilium's node-to-node health checks use port 4240; summarize them with:
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg status --all-health
# All nodes should be reported as reachable

Verify eBPF Programs

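Check that Cilium has created endpoints for your pods and loaded their eBPF programs into the kernel:

# List Cilium endpoints (run inside the cilium agent pod)
# Note: These commands require cilium-dbg inside the agent pod, not cilium-cli
# ds/cilium picks one agent pod, so you only see the endpoints on that agent's node
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg endpoint list

# Example output (abridged; the real table also shows policy enforcement and STATUS):
IP ADDRESS       IDENTITY          LABELS
10.244.0.125     [numeric id]      k8s:app=nginx
10.244.1.89      [numeric id]      k8s:app=redis
# Pod identities are allocated above the reserved range (e.g. 1 = host, 2 = world, 4 = health)
# STATUS "ready" means the endpoint's eBPF programs have been compiled and loaded

# Datapath counters: forwarded and dropped packets by reason (advanced)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg bpf metrics list

# eBPF maps and how many entries each one holds
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg map list
# Maps close to their size limit indicate a need for tuning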

Check DNS Resolution

Verify CoreDNS is now working (it was Pending before):

# CoreDNS should now be Running
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Expected output:
NAME                      READY   STATUS    RESTARTS
coredns-565d847f9-abc123  1/1     Running   0
coredns-565d847f9-def456  1/1     Running   0

# Test DNS from a pod
kubectl run test-dns --image=busybox:1.36 --rm -it --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local

# Expected output:
# Server:    10.96.0.10      # CoreDNS IP (from Service CIDR, default 10.96.0.0/12)
# Address:   10.96.0.10:53  # If you changed Service CIDR, this IP will differ
# Name:      kubernetes.default.svc.cluster.local
# Address:   10.96.0.1       # Kubernetes API Service IP
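If you changed the Service CIDR, you can quickly confirm that the CoreDNS Server address and Service CLUSTER-IPs really come from it. Mask both addresses with the prefix and compare the results. This is a minimal IPv4 sketch with hypothetical helper names, using the default 10.96.0.0/12 range from the comments above.

```shell
#!/usr/bin/env bash
# Sketch: is an IPv4 address inside a CIDR block? Helper names are hypothetical.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {           # usage: ip_in_cidr 10.96.0.10 10.96.0.0/12
  local net=${2%/*} prefix=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  # Same network bits after masking => the address belongs to the block
  (( ($(ip_to_int "$1") & mask) == ($(ip_to_int "$net") & mask) ))
}

for ip in 10.96.0.10 10.96.0.1 10.244.0.125; do
  if ip_in_cidr "$ip" 10.96.0.0/12; then
    echo "$ip is in the Service CIDR"
  else
    echo "$ip is NOT in the Service CIDR"
  fi
done
```

The last address is a pod IP from 10.244.0.0/16, so it correctly falls outside the Service range.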

Testing Load Balancer

Let's verify the load balancer works by deploying a test application:

Create a Test Application

# test-lb-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: test-lb
  labels:
    lb-pool: default        # Request IP from our default pool (matches pool label selector)
  annotations:              # Override LB behavior per-service
    service.cilium.io/lb-algorithm: maglev   # or: random|round_robin
    service.cilium.io/lb-mode: dsr           # or: snat|hybrid
    # lbipam.cilium.io/ips: "192.168.0.205"  # Request specific static IP (optional)
    # lbipam.cilium.io/ips: "192.168.0.205,192.168.0.206"  # Multiple IPs supported (comma-separated)
    # (Older Cilium releases used io.cilium/lb-ipam-ips; check the LB IPAM docs for your version)
spec:
  type: LoadBalancer       # This triggers Cilium to assign an external IP
  externalTrafficPolicy: Cluster  # Any node may receive traffic and forward it to a backend on any node
  # Client IP and DSR:
  # - DSR + Cluster policy: the backend sees the real client IP, because the receiving node
  #   forwards the original packet and the backend replies to the client directly
  # - SNAT + Cluster policy: the backend sees the forwarding node's IP, not the client's
  # - Local policy (any mode): preserves client IP, but traffic that lands on a node with no
  #   local backend is dropped (by design); keep backends on every node that may receive traffic
  # If clients can't route directly to node IPs (separate VLANs, L3 hops), prefer lb-mode: snat
  # DSR may silently "work sometimes" with routing issues. Use snat for reliability.
  ports:
    - port: 80             # External port
      targetPort: 8080     # Container port
  selector:
    app: test              # Which pods to send traffic to
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
spec:
  replicas: 3              # Run 3 copies for load balancing test
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test          # Label that service uses to find pods
    spec:
      containers:
      - name: echo
        image: ealen/echo-server:latest  # Automatically shows hostname/pod info
        ports:
        - containerPort: 8080
        env:
        - name: PORT
          value: "8080"

Deploy and Test

# Create the test application
kubectl apply -f test-lb-service.yaml

# Watch the service get an external IP
kubectl get svc test-lb -w

# Expected output (IP appears after ~10 seconds):
NAME      TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)
test-lb   LoadBalancer   10.96.10.100   <pending>       80:31234/TCP
test-lb   LoadBalancer   10.96.10.100   192.168.0.201   80:31234/TCP
#                        ^^^^^^^^^^^^   ^^^^^^^^^^^^^
#                        Service CIDR   Announced LB IP
# Note: CLUSTER-IP from Service CIDR (default 10.96.0.0/12); changes if you customized serviceSubnets

# Press Ctrl+C to stop watching

Test From Your Network

# Test from your workstation (not inside Kubernetes)
curl http://192.168.0.201

# Expected output (echo-server shows pod info):
# {"host":{"hostname":"test-deployment-7b8f5c6d7-abc123", ...}}

# If connection fails, flush ARP cache:
# On your client machine:
# Linux
sudo ip neigh flush to 192.168.0.201
# macOS
sudo arp -d 192.168.0.201 || true
sudo arp -d -a            # flush all if still sticky

# Test multiple times to see load balancing
for i in {1..10}; do
  curl -s http://192.168.0.201 | grep -o '"hostname":"[^"]*"'
done

# You should see different pod names, proving load balancing works:
# "hostname":"test-deployment-7b8f5c6d7-abc123"
# "hostname":"test-deployment-7b8f5c6d7-def456"
# "hostname":"test-deployment-7b8f5c6d7-ghi789"
# ...

# Verify client IP preservation (from a non-cluster client):
curl -s http://192.168.0.201 | jq -r '.host.ip'
# echo-server reports the source address it saw (possibly IPv4-mapped, e.g. ::ffff:<your IP>)
# With DSR (Cluster or Local policy): shows your workstation's IP
# With SNAT mode + Cluster policy: shows a node IP

# Clean up the test
kubectl delete -f test-lb-service.yaml
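The hostname loop above shows that several pods answer, but not how evenly. To get counts, pipe the raw responses into a small tally. Run the live variant before the cleanup command, because the Deployment is gone afterwards. This is a minimal sketch that assumes each echo-server response contains a "hostname":"..." field, as in the expected output above. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count how many responses each backend pod served.
# Reads echo-server JSON responses on stdin; tally_backends is a hypothetical name.
tally_backends() {
  grep -o '"hostname":"[^"]*"' | cut -d'"' -f4 | sort | uniq -c | sort -rn |
    awk '{ printf "%-40s %d\n", $2, $1 }'
}

# Offline example with abbreviated responses
printf '%s\n' \
  '{"host":{"hostname":"test-deployment-7b8f5c6d7-abc123"}}' \
  '{"host":{"hostname":"test-deployment-7b8f5c6d7-def456"}}' \
  '{"host":{"hostname":"test-deployment-7b8f5c6d7-abc123"}}' | tally_backends
# Live (before cleanup):
# for i in $(seq 1 30); do curl -s http://192.168.0.201; echo; done | tally_backends
```

With only a handful of requests the split will rarely be exact; 30 or more requests give a better picture of how the chosen algorithm (maglev in the test Service) spreads connections.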

Troubleshooting Load Balancer Issues

If the external IP stays <pending>:

# Check if Cilium assigned an IP
kubectl get svc test-lb -o jsonpath='{.status.loadBalancer}'

# Check Cilium's IP allocation
kubectl get ciliumloadbalancerippools default-pool -o yaml

# Look for allocation events
kubectl describe svc test-lb | grep -A5 Events

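# Check Cilium agent logs for L2 announcements
# (with a label selector, kubectl logs shows only the last 10 lines per pod unless you set --tail)
kubectl -n kube-system logs -l app.kubernetes.io/name=cilium-agent --tail=500 | grep -i "l2\|arp"

# If your chart uses k8s-app=cilium:
kubectl -n kube-system logs -l k8s-app=cilium --tail=100

# Each announced Service gets a leader-election Lease; its holder is the node answering ARP
kubectl -n kube-system get leases | grep cilium-l2announce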

# Watch service events for debugging
kubectl get events -A --field-selector \
  involvedObject.kind=Service,involvedObject.name=test-lb -w

# Make sure your router/DHCP server isn't leasing 192.168.0.201–210
# Check DHCP leases on your router's admin panel

Important: L2 announcements don't cross routers/VLANs. Ensure the LB IP is on the same L2 segment as the announcing nodes, or use BGP mode instead.

Network Policies

Network policies are firewall rules for your pods. Let's implement zero-trust networking where pods can't communicate unless explicitly allowed.

Understanding Network Policies

Think of network policies like security guards:

Default: All pods can talk to all pods (apartment building with no locks)
With policies: Only allowed connections work (secured building with access control)

Create a Default Deny Policy

This policy blocks all traffic by default - the foundation of zero-trust:

# default-deny-all.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  # Note: namespace will be set when applying to specific namespaces
spec:
  podSelector: {}              # Empty selector = all pods in namespace
  policyTypes:
  - Ingress                    # Block incoming traffic
  - Egress                     # Block outgoing traffic
                              # Result: Complete isolation

Allow DNS (Essential for Pods)

Pods need DNS to resolve service names. This policy allows only DNS traffic:

# allow-dns-egress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  # Note: namespace will be set when applying to specific namespaces
spec:
  podSelector: {}              # Apply to all pods
  policyTypes:
  - Egress                     # Only controlling outgoing traffic
  egress:
  - to:                        # Allow traffic to...
    - namespaceSelector:       # Pods in namespace with this label
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:             # That also have this label
        matchLabels:
          k8s-app: kube-dns    # CoreDNS pods (Talos default)
          # Alternative label (check your cluster):
          # app.kubernetes.io/name: coredns
    ports:
    - protocol: UDP            # DNS uses UDP
      port: 53
    - protocol: TCP            # Sometimes TCP for large responses
      port: 53

Apply Security Policies

# Save the two manifests above to files
cat > default-deny-all.yaml <<'EOF'
[paste the default-deny-all YAML above]
EOF

cat > allow-dns-egress.yaml <<'EOF'
[paste the allow-dns-egress YAML above]
EOF

# Create namespaces with security policies
for ns in production staging development; do
  echo "Setting up namespace: $ns"

  # Create namespace (idempotent: safe to rerun)
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -

  # Apply deny-all policy (zero-trust baseline)
  # The manifests set no namespace, so -n decides where they land
  kubectl apply -n "$ns" -f default-deny-all.yaml

  # Allow DNS (pods need this to function)
  kubectl apply -n "$ns" -f allow-dns-egress.yaml
done

# Verify policies are active
kubectl get networkpolicies -A

# Expected output (policies in other namespaces, if any, also appear):
# NAMESPACE     NAME               POD-SELECTOR   AGE
# development   allow-dns-egress   <none>         5s
# development   default-deny-all   <none>         10s
# production    allow-dns-egress   <none>         5s
# production    default-deny-all   <none>         10s
# staging       allow-dns-egress   <none>         5s
# staging       default-deny-all   <none>         10s

Test Network Policies

Verify the policies are working:

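# Deploy a test pod in production namespace and wait until it's running
kubectl run test-pod --image=busybox:1.36 --namespace=production -- sleep 3600
kubectl wait -n production --for=condition=Ready pod/test-pod --timeout=60s

# Try to reach another address in the pod network (should fail due to policy)
# 10.244.0.1 is only an example; pick a real pod IP from: kubectl get pods -A -o wide
kubectl exec -n production test-pod -- ping -c 1 -W 2 10.244.0.1

# Expected output:
# PING 10.244.0.1 (10.244.0.1): 56 data bytes
# --- 10.244.0.1 ping statistics ---
# 1 packets transmitted, 0 packets received, 100% packet loss

# But DNS should work (the full name avoids search-domain quirks in some BusyBox nslookup versions)
kubectl exec -n production test-pod -- nslookup kubernetes.default.svc.cluster.local

# Expected output:
# Server:    10.96.0.10
# Address:   10.96.0.10:53
#
# Name:      kubernetes.default.svc.cluster.local
# Address:   10.96.0.1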

# Clean up
kubectl delete pod test-pod -n production
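If you want to rerun these checks after every policy change, it helps to turn them into pass/fail lines. A minimal sketch with two hypothetical helpers; recreate test-pod first, since the step above just deleted it.

```shell
# expect_blocked CMD...  succeeds only if CMD fails (the policy stopped it)
expect_blocked() {
  if "$@" >/dev/null 2>&1; then
    echo "FAIL (allowed): $*"
    return 1
  fi
  echo "OK (blocked): $*"
}

# expect_allowed CMD...  succeeds only if CMD succeeds
expect_allowed() {
  if "$@" >/dev/null 2>&1; then
    echo "OK (allowed): $*"
    return 0
  fi
  echo "FAIL (blocked): $*"
  return 1
}

# Usage against the test pod (BusyBox ping -W sets the reply timeout in seconds):
# expect_blocked kubectl exec -n production test-pod -- ping -c 1 -W 2 10.244.0.1
# expect_allowed kubectl exec -n production test-pod -- nslookup kubernetes.default.svc.cluster.local
```

A timeout is treated the same as a refusal here: both make the command fail, which is what a dropped packet looks like from inside the pod.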

Example: Allow Specific Communication

Here's how to allow pods to communicate:

# allow-web-to-db.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database            # Apply to database pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web             # Allow from web pods
    ports:
    - protocol: TCP
      port: 5432              # PostgreSQL port
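One catch: default-deny-all in production blocks egress as well, so the ingress rule above is only half the path, and the web pods still can't send to the database. A companion egress policy on the web side is needed. The sketch below writes one to a file; the policy and file names are my own, and it assumes the same app: web and app: database labels.

```shell
# Write the egress half of the web -> database path (hypothetical names).
cat > allow-web-egress-to-db.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-egress-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web                 # Apply to web pods (the clients)
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database        # Only towards database pods
    ports:
    - protocol: TCP
      port: 5432               # PostgreSQL port
EOF

# Apply both halves together:
# kubectl apply -f allow-web-to-db.yaml -f allow-web-egress-to-db.yaml
```

The web pods keep their DNS access from allow-dns-egress; egress policies are additive, so this rule only adds the database path.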

Hubble Observability

Hubble is Cilium's observability platform - think of it as X-ray vision for your network. You can see every packet, every connection, and every policy decision.

Access Hubble UI

# Forward Hubble UI to your local machine
# & runs it in background so you can continue using terminal
kubectl port-forward -n kube-system svc/hubble-ui 8080:80 &
KPF_UI=$!  # Capture PID for cleanup

# Open your browser to:
# http://localhost:8080

# Stop the port-forward when done:
# kill "$KPF_UI"

# You'll see a visual map of your pods and their connections!

Using Hubble CLI

For command-line visibility:

# Install Hubble CLI
export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
OS=$(uname -s | tr '[:upper:]' '[:lower:]')  # linux or darwin
ARCH=$(uname -m); [ "$ARCH" = "x86_64" ] && ARCH="amd64" || ARCH="arm64"
curl -L --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-${OS}-${ARCH}.tar.gz{,.sha256sum}

# Verify checksum (works on Linux and macOS)
if command -v sha256sum >/dev/null 2>&1; then
  sha256sum --check hubble-${OS}-${ARCH}.tar.gz.sha256sum
else
  shasum -a 256 -c hubble-${OS}-${ARCH}.tar.gz.sha256sum
fi

sudo tar xzvf hubble-${OS}-${ARCH}.tar.gz -C /usr/local/bin
rm hubble-${OS}-${ARCH}.tar.gz{,.sha256sum}

# Port-forward Hubble Relay
kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &
KPF1=$!  # Capture PID for cleanup

# Check Hubble status
hubble status --server localhost:4245

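# Watch live traffic (--follow streams new flows; without it, hubble prints recent flows and exits)
hubble observe --server localhost:4245 --namespace default --follow

# Example output (abridged):
# Dec 28 09:15:23.123: default/web-pod-123:43210 (ID:1234) -> default/db-pod-456:5432 (ID:5678) to-endpoint FORWARDED (TCP Flags: SYN)
# Dec 28 09:15:23.124: default/db-pod-456:5432 (ID:5678) -> default/web-pod-123:43210 (ID:1234) to-endpoint FORWARDED (TCP Flags: SYN, ACK)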

# Filter for dropped packets (great for debugging policies)
hubble observe --server localhost:4245 --verdict DROPPED

# Filter by labels (helpful for debugging specific apps)
hubble observe --server localhost:4245 --from-label app=web --to-label app=database

# See HTTP traffic (only visible for pods covered by an L7 policy, since Cilium's proxy must parse it)
hubble observe --server localhost:4245 --protocol http

# Stop the port-forward when done
kill "$KPF1"  # Or use fg then Ctrl+C

What to Look for in Hubble

DROPPED verdicts: Network policies blocking traffic
Connection patterns: Which pods talk to which
Latency issues: Slow responses between services
Policy violations: Attempted connections that were blocked

Migration Story: Flannel to Cilium

Let me share my painful 6-hour CNI migration story so you can learn from my mistakes:

The 6-Hour Outage Timeline

Hour 1: "This should be quick" (Famous last words)

# What I did (DON'T DO THIS!)
kubectl delete -n kube-system daemonset kube-flannel-ds
helm install cilium cilium/cilium

# Immediate result:
kubectl get nodes
# All nodes NotReady - cluster broken!

Hour 2: Authentication Failures

# Cilium pods crashlooping
kubectl -n kube-system logs cilium-xxxxx
# "Failed to initialize Kubernetes client: Unauthorized"

# Problem: Service account tokens were stale
# Fix: Force kubelet to regenerate tokens
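# talosctl needs reachable node addresses, so use each node's InternalIP
for ip in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  talosctl -n "$ip" service kubelet restart
done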

Hour 3: IP Range Chaos

# Pods had wrong IPs
kubectl get pods -o wide
# IPs showing 10.245.x.x instead of 10.244.x.x

# Problem: Old Flannel config still cached
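# Fix: Complete node reboot to clear IP allocations
# (talosctl -n takes a comma-separated list, not a range)
talosctl -n 192.168.0.11,192.168.0.12,192.168.0.13,192.168.0.14,192.168.0.15,192.168.0.16,192.168.0.17 reboot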

Hour 4: The DNS Disaster

# No pod could resolve DNS
kubectl exec test-pod -- nslookup kubernetes
# ;; connection timed out; no servers could be reached

# Problem: forwardKubeDNSToHost was still true!
# This was THE critical issue - took 2 hours to find

# Fix: Applied the patch from Part 2
talosctl patch machineconfig -n 192.168.0.11,192.168.0.12,192.168.0.13,192.168.0.14,192.168.0.15,192.168.0.16,192.168.0.17 \
  --patch '{"machine":{"features":{"hostDNS":{"forwardKubeDNSToHost":false}}}}'

Hour 5: Storage System Offline

# All persistent volumes broken
kubectl get pv
# All showing "Released" or "Failed"

# CSI pods couldn't reach API server
# Had to delete and recreate entire CSI deployment
kubectl -n longhorn-system delete pods --all

Hour 6: The Cleanup

# 35 pods still had Flannel IPs cached
# Systematic restart of everything
for ns in $(kubectl get ns -o name); do
  kubectl -n ${ns##*/} rollout restart deployment
  kubectl -n ${ns##*/} rollout restart statefulset
  kubectl -n ${ns##*/} rollout restart daemonset
done

# Finally working!

Root Causes Analysis

After post-mortem analysis, here's what actually went wrong:

CNI State Persistence: Flannel left behind:
- IP allocations in /var/lib/cni/
- Network namespaces in /var/run/netns/
- iptables rules that conflicted with Cilium
The DNS Configuration: forwardKubeDNSToHost: true causes:
- CoreDNS to forward queries to the Talos host DNS resolver instead of straight upstream
- That traffic leaves the pod network towards the host's DNS listener
- Cilium's eBPF masquerading doesn't handle this path properly
- Result: Complete DNS failure
Service Account Tokens (my best explanation after the fact): When CNI changes:
- Tokens the pods already held were rejected as "Unauthorized"
- Pods can't authenticate to API server
- Kubelet restart forces new tokens

Lessons Learned

CNI migration requires careful planning
- Fresh installation is much safer than migration
- If you must migrate, plan for downtime and have rollback procedures
- Backup etcd before changes
- Document original configuration
- Test recovery procedure
IP ranges must match the Talos configuration:
- Talos podSubnets: 10.244.0.0/16
- Cilium IPAM mode 'kubernetes' uses Kubernetes node allocations
- Kubernetes allocates subnets from the Talos-defined pod CIDR
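The mismatch from Hour 3 (10.245.x.x instead of 10.244.x.x) is easy to check mechanically: every node's podCIDR should sit inside the Talos podSubnets. A POSIX shell sketch; cidr_contains and ip_to_int are hypothetical helper names, and 10.244.0.0/16 is this series' pod subnet.

```shell
# ip_to_int A.B.C.D -> the address as a single integer
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1                 # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# cidr_contains PARENT CHILD
# Exit status 0 if CHILD (e.g. 10.244.3.0/24) lies entirely inside PARENT (e.g. 10.244.0.0/16).
cidr_contains() {
  p_len=${1#*/}
  c_len=${2#*/}
  [ "$c_len" -ge "$p_len" ] || return 1      # a child can't be larger than its parent
  shift_bits=$(( 32 - p_len ))
  # Keep only the parent's network bits of both addresses and compare them
  p_net=$(( $(ip_to_int "${1%/*}") >> shift_bits ))
  c_net=$(( $(ip_to_int "${2%/*}") >> shift_bits ))
  [ "$p_net" -eq "$c_net" ]
}

# Usage against a live cluster:
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.podCIDR}{"\n"}{end}' |
#   while read -r node cidr; do
#     cidr_contains 10.244.0.0/16 "$cidr" || echo "$node: $cidr is outside podSubnets"
#   done
```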

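Clean state between CNIs:

# If you MUST migrate on regular Linux nodes, clean everything (as root, on every node):
# Warning: flushing iptables also removes host firewall and kube-proxy rules
rm -rf /var/lib/cni/
rm -rf /var/run/netns/
iptables -F
iptables -t nat -F
iptables -t mangle -F

Talos Note: On Talos (immutable OS), don't try to modify /var/lib/cni or iptables directly; reboot with talosctl reboot to clear stale CNI state.

Have a rollback plan: an etcd backup taken before the change, the previous CNI's manifests, and a recovery procedure you have actually tested.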

Critical configuration for Cilium:

# In Talos config (MUST have):
machine:
  features:
    hostDNS:
      forwardKubeDNSToHost: false  # Critical!
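Because this setting is easy to lose between rebuilds, I'd keep it in a patch file and feed it to talosctl with its @file syntax. A sketch; the file name is my own choice, and the node IPs in the comments are this series' examples.

```shell
# Write the Talos patch once and reuse it (hypothetical file name).
cat > cilium-hostdns-patch.yaml <<'EOF'
machine:
  features:
    hostDNS:
      forwardKubeDNSToHost: false
EOF

# New clusters: bake it in at generation time, e.g.
#   talosctl gen config <cluster-name> https://<endpoint>:6443 --config-patch @cilium-hostdns-patch.yaml
# Existing nodes (comma-separated list; adjust to your node IPs):
#   talosctl patch machineconfig -n 192.168.0.11,192.168.0.12 --patch @cilium-hostdns-patch.yaml
```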

Performance Tuning

After installation, optimize Cilium for your specific needs:

Monitor eBPF Performance

# Check BPF map pressure (agent metrics, inside agent pod)
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg metrics list | grep bpf_map_pressure

# Each line is one map (label map_name, e.g. cilium_ct4_global for connection
# tracking); the value is the fill ratio, so 0.15 means 15% full.
# Some maps (such as policy maps) only report once they pass 10%.

# If any map shows >0.8 (80%), increase its size (see below)

Tune Connection Tracking

For high-traffic environments, increase map sizes:

# Set map sizes through Helm values so they stay part of your tracked config
# CT = Connection Tracking, NAT = Network Address Translation
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bpf.ctTcpMax=1048576 \
  --set bpf.ctAnyMax=524288 \
  --set bpf.natMax=1048576

# What these do (upstream static defaults in parentheses):
# bpf.ctTcpMax: Max tracked TCP connections (524288)
# bpf.ctAnyMax: Max tracked non-TCP flows, e.g. UDP/ICMP (262144)
# bpf.natMax:   Max NAT translations (524288)
# Check the keys for your chart version: helm show values cilium/cilium | grep -iE 'ctTcpMax|ctAnyMax|natMax'

# Restart Cilium to apply
kubectl -n kube-system rollout restart daemonset/cilium

Enable Bandwidth Manager

For QoS (Quality of Service) and traffic shaping:
(Requires Linux kernel 5.1+ for the bandwidth manager and 5.18+ for BBR; recent Talos releases ship newer kernels)

# Add to cilium-values.yaml:
bandwidthManager:
  enabled: true
  # bbr: false     # Include only if 'bbr' key exists in your chart version; enable BBR via sysctl below

# Or upgrade with:
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bandwidthManager.enabled=true

# Optional: Enable BBR congestion control on each node (persist via your OS/Talos mechanism)
# sysctl -w net.core.default_qdisc=fq
# sysctl -w net.ipv4.tcp_congestion_control=bbr

# Now you can cap pod egress with the standard Kubernetes annotation
# (Cilium's bandwidth manager enforces egress-bandwidth; check your version's docs before relying on ingress-bandwidth):
kubectl annotate pod my-pod \
  kubernetes.io/egress-bandwidth=10M
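In practice I'd rather set the limit in the manifest so it's there from the pod's first packet and survives re-creation (in a Deployment it goes on the pod template). A sketch of a hypothetical test pod capped at 10 Mbit/s egress:

```shell
# Write a pod manifest with the egress cap in its metadata (hypothetical names).
cat > bandwidth-limited-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: bandwidth-limited
  annotations:
    kubernetes.io/egress-bandwidth: "10M"   # enforced by Cilium's bandwidth manager
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
EOF

# kubectl apply -f bandwidth-limited-pod.yaml
```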

Performance Monitoring Commands

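# Check Cilium agent health, including connectivity to all nodes (inside agent pod)
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg status --all-health

# Datapath counters: forwarded/dropped packets and bytes by reason and direction (inside agent pod)
kubectl -n kube-system exec ds/cilium -c cilium-agent -- \
  cilium-dbg bpf metrics list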

# Check packet drops (agent CLI - run inside pod)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg monitor --type drop

# Or from your workstation using Hubble (preferred):
hubble observe --server localhost:4245 --verdict DROPPED

# Measure endpoint creation time
time kubectl run test --image=nginx --rm -it --restart=Never -- echo test

Troubleshooting Guide

Here are common issues and their solutions:

Pods Stuck in ContainerCreating

This usually means Cilium can't assign an IP:

# 1. Check Cilium agent logs for errors
# Note: Use cilium-dbg for debugging, cilium-cli for management
# (k8s-app=cilium is present on agent pods across chart versions)
kubectl -n kube-system logs -l k8s-app=cilium -c cilium-agent --tail=50

# For more detailed debugging, use cilium-dbg from within a Cilium pod:
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg status

# Look for errors like:
# "Unable to allocate IP" - IP pool exhausted
# "Failed to create endpoint" - eBPF program issue

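# 2. Verify IP allocation for the node
kubectl get ciliumnodes <node-name> -o jsonpath='{.spec.ipam.podCIDRs}'

# Should print the node's pod CIDR, e.g. ["10.244.1.0/24"]
# For usage counts, check the IPAM line of the agent
# (ds/cilium picks an arbitrary agent; exec into the node's own cilium pod for its numbers):
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg status | grep IPAM
# e.g. IPAM:   IPv4: 5/254 allocated from 10.244.1.0/24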

# 3. Check for IP conflicts (inside agent pod)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg ip list

# 4. Describe the stuck pod for more details
kubectl describe pod <stuck-pod-name>

# Events section will show CNI errors

# 5. Check host firewall rules (if using ufw/firewalld/nftables)
# Some distros block ARP or NodePort traffic by default
# For quick triage, temporarily disable host firewall and retest:
# systemctl stop firewalld  # or: ufw disable

Load Balancer IP Not Responding

When services get IPs but don't respond:

# 1. Verify L2 announcement policy exists
kubectl get ciliuml2announcementpolicies

# If missing, reapply from earlier in this guide

# 2. Check which node is announcing the IP (L2 announcements use Kubernetes leases)
kubectl -n kube-system get leases | grep cilium-l2announce

# Should see a lease like cilium-l2announce-default-test-lb; its HOLDER is the announcing node

# 3. Verify ARP on your router/workstation
arp -a | grep 192.168.0.201

# Should show MAC address
# If not, interface name might be wrong

# 4. Check the correct interface is being used
kubectl get ciliuml2announcementpolicies -o yaml | grep interface

# Entries are regular expressions and must match the real interface name:
#   regular Linux: ip link show
#   Talos:         talosctl -n <node-ip> get links
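Because those entries are regexes, it's worth testing a pattern against your real interface names before blaming ARP. A sketch with a hypothetical helper; Cilium evaluates them as Go regular expressions, and grep -E is only an approximation that behaves the same for simple patterns like ^eth[0-9]+.

```shell
# matching_ifaces REGEX
# Reads interface names (one per line) on stdin and prints those the regex matches.
matching_ifaces() {
  grep -E -- "$1"
}

# Usage on a regular Linux node (cut strips "@peer" suffixes from veth names):
# ip -o link show | awk -F': ' '{print $2}' | cut -d@ -f1 | matching_ifaces '^eth[0-9]+'
```

If nothing is printed, the policy selects no interface on that node and no ARP replies will be sent.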

Network Policies Not Working

Policies exist but traffic still flows:

# 1. Check enforcement mode (cilium-cli, from your workstation)
cilium config view | grep -i enable-policy

# Should show: enable-policy   default

# 2. Verify policies are loaded (inside agent pod)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg policy get

# 3. Check identity allocation (inside agent pod)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg identity list

# Pods need identities for policies to work
# If missing, restart Cilium

# 4. Monitor dropped packets (agent CLI - run inside pod)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg monitor --type drop

# Or from workstation using Hubble (preferred):
hubble observe --server localhost:4245 --verdict DROPPED

# Shows real-time policy violations
# If nothing shown, policy might not be applying

# 5. Test specific pod-to-pod connectivity manually
kubectl exec -n <namespace> <src-pod> -- curl -sS http://<dst-pod-ip>:<port>

# 6. Filter Hubble traffic by labels (helpful for debugging specific apps)
hubble observe --server localhost:4245 --from-label app=web --to-label app=database

DNS Resolution Failures

Pods can't resolve service names:

# 1. Check CoreDNS is running (Talos labels it k8s-app=kube-dns)
kubectl get pods -n kube-system -l k8s-app=kube-dns
# If no pods are found, try the alternative label app.kubernetes.io/name=coredns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=100

# 2. Test DNS from a pod
kubectl run test-dns --image=busybox --rm -it --restart=Never -- nslookup kubernetes

# 3. Check if it's a policy issue
kubectl get networkpolicy -n <namespace>

# Ensure DNS egress is allowed (we did this earlier)

# 4. Check Cilium's DNS proxy settings (cilium-cli, from your workstation)
cilium config view | grep -i dns

High Memory/CPU Usage

Cilium using too many resources:

# 1. Check current usage (requires metrics-server; if not installed, skip this)
# To install metrics-server if missing:
# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl top pods -n kube-system | grep cilium

# 2. Review eBPF map sizes (inside agent pod)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg bpf metrics

# Large maps = more memory

# 3. Reduce map sizes if needed
# Method A: Helm values (tracked with the rest of your Cilium config)
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bpf.ctTcpMax=262144   # Reduce from the 524288 default
# Then restart: kubectl -n kube-system rollout restart daemonset/cilium

# Method B: ConfigMap (quick, but mirror the change in your Helm values or a later upgrade may revert it)
kubectl -n kube-system edit configmap cilium-config
# Add or modify:
# bpf-ct-global-tcp-max: "262144"
# bpf-ct-global-any-max: "131072"
# bpf-nat-global-max:    "262144"
# Then restart: kubectl -n kube-system rollout restart daemonset/cilium

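# 4. Disable unused features
# Edit cilium-values.yaml, then apply it in place (a reinstall would drop pod networking):
# helm upgrade cilium cilium/cilium -n kube-system -f cilium-values.yaml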

Security Best Practices

Let's harden your Cilium installation for production:

Enable WireGuard Encryption

WireGuard encryption protects node-to-node pod traffic:

Understanding the Impact

Performance cost: ~15% throughput reduction
CPU usage: Increases by 10-20%
Security benefit: Node-to-node pod traffic encrypted automatically
No app changes: Completely transparent to applications
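Note: This encrypts pod traffic that crosses nodes; nodeEncryption: true additionally encrypts node-to-node (host) traffic. Pod-to-pod traffic on the same node never leaves the host, so it isn't encrypted.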

Enable Encryption

# Upgrade Cilium with encryption enabled
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard \
  --set encryption.nodeEncryption=true

# Monitor the rollout
kubectl -n kube-system rollout status daemonset/cilium

# Verify encryption is active (run inside agent pod)
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg encrypt status

# Expected output (WireGuard, 7-node cluster):
# Encryption: Wireguard
# Interface: cilium_wg0
#   Public key: <this node's public key>
#   Number of peers: 6

Test Encryption

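# Capture node traffic to verify encryption (talosctl pcap; adjust eth0 to your NIC name)
# Cilium's WireGuard tunnel uses UDP port 51871
talosctl -n 192.168.0.14 pcap -i eth0 --bpf-filter "udp port 51871" --duration 10s

# You should see WireGuard UDP packets between node IPs; pod IPs (10.244.x.x) should not appear in plain text on eth0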

Implement Advanced Cilium Network Policies

Cilium policies are more powerful than standard Kubernetes policies:

Layer 7 (Application) Policy Example

This policy inspects HTTP traffic content. Note: Cilium enforces L7 rules through its Envoy proxy; since Cilium 1.16 the Helm chart deploys it as a separate cilium-envoy DaemonSet by default, so there is nothing extra to install.

# cilium-layer7-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-access-control
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server         # Apply to API pods
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: web-frontend     # Allow from frontend pods
    toPorts:
    - ports:
      - port: "80"            # Pod (container) port, not the Service port
        protocol: TCP
      rules:
        http:
        - method: "GET"       # Only allow GET requests
          path: "/api/public/.*"  # Only public API paths
        - method: "POST"
          path: "/api/auth/login"  # Allow login
          headerMatches:      # Match specific headers
          - name: "Content-Type"
            value: "application/json"  # Require JSON

Apply and test:

# Apply the policy
kubectl apply -f cilium-layer7-policy.yaml

# Test allowed request (should work)
kubectl exec -n production web-pod -- \
  curl -X GET http://api-server/api/public/health

# Test blocked request (should fail: Cilium's proxy answers with 403 Access denied)
kubectl exec -n production web-pod -- \
  curl -X DELETE http://api-server/api/users/1

# Monitor blocked requests
hubble observe --server localhost:4245 --verdict DROPPED --namespace production
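Since an L7 denial comes back as an HTTP 403 rather than a timeout, checking status codes is a more precise test than "did curl fail". A sketch with a hypothetical helper; the expected 200 assumes your API actually serves /api/public/health.

```shell
# expect_http WANT CMD...
# Runs CMD (which should print only an HTTP status code) and compares it to WANT.
expect_http() {
  want=$1; shift
  got=$("$@")
  if [ "$got" = "$want" ]; then
    echo "OK ($got): $*"
  else
    echo "FAIL (got $got, want $want): $*"
    return 1
  fi
}

# Usage (curl -w prints just the status code):
# expect_http 200 kubectl exec -n production web-pod -- \
#   curl -s -o /dev/null -w '%{http_code}' http://api-server/api/public/health
# expect_http 403 kubectl exec -n production web-pod -- \
#   curl -s -o /dev/null -w '%{http_code}' -X DELETE http://api-server/api/users/1
```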

DNS Security Policy

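Restrict which external domains pods can reach. Cilium's DNS proxy sees every lookup, and toFQDNs only opens egress to IPs returned for the allowed names:

# dns-restriction-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: restrict-external-dns
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      restricted: "true"      # Pods with this label
  egress:
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system  # CoreDNS lives in kube-system, not production
        k8s-app: kube-dns                             # Always allow cluster DNS
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY                                 # DNS over UDP and TCP
      rules:
        dns:
        - matchPattern: "*"   # Route lookups through Cilium's DNS proxy (toFQDNs depends on it)
  - toFQDNs:
    - matchPattern: "*.homelab.example"  # Allow internal
    - matchName: "github.com"            # Allow specific external
    - matchPattern: "*.docker.io"        # Allow Docker Hub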

Performance Benchmarks

Here are the real improvements I measured after switching to Cilium:

Test Methodology

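# Latency test (use busybox for ping since the iperf3 image lacks it)
kubectl run ping-server --image=busybox:1.36 --image-pull-policy=IfNotPresent -- sleep 3600
kubectl run ping-client --image=busybox:1.36 --image-pull-policy=IfNotPresent -- sleep 3600
kubectl wait --for=condition=Ready pod/ping-server pod/ping-client --timeout=60s
SERVER_IP=$(kubectl get pod ping-server -o jsonpath='{.status.podIP}')
kubectl exec ping-client -- ping -c 1000 "$SERVER_IP"

# Throughput test (use iperf3 for bandwidth)
kubectl run perf-server --image=networkstatic/iperf3 --image-pull-policy=IfNotPresent -- -s
# --command replaces the image's iperf3 entrypoint so the client just idles
kubectl run perf-client --image=networkstatic/iperf3 --image-pull-policy=IfNotPresent --command -- sleep 3600
kubectl wait --for=condition=Ready pod/perf-server pod/perf-client --timeout=60s
kubectl exec perf-client -- iperf3 -c "$(kubectl get pod perf-server -o jsonpath='{.status.podIP}')" -t 60

# Load balancing test
kubectl run -i --rm --restart=Never perf-test --image=rakyll/hey -- \
  -n 100000 -c 50 http://<service-ip>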
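A thousand pings produce a lot of output, but only the summary line matters for the latency rows. A small sketch (hypothetical helper name) that pulls out the average RTT from either BusyBox or iputils ping:

```shell
# avg_rtt: read ping output on stdin, print the average round-trip time in ms.
# Handles BusyBox ("round-trip min/avg/max = 0.1/0.2/0.3 ms")
# and iputils ("rtt min/avg/max/mdev = 0.1/0.2/0.3/0.0 ms") summaries.
avg_rtt() {
  sed -n '/min\/avg\/max/s/.*= *\([0-9.]*\)\/\([0-9.]*\)\/.*/\2/p'
}

# Usage:
# kubectl exec ping-client -- ping -c 1000 <server-pod-ip> | avg_rtt
```

For the same-node vs. cross-node rows, schedule the two pods accordingly (for example with a nodeName override) and run it once per placement.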

Results Comparison

Metric	Flannel + MetalLB	Cilium Only	Improvement
Pod-to-Pod Latency (same node)	0.28ms	0.19ms	32% lower
Pod-to-Pod Latency (cross-node)	0.52ms	0.41ms	21% lower
Throughput (10Gb network)	9.41 Gbps	9.87 Gbps	5% higher
Service Load Balancing	45,000 rps	52,000 rps	15% higher
P99 Latency @ 10k rps	12ms	8ms	33% lower
CPU per node (idle)	~400m	~250m	37% less
Memory per node	~320MB	~380MB	60MB more
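The Improvement column is the plain relative change between the two measurements (a reduction for latency and CPU). To recompute it from your own runs, a one-line awk helper (hypothetical name) is enough:

```shell
# pct_change OLD NEW -> relative change in percent, one decimal (negative = lower)
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# pct_change 0.28 0.19    -> -32.1   (same-node latency)
# pct_change 45000 52000  -> 15.6    (service requests per second)
```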

Why Cilium Performs Better

eBPF in the kernel datapath: Packets are handled at early tc hooks with hash-map lookups instead of netfilter chains
No iptables overhead: No linear traversal of ever-growing rule lists as services are added
Efficient load balancing: Optional Maglev consistent hashing (loadBalancer.algorithm=maglev; the default is random)
Native integration: Single component instead of CNI + LB combo

Resource Usage Over Time

# After 30 days running production workloads:
# Flannel + MetalLB: 68 iptables rules per node, growing
# Cilium: 0 iptables rules, all eBPF

# Connection tracking entries:
# Flannel: ~15,000 conntrack entries
# Cilium: Handled in eBPF maps, no conntrack pressure

What's Next

Your cluster now has enterprise-grade networking! Nodes are Ready, pods can communicate, services get external IPs, and network policies provide security. In Part 4, we'll add persistent storage with Rook-Ceph, enabling your applications to store data that survives pod restarts and node failures.

Key Takeaways

Through implementing Cilium and surviving a painful migration, here's what I learned:

eBPF is a game-changer: 32% lower same-node latency and 37% less idle CPU are significant
Native L2 load balancing works perfectly: No need for MetalLB complexity
forwardKubeDNSToHost: false is CRITICAL: This single setting caused 6 hours of downtime
CNI migration is risky: Prefer fresh installations; if you must migrate, follow the official guide
Hubble is invaluable: Visual network debugging saves hours
Network policies from day one: Much easier than retrofitting security
Interface names matter: Wrong interface = no load balancing

Troubleshooting Checklist

Save this for when things go wrong:

[ ] Nodes NotReady? → Check Cilium pods are running
[ ] Pods stuck Creating? → Check IP allocation with kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg ip list
    After a migration on regular Linux: sudo rm -rf /var/lib/cni/* /var/run/netns/*
    After a migration on Talos (immutable): reboot nodes with talosctl reboot to clear CNI state
[ ] LoadBalancer pending? → Verify L2 announcement policy and interface
[ ] DNS not working? → Ensure DNS egress policy exists
[ ] Policies not blocking? → Check with hubble observe --server localhost:4245 --verdict DROPPED
[ ] High CPU usage? → Review eBPF map sizes
[ ] Can't see traffic? → Use Hubble UI or CLI

For complex issues, collect a support bundle:

# Collect a support bundle (logs, events, config, BPF maps)
cilium sysdump --output-filename cilium-sysdump-$(date +%Y%m%d-%H%M%S)   # .zip is appended automatically

Command Quick Reference

# Health checks
cilium status --wait
cilium connectivity test
kubectl get nodes

# Quick verbose status (surfaces datapath issues fast)
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg status --verbose

# Troubleshooting
# If exec ds/cilium fails, pick a pod explicitly and use "$POD" in place of ds/cilium:
# POD=$(kubectl -n kube-system get pods -l k8s-app=cilium -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg monitor --type drop  # Watch drops
hubble observe --server localhost:4245 --verdict DROPPED    # See policy violations
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg bpf metrics  # Check eBPF performance

# Load balancer
kubectl get ciliumloadbalancerippools
kubectl get ciliuml2announcementpolicies

# Network policies
kubectl get networkpolicies -A
kubectl get ciliumnetworkpolicies -A

# Rollback/Uninstall (if needed)
helm -n kube-system history cilium      # Check revision history
helm -n kube-system rollback cilium <REVISION>  # Rollback to specific revision
# helm -n kube-system uninstall cilium   # Complete uninstall
# Note: After uninstall, nodes return to NotReady (no CNI) until reinstalling

References

Official Documentation

Cilium Documentation v1.18 - Complete Cilium guide
eBPF Introduction - Understanding eBPF technology
Cilium Network Policies - Advanced policy examples
Hubble Observability - Network visibility guide

Specific Issues and Solutions

L2 Announcements Guide - Load balancer setup
Talos + Cilium Integration - Official integration guide
Cilium Troubleshooting - Common problems and fixes

Performance and Tuning

eBPF Map Sizing - Tuning for scale
Maglev Load Balancing - Google's algorithm paper

Continue to Part 4: Distributed Storage with Rook-Ceph →