K8S Production Checklist

Components Security
  • Run the API server with --authorization-mode=Node,RBAC
  • Ensure all services are protected by TLS
  • Ensure the kubelet protects its API with --authorization-mode=Webhook
  • Ensure the kube-dashboard is v1.7+ and uses a restrictive RBAC role policy
  • Closely monitor all RBAC policy failures
  • Remove default ServiceAccount permissions
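
The authorization-mode items above can be spot-checked with a short script. This is a minimal sketch, assuming the component command lines have already been collected as argument lists (for example from a static pod manifest or a process listing); the helper names flag_value and audit_authorization_mode and the sample arguments are hypothetical, not part of any Kubernetes tooling.

```python
def flag_value(args, name):
    """Return the value of --name=value or --name value, or None if the flag is absent."""
    prefix = f"--{name}="
    for i, arg in enumerate(args):
        if arg.startswith(prefix):
            return arg[len(prefix):]
        if arg == f"--{name}" and i + 1 < len(args):
            return args[i + 1]
    return None


def audit_authorization_mode(args, required):
    """List problems with the --authorization-mode flag of one component."""
    value = flag_value(args, "authorization-mode") or ""
    enabled = [mode.strip() for mode in value.split(",") if mode.strip()]
    problems = [f"missing mode: {mode}" for mode in required if mode not in enabled]
    if "AlwaysAllow" in enabled:
        problems.append("AlwaysAllow is enabled")
    return problems


# Hypothetical sample command lines, for illustration only.
apiserver_args = ["kube-apiserver", "--authorization-mode=Node,RBAC"]
kubelet_args = ["kubelet", "--authorization-mode", "AlwaysAllow"]

apiserver_problems = audit_authorization_mode(apiserver_args, ["Node", "RBAC"])
kubelet_problems = audit_authorization_mode(kubelet_args, ["Webhook"])

print("kube-apiserver:", apiserver_problems or "ok")
print("kubelet:", kubelet_problems or "ok")
```

The check only reads command-line flags; a kubelet configured through a config file would need the equivalent field inspected there instead.
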
Network Security
  • Filter access to the cloud provider metadata APIs/URLs, and limit IAM permissions
  • Use a CNI network plugin that filters ingress/egress pod network traffic
    • Properly label all pods
    • Isolate all workloads from each other
    • Prevent workloads from egressing to the Internet, the Pod IP space, the Node IP subnets, and/or other internal networks
    • Establish network ingress policies
    • Restrict all traffic coming into the kube-system namespace, except traffic to kube-dns
  • Consider a Service Mesh
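
The isolation and egress items above map onto NetworkPolicy objects. The following is a rough sketch that builds two such manifests as Python dicts and prints them as JSON, which kubectl apply -f accepts. The namespace name tenant-a, the policy names, and the function names are hypothetical. The DNS rule assumes the cluster DNS pods carry the label k8s-app=kube-dns and that the kube-system namespace has the kubernetes.io/metadata.name label; verify both on your cluster. The policies only take effect if the CNI plugin enforces NetworkPolicy.

```python
import json


def default_deny(namespace):
    """NetworkPolicy that selects every pod in the namespace and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            # Both directions are listed with no rules, so both are denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_dns_egress(namespace):
    """NetworkPolicy that re-opens DNS lookups to kube-dns for every pod."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            # Assumed labels; check them on the target cluster.
                            "namespaceSelector": {
                                "matchLabels": {"kubernetes.io/metadata.name": "kube-system"}
                            },
                            "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                        }
                    ],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                }
            ],
        },
    }


# "tenant-a" is a placeholder namespace.
policies = [default_deny("tenant-a"), allow_dns_egress("tenant-a")]
for policy in policies:
    print(json.dumps(policy, indent=2))
```

Starting from deny-all and adding narrow allow rules keeps each exception visible as its own object, at the cost of more policies to maintain per namespace.
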
Workload Containment and Security
  • Use a separate namespace per tenant
  • Apply a default "deny" inbound network policy to all namespaces
  • Assign CPU/RAM limits to all containers
  • Set automountServiceAccountToken: false on pods where possible
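  • Use a PodSecurityPolicy to enforce container restrictions and to protect the node (PodSecurityPolicy was removed in Kubernetes v1.25; on newer clusters use Pod Security Admission or a policy engine)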
  • Implement container-aware malicious activity / behavioral detection
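
Two of the items above, CPU/RAM limits and automountServiceAccountToken: false, can be checked mechanically against a Pod manifest. This is a minimal sketch, assuming the manifest has already been loaded into a Python dict; the function name audit_pod and the sample pods are hypothetical. It inspects only the pod spec, so a token mount disabled on the ServiceAccount instead would still be reported here.

```python
def audit_pod(pod):
    """Return checklist findings for a Pod manifest given as a dict."""
    spec = pod.get("spec", {})
    findings = []

    # The field must be explicitly false; when it is absent the token is mounted.
    if spec.get("automountServiceAccountToken") is not False:
        findings.append("automountServiceAccountToken is not set to false")

    for container in spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(
                    f"container {container.get('name')!r} has no {resource} limit"
                )
    return findings


# Hypothetical sample manifests, for illustration only.
hardened_pod = {
    "spec": {
        "automountServiceAccountToken": False,
        "containers": [
            {
                "name": "web",
                "image": "example.invalid/web:1.0",
                "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
            }
        ],
    }
}
loose_pod = {
    "spec": {
        "containers": [
            {
                "name": "worker",
                "image": "example.invalid/worker:1.0",
                "resources": {"limits": {"cpu": "1"}},
            }
        ]
    }
}

print(audit_pod(hardened_pod))
print(audit_pod(loose_pod))
```

A check like this could run in a CI step before manifests are applied; enforcing the same rules inside the cluster is the job of an admission-time policy.
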
Platform Reliability and Availability Plan
  • The plan should cover:
    • Logging
    • Monitoring
    • Alerting
    • Capacity Planning
    • Auditing
  • Each of these points needs to be addressed in two contexts:
    • The overall platform
    • Applications running on the platform
  • For applications, the following capabilities need to be made available:
    • Releasing new apps and re-staging existing ones
    • Access to running apps
    • Synthetic monitoring
Misc Security
  • Collect logs from all containers, especially the RBAC access/deny logs
  • Encrypt the contents of etcd, and run etcd on dedicated nodes
  • Separate cloud accounts/VPCs/projects/resource groups
  • Separate clusters for dev/test and production environments
  • Separate node pools for different tenants
  • Establish an onboarding process
  • Provide user documentation (Onboarding, Operating, Container image pipeline, Finding images to use)
  • Determine who the K8S cluster admins are
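
For the etcd encryption item above, the API server reads an EncryptionConfiguration file passed via --encryption-provider-config. The sketch below generates one for Secrets with a random 32-byte AES-CBC key and prints it as JSON (JSON is valid YAML). The function name encryption_config and the key name key1 are hypothetical, and the choice of the aescbc provider is an assumption for illustration; a KMS provider may be a better fit where one is available.

```python
import base64
import json
import secrets


def encryption_config(key_name="key1"):
    """Build an EncryptionConfiguration that encrypts Secrets at rest."""
    # aescbc expects a base64-encoded 32-byte key.
    key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": ["secrets"],
                "providers": [
                    # The first provider is used for writes.
                    {"aescbc": {"keys": [{"name": key_name, "secret": key}]}},
                    # identity lets the API server still read data written
                    # before encryption was enabled.
                    {"identity": {}},
                ],
            }
        ],
    }


config = encryption_config()
# The output contains the key itself: write it to a protected file, not to logs.
print(json.dumps(config, indent=2))
```

Objects stored before the configuration was enabled stay unencrypted until they are rewritten, so existing Secrets need to be updated once after the API server restarts.
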