What to do when the certificates have gone bad and the cluster has turned into a pumpkin?

If the kubectl get pod command answers with:

 Unable to connect to the server: x509: certificate has expired or is not yet valid 

then, most likely, a year has passed, your Kubernetes certificates have expired, the cluster components can no longer use them to talk to each other, and your cluster has turned into a pumpkin.

What can be done, and how do you restore the cluster?

First, we need to understand where the certificates that need to be updated are located.

Depending on how the cluster was installed, the location and names of the certificate files may vary. For example, when creating a cluster, kubeadm lays out the certificate files according to best practices. All certificates are located in /etc/kubernetes/pki in files with the .crt extension, and the private keys are in the corresponding .key files. In addition, /etc/kubernetes/ contains .conf files with access configuration for the user accounts of the administrator, the controller manager, the scheduler and the master node's kubelet. Certificates in the .conf files are stored in the user.client-certificate-data field in base64-encoded form.

You can see the expiration date, to whom a certificate was issued and who signed it with this small shcert script:

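shcert
 #!/bin/bash
 # Print the contents of a certificate stored in a .crt/.pem file
 # or embedded in a kubeconfig (.conf) file.
 [ -f "$1" ] || exit

 if [[ $1 =~ \.(crt|pem)$ ]]; then
   openssl x509 -in "$1" -text -noout
 fi

 if [[ $1 =~ \.conf$ ]]; then
   # Decode the base64 client certificate into a temporary file.
   certfile=$(mktemp)
   grep 'client-certificate-data:' "$1" | awk '{ print $2 }' | base64 -d > "$certfile"
   openssl x509 -in "$certfile" -text -noout
   rm -f "$certfile"
 fi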
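To survey a whole directory at once instead of opening the files one by one, a loop like the following may help. It is only a sketch: the function name list_cert_expiry is my own invention, and the default path assumes the kubeadm layout described above.

```shell
#!/bin/bash
# Sketch: print the expiry date of every certificate file in a directory tree
# and flag the ones that have already expired.
# The default directory assumes a kubeadm layout; pass another path otherwise.

list_cert_expiry() {
    local dir="${1:-/etc/kubernetes/pki}"
    local cert enddate status
    find "$dir" -type f \( -name '*.crt' -o -name '*.pem' \) 2>/dev/null | sort |
    while IFS= read -r cert; do
        # Skip files that are not parseable X.509 certificates (e.g. keys).
        enddate=$(openssl x509 -in "$cert" -noout -enddate 2>/dev/null) || continue
        # -checkend 0 exits non-zero if the certificate has already expired.
        if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
            status="ok"
        else
            status="EXPIRED"
        fi
        printf '%-8s %s  %s\n' "$status" "${enddate#notAfter=}" "$cert"
    done
}

list_cert_expiry "$@"
```

Run it without arguments on a master node, or pass the directory where your installer keeps its certificates.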


There are also the certificates that kubelet uses on worker nodes to authenticate to the API. If you added nodes to the cluster with kubeadm join, then most likely the node was connected using the TLS bootstrapping procedure, in which case kubelet can renew its certificate automatically if it is given the --rotate-certificates option. In recent versions of Kubernetes this option is enabled by default.
Checking that a node was connected using TLS bootstrapping is quite simple: in that case the client-certificate field in /etc/kubernetes/kubelet.conf usually points to the file /var/lib/kubelet/pki/kubelet-client-current.pem, which is a symlink to the current certificate.
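The check can be scripted roughly like this. The function name kubelet_cert_source and its output labels are made up for the example; the only thing it relies on is the difference between the client-certificate (a path) and client-certificate-data (embedded base64) fields of a kubeconfig.

```shell
#!/bin/bash
# Sketch: report how a kubelet kubeconfig references its client certificate.
# A file path (typically the kubelet-client-current.pem symlink) suggests TLS
# bootstrapping; embedded base64 data suggests a static certificate.

kubelet_cert_source() {
    local conf="${1:-/etc/kubernetes/kubelet.conf}"
    local path
    [ -f "$conf" ] || { echo "missing: $conf"; return 1; }
    # "client-certificate:" holds a path, "client-certificate-data:" holds base64.
    path=$(awk '$1 == "client-certificate:" { print $2; exit }' "$conf")
    if [ -n "$path" ]; then
        echo "file: $path"
    elif grep -q 'client-certificate-data:' "$conf"; then
        echo "embedded"
    else
        echo "none"
    fi
}

# Only run against the default location if it exists on this machine.
if [ -f /etc/kubernetes/kubelet.conf ]; then
    kubelet_cert_source
fi
```

If the answer is "file:", running ls -l on that path shows which dated certificate the symlink currently points to.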

You can check the expiration date of this certificate with the shcert script as well.

Let's return to the problem of renewing certificates.

If you installed the cluster using kubeadm, then I have good news for you. Starting with version 1.15, kubeadm can renew almost all control plane certificates with one command:

 kubeadm alpha certs renew all 

This command will renew all certificates in the /etc/kubernetes directory, even if they have already expired and everything has broken.
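Since the command rewrites files in place, it seems prudent to copy the directory aside first. The sketch below is an assumption of mine, not part of kubeadm: the function names and the backup location under /root are arbitrary. The renew command itself is the 1.15 form used in this article; as far as I know, later kubeadm releases moved it out of alpha to kubeadm certs renew all.

```shell
#!/bin/bash
# Sketch: copy /etc/kubernetes aside before renewing, so the old certificates
# and kubeconfigs can be restored if something goes wrong.
# The backup location is an arbitrary choice for this example.

backup_dir() {
    local src="$1" dest_root="${2:-/root}"
    local dest
    dest="$dest_root/$(basename "$src")-backup-$(date +%Y%m%d-%H%M%S)"
    # -a preserves ownership, permissions and symlinks.
    cp -a "$src" "$dest" && echo "$dest"
}

renew_control_plane_certs() {
    backup_dir /etc/kubernetes || return 1
    # Command name as of kubeadm 1.15, as used in this article.
    kubeadm alpha certs renew all
}
```

Nothing runs until you call renew_control_plane_certs on the master node yourself.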

Only the kubelet certificate will not be renewed: the one stored in the /etc/kubernetes/kubelet.conf file!
To renew this certificate, use the command that creates a user account:

 kubeadm alpha kubeconfig user --client-name system:node:kube.slurm.io --org system:nodes > /etc/kubernetes/kubelet.conf 

If the user account already exists, this command renews the certificate for it. Do not forget to specify the correct host name in the --client-name option; you can look up the host name in the Subject field of the existing certificate:

 shcert /etc/kubernetes/kubelet.conf 
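To avoid copying the name by hand, the CN can be pulled out of the old kubeconfig. This is a sketch under the assumption that the file contains a single user with an embedded client-certificate-data field; the function name kubeconfig_client_cn is hypothetical.

```shell
#!/bin/bash
# Sketch: print the CN of the client certificate embedded in a kubeconfig.
# Assumes one user entry with client-certificate-data, as in kubelet.conf
# on a master node.

kubeconfig_client_cn() {
    local conf="${1:-/etc/kubernetes/kubelet.conf}"
    grep 'client-certificate-data:' "$conf" | head -n 1 | awk '{ print $2 }' \
        | base64 -d \
        | openssl x509 -noout -subject -nameopt RFC2253 \
        | sed -n 's/.*CN=\([^,]*\).*/\1/p'
}
```

Save the result in a variable before regenerating the file, for example name=$(kubeconfig_client_cn), and then pass "$name" to --client-name, so that the old file is read before the redirect overwrites it.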

And of course, after renewing the certificates you need to restart all control plane components, either by rebooting the whole node or by stopping the etcd, api, controller-manager and scheduler containers with docker stop and then restarting kubelet with systemctl restart kubelet.
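The container variant might look like the sketch below. It assumes Docker as the runtime and the k8s_<container>_<pod>_... names that kubelet gives its containers; check the output of docker ps on your node before relying on the pattern. Both function names are made up for the example.

```shell
#!/bin/bash
# Sketch: stop the control plane containers so that kubelet recreates them
# with the renewed certificates, then restart kubelet.
# Assumes Docker and kubelet-style container names (k8s_<container>_<pod>_...).

# Read "ID NAME" lines on stdin, print the IDs of control plane containers.
select_control_plane_ids() {
    awk '$2 ~ /^k8s_(etcd|kube-apiserver|kube-controller-manager|kube-scheduler)_/ { print $1 }'
}

restart_control_plane() {
    local ids
    ids=$(docker ps --format '{{.ID}} {{.Names}}' | select_control_plane_ids)
    # $ids is intentionally unquoted: one argument per container ID.
    [ -n "$ids" ] && docker stop $ids
    systemctl restart kubelet
}
```

The filter is kept separate from the docker calls so that it can be tried on saved docker ps output before anything is stopped.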

If your cluster runs an old version, 1.13 or earlier, you cannot simply upgrade kubeadm to 1.15: the package pulls in the kubelet and kubernetes-cni dependencies, which can cause problems, because cluster components whose versions differ by more than one release are not guaranteed to work together. The easiest way out of this situation is to install kubeadm on some other machine, take the /usr/bin/kubeadm binary, copy it to the master nodes of the deceased cluster and use it only to renew the certificates. Once the cluster has been revived, upgrade it step by step in the regular way, installing kubeadm one version newer each time.
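The step-by-step path is simple arithmetic, but it is easy to skip a release by accident. A tiny helper (hypothetical name upgrade_path, versions given without a leading "v") can list the stops:

```shell
#!/bin/bash
# Sketch: list the minor releases to step through when upgrading one minor
# version at a time. Arguments are versions such as 1.13 or 1.13.5.

upgrade_path() {
    local major="${1%%.*}" from="${1#*.}" to="${2#*.}"
    from="${from%%.*}"   # keep only the minor number of the current version
    to="${to%%.*}"       # keep only the minor number of the target version
    local m=$((from + 1))
    while [ "$m" -le "$to" ]; do
        echo "$major.$m"
        m=$((m + 1))
    done
}

upgrade_path 1.13 1.15   # prints 1.14 and 1.15, one per line
```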

And finally, starting with version 1.15, kubeadm renews all certificates when the cluster is upgraded with the kubeadm upgrade command. So if you upgrade your cluster regularly, at least once a year, your certificates will always be valid.

But if the cluster was not installed with kubeadm, you will have to pick up openssl and renew all the certificates one by one.

The problem is that the certificates contain extension fields, and different cluster installation tools may add their own set of fields. Moreover, the names of these fields in the openssl configuration and in the printed certificate contents correspond only loosely. You will have to search and experiment.

Here is an example openssl configuration whose separate sections describe the extended attributes specific to each type of certificate. We will refer to the corresponding section when creating and signing the CSR. This configuration was used to revive a cluster installed a year earlier with Rancher.

openssl.cnf
 [req]
 distinguished_name = req_distinguished_name
 req_extensions = v3_req

 [v3_req]
 keyUsage = nonRepudiation, digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth

 [client]
 keyUsage = critical,digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth

 [apiproxyclient]
 keyUsage = critical,digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth, serverAuth

 [etcd]
 keyUsage = critical,digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth, serverAuth
 subjectAltName = @alt_names

 [api]
 keyUsage = critical,digitalSignature, keyEncipherment
 extendedKeyUsage = clientAuth, serverAuth
 subjectAltName = @alt_names

 [alt_names]
 DNS.1 = ec2-us-east-1-1a-c1-master-2
 DNS.2 = ec2-us-east-1-1a-c1-master-3
 DNS.3 = ec2-us-east-1-1a-c1-master-1
 DNS.4 = localhost
 DNS.5 = kubernetes
 DNS.6 = kubernetes.default
 DNS.7 = kubernetes.default.svc
 DNS.8 = kubernetes.default.svc.cluster.local
 IP.1 = 10.0.0.109
 IP.2 = 10.0.0.159
 IP.3 = 10.0.0.236
 IP.4 = 127.0.0.1
 IP.5 = 10.43.0.1


The actual attributes and alternative names in a certificate can be viewed with the command:

 openssl x509 -in cert.crt -text 
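The alternative names are printed on one comma-separated line, while the configuration file wants numbered entries. A small filter can do the conversion; this is a sketch with a made-up function name, and it handles only the DNS and IP Address entry types seen in the example above.

```shell
#!/bin/bash
# Sketch: turn the Subject Alternative Name line printed by
# `openssl x509 -text` into entries for an [alt_names] section.
# Only DNS and IP Address entries are handled; other types are ignored.

san_to_alt_names() {
    # Input: one line such as "DNS:localhost, IP Address:127.0.0.1"
    tr ',' '\n' | awk '
        { sub(/^[ \t]+/, "") }   # trim leading whitespace
        /^DNS:/        { printf "DNS.%d = %s\n", ++d, substr($0, 5) }
        /^IP Address:/ { printf "IP.%d = %s\n", ++i, substr($0, 12) }
    '
}
```

One way to feed it: openssl x509 -in cert.crt -noout -text | grep -A1 'Subject Alternative Name' | tail -n 1 | san_to_alt_names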

When renewing the certificate for the API server I ran into a problem: the renewed certificate did not work. The solution was to issue a certificate whose validity started one year in the past.

With a simple openssl command you cannot issue a certificate that is valid in the past: the code strictly sets the start of validity to the current moment. But you can travel back in time locally using the libfaketime library:

 yum install libfaketime
 LD_PRELOAD=/usr/lib64/faketime/libfaketime.so.1 FAKETIME="-365d" openssl x509 -req ...

We issue the renewed certificates using the following algorithm:

We create a CSR from the existing certificate, specifying the section with the required extended attributes in the configuration file:

 openssl x509 -x509toreq -in "node.cert" -out "node.csr" -signkey "node.key" -extfile "openssl.cnf" -extensions client 

We sign it with the corresponding root certificate, shifting the time back by one year and specifying the section with the required extended attributes in the configuration file:

 LD_PRELOAD=/usr/lib64/faketime/libfaketime.so.1 FAKETIME="-365d" openssl x509 -req -days 36500 -in "node.csr" -CA "kube-ca.pem" -CAkey "kube-ca-key.pem" -CAcreateserial -out "node.new.cert" -extfile "openssl.cnf" -extensions client 

We check the attributes and restart the components of the control plane.
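Before restarting anything, a few mechanical checks on the new file can save a round trip. The sketch below is my own addition, with a hypothetical function name: it confirms that the certificate chains to the CA, that it carries the public key of the existing private key, and then prints the validity window so the shifted start date is visible.

```shell
#!/bin/bash
# Sketch: sanity checks on a freshly signed certificate.
# Arguments: certificate, its private key, the CA certificate.

verify_new_cert() {
    local cert="$1" key="$2" ca="$3"
    local cert_pub key_pub
    # 1. The certificate must chain to the cluster CA.
    #    The output is checked because old openssl versions may exit 0 on failure.
    openssl verify -CAfile "$ca" "$cert" 2>/dev/null | grep -q ': OK$' || return 1
    # 2. The certificate must carry the public key of the existing private key.
    cert_pub=$(openssl x509 -in "$cert" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 1
    [ "$cert_pub" = "$key_pub" ] || return 1
    # 3. Show the validity window for a manual look at notBefore/notAfter.
    openssl x509 -in "$cert" -noout -dates
}
```

With the file names from the steps above: verify_new_cert node.new.cert node.key kube-ca.pem. The extensions themselves still need a look with openssl x509 -text.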

Sergey Bondarev,
Slurm teacher
slurm.io

Source: https://habr.com/ru/post/465733/

