Rebuilding/Adding Worker Nodes
This article shows how to recover from a lost worker node; the same steps can also be used to add a new worker node. It assumes a UPI-based install, but the process should work the same way with IPI. The advantage of an IPI-based install is the use of MachineSets, but that won’t be covered here (yet).
For demonstration purposes, let’s assume that worker4 needs to be replaced.
Replacing/Adding Worker4
To replace or add worker4, we need an ignition file that contains the CA certificate of the OCP cluster.
1. To get the contents of the CA cert, run the following oc command:
oc describe cm root-ca -n kube-system
Copy the cert (starting at the BEGIN CERTIFICATE line and ending at the END CERTIFICATE line) and save it to a file (such as ca.crt).
2. Encode the contents of the ca.crt file as base64.
cat ca.crt | base64 -w0 > ca.crt.base64
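The two steps above can be scripted. The following is a minimal sketch, assuming GNU sed and base64; the function names extract_pem and encode_ca are made up for illustration and are not part of oc.

```shell
#!/usr/bin/env bash
# extract_pem: read text on stdin and print only the PEM certificate
# block(s), with any leading indentation removed.
extract_pem() {
  sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' \
    | sed 's/^[[:space:]]*//'
}

# encode_ca SRC DEST: write SRC to DEST as a single line of base64.
encode_ca() {
  base64 -w0 "$1" > "$2"
}

# Usage against a live cluster (not executed here):
#   oc describe cm root-ca -n kube-system | extract_pem > ca.crt
#   encode_ca ca.crt ca.crt.base64
```

If the configmap holds more than one certificate, extract_pem keeps all of them, so check ca.crt before encoding it.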
To build the ignition file needed to restore/replace this worker, we will use the following template:
{"ignition":{"config":{"merge":[{"source":"https://<apifqdn>:22623/config/worker"}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,<base64encodedcacrt>"}]}},"version":"3.1.0"}}
Replace <apifqdn> with the FQDN of your API server.
Replace <base64encodedcacrt> with the contents of ca.crt.base64.
A sample of this template is located at:
Place this file on a web server that is reachable from the control-plane network.
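Filling in the template by hand is error-prone, so the substitution can be done with a small function. This is a sketch only: render_ignition is a hypothetical helper, api.example.com is a placeholder FQDN, and worker.ign is an assumed output file name.

```shell
#!/usr/bin/env bash
# render_ignition API_FQDN CA_B64_FILE
# Print the worker ignition stub with both placeholders filled in.
render_ignition() {
  local api_fqdn=$1 ca_b64
  # Drop any stray newlines so the data URL stays on one line.
  ca_b64=$(tr -d '\n' < "$2")
  printf '{"ignition":{"config":{"merge":[{"source":"https://%s:22623/config/worker"}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,%s"}]}},"version":"3.1.0"}}\n' \
    "$api_fqdn" "$ca_b64"
}

# Usage (placeholder FQDN, not executed here):
#   render_ignition api.example.com ca.crt.base64 > worker.ign
```

The output is a single line of JSON, so it can be written straight to the file that the web server will serve.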
1. Download the ISO that matches the version of OCP you are running (this cluster is 4.7.13). The ISO images are located at:
My specific ISO is
2. Boot the machine from this ISO. In my case, I am using a virtual machine, but you can also boot a physical machine from it.
3. When the ISO is booted, run the following command:
sudo coreos-installer install \
--ignition-url=https://host/worker.ign /dev/sda
If using an http URL, pass the --insecure-ignition switch to the above command.
4. Once the install is complete, issue a reboot command yourself, since the node will not always reboot automatically.
5. The node will reboot a few times to apply the various machine configs.
6. Lastly, wait for new CSRs from the new worker node to come in. Check for them by issuing the following command:
oc get csr
7. You can loop through and approve each of the requests that come in by issuing the following command:
for i in $(oc get csr | awk 'NR>1 {print $1}'); do oc adm certificate approve "$i"; done
You may need to issue this command a few times as new pending requests come in.
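The approval loop in step 7 acts on every CSR in the list, including ones that are already approved. A narrower variant approves only pending requests. This is a sketch, assuming the CONDITION value is the last column of the `oc get csr` table; pending_csrs and approve_pending are hypothetical helper names.

```shell
#!/usr/bin/env bash
# pending_csrs: read `oc get csr` table output on stdin and print the
# names of requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# approve_pending: make one pass over the current CSRs and approve
# each pending one. Run it again if more requests arrive afterwards.
approve_pending() {
  local name
  for name in $(oc get csr | pending_csrs); do
    oc adm certificate approve "$name"
  done
}

# Usage against a live cluster (not executed here):
#   approve_pending
```

Running `oc get csr | pending_csrs` on its own lists what would be approved without changing anything.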
After approving the CSRs, issue the “oc get node” command. It will show the new worker registered to the cluster. The node may show a “NotReady” status momentarily but will eventually go to a “Ready” status.
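Instead of re-running the command by hand, the status check can be polled. The following is a rough sketch, assuming the node registers under the name worker4 (it may appear under its FQDN in your cluster) and that STATUS is the second column of the `oc get node` table; node_status and wait_for_ready are hypothetical helpers.

```shell
#!/usr/bin/env bash
# node_status NODE: read `oc get node` table output on stdin and print
# the STATUS column for the row whose NAME matches NODE.
node_status() {
  awk -v node="$1" 'NR > 1 && $1 == node { print $2 }'
}

# wait_for_ready NODE [TRIES]: poll every 10 seconds until NODE reports
# "Ready"; give up with a non-zero exit after TRIES attempts (default 30).
wait_for_ready() {
  local node=$1 tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    [ "$(oc get node | node_status "$node")" = "Ready" ] && return 0
    sleep 10
    tries=$((tries - 1))
  done
  return 1
}

# Usage against a live cluster (not executed here):
#   wait_for_ready worker4 && echo "worker4 is Ready"
```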