Fully Self-Contained OCP Cluster

THIS IS A PROOF OF CONCEPT AND IS IN NO WAY A SUPPORTED WAY TO RUN A CLUSTER

Some customers are looking at ways of deploying OCP clusters in a manner in which the cluster is totally self-sufficient, meaning that once the cluster is built, there is no reliance on any resources outside the cluster (not even a registry).  Think of an edge deployment or an isolated environment that has no connection to the outside world (IE: a disconnected environment).

The resources that are typically required by any running cluster are a DHCP server, DNS server, external registry, etc.  

We are looking at this type of deployment for its smaller footprint (one less server) so that it can fit into a small space (IE: a closet).  Six small-form-factor servers can be used for this deployment (3 masters and 3 workers).

Here are some requirements that were given to us based on this deployment:

- No virtualization (bare metal)
- Small form-factor servers
- No external DNS server (look at using a machine-config hosts file; see the sketch after this list)
- No DHCP server (this can be accomplished through the AI Installer with a params file)
- No external registry
- The AI Installer can be used.
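For the DNS requirement, one possible approach (not tested as part of this article) is a MachineConfig that lays down static hosts entries on every node.  The sketch below uses hypothetical IPs, hostnames, and object name, and a simplified hosts file; it follows the same pattern as the storage.conf MachineConfig shown later in this article.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-etc-hosts
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            # URL-encoded plain-text hosts entries (hypothetical IPs/hostnames);
            # a real file would also need the full set of standard localhost entries
            source: data:,127.0.0.1%20localhost%0A192.168.1.5%20api.mycluster.example.com%0A192.168.1.6%20myapp.apps.mycluster.example.com%0A
          mode: 420
          overwrite: true
          path: /etc/hosts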

The idea with this experiment is to create the OCP clusters in an assembly-line fashion.  IP addresses, FQDNs, and cluster names can be exactly the same for each cluster that is deployed.  A registry will be used at the staging site to build the cluster, but there will be no registry once the cluster is moved.

When the OCP clusters are built in this assembly-line fashion, it was found that an additional action needed to be performed to ensure the cluster came up properly once it reached its destination: a way had to be found to store every image that is used on the cluster locally on each node, since there is no registry or any upstream DNS server at the deployment site.

One of the impacts of having a standalone OCP cluster without any DNS servers is that some of the nodes would not come up all the way if they were rebooted.  The symptoms of a node or cluster being rebooted without any DNS or external connectivity (a disconnected environment) are as follows:

  1. Since there is a systemd process on the nodes that occasionally wipes old images, sometimes the nodes would not even have the images needed to start the Kubelet process (the openshift-release images).
  2. In other cases, even when Kubelet starts, other images may fail to pull (ImagePullBackOff) due to no DNS resolution or outside connectivity.  Typically, at a minimum, quay.io and registry.redhat.io need to resolve.
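A quick way to see which images are still present in a node's local store (for example, whether the openshift-release images survived the wipe) is to run crictl on the node as root:

crictl images | grep openshift-release-dev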

Here is a way that I duplicated this behavior in my lab: I shut off my DNS server and rebooted the nodes/cluster.

Once the cluster attempted to start (some nodes started and others did not due to Kubelet failing), I ran the following commands to look for new errors.  Luckily, in this situation, the API was responding, but this might not always be the case.

This is the set of commands I ran to see the pods that failed or any events that occurred in the cluster due to the ImagePullBackOff error.

oc get events -A --sort-by='.lastTimestamp'
oc get po -A|grep -v -e Running -e Completed
oc get co

Within a few minutes, I started to see some issues with pulling some images (output of the oc get events command listed above).  This is to be expected since the external registries' DNS names can no longer be resolved.
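To confirm that it really is a name-resolution problem, you can check from one of the nodes whether the external registries still resolve; in this state, the lookups below should fail:

getent hosts quay.io
getent hosts registry.redhat.io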

A look at the OpenShift web console shows the pods that are now failing as a result of the image pull error.

On the nodes that failed to start Kubelet, here is an error message that showed up in /var/log/messages:

quay.io/openshift-release-dev/[email protected]:fd7ce3da297b589c7b3c34f6dc820f4d71a51ec367424749e76fbfad06298456 is needed before kubelet starts up
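On such a node, you can check whether that release image is still present in the local store by searching for its digest (if the wipe already removed it, nothing is returned):

crictl images --digests | grep fd7ce3da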

What steps do we need to perform to get around this behavior?

As I mentioned, one way to get around this would be to configure an additional location for the CRI-O image store on each master/worker (read-only, so the systemd wipe process can't remove it).  At a minimum, this image store would need to have the openshift-release images and anything else that is critical for key services in the cluster to start.  Think of images that are needed for ODF, the Quay registry, etc.  Even if we have a registry on the cluster, this is a chicken-and-egg problem: if the cluster doesn't come up, there is no OCP registry.

We need to have all of the needed images on every node (because workloads will not always get scheduled to the same nodes they are already on).

Here are the exact steps we can follow:

Note: Your nodes will need to have more than the minimum requirement of 120GB of storage for this to work.  Depending on how many images are running in the cluster, roughly 300GB on each node should be sufficient.
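To get a rough idea of how much space the image copy in step 4 below will need, check the size of the existing image store and the free space behind /home on each node:

du -sh /var/lib/containers/storage
df -h /home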

Also, be sure that any workloads you want to run on the cluster are already running, since we won't be able to pull anything new once the cluster is moved to the isolated environment.

1.  To gather the list of images that are needed for the cluster, create a script called GatherImages.sh based on the contents below.  This needs to be run from a host that has access to the cluster via oc commands.
#!/bin/bash
# Build a helper script that runs "oc describe" against every pod in the cluster
oc get po -A --no-headers | awk '{print "oc describe po "$2" -n "$1}' > workloads.sh
chmod 755 workloads.sh
# Pull out every Image: line and de-duplicate the results into images.lst
./workloads.sh | grep "Image:" | sort | uniq | awk '{print $2}' > images.lst
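As an alternative to the describe-based script above, a single jsonpath query can produce a similar list (note that this particular query only looks at regular containers, not init containers):

oc get po -A -o jsonpath="{.items[*].spec.containers[*].image}" | tr ' ' '\n' | sort -u > images.lst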

2.  Copy this images.lst file to each node.
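One way to do this (assuming your workstation can resolve the node names and you have SSH access as the core user) is:

for node in $(oc get nodes -o name | cut -d/ -f2); do scp images.lst core@$node:/tmp/; done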

3.  On each node, run the following (crictl needs to be used so the images can be pulled by digest):

for image in $(cat images.lst); do crictl pull "$image"; done

In some cases, the images will already exist.  A message saying that "image is up to date" may be displayed.

4.  Let's create an alternate location to store the local images.

Run this set of commands on each node (as root).  Some error messages may appear when running the chattr command at the end (operation not supported), but this does not cause a problem.

mkdir /home/core/images;
cd /var/lib/containers/storage;
cp -ar * /home/core/images
# The following makes it so the images can't be deleted or modified
chattr -R +i +a /home/core/images
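To sanity-check the copy, you can compare the size of the two stores and spot-check that the immutable attribute was applied (given the chattr warnings mentioned above, the flag may not appear on every entry):

du -sh /var/lib/containers/storage /home/core/images
lsattr -d /home/core/images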

5.  Add a MachineConfig to use the additional image store location.

The 99-container-master-storage-conf MachineConfig for masters is shown below (just change the label and name to apply it to workers):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-container-master-storage-conf
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: >-
              data:text/plain;charset=utf-8;base64,IyBUaGlzIGZpbGUgaXMgZ2VuZXJhdGVkIGJ5IHRoZSBNYWNoaW5lIENvbmZpZyBPcGVyYXRvcidzIGNvbnRhaW5lcnJ1bnRpbWVjb25maWcgY29udHJvbGxlci4KIwojIHN0b3JhZ2UuY29uZiBpcyB0aGUgY29uZmlndXJhdGlvbiBmaWxlIGZvciBhbGwgdG9vbHMKIyB0aGF0IHNoYXJlIHRoZSBjb250YWluZXJzL3N0b3JhZ2UgbGlicmFyaWVzCiMgU2VlIG1hbiA1IGNvbnRhaW5lcnMtc3RvcmFnZS5jb25mIGZvciBtb3JlIGluZm9ybWF0aW9uCiMgVGhlICJjb250YWluZXIgc3RvcmFnZSIgdGFibGUgY29udGFpbnMgYWxsIG9mIHRoZSBzZXJ2ZXIgb3B0aW9ucy4KW3N0b3JhZ2VdCgojIERlZmF1bHQgU3RvcmFnZSBEcml2ZXIKZHJpdmVyID0gIm92ZXJsYXkiCgojIFRlbXBvcmFyeSBzdG9yYWdlIGxvY2F0aW9uCnJ1bnJvb3QgPSAiL3Zhci9ydW4vY29udGFpbmVycy9zdG9yYWdlIgoKIyBQcmltYXJ5IFJlYWQvV3JpdGUgbG9jYXRpb24gb2YgY29udGFpbmVyIHN0b3JhZ2UKZ3JhcGhyb290ID0gIi92YXIvbGliL2NvbnRhaW5lcnMvc3RvcmFnZSIKCltzdG9yYWdlLm9wdGlvbnNdCiMgU3RvcmFnZSBvcHRpb25zIHRvIGJlIHBhc3NlZCB0byB1bmRlcmx5aW5nIHN0b3JhZ2UgZHJpdmVycwoKIyBBZGRpdGlvbmFsSW1hZ2VTdG9yZXMgaXMgdXNlZCB0byBwYXNzIHBhdGhzIHRvIGFkZGl0aW9uYWwgUmVhZC9Pbmx5IGltYWdlIHN0b3JlcwojIE11c3QgYmUgY29tbWEgc2VwYXJhdGVkIGxpc3QuCmFkZGl0aW9uYWxpbWFnZXN0b3JlcyA9IFsiL2hvbWUvY29yZS9pbWFnZXMiLF0KCiMgU2l6ZSBpcyB1c2VkIHRvIHNldCBhIG1heGltdW0gc2l6ZSBvZiB0aGUgY29udGFpbmVyIGltYWdlLiAgT25seSBzdXBwb3J0ZWQgYnkKIyBjZXJ0YWluIGNvbnRhaW5lciBzdG9yYWdlIGRyaXZlcnMuCnNpemUgPSAiIgoKIyBSZW1hcC1VSURzL0dJRHMgaXMgdGhlIG1hcHBpbmcgZnJvbSBVSURzL0dJRHMgYXMgdGhleSBzaG91bGQgYXBwZWFyIGluc2lkZSBvZgojIGEgY29udGFpbmVyLCB0byBVSURzL0dJRHMgYXMgdGhleSBzaG91bGQgYXBwZWFyIG91dHNpZGUgb2YgdGhlIGNvbnRhaW5lciwgYW5kCiMgdGhlIGxlbmd0aCBvZiB0aGUgcmFuZ2Ugb2YgVUlEcy9HSURzLiAgQWRkaXRpb25hbCBtYXBwZWQgc2V0cyBjYW4gYmUgbGlzdGVkCiMgYW5kIHdpbGwgYmUgaGVlZGVkIGJ5IGxpYnJhcmllcywgYnV0IHRoZXJlIGFyZSBsaW1pdHMgdG8gdGhlIG51bWJlciBvZgojIG1hcHBpbmdzIHdoaWNoIHRoZSBrZXJuZWwgd2lsbCBhbGxvdyB3aGVuIHlvdSBsYXRlciBhdHRlbXB0IHRvIHJ1biBhCiMgY29udGFpbmVyLgojCiMgcmVtYXAtdWlkcyA9IDA6MTY2ODQ0MjQ3OTo2NTUzNgojIHJlbWFwLWdpZHMgPSAwOjE2Njg0NDI0Nzk6NjU1MzYKCiMgUmVtYXAtVXNlci9Hcm91cCBpcyBhIG5hbWUgd2hpY2ggY2FuIGJlIHVzZWQgdG8gbG9vayB1cCBvbmUgb3IgbW9yZSBVSUQvR0lECiMgcmFuZ2VzIGluIHRoZSAvZXRjL3N1YnVpZCBvciAvZXRjL3N1YmdpZCBmaWxlLiAgTWFwcGluZ3MgYXJlIHNldCB1cCBzdGFydGluZwojIHdpdGggYW4gaW4tY29udGFpbmVyIElEIG9mIDAgYW5kIHRoZSBhIGhvc3QtbGV2ZWwgSUQgdGFrZW4gZnJvbSB0aGUgbG93ZXN0CiMgcmFuZ2UgdGhhdCBtYXRjaGVzIHRoZSBzcGVjaWZpZWQgbmFtZSwgYW5kIHVzaW5nIHRoZSBsZW5ndGggb2YgdGhhdCByYW5nZS4KIyBBZGRpdGlvbmFsIHJhbmdlcyBhcmUgdGhlbiBhc3NpZ25lZCwgdXNpbmcgdGhlIHJhbmdlcyB3aGljaCBzcGVjaWZ5IHRoZQojIGxvd2VzdCBob3N0LWxldmVsIElEcyBmaXJzdCwgdG8gdGhlIGxvd2VzdCBub3QteWV0LW1hcHBlZCBjb250YWluZXItbGV2ZWwgSUQsCiMgdW50aWwgYWxsIG9mIHRoZSBlbnRyaWVzIGhhdmUgYmVlbiB1c2VkIGZvciBtYXBzLgojCiMgcmVtYXAtdXNlciA9ICJzdG9yYWdlIgojIHJlbWFwLWdyb3VwID0gInN0b3JhZ2UiCgpbc3RvcmFnZS5vcHRpb25zLnRoaW5wb29sXQojIFN0b3JhZ2UgT3B0aW9ucyBmb3IgdGhpbnBvb2wKCiMgYXV0b2V4dGVuZF9wZXJjZW50IGRldGVybWluZXMgdGhlIGFtb3VudCBieSB3aGljaCBwb29sIG5lZWRzIHRvIGJlCiMgZ3Jvd24uIFRoaXMgaXMgc3BlY2lmaWVkIGluIHRlcm1zIG9mICUgb2YgcG9vbCBzaXplLiBTbyBhIHZhbHVlIG9mIDIwIG1lYW5zCiMgdGhhdCB3aGVuIHRocmVzaG9sZCBpcyBoaXQsIHBvb2wgd2lsbCBiZSBncm93biBieSAyMCUgb2YgZXhpc3RpbmcKIyBwb29sIHNpemUuCiMgYXV0b2V4dGVuZF9wZXJjZW50ID0gIjIwIgoKIyBhdXRvZXh0ZW5kX3RocmVzaG9sZCBkZXRlcm1pbmVzIHRoZSBwb29sIGV4dGVuc2lvbiB0aHJlc2hvbGQgaW4gdGVybXMKIyBvZiBwZXJjZW50YWdlIG9mIHBvb2wgc2l6ZS4gRm9yIGV4YW1wbGUsIGlmIHRocmVzaG9sZCBpcyA2MCwgdGhhdCBtZWFucyB3aGVuCiMgcG9vbCBpcyA2MCUgZnVsbCwgdGhyZXNob2xkIGhhcyBiZWVuIGhpdC4KIyBhdXRvZXh0ZW5kX3RocmVzaG9sZCA9ICI4MCIKCiMgYmFzZXNpemUgc3BlY2lmaWVzIHRoZSBzaXplIHRvIHVzZSB3aGVuIGNyZWF0aW5nIHRoZSBiYXNlIGRldmljZSwgd2hpY2gKIyBsaW1pdHMgdGhlIHNpemUgb2YgaW1hZ2VzIGFuZCBjb250YWluZXJzLgojIGJhc2VzaXplID0gIjEwRyIK
CiMgYmxvY2tzaXplIHNwZWNpZmllcyBhIGN1c3RvbSBibG9ja3NpemUgdG8gdXNlIGZvciB0aGUgdGhpbiBwb29sLgojIGJsb2Nrc2l6ZT0iNjRrIgoKIyBkaXJlY3Rsdm1fZGV2aWNlIHNwZWNpZmllcyBhIGN1c3RvbSBibG9jayBzdG9yYWdlIGRldmljZSB0byB1c2UgZm9yIHRoZQojIHRoaW4gcG9vbC4gUmVxdWlyZWQgaWYgeW91IHNldHVwIGRldmljZW1hcHBlcgojIGRpcmVjdGx2bV9kZXZpY2UgPSAiIgoKIyBkaXJlY3Rsdm1fZGV2aWNlX2ZvcmNlIHdpcGVzIGRldmljZSBldmVuIGlmIGRldmljZSBhbHJlYWR5IGhhcyBhIGZpbGVzeXN0ZW0KIyBkaXJlY3Rsdm1fZGV2aWNlX2ZvcmNlID0gIlRydWUiCgojIGZzIHNwZWNpZmllcyB0aGUgZmlsZXN5c3RlbSB0eXBlIHRvIHVzZSBmb3IgdGhlIGJhc2UgZGV2aWNlLgojIGZzPSJ4ZnMiCgojIGxvZ19sZXZlbCBzZXRzIHRoZSBsb2cgbGV2ZWwgb2YgZGV2aWNlbWFwcGVyLgojIDA6IExvZ0xldmVsU3VwcHJlc3MgMCAoRGVmYXVsdCkKIyAyOiBMb2dMZXZlbEZhdGFsCiMgMzogTG9nTGV2ZWxFcnIKIyA0OiBMb2dMZXZlbFdhcm4KIyA1OiBMb2dMZXZlbE5vdGljZQojIDY6IExvZ0xldmVsSW5mbwojIDc6IExvZ0xldmVsRGVidWcKIyBsb2dfbGV2ZWwgPSAiNyIKCiMgbWluX2ZyZWVfc3BhY2Ugc3BlY2lmaWVzIHRoZSBtaW4gZnJlZSBzcGFjZSBwZXJjZW50IGluIGEgdGhpbiBwb29sIHJlcXVpcmUgZm9yCiMgbmV3IGRldmljZSBjcmVhdGlvbiB0byBzdWNjZWVkLiBWYWxpZCB2YWx1ZXMgYXJlIGZyb20gMCUgLSA5OSUuCiMgVmFsdWUgMCUgZGlzYWJsZXMKIyBtaW5fZnJlZV9zcGFjZSA9ICIxMCUiCgojIG1rZnNhcmcgc3BlY2lmaWVzIGV4dHJhIG1rZnMgYXJndW1lbnRzIHRvIGJlIHVzZWQgd2hlbiBjcmVhdGluZyB0aGUgYmFzZQojIGRldmljZS4KIyBta2ZzYXJnID0gIiIKCiMgbW91bnRvcHQgc3BlY2lmaWVzIGV4dHJhIG1vdW50IG9wdGlvbnMgdXNlZCB3aGVuIG1vdW50aW5nIHRoZSB0aGluIGRldmljZXMuCiMgbW91bnRvcHQgPSAiIgoKIyB1c2VfZGVmZXJyZWRfcmVtb3ZhbCBNYXJraW5nIGRldmljZSBmb3IgZGVmZXJyZWQgcmVtb3ZhbAojIHVzZV9kZWZlcnJlZF9yZW1vdmFsID0gIlRydWUiCgojIHVzZV9kZWZlcnJlZF9kZWxldGlvbiBNYXJraW5nIGRldmljZSBmb3IgZGVmZXJyZWQgZGVsZXRpb24KIyB1c2VfZGVmZXJyZWRfZGVsZXRpb24gPSAiVHJ1ZSIKCiMgeGZzX25vc3BhY2VfbWF4X3JldHJpZXMgc3BlY2lmaWVzIHRoZSBtYXhpbXVtIG51bWJlciBvZiByZXRyaWVzIFhGUyBzaG91bGQKIyBhdHRlbXB0IHRvIGNvbXBsZXRlIElPIHdoZW4gRU5PU1BDIChubyBzcGFjZSkgZXJyb3IgaXMgcmV0dXJuZWQgYnkKIyB1bmRlcmx5aW5nIHN0b3JhZ2UgZGV2aWNlLgojIHhmc19ub3NwYWNlX21heF9yZXRyaWVzID0gIjAiCg==
          mode: 420
          overwrite: true
          path: /etc/containers/storage.conf
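Once the MachineConfig is saved to a file (the filename below is just an example), apply it and watch the machine config pool roll out; each node will reboot as the new storage.conf is written:

oc apply -f 99-container-master-storage-conf.yaml
oc get mcp -w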

The change that is specifically being made in /etc/containers/storage.conf is as follows:

additionalimagestores = ["/home/core/images",]
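If you would rather generate the base64 payload for the MachineConfig yourself, one way (assuming you have a local copy of a node's /etc/containers/storage.conf with the additionalimagestores line added) is:

base64 -w0 storage.conf
# paste the output after "data:text/plain;charset=utf-8;base64," in the source: field above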

Now Kubelet will start even when there is no DNS access (IE: when the host can't connect to quay.io or registry.redhat.io to pull images).

To test things out, stop your DNS resolver and/or disconnect your cluster from the Internet and reboot the nodes a few times.  I tested this a couple of different ways:

  1. Cleanly (draining the nodes)
  2. Soft reboot
  3. Forcibly powering off

In each case, the nodes came back online quickly and API access worked within a minute or two (since no images needed to be pulled again).

Keith Calligan