Adding RHEL Worker Node to an existing Openshift 4.x Cluster

With a default Red Hat OpenShift 4.5 installation you normally get worker nodes running CoreOS as the operating system. CoreOS is a really nice, small operating system that works great with OpenShift, but some applications, like IBM Spectrum Scale, cannot be installed on CoreOS. By using Red Hat Enterprise Linux worker nodes you can do everything a CoreOS worker can, while also adding features like Spectrum Scale for Persistent Volumes.

Prepare the nodes

In my case, I want to replace my three CoreOS worker nodes with RHEL worker nodes.

Before you start, make sure you create the DNS records for the new worker nodes. The easiest way is to copy the existing DNS records for your other worker nodes. You can verify that the new records resolve, as shown after the list below.

A Record compute-3.test.cristie.local
A Record compute-4.test.cristie.local
A Record compute-5.test.cristie.local
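
Once the records are created, a quick check that they resolve; the hostnames are from my environment, and the answers will of course show your own IP addresses:

$ nslookup compute-3.test.cristie.local
$ nslookup compute-4.test.cristie.local
$ nslookup compute-5.test.cristie.local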

To deploy the 3 new worker nodes I also need a playbook server, so in this case, make sure you deploy 4 new RHEL 7.8 servers.

If you need help deploying RHEL and attaching a subscription, you can always follow this documentation from Red Hat:

https://docs.openshift.com/container-platform/4.5/machine_management/adding-rhel-compute.html

 

And to prepare the playbook server you can follow these instructions:

https://docs.openshift.com/container-platform/4.5/machine_management/adding-rhel-compute.html#rhel-preparing-playbook-machine_adding-rhel-compute

 

Create only one Red Hat Enterprise Linux worker node by using the following instructions; we will clone this server later.

https://docs.openshift.com/container-platform/4.5/machine_management/adding-rhel-compute.html#rhel-preparing-playbook-machine_adding-rhel-compute

 

After installing the playbook server and the first worker node, create the user core on both nodes and give it sudo access with NOPASSWD: ALL, for example as shown below.
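
A minimal sketch of creating the core user with passwordless sudo; run the commands with root privileges, and note that the sudoers drop-in file name is simply my own choice:

$ sudo useradd core
$ sudo passwd core
$ echo 'core ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/core
$ sudo chmod 440 /etc/sudoers.d/core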

Now let's shut down the first worker node and clone it. If you go bare metal to bare metal you can use our solution Cristie TBMR to clone the machine to another, or you can always use your hypervisor's clone functionality to create the second and third worker nodes.

 

Configure Playbook Server

Log in as the core user and download the oc / kubectl binaries from the Red Hat website.

https://cloud.redhat.com/openshift/install
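
A short sketch of unpacking and installing the client binaries, assuming you downloaded the Linux client tarball offered on that page:

$ tar -xzf openshift-client-linux.tar.gz
$ sudo mv oc kubectl /usr/local/bin/
$ oc version --client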

 

Create a private SSH key on your playbook server; it will be used by the playbook server to connect to all worker nodes.

 

$ ssh-keygen -t rsa -b 4096 -f /home/core/.ssh/id_rsa -N ''
Generating public/private rsa key pair.
Your identification has been saved in /home/core/.ssh/id_rsa.
Your public key has been saved in /home/core/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:xxxxxx core@ocplaybook
The key's randomart image is:
+---[RSA 4096]----+
+----[SHA256]-----+

 

Copy your SSH key to all your worker nodes.

$ ssh-copy-id core@compute-x.test.cristie.local
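
If you want to avoid repeating the command for each host, a small loop works too; the hostnames are from my environment:

$ for h in compute-3 compute-4 compute-5; do ssh-copy-id core@$h.test.cristie.local; done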

Create a hosts file in your inventory directory; in my case I created the directory in the core user's home.

$ mkdir ~/inventory/
$ vi ./inventory/hosts

In the hosts file, add the following lines.

[all:vars]
ansible_user=core
ansible_become=True
 
openshift_kubeconfig_path="~/.kube/config"
 
[new_workers]
compute-3.test.cristie.local
compute-4.test.cristie.local
compute-5.test.cristie.local

Now let's install the Ansible client on your playbook server.

$ sudo yum install openshift-ansible openshift-clients jq -y

 

When I tried to install openshift-ansible I got a dependency problem: only Ansible 2.8 was available in my repositories, but 2.9.5 or later is required, so I downloaded a newer version from the following link.

https://releases.ansible.com/ansible/rpm/release/epel-7-x86_64/

 

$ sudo yum install wget -y
$ wget https://releases.ansible.com/ansible/rpm/release/epel-7-x86_64/ansible-2.9.9-1.el7.ans.noarch.rpm
$ sudo yum install ansible-2.9.9-1.el7.ans.noarch.rpm -y
$ sudo yum install openshift-ansible openshift-clients jq -y
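
Before running the scale-up playbook, you can optionally check that Ansible can reach all the new workers; a quick sanity test against the inventory we just created:

$ ansible -i ~/inventory/hosts new_workers -m ping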

Now you can start deploying OpenShift to your new nodes.

$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook -i ~/inventory/hosts playbooks/scaleup.yml

When this is done it should look like the following output.

 

ok: [compute-3.test.cristie.local -> localhost]
ok: [compute-5.test.cristie.local -> localhost]
ok: [compute-4.test.cristie.local -> localhost]
 
PLAY RECAP **********************************************************************************************************
compute-3.test.cristie.local : ok=40 changed=29 unreachable=0 failed=0 skipped=8 rescued=0 ignored=0
compute-4.test.cristie.local : ok=40 changed=29 unreachable=0 failed=0 skipped=8 rescued=0 ignored=0
compute-5.test.cristie.local : ok=40 changed=29 unreachable=0 failed=0 skipped=8 rescued=0 ignored=0
localhost : ok=1 changed=1 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0
 
Wednesday 12 August 2020 14:35:58 +0200 (0:01:04.901) 0:06:42.055 ******
===============================================================================
openshift_node : Install openshift support packages -------------------------------------------------------- 152.26s
openshift_node : Install openshift packages ----------------------------------------------------------------- 91.30s
openshift_node : Wait for node to report ready -------------------------------------------------------------- 64.90s
openshift_node : Reboot the host and wait for it to come back ----------------------------------------------- 21.06s
openshift_node : Approve node CSRs -------------------------------------------------------------------------- 16.65s
openshift_node : Pull release image ------------------------------------------------------------------------- 13.82s
openshift_node : Pull MCD image ------------------------------------------------------------------------------ 8.43s
openshift_node : Wait for bootstrap endpoint to show up ------------------------------------------------------ 4.70s
openshift_node : Fetch bootstrap ignition file locally ------------------------------------------------------- 4.29s
openshift_node : Get machine controller daemon image from release image -------------------------------------- 2.85s
openshift_node : Apply ignition manifest --------------------------------------------------------------------- 2.55s
Gathering Facts ---------------------------------------------------------------------------------------------- 1.92s
openshift_node : Disable firewalld service ------------------------------------------------------------------- 1.35s
openshift_node : Setting sebool container_manage_cgroup ------------------------------------------------------ 1.24s
openshift_node : Restart the CRI-O service ------------------------------------------------------------------- 1.08s
openshift_node : Write /etc/containers/registries.conf ------------------------------------------------------- 1.02s
openshift_node : Setting sebool container_use_cephfs --------------------------------------------------------- 1.00s
openshift_node : Setting sebool virt_use_samba --------------------------------------------------------------- 0.94s
openshift_node : Enable the CRI-O service -------------------------------------------------------------------- 0.84s
openshift_node : Get cluster nodes --------------------------------------------------------------------------- 0.68s

Now that the installation is done, you need to add the new nodes to your load balancer.

In our case we are using a DNS load balancer in Windows DNS, so we only add the new nodes to apps.test.cristie.local; an illustration of what that could look like is shown below.
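
A sketch of the round-robin records for the apps address, in the same style as the worker records earlier; the IP addresses are placeholders for your new workers' addresses:

A Record apps.test.cristie.local -> <IP of compute-3>
A Record apps.test.cristie.local -> <IP of compute-4>
A Record apps.test.cristie.local -> <IP of compute-5>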

 

To verify that the nodes were installed, you can then run:

 

$ oc get nodes

 

Delete Old CoreOS Machines

https://docs.openshift.com/container-platform/4.5/machine_management/adding-rhel-compute.html#rhel-removing-rhcos_adding-rhel-compute
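
The documentation above walks through draining and removing the old CoreOS workers. A minimal sketch of that flow, assuming the old workers are named compute-0 through compute-2 (adjust to your environment), repeated for each old node:

$ oc adm cordon compute-0.test.cristie.local
$ oc adm drain compute-0.test.cristie.local --ignore-daemonsets --delete-local-data --force
$ oc delete node compute-0.test.cristie.local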

Christian Petersson
