- Red Hat OpenShift Day-2 Operations
- OpenShift Identity Providers
- Node Configurations
- Troubleshooting
- Nested Virtualization
- Replacing the default Ingress Certificate
- OpenShift Web Console Customizations
- Registry Authentication
- Activate Internal Registry
- Quick NFS Storage
- USB Client Passthrough
- Backup and restore OpenShift Cluster
- Load-aware rebalancing using the Kubernetes Descheduler
- Pod with external NetworkAccess
- Egress IP
- VirtualMachinePool (VMPool)
- NFS Volume Mount
Docs: Configuring an htpasswd identity provider
Step 1: Create an htpasswd file to store the user and password information:
htpasswd -c -B -b users.htpasswd rguske <password>
Add a new user to the file:
htpasswd -bB users.htpasswd rbohne 'r3dh4t1!'
htpasswd -bB users.htpasswd devuser 'r3dh4t1!'
Remove an existing user:
htpasswd -D users.htpasswd <username>
Replace the secret with an updated users.htpasswd file:
oc create secret generic htpass-secret --from-file=htpasswd=users.htpasswd --dry-run=client -o yaml -n openshift-config | oc replace -f -
Step 2: Create a Kubernetes secret:
oc create secret generic htpass-secret-rguske --from-file=htpasswd=<path_to_rguske.htpasswd> -n openshift-config
oc create secret generic htpass-secret-devuser --from-file=htpasswd=<path_to_devuser.htpasswd> -n openshift-config
This can also be done using the OpenShift web console.
To retrieve the current users.htpasswd file from the existing secret:
oc get secret htpass-secret -ojsonpath={.data.htpasswd} -n openshift-config | base64 --decode > users.htpasswd
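For reference, the secret is consumed by an HTPasswd identity provider entry in the cluster OAuth resource. A minimal sketch, assuming the htpass-secret created above (the provider name is arbitrary):
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd_provider
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret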
Docs: Using RBAC to define and apply permissions
Add cluster-wide admin privileges to a user, e.g. rguske:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: rguske-cluster-admin
subjects:
- kind: User
apiGroup: rbac.authorization.k8s.io
name: rguske
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
Alternatively via the web UI:
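The same grant can also be applied with a single command, equivalent to the ClusterRoleBinding above:
oc adm policy add-cluster-role-to-user cluster-admin rguske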
Configuring an LDAP identity provider
To use the identity provider, you must define an OpenShift Container Platform Secret object that contains the bindPassword field.
oc create secret generic ldap-secret \
--from-literal=bindPassword='r3dh4t1!' \
-n openshift-config
Identity providers use OpenShift Container Platform ConfigMap objects in the openshift-config namespace to contain the certificate authority bundle. These are primarily used to contain certificate bundles needed by the identity provider.
oc create configmap ca-config-map \
--from-file=ca.crt=/path/to/ca \
-n openshift-config
There's also an option to skip the certificate verification:
insecure: true
To export the CA certificate from the Windows server via MMC (Microsoft Management Console): Open MMC:
Press Win + R, type mmc, press Enter.
Add the Certificates Snap-in:
In MMC, go to File > Add/Remove Snap-in. Select Certificates, click Add. Choose Computer account, then Local computer, click Finish. Navigate to the Certificate:
Expand Certificates (Local Computer). Look under: Personal > Certificates for most service-related certs. Web Hosting > Certificates for IIS SSL certs. Export the Certificate:
Right-click the certificate > All Tasks > Export. Use the Certificate Export Wizard. Choose Yes, export the private key if needed (e.g., for backup or moving). Choose format: .PFX (with private key), or .CER (public cert only).
Validate the bind user and the appropriate configuration using ldapsearch:
ldapsearch -x -H ldap://jarvisnas.jarvis.lab \
-D "uid=root,cn=users,dc=ldap,dc=jarvis,dc=lab" \
-b "dc=ldap,dc=jarvis,dc=lab" \
-W "(objectClass=*)"Creating the LDAP CR:
The following custom resource (CR) shows the parameters and acceptable values for an LDAP identity provider.
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
name: cluster
spec:
identityProviders:
- name: ldapidp
mappingMethod: claim
type: LDAP
ldap:
attributes:
id:
- dn
email:
- mail
name:
- cn
preferredUsername:
- uid
bindDN: "sa-ldap-bind"
bindPassword:
name: ldap-bind-password-qrzn9
# ca:
# name: ca-config-map
insecure: true
url: "ldap://w2k19-dc.rguske.coe.muc.redhat.com/DC=rguske,DC=coe,DC=muc,DC=redhat,DC=com?sAMAccountName"Docs - Configuring chrony time service
You can set the time server and related settings used by the chrony time service (chronyd) by modifying the contents of the chrony.conf file.
Create a Butane config including the contents of the chrony.conf file. For example, to configure chrony on worker nodes, create a 99-worker-chrony.bu file.
tee 99-worker-chrony.bu > /dev/null <<'EOF'
variant: openshift
version: 4.18.0
metadata:
name: 99-worker-chrony
labels:
machineconfiguration.openshift.io/role: worker
storage:
files:
- path: /etc/chrony.conf
mode: 0644
overwrite: true
contents:
inline: |
server 10.10.42.20 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF
Use Butane (brew install butane) to generate a MachineConfig object file, 99-worker-chrony.yaml, containing the configuration to be delivered to the nodes:
butane 99-worker-chrony.bu -o 99-worker-chrony.yaml
Apply the config: oc apply -f 99-worker-chrony.yaml
Alternatively to butane:
chronybase64=$(cat << EOF | base64 -w 0
server 10.10.42.20 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF
)
oc apply -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 50-worker-chrony
spec:
config:
ignition:
version: 2.2.0
storage:
files:
- contents:
source: data:text/plain;charset=utf-8;base64,${chronybase64}
filesystem: root
mode: 0644
path: /etc/chrony.conf
EOF
Red Hat KB6148012 - How to schedule pod on master node where scheduling is disabled?
oc get scheduler cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
creationTimestamp: "2025-01-28T15:20:20Z"
generation: 1
name: cluster
resourceVersion: "542"
uid: 59f6fef1-e88a-484a-8e3c-fa38e6e300b3
spec:
mastersSchedulable: false
policy:
name: ""
status: {}
Edit the Scheduler CR and set spec.mastersSchedulable to true.
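A non-interactive alternative is to patch the same resource:
oc patch scheduler cluster --type merge -p '{"spec":{"mastersSchedulable":true}}'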
oc get nodes
NAME STATUS ROLES AGE VERSION
ocp1-h5ggj-master-0 Ready control-plane,master,worker 2d19h v1.30.6
ocp1-h5ggj-master-1 Ready control-plane,master,worker 2d19h v1.30.6
ocp1-h5ggj-master-2 Ready control-plane,master,worker 2d19h v1.30.6
ocp1-h5ggj-worker-0 Ready worker 2d18h v1.30.6
ocp1-h5ggj-worker-1 Ready worker 2d18h v1.30.6
Creating must-gather with more details for specific components in OCP 4
Data Collection Audit logs:
oc adm must-gather -- /usr/bin/gather_audit_logs
Default must-gather including the audit logs:
oc adm must-gather -- '/usr/bin/gather && /usr/bin/gather_audit_logs'
OCPV:
oc adm must-gather --image-stream=openshift/must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel[8,9]:[operator_version]
Replace [8,9] based on the OpenShift version: OCP 4.12 uses rhel8, while OCP 4.13 and later use rhel9. The [operator_version] tag should be in the format v4.y.z.
Examples - 4.17: oc adm must-gather --image-stream=openshift/must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.17.4
oc adm must-gather \
--image-stream=openshift/must-gather \
--image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.17.4 \
--image=registry.redhat.io/workload-availability/node-healthcheck-must-gather-rhel9:v0.9.0
How to generate a sosreport within nodes without SSH in OCP 4
oc get nodes
NAME STATUS ROLES AGE VERSION
ocp1-h5ggj-master-0 Ready control-plane,master,worker 2d19h v1.30.6
ocp1-h5ggj-master-1 Ready control-plane,master,worker 2d19h v1.30.6
ocp1-h5ggj-master-2 Ready control-plane,master,worker 2d19h v1.30.6
ocp1-h5ggj-worker-0 Ready worker 2d18h v1.30.6
ocp1-h5ggj-worker-1 Ready worker 2d18h v1.30.6
Then, create a debug session with oc debug node/<node-name> (in this example oc debug node/ocp1-h5ggj-master-0). The debug session spawns a pod using the tools image from the release (which doesn't contain sos):
oc debug node/ocp1-h5ggj-master-0
chroot /host bash
[root@ocp1-h5ggj-master-0 /]# cat /etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.17
$ toolbox
Trying to pull registry.redhat.io/rhel9/support-tools:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob facf1e7dd3e0 done |
Copying blob a0e56de801f5 done |
Copying blob ec465ce79861 done |
Copying blob cbea42b25984 done |
Copying config a627accb68 done |
Writing manifest to image destination
Storing signatures
a627accb682adb407580be0d7d707afbcb90abf2f407a0b0519bacafa15dd409
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel9/support-tools'
Detected RUN label in the container image. Using that as the default...
ebf4dd2b82bf8ebeab55291c8ca195b61e13c9fc5d8dfb095f5fdcbcdabae2df
toolbox-root
Container started successfully. To exit, type 'exit'.
sosreport -e openshift -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on --all-logs
How to set the CPU model to Passthrough in OpenShift Virtualization?
oc create -f - <<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
annotations:
labels:
app: rhel9-pod-bridge
kubevirt.io/dynamic-credentials-support: "true"
name: rhel9-pod-bridge
spec:
dataVolumeTemplates:
- apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: rhel9-pod-bridge
spec:
sourceRef:
kind: DataSource
name: rhel9
namespace: openshift-virtualization-os-images
storage:
accessModes:
- ReadWriteMany
storageClassName: thin-csi
resources:
requests:
storage: 30Gi
running: false
template:
metadata:
annotations:
vm.kubevirt.io/flavor: tiny
vm.kubevirt.io/os: rhel9
vm.kubevirt.io/workload: server
kubevirt.io/allow-pod-bridge-network-live-migration: ""
labels:
kubevirt.io/domain: rhel9-pod-bridge
kubevirt.io/size: tiny
spec:
domain:
cpu:
model: host-passthrough
cores: 1
sockets: 1
threads: 1
devices:
disks:
- disk:
bus: virtio
name: rootdisk
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- bridge: {}
name: default
machine:
type: pc-q35-rhel9.2.0
memory:
guest: 1.5Gi
networks:
- name: default
pod: {}
terminationGracePeriodSeconds: 180
volumes:
- dataVolume:
name: rhel9-pod-bridge
name: rootdisk
- cloudInitNoCloud:
userData: |-
#cloud-config
user: cloud-user
password: redhat
chpasswd: { expire: False }
name: cloudinitdisk
EOF
Other sources:
OpenShift Virtualization reports no nodes are available, cannot start VMs
Nested virtualization in OpenShift Virtualization
Enable features on vSphere:
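The setting is typically enabled in the vSphere UI ("Expose hardware assisted virtualization to the guest OS") while the VM is powered off. As a sketch of a CLI alternative, assuming govc is available and <worker-vm-name> is a placeholder:
govc vm.change -vm <worker-vm-name> -nested-hv-enabled=true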
Prerequisites:
- You must have a wildcard certificate for the fully qualified .apps subdomain and its corresponding private key. Each should be in a separate PEM format file.
- The private key must be unencrypted. If your key is encrypted, decrypt it before importing it into OpenShift Container Platform.
- The certificate must include the subjectAltName extension covering the wildcard *.apps subdomain (see the openssl check after this list).
- The certificate file can contain one or more certificates in a chain. The wildcard certificate must be the first certificate in the file. It can then be followed with any intermediate certificates, and the file should end with the root CA certificate.
- Copy the root CA certificate into an additional PEM format file.
- Verify that all certificates which include -----END CERTIFICATE----- also end with one carriage return after that line.
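A quick way to inspect the subjectAltName entries of the wildcard certificate (the filename is a placeholder):
openssl x509 -in wildcard.crt -noout -text | grep -A1 'Subject Alternative Name'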
Create a config map that includes only the root CA certificate used to sign the wildcard certificate:
oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: user-ca-bundle
namespace: openshift-config
data:
ca-bundle.crt: |
# MyPrivateCA (root.crt)
-----BEGIN CERTIFICATE-----
zzzzz
-----END CERTIFICATE-----
EOF
Update the cluster-wide proxy configuration with the newly created config map:
oc patch proxy/cluster \
--type=merge \
--patch='{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}'
Create a secret that contains the wildcard certificate chain and key:
oc create secret tls ocp1-wildcard-cert \
--cert='/Users/rguske/Downloads/ocp1.rguske/chain.crt' \
--key='/Users/rguske/Downloads/ocp1.rguske/key.key' \
-n openshift-ingress
Update the Ingress Controller configuration with the newly created secret:
// Replace the secret name
oc patch ingresscontroller.operator default \
--type=merge -p \
'{"spec":{"defaultCertificate": {"name": "ocp1-wildcard-cert"}}}' \
-n openshift-ingress-operator
Watch the ClusterOperator (co) for the status update.
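For example, watch the ingress ClusterOperator until the new certificate has been rolled out:
oc get clusteroperators ingress -w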
Docs - Customizing the web console in OpenShift Container Platform
oc create configmap console-custom-logo --from-file /path/to/console-custom-logo.png -n openshift-config
oc create configmap console-custom-logo --from-file '/Users/rguske/Documents/ironman.jpg' -n openshift-config
Edit the web console’s Operator configuration to include customLogoFile and customProductName:
oc edit consoles.operator.openshift.io cluster
apiVersion: operator.openshift.io/v1
kind: Console
metadata:
name: cluster
spec:
customization:
customLogoFile:
key: ironman.jpg
name: console-custom-logo
customProductName: My Console
Once the Operator configuration is updated, it will sync the custom logo config map into the console namespace, mount it to the console pod, and redeploy.
Validate: oc get clusteroperator console
Docs - Customizing the login page
Run the following commands to create templates you can modify:
oc adm create-login-template > login.html
Alternatively, adjust the existing login.html and/or providers.html.
Export the existing login.html and provider.html:
POD=$(oc get pods -n openshift-authentication -o name | head -n 1)
oc exec -n openshift-authentication "$POD" -- cat /var/config/system/secrets/v4-0-config-system-ocp-branding-template/login.html > login.html
oc exec -n openshift-authentication "$POD" -- cat /var/config/system/secrets/v4-0-config-system-ocp-branding-template/providers.html > providers.html
Choose an image which you'd like to use for the replacement and encode the image into base64. Base64 Guru helps.
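Alternatively, encode the image locally (GNU coreutils shown; on macOS use base64 -i <file> instead):
base64 -w 0 ironman.jpg > ironman.b64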
Replace the base64 value in login.html: search for background-image:url(data:image/, check the file format (png, svg, jpg), adjust it if necessary, and replace the base64 value of the image.
Create the secrets:
oc -n openshift-config get secret
NAME TYPE DATA AGE
etcd-client kubernetes.io/tls 2 8d
htpasswd-dm9mt Opaque 1 6d1h
initial-service-account-private-key Opaque 1 8d
pull-secret kubernetes.io/dockerconfigjson 1 8d
webhook-authentication-integrated-oauth Opaque 1 8d
oc create secret generic login-template --from-file=login.html -n openshift-config
oc create secret generic providers-template --from-file=providers.html -n openshift-config
Edit the oauth CR:
oc edit oauths cluster
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
name: cluster
# ...
spec:
templates:
error:
name: error-template
login:
name: login-template
providerSelection:
name: providers-template
After editing the CR, the pods within the openshift-authentication namespace will be redeployed.
oc -n openshift-authentication get pods -w
NAME READY STATUS RESTARTS AGE
oauth-openshift-8c7859b9f-fwsnl 1/1 Running 0 6m55s
oauth-openshift-8c7859b9f-kp8rw 1/1 Running 0 7m53s
oauth-openshift-8c7859b9f-qw7wl 1/1 Running 0 7m25s
oauth-openshift-8c7859b9f-kp8rw 1/1 Terminating 0 8m42s
oauth-openshift-664fbb9d49-r5bzk 0/1 Pending 0 0s
oauth-openshift-664fbb9d49-r5bzk 0/1 Pending 0 0s
oauth-openshift-8c7859b9f-kp8rw 0/1 Terminating 0 9m8s
oauth-openshift-664fbb9d49-r5bzk 0/1 Pending 0 26s
oauth-openshift-664fbb9d49-r5bzk 0/1 Pending 0 26s
oauth-openshift-664fbb9d49-r5bzk 0/1 ContainerCreating 0 26s
oauth-openshift-8c7859b9f-kp8rw 0/1 Terminating 0 9m8s
oauth-openshift-8c7859b9f-kp8rw 0/1 Terminating 0 9m8s
oauth-openshift-664fbb9d49-r5bzk 0/1 ContainerCreating 0 27s
oauth-openshift-664fbb9d49-r5bzk 0/1 Running 0 27s
oauth-openshift-664fbb9d49-r5bzk 1/1 Running 0 28s
Create a pull secret for Docker Hub and link it to the default service account for image pulls:
oc create secret docker-registry docker-hub \
--docker-server=docker.io \
--docker-username= \
--docker-password='' \
--docker-email=''
oc secrets link default docker-hub --for=pull
Docs - Changing the image registry’s management state
You first need to activate the internal registry by changing its state to Managed. To start the image registry, you must change the Image Registry Operator configuration's managementState from Removed to Managed.
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed"}}'
Default:
oc get configs.imageregistry.operator.openshift.io cluster
NAME AGE
cluster 36d
oc get configs.imageregistry.operator.openshift.io cluster -oyaml | grep managementState
managementState: Removed
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed"}}'
config.imageregistry.operator.openshift.io/cluster patched
oc get configs.imageregistry.operator.openshift.io cluster -oyaml | grep managementState
managementState: Managed
Docs - Image registry storage configuration
Verify that you do not have a registry pod:
oc get pod -n openshift-image-registry -l docker-registry=default
Edit the cluster operator:
oc edit configs.imageregistry.operator.openshift.io
Adjust the storage section accordingly. Leave the claim field blank to allow the automatic creation of an image-registry-storage persistent volume claim (PVC).
[...]
storage:
pvc:
claim:
[...]
Docs - Enable the Image Registry default route with the Custom Resource Definition
In OpenShift Container Platform, the Registry Operator controls the OpenShift image registry feature. The Operator is defined by the configs.imageregistry.operator.openshift.io Custom Resource Definition (CRD).
If you need to automatically enable the Image Registry default route, patch the Image Registry Operator CRD.
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"defaultRoute":true}}'
oc -n openshift-image-registry get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
default-route default-route-openshift-image-registry.apps.ocp-mk1.jarvis.lab image-registry <all> reencrypt None
Docs - Exposing a default registry manually
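The $HOST variable used below holds the registry's default route hostname. Assuming the default route is enabled, it can be set with:
HOST=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')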
podman login -u rguske -p $(oc whoami -t) --tls-verify=false $HOST
Login Succeeded!
With Certificate:
oc extract secret/$(oc get ingresscontroller -n openshift-ingress-operator default -o json | jq '.spec.defaultCertificate.name // "router-certs-default"' -r) -n openshift-ingress --confirm
sudo mv tls.crt /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust enable
Create a Secret using the extracted certificates:
oc create secret tls public-route-tls \
-n openshift-image-registry \
--cert=/Users/rguske/Downloads/tls.crt \
--key=/Users/rguske/Downloads/tls.key
Configure the Operator using oc edit configs.imageregistry.operator.openshift.io/cluster
routes:
- name: public-routes
hostname: default-route-openshift-image-registry.apps.ocp-mk1.jarvis.lab
secretName: public-route-tls
It can be handy to have NFS backend storage available quickly for an OpenShift cluster. The following instructions guide you through the installation of an NFS server on a RHEL bastion host.
Install the NFS package and activate the service:
dnf install nfs-utils -y
systemctl enable nfs-server.service
systemctl start nfs-server.service
systemctl status nfs-server.service
Create the directory in which the Persistent Volumes will be stored:
mkdir /srv/nfs-storage-pv-user-pvs
chmod g+w /srv/nfs-storage-pv-user-pvs
Configure the export folder as well as the network CIDR of the systems which will access the NFS server:
vi /etc/exports
/srv/nfs-storage-pv-user-pvs 10.198.15.0/24(rw,sync,no_root_squash)
systemctl restart nfs-server
exportfs -arv
exportfs -s
Configure the firewall on the RHEL accordingly:
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
firewall-cmd --reload
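To verify the export is available, run on the bastion host:
showmount -e localhost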
# Add Helm repo
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
# List versions
helm search repo -l csi-driver-nfs
Install the NFS provisioner:
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --version 4.11.0 \
--create-namespace \
--namespace csi-driver-nfs \
--set controller.runOnControlPlane=true \
--set controller.replicas=2 \
--set controller.strategyType=RollingUpdate \
--set externalSnapshotter.enabled=true \
--set externalSnapshotter.customResourceDefinitions.enabled=false
For a SNO setup:
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --version 4.11.0 \
--create-namespace \
--namespace csi-driver-nfs \
--set controller.runOnControlPlane=true \
--set controller.strategyType=RollingUpdate \
--set externalSnapshotter.enabled=true \
--set externalSnapshotter.customResourceDefinitions.enabled=false
Grant additional permissions to the ServiceAccounts:
oc adm policy add-scc-to-user privileged -z csi-nfs-node-sa -n csi-driver-nfs
oc adm policy add-scc-to-user privileged -z csi-nfs-controller-sa -n csi-driver-nfs
Create a StorageClass:
oc apply -f https://raw.githubusercontent.com/rguske/openshift-day-two/refs/heads/main/manifests/nfs-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-csi
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs.csi.k8s.io
parameters:
server: 10.10.42.20 ### NFS server's IP/FQDN
share: /volume1/nfs_ds/ocp ### NFS server's exported directory
subDir: ${pvc.metadata.namespace}-${pvc.metadata.name}-${pv.metadata.name} ### Folder/subdir name template
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
Create a SnapshotClass:
oc apply -f https://raw.githubusercontent.com/rguske/openshift-day-two/refs/heads/main/manifests/nfs-volumesnapshotclass.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
deletionPolicy: Delete
driver: nfs.csi.k8s.io
metadata:
name: csi-nfs-snapclass
Set the StorageClass to default:
oc annotate storageclass/nfs-csi storageclass.kubernetes.io/is-default-class=true
As an alternative to the CSI driver, an NFS client provisioner can be used to consume the NFS service. Create the following OpenShift template and make sure to adjust the IP address as well as the path to the NFS folder at the end of the file.
Example:
- name: NFS_SERVER
required: true
value: xxx.xxx.xxx.xxx ## IP of the host which runs the NFS server
- name: NFS_PATH
required: true
value: /srv/nfs-storage-pv-user-pvs ## folder which was configured on the NFS server
Create the template:
tee nfs-provisioner-template.yaml > /dev/null <<'EOF'
apiVersion: template.openshift.io/v1
kind: Template
labels:
template: nfs-client-provisioner
message: 'NFS storage class ${STORAGE_CLASS} created.'
metadata:
annotations:
description: nfs-client-provisioner
openshift.io/display-name: nfs-client-provisioner
openshift.io/provider-display-name: Tiger Team
tags: infra,nfs
template.openshift.io/documentation-url: nfs-client-provisioner
template.openshift.io/long-description: nfs-client-provisioner
version: 0.0.1
name: nfs-client-provisioner
objects:
- kind: Namespace
apiVersion: v1
metadata:
name: ${TARGET_NAMESPACE}
- kind: ServiceAccount
apiVersion: v1
metadata:
name: nfs-client-provisioner
namespace: ${TARGET_NAMESPACE}
- kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
- kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
namespace: ${TARGET_NAMESPACE}
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
- kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner
namespace: ${TARGET_NAMESPACE}
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["security.openshift.io"]
resourceNames: ["hostmount-anyuid"]
resources: ["securitycontextconstraints"]
verbs: ["use"]
- kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner
namespace: ${TARGET_NAMESPACE}
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
roleRef:
kind: Role
name: nfs-client-provisioner
apiGroup: rbac.authorization.k8s.io
- kind: Deployment
apiVersion: apps/v1
metadata:
name: nfs-client-provisioner
namespace: ${TARGET_NAMESPACE}
spec:
replicas: 1
selector:
matchLabels:
app: nfs-client-provisioner
strategy:
type: Recreate
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: ${PROVISIONER_IMAGE}
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: ${PROVISIONER_NAME}
- name: NFS_SERVER
value: ${NFS_SERVER}
- name: NFS_PATH
value: ${NFS_PATH}
volumes:
- name: nfs-client-root
nfs:
server: ${NFS_SERVER}
path: ${NFS_PATH}
- apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: managed-nfs-storage
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: ${PROVISIONER_NAME}
parameters:
archiveOnDelete: "false"
parameters:
- description: Target namespace where nfs-client-provisioner will run.
displayName: Target namespace
name: TARGET_NAMESPACE
required: true
value: openshift-nfs-provisioner
- name: NFS_SERVER
required: true
value: xxx.xxx.xxx.xxx ## IP of the host which runs the NFS server
- name: NFS_PATH
required: true
value: /srv/nfs-storage-pv-user-pvs ## folder which was configured on the NFS server
- name: PROVISIONER_IMAGE
value: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
- name: PROVISIONER_NAME
value: "nfs-client-provisioner"
EOF
Deploy the template: oc process -f nfs-provisioner-template.yaml | oc apply -f -
oc -n test1 create -f - <<EOF
kind: Deployment
apiVersion: apps/v1
metadata:
name: ubi9
spec:
replicas: 1
selector:
matchLabels:
app: ubi9
template:
metadata:
creationTimestamp: null
labels:
app: ubi9
spec:
volumes:
- name: pvc
persistentVolumeClaim:
claimName: pvc
containers:
- name: ubi
image: 'registry.access.redhat.com/ubi9/ubi-micro:latest'
volumeMounts:
- name: pvc
mountPath: /pvc
command:
- /bin/sh
- '-c'
- |
sleep infinity
EOF
Create the PersistentVolumeClaim referenced by the deployment, either via the OpenShift web console or via oc:
oc -n test1 create -f - <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: managed-nfs-storage
volumeMode: Filesystem
EOF
Option 1:
Installing the Local Storage Operator
Option 2:
Logical Volume Manager Storage installation
Installation via yaml:
oc apply -f https://raw.githubusercontent.com/rguske/openshift-day-two/refs/heads/main/manifests/lvm-storage-operator.yaml
Via Operator Web Console
Install the Logical Volume Cluster only including the SSD with the by-path identifier:
ls -li /dev/disk/by-path
oc apply -f https://raw.githubusercontent.com/rguske/openshift-day-two/refs/heads/main/manifests/lvmcluster.yaml
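For reference, a minimal sketch of what such an lvmcluster.yaml typically contains (names, namespace, and thin-pool values are assumptions; the device path is a placeholder taken from /dev/disk/by-path):
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
    - name: vg1
      default: true
      deviceSelector:
        paths:
        - /dev/disk/by-path/<device-id>
      thinPoolConfig:
        name: thin-pool-1
        sizePercent: 90
        overprovisionRatio: 10
The device class name vg1 is what produces the lvms-vg1 StorageClass used in the test PVC below.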
Create a test pvc:
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: lvm-block-1
namespace: default
spec:
accessModes:
- ReadWriteOnce
volumeMode: Block
resources:
requests:
storage: 10Gi
limits:
storage: 20Gi
storageClassName: lvms-vg1
EOF
Not supported in Red Hat OpenShift Virtualization!
From the official docs:
Support for redirection of client's USB device was introduced in release v0.44. This feature is not enabled by default. To enable it, add an empty clientPassthrough under devices, as such:
spec:
domain:
devices:
clientPassthrough: {}
There are two ways of redirecting the same USB devices: either using the device's vendor and product information or the actual bus and device address information. In Linux, you can gather this info with lsusb; a redacted example below:
Connect a USB device, e.g. an external CD-ROM drive. I've connected it to my MacBook, installed lsusb via brew, and checked for the Vendor ID and Product ID.
lsusb
[...]
Bus 002 Device 001: ID 0e8d:1806 MediaTek Inc. MT1806 Serial: R8RY6GAC60008Y
[...]
Connect to your VM running on OpenShift Virtualization.
virtctl console rguske-rhel9
Successfully connected to rguske-rhel9 console. The escape sequence is ^]
rguske-rhel9 login:
[cloud-user@rguske-rhel9 ~]$ lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
On your local machine, install virtctl and usbredir (which provides the usbredirect client). I've installed both using brew.
sudo virtctl usbredir 0e8d:1806 rguske-rhel9
{"component":"portforward","level":"info","msg":"port_arg: '127.0.0.1:49275'","pos":"client.go:166","timestamp":"2025-03-26T10:19:43.292294Z"}
{"component":"portforward","level":"info","msg":"args: '[--device 0e8d:1806 --to 127.0.0.1:49275]'","pos":"client.go:167","timestamp":"2025-03-26T10:19:43.293541Z"}
{"component":"portforward","level":"info","msg":"Executing commandline: 'usbredirect [--device 0e8d:1806 --to 127.0.0.1:49275]'","pos":"client.go:168","timestamp":"2025-03-26T10:19:43.293591Z"}
{"component":"portforward","level":"info","msg":"Connected to usbredirect at 610.549083ms","pos":"client.go:132","timestamp":"2025-03-26T10:19:43.903058Z"}The output will show the redirection to your Virtual Machine.
On your target VM, you'll notice:
[151999.488527] usb 1-1: new high-speed USB device number 9 using xhci_hcd
[152000.279607] usb 1-1: New USB device found, idVendor=0e8d, idProduct=1806, bcdDevice= 0.00
[152000.280126] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[152000.280490] usb 1-1: Product: MT1806
[152000.280786] usb 1-1: Manufacturer: MediaTek Inc
[152000.281075] usb 1-1: SerialNumber: R8RY6GAC60008Y
[152000.548218] usb-storage 1-1:1.0: USB Mass Storage device detected
[152000.551594] scsi host7: usb-storage 1-1:1.0
[152001.907628] scsi 7:0:0:0: CD-ROM ASUS SDRW-08D3S-U F201 PQ: 0 ANSI: 0
[152002.595801] sr 7:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[152003.026401] sr 7:0:0:0: Attached scsi generic sg0 type 5
Using lsusb will show the connected device:
[cloud-user@rguske-rhel9 ~]$ lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 009: ID 0e8d:1806 MediaTek Inc. Samsung SE-208 Slim Portable DVD Writer
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Start a debug session on a control plane node:
oc debug --as-root node/ocp-mk1.jarvis.lab
To use host binaries, run `chroot /host`. Instead, if you need to access host namespaces, run `nsenter -a -t 1`.
Pod IP: 192.168.42.2
If you don't see a command prompt, try pressing enter.
sh-5.1#
- Change your root directory to /host in the debug shell:
chroot /host
- If proxy is in use:
export HTTP_PROXY=http://<your_proxy.example.com>:8080
export HTTPS_PROXY=https://<your_proxy.example.com>:8080
export NO_PROXY=<example.com>
- Run the cluster-backup.sh script:
The cluster-backup.sh script is maintained as a component of the etcd Cluster Operator and is a wrapper around the etcdctl snapshot save command.
/usr/local/bin/cluster-backup.sh /home/core/assets/backup
Starting pod/ocp-mk1jarvislab-debug-t6x4m ...
To use host binaries, run `chroot /host`. Instead, if you need to access host namespaces, run `nsenter -a -t 1`.
Pod IP: 192.168.42.2
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# /usr/local/bin/cluster-backup.sh /home/core/assets/backup
Certificate /etc/kubernetes/static-pod-certs/configmaps/etcd-all-bundles/server-ca-bundle.crt is missing. Checking in different directory
Certificate /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-all-bundles/server-ca-bundle.crt found!
found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-14
found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-5
found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-5
found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-2
56518b777f31c161916f516b21725a562461218761fbf03224014afd83c3e589
etcdctl version: 3.5.21
API version: 3.5
{"level":"info","ts":"2025-09-15T08:35:18.813919Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/home/core/assets/backup/snapshot_2025-09-15_083517.db.part"}
{"level":"info","ts":"2025-09-15T08:35:18.823486Z","logger":"client","caller":"v3@v3.5.21/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2025-09-15T08:35:18.823575Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://192.168.42.2:2379"}
{"level":"info","ts":"2025-09-15T08:35:21.367146Z","logger":"client","caller":"v3@v3.5.21/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2025-09-15T08:35:22.483373Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://192.168.42.2:2379","size":"289 MB","took":"3 seconds ago"}
{"level":"info","ts":"2025-09-15T08:35:22.484415Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/home/core/assets/backup/snapshot_2025-09-15_083517.db"}
Snapshot saved at /home/core/assets/backup/snapshot_2025-09-15_083517.db
{"hash":2597648169,"revision":49148119,"totalKey":15553,"totalSize":288808960}
snapshot db and kube resources are successfully saved to /home/core/assets/backup
- Two files are saved into /home/core/assets/backup:
ls /home/core/assets/backup
snapshot_2025-09-15_083517.db static_kuberesources_2025-09-15_083517.tar.gz
- snapshot_.db: This file is the etcd snapshot. The cluster-backup.sh script confirms its validity.
- static_kuberesources_.tar.gz: This file contains the resources for the static pods. If etcd encryption is enabled, it also contains the encryption keys for the etcd snapshot.
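Both examples below reference a PersistentVolumeClaim named etcd-backup-pvc in the openshift-etcd namespace. A minimal sketch, with the size being an assumption:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd-backup-pvc
  namespace: openshift-etcd
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  volumeMode: Filesystem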
- Example Executed Once:
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
name: cluster
spec:
featureSet: TechPreviewNoUpgrade
apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
name: etcd-single-backup
namespace: openshift-etcd
spec:
pvcName: etcd-backup-pvc
- Example Scheduled Executions:
apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
name: etcd-recurring-backup
spec:
etcd:
schedule: "20 4 * * *"
timeZone: "UTC"
pvcName: etcd-backup-pvc
You can benefit from descheduling running pods in situations such as the following:
- Nodes are underutilized or overutilized.
- Pod and node affinity requirements, such as taints or labels, have changed and the original scheduling decisions are no longer appropriate for certain nodes.
- Node failure requires pods to be moved.
- New nodes are added to clusters.
- Pods have been restarted too many times.
The KubeDescheduler Operator can be installed via OperatorHub or via the appropriate manifest files:
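Note that the target namespace must exist before the OperatorGroup and Subscription are applied; a minimal sketch:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-kube-descheduler-operator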
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: openshift-kube-descheduler-operator
namespace: openshift-kube-descheduler-operator
spec:
targetNamespaces:
- openshift-kube-descheduler-operator
upgradeStrategy: Default
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
labels:
operators.coreos.com/cluster-kube-descheduler-operator.openshift-kube-descheduler-op: ""
name: cluster-kube-descheduler-operator
namespace: openshift-kube-descheduler-operator
spec:
channel: stable
installPlanApproval: Automatic
name: cluster-kube-descheduler-operator
source: redhat-operators
sourceNamespace: openshift-marketplace
The following configuration evicts long-running pods and balances resource usage between nodes.
See further profile specific info here: LifecycleAndUtilization
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
name: cluster
namespace: openshift-kube-descheduler-operator
spec:
logLevel: Normal
mode: Automatic
operatorLogLevel: Normal
deschedulingIntervalSeconds: 3600
profileCustomizations:
devActualUtilizationProfile: PrometheusCPUCombined
devDeviationThresholds: AsymmetricLow
devEnableSoftTainter: true
profiles:
- LifecycleAndUtilization
- EvictPodsWithPVC
- EvictPodsWithLocalStorage
managementState: Managed
Working example:
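The pod below attaches to a secondary network named localnet-50, which is assumed to already exist as a NetworkAttachmentDefinition (plus a matching OVS bridge mapping on the nodes). A minimal OVN-Kubernetes localnet sketch; exact fields depend on your cluster version and bridge mappings:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: localnet-50
  namespace: default
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "localnet-50",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "netAttachDefName": "default/localnet-50"
    }
The pod manifest itself: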
apiVersion: v1
kind: Pod
metadata:
name: rhel-support-tools-localnet-50
namespace: default
annotations:
k8s.v1.cni.cncf.io/networks: |
[{
"name": "localnet-50",
"interface": "net1",
"ips": [ "192.168.xxx.xxx/24" ],
"gateway": [ "192.168.xxx.1" ],
"default-route": ["192.168.xxx.1"]
}]
spec:
containers:
- name: rhel-support-tools
image: registry.redhat.io/rhel9/support-tools:9.7
command: ["/bin/bash","-c","sleep infinity"]Node --> Pod (EgressIP) --curl--> external Webserver
- install Podman on your jumphost
sudo dnf install -y podman
- start a simple nginx pod:
podman run -ti --rm -p 8080:8080 quay.io/openshift-examples/simple-http-server:latest
- configure the RHEL firewall:
sudo firewall-cmd --permanent --zone=public --add-port=8080/tcp
sudo firewall-cmd --reload
- Egress for worker nodes:
oc get nodes -l node-role.kubernetes.io/worker
ocp-mk42-cp1.jarvislab.guske.io
ocp-mk42-cp2.jarvislab.guske.io
- Label the worker nodes:
for node in $(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[*].metadata.name}'); do echo ${node} ; oc label node/${node} k8s.ovn.org/egress-assignable="" ; done
- Create Egress object:
oc apply -f - <<EOF
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
name: egress-poc
spec:
egressIPs:
- 192...
namespaceSelector:
matchLabels:
egress: poc
EOF
- roll out a test deployment:
oc new-project poc-egress
oc apply -k git@github.com:openshift-examples/kustomize/components/simple-http-server
oc rsh deployment/simple-http-server
curl -i http://192.168...:8080
- label the namespace:
oc label namespace/poc-egress egress=poc
apiVersion: pool.kubevirt.io/v1alpha1
kind: VirtualMachinePool
metadata:
name: vm-pool-cirros
namespace: eventing
spec:
replicas: 0
selector:
matchLabels:
kubevirt.io/vmpool: vm-pool-cirros
virtualMachineTemplate:
metadata:
labels:
kubevirt.io/vmpool: vm-pool-cirros
spec:
template:
metadata:
labels:
kubevirt.io/vmpool: vm-pool-cirros
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
resources:
requests:
memory: 128Mi
volumes:
- containerDisk:
image: 'docker.io/kubevirt/cirros-container-disk-demo:latest'
name: containerdisk
- create PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs-pv
spec:
storageClassName: "storageClass"
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
nfs:
path: "/fs/ess/group/openshift_test"
server: "xxx.xxx.xxx.xxx"
readOnly: false
- create pvc
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: nfs-pvc
spec:
storageClassName: "storageClass"
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
volumeName: nfs-pv
- create deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-mounter
labels:
app: nfs-mounter
spec:
selector:
matchLabels:
app: nfs-mounter
template:
metadata:
annotations:
k8s.v1.cni.cncf.io/networks: |
[{
"name": "localnet-550",
"namespace": "default",
"interface": "net1",
"ips": ["10.xxx.xxx.xxx/23"],
"gateway": ["10.xxx.xxx.1"],
"default-route": ["10.xxx.xxx.1"],
"dns": {"nameservers": ["xxx.xxx.xxx.xxx"]}
}]
labels:
app: nfs-mounter
spec:
volumes:
- name: nfs-vol
persistentVolumeClaim:
claimName: nfs-pvc
containers:
- name: app
image: registry.redhat.io/rhel9/support-tools:9.7
command: ["/bin/sh", "-c", "sleep infinity"]
volumeMounts:
- mountPath: /mnt/vol1
name: nfs-vol



