OpenShift metrics with dynamic storage provisioning

OpenShift metrics supports persistent storage (the USE_PERSISTENT_STORAGE=true option) and dynamic storage provisioning (DYNAMICALLY_PROVISION_STORAGE=true) using storage classes. To learn more about Kubernetes storage classes, read the Kubernetes documentation.

In order to use these parameters, it is necessary to

  • have the OCP platform configured for a particular storage backend; check the OCP documentation for details on how to configure OCP to work with that backend. OCP supports many storage backends, so pick the one that is closest to your use case.
  • configure a storage class which will provide the persistent storage used by OpenShift metrics
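A minimal sketch of the second prerequisite: the function below only prints a StorageClass manifest, which could then be piped to oc create -f -. The provisioner default and the storage.k8s.io/v1 API version are assumptions (older clusters used storage.k8s.io/v1beta1), and real provisioners usually need backend-specific parameters that are omitted here.

```shell
#!/usr/bin/env bash
# Print a StorageClass manifest to stdout; nothing is sent to the cluster.
# $1: storage class name (default "dynamic")
# $2: provisioner plugin (assumption: kubernetes.io/glusterfs)
render_storageclass() {
  local name="${1:-dynamic}"
  local provisioner="${2:-kubernetes.io/glusterfs}"
  cat <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ${name}
provisioner: ${provisioner}
EOF
}

# Example (not executed here):
#   render_storageclass dynamic kubernetes.io/aws-ebs | oc create -f -
```

Printing the manifest first makes it easy to review before anything is created on the cluster.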

In the metrics templates there is also a dynamic PV template, in which the persistent volume claim is defined as

- kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    name: "${PV_PREFIX}-${NODE}"
    labels:
      metrics-infra: hawkular-cassandra
    annotations: "dynamic"
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: "${PV_SIZE}"

Until the storage class name is exposed as a parameter upstream, it is necessary either to name the storage class dynamic or, if a storage class with a different name is required, to rebuild the metrics images. Rebuilding the metrics images is quite easy and can be done by following the steps below (I assume that the pre-built part of the images is already in place)

# git clone
# do necessary changes in template related to storageclass 

# cd origin-metrics/hack 
# ./ --prefix=new_image --version=1.0
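The template change in the second step could be scripted roughly as below. This is a sketch under assumptions: it presumes the storage class is set on a line whose key ends in storage-class (as the Kubernetes alpha/beta storage-class annotations did) with a double-quoted value, and the template path passed in is whatever file you actually edit.

```shell
#!/usr/bin/env bash
# Rewrite the quoted value on every line that carries a storage-class key.
# $1: path to the template file (hypothetical; pass the real template path)
# $2: new storage class name (must not contain |, & or backslashes)
set_storageclass_name() {
  local file="$1" new_name="$2" tmp
  tmp="$(mktemp)" || return 1
  # Matches e.g.   volume.beta.kubernetes.io/storage-class: "dynamic"
  if sed -E "s|(storage-class:[[:space:]]*)\"[^\"]*\"|\1\"${new_name}\"|" "$file" > "$tmp"; then
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Example (not executed here):
#   set_storageclass_name path/to/template.yaml my-storage-class
```

Lines that do not contain a storage-class key are left untouched, so the function can be re-run safely.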

Once this finishes, the name of the desired storage class will be part of the deployer pod, and configuring metrics with dynamic storage provisioning will work as expected.
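For illustration, enabling both storage options at deploy time might look like the sketch below. It only prints the command. The template file name and the hostname are hypothetical placeholders, and the repeated -p KEY=VALUE form is an assumption about the oc version in use (older clients expected a comma-separated list).

```shell
#!/usr/bin/env bash
# Print (do not run) an oc command that enables persistent, dynamically
# provisioned storage for the metrics deployer.
# $1: deployer template file (hypothetical name)
# $2: Hawkular Metrics hostname (hypothetical value)
metrics_deploy_cmd() {
  local template="${1:-metrics-deployer.yaml}"
  local hostname="${2:-hawkular-metrics.example.com}"
  local cmd=(oc new-app -f "$template"
    -p "HAWKULAR_METRICS_HOSTNAME=${hostname}"
    -p USE_PERSISTENT_STORAGE=true
    -p DYNAMICALLY_PROVISION_STORAGE=true)
  printf '%s\n' "${cmd[*]}"
}

# Example (not executed here):
#   metrics_deploy_cmd metrics-deployer.yaml metrics.apps.example.com
```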

#dynamic-storage-provisioning, #kubernetes-dynamic-storage, #ocp, #openshift-container-platform, #openshift-metrics

Remove CNS configuration from OCP cluster

WARNING: The steps below are destructive; follow them at your own risk.

CNS – Container Native Storage
OCP – OpenShift Container Platform

A CNS cluster can be part of an OCP cluster, which means running the CNS cluster inside the OCP cluster and having everything managed by OCP. Read more about OCP here and about CNS here.

This post is not about how to set up CNS / OCP; if you want to learn how to set up CNS, follow the documentation links. It is about how to remove the CNS configuration from an OCP cluster. Why would anybody want to do this? I see the two reasons below as the most obvious

  • stop using CNS as a storage backend and free up resources for other projects
  • test various configurations and setups before settling on the final one; during configuration testing it is necessary to clean up the configuration and start over.

Deleting the CNS pods and storage configuration will result in data loss, but as long as you know what you are doing and why, it is fine to experiment with this.
So, how do you delete / recreate the CNS configuration in an OCP cluster? The steps are below!

CNS itself provides the cns-deploy tool, which has an --abort option

# cns-deploy --abort 

If it is run in the namespace where the CNS pods were created, it reports the output below

# cns-deploy --abort
Multiple CLI options detected. Please select a deployment option.
[O]penShift, [K]ubernetes? [O/o/K/k]: O
Using OpenShift CLI.
zelko     Active    12h
Using namespace "zelko".
Do you wish to abort the deployment?
[Y]es, [N]o? [Default: N]: Y
No resources found
deploymentconfig "heketi" deleted
service "heketi" deleted
route "heketi" deleted
service "heketi-storage-endpoints" deleted
serviceaccount "heketi-service-account" deleted
template "deploy-heketi" deleted
template "heketi" deleted

From this it is visible that the deploymentconfig, services, route, serviceaccount, and templates were deleted, but not the labels on the nodes; I opened a BZ for that.

This is the first step; however, the CNS pods are still present. This is by CNS design: CNS pods should not be deleted so easily, as deleting them destroys data. In this specific case, and for the reasons listed above, we do want to delete them.

# oc get pods -n zelko 
NAME              READY     STATUS    RESTARTS   AGE
glusterfs-72bps   1/1       Running   0          12h
glusterfs-fg3k5   1/1       Running   0          12h
glusterfs-gb9h4   1/1       Running   0          12h
glusterfs-hn0gk   1/1       Running   0          12h
glusterfs-jsrn8   1/1       Running   0          12h

Deleting the CNS project will delete the CNS pods

# oc delete project zelko

Running cns-deploy --abort followed by oc delete project zelko can be reduced to deleting only the project, which gives the same result.
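A small sketch of that single-step teardown, with a dry-run guard because the operation is destructive. The DRY_RUN variable and the helper names are illustrative, not part of oc or cns-deploy; the polling loop assumes that oc get project starts failing once the project is fully removed.

```shell
#!/usr/bin/env bash
# Delete the CNS project and wait until it is gone.
# DRY_RUN defaults to 1, so commands are only printed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'would run: %s\n' "$*"
  else
    "$@"
  fi
}

remove_cns_project() {
  local project="$1"
  run oc delete project "$project"
  if [ "$DRY_RUN" = "1" ]; then
    return 0
  fi
  # Project deletion is asynchronous; poll until the lookup fails.
  while oc get project "$project" >/dev/null 2>&1; do
    sleep 5
  done
}

# Example (not executed here):
#   DRY_RUN=0 remove_cns_project zelko
```

Waiting for the project to disappear matters when cns-deploy is going to be re-run in a project of the same name.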

This will not clean up everything on the CNS nodes. The PVs/VGs/LVs created by the CNS configuration will still be there, and they are visible after logging in to an OCP node which earlier hosted CNS pods

# vgs
  VG                                  #PV #LV #SN Attr   VSize   VFree
  vg_18c5f1249a65b678dd5a904fd70a9cd8   1   0   0 wz--n- 199.87g 199.87g
  vg_7137ba2d9b189997110f53a6e7b1a5e4   1   2   0 wz--n- 199.87g 197.85g
# pvs
  PV         VG                                  Fmt  Attr PSize   PFree
  /dev/xvdc  vg_7137ba2d9b189997110f53a6e7b1a5e4 lvm2 a--  199.87g 197.85g
  /dev/xvdd  vg_18c5f1249a65b678dd5a904fd70a9cd8 lvm2 a--  199.87g 199.87g
# lvs
  LV                                     VG                                  Attr       LSize  Pool                                Origin Data%  Meta%
  brick_29ec6b1183398e9513676e586db1adb5 vg_7137ba2d9b189997110f53a6e7b1a5e4 Vwi-a-tz--  2.00g tp_29ec6b1183398e9513676e586db1adb5        0.66
  tp_29ec6b1183398e9513676e586db1adb5    vg_7137ba2d9b189997110f53a6e7b1a5e4 twi-aotz--  2.00g                                            0.66   0.33

Recreating CNS will not work as long as these are present from the previous configuration. For removing the logical volumes, the fastest approach I found was vgchange -an volumegroup; vgremove volumegroup --force
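With several volume groups per node, the vgchange / vgremove pair can be generated rather than typed. The sketch below only prints commands and never runs them. It assumes CNS-created volume groups are named vg_ followed by 32 hex characters, as in the output above; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Read volume group names on stdin and print removal commands for the ones
# that look like CNS-created groups (vg_ followed by 32 hex characters).
# Nothing is executed; review the output before running any of it.
cns_vg_removal_cmds() {
  local vg
  while read -r vg; do
    if [[ "$vg" =~ ^vg_[0-9a-f]{32}$ ]]; then
      printf 'vgchange -an %s\n' "$vg"
      printf 'vgremove --force %s\n' "$vg"
    fi
  done
}

# Example (not executed here):
#   vgs --noheadings -o vg_name | cns_vg_removal_cmds
```

Filtering on the name pattern keeps system volume groups, such as the one holding the root filesystem, out of the generated list.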

It is also necessary to clean up the contents of /var/lib/glusterd/, /etc/glusterfs/ and /var/lib/heketi/. Below is the list of files which need to be removed before running cns-deploy again

# ls -l /var/lib/glusterd/
total 8
drwxr-xr-x. 3 root root 17 Feb 27 14:14 bitd
drwxr-xr-x. 2 root root 34 Feb 27 14:12 geo-replication
-rw-------. 1 root root 66 Feb 27 14:14
drwxr-xr-x. 3 root root 19 Feb 27 14:12 glusterfind
drwxr-xr-x. 3 root root 46 Feb 27 14:14 glustershd
drwxr-xr-x. 2 root root 40 Feb 27 14:12 groups
drwxr-xr-x. 3 root root 15 Feb 27 14:12 hooks
drwxr-xr-x. 3 root root 39 Feb 27 14:14 nfs
-rw-------. 1 root root 47 Feb 27 14:14 options
drwxr-xr-x. 2 root root 94 Feb 27 14:14 peers
drwxr-xr-x. 3 root root 17 Feb 27 14:14 quotad
drwxr-xr-x. 3 root root 17 Feb 27 14:14 scrub
drwxr-xr-x. 2 root root 31 Feb 27 14:14 snaps
drwxr-xr-x. 2 root root  6 Feb 27 14:12 ss_brick
drwxr-xr-x. 3 root root 29 Feb 27 14:14 vols

# ls -l /etc/glusterfs/
total 32
-rw-r--r--. 1 root root  400 Feb 27 14:12 glusterd.vol
-rw-r--r--. 1 root root 1001 Feb 27 14:12 glusterfs-georep-logrotate
-rw-r--r--. 1 root root  626 Feb 27 14:12 glusterfs-logrotate
-rw-r--r--. 1 root root 1822 Feb 27 14:12 gluster-rsyslog-5.8.conf
-rw-r--r--. 1 root root 2564 Feb 27 14:12 gluster-rsyslog-7.2.conf
-rw-r--r--. 1 root root  197 Feb 27 14:12 group-metadata-cache
-rw-r--r--. 1 root root  276 Feb 27 14:12 group-virt.example
-rw-r--r--. 1 root root  338 Feb 27 14:12 logger.conf.example

# ls -l /var/lib/heketi/
total 4
-rw-r--r--. 1 root root 219 Feb 27 14:14 fstab
drwxr-xr-x. 3 root root  49 Feb 27 14:14 mounts

Do this on every machine which was previously part of the CNS cluster, and when deleting VGs/PVs/LVs, pay attention to point at the correct names/devices!
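The directory cleanup can be sketched as a function that empties the three directories but keeps the directories themselves. The optional root prefix is an addition for illustration, so the function can be rehearsed against a scratch copy before it is pointed at a real node. It assumes nothing is still mounted below /var/lib/heketi/mounts.

```shell
#!/usr/bin/env bash
# Empty the gluster/heketi state directories under an optional root prefix.
# $1: root prefix (default "/"); a scratch directory makes a safe rehearsal.
clean_cns_state() {
  local root="${1:-/}" dir target
  for dir in var/lib/glusterd etc/glusterfs var/lib/heketi; do
    target="${root%/}/${dir}"
    [ -d "$target" ] || continue
    # Remove everything inside the directory but keep the directory itself.
    find "$target" -mindepth 1 -delete
  done
}

# Example (not executed here), run as root on each former CNS node:
#   clean_cns_state /
```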

Remove the storagenode=glusterfs label from the nodes which were previously used in the CNS deployment; check the BZ for why this is necessary. To remove node labels, either use the approach described here or run oc edit node and delete the storagenode=glusterfs label.
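As an alternative to editing each node by hand, oc label drops a label when the key is given with a trailing dash. The sketch below prints one such command per node name read from stdin; the function name is illustrative, and it assumes the label key is storagenode as shown above.

```shell
#!/usr/bin/env bash
# Read node names on stdin and print one label-removal command per node.
# A trailing dash after the key ("storagenode-") tells oc to drop the label.
unlabel_cmds() {
  local node
  while read -r node; do
    [ -n "$node" ] || continue
    printf 'oc label %s storagenode-\n' "$node"
  done
}

# Example (not executed here):
#   oc get nodes -l storagenode=glusterfs -o name | unlabel_cmds
```

Review the printed commands first, then pipe them to sh once the node list looks right.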

After all this is done, running cns-deploy should work as expected.

#cns, #container-native-storage, #gluster, #ipv6, #lvm, #openshift-container-platform

OCP metrics error message: Error from server: No API token found for service account "metrics-deployer"

I wanted to recreate OCP (OpenShift Container Platform) metrics and followed the same upstream process as many times before, but it kept failing with

Error from server: No API token found for service account "metrics-deployer", retry after the token is automatically created and added to the service account 

Huh, new trouble. Luckily, restarting the master services helped in this case

# systemctl restart atomic-openshift-master-controllers; systemctl restart atomic-openshift-master-api

This was a multi-master configuration, so it was necessary to restart the master services on all masters.
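For a multi-master setup, the restart can be generated per host. This sketch prints one ssh command per master and runs nothing itself. The hostnames, the root login, and ssh access from the current machine are all assumptions.

```shell
#!/usr/bin/env bash
# Print the restart command for each master given as an argument.
# Hostnames are hypothetical; nothing is executed by this function.
restart_masters_cmds() {
  local host
  for host in "$@"; do
    printf 'ssh root@%s "systemctl restart atomic-openshift-master-controllers; systemctl restart atomic-openshift-master-api"\n' "$host"
  done
}

# Example (not executed here):
#   restart_masters_cmds master1.example.com master2.example.com
```

Restarting one master at a time keeps the API available on the remaining masters while each one comes back.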

Just writing it down in the hope that Google will pick up the tags and help someone with the same issue.

Happy hacking!

#atomic-openshift-master-api, #atomic-openshift-master-controlers, #kubernetes, #metrics, #ocp, #openshift-container-platform, #openshift-metrics