Setup CNS on OpenShift – using openshift-ansible

Let’s say you have a fully functional OpenShift Container Platform (OCP) cluster and you want to extend it with CNS (Container Native Storage) functionality. This can be done manually, using cns-deploy, or using openshift-ansible.

To get the installation working with the openshift-ansible set of playbooks after the OpenShift cluster installation, do the following:

  • clone https://github.com/openshift/openshift-ansible
# git clone https://github.com/openshift/openshift-ansible
  • edit the original OpenShift install inventory file: add glusterfs to the [OSEv3:children] group and add a [glusterfs] section listing the nodes and the devices CNS should use

[OSEv3:children]
...
glusterfs

# ...rest of the inventory...

[glusterfs]
node1 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'
node2 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'
node3 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'

Ensure that there is nothing valuable on the /dev/sdX and /dev/sdY devices and that no partitions are present on them.
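
A quick way to check and, if needed, wipe leftover signatures (a hedged sketch; wipefs is destructive, so double-check the device names first):

# lsblk /dev/sdX /dev/sdY         # confirm there are no partitions or filesystems
# wipefs --all /dev/sdX /dev/sdY  # wipe any leftover filesystem/LVM signatures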

  • check openshift_storage_glusterfs to get an idea
    which parameters need to be changed from their default values. Mostly it will be the image version and the namespace where the CNS pods will run; in my case it was:
# cat cns.yaml
openshift_storage_glusterfs_name: cnstestelko
openshift_storage_glusterfs_version: 3.3.0-362
openshift_storage_glusterfs_heketi_version: 3.3.0-362
  • execute the playbook
# ansible-playbook -i original_cluster_install_inventory.yaml openshift-ansible/playbooks/byo/openshift-glusterfs/config.yml -e @cns.yaml

After the ansible run finishes, a new project named glusterfs will be created, along with a new storage class which can then be used when PVCs are created.
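
To check that the CNS pods came up, list the pods in the new project (pod names and count will of course differ per cluster):

# oc get pods -n glusterfs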

# kubectl get storageclass
NAME                    TYPE
glusterfs-cnstestelko   kubernetes.io/glusterfs 
 
# kubectl get -o yaml storageclass glusterfs-cnstestelko
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2017-11-22T15:48:04Z
  name: glusterfs-cnstestelko
  resourceVersion: "3196130"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/glusterfs-cnstestelko
  uid: 871fd59e-cf9c-11e7-bb0a-d4bed9b390df
parameters:
  resturl: http://heketi-cnstestelko-glusterfs.router.default.svc.cluster.local
  restuser: admin
  secretName: heketi-cnstestelko-admin-secret
  secretNamespace: glusterfs
provisioner: kubernetes.io/glusterfs
# cat pvc.json 
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "pvctest",
    "annotations": {
      "volume.beta.kubernetes.io/storage-class": "glusterfs-cnstestelko"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}
# kubectl apply -f pvc.json

# kubectl get pvc 
NAME      STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS            AGE
pvctest   Bound     pvc-ccb0d8a6-cfa6-11e7-bb0a-d4bed9b390df   1Gi        RWO           glusterfs-cnstestelko   27s
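
Once the claim is bound, any pod can mount it. Below is a minimal sketch; the pod name, the busybox image, and the mount path are arbitrary choices for illustration, not something produced by the CNS setup:

# cat pod.json
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "pvctest-pod"
  },
  "spec": {
    "containers": [
      {
        "name": "app",
        "image": "busybox",
        "command": ["sleep", "3600"],
        "volumeMounts": [
          { "name": "data", "mountPath": "/data" }
        ]
      }
    ],
    "volumes": [
      { "name": "data", "persistentVolumeClaim": { "claimName": "pvctest" } }
    ]
  }
}
# kubectl apply -f pod.json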

For more details about OpenShift/CNS check the openshift-doc. Also, check openshift_storage_glusterfs.


#cns, #container-native-storage, #linux, #openshift, #openshift-container-platform

setup cassandra cluster on centos7

Set up a test Cassandra cluster. On a minimal CentOS 7 install, add the Cassandra repository in /etc/yum.repos.d/

# cat cassandra.repo

[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS

Install cassandra package

# yum install -y cassandra 

Edit /etc/cassandra/default.conf/cassandra.yaml and set the parameters below

seeds: "ip1,ip2,ip3" 
listen_address 
rpc_address 

Adapt the above to your specific cluster environment; listen_address and rpc_address have to be set to the address of the Cassandra node itself, as sketched below.
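
For illustration, on the first node the relevant lines could look like this (the 10.0.0.x addresses are made-up placeholders, use your own node IPs):

# grep -E 'seeds:|^listen_address:|^rpc_address:' /etc/cassandra/default.conf/cassandra.yaml
          - seeds: "10.0.0.11,10.0.0.12,10.0.0.13"
listen_address: 10.0.0.11
rpc_address: 10.0.0.11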

Open the ports 7000/tcp and 9042/tcp

firewall-cmd --zone=public --permanent --add-port=7000/tcp
firewall-cmd --zone=public --permanent --add-port=9042/tcp
systemctl restart firewalld

Start Cassandra on all three boxes

# service cassandra start 
# chkconfig cassandra on 

After this, nodetool status should list the Cassandra nodes

# nodetool status 
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load          Tokens            Owns (effective)  Host ID             Rack
UN  ip1  192.33 KiB  256          66.7%             ff1970ad-28d0-41fe-b749-30bfdb0912b3  rack1
UN  ip2  198.04 KiB  256          65.9%             9f72ab9a-7ea8-4b65-b0af-61006697c0fa  rack1
UN  ip3  179.76 KiB  256          67.4%             ef3ca4be-9c1c-4be6-bbca-c54239ad104c  rack1
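
As a final sanity check (ip1 is again a placeholder for a node address), cqlsh should be able to connect to the CQL port opened above:

# cqlsh ip1 9042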

#cassandra, #centos7

glustersummit2017

Gluster Summit 2017 in Prague, Czechia, finished today, and I can say it was one of the best conferences I have ever attended. This is my (probably) subjective feeling, but everything was top level, from the organization to the ideas presented during the talks.

I had the opportunity, together with my colleague Shekhar Berry, to present our work on the topic Scalability and Performance with Brick Multiplexing; the whole list of talks presented can be found in the gluster summit schedule.

The slides of our presentation can be found at this link.

The group photo is at this link.

#cns, #gluster, #glustersummit2017, #kubernetes, #openshift, #redhat

openshift metrics with dynamic storage provision

OpenShift metrics support persistent storage (the USE_PERSISTENT_STORAGE=true option) and dynamic storage provisioning (DYNAMICALLY_PROVISION_STORAGE=true) using storage classes. To learn more about Kubernetes storage classes, read the kubernetes documentation.

In order to use these parameters, it is necessary to

  • have the OCP platform configured for a particular storage backend; check the OCP documentation for details on how to configure OCP to work with your backend of choice. OCP supports many storage backends, so pick the one which is closest to your use case.
  • configure a storage class which will provide the persistent storage used by OpenShift metrics, as sketched after this list
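
For illustration, a storage class named dynamic (the name the metrics template expects, as shown below) could look like the following; the glusterfs provisioner and the heketi/secret values are simply reused from the CNS example in an earlier post and are an assumption, not a requirement, since any provisioner supported by your OCP setup works:

# cat storageclass-dynamic.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dynamic
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: http://heketi-cnstestelko-glusterfs.router.default.svc.cluster.local
  restuser: admin
  secretName: heketi-cnstestelko-admin-secret
  secretNamespace: glusterfs
# kubectl apply -f storageclass-dynamic.yaml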

In the metrics templates we can see that there is also a dynamic PV template, from which we can see that the Persistent Volume Claim is defined as

- kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    name: "${PV_PREFIX}-${NODE}"
    labels:
      metrics-infra: hawkular-cassandra
    annotations:
      volume.beta.kubernetes.io/storage-class: "dynamic"
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: "${PV_SIZE}"

Until the storage class name is exposed as a parameter upstream, it is either necessary to name the storage class dynamic, or, if a storage class with a different name has to be used, to rebuild the metrics images. Rebuilding the metrics images is quite easy and can be done following the steps below (I assume the images already have the volume.beta.kubernetes.io/storage-class part built in):

# git clone https://github.com/openshift/origin-metrics
# edit the template and change the storage class name as needed

# cd origin-metrics/hack 
# ./build-images.sh --prefix=new_image --version=1.0

Once this finishes, the name of the desired storage class will be part of the deployer pod, and configuring metrics with dynamic storage provisioning will work as expected.

#dynamic-storage-provisioning, #kubernetes-dynamic-storage, #ocp, #openshift-container-platform, #openshift-metrics

Remove CNS configuration from OCP cluster

WARNING: Below steps are destructive, follow them on your own responsibility

CNS – Container Native Storage
OCP – Openshift Container Platform

A CNS cluster can be part of an OCP cluster, which means running the CNS cluster inside the OCP cluster and having everything managed by OCP. Read more about OCP here and about CNS here.

This post is not going to be about how to set up CNS/OCP; if you want to learn how to set up CNS, follow the documentation links. It is about how to remove the CNS configuration from an OCP cluster. Why would anybody want to do this? I see the two below as the most obvious reasons:

  • stop using CNS as a storage backend and free up resources for other projects
  • test various configurations and setups before settling on a final one; during such testing it is necessary to clean up the configuration and start over

Deleting the CNS pods and storage configuration will result in data loss, but assuming you know what you are doing and why, it is safe to play with this.
So, how to delete/recreate the CNS configuration from an OCP cluster? The steps are below!

CNS itself provides cns-deploy

# cns-deploy --abort 

If run in the namespace where the CNS pods were created, it will report output like the below:

# cns-deploy --abort
Multiple CLI options detected. Please select a deployment option.
[O]penShift, [K]ubernetes? [O/o/K/k]: O
Using OpenShift CLI.
NAME      STATUS    AGE
zelko     Active    12h
Using namespace "zelko".
Do you wish to abort the deployment?
[Y]es, [N]o? [Default: N]: Y
No resources found
deploymentconfig "heketi" deleted
service "heketi" deleted
route "heketi" deleted
service "heketi-storage-endpoints" deleted
serviceaccount "heketi-service-account" deleted
template "deploy-heketi" deleted
template "heketi" deleted

From this it is visible that the deploymentconfig, services, serviceaccounts, and templates were deleted, but not the labels on the nodes; I opened a BZ for that.

This is the first step; however, the CNS pods are still present. This is by CNS design: CNS pods should not be deleted so easily, as deleting them will destroy data. But in this specific case, and for the reasons listed above, we want to delete them.

# oc get pods -n zelko 
NAME              READY     STATUS    RESTARTS   AGE
glusterfs-72bps   1/1       Running   0          12h
glusterfs-fg3k5   1/1       Running   0          12h
glusterfs-gb9h4   1/1       Running   0          12h
glusterfs-hn0gk   1/1       Running   0          12h
glusterfs-jsrn8   1/1       Running   0          12h

Deleting the CNS project will delete the CNS pods:

# oc delete project zelko

Running cns-deploy --abort and then oc delete project zelko can be reduced to deleting only the project, which gives the same result.

This will not clean up everything on the CNS nodes; there will still be PVs/VGs/LVs created by the CNS configuration, and when logged into an OCP node which earlier hosted CNS pods, they will be visible:

# vgs
  VG                                  #PV #LV #SN Attr   VSize   VFree  
  vg_18c5f1249a65b678dd5a904fd70a9cd8   1   0   0 wz--n- 199.87g 199.87g
  vg_7137ba2d9b189997110f53a6e7b1a5e4   1   2   0 wz--n- 199.87g 197.85g
# pvs
  PV         VG                                  Fmt  Attr PSize   PFree  
  /dev/xvdc  vg_7137ba2d9b189997110f53a6e7b1a5e4 lvm2 a--  199.87g 197.85g
  /dev/xvdd  vg_18c5f1249a65b678dd5a904fd70a9cd8 lvm2 a--  199.87g 199.87g
# lvs
  LV                                     VG                                  Attr       LSize Pool                                 Origin Data%  Meta%
  brick_29ec6b1183398e9513676e586db1adb5 vg_7137ba2d9b189997110f53a6e7b1a5e4 Vwi-a-tz-- 2.00g tp_29ec6b1183398e9513676e586db1adb5        0.66
  tp_29ec6b1183398e9513676e586db1adb5    vg_7137ba2d9b189997110f53a6e7b1a5e4 twi-aotz-- 2.00g                                             0.66   0.33

Recreating CNS will not work as long as these leftovers from the previous configuration are present. For removing the volume groups and logical volumes, the fastest approach I found was vgchange -an volumegroup; vgremove volumegroup --force.
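
A possible way to script this, assuming every CNS-created volume group follows the vg_<hash> naming seen above (verify the volume group list and the device names before running, this is destructive):

# for vg in $(vgs --noheadings -o vg_name | grep 'vg_'); do vgchange -an "$vg"; vgremove --force "$vg"; done
# pvremove /dev/xvdc /dev/xvdd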

It is also necessary to clean up the contents of /var/lib/glusterd/, /etc/glusterfs/ and /var/lib/heketi/; below is the list of files which need to be removed before running cns-deploy again.

# ls -l /var/lib/glusterd/
total 8
drwxr-xr-x. 3 root root 17 Feb 27 14:14 bitd
drwxr-xr-x. 2 root root 34 Feb 27 14:12 geo-replication
-rw-------. 1 root root 66 Feb 27 14:14 glusterd.info
drwxr-xr-x. 3 root root 19 Feb 27 14:12 glusterfind
drwxr-xr-x. 3 root root 46 Feb 27 14:14 glustershd
drwxr-xr-x. 2 root root 40 Feb 27 14:12 groups
drwxr-xr-x. 3 root root 15 Feb 27 14:12 hooks
drwxr-xr-x. 3 root root 39 Feb 27 14:14 nfs
-rw-------. 1 root root 47 Feb 27 14:14 options
drwxr-xr-x. 2 root root 94 Feb 27 14:14 peers
drwxr-xr-x. 3 root root 17 Feb 27 14:14 quotad
drwxr-xr-x. 3 root root 17 Feb 27 14:14 scrub
drwxr-xr-x. 2 root root 31 Feb 27 14:14 snaps
drwxr-xr-x. 2 root root  6 Feb 27 14:12 ss_brick
drwxr-xr-x. 3 root root 29 Feb 27 14:14 vols


# ls -l /etc/glusterfs/
total 32
-rw-r--r--. 1 root root  400 Feb 27 14:12 glusterd.vol
-rw-r--r--. 1 root root 1001 Feb 27 14:12 glusterfs-georep-logrotate
-rw-r--r--. 1 root root  626 Feb 27 14:12 glusterfs-logrotate
-rw-r--r--. 1 root root 1822 Feb 27 14:12 gluster-rsyslog-5.8.conf
-rw-r--r--. 1 root root 2564 Feb 27 14:12 gluster-rsyslog-7.2.conf
-rw-r--r--. 1 root root  197 Feb 27 14:12 group-metadata-cache
-rw-r--r--. 1 root root  276 Feb 27 14:12 group-virt.example
-rw-r--r--. 1 root root  338 Feb 27 14:12 logger.conf.example

# ls -l /var/lib/heketi/
total 4
-rw-r--r--. 1 root root 219 Feb 27 14:14 fstab
drwxr-xr-x. 3 root root  49 Feb 27 14:14 mounts
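
For the directory cleanup itself, something along these lines can be used on each former CNS node (a destructive sketch covering exactly the three directories listed above, nothing more):

# rm -rf /var/lib/glusterd/* /etc/glusterfs/* /var/lib/heketi/*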

Do this on every machine which was previously part of the CNS cluster, and pay attention when deleting VGs/PVs/LVs to point at the correct names/devices!

Remove the storagenode=glusterfs labels from the nodes which were previously used in the CNS deployment; check the BZ for why this is necessary. To remove node labels, either use the approach described here or oc edit node and then remove the storagenode=glusterfs label.
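
For example (node names are placeholders), the label can be dropped with oc label by appending a minus to the label key:

# oc label node node1 storagenode-
# oc label node node2 storagenode-
# oc label node node3 storagenode-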

After all this is done, running cns-deploy again should work as expected.

#cns, #container-native-storage, #gluster, #ipv6, #lvm, #openshift-container-platform

firewalld custom rules for OpenShift Container Platform

If you end up with the output below while trying to restart iptables, then firewalld is the service you will be looking at:

# systemctl restart iptables 
Failed to restart iptables.service: Unit is masked.

The firewalld service has a set of commands, but the most notable one is firewall-cmd; if run in help mode, it will present itself in its whole messy glory … try and run!

# firewall-cmd -h

will give you everything necessary to start playing with firewalld rules.

Useful ones are

# systemctl status firewalld
# firewall-cmd --get-zones
# firewall-cmd --list-all-zones
# firewall-cmd --get-default-zone
# firewall-cmd --get-active-zones
# firewall-cmd --info-zone=public

and hundreds of others; man firewall-cmd is the man page to read.

If for some reason we have to change firewalld rules, it can be a different experience from what most Linux users are used to.

In a recent OpenShift installation you will notice many firewalld rules created by the installer. An example of the input chain is:

Chain IN_public_allow (1 references)
  pkts bytes target     prot opt in     out     source               destination         

  598 31928 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:2379 ctstate NEW
   24  1052 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:443 ctstate NEW
   34  1556 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:80 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8053 ctstate NEW
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:10255 ctstate NEW
 2669  160K ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8443 ctstate NEW
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:4789 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:10250 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:10255 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8444 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:2380 ctstate NEW
13862 1488K ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:8053 ctstate NEW

Trying to add an additional rule to IN_public_allow with classical iptables will not work. Firewalld has a different approach.

E.g., to add the CNS (Container Native Storage) ports (which are not open by default, and will stay that way as long as CNS is not part of the default OpenShift ansible installer), we need to run:

# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 1 -m tcp -p tcp -m conntrack --ctstate NEW --dport 24007 -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 1 -m tcp -p tcp -m conntrack --ctstate NEW --dport 24008 -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 1 -m tcp -p tcp -m conntrack --ctstate NEW --dport 2222 -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 1 -m tcp -p tcp -m conntrack --ctstate NEW -m multiport --dports 49152:49664 -j ACCEPT
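
To verify that the rules were picked up, firewall-cmd can list the direct rules back; the four rules added above should show up in the output:

# firewall-cmd --direct --get-all-rules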

The keyword here is --direct; as the name says, it interacts with the firewall rules direct-ly. More about this here and here.

After adding rules, if they are not saved with

# firewall-cmd --runtime-to-permanent

the next restart of firewalld.service will clean them up, so it is necessary to save the rules. They will be written to /etc/firewalld/direct.xml.

#fedora-2, #firewalld, #iptables, #openshift, #redhat

OCP metrics error message “Error from server: No API token found for service account "metrics-deployer"”

I wanted to recreate OCP (OpenShift Container Platform) metrics and followed the same upstream process as many times before, but it kept failing with:

Error from server: No API token found for service account "metrics-deployer", retry after the token is automatically created and added to the service account 

Huh, new trouble; luckily, restarting the master services helped in this case:

# systemctl restart atomic-openshift-master-controllers; systemctl restart atomic-openshift-master-api

This was a multi-master configuration, so it was necessary to restart the master services on all masters.
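
If the openshift-ansible inventory is at hand, an ad-hoc ansible run can do this across all masters in one go (a sketch; it assumes the inventory has the usual masters group):

# ansible masters -i original_cluster_install_inventory.yaml -m shell -a "systemctl restart atomic-openshift-master-controllers atomic-openshift-master-api"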

Just writing it down in the hope that Google will pick up the tags and it helps someone with the same issue.

Happy hacking!

#atomic-openshift-master-api, #atomic-openshift-master-controlers, #kubernetes, #metrics, #ocp, #openshift-container-platform, #openshift-metrics