Running multiple CNS clusters in an OCP cluster

CNS (Container Native Storage) is a way to containerize storage so that it runs as pods inside OCP (OpenShift Container Platform). The documentation is very detailed, so start there if you want to learn more.

In short, it requires a minimal set of nodes to serve as a base for the CNS pods; three nodes are the minimum, and six or nine nodes work fine as well.
If one decides to run multiple CNS clusters, e.g. two three-node clusters instead of one six-node cluster, that will work too, and the text below describes how to achieve it.

Having two separate three-node clusters is, in my opinion, a better approach than having one big (let’s say six-node) cluster. Multiple clusters help to separate and/or organize the block devices belonging to a particular cluster, and users can have different storage classes bound to different CNS storage backends.

Let’s say we have the following organization of devices:

cns cluster1 [ node1, node2, node3 ]

node1 /dev/sdb
node2 /dev/sdb
node3 /dev/sdb 

cns cluster2 [ node4, node5, node6 ]

node4 /dev/nvme0n1 
node5 /dev/nvme0n1 
node6 /dev/nvme0n1 

CNS uses daemon sets and node labels to decide where to start CNS pods. Pods will be started on nodes carrying a specific label, which is provided during CNS cluster setup.
The default label is storagenode=glusterfs and it is applied to nodes, not namespaces – this means pods from the other cluster will end up on any node carrying that label, even if they are not supposed to be there.
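To check which nodes already carry a CNS label (storagenode is the default label key used by cns-deploy, as mentioned above):

# oc get nodes --show-labels | grep storagenode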

Steps to build cns cluster1

  • Get the CNS packages; if you are a Red Hat customer, check the documentation for the proper RHN channels
  • craft a topology file (a minimal sketch is shown right after the commands below)
# oc new-project cnscluster1
# cns-deploy -n cnscluster1 --block-host 500 -g topologyfile1.json -y 
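A minimal sketch of topologyfile1.json for cns cluster1, following the standard heketi topology format; the storage IP addresses here are placeholders and have to be replaced with your own:

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": ["node1"],
              "storage": ["192.168.0.1"]
            },
            "zone": 1
          },
          "devices": ["/dev/sdb"]
        },
        {
          "node": {
            "hostnames": {
              "manage": ["node2"],
              "storage": ["192.168.0.2"]
            },
            "zone": 2
          },
          "devices": ["/dev/sdb"]
        },
        {
          "node": {
            "hostnames": {
              "manage": ["node3"],
              "storage": ["192.168.0.3"]
            },
            "zone": 3
          },
          "devices": ["/dev/sdb"]
        }
      ]
    }
  ]
}

topologyfile2.json for the second cluster looks the same, just with node4/node5/node6 and /dev/nvme0n1 as the devices.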

Steps to build cns cluster2

# oc new-project cnscluster2 
# cns-deploy -n cnscluster2 --daemonset-label cnsclusterb --block-host 500  -g topologyfile2.json -y

I mentioned above that the default label for CNS nodes is storagenode=glusterfs. In order to have two CNS clusters, it is necessary to use a different daemonset-label for the second cluster; this ensures that the nodes of the second cluster carry a different label, and based on that label the daemon set will decide where to start the CNS pods of the second cluster.

With both clusters up and running, the only remaining thing is to craft proper storage classes, and we are done: two different storage classes for two different storage backends.
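A sketch of the two storage classes, one per CNS cluster; the heketi route URLs, secret names and secret namespaces are assumptions and have to match your actual deployments:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cnsclass1
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi-cnscluster1.example.com"      # heketi route of cluster1 (placeholder)
  restuser: "admin"
  secretName: "heketi-cnscluster1-admin-secret"         # secret holding the heketi admin key (placeholder)
  secretNamespace: "cnscluster1"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cnsclass2
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi-cnscluster2.example.com"      # heketi route of cluster2 (placeholder)
  restuser: "admin"
  secretName: "heketi-cnscluster2-admin-secret"         # secret holding the heketi admin key (placeholder)
  secretNamespace: "cnscluster2"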

#cns, #openshift, #openshift-container-platform, #storage

Dynamic storage provisioning for OCP persistent templates

OCP (OpenShift Container Platform) ships many templates – ephemeral and/or persistent ones. Change to the openshift project

# oc project openshift 
# oc get templates

to see the default templates, which you can use directly without any further changes.

If you take a look at some of the persistent templates, you will notice that they have a PVC definition as shown below

 {
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {
                "name": "${SERVICE_NAME}"
            },
            "spec": {
                "accessModes": [
                    "ReadWriteOnce"
                ],
                "resources": {
                    "requests": {
                        "storage": "${VOLUME_CAPACITY}"
                    }
                }
            }
        },
 

For a persistent template to work, one needs to provide matching persistent storage in advance (a persistent volume the claim above can bind to), prior to executing oc new-app template_name. This works fine, but I find it problematic to prepare the storage in advance.
This is easy to overcome by changing the existing template / creating a new one which supports dynamic storage provisioning via storage classes.

First, we need to locate the template we want to change in order to use dynamic storage provisioning.
Note: I assume there is already a storage class in place

  1. edit the desired persistent template; e.g. let's take postgresql-persistent and edit its PersistentVolumeClaim section as shown below
# oc get template -n openshift postgresql-persistent -o json > postgresql-persistent_storageclass.json  

Edit postgresql-persistent_storageclass.json by changing the sections below

 
 "kind": "Template",
    "labels": {
        "template": "glusterfs-postgresql-persistent-template_storageclass"
    },

... rest of template ..... 

"name": "glusterfs-postgresql-persistent_storageclass", 

....... rest of template .... 

"selfLink": "/oapi/v1/namespaces/openshift/templates/glusterfs-postgresql-persistent_storageclass" 

.... .... rest of template .... 

Adapt the PersistentVolumeClaim section to support dynamic storage provisioning:

 
{
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {
                "name": "${DATABASE_SERVICE_NAME}",
                "annotations": {
                    "volume.beta.kubernetes.io/storage-class": "${STORAGE_CLASS}"
                        }
            },
            "spec": {
                "accessModes": [
                    "ReadWriteOnce"
                ],
                "resources": {
                    "requests": {
                        "storage": "${VOLUME_CAPACITY}"
                    }
                }
            }
        },

This adds a requirement for a new parameter, STORAGE_CLASS.

At the end of the template, in the parameters section, add this new parameter:

{
            "name" : "STORAGE_CLASS",
            "description": "Storagecclass to use - here we expect storageclass name",
            "required": true,
            "value": "storageclassname"
        }

Save the file and create the new template:

# oc create -f postgresql-persistent_storageclass.json

After this point we can use the new template to start the PostgreSQL service, and it will automatically allocate storage from the specified storage class.

It is assumed that the storage class is already configured and in place; you can use any storage backend that supports storage classes. In case you want to try CNS, you can follow the instructions on how to set up CNS storage.

 
# oc get storageclass
NAME           TYPE
cnsclass       kubernetes.io/glusterfs    

# oc new-project postgresql-storageclass
# oc new-app glusterfs-postgresql-persistent_storageclass -p STORAGE_CLASS=cnsclass
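Other template parameters can be combined with the new one as usual; for example, VOLUME_CAPACITY (an existing parameter of the template) can be passed together with the storage class:

# oc new-app glusterfs-postgresql-persistent_storageclass -p STORAGE_CLASS=cnsclass -p VOLUME_CAPACITY=5Gi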

After the application is started

# oc get pod
NAME                 READY     STATUS    RESTARTS   AGE
postgresql-1-zdvq2   1/1       Running   0          52m
[root@gprfc031 templates]# oc exec postgresql-1-zdvq2 -- mount | grep pgsql
10.16.153.123:vol_72cd8ef33eee365d4c7f75cffaa1681b on /var/lib/pgsql/data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@gprfc031 templates]# oc get pvc
NAME         STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
postgresql   Bound     pvc-514d56e8-e7bf-11e7-8827-d4bed9b390df   1Gi        RWO           cnsclass       52m

the pod mounts a volume from the storage backend defined in the storage class and starts using it.

#cns, #gluster, #k8s, #kubernetes, #openshift, #persistant-storage, #pvc

Setup CNS on Openshift – using openshift-ansible

Let’s say you have a fully functional OpenShift Container Platform (OCP) cluster and you want to extend it with CNS (Container Native Storage) functionality. This can be done manually, using cns-deploy, or using openshift-ansible.

In order to get the installation working using the openshift-ansible set of playbooks, do the following after the OpenShift cluster installation:

  • clone https://github.com/openshift/openshift-ansible
# git clone https://github.com/openshift/openshift-ansible
  • edit the original OpenShift install inventory file: add glusterfs to the [OSEv3:children] group and add a [glusterfs] section listing the storage nodes and their devices

[OSEv3:children]
...
glusterfs

... rest of inventory ...

[glusterfs]
node1 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'
node2 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'
node3 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'

Ensure that devices /dev/sdX and /dev/sdY contain nothing valuable and have no partitions present.
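A quick way to check the devices and, if needed, clean them up before the playbook run (wipefs is destructive, so double check the device names first):

# lsblk /dev/sdX /dev/sdY
# wipefs -a /dev/sdX /dev/sdY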

  • check openshift_storage_glusterfs to get an idea of which parameters need to be changed from their default values. Mostly it will be the image version and the namespace where the CNS pods will run; in my case it was
# cat cns.yaml
openshift_storage_glusterfs_name: cnstestelko
openshift_storage_glusterfs_version: 3.3.0-362
openshift_storage_glusterfs_heketi_version: 3.3.0-362
  • execute playbook
# ansible-playbook -i original_cluster_install_inventory.yaml openshift-ansible/playbooks/byo/openshift-glusterfs/config.yml -e @cns.yaml  

After the last Ansible job finishes, a new project named glusterfs will be created, along with a new storage class which can then be used when PVCs are created.

# kubectl get storageclass
NAME                    TYPE
glusterfs-cnstestelko   kubernetes.io/glusterfs 
 
# kubectl get -o yaml storageclass glusterfs-cnstestelko
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2017-11-22T15:48:04Z
  name: glusterfs-cnstestelko
  resourceVersion: "3196130"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/glusterfs-cnstestelko
  uid: 871fd59e-cf9c-11e7-bb0a-d4bed9b390df
parameters:
  resturl: http://heketi-cnstestelko-glusterfs.router.default.svc.cluster.local
  restuser: admin
  secretName: heketi-cnstestelko-admin-secret
  secretNamespace: glusterfs
provisioner: kubernetes.io/glusterfs
# cat pvc.json 
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "pvctest",
    "annotations": {
      "volume.beta.kubernetes.io/storage-class": "glusterfs-cnstestelko"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}
# kubectl apply -f pvc.json

# kubectl get pvc 
NAME      STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS            AGE
pvctest   Bound     pvc-ccb0d8a6-cfa6-11e7-bb0a-d4bed9b390df   1Gi        RWO           glusterfs-cnstestelko   27s
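The bound claim can then be consumed by any pod; a minimal sketch, where the pod name, image and mount path are only placeholders:

{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "pvctest-pod"
  },
  "spec": {
    "containers": [
      {
        "name": "app",
        "image": "registry.access.redhat.com/rhel7",
        "command": ["sleep", "3600"],
        "volumeMounts": [
          {
            "name": "data",
            "mountPath": "/data"
          }
        ]
      }
    ],
    "volumes": [
      {
        "name": "data",
        "persistentVolumeClaim": {
          "claimName": "pvctest"
        }
      }
    ]
  }
}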

For more details about OpenShift/CNS check the openshift-doc. Also, check openshift_storage_glusterfs.

#cns, #container-native-storage, #linux, #openshift, #openshift-container-platform

glustersummit2017

Gluster Summit 2017 in Prague, Czechia finished today, and I can say it was one of the best conferences I have ever attended. This is my (probably) subjective feeling, but everything was top level, from the organization to the ideas presented during the talks.

I had the opportunity, together with my colleague Shekhar Berry, to present our work on the topic Scalability and Performance with Brick Multiplexing; the whole list of talks presented can be found in the Gluster Summit schedule.

The slides of our presentation can be found at this link.

The group photo is at this link.

#cns, #gluster, #glustersummit2017, #kubernetes, #openshift, #redhat

firewalld custom rules for OpenShift Container Platform

If you end up with the message below while trying to restart iptables, then firewalld is the service you should be looking at

# systemctl restart iptables 
Failed to restart iptables.service: Unit is masked.

The firewalld service ships a set of commands, but the most notable one is firewall-cmd; run in help mode, it will present itself in its whole messy glory … try and run!

# firewall-cmd -h

will give you everything necessary to start playing with firewalld rules.

Useful ones are

# systemctl status firewalld
# firewall-cmd --get-zones
# firewall-cmd --list-all-zones
# firewall-cmd --get-default-zone
# firewall-cmd --get-active-zones
# firewall-cmd --info-zone=public

and hundreds of others; man firewall-cmd is the man page to read.

If for some reason we have to change firewalld rules, it can be a different experience from what most Linux users are used to.

In a recent OpenShift installation you will notice many firewalld rules created by the OpenShift installer. An example of the input chain is

Chain IN_public_allow (1 references)
  pkts bytes target     prot opt in     out     source               destination         

  598 31928 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:2379 ctstate NEW
   24  1052 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:443 ctstate NEW
   34  1556 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:80 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8053 ctstate NEW
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:10255 ctstate NEW
 2669  160K ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8443 ctstate NEW
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:4789 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:10250 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:10255 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8444 ctstate NEW
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:2380 ctstate NEW
13862 1488K ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:8053 ctstate NEW

Trying to add an additional rule to IN_public_allow with classical iptables will not work. Firewalld has a different approach.

i.e. to add the CNS (Container Native Storage) ports (which are not open by default, and will stay that way as long as CNS is not part of the default OpenShift Ansible installer), we need to run

# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 1 -m tcp -p tcp -m conntrack --ctstate NEW --dport 24007 -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 1 -m tcp -p tcp -m conntrack --ctstate NEW --dport 24008 -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 1 -m tcp -p tcp -m conntrack --ctstate NEW --dport 2222 -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 1 -m tcp -p tcp -m conntrack --ctstate NEW -m multiport --dports 49152:49664 -j ACCEPT
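To verify that the rules were picked up, both the firewalld direct interface and iptables itself can be queried:

# firewall-cmd --direct --get-all-rules
# iptables -nvL IN_public_allow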

The keyword is --direct; as its name says, it will interact with firewalld rules direct-ly. More about this here and here.

After adding the rules, if they are not saved with

# firewall-cmd --runtime-to-permanent

the next restart of firewalld.service will clean them up, so it is necessary to save the rules. These rules will be written to /etc/firewalld/direct.xml
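For reference, the saved direct rules end up in /etc/firewalld/direct.xml looking roughly like this (a sketch; the exact content depends on the rules you added):

<?xml version="1.0" encoding="utf-8"?>
<direct>
  <rule priority="1" table="filter" ipv="ipv4" chain="IN_public_allow">-m tcp -p tcp -m conntrack --ctstate NEW --dport 24007 -j ACCEPT</rule>
  <rule priority="1" table="filter" ipv="ipv4" chain="IN_public_allow">-m tcp -p tcp -m conntrack --ctstate NEW --dport 24008 -j ACCEPT</rule>
</direct>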

#fedora-2, #firewalld, #iptables, #openshift, #redhat

OpenShift : Error from server: User “$user” cannot list all nodes/pods in the cluster

The OpenShift error messages below can be quite annoying; they appear if the current login is not system:admin.
Example error messages:

# oc get pods 
No resources found.
Error from server: User "system:anonymous" cannot list pods in project "default"
root@dhcp8-176: ~ # oc get nodes 
No resources found.
Error from server: User "system:anonymous" cannot list all nodes in the cluster

Trying to log in as user admin will not help

# oc login -u admin 
Authentication required for https://dhcp8-144.example.net:8443 (openshift)
Username: admin
Password: 
Login successful.
You don't have any projects. You can try to create a new project, by running
    oc new-project 

root@dhcp8-176: ~ # oc get pods
No resources found.
Error from server: User "admin" cannot list pods in project "default"
root@dhcp8-176: ~ # oc get nodes 
No resources found.
Error from server: User "admin" cannot list all nodes in the cluster

To get rid of it, log in as system:admin

# oc login -u system:admin

What it does, and which certificates it reads in order to succeed, can be seen if the last command is run with --loglevel=10

# oc login -u system:admin --loglevel=10

#kubernetes, #openshift

etcd error message “etcd failed to send out hearbeat on time”

… etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines, per 1 and 2. etcd is very sensitive to delays in networks, and not only networks: any kind of sluggishness of the etcd cluster nodes can lead to problems with the functionality of the whole Kubernetes cluster.

By the time an OpenShift/Kubernetes cluster starts reporting error messages like the ones shown below, the cluster will already behave inappropriately, pod scheduling/deletion will not work as expected, and the problems will be more than visible.

Sep 27 00:04:01 dhcp7-237 etcd: failed to send out heartbeat on time (deadline exceeded for 1.766957688s)
Sep 27 00:04:01 dhcp7-237 etcd: server is likely overloaded
Sep 27 00:04:01 dhcp7-237 etcd: failed to send out heartbeat on time (deadline exceeded for 1.766976918s)
Sep 27 00:04:01 dhcp7-237 etcd: server is likely overloaded

The output of systemctl status etcd:

# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2016-10-01 09:18:37 EDT; 5h 20min ago
 Main PID: 11970 (etcd)
   Memory: 1.0G
   CGroup: /system.slice/etcd.service
           └─11970 /usr/bin/etcd --name=dhcp6-138.example.net --data-dir=/var/lib/etcd/ --listen-client-urls=https://172.16.6.138:2379

Oct 01 14:38:55 dhcp6-138.example.net etcd[11970]: server is likely overloaded
Oct 01 14:38:56 dhcp6-138.example.net etcd[11970]: failed to send out heartbeat on time (deadline exceeded for 377.70994ms)
Oct 01 14:38:56 dhcp6-138.example.net etcd[11970]: server is likely overloaded
Oct 01 14:38:56 dhcp6-138.example.net etcd[11970]: failed to send out heartbeat on time (deadline exceeded for 377.933298ms)
Oct 01 14:38:56 dhcp6-138.example.net etcd[11970]: server is likely overloaded
Oct 01 14:38:58 dhcp6-138.example.net etcd[11970]: failed to send out heartbeat on time (deadline exceeded for 1.226630142s)
Oct 01 14:38:58 dhcp6-138.example.net etcd[11970]: server is likely overloaded
Oct 01 14:38:58 dhcp6-138.example.net etcd[11970]: failed to send out heartbeat on time (deadline exceeded for 1.226803192s)
Oct 01 14:38:58 dhcp6-138.example.net etcd[11970]: server is likely overloaded
Oct 01 14:39:07 dhcp6-138.example.net etcd[11970]: the clock difference against peer f801f8148b694198 is too high [1.078081179s > 1s]

# systemctl status etcd -l will also show similar messages; check these too.
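To get a quick view of the overall cluster health, etcdctl can be used as well; the certificate paths and the endpoint below are assumptions based on the etcd.conf shown next and have to be adjusted to your environment:

# etcdctl --ca-file /etc/etcd/ca.crt --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --endpoints https://172.16.7.237:2379 cluster-health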

The etcd configuration file is located at /etc/etcd/etcd.conf and has content similar to the example below; this one is from RHEL, other OSes can have it slightly different.

ETCD_NAME=dhcp7-237.example.net
ETCD_LISTEN_PEER_URLS=https://172.16.7.237:2380
ETCD_DATA_DIR=/var/lib/etcd/
ETCD_HEARTBEAT_INTERVAL=6000
ETCD_ELECTION_TIMEOUT=30000
ETCD_LISTEN_CLIENT_URLS=https://172.16.7.237:2379

ETCD_INITIAL_ADVERTISE_PEER_URLS=https://172.16.7.237:2380
ETCD_INITIAL_CLUSTER=dhcp7-241.example.net=https://172.16.7.241:2380,dhcp7-237.example.net=https://172.16.7.237:2380,dhcp7-239.example.net=https://172.16.7.239:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
ETCD_ADVERTISE_CLIENT_URLS=https://172.16.7.237:2379


ETCD_CA_FILE=/etc/etcd/ca.crt
ETCD_CERT_FILE=/etc/etcd/server.crt
ETCD_KEY_FILE=/etc/etcd/server.key
ETCD_PEER_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_CERT_FILE=/etc/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/etcd/peer.key

The parameters in the above configuration file we want to change are ETCD_HEARTBEAT_INTERVAL and ETCD_ELECTION_TIMEOUT. There is no single value that fits all environments; it is necessary to experiment with different values and find out what works best. For most cases the defaults (500/2500) will be fine.

After changing /etc/etcd/etcd.conf, do not forget to restart the etcd service

# systemctl restart etcd

The issues below, affecting etcd nodes, can lead to the problem described in this post:

  • network latency
  • storage latency
  • combination of network latency and storage latency

If network latency is low, then check the storage used by the Kubernetes/OpenShift etcd servers. The change described in this post is a workaround for the case where the root cause is known but cannot be fixed and no other option is possible; the first and better solution would be to solve the issue at its root by fixing the problematic subsystem(s).
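A rough way to check both suspects from the list above: ping for the network path between etcd peers, and fio for fdatasync latency on the etcd data directory (the fio parameters below follow the commonly used etcd disk check and are only an example; the peer address is taken from the configuration above):

# ping -c 20 172.16.7.241
# fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd --size=22m --bs=2300 --name=etcd-disk-check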

In my particular case the storage subsystem was slow, and it was not possible to change that without a bunch of $$$.

References : etcd documentation

#etcd, #k8s, #kubernetes, #linux, #openshift, #redhat, #storage