Benchmark Postgresql in OpenShift Container Platform using pgbench

OpenShift Container Platform (OCP) allows you to run various applications out of the box, based on images and templates delivered with the OCP installation. If, after installing OCP, you execute the commands below

# oc get -n openshift templates
# oc get -n openshift images

you will see all templates and images installed by default, which can be used directly, without any modification, to start the desired application.

Templates with persistent in their name use persistent storage for data. Checking a particular template, we can see that the PVC definition inside the template looks like this:

{
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {
                "name": "${DATABASE_SERVICE_NAME}"
            },
            "spec": {
                "accessModes": [
                    "ReadWriteOnce"
                ],
                "resources": {
                    "requests": {
                        "storage": "${VOLUME_CAPACITY}"
                    }
                }
            }
        },

This requires that a PVC named ${DATABASE_SERVICE_NAME} is created before the pod which will use it. From this we can see that there is no direct support for dynamic storage provisioning in the default OCP templates. An RFE bug is open and this will probably be fixed in future releases; it is tracked in BZ 1559728.
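For reference, pre-creating such a claim by hand ( the approach the rest of this post avoids ) could look roughly like the sketch below; the claim name postgresql matches the default DATABASE_SERVICE_NAME value, and the 1Gi size is only an example.

# cat pvc-postgresql.json
{
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {
        "name": "postgresql"
    },
    "spec": {
        "accessModes": [ "ReadWriteOnce" ],
        "resources": {
            "requests": {
                "storage": "1Gi"
            }
        }
    }
}

# oc create -f pvc-postgresql.json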

The idea with templates supporting storage classes is to use, for example, the command below to start an application pod and, at creation time, consume storage from a predefined storage class:

# oc new-app --template=<template_name> -p STORAGE_CLASS_NAME=storage_class_name

In order to make storage classes part of the templates, it is necessary to edit the template, as described in BZ 1559728.

In this blog post I will show you how to start a Postgresql pod with storage from a predefined storage class and how to benchmark the Postgresql database running inside an OCP pod. In all my tests I use dynamic storage provisioning and storage classes as the only way to provide storage for OCP pods. Manual storage preparation works too, but that is the 2016ish way to create storage for pods, and I find it easier to use storage classes.

In order to use dynamic storage provisioning with OCP templates ( remember the BZ above ), we need to edit the template to support dynamic storage provisioning for PVCs:

# oc get template -n openshift postgresql-persistent -o json > storageclass_postgresql-persistent.json

Edit storageclass_postgresql-persistent.json so that the PVC section looks as below, and in the parameters section add a new parameter STORAGE_CLASS_NAME.

 
{
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {
                "annotations":{
                        "volume.beta.kubernetes.io/storage-class": "${STORAGE_CLASS_NAME}"
                },
                "name": "${DATABASE_SERVICE_NAME}"
            },
            "spec": {
                "accessModes": [
                    "ReadWriteOnce"
                ],
                "resources": {
                    "requests": {
                        "storage": "${VOLUME_CAPACITY}"
                    }
                }
            }
        },

Parameters section

{
    "description" : "Storage class name to use",
    "displayName" : "Storage class name",
    "name": "STORAGE_CLASS_NAME",
    "required": true
}

After editing the template per the above instructions, it will be possible to run

# oc new-project testpostgresql 
# oc new-app --template=postgresql-persistent -p STORAGE_CLASS_NAME=storage_class_name

to create an application which will dynamically allocate storage per the storage class definition.

Since this is not supported out of the box yet, you can use this template, which is already edited to accommodate dynamic storage provisioning instead of editing storageclass_postgresql-persistent.json yourself.

Once you have the template, load it and everything will be prepared:

# oc create -f template.json

In order for this to work it is necessary to have storage prepared in advance. This is usually the case, but it goes beyond the scope of this blog post.
However, in this blog post you can read how to set up a CNS ( Container Native Storage ) solution to be used by application pods in the OCP PaaS. Using CNS is one option; if the cluster is running on EC2, then it is also easy to set up a storage class which consumes storage provided by the EC2 cloud.

In this test, CNS storage will be used as the storage backend providing storage for the application pods.
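Before creating the application, you can check that such a storage class is actually present; the class name glusterfs-storage-block used here comes from my CNS block setup and will likely differ in other environments.

# oc get storageclass
# oc describe storageclass glusterfs-storage-block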

Assuming there is a functional storage class glusterfs-storage-block, we can run the commands below to start the pod:

# oc new-project testpostgresql 
# oc new-app --template=postgresql-persistent -p STORAGE_CLASS_NAME=glusterfs-storage-block

Once the postgresql pod is started, the following will be visible:

# oc get svc 
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
postgresql   ClusterIP   172.27.192.193   <none>        5432/TCP   8m

# oc get pods
NAME                 READY     STATUS    RESTARTS   AGE
postgresql-1-qvxrr   1/1       Running   0          6m

and also

# oc exec postgresql-1-qvxrr -- mount  | grep data
/dev/mapper/mpathh on /var/lib/pgsql/data type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

so the block storage originating from CNS is mounted at /var/lib/pgsql/data.

The Postgresql pod is running, and now it is possible to execute pgbench in two different ways.

The first is to run pgbench in so-called client-server mode, from a client outside the pod. First get the database credentials and the service IP:

# oc exec postgresql-1-qvxrr -- env | egrep "_USER|_PASS"
POSTGRESQL_USER=userGQF
POSTGRESQL_PASSWORD=a75hXQYCsQfnS1LT

# oc get svc 
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
postgresql   ClusterIP   172.27.192.193   <none>        5432/TCP   30m

The following is necessary in order for pgbench to work without asking for a password:

 
# vim /root/.pgpass 
# chmod 600 /root/.pgpass 

and add this line:
172.27.192.193:5432:*:userGQF:a75hXQYCsQfnS1LT

This is a postgresql feature; the format is serviceIP:port:*:user:password.
Now we can run pgbench:

# pgbench -h 172.27.192.193  -p 5432 -i -s 1 sampledb -U userS8L
creating tables...
10000 tuples done.
20000 tuples done.
30000 tuples done.
40000 tuples done.
50000 tuples done.
60000 tuples done.
70000 tuples done.
80000 tuples done.
90000 tuples done.
100000 tuples done.
set primary key...
vacuum...done.

# pgbench -h 172.27.192.193  -p 5432 -c 2 -j 2 -t 1  sampledb -U userS8L
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 2
number of transactions per client: 1
number of transactions actually processed: 2/2
tps = 142.755175 (including connections establishing)
tps = 167.147215 (excluding connections establishing)

Another way, suggested by Graham Dumpleton from our OCP team, is to run pgbench inside the postgresql pod directly:

 
# oc exec -i postgresql-1-qvxrr -- bash -c "pgbench -i -s 1 sampledb"
creating tables...
100000 of 1000000 tuples (10%) done (elapsed 0.04 s, remaining 0.39 s)
200000 of 1000000 tuples (20%) done (elapsed 0.10 s, remaining 0.40 s)
300000 of 1000000 tuples (30%) done (elapsed 0.15 s, remaining 0.36 s)
400000 of 1000000 tuples (40%) done (elapsed 0.20 s, remaining 0.30 s)
500000 of 1000000 tuples (50%) done (elapsed 0.25 s, remaining 0.25 s)
600000 of 1000000 tuples (60%) done (elapsed 0.31 s, remaining 0.20 s)
700000 of 1000000 tuples (70%) done (elapsed 0.36 s, remaining 0.15 s)
800000 of 1000000 tuples (80%) done (elapsed 0.41 s, remaining 0.10 s)
900000 of 1000000 tuples (90%) done (elapsed 0.47 s, remaining 0.05 s)
1000000 of 1000000 tuples (100%) done (elapsed 0.52 s, remaining 0.00 s)

# oc exec -i postgresql-1-qvxrr -- bash -c "pgbench -c 10 -j 2 -t 10 sampledb"
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10
number of transactions actually processed: 100/100
latency average: 14.361 ms
tps = 696.335188 (including connections establishing)
tps = 703.148347 (excluding connections establishing)

Both of these methods work fine and give comparable results, though I think the latter approach is a bit better because it does not require setting up /root/.pgpass.
I wrote a small script which supports the second method from above. Check the script’s readme for insights into how it can be used for pgbench testing.
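The script itself is not reproduced here, but a minimal sketch of the same idea, with made-up parameter handling and the assumption that the pod carries the usual name=postgresql label from the template, could look like this:

#!/bin/bash
# Illustrative sketch only - not the actual pgbench_test.sh script
# usage: ./pgbench_in_pod.sh <namespace> <scale> <clients> <threads> <transactions>
NAMESPACE=${1:-testpostgresql}
SCALE=${2:-1}
CLIENTS=${3:-10}
THREADS=${4:-2}
TRANSACTIONS=${5:-10}

# find the running postgresql pod in the given namespace
POD=$(oc get pods -n "$NAMESPACE" -l name=postgresql -o name | head -n1 | cut -d/ -f2)

# initialize the pgbench tables inside the pod
oc exec -n "$NAMESPACE" "$POD" -- bash -c "pgbench -i -s $SCALE sampledb"

# run the benchmark inside the pod and print the tps numbers
oc exec -n "$NAMESPACE" "$POD" -- bash -c "pgbench -c $CLIENTS -j $THREADS -t $TRANSACTIONS sampledb"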

Collect system statistics during benchmark runs

Knowing only how the Postgresql pod performs ( from a tps point of view ) gives just a partial picture of the benchmark. Besides this information, it is good to know:

  • what the network traffic between application pods is
  • what the IOPS load directed at the storage network is
  • how much memory is consumed on the OCP node hosting the Postgresql pod
  • CPU usage during the test
  • and many other things which can be interesting to check during the test itself

Some of this information is saved to various places in /var/log/* ( think /var/log/sar* ), but some of it lives only as short-lived records in /proc, and it would be good to have a way to gather that too during test execution. Luckily there is the pbench tool, developed by Red Hat’s perf/scale team, which can help collect system performance data during test execution. Pbench has many features, so take time to read its documentation in advance.

For me it is useful to see how the OCP node where the postgresql pod is scheduled behaves during the test, and also to see the load on the storage sub-system and how it performs during the postgresql load test.

Part of pbench is the script pbench-user-benchmark, which can accept the above mentioned script as input and will collect system stats from the desired machines.

This requires pbench to be set up and working in advance; check the pbench repo for instructions on where to find binaries and how to set up pbench. A standalone invocation is sketched below.
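Assuming the pbench-agent is installed and the default tool set is registered on the nodes you care about, a standalone run could be wrapped roughly like this ( the script path and options are placeholders; the real option list is shown in the Jenkins build step below ):

# pbench-register-tool-set
# pbench-user-benchmark --config=pgbench-cns-run -- ./postgresql/pgbench_test.sh -n testpostgresql -t 1000 --storageclass glusterfs-storage-block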

Create Jenkins task to execute benchmark test

The above will work fine, but it is even better if the process is "Jenkins-ized", that is, if there is a Jenkins job which runs the benchmark test based on input requirements. In that case it is necessary to take into account that pgbench_test.sh uses the oc tool to create and start the PVC/pod. At the time of this writing, kubectl is not covered.

If pbench-user-benchmark is used, ensure pbench is installed and working fine. Check the pbench GitHub repository for details.

If oc is run outside of the OCP cluster ( e.g. by installing oc on a third machine and copying /root/.kube there ), then it is important to make sure that oc can see the OCP cluster configuration and is able to create pods.
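A rough illustration of that setup, assuming a bastion host with the oc client installed and the kubeconfig copied from a master ( host names are placeholders ):

# yum install -y atomic-openshift-clients    # or download the oc client binary
# mkdir -p /root/.kube
# scp master1.example.com:/root/.kube/config /root/.kube/config
# oc whoami
# oc get nodes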

Another option is to make the OCP master a Jenkins slave and bind the Postgresql Jenkins job to always execute on the OCP master, which in this case acts as a Jenkins slave.

Below are the steps for creating a Jenkins job to fulfill this requirement. I assume there is a functional and working Jenkins server ( instructions on how to install a Jenkins server on CentOS can be found here ).

Jenkins job creation

In the Jenkins web UI: New Item -> enter a descriptive name -> Freestyle project -> OK, then fill in the necessary information.

Make sure the job will run on a machine where the oc tool is present and available.

In the build section add:

pbench-user-benchmark --config=${pbenchconfig} -- ${WORKSPACE}/postgresql/pgbench_test.sh -n ${namespace} -t ${transactions} -e ${template} -v ${vgsize} -m ${memsize} -i ${iterations} --mode ${mode} --clients ${clients} --threads ${threads} --storageclass ${storageclass}

After this, everything is prepared for starting the Jenkins job.
From the main Jenkins job console: Build with parameters -> Build.
If all is fine, the Jenkins console log will, after some time, report Finished: SUCCESS.

Happy Benchmark-Ing!

#cns, #jenkins, #kubernetes, #openshift, #performance, #pgbench, #postgresql

Running multiple CNS clusters in OCP cluster

CNS ( Container Native Storage ) is a way to containerize storage so that it runs as pods inside OCP ( OpenShift Container Platform ). The documentation is very detailed, so start from there if you want to learn more.

In short, it requires a minimal set of nodes which will serve as the base for CNS pods; three nodes are the minimum, and 6 or 9 nodes work fine as well.
If one decides to run multiple CNS clusters, e.g. two three-node clusters instead of one six-node cluster, that works too, and the text below describes how to achieve this.

Having two separate three-node clusters is, in my opinion, a better approach than having one big ( let’s say 6 node ) cluster. Multiple clusters help to separate and/or organize the block devices which are part of a particular cluster, and users can have different storage classes bound to different CNS storage backends.

Let’s say we have the following layout of devices:

cns cluster1 [ node1, node2, node3 ]

node1 /dev/sdb
node2 /dev/sdb
node3 /dev/sdb 

cns cluster2 [ node4, node5, node6 ]

node4 /dev/nvme0n1 
node5 /dev/nvme0n1 
node6 /dev/nvme0n1 

CNS uses daemon sets and node labels to decide where to start CNS pods, so pods are started on nodes carrying a specific label which is provided during CNS cluster setup.
The default label is storagenode=glusterfs, and it is applied to nodes, not namespaces. This means that pods from another cluster can end up on a node carrying the label even though they are not supposed to be there.

Steps to build cns cluster1

  • Get the CNS packages; if you are a Red Hat customer, check the documentation for the proper RHN channels
  • craft a topology file ( a minimal sketch is shown after the commands below )
# oc new-project cnscluster1
# cns-deploy -n cnscluster1 --block-host 500 -g topologyfile1.json -y 
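A minimal heketi topology file for cluster1 could look roughly like the sketch below; host names, storage IPs and zones are placeholders that have to match the actual environment.

# cat topologyfile1.json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": { "manage": ["node1"], "storage": ["192.168.0.1"] },
            "zone": 1
          },
          "devices": ["/dev/sdb"]
        },
        {
          "node": {
            "hostnames": { "manage": ["node2"], "storage": ["192.168.0.2"] },
            "zone": 1
          },
          "devices": ["/dev/sdb"]
        },
        {
          "node": {
            "hostnames": { "manage": ["node3"], "storage": ["192.168.0.3"] },
            "zone": 1
          },
          "devices": ["/dev/sdb"]
        }
      ]
    }
  ]
}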

Steps to build cns cluster2

# oc new-project cnscluster2 
# cns-deploy -n cnscluster2 --daemonset-label cnsclusterb --block-host 500 -g topologyfile2.json -y

I mentioned above that the default label for CNS nodes is storagenode=glusterfs. In order to have two CNS clusters, it is necessary to use a different daemonset-label for the second cluster. This ensures that the nodes of the second cluster get a different label, and based on that label the daemon set decides where to start the CNS pods of the second cluster.
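If the nodes of the second cluster are not already labeled, the label can be applied and verified with oc ( node names and the label value follow the example above ):

# oc label node node4 storagenode=cnsclusterb
# oc label node node5 storagenode=cnsclusterb
# oc label node node6 storagenode=cnsclusterb
# oc get nodes -L storagenode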

With both clusters up and running, the only remaining thing is to craft proper storage classes, and we are done: two different storage classes for two different storage backends.
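As an illustration, a storage class pointing at the second cluster’s heketi could look roughly like the sketch below; the resturl, secret name and namespace depend on how heketi is exposed in that cluster, so treat all values as placeholders.

# cat storageclass-cnscluster2.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cnscluster2
parameters:
  resturl: http://heketi-cnscluster2.router.default.svc.cluster.local
  restuser: admin
  secretName: heketi-cnscluster2-admin-secret
  secretNamespace: cnscluster2
provisioner: kubernetes.io/glusterfs

# oc create -f storageclass-cnscluster2.yaml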

#cns, #openshift, #openshift-container-platform, #storage

Dynamic storage provisioning for OCP persistent templates

OCP ( OpenShift Container Platform ) ships many templates, ephemeral and/or persistent ones. Change to the openshift project

# oc project openshift 
# oc get templates

to see the default delivered templates, which you can use directly and without any further changes.

If you take a look at some of the persistent templates, you will notice that they have a PVC definition as shown below:

 {
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {
                "name": "${SERVICE_NAME}"
            },
            "spec": {
                "accessModes": [
                    "ReadWriteOnce"
                ],
                "resources": {
                    "requests": {
                        "storage": "${VOLUME_CAPACITY}"
                    }
                }
            }
        },
 

For a persistent template to work, one needs to provide a PVC with a specific name in advance, prior to executing oc new-app template_name. This works fine, but I find it problematic to create the PVC in advance.
This is easy to overcome by changing the existing template ( or creating a new one ) so that it supports dynamic storage provisioning via storage classes.

First, we need to locate the template we want to change in order to use dynamic storage provisioning.
Note: I assume there is already a storage class in place.

  1. edit the desired persistent template; e.g. let’s take postgresql-persistent and edit its PersistentVolumeClaim section as follows
# oc get template -n openshift postgresql-persistent -o json > postgresql-persistent_storageclass.json  

Edit postgresql-persistent_storageclass.json by changing the sections below:

 
 "kind": "Template",
    "labels": {
        "template": "glusterfs-postgresql-persistent-template_storageclass"
    },

... rest of template ..... 

"name": "glusterfs-postgresql-persistent_storageclass", 

....... rest of template .... 

"selfLink": "/oapi/v1/namespaces/openshift/templates/glusterfs-postgresql-persistent_storageclass" 

.... .... rest of template .... 

Adapt the PersistentVolumeClaim section to support dynamic storage provisioning:

 
{
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {
                "name": "${DATABASE_SERVICE_NAME}",
                "annotations": {
                    "volume.beta.kubernetes.io/storage-class": "${STORAGE_CLASS}"
                        }
            },
            "spec": {
                "accessModes": [
                    "ReadWriteOnce"
                ],
                "resources": {
                    "requests": {
                        "storage": "${VOLUME_CAPACITY}"
                    }
                }
            }
        },

This adds a new required parameter, STORAGE_CLASS.

At the end of the template, in the parameters section, add this new parameter:

{
            "name" : "STORAGE_CLASS",
            "description": "Storagecclass to use - here we expect storageclass name",
            "required": true,
            "value": "storageclassname"
        }

Save the file and create the new template:

# oc create -f postgresql-persistent_storageclass.json

After this point, we can use this new template to start the postgresql service, and it will automatically allocate storage from the specified storage class.

It is assumed that the storage class is already configured and in place. You can use any storage backend which supports storage classes; in case you want to try CNS, you can follow the instructions on how to set up CNS storage.

 
# oc get storageclass
NAME           TYPE
cnsclass       kubernetes.io/glusterfs    

# oc new-project postgresql-storageclass
# oc new-app glusterfs-postgresql-persistent_storageclass -p STORAGE_CLASS=cnsclass

After the application is started:

# oc get pod
NAME                 READY     STATUS    RESTARTS   AGE
postgresql-1-zdvq2   1/1       Running   0          52m
[root@gprfc031 templates]# oc exec postgresql-1-zdvq2 -- mount | grep pgsql
10.16.153.123:vol_72cd8ef33eee365d4c7f75cffaa1681b on /var/lib/pgsql/data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@gprfc031 templates]# oc get pvc
NAME         STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
postgresql   Bound     pvc-514d56e8-e7bf-11e7-8827-d4bed9b390df   1Gi        RWO           cnsclass       52m

it mounts a volume from the storage backend defined in the storage class and starts using it.

#cns, #gluster, #k8s, #kubernetes, #openshift, #persistant-storage, #pvc

Setup CNS on Openshift – using openshift-ansible

Let’s say you have a fully functional OpenShift Container Platform ( OCP ) cluster and you want to extend it with CNS ( Container Native Storage ) functionality. This can be done manually, using cns-deploy, or using openshift-ansible.

In order to get the installation working using the openshift-ansible set of playbooks, do the following after the OpenShift cluster installation:

  • clone https://github.com/openshift/openshift-ansible
# git clone https://github.com/openshift/openshift-ansible
  • edit the original OpenShift install inventory file: add glusterfs to the [OSEv3:children] group and add a [glusterfs] host group, for example

[OSEv3:children]
... existing groups ...
glusterfs

... rest of inventory ...
[glusterfs] 
node1 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'
node2 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'
node3 glusterfs_devices='["/dev/sdX", "/dev/sdY"]'

Ensure that there is nothing of value on the devices /dev/sdX and /dev/sdY and that no partitions are present on them.

  • check openshift_storage_glusterfs to get an idea of which parameters need to be changed from their default values. Mostly it will be the image version and the namespace where the CNS pods will run; in my case it was as follows
# cat cns.yaml
openshift_storage_glusterfs_name: cnstestelko
openshift_storage_glusterfs_version: 3.3.0-362
openshift_storage_glusterfs_heketi_version: 3.3.0-362
  • execute the playbook
# ansible-playbook -i original_cluster_install_inventory.yaml openshift-ansible/playbooks/byo/openshift-glusterfs/config.yml -e @cns.yaml

After the ansible run finishes, there will be a new project named glusterfs and a new storage class which can then be used when PVCs are created:

# kubectl get storageclass
NAME                    TYPE
glusterfs-cnstestelko   kubernetes.io/glusterfs 
 
# kubectl get -o yaml storageclass glusterfs-cnstestelko
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2017-11-22T15:48:04Z
  name: glusterfs-cnstestelko
  resourceVersion: "3196130"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/glusterfs-cnstestelko
  uid: 871fd59e-cf9c-11e7-bb0a-d4bed9b390df
parameters:
  resturl: http://heketi-cnstestelko-glusterfs.router.default.svc.cluster.local
  restuser: admin
  secretName: heketi-cnstestelko-admin-secret
  secretNamespace: glusterfs
provisioner: kubernetes.io/glusterfs
# cat pvc.json 
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "pvctest",
    "annotations": {
      "volume.beta.kubernetes.io/storage-class": "glusterfs-cnstestelko"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}
# kubectl apply -f pvc.json

# kubectl get pvc 
NAME      STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS            AGE
pvctest   Bound     pvc-ccb0d8a6-cfa6-11e7-bb0a-d4bed9b390df   1Gi        RWO           glusterfs-cnstestelko   27s

For more details about OpenShift/CNS check the OpenShift documentation. Also, check openshift_storage_glusterfs.

#cns, #container-native-storage, #linux, #openshift, #openshift-container-platform

setup cassandra cluster on centos7

Set up a test cassandra cluster. On a minimal CentOS 7 install, add the cassandra repository in /etc/yum.repos.d/:

# cat cassandra.repo

[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS

Install cassandra package

# yum install -y cassandra 

Edit /etc/cassandra/default.conf/cassandra.yaml and set the parameters below:

seeds: "ip1,ip2,ip3" 
listen_address 
rpc_address 

Adapt the above to your specific cluster environment; listen_address and rpc_address have to be the address of the cassandra node itself.
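For example, on the first node of a three node cluster with addresses 192.168.0.1-3 ( example addresses ), the relevant parts of cassandra.yaml would look roughly like this:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.0.1,192.168.0.2,192.168.0.3"

listen_address: 192.168.0.1
rpc_address: 192.168.0.1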

Open ports 7000/tcp and 9042/tcp:

firewall-cmd --zone=public --permanent --add-port=7000/tcp
firewall-cmd --zone=public --permanent --add-port=9042/tcp
systemctl restart firewalld

start cassandra on all three boxes

# service cassandra start 
# chkconfig cassandra on

After this, nodetool status should list the cassandra nodes:

# nodetool status 
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load          Tokens            Owns (effective)  Host ID             Rack
UN  ip1  192.33 KiB  256          66.7%             ff1970ad-28d0-41fe-b749-30bfdb0912b3  rack1
UN  ip2  198.04 KiB  256          65.9%             9f72ab9a-7ea8-4b65-b0af-61006697c0fa  rack1
UN  ip3  179.76 KiB  256          67.4%             ef3ca4be-9c1c-4be6-bbca-c54239ad104c  rack1

#cassandra, #centos7

glustersummit2017

Glustersummit 2017 finished today in Prague, Czechia, and I can say it was one of the best conferences I have ever attended. This is my ( probably subjective ) feeling, but everything was top level, from the organization to the ideas presented during the talks.

I had the opportunity, together with my colleague Shekhar Berry, to present our work on the topic Scalability and Performance with Brick Multiplexing; the whole list of talks presented can be found in the gluster summit schedule.

Slides of our presentation can be found at this link.

Group photo is at this link

#cns, #gluster, #glustersummit2017, #kubernetes, #openshift, #redhat

openshift metrics with dynamic storage provision

OpenShift metrics supports persistent storage ( the USE_PERSISTENT_STORAGE=true option ) and dynamic storage provisioning ( DYNAMICALLY_PROVISION_STORAGE=true ) using storage classes. To learn more about kubernetes storage classes, read the kubernetes documentation.

In order to use these parameters, it is necessary to:

  • have the OCP platform configured for the particular storage backend; check the OCP documentation for details on how to configure OCP to work with a particular storage backend. OCP supports many storage backends, so pick the one which is closest to your use case.
  • configure a storage class which will provide the persistent storage used by OpenShift metrics ( an illustrative deployer invocation is sketched below )
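With those pieces in place, deploying metrics via the metrics deployer template would look roughly like the sketch below; the hostname, Cassandra volume size and project name are illustrative, and the exact parameter set depends on the OCP/origin-metrics version in use.

# oc project openshift-infra
# oc new-app -f metrics-deployer.yaml \
    -p HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.apps.example.com \
    -p USE_PERSISTENT_STORAGE=true \
    -p DYNAMICALLY_PROVISION_STORAGE=true \
    -p CASSANDRA_PV_SIZE=10Gi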

In the metrics templates we can see there is also a dynamic PV template, where the Persistent Volume Claim is defined as:

- kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    name: "${PV_PREFIX}-${NODE}"
    labels:
      metrics-infra: hawkular-cassandra
    annotations:
      volume.beta.kubernetes.io/storage-class: "dynamic"
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: "${PV_SIZE}"

Until the storage class name is exposed as a parameter upstream, it is either necessary to have a storage class named dynamic, or, if there is a need to use a storage class with a different name, to rebuild the metrics images. Rebuilding the metrics images is quite easy and can be done following the steps below ( I assume the images already have the volume.beta.kubernetes.io/storage-class part built in ):

# git clone https://github.com/openshift/origin-metrics
# do necessary changes in template related to storageclass 

# cd origin-metrics/hack 
# ./build-images.sh --prefix=new_image --version=1.0

Once this finishes, the name of the desired storage class will be part of the deployer pod, and configuring metrics with dynamic storage provisioning will work as expected.

#dynamic-storage-provisioning, #kubernetes-dynamic-storage, #ocp, #openshift-container-platform, #openshift-metrics