ceph rbd block device as persistent storage for openshift

After the Fedora 23 – ceph installation and Fedora 23 – Openshift installation setup posts, it is now time to hook the OpenShift environment up to a Ceph storage backend.

OpenShift pods will use Ceph RADOS block devices (RBD) as persistent storage. One way to achieve this is to follow the steps below:

  • Create a Ceph pool and the desired number of images on top of it. This can be done manually or with the ceph-pool-setup.sh script. If you use ceph-pool-setup.sh, read its README before running it.
  • Create a ceph-secret file. The ceph-secret example can be used as a starting point.
  • Define a persistent volume and a persistent volume claim. Example YAML files: ceph-pv and ceph-pv-claim.
  • Create a pod file adapted to use the Ceph PV and PV claim, for example pod-pv-pvc-ceph.
  • In the examples above, change the variables to suit your environment (Ceph pool name, Ceph monitor IP address(es) …).
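The linked example files are not reproduced here. As a rough sketch of what they contain, the hypothetical Bash helpers below print a minimal Secret, PersistentVolume and PersistentVolumeClaim for the Kubernetes rbd volume plugin. The object names (ceph-secret, ceph-pv, ceph-claim), the admin Ceph user and the monitor address are assumptions to adapt. The secretRef name must match the secret, and the claimName in the pod file must match the claim. The Secret's data field has to be base64 encoded, which is why the Ceph key is passed through base64 once more.

```shell
#!/usr/bin/env bash
# Hypothetical helpers printing minimal object definitions for the
# Kubernetes "rbd" volume plugin. All names below are placeholders.

# $1 = Ceph key as printed by "ceph auth get-key client.admin"
ceph_secret_yaml() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: $(printf '%s' "$1" | base64 -w0)
EOF
}

# $1 = monitor ip:port, $2 = pool, $3 = image, $4 = capacity (e.g. 1Gi)
ceph_pv_yaml() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-pv
spec:
  capacity:
    storage: $4
  accessModes:
    - ReadWriteOnce
  rbd:
    monitors:
      - $1
    pool: $2
    image: $3
    user: admin
    secretRef:
      name: ceph-secret
    fsType: ext4
    readOnly: false
EOF
}

# $1 = requested size, must fit inside the PV capacity
ceph_pvc_yaml() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $1
EOF
}
```

Example usage, assuming the helpers run where the Ceph admin key is available (the monitor address and the image name image1 are placeholders):

    # ceph_secret_yaml "$(ceph auth get-key client.admin)" > ceph-secrets.yaml
    # ceph_pv_yaml 192.168.122.10:6789 mypool image1 1Gi > ceph-pv.yaml
    # ceph_pvc_yaml 1Gi > ceph-pv-claim.yaml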

    Once everything is in place, run the commands below, first on the Ceph cluster and then on the OpenShift master. This creates a pod that uses the RBD image as persistent storage.

    Ceph side

    # ./ceph-pool-setup.sh -a c -p mypool -i 1 -r 3 -s 1

    This creates a three-way replicated Ceph pool named mypool, with one 1 GB image on top of it.
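    If you prefer not to use the script, the pool can also be created by hand with the ceph and rbd command line tools. The sketch below is my assumption of a manual equivalent, not what ceph-pool-setup.sh actually runs: the placement group count (128) and the image names (image1, image2, and so on) are hypothetical. With DRY_RUN=1 it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical manual equivalent of:
#   ceph-pool-setup.sh -a c -p POOL -i IMAGES -r REPLICAS -s SIZE_GB
# Assumptions: pg_num/pgp_num default to 128, images are named image1..imageN.
# Set DRY_RUN=1 to print the commands instead of executing them.

run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = pool, $2 = number of images, $3 = replica count,
# $4 = image size in GB, $5 = pg_num (optional)
create_rbd_pool() {
  local pool="$1" images="$2" replicas="$3" size_gb="$4" pg_num="${5:-128}" i
  run_cmd ceph osd pool create "$pool" "$pg_num" "$pg_num" || return 1
  run_cmd ceph osd pool set "$pool" size "$replicas" || return 1
  for ((i = 1; i <= images; i++)); do
    # rbd create takes --size in megabytes
    run_cmd rbd create "$pool/image$i" --size "$((size_gb * 1024))" || return 1
  done
}
```

    A dry run for the pool above prints:

    # DRY_RUN=1 create_rbd_pool mypool 1 3 1
    ceph osd pool create mypool 128 128
    ceph osd pool set mypool size 3
    rbd create mypool/image1 --size 1024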

    Openshift side

    # oc create -f ceph-secrets.yaml
    # oc create -f ceph-pv.yaml 
    # oc create -f ceph-pv-claim.yaml
    # oc create -f pod-pv-pvc-ceph.json

    If all is fine, the pod starts and the RBD device is mounted inside it with a preformatted ext4 file system:

    # oc rsh pod 
    # mount | grep rbd 
    /dev/rbd0 on /mnt/ceph type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)

    This setup lets OpenShift pods use a Ceph RBD device as persistent storage. If the pod is removed and started on some other OpenShift node, it gets the same data, provided that node has access to the RBD device the pod used before it was deleted. As the name says, this is a persistent volume, and it should persist across pod re-creation.
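    The persistence claim is easy to check by hand. The hypothetical function below writes a marker file to the RBD mount, deletes and re-creates the pod, then reads the marker back. The pod name, the pod definition file and the mount point (/mnt/ceph, as in the mount output above) are placeholders, and the pod name must be the one defined in the pod file, because oc create takes the name from there.

```shell
#!/usr/bin/env bash
# Hypothetical persistence check for an rbd-backed pod.
# $1 = pod name, $2 = pod definition file, $3 = mount point (default /mnt/ceph)
check_rbd_persistence() {
  local pod="$1" pod_file="$2" mount="${3:-/mnt/ceph}" marker i
  marker="persist-$(date +%s)"

  # write a unique marker onto the rbd-backed file system
  oc exec "$pod" -- sh -c "echo $marker > $mount/marker" || return 1
  oc delete pod "$pod" || return 1

  # wait until the old pod is really gone before re-creating it
  for ((i = 0; i < 60; i++)); do
    oc get pod "$pod" >/dev/null 2>&1 || break
    sleep 2
  done
  oc create -f "$pod_file" || return 1

  # the new pod may land on another node; poll until it can read the marker
  for ((i = 0; i < 60; i++)); do
    if [ "$(oc exec "$pod" -- cat "$mount/marker" 2>/dev/null)" = "$marker" ]; then
      echo "data persisted"
      return 0
    fi
    sleep 2
  done
  echo "marker not found" >&2
  return 1
}
```

    For example: check_rbd_persistence pod pod-pv-pvc-ceph.json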


    #ceph, #ceph-rbd, #ceph-storage, #kubernetes, #openshift

    install openshift origin / OS Fedora 23

    Installing OpenShift Origin on Fedora 23 is shown below. Overall, it is not a difficult task to get an OpenShift Origin test environment that can then be used for testing
    (really only for testing, as this is going to be master and node on one machine, in a KVM guest).

    Following the steps below leads to an OpenShift Origin test environment.

    OpenShift Origin bits are published on GitHub in openshift releases; below is what I did to get it working in under 10 minutes.

    # dnf install -y docker; systemctl enable docker; systemctl start docker 
    # mkdir /root/openshift
    # cd /root/openshift
    # wget https://github.com/openshift/origin/releases/download/v1.1.1/openshift-origin-server-v1.1.1-e1d9873-linux-64bit.tar.gz
    # tar -xaf openshift-origin-server-v1.1.1-e1d9873-linux-64bit.tar.gz
    # cd openshift-origin-server-v1.1.1-e1d9873-linux-64bit
    # ./openshift start &
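    Because openshift start runs in the background, the master may need a little while before it answers. A small polling loop helps before continuing. It assumes that the all-in-one master serves its health check at https://localhost:8443/healthz and answers "ok" when ready; the URL and retry count are parameters you can change.

```shell
#!/usr/bin/env bash
# Poll the master health endpoint until it answers "ok".
# Assumption: the all-in-one master listens on https://localhost:8443.
# $1 = health URL (optional), $2 = number of attempts, 2 s apart (optional)
wait_for_master() {
  local url="${1:-https://localhost:8443/healthz}" tries="${2:-60}" i
  for ((i = 0; i < tries; i++)); do
    # -k: the master certificates are signed by its own freshly generated CA
    if [ "$(curl -fsk "$url" 2>/dev/null)" = "ok" ]; then
      echo "master is up"
      return 0
    fi
    sleep 2
  done
  echo "master did not report healthy at $url" >&2
  return 1
}
```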

    After this, besides the files unpacked from the archive, the OpenShift configuration and data directories are created in the directory where openshift was started:

    # ls -l 
    drwxr-xr-x. 4 root root        46 Jan 19 18:44 openshift.local.config
    drwx------. 3 root root        20 Jan 19 20:03 openshift.local.etcd
    drwxr-x---. 4 root root        33 Jan 19 18:44 openshift.local.volumes

    From here, it is necessary to export the paths to the admin kubeconfig and the CA certificate:

    # export KUBECONFIG="$(pwd)"/openshift.local.config/master/admin.kubeconfig
    $ export CURL_CA_BUNDLE="$(pwd)"/openshift.local.config/master/ca.crt
    $ sudo chmod +r "$(pwd)"/openshift.local.config/master/admin.kubeconfig

    That's it! To make it start on boot, follow rc-local Fedora 23, or write systemd unit files using openshift-master-service and openshift-node-service as starting points; these should work with small tweaks.
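    For the all-in-one setup used here, a single unit may be enough. The hypothetical helper below prints a minimal unit; the unit name openshift.service is an assumption. WorkingDirectory has to be the directory holding openshift.local.config, so that the existing configuration is reused instead of a new one being generated. Stop the background openshift start process before enabling the unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal systemd unit for the all-in-one
# "openshift start" (master and node in one process).
# $1 = directory containing the openshift binary and openshift.local.config
openshift_unit() {
  local dir="$1"
  cat <<EOF
[Unit]
Description=OpenShift Origin all-in-one (master + node)
Requires=docker.service
After=docker.service network-online.target

[Service]
WorkingDirectory=$dir
ExecStart=$dir/openshift start
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

    For example, with the directory used above:

    # openshift_unit /root/openshift/openshift-origin-server-v1.1.1-e1d9873-linux-64bit > /etc/systemd/system/openshift.service
    # systemctl daemon-reload; systemctl enable openshift; systemctl start openshift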

    #openshift, #openshift-origin

    prevent NetworkManager from updating /etc/resolv.conf

    NetworkManager updates /etc/resolv.conf. If you do not want it to do that, set /etc/resolv.conf to the desired values, then edit /etc/NetworkManager/NetworkManager.conf and add dns=none to the [main] section, like below:

    [main]
    dns=none

    This prevents NetworkManager from updating /etc/resolv.conf.
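    To make the same change without an editor, for example on several machines, a hypothetical awk-based helper can set dns=none in the [main] section (adding the section if it is missing, replacing an existing dns= line) and keep a .bak copy of the original file. It only recognises lines written exactly as dns=... and [main]. NetworkManager has to be restarted afterwards for the change to take effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ensure "dns=none" is set in the [main] section of
# a NetworkManager.conf-style file. Safe to run more than once.
set_nm_dns_none() {
  local conf="$1" tmp
  cp -p "$conf" "$conf.bak" || return 1
  tmp=$(mktemp) || return 1
  awk '
    /^\[/ {                                   # a new section starts
      if (in_main && !done) { print "dns=none"; done = 1 }
      in_main = ($0 == "[main]")
      if (in_main) seen_main = 1
      print
      next
    }
    in_main && /^dns=/ {                      # replace any existing dns= line
      if (!done) { print "dns=none"; done = 1 }
      next
    }
    { print }
    END {
      if (!seen_main) print "[main]"          # no [main] section at all
      if (!done) print "dns=none"             # [main] was the last section
    }' "$conf" > "$tmp" && cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

    # set_nm_dns_none /etc/NetworkManager/NetworkManager.conf
    # systemctl restart NetworkManager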

    #dns-configuration-fedora, #fedora-2, #networkmanager

    copy/edit partition table with sfdisk

    sfdisk is a nice tool for playing with disk partitions. It has many features and is very useful when changes to disk partitions are necessary. Before doing anything with sfdisk, I recommend reading the sfdisk man page to get a basic picture of what sfdisk is and what it can be used for. If not used carefully, it can be a dangerous command, especially if pointed at the wrong device, so think before running it.
    I needed it to clone the partition table of one SD card to another (fdisk can do this too).

    To save the partition table, I ran:

    # sfdisk --dump /dev/sdb > 16gcard

    The 16gcard dump file now contains:

    # cat 16gcard
    label: dos
    label-id: 0x00000000
    device: /dev/sdb
    unit: sectors
    /dev/sdb1 : start=        8192, size=    31108096, type=c

    This is what I need. However, the new card is twice the size (32 GB), so writing the dump above to it would use only the first 16 GB. Luckily, sfdisk is a very versatile tool: it allows editing the partition dump and then writing it back to disk. Open 16gcard in a text editor (e.g. Vim) and edit the dump file. The original partition is 31108096 sectors of 512 B. The new card has 61407232 sectors (see the fdisk -l output below), so a partition that still starts at sector 8192 and runs to the end of the card is 61407232 - 8192 = 61399040 sectors long. The new dump file is:

    # cat 16gcard 
    label: dos
    label-id: 0x00000000
    device: /dev/sdb
    unit: sectors
    /dev/sdb1 : start=        8192, size=    61399040, type=c

    Now I can write it to the new card:

    # sfdisk /dev/sdb < 16gcard

    and fdisk -l shows

    #  fdisk -l /dev/sdb
    Disk /dev/sdb: 29.3 GiB, 31440502784 bytes, 61407232 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x00000000
    Device     Boot Start      End  Sectors  Size Id Type
    /dev/sdb1  *     2048 61407231 61405184 29.3G  c W95 FAT32 (LBA)

    This is the same partition table I had on the old card, except for the last sector, which is adapted to the size of the new card.
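    The size calculation (total sectors minus start sector) can also be scripted instead of editing the dump by hand. The hypothetical helper below only handles a dump with a single partition line, like the one above, and refuses anything else. The disk size in 512-byte sectors can be read with blockdev --getsz; GNU sed is assumed for -i.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grow the single partition in an sfdisk dump so that
# it ends on the last sector of the target disk.
# $1 = dump file written by "sfdisk --dump", $2 = disk size in 512-byte sectors
grow_single_partition() {
  local dump="$1" total="$2" start size
  # extract the start sector, e.g. 8192 from "start=        8192,"
  start=$(sed -n 's/.*start= *\([0-9]*\),.*/\1/p' "$dump")
  case "$start" in
    '' | *[!0-9]*)
      echo "expected exactly one partition line in $dump" >&2
      return 1
      ;;
  esac
  # the partition covers every sector from its start to the end of the disk
  size=$((total - start))
  sed -i "s/size= *[0-9]*/size=    ${size}/" "$dump"
  echo "$size"
}
```

    With the new card as /dev/sdb:

    # grow_single_partition 16gcard "$(blockdev --getsz /dev/sdb)"
    # sfdisk /dev/sdb < 16gcard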

    #linux, #sfdisk, #storage

    for all deleted … kill all processes

    My / file system got full. After checking everything else, as a last resort I used lsof to check whether some deleted files were still held open by processes, and yes, there were many of them!
    Sometimes a simple for loop can save a lot of time. It sends SIGKILL to every process that holds a deleted file open, so look at the lsof output before running it:
    # for pid in $(lsof -nP | awk '/\(deleted\)/ {print $2}' | sort -u); do kill -9 "$pid"; done
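    Before killing anything, it helps to see which processes are involved. As an alternative to parsing lsof output, the hypothetical helper below walks /proc/<pid>/fd directly and prints the PIDs whose open file descriptors point to deleted files. Run it as root to see other users' processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list PIDs that still hold deleted files open.
# Each /proc/<pid>/fd/<n> entry is a symlink whose target ends in
# " (deleted)" once the file has been unlinked.
pids_holding_deleted_files() {
  find /proc/[0-9]*/fd -lname '*(deleted)' 2>/dev/null \
    | cut -d/ -f3 \
    | sort -un
}
```

    To review the processes before killing them:

    # ps -o pid,comm,args -p "$(pids_holding_deleted_files | paste -sd, -)"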



    If you install docker-1.7 (or later), the docker-storage-setup script is delivered with it:

    # rpm -q docker
    # rpm -ql docker | grep storage

    /usr/bin/docker-storage-setup is responsible for Docker storage configuration. It is a shell script, so if you open and read it, you can see everything it does.

    docker-storage-setup follows the settings in /usr/lib/docker-storage-setup/docker-storage-setup and configures the Docker storage backend based on the parameters there. In most cases the defaults are fine, but you can change them if you want.

    If you decide to step away from the default values (e.g. to choose your own volume group name, or to specify the block devices you want to use…), that is possible too. All you need to do is copy the defaults file:

    # cp /usr/lib/docker-storage-setup/docker-storage-setup /etc/sysconfig

    edit /etc/sysconfig/docker-storage-setup, and start docker:

    # systemctl start docker

    The most interesting values in docker-storage-setup are:

    DEVS – list of block devices to be used for docker storage
    VG – volume group name
    CHUNK_SIZE – default is 512K

    Pay attention to CHUNK_SIZE: a value of 512K has been shown to be the most efficient from the storage backend performance point of view. For details, check "Determine the default optimal chunk/block size for docker workload" and "Adjust default chunksize on thinp volume created by docker-storage-setup".
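    The edit of the copied file can also be done non-interactively. The hypothetical helper below sets KEY="VALUE" in /etc/sysconfig/docker-storage-setup, replacing an existing uncommented line or appending a new one. The device /dev/vdb and the volume group name docker-vg in the usage lines are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY="VALUE" in a docker-storage-setup file.
# An existing uncommented KEY= line is replaced; otherwise the line is appended.
# $1 = file, $2 = key (DEVS, VG, CHUNK_SIZE, ...), $3 = value
set_dss_value() {
  local conf="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$conf"; then
    # "|" as sed delimiter, so values containing "/" (device paths) are safe
    sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$conf"
  else
    printf '%s="%s"\n' "$key" "$value" >> "$conf"
  fi
}
```

    # cp /usr/lib/docker-storage-setup/docker-storage-setup /etc/sysconfig
    # set_dss_value /etc/sysconfig/docker-storage-setup DEVS /dev/vdb
    # set_dss_value /etc/sysconfig/docker-storage-setup VG docker-vg
    # set_dss_value /etc/sysconfig/docker-storage-setup CHUNK_SIZE 512K
    # systemctl start docker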

    #docker, #docker-performance, #docker-storage-setup, #linux, #thin-lvm

    nsenter … for container entering

    This applies when there is exactly one container in the docker ps output:

    nsenter -m -u -n -i -p -t "$(docker inspect --format '{{.State.Pid}}' "$(docker ps -q)")"
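    To pick a specific container instead of relying on a single entry in docker ps, the same command can be wrapped in a small hypothetical function that takes a container name or ID and an optional command to run (without one, nsenter starts a shell). .State.Pid is 0 for a stopped container, so that case is rejected.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: enter the namespaces of one container.
# $1 = container name or ID, remaining arguments = command to run (optional)
enter_container() {
  local pid
  if [ -z "$1" ]; then
    echo "usage: enter_container CONTAINER [COMMAND...]" >&2
    return 2
  fi
  pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
  if [ -z "$pid" ] || [ "$pid" = 0 ]; then
    echo "container $1 is not running" >&2
    return 1
  fi
  shift
  # mount, UTS, network, IPC and PID namespaces of the container's main process
  nsenter -m -u -n -i -p -t "$pid" "$@"
}
```

    # enter_container mycontainer ps aux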

    #docker, #nsenter