ceph rbd block device as persistent storage for openshift

After the Fedora 23 – ceph installation and Fedora 23 – Openshift installation posts, it is now time to hook the Openshift environment up to a Ceph storage backend.

Openshift pods will use Ceph RADOS block devices ( RBD ) as persistent storage. One way to achieve this is to follow the steps below:

  • create a ceph pool and the desired number of images on top of it. This can be done manually or with the ceph-pool-setup.sh script. If ceph-pool-setup.sh is used, read its README before running it.
  • create a ceph-secret file. ceph-secret can be used as an example.
  • define a persistent volume and a persistent volume claim. Example YAML files: ceph-pv and ceph-pv-claim
  • create a pod file adapted to use the ceph pv and the ceph pv claim: pod-pv-pvc-ceph
  • in the above examples it is necessary to change the variables to suit your environment ( ceph pool name, ceph monitor(s) ip addresses … )
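
As a rough illustration of the persistent volume step, the sketch below writes a PersistentVolume definition for an RBD image to a scratch file. It is not the ceph-pv file referenced above: the monitor address, image name, user and reclaim policy are placeholder assumptions and have to be replaced with values from your own cluster.

```shell
# Write a hypothetical PersistentVolume definition for an RBD image.
# Every value below (monitor address, pool, image, names) is a placeholder.
pv_file="${TMPDIR:-/tmp}/ceph-pv-sketch.yaml"
cat > "$pv_file" <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  rbd:
    monitors:
      - 192.0.2.10:6789
    pool: mypool
    image: myimage
    user: admin
    secretRef:
      name: ceph-secret
    fsType: ext4
    readOnly: false
  persistentVolumeReclaimPolicy: Retain
EOF
echo "wrote $pv_file"
```

The pool and image named here must already exist on the Ceph side, and the secret name must match the secret created in Openshift.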

    Once all is in place, running the commands below, first on the Ceph cluster and then on the Openshift master, will create a pod which will in turn start using rbd as persistent storage.

    Ceph side

    # ./ceph-pool-setup -a c -p mypool -i 1 -r 3 -s 1

    This will create a three-way replicated ceph pool named mypool, with one 1 GB image on top of it.
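
For the manual route, the steps might look roughly like the sketch below. This is not what ceph-pool-setup.sh does internally (I have not reproduced the script here). The image name and the placement group count are assumptions, and the helper only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run helper: prints each command instead of executing it.
# Set DRY_RUN=0 on a real Ceph admin node to actually run the commands.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pool=mypool        # pool name from the example above
image=myimage      # hypothetical image name
pg_num=128         # assumed placement group count, size it for your cluster

create_pool_and_image() {
    run ceph osd pool create "$pool" "$pg_num"
    run ceph osd pool set "$pool" size 3          # three-way replication
    run rbd create "$pool/$image" --size 1024     # size in MB, so 1 GB
}
create_pool_and_image
```

Printing the commands first makes it easy to check the pool and image names before anything is created.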

    Openshift side

    # oc create -f ceph-secrets.yaml
    # oc create -f ceph-pv.yaml 
    # oc create -f ceph-pv-claim.yaml
    # oc create -f pod-pv-pvc-ceph.json

    If all is fine, the pod should start and mount the rbd device inside the pod, preformatted with an ext4 file system

    # oc rsh pod 
    # mount | grep rbd 
    /dev/rbd0 on /mnt/ceph type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)

    This setup enables openshift pods to use a ceph rbd device as persistent storage. If the pod is removed and started on some other openshift node, it will get the same data, provided that node has access to the rbd device which was used before the pod was deleted. As the name says, this is a persistent volume and it should persist across pod re-creation.

    install openshift origin / OS Fedora 23

    Installing Openshift origin on Fedora 23 is shown below. Overall it is not a difficult task to get a test environment for openshift origin
    ( really only for testing, as this is going to be master and node on one machine, inside a kvm guest )

    Following the steps below will lead to a test openshift origin environment.

    Openshift origin bits are published on the github openshift releases page. Below is what I did to get it working in under 10 minutes.

    # dnf install -y docker; systemctl enable docker; systemctl start docker 
    # mkdir /root/openshift
    # cd /root/openshift
    # wget https://github.com/openshift/origin/releases/download/v1.1.1/openshift-origin-server-v1.1.1-e1d9873-linux-64bit.tar.gz
    # tar -xaf openshift-origin-server-v1.1.1-e1d9873-linux-64bit.tar.gz
    # cd openshift-origin-server-v1.1.1-e1d9873-linux-64bit
    # ./openshift start &

    After this, besides the files unpacked from the archive, the openshift configuration directories will be created in the openshift directory

    # ls -l 
    drwxr-xr-x. 4 root root        46 Jan 19 18:44 openshift.local.config
    drwx------. 3 root root        20 Jan 19 20:03 openshift.local.etcd
    drwxr-x---. 4 root root        33 Jan 19 18:44 openshift.local.volumes

    From here, it is necessary to export the paths to the keys and certificates

    # export KUBECONFIG="$(pwd)"/openshift.local.config/master/admin.kubeconfig
    $ export CURL_CA_BUNDLE="$(pwd)"/openshift.local.config/master/ca.crt
    $ sudo chmod +r "$(pwd)"/openshift.local.config/master/admin.kubeconfig

    That is it! Follow rc-local Fedora 23 to make it start on boot, or write systemd unit files using openshift-master-service and openshift-node-service as starting points, which should work with small tweaks

    prevent NetworkManager from updating /etc/resolv.conf

    NetworkManager updates /etc/resolv.conf by default. If you do not want it to do that, set /etc/resolv.conf to the desired value, then edit /etc/NetworkManager/NetworkManager.conf and add dns=none to the [main] section.


    This will prevent updating of /etc/resolv.conf by NetworkManager
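
A minimal sketch of that edit, done on a scratch copy so nothing on the system is touched. The starting content of the file is a made-up example. On a real machine the target is /etc/NetworkManager/NetworkManager.conf, and NetworkManager most likely needs a restart to pick up the change.

```shell
# Work on a scratch copy; the real file is /etc/NetworkManager/NetworkManager.conf.
conf="$(mktemp)"
printf '[main]\nplugins=ifcfg-rh\n' > "$conf"    # hypothetical starting content

# Add dns=none right after the [main] header unless it is already set.
if ! grep -q '^dns=none$' "$conf"; then
    awk '{print} /^\[main\]$/ {print "dns=none"}' "$conf" > "$conf.new" \
        && mv "$conf.new" "$conf"
fi
cat "$conf"
```

The grep guard keeps the edit idempotent, so running it twice does not add the line twice.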

    copy/edit partition table with sfdisk

    sfdisk is a nice tool for working with disk partitions. It has many features and is very useful when it is necessary to change disk partitions. Before doing anything with sfdisk I recommend reading the sfdisk man page to get a basic picture of what sfdisk is and what it can be used for. If not used carefully it can be a dangerous command, especially if pointed at the wrong device, so … think before running it
    I needed it to clone the partition table of one sdcard to another ( fdisk can do this too )

    To save partition table, I did

    # sfdisk --dump /dev/sdb > 16gcard

    The 16gcard dump file now contains

    # cat 16gcard
    label: dos
    label-id: 0x00000000
    device: /dev/sdb
    unit: sectors
    /dev/sdb1 : start=        8192, size=    31108096, type=c

    This is what I need. However, the new card is double the size ( 32 GB ), and writing the above to the new card would occupy just the first 16 GB. Luckily, sfdisk is a very versatile tool and it allows editing the partition dump and then writing it back to disk. Open 16gcard in a text editor ( eg. Vim ) and edit the dump file. If the original size is 31108096 sectors of 512 B, then the new size is 61399040 sectors of 512 B, and the new dump file is

    # cat 16gcard 
    label: dos
    label-id: 0x00000000
    device: /dev/sdb
    unit: sectors
    /dev/sdb1 : start=        8192, size=    61399040, type=c
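
The new size in the dump is simply the total number of sectors on the new card minus the start offset of the partition. A small sketch of that arithmetic, using the 61407232 sectors that fdisk -l reports for the new card and the start value from the dump ( the function name is my own ):

```shell
# Sectors available to a partition that starts at sector $2 and runs to
# the end of a disk with $1 sectors in total.
partition_size() {
    echo $(( $1 - $2 ))
}

total_sectors=61407232   # total sectors of the new card, as reported by fdisk -l
start_sector=8192        # start= value from the dump file
partition_size "$total_sectors" "$start_sector"   # prints 61399040
```

On a live system the total can also be read with blockdev --getsz /dev/sdb, which reports the size in 512-byte sectors.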

    Now I can write it to the new card

    # sfdisk /dev/sdb < 16gcard

    and fdisk -l shows

    #  fdisk -l /dev/sdb
    Disk /dev/sdb: 29.3 GiB, 31440502784 bytes, 61407232 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x00000000
    Device     Boot Start      End  Sectors  Size Id Type
    /dev/sdb1  *     2048 61407231 61405184 29.3G  c W95 FAT32 (LBA)

    This is the very same partition table as the one I had on the old card, except for the last sector, which is adapted to suit the size of the new card.

    for all deleted … kill all processes

    My / file system got full. After checking everything else, as a last resort I looked at lsof to check whether there were deleted files still held open by some process, and yes, there were many of them!
    Sometimes a simple for loop can save a lot of time…
    # for m in $(lsof | grep '(deleted)' | awk '{print $2}' | sort -u); do kill -9 $m; done
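
Since kill -9 is not reversible, it can be worth listing the affected PIDs before killing anything. A small sketch of that, where the helper name and the sample lsof output are made up for illustration:

```shell
# Print the unique PIDs that hold deleted files, given lsof output on stdin.
deleted_pids() {
    awk '/\(deleted\)/ {print $2}' | sort -un
}

# Hypothetical sample of lsof output, for illustration only.
sample='COMMAND  PID USER  FD TYPE DEVICE SIZE/OFF NODE NAME
rsyslogd 812 root  7w  REG  253,0  1048576 1234 /var/log/messages (deleted)
rsyslogd 812 root  8w  REG  253,0  2097152 1235 /var/log/secure (deleted)
java     2001 app  12w REG  253,0  4194304 1300 /tmp/app.log (deleted)
sshd     950 root  3u  IPv4 20000  0t0     TCP  *:22 (LISTEN)'

printf '%s\n' "$sample" | deleted_pids
```

On a real system this would be run as lsof | deleted_pids, and the resulting list reviewed before passing it to kill.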



    If you install docker-1.7 ( or later ), the docker-storage-setup script is delivered with it

    # rpm -q docker
    # rpm -ql docker | grep storage

    /usr/bin/docker-storage-setup is responsible for docker storage configuration. It is a shell script, so if you open and read it you can see everything it does.

    docker-storage-setup follows the instructions in /usr/lib/docker-storage-setup/docker-storage-setup and configures the docker storage backend based on the parameters there. In most cases you are fine with the defaults, but you can change them if you want.

    If you decide to step away from the default values ( eg. to give the volume group a name you like, or to specify the block devices you want to use… ), that is possible too. All you need to do is

    cp /usr/lib/docker-storage-setup/docker-storage-setup /etc/sysconfig

    edit /etc/sysconfig/docker-storage-setup

    and start docker

    systemctl start docker

    In docker-storage-setup, interesting values are

    DEVS – list of block devices to be used for docker storage
    VG – volume group name
    CHUNK_SIZE – default is 512K

    Pay attention to CHUNK_SIZE: a value of 512K was shown to be the most efficient from a storage backend performance point of view. See Determine the default optimal chunk/block size for docker workload and Adjust default chunksize on thinp volume created by docker-storage-setup
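
For illustration, an edited /etc/sysconfig/docker-storage-setup could contain something like the lines below. The device and volume group names are placeholders, not recommendations. The file uses plain shell variable assignments.

```shell
# Hypothetical overrides for /etc/sysconfig/docker-storage-setup.
DEVS="/dev/vdb"        # block device(s) to hand over to docker storage (placeholder)
VG="docker-vg"         # volume group name of your choice (placeholder)
CHUNK_SIZE="512K"      # the default value mentioned above
```

Any block device listed in DEVS will be taken over for docker storage, so double check the device name before starting docker.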

    nsenter … for container entering

    the command below applies when there is exactly one container in the docker ps output

    nsenter -m -u -n -i -p -t $(docker inspect --format '{{.State.Pid}}' $(docker ps -q))

    what if you do not want to run docker pull?

    The Red Hat Security blog post before you initiate docker pull explains why running docker pull ( in the current state of docker ) cannot really be considered a wise step.

    Unfortunately, all the instructions out there for getting started with docker point exclusively to docker pull as the way to go, leaving little room for the discussion from the above blog post.

    If you are security aware ( some would say paranoid ), then you have a couple of options to deal with this

    1. divide the pull and load steps. This means getting a .tar.gz archive of the docker image with wget ( eg. Fedora docker images in .tar.gz format ) and then loading it with docker load. This way you achieve the same result, but before loading the image you can check the md5 sum of the image and ensure it is really the one you want.
    I provided a link to Fedora docker images in .tar.gz format. Asking your favorite search ( evil ) machine will guide you to other images, such as Centos or Debian.

    I recommend reading the articles below in order to understand a bit more about how docker handles things in the background

    a. docker insecurity blog post
    b. Cryptographic Signing of Core Images

    2. With 1. implemented, there is no longer any need to get images from the internet, as long as there is no updated base image ( in the above case the Fedora image ) that you want to use. From this point you can use docker for your purposes, and if you want to build a new image you can run docker build --tag=mynewimage . in the directory containing a Dockerfile which specifies that the new image is built from one you already have. Beware that if docker build is not able to find the image to build from, it will search hub.docker.com, so read the articles above to figure out how to prevent this.
    The Docker team wrote an excellent document on writing dockerfiles and I recommend reading it in advance to get better insight into the Dockerfile structure.

    3. You can use docker-registry and build your local docker registry from which you can pull images without internet interaction. In order to download images from the docker registry you will first need to upload the ones you want there.

    In future blog posts I will write a detailed howto on creating base image(s), configuring and starting your local docker registry service, and uploading images to it
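
Option 1 above ( download, verify, then load ) could be sketched as below. The function names are my own, and the archive name and checksum passed to them are whatever you downloaded and whatever the image publisher lists. Nothing is loaded unless the md5 sum matches.

```shell
# Return success only if the md5 sum of archive $1 equals the expected value $2.
verify_image() {
    archive="$1"
    expected="$2"
    actual="$(md5sum "$archive" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Load the image into docker only after the checksum has matched.
load_verified_image() {
    if verify_image "$1" "$2"; then
        docker load -i "$1"
    else
        echo "checksum mismatch for $1, not loading" >&2
        return 1
    fi
}
```

After fetching the archive with wget, the call would look like load_verified_image image.tar.gz followed by the published md5 sum, where image.tar.gz is a placeholder name.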


    Fedora 21 upgrade and Intellij IDEA issue

    I am using the Intellij IDE for python and I am very happy with it. It is free and works great ( there is also wingware, but you will need to pay for a license if you want to use it ). Recently, after the Fedora 20 -> Fedora 21 upgrade, I had an issue where Intellij IDE was not able to start.

    The error message it reported was
    OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=350m; support was removed in 8.0
    Exception in thread "main" java.lang.Error: java.io.FileNotFoundException: /usr/lib/jvm/java-1.8.0-openjdk- (No such file or directory)

    The tzdb.dat file was present at the specified location, but it was pointing to an empty location

    # pwd; ls -l | grep tzdb
    lrwxrwxrwx. 1 root root 30 Nov 10 13:42 tzdb.dat -> /usr/share/javazi-1.8/tzdb.dat

    It turned out that with the latest Fedora 21 update the tzdata-java package

    # rpm -qf javazi-1.8

    # yum update tzdata-java

    was not updated properly. Reinstalling this package fixed the above issue. I know that Fedora 21 will be widely used soon after it is released, and hopefully this can save some of you some time

