Tech Blog

In my last post, I went through the steps to build an active/passive Apache web hosting cluster on CentOS 7.  Building on the same platform, I'm going to add replicated block storage to our cluster by utilizing DRBD (Distributed Replicated Block Device).

Once we have it all set up, anything changed in the site on the active cluster node will be replicated to the passive node.  I've decided that in my next post in this series, I'm going to take our cluster to the next level by utilizing iSCSI storage and clustered LVM, so stay tuned for that as well.

I am utilizing virtual machines for my two hosts, so the first thing I'm going to do is add an additional 5 GB virtual drive to each of them.  Also keep in mind that I am leaving SELinux enabled, so we need to install a tool we can use to tell SELinux that it is OK to run DRBD.  If you decide to build a cluster on virtual machines, remember that your virtual machine infrastructure itself may not be fault tolerant.  So from both nodes:
# yum -y install policycoreutils-python

Then:
# semanage permissive -a drbd_t

Now SELinux should be OK with DRBD.  Also recall that we are using firewalld so now we need to allow DRBD traffic through our firewall as follows:
# firewall-cmd --permanent --add-port=7788-7799/tcp

Now let’s reload the firewall with our new settings:
# firewall-cmd --reload

Next we need to add a yum repository called ELRepo to our CentOS nodes.  You can find more info on it at http://elrepo.org.  First we will import the repo’s key:
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

Then install the repo:
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

And install the necessary DRBD tools:
# yum -y install drbd84-utils kmod-drbd84

That should take care of the housekeeping chores.  Now let's partition our new disk.  For the sake of simplicity, I am using a plain partition, but you could just as easily use a logical volume to take advantage of its added features.  In the next post in this series I plan to switch to the clustered logical volume manager, so we will dig into that more at that time.
# fdisk /dev/xvdb
Select “n” to create a new partition, accept the defaults at the prompts to use the whole disk, then select “w” to write the partition table and exit.  Do this on both nodes so each ends up with a /dev/xvdb1 partition.
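
If you prefer a non-interactive route, something along these lines should produce the same single partition spanning the disk (just a sketch, and it assumes your new disk really shows up as /dev/xvdb; the lsblk line is only there to confirm /dev/xvdb1 now exists):
# parted -s /dev/xvdb mklabel msdos mkpart primary 1MiB 100%
# lsblk /dev/xvdb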

We are ready to configure DRBD.  Copy the following resource file to /etc/drbd.d/r0.res on both nodes, MAKING SURE to change the host names as well as the volume info to match your setup.  Note the use of IP addresses instead of host names in the address lines.  That is a requirement.

resource drbd0 {
        on node1.theharrishome.lan {
                volume 0 {
                        device          /dev/drbd0;
                        disk            /dev/xvdb1;
                        flexible-meta-disk      internal;
                }
                address         192.168.1.180:7788;
        }

        on node2.theharrishome.lan {
                volume 0 {
                        device          /dev/drbd0;
                        disk            /dev/xvdb1;
                        flexible-meta-disk      internal;
                }
                address         192.168.1.181:7788;
        }
}
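
Before going any further, it doesn't hurt to make sure the resource file parses cleanly on each node; drbdadm will complain if it doesn't:
# drbdadm dump drbd0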

Now let's create the DRBD metadata on both nodes:
# drbdadm create-md drbd0

Now we need to load the DRBD kernel module on both nodes:
# modprobe drbd

To set the module to load at boot:
# echo drbd >> /etc/modules-load.d/drbd.conf
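
To confirm the module is actually loaded:
# lsmod | grep drbd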

And bring it up on both nodes:
# drbdadm up drbd0

Now we need to set one of the nodes as the primary.  To do that from node1:
# drbdadm primary --force drbd0

It should now begin to synchronize and this will probably take a while.  You can always check the status as follows:
# cat /proc/drbd
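
The exact counters will differ from mine, but while the initial sync is running the interesting line looks roughly like this (an approximation, not output copied from these nodes):
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    [=>..................] sync'ed:  8.2% ...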

Note in the output above where it says Primary/Secondary and UpToDate/Inconsistent.  This is what you want to see at this point.  It is telling us one node is the primary, one is the secondary, and the primary is up to date while the secondary is now syncing.  So now we wait until the synchronization is complete.

Once it is fully synchronized, we should see something like this:
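Roughly speaking (again, an approximation from memory rather than captured output), the line changes to Connected and UpToDate/UpToDate:
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----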

Now we can create a file system on our new replicated device.  If we were setting up a cluster where multiple nodes could write to it at the same time, we would need a file system designed for shared disk access, such as GFS2 or OCFS2.  For this cluster, only one node at a time will have the volume mounted, so ext4 will do nicely.  Format it from the primary node:
# mkfs.ext4 /dev/drbd0

Let’s mount it temporarily as follows:
# mount /dev/drbd0 /mnt

Now let's install PHP so we can use a small script for testing instead of the static web page we used last time:
# yum -y install php

And create a PHP page that responds with the host name, for testing:
# vi /mnt/index.php

<?php
echo gethostname();
?>

Save and close.
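
As a quick sanity check before involving Apache at all, the php command-line binary (pulled in as a dependency of the php package on CentOS 7) should print the local host name when pointed at the script:
# php /mnt/index.php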

Because we have SELinux enabled, we need to give our new file system the same SELinux context as the standard Apache document root, /var/www/html:
# chcon -R --reference=/var/www/html /mnt
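
To verify the context was applied:
# ls -Z /mnt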

Now unmount the partition:
# umount /mnt

From this point on, we don't want to bring DRBD up by hand with '# drbdadm up drbd0'; the cluster software should handle that.  So let's create a cluster resource for our DRBD device:
# pcs resource create WebData ocf:linbit:drbd drbd_resource=drbd0 op monitor interval=30s

If we left things as they are, the DRBD resource would only run on one cluster node at a time, but recall from above that DRBD has to be running on both nodes.  To take care of that, we add a master/slave clone of the WebData resource as follows.  Note that this is NOT the same as mounting the file system on multiple nodes at the same time; we don't want to allow that:
# pcs resource master WebDataClone WebData master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

If we issued ‘# pcs status’ at this point, we should see something like this:
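The part to look for is the master/slave set for our DRBD clone; on CentOS 7 it reads roughly like this (an approximation of the layout, using my host names):
 Master/Slave Set: WebDataClone [WebData]
     Masters: [ node1.theharrishome.lan ]
     Slaves: [ node2.theharrishome.lan ]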

To allow the cluster to mount the file system:
# pcs resource create WebFS Filesystem device="/dev/drbd0" directory="/var/www/html" fstype="ext4"

We need to add constraints here so that our new web file system resource (WebFS) runs only where DRBD is primary and doesn't try to come up before our DRBD clone (WebDataClone) has been promoted; if DRBD is not up, we certainly can't mount it:
# pcs constraint colocation add WebFS with WebDataClone INFINITY with-rsc-role=Master
# pcs constraint order promote WebDataClone then start WebFS

Likewise, Apache needs to run on the same node as the file system, and the file system must be mounted before Apache can start:
# pcs constraint colocation add WebSite with WebFS INFINITY
# pcs constraint order WebFS then WebSite
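
You can review the colocation and ordering rules we just added at any time with:
# pcs constraint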

And after all that, here is the output of ‘# pcs status’ again:

So if we pull up our VIP address in a web browser (192.168.1.200), we should see the following:
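Or, from the command line, curl should return the name of whichever node is currently active (node1.theharrishome.lan at this point in my setup):
# curl http://192.168.1.200/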

Let's move the site to node2 and then hit refresh in the browser window:
# pcs resource move WebSite node2.theharrishome.lan

Cool, it works!!!

One thing to note, however: "pcs resource move" works by adding a location constraint to the resource, and that constraint sticks around so the resource won't move back.  We need to clear it with the following command:
# pcs resource clear WebSite

If you are setting up a cluster of your own, I realize it might be very tempting, now that you have things working, to leave well enough alone.  Don't do it!  I left the most important piece of this cluster until now, and that is the fencing agent.  Without a proper fencing mechanism in place, you run a very real risk of a split-brain situation where both nodes think they are the primary node at the same time.  With DRBD-replicated storage like this cluster uses, that would quickly corrupt your data.

So, time to set up fencing.  There are many fencing agents available, and the best one for your situation may very well differ from mine.

To install all the fencing agents that are part of CentOS:
# yum -y install fence-agents-all

Or you could do a yum search for the word 'fence' and pick out the agent you wish to use as follows:
# yum search fence

You can also list the fence agents that STONITH knows about as follows:
# pcs stonith list

Then, you can use the following to get more information about any one of them, for example if you are interested in using the VMware fence:
# pcs stonith describe fence_vmware

I am using XenServer to host the virtual machines in this cluster, so I am going to use a fencing agent that can shut down the virtual machines via the Xen API.  The agent I'm using can be found at https://git.fedorahosted.org/cgit/fence-agents.git/ and it needs to be built.  So time to get started.  Here is a quick run-through of the commands I used to download, build, and install the agents.  Note I said agents (plural), as this download includes more agents than just the one I will be using.

# yum -y install autoconf automake libtool gcc wget python-suds pexpect python-requests
# wget https://git.fedorahosted.org/cgit/fence-agents.git/snapshot/fence-agents-4.0.16.tar.gz
# tar xvzf fence-agents-4.0.16.tar.gz
# cd fence-agents-4.0.16
# ./autogen.sh
# ./configure
# make
# make install

Now we can test the new fencing agent by manually issuing a command like the following.  Probably the only thing that requires an explanation is the "plug" parameter: it is the UUID of the virtual machine, as shown on the General tab of the machine within XenCenter.
# fence_xenapi --action=off --plug=4ff5615d-8049-1e96-ebec-60a068ec4ae5 --username=root --password=password --session=http://192.168.1.155

If all worked as it should, that virtual machine should shut down.  If it didn’t, then stop and figure out what the problem is before continuing on.
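
Don't forget to power the test victim back on before moving ahead.  Assuming fence_xenapi supports the standard power-on action from the fencing library (if not, just start the VM from XenCenter), the same command with a different action should do it:
# fence_xenapi --action=on --plug=4ff5615d-8049-1e96-ebec-60a068ec4ae5 --username=root --password=password --session=http://192.168.1.155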

Now to configure the STONITH fencing agent:
# pcs stonith create xen-fencing-node1 fence_xenapi pcmk_host_list="node1.theharrishome.lan" action=off port=4ff5615d-8049-1e96-ebec-60a068ec4ae5 login=root passwd=password session_url=http://192.168.1.155/ op monitor interval=30s
# pcs stonith create xen-fencing-node2 fence_xenapi pcmk_host_list="node2.theharrishome.lan" action=off port=a6dfbefd-f197-228b-7d7c-182132be2c79 login=root passwd=password session_url=http://192.168.1.155/ op monitor interval=30s

We need to make sure that xen-fencing-node1 ONLY runs on node2 and xen-fencing-node2 ONLY runs on node1.  That way, each node fences the other.  We don't EVER want a fencing agent to run on the same node it is set up to fence!  We do this with location constraints:
# pcs constraint location xen-fencing-node1 prefers node1.theharrishome.lan=-INFINITY
# pcs constraint location xen-fencing-node2 prefers node2.theharrishome.lan=-INFINITY

A question I have seen a lot, and one I wondered about for a long time, is why -INFINITY is used here.  You may be wondering why something like '# pcs constraint location xen-fencing-node1 prefers node2.theharrishome.lan=INFINITY' and '# pcs constraint location xen-fencing-node2 prefers node1.theharrishome.lan=INFINITY' wasn't used instead, but that is not the same thing, at least not if my understanding is correct.  A positive INFINITY (+INFINITY, or just INFINITY) is really just the highest possible score (1,000,000), so it only says where a resource SHOULD run; if that node is unavailable, the resource is still free to run somewhere else.  A score of -INFINITY, as used above, says where a resource MUST NOT run, no matter what.  So to sum up, MUST NOT is a good bit stronger than SHOULD.
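
As an aside, pcs has an 'avoids' form that expresses the same MUST NOT rule and, to my eye, reads a bit more clearly; the following pair would be equivalent to the two -INFINITY constraints above (use one form or the other, not both):
# pcs constraint location xen-fencing-node1 avoids node1.theharrishome.lan
# pcs constraint location xen-fencing-node2 avoids node2.theharrishome.lan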

So '# pcs status' should now look like this:

Note that xen-fencing-node1 is running on node2.theharrishome.lan and xen-fencing-node2 is running on node1.theharrishome.lan, just as we instructed it to.

And to view the constraints:
# pcs constraint location show

To test, let’s issue the following command which should reboot node2.theharrishome.lan:
# stonith_admin -B node2.theharrishome.lan

If that worked, let’s try node1.theharrishome.lan:
# stonith_admin -B node1.theharrishome.lan

Recall from my previous post that we turned off STONITH while configuring the cluster.  Now that we have working fencing agents and shared storage in play, we need to turn it back on:
# pcs property set stonith-enabled=true

Now to the primary issue with a two-node cluster such as this one.  It will work, but recall from the previous post that quorum is the minimum number of nodes required for the cluster to function, which is usually half the nodes plus one.  With only two nodes, it doesn't make much sense to enforce the default quorum behavior: if one node went down, the surviving node would lose quorum and stop its resources as well.  By utilizing CLVM, which I hope to tackle in the next post, we can further decrease the possibility of this causing a split-brain.  So we either tell the cluster to ignore quorum:
# pcs property set no-quorum-policy=ignore

or we can tell the cluster to freeze when quorum is lost, which means it leaves already-running resources alone but won't start or stop anything else until quorum is regained.  I chose to freeze it as follows:
# pcs property set no-quorum-policy=freeze
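
To double-check that both stonith-enabled and no-quorum-policy took effect, list the cluster properties:
# pcs property show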

So there you have it, an active/passive cluster built on CentOS 7 with replicated storage.  If one node goes down, the other should take over and keep on truckin'!  Pretty cool stuff.  If you have read this far, let me once again thank you.  In the next post of this series I will switch to shared storage using an iSCSI device with CLVM and perhaps add MySQL to the mix.  Until then . . .

- Kyle H.