
If you have been following along, in my last two posts we successfully set up an active/passive Apache cluster on CentOS 7 and then added shared storage, which I backed with local disks. In this post, I'll be replacing that with iSCSI shared storage and adding MySQL (actually MariaDB).

This would work very similarly if you had a SAN. Shared storage such as iSCSI or a SAN is not a requirement for adding MySQL to the cluster; it just happens to be the order in which I'm doing things.

It is also worth pointing out that I am not covering iSCSI itself other than to assume you already have it set up with a single device connected (not formatted or mounted) to each machine. Maybe I'll post an article about iSCSI at another time. For this setup, I have /dev/sda connected to each machine with 4 GB of space. I should also mention that if your iSCSI setup is not fault-tolerant, it can become a single point of failure in this design. The same holds true for virtual machines if the underlying infrastructure isn't fault-tolerant. Another thing to be aware of is that I am building on the existing setup from previous posts rather than starting from scratch, so some settings and packages are already in place. Finally, be aware that as we get deeper into this, it should become clear that there are numerous ways in which this could all be built. As stated previously, my intent is not to show you exactly how you should build out a cluster of your own for mission-critical use but rather to introduce you to clustering by example.

So let’s get started. First, you may want to back up your existing cluster config. With RHEL/CentOS v7.1 and higher, you can easily do that as follows and I would recommend doing so:
# pcs config backup cluster.conf

That will create a compressed tarball named cluster.conf.tar.bz2. Note that this DOES NOT include your DRBD configuration, so you need to deal with that separately if you need it backed up as well.
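
Should you ever need it, pcs can also restore the cluster configuration from that backup. A quick example, assuming the tarball is in the current directory:
# pcs config restore cluster.conf.tar.bz2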

Now let’s blow away our existing cluster configuration so we can recreate it. When I initially started thinking about this, my plan was to modify the cluster config we already have, but I decided it would be easier and probably more beneficial to create it again from scratch. So to blow away the cluster configuration, I’ll issue the following command on both nodes. Please be aware that this REMOVES THE CLUSTER CONFIG FOR GOOD, so back it up first:
# pcs cluster destroy

That takes care of the cluster config so now we need to remove the DRBD config. From both nodes, run the following:
# drbdadm down drbd0

If you get "Resource unknown" or "no resources defined!", you are OK to continue.

Stop corosync, pacemaker, and pcsd on both nodes:
# systemctl stop corosync
# systemctl stop pacemaker
# systemctl stop pcsd

Now let’s remove our old config file, /etc/drbd.d/r0.res:
# rm /etc/drbd.d/r0.res

This is a good spot to reboot ('# reboot') just so we are starting out with a clean slate. Then we need to install the MySQL server (MariaDB) and the cluster logical volume management tools on both nodes:
# yum -y install mariadb mariadb-server lvm2-cluster

We need to open TCP port 21064 for the Distributed Lock Manager (dlm) to run on both nodes:
# firewall-cmd --permanent --add-port=21064/tcp

And reload the firewall on both nodes so the change takes effect:
# firewall-cmd --reload
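
If you want to verify the port is now open, you can list the ports in the active zone:
# firewall-cmd --list-ports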

Let's also make sure that SELinux is OK with the cluster logical volume manager daemon on both nodes:
# semanage permissive -a clvmd_t
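
If the semanage command isn't found (it is not part of a minimal CentOS 7 install), it is provided by the policycoreutils-python package:
# yum -y install policycoreutils-python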

Make sure MariaDB is not set to start at boot as we want the cluster to take care of that:
# systemctl disable mariadb
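
As a quick sanity check, this should report "disabled" on both nodes:
# systemctl is-enabled mariadb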

Now let’s recreate our initial corosync config, /etc/corosync/corosync.conf, with the following:

totem {
  version: 2
  secauth: off
  cluster_name: cluster1
  transport: udpu
}
nodelist {
  node {
    ring0_addr: node1.theharrishome.lan
    nodeid: 1
  }
  node {
    ring0_addr: node2.theharrishome.lan
    nodeid: 2
  }
}
quorum {
  provider: corosync_votequorum
  two_node: 1
}
logging {
  to_logfile: yes
  logfile: /var/log/cluster/corosync.log
  to_syslog: yes
}

Copy the new corosync.conf file and the authkey from node1 to node2 (the authkey should still be there, but you can copy it again just to be safe):
# scp /etc/corosync/corosync.conf /etc/corosync/authkey root@node2.theharrishome.lan:/etc/corosync/

Start corosync, pacemaker and pcsd on both nodes:
# systemctl start corosync
# systemctl start pacemaker
# systemctl start pcsd

Everything should still be enabled to run at boot, but just to be safe:
# systemctl enable corosync
# systemctl enable pacemaker
# systemctl enable pcsd

And run through the same utilities as last time looking for any errors:
# corosync-cfgtool -s
# corosync-cmapctl | grep members
# corosync-quorumtool

The nodes should still be authenticated from last time but it wouldn't hurt to check, and if not, remember to use the hacluster user/pass.
# pcs cluster auth node1.theharrishome.lan node2.theharrishome.lan
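
If you do need to re-authenticate and would rather not enter the credentials interactively, pcs will accept them on the command line (substitute your actual hacluster password):
# pcs cluster auth node1.theharrishome.lan node2.theharrishome.lan -u hacluster -p password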

Now to set up the cluster again from a single node:
# pcs cluster setup --name cluster1 node1.theharrishome.lan node2.theharrishome.lan --force

Now let’s start the cluster, again from a single node:
# pcs cluster start --all

We should now be able to validate our cluster config with the crm_verify command. If all is well, it won’t return anything other than STONITH errors, which is expected since we haven’t configured fencing devices yet:
# crm_verify -L -V

Make sure STONITH is on, just for good measure:
# pcs property set stonith-enabled=true
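
You can confirm the property took effect with:
# pcs property show stonith-enabled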

At this point, ‘# pcs status’ should show everything online. If not, a reboot will probably take care of things, and it’s not a bad idea at this point anyway. Now we need to set up a dlm (Distributed Lock Manager) resource, as this is a requirement of the cluster logical volume manager, and it needs to be a clone since it will run on both nodes at the same time:
# pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true

We need to set the locking_type parameter in the /etc/lvm/lvm.conf file to 3, which can be done with the following command ON BOTH NODES. This was already done in the previous post, but again, just to be sure:
# /sbin/lvmconf --enable-cluster

Now we set up the clustered logical volume manager, and again, this requires a clone since it runs on both nodes at the same time:
# pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true

Now we need to set up the clvmd and dlm dependencies and startup order. The clvmd service must start after dlm, and it must run on the same node as dlm. Note that the constraints reference the clones:
# pcs constraint order start dlm-clone then clvmd-clone
# pcs constraint colocation add clvmd-clone with dlm-clone
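
To double-check the ordering and colocation constraints we just added:
# pcs constraint order show
# pcs constraint colocation show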

Now let's run '# pcs status' and have a look at the output.

You will notice that neither the dlm nor the clvmd clones are running. The reason is that we don’t yet have any STONITH fencing devices. By design, dlm and clvmd will not run without working STONITH fencing. Let’s take care of that now. Again, I’m using XenServer so I’ll use the same fencing devices as in the previous post:
# pcs stonith create xen-fencing-node1 fence_xenapi pcmk_host_list="node1.theharrishome.lan" action=off port=4ff5615d-8049-1e96-ebec-60a068ec4ae5 login=root passwd=password session_url=http://192.168.1.155/ op monitor interval=30s
# pcs stonith create xen-fencing-node2 fence_xenapi pcmk_host_list="node2.theharrishome.lan" action=off port=a6dfbefd-f197-228b-7d7c-182132be2c79 login=root passwd=password session_url=http://192.168.1.155/ op monitor interval=30s

And don’t forget to set up the fencing constraints:
# pcs constraint location xen-fencing-node1 prefers node1.theharrishome.lan=-INFINITY
# pcs constraint location xen-fencing-node2 prefers node2.theharrishome.lan=-INFINITY

It’s a good idea to double-check the fencing constraints:
# pcs constraint location show

Give it a few seconds and then you should see the clvmd and dlm clones start running in '# pcs status'.

To create the volumes, we use the same LVM commands as for a standard volume: pvcreate, vgcreate, and lvcreate. We create the volumes from a single node only because we will be utilizing the clustered logical volume manager. These commands will not succeed unless all resources added up to this point, INCLUDING STONITH fencing, are running properly. So to create the volumes:
# pvcreate /dev/sda
# vgcreate -Ay -cy clusterVG /dev/sda
# lvcreate -L2G -n clusterLV1 clusterVG
# lvcreate -L2G -n clusterLV2 clusterVG

When complete, we can view them using the pvdisplay and lvdisplay commands on BOTH nodes.
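
For example, on each node (vgdisplay works too if you want to inspect the volume group):
# pvdisplay
# vgdisplay clusterVG
# lvdisplay clusterVG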

Now let's configure DRBD for our shared storage. This time we have a single resource file defining one resource with two volumes. Copy this file to each node:

Content of /etc/drbd.d/r0.res on both nodes:

resource drbd0 {
  volume 0 {
    device /dev/drbd0;
    disk /dev/clusterVG/clusterLV1;
    flexible-meta-disk internal;
  }
  volume 1 {
    device /dev/drbd1;
    disk /dev/clusterVG/clusterLV2;
    flexible-meta-disk internal;
  }
  on node1.theharrishome.lan {
    address 192.168.1.180:7788;
  }
  on node2.theharrishome.lan {
    address 192.168.1.181:7788;
  }
}

I chose to define both volumes in a single resource for simplicity. You could just as easily use two resources in two separate files as follows, making note of the port change:

Optional contents of /etc/drbd.d/r0.res

resource drbd0 {
  on node1.theharrishome.lan {
    volume 0 {
      device /dev/drbd0;
      disk /dev/clusterVG/clusterLV1;
      flexible-meta-disk internal;
    }
    address 192.168.1.180:7788;
  }
  on node2.theharrishome.lan {
    volume 0 {
      device /dev/drbd0;
      disk /dev/clusterVG/clusterLV1;
      flexible-meta-disk internal;
    }
    address 192.168.1.181:7788;
  }
}

Optional contents of /etc/drbd.d/r1.res

resource drbd1 {
  on node1.theharrishome.lan {
    volume 1 {
      device /dev/drbd1;
      disk /dev/clusterVG/clusterLV2;
      flexible-meta-disk internal;
    }
    address 192.168.1.180:7789;
  }
  on node2.theharrishome.lan {
    volume 1 {
      device /dev/drbd1;
      disk /dev/clusterVG/clusterLV2;
      flexible-meta-disk internal;
    }
    address 192.168.1.181:7789;
  }
}
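
Whichever layout you choose, you can sanity-check the configuration before going further by having DRBD parse it and dump it back out:
# drbdadm dump all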

Now let’s initialize the DRBD resource (covering both volumes) on both nodes:
# drbdadm create-md drbd0

And bring it up on both nodes:
# drbdadm up drbd0

Now we need to set one of the nodes as the primary. To do that from node1:
# drbdadm primary --force drbd0

It should now begin to synchronize and this will probably take a while. You can always check the status as follows:
# cat /proc/drbd
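
If you would rather watch the progress update live, something like this works:
# watch -n1 cat /proc/drbd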

When it is done syncing, we can put a file system on our new DRBD devices from a single node:
# mkfs.ext4 /dev/drbd0
# mkfs.ext4 /dev/drbd1

Now let's temporarily mount /dev/drbd0 from a single node so we can set the SELinux context and create our test index.php file:
# mount /dev/drbd0 /mnt
# vi /mnt/index.php 

<?php
echo gethostname();
?>

Save and close.

Give the new file system the same SELinux context as the standard Apache document root, /var/www/html:
# chcon -R --reference=/var/www/html /mnt

Now unmount:
# umount /mnt

And now to set up our database in a similar fashion:
# mount /dev/drbd1 /mnt

And now let's use a cool tar trick to copy the data from /var/lib/mysql to /mnt:
# tar -C /var/lib/mysql/ -cf - . | tar -C /mnt -xf -

If you have never started MariaDB, the /var/lib/mysql directory is likely empty because MariaDB populates it with data on first startup. If that is the case, you can start and stop it with "# systemctl start mariadb; systemctl stop mariadb" and then re-run the previous command.

And make sure SELinux is happy with our new config:
# chcon -R --reference=/var/lib/mysql /mnt

And unmount again:
# umount /mnt

At this point, we need to use a feature of pcs that allows us to queue up several changes in a file and commit them all at once. Because our cluster now has fencing enabled, if we don’t get the ordering just right, it is likely that the fence agent will STONITH (recall what that means) the other node. To start, we tell pcs to create a file with the cluster config up to this point as follows:
# pcs cluster cib /tmp/drbd_cfg

Now to add to that config, we just include ‘-f /tmp/drbd_cfg’ in each command, and when we are all finished, we will commit the changes. So back to our cluster configuration. Recall that in our DRBD setup above, I used one resource name, drbd0, for both volumes. That has the advantage of keeping our cluster setup a bit cleaner since we don’t need to control two DRBD resources. So let’s set up the cluster to bring up drbd0:
# pcs -f /tmp/drbd_cfg resource create Data ocf:linbit:drbd drbd_resource=drbd0 op monitor interval=30s

If we left things as they are now, the DRBD resource would only run on one cluster node at a time, and if you recall, we had to start it on both nodes. To take care of that, we add a master/slave clone of the Data resource as follows. Note that this is NOT the same as mounting the file systems on multiple nodes at the same time, which we don’t want to allow:
# pcs -f /tmp/drbd_cfg resource master DataClone Data master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

To allow the cluster to mount the file systems:
# pcs -f /tmp/drbd_cfg resource create WebFS Filesystem device="/dev/drbd0" directory="/var/www/html" fstype="ext4"
# pcs -f /tmp/drbd_cfg resource create DBFS Filesystem device="/dev/drbd1" directory="/var/lib/mysql" fstype="ext4"

For our VIP (virtual IP address):
# pcs -f /tmp/drbd_cfg resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.1.200 cidr_netmask=24 nic=eth0 op monitor interval=30s

Now for Apache:
# pcs -f /tmp/drbd_cfg resource create Apache ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s

And MySQL:
# pcs -f /tmp/drbd_cfg resource create MySQL ocf:heartbeat:mysql config=/etc/my.cnf enable_creation=1 op start timeout=120s op monitor interval=30s

Time to add some constraints to all these resources. There are a number of ways you could accomplish this, and I'm not saying the way I do it is best, but it works for me. I could sum this up by saying I am making sure that all the services run alongside the Master of the DataClone:
# pcs -f /tmp/drbd_cfg constraint colocation add WebFS with DataClone INFINITY with-rsc-role=Master
# pcs -f /tmp/drbd_cfg constraint colocation add DBFS with DataClone INFINITY with-rsc-role=Master
# pcs -f /tmp/drbd_cfg constraint colocation add Apache with DataClone INFINITY with-rsc-role=Master
# pcs -f /tmp/drbd_cfg constraint colocation add MySQL with DataClone INFINITY with-rsc-role=Master
# pcs -f /tmp/drbd_cfg constraint colocation add VIP with DataClone INFINITY with-rsc-role=Master
# pcs -f /tmp/drbd_cfg constraint order promote DataClone then WebFS

The 'with-rsc-role=Master' in the constraints above means to locate the specific resource on the node that has the Master role of the DataClone resource.  You may also notice that I haven't taken the ordering of the resources into consideration beyond the DataClone promotion; you may need to address that depending on your needs.

Finally, let’s create a resource group:
# pcs -f /tmp/drbd_cfg resource group add WebSite VIP Apache MySQL WebFS DBFS

With all that in place, let's push our changes and hope everything works:
# pcs cluster cib-push /tmp/drbd_cfg

With that in place, ‘# pcs status’ should show everything running.

At this point, you should be able to move the resources from one node to the other as follows:
# pcs resource move WebSite node2.theharrishome.lan
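
Behind the scenes, the move works by adding a location constraint to the resource (pcs typically gives it an id starting with "cli-prefer"). You can see it with:
# pcs constraint location show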

When running "pcs resource move" a constraint to the resource so it won't move back.  We need to clear that constraint with the following command:
# pcs resource clear WebSite

Notice what happens if you try to mount one of the DRBD resources on a node that is not the primary.  For example, when node1 was primary, I tried to mount /dev/drbd0 to /var/www/html on node2 and the mount failed.

The mount failed because DRBD refuses access to a device that is in the Secondary role, which prevents the file system from being mounted on a second node while it is active elsewhere.

If you are building out a cluster of your own, I hope it goes without saying that you should test, test, test.  If the machines are physical, pull the power cord on each one and see what happens.  Disconnect the network cables and observe how they react.  If they are virtual machines, randomly power them off.  Beat up on them until you are comfortable they can handle whatever you think may come their way.
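
A gentler first test is to put one node in standby, watch the resources migrate to the other node with '# pcs status', and then bring it back:
# pcs cluster standby node1.theharrishome.lan
# pcs cluster unstandby node1.theharrishome.lan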

I hope at this point you can see how powerful these clustering tools are.  I have just scratched the surface of the many options they offer, so make sure to do some homework to see what else these tools can do.  Once again, thank you for reading.

- Kyle H.