How to configure High Availability (HA)
January 18th, 2007
The following configures HA for a primary/secondary node pair.
- On the primary node, ensure DNS and/or the /etc/hosts file provides both forward and reverse resolution for every node. The HA installation requires name resolution to be consistent throughout the cluster.
- Ensure the HA package has been installed. If you have already installed the DRBD packages, it should already be present. If not, you can install it with the following command
- yum groupinstall drbd-heartbeat
- The HA configuration files are required to be identical across the cluster.
- Create or modify the file /etc/ha.d/ha.cf with the following configuration enabled
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    logfacility local0
    keepalive 2
    deadtime 6
    warntime 4
    initdead 12
    baud 19200
    serial /dev/ttyS0
    mcast eth0 225.0.0.1 694 1 0
    auto_failback off
    node node1
    node node2
    respawn hacluster /usr/lib/heartbeat/ipfail
    apiauth ipfail gid=haclient uid=hacluster
Notes The debugfile and logfile directives are not strictly necessary; however, if you exclude them, HA logging will go to the /var/log/messages file. With the suggested settings, keepalives are sent to the other node every 2 seconds. If none is received within 4 seconds a warning is logged, and if none is received within 6 seconds the host is marked as down and the floating or virtual address is brought up on the secondary node. The initdead value of 12 seconds applies when heartbeat first starts: a node waits that long for its peer before declaring it dead, which allows for slow network initialisation after a boot. If you are already using /dev/ttyS0 (COM1) for another purpose, you may wish to change the serial device. It is suggested that you establish multiple links between the HA nodes, so that if communication is broken over one link the HA heartbeats still get through over another. If you are using an interface other than eth0, you should change the mcast line (and any bcast line you add) accordingly. node1 and node2 should be renamed to the DNS or /etc/hosts hostnames of your nodes, and each name should match the output of uname -n on that node.
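The relationship between the four timing values can be checked mechanically before heartbeat is started. The following is a minimal sketch, not part of the Heartbeat package: check_ha_timing is a hypothetical helper name, it assumes the values are whole seconds as in the example above, and the rule that initdead should be at least twice deadtime is the commonly recommended guideline rather than something the configuration above enforces.

```shell
#!/bin/sh
# Hypothetical helper: check_ha_timing is not part of Heartbeat.
# Reads the timing directives from an ha.cf-style file (integer seconds assumed)
# and checks keepalive < warntime < deadtime and initdead >= 2 * deadtime.
check_ha_timing() {
    awk '
        $1 == "keepalive" { keepalive = $2 }
        $1 == "warntime"  { warntime  = $2 }
        $1 == "deadtime"  { deadtime  = $2 }
        $1 == "initdead"  { initdead  = $2 }
        END {
            if (keepalive == "" || warntime == "" || deadtime == "" || initdead == "") {
                print "missing timing directive"; exit 2
            }
            if (keepalive + 0 >= warntime + 0) {
                print "keepalive must be below warntime"; exit 1
            }
            if (warntime + 0 >= deadtime + 0) {
                print "warntime must be below deadtime"; exit 1
            }
            if (initdead + 0 < 2 * deadtime) {
                print "initdead should be at least twice deadtime"; exit 1
            }
            print "timing ok"
        }
    ' "$1"
}
```

Running check_ha_timing /etc/ha.d/ha.cf against the configuration above prints "timing ok"; a non-zero exit status indicates which relationship was violated.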
- Create or modify the file /etc/ha.d/haresources with the following configuration enabled
    node1 \
            IPaddr2::192.168.2.240/24/eth0/192.168.2.255
Notes node1 should be the name of the primary host within the cluster (as above, this should correspond to the DNS or /etc/hosts hostname and should match the output of uname -n). The IP address 192.168.2.240 should be changed to the floating IP address shared between the nodes, and /24 is the netmask for the network. If you are using an interface other than eth0, you should name that instead. 192.168.2.255 is the broadcast address for the network.
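To make the field order of the resource entry explicit, the line can be assembled from its parts. This is only an illustrative sketch: ipaddr2_resource_line is a hypothetical helper name, and it emits the entry on a single line rather than using the backslash continuation shown above.

```shell
#!/bin/sh
# Hypothetical helper: ipaddr2_resource_line is not part of Heartbeat.
# Arguments: owning node, floating IP, prefix length, interface, broadcast address.
ipaddr2_resource_line() {
    printf '%s IPaddr2::%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For the values used in this article, ipaddr2_resource_line node1 192.168.2.240 24 eth0 192.168.2.255 prints node1 IPaddr2::192.168.2.240/24/eth0/192.168.2.255.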
- Create or modify the file /etc/ha.d/authkeys with the following configuration enabled
    auth 2
    2 sha1 ultramonkey
Notes This assumes SHA1 is used as the method for authenticating heartbeat packets between the nodes. The alternatives are CRC and MD5; we suggest SHA1, as it's stronger than MD5, and CRC only detects corruption and provides no real authentication. You should change the value ultramonkey to a secure secret key, and this key must be identical on all nodes within the cluster.
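One way to produce a stronger secret than a dictionary word is to generate it from /dev/urandom. The sketch below is an assumption about how you might do this, not part of the Heartbeat package: write_authkeys is a hypothetical helper name. It also restricts the file to mode 600, since heartbeat checks the permissions on authkeys and complains if other users can read it.

```shell
#!/bin/sh
# Hypothetical helper: write_authkeys is not part of Heartbeat.
# Writes an authkeys file using sha1 and a random secret, readable only by its owner.
write_authkeys() {
    target=$1
    # 32 random bytes rendered as 64 hexadecimal characters
    secret=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    # Create the file inside a subshell so the tightened umask does not leak to the caller
    ( umask 077; printf 'auth 2\n2 sha1 %s\n' "$secret" > "$target" )
    chmod 600 "$target"
}
```

Run write_authkeys /etc/ha.d/authkeys on one node only, then copy the resulting file to the other node so that both share the same key.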
- Ensure that the heartbeat process will start on all required run levels
- chkconfig --level 2345 heartbeat on
- Start the heartbeat process on the primary node
- service heartbeat start
- Start the heartbeat process on the secondary node
- service heartbeat start
- Information regarding HA activity and status is written to /var/log/ha-log
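Because the configuration files must be identical across the cluster, it can help to compare checksums before starting heartbeat. The following is a minimal sketch under the assumption that sha1sum is available on both nodes; ha_config_fingerprint is a hypothetical helper name and is not part of the Heartbeat package.

```shell
#!/bin/sh
# Hypothetical helper: ha_config_fingerprint is not part of Heartbeat.
# Usage: ha_config_fingerprint DIRECTORY FILE...
# Prints one SHA-1 line per file, with names relative to DIRECTORY,
# so the output from two nodes can be compared directly.
ha_config_fingerprint() {
    dir=$1
    shift
    ( cd "$dir" && sha1sum "$@" )
}
```

Run it on each node with /etc/ha.d as the directory followed by the names of the three configuration files created above, and compare the output (for example with diff). Any differing line points to a file that is out of step.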
Assumptions
- The configuration is applied to a server running CentOS 4.4 or a later CentOS 4 release
- There is Ethernet connectivity between all devices within the cluster
- There is a null-modem cable between the nodes in the cluster
Notes
- The HA notes were configured as per the following
- There is very little consistent information available for the HA application. Some of the documentation provided hasn’t been updated since 2005, and some of it refers to material that has been removed from the Ultra Monkey web site
- It is very important to ensure that ntp is running on all nodes within the cluster
- It is important to note that if you have two communication methods (e.g. Ethernet and serial) and one of them is still active, the HA may not fail over to the secondary node. While this may be the correct behaviour, it may not provide the desired results. During our testing this led to problems that were difficult to troubleshoot
- The floating or virtual address will be brought up as ethX:0, so it's important to ensure that no Ethernet configuration has already been specified under that alias
- Sometimes, even if the heartbeat services have been stopped on both nodes, you may still be able to connect to the floating or virtual address (this may not be the desired result; you may need to craft your own script to remove the address once heartbeat has stopped)
- Depending on how keepalive, warntime, deadtime and initdead are configured, there may be undesired downtime while the floating or virtual address is brought up on the secondary node
- Syntax errors in the HA configuration files tend to lead to the heartbeat program either not starting (quickly or at all) or not shutting down correctly
- While the HA application works, if you intend to use it within a mission-critical environment and the budget for the project permits, we suggest you review some commercially available HA applications
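For the note above about the floating address lingering after heartbeat has been stopped, a cleanup script could look something like the following. This is a sketch only, assuming the iproute ip command is installed and the script is run as root; release_floating_ip is a hypothetical helper name and is not part of the Heartbeat package.

```shell
#!/bin/sh
# Hypothetical helper: release_floating_ip is not part of Heartbeat.
# Usage: release_floating_ip ADDRESS/PREFIX DEVICE
# Removes the address from the device only if it is still configured there.
release_floating_ip() {
    cidr=$1
    dev=$2
    if ip -o addr show dev "$dev" | grep -qF " $cidr "; then
        # Report success only if the delete actually worked
        ip addr del "$cidr" dev "$dev" && echo "removed $cidr from $dev"
    else
        echo "$cidr not present on $dev"
    fi
}
```

With the example values from this article the call would be release_floating_ip 192.168.2.240/24 eth0, run on each node after service heartbeat stop.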
