Simple Linux Cluster

2021-12-22

Creating a simple Linux cluster doesn’t have to be complicated. Below, I demonstrate how to do so with keepalived, named, and nginx. The clustered service will be nginx, and named will return virtual IPs (VIPs) when clients look up the name of the clustered service.

Keepalived will be used to manage VIP failover. Server selection will be handled client side. If you need more control over your load balancing, I recommend setting up a dedicated load balancer between your servers and clients; the process shown in this guide wouldn’t change much if you did. I recommend HAProxy because it is GPLv2, fairly easy to set up, and has commercial support and consultation available if you need it.

Don’t worry if you aren’t trying to cluster nginx. The process of setting it up for your service should be similar. Before moving forward, consider the following:

  • Databases. On systems with separate database hosts, this may be a single point of failure. Check your database system’s documentation for details on load balancing and high availability.
  • Sessions. Applications that have user sessions may require configuring session replication or host pinning. Check the application documentation for details.
  • Split Brain. You may need to figure out how to handle the situation where cluster nodes can’t communicate with each other. Again, this depends on your application. This is probably the most important consideration. In some scenarios, split brain might be acceptable. In others, it can lead to data corruption.

Quick VRRP Overview

VRRP, or Virtual Router Redundancy Protocol, is what Keepalived uses to fail an IP over between multiple hosts. VRRPv2 is defined by IETF RFC 3768; VRRPv3, which adds IPv6 support, is defined by RFC 5798.

The super short explanation: VRRP nodes send advertisements containing a virtual router ID (in our case, one per clustered service), a priority number, and an advertisement interval (in seconds). The master is the node with the highest priority among the nodes sharing a router ID. If the backups don’t receive a VRRP advertisement from the master for roughly three intervals, the backup with the highest priority becomes the master and takes over the address.

For example, suppose you have a two-node cluster and an interval of 3 seconds. If the lower-priority host doesn’t receive a VRRP packet from the higher-priority host for about 9 seconds, it takes over the clustered address.
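RFC 3768 makes the timing precise: a backup declares the master down after three advertisement intervals plus a small skew time derived from its own priority, so higher-priority backups fail over slightly sooner. A quick sketch of the arithmetic, using the interval from the example above and the backup priority of 100 used later in this guide:

```shell
# Timers per RFC 3768:
#   Skew_Time            = (256 - Priority) / 256
#   Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
advert_int=3
priority=100
master_down=$(awk -v i="$advert_int" -v p="$priority" \
    'BEGIN { printf "%.3f", 3 * i + (256 - p) / 256 }')
echo "backup declares master down after ${master_down}s of silence"
```

So the “9 seconds” above is really about 9.6 seconds for a priority-100 backup; the skew term is what keeps multiple backups from claiming the address at the same instant.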

A priority of 0 has a special meaning: it signals that the master is relinquishing the address, and the highest-priority backup should take it over immediately.

Setup DNS

I am assuming you already have a DNS provider or DNS servers for your organization. Below is an example named configuration. The name www has four records: two A records for IPv4 and two AAAA records for IPv6. There are also records for the individual servers themselves, vrrp1 and vrrp2.

vrrp1           IN      A       172.16.1.1
vrrp2           IN      A       172.16.1.2
vrrp1           IN      AAAA    fd00::1
vrrp2           IN      AAAA    fd00::2
www             IN      A       172.16.1.5
www             IN      A       172.16.1.6
www             IN      AAAA    fd00::5
www             IN      AAAA    fd00::6
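The records above are only a fragment; a complete zone file also needs an SOA and an NS record. Below is a minimal sketch of what the full zone might look like. The zone name intranet, the name server ns1 and its address, and the serial/timer values are illustrative assumptions, not taken from my actual setup:

```
$TTL 300
$ORIGIN intranet.
@               IN      SOA     ns1.intranet. hostmaster.intranet. (
                                2021122201 ; serial
                                3600       ; refresh
                                900        ; retry
                                604800     ; expire
                                300 )      ; negative-cache TTL
@               IN      NS      ns1.intranet.
ns1             IN      A       172.16.1.10
vrrp1           IN      A       172.16.1.1
vrrp2           IN      A       172.16.1.2
vrrp1           IN      AAAA    fd00::1
vrrp2           IN      AAAA    fd00::2
www             IN      A       172.16.1.5
www             IN      A       172.16.1.6
www             IN      AAAA    fd00::5
www             IN      AAAA    fd00::6
```

Because www has multiple A/AAAA records, named varies the order of the records between responses (this behavior is controlled by the rrset-order option), which is what spreads clients across the two VIPs. A short TTL, as sketched here, helps clients notice changes quickly.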

Setup Your Servers – NGINX

There are two approaches to this. The first is to ensure the files being hosted are identical on each server. The second is to serve files out of a directory shared by both hosts. If you choose the latter, make sure you are using highly available storage. There are several ways to accomplish this, but they are out of scope; a few options are shared devices on a SAN or external disk controller, distributed storage, or a shared filesystem such as NFS or SMB. To keep this example simple, I will set up the web server to serve identical files on each host. In fact, I just installed nginx from the Debian repositories, kept the default configuration file, and created a single example page, shown below:

<!DOCTYPE html>
<html>
<head><title>SERVER 1</title></head>
<body>MY CONTENT</body>
</html>

I set the title on each server to identify the server handling the request. This is to demonstrate load balancing and failure handling in the final section.
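The page can be generated per node with a short script. This is a sketch: /var/www/html is Debian’s default nginx document root, and the DOCROOT/NODE_NAME variables are my own parameterization so the same script works on both nodes (and can be tried in a scratch directory first):

```shell
# Write the per-node test page.
# On a real node, run with: DOCROOT=/var/www/html NODE_NAME="SERVER 2"
DOCROOT="${DOCROOT:-/tmp/www-demo}"
NODE_NAME="${NODE_NAME:-SERVER 1}"
mkdir -p "$DOCROOT"
cat > "$DOCROOT/index.html" <<EOF
<!DOCTYPE html>
<html>
<head><title>${NODE_NAME}</title></head>
<body>MY CONTENT</body>
</html>
EOF
echo "wrote $DOCROOT/index.html"
```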

Setup Keepalived

My cluster has two nodes and manages four virtual IPs. The first node has an IPv4 address of 172.16.1.1 and an IPv6 address of fd00::1. The second node has addresses of 172.16.1.2 and fd00::2.

The four virtual IPs:

  • 172.16.1.5 Master – vrrp1
  • 172.16.1.6 Master – vrrp2
  • fd00::5 Master – vrrp1
  • fd00::6 Master – vrrp2

The keepalived.conf man page on my system is 1857 lines. I suggest taking some time to skim through it to see what kind of options are available. Below is a list of things I recommend reading about in the documentation:

  • SMTP configuration for email notification.
  • All three script types: notify, track, vrrp
  • VRRP instance configuration.
  • Global definitions.
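As a taste of what is in there, here is a sketch of the email-notification and notify-script options. The addresses, SMTP server, and script paths are placeholders I made up; check keepalived.conf(5) on your system for the exact semantics of each keyword:

```
global_defs {
    notification_email {
        admin@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    # ... the instance options shown in the node configs below ...
    smtp_alert                                  # email on state transitions
    notify_master "/usr/local/bin/on-master.sh" # run when this node becomes master
    notify_backup "/usr/local/bin/on-backup.sh" # run when it falls back to backup
    notify_fault  "/usr/local/bin/on-fault.sh"  # run when a tracked resource fails
}
```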

On each node, create a configuration file for keepalived, located at /etc/keepalived/keepalived.conf. Each node requires a slightly different configuration.

Node 1 Config

vrrp_script check_nginx {
    script "/bin/systemctl is-active -q nginx"
    interval 2
    #weight -10
    fall 1
    rise 3
}

vrrp_instance VI_1 {
    state MASTER
    vrrp_check_unicast_src
    unicast_src_ip 172.16.1.1
    interface eth1
    virtual_router_id 1
    priority 255
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }

    unicast_peer { 172.16.1.2 }
    track_script { check_nginx }

    virtual_ipaddress { 172.16.1.5 }
}

vrrp_instance VI_2 {
    state MASTER
    vrrp_check_unicast_src
    unicast_src_ip fd00::1
    interface eth1
    virtual_router_id 2
    priority 255
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }

    unicast_peer { fd00::2 }
    track_script { check_nginx }

    virtual_ipaddress { fd00::5 }
}

vrrp_instance VI_3 {
    state BACKUP
    vrrp_check_unicast_src
    unicast_src_ip 172.16.1.1
    interface eth1
    virtual_router_id 3
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }

    unicast_peer { 172.16.1.2 }
    track_script { check_nginx }

    virtual_ipaddress { 172.16.1.6 }
}

vrrp_instance VI_4 {
    state BACKUP
    vrrp_check_unicast_src
    unicast_src_ip fd00::1
    interface eth1
    virtual_router_id 4
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }

    unicast_peer { fd00::2 }
    track_script { check_nginx }

    virtual_ipaddress { fd00::6 }
}

Node 2 Config

vrrp_script check_nginx {
    script "/bin/systemctl is-active -q nginx"
    interval 2
    #weight -10
    fall 1
    rise 3
}

vrrp_instance VI_1 {
    state BACKUP
    vrrp_check_unicast_src
    unicast_src_ip 172.16.1.2
    interface eth1
    virtual_router_id 1
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }

    unicast_peer { 172.16.1.1 }
    track_script { check_nginx }

    virtual_ipaddress { 172.16.1.5 }
}

vrrp_instance VI_2 {
    state BACKUP
    vrrp_check_unicast_src
    unicast_src_ip fd00::2
    interface eth1
    virtual_router_id 2
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }

    unicast_peer { fd00::1 }
    track_script { check_nginx }

    virtual_ipaddress { fd00::5 }
}

vrrp_instance VI_3 {
    state MASTER
    vrrp_check_unicast_src
    unicast_src_ip 172.16.1.2
    interface eth1
    virtual_router_id 3
    priority 255
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }

    unicast_peer { 172.16.1.1 }
    track_script { check_nginx }

    virtual_ipaddress { 172.16.1.6 }
}

vrrp_instance VI_4 {
    state MASTER
    vrrp_check_unicast_src
    unicast_src_ip fd00::2
    interface eth1
    virtual_router_id 4
    priority 255
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass password
    }

    unicast_peer { fd00::1 }
    track_script { check_nginx }

    virtual_ipaddress { fd00::6 }
}

Notice how the configuration file is broken up into blocks using { and }. The first string before each block (e.g. vrrp_instance) describes what the block configures, such as global definitions, a tracking script, or a VRRP instance. The second (e.g. check_nginx) is an arbitrary name the admin uses to refer to the block in other parts of the configuration. In some cases the curly braces are nested, as when setting up VRRP authentication.

The table below describes some of the options in the config file that may not be self-explanatory:

vrrp_script check_nginx – Defines a VRRP tracking script. The name (here, check_nginx) is arbitrary.
interval – The number of seconds to wait between script runs.
weight – How much to adjust the instance’s priority when the tracking script returns something other than 0.
fall – The number of consecutive times the script may return nonzero before the applicable router ID’s priority is adjusted.
rise – How many consecutive times the script must return 0 after a fall event before the router ID’s priority is restored to normal.
vrrp_instance VI_1 – Defines and names an instance of the VIP(s) you want Keepalived to manage. The name (here, VI_1) is arbitrary.
unicast_peer – Keepalived can use unicast instead of multicast for advertisements. I used unicast in the examples because, depending on your network equipment or virtualization environment, multicast may not be feasible. This parameter sets the destination addresses for the applicable instance.
vrrp_check_unicast_src – Configures Keepalived to check the source address of advertisements and ensure they come from a cluster member.
advert_int – The time, in seconds, between VRRP advertisements.
track_script { NAME } – Tracking scripts to run; a nonzero return adjusts the instance’s priority or puts it in a fault state. In the example, if the nginx systemd service isn’t active, the instance goes into a fault state, causing the other node to take over the shared IP(s).
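The fall/rise bookkeeping is easy to misread, so here is a tiny shell sketch of the semantics (my own illustration, not code from keepalived): with fall 1 and rise 3, a single failed check marks the script down, and it takes three consecutive clean checks to mark it up again.

```shell
fall=1; rise=3                     # values from the check_nginx script above
state=UP; ok_streak=0; fail_streak=0
for result in 0 1 0 0 0; do        # sample exit codes from five consecutive checks
    if [ "$result" -eq 0 ]; then
        ok_streak=$((ok_streak + 1)); fail_streak=0
        if [ "$state" = DOWN ] && [ "$ok_streak" -ge "$rise" ]; then state=UP; fi
    else
        fail_streak=$((fail_streak + 1)); ok_streak=0
        if [ "$state" = UP ] && [ "$fail_streak" -ge "$fall" ]; then state=DOWN; fi
    fi
    echo "check exit=$result -> script state=$state"
done
```

When the script is down, Keepalived either subtracts the configured weight from the instance’s priority or, when weight is unset (as in this guide, where it is commented out), puts the instance in a fault state so the peer takes over.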

Demonstration

I created a few DNS entries on my local network to demonstrate DNS load balancing and a failover.

client# host www.intranet
www.intranet has address 172.16.1.5
www.intranet has address 172.16.1.6
www.intranet has IPv6 address fd00::5
www.intranet has IPv6 address fd00::6

The cluster is up and running and functioning as it should. Notice how the VIPs are running on their configured master nodes:

vrrp1# ip addr show dev eth1 
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:8d:7a:18 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.1/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet 172.16.1.5/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fd00::5/128 scope global nodad deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 fd00::1/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe8d:7a18/64 scope link 
       valid_lft forever preferred_lft forever
vrrp2# ip addr show dev eth1 
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:ce:bf:ac brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.2/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet 172.16.1.6/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fd00::6/128 scope global nodad deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 fd00::2/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fece:bfac/64 scope link 
       valid_lft forever preferred_lft forever

Now, I will use curl to demonstrate both the load balancing and a failover. Below, you can see that different addresses are resolved as I connect using the DNS name:

client# curl www.intranet
<!DOCTYPE html>
<html>
<head><title>SERVER 1</title></head>
<body>MY CONTENT</body>
</html>
client# curl www.intranet
<!DOCTYPE html>
<html>
<head><title>SERVER 2</title></head>
<body>MY CONTENT</body>
</html>

Now, I turn off the web server on the first cluster node. As you can see, the virtual addresses were picked up by the other host:

vrrp1# service nginx stop
vrrp1# ip addr show dev eth1 
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:8d:7a:18 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.1/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fd00::1/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe8d:7a18/64 scope link 
       valid_lft forever preferred_lft forever
vrrp2# ip addr show dev eth1 
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:ce:bf:ac brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.2/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet 172.16.1.6/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 172.16.1.5/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fd00::5/128 scope global nodad deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 fd00::6/128 scope global nodad deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 fd00::2/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fece:bfac/64 scope link 
       valid_lft forever preferred_lft forever

Now when I run curl, all requests are served by the remaining node’s web server:

client# curl www.intranet
<!DOCTYPE html>
<html>
<head><title>SERVER 2</title></head>
<body>MY CONTENT</body>
</html>
client# curl www.intranet
<!DOCTYPE html>
<html>
<head><title>SERVER 2</title></head>
<body>MY CONTENT</body>
</html>

Finally, you can see the VIPs come back to the first node when I start the web server again.

vrrp1# service nginx start
vrrp1# ip addr show dev eth1 
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:8d:7a:18 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.1/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet 172.16.1.5/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fd00::5/128 scope global nodad deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 fd00::1/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe8d:7a18/64 scope link 
       valid_lft forever preferred_lft forever
