Setting up an HA Galera Cluster using MaxScale

This is a guide meant to provide you with a simple, highly available, load-balanced MySQL cluster that works and is resilient as hell.

First, you’ll need five machines or five VMs: three for the galera cluster and two for maxscale. I’m running Ubuntu 14.04, so to follow the guide you should be as well. I’ll refer to the galera boxes as galera01-03 and the maxscale boxes as ha01-02.

We need to add the galera repository. SSH into all three galera nodes and run the following:

root@galera01:~# sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys cbcb082a1bb943db
root@galera01:~# echo 'deb http://mirror.jmu.edu/pub/mariadb/repo/5.5/ubuntu trusty main' | sudo tee /etc/apt/sources.list.d/galera.list
root@galera01:~# sudo apt-get update

Once that’s done, we need to install galera and set the root password (the installer will prompt you for one):

root@galera01:~# sudo apt-get install mariadb-galera-server galera

If the service started automatically, make sure to stop it now on all nodes. Otherwise, what we do in later steps will require you to force it to die.

root@galera01:~# sudo service mysql stop

Next, edit /etc/mysql/my.cnf. I’m going to set the bind address to 0.0.0.0 so it’ll accept connections from anywhere, but you can lock it down to a specific IP if you want. We can’t use loopback because the cluster members have to talk to one another. I’m also going to enable the slow query log.

bind-address		= 0.0.0.0
slow_query_log=1
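
If you want the slow log written somewhere specific, you can also set the log file and a threshold. These two lines are my addition rather than part of the original setup, and the path is just a common convention:

slow_query_log_file	= /var/log/mysql/mariadb-slow.log
long_query_time		= 2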

Now, we need to create /etc/mysql/conf.d/cluster.cnf. The important variables in the file: bind-address, which once again is set to 0.0.0.0 for any IP. wsrep_cluster_name can be anything you’d like, but it must be the same on every node in the cluster. wsrep_cluster_address should contain the IPs of all three of your nodes, as shown. The wsrep_sst_method I use is rsync (as an aside, the rsync method never actually uses the wsrep_sst_auth credentials; methods like mysqldump do). Simply put, the gcache is a cache of replication actions that can be replayed to a node that went down temporarily, and that replay happens over IST (incremental state transfer). If the outage outlasts your gcache, the node falls back to a full SST via rsync, which locks the donor node until the transfer finishes. You could use mysqldump for SST instead, but it’s slower. wsrep_node_address should be the address of the node whose file you’re editing, and the same goes for wsrep_node_name. I’ve added the last line, wsrep_retry_autocommit, because there are occasions where a cluster-wide deadlock happens when two nodes try to write to the same table at the same time; the setting only applies to autocommit statements. I chose to include it because I’ve seen cases where an application hits a deadlock and the driver, despite being told “retry the transaction”, simply errors out.

[mysqld]
query_cache_size=0
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=0
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_provider=/usr/lib/galera/libgalera_smm.so
#wsrep_provider_options="gcache.size=32G"

# Galera Cluster Configuration
wsrep_cluster_name="my_cluster"
wsrep_cluster_address="gcomm://192.168.1.130,192.168.1.198,192.168.1.167"

# Galera Synchronization Configuration
wsrep_sst_method=rsync
wsrep_sst_auth=galerauser:secretsdontmakefriends

# Galera Node Configuration
wsrep_node_address="192.168.1.130"
wsrep_node_name="galera01"
wsrep_retry_autocommit=4

Next, we need to copy /etc/mysql/debian.cnf from the first node and replace the contents of the file on every other node. You can simply copy/paste or scp the file over. It says in caps “DO NOT TOUCH” but I’m an adult and can do what I want. The main piece we need synced is the password= field. We’ll bootstrap every other node off the first, so it will work fine. Don’t copy and paste mine below; I’m including it only so you can reference what it looks like.

# DO NOT COPY THIS - USE THE ONE ON YOUR FIRST NODE
#Automatically generated for Debian scripts. DO NOT TOUCH!
[client]
host     = localhost
user     = debian-sys-maint
password = QLlRzgOp4ZIDVSVF
socket   = /var/run/mysqld/mysqld.sock
[mysql_upgrade]
host     = localhost
user     = debian-sys-maint
password = QLlRzgOp4ZIDVSVF
socket   = /var/run/mysqld/mysqld.sock
basedir  = /usr
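
If you’d rather scp it than copy/paste, a couple of one-liners from the first node do the trick (assuming your hostnames resolve; substitute IPs if they don’t):

root@galera01:~# scp /etc/mysql/debian.cnf root@galera02:/etc/mysql/debian.cnf
root@galera01:~# scp /etc/mysql/debian.cnf root@galera03:/etc/mysql/debian.cnf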

Create a .my.cnf file in /root if you don’t feel like typing passwords

[client]
user=root
password="bananah4mmock"

[mysql]
user=root
password="bananah4mmock"
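
Since this file holds a plaintext password, it’s worth tightening the permissions so only root can read it. This step is my own habit rather than part of the original walkthrough:

root@galera01:~# chmod 600 /root/.my.cnf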

Run mysql_secure_installation on the first node.

root@galera01:~# mysql_secure_installation 

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
      SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current
password for the root user.  If you've just installed MariaDB, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none): 
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MariaDB
root user without the proper authorisation.

You already have a root password set, so you can safely answer 'n'.

Change the root password? [Y/n] n
 ... skipping.

By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them.  This is intended only for testing, and to make the installation
go a bit smoother.  You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n] 
 ... Success!

Normally, root should only be allowed to connect from 'localhost'.  This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] 
 ... Success!

By default, MariaDB comes with a database named 'test' that anyone can
access.  This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n] 
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n] 
 ... Success!

Cleaning up...

All done!  If you've completed all of the above steps, your MariaDB
installation should now be secure.

Thanks for using MariaDB!

Stop mysql on all nodes. Then, on the first node we’ll start our new cluster.

root@galera01:~# service mysql stop
 * Stopping MariaDB database server mysqld                    [ OK ] 
root@galera01:~# service mysql start --wsrep-new-cluster
 * Starting MariaDB database server mysqld                    [ OK ]
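
Before moving on, you can sanity-check the bootstrap. On a healthy bootstrapped node, wsrep_cluster_status should report Primary; the output should look something like this:

root@galera01:~# mysql -u root -e "SHOW STATUS LIKE 'wsrep_cluster_status';"
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+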

Now, on the other nodes we will simply start the mysql service. Each one will sync its state from the first node automatically.

root@galera02:~# service mysql stop
 * Stopping MariaDB database server mysqld                    [ OK ] 
root@galera02:~# service mysql start
 * Starting MariaDB database server mysqld                    [ OK ]
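
Once all three are up, wsrep_cluster_size tells you whether everyone actually joined. It should read 3; anything less means a node is still syncing or failed to connect:

root@galera02:~# mysql -u root -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+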

Now, since we were lazy and created a .my.cnf file, just run mysql -u root and you’ll get dropped into a shell. We’re going to create a database and watch it get replicated to the other nodes.

root@galera01:~# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 30
Server version: 5.5.51-MariaDB-1~trusty-wsrep mariadb.org binary distribution, wsrep_25.14.r9949137

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database replicateme;
Query OK, 1 row affected (0.00 sec)

Head over to another node, log into mysql, and run a show databases. You’ll see that our database was replicated just fine.

root@galera02:~# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 31
Server version: 5.5.51-MariaDB-1~trusty-wsrep mariadb.org binary distribution, wsrep_25.14.r9949137

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| replicateme        |
+--------------------+
4 rows in set (0.00 sec)
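
If you want to go one step further than the original test, create a table and insert a row; the table name and data here are just for illustration. Note the primary key: Galera replicates InnoDB tables and behaves much better when every table has one.

MariaDB [(none)]> create table replicateme.pets (id int auto_increment primary key, name varchar(32)) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> insert into replicateme.pets (name) values ('fido');
Query OK, 1 row affected (0.00 sec)

Run a select * from replicateme.pets on the third node and the row should be sitting there waiting for you.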

Now, you’re free to use this cluster. If you follow all the galera guidelines and never do anything stupid, you can use this as master-master-master without problems behind a fancy F5 or haproxy doing round robin. However, people and programs aren’t perfect, and rewriting everyone’s code isn’t a viable solution. So, to eliminate deadlocks entirely, monitor the cluster, and get automatic failover, we’re going to set up MaxScale. We’ll have MaxScale analyze the galera cluster, elect a master, inspect all incoming queries, and split traffic so writes go to the master and reads go to the remaining slaves. Galera uses optimistic cluster-wide locking, so if you update a row on one node while another node updates it at the same time, you can run into a deadlock. Splitting the queries will eliminate this.
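
Here’s a contrived illustration of that race; the table, column, and values are hypothetical. Fire these off at the same moment on two different nodes and one side loses the certification vote (this is also the failure that wsrep_retry_autocommit from earlier tries to paper over):

-- on galera01
MariaDB [test]> UPDATE accounts SET balance = balance - 5 WHERE id = 1;
Query OK, 1 row affected (0.00 sec)

-- on galera02, at the same instant
MariaDB [test]> UPDATE accounts SET balance = balance + 5 WHERE id = 1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction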

To get all the fancy resiliency this post is meant to illustrate, we need to add the maxscale repo, update apt, and install maxscale and keepalived. Do this on both ha boxes.

root@ha01:~# sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 70E4618A8167EE24
root@ha01:~# echo 'deb https://downloads.mariadb.com/files/MaxScale/1.4.3/ubuntu trusty main' |sudo tee /etc/apt/sources.list.d/maxscale.list
root@ha01:~# apt-get update
root@ha01:~# apt-get install maxscale keepalived

We’ll set up keepalived first. Edit /etc/keepalived/keepalived.conf on the first ha node. We’re going to make this the master, so its priority needs to be higher than the second ha box’s. Your virtual_ipaddress needs to be an IP that isn’t in use by any other system; keepalived is going to claim it. The track script simply checks whether maxscale is running and fails over to the backup if it isn’t.

vrrp_script chk_maxscale
{
	script "killall -0 maxscale"
	interval 2
	weight 2
}

vrrp_instance VI_1
{
	interface eth0
	state MASTER
	virtual_router_id 51
	priority 101
	virtual_ipaddress
	{
		192.168.1.232
	}
	track_script
	{
		chk_maxscale
	}
}
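
If you’re wondering about that script line: signal 0 is never actually delivered. killall -0 just checks whether a matching process exists and sets its exit code accordingly, which is what keepalived keys off of. You can see it for yourself with any process you know is running (the second name is obviously made up):

root@ha01:~# killall -0 sshd; echo $?
0
root@ha01:~# killall -0 notarealprocess; echo $?
notarealprocess: no process found
1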

Now, on our second ha box, we’ll create /etc/keepalived/keepalived.conf again, but set it up to be the failover:

vrrp_script chk_maxscale
{
	script "killall -0 maxscale"
	interval 2
	weight 2
}

vrrp_instance VI_1
{
	interface eth0
	state BACKUP
	virtual_router_id 51
	priority 100
	virtual_ipaddress
	{
		192.168.1.232
	}
	track_script
	{
		chk_maxscale
	}
}

With that done, we can edit /etc/sysctl.conf and add a line allowing us to bind services to IPs we don’t really own. Add this to the bottom on both nodes:

net.ipv4.ip_nonlocal_bind = 1

Load in the sysctl settings

root@ha01:~# sysctl -p
net.ipv4.ip_nonlocal_bind = 1

Now, we’ll set up maxscale. First we need to generate a .secrets file. Do this with maxkeys.

root@ha01:~# maxkeys 
Generating .secrets file in /var/lib/maxscale ...

Next, hash a password that maxscale will use to monitor the cluster, and hang on to the hash; we’ll need it in a minute. Either run this on both nodes and use each node’s own hash, or run it on one and copy the .secrets file it creates to the second so the hashes match.

root@ha01:~# maxpasswd /var/lib/maxscale/ superduperpassword
F60FB938B468277026925BBEBCD35E5ACBA982A1AD2A9DA2956834CAE12C996C
2016-08-25 04:53:25   notice : Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
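
If you go the copy route instead of running maxpasswd twice, scp the .secrets file over; the -p flag keeps the file mode intact, and the paths below are just where maxkeys put it:

root@ha01:~# scp -p /var/lib/maxscale/.secrets root@ha02:/var/lib/maxscale/.secrets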

On the first galera node we’re going to create a user and grant it some permissions. I’m using wildcards, but feel free to restrict it to your ha boxes’ IPs. Remember that these changes will replicate, so you only have to do this once.

MariaDB [(none)]> CREATE USER 'maxscale'@'%' IDENTIFIED BY 'superduperpassword';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT SELECT ON mysql.user TO 'maxscale'@'%'; 
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT SELECT ON mysql.db TO 'maxscale'@'%';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT SELECT ON mysql.tables_priv TO 'maxscale'@'%';
Query OK, 0 rows affected (0.02 sec)

MariaDB [(none)]> GRANT SHOW DATABASES ON *.* TO 'maxscale'@'%';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT REPLICATION CLIENT ON *.* TO 'maxscale'@'%';
Query OK, 0 rows affected (0.00 sec)
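
To double-check that everything stuck, ask the server to echo the grants back (the two global privileges collapse into one line, so expect four rows):

MariaDB [(none)]> SHOW GRANTS FOR 'maxscale'@'%';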

Next, we need to edit /etc/maxscale.cnf on both ha boxes. Your config should match mine, so remove any extra config from the file. threads should be set to your CPU count. The servers in the Splitter Service should be the names in [] that define your galera nodes; mine are galera1-3. The user is the one you created in the previous step, and the passwd field is the password hash that maxpasswd printed. In the Splitter Listener you need to define the virtual IP you set in keepalived.conf; mine was 192.168.1.232.

In each of the [galera#] sections you need to specify that node’s IP and mysql port. The Galera Monitor section should mirror the Splitter Service section for servers, user, and passwd. Leave the CLI sections alone. Remember that if you ran maxpasswd on both nodes instead of copying the .secrets file, the two servers will have different hashes.

[maxscale]
threads=4

[Splitter Service]
type=service
router=readwritesplit
servers=galera1, galera2, galera3
user=maxscale
passwd=F60FB938B468277026925BBEBCD35E5ACBA982A1AD2A9DA2956834CAE12C996C

[Splitter Listener]
type=listener
service=Splitter Service
protocol=MySQLClient
port=3306
address=192.168.1.232
socket=/tmp/ClusterMaster

[galera1]
type=server
address=192.168.1.226
port=3306
protocol=MySQLBackend

[galera2]
type=server
address=192.168.1.176
port=3306
protocol=MySQLBackend

[galera3]
type=server
address=192.168.1.163
port=3306
protocol=MySQLBackend

[Galera Monitor]
type=monitor
module=galeramon
servers=galera1, galera2, galera3
user=maxscale
passwd=F60FB938B468277026925BBEBCD35E5ACBA982A1AD2A9DA2956834CAE12C996C

[CLI]
type=service
router=cli

[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
port=6603

That’s it for the configuration. Now, let’s start keepalived. Once you do you should see your virtual IP appear. Do this on both ha boxes.

root@ha01:~# service keepalived start
 * Starting keepalived keepalived                           [ OK ] 
root@ha01:~# ip addr show dev eth0
41: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:12:98:dc:0f:d0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.216/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6012:98ff:fedc:fd0/64 scope link 
       valid_lft forever preferred_lft forever

Once that’s done we can finally start maxscale and see if we screwed up.

root@ha01:~# service maxscale start
 * Starting MaxScale
 * maxscale is running                                        [ OK ]
root@ha01:~# cat /var/log/maxscale/maxscale1.log 


MariaDB Corporation MaxScale	/var/log/maxscale/maxscale1.log Thu Aug 25 05:13:00 2016
-----------------------------------------------------------------------
2016-08-25 05:13:00   notice : Configuration file: /etc/maxscale.cnf
2016-08-25 05:13:00   notice : Log directory: /var/log/maxscale
2016-08-25 05:13:00   notice : Data directory: /var/lib/maxscale
2016-08-25 05:13:00   notice : Module directory: /usr/lib/x86_64-linux-gnu/maxscale
2016-08-25 05:13:00   notice : Service cache: /var/cache/maxscale
2016-08-25 05:13:00   notice : Initialise CLI router module V1.0.0.
2016-08-25 05:13:00   notice : Loaded module cli: V1.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libcli.so
2016-08-25 05:13:00   notice : Initializing statemend-based read/write split router module.
2016-08-25 05:13:00   notice : Loaded module readwritesplit: V1.0.2 from /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so
2016-08-25 05:13:00   notice : Initialise the MySQL Galera Monitor module V2.0.0.
2016-08-25 05:13:00   notice : Loaded module galeramon: V2.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libgaleramon.so
2016-08-25 05:13:00   notice : Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
2016-08-25 05:13:00   notice : No query classifier specified, using default 'qc_mysqlembedded'.
2016-08-25 05:13:00   notice : Loaded module qc_mysqlembedded: V1.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libqc_mysqlembedded.so
2016-08-25 05:13:00   notice : qc_mysqlembedded loaded.
2016-08-25 05:13:01   notice : Query classifier initialized.
2016-08-25 05:13:01   notice : MariaDB Corporation MaxScale 1.4.3 (C) MariaDB Corporation Ab 2013-2015
2016-08-25 05:13:01   notice : MaxScale is running in process 4000
2016-08-25 05:13:01   notice : Loaded 2 MySQL Users for service [Splitter Service].
2016-08-25 05:13:01   notice : Loaded module MySQLClient: V1.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libMySQLClient.so
2016-08-25 05:13:01   notice : Listening MySQL connections at 192.168.1.232:3306
2016-08-25 05:13:01   notice : Listening MySQL connections at /tmp/ClusterMaster
2016-08-25 05:13:01   notice : Loaded module maxscaled: V1.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libmaxscaled.so
2016-08-25 05:13:01   notice : Listening maxscale connections at 0.0.0.0:6603
2016-08-25 05:13:01   notice : Started MaxScale log flusher.
2016-08-25 05:13:01   notice : MaxScale started with 4 server threads.
2016-08-25 05:13:01   notice : Server changed state: galera1[192.168.1.226:3306]: new_slave
2016-08-25 05:13:01   notice : Server changed state: galera2[192.168.1.176:3306]: new_master
2016-08-25 05:13:01   notice : Server changed state: galera3[192.168.1.163:3306]: new_slave

Let’s log into maxscale and see if it worked. The default password is mariadb. If you create a new user with “add user username password”, it will disable the default login and you can then connect with “maxadmin -u username”.

root@ha01:~# maxadmin
Password: 
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status              
-------------------+-----------------+-------+-------------+--------------------
galera1            | 192.168.1.226   |  3306 |           0 | Slave, Synced, Running
galera2            | 192.168.1.176   |  3306 |           0 | Master, Synced, Running
galera3            | 192.168.1.163   |  3306 |           0 | Slave, Synced, Running
-------------------+-----------------+-------+-------------+--------------------
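
If you want to prove traffic actually flows through the splitter, create a regular application user on the cluster (remember, root can’t log in remotely) and connect through the virtual IP. The username, password, and query below are just examples, not anything the rest of the guide depends on:

MariaDB [(none)]> CREATE USER 'appuser'@'%' IDENTIFIED BY 'apppassword';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT ALL ON replicateme.* TO 'appuser'@'%';
Query OK, 0 rows affected (0.01 sec)

Then, from anywhere that can reach the VIP:

root@ha01:~# mysql -h 192.168.1.232 -u appuser -papppassword -e 'select @@hostname;'

The hostname that comes back tells you which backend answered. If the first attempt is refused, give maxscale a moment; it refreshes its user cache when it sees an unknown account.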

Let’s test failover now. On the first ha box, show the existing IPs, stop maxscale, wait 2 seconds, and check the IPs again.

root@ha01:~# ip addr show dev eth0
41: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:12:98:dc:0f:d0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.216/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6012:98ff:fedc:fd0/64 scope link 
       valid_lft forever preferred_lft forever
root@ha01:~# service maxscale stop
 * Stopping MaxScale                                                 [ OK ]
root@ha01:~# ip addr show dev eth0
41: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:12:98:dc:0f:d0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.216/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6012:98ff:fedc:fd0/64 scope link 
       valid_lft forever preferred_lft forever

After stopping maxscale we lost our .232 IP. Switch to the second ha box and list its IPs. You’ll see that it now exists there.

root@ha02:~# ip addr show dev eth0
43: eth0@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 16:8e:c4:53:4c:0b brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.114/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::148e:c4ff:fe53:4c0b/64 scope link 
       valid_lft forever preferred_lft forever

The last step is to start maxscale again on the first node and make sure we get our IP back.

root@ha01:~# service maxscale start
 * Starting MaxScale
 * maxscale is running                                        [ OK ]
root@ha01:~# ip addr show dev eth0
41: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:12:98:dc:0f:d0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.216/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6012:98ff:fedc:fd0/64 scope link 
       valid_lft forever preferred_lft forever

All you have to do now is set maxscale to autostart and you’re done.

root@ha01:~# update-rc.d maxscale defaults
 Adding system startup for /etc/init.d/maxscale ...
   /etc/rc0.d/K20maxscale -> ../init.d/maxscale
   /etc/rc1.d/K20maxscale -> ../init.d/maxscale
   /etc/rc6.d/K20maxscale -> ../init.d/maxscale
   /etc/rc2.d/S20maxscale -> ../init.d/maxscale
   /etc/rc3.d/S20maxscale -> ../init.d/maxscale
   /etc/rc4.d/S20maxscale -> ../init.d/maxscale
   /etc/rc5.d/S20maxscale -> ../init.d/maxscale
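
keepalived registers its own startup links when the package installs, but it costs nothing to confirm; update-rc.d will just tell you if the links already exist:

root@ha01:~# update-rc.d keepalived defaults
System start/stop links for /etc/init.d/keepalived already exist.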

Congrats, you just set up a badass, resilient galera cluster. Set all your clients to hit your new virtual IP and rejoice in the knowledge that the only thing that can stop you now is someone dropping an entire database and it replicating to your other nodes… But you have backups for that, right?
