This is a guide meant to provide you with a simple, highly available, load-balanced MariaDB Galera cluster that works and is resilient as hell.
First, you'll need five machines or five VMs. I'm running Ubuntu 14.04, so to follow along you should be too. I'll refer to the three Galera boxes as galera01-03 and the two MaxScale boxes as ha01-02.
We need to add the MariaDB Galera repository. SSH into all three Galera nodes and run the following:
root@galera01:~# sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys cbcb082a1bb943db
root@galera01:~# echo 'deb http://mirror.jmu.edu/pub/mariadb/repo/5.5/ubuntu trusty main' | sudo tee /etc/apt/sources.list.d/galera.list
root@galera01:~# sudo apt-get update
Once that's done we need to install MariaDB Galera and set the root password when prompted:
root@galera01:~# sudo apt-get install mariadb-galera-server galera
If the service started automatically, make sure to stop it now on all nodes. Otherwise, the later steps will require you to force it to die.
root@galera01:~# sudo service mysql stop
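If you want to be sure it's actually dead (and give it a shove if it isn't), something like this works; the killall is strictly a last resort:

root@galera01:~# pgrep -l mysqld          # should print nothing if mysql is really stopped
root@galera01:~# sudo killall mysqld      # only if a stray mysqld process is still hanging around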
Next, edit /etc/mysql/my.cnf. I'm going to set the bind address to 0.0.0.0 so it'll listen for anyone, but you can lock it down to a specific IP if you want. We can't use loopback because the cluster members have to talk to one another. I'm also going to enable the slow query log.
bind-address = 0.0.0.0
slow_query_log=1
Now, we need to create /etc/mysql/conf.d/cluster.cnf. The important variables in the file: bind-address, which once again is 0.0.0.0 to listen on any IP. wsrep_cluster_name can be anything you'd like but must be the same on all nodes of the cluster. wsrep_cluster_address should be a single gcomm:// URL containing the IPs of all three of your nodes, as shown. The wsrep_sst_method I use is rsync. Simply put, the gcache is a cache of replication write-sets that can be replayed to a node that drops out temporarily; as long as the missing transactions still fit in the gcache, the node rejoins via IST (incremental state transfer). If you blow past your gcache, the node falls back to a full SST over rsync, which locks the donor node until the transfer finishes. You could use mysqldump for SST but it's slower. Lastly, wsrep_node_address should be the address of the node you're editing the file on, and the same goes for wsrep_node_name. I've added the last line, wsrep_retry_autocommit, because there are occasions where a cluster-wide conflict (reported as a deadlock) happens when two nodes write to the same rows at the same time; the retry only applies to autocommit statements. I chose to include it because I have seen occurrences where an application hits one of these deadlocks and the driver, despite being told "retry the transaction", simply errors out.
[mysqld]
query_cache_size=0
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=0
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_provider=/usr/lib/galera/libgalera_smm.so
#wsrep_provider_options="gcache.size=32G"

# Galera Cluster Configuration
wsrep_cluster_name="my_cluster"
wsrep_cluster_address="gcomm://192.168.1.130,192.168.1.198,192.168.1.167"

# Galera Synchronization Configuration
wsrep_sst_method=rsync
wsrep_sst_auth=galerauser:secretsdontmakefriends

# Galera Node Configuration
wsrep_node_address="192.168.1.130"
wsrep_node_name="galera01"
wsrep_retry_autocommit=4
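If you do uncomment the gcache.size line, you can sanity-check that the provider picked it up once mysql is running. wsrep_provider_options is one long blob of settings, so just look for gcache.size in the output:

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';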
Next, we need to copy /etc/mysql/debian.cnf from the first node to every other node, replacing the contents of the file there. You can simply copy/paste or scp the file over (an example scp follows the sample below). It says in caps "DO NOT TOUCH" but I'm an adult and can do what I want. The main piece we need synced is the password= field; since we'll bootstrap every other node off the first, it will work fine. Don't copy and paste mine below, I'm simply including it so you can see what it looks like.
# DO NOT COPY THIS - USE THE ONE ON YOUR FIRST NODE
#Automatically generated for Debian scripts. DO NOT TOUCH!
[client]
host     = localhost
user     = debian-sys-maint
password = QLlRzgOp4ZIDVSVF
socket   = /var/run/mysqld/mysqld.sock
[mysql_upgrade]
host     = localhost
user     = debian-sys-maint
password = QLlRzgOp4ZIDVSVF
socket   = /var/run/mysqld/mysqld.sock
basedir  = /usr
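If you'd rather scp it than copy/paste, something like this from the first node does the trick (galera02 and galera03 are just this guide's hostnames, and it assumes root SSH between the boxes):

root@galera01:~# scp /etc/mysql/debian.cnf root@galera02:/etc/mysql/debian.cnf
root@galera01:~# scp /etc/mysql/debian.cnf root@galera03:/etc/mysql/debian.cnf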
Create a .my.cnf file in /root if you don't feel like typing passwords:
[client]
user=root
password="bananah4mmock"
[mysql]
user=root
password="bananah4mmock"
Run mysql_secure_installation on the first node.
root@galera01:~# mysql_secure_installation

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current password for the root user. If you've just installed MariaDB, and you haven't set the root password yet, the password will be blank, so you should just press enter here.

Enter current password for root (enter for none):
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MariaDB root user without the proper authorisation.

You already have a root password set, so you can safely answer 'n'.

Change the root password? [Y/n] n
 ... skipping.

By default, a MariaDB installation has an anonymous user, allowing anyone to log into MariaDB without having to have a user account created for them. This is intended only for testing, and to make the installation go a bit smoother. You should remove them before moving into a production environment.

Remove anonymous users? [Y/n]
 ... Success!

Normally, root should only be allowed to connect from 'localhost'. This ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n]
 ... Success!

By default, MariaDB comes with a database named 'test' that anyone can access. This is also intended only for testing, and should be removed before moving into a production environment.

Remove test database and access to it? [Y/n]
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!

Reloading the privilege tables will ensure that all changes made so far will take effect immediately.

Reload privilege tables now? [Y/n]
 ... Success!

Cleaning up...

All done! If you've completed all of the above steps, your MariaDB installation should now be secure.

Thanks for using MariaDB!
Stop mysql on all nodes. Then, on the first node we’ll start our new cluster.
root@galera01:~# service mysql stop
 * Stopping MariaDB database server mysqld                       [ OK ]
root@galera01:~# service mysql start --wsrep-new-cluster
 * Starting MariaDB database server mysqld                       [ OK ]
Now, on the other nodes we will simply start the mysql service. They'll sync off the first node automatically.
root@galera02:~# service mysql stop
 * Stopping MariaDB database server mysqld                       [ OK ]
root@galera02:~# service mysql start
 * Starting MariaDB database server mysqld                       [ OK ]
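Before creating anything, it's worth confirming all three nodes actually joined. Any node will do; with the three-node layout in this guide it should report 3:

root@galera01:~# mysql -u root -e "SHOW STATUS LIKE 'wsrep_cluster_size';"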
Now, since we were lazy and created a .my.cnf file, just run mysql -u root to get dropped into a shell. We're going to create a database and watch it get replicated to the other nodes.
root@galera01:~# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 30
Server version: 5.5.51-MariaDB-1~trusty-wsrep mariadb.org binary distribution, wsrep_25.14.r9949137

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database replicateme;
Query OK, 1 row affected (0.00 sec)
Head over to another node and log into mysql and run a show databases. You’ll see that our database was replicated just fine.
root@galera02:~# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 31
Server version: 5.5.51-MariaDB-1~trusty-wsrep mariadb.org binary distribution, wsrep_25.14.r9949137

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| replicateme        |
+--------------------+
4 rows in set (0.00 sec)
Now, you're free to use this cluster. If you follow all the Galera guidelines and never do anything stupid, you can use it as master-master-master without problems behind a fancy F5 or haproxy in round robin. However, people and programs aren't perfect, and rewriting everyone's code isn't a viable solution. So, to eliminate deadlocks entirely, monitor the cluster, and get automatic failover, we're going to set up MaxScale. MaxScale will watch the Galera cluster, elect a master, analyze all incoming queries, and split traffic: writes go to the master and reads are spread across the rest of the nodes. Galera uses optimistic, cluster-wide concurrency control, so if you update the same row on two nodes at the same time, one of the writes fails with a deadlock. Sending every write to a single node eliminates this, as the contrived example below shows.
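Here's what that failure mode looks like (the accounts table and values are made up for illustration). Two clients hit different nodes and touch the same row at roughly the same time, and one of them loses certification:

-- client A, connected to galera01
UPDATE accounts SET balance = balance - 5 WHERE id = 1;

-- client B, connected to galera02, at (nearly) the same time
UPDATE accounts SET balance = balance + 5 WHERE id = 1;

-- one of the two gets back:
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction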
To get all the fancy resiliency this post is meant to illustrate, we need to add the MaxScale repo, update apt, and install MaxScale and keepalived. Do this on both ha boxes.
root@ha01:~# sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 70E4618A8167EE24
root@ha01:~# echo 'deb https://downloads.mariadb.com/files/MaxScale/1.4.3/ubuntu trusty main' | sudo tee /etc/apt/sources.list.d/maxscale.list
root@ha01:~# apt-get update
root@ha01:~# apt-get install maxscale keepalived
We'll set up keepalived first. Edit /etc/keepalived/keepalived.conf on the first ha node; we're going to make this one the master, so its priority needs to be higher than the second ha box's. The virtual_ipaddress needs to be an IP that isn't in use by any other system, because keepalived is going to claim it. The track script simply checks whether maxscale is running and fails the IP over to the backup if it isn't.
vrrp_script chk_maxscale {
    script "killall -0 maxscale"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        192.168.1.232
    }
    track_script {
        chk_maxscale
    }
}
Now, on our second ha box we'll create /etc/keepalived/keepalived.conf again, but set it up to be the failover:
vrrp_script chk_maxscale {
    script "killall -0 maxscale"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.168.1.232
    }
    track_script {
        chk_maxscale
    }
}
With that done, we can edit /etc/sysctl.conf and add a line allowing us to bind services to IPs we don't really own. Add this to the bottom on both nodes:
net.ipv4.ip_nonlocal_bind = 1
Load in the sysctl settings:
root@ha01:~# sysctl -p
net.ipv4.ip_nonlocal_bind = 1
Now, we'll set up MaxScale. First we need to generate a .secrets file; do this with maxkeys.
root@ha01:~# maxkeys
Generating .secrets file in /var/lib/maxscale ...
Next, hash a password that MaxScale will use in a moment to monitor the cluster, and keep the hash handy. You need to run this on both nodes and copy the hashes, or run it on one and copy the .secrets file it creates to the second (sketch below).
root@ha01:~# maxpasswd /var/lib/maxscale/ superduperpassword
F60FB938B468277026925BBEBCD35E5ACBA982A1AD2A9DA2956834CAE12C996C
2016-08-25 04:53:25   notice : Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
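If you go the copy-the-file route instead of hashing on both boxes, something like this works; the -p flag keeps the file's permissions, and ha02 is just this guide's name for the second box:

root@ha01:~# scp -p /var/lib/maxscale/.secrets root@ha02:/var/lib/maxscale/.secrets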
On the first Galera node we're going to create a user and grant it some permissions. I'm using wildcards, but feel free to restrict it to your ha boxes' IPs. Remember that these changes replicate, so you only have to do this once.
MariaDB [(none)]> CREATE USER 'maxscale'@'%' IDENTIFIED BY 'superduperpassword';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT SELECT ON mysql.user TO 'maxscale'@'%';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT SELECT ON mysql.db TO 'maxscale'@'%';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT SELECT ON mysql.tables_priv TO 'maxscale'@'%';
Query OK, 0 rows affected (0.02 sec)

MariaDB [(none)]> GRANT SHOW DATABASES ON *.* TO 'maxscale'@'%';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> GRANT REPLICATION CLIENT ON *.* TO 'maxscale'@'%';
Query OK, 0 rows affected (0.00 sec)
Next, we need to edit /etc/maxscale.cnf on both ha boxes. Your config should match mine, so remove any extra config from the file. threads should be set to your CPU count. The servers in the Splitter Service should be the names in [] that define your Galera nodes; mine are galera1-3. The user is the one you created in the previous step, and the passwd field is the password hash that maxpasswd printed. In the Splitter Listener you need to define the virtual IP you set in keepalived.conf; mine was 192.168.1.232. In each of the [galera#] sections you need to specify that node's IP and mysql port. The Galera Monitor section should mirror the Splitter Service section for servers, user, and passwd. Leave the CLI sections alone. Remember that if you ran maxpasswd on both nodes instead of copying the .secrets file, you will have different hashes on the two servers.
[maxscale]
threads=4

[Splitter Service]
type=service
router=readwritesplit
servers=galera1, galera2, galera3
user=maxscale
passwd=F60FB938B468277026925BBEBCD35E5ACBA982A1AD2A9DA2956834CAE12C996C

[Splitter Listener]
type=listener
service=Splitter Service
protocol=MySQLClient
port=3306
address=192.168.1.232
socket=/tmp/ClusterMaster

[galera1]
type=server
address=192.168.1.226
port=3306
protocol=MySQLBackend

[galera2]
type=server
address=192.168.1.176
port=3306
protocol=MySQLBackend

[galera3]
type=server
address=192.168.1.163
port=3306
protocol=MySQLBackend

[Galera Monitor]
type=monitor
module=galeramon
servers=galera1, galera2, galera3
user=maxscale
passwd=F60FB938B468277026925BBEBCD35E5ACBA982A1AD2A9DA2956834CAE12C996C

[CLI]
type=service
router=cli

[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
port=6603
That's it for the configuration. Now let's start keepalived; do this on both ha boxes. Once it's running, the virtual IP should appear on the master (ha01).
root@ha01:~# service keepalived start
 * Starting keepalived keepalived                                [ OK ]
root@ha01:~# ip addr show dev eth0
41: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:12:98:dc:0f:d0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.216/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6012:98ff:fedc:fd0/64 scope link
       valid_lft forever preferred_lft forever
Once that’s done we can finally start maxscale and see if we screwed up.
root@ha01:~# service maxscale start
 * Starting MaxScale
 * maxscale is running                                           [ OK ]
root@ha01:~# cat /var/log/maxscale/maxscale1.log

MariaDB Corporation MaxScale  /var/log/maxscale/maxscale1.log  Thu Aug 25 05:13:00 2016
-----------------------------------------------------------------------
2016-08-25 05:13:00 notice : Configuration file: /etc/maxscale.cnf
2016-08-25 05:13:00 notice : Log directory: /var/log/maxscale
2016-08-25 05:13:00 notice : Data directory: /var/lib/maxscale
2016-08-25 05:13:00 notice : Module directory: /usr/lib/x86_64-linux-gnu/maxscale
2016-08-25 05:13:00 notice : Service cache: /var/cache/maxscale
2016-08-25 05:13:00 notice : Initialise CLI router module V1.0.0.
2016-08-25 05:13:00 notice : Loaded module cli: V1.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libcli.so
2016-08-25 05:13:00 notice : Initializing statemend-based read/write split router module.
2016-08-25 05:13:00 notice : Loaded module readwritesplit: V1.0.2 from /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so
2016-08-25 05:13:00 notice : Initialise the MySQL Galera Monitor module V2.0.0.
2016-08-25 05:13:00 notice : Loaded module galeramon: V2.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libgaleramon.so
2016-08-25 05:13:00 notice : Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
2016-08-25 05:13:00 notice : No query classifier specified, using default 'qc_mysqlembedded'.
2016-08-25 05:13:00 notice : Loaded module qc_mysqlembedded: V1.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libqc_mysqlembedded.so
2016-08-25 05:13:00 notice : qc_mysqlembedded loaded.
2016-08-25 05:13:01 notice : Query classifier initialized.
2016-08-25 05:13:01 notice : MariaDB Corporation MaxScale 1.4.3 (C) MariaDB Corporation Ab 2013-2015
2016-08-25 05:13:01 notice : MaxScale is running in process 4000
2016-08-25 05:13:01 notice : Loaded 2 MySQL Users for service [Splitter Service].
2016-08-25 05:13:01 notice : Loaded module MySQLClient: V1.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libMySQLClient.so
2016-08-25 05:13:01 notice : Listening MySQL connections at 192.168.1.232:3306
2016-08-25 05:13:01 notice : Listening MySQL connections at /tmp/ClusterMaster
2016-08-25 05:13:01 notice : Loaded module maxscaled: V1.0.0 from /usr/lib/x86_64-linux-gnu/maxscale/libmaxscaled.so
2016-08-25 05:13:01 notice : Listening maxscale connections at 0.0.0.0:6603
2016-08-25 05:13:01 notice : Started MaxScale log flusher.
2016-08-25 05:13:01 notice : MaxScale started with 4 server threads.
2016-08-25 05:13:01 notice : Server changed state: galera1[192.168.1.226:3306]: new_slave
2016-08-25 05:13:01 notice : Server changed state: galera2[192.168.1.176:3306]: new_master
2016-08-25 05:13:01 notice : Server changed state: galera3[192.168.1.163:3306]: new_slave
Let's log into maxscale with maxadmin and see if it worked. The default password is mariadb. If you create a new user with "add user username password" it will disable the default login and you'll log in with "maxadmin -u username".
root@ha01:~# maxadmin
Password:
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
galera1            | 192.168.1.226   |  3306 |           0 | Slave, Synced, Running
galera2            | 192.168.1.176   |  3306 |           0 | Master, Synced, Running
galera3            | 192.168.1.163   |  3306 |           0 | Slave, Synced, Running
-------------------+-----------------+-------+-------------+--------------------
Let's test failover now. On the first ha box, show the existing IPs, stop maxscale, wait a couple of seconds, and check the IPs again.
root@ha01:~# ip addr show dev eth0
41: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:12:98:dc:0f:d0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.216/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6012:98ff:fedc:fd0/64 scope link
       valid_lft forever preferred_lft forever
root@ha01:~# service maxscale stop
 * Stopping MaxScale                                             [ OK ]
root@ha01:~# ip addr show dev eth0
41: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:12:98:dc:0f:d0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.216/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6012:98ff:fedc:fd0/64 scope link
       valid_lft forever preferred_lft forever
After stopping maxscale we lost our .232 IP. Switch to the second ha box and list its IPs; you'll see it now lives there.
root@ha02:~# ip addr show dev eth0
43: eth0@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 16:8e:c4:53:4c:0b brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.114/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::148e:c4ff:fe53:4c0b/64 scope link
       valid_lft forever preferred_lft forever
The last step is to start maxscale back up on the first node and make sure we get our IP back.
root@ha01:~# service maxscale start
 * Starting MaxScale
 * maxscale is running                                           [ OK ]
root@ha01:~# ip addr show dev eth0
41: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:12:98:dc:0f:d0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.216/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.232/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6012:98ff:fedc:fd0/64 scope link
       valid_lft forever preferred_lft forever
All you have to do now is set maxscale to autostart and you’re done.
root@ha01:~# update-rc.d maxscale defaults
 Adding system startup for /etc/init.d/maxscale ...
   /etc/rc0.d/K20maxscale -> ../init.d/maxscale
   /etc/rc1.d/K20maxscale -> ../init.d/maxscale
   /etc/rc6.d/K20maxscale -> ../init.d/maxscale
   /etc/rc2.d/S20maxscale -> ../init.d/maxscale
   /etc/rc3.d/S20maxscale -> ../init.d/maxscale
   /etc/rc4.d/S20maxscale -> ../init.d/maxscale
   /etc/rc5.d/S20maxscale -> ../init.d/maxscale
Congrats, you just set up a badass, resilient Galera cluster. Point all your clients at your new virtual IP and rejoice in the knowledge that the only thing that can stop you now is someone dropping an entire database and watching it replicate to your other nodes... But you have backups for that, right?
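Speaking of which, a bare-minimum backup run locally on one of the Galera nodes (so the /root/.my.cnf credentials from earlier just work) can be as simple as this; adjust the path and schedule to taste:

root@galera03:~# mysqldump --all-databases --single-transaction > /root/galera-backup-$(date +%F).sql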