Want a high availability MySQL cluster where you can access all the database nodes at the same time? As in Primary-Primary-Primary-Primary-etc setups, but without the headache that comes with those?
Then this is the one for you.
In here I will show how to make, configure, and use a cluster that automatically balances and shares all the data within it. You can write or change MySQL data on one host and then read (or write) it from another host, and it won't miss a beat.
Before you begin
Galera can only use InnoDB tables. If you have any MyISAM tables, you will need to convert them to InnoDB (MyISAM is not transactional). The only exception is the system mysql.user table: do NOT convert that one, as MySQL needs it to stay MyISAM. Just remember to only use CREATE USER commands and such to make or edit users; only then do the changes get copied over to the other hosts.
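If you're not sure which of your tables are MyISAM, something along these lines can help (a minimal sketch; mydb.mytable is a placeholder, and it assumes you can connect as root):
# List MyISAM tables outside the system databases.
mysql -u root -p -e "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE ENGINE='MyISAM' AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema');"
# Convert one table at a time to InnoDB.
mysql -u root -p -e "ALTER TABLE mydb.mytable ENGINE=InnoDB;"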
Reads are very fast. Using sysbench to compare a Galera cluster against a single-instance database, for loads with mostly reads, the 4-node Galera cluster was almost twice as fast as the single instance. It did 427028 vs 239680 reads in 60 seconds, at 7321 vs 3994 per second, averaging 1.91ms vs 3.50ms per request. It makes sense that a cluster would be faster at reading.
On the other hand, with mixed reads and writes, the Galera cluster was somewhat slower for me: a total of 160246 vs 220134 read/writes in 60 seconds, or 2670 vs 3668 per second, averaging 7ms per query vs 5ms on the single instance. Tuning the cluster and changing some my.cnf values may improve this, as I was using the default settings. It may be because the cluster has to be updated with each write. I may look into this more at a later time.
The other thing to know is that you must always have at least 2 nodes running at any one time. If it ever drops to 1, the last server will stop responding to requests to avoid a possible split-brain issue.
Lastly, the Galera cluster does automatically balance data between the nodes. The node you connect to has the data from the others, and will push the writes it receives to the other ones.
You can use other software to load balance across it, but you will also need to split your reads from your writes; otherwise, depending on your data and usage, you may get 'deadlock' issues where two or more hosts try to edit the same data at the same time.
Installation
Here are some basic installation instructions. These are just suggestions; all you need is MariaDB and the Galera plugin.
# Debian/Ubuntu
apt-get install mariadb mariadb-server galera
# Fedora
dnf install mariadb mariadb-server galera
# Arch Linux
pacman -Sy mariadb perl-dbd-mysql lsof
On Arch, lsof is needed for the cluster to work properly, and galera must be installed from source (see below).
If your distro doesn't provide galera, go to the Installing Galera from Source page and follow the instructions on how to install it.
Setting it up
You may use your own current databases, but remember that you'll need to convert any MyISAM tables, as Galera only supports InnoDB.
To initialize a fresh data directory, you usually use this command. Modify if needed.
mysql_install_db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
Do NOT start the databases at this point! We will get to that.
Configuration
These servers use a 10.20.30.101-104 network to communicate. Add the following to your my.cnf. For now I would not add any optimized settings or other [mysqld] / [mysqld_safe] options until you know what you're doing.
# The MariaDB server
[mysqld]
port=3306
datadir=/var/lib/mysql
socket=/run/mysqld/mysqld.sock
user=mysql
binlog_format=ROW
bind-address=0.0.0.0
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=122M
wsrep_on=ON
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M"
wsrep_cluster_name="awesome_cluster"
wsrep_node_name=host01
wsrep_node_address="10.20.30.101"
wsrep_cluster_address="gcomm://10.20.30.101,10.20.30.102,10.20.30.103,10.20.30.104"
wsrep_sst_method=rsync
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/run/mysqld/mysqld.pid
You may have to update the above. Make sure you set wsrep_on=ON, and that wsrep_provider=/usr/lib/libgalera_smm.so points to the correct path for the libgalera_smm.so library.
Set wsrep_cluster_address to the list of IP addresses of the servers composing your cluster. You could customize it on each host so it doesn't include its own address in the list, but I found it just ignores it anyway.
On each host, set the wsrep_node_name and wsrep_node_address values to that host's name and address.
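For example, host02 would carry the same config but with its own name and address (illustrative values matching the addresses used above):
wsrep_node_name=host02
wsrep_node_address="10.20.30.102"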
Be aware that a Galera MariaDB cluster can use a lot of RAM, and you may not be able to predict how much. So set wsrep_provider_options="gcache.size=300M; gcache.page_size=300M" to a value that is high enough, but not excessive.
Firewall
Open up your firewall between your nodes for TCP ports 3306, 4567, 4568, and 4444, and for UDP port 4567.
Here is an example of iptables/ip6tables rules to open up the ports on interface 'internal'.
# For your mysql connection
-A INPUT -i internal -p tcp -m tcp --dport 3306 -j ACCEPT
# Mysql cluster
-A INPUT -i internal -p tcp -m tcp --dport 4567 -j ACCEPT
-A INPUT -i internal -p tcp -m tcp --dport 4568 -j ACCEPT
-A INPUT -i internal -p tcp -m tcp --dport 4444 -j ACCEPT
-A INPUT -i internal -p udp -m udp --dport 4567 -j ACCEPT
Be sure to restart/reload your firewall. For example:
systemctl restart ip{,6}tables
# or: service iptables restart
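If your system uses firewalld instead of raw iptables rules, the same ports can be opened roughly like this (a sketch; add --zone=... if your internal interface is not in the default zone):
firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --permanent --add-port=4567/tcp
firewall-cmd --permanent --add-port=4568/tcp
firewall-cmd --permanent --add-port=4444/tcp
firewall-cmd --permanent --add-port=4567/udp
firewall-cmd --reload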
Starting the database
Warning
Do not start them yet.
As noted before, do NOT just start these databases. We first need to start an initial (Primary) node to bootstrap the cluster. Then we'll start the others and they will connect to it.
To bootstrap it, make sure you have initialized the databases and added the config and firewall rules above. Then run this on one of the servers.
[root@host01]: mysqld_bootstrap
or on other systems
[root@host01]: galera_new_cluster
It will just stay in the foreground on the terminal; don't expect it to fork and give your terminal back. This starts the first node as a new cluster, in a mode where it's ready for the others to connect to it.
Now, you can start mysql on another host:
[root@host02]: systemctl start mysqld
# or: service mysqld start
Using journalctl -f on this second host, you should see it connect:
WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
WSREP: wsrep_load(): Galera ... loaded successfully.
WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1,
WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0)
-> new(00000000-0000-0000-0000-000000000000:-1)
...
WSREP: gcomm: connecting to group 'awesome_cluster'...
WSREP: (55144012, 'tcp://0.0.0.0:4567') connection established to ...
The "00000000-0000-0000-0000-000000000000" means it hasn't synced yet. Once it syncs you'll see that change to something like "4f81e556-f01d-11e6-91fa-462bd707f4d5".
However, if you get the following error, your firewall may still be blocking one or more of the ports above (4444, 4567, or 4568). You may want to test that you can connect via nmap or telnet (example below the error). Remember that 4567 needs to be open over UDP as well as TCP. I got this error because one of the ports was not allowed.
[Warning] WSREP: Failed to prepare for incremental state transfer:
Local state UUID (00000000-0000-0000-0000-000000000000) does not
match group state UUID (4f81e556-f01d-11e6-91fa-462bd707f4d5): 1
(Operation not permitted)
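To test connectivity between nodes, something like this works from one host against another (nmap shown as one option; swap in your own addresses):
# TCP ports
nmap -p 3306,4444,4567,4568 10.20.30.102
# UDP port (needs root)
nmap -sU -p 4567 10.20.30.102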
You can check the status of the cluster from inside MySQL too. On any host, connect to the database:
[root@hostAny]: mysql -u root -p
And ask it for the cluster status at the MariaDB prompt:
SHOW STATUS LIKE 'wsrep_cluster_size';
That should show:
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
The number will be however many nodes are connected. You'll get 1 when you start the bootstrap node, and then 2, 3, and so forth as you connect the others. You can also use SHOW STATUS LIKE 'wsrep%'; to show all the stats, options, and info.
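If you just want a quick check from the shell, a one-liner like this also works (it assumes root access; a healthy node should report Synced):
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"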
With that, your second host is now connected, and it will sync whatever databases are on the first. Go on to the others, start their services, and verify they have synced.
Once you are done, you can Ctrl+C the first one and start its service properly. If everything is working right, it will rejoin the cluster without any issue.
Indeed, you can now stop any host or MySQL instance. Just make sure you keep at least 2 instances running to keep the cluster going.
Root password
If you haven't done so, set up your root password. You only have to do this on one of the hosts; it syncs to the others. You may also want to run mysql_secure_installation to get rid of the test databases.
If you want to block network root access, that's fine. The cluster doesn't need (MySQL) root access to sync, and I find I can just manage the node I'm on and the changes sync to the others. Unless, of course, you have an application that needs root access and it's not running on one of the servers.
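To see which root accounts could log in over the network, a quick look at the user table helps (run from any node; just a sketch):
mysql -u root -p -e "SELECT User, Host FROM mysql.user WHERE User='root';"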
Trying it out
Ok, let's see it in action. On one host, let's make a database by connecting to the database server on host01.
[root@host01]: mysql -u root -p
And create the database at the MariaDB prompt:
CREATE DATABASE sandbox;
CREATE USER 'kid'@'%' IDENTIFIED BY 'PASSWORD';
GRANT ALL PRIVILEGES ON sandbox.* TO 'kid'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
Info
You must use '%' instead of just 'localhost' so the user can connect from the other hosts.
Then, on host02, verify that the database is there. Connect to the database on host02:
[root@host02]: mysql -u root -p
And display the databases.
show databases;
+--------------------+
| Database |
+--------------------+
| sandbox |
...
+--------------------+
Verify that the user kid exists from host02:
[root@host02]: mysql -u root -p
SELECT User, Host FROM mysql.user WHERE Host <> 'localhost';
+------+------+
| User | Host |
+------+------+
| kid  | %    |
...
+------+------+
Tip
For the next steps I'll specify which host we're running the SQL commands on, using your current connection via mysql -u root -p from above.
Now let's make a table on host02:
CREATE TABLE sandbox.toys ( id INT NOT NULL AUTO_INCREMENT, type VARCHAR(50), quant INT, color VARCHAR(25), PRIMARY KEY(id));
And add some data:
INSERT INTO sandbox.toys (type, quant, color) VALUES ("truck", 2, "yellow");
Now, go back to host01 and check it out:
SELECT * FROM sandbox.toys;
That should output:
+----+--------+-------+--------+
| id | type | quant | color |
+----+--------+-------+--------+
| 1 | truck | 2 | yellow |
+----+--------+-------+--------+
Let's add a red car from host01:
INSERT INTO sandbox.toys (type, quant, color) VALUES ("car", 1, "red");
Go over to host03, connect to the database using mysql -u root -p, and check it out:
SELECT * FROM sandbox.toys;
That should output:
+----+--------+-------+--------+
| id | type | quant | color |
+----+--------+-------+--------+
| 1 | truck | 2 | yellow |
| 2 | car | 1 | red |
+----+--------+-------+--------+
There it is. Already synced and ready to go.
Connecting your App
To connect your app, point it at port 3306 of any cluster node, or at a load balancer in front of the cluster.
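As a quick sanity check, you can connect the same way an app would, for example with the kid user created earlier (shown against host01's address; any node, or your load balancer, works):
mysql -h 10.20.30.101 -u kid -p sandbox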
Load Balancing and Fail Over
While it’s possible to just split the requests between all servers, this could cause ‘deadlocks’ if the same data is updated by different requests at the same time.
This can be minimized by splitting read/writes, and sending the writes to one server. More on that to come.
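As an illustration of that idea, here is what a read/write split could look like in HAProxy. This is my own sketch, not part of the original setup: the listener names and ports 3307/3308 are made up, only three of the four nodes are shown, and the node addresses are the ones used above.
# Writes go to a single node; the others are hot spares.
listen galera_write
    bind *:3307
    mode tcp
    server host01 10.20.30.101:3306 check
    server host02 10.20.30.102:3306 check backup
    server host03 10.20.30.103:3306 check backup

# Reads are balanced across all nodes.
listen galera_read
    bind *:3308
    mode tcp
    balance leastconn
    server host01 10.20.30.101:3306 check
    server host02 10.20.30.102:3306 check
    server host03 10.20.30.103:3306 check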
Benchmarking
There are lots of benchmarking programs out there; I used sysbench. Here's how I did it.
First, make an sbtest user and database. I set a password for it too.
[root@host02]: mysql -u root -p
And at the MariaDB prompt:
CREATE DATABASE sbtest;
CREATE USER 'sbtest'@'%' IDENTIFIED BY 'PASSWORD';
GRANT ALL PRIVILEGES ON sbtest.* TO 'sbtest'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
Remember, you only have to do this once and it will sync to the whole cluster.
Here we will assume the host HOST is your load balancer or gateway to your cluster. Change it according to how you set up your cluster.
First, prepare the database (the cleanup run clears out any leftovers from a previous test):
sysbench --test=oltp --mysql-host=HOST --db-driver=mysql \
--mysql-port=3306 --mysql-db=sbtest --mysql-password=PASSWORD cleanup
sysbench --test=oltp --mysql-host=HOST --db-driver=mysql \
--mysql-port=3306 --mysql-db=sbtest --mysql-password=PASSWORD prepare
Then we can run the read/write test. This will run it for 60 seconds and show the results:
sysbench --test=oltp --mysql-host=HOST --db-driver=mysql \
--mysql-port=3306 --mysql-db=sbtest --mysql-password=PASSWORD \
--max-time=60 --max-requests=0 run
Now run the read-only one:
sysbench --test=oltp --mysql-host=HOST --db-driver=mysql \
--mysql-port=3306 --mysql-db=sbtest --mysql-password=PASSWORD \
--max-time=60 --max-requests=0 --oltp-read-only=on run
There are other options like --mysql-engine-trx=yes and --oltp-connect-delay=0 that you can try, but I'll leave that to you.
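If you want to experiment with them, they just get appended to the same run command, along these lines (same HOST and PASSWORD placeholders as above):
sysbench --test=oltp --mysql-host=HOST --db-driver=mysql \
--mysql-port=3306 --mysql-db=sbtest --mysql-password=PASSWORD \
--mysql-engine-trx=yes --oltp-connect-delay=0 \
--max-time=60 --max-requests=0 run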
Restarting a Broken Cluster
I had my entire cluster go down. Trying to start them up, no server would take the lead, which is understandable: none of them is the leader, and none of them wants to take over in case it breaks something. So, per the note in the logs, I went ahead and edited /var/lib/mysql/grastate.dat and set safe_to_bootstrap to 1.
cat /var/lib/mysql/grastate.dat
That should output something like:
# GALERA saved state
version: 2.1
uuid: xxxxxxx-xxx-xxxxx-xxxxxxx
seqno: -1
safe_to_bootstrap: 1
Of course, I had a backup of all my data, just in case. Never do this unless you have to.
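If you'd rather make that edit from the shell, a one-liner like this does the same thing (assuming the default data directory and that the file currently reads safe_to_bootstrap: 0):
sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat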
Make sure all your MySQL instances are down, and pick one as the Primary. I picked the one that I knew was the last one running before they went down.
Then run galera_new_cluster on it. It started and just sat there. Looking at the logs:
Flow-control interval: [16, 16]
Restored state OPEN -> JOINED (433557)
WSREP: Member 0.0 (pod01) synced with group.
WSREP: Shifting JOINED -> SYNCED (TO: 433557)
WSREP: New cluster view: global state: 4f81e556-f01d-11e6-91fa-462bd707f4d5:433557, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
...
Started MariaDB database server
It then forked from the terminal. I started up the other ones normally (systemctl start mariadb) and all of them showed up as synced and part of the cluster:
Member x synced with group
And that's how I got it back up. I checked my data and it was all still there; nothing lost but some time.
Lesson learned: it's a cluster, and as such it needs at least 2 working nodes up and connected to keep running.
Conclusion
Setting up a Galera cluster isn't very hard, and it can bring some great speed benefits as well as failover.
And as for the opening image, it’s from some interacting galaxies called “Arp 273”, posted at NASA in 2010.
From NASA’s page (Archived version, as NASA removed the page):
The larger of the spiral galaxies, known as UGC 1810, has a disk that is distorted into a rose-like shape by the gravitational tidal pull of the companion galaxy below it, known as UGC 1813. This image is a composite of Hubble Wide Field Camera 3 data taken on December 17, 2010, with three separate filters that allow a broad range of wavelengths covering the ultraviolet, blue, and red portions of the spectrum.