
Sunday, January 22, 2017

MySQL Group Replication vs. Multi Source

In my previous post, we saw the usage of MySQL Group Replication (MGR) in single-primary mode. We know that Oracle does not recommend using MGR in multi-primary mode, but there is so much in the documentation and in presentations about MGR behavior in multi-primary that I feel I should really give it a try, and especially compare this technology with the already existing multiple-master solution introduced in 5.7: multi-source replication.

Installation

To this end, I will set up two clusters using MySQL-Sandbox. The instructions for MGR in the manual use three nodes on the same host without MySQL Sandbox. Here we can see that using MySQL-Sandbox simplifies operations considerably (the scripts are available in GitHub):

Group replication

# ----------------------------------------------------------------------------
#!/bin/bash
# mm_gr.sh : installs MySQL Group Replication
MYSQL_VERSION=$1
[ -z "$MYSQL_VERSION" ] && MYSQL_VERSION=5.7.17

make_multiple_sandbox --gtid --group_directory=GR $MYSQL_VERSION

if [ "$?" != "0" ] ; then exit 1 ; fi
multi_sb=$HOME/sandboxes/GR
# derive the group communication ports from the first node's port
baseport=$($multi_sb/n1 -BN -e 'select @@port')
baseport=$(($baseport+99))

port1=$(($baseport+1))
port2=$(($baseport+2))
port3=$(($baseport+3))
for N in 1 2 3
do
    myport=$(($baseport+N))
    options=(
        binlog_checksum=NONE
        log_slave_updates=ON
        plugin-load=group_replication.so
        group_replication=FORCE_PLUS_PERMANENT
        group_replication_start_on_boot=OFF
        group_replication_bootstrap_group=OFF
        transaction_write_set_extraction=XXHASH64
        report-host=127.0.0.1
        loose-group_replication_group_name="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
        loose-group_replication_local_address="127.0.0.1:$myport"
        loose-group_replication_group_seeds="127.0.0.1:$port1,127.0.0.1:$port2,127.0.0.1:$port3"
        loose-group-replication-single-primary-mode=off
    )
    $multi_sb/node$N/add_option ${options[*]}

    user_cmd='reset master;'
    user_cmd="$user_cmd CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery';"
    $multi_sb/node$N/use -v -u root -e "$user_cmd"
done

START_CMD="SET GLOBAL group_replication_bootstrap_group=ON;"
START_CMD="$START_CMD START GROUP_REPLICATION;"
START_CMD="$START_CMD SET GLOBAL group_replication_bootstrap_group=OFF;"
$multi_sb/n1 -v -e "$START_CMD"
sleep 1
$multi_sb/n2 -v -e 'START GROUP_REPLICATION;'
sleep 1
$multi_sb/n3 -v -e 'START GROUP_REPLICATION;'
sleep 1
$multi_sb/use_all 'select * from performance_schema.replication_group_members'
# ----------------------------------------------------------------------------

Using this script, we get a cluster with MGR up and running. Here's a trimmed-down sample of its output:

$ ./mm_gr.sh
installing node 1
installing node 2
installing node 3
group directory installed in $HOME/sandboxes/GR
# option 'binlog_checksum=NONE' added to configuration file
# option 'log_slave_updates=ON' added to configuration file
# option 'plugin-load=group_replication.so' added to configuration file
# option 'group_replication=FORCE_PLUS_PERMANENT' added to configuration file
# option 'group_replication_start_on_boot=OFF' added to configuration file
# option 'group_replication_bootstrap_group=OFF' added to configuration file
# option 'transaction_write_set_extraction=XXHASH64' added to configuration file
# option 'loose-group_replication_group_name=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee' added to configuration file
# option 'loose-group_replication_local_address=127.0.0.1:14518' added to configuration file
# option 'loose-group_replication_group_seeds=127.0.0.1:14518,127.0.0.1:14519,127.0.0.1:14520' added to configuration file
# option 'loose-group-replication-single-primary-mode=off' added to configuration file
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

# [ ...]
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

# [...]
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

SET GLOBAL group_replication_bootstrap_group=ON
START GROUP_REPLICATION
SET GLOBAL group_replication_bootstrap_group=OFF
--------------

--------------
START GROUP_REPLICATION
--------------
START GROUP_REPLICATION
--------------

Multi-source replication

We have a similar (but much shorter) script to run multi-source replication in sandboxes.

# ----------------------------------------------------------------------------
#!/bin/bash
# mm_ms.sh : installs MySQL multi-source replication
MYSQL_VERSION=$1
[ -z "$MYSQL_VERSION" ] && MYSQL_VERSION=5.7.16

make_multiple_sandbox --gtid --group_directory=MS $MYSQL_VERSION

if [ "$?" != "0" ] ; then exit 1 ; fi
multi_sb=$HOME/sandboxes/MS

$multi_sb/use_all 'reset master'

# set up each node as a slave of the other two, one channel per master
for N in 1 2 3
do
    user_cmd=''
    for node in 1 2 3
    do
        if [ "$node" != "$N" ]
        then
            master_port=$($multi_sb/n$node -BN -e 'select @@port')
            user_cmd="$user_cmd CHANGE MASTER TO MASTER_USER='rsandbox', "
            user_cmd="$user_cmd MASTER_PASSWORD='rsandbox', master_host='127.0.0.1', "
            user_cmd="$user_cmd master_port=$master_port FOR CHANNEL 'node$node';"
            user_cmd="$user_cmd START SLAVE FOR CHANNEL 'node$node';"
        fi
    done
    $multi_sb/node$N/use -v -u root -e "$user_cmd"
done
# ----------------------------------------------------------------------------

Sample run:

$ ./mm_ms.sh
installing node 1
installing node 2
installing node 3
group directory installed in $HOME/sandboxes/MS
# server: 1:
# server: 2:
# server: 3:
--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14318 FOR CHANNEL 'node2'
START SLAVE FOR CHANNEL 'node2'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14319 FOR CHANNEL 'node3'
START SLAVE FOR CHANNEL 'node3'

--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14317 FOR CHANNEL 'node1'
START SLAVE FOR CHANNEL 'node1'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14319 FOR CHANNEL 'node3'
START SLAVE FOR CHANNEL 'node3'

--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14317 FOR CHANNEL 'node1'
START SLAVE FOR CHANNEL 'node1'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14318 FOR CHANNEL 'node2'
START SLAVE FOR CHANNEL 'node2'
--------------

Simple test data

Finally, we have a script that will create one table for each node and insert one record.


# ----------------------------------------------------------------------------     
#!/bin/bash
multi_sb=$1
if [ -z "$multi_sb" ]
then
    echo multiple sandbox path needed
    exit 1
fi
if [ ! -d $multi_sb ]
then
    echo directory $multi_sb not found
    exit 1
fi
if [ ! -d "$multi_sb/node3" ]
then
    echo directory $multi_sb/node3 not found
    exit 1
fi
cd $multi_sb

for N in  1 2 3 ; do
    ./n$N -e "create schema if not exists test"
    ./n$N -e "drop table if exists test.t$N"
    ./n$N -e "create table test.t$N(id int not null primary key, sid int)"
    ./n$N -e "insert into  test.t$N values ($N, @@server_id)"
done

./use_all 'select * from test.t1 union select * from test.t2 union select * from test.t3'
# ----------------------------------------------------------------------------

We run the script in both clusters, and at the end we'll have the test database with three tables, each one created and filled by a different node.
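
If everything replicated correctly, each node in both clusters should return the same three rows, along these lines (the server IDs 101 to 103 are the ones used throughout this post):

+----+------+
| id | sid  |
+----+------+
|  1 |  101 |
|  2 |  102 |
|  3 |  103 |
+----+------+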

Checking replication status

The old topology: multi-source

Let's start with the old technology, so we can easily compare it with the new one.

node1 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00014318-2222-2222-2222-222222222222   # ----
                THREAD_ID: 32
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 244
 LAST_HEARTBEAT_TIMESTAMP: 2017-01-22 13:31:54
 RECEIVED_TRANSACTION_SET: 00014318-2222-2222-2222-222222222222:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00014319-3333-3333-3333-333333333333   # ----
                THREAD_ID: 34
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 244
 LAST_HEARTBEAT_TIMESTAMP: 2017-01-22 13:31:55
 RECEIVED_TRANSACTION_SET: 00014319-3333-3333-3333-333333333333:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
2 rows in set (0.00 sec)

Notice that we are benefitting from a feature of MySQL-Sandbox that creates a more readable version of the server UUID. This way we can easily identify the nodes. Here we see that each transaction set has a clearly defined origin. We can see similar information in the replication tables from the mysql database:

node1 [localhost] {msandbox} (mysql) > select * from slave_master_info\G
*************************** 1. row ***************************
       Number_of_lines: 25
       Master_log_name: mysql-bin.000001
        Master_log_pos: 154
                  Host: 127.0.0.1       # ----
             User_name: rsandbox
         User_password: rsandbox
                  Port: 14318           # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid: 00014318-2222-2222-2222-222222222222  # ----
           Retry_count: 86400
           Ssl_crlpath:
 Enabled_auto_position: 0
          Channel_name: node2
           Tls_version:
*************************** 2. row ***************************
       Number_of_lines: 25
       Master_log_name: mysql-bin.000001
        Master_log_pos: 154
                  Host: 127.0.0.1    # ----
             User_name: rsandbox
         User_password: rsandbox
                  Port: 14319        # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid: 00014319-3333-3333-3333-333333333333  # ----
           Retry_count: 86400
           Ssl_crlpath:
 Enabled_auto_position: 0
          Channel_name: node3
           Tls_version:
2 rows in set (0.00 sec)
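
As an aside, the friendly UUIDs can be checked directly on each node. For instance, on node 2 the sandbox embeds the port and the node number in the UUID (a quick check; your ports will differ):

node2 [localhost] {msandbox} (mysql) > select @@server_id, @@server_uuid;
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         102 | 00014318-2222-2222-2222-222222222222 |
+-------------+--------------------------------------+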

Additionally, we have SHOW SLAVE STATUS, which, although not the ideal monitoring tool, is still the only place where we can see at once both the received and executed transactions, and the corresponding binary log and relay log records.

Here's an abridged version:

node1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 14318
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 965
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 1178
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 965
              Relay_Log_Space: 1387
             Master_Server_Id: 102
                  Master_UUID: 00014318-2222-2222-2222-222222222222
             Master_Info_File: mysql.slave_master_info
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
           Retrieved_Gtid_Set: 00014318-2222-2222-2222-222222222222:1-4
            Executed_Gtid_Set: 00014317-1111-1111-1111-111111111111:1-4,
00014318-2222-2222-2222-222222222222:1-4,
00014319-3333-3333-3333-333333333333:1-4
                 Channel_Name: node2
*************************** 2. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 14319
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 965
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1178
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 965
              Relay_Log_Space: 1387
              Until_Condition: None
             Master_Server_Id: 103
                  Master_UUID: 00014319-3333-3333-3333-333333333333
             Master_Info_File: mysql.slave_master_info
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
           Retrieved_Gtid_Set: 00014319-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00014317-1111-1111-1111-111111111111:1-4,
00014318-2222-2222-2222-222222222222:1-4,
00014319-3333-3333-3333-333333333333:1-4
                 Channel_Name: node3
2 rows in set (0.00 sec)

Finally, we'll have a look at the data itself:

node1 [localhost] {msandbox} (mysql) > show binlog events;
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name         | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysql-bin.000001 |   4 | Format_desc    |       101 |         123 | Server ver: 5.7.16-log, Binlog ver: 4                             |
| mysql-bin.000001 | 123 | Previous_gtids |       101 |         154 |                                                                   |
| mysql-bin.000001 | 154 | Gtid           |       101 |         219 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:1' |
| mysql-bin.000001 | 219 | Query          |       101 |         325 | create schema if not exists test                                  |
| mysql-bin.000001 | 325 | Gtid           |       101 |         390 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:2' |
| mysql-bin.000001 | 390 | Query          |       101 |         518 | DROP TABLE IF EXISTS `test`.`t1` /* generated by server */        |
| mysql-bin.000001 | 518 | Gtid           |       101 |         583 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:3' |
| mysql-bin.000001 | 583 | Query          |       101 |         711 | create table test.t1(id int not null primary key, sid int)        |
| mysql-bin.000001 | 711 | Gtid           |       101 |         776 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:4' |
| mysql-bin.000001 | 776 | Query          |       101 |         844 | BEGIN                                                             |
| mysql-bin.000001 | 844 | Table_map      |       101 |         890 | table_id: 108 (test.t1)                                           |
| mysql-bin.000001 | 890 | Write_rows     |       101 |         934 | table_id: 108 flags: STMT_END_F                                   |
| mysql-bin.000001 | 934 | Xid            |       101 |         965 | COMMIT /* xid=72 */                                               |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
13 rows in set (0.00 sec)

The binary log contains only the data produced in this node.
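
Nonetheless, the executed GTIDs cover all three origins, because the transactions applied from the other nodes are recorded in gtid_executed even though they are not written to the local binary log:

node1 [localhost] {msandbox} (mysql) > select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 00014317-1111-1111-1111-111111111111:1-4,
00014318-2222-2222-2222-222222222222:1-4,
00014319-3333-3333-3333-333333333333:1-4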

The new topology: MGR

Turning to the new software, let's first check whether replication is working. An important note here: SHOW SLAVE STATUS is not available in MGR. That's not entirely true: the channel architecture used for multi-source replication has been hijacked to convey information about group problems. If something goes wrong during the setup, you will find the information in the group_replication_recovery channel.

node1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS for channel 'group_replication_recovery';
Empty set (0.00 sec)

When things are fine, the tables in performance_schema report a satisfactory status:

node1 [localhost] {msandbox} (performance_schema) > select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 00014418-1111-1111-1111-111111111111 | gmini       |       14418 | ONLINE       |
| group_replication_applier | 00014419-2222-2222-2222-222222222222 | gmini       |       14419 | ONLINE       |
| group_replication_applier | 00014420-3333-3333-3333-333333333333 | gmini       |       14420 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

The above command tells us that all nodes are online.

Next, we ask for the stats of the current member.

node1 [localhost] {msandbox} (performance_schema) > select * from replication_group_member_stats\G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 14850806532423012:3
                         MEMBER_ID: 00014418-1111-1111-1111-111111111111
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 12
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
    LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006
1 row in set (0.00 sec)

The same operation from a different member will give a very similar result.

node2 [localhost] {msandbox} (performance_schema) > select * from replication_group_member_stats\G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 14850806532423012:3
                         MEMBER_ID: 00014419-2222-2222-2222-222222222222
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 12
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
    LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006
1 row in set (0.00 sec)

Then, we check the more classical replication status:

node1 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: group_replication_applier
               GROUP_NAME: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
              SOURCE_UUID: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee    # ----
                THREAD_ID: NULL
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
 LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
 RECEIVED_TRANSACTION_SET: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
1 row in set (0.00 sec)

There are a few things that strike the observer immediately:

  • As we saw in the single-primary topology, all transactions bear the UUID of the group, not of the server that generated them. While in single-primary mode this could be considered an asset, as it simplifies a failover procedure, in multi-primary mode I consider it to be a loss. We lose the knowledge of the transaction provenance. As you can see, the SOURCE_UUID field shows the group ID instead of the node's UUID.
  • The GTID numbers look odd. There is a set that starts at 1, another set that starts at 1 million, and a third one that starts at 2 million. What's going on? The answer is in the value of group_replication_gtid_assignment_block_size, which determines the block of values assigned to each node (see the query right after this list). When the values in the block are exhausted, the node allocates another block. Someone could naively think that we could use these blocks to identify which node the data comes from, but this would ultimately be wrong for two reasons:
    • The blocks are assigned on a first-come-first-served basis. If we start operations in node 2, its transactions will bear the lowest numbers.
    • When the blocks are exhausted, the node starts a new block, meaning that with a busy cluster we will have a hard time identifying which node uses which block.
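
The default block size is one million, which matches the sets seen above (a quick check on one of the nodes):

node1 [localhost] {msandbox} (performance_schema) > select @@group_replication_gtid_assignment_block_size;
+-------------------------------------------------+
| @@group_replication_gtid_assignment_block_size  |
+-------------------------------------------------+
|                                         1000000 |
+-------------------------------------------------+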

If someone thought that we could get some more information from the replication tables in mysql, they are in for a disappointment:

node2 [localhost] {msandbox} (mysql) > select * from slave_master_info\G
*************************** 1. row ***************************
       Number_of_lines: 25
       Master_log_name:
        Master_log_pos: 4
                  Host: <NULL>            # ----
             User_name:
         User_password:
                  Port: 0                 # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid:                   # ----
           Retry_count: 86400
 Enabled_auto_position: 1
          Channel_name: group_replication_applier
           Tls_version:
*************************** 2. row ***************************
       Number_of_lines: 25
       Master_log_name:
        Master_log_pos: 4
                  Host: <NULL>
             User_name: rsandbox
         User_password: rsandbox
                  Port: 0
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid:
           Retry_count: 1
 Enabled_auto_position: 1
          Channel_name: group_replication_recovery
           Tls_version:
2 rows in set (0.00 sec)

The table shows group operations rather than individual host connections. There is no origin information here.

Looking at the events, we will notice immediately some more differences.

node2 [localhost] {msandbox} (mysql) > show binlog events;
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
| Log_name         | Pos  | Event_type     | Server_id | End_log_pos | Info                                                                    |
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
| mysql-bin.000001 |    4 | Format_desc    |       102 |         123 | Server ver: 5.7.17-log, Binlog ver: 4                                   |
| mysql-bin.000001 |  123 | Previous_gtids |       102 |         150 |                                                                         |
| mysql-bin.000001 |  150 | Gtid           |       101 |         211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1'       |
| mysql-bin.000001 |  211 | Query          |       101 |         270 | BEGIN                                                                   |
| mysql-bin.000001 |  270 | View_change    |       101 |         369 | view_id=14850806532423012:1                                             |
| mysql-bin.000001 |  369 | Query          |       101 |         434 | COMMIT                                                                  |
| mysql-bin.000001 |  434 | Gtid           |       101 |         495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2'       |
| mysql-bin.000001 |  495 | Query          |       101 |         554 | BEGIN                                                                   |
| mysql-bin.000001 |  554 | View_change    |       101 |         693 | view_id=14850806532423012:2                                             |
| mysql-bin.000001 |  693 | Query          |       101 |         758 | COMMIT                                                                  |
| mysql-bin.000001 |  758 | Gtid           |       102 |         819 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:3'       |
| mysql-bin.000001 |  819 | Query          |       102 |         878 | BEGIN                                                                   |
| mysql-bin.000001 |  878 | View_change    |       102 |        1017 | view_id=14850806532423012:3                                             |
| mysql-bin.000001 | 1017 | Query          |       102 |        1082 | COMMIT                                                                  |
| mysql-bin.000001 | 1082 | Gtid           |       101 |        1143 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:4'       |
| mysql-bin.000001 | 1143 | Query          |       101 |        1250 | create schema if not exists test                                        |
| mysql-bin.000001 | 1250 | Gtid           |       101 |        1311 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:5'       |
| mysql-bin.000001 | 1311 | Query          |       101 |        1440 | DROP TABLE IF EXISTS `test`.`t1` /* generated by server */              |
| mysql-bin.000001 | 1440 | Gtid           |       101 |        1501 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:6'       |
| mysql-bin.000001 | 1501 | Query          |       101 |        1630 | create table test.t1(id int not null primary key, sid int)              |
| mysql-bin.000001 | 1630 | Gtid           |       101 |        1691 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:7'       |
| mysql-bin.000001 | 1691 | Query          |       101 |        1755 | BEGIN                                                                   |
| mysql-bin.000001 | 1755 | Table_map      |       101 |        1797 | table_id: 219 (test.t1)                                                 |
| mysql-bin.000001 | 1797 | Write_rows     |       101 |        1837 | table_id: 219 flags: STMT_END_F                                         |
| mysql-bin.000001 | 1837 | Xid            |       101 |        1864 | COMMIT /* xid=51 */                                                     |
| mysql-bin.000001 | 1864 | Gtid           |       102 |        1925 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000003' |
| mysql-bin.000001 | 1925 | Query          |       102 |        2032 | create schema if not exists test                                        |
| mysql-bin.000001 | 2032 | Gtid           |       102 |        2093 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000004' |
| mysql-bin.000001 | 2093 | Query          |       102 |        2222 | DROP TABLE IF EXISTS `test`.`t2` /* generated by server */              |
| mysql-bin.000001 | 2222 | Gtid           |       102 |        2283 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000005' |
| mysql-bin.000001 | 2283 | Query          |       102 |        2412 | create table test.t2(id int not null primary key, sid int)              |
| mysql-bin.000001 | 2412 | Gtid           |       102 |        2473 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000006' |
| mysql-bin.000001 | 2473 | Query          |       102 |        2542 | BEGIN                                                                   |
| mysql-bin.000001 | 2542 | Table_map      |       102 |        2584 | table_id: 220 (test.t2)                                                 |
| mysql-bin.000001 | 2584 | Write_rows     |       102 |        2624 | table_id: 220 flags: STMT_END_F                                         |
| mysql-bin.000001 | 2624 | Xid            |       102 |        2651 | COMMIT /* xid=62 */                                                     |
| mysql-bin.000001 | 2651 | Gtid           |       103 |        2712 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000003' |
| mysql-bin.000001 | 2712 | Query          |       103 |        2819 | create schema if not exists test                                        |
| mysql-bin.000001 | 2819 | Gtid           |       103 |        2880 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000004' |
| mysql-bin.000001 | 2880 | Query          |       103 |        3009 | DROP TABLE IF EXISTS `test`.`t3` /* generated by server */              |
| mysql-bin.000001 | 3009 | Gtid           |       103 |        3070 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000005' |
| mysql-bin.000001 | 3070 | Query          |       103 |        3199 | create table test.t3(id int not null primary key, sid int)              |
| mysql-bin.000001 | 3199 | Gtid           |       103 |        3260 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006' |
| mysql-bin.000001 | 3260 | Query          |       103 |        3324 | BEGIN                                                                   |
| mysql-bin.000001 | 3324 | Table_map      |       103 |        3366 | table_id: 221 (test.t3)                                                 |
| mysql-bin.000001 | 3366 | Write_rows     |       103 |        3406 | table_id: 221 flags: STMT_END_F                                         |
| mysql-bin.000001 | 3406 | Xid            |       103 |        3433 | COMMIT /* xid=68 */                                                     |
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
47 rows in set (0.00 sec)

Two important points:

  • All transaction IDs are assigned to the group, not to the node. The only way to see where the data is coming from is to look at the binary log itself and check the good old server-id. One wonders why we have come all this way with the ugly UUIDs in the global transaction identifier, only to maim their usefulness by removing one of their most important features: tracking the data origin.

For example:

# at 434
#170122 11:24:11 server id 101  end_log_pos 495         GTID    last_committed=1        sequence_number=2
SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2'/*!*/;
# at 495
#170122 11:24:11 server id 101  end_log_pos 554         Query   thread_id=7     exec_time=6     error_code=0
SET TIMESTAMP=1485080651/*!*/;
BEGIN
/*!*/;
  • Because log-slave-updates is mandatory, the binary log in every node will have all the transactions of every other node. This can have disagreeable side effects when dealing with large data. Here is what happens when we load the sample employee database from node #1:

With Group Replication, the load takes 2 minutes and 16 seconds, and the binary logs have the same size in every node.

[GR]$ ls -lh node?/data/*bin*
-rw-r-----  1 gmax  staff   8.2K Jan 22 10:22 node1/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node1/data/mysql-bin.000002
-rw-r-----  1 gmax  staff    38B Jan 22 10:22 node1/data/mysql-bin.index

-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node2/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:12 node2/data/mysql-bin.index

-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node3/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:12 node3/data/mysql-bin.index

The same operation in multi-source replication takes 1 minute and 30 seconds. The binary logs are kept only in the origin.

[MS]$ ls -lh node?/data/*bin*
-rw-r-----  1 gmax  staff   4.9K Jan 22 10:26 node1/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    63M Jan 22 10:27 node1/data/mysql-bin.000002
-rw-r-----  1 gmax  staff    38B Jan 22 10:26 node1/data/mysql-bin.index

-rw-r-----  1 gmax  staff   1.4K Jan 22 10:14 node2/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:14 node2/data/mysql-bin.index

-rw-r-----  1 gmax  staff   1.4K Jan 22 10:14 node3/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:14 node3/data/mysql-bin.index

Conflict resolution

One of the strong points of MGR is conflict resolution.

We can try a conflicting operation in two nodes, inserting the same data at the same time:

use test;
set autocommit=0;
insert into t2 values (3, @@server_id);
commit;

In multi-source, we get a replication error on both nodes. It's an ugly result, but it tells the user immediately that something went wrong in a given node, and doesn't let the error propagate to other nodes.
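
The failure shows up in the channel status as a duplicate key error, with a message along these lines (abridged):

node1 [localhost] {msandbox} (test) > SHOW SLAVE STATUS FOR CHANNEL 'node2'\G
*************************** 1. row ***************************
[...]
            Slave_SQL_Running: No
               Last_SQL_Error: Could not execute Write_rows event on table test.t2;
Duplicate entry '3' for key 'PRIMARY' [...]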

In MGR, the situation varies. This is a possible outcome:

node1 [localhost] {msandbox} (test) > set autocommit=0;                        |   node2 [localhost] {msandbox} (test) > set autocommit=0;
Query OK, 0 rows affected (0.00 sec)                                           |   Query OK, 0 rows affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > insert into t2 values (3, @@server_id);  |   node2 [localhost] {msandbox} (test) > insert into t2 values (3, @@server_id);
Query OK, 1 row affected (0.00 sec)                                            |   Query OK, 1 row affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  102 |
+----+------+                                                                  |   +----+------+
2 rows in set (0.00 sec)                                                       |   2 rows in set (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > commit;                                  |   node2 [localhost] {msandbox} (test) > commit;
Query OK, 0 rows affected (0.01 sec)                                           |   ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.
                                                                               |   node2 [localhost] {msandbox} (test) > select * from t2;
 node1 [localhost] {msandbox} (test) > select * from t2;                       |   +----+------+
 +----+------+                                                                 |   | id | sid  |
 | id | sid  |                                                                 |   +----+------+
 +----+------+                                                                 |   |  2 |  102 |
 |  2 |  102 |                                                                 |   |  3 |  101 |
 |  3 |  101 |                                                                 |   +----+------+
 +----+------+                                                                 |   2 rows in set (0.00 sec)
 2 rows in set (0.00 sec)                                                      |

Here node #2 got the transaction a fraction of a second later, and its transaction was rolled back. Thus the transaction that was ultimately kept in the database was the one from node1 (server-id 101). However, this behavior is not predictable. If we try the same operation again, we get a different outcome:

node1 [localhost] {msandbox} (test) > insert into t2 values (4, @@server_id);  |   node2 [localhost] {msandbox} (test) > insert into t2 values (4, @@server_id);
Query OK, 1 row affected (0.00 sec)                                            |   Query OK, 1 row affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  101 |
|  4 |  101 |                                                                  |   |  4 |  102 |
+----+------+                                                                  |   +----+------+
3 rows in set (0.00 sec)                                                       |   3 rows in set (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > commit;                                  |   node2 [localhost] {msandbox} (test) > commit;
Query OK, 0 rows affected (0.01 sec)                                           |
ERROR 3101 (HY000): Plugin instructed the server to rollback                   |
the current transaction.                                                       |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  101 |
|  4 |  102 |                                                                  |   |  4 |  102 |
+----+------+                                                                  |   +----+------+
 3 rows in set (0.00 sec)                                                      |   3 rows in set (0.00 sec)

In the second attempt, the transaction was rolled back by node 1, and the surviving one is the one that was inserted from node 2. This means that conflict resolution works, but it may not be what the user wants, as the outcome of the resolution is aleatory.

Summing up

On the plus side, MGR keeps what it promises. We can set up a cluster of peer nodes and replicate data between nodes with some advantages compared to older multi-source topologies.

On the minus side, the documentation could be vastly improved, especially for multi-primary setups. Moreover, users need to be aware of the limitations, such as the SERIALIZABLE isolation level and cascading foreign key constraints not being supported.

Most important from my standpoint is the reduction of monitoring information for this technology, namely the loss of information about the data origin.

Wednesday, September 21, 2016

Improving the design of MySQL replication

Now that MySQL 8.0 has been revealed, it's time to take a deep look at the replication features in the latest releases, and review their overall design.

Server UUID vs Server-ID

At the beginning of replication, there was the server_id variable, which uniquely identified a node in a replication system. The variable is still here, but in MySQL 5.6 it was joined by another value, which is created during the server initialisation, regardless of its involvement in a replication system. The server_uuid is a string of hexadecimal characters that is the basis for global transaction identifiers:

select @@server_id, @@server_uuid;
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc |
+-------------+--------------------------------------+

If this server is a master, its slave will report:

        Master_Server_Id: 101
             Master_UUID: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc
       [...]
       Retrieved_Gtid_Set: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc:1-42
        Executed_Gtid_Set: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc:1-42

The server_id is still indispensable. We can't use the same value for two nodes in the replication system, or we get errors. But instead of using the server_id as the identifier for global transaction identifiers (GTIDs), we use the server_uuid, thus being subjected to the inhuman treatment of dealing with unreadable values to track our valuable transactions. Yet, server UUIDs have not replaced everything: if we want to exclude events from one or more servers from being applied, we still refer to them by server_id, as in the IGNORE_SERVER_IDS clause of CHANGE MASTER TO.
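
For example, to skip events that originated in a given server, we would still write something like this sketch (the channel name comes from the multi-source setup above):

STOP SLAVE FOR CHANNEL 'node2';
CHANGE MASTER TO IGNORE_SERVER_IDS = (103) FOR CHANNEL 'node2';
START SLAVE FOR CHANNEL 'node2';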

How should it be instead

While UUIDs guarantee that identifiers are unique across the whole planet, the number of nodes in a replication system is finite, and for practical purposes a single system may not go beyond the thousands. I don't care if my server_id is the same as a server_id in another continent: all I want is for it to be unique within my system. And since we already have to use a unique server_id for every node, it is unnecessarily cruel to force users to deal with long strings instead of plain numbers.

VARIABLES vs STATUS

In general, MySQL behavior gets changed by setting variables to a given value either in the options file or using SET GLOBAL dynamically. To see the setting of a given value, we use SHOW VARIABLES:

show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 15    |
+-----------------+-------+

Here max_connections was changed from its default value to a much lower one.
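
The change itself is a plain statement, issued dynamically or persisted in the options file; for instance:

SET GLOBAL max_connections = 15;

To see how many connections we have consumed, we use SHOW STATUS: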

show status like '%connections%';
+-----------------------------------+---------------------+
| Variable_name                     | Value               |
+-----------------------------------+---------------------+
| Connection_errors_max_connections | 0                   |
| Connections                       | 7                   |
| Max_used_connections              | 1                   |
| Max_used_connections_time         | 2016-09-20 18:01:40 |
+-----------------------------------+---------------------+

That's the general theory. Sometimes, in replication, MySQL follows the same path. For example, in semi-synchronous replication, we have the settings of the functionality using variables:

 show variables like 'rpl%';
+-------------------------------------------+------------+
| Variable_name                             | Value      |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled              | ON         |
| rpl_semi_sync_master_timeout              | 10000      |
| rpl_semi_sync_master_trace_level          | 32         |
| rpl_semi_sync_master_wait_for_slave_count | 1          |
| rpl_semi_sync_master_wait_no_slave        | ON         |
| rpl_semi_sync_master_wait_point           | AFTER_SYNC |
| rpl_stop_slave_timeout                    | 31536000   |
+-------------------------------------------+------------+

and the runtime results of how the feature is performing using STATUS:

show status like 'rpl%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 4     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 431   |
| Rpl_semi_sync_master_tx_wait_time          | 863   |
| Rpl_semi_sync_master_tx_waits              | 2     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 2     |
+--------------------------------------------+-------+

However, when it comes to GTID, the paradigm is broken:

slave1 [localhost] {msandbox} ((none)) > show global VARIABLES like '%gtid%';
+----------------------------------+-------------------------------------------+
| Variable_name                    | Value                                     |
+----------------------------------+-------------------------------------------+
| binlog_gtid_simple_recovery      | ON                                        |
| enforce_gtid_consistency         | ON                                        |
| gtid_executed                    | 00011808-1111-1111-1111-111111111111:1-42 |
| gtid_executed_compression_period | 1000                                      |
| gtid_mode                        | ON                                        |
| gtid_owned                       |                                           |
| gtid_purged                      | 00011808-1111-1111-1111-111111111111:1-42 |
| session_track_gtids              | OFF                                       |
+----------------------------------+-------------------------------------------+
8 rows in set (0.00 sec)

slave1 [localhost] {msandbox} ((none)) > show global STATUS like '%gtid%';
Empty set (0.01 sec)

Here we have variables like gtid_mode and enforce_gtid_consistency that are used to enable the behavior, but the monitoring of the result (gtid_executed and gtid_purged) should have been in STATUS, not in VARIABLES. The same usage of VARIABLES can be seen in MySQL group replication.
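
For instance, on an MGR node such as the ones set up in the first part of this page, the group settings and identity all live in VARIABLES as well (heavily trimmed; values depend on the setup):

show global variables like 'group_replication%';
+----------------------------------------------+--------------------------------------+
| Variable_name                                | Value                                |
+----------------------------------------------+--------------------------------------+
| group_replication_bootstrap_group            | OFF                                  |
| group_replication_group_name                 | aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee |
| group_replication_gtid_assignment_block_size | 1000000                              |
| group_replication_single_primary_mode        | OFF                                  |
[...]
+----------------------------------------------+--------------------------------------+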

How should it be instead

The values that monitor the functionality should be in STATUS rather than in VARIABLES. However, in this case, it would be better if the values were tracked in performance_schema tables, as noted below.

GTID executed

When GTIDs are enabled, we can see their value in SHOW MASTER STATUS

SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: mysql-bin.000002
         Position: 6325
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
1 row in set (0.00 sec)

The information here has a precise meaning: the transaction set 1-22 comes from a master identified by UUID 00007801-1111-1111-1111-111111111111. The latest transaction corresponds to binary log mysql-bin.000002 at position 6325.

This is the identification of a source, and as such is the basis for monitoring. When we want to check if the slave service is running fine, we need to compare the slave progress against the information reported by the master. So far, so good.
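
A handy tool for that comparison is the GTID_SUBSET function, which returns 1 when its first set is contained in the second. A minimal sketch, run on the slave:

SELECT channel_name,
       GTID_SUBSET(received_transaction_set, @@global.gtid_executed) AS all_applied
FROM performance_schema.replication_connection_status;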

However, when we use multiple masters, the matter becomes less clear. A slave that was replicating from this master and from two more will show the following:

SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: mysql-bin.000002
         Position: 154
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

This is completely wrong. What we see here has nothing to do with being a master, as the data shown in the above statement originated in different servers. The data is also confusing because it is thrown together in the same field. On the same server, we can see the same information in different ways.

select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

This is the same info seen before. In this context, it may seem less wrong, as it is only data coming from other servers.

Here again the information will become much more confused when the slave is also a master and produces its own data sets. If we create something in this server, we end up with the following:

select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2

Now we have in the same field the data that was produced in this server and the data that came through replication. The matter looks even more embarrassing if we look at SHOW SLAVE STATUS, which reports distinct data for every channel, but where the Executed_Gtid_Set is reported all mixed up in every channel, master data and slave data alike.

SHOW SLAVE STATUS\G
*************************** 1. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6325
               Relay_Log_File: mysql-relay-node1.000002
                Relay_Log_Pos: 6538
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6325
              Relay_Log_Space: 6747
[...]
             Master_Server_Id: 101
                  Master_UUID: 00007801-1111-1111-1111-111111111111
[...]
           Retrieved_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node1
           Master_TLS_Version:
*************************** 2. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6039
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 6252
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6039
              Relay_Log_Space: 6461
[...]
             Master_Server_Id: 102
                  Master_UUID: 00007802-2222-2222-2222-222222222222
[...]
           Retrieved_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node2
           Master_TLS_Version:
*************************** 3. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1177
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1390
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 1177
              Relay_Log_Space: 1599
[...]
             Master_Server_Id: 103
                  Master_UUID: 00007803-3333-3333-3333-333333333333
[...]
           Retrieved_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node3
           Master_TLS_Version:

How should it be instead

First off, the data should not be in a single variable, because it is composed of multiple values. But even if it were to be kept in the environment (as STATUS items, not variables, as we have seen above), we should have a distinction:

SHOW STATUS LIKE 'gtid_applied'
*************************** 1. row ***************************
gtid_applied: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

SHOW STATUS LIKE 'gtid_generated'
*************************** 1. row ***************************
gtid_generated: 00007804-4444-4444-4444-444444444444:1-2

The SHOW SLAVE STATUS should report the right line for each channel. For example:

*************************** 1. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6325
               Relay_Log_File: mysql-relay-node1.000002
                Relay_Log_Pos: 6538
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6325
              Relay_Log_Space: 6747
[...]
             Master_Server_Id: 101
                  Master_UUID: 00007801-1111-1111-1111-111111111111
[...]
           Retrieved_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node1
           Master_TLS_Version:
*************************** 2. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6039
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 6252
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6039
              Relay_Log_Space: 6461
[...]
             Master_Server_Id: 102
                  Master_UUID: 00007802-2222-2222-2222-222222222222
[...]
           Retrieved_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
            Executed_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node2
           Master_TLS_Version:
*************************** 3. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1177
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1390
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 1177
              Relay_Log_Space: 1599
[...]
             Master_Server_Id: 103
                  Master_UUID: 00007803-3333-3333-3333-333333333333
[...]
           Retrieved_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node3
           Master_TLS_Version:

Missing information in performance_schema tables

The idea of having replication information in performance schema is to replace SHOW SLAVE STATUS with a set of tables that report the same info. The theory is good, but the implementation is lacking.

First of all, the tables are misnamed. replication_applier_configuration has only settings about the applier delay, while replication_applier_status again reports only info about the remaining delay. The information worth looking at for monitoring is in a table named replication_connection_status, when it should rather refer to applier status or applier progress.
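
To see how thin those tables are, this is everything that replication_applier_configuration carries in the multi-source setup used here (a sketch of the expected content):

select * from performance_schema.replication_applier_configuration;
+--------------+---------------+
| CHANNEL_NAME | DESIRED_DELAY |
+--------------+---------------+
| node1        |             0 |
| node2        |             0 |
| node3        |             0 |
+--------------+---------------+

The data that is actually worth monitoring, meanwhile, shows up in replication_connection_status: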

select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: node1
               GROUP_NAME:
              SOURCE_UUID: 00007801-1111-1111-1111-111111111111
                THREAD_ID: 35
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
 RECEIVED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00007802-2222-2222-2222-222222222222
                THREAD_ID: 39
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
 RECEIVED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 3. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00007803-3333-3333-3333-333333333333
                THREAD_ID: 43
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:42
 RECEIVED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

What is missing here is the applied data, i.e. what appears in @@global.gtid_executed, which, as said above, should be split by server of origin. Moreover, we miss the corresponding binary log names and positions, which are relegated to some neglected tables in the mysql database; those tables should be removed and their contents integrated here. Also missing are the default schema and the timestamps of the latest transaction, both for its creation and its replication. All this information is valuable to the DBA when troubleshooting: rather than opening the binary logs to find out the details, one should find them all collected in this table for ease of use.

How should it be instead

The monitoring tables should include information about:

  • transaction timestamp;
  • binary log and position (or what a future implementation would use to transport data);
  • default schema.

Here's an example:

select * from replication_applier_progress\G
*************************** 1. row ***************************
             CHANNEL_NAME: node1
               GROUP_NAME:
              SOURCE_UUID: 00007801-1111-1111-1111-111111111111
                THREAD_ID: 35
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
    TRANSACTION_TIMESTAMP: 2016-09-20 22:24:13
          APPLY_TIMESTAMP: 2016-09-20 22:24:15
 RECEIVED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
  APPLIED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 6325
           Relay_Log_File: mysql-relay-node1.000002
            Relay_Log_Pos: 6538
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db1
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00007802-2222-2222-2222-222222222222
                THREAD_ID: 39
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
    TRANSACTION_TIMESTAMP: 2016-09-20 22:25:15
          APPLY_TIMESTAMP: 2016-09-20 22:25:18
 RECEIVED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
  APPLIED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 6039
           Relay_Log_File: mysql-relay-node2.000002
            Relay_Log_Pos: 6252
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db2
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 3. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00007803-3333-3333-3333-333333333333
                THREAD_ID: 43
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:42
    TRANSACTION_TIMESTAMP: 2016-09-20 22:28:16
          APPLY_TIMESTAMP: 2016-09-20 22:28:19
 RECEIVED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
  APPLIED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 1177
           Relay_Log_File: mysql-relay-node3.000002
            Relay_Log_Pos: 1390
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db3
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

The same level of detail should be used for replication_applier_status_by_worker, which, by the way, has a more appropriate name than replication_connection_status.
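For the record, with a single-threaded applier that table offers one row per channel, and not much more detail (a sketch, showing only the first channel):

select * from replication_applier_status_by_worker\G
*************************** 1. row ***************************
         CHANNEL_NAME: node1
            WORKER_ID: 0
            THREAD_ID: 36
        SERVICE_STATE: ON
LAST_SEEN_TRANSACTION: 00007801-1111-1111-1111-111111111111:22
    LAST_ERROR_NUMBER: 0
   LAST_ERROR_MESSAGE:
 LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00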

Monitoring tables are in different places

MySQL 5.6 introduced replication tables in the mysql database. Unfortunately, this inclusion happened before GTID was implemented. As a result, we have tables in the mysql database reporting log names and positions, and tables in performance_schema reporting GTIDs, but no correlation between the two.
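To see the disconnect in practice, compare one of the mysql tables with the GTID sets shown earlier: the positions are there, but not a GTID in sight (a sketch; values borrowed from the node1 channel above):

select Master_log_name, Master_log_pos, Relay_log_name, Relay_log_pos, Channel_name
    from mysql.slave_relay_log_info\G
*************************** 1. row ***************************
Master_log_name: mysql-bin.000002
 Master_log_pos: 6325
 Relay_log_name: ./mysql-relay-node1.000002
  Relay_log_pos: 6538
   Channel_name: node1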

How should it be instead

The replication tables in the mysql database should be removed, and their contents integrated into performance_schema, as shown above.

There is no table for SHOW MASTER STATUS

The information in SHOW SLAVE STATUS has been translated almost entirely into performance_schema tables. No such thing happened for SHOW MASTER STATUS. To make monitoring more efficient, all the data used for monitoring should be in tables, and each field should carry a single piece of information, as we have seen in previous cases.

How should it be instead

There should be a table containing the information from SHOW MASTER STATUS, except the filter info, which should be in a different table:

select * from replication_master_status\G
*************************** 1. row ***************************
                 File: mysql-bin.000002
             Position: 154
    Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
Transaction_timestamp: 2016-09-20 22:24:13

select * from replication_filters\G
*************************** 1. row ***************************
     Filter_Name: binlog_do_db
     Filter_Type: binlog
    Filter_Value: personnel.*
   Filter_Status: ACTIVE

Call for action

Dear MySQL product managers: please take action to make replication more consistent and usable. I may have forgotten something, but I think the requests in this article are a good start toward improving the design of MySQL replication.

Monday, August 24, 2015

MySQL usability issues for replication

In my latest series of articles on advanced replication features, I came across several usability issues, which I would like to recap here. For each topic in this list of requests, I give a wish list, with some general comments.

INSTALLATION

As the maintainer of MySQL Sandbox, a tool that aims to facilitate the installation of MySQL for testing, I am particularly sensitive to the annoyances encountered during installation. I have covered the matter extensively in recent weeks and many times before. My point is that the logging of the installation must be useful to the DBA, not to the MySQL developers.
  • Make a big distinction between [Info] and [Warning] messages. Warnings should only appear when there is some risk. Using a warning to tell the user that the InnoDB log file was created is nonsense.
  • … and then make [Info] messages suppressible. Currently, I can suppress warnings, but then I would also suppress the real warnings in addition to the fake ones (see the sketch after this list).
  • Make mysqld --initialize use the correct syntax for timestamps, instead of issuing a warning about the obsolete syntax that it itself uses. (See In search of cleanliness : the elusive quiet installation for a full explanation.)
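On the suppression front, 5.7 did introduce log_error_verbosity, but it is an all-or-nothing switch per severity level, which is exactly the problem lamented above (a sketch of the available settings):

# my.sandbox.cnf excerpt (a sketch: log_error_verbosity is available in 5.7.2+)
[mysqld]
log_error_verbosity = 2   # 1 = errors only, 2 = also warnings, 3 = also notes (default)
# level 2 still shows the fake warnings; level 1 drops the real ones along with the noise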

MULTI-SOURCE

Here we have two main problems. The first one is that the introduction of multi-source channels has opened some new ways of shooting yourself in the foot: by mixing the new explicitly named channels with the hidden one (which is still used in regular replication!) named "" (= the empty string), we may start to see a new series of blunders in replication (see the sketch after this list).
The second set of problems comes from weak integration between existing features and the new one. Operation modes such as semi-synchronous, delayed, and parallel replication were designed with a single data stream in mind, and lack the capability of being tuned per channel.
  • Make it impossible to use the unnamed channel and the named ones at the same time. Currently only a very weak protection exists against mixing named and unnamed channels. Also, the syntax for channel '' should result in an error when named channels are defined.
  • Integrate the features that were designed for a single source so that they work better with multi-source: semi-synchronous replication, parallel applier, delayed replication.
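To make the first point concrete, here is a sketch of the kind of mixing that the server currently accepts (hosts, ports, and credentials are placeholders):

-- configure the default, unnamed channel first
CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=19001,
    MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox';
-- then add an explicitly named channel: the server accepts the mix
CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=19002,
    MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox'
    FOR CHANNEL 'node2';
START SLAVE;  -- with no channel clause, this starts BOTH channels at once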

Wednesday, August 19, 2015

MySQL replication in action - Part 4 - star and hybrid topologies


Introducing star topology.

In all-masters P2P topologies, we have seen that we can deploy a topology where all nodes are masters, and achieve better efficiency and stability than with ring topologies. That method comes at the price of a complex setup, which requires, for an N-node cluster, N*(N-1) connections (12 connections for just four nodes).
We can achieve the same result as in a P2P all-masters topology by trading some stability for far fewer connections: 2*(N-1) instead of N*(N-1). In a star topology (Figure 1) all nodes are masters, but they do not connect to each other directly. There is a special node, named hub, which receives the changes produced by each endpoint and spreads them to the others.
Figure 1 - A star topology

Wednesday, August 12, 2015

MySQL replication in action - Part 1: GTID & Co


In the theoretical part of this series, we have seen the basics of monitoring. In that article, though, we barely mentioned the new tools available in MySQL 5.7 and MariaDB 10. Let's start with something that has the potential to dramatically change replication as we know it.

Crash-safe tables and Global transaction identifiers in MySQL 5.6 and 5.7

Global transaction identifiers (GTID) are a feature that had been on my wish list for a long time, since the days when I was working with the MySQL team. By the time I left Oracle, this feature was not even in the plans.
When MySQL 5.6 was first disclosed, the biggest improvement for replication was the introduction of crash-safe tables (see Status persistence in Monitoring 101). There are two tables in the mysql database, named slave_master_info and slave_relay_log_info. At the beginning, these tables used the MyISAM engine, thus defeating the purpose of making them crash-safe. In later versions, the developers decided to bite the bullet and create these tables with InnoDB from the beginning.
These two tables expose the same information previously stored in the files master.info and relay_log.info. What makes them convenient is that they should survive a crash better than the standalone files.
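The table-based repositories are not the default, though. Here is a minimal sketch of the options that activate them in 5.6:

# my.cnf excerpt (a sketch of the crash-safe replication settings)
[mysqld]
master-info-repository    = TABLE  # use mysql.slave_master_info instead of master.info
relay-log-info-repository = TABLE  # use mysql.slave_relay_log_info instead of relay_log.info
relay-log-recovery        = ON     # after a crash, discard relay logs and re-fetch from the master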

Monday, January 07, 2013

Easily testing MySQL 5.6 GTID in a sandbox

MySQL 5.6 seems to be ready for GA. I have no inside information about it, but from some clues collected in various places I feel that the release should not be far away. Thus, it's time for some serious testing, and for that purpose I have worked on updating MySQL Sandbox with some urgently needed features.

I have just released MySQL Sandbox 3.0.28, with more support for MySQL 5.6. Notably, this release suppresses MySQL 5.6's annoying verbosity, and additionally suppresses the even more annoying warnings (actually a bug) issued when using empty passwords on the command line.

There is also an enhancement to the 'clear' command. In previous versions of MySQL, this command removed everything from the data directory, leaving the server ready for a clean start. In MySQL 5.6, this is not feasible, because there are InnoDB tables in the mysql schema. Therefore, what happens now is that, immediately after creating the sandbox users, the installation program stores a dump of the 'mysql' schema. The 'clear' command removes the InnoDB tables from mysql, and the 'start' command notices that and reloads the schema from the dump.

More interesting, though, is that the replication installer creates a script (only if the MySQL server is 5.6.9 or higher) called 'enable_gtid', which restarts the replication cluster with Global Transaction Identifiers enabled.
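Under the hood, the script simply restarts every node with the options that GTID requires in 5.6 (a sketch; the actual script contents may differ):

# options added to each node of the cluster
gtid_mode=ON
log-bin
log-slave-updates
enforce-gtid-consistency

# on each slave, replication can then use automatic positioning:
CHANGE MASTER TO MASTER_AUTO_POSITION=1;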

Let's see an example session:
$ make_replication_sandbox 5.6.9
installing and starting master
installing slave 1
installing slave 2
starting slave 1
... sandbox server started
starting slave 2
... sandbox server started
initializing slave 1
initializing slave 2
replication directory installed in $HOME/sandboxes/rsandbox_5_6_9

$ cd $HOME/sandboxes/rsandbox_5_6_9

$ ./check_slaves 
slave # 1
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 2590
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 2590
slave # 2
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 2590
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 2590

Now we use the 'enable_gtid' command. It will simply restart the cluster with the appropriate options.

$ ./enable_gtid 
executing "stop" on slave 1
executing "stop" on slave 2
executing "stop" on master
executing "start" on master
. sandbox server started
executing "start" on slave 1
. sandbox server started
executing "start" on slave 2
. sandbox server started

$ ./check_slaves 
slave # 1
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 151
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 151
slave # 2
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 151
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 151

Now, let's see if Global Transaction IDs are enabled.

$ ./m -e 'create table test.t1(i int not null primary key)'
$ ./m -e 'insert into test.t1 values (1)'
$ ./m -e 'insert into test.t1 values (2)'
$ ./m -e 'show master status\G'
*************************** 1. row ***************************
             File: mysql-bin.000002
         Position: 825
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: C0DF6B8A-5823-11E2-BC44-3970854BE7A5:1-3

$ ./m -e 'show binlog events in "mysql-bin.000002"'
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name         | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysql-bin.000002 |   4 | Format_desc    |         1 |         120 | Server ver: 5.6.9-rc-log, Binlog ver: 4                           |
| mysql-bin.000002 | 120 | Previous_gtids |         1 |         151 |                                                                   |
| mysql-bin.000002 | 151 | Gtid           |         1 |         199 | SET @@SESSION.GTID_NEXT= 'C0DF6B8A-5823-11E2-BC44-3970854BE7A5:1' |
| mysql-bin.000002 | 199 | Query          |         1 |         317 | create table test.t1(i int not null primary key)                  |
| mysql-bin.000002 | 317 | Gtid           |         1 |         365 | SET @@SESSION.GTID_NEXT= 'C0DF6B8A-5823-11E2-BC44-3970854BE7A5:2' |
| mysql-bin.000002 | 365 | Query          |         1 |         440 | BEGIN                                                             |
| mysql-bin.000002 | 440 | Query          |         1 |         540 | insert into test.t1 values (1)                                    |
| mysql-bin.000002 | 540 | Xid            |         1 |         571 | COMMIT /* xid=26 */                                               |
| mysql-bin.000002 | 571 | Gtid           |         1 |         619 | SET @@SESSION.GTID_NEXT= 'C0DF6B8A-5823-11E2-BC44-3970854BE7A5:3' |
| mysql-bin.000002 | 619 | Query          |         1 |         694 | BEGIN                                                             |
| mysql-bin.000002 | 694 | Query          |         1 |         794 | insert into test.t1 values (2)                                    |
| mysql-bin.000002 | 794 | Xid            |         1 |         825 | COMMIT /* xid=29 */                                               |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+

Good. The master has stored the GTIDs in the binary logs. What about the slaves?

$ ./s1 -e 'show slave status\G' |grep 'Running:\|Gtid'
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
           Retrieved_Gtid_Set: C0DF6B8A-5823-11E2-BC44-3970854BE7A5:1-3
            Executed_Gtid_Set: C0DF6B8A-5823-11E2-BC44-3970854BE7A5:1-3

$ ./s2 -e 'show slave status\G' |grep 'Running:\|Gtid'
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
           Retrieved_Gtid_Set: C0DF6B8A-5823-11E2-BC44-3970854BE7A5:1-3
            Executed_Gtid_Set: C0DF6B8A-5823-11E2-BC44-3970854BE7A5:1-3

The slaves, too, are collecting GTIDs. All is well.

Notice that this method is only safe because we are using a system with no traffic at all. If your replication setup is under load, you need to follow the method described in the MySQL 5.6 user manual.