
Sunday, January 22, 2017

MySQL Group Replication vs. Multi Source

In my previous post, we saw the usage of MySQL Group Replication (MGR) in single-primary mode. We know that Oracle does not recommend using MGR in multi-primary mode, but there is so much in the documentation and in presentations about MGR behavior in multi-primary that I feel I should really give it a try, and especially compare this technology with the already existing multiple-master solution introduced in 5.7: multi-source replication.

Installation

To this end, I will set up two clusters using MySQL-Sandbox. The instructions for MGR in the manual use three nodes on the same host without MySQL-Sandbox. Here we can see that using MySQL-Sandbox simplifies operations considerably (the scripts are available on GitHub):

Group replication

# ----------------------------------------------------------------------------
#!/bin/bash
# mm_gr.sh : installs MySQL Group Replication
MYSQL_VERSION=$1
[ -z "$MYSQL_VERSION" ] && MYSQL_VERSION=5.7.17

make_multiple_sandbox --gtid --group_directory=GR $MYSQL_VERSION

if [ "$?" != "0" ] ; then exit 1 ; fi
multi_sb=$HOME/sandboxes/GR
baseport=$($multi_sb/n1 -BN -e 'select @@port')
baseport=$(($baseport+99))

port1=$(($baseport+1))
port2=$(($baseport+2))
port3=$(($baseport+3))
for N in 1 2 3
do
    myport=$(($baseport+N))
    options=(
        binlog_checksum=NONE
        log_slave_updates=ON
        plugin-load=group_replication.so
        group_replication=FORCE_PLUS_PERMANENT
        group_replication_start_on_boot=OFF
        group_replication_bootstrap_group=OFF
        transaction_write_set_extraction=XXHASH64
        report-host=127.0.0.1
        loose-group_replication_group_name="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
        loose-group_replication_local_address="127.0.0.1:$myport"
        loose-group_replication_group_seeds="127.0.0.1:$port1,127.0.0.1:$port2,127.0.0.1:$port3"
        loose-group-replication-single-primary-mode=off
    )
    $multi_sb/node$N/add_option ${options[*]}

    user_cmd='reset master;'
    user_cmd="$user_cmd CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery';"
    $multi_sb/node$N/use -v -u root -e "$user_cmd"
done

START_CMD="SET GLOBAL group_replication_bootstrap_group=ON;"
START_CMD="$START_CMD START GROUP_REPLICATION;"
START_CMD="$START_CMD SET GLOBAL group_replication_bootstrap_group=OFF;"
$multi_sb/n1 -v -e "$START_CMD"
sleep 1
$multi_sb/n2 -v -e 'START GROUP_REPLICATION;'
sleep 1
$multi_sb/n3 -v -e 'START GROUP_REPLICATION;'
sleep 1
$multi_sb/use_all 'select * from performance_schema.replication_group_members'
# ----------------------------------------------------------------------------

Using this script, we get a cluster with MGR up and running. Here's a trimmed-down sample of its output:

$ ./mm_gr.sh
installing node 1
installing node 2
installing node 3
group directory installed in $HOME/sandboxes/GR
# option 'binlog_checksum=NONE' added to configuration file
# option 'log_slave_updates=ON' added to configuration file
# option 'plugin-load=group_replication.so' added to configuration file
# option 'group_replication=FORCE_PLUS_PERMANENT' added to configuration file
# option 'group_replication_start_on_boot=OFF' added to configuration file
# option 'group_replication_bootstrap_group=OFF' added to configuration file
# option 'transaction_write_set_extraction=XXHASH64' added to configuration file
# option 'loose-group_replication_group_name=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee' added to configuration file
# option 'loose-group_replication_local_address=127.0.0.1:14518' added to configuration file
# option 'loose-group_replication_group_seeds=127.0.0.1:14518,127.0.0.1:14519,127.0.0.1:14520' added to configuration file
# option 'loose-group-replication-single-primary-mode=off' added to configuration file
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

# [ ...]
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

# [...]
.. sandbox server started
reset master
CHANGE MASTER TO MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery'

SET GLOBAL group_replication_bootstrap_group=ON
START GROUP_REPLICATION
SET GLOBAL group_replication_bootstrap_group=OFF
--------------

--------------
START GROUP_REPLICATION
--------------
START GROUP_REPLICATION
--------------

Multi-source replication

We have a similar (but much shorter) script to run multi-source replication in sandboxes.

# ----------------------------------------------------------------------------
#!/bin/bash
# mm_ms.sh : installs MySQL multi-source replication
MYSQL_VERSION=$1
[ -z "$MYSQL_VERSION" ] && MYSQL_VERSION=5.7.16

make_multiple_sandbox --gtid --group_directory=MS $MYSQL_VERSION

if [ "$?" != "0" ] ; then exit 1 ; fi
multi_sb=$HOME/sandboxes/MS

$multi_sb/use_all 'reset master'

for N in 1 2 3
do
    user_cmd=''
    for node in 1 2 3
    do
        if [ "$node" != "$N" ]
        then
            master_port=$($multi_sb/n$node -BN -e 'select @@port')
            user_cmd="$user_cmd CHANGE MASTER TO MASTER_USER='rsandbox', "
            user_cmd="$user_cmd MASTER_PASSWORD='rsandbox', master_host='127.0.0.1', "
            user_cmd="$user_cmd master_port=$master_port FOR CHANNEL 'node$node';"
            user_cmd="$user_cmd START SLAVE FOR CHANNEL 'node$node';"
        fi
    done
    $multi_sb/node$N/use -v -u root -e "$user_cmd"
done
# ----------------------------------------------------------------------------

Sample run:

$ ./mm_ms.sh
installing node 1
installing node 2
installing node 3
group directory installed in $HOME/sandboxes/MS
# server: 1:
# server: 2:
# server: 3:
--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14318 FOR CHANNEL 'node2'
START SLAVE FOR CHANNEL 'node2'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14319 FOR CHANNEL 'node3'
START SLAVE FOR CHANNEL 'node3'

--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14317 FOR CHANNEL 'node1'
START SLAVE FOR CHANNEL 'node1'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14319 FOR CHANNEL 'node3'
START SLAVE FOR CHANNEL 'node3'

--------------
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14317 FOR CHANNEL 'node1'
START SLAVE FOR CHANNEL 'node1'
CHANGE MASTER TO MASTER_USER='rsandbox',  MASTER_PASSWORD='rsandbox', master_host='127.0.0.1',  master_port=14318 FOR CHANNEL 'node2'
START SLAVE FOR CHANNEL 'node2'
--------------

Simple test data

Finally, we have a script that will create one table for each node and insert one record.


# ----------------------------------------------------------------------------     
#!/bin/bash
multi_sb=$1
if [ -z "$multi_sb" ]
then
    echo multiple sandbox path needed
    exit 1
fi
if [ ! -d $multi_sb ]
then
    echo directory $multi_sb not found
    exit 1
fi
if [ ! -d "$multi_sb/node3" ]
then
    echo directory $multi_sb/node3 not found
    exit 1
fi
cd $multi_sb

for N in  1 2 3 ; do
    ./n$N -e "create schema if not exists test"
    ./n$N -e "drop table if exists test.t$N"
    ./n$N -e "create table test.t$N(id int not null primary key, sid int)"
    ./n$N -e "insert into  test.t$N values ($N, @@server_id)"
done

./use_all 'select * from test.t1 union select * from test.t2 union select * from test.t3'
# ----------------------------------------------------------------------------

We run the script in both clusters, and at the end we'll have the test database with three tables, each one created and filled by a different node.
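Since the script inserts ($N, @@server_id) and the sandbox nodes get server-ids 101, 102, and 103 (as we will see later in the binary logs), the expected result of the final union query on every node, once replication has caught up, is:

+----+------+
| id | sid  |
+----+------+
|  1 |  101 |
|  2 |  102 |
|  3 |  103 |
+----+------+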

Checking replication status

The old topology: multi-source

Let's start with the old technology, so we can easily compare it with the new one.

node1 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00014318-2222-2222-2222-222222222222   # ----
                THREAD_ID: 32
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 244
 LAST_HEARTBEAT_TIMESTAMP: 2017-01-22 13:31:54
 RECEIVED_TRANSACTION_SET: 00014318-2222-2222-2222-222222222222:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00014319-3333-3333-3333-333333333333   # ----
                THREAD_ID: 34
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 244
 LAST_HEARTBEAT_TIMESTAMP: 2017-01-22 13:31:55
 RECEIVED_TRANSACTION_SET: 00014319-3333-3333-3333-333333333333:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
2 rows in set (0.00 sec)

Notice that we are benefitting from a feature of MySQL-Sandbox that creates a more readable version of the server UUID. This way we can easily identify the nodes. Here we see that each transaction set has a clearly defined origin.
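For instance, a quick way to see the readable UUIDs at a glance is to ask every node for its identity, using the same use_all helper shown in the scripts above:

$ $HOME/sandboxes/MS/use_all 'select @@server_id, @@server_uuid'

We can see similar information in the replication tables from the mysql database: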

node1 [localhost] {msandbox} (mysql) > select * from slave_master_info\G
*************************** 1. row ***************************
       Number_of_lines: 25
       Master_log_name: mysql-bin.000001
        Master_log_pos: 154
                  Host: 127.0.0.1       # ----
             User_name: rsandbox
         User_password: rsandbox
                  Port: 14318           # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid: 00014318-2222-2222-2222-222222222222  # ----
           Retry_count: 86400
           Ssl_crlpath:
 Enabled_auto_position: 0
          Channel_name: node2
           Tls_version:
*************************** 2. row ***************************
       Number_of_lines: 25
       Master_log_name: mysql-bin.000001
        Master_log_pos: 154
                  Host: 127.0.0.1    # ----
             User_name: rsandbox
         User_password: rsandbox
                  Port: 14319        # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid: 00014319-3333-3333-3333-333333333333  # ----
           Retry_count: 86400
           Ssl_crlpath:
 Enabled_auto_position: 0
          Channel_name: node3
           Tls_version:
2 rows in set (0.00 sec)

Additionally, we have SHOW SLAVE STATUS, which, although not the ideal monitoring tool, is still the only place where we can see at once both the received and executed transactions, and the corresponding binary log and relay log records.

Here's an abridged version:

node1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 14318
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 965
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 1178
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 965
              Relay_Log_Space: 1387
             Master_Server_Id: 102
                  Master_UUID: 00014318-2222-2222-2222-222222222222
             Master_Info_File: mysql.slave_master_info
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
           Retrieved_Gtid_Set: 00014318-2222-2222-2222-222222222222:1-4
            Executed_Gtid_Set: 00014317-1111-1111-1111-111111111111:1-4,
00014318-2222-2222-2222-222222222222:1-4,
00014319-3333-3333-3333-333333333333:1-4
                 Channel_Name: node2
*************************** 2. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 14319
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 965
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1178
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 965
              Relay_Log_Space: 1387
              Until_Condition: None
             Master_Server_Id: 103
                  Master_UUID: 00014319-3333-3333-3333-333333333333
             Master_Info_File: mysql.slave_master_info
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
           Retrieved_Gtid_Set: 00014319-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00014317-1111-1111-1111-111111111111:1-4,
00014318-2222-2222-2222-222222222222:1-4,
00014319-3333-3333-3333-333333333333:1-4
                 Channel_Name: node3
2 rows in set (0.00 sec)

Finally, we'll have a look at the data itself:

node1 [localhost] {msandbox} (mysql) > show binlog events;
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name         | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysql-bin.000001 |   4 | Format_desc    |       101 |         123 | Server ver: 5.7.16-log, Binlog ver: 4                             |
| mysql-bin.000001 | 123 | Previous_gtids |       101 |         154 |                                                                   |
| mysql-bin.000001 | 154 | Gtid           |       101 |         219 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:1' |
| mysql-bin.000001 | 219 | Query          |       101 |         325 | create schema if not exists test                                  |
| mysql-bin.000001 | 325 | Gtid           |       101 |         390 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:2' |
| mysql-bin.000001 | 390 | Query          |       101 |         518 | DROP TABLE IF EXISTS `test`.`t1` /* generated by server */        |
| mysql-bin.000001 | 518 | Gtid           |       101 |         583 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:3' |
| mysql-bin.000001 | 583 | Query          |       101 |         711 | create table test.t1(id int not null primary key, sid int)        |
| mysql-bin.000001 | 711 | Gtid           |       101 |         776 | SET @@SESSION.GTID_NEXT= '00014317-1111-1111-1111-111111111111:4' |
| mysql-bin.000001 | 776 | Query          |       101 |         844 | BEGIN                                                             |
| mysql-bin.000001 | 844 | Table_map      |       101 |         890 | table_id: 108 (test.t1)                                           |
| mysql-bin.000001 | 890 | Write_rows     |       101 |         934 | table_id: 108 flags: STMT_END_F                                   |
| mysql-bin.000001 | 934 | Xid            |       101 |         965 | COMMIT /* xid=72 */                                               |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
13 rows in set (0.00 sec)

The binary log contains only the data produced in this node.

The new topology: MGR

Turning to the new software, let's first check whether replication is working. An important note here: SHOW SLAVE STATUS is mostly unavailable in MGR. The channel architecture used for multi-source replication has been repurposed to convey information about group problems: if something goes wrong during the setup, you will find the information in the group_replication_recovery channel.

node1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS for channel 'group_replication_recovery';
Empty set (0.00 sec)

When things are fine, the tables in performance_schema report a satisfactory status:

node1 [localhost] {msandbox} (performance_schema) > select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 00014418-1111-1111-1111-111111111111 | gmini       |       14418 | ONLINE       |
| group_replication_applier | 00014419-2222-2222-2222-222222222222 | gmini       |       14419 | ONLINE       |
| group_replication_applier | 00014420-3333-3333-3333-333333333333 | gmini       |       14420 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

The above command tells us that all nodes are online.
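The same check can be run on all nodes at once from the shell; a small sketch using the sandbox helpers:

$ $HOME/sandboxes/GR/use_all 'select member_host, member_port, member_state from performance_schema.replication_group_members'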

Next, we ask for the stats of the current member.

node1 [localhost] {msandbox} (performance_schema) > select * from replication_group_member_stats\G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 14850806532423012:3
                         MEMBER_ID: 00014418-1111-1111-1111-111111111111
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 12
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
    LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006
1 row in set (0.00 sec)

The same operation from a different member will give a very similar result.

node2 [localhost] {msandbox} (performance_schema) > select * from replication_group_member_stats\G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 14850806532423012:3
                         MEMBER_ID: 00014419-2222-2222-2222-222222222222
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 12
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
    LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006
1 row in set (0.00 sec)

Then, we check the more classical replication status:

node1 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: group_replication_applier
               GROUP_NAME: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
              SOURCE_UUID: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee    # ----
                THREAD_ID: NULL
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
 LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
 RECEIVED_TRANSACTION_SET: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-7:1000003-1000006:2000003-2000006
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
1 row in set (0.00 sec)

There are a few things that strike the observer immediately:

  • As we saw in the single-primary topology, all transactions bear the UUID of the group, not of the server that generated them. While in single-primary mode this could be considered an asset, as it simplifies a failover procedure, in multi-primary mode I consider it to be a loss. We lose the knowledge of the transaction's provenance. As you can see, the SOURCE_UUID field shows the group ID instead of the node's.
  • The GTID numbers look odd. There is a set that starts at 1, another set that starts at 1 million, and a third one that starts at 2 million. What's going on? The answer is in the value of group_replication_gtid_assignment_block_size, which determines the block of values for each node (the setting can be verified with the query shown after this list). When the values in the block are exhausted, the node allocates another block. Someone could naively think that we could use this block to identify which node the data comes from, but this would be ultimately wrong for two reasons:
    • The blocks are assigned on a first-come-first-served basis. If we start operations in node 2, its transactions will bear the lowest numbers.
    • When the blocks are exhausted, the node starts a new block, meaning that with a busy cluster we will have a hard time identifying which node uses which block.
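The block size is easy to verify; the documented default of 1,000,000 matches the sets shown above:

$ $HOME/sandboxes/GR/use_all 'select @@group_replication_gtid_assignment_block_size'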

If someone thought that we could get some more information from the replication tables in mysql, they are in for a disappointment:

node2 [localhost] {msandbox} (mysql) > select * from slave_master_info\G
*************************** 1. row ***************************
       Number_of_lines: 25
       Master_log_name:
        Master_log_pos: 4
                  Host: <NULL>            # ----
             User_name:
         User_password:
                  Port: 0                 # ----
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid:                   # ----
           Retry_count: 86400
 Enabled_auto_position: 1
          Channel_name: group_replication_applier
           Tls_version:
*************************** 2. row ***************************
       Number_of_lines: 25
       Master_log_name:
        Master_log_pos: 4
                  Host: <NULL>
             User_name: rsandbox
         User_password: rsandbox
                  Port: 0
         Connect_retry: 60
           Enabled_ssl: 0
Ssl_verify_server_cert: 0
             Heartbeat: 30
                  Bind:
    Ignored_server_ids: 0
                  Uuid:
           Retry_count: 1
 Enabled_auto_position: 1
          Channel_name: group_replication_recovery
           Tls_version:
2 rows in set (0.00 sec)

The table shows group operations rather than individual host connections. There is no origin information here.

Looking at the events, we will notice immediately some more differences.

node2 [localhost] {msandbox} (mysql) > show binlog events;
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
| Log_name         | Pos  | Event_type     | Server_id | End_log_pos | Info                                                                    |
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
| mysql-bin.000001 |    4 | Format_desc    |       102 |         123 | Server ver: 5.7.17-log, Binlog ver: 4                                   |
| mysql-bin.000001 |  123 | Previous_gtids |       102 |         150 |                                                                         |
| mysql-bin.000001 |  150 | Gtid           |       101 |         211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1'       |
| mysql-bin.000001 |  211 | Query          |       101 |         270 | BEGIN                                                                   |
| mysql-bin.000001 |  270 | View_change    |       101 |         369 | view_id=14850806532423012:1                                             |
| mysql-bin.000001 |  369 | Query          |       101 |         434 | COMMIT                                                                  |
| mysql-bin.000001 |  434 | Gtid           |       101 |         495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2'       |
| mysql-bin.000001 |  495 | Query          |       101 |         554 | BEGIN                                                                   |
| mysql-bin.000001 |  554 | View_change    |       101 |         693 | view_id=14850806532423012:2                                             |
| mysql-bin.000001 |  693 | Query          |       101 |         758 | COMMIT                                                                  |
| mysql-bin.000001 |  758 | Gtid           |       102 |         819 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:3'       |
| mysql-bin.000001 |  819 | Query          |       102 |         878 | BEGIN                                                                   |
| mysql-bin.000001 |  878 | View_change    |       102 |        1017 | view_id=14850806532423012:3                                             |
| mysql-bin.000001 | 1017 | Query          |       102 |        1082 | COMMIT                                                                  |
| mysql-bin.000001 | 1082 | Gtid           |       101 |        1143 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:4'       |
| mysql-bin.000001 | 1143 | Query          |       101 |        1250 | create schema if not exists test                                        |
| mysql-bin.000001 | 1250 | Gtid           |       101 |        1311 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:5'       |
| mysql-bin.000001 | 1311 | Query          |       101 |        1440 | DROP TABLE IF EXISTS `test`.`t1` /* generated by server */              |
| mysql-bin.000001 | 1440 | Gtid           |       101 |        1501 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:6'       |
| mysql-bin.000001 | 1501 | Query          |       101 |        1630 | create table test.t1(id int not null primary key, sid int)              |
| mysql-bin.000001 | 1630 | Gtid           |       101 |        1691 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:7'       |
| mysql-bin.000001 | 1691 | Query          |       101 |        1755 | BEGIN                                                                   |
| mysql-bin.000001 | 1755 | Table_map      |       101 |        1797 | table_id: 219 (test.t1)                                                 |
| mysql-bin.000001 | 1797 | Write_rows     |       101 |        1837 | table_id: 219 flags: STMT_END_F                                         |
| mysql-bin.000001 | 1837 | Xid            |       101 |        1864 | COMMIT /* xid=51 */                                                     |
| mysql-bin.000001 | 1864 | Gtid           |       102 |        1925 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000003' |
| mysql-bin.000001 | 1925 | Query          |       102 |        2032 | create schema if not exists test                                        |
| mysql-bin.000001 | 2032 | Gtid           |       102 |        2093 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000004' |
| mysql-bin.000001 | 2093 | Query          |       102 |        2222 | DROP TABLE IF EXISTS `test`.`t2` /* generated by server */              |
| mysql-bin.000001 | 2222 | Gtid           |       102 |        2283 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000005' |
| mysql-bin.000001 | 2283 | Query          |       102 |        2412 | create table test.t2(id int not null primary key, sid int)              |
| mysql-bin.000001 | 2412 | Gtid           |       102 |        2473 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1000006' |
| mysql-bin.000001 | 2473 | Query          |       102 |        2542 | BEGIN                                                                   |
| mysql-bin.000001 | 2542 | Table_map      |       102 |        2584 | table_id: 220 (test.t2)                                                 |
| mysql-bin.000001 | 2584 | Write_rows     |       102 |        2624 | table_id: 220 flags: STMT_END_F                                         |
| mysql-bin.000001 | 2624 | Xid            |       102 |        2651 | COMMIT /* xid=62 */                                                     |
| mysql-bin.000001 | 2651 | Gtid           |       103 |        2712 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000003' |
| mysql-bin.000001 | 2712 | Query          |       103 |        2819 | create schema if not exists test                                        |
| mysql-bin.000001 | 2819 | Gtid           |       103 |        2880 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000004' |
| mysql-bin.000001 | 2880 | Query          |       103 |        3009 | DROP TABLE IF EXISTS `test`.`t3` /* generated by server */              |
| mysql-bin.000001 | 3009 | Gtid           |       103 |        3070 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000005' |
| mysql-bin.000001 | 3070 | Query          |       103 |        3199 | create table test.t3(id int not null primary key, sid int)              |
| mysql-bin.000001 | 3199 | Gtid           |       103 |        3260 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2000006' |
| mysql-bin.000001 | 3260 | Query          |       103 |        3324 | BEGIN                                                                   |
| mysql-bin.000001 | 3324 | Table_map      |       103 |        3366 | table_id: 221 (test.t3)                                                 |
| mysql-bin.000001 | 3366 | Write_rows     |       103 |        3406 | table_id: 221 flags: STMT_END_F                                         |
| mysql-bin.000001 | 3406 | Xid            |       103 |        3433 | COMMIT /* xid=68 */                                                     |
+------------------+------+----------------+-----------+-------------+-------------------------------------------------------------------------+
47 rows in set (0.00 sec)

Two important points:

  • All transaction IDs are assigned to the group, not to the node. The only way to see where the data is coming from is to look at the binary log itself and check the good old server-id. One wonders why we have come all this way with the ugly UUIDs in the global transaction identifier, only to maim their usefulness by removing one of their most important features: tracking the origin of the data.

For example:

# at 434
#170122 11:24:11 server id 101  end_log_pos 495         GTID    last_committed=1        sequence_number=2
SET @@SESSION.GTID_NEXT= 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:2'/*!*/;
# at 495
#170122 11:24:11 server id 101  end_log_pos 554         Query   thread_id=7     exec_time=6     error_code=0
SET TIMESTAMP=1485080651/*!*/;
BEGIN
/*!*/;
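The excerpt above comes straight from the binary log; something along these lines produces it (the file name is taken from the listing further down):

$ cd $HOME/sandboxes/GR/node2/data
$ mysqlbinlog mysql-bin.000001 | less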
  • Because log-slave-updates is mandatory, the binary log in every node will have all the transactions of every other node. This can have disagreeable side effects when dealing with large data sets. Here is an example, where we load the sample employee database from node #1 (one way of running the load is sketched below):
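A sketch of the load, assuming a local copy of the employees test database (https://github.com/datacharmer/test_db):

$ cd test_db
$ time $HOME/sandboxes/GR/n1 < employees.sql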

With Group Replication, the load takes 2 minutes and 16 seconds, and the binary logs have the same size in every node.

[GR]$ ls -lh node?/data/*bin*
-rw-r-----  1 gmax  staff   8.2K Jan 22 10:22 node1/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node1/data/mysql-bin.000002
-rw-r-----  1 gmax  staff    38B Jan 22 10:22 node1/data/mysql-bin.index

-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node2/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:12 node2/data/mysql-bin.index

-rw-r-----  1 gmax  staff    63M Jan 22 10:24 node3/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:12 node3/data/mysql-bin.index

The same operation in multi-source replication takes 1 minute and 30 seconds, and the bulky binary logs exist only in the origin.

[MS]$ ls -lh node?/data/*bin*
-rw-r-----  1 gmax  staff   4.9K Jan 22 10:26 node1/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    63M Jan 22 10:27 node1/data/mysql-bin.000002
-rw-r-----  1 gmax  staff    38B Jan 22 10:26 node1/data/mysql-bin.index

-rw-r-----  1 gmax  staff   1.4K Jan 22 10:14 node2/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:14 node2/data/mysql-bin.index

-rw-r-----  1 gmax  staff   1.4K Jan 22 10:14 node3/data/mysql-bin.000001
-rw-r-----  1 gmax  staff    19B Jan 22 10:14 node3/data/mysql-bin.index

Conflict resolution

One of the strong points of MGR is conflict resolution.

We can try a conflicting operation on two nodes, inserting the same data at the same time:

use test;
set autocommit=0;
insert into t2 values (3, @@server_id);
commit;

In multi-source, we get a replication error on both nodes. It's an ugly result, but it tells the user immediately that something went wrong in a given node, and doesn't let the error propagate to other nodes.
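A sketch of one way to see the error on each node, using the 5.7 replication tables:

$ cd $HOME/sandboxes/MS
$ ./use_all 'select channel_name, last_error_number, last_error_message from performance_schema.replication_applier_status_by_worker'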

In MGR, the situation varies. This is a possible outcome:

node1 [localhost] {msandbox} (test) > set autocommit=0;                        |   node2 [localhost] {msandbox} (test) > set autocommit=0;
Query OK, 0 rows affected (0.00 sec)                                           |   Query OK, 0 rows affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > insert into t2 values (3, @@server_id);  |   node2 [localhost] {msandbox} (test) > insert into t2 values (3, @@server_id);
Query OK, 1 row affected (0.00 sec)                                            |   Query OK, 1 row affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  102 |
+----+------+                                                                  |   +----+------+
2 rows in set (0.00 sec)                                                       |   2 rows in set (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > commit;                                  |   node2 [localhost] {msandbox} (test) > commit;
Query OK, 0 rows affected (0.01 sec)                                           |   ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.
                                                                               |   node2 [localhost] {msandbox} (test) > select * from t2;
 node1 [localhost] {msandbox} (test) > select * from t2;                       |   +----+------+
 +----+------+                                                                 |   | id | sid  |
 | id | sid  |                                                                 |   +----+------+
 +----+------+                                                                 |   |  2 |  102 |
 |  2 |  102 |                                                                 |   |  3 |  101 |
 |  3 |  101 |                                                                 |   +----+------+
 +----+------+                                                                 |   2 rows in set (0.00 sec)
 2 rows in set (0.00 sec)                                                      |

Here node #2 got the transaction a fraction of a second later, and its transaction was rolled back. Thus the transaction that was ultimately kept in the database was the one from node1 (server-id 101). However, this behavior is not predictable. If we try the same operation again, we get a different outcome:

node1 [localhost] {msandbox} (test) > insert into t2 values (4, @@server_id);  |   node2 [localhost] {msandbox} (test) > insert into t2 values (4, @@server_id);
Query OK, 1 row affected (0.00 sec)                                            |   Query OK, 1 row affected (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  101 |
|  4 |  101 |                                                                  |   |  4 |  102 |
+----+------+                                                                  |   +----+------+
3 rows in set (0.00 sec)                                                       |   3 rows in set (0.00 sec)
                                                                               |
node1 [localhost] {msandbox} (test) > commit;                                  |   node2 [localhost] {msandbox} (test) > commit;
Query OK, 0 rows affected (0.01 sec)                                           |
ERROR 3101 (HY000): Plugin instructed the server to rollback                   |
the current transaction.                                                       |
node1 [localhost] {msandbox} (test) > select * from t2;                        |   node2 [localhost] {msandbox} (test) > select * from t2;
+----+------+                                                                  |   +----+------+
| id | sid  |                                                                  |   | id | sid  |
+----+------+                                                                  |   +----+------+
|  2 |  102 |                                                                  |   |  2 |  102 |
|  3 |  101 |                                                                  |   |  3 |  101 |
|  4 |  102 |                                                                  |   |  4 |  102 |
+----+------+                                                                  |   +----+------+
4 rows in set (0.00 sec)                                                       |   3 rows in set (0.00 sec)

In the second attempt, the transaction was rolled back by node 1, and the surviving one is the one inserted from node 2. This means that conflict resolution works, but it may not be what the user wants, since the outcome of the resolution is arbitrary.

Summing up

On the plus side, MGR keeps what it promises. We can set up a cluster of peer nodes and replicate data between nodes with some advantages compared to older multi-source topologies.

On the minus side, the documentation could be vastly improved, especially for multi-primary setups. Moreover, users need to be aware of the limitations, such as the SERIALIZABLE isolation level and cascading foreign key constraints not being supported.

Most important from my standpoint is the reduction of monitoring information for this technology, namely the loss of information about the data origin.

Tuesday, October 11, 2016

OTN appreciation day: MySQL 8.0 data dictionary

About one month ago, the MySQL team at Oracle released the first milestone of MySQL 8.0, with a large list of changes. One of the most interesting features in the new release is also one that does not show up much, partly because the team has gone to great lengths to keep most of its implementation hidden: the data dictionary.


What makes the data dictionary so interesting, despite its scarce visibility, is the effect that it has on performance. Up to MySQL 5.7, searching the information_schema was an onerous operation, potentially crippling the system. In MySQL 8.0, the same operations are 100 times faster. This would be reason enough to be excited, as I know many users who have had terrible problems with the inefficiency of information_schema.

But there are several other visible changes: the various files that were needed to identify database objects (.frm for tables, .trg for triggers, .par for partitions) are now gone. When the database server wants to know metadata about such objects, it doesn't have to open files anymore. All the information is stored in the hidden InnoDB tables of the data dictionary. If the hidden tables bother you, you can use a not-so-difficult hack to discover them, although most operations can be performed without inner knowledge of how the dictionary is organized. The idea is that the public interface (the views in information_schema) should be considered reliable and used for future compatibility. Perhaps the hidden status will be lifted in one of the next releases: many in the community have given feedback in this direction.
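For example, a metadata query that used to require opening one .frm file per table is now served from the dictionary through the information_schema views (the schema name test is just an example):

mysql> select table_name, engine from information_schema.tables where table_schema='test';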

What else is the data dictionary good for? It allows atomic DDL operations, which were not guaranteed when data for the dictionary was stored in files and MyISAM tables. A nice complement to the data dictionary is the elimination of all MyISAM tables from the system. Now the grant privileges are stored in InnoDB tables, which, probably for historical compatibility, were left visible.

I am sure users will come up with other clever usages of the data dictionary when the system is wider adopted and understood. For now, it's a wonderful toy to explore!


This post was suggested by Tim Hall, a well-known community champion among Oracle users, who maintains a rich website of news and free technical info. Following his suggestion, this post is a contribution to the OTN appreciation day, a distributed community effort to show something useful, or pleasant, or both, related to the Oracle world.


For those not used to the Oracle Technology Network (OTN), it is the center of Oracle technology, the place where users can get all software (proprietary or open source) and other resources related to Oracle products. In the image below you may find several familiar names.

Monday, September 14, 2015

Improving Sakila database

The Sakila sample database was created almost 10 years ago, as a sample set of data for MySQL courses and examples.

The database was developed by MySQL employees, with substantial contributions from the community.

Recently, the database was updated to use some of the features in MySQL 5.7. As a result, we had two sets of samples: one to use with MySQL 5.0+, and one that only loads with MySQL 5.7.

I filed a feature request, offering a patch to use conditional schema and data changes, which was incorporated very quickly into the official release.

The current release, available within the MySQL docs, has conditional comments such as this:

/*!50610 ALTER TABLE film_text engine=InnoDB */ ;

Using these comments, we can enable specific features if the version is at least the one indicated in the comment. So, for example, we can use InnoDB tables with full-text indexes starting with version 5.6. The original table is MyISAM, but if the current version is at least 5.6.10 (that's the meaning of !50610) then the engine is changed to InnoDB.

Similarly, there is a GEOMETRY column and a SPATIAL key in the 'address' table, which are only enabled for MySQL 5.7.5+. A similar comment allows the loading of the relevant data only in MySQL 5.7.
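As a hypothetical illustration of the same mechanism (not the literal line from the file), a 5.7.5-only spatial index could be written like this, where !50705 stands for version 5.7.5:

/*!50705 ALTER TABLE address ADD SPATIAL KEY idx_location (location) */ ;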

Using these new files, you can install the Sakila database using any version of MySQL from 5.0 onwards, and it will always load correctly.

Tuesday, March 10, 2015

MySQL 5.7.6 is out. Be prepared for big changes



Today Oracle released MySQL 5.7.6 milestone 16. With this, MySQL 5.7 has been in development for over 2 years.
Compared to MySQL 5.6, the changes are quite extensive. The main effort of the team has been focused on speed, with performance reportedly improved by 2 to 3 times compared to previous releases.
A full list of what is new would take too much space here, but I would like to mention some key points:


  • Oracle has spent a considerable amount of energy on the improvement of MySQL security and safety. You will see many new features, but even more old features that have been deprecated, and others that have been removed after deprecation in 5.6.
  • The installation process has been changing in every milestone release of MySQL 5.7, always towards the goal of making MySQL more robust. The effort, though, will break compatibility with installation practices used for previous versions.

In this article, I will cover the biggest changes during installation.
In MySQL 5.6, mysql_install_db has an option for the generation of a random password during database creation. The process is unfriendly to scripting, but it is a step towards ending the long-time practice of creating a root user without a password.
In MySQL 5.7.4, there was a further change: random password generation became the default, with the possibility of skipping it with the option --skip-random-password.
In MySQL 5.7.5, the default was confirmed, but the option was changed to --insecure.


And now, in MySQL 5.7.6, the crackdown on old practices continues: mysql_install_db is deprecated, and replaced with mysqld --initialize (formerly known as “mysqld --bootstrap,” now deprecated).
Here’s a test run:


$ ~/opt/mysql/5.7.6/bin/mysqld --no-defaults --basedir=~/opt/mysql/5.7.6/ --datadir=$PWD/data --initialize
2015-03-09T05:06:37.159659Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-03-09T05:06:37.355155Z 0 [Warning] InnoDB: New log files created, LSN=45790
2015-03-09T05:06:37.410118Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2015-03-09T05:06:37.467002Z 0 [Warning] Failed to setup SSL
2015-03-09T05:06:37.467029Z 0 [Warning] SSL error: SSL context is not usable without certificate and private key
2015-03-09T05:06:37.468142Z 1 [Warning] A temporary password is generated for root@localhost: f<jqhdJ(A5p#

Compared to the previous versions, the most notable difference is that there is no .mysql_secret file, but a simple line on the screen mentioning the temporary password.
But there is one more important behavioral difference: this command works only once. With mysql_install_db, you could run the same command even if the data directory existed, and the script would re-apply the data creation commands. With mysqld --initialize, you can run it only on a non-existing data directory.


$ ~/opt/mysql/5.7.6/bin/mysqld --no-defaults --basedir=~/opt/mysql/5.7.6/ --datadir=$PWD/data --initialize
2015-03-09T05:49:12.504413Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-03-09T05:49:12.505398Z 0 [ERROR] --initialize specified but the data directory exists. Aborting.
2015-03-09T05:49:12.505422Z 0 [ERROR] Aborting

Using the newly created database is a bit trickier than before:


$ ~/opt/mysql/5.7.6/bin/mysql --no-defaults   -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.6-m16

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use mysql
ERROR 1820 (HY000): You must SET PASSWORD before executing this statement
mysql> set password=password('test');
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'password('test')' at line 1

Uh? What’s this? This command used to work until recently. The reason is a change in the syntax of SET PASSWORD, which now accepts a plain text argument:


mysql> set password='test';
Query OK, 0 rows affected (0.00 sec)

The old syntax was meant to be only deprecated, but it was accidentally removed altogether. This will hopefully be fixed in MySQL 5.7.7.


More changes involve the use of GRANT, REVOKE, CREATE USER, and ALTER USER, which are now stricter. You get warnings if you try to create users with the GRANT command, or when mixing the granting of privileges with authentication options.


mysql> grant all on test.* to testuser identified by 'test';
Query OK, 0 rows affected, 1 warning (0.00 sec)

Warning (Code 1287): Using GRANT for creating new user is deprecated and will be removed in future release. Create new user with CREATE USER statement.

mysql> grant all on *.* to testuser identified by 'test';

Query OK, 0 rows affected, 1 warning (0.00 sec)


Warning (Code 1287): Using GRANT statement to modify existing user's properties other than privileges is deprecated and will be removed in future release. Use ALTER USER statement for this operation.

In short, if you have automated scripts that deal with installing and administering MySQL, you should test them with warnings enabled, and be prepared for broken compatibility with your old practices.
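For instance, the mysql client can surface those warnings directly while replaying an administration script (the script name here is hypothetical):

$ mysql --show-warnings -u root -p < create_app_users.sql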


One such ‘old practice’ script that is broken by the new syntax changes is MySQL-Sandbox. I have just released an updated version (MySQL Sandbox 3.0.48) with a workaround for MySQL 5.7.6’s changed SET PASSWORD syntax.


BTW, did I mention that the password field in the mysql.user table was removed? That’s another surprise that may break quite a lot of existing tests:


mysql> use mysql
Database changed
mysql> select host,user,password from user;
ERROR 1054 (42S22): Unknown column 'password' in 'field list'

mysql> select host, user, authentication_string from user;
+-----------+-------------+-------------------------------------------+
| host      | user        | authentication_string                     |
+-----------+-------------+-------------------------------------------+
| localhost | root        | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | rsandbox    | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |
| %         | testuser    | *94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29 |
+-----------+-------------+-------------------------------------------+
9 rows in set (0.00 sec)

Thursday, September 26, 2013

Forking MySQL: for how long can forks keep up?

  • Fact: MySQL 5.6 was released as GA in February 2013
  • Fact: MySQL 5.6 has been available with its complete set of features since September 2012
  • Fact: On September 21st, Oracle released MySQL 5.7.2, which is the de facto major release after MySQL 5.6 (5.7.1 was just a token “we’re-still-in-business” release).
  • Fact: As of today, there is no GA-ready fork of MySQL 5.6.

Percona Server is still in RC state, while MariaDB, with its runaway version 10, is still in alpha state. Of these releases, Percona Server seems the one in better shape. Their problem was to adapt Percona Server to the enhanced codebase of 5.6, and the merging problems were bigger than the ones encountered in 5.1 and 5.5. Percona Server is a business-oriented fork. It includes specific features for performance and manageability, things that users are asking for, and things that make Percona's professional services easier to carry out.

Much different is the case of MariaDB. Its feature set is wider and hard to categorize globally. It includes some features that enhance performance (group commit, subquery optimization) but also plenty of features that were included only because nobody else wanted to touch them. In my experience, adding features to a project does not make it more stable, as the MySQL team knows well from its disastrous experience with version 6.0, which was aborted after two years of alpha. MariaDB is in a state where they are adding lots of stuff to their code base, while merging selected pieces from MySQL 5.6. It could be a recipe for success or disaster. User experience will tell.

Up to MySQL 5.5, MariaDB has kept itself completely compatible with MySQL, with the goal of being a drop-in replacement. With the capriciously numbered version 10, that promise of compatibility has been broken. Most of the features introduced in MySQL 5.6 have not been included in MariaDB 10. Some of them have been rewritten with a completely different approach. As a result, MariaDB 10 lacks some of MySQL 5.6's features (parallel replication, for example) while showing some that are not in the upstream project (multi-source replication). The design of its GTID was started from scratch, and will surely make interoperability between the two projects really hard (if not downright impossible).

Looking at the status of the projects, it seems that the MySQL team at Oracle has produced so many changes that keeping up with them, even in terms of simple patch merging, is a heavy task. Where does this leave us, the community users? We have the choice between using the latest and greatest from Oracle, or waiting for the next latest and greatest from one of the forks. Percona Server seems to be on the final path of a less-than-easy catch-up, but MariaDB is farther away. And even if it releases a GA soon, it will find itself in a situation where merging changes from Oracle is going to be tougher and tougher. It may mean that performance enhancements from mainstream MySQL will take months (or years) to reach MariaDB. Users willing to benefit from the latest improvements will choose between Oracle MySQL and (with some delay) Percona Server. MariaDB will need to either do some sort of cherry-picking from Oracle releases or implement its own improvements. Can it compete with the 200-strong engineering team at Oracle? It's going to be a tough proposition.

Given the above facts, how is it that we hear of big players like Red Hat and Google announcing that they will adopt MariaDB instead of Oracle's MySQL? Since I could not find a sensible technical reason, I am forced to conclude that the choice has been political. Both Red Hat and Google have been entangled with Oracle in commercial competition, and sometimes in court. Yet, choosing software for non-technical reasons looks foolish to me.

Summing up: Oracle is releasing a steady stream of improvements for MySQL. This shows a high level of commitment that does not square with what the doomsayers have been announcing for the past 4 years. So far, the facts are contradicting that FUD. Forks are lagging behind, some more than others.

Disclaimer: The above thoughts are my own, and don’t reflect the opinions of my employer or other companies.

Tuesday, September 24, 2013

MySQL 5.7.2: Good job Oracle! (Well, almost)

On September 21st, during the opening keynote at MySQL Connect 2013, Tomas Ulin disclosed the release of MySQL 5.7.2. Unlike the previous one, which was mostly a point of pride, stating Oracle's continuous commitment to releasing new versions of MySQL, this milestone release includes several new features:

  • First and foremost, performance. The announcement slides say MySQL 5.7.2 is 95% faster than MySQL 5.6 and 172% faster than MySQL 5.5. I don’t know yet in which circumstances these numbers hold true, but I am sure someone at Percona will soon prove or disprove the claim.
  • Performance Schema tables for several aspects:
    • Memory usage
    • Stored programs and Events pre-filtering
    • Replication tables. You can get all the information shown by SHOW SLAVE STATUS, and then some, in nicely organized tables. Regrettably, there is no corresponding table for SHOW MASTER STATUS. This is a feature I asked for a long time ago (7 1/2 years, to be precise), and I had great expectations. Now the feature arrives, sort of crippled: without a table for the contents of SHOW MASTER STATUS, no complete monitoring system can be built.
  • There is now a different method for parallel replication. Instead of splitting the threads by schema, parallelization can be driven by a logical clock, which uses information stored by the master to tell the slaves when it is safe to apply transactions in parallel.
  • More flexibility with Semi-synchronous replication
  • You can now have more than one trigger per table for each action. The CREATE TRIGGER syntax now includes a trigger_order clause (FOLLOWS or PRECEDES) that lets you define the sequence in which your triggers will run.
  • EXPLAIN FOR CONNECTION lets you run an EXPLAIN for a query that is already running in another connection (a short example of these last features follows this list).
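
To make the last three items more concrete, here is a minimal sketch; table names, trigger names, and the connection ID are hypothetical:

-- Two BEFORE INSERT triggers on the same table, explicitly ordered:
CREATE TABLE t1 (id INT, msg VARCHAR(50));
CREATE TABLE audit_log (id INT, noted_at DATETIME);

CREATE TRIGGER t1_bi_1 BEFORE INSERT ON t1
FOR EACH ROW SET NEW.msg = UPPER(NEW.msg);

CREATE TRIGGER t1_bi_2 BEFORE INSERT ON t1
FOR EACH ROW FOLLOWS t1_bi_1
INSERT INTO audit_log VALUES (NEW.id, NOW());

-- Parallel replication driven by the logical clock (run on a slave):
STOP SLAVE;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 4;
START SLAVE;

-- EXPLAIN for a statement running in another connection;
-- 42 stands for the ID found in the process list:
SELECT id, info FROM information_schema.processlist WHERE info IS NOT NULL;
EXPLAIN FOR CONNECTION 42;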

There are more new features and improvements, all listed in the Release Notes and in What's New in MySQL 5.7. Two minor changes did not make the news, and yet they mean a lot for the end user experience:

  • The verbosity of the bootstrap, which used to fill the screen (and the error log) with useless trivia that made users lose track of the real errors, has been greatly reduced. Now you only see warnings and errors. Thank you, MySQL team!
  • The test suite now uses InnoDB as the default storage engine. This means that many possible errors that could have been overlooked before will now be caught by the test suite. The next step would be to use InnoDB for the grant tables, but there does not seem to be a set goal for this yet.

What else is there? The MySQL Labs now list several experimental features for MySQL 5.7.2. These are features that did not make the cut for the milestone and are still in a less reliable state. Most notable among them is a build that offers multi-source replication. I need to test it more before I can hazard a deeper judgment. For now, I am impressed by the amount of work produced.

Some disappointment comes from the realization that many of the shortcomings found in MySQL 5.6 are still there. Some of them have been reduced or eliminated by the new performance schema tables, but the bulk of what I noticed in MySQL 5.6 remains, with the same limitations. I will come back to this topic with a deeper analysis of the new features.

Wednesday, August 21, 2013

Tungsten-Replicator 2.1.1 with better installation and built-in security


UPDATE 2013-08-30: Tungsten 2.1.2 was released.

UPDATE 2013-08-23: We have found a few problems that happen when replicating with RBR and temporal columns. We will have to publish an updated bugfix release quite soon.

Tungsten Replicator 2.1.1 is out. Key features in this release are:

  • A better installer, of which we have already given a preview in tpm, the multi-master composer. The new installer allows faster and more powerful deployments of both single and multiple-master topologies. It also enables the next feature:
  • Secured communication layer. Now the replicator data and administrative messages can be encrypted with SSL across nodes. The security layer, once installed, is transparent. All replication features will keep working as before, and the encryption is independent of the database. In fact, heterogeneous replication (e.g. MySQL to MongoDB, Oracle to MySQL, etc.) can use it just as easily as MySQL to MySQL replication.
  • Full support for MySQL 5.6 binary log improvements. Now you can have the best of both worlds: MySQL 5.6 enhanced performance and Tungsten advanced replication features, without compromises. Due to this improvement, we also have the first change in our transport layer (the Transaction History Log, or THL) since we released parallel replication. This means that a full cluster upgrade is needed (first the slaves, then the masters) if you want to use the new release.

For more information on Tungsten Replicator 2.1.1, see the Release notes.

What does this mean for the common user? Let’s see what you can expect when installing Tungsten Replicator 2.1.1:

$ tar -xzf tungsten-replicator-2.1.1-230.tar.gz
$ cd tungsten-replicator-2.1.1-230
$ export VERBOSE=1
$ ./cookbook/install_master_slave
## -------------------------------------------------------------------------------------
## Installation with deprecated method will resume in 30 seconds - Hit CTRL+C now to abort
## -------------------------------------------------------------------------------------
## WARNING: INSTALLATION WITH tungsten-installer and configure-service IS DEPRECATED
## Future versions of Tungsten Cookbook will only support tpm-based installations
## To install with tpm, please set the variable 'USE_TPM' and start again
## -------------------------------------------------------------------------------------
....5....^C

Installation with tungsten-installer, which has been used until now, is still available, but it is deprecated. We want to encourage everyone to use tpm, as we will stop supporting tungsten-installer from the next release (2.1.2).

The main reason for using tpm instead of tungsten-installer is that you can now install with security. The Tungsten manual has an extensive section on how to create security certificates. If you are not used to this kind of task, you may get discouraged from the very beginning, as you need to create two key stores, one encrypted password store, and one file with JMX access rules. Tungsten Cookbook to the rescue! It is enough to state your intention to install using tpm with security enabled, and the cookbook script will generate the needed files for you.

$ export USE_TPM=1
$ export WITH_SECURITY=1
$ ./cookbook/install_master_slave
Certificate stored in file </home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/client.cer>
Certificate was added to keystore
[Storing /home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/truststore.ts]
Using parameters:
-----------------
password_file.location   = /home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/passwords.store
encrypted.password   = true
truststore.location      = /home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/truststore.ts
truststore.password      = cookbookpass
-----------------
Creating non existing file: /home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/passwords.store
User created successfuly: cookbook
Using parameters:
-----------------
password_file.location   = /home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/passwords.store
encrypted.password   = true
truststore.location      = /home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/truststore.ts
truststore.password      = cookbookpass
-----------------
User created successfuly: cookbook
# ---------------------------------------------------------------------
# Options for tpm
\
--thl-ssl=true \
--rmi-ssl=true \
--rmi-authentication=true \
--rmi-user=cookbook \
--java-keystore-password=cookbookpass \
--java-truststore-password=cookbookpass \
--java-truststore-path=/home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/truststore.ts \
--java-keystore-path=/home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/keystore.jks \
--java-jmxremote-access-path=/home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/jmxremote.access \
--java-passwordstore-path=/home/tungsten/tinstall/tungsten-replicator-2.1.1-230/cookbook/passwords.store
# ---------------------------------------------------------------------

Next, you will see the complete installation command using tpm, and the cluster will be built as smoothly as it would be without the security additions.

Notice that the paths that you see on the screen are created dynamically. Once installed, the security files will be deployed in a standard location, which will be easily picked up when you need to upgrade.

What you will notice about the secure deployment amounts to a few small differences. When using the cookbook tools, you will see an ssl label next to each secured node:

$ ./cookbook/show_cluster
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1 (ssl)
cookbook  [master]  seqno:          0  - latency:   0.681 - ONLINE
# node host2 (ssl)
cookbook  [slave]   seqno:          0  - latency:   1.397 - ONLINE
# node host3 (ssl)
cookbook  [slave]   seqno:          0  - latency:   1.683 - ONLINE
# node host4 (ssl)
cookbook  [slave]   seqno:          0  - latency:   1.684 - ONLINE

When using the traditional tools, you will notice one tiny difference in the master URI:

Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000008:0000000000000427;0
appliedLastSeqno       : 0
appliedLatency         : 0.681
channels               : 1
clusterName            : cookbook
currentEventId         : mysql-bin.000008:0000000000000427
currentTimeMillis      : 1377091602039
dataServerHost         : host1
extensions             :
latestEpochNumber      : 0
masterConnectUri       : thls://localhost:/    
masterListenUri        : thls://host1:2112/    
maximumStoredSeqNo     : 0
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : /var/lib/mysql
relativeLatency        : 656.039
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : cookbook
serviceType            : local
simpleServiceName      : cookbook
siteName               : default
sourceId               : host1
state                  : ONLINE
timeInStateSeconds     : 655.552
transitioningTo        :
uptimeSeconds          : 656.431
version                : Tungsten Replicator 2.1.1 build 230
Finished status command...

Instead of thl:// you see thls://. That’s the indication that the replicators are communicating over an SSL channel.

The same procedure works for multi-master and heterogeneous topologies. In fact, the very same mechanism is used in our commercial product, Continuent Tungsten, where it is installed using the same tools and the same tpm options.

For existing deployments we have a manual page dedicated to Upgrading from tungsten-installer to tpm-based installation. If you are a cookbook user, try

./cookbook/upgrade

There is a live webinar covering many Tungsten-Replicator 2.1.1 features. It is free, on Thursday, August 22nd, at 10am PT.


Friday, June 14, 2013

Welcome Tungsten Replicator 2.1.0!


Overview


First off, the important news. Tungsten Replicator 2.1.0 was released today.
You can download it and give it a try right now.


Second, I would say that I am quite surprised at how much we have done in this release. The previous release (2.0.7) came out in February, just a few months ago, and yet it feels like ages when I see the list of improvements, new features, and bug fixes in the Release Notes. I did not realize it until I ran my last batch of checks to test the upgrade from the previous release, which I hadn’t run for quite a long time. It’s like watching a son grow in front of your eyes day by day: you don’t realize he’s grown a full foot until a distant relative comes to visit. The same happened to me here. I looked at the ./cookbook directory in 2.0.7 and saw just a handful of commands (most of them now deprecated), and then at 2.1.0, which has about 30 new commands, all nicely categorized and advertised in the embedded documentation. If you are starting today with Tungsten Replicator 2.1.0, you can run


./cookbook/readme

and

./cookbook/help

Upgrade


If you were using Tungsten Replicator before, you need to know how to upgrade. If, by any unfortunate chance, you were not using the Cookbook recipes to run the installation, the method for upgrading is the following:

  • unpack the tarball in a staging directory
  • For each node in your deployment:
    • stop the replicator
    • run
      ./tools/update --release-directory=$PATH_TO_DEPLOYED_TUNGSTEN --host=$NODE
  • If your node has more than one service, restart the replicator


If you are using the cookbook, you can run an upgrade using

./cookbook/upgrade

This command will ask for your current topology and then show all the commands that you should run to perform the upgrade, including adapting the cookbook scripts to the new deployment.

So, What’s New:

The list of goodies is long. All the gory details are in the Release Notes. Here I would like to mention the ones that have impressed me most.

Oracle Extractor Is Open Source

Up to the previous release, you could extract from MySQL and apply to Oracle, all using open source tools, but if you wanted to extract from Oracle, you needed a commercial license. Now the whole replication layer is completely open source. You can replicate from and to Oracle using Tungsten Replicator 2.1.0 under the terms of the GPL v2. However, you will still have to buy database licenses from Oracle!

Installation and Administration

There is a long list of utilities released inside the ./cookbook directory, which will help you install and maintain the cluster with a few keystrokes. See References #2 and #3 below. The thing that you should try right away is:

 # edit ./cookbook/COMMON_NODES.sh
 # edit ./cookbook/USER_VALUES.sh
 ./cookbook/validate_cluster

This will tell you if your servers are ready for deployment, without actually deploying anything.

Documentation!

We have hired a stellar professional writer (my former colleague at MySQL AB, the well-known book author MC Brown), and the result is that our well-intentioned but rather unfocused documentation is now shaping up nicely. Among all the things that got explained, Tungsten Replicator has its own getting started section.

Metadata!

Tungsten replication tools now give information using JSON. Here’s a list of commands to try:

trepctl status -json
trepctl services -json -full
trepctl properties | less
thl list -headers -high 100 [-json]

For example:

$ trepctl services -json
[
  {
    "appliedLatency": "0.81",
    "state": "ONLINE",
    "role": "slave",
    "appliedLastSeqno": "1",
    "started": "true",
    "serviceType": "local",
    "serviceName": "cookbook"
  }
]

$ trepctl properties -filter replicator.service.comments
{
  "replicator.service.comments": "false"
}

More Tools

My colleague Linas Virbalas has made the team (and several customers) happy by creating two new tools:

  • ddlscan, a Utility to Help Analyze and Migrate Database Schemas;
  • the rename filter, a supercharged filter that can rename almost any object in a relational database, from schemas down to columns.

Linas also coded the above-mentioned JSON-based improvements.

MongoDB Installation

It was improved and tested more thoroughly. It’s a pleasure to see data from a relational database flow into a rival NoSQL repository as if it belonged there! See reference #4 below.

More to Come

What’s listed here is what we have tested and documented. But software development is not a linear process. There is much more boiling in the cauldron, ready to be mixed into the soup of release 2.1.1.

We’re working hard at making filters better. You will soon see the long-awaited documentation for them, and a simplified interface.

Another thing that I have tested, and which worked surprisingly well, is the creation of Change Data Capture for MySQL. This is a feature usually requested by Oracle users, but I tried it with MySQL, and it allowed me to create shadow tables with the audit trail of their changes. I will write about it as soon as we smooth a few rough edges.

Scripting! This is going to be huge. Much of it is already available in the source, but not fully documented or integrated yet. What you will soon see in the open is a series of Ruby libraries (the same ones used by the very sophisticated Tungsten installation tools) exposed for general usage by testers and tool creators. While the main focus of these libraries is the commercial tools, a significant portion of the work needs to end up in the replicator, and as a result its usability will increase.

What else? I may have forgotten something important amid all the excitement. If so, I will amend in my next articles. Happy hacking!

References

  1. Tungsten Replicator documentation
  2. Installing and Administering Tungsten Replicator - Part 1 - basics
  3. Installing and administering Tungsten Replicator - Part 2 : advanced
  4. Getting started with replication from MySQL to MongoDB

Tuesday, March 12, 2013

Sessions at Percona Live MySQL Conference 2013: fun, competition, novelties, and a free pass

Percona Live MySQL Conference and Expo, April 22-25, 2013

The Percona Live MySQL Conference and Expo 2013 is almost one month away. It's time to start planning, set the expectations, and decide what to attend. This post gives a roundup of some of the sessions that I recommend and look forward to attending.

First, the unexpected!

After much talk and disbelief, here they come! Oracle engineers will participate in the Percona Live conference. This is wonderful! Their participation was requested by the organizers, the attendees, and community advocates, who all told Oracle management how important it is to be at this conference. Finally, they have agreed to come along, and here they are, with one keynote and three general sessions.

My talks

I will be a speaker at the conference, and thus it's no surprise that I will recommend my talks.

My company's talks

Continuent is very active at many conferences, and at this one we are participating massively. I know I may look partial in this matter, but I am really proud of the products that we create and maintain at my company. That's why I highly recommend these talks.

Competing with whom?

MySQL is a standard, and widely popular. Yet, it has shortcomings and weak points, which allow for alternative solutions to flourish. There are many sessions that offer alternatives to the vanilla software.

  • [Tue 1:20pm] MariaDB Cassandra Interoperability. MariaDB is a magnetic fork of MySQL: magnetic in the sense that it attracts most of the features and enhancements that nobody else wanted to accept. While some of its features may look like a whim (and some have been discontinued already), there are some that look more interesting than others. This integration with Cassandra deserves some exploration.
  • [Tue 3:50pm] MySQL Cluster - When to use it and when not to. The classic MySQL Cluster. Some believe that it's a drop-in replacement for a single server. It's not. It's a powerful solution, but it is not fit for all.
  • [Wed 11:10am] Fine Tuning Percona XtraBackup to your workload. This tool has become a de-facto standard. It is available everywhere, easy to use, and powerful. A great tale of an alternative tool that became the standard.
  • [Thu 9:00am] MySQL, YourSQL, NoSQL, NewSQL - the state of the MySQL ecosystem. While all the keynotes are worth attending, this one is special. If you want to understand the MySQL world, Matt Aslett can draw quite a useful map for you.

New and renewed technologies

There are many interesting talks about new things, or old technologies with a new twist.

Tales from the trenches

Win a free pass

Percona is offering free passes for community participation. One of them is available to readers of this blog, and I will be the judge.

To get a free pass, do the following:

  1. Blog, tweet, or post on another public media about this conference;
  2. Leave a comment here, with a link to your post;
  3. The free pass will be given to the most useful or pleasant post;
  4. Make sure there is a way to reach you by email or Twitter;
Please notice:
  • I will award the free pass to the post that I like most. The adjudication will be entirely subjective.
  • Deadline: March 20th, 2013.

Thursday, February 14, 2013

Provisioning an Oracle slave using Tungsten and MySQL Sandbox

A few years ago, I used MySQL Sandbox to filter binary logs. While that was a theoretical exercise, I recently came across a very practical case: provisioning an Oracle database that is the designated slave of a MySQL master.

In this particular case, we needed to provision the Oracle slave with some auditing features in place. Therefore, mass load functions were not considered. What we needed was the contents of the MySQL database as binary rows, i.e. the same format used for row-based replication.

To achieve the purpose as quickly and as painlessly as possible, I thought of employing the services of a MySQL Sandbox. The flow of information would be like this:

  1. Install a sandbox, using the same version and character set as the master;
  2. Set the sandbox default binary log format to ROW;
  3. Take a backup with mysqldump (with no data);
  4. Filter the CREATE TABLE statements, replacing InnoDB with BlackHole (see the sketch after this list). This pass is important to speed up insert operations and to minimize storage needs: with this trick, we only needed storage for the binary logs, not for the data itself;
  5. Load the altered schema into the sandbox;
  6. Take a full data dump without creation info;
  7. Load the data into the sandbox;
  8. Start replicating from the sandbox to the Oracle slave (using Tungsten Replicator);
  9. When the provisioning is over, point the replicator to the original master, at the point where we finished dumping data;
  10. Start replicating live from MySQL to Oracle.
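
As an illustration of steps 1 to 7, here is a minimal sketch; version, host, credentials, and database name are hypothetical, and the sed pattern assumes that mysqldump writes ENGINE=InnoDB in its CREATE TABLE statements:

# Steps 1 and 2: a sandbox with ROW binary log format
make_sandbox 5.5.30 -- -c binlog-format=ROW -c log-bin=mysql-bin

# Steps 3 and 4: schema-only dump, with engines switched to BlackHole
mysqldump -h master_host -u dumper -p --no-data mydb \
  | sed -e 's/ENGINE=InnoDB/ENGINE=BLACKHOLE/g' > schema_blackhole.sql

# Steps 5 to 7: load the altered schema, then the data
$HOME/sandboxes/msb_5_5_30/use mydb < schema_blackhole.sql
mysqldump -h master_host -u dumper -p --no-create-info mydb \
  | $HOME/sandboxes/msb_5_5_30/use mydb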

Despite the many steps, the plan worked quite smoothly. We managed to extract data from three databases (about 600 tables), for a total of 0.5 TB, in about 7 hours. The task was completed using a tool developed by my colleague Linas, which performs automated DDL translation from MySQL to Oracle. That tool probably deserves an article of its own.

Tuesday, September 11, 2012

My speaking engagements - Q4 2012

After a long pause in the speaking game, I am back.

I have not been on stage since April, and it is now time to resume my public duties.

  • I will speak at MySQL Connect in San Francisco, just at the start of Oracle Open World, with a talk on MySQL High Availability: Power and Usability. It is about the cool technology that is keeping me busy here at Continuent, which can make life really easy for DBAs. This talk will be a demo fest. If you are attending MySQL Connect, you should see it!
  • A happy return for me. On October 27th I will talk about open source databases and the pleasures of command line operations at Linux Day in Cagliari, my hometown. Since I speak more in California than in my own backyard, I am happy that this year I managed to get a spot here.
  • The company will have a team meeting in November (Barcelona, here we come!) and from there I will fly to Bulgaria, where I am speaking at the Bulgarian Oracle User Group conference. There I will give two talks, one about MySQL for business, and the other on "MySQL High Availability for the masses".
  • A few days later, again on the road, in London, for Percona Live, with a talk on MySQL High Availability: Power, Magic, and Usability. It is again about our core products, with some high technology fun involved. I will show how our tools can test the software, spot the mistakes, fix the cluster, and even build a step-by-step demo.
See you around. Look for me carefully, though. I may look different from how I have been depicted so far.

Wednesday, August 22, 2012

MySQL 5.6 replication gotchas (and bugs)

There has been a lot of talk about MySQL 5.6 replication improvements. With few exceptions, what I have seen was either marketing messages or hearsay. This means that few people have really tried out the new features to see whether they meet the users' needs.

As usual, I did try the new version in my environment. I like to form my own opinion based on experiments, and so I have been trying out these features since they first appeared in early milestones.

What follows is a list of (potentially) surprising results that you may get when using MySQL 5.6.
All the examples are made using MySQL 5.6.6.

Gotcha #1 : too much noise

I have already mentioned that MySQL 5.6 is too verbose when creating the data directory. This also means that your error log may have way more information than you'd like to get. You should check the contents of the error log when you start, and either clean it up before using it on a regular basis or take note of what's there after a successful installation, so you won't be surprised when something goes wrong.

Gotcha #2 : InnoDB tables where you don't expect them

Until version 5.5, after you installed MySQL, you could safely drop the ib* files, change the configuration file, and restart MySQL with optimized parameters. Not anymore.

When you run mysqld with the --bootstrap option (which is what mysql_install_db does), the server creates five InnoDB tables:

select table_schema, table_name
from information_schema.tables
where engine='innodb';
+--------------+----------------------+
| table_schema | table_name           |
+--------------+----------------------+
| mysql        | innodb_index_stats   |
| mysql        | innodb_table_stats   |
| mysql        | slave_master_info    |
| mysql        | slave_relay_log_info |
| mysql        | slave_worker_info    |
+--------------+----------------------+

The slave_* tables are needed for the crash-safe slave feature, which we'll cover later. The innodb_*_stats tables are documented at InnoDB persistent stats, and they seem to contain almost the same info as the tables with the same name in Percona Server's INFORMATION_SCHEMA. I can only speculate why these tables are in mysql rather than in performance_schema.

Another side effect of this issue is that whatever settings you want to apply to InnoDB (size of the data files, file-per-table, default file format, and so on) must be in place when you run mysqld --bootstrap.
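
In practice, this means preparing the final InnoDB configuration before initializing the data directory. A minimal sketch, with placeholder paths:

cat > /tmp/my-bootstrap.cnf <<'EOF'
[mysqld]
innodb_file_per_table
innodb_data_file_path = ibdata1:100M:autoextend
EOF

# mysql_install_db passes the options along to mysqld --bootstrap
scripts/mysql_install_db --defaults-file=/tmp/my-bootstrap.cnf \
  --basedir=/path/to/mysql-5.6 --datadir=/path/to/datadir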

Gotcha #3 : Global transaction IDs and security

The information about Global transaction IDs is not easy to locate, but eventually, searching the manual, you will find it. The important thing you learn from that page is that this feature only works if you enable all these options in all the servers used for replication:

log-bin
server-id=xx
log-slave-updates 
gtid-mode=ON
disable-gtid-unsafe-statements

The first two options are not a surprise. You need them for replication anyway. Check.

The third one is puzzling. Why would you want this option in a master? But then you realize that this will allow any server to be promoted or demoted at will. Check.

gtid-mode is the main option that needs to be enabled for global transaction IDs. Check.

The last option forces the server to be safe, by using only transactional tables, and by forbidding things like temporary tables inside transactions and create table ... select. This means that if you try to update a MyISAM table on the master, the statement will fail: you won't be allowed to do it. Check?

The trouble is, if you enable gtid-mode=ON (with its mandatory ancillary options), you can't run mysql_secure_installation, because that utility needs to delete anonymous users and clean the 'db' table for anonymous usage of the 'test' database.

The workaround is to enable GTID after you secure the installation, which means one more server restart.
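
Put in concrete terms, the sequence looks like this; paths are placeholders, and the options assume log-bin and server-id are already set:

# Secure the installation while gtid-mode is still OFF
mysql_secure_installation

# Then append the GTID options to the configuration file
cat >> /path/to/my.cnf <<'EOF'
gtid-mode = ON
disable-gtid-unsafe-statements
log-slave-updates
EOF

# One more restart, and GTIDs are on
mysqladmin -u root -p shutdown
mysqld_safe --defaults-file=/path/to/my.cnf &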

Gotcha #4 (bug): multi-threaded slave won't work without crash-safe slave tables

To enable parallel replication, you need to change the value of 'slave_parallel_workers' to a value between 1 and 1024.

show variables like '%worker%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_workers | 0     |
+------------------------+-------+
1 row in set (0.00 sec)

slave1 [localhost] {msandbox} ((none)) > stop slave;
Query OK, 0 rows affected (0.06 sec)

slave1 [localhost] {msandbox} ((none)) > set global slave_parallel_workers=5;
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > start slave;
Query OK, 0 rows affected, 1 warning (0.05 sec)

slave1 [localhost] {msandbox} ((none)) > select * from mysql.slave_worker_info\G
Empty set (0.00 sec)

What the hell? The workers table is empty.

I know the cause: the slave_worker_info table is not activated unless you also set relay_log_info_repository='table'. What I don't understand is WHY it is like that. If this is documented, I could not find where.

Anyway, once you are in this bizarre condition, you can't activate relay_log_info_repository='table', because of the following:

Gotcha #5 (bug): master and relay log repositories must be set together, or they will fail

After activating parallel threads without enabling table repositories, you can't easily get back to a clean replication environment:

set global relay_log_info_repository='table';
start slave;
ERROR 1201 (HY000): Could not initialize master info structure; more error messages can be found in the MySQL error log

And the error log says:

120822 14:15:08 [ERROR] Error creating relay log info: Error transfering information.

What you need to do is:

  • stop the slave
  • enable both master_info_repository and relay_log_info_repository as 'table'
  • set the number of parallel threads
  • restart the slave
slave1 [localhost] {msandbox} (mysql) > stop slave;
Query OK, 0 rows affected (0.02 sec)

slave1 [localhost] {msandbox} (mysql) > set global master_info_repository='table';
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > set global relay_log_info_repository='table';
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > set global slave_parallel_workers=5;
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > start slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)

slave1 [localhost] {msandbox} (mysql) > select count(*) from slave_worker_info;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

This sequence of commands will start parallel replication, although MySQL crashes when restarting the slave.

Gotcha #6 : Global transaction IDs not used in parallel threads

Global transaction IDs (GTIDs) are very useful when you need to switch roles from master to slave, and especially when you deal with unplanned failovers. They are also a great simplification in many cases where you need to identify a transaction without getting lost in the details of binary log file and position.

However, in one of the cases where GTIDs would have been most useful, they are not there. The table mysql.slave_worker_info still identifies transactions by binary log and position. Similarly, CHANGE MASTER TO does not use GTIDs, other than allowing the automatic alignment (MASTER_AUTO_POSITION=1). If you need to perform any fine-tuning operations, you must revert to the old binary log + position.
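
For the record, the automatic alignment is the one GTID-aware piece of CHANGE MASTER TO; host and credentials below are hypothetical:

STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'master_host',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_AUTO_POSITION = 1;  -- GTIDs work out the starting point
START SLAVE;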