
Monday, April 02, 2018

dbdeployer GA and semantic versioning

dbdeployer went into release candidate status a few weeks ago. Since then, I added no new features, but a lot of tests. The test suite now runs 3,000+ tests on MacOS and a bit more on Linux, for a grand total of 6,000+ tests that need to run at least twice: once with concurrency enabled and once without. I know that testing can't prove the absence of bugs, but I am satisfied with the results, since all this grinding has allowed me to find several bugs and fix them.

With this groundwork in place, I felt that dbdeployer could exit candidate status and get to version 1.0. This happened on March 26th. An immediate side effect of this change is that from this point on, dbdeployer must adhere to the semantic versioning principles:

A version number is made of Major, Minor, and Revision. When changes are applied, the following happens:

  • Backward-compatible bug fixes increment the Revision number (e.g. 1.0.0 to 1.0.1)
  • Backward-compatible new features increment the Minor number (1.0.1 to 1.1.0)
  • Backward incompatible changes (either features or bug fixes that break compatibility with the API) increment the Major number (1.15.9 to 2.0.0)

The starting API is defined in API-1.0.md, which was generated manually.
The file API-1.1.md contains the same API definition, but was generated automatically and can be used to better compare the initial API with further versions.

So the app went from 1.0 to 1.1 in less than one week. In obedience to semantic versioning principles, when a new backward-compatible feature is added, the minor number of the version increases. What does backward-compatible mean? It means that commands, procedures, and workflows that were working with the previous version will also work with the current one. It's just that the new release will have more capabilities. In this case, the added feature is the ability to have the environment variables HOME and PWD recognized and properly expanded in the configuration file. It's nothing very exciting, but changing the minor number gives the user a hint of what to expect from the new release.
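For example, here is a sketch of the feature in action, assuming the default configuration file location and abridged to the two path fields:

$ cat $HOME/.dbdeployer/config.json
{
    "sandbox-home": "$HOME/sandboxes",
    "sandbox-binary": "$HOME/opt/mysql"
}

The variables are expanded when the file is loaded, so the same configuration file can be shared across users without editing.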

Let's give a few examples:

  • Version goes from 1.0.0 to 1.0.1: It means that there are only bug fixes, and you should expect to use it without modifications.
  • Version goes from 1.0.1 to 1.1.0: You should be able to use dbdeployer just as before, but you should check the release notes to see what's new, because there are new functionalities that might be useful to you.
  • Version goes from 1.3.15 to 2.0.0: Danger! A major number bumped up means that something has changed in the API, which is now partially or totally incompatible with the previous release. Your workflow may break, and you must check the release notes and the documentation to learn how to use the new version.

This is different from other applications. For example, the MySQL server uses version numbers with hard-to-predict meanings:

  • MySQL 5.1, 5.5, 5.6, and 5.7 should be, in fact, major version number changes, not minor ones. Each one of them introduces incompatible changes that require careful review of the novelties.
  • Within the same version (such as MySQL 5.7) there are a lot of compatible and incompatible changes, although the minor number stays the same.

The plan with dbdeployer is to use the version number as a manifest, to give users an immediate feeling of what to expect. Rather than changing minor or major number only when the developers think there is some juicy new thing of which they can be proud, the version number will tell whether users should worry about compatibility or not.

In my general development plan, you are more likely to see versions like "1.25.16" than version "2.0," meaning that I will try to keep the current API valid as much as possible. A major version change will signify that a new feature could not fit in the current infrastructure and a new one would be needed.

You can draw your own conclusions here. A semantic versioning paradigm is unlikely to be adopted by most software vendors, because version numbers are often marketing gimmicks, and they can charge you more convincingly for a version 6.0 than for version 1.34.
Free software, OTOH, can do this. My goal with dbdeployer is to help the MySQL community, and I will achieve that goal more easily if my releases can be adopted without fear of incompatibility.


Monday, March 05, 2018

MySQL security for real users


Security features overview

One of Oracle's tenets is the focus on security. For this reason, when it took over the stewardship of MySQL, it started addressing the most common issues. It was not quick to act, but we have seen real progress:

  1. MySQL 5.7 has removed the anonymous accounts, which were the greatest threat to security. Because of those accounts, and the default privileges granted to them, users without any privileges could access the "test" database and do serious damage. Additionally, because of the way the privilege engine evaluates accounts, anonymous users could hijack legitimate users, preventing them from working properly.
  2. The "root" account now comes with a password defined during initialization. This is good news for security, but bad news for how the change was implemented.
  3. There is a new way of setting an options file for connection credentials: mysql_config_editor, paired with the option --login-path, allows users to store encrypted credentials for secure use (see the example after this list). Also here, while we should rejoice at the added security, we can't help feeling that the implementation is yet again far from meeting users' needs.
  4. There is a useful warning (introduced in MySQL 5.6) when using a password on the command line, telling users that it is a risk. Also in this case, we have a usability issue: while users care about their production deployments and use option files to avoid passwords on the command line, there are nonetheless a lot of testing scripts, used in safe environments or with non-valuable data, where a password on the command line was never an issue, and the new warning simply screws up the result of those carefully crafted tests. This change, which can't be overcome without modifying the MySQL client code, requires users to change their existing tests to adapt to the new behavior.
  5. MySQL 8 introduces roles, which simplify the accounts management. There are some minor usability issues, although in general the feature meets expectations.
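As an illustration of point 3, this is how encrypted credentials are stored and then used with mysql_config_editor (the login path name "local" is arbitrary):

$ mysql_config_editor set --login-path=local --host=localhost --user=root --password
Enter password:
$ mysql --login-path=local

Note that the password is requested interactively: the tool offers no supported way of providing it non-interactively, which is exactly the automation problem discussed below.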

This is an overview of the main security enhancements in MySQL since 5.6. Each one of them has some usability problems, some minor, some really bad.
We will first have a look at the problems mentioned above, and then examine the root cause of why they arose.


Usability issues

I start by noticing that some developers in the MySQL team have been working there for many years, starting from the time when MySQL was a different database and was used quite differently.
In those times, managing the database meant that a human (the DBA) would run operations manually, take a look at the result, and adjust when needed. And then, when things went wrong, the same human explored the database system to find out what happened, took action, and went back to sleep.

Human-centered management leads to human problems: lazy DBAs left their databases without a password, used the root account for everything, and exposed the server to unsophisticated attacks; they used passwords on the command line, without caring for option files (or without knowing about them). Careless DBAs did not deal with anonymous users, leaving a dangerous backdoor in their servers.

Some of the new functionality introduced in the latest MySQL versions is aimed at this type of user: when you install MySQL, you get a message saying "your root password is ************", and the lazy DBAs have no option but to take note and use it. When they use the password on the command line, the annoying warning forces them to start using an options file or mysql_config_editor.

This is all good, but the main problem here is that the DBAs of 10 years ago are on the verge of extinction. They are replaced by a new breed of DBAs who are not lazy, because they can't afford to be: they need to manage dozens, hundreds, or thousands of databases at once, using configuration management tools that don't require manual intervention, and actually abhor it. In the land of automation, some of the MySQL security enhancements are seen not as a solution, but as new problems.

Let's see an interesting example: docker containers.

With Docker, MySQL images are deployed using a password on the command line. This is done for compatibility with the first implementation of the image maintained by the Docker team, where you deploy with this syntax:

docker run -e MYSQL_ROOT_PASSWORD=secret -d mysql

The MYSQL_ROOT_PASSWORD is a directive that becomes an environment variable inside the container, and the server uses it during initialization. As you can imagine, this is not recommended for a secure environment. Then, what's the MySQL team recommendation? They suggest the same strategy used for manual installation: set a directive MYSQL_RANDOM_ROOT_PASSWORD that results in a random password being generated, then collected by the DBA and used. Alternatively, the directive MYSQL_ONETIME_PASSWORD will force the root user to change the password on first connection.
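For example, with the random password directive the password must then be fished out of the container logs. A sketch, using the Docker-maintained image (the container name is arbitrary):

$ docker run --name mysql-auto -e MYSQL_RANDOM_ROOT_PASSWORD=yes -d mysql
$ docker logs mysql-auto 2>&1 | grep 'GENERATED ROOT PASSWORD'
GENERATED ROOT PASSWORD: ...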

The above suggestions were designed with the ancient DBA still in mind, while container deployment is even more automated than VMs: it is based on the principle of immutable objects, i.e. containers that spring up from the cloud ready to run, with no configuration needed, and especially no configuration that requires someone (or some tool) to extract a new password from a log. I proposed a different solution that would never show passwords on the command line; it was implemented, but it still feels like a hack to circumvent an inadequate design.

As a result, the implementation inside the Docker image recommended by the MySQL team uses "--initialize-insecure" to start the server. This is an implicit recognition of the bad design of the initialization feature. What was designed to overcome DBAs' laziness becomes an obstacle to automation.

We have a similar problem with mysql_config_editor: the tool will create a safe configuration file with credentials for multiple instances, but the password must be inserted manually. Consequently, this potentially useful feature doesn't get adopted, because it would be too difficult or impossible to automate properly.

We have seen that, of the security features that were introduced lately, only a few can be used safely in an automated environment, and all of them have at least one small usability quirk. I have talked about a confusing issue related to the removal of anonymous users, where, in their eagerness to remove the vulnerability, the MySQL team also removed the "test" database, which was a consequence, not the cause, of the problem. And I have recently talked about roles usability, where there are still open problems, such as the difficulty of telling roles from users, which is apparently not considered a bug by the MySQL team.

All the above considerations led me to ask: how did we get to this point? There is an active community, and feedback is offered often with plenty of detail. How come we have such an abundance of usability issues? Don't the developers spend time with users at conferences to learn what they expect? Don't they read articles and blog posts about how a new feature meets expectations? Don't they talk to customers who have adopted new features? They certainly do. Then why do the usability problems persist?

What follows is my observation and speculation on this matter.


Disconnection between MySQL developers and users community

My experience working with system providers has put me in contact with many users. I have seen that in most cases users are very protective of their current deployment, because it took them a long time to get it right, and they don't upgrade unless they have no other choice. I've seen users test the newer versions, realize that they would break some of their procedures, and defer the upgrade to better times that never come. Last year I remember a user with a not-so-large set of servers that was considering an upgrade to MySQL 5.6, while 5.7 had been GA for two years. The reason was a set of incompatibilities that made the upgrade too difficult.

For companies that deal with thousands of servers, the problem is similar, but exacerbated by the quantity of servers to upgrade and the need to do it without stopping operations. This latter requirement has made some users decide not to use GTID, because it required offline time for a master, and they hadn't had enough time to test the upgrade to MySQL 5.7 that would solve that problem.

For one reason or the other, many companies upgrade only two or three years after a given version became GA. And this is the main problem: until they use it in production, or at least test the version for a projected upgrade, users can't give the valuable feedback that comes from usage in production. When they finally do, the version for which they provide feedback has been GA for a long time and can't be changed, while the next one is already close to GA, and as such is untouchable.

The MySQL team gets feedback on a release from a handful of curious users who don't delay testing until the new version is GA, but they don't provide the kind of important feedback that gets the development team's attention, such as deployment in production by large customers. In many cases, large customers are the ones that upgrade several years after GA, and by then their input is difficult to act upon.

We have then a curious situation, where the warnings given by the early software explorers are confirmed years later by the users to whom the MySQL team listens more closely; but by then the next version of the server is already locked into a release schedule that nobody wants to alter to fix usability issues.

How can we solve this problem? Simple: listen to early software explorers and try to fix problems before GA.

Saturday, November 21, 2015

Default users in MySQL 5.7

Among the many new features introduced by MySQL 5.7, we can notice a strong trend towards improving the server's security by default. Two features stand out in this respect:

  • A password-less root is no longer the default for new installations. Unless you say otherwise, the default installers mysqld --initialize and the deprecated mysql_install_db will generate a random password which the user needs to change.
  • The anonymous accounts are no longer created by default. When you start MySQL, you only get the root user (and a new one: read on).

The above features are a great advance not only for security but also for usability. The anonymous users were a continuous source of mismatched connections, with difficult-to-explain errors, and confusion for beginners and experts alike. That's why MySQL-Sandbox has removed the anonymous accounts since its first release.

There are, however, two more changes in the privileges tables:

  1. We have a new user, mysql.sys, which is not a usable account, as it comes with a deliberately invalid password. Its role is only to allow the sys objects to have an owner different from root. This novelty caught me by surprise, and I had to adjust the management of users in MySQL-Sandbox, to prevent the removal of this account's attributes from various mysql tables: user, db, tables_priv. In fact, since we don't have anonymous accounts anymore, MySQL-Sandbox does not remove rows from db and tables_priv. (The changes were apparently new for the team maintaining MySQL images on Docker as well, since this user is not available in Docker MySQL containers: the entrypoint file removes all accounts from the user table.)
  2. The test database is removed by default. This is, in my opinion, a mistake. The reason for the vulnerability of the test database was that it was open to use for the anonymous users. But since we don't have anonymous users anymore, deleting the test database is like obeying a superstitious belief.

Anyway, MySQL-Sandbox 3.1.02 comes with a few small bug fixes, among which is the preservation of the mysql.sys user, and a few adjustments to the tests to take the latest changes into account. The test database is always present in sandboxes, despite the above-mentioned irrational removal.

Let me demonstrate the issue. In a brand new installation, we create an anonymous user and the test database:


mysql> create user '';
Query OK, 0 rows affected (0.02 sec)

mysql> select host,user from user;
+------+------+
| host | user |
+------+------+
| %    |      |
| %    | root |
+------+------+
2 rows in set (0.00 sec)

mysql> create schema test;
Query OK, 1 row affected (0.01 sec)

Then we try to access the server


# mysql --user=''
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.9-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test
ERROR 1044 (42000): Access denied for user ''@'%' to database 'test'
mysql> use performance_schema
ERROR 1044 (42000): Access denied for user ''@'%' to database 'performance_schema'
mysql> use sys;
ERROR 1044 (42000): Access denied for user ''@'%' to database 'sys'

mysql> show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)

mysql> select table_schema,engine,count(*) from information_schema.tables group by table_schema,engine;
+--------------------+--------+----------+
| table_schema       | engine | count(*) |
+--------------------+--------+----------+
| information_schema | InnoDB |       10 |
| information_schema | MEMORY |       51 |
+--------------------+--------+----------+
2 rows in set (0.00 sec)

mysql> show grants for '';
+------------------------------+
| Grants for @%                |
+------------------------------+
| GRANT USAGE ON *.* TO ''@'%' |
+------------------------------+
1 row in set (0.00 sec)

So, the anonymous user is unable to do harm, as it can't even see the databases. The only effective measure was cleaning up the table mysql.db, which was the one giving access to the test database to the anonymous users.
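For reference, the vulnerability came from two rows that pre-5.7 installations shipped by default in mysql.db, granting wide privileges on test (and on any database whose name starts with test_) to every user, anonymous ones included. A sketch of what those rows looked like:

mysql> select host, db, user from mysql.db;
+------+---------+------+
| host | db      | user |
+------+---------+------+
| %    | test    |      |
| %    | test\_% |      |
+------+---------+------+
2 rows in set (0.00 sec)

The empty user column means that the grants applied to any account.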

Wednesday, November 04, 2015

MySQL-Docker operations. - Part 3: MySQL replication in Docker



With the material covered in the first two articles, we have all the elements needed to set up replication in Docker. We just need to put the pieces together.
If you want to do everything by hand, it will only take a few minutes. The steps are not complicated. If you have followed the reasoning in the past episodes, you will know what to do.
Or, you can make your life easier by using the ready-made scripts available on GitHub as MySQL replication samples. In fact, what this article will do in practice is add comments to a stripped-down version of the deployment script, which will make things clear.
First, we will use two templates for the configuration files. Notice that the server-id value is replaced by a placeholder. We will also use a smaller options file for the client's username and password.
$ cat my-template.cnf
[mysqld]
user  = mysql
port  = 3306
log-bin  = mysql-bin
relay-log = mysql-relay
server-id = _SERVERID_
master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
enforce-gtid-consistency

$ cat node-my.cnf 
[client]
user=root
password=secret

First phase: deploying the containers

Note: Don't try to copy-and-paste the code below. It is a reduced version that is used here only for the sake of commenting it. I have deliberately added line numbers to make copy-and-paste harder. Take the code from github instead.

Monday, November 02, 2015

MySQL-Docker operations. - Part 2: Customizing MySQL in Docker



After seeing the basics of deploying a MySQL server in Docker, in this article we will lay the foundations for customizing a node and eventually using more than one server, so that we can cover replication in the next one.

Enabling GTID: the dangerous approach.

To enable GTID, you need to set five variables in the database server:
  • master-info-repository=table
  • relay-log-info-repository=table
  • enforce-gtid-consistency
  • gtid_mode=ON
  • log-bin=mysql-bin
For MySQL 5.6, you also need to set log-slave-updates, but we won't deal with such ancient versions here.
Using the method that we've seen in Part 1, we can use a volume to replace the default /etc/my.cnf with our own.
$ cat my-gtid.cnf
[mysqld]
user  = mysql
port  = 3306
log-bin  = mysql-bin
relay-log = mysql-relay
server-id = 12345

master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
enforce-gtid-consistency
However, this approach may fail. It will work with some MySQL images, but depending on how the image is built, the server may not install at all.
$ docker run --name boxedmysql \
    -e MYSQL_ROOT_PASSWORD=secret \
    -v $PWD/my-gtid.cnf:/etc/my.cnf \
    -d mysql/mysql-server
b9c15ed3c40c078db5335dcb76c10da1788cee43b3e32e20c22b937af50248c5

$ docker exec -it boxedmysql bash
Error response from daemon: Container boxedmysql is not running
The reason for the failure is Bug#78957. When my.cnf contains log-bin and mysql is called prior to the installation to perform some detection tasks, the server creates the binary log index in the data directory. After that, the installation task will abort because the data directory is not empty. It sounds as if there is a set of unnecessary actions here (the server should not create the index without other components in place, and the installer should not complain about finding a harmless file in the data directory) but this is the way it is, and we should work around it. At the time of writing, the bug has received a temporary fix and the installation now works.
All things considered, it's perhaps for the best that we are forced to run things this way, because there are side effects to enabling GTIDs at startup: there will be unwanted GTID sets in the server, and that could be annoying.

Sunday, October 25, 2015

MySQL-Docker operations. - Part 1: Getting started with MySQL in Docker

Docker is one of the fastest growing trends in IT. It allows fast deployment of services and applications on a Linux machine (and, with some limits, on other operating systems). Compared to other methods of deploying databases, such as virtual machines or application isolation, it offers faster operations and better performance.
Many people, surprised by the sudden advance of this technology, keep asking: what is Docker? And why should you use it?
I will soon write an article with a deep comparison of the three methods (VM, container, sandbox), but for now, we should be satisfied with a few basic facts:
  • Docker is a Linux container platform. It deploys every application as a series of binary layers, containing just the minimum dependencies (libraries and applications) to make the service work;
  • It stores images in a central registry, from where the docker client can download them quickly;
  • By definition, it is lightweight. If you already have the images in your system, deployment of the service happens in seconds.
  • Unlike virtual machines, where you can deploy virtualized Windows and other non-Linux environments, Docker is Linux-only. You can virtualize every service, provided that it runs on Linux.
  • Docker can run applications in various flavors of Linux at once. It actually makes the Linux flavor dependency transparent, to the point that users barely realize it.

Installing Docker

Docker installation is straightforward. The Docker documentation covers the basics and the fine points of installing on any operating system. Rather than repeating the procedure here, I recommend looking at the pages for Ubuntu, Mac OS X, or Windows.
Once the installation is complete, the commands shown in this article will apply to all platforms. When there are exceptions, it will be noted in the text.

Thursday, October 08, 2015

Sound advice for GTID, with caveats

During the PerconaLive conference in Amsterdam, I attended a session where I heard a good piece of advice about using GTID. It amounts to: look at the SHOW SLAVE STATUS output, and if you see more than one line in the Executed_Gtid_Set field, that tells you immediately that someone has written on a slave database.
This is good advice. Let's dissect it. Here is what a regular slave looks like, when nobody has messed with it:
SHOW SLAVE STATUS\G
*************************** 1. row ***************************
 [...]
             Master_Server_Id: 1
                  Master_UUID: 00013454-1111-1111-1111-111111111111
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
1 row in set (0.00 sec)
What you see here is a slave that has received transactions from a single source (Retrieved_Gtid_Set lists only one GTID set) and has applied data from a single source (also Executed_Gtid_Set shows a single item.)
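By contrast, here is a sketch of what the same fields would show if someone had written directly on the slave (the second UUID, which would be the slave's own server UUID, is illustrative):

            Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
             Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12,
00013456-2222-2222-2222-222222222222:1-3

The extra line in Executed_Gtid_Set is the telltale sign of local writes.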
Notice that this advice holds true even when the slave being considered is an intermediate one, i.e. a relay slave which is master of one or more slaves. Due to the nature of GTIDs, even though the intermediate slave is recording the transactions to its own binary log, the transaction identifier does not change. Thus you should see a clean set of transactions throughout the chain. For example, if you have another slave that is replicating from slave #2, you would see something like this:
SHOW SLAVE STATUS\G
*************************** 1. row ***************************
 [...]
             Master_Server_Id: 102
                  Master_UUID: 00013456-3333-3333-3333-333333333333
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:

Wednesday, August 19, 2015

MySQL replication in action - Part 4 - star and hybrid topologies

Previous episodes:

Introducing star topology.

With all-masters P2P topologies, we have seen a way of deploying a cluster where all nodes are masters, achieving better efficiency and stability than ring topologies. That method comes at the price of a complex setup, which requires, for an N-node cluster, N*(N-1) connections.
We can achieve the same result as in a P2P all-masters topology by trading some stability for far fewer connections. In a star topology (Figure 1) all nodes are masters, but they do not connect to each other directly. There is a special node, named hub, which receives the changes produced by each endpoint and spreads them to the others.
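To put numbers on the difference: with N=4, an all-masters P2P deployment needs 4*3 = 12 replication connections, while a star needs only 2*(4-1) = 6, i.e. one bidirectional master/slave pair between each endpoint and the hub.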
Figure 1 - A star topology

Thursday, August 06, 2015

Changed defaults between MySQL 5.6 and 5.7

MySQL 5.7 comes with many changes. Some of them are better explained than others.

I wanted to see how many changes I could get by comparing SHOW VARIABLES in MySQL 5.6 and 5.7.
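Here is a minimal sketch of how such a comparison can be done, assuming one running server per version (the host names are placeholders):

$ mysql -h host56 -BNe 'SHOW GLOBAL VARIABLES' | sort > vars-5.6.txt
$ mysql -h host57 -BNe 'SHOW GLOBAL VARIABLES' | sort > vars-5.7.txt
$ diff vars-5.6.txt vars-5.7.txt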
The most notable ones are:

  • binlog_format: the default is now ROW. This variable affects the format of the binary log; whether you use it as a backup complement or for replication, the change means bigger binary logs and possible side effects.
  • binlog_error_action now defaults to ABORT_SERVER. If the server cannot write to the binary log, rather than continuing its work without logging, it shuts down. This could be a desirable course of action, but better be prepared for the eventuality.
  • innodb_strict_mode is enabled by default, which is probably a good thing, but it means that previously accepted events will now generate an error instead of a warning.
  • sql_mode is now STRICT by default. While many well-prepared users will be pleased with this change, which was advocated as best practice by some DBAs, the practical outcome is that several existing applications may break because of unclean input.
  • sync_binlog, which affects data safety but also server performance, is now enabled (set to 1) by default.

Wednesday, August 05, 2015

In search of cleanliness: the elusive quiet installation



UPDATE: Almost solved! See at the end.

A clean installation of a database server is one where everything goes according to expectations. It used to be easy: you only had to do what the manual says, and, presto! you would see your database server installed and ready to use. If something went wrong, you got one or more error messages that informed you of what needed to be fixed.

Sometimes, rarely, you would also get a warning message, telling you that while the installation was successful, you could improve it by fine-tuning this and that. No big deal.

Gone are those times. A clean installation nowadays is a much harder exercise, if not impossible. Let’s give it a try using MySQL 5.7.7.

Tuesday, March 10, 2015

MySQL 5.7.6 is out. Be prepared for big changes



Today Oracle released MySQL 5.7.6 milestone 16. With this, MySQL 5.7 has been in development for over 2 years.
Compared to MySQL 5.6, the changes are quite extensive. The main effort of the team has been focused on speed, with performance reportedly improved by 2 to 3 times compared to previous releases.
A full list of what is new would take too much space here, but I would like to mention some key points:


  • Oracle has spent a considerable amount of energy in the improvement of MySQL security and safety. You will see many new features, but even more old features that were deprecated, and others that were removed after being deprecated in 5.6.
  • The installation process has been changing in every milestone release of MySQL 5.7, always towards the goal of making MySQL more robust. The effort, though, will break compatibility with installation practices used for previous versions.

In this article, I will cover the biggest changes during installation.
In MySQL 5.6, mysql_install_db has an option for the generation of a random password during database creation. The process is unfriendly to scripting, but it is a step towards ending the long-time practice of creating a root user without a password.
In MySQL 5.7.4, there was a further change, with the random password generation becoming the default, with the possibility of skipping it with the option --skip-random-password.
In MySQL 5.7.5, the default was confirmed, but the option was changed to --insecure.


And now, in MySQL 5.7.6, the crackdown on old practices continues: mysql_install_db is deprecated, and replaced with mysqld --initialize (formerly known as "mysqld --bootstrap", now deprecated).
Here’s a test run:


$ ~/opt/mysql/5.7.6/bin/mysqld --no-defaults --basedir=~/opt/mysql/5.7.6/ --datadir=$PWD/data --initialize
2015-03-09T05:06:37.159659Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-03-09T05:06:37.355155Z 0 [Warning] InnoDB: New log files created, LSN=45790
2015-03-09T05:06:37.410118Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2015-03-09T05:06:37.467002Z 0 [Warning] Failed to setup SSL
2015-03-09T05:06:37.467029Z 0 [Warning] SSL error: SSL context is not usable without certificate and private key
2015-03-09T05:06:37.468142Z 1 [Warning] A temporary password is generated for root@localhost: f<jqhdJ(A5p#

Compared to the previous versions, the most notable difference is that there is no .mysql_secret file, but a simple line on the screen mentioning the temporary password.
But there is one more important behavioral difference: this command works only once. With mysql_install_db, you could run the same command even if the data directory existed, and the script would re-apply the data creation commands. With mysqld --initialize, you can only run it on a non-existing data directory.


$ ~/opt/mysql/5.7.6/bin/mysqld --no-defaults --basedir=~/opt/mysql/5.7.6/ --datadir=$PWD/data --initialize
2015-03-09T05:49:12.504413Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-03-09T05:49:12.505398Z 0 [ERROR] --initialize specified but the data directory exists. Aborting.
2015-03-09T05:49:12.505422Z 0 [ERROR] Aborting

Using the newly created database is a bit trickier than before:


$ ~/opt/mysql/5.7.6/bin/mysql --no-defaults   -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.6-m16

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use mysql
ERROR 1820 (HY000): You must SET PASSWORD before executing this statement
mysql> set password=password('test');
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'password('test')' at line 1

Uh? What’s this? This command used to work until recently. The reason is a change in the syntax of SET PASSWORD, which now accepts a plain text argument:


mysql> set password='test';
Query OK, 0 rows affected (0.00 sec)

The old syntax was meant to be only deprecated, but it was accidentally removed altogether. This will hopefully be fixed in MySQL 5.7.7.


More changes involve the use of GRANT, REVOKE, CREATE USER, and ALTER USER, which are now stricter. You get warnings if you try to create users with the GRANT command, or when mixing the granting of privileges with authentication options.


mysql> grant all on test.* to testuser identified by 'test';
Query OK, 0 rows affected, 1 warning (0.00 sec)

Warning (Code 1287): Using GRANT for creating new user is deprecated and will be removed in future release. Create new user with CREATE USER statement.

mysql> grant all on *.* to testuser identified by 'test';

Query OK, 0 rows affected, 1 warning (0.00 sec)


Warning (Code 1287): Using GRANT statement to modify existing user's properties other than privileges is deprecated and will be removed in future release. Use ALTER USER statement for this operation.

In short, if you have automated scripts that deal with installing and administering MySQL, you should test them with warnings enabled, and be prepared for broken compatibility with your old practices.


One such ‘old practice’ script broken by the new syntax changes is MySQL-Sandbox. I have just released an updated version (MySQL Sandbox 3.0.48) with a workaround for the changed SET PASSWORD syntax in MySQL 5.7.6.


BTW, did I mention that the password field in the mysql.user table was removed? That’s another surprise that may break quite a lot of existing tests:


mysql> use mysql
Database changed
mysql> select host,user,password from user;
ERROR 1054 (42S22): Unknown column 'password' in 'field list'

mysql> select host, user, authentication_string from user;
+-----------+-------------+-------------------------------------------+
| host      | user        | authentication_string                     |
+-----------+-------------+-------------------------------------------+
| localhost | root        | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox    | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox_rw | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| localhost | msandbox_ro | *6C387FC3893DBA1E3BA155E74754DA6682D04747 |
| 127.%     | rsandbox    | *B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD |
| %         | testuser    | *94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29 |
+-----------+-------------+-------------------------------------------+
9 rows in set (0.00 sec)

Tuesday, January 07, 2014

Multiple masters : attraction to the stars

In the last 10 years I have worked a lot with replication systems, and I have developed a keen interest in the topic of multiple masters in a single cluster. My interest has two distinct origins:

  • On one hand, I have interacted countless times with users who want to use a replication system as a drop-in replacement for a single server. In many cases, especially when users are dealing with applications that are not very flexible or modular, this means that the replication system must have several points of data entry, and such points must work independently and in symbiosis with the rest of the nodes.
  • On the other hand, I am a technology lover (look it up in the dictionary: it is spelled geek), and as such I get my curiosity stirred whenever I discover a new possibility of implementing multi-master systems.

The double nature of this professional curiosity sometimes makes me forget that the ultimate goal of technology is to improve users' lives. I may fall in love with a cute design or a clever implementation of an idea, but that cleverness must eventually meet with usability, or else it loses its appeal. There are areas where the distinction between usefulness and cleverness is clear-cut. And there are others where we really don’t know where we stand because there are so many variables involved.

One such case is the star topology, where you have many master nodes connected to each other through a hub. You can consider it a bi-directional master/slave. If you take a master/slave topology and make every node able to replicate back to the master, then you have almost a star. To make it complete, you also need to add the ability for the hub to broadcast the changes received from the outside nodes, so that every node gets the changes from every other node. Compared to other popular topologies, such as point-to-point all-masters and circular replication, the star topology has the distinct advantage of requiring fewer connections, and of making it very easy to add a new node.


Figure #1: Star topology

However, anyone can see immediately one disadvantage of the star topology: the hub is the cornerstone of the cluster. It’s a single point of failure (SPOF). If the hub fails, there is no replication anywhere. Period. Therefore, when you are considering a multi-master topology, you have to weigh the advantages and disadvantages of the star, and the SPOF is usually the most important element in the evaluation.

Depending on which technology you choose, though, there is also another important element to consider, i.e. that data must be replicated twice when you use a star topology. It’s mostly the same thing that happens in circular replication. If you have nodes A, B, C, and D, and you write data in A, the data is replicated three times before it reaches D (A->B, B->C, and C->D). A star topology is similar. In a system where A, B, and D are terminal nodes, and C is the hub, data needs to travel twice before it reaches D (A->C, C->D).

Figure #2: Circular replication

This double transfer is bad for two reasons: it affects performance, and it opens the door to unexpected transformations of data. Let’s explore this concept a bit. When we replicate data from a master to a slave, there is little risk of mischief. The data goes from the source to a reproducer. If we use row-based replication, there is little risk of getting the wrong data in the slave. If we make the slave replicate to a further slave, we need to apply the data, generate a further binary log in the slave host, and replicate data from that second binary log. We can deal with that, but at the price of taking into account more details, like where the data came from, when to stop replicating in a loop, whether the data was created with a given configuration set, and so on. In short, if your slave server has been configured differently from the master, chances are that the data down the line may be different. In a star topology, this translates into the possibility that data from each spoke is replicated correctly in the hub, but ends up different in the other spokes.

Compare this with point-to-point all-masters. In this topology, there are no SPOFs. You pay for this privilege by having to set a higher number of connections between nodes (every node must connect to every other node), but there is no second-hand replication: before reaching the slave services, the data is applied only once, in the originating master.


Figure #3: Point-to-point all-masters topology

Where do I want to go with all the above points? I have reached the conclusion that, much as users like star topologies because of their simplicity, I often find myself recommending the more complex but more solid point-to-point all-masters setup. Admittedly, the risk of data corruption is minimal. The real spoiler in most scenarios is performance. When users realize that the same load will flow effortlessly in a point-to-point scenario, but cause slave lags in a star topology, the choice is easy to make. If you use row-based replication, which in a complex topology is often a necessary requirement, the lag grows to a point where it becomes unbearable.

As I said in the beginning, it all depends on the use case: if the data load is not too big, a star topology will run just as fine as point-to-point, and if the data flow is well designed, the risk of bad data transformation becomes negligible. Yet, the full extent of the star topology's weaknesses must be taken into account when designing a new system. Sometimes, investing some effort into deploying a point-to-point all-masters topology pays off in the medium to long term. Of course, you can prove that only if you deploy a star and try it out with the same load. If you deploy it on a staging environment, no harm is done. If you deploy in production, you may regret it. In the end, it all boils down to my mantra: don’t trust the theory, but test, test, test.

Monday, April 22, 2013

Installing and Administering Tungsten Replicator - Part 1 - basics

Intro

Tungsten Replicator is an open source tool that performs high-performance replication across database servers. It was designed to replace MySQL replication, although it also supports replication from and to Oracle and other systems. In this article, we will only cover MySQL replication, both simple and multi-master.

Preparing for installation

To follow the material in this article, you will need a recent build of Tungsten Replicator. You can get the latest ones from http://bit.ly/tr20_builds. In this article, we are using build 2.0.8-167.

Before starting any installation, you should make sure that you have satisfied all the prerequisites. Don't underestimate the list. Any missing items will likely result in installation errors.

If you are using Amazon EC2 servers, this page provides a script that makes the prerequisites an almost fully automated procedure.

To install any of the topologies supported by Tungsten, you first need to extract the software, define your nodes, and, if needed, change the default options.

  1. Download the software from http://bit.ly/tr20_builds
  2. Expand the tarball (tar -xzf tungsten-replicator-2.0.8-167.tar.gz)
  3. Change directory to the extracted path (cd tungsten-replicator-2.0.8-167)
  4. Define the VERBOSE user variable (it will show more detail during the operations)
  5. Edit the configuration files COMMON_NODES.sh and USER_VALUES.sh (the fields most likely to need changes are shown in the listings below.)
$ tar -xzf tungsten-replicator-2.0.8-167.tar.gz
$ cd tungsten-replicator-2.0.8-167
$ export VERBOSE=1
$ export PATH=$PWD/cookbook:$PATH

$ cat cookbook/COMMON_NODES.sh
#!/bin/bash
# (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License
# Version 1.0.5 - 2013-04-03

export NODE1=host1
export NODE2=host2
export NODE3=host3
export NODE4=host4

$ cat cookbook/USER_VALUES.sh
#!/bin/bash
# (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License
# Version 1.0.5 - 2013-04-03

# User defined values for the cluster to be installed.

# Where to install Tungsten Replicator
export TUNGSTEN_BASE=$HOME/installs/cookbook

# Directory containing the database binary logs
export BINLOG_DIRECTORY=/var/lib/mysql

# Path to the options file
export MY_CNF=/etc/my.cnf

# Database credentials
export DATABASE_USER=tungsten
export DATABASE_PASSWORD=secret
export DATABASE_PORT=3306

# Name of the service to install
export TUNGSTEN_SERVICE=cookbook

Pay attention to the paths. TUNGSTEN_BASE is where the binaries will be installed. You must make sure that:

  • The path is writable by the current user, in all nodes;
  • There is nothing in that path that may clash with the software to be installed. It should be a dedicated directory. Do NOT use your $HOME for this purpose. Use a subdirectory or a path under /usr/local or /opt/
  • The path must have enough storage to hold Tungsten Transaction History Logs (THL). They will occupy roughly twice as much as your binary logs.
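As a worked example of the last point: if your master produces 5 GB of binary logs per day, plan for roughly 10 GB of THL per day on each node, multiplied by the number of days of logs you intend to keep.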

Validating your nodes

You may think that you have followed the instructions for the prerequisites, but sometimes humans make mistakes. To make sure that your cluster can run Tungsten, you can run the script validate_cluster. This is an operation that performs all the installation checks in all nodes, without actually installing anything.

$ cookbook/validate_cluster
# Performing validation check ...
## 1 (host: host4)
./tools/tungsten-installer \
    --master-slave \
    --master-host=host1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --datasource-port=3306 \
    --service-name=cookbook \
    --home-directory=/home/tungsten/installs/cookbook \
    --cluster-hosts=host1,host2,host3,host4 \
    --datasource-mysql-conf=/etc/my.cnf \
    --datasource-log-directory=/var/lib/mysql \
    --rmi-port=10000 \
    --thl-port=2112 \
    --validate-only -a \
    --info \
    --start
INFO  >> Start: Check that the master-host is part of the config
INFO  >> Finish: Check that the master-host is part of the config
#####################################################################
# Tungsten Community Configuration Procedure
#####################################################################
NOTE:  To terminate configuration press ^C followed by ENTER
...
...
...
( LOTS OF LINES FOLLOW)

If there is any error in your prerequisites, you will get an error message, or possibly more than one. If the messages provided by this command are not enough to understand what is going on, you can ask for yet more detail, using:

$ VERBOSE=2 ./cookbook/validate_cluster

DRY-RUN installation

Should you need to install on a set of nodes where you can't allow SSH connections across nodes, you may use the DRYRUN variable.

$ export DRYRUN=1

When this variable is set, the installation commands will not install anything, but only show you all the commands that you should run, in the right sequence.

For example, if you want to validate the cluster without SSH communication between nodes, a DRYRUN command will tell you the list of instructions to run and on which hosts to run them:

$ DRYRUN=1 ./cookbook/validate_cluster
# Performing validation check ...
...

Using the instructions so received, you can copy the software to each node, and run the appropriate command in each one.

The same goes for every installation command. Should you need to install a star topology node by node with custom options, just run:

$ DRYRUN=1 ./cookbook/install_star

When you don't need DRYRUN anymore, remove the variable:

$ unset DRYRUN

Installing a master-slave topology

After the validation, you can launch your installation. If the topology is master/slave, the defaults are stored in cookbook/NODES_MASTER_SLAVE.sh

$ cat cookbook/NODES_MASTER_SLAVE.sh
#!/bin/bash
# (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License
# Version 1.0.5 - 2013-04-03

CURDIR=`dirname $0`
if [ -f $CURDIR/COMMON_NODES.sh ]
then
    . $CURDIR/COMMON_NODES.sh
else
    export NODE1=
    export NODE2=
    export NODE3=
    export NODE4=
fi

export ALL_NODES=($NODE1 $NODE2 $NODE3 $NODE4)
# indicate which servers will be masters, and which ones will have a slave service
# in case of all-masters topologies, these two arrays will be the same as $ALL_NODES
# These values are used for automated testing

#for master/slave replication
export MASTERS=($NODE1)
export SLAVES=($NODE2 $NODE3 $NODE4)

The only variables that should concern you here are MASTERS and SLAVES. They refer to the nodes defined in COMMON_NODES.sh. If your master is NODE1, there is no need to change anything. If your master is, say, NODE2, then change the variables as:

export MASTERS=($NODE2)
export SLAVES=($NODE1 $NODE3 $NODE4)

Make sure that you have adjusted both master and slave definitions.

$ cookbook/install_master_slave
## 1 (host: host4)
./tools/tungsten-installer \
    --master-slave \
    --master-host=host1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --datasource-port=3306 \
    --service-name=cookbook \
    --home-directory=/home/tungsten/installs/cookbook \
    --cluster-hosts=host1,host2,host3,host4 \
    --datasource-mysql-conf=/etc/my.cnf \
    --datasource-log-directory=/var/lib/mysql \
    --rmi-port=10000 \
    --thl-port=2112 \
    --start
... # A few minutes later ...
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1
cookbook  [master]  seqno:          1  - latency:   0.631 - ONLINE

# node host2
cookbook  [slave]   seqno:          1  - latency:   0.607 - ONLINE

# node host3
cookbook  [slave]   seqno:          1  - latency:   0.746 - ONLINE

# node host4
cookbook  [slave]   seqno:          1  - latency:   0.640 - ONLINE

Deployment completed
Topology      :'master_slave'
Tungsten path : /home/tungsten/installs/cookbook
Nodes         : (host1 host2 host3 host4)

After installing all the nodes, the cookbook script displays the cluster status. In this list, 'cookbook' is the name of the replication service, as defined in USER_VALUES.sh. You can change it before installing. Any name will do. Next to it you see the role (master or slave), then the 'seqno', which is the global transaction ID of your database events. Finally, the 'latency' is the difference, in seconds, between the time your transaction was recorded in the master binary logs and the time it was applied to the slave.

You can ask for such a status at any time, by calling:

$ cookbook/show_cluster

Simple replication administration

A cluster status doesn't tell you if replication is working. You may check if this is true by running:

$ cookbook/test_cluster
# --------------------------------------------------------------------------------------
# Testing cluster with installed topology 'MASTER_SLAVE'
# --------------------------------------------------------------------------------------
ok - Master host1 has at least 1 master services
# slave: host2
ok - Tables from master #1
ok - Views from master #1
ok - Records from master #1
ok - Slave host2 has at least 1 services
# slave: host3
ok - Tables from master #1
ok - Views from master #1
ok - Records from master #1
ok - Slave host3 has at least 1 services
# slave: host4
ok - Tables from master #1
ok - Views from master #1
ok - Records from master #1
ok - Slave host4 has at least 1 services
1..13

This command creates a table and a view in each master in your topology (in this case, a master/slave topology has only one master), inserts a record using the view, and then checks that each slave has replicated what was inserted. The output changes quite a lot when using a multi-master topology.

Astute readers will recognize that the output produced here complies with the Test Anything Protocol (TAP). If you have the 'prove' tool installed in your server, you may try it:

$ prove cookbook/test_cluster
cookbook/test_cluster...ok
All tests successful.
Files=1, Tests=13,  4 wallclock secs ( 3.17 cusr +  0.26 csys =  3.43 CPU)

Replication tools

The cluster status shown above (cookbook/show_cluster) uses the output of the Tungsten built-in tool trepctl to display a simplified status.

The tool is available inside the installation directory. If you have used the defaults, it is $HOME/installs/cookbook. And the tools are in $HOME/installs/cookbook/tungsten/tungsten-replicator/bin/.

This is not easy to remember, and even if you can remember it correctly, it requires a lot of typing. The cookbook provides an easy shortcut: cookbook/trepctl. For example:

$ cookbook/trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 17
appliedLatency  : 0.773
role            : slave
serviceName     : cookbook
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

Or, if you want the simplified output:

$ cookbook/trepctl services | cookbook/simple_services
cookbook  [slave]   seqno:         17  - latency:   0.773 - ONLINE

To administer the system properly, you need to know the tools, and some paths to the logs and the configuration files, which are somewhat elusive. Again, the cookbook comes to the rescue:

$ cookbook/paths
     replicator : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/replicator
        trepctl : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl
            thl : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/thl
            log : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/log/trepsvc.log
           conf : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/conf/
        thl-dir : (service: cookbook) /home/tungsten/installs/cookbook/thl/cookbook
     backup-dir : (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
   backup-agent : (service: cookbook) mysqldump

This command tells you the path to the three main tools:

  • trepctl Tungsten Replicator Control
  • thl or the Transaction History Log manager
  • replicator, which is the launcher for the replicator daemon.

You also get the path to the most common places you may need to access during your administrative tasks.

Similarly, there are shortcuts to perform common tasks:

  • cookbook/replicator: Shortcut to the 'replicator' command
  • cookbook/trepctl: Shortcut to the 'trepctl' command
  • cookbook/thl: Shortcut to the 'thl' command
  • cookbook/conf: Shows the configuration files using 'less'
  • cookbook/show_conf: Same as 'conf'
  • cookbook/edit_conf: Edits the configuration files using 'vim'
  • cookbook/vimconf: Same as 'edit_conf'
  • cookbook/emacsconf: Edits the configuration files using 'emacs'
  • cookbook/log: Shows the replicator log using 'less'
  • cookbook/show_log: Same as 'log'
  • cookbook/vilog: Edits the replicator log using 'vi'
  • cookbook/vimlog: Edits the replicator log using 'vim'
  • cookbook/emacslog: Edits the replicator log using 'emacs'
  • cookbook/heartbeat: Performs a heartbeat in each master
  • cookbook/paths: Shows the path to all important tools and services
  • cookbook/services: Performs 'trepctl services'
  • cookbook/backups: Shows which backups were taken in all nodes

You can get all the above commands, and all the others included in the cookbook, by calling:

$ cookbook/help

Uninstalling Tungsten

The cookbook makes it easy to install a replication cluster, and makes it easy to remove it as well.

If you look at the end of cookbook/USER_VALUES.sh, you will see these variables:

$ tail -n 12 cookbook/USER_VALUES.sh
##############################################################################
# Variables used when removing the cluster
# Each variable defines an action during the cleanup
##############################################################################
[ -z "$STOP_REPLICATORS" ]            && export STOP_REPLICATORS=1
[ -z "$REMOVE_TUNGSTEN_BASE" ]        && export REMOVE_TUNGSTEN_BASE=1
[ -z "$REMOVE_SERVICE_SCHEMA" ]       && export REMOVE_SERVICE_SCHEMA=1
[ -z "$REMOVE_TEST_SCHEMAS" ]         && export REMOVE_TEST_SCHEMAS=1
[ -z "$REMOVE_DATABASE_CONTENTS" ]    && export REMOVE_DATABASE_CONTENTS=0
[ -z "$CLEAN_NODE_DATABASE_SERVER" ]  && export CLEAN_NODE_DATABASE_SERVER=1
##############################################################################

The names are self-explanatory. These variables are used when you call the clear_cluster command, where their meaning becomes even clearer:

$ cookbook/clear_cluster
--------------------------------------------------------------------------------------
Clearing up cluster with installed topology 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
!!! WARNING !!!
--------------------------------------------------------------------------------------
'clear-cluster' is a potentially damaging operation.
This command will do all the following:
* Stop the replication software in all servers. [$STOP_REPLICATORS]
* REMOVE ALL THE CONTENTS from /home/tungsten/installs/cookbook/.[$REMOVE_TUNGSTEN_BASE]
* REMOVE the tungsten_<service_name> schemas in all nodes (host1 host2 host3 host4) [$REMOVE_SERVICE_SCHEMA]
* REMOVE the schemas created for testing (test, evaluator) in all nodes (host1 host2 host3 host4)  [$REMOVE_TEST_SCHEMAS]
* Create the test server anew;                [$CLEAN_NODE_DATABASE_SERVER]
* Unset the read_only variable;               [$CLEAN_NODE_DATABASE_SERVER]
* Set the binlog format to MIXED;             [$CLEAN_NODE_DATABASE_SERVER]
* Reset the master (removes all binary logs); [$CLEAN_NODE_DATABASE_SERVER]
If this is what you want, either set the variable I_WANT_TO_UNINSTALL
or answer 'y' to the question below
You may also set the variables in brackets to fine tune the execution.
Alternatively, have a look at cookbook/clear_cluster and customize it to your needs.
--------------------------------------------------------------------------------------
Do you wish to uninstall this cluster? [y/n]

As you can see, for each action there is a corresponding variable. By default, all variables are active, except 'REMOVE_DATABASE_CONTENTS'. Setting or unsetting these variables will determine how much of your installation you want to undo.
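
Since each variable is exported only when it is empty (note the [ -z ... ] guards), you can override any of them from the environment before calling the command. A hypothetical run that keeps the test schemas and skips the confirmation prompt could look like this:

# Keep the test schemas, remove everything else, and skip the prompt.
# I_WANT_TO_UNINSTALL is the variable mentioned in the warning above.
export REMOVE_TEST_SCHEMAS=0
export I_WANT_TO_UNINSTALL=1
cookbook/clear_cluster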

Getting replication status

Once you have replication up and running, you need to know what's going on at a glance. We have seen in the previous sections that we can call trepctl services to get an overview of the replication process. Using the same tool, we can also get more detailed information:

$ cookbook/trepctl status | nl
 1 Processing status command...
 2 NAME                     VALUE
 3 ----                     -----
 4 appliedLastEventId     : mysql-bin.000006:0000000000003163;0
 5 appliedLastSeqno       : 17
 6 appliedLatency         : 0.773
 7 channels               : 1
 8 clusterName            : default
 9 currentEventId         : NONE
10 currentTimeMillis      : 1365193975129
11 dataServerHost         : host4
12 extensions             :
13 latestEpochNumber      : 0
14 masterConnectUri       : thl://host1:2112/
15 masterListenUri        : thl://host4:2112/
16 maximumStoredSeqNo     : 17
17 minimumStoredSeqNo     : 0
18 offlineRequests        : NONE
19 pendingError           : NONE
20 pendingErrorCode       : NONE
21 pendingErrorEventId    : NONE
22 pendingErrorSeqno      : -1
23 pendingExceptionMessage: NONE
24 pipelineSource         : thl://host1:2112/
25 relativeLatency        : 22729.129
26 resourcePrecedence     : 99
27 rmiPort                : 10000
28 role                   : slave
29 seqnoType              : java.lang.Long
30 serviceName            : cookbook
31 serviceType            : local
32 simpleServiceName      : cookbook
33 siteName               : default
34 sourceId               : host4
35 state                  : ONLINE
36 timeInStateSeconds     : 23640.125
37 uptimeSeconds          : 23640.723
38 version                : Tungsten Replicator 2.0.8 build 136
39 Finished status command...

With the line numbers as a reference, we can point out quite a bit of useful information:

  • appliedLastEventId: (4) This is the event ID as found in the source database master. Since we are replicating from a MySQL server (don't forget that Tungsten can replicate from and to several heterogeneous servers), this ID is made of the binary log file name (mysql-bin.000006) and the binary log position (0000000000003163). Most of the time you don't really need this information, as everything in Tungsten uses the Global Transaction ID (see next item).
  • appliedLastSeqno: (5) This is the Global Transaction ID of the current transaction.
  • appliedLatency: (6) This is the time difference, in seconds, between the moment the transaction was written to the binary log in the master and the moment it was applied in the slave. Notice that, if the server system times are not synchronized, you may see greater differences than expected. Also, if you keep a slave offline and reconnect it later, this value will increase accordingly.
  • channels: (7) The number of threads used for replication. By default it is 1; when using parallel replication, it increases.
  • dataServerHost: (11) The server for which we are showing the status.
  • masterConnectUri: (14) The address (hostname or IP, plus port) of the current master for this service.
  • masterListenUri: (15) The address that will be used by the current server if it becomes a master.
  • pendingErrorSeqno: (22) When any of the pendingError* lines (19 to 21) are in use, this line shows the seqno (Global Transaction ID) of the event that is causing trouble. This piece of information is vital for finding what is blocking the system. (We will see an example later in this article.)
  • role: (28) The role of this service: either 'master' or 'slave'. More roles are possible if the replicator is embedded in a more complex system.
  • serviceName: (30) The identification of the replication service. Not very important when using a master/slave topology, but vital when deploying multi-master services.
  • state: (35) What the replicator is doing. If "ONLINE," all is well. "OFFLINE:NORMAL" means that the service was stopped manually, while "OFFLINE:ERROR" means that something is wrong. "GOING-ONLINE:SYNCHRONIZING" means that either there is a connection issue between master and slave, or the master is offline.
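
Because the output is plain NAME/VALUE text, it lends itself to quick monitoring scripts. Here is a minimal, assumption-laden sketch (not part of the cookbook) that warns when the applied latency of a node goes above a threshold:

#!/bin/bash
# latency_check_sketch: warn when applied latency exceeds a threshold
# (in seconds). Built on the 'trepctl status' output shown above.
THRESHOLD=${1:-60}
latency=$(cookbook/trepctl status | awk -F':' '/^appliedLatency/ { gsub(/[ \t]/, "", $2); print $2 }')
if awk -v l="$latency" -v t="$THRESHOLD" 'BEGIN { exit !(l > t) }'; then
    echo "WARNING: applied latency ${latency}s exceeds ${THRESHOLD}s"
fi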

This command is the first step whenever you are troubleshooting a problem. If something goes wrong, chances are that 'cookbook/trepctl status' will tell you what is going on. Notice, though, that if you are using a multi-master topology, you will need to specify a service:

$ cookbook/trepctl -service somename status

It's quite important to understand that trepctl can give you the status of any node in the replication cluster. You don't need to execute the command on another node. All you need to do is tell trepctl for which host it should display the status:

$ cookbook/trepctl -host host1 -service somename status

'trepctl' has quite a lot of options, as you may discover if you run 'trepctl help'. We will see some of them in this series of articles.

Logs

The second step of troubleshooting, when 'trepctl status' was not enough to nail the problem, is looking at the logs.

Here, the problem you will face is "where the heck do I find the logs?"

As we have seen above in this article, the cookbook can show you the paths:

$ cookbook/paths
     replicator : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/replicator
        trepctl : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl
            thl : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/thl
            log : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/log/trepsvc.log
           conf : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/conf/
        thl-dir : (service: cookbook) /home/tungsten/installs/cookbook/thl/cookbook
     backup-dir : (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
   backup-agent : (service: cookbook) mysqldump

However, there is a simpler way. You can use one of the shortcuts to access the logs. For example, cookbook/log will show the log using 'less,' the well-known file viewer. Should you want to use another tool for this task, there is a wide choice:

  • cookbook/show_log: Same as 'log'
  • cookbook/vilog: Edits the replicator log using 'vi'
  • cookbook/vimlog: Edits the replicator log using 'vim'
  • cookbook/emacslog: Edits the replicator log using 'emacs'

Inside the log, when you are troubleshooting, you should first try to find the same message displayed by 'trepctl status.' Around that point, you will find one or more Java stack traces, which contain information useful for the developers (file names and line numbers) and information useful for you (error messages as reported by the database server, the operating system, or third-party tools, which may help identify the problem).

Reading events

Most often, when a problem has been identified, you need to know which event is causing it. Usually, a look at the SQL, combined with the error message, gives you enough information to fix the problem.

The replication events (or transactions) are stored in several Transaction History Log (THL) files. These files contain the events, as taken from the binary logs, plus some metadata. Unlike the binary logs, though, the THL file names are totally unimportant: since transactions are identified by number, you don't need to know their location.

To display a THL event, you use a tool named, most aptly, 'thl.' For example, after we run this query:

mysql --host=host1 test -e "insert into v1 values (2,'inserted by node #1')"

We can check the status with:

$ cookbook/trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 24
appliedLatency  : 0.563
role            : slave
serviceName     : cookbook
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

And then retrieve the event using the thl tool:

$ cookbook/thl list -seqno 24
SEQ# = 24 / FRAG# = 0 (last frag)
- TIME = 2013-04-05 23:32:18.0
- EPOCH# = 18
- EVENTID = mysql-bin.000006:0000000000004417;0
- SOURCEID = host1
- METADATA = [mysql_server_id=10;dbms_type=mysql;service=cookbook;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 1, foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, collation_connection = 8, collation_server = 8]
- SCHEMA = test
- SQL(0) = insert into v1 values (2,'inserted by node #1') /* ___SERVICE___ = [cookbook] */

There is much metadata in this event, most of which is easily recognizable by any seasoned DBA. Some things worth pointing out are:

  • SEQ#: The sequence number, also known as seqno or Global Transaction ID.
  • EVENTID: We have seen this when we described 'trepctl status'.
  • SOURCEID: The server where the event was generated.
  • service: The service where the event was generated. This also tells us that the master role for this service is on host1.
  • shard: How Tungsten defines shards for parallel replication and conflict resolution. By default, a shard matches a database schema, although it can be defined otherwise.
  • SQL: The statement being executed. When the transaction contains more than one statement, you will see SQL(1), SQL(2), and so on. If the event was row-based, you will see a list of columns and their contents instead of a SQL statement.
  • ___SERVICE___ = [cookbook]: This comment is added by the replicator to make the event recognizable even after it goes to the binary log and gets replicated to a further level. This is not the only method used to mark events: the service identification can go in other places, such as the "comment" field of a "CREATE TABLE" statement.
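
When you need more context around a given transaction, thl can also list a range of events. Assuming your version supports the -low and -high options, something like this shows the five events leading up to the one above:

$ cookbook/thl list -low 20 -high 24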

Skipping transactions

One of the most common replication problems is a duplicate key violation, which in turn often occurs when someone erroneously writes to a slave instead of the master. When such an error happens, you may find something like this:

$ cookbook/trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
channels               : -1
clusterName            : default
currentEventId         : NONE
currentTimeMillis      : 1365199283287
dataServerHost         : host4
extensions             :
latestEpochNumber      : -1
masterConnectUri       : thl://host1:2112/
masterListenUri        : thl://host4:2112/
maximumStoredSeqNo     : -1
minimumStoredSeqNo     : -1
offlineRequests        : NONE
pendingError           : Event application failed: seqno=25 fragno=0 message=java.sql.SQLException:
Statement failed on slave but succeeded on master
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000006:0000000000004622;0
pendingErrorSeqno      : 25
pendingExceptionMessage: java.sql.SQLException: Statement failed on slave but succeeded on master
                         insert into v1 values (3,'inserted by node #1') /* ___SERVICE___ = [cookbook] */
pipelineSource         : UNKNOWN
relativeLatency        : -1.0
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : cookbook
serviceType            : unknown
simpleServiceName      : cookbook
siteName               : default
sourceId               : host4
state                  : OFFLINE:ERROR
timeInStateSeconds     : 8.749
uptimeSeconds          : 28948.881
version                : Tungsten Replicator 2.0.8 build 136

Looking at the logs, we may see something like this:

INFO   | jvm 1    | 2013/04/06 00:01:14 | 2013-04-06 00:01:14,529 [cookbook - q-to-dbms-0] ERROR pipeline.SingleThreadStageTask
Event application failed: seqno=25 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2013/04/06 00:01:14 | com.continuent.tungsten.replicator.applier.ApplierException: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:183)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:1321)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.java:101)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleThreadStageTask.java:639)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:468)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:167)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at java.lang.Thread.run(Unknown Source)
INFO   | jvm 1    | 2013/04/06 00:01:14 | Caused by: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:140)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       ... 6 more
INFO   | jvm 1    | 2013/04/06 00:01:14 | Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '3' for key 'PRIMARY'
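
Before deciding how to fix an error like this, it is worth comparing the conflicting data on the master and on the failing slave. A quick sketch, using the view and hosts from this example:

# Compare the contents of the test view on the master (host1)
# and on the failing slave (host4).
for host in host1 host4; do
    echo "# $host"
    mysql -h $host test -e 'select * from v1'
done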

After inspecting the tables in all nodes, we find that host4 already contains a record with Primary Key=3, and that it has the same contents as the record coming from host1. In this case, the easiest way of fixing the error is telling the replicator to skip this event:

$ cookbook/trepctl online -skip-seqno 25

After this, the replicator goes online and, provided that there are no other errors after the first one, it will continue replicating.
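
To confirm that the slave is applying events again, the simplified status we used earlier does the job (the seqno and latency shown here are only illustrative):

$ cookbook/trepctl services | cookbook/simple_services
cookbook  [slave]   seqno:         26  - latency:   0.412 - ONLINE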

Taking over existing replication

In the first sections of this article, we saw how to install Tungsten Replicator as the primary source of replication. We assumed that the servers had the same contents and that no replication was already going on. Here we assume, instead, that replication is already in place, and we show the steps needed to take over.

To simulate the initial status, we're going to clear the cluster installed before, install native MySQL replication instead, and take over from there.

There is a recipe to install standard replication, just for this purpose.

$ cookbook/install_standard_mysql_replication
Starting slave on host2 Master File = mysql-bin.000005, Master Position = 106
Starting slave on host3 Master File = mysql-bin.000005, Master Position = 106
Starting slave on host4 Master File = mysql-bin.000005, Master Position = 106
# master  host1
mysql-bin.000005    554
#slave host2
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 554
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 554
replication test: ok

#slave host3
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 554
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 554
replication test: ok

#slave host4
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 554
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 554
replication test: ok

The installation also provides a simple test that checks whether replication is running, by creating a table named 't1' and retrieving it on the slaves. As you can see, after the test the slaves are at position 554 of binary log #000005. If we create another table, we can check that it is replicated and take note again of the binlog position:

$ mysql -h host1 -e 'create table test.test_standard(i int)'

$ for host in host1 host2 host3 host4; do mysql -h $host -e 'show tables from test' ; done
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| test_standard  |
+----------------+
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| test_standard  |
+----------------+
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| test_standard  |
+----------------+
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| test_standard  |
+----------------+
$ for host in  host2 host3 host4; do mysql -h $host -e 'show slave status\G' | grep 'Master_Log_File\|Read_Master_Log_Pos\|Running' ; done
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 651
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 651
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 651
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

So, replication is running. We can try our take-over script, which the cookbook provides:

$ cookbook/take-over
./tools/tungsten-installer
    --master-slave
    --master-host=host1
    --datasource-user=tungsten
    --datasource-password=secret
    --datasource-port=3306
    --service-name=cookbook
    --home-directory=/home/tungsten/installs/cookbook
    --cluster-hosts=host1,host2,host3,host4
    --datasource-mysql-conf=/etc/my.cnf
    --datasource-log-directory=/var/lib/mysql
    --rmi-port=10000
    --thl-port=2112 -a
    --auto-enable=false
    --start

$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl -port 10000 -host host1 online -from-event mysql-bin.000005:651
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl -port 10000 -host host2 online
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl -port 10000 -host host3 online
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl -port 10000 -host host4 online
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1
cookbook  [master]  seqno:          5  - latency:   0.556 - ONLINE

# node host2
cookbook  [slave]   seqno:          5  - latency:   0.663 - ONLINE

# node host3
cookbook  [slave]   seqno:          5  - latency:   0.690 - ONLINE

# node host4
cookbook  [slave]   seqno:          5  - latency:   0.595 - ONLINE

What happens here?

The first notable thing is that we install the replicator with the option --auto-enable set to false. With this option, the replicator starts, but stays OFFLINE. After that, the take-over script stops replication in all servers, retrieves the latest binlog position, and tells the master replicator to go ONLINE starting from that event ID. This is one of the few cases where we can't use a global transaction ID, because it does not exist yet!

Next, all the slaves go online. There is no need to tell them at which event they should start, because they will simply get the events from the master.
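
The "latest binlog position" is obtained with standard MySQL commands. A minimal sketch of what the script does at that step, using the hosts from this example:

# Stop native replication on the slaves, then read the master's binlog
# coordinates; File:Position becomes the -from-event argument shown above.
for host in host2 host3 host4; do
    mysql -h $host -e 'STOP SLAVE'
done
mysql -h host1 -e 'SHOW MASTER STATUS'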



Friday, November 04, 2011

Replication stars

Working with replication, you come across many topologies: some of them sound and established, some of them less so, and some of them still in the realm of hopeless wishes. I have been working with replication for almost 10 years now, and my wish list grew quite big during this time. In the last 12 months, though, while working at Continuent, some of the topologies that I wanted to work with have moved from the cloud of wishful thinking to the firm land of things that happen. My quest for star replication starts with the most common topology: one master, many slaves.

Fig 1. Master/Slave topology


Legend

It looks like a star, with the rays extending from the master to the slaves. This is the basis of most of the replication going on almost everywhere nowadays, and it has few surprises. Setting aside the problems related to failing over and switching between nodes, which I will examine in another post, let's move on to another star.

Fig 2. Fan-in slave, or multiple sources

Multiple source replication, also known as a fan-in topology, has several masters that replicate to the same slave. For years, this has been forbidden territory for me, but Tungsten Replicator allows you to create multiple source topologies easily. This is kind of unidirectional, though. I am also interested in topologies where I have more than one master, and where I can retrieve data from multiple points.

Fig 3. All-to-all three nodes


Fig 4. All-to-all four nodes

Tungsten Multi-Master Installation solves this problem. It allows me to create topologies where every node replicates to every other node. Looking at the three-node scheme, it appears a straightforward solution. When we add one node, though, we see that the amount of network traffic grows quite a lot. The double-sided arrows mean that there is a replication service at each end of the line, and two open data channels. When we move from three nodes to four, we double the replication services and the channels needed to sustain the scheme.

For several months, I was content with this. I thought: it is heavy, but it works, and it's way more than what you can do with native replication, especially if you consider that you can have a practical way of preventing conflicts using Shard Filters. But that was not enough. Something kept gnawing at me, and from time to time I experimented with Tungsten Replicator's huge flexibility to create new topologies. But the star kept eluding me. Until, guess what? A customer asked for it.

The problem suddenly ceased to be a personal whim, and it became a business opportunity. Instead of looking at the issue in the idle way I often think about technology, I went at it with practical determination. What failed when I was experimenting in my free time was that either the pieces did not glue together the way I wanted, or I got an endless loop. Tungsten Replicator has a set of components that are conceptually simple. You deploy a pipeline between two points, open the tap, and data starts flowing in one direction. Even with multi-master replication, the principle is the same. You deploy many pipes, and each one has one purpose only.

Fig 5. All-masters star topology

In the star topology, however, you need to open more taps, but not too many, as you need to avoid the data looping around. The recipe, as it turned out, is to create a set of bi-directional replication systems, where you enable the central node's slave services to get changes only from a specific master, and the slave services on the peripheral nodes to accept changes from any master. It was as simple as that.

There are, of course, benefits and drawbacks with a star topology, compared to an all-replicate-to-all design. In the star topology, we create a single point of failure: if the central node fails, replication stops, and the central node needs to be replaced. The all-to-all design, instead, has no such weakness. Its abundance of connections makes sure that, if a node fails, the system continues working without any intervention. There is no need for fail-over.

Fig 6. Extending an all-to-all topology


Fig 7. Extending a star topology

However, there is a huge benefit in node management. If you need to add a new node, it costs two services and two connections, while the same operation in the all-to-all topology costs 8 services and 8 connections.

With the implementation of this topology, a new challenge has arisen. While conflict prevention by sharding is still possible, this is not the kind of scenario where you want to apply it. We have another conflict prevention mechanism in mind, and this new topology is a good occasion to make it happen.

YMMV. I like the additional choice. There are cases where an all-replicate-to-all topology is still the best option, and there are cases where a star topology is more advisable.

Friday, September 16, 2011

Quick recipes for database cluster building

One lesson learned in more than two decades working in this industry is that most IT professionals are impatient, want to achieve results immediately, and, most importantly, don't read documentation. Much as the average geek is happy to answer many requests with a dismissive RTFM, the same geeks are not as diligent when it comes to learning about new or updated technologies.

For this reason, there is a kind of documentation that is very much appreciated by busy and impatient professionals: cookbooks. And I am not talking about food. Geeks are not known for being cooks (1), and they like fast food. I am talking about collections of technical recipes: short articles where a problem is briefly stated and a direct solution is shown.

Working with Tungsten Replicator, I am constantly amazed at all the things you can do with it, and at the same time I am amazed at how few people read the documentation. Since I want more users to be aware of the goodies, and being aware of the geeks' aversion to regular docs, I have started putting together a Tungsten Replicator Cookbook, where users can quickly find the recipe to build their cluster of choice. The problem is stated in one short paragraph, and the solution is outlined with code in the shortest possible way. Can't get any lazier than this! Or maybe you can, but I haven't reached that level yet. I hope I have found a good balance. Some of the recipes that the cookbook offers are:
  • Install a master / slave cluster
  • Install a master / slave cluster with customized parameters
  • Install more than one Tungsten Replicator in one host
  • Install a direct slave with parallel replication
  • Taking over replication from a MySQL slave in direct mode
  • Install bi-directional replication
  • Install bi-directional replication with additional slave
  • Install a three masters replication
  • Install a four masters replication
  • Modify one or more properties with the installer
  • Add one slave to an existing master
In addition to the cookbook, we have inaugurated a sample of another popular literary genre, namely troubleshooting recipes. When things go wrong (and they usually do when you are dealing with something new), you want a quick answer to your problem. These troubleshooting items are aimed at making such a quick answer readily available.

Both projects are moving targets. We will adjust as the projects grow. Contributions and comments are welcome. If you have suggestions on how to improve these documents, you can use the mailing list.

(1) With some notable exceptions, I must say. I have a reputation as a good cook. But then, I represent a minority in so many ways.