Almost Perfect
Service Discovery and Failover
with ProxySQL and Orchestrator
by Jean-François Gagné
and Art van Scheppingen
Presented at Percona Live Online, May 2021
Jean-François Gagné
System and MySQL Expert at HubSpot
jgagne AT hubspot DOT com / @jfg956
MySQL Service Discovery at MessageBird
Summary of part #1
• MySQL Primary High Availability
• Failover to a Replica War Story
• MySQL at MessageBird (Percona Server)
• MySQL Service Discovery at MessageBird (ProxySQL)
• Orchestrator integration and the Failover Process
MySQL Primary High Availability [1 of 4]
Failing-over the primary to a replica is my favorite high availability method
• But it is not as easy as it sounds, and it is hard to automate well
• An example of a complete failover solution in production:
https://github.blog/2018-06-20-mysql-high-availability-at-github/
The five considerations for primary high availability:
(https://jfg-mysql.blogspot.com/2019/02/mysql-master-high-availability-and-failover-more-thoughts.html)
• Plan how you are doing primary high availability
• Decide when you apply your plan (Failure Detection – FD)
• Tell the application about the primary change (Service Discovery – SD)
• Protect against the limits of FD and SD to avoid split-brains (Fencing)
• Fix your data if something goes wrong
MySQL Primary High Availability [2 of 4]
Failure detection (FD) is the 1st part (and 1st challenge) of failing-over
• It is a very hard problem: partial failure, unreliable network, partitions, …
• It is impossible to be 100% sure of a failure, and confidence needs time
→ quick FD is unreliable, relatively reliable FD implies longer downtime
➢ Quick FD for short downtime generates false positives
Repointing is the 2nd part of failing-over to a replica:
• Relatively easy with the right tools: GTID, Pseudo-GTID, Binlog Servers, …
• Complexity grows with the number of direct replicas of the primary
• Some software for repointing:
• Orchestrator, Ripple Binlog Server, Replication Manager, MHA, Cluster Control, MaxScale
MySQL Primary High Availability [3 of 4]
What do I mean by repointing:
• In the configuration below, when the primary fails,
once one of the replicas has been chosen as the new primary,
the other replica needs to be re-sourced (re-slaved) to the new primary
MySQL Primary High Availability [4 of 4]
Service Discovery (SD) is the 3rd part (and 2nd challenge) of failover:
• If centralized → SPOF; if distributed → impossible to update atomically
• SD will either introduce a bottleneck (including performance limits)
• Or it will be unreliable in some way (pointing to the wrong primary)
• Some ways to implement Service Discovery: DNS, VIP, Proxy, Zookeeper, …
http://code.openark.org/blog/mysql/mysql-master-discovery-methods-part-1-dns
➢ Unreliable FD and unreliable SD are a recipe for split-brains!
Protecting against split-brains (Fencing): an advanced subject – not many solutions
(proxies and semi-synchronous replication might help)
Fixing your data in case of a split-brain: only you can know how to do this!
(tip on this in the war story)
Failover War Story [1 of 6]
Some infrastructure context:
• Service Discovery is DNS (and failure detector is Orchestrator)
• The databases are behind a firewall in two data centers
• And we have a failure of the firewall in the zone of the primary
Failover War Story [2 of 6]
Things went as planned: failed-over from Zone1 to Zone2
• New primary in zone 2: stop replication, set it read-write, update DNS, …
• Everything was ok… until the firewall came back up
Failover War Story [3 of 6]
Once the firewall came back up, there were no detectable problems
• But some intuition made me check the binary logs of the old primary
• And I found new transactions with timestamps after the firewall recovery
(and obviously this is after the failover to zone 2)
Failover War Story [4 of 6]
These new transactions are problematic:
• They are in the databases in zone 1, but not in zone 2
• They share auto-increment values with data in zone 2
• Luckily, there are only a few such transactions, so they are easy to fix, but what happened?
Failover War Story [5 of 6]
The infrastructure is a little more complicated than initially presented:
• There are web servers and local DNS behind the firewalls (fw)
• The DNS update of the failover did not reach zone 1 (because of the fw failure)
• When the firewall came back up, the web servers received traffic,
and because the local DNS was not yet updated, they wrote to the old primary
• Once the DNS was updated (a few seconds later), writes went to the new primary in zone 2
Failover War Story [6 of 6]
This war story shows a decentralized Service Discovery causing problems
Remember that it is not a matter of “if” but “when” things will go wrong
Please share your war stories so we can learn from each other’s experiences
• GitHub published a public MySQL post-mortem (great of them to share this):
https://blog.github.com/2018-10-30-oct21-post-incident-analysis/
• I also have another MySQL Primary Failover war story in another talk:
https://www.usenix.org/conference/srecon19emea/presentation/gagne
Tip for easier data reconciliation: use UUIDs instead of auto-increments
• But store UUIDs in an optimized way (in primary-key order)
https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/
http://mysql.rjweb.org/doc.php/uuid
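To make this tip concrete, here is a minimal Python sketch (standard library only) of the byte reordering those two posts describe for time-based (version 1) UUIDs: putting the high time bits first makes values roughly monotonic, so BINARY(16) primary keys stay close to insert order; MySQL 8.0's UUID_TO_BIN(uuid, 1) performs the same swap.

import uuid

def uuid1_to_ordered_bytes(u: uuid.UUID) -> bytes:
    # hex layout of a v1 UUID: time_low (8) + time_mid (4) + time_hi (4) + rest (16)
    h = u.hex
    # reorder to time_hi + time_mid + time_low so the value grows with time,
    # which keeps inserts at the right-hand side of the primary key
    return bytes.fromhex(h[12:16] + h[8:12] + h[0:8] + h[16:32])

def ordered_bytes_to_uuid1(b: bytes) -> uuid.UUID:
    h = b.hex()
    return uuid.UUID(h[8:16] + h[4:8] + h[0:4] + h[16:32])

u = uuid.uuid1()
assert ordered_bytes_to_uuid1(uuid1_to_ordered_bytes(u)) == u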
MySQL at MessageBird [1 of 2]
MessageBird is using MySQL 5.7 (more precisely, Percona Server)
The databases are hosted in many Google Cloud regions
There are three types of MySQL deployments
1. Multi-region primary:
replicas in many regions, and the primary potentially in any of several regions
2. Single-region primary with replicas in many regions
3. Primary and replicas all in a single region
MySQL at MessageBird [2 of 2]
This is a multi-region primary (regions are color-coded)
MySQL Service Discovery at MessageBird [1 / 12]
Requirements for MySQL Service Discovery:
• Being able to route traffic to local replicas
• Embed some sort of fencing mechanism
This led to a multi-layer solution using ProxySQL:
• Three layers for multi-region primary:
1. Collect
2. Master-Gateway (mgw)
3. Fencing
For single-region, mgw and fencing are merged into local-fencing (locfen)
MySQL Service Discovery at MessageBird [2 / 12]
MySQL Service Discovery at MessageBird [3 / 12]
The collect layer is a standard entry-point design
The fencing layer is a natural HA way to route traffic to the primary (a single node would not be HA)
The master-gateway layer is the glue between collect and fencing (more about this later in the talk)
The local-fencing layer is the mgw and fencing layers merged for single-region primary databases, because routing to a single region does not need the mgw glue (three nodes for N+2 HA, more about this later in the talk)
MySQL Service Discovery at MessageBird [4 / 12]
The collect layer is the entry point of the MySQL Service Discovery:
• It starts with a load-balancer sending traffic to ProxySQL
• We have at least 3 instances of ProxySQL for N+2 high availability
• From here, read-only traffic is sent directly to replicas
• Primary traffic (read-write) is sent to mgw or locfen
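A minimal sketch of how such a read/write split can be expressed on a collect node, assuming the pymysql client and default ProxySQL admin credentials on port 6032; the hostgroup ids (10 for primary traffic towards mgw/locfen, 20 for local replicas) and user names are illustrative, not the actual MessageBird configuration:

import pymysql

admin = pymysql.connect(host="127.0.0.1", port=6032, user="admin", password="admin")
cur = admin.cursor()
# the read-write application user lands on the writer hostgroup
cur.execute("INSERT INTO mysql_users (username, password, default_hostgroup) "
            "VALUES ('app_rw', 'secret', 10)")
# the read-only application user lands directly on the replica hostgroup
cur.execute("INSERT INTO mysql_users (username, password, default_hostgroup) "
            "VALUES ('app_ro', 'secret', 20)")
cur.execute("LOAD MYSQL USERS TO RUNTIME")
cur.execute("SAVE MYSQL USERS TO DISK")
admin.close()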
MySQL Service Discovery at MessageBird [5 / 12]
Routing from collect to locfen is either local or crosses a region boundary
For mgw, it is biased to local when a mgw is in the same region as collect
• ProxySQL routing is weight-based (no easy fallback routing)
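That weight-based biasing can be sketched as below: ProxySQL picks a backend of a hostgroup proportionally to its weight among ONLINE servers, which is why there is a heavy weight on the local node rather than a "fallback-only" rule (hostnames, weights, and hostgroup id are illustrative):

import pymysql

admin = pymysql.connect(host="127.0.0.1", port=6032, user="admin", password="admin")
cur = admin.cursor()
# heavy weight on the local mgw, token weights on the remote ones
for host, weight in (("mgw-local", 1000), ("mgw-remote-1", 1), ("mgw-remote-2", 1)):
    cur.execute("INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight) "
                "VALUES (10, %s, 3306, %s)", (host, weight))
cur.execute("LOAD MYSQL SERVERS TO RUNTIME")
cur.execute("SAVE MYSQL SERVERS TO DISK")
admin.close()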
MySQL Service Discovery at MessageBird [6 / 12]
The master-gateway is deployed in all regions potentially hosting a primary
Just as fencing (or locfen) is kept as small as possible
to reduce the update-scope of a failover (to a single region) …
… mgw bounds the update-scope of moving the primary to another region
to a continent (in this case, the three mgw regions are in Europe)
(it avoids a planet-scale reconfiguration of collect on failover)
… and the way mgw routes traffic to fencing protects against writing to the
primary in case of network partitions
MySQL Service Discovery at MessageBird [7 / 12]
The mgw routing is as follows:
• If the primary is in a remote region, traffic is routed to fencing in that region
• If the primary is in the same region, traffic is routed to the other mgw
➢ No path to the primary avoids crossing a region boundary
MySQL Service Discovery at MessageBird [8 / 12]
No path to the primary avoids crossing a region boundary
• That might sound sub-optimal, but it is an interesting tradeoff
• It brings the best-case vs worst-case round-trip ratio to the primary closer to 1
Without crossing a region boundary:
• Best case (local access): ~1 ms round-trip to the primary
• Worst case (remote access): ~20 ms round-trip to the primary
➢ Ratio of 1 to 20
With crossing a region boundary in mgw:
• Best case (remote access): ~20 ms round-trip to the primary
• Worst case (local access): ~40 ms round-trip to the primary
• Even worse case (remote routed to the mgw of the primary): ~60 ms
➢ Ratio of 1 to 2 (3 in the worst case)
MySQL Service Discovery at MessageBird [9 / 12]
The worst case (which could be avoided with smarter collect routing)
MySQL Service Discovery at MessageBird [10 / 12]
Favoring low round-trip variance over an optimal best case:
• With region-remote primary accesses, a high latency is unavoidable:
if 20 ms of latency is acceptable, 40 or 60 ms should not be problematic
• Moving the average closer to the median avoids problems
• And in the case where most writes are local,
it avoids surprises when the database becomes remote
• And it prevents writing to the primary in case of a network partition
MySQL Service Discovery at MessageBird [11 / 12]
And therefore, I claim this design has interesting tradeoffs
• But it might not fit everyone’s requirements
MySQL Service Discovery at MessageBird [12 / 12]
Why only two fencing nodes and three locfen nodes:
• I like N+2 high availability
• This allows a single failure to not need an immediate fix
• If a locfen node fails on Friday evening, it can wait until Monday
• A failure of one of the two fencing nodes looks problematic
• But we can failover to another region that has two healthy nodes
• And updating two ProxySQL nodes in case of a failover is easier than three
(needing three locfen nodes is something I dislike, as it makes failover more fragile)
Failover
Failing-over is a multi-step process:
1. Detecting a failure
2. Fencing the primary: setting it as OFFLINE_HARD in ProxySQL (sketched after this list)
3. Regrouping replicas under the new primary
4. Waiting for replication to catch-up on the new primary
5. Making the new primary ready: stop replication, set writable, start HB, …
6. Updating ProxySQL to point to the new primary
7. Reconfiguring fencing and master-gateway if needed
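Step 2 (fencing) can be sketched as below, assuming the pymysql client and default admin credentials; the node names and writer hostgroup id are hypothetical:

import pymysql

# hypothetical fencing/locfen nodes and writer hostgroup id
FENCING_NODES = ("locfen-1", "locfen-2", "locfen-3")
WRITER_HOSTGROUP = 10

def fence_primary(old_primary):
    for node in FENCING_NODES:
        admin = pymysql.connect(host=node, port=6032, user="admin", password="admin")
        with admin.cursor() as cur:
            # OFFLINE_HARD immediately kills in-flight connections to the
            # old primary, instead of draining them like OFFLINE_SOFT
            cur.execute("UPDATE mysql_servers SET status = 'OFFLINE_HARD' "
                        "WHERE hostgroup_id = %s AND hostname = %s",
                        (WRITER_HOSTGROUP, old_primary))
            cur.execute("LOAD MYSQL SERVERS TO RUNTIME")
        admin.close()

fence_primary("db-primary-old")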
Reconfiguring Fencing and MGW [1 of 6]
Reconfiguring Fencing and MGW [2 of 6]
Reconfiguring Fencing and MGW [3 of 6]
Reconfiguring Fencing and MGW [4 of 6]
Reconfiguring Fencing and MGW [5 of 6]
Reconfiguring Fencing and MGW [6 of 6]
Reacting to a network partition is a similar operation
Orchestrator integration
• Pre-failover hook: fence the primary in fencing (or locfen)
• Post-failover hook: update fencing (or locfen) to the new primary
• The list of ProxySQL nodes (fencing or locfen)
and the ProxySQL hostgroup are stored in each database,
which makes this information available to the Orchestrator hooks
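A minimal sketch of what such a post-failover hook could look like, assuming Orchestrator passes the failed and successor hosts via its {failedHost}/{successorHost} command-line placeholders (e.g. in "PostMasterFailoverProcesses"); the node list and hostgroup id are hard-coded here only to keep the sketch short, whereas per the bullet above they would come from the database itself:

import sys
import pymysql

def update_proxysql(node, old_primary, new_primary, hostgroup=10):
    admin = pymysql.connect(host=node, port=6032, user="admin", password="admin")
    with admin.cursor() as cur:
        # swap the writer hostgroup from the old primary to the new one
        cur.execute("DELETE FROM mysql_servers "
                    "WHERE hostgroup_id = %s AND hostname = %s",
                    (hostgroup, old_primary))
        cur.execute("INSERT INTO mysql_servers (hostgroup_id, hostname, port) "
                    "VALUES (%s, %s, 3306)", (hostgroup, new_primary))
        cur.execute("LOAD MYSQL SERVERS TO RUNTIME")
        cur.execute("SAVE MYSQL SERVERS TO DISK")
    admin.close()

if __name__ == "__main__":
    failed, successor = sys.argv[1], sys.argv[2]
    for node in ("locfen-1", "locfen-2", "locfen-3"):  # hypothetical node names
        update_proxysql(node, failed, successor)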
MySQL Service Discovery at MessageBird
Almost Perfect
Service Discovery and Failover
with ProxySQL and Orchestrator
Art van Scheppingen
Senior Database Engineer at MessageBird
art.vanscheppingen AT messagebird DOT com
How does it work so far?
It has been running successfully in production for over two years now
• Most of our workload is on single-region primary (i.e. locfen)
• We have one cluster on multi-region primary
ProxySQL has been very stable for us
• No big issues on 1.4.x, 2.0.x, and 2.1.x
How does it work so far?
Easier for devs to set up connections to the primary
• Point the connection to an IP address
• No failover handling necessary
Easier for devs to scale out reads
• Point the connection to the same IP address
• Use a different user for read-only (RO)
• We can differentiate reads (read-only, read-only replica-only, etc.)
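From the application side this might look like the sketch below (the address, user names, schema, and table are illustrative):

import pymysql

# same IP for both; only the user decides where the traffic goes
rw = pymysql.connect(host="10.0.0.5", user="app_rw", password="secret", database="app")
ro = pymysql.connect(host="10.0.0.5", user="app_ro", password="secret", database="app")

with rw.cursor() as cur:
    cur.execute("INSERT INTO events (payload) VALUES ('hello')")  # routed to the primary
rw.commit()
with ro.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM events")  # served by a replica
    print(cur.fetchone())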
Mastering multiplexing [1 of 4]
Multiplexing in ProxySQL turned out to be tricky
Without multiplexing:
• 1-to-1 number of connections between Collect and Locfen
• The application makes a connection for each shard (8 at the moment)
• The application only uses one actively
• A high number of connections means high load
With multiplexing:
• High number of connections on Collect
• Theoretically only 1/8th of the connections on Locfen
• Even lower, due to less overhead on establishing connections
Mastering multiplexing [2 of 4]
Prior to multiplexing
Mastering multiplexing [3 of 4]
We enabled multiplexing in our configuration, but not much happened:
ProxySQL disabled multiplexing due to an auto-increment
• There was a bug report about an ORM (Hibernate) that didn’t handle multiplexing well
• ProxySQL parses the OK packet for auto-increments
• Whenever one is encountered, multiplexing is affected
• mysql-auto_increment_delay_multiplex is set to 5 by default
• This means multiplexing is disabled for the next 5 queries on that connection (see the sketch below)
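A minimal sketch of inspecting, and optionally relaxing, this behaviour on the ProxySQL admin interface (pymysql and default admin credentials assumed); setting the delay to 0 is only safe if the application reads the insert id from the OK packet of the INSERT itself rather than from a later query:

import pymysql

admin = pymysql.connect(host="127.0.0.1", port=6032, user="admin", password="admin")
cur = admin.cursor()
cur.execute("SELECT variable_value FROM global_variables "
            "WHERE variable_name = 'mysql-auto_increment_delay_multiplex'")
print(cur.fetchone())  # ('5',) by default
# 0 removes the delay: an OK packet carrying an auto-increment id no longer
# pins the backend connection for the following queries
cur.execute("UPDATE global_variables SET variable_value = '0' "
            "WHERE variable_name = 'mysql-auto_increment_delay_multiplex'")
cur.execute("LOAD MYSQL VARIABLES TO RUNTIME")
cur.execute("SAVE MYSQL VARIABLES TO DISK")
admin.close()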
Mastering multiplexing [4 of 4]
After mastering multiplexing
Separation of stacks
We tried reusing collect and locfen as much as possible
• Centralized configuration (hostgroups and users)
• Less complexity
Expansion was inevitable:
• Noisy neighbors (hostgroups)
• Reducing risk
• Better tuning for certain workloads
• Easier maintenance
• Cascading effects to other hostgroups
We currently run 3 vertical stacks of collect and locfen
Separation of stacks: Noisy neighbors [1 of 2]
Above 50% CPU usage, ProxySQL will show increased latency
• Lower CPU usage with multiplexing
• Lower CPU usage with idle-threads
When a certain hostgroup in Locfen is under stress (1000x the normal workload):
• CPU usage can get above 50%
• Latency will increase on all hostgroups
• Latency will cascade upstream to the collect layer
Separation of stacks: Noisy neighbors [2 of 2]
Above 50% CPU usage ProxySQL will show increased latency
Separation of stacks: Reducing risk [1 of 4]
For us, ProxySQL scales up to about 12K connections per host
• Beyond this we hit the limits of TCP
Our Collect layer reached 7.8K connections
• If one collect host fails, two remain
• The two remaining hosts will have to absorb an additional 3.9K connections
• Very close to our 12K limit
• Replacing a failed host then becomes an emergency operation
Separation of stacks: Reducing risk [2 of 4]
ProxySQL keeps count of connection errors (MySQL + TCP, shared)
• It will shun a host if a backend becomes “less responsive” (e.g. under high load)
• This happens on hostgroups with any number of hosts
Hitting the limits of TCP:
• Errors to locfen or MGW will increase
• Locfen and MGW backends will be shunned
• Established connections will also be closed
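One way to watch those per-backend error counters, sketched with pymysql against the admin interface, is to poll ProxySQL's stats_mysql_connection_pool table (the polling interval and output handling are illustrative):

import time
import pymysql

admin = pymysql.connect(host="127.0.0.1", port=6032, user="admin", password="admin")
while True:
    with admin.cursor() as cur:
        cur.execute("SELECT hostgroup, srv_host, status, ConnOK, ConnERR "
                    "FROM stats_mysql_connection_pool ORDER BY ConnERR DESC")
        for hostgroup, host, status, conn_ok, conn_err in cur.fetchall():
            # a climbing ConnERR together with status SHUNNED is the
            # signature described above
            print(hostgroup, host, status, conn_ok, conn_err)
    time.sleep(10)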
Separation of stacks: Reducing risk [3 of 4]
Separation of stacks: Reducing risk [4 of 4]
Shunning a primary [1 of 4]
Shunning a primary for 1 second will cause another torrent of connections
• The client gets a timeout and reconnects immediately
• With no available backend, the new connection is “paused” for up to 10 seconds
• After 1 second, the primary becomes available again
• ProxySQL then has thousands of connections waiting
• Rinse and repeat…
Shunning a primary [2 of 4]
How to detect shunned hosts?
• ProxySQL will log a shunned host in the proxysql log
• This includes the server name, the error rate, and the duration of the shun
2020-06-11 12:01:39 MySQL_HostGroups_Manager.cpp:311:connect_error(): [ERROR] Shunning server x.x.x.x:3306 with 10 errors/sec. Shunning for 10 seconds
2020-06-11 12:01:44 MySQL_HostGroups_Manager.cpp:311:connect_error(): [ERROR] Shunning server x.x.x.x:3306 with 18 errors/sec. Shunning for 10 seconds
2020-06-11 12:01:49 MySQL_HostGroups_Manager.cpp:311:connect_error(): [ERROR] Shunning server x.x.x.x:3306 with 10 errors/sec. Shunning for 10 seconds
2020-06-11 12:01:54 MySQL_HostGroups_Manager.cpp:311:connect_error(): [ERROR] Shunning server x.x.x.x:3306 with 10 errors/sec. Shunning for 10 seconds
How do we get them into our graphs?
• A ProxySQL log tailer (sketched below)
• It looks for connection timeouts, shunned hosts, and shuns due to replication lag
• It exports metrics to Prometheus every minute
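A minimal sketch of such a log tailer, assuming the prometheus_client Python library and the default log path; the metric name and port are illustrative, and the other patterns (connection timeouts, replication-lag shuns) would just be more regexes here:

import re
import time
from prometheus_client import Counter, start_http_server

SHUN_RE = re.compile(r"Shunning server (\S+?):(\d+) with (\d+) errors/sec")
shun_events = Counter("proxysql_shun_events",
                      "Shun events seen in the ProxySQL log", ["server"])

start_http_server(9105)  # Prometheus scrape endpoint (illustrative port)
with open("/var/lib/proxysql/proxysql.log") as log:
    log.seek(0, 2)  # start at the end of the file, like tail -f
    while True:
        line = log.readline()
        if not line:
            time.sleep(1)
            continue
        match = SHUN_RE.search(line)
        if match:
            shun_events.labels(server=match.group(1)).inc()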
Shunning a primary [3 of 4]
Shunning a primary [4 of 4]
Shunning a primary for 1 second will cause an avalanche of connections
• Normal latency is 10 ms to 50 ms
• An added latency of 1 second decreases application throughput
• Decreased application throughput makes k8s scale up workers
• k8s scaling up means more incoming connections
• Rinse and repeat…
How we dealt with this:
• During some incidents we throttle down workers
• Counter-intuitive: throttling down workers increases throughput
• Some application workers now have a fixed ceiling
Separation of stacks: Better tuning
Most ProxySQL tuning is done at a global level
Some examples:
• mysql-connection_delay_multiplex_ms
• mysql-free_connections_pct
• mysql-wait_timeout
Having a separate stack allows:
• Fine-tuned multiplexing configuration
• Earlier or later closing of connections
• Separate handling of (end) user connections
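A minimal sketch of what per-stack tuning looks like, assuming pymysql and one admin endpoint per stack; the variable values are illustrative, not recommendations (note that ProxySQL's mysql-wait_timeout is in milliseconds):

import pymysql

def tune(admin_host, settings):
    # apply ProxySQL global variables on one stack's admin interface
    admin = pymysql.connect(host=admin_host, port=6032, user="admin", password="admin")
    with admin.cursor() as cur:
        for name, value in settings.items():
            cur.execute("UPDATE global_variables SET variable_value = %s "
                        "WHERE variable_name = %s", (value, name))
        cur.execute("LOAD MYSQL VARIABLES TO RUNTIME")
        cur.execute("SAVE MYSQL VARIABLES TO DISK")
    admin.close()

# e.g. a latency-sensitive stack keeps fewer idle connections and closes
# them sooner than the default 8 hours
tune("collect-stack1-admin", {
    "mysql-free_connections_pct": "1",
    "mysql-wait_timeout": "600000",  # 10 minutes, in ms
})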
Separation of stacks: maintenance
Maintenance on Collect is scary
• Draining gracefully from the GLB means “closing after X minutes”
• Running near capacity means maintenance itself is a risk
Maintenance on MGW/Fencing/Locfen is scary
• Draining a host takes ages
• Connections are aggressively reused by the connection pool
• The connection timeout (wait_timeout) is 8 hours
• Some applications don’t handle the closing of a connection well
Separation of stacks: cascading effect [1 of 4]
War story: instability in one cluster wiped out many others
The unstable cluster:
• MySQL back_log set too low
• TCP listen overflows on the primary
• ProxySQL started to shun the primary
The effect:
• Continuous shunning
• TCP listen overflows started to happen on ProxySQL
• Stability of other hostgroups was affected
Separation of stacks: cascading effect [2 of 4]
Shunning of the primary causes an endless shunning loop
Separation of stacks: cascading effect [3 of 4]
Listen overflows on the ProxySQL hosts make it even worse
Separation of stacks: cascading effect [4 of 4]
In effect, the Collect layer shuns Locfen layer hosts
Other quirks: Connection contamination
After a client connection closes, the backend connection will be reused
• Collect → Locfen
• Locfen → database
ProxySQL resets the connection to its initial connection parameters
But what if the new client connection doesn’t match those settings (e.g. UTF8mb4 charset or CET timezone)?
Will the connection between Locfen → database also change?
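One way to answer such questions empirically is a small probe, sketched below with pymysql against an illustrative endpoint: set a session parameter, reconnect, and observe whether the backend connection (identified by CONNECTION_ID()) was reused and whether its session state was reset in between:

import pymysql

def probe():
    # connect through the proxy (illustrative endpoint and credentials)
    conn = pymysql.connect(host="10.0.0.5", port=3306, user="app_rw",
                           password="secret")
    with conn.cursor() as cur:
        cur.execute("SET NAMES utf8mb4")
        cur.execute("SELECT CONNECTION_ID(), @@session.character_set_client")
        row = cur.fetchone()
    conn.close()
    return row

# if the second run returns the same backend CONNECTION_ID(), the pooled
# connection was reused; character_set_client then shows whether the
# session was really reset to its initial parameters
print(probe())
print(probe())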
Other quirks: Uneven distribution
Uneven distribution of connections/queries:
• Weight influences the distribution of connections
• Reuse of existing connections is favored by ProxySQL
• This is influenced by mysql-free_connections_pct
The variable mysql-free_connections_pct is a global variable:
• It is a percentage of the maximum allowed connections of a hostgroup
• Some hostgroups allow up to 3000 incoming connections
• 2% of 3000 is 60 connections, while actual usage is 10 to 15
• So more connections are kept open in the connection pool than necessary
Thanks!
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
by Jean-François Gagné
and Art van Scheppingen
Presented at Percona Live Online, May 2021