Wednesday, August 20, 2014

Two lessons on haproxy checks and swap space

Let's assume you want to host a Wordpress site which is not going to get a lot of traffic. You want to use EC2 for this. You still want as much fault tolerance as you can get at a decent price, so you create an Elastic Load Balancer endpoint which points to 2 (smallish) EC2 instances running haproxy, with each haproxy instance pointing in turn to 2 (not-so-smallish) EC2 instances running Wordpress (Apache + MySQL). 

You choose to run haproxy behind the ELB because it gives you more flexibility in terms of load balancing algorithms, health checks, redirections etc. Within haproxy, one of the Wordpress servers is marked as a backup for the other, so it only gets hit by haproxy when the primary one goes down. On this secondary Wordpress instance you set up MySQL to be a slave of the primary instance's MySQL.

Here are two things (at least) that you need to make sure you have in this scenario:

1) Make sure you specify the httpchk option in haproxy.cfg, otherwise the primary server will not be marked as down even if Apache goes down. So you should have something like:

backend servers-http
  server s1 10.0.1.1:80 weight 1 maxconn 5000 check port 80
  server s2 10.0.1.2:80 backup weight 1 maxconn 5000 check port 80
  option httpchk GET /

2) Make sure you have swap space in case the memory on the Wordpress instances gets exhausted, in which case random processes will be killed by the kernel's OOM killer (and one of those processes can be mysqld). By default, there is no swap space when you spin up an Ubuntu EC2 instance. Here's how to set up a 2 GB swap file:

dd if=/dev/zero of=/swapfile1 bs=1024 count=2097152
mkswap /swapfile1
chmod 0600 /swapfile1
swapon /swapfile1
echo "/swapfile1 swap swap defaults 0 0" >> /etc/fstab

I hope these two things will help you if you're not already doing them ;-)

Friday, August 15, 2014

Managing OpenStack security groups from the command line

I had an issue today where I couldn't connect to a particular OpenStack instance on port 443. I decided to inspect the security group it belongs to (let's call it myapp) from the command line:

# nova secgroup-list-rules myapp
+-------------+-----------+---------+------------+--------------+
| IP Protocol | From Port | To Port | IP Range   | Source Group |
+-------------+-----------+---------+------------+--------------+
| tcp         | 80        | 80      | 0.0.0.0/0  |              |
| tcp         | 443       | 443     | 0.0.0.0/24 |              |
+-------------+-----------+---------+------------+--------------+

Note that the IP range for port 443 is wrong. It should be all IPs and not a /24 network.

I proceeded to delete the wrong rule:

# nova secgroup-delete-rule myapp tcp 443 443 0.0.0.0/24                                                               
+-------------+-----------+---------+------------+--------------+
| IP Protocol | From Port | To Port | IP Range   | Source Group |
+-------------+-----------+---------+------------+--------------+
| tcp         | 443       | 443     | 0.0.0.0/24 |              |
+-------------+-----------+---------+------------+--------------+


Then I added back the correct rule:

 # nova secgroup-add-rule myapp tcp 443 443 0.0.0.0/0                                                                   
+-------------+-----------+---------+-----------+--------------+
| IP Protocol | From Port | To Port | IP Range  | Source Group |
+-------------+-----------+---------+-----------+--------------+
| tcp         | 443       | 443     | 0.0.0.0/0 |              |
+-------------+-----------+---------+-----------+--------------+

Finally, I verified that the rules are now correct:

# nova secgroup-list-rules myapp                                                                                       
+-------------+-----------+---------+-----------+--------------+
| IP Protocol | From Port | To Port | IP Range  | Source Group |
+-------------+-----------+---------+-----------+--------------+
| tcp         | 443       | 443     | 0.0.0.0/0 |              |
| tcp         | 80        | 80      | 0.0.0.0/0 |              |
+-------------+-----------+---------+-----------+--------------+

Of course, the real test was to see if I could now hit port 443 on my instance, and indeed I was able to.

Tuesday, July 22, 2014

Troubleshooting haproxy 502 errors related to malformed/large HTTP headers

We had a situation recently where our web application started to behave strangely. First nginx (which sits in front of the application) started to error out with messages of this type:

upstream sent too big header while reading response header from upstream

A quick Google search revealed that a fix for this is to bump up proxy_buffer_size in nginx.conf, for both http and https traffic, along these lines:

proxy_buffer_size   256k;
proxy_buffers   4 256k;
proxy_busy_buffers_size   256k;

Now nginx was happy when hit directly. However, haproxy was still erroring out with a 502 'Bad Gateway' return code and a termination state of PH. Here is a snippet from the haproxy log file:

Jul 22 21:27:13 127.0.0.1 haproxy[14317]: 172.16.38.57:53408 [22/Jul/2014:21:27:12.776] www-frontend www-backend/www2:80 1/0/1/-1/898 502 8396 - - PH-- 0/0/0/0/0 0/0 "GET /someurl HTTP/1.1"

Another Google search revealed that PH means that haproxy rejected the header from the backend because it was malformed.

At this point, an investigation into the web application uncovered a loop in the code that kept adding elements to a cookie in the response header, which was the root cause of the oversized headers.
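
In our case the right fix was in the application. If you ever need haproxy to accept legitimately large headers, one option (untested in our setup, so treat it as a sketch with illustrative values) is to increase the buffer sizes in the global section of haproxy.cfg:

global
    # allow larger HTTP headers (values are illustrative)
    tune.bufsize    32768
    tune.maxrewrite 8192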

Anyway, I leave this here in the hope that somebody will stumble on it and benefit from it.

Thursday, July 17, 2014

First experiences with OpenStack

We hit a big milestone this week, as we started to use OpenStack as a private cloud, initially just for QA/integration environments. Up to now we've been creating KVM machines semi-manually, a process that used to take minutes. Now we've cut that down to seconds by calling the Nova API from the command line, e.g.:

$ nova boot --image precise-image --flavor www --key_name mykey --nic net-id=3eafbd4f-0389-4c5b-93ba-7764742ee8cd www1.qa1

Once an instance is provisioned, we bootstrap it with Chef:

$ knife bootstrap www1.qa1.mydomain.com -x ubuntu --sudo -E qa1 -N www1.qa1 -r "role[base], role[www]"

Our internal network architecture is fairly complex, so my colleague Jeff Roberts spent quite some time bending OpenStack Neutron to his will (in conjunction with Open vSwitch) in order to support our internal VLANs. The OpenStack infrastructure has been stable so far, and it's just such a pleasure to do everything via an API and not to spin VMs up manually. Being back to working with a (private) cloud feels good.

This is just version 1.0 of our OpenStack rollout. Soon we'll start spinning up one environment at a time using chef-metal and fog, and we'll also integrate instance + environment spin-up with Jenkins. Exciting times ahead!

Friday, June 13, 2014

Setting up the hostname in Ubuntu

Most people recommend setting up the hostname on a Linux box so that:

1) running 'hostname' returns the short name (e.g. myhost)
2) running 'hostname -f' returns the FQDN (e.g. myhost.prod.example.com)
3) running 'hostname -d' returns the domain name (e.g. prod.example.com)

After experimenting a bit and also finding this helpful Server Fault post, here's what we did to achieve this (we did it via Chef recipes, but it amounts to the same thing):

  • make sure we have the short name in /etc/hostname:
myhost

(also run 'hostname myhost' at the command line)
  • make sure we have the FQDN as the first entry associated with the IP of the server in /etc/hosts:
10.0.1.10 myhost.prod.example.com myhost myhost.prod
  • make sure we have the domain name set up as the search domain in /etc/resolv.conf:
search prod.example.com

Reboot the box when you're done to make sure all of this survives reboots.
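
After the reboot, a quick sanity check with the example names above would look like this:

$ hostname
myhost
$ hostname -f
myhost.prod.example.com
$ hostname -d
prod.example.com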



Tuesday, May 20, 2014

Technologies to look into as a sysadmin

These are some of the technologies that I think are either established, or new and promising, but all useful for sysadmins, no matter what their level of expertise is. Some of them I am already familiar with, some are on my TODO list, some I am exploring currently. They all reflect my own taste, so YMMV.

Operating systems
  • Ubuntu

Programming/scripting languages
  • Go
  • Python/Ruby

Configuration management
  • Chef
  • Ansible

Monitoring/graphing/logging/searching
  • Sensu
  • Graphite
  • Logstash
  • ElasticSearch

Load balancer/Web server
  • HAProxy
  • Nginx

Relational databases
  • MySQL
  • PostgreSQL

Non-relational distributed databases
  • Riak
  • Cassandra

Service discovery
  • etcd
  • consul

Virtualization
  • KVM
  • Vagrant
  • Docker

Software defined networking (SDN)
  • Open vSwitch

IaaS
  • OpenStack

PaaS
  • CloudFoundry


This should keep most people in the industry busy for a while ;-)

Friday, April 25, 2014

Dashboards are important!

In this case, they were a factor in my having a discussion with Eric Garcetti, the mayor of Los Angeles, who was visiting our office. He was intrigued by the Graphite dashboards we have on 8 monitors around the Ops area and I explained to him a little bit of what's going on in terms of what we're graphing. I'll let you guess who is the mayor in this photo:


Slides from my remote presentation on "Modern Web development and operations practices" at MSU

Titus Brown was kind enough to invite me to present to his students in the CSE 491 "Web development" class at MSU. I presented remotely, via Google Hangouts, on "Modern Web development and operations practices" and it was a lot of fun. Perhaps unsurprisingly, most of the questions at the end were on how to get a job in this field and be able to play with some of these cool technologies. My answer was to become active in Open Source, beef up your portfolio on GitHub, go to conferences and network (this was actually Titus's idea, but I wholeheartedly agree), and in general be curious and passionate about your field, and success will follow. I posted my slides on Slideshare if you are curious to take a look. Thanks to Dr. Brown for inviting me! :-)

Monday, April 07, 2014

Why does it work in staging but not in production?

This is a question that I'm sure every developer and operations engineer out there has faced. There can be multiple answers to it, and I'll try to offer some of the ones we arrived at. They have to do mainly with our Chef workflow, but I think they apply to any other configuration management tool.

1) A Chef cookbook version in staging is different from the version in production

This is a common scenario, and it's supposed to work this way. You do want to test out new versions of your cookbooks in staging first, then update the version of the cookbook in production.

2) A feature flag is turned on in staging but turned off in production

We have Chef attributes defined in attributes/default.rb that serve as feature flags. If a certain attribute is true, some recipe code or template section gets included which wouldn't be included if the attribute were false. The situation can occur where a certain attribute is set to true in the staging environment but is set to false in the production environment, at which point things can get out of sync. Again, this is expected, as you do want to test new features out in staging first, but don't forget to turn them on in production at some point.
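
As an illustration (the attribute and file names below are made up), such a feature flag might look like this:

# attributes/default.rb
default['myapp']['enable_new_search'] = true

# recipes/default.rb
if node['myapp']['enable_new_search']
  template '/etc/myapp/search.conf' do
    source 'search.conf.erb'
  end
end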

3) A block of code or template is included in staging but not in production

We had this situation very recently. Instead of using attributes as feature flags, we were directly comparing the environment against 'stg' or 'prod' inside an if block in a template, and only including that template section if the environment was 'stg'. So things were working perfectly in staging, but mysteriously the template section wasn't even there in production. An added difficulty was that the template in question was peppered with non-indented if blocks, so it took us a while to figure out what was going on.

Two lessons here:

a) Make your templates readable by indenting code blocks.

b) Use attributes as feature flags, and don't compare directly against the current environment. This way, it's easier to always look at the default attribute file and see if a given feature flag is true or false.
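
To make lesson b) concrete, here is a sketch of the fragile pattern versus the flag-based one inside an ERB template (the attribute name is hypothetical):

<%# fragile: compares directly against the environment name %>
<% if node.chef_environment == 'stg' %>
beta_feature = on
<% end %>

<%# better: driven by an attribute acting as a feature flag %>
<% if node['myapp']['beta_feature'] %>
beta_feature = on
<% end %>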

4) A modification is made to the cookbook version in production directly on the Chef server

I blogged about this issue in the past. Suppose you have an environments file that pins a given cookbook (let's designate it as cookbook C) to 1.0.1 in staging and to 1.0.0 in production. You want to upgrade production to 1.0.1, because it was tested in staging and it worked fine. However, instead of i) modifying the environments/prod.rb file and pinning the cookbook C to 1.0.1, ii) updating the Chef server via "knife environment from file environments/prod.rb" and iii) committing your changes in git, you modify the version of the cookbook C directly on the Chef server with "knife environment edit prod".

Then, the next time you or somebody else modifies environments/prod.rb to bump up another cookbook to the next version, the version of cookbook C in that file is still 1.0.0, so when you upload environments/prod.rb to the Chef server, it will downgrade cookbook C from 1.0.1 to 1.0.0. Chaos will ensue the next time chef-client runs on the nodes that have recipes from cookbook C. Production will be broken, while staging will still happily work.

Here are 2 other scenarios not related directly to staging vs production, but instead having the potential to break production altogether.

You forget to upload the new version of the cookbook to the Chef server

You make all of your modifications to the cookbook, you commit your code to git, but for some reason you forget to upload the cookbook to the Chef server. Particularly if you kept the same version number as the one already pinned to staging (and possibly to production), your modifications won't take effect and you may spend some quality time pulling your hair out.

You upload a cookbook to the Chef server without bumping its version

There is another, even worse, scenario though: you do upload your cookbook to the Chef server, but you realize that you didn't bump up the version number compared to what is currently pinned to production. As a consequence, all the nodes in production that have recipes from that cookbook will be updated the next time they run chef-client. That's a nasty one. It does happen. So make sure you pay attention to your cookbook versioning process and stick to it!
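
One way to catch both of these mistakes before they bite is to check what the Chef server actually has, for example:

# list the cookbook versions known to the Chef server
knife cookbook show mycookbook

# check which version each environment is pinned to
knife environment show stg
knife environment show prod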

Friday, March 07, 2014

More on haproxy geolocation detection and CDN services

In a previous blog post I described a method to do geolocation detection with haproxy. The country detection was based on the user's client IP. However, if you have a CDN service in front of your load balancer, then the source IPs will all belong to the CDN server farm, and the closest such server to an end user may not be in the same country as the user. Fortunately, CDN services generally pass that end user IP address in some specific HTTP header, so you can still perform the geolocation detection by inspecting that header. For example, Akamai passes the client IP in a header called True-Client-IP.

In our haproxy.cfg rules detailed below we wanted to handle both the case where our load balancer is hit directly by end users (in case we bypass any CDN service), and the case where the load balancer is hit via a CDN.

1) We set our own HTTP headers containing the country code as detected by geolocation based on a) the source IP (this is so we can still look at the source IP in case we bypass the CDN and hit our load balancer directly) and b) the specific CDN header containing the actual client IP (True-Client-IP in the case of Akamai):

http-request set-header X-Country-Src %[src,map_ip(/etc/haproxy/geolocation.txt)]

http-request set-header X-Country-Akamai %[req.hdr_ip(True-Client-IP,-1),map_ip(/etc/haproxy/geolocation.txt)]


2) We set an ACL that is true if we detect the presence of the True-Client-IP header, which tells us that we are hit via Akamai:

acl acl_akamai_true_client_ip_header_exists req.hdr(True-Client-IP) -m found

3) We set an ACL that is true if we detect that the country of origin (obtained via Akamai's True-Client-IP) is US:

acl acl_geoloc_akamai_true_client_ip_us req.hdr(X-Country-Akamai) -m str -i US

4) We set an ACL that is true if we detect that the country of origin (obtained via the source IP of the client) is US:

acl acl_geoloc_src_us req.hdr(X-Country-Src) -m str -i US

5) Based on the ACLs defined above, we send non-US traffic to a specific backend IF we are being hit via Akamai (ACL #2) AND we detected that the country of origin is non-US (negation of ACL #3), OR if we detected that the country of origin is non-US via the source IP (negation of ACL #4):

use_backend www-backend-non-us if acl_akamai_true_client_ip_header_exists !acl_geoloc_akamai_true_client_ip_us or !acl_geoloc_src_us

(note that the AND is implicit in the way haproxy looks at combinations of ACLs)

6) We also set an HTTP header called X-Country, which our application inspects in order to perform country-specific logic. We first set this header to the X-Country-Src header set in rule #1, but we override it if we are getting hit via Akamai:

http-request set-header X-Country %[req.hdr(X-Country-Src)]
http-request set-header X-Country %[req.hdr(X-Country-Akamai)] if acl_akamai_true_client_ip_header_exists
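
Putting rules 1) through 6) together, the relevant frontend section of haproxy.cfg looks roughly like this (the frontend/backend names and the bind line are illustrative):

frontend www-frontend
    bind *:80

    http-request set-header X-Country-Src %[src,map_ip(/etc/haproxy/geolocation.txt)]
    http-request set-header X-Country-Akamai %[req.hdr_ip(True-Client-IP,-1),map_ip(/etc/haproxy/geolocation.txt)]

    acl acl_akamai_true_client_ip_header_exists req.hdr(True-Client-IP) -m found
    acl acl_geoloc_akamai_true_client_ip_us req.hdr(X-Country-Akamai) -m str -i US
    acl acl_geoloc_src_us req.hdr(X-Country-Src) -m str -i US

    http-request set-header X-Country %[req.hdr(X-Country-Src)]
    http-request set-header X-Country %[req.hdr(X-Country-Akamai)] if acl_akamai_true_client_ip_header_exists

    use_backend www-backend-non-us if acl_akamai_true_client_ip_header_exists !acl_geoloc_akamai_true_client_ip_us or !acl_geoloc_src_us
    default_backend www-backend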


This looks pretty complicated, but it works well :-)


Friday, February 28, 2014

Example of Chef workflow

Here is a quick example of a Chef workflow that has been working for us. It can be easily improved on, especially around testing, but it's a good foundation.

1) Put your chef-repo on Github.
2) When you want to modify a cookbook, do a git pull to get the latest version of the cookbook.
3) Modify the cookbook.
4) Check your environments (I'll assume staging and production for now, to keep it simple) to see what version of the cookbook is used in production vs staging. Let's assume both staging and production environments use the latest version of the cookbook, say 0.1.
5) Modify metadata.rb and bump up the version of the cookbook to 0.2.
6) Modify the staging environment file (for example environments/stg.rb) and pin the cookbook you modified to version 0.2. Make sure the production environment is still pinned to 0.1.
7) Update the staging environment on the Chef server via: 'knife environment from file environments/stg.rb'
8) Upload the new version of the cookbook (0.2) to the Chef server via: 'knife cookbook upload mycookbook' (it should report version 0.2 after the upload)
9) Run chef-client on a staging box that uses the cookbook you modified. Check that everything looks good.
10) Assuming everything looks good in staging, modify the production environment file (for example environments/prod.rb) and pin the cookbook you modified to the new version 0.2.
11) Update the production environment on the Chef server via: 'knife environment from file environments/prod.rb'.
12) Run chef-client on a prod box and check that everything is OK. If it looks good, either let chef-client run by itself on all prod boxes, or run chef-client manually to force the change.
13) Commit your cookbook and environment changes into git and push to Github.

Note that there is the possibility of screw-ups if somebody forgets step #13. For this reason, I am usually doubly careful and check my local versions of the environment files (stg.rb and prod.rb) against what is actually running on the Chef server. I run 'knife environment show stg' and compare the result to stg.rb, then run 'knife environment show prod' and compare the result to prod.rb. Only if they both look good do I modify my local copies of stg.rb and prod.rb and then upload them to the Chef server. We've had issues in the past with changes made directly on the Chef server (via 'knife environment edit') that got overwritten when somebody uploaded their version of the environment file, which contained an older version of the given cookbook. For this reason I don't recommend making changes directly on the Chef server by editing roles, environments, etc. Instead, make all changes in your local files, then upload those files to the Chef server and also commit them to Github.
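
A quick sketch of that comparison, assuming your chef-repo is the current directory:

# what the Chef server currently thinks the environments look like
knife environment show stg
knife environment show prod

# what the repo says they should look like
cat environments/stg.rb environments/prod.rb
git log -1 -- environments/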

As I said in the beginning, there is the opportunity to run various testing tools (at a minimum rubocop and Foodcritic) on your cookbook before uploading it to the Chef server. But that is for another post.

Monday, January 13, 2014

Geolocation detection with haproxy

A useful feature for a web application is the ability to detect the user's country of origin based on their source IP address. This used not to be possible in haproxy unless you applied Cyril Bonté's geolocation patches (see the end of this blog post for how exactly to do that if you don't want to live on the bleeding edge of haproxy). However, the latest development version of haproxy (which is 1.5-dev21 at this time) contains geolocation detection functionality.

Here's how to use the geolocation detection feature of haproxy:

1) Generate text file which maps IP address ranges to ISO country codes

This is done using Cyril's haproxy-geoip utility, which is available in his geolocation patches. Here's how to locate and run this utility (a condensed shell sketch follows the list):
  • clone patch git repo: git clone https://github.com/cbonte/haproxy-patches.git
  • the haproxy-geoip script is now available in haproxy-patches/geolocation/tools
    • for the script to run, you need to have the funzip utility available on your system (it's part of the unzip package in Ubuntu)
    • you also need the iprange binary, which you can 'make' from its source file available in the haproxy-1.5-dev21/contrib/iprange directory; once you generate the binary, copy it somewhere in your PATH so that haproxy-geoip can locate it
  • run haproxy-geoip, which prints its output (IP ranges associated to ISO country codes) to stdout, and capture stdout to a file: haproxy-geoip > geolocation.txt
  • copy geolocation.txt to /etc/haproxy
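
Condensed into a shell session, step 1) looks roughly like this (it assumes the haproxy 1.5-dev21 sources and the cloned patches live in your home directory):

git clone https://github.com/cbonte/haproxy-patches.git
sudo apt-get install unzip                          # provides funzip
cd ~/haproxy-1.5-dev21/contrib/iprange && make      # build the iprange binary
sudo cp iprange /usr/local/sbin/
cd ~/haproxy-patches/geolocation/tools
./haproxy-geoip > geolocation.txt
sudo cp geolocation.txt /etc/haproxy/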
2) Set custom HTTP header based on geolocation

For this, haproxy provides the map_ip function, which locates the source IP (the predefined 'src' variable in the line below) in the IP range in geolocation.txt and returns the ISO country code. We assign this country code to the custom X-Country HTTP header:

http-request set-header X-Country %[src, map_ip(/etc/haproxy/geolocation.txt)]

If you didn't want to map the source IP to a country code, but instead wanted to inspect the value of an HTTP header such as X-Forwarded-For, you could do this:

http-request set-header X-Country %[req.hdr_ip(X-Forwarded-For,-1), map_ip(/etc/haproxy/geolocation.txt)]

3) Use geolocation in ACLs

Let's assume that if the country detected via geolocation is not US, then you want to send the user to a different backend. You can do that with an ACL. Note that we compare the HTTP header X-Country which we already set above to the string 'US' using the '-m str' string matching functionality of haproxy, and we also specify that we want a case insensitive comparison with '-i US':

acl acl_geoloc_us req.hdr(X-Country) -m str -i US
use_backend www-backend-non-us if !acl_geoloc_us

If you didn't want to set the custom HTTP header, you could use the map_ip function directly in the definition of the ACL, like this:

acl acl_geoloc_us %[src, map_ip(/etc/haproxy/geolocation.txt)] -m str -i US
use_backend www-backend-non-us if !acl_geoloc_us

Speaking of ACLs, here's an example of defining ACLs based on the existence of a cookie and based on the value of the cookie then choosing a backend based on those ACLs:

acl acl_cookie_country req.cook_cnt(country_code) eq 1
acl acl_cookie_country_us req.cook(country_code) -m str -i US
use_backend www-backend-non-us if acl_cookie_country !acl_cookie_country_us

And now for something completely different...which is what I mentioned in the beginning of this post: 

How to use the haproxy geolocation patches with the current stable (1.4) version of haproxy

a) Patch the haproxy source code with the geolocation patches, then compile and install haproxy.
b) Generate text file which maps IP address ranges to ISO country codes
  • install funzip: apt-get install unzip
  • create iprange binary
    • cd haproxy-1.4.24/contrib/iprange
    • make
    • the iprange binary will be created in the same folder; copy it to /usr/local/sbin
  • haproxy-geoip is located here: haproxy-patches/geolocation/tools
  • haproxy-geoip > geolocation.txt
  • copy geolocation.txt to /etc/haproxy 
c) Obtain country code based on source IP and use it in ACL

This is done via the special 'geolocate' statement and the 'geoloc' variable added to the haproxy configuration syntax by the geolocation patch:

geolocate src /etc/haproxy/geolocation.txt
acl acl-au geoloc eq AU
use_backend www-backend-au if acl-au


If instead of the source IP you want to map the value of the X-Forwarded-For header to a country, use:

geolocate hdr_ip(X-Forwarded-For,-1) /etc/haproxy/geolocation.txt

If you wanted to redirect to another location instead of using an ACL, use:

redirect location http://some.location.example.com:4567 if { geoloc AU }

That's it for now. I want to thank Cyril Bonté, the author of the geolocation patches, and Willy Tarreau, the author of haproxy, for their invaluable help and their amazingly fast responses to my emails. It's a pleasure to deal with such open source developers passionate about the software they produce.  Also thanks to my colleagues Zmer Andranigian for working on getting version 1.4 of haproxy to work with geolocation, and Jeff Roberts for working on getting 1.5-dev21 to work.

One last thing: haproxy-1.5-dev21 has been very stable in production for us, but of course test it thoroughly before deploying it in your environment.

Monday, December 23, 2013

Ops Design Pattern: local haproxy talking to service layer

Modern Web application architectures are often composed of a front-end application layer (app servers running Java/Python/Ruby aided by a generous helping of JavaScript) talking to one or more layers of services (RESTful or not) which in turn may talk to a distributed database such as Riak.

Typically we see a pair of load balancers in HA mode deployed up front and handling user traffic, or more commonly serving as the origin for CDN traffic. In order to avoid deploying many pairs of load balancers in between the front-end app server layer and various services layers, or in between one service layer and another, one design pattern I've successfully used is an haproxy instance running locally (on 127.0.0.1) on each node that needs to talk to N other nodes running some type of service. This approach has several advantages:

  • No need to use up nodes, be they bare-metal servers, cloud instances or VMs, for the sole purpose of running yet another haproxy instance (and you actually need 2 nodes for an HA configuration, plus you need to run keepalived or something similar on each node)
  • Potentially fewer bottlenecks, as each node fans out to all other services it needs to talk to, with no need to go first through a centralized load balancer
  • Easy deployment via Chef or Puppet, by simply adding the installation of the haproxy instance to the app node's cookbook/manifest
The main disadvantage of this solution is an increased number of health checks against the service nodes behind each haproxy (1 health check from each app node). Also, as @lusis pointed out, in some scenarios, for example when the local haproxy instances talk to a Riak cluster, there is the possibility of each app node seeing a different image of the cluster in terms of the particular Riak node(s) it gets the data from (but I think with Riak this is the case even with a centralized load balancer approach).
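
As an illustration, the local haproxy configuration on each app node can be as simple as something like this (the service name, port and IPs are made up):

listen riak
    bind 127.0.0.1:8087
    mode tcp
    balance roundrobin
    server riak1 10.0.3.11:8087 check
    server riak2 10.0.3.12:8087 check
    server riak3 10.0.3.13:8087 check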

In any case, I recommend this approach which has worked really well for us here at NastyGal. I used a similar approach in the past as well at Evite. 

Thanks to @bdha for spurring an interesting Twitter thread about this approach, and to @lusis and @obfuscurity for jumping into the discussion with their experiences. As @garethr said, somebody needs to start documenting these patterns!

Thursday, December 12, 2013

Setting HTTP request headers in haproxy and interpolating variables

I had the need to set a custom HTTP request header in haproxy. For versions up to 1.4.x, the way to do this is:

reqadd X-Custom-Header:\ some_string

However, some_string is just a static string, and I could see no way of interpolating a variable in the string. Googling around, this is possible in haproxy 1.5.x with this method:

http-request set-header X-Custom-Header %[dst_port]

where dst_port is the variable we want to interpolate and %[variable] is the syntax for interpolation.

Other examples of variables available for you in haproxy.cfg are in Section 7.3 "Fetching samples" in the haproxy 1.5 configuration manual.

Tuesday, December 10, 2013

Creating sensu alerts based on graphite data

We had the need to create Sensu alerts based on some of the metrics we send to Graphite. Googling around, I found this nice post by @ulfmansson talking about the way they did it at Recorded Future. Ulf recommends using Sean Porter's check-data.rb Sensu plugin for alerting based on Graphite data. It wasn't clear how to call the plugin, so we experimented a bit and came up with something along these lines (note that check-data.rb requires the sensu-plugin gem):

$ ruby check-data.rb -s graphite.example.com -t "movingAverage(stats.lb1.prod.assets-backend.session_current,10)" -w 100 -c 200

This runs the check-data.rb script against the server graphite.example.com (-s option), requesting the value of the target metric movingAverage(stats.lb1.prod.assets-backend.session_current,10) (-t option), and sets a warning threshold of 100 for this value (-w option) and a critical threshold of 200 (-c option). The target can be any function supported by Graphite. In this example, it is a 10-minute moving average for the number of sessions for the "assets" haproxy backend. By default check-data.rb looks at the last 10 minutes of Graphite data (this can be changed by specifying something like -f "-5mins").

To call the check in the context of sensu, you need to deploy it to the client which will run it, and configure the check on the Sensu server in a json file in /etc/sensu/conf.d/checks:

"command": "/etc/sensu/plugins/check-data.rb -s graphite.example.com -t \"movingAverage(stats.lb1.prod.assets-backend.session_current,10)\" -w 100 -c 200"

Friday, October 18, 2013

Avoiding keepalive storms in sensu

Sensu is a great new monitoring tool, but also a bit rough around the edges. We've been willing to live with that, because of its benefits, in particular ease of automation and increased scalability due to its use of a queuing system. Speaking of queueing systems, Sensu uses RabbitMQ for that purpose. We haven't had performance or stability issues with the rabbit, but we have been encountering a pretty severe issue with the way Sensu and RabbitMQ interact with each other.

We have systems deployed across several cloud providers and data centers, with site-to-site VPN links between locations. What started to happen fairly often for us was what we call a "keepalive storm", where all of a sudden all Sensu clients were seen by the Sensu server as unavailable, since no keepalive had been sent by the clients to RabbitMQ.  The thresholds for the keepalive timers in Sensu are hardcoded (at least in the Sensu version we are using, which is 0.10.2) and are defined in /opt/sensu/embedded/lib/ruby/gems/2.0.0/gems/sensu-0.10.2/lib/sensu/server.rb as 120 seconds for warnings and 180 seconds for critical alerts:


thresholds = {
  :warning => 120,
  :critical => 180
}

What we think was happening is that the connections between the Sensu clients and RabbitMQ (which in our case is running on the same box as the Sensu server) were reset, either because of a temporary glitch in the site-to-site VPN connection, or because of some other undetermined but probably network-related cause. In any case, this issue was becoming severe and was causing the engineer on pager duty to not get a lot of sleep at night.

After lots of hair-pulling, we found a workaround by specifying a non-default value for the heartbeat parameter in the RabbitMQ configuration file rabbitmq.config. Here's what the documentation says about the heartbeat parameter:

Value representing the heartbeat delay, in seconds, that the server sends in the connection.tune frame. If set to 0, heartbeats are disabled. Clients might not follow the server suggestion, see the AMQP reference for more details. Disabling heartbeats might improve performance in situations with a great number of connections, but might lead to connections dropping in the presence of network devices that close inactive connections.
Default: 600


Note that the default value is 600 seconds, much larger than the 120 and 180 second keepalive thresholds defined in Sensu. So what we did was set a heartbeat value of less than 120. We chose 60 seconds for this value and it seemed to work fine. We still have keepalive storms, but they are definitely due to real but temporary issues in site-to-site VPN connectivity and they usually resolve themselves immediately.

One more thing: we install Sensu via its Chef community cookbook. The Sensu cookbook uses the RabbitMQ community cookbook, which doesn't define the heartbeat parameter as an attribute. We had to add that attribute, as well as use it in the rabbitmq.config.erb template file.

Just for reference, we modified cookbooks/rabbitmq/attributes/default.rb and added:


#avoid sensu keepalive storms!
default['rabbitmq']['heartbeat'] = 60

We also modified cookbooks/rabbitmq/templates/default/rabbitmq.config.erb and added:

{heartbeat, <%= node['rabbitmq']['heartbeat'] %>}
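
For reference, the relevant part of the rendered rabbitmq.config ends up looking roughly like this (other settings omitted):

[
  {rabbit, [
    {heartbeat, 60}
  ]}
].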


Disabling public key authentication in sftp

I just had an issue trying to sftp into a 3rd party vendor server using a user name and password. It worked fine with Filezilla, but from the command line I got:

Received disconnect from A.B.C.D: 11:
Couldn't read packet: Connection reset by peer

(A.B.C.D denotes the IP address of the sftp server)

I then ran sftp in verbose mode (-v) and got:

debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/mylocaluser/.ssh/id_rsa
Received disconnect from A.B.C.D: 11:
Couldn't read packet: Connection reset by peer

This made me realize that the sftp server is configured to accept password authentication only. I inspected the man page for sftp and googled around a bit to figure out how to disable public key authentication and I found a way that works:

sftp -oPubkeyAuthentication=no remoteuser@sftpserver
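
Alternatively, if you connect to that server often, you can set the equivalent options in ~/.ssh/config (the host alias and host name below are made up):

Host vendor-sftp
    HostName sftp.vendor.example.com
    User remoteuser
    PubkeyAuthentication no
    PreferredAuthentications password

and then simply run 'sftp vendor-sftp'.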

Wednesday, August 28, 2013

Keepalived, iproute2 and HAProxy (part 2)

In part 1 of this 2-part series, I explained how we initially set up keepalived and iproute2 on 2 HAProxy load balancers with the goal of achieving high availability at the load balancer layer. Each of the load balancers had 3 interfaces, and we wanted to be able to ssh into any IP address on those interfaces -- hence the need for iproute2 rules. However, adding keepalived into the mix complicated things.

To test failover at the HAProxy layer, we simulated a system failure by rebooting the primary load balancer. As expected, keepalived transferred the floating IP address to the secondary load balancer, and everything kept working. However, things started going south when the primary load balancer came back online. We had a chicken and egg problem: the iproute2 rules related to the floating IP address didn't kick in when rc.local was run, because the floating IP wasn't there yet. Then keepalived correctly identified the primary system as being up and transferred the floating IP there, but there was no route to it via iproute2. We decided that the iproute2 rules/policies unnecessarily complicated things, so we got rid of them. This meant we were back to one default gateway, on the same subnet as our front-end interface. The downside was that we were only able to ssh into one of the 3 IPs associated with the 3 interfaces on each load balancer, but the upside was that things were a lot simpler.

However, our failover tests with keepalived were still not working as expected. We mainly had issues when the primary load balancer came back after a reboot. Although keepalived correctly reassigned the floating IP to the primary LB, we weren't able to actually hit that IP over ssh or HTTP. It turned out that it was an ARP cache issue on the switch stack where the load balancers were connected. We had to clear the ARP cache in order for the floating IP to be associated again with the correct MAC. On further investigation, it turned out that the switches weren't accepting gratuitous ARP requests, so we enabled them by running this command on our switches:

ip arp gratuitous local

With this setup in place, we were able to fail over back and forth from the primary to the secondary load balancer. Note that whenever there are modifications to be made to the keepalived configuration, we found no good way to apply them (via Chef) to the load balancers without taking a very short outage while restarting keepalived on both of them.

Wednesday, July 10, 2013

The mystery of stale haproxy processes

We had the situation with our haproxy-based load balancers where our monitoring alerts were triggered by the fact that several haproxy processes were running, when in fact only one was supposed to be running. Looking more into it, we determined that each time Chef client ran (which by default is every 30 minutes), a new haproxy process was launched. The logic in the haproxy cookbook applied to that node was to do a 'service haproxy reload' every time the haproxy configuration file changed. Since our haproxy configuration file is based on a Chef template populated via a Chef search, that meant that the haproxy reload was happening on each Chef client run.

If you look in /etc/init.d/haproxy, you'll see that the reload launches a new haproxy process, while the existing process is supposed to finish serving existing connections, then exit. However, the symptom we were seeing was that the existing haproxy process never closed all the outstanding connections, so it never exited. Inspection via lsof also revealed that the haproxy process kept many network connections in the CLOSE_WAIT state. I need to mention that this particular haproxy box was load balancing requests from Ruby clients across a Riak cluster. After some research, it turned out that the symptom of haproxy connections in CLOSE_WAIT that never go away is due to the fact that the client connection goes away, while haproxy still waits for a confirmation of the termination of that connection. See this haproxy mailing list thread for a great in-depth explanation of the issue by the haproxy author Willy Tarreau.
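
If you suspect you are in the same situation, something along these lines will show the extra haproxy processes and their lingering connections (the PID below is just an example):

ps auxww | grep [h]aproxy          # more than one haproxy process is suspicious
lsof -p 12345 | grep CLOSE_WAIT    # inspect one of the older PIDs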

In short, the solution in our case (per the mailing list thread) was to add

option forceclose

to the defaults section of haproxy.cfg.

Friday, May 31, 2013

Some gotchas around keepalived and iproute2 (part 1)

I should have written this blog post a while ago, while these things were still fresh on my mind. Still, better late than never.

Scenario: 2 bare-metal servers with 6 network ports each, to serve as our HAProxy load balancers in an active/failover configuration based on keepalived (I described how we integrated this with Chef in my previous post).

The architecture we have for the load balancers is as follows:

  • 1 network interface (virtual and bonded, see below) is on a 'front-end' VLAN which gets the incoming traffic hitting HAProxy
  • 1 network interface is on a 'back-end' VLAN where the actual servers behind HAProxy live
  • 1 network interface is on an 'ops' VLAN which we want to use for accessing the HAProxy server for monitoring purposes

We (and by the way, when I say we, I mean mostly my colleagues Jeff Roberts and Zmer Andranigian) used Open vSwitch to create a virtual bridge interface and bond 2 physical interfaces on this bridge for each of the 'front-end' and 'back-end' interfaces.

To install Open vSwitch on Ubuntu, use:

# apt-get install openvswitch-switch openvswitch-controller

To create a bridge:

# ovs-vsctl add-br frontend_if

To create a bonded interface with 2 physical NICs (eth0 and eth1) on the frontend_if bridge created above:

# ovs-vsctl add-bond frontend_if frontend_bond eth0 eth1 lacp=active other_config:lacp-time=slow bond_mode=balance-tcp

We did the same for the 'back-end' interface by creating a bridge and bonding eth2 and eth3. We also configured the 'ops' interface as a regular network interface on eth4. To assign IP addresses to frontend_if, backend_if and eth4, we edited /etc/network/interfaces and added stanzas similar to:

auto eth0
iface eth0 inet static
address 0.0.0.0

auto eth1
iface eth1 inet static
address 0.0.0.0



auto frontend_if
iface frontend_if  inet static
        address 172.3.15.36
        netmask 255.255.255.0
        network 172.3.15.0
        broadcast 172.3.15.255
        gateway 172.3.15.1
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 172.3.10.4 172.3.10.8
        dns-search prod-vip.mydomain.com


auto eth2
iface eth2 inet static
address 0.0.0.0

auto eth3
iface eth3 inet static
address 0.0.0.0


auto backend_if
iface backend_if  inet static
        address 172.3.50.36
        netmask 255.255.255.0
        network 172.3.50.0
        broadcast 172.3.50.255
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 172.3.10.4 172.3.10.8
        dns-search prod.mydomain.com


At this point, we wanted to be able to ssh from a remote location into the HAProxy box using any of the 3 IP addresses associated with frontend_if, backend_if, and eth4. The problem was that with the regular routing rules in Linux, there's one default gateway, which in our case was on the same VLAN as frontend_if (172.3.15.1).

The solution was to install and configure the iproute2 package. This allows you to have multiple default gateways, one per interface that you want to configure this way (this blog post on iproute2 commands proved to be very useful).

To configure a default gateway for each of the 2 interfaces we defined above (frontend_if and backend_if), we added the following commands to /etc/rc.local  so that they can be run each time the box gets rebooted:


echo "1 admin" > /etc/iproute2/rt_tables
ip route add 172.3.50.0/24 dev backend_if src 172.3.50.36 table admin
ip route add default via 172.3.50.1 dev backend_if table admin
ip rule add from 172.3.50.36/32 table admin
ip rule add to 172.3.50.36/32 table admin

echo "2 admin2" >> /etc/iproute2/rt_tables
ip route add 172.3.15.0/24 dev frontend_if src 172.3.15.36 table admin2
ip route add default via 172.3.15.1 dev frontend_if table admin2
ip rule add from 172.3.15.36/32 table admin2
ip rule add to 172.3.15.36/32 table admin2


This was working great, but there was another aspect to this setup: we needed to get keepalived working between the 2 HAProxy boxes. Running keepalived means there is a new floating IP, which is a virtual IP address maintained by the keepalived process. In our case, this floating IP (172.3.15.38) was attached to the frontend_if interface, which means we had to add another iproute2 stanza:


echo "3 admin3" >> /etc/iproute2/rt_tables
ip route add 172.3.15.0/24 dev frontend_if src 172.3.15.38 table admin3
ip route add default via 172.3.15.1 dev frontend_if table admin3
ip rule add from 172.3.15.38/32 table admin3
ip rule add to 172.3.15.38/32 table admin3
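
To double-check that the rules and tables are in place after a reboot, you can run:

ip rule show                  # should list the from/to rules pointing at tables admin, admin2 and admin3
ip route show table admin2    # should show the 172.3.15.0/24 route and its default gateway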

I'll stop here for now. Stay tuned for part 2, where you can read about our adventures trying to get keepalived to work as we wanted it to. Hint: it involved getting rid of iproute2 policies.

Modifying EC2 security groups via AWS Lambda functions

One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...