At OpenX we recently completed a large-scale deployment of one of our server farms to Amazon EC2. Here are some lessons learned from that experience.
Expect failures; what's more, embrace them
Things are bound to fail when you're dealing with large-scale deployments in any infrastructure setup, but especially when you're deploying virtual servers 'in the cloud', outside of your sphere of influence. You must then be prepared for things to fail. This is a Good Thing, because it forces you to think about failure scenarios upfront, and to design your system infrastructure in a way that minimizes single points of failure.
As an aside, I've been very impressed with the reliability of EC2. Like many other people, I didn't know what to expect, but I've been pleasantly surprised. Very rarely does an EC2 instance fail. In fact I haven't yet seen a total failure, only some instances that were marked as 'deteriorated'. When this happens, you usually get a heads-up via email, and you have a few days to migrate your instance, or launch a similar one and terminate the defective one.
Expecting things to fail at any time leads to and relies heavily on the next lesson learned, which is...
Fully automate your infrastructure deployments
There's simply no way around this. When you need to deal with tens and even hundreds of virtual instances, when you need to scale up and down on demand (after all, this is THE main promise of cloud computing!), then you need to fully automate your infrastructure deployment (servers, load balancers, storage, etc.)
The way we achieved this at OpenX was to write our own custom code on top of the EC2 API in order to launch and destroy AMIs and EBS volumes. We rolled our own AMI, which contains enough bootstrap code to make it 'call home' to a set of servers running slack. When we deploy a machine, we specify a list of slack 'roles' that the machine belongs to (for example 'web-server' or 'master-db-server' or 'slave-db-server'). When the machine boots up, it will run a script that belongs to that specific slack role. In this script we install everything the machine needs to do its job -- pre-requisite packages and the actual application with all its necessary configuration files.
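As a rough illustration (not our actual tooling), here is a minimal sketch of what such a launch wrapper can look like using the boto library; the AMI ID, key pair name, security group, and the convention of passing slack roles through EC2 user data are all placeholder assumptions:
# Hypothetical launch wrapper built on the EC2 API via boto; the AMI ID, key pair,
# security group and user-data convention are made up for illustration.
import boto

def launch_instance(roles, ami_id='ami-12345678', instance_type='m1.small'):
    conn = boto.connect_ec2()  # picks up AWS credentials from the environment
    # Pass the slack roles to the instance via user data; the bootstrap code
    # baked into the AMI is assumed to read this at boot and run the
    # postinstall script for each role.
    user_data = 'SLACK_ROLES=%s' % ','.join(roles)
    reservation = conn.run_instances(ami_id,
                                     key_name='prod-keypair',
                                     security_groups=['web'],
                                     instance_type=instance_type,
                                     user_data=user_data)
    return reservation.instances[0]

# Example: bring up one more web server
# web = launch_instance(['web-server'])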
I will blog separately about how exactly slack works for us, but let me just say that it is an extremely simple tool. It may seem overly simple, but that's exactly its strength, since it forces you to be creative with your postinstall scripts. I know that other people use puppet, or fabric, or cfengine. Whatever works for you, go ahead and use it -- just use SOME tool that helps with automated deployments.
The beauty of fully automating your deployments is that it truly allows you to scale infinitely (for some value of 'infinity' of course ;-). It almost goes without saying that your application infrastructure needs to be designed in such a way that allows this type of scaling. But having the building blocks necessary for automatically deploying any type of server that you need is invaluable.
Another thing we do which helps with automating various pieces of our infrastructure is that we keep information about our deployed instances in a database. This allows us to write tools that inspect the database and generate various configuration files (such as the all-important role configuration file used by slack), and other text files such as DNS zone files. This database becomes the one true source of information about our infrastructure. The DRY principle applies to system infrastructure, not only to software development.
Speaking of DNS, specifically in the context of Amazon EC2, it's worth rolling out your own internal DNS servers, with zones that aren't even registered publicly, but for which your internal DNS servers are authoritative. Then all communication within the EC2 cloud can happen via internal DNS names, as opposed to IP addresses. Trust me, your tired brain will thank you. This would be very hard to achieve though if you were to manually edit BIND zone files. Our approach is to automatically generate those files from the master database I mentioned. Works like a charm. Thanks to Jeff Roberts for coming up with this idea and implementing it.
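To make the idea concrete, here is a minimal sketch of such a generator, assuming a hypothetical instances table with hostname and internal IP columns; the zone name, SOA values, and schema are illustrative only:
# Hypothetical zone file generator; the database schema and zone name are assumptions.
import sqlite3
import time

def generate_zone(db_path, zone='prod.internal'):
    serial = time.strftime('%Y%m%d01')
    lines = ['$TTL 300',
             '@ IN SOA ns1.%s. hostmaster.%s. ( %s 3600 600 86400 300 )' % (zone, zone, serial),
             '@ IN NS ns1.%s.' % zone]
    conn = sqlite3.connect(db_path)
    # one A record per deployed instance, straight from the master database
    for hostname, ip in conn.execute('SELECT hostname, internal_ip FROM instances'):
        lines.append('%s IN A %s' % (hostname, ip))
    return '\n'.join(lines) + '\n'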
While we're on the subject of fully automated deployments, I'd like to throw an idea out there that I first heard from Mike Todd, my boss at OpenX, who is an ex-Googler. One of his goals is for us never to have to ssh into any production server. We deploy the server using slack, the application gets installed automatically, monitoring agents get set up automatically, so there should really be no need to manually do stuff on the server itself. If you want to make a change, you make it in a slack role on the master slack server, and it gets pushed to production. If the server misbehaves or gets out of line with the other servers, you simply terminate that server instance and launch another one. Since you have everything automated, it's one command line for terminating the instance, and another one for deploying a brand new replacement. It's really beautiful.
Design your infrastructure so that it scales horizontally
There are generally two ways to scale an infrastructure: vertically, by deploying your application on more powerful servers, and horizontally, by increasing the number of servers that support your application. For 'infinite' scaling in a cloud computing environment, you need to design your system infrastructure so that it scales horizontally. Otherwise you're bound to hit limits of individual servers that you will find very hard to get past. Horizontal scaling also eliminates single points of failure.
Here are a few ideas for deploying a Web site with a database back-end so that it uses multiple tiers, with each tier being able to scale horizontally:
1) Deploy multiple Web servers behind one or more load balancers. This is pretty standard these days, and this tier is the easiest to scale. However, you also want to maximize the work done by each Web server, so you need to find the sweet spot for that particular type of server in terms of the number of httpd processes it can handle. Too few processes and you're wasting CPU/RAM on the server; too many and you're overloading it. You also need to be cognizant of the fact that each EC2 instance costs you money. It can become so easy to launch a new instance that you don't necessarily think of getting the most out of the existing instances. Don't go wild launching new instances unless absolutely necessary, or you'll get sticker shock when the bill from Amazon arrives at the end of the month.
2) Deploy multiple load balancers. Amazon doesn't yet offer load balancers, so what we've been doing is using HAProxy-based load balancers. Let's say you have an HAProxy instance that handles traffic for www.yourdomain.com. If your Web site becomes wildly successful, it's conceivable that a single HAProxy instance will not be able to handle all the incoming network traffic. One easy solution, which also helps eliminate single points of failure, is to use round-robin DNS, pointing www.yourdomain.com to several IP addresses, with each IP address handled by a separate HAProxy instance (see the zone file sketch after this list). All HAProxy instances can be identical in terms of back-end configuration, so your Web server farm will get 1/N of the overall traffic from each of your N load balancers. This worked really well for us, and the traffic was spread out very uniformly among the HAProxies. You do need to make sure the TTL on the DNS record for www.yourdomain.com is low.
3) Deploy several database servers. If you're using MySQL, you can set up a master DB server for writes, and multiple slave DB servers for reads. The slave DBs can sit behind an HAProxy load balancer. In this scenario, you're limited by the capacity of the single master DB server. One thing you can do is to use sharding techniques, meaning you can partition the database into multiple instances that each handle writes for a subset of your application domain. Another thing you can do is to write to local databases deployed on the Web servers, either in memory or on disk, and then periodically write to the master DB server (of course, this assumes that you don't need that data right away; this technique is useful when you have to generate statistics or reports periodically for example).
4) Another way of dealing with databases is to not use them, or at least to avoid the overhead of making a database call each time you need something from the database. A common technique for this is to use memcache. Your application needs to be aware of memcache, but this is easy to implement in all of the popular programming languages. Once implemented, you can have your Web servers first check a value in memcache, and only if it's not there have them hit the database. The more memory you give to the memcached process, the better off you are.
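To illustrate the round-robin DNS approach in 2) above, the records might look something like this sketch of a BIND zone file fragment (the IP addresses are placeholders and the 60-second TTL is just an example):
; hypothetical round-robin A records, one per HAProxy instance, with a low TTL
www    60    IN    A    203.0.113.10
www    60    IN    A    203.0.113.11
www    60    IN    A    203.0.113.12
And for 4) above, the read-through caching pattern is only a few lines in application code. Here is a minimal Python sketch using the python-memcached client; get_user_from_db() and the key format are placeholders standing in for your real data access layer:
# Hypothetical read-through cache: check memcached first, fall back to the database.
import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def get_user(user_id):
    key = 'user:%d' % user_id
    user = mc.get(key)
    if user is None:
        user = get_user_from_db(user_id)   # placeholder for your real DB call
        mc.set(key, user, time=300)        # cache the result for 5 minutes
    return user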
Establish clear measurable goals
The most common reason for scaling an Internet infrastructure is to handle increased Web traffic. However, you need to keep in mind the quality of the user experience, which means that you need to keep the response time of the pages you serve under a certain limit which will hopefully meet and surpass the users' expectations. I found it extremely useful to have a very simple script that measures the response time of certain pages and graphs it on a dashboard-type page (thanks to Mike Todd for the idea and the implementation). As we deployed more and more servers in order to keep up with the demands of increased traffic, we always kept an eye on our goal: keep response time/latency under N milliseconds (N will vary depending on your application). When we saw spikes in the latency chart, we knew we needed to act at some level of our infrastructure. And this brings me to the next point...
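(Before moving on, here is a minimal sketch of that kind of latency check; the URLs and the threshold are placeholders, and the real script fed its numbers into a chart on our dashboard.)
# Minimal latency check sketch; URLs and threshold are placeholders.
import time
import urllib2

URLS = ['http://www.yourdomain.com/', 'http://www.yourdomain.com/some/page']
THRESHOLD_MS = 500.0

for url in URLS:
    start = time.time()
    urllib2.urlopen(url).read()
    elapsed_ms = (time.time() - start) * 1000
    status = 'OK' if elapsed_ms < THRESHOLD_MS else 'SLOW'
    print '%s %s %.1f ms' % (status, url, elapsed_ms)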
Be prepared to quickly identify and eliminate bottlenecks
As I already mentioned in the design section above, any large-scale Internet infrastructure will have different types of servers: web servers, application servers, database servers, memcache servers, and the list goes on. As you scale the servers at each tier/level, you need to be prepared to quickly identify bottlenecks. Examples:
1) Keep track of how many httpd processes are running on your Web servers; this depends on the values you set for MaxClients and ServerLimit in your Apache configuration files. If you're using an HAProxy-based load balancer, this also depends on the connection throttling that you might be doing at the backend server level. In any case, the more httpd processes are running on a given server, the more CPU and RAM they will use up. At some point, the server will run out of resources. At that point, you either need to scale the server up (by deploying to a larger EC2 instance, for example an m1.large with more RAM, or a c1.medium with more CPU), or you need to scale your Web server farm horizontally by adding more Web servers, so the load on each server decreases.
2) Keep track of the load on your database servers, and also of slow queries. A great tool for MySQL database servers is innotop, which allows you to see the slowest queries at a glance. Sometimes all it takes is a slow query to throw a spike into your latency chart (can you tell I've been there, done that?). Also keep track of the number of connections into your database servers. If you use MySQL, you will probably need to bump up the max_connections variable in order to be able to handle an increased number of concurrent connections from the Web servers into the database.
Since we're discussing database issues here, I'd be willing to bet that the single biggest bottleneck in your application will turn out to be at the database layer. That's why it is especially important to design that layer with scalability in mind (think memcache, and load-balanced read-only slaves), and also to monitor the database servers carefully, with an eye towards slow queries that need to be optimized (thanks to Chris Nutting for doing some amazing work in this area!). A sketch of the relevant MySQL settings follows this list.
3) Use your load balancer's statistics page to keep track of things such as concurrent connections, queued connections, HTTP request or response errors, etc. One of your goals should be never to see queued connections, since that means that some user requests couldn't be serviced in time.
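Regarding the MySQL settings mentioned in 2) above, here is a hypothetical my.cnf excerpt showing the kind of knobs involved; the values are examples only and need to be tuned for your own workload:
# Hypothetical my.cnf excerpt (MySQL 5.0-era option names); values are examples only.
[mysqld]
max_connections    = 2000
log_slow_queries   = /var/log/mysql/mysql-slow.log
long_query_time    = 1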
I should mention that a good monitoring system is essential here. We're using Hyperic, and while I'm not happy at all with its limits (in the free version) in defining alerts at a global level, I really like its capabilities in presenting various metrics in both list and chart form: things like Apache bytes and requests served/second, memcached hit ratios, mysql connections, and many other statistics obtained by means of plugins specific to these services.
As you carefully watch various indicators of your systems' health, be prepared to....
Play whack-a-mole for a while, until things get stable
There's nothing like real-world network traffic, and I mean massive traffic -- we're talking hundreds of millions of hits/day -- to exercise your carefully crafted system infrastructure. I can almost guarantee that with all your planning, you'll still feel that a tsunami just hit you, and you'll scramble to solve one issue after another. For example, let's say you notice that your load balancer starts queuing HTTP requests. This means you don't have enough Web servers in the pool. You scramble to add more Web servers. But wait, this increases the number of connections to your database pool! What if you don't have enough servers there? You scramble to add more database servers. You also scramble to increase the memcache settings by giving more memory to memcached, so more items can be stored in the cache. What if you still see requests taking a long time to be serviced? You scramble to optimize slow database queries....and the list goes on.
You'll say that you've done lots of load testing before. This is very good....but it still will not prepare you for the sheer amount of traffic that the internets will throw at your application. That's when all the things I mentioned before -- automated deployment of new instances, charting of the important variables that you want to keep track of, quick identification of bottlenecks -- become very useful.
That's it for this installment. Stay tuned for more lessons learned, as I slowly and sometimes painfully learn them :-) Overall, it's been a blast though. I'm really happy with the infrastructure we've built, and of course with the fact that most if not all of our deployment tools are written in Python.
Tuesday, March 17, 2009
HAProxy and Apache performance tuning tips
I want to start my post by a big shout-out to Willy Tarreau, the author of HAProxy, for his help in fine-tuning one of our HAProxy installations and working with us through some issues we had. Willy is amazingly responsive and obviously lives and breathes stuff related to load balancing, OS and TCP stack tuning, and other arcane subjects ;-)
Let's assume you have a cluster of Apache servers behind an HAProxy and you want to sustain 500 requests/second with low latency per request. First of all, you need to bump up MaxClients and ServerLimit in your Apache configuration, as I explained in another post. In this case you would set both variables to 500. Note that you actually need to stop and start the httpd service, because simply restarting it won't change the built-in limit (which is 256). Also ignore the warning that Apache gives you on startup:
WARNING: MaxClients of 500 exceeds ServerLimit value of 256 servers,
lowering MaxClients to 256. To increase, please see the ServerLimit
directive.
Note that the more httpd processes you have, the more CPU and RAM will be consumed on the server. You need to decide how much to push the envelope in terms of concurrent httpd processes you can sustain on a given server. A good measure is the latency / responsiveness you expect from your Web application. At some point, it will start to suffer, and that will be a sign that you need to add a new Web server to your server farm (of course, this over-simplifies things a bit, since there's always the question of the database layer; I'm assuming you can use memcache to minimize database access.) Here's a good overview of the trade-offs related to MaxClients.
Other Apache configuration variables I've tweaked are StartServers, MinSpareServers and MaxSpareServers. It sometimes pays to bump up the values for these variables, so you can have spare httpd processes waiting around for those peak times when the requests hitting your server suddenly increase. Again, there's a trade-off here between server resources and number of spare httpd processes you want to maintain.
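For reference, here is a hypothetical prefork MPM snippet for the 500-request scenario discussed above; the StartServers/spare-server numbers are illustrative and need to be tuned against your own RAM and latency targets:
# Hypothetical Apache prefork settings for the 500-concurrent-request example.
<IfModule prefork.c>
    StartServers        50
    MinSpareServers     25
    MaxSpareServers     75
    ServerLimit        500
    MaxClients         500
    MaxRequestsPerChild 4000
</IfModule>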
Assuming you fine-tuned your Apache servers, it's time to tweak some variables in the HAProxy configuration. Perhaps the most important ones for our discussion are the number of maximum connections per server (maxconn), httpclose and abortonclose.
It's a good idea to throttle the maximum number of connections per server and set it to a number related to the request/second rate you're shooting for. In our case, that number is 500. Since HAProxy itself needs some connections for healthchecking and other internal bookkeeping, you should set the maxconn per server to something slightly lower than 500. In terms of syntax, I have something similar to this in the backend section of haproxy.cfg:
server server1 10.1.1.1:80 check maxconn 500
I also have the following 2 lines in the backend section:
option abortonclose
option httpclose
According to the official HAProxy documentation, here's what these options do:
option abortonclose
In presence of very high loads, the servers will take some time to respond.
The per-instance connection queue will inflate, and the response time will
increase respective to the size of the queue times the average per-session
response time. When clients will wait for more than a few seconds, they will
often hit the "STOP" button on their browser, leaving a useless request in
the queue, and slowing down other users, and the servers as well, because the
request will eventually be served, then aborted at the first error
encountered while delivering the response.
As there is no way to distinguish between a full STOP and a simple output
close on the client side, HTTP agents should be conservative and consider
that the client might only have closed its output channel while waiting for
the response. However, this introduces risks of congestion when lots of users
do the same, and is completely useless nowadays because probably no client at
all will close the session while waiting for the response. Some HTTP agents
support this behaviour (Squid, Apache, HAProxy), and others do not (TUX, most
hardware-based load balancers). So the probability for a closed input channel
to represent a user hitting the "STOP" button is close to 100%, and the risk
of being the single component to break rare but valid traffic is extremely
low, which adds to the temptation to be able to abort a session early while
still not served and not pollute the servers.
In HAProxy, the user can choose the desired behaviour using the option
"abortonclose". By default (without the option) the behaviour is HTTP
compliant and aborted requests will be served. But when the option is
specified, a session with an incoming channel closed will be aborted while
it is still possible, either pending in the queue for a connection slot, or
during the connection establishment if the server has not yet acknowledged
the connection request. This considerably reduces the queue size and the load
on saturated servers when users are tempted to click on STOP, which in turn
reduces the response time for other users.
option httpclose
As stated in section 2.1, HAProxy does not yet support the HTTP keep-alive
mode. So by default, if a client communicates with a server in this mode, it
will only analyze, log, and process the first request of each connection. To
workaround this limitation, it is possible to specify "option httpclose". It
will check if a "Connection: close" header is already set in each direction,
and will add one if missing. Each end should react to this by actively
closing the TCP connection after each transfer, thus resulting in a switch to
the HTTP close mode. Any "Connection" header different from "close" will also
be removed.
It seldom happens that some servers incorrectly ignore this header and do not
close the connection even though they reply "Connection: close". For this
reason, they are not compatible with older HTTP 1.0 browsers. If this
happens it is possible to use the "option forceclose" which actively closes
the request connection once the server responds.
And now for something completely different.....TCP stack tuning! Even with all the tuning above, we were still seeing occasional high latency numbers. Willy Tarreau to the rescue again....he was kind enough to troubleshoot things by means of the haproxy log and a tcpdump. It turned out that some of the TCP/IP-related OS variables were set too low. You can find out what those values are by running:
sysctl -a | grep ^net
In our case, the main one that was out of tune was:
net.ipv4.tcp_max_syn_backlog = 1024
Because of this, when there were more than 1,024 concurrent sessions on the machine running HAProxy, the OS had to recycle through the SYN backlog, causing the latency issues. Here are all the variables we set in /etc/sysctl.conf at the advice of Willy:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65023
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 3
net.core.somaxconn = 10000
To have these values take effect, you need to run 'sysctl -p'.
That's it for now. As I continue to use HAProxy in production, I'll report back with other tips/tricks/suggestions.
Wednesday, March 04, 2009
HAProxy, X-Forwarded-For, GeoIP, KeepAlive
I know the title of this post doesn't make much sense, I wrote it that way so that people who run into issues similar to mine will have an easier time finding it.
Here's a mysterious issue that I recently solved with the help of my colleague Chris Nutting:
1) Apache/PHP server sitting behind an HAProxy instance
2) MaxMind's GeoIP module installed in Apache
3) Application making use of the geotargeting features offered by the GeoIP module was sometimes displaying those features in a drop-down, and sometimes not
It turns out that the application was using the X-Forwarded-For headers in the HTTP requests to pass the real source IP of the request to the mod_geoip module and thus obtain geotargeting information about that IP. However, mysteriously, HAProxy was sometimes (once out of every N requests) not sending the X-Forwarded-For headers at all. Why? Because KeepAlive was enabled in Apache, so HAProxy was sending those headers only on the first request of the HTTP connection that was being "kept alive". Subsequent requests in that connection didn't have those headers set, so those requests weren't identified properly by mod_geoip.
The solution in this case was to disable KeepAlive in Apache. Willy Tarreau, the author of HAProxy, also recommends setting 'option httpclose' in the HAProxy configuration file. Here's an excerpt from the official HAProxy documentation:
option forwardfor [ except <network> ] [ header <name> ]
....
It is important to note that as long as HAProxy does not support keep-alive
connections, only the first request of a connection will receive the header.
For this reason, it is important to ensure that "option httpclose" is set
when using this option.
I hope this post will be of some use to people who might run into this issue.
Tuesday, February 24, 2009
You're not a cloud provider if you don't provide an API
Cloud computing is all the rage these days, with everybody and their brother claiming to be a 'cloud provider'. Just because hosting companies have a farm of virtual servers that they can parcel out to their customers, it doesn't mean that they are operating 'in the cloud'. For that to be the case, they need to offer a solid API that allows their customers to manage resources such as virtual server instances, storage mounts, IP addresses, load balancer pools, firewall rules, etc.
A short discussion on 'XaaS' nomenclature is in order here: 'aaS' stands for 'as a Service', and X can take various values, for example P==Platform, S==Software, I==Infrastructure. You will see these acronyms in pretty much every industry-sponsored article about cloud computing. Pundits seem to love this kind of stuff. When I talk about cloud providers in this post, I mean providers of 'Infrastructure as a Service', things like the ones I mentioned above -- virtual servers, networking and storage resources, in short the low-level plumbing of an infrastructure.
A good example of 'Platform as a Service' is Google AppEngine, which offers both a development environment (right now Python-specific), and an API to interact with the 'Google cloud' when deploying your GAE application.
'Software as a Service' is pretty much what 'ASP' used to be in the dot com days (ASP == Application Service Provider if you don't remember your acronyms). The poster child for SaaS these days seems to be salesforce.com. I do however emphasize that one significant difference between SaaS and ASP is that SaaS providers DO offer an API for your application to interact with the resources they expose.
So...the common thread between the XaaS offerings is the existence of an API which allows you, as a systems and/or application architect, to interact with and manage the resources offered by the particular provider.
I've been using two cloud APIs here at OpenX, one from AppNexus and one from Amazon EC2. The AppNexus API allows you to reserve physical servers, start up, shut down and delete virtual instances on each server, clone a virtual instance, manage load balancer pools and SSL certificates at the LB level, etc. In short, it's a very solid and easy to use API.
The Amazon EC2 API is more fine grained than the one from AppNexus, which can be an advantage, but also makes it hard to coordinate the management of various resources. For example, to launch an EC2 instance you first need to create a keypair, potentially a security group, maybe an EBS volume and an elastic IP, and only then you can tie everything together via yet other EC2 API calls. For this reason, we're building our own tools around the Amazon API, tools which allow us to deploy an instance with all its associated resources via a single command-line script (and yes, we call this collection of tools the MCP). We're also using slack to deploy specific packages and applications to each instance we launch, but that's a topic for another post.
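To give a flavor of the coordination involved (this is not our MCP tooling, just a hypothetical sketch using boto; the AMI ID, names, and sizes are placeholders), launching one instance with its associated resources looks roughly like this:
# Hypothetical sketch of the separate EC2 API calls needed for one instance.
import boto

conn = boto.connect_ec2()
conn.create_key_pair('app-keypair')
group = conn.create_security_group('app-group', 'application servers')
group.authorize('tcp', 80, 80, '0.0.0.0/0')
reservation = conn.run_instances('ami-12345678',
                                 key_name='app-keypair',
                                 security_groups=['app-group'],
                                 instance_type='m1.small')
instance = reservation.instances[0]
volume = conn.create_volume(50, instance.placement)   # 50 GB EBS volume, same zone
address = conn.allocate_address()                      # Elastic IP
# ...wait for the instance to reach the 'running' state, then tie it all together:
conn.attach_volume(volume.id, instance.id, '/dev/sdf')
conn.associate_address(instance.id, address.public_ip)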
So what does all this mean to you as a systems or application architect? For a system administrator, I think it means that you need to shore up your programming skills so that you will be able to take advantage of these APIs and automate the deployment, testing and scaling of your infrastructure. For an application architect, it means that you need to shore up your sysadmin skills so you can understand the lower-level resources exposed by cloud APIs and use them to your full advantage. I think the future is bright for people who possess both sets of skills.
Tuesday, February 17, 2009
Helping the 'printable world wide web' movement
Alexander Artemenko pointed out to me that my blog lacks a sane CSS stylesheet for printing. He was nice enough to provide one for me (since I'm no CSS wizard), and I inserted it into my blog's template. Alexander runs a campaign to convince bloggers to make their blog content printable.
BTW, here's all I had to add to my Blogger template to make the content printable:
<style type="text/css">
@media print {
#sidebar, #navbar-iframe, #blog-header,
#comments h4, #comments-block, #footer,
span.statcounter, #b-backlink {display: none;}
#wrap, #content, #main-content {width: 100%; margin: 0; background: #FFFFFF;}
}
</style>
Wednesday, February 04, 2009
Load Balancing in Amazon EC2 with HAProxy
Until the time comes when Amazon will offer a load balancing service in their EC2 environment, people are forced to use a software-based load balancing solution. One of the most common out there is HAProxy. I've been looking at it for the past 2 months or so, and recently we started to use it in production here at OpenX. I am very impressed with its performance and capabilities. I'll explore here some of the functionality that HAProxy offers, and also discuss some of the non-obvious aspects of its configuration.
Installation
I installed HAProxy via yum. Here's the version that was installed using the default CentOS repositories on a CentOS 5.x box:
# yum list installed | grep haproxy
haproxy.i386 1.3.14.6-1.el5 installed
The RPM installs an init.d service called haproxy that you can use to start/stop the haproxy process.
Basic Configuration
In true Unix fashion, all configuration is done via a text file: /etc/haproxy/haproxy.cfg. It's very important that you read the documentation for the configuration file. The official documentation for HAProxy 1.3 is here.
Emulating virtual servers
In version 1.3, you can specify a frontend section, which defines an IP address/port pair for requests coming into the load balancer (think of it as a way to specify a virtual server/virtual port pair on a traditional load balancer), and multiple backend sections for each frontend, which correspond to the real IP addresses and ports of the backend servers handling the requests. If you can assign multiple external IP addresses to your HAProxy server, then you can have each one of these IPs function as a virtual server (via a frontend declaration), sending traffic to real servers declared in a backend.
However, one fairly large limitation of EC2 instances is that you only get one external IP address per instance. This means that you can have HAProxy listen on port 80 on a single IP address in EC2. How then can you have multiple 'virtual servers' on an EC2 HAProxy load balancer? The answer is in a new feature of HAProxy called ACLs.
Here's what the official documentation says:
2.3) Using ACLs
---------------
The use of Access Control Lists (ACL) provides a flexible solution to perform
content switching and generally to take decisions based on content extracted
from the request, the response or any environmental status. The principle is
simple :
- define test criteria with sets of values
- perform actions only if a set of tests is valid
The actions generally consist in blocking the request, or selecting a backend.
So let's say for example that you want to handle both www.example1.com and www.example2.com using the same HAProxy instance, but you want to load balance traffic for www.example1.com to server1 and server2 with IP addresses 192.168.1.1 and 192.168.1.2, while traffic for www.example2.com gets load balanced to server3 and server4 with IP addresses 10.0.0.3 and 10.0.0.4. Traffic for other domains will be sent to a default backend.
First, you define a frontend section in haproxy.cfg similar to this:
frontend myfrontend *:80
log global
maxconn 25000
option forwardfor
acl acl_example1 url_sub example1
acl acl_example2 url_sub example2
use_backend example1_farm if acl_example1
use_backend example2_farm if acl_example2
default_backend default_farm
This tells haproxy that there are 2 ACLs defined -- one called acl_example1, which is triggered if the incoming HTTP request is for a URL that contains the expression 'example1', and one called acl_example2, which is triggered if the incoming HTTP request is for a URL that contains the expression 'example2'.
If acl_example1 is triggered, the backend used will be example1_farm. If acl_example2 is triggered, the backend used will be example2_farm. If no acl is triggered, the default backend used will be default_farm.
This is the simplest form of ACLs. HAProxy supports many more, and you're strongly advised to read the ACL section in the documentation for a more in-depth discussion. However, the URL-based ACLs are very useful especially in an EC2 environment.
The backend sections of haproxy.cfg will look similar to this:
backend example1_farm
mode http
balance roundrobin
server server1 192.168.1.1:80 check
server server2 192.168.1.2:80 check
backend example2_farm
mode http
balance roundrobin
server server3 10.0.0.3:80 check
server server4 10.0.0.4:80 check
backend default_farm
mode http
balance roundrobin
server server5 192.168.1.5:80 check
server server6 192.168.1.6:80 check
Logging
You can have haproxy log to syslog, but first you need to allow syslog to receive UDP traffic from 127.0.0.1 on port 514. I'll discuss syslog-ng here, with its configuration file in /etc/syslog-ng/syslog-ng.conf. To allow the UDP traffic I mention, add the line 'udp(ip(127.0.0.1) port(514));' to the source s_sys section, which in my case looks like this:
source s_sys {
file ("/proc/kmsg" log_prefix("kernel: "));
unix-stream ("/dev/log");
internal();
udp(ip(127.0.0.1) port(514));
};
Also add a filter for facility local0:
filter f_filter9 { facility(local0); };
And finally, associate that filter with the d_mesg destination, which sends messages to /var/log/messages:
log { source(s_sys); filter(f_filter9); destination(d_mesg); };
Restart syslog-ng via its init.d script.
Now for the HAProxy configuration -- you need to have a line similar to this in the 'global' section of haproxy.cfg:
global
log 127.0.0.1 local0 info
This tells haproxy to log to facility 'local0' on the localhost using the severity 'info'. You could send logs to a remote syslog server just as well.
Once you define this in the global section, you can specify the logging mechanism either in the defaults section (which means that all frontends will log this way), or on a frontend-by-frontend basis. If you want to have it in the defaults section, just write:
defaults
log global
Once you restart haproxy, you should see messages like this in /var/log/messages:
Feb 2 22:39:49 127.0.0.1 haproxy[19150]: Connect from A.B.C.D:44463 to 10.0.0.1:80 (your_frontend_name/HTTP)
However, if you're handling HTTP traffic and would like to see the exact HTTP requests handled by HAProxy, you also need to add this line, either to the defaults section or to a specific frontend:
option httplog
In this case, the log will contain lines that look like a regular Apache combined log line.
A caveat: if you do enable logging in httplog mode, make sure /var has lots of disk space. If your HAProxy will handle a lot of traffic, the messages file will become very large, very fast. Just don't have /var be part of the typically small / partition, or you can be in a world of trouble.
Logging the client source IP in the backend web logs
One issue with load balancers and reverse proxies is that the backend servers will see traffic as always originating from the IP address of the LB or reverse proxy. This is obviously a problem when you're trying to get stats from your web logs. To mitigate this issue, many LBs/proxies use the X-Forwarded-For header to send the IP address of the client to the destination server. HAProxy offers this functionality via the forwardfor option. You can simply declare
option forwardfor
in your backend, and all your backend servers will receive the X-Forwarded-For header.
Of course, you also have to tell your Web server to handle this header in its log file. In Apache you need to modify the LogFormat directive and replace %h with %{X-Forwarded-For}i.
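For example, a combined-style format that logs the forwarded client IP might look like this (the format name 'proxy_combined' is just an illustration):
# Log the client IP from X-Forwarded-For instead of the load balancer's IP.
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy_combined
CustomLog logs/access_log proxy_combined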
SSL
To handle SSL traffic in HAProxy, you need 3 things:
1) Define a frontend with a unique name which handles *:443
2) Send traffic to real_server_IP_1:443 through real_server_IP_N:443 in the backend(s) associated with the frontend
3) Specify 'mode tcp' instead of 'mode http' both in the frontend section and in the backend section(s) which handle port 443. Otherwise you won't see any SSL traffic hitting your real servers, and you'll wonder why....
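Putting the three together, a minimal SSL pass-through configuration might look like this sketch (names and IP addresses are placeholders):
frontend ssl_frontend *:443
mode tcp
default_backend ssl_farm
backend ssl_farm
mode tcp
balance roundrobin
server server1 192.168.1.1:443 check
server server2 192.168.1.2:443 check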
Load balancing algorithms
HAProxy can handle several load balancing algorithms:
- round-robin: requests are rotated among the servers in the backend -- note that servers declared in the backend section also accept a weight parameter which specifies their relative weight in that backend; the round-robin algorithm will respect that weight ratio
- leastconn: the request is sent to the server with the lowest number of connections; round-robin is used if servers are similarly loaded
- source: a hash of the source IP is divided by the total weight of the running servers to determine which server will receive the request; this ensures that clients from the same IP address always hit the same server, which is a poor man's session persistence solution
- uri: the part of the URL up to a question mark is hashed and used to choose a server that will handle the request; this is useful when you want certain sub-parts of your web site to be served by certain servers (this is used with proxy caches to maximize the cache hit rate)
- url_param: can be used to check certain parts of the URL, for example values sent via POST requests; for example a request which specifies a user_id parameter with a certain value can get directed to the same server using the url_param method -- so this is another form of achieving session persistence in some cases (see the documentation for more details)
If you're OK with the fact that not all client browsers accept cookies, and you still want to use cookies as a session persistence mechanism, then HAProxy offers an easy way to do so. If you add this line to the backend section:
cookie SERVERID insert nocache indirect
then you're telling HAProxy to insert a cookie named SERVERID in the HTTP response; the cookie is sent to the client browser via a Set-Cookie header in the response, and sent back by the client in a Cookie header in all subsequent requests. Note that this cookie is only a session cookie, and will not be written to disk by the client browser. For this reason, and for issues related to caching, the documentation recommends specifying the other 2 options 'nocache' and 'indirect'. In particular, 'indirect' means that the cookie will be removed from the HTTP request once it is processed by HAProxy, so your application running on the backend servers will never see it.
Once you define the cookie, you need to associate it with the servers in the backend, like this:
server server1 10.1.1.1:80 cookie server01 check
server server2 10.1.1.2:80 cookie server02 check
If a client request gets sent to server serverN initially, HAProxy will insert a SERVERID cookie corresponding to serverN in the response. In the requests that follow, the client will send back this SERVERID in the cookie and hence will be directed to the same server for the duration of the session.
Server health checks
HAProxy verifies the health of the servers declared in the backend section by sending them periodic HTTP requests. You need to specify 'check' in the server declaration line. Here is the appropriate section from the official documentation:
check
This option enables health checks on the server. By default, a server is
always considered available. If "check" is set, the server will receive
periodic health checks to ensure that it is really able to serve requests.
The default address and port to send the tests to are those of the server,
and the default source is the same as the one defined in the backend. It is
possible to change the address using the "addr" parameter, the port using the
"port" parameter, the source address using the "source" address, and the
interval and timers using the "inter", "rise" and "fall" parameters. The
request method is defined in the backend using the "httpchk", "smtpchk",
and "ssl-hello-chk" options. Please refer to those options and parameters for
more information.
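For example, to turn the check into a real HTTP request and to tune its timing, you could add something like the following to the backend section (the URL is a placeholder -- point it at any lightweight page your application can serve; 'inter 2000 rise 2 fall 3' means check every 2 seconds, mark a server up after 2 successful checks and down after 3 failed ones):
option httpchk HEAD /index.html HTTP/1.0
server server1 10.1.1.1:80 cookie server01 check inter 2000 rise 2 fall 3
server server2 10.1.1.2:80 cookie server02 check inter 2000 rise 2 fall 3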
Performance tuning
Section 1.2 of the official documentation details the variables you can set to tweak maximum performance out of your HAProxy. The only parameter I found critical so far is maxconn, which in some of the sample configuration files was set to 2,000. This means that if HAProxy is hit with more than 2,000 concurrent connections, only the first 2,000 will be serviced, and the subsequent ones will be queued. For this reason, I recommend you set maxconn to a high number (such as 25,000 for example) in all the sections of your haproxy.cfg file: default, frontend and backend.
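As a rough sketch (the frontend and backend names are placeholders, and 25,000 is just the example value from above), the relevant lines in haproxy.cfg would look something like this:
global
maxconn 25000
defaults
maxconn 25000
frontend www
bind :80
maxconn 25000
default_backend my_website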
From what I've seen so far, the performance of HAProxy itself is very satisfactory. Even on an EC2 m1.small instance, HAProxy took less than 1% CPU for a web site we maintain that was hit with around 20,000 connections. I can guarantee that you will discover many other bottlenecks in your infrastructure long before HAProxy itself becomes your bottleneck. The only caveat in all this is the maxconn parameter above, which you do need to set to a high value to avoid unnecessary throttling of connections at the HAProxy layer.
Utilization statistics
HAProxy offers very nice utilization statistics, with tables showing the servers in all declared backends. Here's what these tables look like:
[HAProxy stats table for the 'my_website' backend: for each server (server01, server02) it shows queue depth, current/max/total sessions, bytes in and out, denied requests and responses, request/connection/response errors, retries and redispatches, and status columns such as uptime, weight, active/backup role, failed health checks, downtime and throttling.]
To enable stats, add lines such as these to either the 'defaults' section or to a specific backend section:
stats enable
stats uri /lb?stats
stats realm Haproxy\ Statistics
stats auth myusername:mypassword
Then hit http://external.ip.of.haproxy/lb?stats and you'll be presented with a basic HTTP authentication dialog. Log in with the credentials you specified.
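If you'd rather consume these statistics from a script than from a browser, newer HAProxy versions (as far as I can tell) also export the same data in CSV format when you append ';csv' to the stats URI, for example:
curl -u myusername:mypassword 'http://external.ip.of.haproxy/lb?stats;csv'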
High-availability strategies
In an ideal situation, you would have 2 HAProxy instances using a heartbeat-type protocol and sharing an external IP address. In case one of them goes down, the other one would assume the IP and your site will be available at all times. You could use Linux-HA, or Wackamole and the Spread toolkit. However, this is not possible in Amazon EC2 because IP addresses cannot be shared among instances in the manner that heartbeat-type protocols expect.
What you can do instead is to use an Elastic IP and associate it with your HAProxy instance. Then you can have another stand-by HAProxy instance kept in sync with the live one (only the haproxy.cfg needs to be rsync-ed across). Your monitoring system can then detect when the live HAProxy instance goes down, and automatically assign the Elastic IP address to the other instance using for example the EC2 API Tools command ec2-associate-address.
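The re-association itself is a one-liner; something like this (the Elastic IP and instance ID below are placeholders), run by your monitoring system, would do it:
ec2-associate-address 75.101.123.45 -i i-0123abcd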
Tuesday, January 27, 2009
Tadalist -- simple but powerful task management
I've been using the free Tadalist service lately to keep track of my TODO lists. It has a deceptively simple user interface, but for me at least it turned out to be a powerful ally in keeping on top of my ever-increasing TODO lists.
In Tadalist you can only do a few things: create a list, add an item to a list, edit the list title, edit the description of an item, check off an item as done, and reorder the items in the list. It turns out this is really all you need. I especially like the feeling of checking off an item and seeing it drop to the bottom of the list, in smaller font, joining the list of tasks that are DONE! It's almost as addictive as seeing those dots when you run unit tests. Reordering items is also a very nice feature, because an item that wasn't so hot yesterday can become really critical today, in which case you want it at the top of the list.
One feature I'd like to see is for checked off items to also get a timestamp, so you can go back and see when exactly you completed a given task.
If you're not using any task management software (in which case I hope you're still using old-school pen and paper), then give Tadalist a try.
BTW -- what task management software have YOU used successfully? Please leave a comment.
Saturday, January 24, 2009
Book review: "Pro Django" by Marty Alchin
I volunteered to write a book review for Apress and I chose "Pro Django" by Marty Alchin (of course, I received the book for free at that point). Here's my review:
If you are serious about developing Web applications in Django, then "Pro Django" will be a great addition to your technical library. Note, however, that the "Pro" in the title really means "professional", "in-depth", at times even "obscure" -- so please, do not pick up this book if you're just starting out with Django. To really get the most out of this book, you need to already have at least one, and preferably several Django applications under your belt.
I personally just finished a small fun project for my daughter's 8th grade Science Fair: a Web site written of course in Django where her friends can take a fun science-related quiz and see if they improve their score the second time around, after being told the correct answer for each question. It was my first Django application, and I used the online documentation and tutorial (both very good), as well as the online Django book. I also used Sams' "Teach yourself Django in 24 hours" by Brad Dayley, which was very helpful for a beginner like me.
I say all this because in reading "Pro Django", you need to be already familiar with the core concepts of Django: models, views, templates, forms, and the all-important URL configuration. You won't get a feel for these concepts unless you actually start writing a Web application and understand the hard way how everything fits together. Once you have this understanding, and if you want to continue on the path of creating more Django apps, it's time for you to pick up "Pro Django".
Marty Alchin wastes no time in delving into aspects of Python which are typically not used to their full potential by many people (including me): metaclasses, introspection, decorators, descriptors. In fact, the themes of introspection, customization and extension (which all take advantage of the dynamic nature of Python) keep coming up in almost every chapter of the book.
For example, the 'Models' chapter shows how to subclass model fields and how the use of metaclasses allows a field to know its name, and the class it was assigned to. The chapter also talks about the nifty technique of creating models dynamically at runtime. The 'URLs and Views' chapter goes into the gory details of the Django URL configuration mechanism, and shows how to use decorators to make views as generic as possible.
My favorite chapter was 'Handling HTTP'. It exemplifies what for me is the best part about Alchin's book: showing readers where and how to insert their own advanced processing code into the hooks provided by Django, without disturbing the flow of the framework. This is typically one of the hard parts of learning a Web framework, and Marty Alchin does a great job of explaining how to achieve a maximum of effect with a minimum of effort in this area, for example by writing your own middleware modules and inserting them into Django.
I also liked the last two chapters, 'Coordinating applications' and 'Enhancing applications', which show practical examples of code aggregated in mini-applications. In fact, this is also the main gripe I have about this book: I wish the author used more mini-applications throughout the book to explain the advanced concepts he described. He did show code snippets for each concept, but they were all isolated, and sometimes hard to place into the context of an application. I realize that space was limited, but it would have been so much nicer to see a real application being built and described throughout the book, with more and more functionality added at each stage.
Overall, I really enjoyed reading "Pro Django". However, reading such a book is just a start. What I really need to do is to start writing code and applying some of the new techniques I learned. I can't wait to do it!
Wednesday, January 21, 2009
Watch that Apache KeepAlive setting!
The Apache KeepAlive directive specifies that TCP/IP connections from clients to the Apache server are to be kept 'alive' for a given duration specified by the value of KeepAliveTimeout (the default is 15 seconds). This is useful when you serve out heavy HTML with embedded images or other resources, since browsers will open just one TCP/IP connection to the Apache server and all the resources from that page will be retrieved via that connection.
If, however, your Apache server handles small individual resources (such as images), then KeepAlive is overkill, since it will make every TCP connection linger for N seconds. Given a lot of clients, this can quickly saturate your Apache server in terms of network connections.
So...if you have a decent server that doesn't seem to be overloaded in terms of CPU/memory, yet Apache is slow-to-unresponsive, check out the KeepAlive directive and try setting it to Off. Note that the default value is On.
More Apache performance tuning tips are in the official Apache documentation.
Sunday, January 04, 2009
Happy New Year and....Teach Me Web Testing!
Happy New Year everybody! I hope 2009 won't rush by as quickly as 2008 did...
Now for the 'Teach Me Web Testing' part: Steve Holden graciously offered to be the host of an Open Space at PyCon 2009 on this topic. Steve started the 'Teach Me...' series at the last PyCon, with his now famous 'Teach Me Twisted' session.
For this format to work, we need to put together an audience which is formed of at least 3 types of people:
1) people interested in learning about Web testing in Python
2) people who write Python Web testing tools for fun and profit
3) people who use Python Web testing tools extensively for fun and profit
My role here is to rally people in categories 2 and 3. So if you're either a Web testing tool author or somebody who uses Web testing tools extensively in your job, please either comment on this post, or send me email at grig at gheorghiu dot net and let me know if you'd be interested in attending this Open Space session. Knowing Steve, I can guarantee it will be LOTS of fun.
Tuesday, December 16, 2008
Some issues when restoring files using duplicity
I blogged a while back about how to do incremental encrypted backups to S3 using duplicity. I've been testing the restore procedure for some of my S3 backups, and I had a problem with the way duplicity deals with temporary directories and files it creates during the restore.
By default, duplicity will use the system default temporary directory, which on Unix is usually /tmp. If you have insufficient disk space in /tmp for the files you're trying to restore from S3, the restore operation will eventually fail with "IOError: [Errno 28] No space left on device".
One thing you can do is create another directory on a partition with lots of disk space, and specify that directory in the duplicity command line using the --tempdir command line option. Something like: /usr/local/bin/duplicity --tempdir=/lotsofspace/temp
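For example, a full restore command with the alternate temporary directory would look something along these lines (the bucket name and destination path are made up):
/usr/local/bin/duplicity --tempdir=/lotsofspace/temp s3+http://mybackupbucket/mybackups /lotsofspace/restored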
However, it turns out that this is not sufficient. There's still a call to os.tmpfile() buried in the patchdir.py module installed by duplicity. Consequently, duplicity will still try to create temporary files in /tmp, and the restore operation will still fail. As a workaround, I solved the issue in a brute-force kind of way by editing /usr/local/lib/python2.5/site-packages/duplicity/patchdir.py (the path is obviously dependent on your Python installation directory) and replacing the line:
tempfp = os.tmpfile()
with the line:
tempfp, filename = tempdir.default().mkstemp_file()
(I also needed to import tempdir at the top of patchdir.py; tempdir is a module which is part of duplicity and which deals with temporary file and directory management -- I guess the author of duplicity just forgot to replace the call to os.tmpfile() with the proper calls to the tempdir methods such as mkstemp_file).
This solved the issue. I'll try to open a bug somehow with the duplicity author.
Friday, December 12, 2008
Working with Amazon EC2 regions
Now that Amazon offers EC2 instances based in data centers in Europe, there is one more variable that you need to take into account when using the EC2 API: the concept of 'region'. Right now there are 2 regions to choose from: us-east-1 (based of course in the US on the East Coast), and the new region eu-west-1 based in Western Europe. Knowing Amazon, they will probably launch data centers in other regions across the globe -- Asia, South America, etc.
Each region has several availability zones. You can see the current ones in this nice article from the AWS Developer Zone. The default region is us-east-1, with 3 availability zones (us-east-1a, 1b and 1c). If you don't specify a region when you call an EC2 API tool, then the tool will query the default region. That's why I was baffled when I tried to launch a new AMI in Europe; I was calling 'ec2-describe-availability-zones' and it was returning only the US ones. After reading the article I mentioned, I realized I need to have 2 versions of my scripts: the old one I had will deal with the default US-based region, and the new one will deal with the Europe region by adding '--region eu-west-1' to all EC2 API calls (you need the latest version of the EC2 API tools from here).
You can list the zones available in a given region by running:
# ec2-describe-availability-zones --region eu-west-1
AVAILABILITYZONE eu-west-1a available eu-west-1
AVAILABILITYZONE eu-west-1b available eu-west-1
Note that all AWS resources that you manage belong to a given region. So if you want to launch an AMI in Europe, you have to create a keypair in Europe, a security group in Europe, find available AMIs in Europe, and launch a given AMI in Europe. As I said, all this is accomplished by adding '--region eu-west-1' to all EC2 API calls in your scripts.
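To make this concrete, here is the kind of sequence I would use to bring up an instance in Europe (the keypair name is made up, and ami-xxxxxxxx stands for whatever European AMI ID you pick):
# ec2-add-keypair mysite-eu-web01.keypair --region eu-west-1 > ~/.ssh/mysite-eu-web01.pem
# ec2-describe-images -o amazon --region eu-west-1
# ec2-run-instances ami-xxxxxxxx -k mysite-eu-web01.keypair --instance-type m1.small -z eu-west-1a --region eu-west-1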
Another thing to note is that the regions are separated in terms of internal DNS too. While you can access AMIs within the same zone based on their internal DNS names, this access doesn't work across regions. You need to use the external DNS name of an instance in Europe if you want to ssh into it from an instance in the US (and you also need to allow the external IP of the US instance to access port 22 in the security policy for the European instance.)
All this introduces more headaches from a management/automation point of view, but the benefits obviously outweigh the cost. You get low latency for your European customers, and you get more disaster recovery options.
Thursday, December 11, 2008
Deploying EC2 instances from the command line
I've been doing a lot of work with EC2 instances lately, and I wrote some simple wrappers on top of the EC2 API tools provided by Amazon. These tools are Java-based, and I intend to rewrite my utility scripts in Python using the boto library, but for now I'm taking the easy way out by using what Amazon already provides.
After downloading and unpacking the EC2 API tools, you need to set the following environment variables in your .bash_profile file:
export EC2_HOME=/path/to/where/you/unpacked/the/tools/api
export EC2_PRIVATE_KEY=/path/to/pem/file/containing/your/ec2/private/key
export EC2_CERT=/path/to/pem/file/containing/your/ec2/cert
You also need to add $EC2_HOME/bin to your PATH, so the command-line tools can be found by your scripts.
At this point, you should be ready to run for example:
# ec2-describe-images -o amazon
which lists the AMIs available from Amazon.
If you manage more than a handful of EC2 AMIs (Amazon Machine Images), it quickly becomes hard to keep track of them. When you look at them for example using the Firefox Elasticfox extension, it's very hard to tell which is which. One solution I found to this is to create a separate keypair for each AMI, and give the keypair a name that specifies the purpose of that AMI (for example mysite-db01). This way, you can eyeball the list of AMIs in Elasticfox and make sense of them.
So the very first step for me in launching and deploying a new AMI is to create a new keypair, using the ec2-add-keypair API call. Here's what I have, in a script called create_keypair.sh:
# cat create_keypair.sh
#!/bin/bash
KEYNAME=$1
if [ -z "$KEYNAME" ]
then
echo "You must specify a key name"
exit 1
fi
ec2-add-keypair $KEYNAME.keypair > ~/.ssh/$KEYNAME.pem
chmod 600 ~/.ssh/$KEYNAME.pem
Now I have a pem file called $KEYNAME.pem containing my private key, and Amazon has my public key called $KEYNAME.keypair.
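For example (mysite-db01 is just an illustrative name for a keypair meant for a database server):
# ./create_keypair.sh mysite-db01
# ls -l ~/.ssh/mysite-db01.pem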
The next step for me is to launch an 'm1.small' instance (the smallest instance you can get from EC2) whose AMI ID I know in advance (it's a 32-bit Fedora Core 8 image from Amazon with an AMI ID of ami-5647a33f). I am also using the key I just created. My script calls the ec2-run-instances API.
# cat launch_ami_small.sh
#!/bin/bash
KEYNAME=$1
if [ -z "$KEYNAME" ]
then
echo "You must specify a key name"
exit 1
fi
# We launch a Fedora Core 8 32 bit AMI from Amazon
ec2-run-instances ami-5647a33f -k $KEYNAME.keypair --instance-type m1.small -z us-east-1a
Note that the script makes some assumptions -- such as the fact that I want my AMI to reside in the us-east-1a availability zone. You can obviously add command-line parameters for the availability zone, and also for the instance type (which I intend to do when I rewrite this in Python).
Next, I create an EBS volume which I will attach to the AMI I just launched. My create_volume.sh script takes an optional argument which specifies the size in GB of the volume (and otherwise sets it to 50 GB):
# cat create_volume.sh
#!/bin/bash
SIZE=$1
if [ -z "$SIZE" ]
then
SIZE=50
fi
ec2-create-volume -s $SIZE -z us-east-1a
The volume should be created in the same availability zone as the instance you intend to attach it to -- in my case, us-east-1a.
My next step is to attach the volume to the instance I just launched. For this, I need to specify the instance ID and the volume ID -- both values are returned in the output of the calls to ec2-run-instances and ec2-create-volume respectively.
Here is my script:
# cat attach_volume_to_ami.sh
#!/bin/bash
VOLUME_ID=$1
AMI_ID=$2
if [ -z "$VOLUME_ID" ] || [ -z "$AMI_ID" ]
then
echo "You must specify a volume ID followed by an AMI ID"
exit 1
fi
ec2-attach-volume $VOLUME_ID -i $AMI_ID -d /dev/sdh
This attaches the volume I just created to the AMI I launched and makes it available as /dev/sdh.
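For example, with made-up IDs of the kind returned by ec2-create-volume and ec2-run-instances:
# ./attach_volume_to_ami.sh vol-1a2b3c4d i-9f8e7d6c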
The next script I use does a lot of stuff. It connects to the new AMI via ssh and performs a series of commands:
* format the EBS volume /dev/sdh as an ext3 file system
* mount /dev/sdh as /var2, and copy the contents of /var to /var2
* move /var to /var.orig, create new /var
* unmount /var2 and re-mount /dev/sdh as /var
* append the mounting as /dev/sdh as /var to /etc/fstab so that it happens upon reboot
Before connecting via ssh to the new AMI, I need to know its internal DNS name or IP address. I use ec2-describe-instances to list all my running AMIs, then I copy and paste the internal DNS name of my newly launched instance (which I can isolate because I know the keypair name it runs with).
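A quick way to do the copy-and-paste part (using the made-up keypair name from my earlier example) is to filter the ec2-describe-instances output by the keypair name:
# ec2-describe-instances | grep mysite-db01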
Here is the script which formats and mounts the new EBS volume. The effect is that /var is now mapped to a persistent EBS volume, so if I install MySQL for example, the /var/lib/mysql directory (where the data resides by default in Fedora/CentOS) will automatically be persistent. All this is done without interactively logging in to the new instance, so it can easily be scripted as part of a larger deployment procedure.
# cat format_mount_ebs_as_var_on_ami.sh
#!/bin/bash
AMI=$1
KEYNAME=$2
if [ -z "$AMI" ] || [ -z "$KEY" ]
then
echo "You must specify an AMI DNS name or IP followed by a keypair name"
exit 1
fi
CMD='mkdir /var2; mkfs.ext3 /dev/sdh; mount -t ext3 /dev/sdh /var2; \
mv /var/* /var2/; mv /var /var.orig; mkdir /var; umount /var2; \
echo "/dev/sdh /var ext3 defaults 0 0" >>/etc/fstab; mount /var'
ssh -i ~/.ssh/$KEYNAME.pem root@$AMI "$CMD"
That's about it for the bare-bones stuff you have to do. I purposely kept my scripts simple, since I use them more to remember what EC2 API tools I need to run than anything else. I don't do a lot of command-line option stuff and error-checking stuff, but they do their job.
If you run scripts similar to what I have, you should have at this point a running AMI with a 50 GB EBS volume mounted as /var. Total running time of all these scripts -- 5 minutes at most.
As soon as I have a nicer Python script which will do all this and more, I'll post it here.
Thursday, December 04, 2008
New job at OpenX
I meant to post this for a while, but haven't had the time, because...well, it's a new job, so I've been quite swamped. I started 2 weeks ago as a system engineer at OpenX, a company based in Pasadena, whose main product is an Open Source ad server. I am part of the 'black ops' team, and my main task for now is to help with deploying and scaling the OpenX Hosted service within Amazon EC2 -- which is just one of several cloud computing providers that OpenX uses (another one is AppNexus for example).
Lots of Python involved in this, lots of automation, lots of testing, so all this makes me really happy :-)
Here is some stuff I've been working on, which I intend to post on with more details as time permits:
* command-line provisioning of EC2 instances
* automating the deployment of the OpenX application and its pre-requisites
* load balancing in EC2 using HAProxy
* monitoring with Hyperic
* working with S3-backed file systems
I'll also start working soon with slack, a system developed at Google for automatic provisioning of files via the interesting concept of 'roles'. It's in the same family as cfengine or puppet, but simpler to use and with a powerful inheritance concept applied to roles.
All in all, it's been a fun and intense 2 weeks :-)
Sunday, November 30, 2008
The sad state of open source monitoring tools
I've been looking lately at open source network monitoring tools. I'm not impressed at all by what I've seen so far. Pretty much the least common denominator when it comes to this type of tools is Nagios, which is not a bad tool (I used it a few years ago), but did you see its Web interface? It's soooooo 1999 -- think 'Perl CGI scripts'!
A slew of other tools are based on the Nagios engine, and are trying hard to be more pleasing to the eye -- Opsview and GroundWork are some examples. Opsview seems just a wrapper around Nagios, with not a lot of improvements in terms of both functionality and UI.
I looked at the GroundWork screencast and it seemed promising, but when I tried to install it I had a very unpleasant experience. First of all, the install script uses curses (did those guys hear about unattended installs?), and requires Java 1.5. Although I had both Java 1.5 and 1.6 on my CentOS server, and JAVA_HOME set correctly, it didn't stop the installer from complaining and exiting. Good riddance.
I should say that the first open source network monitoring tool that I tried was Zenoss, which is supposed to be the poster child for Python-based monitoring tools. Believe me, I tried hard to like it. I even went back and gave it a second chance, after noticing that other tools aren't any better. But to no avail -- I couldn't get past the sensation that it's a half-baked tool, with poor documentation and obscure user interface. It could work fine if you just want to monitor some devices with SNMP, but as soon as you try to extend it with your own plugins (called Zen Packs), or if you try to use their agents (called Zen Plugins), you run into a wall. At least I did. I got tired of Python tracebacks, obscure references to 'restarting Zope' (I thought it's based on twisted), fiddling with values for the so-called zProperties of a device, trying unsuccessfully to get ssh key authentication to work with the Zen Plugins, etc, etc. I'm not the only one who went through these frustrations either -- there are plenty of other users saying in the Zenoss forums that they've had it, and that they're going to look for something else. Which is what I did too.
I also tried OpenNMS, which was better than Zenoss, but it still had a CGI feel in terms of its Web interface.
So...for now I settled on Hyperic. It's a Java-based tool with a modern Web interface, very good documentation, and it's extensible via your own plugins (which you can write in any language you want, as long as you conform to some conventions which are not overly restrictive). Hyperic uses agents that you install on every server you need to monitor. I don't mind this; I find it better than configuring SNMP to death. It does have its quirks -- for example it calls devices that it monitors 'platforms' (instead of just 'devices' or 'servers'), and it calls the plugins that monitor specific services 'servers' (instead of services). Once you get used to it, it's not that bad. However, I wish there was a standard nomenclature for this stuff, as well as a standard way for these tools to inter-operate. As it is, you have to learn each tool and train your brain to ignore all the weirdness that it encounters. Not an optimal scenario by any means.
I'm very curious to see what tools other people use. If you care to leave a comment about your monitoring tool of choice, please do so!
I'll report back with more stuff about my experiences with Hyperic.
Friday, November 21, 2008
Issues with Ubuntu 8.10 on Lenovo T61p laptop
I got a new Lenovo ThinkPad T61p, and of course I promptly installed Ubuntu Ibex 8.10 on it. The first day I used it, I had no issues, but this morning it froze no less than 3 times, and each time the Caps Lock light flashed. I googled around, and I found what I hope is the solution in this post on the Ubuntu forums. It seems that this is the core issue:
System lock-ups with Intel 4965 wireless
The version of the iwlagn wireless driver for Intel 4965 wireless chipsets included in Linux kernel version 2.6.27 causes kernel panics when used with 802.11n or 802.11g networks. Users affected by this issue can install the linux-backports-modules-intrepid package, to install a newer version of this driver that corrects the bug. (Because the known fix requires a new version of the driver, it is not expected to be possible to include this fix in the main kernel package.)
As recommended, I did 'apt-get install linux-backports-modules-intrepid' and I rebooted. That was around 1 hour ago, and I haven't seen any issues since. Hopefully that was it. BTW, when the Caps Lock light blinks, it means 'kernel panic'. Who knew.
Thursday, November 13, 2008
Python and MS Azure
You've probably heard by now of Microsoft's entry in the cloud computing race, dubbed Azure. What I didn't know until I saw it this morning on InfoQ was that Microsoft encourages the use of languages and tools other than their official ones. Here's what they say on the 'What is the Azure Service Platform' page:
"Windows Azure is an open platform that will support both Microsoft and non-Microsoft languages and environments. Windows Azure welcomes third party tools and languages such as Eclipse, Ruby, PHP, and Python."
While you and I may think MS says this just for marketing/PR purposes, it turns out they are walking the walk a bit. I was glad to see in the InfoQ article that a Microsoft guy wrote a Python wrapper on top of the Azure Data Storage APIs. Note that this is classic CPython, not IronPython. I assume more interesting stuff can be done with IronPython.
"Windows Azure is an open platform that will support both Microsoft and non-Microsoft languages and environments. Windows Azure welcomes third party tools and languages such as Eclipse, Ruby, PHP, and Python."
While you and I may think MS says this just for marketing/PR purposes, it turns out they are walking the walk a bit. I was glad to see in the InfoQ article that a Microsoft guy wrote a Python wrapper on top of the Azure Data Storage APIs. Note that this is classic CPython, not IronPython. I assume more interesting stuff can be done with IronPython.
Wednesday, November 12, 2008
"phrase from nearest book" meme
Via Elliot:
"A little later a marriage procession would strike into the Grand Trunk with music and shoutings, and a smell of marigold and jasmine stronger even than the reek of the dust."
Not bad, I like it :-)
- Grab the nearest book.
- Open it to page 56.
- Find the fifth sentence.
- Post the text of the sentence in your journal along with these instructions.
- Don’t dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.
"A little later a marriage procession would strike into the Grand Trunk with music and shoutings, and a smell of marigold and jasmine stronger even than the reek of the dust."
Not bad, I like it :-)
Friday, October 31, 2008
Migrating SSL certs from IIS to Apache
If you ever find yourself facing this task, use this guide -- worked perfectly for me.
Monday, October 27, 2008
This is depressing: Ken Thompson is also a googler
Just found out that Ken Thompson, one of the creators of Unix, works at Google. You can see his answers to various questions addressed to Google engineers by following the link with his name on this page.
So let's see, Google has hired:
* Ken Thompson == Unix
* Vint Cerf == TCP/IP
* Andrew Morton == #2 in Linux
* Guido van Rossum == Python
* Ben Collins-Sussman and Brian Fitzpatrick == subversion
* Bram Moolenaar == vim
...and I'm sure there are countless others that I missed.
If this isn't a march towards world domination, I don't know what is :-)
Thursday, October 16, 2008
The case of the missing profile photo
Earlier today I posted a blog entry, then I went to view it on my blog, only to notice that my profile photo was conspicuously absent. I double-checked the URL for the source of the image -- it was http://agile.unisonis.com/gg.jpg. Then I remembered that I recently migrated agile.unisonis.com to my EC2 virtual machine. I quickly ssh-ed into my EC2 machine and saw that the persistent storage volume was not mounted. I ran uptime and noticed that it only showed 8 hours, so the machine had somehow been rebooted. In my experiments with setting up that machine, I had failed to add a line to /etc/fstab that causes the persistent storage volume to be mounted after a reboot. Easily rectified:
echo "/dev/sds /ebs1 ext3 defaults 0 0" >> /etc/fstab
I connected to my EC2 environment with ElasticFox and saw that the EBS volume was still attached to my machine instance as /dev/sds, so I mounted it via 'mount /dev/sds /ebs1', then restarted httpd and mysqld, and all my sites were again up and running.
I tested my setup by rebooting. After the reboot, another surprise: httpd and mysqld were not chkconfig-ed on, so they didn't start automatically. I fixed that, I rebooted again, and finally everything came back as expected.
A few lessons learned here in terms of hosting your web sites in 'the cloud':
1) you need to test your machine setup across reboots
2) you need automated tests for your machine setup -- things like 'is httpd chkconfig-ed on?'; 'is /dev/sds mounted as /ebs1 in /etc/fstab?'
3) you need to monitor your sites from a location outside the cloud which hosts your sites; I shouldn't have to eyeball a profile photo to realize that my EC2 instance is not functioning properly!
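Here is the flavor of check script I have in mind for item 2 -- just a rough sketch, hard-coded for the particular setup described above:
#!/bin/bash
# rudimentary post-reboot sanity checks for this EC2 instance
chkconfig --list httpd | grep -q '3:on' || echo "WARNING: httpd is not chkconfig-ed on"
chkconfig --list mysqld | grep -q '3:on' || echo "WARNING: mysqld is not chkconfig-ed on"
grep -q '^/dev/sds /ebs1' /etc/fstab || echo "WARNING: /dev/sds is not in /etc/fstab"
mount | grep -q ' /ebs1 ' || echo "WARNING: /ebs1 is not mounted"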
I'll cover all these topics and more soon in some other posts, so stay tuned!
echo "/dev/sds /ebs1 ext3 defaults 0 0" >> /etc/fstab
I connected to my EC2 environment with ElasticFox and saw that the EBS volume was still attached to my machine instance as /dev/sds, so I mounted it via 'mount /dev/sds/ /ebs1', then restarted httpd and mysqld, and all my sites were again up and running.
I tested my setup by rebooting. After the reboot, another surprise: httpd and mysqld were not chkconfig-ed on, so they didn't start automatically. I fixed that, I rebooted again, and finally everything came back as expected.
A few lessons learned here in terms of hosting your web sites in 'the cloud':
1) you need to test your machine setup across reboots
2) you need automated tests for your machine setup -- things like 'is httpd chkconfig-ed on?'; 'is /dev/sds mounted as /ebs1 in /etc/fstab?'
3) you need to monitor your sites from a location outside the cloud which hosts your sites; I shouldn't have to eyeball a profile photo to realize that my EC2 instance is not functioning properly!
I'll cover all these topics and more soon in some other posts, so stay tuned!
Recommended book: "Scalable Internet Architectures"
One of my co-workers, Nathan, introduced me to this book -- "Scalable Internet Architectures" by Theo Schlossnagle. I read it in one sitting. Recommended reading for anybody who cares about scaling their web site in terms of both web/application servers and database servers. It's especially appropriate in our day and age, when cloud computing is all the rage (more on this topic in another series of posts). My preferred chapters were "Static Content Serving" (talks about wackamole and spread) and "Static Meets Dynamic" (talks about web proxy caches such as squid).
I wish the database chapter contained more in-depth architectural discussions; instead, the author spends a lot of time showing a Perl script that is supposed to illustrate some of the concepts in the chapter, but falls very short of that in my opinion.
Overall though, highly recommended.
Wednesday, October 08, 2008
Example Django app needed
Dear lazyweb, I need a good sample Django application (with a database backend) to run on Amazon EC2. If the application has Ajax elements, even better.
Comments with suggestions would be greatly appreciated!
Thursday, October 02, 2008
Update on EC2 and EBS
I promised I'll give an update on my "Experiences with Amazon EC2 and EBS" post from a month ago. Well, I just got an email from Amazon, telling me:
Greetings from Amazon Web Services,
This e-mail confirms that your latest billing statement is available on the AWS web site. Your account will be charged the following:
Total: $73.74
So there you have it. That's how much it cost me to run the new SoCal Piggies wiki, as well as some other small sites, with very little traffic. Your mileage will definitely vary, especially if you run a high-traffic site.
I also said I'll give an update on running a MySQL database on EBS. It turns out it's really easy. On my Fedora Core 8 AMI, I did this:
* installed mysql packages via yum:
yum -y install mysql mysql-server mysql-devel
* moved the default data directory for mysql (/var/lib/mysql) to /ebs1/mysql (where /ebs1 is the mount point of my 10 GB EBS volume), then symlinked /ebs1/mysql back to /var/lib, so that everything continues to work as expected as far as MySQL is concerned:
service mysqld stop
mv /var/lib/mysql /ebs1/mysql
ln -s /ebs1/mysql /var/lib
service mysqld start
That's about it. I also used the handy snapshot functionality in the ElasticFox plugin and backed up the EBS volume to S3. In case you lose your existing EBS volume, you just create another volume from the snapshot, specify a size for it, and associate it with your AMI instance. Then you mount it as usual.
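The snapshot and restore steps can also be done with the EC2 API tools instead of ElasticFox; something like this (the volume and snapshot IDs are placeholders):
# ec2-create-snapshot vol-1a2b3c4d
# ec2-create-volume --snapshot snap-9f8e7d6c -z us-east-1a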
Update 10/03/08
In response to comments inquiring about a more precise breakdown of the monthly cost, here it is:
$0.10 per Small Instance (m1.small) instance-hour (or partial hour) x 721 hours = $72.10
$0.100 per GB Internet Data Transfer - all data transfer into Amazon EC2 x 0.607 GB = $0.06
$0.170 per GB Internet Data Transfer - first 10 TB / month data transfer out of Amazon EC2 x 2.719 GB = $0.46
$0.010 per GB Regional Data Transfer - in/out between Availability Zones or when using public IP or Elastic IP addresses x 0.002 GB = $0.01
$0.10 per GB-Month of EBS provisioned storage x 9.958 GB-Mo = $1.00
$0.10 per 1 million EBS I/O requests x 266,331 IOs = $0.03
$0.15 per GB-Month of EBS snapshot data stored x 0.104 GB-Mo = $0.02
$0.01 per 1,000 EBS PUT requests (when saving a snapshot) x 159 Requests = $0.01
EC2 TOTAL: $73.69
Other S3 costs (outside of EC2): $0.05
GRAND TOTAL: $73.74
Friday, September 19, 2008
Presubmit testing at Google
Here is an interesting blog post from Marc Kaplan, test engineering manager at Google, on their strategy of running what they call 'presubmit tests' -- tests that are run automatically before the code gets checked in. They include performance tests, and they compare the performance of the new code with baselines from the previous week, then report back nice graphs showing the delta. Very cool.
Monday, September 15, 2008
"Unmaintained Free Software" wiki
Thanks to Heikki Toivonen, who left a comment to my previous post and pointed me to this Unmaintained Free Software wiki. Python-related projects on that site are here. Hmmm...RPM is an unmaintained Python project? Don't think so. That site could use some love...maybe it is itself in need of a maintainer? This seems like a good Google App Engine project -- to put together a similar site with a database back-end, showing unmaintained Open Source projects....
Saturday, September 13, 2008
Know of any Open Source projects that need maintainers?
I got an email on a testing-related mailing list from somebody who would like to take over an Open Source project with no current maintainer. Here's a fragment of that email:
"Folks,
As I am interested in brushing up on my coding skills, so I would
appreciate your help in identifying an existing orphan/dormant
open-source tool/toolset project who needs an owner/maintainer.
I am especially interested in software process-oriented tools that
fill a hole in an agile development/test/management tool stack."
If anybody knows of such projects, especially with a testing or agile bent, please leave a comment here. Thanks!
"Folks,
As I am interested in brushing up on my coding skills, so I would
appreciate your help in identifying an existing orphan/dormant
open-source tool/toolset project who needs an owner/maintainer.
I am especially interested in software process-oriented tools that
fill a hole in an agile development/test/management tool stack."
If anybody knows of such projects, especially with a testing or agile bent, please leave a comment here. Thanks!
Tuesday, September 02, 2008
Getting around the Firefox port-blocking annoyance
Firefox 3.x has introduced something I'm sure they call a 'feature', but it is a major annoyance for any sysadmin or developer -- it blocks access to a whole list of ports it considers unsafe. I thought IE was the only browser that was brain-dead that way, but Firefox has proved me wrong. Anyway, here's a simple recipe for getting around this (a scripted variant follows the steps):
1) go to about:config in the Firefox address bar
2) right click, choose new->string
3) enter the name network.security.ports.banned.override and the value 1-65535
4) there is no step 4
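If you want to script this change instead of clicking through about:config (say, for several machines or for a freshly created profile), the same override can be dropped into the profile's user.js file. A minimal sketch, with the profile directory left as a placeholder; Firefox should not be running when you edit it:
echo 'user_pref("network.security.ports.banned.override", "1-65535");' >> ~/.mozilla/firefox/<your-profile>/user.js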
Monday, September 01, 2008
Experiences with Amazon EC2 and EBS
I decided to port some of the sites I've been running for the last few years on a dedicated server (running Red Hat 9) to an Amazon EC2 AMI (which stands for 'Amazon Machine Image'). I also wanted to use some more recent features offered by Amazon in conjunction with their EC2 platform -- such as persistent block-based storage AKA the Elastic Block Store (EBS), and persistent external IP addresses AKA Elastic IPs.
To get started, I used a great blog post on 'Persistent Django on Amazon EC2 and EBS' by Thomas Brox Røst. I will refer here to some of the steps that Thomas details in his post; if you want to follow along, you're advised to read his post.
1) Create an AWS account and sign up for the EC2 service.
2) Install the ElasticFox Firefox extension -- the greatest thing since sliced bread in terms of managing EC2 AMIs. To run the ElasticFox GUI, go to Tools->ElasticFox in Firefox; this will launch a new tabbed window showing the GUI. From now on, I will abbreviate ElasticFox as EF.
3) Add your AWS user name and access keys in EF (use the Credentials button).
4) Add an EC2 security group (click on the 'Security Groups' tab in EF); this can be thought of as a firewall rule that will replace the default one. In my case, I called my group 'gg' and I allowed ports 80 and 443 (http and https) and 22 (ssh).
5) Add a keypair to be used when you ssh into your AMI (click on the 'KeyPairs' tab in EF). I named mine gg-ec2-keypair and I saved the private key in my .ssh folder on my local machine (.ssh/gg-ec2-keypair.pem).
6) Get a fixed external IP (click on the 'Elastic IPs' tab in EF). You will be assigned an IP which is not yet associated with any AMI.
7) Get a block-based storage volume that you can format later into a file system (click on the 'Volumes and Snapshots' tab in EF). I got a 10 GB volume.
These 7 steps are the foundation for everything else you need to do when running an AMI. Choosing and launching the AMI itself comes next, and it's something you can repeat any time you need a new instance.
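Incidentally, steps 3 through 7 all have command-line equivalents in Amazon's EC2 API tools, which comes in handy once you start automating. A sketch, assuming the tools are installed and configured (the group and keypair names mirror the ones above):
# security group allowing ssh/http/https (step 4)
ec2-add-group gg -d "web and ssh"
ec2-authorize gg -p 22
ec2-authorize gg -p 80
ec2-authorize gg -p 443
# keypair (step 5) -- save the private key it prints into ~/.ssh/gg-ec2-keypair.pem
ec2-add-keypair gg-ec2-keypair
# Elastic IP (step 6) and a 10 GB EBS volume (step 7)
ec2-allocate-address
ec2-create-volume --size 10 -z us-east-1a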
I followed Thomas's example and chose a 32-bit Fedora Core 8 image for my AMI. In EF, you can search for Fedora 8 images by going to the 'AMIs and Instances' tab and typing fedora-8 in the search box. Right-click on the desired image (mine was called ec2-public-images/fedora-8-i386-base-v1.07.manifest.xml) and choose 'Launch instance(s) of this AMI'. You will need to choose a keypair (I chose the one I created earlier, gg-ec2-keypair), an availability zone (I chose 'us-east-1a') and a security group (I removed the default one and added the one I created earlier).
You should immediately see the instance in a 'pending' state in the Instances list. After a couple of minutes, if you click Refresh you'll see it in the 'running' state, which means it's ready for you to access and work with.
Once my AMI was running, I right-clicked it and chose 'copy instance ID to clipboard'. The instance ID is needed to associate the EBS volume and the Elastic IP to this instance.
To associate the fixed external IP, I went to the 'Elastic IPs' tab in EF, right clicked on the Elastic IP I was assigned and chose 'Associate this address', then I indicated the instance ID of my running AMI. As a side note, if you don't see anything in a given EF list (such as Elastic IPs or Volumes), click Refresh and you should see it.
To associate the EBS volume, I went to the 'Volumes and Snapshots' tab in EF, right-clicked on the volume I had created, then chose 'Attach this volume'. In the next dialog box, I specified the instance ID of my AMI, then /dev/sdh as the device name.
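The launch/associate/attach sequence can be driven from the command line as well, if you ever want to take ElasticFox out of the loop. Another rough sketch; the AMI, instance and volume IDs are placeholders:
# launch the Fedora 8 AMI with the keypair and security group created earlier
ec2-run-instances ami-xxxxxxxx -k gg-ec2-keypair -g gg -z us-east-1a
# once it's running, point the Elastic IP and the EBS volume at it
ec2-associate-address -i i-xxxxxxxx A.B.C.D
ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdh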
The next step is to ssh into your AMI and format the raw block storage into a file system. You can use the Elastic IP you were assigned (let's call it A.B.C.D), and run:
$ ssh -i .ssh/your-private-key.pem root@A.B.C.D
At this point, you should be logged in to your AMI as root. To format the EBS volume, run:
# mkfs.ext3 /dev/sdh
# mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1
If you want the mount point to persist across reboots, also add this line to /etc/fstab:
$ echo "/dev/sdh /ebs1 ext3 noatime 0 0" >> /etc/fstab
At this point, you have a bare-bones Fedora Core 8 instance accessible via HTTP, HTTPS and SSH at the IP address A.B.C.D. Not very useful in and of itself, unless you install your application.
In my case, the first Web site I wanted to port over was the SoCal Piggies wiki, at www.socal-piggies.org. I used to run it on MoinMoin 1.3.1 on my old server, but for this brand-new AMI experiment I installed MoinMoin 1.7.1. I also had to install httpd and python-devel via yum. And since we're talking about package installs, here's the main point you should take away from this post: you need to install all required packages every time you re-launch your AMI. I'm not talking about rebooting your AMI, which preserves your file systems; I'm talking about terminating your AMI for any reason, then re-launching a new AMI instance. This operation will start your AMI with a clean slate in terms of installed packages. You can obviously re-mount the EBS volume that you created, and all your files will still be there, but those are typically application or database files, and not the actual required packages themselves (such as httpd or python-devel).
So, very important point: as soon as you start porting applications over to your AMI, you'd better start designing the layout of your apps so that they take full advantage of the EBS volume(s) you created. You'll also have to script the installation of the required packages, so you can easily run the script every time you launch a new instance of your AMI. This can be seen as a curse, but to me it's a blessing in disguise, because it forces you to automate the installation of your applications. Automation entails faster deployment, fewer errors, better testability. In short, you win in the long run.
For the first application I ported, the SoCal Piggies wiki, I made the following design decisions:
a) I chose to install MoinMoin 1.7.1 from scratch every time I launch a new AMI instance; I also install httpd, httpd-devel and python-devel from scratch every time
b) I chose to point the specific instance of the Piggies wiki to /ebs1/wikis/socal-piggies, so all the actual content of the wiki is kept persistently in the EBS volume
c) I moved /etc/httpd to /ebs1/httpd, then I created a symlink from /ebs1/httpd to /etc, so all the Apache configuration files are kept persistently in the EBS volume
d) I pointed the DocumentRoot of the Apache virtual host for the Piggies wiki to /ebs1/www/socal-piggies, so that all the static files that need to be accessed via the www.socal-piggies.org domain are kept persistently in the EBS volume
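To make decision d) concrete, here is roughly what the virtual host could look like. Since /etc/httpd is a symlink to /ebs1/httpd, dropping the file under /ebs1/httpd/conf.d means the config survives a re-launch too. The file name and log names below are my own invention, and the MoinMoin-specific directives are left out:
cat > /ebs1/httpd/conf.d/socal-piggies.conf <<'EOF'
<VirtualHost *:80>
    ServerName www.socal-piggies.org
    DocumentRoot /ebs1/www/socal-piggies
    ErrorLog logs/socal-piggies-error_log
    CustomLog logs/socal-piggies-access_log combined
    <Directory /ebs1/www/socal-piggies>
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
EOF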
So what do I have to do if I decide to terminate the current AMI instance, and launch a new one? Simple -- I first associate the Elastic IP and the EBS volume with the new instance via EF, then I ssh into the new AMI (which has the same external IP as the old one) and run this command line:
# mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1
Then I go to /ebs1/scripts and run this script:
# cat mysetup.sh
#!/bin/bash
# Install various packages via yum
yum -y install python-devel
yum -y install httpd httpd-devel
# Create symlinks
mv /etc/httpd /etc/httpd.orig
ln -s /ebs1/httpd /etc
# Download and install MoinMoin
cd /tmp
rm -rf moin*
wget http://static.moinmo.in/files/moin-1.7.1.tar.gz
tar xvfz moin-1.7.1.tar.gz
cd moin-1.7.1
python setup.py install
# Start apache
service httpd start
# Make sure /ebs1 is mounted across reboots
echo "/dev/sdh /ebs1 ext3 noatime 0 0" >> /etc/fstab
Even better, I can script all this on my local machine, so I don't even have to log in via ssh. This is the command I run on my local machine:
ssh -i ~/.ssh/gg-ec2-keypair.pem root@75.101.140.75 'mkdir /ebs1; mount -t ext3 /dev/sdh /ebs1; /ebs1/scripts/mysetup.sh'
That's it! At this point, I have the Piggies wiki running on a brand-new AMI.
Two caveats here:
1) the ssh host key of the remote AMI that had been saved in .ssh/known_hosts on your local machine will no longer be valid, so you'll get a big security warning the first time you try ssh-ing into your new AMI. Just delete that line from known_hosts and ssh again (or use the one-liner shown after these caveats).
2) it takes a while (for me it was up to 5 minutes) for the Elastic IP to be ready for you to ssh into after you associate it with a brand-new AMI; so in a disaster recovery situation, keep in mind that your site can potentially be down for 10-15 minutes, during which you launch a new AMI, associate the Elastic IP and the EBS volume with it, and run your setup scripts.
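Regarding caveat 1, you don't have to hunt for the offending line in known_hosts by hand; ssh-keygen can remove the stale host key for you (A.B.C.D being your Elastic IP):
$ ssh-keygen -R A.B.C.D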
My experience so far with EC2 and EBS has been positive. As I already mentioned, the fact that it forces you to design your application to take advantage of the persistent EBS volume, and to script the installation of the pre-requisite packages, is a net positive in my opinion.
The next step for me will be to port other sites with a MySQL database backend. Fun fun fun! I will blog soon about my experiences. In the meantime, go ahead and browse the brand-new SoCal Piggies wiki :-)
Thursday, August 28, 2008
Back up your Windows desktop to S3 with SecoBackup
I found SecoBackup to be a good tool for backing up a Windows machine to S3. They have a 'free community edition' version, but you will still pay more in terms of your S3 costs than what you would normally pay to Amazon. You basically sign up for the SecoBackup service 'powered by AWS' and you pay $0.20 per GB of storage and $0.20 per GB of bandwidth -- so double what you'd pay if you stored it directly on S3. You don't even need to have an Amazon S3 account, they take care of it transparently for you.
I think this is a good tool for backing up certain files on Windows-based desktops. For example I back up my Quicken files from within a Windows XP virtual image that I run inside VMWare workstation on top of my regular Ubuntu Hardy desktop.
Tuesday, August 26, 2008
Ruby refugees flocking to Python?
I just wanted to put it out there that I know at least one person who was very fired up about Ruby, only to find out that all the available Ruby jobs are for Ruby-on-Rails programmers. He doesn't like Web programming, so what was he to do? You guessed it -- he started to learn Python :-)
RTFL
No, this is not a misspelling for ROTFL, but rather a variant of RTFM. It stands for Read The F...riendly Log. It's a troubleshooting technique that is very basic, yet surprisingly overlooked. I use it all the time, and I just want to draw attention to it in case you find yourself stumped by a problem that seems mysterious.
Here are some recent examples from my work.
Apache wouldn't start properly
A 'ps -def | grep http' would show only the main httpd process, with no worker processes. The Apache error log showed these lines:
Digest: generating secret for digest authentication
A google search for this line revealed this article:
http://www.raptorized.com/2006/08/11/apache-hangs-on-digest-secret-generation/
It turns out the randomness/entropy on that box had been exhausted. I grabbed the rng-tools tar.gz from sourceforge, compiled and installed it, then ran
rngd -r /dev/urandom
...and apache started its worker processes instantly.
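If you suspect the same problem on one of your boxes, a quick way to confirm it is to look at how much entropy the kernel thinks it has available (out of a default 4096-bit pool); a consistently tiny number here means the pool is starved:
cat /proc/sys/kernel/random/entropy_avail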
Cannot create InnoDB tables in MySQL
Here, all it took was to read the MySQL error log in /var/lib/mysql. It's very friendly indeed, and tells you exactly what to do!
InnoDB: Error: data file ./ibdata1 is of a different size
InnoDB: 2176 pages (rounded down to MB)
InnoDB: than specified in the .cnf file 128000 pages!
InnoDB: Could not open or create data files.
InnoDB: If you tried to add new data files, and it failed here,
InnoDB: you should now edit innodb_data_file_path in my.cnf back
InnoDB: to what it was, and remove the new ibdata files InnoDB created
InnoDB: in this failed attempt. InnoDB only wrote those files full of
InnoDB: zeros, but did not yet use them in any way. But be careful: do not
InnoDB: remove old data files which contain your precious data!
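In this case the cure is exactly what the log spells out: make innodb_data_file_path in my.cnf match the data file that actually exists on disk. With InnoDB's 16 KB pages, 2176 pages works out to 34 MB, so the [mysqld] section would end up with something along these lines (treat the numbers as an illustration -- yours will differ):
innodb_data_file_path = ibdata1:34M:autoextend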
Windows-based Web sites are displaying errors
Many times I've seen Windows/IIS-based Web sites displaying cryptic errors such as:
Server Error in '/' Application.
Runtime Error
The IIS logs are much less friendly in terms of useful information than the Apache logs. However, the Event Viewer is a good source of information. In a recent case, inspecting the Event Viewer told us that the account used to connect from the Web server to the DB server had expired, so re-enabling it was all it took to fix the issue.
In conclusion -- RTFL and google it! You'll be surprised how many issues you can solve this way.