Tuesday, August 31, 2010

Poor man's MySQL disaster recovery in EC2 using EBS volumes

First of all, I want to emphasize that this is NOT a disaster recovery strategy I recommend. However, in a pinch, it might save your ass. Here's the scenario I have:

  • 2 m1.large instances running Ubuntu 10.04 64-bit and the Percona XtraDB MySQL builds (for the record, the exact version I'm using is "Server version: 5.1.43-60.jaunty.11-log (Percona SQL Server (GPL), XtraDB 9.1, Revision 60")
  • I'll call the 2 servers db101 and db201
  • each server is running 2 MySQL instances -- I'll call them m1 and m2
  • instance m1 on db101 and instance m1 on db201 are set up in master-master replication (and similar for instance m2)
  • the DATADIR for m1 is /var/lib/mysql/m1 on each server; that file system is mounted from an EBS volume (and similar for m2)
  • the configuration files for m1 are in /etc/mysql1 on each server -- that directory was initially a copy of the Ubuntu /etc/mysql configuration directory, which I then customized (and similar for m2)
  • the init.d script for m1 is in /etc/init.d/mysql1 (similar for m2)
What I tested:
  • I took a snapshot of each of the 2 EBS volumes associated with each of the DB servers (4 snapshots in all)
  • I terminated the 2 m1.large instances
  • I launched 2 m1.xlarge instances and installed the same Percona distribution (this was done via a Chef recipe at instance launch time); I'll call the 2 new instances xdb101 and xdb201
  • I pushed the configuration files for m1 and m2, as well as the init.d scripts (this was done via fabric)
  • I created new volumes from the EBS snapshots (note that these volumes can be created in any EC2 availability zone)
  • On xdb101, I attached the 2 volumes created from the EBS snapshots on db101; I specified /dev/sdm and /dev/sdn as the device names (similar on xdb201)
  • On xdb101, I created /var/lib/mysql/m1 and mounted /dev/sdm there; I also created /var/lib/mysql/m2 and mounted /dev/sdn there (similar on xdb201; see the command sketch after this list)
  • At this point, the DATADIR directories for both m1 and m2 are populated with 'live files' from the moment when I took the EBS snapshot
  • I made sure syslog-ng accepts UDP traffic from localhost (by default it doesn't), because on Ubuntu MySQL log messages are sent to syslog by default; to do this, I ensured that "udp(ip(127.0.0.1) port(514));" appears in the "source s_all" entry in /etc/syslog-ng/syslog-ng.conf
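
For reference, here is roughly what the volume steps above look like with the EC2 API command-line tools (the snapshot, volume and instance IDs, the availability zone and the device name are all placeholders):

# create a volume from an existing snapshot, in the availability zone of the new instance
ec2-create-volume --snapshot snap-11111111 -z us-east-1a
# attach the new volume to the new instance as /dev/sdm
ec2-attach-volume vol-22222222 -i i-33333333 -d /dev/sdm
# on the instance: create the mount point and mount the file system
mkdir -p /var/lib/mysql/m1
mount /dev/sdm /var/lib/mysql/m1
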
At this point, I started up the first MySQL instance on xdb101 via "/etc/init.d/mysql1 start". This script most likely will show [fail] on the console, because MySQL will not start up normally. If you look in /var/log/syslog, you'll see entries similar to:

Aug 31 18:03:21 xdb101 mysqld: 100831 18:03:21 [Note] Plugin 'FEDERATED' is disabled.
Aug 31 18:03:21 xdb101 mysqld: InnoDB: The InnoDB memory heap is disabled
Aug 31 18:03:21 xdb101 mysqld: InnoDB: Mutexes and rw_locks use GCC atomic builtins
Aug 31 18:03:22 xdb101 mysqld: 100831 18:03:22  InnoDB: highest supported file format is Barracuda.
Aug 31 18:03:23 xdb101 mysqld: InnoDB: The log sequence number in ibdata files does not match
Aug 31 18:03:23 xdb101 mysqld: InnoDB: the log sequence number in the ib_logfiles!
Aug 31 18:03:23 xdb101 mysqld: 100831 18:03:23  InnoDB: Database was not shut down normally!
Aug 31 18:03:23 xdb101 mysqld: InnoDB: Starting crash recovery.

If you wait a bit longer (and if you're lucky), you'll see entries similar to:

Aug 31 18:04:20 xdb101 mysqld: InnoDB: Restoring possible half-written data pages from the doublewrite
Aug 31 18:04:20 xdb101 mysqld: InnoDB: buffer...
Aug 31 18:04:24 xdb101 mysqld: InnoDB: In a MySQL replication slave the last master binlog file
Aug 31 18:04:24 xdb101 mysqld: InnoDB: position 0 15200672, file name mysql-bin.000015
Aug 31 18:04:24 xdb101 mysqld: InnoDB: and relay log file
Aug 31 18:04:24 xdb101 mysqld: InnoDB: position 0 15200817, file name ./mysqld-relay-bin.000042
Aug 31 18:04:24 xdb101 mysqld: InnoDB: Last MySQL binlog file position 0 17490532, file name /var/lib/mysql/m1/mysql-bin.000002
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 InnoDB Plugin 1.0.6-9.1 started; log sequence number 1844705956
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Recovering after a crash using /var/lib/mysql/m1/mysql-bin
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Starting crash recovery...
Aug 31 18:04:24 xdb101 mysqld: 100831 18:04:24 [Note] Crash recovery finished.

At this point, you can do "/etc/init.d/mysql1 restart" just to make sure that both stopping and starting that instance work as expected. Repeat for instance m2, and also repeat on server xdb201.

So... IF you are lucky and the InnoDB crash recovery process did its job, you should have 2 functional MySQL instances, one each on xdb101 and xdb201. I tested this with several pairs of servers and it worked for me every time, but I hasten to say that YMMV, so DO NOT bet on this as your disaster recovery strategy!

At this point I still had to re-establish the master-master replication between m1 on xdb101 and m1 on xdb201 (and similar for m2). 

When I initially set up this replication between the original m1.large servers, I used something like this on both db101 and db201:

CHANGE MASTER TO MASTER_HOST='master1', MASTER_PORT=3306, MASTER_USER='masteruser', MASTER_PASSWORD='xxxxxx';

The trick for me is that master1 points to db201 in db101's /etc/hosts, and vice-versa.

On the newly created xdb101 and xdb201, there are no entries for master1 in /etc/hosts, so replication is broken. Which is a good thing initially, because you want to have the MySQL instances on each server be brought back up without throwing replication into the mix.

Once I added an entry for master1 in xdb101's /etc/hosts pointing to xdb201, and did the same on xdb201, I did a 'stop slave; start slave; show slave status\G' on the m1 instance on each server. In all cases I tested, one of the slaves showed everything OK, while the other one complained about not being able to read from the master's log file. This was fairly simple to fix. Let's assume xdb101 is the one complaining. I did the following:
  • on xdb201, I ran 'show master status\G' and noted the file name (for example "mysql-bin.000017") and the file position (for example 106)
  • on xdb101, I ran the following command: "stop slave; change master to master_log_file='mysql-bin.000017', master_log_pos=106; start slave;"
  • now a 'show slave status\G' on xdb101 should show everything back to normal
Some lessons:
  • take periodic snapshots of your EBS volumes (at least one per day; see the cron sketch after this list)
  • for a true disaster recovery strategy, use at least mysqldump to dump your DB to disk periodically, or something more advanced such as Percona XtraBackup; I recommend dumping the DB to an EBS volume and taking periodic snapshots of that volume
  • the procedure I detailed above is handy when you want to grow your instance 'vertically' -- for example I went from m1.large to m1.xlarge
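
A minimal cron entry for the daily snapshots could look like this (the volume IDs are placeholders, and it assumes the EC2 API tools and their environment variables such as EC2_PRIVATE_KEY and EC2_CERT are available to the cron job):

0 3 * * * ec2-create-snapshot vol-11111111 && ec2-create-snapshot vol-22222222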

Friday, August 20, 2010

Visualizing MySQL metrics with the munin-mysql plugin

Munin is a great tool for resource visualization. Sometimes though installing a 3rd party Munin plugin is not as straightforward as you would like. I have been struggling a bit with one such plugin, munin-mysql, so I thought I'd spell it out for my future reference. My particular scenario is running multiple MySQL instances on various port numbers (3306 and up) on the same machine. I wanted to graph in particular the various InnoDB metrics that munin-mysql supports. I installed the plugin on various Ubuntu flavors such as Jaunty and Lucid.

Here are the steps:

1) Install 2 pre-requisite Perl modules for munin-mysql: IPC-ShareLite and Cache-Cache
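
On Ubuntu, these modules should be available as packages (the names below are what I believe the package names to be; installing the modules from CPAN works too):

apt-get install libipc-sharelite-perl libcache-cache-perl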

2) git clone http://github.com/kjellm/munin-mysql

3) cd munin-mysql; edit Makefile and point PLUGIN_DIR to the directory where your munin plugins reside (if you installed Munin on Ubuntu via apt-get, that directory is /usr/share/munin/plugins)

4) make install --> this will copy the mysql_ Perl script to PLUGIN_DIR, and the mysql_.conf file to /etc/munin/plugin-conf.d

5) Edit /etc/munin/plugin-conf.d/mysql_.conf and customize it with your specific MySQL information.

For example, if you run 2 MySQL instances on ports 3306 and 3307, you could have something like this in mysql_.conf:


[mysql_3306_*]
env.mysqlconnection DBI:mysql:mysql;host=127.0.0.1;port=3306
env.mysqluser myuser1
env.mysqlpassword mypassword1

[mysql_3307_*]
env.mysqlconnection DBI:mysql:mysql;host=127.0.0.1;port=3307
env.mysqluser myuser2
env.mysqlpassword mypassword2

6) Run "/usr/share/munin/plugins/mysql_ suggest" to see what metrics are supported by the plugin. Then proceed to create symlinks in /etc/munin/plugins, adding the port number and the metric name as the suffix.

For example, to track InnoDB I/O metrics for the MySQL instance running on port 3306, you would create this symlink:

ln -s /usr/share/munin/plugins/mysql_ /etc/munin/plugins/mysql_3306_innodb_io

(replace 3306 with 3307 to track this metric for the other MySQL instance running on port 3307)

Of course, it's easy to automate this by a simple shell script.
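
Here is a minimal sketch of such a script (the port numbers are placeholders for whatever MySQL instances run on a given server):

#!/bin/bash
# create one symlink per port and per metric suggested by the plugin
PLUGIN=/usr/share/munin/plugins/mysql_
for port in 3306 3307; do
    for metric in $($PLUGIN suggest); do
        ln -sf $PLUGIN /etc/munin/plugins/mysql_${port}_${metric}
    done
done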

7) Restart munin-node and wait 10-15 minutes for the munin master to receive the information about the new metrics.

Important! If you need to troubleshoot this plugin (and any Munin plugin), do not make the mistake of simply running the plugin script directly in the shell. If you do this, it will not read the configuration file(s) correctly, and it will most probably fail. Instead, what you need to do is to follow the "Debugging Munin plugins" documentation, and run the plugin through the munin-run utility. For example:


# munin-run mysql_3306_innodb_io
ib_io_read.value 34
ib_io_write.value 57870
ib_io_log.value 8325
ib_io_fsync.value 55476

One more thing: you should probably automate all of the above steps. I have most of it automated via a fabric script. The only thing I do by hand is to create the appropriate symlinks for the specific port numbers I have on each server.

That's it! Enjoy staring for hours at your brand new MySQL metrics!


Monday, August 16, 2010

MySQL and AppArmor on Ubuntu

This is just a quick post that I hope will save some people some headache when they try to customize their MySQL setup on Ubuntu. I've spent some quality time with this problem over the weekend. I tried in vain for hours to have MySQL read its configuration files from a non-default location on an Ubuntu 9.04 server, only to figure out that it was all AppArmor's fault.

My ultimate goal was to run multiple instances of MySQL on the same host. In the past I achieved this with MySQL Sandbox, but this time I wanted to use MySQL installed from Debian packages and not from a tarball of the binary distribution, and MySQL Sandbox has some issues with that.

Here's what I did: I copied /etc/mysql to /etc/mysql0, then I edited /etc/mysql0/my.cnf and modified the location of the socket file, the pid file and the datadir to non-default locations. Then I tried to run:

/usr/bin/mysqld_safe --defaults-file=/etc/mysql0/my.cnf

At this point, /var/log/daemon.log showed this error:

mysqld[25133]: Could not open required defaults file: /etc/mysql0/my.cnf
mysqld[25133]: Fatal error in defaults handling. Program aborted

It took me, as I said, a few hours of trying all kinds of crazy things until I noticed lines like these in /var/log/syslog:

kernel: [18593519.090601] type=1503 audit(1281847667.413:22): operation="inode_permission" requested_mask="::r" denied_mask="::r" fsuid=0 name="/etc/mysql0/my.cnf" pid=4884 profile="/usr/sbin/mysqld"

This made me realize it was AppArmor preventing mysqld from opening non-default files. I don't need AppArmor on my servers, so I just stopped it with 'service apparmor stop' and chkconfig-ed it off... at which point all my customizations started to work perfectly.
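
If you would rather keep AppArmor enabled, the alternative (which I have not tested) is to allow mysqld to access the non-default locations, by adding lines such as these to the profile in /etc/apparmor.d/usr.sbin.mysqld (adjust the paths to your custom config directory and datadir) and then reloading AppArmor with '/etc/init.d/apparmor reload':

  /etc/mysql0/ r,
  /etc/mysql0/** r,
  /var/lib/mysql0/ r,
  /var/lib/mysql0/** rwk,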

At least 2 lessons:

1) when you see mysterious, hair-pulling errors, check security-related processes on your server: iptables, AppArmor, SELinux etc.

2) check all log files in /var/log -- I was focused on daemon.log and didn't notice the errors in syslog quickly enough

Google didn't help when I searched for "mysqld Could not open required defaults file". I couldn't find any reference to AppArmor, only to file permissions.

Tuesday, August 03, 2010

What automated deployment/config mgmt tools do you use?

I posted this question yesterday as a quick tweet. I got a bunch of answers already that I'll include here, but feel free to add your answers as comments to this post too. Or reply to @griggheo on Twitter.

I started by saying I have 2 favorite tools: Fabric for pushing app state (pure Python) and Chef for pulling/bootstraping OS/package state (pure Ruby). For more discussions on push vs. pull deployment tools, see this post of mine.

Here are the replies I got on Twitter so far:

@keyist : Fabric and Chef for me as well. use Fabric to automate uploading cookbooks+json and run chef-solo on server

@vvuksan : mcollective for control, puppet for config mgmt/OS config. Some reasons why outlined here http://j.mp/cAKarI

@RackerHacker : There is another solution besides ssh and for loops? :-P

@chris_mahan : libcloud, to pop debian stable on cloud instance, fabric to set root passwd, install python2.6.5, apt-get nginx php django fapws.

@alfredodeza : I'm biased since I wrote it, but I use Pacha: http://code.google.com/p/pacha

@tcdavis : Fabric for remote calls; distribute/pip for python packaging and deps; Makefile to eliminate any repetition.

@competentgirl : Puppet for its expandability and integration w/ other tools (ie svn)

@yashh : Fabric. bundle assets, push to bunch of servers and restart in a shot.. love it

@bitprophet : I'm biased but I use Fab scripts for everything. Was turned off by daemons/declarative/etc aspects of Chef/Puppet style systems.

@kumar303 : @bitprophet we use Cap only because we do 2-way communication with remote SSH procs. Been meaning to look at patching Fab for this

Update with more Twitter replies:

@lt_kije : cfengine on work cluster (HPC/HTC), radmind on personal systems (OpenBSD)

@zenmatt : check out devstructure, built on puppet, more natural workflow for configuring servers.

@almadcz : paver for building, fabric or debian repo for deployment, buildbot for forgetting about it.

@geo_kollias: I use fabric for everything as @bitprophet, but i am thinking about making use of @hpk42 's execnet soon.

Wednesday, July 21, 2010

Bootstrapping EC2 instances with Chef

This is the third installment of my Chef post series (read the first and the second). This time I'll show how to use the Ubuntu EC2 instance bootstrap mechanism in conjunction with Chef and have the instance configure itself at launch time. I wrote a similar post last year, in which I accomplished the same thing with puppet.

Why Chef this time, you ask? Although I am a Python guy, I prefer learning a smattering of Ruby rather than a proprietary DSL for configuration management. Also, when I upgraded my EC2 instances to the latest Ubuntu Lucid AMIs, puppet stopped working, so I was almost forced to look into Chef -- and I've liked what I've seen so far. I don't want to bad-mouth puppet though, I recommend you look into both if you need a good configuration management/deployment tool.

Here is a high-level view of the bootstrapping procedure I'm using:

1) You create Chef roles and tie them to cookbooks and recipes that you want executed on machines which will be associated with these roles.
2) You launch an EC2 Ubuntu AMI using any method you want (the EC2 Java-based command-line API, or scripts based on boto, etc.). The main thing here is that you pass a custom shell script to the instance via a user-data file.
3) When the EC2 instance boots up, it runs your custom user-data shell script. The script installs chef-client and its prerequisites, downloads the files necessary for running chef-client, runs chef-client once to register with the chef master and to run the recipes associated with its role, and finally runs chef-client in the background so that it wakes up and executes every N minutes.

Here are the 3 steps in more detail.

1) Create Chef roles, cookbooks and recipes

I already described how to do this in my previous post.

For the purposes of this example, let's assume we have a role called 'base' associated with a cookbook called 'base' and another role called 'myapp' associated with a cookbook called 'myapp'.

The 'base' cookbook contains recipes that can do things like installing packages that are required across all your applications, creating users and groups that you need across all server types, etc.

The 'myapp' cookbook contains recipes that can do things specific to one of your particular applications -- in my case, things like installing and configuring tornado/nginx/haproxy.

As a quick example, here's how to add a user and a group both called "myoctopus". This can be part of the default recipe in the cookbook 'base' (in the file cookbooks/base/recipes/default.rb).

The home directory is /home/myoctopus, and we make sure that directory exists and is owned by the user and group myoctopus.

group "myoctopus" do
action :create
end

user "myoctopus" do
gid "myoctopus"
home "/home/myoctopus"
shell "/bin/bash"
end

%w{/home/myoctopus}.each do |dir|
directory dir do
owner "myoctopus"
group "myoctopus"
mode "0755"
action :create
not_if "test -d #{dir}"
end
end


The role 'base' looks something like this, in a file called roles/base.rb:

name "base"
description "Base role (installs common packages)"
run_list("recipe[base]")


The role 'myapp' looks something like this, in a file called roles/myapp.rb:


name "myapp"
description "Installs required packages and applications for an app server"
run_list "recipe[memcached]", "recipe[myapp::tornado]"


Note that the role myapp specifies 2 recipes to be run: one is the default recipe of the 'memcached' cookbook (which is part of the Opscode cookbooks), and one is a recipe called tornado which is part of the myapp cookbook (the file for that recipe is cookbooks/myapp/recipes/tornado.rb). Basically, to denote a recipe, you either specify its cookbook (if the recipe is the default recipe of that cookbook), or you specify cookbook::recipe_name (if the recipe is non-default).

So far, we haven't associated any clients with these roles. We're going to do that on the client EC2 instance. This way the Chef server doesn't have to do any configuration operations during the bootstrap of the EC2 instance.

2) Launching an Ubuntu EC2 AMI with custom user-data

I wrote a Python wrapper around the EC2 command-line API tools. To launch an EC2 instance, I use the ec2-run-instances command-line tool. My Python script also takes a command line option called chef_role, which specifies the Chef role I want to associate with the instance I am launching. The main ingredient in the launching of the instance is the user-data file (passed to ec2-run-instances via the -f flag).
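
The actual launch command looks roughly like this (the AMI ID, instance type, key pair and availability zone are placeholders):

ec2-run-instances ami-12345678 -t m1.large -z us-east-1a -k my-keypair -f userdata.sh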

I use this template for the user-data file. My Python wrapper replaces HOSTNAME with an actual host name that I pass via a cmdline option. The Python wrapper also replaces CHEF_ROLE with the value of the chef_role cmdline option (which defaults to 'base').

The shell script which makes up the user-data file does the following:

a) Overwrites /etc/hosts with a version that has hardcoded values for chef.mycloud.com and mysite.com. The chef.mycloud.com box is where I run Chef server, and mysite.com is a machine serving as a download repository for utility scripts.

b) Downloads Eric Hammond's runurl script, which it uses to run other utility scripts.

c) Executes via runurl the script mysite.com/customize/hostname and passes it the real hostname of the machine being launched. The hostname script simply sets the hostname on the machine:

#!/bin/bash
hostname $1
echo $1 > /etc/hostname


d) Executes via runurl the script mysite.com/customize/hosts and passes it 2 arguments: add and self. Here's the hosts script:

#!/bin/bash
if [[ "$1" == "add" ]]; then
IPADDR=`ifconfig | grep 'inet addr:'| grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1}'`
HOSTNAME=`hostname`
sed -i "s/127.0.0.1 localhost.localdomain localhost/127.0.0.1 localhost.localdomain localhost\n$IPADDR $HOSTNAME.mycloud.com $HOSTNAME\n/g" /etc/hosts
fi

This adds the internal IP of the machine being launched to /etc/hosts and associates it with both the FQDN and the short hostname. The FQDN bit is important for chef configuration purposes: it needs to come before the short form in /etc/hosts. I could obviously have used DNS as well, but at bootstrap time I prefer to deal with hardcoded host names for now.

Update 07/22/10

Patrick Lightbody sent me a note saying that it's easier to get the local IP address of the machine by using one of the handy EC2 internal HTTP queries.

If you run "curl -s http://169.254.169.254/latest/meta-data" on any EC2 instance, you'll see a list of variables that you can inspect that way. For the local IP, I modified my script above to use:

IPADDR=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`

e) Finally, and most importantly for this discussion, executes via runurl the script mysite.com/install/chef-client and passes it the actual value of the cmdline argument chef_role. The chef-client script does the heavy lifting in terms of installing and configuring chef-client on the instance being launched. As such, I will describe it in the next step.

3) Installing and configuring chef-client on the newly launched instance

Here is the chef-client script I'm using. The comments are fairly self-explanatory. Because I am passing CHEF_ROLE as its first argument, the script knows which role to associate with the client. It does it by downloading the appropriate chef.${CHEF_ROLE}.json. To follow the example, I have 2 files corresponding to the 2 roles I created on the Chef server.

Here is chef.base.json:

{
"bootstrap": {
"chef": {
"url_type": "http",
"init_style": "init",
"path": "/srv/chef",
"serve_path": "/srv/chef",
"server_fqdn": "chef.mycloud.com"
}
},
"run_list": [ "role[base]" ]
}

The only difference in chef.myapp.json is the run_list, which in this case contains both roles (base and myapp):

{
"bootstrap": {
"chef": {
"url_type": "http",
"init_style": "init",
"path": "/srv/chef",
"serve_path": "/srv/chef",
"server_fqdn": "chef.mycloud.com"
}
},
"run_list": [ "role[base]", "role[myapp]" ]
}

The chef-client script also downloads the client.rb file which contains information about the Chef server:



log_level :info
log_location STDOUT
ssl_verify_mode :verify_none
chef_server_url "http://chef.mycloud.com:4000"

validation_client_name "chef-validator"
validation_key "/etc/chef/validation.pem"
client_key "/etc/chef/client.pem"

file_cache_path "/srv/chef/cache"
pid_file "/var/run/chef/chef-client.pid"

Mixlib::Log::Formatter.show_time = false

Note that the client knows the IP address of chef.mycloud.com because we hardcoded it in /etc/hosts.

The chef-client script also downloads validation.pem, which is an RSA key file used by the Chef server to validate the client upon the initial connection from the client.

The last file downloaded is the init script for launching chef-client automatically upon reboots. I took the liberty of butchering this sample init script and I made it much simpler (see the gist here but beware that it contains paths specific to my environment).

At this point, the client is ready to run this chef-client command which will contact the Chef server (via client.rb), validate itself (via validation.pem), download the recipes associated with the roles specified in chef.json, and run these recipes:

chef-client -j /etc/chef/chef.json -L /var/log/chef.log -l debug

I run the command in debug mode and I specify a log file location (the default output is stdout) so I can tell what's going on if something goes wrong.

That's about it. At this point, the newly launched instance is busy configuring itself via the Chef recipes. Time to sit back and enjoy your automated bootstrap process!

The last lines in chef-client remove the validation.pem file, which is only needed during the client registration, and run chef-client again, this time in the background, via the init script. The process running in the background looks something like this in my case:
/usr/bin/ruby1.8 /usr/bin/chef-client -L /var/log/chef.log -d -j /etc/chef/chef.json -c /etc/chef/client.rb -i 600 -s 30

The -i 600 option means chef-client will contact the Chef server every 600 seconds (plus a random interval given by -s 30) and it will inquire about additions or modifications to the roles it belongs to. If there are new recipes associated with any of the roles, the client will download and run them.

If you want to associate the client with new roles, you can just edit the local file /etc/chef/chef.json and add the new roles to the run_list.
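
For example, to add a hypothetical role called 'newrole', the run_list in /etc/chef/chef.json would become:

"run_list": [ "role[base]", "role[myapp]", "role[newrole]" ]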

Thursday, July 15, 2010

Tracking and visualizing mail logs with MongoDB and gviz_api

To me, nothing beats a nice dashboard for keeping track of how your infrastructure and your application are doing. At Evite, sending mail is a core part of our business. One thing we need to ensure is that our mail servers are busily humming away, sending mail out to our users. To this end, I built a quick outgoing email tracking tool using MongoDB and pymongo, and I also put together a dashboard visualization of that data using the Google Visualization API via the gviz_api Python module.

Tracking outgoing email from the mail logs with pymongo

Mail logs are sent to a centralized syslog. I have a simple Python script that tails the common mail log file every 5 minutes, counts the lines that conform to a specific regular expression (looking for a specific msgid pattern), then inserts that count into a MongoDB database. Here's the snippet of code that does that:

import datetime
from pymongo import Connection

# msg_count is computed earlier in the script by counting matching mail log lines
conn = Connection(host="myhost.example.com")
db = conn.logs
maillogs = db.mail
d = {}
now = datetime.datetime.now()
d['insert_time'] = now
d['msg_count'] = msg_count
maillogs.save(d)

I use the pymongo module to open a connection to the host running the mongod daemon, then I declare a database called logs and a collection called maillogs within that database. Note that both the database and the collection are created on the fly in case they don't exist.

I then instantiate a Python dictionary with two keys, insert_time and msg_count. Finally, I use the save method on the maillogs collection to insert the dictionary into the MongoDB logs database. Can't get any easier than this.
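
For completeness, the msg_count value in the snippet above comes from a simple line-counting step which, stripped of the 5-minute tailing logic, looks roughly like this (the log file path and the msgid regular expression are placeholders for the real ones):

import re

# hypothetical log path and msgid pattern -- adjust to your mail logs
LOGFILE = '/var/log/mail.log'
msgid_pattern = re.compile(r'msgid=<[^>]+>')

msg_count = 0
for line in open(LOGFILE):
    if msgid_pattern.search(line):
        msg_count += 1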

Visualizing the outgoing email count with gviz_api

I have another simple Python script which queries the MongoDB logs database for all documents that have been inserted in the last hour. Here's how I do it:


MINUTES_AGO=60
conn = Connection()
db = conn.logs
maillogs = db.mail
now = datetime.datetime.now()
minutes_ago = now + datetime.timedelta(minutes=-MINUTES_AGO)
rows = maillogs.find({'insert_time': {"$gte": minutes_ago}})

As an aside, when querying MongoDB databases that contain documents with timestamp fields, the datetime module will become your intimate friend.

Just remember that you need to pass datetime objects when you put together a pymongo query. In the case above, I use the now() method to get the current timestamp, then I use timedelta with minutes=-60 to get the datetime object corresponding to 'now minus 1 hour'.

The gviz_api module has decent documentation, but it still took me a while to figure out how to use it properly (thanks to my colleague Dan Mesh for being the trailblazer and providing me with some good examples).

I want to graph the timestamps and message counts from the last hour. Using the pymongo query above, I get the documents inserted in MongoDB during the last hour. From that set, I need to generate the data that I am going to pass to gviz_api:


chart_data = []
for row in rows:
    insert_time = row['insert_time']
    insert_time = insert_time.strftime('%H:%M')
    msg_count = int(row['msg_count'])
    chart_data.append([insert_time, msg_count])

jschart("Outgoing_mail", chart_data)


In my case, chart_data is a list of lists, each list containing a timestamp and a message count.

I pass the chart_data list to the jschart function, which does the Google Visualization magic:

def jschart(name, chart_data):
    description = [
        ("time", "string"),
        ("msg_count", "number", "Message count"),
    ]

    data = []
    for insert_time, msg_count in chart_data:
        data.append((insert_time, msg_count))

    # Loading it into gviz_api.DataTable
    data_table = gviz_api.DataTable(description)
    data_table.LoadData(data)

    # Creating a JSON string
    json = data_table.ToJSon()

    name = "OUTGOING_MAIL"
    html = TEMPL % {"title" : name, "json" : json}
    open("charts/%s.html" % name, "w").write(html)

The important parts in this function are the description and the data variables. According to the docs, they both need to be of the same type, either dictionary or list. In my case, they're both lists. The description denotes the schema for the data I want to chart. I declare two variables I want to chart, insert_time of type string, and msg_count of type number. For msg_count, I also specify a user-friendly label called 'Message count', which will be displayed in the chart legend.

After constructing the data list based on chart_data, I declare a gviz_api DataTable, I load the data into it, I call the ToJSon method on it to get a JSON string, and finally I fill in a template string, passing it a title for the chart and the JSON data.

The template string is an HTML + Javascript snippet that actually talks to the Google Visualization backend and tells it to create an Area Chart. Click on this gist to view it.

That's it. I run the gviz_api script every 5 minutes via crontab and I generate an HTML file that serves as my dashboard.

I can easily also write a Nagios plugin based on the pymongo query, which would alert me for example if the number of outgoing email messages is too low or too high. It's very easy to write a Nagios plugin by just having a script that exits with 0 for success, 1 for warnings and 2 for critical errors. Here's a quick example, where wlimit is the warning threshold and climit is the critical threshold:


def check_maillogs(wlimit, climit):
    # MongoDB
    conn = Connection()
    db = conn.logs
    maillogs = db.mail
    now = datetime.datetime.now()
    minutes_ago = now + datetime.timedelta(minutes=-MINUTES_AGO)
    count = maillogs.find({'insert_time': {"$gte": minutes_ago}}).count()
    rc = 0
    if count > wlimit:
        rc = 1
    if count > climit:
        rc = 2
    print "%d messages sent in the last %d minutes" % (count, MINUTES_AGO)
    return rc
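
To wire this into an actual Nagios plugin, all that's left is to exit with the return code (the threshold values below are placeholders):

if __name__ == "__main__":
    import sys
    # exit code 0 = OK, 1 = warning, 2 = critical
    sys.exit(check_maillogs(1000, 5000))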


Update #1
See Mike Dirolf's comment on how to properly insert and query timestamp-related fields. Basically, use datetime.datetime.utcnow() instead of now() everywhere, and convert to local time zone when displaying.

Update #2
Due to popular demand, here's a screenshot of the chart I generate. Note that the small number of messages is a very, very small percentage of our outgoing mail traffic. I chose to chart it because it's related to some new functionality, and I want to see if we're getting too few or too many messages in that area of the application.

Friday, July 09, 2010

Working with Chef cookbooks and roles

Welcome to the second installment of my Chef saga (you can read the first one here). This time I will walk you through creating your own cookbook, modifying an existing cookbook, creating a role and adding a client machine to that role. As usual, I got much help from the good people on the #chef IRC channel, especially the omnipresent @kallistec. All these tasks are also documented in one form or another on the Chef wiki, but I found it hard to put them all together, hence this blog post.

Downloading the Opscode Chef cookbooks

Step 0 for this task is to actually clone the Opscode repository. I created a directory called /srv/chef/repos on my Chef server box and ran this command inside it:

# git clone git://github.com/opscode/chef-repo.git

This will create /srv/chef/repos/chef-repo with a bunch of sub-directories, one of them being cookbooks.
I deleted the cookbooks directory and cloned the Opscode cookbooks in its place:

# cd /srv/chef/repos/chef-repo
# git clone git://github.com/opscode/cookbooks



Uploading the cookbooks to the Chef server

Just downloading the cookbooks somewhere on the file system is not enough. The Chef server needs to be made aware of their existence. You do that with the following knife command (which I ran on the Chef server box):

# knife cookbook upload -a -o /srv/chef/repos/chef-repo/cookbooks

BTW, if you specified a non-default configuration file location for knife when you configured it (I specified /etc/chef/knife.rb for example) then you need to make a symlink from that file to ~/.chef/knife.rb, otherwise knife will complain about not finding a config file. At least it complained to me in its strange Swedish accent.

To go back to the knife command above: it says to upload all (-a) cookbooks it finds under the directory specified with -o.

Modifying an existing cookbook

If you look closely under the Opscode cookbooks, there's one called python. A cookbook contains one or more recipes, which reside under COOKBOOK_NAME/recipes. Most cookbooks have only one recipe which is a file called default.rb. In the case of the python cookbook, this recipe ensures that certain python packages such as python-dev, python-imaging, etc. get installed on the node running Chef client. To add more packages, simply edit default.rb (there's a certain weirdness in modifying a Ruby file to make a node install more Python packages....) and add your packages of choice.

Again, modifying a cookbook recipe on the file system is not enough; you need to let the Chef server know about the modification, and you do it by uploading the modified cookbook to the Chef server via knife:

# knife cookbook upload python -o /srv/chef/repos/chef-repo/cookbooks

Note the modified version of the 'knife cookbook upload' command. In this case, we don't specify '-a' for all cookbooks, but instead we specify a cookbook name (python). However, the directory remains the same. Do not make the mistake of specifying /srv/chef/repos/chef-repo/cookbooks/python as the target of your -o parameter, because it will not work. Trust me, I tried it until I was enlightened by @kallistec on the #chef IRC channel.

Creating your own cookbook

It's time to bite the bullet and create your own cookbook. Chef makes it easy to create all the files needed inside a cookbook. Run this command when in the top-level chef repository directory (/srv/chef/repos/chef-repo in my case):

# rake new_cookbook COOKBOOK=octopus

(like many other people, I felt the urge to cook Paul the Octopus when he accurately predicted Germany's loss to Spain)

This will create a directory called octopus under chef-repo/cookbooks and it will populate it with many other directories and files. At a minimum, you need to modify only octopus/recipes/default.rb. Let's assume you want to install some packages. You can take some inspiration from other recipes (build-essential for example), but it boils down to something like this:

include_recipe "build-essential"
include_recipe "ntp"
include_recipe "python"
include_recipe "screen"
include_recipe "git"

%w{chkconfig libssl-dev syslog-ng munin-node}.each do |pkg|
  package pkg do
    action :install
  end
end

Note that I'm also including other recipes in my default.rb file. They will be executed on the Chef client node, BUT they also need to be specified in the metadata file cookbooks/octopus/metadata.rb as dependencies, so that the Chef client node knows it needs to download them before running them. Trust me on this one, I speak again from bitter experience. This is what my metadata.rb file looks like:

maintainer        "My Organization"
maintainer_email  "admin@example.com"
license           "Apache 2.0"
description       "Installs required packages for My Organization applications"
version           "0.1"
recipe            "octopus", "Installs required packages for My Organization applications"
depends           "build-essential"
depends           "ntp"
depends           "python"
depends           "screen"
depends           "git"

%w{ fedora redhat centos ubuntu debian }.each do |os|
  supports os
end

Now it's time to upload our brand new cookbook to the Chef server. As before, we'll use the knife utility:

# knife cookbook upload octopus -o /srv/chef/repos/chef-repo/cookbooks

If you have any syntax errors in the recipe file or the metadata file, knife will let you know about them. To verify that the cookbook was uploaded successfully to the server, run this command, which should list your cookbook along with the others you uploaded previously:

# knife cookbook list

Creating a role and associating a chef client machine with it

Chef supports the notion of roles, which is very important for automated configuration management because you can assign a client node to one or more roles (such as 'web' or 'db' for example) and have it execute recipes associated with those roles.

To add a role, simply create a file under chef-repo/roles. I called mine base.rb, with the contents:

name "base"
description "Base role (installs common packages)"
run_list("recipe[octopus]")

It's pretty self-explanatory. Clients associated with the 'base' role will run the 'octopus' recipe.

As with cookbooks, we need to upload the newly created role to the Chef server. The following knife command will do it (assuming you're in the chef-repo directory):

# knife role from file roles/base.rb

To verify that the role has been uploaded, you can run:

# knife role show base

and it should show a JSON output similar to:

{
    "name": "base",
    "default_attributes": {
    },
    "json_class": "Chef::Role",
    "run_list": [
      "recipe[octopus]"
    ],
    "description": "Base role (installs common packages)",
    "chef_type": "role",
    "override_attributes": {
    }
}

Now it's time to associate a Chef client with the new role. You can see which clients are already registered with your Chef server by running:

# knife client list

Now pick a client and see which roles it is associated with:

# knife node show client01.example.com -r
{
  "run_list": [
  ]
}

The run_list is empty for this client. Let's associate it with the role we created, called 'base':

# knife node run_list add client01.example.com "role[base]"

You should now see the 'base' role in the run_list:


# knife node show client01.example.com -r
{
  "run_list": [
    "role[base]"
  ]
}

The next time client01.example.com runs chef-client, it will figure out it is part of the 'base' role, and it will follow the role's run_list by downloading and running the 'octopus' recipe from the Chef server.

In the next installment, I will talk about how to automatically bootstrap a Chef client in an EC2 environment. The goal there is to launch a new EC2 instance, have it install Chef client, have it assign a role to itself, then have it talk to the Chef server, download and run the recipes associated with the role.

Tuesday, July 06, 2010

Chef installation and minimal configuration

I started to play with Chef the other day. The instructions on the wiki are a bit confusing, but help on twitter (thanks @jtimberman) and on the #chef IRC channel (thanks @kallistec) has been great. I am at the very minimal stage of having a chef client talking to a chef server. I hasten to write down what I've done so far, both for my own sake and for others who might want to do the same. My OS is Ubuntu 10.04 32-bit on both machines.

First of all: as the chef wiki says, make sure you have FQDNs correctly set up on both client and server, and that they can ping each other at a minimum using the FQDN. I added the FQDN to the local IP address line in /etc/hosts, so that 'hostname -f' returned the FQDN correctly. In what follows, my Chef server machine is called chef.example.com and my Chef client machine is called client.example.com.

Installing the Chef server


Here I went the Ruby Gems route, because the very latest Chef (0.9.4) had not been captured in the Ubuntu packages yet when I tried to install it.

a) install pre-requisites

# apt-get install ruby ruby1.8-dev libopenssl-ruby1.8 rdoc ri irb build-essential wget ssl-cert

b) install Ruby Gems

# wget http://production.cf.rubygems.org/rubygems/rubygems-1.3.7.tgz
# tar xvfz rubygems-1.3.7.tgz
# cd rubygems-1.3.7
# ruby setup.rb
# ln -sfv /usr/bin/gem1.8 /usr/bin/gem

c) install the Chef gem

# gem install chef

d) install the Chef server by bootstrapping with the chef-solo utility

d1) create /etc/chef/solo.rb with contents:


file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
recipe_url "http://s3.amazonaws.com/chef-solo/bootstrap-latest.tar.gz"

d2) create /etc/chef/chef.json with contents:

{
"bootstrap": {
"chef": {
"url_type": "http",
"init_style": "runit",
"path": "/srv/chef",
"serve_path": "/srv/chef",
"server_fqdn": "chef.example.com",
"webui_enabled": true
}
},
"run_list": [ "recipe[bootstrap::server]" ]
}

d3) run chef-solo to bootstrap the Chef server install:

# chef-solo -c /etc/chef/solo.rb -j /etc/chef/chef.json

e) create an initial admin client with the Knife utility, to interact with the API

#knife configure -i
Where should I put the config file? [~/.chef/knife.rb]
Please enter the chef server URL: [http://localhost:4000] http://chef.example.com
Please enter a clientname for the new client: [root]
Please enter the existing admin clientname: [chef-webui]
Please enter the location of the existing admin client's private key: [/etc/chef/webui.pem]
Please enter the validation clientname: [chef-validator]
Please enter the location of the validation key: [/etc/chef/validation.pem]
Please enter the path to a chef repository (or leave blank):

f) create an intial Chef repository

I created a directory called /srv/chef/repos , cd-ed to it and ran this command:

# git clone git://github.com/opscode/chef-repo.git

At this point, you should have a functional Chef server, although it won't help you much unless you configure some clients.

Installing a Chef client

Here's the bare minimum I did to get a Chef client to just talk to the Chef server configured above, without actually performing any cookbook recipe yet (I leave that for another post).

The first steps are very similar to the ones I followed when I installed the Chef server.


a) install pre-requisites

# apt-get install ruby ruby1.8-dev libopenssl-ruby1.8 rdoc ri irb build-essential wget ssl-cert

b) install Ruby Gems

# wget http://production.cf.rubygems.org/rubygems/rubygems-1.3.7.tgz
# tar xvfz rubygems-1.3.7.tgz
# cd rubygems-1.3.7
# ruby setup.rb
# ln -sfv /usr/bin/gem1.8 /usr/bin/gem

c) install the Chef gem

# gem install chef

d) install the Chef client by bootstrapping with the chef-solo utility

d1) create /etc/chef/solo.rb with contents:

file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
recipe_url "http://s3.amazonaws.com/chef-solo/bootstrap-latest.tar.gz"
d2) create /etc/chef/chef.json with contents:

{
"bootstrap": {
"chef": {
"url_type": "http",
"init_style": "runit",
"path": "/srv/chef",
"serve_path": "/srv/chef",
"server_fqdn": "chef.example.com",
"webui_enabled": true
}
},
"run_list": [ "recipe[bootstrap::client]" ]
}

Note that the only difference so far between the Chef server and the Chef client bootstrap files is the one directive at the end of chef.json, which is bootstrap::server for the server and bootstrap::client for the client.

Caveat for this step: if you mess up and bootstrap the client using the wrong chef.json file containing the bootstrap::server directive, you will end up with a server and not a client. I speak from experience -- I did exactly this, then when I tried to run chef-client on this box, I got:

WARN: HTTP Request Returned 401 Unauthorized: Failed to authenticate!

/usr/lib/ruby/1.8/net/http.rb:2097:in `error!': 401 "Unauthorized" (Net::HTTPServerException)

d3) run chef-solo to bootstrap the Chef client install:

# chef-solo -c /etc/chef/solo.rb -j /etc/chef/chef.json

At this point, you should have a file called client.rb in /etc/chef on your client machine, with contents similar to:

#
# Chef Client Config File
#
# Dynamically generated by Chef - local modifications will be replaced
#

log_level :info
log_location STDOUT
ssl_verify_mode :verify_none
chef_server_url "http://chef.example.com:4000"

validation_client_name "chef-validator"
validation_key "/etc/chef/validation.pem"
client_key "/etc/chef/client.pem"

file_cache_path "/srv/chef/cache"
pid_file "/var/run/chef/chef-client.pid"

Mixlib::Log::Formatter.show_time = false
e) validate the client against the server

e1) copy /etc/chef/validation.pem from the server to /etc/chef on the client
e2) run chef-client on the client; for debug purposes you can use:

# chef-client -l debug

If everything goes well, you should see a message of the type:

# chef-client
INFO: Starting Chef Run
INFO: Client key /etc/chef/client.pem is not present - registering
WARN: Node client.example.com has an empty run list.
INFO: Chef Run complete in 1.209376 seconds
INFO: Running report handlers
INFO: Report handlers complete

You should also have a file called client.pem containing a private key that the client will be using when talking to the server. At this point, you should remove validation.pem from /etc/chef on the client, as it is not needed any more.
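
That is simply:

# rm /etc/chef/validation.pem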

You can also run this command on the server to see if the client got registered with it:

# knife client list -c /etc/chef/knife.rb

The output should be something like:

[
"chef-validator",
"chef-webui",
"chef.example.com",
"root",
"client.example.com"
]

That's it for now. As I warned you, nothing exciting happened here except for having a Chef client that talks to a server but doesn't actually DO anything. Stay tuned for more installments in my continuing chef saga though...


Tuesday, June 15, 2010

Common nginx configuration options

Google reveals a wealth of tutorials and sample nginx config files, but in any case here are some configuration tips that have been helpful to me.

Include files

Don't be shy in splitting up your main nginx.conf file into several smaller files. Your co-workers will be grateful. A structure that has been working for me is to have one file where I define my upstream pools, one file where I define locations that point to upstream pools, and one file where I define servers that handle those locations.

Examples:

upstreams.conf

upstream cluster1 {
fair;
server app01:7060;
server app01:7061;
server app02:7060;
server app02:7061;
}

upstream cluster2 {
fair;
server app01:7071;
server app01:7072;
server app02:7071;
server app02:7072;
}

locations.conf


location / {
root /var/www;
include cache-control.conf;
index index.html index.htm;
}

location /services/service1 {
proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

add_header Pragma "no-cache";


proxy_pass http://cluster1/;
}

location /services/service2 {
proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

add_header Pragma "no-cache";

proxy_pass http://cluster2/service2;
}

servers.conf

server {
listen 80;
include locations.conf;
}

At this point, your nginx.conf looks very clean and simple (you can still split it into more include files, by separating for example the gzip configuration options into their own file etc.)

nginx.conf

worker_processes 4;
worker_rlimit_nofile 10240;

events {
worker_connections 10240;
use epoll;
}

http {
include upstreams.conf;

include mime.types;
default_type application/octet-stream;

log_format custom '$remote_addr - $remote_user [$time_local] '
'"$request" $status $bytes_sent '
'"$http_referer" "$http_user_agent" "$http_x_forwarded_for" $request_time';

access_log /usr/local/nginx/logs/access.log custom;

proxy_buffering off;
sendfile on;
tcp_nopush on;
tcp_nodelay on;

gzip on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml application/xml+rss image/svg+xml application/x-font-ttf application/vnd.ms-fontobject;
gzip_disable "MSIE [1-6]\.";

# proxy cache config
proxy_cache_path /mnt/nginx_cache levels=1:2
keys_zone=one:10m
inactive=7d max_size=10g;
proxy_temp_path /var/tmp/nginx_temp;

proxy_next_upstream error;

include servers.conf;
}

This nginx.conf file is fairly vanilla in terms of the configuration options I used, but it's worth pointing some of them out.

Multiple worker processes

This is useful when you're running nginx on a multi-core box. Example:

worker_processes 4;

Increased number of file descriptors

This is useful for nginx instances that get hit by very high traffic. You want to increase the maximum number of file descriptors that nginx can use (the default on most Unix systems is 1024; run 'ulimit -n' to see the value on your system). Example:

worker_rlimit_nofile 10240;


Custom logging

See the log_format and access_log directives above. In particular, the "$http_x_forwarded_for" value is useful if nginx is behind another load balancer, and "$request_time" is useful to see the time taken by nginx when processing a request.

Compression

This is useful when you want to compress certain types of content sent back to the client. Examples:


gzip on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml application/xml+rss image/svg+xml application/x-font-ttf application/vnd.ms-fontobject;
gzip_disable "MSIE [1-6]\.";

Proxy options

These are options you can set per location. Examples:


proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
add_header Pragma "no-cache";


Most of these options have to do with setting custom HTTP headers (in particular 'no-cache' in case you don't want to cache anything related to that particular location).

Proxy cache

Nginx can be used as a caching server. You need to define a proxy_cache_path and a proxy_temp_path under your http directive, then use them in the locations you want to cache.


proxy_cache_path /mnt/nginx_cache levels=1:2
keys_zone=one:10m
inactive=7d max_size=10g;
proxy_temp_path /var/tmp/nginx_temp;

In the location you want to cache, you would add something like this:

proxy_cache one;
proxy_cache_key mylocation.$request_uri;
proxy_cache_valid 200 302 304 10m;
proxy_cache_valid 301 1h;
proxy_cache_valid any 1m;
proxy_cache_use_stale error timeout invalid_header http_500 http_502 http_503 http_504 http_404;

HTTP caching options

Many times you want to cache certain types of content and not others. You can specify your caching rules in a file that you include in your root location:

location / {
root /var/www;
include cache-control.conf;

index index.html index.htm;
}

You can specify different expire headers and cache options based on the request URI. Examples (inside cache-control.conf in my case)

# default cache 1 day
expires +1d;

if ($request_uri ~* "^/services/.*$") {
expires +0d;
add_header Pragma "no-cache";
}

if ($request_uri ~* "^/(index.html)?$") {
expires +1h;
}

SSL

All you need to do here is to define another server in servers.conf, and have it include the locations you need (which can be the same ones handled by the server on port 80 for example):

server {
server_name www.example.com;
listen 443;
ssl on;
ssl_certificate /usr/local/nginx/ssl/cert.pem;
ssl_certificate_key /usr/local/nginx/ssl/cert.key;

include locations.conf;
}



syslog-ng tips and tricks

Although I've been contemplating using scribe for our logging needs, for now I'm using syslog-ng. It's been doing the job well so far. Here are a couple of configuration tips:

1) Sending log messages for a given log facility to a given log file

Let's say you want to send all haproxy log messages to a file called /var/log/haproxy.log. In haproxy.cfg you can say:

global
 log 127.0.0.1 local7 info

...which means -- log all messages to localhost, to log facility local7 and with a log level of info.

To direct these messages to a file called /var/log/haproxy.log, you need to define the following in /etc/syslog-ng/syslog-ng.conf:

i) a destination:

destination df_haproxy { file("/var/log/haproxy.log"); };

ii) a filter:

filter f_haproxy { facility(local7); };

iii) a log (which ties the destination to the filter):

log {
source(s_all);
filter(f_haproxy);
destination(df_haproxy);
};

You also need to configure syslog-ng to allow log messages sent via UDP from localhost. Add this line to the source s_all element:

udp(ip(127.0.0.1) port(514));

Important note: since you're sending haproxy log messages to the local7 facility, this means that they'll also be captured by /var/log/syslog and /var/log/messages, since they are configured in syslog-ng.conf as destinations for the filters f_syslog and f_messages, which by default catch the local7 facility. As a result, you'll have triple logging of your haproxy messages. The solution? Add local7 to the list of facilities excluded from the f_syslog and f_messages filters.
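
With the stock Ubuntu syslog-ng.conf, that means changing those two filters to something like this (your default filter definitions may differ slightly; the important part is adding local7 to the excluded facilities):

filter f_syslog { not facility(auth, authpriv, local7); };
filter f_messages { level(info..warn) and not facility(auth, authpriv, cron, daemon, mail, news, local7); };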

2) Sending log messages to a remote log host

Assume you want to centralize log messages for a given service by sending them to a remote log host. Let's assume that the service logs via the local0 facility. The same procedure applies, with the creation of the following elements in syslog-ng.conf:

i) a destination


destination df_remote_log {
  udp("remote_loghost" port (5000));
};


ii) a filter:


filter f_myservice { facility(local0); };

iii) a log:

log {
        source(s_all);
        filter(f_myservice);
        destination(df_remote_log);
};

Note that you can also send messages for this particular filter (corresponding to local0) to a local file, by creating a destination pointing to that file and a log element tying the filter to that destination, like this:

destination df_local_log { file("/var/log/myservice.log"); };
log {
        source(s_all);
        filter(f_myservice);
        destination(df_local_log);
};

Finally, to finish the remote logging bit, you need to configure syslog-ng on the remote host to allow messages on UDP port 5000, and to log them to a local file. Here's my configuration on host "remote_loghost":

i) a new source allowing messages on port 5000:

source s_remote_logging {
    udp(ip(0.0.0.0) port(5000));
};

ii) a destination pointing to a local file:

destination df_common_log { file ("/var/log/myservice_common.log"); };

iii) a log combining the source and the destination above; I am using the predefined f_syslog filter here, because I don't need to select messages based on a given facility anymore:

log {
        source(s_remote_logging);
        filter(f_syslog);
        destination(df_common_log);
};




Thursday, June 03, 2010

Setting up PHP5/FastCGI with nginx

PHP is traditionally used with Apache, but can also be fronted by nginx. Here are some notes that I took while setting up PHP5/FastCGI behind nginx. My main source of inspiration was this howtoforge article. My OS flavor is Ubuntu 9.04 32-bit, but the same procedure applies to other flavors.

1) Install PHP5 and other required packages

# apt-get install php5 php5-cli php5-cgi php5-xcache php5-curl php5-sqlite libpcre3 libpcre3-dev libssl-dev

2) Configure xcache

I added the lines in the following gist to the end of /etc/php5/cgi/php.ini: http://gist.github.com/424172

3) Install spawn-fcgi, which used to be included in the lighttpd package, but now can be downloaded on its own from here.

# tar xvfz spawn-fcgi-1.6.3.tar.gz; cd spawn-fcgi-1.6.3; ./configure; make; make install

At this point you should have /usr/local/bin/spawn-fcgi installed. This wrapper needs to be launched at some point via a command line similar to this:

/usr/local/bin/spawn-fcgi -a 127.0.0.1 -p 9000 -u www-data -f /usr/bin/php5-cgi

This launches the php5-cgi process, which will listen on port 9000 for PHP requests.

The original howtoforge article recommended writing an init.d wrapper for this command. I used to do this, but I noticed that the php5-cgi process dies quite frequently, so I needed something more robust. Hence...

4) Configure spawn-fcgi to be monitored by supervisor

I installed supervisor via:

# apt-get install python-setuptools
# easy_install supervisor
# echo_supervisord_conf > /etc/supervisord.conf

Then I added this section to /etc/supervisord.conf:

[program:php5-cgi]
command=/usr/local/bin/spawn-fcgi -n -a 127.0.0.1 -p 9000 -u www-data -f /usr/bin/php5-cgi
redirect_stderr=true          ; redirect proc stderr to stdout (default false)
stdout_logfile=/var/log/php5-cgi/php5-cgi.log
stdout_logfile_maxbytes=10MB   ; max # logfile bytes b4 rotation (default 50MB)

Note that I added the -n flag to the spawn-fcgi command line. This keeps the process it spawns (/usr/bin/php5-cgi) in the foreground, which is exactly what supervisor needs in order to monitor it properly (supervisor doesn't play well with processes that daemonize themselves).

I use this init.d script to start/stop supervisord. Don't forget to 'chkconfig supervisord on' so that it starts at boot time.

Per the configuration above, I also created /var/log/php5-cgi and chown-ed it www-data:www-data. That directory will contain a file called php5-cgi.log which will capture both the stdout and stderr of the php5-cgi process.

When I started supervisord via 'service supervisord start', I was able to see /usr/bin/php5-cgi running. I killed that process, and yes, supervisor restarted it. May I just say that SUPERVISOR ROCKS!!!
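
You can also interact with the monitored process via supervisorctl; for example, given the [program:php5-cgi] section above:

# supervisorctl status php5-cgi
# supervisorctl restart php5-cgi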

5) Install and configure nginx

I prefer to install nginx from source.

# tar xvfz nginx-0.8.20.tar.gz; cd nginx-0.8.20; ./configure --with-http_stub_status_module; make; make install

To configure nginx to serve php files, add these 'location' sections to the server listening on port 80 in /usr/local/nginx/conf/nginx.conf:

    location / {
        root   /var/www/mydocroot;
        index  index.php index.html;
    }

    location ~ \.php$ {
        include /usr/local/nginx/conf/fastcgi_params;
        fastcgi_pass   127.0.0.1:9000;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  /var/www/mydocroot$fastcgi_script_name;
    }

(assuming of course your root document directory is /var/www/mydocroot)

Now nginx will forward all requests for *.php files to the process listening on port 9000 on localhost, which, as we know by now, is php5-cgi.

That's about it. Start up nginx and you should be in business.
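
To quickly verify the whole chain (nginx -> php5-cgi), you can drop a trivial PHP file into the document root and request it; the file name here is just an example, and you should remove the file afterwards:

# echo '<?php phpinfo(); ?>' > /var/www/mydocroot/info.php
# curl -s http://localhost/info.php | head

If you see the phpinfo HTML output, requests are being handed off to php5-cgi correctly.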

Friday, May 21, 2010

Setting up a mail server in EC2 with postfix and Postini

Setting up a mail server in 'the cloud' is notoriously difficult, mainly because many cloud-based IP addresses are already blacklisted, and also because you don't have control over reverse DNS in many cases. One way to mitigate these factors is to use Postini for outbound email from your cloud-based mail server. I've experimented with this in a test environment, with fairly good results. Here are some notes I jotted down. I am indebted to Jeff Roberts for his help with the postfix setup.

Let's assume you have a domain called mydomain.com and you want to set up a mail server in EC2 (could be any cloud provider) so that you can both send and receive email. I will also assume that you'll use Postini -- which currently costs $11/year/user. In my setup, I am mostly interested in sending mail out and less in receiving mail, so I only have one user: info@mydomain.com.

EC2 setup

I deployed a c1.medium EC2 instance running Ubuntu 9.04 and I also gave it an Elastic IP. The host name of my mail server is mail01.

I then installed postfix via 'apt-get install postfix'.

Postfix setup

For starters, I configured postfix to receive mail for mydomain.com, and to restrict the set of IP addresses that are allowed to use it as a relay. The relevant lines in /etc/postfix/main.cf are:

myhostname = mail01
myorigin = /etc/mailname
mydestination = mydomain.com, mail01, localhost.localdomain, localhost
mynetworks = cidr:/etc/postfix/allowed_relays, 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128

The contents of /etc/postfix/allowed_relays are similar to:

10.1.2.3   tstapp01
10.5.6.7  stgapp01

These IPs/names are application servers that I configured to use mail01 as their outgoing mail server, so they need to be allowed to send mail out.
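
After editing main.cf or allowed_relays, reload postfix and check that the settings took effect (cidr tables are read directly, so no postmap step is needed):

# postfix reload
# postconf mynetworks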

Postini setup

When you sign up for Postini, you can specify that the user you created is of admin type. When you then log in to https://login.postini.com/ as that user and specify that you want to go to the System Administration area, you'll be directed to a URL such as http://ac-s9.postini.com. Make a note of the number after the s, in this case 9. It will dictate which outbound Postini mail servers you will need to use, per these instructions. For s9 (which was my case), the CIDR range of the outbound Postini mail servers is 74.125.148.0/22.

Within the Postini System Administration interface, I had to edit the settings for Inbound Servers and Outbound Servers.

For Inbound Servers, I edited the Delivery Mgr tab and specified the Elastic IP of mail01 as the Email Server IP address.

For Outbound Servers, I clicked on 'Add Outbound Email Server' and specified a range whose beginning and end are both the Elastic IP of mail01. For Reinjection Hosts, I also specified the Elastic IP of mail01.

The Reinjection Host setup is necessary for using Postini as an outbound mail service. See the Postini Outbound Services Configuration Guide (PDF) for details on how to do this depending on the mail server software you use.

More Postfix setup

Now we are ready to configure postfix to use Postini as its own outbound relay. I just added one line to /etc/postfix/main.cf:

relayhost = outbounds9.obsmtp.com

(this will be different, depending on the number you get when logging into Postini; in my case the number is 9)

I also had to allow the CIDR range of the Postini mail servers to use mail01 as a relay (they need to reinject the message into mail01, hence the need to specify mail01's external Elastic IP as the Reinjection Host IP above). I edited /etc/postfix/allowed_relays and added the line:

74.125.148.0/22 postini ac-s9
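
To check that outbound mail really does go out through Postini, send a test message from one of the allowed application servers (assuming a mail/mailx client is installed there) and watch the mail log on mail01:

$ echo "test body" | mail -s "relay test" someuser@example.com
# tail -f /var/log/mail.log

You should see the message being relayed to outbounds9.obsmtp.com (or whichever outbound Postini host corresponds to your account).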

DNS setup

To use Postini as an incoming mail server service, you need to configure mydomain.com to use Google Apps. Then you'll be able to specify the Postini mail servers that you'll use as MX records for mydomain.com.


DKIM setup

To further improve the chances that email you send out from the cloud will actually reach its intended recipients, it's a good idea to set up DKIM and SPF. For DKIM, see my blog post on "DKIM setup with postfix and OpenDKIM".

SPF setup

This is actually fairly easy to do. You basically need to add a DNS record which details the IP addresses of the mail servers which are allowed to send mail for mydomain.com. You can use an SPF setup wizard for this, and an SPF record testing tool to see if your setup is correct. In my case, I added a TXT record with the contents:

v=spf1 ip4:207.126.144.0/20 ip4:64.18.0.0/20 ip4:74.125.148.0/22 ip4:A.B.C.D -all

where the first 3 subnets are the 3 possible CIDR ranges for Postini servers, and the last IP which I denoted with A.B.C.D is mail01's Elastic IP address.
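
Once the record has propagated, you can sanity-check it with dig (substituting your actual domain) and make sure the v=spf1 string above comes back:

$ dig +short TXT mydomain.com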

At this point, you should be able to send email out through mail01 (from the servers you allowed to do so) and then through Postini, and also receive email for the users you created within mydomain.com. If you also have DKIM and SPF set up, you're in pretty good shape in terms of email deliverability. Still, sending email reliably is one of the hardest things to do on today's Internet, so good luck!

Deploying MongoDB with master-slave replication

This is a quick post so I don't forget how I set up replication between a MongoDB master server running at a data center and a MongoDB slave running in EC2. I won't go into how to use MongoDB (I'll have one or more future posts about that); here I'll just talk about installing MongoDB and setting up replication.

Setting up the master server

My master server runs Ubuntu 9.10 64-bit, but similar instructions apply to other flavors of Linux.

1) Download MongoDB binary tarball from the mongodb.org downloads page (the current production version is 1.4.2). The 64-bit version is recommended. Since MongoDB uses memory-mapped files, if you use the 32-bit version your database files will have a 2 GB size limit. So get the 64-bit version.

2) Unpack the tar.gz somewhere on your file system. I unpacked it in /usr/local on the master server and created a mongodb symlink:

# cd /usr/local
# tar xvfz mongodb-linux-x86_64-1.4.2.tar.gz
# ln -s mongodb-linux-x86_64-1.4.2 mongodb

3) Create a mongodb group and a mongodb user. Create a data directory on a partition with a generous amount of disk space. Create a log directory. Set permissions on these 2 directories for user and group mongodb:

# groupadd mongodb
# useradd -g mongodb mongodb
# mkdir /data/mongodb
# mkdir /var/log/mongodb
# chown -R mongodb:mongodb /data/mongodb /var/log/mongodb

4) Create a startup script in /etc/init.d. I modified this script so it also handles logging the way I wanted. I posted my version of the script as this gist. In my case the script is called mongodb.master. I chkconfig-ed it on and started it:

# chkconfig mongodb.master on
# service mongodb.master start
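
I can't reproduce the gist here, but the essential mongod options on the master boil down to something along these lines (this is an assumption on my part -- check the gist for the exact invocation; note that --master is what enables the oplog needed for replication):

DAEMON_OPTS="--master --dbpath /data/mongodb --logpath /var/log/mongodb/mongodb.log --logappend run"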

5) Add the location of the MongoDB utilities to your PATH. In my case, I added this line to .bashrc:

export PATH=$PATH:/usr/local/mongodb/bin

That's it, you should be in business now. If you type

# mongo

you should be connected to your MongoDB instance. You should also see a log file created in /var/log/mongodb/mongodb.log.

Also check out the very good administration-related documentation on mongodb.org.

Setting up the slave server

The installation steps 1) through 3) are the same on the slave server as on the master server.

Now let's assume you already created a database on your master server. If your database name is mydatabase, you should see files named mydatabase.0 through mydatabase.N and mydatabase.ns in your data directory. You can shut down the master server (via "service mongodb.master stop"), then simply copy these files over to the data directory of the slave server. Then start up the master server again.
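
A minimal sketch of that copy step, assuming /data/mongodb is the data directory on both hosts and 'slavehost' is the name of your slave server:

# service mongodb.master stop
# rsync -av /data/mongodb/ slavehost:/data/mongodb/
# ssh slavehost chown -R mongodb:mongodb /data/mongodb
# service mongodb.master start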

At this point, in step 4 you're going to use a slightly different startup script. The main difference is in the way you start the mongod process:

DAEMON_OPTS="--slave --source $MASTER_SERVER --dbpath $DBPATH --logpath $LOGFILE --logappend run"

where MASTER_SERVER is the name or the IP address of your MongoDB master server. Note that you need port 27017 open on the master, in case you're running it behind a firewall.

Here is a gist with my slave startup script (which I call mongodb.slave).

Now when you run 'service mongodb.slave start' you should have your MongoDB slave database up and running, and sync-ing from the master. Check out the log file if you run into any issues.
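
To see how the sync is doing, you can also connect to the slave with the mongo shell and run:

> db.printSlaveReplicationInfo()

which prints the source server and how far behind the slave currently is.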

Log rotation

The mongod daemon has a nifty feature: if you send it a SIGUSR1 signal via 'kill -USR1 PID', it will rotate its log file. The current mongodb.log will be timestamped, and a brand new mongodb.log will be started.
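
For example, assuming a single mongod process on the box:

# kill -USR1 `pidof mongod`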

Monitoring and graphing

I use Nagios for monitoring and Munin for resource graphing/visualization. I found a good Nagios plugin for MongoDB at the Tag1 Consulting git repository: check_mongo. The 10gen CTO Eliot Horowitz maintains the MongoDB munin plugin on GitHub: mongo-munin.

Here are the Nagios commands I use on the master to monitor connectivity, connection counts and long operations:

/usr/local/nagios/libexec/check_mongo -H localhost -A connect
/usr/local/nagios/libexec/check_mongo -H localhost -A count
/usr/local/nagios/libexec/check_mongo -H localhost -A long

On the slave I also monitor slave lag:

/usr/local/nagios/libexec/check_mongo -H localhost -A slavelag

The munin plugin is easy to install. Just copy the files from github into /usr/share/munin/plugins and create symlinks to them in /etc/munin/plugins, then restart munin-node.

mongo_btree -> /usr/share/munin/plugins/mongo_btree
mongo_conn -> /usr/share/munin/plugins/mongo_conn
mongo_lock -> /usr/share/munin/plugins/mongo_lock
mongo_mem -> /usr/share/munin/plugins/mongo_mem
mongo_ops -> /usr/share/munin/plugins/mongo_ops
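
A minimal sketch of those steps, assuming the plugin files have already been copied to /usr/share/munin/plugins and made executable:

# cd /etc/munin/plugins
# for p in mongo_btree mongo_conn mongo_lock mongo_mem mongo_ops; do ln -s /usr/share/munin/plugins/$p $p; done
# service munin-node restart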

You should then see nice graphs for btree stats, current connections, write lock percentage, memory usage and mongod operations.

That's it for now. I'll talk more about how to actually use MongoDB with pymongo in a future post. But just to express an opinion, I think MongoDB rocks! It hits a very sweet spot between SQL and NoSQL technologies. At Evite we currently use it for analytics and reporting, but I'm keeping a close eye on sharding becoming production-ready so that we can potentially use it for web-facing traffic too.

Modifying EC2 security groups via AWS Lambda functions

One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...