Building Scalable Websites with Perl

          Perrin Harkins
          Plus Three (http://plusthree.com/)
Who is doing it?
• Yahoo
  – Overture
• Amazon
  – IMDB
• InterActiveCorp
  – Ticketmaster
  – CitySearch
How are they doing it?
• Not just hardware
• Basic techniques
Things we won't be covering
• mod_perl tuning
  – http://perl.apache.org/
• DBI tuning
  – http://search.cpan.org/~timb/
• hardware
Caching
Page-Level Caching
• Best performance
• Pre-generate from batch job

[Diagram: a batch job reads the data and writes finished pages into the page cache; the web server serves clients from that cache.]
wget
wget --mirror --convert-links \
   --html-extension --reject gif,jpg,png \
   --no-parent \
   http://app-server/dynamic/pages/

[Diagram: wget crawls the app server, which reads the data, and fills the page cache; the web server serves clients from that cache.]
Generate-On-Demand


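[Diagram: client, then web server, then mod_proxy cache, then app server, then data. Only requests that miss the mod_proxy cache reach the app server.]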
mod_proxy
ProxyRequests Off
ProxyPass /dynamic/stuff http://app-server/
ProxyPassReverse /dynamic/stuff http://app-server/
CacheRoot "/mnt/proxy-cache"
CacheSize 500000
CacheGcInterval 12
CacheMaxExpire 36
CacheDefaultExpire 2
Intercepting 404 Errors
ErrorDocument 404 /page/generator


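[Diagram: the web server serves clients from the page cache; when a page is missing, the 404 handler builds it from the data and stores it in the page cache.]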
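
A minimal sketch of the 404 approach, assuming the handler may write into the web server's document root. The names `generate_page` and `handle_missing` and the directory layout are hypothetical, not from the talk. The first request for a missing page builds it and saves it to disk, so later requests are served as static files and never reach the handler.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Temp qw(tempfile);

# Hypothetical page builder; a real one would query the database.
sub generate_page {
    my ($uri) = @_;
    return "<html><body>Generated content for $uri</body></html>\n";
}

# Build the page for $uri, store it under $docroot, and return the HTML.
# Returns nothing if the URI looks unsafe.
sub handle_missing {
    my ($docroot, $uri) = @_;

    # Refuse anything that could escape the document root.
    return if $uri =~ /\.\./ || $uri !~ m{^(?:/[\w.-]+)+$};

    my $html = generate_page($uri);
    my $path = $docroot . $uri;
    my $dir  = dirname($path);
    make_path($dir);

    # Write to a temp file and rename it into place, so a concurrent
    # request never sees a half-written page.
    my ($fh, $tmp) = tempfile(DIR => $dir);
    print {$fh} $html;
    close $fh or die "close $tmp: $!";
    chmod 0644, $tmp;    # tempfile() creates files readable by the owner only
    rename $tmp, $path or die "rename $tmp to $path: $!";

    return $html;
}
```

Expiring pages is then a matter of deleting files from the cache directory, for example from a cron job.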
Partial-Page Caching

[Diagram: one page split into regions cached for different periods: 2 hours, 5 minutes, and 24 hours.]
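
A rough sketch of the idea, using a plain in-process hash as the cache. The fragment names, the `fragment` and `render_page` helpers, and the mapping of lifetimes to regions are assumptions for illustration; only the three lifetimes come from the slide. Each region is rebuilt only when its own lifetime has passed.

```perl
use strict;
use warnings;

my %cache;     # fragment name => { html => ..., expires => epoch seconds }
my %builds;    # how often each fragment was regenerated (illustration only)

# Return the cached fragment, regenerating it once its TTL has passed.
# $now is optional and defaults to the current time.
sub fragment {
    my ($name, $ttl, $builder, $now) = @_;
    $now = time unless defined $now;

    my $entry = $cache{$name};
    if (!$entry || $entry->{expires} <= $now) {
        $builds{$name}++;
        $entry = { html => $builder->(), expires => $now + $ttl };
        $cache{$name} = $entry;
    }
    return $entry->{html};
}

# Hypothetical page made of three regions on different schedules.
sub render_page {
    my ($now) = @_;
    return join "\n",
        fragment('articles',   2 * 60 * 60,  sub { '<div>articles</div>' },   $now),
        fragment('headlines',  5 * 60,       sub { '<div>headlines</div>' },  $now),
        fragment('navigation', 24 * 60 * 60, sub { '<div>navigation</div>' }, $now);
}
```

A per-process hash is not shared between server processes; the shared caches on the following slides fill that role.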
Mason cache
  my $result = $m->cache->get($search_term);
  if (!defined($result)) {
      $result = run_search($search_term);
      $m->cache->set($search_term, $result, '30 min');
  }
Cache::FastMmap
  use Cache::FastMmap;

  our $Cache = Cache::FastMmap->new(
      cache_size  => '500m',
      expire_time => '30m',
  );

  $Cache->set($key, $value);
  my $value = $Cache->get($key);
Memcached
use Cache::Memcached;

our $Memd = Cache::Memcached->new({
    'servers' => [
      "10.0.0.15:11211", "10.0.0.15:11212",
      "10.0.0.17:11211",
      [ "10.0.0.17:11211", 3 ],   # [ address, weight ]
    ],
    'debug'   => 0,
    'compress_threshold' => 10_000,
});
$Memd->set($key, $value, 5*60);   # expires after 5 minutes
my $value = $Memd->get($key);
Job Queuing
Get in Line
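[Diagram, step 1: the client sends a request; the web server forks a worker and redirects the client to a “search in progress” page.]
[Diagram, step 2: the client reloads; the web server asks whether the worker is done yet and returns the result once it is.]
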
http://www.stonehenge.com/merlyn/WebTechniques/col20.html
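
A small sketch of the fork-and-poll pattern, assuming a spool directory that both the request handler and the worker can write to. The names `start_job` and `check_job` and the file naming are hypothetical, and forking inside a persistent server such as mod_perl needs more care than shown here.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

$SIG{CHLD} = 'IGNORE';    # let the OS reap finished workers

# Start the slow job in a child process and return a job id right away.
sub start_job {
    my ($job_dir, $work) = @_;
    my $id = join '-', time, $$, int rand 1_000_000;

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the work, then publish the result with a rename
        # so the parent never reads a partial file.
        my $result = $work->();
        open my $fh, '>', "$job_dir/$id.tmp" or _exit(1);
        print {$fh} $result;
        close $fh or _exit(1);
        rename "$job_dir/$id.tmp", "$job_dir/$id.done" or _exit(1);
        _exit(0);    # skip END blocks and buffers inherited from the parent
    }

    return $id;      # parent: redirect the client to a status page for $id
}

# Called on each reload. Returns undef while the search is in progress,
# otherwise the stored result.
sub check_job {
    my ($job_dir, $id) = @_;
    my $file = "$job_dir/$id.done";
    return unless -e $file;

    open my $fh, '<', $file or die "open $file: $!";
    local $/;        # slurp the whole file
    return scalar <$fh>;
}
```

The status page would call `check_job` on every reload and show “search in progress” until it returns a defined value.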
Spread::Queue client
use Spread::Queue::Sender;
my $sender = Spread::Queue::Sender->new("myqueue");

$sender->submit("myfunc", { name => "value" });

my $response = $sender->receive();
Spread::Queue worker
use Spread::Queue::Worker;


my $worker = Spread::Queue::Worker->new("myqueue");
$worker->callbacks(
    myfunc => \&myfunc,    # pass a code reference, do not call the sub
);
$SIG{INT} = \&signal_handler;
$worker->run;
Spread::Queue worker
sub myfunc {
    my ($worker, $originator, $input) = @_;
      
    my $result = {
        response => "I heard you!",
    };
    $worker->respond($originator, $result);
}
Resources
• “Computer Science and Perl Programming”
• “Mastering Algorithms with Perl”
• “Patterns of Enterprise Application
  Architecture” by Martin Fowler
• http://oreillynet.com/
Thanks!
• Craig McLane and Adam Sussman of
  Ticketmaster Online
• Zack Steinkamp of Yahoo
