Take my logs. Please.

Mike Brittain
Director of Engineering, Infrastructure
Etsy.com

mike@etsy.com      @mikebrittain
(hello?)
This sounds
boooooorrrrring...
No, no... hang in there!
Take My Logs. Please!
25 MM uniques/month
150 Countries
$300 MM+ sales last year
Apache, PHP, MySQL,
PostgreSQL,
Memcache, Gearman,
Solr, etc.
What’s working?
Performance
Operability
Simplicity
Logging + Trending
App logging
(Apache access and error logs)
“Common”
LogFormat "%h %l %u %t \"%r\" %>s %b" common
“Combined”
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
mod_log_config
  %f            Filename requested
  %k            Number of keepalive requests served on this connection
  %T            Time taken to serve the request, in seconds
  %D            Time taken to serve the request, in microseconds
  %{foobar}n    Contents of “note” foobar from another module
apache_note()
apache_note("foobar", $whatever);
“Steroids”
LogFormat "%{True-Client-IP}i %l %t \"%r\" \
%>s %b \"%{Referer}i\" \
\"%{User-Agent}i\" %V \
%{user_id}n %{shop_id}n %{uaid}n \
%{ab_selections}n %{request_uid}n \
%{api_consumer_key}n \
%{api_method_name}n \
%{php_bytes}n %{php_microsec}n %D"
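// Record the request start time, in microseconds.
$GLOBALS['timer'] = microtime(true) * 1000000;

// Run pageStats() when the script finishes.
register_shutdown_function('pageStats');

function pageStats() {
  $timer_end = microtime(true) * 1000000;
  $diff = $timer_end - $GLOBALS['timer'];

  // Hand the values to Apache as notes, logged via
  // %{php_microsec}n and %{php_bytes}n.
  apache_note('php_microsec', $diff);
  apache_note('php_bytes', memory_get_peak_usage());
}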
What about “%D”?
“Steroids”
%{ab_selections}n ...
easy_reg=1; personalize_widget=0; icon_in_cornflower_blue=1;
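A field like %{ab_selections}n becomes useful for segmentation once it is split into name/value pairs. The following is a minimal Python sketch, assuming the note is logged in exactly the `name=value;` form shown above. The function name `parse_ab_selections` is hypothetical and not part of the talk.

```python
def parse_ab_selections(raw):
    """Split 'name=value; name=value;' into a dict of strings."""
    selections = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue  # skip the empty trailing segment and malformed pairs
        name, value = pair.split("=", 1)
        selections[name.strip()] = value.strip()
    return selections


# Example value taken from the slide above.
example = "easy_reg=1; personalize_widget=0; icon_in_cornflower_blue=1;"
print(parse_ab_selections(example))
```

With the selections in a dict, requests can be grouped by any single experiment, for example comparing response times where `easy_reg` is 1 against those where it is 0.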
Coming soon...
%{locale}n     (i18n)

%{platform}n   (desktop vs. mobile)



OPS-1805, OPS-1827
etsy.com/careers
Using something else?
time, http method, request uri,
response code, referer, user-agent,
response time, response memory,
custom segmentation fields...
Quick averages
grep "GET /listing/" access.log |
  awk '{sum=sum+$(NF-1)} END {print sum/NR}'
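The same average can be sketched in Python for cases where the one-liner needs to grow. This assumes, as the awk command does, that the value of interest is the next-to-last whitespace-separated field (%{php_microsec}n in the “Steroids” format). The function name `average_field` and the shortened sample lines are hypothetical.

```python
def average_field(lines, needle="GET /listing/", offset=-2):
    """Average the numeric field at `offset` over lines containing `needle`."""
    total = 0.0
    count = 0
    for line in lines:
        if needle not in line:
            continue
        fields = line.split()
        try:
            total += float(fields[offset])
        except (IndexError, ValueError):
            continue  # skip lines where the field is missing or not numeric
        count += 1
    return total / count if count else None


# Hypothetical, shortened log lines ending in: php_bytes php_microsec %D
sample = [
    '"GET /listing/1 HTTP/1.1" 200 1048576 250000 260000',
    '"GET /listing/2 HTTP/1.1" 200 1048576 350000 365000',
    '"GET /search HTTP/1.1" 200 1048576 900000 910000',
]
print(average_field(sample))
```

Unlike the awk version, this skips lines whose field is not numeric instead of counting them as zero.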
Quick graphs
grep "GET /listing/" access.log |
  perl -pe "s/.*\[.*\d{4}:(\d{2}):(\d{2}):\d{2}.*\]/\1:\2/" |
  awk '{print $1, $(NF-1)}' > /tmp/pagetimes.dat
gives you...
Quick graphs
# /tmp/pagetimes.dat

18:37   251.0
18:38   252.1
18:39   253.5
18:40   251.0
18:45   250.0


and then...
Quick graphs
# GNUPLOT

set terminal png
set output 'listings.png'
set yrange [0:2000]
set xdata time
set timefmt "%H:%M"
set format x "%H:%M"
plot '/tmp/pagetimes.dat' using 1:2 with points
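The extraction step that feeds the plot (the perl and awk pipeline) can also be sketched in Python. This assumes Apache's default %t timestamp layout, such as `[04/Mar/2011:18:37:02 -0500]`, and that the response time is the next-to-last field. The names `TIME_RE` and `minute_and_time` and the sample line are hypothetical.

```python
import re

# Matches Apache's default %t timestamp and captures the hour and minute.
TIME_RE = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):(\d{2}):\d{2} [+-]\d{4}\]")


def minute_and_time(line):
    """Return ('HH:MM', value) for one access-log line, or None."""
    match = TIME_RE.search(line)
    fields = line.split()
    if match is None or len(fields) < 2:
        return None
    return "%s:%s" % (match.group(1), match.group(2)), fields[-2]


# Hypothetical, shortened log line.
line = '[04/Mar/2011:18:37:02 -0500] "GET /listing/1 HTTP/1.1" 200 251000 260000'
print(minute_and_time(line))
```

Writing each returned pair as one space-separated line produces a file in the same two-column shape as /tmp/pagetimes.dat.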
Quick graphs
Error logs
PHP + Apache errors in one file
Simple logging interface
Error logs
Levels: error, info, debug
Namespace: perf, sql, __class__
Logger::error("Query exceeded 5 sec: $query",
              "sql_long_query");
web0054 [Fri Mar 04 16:27:48 2011] [error]
[sql_long_query] [mk04gw1p71] Query exceeded
5 sec: SELECT * FROM ...
$ grep "16:27:48" access.log | wc -l
1527
web0054 [Fri Mar 04 16:27:48 2011] [error]
[sql_long_query] [mk04gw1p71] Query exceeded
5 sec: SELECT * FROM ...
In other words:
error.log -> request_uid -> access.log
request uri, ab selections, user id, locale,
platform, api key, etc.
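The error.log to access.log lookup can be sketched as follows. This is a minimal Python sketch, assuming the error-log layout shown in the slides (host, time, level, namespace, request_uid, message) and that %{request_uid}n appears as its own whitespace-separated field in the access log. The function names and the shortened access-log lines are hypothetical.

```python
import re

# Layout from the slides: host [time] [level] [namespace] [request_uid] message
ERROR_RE = re.compile(
    r"^\S+ \[[^\]]+\] \[[^\]]+\] \[[^\]]+\] \[(?P<request_uid>[^\]]+)\]"
)


def request_uids(error_lines):
    """Collect the request_uid of every parseable error-log line."""
    uids = set()
    for line in error_lines:
        match = ERROR_RE.match(line)
        if match:
            uids.add(match.group("request_uid"))
    return uids


def matching_requests(access_lines, uids):
    """Return access-log lines that carry one of the uids as a whole field."""
    return [line for line in access_lines if uids.intersection(line.split())]


errors = [
    "web0054 [Fri Mar 04 16:27:48 2011] [error] [sql_long_query] "
    "[mk04gw1p71] Query exceeded 5 sec: SELECT * FROM ...",
]
# Hypothetical, shortened access-log lines carrying %{request_uid}n.
access = [
    '"GET /listing/1 HTTP/1.1" 200 42 mk04gw1p71 250000 260000',
    '"GET /search HTTP/1.1" 200 - zz91kq0x2a 90000 95000',
]
print(matching_requests(access, request_uids(errors)))
```

Matching on the request_uid narrows the 1527 requests logged in that second down to the one request that produced the error.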
Filtering
tail -f error.log | grep -v "sql_long_query" | ...
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Help me, Rhonda.
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Heeeeeeellllllllllllllppp
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0034   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a gr
web0055   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002   [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling.
web0089   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020   [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
web1101   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087   [04:28:54   2011]   [fatal] [client 10.101.x.x] Sky is falling.
web0002   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0201   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0355   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0052   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a gr
web0066   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0001   [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling
web0020   [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
web1101   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
Trending
fatals   errors   warnings
Logster
Run by cron
Maintains a cursor on log files
Simple parsing & aggregation
Output to Ganglia or Graphite

                          github.com/etsy
web0054 [Fri Mar 04 16:27:48 2011] [error]
 [login] [mk04gw1p71] User login failed.
      Reason: wrong password for ...
^\S+ \[[^\]]+\] \[(?P<log_level>[^\]]+)\]
if fields['log_level'] == "fatal":
    self.fatals += 1
elif fields['log_level'] == "error":
    self.errors += 1
elif fields['log_level'] == "warning":
    self.warnings += 1
...
MetricObject("fatals",
  (self.fatals / self.duration), "per sec")

MetricObject("errors",
  (self.errors / self.duration), "per sec")

MetricObject("warnings",
  (self.warnings / self.duration), "per sec")
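The parsing and aggregation steps above can be put together in a standalone sketch. This does not use Logster's own classes: `LevelCounter` and its methods are hypothetical stand-ins that mimic the shape of the slide code, the regex assumes the bracketed error-log layout shown earlier, and the second and third sample lines are invented for illustration.

```python
import re

# host [time] [level] ... ; only the level is captured.
LEVEL_RE = re.compile(r"^\S+ \[[^\]]+\] \[(?P<log_level>[^\]]+)\]")


class LevelCounter(object):
    """Counts log lines per level and reports per-second rates."""

    def __init__(self):
        self.counts = {"fatal": 0, "error": 0, "warning": 0}

    def parse_line(self, line):
        match = LEVEL_RE.match(line)
        if match is None:
            return  # ignore lines that do not fit the layout
        level = match.group("log_level")
        if level in self.counts:
            self.counts[level] += 1

    def rates(self, duration):
        """Return {level: events per second} over `duration` seconds."""
        return {level: n / float(duration) for level, n in self.counts.items()}


counter = LevelCounter()
for line in [
    "web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed.",
    "web0001 [Fri Mar 04 16:27:49 2011] [fatal] [sql] [ab12cd34ef] Gaaaaahhh!",
    "web0002 [Fri Mar 04 16:27:50 2011] [error] [perf] [gh56ij78kl] Sky is falling.",
]:
    counter.parse_line(line)
print(counter.rates(duration=60))
```

Dividing by the interval between runs turns raw counts into rates, so the graphed values stay comparable even if the cron interval changes.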
fatals   errors   warnings
Logster

          Signed-in vs. Signed-out
github.com/etsy
Log a plethora of data.
Don’t be afraid to use one file.
Use custom fields to segment data.
Correlate errors to specific requests.
Make f#@k!ng graphs.
Convert rates to trend lines.
Take my logs. Please!
Thank you.
                  codeascraft.etsy.com
                  github.com/etsy

Mike Brittain
Director of Engineering, Infrastructure
Etsy.com

mike@etsy.com      @mikebrittain
