Advanced Oracle Troubleshooting
No magic is needed; a systematic approach will do
Tanel Poder http://www.tanelpoder.com
Introduction
About me:
Occupation: DBA, engineer, researcher
Expertise: Oracle internals geek, End-to-end performance & scalability
Oracle experience: 10 years as DBA
Certification: OCM (2002), OCP (1999)
Professional affiliations: OakTable Network
Blog: http://blog.tanelpoder.com
Introduction
About this presentation:
Systematic approach, rather than methodology
Use the right tools for the right problems
Break complex problems down into simple problems
Therefore, use simple tools for simple problems
In other words, use a systematic approach and life will be easier!
All scripts used here are freely available:
http://www.tanelpoder.com
Simple (but common) question:
What the $#*&%! is that session doing?
demo1.sql
Non-systematic troubleshooting
Check alert.log
Check for disk and tablespace free space
Check for locks
Check for xyz
"We did a healthcheck and everything looks OK!"
?????!
Semi-systematic troubleshooting
Quick check for usual suspects
System load, locks, etc.
Look into Statspack
Enable sql_trace
May require a change request in production
...then what?
Systematic Troubleshooting Demo
SQL> @sw 114

    SID STATE   EVENT                                          SEQ# SEC_IN_WAIT         P1         P2         P3 P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ---------- ---------- ---------- ---------------------
    114 WAITING enq: TX - row lock contention                    21           9 1415053318     131081       2381 0x54580006: TX mode 6

SQL> @sw &mysid

    SID STATE   EVENT                                          SEQ# SEC_IN_WAIT         P1         P2         P3 P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ---------- ---------- ---------- ---------------------
    107 WORKING On CPU / runqueue                                89           0 1413697536          1          0

SQL>
SQL> @sn 5 &mysid
-- Session Snapper v1.06 by Tanel Poder ( http://www.tanelpoder.com )

--------------------------------------------------------------------------------------------------------------------------------
HEAD, SID, SNAPSHOT START   , SECONDS, TYPE, STATISTIC                          ,     DELTA, DELTA/SEC,   HDELTA, HDELTA/SEC
--------------------------------------------------------------------------------------------------------------------------------
DATA,   9, 20080221 22:05:08,       5, STAT, recursive calls                    ,         1,         0,        1,         .2
DATA,   9, 20080221 22:05:08,       5, STAT, recursive cpu usage                ,         1,         0,        1,         .2
DATA,   9, 20080221 22:05:08,       5, STAT, session pga memory max             ,     25292,      5058,   25.29k,      5.06k
DATA,   9, 20080221 22:05:08,       5, STAT, calls to get snapshot scn: kcmgss  ,         1,         0,        1,         .2
DATA,   9, 20080221 22:05:08,       5, STAT, workarea executions - optimal      ,        18,         4,       18,        3.6
DATA,   9, 20080221 22:05:08,       5, STAT, execute count                      ,         1,         0,        1,         .2
DATA,   9, 20080221 22:05:08,       5, STAT, sorts (memory)                     ,        11,         2,       11,        2.2
DATA,   9, 20080221 22:05:08,       5, STAT, sorts (rows)                       ,      1904,       381,     1.9k,      380.8
DATA,   9, 20080221 22:05:08,       5, WAIT, PL/SQL lock timer                  ,   4999649,    999930,       5s,   999.93ms
-- End of snap 1

PL/SQL procedure successfully completed.
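The P1TRANSL column in the first @sw output is derived from the P1 value of the enqueue wait. A minimal Python sketch of that translation, assuming the usual name|mode layout of an enqueue wait's P1 (two ASCII characters of lock type in the high 16 bits, the requested mode in the low 16 bits). The function name decode_enqueue_p1 is hypothetical and is not taken from sw.sql.

```python
def decode_enqueue_p1(p1: int) -> tuple[str, int]:
    """Split an enqueue wait's P1 into (lock type, requested mode).

    Assumed layout: bits 31..16 hold two ASCII characters of the
    enqueue type, bits 15..0 hold the lock mode.
    """
    type_bits = (p1 >> 16) & 0xFFFF
    lock_type = chr(type_bits >> 8) + chr(type_bits & 0xFF)
    mode = p1 & 0xFFFF
    return lock_type, mode


if __name__ == "__main__":
    p1 = 1415053318  # P1 of the "enq: TX - row lock contention" wait above
    lock_type, mode = decode_enqueue_p1(p1)
    print(f"{p1:#010x}: {lock_type} mode {mode}")  # 0x54580006: TX mode 6
```

Mode 6 is an exclusive request, which fits a session waiting to lock a row that another transaction has already locked.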
Troubleshooting approaches
How do you solve problems?
a) Change something -> Did it help? -> Problem fixed?
b) Understand -> Sure? -> Measure -> Manage change -> Problem fixed and prevented
Systematic troubleshooting
Understand the "flow" of a server process and how to measure it,
then measure it step by step, using the right tool at the right step,
...and fix the problem once you understand it
Simple (but common) question:
What the $#*&%! is that session doing?
demo2.sql
Understand the problem
Entry point first; the detail level increases going down:
Wait / CPU profile: Is the session stuck waiting? Which events take the most time?
Cursor execution profile: One long-running or many short statements?
PL/SQL code execution profile: Which PL/SQL lines?
Performance counter profile: What counters are being incremented?
SQL rowsource execution profile: Which SQL exec plan lines?
Kernel function execution profile: In which kernel functions is the execution looping?
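The entry-point question (waiting or on CPU, and on which events) can be answered by polling the session's state repeatedly and counting. A minimal Python sketch, assuming the samples have already been read into a list of (STATE, EVENT) pairs, for example from v$session_wait once per second. The helper wait_profile is hypothetical, and treating every state other than WAITING as CPU time is a simplifying assumption that mirrors the "On CPU / runqueue" label sw.sql prints.

```python
from collections import Counter


def wait_profile(samples: list[tuple[str, str]]) -> list[tuple[str, int, float]]:
    """Aggregate (state, event) samples of one session into a profile.

    Returns (event, sample count, percent of samples), busiest first.
    Any state other than "WAITING" means the last wait has already
    ended, so the sample is counted as CPU / runqueue time.
    """
    if not samples:
        return []
    counts = Counter(
        event if state == "WAITING" else "On CPU / runqueue"
        for state, event in samples
    )
    total = len(samples)
    return [(event, n, 100.0 * n / total) for event, n in counts.most_common()]


if __name__ == "__main__":
    samples = [("WAITING", "enq: TX - row lock contention")] * 3 + [
        ("WAITED SHORT TIME", "db file sequential read")
    ]
    for event, n, pct in wait_profile(samples):
        print(f"{pct:5.1f}% {n:3d} {event}")
```

Sampling is cheap and needs no tracing, which is why it works as the first step before reaching for more detailed tools.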
Right tools for measuring right problems
Entry point first; the detail level increases going down:
Wait / CPU profile: v$session_wait, v$session_event, v$sess_time_model
Cursor execution profile: v$session.sql_hash_value, v$session.sql_id
PL/SQL code execution profile: dbms_profiler
Performance counter profile: v$sesstat
SQL rowsource execution profile: v$sql_plan_... statistics_all
Kernel function execution profile: pstack, procstack, gdb, mdb, dbx
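Counters in v$sesstat are cumulative, so a performance counter profile comes from the difference between two snapshots taken a known time apart; this is what Snapper prints as DELTA and DELTA/SEC. A minimal Python sketch of that calculation, assuming each snapshot has already been fetched into a dict of statistic name to value. The helper stat_deltas is hypothetical and is not Snapper's code.

```python
def stat_deltas(
    before: dict[str, int], after: dict[str, int], seconds: float
) -> list[tuple[str, int, float]]:
    """Compare two snapshots of one session's counters.

    Returns (statistic, delta, delta per second) for every counter that
    changed between the snapshots, largest delta first.
    """
    rows = []
    for name, end_value in after.items():
        delta = end_value - before.get(name, 0)
        if delta:
            rows.append((name, delta, delta / seconds))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    before = {"sorts (rows)": 1000, "execute count": 10, "user commits": 5}
    after = {"sorts (rows)": 2904, "execute count": 11, "user commits": 5}
    for name, delta, rate in stat_deltas(before, after, 5):
        print(f"{name:<20} {delta:>8} {rate:>10.1f}/s")
```

With the sorts (rows) delta of 1904 over a 5 second snapshot, as in the Snapper demo output, the rate is 380.8 per second, matching the HDELTA/SEC column there. Counters that did not change are dropped, which keeps the profile focused on what the session is actually doing.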
Right tools for measuring right problems
Entry point first; the detail level increases going down:
Wait / CPU profile: sw.sql / se.sql, snapper.sql, Sesspack / Statspack
Cursor execution profile: sample v$session.sql_hash_value; u.sql, sql.sql, sqlt.sql
PL/SQL code execution profile: dbms_profiler
Performance counter profile: snapper.sql, Sesspack, Statspack
SQL rowsource execution profile: xms.sql, xmsh.sql, dbms_xplan allstats last
Kernel function execution profile: stack sampling, pstack
Simple (but common) question:
What the $#*&%! is that session doing?
demo3.sql
Understand the Oracle process flow
High level process flow explanation
Client -> request -> Application -> request -> Oracle Database
Oracle Database -> response -> Application -> response -> Client
Between the application and the database: sql*net trace?
Inside the database: AWR, ASH, XYZ, 10046 trace, wait interface
Endless request & response cycles: local procedure calls, remote procedure calls
Understanding process flow
1. Application...
   a. ...waits for a request from a client
   b. ...issues SQL statements to a database and waits for result
   c. ...processes the SQL results
   d. ...returns processed results to client
2. Database...
   a. ...waits for a request from an application
   b. ...issues physical IO calls to OS and waits for result
   c. ...processes the result data blocks
   d. ...returns processed results to application
Understanding process flow
3. OS...
   a. ...waits for a request from a database
   b. ...issues device driver calls to control hardware controller and waits for result
   c. ...processes the hardware access routine results
   d. ...returns processed results to database
4. Hardware controller...
   a. ...waits for a request from the OS
   b. ...sends (electric) signals to actual hardware and waits for result
   c. ...processes the result data
   d. ...returns processed results to OS
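Because each layer in the steps above spends part of its elapsed time waiting for the layer below, the elapsed time seen at one layer is not the time spent in that layer. A minimal Python sketch of separating the two, assuming a single synchronous call chain where each layer's elapsed time includes the full elapsed time of the layer it calls. The helper self_times and the timings are made up for illustration.

```python
def self_times(layers: list[tuple[str, float]]) -> dict[str, float]:
    """Split nested elapsed times into the time spent inside each layer.

    `layers` is ordered from the outermost caller to the innermost
    callee; each elapsed time is assumed to include the elapsed time
    of the next layer down.
    """
    result = {}
    for i, (name, elapsed) in enumerate(layers):
        below = layers[i + 1][1] if i + 1 < len(layers) else 0.0
        result[name] = elapsed - below
    return result


if __name__ == "__main__":
    # Made-up elapsed times in milliseconds, for illustration only.
    timings = [("application", 100.0), ("database", 80.0), ("OS", 50.0), ("hardware", 45.0)]
    print(self_times(timings))
    # {'application': 20.0, 'database': 30.0, 'OS': 5.0, 'hardware': 45.0}
```

Measured only at the application, this request would look like a 100 ms problem; measuring each layer shows that most of the time is spent in the hardware and the database itself.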
Oracle internal process flow
Client side, top to bottom:
APP - Application
OCI - Oracle Call Interface
UPI - User program interface
Oracle Net/TTC - SQL*Net, Two-Task Common
OS/TCP - TCP/IP
Network:
Net - Ethernet / WAN link
Server side, top to bottom:
OS/TCP - TCP/IP
Oracle Net/TTC - SQL*Net, Two-Task Common
OPI - Oracle Program Interface
kks - Kernel Kompile Shared (cursors)
qer - Query Execution Runtime
kcb - Kernel Cache Buffer management
ksf - Kernel Service File i/o
skgf - (OSD) System Kernel Generic File ?
OS - OS / IO system calls
Time is spent in every one of these layers.
Oracle instrumentation: Oracle Wait Interface, V$SESSTAT, V$...
Oracle internal process flow
Layers, top to bottom: APP, OCI, UPI, Oracle Net/TTC (SQL*Net, TNS, Two-Task Common), OS/TCP, Net (Ethernet / WAN link), OS/TCP, Oracle Net/TTC (SQL*Net, TNS, Two-Task Common), OPI, kks, qer, kcb, ksf, skgf, OS
Tools for tracing and measuring these layers, roughly in the same order from the application down to the OS:
Application instrumentation, ltrace, truss -u"libclntsh:*"
$OH/rdbms/demo/ociucb.mk, OCITrace
SQL*Net trace, Wireshark TNS protocol dissector
Wireshark TCP protocol dissector
snoop, tcpdump, Wireshark
Wireshark TCP protocol dissector
SQL*Net trace, Wireshark, Event 10079
Event 10051
sql_trace, Event 10046, 10270
v$sql_plan_statistics, v$sql_plan_statistics_all, sql_trace
x$kcbsw, Event 10200, 10298, 10812, _trace_pin_time
v$filestat, v$tempstat, v$session_wait, Event 10298
strace, truss, tusc, filemon.exe, procmon.exe
Process stack demos
$ pstack 5855
#0  0x00c29402 in __kernel_vsyscall ()
#1  0x005509e4 in semtimedop () from /lib/libc.so.6
#2  0x0e5769b7 in sskgpwwait ()
#3  0x0e575946 in skgpwwait ()
#4  0x0e2c3adc in ksliwat ()
#5  0x0e2c3449 in kslwaitctx. ()
#6  0x0b007261 in kjusuc ()
#7  0x0c8a7961 in ksipgetctx ()
#8  0x0e2d4dec in ksqcmi ()
#9  0x0e2ce9b8 in ksqgtlctx ()
#10 0x0e2cd214 in ksqgelctx. ()
#11 0x08754afa in ktcwit1 ()
#12 0x0e39b2a8 in kdddgb ()
#13 0x08930c80 in kdddel ()
#14 0x0892af0f in kaudel ()
#15 0x08c3d21a in delrow ()
#16 0x08e6ce16 in qerdlFetch ()
#17 0x08c403c5 in delexe ()
#18 0x0e3c3fa9 in opiexe ()
#19 0x08b54500 in kpoal8 ()
#20 0x0e3be673 in opiodr ()
#21 0x0e53628a in ttcpip ()
#22 0x089a87ab in opitsk ()
#23 0x089aaa00 in opiino ()
#24 0x0e3be673 in opiodr ()
#25 0x089a4e76 in opidrv ()
#26 0x08c1626f in sou2o ()
#27 0x08539aeb in opimai_real ()
#28 0x08c19a42 in ssthrdmain ()
#29 0x08539a68 in main ()

Metalink notes:
175982.1 ORA-600 Lookup Error Categories
453521.1 ORA-04031 KSFQ Buffers ksmlgpalloc

Scripts:
@d.sql - Report data dictionary & X$ tables
@pd.sql - Parameter descriptions
@la.sql - Latch by address
@lm.sql - Latch Misses by function location
@fv.sql - Fixed variable by name
@fva.sql - Fixed variable by address
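A single pstack dump shows where the process is at one instant; taking it repeatedly turns it into a kernel function execution profile (the "stack sampling" entry in the tools list). A minimal Python sketch, assuming the dumps have already been saved as text in the Linux pstack format shown above; parse_pstack and function_counts are hypothetical helpers, and other stack tools such as procstack, gdb, mdb or dbx print frames differently.

```python
import re
from collections import Counter

# Matches frame lines such as "#3  0x0e575946 in skgpwwait ()".
# Format assumed from the Linux pstack example; other tools differ.
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*\(")


def parse_pstack(text: str) -> list[str]:
    """Return the function names in one pstack dump, innermost frame first."""
    names = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def function_counts(dumps: list[str]) -> Counter:
    """Count in how many sampled stacks each function appears.

    A function present in (nearly) every sample is on the code path the
    process keeps executing or waiting in.
    """
    counts = Counter()
    for dump in dumps:
        counts.update(set(parse_pstack(dump)))
    return counts


if __name__ == "__main__":
    sample = """#0  0x00c29402 in __kernel_vsyscall ()
#1  0x005509e4 in semtimedop () from /lib/libc.so.6
#2  0x0e5769b7 in sskgpwwait ()"""
    print(parse_pstack(sample))  # ['__kernel_vsyscall', 'semtimedop', 'sskgpwwait']
```

Each function is counted once per sample, so recursive frames such as the two opiodr () entries above do not inflate the count.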
Reading SQL plan execution stack
os_explain script:
Uses pstack to get the process execution stack
Translates function names into execution plan step names
Executing an Oracle SQL plan just means that a bunch of row-source functions are executed in a defined order. The order definition (in the form of a set of function pointers stored in the library cache) is the execution plan.
Uses information from Metalink:
175982.1 ORA-600 Lookup Error Categories
Demo
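The idea behind os_explain can be sketched in a few lines: walk the stack from main towards the innermost frame, keep only row-source (qer...) functions, and print each as a plan step indented by depth. A minimal Python sketch, assuming a small, hypothetical prefix-to-name table; the real os_explain draws on the fuller information in Metalink note 175982.1, and this is not its code.

```python
# Hypothetical, partial mapping of row-source function prefixes to plan
# step names, for illustration only.
ROWSOURCE_NAMES = {
    "qerdl": "DELETE",
    "qertb": "TABLE ACCESS",
    "qerix": "INDEX",
    "qernl": "NESTED LOOPS",
    "qerhj": "HASH JOIN",
    "qerso": "SORT",
}


def explain_stack(frames: list[str]) -> list[str]:
    """Translate a stack into execution plan steps.

    `frames` is innermost first, as pstack prints it. The result is
    outermost plan step first, each deeper step indented two spaces.
    """
    steps = []
    for func in reversed(frames):
        name = ROWSOURCE_NAMES.get(func[:5])
        if name:
            steps.append("  " * len(steps) + name)
    return steps


if __name__ == "__main__":
    # A few frames from the pstack output of the blocked DELETE above.
    frames = ["semtimedop", "ktcwit1", "kdddel", "delrow", "qerdlFetch", "delexe", "opiexe", "main"]
    print("\n".join(explain_stack(frames)))  # DELETE
```

Only the qer row-source frames become plan steps; everything below the innermost one (here the TX enqueue wait via ktcwit1) shows what that plan step is currently doing.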
What if my problem lies outside Oracle?
Where to look next?
Oracle external process flow
Application -> libclntsh.so (OCI.dll on Windows) -> libsocket.so (WS2_32.dll on Windows) -> OS kernel -> Network IO interface (NIC, wire, NIC) -> Oracle Instance -> Disk IO interface (HBA/NIC, wire, HBA/NIC) -> Storage subsystem
Unix tools, roughly in layer order: ltrace, truss -u; strace, truss; WireShark, tcpdump; V$, events, traces, pstack; strace, truss
Windows tools, roughly in layer order: procmon.exe, procexp.exe; WireShark, tcpdump; V$, events, traces, procexp.exe; procmon.exe, procexp.exe
What if I need to look further inside Oracle?
...if standard Oracle instrumentation isn't detailed enough
...and OS tools don't understand Oracle internal workings
...only for experimental environments
IO tracing events
10200, 00000, "consistent read buffer status"
// *Cause:
// *Action:

alter session set "_trace_pin_time" = 1; // trace how long a current pin is held

10812, 00000, "Trace Consistent Reads" ( Trace into X$TRACE )
// *Cause: N/A
// *Action: THIS IS NOT A USER ERROR NUMBER/MESSAGE. THIS DOES NOT
//          NEED TO BE TRANSLATED OR DOCUMENTED. IT IS USED ONLY FOR DEBUGGING.

10298, 00000, "ksfd i/o tracing"
// *Cause:
// *Action: If this event is set then ksfd module generates tracing
//          for each i/o request
Cursor usage tracing events
10270, 00000, "Debug shared cursors"
// *Cause: Enables debugging code in shared cursor management modules
// *Action:

10730, 00000, "trace row level security policy predicates"
// *Document: NO
// *Cause:
// *Action:
// *Comment:

10731, 00000, "dump SQL for CURSOR expressions"
// *Cause:
// *Action: set this event only under the supervision of Oracle development
// *Comment: traces SQL statements generated to execute CURSOR expressions

alter session set "_dump_qbc_tree" = 1; (10.2+) // dump top level query parse tree to trace
Network / user call tracing events
10051, 00000, "trace OPI calls"
// *Cause:
// *Action:

10079, 00000, "trace data sent/received via SQL*Net"
// *Cause:
// *Action: level 1 - trace network ops to/from client
//          level 2 - in addition to level 1, dump data
//          level 4 - trace network ops to/from dblink
//          level 8 - in addition to level 4, dump data
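The event descriptions above give event numbers and levels but not the statement that switches an event on. The commonly used form is alter session set events '<event> trace name context forever, level <n>', and '<event> trace name context off' switches it off again. A minimal Python helper that only builds that text, assuming this syntax; event_sql is hypothetical, and the valid levels for each event come from its own description (such as the 10079 levels above). As stressed earlier, use these events only in experimental environments.

```python
from typing import Optional


def event_sql(event: int, level: Optional[int] = None) -> str:
    """Return ALTER SESSION text that enables a numbered event at the
    given level, or switches it off when no level is given."""
    if level is None:
        spec = f"{event} trace name context off"
    else:
        spec = f"{event} trace name context forever, level {level}"
    return f"alter session set events '{spec}'"


if __name__ == "__main__":
    print(event_sql(10079, 1))  # level 1: trace network ops to/from client
    print(event_sql(10079))     # switch the tracing off again
```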
Right tools for right problems
Entry point first; the detail level increases going down:
Wait / CPU profile: sw.sql / se.sql, snapper.sql, Sesspack
Cursor execution profile: sample v$session.sql_hash_value; u.sql, sql.sql, sqlt.sql
PL/SQL code execution profile: dbms_profiler
Performance counter profile: snapper.sql, Sesspack
SQL rowsource execution profile: xms.sql, xmsh.sql, dbms_xplan allstats last
Kernel function execution profile: stack sampling, pstack, os_explain
Questions?
Further questions welcome at http://blog.tanelpoder.com
Thank you!
Tanel Poder http://www.tanelpoder.com