ZFS - Why So Slow
ZFS - Why So Slow
A regularly scheduled job was taking an age. Its a housekeeping task which runs a big find, and it was sl-o-oo-o-w. There was nothing much on CPU, so I ran an iostat. Heres what I got, disregarding the initial,
cumulative since-boot, output.
$ iostat -inx 3
...
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 77.1 0.0 643.3 8.0 2.0 103.8 26.0 100 100 c10d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d2
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d3
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t0d0
0.0 97.1 0.0 844.1 9.0 1.0 92.7 10.3 100 100 c3t0d0
This is a machine with two SATA disks, c10d0 and c3t0d0. (One on the main board, one on a PCI SATA
card, hence the different name format.) You can see theyre both busy (which youd expect, because theyre
mirrors.) In the old days, Id be kind of stuck where to look next, but Im on Solaris 11, so I have the DTrace
toolkit.
# /usr/dtrace/DTT/iotop
Tracing... Please wait.
...
2012 Jul 23 12:25:25, load: 0.15, disk_r:
UID PID
0
0
0
5
0
0
0
5
0
0
97 5160
97 5160
PPID CMD
DEVICE MAJ MIN D
BYTES
0 sched
cmdk1 246 64 R
6656
0 zpool-rpool
sd12
98 768 R
7168
0
sd12
98 768
14336
0 zpool-rpool
cmdk1 246 64 R
19456
0 sched
sd12
98 768 R
23040
1 htcacheclean
sd12
98 768 R
247296
1 htcacheclean
cmdk1 246 64 R
266752
htcachclean? Thats Apache, but Im not running a webserver on this machine am I? Hmm. I wish
# pargs 5160
5160: /usr/apache2/2.2/bin/htcacheclean -d20160 -i -l 2048M -n -p /var/cache/pkg/sysr
argv[0]: /usr/apache2/2.2/bin/htcacheclean
argv[1]: -d20160
argv[2]: -i
argv[3]: -l
argv[4]: 2048M
argv[5]: -n
argv[6]: -p
argv[7]: /var/cache/pkg/sysrepo
argv[8]: -P
argv[9]: /var/cache/pkg/sysrepo/../sysrepo_htcacheclean.pid
argv[10]: -t
Its IPS doing its own housekeeping, killing my service performance stone dead. Just when you think you hate
IPS as much as you possibly can, it always finds a way to make you hate it a little bit more.