October 2023
Introducing objtrace to the Linux kernel community
Jeff Xie (谢欢)
Copyright © SUSE
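Outline
- The core design of ftrace
  - Overview diagram
  - event
- Introducing objtrace to the Linux kernel community
  - Submitting the first patch
  - Tracing the flow of a bio
  - Tracing the flow of a page/folio
  - Implementation
  - The story of submitting it upstream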
The core design of ftrace - overview diagram
[Figure: some_kernel_function() { … return n; } annotated with the hook points that can attach to it: function tracer, function graph tracer, kprobe, trace event, and kretprobe.]
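The core design of ftrace - event
[Figure: kprobe/kretprobe (ftrace-based kprobe and kretprobe) are dynamic hook functions that capture function arguments and function return values. Tracepoints are static hook functions that output specific information. Both produce events, and a trigger can be attached to an event. Each event under the events directory exposes the files enable, filter, format, id, and trigger. The event types shown are kprobe event/kretprobe event and trace event.]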
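The core design of ftrace - event
[Figure, extended: besides kprobe/kretprobe (dynamic hook functions: function arguments, function return values) and tracepoints (static hook functions: output specific information), events can also come from uprobe, eprobe, user events, and other sources (X). Trigger types include traceon, traceoff, stacktrace, enable_event, disable_event, and others. Each event exposes enable, filter, format, id, and trigger. The event types shown are kprobe event/kretprobe event, trace event, uprobe event, eprobe event, user event, and X event.]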
The core design of ftrace - event
[Figure, extended: the same event diagram (kprobe/kretprobe, tracepoint, uprobe, eprobe, user events, X), with objtrace added as a new trigger type next to traceon, traceoff, stacktrace, enable_event, disable_event, and others. The objtrace trigger does two things: 1. it saves the object; 2. it uses the function tracer.]
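Introducing objtrace to the Linux kernel community
Submitting the first patch
Patchset v1 [1] was submitted in mid-October 2021:
[PATCH] trace: Add trace any kernel object
The problem it solves:
Sometimes you want to know how a function argument or a function return value is passed along, or flows, between functions.
I call such a function argument or return value an object (in practice it is a pointer).
With this feature in place, you can trace one specific object.
For example, take the function bio_add_page():
int bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int offset)
If you can see how the bio argument flows between functions, you can see how one specific I/O is processed (see the next slide).
In the mm subsystem, you can trace a page/folio.
In the network subsystem, you can trace a sock, and so on.
It can also help track down memory corruption, for example when a function is passed a wrong argument.
[1] [PATCH] trace: Add trace any kernel object
https://lore.kernel.org/all/[email protected]/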
Introducing objtrace to the Linux kernel community
Tracing the flow of a bio
int bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int offset)
struct bio {
        [..]
        struct bvec_iter {
                [..]
                unsigned int bi_size;
                [..]
        } bi_iter;
        [..]
};
// find the offset of bi_size in struct bio:
$ gdb vmlinux
(gdb) p &((struct bio *)0)->bi_iter.bi_size
$1 = (unsigned int *) 0x28
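The gdb expression above computes a member offset by taking the address of the field in an object placed at address 0. The same calculation can be sketched in C with offsetof. The structs below (mock_bio, mock_bvec_iter) and the helper read_u32_at() are hypothetical stand-ins invented for this illustration, not the kernel's definitions, so the resulting offset will not be 0x28.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the kernel's struct bvec_iter / struct bio. */
struct mock_bvec_iter {
    uint64_t bi_sector;
    uint32_t bi_size;
    uint32_t bi_idx;
};

struct mock_bio {
    void *bi_next;
    void *bi_bdev;
    uint32_t bi_opf;
    struct mock_bvec_iter bi_iter;
};

/* Same idea as gdb's  p &((struct bio *)0)->bi_iter.bi_size  */
static size_t mock_bi_size_offset(void)
{
    return offsetof(struct mock_bio, bi_iter.bi_size);
}

/* Read a u32 located `offset` bytes past an object pointer, which is
 * how an "object,offset:u32" request is interpreted in this sketch. */
static uint32_t read_u32_at(const void *obj, size_t offset)
{
    uint32_t v;

    memcpy(&v, (const char *)obj + offset, sizeof(v));
    return v;
}
```

With a real kernel build, the offset should still be taken from gdb (or pahole) against the matching vmlinux, because it depends on the kernel version and configuration.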
Introducing objtrace to the Linux kernel community
Tracing the flow of a bio
int bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int offset)
# cd /sys/kernel/debug/tracing/
# echo 1 > ./options/sym-offset
1. # echo 'p bio_add_page bio=$arg1 page=$arg2' > ./kprobe_events
2. # echo 'objtrace:add:bio,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
3. # cat /test.txt > /dev/null
# du -sh /test.txt
12.0K /test.txt
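The trigger string written in step 2 has the shape objtrace:add:obj[,offset][:type][:count], optionally followed by an "if" filter. As a rough illustration of how such a string breaks down, here is a small user-space parser. It is not the kernel's parser: struct objtrace_cmd and parse_objtrace_cmd() are hypothetical names, the filter is simply ignored, and treating a field that starts with a digit as the count is an assumption of this sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed fields. */
struct objtrace_cmd {
    char obj[32];          /* name of the event field holding the pointer */
    unsigned long offset;  /* byte offset inside the object, 0 if absent  */
    char type[8];          /* e.g. "u32"; empty string if absent          */
    unsigned long count;   /* 0 if absent                                 */
};

/* Returns 0 on success, -1 on malformed input. */
static int parse_objtrace_cmd(const char *s, struct objtrace_cmd *c)
{
    static const char prefix[] = "objtrace:add:";
    char buf[128];
    char *p, *tok;

    if (strncmp(s, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    snprintf(buf, sizeof(buf), "%s", s + sizeof(prefix) - 1);

    /* Drop the optional filter; this sketch does not evaluate it. */
    p = strstr(buf, " if ");
    if (p)
        *p = '\0';

    memset(c, 0, sizeof(*c));

    /* First field: obj[,offset] */
    tok = strtok(buf, ":");
    if (!tok)
        return -1;
    p = strchr(tok, ',');
    if (p) {
        *p = '\0';
        c->offset = strtoul(p + 1, NULL, 0);
    }
    if (tok[0] == '\0' || strlen(tok) >= sizeof(c->obj))
        return -1;
    strcpy(c->obj, tok);

    /* Remaining fields: a leading digit means count, otherwise type. */
    while ((tok = strtok(NULL, ":")) != NULL) {
        if (isdigit((unsigned char)tok[0])) {
            c->count = strtoul(tok, NULL, 0);
        } else {
            if (strlen(tok) >= sizeof(c->type))
                return -1;
            strcpy(c->type, tok);
        }
    }
    return 0;
}
```

For the command used on this slide, the sketch yields obj "bio", offset 0x28, type "u32", and count 1.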
Introducing objtrace to the Linux kernel community
Tracing the flow of a bio
[root@JeffXie tracing]# cat ./trace
# tracer: nop
#
# entries-in-buffer/entries-written: 385/385 #P:1
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU#. ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
cat-78 [000] ...2. 13486.948438: bio_add_page+0x4/0xa0 <-ext4_mpage_readpages+0x46e/0x7d0 object:0xffff888104e770c0 value:0x0
cat-78 [000] ...1. 13486.948446: __bio_try_merge_page+0x0/0x100 <-bio_add_page+0x3d/0xa0 object:0xffff888104e770c0 value:0x0
cat-78 [000] ...1. 13486.948446: __bio_add_page+0x4/0x80 <-bio_add_page+0x64/0xa0 object:0xffff888104e770c0 value:0x0
cat-78 [000] ...2. 13486.948447: bio_add_page+0x4/0xa0 <-ext4_mpage_readpages+0x46e/0x7d0 object:0xffff888104e770c0 value:0x1000
cat-78 [000] ...1. 13486.948447: __bio_try_merge_page+0x0/0x100 <-bio_add_page+0x3d/0xa0 object:0xffff888104e770c0 value:0x1000
cat-78 [000] ...2. 13486.948448: bio_add_page+0x4/0xa0 <-ext4_mpage_readpages+0x46e/0x7d0 object:0xffff888104e770c0 value:0x2000
cat-78 [000] ...1. 13486.948448: __bio_try_merge_page+0x0/0x100 <-bio_add_page+0x3d/0xa0 object:0xffff888104e770c0 value:0x2000
cat-78 [000] ...1. 13486.948448: submit_bio+0x4/0x80 <-ext4_mpage_readpages+0x794/0x7d0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948448: submit_bio_noacct+0x4/0x320 <-submit_bio+0x31/0x80 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948448: __cond_resched+0x4/0x30 <-submit_bio_noacct+0x27/0x320 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948448: should_fail_bio+0x4/0x40 <-submit_bio_noacct+0x6e/0x320 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948450: submit_bio_noacct_nocheck+0x4/0x130 <-submit_bio_noacct+0x29a/0x320 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948450: blk_cgroup_bio_start+0x4/0x110 <-submit_bio_noacct_nocheck+0x15/0x130 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948450: blk_cgroup_io_type+0x0/0x30 <-blk_cgroup_bio_start+0x1f/0x110 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948450: ktime_get+0x4/0xa0 <-submit_bio_noacct_nocheck+0x30/0x130 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948450: __submit_bio_noacct_mq+0x0/0xc0 <-submit_bio_noacct_nocheck+0x122/0x130 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948451: __submit_bio+0x0/0x130 <-__submit_bio_noacct_mq+0x67/0xc0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948451: blk_mq_submit_bio+0x4/0x370 <-__submit_bio+0xba/0x130 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948451: __bio_split_to_limits+0x4/0x1d0 <-blk_mq_submit_bio+0xd9/0x370 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948451: bio_split_rw+0x4/0x270 <-__bio_split_to_limits+0xed/0x1d0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948451: bio_set_ioprio+0x0/0x40 <-blk_mq_submit_bio+0xed/0x370 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948451: blkcg_set_ioprio+0x4/0x40 <-bio_set_ioprio+0x1b/0x40 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948452: ioprio_blkcg_from_bio+0x0/0x40 <-blkcg_set_ioprio+0x12/0x40 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948529: blk_mq_get_new_requests+0x0/0x200 <-blk_mq_submit_bio+0x20b/0x370 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948530: __rcu_read_lock+0x4/0x20 <-blk_mq_get_new_requests+0x5d/0x200 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948530: rcu_preempt_read_enter+0x0/0x30 <-__rcu_read_lock+0xe/0x20 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948531: blk_mq_attempt_bio_merge+0x0/0x60 <-blk_mq_get_new_requests+0xcf/0x200 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948531: blk_attempt_plug_merge+0x4/0xa0 <-blk_mq_attempt_bio_merge+0x3b/0x60 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948531: blk_mq_sched_bio_merge+0x4/0xf0 <-blk_mq_attempt_bio_merge+0x4d/0x60 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948531: blk_mq_sched_bio_merge+0x4/0xf0 <-blk_mq_attempt_bio_merge+0x4d/0x60 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948532: dd_bio_merge+0x4/0xa0 <-blk_mq_sched_bio_merge+0x31/0xf0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948533: dd_bio_merge+0x4/0xa0 <-blk_mq_sched_bio_merge+0x31/0xf0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948533: _raw_spin_lock+0x4/0x40 <-dd_bio_merge+0x4b/0xa0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948533: _raw_spin_lock+0x4/0x40 <-dd_bio_merge+0x4b/0xa0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948533: blk_mq_sched_try_merge+0x4/0x1c0 <-dd_bio_merge+0x5c/0xa0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948533: blk_mq_sched_try_merge+0x4/0x1c0 <-dd_bio_merge+0x5c/0xa0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948533: elv_merge+0x4/0x100 <-blk_mq_sched_try_merge+0x3c/0x1c0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948533: elv_merge+0x4/0x100 <-blk_mq_sched_try_merge+0x3c/0x1c0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948533: elv_rqhash_find+0x4/0x130 <-elv_merge+0x69/0x100 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948533: elv_rqhash_find+0x4/0x130 <-elv_merge+0x69/0x100 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948534: dd_request_merge+0x4/0xe0 <-elv_merge+0xda/0x100 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948534: dd_request_merge+0x4/0xe0 <-elv_merge+0xda/0x100 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948534: elv_rb_find+0x4/0x40 <-dd_request_merge+0x74/0xe0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...2. 13486.948534: _raw_spin_unlock+0x4/0x30 <-dd_bio_merge+0x66/0xa0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948534: __rq_qos_throttle+0x4/0x40 <-blk_mq_get_new_requests+0xf0/0x200 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948534: __rq_qos_throttle+0x4/0x40 <-blk_mq_get_new_requests+0xf0/0x200 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948536: blkcg_iolatency_throttle+0x4/0xb0 <-__rq_qos_throttle+0x2f/0x40 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948536: blkcg_iolatency_throttle+0x4/0xb0 <-__rq_qos_throttle+0x2f/0x40 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948536: __blk_mq_alloc_requests+0x0/0x180 <-blk_mq_get_new_requests+0x118/0x200 object:0xffff888104e770c0 value:0x300
cat-78 [000] ...1. 13486.948536: __blk_mq_alloc_requests+0x0/0x180 <-blk_mq_get_new_requests+0x118/0x200 object:0xffff888104e770c0 value:0x300
cat-78 [000] ...1. 13486.948536: dd_limit_depth+0x4/0x50 <-__blk_mq_alloc_requests+0x7a/0x180 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948536: blk_mq_get_tag+0x4/0x2a0 <-__blk_mq_alloc_requests+0xd9/0x180 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948539: __blk_mq_get_tag+0x0/0x110 <-blk_mq_get_tag+0xb6/0x2a0 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948542: __rq_qos_track+0x4/0x50 <-blk_mq_submit_bio+0x264/0x370 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948542: blk_mq_bio_to_request+0x0/0x70 <-blk_mq_submit_bio+0x273/0x370 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948542: __blk_account_io_start+0x0/0x60 <-blk_mq_bio_to_request+0x67/0x70 object:0xffff888104e770c0 value:0x3000
cat-78 [000] ...1. 13486.948549: __blk_bios_map_sg+0x0/0x2e0 <-__blk_rq_map_sg+0x32/0xe0 object:0xffff888104e770c0 value:0x3000
<idle>-0 [000] d.h3. 13486.949069: req_bio_endio+0x0/0x80 <-blk_update_request+0x12f/0x350 object:0xffff888104e770c0 value:0x3000
<idle>-0 [000] d.h3. 13486.949069: bio_endio+0x4/0x110 <-req_bio_endio+0x71/0x80 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949069: __rq_qos_done_bio+0x4/0x40 <-bio_endio+0x37/0x110 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949069: blkcg_iolatency_done_bio+0x4/0x120 <-__rq_qos_done_bio+0x30/0x40 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: bio_uninit+0x4/0x70 <-bio_endio+0xf8/0x110 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: bio_uninit+0x4/0x70 <-bio_endio+0xf8/0x110 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: __rcu_read_lock+0x4/0x20 <-bio_uninit+0x22/0x70 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: __rcu_read_lock+0x4/0x20 <-bio_uninit+0x22/0x70 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: rcu_preempt_read_enter+0x0/0x30 <-__rcu_read_lock+0xe/0x20 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: rcu_preempt_read_enter+0x0/0x30 <-__rcu_read_lock+0xe/0x20 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: __rcu_read_unlock+0x4/0x40 <-bio_uninit+0x33/0x70 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: __rcu_read_unlock+0x4/0x40 <-bio_uninit+0x33/0x70 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949071: rcu_preempt_read_exit+0x0/0x30 <-__rcu_read_unlock+0x18/0x40 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949072: rcu_preempt_read_exit+0x0/0x30 <-__rcu_read_unlock+0x18/0x40 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949072: mpage_end_io+0x4/0x40 <-bio_endio+0x106/0x110 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949072: mpage_end_io+0x4/0x40 <-bio_endio+0x106/0x110 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949072: bio_post_read_required+0x0/0x30 <-mpage_end_io+0x12/0x40 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949072: bio_post_read_required+0x0/0x30 <-mpage_end_io+0x12/0x40 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949072: __read_end_io+0x0/0x180 <-mpage_end_io+0x1e/0x40 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] d.h3. 13486.949072: __read_end_io+0x0/0x180 <-mpage_end_io+0x1e/0x40 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] dNh3. 13486.949074: bio_put+0x4/0x120 <-__read_end_io+0x16d/0x180 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] dNh3. 13486.949074: bio_free+0x0/0x50 <-bio_put+0x111/0x120 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] dNh3. 13486.949075: bio_uninit+0x4/0x70 <-bio_free+0x1b/0x50 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] dNh3. 13486.949075: mempool_free+0x4/0xa0 <-bio_free+0x3e/0x50 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] dNh3. 13486.949075: mempool_free_slab+0x4/0x20 <-mempool_free+0x2f/0xa0 object:0xffff888104e770c0 value:0x0
<idle>-0 [000] dNh3. 13486.949075: kmem_cache_free+0x4/0x4c0 <-mempool_free_slab+0x17/0x20 object:0xffff888104e770c0 value:0x0
Introducing objtrace to the Linux kernel community
Tracing the flow of a page/folio
struct page {
[..]
atomic_t _refcount;
[..]
};
// find the offset of _refcount in struct page:
$ gdb vmlinux
(gdb) p &((struct page *)0)->_refcount
$1 = (atomic_t *) 0x34
Introducing objtrace to the Linux kernel community
Tracing the flow of a page/folio
# cd /sys/kernel/debug/tracing/
# echo 1 > ./options/sym-offset
1. # echo 'r get_page_from_freelist page=$retval' > ./kprobe_events
2. # echo 'objtrace:add:page,0x34:5 if comm == "cat"' > ./events/kprobes/r_get_page_from_freelist_0/trigger
3. # cat /test.txt > /dev/null
Introducing objtrace to the Linux kernel community
Tracing the flow of a page/folio
[root@JeffXie tracing]# cat ./trace
# tracer: nop
#
# entries-in-buffer/entries-written: 507/507 #P:1
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU#. ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
cat-75 [000] ...1. 13.737600: __mem_cgroup_charge+0x4/0x80 <-do_cow_fault+0x9c/0x220 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737600: charge_memcg+0x0/0xc0 <-__mem_cgroup_charge+0x2c/0x80 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737600: folio_flags+0x0/0x20 <-charge_memcg+0x20/0xc0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737600: commit_charge+0x0/0x10 <-charge_memcg+0x5d/0xc0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737601: __cgroup_throttle_swaprate+0x4/0xe0 <-do_cow_fault+0x19e/0x220 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737601: blk_cgroup_congested+0x4/0x50 <-__cgroup_throttle_swaprate+0x22/0xe0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737601: __rcu_read_lock+0x4/0x20 <-blk_cgroup_congested+0xf/0x50 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737601: rcu_preempt_read_enter+0x0/0x30 <-__rcu_read_lock+0xe/0x20 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737601: blkcg_css+0x0/0x30 <-blk_cgroup_congested+0x14/0x50 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737601: kthread_blkcg+0x4/0x50 <-blkcg_css+0xa/0x30 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737602: __rcu_read_unlock+0x4/0x40 <-blk_cgroup_congested+0x40/0x50 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737602: rcu_preempt_read_exit+0x0/0x30 <-__rcu_read_unlock+0x18/0x40 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...1. 13.737614: folio_flags+0x0/0x20 <-do_cow_fault+0x156/0x220 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737615: do_set_pte+0x4/0x1c0 <-finish_fault+0x155/0x280 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737615: page_add_new_anon_rmap+0x4/0x20 <-do_set_pte+0x16f/0x1c0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737615: folio_add_new_anon_rmap+0x4/0xd0 <-page_add_new_anon_rmap+0xe/0x20 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737615: folio_flags+0x0/0x20 <-folio_add_new_anon_rmap+0x22/0xd0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737615: folio_flags+0x0/0x20 <-folio_add_new_anon_rmap+0x34/0xd0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737615: __mod_lruvec_page_state+0x4/0x160 <-folio_add_new_anon_rmap+0x64/0xd0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737616: __rcu_read_lock+0x4/0x20 <-__mod_lruvec_page_state+0x4c/0x160 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737616: rcu_preempt_read_enter+0x0/0x30 <-__rcu_read_lock+0xe/0x20 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737616: __page_set_anon_rmap+0x0/0x70 <-folio_add_new_anon_rmap+0x7b/0xd0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737616: __page_set_anon_rmap+0x0/0x70 <-folio_add_new_anon_rmap+0x7b/0xd0 object:0xffffea00046e1800 value:0x1
cat-75 [000] ...2. 13.737616: lru_cache_add_inactive_or_unevictable+0x4/0x60 <-do_set_pte+0x17a/0x1c0 object:0xffffea00046e1800 value:0x1
[...]
cat-75 [000] ...1. 13.738638: folio_flags+0x0/0x20 <-free_swap_cache+0x32/0x100 object:0xffffea00042d9980 value:0x1
cat-75 [000] ...1. 13.738638: free_swap_cache+0x4/0x100 <-free_pages_and_swap_cache+0x2e/0x50 object:0xffffea00042d9b00 value:0x1
cat-75 [000] ...1. 13.738638: folio_flags+0x0/0x20 <-free_swap_cache+0x32/0x100 object:0xffffea00042d9b00 value:0x1
cat-75 [000] d..2. 13.738640: folio_flags+0x0/0x20 <-release_pages+0xf4/0x470 object:0xffffea00046e1800 value:0x0
[...]
cat-75 [000] .N.1. 13.738830: free_unref_page_prepare+0x0/0xa0 <-free_unref_page_list+0x86/0x340 object:0xffffea00046e1800 value:0x0
cat-75 [000] .N.1. 13.738830: free_pcp_prepare+0x0/0x590 <-free_unref_page_prepare+0x14/0xa0 object:0xffffea00046e1800 value:0x0
cat-75 [000] .N.1. 13.738831: __reset_page_owner+0x4/0x80 <-free_pcp_prepare+0x2bf/0x590 object:0xffffea00046e1800 value:0x0
cat-75 [000] .N.1. 13.738831: page_ext_get+0x4/0x40 <-__reset_page_owner+0x22/0x80 object:0xffffea00046e1800 value:0x0
cat-75 [000] .N.1. 13.738831: __rcu_read_lock+0x4/0x20 <-page_ext_get+0x12/0x40 object:0xffffea00046e1800 value:0x0
cat-75 [000] .N.1. 13.738831: rcu_preempt_read_enter+0x0/0x30 <-__rcu_read_lock+0xe/0x20 object:0xffffea00046e1800 value:0x0
Introducing objtrace to the Linux kernel community
Implementation
int bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int offset)
1. # echo 'p bio_add_page bio=$arg1 page=$arg2' > ./kprobe_events
2. # echo 'objtrace:add:bio,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
3. # cat /test.txt > /dev/null
What happens inside the kernel:
1. At initialization, the function tracer is enabled:
register_ftrace_function(&tr->obj_data->fops)
fops->func = trace_object_events_call;
2. When the event fires, the object is saved:
set_trace_object(void *obj, ...)
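The two mechanisms on this slide (save the object when the event fires, then let a function-tracer callback look for it) can be imitated in user space. The sketch below is only a model under stated assumptions: sim_set_trace_object() and sim_function_callback() are hypothetical names, function arguments are passed in as a plain array instead of being read from registers, and the kernel's locking and per-instance data are left out. It assumes, based on the later discussion of the "search the arguments" logic, that the callback compares each argument with the saved pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_MAX_OBJS 8

/* Table of object pointers being traced (model of the saved objects). */
static const void *sim_saved_objs[SIM_MAX_OBJS];
static int sim_nr_saved;

/* Model of "save the object when the event fires". */
static int sim_set_trace_object(const void *obj)
{
    int i;

    for (i = 0; i < sim_nr_saved; i++)
        if (sim_saved_objs[i] == obj)
            return 0;               /* already tracked */
    if (sim_nr_saved == SIM_MAX_OBJS)
        return -1;                  /* table full */
    sim_saved_objs[sim_nr_saved++] = obj;
    return 0;
}

/* Model of the function-tracer callback: called for every traced
 * function, it scans the arguments for a saved object. On a match it
 * reads the u32 at `offset` inside the object and returns 1. */
static int sim_function_callback(const uintptr_t *args, int nargs,
                                 size_t offset, uint32_t *value)
{
    int i, j;

    for (i = 0; i < nargs; i++) {
        for (j = 0; j < sim_nr_saved; j++) {
            if ((const void *)args[i] != sim_saved_objs[j])
                continue;
            memcpy(value, (const char *)sim_saved_objs[j] + offset,
                   sizeof(*value));
            return 1;
        }
    }
    return 0;
}
```

This also shows why the trace output repeats the same object address on every line: each line is one traced function that received the saved pointer, printed together with the current value at the requested offset.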
Introducing objtrace to the Linux kernel community
The story of submitting it upstream
After patchset v1 was submitted, the two ftrace maintainers, Steve and Masami, replied:
from Steve:
So in conclusion, I really like this idea. Now we need to help you clean
it up, and make a proper interface and something that is flexible as
well. Looking forward to working with you more on this. Cheers,
-- Steve [1] // received about half an hour after the patch was sent
from Masami:
>
> I didn't expect this idea to be a relatively large project. :-)
>
Because you have an exciting idea :)
Thank you,
-- Masami Hiramatsu <[email protected]> [2]
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
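Introducing objtrace to the Linux kernel community
The story of submitting it upstream
Main work in v2 through v15:
Defining how the objtrace command is parsed
Fitting into the generic trigger framework
Documentation
Fixes
Cleanup
Test cases
At v15, Steve suggested dropping the argument-search logic for now [1]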
Having a objfollow may be nice, but reading the arguments of a function is
really a "best attempt" at most, and you can't really trust the arguments are
what you are tracing. I would hold off on that until we have a good BTF tracing
infrastructure in the function tracer.
--- by steve
At v15, Steve also suggested changing the command syntax [2], from:
objtrace:add:obj[,offset][:type][:count][if <filter>]
to:
objtrace:+0x16(+0x28(arg1)):u32[2] // consistent with the syntax used for the kprobe_events file
I think doing this will make it much more extensive, not to mention it
will match the syntax of other code in the tracing infrastructure.
--- by steve
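In the kprobe_events syntax, +OFFS(FETCHARG) means "fetch memory at address FETCHARG + OFFS", and a trailing [N] on the type requests an array of N elements. Assuming the proposed objtrace syntax follows the same rules, +0x16(+0x28(arg1)):u32[2] would load a pointer stored 0x28 bytes into arg1, then read two u32 values 0x16 bytes past that pointer. The helper follow_offsets() below is a hypothetical user-space model of that chain, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Follow a chain of offsets written innermost first.
 * For +0x16(+0x28(arg1)) the chain is { 0x28, 0x16 }.
 * Every step except the last adds the offset and loads a pointer from
 * that address; the last step only adds its offset, so the caller can
 * read the final field with whatever type and count it wants. */
static const void *follow_offsets(const void *base,
                                  const size_t *offs, int n)
{
    const char *p = base;
    int i;

    for (i = 0; i < n - 1; i++) {
        const void *next;

        memcpy(&next, p + offs[i], sizeof(next));
        p = next;
    }
    return p + offs[n - 1];
}
```

A caller would copy sizeof(uint32_t) * 2 bytes from the returned address to model the u32[2] part. memcpy is used for the loads because an offset such as 0x16 is not 4-byte aligned.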
[v15 mailing list discussion] https://lore.kernel.org/all/[email protected]m/
[v15 source] https://github.com/x-lugoo/linux/tree/objtrace-v15
Thank you