Commit bc9ab59

add obi troubleshooting docs (#8559)
Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
1 parent f175c44 commit bc9ab59

3 files changed

+219
-0
lines changed

content/en/docs/zero-code/obi/_index.md

Lines changed: 5 additions & 0 deletions
@@ -102,3 +102,8 @@ For a comprehensive list of capabilities required by OBI, refer to
102102

103103
- Follow the [setup](setup/) documentation to get started with OBI either with
104104
Docker or Kubernetes.
105+
106+
## Troubleshooting
107+
108+
- See the [troubleshooting](./troubleshooting) guide for help with common
109+
issues.
Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
1+
---
2+
title: Troubleshooting
3+
description: Troubleshooting OBI common issues and errors
4+
weight: 22
5+
cSpell:ignore: Clickhouse
6+
---
7+
8+
On this page, you can learn how to diagnose and resolve common OBI errors and
9+
issues.
10+
11+
## Troubleshooting tools
12+
13+
OBI provides a variety of tools and configuration options to help diagnose and
14+
troubleshoot issues.
15+
16+
### Detailed logging
17+
18+
You can increase the logging verbosity of OBI by setting the `log_level`
19+
configuration or the `OTEL_EBPF_LOG_LEVEL` environment variable to `debug`. This
20+
provides more detailed logs that may help in diagnosing issues.
21+
22+
To enable logging from the BPF programs, set the `ebpf.bpf_debug` configuration
23+
or the `OTEL_EBPF_BPF_DEBUG` environment variable to `true`. **Use this only for
24+
debugging**, as it can generate a significant number of logs.
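
As a minimal sketch, both settings above can be exported as environment
variables in the shell that launches OBI. Only the variable names come from this
page; how OBI itself is started (binary, container, or Kubernetes manifest)
depends on your deployment and is not shown.

```shell
# Raise OBI's own log verbosity.
export OTEL_EBPF_LOG_LEVEL=debug
# Also log from the BPF programs. Debugging only: this can be very noisy.
export OTEL_EBPF_BPF_DEBUG=true
```

Unset `OTEL_EBPF_BPF_DEBUG` once you have the logs you need.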
25+
26+
### Configuration logging
27+
28+
By default, OBI merges its configuration from three sources, listed here from
29+
lowest to highest priority:
30+
31+
- Built-in default configuration
32+
- Configuration file, provided using the `--config` flag or
33+
`OTEL_EBPF_CONFIG_PATH`
34+
- Environment variables, usually starting with `OTEL_EBPF_`
35+
36+
It is often helpful to view the final merged configuration. Using the
37+
`log_config` configuration value (or `OTEL_EBPF_LOG_CONFIG` environment
38+
variable), you can instruct OBI to log the final configuration at startup.
39+
40+
`log_config` supports the following values:
41+
42+
- `yaml` — logs the final configuration in YAML format; best for human
43+
readability since it matches the config file structure
44+
- `json` — logs the final configuration in JSON format; best for log shippers
45+
since it is a single structured line
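
The precedence and `log_config` behavior described above can be sketched as
follows. The file path is a hypothetical placeholder, not an OBI default, and
the sketch assumes the file already sets a different `log_level`.

```shell
# Hypothetical path to a configuration file; adjust to your deployment.
export OTEL_EBPF_CONFIG_PATH=/etc/obi/config.yaml
# Environment variables have the highest priority, so this overrides any
# log_level set in the file above.
export OTEL_EBPF_LOG_LEVEL=debug
# Log the final merged configuration at startup, in YAML format.
export OTEL_EBPF_LOG_CONFIG=yaml
```

With these set, the configuration logged at startup should show the `debug` log
level, which confirms that the environment variable took precedence over the
file.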
46+
47+
### Internal metrics
48+
49+
You can configure and use [OBI internal metrics](../metrics/#internal-metrics)
50+
to monitor performance and internal state.
51+
52+
To turn on internal metrics, configure `internal_metrics.exporter` with one of
53+
the following values:
54+
55+
- `none` (default): disables internal metrics
56+
- `prometheus`: exports internal metrics in Prometheus format via an HTTP server
57+
- `otlp`: exports internal metrics via an OTLP exporter
58+
59+
### Debug traces exporter
60+
61+
To debug the raw trace spans generated by OBI, you can set the
62+
`otel_traces_exporter.protocol` configuration value or the
63+
`OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` environment variable to `debug`. This logs
64+
the raw trace spans to the console in a human-readable format, matching the OTel
65+
Collector debug exporter with `verbosity: detailed`. Spans logged to the
66+
console look like this:
67+
68+
```text
69+
Traces {"resource spans": 1, "spans": 1}
70+
ResourceSpans #0
71+
Resource SchemaURL:
72+
Resource attributes:
73+
-> service.name: Str(flagd)
74+
-> telemetry.sdk.language: Str(go)
75+
-> telemetry.sdk.name: Str(opentelemetry-ebpf-instrumentation)
76+
-> telemetry.sdk.version: Str(main)
77+
-> host.name: Str(flagd-5cccb4c4f5-sfkcm)
78+
-> os.type: Str(linux)
79+
-> service.namespace: Str(opentelemetry-demo)
80+
-> k8s.owner.name: Str(flagd)
81+
-> k8s.kind: Str(Deployment)
82+
-> k8s.replicaset.name: Str(flagd-5cccb4c4f5)
83+
-> k8s.pod.name: Str(flagd-5cccb4c4f5-sfkcm)
84+
-> k8s.container.name: Str(flagd)
85+
-> k8s.deployment.name: Str(flagd)
86+
-> service.version: Str(2.0.2)
87+
-> k8s.namespace.name: Str(default)
88+
-> otel.library.name: Str(go.opentelemetry.io/obi)
89+
ScopeSpans #0
90+
ScopeSpans SchemaURL:
91+
InstrumentationScope
92+
Span #0
93+
Trace ID : 63a2723a58e0033170e58b1ff27ef03d
94+
Parent ID :
95+
ID : fab47609b60cc4e0
96+
Name : /opentelemetry.proto.collector.metrics.v1.MetricsService/Export
97+
Kind : Client
98+
Start time : 2025-11-28 16:10:35.4241749 +0000 UTC
99+
End time : 2025-11-28 16:10:35.42555658 +0000 UTC
100+
Status code : Unset
101+
Status message :
102+
Attributes:
103+
-> rpc.method: Str(/opentelemetry.proto.collector.metrics.v1.MetricsService/Export)
104+
-> rpc.system: Str(grpc)
105+
-> rpc.grpc.status_code: Int(0)
106+
-> server.address: Str(otel-collector.default)
107+
-> peer.service: Str(otel-collector.default)
108+
-> server.port: Int(4317)
109+
```
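
Output like the example above can be requested through the environment variable
named earlier in this section. This is a minimal sketch; set the variable in
whatever environment launches OBI.

```shell
# Print raw trace spans to the console instead of exporting them over OTLP.
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=debug
```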
110+
111+
### Performance profiler (pprof)
112+
113+
OBI can expose a `pprof` port to allow performance profiling. To enable it, set
114+
the `profile_port` configuration value or the `OTEL_EBPF_PROFILE_PORT`
115+
environment variable to the desired port.
116+
117+
This is an advanced use case and typically not required.
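
A minimal sketch of enabling the profiler follows. The port number is a
hypothetical choice, and the commented-out command assumes that OBI serves the
standard Go `net/http/pprof` endpoints on that port, which this page does not
confirm.

```shell
# Hypothetical port; pick any free port on the host.
export OTEL_EBPF_PROFILE_PORT=6060
# Assumption: the standard Go pprof endpoints are served on that port.
# If so, a 30-second CPU profile could be collected with:
#   go tool pprof "http://localhost:${OTEL_EBPF_PROFILE_PORT}/debug/pprof/profile?seconds=30"
```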
118+
119+
## Common OBI issues
120+
121+
This section covers how to resolve common OBI issues.
122+
123+
### Node.js services crash or become unresponsive when OBI is running
124+
125+
To enable better context propagation in Node.js applications, OBI injects custom
126+
code to track the current execution context. It does so using the Node.js
127+
inspector protocol and sends the `SIGUSR1` signal to the Node process to open
128+
the inspector.
129+
130+
However, if the application defines its own `SIGUSR1` signal handler, it handles
131+
OBI's signal in a custom way, which may cause crashes or unresponsiveness of the
132+
targeted application. For example:
133+
134+
```javascript
135+
process.on('SIGUSR1', () => {
136+
process.exit(0);
137+
});
138+
```
139+
140+
The same problem can occur when the application is started with Node.js flags that register their own signal handling, such as:
141+
142+
```commandline
143+
node --heapsnapshot-signal=SIGUSR1
144+
```
145+
146+
**Solutions:**
147+
148+
- Use the `discovery` configuration to exclude specific Node.js applications
149+
from OBI tracking, preventing OBI from sending `SIGUSR1`.
150+
- Disable Node.js context propagation entirely by setting `nodejs.enabled: false`
151+
in the configuration file or by setting the environment variable
152+
`OTEL_EBPF_NODEJS_ENABLED=false`.
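
As a minimal sketch, the second solution can be applied through the environment
that launches OBI. The variable name comes from this page; how OBI is started is
not shown.

```shell
# Turn off Node.js context propagation so OBI does not send SIGUSR1
# to Node.js processes.
export OTEL_EBPF_NODEJS_ENABLED=false
```

This trades the improved Node.js context propagation for stability, so prefer
the `discovery` exclusion when only a few applications are affected.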
153+
154+
### ClickHouse instances crash when OBI is running
155+
156+
If you're running [ClickHouse](https://github.com/ClickHouse/ClickHouse) on the
157+
same node with OBI, you might see ClickHouse crashing with logs such as:
158+
159+
```text
160+
Application: Code: 246. DB::Exception: Calculated checksum of the executable (...) does not correspond to the reference checksum ...
161+
```
162+
163+
The issue is likely caused by OBI attaching eBPF uprobes to the ClickHouse
164+
binary.
165+
[A relevant GitHub issue](https://github.com/ClickHouse/ClickHouse/issues/83637)
166+
explains this behavior:
167+
168+
> When attaching a uprobe, the kernel will modify the target process memory to
169+
> insert a trap instruction at the attachment address. This causes the
170+
> ClickHouse binary checksum validation to fail during startup.
171+
172+
**Solution:**
173+
174+
Start ClickHouse with the
175+
[skip_binary_checksum_checks](https://clickhouse.com/docs/operations/server-configuration-parameters/settings#skip_binary_checksum_checks)
176+
setting enabled.
177+
178+
### Missing telemetry data for Go applications or TLS requests
179+
180+
If you are missing telemetry coming from Go applications or TLS requests (like
181+
HTTPS communication), it might be due to insufficient privileges for attaching
182+
uprobes. Due to some recent kernel security changes which were backported to
183+
many older kernel versions, uprobes now require the `CAP_SYS_ADMIN` capability. OBI
184+
uses uprobes to instrument Go applications and TLS requests, along with
185+
other runtime- and language-specific instrumentations. If your OBI deployment's
186+
security configuration doesn't run OBI in privileged mode (for example,
187+
`privileged: true` in Docker or Kubernetes) or doesn't provide
188+
`CAP_SYS_ADMIN` as a security capability, you might not see some or all of your
189+
telemetry.
190+
191+
To troubleshoot this issue, enable detailed OBI logging with
192+
`OTEL_EBPF_LOG_LEVEL=debug`. If you see all the uprobe injections failing with
193+
the error "setting uprobe (offset)..." then you are likely experiencing this
194+
issue.
195+
196+
**Solutions:**
197+
198+
You can either:
199+
200+
- Run OBI as privileged.
201+
- Add `CAP_SYS_ADMIN` to the list of capabilities in your deployment security
202+
configuration.

static/refcache.json

Lines changed: 12 additions & 0 deletions
@@ -651,6 +651,10 @@
651651
"StatusCode": 206,
652652
"LastSeen": "2025-12-08T09:50:53.577916862Z"
653653
},
654+
"https://clickhouse.com/docs/operations/server-configuration-parameters/settings#skip_binary_checksum_checks": {
655+
"StatusCode": 206,
656+
"LastSeen": "2025-12-02T15:20:14.11156+02:00"
657+
},
654658
"https://cloud-native.slack.com/archives/C014L2KCTE3": {
655659
"StatusCode": 200,
656660
"LastSeen": "2025-12-09T09:47:13.402705871Z"
@@ -3287,6 +3291,14 @@
32873291
"StatusCode": 206,
32883292
"LastSeen": "2025-12-08T09:48:50.441449276Z"
32893293
},
3294+
"https://github.com/ClickHouse/ClickHouse": {
3295+
"StatusCode": 206,
3296+
"LastSeen": "2025-12-02T15:20:07.242009+02:00"
3297+
},
3298+
"https://github.com/ClickHouse/ClickHouse/issues/83637": {
3299+
"StatusCode": 206,
3300+
"LastSeen": "2025-12-02T15:20:10.164251+02:00"
3301+
},
32903302
"https://github.com/ClickHouse/clickhouse-java": {
32913303
"StatusCode": 206,
32923304
"LastSeen": "2025-12-09T09:42:39.412969479Z"
