| 1 | +--- |
| 2 | +title: Troubleshooting |
| 3 | +description: Troubleshooting OBI common issues and errors |
| 4 | +weight: 22 |
| 5 | +cSpell:ignore: Clickhouse |
| 6 | +--- |
| 7 | + |
| 8 | +This page explains how to diagnose and resolve common OBI errors and |
| 9 | +issues. |
| 10 | + |
| 11 | +## Troubleshooting tools |
| 12 | + |
| 13 | +OBI provides a variety of tools and configuration options to help diagnose and |
| 14 | +troubleshoot issues. |
| 15 | + |
| 16 | +### Detailed logging |
| 17 | + |
| 18 | +You can increase the logging verbosity of OBI by setting the `log_level` |
| 19 | +configuration or the `OTEL_EBPF_LOG_LEVEL` environment variable to `debug`. This |
| 20 | +provides more detailed logs that may help in diagnosing issues. |
| 21 | + |
| 22 | +To enable logging from the BPF programs, set the `ebpf.bpf_debug` configuration |
| 23 | +or the `OTEL_EBPF_BPF_DEBUG` environment variable to `true`. **Use this only for |
| 24 | +debugging**, as it can generate a significant number of logs. |
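| | + |
| | +As a minimal sketch, both settings could be combined in the configuration |
| | +file as follows. This assumes that dotted option names such as |
| | +`ebpf.bpf_debug` map to nested YAML keys: |
| | + |
| | +```yaml |
| | +# Sketch only: nesting is assumed from the dotted option names above. |
| | +log_level: debug |
| | +ebpf: |
| | +  bpf_debug: true # verbose; turn off again after debugging |
| | +``` |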
| 25 | + |
| 26 | +### Configuration logging |
| 27 | + |
| 28 | +By default, OBI merges its configuration from three sources, listed here in |
| 29 | +order of increasing priority: |
| 30 | + |
| 31 | +- Built-in default configuration |
| 32 | +- Configuration file, provided using the `--config` flag or |
| 33 | + `OTEL_EBPF_CONFIG_PATH` |
| 34 | +- Environment variables, usually starting with `OTEL_EBPF_` |
| 35 | + |
| 36 | +It is often helpful to view the final merged configuration. Using the |
| 37 | +`log_config` configuration value (or `OTEL_EBPF_LOG_CONFIG` environment |
| 38 | +variable), you can instruct OBI to log the final configuration at startup. |
| 39 | + |
| 40 | +`log_config` supports the following values: |
| 41 | + |
| 42 | +- `yaml` — logs the final configuration in YAML format; best for human |
| 43 | + readability since it matches the config file structure |
| 44 | +- `json` — logs the final configuration in JSON format; best for log shippers |
| 45 | + since it is a single structured line |
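| | + |
| | +For example, a configuration file that logs the merged configuration as YAML |
| | +at startup might contain the following line. This is a sketch that assumes |
| | +`log_config` is a top-level key: |
| | + |
| | +```yaml |
| | +# Sketch only: use `json` instead when logs are consumed by a log shipper. |
| | +log_config: yaml |
| | +``` |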
| 46 | + |
| 47 | +### Internal metrics |
| 48 | + |
| 49 | +You can configure and use [OBI internal metrics](../metrics/#internal-metrics) |
| 50 | +to monitor performance and internal state. |
| 51 | + |
| 52 | +To turn on internal metrics, configure `internal_metrics.exporter` with one of |
| 53 | +the following values: |
| 54 | + |
| 55 | +- `none` (default): disables internal metrics |
| 56 | +- `prometheus`: exports internal metrics in Prometheus format via an HTTP server |
| 57 | +- `otlp`: exports internal metrics via an OTLP exporter |
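| | + |
| | +A minimal sketch of the Prometheus variant, assuming the dotted option name |
| | +maps to nested YAML keys. Any further exporter settings, such as the |
| | +listening port, are not shown here: |
| | + |
| | +```yaml |
| | +# Sketch only: nesting is assumed from `internal_metrics.exporter`. |
| | +internal_metrics: |
| | +  exporter: prometheus |
| | +``` |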
| 58 | + |
| 59 | +### Debug traces exporter |
| 60 | + |
| 61 | +To debug the raw trace spans generated by OBI, you can set the |
| 62 | +`otel_traces_exporter.protocol` configuration value or the |
| 63 | +`OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` environment variable to `debug`. This logs |
| 64 | +the raw trace spans to the console in a human-readable format, matching the OTel |
| 65 | +Collector debug exporter with `verbosity: detailed`. Spans logged to the |
| 66 | +console look like this: |
| 67 | + |
| 68 | +```text |
| 69 | +Traces {"resource spans": 1, "spans": 1} |
| 70 | +ResourceSpans #0 |
| 71 | +Resource SchemaURL: |
| 72 | +Resource attributes: |
| 73 | + -> service.name: Str(flagd) |
| 74 | + -> telemetry.sdk.language: Str(go) |
| 75 | + -> telemetry.sdk.name: Str(opentelemetry-ebpf-instrumentation) |
| 76 | + -> telemetry.sdk.version: Str(main) |
| 77 | + -> host.name: Str(flagd-5cccb4c4f5-sfkcm) |
| 78 | + -> os.type: Str(linux) |
| 79 | + -> service.namespace: Str(opentelemetry-demo) |
| 80 | + -> k8s.owner.name: Str(flagd) |
| 81 | + -> k8s.kind: Str(Deployment) |
| 82 | + -> k8s.replicaset.name: Str(flagd-5cccb4c4f5) |
| 83 | + -> k8s.pod.name: Str(flagd-5cccb4c4f5-sfkcm) |
| 84 | + -> k8s.container.name: Str(flagd) |
| 85 | + -> k8s.deployment.name: Str(flagd) |
| 86 | + -> service.version: Str(2.0.2) |
| 87 | + -> k8s.namespace.name: Str(default) |
| 88 | + -> otel.library.name: Str(go.opentelemetry.io/obi) |
| 89 | +ScopeSpans #0 |
| 90 | +ScopeSpans SchemaURL: |
| 91 | +InstrumentationScope |
| 92 | +Span #0 |
| 93 | + Trace ID : 63a2723a58e0033170e58b1ff27ef03d |
| 94 | + Parent ID : |
| 95 | + ID : fab47609b60cc4e0 |
| 96 | + Name : /opentelemetry.proto.collector.metrics.v1.MetricsService/Export |
| 97 | + Kind : Client |
| 98 | + Start time : 2025-11-28 16:10:35.4241749 +0000 UTC |
| 99 | + End time : 2025-11-28 16:10:35.42555658 +0000 UTC |
| 100 | + Status code : Unset |
| 101 | + Status message : |
| 102 | +Attributes: |
| 103 | + -> rpc.method: Str(/opentelemetry.proto.collector.metrics.v1.MetricsService/Export) |
| 104 | + -> rpc.system: Str(grpc) |
| 105 | + -> rpc.grpc.status_code: Int(0) |
| 106 | + -> server.address: Str(otel-collector.default) |
| 107 | + -> peer.service: Str(otel-collector.default) |
| 108 | + -> server.port: Int(4317) |
| 109 | +``` |
| 110 | + |
| 111 | +### Performance profiler (pprof) |
| 112 | + |
| 113 | +OBI can expose a `pprof` port to allow performance profiling. To enable it, set |
| 114 | +the `profile_port` configuration value or the `OTEL_EBPF_PROFILE_PORT` |
| 115 | +environment variable to the desired port. |
| 116 | + |
| 117 | +This is an advanced use case and typically not required. |
| 118 | + |
| 119 | +## Common OBI issues |
| 120 | + |
| 121 | +This section covers how to resolve common OBI issues. |
| 122 | + |
| 123 | +### Node.js services crash or become unresponsive when OBI is running |
| 124 | + |
| 125 | +To enable better context propagation in Node.js applications, OBI injects custom |
| 126 | +code to track the current execution context. It does so using the Node.js |
| 127 | +inspector protocol and sends the `SIGUSR1` signal to the Node process to open |
| 128 | +the inspector. |
| 129 | + |
| 130 | +However, if the application defines its own `SIGUSR1` signal handler, it handles |
| 131 | +OBI's signal in a custom way, which may cause crashes or unresponsiveness of the |
| 132 | +targeted application. For example: |
| 133 | + |
| 134 | +```javascript |
| 135 | +process.on('SIGUSR1', () => { |
| 136 | + process.exit(0); |
| 137 | +}); |
| 138 | +``` |
| 139 | + |
| 140 | +Node.js flags that register their own signal handling can cause the same issue: |
| 141 | + |
| 142 | +```commandline |
| 143 | +node --heapsnapshot-signal=SIGUSR1 |
| 144 | +``` |
| 145 | + |
| 146 | +**Solutions:** |
| 147 | + |
| 148 | +- Use the `discovery` configuration to exclude specific Node.js applications |
| 149 | + from OBI tracking, preventing OBI from sending `SIGUSR1`. |
| 150 | +- Disable Node.js context propagation entirely by setting `nodejs.enabled` |
| 151 | + to `false` in the configuration file, or by setting the environment |
| 152 | + variable `OTEL_EBPF_NODEJS_ENABLED=false`. |
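| | + |
| | +As a sketch, the second solution might look like this in the configuration |
| | +file, assuming `nodejs.enabled` maps to nested YAML keys: |
| | + |
| | +```yaml |
| | +# Sketch only: turns off Node.js context propagation, so OBI no longer |
| | +# sends SIGUSR1 to Node.js processes. |
| | +nodejs: |
| | +  enabled: false |
| | +``` |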
| 153 | + |
| 154 | +### ClickHouse instances crash when OBI is running |
| 155 | + |
| 156 | +If you're running [ClickHouse](https://github.com/ClickHouse/ClickHouse) on the |
| 157 | +same node as OBI, you might see ClickHouse crashing with logs such as: |
| 158 | + |
| 159 | +```text |
| 160 | +Application: Code: 246. DB::Exception: Calculated checksum of the executable (...) does not correspond to the reference checksum ... |
| 161 | +``` |
| 162 | + |
| 163 | +The issue is likely caused by OBI attaching eBPF uprobes to the ClickHouse |
| 164 | +binary. |
| 165 | +[A relevant GitHub issue](https://github.com/ClickHouse/ClickHouse/issues/83637) |
| 166 | +explains this behavior: |
| 167 | + |
| 168 | +> When attaching a uprobe, the kernel will modify the target process memory to |
| 169 | +> insert a trap instruction at the attachment address. This causes the |
| 170 | +> ClickHouse binary checksum validation to fail during startup. |
| 171 | + |
| 172 | +**Solution:** |
| 173 | + |
| 174 | +Start ClickHouse with the |
| 175 | +[`skip_binary_checksum_checks`](https://clickhouse.com/docs/operations/server-configuration-parameters/settings#skip_binary_checksum_checks) |
| 176 | +server setting enabled. |
| 177 | + |
| 178 | +### Missing telemetry data for Go applications or TLS requests |
| 179 | + |
| 180 | +If telemetry from Go applications or TLS requests (such as HTTPS |
| 181 | +traffic) is missing, OBI might lack the privileges needed to attach |
| 182 | +uprobes. Following recent kernel security changes, which were backported |
| 183 | +to many older kernel versions, attaching uprobes requires the |
| 184 | +`CAP_SYS_ADMIN` capability. OBI uses uprobes to instrument Go |
| 185 | +applications and TLS requests, along with other runtime- and |
| 186 | +language-specific instrumentation. If your OBI deployment does not run in |
| 187 | +privileged mode (for example, `privileged: true` in Docker and |
| 188 | +Kubernetes) and does not grant `CAP_SYS_ADMIN` as a security capability, |
| 189 | +you might not see some or all of your telemetry. |
| 190 | + |
| 191 | +To troubleshoot this issue, enable detailed OBI logging with |
| 192 | +`OTEL_EBPF_LOG_LEVEL=debug`. If you see all the uprobe injections failing with |
| 193 | +the error "setting uprobe (offset)..." then you are likely experiencing this |
| 194 | +issue. |
| 195 | + |
| 196 | +**Solutions:** |
| 197 | + |
| 198 | +You can either: |
| 199 | + |
| 200 | +- Run OBI as privileged. |
| 201 | +- Add `CAP_SYS_ADMIN` to the list of capabilities in your deployment security |
| 202 | + configuration. |
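| | + |
| | +In Kubernetes, the second option might look like the following container |
| | +spec fragment. The container name `obi` is a placeholder, and any other |
| | +capabilities your OBI deployment needs are not shown. Kubernetes capability |
| | +names omit the `CAP_` prefix: |
| | + |
| | +```yaml |
| | +# Sketch only: fragment of a Pod spec, not a complete manifest. |
| | +containers: |
| | +  - name: obi # placeholder container name |
| | +    securityContext: |
| | +      capabilities: |
| | +        add: |
| | +          - SYS_ADMIN # CAP_SYS_ADMIN, written without the CAP_ prefix |
| | +      # Alternative: run the container as privileged instead. |
| | +      # privileged: true |
| | +``` |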