Architecture
Setup
OpenTelemetry
1. Add the following to the startup parameters of the Java application:
-javaagent:opentelemetry-javaagent.jar
docker-compose configuration example:
command:
  - java
  - '-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9876'
  - '-javaagent:opentelemetry-javaagent.jar'
  - '-Dotel.service.name=xxx'
  - '-Duser.timezone=GMT+08'
  - '-Dfile.encoding=utf-8'
  - '-jar'
  - xxx.jar
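2. Get the trace ID of the call chain
If the Java code needs to obtain the trace ID of the current call chain at runtime:
String traceId = Span.current().getSpanContext().getTraceId();
This requires the following dependencies: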
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
    <version>${io.opentelemetry.version}</version>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk</artifactId>
    <version>${io.opentelemetry.version}</version>
</dependency>
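When no span is active, `Span.current()` returns an invalid span whose trace ID is 32 zeros, so it can be worth checking the value before writing it to a log line. The sketch below uses only the JDK so that it runs without the OpenTelemetry jars. The class `TraceIds` and its methods are hypothetical helpers, not part of the OpenTelemetry API (which offers `SpanContext.isValid()` for the same purpose).

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks the trace ID format (32 lowercase hex characters, not all zeros). */
final class TraceIds {
    private static final Pattern HEX_32 = Pattern.compile("[0-9a-f]{32}");

    private TraceIds() {}

    /** Returns true when the value looks like a usable trace ID. */
    static boolean isValid(String traceId) {
        return traceId != null
                && HEX_32.matcher(traceId).matches()
                && !traceId.chars().allMatch(c -> c == '0');
    }

    /** Returns the trace ID, or the fallback when no valid trace is active. */
    static String orElse(String traceId, String fallback) {
        return isValid(traceId) ? traceId : fallback;
    }

    public static void main(String[] args) {
        // In the application the argument would come from
        // Span.current().getSpanContext().getTraceId().
        System.out.println(orElse("4bf92f3577b34da6a3ce929d0e0e4736", "-"));
        System.out.println(orElse("not-a-trace-id", "-"));
    }
}
```

A fallback such as `-` keeps log lines uniform, which makes later filtering by trace ID in Loki simpler.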
Promtail
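Promtail ships the contents of local logs to a Loki instance.
Here it is configured to collect the logs of all Docker containers on the server. The corresponding promtail-config.yaml is as follows: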
scrape_configs:
  - job_name: flog_scrape
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
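Docker reports container names with a leading slash, and the relabel rule above uses the capture group in `/(.*)` to strip it before writing the `container` label. The sketch below reproduces that match in plain Java to show what ends up in the label. The class `ContainerLabel` and the container name `/my-service` are hypothetical, and Java's regex engine stands in for the one Promtail uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the relabel rule: regex '/(.*)' with the first capture group as value. */
final class ContainerLabel {
    // Same expression as in relabel_configs. matches() requires the whole
    // input to match, which mirrors the anchored matching of relabel rules.
    private static final Pattern NAME = Pattern.compile("/(.*)");

    private ContainerLabel() {}

    /** Returns the label value, or null when the rule would not apply. */
    static String fromDockerName(String dockerName) {
        Matcher m = NAME.matcher(dockerName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fromDockerName("/my-service"));
    }
}
```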
Loki
Loki stores the logs and processes queries.
The configuration file loki-config.yaml is as follows:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 3110
  grpc_server_max_recv_msg_size: 1073741824  # maximum gRPC receive message size in bytes; default is 4 MB
  grpc_server_max_send_msg_size: 1073741824  # maximum gRPC send message size in bytes; default is 4 MB

ingester:
  wal:
    enabled: false
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0
  max_chunk_age: 20m  # maximum time a timeseries chunk stays in memory; once exceeded, the current chunk is flushed to storage and a new chunk is created

schema_config:
  configs:
    - from: 2021-01-01
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /opt/loki/index  # where the index is stored
  filesystem:
    directory: /opt/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 30  # per-user ingestion rate limit in MB per second; default is 4
  ingestion_burst_size_mb: 15  # per-user ingestion burst size in MB; default is 6

chunk_store_config:
  #max_look_back_period: 168h  # maximum time to look back for log lines; applies only to instant logs
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: true  # switch for log retention; default is false
  retention_period: 336h  # log retention period
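With the table manager, the retention period is expected to be a multiple of the index table period. In the configuration above that holds: 336h is exactly two periods of 168h. The check can be sketched in a few lines of Java; the class `RetentionCheck` is a hypothetical helper and not something Loki ships.

```java
import java.time.Duration;

/** Hypothetical helper: verifies that a retention period is a whole number of table periods. */
final class RetentionCheck {
    private RetentionCheck() {}

    static boolean isMultipleOf(Duration retention, Duration tablePeriod) {
        // Guard against division by zero, then compare on millisecond precision.
        return !tablePeriod.isZero()
                && retention.toMillis() % tablePeriod.toMillis() == 0;
    }

    public static void main(String[] args) {
        Duration tablePeriod = Duration.ofHours(168); // index.period
        Duration retention = Duration.ofHours(336);   // table_manager.retention_period
        System.out.println(isMultipleOf(retention, tablePeriod));
    }
}
```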
Tempo
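Grafana Tempo is an open-source, easy-to-use, high-volume distributed tracing backend.
Here the server's local file system is used as storage (MinIO can also be used). The tempo.yaml configuration is as follows: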
server:
  http_listen_port: 3200

distributor:
  receivers:  # this configuration will listen on all ports and protocols that tempo is capable of.
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  max_block_duration: 5m  # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally

compactor:
  compaction:
    block_retention: 1h  # overall Tempo trace retention. set for demo purposes

storage:
  trace:
    backend: local  # backend configuration to use
    wal:
      path: /tmp/tempo/wal  # where to store the wal locally
    local:
      path: /tmp/tempo/blocks

overrides:
  metrics_generator_processors: [service-graphs, span-metrics]  # enables metrics generator
Grafana
Grafana provides the UI.
Data sources can be configured in the datasources.yaml file (or manually in the web page):
# config file version
apiVersion: 1

# list of datasources that should be deleted from the database
deleteDatasources:  # delete any existing data source with name Prometheus and orgId 1 first
  - name: Prometheus
    orgId: 1
  - name: Loki
    orgId: 1
  - name: Tempo
    orgId: 1

# list of datasources to insert/update depending
# on what's available in the database
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus0:9090
    # Prometheus data source URL on Kubernetes
    #url: http://prometheus-k8s.kubesphere-monitoring-system:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: true
    # <string> custom UID that can be used to reference this data source in other parts of the configuration; generated automatically if not specified
    uid: prometheus-node
  - name: Loki
    type: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: true
    version: 1
    editable: true
    uid: loki
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
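Usage
Switch to Explore in the left sidebar of Grafana to open the Loki page.
Then click Log labels to display the labels of the logs currently being collected. Logs can be filtered and queried by these labels.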
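The same label filtering is available outside the UI through Loki's HTTP API at `/loki/api/v1/query_range`. As a rough sketch, the helper below builds a LogQL query that selects one container by the `container` label set in the Promtail configuration and keeps only lines containing a given trace ID. The class `LokiQuery`, the base URL `http://loki:3100`, the container name `my-service`, and the limit of 100 are assumptions for illustration, and the helper does not escape quotes inside its arguments.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a Loki query_range URL for one container and one trace ID. */
final class LokiQuery {
    private LokiQuery() {}

    /** LogQL: stream selector on the container label plus a line filter on the trace ID. */
    static String logQl(String container, String traceId) {
        return "{container=\"" + container + "\"} |= \"" + traceId + "\"";
    }

    /** Appends the URL-encoded query and a result limit to the query_range endpoint. */
    static String queryRangeUrl(String baseUrl, String logQl, int limit) {
        String encoded = URLEncoder.encode(logQl, StandardCharsets.UTF_8);
        return baseUrl + "/loki/api/v1/query_range?query=" + encoded + "&limit=" + limit;
    }

    public static void main(String[] args) {
        String query = logQl("my-service", "4bf92f3577b34da6a3ce929d0e0e4736");
        System.out.println(query);
        System.out.println(queryRangeUrl("http://loki:3100", query, 100));
    }
}
```

The printed LogQL expression can also be pasted into the Explore query field, which is a quick way to find every log line that belongs to one trace.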