Troubleshooting a 10-second connection delay in spark-thrift

License: CC 4.0 BY-SA

Tags: #hadoop #big-data

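This article records a delay of about 10 seconds when connecting through DBeaver to spark-thrift 2.4.5 on CDH 6.2.1. Investigation showed that the delay came mainly from network name resolution, and the cause was traced to NetworkManager preferring DNS over the hosts file. Adjusting the network configuration resolved the problem.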

Environment

CDH 6.2.1
spark-thrift version 2.4.5
Problem description
Connecting with DBeaver takes about 10 seconds.
The main log output is:

20/10/21 11:41:16 INFO ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V1
20/10/21 11:41:16 DEBUG ThriftCLIService: Client's IP Address: 10.103.117.243
20/10/21 11:41:16 DEBUG ThriftCLIService: Client's username: anonymous
20/10/21 11:41:16 DEBUG ThriftCLIService: Client's IP Address: 10.103.117.243
20/10/21 11:41:16 DEBUG UserGroupInformation: PrivilegedAction as:anonymous (auth:SIMPLE) from:org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
20/10/21 11:41:16 DEBUG SessionState: SessionState user: anonymous
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
20/10/21 11:41:16 DEBUG HAUtil: No HA service delegation token found for logical URI hdfs://newbig
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
20/10/21 11:41:16 DEBUG RetryUtils: multipleLinearRandomRetry = null
20/10/21 11:41:16 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@134c370e
20/10/21 11:41:17 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9565
20/10/21 11:41:17 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9565
20/10/21 11:41:17 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:18 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9566
20/10/21 11:41:18 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9566
20/10/21 11:41:18 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:19 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9567
20/10/21 11:41:19 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9567
20/10/21 11:41:19 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:20 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9568
20/10/21 11:41:20 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9568
20/10/21 11:41:20 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 2ms
20/10/21 11:41:21 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9569
20/10/21 11:41:21 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9569
20/10/21 11:41:21 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:22 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9570
20/10/21 11:41:22 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9570
20/10/21 11:41:22 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:23 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9571
20/10/21 11:41:23 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9571
20/10/21 11:41:23 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 2ms
20/10/21 11:41:24 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9572
20/10/21 11:41:24 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9572
20/10/21 11:41:24 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:25 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9573
20/10/21 11:41:25 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9573
20/10/21 11:41:25 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:26 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9574
20/10/21 11:41:26 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9574
20/10/21 11:41:26 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:27 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
20/10/21 11:41:27 DEBUG Client: The ping interval is 60000 ms.
20/10/21 11:41:27 DEBUG Client: Connecting to newbigma02.localdo
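
In the excerpt above, the getApplicationReport calls repeat once per second from 11:41:17 to 11:41:26, which accounts for most of the delay seen by the client. When the log is longer, counting these calls by hand gets tedious. The following is a minimal sketch for measuring that window; the function name `poll_summary` and the `marker` parameter are made up for illustration, and the timestamp format is assumed to be the `yy/MM/dd HH:mm:ss` prefix shown in the excerpt.

```python
import re
from datetime import datetime

# Matches the "yy/MM/dd HH:mm:ss" prefix seen in the log excerpt.
TIMESTAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")


def poll_summary(lines, marker="Call: getApplicationReport"):
    """Return (count, seconds between the first and last line containing marker)."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and marker in line:
            stamps.append(datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S"))
    if not stamps:
        return 0, 0.0
    return len(stamps), (stamps[-1] - stamps[0]).total_seconds()
```

Calling `poll_summary(open(path))` on a saved log (with `path` pointing at your own file) gives the number of polling calls and how long the polling lasted, which makes it easy to compare before and after a configuration change.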

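Main troubleshooting steps

1. Check disk I/O and network bandwidth

Disk I/O was normal.
Network bandwidth is 10 Gigabit.

2. Check the logs on all hosts

The host logs showed no obvious delay-related errors.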

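3. Look at how spark-thrift is implemented

After reading part of the source code, the main delay was found to come from network name resolution.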
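
One way to confirm a suspicion like this without touching spark-thrift is to time a name lookup directly on the affected host. The sketch below is only an illustration under assumptions: `time_lookup` is a hypothetical helper, and it measures a forward lookup through `socket.getaddrinfo`, which may not be the exact call path spark-thrift uses.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, result_or_exception) for a single name lookup."""
    start = time.perf_counter()
    try:
        result = resolver(host, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result = exc
    return time.perf_counter() - start, result
```

For example, `time_lookup("cdhnode01.localdomain")` (a host name taken from the log above) should return almost immediately if the name is served from the hosts file. A lookup that takes seconds suggests the resolver is waiting on DNS first.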

4. Focus on the network configuration

Root cause: with NetworkManager managing the network, names were resolved through DNS first instead of the hosts file.
Main reference links: https://bugzilla.redhat.com/show_bug.cgi?id=1093777
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-starting_networkmanager
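
The article does not say which setting was changed. On glibc-based Linux systems, the order of name sources is commonly set by the `hosts:` line in /etc/nsswitch.conf, so a quick check of that line is a reasonable first step. The sketch below assumes that file is the relevant one here; `hosts_lookup_order` and `files_before_dns` are hypothetical helper names.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the source names on the 'hosts:' line, in order, without [action] items."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            sources = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return sources.split()
    return []


def files_before_dns(order):
    """True when 'files' is listed and comes before 'dns' (or 'dns' is absent)."""
    if "files" not in order:
        return False
    return "dns" not in order or order.index("files") < order.index("dns")
```

Reading the file with `open("/etc/nsswitch.conf").read()` and passing the text to these helpers shows whether the hosts file is consulted before DNS on a given node.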