Troubleshooting a 10-second access delay in spark-thrift

This post documents a delay of roughly 10 seconds observed when connecting through dbeaver to spark-thrift 2.4.5 on CDH 6.2.1. The investigation showed that the delay came mainly from hostname resolution, and the root cause turned out to be that, with NetworkManager managing the network, DNS was consulted before the hosts file. Adjusting the network configuration resolved the problem.


Environment

CDH 6.2.1
spark-thrift version 2.4.5

Problem description

Connecting with dbeaver takes roughly 10 seconds before the session becomes usable.
The main log output is:

20/10/21 11:41:16 INFO ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V1
20/10/21 11:41:16 DEBUG ThriftCLIService: Client's IP Address: 10.103.117.243
20/10/21 11:41:16 DEBUG ThriftCLIService: Client's username: anonymous
20/10/21 11:41:16 DEBUG ThriftCLIService: Client's IP Address: 10.103.117.243
20/10/21 11:41:16 DEBUG UserGroupInformation: PrivilegedAction as:anonymous (auth:SIMPLE) from:org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
20/10/21 11:41:16 DEBUG SessionState: SessionState user: anonymous
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
20/10/21 11:41:16 DEBUG HAUtil: No HA service delegation token found for logical URI hdfs://newbig
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
20/10/21 11:41:16 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
20/10/21 11:41:16 DEBUG RetryUtils: multipleLinearRandomRetry = null
20/10/21 11:41:16 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@134c370e
20/10/21 11:41:17 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9565
20/10/21 11:41:17 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9565
20/10/21 11:41:17 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:18 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9566
20/10/21 11:41:18 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9566
20/10/21 11:41:18 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:19 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9567
20/10/21 11:41:19 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9567
20/10/21 11:41:19 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:20 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9568
20/10/21 11:41:20 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9568
20/10/21 11:41:20 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 2ms
20/10/21 11:41:21 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9569
20/10/21 11:41:21 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9569
20/10/21 11:41:21 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:22 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9570
20/10/21 11:41:22 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9570
20/10/21 11:41:22 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:23 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9571
20/10/21 11:41:23 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9571
20/10/21 11:41:23 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 2ms
20/10/21 11:41:24 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9572
20/10/21 11:41:24 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9572
20/10/21 11:41:24 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:25 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9573
20/10/21 11:41:25 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9573
20/10/21 11:41:25 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:26 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop sending #9574
20/10/21 11:41:26 DEBUG Client: IPC Client (1607020784) connection to cdhnode01.localdomain/172.27.10.70:8032 from hadoop got value #9574
20/10/21 11:41:26 DEBUG ProtobufRpcEngine: Call: getApplicationReport took 1ms
20/10/21 11:41:27 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
20/10/21 11:41:27 DEBUG Client: The ping interval is 60000 ms.
20/10/21 11:41:27 DEBUG Client: Connecting to newbigma02.localdo
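
To confirm that the delay is on the server side rather than in dbeaver itself, the same connection can be timed from the command line. This is only a sketch; the thrift server host and port 10000 below are assumptions and should be replaced with your own:

# time how long it takes to open a session and run a trivial query
time beeline -u "jdbc:hive2://cdhnode01.localdomain:10000" -e "select 1"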

Troubleshooting steps

1. Check disk I/O and network bandwidth

Disk I/O was normal.
Network bandwidth is 10 GbE.
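
A minimal sketch of how both were verified, assuming the sysstat and iperf3 packages are installed and that another cluster node (the name below is an assumption) runs iperf3 in server mode:

# disk utilization and service times, 5 samples at 1-second intervals
iostat -x 1 5

# raw TCP throughput to another cluster node (run "iperf3 -s" there first)
iperf3 -c cdhnode02.localdomain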

2. Check the logs on all hosts

No obvious latency-related errors were found in the host logs.
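
For reference, the kind of sweep run on each host; a sketch only, and the time window and CDH role log paths are assumptions to adjust for your environment:

# system journal around the time of the slow connection
journalctl --since "2020-10-21 11:40" --until "2020-10-21 11:42" -p warning

# search the role logs for errors or timeouts
grep -iE "error|timeout" /var/log/hadoop-yarn/*.log /var/log/hadoop-hdfs/*.log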

3. Look at how spark-thrift handles the connection

After reading part of the source code, the main delay turned out to come from hostname resolution.
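
The suspicion can be confirmed without reading code by timing name resolution on the thrift server host. getent goes through the NSS stack (hosts file plus DNS) while dig queries DNS directly; several seconds spent in getent, or a reverse lookup of the client address that hangs, points at resolution rather than Spark. The names below are taken from the log excerpt above:

# resolution through the normal NSS path (hosts file + DNS)
time getent hosts cdhnode01.localdomain

# resolution through DNS only
time dig +short cdhnode01.localdomain

# reverse lookup of the client address seen in the thrift log
time getent hosts 10.103.117.243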

4. Focus on the network configuration

Root cause: with NetworkManager managing the network, hostnames were resolved through DNS before the hosts file was consulted.
Main references:
https://bugzilla.redhat.com/show_bug.cgi?id=1093777
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-starting_networkmanager
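
A quick way to see which resolver configuration is actually in effect, a sketch assuming a RHEL/CentOS 7 host managed by NetworkManager:

# order in which the hosts database is consulted (expect "files dns")
grep ^hosts /etc/nsswitch.conf

# nameservers currently written into resolv.conf by NetworkManager
cat /etc/resolv.conf

# DNS servers NetworkManager has attached to each device
nmcli dev show | grep DNS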

5. Stop /etc/resolv.conf from being rewritten automatically

Edit /etc/sysconfig/network-scripts/ifcfg-team0 and add the line PEERDNS=no, so that the interface's DNS settings no longer overwrite /etc/resolv.conf.
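
A minimal sketch of the resulting interface file and how to apply it; the team0 device name comes from the article, every other value is an assumption and should be merged into the existing file rather than copied verbatim:

# /etc/sysconfig/network-scripts/ifcfg-team0 (excerpt)
DEVICE=team0
BOOTPROTO=none
ONBOOT=yes
PEERDNS=no          # keep NetworkManager from rewriting /etc/resolv.conf

# apply the change
nmcli connection reload
systemctl restart NetworkManager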
