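A customer's database environment is a three-node RAC. Every week the three nodes took turns rebooting at irregular intervals, and each time the alert log showed that the node had been evicted from the cluster because of a heartbeat failure. After the host rebooted it could rejoin the cluster. I searched Baidu at length and finally followed one reference; more than a month has passed since the change and the reboot problem has not recurred. Recording it here.
Checking netstat -s showed the "packet reassembles failed" counter increasing rapidly: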
netstat -s|grep "packet reassembles failed"
netstat -s | fgrep reassembles
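A single reading of the counter says little; what matters is how fast it grows. Below is a minimal sketch for measuring the increase over an interval. The function names `reasm_failed` and `reasm_failed_delta` are hypothetical helpers, not part of any tool, and the parsing assumes the net-tools `netstat -s` format in which the number comes first on the line.

```shell
#!/bin/sh
# Extract the "packet reassembles failed" counter from `netstat -s` output on stdin.
# Prints 0 when the line is absent (the line may not be shown while the counter is zero).
reasm_failed() {
    awk '/packet reassembles failed/ { n = $1 } END { print n + 0 }'
}

# Hypothetical helper: sample the counter twice, INTERVAL seconds apart
# (default 10), and print how much it increased in between.
reasm_failed_delta() {
    interval=${1:-10}
    before=$(netstat -s | reasm_failed)
    sleep "$interval"
    after=$(netstat -s | reasm_failed)
    echo $((after - before))
}
```

For example, `reasm_failed_delta 60` prints the number of reassembly failures seen during one minute; a value that stays well above zero under normal load matches the symptom described here.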
Add the following settings to /etc/sysctl.conf and apply them (run as root):
# echo 'net.ipv4.ipfrag_high_thresh = 16777216' >> /etc/sysctl.conf
# echo 'net.ipv4.ipfrag_low_thresh = 15728640' >> /etc/sysctl.conf
# echo 'net.ipv4.ipfrag_time = 120' >> /etc/sysctl.conf
# echo 'net.ipv4.ipfrag_secret_interval = 600' >> /etc/sysctl.conf
# echo 'net.ipv4.ipfrag_max_dist = 1024' >> /etc/sysctl.conf
# sysctl -p
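Because the settings are appended with `>>`, running the commands more than once leaves duplicate lines in the file, and the later line is the one that takes effect when `sysctl -p` processes the file in order. The sketch below compares what the file says with what the kernel is actually using. `conf_value` and `check_sysctl_key` are hypothetical helper names; only `sysctl -n` is a real command here.

```shell
#!/bin/sh
# Print the effective value of KEY ($1) in a sysctl.conf-style FILE ($2).
# Comment lines are skipped; if the key appears several times, the last one wins.
conf_value() {
    awk -F= -v k="$1" '
        /^[ \t]*[#;]/ { next }
        { name = $1; gsub(/[ \t]/, "", name) }
        name == k { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); found = 1 }
        END { if (found) print val }
    ' "$2"
}

# Hypothetical helper: compare the file value with the running kernel value.
# The file defaults to /etc/sysctl.conf when no second argument is given.
check_sysctl_key() {
    want=$(conf_value "$1" "${2:-/etc/sysctl.conf}")
    have=$(sysctl -n "$1" 2>/dev/null)
    if [ "$want" = "$have" ]; then
        echo "OK $1 = $have"
    else
        echo "MISMATCH $1: file='$want' running='$have'"
    fi
}
```

For example, `check_sysctl_key net.ipv4.ipfrag_high_thresh` run on each node after `sysctl -p` shows whether the new threshold is really in effect.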
RHEL 6.6: IPC Send timeout/node eviction etc with high packet reassembles failure (Doc ID 2008933.1)
Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
Recommendation for the Real Application Cluster Interconnect and Jumbo Frames (Doc ID 341788.1)
Unable To Start ASM RAC Instances Due To ORA-27303: Remote Port MTU Does Not Match Local MTU. (Doc ID 947223.1)
Tuning Inter-Instance Performance in RAC and OPS (Doc ID 181489.1)
The private-interconnect NICs on all nodes in the cluster must be configured with the same MTU size.
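A quick way to check the MTU requirement is to list interface names with their MTU on every node and compare the private NIC lines. This is a minimal sketch that assumes the iproute2 `ip -o link show` one-line format; `link_mtu` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `ip -o link show` output on stdin and print "<interface> <mtu>" per line.
link_mtu() {
    awk '{
        name = $2
        sub(/:$/, "", name)              # "eth1:" -> "eth1"
        for (i = 3; i < NF; i++)
            if ($i == "mtu") { print name, $(i + 1); break }
    }'
}
```

Run `ip -o link show | link_mtu` on each node; the value printed for the private-interconnect interface should be identical everywhere.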
# High threshold for memory used by IP fragment reassembly
net.ipv4.ipfrag_high_thresh = 41943040
# Low threshold for memory used by IP fragment reassembly
net.ipv4.ipfrag_low_thresh = 40894464
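The thresholds are given in bytes, which makes them hard to read at a glance. The sketch below converts a value to MiB and does a simple sanity check that the low threshold sits below the high one. Both `to_mib` and `thresh_ok` are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Convert a byte count to whole MiB (1 MiB = 1048576 bytes).
to_mib() {
    echo $(( $1 / 1048576 ))
}

# Succeed when the low threshold ($2) is below the high threshold ($1).
thresh_ok() {
    [ "$2" -lt "$1" ]
}
```

With the values above, `to_mib 41943040` gives 40 and `to_mib 40894464` gives 39, so the pair amounts to 40 MiB high and 39 MiB low; the earlier pair, 16777216 and 15728640, is 16 MiB and 15 MiB.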