file-type

Python数据结构:元组、字典与集合详解

PDF文件

下载需积分: 4 | 370KB | 更新于2024-08-31 | 182 浏览量 | 0 下载量 举报 收藏
download 立即下载
本资源主要介绍了Python编程语言中的三种数据结构:元组、字典和集合,特别是它们的定义、特点以及使用方法。 【元组】 元组是Python中的一个不可变序列,通常用于存储不可修改的数据。元组用括号`()`表示,即使元素只有一个,也需要在末尾加上逗号。元组的特点决定了它在需要保持数据不变性的情景下非常适用,而列表则更适合于需要频繁修改元素的情况。元组解包是Python中的一种特性,允许将元组的元素分别赋值给多个变量,使得代码更简洁易读。 【不可变对象与可变对象】 在Python中,对象分为可变和不可变。不可变对象包括整数、浮点数、字符串和元组,它们的标识(id)、类型(type)和值(value)在创建后不会改变。而可变对象如列表、字典和集合,它们的值可以改变。可变对象在内存中保存的标识、类型和值可以随着对象状态的改变而更新。 【字典】 字典是一种映射数据结构,它提供了通过键来快速查找值的功能。与列表相比,字典在查询性能上更优,但插入和删除操作可能相对较慢。每个键值对由键(key)和对应的值(value)组成,键必须是不可变对象,而值可以是任意对象。字典的键是唯一的,如果有重复的键,后面的值会覆盖前面的。创建字典可以通过大括号`{}`,并以键值对的形式定义,如`{'key': 'value'}`。字典的常用方法包括: - `dict()`:创建一个空字典。 - `get(key[, default])`:根据键获取值,如果没有该键,则返回默认值。 - `update()`:将另一个字典的键值对添加到当前字典中。 - `del`:删除字典中的键值对。 - `popitem()`:删除并返回字典的最后一个键值对。 【集合】 虽然在提供的内容中没有直接提及集合,但集合(set)也是Python中的一种数据结构,它是一组无序且不重复的元素。集合可用于去重、成员关系测试和数学运算,如交集、并集和差集。创建集合可以使用`set()`函数,或者通过大括号`{}`,但注意与字典的区别,集合中只包含键,不包含键值对。 理解和掌握元组、字典和集合这些数据结构是使用Python进行数据处理和编程的基础,它们各自的特点和用途使它们在不同场景下发挥着重要作用。了解它们的特性和操作方法能够帮助开发者编写更加高效、灵活的代码。

相关推荐

filetype

import numpy as np import matplotlib.pyplot as plt from pymatgen.io.vasp import Vasprun from pymatgen.core.structure import Structure from scipy.spatial import cKDTree from tqdm import tqdm import matplotlib as mpl import warnings import os import csv import argparse import multiprocessing import time import sys 忽略可能的警告 warnings.filterwarnings(“ignore”, category=UserWarning) 专业绘图设置 plt.style.use(‘seaborn-v0_8-whitegrid’) mpl.rcParams.update({ ‘font.family’: ‘serif’, ‘font.serif’: [‘Times New Roman’, ‘DejaVu Serif’], ‘font.size’: 12, ‘axes.labelsize’: 14, ‘axes.titlesize’: 16, ‘xtick.labelsize’: 12, ‘ytick.labelsize’: 12, ‘figure.dpi’: 600, ‘savefig.dpi’: 600, ‘figure.figsize’: (8, 6), ‘lines.linewidth’: 2.0, ‘legend.fontsize’: 10, ‘legend.framealpha’: 0.8, ‘mathtext.default’: ‘regular’, ‘axes.linewidth’: 1.5, ‘xtick.major.width’: 1.5, ‘ytick.major.width’: 1.5, ‘xtick.major.size’: 5, ‘ytick.major.size’: 5, }) def identify_atom_types(struct): “”“原子类型识别函数”“” # 初始化数据结构 atom_types = { “phosphate_oxygens”: {“P-O/P=O”: [], “P-OH”: []}, “phosphate_hydrogens”: [], “water_oxygens”: [], “water_hydrogens”: [], “hydronium_oxygens”: [], “hydronium_hydrogens”: [], “fluoride_atoms”: [i for i, site in enumerate(struct) if site.species_string == “F”], “aluminum_atoms”: [i for i, site in enumerate(struct) if site.species_string == “Al”] } # 构建全局KDTree all_coords = np.array([site.coords for site in struct]) kdtree = cKDTree(all_coords, boxsize=struct.lattice.abc) # 识别磷酸基团 p_atoms = [i for i, site in enumerate(struct) if site.species_string == "P"] phosphate_oxygens = [] for p_idx in p_atoms: # 查找P周围的O原子 (距离 < 1.6Å) neighbors = kdtree.query_ball_point(all_coords[p_idx], r=1.6) p_o_indices = [idx for idx in neighbors if idx != p_idx and struct[idx].species_string == "O"] phosphate_oxygens.extend(p_o_indices) # 识别所有H原子并确定归属 hydrogen_owners = {} h_atoms = [i for i, site in enumerate(struct) if site.species_string == "H"] for h_idx in h_atoms: neighbors = kdtree.query_ball_point(all_coords[h_idx], r=1.2) candidate_os = [idx for idx in neighbors if idx != h_idx and struct[idx].species_string == "O"] if not candidate_os: continue min_dist = float('inf') owner_o = None for o_idx in candidate_os: dist = struct.get_distance(h_idx, o_idx) if dist < min_dist: min_dist = dist owner_o = o_idx hydrogen_owners[h_idx] = owner_o # 分类磷酸氧:带H的为P-OH,不带H的为P-O/P=O for o_idx in phosphate_oxygens: has_hydrogen = any(owner_o == o_idx for h_idx, owner_o in hydrogen_owners.items()) if has_hydrogen: atom_types["phosphate_oxygens"]["P-OH"].append(o_idx) else: atom_types["phosphate_oxygens"]["P-O/P=O"].append(o_idx) # 识别水和水合氢离子 all_o_indices = [i for i, site in enumerate(struct) if site.species_string == "O"] non_phosphate_os = [o_idx for o_idx in all_o_indices if o_idx not in phosphate_oxygens] o_h_count = {} for h_idx, owner_o in hydrogen_owners.items(): o_h_count[owner_o] = o_h_count.get(owner_o, 0) + 1 for o_idx in non_phosphate_os: h_count = o_h_count.get(o_idx, 0) attached_hs = [h_idx for h_idx, owner_o in hydrogen_owners.items() if owner_o == o_idx] if h_count == 2: atom_types["water_oxygens"].append(o_idx) atom_types["water_hydrogens"].extend(attached_hs) elif h_count == 3: atom_types["hydronium_oxygens"].append(o_idx) atom_types["hydronium_hydrogens"].extend(attached_hs) # 识别磷酸基团的H原子 for o_idx in atom_types["phosphate_oxygens"]["P-OH"]: attached_hs = [h_idx for h_idx, owner_o in hydrogen_owners.items() if owner_o == o_idx] atom_types["phosphate_hydrogens"].extend(attached_hs) return atom_types def get_hbond_config(): “”“返回氢键配置,包含距离和角度阈值”“” return [ { “name”: “P-O/P=O···Hw”, “donor_type”: “water_oxygens”, “acceptor_type”: “P-O/P=O”, “h_type”: “water_hydrogens”, “distance_threshold”: 2.375, “angle_threshold”: 143.99, “color”: “#1f77b4” }, { “name”: “P-O/P=O···Hh”, “donor_type”: “hydronium_oxygens”, “acceptor_type”: “P-O/P=O”, “h_type”: “hydronium_hydrogens”, “distance_threshold”: 2.275, “angle_threshold”: 157.82, “color”: “#ff7f0e” }, { “name”: “P-O/P=O···Hp”, “donor_type”: “P-OH”, “acceptor_type”: “P-O/P=O”, “h_type”: “phosphate_hydrogens”, “distance_threshold”: 2.175, “angle_threshold”: 155.00, “color”: “#2ca02c” }, { “name”: “P-OH···Ow”, “donor_type”: “P-OH”, “acceptor_type”: “water_oxygens”, “h_type”: “phosphate_hydrogens”, “distance_threshold”: 2.275, “angle_threshold”: 155.13, “color”: “#d62728” }, { “name”: “Hw···Ow”, “donor_type”: “water_oxygens”, “acceptor_type”: “water_oxygens”, “h_type”: “water_hydrogens”, “distance_threshold”: 2.375, “angle_threshold”: 138.73, “color”: “#9467bd” }, { “name”: “Hh···Ow”, “donor_type”: “hydronium_oxygens”, “acceptor_type”: “water_oxygens”, “h_type”: “hydronium_hydrogens”, “distance_threshold”: 2.225, “angle_threshold”: 155.31, “color”: “#8c564b” }, { “name”: “Hw···F”, “donor_type”: “water_oxygens”, “acceptor_type”: “fluoride_atoms”, “h_type”: “water_hydrogens”, “distance_threshold”: 2.225, “angle_threshold”: 137.68, “color”: “#e377c2” }, { “name”: “Hh···F”, “donor_type”: “hydronium_oxygens”, “acceptor_type”: “fluoride_atoms”, “h_type”: “hydronium_hydrogens”, “distance_threshold”: 2.175, “angle_threshold”: 154.64, “color”: “#7f7f7f” }, { “name”: “Hp···F”, “donor_type”: “P-OH”, “acceptor_type”: “fluoride_atoms”, “h_type”: “phosphate_hydrogens”, “distance_threshold”: 2.275, “angle_threshold”: 153.71, “color”: “#bcbd22” } ] def calculate_angle(struct, donor_idx, h_idx, acceptor_idx): “”“计算D-H···A键角 (角度制),使用笛卡尔向量表示并处理周期性”“” # 获取分数坐标 frac_coords = struct.frac_coords lattice = struct.lattice # 获取氢原子H的分数坐标 h_frac = frac_coords[h_idx] # 计算供体D相对于H的分数坐标差 d_frac = frac_coords[donor_idx] dh_frac = d_frac - h_frac # 计算受体A相对于H的分数坐标差 a_frac = frac_coords[acceptor_idx] ah_frac = a_frac - h_frac # 应用周期性修正 (将分数坐标差限制在[-0.5, 0.5]范围内) dh_frac = np.where(dh_frac > 0.5, dh_frac - 1, dh_frac) dh_frac = np.where(dh_frac < -0.5, dh_frac + 1, dh_frac) ah_frac = np.where(ah_frac > 0.5, ah_frac - 1, ah_frac) ah_frac = np.where(ah_frac < -0.5, ah_frac + 1, ah_frac) # 转换为笛卡尔向量 (H->D 和 H->A) vec_hd = np.dot(dh_frac, lattice.matrix) # H->D向量 vec_ha = np.dot(ah_frac, lattice.matrix) # H->A向量 # 计算向量点积 dot_product = np.dot(vec_hd, vec_ha) # 计算向量模长 norm_hd = np.linalg.norm(vec_hd) norm_ha = np.linalg.norm(vec_ha) # 避免除以零 if norm_hd < 1e-6 or norm_ha < 1e-6: return None # 计算余弦值 cos_theta = dot_product / (norm_hd * norm_ha) # 处理数值误差 cos_theta = np.clip(cos_theta, -1.0, 1.0) # 计算角度 (弧度转角度) angle_rad = np.arccos(cos_theta) angle_deg = np.degrees(angle_rad) return angle_deg def calculate_hbond_lengths_frame(struct, atom_types, hbond_config, bond_threshold=1.3): “”“计算单帧结构中氢键键长(H···A距离)”“” # 构建全局KDTree用于快速搜索 all_coords = np.array([site.coords for site in struct]) lattice_abc = struct.lattice.abc kdtree = cKDTree(all_coords, boxsize=lattice_abc) # 结果字典: {氢键类型: [键长列表]} length_results = {hbond["name"]: [] for hbond in hbond_config} # 处理每一类氢键 for hbond in hbond_config: # 获取供体原子列表 if hbond["donor_type"] == "P-OH": donors = atom_types["phosphate_oxygens"]["P-OH"] else: donors = atom_types[hbond["donor_type"]] # 获取受体原子列表 if hbond["acceptor_type"] == "P-O/P=O": acceptors = atom_types["phosphate_oxygens"]["P-O/P=O"] elif hbond["acceptor_type"] == "P-OH": acceptors = atom_types["phosphate_oxygens"]["P-OH"] else: acceptors = atom_types[hbond["acceptor_type"]] # 获取氢原子列表 hydrogens = atom_types[hbond["h_type"]] # 如果没有氢原子或受体,跳过 if len(hydrogens) == 0 or len(acceptors) == 0: continue # 为受体构建KDTree(使用全局坐标) acceptor_coords = all_coords[acceptors] acceptor_kdtree = cKDTree(acceptor_coords, boxsize=lattice_abc) # 遍历所有氢原子 for h_idx in hydrogens: h_coords = all_coords[h_idx] # 查找与H成键的供体 (距离 < bond_threshold) donor_neighbors = kdtree.query_ball_point(h_coords, r=bond_threshold) donor_candidates = [idx for idx in donor_neighbors if idx in donors] # 如果没有找到供体,跳过 if not donor_candidates: continue # 选择最近的供体 min_dist = float('inf') donor_idx = None for d_idx in donor_candidates: dist = struct.get_distance(h_idx, d_idx) if dist < min_dist: min_dist = dist donor_idx = d_idx # 查找在距离阈值内的受体 acceptor_indices = acceptor_kdtree.query_ball_point(h_coords, r=hbond["distance_threshold"]) for a_idx_offset in acceptor_indices: a_idx = acceptors[a_idx_offset] # 排除供体自身 if a_idx == donor_idx: continue # 计算键长 (H···A距离) ha_distance = struct.get_distance(h_idx, a_idx) # 计算键角 angle = calculate_angle(struct, donor_idx, h_idx, a_idx) # 检查角度阈值 if angle is not None and angle >= hbond["angle_threshold"]: length_results[hbond["name"]].append(ha_distance) return length_results def calculate_hbond_lengths_frame_wrapper(args): “”“包装函数用于多进程处理”“” struct, hbond_config = args atom_types = identify_atom_types(struct) return calculate_hbond_lengths_frame(struct, atom_types, hbond_config) def calculate_hbond_lengths_parallel(structures, workers=1, step_interval=10): “”“并行计算氢键键长,每step_interval帧计算一次”“” hbond_config = get_hbond_config() all_results = [] # 只选择每step_interval帧的结构 selected_structures = structures[::step_interval] frame_indices = list(range(0, len(structures), step_interval)) # 准备参数列表 args_list = [(struct, hbond_config) for struct in selected_structures] # 如果没有可用的worker,则顺序执行 if workers == 1: results = [] for args in tqdm(args_list, desc="Calculating HBond Lengths"): results.append(calculate_hbond_lengths_frame_wrapper(args)) else: with multiprocessing.Pool(processes=workers) as pool: results = list(tqdm( pool.imap(calculate_hbond_lengths_frame_wrapper, args_list), total=len(selected_structures), desc="Calculating HBond Lengths" )) # 将结果与帧索引组合 for frame_idx, frame_result in zip(frame_indices, results): all_results.append({ "frame_idx": frame_idx, "results": frame_result }) return all_results def plot_hbond_length_time_series(all_results, system_name): “”“绘制氢键键长随时间变化的曲线并保存原始数据”“” # 创建输出目录 os.makedirs(“HBond_Length_Time_Series”, exist_ok=True) os.makedirs(“HBond_Length_Data”, exist_ok=True) hbond_config = get_hbond_config() # 创建统计信息汇总文件 - 使用'A'代替Å避免编码问题 summary_path = os.path.join("HBond_Length_Data", f"{system_name}_summary.csv") with open(summary_path, 'w', newline='', encoding='utf-8') as summary_file: summary_writer = csv.writer(summary_file) summary_writer.writerow(["HBond Type", "Mean Length (A)", "Std Dev (A)", "Median (A)", "Min (A)", "Max (A)"]) # 处理每种氢键类型 for hbond in hbond_config: hbond_name = hbond["name"] safe_name = hbond_name.replace("/", "").replace("=", "").replace("···", "_").replace(" ", "_") # 提取该氢键类型的所有帧的键长 frame_indices = [] all_lengths = [] for frame_data in all_results: frame_idx = frame_data["frame_idx"] lengths = frame_data["results"].get(hbond_name, []) if lengths: # 只记录有数据的帧 frame_indices.append(frame_idx) all_lengths.append(lengths) if not frame_indices: print(f"No hydrogen bonds found for {hbond_name} in {system_name}") continue # 计算每帧的统计量 mean_lengths = [np.mean(lengths) for lengths in all_lengths] median_lengths = [np.median(lengths) for lengths in all_lengths] std_lengths = [np.std(lengths) for lengths in all_lengths] # 计算全局统计量 all_flat_lengths = [length for sublist in all_lengths for length in sublist] global_mean = np.mean(all_flat_lengths) global_std = np.std(all_flat_lengths) global_median = np.median(all_flat_lengths) global_min = np.min(all_flat_lengths) global_max = np.max(all_flat_lengths) # 保存全局统计信息到汇总文件 with open(summary_path, 'a', newline='', encoding='utf-8') as summary_file: summary_writer = csv.writer(summary_file) summary_writer.writerow([ hbond_name, f"{global_mean:.4f}", f"{global_std:.4f}", f"{global_median:.4f}", f"{global_min:.4f}", f"{global_max:.4f}" ]) # ========== 保存原始时间序列数据 ========== data_path = os.path.join("HBond_Length_Data", f"{system_name}_{safe_name}_time_series.csv") with open(data_path, 'w', newline='', encoding='utf-8') as data_file: data_writer = csv.writer(data_file) data_writer.writerow(["Frame Index", "Mean Length (A)", "Median Length (A)", "Std Dev (A)"]) for i, frame_idx in enumerate(frame_indices): data_writer.writerow([ frame_idx, f"{mean_lengths[i]:.4f}", f"{median_lengths[i]:.4f}", f"{std_lengths[i]:.4f}" ]) # ========== 绘制时间序列图 ========== plt.figure(figsize=(8, 6)) # 符合期刊要求的尺寸 # 绘制平均键长曲线 plt.plot(frame_indices, mean_lengths, color=hbond["color"], label="Mean Length", linewidth=1.5) # 绘制中位数键长曲线 plt.plot(frame_indices, median_lengths, color=hbond["color"], linestyle="--", label="Median Length", linewidth=1.0) # 添加标准差误差带 plt.fill_between( frame_indices, np.array(mean_lengths) - np.array(std_lengths), np.array(mean_lengths) + np.array(std_lengths), color=hbond["color"], alpha=0.2, label="±1 Std Dev" ) # 添加全局统计信息(放在右上角) stats_text = (f"Global Mean: {global_mean:.3f} $\mathrm{{\\AA}}$\n" f"Global Std: {global_std:.3f} $\mathrm{{\\AA}}$\n" f"Global Median: {global_median:.3f} $\mathrm{{\\AA}}$\n" f"Count: {len(all_flat_lengths)}") plt.text(0.95, 0.95, stats_text, transform=plt.gca().transAxes, verticalalignment='top', horizontalalignment='right', fontsize=10, bbox=dict(boxstyle='round', facecolor='white', alpha=0.8)) # 设置标题和标签 plt.title(f"{system_name}: {hbond_name} Bond Length", fontsize=14) # 精简标题 plt.xlabel("Time(fs)", fontsize=12) plt.ylabel(r"Bond Length ($\mathrm{\AA}$)", fontsize=12) # 使用LaTeX格式的Å符号 # 统一设置Y轴范围(1.0-2.4 Å) plt.ylim(1.0, 2.4) plt.xlim(0,4000) plt.grid(True, linestyle='--', alpha=0.5) # 将图例放在左上角 plt.legend(loc='upper left', fontsize=10, frameon=True, framealpha=0.8) # 调整布局 plt.tight_layout() # 保存图像(使用TIFF格式,期刊推荐) image_path = os.path.join("HBond_Length_Time_Series", f"{system_name}_{safe_name}_time_series.tiff") plt.savefig(image_path, dpi=600, format='tiff', bbox_inches='tight') plt.close() print(f"Saved HBond length time series data: {data_path}") print(f"Saved HBond length time series plot: {image_path}") print(f"Saved summary statistics: {summary_path}") def main(vasprun_files, workers=1, step_interval=10): “”“主处理函数”“” for system_name, vasprun_file in vasprun_files.items(): print(f"\n{‘=’*50}“) print(f"Processing {system_name}: {vasprun_file} with {workers} workers”) print(f"Step interval: {step_interval} frames") print(f"{‘=’*50}") start_time = time.time() try: # 加载VASP结果 vr = Vasprun(vasprun_file, ionic_step_skip=5) structures = vr.structures print(f"Loaded {len(structures)} frames") print(f"Lattice parameters: {structures[0].lattice.abc}") print(f"Periodic boundary handling: Fractional coordinates + PBC correction") # 计算氢键键长时间序列 print("Calculating hydrogen bond lengths over time...") hbond_lengths = calculate_hbond_lengths_parallel(structures, workers=workers, step_interval=step_interval) # 绘制并保存结果 plot_hbond_length_time_series(hbond_lengths, system_name) elapsed = time.time() - start_time print(f"\nCompleted processing for {system_name} in {elapsed:.2f} seconds") except Exception as e: print(f"Error processing {system_name}: {str(e)}", file=sys.stderr) import traceback traceback.print_exc() print("\nAll HBond length time series analysis completed successfully!") if name == “main”: # 设置命令行参数 parser = argparse.ArgumentParser(description=‘Calculate hydrogen bond lengths from VASP simulations’) parser.add_argument(‘–workers’, type=int, default=multiprocessing.cpu_count(), help=f’Number of parallel workers (default: {multiprocessing.cpu_count()})‘) parser.add_argument(’–step_interval’, type=int, default=10, help=‘Frame interval for analysis (default: 10)’) args = parser.parse_args() # 自动设置vasprun文件和系统名称 vasprun_files = { "System1": "vasprun1.xml", "System2": "vasprun2.xml", "System3": "vasprun3.xml", "System4": "vasprun4.xml" } # 检查文件是否存在 missing_files = [name for name, path in vasprun_files.items() if not os.path.exists(path)] if missing_files: raise FileNotFoundError(f"Missing vasprun files: {', '.join(missing_files)}") print(f"Starting HBond length analysis with {args.workers} workers...") main(vasprun_files, workers=args.workers, step_interval=args.step_interval)以上代码是从RDF到氢键角度的计算完整过程,其中的逻辑判断,首先是供体-H之间的化学键判定,然后再是H受体之间的氢键距离判定,最后是氢键角度的判定,由于目前氢键角度的判定暂定为120°,在这里我们需要延用一样的逻辑计算不同的内容,即计算氢键寿命,采用存活相关函数SCF计算,弛豫时间采用积分法,计算对象为LAMMPS数据,并只计算一个体系,导入的文件为lammpstrij和data文件,计算的步长为0.1fs,按照合适的要求,最好符合The Journal of Chemical Physics期刊要求,轨迹输出频率,相关函数计算窗口,以及时间原点间隔,由于LAMMPS体系较大,尽可能的提高CPU和内存利用,以计算速率为主,可以增大CPU和内存的占用率。其中对原子进行映射,具体主要为1: "H", 2: "O", 3: "F", 7: "P",H3PO4-23pure.data为data文件名,nvt-P2O5-353K-23.lammpstrj为lammpstrij的文件名,然后请输出完整代码,然后提供一个测试代码,对第一帧进行氢键的识别,确保无误之后再进行氢键寿命计算的调试,在计算过程中只需要计算氢键寿命,保留键长随时间变化的识别逻辑,不需要再算键长随时间的变化。data文件为 序号 原子类型 x y z类型,例如 Atoms # charge 1 7 17.470000000000 55.085000000000 14.227000000000 2 2 16.654000000000 54.249000000000 15.498000000000 3 2 16.582000000000 55.222000000000 12.750000000000 4 2 17.569000000000 56.681000000000 14.791000000000 5 1 16.331000000000 53.417000000000 15.086000000000 优化原子ID识别问题,最好能分步进行,命令中能够输入只算第一帧氢键识别,如果无误再接着后续默认参数计算氢键寿命,然后文件的地址都设为默认参数而不用在命令栏中输入文件名

filetype

import numpy as np from pymatgen.io.vasp import Vasprun from tqdm import tqdm import matplotlib.pyplot as plt import seaborn as sns import pandas as pd import os import matplotlib as mpl 设置全局绘图参数 mpl.rcParams[‘font.family’] = ‘serif’ # 期刊常用字体 mpl.rcParams[‘mathtext.fontset’] = ‘dejavuserif’ # 数学字体 mpl.rcParams[‘axes.linewidth’] = 1.5 # 增加轴线宽度 mpl.rcParams[‘lines.linewidth’] = 2.5 # 线宽 mpl.rcParams[‘figure.dpi’] = 300 # 高分辨率 def get_angle(A, B, C): “”“计算三个点之间的角度(B为顶点)”“” BA = A - B BC = C - B cos_theta = np.dot(BA, BC) / (np.linalg.norm(BA) * np.linalg.norm(BC)) cos_theta = np.clip(cos_theta, -1.0, 1.0) # 避免数值误差 return np.degrees(np.arccos(cos_theta)) def analyze_system(system_name, vasprun_file, output_dir): “”“分析单个体系的氢键分布”“” 创建体系特定输出目录 system_dir = os.path.join(output_dir, system_name) os.makedirs(system_dir, exist_ok=True) 初始化分类存储 bond_types = { ‘O-O’: {‘angles’: [], ‘color’: ‘#1f77b4’}, ‘O-F’: {‘angles’: [], ‘color’: ‘#ff7f0e’}, ‘F-O’: {‘angles’: [], ‘color’: ‘#2ca02c’}, ‘F-F’: {‘angles’: [], ‘color’: ‘#d62728’} } # 参数设置 (可根据不同体系调整) step = 100 # 采样步长 donor_cutoff = 1.2 # 供体原子最大距离 (Å) receptor_cutoff = 2.5 # 受体原子最大距离 (Å) angle_threshold = 120 # 最小氢键角度 (度) print(f"[{system_name}] 正在读取 {vasprun_file} 文件…“) try: vr = Vasprun(vasprun_file) except Exception as e: print(f”[{system_name}] 读取vasprun文件时出错: {e}“) return system_name, bond_types # 返回空结果 structures = vr.structures print(f”[{system_name}] 总帧数: {len(structures)}, 采样步长: {step}“) # 主分析循环 for frame_idx, structure in enumerate(tqdm(structures[::step], desc=f”{system_name}氢键分析", unit=“帧”)): # 预处理原子索引(优化性能) o_f_sites = [site for site in structure if site.species_string in (“O”, “F”)] for H_site in structure: if H_site.species_string != “H”: continue # 寻找供体原子 donor = min( (site for site in o_f_sites), key=lambda x: H_site.distance(x), default=None ) if donor is None or H_site.distance(donor) > donor_cutoff: continue # 获取受体原子 receptors = [] for neighbor in structure.get_neighbors(H_site, r=receptor_cutoff): try: site = neighbor.site dist = neighbor.distance except AttributeError: site, dist, _, _ = neighbor if (site.species_string in (“O”, “F”) and site != donor and dist <= receptor_cutoff): receptors.append(site) # 分类计算角度 for receptor in receptors: angle = get_angle(donor.coords, H_site.coords, receptor.coords) if angle < angle_threshold: continue # 确定键类型 donor_type = donor.species_string receptor_type = receptor.species_string bond_key = f"{donor_type}-{receptor_type}" if bond_key in bond_types: bond_types[bond_key][‘angles’].append(angle) # 保存体系统计数据 stats_data = [] for bond_type, data in bond_types.items(): angles = data[‘angles’] if len(angles) == 0: continue stats_data.append({ ‘Type’: bond_type, ‘Count’: len(angles), ‘Mean’: np.mean(angles), ‘Std’: np.std(angles), ‘Median’: np.median(angles), ‘Min’: np.min(angles), ‘Max’: np.max(angles) }) stats_df = pd.DataFrame(stats_data) stats_file = os.path.join(system_dir, “bond_type_stats.csv”) stats_df.to_csv(stats_file, index=False) print(f"[{system_name}] 键型统计已保存到 {stats_file}") # 保存当前体系的分布图 plot_single_system(system_name, bond_types, system_dir) return system_name, bond_types def plot_single_system(system_name, bond_types, output_dir): “”“为单个体系绘制分布图并保存”“” plt.figure(figsize=(10, 6)) sns.set_style(“whitegrid”) 统一设置颜色映射 bond_colors = { ‘O-O’: ‘#1f77b4’, ‘O-F’: ‘#ff7f0e’, ‘F-O’: ‘#2ca02c’, ‘F-F’: ‘#d62728’ } bins = np.linspace(90, 180, 30) # 绘制分布 for bond_type, data in bond_types.items(): angles = data[‘angles’] if len(angles) == 0: continue # KDE曲线 sns.kdeplot(angles, label=f"{bond_type} (n={len(angles)})“, color=bond_colors[bond_type], linewidth=2.5, alpha=0.8) # 直方图叠加 plt.hist(angles, bins=bins, color=bond_colors[bond_type], alpha=0.2, density=True) # 设置图表属性 plt.xlabel(“Hydrogen Bond Angle (degrees)”, fontsize=14) plt.ylabel(“Probability Density”, fontsize=14) plt.title(f"Hydrogen Bond Angle Distribution: {system_name}”, fontsize=16) plt.xlim(90, 180) plt.ylim(0, 0.035) plt.legend(title=“Bond Type”, fontsize=12) # 保存图像 plot_file = os.path.join(output_dir, f"{system_name}_bond_distribution.png") plt.tight_layout() plt.savefig(plot_file, dpi=300, bbox_inches=‘tight’) plt.close() # 关闭图形释放内存 print(f"[{system_name}] 已保存分布图到 {plot_file}") def plot_combined_results(results, output_dir): “”“将四个体系的结果绘制在统一坐标的子图中并保存”“” plt.figure(figsize=(14, 12)) sns.set(style=“whitegrid”, font_scale=1.2, palette=“muted”) bins = np.linspace(90, 180, 30) 创建2x2子图 fig, axes = plt.subplots(2, 2, figsize=(14, 12), sharex=True, sharey=True) axes = axes.flatten() # 统一设置颜色映射 bond_colors = { ‘O-O’: ‘#1f77b4’, ‘O-F’: ‘#ff7f0e’, ‘F-O’: ‘#2ca02c’, ‘F-F’: ‘#d62728’ } # 为每个体系绘制子图 for idx, (system_name, bond_types) in enumerate(results.items()): ax = axes[idx] # 绘制每种键型的分布 for bond_type, data in bond_types.items(): angles = data[‘angles’] if len(angles) == 0: continue # KDE曲线 sns.kdeplot(angles, label=f"{bond_type}“, color=bond_colors[bond_type], linewidth=2.5, ax=ax) # 直方图叠加 ax.hist(angles, bins=bins, color=bond_colors[bond_type], alpha=0.2, density=True) # 设置子图标题和标签 ax.set_title(f”{system_name}“, fontsize=14, fontweight=‘bold’) ax.set_xlabel(“Hydrogen Bond Angle (degrees)”, fontsize=12) ax.set_ylabel(“Probability Density”, fontsize=12) ax.set_xlim(90, 180) ax.set_ylim(0, 0.035) # 统一Y轴范围 ax.legend(title=“Bond Type”, fontsize=10, loc=‘upper left’) ax.grid(True, linestyle=‘–’, alpha=0.7) # 添加统计信息文本框 stats_text = “\n”.join([f”{bt}: n={len(data[‘angles’])}" for bt, data in bond_types.items() if len(data[‘angles’]) > 0]) ax.text(0.95, 0.95, stats_text, transform=ax.transAxes, verticalalignment=‘top’, horizontalalignment=‘right’, bbox=dict(boxstyle=‘round’, facecolor=‘white’, alpha=0.8), fontsize=9) # 添加整体标题 plt.suptitle(“Hydrogen Bond Angle Distribution in Different Systems”, fontsize=16, fontweight=‘bold’, y=0.98) # 调整布局 plt.tight_layout(rect=[0, 0, 1, 0.96]) # 保存组合图 combined_file = os.path.join(output_dir, “combined_bond_distribution.png”) plt.savefig(combined_file, dpi=300, bbox_inches=‘tight’) plt.close() # 关闭图形释放内存 print(f"已保存组合分布图到 {combined_file}“) # 保存组合图PDF版本(期刊投稿要求) combined_pdf = os.path.join(output_dir, “combined_bond_distribution.pdf”) plt.savefig(combined_pdf, bbox_inches=‘tight’) print(f"已保存组合分布图PDF版本到 {combined_pdf}”) if name == “main”: ===== 配置四个体系 ===== 假设所有vasprun文件都在当前目录下 systems = { “System_1”: “vasprun1.xml”, “System_2”: “vasprun2.xml”, “System_3”: “vasprun3.xml”, “System_4”: “vasprun4.xml” } output_dir = “hydrogen_bond_analysis” os.makedirs(output_dir, exist_ok=True) # ===== 顺序处理四个体系 ===== results = {} for system_name, file in tqdm(systems.items(), desc=“处理体系”): try: system_name, bond_types = analyze_system(system_name, file, output_dir) results[system_name] = bond_types except Exception as e: print(f"处理体系 {system_name} 时出错: {e}“) # ===== 绘制组合图 ===== plot_combined_results(results, output_dir) print(”\n所有体系分析完成!结果保存在", output_dir) print(“包含:”) print(" - 每个体系的单独目录(包含统计数据和分布图)“) print(” - 组合分布图 (combined_bond_distribution.png 和 .pdf)")以上代码实现了对四个vasprun体系中计算四类氢键键角的角度分布情况,在这里我们对其中的原子分类逻辑做进一步优化,对计算的类别进一步优化,然后将该代码转为在58核100g内存的台式ananconda promt中执行,先识别P,在P周围搜寻O原子,如果该O原子距离在1.6埃以内则视为Op,而对于Op在其周围搜寻H原子,如果该H在距离1.3埃以内则视为成键即该H为Hp,通过是否有Hp则可以识别P-OH与P=O/P-O,在这里P=O和P-O还不能区分,在这里我们将不对P=O进行区分识别,为方便清晰,将两者统一记为P-O/P=O。接着体系中全部的O原子在去除Op之后剩下的O,在这些剩余的O周围搜寻整体的H,如果H的距离在1.2埃以内则视为成键,然后依照成键的H数量判定:如果H的数量为1,则记为-OH羟基(在这里不需要计算羟基部分,只是识别出来有利于逻辑完整性,并不参与RDF计算,也不需要特别标注表明),H的数量为2,则记为H2O水(该O也随之记为Ow,对应的两个H也记为Hw),如果H的数量为3,则记为水合氢离子(该O随之记为Oh,对应的三个H也记为Hh)。体系中存在质子转移的情况,所以需要每一帧重新识别原子的归属问题,如果H同时处于两个成键识别范围则按照就近原则,离哪个近则归属到哪一个(这里包括磷酸-磷酸,磷酸-水,磷酸-水合氢离子,水-水,水-水合氢离子,水合氢离子-水合氢离子,如果H同时处于某种情况下两个化学成键范围则采用就近原则),在实时重新归属质子的情况下,计算出包含质子转移部分的氢键角度变化统计,在这里,我们将排除自身化学键的阈值先设置为0,不需要只看氢键部分了。直接将-OH视为质子转移或者不完整而直接忽略即可,磷酸上的O需要通过H来进一步识别,所以符合距离的氧可暂时作为Op的候选,等H的识别完成再进行细分P=O/P-O和P-OH。同时由于后续RDF的计算过程中我们发现,在这里我们只需要将原先对四种类别的计算改为对九类的计算,分别为P-O/P=O···Hw P-O/P=O···Hh P-O/P=O···Hp P-OH···Ow Hw···Ow Hh···Ow Hw···F Hh···F Hp···F,以上补充的逻辑中提供了化学键的判别(供体和H之间的距离),在这里我们对H和受体之间的距离再进一步补充,这九类分别的距离分别为2.375 2.275 2.175 2.275 2.375 2.225 2.225 2.175 2.275.这数据的导出需要能导入origin中方便我后续作图,在原代码中文件名后加上2,作为第二版的意思。

filetype