15. Computer Rank Examination Level 2: Fuzzy Matching with IF, INDEX, and IF

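# 1. Introducing the IF Function

## 1.1 Basic Syntax and Usage of IF

The IF function is widely used in common spreadsheet software such as Microsoft Excel and Google Sheets. It evaluates a condition and returns one of two results depending on whether the condition is true or false. Its basic syntax is:

```excel
=IF(logical_test, value_if_true, value_if_false)
```

where:

- `logical_test` is the condition to evaluate; it can be an expression or a cell value;
- `value_if_true` is the value returned when the condition is true;
- `value_if_false` is the value returned when the condition is false.

For example, to check whether a score is a passing grade and return "Pass" or "Fail" accordingly, IF can be used like this (here `score` stands for the cell or named range that holds the score):

```excel
=IF(score>=60, "Pass", "Fail")
```

## 1.2 Using IF in Excel

IF is one of the most frequently used functions in Excel, and it makes conditional results easy to express. In a grade sheet, for instance, IF can check whether each student has passed and return the matching label:

```excel
=IF(B2>=60, "Pass", "Fail")
```

where `B2` is the cell that stores the score.

## 1.3 Conditional Tests and Logical Operations with IF

Beyond a single condition, IF can be nested and combined to express more complex logic. Several IF calls can be chained to test multiple conditions, and IF can also be nested inside other functions. In practice, being comfortable with nested and combined IF formulas makes data processing considerably more flexible.

# 2. A Closer Look at the INDEX Function

INDEX is widely used in Excel for extracting and looking up data. It returns the value or reference found at a given position in an array or range. The sections below cover what it does and how to use it.

## 2.1 Purpose and Usage of INDEX

**Purpose**: INDEX returns the value of, or a reference to, one cell within a specified array or range.

**Syntax**: INDEX(array, row_num, [column_num])

- `array`: the array or range to return a value from.
- `row_num`: the row within `array` to return a value from; required unless `column_num` is supplied.
- `column_num`: the column within `array` to return a value from; optional.

## 2.2 INDEX with Arrays and Tables

Suppose a sheet holds student scores as follows, with the header in row 1 and the names in column A:

| Student | Chinese | Math | English |
| --- | --- | --- | --- |
| Xiao Ming | 90 | 88 | 92 |
| Xiao Hong | 85 | 95 | 90 |
| Xiao Liang | 78 | 82 | 79 |

To extract Xiao Hong's math score, use INDEX:

```excel
=INDEX(B2:D4, 2, 2)
```

This formula returns the math score, 95. `B2:D4` is the range holding all the scores, the first `2` selects the 2nd row of that range (Xiao Hong), and the second `2` selects the 2nd column of that range (column C, Math). Row and column numbers are counted relative to the range, not the worksheet, so a column number of 3 would return her English score, 90, instead.

## 2.3 Combining INDEX with MATCH

INDEX is usually paired with MATCH: MATCH finds the position of a value in an array or range, and INDEX then returns the value or reference at that position. For example, to look up Xiao Liang's English score:

```excel
=INDEX(B2:D4, MATCH("Xiao Liang", A2:A4, 0), MATCH("English", B1:D1, 0))
```

This formula returns Xiao Liang's English score, 79. The first MATCH finds the position of "Xiao Liang" in the student column, the second MATCH finds the position of "English" in the header row, and INDEX returns the value where that row and column intersect. The third argument `0` in each MATCH requests an exact match.

# 3. Fuzzy Matching with IF

Data processing sometimes calls for fuzzy matching, that is, finding values in a data set that resemble a given condition rather than equal it. In Excel, IF supplies the branching step: it decides what to return once a pattern test has been made. IF does not interpret wildcards in its own `logical_test`, so the pattern test itself is normally delegated to another function (for example SEARCH or COUNTIF) whose result IF then acts on. The sections below describe how IF is applied to fuzzy matching.

## 3.1 The Concept of Fuzzy Matching and Where It Applies

Fuzzy matching means finding data that fits a pattern or rule instead of matching exactly. It is useful, for example, when looking for text that contains a particular character or when filtering data loosely.

## 3.2 The Role of IF in Fuzzy Matching and a Practical Example

```python
# Example: implementing IF-style fuzzy matching in Python
data = ['apple', 'banana', 'gra
```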
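As a rough illustration of the idea in section 3.2, the sketch below mirrors in Python what an Excel formula such as `=IF(ISNUMBER(SEARCH("apple", A2)), "Match", "No match")` does: test whether a cell contains a keyword, then let the IF step choose the result. The helper name `fuzzy_if`, the sample list, and the labels are hypothetical and are not taken from the original example. The sketch covers only plain, case-insensitive substring checks and does not reproduce the `?` and `*` wildcards that SEARCH accepts.

```python
def fuzzy_if(text, keyword, value_if_true="Match", value_if_false="No match"):
    """Return value_if_true when keyword occurs anywhere in text, else value_if_false.

    Loosely mirrors =IF(ISNUMBER(SEARCH(keyword, text)), value_if_true, value_if_false).
    """
    # SEARCH ignores case, so compare lowercase copies of both strings.
    contains = keyword.lower() in text.lower()
    # This conditional expression plays the role of the IF function.
    return value_if_true if contains else value_if_false


# Hypothetical sample data, chosen only for illustration.
fruits = ["Apple", "Pineapple", "Cherry"]
labels = [fuzzy_if(name, "apple") for name in fruits]
print(labels)  # ['Match', 'Match', 'No match']
```

Keeping the containment test separate from the branch reflects how the spreadsheet formula is built: one function answers "does it match?", and IF only decides what to display.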


{os.path.basename(file_path)} 设置列名位置：行{header_row} 列{signal_col}") manual_window.destroy() except Exception as e: messagebox.showerror("错误", f"无效输入: {str(e)}") ttk.Button(manual_window, text="确认", command=confirm_selection).pack(pady=20) if name == "main": root = tk.Tk() app = EnhancedSignalComparator(root) root.mainloop() 日志内容为：2025-07-23T13:33:19.517561 - 日志初始化上述两个问题还是没有解决：1、文件夹内一共三个文件，每个文件中都有只有一个信号，但是实际上，901_CAN送受信値-f.xlsx中没有找到信号，在【ドラフト版】D01D-00-02(HEV車).xlsm与【ドラフト版】D01D-00-03(コンベ車).xlsx分别找到了两个信号，但实际上分别只有一个信号。 2、行内容比对中【ドラフト版】D01D-00-02(HEV車).xlsm与【ドラフト版】D01D-00-03(コンベ車).xlsx搜索到的内容相同，但实际上有不同的
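To make the highlighting step in `highlight_differences` easier to check in isolation, here is a minimal sketch of how `difflib.SequenceMatcher.get_opcodes` yields the character spans that differ on the compared side. The helper name `diff_spans` and the sample strings are hypothetical and not part of the original program; the sketch also skips `delete` opcodes, which have an empty span on the compared side.

```python
from difflib import SequenceMatcher


def diff_spans(base_text, compare_text):
    """Return (start, end) character spans of compare_text that differ from base_text."""
    matcher = SequenceMatcher(None, base_text, compare_text)
    spans = []
    for opcode, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # j1..j2 index into compare_text; a "delete" opcode has j1 == j2,
        # so there is nothing to highlight on the compared side.
        if opcode != "equal" and j1 < j2:
            spans.append((j1, j2))
    return spans


# Hypothetical sample rows that differ in a single character
spans = diff_spans("ID=0x101 len=8", "ID=0x1A1 len=8")
print(spans)
```

If two panes are reported as identical when the files differ, printing these spans for the raw row strings can help separate a comparison problem from a problem in how the rows were collected.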

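4. **Imprecise signal-value matching**: matching uses `re.escape` together with `case=False`, so a search may match more than one signal.

**Solution**:

1. **Ensure key uniqueness**: when creating `signal_key`, include the file path and the row index so that every key is unique.
2. **Widen the column range...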
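The two points above (substring matching that hits several cells, and keys that collide across files) can be sketched as follows. This is an illustration under stated assumptions, not the original program's code: `find_signal_rows`, `make_signal_key`, and the sample signal names are hypothetical, and plain `re` is used in place of the pandas string methods the original presumably calls.

```python
import re


def find_signal_rows(cells, signal_name, exact=True):
    """Return indices of cells that match signal_name.

    exact=False mimics an escaped, case-insensitive substring search,
    which can hit several cells; exact=True requires the whole cell to match.
    """
    pattern = re.compile(re.escape(signal_name), re.IGNORECASE)
    hits = []
    for idx, cell in enumerate(cells):
        text = str(cell).strip()
        match = pattern.fullmatch(text) if exact else pattern.search(text)
        if match:
            hits.append(idx)
    return hits


def make_signal_key(file_path, row_index, signal_name):
    """Build a key that stays unique across files and rows (assumed format)."""
    return f"{file_path}|{row_index}|{signal_name}"


# Hypothetical column of signal names
cells = ["ENG_SPD", "ENG_SPD_RAW", "eng_spd_filtered", "VEH_SPD"]
loose = find_signal_rows(cells, "ENG_SPD", exact=False)
strict = find_signal_rows(cells, "ENG_SPD", exact=True)
print(loose, strict)
```

With substring matching, one signal name can be reported several times per file; requiring a full-cell match and keying results by file path and row index keeps each hit distinct.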


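马运良

Industry lecturer

Previously worked at several well-known IT training institutions and technology companies as a trainer, technical consultant, and certification examiner.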


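Column overview

This column helps readers master the Excel functions most often tested in the National Computer Rank Examination Level 2 (计算机二级), so that they can process and analyze data more efficiently. It covers vertical lookup with VLOOKUP, multi-condition summation with SUMIFS, substring extraction with MID, fuzzy matching with IF, INDEX and IF, conditional summation with SUMIF, conditional summation with SUMPRODUCT, averaging with AVERAGE, and approximate matching with LOOKUP. By studying the usage of each function together with worked cases, readers become more fluent at data processing and analysis in Excel and improve both speed and accuracy. The column is aimed at beginners as well as users who already have some experience.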
