RAGFlow fails to start on a Kunpeng server. `docker logs -f ragflow-server` shows:

```
[root@k8s01 docker]# docker logs -f ragflow-server
Starting nginx...
Starting ragflow_server...
Starting 1 task executor(s) on host '61e1fda06dea'...
2025-07-30 07:27:35,391 INFO 21 ragflow_server log path: /ragflow/logs/ragflow_server.log, log levels: {'peewee': 'WARNING', 'pdfminer': 'WARNING', 'root': 'INFO'}
2025-07-30 07:28:07,339 INFO 21 found 0 gpus
2025-07-30 07:28:10,138 INFO 21 [HUQIE]:Trie file /ragflow/rag/res/huqie.txt.trie not found, build the default trie file
2025-07-30 07:28:10,139 INFO 21 [HUQIE]:Build trie from /ragflow/rag/res/huqie.txt
2025-07-30 07:28:36,500 INFO 21 [HUQIE]:Build trie cache to /ragflow/rag/res/huqie.txt.trie
2025-07-30 07:28:42,050 INFO 21 init database on cluster mode successfully
2025-07-30 07:28:47,514 INFO 21 load_model /ragflow/rag/res/deepdoc/det.onnx uses CPU
2025-07-30 07:28:47,647 INFO 21 load_model /ragflow/rag/res/deepdoc/rec.onnx uses CPU
Traceback (most recent call last):
  File "/ragflow/api/ragflow_server.py", line 36, in <module>
    from api.apps import app
  File "/ragflow/api/apps/__init__.py", line 137, in <module>
    client_urls_prefix = [
  File "/ragflow/api/apps/__init__.py", line 138, in <listcomp>
    register_page(path) for dir in pages_dir for path in search_pages_path(dir)
  File "/ragflow/api/apps/__init__.py", line 120, in register_page
    spec.loader.exec_module(page)
  File "/ragflow/api/apps/api_app.py", line 28, in <module>
    from api.db.services.dialog_service import DialogService, chat
  File "/ragflow/api/db/services/dialog_service.py", line 36, in <module>
    from rag.app.resume import forbidden_select_fields4resume
  File "/ragflow/rag/app/resume.py", line 27, in <module>
    from deepdoc.parser.resume import step_one, step_two
  File "/ragflow/deepdoc/parser/resume/step_two.py", line 26, in <module>
    from deepdoc.parser.resume.entities import degrees, schools, corporations
  File "/ragflow/deepdoc/parser/resume/entities/corporations.py", line 93, in <module>
    GOOD_CORP = set([corpNorm(rmNoise(c), False) for c in GOOD_CORP])
  File "/ragflow/deepdoc/parser/resume/entities/corporations.py", line 93, in <listcomp>
    GOOD_CORP = set([corpNorm(rmNoise(c), False) for c in GOOD_CORP])
  File "/ragflow/deepdoc/parser/resume/entities/corporations.py", line 68, in corpNorm
    tks = rag_tokenizer.tokenize(nm).split()
  File "/ragflow/rag/nlp/rag_tokenizer.py", line 331, in tokenize
    res.extend([self.stemmer.stem(self.lemmatizer.lemmatize(t)) for t in word_tokenize(L)])
  File "/ragflow/.venv/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 142, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/ragflow/.venv/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
    tokenizer = _get_punkt_tokenizer(language)
  File "/ragflow/.venv/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
    return PunktTokenizer(language)
  File "/ragflow/.venv/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
    self.load_lang(lang)
  File "/ragflow/.venv/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
    lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
  File "/ragflow/.venv/lib/python3.10/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/root/nltk_data'
    - '/ragflow/.venv/nltk_data'
    - '/ragflow/.venv/share/nltk_data'
    - '/ragflow/.venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
```

The task executor fails in the same way: a second traceback, starting at `/ragflow/rag/svr/task_executor.py` line 57 and going through the same `rag/app/resume.py` import chain, ends with the identical `Resource punkt_tab not found` message.
When RAGFlow is started on a Kunpeng server and fails with `LookupError: Resource punkt_tab not found`, the system is missing NLTK's `punkt_tab` resource, the data file NLTK uses for sentence tokenization. The steps below resolve the problem:
### 1. Install NLTK and download the `punkt_tab` resource
Make sure the `nltk` library is installed in the Python environment that RAGFlow uses, then download the `punkt_tab` resource package.
```bash
pip install nltk
```
Then run the following Python code to download the `punkt_tab` resource:
```python
import nltk
nltk.download('punkt_tab')
```
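In this deployment RAGFlow runs in Docker, so the download has to happen inside the container. A minimal sketch, assuming the container is named `ragflow-server` as in the log above and that the virtualenv interpreter lives at `/ragflow/.venv/bin/python3` (path inferred from the traceback; adjust if yours differs):
```bash
# Download punkt_tab inside the running RAGFlow container.
# Container name and the venv path are taken from the log above.
docker exec ragflow-server /ragflow/.venv/bin/python3 -c "import nltk; nltk.download('punkt_tab')"

# Restart so ragflow_server and the task executor pick up the resource.
docker restart ragflow-server
```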
If network restrictions prevent the online download, the resource package can be downloaded elsewhere and loaded from a manually specified path.
### 2. Install the `punkt_tab` resource manually
If the server has no outbound network access, download the `punkt_tab` resource package on a machine that does have access (it normally sits under `nltk_data/tokenizers/punkt_tab`) and copy it to the NLTK data directory on the Kunpeng server; a Docker-oriented sketch follows the example path below.
The default NLTK data directories are:
- Linux/macOS: `~/nltk_data/`
- Windows: `C:\nltk_data\`
Copy the resource files into the corresponding directory on the server, for example:
```
~/nltk_data/tokenizers/punkt_tab/
```
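As a concrete illustration for this Docker deployment, a hedged sketch of the offline transfer. The download URL is the standard `nltk_data` package location on GitHub, and the host name `k8s01` and container name `ragflow-server` are taken from the log; treat all three as assumptions to verify against your environment:
```bash
# On a machine with internet access, fetch the punkt_tab package.
wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt_tab.zip

# Copy it to the Kunpeng server.
scp punkt_tab.zip root@k8s01:/tmp/

# On the server: copy it into the container and unpack it into one of the
# directories listed in the error message, e.g. /root/nltk_data/tokenizers/.
docker cp /tmp/punkt_tab.zip ragflow-server:/tmp/
docker exec ragflow-server bash -c \
  "mkdir -p /root/nltk_data/tokenizers && unzip -o /tmp/punkt_tab.zip -d /root/nltk_data/tokenizers/"
# If unzip is not available in the image, unpack on the host instead and
# `docker cp` the extracted punkt_tab/ directory into /root/nltk_data/tokenizers/.
```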
### 3. Set the NLTK data path
If the resource files live in a non-default location, point NLTK at them via an environment variable or in code:
```python
import nltk
nltk.data.path.append("/path/to/your/nltk_data")
```
or set it in the system environment:
```bash
export NLTK_DATA=/path/to/your/nltk_data
```
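For the Docker deployment shown in the log, the variable and the data directory itself must be visible inside the container. A minimal, hypothetical `docker run` sketch; RAGFlow is normally started via docker compose, where the equivalent is an `environment:` entry plus a volume on the ragflow service. `/opt/nltk_data` and `<ragflow-image>` are placeholders, not values from the original post:
```bash
# Hypothetical illustration only: mount a host-side NLTK data directory into the
# container and point NLTK at it via NLTK_DATA.
docker run -d --name ragflow-server \
  -e NLTK_DATA=/opt/nltk_data \
  -v /opt/nltk_data:/opt/nltk_data \
  <ragflow-image>
```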
### 4. Verify that the resource loads
Run the following code to confirm that the `punkt_tab` resource loads correctly:
```python
from nltk.tokenize import sent_tokenize
text = "This is a test sentence. Another sentence follows."
print(sent_tokenize(text))
```
If the text is split into the two sentences, the resource is in place and RAGFlow should no longer fail at startup with `LookupError: Resource punkt_tab not found` [^3].
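For the Docker deployment here, the check has to run inside the container. A one-line variant of the same test, assuming the container name and venv interpreter path seen in the log:
```bash
docker exec ragflow-server /ragflow/.venv/bin/python3 -c \
  "from nltk.tokenize import sent_tokenize; print(sent_tokenize('This is a test sentence. Another sentence follows.'))"
```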