Some environments and packages for natural language processing

Latest recommended article published on 2022-03-18 11:56:39

License: CC 4.0 BY-SA

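NLTK is a free, open-source project; you only need to download it.
It supports three platforms (Windows, macOS, and Linux).
Install it directly with pip: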

pip3 install nltk  -i https://pypi.doubanio.com/simple
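A quick way to check that the install worked is sketched below. It is only a sanity check, not part of the original steps: `is_installed` is a helper name I made up, and the sample sentence is arbitrary. It uses `wordpunct_tokenize` because that tokenizer is regex-based and should not need any extra NLTK data download.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current interpreter can find the package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("nltk"):
    from nltk.tokenize import wordpunct_tokenize

    # Regex-based tokenizer: splits words and punctuation, no corpus needed
    print(wordpunct_tokenize("NLTK is installed."))
else:
    print("nltk is not installed for this interpreter")
```

If the second message is printed, pip3 probably installed into a different Python than the one running the script.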

Chinese word segmentation modules
CRF
NShort
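To give a rough idea of what NShort (N-shortest-path segmentation) means, here is a toy sketch of the simplest case only: N = 1, every word has the same cost, and the dictionary is a tiny hand-made set. The function name and the dictionary are my own assumptions for illustration; real segmenters use weighted graphs and large lexicons, and this is not how LTP implements it.

```python
def shortest_path_segment(text, dictionary):
    """Split text into the fewest pieces, using dictionary words
    and falling back to single characters."""
    n = len(text)
    max_len = max(1, max((len(w) for w in dictionary), default=1))
    # best[i] = (number of words, previous cut position) for text[:i]
    best = [(0, -1)] + [(float("inf"), -1)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            # Single characters are always allowed, so a path always exists
            if len(piece) == 1 or piece in dictionary:
                cost = best[start][0] + 1
                if cost < best[end][0]:
                    best[end] = (cost, start)
    # Walk back from the end to recover the chosen cut positions
    words = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        words.append(text[start:pos])
        pos = start
    return words[::-1]


toy_dictionary = {"今天", "实在", "是", "太", "热", "了"}
print("|".join(shortest_path_segment("今天实在是太热了", toy_dictionary)))
```

The N-shortest variant keeps the N best paths at each position instead of one, and a later step picks among them.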
Install the LTP Python binding (pyltp)

 pip3 install pyltp  -i https://pypi.doubanio.com/simple

I ran into this error:

error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Install the Python development package. Use the version number that matches your own Python (3.7 in my case):

sudo apt-get install python3.7-dev
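If you are not sure which version number to write, the sketch below builds the package name from the running interpreter. `dev_package_name` is a helper name I made up, and it assumes your distribution names the package `pythonX.Y-dev`, as in the command above.

```python
import sys


def dev_package_name(version_info=sys.version_info):
    """Build the apt package name, e.g. python3.7-dev, from a version tuple."""
    return f"python{version_info[0]}.{version_info[1]}-dev"


# Print the command for the interpreter that runs this script
print("sudo apt-get install", dev_package_name())
```

Run it with the same python3 that pip3 belongs to, otherwise the headers will be for the wrong version.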

At the moment the install is stuck.
It hangs on all of .9-4.
I'll wait for it.
After waiting, I got:

error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Following
https://github.com/HIT-SCIR/pyltp
I installed from source.
The installation succeeded.
Before testing, the models have to be downloaded first.
pyltp version: 0.3.0
LTP version: 3.4.0
Model version: 3.4.0
https://pan.baidu.com/share/link?errmsg=Auth+Login+Sucess&errno=0&shareid=1988562907&ssnerror=0&&uk=2738088569#list/path=%2F

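from pyltp import Segmentor

# Path to the word segmentation model (cws.model) from the downloaded model package
model_path = "/home/dfy/ltp-models/3.4.0/ltp_data_v3.4.0/cws.model"

if __name__ == '__main__':
    seg = Segmentor()
    seg.load(model_path)  # pyltp 0.3.0 style: create the segmentor, then load the model
    words = seg.segment("请问你们看琉璃这个电视剧吗")
    print("|".join(words))
    seg.release()  # free the model when done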
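Since the test depends on the model having been downloaded and unpacked first, it may help to check the path before loading it. The sketch below does only that. `resolve_model` is a hypothetical helper of mine, not part of pyltp, and the directory is the one from my machine.

```python
import os


def resolve_model(model_dir, model_name="cws.model"):
    """Return the full path to a model file, or raise if it is missing."""
    path = os.path.join(model_dir, model_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"model file not found: {path}")
    return path


if __name__ == "__main__":
    try:
        # Directory where the 3.4.0 model package was unpacked (adjust to yours)
        print(resolve_model("/home/dfy/ltp-models/3.4.0/ltp_data_v3.4.0"))
    except FileNotFoundError as err:
        print(err)
```

The returned path can then be passed to seg.load() as in the test above.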

Using the jieba word segmentation module

pip3 install jieba  -i https://pypi.doubanio.com/simple

Example code

import jieba

words_c = "今天实在是太热了，你不热吗"

# Accurate mode (the default for jieba.cut)
ws = jieba.cut(words_c)
print("|".join(ws))

# Search engine mode
ws = jieba.cut_for_search(words_c)
print("|".join(ws))