UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte (solution)

wdrawing

Published 2020-08-16 18:12:23

License: CC 4.0 BY-SA

Tags: python

Article link: https://blog.csdn.net/wdrawing/article/details/108039820

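When you hit the error 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte, the commonly suggested fix is to set errors='ignore', but that can leave the letters in the text garbled. A more effective approach is to pass the file's actual encoding to open(), for example 'utf-16' or 'gbk', together with errors='ignore' to avoid the error.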

import jieba


def get_text(filepath):
    # Raises UnicodeDecodeError when the file is not actually UTF-8.
    with open(filepath, 'r', encoding="utf-8") as f:
        return f.read()


def word_freq(filepath, text, topn):
    words = jieba.lcut(text.strip())
    counts = {}
    for word in words:
        # Skip single-character tokens.
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1
    # Sort by count, most frequent first.
    items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    # filepath[:-4] drops a 4-character extension such as ".txt".
    with open(filepath[:-4] + '_词频.txt', 'w') as f:
        # Slicing avoids an IndexError when there are fewer than topn words.
        for word, count in items[:topn]:
            f.write("{}\t{}\n".format(word, count))

Running the main function raises:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
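
The error can be reproduced without any file on disk. The snippet below is only a sketch: it assumes the problem file was saved as UTF-16 little-endian, whose byte order mark is the two bytes FF FE, and builds such bytes in memory. The sample text is made up for illustration.

```python
import codecs

# Bytes of a UTF-16 little-endian file: the BOM (FF FE) followed by the text.
# The sample text is an assumption, not taken from the original file.
data = codecs.BOM_UTF16_LE + "词频".encode("utf-16-le")

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    # 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
    print(exc)
```

0xff can never begin a UTF-8 sequence, so the decoder gives up on the very first byte.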

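The fix I found online was to
replace encoding="utf-8" with errors="ignore".
But this only works for some files; for others the letters in the text come out garbled, because errors="ignore" only discards the bytes that cannot be decoded while everything else is still read with the wrong encoding (without encoding=, open() uses the platform default).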
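
To see what the garbling can look like, here is a small sketch that decodes UTF-16 bytes with the wrong codec and errors="ignore". It uses 'utf-8' as the wrong codec so that the result is the same on every machine; with the platform default encoding the damage would look different, but the cause is the same. The sample text "abc" is an assumption.

```python
import codecs

# "abc" stored as UTF-16 little-endian with a BOM: FF FE 61 00 62 00 63 00
data = codecs.BOM_UTF16_LE + "abc".encode("utf-16-le")

# Wrong codec + errors="ignore": no exception, but the text is damaged.
# The two BOM bytes are dropped and every zero byte becomes a NUL character.
damaged = data.decode("utf-8", errors="ignore")
print(repr(damaged))   # 'a\x00b\x00c\x00'

# Right codec: the BOM is consumed and the text comes back intact.
correct = data.decode("utf-16")
print(repr(correct))   # 'abc'
```

So errors="ignore" hides the exception but does not repair the text.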

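Special thanks to my friend, who pointed me in the right direction:
the status bar at the bottom of Notepad shows the file's encoding.
For a file that Notepad reports as UTF-16, change the call to f = open(filepath, 'r', encoding='utf-16', errors="ignore")

For a GBK file, use f = open(filepath, 'r', encoding='gbk', errors="ignore")

and so on for other encodings.
Problem solved.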
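
If you would rather not check every file in Notepad, the same idea can be automated. The following is a rough sketch, not what I actually used: read_text and its fallbacks parameter are hypothetical names, and the fallback order (UTF-8 first, then GBK) is an assumption that suits Chinese text files. It looks for a byte order mark first and only then tries the fallback encodings strictly, without errors="ignore".

```python
import codecs


def read_text(filepath, fallbacks=("utf-8", "gbk")):
    """Hypothetical helper: decode a text file by BOM, then by trial."""
    with open(filepath, "rb") as f:
        raw = f.read()

    # A byte order mark at the start identifies the encoding directly.
    # The "utf-16" and "utf-8-sig" codecs strip the BOM while decoding.
    for bom, name in (
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if raw.startswith(bom):
            return raw.decode(name)

    # No BOM: try each fallback strictly and keep the first that succeeds.
    for name in fallbacks:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode " + filepath)
```

With this, get_text(filepath) could simply return read_text(filepath). Note that a file without a BOM can sometimes be decoded by more than one encoding, so the result is a best guess and not a guarantee.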