Beginner Python Web Scraping Notes 1 (VS Code + garbled text)

不会打代码的猪

CC 4.0 BY-SA license

Tags: python

Article link: https://blog.csdn.net/qq_40533899/article/details/113463175

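This article describes running into garbled Chinese output in VS Code while using BeautifulSoup to parse a web page's chapter list, and the steps tried to fix it, focusing on the suspected gzip encoding issue and on adjusting the output encoding.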

import requests
from bs4 import BeautifulSoup
import chardet  # imported but not used below

# Fetch the target page
target_url = "https://www.dmzj.com/info/yaoshenji.html"
r = requests.get(url=target_url)

# Parse the HTML and collect every link inside the chapter list
bs = BeautifulSoup(r.text, 'lxml')
list_con_li = bs.find('ul', class_="list_con_li")
comic_list = list_con_li.find_all('a')
chapter_names = []
chapter_urls = []
for comic in comic_list:
    href = comic.get('href')
    name = comic.text
    # Insert at the front so both lists end up in reverse page order
    chapter_names.insert(0, name)
    chapter_urls.insert(0, href)

print(chapter_names)
print(chapter_urls)

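I wanted to grab the table of contents and chapter links from a little-known novel site, but when I printed the results in VS Code the Chinese text came out garbled.

After reading other blogs, I found two possible causes:

1. The page is encoded in UTF-8 but is also gzip-compressed, which I suspected might be interfering. I looked for a fix along these lines but have not found one that works.

2. A problem with VS Code itself. I searched for issues related to print and found two simple workarounds:

(1) Instead of using Run Code, run the script in debug mode (F5). The garbled text no longer appears.

(2) Add the following at the top of the script: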

 
import io
import sys
# Change the default encoding of standard output to UTF-8
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')

This also fixes the problem.
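One possible explanation for why both workarounds help, offered as a sketch rather than a confirmed diagnosis: when output is captured through a pipe (as Run Code appears to do), Python encodes printed text with the system's default encoding, which on a Chinese-language Windows install is typically GBK, while the panel displaying the output may decode it as UTF-8. The example below simulates that mismatch with an in-memory buffer. The `roundtrip` helper and the sample string are illustrative assumptions, not part of the original script.

```python
import io


def roundtrip(text, writer_encoding, reader_encoding="utf-8"):
    """Encode text the way the writing process would, then decode it
    the way the displaying side would (hypothetical helper)."""
    buffer = io.BytesIO()
    # Same wrapping technique as the workaround, applied to an in-memory buffer
    wrapper = io.TextIOWrapper(buffer, encoding=writer_encoding)
    wrapper.write(text)
    wrapper.flush()
    data = buffer.getvalue()
    wrapper.detach()  # release the buffer without closing it
    # Undecodable bytes become U+FFFD instead of raising an error
    return data.decode(reader_encoding, errors="replace")


sample = "第01话"  # made-up chapter name

# Written as GBK, read as UTF-8: the Chinese characters are lost
garbled = roundtrip(sample, "gbk")

# Written as UTF-8, read as UTF-8: the text survives intact
clean = roundtrip(sample, "utf-8")
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding='utf-8')` is a shorter way to get the same effect as wrapping `sys.stdout.buffer`.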

As for why it behaves this way, I don't understand it myself. I hope someone more experienced can help explain.
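On the gzip suspicion in cause 1: requests normally decompresses gzip-encoded responses transparently, and gzip itself is lossless, so compression is unlikely to be the source of the garbled text. What matters is which encoding is used to decode the decompressed bytes. The sketch below shows this with the standard library only. The HTML snippet is a made-up example, and ISO-8859-1 stands in for a wrongly guessed encoding.

```python
import gzip

# Made-up fragment standing in for part of a chapter list
html = '<a href="/ch1.html">第01话</a>'

# What a server would send: UTF-8 bytes, then gzip-compressed
raw = html.encode("utf-8")
compressed = gzip.compress(raw)

# Decompression restores the exact original bytes
restored = gzip.decompress(compressed)

# Decoding with the wrong encoding garbles the Chinese characters
wrong = restored.decode("iso-8859-1")

# Decoding with the page's real encoding recovers the text
right = restored.decode("utf-8")
```

If `r.text` itself ever looks garbled (rather than only the printed output), inspecting `r.encoding` and, if needed, setting `r.encoding = r.apparent_encoding` before reading `r.text` is one common remedy.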