Python Jupyter Web Crawler

Latest recommended article published on 2023-07-10 00:19:24

Chaokwang

License: CC 4.0 BY-SA

Tags: python jupyter web crawler

Article link: https://blog.csdn.net/qq_39595615/article/details/82596742

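This article shows how to use Python's Requests library to fetch web page content and BeautifulSoup to parse that content and extract the links from it. A worked example crawls all the links on the CCTV website.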

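# requests is an HTTP library for fetching web pages (it is not a regular-expression library)
# Beautiful Soup is a Python library whose main job is extracting data from web pages
# r.encoding = r.apparent_encoding decodes the fetched page using the encoding detected from its content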

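import requests
from bs4 import BeautifulSoup as bs

# Fetch the CCTV home page
r = requests.get("http://www.cctv.com")
# Use the encoding detected from the content to avoid garbled text
r.encoding = r.apparent_encoding
# Parse the HTML; naming the parser explicitly avoids a bs4 warning
soup = bs(r.text, "html.parser")
# Collect every <a> tag and print its visible text
links = soup.find_all("a")
for link in links:
    print(link.get_text())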
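
The loop above prints the visible text of each `<a>` tag. To collect the link targets themselves, one option is to read each tag's `href` attribute. The following is only a minimal sketch: the helper name `extract_links`, the sample HTML string, and the `http://example.com` base URL are assumptions made for illustration and are not part of the original post. It parses an inline string so that it runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute.

    Hypothetical helper for illustration; not from the original post.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips <a> tags that carry no href attribute
    for a in soup.find_all("a", href=True):
        # urljoin resolves relative paths against the page URL
        links.append(urljoin(base_url, a["href"]))
    return links


# Made-up sample page: one relative link, one tag without href, one absolute link
sample_html = (
    '<a href="/news/index.html">News</a>'
    "<a>No link</a>"
    '<a href="http://example.com/tv">TV</a>'
)
sample_links = extract_links(sample_html, "http://example.com")
for url in sample_links:
    print(url)
```

With a real page, `r.text` and the requested URL could be passed in place of the sample string and base URL.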