Python Web Scraping (Chapter 2 of 3: Installing the Browser Driver, Driving the Browser to Load Pages, Batch-Downloading Resources)

星期天要睡觉

Last modified on 2025-07-20 22:39:22

License: CC 4.0 BY-SA

First published on 2025-07-17 21:51:21

Article link: https://blog.csdn.net/2302_78022640/article/details/149431071

The complete learning path for Python web scraping:

(Chapter 1)

Python Web Scraping (Chapter 1 of 3: Web Scraping Libraries, robots.txt Rules (Staying Within the Law), Viewing and Fetching Page Source) - CSDN Blog: https://blog.csdn.net/2302_78022640/article/details/149428719?sharetype=blogdetail&sharerId=149428719&sharerefer=PC&sharesource=2302_78022640&spm=1011.2480.3001.8118

(Chapter 2 is this article.)

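(Chapter 3)

Python Web Scraping (Chapter 3 of 3: Driving the Browser Window, Locating Page Elements, Simulating User Interaction (Typing, Clicking, File Upload), Switching Browser Windows, Crawling and Storing in a Loop) - CSDN Blog: https://blog.csdn.net/2302_78022640/article/details/149453182?spm=1011.2415.3001.5331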

(Bonus: a small project)

A Python web scraping mini-project (scraping comments), super simple - CSDN Blog: https://blog.csdn.net/2302_78022640/article/details/149488367?spm=1011.2124.3001.6209

Installing the browser driver

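Step 1:

Check your browser version (in Edge, enter edge://settings/help in the address bar; the version number is shown on that page).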

Step 2:

Download the driver that matches that browser version from:

Microsoft Edge WebDriver | Microsoft Edge Developer

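Open the page and scroll down to the download links, then pick the one that matches your Edge version and your system (for example x64).

When the download finishes, unzip it and copy the .exe file inside (msedgedriver.exe) into the Scripts folder of your Python installation. That is all the setup needed.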
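Copying msedgedriver.exe into Scripts works because that folder is usually on the PATH, assuming Python was installed with the "Add Python to PATH" option. The sketch below uses only the standard library to print where that folder is, check whether msedgedriver can be found on PATH, and read the driver's version so its major number can be compared with the browser's. It assumes msedgedriver accepts a --version flag, as ChromeDriver does; scripts_dir, find_driver, major_version and driver_version_text are illustrative helper names, not part of Selenium. Note also that Selenium 4.6 and later ship Selenium Manager, which may download a matching driver automatically when none is found.

```python
import re
import shutil
import subprocess
import sysconfig


def scripts_dir():
    """Return the Scripts (Windows) or bin (Linux/macOS) folder of the running Python."""
    return sysconfig.get_path("scripts")


def find_driver(name="msedgedriver"):
    """Return the full path of the driver if it can be found on PATH, otherwise None."""
    return shutil.which(name)


def major_version(text):
    """Return the major number of the first dotted version in text, or None.

    Example: "Microsoft Edge WebDriver 138.0.3351.95" gives 138.
    """
    match = re.search(r"(\d+)\.\d+(?:\.\d+)*", text)
    return int(match.group(1)) if match else None


def driver_version_text(path):
    """Run '<driver> --version' (assumed flag, as in ChromeDriver) and return its output."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=30)
    return result.stdout.strip()


if __name__ == "__main__":
    print("Python Scripts folder:", scripts_dir())
    driver_path = find_driver()
    if driver_path is None:
        print("msedgedriver was not found on PATH")
    else:
        version = driver_version_text(driver_path)
        print("Found:", driver_path)
        print("Driver version:", version, "| major:", major_version(version))
        # Compare this major number with the one shown at edge://settings/help
```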

Learning to scrape

1. from selenium import webdriver

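Core purpose: imports Selenium's webdriver module, the foundation of browser automation.
What it does:
- Provides a driver class for each browser (such as webdriver.Edge and webdriver.Chrome). Creating an instance of one of these classes launches a browser that your script can then start, point at web pages, and close.
- For example, driver = webdriver.Edge() starts an Edge browser instance; the driver object then provides methods such as get() (open a page) and find_element() (locate an element, used in Chapter 3).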
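The scripts in this chapter keep the browser open with input() and then end. When a script can stop early with an error, it is safer to guarantee that the driver's quit() method still runs. Below is a minimal sketch using contextlib; browser_session is a hypothetical helper name, and factory is any zero-argument callable that creates the driver (for instance lambda: webdriver.Edge(options=edge_options)).

```python
from contextlib import contextmanager


@contextmanager
def browser_session(factory):
    """Create a driver with factory() and make sure quit() runs afterwards.

    factory is any zero-argument callable that returns an object with a quit()
    method, for example: lambda: webdriver.Edge(options=edge_options)
    """
    driver = factory()
    try:
        yield driver
    finally:
        # quit() closes every window and stops the driver process,
        # whereas driver.close() would only close the current window.
        driver.quit()


# Usage (needs Selenium and Edge, so it is shown as a comment):
# with browser_session(lambda: webdriver.Edge(options=edge_options)) as driver:
#     driver.get("http://www.baidu.com")
#     print(driver.title)
```

Because quit() sits in a finally clause, the browser is shut down whether the with block finishes normally or raises.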

2. from selenium.webdriver.edge.options import Options

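Core purpose: imports Options, the Edge configuration class, which customizes the browser's launch arguments and behavior.
What it does:
- Set the browser path: the binary_location attribute points to the Edge executable (e.g. edge_options.binary_location = r"C:\...\msedge.exe"), so the driver can find the browser program.
- Enable headless mode: add_argument('--headless') runs the browser in the background without a visible window, which suits servers and batch jobs.
- Other settings: you can also set the window size, disable extensions, add a proxy, and so on (e.g. add_argument('--window-size=1920,1080') sets the window size).
- Finally, webdriver.Edge(options=edge_options) applies this configuration to the browser instance.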
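The settings above can be gathered into one helper. This is a sketch: configure_edge_options is a hypothetical name, and it works with any object that has a binary_location attribute and an add_argument() method, which Selenium's Edge Options class provides. '--disable-extensions' is assumed here to be accepted by Edge as a standard Chromium switch; check the exact flags against your Edge version.

```python
def configure_edge_options(options, binary_location=None, headless=False,
                           window_size=(1920, 1080), disable_extensions=False):
    """Apply common launch settings to an Edge Options object and return it."""
    if binary_location is not None:
        # Path to msedge.exe; only needed when Edge is not found automatically
        options.binary_location = binary_location
    if headless:
        # Run without a visible window (servers, batch jobs)
        options.add_argument("--headless")
    if window_size is not None:
        width, height = window_size
        options.add_argument(f"--window-size={width},{height}")
    if disable_extensions:
        options.add_argument("--disable-extensions")  # assumed Chromium switch
    return options


# Usage (needs Selenium and Edge, so it is shown as a comment):
# from selenium import webdriver
# from selenium.webdriver.edge.options import Options
# edge_options = configure_edge_options(Options(), headless=True)
# driver = webdriver.Edge(options=edge_options)
```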

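Loading a web page

Code 1

from selenium import webdriver
from selenium.webdriver.edge.options import Options

edge_options = Options()
# Path to the Edge executable; change it if Edge is installed somewhere else
edge_options.binary_location = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
driver = webdriver.Edge(options=edge_options)  # launch Edge through msedgedriver
driver.get("http://www.baidu.com")             # open the Baidu homepage
input("Press Enter to close the browser...")   # pause so the window stays open
driver.quit()                                  # close the browser and stop the driver

Result: Edge starts and opens the Baidu homepage. The window stays open until you press Enter, after which the script closes the browser and exits.
Explanation: the script sets the Edge executable path, creates a WebDriver instance, and opens the given URL. input() pauses the script so the browser stays open, and driver.quit() then shuts down both the browser and msedgedriver.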
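Code 1 fails at startup if binary_location points to a path that does not exist. A small check that picks the first existing path from a list makes that failure easier to diagnose. The two candidate paths below are common default install locations and are assumptions; find_edge_binary is a hypothetical helper name.

```python
import os

# Common install locations of msedge.exe on Windows (assumed defaults;
# add your own path if Edge is installed elsewhere)
DEFAULT_EDGE_PATHS = [
    r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe",
    r"C:\Program Files\Microsoft\Edge\Application\msedge.exe",
]


def find_edge_binary(candidates=DEFAULT_EDGE_PATHS):
    """Return the first candidate path that exists as a file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Usage with Code 1:
# edge_path = find_edge_binary()
# if edge_path is not None:
#     edge_options.binary_location = edge_path
```

If it returns None, leaving binary_location unset is usually fine, since msedgedriver then looks for Edge in its standard install location.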

Code 2

from selenium import webdriver
from selenium.webdriver.edge.options import Options

edge_options = Options()
edge_options.binary_location = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
driver = webdriver.Edge(options=edge_options)
driver.get("https://www.ptpress.com.cn/periodical")  # periodicals page of the publisher
input("Press Enter to close the browser...")
driver.quit()

Result: Edge starts and opens the periodicals page of Posts & Telecom Press (人民邮电出版社); the browser stays open until you press Enter.
Explanation: identical to Code 1 except for the URL.

Opening new tabs

Code

from selenium import webdriver
from selenium.webdriver.edge.options import Options

edge_options = Options()
# Edge lives under "Microsoft\Edge" and its executable is msedge.exe
edge_options.binary_location = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
driver = webdriver.Edge(options=edge_options)
driver.get("https://www.ptpress.com.cn/")
# Run JavaScript in the page: window.open(url, '_blank') opens url in a new tab
driver.execute_script("window.open('https://www.ptpress.com.cn/login','_blank');")
driver.execute_script("window.open('https://www.shuyishe.com/','_blank');")
driver.execute_script("window.open('https://www.shuyishe.com/course','_blank');")
input("Press Enter to close the browser...")
driver.quit()

Result: Edge starts and opens the publisher's homepage, then opens the login page, the Shuyishe (shuyishe.com) homepage, and the Shuyishe course page, each in a new tab. The browser stays open until you press Enter.

Explanation: execute_script runs JavaScript inside the current page, and window.open with the '_blank' target opens the given URL in a new tab. The driver itself stays attached to the first tab; switching between tabs is covered in Chapter 3.
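execute_script also forwards any extra arguments into the script, where they are available as arguments[0], arguments[1], and so on. Passing the URL that way avoids pasting it into the JavaScript string by hand, so a quote inside a URL cannot break the script. A sketch, with open_tabs as a hypothetical helper name:

```python
def open_tabs(driver, urls):
    """Open each URL in a new tab and return the driver's window handles.

    execute_script forwards extra arguments to the script as arguments[0],
    arguments[1], ..., so the URL never has to be pasted into the JS string.
    """
    for url in urls:
        driver.execute_script("window.open(arguments[0], '_blank');", url)
    return driver.window_handles  # one handle per open tab or window


# Usage with the driver from the code above:
# handles = open_tabs(driver, [
#     "https://www.ptpress.com.cn/login",
#     "https://www.shuyishe.com/",
#     "https://www.shuyishe.com/course",
# ])
# print(len(handles))  # the first tab plus the new ones
```

The returned handles are what driver.switch_to.window(handle) expects when you later want to move the driver to one of the new tabs.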

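Getting the rendered page source

Code

from selenium import webdriver
from selenium.webdriver.edge.options import Options

edge_options = Options()
edge_options.binary_location = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
driver = webdriver.Edge(options=edge_options)
driver.get("https://www.ptpress.com.cn/")
print(driver.page_source)  # HTML of the page as currently rendered by the browser
input("Press Enter to close the browser...")
driver.quit()

Result: Edge starts, opens the publisher's homepage, and prints the complete HTML as rendered by the browser; the browser stays open until you press Enter.
Explanation: the page_source attribute returns the source of the current page as the browser sees it at that moment, including elements that JavaScript added after loading. This makes it suitable for dynamically loaded content, which may be missing from the raw HTML that requests downloads.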
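Because page_source is a snapshot taken when it is read, and the printed HTML is long, it is often easier to save it to a file and inspect it in an editor. A minimal sketch with pathlib; save_page_source is a hypothetical name and the file name is an assumption:

```python
from pathlib import Path


def save_page_source(html, path="page.html"):
    """Write rendered HTML to a UTF-8 file (creating parent folders) and return its Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # e.g. create "output/" if missing
    target.write_text(html, encoding="utf-8")  # UTF-8 keeps Chinese text intact
    return target


# Usage with the driver from the code above:
# save_page_source(driver.page_source, "output/ptpress.html")
```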

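Batch-downloading resources from a page

Code

import os
import re
from urllib.parse import urljoin

import requests
from selenium import webdriver
from selenium.webdriver.edge.options import Options

edge_options = Options()
edge_options.binary_location = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
driver = webdriver.Edge(options=edge_options)
driver.get("https://www.ptpress.com.cn/search?keyword=C")

# src values of <img> tags ending in .jpg; [^"]+? cannot run past the closing quote
imgs = re.findall(r'<img[^>]*?\ssrc="([^"]+?\.jpg)"', driver.page_source)
os.makedirs("imgs", exist_ok=True)  # create the output folder if it does not exist

for index, src in enumerate(imgs, start=1):
    url = urljoin(driver.current_url, src)  # resolve relative paths such as /upload/x.jpg
    img = requests.get(url, timeout=10)
    with open(f"imgs/{index}.jpg", "wb") as f:  # the file is closed even if writing fails
        f.write(img.content)

driver.quit()

Result: Edge starts and opens the search results page for the keyword C, extracts every .jpg image URL from the page source, and saves the images as 1.jpg, 2.jpg, ... in the local imgs folder, which the script creates if it does not exist.

Explanation: a regular expression pulls the image URLs out of the page source, urljoin turns relative paths into full URLs, and requests downloads each image, whose bytes are written to a local file.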
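A regular expression only matches src attributes written in the forms it anticipates (for example, with double quotes). A more tolerant sketch uses the standard library's html.parser, which accepts any attribute order and quoting style, resolves relative URLs with urljoin, and takes the download function as a parameter so the saving logic can be tested without network access. ImgSrcParser, extract_jpg_urls, save_images and the fetch parameter are hypothetical names, not part of Selenium or requests.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag, whatever the attribute order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; <img .../> also lands here
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def extract_jpg_urls(html, base_url):
    """Return absolute URLs of all .jpg images in html, in page order, without duplicates."""
    parser = ImgSrcParser()
    parser.feed(html)
    urls = []
    for src in parser.srcs:
        absolute = urljoin(base_url, src)  # "/a/b.jpg" -> "https://host/a/b.jpg"
        # Ignore any query string when checking the extension
        if absolute.lower().split("?")[0].endswith(".jpg") and absolute not in urls:
            urls.append(absolute)
    return urls


def save_images(urls, folder, fetch):
    """Save each URL as folder/1.jpg, folder/2.jpg, ... using fetch(url) -> bytes."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for index, url in enumerate(urls, start=1):
        path = os.path.join(folder, f"{index}.jpg")
        with open(path, "wb") as f:
            f.write(fetch(url))
        paths.append(path)
    return paths


# Usage with the driver from the code above:
# import requests
# urls = extract_jpg_urls(driver.page_source, driver.current_url)
# save_images(urls, "imgs", lambda u: requests.get(u, timeout=10).content)
```

Passing fetch in as a function keeps the network call in one place, so it can be swapped for a version that adds headers or retries without touching the file-saving code.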