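XPath is a language for selecting nodes in HTML/XML documents. Used through lxml, it serves the same purpose as the BeautifulSoup library: parsing a page and extracting the parts you need.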
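- In XPath, attribute values are selected with the '@' symbol (both calls below return a list):
– i.xpath('./a/text()') extracts the text content of the <a> tag
– i.xpath('./a/@href') extracts the value of the <a> tag's href attribute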
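A minimal offline sketch of the two expressions above, run against a small hand-written HTML fragment. The markup, the title "Chapter 1" and the href are made up for illustration and are not taken from the real site; only lxml's `etree.HTML` and `xpath` calls from the original code are used.

```python
from lxml import etree

# Hypothetical fragment shaped like one entry of a chapter list
html = '<ul><li><a href="/book/demo/1.html">Chapter 1</a></li></ul>'
page = etree.HTML(html)

results = []
# Select the <li> nodes first, then evaluate relative './' paths from each one
for li in page.xpath('//ul/li'):
    titles = li.xpath('./a/text()')  # text content of <a>, as a list of strings
    links = li.xpath('./a/@href')    # value of the href attribute, as a list
    results.append((titles, links))

print(results)
```

Because both calls return lists, the real code below indexes the href list with `[0]`.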
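from urllib.parse import urljoin

from lxml import etree
import requests

url = "https://www.shicimingju.com/book/sanguoyanyi.html"
head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36 Edg/88.0.705.74"
}
# Fetch the table-of-contents page; fail loudly on HTTP errors
req = requests.get(url=url, headers=head, timeout=10)
req.raise_for_status()
# Let requests detect the encoding from the body so Chinese text decodes correctly
req.encoding = req.apparent_encoding
text = req.text
# Build an element tree that XPath expressions can be evaluated against
page = etree.HTML(text)
# Each <li> under this path holds one chapter link (the path depends on the page layout)
page_list = page.xpath('//*[@id="main_left"]/div/div[4]/ul/li')
# Open the output file once instead of reopening it on every iteration
with open('./sanguo.txt', 'a', encoding='UTF-8') as f:
    for i in page_list:
        c = i.xpath('./a/text()')  # chapter title (list of text nodes)
        u = i.xpath('./a/@href')   # chapter link (list of attribute values)
        if not u:
            continue  # skip <li> items that contain no link
        # urljoin resolves the href against the page URL, avoiding a doubled '/'
        c.append(urljoin(url, u[0]))
        f.write(f"{c}\n")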