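Problem
Write a Python program that scrapes images: download the pictures of the Japanese girl from the thread at this link.
(This girl is really pretty…)
Code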
from bs4 import BeautifulSoup
import requests, os
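def get_html(url):
    # Fetch the page and return its HTML as text.
    # The timeout keeps the request from hanging forever.
    html = requests.get(url, timeout=10)
    # print(html.text)
    return html.text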
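def get_pic_urls(html):
    # Collect the src of every <img> tag that has a pic_type attribute.
    bs = BeautifulSoup(html, 'html.parser')
    urls = set()
    for image in bs.find_all('img'):
        # Skip images without pic_type (avatars and so on) or without a src.
        if image.get('pic_type') and image.get('src'):
            urls.add(image['src'])
            # print('ok', image['src'])
    return urls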
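def download_pics(urls, path):
    # Save every image into the directory `path`, creating it if needed.
    if not os.path.exists(path):
        os.mkdir(path)
    count = 0  # renamed from `sum`, which shadows the built-in
    for url in urls:
        try:
            r = requests.get(url, timeout=10)
            # Use the last 10 characters of the URL as the file name.
            pic_name = url[-10:]
            # `with` closes the file on exit, so no explicit close() is needed.
            with open(os.path.join(path, pic_name), 'wb') as f:
                f.write(r.content)
            count += 1
        except (requests.RequestException, OSError):
            # Skip images that fail to download or to save.
            continue
    print('success download ', count)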
if __name__ == '__main__':
    url = 'http://tieba.baidu.com/p/2166231880'
    html = get_html(url)
    urls = get_pic_urls(html)
    # print(urls)
    download_pics(urls, './shanbenyoumei')
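One weak spot in the code above is the file name: url[-10:] keeps only the tail of the name, so two images whose URLs end in the same 10 characters would overwrite each other. A minimal sketch of an alternative is below. It assumes that the last path segment of the URL is a usable file name. The helper name pic_name_from_url, the fallback name image.jpg, and the shortened sample URL are all made up for illustration and are not part of the original program.

```python
import posixpath
from urllib.parse import urlsplit


def pic_name_from_url(url, default='image.jpg'):
    """Return the last path segment of `url`, or `default` if it is empty."""
    # urlsplit().path excludes the query string and fragment.
    path = urlsplit(url).path
    # URL paths always use '/', so posixpath is used on every platform.
    name = posixpath.basename(path)
    return name or default


# Hypothetical, shortened sample URL in the style of the Tieba image links.
sample = ('http://imgsrc.baidu.com/forum/sign=8800a2e3b3119313c743ffb855036fa7/'
          '1e29460fd9f9d72abb1a7c3cd52a2834349bbb7e.jpg')
print(pic_name_from_url(sample))
```

In download_pics, this helper could stand in for url[-10:] if full file names are preferred.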
Notes
First, look at the format of the image tags we want to download:
<img pic_type="0" class="BDE_Image" src="http://imgsrc.baidu.com/forum/w%3D580%3Bcp%3Dtieba%2C10%2C302%3Bap%3D%C9%BC%B1%BE%D3%D0%C3%C0%B0%C9%2C90%2C310/sign=8800a2e3b3119313c743ffb855036fa7/1e29460fd9f9d72abb1a7c3cd52a2834349bbb7e.jpg"
bdwater="杉本有美吧,955,550" width="560" height="323" changedsize="true">
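This is the format of an image tag inside a Tieba reply. Unlike avatars and other images on the page, these tags have a pic_type attribute, so we use bs to select only the img tags that have a pic_type attribute.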
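The selection by attribute can also be tried on its own. The sketch below passes attrs={'pic_type': True} to find_all, which in BeautifulSoup matches any tag that has the attribute, whatever its value. The sample HTML and the example.com URLs are invented for the demonstration, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one avatar (no pic_type) and one reply image (with pic_type).
sample_html = '''
<img class="avatar" src="http://example.com/avatar.jpg">
<img pic_type="0" class="BDE_Image" src="http://example.com/reply.jpg">
'''

soup = BeautifulSoup(sample_html, 'html.parser')

# A value of True means "the attribute exists", whatever its value is.
reply_images = soup.find_all('img', attrs={'pic_type': True})
reply_urls = [img['src'] for img in reply_images]
print(reply_urls)  # ['http://example.com/reply.jpg']
```

Filtering inside find_all gives the same result as the image.get('pic_type') check in get_pic_urls for this sample, and also matches a tag whose pic_type value is an empty string.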