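The fake_useragent module generates random User-Agent strings, so we no longer have to collect User-Agents ourselves.
It is also very simple to use.
First, import the module: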
from fake_useragent import UserAgent
Instantiate the class, then read the attribute you need (the module's pool of User-Agent strings is very large):
>>> ua = UserAgent()
>>> ua.random
'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:23.0) Gecko/20131011 Firefox/23.0'
>>> ua.random
'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0'
>>> ua.ie
'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; Media Center PC 6.0; InfoPath.3; MS-RTC LM 8; Zune 4.7)'
>>> ua.firefox
'Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:21.0.0) Gecko/20121011 Firefox/21.0.0'
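To show what these attributes return, here is a hypothetical stand-in. SimpleUserAgent is not part of fake_useragent and is not how the library is implemented. It mimics ua.random, ua.firefox and ua.ie by applying random.choice to a tiny fixed pool built from the strings shown above. The real library draws on a much larger data set, so treat this only as a sketch of the behaviour.

```python
import random

# Hypothetical pool: three strings copied from the session above.
# fake_useragent itself ships far more than this.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:23.0) Gecko/20131011 Firefox/23.0',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; Media Center PC 6.0; InfoPath.3; MS-RTC LM 8; Zune 4.7)',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:21.0.0) Gecko/20121011 Firefox/21.0.0',
]


class SimpleUserAgent:
    """Hypothetical stand-in mimicking ua.random / ua.firefox / ua.ie on a fixed pool."""

    def __init__(self, pool=None):
        self.pool = list(pool or USER_AGENTS)

    def _pick(self, marker=None):
        # Keep only strings containing the browser marker (or all strings if
        # no marker is given), then choose one of them at random.
        candidates = [ua for ua in self.pool if marker is None or marker in ua]
        if not candidates:
            raise ValueError(f'no User-Agent in the pool matches {marker!r}')
        return random.choice(candidates)

    @property
    def random(self):
        return self._pick()

    @property
    def firefox(self):
        return self._pick('Firefox')

    @property
    def ie(self):
        return self._pick('MSIE')


ua = SimpleUserAgent()
print(ua.random)   # any one of the three strings
print(ua.firefox)  # one of the two Firefox strings
```

Each access picks again, which is why ua.random gave a different string on each call in the session above.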
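Next, here is how to use it in a Scrapy downloader middleware (DOWNLOADER_MIDDLEWARES):
Many different approaches circulate online. I recommend the one below, because the others fail easily if anything is configured incorrectly, so I won't cover them here.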
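from fake_useragent import UserAgent


class UserAgentMiddleware(object):
    """Downloader middleware that sets a random User-Agent on every request."""

    def __init__(self, crawler):
        super().__init__()
        self.ua = UserAgent()  # build the User-Agent pool once, not per request

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this factory method to create the middleware instance
        return cls(crawler)

    def process_request(self, request, spider):
        # Overwrite the header with a freshly chosen random User-Agent
        request.headers['User-Agent'] = self.ua.random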
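The class has no effect until it is registered in the project's settings. Below is a minimal sketch of settings.py. It assumes the class is saved in myproject/middlewares.py, where "myproject" is a placeholder for your own project package. The entry that maps Scrapy's built-in User-Agent middleware to None disables it. This is a common precaution so the built-in middleware cannot interfere with the custom one.

```python
# settings.py (sketch). "myproject.middlewares" is a hypothetical dotted path;
# point it at wherever your UserAgentMiddleware class actually lives.
DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in User-Agent middleware (default priority 500)
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # Enable the custom middleware; 400 is an arbitrary choice. Lower numbers
    # have their process_request called earlier.
    'myproject.middlewares.UserAgentMiddleware': 400,
}
```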