Setting a proxy IP for Python requests
Two ways to check whether the proxy IP actually works
import requests

tid = ''  # tunnel ID
password = ""  # tunnel password
tunnel_host = "tps202.kdlapi.com"
tunnel_port = ""  # tunnel port (fill in your own)
proxies = {
    "http": "http://%s:%s@%s:%s/" % (tid, password, tunnel_host, tunnel_port),
    "https": "http://%s:%s@%s:%s/" % (tid, password, tunnel_host, tunnel_port)
}
a = requests.get("http://icanhazip.com/", proxies=proxies)
print(a.text)
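On success, a.text is the proxy's exit IP, e.g. '124.94.241.40\n'.
On failure it returns 'Proxy Bad Server',
or something like '2408:823c:3101:7eec:7dc1:da28:9174:7af0\n' (an IPv6 address rather than the proxy's IPv4 address).
So, for crawler stability, it is worth checking that the returned IP is the one you expect.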
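http://icanhazip.com/ is a site that returns the IP address a request came from, so it shows whether the request really went out through the proxy.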
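A minimal sketch of that check, meant to be applied to the a.text returned above. parse_icanhazip is a hypothetical helper name, and treating an IPv6 answer as a failure is an assumption drawn from the examples here (the tunnel's exit IPs shown are IPv4).

```python
import ipaddress
from typing import Optional


def parse_icanhazip(body: str, require_ipv4: bool = True) -> Optional[str]:
    """Return the IP in an icanhazip.com response, or None if it is not usable.

    Assumption: a working tunnel proxy reports an IPv4 exit address, so by
    default an IPv6 answer is treated as a failure as well.
    """
    text = body.strip()  # icanhazip ends its answer with '\n'
    try:
        ip = ipaddress.ip_address(text)
    except ValueError:
        # Not an IP at all, e.g. 'Proxy Bad Server'
        return None
    if require_ipv4 and ip.version != 4:
        return None
    return str(ip)


print(parse_icanhazip('124.94.241.40\n'))  # 124.94.241.40
print(parse_icanhazip('Proxy Bad Server'))  # None
print(parse_icanhazip('2408:823c:3101:7eec:7dc1:da28:9174:7af0\n'))  # None
```

In a crawler you would call parse_icanhazip(a.text) and stop (or switch proxies) when it returns None.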
You can also use Kuaidaili's (快代理) own test page https://dev.kdlapi.com/testproxy:
a = requests.get("https://dev.kdlapi.com/testproxy", proxies=proxies)
print(a.text)
On success it returns 'sucess! client ip: 113.121.75.65 '
On failure it raises:
ProxyError: HTTPSConnectionPool(host='dev.kdlapi.com', port=443): Max retries exceeded with url: /testproxy (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 502 Proxy Bad Server')))
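A sketch that folds both outcomes into one check, using the proxies dict built earlier. The regex is written against the single success message shown above, so it assumes that format stays stable; check_tunnel and parse_testproxy are hypothetical helper names.

```python
import re
from typing import Optional

# Written against the one observed success message:
# 'sucess! client ip: 113.121.75.65 ' (spelling as returned by the page).
_CLIENT_IP = re.compile(r"client ip:\s*(\d{1,3}(?:\.\d{1,3}){3})")


def parse_testproxy(body: str) -> Optional[str]:
    """Return the client IP reported by the test page, or None if absent."""
    match = _CLIENT_IP.search(body)
    return match.group(1) if match else None


def check_tunnel(proxies: dict, timeout: float = 10) -> Optional[str]:
    """Fetch the test page through `proxies`; return the exit IP or None."""
    import requests  # imported here so parse_testproxy has no third-party dependency

    try:
        resp = requests.get(
            "https://dev.kdlapi.com/testproxy", proxies=proxies, timeout=timeout
        )
    except requests.exceptions.RequestException:
        # Covers the ProxyError shown above ('502 Proxy Bad Server') and timeouts.
        return None
    return parse_testproxy(resp.text)
```

Catching RequestException rather than only ProxyError is a design choice: a timeout through a dead proxy is also a failed check.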
Setting a proxy IP in Scrapy
Spider
import scrapy


class XXXSpider(scrapy.Spider):
    name = 'XXXscrapy'
    allowed_domains = ['icanhazip.com']  # optional, works without it
    start_urls = ['http://icanhazip.com/']

    def parse(self, response):
        print(response.text)
middlewares
In middlewares.py, edit process_request (around line 67):
from w3lib.http import basic_auth_header


class xxxxDownloaderMiddleware:
    def process_request(self, request, spider):
        # Use a tunnel proxy
        # Tunnel server
        tunnel_host = "tps202.kdlapi.com"  # Kuaidaili in my case
        tunnel_port = 15818  # port
        # Tunnel ID and password
        tid = ""  # your tunnel ID
        password = ""  # your tunnel password
        proxies = {
            "http": "http://%s:%s@%s:%s/" % (tid, password, tunnel_host, tunnel_port),
            "https": "http://%s:%s@%s:%s/" % (tid, password, tunnel_host, tunnel_port)
        }
        request.meta['proxy'] = proxies["http"]
        request.headers['Proxy-Authorization'] = basic_auth_header(tid, password)
        return None  # continue processing the request normally
settings
In settings.py (around line 65), enable the xxxxDownloaderMiddleware downloader middleware from middlewares.py:
DOWNLOADER_MIDDLEWARES = {
    'shell.middlewares.ShellDownloaderMiddleware': 100,  # template default is 543; the built-in HttpProxyMiddleware runs at 750, so anything below 750 works
    # 'shell.middlewares.ProxyMiddleware': 100
}
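Why any number below 750 works: Scrapy calls each downloader middleware's process_request in increasing order of these numbers, and its built-in HttpProxyMiddleware, which acts on request.meta['proxy'], sits at 750 in DOWNLOADER_MIDDLEWARES_BASE. The snippet below only simulates that ordering with a plain sort; it is not Scrapy's own code.

```python
# Lower number = process_request() is called earlier.
middlewares = {
    "shell.middlewares.ShellDownloaderMiddleware": 100,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}

# Order in which process_request() would run for these two entries.
request_order = [name for name, _ in sorted(middlewares.items(), key=lambda kv: kv[1])]
print(request_order)
```

With 100, the custom middleware sets meta['proxy'] before HttpProxyMiddleware looks at it; a number above 750 would reverse that.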
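On success the spider prints the proxy's IP. Once the proxy is set in the middleware, Scrapy copes with bad IPs on its own: failed requests are retried automatically, and each retry passes through the tunnel proxy again, which normally hands out a different exit IP.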
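Those retries come from Scrapy's built-in RetryMiddleware, controlled by settings such as RETRY_ENABLED and RETRY_TIMES. A sketch of a settings.py fragment that allows a few more attempts; the value 5 is an arbitrary example, and whether each attempt really gets a new exit IP depends on the tunnel proxy's rotation policy, not on Scrapy.

```python
# settings.py fragment (sketch, example values)
RETRY_ENABLED = True  # already the default
RETRY_TIMES = 5  # default is 2; each retried request goes through process_request again
```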
A function that checks whether a string is an IP address (a classic algorithm problem):
def isip(str1):