I'm new to programming.
I can't get content from some of the domains that belong to the same website.
For example, I can scrape it.example.com, es.example.com and pt.example.com, but when I try the same with fr.example.com or {}, I get:
2017-12-17 14:20:27 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6025
2017-12-17 14:21:27 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-12-17 14:22:27 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-12-17 14:22:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying (failed 1 times): TCP connection timed out: 110: Connection timed out.
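Since the retry message reports a timeout at the TCP level, one check that can be run outside Scrapy is a plain TCP connect to each host. Below is a minimal sketch using only the Python standard library. The function name `check_tcp`, port 443 and the 5-second timeout are assumptions for illustration and are not part of the spider.

```python
import socket


def check_tcp(hosts, port=443, timeout=5.0, connect=socket.create_connection):
    """Try a plain TCP connect to each host and report 'ok' or the error.

    `connect` is injectable so the function can be exercised without
    touching the network; by default it is socket.create_connection.
    """
    results = {}
    for host in hosts:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError as exc:
            # OSError covers connect timeouts, DNS failures and refusals.
            results[host] = f"failed: {exc}"
        else:
            conn.close()
            results[host] = "ok"
    return results


# Example call (performs real network connections when uncommented):
# print(check_tcp(["it.example.com", "fr.example.com"]))
```

If the hosts that time out in Scrapy also fail here, the cause is more likely at the network level (firewall, routing, or blocking) than in the spider code. If they connect fine here, the Scrapy configuration is the more likely place to look.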
Here is the spider, some.py:
^{pr2}$
What I have tried:
Running the spider from a different IP (same problem with the same domains)
Adding an IP pool (didn't work)
Found somewhere on Stack Overflow: in settings.py, set
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
ROBOTSTXT_OBEY = False
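While debugging, it may also help to make the failure show up sooner than the two minutes seen in the log. Below is a sketch of two further settings.py entries using the standard Scrapy settings `DOWNLOAD_TIMEOUT` and `RETRY_TIMES`. The values are arbitrary choices for illustration, and this is not expected to fix the timeout itself, only to shorten the wait.

```python
# Assumed debugging values, not a confirmed fix.

# Give up on a request after 15 seconds (Scrapy's default is 180).
DOWNLOAD_TIMEOUT = 15

# Retry a failed request only once (Scrapy's default is 2).
RETRY_TIMES = 1
```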
Any ideas are welcome!