[Custom Development] [M6] Python Crawler: Fetching the Latest Requests Posted on ZBJ (猪八戒) and Notifying the User in Real Time

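Background

A friend of mine plans to expand his sales channels by taking orders on crowdsourcing platforms. His main product is WeChat mini programs, so he wants to see each client request the moment it is posted and contact the client right away. That is the only way to keep his close rate up; otherwise competitors will have taken the orders long before he gets there, and he will have missed his chance.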

Development Environment

Language: Python; framework: Scrapy. For data collection, nothing beats Python!
IDE: PyCharm.

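Feature Design

Real-time notification: notifications are sent by email, and the mailbox is bound to WeChat so that they arrive in real time.
Filter module: keywords are matched against both the title and the content. Orders that do not match are discarded; orders that match trigger a notification immediately.
Config module: settings are stored in a JSON file.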
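
The filter and config modules are not shown in this post, so the following is only a minimal sketch of how they might fit together. The config keys (`include_keywords`, `exclude_keywords`) and the function names `load_config` and `matches` are assumptions made for illustration, as is the rule that a keyword hit in either the title or the description is enough to keep an order.

```python
import json

# Hypothetical config layout; the key names are assumptions,
# not taken from the original project.
EXAMPLE_CONFIG = """
{
  "include_keywords": ["小程序", "微信"],
  "exclude_keywords": ["代运营"]
}
"""


def load_config(text):
    """Parse the JSON config text into a dict."""
    return json.loads(text)


def matches(title, desc, config):
    """Return True if the order should trigger a notification.

    Title and description are searched together: the order is kept when
    the combined text contains at least one include keyword and no
    exclude keyword. Missing fields (None) are treated as empty strings.
    """
    text = ((title or "") + " " + (desc or "")).lower()
    include = [k.lower() for k in config.get("include_keywords", [])]
    exclude = [k.lower() for k in config.get("exclude_keywords", [])]
    if any(k in text for k in exclude):
        return False
    return any(k in text for k in include)
```

In the spider, a check such as `matches(name, desc, config)` could sit in front of the mail call so that only relevant orders are sent.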

Key Code

Crawling Module

# -*- coding: utf-8 -*-
import re
import time

import scrapy
from scrapy import Selector

from .. import common

class ZbjtaskSpider(scrapy.Spider):
    name = 'zbjtask'
    allowed_domains = ['zbj.com']
    start_urls = ['https://task.zbj.com/?m=1111&so=1&ss=0&fee=1']

    def parse(self, response):
        # The listing page shows 30 items.
        nodes = response.xpath('//div[@class="demand-card"]').getall()
        id_nodes = response.xpath('//a[@class="prevent-defalut-link"]/@href').getall()

        print(id_nodes)
        # Find the largest task ID on the page; it is saved at the end of parse().
        max_id = 0
        for url in id_nodes:
            # Example href: //task.zbj.com/16849389/
            pattern = re.compile(r"/\d*/$")
            id_str_ori = pattern.findall(url).pop()
            # Strip the leading and trailing slash to get the digits.
            id_str = id_str_ori[1:-1]
            task_id = int(id_str)
            if task_id > max_id:
                max_id = task_id
        print(max_id)

        for node in nodes:
            # Re-parse each card so the XPath queries below only see that card.
            card = Selector(text=node)
            date = card.xpath('//span[@class="card-pub-time flt"]/text()').get()
            url = "https:" + card.xpath('//a[@class="prevent-defalut-link"]/@href').get()
            name = card.xpath('//a[@class="prevent-defalut-link"]/text()').get()
            desc = card.xpath('//div[@class="demand-card-desc"]/text()').get()
            price = card.xpath('//div[@class="demand-price"]/text()').get()
            tag = card.xpath('//span[@class="demand-tags"]/i/text()').get()

            # Example href: //task.zbj.com/16849389/
            pattern = re.compile(r"/\d*/$")
            id_str_ori = pattern.findall(url).pop()
            id_str = id_str_ori[1:-1]
            task_id = int(id_str)

            # Only notify for tasks newer than the last recorded ID.
            sent_id = common.read_taskid()
            if task_id > sent_id:
                subject = "ZBJ " + id_str + " " + name
                # content = price + "\n" + desc + "\n" +  url + "\n" + tag + "\n"
                content = "%s <p> %s <p> <a href=%s>%s</a>  <p> %s" % (price, desc, url, url, tag)
                if common.send_mail(subject, content):
                    print("ZBJ mail: send task <%r> success" % task_id)
                else:
                    print("ZBJ mail: send task <%r> failed" % task_id)
            else:
                print("mail: task <%r> was already sent" % task_id)
            time.sleep(3)

        # Record the newest ID so the next run skips everything already sent.
        common.write_taskid(id=max_id)
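
The spider relies on `common.read_taskid()` and `common.write_taskid()`, which are not listed in this post. As a rough sketch of what such helpers could look like, the version below keeps the last notified ID in a small JSON file. The file name `zbj_last_taskid.json`, the `path` parameter, and the `extract_task_id` helper are all assumptions for illustration; only the two function names and the `id=` keyword come from the spider code above.

```python
import json
import os
import re

STATE_FILE = "zbj_last_taskid.json"  # hypothetical file name


def read_taskid(path=STATE_FILE):
    """Return the last notified task ID, or 0 if nothing has been saved yet."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as f:
        return int(json.load(f).get("last_id", 0))


def write_taskid(id, path=STATE_FILE):
    """Persist the highest task ID seen so far.

    The parameter is named `id` only to match the spider's keyword call.
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"last_id": int(id)}, f)


def extract_task_id(href):
    """Pull the numeric ID out of an href such as //task.zbj.com/16849389/.

    Returns None instead of raising when the href has no trailing ID.
    """
    m = re.search(r"/(\d+)/$", href)
    return int(m.group(1)) if m else None
```

A helper like `extract_task_id` would also let the spider skip links without an ID, where the `findall(...).pop()` approach raises `IndexError`.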

Notification Module


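import smtplib
from email.mime.text import MIMEText


def send_mail(subject, content):
    sender = u'xxxxx@qq.com'  # sender's email address
    passwd = u'xxxxxx'  # authorization code for the sender's mailbox
    receivers = u'xxxxx@qq.com'  # recipient's email address

    # subject = u'一品威客 development task '  # subject
    # content = u'This email was sent automatically with the Python smtplib and email modules'  # body
    try:
        # msg = MIMEText(content, 'plain', 'utf-8')
        msg = MIMEText(content, 'html', 'utf-8')
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = receivers

        # QQ Mail accepts SMTP over SSL on port 465.
        s = smtplib.SMTP_SSL('smtp.qq.com', 465)
        s.set_debuglevel(1)
        s.login(sender, passwd)
        s.sendmail(sender, receivers, msg.as_string())
        s.quit()  # close the connection once the message is handed over
        return True
    except Exception as e:
        print(e)
        return False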
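
The spider interpolates the scraped price, description, and tag straight into the HTML body, so a description containing `<` or `&` could garble the email. One possible way to guard against that, sketched here under assumptions, is to escape each field before building the message. `build_message` and its parameters are hypothetical and not part of the original project; the function only constructs the message and does not send it.

```python
from email.mime.text import MIMEText
from html import escape


def build_message(subject, sender, receiver, price, desc, url, tag):
    """Build an HTML email for one task, escaping all scraped text.

    Fields that the page did not provide (None) become empty strings.
    """
    body = '%s <p> %s <p> <a href="%s">%s</a> <p> %s' % (
        escape(price or ""),
        escape(desc or ""),
        escape(url, quote=True),  # safe inside the quoted href attribute
        escape(url),
        escape(tag or ""),
    )
    msg = MIMEText(body, "html", "utf-8")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = receiver
    return msg
```

The returned object could then be passed to `s.sendmail(sender, receiver, msg.as_string())` in place of the message built inside `send_mail`.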