Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django
About Me
●   Rahmat Ramadhan Irianto
●   Software developer at Void-Labs & Defpy-Labs, an open source
    software developer team
●   A student at STMIK Dipanegara (2010), an Indonesian university
    in Makassar
●   Lives in Makassar, Indonesia
●   Writes Python apps every day
What Is Website Monitoring?
●   Website monitoring provides page change monitoring
    and notification services to internet users worldwide.
    It creates a change log for each monitored page and
    alerts the user by email when it detects a change
    in the page text.
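One way to picture the "change log plus alert" idea is the sketch below. It is not the project's actual code: the function names and the use of a SHA-256 fingerprint are assumptions made for illustration. The first snapshot only sets a baseline; later snapshots are recorded (and would trigger an email) only when the text differs.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring runs of whitespace."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def record_check(change_log, page_text, checked_at):
    """Append to change_log when the text differs from the last entry.

    Returns True only when a real change was recorded (the caller would
    then send the email alert). The very first snapshot is stored as a
    baseline and returns False.
    """
    digest = fingerprint(page_text)
    is_first = not change_log
    if not is_first and change_log[-1]["digest"] == digest:
        return False
    change_log.append({"digest": digest, "checked_at": checked_at})
    return not is_first
```

Normalizing whitespace before hashing is a design choice: it avoids alerts for pages whose markup is merely re-indented.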
What Is It Useful For?
●   Website monitoring can monitor almost any page on the internet. When it
    detects a page change, it alerts you by email.
●   Website monitoring can be a good choice for a business intelligence
    strategy. Track your competition and get timely alerts when they change
    their website, or watch for developments on your customers' websites.
●   Monitor the press release page of companies you are invested in. Keep
    track of their current executives. Be alerted to changes on their home page.
●   Companies on the web change their privacy policies or terms and
    conditions without notice. Website monitoring can alert you to
    these changes.
●   Monitor the new job listings pages at companies where you would like to
    work. When they post a new listing, we will email you.
●   Keep your news up to date. Monitor the news pages of your top news sites.
    When they update, you'll get an email alert.
●   And much more
Inspired by changedetection: http://www.changedetection.com
What Powers Website Monitoring?
http://goo.gl/hCf34
Python!
http://goo.gl/sSqHh
(Powerful, efficient, flexible, an ideal language, effective for
OOP, elegant syntax, rich libraries, etc.)
www.python.org
http://goo.gl/YXnA9
Django!
(Django is a high-level Python Web framework that
encourages rapid development and clean, pragmatic design.)
https://www.djangoproject.com/
MongoDB
(Flexible, powerful, fast, and easy to use)
http://www.mongodb.org
http://goo.gl/NZQ18
RabbitMQ
(A powerful, fast, reliable, and highly available open source
message queuing system; great for building and managing
scalable applications)
http://www.rabbitmq.com
http://goo.gl/Pvd9Q
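Website Monitoring Workflow
[Diagram: request flow, summarized]
●   A request arrives either as an Ajax POST or as an API POST.
●   Ajax POST: Myview saves the data and publishes a task.
●   API POST: the REST API saves the data.
●   Published tasks go to the message queue, and a worker is created.
●   The worker processes the task: it scrapes the page and saves the result.
●   If the page changed: send an alert email and report the diff.
●   Data and results are stored in MongoDB.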
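The "Report Diff" step of the workflow can be sketched with Python's standard difflib module. This is only an illustration of the idea; the slides do not say how the project computes its diff, and the function name `report_diff` is made up here.

```python
import difflib


def report_diff(old_text, new_text):
    """Return a unified diff of two page snapshots as a single string."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(diff)
```

An empty string means the two snapshots are identical, so no alert email would be needed.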
Let's Talk About MongoDB
http://goo.gl/m8QUH
Why MongoDB?
●   Combines great features of document databases, key-value
    stores, and relational databases.
●   How great?
●     Fast
●     Smart
●     Scalable
●     Schema-less
●     Dynamic queries
●     Easy to use, etc.
What Are We Going to Need?
Python + MongoDB = PyMongo
http://pypi.python.org/pypi/pymongo/
How To?
from pymongo import Connection

# Open one connection (localhost:27017 by default) and reuse it for every collection.
db = Connection().website_monitor
collection_user = db.user
collection_monitor = db.monitor
collection_task = db.task

INSERT
# Assumes smart_str (django.utils.encoding) and datetime are imported, and that
# url, status, pk and task_id were computed earlier in the view.
monitor = {
    'username': smart_str(request.user),
    'user_id': request.user.id,
    'url': url,
    'datetime': datetime.utcnow(),
    'status': status,
    'hit': 0,
    'fail_hit': 0,
    'period': int(request.POST.get('period')),
    'email': collection_user.find_one({'name': str(request.user)})['email'],
    'pk': pk,
    'last_checking': None,
    'task_id': task_id,
}
collection_monitor.insert(monitor)
UPDATE
# $set changes only the listed fields of the matching user document.
collection_user.update(
    {'name': data_user['id']},
    {'$set': {
        'email': data_user['email'],
        'firstname': smart_str(data_user['first_name']),
        'lastname': smart_str(data_user['last_name']),
        'ip': request.META.get('REMOTE_ADDR', 'unknown'),
        'login': datetime.now(),
        'user_agent': request.META.get('HTTP_USER_AGENT', 'unknown'),
        'session': request.META.get('XDG_SESSION_COOKIE', 'unknown'),
        'session_fb': session_key,
        'ts': datetime.now(),
        'authkey': authkey,
    }}
)
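The slides use the PyMongo API of their time (`Connection`, `insert`, `update`). Later PyMongo releases replaced these with `MongoClient`, `insert_one` and `update_one`. The sketch below shows the same insert and `$set` update in that newer style. It is written in Python 3, the helper names and the reduced field list are assumptions for illustration, and `collection` is expected to be a PyMongo collection object passed in by the caller.

```python
from datetime import datetime, timezone


def build_monitor(username, user_id, url, period, email):
    """Build a monitor document with the same counters as the INSERT slide."""
    return {
        "username": username,
        "user_id": user_id,
        "url": url,
        "datetime": datetime.now(timezone.utc),
        "hit": 0,
        "fail_hit": 0,
        "period": int(period),
        "email": email,
        "last_checking": None,
    }


def save_monitor(collection, monitor):
    """Insert one document into the given collection."""
    return collection.insert_one(monitor)


def touch_login(collection, name, ip):
    """Update selected fields of one user document with the $set operator."""
    return collection.update_one(
        {"name": name},
        {"$set": {"ip": ip, "login": datetime.now(timezone.utc)}},
    )
```

With a running server, the collection would come from something like `MongoClient().website_monitor.monitor`; passing it in as a parameter keeps the helpers easy to test with a stand-in object.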



 REMOVE
 if collection_content.find({'url':i['url']}).count() == 3:
     collection_content.remove({'url':i['url'][0]})
Why Must We Use Distributed Computing?

Distributed computing is a method of solving a computational
problem by dividing it into many tasks that run simultaneously
on many hardware or software systems. (Wikipedia)
What Is a Message Queue?
Message queues are:
 -> Communication buffers
 -> Between independent sender and receiver processes
 -> Asynchronous
  • The time of sending is not necessarily the same as the time of receiving
  • In the context of web applications:
     o Sender: web application servers
     o Receiver: background worker processes
     o Queue items: tasks that the web server doesn't
       have the time or resources to do
How Does It Work?
Say a web application server has a task it doesn't have time to do.
• It puts the task in the message queue.
• Other web servers can access the same queue(s) and put tasks there.
• Workers are greedy, and they all watch the queues for tasks.
• Workers asynchronously pick up the first available task on the
  queue when they are ready.
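The steps above can be tried out without a broker. The sketch below uses Python's in-process `queue.Queue` and threads as a stand-in for RabbitMQ and separate worker processes; that substitution, and the name `run_workers`, are assumptions made purely for illustration.

```python
import queue
import threading


def run_workers(tasks, worker_count=3):
    """Put tasks on a shared queue and let several workers drain it."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker(name):
        while True:
            task = task_queue.get()      # blocks until a task is available
            if task is None:             # sentinel: no more work
                task_queue.task_done()
                return
            with results_lock:
                results.append((name, task))
            task_queue.task_done()

    threads = [threading.Thread(target=worker, args=("worker-%d" % i,))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for task in tasks:                   # the "web server" side
        task_queue.put(task)
    for _ in threads:                    # one sentinel per worker
        task_queue.put(None)
    task_queue.join()
    for t in threads:
        t.join()
    return results
```

Which worker handles which task is not deterministic: whichever worker is ready first takes the next item, which is the "greedy" behaviour described above.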
What Is It Useful For?

• Message queues are useful in certain situations.
• General guidelines:
  o Does your web application take more than a few seconds
    to generate a response?
  o Are you using a lot of cron jobs to process data in the
    background?
  o Do you wish you could distribute the processing of the data
    generated by your application among many servers?
What Do We Need to Build a Message Queue?
AMQP & RabbitMQ
Why Choose AMQP & RabbitMQ?
1. RabbitMQ is free to use.
2. The documentation is decent.
3. There is decent clustering support, even though we
   never needed clustering.
4. We didn't want to lose queues or messages upon
   broker crash or restart.
5. We develop applications using Python/Django, and
   setting up an AMQP backend using carrot was easy.
Now Let's Talk About RabbitMQ
RabbitMQ?

●   RabbitMQ is an Erlang-based open source application that
    serves as a message broker, or message-oriented middleware.
●   RabbitMQ implements an application-layer protocol, the
    Advanced Message Queuing Protocol (AMQP).
●   AMQP provides an interoperable, vendor-neutral standard
    protocol for exchanging messages in enterprise-scale systems.
Why Use RabbitMQ?
●   We need it for:
●     Running tasks/processes in the background
●     Asynchronous task processing
●     Scheduling, etc.
So, What Keeps the Rabbit Focused?
Carrot!
Carrot is an AMQP messaging queue framework. AMQP is the
Advanced Message Queuing Protocol, an open standard
protocol for message orientation, queuing, routing,
reliability and security.

●   An easy way to connect to RabbitMQ.
●   An easy way to pull stuff out of the queue.
●   An easy way to throw stuff into the queue.

https://github.com/ask/carrot/
Concepts
●   Publishers (Publishers send messages to an exchange.)
●   Exchanges (Messages are sent to exchanges. Exchanges are named and can be
    configured to use one of several routing algorithms. The exchange routes the
    messages to consumers by matching the routing key in the message with the routing
    key the consumer provides when binding to the exchange.)
●   Consumers (Consumers declare a queue, bind it to an exchange and receive
    messages from it.)
●   Queues (Queues receive messages sent to exchanges. The queues are declared by
    consumers.)
●   Routing keys (Every message has a routing key. The interpretation of the routing
    key depends on the exchange type. There are four default exchange types defined by
    the AMQP standard, and vendors can define custom types, so see your vendor's
    manual for details.)
●   Exchange types defined by AMQP/0.8:
●     Direct exchange (Matches if the routing key property of the message and the
    routing_key attribute of the consumer are identical.)
●     Fan-out exchange (Always matches, even if the binding does not have a routing
    key.)
●     Topic exchange (Matches the routing key property of the message by a primitive
    pattern matching scheme.)
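To make the three exchange types concrete, here is a toy in-memory router. It is not how RabbitMQ or carrot are implemented; the function names are made up, and the topic rule follows the AMQP convention that keys are dot-separated words, `*` matches exactly one word and `#` matches zero or more words.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb zero words, or one word and stay active
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queue names that would receive a message.

    `bindings` is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        return [q for q, _ in bindings]
    if exchange_type == "direct":
        return [q for q, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings if topic_matches(key, routing_key)]
    raise ValueError("unsupported exchange type: %r" % exchange_type)
```

For example, with bindings `[("checker", "monitor.check"), ("audit", "monitor.#")]`, a direct exchange delivers the key `monitor.check` only to `checker`, while a topic exchange delivers it to both queues.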
Creating a Connection in Django

settings.py
RABBITMQ_HOST = 'localhost'
RABBITMQ_PORT = 5672
RABBITMQ_USER = 'guest'
RABBITMQ_PASS = 'guest'
RABBITMQ_VHOST = '/'

views.py
from carrot.messaging import Publisher, Consumer
from carrot.connection import AMQPConnection
from django.conf import settings

# One broker connection, built from the values in settings.py.
# carrot names the virtual host argument virtual_host.
conn_for_carrot = AMQPConnection(hostname=settings.RABBITMQ_HOST,
                                 port=settings.RABBITMQ_PORT,
                                 userid=settings.RABBITMQ_USER,
                                 password=settings.RABBITMQ_PASS,
                                 virtual_host=settings.RABBITMQ_VHOST)
Publisher
# Publish a "check" task for a known task_id.
publisher = Publisher(connection=conn_for_carrot,
                      exchange='website_monitoring_exchange',
                      exchange_type='direct')
publisher.send({'msg': {'do': 'check',
                        'task_id': task_id}})

# Variant that reads request.PUT: the task id is an MD5 hash of
# task_id plus the submitted URL (requires "import hashlib").
publisher = Publisher(connection=conn_for_carrot,
                      exchange='website_monitoring_exchange',
                      exchange_type='direct')
publisher.send({'msg': {'do': 'check',
                        'task_id': hashlib.md5(str(task_id) + request.PUT.get('url')).hexdigest()}})
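The message body is a plain dictionary, so its construction can be separated from the broker. The sketch below is a Python 3 illustration with made-up helper names; in Python 3 the string must be encoded to bytes before hashing. MD5 is used here only to derive a fixed-length identifier, as on the slide, not for security.

```python
import hashlib
import json


def make_task_id(user_task_id, url):
    """Derive a fixed-length task id from a task number and the monitored URL."""
    raw = "%s%s" % (user_task_id, url)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_check_message(task_id):
    """Build the message body in the shape used on the Publisher slide."""
    return {"msg": {"do": "check", "task_id": task_id}}


# What the serialized body could look like if it is sent as JSON.
payload = json.dumps(build_check_message(make_task_id(7, "http://example.com/")))
```

Because the id depends only on the task number and the URL, submitting the same pair twice yields the same task id.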
Consumer
import subprocess

def monitoring_check():
    def call(message_data, message):
        if message_data['msg']['do'] == 'check':
            print '[+] receiving message'
            message.ack()
            task_id = message_data['msg']['task_id']
            # Run the scraper for this task in a separate process.
            get_pid = subprocess.Popen(['python', 'scraper.py', task_id])
            pid = get_pid.pid
            collection_task.update({'task_id': task_id},
                                   {'$set': {'status': 'RUNNING', 'pid': pid}})
            print '[Starting PID:%s]' % pid
            get_pid.wait()
        else:
            # Acknowledge unknown messages so they leave the queue.
            message.ack()

    queuename = 'website_monitoring_checker'
    consumer = Consumer(connection=conn_for_carrot, queue=queuename,
                        exchange='website_monitoring_exchange',
                        exchange_type='direct')
    consumer.register_callback(call)
    try:
        print '[queue:%s]consume..' % queuename
        consumer.wait()
    except Exception, err:
        print err
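The callback logic can be exercised without RabbitMQ by passing in stand-ins for the message and for the process launcher. The sketch below is Python 3 and is not the project's code: `handle_message` and its `launch` parameter are hypothetical names introduced here. Unlike the slide, it does not wait for the scraper to finish, so the consumer could keep receiving messages; that is a design choice of this sketch, not of the original.

```python
import subprocess
import sys


def handle_message(message_data, message, launch=None):
    """Acknowledge every message; start a scraper process only for 'check'.

    `launch` is a hypothetical hook (defaults to subprocess.Popen) so the
    handler can be exercised without starting a real process.
    """
    launch = launch or subprocess.Popen
    message.ack()
    body = message_data.get("msg", {})
    if body.get("do") != "check":
        return None
    # sys.executable avoids depending on which "python" is first on PATH
    return launch([sys.executable, "scraper.py", body["task_id"]])
```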
Cooking Soup with BeautifulSoup

import re
from BeautifulSoup import BeautifulSoup

def visible(element):
    # Skip text that lives inside non-rendered tags.
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    # Skip HTML comments and non-breaking-space filler.
    if re.search('<!--|-->|&nbsp;', str(element)):
        return False
    return True

monitor = collection_monitor.find_one({'pk': pk})

# The two stored snapshots of the page that will be compared.
contents = [collection_content.find({'url': str(monitor['url'])})[1],
            collection_content.find({'url': str(monitor['url'])})[0]]

# Loop reconstructed from context: i is one stored snapshot.
for i in contents:
    texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
    data = {'content': ' '.join(filter(visible, texts)),
            'datetime': i['datetime']}
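The same "keep only visible text" idea can be sketched with nothing but the standard library. The version below uses Python 3's `html.parser` instead of BeautifulSoup; that swap and the names `VisibleText` and `visible_text` are assumptions for illustration. HTML comments are reported through a separate parser callback, so they never reach the collected text.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes that are not inside style/script/head/title."""

    HIDDEN = {"style", "script", "head", "title"}

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.hidden_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```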
Alert by Email!

import smtplib

def sending_email(to, sub, msg):
    try:
        gmail_user = 'romanticdevil.jimmy@gmail.com'
        gmail_pwd = '***************'
        smtpserver = smtplib.SMTP("smtp.gmail.com", 587)
        smtpserver.ehlo()
        smtpserver.starttls()
        smtpserver.ehlo()  # say hello again over the encrypted connection
        smtpserver.login(gmail_user, gmail_pwd)
        header = ('To:' + to + '\n' +
                  'From: Website-Monitoring <' + gmail_user + '>\n' +
                  'Subject: %s\n' % sub)
        # A blank line separates the headers from the message body.
        msg = header + '\n' + msg
        smtpserver.sendmail(gmail_user, to, msg)
        smtpserver.close()
    except Exception, err:
        print err
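Building the headers by string concatenation is easy to get wrong, so here is a Python 3 sketch that lets the standard `email.message.EmailMessage` class format them. The function names, the example addresses in the usage note, and the split into "build" and "send" steps are assumptions for illustration; the SMTP host and port are the ones from the slide.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, to, subject, body):
    """Build the alert email; EmailMessage takes care of header formatting."""
    msg = EmailMessage()
    msg["From"] = "Website-Monitoring <%s>" % sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg, user, password, host="smtp.gmail.com", port=587):
    """Send over SMTP with STARTTLS; the context manager closes the connection."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

A call such as `send_alert(build_alert("bot@example.com", "user@example.com", "Page changed", diff_text), user, password)` would then send the alert, where `diff_text`, `user` and `password` are supplied by the caller.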
Task Scheduling and Checking
import sys
import time

task_id = sys.argv[1]
print task_id
# 'schedule' is stored as a string holding the check interval in hours.
raw_delay = collection_task.find_one({'task_id': task_id})['schedule']
print raw_delay
if raw_delay == "1":
    delay = 60 * 60        # every hour
elif raw_delay == "12":
    delay = 720 * 60       # every 12 hours
else:
    delay = 1440 * 60      # every 24 hours
while True:
    try:
        print '[+] Starting task: %s' % sys.argv[1]
        log(task_id, 'INFO', 'starting session')
        main()
    except Exception, err:
        log(task_id, 'exception', err)
        print err
        collection_task.update({'task_id': task_id}, {'$set': {'status': 'STOPPED', 'pid': None}})
        log(task_id, 'INFO', 'updating database [status:STOPPED]')
    else:
        collection_task.update({'task_id': task_id}, {'$set': {'status': 'SLEEP', 'pid': None}})
        log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' % delay)
        time.sleep(delay)
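The schedule-to-delay mapping can be pulled into a small function of its own, which makes the fallback explicit. This is a Python 3 sketch with made-up names; one deliberate difference from the slide is that it also accepts the schedule as an integer, which is an assumption about how the value might be stored.

```python
SECONDS_PER_HOUR = 60 * 60
ALLOWED_HOURS = {"1": 1, "12": 12}   # any other value falls back to 24 hours


def delay_seconds(raw_schedule):
    """Translate the stored schedule value into a sleep interval in seconds."""
    hours = ALLOWED_HOURS.get(str(raw_schedule), 24)
    return hours * SECONDS_PER_HOUR
```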
Django-Piston
(A mini-framework for Django, yet powerful for creating RESTful APIs)
https://bitbucket.org/jespern/django-piston/wiki/Home

●    Ties into Django's internal mechanisms.
●    Supports OAuth out of the box (as well as Basic/Digest or custom auth.)
●    Doesn't require tying to models, allowing arbitrary resources.
●    Speaks JSON, YAML, Python Pickle & XML (and HATEOAS.)
●    Ships with a convenient reusable library in Python
●    Respects and encourages proper use of HTTP (status codes, ...)
●    Has built in (optional) form validation (via Django), throttling, etc.
●    Supports streaming, with a small memory footprint.
●    Stays out of your way.
How To?
Add to urls.py:
url(r'^api/', include('api.urls')),

Add to settings.py:

INSTALLED_APPS = (
  ….......
  'api',

Create a folder named api/ in the project
directory, containing these files:
api/
-----handlers.py
-----__init__.py
-----urls.py
REST API: urls.py

from django.conf.urls.defaults import *
from piston.resource import Resource
from piston.authentication import HttpBasicAuthentication
from api.handlers import *

auth = HttpBasicAuthentication(realm="website-monitoring")
ad = { 'authentication': auth }

main = Resource(handler=Main, **ad)
monitor = Resource(handler=Monitor, **ad)

urlpatterns = patterns('',
  url(r'^(?P<obj_id>[^/]+)/$', main),
  url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
)
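To see which handler a given path reaches, the two regular expressions can be tried directly with Python's `re` module. This sketch only imitates first-match-wins URL resolution; the `ROUTES` list and the `resolve` function are illustrative names, and the paths are relative to the `api/` prefix.

```python
import re

# Same regular expressions as urls.py, relative to the "api/" prefix.
ROUTES = [
    (re.compile(r"^(?P<obj_id>[^/]+)/$"), "main"),
    (re.compile(r"^monitor/(?P<obj_id>[^/]+)/$"), "monitor"),
]


def resolve(path):
    """Return (handler_name, obj_id) for the first matching pattern, else None."""
    for pattern, name in ROUTES:
        match = pattern.match(path)
        if match:
            return name, match.group("obj_id")
    return None
```

One consequence worth noticing: the bare path `monitor/` is caught by the first pattern, so it reaches the main handler with `obj_id` set to `monitor`.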
REST API: handlers.py
from piston.handler import BaseHandler
from piston.utils import rc

class Main(BaseHandler):
    allowed_methods = ('GET',)  # trailing comma: a one-element tuple, not a string

    def read(self, request, obj_id):
        # Look the id up as a user first, then as a monitor.
        data = collection_user.find_one({'pk': obj_id})
        if data:
            return data
        data = collection_monitor.find_one({'pk': obj_id})
        if data:
            return data

class Monitor(BaseHandler):
    allowed_methods = ('GET', 'PUT', 'DELETE')
    fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day', 'hour', 'email', 'period', 'diff')

    def read(self, request, obj_id):
        try:
            if obj_id == 'all':
                data = list(collection_monitor.find({'username': str(request.user)}))
            elif obj_id == "status_running":
                data = list(collection_monitor.find({'status': 'running'}))
            ….........
        except Exception, err:
            return rc.BAD_REQUEST
        return data

    def update(self, request, obj_id):
        try:
            if obj_id == 'create':
                url_list = []
                for i in collection_monitor.find({'username': str(request.user)}):
                    url_list.append(i['url'])
                if request.PUT.get('url') in url_list:
                    print '[+] URL already exists'
                    print '[+] Data will be updated'
            else:
                raise Exception
        except Exception, err:
            print err
            return rc.BAD_REQUEST
        …......................

    def delete(self, request, obj_id):
        try:
            if obj_id == 'all':
                # A single remove() deletes every monitor owned by this user.
                collection_monitor.remove({'username': str(request.user)})
            else:
                if collection_monitor.find_one({'pk': obj_id}):
                    collection_monitor.remove({'pk': obj_id})
        except Exception, err:
            print err
            return rc.FORBIDDEN
        else:
            print 'deleted'
            return rc.DELETED
Facebook Integration?
●   Just for lazy people
●   You don't have to fill in the registration form; just log
    in to Facebook, then click, click, and click.
●   Good for business marketing
●   Easy to integrate, etc.
●   Download:
●     git clone http://github.com/dickeytk/django_facebook_oauth.git
Questions?
●   Twitter: @jimmyromanticde
●   Facebook: https://www.facebook.com/jimmy.romantic.devil
●   Email: romanticdevil.jimmy@gmail.com
●   Bitbucket: https://bitbucket.org/jimmyromanticdevil/
●   Blog: http://jimmyromanticdevil.wordpress.com
References
http://www.python.org
https://www.djangoproject.com
http://www.mongodb.org
http://www.rabbitmq.com
http://pypi.python.org/pypi/pymongo
https://github.com/ask/carrot/
https://bitbucket.org/jespern/django-piston/wiki/Home
http://github.com/dickeytk/django_facebook_oauth.git
"Life in a Queue", Tareque Hossain
Google "Message Queue"
Thank You! :)
Ad

Recommended

Introduction to Python Celery
Introduction to Python Celery
Mahendra M
 
Django Celery
Django Celery
Mat Clayton
 
Understanding Non Blocking I/O with Python
Understanding Non Blocking I/O with Python
Vaidik Kapoor
 
Celery in the Django
Celery in the Django
Walter Liu
 
Building Distributed System with Celery on Docker Swarm
Building Distributed System with Celery on Docker Swarm
Wei Lin
 
Advanced task management with Celery
Advanced task management with Celery
Mahendra M
 
Celery by dummy
Celery by dummy
Dungjit Shiowattana
 
Scaling up task processing with Celery
Scaling up task processing with Celery
Nicolas Grasset
 
Resftul API Web Development with Django Rest Framework & Celery
Resftul API Web Development with Django Rest Framework & Celery
Ridwan Fadjar
 
Practical Celery
Practical Celery
Cameron Maske
 
Building Distributed System with Celery on Docker Swarm - PyCon JP 2016
Building Distributed System with Celery on Docker Swarm - PyCon JP 2016
Wei Lin
 
Life in a Queue - Using Message Queue with django
Life in a Queue - Using Message Queue with django
Tareque Hossain
 
Distributed Task Processing with Celery - PyZH
Distributed Task Processing with Celery - PyZH
Cesar Cardenas Desales
 
Europython 2011 - Playing tasks with Django & Celery
Europython 2011 - Playing tasks with Django & Celery
Mauro Rocco
 
An Introduction to Celery
An Introduction to Celery
Idan Gazit
 
Queue Everything and Please Everyone
Queue Everything and Please Everyone
Vaidik Kapoor
 
Django at Scale
Django at Scale
bretthoerner
 
Evented applications with RabbitMQ and CakePHP
Evented applications with RabbitMQ and CakePHP
markstory
 
Python & Django TTT
Python & Django TTT
kevinvw
 
The Art of Message Queues - TEKX
The Art of Message Queues - TEKX
Mike Willbanks
 
On Rabbits and Elephants
On Rabbits and Elephants
Gavin Roy
 
PyCon 2011 Scaling Disqus
PyCon 2011 Scaling Disqus
zeeg
 
Celery: The Distributed Task Queue
Celery: The Distributed Task Queue
Richard Leland
 
Fixing twitter
Fixing twitter
Roger Xia
 
Fixing_Twitter
Fixing_Twitter
liujianrong
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
Pinkoi Platform
Pinkoi Platform
mikeleeme
 
Cooking a rabbit pie
Cooking a rabbit pie
Tomas Doran
 
RabbitMQ with python and ruby RuPy 2009
RabbitMQ with python and ruby RuPy 2009
Paolo Negri
 

More Related Content

Viewers also liked (8)

Resftul API Web Development with Django Rest Framework & Celery
Resftul API Web Development with Django Rest Framework & Celery
Ridwan Fadjar
 
Practical Celery
Practical Celery
Cameron Maske
 
Building Distributed System with Celery on Docker Swarm - PyCon JP 2016
Building Distributed System with Celery on Docker Swarm - PyCon JP 2016
Wei Lin
 
Life in a Queue - Using Message Queue with django
Life in a Queue - Using Message Queue with django
Tareque Hossain
 
Distributed Task Processing with Celery - PyZH
Distributed Task Processing with Celery - PyZH
Cesar Cardenas Desales
 
Europython 2011 - Playing tasks with Django & Celery
Europython 2011 - Playing tasks with Django & Celery
Mauro Rocco
 
An Introduction to Celery
An Introduction to Celery
Idan Gazit
 
Queue Everything and Please Everyone
Queue Everything and Please Everyone
Vaidik Kapoor
 
Resftul API Web Development with Django Rest Framework & Celery
Resftul API Web Development with Django Rest Framework & Celery
Ridwan Fadjar
 
Building Distributed System with Celery on Docker Swarm - PyCon JP 2016
Building Distributed System with Celery on Docker Swarm - PyCon JP 2016
Wei Lin
 
Life in a Queue - Using Message Queue with django
Life in a Queue - Using Message Queue with django
Tareque Hossain
 
Distributed Task Processing with Celery - PyZH
Distributed Task Processing with Celery - PyZH
Cesar Cardenas Desales
 
Europython 2011 - Playing tasks with Django & Celery
Europython 2011 - Playing tasks with Django & Celery
Mauro Rocco
 
An Introduction to Celery
An Introduction to Celery
Idan Gazit
 
Queue Everything and Please Everyone
Queue Everything and Please Everyone
Vaidik Kapoor
 

Similar to Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django (20)

Django at Scale
Django at Scale
bretthoerner
 
Evented applications with RabbitMQ and CakePHP
Evented applications with RabbitMQ and CakePHP
markstory
 
Python & Django TTT
Python & Django TTT
kevinvw
 
The Art of Message Queues - TEKX
The Art of Message Queues - TEKX
Mike Willbanks
 
On Rabbits and Elephants
On Rabbits and Elephants
Gavin Roy
 
PyCon 2011 Scaling Disqus
PyCon 2011 Scaling Disqus
zeeg
 
Celery: The Distributed Task Queue
Celery: The Distributed Task Queue
Richard Leland
 
Fixing twitter
Fixing twitter
Roger Xia
 
Fixing_Twitter
Fixing_Twitter
liujianrong
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
Pinkoi Platform
Pinkoi Platform
mikeleeme
 
Cooking a rabbit pie
Cooking a rabbit pie
Tomas Doran
 
RabbitMQ with python and ruby RuPy 2009
RabbitMQ with python and ruby RuPy 2009
Paolo Negri
 
MongoDB as Message Queue
MongoDB as Message Queue
MongoDB
 
Python RESTful webservices with Python: Flask and Django solutions
Python RESTful webservices with Python: Flask and Django solutions
Solution4Future
 
Rabbit MQ introduction
Rabbit MQ introduction
Sitg Yao
 
RabbitMQ
RabbitMQ
Lenz Gschwendtner
 
App engine devfest_mexico_10
App engine devfest_mexico_10
Chris Schalk
 
Real time system_performance_mon
Real time system_performance_mon
Tomas Doran
 
Evented applications with RabbitMQ and CakePHP
Evented applications with RabbitMQ and CakePHP
markstory
 
Python & Django TTT
Python & Django TTT
kevinvw
 
The Art of Message Queues - TEKX
The Art of Message Queues - TEKX
Mike Willbanks
 
On Rabbits and Elephants
On Rabbits and Elephants
Gavin Roy
 
PyCon 2011 Scaling Disqus
PyCon 2011 Scaling Disqus
zeeg
 
Celery: The Distributed Task Queue
Celery: The Distributed Task Queue
Richard Leland
 
Fixing twitter
Fixing twitter
Roger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
Pinkoi Platform
Pinkoi Platform
mikeleeme
 
Cooking a rabbit pie
Cooking a rabbit pie
Tomas Doran
 
RabbitMQ with python and ruby RuPy 2009
RabbitMQ with python and ruby RuPy 2009
Paolo Negri
 
MongoDB as Message Queue
MongoDB as Message Queue
MongoDB
 
Python RESTful webservices with Python: Flask and Django solutions
Python RESTful webservices with Python: Flask and Django solutions
Solution4Future
 
Rabbit MQ introduction
Rabbit MQ introduction
Sitg Yao
 
App engine devfest_mexico_10
App engine devfest_mexico_10
Chris Schalk
 
Real time system_performance_mon
Real time system_performance_mon
Tomas Doran
 
Ad

Recently uploaded (20)

OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
Priyanka Aash
 
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
 
Techniques for Automatic Device Identification and Network Assignment.pdf
Techniques for Automatic Device Identification and Network Assignment.pdf
Priyanka Aash
 
Oh, the Possibilities - Balancing Innovation and Risk with Generative AI.pdf
Oh, the Possibilities - Balancing Innovation and Risk with Generative AI.pdf
Priyanka Aash
 
You are not excused! How to avoid security blind spots on the way to production
You are not excused! How to avoid security blind spots on the way to production
Michele Leroux Bustamante
 
Cracking the Code - Unveiling Synergies Between Open Source Security and AI.pdf
Cracking the Code - Unveiling Synergies Between Open Source Security and AI.pdf
Priyanka Aash
 
Cluster-Based Multi-Objective Metamorphic Test Case Pair Selection for Deep N...
Cluster-Based Multi-Objective Metamorphic Test Case Pair Selection for Deep N...
janeliewang985
 
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Wenn alles versagt - IBM Tape schützt, was zählt! Und besonders mit dem neust...
Josef Weingand
 
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Using the SQLExecutor for Data Quality Management: aka One man's love for the...
Safe Software
 
PyCon SG 25 - Firecracker Made Easy with Python.pdf
PyCon SG 25 - Firecracker Made Easy with Python.pdf
Muhammad Yuga Nugraha
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Safe Software
 
AI Agents and FME: A How-to Guide on Generating Synthetic Metadata
AI Agents and FME: A How-to Guide on Generating Synthetic Metadata
Safe Software
 
"Scaling in space and time with Temporal", Andriy Lupa.pdf
"Scaling in space and time with Temporal", Andriy Lupa.pdf
Fwdays
 
Securing Account Lifecycles in the Age of Deepfakes.pptx
Securing Account Lifecycles in the Age of Deepfakes.pptx
FIDO Alliance
 
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
revolcs10
 
cnc-processing-centers-centateq-p-110-en.pdf
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
 
OpenPOWER Foundation & Open-Source Core Innovations
OpenPOWER Foundation & Open-Source Core Innovations
IBM
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
Priyanka Aash
 
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django

  • 1. Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django
  • 2. About me ● Rahmat Ramadhan Irianto ● Software developer at Void-Labs & Defpy-Labs, an open source software developer team ● Student at STMIK Dipanegara Makassar, Indonesia (2010) ● Lives in Makassar, Indonesia ● Writes Python apps every day
  • 3. What is website monitoring? ● Website monitoring provides page-change monitoring and notification services to internet users worldwide. It keeps a change log for each page and alerts the user by email when it detects a change in the page text.
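The change-detection idea on this slide can be illustrated with a small sketch. It is not the code used in the project: the function names `fingerprint` and `detect_change` are hypothetical, and it relies only on the Python 3 standard library (`hashlib` for a quick equality check, `difflib` for the change log).

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a stable hash of the page text so snapshots are cheap to compare."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_change(old_text, new_text):
    """Return unified-diff lines, or an empty list if the text is unchanged."""
    if fingerprint(old_text) == fingerprint(new_text):
        return []
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="previous", tofile="current", lineterm=""))


if __name__ == "__main__":
    before = "Welcome\nPrice: 10 USD"
    after = "Welcome\nPrice: 12 USD"
    for line in detect_change(before, after):
        print(line)
```

A non-empty result is what would trigger the email alert; the diff lines themselves can serve as the change log entry.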
  • 4. What is it useful for? ● Website monitoring can watch almost any page on the internet and alert you by email when it detects a change. ● It is a good fit for a business intelligence strategy: track your competitors and get timely alerts when they change their websites, or watch for developments on your customers' websites. ● Monitor the press release pages of companies you have invested in. Keep track of their current executives. Be alerted to changes on their home page. ● Companies on the web often change their privacy policies or terms and conditions without notice; website monitoring can alert you to these changes. ● Monitor the job listing pages of companies where you would like to work. When they post a new listing, we will email you. ● Keep up to date with the news. Monitor the news pages of your favorite sites; when they update, you'll get an email alert. ● And much more. Inspired by changedetection: http://www.changedetection.com
  • 5. What powers Website-monitoring? http://goo.gl/hCf34
  • 6. Python! http://goo.gl/sSqHh (Powerful, efficient, flexible, an ideal language, effective for OOP, elegant syntax, rich libraries, etc.) www.python.org
  • 7. http://goo.gl/YXnA9 Django! (Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design, etc.) https://www.djangoproject.com/
  • 8. MongoDB (flexible, powerful, fast, and easy to use) http://www.mongodb.org http://goo.gl/NZQ18
  • 9. RabbitMQ (a powerful, fast, reliable, and highly available message queuing system; an open source queueing option that is great for building and managing scalable applications) http://www.rabbitmq.com http://goo.gl/Pvd9Q
  • 11. [Architecture diagram. Labels: Ajax POST; REST API POST request; Myview; save data; publish task; message queue; create worker; worker; process task; scrape page; save result; if the page changed: save data, alert email, report diff; MongoDB]
  • 12. Let's talk about http://goo.gl/m8QUH
  • 13. Why MongoDB? ● It combines great features of document databases, key-value stores, and relational databases. ● How great? ● Fast ● Smart ● Scalable ● Schema-less ● Dynamic queries ● Easy to use, etc.
  • 14. What are we going to need? PyMongo http://pypi.python.org/pypi/pymongo/
  • 15. How to?
      import pymongo
      from datetime import datetime
      from django.utils.encoding import smart_str

      # Connection is the legacy PyMongo client class (later replaced by MongoClient)
      collection_user = pymongo.Connection().website_monitor.user
      collection_monitor = pymongo.Connection().website_monitor.monitor
      collection_task = pymongo.Connection().website_monitor.task

      # INSERT
      monitor = {'username': smart_str(request.user),
                 'user_id': request.user.id,
                 'url': url,
                 'datetime': datetime.utcnow(),
                 'status': status,
                 'hit': 0,
                 'fail_hit': 0,
                 'period': int(request.POST.get('period')),
                 'email': collection_user.find_one({'name': str(request.user)})['email'],
                 'pk': pk,
                 'last_checking': None,
                 'task_id': task_id,
                 }
      collection_monitor.insert(monitor)
  • 16.
      # UPDATE
      collection_user.update({'name': data_user['id']},
                             {'$set': {'email': data_user['email'],
                                       'firstname': smart_str(data_user['first_name']),
                                       'lastname': smart_str(data_user['last_name']),
                                       'ip': request.META.get('REMOTE_ADDR', 'unknown'),
                                       'login': datetime.now(),
                                       'user_agent': request.META.get('HTTP_USER_AGENT', 'unknown'),
                                       'session': request.META.get('XDG_SESSION_COOKIE', 'unknown'),
                                       'session_fb': session_key,
                                       'ts': datetime.now(),
                                       'authkey': authkey,
                                       }
                              }
                             )

      # REMOVE
      if collection_content.find({'url': i['url']}).count() == 3:
          collection_content.remove({'url': i['url'][0]})
  • 17. Why must we use distributed computing? Distributed computing is a method of solving a computational problem by dividing it into many tasks that run simultaneously on many hardware or software systems (Wikipedia).
  • 18. What is a message queue? Message queues are: ● Communication buffers ● Between independent sender and receiver processes ● Asynchronous: the time of sending is not necessarily the same as the time of receiving. In the context of web applications: ● Sender: web application servers ● Receiver: background worker processes ● Queue items: tasks that the web server doesn't have the time or resources to do
  • 19. How does it work? Say a web application server has a task it doesn't have time to do: ● It puts the task in the message queue ● Other web servers can access the same queue(s) and put tasks there ● Workers are greedy, and they all watch the queues for tasks ● Workers asynchronously pick up the first available task on the queue when they are ready
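The producer/worker pattern described on this slide can be sketched in-process with the Python 3 standard library. This is only an illustration of the idea, not a replacement for a broker such as RabbitMQ: the function name `run_workers` and the sentinel convention are assumptions made for the example.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count=2):
    """Push tasks onto a shared queue and let worker threads drain it."""
    task_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()       # blocks until a task is available
            if task is None:              # sentinel: no more work for this worker
                task_queue.task_done()
                break
            outcome = handler(task)
            with lock:
                results.append(outcome)
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for task in tasks:                    # the "web server" side: enqueue and move on
        task_queue.put(task)
    for _ in threads:                     # one sentinel per worker
        task_queue.put(None)
    task_queue.join()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run_workers(["a.com", "b.com", "c.com"], str.upper)))
```

Because whichever worker is free takes the next task, the order of results is not guaranteed; that is the "greedy workers" behavior the slide describes.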
  • 20. What is it useful for? ● Message queues are useful in certain situations ● General guidelines: ○ Does your web application take more than a few seconds to generate a response? ○ Are you using a lot of cron jobs to process data in the background? ○ Do you wish you could distribute the processing of the data generated by your application among many servers?
  • 21. What do we need to build a message queue?
  • 23. Why choose AMQP & RabbitMQ? 1. RabbitMQ is free to use 2. The documentation is decent 3. There is decent clustering support, even though we never needed clustering 4. We didn't want to lose queues or messages upon broker crash or restart 5. We develop applications using Python/Django, and setting up an AMQP backend using carrot was easy
  • 24. Now let's talk about RabbitMQ
  • 25. RabbitMQ? RabbitMQ is an Erlang-based open source application that serves as a message broker, or message-oriented middleware. It implements the Advanced Message Queuing Protocol (AMQP), an application-layer protocol. AMQP provides an interoperable, vendor-neutral standard protocol for regulating the exchange of messages in enterprise-scale systems.
  • 26. Why use RabbitMQ? ● We need it for: ● Running tasks/processes in the background ● Asynchronous task processing ● Scheduling systems, etc.
  • 27. So, what keeps the rabbit focused?
  • 28. Carrot! Carrot is an AMQP messaging queue framework. AMQP is the Advanced Message Queuing Protocol, an open standard protocol for message orientation, queuing, routing, reliability and security. Carrot offers an easy way to connect to RabbitMQ, to pull stuff out of the queue, and to throw stuff into the queue. https://github.com/ask/carrot/
  • 29. Concepts
      ● Publishers: publishers send messages to an exchange.
      ● Exchanges: messages are sent to exchanges. Exchanges are named and can be configured to use one of several routing algorithms. The exchange routes messages to consumers by matching the routing key in the message with the routing key the consumer provides when binding to the exchange.
      ● Consumers: consumers declare a queue, bind it to an exchange, and receive messages from it.
      ● Queues: queues receive messages sent to exchanges. The queues are declared by consumers.
      ● Routing keys: every message has a routing key. The interpretation of the routing key depends on the exchange type. There are four default exchange types defined by the AMQP standard, and vendors can define custom types (see your vendor's manual for details).
      ● Exchange types defined by AMQP/0.8:
          ● Direct exchange: matches if the routing key property of the message and the routing_key attribute of the consumer are identical.
          ● Fan-out exchange: always matches, even if the binding does not have a routing key.
          ● Topic exchange: matches the routing key property of the message by a primitive pattern matching scheme.
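To make the three exchange types on this slide concrete, here is a minimal routing sketch in plain Python 3. It is not how RabbitMQ or carrot implement routing; `route` and `topic_matches` are hypothetical names, and the topic rules assume the usual AMQP convention that `*` stands for exactly one dot-separated word and `#` for zero or more.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may swallow zero words, or one word and stay active
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queue names that should receive a message.

    bindings is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        return [q for q, _ in bindings]          # always matches
    if exchange_type == "direct":
        return [q for q, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings if topic_matches(key, routing_key)]
    raise ValueError("unsupported exchange type: %s" % exchange_type)


if __name__ == "__main__":
    bindings = [("checker", "site.*.changed"), ("audit", "site.#")]
    print(route("topic", bindings, "site.news.changed"))
```

The later slides use a direct exchange, which corresponds to the simple equality test in the `direct` branch.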
  • 30. Creating a connection in Django
      # settings.py
      RABBITMQ_HOST = 'localhost'
      RABBITMQ_PORT = 5672
      RABBITMQ_USER = 'guest'
      RABBITMQ_PASS = 'guest'
      RABBITMQ_VHOST = '/'

      # views.py
      from carrot.messaging import Publisher, Consumer
      from carrot.connection import AMQPConnection
      from django.conf import settings

      conn_for_carrot = AMQPConnection(hostname=settings.RABBITMQ_HOST,
                                       port=settings.RABBITMQ_PORT,
                                       userid=settings.RABBITMQ_USER,
                                       password=settings.RABBITMQ_PASS,
                                       vhost=settings.RABBITMQ_VHOST)
  • 31. Publisher
      import hashlib

      publisher = Publisher(connection=conn_for_carrot,
                            exchange='website_monitoring_exchange',
                            exchange_type='direct')
      publisher.send({'msg': {'do': 'check',
                              'task_id': task_id,
                              }
                      })

      # Second variant on the slide: the task id is an MD5 hash of the id plus the submitted URL
      publisher = Publisher(connection=conn_for_carrot,
                            exchange='website_monitoring_exchange',
                            exchange_type='direct')
      publisher.send({'msg': {'do': 'check',
                              'task_id': hashlib.md5(str(task_id) + request.PUT.get('url')).hexdigest(),
                              }
                      })
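The message payload and the hashed task id from the publisher slide can be separated into two small helpers. This is a sketch in Python 3 (where the string must be encoded before hashing), with hypothetical function names `make_task_id` and `build_check_message`; it does not talk to a broker.

```python
import hashlib


def make_task_id(base_id, url):
    """Hash the base id together with the URL, as the second publisher call does."""
    raw = "%s%s" % (base_id, url)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_check_message(task_id):
    """Build the payload shape used on the slide for a 'check' request."""
    return {"msg": {"do": "check", "task_id": task_id}}


if __name__ == "__main__":
    print(build_check_message(make_task_id(7, "http://example.com")))
```

Deriving the id from the URL means the same monitor request always maps to the same task id, which makes duplicate submissions easy to spot.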
  • 32. Consumer
      import subprocess

      def monitoring_check():
          def call(message_data, message):
              if message_data['msg']['do'] == 'check':
                  print('[+] receiving message')
                  message.ack()
                  task_id = message_data['msg']['task_id']
                  # Run the scraper for this task in its own process
                  get_pid = subprocess.Popen(['python', 'scraper.py', task_id])
                  pid = get_pid.pid
                  collection_task.update({'task_id': task_id},
                                         {'$set': {'status': 'RUNNING', 'pid': pid}})
                  print('[Starting PID:%s]' % pid)
                  get_pid.wait()
              else:
                  message.ack()

          queuename = 'website_monitoring_checker'
          consumer = Consumer(connection=conn_for_carrot,
                              queue=queuename,
                              exchange='website_monitoring_exchange',
                              exchange_type='direct')
          consumer.register_callback(call)
          try:
              print('[queue:%s]consume..' % queuename)
              consumer.wait()
          except Exception as err:
              print(err)
  • 33. Cooking soup with BeautifulSoup
      import re
      from BeautifulSoup import BeautifulSoup

      def visible(element):
          # Skip text that lives inside non-rendered tags
          if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
              return False
          # Skip comment markers and non-breaking-space entities
          if re.search('<!--', str(element)) or re.search('-->', str(element)) or re.search('&nbsp;', str(element)):
              return False
          return True

      monitor = collection_monitor.find_one({'pk': pk})
      # Two stored snapshots of this URL
      contents = [collection_content.find({'url': str(monitor['url'])})[1],
                  collection_content.find({'url': str(monitor['url'])})[0]]
      for i in contents:  # loop implied by the slide's use of i
          texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
          data = {'content': ' '.join(filter(visible, texts)),
                  'datetime': i['datetime'],
                  }
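The same "keep only the visible text" filter can be sketched without BeautifulSoup, using the Python 3 standard library's `html.parser`. This is an alternative technique, not the one on the slide; the class name `VisibleText` and the function `visible_text` are hypothetical.

```python
from html.parser import HTMLParser

# Tags whose text content is never rendered on the page
HIDDEN_TAGS = {"style", "script", "head", "title"}


class VisibleText(HTMLParser):
    """Collect text nodes that are not nested inside a hidden tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._hidden_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible text of an HTML document as one space-joined string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(visible_text("<p>Hello</p><script>var x = 1;</script><p>World</p>"))
```

HTML comments never reach `handle_data`, so they are dropped without the regular-expression check used on the slide.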
  • 34. Alert by email!
      import smtplib

      def sending_email(to, sub, msg):
          try:
              gmail_user = '[email protected]'
              gmail_pwd = '***************'
              smtpserver = smtplib.SMTP("smtp.gmail.com", 587)
              smtpserver.ehlo()
              smtpserver.starttls()
              smtpserver.ehlo()  # say hello again over the TLS session
              smtpserver.login(gmail_user, gmail_pwd)
              header = 'To:' + to + '\n' + 'From: Website-Monitoring <' + gmail_user + '>\n' + 'Subject: %s\n' % sub
              msg = header + '\n' + msg  # a blank line separates the headers from the body
              smtpserver.sendmail(gmail_user, to, msg)
              smtpserver.close()
          except Exception as err:
              print(err)
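Building the headers by string concatenation is easy to get wrong. As a hedged alternative, the Python 3 standard library's `email.message.EmailMessage` can assemble the message; the helper name `build_alert` and the example addresses below are assumptions for illustration, and nothing is sent.

```python
from email.message import EmailMessage


def build_alert(sender, recipient, subject, body):
    """Build an alert email; header/body separation is handled by the library."""
    message = EmailMessage()
    message["From"] = "Website-Monitoring <%s>" % sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


if __name__ == "__main__":
    alert = build_alert("monitor@example.com", "user@example.com",
                        "Page changed", "The page text changed since the last check.")
    print(alert.as_string())
```

The resulting object can be handed to `smtplib.SMTP.send_message` after the same `starttls()` and `login()` steps shown on the slide.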
  • 35. Task scheduling and checking
      import sys
      import time

      task_id = sys.argv[1]
      print(task_id)
      raw_delay = collection_task.find_one({'task_id': task_id})['schedule']
      print(raw_delay)
      # The schedule is stored as a string: "1" (hourly), "12" (every 12 hours), otherwise daily
      if raw_delay == "1":
          delay = 60 * 60
      elif raw_delay == "12":
          delay = 720 * 60
      else:
          delay = 1440 * 60
      while True:
          try:
              print('[+] Starting task: %s' % sys.argv[1])
              log(task_id, 'INFO', 'starting session')
              main()
          except Exception as err:
              log(task_id, 'exception', err)
              print(err)
              collection_task.update({'task_id': task_id},
                                     {'$set': {'status': 'STOPPED', 'pid': None}})
              log(task_id, 'INFO', 'updating database [status:STOPPED]')
          else:
              collection_task.update({'task_id': task_id},
                                     {'$set': {'status': 'SLEEP', 'pid': None}})
              log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' % delay)
          time.sleep(delay)
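The schedule-to-delay mapping on this slide can be isolated in a small function so it is easy to test. This is a sketch; the name `delay_seconds` and the constants are hypothetical, while the values mirror the slide (1 hour, 12 hours, otherwise 24 hours).

```python
SECONDS_PER_HOUR = 60 * 60
DEFAULT_HOURS = 24  # fallback used for any schedule code other than "1" or "12"


def delay_seconds(raw_schedule):
    """Translate the stored schedule code into a sleep interval in seconds."""
    hours = {"1": 1, "12": 12}.get(str(raw_schedule), DEFAULT_HOURS)
    return hours * SECONDS_PER_HOUR


if __name__ == "__main__":
    for code in ("1", "12", "24"):
        print(code, delay_seconds(code))
```

Keeping the mapping in a dictionary makes it simple to add further intervals without extending the if/elif chain.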
  • 36. Django-Piston (a mini-framework for Django, but powerful for creating RESTful APIs) https://bitbucket.org/jespern/django-piston/wiki/Home ● Ties into Django's internal mechanisms. ● Supports OAuth out of the box (as well as Basic/Digest or custom auth). ● Doesn't require tying to models, allowing arbitrary resources. ● Speaks JSON, YAML, Python Pickle & XML (and HATEOAS). ● Ships with a convenient reusable library in Python. ● Respects and encourages proper use of HTTP (status codes, ...). ● Has built-in (optional) form validation (via Django), throttling, etc. ● Supports streaming, with a small memory footprint. ● Stays out of your way.
  • 37. How to?
      Include in urls.py:
          url(r'^api/', include('api.urls')),
      Include in settings.py:
          INSTALLED_APPS = (
              ….......
              'api',
          )
      Create a folder named api/ in the project directory with these files:
          api/
              handlers.py
              __init__.py
              urls.py
  • 38. REST API: urls.py
      from django.conf.urls.defaults import *
      from piston.resource import Resource
      from piston.authentication import HttpBasicAuthentication
      from api.handlers import *

      auth = HttpBasicAuthentication(realm="website-monitoring")
      ad = {'authentication': auth}

      main = Resource(handler=Main, **ad)
      monitor = Resource(handler=Monitor, **ad)

      urlpatterns = patterns('',
          url(r'^(?P<obj_id>[^/]+)/$', main),
          url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
      )
  • 39. REST API: handlers.py
      from piston.handler import BaseHandler

      class Main(BaseHandler):
          allowed_methods = ('GET',)  # trailing comma makes this a one-element tuple

          def read(self, request, obj_id):
              # Look the id up among users first, then among monitors
              data = collection_user.find_one({'pk': obj_id})
              if data:
                  return data
              data = collection_monitor.find_one({'pk': obj_id})
              if data:
                  return data
  • 40.
      from piston.utils import rc

      class Monitor(BaseHandler):
          allowed_methods = ('GET', 'PUT', 'DELETE')
          fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day',
                    'hour', 'email', 'period', 'diff')

          def read(self, request, obj_id):
              try:
                  if obj_id == 'all':
                      data = list(collection_monitor.find({'username': str(request.user)}))
                  elif obj_id == "status_running":
                      data = list(collection_monitor.find({'status': 'running'}))
                  ….........
              except Exception as err:
                  return rc.BAD_REQUEST
              return data

          def update(self, request, obj_id):
              try:
                  if obj_id == 'create':
                      url_list = []
                      for i in collection_monitor.find({'username': str(request.user)}):
                          url_list.append(i['url'])
                      if request.PUT.get('url') in url_list:
                          print('[+] URL already exists')
                          print('[+] Data will be updated')
                  else:
                      raise Exception
              except Exception as err:
                  print(err)
                  return rc.BAD_REQUEST
              ….....................
  • 41.
          # class Monitor, continued
          def delete(self, request, obj_id):
              try:
                  if obj_id == 'all':
                      for i in collection_monitor.find({'username': str(request.user)}):
                          collection_monitor.remove({'username': str(request.user)})
                  else:
                      if collection_monitor.find_one({'pk': obj_id}):
                          collection_monitor.remove({'pk': obj_id})
              except Exception as err:
                  print(err)
                  return rc.FORBIDDEN
              else:
                  print('deleted')
                  return rc.DELETED
  • 42. Facebook integration ● Just for lazy people ● You don't have to fill in the registration form: just log in to your Facebook account, then click, click, and click. ● Good for business marketing ● Easy to integrate, etc. ● Download: git clone http://github.com/dickeytk/django_facebook_oauth.git
  • 43. Questions? ● Twitter: @jimmyromanticde ● Facebook: https://www.facebook.com/jimmy.romantic.devil ● Email: [email protected] ● Bitbucket: https://bitbucket.org/jimmyromanticdevil/ ● Blog: http://jimmyromanticdevil.wordpress.com
  • 44. References
      ● http://www.python.org
      ● https://www.djangoproject.com
      ● http://www.mongodb.org
      ● http://www.rabbitmq.com
      ● http://pypi.python.org/pypi/pymongo
      ● https://github.com/ask/carrot/
      ● https://bitbucket.org/jespern/django-piston/wiki/Home
      ● http://github.com/dickeytk/django_facebook_oauth.git
      ● "Life in a Queue", Tareque Hossain
      ● Google: "Message Queue"