Advanced Monitoring with API
A Presentation for MUM Sydney, 2012
By Herry Darmawan
About Me
Herry Darmawan
- Working for: Spectrum Indonesia
- Position: Technical and Operational Manager
- Home base: Surabaya, Indonesia
- Has been using MikroTik since 2004
Daily Activity
- Training people to use MikroTik through MikroTik Certified Training (basic and advanced classes)
- Managing the technical team of an ISP in Surabaya for last-mile connections (wireless and fiber)
- Conducting networking projects and consultation
- Developing monitoring systems and standard procedures for networks, particularly using MikroTik as the object
What is This Presentation About?
- Monitoring devices
- What can the regular (non-API) methods not achieve?
- How to use the API in a polling-based method
- Case study
Regular Monitoring System Methods
- ICMP
- SNMP and SNMP-Trap
- TCP/UDP checks (based on port)
How to monitor a case like this?
[Diagram: OSPF network within the same area; address shown: 10.10.10.100/24]
Introducing Nagios
- Web-based monitoring system
- Modular
- Check plugins (in Perl or C++)
- Lots of improvement modules (front-end, polling, 3rd-party integration, etc.)
- Database backend
  - NDOMY
  - MySQL
  - PostgreSQL
Recommended Front-End: CENTREON
Centreon host and service
Centreon host details config
Centreon plugins for service
- Plugin short-name
- The real command-prompt syntax (including the parameters)
Centreon - command
- The actual command prompt with some MACROs
    [root@localhost plugins]# ./check_centreon_snmp_uptime -H 192.168.3.1 -C dsnmp -v 2c -d
    OK - Uptime (in day): 0|uptime=0day(s)
Centreon - attaching to service
Centreon service result
Nagios Plugin Structure
- Plugins can be created using Perl or C++ (compiled or not)
- For an uncompiled script, this is the structure:
  - Header
    - Parameter initialization
    - Help menu
  - Process
    - Processing and gathering information from the device
  - Return Value
    - Result display
    - RRD result
    - Service status return
Nagios Plugin Structure
Header
- Take the parameters from the command prompt
- Check whether the parameters are correct and complete (for example, we need the username, but the user didn't provide the username parameter)
- Print help (if necessary)
- Global and local variable initialization
Process
- This is the real process
- All processing (SNMP, Telnet, SSH, API) happens in this part
- Be careful to check the structure
Nagios Plugin Structure
Return Value
- Print a line to send out to Centreon/Nagios as Status Information
- A 2nd line, if any, will be considered Extended Information
- Send out performance data to be graphed using RRDtool
- At the end of the script, we have to signal through the exit code whether this service state is:
  - OK (0)
  - WARNING (1)
  - CRITICAL (2)
  - UNKNOWN (3)
  - DEPENDENT (4)
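The return-value rules above can be sketched in a few lines of Perl. This is a minimal illustration, not the presenter's plugin: the helper names `format_result` and `exit_code` are hypothetical, and the status is hard-coded where a real check would compute it.

```perl
use strict;
use warnings;

# Exit codes as listed on the slide.
my %ERRORS = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3, DEPENDENT => 4);

# Hypothetical helper: status information, then "|" and the
# performance data (label=value pairs) that gets graphed.
sub format_result {
    my ($status, $message, %perf) = @_;
    my $line = "$status : $message";
    $line .= '|' . join(' ', map { "$_=$perf{$_}" } sort keys %perf) if %perf;
    return $line;
}

# Hypothetical helper: anything unrecognised is reported as UNKNOWN.
sub exit_code {
    my ($status) = @_;
    return $ERRORS{$status} // $ERRORS{UNKNOWN};
}

my $status = 'OK';    # hard-coded for illustration
print format_result($status, 'Uptime (in day): 0', uptime => '0day(s)'), "\n";
print "exit code would be ", exit_code($status), "\n";
# A real plugin would finish with: exit exit_code($status);
```

Keeping the formatting and the exit-code lookup in small functions makes it easy to test a plugin without contacting a router.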
Case Study Scenario
[Network diagram: routers MR1, MR2, MR3, MR4 and MR5 connected through BGP and OSPF; addresses shown: 192.168.1.1, 192.168.2.1, 192.168.3.1, 192.168.3.2, 192.168.4.1, 192.168.5.1]
Monitoring OSPF
- What parameters do we need?
  - Router IP
  - API port (in this case, we use the default port)
  - Username and password for the API
  - Interface NAME / NUMBER
  - Threshold value
- We will create a help menu, which will be shown if incomplete parameters are given
Monitoring OSPF through API
Command Prompt Parameters
    usage: $0 -m <mtik_ip> -u <user> -p <passwd>

      -h : help (this message)
      -m : hostname or IP of Mikrotik router
      -u : admin username
      -p : password
      -l : list of interface
      -i : interface number
      -w : warning threshold (in Kbps)
      -c : critical threshold (in Kbps)
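One way to read and validate these options is the core `Getopt::Std` module. The sketch below is an assumption about how such a header could look, not the presenter's code; `parse_args` and its error strings are hypothetical.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse the option letters from the usage text.
# Returns (\%options, undef) on success or (undef, $error) on failure.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;            # getopts() reads @ARGV
    my %options;
    getopts('hm:u:p:li:w:c:', \%options)
        or return (undef, 'unknown option or missing value');
    return (\%options, undef) if $options{h};
    for my $required (qw(m u p)) {
        return (undef, "missing -$required") unless defined $options{$required};
    }
    return (undef, 'need either -l or -i')
        unless $options{l} || defined $options{i};
    return (\%options, undef);
}

my ($opt, $err) = parse_args(qw(-m 192.168.3.1 -u api -p test -i *3));
print defined $err ? "UNKNOWN : $err\n" : "interface $opt->{i}\n";
```

When `$err` is set, a plugin would typically print the help text and exit with the UNKNOWN code.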
Monitoring OSPF through API
Concept
    ./check_ospf.pl -m <RA> -u <U> -p <P> -l
- Will list all the corresponding interfaces inside this router
    ./check_ospf.pl -m <RA> -u <U> -p <P> -i ether1
- Will show the OSPF status, along with the utilization of the interface named ether1, with a condition like this:
  - IF the status of OSPF <> FULL, then it is considered CRITICAL
About API in Perl
- Created by a forum member called cheesegrits
- He provides some sample source code, one of which acts like a terminal for the API
- Improvements over the original module:
  - Accepts the ? sign rather than only = for the command parameter
  - Improved output handling (it used to hang for output larger than 1 kB)
  - Added some subroutines:
    - getall_by_key, to list all the results based on .id
    - get_by_key, to get a list of results based on .id as search_key
    - get_by_name, to get a list of results based on a custom search_key
    - get_by_value, to get one single value of an item (for example, to get the status of the interface named ether1)
About API Command
- Must start with a Command Word, followed by Attribute Words (or Query Words), then be terminated by a zero-length Word
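On the wire, each word of such a sentence is sent with its length in front of it, and the zero-length word is a single zero byte. The sketch below follows the length encoding described in the MikroTik API documentation; the helper names `encode_length` and `encode_sentence` are hypothetical and are not part of the Mtik module used in this presentation.

```perl
use strict;
use warnings;

# Encode a word length in 1 to 5 bytes, depending on its size.
sub encode_length {
    my ($len) = @_;
    return pack('C', $len)                         if $len < 0x80;
    return pack('n', $len | 0x8000)                if $len < 0x4000;
    return substr(pack('N', $len | 0xC00000), 1)   if $len < 0x200000;
    return pack('N', $len | 0xE0000000)            if $len < 0x10000000;
    return pack('CN', 0xF0, $len);
}

# A sentence is every word prefixed by its length, followed by
# the zero-length word that terminates it.
sub encode_sentence {
    my @words = @_;
    return join('', map { encode_length(length $_) . $_ } @words) . "\0";
}

# Show the bytes of a two-word sentence as hex.
print unpack('H*', encode_sentence('/interface/getall', '=.proplist=name')), "\n";
```

A client library normally hides this step; it is shown only to make the "zero-length Word" terminator concrete.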
API Command Word
- It's a command in the API
- Almost the same as the terminal command syntax, but with no spaces; use / as the replacement instead
- The special API commands are: getall, login, cancel
- Examples:
    /interface/getall
    /interface/set
    /ip/address/print
    /login
    /interface/wireless/remove
About API Attribute
API Attribute Word
- Its value depends on the content of a command
- Starts with =, followed by the attribute name, followed by =, and ends with the attribute value
- Examples:
    =name=ether1
    =status=enable
    =.proplist=name,mtu,type,running
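Building attribute words from a Perl hash is a one-line transformation. The sketch below assumes a hypothetical helper, `attribute_words`, and sorts the names only so that the output order is predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: turn name/value pairs into "=name=value" words.
sub attribute_words {
    my (%attrs) = @_;
    return map { "=$_=$attrs{$_}" } sort keys %attrs;
}

# A command word followed by its attribute words, one per line.
my @sentence = ('/interface/getall',
                attribute_words('.proplist' => 'name,mtu,type,running'));
print "$_\n" for @sentence;
```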
About API Query
API Query Attribute
- Used only for print and getall
- Starts with ?, followed by the attribute name (or an additional command), followed by =, and ends with the attribute value
- Examples:
    ?status          (true if THERE IS an attribute named status)
    ?name=ether1     (true if NAME is ether1)
    ?-name=ether5    (true if NAME is NOT ether5)
    ?>comment=       (true if there is a non-empty comment)
    ?#<operator>     (pop the 2 values just before this query, then combine them with the operator)
- The operator can be | (or), & (and), ! (replace the top value with its opposite), etc.
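The stack behaviour of `?#<operator>` is easiest to see when several queries are combined. The sketch below builds the query words for "attribute equals any of these values": one equality query per value, then one `?#|` word for each pair that has to be merged. The helper name `query_any` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: match when $attr equals any of @values.
sub query_any {
    my ($attr, @values) = @_;
    my @words = map { "?$attr=$_" } @values;
    # n queries leave n values on the stack; n-1 "or" operations
    # reduce them to a single result.
    push @words, ('?#|') x (@values - 1) if @values > 1;
    return @words;
}

print "$_\n" for '/interface/print', query_any('type', 'ether', 'vlan');
```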
How to List the OSPF Interface
- In the terminal, if I want to list the interfaces, the command is:
    /interface print
- In the API, we convert the terminal command into API format:
    /interface/getall
    =.proplist=name
How to List the OSPF Interface
- In Perl, the command will look like this:
    my(%attrs);
    $attrs{'=.proplist'} = 'name';
    my(%results) = Mtik::get_by_key('/interface/getall', \%attrs);
    print "List of interface in router $mtik_host\n";
    # Each result is keyed by the item's .id
    foreach my $item (keys(%results)) {
        my($intno) = $results{$item}{'.id'};
        my($intname) = $results{$item}{'name'};
        print " $intno - $intname \n";
    }
- And the result would be:
    [root@localhost plugins]# ./check_ospf.pl -m 192.168.3.1 -u api -p test -l
    List of interface in router 192.168.3.1
    *3 - ether3
    *4 - ether4
    *2 - ether2
    *1 - ether1
OSPF Neighbor Check
- In the terminal, the command is:
    /routing ospf neighbor print
- In the API, it looks like this:
    /routing/ospf/neighbor/getall
    ?interface=<interface_name>
    =.proplist=interface,state,adjacency
OSPF Neighbor Check
    # Get the interface status based on the interface name
    $ospfattrs{'=.proplist'} = 'interface,state,adjacency';
    my(%results) = Mtik::get_by_name
        ('/routing/ospf/neighbor/getall',
         'interface', $intname, \%ospfattrs);
    if (%results) {
        # If the result is non-empty, then check the state
        $state = $results{$intname}{'state'};
        $adjacency = $results{$intname}{'adjacency'};
        if ($state ne "Full") {
            $errmsg = "OSPF for $intname status is $state";
            $status = "WARNING";
        } else {
            $status = "OK";
        }
    } else {
        # If the result is empty, then the neighbor might not be there
        $errmsg = "OSPF for $intname status not connected";
        $status = "CRITICAL";
    }
Final RESULT
    my %ERRORS=('OK'=>0,
                'WARNING'=>1,
                'CRITICAL'=>2,
                'UNKNOWN'=>3,
                'DEPENDENT'=>4);

    # Print the error message if there is one, otherwise the status line
    if ($errmsg) {
        print $errmsg."\n";
    } else {
        print "$status : OSPF status for $intname is $state for $adjacency \n";
    }
    # The exit code tells Nagios the service state
    exit $ERRORS{$status};
Command Prompt RESULT
### LIST all the interfaces
    [root@localhost plugins]# ./check_ospf.pl -m 192.168.3.1 -u api -p test -l
    List of interface in router 192.168.3.1
    *3 - ether3
    *4 - ether4
    *2 - ether2
    *1 - ether1
### RESULT for OK OSPF status (FULL)
    [root@localhost plugins]# ./check_ospf.pl -m 192.168.3.1 -u api -p test -i *3
    OK : OSPF status for ether3 is Full for 00:43:30
### RESULT for NOT OK OSPF (status Down or not connected)
    [root@localhost plugins]# ./check_ospf.pl -m 192.168.3.1 -u api -p test -i *1
    OSPF for ether1 status unknown/not connected
Integrate to NAGIOS
    $USER1$/check_ospf.pl -m $HOSTADDRESS$ -u api -p test -i $ARG1$
- $USER1$: /usr/lib/nagios/plugins
- $HOSTADDRESS$: IP address of the HOST
- $ARG1$: ARGUMENT1, could be different for each service
Attach it to HOST
- Command short-name
- ARGUMENT1: the interface number
TESTING
Drawbacks
- The API connection is opened and closed again each time the monitoring tool polls the device / host
- Not as fast as SNMP (since we are using a TCP socket connection)
Improvement
- Instead of just checking the OSPF status, why don't we check the traffic utilization as well and give an alert if it reaches some threshold?
    ./check_ospf.pl -m <RA> -u <U> -p <P> -i ether1 -w 10 -c 100
- Will show the OSPF status, along with the utilization of the interface named ether1, with conditions like this:
  - IF the traffic utilized is more than 10 kbps (-w 10), then this service status is considered WARNING
  - IF the traffic utilized is more than 100 kbps (-c 100), then this service status is considered CRITICAL
  - IF the status of OSPF <> FULL, then it is considered CRITICAL
  - GRAPH the TX and RX traffic
Traffic Utilization
- IF the traffic utilized is more than 10 kbps (-w 10), then this service status is considered WARNING
- IF the traffic utilized is more than 100 kbps (-c 100), then this service status is considered CRITICAL
- First of all, we will take the external values for the WARNING and CRITICAL thresholds
  - The WARNING threshold is taken from parameter -w
  - The CRITICAL threshold is taken from parameter -c
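The slides do not show how the -w and -c values become `$warningbits` and `$criticalbits`. A minimal sketch, assuming the options are given in kbps (as the help text says) and that a fallback value is used when an option is missing or not numeric; the helper name `threshold_bits` and the fallback values in the example are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a kbps option value into bits per second,
# falling back to a default when the option is absent or not a number.
sub threshold_bits {
    my ($kbps, $default_kbps) = @_;
    $kbps = $default_kbps
        unless defined $kbps && $kbps =~ /^\d+(?:\.\d+)?$/;
    return $kbps * 1000;
}

my %options = (w => 10, c => 100);            # as if "-w 10 -c 100" was given
my $warningbits  = threshold_bits($options{w}, 100);
my $criticalbits = threshold_bits($options{c}, 1000);
print "warning=$warningbits critical=$criticalbits\n";
```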
Traffic Utilization
- In the terminal, we write it like this:
    /interface monitor-traffic [ether1]
- In the API, we write it like this:
    /interface/monitor-traffic
    =once=
    =interface=[ether1]
Traffic Utilization
    ### Taking the interface number from the parameter
    my($intno) = $options{'i'};

    ### Getting the interface name (monitor-traffic uses the name)
    $intattrs{'=.proplist'} = 'name';
    $intattrs{'.id'} = $intno;
    $intname = Mtik::get_value_by_id
        ('/interface/getall', $intno, 'name', \%intattrs);

    ### Getting the real traffic from the monitor-traffic command
    $trafficattr{'=.proplist'} =
        'rx-bits-per-second,tx-bits-per-second';
    $trafficattr{'=once'} = '';
    $trafficattr{'=interface'} = $intname;
    my(%traffics) = Mtik::get_by_key
        ('/interface/monitor-traffic', \%trafficattr);
    $txbits = $traffics{$intno}{'tx-bits-per-second'};
    $rxbits = $traffics{$intno}{'rx-bits-per-second'};
Traffic Utilization
- Now we compare the bits received with the actual threshold
    # Check CRITICAL first: anything above the critical threshold is
    # also above the warning threshold
    if ($txbits > $criticalbits || $rxbits > $criticalbits) {
        $retmsg .= " but the traffic exceeded the threshold";
        $status = "CRITICAL";
    } elsif ($txbits > $warningbits || $rxbits > $warningbits) {
        $retmsg .= " but the traffic exceeded the threshold";
        $status = "WARNING";
    }

    print "$status : $retmsg \n";
    printf("Traffic Utilization : TX : %.2f ".$txprefix."bps/ RX : %.2f ".$rxprefix."bps\n",
           $txdispbits, $rxdispbits);
    # Performance data: value;warning;critical for each label
    print "|traffic_in=".$txbits."Bits/s;$warningbits;$criticalbits ".
          "traffic_out=".$rxbits."Bits/s;$warningbits;$criticalbits\n";
    exit $ERRORS{$status};
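The slide does not show how `$txprefix` and `$txdispbits` are derived, so the sketch below is one possible way to do it, together with a threshold comparison that tests the critical limit before the warning limit. The helper names `classify_traffic` and `humanize_bits` are hypothetical; the sample numbers are the ones from the WARNING example in this presentation.

```perl
use strict;
use warnings;

# Hypothetical helper: compare the busier direction against both limits,
# testing the critical limit first.
sub classify_traffic {
    my ($txbits, $rxbits, $warningbits, $criticalbits) = @_;
    my $peak = $txbits > $rxbits ? $txbits : $rxbits;
    return 'CRITICAL' if $peak > $criticalbits;
    return 'WARNING'  if $peak > $warningbits;
    return 'OK';
}

# Hypothetical helper: scale bits per second for display.
# Returns (formatted value, unit prefix).
sub humanize_bits {
    my ($bits) = @_;
    for my $unit (['M', 1_000_000], ['k', 1_000]) {
        my ($prefix, $factor) = @$unit;
        return (sprintf('%.2f', $bits / $factor), $prefix) if $bits >= $factor;
    }
    return (sprintf('%.2f', $bits), '');
}

my ($txbits, $rxbits) = (131968, 130432);
my ($txdisp, $txprefix) = humanize_bits($txbits);
my ($rxdisp, $rxprefix) = humanize_bits($rxbits);
print classify_traffic($txbits, $rxbits, 100000, 1000000), "\n";
print "Traffic Utilization : TX : $txdisp ${txprefix}bps/ RX : $rxdisp ${rxprefix}bps\n";
```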
Traffic Utilization - COMMAND
### When the OSPF is OK and the traffic is OK
    [root@localhost]# ./check_ospf.pl -m 192.168.3.1 -u api -p test -i *4
    OK : OSPF status for ether4 is Full for 00:49:37
    Traffic Utilization : TX : 0.00 bps/ RX : 0.00 bps
    |traffic_in=0Bits/s;100000;1000000 traffic_out=0Bits/s;100000;1000000
### When the OSPF is OK but the traffic exceeds the threshold
    [root@localhost]# ./check_ospf.pl -m 192.168.3.1 -u api -p test -i *3
    WARNING : OSPF status for ether3 is Full for 00:01:49 but the traffic exceeded the threshold
    Traffic Utilization : TX : 131.97 kbps/ RX : 130.43 kbps
    |traffic_in=131968Bits/s;100000;1000000 traffic_out=130432Bits/s;100000;1000000
Visual
Result
What's NEXT?
- Basically we can monitor and graph anything
  - Graph the BGP prefixes received and alert when BGP is DOWN or the prefixes reach some low threshold
  - Graph the number of active Hotspot users, the hosts connected to a Hotspot server, and the number of DHCP leases that have been established
  - Graph the number of stations connected to an access point
  - Graph the TX/RX rate and CCQ of a connection and send an alert once they go below a certain threshold
- Centreon and Nagios also provide Passive Checks
- Lots of modules and plugins
http://project.spectrumindo.com
http://www.mikrotiktraining.co.id
FURTHER
QUESTION
[email protected]