Automated RMON Alarm/Event-configuration for class-based QoS-Monitoring using NAPALM

In „Configure RMON Alarms&Events by script“ I’ve shown a short Python algorithm to discover all Cisco class-based QoS (cbQoS) packet/drop counters and to generate an RMON alarm for each of them. The router samples these counters every 300s, calculates the delta for the interval and raises an RMON event when there were packets/drops, or when there had been some before but not anymore.

These RMON events have been configured to generate syslog messages, which can be sent to a syslog receiver etc.

The existing script just generated a list of CLI commands, which had to be entered into the router config manually.
Not a viable approach when there are hundreds of devices to configure.

Now I want the script to configure the router automatically:

  • add both „rmon event“-objects for the rising- and the falling-threshold of the monitored alarms
  • read the existing „rmon alarm“-objects from the device config, which were configured by this script during a former run
  • remove these existing alarms
  • discover all cbQoS-packet/drop-counters
  • add corresponding „rmon alarm“-objects

I’d like to refer to „Centralized access to device-configuration and other state-information using NAPALM“ for some basic information regarding NAPALM and how to create the „router“-object in Python.

NAPALM: Read existing RMON alarms.
I’ll use the following python-logic to

  • remote-execute the command
  • immediately pull the CLI output out of the returned Python dictionary: the CLI command is the dict key
>>> cligetrmon=['show rmon alarms | inc RMONevent']
>>> rmonalarms = router.cli(cligetrmon)[cligetrmon[0]]
>>> print(rmonalarms)
Alarm 10001 is active, owned by RMONevent

Generate CLI to delete these RMON alarms

>>> cmdnormon = ""
>>> for alarm in rmonalarms.splitlines():
...   # skip empty lines, e.g. when no alarm has been configured yet
...   if alarm.strip():
...     alarmid = alarm.split()[1]
...     cmdnormon += "no rmon alarm "+alarmid+"\n"
...
>>> print(cmdnormon)
no rmon alarm 10001
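The loop above takes the second word of every line that `| inc RMONevent` returns, so an alarm owned by, say, „RMONevent2“ would be removed as well. A slightly stricter variant, sketched below, assumes every relevant line has the layout shown above (`Alarm <id> is <state>, owned by <owner>`) and only picks alarms whose owner is exactly `RMONevent`. The function name is hypothetical.

```python
import re

# Assumed line layout, taken from the output above:
# "Alarm 10001 is active, owned by RMONevent"
ALARM_LINE = re.compile(r"^Alarm (\d+) is .+?, owned by (\S+)\s*$")


def build_no_rmon_alarm_cmds(show_output, owner="RMONevent"):
    """Return 'no rmon alarm <id>' lines for all alarms owned by `owner`."""
    cmds = []
    for line in show_output.splitlines():
        match = ALARM_LINE.match(line.strip())
        # skip blank lines and alarms owned by somebody else
        if match and match.group(2) == owner:
            cmds.append("no rmon alarm " + match.group(1))
    return ("\n".join(cmds) + "\n") if cmds else ""
```

With `rmonalarms` from above, `build_no_rmon_alarm_cmds(rmonalarms)` returns the same string as `cmdnormon` in this example, and an empty string when no alarm exists yet.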

Static CLI to add required RMON events

>>> cmdrmonevent = "rmon event 10 log owner RMONevent\n"
>>> cmdrmonevent += "rmon event 11 log owner RMONevent\n"

Read Cisco cbQoS-MIB to fetch interesting QoS-counters, generate CLI for RMON-alarms

>>> from easysnmp import Session
>>> hostname = "192.168.2.72"
>>> session = Session(hostname, community='READ', version=2)
>>> cbqos = session.walk('1.3.6.1.4.1.9.9.166.1.15.1.1.13')
>>> cmdrmon = ""
>>> alarmID = 10001
>>> for i in cbqos:
...   # the last two OID components: policy index #P and class index #Q
...   oidList=i.oid.split(".")
...   q=oidList.pop()
...   p=oidList.pop()
...   # traffic direction of the policy (1 = input, 2 = output)
...   ifDirID=int(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.3."+p).value)
...   if (ifDirID==2):
...     cmdrmon += "rmon alarm "+str(alarmID)+" "+i.oid+" 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent\n"
...   alarmID += 1
...
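Separating the SNMP discovery from the CLI generation makes the CLI part testable without a router. A minimal sketch, assuming the discovery step has already produced `(oid, direction)` pairs (for example from the walk and the direction GET above); `build_rmon_alarm_cmds` and `ALARM_TEMPLATE` are hypothetical names. Like the loop above, the alarm ID advances for every counter, so inbound counters leave gaps in the numbering.

```python
# Direction values of the policy, as listed in this series: 1 = input, 2 = output
OUTPUT = 2

ALARM_TEMPLATE = ("rmon alarm {id} {oid} {interval} delta "
                  "rising-threshold 1 11 falling-threshold 0 10 owner RMONevent")


def build_rmon_alarm_cmds(counters, first_id=10001, interval=300):
    """counters: iterable of (oid, direction) tuples collected via SNMP."""
    cmds = []
    for offset, (oid, direction) in enumerate(counters):
        # only outbound queues are monitored
        if direction == OUTPUT:
            cmds.append(ALARM_TEMPLATE.format(id=first_id + offset,
                                              oid=oid, interval=interval))
    return cmds
```

Keeping the IDs tied to the counter position means a re-run against an unchanged device produces the same alarm numbers.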

Concatenate all commands

>>> cmd = cmdrmonevent+cmdnormon+cmdrmon
>>> print(cmd)
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
no rmon alarm 10001
rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Push the commands to the Router

>>> router.load_merge_candidate(config=cmd)

Check the differences before applying the changes

>>> print(router.compare_config())
-no rmon alarm 10001
+rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
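When hundreds of devices are involved, nobody reads every diff. One option, sketched here as a hypothetical helper, is to commit only when every changed line is an RMON line and to discard the candidate otherwise. It assumes the diff format shown above (one `+`/`-` prefixed config line per change); `compare_config()`, `commit_config()` and `discard_config()` are the NAPALM driver methods used in this post.

```python
def commit_if_rmon_only(router):
    """Commit the loaded merge candidate only if all changes touch RMON.

    `router` is an opened NAPALM driver on which load_merge_candidate()
    has already been called. Returns True if the change was committed.
    """
    diff = router.compare_config()
    changed = [line[1:].strip() for line in diff.splitlines()
               if line.startswith(("+", "-"))]
    if changed and all(line.startswith(("rmon ", "no rmon ")) for line in changed):
        router.commit_config()
        return True
    # empty diff or unexpected (non-RMON) changes: throw the candidate away
    router.discard_config()
    return False
```

This is deliberately conservative: any line it does not recognise, including diff headers, leads to a discard rather than a commit.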

Verify the pushed commands at the router CLI: the candidate has been stored as merge_config.txt on the flash and is not active yet

IOS-RTR#dir *.txt
Directory of bootflash:/*.txt

Directory of bootflash:/

   21  -rw-         898  Nov 24 2017 15:05:32 +00:00  merge_config.txt
7835619328 bytes total (6613028864 bytes free)

IOS-RTR#more merge_config.txt
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
no rmon alarm 10001
rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Commit the changes

>>> router.commit_config()

Or discard them

>>> router.discard_config()

It’s possible to roll back committed changes.

>>> router.rollback()

Finally: Disconnect the session with the device

>>> router.close()

Again: a brief look at the router

IOS-RTR#show run | inc rmon
! Last configuration change at 19:08:59 UTC Fri Nov 24 2017 by rmond
! NVRAM config last updated at 19:09:00 UTC Fri Nov 24 2017 by rmond
username rmond privilege 15 secret 5 $1$7VnE$2O18Vfcr4y7eO5gY7l4xx1
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
rmon alarm 10001 cbQosCMStatsEntry.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 cbQosCMStatsEntry.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 cbQosCMStatsEntry.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 cbQosCMStatsEntry.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 cbQosCMStatsEntry.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 cbQosCMStatsEntry.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
IOS-RTR#

It has already been written to NVRAM:

IOS-RTR#show startup-config | inc rmon
! Last configuration change at 19:08:59 UTC Fri Nov 24 2017 by rmond
! NVRAM config last updated at 19:09:00 UTC Fri Nov 24 2017 by rmond
username rmond privilege 15 secret 5 $1$7VnE$2O18Vfcr4y7eO5gY7l4xx1
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
rmon alarm 10001 cbQosCMStatsEntry.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 cbQosCMStatsEntry.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 cbQosCMStatsEntry.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 cbQosCMStatsEntry.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 cbQosCMStatsEntry.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 cbQosCMStatsEntry.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
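To get from one router to hundreds, the same NAPALM sequence (load, compare, commit or discard, close) can run in a loop. A minimal sketch with a hypothetical `configure_many` helper; how a device is opened and how its config text is built are passed in as functions, so the loop itself does not depend on credentials or on the SNMP discovery.

```python
def configure_many(hosts, connect, build_config):
    """Push the RMON config to many devices, one after the other.

    Hypothetical helper. `connect(host)` must return an opened NAPALM driver,
    for example (not executed here):
        from napalm import get_network_driver
        router = get_network_driver("ios")(host, "user", "password")
        router.open()
    `build_config(host, router)` returns the CLI text (rmon events,
    'no rmon alarm' lines and new alarms) as built in this post.
    Returns {host: "committed" | "unchanged" | "failed: <error>"}.
    """
    results = {}
    for host in hosts:
        router = None
        try:
            router = connect(host)
            router.load_merge_candidate(config=build_config(host, router))
            if router.compare_config():
                router.commit_config()
                results[host] = "committed"
            else:
                # nothing to change: drop the candidate
                router.discard_config()
                results[host] = "unchanged"
        except Exception as exc:
            # one unreachable or failing device must not stop the whole run
            results[host] = "failed: %s" % exc
        finally:
            if router is not None:
                router.close()
    return results
```

The devices are processed sequentially to keep the sketch simple; a thread pool would be the obvious next step for a large inventory.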

Configure RMON Alarms&Events by script

Getting back to the original task...
Use a script on a centralized controller VM to figure out for which SNMP OIDs RMON alarms should be configured.

Get all current QoS drop counters, check the traffic direction to monitor only outbound queues, and generate RMON alarms.

from easysnmp import Session

hostname = "192.168.2.72"

session = Session(hostname, community='READ', version=2)

# walk all cbQoS drop counters (cbQosCMStatsEntry column 13)
cbqos = session.walk('1.3.6.1.4.1.9.9.166.1.15.1.1.13')

cmds = ["Configure on Host \""+hostname+"\"\n---"]
cmds.append("rmon event 10 log owner RMONevent")
cmds.append("rmon event 11 log owner RMONevent")

alarmID = 10001

for i in cbqos:
  # the last two OID components: policy index #P and class index #Q
  oidList=i.oid.split(".")
  q=oidList.pop()
  p=oidList.pop()
  # traffic direction of the policy (1 = input, 2 = output)
  ifDirID=int(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.3."+p).value)
  if (ifDirID==2):
    cmds.append("rmon alarm "+str(alarmID)+" "+i.oid+" 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent")
  # the ID advances for every counter, so inbound counters leave gaps
  alarmID += 1

for cmd in cmds:
  print(cmd)

Example Output:

Configure on Host "192.168.2.72"
---
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
rmon alarm 10001 enterprises.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 enterprises.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 enterprises.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 enterprises.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 enterprises.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 enterprises.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Todo: Verify existing RMON-Alarm/Event-Configuration at the device
Todo: Push the config automatically to the device

Getting Details of a Traffic Class from the SNMP-MIB

Today I’ll show how to retrieve additional details for the already discovered QoS counters. They are mostly descriptive, for human eyes.
The „Traffic-Direction“-Attribute is relevant, though: in most cases only outbound drop counters are of interest, so the discovered list of OIDs can be filtered to process only the outbound OIDs.

Refresh: Retrieve all „QoS Packet-Counters“

>>> cbqos = session.walk('1.3.6.1.4.1.9.9.166.1.15.1.1.2')
>>> print(cbqos)
[<SNMPVariable value='9' (oid='enterprises.9.9.166.1.15.1.1.2.18.65536', oid_index='', snmp_type='COUNTER')>,
<SNMPVariable value='0' (oid='enterprises.9.9.166.1.15.1.1.2.18.131072', oid_index='', snmp_type='COUNTER')>,
<SNMPVariable value='1035' (oid='enterprises.9.9.166.1.15.1.1.2.18.196608', oid_index='', snmp_type='COUNTER')>,
<SNMPVariable value='0' (oid='enterprises.9.9.166.1.15.1.1.2.34.65536', oid_index='', snmp_type='COUNTER')>,
<SNMPVariable value='0' (oid='enterprises.9.9.166.1.15.1.1.2.34.131072', oid_index='', snmp_type='COUNTER')>,
<SNMPVariable value='0' (oid='enterprises.9.9.166.1.15.1.1.2.34.196608', oid_index='', snmp_type='COUNTER')>]

There are two Policy-Objects #P:

  • Policy #18
  • Policy #34

Both Policy-Objects contain three Traffic-Classes #Q:

  • Class #65536
  • Class #131072
  • Class #196608
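The scripts in this series take #P and #Q from the last two components of each counter OID (two `pop()` calls). The same split as a small, hypothetical helper, which works for both the `enterprises.9...` and the numeric `1.3.6.1...` notation:

```python
def split_policy_class(oid):
    """Return (policy index #P, class index #Q) of a cbQosCMStatsEntry OID.

    Assumes the two index components are the last two parts of the OID.
    """
    *_, p, q = oid.split(".")
    return int(p), int(q)


print(split_policy_class("enterprises.9.9.166.1.15.1.1.2.18.65536"))  # → (18, 65536)
```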

Attributes of a bound Policy #P

Each Policy has (at least) three attributes:

  • Interface-Type of the Policy (e.g. 5 = controlPlane for CoPP)
    • 1:mainInterface
    • 2:subInterface
    • 3:frDLCI
    • 4:atmPVC
    • 5:controlPlane
    • 6:vlanPort
    • 7:evc
  • Traffic-Direction
    • 1:input
    • 2:output
  • Interface bound to
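The two enumerations above can be kept as dictionaries instead of lists padded with an empty first element. A small sketch with hypothetical names; unlike a list index, an unknown value (for example from a newer MIB revision) does not raise an `IndexError`.

```python
# Enumerations as listed above (cbQosServicePolicyEntry columns .2 and .3)
IF_TYPES = {1: "mainInterface", 2: "subInterface", 3: "frDLCI", 4: "atmPVC",
            5: "controlPlane", 6: "vlanPort", 7: "evc"}
DIRECTIONS = {1: "input", 2: "output"}


def describe_policy(if_type, direction):
    """Translate the raw INTEGER values into readable names."""
    type_name = IF_TYPES.get(if_type, "unknown(%d)" % if_type)
    dir_name = DIRECTIONS.get(direction, "unknown(%d)" % direction)
    return type_name, dir_name


print(describe_policy(1, 2))  # → ('mainInterface', 'output')
```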

Get the type of a Policy = „cbQosServicePolicyEntry.2.#P“ = „1.3.6.1.4.1.9.9.166.1.1.1.1.2.#P“

  • both are type „1“ = Main-Interface
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.2.18"))
<SNMPVariable value='1' (oid='enterprises.9.9.166.1.1.1.1.2.18', oid_index='', snmp_type='INTEGER')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.2.34"))
<SNMPVariable value='1' (oid='enterprises.9.9.166.1.1.1.1.2.34', oid_index='', snmp_type='INTEGER')>

Direction = „cbQosServicePolicyEntry.3.#P“ = „1.3.6.1.4.1.9.9.166.1.1.1.1.3.#P“

  • both are direction „2“ = Output
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.3.18"))
<SNMPVariable value='2' (oid='enterprises.9.9.166.1.1.1.1.3.18', oid_index='', snmp_type='INTEGER')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.3.34"))
<SNMPVariable value='2' (oid='enterprises.9.9.166.1.1.1.1.3.34', oid_index='', snmp_type='INTEGER')>

Interface-ID = „cbQosServicePolicyEntry.4.#P“ = „1.3.6.1.4.1.9.9.166.1.1.1.1.4.#P“ (cbQosIfIndex)

  • Policy#18 is bound to Interface #1
  • Policy#34 is bound to Interface #2
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.4.18"))
<SNMPVariable value='1' (oid='enterprises.9.9.166.1.1.1.1.4.18', oid_index='', snmp_type='INTEGER')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.4.34"))
<SNMPVariable value='2' (oid='enterprises.9.9.166.1.1.1.1.4.34', oid_index='', snmp_type='INTEGER')>

Interface-NAME = „1.3.6.1.2.1.2.2.1.2.#IFID“ (ifDescr)

  • Interface#1 is named „GigabitEthernet1“
  • Interface#2 is named „GigabitEthernet2“
>>> print(session.get("1.3.6.1.2.1.2.2.1.2.1"))
<SNMPVariable value='GigabitEthernet1' (oid='ifDescr', oid_index='1', snmp_type='OCTETSTR')>
>>> print(session.get("1.3.6.1.2.1.2.2.1.2.2"))
<SNMPVariable value='GigabitEthernet2' (oid='ifDescr', oid_index='2', snmp_type='OCTETSTR')>

Attributes of the Traffic-Classes #Q in Policy #P

Each Traffic-Class has (beyond of all counters) the attribute:

  • Name

Class-ID = „1.3.6.1.4.1.9.9.166.1.5.1.1.2.#P.#Q“ (cbQosConfigIndex)

  • Class #65536 in Policy #18 has ID #309479785
  • Class #131072 in Policy #18 has ID #342719994
  • Class #196608 in Policy #18 has ID #1593
  • Class #65536 in Policy #34 has ID #309479785
  • Class #131072 in Policy #34 has ID #342719994
  • Class #196608 in Policy #34 has ID #1593

Since the Class-IDs are the same, it seems to be one and the same policy-map bound to two interfaces.

>>> print(session.get("1.3.6.1.4.1.9.9.166.1.5.1.1.2.18.65536"))
<SNMPVariable value='309479785' (oid='enterprises.9.9.166.1.5.1.1.2.18.65536', oid_index='', snmp_type='GAUGE')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.5.1.1.2.18.131072"))
<SNMPVariable value='342719994' (oid='enterprises.9.9.166.1.5.1.1.2.18.131072', oid_index='', snmp_type='GAUGE')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.5.1.1.2.18.196608"))
<SNMPVariable value='1593' (oid='enterprises.9.9.166.1.5.1.1.2.18.196608', oid_index='', snmp_type='GAUGE')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.5.1.1.2.34.65536"))
<SNMPVariable value='309479785' (oid='enterprises.9.9.166.1.5.1.1.2.34.65536', oid_index='', snmp_type='GAUGE')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.5.1.1.2.34.131072"))
<SNMPVariable value='342719994' (oid='enterprises.9.9.166.1.5.1.1.2.34.131072', oid_index='', snmp_type='GAUGE')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.5.1.1.2.34.196608"))
<SNMPVariable value='1593' (oid='enterprises.9.9.166.1.5.1.1.2.34.196608', oid_index='', snmp_type='GAUGE')>
>>>

Class-NAME = „1.3.6.1.4.1.9.9.166.1.7.1.1.1.#CLASS-ID“ (cbQosCMName)

  • Class #309479785 has the name „CM_VOIP_RTP“
  • Class #342719994 has the name „CM_VOIP_CTRL“
  • Class #1593 has the name „class-default“
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.7.1.1.1.309479785"))
<SNMPVariable value='CM_VOIP_RTP' (oid='enterprises.9.9.166.1.7.1.1.1.309479785', oid_index='', snmp_type='OCTETSTR')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.7.1.1.1.342719994"))
<SNMPVariable value='CM_VOIP_CTRL' (oid='enterprises.9.9.166.1.7.1.1.1.342719994', oid_index='', snmp_type='OCTETSTR')>
>>> print(session.get("1.3.6.1.4.1.9.9.166.1.7.1.1.1.1593"))
<SNMPVariable value='class-default' (oid='enterprises.9.9.166.1.7.1.1.1.1593', oid_index='', snmp_type='OCTETSTR')>

Put it all together
Fetch the list of all „Packet-Counter“-OIDs.

Get the Queue/Class-Details and the Packet-Counter per Class.

from easysnmp import Session

session = Session(hostname='192.168.2.72', community='READ', version=2)

# walk all cbQoS packet counters (cbQosCMStatsEntry column 2)
cbqos = session.walk('1.3.6.1.4.1.9.9.166.1.15.1.1.2')

# index 0 is unused because the MIB enumerations start at 1
ifType=["","mainInterface","subInterface","frDLCI","atmPVC","controlPlane","vlanPort","evc"]
ifDir=["","input","output"]

for i in cbqos:
  # the last two OID components: policy index #P and class index #Q
  oidList=i.oid.split(".")
  q=oidList.pop()
  p=oidList.pop()

  ifTypeID=int(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.2."+p).value)   # interface type
  ifDirID=int(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.3."+p).value)    # direction
  ifID=session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.4."+p).value            # cbQosIfIndex
  ifName=session.get("1.3.6.1.2.1.2.2.1.2."+ifID).value                 # ifDescr
  classID=session.get("1.3.6.1.4.1.9.9.166.1.5.1.1.2."+p+"."+q).value   # cbQosConfigIndex
  className=session.get("1.3.6.1.4.1.9.9.166.1.7.1.1.1."+classID).value # cbQosCMName
  pktCounter=session.get(i.oid).value                                   # re-read the current counter
  print(ifName+"("+ifType[ifTypeID]+")"+ifDir[ifDirID]+" "+className+" "+pktCounter+" Packets")

Example Output:

GigabitEthernet1(mainInterface)output CM_VOIP_RTP 9 Packets
GigabitEthernet1(mainInterface)output CM_VOIP_CTRL 0 Packets
GigabitEthernet1(mainInterface)output class-default 1716 Packets
GigabitEthernet2(mainInterface)output CM_VOIP_RTP 0 Packets
GigabitEthernet2(mainInterface)output CM_VOIP_CTRL 0 Packets
GigabitEthernet2(mainInterface)output class-default 0 Packets
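The script above sends seven SNMP GETs per counter, 42 for the six counters of this example, although the policy attributes and class names repeat (see the identical Class-IDs above). Fitting the topic of this series, here is a sketch that asks each descriptive OID only once per run; for this example that is 17 GETs. Assumptions: `class_report` is a hypothetical name, `get(oid)` is any function returning the value of one OID (e.g. `lambda oid: session.get(oid).value` with easysnmp), and the counter values are taken from the walk instead of being re-read.

```python
IF_TYPES = {1: "mainInterface", 2: "subInterface", 3: "frDLCI", 4: "atmPVC",
            5: "controlPlane", 6: "vlanPort", 7: "evc"}
DIRECTIONS = {1: "input", 2: "output"}
POLICY = "1.3.6.1.4.1.9.9.166.1.1.1.1."          # cbQosServicePolicyEntry
CONFIG_INDEX = "1.3.6.1.4.1.9.9.166.1.5.1.1.2."  # cbQosConfigIndex
CM_NAME = "1.3.6.1.4.1.9.9.166.1.7.1.1.1."       # cbQosCMName
IF_DESCR = "1.3.6.1.2.1.2.2.1.2."                # ifDescr


def class_report(counters, get):
    """counters: list of (oid, value) pairs from a walk of cbQosCMStatsEntry.2."""
    cache = {}

    def cached_get(oid):
        # each descriptive OID is queried at most once per run
        if oid not in cache:
            cache[oid] = get(oid)
        return cache[oid]

    lines = []
    for oid, value in counters:
        p, q = oid.split(".")[-2:]
        if_type = IF_TYPES.get(int(cached_get(POLICY + "2." + p)), "?")
        if_dir = DIRECTIONS.get(int(cached_get(POLICY + "3." + p)), "?")
        if_name = cached_get(IF_DESCR + cached_get(POLICY + "4." + p))
        class_id = cached_get(CONFIG_INDEX + p + "." + q)
        class_name = cached_get(CM_NAME + class_id)
        lines.append("%s(%s)%s %s %s Packets"
                     % (if_name, if_type, if_dir, class_name, value))
    return lines
```

Usage with easysnmp could look like `class_report([(v.oid, v.value) for v in session.walk('1.3.6.1.4.1.9.9.166.1.15.1.1.2')], lambda oid: session.get(oid).value)`.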

Exploring the SNMP-MIB for Class-based QoS

Discover the OIDs representing the counter-values of all active traffic-classes

Cisco’s „SNMP Object Navigator“ (http://mibs.cloudapps.cisco.com/ITDIT/MIBS/servlet/index) is our friend to get the base-OID when you know the name of the MIB object:

  • Object-NAME <=> Object-ID (OID)
  • „cbQosCMStatsEntry“ <=> „1.3.6.1.4.1.9.9.166.1.15.1.1“

Each entry is the set of all counters shown by the „show policy-map interface“-command; the Object Navigator documents the IDs of these counters, too.
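In this series, column 2 of „cbQosCMStatsEntry“ is used for the packet counters and column 13 for the drop counters. A small sketch that builds the walk base or a single counter OID from these numbers; the MIB object names in the comments are as I remember them from the CISCO-CLASS-BASED-QOS-MIB and should be double-checked in the Object Navigator, and `counter_oid` is a hypothetical name.

```python
CBQOS_CM_STATS_ENTRY = "1.3.6.1.4.1.9.9.166.1.15.1.1"   # cbQosCMStatsEntry

# Column numbers used in this series (names to be verified in the MIB)
PRE_POLICY_PKT = 2   # cbQosCMPrePolicyPkt: packets per class
DROP_PKT = 13        # cbQosCMDropPkt: dropped packets per class


def counter_oid(column, policy_index=None, class_index=None):
    """Base OID of a column (for walks) or the OID of one counter instance."""
    oid = "%s.%d" % (CBQOS_CM_STATS_ENTRY, column)
    if policy_index is not None and class_index is not None:
        oid += ".%d.%d" % (policy_index, class_index)
    return oid


print(counter_oid(DROP_PKT))  # → 1.3.6.1.4.1.9.9.166.1.15.1.1.13
```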

Refresher: RMON @ Cisco IOS

RMON Refresher
Think about this given Router-Configuration:

class-map match-all CM_VOIP_CTRL
 match dscp af31
class-map match-all CM_VOIP_RTP
 match dscp ef

policy-map PM_OUT
 class CM_VOIP_RTP
  priority percent 10
 class CM_VOIP_CTRL
  bandwidth percent 1
 class class-default
  fair-queue
!
interface GigabitEthernet1
 ip address 192.168.2.72 255.255.255.0
 service-policy output PM_OUT

Three Queues at interface Gig1:

  • CM_VOIP_RTP
  • CM_VOIP_CTRL
  • class-default

with per-Queue-Statistics:

  • Packet counters
  • Drop-counters
  • etc.

In these first examples I don’t want to wait for queue drops. I’ll just generate DSCP=EF traffic with the ping command and watch the queue packet counters, not the drops.
Configure RMON Alarms and Events
I’ll add two RMON-Events

rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent

event #11 = rising-threshold: in my example, >=1 packet has been forwarded (or dropped) during the interval
event #10 = falling-threshold: no packets have been forwarded (or dropped) during the interval

Then, instruct the router to have a look at a QoS counter:

rmon alarm 10001 enterprises.9.9.166.1.15.1.1.2.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

In the upcoming post I’ll explore the cbQoS-MIB to illustrate where the „enterprises.9….65536“-parameter comes from.

This alarm #10001 monitors:

  • the value of the QoS counter with OID „enterprises.9.9.166.1.15.1.1.2.18.65536“ (packet counter of the RTP queue).
  • every 300s
  • watch for delta-values (not for absolute counters which might be interesting when monitoring temperatures, fan-speed etc…)
  • define a hysteresis:
    • rising: if the last counter-delta „was <1“ and „is now >=1“, it raises event#11.
    • falling: if the last counter-delta „was >=1“ and „is now <1“, it raises event#10.

Both events instruct the router to generate a syslog message.
In production, event 10 would be configured without the „log“-option [to do nothing]. This config is for demonstration purposes.
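The hysteresis can be replayed offline. The sketch below is a rough model of the RMON alarm logic described above, not Cisco’s implementation: a rising event fires when the delta reaches the rising threshold, and then only a falling event (delta at or below the falling threshold) can fire, and vice versa. Assumptions: both directions are armed at startup, the counter never wraps, and `simulate_delta_alarm` is a hypothetical name. The event numbers default to the ones configured here (rising 11, falling 10).

```python
def simulate_delta_alarm(samples, rising, falling, rising_event=11, falling_event=10):
    """Replay absolute counter samples (one per 300s interval) through a delta alarm.

    Returns a list of (interval, event) tuples.
    """
    events = []
    last = None  # last event fired: None, "rising" or "falling"
    for interval in range(1, len(samples)):
        # delta for this interval (assumes the counter did not wrap)
        delta = samples[interval] - samples[interval - 1]
        if delta >= rising and last != "rising":
            events.append((interval, rising_event))
            last = "rising"
        elif delta <= falling and last != "falling":
            events.append((interval, falling_event))
            last = "falling"
    return events


# counter 0 before the ping, 5 afterwards, then no more EF traffic
print(simulate_delta_alarm([0, 5, 5, 5], rising=1, falling=0))  # → [(1, 11), (2, 10)]
```

The third interval produces no event: the delta is still 0, but the falling event has already fired and the alarm waits for the next rising crossing.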

Forward some Traffic

Generate some traffic with TOS 184 (= DSCP 46 = Expedited Forwarding, EF):

IOS-RTR#ping 192.168.2.1 tos 184
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.2.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/5 ms
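The `tos` value of the ping command is the whole former IPv4 TOS byte, while the class-maps match the DSCP, which is its upper six bits. The conversion is a two-bit shift; for example, the CM_VOIP_CTRL class (`match dscp af31`, DSCP 26) should be reachable with `tos 104`, which I did not test in this post. Function names are hypothetical.

```python
def tos_to_dscp(tos):
    """The DSCP is the upper six bits of the former IPv4 TOS byte."""
    return tos >> 2


def dscp_to_tos(dscp):
    """Value for 'ping ... tos <n>': shift the DSCP past the two ECN bits."""
    return dscp << 2


print(tos_to_dscp(184))  # → 46 (EF, class CM_VOIP_RTP)
print(dscp_to_tos(26))   # → 104 (AF31, class CM_VOIP_CTRL)
```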

So – the Queue wasn’t used before:

  • the old-counter has been „0“
  • now 5 Packets have been forwarded

This delta-counter „5“ has exceeded rising-threshold „1“:

  • event#11 should be raised.

When?

  • we’ll have to wait between 0..300s for the cyclic 300s-alarm-interval to fire
IOS-RTR#
*Nov 20 14:54:39.015: %RMON-5-RISINGTRAP: Rising threshold has been crossed because the value of cbQosCMStatsEntry.2.18.65536 exceeded the rising-threshold value 1

The current counter is „5“:

  • without RTP data, the next delta-counter will be 0.

Wait for the next 300s interval:

  • the falling-event#10 should get raised.
IOS-RTR#
*Nov 20 14:59:38.837: %RMON-5-FALLINGTRAP: Falling threshold has been crossed because the value of cbQosCMStatsEntry.2.18.65536 has fallen below the falling-threshold value 0

Works, perfect!
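On the receiving side (the „listen for events“ part of the controller discussed further below), these syslog lines have to be turned back into data. A minimal parser sketch; the pattern is derived only from the two log lines above and may need adjusting for other IOS releases, and `parse_rmon_syslog` is a hypothetical name.

```python
import re

# Derived from the RISINGTRAP/FALLINGTRAP lines above
RMON_TRAP = re.compile(
    r"%RMON-5-(?P<kind>RISING|FALLING)TRAP: .*? the value of (?P<oid>\S+) "
    r"(?:exceeded|has fallen below) the (?:rising|falling)-threshold value (?P<threshold>-?\d+)"
)


def parse_rmon_syslog(line):
    """Return kind, OID and threshold of an RMON syslog message, or None."""
    match = RMON_TRAP.search(line)
    if not match:
        return None
    return {"kind": match.group("kind").lower(),
            "oid": match.group("oid"),
            "threshold": int(match.group("threshold"))}
```

The returned OID (e.g. „cbQosCMStatsEntry.2.18.65536“) still contains #P and #Q, so it can be mapped back to interface and class name with the lookups from the earlier posts.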

QoS Monitoring: Watch the Queues!?

Stop frequent polling of everything, please!
Last week I had to troubleshoot a customer network which was overwhelmed with SNMP queries, and it wasn’t the first one.
All switch and router CPUs were running at a high level because every tiny counter was polled at a high rate, just to provide real-time graphs to top-level management, who hopefully don’t waste their time watching these colourful pictures all day for entertainment purposes.

Doesn’t anybody remember RMON?
Years ago I taught routing & switching classes as a full-time Cisco/BayNetworks/Fluke instructor, and every switching class included a brief explanation of SNMP.

And about RMON.
RFC2819 – RMON (Remote Network Monitoring) MIB

4 out of 9 RMON-groups are available:

  • Statistics – Real-Time counters
  • History – not interesting here 😉
  • Alarms – how to monitor OIDs (statistics-counters for example) by the device itself, incl. a hysteresis
  • Events – what to do if hysteresis-thresholds are passed.

Covered in 10 slides, and I’m pretty sure I explained the difference between SNMP-GET/polling and RMON alarms & events/traps, and the negative impact of frequent polling.
Only 15 minutes were given to teach this. That might not have been enough.

But people still prefer to poll every second the same error counter value instead of waiting for traps indicating the new counter-value.

Don’t watch the queues: Let the devices watch and notify you if something happens.

Upcoming Project: RMON-QOS Controller
I decided to refresh an old project that helps people configure RMON alarms for Low-Latency-Queuing (LLQ) packet drops automatically.

Since the old code was Tcl-based and ran locally on the routers [which had advantages, too], I now want a centralized solution, and I want to take the chance to improve my Python skills.

Never start to implement before having a design

Brainstorming:

  • central controller
    • orchestrate features
      • discover outbound QoS-classes/queues
      • configure alarms&events(SNMP/RMON)
    • listen for events
    • provide persistent event-storage
  • distributed intelligence
    • watch specified (error-)counters
    • notify the central snmp-manager if something happens
    • no dumb devices, please, like in OpenFlow, LAN-Emulation or other failed technologies…

The central controller has to be built.

  • SNMP-/RMON-Agents will provide the distributed intelligence.

Next step: RMON@IOS Refresher

Tomorrow I’ll start with a „RMON@IOS Refresher“ to illustrate why you can’t implement RMON without some kind of automation, intelligence or whatever you call it.