Automated RMON Alarm/Event-configuration for class-based QoS-Monitoring using NAPALM

In Configure RMON Alarms&Events by script I’ve shown a short python-algorithm to to discover all Cisco class-based QoS (cbQoS) packet-/drop-counters and to generate RMON-alarms for each. The router monitors these counters every 300s, calculates the delta for the interval and raises RMON-events when there were packets/drops or when the have been before but not anymore.

This RMON-event has been configured as an syslog-message to an syslog-receiver etc.

The existing script just generated a list of cli-commands which had to be entered manually to the router-config.
Not a valid aproach when having hundreds devices to be configured.

Now i want the script to automatically configure the router.

  • add both „rmon event“-objects for the rising- and the falling-threshold of the monitored alarms
  • read the existing „rmon alarm“-objects from the device config, which have been configured by this script during a former run
  • remove these existing alarms
  • discover all cbQoS-packet/drop-counters
  • add corresponding „rmon alarm“-objects

I’d like to refer to Centralized access to device-configuration and other state-information using NAPALM for some basic information regarding NAPALM and how to create the „router“-object in python.

NAPALM: Read existing RMON alarms.
I’ll use the following python-logic to

  • remote-execute the command
  • immedeately pull the cli-output out of the python-dictionary: the CLI-Command is the dict-key
>>> cligetrmon=['show rmon alarms | inc RMONevent']
>>> rmonalarms = router.cli(cligetrmon)[cligetrmon[0]]
>>> print rmonalarms
Alarm 10001 is active, owned by RMONevent

Generate CLI to delete these RMON alarms

>>> cmdnormon = ""
>>> for alarm in rmonalarms.split('\n'):
...  alarmid = alarm.split(' ')[1]
...  cmdnormon += "no rmon alarm "+alarmid+"\n"
...
>>>
>>> print cmdnormon
no rmon alarm 10001

Static CLI to add required RMON events

>>> cmdrmonevent = "rmon event 10 log owner RMONevent\n"
>>> cmdrmonevent += "rmon event 11 log owner RMONevent\n"

Read Cisco cbQoS-MIB to fetch interesting QoS-counters, generate CLI for RMON-alarms

>>> from easysnmp import Session
>>> hostname = "192.168.2.72"
>>> session = Session(hostname, community='READ', version=2)
>>>
... cbqos = session.walk('1.3.6.1.4.1.9.9.166.1.15.1.1.13')
>>>
... cmdrmon = ""
>>> alarmID = 10001
>>>
... for i in cbqos:
...   oidList=i.oid.split(".")
...   q=oidList.pop()
...   p=oidList.pop()
...   #print p,q
...   ifTypeID=int(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.2."+p).value)
...   ifDirID=int(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.3."+p).value)
...   if (ifDirID==2):
...     cmdrmon += "rmon alarm "+str(alarmID)+" "+i.oid+" 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent\n"
...   alarmID += 1
...

Concatenate all commmands

>>> cmd = cmdrmonevent+cmdnormon+cmdrmon
>>> print cmd
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
no rmon alarm 10001
rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Push the commands to the Router

>>> router.load_merge_candidate(config=cmd)

Check the differences befor apply the changes

>>> print router.compare_config()
-no rmon alarm 10001
+rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Verify the pushed commands at the router-CLI

IOS-RTR#dir *.txt
Directory of bootflash:/*.txt

Directory of bootflash:/

   21  -rw-         898  Nov 24 2017 15:05:32 +00:00  merge_config.txt
7835619328 bytes total (6613028864 bytes free)

IOS-RTR#more merge_config.txt
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
no rmon alarm 10001
rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Commit the changes

>>> router.commit_config()

Or discard them

>>> router.discard_config()

It’s possible to rollback committed changes.

>>> router.rollback()

Finally: Disconnect the session with the device

>>> router.close()

Again: A brief look to the router

IOS-RTR#show run | inc rmon
! Last configuration change at 19:08:59 UTC Fri Nov 24 2017 by rmond
! NVRAM config last updated at 19:09:00 UTC Fri Nov 24 2017 by rmond
username rmond privilege 15 secret 5 $1$7VnE$2O18Vfcr4y7eO5gY7l4xx1
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
rmon alarm 10001 cbQosCMStatsEntry.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 cbQosCMStatsEntry.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 cbQosCMStatsEntry.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 cbQosCMStatsEntry.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 cbQosCMStatsEntry.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 cbQosCMStatsEntry.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
IOS-RTR#

It’s already written to NVRAM

IOS-RTR#show startup-config | inc rmon
! Last configuration change at 19:08:59 UTC Fri Nov 24 2017 by rmond
! NVRAM config last updated at 19:09:00 UTC Fri Nov 24 2017 by rmond
username rmond privilege 15 secret 5 $1$7VnE$2O18Vfcr4y7eO5gY7l4xx1
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
rmon alarm 10001 cbQosCMStatsEntry.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 cbQosCMStatsEntry.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 cbQosCMStatsEntry.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 cbQosCMStatsEntry.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 cbQosCMStatsEntry.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 cbQosCMStatsEntry.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

QoS Monitoring: Watch the Queues!?

Stop frequent polling of everything, please!
Last week I had to troubleshoot a network of a customer which was overwhelmed with SNMP-Queries – it wasn’t the first one.
All Switch- and Router-CPUs have been at high level, since every tiny counter was polled at high rate. To provide real-time graphs to the top-level-management. Which hopefully don’t waste time to watch these colourful pictures all day for entertainment purposes.

Doesn’t anybody remember RMON?
Years ago I’ve been teaching routing&switching-classes as a full-time Cisco/BayNetworks/Fluke-instructor, and in every switching class there was a brief explanation about SNMP.

And about RMON.
RFC2819 – RMON (Remote Network Monitoring) MIB

4 out of 9 RMON-groups are available:

  • Statistics – Real-Time counters
  • History – not interesting here 😉
  • Alarms – how to monitor OIDs (statistics-counters for example) by the device itself, incl. a hysteresis
  • Events – what to do if hysteresis-thresholds are passed.

Covered in 10 slides, and I’m pretty sure.. I’ve explained the difference between SNMP-GET/Polling and RMON-Alarms&Events/Traps and the negative impact of frequent polling.
Only 15 minutes time given to teach this. Might not been enough.

But people still prefer to poll every second the same error counter value instead of waiting for traps indicating the new counter-value.

Don’t watch the queues: Let the devices watch and notify you if something happens.

Upcoming Project: RMON-QOS Controller
I decided to refresh an old project to help people configuring rmon-alarms for Low-Latency-Queuing(LLQ) packet-drops in an automatic fashion.

Since the old code was TCL-based to run on the routers locally [which had advantages, too] I now want a centralized solution, and I want to take the chance to improve my python skills.

Never start to implement before having a design

Brainstorming:

  • central controller
    • orchestrate features
      • discover outbound QoS-classes/queues
      • configure alarms&events(SNMP/RMON)
    • listen for events
    • provide persistent event-storage
  • distributed intelligence
    • watch specified (error-)counters
    • notify the central snmp-manager if something happens
    • no dumb devices, please, like in OpenFlow, LAN-Emulation or other failed technologies…

The central controller has to be build.

  • SNMP-/RMON-Agents will provide the distributed intelligence.

Next step: RMON@IOS Refresher

Tomorrow I’ll start with a „RMON@IOS Refresher“ to visualise why you can’t implement RMON without some kind of automation, intelligence or how you call it.