QoS Monitoring: Watch the Queues!?

Stop frequent polling of everything, please!
Last week I had to troubleshoot a customer network that was overwhelmed with SNMP queries, and it wasn’t the first one.
All switch and router CPUs were running at high load because every tiny counter was being polled at a high rate, just to provide real-time graphs for top-level management, who hopefully don’t spend their days watching these colourful pictures for entertainment.

Doesn’t anybody remember RMON?
Years ago I taught routing & switching classes as a full-time Cisco/Bay Networks/Fluke instructor, and every switching class included a brief explanation of SNMP.

And about RMON.
RFC2819 – RMON (Remote Network Monitoring) MIB

Typically only 4 of the 9 RMON groups are available on switches and routers:

  • Statistics – real-time counters
  • History – not interesting here 😉
  • Alarms – the device itself monitors OIDs (statistics counters, for example), including hysteresis
  • Events – what to do when a hysteresis threshold is crossed.
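To make the Alarms/Events idea concrete, here is a minimal Python sketch of the rising/falling hysteresis an RMON alarm applies to a delta-sampled counter. It is a simplified model of the RFC 2819 alarm semantics, not Cisco’s implementation: it assumes the startup alarm allows only a rising event, treats each call as one sampling interval and only handles Counter32 wrap; `DeltaAlarm` and `run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

COUNTER32_MODULO = 2**32  # Counter32 values wrap around at 2^32


@dataclass
class DeltaAlarm:
    """Simplified model of an RMON alarm entry with sample type 'delta'.

    Assumptions: the startup alarm only allows a rising event, and each
    call to sample() corresponds to one sampling interval.
    """
    rising_threshold: int
    falling_threshold: int
    last_counter: Optional[int] = None
    rising_armed: bool = True    # may a rising event fire?
    falling_armed: bool = False  # may a falling event fire?

    def sample(self, counter: int) -> Optional[str]:
        """Feed one counter reading; return 'rising', 'falling' or None."""
        if self.last_counter is None:
            # The first reading only establishes the baseline for the delta.
            self.last_counter = counter
            return None
        delta = (counter - self.last_counter) % COUNTER32_MODULO
        self.last_counter = counter
        if self.rising_armed and delta >= self.rising_threshold:
            # Fire once, then stay quiet until the falling threshold is reached.
            self.rising_armed, self.falling_armed = False, True
            return "rising"
        if self.falling_armed and delta <= self.falling_threshold:
            # Re-arm the rising side only after the falling event.
            self.rising_armed, self.falling_armed = True, False
            return "falling"
        return None


def run(alarm: DeltaAlarm, readings: List[int]) -> List[str]:
    """Return only the events generated for a series of counter readings."""
    events = [alarm.sample(r) for r in readings]
    return [e for e in events if e is not None]


if __name__ == "__main__":
    drops = [0, 5, 120, 300, 310, 310, 500]  # hypothetical drop counter readings
    print(run(DeltaAlarm(rising_threshold=100, falling_threshold=20), drops))
    # → ['rising', 'falling', 'rising']
```

The hysteresis is the point: two consecutive intervals with heavy drops (deltas 115 and 180) produce one rising event, not two, and nothing more is reported until the drop rate has calmed down below the falling threshold.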

It was covered in 10 slides, and I’m pretty sure I explained the difference between SNMP GET polling and RMON alarms & events with traps, as well as the negative impact of frequent polling.
Only 15 minutes were allotted to teach this. Maybe that wasn’t enough.

But people still prefer to poll the same error counter every second instead of waiting for a trap that reports the new counter value.

Don’t watch the queues: Let the devices watch and notify you if something happens.

Upcoming Project: RMON-QOS Controller
I decided to revive an old project that helps people configure RMON alarms for Low Latency Queuing (LLQ) packet drops automatically.

Since the old code was Tcl-based and ran locally on the routers (which had its advantages, too), I now want a centralized solution, and I’ll take the opportunity to improve my Python skills.
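As a rough idea of what configuring RMON alarms “in an automatic fashion” could produce, this sketch renders IOS `rmon event` and `rmon alarm` lines from a list of discovered LLQ queues, following the documented `rmon alarm number variable interval {delta | absolute} rising-threshold value [event] falling-threshold value [event] [owner string]` syntax. The `LlqQueue` structure, thresholds, owner string and event numbering are hypothetical; the drop-counter OID is expected to come from a discovery step and is only a placeholder here. Trap destination configuration (`snmp-server host ...`) is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LlqQueue:
    """One discovered priority queue (hypothetical discovery result)."""
    interface: str
    class_name: str
    drop_oid: str  # OID of this class's drop counter, filled in by discovery


def render_rmon_config(queues: Iterable[LlqQueue], community: str,
                       interval_s: int = 60, rising: int = 100,
                       falling: int = 0, owner: str = "rmon-qos") -> List[str]:
    """Render IOS 'rmon event' and 'rmon alarm' lines for LLQ drop counters."""
    lines = [
        # Event 1: rising threshold crossed -> log locally and send a trap.
        f"rmon event 1 log trap {community} description LLQ-drops-rising owner {owner}",
        # Event 2: drops back at or below the falling threshold -> log only.
        f"rmon event 2 log description LLQ-drops-cleared owner {owner}",
    ]
    for index, q in enumerate(queues, start=1):
        # '!' lines are comments in IOS configuration.
        lines.append(f"! {q.interface} class {q.class_name}")
        lines.append(
            f"rmon alarm {index} {q.drop_oid} {interval_s} delta "
            f"rising-threshold {rising} 1 falling-threshold {falling} 2 owner {owner}"
        )
    return lines


if __name__ == "__main__":
    queues = [LlqQueue("GigabitEthernet0/1", "VOICE", "<drop-oid-from-discovery>")]
    for line in render_rmon_config(queues, community="MY-COMMUNITY"):
        print(line)
```

Sharing one pair of events across all alarms keeps the configuration short; per-queue events with individual descriptions would make the traps more self-explanatory at the cost of twice as many lines.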

Never start implementing before you have a design

Brainstorming:

  • central controller
    • orchestrate features
      • discover outbound QoS-classes/queues
      • configure alarms & events (SNMP/RMON)
    • listen for events
    • provide persistent event-storage
  • distributed intelligence
    • watch specified (error-)counters
    • notify the central SNMP manager if something happens
    • no dumb devices, please, like in OpenFlow, LAN-Emulation or other failed technologies…
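For the “provide persistent event-storage” item, a minimal sketch using Python’s built-in `sqlite3` module could look like this. The table layout and the `EventStore` class are hypothetical, and the trap receiver that would call `record()` for each decoded RMON rising/falling notification is out of scope here.

```python
import sqlite3
import time
from typing import Dict, List, Optional, Tuple


class EventStore:
    """Persistent storage for alarm notifications received by the controller."""

    def __init__(self, path: str = "rmon_events.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS events (
                   ts REAL NOT NULL,
                   device TEXT NOT NULL,
                   alarm_index INTEGER NOT NULL,
                   kind TEXT NOT NULL CHECK (kind IN ('rising', 'falling')),
                   value INTEGER
               )"""
        )
        self.conn.commit()

    def record(self, device: str, alarm_index: int, kind: str,
               value: Optional[int] = None, ts: Optional[float] = None) -> None:
        """Store one notification; the connection context manager commits it."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
                (time.time() if ts is None else ts, device, alarm_index, kind, value),
            )

    def open_alarms(self) -> List[Tuple[str, int]]:
        """Return (device, alarm_index) pairs whose latest event is 'rising'."""
        rows = self.conn.execute(
            "SELECT device, alarm_index, kind FROM events ORDER BY ts, rowid"
        ).fetchall()
        latest: Dict[Tuple[str, int], str] = {}
        for device, alarm_index, kind in rows:
            latest[(device, alarm_index)] = kind  # later rows overwrite earlier ones
        return sorted(key for key, kind in latest.items() if kind == "rising")


if __name__ == "__main__":
    store = EventStore(":memory:")  # use a file path for real persistence
    store.record("r1", 1, "rising", 150, ts=1.0)
    store.record("r2", 1, "rising", 120, ts=2.0)
    store.record("r1", 1, "falling", 0, ts=3.0)
    print(store.open_alarms())  # → [('r2', 1)]
```

Because the devices only report threshold crossings, the latest event per alarm is enough to tell which queues are currently dropping, without ever polling a counter.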

The central controller has to be built.

  • The SNMP/RMON agents will provide the distributed intelligence.

Next step: RMON@IOS Refresher

Tomorrow I’ll start with an “RMON@IOS Refresher” to show why you can’t implement RMON without some kind of automation, intelligence, or whatever you want to call it.
