Cisco CSR1000V Software Upgrade – Automated

No, there’s no need to export the IOS config, deploy another router VM from an OVA and import the old IOS config into the new router.

  • even though, if fine-tuned, this strategy might lower the downtime
  • VMware uses this strategy very successfully when upgrading NSX Edge Gateways

But this is a lab environment: I have to upgrade almost ten CSR1000V routers, and there’s no time to do it manually router by router.

The common process is the same as it has been for decades:

  • copy the new CSR1000V .bin file into the router’s bootflash
  • verify the file
  • set the boot-variable
  • reboot

Upload BIN-File into the routers
There are probably dozens of valid ways to get the .bin file into the router.

I prefer SCP (Secure Copy Protocol) since it uses the same firewall rules as SSH, so it’s unlikely that firewalls will disturb the update process.

  • I downloaded the .bin file using a Windows machine
  • I’ll use PSCP from the PuTTY software suite

Basics: Loop over a set of IPs in Windows Command-Shell?
That’s all:

C:> for %i in (235,241,240,239,236,237,238,242,243) do @echo %i
235
241
240
239
236
237
238
242
243

Let’s go


c:\Users\admin\Downloads>dir *.bin

 Directory of c:\Users\admin\Downloads

16.12.2017  17:44       365.660.728 csr1000v-universalk9.16.03.05.SPA.bin
               1 File(s),    365.660.728 bytes
               0 Dir(s), 73.892.016.128 bytes free

c:\Users\admin\Downloads>for %i in (235,241,240,239,236,237,238,242,243) do @start pscp -2 -scp -l rmond -pw rmondpass csr1000v-universalk9.16.03.05.SPA.bin 192.168.2.%i:bootflash:csr1000v-universalk9.16.03.05.SPA.bin

This will initiate 9 parallel SCP file transfers:

  • nobody said this would improve the transfer speed 😉
  • I’ll do something else in the meantime
9x PSCP-File-Transfers
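
The same loop can be sketched in Python, which is handy when the router list comes from a file or an inventory later on. This is only a sketch: the function name and constants are my own, the pscp options are the ones from the command above, and nothing is actually started here.

```python
# Sketch: build one pscp command line per router (names are illustrative).
IMAGE = "csr1000v-universalk9.16.03.05.SPA.bin"
LAST_OCTETS = [235, 241, 240, 239, 236, 237, 238, 242, 243]


def build_pscp_commands(image, last_octets, user, password, prefix="192.168.2."):
    """Return one pscp argument list per router, mirroring the cmd loop above."""
    commands = []
    for octet in last_octets:
        target = "{0}{1}:bootflash:{2}".format(prefix, octet, image)
        commands.append(
            ["pscp", "-2", "-scp", "-l", user, "-pw", password, image, target]
        )
    return commands


if __name__ == "__main__":
    for cmd in build_pscp_commands(IMAGE, LAST_OCTETS, "rmond", "rmondpass"):
        # Printing only; passing each list to subprocess.Popen would start
        # the transfers in parallel (assuming pscp is on the PATH).
        print(" ".join(cmd))
```

Building argument lists instead of one long string avoids quoting problems if a password or file name ever contains spaces.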

Verify the transferred images

import napalm
from easysnmp import Session
 
#credentials
DEVICE="192.168.2.235"
USER="rmond"
PASS="rmondpass"
SNMPRW="WRITE"
IOSFILE="bootflash:csr1000v-universalk9.16.03.05.SPA.bin"
IOSMD5="49922f08698284312379b4e0a2534bc2"
VERIFIED="Verified"

SNMPOIDReload="1.3.6.1.4.1.9.2.9.9.0"
SNMPOIDReloadVal=2
 
#instantiate NAPALM
iosdriver = napalm.get_network_driver('ios')
 
#connect to device
router = iosdriver(hostname=DEVICE, username=USER,  password=PASS, optional_args={'port': 22, 'dest_file_system': 'bootflash:'})
router.open()

#construct command to verify the integrity 
cliVerify=["verify /md5 "+IOSFILE+" "+IOSMD5]
result=router.cli(cliVerify)[cliVerify[0]]
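
The value in IOSMD5 has to come from somewhere. The vendor's published checksum is the better reference, but the downloaded file can also be hashed locally as a cross-check. A minimal sketch using only the standard library (the function name is my own):

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops as soon as read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Something like `IOSMD5 = file_md5("csr1000v-universalk9.16.03.05.SPA.bin")` would then feed the `verify /md5` command above.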

Set the boot variable and check whether it’s set

#possible results of the verify-command:
#  %Error verifying
#  Verified
if (result.find(VERIFIED)>-1):
    print("(1) uploaded File: OK")
    cmdBootSystem="boot system flash "+IOSFILE
    #push boot-system-command to router
    router.load_merge_candidate(config=cmdBootSystem)
    router.commit_config()

    cliShowBootvar=["show bootvar"]
    result=router.cli(cliShowBootvar)[cliShowBootvar[0]]
    #disconnect
    router.close()
    if (result.find("BOOT variable = "+IOSFILE)>-1):
        print("(2) boot-Variable set")
        print("=> Router "+DEVICE+" ready to reload")

Reload the Router using SNMP

        #snmp-server system-shutdown = 1.3.6.1.4.1.9.2.9.9.0 => Value 2 => Reload
        session = Session(hostname=DEVICE, community=SNMPRW, version=2)
        session.set(SNMPOIDReload,SNMPOIDReloadVal,"INTEGER")
else:
    #disconnect
    router.close()

The Router reboots

***
*** --- SHUTDOWN in 0:00:00 ---
*** Message from network to all terminals:
***
Null Message

Be patient.
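
Being patient can be scripted, too. A small sketch, assuming it is good enough to treat "TCP port 22 accepts connections again" as "the router is back"; the function name, timeout and interval are my own choices:

```python
import socket
import time


def wait_for_port(host, port=22, timeout=600, interval=10):
    """Poll a TCP port until it accepts connections or the timeout expires."""
    deadline = time.time() + timeout
    while True:
        try:
            conn = socket.create_connection((host, port), timeout=5)
            conn.close()
            return True
        except (socket.error, socket.timeout):
            # Not reachable (yet): give up after the deadline, otherwise retry
            if time.time() >= deadline:
                return False
            time.sleep(interval)
```

Note that the port may still answer for a moment right after the SNMP set, so a short pause before the first `wait_for_port("192.168.2.235")` call is probably a good idea.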

W-DCFW#show ver | inc IOS.*Version
Cisco IOS XE Software, Version 16.03.05
Cisco IOS Software [Denali], CSR1000V Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 16.3.5, RELEASE SOFTWARE (fc1)

The new software-release is up and running.
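
With almost ten routers, checking the version by eye gets tedious as well. A sketch that pulls the IOS XE version out of the `show ver` text, assuming the output contains a line shaped like the one above (the function name is my own):

```python
import re


def ios_xe_version(show_version_output):
    """Return the IOS XE version string, or None if the expected line is missing."""
    match = re.search(r"Cisco IOS XE Software, Version (\S+)", show_version_output)
    return match.group(1) if match else None
```

The text could come from `router.cli(["show version"])` once the NAPALM session has been reopened after the reload.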

Automated RMON Alarm/Event-configuration for class-based QoS-Monitoring using NAPALM

In Configure RMON Alarms&Events by script I showed a short Python algorithm to discover all Cisco class-based QoS (cbQoS) packet/drop counters and to generate RMON alarms for each. The router monitors these counters every 300s, calculates the delta for the interval and raises RMON events when there were packets/drops, or when there had been some before but not anymore.

This RMON event has been configured as a syslog message to a syslog receiver etc.

The existing script just generated a list of CLI commands which had to be entered manually into the router config.
Not a valid approach when hundreds of devices have to be configured.

Now I want the script to configure the router automatically.

  • add both „rmon event“-objects for the rising- and the falling-threshold of the monitored alarms
  • read the existing „rmon alarm“-objects from the device config, which have been configured by this script during a former run
  • remove these existing alarms
  • discover all cbQoS-packet/drop-counters
  • add corresponding „rmon alarm“-objects

I’d like to refer to Centralized access to device-configuration and other state-information using NAPALM for some basic information regarding NAPALM and how to create the „router“-object in python.

NAPALM: Read existing RMON alarms.
I’ll use the following python-logic to

  • remote-execute the command
  • immediately pull the CLI output out of the Python dictionary: the CLI command is the dict key
>>> cligetrmon=['show rmon alarms | inc RMONevent']
>>> rmonalarms = router.cli(cligetrmon)[cligetrmon[0]]
>>> print rmonalarms
Alarm 10001 is active, owned by RMONevent

Generate CLI to delete these RMON alarms

>>> cmdnormon = ""
>>> for alarm in rmonalarms.split('\n'):
...  alarmid = alarm.split(' ')[1]
...  cmdnormon += "no rmon alarm "+alarmid+"\n"
...
>>>
>>> print cmdnormon
no rmon alarm 10001
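
The loop above assumes that every line looks like "Alarm 10001 is active, owned by RMONevent". On a router without any matching alarm the output is empty and `alarm.split(' ')[1]` would raise an IndexError. A slightly more defensive sketch (the function name is my own):

```python
def build_no_rmon_alarm_commands(show_rmon_alarms_output):
    """Turn 'Alarm <id> is active, ...' lines into 'no rmon alarm <id>' commands."""
    commands = []
    for line in show_rmon_alarms_output.splitlines():
        fields = line.split()
        # Skip empty lines and anything that is not "Alarm <number> ..."
        if len(fields) >= 2 and fields[0] == "Alarm" and fields[1].isdigit():
            commands.append("no rmon alarm " + fields[1])
    # Keep the trailing newline so the result can be concatenated like cmdnormon
    return "\n".join(commands) + ("\n" if commands else "")
```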

Static CLI to add required RMON events

>>> cmdrmonevent = "rmon event 10 log owner RMONevent\n"
>>> cmdrmonevent += "rmon event 11 log owner RMONevent\n"

Read Cisco cbQoS-MIB to fetch interesting QoS-counters, generate CLI for RMON-alarms

>>> from easysnmp import Session
>>> hostname = "192.168.2.72"
>>> session = Session(hostname, community='READ', version=2)
>>> cbqos = session.walk('1.3.6.1.4.1.9.9.166.1.15.1.1.13')
>>> cmdrmon = ""
>>> alarmID = 10001
>>> for i in cbqos:
...   oidList=i.oid.split(".")
...   q=oidList.pop()
...   p=oidList.pop()
...   #print p,q
...   ifTypeID=int(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.2."+p).value)
...   ifDirID=int(session.get("1.3.6.1.4.1.9.9.166.1.1.1.1.3."+p).value)
...   if (ifDirID==2):
...     cmdrmon += "rmon alarm "+str(alarmID)+" "+i.oid+" 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent\n"
...   alarmID += 1
...
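
The two `pop()` calls take the last two components of each walked OID: `q` is the last one and `p` the one before it, and `p` is then used to look up the interface type and direction. The string handling can be sketched as two small functions without any SNMP access; the function names and default arguments are my own, the command text is the one generated above:

```python
def split_cbqos_index(oid):
    """Return the last two OID components as (p, q), like the two pop() calls."""
    parts = oid.split(".")
    return parts[-2], parts[-1]


def rmon_alarm_line(alarm_id, oid, interval=300, rising_event=11,
                    falling_event=10, owner="RMONevent"):
    """Build one 'rmon alarm' command with thresholds 1 (rising) and 0 (falling)."""
    return ("rmon alarm {0} {1} {2} delta rising-threshold 1 {3} "
            "falling-threshold 0 {4} owner {5}").format(
                alarm_id, oid, interval, rising_event, falling_event, owner)
```

Keeping these parts free of SNMP calls makes them easy to test against the output shown in the next step.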

Concatenate all commands

>>> cmd = cmdrmonevent+cmdnormon+cmdrmon
>>> print cmd
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
no rmon alarm 10001
rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Push the commands to the Router

>>> router.load_merge_candidate(config=cmd)

Check the differences before applying the changes

>>> print router.compare_config()
-no rmon alarm 10001
+rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
+rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Verify the pushed commands at the router-CLI

IOS-RTR#dir *.txt
Directory of bootflash:/*.txt

Directory of bootflash:/

   21  -rw-         898  Nov 24 2017 15:05:32 +00:00  merge_config.txt
7835619328 bytes total (6613028864 bytes free)

IOS-RTR#more merge_config.txt
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
no rmon alarm 10001
rmon alarm 10001 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 iso.3.6.1.4.1.9.9.166.1.15.1.1.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Commit the changes

>>> router.commit_config()

Or discard them

>>> router.discard_config()

It’s possible to rollback committed changes.

>>> router.rollback()

Finally: Disconnect the session with the device

>>> router.close()
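
The load/compare/commit-or-discard steps above can be wrapped into one function for unattended runs. A minimal sketch, assuming `device` is an already opened NAPALM driver object like `router`; the function name and the `dry_run` flag are my own:

```python
def apply_merge(device, config, dry_run=True):
    """Load a merge candidate, return the diff, and commit only when asked to."""
    device.load_merge_candidate(config=config)
    diff = device.compare_config()
    if dry_run or not diff:
        # Nothing to change, or only a preview was requested
        device.discard_config()
    else:
        device.commit_config()
    return diff
```

Calling `apply_merge(router, cmd)` first and reading the returned diff keeps the "check before apply" habit, and `apply_merge(router, cmd, dry_run=False)` then commits.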

Again: a brief look at the router

IOS-RTR#show run | inc rmon
! Last configuration change at 19:08:59 UTC Fri Nov 24 2017 by rmond
! NVRAM config last updated at 19:09:00 UTC Fri Nov 24 2017 by rmond
username rmond privilege 15 secret 5 $1$7VnE$2O18Vfcr4y7eO5gY7l4xx1
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
rmon alarm 10001 cbQosCMStatsEntry.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 cbQosCMStatsEntry.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 cbQosCMStatsEntry.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 cbQosCMStatsEntry.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 cbQosCMStatsEntry.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 cbQosCMStatsEntry.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
IOS-RTR#

It’s already written to NVRAM

IOS-RTR#show startup-config | inc rmon
! Last configuration change at 19:08:59 UTC Fri Nov 24 2017 by rmond
! NVRAM config last updated at 19:09:00 UTC Fri Nov 24 2017 by rmond
username rmond privilege 15 secret 5 $1$7VnE$2O18Vfcr4y7eO5gY7l4xx1
rmon event 10 log owner RMONevent
rmon event 11 log owner RMONevent
rmon alarm 10001 cbQosCMStatsEntry.13.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10002 cbQosCMStatsEntry.13.18.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10003 cbQosCMStatsEntry.13.18.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10004 cbQosCMStatsEntry.13.34.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10005 cbQosCMStatsEntry.13.34.131072 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent
rmon alarm 10006 cbQosCMStatsEntry.13.34.196608 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent

Centralized access to device-configuration and other state-information using NAPALM

Since I still want to build a centralized solution for the automated configuration of RMON alarms to monitor Cisco cbQoS packet/drop counters, I needed a way to retrieve and modify device configurations.

The NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support) Framework seems to provide the required features.
NAPALM Installation
Some NAPALM-IOS dependencies have to be fulfilled first.

sudo apt-get install -y --force-yes libssl-dev libffi-dev python-dev python-cffi

The „partial installation“ no longer seems to work. The full installation only uses a few KB more resources, so the partial installation isn’t worth thinking about…

pip install napalm

IOS Preparation
To allow remote access from the centralized NAPALM server, these features need to be enabled on each IOS device:

  • Remote-Access via SSH,
  • SCP (Secure Copy),
  • the IOS „Archive“-feature is the foundation of NAPALM config-operations.
IOS-RTR#conf t
Enter configuration commands, one per line.  End with CNTL/Z.

! AAA preferred for production-systems, of course
IOS-RTR(config)#username rmond privilege 15 secret rmondpass

! required
IOS-RTR(config)#ip scp server enable

! no annoying [yes/no]-prompts for file-operations anymore
IOS-RTR(config)#file prompt quiet

! create the folder in the filesystem for the Archive
IOS-RTR(config)#do mkdir bootflash:/ARCHIVE

IOS-RTR(config)#archive
IOS-RTR(config-archive)#path bootflash:/ARCHIVE/bak-
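
Before pointing NAPALM at a router, it can help to check that the preparation above is in place. A small sketch, assuming the commands show up verbatim as lines in the running-config; the helper and the REQUIRED_LINES tuple are my own names, not part of NAPALM:

```python
# Assumption: these commands appear unchanged as lines of the running-config
REQUIRED_LINES = ("ip scp server enable", "file prompt quiet", "archive")


def missing_prerequisites(running_config, required=REQUIRED_LINES):
    """Return the required config lines that are absent from the running-config."""
    present = set(line.strip() for line in running_config.splitlines())
    return [line for line in required if line not in present]
```

An empty list means all three lines were found. The check doesn’t look at the archive path or the user account.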

Explore Napalm

Go to the Python shell:

user@snmp-server:~$ python
>>> import napalm
>>> iosdriver = napalm.get_network_driver('ios')

Autodiscovery of the router filesystem doesn’t work. I know the filesystem of my router, so I pass it as an „optional argument“ to the router object.
*** todo: troubleshoot/fix the autodiscovery ***

>>> router = iosdriver(hostname='192.168.2.72', username='rmond',  password='rmondpass', 
optional_args={'port': 22, 'dest_file_system': 'bootflash:'})
>>> router.open()

Go to the router cli-shell:

  • user „rmond“ is logged in
IOS-RTR#who
    Line       User       Host(s)              Idle       Location
*  1 vty 0     user       idle                 00:00:00 192.168.2.109
   2 vty 1     rmond      idle                 00:00:02 192.168.2.89

Back to python, try some NAPALM-functions.

>>> print router.get_facts()
{u'os_version': u'CSR1000V Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 15.4(2)S3, RELEASE SOFTWARE (fc2)', 
u'uptime': 657600, u'interface_list': [u'GigabitEthernet1', u'GigabitEthernet2', u'GigabitEthernet3'], u'vendor': u'Cisco', 
u'serial_number': u'afdökjl0123', u'model': u'CSR1000V', u'hostname': u'IOS-RTR', u'fqdn': u'IOS-RTR.lab.local'}

Execute interactive EXEC-Commands.

>>> cliping=['ping 192.168.2.1']
>>> print router.cli(cliping)
{'ping 192.168.2.1': u'Type escape sequence to abort.\n
Sending 5, 100-byte ICMP Echos to 192.168.2.1, timeout is 2 seconds:\n!!!!!\n
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms'}

Fetch the current running-config.

>>> clishowrun=['show running-config']
>>> print router.cli(clishowrun)
{'show running-config': u'Building configuration...\n\nCurrent configuration : 2411 bytes\n!\n
! Last configuration change at 12:52:32 UTC Fri Nov 24 2017 by user\n! NVRAM config last updated at 10:24:26 UTC Fri Nov 24 2017 by user\n!\n
version 15.4\nservice timestamps debug datetime msec\nservice timestamps log datetime msec\n
...rmon event 10 log owner RMONevent\nrmon event 11 log owner RMONevent\n
rmon alarm 10001 cbQosCMStatsEntry.2.18.65536 300 delta rising-threshold 1 11 falling-threshold 0 10 owner RMONevent\n
...
end'}

Read selected lines of the command output, filtered using the pipe.

>>> cligetrmon=['show rmon alarms | inc RMONevent']
>>> print router.cli(cligetrmon)
{'show rmon alarms | inc RMONevent': u'Alarm 10001 is active, owned by RMONevent'}

Log out when you are finished:

>>> router.close()

QoS Monitoring: Watch the Queues!?

Stop frequent polling of everything, please!
Last week I had to troubleshoot a customer network that was overwhelmed with SNMP queries, and it wasn’t the first one.
All switch and router CPUs were running at a high level because every tiny counter was polled at a high rate, just to provide real-time graphs to top-level management, who hopefully don’t waste their time watching these colourful pictures all day for entertainment.

Doesn’t anybody remember RMON?
Years ago I taught routing & switching classes as a full-time Cisco/BayNetworks/Fluke instructor, and every switching class included a brief explanation of SNMP.

And about RMON.
RFC2819 – RMON (Remote Network Monitoring) MIB

4 out of 9 RMON-groups are available:

  • Statistics – Real-Time counters
  • History – not interesting here 😉
  • Alarms – how to monitor OIDs (statistics-counters for example) by the device itself, incl. a hysteresis
  • Events – what to do if hysteresis-thresholds are passed.

It was covered in 10 slides, and I’m pretty sure I explained the difference between SNMP GET/polling and RMON alarms & events/traps, as well as the negative impact of frequent polling.
Only 15 minutes were given to teach this. That might not have been enough.

But people still prefer to poll the same error counter every second instead of waiting for a trap that indicates the new counter value.

Don’t watch the queues: Let the devices watch and notify you if something happens.
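
What the device does on its own can be sketched in a few lines. This is a simplified model of an RMON alarm with delta sampling and hysteresis, not the exact RFC 2819 state machine: it assumes the alarm starts armed for the rising threshold and ignores counter wraps. The function name and the sample values are made up:

```python
def rmon_events(samples, rising=1, falling=0):
    """Return (sample_index, 'rising' or 'falling') events for counter samples.

    Delta sampling: each decision uses the difference between two consecutive
    samples, not the absolute counter value.
    """
    events = []
    armed_for = "rising"
    for index in range(1, len(samples)):
        delta = samples[index] - samples[index - 1]
        if armed_for == "rising" and delta >= rising:
            events.append((index, "rising"))
            armed_for = "falling"   # no second rising event until it fell again
        elif armed_for == "falling" and delta <= falling:
            events.append((index, "falling"))
            armed_for = "rising"
    return events


# A drop counter sampled once per interval: drops start, continue, stop, start again
print(rmon_events([0, 0, 0, 3, 8, 8, 8, 10]))
# → [(3, 'rising'), (5, 'falling'), (7, 'rising')]
```

Eight samples produce three notifications. A poller would have fetched all eight values to learn the same thing.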

Upcoming Project: RMON-QOS Controller
I decided to refresh an old project to help people configure RMON alarms for Low-Latency Queuing (LLQ) packet drops automatically.

The old code was Tcl-based and ran locally on the routers (which had advantages, too). Now I want a centralized solution, and I want to take the chance to improve my Python skills.

Never start to implement before having a design

Brainstorming:

  • central controller
    • orchestrate features
      • discover outbound QoS-classes/queues
      • configure alarms&events(SNMP/RMON)
    • listen for events
    • provide persistent event-storage
  • distributed intelligence
    • watch specified (error-)counters
    • notify the central snmp-manager if something happens
    • no dumb devices, please, like in OpenFlow, LAN-Emulation or other failed technologies…

The central controller has to be built.

  • SNMP-/RMON-Agents will provide the distributed intelligence.

Next step: RMON@IOS Refresher

Tomorrow I’ll start with a „RMON@IOS Refresher“ to visualise why you can’t implement RMON without some kind of automation, intelligence or whatever you call it.

Cisco UCS Director (UCSD) as Unified Infrastructure Controller

Yes, I agree: this product name can’t get worse.

What’s not good with the product name „UCS Director“? It needs explanation!

Nobody in this world could guess its feature set; everybody thinks it’s some additional umbrella management on top of UCS Manager or UCS Central.

„Unified Infrastructure Controller“ would fit much better, since the UCSD not only automates UCS-Components, but the whole Datacenter (and more) including LAN/SAN-Switches, Firewalls and the virtualization environment like vSphere or Hyper-V.