My networking experience is mostly in small environments.
Because of this, I don’t get to play with Cisco gear that much, and when I do it’s mostly their small business line.
Lately I’ve been exposed to their slightly more complex systems, including stacked switches.
I get the concept and it certainly has a place in the right environment.
Just not in most of my environments because of the cost of equipment and support.
So my first attempt at doing a minor revision update on a series of stacked switches didn’t go terribly well…
It begins with a client who has a random full stack restart in the middle of the day.
At first it could be written off as a power blip since no one was in the office to confirm power status.
Digging deeper though, the sh ver command says the switches were all rebooted by a reload command.
I pressed the customer on whether he or anyone else had been mucking around in the switches; he explained that no one there had, and that they probably wouldn’t even know how.
Lovely.
So I look for a scheduled task, nope, none found.
Googling shows that unless a reload is invoked by a command, script, scheduled task, or some other manual event, this shouldn’t happen.
I told him this could be a software bug, since he is running older switches (the latest firmware in this branch was released in 2017), and that it may be time to replace them, as Cisco states they are EOL.
The client is shutting down their office soon and would rather not waste the money on new switches.
Fair.
I suggest then that we leave well enough alone; a single reboot like this could’ve just been a one-off.
No dice.
He wants to update the firmware because the CEO has a monster presentation to give to over 1000 people and they can’t have this stack, that takes 20 minutes to reboot, go down.
Fair enough.
Now I know enough to be dangerous, but I have never professed to be a Cisco expert, or even moderately versed.
But it’s a firmware update, fairly simple across everything I’ve ever done, at least in the last 10 years.
Ever done a BIOS hot flash? I have, it’s terrifying. The fact that we have built-in BIOS recovery systems means I’ll never have to do that again.
So I confirm the process.
Get a TFTP server up, easy
Transfer the firmware to one of the switches, easy
Update the boot system to point to the new firmware, easy
Reload the switch, easy
Confirm the version, oh and you’ll get a stack version mismatch, easy
Copy the firmware to the other devices, update the boot system and reload them, easy
Well it would be, if I hadn’t executed every command against the whole stack except the firmware file copy
If you’re versed in Cisco environments you’re probably going “Oh no”, and you’d be right
The WHOLE stack rebooted, but only 1 switch had the firmware, crap
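In hindsight, a sanity check before the reload would have caught this. Here is a minimal sketch in Python of the idea, not a real tool: it never talks to a switch, and it assumes you have already collected a directory listing of each stack member's flash by hand. The function name, the member numbers, and the file names are all made up for illustration.

```python
from typing import Dict, List


def members_missing_image(flash_listings: Dict[int, List[str]], image_name: str) -> List[int]:
    """Return the stack member numbers whose flash listing lacks image_name.

    flash_listings maps a stack member number to the file names seen in
    that member's flash (gathered manually; hypothetical input format).
    """
    return sorted(
        member
        for member, files in flash_listings.items()
        if image_name not in files
    )


# Hypothetical listings: only member 1 received the new image.
listings = {
    1: ["old-image.bin", "new-image.bin"],
    2: ["old-image.bin"],
    3: ["old-image.bin"],
}

missing = members_missing_image(listings, "new-image.bin")
if missing:
    print("Do NOT reload yet, image missing on members:", missing)
else:
    print("Image present on every member")
```

With the made-up listings above it flags members 2 and 3, which is exactly the state my stack was in when I hit reload.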
Switch 1 comes up fine, but the others are stuck on a switch: prompt
Okay, I need to get these booted: either I transfer the firmware somehow or I roll the switches back.
Again I’m sure some Cisco experts have plenty of options at this point, I really could’ve used you!
Okay, how can I get these files over?
Can I TFTP from this low level environment?
Apparently yes!
So I try a copy tftp://server/filename flash:samefilename
D’oh, forgot to give the switch an IP address!
So I throw an IP address and gateway on the switch and try again.
Huzzah! It’s trying… and failing, going ungodly slow.
Okay, let’s go old school and try an XMODEM transfer over Tera Term…
About an hour, for 1 file, for each switch, it’s 3:30 and I want to be HOME by 6.
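For a feel of why serial transfers hurt, here is a rough back-of-the-envelope sketch in Python. It assumes 8N1 framing (10 bits on the wire per byte) and ignores XMODEM's own protocol overhead, so real transfers will be slower. The 40 MB image size and the baud rates are illustration values I picked, not the actual numbers from this job.

```python
def serial_transfer_minutes(size_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Best-case minutes to push size_bytes over a serial line at the given baud rate."""
    bytes_per_second = baud / bits_per_byte
    return size_bytes / bytes_per_second / 60


# Hypothetical 40 MB image at two common console speeds.
image_size = 40_000_000
for baud in (9600, 115200):
    minutes = serial_transfer_minutes(image_size, baud)
    print(f"{baud} baud: about {minutes:.0f} minutes")
```

Under those assumptions the 115200 case lands at roughly 58 minutes per file, and 9600 baud would be well over eleven hours, so multiply by the number of switches and you can see why I went looking for another way.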
Alright, let’s try finding a way to boot them, I still have the original firmware.
After finding nothing really useful I try picking boot files that seem logical and finally stumble on what I can only assume a properly trained Cisco admin would know: boot flash:packages.conf
It lives!
And it crashes!
As it starts scanning and detecting the stack it locks up.
DAMNIT!
Okay, let’s take them out of the stack.
Disconnect the cables, and it comes up!
Keep in mind each reboot takes 10-15 minutes, so I’m trying different things on different switches at the same time to cut down the waiting until I hit the right solution.
Each time one approach makes progress, I abandon the failing attempts on the other switches and catch them up while I keep working on whichever switch is up at that time.
Okay, I’ve got them up, just not functioning in the stack, let’s see what happens when I plug a switch with old firmware into the stack!
Another lock up, okay, that’s not working.
Back up after re-isolating, now I still need to get this firmware on.
Okay, let me get myself on to a network these will recognize and HOLY BALLS WHY SO MANY VLANS?!
These needed a cleanup years ago, a dozen or so VLANs when they only needed 5 at the MOST.
Ugh, so once a switch is up and I’m in enable mode, I run show int switchport to find an interface that’s on the same subnet my laptop is on, because it’s dumb to reconfigure my interface when the switches are already up.
I just want to connect to console mode, copy the file, and reboot.
So that’s exactly what I do.
I get this done on one of the failed switches, and issue a reload.
30 nail-biting minutes later, it’s up on the new firmware!
Connect to stack aaaaand, nothing happens!
No crash, no errors.
SUCCESS!
Do the same for the other switches and everything is up and running.
Worst part is, even if I had succeeded in reloading only a single switch, it would’ve crashed the stack anyway.
TL;DR:
Be careful when upgrading a stack’s firmware.
Have a Cisco expert on standby if you aren’t already one.