I oofed… – 5.10.2019

Today, I started the day off with an oof…

Picture this, Friday morning, I’m in the office early (7am) to start work on some configs before the campus started filling up with Admin, Faculty and students. I grab my coffee, sat down at your desk and logged my PC.

I was feeling confident, started studying my CCNP, felt like I knew my environment like it was the back of my hand, so I launched SuperPutty, configured my sessions ( over SSHv2. I proceed to show my current interface status:

#show interface status
From here, I saw that there were several updates that needed to be done.

Performing the show interface status command showed that I had several things that needed to be updated. To list a few: out of date descriptions, port-channel groups on non-redundant switchports, dated hostname, and various open ports that should be disabled.

So, I address the quick and dirty: renamed switchport descriptions, disabled unused ports, changed down ports to an arbitrary VLAN # in the event that it did come back online, and fixed my “archive” link that automatically backed up my config to a local tftp server. I know, I should be using SFTP, but I don’t know how to! Teach me.

Moving forward, I wanted to tackle the port-channels. I log into both switches (Core and the adjacent switch) and proceed to remove the fiber uplinks from the assigned channel-groups. Well, I started with the core…

That was a mistake, it completely threw my switches into error state. It’s now 8AM, students are rolling in and teachers are starting to check students into our Student Information System (SIS).

Then, all phones go offline, my primary DC (domain controller), goes offline, printers aren’t working and several people are calling me to tell me that the internet is not working!

It’s because I blocked Google. ahahaha, Just kidding.

I quickly realized that the switchports originally configured are now in errdisable state! I oofed! First off, I should have waited until the end of the day to make these changes so that this sort of thing can be mitigated without causing a service outage. Second off, I needed to be in two places at once so that I could cycle both ports, prevent loops, and then verify that they were both up and running but I was alone. My colleague was still on his way in for the day.

After several minutes of frustration, running back and forth on an 18-acre campus, I finally did it. I cycled the ports, waited two minutes for the entire core to come online (I did a full restart) – we were finally online. I then had to test the DC to make sure it was happy, pinged the phone system and validated that the printers were operational again.

We were back online after 15 minutes of outage due to my stupidity. Reflecting on my ID-10-T error, I realized that I should have waited until the end of the day, notified my head master that I wanted to make the changes, scheduled the change and then proceeded to a planned change. Remember to stay humble and grounded. You can prevent these mistakes.

Lesson learned.

Leave a Reply

Your email address will not be published.