June 26, 2006
The router at work died late Thursday night. I came in Friday morning to find all the WAN lights showing errors, and a power-cycle just made them all go dark forever. Which is only a problem for however many dozens of web sites we run, and about three dozen clients' e-mail.
After doing some testing with Verizon to make sure everything else was OK (they were able to test the line right up to the patch panel in the rack area), we called in an emergency tech to get our old Cisco working.
Late last year we were running out of our allotment on IP addresses, so we requested, and received, a block of 64 to go with our original 16. These two sets of IP addresses aren't contiguous. Since we didn't have the manuals to configure the router ourselves we just waited until we got the dual T1 put in a couple months later. The old Cisco can only handle a single T1 so it'd have to be replaced anyway.
The new router, which had all the IP addresses configured on it and handled both T1s, is the one that failed. We really wanted to get that second IP block on the Cisco to avoid disruption of services (especially secure web sites, which have to use different port numbers like www.example.com:444 when they share an IP address), and because our name servers were both on the new block of addresses. We needed someone who knew Cisco routers, and that's who we were told we'd get.
This guy wasn't what you'd call an expert. After half an hour he'd managed to do a password recovery (that was another obstacle to us updating the thing), then spent the time from 12:30 to 4:30 -- at $190 an hour -- failing to add the second IP block to the Cisco. At 4:30 the boss told me to have him just get the old addresses working again (they'd come and go as he futzed with the thing) and transfer the web sites and name server off the new block until our replacement Tasman router (yay, warranties) comes in on Monday.
So at 4:30 I started making the changes to the web servers, to run each server on one of the old addresses. This required disabling the streaming media server for the weekend, but oh well. Then I modified any sites that used SSL to do the different-port thing I mentioned before. After that I made the changes at NSI (to point to our name server's new address) and updated the bare minimum of A records.
While I was doing that, the other person in the office (boss and his wife were finishing their vacation) started calling all the clients I had phone numbers for to let them know that things were fixed, and that in some cases it may take a few hours for everything to get back to normal.
So all the work I was planning on doing Friday will now have to be done Saturday and Sunday, since the project I'm working on still needs to deliver when the clients' butts hit their chairs Monday morning. The long weekend for Independence Day can't come fast enough.
Edit, 6/26 10:29 AM: And it turns out that with only one incoming line, you can't have discontiguous IP blocks. You'd really think an "expert" would know that...