A Failing Switch

Datacenter Diary #1

I’m starting a new category on my blog, “Datacenter Diary.” It chronicles my adventures in the “Server Canyon” (or other) realm.

In this post I recount dealing with our first major hardware failure. Hardware fails all the time, but this was a single-point-of-failure device: our main backbone switch.

Network switch with a fan warning
Fan light lit up but temp seems fine

I’ll admit I might have been ignoring an ever-louder rattle for a few weeks, but when the warning light lit up it was time for action.

I picked up a new (previously enjoyed and lovingly revitalized) switch from Unix Surplus. I have no special relationship with them, just a happy customer. I ended up with a fancy Dell/Force-10 S60-44T-AC-R. I picked this device because the Server Canyon network sees large bursts of traffic and this device has a pretty large packet buffer. Plus, it has two 10-gigabit ports that I will definitely take advantage of later.

Force-10 glamor shot

The first thing to do was copy the config from the old PowerConnect to the new Force10 switch. This turned out to be a little more difficult than expected since they were different product lines, but a switch is a switch and I got it sorted.

Luckily, nearly every piece of server infrastructure has redundant uplinks. This means that I just needed to trunk the switches and start moving servers over. The management and client networks were disrupted only for as long as it took to move the plugs over, and the storage network uses redundant links, so it saw no real downtime.

Here you can see an in-progress picture of the migration:

The process ended without much fuss; I only managed to miss tagging one VLAN.
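
A missed tag like that is the kind of thing a quick diff can catch before the cables move. Here is a minimal sketch in Python, assuming the VLAN memberships for each link have already been pulled out of both configs by hand into plain dictionaries. The function name, link names, and VLAN IDs are all made up for illustration; none of them come from the real Server Canyon config, and nothing here parses actual switch output.

```python
from typing import Dict, List, Set


def find_missing_tags(
    old: Dict[str, Set[int]], new: Dict[str, Set[int]]
) -> Dict[str, List[int]]:
    """Return, per link, the VLAN IDs tagged on the old switch but not on the new one."""
    missing: Dict[str, List[int]] = {}
    for link, vlans in old.items():
        # A link that is absent from the new config counts as having no tags at all.
        gap = vlans - new.get(link, set())
        if gap:
            missing[link] = sorted(gap)
    return missing


# Hypothetical data: link name -> set of tagged VLAN IDs.
old_tags = {"trunk": {10, 20, 30}, "storage-a": {30}}
new_tags = {"trunk": {10, 20}, "storage-a": {30}}

result = find_missing_tags(old_tags, new_tags)
print(result)
```

An empty result means every VLAN tagged on the old switch is also tagged on the matching link of the new one. The check only looks in one direction, so extra tags on the new switch are not reported.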

Next, a post-mortem on the old switch. That is a problem for another blog post though!

