Thursday, May 18, 2006
16:18 - How Not to Do It
Around three months ago, I got an e-mail from my hosting company, Managed.com. This message was to inform me that within the coming few months, all their dedicated and colocated servers in their San Jose data center were to be migrated to a new and better data center in New Jersey.
A little background is probably in order. I have not one but two dedicated servers with Managed.com, a company I chose for its low prices, its physical nearness to me, and—in no small part—its swanky domain name that seemed to indicate that it was the gold standard in the hosting industry. (I was later to discover that this was not really the case, that many other hosting companies offering similar services and often for better prices and with better service and terms existed in plenty; but once established at a given hosting location, I am loath to move, because of the logistics involved.) To keep from veering too very much off-topic, I need these two servers, with complete root access, a supported FreeBSD installation, and a significant monthly transfer allotment. One of these servers (henceforth Server 1) is a fan site supporting a community nearly twelve years old, including hundreds of primary user e-mail and shell accounts and Web space for many virtual domains representing sites run by my users, as well as this blog; and the other (Server 2) is a highly popular fan-art community gallery for the same subject matter. We're talking many gigabytes of data on both servers. In other words, this blog is one of the least impactful and least attention-consuming things I'm in charge of running.
Things have been chugging along with the usual (I presume) ups and downs for the past two or three years. I've certainly had my run-ins with Managed.com, whose support staff seems to alternate between being monumentally clueless and being unaccountably deft at what they do. I'd grown used to the fact that Managed seemed to use some of the shoddiest hardware known to man, in that every two or three months I'd start seeing random hard drive I/O errors getting thrown in my kernel logs, and/or the machine would start pathologically crashing for no particular reason I could diagnose. I'd grown used to sending reboot requests to an e-mail address that seemed to pop out of its wormhole in India or Taiwan or somewhere, where the techs seemed very capable of rebooting my servers if they became wedged, but didn't have any other hands-on capabilities. (I guess they can do remote-reboots from halfway around the world, but they can't do any actual administration.) The on-site support staff often went out of their way to resolve my problems, usually by swapping my hard drive into a new chassis, after which the crashes would magically stop—at least for a few months, until the new hardware started getting flaky too. I tended to avert my mind from thinking about what these chassis were made of or actually looked like—the proverbial mental picture of gum and baling wire was what prevailed.
Indeed, about a month ago the hard drive started flaking out; I contacted support, and they offered to re-image me a brand-spanking-new machine, to which I could restore my heaping gigabytes of data. I didn't much like this idea; you see, my backup solution is to use the brilliant CVSup to mirror the contents of both machines to a FreeBSD box sitting in my garage. This worked great as a backup+restore solution back when we had a T1 into the house and thus symmetric transfers with static IPs; but that cost me more than $500 per month, and with cable Internet coming in at something like ten times the speed for one-tenth the cost, it was only a matter of time before I switched. The downside, of course, is the asymmetry—not only are uploads from the garage only 768K or so, I no longer have a static IP from which to run a CVSup server and contact it from outside the LAN to suck the data back up to the servers after being re-imaged. And thus my backup+restore solution is now a backup-only solution. Granted, I could make it work, but it's much simpler for me to be able to get the hosting company to simply remount the old hard drive alongside the new one and allow me to copy the data over at my leisure. This is what I convinced them to do a month ago, for a $50 above-and-beyond-the-norm service fee, and all was well.
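To put a rough number on that asymmetry, here is a back-of-the-envelope sketch. It assumes "768K" means about 768 kilobits per second and ignores protocol overhead; the 10 GB figure is a made-up stand-in for "heaping gigabytes", not a measurement from either server.

```python
def upload_hours(gigabytes: float, kbit_per_s: float = 768.0) -> float:
    """Hours needed to push `gigabytes` (decimal GB) through the uplink."""
    bits = gigabytes * 1e9 * 8              # bytes -> bits
    seconds = bits / (kbit_per_s * 1000)    # kbit/s -> bit/s
    return seconds / 3600

# Hypothetical 10 GB restore over the assumed 768 kbit/s cable uplink.
print(f"{upload_hours(10):.1f} hours")  # prints "28.9 hours"
```

Under those assumptions, a restore from the garage would take more than a day of saturated uplink per ten gigabytes, before the missing static IP even enters into it.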
Well, no it wasn't. During that procedure, after the operating system was re-imaged, Managed.com sent me an e-mail containing the fresh system's root password, which turned out not to work. But because it was late on a Friday, my increasingly worried inquiries received no response until some three days later. To get some sense of what this implies: Sendmail on FreeBSD is enabled by default. But none of the data had been restored, which meant none of the user accounts had been restored, and I couldn't SSH in to either disable Sendmail or restore the data. Thus, all the e-mail coming in addressed to any of the 500+ shell users on my system was getting bounced back, not with a "Transient failure" or an "Operation timed out" or anything like that that would have caused the senders to keep trying, but with a "No such user" message, a permanent failure. Hundreds of users were losing messages by the armload. And day after day passed without there being a dang thing I could do about it. Nothing, that is, except turn off my nightly backup process on my garage server, to prevent it from deleting all its formerly backed-up data to match the server's present blank-slate state.
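The difference between those two kinds of bounce comes down to the first digit of the SMTP reply code: 4xx replies ask the sending server to retry later, while 5xx replies tell it to give up. A minimal sketch of that rule follows; the function name is invented for illustration, and the sample codes are the standard ones rather than anything taken from the server's logs.

```python
def classify_smtp_reply(code: int) -> str:
    """Classify an SMTP reply code by its first digit."""
    if not 200 <= code <= 599:
        raise ValueError(f"not a valid SMTP reply code: {code}")
    return {
        2: "success",            # e.g. 250, message accepted
        3: "intermediate",       # e.g. 354, go ahead and send the body
        4: "transient failure",  # e.g. 451, sender should queue and retry
        5: "permanent failure",  # e.g. 550, mailbox unavailable; sender bounces
    }[code // 100]

for code in (250, 451, 550):
    print(code, classify_smtp_reply(code))
```

A freshly imaged machine with no user accounts answers with the last kind, which is why the senders stopped trying.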
Finally someone got back to me, disabled Sendmail, and told me what the real root password was (they'd typoed). I restored the data and got everything back to normal.
A month passed sleepily in the Shire. And then came May 3.
Sometime during the late afternoon, I stopped being able to contact Server 2 (fanart). Server 1 was still up, though, so this blog was still accessible, as were the main site and all my virtual domains; I had long forgotten about the announced migration to New Jersey, and I'd assumed (foolishly) that Managed would, y'know, like, warn me about it right before taking my servers offline. So I assumed that it had simply crashed, as it was so wont to do on this hardware. I sent a message to the reboot address, which generated the usual trouble ticket. A few hours later, someone in India wrote back to me to tell me that the server was in the process of being migrated. "Our admins will get back to you when it is done withis 24 hours," he said. 24 hours? Great. I can live with that.
But 24 hours came and went. Friday, the 5th, dawned without any word on the progress. I wrote back to the reboot guy for an update. "Yes, I believe we are expecting these servers to be online tonight," was his answer. Mmmkay.
Shortly after that, Server 1 went offline.
And in an eyeblink, it was the weekend, and I knew I couldn't expect any further responses from their "24/7" support service. I could only imagine the chaos they must be undergoing, whether they were dd'ing all the thousands of hosted servers' data across the network to the new data center's machines, or whether they'd put all the hard drives in a FedEx truck and barreled them across the country with instructions to ignore all stoplights and crosswalks. Nonetheless, it did nothing to assuage my concerns when I received the following on the evening of Sunday the 7th:
As you already know, Managed.com is being migrated to a new and better datacenter. All of the clients are migrated to new and better hardware. We are switching your existing servers to State of the art SuperMicro Servers that have better and faster performance. The new datacenter will have better Network; you will have our state of the art Trouble ticket system and advanced port monitoring systems. During the migration you may experience downtime, due to the migration to new hardware. We apologize for any downtime that you are having. Our Techs are working round the clock to migra t e your server to new hardware and confirm that the server is up and running perfectly. Please bear with us during the migration period. After the migration you will receive an email with new trouble ticket system access. New ticket system provides better and faster response so all of your issues will be solved on the fly. If you have any questions, please email email@example.com that our support team can assist you and create you trouble ticket login. Thank you WebHostPlus
I should point out that there should be a big [sic] next to all the quoted material here.
This message, with all its vacuous promises, did nothing to ease my stress over the downtime, which had by this point reached two days for Server 1 (including the blog) and four days for Server 2. Indeed, it spurred me to remember all the incidents of shoddy support, canned e-mail replies chosen from a drop-down menu (and often repeated robotically to me even after I explained that the previous canned reply didn't answer my question), crappy flaky hardware, and the fact that a quick Google search revealed a great many other hosting companies offering their own services, often greatly superior on paper to Managed's offering, and for quite a bit less money. And that's just within the subset of those hosting companies that support FreeBSD. In the heat of the moment, I made not one but two orders: one to The Planet (which came highly recommended by friends), and another to Hostik (which offered twice the hard disk space and transfer volume of any of its competitors, and was semi-local to boot). I figured I'd have to talk with either of them before they approved my order, so I could cancel as needed and save time this way. (I did cancel the server at The Planet on Monday the 8th, but the Hostik order I placed on hold, telling the salesperson that I'd placed the order while under mental duress, and that I wanted to give Managed some time to get their act together and to prove that the new data center really was "all that".)
My placing these orders was tempered with a sense of futility, however. The reason for this was that I knew I'd have to populate the new servers with my old data... and the only accessible backup that I had was sitting in my garage, behind a NAT, on a slow cable uplink. I could, if it came down to it, take the server to work and put it on the DMZ or something. But that was only to ignore another fatal flaw in my plan, which only made itself present upon close inspection:
After the previous month's hard drive debacle, I had not resumed the backup process. So my backup was a month old.
There's not much that's so effective in making the bottom fall out of your stomach as a discovery like this.
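One cheap guard against this kind of surprise is a freshness check on the backup itself. The sketch below assumes a hypothetical setup in which the nightly job touches a stamp file after every successful run; the stamp file, the function names, and the two-day threshold are all illustrative and were not part of the setup described here.

```python
import os
import time

def days_since_last_backup(stamp_path, now=None):
    """Days elapsed since the stamp file was last touched."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(stamp_path)) / 86400.0

def backup_is_stale(stamp_path, max_age_days=2.0, now=None):
    """True if the last successful backup is older than max_age_days."""
    return days_since_last_backup(stamp_path, now) > max_age_days
```

Run daily from cron on the backup box, with a mail to the admin whenever `backup_is_stale` returns true, a check like this would flag a paused backup after a couple of days rather than a month. A stamp file is used instead of the mirrored files' own timestamps because a mirroring tool may preserve the source's modification times.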
My servers remained offline throughout the 8th. I received no word from their support staff. I did, however, get my automatic EFT billing notice, which I thought was pretty tacky.
I resumed my fitful reloading of Managed's support page, which seemed to be only intermittently accessible by then, and which had been replaced with the same status message that had been mailed out to everybody on the 7th. (I also kept checking WebHostPlus' site, looking for any mention of a status message or any hint of a time when their "Live Chat" button read anything but "Unavailable". It was evident that WHP had bought Managed some months before, and this migration was the final stage in the acquisition. You'd think they'd mention that in their acquisitions news at the bottom of the page.) Finally, on Tuesday the 9th, the page was abruptly updated with this:
Managed Migration May, 9 2006.
Dear Migrated Customers, As you already know, your machines have been migrated from CA to a state of the art datacenter in NJ. As of tonight we are 90% done with bringing all of the servers online connected to our redundant gigabit network. We know many of you are wondering about your machines which are still not back online; please rest assured, we are in possession of your original HD and ALL of your data is safe. We plan within the next 24 hours to have ALL managed customers online in NJ on faster redundant hardware, as well as network. We understand that there has been mass confusion over the past and we greatly apologize; however, once all machines are online your customer support will resume as normal and you will not be experiencing long wait times for reboots, nor unanswered emails or tickets. Please begin contacting our support team firstname.lastname@example.org so they can create you a trouble ticket login on our system. Please include your First/Last Name, Server IP, and email address inside of the request so we can promptly create your login. If your machine has already come back online, you need not concern yourself with this email.
This answered some questions, but raised others. Yay, my data was safe! But to what did I owe my servers both being in the lucky 10% that were not yet online? I started to suspect that the fact that they were running FreeBSD might have something to do with it, because poking through WHP's server offerings, I didn't find any mention of FreeBSD support. I was starting to wonder if maybe WHP hadn't realized until late in the acquisition process that this meant they'd have to start supporting FreeBSD, and I also wondered whether 10% was a reasonable figure for the proportion of Managed's customers who were running FreeBSD. It didn't seem out of the question. Nothing much did.
But it came with an e-mail address, which I used with all my might, along with sending messages to Managed's support address, leaving messages on the live-chat board, and trying every other avenue I could find that allowed me to express my displeasure (there weren't many). Note that I was being as pleasant as possible, withholding any notes of impatience from my typed voice. These guys might well be in over their heads, having bitten off way more than they could chew. I didn't put any more faith in this new "within the next 24 hours" figure than I had in any of the previous ones, but I still was resigned to the fact that these guys had my data. I wasn't about to endanger it with an ill-aimed sling and/or arrow.
So I decided to change my tactics. I deleted a nearly-completed, vitriol-filled flame message that I was writing, and wrote a new one, which read:
Subject: I am pleading with you
What do I need to do to get a response from you-- any of you?
I'm willing to bake a plate of cookies just for an e-mail. Do you want
money? Gold? Jewels? What price do you demand? What do you want from me?
PLEASE reply. PLEASE.
And lo! The following morning, Thursday the 11th, I got a reply, from a guy billing himself as the "Director of Sales" at WHP. It read, and I quote:
We are checking on the servers onw
...Well! That's certainly progress.
Another day passed. I wrote back to the Director of Sales, asking as politely as I could whether the "checking" was still taking place. No reply. I noticed that his e-mail had contained a phone number, which I called, noticing the thick Manhattanite accent on the automated menu system. Finally I got the guy's voice-mail, on which I prepared to leave a message. BEEP! I prepared to speak, but then: This user's mailbox cannot accept any new messages.
Curses! Stymied once more. So I went back to reloading the support page. Eventually my patience paid off:
Managed Migration May, 12 2006.
WHP is proud to announce the migration has finally come to an end. We have approximately 99% of clients up and running and have almost hit the 10GB mark. All of the clients are now located at WebHostPlus, in Fort Lee NJ. At this new datacenter we have 100% redundancy in every aspect from the power to the pipes. We also utilize our in-house monitoring system which our administrators watch 24/7 for machines which unexpectedly go down. Many clients have noticed they might not be on the correct hardware; please rest assured we are aware of this and it is only temporary. In order to get over 7500 clients up in such a short period of time we had to utilize transit hardware. In the next 7 days you will begin receiving emails which you will need to reply to with your previous hardware specs. At that point we will schedule you for your hardware upgrade to keep downtime under 20minutes. We would also like to take this time to introduce you to NMS, Network Monitoring System. Many of Managed’s previous customers utilize firewalls thus not allowing the systems to be monitored. During the next week as we add your machines to our NMS you will receive an email with the NMS ip. Please add this IP to your allow list so our administrators can monitor your machines. Please begin contacting our support team email@example.com so they can create you a trouble ticket login on our system. Please include your First/Last Name, Server IP, and email address inside of the request so we can promptly create your login. Updates will be posted daily so please check back for more news.
99%! Just imagine—both of my servers are not only part of the exclusive 10% club, they've made it into the 1% Executive Diner's Echelon!
And yet nothing more was forthcoming for another whole day. Perhaps needless to say, there was no "NMS" to be found on WHP's site, no e-mail notifying me of how to get into it, and no replies to my inquiries to the proffered support address. It seemed to me a little bit of a sick joke to talk about how great and redundant the network infrastructure was when my servers were still unavailable. But by this point I knew that I could probably just relax and let this situation take its course. These guys had customers. They were clearly swamped well beyond their means. I didn't even expect there to be any more "daily" updates on the support page (indeed, there have yet to be any to date). An odd feeling of calm took me, because by this time my server at Hostik had been provisioned, with a crisp fresh installation of FreeBSD 6.1 (released only days before) and even the CVSup package installed, indicating that these guys, at least, know FreeBSD. Besides which, they'd responded to my request for a second hard drive (which I wanted for ensuring rapid asynchronous response by keeping the database content on a separate spindle from the static data) by upping the primary drive's capacity from 160 to 250 GB and giving me a secondary 120 GB drive, far more than I could use even if both my servers' content increased fivefold. I was a happy camper, all things considered. I figured I could even consolidate the two servers onto a single machine, thus halving my hosting costs. Huzzah!
But Friday came and went, and so did Saturday. I began looking up the location of WHP on Google Earth, noting with some interest that it's literally right across the George Washington Bridge from Manhattan. I know one or two people who live within punching distance.
Shortly after midnight between Saturday and Sunday, I sent one more message to the WHP support address, which to date had resulted in one (count 'em) reply, the one from Director of Sales Man who had assured me four days previously that he was "checking" my servers. But within minutes, I got a reply from someone with a real name in the support trenches. Server 2 was online! "As for the other server, it'll be a bit longer on it," he cautioned, but... Server 2 was online!
Sort of. SSH wouldn't connect. HTTP requests timed out. It felt to me as though it was having trouble resolving reverse DNS addresses; I suggested to the guy that he add a local name server to the /etc/resolv.conf file (Server 1 was its primary name server, and it was still down, of course), which he did within minutes. Success! And the tech and I continued talking for a while, discussing FreeBSD, the fact that I'd written a book on it, etc. He'd even tried installing it, he told me, but his hardware apparently didn't like the CD. Ah well—I'll have words with my editor, ho ho ho. Anyway. Turns out that he'd only been with WHP for a month, and he had in fact no experience with FreeBSD to speak of. This all but confirmed my suspicion that WHP hadn't planned on supporting Managed's FreeBSD customers. And incidentally, this is the downside of hosting companies that tout "no contract" among their selling points: true, you as the customer aren't locked in to any service term or anything. But it also means the company is under no obligation to keep providing you the service. Whaddyagonna do, sue them for breach of contract?
So I spent Sunday CVSupping the data over to the new server at Hostik. WHP might indeed have gleaming Emerald-City data-center iron, and they might employ one or two amicable and helpful people, but so far nothing they'd done had filled me with any confidence or instilled in me a desire to give them a nineteenth chance after my giving them a pass the first eighteen times. I was committed to moving away from them now. And it wasn't without a pang of regret—irrational, I know, but somehow I felt like I'd been through a lot with these guys. Granted, it was all bad, and sticking with them would be falling prey to battered-spouse psychology, and for all I knew WHP now comprised nothing of Managed's capital or personnel but the name, but I still felt like I was betraying somebody or other. Meh.
And then, Monday morning, Server 2 went down again. That is, it stopped accepting connections on any of its ports. I could still ping it, but no services were apparently running. I don't even know of a way it can get into this kind of state; it was almost like there was a firewall in front of it. I sent off yet another message reiterating that Server 1 was still down; and within an hour, that server abruptly came back online. I posted a blog message and started making the necessary repairs. I noticed that, true to their last support page update, the server was now on a choked-down Celeron that was floundering under the load of its various reawakening Perl processes, but at least it was there. I started sucking down the data from it to the new server.
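A host that answers ping but no TCP connections can be narrowed down a little by looking at how the connection attempts fail. As a general rule (not a diagnosis of this particular machine), an immediate "connection refused" means the kernel is up but nothing is listening on that port, while a silent timeout is more typical of a packet filter or a wedged host. A minimal sketch, with an invented function name:

```python
import socket

def probe_tcp(host, port, timeout=3.0):
    """Return 'open', 'refused', or 'no-answer' for one TCP port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # handshake completed; a service is listening
    except ConnectionRefusedError:
        return "refused"         # host replied with a reset; no listener
    except OSError:
        return "no-answer"       # timeout, unreachable, or similar failure
```

Probing ports 22 and 80 on the affected server (for example, `probe_tcp("server2.example.com", 22)`, where the hostname is a placeholder) would show whether the daemons had died or whether something was dropping the traffic before it reached them.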
But Server 2 was still being stupid and not responding to anything. I sent off yet another message reminding them that of the two problems my previous message had mentioned, only one had been fixed. And within an hour, I heard back from a second articulate tech person (they have more than one! Hip hip hooray!), who told me that he'd rebooted Server 2 and changed its password. I poked around and found that all was well, and resumed the transfer from it as well.
And that's where things stand as of now. Both servers continue to chug along, though Server 1 is still on its transitional crapware and Server 2 might randomly crash any minute. I'm making progress getting the data pulled off both of them—slow progress, but it'll soon be over. And then there'll be one more downtime as I shut down these servers and transfer the DNS to the new Hostik box. That one, hopefully, will be a smoother transition. And with any luck, it'll mean a much more trouble-free future. Because even if Hostik uses crappy hardware too, I can be pretty sure that they won't up and decide to vault themselves and all their hosted data across the country in the wake of a slipshod acquisition.
And if they do, I can always try someone else.