It's been three days and I still find myself shaking my head in amazement. About a week ago on Friday, I got asked to sit in on a con-call about a POC (Proof of Concept) that the bosses were thinking of taking on (a POC is where companies have us set up equipment in our labs to prove that our stuff is going to do what we say it will do). The deal was that another company was trying to sell some of their gear, that works with our gear, to Farmer's Insurance. Our storage systems will talk to each other so that data on one system can be replicated to another system some distance away so that if there is some sort of disaster at one site, the data is still available at the remote site. This, of course, requires sending large amounts of data across networks and as the distance between the sites increases, the time it takes for the information to get there increases. This other company makes gear that speeds up the transfer of the replicated data and they wanted to get into our lab to prove that their stuff works with our stuff (which the customer already had).
The problem was that it was Friday, and they wanted to show this to the customer on the next Friday. Oh yeah, and this was the Friday before the Memorial Day weekend so there would only be four days the next week to set everything up for the demo. Of course, management said "It should be easy. Let's do it." So we started scrambling around to try and get ready for the gear that would be coming in next Tuesday. Being the Friday before a long weekend, management usually sends out an email around 3:00 saying "Enjoy the long weekend. Feel free to leave 'as your schedule permits'". It's kind of a running joke that I always get tasked with something just before a long weekend and thus never get to 'leave as my schedule permits', and that day was no different.
So the next week (last week) comes and we get the equipment in and start setting it up (along with one of the guys from the company that makes the gear) and start slogging through no end of problems - mostly with EMC gear. The basic test is to set up a program that writes data to one of the EMC storage systems which then replicates it through the other company's gear to another EMC storage system and compare how fast it is both with and without the other company's "speed-up" gear. After fixing cable problems, hardware problems, and various other roadblocks through a couple of late nights, we were finally ready for a dry-run on Thursday. Thursday morning, one of the account team members called and said that they didn't like the data we were getting and wanted to switch to a different load-generating program. We told him that it was too late to switch at this point and he hung up in a huff. We got through the dry-run pretty easily but then went to try one more test and the third-party gear had a failure. We stayed late on Thursday trying to get one of their boxes running faster but they finally determined that it was some kind of hardware problem that they couldn't fix in time so they would just wing it and hope the customer didn't notice. The demo, by the way, was actually taking place in San Jose via a remote video connection to our labs. The customer and the account team would be in SJ and would remotely connect to our computers here (just like I do when I view Dad's computer from home) to see the demo.
So Friday comes around and we make one last run through of the test and, of course, everything goes to Hell. Turns out the program we have been using to generate the data has filled up the disks of the storage system and won't run anymore. The process for creating the whole test environment has taken three days and we are faced with wiping out everything and rebuilding in about three hours. We first thought that maybe the disks were just very close to being full and that maybe we could get a few more runs out of it so we started up the program again. It looked like it was going to run but it was taking a very long time to get started. We were watching it set itself up and as some numbers were incrementing, it was looking like it was going to take about an hour to get going. Cool, that would leave us with two hours to spare. Except that about 15 minutes into it, it seemed to be going slower and it now looked like it was going to take two hours to start. Hmm.... A little while later it looked like it was going to take about 2.5 hours to start. Needless to say the tension was high. With about an hour and a half until show time, I called the account manager and told him our problems and that we may have to fall back to an alternate method of data generation which wouldn't provide them all the data they wanted. He said "Well, that would be very unfortunate. Please keep me posted." With about 30 minutes to go, the test finally finished initializing but when it started, it still would not generate any data. Crap! I had an idea that maybe I could re-initialize to a tiny fraction of the size of the original system and it still might work. So I stopped the test so that I could re-initialize - except that it wouldn't stop. I finally had to kill the test and reload the whole program. I fired it up, raced through the configuration of the test parameters and set the data size to 3% of its original value and hit the Run button.
It thought about it awhile, built its test files and, praise God, started generating data. At that exact moment, I got a text message from the account manager that said "Well, are we ready or do I have to start dancing?" I fired back a text message saying "We are go!". We quickly reset for the beginning of the test, turned on the remote video session, dialed into the conference call, ran through the whole scenario without a hitch, said "That's about all there is to it.", to which they said "Thanks very much.", disconnected, and then collapsed into a heap.
One of the guys got a text message later that said something like "Wow, you guys hit that one out of the park!" Like I said, I'm still shaking my head over this. We were so close to absolute disaster that having it come off so well almost shakes my faith in atheism.