server down for first time in 18 months

  • #16
    Thanks guys, but nightmare night tonight unfortunately.

    Server was shut down at 9pm for a drive replacement after the RAID failed last night.

    The server refused to accept a new drive so Cristiano is now rebuilding the server after 8 hours downtime so far. Unfortunately I think the DC guys waited too long before deciding to rebuild. My first client has woken up abroad and I've been shouted at, which wasn't nice.

    I'm expecting a lot more swearing and phone calls if the server isn't up by 7am at the latest.

    4 years of hard work building up my reputation, my clients are all very important as web hosting is only a small part of what we do for them. I promised them when we moved to dedicated servers this couldn't happen because of "raid".

    Just goes to show, it's not 100% reliable. I'm dreading what will happen if the server isn't up by 7am max. PLEASE Cristiano. PLEASE EUK, save me the stress.

    A good man is on the job, so let's hope.

    I was going to upgrade to a quad soon...

    UPDATE: Peter Murphy now on the job at 5.37am. OS installed, cPanel installing, 60-80 accounts still to restore. I'm getting very stressed. Hoping EUK pull this one off before 7am to save me from unimaginable torture.
    Last edited by sihost; 14-09-2011, 04:39.
    Cheap international calls from mobile phones >


    • #17
      Update: Dedicated server down for 10 hours and counting

      I've remained calm so far. Now 2 hours with no updates.

      10 hours and the server is still down. Mark assured me this would never happen again if I went dedicated. I'm really going to start getting stressed about this now as clients are starting to get up and I've had calls already.

      I am NOT pleased, I asked for this to be resolved by 7am. 10 hours downtime is UNACCEPTABLE, it will ruin my business if this is not resolved within the next hour.

      Thanks EUK. THANKS.


      • #18

        Peter has been in constant touch with you through the ticket VFX-969-26486. User data copy has been completed and sites are up now. I would request you to verify and let us know whether all sites are working fine.


        • #19
          He has, yes, since the 2 hour gap. I was panicking.

          I have been able to keep clients as calm as possible and things are now settling.

          I'm not happy about the 14 hours downtime and will want answers as to why the server wasn't able to accept the new drive, and what the chances are of it happening again.

          EUK have given great service since we've had the dedicated server, so we need to consider how we move forward and expand the hosting business in the best way possible now we know RAID isn't as reliable as thought.

          For now I'm continuing to liaise with Peter to iron out any remaining issues.


          • #20
            Hi sihost,

            Maybe you had an issue with reused SATA drives?

            I chose the eUK X3430 i5 with 2 x 300 GB SAS 15k rpm as they're much faster, and as they're newer I'm guessing they'll be much more stable.

            The eUK X3430 i5 with 2 x 300 GB SAS 15k rpm with RAID1 is money well spent.

            May want to consider upgrading to that setup when you can.

            I just got an eUK E5200 and am hoping I don't have the same issue due to crappy reused SATA drives. I ordered 250 GB and they used a 160 GB drive at first, but they replaced the drive today and did an OS reinstall. Just need to sort IPs now.

            All the best


            • #21
              Thanks yeh that's one of the servers I considered purchasing, interesting, I thought they used new drives!

              This comment in old emails
              "A dedicated server is a single point of failure, but it takes less than an hour no matter what goes wrong with the dedicated server. "

              I've been awake 30 hours and it makes me chuckle now


              • #22
                Hi Simon,

                Such problems can be resolved within an hour or two if the right decision is taken at the right time. In your case, our team replaced one drive in the RAID and attempted a RAID rebuild. Had they chosen to replace both drives and restore the data from backup, they could have restored your server within an hour or two. Unfortunately, our focus was on retaining the latest data, which is why we went with the RAID rebuild option.

                I have asked our technical team to install the RAID monitoring and drive performance check scripts on your server to help avoid this kind of problem in future. Your server has also been added to our internal monitoring system, so service failures should no longer be a worry for you. I'll let you know once the drive check and RAID monitoring scripts are installed.
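
For anyone wondering what a basic RAID health check might look like, here is a minimal sketch. It assumes Linux software RAID (md) and the usual /proc/mdstat layout, which this thread does not confirm; the server may well use a hardware controller, and this is not the script eUKhost installs. The function name `degraded_arrays` and the `SAMPLE` text are made up for illustration.

```python
import re

# Hypothetical helper: assumes Linux software RAID (md) and /proc/mdstat-style text.
# The thread does not say which RAID type or monitoring script is actually in use.
def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status lines end like "[2/2] [UU]"; an underscore marks a missing drive.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded

# Illustrative sample only, not taken from the server discussed here.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      312464128 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real md system the text would come from reading /proc/mdstat, and a cron job could email the result whenever the list is not empty.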
                eUKhost - eNlight Cloud Hosting || eUKhost Knowledgebase
                Toll Free : 0808 262 0255 || Skype : mark_ducadi


                • #23
                  Hi Mark,

                  After the 5 hours taken trying to get the server to accept the new RAID drive, I do agree that the right decision was to rebuild using data from the old drive that survived from the original RAID.

                  This is because Tuesday night's backups failed as the server was down, so it would have created more problems using Monday night's backups. I'm grateful to the team for retaining this data.

                  I've only had 2-3 reports of lost emails from Tuesday and some work from Monday afternoon after the initial server crash. Cristiano agreed this was probably due to the RAID being degraded throughout Monday afternoon and Tuesday.

                  Server monitoring is already installed (to my knowledge) and so is RAID monitoring. Cristiano set that up, and it is this which alerted us on Tuesday morning at 4am to the degraded drive. However, I decided to wait until Tuesday evening at 9pm, out of business hours, to replace the degraded drive.

                  Fate played its part there, as a 1 hour replacement became 14 hours downtime (for reasons I now fully understand). Thankfully only the last 2-3 of these 14 hours were working hours.

                  I've been looking at my options for the next step in the hosting side of the business. I would have liked 2 servers mirrored and load balanced. However, it appears this E5400 can't be mirrored/load balanced and I would have to buy 2 new servers. E5400s now cost more and are in Maidenhead.

                  I'm not sure at the moment which way to go, or whether to just take on a quad when I need it. After all, in 18 months+ I've never needed mirroring, as the service and uptime has been great in MK (just as you said it would).