Service Outage: 12/19/2009 9AM – 1PM EST

SOAR Customers,

It is our policy to notify you of any issue that has interrupted service to your website. We do this to ensure good communication with our customers and to make you aware of the issue so that you can answer questions raised by your unit members.

Service Outage – 12/19/2009 9AM – 1PM EST

At approximately 9:00 AM EST on Saturday, December 19th, one of our database servers began to malfunction. Its hard drive controller started producing intermittent errors, which in turn corrupted data on the hard drives themselves. Some customers lost access to their sites at this time as individual customer databases became corrupt due to the hardware errors.

At approximately 9:45 AM EST the hard drive controller failed completely and triggered an alert notification for the SOAR Support Staff. At this point the customers associated with that database server lost all access to their websites. We registered a ticket with our dedicated server provider, and they got to work on the issue.

By 11:00 AM EST our service provider had replaced the hard drive controller and all hard drives on the server in question. The hard drives were replaced as a preventative measure, given possible long-term damage from the hard drive controller errors. Once the server was back online, SOAR began rebuilding it with software and customer data from backups. Just after 1:00 PM EST the restoration of the server was completed and customers again had access to their websites.

SOAR does the following to help reduce downtime:
While this is the longest single downtime since January, overall we are pleased with the controlled manner in which it was handled and the speed with which the server was restored to service: 1 hour for hardware and 2 hours for software.

Currently our redundancy strategy is at the individual server level. As we grow and can introduce additional infrastructure, we will look towards pairing servers for redundancy so that when one goes offline, another takes the load while a repair is under way.

Service History

We thought it would be a good idea to include in each Service Outage announcement a record of service interruptions during the last year. While we strive to provide 100% service availability, unscheduled service outages will occur from time to time. When they do occur, our upfront planning helps to minimize their duration. Our service outage history for the last 12 months demonstrates this with facts.
By soar at 2010-01-04 16:27