Amazon Web Services Zone Struck By Lightning; Servers Go Down Temporarily
We have written many things on this blog over time, but this is a title we never expected to write. For the last 12 hours or so, our servers have been down because of a fire in one of Amazon’s primary zones, where our servers are hosted. The fire was caused by none other than a lightning strike at the location.
Amazon is obviously dealing with the issue, and you can read the most up-to-date status here. Some of our customers have been affected, and we are putting on our creative hats to find a solution as quickly and efficiently as possible.
Meanwhile, we have copied the thread of updates from Amazon’s site and pasted it below, in case you are interested in the events that led to our servers, along with those of millions of other customers, losing service overnight.
11:13 AM PDT We are investigating connectivity issues in the EU-WEST-1 region.
11:27 AM PDT EC2 APIs in the EU-WEST-1 region are currently impaired. We are working to restore full service. We are also investigating instance connectivity that we believe to be limited to a single Availability Zone.
11:51 AM PDT To find out if you have instances in the affected availability zone log into the AWS Console at https://console.aws.amazon.com, navigate to the EC2 tab and click on the “EC2 Dashboard” link at the top of the navigation bar. The affected availability zone will be shown under the Availability Zone Status.
12:10 PM PDT The issues with the affected Availability Zone are the result of a power failure in that zone. We are currently recovering power and anticipate that instances in the effected available zone will start to recover within the next 30-60 minutes.
12:33 PM PDT Please note, Availability Zone designations are different for each customer. This impact is limited to a single Availability Zone. Please log onto the AWS Console to see your account’s designated name of the impacted zone.
12:49 PM PDT We are still working to restore power to the affected zone. We are first working on restoring power to the EC2 network for this zone, at which point we can begin recovering instances.
1:29 PM PDT Some of our network devices have regained power, but we are having a problem with a generator which is preventing us from getting the affected zone back online. We do not currently have an ETA for recovery of the affected Availability Zone.
1:56 PM PDT Power to the majority of network devices has been restored. We are now focusing on bringing EC2 instances back online.
2:11 PM PDT Some instances in the affected zone have started to come back online. We are continuing to work on restoring the remaining instances.
2:46 PM PDT We have restored power and connectivity to approximately 50% of the affected instances. We continue working to restore the remaining instances.
3:01 PM PDT A quick update on what we know so far about the event. What we have is preliminary, but we want to share it with you. We understand at this point that a lighting strike hit a transformer from a utility provider to one of our Availability Zones in Dublin, sparking an explosion and fire. Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators. The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronizes the backup generator plant, disabling some of them. Power sources must be phase-synchronized before they can be brought online to load. Bringing these generators online required manual synchronization. We’ve now restored power to the Availability Zone and are bringing EC2 instances up. We’ll be carefully reviewing the isolation that exists between the control system and other components. The event began at 10:41 AM PDT with instances beginning to recover at 1:47 PM PDT.
3:55 PM PDT We are continuing to recover impacted instances. Note that EBS volumes in the affected zone also lost power. Some of these volumes and EBS backed instances are taking longer to bring back online.
5:06 PM PDT We are continuing to recover the remaining affected instances. It may be several hours until all remaining instances and volumes can be recovered but we don’t have a firm timeline and it may be longer to bring everything online. We recommend re-launching your instance in a different availability zone in order to get back up and running more quickly.
6:25 PM PDT We have started to see some of the EBS volumes in the affected zone recover. We are working to restore connectivity to the remaining instances and volumes.
7:37 PM PDT We are seeing slower than expected progress on recovering the remaining instances, but can now report that 60% of the impacted instances have recovered and are available. Stopping and starting impaired instances will not help you recover your instance. For those looking for what you can do to recover more quickly, we recommend re-launching your instance in another Availability Zone.
8:40 PM PDT We continue to see slow but steady progress in recovering affected instances, with 65% of affected instances recovered and available. Bringing additional EBS volumes back online is happening more slowly. We will continue to update you with progress and additional information as we have it.
9:36 PM PDT We have now recovered 75% of the impacted instances.
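For readers who want to act on Amazon’s advice to re-launch in another Availability Zone, here is a minimal sketch of how one might plan that move. It is not Amazon’s tooling or our own recovery procedure. The function name `plan_relaunch`, the instance IDs, and the inventory format are hypothetical, and the zone names are only illustrative, since zone designations differ for each account.

```python
def plan_relaunch(instances, impaired_zone, region_zones):
    """Map each instance in the impaired zone to a healthy target zone.

    instances: list of (instance_id, zone) pairs (a hypothetical inventory).
    impaired_zone: the zone name shown as impaired in *your* account.
    region_zones: all zone names available to your account in the region.
    """
    healthy = [z for z in region_zones if z != impaired_zone]
    if not healthy:
        raise ValueError("no healthy zone available in this region")

    # Count what already runs in each healthy zone so relaunches spread out.
    load = {z: 0 for z in healthy}
    for _, zone in instances:
        if zone in load:
            load[zone] += 1

    plan = {}
    for instance_id, zone in instances:
        if zone == impaired_zone:
            # Pick the least-loaded healthy zone (ties go to the first listed).
            target = min(healthy, key=lambda z: load[z])
            plan[instance_id] = target
            load[target] += 1
    return plan


# Illustrative data only: IDs and zone names are made up for this example.
inventory = [
    ("i-001", "eu-west-1a"),
    ("i-002", "eu-west-1b"),
    ("i-003", "eu-west-1a"),
    ("i-004", "eu-west-1c"),
]
plan = plan_relaunch(
    inventory,
    impaired_zone="eu-west-1a",
    region_zones=["eu-west-1a", "eu-west-1b", "eu-west-1c"],
)
print(plan)
```

The sketch only decides where each affected instance should go. Actually launching replacements, and reattaching or restoring any EBS data, would still have to be done through the AWS Console or API.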
As we know from the rare occasions on which we have had technical difficulties in the past, our customers realize that we are doing everything in our power to solve the issue, and that we want nothing more than for them to continue generating revenue from their mobile apps. Stay tuned…
Please share your thoughts in the comments or on Twitter, Google+, or Facebook where we are always listening.
In addition, to sign up with inneractive and start monetizing your free apps now, click here.