Friday, April 5, 2013

Details of the Hotmail / Outlook.com outage on March 12

A couple of weeks ago, Outlook.com left preview, and we shared that we were beginning to upgrade hundreds of millions of people from Hotmail to the new, modern Outlook.com experience. We had done several pilots during the preview period and learned a ton. Overall the upgrade is going very smoothly: we have upgraded people much more quickly than we had expected. The vast majority of people who use our services have had a smooth experience during this time and are enjoying the new Outlook.com. That said, we had a problem yesterday and want to offer deeper insight into what happened.

Before we dive into the detail, we want to sincerely apologize to anyone who was unable to access their email during the interruption. Outages are something we take very seriously, and we invest a significant amount of our time and energy in doing our best to prevent them.

At 1:35 PM PDT on March 12, 2013, there was a service interruption that affected some people's access primarily to Hotmail.com and Outlook.com and, to a lesser extent, to the SkyDrive service. Availability was restored over the course of the afternoon and evening, and was fully restored by 5:43 AM PDT on March 13, 2013.

On the afternoon of the 12th, in one physical region of one of our data centers, we carried out our regular process of updating the firmware on a core part of our physical plant. This is an update that had been done successfully before, but in this particular case it failed in an unexpected way. This failure led to a rapid and substantial temperature spike in the data center. The spike was significant enough, before it was mitigated, that it caused our safeguards to come into place for a large number of servers in this part of the data center.

These safeguards prevented access to the mailboxes housed on these servers and also prevented any automatic failover that would have allowed continued access through other pieces of our infrastructure. This section of the data center hosts parts of the Hotmail.com, Outlook.com and SkyDrive infrastructure, and so some of the people trying to access these services were affected.

As soon as the safeguards kicked in on these systems, the team was alerted and immediately began working to restore access. Given the failure scenario, a mixture of infrastructure software and human intervention was needed to bring the core infrastructure back online. Requiring this type of human intervention is not the norm for our services, and it added significant time to the restoration.

The team brought access back in waves through the night. The majority of the affected mailboxes were restored before midnight, and the rest were fully completed by 5:30 AM.

We hope this has helped provide an understanding of the incident, and again we sincerely apologize and regret the impact this outage had on you all. Now that we are through the resolution, we are also hard at work to ensure that this does not happen again.

https://status.live.com is always the best and most reliable way to get real-time information about any service issues we may be having, and if you are signed in, it is customized based on the health of your specific account.

--Arthur de Haan, Vice President

