Hats off. Firefox shows how to communicate with users when things break down

Mozilla can inspire confidence even when talking about its own mistakes. Eric Rescorla's post about how his team dealt with the major extension failure should be read by every manager who communicates with customers.

If you are a Firefox user and spent last weekend in the browser rather than at the grill, flipping pork neck and bacon, you probably noticed that most of your usual extensions had stopped working. The failure affected almost all user extensions. Eric Rescorla, CTO of the Firefox team, explained in detail in a post on the Mozilla blog how it came about, how the company reacted to the crisis, and what it intends to do so that it never happens again.

These things always break on a Friday afternoon.

As the CTO writes, over 15,000 extensions are currently available for Firefox. To prevent suspicious and malicious programs from getting onto users' computers, each extension is signed with a certificate, which shows that someone from the Mozilla team has looked at it and decided that there is nothing suspicious about it. The problem is that certificates have validity dates. One of those used by Mozilla expired on May 4 at 3 a.m. our time. The Firefox team found out that something was wrong at the end of a long working week, just as they were happily heading out of the office on Friday at 6 p.m.

The team began with a series of steps to keep the situation from getting any worse, and only then moved on to solving the problem. Mozilla's employees considered several solutions and, because they were not sure which one would work best, decided to work on the two most promising. One was a patch for Firefox that would change the date used to validate the certificate; the other was to generate a replacement certificate and convince the browser to accept it in place of the expired one. In the end, the latter option was chosen.
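For readers who want to see why either approach could work, here is a minimal sketch of a validity-window check. It is not Mozilla's code: the `Certificate` class, the `is_valid` function, and all of the dates below are hypothetical and chosen only for illustration. Real signature verification also checks the cryptographic chain, which is left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    """Hypothetical, simplified certificate: a name and a validity window."""
    name: str
    not_before: datetime
    not_after: datetime


def is_valid(cert: Certificate, at: datetime) -> bool:
    """Return True if the moment `at` falls inside the validity window."""
    return cert.not_before <= at <= cert.not_after


# Illustrative dates only; they are assumptions, not the real certificate data.
expired_cert = Certificate(
    name="signing-old",
    not_before=datetime(2017, 5, 4, tzinfo=timezone.utc),
    not_after=datetime(2019, 5, 4, tzinfo=timezone.utc),
)
now = datetime(2019, 5, 4, 1, 0, tzinfo=timezone.utc)

# The failure: checked against the current time, the certificate has expired.
broken = not is_valid(expired_cert, now)

# Option 1 (patch the browser): validate against an earlier, pinned date.
pinned_date = datetime(2019, 4, 27, tzinfo=timezone.utc)
fixed_by_date_patch = is_valid(expired_cert, pinned_date)

# Option 2 (replace the certificate): same role, later expiry date.
replacement_cert = Certificate(
    name="signing-new",
    not_before=expired_cert.not_before,
    not_after=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
fixed_by_replacement = is_valid(replacement_cert, now)
```

In this sketch both options make the check pass. The difference is where the change happens: the first alters what the browser considers "now", while the second leaves the check untouched and swaps the data it runs against.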

Setback. Repair. A clear message. Conclusions.

The team stayed at work for an additional nine hours to devise and implement this contingency plan. At 2:44 a.m. the fix was sent to users, and 12 hours later it was already on most computers.

Rescorla describes not only what his employees were doing at the time, but also which options they considered and what obstacles they ran into along the way. The postmortem will not take place until next week and, as the CTO has promised, its conclusions will also be presented on the blog. One obvious goal is to make sure this situation never happens again.

Errors happen to everyone, and you cannot eliminate them. You can recognize a good company by how it deals with them and how it talks about them.

