Postmortem: Moonbeam Collator Incident (September 4, 2022)

Summary

On September 4 at 15:15 UTC, one of our collators stopped syncing. Due to an error in our monitoring rules, the incident went unnoticed, and the collator did not produce blocks for 27 hours.

What Happened?

One of the many rules in our monitoring system contained an error. When writing this rule, we mistakenly used the wrong metric to determine the status of the collator in the candidate list. As a result, when the collator stopped syncing, the engineering team did not receive any notification about the incident.
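
The rule itself is not reproduced here. As a minimal sketch of this class of failure, assume a status flag that keeps reporting the collator as a listed candidate while its block height is frozen. The names Sample, best_block, in_candidate_list and is_stalled are hypothetical and do not come from our monitoring system.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scrape of a node's metrics (all field names are hypothetical)."""
    timestamp: float          # seconds since an arbitrary start
    best_block: int           # highest block the node has imported
    in_candidate_list: bool   # whether the collator is listed as a candidate


def is_stalled(samples: list[Sample], window_seconds: float) -> bool:
    """Return True if best_block did not advance within the last window.

    Assumes samples are sorted by timestamp, oldest first.
    """
    if len(samples) < 2:
        return False  # not enough data to judge progress
    latest = samples[-1]
    # Keep only the samples that fall inside the look-back window.
    in_window = [s for s in samples if latest.timestamp - s.timestamp <= window_seconds]
    oldest = in_window[0]
    if oldest.timestamp == latest.timestamp:
        return False  # the window holds a single point in time
    return latest.best_block <= oldest.best_block


# A node that is still listed as a candidate but has stopped importing blocks.
frozen = [
    Sample(0, 100, True),
    Sample(60, 100, True),
    Sample(120, 100, True),
]

status_flag_ok = frozen[-1].in_candidate_list   # a status-only check sees no problem
progress_stalled = is_stalled(frozen, 300)      # a progress check detects the stall
```

In this sketch a rule keyed only on the status flag stays silent, while a rule keyed on block progress would fire.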

Customer Impact

Delegators who nominated this collator and had their stake allocated to it did not receive rewards for 4 rounds.

What went wrong?

We learned about the incident from an external source rather than from our own monitoring, which is unacceptable.

What went well?

After receiving information about the incident, we promptly restored the failed collator, and it began producing blocks again.

Lessons learnt and action plan

We are going to implement strict checks and audits of all our monitoring rules. We are also in the process of automating testing of monitoring rules as much as possible before applying them to the production environment.
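
One way to automate such testing is to replay synthetic scenarios through a rule and compare its verdict with the expected one before the rule reaches production. The sketch below is an illustration under assumptions, not our actual tooling: stalled_rule, SCENARIOS and check_rule are hypothetical names, and each series is a list of (timestamp in seconds, best block) pairs.

```python
from typing import Callable

# A series is a list of (timestamp_seconds, best_block) pairs, oldest first.
Series = list[tuple[float, int]]
Rule = Callable[[Series], bool]


def stalled_rule(series: Series, window_seconds: float = 300.0) -> bool:
    """Hypothetical rule: alert when best_block did not advance in the window."""
    if len(series) < 2:
        return False  # not enough data to judge progress
    latest_ts, latest_block = series[-1]
    in_window = [(ts, blk) for ts, blk in series if latest_ts - ts <= window_seconds]
    oldest_ts, oldest_block = in_window[0]
    return oldest_ts < latest_ts and latest_block <= oldest_block


# Each scenario: (name, synthetic series, whether the rule should alert).
SCENARIOS: list[tuple[str, Series, bool]] = [
    ("healthy node", [(0, 100), (150, 112), (300, 125)], False),
    ("sync stopped", [(0, 100), (150, 100), (300, 100)], True),
    ("too little data", [(0, 100)], False),
]


def check_rule(rule: Rule, scenarios: list[tuple[str, Series, bool]]) -> list[str]:
    """Run the rule on every scenario and return a list of mismatches."""
    failures = []
    for name, series, expected in scenarios:
        actual = rule(series)
        if actual != expected:
            failures.append(f"{name}: expected {expected}, got {actual}")
    return failures


# A rule that never alerts, standing in for a rule built on the wrong metric.
def silent_rule(series: Series) -> bool:
    return False


good_failures = check_rule(stalled_rule, SCENARIOS)
bad_failures = check_rule(silent_rule, SCENARIOS)
```

Here good_failures is empty, while bad_failures reports the "sync stopped" scenario, so a check of this kind in a deployment pipeline could block a rule that stays silent during a stall.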


P2P takes full responsibility for the event that led to the weak performance, and we are sorry for the inconvenience. Please be assured that we are taking action to eliminate even a small probability of such an event occurring in the future.


About P2P Validator

P2P Validator is a world-leading staking provider with the best industry security practices and proven expertise. We provide comprehensive due diligence of digital assets and offer only high-class staking opportunities. At the time of the latest update, more than 1.5 billion USD in value is staked with P2P Validator by over 25,000 delegators across 40+ networks.

If you have any questions, feel free to join our Telegram chat. We are always open for communication.