Loss of Epoch Rewards in Graph Network: A Post-mortem

On February 6th 2021, an outdated version of the indexer agent combined with a brief node desynchronisation caused an allocation to be closed with a zero proof of indexing. As a result, the rewards for that epoch were lost. We will fully compensate the lost rewards by waiving our fee and sharing our own indexing rewards with our delegators.

What happened?

An oversight in our upgrade process left the indexer-agent version out of date. A minor hiccup in Ethereum node operation coincided with the closing of an allocation, so the allocation was closed without a proof of indexing being submitted. The lost rewards will be fully compensated over the course of 12 days by charging a 0% fee and sharing our own indexing rewards.
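The failure mode described above suggests a simple guard before closing an allocation. The following is a minimal sketch in Python, not the indexer agent's actual logic: the function names, the `max_lag_blocks` threshold and the block numbers are hypothetical, and it assumes the proof of indexing is available as a 32-byte hex string.

```python
# Hypothetical pre-close guard; names and thresholds are illustrative only.
ZERO_POI = "0x" + "00" * 32  # assumed encoding of an all-zero proof of indexing


def is_zero_poi(poi):
    """Return True if the POI is missing or consists only of zero bytes."""
    if not poi:
        return True
    hex_part = poi[2:] if poi.lower().startswith("0x") else poi
    return hex_part == "" or set(hex_part) == {"0"}


def safe_to_close(poi, chain_head_block, subgraph_block, max_lag_blocks=5):
    """Allow closing only with a non-zero POI and a subgraph close to chain head."""
    if is_zero_poi(poi):
        return False
    return chain_head_block - subgraph_block <= max_lag_blocks


if __name__ == "__main__":
    good_poi = "0x" + "ab" * 32
    print(safe_to_close(good_poi, 1000, 998))   # synced, real POI
    print(safe_to_close(ZERO_POI, 1000, 1000))  # zero POI: refuse to close
    print(safe_to_close(good_poi, 1000, 900))   # node lagging: refuse to close
```

A check of this kind would postpone the closing rather than submit it, which costs a delay but not the rewards of the epoch.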

What went wrong?

We provide reward statistics on a daily and weekly basis to delegators with locked GRT, but we had no dedicated monitoring of epoch rewards after an allocation is closed. Because these reports are filled in automatically without human intervention, our reaction was slightly delayed. As soon as we realised that rewards had been lost, we started an internal investigation.

A further problem was the absence of a well-reviewed upgrade procedure for new indexer releases. The upgrade procedure had no clear standard and no checklist of actions to follow. New allocations were opened without technical mistakes, but the existing monitoring of allocations was insufficient. We also found that our monitoring of subgraph syncing delay had too coarse a time resolution, so it failed to catch the small hiccup in Ethereum node operation.
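To illustrate why the time resolution of sync-delay monitoring matters, here is a small Python sketch. The sample data, the 15-second and 150-second intervals, and the 5-block threshold are all made up for the example and do not describe our actual monitoring setup.

```python
# Illustrative only: the samples, intervals and threshold are hypothetical.

def downsample(samples, step):
    """Keep every `step`-th sample, simulating a coarser polling interval."""
    return samples[::step]


def lag_alerts(samples, max_lag_blocks):
    """Return the (timestamp, lag) samples whose lag exceeds the threshold."""
    return [(t, lag) for t, lag in samples if lag > max_lag_blocks]


# One sample every 15 seconds over 5 minutes, with a short hiccup at t=60..90.
fine = [(t, 0) for t in range(0, 300, 15)]
fine[4] = (60, 12)
fine[5] = (75, 20)
fine[6] = (90, 8)

# The same signal polled only every 150 seconds.
coarse = downsample(fine, 10)

if __name__ == "__main__":
    print(lag_alerts(fine, 5))    # the hiccup is visible
    print(lag_alerts(coarse, 5))  # the hiccup falls between two polls
```

With the fine-grained samples the three lagging data points raise an alert, while the coarse polling sees only healthy values on either side of the hiccup.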

What went well?

We close and open allocations on a daily basis, so we lost only one day of rewards. We collect statistics on stake changes every day, which is how we spotted the problem with reward crediting. We also thank our delegators who noticed the issue and raised their concerns in our Telegram channel.

Impact on clients

All our Graph Network delegators lost one day of indexing rewards. To compensate and mitigate their loss, P2P will waive the fees for 12 days in addition to distributing all the rewards from our own indexer stake for that period.

Lessons learned

We should have had monitoring for new version releases and should have checked that our software was up to date before taking any action. To avoid such issues in the future, we have defined a clear written procedure for upgrades and set up monitoring for all necessary releases. This case also revealed flaws in our existing monitoring, so we have started a deep analysis of all metrics, improving the monitoring from the ground up and adding the necessary alerts.
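A release check of this kind can be sketched in a few lines of Python. This is an assumption-laden illustration, not our production tooling: the function names and version numbers are hypothetical, and it only handles plain `major.minor.patch` version strings (pre-release suffixes would need extra handling).

```python
# Hypothetical release check; version numbers below are examples only.

def parse_version(version):
    """Turn 'v0.10.0' or '0.10.0' into a tuple of integers such as (0, 10, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_outdated(running, latest):
    """Return True if the running version is older than the latest release."""
    return parse_version(running) < parse_version(latest)


if __name__ == "__main__":
    # Comparing tuples avoids the string-comparison trap where "0.9.4" > "0.10.0".
    print(is_outdated("0.9.4", "0.10.0"))
    print(is_outdated("v0.10.0", "0.10.0"))
```

Running such a check before any allocation is opened or closed turns "is the software up to date?" into an explicit step of the procedure rather than something left to memory.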

We have decided to make the reviews of all standard operating procedures more rigorous by allocating more engineering time to them. Testing and improvement will continue on the new testnet, where we are already set up. All crucial operations, upgrades and innovations that could lead to a financial loss will be implemented on the testnet first.

P2P takes full responsibility for this incident, and we are sorry for the inconvenience. Please be assured that P2P is taking action to eliminate even a small probability of such an event occurring in the future.

Alex Bondar

Research & Analytics at p2p.org.
