Ethereum's Prysm client suffered a mainnet incident, resulting in resource exhaustion and large-scale missed blocks and attestations.

TL;DR

The Ethereum Prysm client experienced a mainnet incident on December 4th: beacon nodes exhausted their resources while processing attestations from out-of-sync nodes, causing missed blocks and attestations. Network participation dropped and validators lost rewards; fixes shipped in subsequent releases.

According to Mars Finance, the Prysm team has published a post-incident report stating that on December 4th, during the Fusaka upgrade period on Ethereum mainnet, almost all Prysm beacon nodes exhausted their resources while processing certain attestations. The nodes could not respond to validator requests in time, which caused a large number of missed blocks and attestations. The incident affected epochs 411439 to 411480, 42 epochs in total, with 248 blocks missed out of 1344 slots, a missed-slot rate of approximately 18.5%. Network participation dropped as low as 75%, and validators lost approximately 382 ETH in attestation rewards.

The root cause was that Prysm received attestations from nodes that may have been out of sync with mainnet. These attestations referenced a block root from the previous epoch. To validate them, Prysm repeatedly replayed old epoch states and performed expensive epoch transitions, and under high concurrency the nodes ran out of resources. The defect was introduced in Prysm PR 15965, which had been deployed to testnets a month earlier without triggering the same scenario.

The interim workaround was to enable the `--disable-last-epoch-target` flag in version 7.0; the subsequent releases 7.0.1 and 7.1.0 include a long-term fix that uses the head state to verify attestations and avoids repeatedly replaying historical states. Prysm said the problem gradually subsided after 4:45 UTC on December 4th, with network participation recovering to over 95% by epoch 411480.

The Prysm team noted that the incident underscores the importance of client diversity: if a bug hits a client that accounts for more than one-third of the network, the chain may temporarily be unable to finalize; above two-thirds, there is a risk of finalizing an invalid chain.
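
The one-third and two-thirds thresholds can be illustrated with a minimal Python sketch. The function name `finality_risk` and the example shares are hypothetical and chosen only for illustration; they are not figures from the Prysm report.

```python
from fractions import Fraction


def finality_risk(client_share: Fraction) -> str:
    """Classify the worst case if a single client with this share misbehaves.

    Hypothetical helper: the thresholds follow the one-third / two-thirds
    rule quoted above; the labels are illustrative wording.
    """
    if client_share > Fraction(2, 3):
        return "risk of finalizing an invalid chain"
    if client_share > Fraction(1, 3):
        return "risk of temporary loss of finality"
    return "finality can continue"


# Illustrative shares only, not real client-distribution data.
for share in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(f"{float(share):.0%}: {finality_risk(share)}")
```

Exact fractions are used instead of floats so that a share of exactly one-third or two-thirds falls on the safe side of each strict comparison.
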
The team also acknowledged that its communication about feature flags had been unclear and that its test environments failed to simulate large numbers of out-of-sync nodes, and said it will improve its testing strategy and configuration management.
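
The impact figures quoted from the report can be cross-checked with a short calculation. This is a sketch that assumes the Ethereum mainnet value of 32 slots per epoch; the function name `incident_stats` is hypothetical.

```python
SLOTS_PER_EPOCH = 32  # assumption: Ethereum mainnet value


def incident_stats(first_epoch: int, last_epoch: int, missed_blocks: int):
    """Return (epochs, slots, missed-slot rate) for an inclusive epoch range."""
    epochs = last_epoch - first_epoch + 1
    slots = epochs * SLOTS_PER_EPOCH
    return epochs, slots, missed_blocks / slots


epochs, slots, rate = incident_stats(411439, 411480, 248)
print(epochs, slots, f"{rate:.1%}")  # prints: 42 1344 18.5%
```

The result matches the 42 epochs, 1344 slots, and roughly 18.5% missed-slot rate given in the report.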
