Ethereum researchers remain mystified after blockchain briefly failed to finalize
Last week, Ethereum briefly stopped finalizing blocks, raising concerns across the web3 community even though transactions continued to be processed normally.
Two incidents rocked the Ethereum ecosystem on Thursday and Friday, with blocks failing to finalize for three and eight epochs (about 20 minutes and one hour, respectively) in separate events. dYdX, a popular derivatives platform, paused deposits while waiting for finality to resume.
Developers released updates for the two affected clients, Prysm and Teku, on Friday, but researchers remain unsure of the exact cause of the problem.
“I’m not sure any of us fully understand why,” said Ben Edgington of the Ethereum Foundation. “It is still under analysis exactly what the root cause of the problem was and why the chain recovered.”
This is the first major incident to hit the Beacon Chain, Ethereum’s proof-of-stake (PoS) consensus layer that merged with the mainnet execution layer last September, and it serves as a cautionary reminder of the experimental nature of blockchain technology.
Despite Ethereum being the #2 cryptocurrency with a $225B market cap and a $27B DeFi ecosystem, the protocol can still run into unexpected issues, especially as work continues on its disruptive upgrade roadmap.
Business As Usual
Ethereum users continued to transact successfully on the chain throughout the incident.
“Although the network was not able to finalize, the network, as designed, was live and end users were able to transact on the network,” the Ethereum Foundation said in a blog post. “After all clients caught up, the network finalized again.”
The Ethereum Foundation attributed the incident to an “exceptional scenario” that put a high load on the Teku and Prysm consensus layer clients. “The full cause of this is still being evaluated,” it added.
The Teku and Prysm updates include optimizations that limit resource usage during periods of network congestion.
On Sunday, Ben Edgington of the Ethereum Foundation and Beacon Chain community health consultant Superphiz discussed the incident on YouTube.
Edgington said finality occurs when at least two-thirds of validators agree on Ethereum’s state during attestations after each epoch. He said last week’s incident manifested itself in roughly 60% of validators failing to attest at the same time, preventing the network from reaching finality.
“It [was] like 60% of validators went offline,” Edgington said. “To finalize the chain we need two thirds or 66% of the validators showing up.”
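The arithmetic behind that shortfall can be sketched in a few lines of Python. This is an illustration only: it treats every validator as equally weighted, and the name `can_finalize` is made up for this example rather than taken from any Ethereum client.

```python
from fractions import Fraction

# Supermajority needed for finality, per the two-thirds figure Edgington cites.
FINALITY_THRESHOLD = Fraction(2, 3)


def can_finalize(attesting_fraction: Fraction) -> bool:
    """Return True if the attesting share of validators meets the threshold.

    Simplifying assumption: all validators carry equal weight.
    """
    return attesting_fraction >= FINALITY_THRESHOLD


# Roughly 60% of validators failed to attest, leaving about 40% participating.
offline = Fraction(60, 100)
participating = 1 - offline

print(can_finalize(participating))  # False

# How far participation fell short of the threshold.
shortfall = FINALITY_THRESHOLD - participating
print(shortfall)  # 4/15, roughly 27 percentage points
```

Exact fractions are used here so that the two-thirds comparison is not affected by floating-point rounding.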
The pair described the network’s recovery as a testament to the value of Ethereum client diversity, with only two of Ethereum’s five major clients suffering problems.
Edgington said Lighthouse client users did not experience any problems during the incident because Lighthouse rate-limits the reprocessing of old states. However, he said Lighthouse’s design could cause different problems in other circumstances.
“As we’ve seen around these edges, it can actually strengthen things if clients take slightly different approaches because some will be able to carry the network where others fail,” he said.
Edgington and Superphiz agreed that Ethereum will likely face similar problems again in the future.
While researchers are currently unsure what exactly triggered the finality issues, Edgington suggested that the speed of the network’s growth could be driving up the computational resources needed to validate Ethereum.
He noted that Ethereum’s validator count is up 2,500% since the Beacon Chain launched in December 2020, and admitted that developers may have neglected large-scale testnet stress testing in recent years.
Edgington said Ethereum’s core developers have learned their lesson and will deploy large private testnets to “stress test some of these scenarios with more realistic validator numbers.”
State of Emergency
While the Ethereum network regained finality on its own last week, Edgington and Superphiz noted that measures are in place to protect the network from a severe outage.
Finality usually occurs after two epochs, but the Beacon Chain enters an emergency state called “inactivity leak” mode if finality is not reached after four epochs. In this mode, validators receive no reward for attesting, but receive escalating penalties for not doing so.
Edgington said the mechanism slowly drains ETH from non-performing validators until active validators come to represent a two-thirds majority and can finalize the chain again.
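A toy simulation gives a feel for how such a drain restores a two-thirds majority. It is a rough sketch under stated assumptions, not the protocol's actual penalty formula: the function `epochs_until_finality` and its `leak_rate` parameter are hypothetical, active validators are assumed to neither gain nor lose stake, and the penalty is simply modeled as growing linearly with the number of epochs spent in the leak.

```python
def epochs_until_finality(
    active: float,
    offline: float,
    leak_rate: float = 1e-7,   # hypothetical per-epoch escalation factor
    max_epochs: int = 100_000,  # safety cap so the loop always terminates
) -> int:
    """Count leak epochs until active stake is at least two-thirds of the total.

    Toy model: in the n-th epoch of the leak, offline validators lose a
    fraction `leak_rate * n` of their remaining stake (capped at 100%).
    """
    epochs = 0
    # Finality needs active / (active + offline) >= 2/3.
    while active * 3 < (active + offline) * 2:
        if epochs >= max_epochs:
            raise RuntimeError("two-thirds majority not reached within max_epochs")
        epochs += 1
        # Escalating penalty: the drained fraction grows the longer the leak lasts.
        offline -= offline * min(1.0, leak_rate * epochs)
    return epochs


# Scenario loosely modeled on last week's incident: 40% attesting, 60% not.
print(epochs_until_finality(active=40.0, offline=60.0))
```

In this model the offline stake has to shrink from 60 to 20 before the 40 units of active stake make up two-thirds of the total, which is why recovery by leak alone is slow when most validators are offline.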
He said the mechanism provides protection against catastrophic events, such as war, that can isolate people living in different jurisdictions from each other. After about three weeks without finality, Ethereum would fork and recognize the block history maintained by the network’s remaining active validators.
Last week’s event had a “minimal” impact on validators, according to Edgington, with Ethereum’s nearly half a million validators losing a cumulative 28 ETH during the brief inactivity leak.
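To put the 28 ETH figure in perspective, a back-of-the-envelope average can be computed as follows. The validator count of 500,000 is an assumption that rounds up “nearly half a million”, and the calculation spreads the loss evenly, which the real penalties would not have done.

```python
TOTAL_LOSS_ETH = 28          # cumulative loss cited by Edgington
VALIDATOR_COUNT = 500_000    # assumption: "nearly half a million", rounded up

# Average loss if the penalty had been spread evenly across all validators.
average_loss = TOTAL_LOSS_ETH / VALIDATOR_COUNT

print(f"{average_loss:.6f} ETH per validator")  # 0.000056 ETH per validator
```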