Analyzing honest committee selection
Statistics will fuck you up lmao or how I learned to start worrying and hate statistical methods
A long time ago, I sat down to analyze a protocol we were designing. The protocol needed to choose a committee from a large group of nodes. This is a very common problem in decentralized networks. It applies to PoS, sharding, DA sampling, and a million other things.
My question was pretty simple: If nodes have an x% chance of being honest, and we sample y nodes, what is the likelihood that at least a supermajority (2y // 3 + 1) of the sample are honest? Flipped around: when selecting a committee from a large group of potential participants, how likely are you to get a committee whose behavior will not match that of an honest node?
Being young & naive at the time (I am now old & naive), I turned to statistics to assuage my fears. Statistics told me to model this as a hypergeometric distribution. We draw n = y committee members from N nodes, set K = x% * N as the number of honest nodes, and want to calculate the odds of getting k >= 2n // 3 + 1 honest members in the sample. Writing this out in more appropriate statistical terms is left as an exercise for whoever isn’t bored by now. Long story short, statistics, the ancient enemy of my people, gave me a straightforward, compelling, & thoroughly wrong answer.
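For anyone who does want the exercise, here is a minimal sketch of that hypergeometric tail in Python. The function names and the example numbers (10,000 nodes, 80% honest, a committee of 512) are mine, picked for illustration, not taken from any particular protocol.

```python
from math import comb


def supermajority_threshold(n: int) -> int:
    """Smallest count that is strictly more than two thirds of n."""
    return 2 * n // 3 + 1


def prob_honest_supermajority(N: int, K: int, n: int) -> float:
    """P(k >= 2n // 3 + 1) when drawing n nodes without replacement
    from N nodes, of which K are assumed to be honest."""
    k_min = supermajority_threshold(n)
    total = comb(N, n)  # every possible committee
    # Committees containing exactly k honest members, summed over the tail.
    favorable = sum(
        comb(K, k) * comb(N - K, n - k)
        for k in range(k_min, min(n, K) + 1)
    )
    return favorable / total


if __name__ == "__main__":
    # Illustrative numbers only (assumed, not from a real network).
    N, n = 10_000, 512
    K = int(0.80 * N)
    print(prob_honest_supermajority(N, K, n))
```

`math.comb` keeps everything in exact integers until the final division, so the tiny failure probabilities don't get lost to rounding along the way.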
I was very satisfied. This answer let me cockily continue to do what I already knew was “right.” Statistics (that over-friendly crow) politely confirmed what I wanted to believe anyway: the odds of randomly sampling a bad committee were negligible. Maths were on my side.
Growth (inasmuch as I am capable of it)
It took me a couple more years to recognize my bad assumptions:
“Honesty” describes conformance to protocol rules. The assumption that x% of a population is honest is true only in a lab, because “honest” is not an objective quality of a node. I.e. honest nodes may lie, if lying is not forbidden by protocol rules.
Multi-player games are reflexive. The committee takes protocol incentives into account when determining behavior & strategy within protocol rules. Optimal strategies are heavily contextual, and change with committee size and incentives.
Which is to say, statistics is the wrong tool. We cannot simply assume that honest nodes refuse to deviate when sampled, as the act of sampling may change their behavior. Instead we need to examine incentive compatibility. We ask whether the sampled nodes have a more profitable behavior that is still “honest” according to protocol rules. In what cases can the committee’s vote profitably deviate? Have we shaped incentives to prevent that from occurring?
Sunflowers turn to face the sun. Magnets align with the local field. Even honest nodes point themselves towards maximum value extraction. We orient ourselves to opportunities. Inefficient extraction strategies collapse to efficient ones without warning. Phrasing this in terms of Crosby, Stills, Nash, & Young equilibria is outside the scope of this post, but if you want to correct my terminology here, please send an email to james@nomad.xyz.
Concretization: ALC
To use a concrete example, consider the Altair Light Client sync committee under a statistical lens. It would seem that Ethereum uses a 1/3+1 honesty assumption, and therefore we can rely on that when sampling nodes for the sync committee. This gives an incredibly, impossibly low chance of hitting a malicious committee via random sampling. However, this result falls apart under cross-examination.
Nodes signing invalid blocks are honest under ALC rules, as long as they also sign the correct block. Which is to say, the ALC protocol explicitly permits signing invalid blocks! As a result, ALC sync committee members cannot lose by lying, but can gain. Deviations from correct reporting are +EV within the context of the Ethereum protocol. Accepting a bribe of even 0.01 ETH for a lie is positive EV for a sync committee member, and is therefore the “correct” strategy. In other words, sync committee members should tell the truth to the L1 chain & lie their asses off for anyone that fronts cash.
Let me reiterate: A sync committee member that accepts a bribe is still honest according to the definition of the Ethereum protocol. Lying is “honest” behavior because it fulfills all obligations to the protocol, without taking any proscribed actions. Assuming that 1/3+1 nodes honestly follow protocol rules is not an assumption that sync committee members refuse to lie for money. Lying is explicitly permitted, and likely the highest EV strategy.
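The payoff comparison is simple enough to sketch in a few lines of Python. Everything here is a toy model under my own assumptions: the `Strategy` type, the strategy names, and the 0.1 ETH protocol reward are hypothetical placeholders. Only the 0.01 ETH bribe and the absence of any penalty for the extra signature come from the argument above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    protocol_reward_eth: float  # earned by signing the correct block
    side_payment_eth: float     # bribe for also signing an invalid block
    penalty_eth: float          # what the protocol takes away for doing so

    def payoff(self) -> float:
        return self.protocol_reward_eth + self.side_payment_eth - self.penalty_eth


def best_strategy(strategies: list[Strategy]) -> Strategy:
    """Pick the strategy with the highest payoff for a single member."""
    return max(strategies, key=lambda s: s.payoff())


# Hypothetical reward; the bribe matches the 0.01 ETH example above.
truthful = Strategy("truthful_only", 0.1, 0.0, 0.0)
# Same protocol reward (the correct block is still signed), plus the bribe,
# minus a penalty of zero because the extra signature is not punished.
bribed = Strategy("truthful_plus_bribed_lie", 0.1, 0.01, 0.0)

print(best_strategy([truthful, bribed]).name)
```

Nothing in the payoff depends on what the other committee members do, so in this toy model the comparison comes out the same for every member.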
On the other hand, lying is not the best strategy in the Ethereum Gasper protocol. Validators in an epoch committee cannot effectively profit by deviating from the protocol rules[1]. Because the ALC explicitly permits deviation, we can say producing only correct attestations is not incentive compatible. Another, strictly more-profitable, strategy exists. The incentive-compatible strategy results in false attestations in addition to correct ones. Because the strategies & outcomes differ from the main protocol, ALC is not a light client. Sampling more committee members is not effective at mitigating this, because each committee member individually has incentive to deviate regardless of the behavior of other committee members.
Because ALC correctness is not incentive-compatible, relying on the ALC committee alone is not recommended. It should be supplemented with another system.
Wrapping up
In conclusion, analyzing honesty via statistical methods is naive (and so was I). The statistical approach conflates honest adherence to protocol rules with non-deviating strategies. We know how to do better. We must analyze honesty via incentive compatibility & expected value of deviation. When EV of deviation from the default strategy exceeds coordination costs, committees fall apart. We can prevent this by penalizing deviations.
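As a rough sketch of that last condition, assuming a single deviator with a known gain, a known coordination cost, and a penalty that lands with some probability (all three parameter names and every number are hypothetical, chosen only to make the inequality concrete):

```python
def deviation_is_profitable(
    gain_eth: float,
    coordination_cost_eth: float,
    penalty_eth: float = 0.0,
    p_penalized: float = 1.0,
) -> bool:
    """True when the expected value of deviating beats the cost of coordinating."""
    expected_gain = gain_eth - p_penalized * penalty_eth
    return expected_gain > coordination_cost_eth


def min_deterrent_penalty(
    gain_eth: float,
    coordination_cost_eth: float,
    p_penalized: float,
) -> float:
    """Smallest penalty at which deviation stops being strictly profitable."""
    if not 0.0 < p_penalized <= 1.0:
        raise ValueError("p_penalized must be in (0, 1]")
    return max(0.0, (gain_eth - coordination_cost_eth) / p_penalized)


# With no penalty and no coordination cost, any positive gain tips the scale.
print(deviation_is_profitable(gain_eth=0.01, coordination_cost_eth=0.0))
```

The less reliably a deviation is caught, the larger the penalty has to be to make up for it, which is why `p_penalized` sits in the denominator.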
[1] don’t @ me I don’t want to get into a full gasper incentive analysis