In a previous post we discussed Crusader Broadcast and showed a $O(n^2)$ words, $O(1)$ time solution for $f<n$ and assuming a PKI. In this post, we overview a new Dolev-Reischuk style bower bound (see our full paper):
Theorem AS22: In any deterministic protocol solving Crusader Broadcast with $f<n$ Byzantine failures there exists at least one run in which the nonfaulty parties send at least $\Omega(nf)$ messages, if the adversary can simulate nonfaulty parties.
Note the difference with the (modern view) of the classic Dolev-Reischuk Lower Bound:
Theorem DR82+: In any deterministic protocol solving Broadcast with $f<n$ omission failures there exists at least one run in which the nonfaulty parties send at least $\Omega(nf)$ messages.
Three obvious differences:
- DR82 holds even against the weaker omission failure adversary, while AS22 holds only against a Byzantine failure adversary.
- AS22 holds even for the weaker Crusader Broadcast problem, while DS82 holds only against the Broadcast problem.
- AS22 holds only against computationally bounded adversaries (formally, when the adversary can simulate), while DR82 works even against unbounded adversaries.
What does this new Lower Bound imply on upper bounds?
-
In the previous post we showed a protocol with a PKI that uses $O(n^2)$ messages for crusader agreement. Getting $O(n)$ words for strongly adaptive adversaries with a PKI seems to be an open question.
-
For $f<n/2$ with a PKI it is easy to get $O(n)$ messages, and with $O(1)$ size using threshold signatures. Our lower bound shows that a PKI is required for this.
-
For $f<n/3$ and against a computationally unbounded adversary, it is easy to get $O(n^2)$ messages and $O(1)$ rounds. Our lower bound shows that this is optimal for deterministic protocols.
Recap: Crusader Broadcast
As a reminder, consider a network of $n$ parties with up to $f$ Byzantine faults. Let $s$ be a designated sender with some input $m$. Every party outputs either a message $m’$ from the protocol, or a special non-message value, $\bot$. This value is used to signify that either no message was received from the sender, or that conflicting messages were received.
A Crusader Broadcast protocol has the following properties:
- Validity. If $s$ is nonfaulty, then all nonfaulty parties output $m$.
- Weak Agreement. If two nonfaulty parties output $m,m’$ such that $m\neq \bot$ and $m’\neq \bot$, then $m=m’$.
Lower bound: high level idea
In order to prove this lower bound, we will construct an adversary that attacks any deterministic protocol in which no such runs exists (see the full paper for the adaptation to randomized protocols).
-
Like in the Dolev-Reischuk lower bound, the adversary isolates a certain nonfaulty party $i$, and makes sure it talks only with parties the adversary controls.
-
Unlike the Dolev-Reischuk lower bound, the adversary cannot just omit messages, in fact it needs to be able simulate an alternative world.
-
Unlike the Dolev-Reischuk lower bound, the adversary acts as if the sender sent the value $1$ when talking with the isolated party $i$, and acts as if the sender sent the value $0$ when talking with the rest of the nonfaulty parties. This leads $i$ to output $1$ and every other nonfaulty party to output $0$, which is a contradiction to the Weak Agreement property.
Lower Bound: Proof of main Theorem
In this proof, we will assume that $n\geq f+2$ in order for the task to be non-trivial. In addition, we will assume extremely strong synchrony: lockstep synchrony. In this setting communication proceeds in rounds, and messages sent in the beginning of a round reach their destination by the end of the round. If no protocol exists in such a system, then it cannot exist in any system with weaker synchrony assumptions.
We will start by assuming by way of contradiction that there exists a deterministic protocol solving Crusader Broadcast in which all nonfaulty parties send no more than $\frac{1}{4}(n-1)f$ message in any run of the protocol.
In this case, observe two worlds $W_0$ and $W_1$. In $W_0$ there are no faults, and $s$ has the input $0$. Similarly, in $W_1$ there are no faults, and $s$ has the input $1$. Note that in $W_0$ all parties must output the value $0$ and in $W_1$ all parties must output $1$. If our attack manages to make some parties think they are in $W_0$ and some parties think they are in $W_1$ (using an indistinguishability arguments), we will reach a contradiction to the weak agreement property.
We would like to observe the behaviour of the nonfaulty parties in these worlds in order to isolate some party $i$, other than the sender (which we cannot fool because it knows its own input).
Claim 1: There exists some party $i\neq s$ which communicates with no more than $f$ parties in both $W_0$ and $W_1$ combined.
Proof: Assume by way of contradiction that every party $i\neq s$ communicates with at least $f+1$ parties in $W_0$ and in $W_1$ combined. This means that in total in both worlds, every party either sent or received a message from at least $f+1$ parties. Define $M_i$ to be the number of messages sent or received by party $i$ in $W_0$ and in $W_1$ combined. By the above assumption, we know that $\sum_{i\neq s} M_i \geq (n-1)(f+1)$. Unfortunately, summing over all the $M_i$ we count every message up to twice: once in $M_i$ if $i$ sent the message and once in $M_j$ if $j$ receives it (the only time when we don’t is when either $i=s$ or $j=s$). Therefore, we can conclude that at least $\frac{1}{2}(n-1)(f+1)$ messages were sent overall in both $W_0$ and $W_1$.
However, we assumed that in our protocol all nonfaulty parties send no more than $\frac{1}{4}(n-1)f$ messages in any run of the protocol, including in $W_0$ and in $W_1$. In total, we get that they sent no more than $\frac{1}{2}(n-1)f$ messages in both worlds, reaching a contradiction.
Now using Claim $1$, we know that there is some party $i$ which communicates with no more than $f$ parties in both worlds. All that is left to do is isolate $i$ and make it think it is in $W_1$ while the rest of the parties think they are in $W_0$. We will show that in the following claim.
Claim 2: There exists a strategy for the adversary in which $i$ outputs $1$ and all other nonfaulty parties output $0$.
Define the world $W_{hybrid}$ in which the adversary levies an attack. In this world, $s$ has the input $0$. Before the first round, the adversary simulates runs of both $W_0$ and of $W_1$ in its head. It then corrupts all parties with which $i$ communicates in either world. By our previous claim, the adversary needs to corrupt no more than $f$ parties in total in both worlds, so it is allowed this many parties. Define $P$ to be the set of parties controlled by the adversary.
The adversary now simulates the whole run of $W_1$ in its head round-by-round, controlling all parties other than $i$. If in any round a party $j$ sends $i$ a message in $W_1$, the adversary instructs $j$ to sends the same message to $i$ in $W_{hybrid}$. If $i$ ever sends a messages to some party $j$ in $W_{hybrid}$, then the adversary simulates $j$ receiving the same message in the run of $W_1$.
Similarly, the adversary simulates a full run of $W_0$ in its head, controlling only parties in $P$ and $i$. Whenever a faulty party receives a message from a nonfaulty party other than $i$ in $W_{hybrid}$, the adversary simulates the same message being sent in $W_0$. Whenever one of the parties in $P$ sends a message to a nonfaulty party other than $i$ in its simulation of $W_0$, it instructs that corrupted party to send the same message in $W_{hybrid}$ as well. The adversary also simulates all message to and from $i$ in $W_0$, but does not send any corresponding message in $W_{hybrid}$.
In total, this means that $i$’s communication in $W_{hybrid}$ proceeds according to the adversary’s simulation of $W_1$. In addition, every other nonfaulty party’s communication in $W_{hybrid}$ proceeds according to the adversary’s simulation of $W_0$. In both of those runs, no messages are exchanged between $i$ and other nonfaulty parties, so the run continues being consistent with both of these simulations in each round. Since $i$’s view is indistinguishable from its view in $W_0$, it must output $0$ in $W_{hybrid}$. Similarly, every other nonfaulty party’s view is indistinguishable from its view in $W_1$, so they must output $1$ in $W_{hybrid}$, reaching a contradiciton.
Generalizations and Remarks
The main insight in the above lower bound is that if a small number of messages is sent, then at least one nonfaulty party can be isolated and made to talk only with faulty parties. The rest of the proof is essentially just bookkeeping we do in order to make sure that after isolating that party, we can make its run look correct, and other parties’ run look correct as well (but with $s$ having different inputs). In this second part of the proof we make use of the assumption that the adversary can simulate other parties.
Using this insight, this lower bound can be generalized to randomized protocols with some probability of failure. This is done by noticing that if a small number of messages is sent in expectation, there is a good probability that some party talks with no more than $f$ parties overall in two runs of the protocol. In order to do so, we require a strongly-adaptive adversary that can corrupt parties on the fly and delete messages that haven’t been delivered yet. This attack can be generalized even further, to isolating a large number of parties if a very small number of messages is sent in expectation (smaller than $O(nf)$). Then we can make a large number of nonfaulty parties output $0$ and a large number of nonfaulty parties output $1$ instead if we desire.
There are also a few important facts to notice in our attack. First of all, we only use the number of messages sent as a proxy to the number of parties each party communicates with. More precisely, think of a communication graph, in which vertices correspond to parties and there is an edge between two parties if they communicate throughout a run of the protocol. This whole proof works as long as the number of edges is small, not the number of messages. In addition, as long as the communication graph is deterministic, we can isolate a party. After that, our attack should work even if the protocol is probabilistic (but with a deterministic communication graph).
The keen-eyed among you might have noticed that if the protocol is fully deterministic, we actually didn’t need to simulate messages round-by-round. The adversary could have just ran its simulation of $W_0$ and $W_1$ and sent messages accordingly. The stronger proof above generalizes to probabilistic protocols with deterministic communication graphs.
Summing up, we actually got a stronger result: Any Crusader Broadcast protocol with a deterministic communication graph requires at least $\Omega(nf)$ communication edges, regardless of how many messages are sent, or if the content of the messages is probabilistic.
Relationship to Eclipse Attacks in Blockchain Systems
The classic Dolev-Reischuk lower bound has been known for decades, but blockchain systems have been designed with linear communication costs per agreed upon value. One way we can understand this discrepancy is by noticing an important feature of the original Dolev-Reischuk lower bound: it uses silence. In the original attack, a party is isolated, and no messages are forwarded to it. By the definition of the consensus task it must output something regardless of whether it heard any message. All that is left to do is making sure that other parties output another value, and the attack succeeds. However, this definition of the consensus task doesn’t really make sense in the blockchain world. It’s hard to imagine a protocol in which nodes decide that some transaction took place if they didn’t hear anything. To that end, in this post we discussed Crusader Broadcast instead, in which parties are allowed to output $\bot$ (i.e. no decision) if they heard nothing.
One type of attack researched in the blockchain literature is the eclipse attack. In such attacks, an attacker makes sure some node (or several nodes) only communicate with attacker nodes. During this step, the attacker utilizes the fact that the communication graph in many blockchain systems is very static and has very few edges (generally, a constant number of edges per node). Even worse, the attacker can influence the communication graph and make it so that certain nodes only talk to nodes controlled by the attacker.
After that, the attacker filters messages to and from the isolated nodes to different ends. For example, it can publish a certain block to one part of the network, and a different block to a different part of the network. This attack is hard to execute, since mining two blocks in quick succession is extremely unlikely. Thinking about this difficulty in terms of the lower bound we discussed, this translates to a difficulty to simulate: it is hard for the adversary to “simulate” two different runs of the protocol with two different blocks being mined. In order to mitigate this difficulty many different attacks have been suggested. For example, some attacks simply drop communication from certain parts of the network which is very easy to “simulate”. This helps raise the relative mining power of the attacker. More complicated attacks can publish blocks to only parts of the network. Those parts will then mine on top of those blocks, essentially using the eclipsed network’s mining power in order to “simulate” another run of the protocol. Many other types of concrete attacks use various tricks in order to essentially simulate the desired run of the protocol.
In general, we hope that noticing the connections between Dolev-Reischuk style attacks and eclipse attacks can lead to a deeper theoretic understanding of eclipse attacks in the wild. This also suggests ways to mitigate eclipse attacks: either make it harder to simulate, have a probabilistic communication graph which cannot be influenced as easily by an attacker, or have a denser communication graph.
Your thoughts on Twitter