Living with Asynchrony: Bracha's Reliable Broadcast

In this series of posts, we explore what can be done in the Asynchronous model. This model seems challenging because the adversary can delay messages by any bounded time. By the end of this series, you will see that almost everything that can be done in synchrony can be obtained in asynchrony. The next posts in this series are about gather, round complexity, and finally our series on Asynchronous Agreement.

We begin with Bracha’s Reliable Broadcast from 1987 (conference version from 1984). This is one of the most important build blocks for Byzantine Fault Tolerant protocols in the Asynchronous model. In this standard setting there are $n$ parties, where one of them is designated as the leader. The malicious threshold adversary can corrupt at most $f<n/3$ parties. The leader has some input value $v$ and a party that terminates needs to output a value.

Reliable Broadcast Properties

(validity): If the leader is non-faulty then eventually all non-faulty parties will output the leader’s input.

(totality): If some non-faulty party outputs a value then eventually all non-faulty parties will output a value.

(agreement): All non-faulty parties that output a value, output the same value.

Since this protocol is supposed to work in the asynchronous model, these properties use the term eventually. This term is often used in asynchrony to indicate that no matter what the adversary does and for how long it delays messages, some event will occur after a bounded number of message processing at each party (assuming a bounded threshold adversary).

Reliable Broadcast

The high level idea:

First, we force the leader to send just one value: we do this by requiring each party to echo just one message and wait for $n-f$ echo messages before voting for it. Since any two sets of $n-f$ must intersect by at least $f+1$ parties, it cannot be that two different non-faulty parties echo different values.
Second, we make sure that if a non-faulty delivers a value, then all non-faulty will. We do this by requiring a party to send just one vote after seeing either $n-f$ echo messages or after seeing $f+1$ votes. So if any party sees $n-f$ votes then all non-faulty will see $n-2f \geq f+1$ votes.

The pseudo-code is simple:

   // leader with input v
   send <v> to all parties

   // Party j (including the leader)
   echo = true
   
   on receiving <v> from leader:
      if echo == true:
         send <echo, v> to all parties
         echo = false

   on receiving <echo, v> from n-f distinct parties:
         send <vote, v> to all parties

   on receiving <vote, v> from f+1 distinct parties:
         send <vote, v> to all parties

   on receiving <vote, v> from n-f distinct parties:
       deliver v

Analysis

Claim 1 (validity): If the leader is honest and sends to all parties then all non-faulty will eventually deliver .

Proof: All non-faulty will eventually send <echo,v>, so eventually all non-faulty will receive $n-f$ echoes for $v$ and at most $f$ echoes for other values. Hence all non-faulty will eventually send <vote, v>, so eventually all non-faulty will receive $n-f$ votes for $v$ and at most $f$ votes for other values.

Claim 2:: No two non-faulty will send conflicting votes.

Proof: Seeking a contradiction, consider the first vote for $v$ and the first vote for $v’ \neq v$ by two non-faulty parties $a$ and $b$. Since these are the first, party $a$ must have seen a set $A$ of $n-f$ echoes for $v$ and party $b$ must have seen a set $B$ of $n-f$ echoes for $v’ \neq v$ (since they are the first, they could not have voted due to seeing $f+1$ votes). Observe that since $|A|=|B|=n-f$, then $|A \cap B| \geq f+1$ (this is the famous “quorum intersection” property). This implies that there must be at least $f+1$ parties that sent an echo to both of them, which implies that at least one non-faulty party sent two echoes for different values, which contradicts the code.

Claim 3 (agreement and totality): If a non-faulty delivers $v$, then all non-faulty will eventually deliver $v$.

Proof: From the previous claim, we know that all non-faulty that vote, will vote for the same value. So if a non-faulty delivers, it has seen $n-f$ distinct votes, of which at least $n-2f \geq f+1$ came from non-faulty parties. So all non-faulty parties will either vote $v$ due to seeing $n-f$ echoes or eventually due to seeing the votes from these $f+1$ non-faulty parties. Note that a non-faulty will never vote for $v’ \neq v$ because from claim 2, there will not be $n-f$ echoes for $v’$, and there will not be $f+1$ votes for $v’$.

Notes

The protocol only needs authenticated channels (no other cryptography).
Even in partial synchrony, the resilience threshold of $3f+1 optimal due to the DLS split brain lower bounds.
Bracha used Reliable Broadcast to improve Ben-Or’s Asynchonrus Byzantine Agreement from $n>5f$ to the optimal resilience of $n>3f$.
Reliable broadcast requires sending $O(n^2)$ messages that contain the value $v$. If $v$ has $\ell$ bits this is a total of $O(\ell n^2)$ bits. The post about Verifiable Information dispersal shows how this can be improved to $O(\ell n + n^2)$ bits.
Christian Cachin has excellent course notes on Byzantine Broadcasts and Randomized Consensus including Bracha’s Reliable broadcast.

Scratch your Brains!

Prove correctness (or provide a counterexample) of the following optimization that simplifies Bracha’s original protocol and saves a round!

    on receiving <v> from leader:
      if echo == true:
         send <echo, v> to all parties
         echo = false

    on receiving <echo, v> from f+1 distinct parties:
       if echo == true:
         send <echo, v> to all parties
         echo = false

    on receiving <echo, v> from n-f distinct parties:
       deliver v

Acknowledgment. We would like to thank hemengjie for fixing an error in the exercise.

Please answer/discuss/comment/ask on Twitter.