Guest post by Zhuolun Xiang

## State Machine Replication and Broadcast

Many existing permission blockchains are built using *Byzantine fault-tolerant state machine replication (BFT SMR)*, which ensures all honest replicas agree on the same sequence of client inputs. Most of the practical solutions for BFT SMR are based on the **Primary-Backup paradigm**. In this approach, in each view, there is a leader in charge to drive decisions efficiently, until replaced by the next leader. The Primary-Backup approach for SMR exposes deep connections to *broadcast*. Each view in BFT SMR is similar to an instance of broadcast where the leader taking on a similar role as the broadcaster, and hence **an efficient broadcast protocol can be converted to an SMR protocol with similar efficiency guarantees**.

## Good-case Latency

Practical SMR solutions care about the **good case** performance measured as the *latency to commit when the Primary is honest*. For many applications, *latency* is crucial. In a talk from 2000, Barbara Liskov, the author of PBFT, commented on PBFT needing 3 rounds in the good case:

I don’t know about a minimality proof that would show you require three phases, though I certainly haven’t been able to think of a way of doing it with fewer.

Therefore, it is natural and important to ask

What is the best latency a BFT SMR can achieve to commit decisions in the good case?

We refer to the above latency notion as *good-case latency*. For broadcast, we similarly define the good-case latency to be the *latency to commit when the broadcaster is honest*.

Somehow surprisingly, the above question has not been formally answered yet. Although a sequence of efforts improves the good-case latency of BFT SMR, there lacks a complete and rigorous characterization of the whole picture. Before we present our results, let’s take a quick look at the **existing best solutions** for BFT SMR on the good-case latency. For the synchrony model where the network delay is bounded by $\Delta$, Sync HotStuff commits in $2\Delta$ under $n\geq 2f+1$. For the partial synchrony model where the network delay is bounded only after a Global Stable Time (GST), PBFT commits in 3 rounds after GST under $n\geq 3f+1$, and FaB commits in 2 rounds after GST under $n\geq 5f+1$. **We show that all these protocols can be improved!**

## Results Overview

### **Theory**

Our **good-case latency paper** gives a **complete categorization** of the good-case latency for broadcast, under *synchrony, partial synchrony and asynchrony*. As mentioned, the protocols for broadcast can be converted to BFT SMR with similar good-case latency guarantees. The lower bound results also shed light on what is the limitation of good-case latency for BFT SMR. All of our bounds are **tight** except for just one case, as summarized in the table below (new and non-trivial results are marked bold).

- For asynchrony, Byzantine broadcast is impossible and the standard broadcast formulation is
*Byzantine reliable broadcast (BRB)*, which has a**tight bound of 2 rounds**for the good-case latency. - For partial synchrony, we propose a new broadcast formulation called
*partially synchronous Byzantine broadcast (Psync-BB)*that captures a single-shot of BFT SMR protocols like PBFT. We show that**$n\geq 5f-1$ is the tight resilience bound**for solving psync-BB with good-case latency of 2 rounds. Since psync-BB solves a single-shot of BFT SMR, our results directly refute the claim made in FaB saying that $n=5f+1$ is the best possible resilience for $2$-round BFT SMR protocols. - For synchrony, we reveal a surprisingly rich structure of the good-case latency for Byzantine broadcast (BB). For a more accurate characterization, we adopt the separation of
*assumed network delay $\Delta$*and the*actual (unknown) network delay $\delta$*. For instance, 1 round in the bounds for asynchrony and partial synchrony above equals $\delta$, as the protocols can proceed with the network speed. We also distinguish two assumptions about the clock synchronization – the*synchronized start*case where all parties can start the protocol and local clock at the same time, and the*unsynchronized start*case where all parties start the protocol and local clock within $\Delta$ time of each other. To strengthen the results, all lower bounds assume sync start and all upper bounds assume unsynchronized start, except the case when $n/3<f<n/2$, which is especially interesting as the*tight bounds depend on the clock synchronization assumption*, and for unsynchronized start the tight bound is $\Delta+1.5\delta$,*not even an integer multiple of the delay*!

### **Practice**

For the practical side, the investigation on good-case latency leads to **better BFT SMR protocols** over PBFT, FaB, and Sync HotStuff in terms of good-case latency.

- For partial synchrony, we obtain a
**2-round BFT SMR**protocol that only requires $n\geq 5f-1$. Our protocol refutes the claim made in FaB saying that $n=5f+1$ is the best possible resilience for $2$-round BFT SMR protocols. Interestingly, for the canonical example with $n=4$ and $f=1$, we can have a 2-round PBFT protocol with the optimal resilience! - For synchrony, we obtain a
**$1\Delta$-SMR**protocol that commits in $\Delta+2\delta$ with $n\geq 2f+1$, reducing the commit latency (which is $2\Delta$) of Sync HotStuff by almost half when $\delta\ll\Delta$.

## Protocols and Impossibility Proofs

Check out our next post and the next one!

Please answer/discuss/comment/ask on Twitter.