We recently published our work HotStuff-2 on eprint, introducing a two-phase HotStuff variant which simultaneously achieves $O(n^2)$ worst-case communication, optimistically linear communication, a two-phase commit regime within a view, and optimistic responsiveness in partially-synchronous BFT.

The main takeaway is that two phases are enough for BFT after all.

In this post, we will elaborate on the key observation made in the work, and compare and contrast it to relevant prior works: PBFT, Tendermint, and the original HotStuff. We will focus on explaining one key component that also forms the difference between these works – how do we ensure liveness during a view change while simultaneously achieving other useful properties: responsiveness, good communication complexity, and fewer number of phases in the protocol?

HotStuff-2 is remarkably simple, adding no substantive complexity to the original HotStuff protocol. Hence, other aspects such as the way leaders change, differences in the normal case leader commit phase, pipelining, concurrency, and randomness are the same in HotStuff-2 as in HotStuff. Details regarding these aspects can be found in earlier posts, see the suggested tutorials section at the bottom of the post for background material.

Overview of Approaches

For simplicity, we will assume that these partially synchronous protocols with $n \geq 3t+1$ are committing a single value, though the idea directly extends to multiple values. At the core, all of PBFT, Tendermint, HotStuff, and HotStuff-2 use (roughly) the same approach to obtain the safety property.

In these protocols, in a view, parties hope to certify and commit at most one value. Thus, they introduce one phase of votes between parties to ensure uniqueness of a value within the view. They then engage in another phase that allows parties to commit the value. A two-phase regime materializing this approach, where each phase uses a linear secure broadcast, looks like the following. Note that party 4 is Byzantine in all of the examples below.

When a party commits a value $v$ after the second phase, a principal invariant that is ensured is that $\geq 2t+1$ parties (and thus $\geq t+1$ honest parties) are locked on $v$ from the first phase. These $\geq t+1$ honest parties would guard the safety of the commit. Thus, they would not vote on a conflicting value unless they are shown a proof that it is safe to do so.

Thus, all of these protocols rely on a lock-commit paradigm (also known as commit-adopt) where sufficiently many parties are locked before any party can commit. Consequently, if a leader fails or the network stalls, we can end up in one of three situations depicted below.

  1. No party has a lock and a fortioti, no value is committed.
  2. One or a few parties are locked on a value, but no honest party has committed.
  3. $2t+1$ parties are locked, some honest parties have committed but not all.

Observe that while case 1 is not great from the perspective of making progress, all parties are free to vote on other values in subsequent views. The key question is how do parties that are locked on a value, behave in the other two cases given that they do not know which scenario is the true system state. It turns out that all of these safety-preferring protocols, the locked parties guard the safety of a potential commit, i.e., by default assume we are in case 3, and not vote for a conflicting value unless shown a proof otherwise. When we are indeed in case 2, we need some mechanism to ensure that these locked honest parties vote for a safe proposal from a leader (perhaps conflicting with a locked value) after a view-change. The different schemes differ in this regard. Note that since all of these are safety-preferring protocols, we are only concerned about an honest leader making progress — a malicious leader, at the most, can delay progress, and it can do so by simply not sending a proposal in the view.

How does the view-change protocol differ between PBFT, Tendermint, HotStuff, and HotStuff-2?

While reading about these protocols, it may be beneficial to think about these two questions: (i) What does the leader learn about the status of the system (w.r.t. the three scenarios) at the start of a view? (ii) How does the leader convince other parties about the status of the system (and thus to vote for its proposal)?

PBFT

In PBFT, the leader collects a status of locks from $2t+1$ parties at the start of a view. Consider case 3 depicted above, namely, that some party has committed in earlier views. Among the $2t+1$ locks in the status, up to $t$ can be malicious, up to $t$ can be from honest parties are not locked on the committed value, but there is at least one honest party locked on the committed value. That party would not vote for a leader proposal conflicting with the lock it holds, thus guarding the safety of the commit. Moreover, the leader would necessarily learn about that lock when it collects the status message.

Now consider case 2 depicted below, namely, no party had committed in an earlier view, but some parties (in the figure below, parties 1 and 4) locked on the value. Their status may or may not be a part of the locks the leader receives. For instance, we can be in a scenario where the leader receives a status from all parties except party 1, and Byzantine party 4 does not report its lock. In this case, the leader is actually guaranteed that no honest party may have committed the value and is free to propose any value.

How does the leader convince the honest parties locked on a conflicting value to vote for the value it proposed? It sends the status certificate containing the $2t+1$ locks in its proposal. For example, in the figure below, the leader provides (absence of) locks from itself, party 3, and party 4 (who is lying). Observe that every honest party can apply the same reasoning as the leader did, and thus know that it is safe to vote for the value. In particular, this holds even in situations when a party is locked on a value and the leader proposes a different value that is justified by the status certificate. In the figure above, party 1 is locked in a previous proposal but it will vote for the leader proposal. In other words, the status certificate from an honest leader provides sufficient information to all honest parties to vote on the proposal.

Note that sending this message can be responsive since the leader can act as soon as it receives $2t+1$ locks. However, the status certificate has linear complexity and thus sending it in the proposal leads to a quadratic communication complexity.

Tendermint

Tendermint improves PBFT by not requiring the leader to send a $(2t+1)$-sized status certificate and thus can perform a view change with linear complexity. Well, how can a leader convince honest parties would indeed vote for it? Observe that by receiving $2t+1$ locks, an honest leader perhaps knows what to propose, but if parties do not have access to those locks (through a status certificate), they may not vote for it. In particular, in a situation where some honest party $p$ is locked on a value, and a leader proposes a conflicting value, $p$ cannot distinguish whether we are in case 2 or case 3 in the scenarios described above. A generalization of this dilemma for this party is presented as a livelessness attack in HotStuff.

To address this concern, Tendermint requires the leader to wait for an $O(\Delta)$ time at the start of the view. After GST, the $O(\Delta)$ wait ensures that the leader receives locks from all honest parties. In the figure above, the leader will obtain the highest lock value from party 1. Thus, it can send a proposal that conforms with the value corresponding to the lock from the highest view, and send this lock together with the proposal. The highest-ranked lock convinces all honest parties to vote on the proposal sent by the leader. Note that, in this solution, while the leader learns the globally highest-ranked lock, the parties do not. However, the amount of information is sufficient to ascertain that the proposal is safe to vote on.

Tendermint obtains a linear communication complexity for view change compared to PBFT’s quadratic communication complexity. However, due to the $O(\Delta)$ wait, the protocol is not responsive.

HotStuff

The HotStuff protocol simultaneously obtains a linear view change and responsiveness when the leader is honest after GST. HotStuff addresses the livelessness concern differently while still ensuring that the value corresponding to the highest lock is proposed by the leader. Abstractly speaking, it uses the same argument as the one used for safety. For safety, we said, “if some party commits, then $\geq 2t+1$ parties hold a lock that will guard the safety of the commit, and ensure that the next leader receives it”. For liveness, HotStuff introduces another phase of votes and obtains a similar invariant: “if some party locks, then $\geq 2t+1$ parties know about the existence of this lock and thus hold a key corresponding to it. This key will be shared with the next leader to decide on a proposal appropriately.”

In the figure below, a leader failure or poor network at a critical moment leaves parties 1 and 4 locked; the extra phase guarantees that $2t+1$, namely parties 1, 3, and 4, already obtained a key corresponding to this lock. Note that, in particular, party 3 has a key despite not reaching the lock step.

Correspondingly, the next leader would learn about the highest lock through the $2t+1$ keys it receives, and the proposal would respect the globally highest lock held by any honest party. In the example, the new leader (party 2) obtains the key from party 3 (Byzantine party 4 may not send its key). Note that, while locks guard the safety of a commit, keys do not, and thus honest parties only holding a higher key on a different value than the proposal can still vote for the proposal.

Thus, HotStuff obtains linear view change and responsiveness, but introducing the key phase makes it a three-phase protocol.

HotStuff-2

Our work takes a fresh look at the livelessness concern and asks, do we really need to add another phase to address this concern while still obtaining linear communication complexity and optimistic responsiveness? Following the above discussion, if a leader knows about the highest lock and it can convince all honest parties about it, then the problem is solved.

The key observation is that a new leader can choose between two options:

  • If the leader obtains a lock (a quorum certificate) from the preceding view, it knows that it has obtained the maximal locked value that possibly exists in the system. In this case, it proceeds with a proposal in a responsive manner.
  • Otherwise, the leader knows that a timer delay of $\Omega(\Delta)$ must have expired in the preceding view. In that case, there is no responsiveness anyway, hence it waits an extra $O(\Delta)$ delay after all parties are guaranteed to enter the view to obtain the maximal locked value in the system.

Indeed if a leader would receive a lock corresponding to the previous view, there cannot exist an even higher lock. Thus, a proposal respecting this lock would be voted for by all honest parties and the livelessness concern does not exist. If the leader does not obtain such a lock from the previous view, it would wait to hear about the locks from all honest parties by waiting $O(\Delta)$ time — this can happen when the leader from the previous view is malicious or the previous view was before GST. However, in those cases, we cannot hope to obtain responsiveness anyway. Thus, if we have a sequence of honest leaders after GST, each of them is guaranteed to drive progress responsively and generate a certificate in a view that will aid the next leader. Thus, all leaders except the first one in the chain can make honest parties commit responsively.

Revisiting the example above, in HotStuff-2 one of the following two scenarios depicted below may happens:

  • Scenario 1: Some parties may not commit in a view but they become locked in it; the highest lock is obtained by the next view leader and it proceeds responsively.
  • Scenario 2: No (honest) party obtains a lock on a value in a view, and the next view leader has to wait $\Delta$ to propose in the next view.

It’s worth noting that using a dual leader regime differentiating between the case of a normal/faulty preceding view has been used in prior works; see the HotStuff-2 manuscript for a detailed discussion of related works. The first one (to our knowledge) is Pala which introduced a dual regime into the Tendermint protocol. Subsequently, several works have attempted to improve the number of phases in the HotStuff protocol using a dual leader regime. This includes Fast-HotStuff and a recent post that adds a $\Delta$ wait to Tendermint and to the original two-phase HotStuff paper, but with no discussion of responsiveness or view synchronization.

The protocol modification in HotStuff-2 is remarkably simple, adding no substantive complexity to the original HotStuff protocol. The essence of our result is not in a new protocol but understanding that a simple solution such as this suffices and we can simultaneously obtain a linear communication complexity view change with optimistic responsiveness, worst-case quadratic complexity while still committing within two phases. Our key observation is subtle, but once the insight is understood, really simple (at least we think so!), allowing us to solve the problem without any heavy machinery.

Summary

Here’s a summary of the differences between these protocols.

  Phases Worst-case complexity View-change communication Responsive view-change?
PBFT 2 $O(n^3)$ $O(n^2)$ Yes
Tendermint 2 $O(n^2)$ $O(n)$ No
HotStuff 3 $O(n^2)$ $O(n)$ Yes
HotStuff-2 2 $O(n^2)$ $O(n)$ Yes

Suggested Tutorials

Please add your comments on Twitter.

Acknowledgment. We thank Benjamin Chan for providing feedback on this post.