Asynchronous Agreement Part Two: Ben-Or's protocol

We continue to explore the marvelous world of consensus in the Asynchronous model. In this post, we present Ben-Or’s classic protocol from 1983. In the next post, we will present a more modern version that is a simplified version from our paper.

In the previous post we defined the problem of Asynchronous Agreement, so without further ado, here is Ben-Or’s protocol for Binary Asynchronous Agreement with $n=2f+1$ parties assuming the adversary can crash $f$ parties:

Ben-Or's Protocol for party i

input: v (0 or 1)
r:=1

while true
    send <echo1, r, v> to all
    wait for f+1 <echo1, r, *>
        if all have value w, then send <echo2, r, w> to all
        otherwise send <echo2, r, bot> to all
    wait for f+1 <echo2, r, *>
        if all have the same non-bot value u, then decide(u)
        if all have the value bot, then v:= coin()
        otherwise, v:=u where u is a non-bot value from <echo2, r, *>
    r++    

Note that this protocol does not terminate when it decides. Adding a termination gadget that provides termination in the crash model is quite simple:

Termination Gadget for crash failures

If receive <decide u>, then
    decide(u) 
If decide(u), then
    send <decide u> to all
    Terminate

Why does Ben-Or’s protocol work? The first thing to check is Weak Validity (if all are non-faulty and start with $b$ then all decide $b$). Indeed, if all parties have the same input $b$, and no party is faulty, then clearly all parties will send <echo1, 1, b>, hence will see $f+1$ <echo1, 1, b> message after one round, hence will send <echo2, 1, b>, hence will see $f+1$ <echo2, 1, b> message and decide (and terminate from the termination gadget).

The next property to verify is Agreement (that no two parties decide on different values). This follows from two simple claims:

Claim 1: there cannot be <echo2, r, 0> and <echo2, r, 1>. This follows from quorum intersection, sending echo2 requires $f+1$ echo1 of the same value, but this would mean one party sent both <echo1, r, 0> and <echo1, r, 1> which is not possible.

Claim 2: if one party sees $f+1$ messages for <echo2, r, u> then all parties will see at least one message for <echo2, r, u>. This follows since at least one of the $f+1$ parties that sent <echo2, r, u> is non-faulty and again from quorum intersection.

Combing both claims, we can now look at the first round $r$ in which a party decided value $u$, and conclude that any party that reaches the end of round $r$ will have $v=u$. We can now use the argument above for Weak Validity and prove that all non-faulty parties will decide $u$ by round $r+1$.

Now for the hard part. Why does this protocol have Finite Expected Termination?

It turns out that this is a non-trivial theorem for an adaptive adversary. It took over 15 years to realize via a rather complicated argument:

Theorem: [Aguilera, Toueg 1998]: Ben-Or’s protocol with $n=2f+1$ parties has an Finite Expected Termination of $O(2^{2n})$.

This leaves us in an unsatisfying state. While Ben-Or’s protocol is elegant, the proof for Finite Expected Termination is non-trivial and not easily taught in a standard class (or a short blog post). Is there a simpler proof? Maybe there is a reason for the complexity? Another concrete question is about efficiency: can we improve the $O(2^{2n})$ bound to a more natural $O(2^n)$ bound?

More fundamentally, the question we ask is: how should we argue about the expected termination of Asynchronous protocols with an adaptive adversary? Is there a more general framework that can allow us to decompose the problem?

We will answer this in the next post.

Thoughts and comments on Twitter.