Traditionally, protocol performance is summarized using two metrics:
- latency (measured in rounds), and
- communication complexity (measured asymptotically, e.g., $O(n^2)$).
If both are small, we might expect the protocol to perform well in practice.
But this intuition is incomplete.
When Low Latency and Low Communication Aren’t Enough
Consider two protocols:
- Protocol A: 3-round good-case latency and $O(n^4)$ communication.
- Protocol B: 5-round good-case latency and $O(n^3)$ communication.
These are the asymptotics for two variants of Sailfish. If we had to choose between these two options, which should we deploy?
At a high level, the traditional metrics are good indicators of protocol performance, but they do not completely describe it. In particular, it has been shown that lower communication complexity does not imply higher throughput; multi-sender protocols can obtain high throughput even though they have high communication complexity.
The Standard Three-Step Process
Today, determining practical performance typically follows three steps:
- Design protocols with low latency and low communication complexity.
- Implement them in a geo-distributed setting.
- Measure throughput and latency experimentally.
The second step is expensive in time, engineering effort, and infrastructure cost.
Can we analytically predict performance instead?
The Pipes Model
The paper The Pipes Model of Latency and Throughput Analysis proposes a framework to analytically compute latency as a function of bandwidth and load during the protocol design process.
The model assumes:
- Each processor has bandwidth $S$ data parcels per timeslot (upload and download).
- Transactions arrive at rate $D$ parcels per timeslot.
- Message delay between processors is a fixed $\Delta$.
- Upload and download speeds are equal and constant.
- All parties are non-faulty.
- Computation costs (signatures, erasure coding) are ignored.
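These assumptions amount to a small set of parameters. Here is a minimal sketch of how one might encode them in Python (our naming, not the paper's):

```python
from dataclasses import dataclass

# Parameters of the Pipes model as listed above (our naming, not the paper's).
@dataclass(frozen=True)
class PipesModel:
    S: float      # per-processor bandwidth: parcels per timeslot (upload = download)
    D: float      # transaction arrival rate: parcels per timeslot
    delta: float  # fixed message delay between any pair of processors
    n: int        # number of processors, all assumed non-faulty
```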
Unlike traditional analysis, Pipes treats the interplay between bandwidth, delays, and communication as a primary consideration to analytically obtain the results we would otherwise measure experimentally.
In Pipes, given a protocol design, the goal is to:
- Identify the latency bottleneck, i.e., the incoming transaction rate at which latency becomes unbounded over the protocol execution; this is the maximum throughput the protocol can handle without unbounded latency. As an example, observe that if the system receives transaction data at a rate of $D = 100$ MBps, but the system delivers blocks of size 10 MB in 1 sec, then the protocol cannot sustain the rate of incoming transactions. Eventually, transactions pile up, and the time to deliver a transaction grows to infinity (see the sketch after this list).
- Identify how the ideal latency varies below the bottleneck, as a function of $D$, $S$, $n$, $\Delta$, and protocol parameters. For example, if the incoming rate is $D = 1$ MBps, then setting the block size to 10 MB is not optimal from a latency standpoint.
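To build intuition for the first goal, here is a minimal Python sketch (our code, not from the paper) of the $D = 100$ MBps example above, where one 10 MB block is delivered per second:

```python
# Minimal sketch of the bottleneck example above (illustrative, not from the
# paper): data arrives at `arrival_rate_mbps`, and one block of size
# `block_size_mb` is delivered each second, so the backlog diverges.
def backlog_after(timeslots: int, arrival_rate_mbps: float, block_size_mb: float) -> float:
    """Pending transaction data (MB) after `timeslots` seconds."""
    backlog = 0.0
    for _ in range(timeslots):
        backlog += arrival_rate_mbps                 # D MB of new transactions
        backlog = max(0.0, backlog - block_size_mb)  # one block delivered
    return backlog

# D = 100 MBps vs. 10 MBps of delivered blocks: backlog grows ~90 MB per second.
for horizon in (10, 100, 1000):
    print(horizon, backlog_after(horizon, arrival_rate_mbps=100, block_size_mb=10))
```

Since the backlog grows without bound, so does the time until a newly arriving transaction is delivered.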
A Tutorial: Applying Pipes to Bracha’s Reliable Broadcast
We now apply Pipes to Bracha’s reliable broadcast. You can refer to this post for an intuitive description of the protocol, and Figure 16 in the paper for a complete description.
The protocol proceeds in three steps. The leader proposes a block, all parties echo the block proposal (including the data), and on receiving sufficiently many echoes, parties vote (on the hash of the proposal) for the proposal. On receiving sufficiently many votes, parties deliver the block.
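The following is a schematic, failure-free sketch of this message flow (our code and naming; see Figure 16 in the paper for the full protocol). We assume the standard Bracha quorum of $2f+1$ with $n = 3f+1$, and leave the network layer abstract:

```python
class BrachaParty:
    """One party in the (simplified) reliable broadcast described above."""

    def __init__(self, n: int, f: int, network):
        self.n, self.f = n, f
        self.network = network  # hypothetical primitive with send_all(msg)
        self.echoes, self.votes = {}, {}
        self.delivered = None

    def on_propose(self, block):
        # Step 2: echo the full proposal (block data + metadata).
        self.network.send_all(("ECHO", block))

    def on_echo(self, sender, block):
        # Step 3: on sufficiently many echoes, vote on the proposal's hash.
        self.echoes[sender] = block
        if len(self.echoes) == 2 * self.f + 1:
            self.network.send_all(("VOTE", hash(block)))

    def on_vote(self, sender, digest):
        # Deliver the block on sufficiently many votes.
        self.votes[sender] = digest
        if len(self.votes) == 2 * self.f + 1 and self.delivered is None:
            self.delivered = digest
```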
Assume:
- Block size: $B$ (only transactions, no metadata)
- Metadata size: $M$
- Vote size: $\lambda$
- $n$ parties
The leader repeatedly proposes a new block once the previous block is delivered.
Block Delivery Time
Let $t^*$ denote the time required to deliver one block.
Proposal
Suppose the leader starts sending a block at time $t$. The leader sends $B + M$ parcels to each of the $n$ parties. Since the upload bandwidth is $S$, the proposal reaches all parties by time
\[t + \frac{(B+M)n}{S} + \Delta.\]
Echo
Each party echoes the full proposal. Echoes arrive at
\[t + \frac{2(B+M)n}{S} + 2\Delta.\]
Vote
Votes have size $\lambda$. They arrive at
\[t + \frac{2(B+M)n}{S} + \frac{n\lambda}{S} + 3\Delta.\]
Thus,
\[t^* = \frac{2(B+M)n}{S} + \frac{n\lambda}{S} + 3\Delta.\]
The Latency Bottleneck
Note that we are considering a multi-shot protocol where a leader proposes a new block immediately after the previous one is delivered. To prevent unbounded latency for any transaction (and unbounded mempool growth for the leader), the leader must include all transactions that arrive in the $t^*$ timeframe. Thus,
\[B \ge D t^*.\]
Substituting:
\[B \ge D\left(\frac{2(B+M)n}{S} + \frac{n\lambda}{S} + 3\Delta\right).\]
Collecting the $B$ terms on the left-hand side and multiplying through by $S$ yields $B(S - 2nD) \ge D(2Mn + n\lambda + 3\Delta S)$. Solving for $B$:
\[B \ge D\left(\frac{2Mn + n\lambda + 3\Delta S}{S - 2nD}\right).\]
For the denominator to remain positive, we must have
\[2nD < S.\]
Therefore, the maximum sustainable throughput is
\[D_{\max} = \frac{S}{2n}.\]
Even though Bracha runs in three rounds, its maximum sustainable throughput is inversely proportional to $n$.
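The bound on $B$ and the throughput ceiling are easy to check numerically. Below is a small Python sketch (our code; the concrete parameter values, including the 1 Gbps bandwidth, are illustrative assumptions):

```python
# Numerical check of the bounds derived above. Sizes are in bits, time in
# seconds; all concrete values below are illustrative assumptions.
def min_block_size(D, S, n, M, lam, delta):
    """Smallest B satisfying B >= D * t*."""
    assert 2 * n * D < S, "load exceeds the throughput ceiling S / (2n)"
    return D * (2 * M * n + n * lam + 3 * delta * S) / (S - 2 * n * D)

def delivery_time(B, S, n, M, lam, delta):
    """t* = 2(B + M)n/S + n*lambda/S + 3*delta."""
    return 2 * (B + M) * n / S + n * lam / S + 3 * delta

S, n, M, lam, delta = 1e9, 200, 1000, 500, 0.2  # 1 Gbps links (assumed)
D_max = S / (2 * n)
print(f"D_max = {D_max:.2e} bits/sec")
for D in (0.1 * D_max, 0.5 * D_max, 0.9 * D_max):
    B = min_block_size(D, S, n, M, lam, delta)
    t_star = delivery_time(B, S, n, M, lam, delta)
    print(f"D = {D:.2e}: B >= {B:.2e} bits, t* = {t_star:.2f} sec")
```

Note how, as $D$ approaches $D_{\max}$, the required block size, and hence $t^*$, blows up.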
Transaction Latency
Our goal is to compute the worst-case latency of a transaction. Suppose $t$ is the time at which the leader “completes” a block at depth $d-1$, i.e., from this point on it only adds metadata to this block, and then starts proposing it. Thus, any transaction arriving after time $t$ will be included in a block at depth $d$. However, the depth $d-1$ block must be delivered first. The contents of the depth $d-1$ block have already been sent by time $t$; the proposal still needs to send the metadata $M$, which arrives at all parties by time $t+\frac{Mn}{S} + \Delta$. Each party echoes the block, and the echoes arrive by time $t+\frac{Mn}{S} + \frac{(B+M)n}{S} + 2\Delta$. Finally, parties vote, and the depth $d-1$ block is delivered by time $t+\frac{Mn}{S} + \frac{(B+M)n}{S} + \frac{\lambda n}{S} + 3\Delta$.
Continuing the same reasoning, delivering the block at depth $d$ (containing transaction $tx$) requires an additional $\frac{(2(B+M)+\lambda)n}{S} + 3\Delta$.
Summing up, the worst-case latency of a transaction becomes
\[\frac{(3B+4M+2\lambda)n}{S} + 6\Delta.\]
Substituting the expression for $B$ gives
\[\text{Latency} = \left(\frac{(4M+2\lambda)n}{S} + 6\Delta\right) \left(1 + \frac{1.5}{S/(Dn)-2}\right).\]
If $S/(Dn) \gg 2$, latency is approximately
\[\frac{(4M+2\lambda)n}{S} + 6\Delta.\]
As $2nD \to S$, latency tends to infinity.
Round complexity and communication complexity alone do not reveal this information.

Plotting the curve yields the relationship between latency and throughput when the other parameters are fixed. Here, we set the transaction size to 2500 bits, $M = 1000$ bits, $\lambda = 500$ bits, $\Delta = 0.2$ secs, and $n = 200$.
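A sketch of how such a curve can be produced (our code; the bandwidth $S$ is our added assumption, set here to 1 Gbps, since the parameters above do not pin it down):

```python
import numpy as np
import matplotlib.pyplot as plt

# Latency-vs-throughput curve from the closed form above. Parameters follow
# the text (bits and seconds); the bandwidth S = 1 Gbps is our assumption.
M, lam, delta, n, S = 1000, 500, 0.2, 200, 1e9
TX_SIZE = 2500  # bits per transaction, as in the text

def latency(D):
    """Worst-case transaction latency at load D (bits/sec), for D < S/(2n)."""
    base = (4 * M + 2 * lam) * n / S + 6 * delta
    return base * (1 + 1.5 / (S / (D * n) - 2))

D_max = S / (2 * n)
D = np.linspace(0.01 * D_max, 0.99 * D_max, 500)
plt.plot(D / TX_SIZE, latency(D))
plt.xlabel("throughput (transactions/sec)")
plt.ylabel("worst-case latency (sec)")
plt.show()
```

The curve is nearly flat at low load and diverges as the load approaches $S/(2n)$.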
What Did We Learn?
Bracha’s reliable broadcast has a throughput ceiling of $S/(2n)$, and its latency explodes as the load approaches this ceiling. We note that the analysis above is for a variant of the protocol described in the paper. The protocol itself could be improved to perform better, but that’s an orthogonal concern.
At a higher level, latency and communication complexity remain important metrics. But to understand real-world performance, we must also ask:
- How close does the protocol operate to network capacity?
- How does latency scale under load?
- Where is the saturation point?
The Pipes model gives us these answers without running experiments. We should caveat that experiments will eventually be necessary, since Pipes does make simplifying assumptions (e.g., a fixed $\Delta$ between all pairs of parties). However, it provides a much more useful signal during the design process than the traditional metrics alone.