Since 2018, blockchain research has seen a surge in DAG-based BFT protocols aimed at higher throughput. These protocols trace back to Hashgraph (2018) and Aleph (2019), the theoretical work of All You Need is DAG (2021), and the systems papers Narwhal and Bullshark.

In this post, we highlight 6 aspects of this body of research:

  1. The separation of dispersal and consensus
  2. Dispersal is live in asynchrony and DACs can be combined
  3. Single-disperser BFT vs multi-disperser BFT
  4. Certified vs Uncertified DAGs
  5. Virtual voting: piggybacking the consensus on the DAG for free
  6. Ordering transactions

1. The separation of dispersal and consensus

The separation of consensus and execution improves modularity. Here we focus on the separation of dispersal and consensus (a minimal sketch follows the steps below):

  1. Disperse the data and obtain a short data availability certificate (DAC) that proves the data can be retrieved; the DAC is bound to the data.
  2. Reach consensus on the DAC.
  3. Retrieve the data using the DAC, obtaining totality.
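
Below is a minimal end-to-end sketch of this three-step pattern. It is purely illustrative: the names (`Party`, `disperse`, `retrieve`, `DAC`), the quorum choice, and the in-memory "network" are our own simplifications, not any particular system's implementation.

```python
# Minimal in-memory sketch of the disperse / consensus-on-DAC / retrieve pattern.
import hashlib
from dataclasses import dataclass

N, F = 4, 1                    # n parties, up to f faults
QUORUM = N - F                 # n - f acknowledgements form a DAC

@dataclass(frozen=True)
class DAC:
    digest: str                # binds the certificate to the dispersed data
    acks: frozenset            # ids of parties that acknowledged storing the data

class Party:
    def __init__(self, pid):
        self.pid, self.store = pid, {}

    def receive(self, digest, data):
        self.store[digest] = data      # keep a copy, indexed by digest
        return self.pid                # acknowledge

def disperse(data: bytes, parties) -> DAC:
    digest = hashlib.sha256(data).hexdigest()
    acks = frozenset(p.receive(digest, data) for p in parties[:QUORUM])
    return DAC(digest, acks)           # short certificate; the data itself stays behind

def retrieve(dac: DAC, parties) -> bytes:
    for p in parties:                  # any party holding the digest suffices in this toy setting
        if dac.digest in p.store:
            return p.store[dac.digest]
    raise RuntimeError("data unavailable despite a DAC")

parties = [Party(i) for i in range(N)]
dac = disperse(b"batch of transactions", parties)  # step 1: disperse, keep only the short DAC
# step 2: run consensus on `dac` alone (not shown); the DAC is small and constant-size
print(retrieve(dac, parties))                      # step 3: retrieve the full data via the DAC
```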

This pattern of dispersal and retrieval using a DAC is called provable verifiable information dispersal (provable VID). Provable VID can be implemented as provable broadcast, or using error-correcting codes for better message complexity at the cost of worse latency. The latter approach seems essential for getting $O(1)$ scaling of blockchains.
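
To see why error-correcting codes help with message complexity, here is a back-of-the-envelope comparison of the disperser's outgoing bandwidth. The parameters and helper names are illustrative; we assume $n = 3f+1$ and a code with $n-2f$ data shares, so that any $n-f$ responses contain enough correct shares to reconstruct.

```python
# Back-of-the-envelope disperser bandwidth: full replication vs. erasure-coded dispersal.
# Assumption: n = 3f + 1 and a code with n - 2f data shares, so any n - f responses
# contain at least n - 2f correct shares and the block can be reconstructed.
def replicated_bytes(n, block_size):
    return n * block_size                      # send the whole block to every party

def erasure_coded_bytes(n, f, block_size):
    share_size = block_size / (n - 2 * f)      # each party receives one coded share
    return n * share_size                      # roughly 3x the block size when n = 3f + 1

n, f, block = 100, 33, 1_000_000               # 100 parties, 1 MB block
print(replicated_bytes(n, block))              # 100000000: grows linearly with n
print(erasure_coded_bytes(n, f, block))        # ~2941176: a constant factor, independent of n
```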

Yet another option is to disperse via provable reliable broadcast, which obtains totality and removes the need for a retrieval protocol. This option, as is, requires a higher message complexity at dispersal but lower overall latency. The message complexity can be improved by using message-efficient reliable broadcast (which uses error-correcting codes).

2. Dispersal is live in asynchrony and DACs can be combined

  • Dispersal is live in asynchrony: dispersal does not solve consensus (it is a single-writer object), so it can make progress in asynchrony (and is not limited by the FLP lower bound).

  • DACs can be combined: a set of data availability certificates can be aggregated as part of the content of a new dispersal. This new combined dispersal again has a small DAC, but retrieving it recursively retrieves all the previously dispersed data (see the sketch below).
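
A small sketch of this combination, building on the illustrative `Party` / `disperse` / `retrieve` code above (again, our own simplification rather than any system's implementation):

```python
# Builds on the illustrative Party / disperse / retrieve sketch above.
import json

def disperse_combined(new_data: str, child_dacs, parties) -> DAC:
    # The payload carries only the *short* child certificates, not the data they
    # certify, so the new combined DAC itself stays small.
    payload = json.dumps({"data": new_data,
                          "children": [c.digest for c in child_dacs]}).encode()
    return disperse(payload, parties)

def retrieve_recursive(digest: str, parties) -> list:
    # Fetch the payload for this digest from any party storing it, then recurse.
    raw = next(p.store[digest] for p in parties if digest in p.store)
    payload = json.loads(raw)
    blocks = [payload["data"]]
    for child in payload["children"]:
        blocks += retrieve_recursive(child, parties)
    return blocks

d1 = disperse_combined("batch-1", [], parties)
d2 = disperse_combined("batch-2", [], parties)
d3 = disperse_combined("batch-3", [d1, d2], parties)        # one new dispersal, two old DACs
print(retrieve_recursive(d3.digest, parties))               # ['batch-3', 'batch-1', 'batch-2']
```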

Some systems make no progress in asynchrony because they run dispersal in lock-step with consensus. These systems then suffer a performance degradation when the network becomes synchronous again, as they need to work through the backlog. In contrast, systems that separate dispersal from consensus can continue to disperse in asynchrony and then commit the whole batch once synchrony returns. Indeed, many DAG-based protocols, as well as protocols such as Autobahn, do this.
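
As a rough illustration of this decoupling (reusing the toy `disperse` sketch above; the loop structure is ours, not a specific system's): dispersal keeps producing DACs during asynchrony, and a single later proposal commits the whole backlog.

```python
from collections import deque

pending_dacs = deque()     # backlog of certificates produced while consensus is stalled

def dispersal_loop(batches, parties):
    # Dispersal is a single-writer object: it keeps making progress under asynchrony.
    for batch in batches:
        pending_dacs.append(disperse(batch, parties))

def consensus_step():
    # Invoked only once consensus can decide again (e.g., after synchrony returns):
    # a single proposal carries every buffered DAC, committing the whole backlog at once.
    proposal = list(pending_dacs)
    pending_dacs.clear()
    return proposal

dispersal_loop([b"b1", b"b2", b"b3"], parties)   # progress made during asynchrony
print(len(consensus_step()))                     # 3: one decision commits all three batches
```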

3. Single-disperser BFT vs Multi-disperser BFT

BFT consensus protocols have a single proposer (aka primary or leader) in each view. Separating the consensus from the dispersal opens two options:

  • In a single-disperser BFT this primary is both the single proposer and the single disperser of the view.

  • In a multi-disperser BFT multiple parties act as dispersers concurrently. The single proposer then combines multiple DACs.

Examples of multi-disperser BFT include All You Need is DAG, Narwhal, Bullshark, Shoal, Sailfish, Shoal++, Cordial Miners, Star, Autobahn, and Mysticeti. They show how using multiple dispersers can give a throughput boost and obtain $O(n)$ amortized communication complexity in a setting where each disperser has an $O(n)$-size non-overlapping set of transactions it wants to commit.

We note that a similar throughput boost can be obtained in a single-disperser BFT when it disperses using error-correcting codes (cf. Figure 1 in the Pipes paper).

When do multiple dispersers help?

Multiple dispersers help only if their transaction sets are mostly disjoint. If they overlap heavily, bandwidth is wasted since each transaction is dispersed repeatedly.

From the client perspective (assuming clients have disjoint transactions), the ideal case is that each client uses a single non-faulty disperser, giving a real bandwidth boost.

However, if $f$ dispersers are faulty, in the worst case each client must send its transactions to $f+1$ dispersers. Without coordination, this repeats every transaction $f+1$ times and multiple dispersers may still disperse the same transactions redundantly.
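
Toy arithmetic for this duplication effect, assuming $n = 3f+1$ dispersers that can each disperse $r$ transactions per second (all numbers and names are illustrative):

```python
# Toy arithmetic for the duplication effect; all numbers and names are illustrative.
def effective_throughput(num_dispersers, rate_per_disperser, copies_per_tx):
    # Useful transactions per second after discounting duplicate dispersals.
    return num_dispersers * rate_per_disperser // copies_per_tx

n, f, r = 100, 33, 1_000                        # n = 3f + 1 dispersers, r tx/s each
print(effective_throughput(n, r, 1))            # 100000: disjoint sets, one good disperser per client
print(effective_throughput(n, r, f + 1))        # 2941: worst case, every tx sent to f + 1 dispersers
```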

4. Multi-disperser BFT: Certified and Uncertified Directed Acyclic Graph (DAG)

Certified DAG: Aleph initiated this pattern, where each party waits to receive $n-f$ round $i$ DACs from $n-f$ different dispersers and combines them (along with any new data) into a new round $i+1$ dispersal. Recent works like Sailfish and Shoal++ focus on improving latency for certified DAGs.

Uncertified DAG: Hashgraph initiated the pattern where each party waits for $n-f$ signed round $i$ messages and combines those (along with any new data) to create a signed round $i+1$ message. This is often called an uncertified DAG and is used in Cordial Miners, Mysticeti, and Mahi-Mahi.
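
The round-advancement rule common to both patterns can be sketched as follows. This is our own simplified framing: the parents are DACs in a certified DAG and signed messages in an uncertified DAG, but the $n-f$ threshold is the same.

```python
# Simplified round-advancement rule: a party creates its round i+1 vertex only after
# collecting n - f round-i parents from distinct parties (names are illustrative).
from dataclasses import dataclass

N, F = 4, 1

@dataclass
class Vertex:
    author: int
    round: int
    parents: tuple         # references to n - f vertices of the previous round
    payload: bytes = b""

def try_advance(my_id: int, round_i_vertices: list, new_data: bytes):
    # round_i_vertices: the round-i vertices received so far.
    authors = {v.author for v in round_i_vertices}
    if len(authors) < N - F:
        return None                              # not enough distinct parents yet, keep waiting
    parents = tuple(round_i_vertices[:N - F])
    return Vertex(my_id, round_i_vertices[0].round + 1, parents, new_data)

genesis = [Vertex(a, 0, ()) for a in range(N)]
print(try_advance(0, genesis[:2], b"tx"))        # None: only 2 of the required 3 parents
print(try_advance(0, genesis, b"tx").round)      # 1: the party advances to round 1
```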

Data Synchronization for DAGs

In DAGs the dispersed data includes not just the new data from round $i$ but also data from all prior rounds ($i-1$, $i-2$, etc.). If reliable broadcast is not used for dispersal then some parties may not have all prior data from earlier rounds, and so a synchronization process is needed to fetch missing data.

For uncertified DAGs, Byzantine parties can attack the system by selectively sending data to some parties while withholding that data from other parties, forcing these parties to synchronize on the critical path. For certified DAGs that use DACs, parties do not have to synchronize on the critical path, but they have to recursively fetch missing data from the DAG before execution, which may require multiple network round trips.

Autobahn is an example of a system that achieves non-blocking synchronization and avoids recursive synchronization by carefully designing the dispersal protocol so that retrieval for an arbitrarily large amount of data requires just a single round-trip.

5. Virtual voting: piggybacking the consensus on the DAG for free

The idea of virtual voting is to piggyback the consensus messages onto the DAG messages. It’s a throughput-latency trade-off:

The additive bandwidth overhead of virtual voting for consensus is zero. However, virtual voting has latency disadvantages: a consensus protocol may need different timeouts or different triggers (for example, not just arbitrary $n-f$ messages, but a message from some designated leader or messages with some specific content). Moreover, some virtual voting protocols on certified DAGs may add unneeded queuing latency (if a block “misses the bus”).
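
As a highly simplified illustration of virtual voting, building on the `Vertex` sketch above (our own framing, not any specific protocol's commit rule): DAG edges double as votes, so committing a leader costs no extra messages.

```python
def leader_votes(leader_vertex, next_round_vertices):
    # A round r+1 vertex "votes" for the round-r leader simply by referencing it.
    return sum(leader_vertex in v.parents for v in next_round_vertices)

def committed(leader_vertex, next_round_vertices, threshold):
    # `threshold` is protocol-specific (for example f + 1 or 2f + 1 references).
    return leader_votes(leader_vertex, next_round_vertices) >= threshold

round1 = [try_advance(a, genesis, b"") for a in range(N)]   # every party advanced to round 1
leader0 = genesis[0]                                        # say party 0 is the round-0 leader
print(committed(leader0, round1, threshold=F + 1))          # True, with zero extra messages
```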

6. Ordering transactions

Most DAG protocols maintain a typical DAG structure where every block in a view (round) references a super-majority of blocks in the previous view. When committing, despite the multiple dispersers, there is a single proposer (leader) in each view. Thus, the parties only agree on whether the proposer of view $v$ proposed a block in that view. Given a sequence of ordered proposer blocks across views, non-proposer blocks can be ordered among them using any deterministic algorithm. The super-majority block references from view $v$ to blocks of the previous view $v-1$ ensure that sufficiently many non-proposer blocks from $v-1$ are ordered before the block of view $v$.
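
A sketch of this ordering step, reusing the illustrative `Vertex` structure from above (the tie-breaking rule here, by round then author, is just one possible deterministic choice):

```python
def causal_history(vertex):
    # Collect the vertex and everything reachable through its parent references.
    seen, stack, out = set(), [vertex], []
    while stack:
        v = stack.pop()
        if id(v) in seen:
            continue
        seen.add(id(v))
        out.append(v)
        stack.extend(v.parents)
    return out

def order_dag(committed_leaders):
    # For each committed leader block (in commit order), flush its not-yet-ordered
    # causal history before it, using a deterministic tie-break.
    ordered, delivered = [], set()
    for leader in committed_leaders:
        pending = [v for v in causal_history(leader) if id(v) not in delivered]
        pending.sort(key=lambda v: (v.round, v.author))
        ordered.extend(pending)
        delivered.update(id(v) for v in pending)
    return ordered

committed_leader = try_advance(0, genesis, b"tx")            # a committed round-1 proposer block
print([(v.round, v.author) for v in order_dag([committed_leader])])
# [(0, 0), (0, 1), (0, 2), (1, 0)]: round-0 blocks are ordered before the round-1 leader
```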

Acknowledgements

This post follows insightful discussions at the Dagstuhl seminar on Next-Generation Secure Distributed Computing. We would like to thank the organizers and participants for their valuable feedback.

We would like to thank Adi Seredinschi and Nibesh Shrestha for useful feedback on the post.

Your thoughts and comments on X.