Public key cryptography (PKC) is a fundamental technology that is a key enabler to the Internet and the whole client-server paradigm. Without public key cryptography, the modern Internet economy, from cryptocurrencies to online banking and retail, would simply not exist.
In the PKC paradigm with clients and servers, clients authenticate to servers by holding secret keys and servers verify the authentication by using known corresponding public keys.
Servers performing critical tasks are at greater risk of failure. To increase fault tolerance, distributed protocols with multiple servers are used. Systems that coordinate multiple servers are challenging; these systems may suffer liveness violations (stop responding) and safety violations (provide inconsistent views to clients).
The CAP theorem (Consistency - Availability - Partition tolerance) is a fundamental result that captures the inherent limitations of replicating a server:
Any replicated server system, in the face of a perceived 50-50 split, can either protect against safety violations or liveness violations, but not both.
While the CAP theorem focuses on servers, clients face a parallel dilemma when storing secret keys.
But what about the client? It needs to store its secret keys, not lose them, and protect them from being stolen. Storing secret keys for critical tasks is challenging due to the risk of failures. To increase fault tolerance, distributed storage protocols with multiple storage nodes are used. Systems that distribute client secrets are challenging; these distributed storage systems may suffer key loss (availability failure: losing access to your assets) or key theft (safety failure: attacker gains access to your assets). Several recent papers study this space, see MKE22, ME24.
The SAP theorem (Safety - Availability - Partition tolerance) is the fundamental analog result that captures the inherent limitations of distributing the secrets of a client:
Any distributed client secret storage system, in the face of a perceived 50-50 split (where the system cannot distinguish between a network partition and node failure), can either protect against key theft or key loss, but not both.
SAP theorem for two nodes
We proceed by contradiction, assuming the system achieves both properties and constructing two indistinguishable worlds that violate safety.
It is impossible to implement a distributed client secret key storage that is resilient to both key loss and key theft given two storage nodes and an adversary that can either crash one storage node or gain access to one storage node, in the partial synchrony model.
Proof
Consider a system with two storage nodes and two possible worlds. We will assume the system provides key loss resilience and prove that it cannot provide key theft resilience.
World A: Storage node 2 crashes. To provide liveness (key loss resilience) the system must still be able to sign commands in this setting.
World B: Storage node 1 is compromised, and the adversary has control over it. Moreover, due to asynchrony, storage node 2 is slow. The adversary can send malicious commands to storage node 1. From the perspective of the honest node, the network delay in World B is indistinguishable from a crash in World A.
Due to the indistinguishably between the two worlds, the execution of world B must succeed in signing commands. Therefore, in any execution consistent with World B, the system signs commands under adversarial control, violating key-theft resilience.
General SAP theorem
Let $P \subset N$ be any non-empty proper subset ($P \neq \emptyset$ and $P \neq N$).
The intuition is that the subset $P$ represents the crashed portion of the system, while its complement models the potentially compromised side.
It is impossible to implement a distributed client secret key storage that is resilient to both key loss and key theft given $N$ storage nodes and an adversary that can either crash the storage nodes in $P$ or gain access to the storage nodes in $N\setminus P$, in the partial synchrony model.
Proof
The proof of the general statement is a reduction to the two node case. The main observation is that world A must make progress if the storage nodes in $P$ crash, and hence world B must make progress even if the storage nodes in $P$ are all slow.
Just as the CAP theorem defines the limits of distributed servers, the SAP theorem defines the limits of distributed clients.
These limitations motivate ongoing research into techniques such as threshold signing and hardware enclaves that can partially mitigate the SAP tradeoff.
Your comments on twitter.