From 33ae7de627baba00e80d8c5297ededac133c0a39 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Mon, 28 Jun 2021 12:14:05 +0200 Subject: started on a refactored design description --- doc/design.md | 115 ++++++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 96 insertions(+), 19 deletions(-) diff --git a/doc/design.md b/doc/design.md index 10b598f..7138a57 100644 --- a/doc/design.md +++ b/doc/design.md @@ -6,10 +6,6 @@ keys produced what checksum signatures. For example, malicious and unintended key-usage can be _detected_. We present our design and discuss a few use-case scenarios like binary transparency and reproducible builds. -**Target audience.** -You are likely interested in transparent logs, public-key infrastructures, -or supply-chain security. - **Preliminaries.** You have basic understanding of cryptographic primitives like digital signatures, hash functions, and Merkle trees. You roughly know what problem @@ -21,21 +17,98 @@ revision of this document will bump the version number to v1. Please let us know if you have any feedback. ## Introduction -Transparency logs make it possible to detect unwanted events. For example, +Transparent logs make it possible to detect unwanted events. For example, are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962), did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md), or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/). -A sigsum log brings transparency to **sig**ned check**sum**s. +A sigsum log brings transparency to **sig**ned check**sum**s. + +**Problem description.** +Suppose that you are an entity that publishes some opaque data. 
For example, +the opaque data might be + a provenance file, + an executable binary, + an automatic software update, + a BGP announcement, or + a TPM quote. +You claim to publish the same opaque data to everyone in a public repository. +However, past incidents taught us that talk is cheap and sometimes things go +wrong. Trusted parties get compromised and lie about it [\[DigiNotar\]](), or +they might not even realize it until later on because the break-in was stealthy +[\[SolarWinds\]](). + +The goal of sigsum logging is to make your claims verifiable by you and +others. To keep the design simple and general, we want to achieve this goal +with few assumptions about the opaque data or the involved claims. You can +think of this as some sort of bottom-line for what it takes to apply a +transparent logging pattern. Use-cases that want to piggy-back on an +existing reliable log ecosystem fit well into our scope [\[BinTrans\]](). + +We also want the design to be easy from the perspective of log operations and +deployment in constrained environments. This includes considerations such as +idiot-proof parsing, protection against log spam and poisoning, and a +well-defined gossip protocol without complex auditing logic. See [feature +overview](). + +**Setting overview.** +You would like users of the published data to _believe_ your claims. Therefore, +we refer to you as a _claimant_ and your users as _believers_. Belief is going +to be reasonable because each claim is expressed as a _signed statement_ that is +transparency logged. A _verifier_ can discover your claims in a public sigsum +log. If a claim turns out to be false, an _arbiter_ that can act on +it is notified. An overview of these _roles_ and how they interact is shown in Figure 1. +A party may play multiple roles. Refer to the claimant model for additional +details [\[CM\]](). 
+ +``` + claim +----------+ + +----------| Claimant |----------+ + | +----------+ |Data + | |Proofs + v v + +---------+ +------------+ + | Log | | Repository | + +---------+ +------------+ + | | | + | | |Data + | claims +----------+ Data | |Proofs + +---------->| Verifier |<------+ | + +----------+ v + +---------+ | +------------+ + | Arbiter | <--------+ | Believer | + +---------+ +------------+ + + Figure 1: system overview +``` + +The claimant's signed statement encodes the following claim: _the opaque data +has cryptographic hash X_. It is stored in a sigsum log for discoverability. +The claimant may define additional _implicit_ meanings for each such statement. +These implicit claims are not stored by the log and are communicated through +policy. For example: +- The opaque data can be located in Repository using X as an identifier. +- The opaque data is a `.buildinfo` file that facilitates a reproducible build +[\[R-B\]](). + +Detailed examples of use-case specific claimant models are defined in a separate +document [\[CM-Examples\]](https://github.com/sigsum/sigsum/blob/main/doc/claimant.md). + +**Roadmap.** +So far we only introduced the overall problem and the setting. Our main +contribution is the way in which the log component is designed. First we +describe our threat model. Then we give a bird's-eye view of the design. Finally, +we go into greater detail using a question-answer format that is easy to extend +and/or modify. ## Threat model and (non-)goals -We consider a powerful attacker that gained control of a target's signing and +We consider a powerful attacker that gained control of a claimant's signing and release infrastructure. This covers a weaker form of attacker that is able to sign data and distribute it to a subset of isolated users. For example, this is essentially what the FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order). 
The fact that signing keys and related infrastructure components get compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). -The attacker can also gain control of the transparency log's signing key and +The attacker can also gain control of the sigsum log's signing key and infrastructure. This covers a weaker form of attacker that is able to sign log data and distribute it to a subset of isolated users. For example, this could have been the case when a remote code execution was found for a Certificate @@ -47,24 +120,28 @@ detection would result in a significant loss of capability that is by no means trivial to come by. Second, detection means that some part of the attacker's malicious behavior will be disclosed publicly. -Our goal is to facilitate _disocvery_ of signed checksums. Such discovery -makes it possible to detect attacks on signing and release infrastructures. For -example, the signer can detect an unwanted sigsum by inspecting the log. +Following from our introductory goal, we want to facilitate _discovery_ of sigsum +statements. Such discovery makes it possible to detect attacks on a claimant's +signing and release infrastructures. For example, a claimant can detect an +unwanted sigsum by inspecting the log. It could be the result of a compromised +signing key. The opposite direction is also possible. Anyone may detect that a +repository is not serving data and/or proofs of public logging. It is a non-goal to disclose the data that a cryptographic checksum represents -_in the log_. A log cannot distinguish between a checksum that represents a tax -declaration, an ISO image, or a Debian package. The type of detection that we -support is therefore more _coarse-grained_ when compared to Certificate -Transparency. 
A significant benefit is that the resulting design becomes -simpler, generally useful, and less costly to bootstrap into a reliable -operation. +_in the log_. It is also a non-goal to allow richer metadata that is +use-case specific. The type of detection that a sigsum log supports is +therefore more _coarse-grained_ when compared to Certificate Transparency. A +significant benefit is that the resulting design becomes simpler, general, and +less costly to bootstrap into a reliable log ecosystem. For security we need a collision resistant hash function and an unforgeable signature scheme. We also assume that at most a threshold of seemingly -independent parties are adversarial. - +independent parties are adversarial to protect against split-views +[\[Gossip\]](). ## Design +TODO: not updated from here on. + We consider a data publisher that wants to digitally sign their data. The data is of opaque type. We assume that end-users have a mechanism to locate the relevant public verification keys. Data and signatures can also be retrieved -- cgit v1.2.3