aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorRasmus Dahlberg <rasmus.dahlberg@kau.se>2021-04-29 14:50:49 +0200
committerRasmus Dahlberg <rasmus.dahlberg@kau.se>2021-04-29 14:50:49 +0200
commit94fea7a3c993686d26efbf7ca9b73d598222a272 (patch)
tree99933f073ff2c165bb20b1d6ac8d1f784b803feb
parent87a2fa506c1861158ca04fd34d64e10b6447d8f3 (diff)
added start on design document
Work in progress.
-rw-r--r--doc/design.md196
1 files changed, 176 insertions, 20 deletions
diff --git a/doc/design.md b/doc/design.md
index 59cd7c8..9fcf4b6 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -2,9 +2,9 @@
We propose System Transparency logging. It is similar to Certificate
Transparency, expect that cryptographically signed checksums are logged as
opposed to X.509 certificates. Publicly logging signed checksums allow anyone
-to discover which keys signed what. As such, malicious and unintended key-usage
-can be _discovered_. We present our design and discuss how two possible
-use-cases influenced it: binary transparency and reproducible builds.
+to discover which keys produced what signatures. As such, malicious and
+unintended key-usage can be _detected_. We present our design and conclude by
+providing two use-cases: binary transparency and reproducible builds.
**Target audience.**
You are most likely interested in transparency logs or supply-chain security.
@@ -12,20 +12,20 @@ You are most likely interested in transparency logs or supply-chain security.
**Preliminaries.**
You have basic understanding of cryptographic primitives like digital
signatures, hash functions, and Merkle trees. You roughly know what problem
-Certificate Transparency solves and how. You may never have heard the term
-_gossip-audit model_, or know how it is related to trust assumptions and
-detectability properties.
+Certificate Transparency solves and how.
**Warning.**
-This is a work-in-progress document that may be moved or modified.
+This is a work-in-progress document that may be moved or modified. A future
+revision of this document will bump the version number to v1. Please let us
+know if you have any feedback.
## Introduction
Transparency logs make it possible to detect unwanted events. For example,
are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962),
did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md),
or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/).
-System Transparency logging makes signed checksums transparent. The goal is to
-_detect_ unwanted key-usage without making assumptions about the signed data.
+A System Transparency log makes signed checksums transparent. The overall goal
+is to facilitate detection of unwanted key-usage.
## Threat model and (non-)goals
We consider a powerful attacker that gained control of a target's signing and
@@ -33,7 +33,7 @@ release infrastructure. This covers a weaker form of attacker that is able to
sign data and distribute it to a subset of isolated users. For example, this is
essentially what FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order).
The fact that signing keys and related infrastructure components get
-compromised should not be controversial [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
+compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
The attacker can also gain control of the transparency log's signing key and
infrastructure. This covers a weaker form of attacker that is able to sign log
@@ -47,16 +47,172 @@ detection would result in a significant loss of capability that is by no means
trivial to come by. Second, detection means that some part of the attacker's
malicious behavior will be disclosed publicly.
-Our goal is to facilitate _detection_ of compromised signing keys. Therefore,
-we transparency log signed checksums. We assume that clients _fail closed_ if a
-checksum does not appear in a public log. We also assume that the attacker
-controls at most a threshold of independent parties to achieve our goal
-("strength in numbers").
+Our goal is to facilitate _detection_ of compromised signing keys. We consider
+a signing key compromised if an end-user accepts an unwanted signature as valid.
+The solution that we propose is that signed checksums are transparency logged.
+For security we need a collision resistant hash function and an unforgeable
+signature scheme. We also assume that at most a threshold of seemingly
+independent parties are adversarial.
-It is a non-goal to disclose the data that a signed checksum represents. For
-example, the log cannot distinguish between a checksum that represents a tax
-declaration, an ISO image, or a Debian package. This means that the type of
-detection we support is _courser-grained_ when compared to Certificate
-Transparency.
+It is a non-goal to disclose the data that a checksum represents. For example,
+the log cannot distinguish between a checksum that represents a tax declaration,
+an ISO image, or a Debian package. This means that the type of detection we
+support is more _course-grained_ when compared to Certificate Transparency.
## Design
+We consider a data publisher that wants to digitally sign their data. The data
+is of opaque type. We assume that end-users have a mechanism to locate the
+relevant public verification keys. Data and signatures can also be retrieved
+(in)directly from the data publisher. We make little assumptions about the
+signature tooling. The ecosystem at large can continue to use `gpg`, `openssl`,
+`ssh-keygen -Y`, `signify`, or something else.
+
+We _have to assume_ that additional tooling can be installed by end-users that
+wish to enforce transparency logging. For example, none of the existing
+signature tooling support verification of Merkle tree proofs. A side-effect of
+our design is that this additional tooling makes no outbound connections. The
+above data flows are thus preserved.
+
+### A bird's view
+A central part of any transparency log is the data. The data is stored by the
+leaves of an append-only Merkle tree. Our leaf structure contains four fields:
+- **shard_hint**: a number that binds the leaf to a particular _shard interval_.
+Sharding means that the log has a predefined time during which logging requests
+will be accepted. Once elapsed, the log can be shutdown.
+- **checksum**: a cryptographic hash of some opaque data. The log never
+sees the opaque data; just the hash.
+- **signature**: a digital signature that is computed by the data publisher over
+the leaf's shard hint and checksum.
+- **key_hash**: a cryptographic hash of the public verification key that can be
+used to verify the leaf's signature.
+
+#### Step 1 - preparing a logging request
+The data publisher selects a shard hint and a checksum that should be logged.
+For example, the shard hint could be "logs that are active during 2021". The
+checksum might be a hashed release file or something else.
+
+The data publisher signs the selected shard hint and checksum using their secret
+signing key. Both the signed message and the signature is stored
+in the leaf for anyone to verify. Including a shard hint in the signed message
+ensures that the good Samaritan cannot change it to log all leaves from an
+earlier shard into a newer one.
+
+The hashed public verification key is also stored in the leaf. This makes it
+easy to attribute the leaf to the signing entity. For example, a data publisher
+that monitors the log can look for leaves that match their own key hash(es).
+
+A hash, rather than the full public verification key, is used to force the
+verifier to locate the key and trust it explicitly. Not disclosing the public
+verification key in the leaf makes it more difficult to use an untrusted key _by
+mistake_.
+
+#### Step 2 - submitting a logging request
+The log implements an HTTP(S) API. Input and output is human-readable and uses
+percent encoding. We decided to use percent encoding for requests and responses
+because it is a simple format that is commonly used on the web. A more complex
+parser like JSON is not needed if the exchanged data structures are basic
+enough.
+
+The data publisher submits their shard hint, checksum, signature, and public
+verification key as key-value pairs. The log will use the public verification
+key to check that the signature is valid, then hash it to construct the leaf.
+
+The data publisher also submits a _domain hint_. The log will download a DNS
+TXT resource record based on the provided domain name. The downloaded result
+must match the public verification key hash. By verifying that the submitter
+controls a domain that is aware of the public verification key, rate limits can
+be applied per second-level domain. As a result, you would need a large number
+of domain names to spam the log in any significant way.
+
+Using DNS to combat spam is convenient because many data publishers already have
+a domain name. A single domain name is also relatively cheap. Another
+benefit is that the same anti-spam mechanism can be used across several
+independent logs without coordination. This is important because a healthy log
+ecosystem needs more than one log to be reliable. DNS also has built-in
+caching that can be influenced by setting TTLs accordingly.
+
+The submitter's domain hint is not part of the leaf because key management is
+more complex than that. The only service that the log provides is discovery of
+signed checksums. Key transparency projects have their own merit.
+
+The log will _try_ to incorporate a leaf into the Merkle tree if a logging
+request is accepted. There are no _promises of public logging_ as in
+Certificate Transparency. Therefore, the submitter needs to wait for an
+inclusion proof before concluding that the request succeeded. Not having
+inclusion promises makes the log less complex.
+
+#### Step 3 - distributing proofs of public logging
+The data publisher is responsible for collecting all cryptographic proofs that
+their end-users will need to enforce public logging. It must be possible to
+download the following collection (in)directly from the data publisher:
+1. **Shard hint**: the data publisher's selected shard hint.
+2. **Opaque data**: the data publisher's opaque data.
+3. **Signature**: the data publisher's leaf signature.
+5. **Cosigned tree head**: the log's tree head and a _list of signatures_ that
+state it is consistent with prior history.
+6. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree
+head in question.
+
+The public verification key is known. Therefore, the first three fields are
+sufficient to reconstruct the logged leaf. The leaf's signature can be
+verified. The final two fields then prove that the leaf is in the log. If the
+leaf is included in the log, any monitor can detect that there is a new
+signature for a data publisher's public verification key.
+
+The catch is that the proof of logging is only as convincing as the tree head
+that the inclusion proof leads up to. To bypass public logging, the attacker
+needs to control a threshold of independent _witnesses_ that cosign the log. A
+benign witness will only sign the log's tree head if it is consistent with prior
+history.
+
+#### Summary
+The log is sharded and will shutdown at a predefined time. The log can shut
+down _safely_ because end-user verification is not interactive. The difficulty
+of bypassing public logging is based on the difficulty of controlling a
+threshold of independent witnesses. Witnesses cosign tree heads to make them
+trustworthy.
+
+Submitters, monitors, and witnesses interact with the log using an HTTP(S) API.
+Submitters must prove that they own a domain name as an anti-spam mechanism.
+End-users interact with the log _indirectly_ via a data publisher. It is the
+data publisher's job to log signed checksums, distribute necessary proofs of
+logging, and monitor the log.
+
+### A peak into the details
+Our bird's view introduction skipped many details that matter in practise. Some
+of these details are presented here using a question-answer format. A
+question-answer format is helpful because it is easily modified and extended.
+
+#### What cryptographic primitives are supported?
+The only supported hash algorithm is SHA256. The only supported signature
+scheme is Ed25519. Not having any cryptographic agility makes the protocol
+simpler and more secure.
+
+An immediate follow-up question is how that is supposed to work with existing
+and future signature tooling. The key insight is that _additional tooling is
+already required to verify Merkle tree proofs. That tooling should use SHA256.
+That tooling should also verify all Ed25519 signatures that logs, witnesses, and
+data publishers create_.
+
+For example, suppose that an ecosystem uses `gpg` which has its own incompatible
+signature format and algorithms. The data publisher could _cross-sign_ using
+Ed25519 as follows:
+1. Sign the opaque data as you normally would with `gpg`.
+2. Hash the opaque data and use that as the leaf's checksum. Sign the leaf
+using Ed25519.
+
+First the end-user verifies that the `gpg` signature is valid. This is the
+old verification process. Then the end-user uses the additional tooling to
+verify proofs of logging, which involves SHA256 hashing and Ed25519 signatures.
+
+The downside is that the data publisher may need to manage an Ed25519 key _as
+well_. TODO: motivate why that is a suboptimal but worth-while trade-off.
+
+#### What (de)serialization parsers are needed?
+#### Why witness cosigning?
+#### What policy should be used?
+#### TODO
+Add more key questions and answers.
+
+## Concluding remarks
+Example of binary transparency and reproducible builds.