Diffstat (limited to 'doc')
-rw-r--r-- | doc/api.md | 398
-rw-r--r-- | doc/claimant.md | 71
-rw-r--r-- | doc/design.md | 251
-rw-r--r-- | doc/sketch.md | 372
4 files changed, 720 insertions, 372 deletions
diff --git a/doc/api.md b/doc/api.md new file mode 100644 index 0000000..57ad119 --- /dev/null +++ b/doc/api.md @@ -0,0 +1,398 @@ +# System Transparency Logging: API v0 +This document describes details of the System Transparency logging +API, version 0. The broader picture is not explained here. We assume +that you have read the System Transparency Logging design document. +It can be found +[here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). + +**Warning.** +This is a work-in-progress document that may be moved or modified. + +## Overview +Logs implement an HTTP(S) API for accepting requests and sending +responses. + +- Input data in requests and output data in responses are expressed as + ASCII-encoded key/value pairs. +- Requests with input data use HTTP POST to send the data to a log. +- Binary data is hex-encoded before being transmitted. + +The motivation for using a text based key/value format for request and +response data is that it's simple to parse. Note that this format is +not being used for the serialization of signed or logged data, where a +more well defined and storage efficient format is desirable. A +submitter may distribute log responses to their end-users in any +format that suits them. The (de)serialization required for +_end-users_ is a small subset of Trunnel. Trunnel is an "idiot-proof" +wire-format in use by the Tor project. + +## Primitives +### Cryptography +Logs use the same Merkle tree hash strategy as +[RFC 6962,§2](https://tools.ietf.org/html/rfc6962#section-2). +The hash functions must be +[SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). +Logs must sign tree heads using +[Ed25519](https://tools.ietf.org/html/rfc8032). Log witnesses +must also sign tree heads using Ed25519. + +All other parts that are not Merkle tree related also use SHA256 as +the hash function. 
Using more than one hash function would increase +the overall attack surface: two hash functions must be collision +resistant instead of one. + +### Serialization +Log requests and responses are transmitted as ASCII-encoded key/value +pairs, for a smaller dependency than an alternative parser like JSON. +Some input and output data is binary: cryptographic hashes and +signatures. Binary data must be Base16-encoded, also known as hex +encoding. Using hex as opposed to base64 is motivated by it being +simpler, favoring ease of decoding and encoding over efficiency on the +wire. + +We use the +[Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) +to define (de)serialization of data structures that need to be signed or +inserted into the Merkle tree. Trunnel is more expressive than the +[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5). +It is about as expressive as the +[TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). +A notable difference is that Trunnel supports integer constraints. +The Trunnel language is also readable by humans _and_ machines. +"Obviously correct code" can be generated in C and Go. + +A fair summary of our Trunnel usage is as follows. + +All integers are 64-bit, unsigned, and in network byte order. +Fixed-size byte arrays are put into the serialization buffer in-order, +starting from the first byte. Variable length byte arrays first +declare their length as an integer, which is then followed by that +number of bytes. These basic types are concatenated to form a +collection. You should not need a general-purpose Trunnel +(de)serialization parser to work with this format. If you have one, +you may use it though. The main point of using Trunnel is that it +makes a simple format explicit and unambiguous. + +#### Merkle tree head +A tree head is signed both by a log and its witnesses. It contains a +timestamp, a tree size, and a root hash. 
The timestamp is included so +that monitors can ensure _liveliness_. It is the time since the UNIX +epoch (January 1, 1970 00:00 UTC) in seconds. The tree size +specifies the current number of leaves. The root hash fixes the +structure and content of the Merkle tree. + +``` +struct tree_head { + u64 timestamp; + u64 tree_size; + u8 root_hash[32]; +}; +``` + +The serialized tree head must be signed using Ed25519. A witness must +not cosign a tree head if it is inconsistent with prior history or if +the timestamp is backdated or future-dated more than 12 hours. + +#### Merkle tree leaf +Logs support a single leaf type. It contains a shard hint, a +checksum over whatever the submitter wants to log a checksum for, a +signature that the submitter computed over the shard hint and the +checksum, and a hash of the submitter's public verification key, that +can be used to verify the signature. + +``` +struct message { + u64 shard_hint; + u8 checksum[32]; +}; + +struct tree_leaf { + struct message; + u8 signature_over_message[64]; + u8 key_hash[32]; +} +``` + +`message` is composed of the `shard_hint`, chosen by the submitter to +match the shard interval for the log it's submitting to, and the +submitter's `checksum` to be logged. + +`signature_over_message` is a signature over `message`, using the +submitter's verification key. It must be possible to verify the +signature using the submitter's public verification key, as indicated +by `key_hash`. + +`key_hash` is a hash of the submitter's verification key used for +signing `message`. It is included in `tree_leaf` so that the leaf can +be attributed to the submitter. A hash, rather than the full public +key, is used to motivate verifiers to locate the appropriate key and +make an explicit trust decision. + +## Public endpoints +Every log has a base URL that identifies it uniquely. The only +constraint is that it must be a valid HTTP(S) URL that can have the +`/st/v0/<endpoint>` suffix appended. 
For example, a complete endpoint +URL could be +`https://log.example.com/2021/st/v0/get-tree-head-cosigned`. + +Input data (in requests) is POST:ed in the HTTP message body as ASCII +key/value pairs. + +Output data (in replies) is sent in the HTTP message body in the same +format as the input data, i.e. as ASCII key/value pairs on the format +`Key=Value` + +The HTTP status code is 200 OK to indicate success. A different HTTP +status code is used to indicate failure, in which case a log should +respond with a human-readable string describing what went wrong using +the key `error`. Example: `error=Invalid signature.`. + +### get-tree-head-cosigned +Returns the latest cosigned tree head. Used together with +`get-proof-by-hash` and `get-consistency-proof` for verifying the tree. + +``` +GET <base url>/st/v0/get-tree-head-cosigned +``` + +Input: +- None + +Output on success: +- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. +- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. +- `root_hash`: `tree_head.root_hash` hex-encoded. +- `signature`: hex-encoded Ed25519 signature over `timestamp`, + `tree_size` and `root_hash` serialized into a `tree_head` as + described in section `Merkle tree head`. +- `key_hash`: a hash of the public verification key (belonging to + either the log or to one of its witnesses), which can be used to + verify the most recent `signature`. The key is encoded as defined + in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. + +The `signature` and `key_hash` fields may repeat. The first signature +corresponds to the first key hash, the second signature corresponds to +the second key hash, etc. The number of signatures and key hashes +must match. + +### get-tree-head-to-sign +Returns the latest tree head to be signed by log witnesses. Used by +witnesses. 
+ +``` +GET <base url>/st/v0/get-tree-head-to-sign +``` + +Input: +- None + +Output on success: +- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. +- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. +- `root_hash`: `tree_head.root_hash` hex-encoded. +- `signature`: hex-encoded Ed25519 signature over `timestamp`, + `tree_size` and `root_hash` serialized into a `tree_head` as + described in section `Merkle tree head`. +- `key_hash`: a hash of the log's public verification key, which can + be used to verify `signature`. The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. + +There is exactly one `signature` and one `key_hash` field. The +`key_hash` refers to the log's public verification key. + + +### get-tree-head-latest +Returns the latest tree head, signed only by the log. Used for +debugging purposes. + +``` +GET <base url>/st/v0/get-tree-head-latest +``` + +Input: +- None + +Output on success: +- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. +- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. +- `root_hash`: `tree_head.root_hash` hex-encoded. +- `signature`: hex-encoded Ed25519 signature over `timestamp`, + `tree_size` and `root_hash` serialized into a `tree_head` as + described in section `Merkle tree head`. +- `key_hash`: a hash of the log's public verification key that can be + used to verify `signature`. The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. + +There is exactly one `signature` and one `key_hash` field. The +`key_hash` refers to the log's public verification key. 
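The three tree head endpoints above share one response shape, so a client can handle them with the same code. The following is a minimal Python sketch, not part of the committed document: it parses the ASCII `Key=Value` response and rebuilds the 48-byte `tree_head` serialization that each `signature` covers. It assumes one key/value pair per line, as in the `curl` examples; the function names are hypothetical, and the Ed25519 verification itself is left out because it needs a library outside the standard library.

```python
import struct
from collections import defaultdict


def parse_key_values(body):
    """Parse ASCII `Key=Value` lines; repeated keys are collected in order.

    Assumption: pairs are separated by newlines, as in the curl examples.
    """
    fields = defaultdict(list)
    for line in body.splitlines():
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("malformed line: %r" % line)
        fields[key].append(value)
    return dict(fields)


def serialize_tree_head(timestamp, tree_size, root_hash):
    """Serialize `struct tree_head`: two big-endian u64 values, then 32 bytes."""
    if len(root_hash) != 32:
        raise ValueError("root_hash must be 32 bytes")
    return struct.pack(">QQ", timestamp, tree_size) + root_hash


def tree_head_from_response(body):
    """Return (signed message, [(key_hash_hex, signature_hex), ...])."""
    fields = parse_key_values(body)
    signatures = fields.get("signature", [])
    key_hashes = fields.get("key_hash", [])
    # The number of signatures and key hashes must match.
    if len(signatures) != len(key_hashes):
        raise ValueError("signature and key_hash counts differ")
    message = serialize_tree_head(
        int(fields["timestamp"][0]),
        int(fields["tree_size"][0]),
        bytes.fromhex(fields["root_hash"][0]),
    )
    return message, list(zip(key_hashes, signatures))
```

Each returned pair would then be checked by hex-decoding the signature, locating the Ed25519 key whose SHA256 hash equals the key hash, and verifying the signature over the returned message.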
+ + +### get-proof-by-hash +``` +POST <base url>/st/v0/get-proof-by-hash +``` + +Input: +- `leaf_hash`: leaf identifying which `tree_leaf` the log should prove + inclusion of, hex-encoded. +- `tree_size`: tree size of the tree head that the proof should be + based on, as an ASCII-encoded decimal number. + +Output on success: +- `tree_size`: tree size that the proof is based on, as an + ASCII-encoded decimal number. +- `leaf_index`: zero-based index of the leaf that the proof is based + on, as an ASCII-encoded decimal number. +- `inclusion_path`: node hash, hex-encoded. + +The leaf hash is computed using the RFC 6962 hashing strategy. In +other words, `SHA256(0x00 | tree_leaf)`. + +`inclusion_path` may be omitted or repeated to represent an inclusion +proof of zero or more node hashes. The order of node hashes follow +from the hash strategy, see RFC 6962. + +Example: `echo "leaf_hash=241fd4538d0a35c2d0394e4710ea9e6916854d08f62602fb03b55221dcdac90f +tree_size=4711" | curl --data-binary @- localhost/st/v0/get-proof-by-hash` + +### get-consistency-proof +``` +POST <base url>/st/v0/get-consistency-proof +``` + +Input: +- `new_size`: tree size of a newer tree head, as an ASCII-encoded + decimal number. +- `old_size`: tree size of an older tree head that the log should + prove is consistent with the newer tree head, as an ASCII-encoded + decimal number. + +Output on success: +- `new_size`: tree size of the newer tree head that the proof is based + on, as an ASCII-encoded decimal number. +- `old_size`: tree size of the older tree head that the proof is based + on, as an ASCII-encoded decimal number. +- `consistency_path`: node hash, hex-encoded. + +`consistency_path` may be omitted or repeated to represent a +consistency proof of zero or more node hashes. The order of node +hashes follow from the hash strategy, see RFC 6962. 
+ +Example: `echo "new_size=4711 +old_size=42" | curl --data-binary @- localhost/st/v0/get-consistency-proof` + +### get-leaves +``` +POST <base url>/st/v0/get-leaves +``` + +Input: +- `start_size`: index of the first leaf to retrieve, as an + ASCII-encoded decimal number. +- `end_size`: index of the last leaf to retrieve, as an ASCII-encoded + decimal number. + +Output on success: +- `shard_hint`: `tree_leaf.message.shard_hint` as an ASCII-encoded + decimal number. +- `checksum`: `tree_leaf.message.checksum`, hex-encoded. +- `signature`: `tree_leaf.signature_over_message`, hex-encoded. +- `key_hash`: `tree_leaf.key_hash`, hex-encoded. + +All fields may be repeated to return more than one leaf. The first +value in each list refers to the first leaf, the second value in each +list refers to the second leaf, etc. The size of each list must +match. + +A log may return fewer leaves than requested. At least one leaf +must be returned on HTTP status code 200 OK. + +Example: `echo "start_size=42 +end_size=4711" | curl --data-binary @- localhost/st/v0/get-leaves` + +### add-leaf +``` +POST <base url>/st/v0/add-leaf +``` + +Input: +- `shard_hint`: number within the log's shard interval as an + ASCII-encoded decimal number. +- `checksum`: the cryptographic checksum that the submitter wants to + log, hex-encoded. +- `signature_over_message`: the submitter's signature over + `tree_leaf.message`, hex-encoded. +- `verification_key`: the submitter's public verification key. The + key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2) + and then hex-encoded. +- `domain_hint`: domain name indicating where `tree_leaf.key_hash` + can be found as a DNS TXT resource record. + +Output on success: +- None + +The submission will not be accepted if `signature_over_message` is +invalid or if the key hash retrieved using `domain_hint` does not +match a hash over `verification_key`. 
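As a rough illustration of how the `add-leaf` input fields relate to the logged `tree_leaf`, the Python sketch below (not part of the committed document) serializes a leaf as described in the Serialization section and computes its RFC 6962 leaf hash, which is the value passed to `get-proof-by-hash`. The function names are hypothetical. Signature checking and the DNS lookup for `domain_hint` are not shown.

```python
import hashlib
import struct


def serialize_message(shard_hint, checksum):
    """Serialize `struct message`: big-endian u64 shard hint, then 32 bytes."""
    if len(checksum) != 32:
        raise ValueError("checksum must be 32 bytes")
    return struct.pack(">Q", shard_hint) + checksum


def serialize_tree_leaf(shard_hint, checksum, signature, verification_key):
    """Serialize `struct tree_leaf` from the add-leaf input fields.

    `verification_key` is the 32-byte Ed25519 public key encoded as in
    RFC 8032, section 5.1.2; the leaf stores only its SHA256 hash.
    """
    if len(signature) != 64:
        raise ValueError("signature must be 64 bytes")
    if len(verification_key) != 32:
        raise ValueError("verification_key must be 32 bytes")
    key_hash = hashlib.sha256(verification_key).digest()
    return serialize_message(shard_hint, checksum) + signature + key_hash


def leaf_hash(tree_leaf):
    """RFC 6962 leaf hash: SHA256(0x00 | tree_leaf)."""
    return hashlib.sha256(b"\x00" + tree_leaf).digest()
```

A submitter that keeps the serialized leaf can later hex-encode `leaf_hash(leaf)` and use it as `leaf_hash` when requesting an inclusion proof.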
+ +The submission may also not be accepted if the second-level domain +name exceeded its rate limit. By coupling every add-leaf request to +a second-level domain, it becomes more difficult to spam logs. You +would need an excessive number of domain names. This becomes costly +if free domain names are rejected. + +Logs don't publish domain-name to key bindings because key +management is more complex than that. + +Public logging should not be assumed to have happened until an +inclusion proof is available. An inclusion proof should not be relied +upon unless it leads up to a trustworthy signed tree head. Witness +cosigning can make a tree head trustworthy. + +Example: `echo "shard_hint=1640995200 +checksum=cfa2d8e78bf273ab85d3cef7bde62716261d1e42626d776f9b4e6aae7b6ff953 +signature_over_message=c026687411dea494539516ee0c4e790c24450f1a4440c2eb74df311ca9a7adf2847b99273af78b0bda65dfe9c4f7d23a5d319b596a8881d3bc2964749ae9ece3 +verification_key=c9a674888e905db1761ba3f10f3ad09586dddfe8581964b55787b44f318cbcdf +domain_hint=example.com" | curl --data-binary @- localhost/st/v0/add-leaf` + +### add-cosignature +``` +POST <base url>/st/v0/add-cosignature +``` + +Input: +- `signature`: Ed25519 signature over `tree_head`, hex-encoded. +- `key_hash`: hash of the witness' public verification key that can be + used to verify `signature`. The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. + +Output on success: +- None + +`key_hash` can be used to identify which witness signed the tree +head. A key-hash, rather than the full verification key, is used to +motivate verifiers to locate the appropriate key and make an explicit +trust decision. 
+ +Example: `echo "signature=d1b15061d0f287847d066630339beaa0915a6bbb77332c3e839a32f66f1831b69c678e8ca63afd24e436525554dbc6daa3b1201cc0c93721de24b778027d41af +key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | curl --data-binary @- localhost/st/v0/add-cosignature` + +## Summary of log parameters +- **Public key**: The Ed25519 verification key to be used for + verifying tree head signatures. +- **Log identifier**: The public verification key `Public key` hashed + using SHA256. +- **Shard interval start**: The earliest time at which logging + requests are accepted as the number of seconds since the UNIX epoch. +- **Shard interval end**: The latest time at which logging + requests are accepted as the number of seconds since the UNIX epoch. +- **Base URL**: Where the log can be reached over HTTP(S). It is the + prefix to be used to construct a version 0 specific endpoint. diff --git a/doc/claimant.md b/doc/claimant.md new file mode 100644 index 0000000..6728fef --- /dev/null +++ b/doc/claimant.md @@ -0,0 +1,71 @@ +# Claimant model +## **System<sup>CHECKSUM</sup>** +System<sup>CHECKSUM</sup> is about the claims made by a data publisher. +* **Claim<sup>CHECKSUM</sup>**: + _I, data publisher, claim that the data_: + 1. has cryptographic hash X + 2. is produced by no-one but myself +* **Statement<sup>CHECKSUM</sup>**: signed checksum<br> +* **Claimant<sup>CHECKSUM</sup>**: data publisher<br> + The data publisher is a party that wants to publish some data. +* **Believer<sup>CHECKSUM</sup>**: end-user<br> + The end-user is a party that wants to use some published data. +* **Verifier<sup>CHECKSUM</sup>**: data publisher<br> + Only the data publisher can verify the above claims. +* **Arbiter<sup>CHECKSUM</sup>**:<br> + There's no official body. Invalidated claims would affect reputation. + +System<sup>CHECKSUM\*</sup> can be defined to make more specific claims. Below +is a reproducible builds example. 
+ +### **System<sup>CHECKSUM-RB</sup>**: +System<sup>CHECKSUM-RB</sup> is about the claims made by a _software publisher_ +that makes reproducible builds available. +* **Claim<sup>CHECKSUM-RB</sup>**: + _I, software publisher, claim that the data_: + 1. has cryptographic hash X + 2. is the output of a reproducible build for which the source can be located + using X as an identifier +* **Statement<sup>CHECKSUM-RB</sup>**: Statement<sup>CHECKSUM</sup> +* **Claimant<sup>CHECKSUM-RB</sup>**: software publisher<br> + The software publisher is a party that wants to publish the output of a + reproducible build. +* **Believer<sup>CHECKSUM-RB</sup>**: end-user<br> + The end-user is a party that wants to run an executable binary that built + reproducibly. +* **Verifier<sup>CHECKSUM-RB</sup>**: any interested party<br> + These parties try to verify the above claims. For example: + * the software publisher itself (_"has my identity been compromised?"_) + * rebuilders that check for locatability and reproducibility +* **Arbiter<sup>CHECKSUM-RB</sup>**:<br> + There's no official body. Invalidated claims would affect reputation. + +## **System<sup>CHECKSUM-LOG</sup>**: +System<sup>CHECKSUM-LOG</sup> is about the claims made by a _log operator_. +It adds _discoverability_ into System<sup>CHECKSUM\*</sup>. Discoverability +means that Verifier<sup>CHECKSUM\*</sup> can see all +Statement<sup>CHECKSUM</sup> that Believer<sup>CHECKSUM\*</sup> accept. + +* **Claim<sup>CHECKSUM-LOG</sup>**: + _I, log operator, make available:_ + 1. 
a globally consistent append-only log of Statement<sup>CHECKSUM</sup> +* **Statement<sup>CHECKSUM-LOG</sup>**: signed tree head +* **Claimant<sup>CHECKSUM-LOG</sup>**: log operator<br> + Possible operators might be: + * a small subset of data publishers + * members of relevant consortia +* **Believer<sup>CHECKSUM-LOG</sup>**: + * Believer<sup>CHECKSUM\*</sup> + * Verifier<sup>CHECKSUM\*</sup><br> +* **Verifier<sup>CHECKSUM-LOG</sup>**: third parties<br> + These parties verify the above claims. Examples include: + * members of relevant consortia + * non-profits and other reputable organizations + * security enthusiasts and researchers + * log operators (cross-ecosystem) + * monitors (cross-ecosystem) + * a small subset of data publishers (cross-ecosystem) +* **Arbiter<sup>CHECKSUM-LOG</sup>**:<br> + There is no official body. The ecosystem at large should stop using an + instance of System<sup>CHECKSUM-LOG</sup> if cryptographic proofs of log + misbehavior are presented by some Verifier<sup>CHECKSUM-LOG</sup>. diff --git a/doc/design.md b/doc/design.md new file mode 100644 index 0000000..2e01a34 --- /dev/null +++ b/doc/design.md @@ -0,0 +1,251 @@ +# System Transparency Logging: Design v0 +We propose System Transparency logging. It is similar to Certificate +Transparency, except that cryptographically signed checksums are logged as +opposed to X.509 certificates. Publicly logging signed checksums allows anyone +to discover which keys produced what signatures. As such, malicious and +unintended key-usage can be _detected_. We present our design and conclude by +providing two use-cases: binary transparency and reproducible builds. + +**Target audience.** +You are most likely interested in transparency logs or supply-chain security. + +**Preliminaries.** +You have basic understanding of cryptographic primitives like digital +signatures, hash functions, and Merkle trees. You roughly know what problem +Certificate Transparency solves and how. 
+ +**Warning.** +This is a work-in-progress document that may be moved or modified. A future +revision of this document will bump the version number to v1. Please let us +know if you have any feedback. + +## Introduction +Transparency logs make it possible to detect unwanted events. For example, + are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962), + did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md), + or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/). +A System Transparency log makes signed checksums transparent. The overall goal +is to facilitate detection of unwanted key-usage. + +## Threat model and (non-)goals +We consider a powerful attacker that gained control of a target's signing and +release infrastructure. This covers a weaker form of attacker that is able to +sign data and distribute it to a subset of isolated users. For example, this is +essentially what the FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order). +The fact that signing keys and related infrastructure components get +compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). + +The attacker can also gain control of the transparency log's signing key and +infrastructure. This covers a weaker form of attacker that is able to sign log +data and distribute it to a subset of isolated users. For example, this could +have been the case when a remote code execution was found for a Certificate +Transparency Log [\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM). 
+ +Any attacker that is able to position itself to control these components will +likely be _risk-averse_. This is at minimum due to two factors. First, +detection would result in a significant loss of capability that is by no means +trivial to come by. Second, detection means that some part of the attacker's +malicious behavior will be disclosed publicly. + +Our goal is to facilitate _detection_ of compromised signing keys. We consider +a signing key compromised if an end-user accepts an unwanted signature as valid. +The solution that we propose is that signed checksums are transparency logged. +For security we need a collision resistant hash function and an unforgeable +signature scheme. We also assume that at most a threshold of seemingly +independent parties are adversarial. + +It is a non-goal to disclose the data that a checksum represents. For example, +the log cannot distinguish between a checksum that represents a tax declaration, +an ISO image, or a Debian package. This means that the type of detection we +support is more _coarse-grained_ when compared to Certificate Transparency. + +## Design +We consider a data publisher that wants to digitally sign their data. The data +is of opaque type. We assume that end-users have a mechanism to locate the +relevant public verification keys. Data and signatures can also be retrieved +(in)directly from the data publisher. We make few assumptions about the +signature tooling. The ecosystem at large can continue to use `gpg`, `openssl`, +`ssh-keygen -Y`, `signify`, or something else. + +We _have to assume_ that additional tooling can be installed by end-users that +wish to enforce transparency logging. For example, none of the existing +signature tooling supports verification of Merkle tree proofs. A side-effect of +our design is that this additional tooling makes no outbound connections. The +above data flows are thus preserved. 
+ +### A bird's view +A central part of any transparency log is the data stored by the log. The data is stored by the +leaves of an append-only Merkle tree. Our leaf structure contains four fields: +- **shard_hint**: a number that binds the leaf to a particular _shard interval_. +Sharding means that the log has a predefined time during which logging requests +are accepted. Once elapsed, the log can be shut down. +- **checksum**: a cryptographic hash of some opaque data. The log never +sees the opaque data; just the hash made by the data publisher. +- **signature**: a digital signature that is computed by the data publisher over +the leaf's shard hint and checksum. +- **key_hash**: a cryptographic hash of the data publisher's public verification key that can be +used to verify the signature. + +#### Step 1 - preparing a logging request +The data publisher selects a shard hint and a checksum that should be logged. +For example, the shard hint could be "logs that are active during 2021". The +checksum might be the hash of a release file. + +The data publisher signs the selected shard hint and checksum using a secret +signing key. Both the signed message and the signature are stored +in the leaf for anyone to verify. Including a shard hint in the signed message +ensures that a good Samaritan cannot change it to log all leaves from an +earlier shard into a newer one. + +A hash of the public verification key is also stored in the leaf. This makes it +possible to attribute the leaf to the data publisher. For example, a data publisher +that monitors the log can look for leaves that match their own key hash(es). + +A hash, rather than the full public verification key, is used to motivate the +verifier to locate the key and make an explicit trust decision. Not disclosing the public +verification key in the leaf makes it less likely that someone would use an untrusted key _by +mistake_. + +#### Step 2 - submitting a logging request +The log implements an HTTP(S) API. 
Input and output is human-readable and uses +a simple key-value format. A more complex parser like JSON is not needed +because the exchanged data structures are primitive enough. + +The data publisher submits their shard hint, checksum, signature, and public +verification key as key-value pairs. The log will use the public verification +key to check that the signature is valid, then hash it to construct the `key_hash` part of the leaf. + +The data publisher also submits a _domain hint_. The log will download a DNS +TXT resource record based on the provided domain name. The downloaded result +must match the public verification key hash. By verifying that the submitter +controls a domain that is aware of the public verification key, rate limits can +be applied per second-level domain. As a result, you would need a large number +of domain names to spam the log in any significant way. + +Using DNS to combat spam is convenient because many data publishers already have +a domain name. A single domain name is also relatively cheap. Another +benefit is that the same anti-spam mechanism can be used across several +independent logs without coordination. This is important because a healthy log +ecosystem needs more than one log in order to be reliable. DNS also has built-in +caching which data publishers can influence by setting TTLs accordingly. + +The submitter's domain hint is not part of the leaf because key management is +more complex than that. A separate project should focus on transparent key +management. The scope of our work is transparent _key-usage_. + +The log will _try_ to incorporate a leaf into the Merkle tree if a logging +request is accepted. There are no _promises of public logging_ as in +Certificate Transparency. Therefore, the submitter needs to wait for an +inclusion proof to appear before concluding that the logging request succeeded. Not having +inclusion promises makes the log less complex. 
+ +#### Step 3 - distributing proofs of public logging +The data publisher is responsible for collecting all cryptographic proofs that +their end-users will need to enforce public logging. The collection below +should be downloadable from the same place that published data is normally hosted. +1. **Opaque data**: the data publisher's opaque data. +2. **Shard hint**: the data publisher's selected shard hint. +3. **Signature**: the data publisher's leaf signature. +4. **Cosigned tree head**: the log's tree head and a _list of signatures_ that +state it is consistent with prior history. +5. **Inclusion proof**: a proof of inclusion based on the logged leaf and tree +head in question. + +The data publisher's public verification key is known. Therefore, the first three fields are +sufficient to reconstruct the logged leaf. The leaf's signature can be +verified. The final two fields then prove that the leaf is in the log. If the +leaf is included in the log, any monitor can detect that there is a new +signature made by a given data publisher's public verification key. + +The catch is that the proof of logging is only as convincing as the tree head +that the inclusion proof leads up to. To bypass public logging, the attacker +needs to control a threshold of independent _witnesses_ that cosign the log. A +benign witness will only sign the log's tree head if it is consistent with prior +history. + +#### Summary +The log is sharded and will shut down at a predefined time. The log can shut +down _safely_ because end-user verification is not interactive. The difficulty +of bypassing public logging is based on the difficulty of controlling a +threshold of independent witnesses. Witnesses cosign tree heads to make them +trustworthy. + +Submitters, monitors, and witnesses interact with the log using an HTTP(S) API. +Submitters must prove that they own a domain name as an anti-spam mechanism. +End-users interact with the log _indirectly_ via a data publisher. 
It is the +data publisher's job to log signed checksums, distribute necessary proofs of +logging, and monitor the log. + +### A peek into the details +Our bird's view introduction skipped many details that matter in practice. Some +of these details are presented here using a question-answer format. A +question-answer format is helpful because it is easily modified and extended. + +#### What cryptographic primitives are supported? +The only supported hash algorithm is SHA256. The only supported signature +scheme is Ed25519. Not having any cryptographic agility makes the protocol less +complex and more secure. + +We can be cryptographically opinionated because of a key insight. Existing +signature tools like `gpg`, `ssh-keygen -Y`, and `signify` cannot verify proofs +of public logging. Therefore, _additional tooling must already be installed by +end-users_. That tooling should verify hashes using the log's hash function. +That tooling should also verify signatures using the log's signature scheme. +Both tree heads and tree leaves are being signed. + +#### Why not let the data publisher pick their own signature scheme and format? +Agility introduces complexity and difficult policy questions. For example, +which algorithms and formats should (not) be supported and why? Picking Ed25519 +is a current best practice that should be encouraged if possible. + +There is not much we can do if a data publisher _refuses_ to rely on the log's +hash function or signature scheme. + +#### What if the data publisher must use a specific signature scheme or format? +They may _cross-sign_ the data as follows. +1. Sign the data as they're used to. +2. Hash the data and use the result as the leaf's checksum to be logged. +3. Sign the leaf using the log's signature scheme. + +For verification, the end-user first verifies that the usual signature from step 1 is valid. Then the +end-user uses the additional tooling (which is already required) to verify the rest. 
+Cross-signing should be a relatively comfortable upgrade path that is backwards +compatible. The downside is that the data publisher may need to manage an +additional key-pair. + +#### What (de)serialization parsers are needed? +#### What policy should be used? +#### Why witness cosigning? +#### Why sharding? +Unlike X.509 certificates which already have validity ranges, a +checksum does not carry any such information. Therefore, we require +that the submitter selects a _shard hint_. The selected shard hint +must be in the log's _shard interval_. A shard interval is defined by +a start time and an end time. Both ends of the shard interval are +inclusive and expressed as the number of seconds since the UNIX epoch +(January 1, 1970 00:00 UTC). + +Sharding simplifies log operations because it becomes explicit when a +log can be shut down. A log must only accept logging requests that +have valid shard hints. A log should only accept logging requests +during the predefined shard interval. Note that _the submitter's +shard hint is not a verified timestamp_. The submitter should set the +shard hint as large as possible. If a roughly verified timestamp is +needed, a cosigned tree head can be used. + +Without a shard hint, a good Samaritan could log all leaves from an +earlier shard into a newer one. Not only would that defeat the +purpose of sharding, but it would also become a potential +denial-of-service vector. + +#### TODO +Add more key questions and answers. +- Log spamming +- Log poisoning +- Why we removed identifier field from the leaf +- Explain `latest`, `stable` and `cosigned` tree head. +- Privacy aspects +- How does this whole thing work with more than one log? + +## Concluding remarks +Example of binary transparency and reproducible builds. 
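The shard hint rule from the "Why sharding?" answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ShardInterval`, `accepts_hint`, and `accepts_request` are hypothetical and do not come from the design or from any log implementation. The sketch treats both interval ends as inclusive and expressed in seconds since the UNIX epoch, as the text specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardInterval:
    """Hypothetical model of a log's shard interval (not part of the design)."""

    start: int  # seconds since the UNIX epoch, inclusive
    end: int    # seconds since the UNIX epoch, inclusive

    def accepts_hint(self, shard_hint: int) -> bool:
        # A log must only accept requests whose shard hint lies in the interval.
        return self.start <= shard_hint <= self.end

    def accepts_request(self, shard_hint: int, now: int) -> bool:
        # A log should also only accept requests while the shard is active.
        # The shard hint itself is not a verified timestamp, so the log's own
        # clock reading (`now`) is checked separately from the hint.
        return self.accepts_hint(shard_hint) and self.start <= now <= self.end


# Example shard covering calendar year 2021 (UTC); the values are illustrative.
shard = ShardInterval(start=1609459200, end=1640995199)
in_range = shard.accepts_hint(1640995199)          # largest valid hint
replayed = shard.accepts_hint(1609459199)          # hint from an earlier shard
late = shard.accepts_request(1640995199, now=1640995200)  # shard has ended
```

Under these assumptions, a leaf carrying a hint from an earlier shard is rejected, which is how the shard hint keeps old leaves from being replayed into a newer log.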
diff --git a/doc/sketch.md b/doc/sketch.md deleted file mode 100644 index 31964e0..0000000 --- a/doc/sketch.md +++ /dev/null @@ -1,372 +0,0 @@ -# System Transparency Logging -This document provides a sketch of System Transparency (ST) logging. The basic -idea is to insert hashes of system artifacts into a public, append-only, and -tamper-evident transparency log, such that any enforcing client can be sure that -they see the same system artifacts as everyone else. A system artifact could -be a browser update, an operating system image, a Debian package, or more -generally something that is opaque. - -We take inspiration from the Certificate Transparency Front-End -([CTFE](https://github.com/google/certificate-transparency-go/tree/master/trillian/ctfe)) -that implements [RFC 6962](https://tools.ietf.org/html/rfc6962) for -[Trillian](https://transparency.dev). - -## Log parameters -An ST log is defined by the following parameters: -- `log_identifier`: a `Namespace` of type `ed25519_v1` that defines the log's -signing algorithm and public verification key. -- `supported_namespaces`: a list of namespace types that the log supports. -Entities must use a supported namespace type when posting signed data to the -log. -- `base_url`: prefix used by clients that contact the log, e.g., -example.com:1234/log. -- `final_cosigned_tree_head`: an `StItem` of type `cosigned_tree_head_v*`. Not -set until the log is turned into read-only mode in preparation of a shutdown. - -ST logs use the same hash strategy as described in RFC 6962: SHA256 with `0x00` -as leaf node prefix and `0x01` as interior node prefix. - -In contrast to Certificate Transparency (CT) **there is no Maximum Merge Delay -(MMD)**. New entries are merged into the log as soon as possible, and no client -should trust that something is logged until an inclusion proof can be provided -that references a trustworthy STH. Therefore, **there are no "promises" of -public logging** as in CT. 
- -To produce trustworthy STHs a simple form of [witness -cosigning](https://arxiv.org/pdf/1503.08768.pdf) is built into the log. -Witnesses poll the log for the next stable STH, and verify that it is consistent -before posting a cosignature that can then be served by the log. - -## Acceptance criteria and scope -A log should accept a leaf submission if it is: -- Well-formed, see data structure definitions below. -- Digitally signed by a registered namespace. - -Rate limits may be applied per namespace to combat spam. Namespaces may also be -used by clients to determine which entries belong to who. It is up to the -submitters to communicate trusted namespaces to their own clients. In other -words, there are no mappings from namespaces to identities built into the log. -There is also no revocation of namespaces: **we facilitate _detection_ of -compromised signing keys by making artifact hashes public, which is not to be -confused with _prevention_ or even _recovery_ after detection**. - -## Data structure definitions -Data structures are defined and serialized using the presentation language in -[RFC 5246, §4](https://tools.ietf.org/html/rfc5246). A definition of the log's -Merkle tree can be found in [RFC 6962, -§2](https://tools.ietf.org/html/rfc6962#section-2). - -### Namespace -A _namespace_ is a versioned data structure that contains a public verification -key (or fingerprint), as well as enough information to determine its format, -signing, and verification operations. Namespaces are used as identifiers, both -for the log itself and the parties that submit artifact hashes and cosignatures. - -``` -enum { - reserved(0), - ed25519_v1(1), - (2^16-1) -} NamespaceFormat; - -struct { - NamespaceFormat format; - select (format) { - case ed25519_v1: Ed25519V1; - } message; -} Namespace; -``` - -Our namespace format is inspired by Keybase's -[key-id](https://keybase.io/docs/api/1.0/kid). 
- -#### Ed25519V1 -At this time the only supported namespace type is based on Ed25519. The -namespace field contains the full verification key. Signing operations and -serialized formats are defined by [RFC -8032](https://tools.ietf.org/html/rfc8032). -``` -struct { - opaque namespace[32]; // public verification key -} Ed25519V1; -``` - -### `StItem` -A general-purpose `TransItem` is defined in [RFC 6962/bis, -§4.5](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.5). -We define our own `TransItem`, but name it `StItem` to emphasize that they are -not the same. - -``` -enum { - reserved(0), - signed_tree_head_v1(1), - cosigned_tree_head_v1(2), - consistency_proof_v1(3), - inclusion_proof_v1(4), - signed_checksum_v1(5), // leaf type - (2^16-1) -} StFormat; - -struct { - StFormat format; - select (format) { - case signed_tree_head_v1: SignedTreeHeadV1; - case cosigned_tree_head_v1: CosignedTreeHeadV1; - case consistency_proof_v1: ConsistencyProofV1; - case inclusion_proof_v1: InclusionProofV1; - case signed_checksum_v1: SignedChecksumV1; - } message; -} StItem; - -struct { - StItem items<0..2^32-1>; -} StItemList; -``` - -#### `signed_tree_head_v1` -We use the same tree head definition as in [RFC 6962/bis, -§4.9](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.9). -The resulting _signed_ tree head is packaged differently: a namespace is used as -log identifier, and it is communicated in a `SignatureV1` structure. 
-``` -struct { - TreeHeadV1 tree_head; - SignatureV1 signature; -} SignedTreeHeadV1; - -struct { - uint64 timestamp; - uint64 tree_size; - NodeHash root_hash; - Extension extensions<0..2^16-1>; -} TreeHeadV1; -opaque NodeHash<32..2^8-1>; - -struct { - Namespace namespace; - opaque signature<1..2^16-1>; -} SignatureV1; -``` - -#### `cosigned_tree_head_v1` -Transparency logs were designed to be cryptographically verifiable in the -presence of a gossip-audit model that ensures everyone observes _the same -cryptographically verifiable log_. The gossip-audit model is largely undefined -in today's existing transparency logging ecosystems, which means that the logs -must be trusted to play by the rules. We wanted to avoid that outcome in our -ecosystem. Therefore, a gossip-audit model is built into the log. - -The basic idea is that an STH should only be considered valid if it is cosigned -by a number of witnesses that verify the append-only property. Which witnesses -to trust and under what circumstances is defined by a client-side _witness -cosigning policy_. For example, - "require no witness cosigning", - "must have at least `k` signatures from witnesses A...J", and - "must have at least `k` signatures from witnesses A...J where one is from - witness B". - -Witness cosigning policies are beyond the scope of this specification. - -A cosigned STH is composed of an STH and a list of cosignatures. A cosignature -must cover the serialized STH as an `StItem`, and be produced with a witness -namespace of type `ed25519_v1`. - -``` -struct { - SignedTreeHeadV1 signed_tree_head; - SignatureV1 cosignatures<0..2^32-1>; // vector of cosignatures -} CosignedTreeHeadV1; -``` - -#### `consistency_proof_v1` -For the most part we use the same consistency proof definition as in [RFC -6962/bis, -§4.11](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.11). 
-There are two modifications: our log identifier is a namespace rather than an -[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4), -and a consistency proof may be empty. - -``` -struct { - Namespace log_id; - uint64 tree_size_1; - uint64 tree_size_2; - NodeHash consistency_path<0..2^16-1>; -} ConsistencyProofV1; -``` - -#### `inclusion_proof_v1` -For the most part we use the same inclusion proof definition as in [RFC -6962/bis, -§4.12](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.12). -There are two modifications: our log identifier is a namespace rather than an -[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4), -and an inclusion proof may be empty. -``` -struct { - Namespace log_id; - uint64 tree_size; - uint64 leaf_index; - NodeHash inclusion_path<0..2^16-1>; -} InclusionProofV1; -``` - -#### `signed_checksum_v1` -A checksum entry contains a package identifier like `foobar-1.2.3` and an -artifact hash. It is then signed so that clients can distinguish artifact -hashes from two different software publishers A and B. For example, the -`signed_checksum_v1` type can help [enforce public binary logging before -accepting a new software -update](https://wiki.mozilla.org/Security/Binary_Transparency). - -``` -struct { - ChecksumV1 data; - SignatureV1 signature; -} SignedChecksumV1; - -struct { - opaque identifier<1..128>; - opaque checksum<1..64>; -} ChecksumV1; -``` - -It is assumed that clients know how to find the real artifact source (if not -already at hand), such that the logged hash can be recomputed and compared for -equality. The log is not aware of how artifact hashes are computed, which means -that it is up to the submitters to define hash functions, data formats, and -such. - -## Public endpoints -Clients talk to the log using HTTP(S). Successfully processed requests are -responded to with HTTP status code `200 OK`, and any returned data is -serialized. 
Endpoints without input parameters use HTTP GET requests. -Endpoints that have input parameters HTTP POST a TLS-serialized data structure. -The HTTP content type `application/octet-stream` is used when sending data. - -### add-entry -``` -POST https://<base url>/st/v1/add-entry -``` - -Input: -- An `StItem` of type `signed_checksum_v1`. - -No output. - -### add-cosignature -``` -POST https://<base url>/st/v1/add-cosignature -``` - -Input: -- An `StItem` of type `cosigned_tree_head_v1`. The list of cosignatures must -be of length one, the witness signature must cover the item's STH, and that STH -must additionally match the log's stable STH that is currently being cosigned. - -No output. - -### get-latest-sth -``` -GET https://<base url>/st/v1/get-latest-sth -``` - -No input. - -Output: -- An `StItem` of type `signed_tree_head_v1` that corresponds to the most -recent STH. - -### get-stable-sth -``` -GET https://<base url>/st/v1/get-stable-sth -``` - -No input. - -Output: -- An `StItem` of type `signed_tree_head_v1` that corresponds to a stable STH -that witnesses should cosign. The same STH is returned for a period of time. - -### get-cosigned-sth -``` -GET https://<base url>/st/v1/get-cosigned-sth -``` - -No input. - -Output: -- An `StItem` of type `cosigned_tree_head_v1` that corresponds to the most -recent cosigned STH. - -### get-proof-by-hash -``` -POST https://<base url>/st/v1/get-proof-by-hash -``` - -Input: -``` -struct { - opaque hash[32]; // leaf hash - uint64 tree_size; // tree size that the proof should be based on -} GetProofByHashV1; -``` - -Output: -- An `StItem` of type `inclusion_proof_v1`. - -### get-consistency-proof -``` -POST https://<base url>/st/v1/get-consistency-proof -``` - -Input: -``` -struct { - uint64 first; // first tree size that the proof should be based on - uint64 second; // second tree size that the proof should be based on -} GetConsistencyProofV1; -``` - -Output: -- An `StItem` of type `consistency_proof_v1`. 
- -### get-entries -``` -POST https://<base url>/st/v1/get-entries -``` - -Input: -``` -struct { - uint64 start; // 0-based index of first entry to retrieve - uint64 end; // 0-based index of last entry to retrieve in decimal. -} GetEntriesV1; -``` - -Output: -- An `StItem` list where each entry is of type `signed_checksum_v1`. The first -`StItem` corresponds to the start index, the second one to `start+1`, etc. The -log may return fewer entries than requested. - -# Appendix A -In the future other namespace types might be supported. For example, we could -add [RSASSA-PKCS1-v1_5](https://tools.ietf.org/html/rfc3447#section-8.2) as -follows: -1. Add `rsa_v1` format and RSAV1 namespace. This is what we would register on -the server-side such that the server knows the namespace and complete key. -``` -struct { - opaque namespace<32>; // key fingerprint - // + some encoding of public key -} RSAV1; -``` -2. Add `rsassa_pkcs1_5_v1` format and `RSASSAPKCS1_5_v1`. This is what the -submitter would use to communicate namespace and RSA signature mode. -``` -struct { - opaque namespace<32>; // key fingerprint - // + necessary parameters, e.g., SHA256 as hash function -} RSASSAPKCS1_5V1; -``` |