-rw-r--r-- | README.md | 4
-rw-r--r-- | doc/api.md | 398
-rw-r--r-- | doc/claimant.md | 71
-rw-r--r-- | doc/design.md | 251
4 files changed, 2 insertions, 722 deletions
@@ -34,8 +34,8 @@ _rejected_ unless a corresponding signed checksum is publicly logged. ## Design considerations We had several design considerations in mind while developing siglog. A short -preview is listed below. Please refer to our [design document](https://github.com/system-transparency/stfe/blob/main/doc/design.md) -and [API specification](https://github.com/system-transparency/stfe/blob/main/doc/api.md) +preview is listed below. Please refer to our [design document](https://github.com/sigsum/sigsum/blob/main/doc/design.md) +and [API specification](https://github.com/sigsum/sigsum/blob/main/doc/api.md) for additional details. Feedback is welcomed and encouraged! - **Preserved data flows:** an end-user can enforce transparent logging without making additional outbound network connections. Proofs of public logging should diff --git a/doc/api.md b/doc/api.md deleted file mode 100644 index 57ad119..0000000 --- a/doc/api.md +++ /dev/null @@ -1,398 +0,0 @@ -# System Transparency Logging: API v0 -This document describes details of the System Transparency logging -API, version 0. The broader picture is not explained here. We assume -that you have read the System Transparency Logging design document. -It can be found -[here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). - -**Warning.** -This is a work-in-progress document that may be moved or modified. - -## Overview -Logs implement an HTTP(S) API for accepting requests and sending -responses. - -- Input data in requests and output data in responses are expressed as - ASCII-encoded key/value pairs. -- Requests with input data use HTTP POST to send the data to a log. -- Binary data is hex-encoded before being transmitted. - -The motivation for using a text based key/value format for request and -response data is that it's simple to parse. 
Note that this format is -not being used for the serialization of signed or logged data, where a -more well defined and storage efficient format is desirable. A -submitter may distribute log responses to their end-users in any -format that suits them. The (de)serialization required for -_end-users_ is a small subset of Trunnel. Trunnel is an "idiot-proof" -wire-format in use by the Tor project. - -## Primitives -### Cryptography -Logs use the same Merkle tree hash strategy as -[RFC 6962,§2](https://tools.ietf.org/html/rfc6962#section-2). -The hash functions must be -[SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). -Logs must sign tree heads using -[Ed25519](https://tools.ietf.org/html/rfc8032). Log witnesses -must also sign tree heads using Ed25519. - -All other parts that are not Merkle tree related also use SHA256 as -the hash function. Using more than one hash function would increases -the overall attack surface: two hash functions must be collision -resistant instead of one. - -### Serialization -Log requests and responses are transmitted as ASCII-encoded key/value -pairs, for a smaller dependency than an alternative parser like JSON. -Some input and output data is binary: cryptographic hashes and -signatures. Binary data must be Base16-encoded, also known as hex -encoding. Using hex as opposed to base64 is motivated by it being -simpler, favoring ease of decoding and encoding over efficiency on the -wire. - -We use the -[Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) -to define (de)serialization of data structures that need to be signed or -inserted into the Merkle tree. Trunnel is more expressive than the -[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5). -It is about as expressive as the -[TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3).
-A notable difference is that Trunnel supports integer constraints. -The Trunnel language is also readable by humans _and_ machines. -"Obviously correct code" can be generated in C and Go. - -A fair summary of our Trunnel usage is as follows. - -All integers are 64-bit, unsigned, and in network byte order. -Fixed-size byte arrays are put into the serialization buffer in-order, -starting from the first byte. Variable length byte arrays first -declare their length as an integer, which is then followed by that -number of bytes. These basic types are concatenated to form a -collection. You should not need a general-purpose Trunnel -(de)serialization parser to work with this format. If you have one, -you may use it though. The main point of using Trunnel is that it -makes a simple format explicit and unambiguous. - -#### Merkle tree head -Tree heads are signed both by a log and its witnesses. It contains a -timestamp, a tree size, and a root hash. The timestamp is included so -that monitors can ensure _liveliness_. It is the time since the UNIX -epoch (January 1, 1970 00:00 UTC) in seconds. The tree size -specifies the current number of leaves. The root hash fixes the -structure and content of the Merkle tree. - -``` -struct tree_head { - u64 timestamp; - u64 tree_size; - u8 root_hash[32]; -}; -``` - -The serialized tree head must be signed using Ed25519. A witness must -not cosign a tree head if it is inconsistent with prior history or if -the timestamp is backdated or future-dated more than 12 hours. - -#### Merkle tree leaf -Logs support a single leaf type. It contains a shard hint, a -checksum over whatever the submitter wants to log a checksum for, a -signature that the submitter computed over the shard hint and the -checksum, and a hash of the submitter's public verification key, that -can be used to verify the signature. 
- -``` -struct message { - u64 shard_hint; - u8 checksum[32]; -}; - -struct tree_leaf { - struct message; - u8 signature_over_message[64]; - u8 key_hash[32]; -} -``` - -`message` is composed of the `shard_hint`, chosen by the submitter to -match the shard interval for the log it's submitting to, and the -submitter's `checksum` to be logged. - -`signature_over_message` is a signature over `message`, using the -submitter's verification key. It must be possible to verify the -signature using the submitter's public verification key, as indicated -by `key_hash`. - -`key_hash` is a hash of the submitter's verification key used for -signing `message`. It is included in `tree_leaf` so that the leaf can -be attributed to the submitter. A hash, rather than the full public -key, is used to motivate verifiers to locate the appropriate key and -make an explicit trust decision. - -## Public endpoints -Every log has a base URL that identifies it uniquely. The only -constraint is that it must be a valid HTTP(S) URL that can have the -`/st/v0/<endpoint>` suffix appended. For example, a complete endpoint -URL could be -`https://log.example.com/2021/st/v0/get-tree-head-cosigned`. - -Input data (in requests) is POST:ed in the HTTP message body as ASCII -key/value pairs. - -Output data (in replies) is sent in the HTTP message body in the same -format as the input data, i.e. as ASCII key/value pairs on the format -`Key=Value` - -The HTTP status code is 200 OK to indicate success. A different HTTP -status code is used to indicate failure, in which case a log should -respond with a human-readable string describing what went wrong using -the key `error`. Example: `error=Invalid signature.`. - -### get-tree-head-cosigned -Returns the latest cosigned tree head. Used together with -`get-proof-by-hash` and `get-consistency-proof` for verifying the tree. 
- -``` -GET <base url>/st/v0/get-tree-head-cosigned -``` - -Input: -- None - -Output on success: -- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, - seconds since the UNIX epoch. -- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. -- `root_hash`: `tree_head.root_hash` hex-encoded. -- `signature`: hex-encoded Ed25519 signature over `timestamp`, - `tree_size` and `root_hash` serialized into a `tree_head` as - described in section `Merkle tree head`. -- `key_hash`: a hash of the public verification key (belonging to - either the log or to one of its witnesses), which can be used to - verify the most recent `signature`. The key is encoded as defined - in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), - and then hashed using SHA256. The hash value is hex-encoded. - -The `signature` and `key_hash` fields may repeat. The first signature -corresponds to the first key hash, the second signature corresponds to -the second key hash, etc. The number of signatures and key hashes -must match. - -### get-tree-head-to-sign -Returns the latest tree head to be signed by log witnesses. Used by -witnesses. - -``` -GET <base url>/st/v0/get-tree-head-to-sign -``` - -Input: -- None - -Output on success: -- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, - seconds since the UNIX epoch. -- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. -- `root_hash`: `tree_head.root_hash` hex-encoded. -- `signature`: hex-encoded Ed25519 signature over `timestamp`, - `tree_size` and `root_hash` serialized into a `tree_head` as - described in section `Merkle tree head`. -- `key_hash`: a hash of the log's public verification key, which can - be used to verify `signature`. The key is encoded as defined in - [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), - and then hashed using SHA256. The hash value is hex-encoded. - -There is exactly one `signature` and one `key_hash` field. 
The -`key_hash` refers to the log's public verification key. - - -### get-tree-head-latest -Returns the latest tree head, signed only by the log. Used for -debugging purposes. - -``` -GET <base url>/st/v0/get-tree-head-latest -``` - -Input: -- None - -Output on success: -- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, - seconds since the UNIX epoch. -- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. -- `root_hash`: `tree_head.root_hash` hex-encoded. -- `signature`: hex-encoded Ed25519 signature over `timestamp`, - `tree_size` and `root_hash` serialized into a `tree_head` as - described in section `Merkle tree head`. -- `key_hash`: a hash of the log's public verification key that can be - used to verify `signature`. The key is encoded as defined in - [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), - and then hashed using SHA256. The hash value is hex-encoded. - -There is exactly one `signature` and one `key_hash` field. The -`key_hash` refers to the log's public verification key. - - -### get-proof-by-hash -``` -POST <base url>/st/v0/get-proof-by-hash -``` - -Input: -- `leaf_hash`: leaf identifying which `tree_leaf` the log should prove - inclusion of, hex-encoded. -- `tree_size`: tree size of the tree head that the proof should be - based on, as an ASCII-encoded decimal number. - -Output on success: -- `tree_size`: tree size that the proof is based on, as an - ASCII-encoded decimal number. -- `leaf_index`: zero-based index of the leaf that the proof is based - on, as an ASCII-encoded decimal number. -- `inclusion_path`: node hash, hex-encoded. - -The leaf hash is computed using the RFC 6962 hashing strategy. In -other words, `SHA256(0x00 | tree_leaf)`. - -`inclusion_path` may be omitted or repeated to represent an inclusion -proof of zero or more node hashes. The order of node hashes follow -from the hash strategy, see RFC 6962. 
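For readers who want to check a `get-proof-by-hash` response by hand, the following is a minimal sketch of the RFC 6962 hashing strategy that the removed text refers to. It assumes the hex-encoded `inclusion_path` values have already been decoded with `bytes.fromhex`. The function names are illustrative only and do not come from the deleted document or from any log implementation.

```python
import hashlib
from typing import Sequence


def leaf_hash(tree_leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA256(0x00 | tree_leaf)."""
    return hashlib.sha256(b"\x00" + tree_leaf).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA256(0x01 | left | right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, leaf_index: int, tree_size: int,
                     inclusion_path: Sequence[bytes], root_hash: bytes) -> bool:
    """Recompute the root from a leaf hash and its inclusion path.

    `leaf` is the output of leaf_hash(); `leaf_index` is zero-based.
    """
    if leaf_index < 0 or leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf
    for p in inclusion_path:
        if sn == 0:
            return False  # more path elements than tree levels
        if (fn & 1) == 1 or fn == sn:
            # Current node is a right child: the sibling goes on the left.
            r = node_hash(p, r)
            # Skip levels where the node has no sibling (right edge of the tree).
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            # Current node is a left child: the sibling goes on the right.
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```

The proof is only as useful as the tree head it leads to, so `root_hash` and `tree_size` should come from a cosigned tree head rather than from the same untrusted response.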
- -Example: `echo "leaf_hash=241fd4538d0a35c2d0394e4710ea9e6916854d08f62602fb03b55221dcdac90f -tree_size=4711" | curl --data-binary @- localhost/st/v0/get-proof-by-hash` - -### get-consistency-proof -``` -POST <base url>/st/v0/get-consistency-proof -``` - -Input: -- `new_size`: tree size of a newer tree head, as an ASCII-encoded - decimal number. -- `old_size`: tree size of an older tree head that the log should - prove is consistent with the newer tree head, as an ASCII-encoded - decimal number. - -Output on success: -- `new_size`: tree size of the newer tree head that the proof is based - on, as an ASCII-encoded decimal number. -- `old_size`: tree size of the older tree head that the proof is based - on, as an ASCII-encoded decimal number. -- `consistency_path`: node hash, hex-encoded. - -`consistency_path` may be omitted or repeated to represent a -consistency proof of zero or more node hashes. The order of node -hashes follow from the hash strategy, see RFC 6962. - -Example: `echo "new_size=4711 -old_size=42" | curl --data-binary @- localhost/st/v0/get-consistency-proof` - -### get-leaves -``` -POST <base url>/st/v0/get-leaves -``` - -Input: -- `start_size`: index of the first leaf to retrieve, as an - ASCII-encoded decimal number. -- `end_size`: index of the last leaf to retrieve, as an ASCII-encoded - decimal number. - -Output on success: -- `shard_hint`: `tree_leaf.message.shard_hint` as an ASCII-encoded - decimal number. -- `checksum`: `tree_leaf.message.checksum`, hex-encoded. -- `signature`: `tree_leaf.signature_over_message`, hex-encoded. -- `key_hash`: `tree_leaf.key_hash`, hex-encoded. - -All fields may be repeated to return more than one leaf. The first -value in each list refers to the first leaf, the second value in each -list refers to the second leaf, etc. The size of each list must -match. - -A log may return fewer leaves than requested. At least one leaf -must be returned on HTTP status code 200 OK. 
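The repeated-field convention used by `get-leaves` can be sketched as follows. This assumes that a response carries one `Key=Value` pair per line, which the request examples suggest but the removed text does not spell out for responses. `parse_key_values` and `leaves_from_response` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List

LEAF_FIELDS = ("shard_hint", "checksum", "signature", "key_hash")


def parse_key_values(body: str) -> Dict[str, List[str]]:
    """Parse `Key=Value` lines; values of repeated keys are kept in order."""
    fields = defaultdict(list)
    for line in body.splitlines():
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("malformed line: %r" % line)
        fields[key].append(value)
    return dict(fields)


def leaves_from_response(body: str) -> List[Dict[str, str]]:
    """Group the i-th value of every field into the i-th leaf."""
    fields = parse_key_values(body)
    columns = [fields.get(name, []) for name in LEAF_FIELDS]
    sizes = {len(column) for column in columns}
    if len(sizes) != 1 or 0 in sizes:
        # The lists must match in size and contain at least one leaf.
        raise ValueError("field lists must be non-empty and of equal size")
    return [dict(zip(LEAF_FIELDS, row)) for row in zip(*columns)]
```

Because a log may return fewer leaves than requested, a caller would compare the number of returned leaves with the requested range and ask again for the remainder.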
- -Example: `echo "start_size=42 -end_size=4711" | curl --data-binary @- localhost/st/v0/get-leaves` - -### add-leaf -``` -POST <base url>/st/v0/add-leaf -``` - -Input: -- `shard_hint`: number within the log's shard interval as an - ASCII-encoded decimal number. -- `checksum`: the cryptographic checksum that the submitter wants to - log, hex-encoded. -- `signature_over_message`: the submitter's signature over - `tree_leaf.message`, hex-encoded. -- `verification_key`: the submitter's public verification key. The - key is encoded as defined in - [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2) - and then hex-encoded. -- `domain_hint`: domain name indicating where `tree_leaf.key_hash` - can be found as a DNS TXT resource record. - -Output on success: -- None - -The submission will not be accepted if `signature_over_message` is -invalid or if the key hash retrieved using `domain_hint` does not -match a hash over `verification_key`. - -The submission may also not be accepted if the second-level domain -name exceeded its rate limit. By coupling every add-leaf request to -a second-level domain, it becomes more difficult to spam logs. You -would need an excessive number of domain names. This becomes costly -if free domain names are rejected. - -Logs don't publish domain-name to key bindings because key -management is more complex than that. - -Public logging should not be assumed to have happened until an -inclusion proof is available. An inclusion proof should not be relied -upon unless it leads up to a trustworthy signed tree head. Witness -cosigning can make a tree head trustworthy. 
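The `add-leaf` acceptance rule about `domain_hint` can be illustrated with a small sketch. It assumes that the DNS TXT record holds the hex-encoded SHA256 hash of the verification key verbatim; the removed text does not specify the record format, so treat that as a guess. The DNS lookup is left out because the Python standard library has no TXT resolver, and both function names are hypothetical.

```python
import hashlib
from typing import Iterable


def key_hash_hex(verification_key_hex: str) -> str:
    """SHA256 over the 32-byte RFC 8032 encoded key, hex-encoded."""
    key = bytes.fromhex(verification_key_hex)
    if len(key) != 32:
        raise ValueError("an Ed25519 verification key is 32 bytes")
    return hashlib.sha256(key).hexdigest()


def domain_hint_matches(verification_key_hex: str, txt_values: Iterable[str]) -> bool:
    """True if any TXT value equals the key hash (assumed record format)."""
    expected = key_hash_hex(verification_key_hex)
    return any(value.strip().lower() == expected for value in txt_values)
```

A log following this rule would reject the submission when `domain_hint_matches` returns false, and would otherwise count the request against the rate limit of the second-level domain.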
- -Example: `echo "shard_hint=1640995200 -checksum=cfa2d8e78bf273ab85d3cef7bde62716261d1e42626d776f9b4e6aae7b6ff953 -signature_over_message=c026687411dea494539516ee0c4e790c24450f1a4440c2eb74df311ca9a7adf2847b99273af78b0bda65dfe9c4f7d23a5d319b596a8881d3bc2964749ae9ece3 -verification_key=c9a674888e905db1761ba3f10f3ad09586dddfe8581964b55787b44f318cbcdf -domain_hint=example.com" | curl --data-binary @- localhost/st/v0/add-leaf` - -### add-cosignature -``` -POST <base url>/st/v0/add-cosignature -``` - -Input: -- `signature`: Ed25519 signature over `tree_head`, hex-encoded. -- `key_hash`: hash of the witness' public verification key that can be - used to verify `signature`. The key is encoded as defined in - [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), - and then hashed using SHA256. The hash value is hex-encoded. - -Output on success: -- None - -`key_hash` can be used to identify which witness signed the tree -head. A key-hash, rather than the full verification key, is used to -motivate verifiers to locate the appropriate key and make an explicit -trust decision. - -Example: `echo "signature=d1b15061d0f287847d066630339beaa0915a6bbb77332c3e839a32f66f1831b69c678e8ca63afd24e436525554dbc6daa3b1201cc0c93721de24b778027d41af -key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | curl --data-binary @- localhost/st/v0/add-cosignature` - -## Summary of log parameters -- **Public key**: The Ed25519 verification key to be used for - verifying tree head signatures. -- **Log identifier**: The public verification key `Public key` hashed - using SHA256. -- **Shard interval start**: The earliest time at which logging - requests are accepted as the number of seconds since the UNIX epoch. -- **Shard interval end**: The latest time at which logging - requests are accepted as the number of seconds since the UNIX epoch. -- **Base URL**: Where the log can be reached over HTTP(S). 
It is the - prefix to be used to construct a version 0 specific endpoint. diff --git a/doc/claimant.md b/doc/claimant.md deleted file mode 100644 index 6728fef..0000000 --- a/doc/claimant.md +++ /dev/null @@ -1,71 +0,0 @@ -# Claimant model -## **System<sup>CHECKSUM</sup>** -System<sup>CHECKSUM</sup> is about the claims made by a data publisher. -* **Claim<sup>CHECKSUM</sup>**: - _I, data publisher, claim that the data_: - 1. has cryptographic hash X - 2. is produced by no-one but myself -* **Statement<sup>CHECKSUM</sup>**: signed checksum<br> -* **Claimant<sup>CHECKSUM</sup>**: data publisher<br> - The data publisher is a party that wants to publish some data. -* **Believer<sup>CHECKSUM</sup>**: end-user<br> - The end-user is a party that wants to use some published data. -* **Verifier<sup>CHECKSUM</sup>**: data publisher<br> - Only the data publisher can verify the above claims. -* **Arbiter<sup>CHECKSUM</sup>**:<br> - There's no official body. Invalidated claims would affect reputation. - -System<sup>CHECKSUM\*</sup> can be defined to make more specific claims. Below -is a reproducible builds example. - -### **System<sup>CHECKSUM-RB</sup>**: -System<sup>CHECKSUM-RB</sup> is about the claims made by a _software publisher_ -that makes reproducible builds available. -* **Claim<sup>CHECKSUM-RB</sup>**: - _I, software publisher, claim that the data_: - 1. has cryptographic hash X - 2. is the output of a reproducible build for which the source can be located - using X as an identifier -* **Statement<sup>CHECKSUM-RB</sup>**: Statement<sup>CHECKSUM</sup> -* **Claimant<sup>CHECKSUM-RB</sup>**: software publisher<br> - The software publisher is a party that wants to publish the output of a - reproducible build. -* **Believer<sup>CHECKSUM-RB</sup>**: end-user<br> - The end-user is a party that wants to run an executable binary that built - reproducibly. -* **Verifier<sup>CHECKSUM-RB</sup>**: any interested party<br> - These parties try to verify the above claims. 
For example: - * the software publisher itself (_"has my identity been compromised?"_) - * rebuilders that check for locatability and reproducibility -* **Arbiter<sup>CHECKSUM-RB</sup>**:<br> - There's no official body. Invalidated claims would affect reputation. - -## **System<sup>CHECKSUM-LOG</sup>**: -System<sup>CHECKSUM-LOG</sup> is about the claims made by a _log operator_. -It adds _discoverability_ into System<sup>CHECKSUM\*</sup>. Discoverability -means that Verifier<sup>CHECKSUM\*</sup> can see all -Statement<sup>CHECKSUM</sup> that Believer<sup>CHECKSUM\*</sup> accept. - -* **Claim<sup>CHECKSUM-LOG</sup>**: - _I, log operator, make available:_ - 1. a globally consistent append-only log of Statement<sup>CHECKSUM</sup> -* **Statement<sup>CHECKSUM-LOG</sup>**: signed tree head -* **Claimant<sup>CHECKSUM-LOG</sup>**: log operator<br> - Possible operators might be: - * a small subset of data publishers - * members of relevant consortia -* **Believer<sup>CHECKSUM-LOG</sup>**: - * Believer<sup>CHECKSUM\*</sup> - * Verifier<sup>CHECKSUM\*</sup><br> -* **Verifier<sup>CHECKSUM-LOG</sup>**: third parties<br> - These parties verify the above claims. Examples include: - * members of relevant consortia - * non-profits and other reputable organizations - * security enthusiasts and researchers - * log operators (cross-ecosystem) - * monitors (cross-ecosystem) - * a small subset of data publishers (cross-ecosystem) -* **Arbiter<sup>CHECKSUM-LOG</sup>**:<br> - There is no official body. The ecosystem at large should stop using an - instance of System<sup>CHECKSUM-LOG</sup> if cryptographic proofs of log - misbehavior are preseneted by some Verifier<sup>CHECKSUM-LOG</sup>. diff --git a/doc/design.md b/doc/design.md deleted file mode 100644 index 2e01a34..0000000 --- a/doc/design.md +++ /dev/null @@ -1,251 +0,0 @@ -# System Transparency Logging: Design v0 -We propose System Transparency logging. 
It is similar to Certificate -Transparency, except that cryptographically signed checksums are logged as -opposed to X.509 certificates. Publicly logging signed checksums allow anyone -to discover which keys produced what signatures. As such, malicious and -unintended key-usage can be _detected_. We present our design and conclude by -providing two use-cases: binary transparency and reproducible builds. - -**Target audience.** -You are most likely interested in transparency logs or supply-chain security. - -**Preliminaries.** -You have basic understanding of cryptographic primitives like digital -signatures, hash functions, and Merkle trees. You roughly know what problem -Certificate Transparency solves and how. - -**Warning.** -This is a work-in-progress document that may be moved or modified. A future -revision of this document will bump the version number to v1. Please let us -know if you have any feedback. - -## Introduction -Transparency logs make it possible to detect unwanted events. For example, - are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962), - did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md), - or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/). -A System Transparency log makes signed checksums transparent. The overall goal -is to facilitate detection of unwanted key-usage. - -## Threat model and (non-)goals -We consider a powerful attacker that gained control of a target's signing and -release infrastructure. This covers a weaker form of attacker that is able to -sign data and distribute it to a subset of isolated users. For example, this is -essentially what the FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order). 
-The fact that signing keys and related infrastructure components get -compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). - -The attacker can also gain control of the transparency log's signing key and -infrastructure. This covers a weaker form of attacker that is able to sign log -data and distribute it to a subset of isolated users. For example, this could -have been the case when a remote code execution was found for a Certificate -Transparency Log [\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM). - -Any attacker that is able to position itself to control these components will -likely be _risk-averse_. This is at minimum due to two factors. First, -detection would result in a significant loss of capability that is by no means -trivial to come by. Second, detection means that some part of the attacker's -malicious behavior will be disclosed publicly. - -Our goal is to facilitate _detection_ of compromised signing keys. We consider -a signing key compromised if an end-user accepts an unwanted signature as valid. -The solution that we propose is that signed checksums are transparency logged. -For security we need a collision resistant hash function and an unforgeable -signature scheme. We also assume that at most a threshold of seemingly -independent parties are adversarial. - -It is a non-goal to disclose the data that a checksum represents. For example, -the log cannot distinguish between a checksum that represents a tax declaration, -an ISO image, or a Debian package. This means that the type of detection we -support is more _coarse-grained_ when compared to Certificate Transparency. - -## Design -We consider a data publisher that wants to digitally sign their data. The data -is of opaque type. We assume that end-users have a mechanism to locate the -relevant public verification keys. 
Data and signatures can also be retrieved -(in)directly from the data publisher. We make little assumptions about the -signature tooling. The ecosystem at large can continue to use `gpg`, `openssl`, -`ssh-keygen -Y`, `signify`, or something else. - -We _have to assume_ that additional tooling can be installed by end-users that -wish to enforce transparency logging. For example, none of the existing -signature tooling supports verification of Merkle tree proofs. A side-effect of -our design is that this additional tooling makes no outbound connections. The -above data flows are thus preserved. - -### A bird's view -A central part of any transparency log is the data stored by the log. The data is stored by the -leaves of an append-only Merkle tree. Our leaf structure contains four fields: -- **shard_hint**: a number that binds the leaf to a particular _shard interval_. -Sharding means that the log has a predefined time during which logging requests -are accepted. Once elapsed, the log can be shut down. -- **checksum**: a cryptographic hash of some opaque data. The log never -sees the opaque data; just the hash made by the data publisher. -- **signature**: a digital signature that is computed by the data publisher over -the leaf's shard hint and checksum. -- **key_hash**: a cryptographic hash of the data publisher's public verification key that can be -used to verify the signature. - -#### Step 1 - preparing a logging request -The data publisher selects a shard hint and a checksum that should be logged. -For example, the shard hint could be "logs that are active during 2021". The -checksum might be the hash of a release file. - -The data publisher signs the selected shard hint and checksum using a secret -signing key. Both the signed message and the signature is stored -in the leaf for anyone to verify. Including a shard hint in the signed message -ensures that a good Samaritan cannot change it to log all leaves from an -earlier shard into a newer one. 
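As a rough illustration of Step 1, the sketch below builds the bytes that the data publisher signs. It follows the `struct message` layout from the removed API document: a 64-bit `shard_hint` in network byte order followed by a 32-byte `checksum`. The function names are hypothetical. The Ed25519 signing step is not shown because it needs a library outside the Python standard library.

```python
import hashlib
import struct


def checksum_of(data: bytes) -> bytes:
    """SHA256 checksum of the opaque data; the log only ever sees this hash."""
    return hashlib.sha256(data).digest()


def serialize_message(shard_hint: int, checksum: bytes) -> bytes:
    """Serialize `struct message`: u64 in network byte order, then 32 bytes."""
    if len(checksum) != 32:
        raise ValueError("checksum must be 32 bytes")
    # ">Q" packs an unsigned 64-bit integer in big-endian (network) order.
    return struct.pack(">Q", shard_hint) + checksum
```

Because the shard hint is part of the signed bytes, a leaf copied into a log with a different shard interval would no longer carry a valid signature for a hint inside that interval.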
- -A hash of the public verification key is also stored in the leaf. This makes it -possible to attribute the leaf to the data publisher. For example, a data publisher -that monitors the log can look for leaves that match their own key hash(es). - -A hash, rather than the full public verification key, is used to motivate the -verifier to locate the key and make an explicit trust decision. Not disclosing the public -verification key in the leaf makes it more unlikely that someone would use an untrusted key _by -mistake_. - -#### Step 2 - submitting a logging request -The log implements an HTTP(S) API. Input and output is human-readable and uses -a simple key-value format. A more complex parser like JSON is not needed -because the exchanged data structures are primitive enough. - -The data publisher submits their shard hint, checksum, signature, and public -verification key as key-value pairs. The log will use the public verification -key to check that the signature is valid, then hash it to construct the `key_hash` part of the leaf. - -The data publisher also submits a _domain hint_. The log will download a DNS -TXT resource record based on the provided domain name. The downloaded result -must match the public verification key hash. By verifying that the submitter -controls a domain that is aware of the public verification key, rate limits can -be applied per second-level domain. As a result, you would need a large number -of domain names to spam the log in any significant way. - -Using DNS to combat spam is convenient because many data publishers already have -a domain name. A single domain name is also relatively cheap. Another -benefit is that the same anti-spam mechanism can be used across several -independent logs without coordination. This is important because a healthy log -ecosystem needs more than one log in order to be reliable. DNS also has built-in -caching which data publishers can influence by setting TTLs accordingly. 
- -The submitter's domain hint is not part of the leaf because key management is -more complex than that. A separate project should focus on transparent key -management. The scope of our work is transparent _key-usage_. - -The log will _try_ to incorporate a leaf into the Merkle tree if a logging -request is accepted. There are no _promises of public logging_ as in -Certificate Transparency. Therefore, the submitter needs to wait for an -inclusion proof to appear before concluding that the logging request succeeded. Not having -inclusion promises makes the log less complex. - -#### Step 3 - distributing proofs of public logging -The data publisher is responsible for collecting all cryptographic proofs that -their end-users will need to enforce public logging. The collection below -should be downloadable from the same place that published data is normally hosted. -1. **Opaque data**: the data publisher's opaque data. -2. **Shard hint**: the data publisher's selected shard hint. -3. **Signature**: the data publisher's leaf signature. -4. **Cosigned tree head**: the log's tree head and a _list of signatures_ that -state it is consistent with prior history. -5. **Inclusion proof**: a proof of inclusion based on the logged leaf and tree -head in question. - -The data publisher's public verification key is known. Therefore, the first three fields are -sufficient to reconstruct the logged leaf. The leaf's signature can be -verified. The final two fields then prove that the leaf is in the log. If the -leaf is included in the log, any monitor can detect that there is a new -signature made by a given data publisher, 's public verification key. - -The catch is that the proof of logging is only as convincing as the tree head -that the inclusion proof leads up to. To bypass public logging, the attacker -needs to control a threshold of independent _witnesses_ that cosign the log. A -benign witness will only sign the log's tree head if it is consistent with prior -history. 
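What a witness signs is the serialized `tree_head` from the removed API document. The sketch below shows that serialization together with the 12-hour timestamp rule stated there. The consistency check against prior history is deliberately left out, and the names `serialize_tree_head`, `timestamp_is_acceptable`, and `MAX_SKEW_SECONDS` are illustrative assumptions.

```python
import struct
import time
from typing import Optional

# A witness must not cosign a tree head whose timestamp is backdated or
# future-dated by more than 12 hours.
MAX_SKEW_SECONDS = 12 * 60 * 60


def serialize_tree_head(timestamp: int, tree_size: int, root_hash: bytes) -> bytes:
    """Serialize `struct tree_head`: two u64 in network byte order, then 32 bytes."""
    if len(root_hash) != 32:
        raise ValueError("root_hash must be 32 bytes")
    return struct.pack(">QQ", timestamp, tree_size) + root_hash


def timestamp_is_acceptable(timestamp: int, now: Optional[int] = None) -> bool:
    """Check the tree head timestamp against the local clock."""
    if now is None:
        now = int(time.time())
    return abs(now - timestamp) <= MAX_SKEW_SECONDS
```

A verifier would pass the same 48 serialized bytes to its Ed25519 verification routine for each cosignature and count how many trusted witnesses produced a valid one.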
- -#### Summary -The log is sharded and will shut down at a predefined time. The log can shut -down _safely_ because end-user verification is not interactive. The difficulty -of bypassing public logging is based on the difficulty of controlling a -threshold of independent witnesses. Witnesses cosign tree heads to make them -trustworthy. - -Submitters, monitors, and witnesses interact with the log using an HTTP(S) API. -Submitters must prove that they own a domain name as an anti-spam mechanism. -End-users interact with the log _indirectly_ via a data publisher. It is the -data publisher's job to log signed checksums, distribute necessary proofs of -logging, and monitor the log. - -### A peek into the details -Our bird's view introduction skipped many details that matter in practise. Some -of these details are presented here using a question-answer format. A -question-answer format is helpful because it is easily modified and extended. - -#### What cryptographic primitives are supported? -The only supported hash algorithm is SHA256. The only supported signature -scheme is Ed25519. Not having any cryptographic agility makes the protocol less -complex and more secure. - -We can be cryptographically opinionated because of a key insight. Existing -signature tools like `gpg`, `ssh-keygen -Y`, and `signify` cannot verify proofs -of public logging. Therefore, _additional tooling must already be installed by -end-users_. That tooling should verify hashes using the log's hash function. -That tooling should also verify signatures using the log's signature scheme. -Both tree heads and tree leaves are being signed. - -#### Why not let the data publisher pick their own signature scheme and format? -Agility introduces complexity and difficult policy questions. For example, -which algorithms and formats should (not) be supported and why? Picking Ed25519 -is a current best practise that should be encouraged if possible. 
- -There is not much we can do if a data publisher _refuses_ to rely on the log's -hash function or signature scheme. - -#### What if the data publisher must use a specific signature scheme or format? -They may _cross-sign_ the data as follows. -1. Sign the data as they're used to. -2. Hash the data and use the result as the leaf's checksum to be logged. -3. Sign the leaf using the log's signature scheme. - -For verification, the end-user first verifies that the usual signature from step 1 is valid. Then the -end-user uses the additional tooling (which is already required) to verify the rest. -Cross-signing should be a relatively comfortable upgrade path that is backwards -compatible. The downside is that the data publisher may need to manage an -additional key-pair. - -#### What (de)serialization parsers are needed? -#### What policy should be used? -#### Why witness cosigning? -#### Why sharding? -Unlike X.509 certificates which already have validity ranges, a -checksum does not carry any such information. Therefore, we require -that the submitter selects a _shard hint_. The selected shard hint -must be in the log's _shard interval_. A shard interval is defined by -a start time and an end time. Both ends of the shard interval are -inclusive and expressed as the number of seconds since the UNIX epoch -(January 1, 1970 00:00 UTC). - -Sharding simplifies log operations because it becomes explicit when a -log can be shutdown. A log must only accept logging requests that -have valid shard hints. A log should only accept logging requests -during the predefined shard interval. Note that _the submitter's -shard hint is not a verified timestamp_. The submitter should set the -shard hint as large as possible. If a roughly verified timestamp is -needed, a cosigned tree head can be used. - -Without a shard hint, the good Samaritan could log all leaves from an -earlier shard into a newer one. 
Not only would that defeat the -purpose of sharding, but it would also become a potential -denial-of-service vector. - -#### TODO -Add more key questions and answers. -- Log spamming -- Log poisoning -- Why we removed identifier field from the leaf -- Explain `latest`, `stable` and `cosigned` tree head. -- Privacy aspects -- How does this whole thing work with more than one log? - -## Concluding remarks -Example of binary transparency and reproducible builds. |