From 4ef60084222c7d14bde9032d74ff5d02bcc3e32d Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Tue, 22 Jun 2021 23:35:42 +0200
Subject: moved documentation to sigsum/sigsum repository

---
 doc/api.md      | 398 --------------------------------------------------------
 doc/claimant.md |  71 ----------
 doc/design.md   | 251 -----------------------------------
 3 files changed, 720 deletions(-)
 delete mode 100644 doc/api.md
 delete mode 100644 doc/claimant.md
 delete mode 100644 doc/design.md

(limited to 'doc')

diff --git a/doc/api.md b/doc/api.md
deleted file mode 100644
index 57ad119..0000000
--- a/doc/api.md
+++ /dev/null
@@ -1,398 +0,0 @@
-# System Transparency Logging: API v0
-This document describes details of the System Transparency logging
-API, version 0.  The broader picture is not explained here.  We assume
-that you have read the System Transparency Logging design document.
-It can be found
-[here](https://github.com/system-transparency/stfe/blob/design/doc/design.md).
-
-**Warning.**
-This is a work-in-progress document that may be moved or modified.
-
-## Overview
-Logs implement an HTTP(S) API for accepting requests and sending
-responses.
-
-- Input data in requests and output data in responses are expressed as
-  ASCII-encoded key/value pairs.
-- Requests with input data use HTTP POST to send the data to a log.
-- Binary data is hex-encoded before being transmitted.
-
-The motivation for using a text based key/value format for request and
-response data is that it's simple to parse.  Note that this format is
-not being used for the serialization of signed or logged data, where a
-more well defined and storage efficient format is desirable.  A
-submitter may distribute log responses to their end-users in any
-format that suits them.  The (de)serialization required for
-_end-users_ is a small subset of Trunnel.  Trunnel is an "idiot-proof"
-wire-format in use by the Tor project.
-
-## Primitives
-### Cryptography
-Logs use the same Merkle tree hash strategy as
-[RFC 6962,§2](https://tools.ietf.org/html/rfc6962#section-2).
-The hash functions must be
-[SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf).
-Logs must sign tree heads using
-[Ed25519](https://tools.ietf.org/html/rfc8032).  Log witnesses
-must also sign tree heads using Ed25519.
-
-All other parts that are not Merkle tree related also use SHA256 as
-the hash function.  Using more than one hash function would increase
-the overall attack surface: two hash functions must be collision
-resistant instead of one.
-
-### Serialization
-Log requests and responses are transmitted as ASCII-encoded key/value
-pairs, for a smaller dependency than an alternative parser like JSON.
-Some input and output data is binary: cryptographic hashes and
-signatures.  Binary data must be Base16-encoded, also known as hex
-encoding.  Using hex as opposed to base64 is motivated by it being
-simpler, favoring ease of decoding and encoding over efficiency on the
-wire.
-
-We use the
-[Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html)
-to define (de)serialization of data structures that need to be signed or
-inserted into the Merkle tree.  Trunnel is more expressive than the
-[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5).
-It is about as expressive as the
-[TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3).
-A notable difference is that Trunnel supports integer constraints.
-The Trunnel language is also readable by humans _and_ machines.
-"Obviously correct code" can be generated in C and Go.
-
-A fair summary of our Trunnel usage is as follows.
-
-All integers are 64-bit, unsigned, and in network byte order.
-Fixed-size byte arrays are put into the serialization buffer in-order,
-starting from the first byte.  Variable length byte arrays first
-declare their length as an integer, which is then followed by that
-number of bytes.  These basic types are concatenated to form a
-collection.  You should not need a general-purpose Trunnel
-(de)serialization parser to work with this format.  If you have one,
-you may use it though.  The main point of using Trunnel is that it
-makes a simple format explicit and unambiguous.
-
-#### Merkle tree head
-A tree head is signed by both a log and its witnesses.  It contains a
-timestamp, a tree size, and a root hash.  The timestamp is included so
-that monitors can ensure _liveness_.  It is the time since the UNIX
-epoch (January 1, 1970 00:00 UTC) in seconds.  The tree size
-specifies the current number of leaves.  The root hash fixes the
-structure and content of the Merkle tree.
-
-```
-struct tree_head {
-	u64 timestamp;
-	u64 tree_size;
-	u8 root_hash[32];
-};
-```
-
-The serialized tree head must be signed using Ed25519.  A witness must
-not cosign a tree head if it is inconsistent with prior history or if
-the timestamp is backdated or future-dated more than 12 hours.
-
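As a minimal sketch of the tree head serialization described above, the Python below packs the two 64-bit integers in network byte order and appends the 32-byte root hash.  The function names and the `now` parameter are our own illustrative choices and not part of the API.  The 12-hour window is taken from the witness rule above.

```python
import struct
import time
from typing import Optional

# 12 hours, the cosigning window stated in the text above.
MAX_SKEW_SECONDS = 12 * 60 * 60


def serialize_tree_head(timestamp: int, tree_size: int, root_hash: bytes) -> bytes:
    """Serialize a tree_head: two big-endian u64 values, then the root hash."""
    if len(root_hash) != 32:
        raise ValueError("root_hash must be exactly 32 bytes")
    return struct.pack(">QQ", timestamp, tree_size) + root_hash


def timestamp_acceptable(timestamp: int, now: Optional[int] = None) -> bool:
    """Return True if the timestamp is within 12 hours of `now` (hypothetical helper)."""
    if now is None:
        now = int(time.time())
    return abs(now - timestamp) <= MAX_SKEW_SECONDS
```

The resulting 48 bytes are what a log or witness would sign with Ed25519.  Signing itself is left out because it needs a library outside the Python standard library.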
-#### Merkle tree leaf
-Logs support a single leaf type.  It contains a shard hint, a
-checksum over whatever the submitter wants to log a checksum for, a
-signature that the submitter computed over the shard hint and the
-checksum, and a hash of the submitter's public verification key, that
-can be used to verify the signature.
-
-```
-struct message {
-    u64 shard_hint;
-    u8 checksum[32];
-};
-
-struct tree_leaf {
-    struct message message;
-    u8 signature_over_message[64];
-    u8 key_hash[32];
-};
-```
-
-`message` is composed of the `shard_hint`, chosen by the submitter to
-match the shard interval for the log it's submitting to, and the
-submitter's `checksum` to be logged.
-
-`signature_over_message` is a signature over `message`, computed with
-the submitter's secret signing key.  It must be possible to verify the
-signature using the submitter's public verification key, as indicated
-by `key_hash`.
-
-`key_hash` is a hash of the submitter's public verification key, which
-corresponds to the key used for signing `message`.  It is included in
-`tree_leaf` so that the leaf can
-be attributed to the submitter.  A hash, rather than the full public
-key, is used to motivate verifiers to locate the appropriate key and
-make an explicit trust decision.
-
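The leaf layout above can be sketched in a few lines of Python.  This is an illustration under the stated serialization rules (fixed-size arrays in order, 64-bit integers in network byte order).  The function names are hypothetical.

```python
import struct


def serialize_message(shard_hint: int, checksum: bytes) -> bytes:
    """Serialize `message`: a big-endian u64 shard hint, then the 32-byte checksum."""
    if len(checksum) != 32:
        raise ValueError("checksum must be exactly 32 bytes")
    return struct.pack(">Q", shard_hint) + checksum


def serialize_tree_leaf(shard_hint: int, checksum: bytes,
                        signature: bytes, key_hash: bytes) -> bytes:
    """Serialize `tree_leaf`: message, 64-byte signature, 32-byte key hash."""
    if len(signature) != 64:
        raise ValueError("signature must be exactly 64 bytes")
    if len(key_hash) != 32:
        raise ValueError("key_hash must be exactly 32 bytes")
    return serialize_message(shard_hint, checksum) + signature + key_hash
```

The output of `serialize_message` is the byte string that the submitter signs.  A serialized leaf is always 136 bytes long.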
-## Public endpoints
-Every log has a base URL that identifies it uniquely.  The only
-constraint is that it must be a valid HTTP(S) URL that can have the
-`/st/v0/<endpoint>` suffix appended.  For example, a complete endpoint
-URL could be
-`https://log.example.com/2021/st/v0/get-tree-head-cosigned`.
-
-Input data (in requests) is POST:ed in the HTTP message body as ASCII
-key/value pairs.
-
-Output data (in replies) is sent in the HTTP message body in the same
-format as the input data, i.e., as ASCII key/value pairs in the format
-`Key=Value`.
-
-The HTTP status code is 200 OK to indicate success.  A different HTTP
-status code is used to indicate failure, in which case a log should
-respond with a human-readable string describing what went wrong using
-the key `error`. Example: `error=Invalid signature.`.
-
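A parser for this key/value format can be quite small.  The sketch below assumes one `Key=Value` pair per line, separated by newlines, as in the `curl` examples in this document.  Keys may repeat, so every key maps to a list of values.  The function name is our own.

```python
from typing import Dict, List


def parse_key_values(body: str) -> Dict[str, List[str]]:
    """Parse newline-separated Key=Value pairs; repeated keys keep their order."""
    result: Dict[str, List[str]] = {}
    for line in body.splitlines():
        if not line:
            continue  # tolerate a trailing empty line
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed line: {line!r}")
        result.setdefault(key, []).append(value)
    return result
```

Only the first `=` on a line separates key and value, so a value such as a human-readable `error` string may itself contain `=`.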
-### get-tree-head-cosigned
-Returns the latest cosigned tree head. Used together with
-`get-proof-by-hash` and `get-consistency-proof` for verifying the tree.
-
-```
-GET <base url>/st/v0/get-tree-head-cosigned
-```
-
-Input:
-- None
-
-Output on success:
-- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number,
-  seconds since the UNIX epoch.
-- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number.
-- `root_hash`: `tree_head.root_hash` hex-encoded.
-- `signature`: hex-encoded Ed25519 signature over `timestamp`,
-  `tree_size` and `root_hash` serialized into a `tree_head` as
-  described in section `Merkle tree head`.
-- `key_hash`: a hash of the public verification key (belonging to
-  either the log or to one of its witnesses), which can be used to
-  verify the most recent `signature`.  The key is encoded as defined
-  in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), 
-  and then hashed using SHA256.  The hash value is hex-encoded.
-
-The `signature` and `key_hash` fields may repeat. The first signature
-corresponds to the first key hash, the second signature corresponds to
-the second key hash, etc.  The number of signatures and key hashes
-must match.
-
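The pairing rule above can be sketched as follows.  The helper takes the repeated `signature` and `key_hash` values in the order they were received and keeps only the pairs whose key hash belongs to a witness (or log) that the caller already trusts.  The function name and the `trusted_key_hashes` parameter are hypothetical.  The sketch does not verify any signature; a real verifier must still check each Ed25519 signature against the serialized tree head.

```python
from typing import Iterable, List, Set, Tuple


def pair_cosignatures(signatures: List[str], key_hashes: List[str],
                      trusted_key_hashes: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair the i-th signature with the i-th key hash and keep trusted signers only."""
    if len(signatures) != len(key_hashes):
        raise ValueError("number of signatures and key hashes must match")
    trusted: Set[str] = {kh.lower() for kh in trusted_key_hashes}
    return [(kh, sig) for sig, kh in zip(signatures, key_hashes)
            if kh.lower() in trusted]
```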
-### get-tree-head-to-sign
-Returns the latest tree head to be signed by log witnesses. Used by
-witnesses.
-
-```
-GET <base url>/st/v0/get-tree-head-to-sign
-```
-
-Input:
-- None
-
-Output on success:
-- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number,
-  seconds since the UNIX epoch.
-- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number.
-- `root_hash`: `tree_head.root_hash` hex-encoded.
-- `signature`: hex-encoded Ed25519 signature over `timestamp`,
-  `tree_size` and `root_hash` serialized into a `tree_head` as
-  described in section `Merkle tree head`.
-- `key_hash`: a hash of the log's public verification key, which can
-  be used to verify `signature`.  The key is encoded as defined in
-  [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
-  and then hashed using SHA256.  The hash value is hex-encoded.
-
-There is exactly one `signature` and one `key_hash` field. The
-`key_hash` refers to the log's public verification key.
-
-
-### get-tree-head-latest
-Returns the latest tree head, signed only by the log. Used for
-debugging purposes.
-
-```
-GET <base url>/st/v0/get-tree-head-latest
-```
-
-Input:
-- None
-
-Output on success:
-- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number,
-  seconds since the UNIX epoch.
-- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number.
-- `root_hash`: `tree_head.root_hash` hex-encoded.
-- `signature`: hex-encoded Ed25519 signature over `timestamp`,
-  `tree_size` and `root_hash` serialized into a `tree_head` as
-  described in section `Merkle tree head`.
-- `key_hash`: a hash of the log's public verification key that can be
-  used to verify `signature`.  The key is encoded as defined in
-  [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
-  and then hashed using SHA256.  The hash value is hex-encoded.
-
-There is exactly one `signature` and one `key_hash` field. The
-`key_hash` refers to the log's public verification key.
-
-
-### get-proof-by-hash
-```
-POST <base url>/st/v0/get-proof-by-hash
-```
-
-Input:
-- `leaf_hash`: leaf hash identifying which `tree_leaf` the log should prove
-  inclusion of, hex-encoded.
-- `tree_size`: tree size of the tree head that the proof should be
-  based on, as an ASCII-encoded decimal number.
-
-Output on success:
-- `tree_size`: tree size that the proof is based on, as an
-  ASCII-encoded decimal number.
-- `leaf_index`: zero-based index of the leaf that the proof is based
-  on, as an ASCII-encoded decimal number.
-- `inclusion_path`: node hash, hex-encoded.
-
-The leaf hash is computed using the RFC 6962 hashing strategy.  In
-other words, `SHA256(0x00 | tree_leaf)`.
-
-`inclusion_path` may be omitted or repeated to represent an inclusion
-proof of zero or more node hashes.  The order of node hashes follows
-from the hash strategy, see RFC 6962.
-
-Example: `echo "leaf_hash=241fd4538d0a35c2d0394e4710ea9e6916854d08f62602fb03b55221dcdac90f
-tree_size=4711" | curl --data-binary @- localhost/st/v0/get-proof-by-hash`
-
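To illustrate how a returned inclusion proof might be checked, the sketch below computes the leaf hash as `SHA256(0x00 | tree_leaf)` and walks the `inclusion_path` up to the root.  The verification loop follows the inclusion proof verification algorithm from RFC 9162, which uses the same Merkle tree as RFC 6962.  The function names are our own, and all hashes are assumed to be hex-decoded into bytes already.

```python
import hashlib
from typing import List


def leaf_hash(serialized_leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA256(0x00 | tree_leaf)."""
    return hashlib.sha256(b"\x00" + serialized_leaf).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA256(0x01 | left | right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, leaf_index: int, tree_size: int,
                     inclusion_path: List[bytes], root_hash: bytes) -> bool:
    """Check that the leaf hash `leaf` at `leaf_index` leads up to `root_hash`."""
    if leaf_index < 0 or leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf
    for p in inclusion_path:
        if sn == 0:
            return False  # the path is longer than the tree is high
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```

A successful check only shows that the leaf is in the tree with the given root hash.  The root hash must still come from a tree head with valid signatures.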
-### get-consistency-proof
-```
-POST <base url>/st/v0/get-consistency-proof
-```
-
-Input:
-- `new_size`: tree size of a newer tree head, as an ASCII-encoded
-  decimal number.
-- `old_size`: tree size of an older tree head that the log should
-  prove is consistent with the newer tree head, as an ASCII-encoded
-  decimal number.
-
-Output on success:
-- `new_size`: tree size of the newer tree head that the proof is based
-  on, as an ASCII-encoded decimal number.
-- `old_size`: tree size of the older tree head that the proof is based
-  on, as an ASCII-encoded decimal number.
-- `consistency_path`: node hash, hex-encoded.
-
-`consistency_path` may be omitted or repeated to represent a
-consistency proof of zero or more node hashes.  The order of node
-hashes follows from the hash strategy, see RFC 6962.
-
-Example: `echo "new_size=4711
-old_size=42" | curl --data-binary @- localhost/st/v0/get-consistency-proof`
-
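A consistency proof can be checked in a similar way.  The sketch below follows the consistency proof verification algorithm from RFC 9162 and is limited to `0 < old_size <= new_size`.  The function names are hypothetical, and all hashes are assumed to be hex-decoded into bytes already.

```python
import hashlib
from typing import List


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA256(0x01 | left | right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_consistency(old_size: int, new_size: int, old_root: bytes,
                       new_root: bytes, consistency_path: List[bytes]) -> bool:
    """Check that the tree with `new_root` is an append-only extension of `old_root`."""
    if old_size == new_size:
        return old_root == new_root and not consistency_path
    # This sketch treats an empty old tree or an empty proof as a failure.
    if old_size <= 0 or old_size > new_size or not consistency_path:
        return False
    path = list(consistency_path)
    if (old_size & (old_size - 1)) == 0:
        path.insert(0, old_root)  # old tree is a complete subtree
    fn, sn = old_size - 1, new_size - 1
    while fn & 1:
        fn >>= 1
        sn >>= 1
    fr = sr = path[0]
    for c in path[1:]:
        if sn == 0:
            return False
        if (fn & 1) or fn == sn:
            fr = node_hash(c, fr)
            sr = node_hash(c, sr)
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            sr = node_hash(sr, c)
        fn >>= 1
        sn >>= 1
    return sn == 0 and fr == old_root and sr == new_root
```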
-### get-leaves
-```
-POST <base url>/st/v0/get-leaves
-```
-
-Input:
-- `start_size`: index of the first leaf to retrieve, as an
-  ASCII-encoded decimal number.
-- `end_size`: index of the last leaf to retrieve, as an ASCII-encoded
-  decimal number.
-
-Output on success:
-- `shard_hint`: `tree_leaf.message.shard_hint` as an ASCII-encoded
-  decimal number.
-- `checksum`: `tree_leaf.message.checksum`, hex-encoded.
-- `signature`: `tree_leaf.signature_over_message`, hex-encoded.
-- `key_hash`: `tree_leaf.key_hash`, hex-encoded.
-
-All fields may be repeated to return more than one leaf.  The first
-value in each list refers to the first leaf, the second value in each
-list refers to the second leaf, etc.  The size of each list must
-match.
-
-A log may return fewer leaves than requested.  At least one leaf
-must be returned on HTTP status code 200 OK.
-
-Example: `echo "start_size=42
-end_size=4711" | curl --data-binary @- localhost/st/v0/get-leaves`
-
-### add-leaf
-```
-POST <base url>/st/v0/add-leaf
-```
-
-Input:
-- `shard_hint`: number within the log's shard interval as an
-  ASCII-encoded decimal number.
-- `checksum`: the cryptographic checksum that the submitter wants to
-  log, hex-encoded.
-- `signature_over_message`: the submitter's signature over
-  `tree_leaf.message`, hex-encoded.
-- `verification_key`: the submitter's public verification key.  The
-  key is encoded as defined in
-  [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2)
-  and then hex-encoded.
-- `domain_hint`: domain name indicating where `tree_leaf.key_hash`
-  can be found as a DNS TXT resource record.
-
-Output on success:
-- None
-
-The submission will not be accepted if `signature_over_message` is
-invalid or if the key hash retrieved using `domain_hint` does not
-match a hash over `verification_key`.
-
-The submission may also be rejected if the second-level domain
-name has exceeded its rate limit.  By coupling every add-leaf request to
-a second-level domain, it becomes more difficult to spam logs.  You
-would need an excessive number of domain names.  This becomes costly
-if free domain names are rejected.
-
-Logs don't publish domain-name to key bindings because key
-management is more complex than that.
-
-Public logging should not be assumed to have happened until an
-inclusion proof is available.  An inclusion proof should not be relied
-upon unless it leads up to a trustworthy signed tree head.  Witness
-cosigning can make a tree head trustworthy.
-
-Example: `echo "shard_hint=1640995200
-checksum=cfa2d8e78bf273ab85d3cef7bde62716261d1e42626d776f9b4e6aae7b6ff953
-signature_over_message=c026687411dea494539516ee0c4e790c24450f1a4440c2eb74df311ca9a7adf2847b99273af78b0bda65dfe9c4f7d23a5d319b596a8881d3bc2964749ae9ece3
-verification_key=c9a674888e905db1761ba3f10f3ad09586dddfe8581964b55787b44f318cbcdf
-domain_hint=example.com" | curl --data-binary @- localhost/st/v0/add-leaf`
-
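The request body in the example above could be assembled along the following lines.  This is a sketch with hypothetical function names.  It assumes that the signature was produced elsewhere with an Ed25519 library, and that the DNS TXT record carries the key hash in hex, which this document does not specify.

```python
import hashlib


def build_add_leaf_body(shard_hint: int, checksum: bytes, signature: bytes,
                        verification_key: bytes, domain_hint: str) -> str:
    """Format the add-leaf input as newline-separated ASCII Key=Value pairs."""
    fields = [
        ("shard_hint", str(shard_hint)),
        ("checksum", checksum.hex()),
        ("signature_over_message", signature.hex()),
        ("verification_key", verification_key.hex()),
        ("domain_hint", domain_hint),
    ]
    # A trailing newline mirrors what `echo` emits in the curl example.
    return "\n".join(f"{key}={value}" for key, value in fields) + "\n"


def key_hash_matches(verification_key: bytes, published_key_hash_hex: str) -> bool:
    """Compare SHA256(verification_key) with a published hex value (assumed format)."""
    expected = hashlib.sha256(verification_key).hexdigest()
    return expected == published_key_hash_hex.strip().lower()
```

A submitter might run `key_hash_matches` against their own TXT record before submitting, since a mismatch means that the log will not accept the request.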
-### add-cosignature
-```
-POST <base url>/st/v0/add-cosignature
-```
-
-Input:
-- `signature`: Ed25519 signature over `tree_head`, hex-encoded.
-- `key_hash`: hash of the witness' public verification key that can be
-  used to verify `signature`.  The key is encoded as defined in
-  [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
-  and then hashed using SHA256. The hash value is hex-encoded.
-
-Output on success:
-- None
-
-`key_hash` can be used to identify which witness signed the tree
-head.  A key-hash, rather than the full verification key, is used to
-motivate verifiers to locate the appropriate key and make an explicit
-trust decision.
-
-Example: `echo "signature=d1b15061d0f287847d066630339beaa0915a6bbb77332c3e839a32f66f1831b69c678e8ca63afd24e436525554dbc6daa3b1201cc0c93721de24b778027d41af
-key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | curl --data-binary @- localhost/st/v0/add-cosignature`
-
-## Summary of log parameters
-- **Public key**: The Ed25519 verification key to be used for
-  verifying tree head signatures.
-- **Log identifier**: The public verification key `Public key` hashed
-  using SHA256.
-- **Shard interval start**: The earliest time at which logging
-  requests are accepted as the number of seconds since the UNIX epoch.
-- **Shard interval end**: The latest time at which logging
-  requests are accepted as the number of seconds since the UNIX epoch.
-- **Base URL**: Where the log can be reached over HTTP(S).  It is the
-  prefix to be used to construct a version 0 specific endpoint.
diff --git a/doc/claimant.md b/doc/claimant.md
deleted file mode 100644
index 6728fef..0000000
--- a/doc/claimant.md
+++ /dev/null
@@ -1,71 +0,0 @@
-# Claimant model
-## **System<sup>CHECKSUM</sup>**
-System<sup>CHECKSUM</sup> is about the claims made by a data publisher.
-* **Claim<sup>CHECKSUM</sup>**:
-	_I, data publisher, claim that the data_:
-	1. has cryptographic hash X
-	2. is produced by no-one but myself
-* **Statement<sup>CHECKSUM</sup>**: signed checksum<br>
-* **Claimant<sup>CHECKSUM</sup>**: data publisher<br>
-	The data publisher is a party that wants to publish some data.
-* **Believer<sup>CHECKSUM</sup>**: end-user<br>
-	The end-user is a party that wants to use some published data.
-* **Verifier<sup>CHECKSUM</sup>**: data publisher<br>
-	Only the data publisher can verify the above claims.
-* **Arbiter<sup>CHECKSUM</sup>**:<br>
-    There's no official body.  Invalidated claims would affect reputation.
-
-System<sup>CHECKSUM\*</sup> can be defined to make more specific claims.  Below
-is a reproducible builds example.
-
-### **System<sup>CHECKSUM-RB</sup>**:
-System<sup>CHECKSUM-RB</sup> is about the claims made by a _software publisher_
-that makes reproducible builds available.
-* **Claim<sup>CHECKSUM-RB</sup>**:
-	_I, software publisher, claim that the data_:
-	1. has cryptographic hash X
-	2. is the output of a reproducible build for which the source can be located
-	using X as an identifier
-* **Statement<sup>CHECKSUM-RB</sup>**: Statement<sup>CHECKSUM</sup>
-* **Claimant<sup>CHECKSUM-RB</sup>**: software publisher<br>
-	The software publisher is a party that wants to publish the output of a
-	reproducible build.
-* **Believer<sup>CHECKSUM-RB</sup>**: end-user<br>
-	The end-user is a party that wants to run an executable binary that built
-	reproducibly.
-* **Verifier<sup>CHECKSUM-RB</sup>**: any interested party<br>
-	These parties try to verify the above claims.  For example:
-	* the software publisher itself (_"has my identity been compromised?"_)
-	* rebuilders that check for locatability and reproducibility
-* **Arbiter<sup>CHECKSUM-RB</sup>**:<br>
-    There's no official body.  Invalidated claims would affect reputation.
-
-## **System<sup>CHECKSUM-LOG</sup>**:
-System<sup>CHECKSUM-LOG</sup> is about the claims made by a _log operator_.
-It adds _discoverability_ into System<sup>CHECKSUM\*</sup>.  Discoverability
-means that Verifier<sup>CHECKSUM\*</sup> can see all
-Statement<sup>CHECKSUM</sup> that Believer<sup>CHECKSUM\*</sup> accept.
-
-* **Claim<sup>CHECKSUM-LOG</sup>**:
-	_I, log operator, make available:_
-	1. a globally consistent append-only log of Statement<sup>CHECKSUM</sup>
-* **Statement<sup>CHECKSUM-LOG</sup>**: signed tree head
-* **Claimant<sup>CHECKSUM-LOG</sup>**: log operator<br>
-   Possible operators might be:
-	* a small subset of data publishers
-	* members of relevant consortia
-* **Believer<sup>CHECKSUM-LOG</sup>**:
-	* Believer<sup>CHECKSUM\*</sup>
-	* Verifier<sup>CHECKSUM\*</sup><br>
-* **Verifier<sup>CHECKSUM-LOG</sup>**: third parties<br>
-	These parties verify the above claims.  Examples include:
-	* members of relevant consortia
-	* non-profits and other reputable organizations
-	* security enthusiasts and researchers
-	* log operators (cross-ecosystem)
-	* monitors (cross-ecosystem)
-	* a small subset of data publishers (cross-ecosystem)
-* **Arbiter<sup>CHECKSUM-LOG</sup>**:<br>
-	There is no official body.  The ecosystem at large should stop using an
-	instance of System<sup>CHECKSUM-LOG</sup> if cryptographic proofs of log
-	misbehavior are presented by some Verifier<sup>CHECKSUM-LOG</sup>.
diff --git a/doc/design.md b/doc/design.md
deleted file mode 100644
index 2e01a34..0000000
--- a/doc/design.md
+++ /dev/null
@@ -1,251 +0,0 @@
-# System Transparency Logging: Design v0
-We propose System Transparency logging.  It is similar to Certificate
-Transparency, except that cryptographically signed checksums are logged as
-opposed to X.509 certificates.  Publicly logging signed checksums allows anyone
-to discover which keys produced what signatures.  As such, malicious and
-unintended key-usage can be _detected_.  We present our design and conclude by
-providing two use-cases: binary transparency and reproducible builds.
-
-**Target audience.**
-You are most likely interested in transparency logs or supply-chain security.
-
-**Preliminaries.**
-You have basic understanding of cryptographic primitives like digital
-signatures, hash functions, and Merkle trees.  You roughly know what problem
-Certificate Transparency solves and how.
-
-**Warning.**
-This is a work-in-progress document that may be moved or modified.  A future
-revision of this document will bump the version number to v1.  Please let us
-know if you have any feedback.
-
-## Introduction
-Transparency logs make it possible to detect unwanted events.  For example,
-	are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962),
-	did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md),
-	or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/).
-A System Transparency log makes signed checksums transparent.  The overall goal
-is to facilitate detection of unwanted key-usage.
-
-## Threat model and (non-)goals
-We consider a powerful attacker that gained control of a target's signing and
-release infrastructure.  This covers a weaker form of attacker that is able to
-sign data and distribute it to a subset of isolated users.  For example, this is
-essentially what the FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order).
-The fact that signing keys and related infrastructure components get
-compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
-
-The attacker can also gain control of the transparency log's signing key and
-infrastructure.  This covers a weaker form of attacker that is able to sign log
-data and distribute it to a subset of isolated users.  For example, this could
-have been the case when a remote code execution was found for a Certificate
-Transparency Log [\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM).
-
-Any attacker that is able to position itself to control these components will
-likely be _risk-averse_.  This is at minimum due to two factors.  First,
-detection would result in a significant loss of capability that is by no means
-trivial to come by.  Second, detection means that some part of the attacker's
-malicious behavior will be disclosed publicly.
-
-Our goal is to facilitate _detection_ of compromised signing keys.  We consider
-a signing key compromised if an end-user accepts an unwanted signature as valid.
-The solution that we propose is that signed checksums are transparency logged.
-For security we need a collision resistant hash function and an unforgeable
-signature scheme.  We also assume that at most a threshold of seemingly
-independent parties are adversarial.
-
-It is a non-goal to disclose the data that a checksum represents.  For example,
-the log cannot distinguish between a checksum that represents a tax declaration,
-an ISO image, or a Debian package.  This means that the type of detection we
-support is more _coarse-grained_ when compared to Certificate Transparency.
-
-## Design
-We consider a data publisher that wants to digitally sign their data.  The data
-is of opaque type.  We assume that end-users have a mechanism to locate the
-relevant public verification keys.  Data and signatures can also be retrieved
-(in)directly from the data publisher.  We make few assumptions about the
-signature tooling.  The ecosystem at large can continue to use `gpg`, `openssl`,
-`ssh-keygen -Y`, `signify`, or something else.
-
-We _have to assume_ that additional tooling can be installed by end-users that
-wish to enforce transparency logging.  For example, none of the existing
-signature tooling supports verification of Merkle tree proofs.  A side-effect of
-our design is that this additional tooling makes no outbound connections.  The
-above data flows are thus preserved.
-
-### A bird's view
-A central part of any transparency log is the data stored by the log.  The data is stored by the
-leaves of an append-only Merkle tree.  Our leaf structure contains four fields:
-- **shard_hint**: a number that binds the leaf to a particular _shard interval_.
-Sharding means that the log has a predefined time during which logging requests
-are accepted.  Once elapsed, the log can be shut down.
-- **checksum**: a cryptographic hash of some opaque data.  The log never
-sees the opaque data; just the hash made by the data publisher.
-- **signature**: a digital signature that is computed by the data publisher over
-the leaf's shard hint and checksum.
-- **key_hash**: a cryptographic hash of the data publisher's public verification key that can be
-used to verify the signature.
-
-#### Step 1 - preparing a logging request
-The data publisher selects a shard hint and a checksum that should be logged.
-For example, the shard hint could be "logs that are active during 2021".  The
-checksum might be the hash of a release file.
-
-The data publisher signs the selected shard hint and checksum using a secret
-signing key.  Both the signed message and the signature are stored
-in the leaf for anyone to verify.  Including a shard hint in the signed message
-ensures that a good Samaritan cannot change it to log all leaves from an
-earlier shard into a newer one.
-
-A hash of the public verification key is also stored in the leaf.  This makes it
-possible to attribute the leaf to the data publisher.  For example, a data publisher
-that monitors the log can look for leaves that match their own key hash(es).
-
-A hash, rather than the full public verification key, is used to motivate the
-verifier to locate the key and make an explicit trust decision.  Not disclosing the public
-verification key in the leaf makes it less likely that someone would use an untrusted key _by
-mistake_.
-
-#### Step 2 - submitting a logging request
-The log implements an HTTP(S) API.  Input and output are human-readable and use
-a simple key-value format.  A more complex parser like JSON is not needed
-because the exchanged data structures are primitive enough.
-
-The data publisher submits their shard hint, checksum, signature, and public
-verification key as key-value pairs.  The log will use the public verification
-key to check that the signature is valid, then hash it to construct the `key_hash` part of the leaf.
-
-The data publisher also submits a _domain hint_.  The log will download a DNS
-TXT resource record based on the provided domain name.  The downloaded result
-must match the public verification key hash.  By verifying that the submitter
-controls a domain that is aware of the public verification key, rate limits can
-be applied per second-level domain.  As a result, you would need a large number
-of domain names to spam the log in any significant way.
-
-Using DNS to combat spam is convenient because many data publishers already have
-a domain name.  A single domain name is also relatively cheap.  Another
-benefit is that the same anti-spam mechanism can be used across several
-independent logs without coordination.  This is important because a healthy log
-ecosystem needs more than one log in order to be reliable.  DNS also has built-in
-caching which data publishers can influence by setting TTLs accordingly.
-
-The submitter's domain hint is not part of the leaf because key management is
-more complex than that.  A separate project should focus on transparent key
-management.  The scope of our work is transparent _key-usage_.
-
-The log will _try_ to incorporate a leaf into the Merkle tree if a logging
-request is accepted.  There are no _promises of public logging_ as in
-Certificate Transparency.  Therefore, the submitter needs to wait for an
-inclusion proof to appear before concluding that the logging request succeeded.  Not having
-inclusion promises makes the log less complex.
-
-#### Step 3 - distributing proofs of public logging
-The data publisher is responsible for collecting all cryptographic proofs that
-their end-users will need to enforce public logging.  The collection below
-should be downloadable from the same place that published data is normally hosted.
-1. **Opaque data**: the data publisher's opaque data.
-2. **Shard hint**: the data publisher's selected shard hint.
-3. **Signature**: the data publisher's leaf signature.
-4. **Cosigned tree head**: the log's tree head and a _list of signatures_ that
-state it is consistent with prior history.
-5. **Inclusion proof**: a proof of inclusion based on the logged leaf and tree
-head in question.
-
-The data publisher's public verification key is known.  Therefore, the first three fields are
-sufficient to reconstruct the logged leaf.  The leaf's signature can be
-verified.  The final two fields then prove that the leaf is in the log.  If the
-leaf is included in the log, any monitor can detect that there is a new
-signature associated with a given data publisher's public verification key.
-
-The catch is that the proof of logging is only as convincing as the tree head
-that the inclusion proof leads up to.  To bypass public logging, the attacker
-needs to control a threshold of independent _witnesses_ that cosign the log.  A
-benign witness will only sign the log's tree head if it is consistent with prior
-history.
-
-#### Summary
-The log is sharded and will shut down at a predefined time.  The log can shut
-down _safely_ because end-user verification is not interactive.  The difficulty
-of bypassing public logging is based on the difficulty of controlling a
-threshold of independent witnesses.  Witnesses cosign tree heads to make them
-trustworthy.
-
-Submitters, monitors, and witnesses interact with the log using an HTTP(S) API.
-Submitters must prove that they own a domain name as an anti-spam mechanism.
-End-users interact with the log _indirectly_ via a data publisher.  It is the
-data publisher's job to log signed checksums, distribute necessary proofs of
-logging, and monitor the log.
-
-### A peek into the details
-Our bird's view introduction skipped many details that matter in practice.  Some
-of these details are presented here using a question-answer format.  A
-question-answer format is helpful because it is easily modified and extended.
-
-#### What cryptographic primitives are supported?
-The only supported hash algorithm is SHA256.  The only supported signature
-scheme is Ed25519.  Not having any cryptographic agility makes the protocol less
-complex and more secure.
-
-We can be cryptographically opinionated because of a key insight.  Existing
-signature tools like `gpg`, `ssh-keygen -Y`, and `signify` cannot verify proofs
-of public logging.  Therefore, _additional tooling must already be installed by
-end-users_.  That tooling should verify hashes using the log's hash function.
-That tooling should also verify signatures using the log's signature scheme.
-Both tree heads and tree leaves are being signed.
-
-#### Why not let the data publisher pick their own signature scheme and format?
-Agility introduces complexity and difficult policy questions.  For example,
-which algorithms and formats should (not) be supported and why?  Picking Ed25519
-is a current best practice that should be encouraged if possible.
-
-There is not much we can do if a data publisher _refuses_ to rely on the log's
-hash function or signature scheme.
-
-#### What if the data publisher must use a specific signature scheme or format?
-They may _cross-sign_ the data as follows.
-1. Sign the data as they're used to.
-2. Hash the data and use the result as the leaf's checksum to be logged.
-3. Sign the leaf using the log's signature scheme.
-
-For verification, the end-user first verifies that the usual signature from step 1 is valid.  Then the
-end-user uses the additional tooling (which is already required) to verify the rest.
-Cross-signing should be a relatively comfortable upgrade path that is backwards
-compatible.  The downside is that the data publisher may need to manage an
-additional key-pair.
-
-#### What (de)serialization parsers are needed?
-#### What policy should be used?
-#### Why witness cosigning?
-#### Why sharding?
-Unlike X.509 certificates which already have validity ranges, a
-checksum does not carry any such information.  Therefore, we require
-that the submitter selects a _shard hint_.  The selected shard hint
-must be in the log's _shard interval_.  A shard interval is defined by
-a start time and an end time.  Both ends of the shard interval are
-inclusive and expressed as the number of seconds since the UNIX epoch
-(January 1, 1970 00:00 UTC).
-
-Sharding simplifies log operations because it becomes explicit when a
-log can be shut down.  A log must only accept logging requests that
-have valid shard hints.  A log should only accept logging requests
-during the predefined shard interval.  Note that _the submitter's
-shard hint is not a verified timestamp_.  The submitter should set the
-shard hint as large as possible.  If a roughly verified timestamp is
-needed, a cosigned tree head can be used.
-
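The two acceptance rules above can be sketched as a pair of small checks.  Both ends of the shard interval are inclusive, as stated earlier.  The function and parameter names are our own, and `now` stands for the log's current time in seconds since the UNIX epoch.

```python
def shard_hint_valid(shard_hint: int, shard_start: int, shard_end: int) -> bool:
    """A shard hint is valid if it lies within the inclusive shard interval."""
    return shard_start <= shard_hint <= shard_end


def log_accepts_request(shard_hint: int, shard_start: int,
                        shard_end: int, now: int) -> bool:
    """Accept only valid shard hints, and only while the shard interval is open."""
    if not shard_hint_valid(shard_hint, shard_start, shard_end):
        return False
    return shard_start <= now <= shard_end
```

Note that `shard_hint` is compared with the interval only.  It is never compared with `now`, because the shard hint is not a verified timestamp.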
-Without a shard hint, the good Samaritan could log all leaves from an
-earlier shard into a newer one.  Not only would that defeat the
-purpose of sharding, but it would also become a potential
-denial-of-service vector.
-
-#### TODO
-Add more key questions and answers.
-- Log spamming
-- Log poisoning
-- Why we removed identifier field from the leaf
-- Explain `latest`, `stable` and `cosigned` tree head.
-- Privacy aspects
-- How does this whole thing work with more than one log?
-
-## Concluding remarks
-Example of binary transparency and reproducible builds.
-- 
cgit v1.2.3