4 files changed, 720 insertions, 372 deletions
diff --git a/doc/api.md b/doc/api.md
new file mode 100644
index 0000000..57ad119
--- /dev/null
+++ b/doc/api.md
@@ -0,0 +1,398 @@
+# System Transparency Logging: API v0
+This document describes details of the System Transparency logging
+API, version 0.  The broader picture is not explained here.  We assume
+that you have read the System Transparency Logging design document.
+It can be found
+[here](https://github.com/system-transparency/stfe/blob/design/doc/design.md).
+
+**Warning.**
+This is a work-in-progress document that may be moved or modified.
+
+## Overview
+Logs implement an HTTP(S) API for accepting requests and sending
+responses.
+
+- Input data in requests and output data in responses are expressed as
+  ASCII-encoded key/value pairs.
+- Requests with input data use HTTP POST to send the data to a log.
+- Binary data is hex-encoded before being transmitted.
+
+The motivation for using a text-based key/value format for request and
+response data is that it's simple to parse.  Note that this format is
+not being used for the serialization of signed or logged data, where a
+more well defined and storage efficient format is desirable.  A
+submitter may distribute log responses to their end-users in any
+format that suits them.  The (de)serialization required for
+_end-users_ is a small subset of Trunnel.  Trunnel is an "idiot-proof"
+wire-format in use by the Tor project.
+
+## Primitives
+### Cryptography
+Logs use the same Merkle tree hash strategy as
+[RFC 6962, §2](https://tools.ietf.org/html/rfc6962#section-2).
+The hash functions must be
+[SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf).
+Logs must sign tree heads using
+[Ed25519](https://tools.ietf.org/html/rfc8032).  Log witnesses
+must also sign tree heads using Ed25519.
+
+All other parts that are not Merkle tree related also use SHA256 as
+the hash function.  Using more than one hash function would increase
+the overall attack surface: two hash functions must be collision
+resistant instead of one.
+
+### Serialization
+Log requests and responses are transmitted as ASCII-encoded key/value
+pairs, for a smaller dependency than an alternative parser like JSON.
+Some input and output data is binary: cryptographic hashes and
+signatures.  Binary data must be Base16-encoded, also known as hex
+encoding.  Using hex as opposed to base64 is motivated by it being
+simpler, favoring ease of decoding and encoding over efficiency on the
+wire.
+
+We use the
+[Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html)
+to define (de)serialization of data structures that need to be signed or
+inserted into the Merkle tree.  Trunnel is more expressive than the
+[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5).
+It is about as expressive as the
+[TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3).
+A notable difference is that Trunnel supports integer constraints.
+The Trunnel language is also readable by humans _and_ machines.
+"Obviously correct code" can be generated in C and Go.
+
+A fair summary of our Trunnel usage is as follows.
+
+All integers are 64-bit, unsigned, and in network byte order.
+Fixed-size byte arrays are put into the serialization buffer in-order,
+starting from the first byte.  Variable length byte arrays first
+declare their length as an integer, which is then followed by that
+number of bytes.  These basic types are concatenated to form a
+collection.  You should not need a general-purpose Trunnel
+(de)serialization parser to work with this format.  If you have one,
+you may use it though.  The main point of using Trunnel is that it
+makes a simple format explicit and unambiguous.
+
+#### Merkle tree head
+Tree heads are signed both by a log and its witnesses.  A tree head contains a
+timestamp, a tree size, and a root hash.  The timestamp is included so
+that monitors can ensure _liveness_.  It is the time since the UNIX
+epoch (January 1, 1970 00:00 UTC) in seconds.  The tree size
+specifies the current number of leaves.  The root hash fixes the
+structure and content of the Merkle tree.
+
+```
+struct tree_head {
+	u64 timestamp;
+	u64 tree_size;
+	u8 root_hash[32];
+};
+```
+
+The serialized tree head must be signed using Ed25519.  A witness must
+not cosign a tree head if it is inconsistent with prior history or if
+the timestamp is backdated or future-dated by more than 12 hours.
+
+#### Merkle tree leaf
+Logs support a single leaf type.  It contains a shard hint, a
+checksum over whatever the submitter wants to log a checksum for, a
+signature that the submitter computed over the shard hint and the
+checksum, and a hash of the submitter's public verification key, that
+can be used to verify the signature.
+
+```
+struct message {
+    u64 shard_hint;
+    u8 checksum[32];
+};
+
+struct tree_leaf {
+    struct message message;
+    u8 signature_over_message[64];
+    u8 key_hash[32];
+};
+```
+
+`message` is composed of the `shard_hint`, chosen by the submitter to
+match the shard interval for the log it's submitting to, and the
+submitter's `checksum` to be logged.
+
+`signature_over_message` is a signature over `message`, computed with
+the submitter's secret signing key.  It must be possible to verify the
+signature using the submitter's public verification key, as indicated
+by `key_hash`.
+
+`key_hash` is a hash of the submitter's public verification key, which
+verifies the signature over `message`.  It is included in `tree_leaf` so that the leaf can
+be attributed to the submitter.  A hash, rather than the full public
+key, is used to motivate verifiers to locate the appropriate key and
+make an explicit trust decision.
+
+## Public endpoints
+Every log has a base URL that identifies it uniquely.  The only
+constraint is that it must be a valid HTTP(S) URL that can have the
+`/st/v0/<endpoint>` suffix appended.  For example, a complete endpoint
+URL could be
+`https://log.example.com/2021/st/v0/get-tree-head-cosigned`.
+
+Input data (in requests) is POST:ed in the HTTP message body as ASCII
+key/value pairs.
+
+Output data (in replies) is sent in the HTTP message body in the same
+format as the input data, i.e. as ASCII key/value pairs in the format
+`Key=Value`.
+
+The HTTP status code is 200 OK to indicate success.  A different HTTP
+status code is used to indicate failure, in which case a log should
+respond with a human-readable string describing what went wrong using
+the key `error`. Example: `error=Invalid signature.`.
+
+### get-tree-head-cosigned
+Returns the latest cosigned tree head. Used together with
+`get-proof-by-hash` and `get-consistency-proof` for verifying the tree.
+
+```
+GET <base url>/st/v0/get-tree-head-cosigned
+```
+
+Input:
+- None
+
+Output on success:
+- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number,
+  seconds since the UNIX epoch.
+- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number.
+- `root_hash`: `tree_head.root_hash` hex-encoded.
+- `signature`: hex-encoded Ed25519 signature over `timestamp`,
+  `tree_size` and `root_hash` serialized into a `tree_head` as
+  described in section `Merkle tree head`.
+- `key_hash`: a hash of the public verification key (belonging to
+  either the log or to one of its witnesses), which can be used to
+  verify the most recent `signature`.  The key is encoded as defined
+  in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
+  and then hashed using SHA256.  The hash value is hex-encoded.
+
+The `signature` and `key_hash` fields may repeat. The first signature
+corresponds to the first key hash, the second signature corresponds to
+the second key hash, etc.  The number of signatures and key hashes
+must match.
+
+### get-tree-head-to-sign
+Returns the latest tree head to be signed by log witnesses. Used by
+witnesses.
+
+```
+GET <base url>/st/v0/get-tree-head-to-sign
+```
+
+Input:
+- None
+
+Output on success:
+- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number,
+  seconds since the UNIX epoch.
+- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number.
+- `root_hash`: `tree_head.root_hash` hex-encoded.
+- `signature`: hex-encoded Ed25519 signature over `timestamp`,
+  `tree_size` and `root_hash` serialized into a `tree_head` as
+  described in section `Merkle tree head`.
+- `key_hash`: a hash of the log's public verification key, which can
+  be used to verify `signature`.  The key is encoded as defined in
+  [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
+  and then hashed using SHA256.  The hash value is hex-encoded.
+
+There is exactly one `signature` and one `key_hash` field. The
+`key_hash` refers to the log's public verification key.
+
+
+### get-tree-head-latest
+Returns the latest tree head, signed only by the log. Used for
+debugging purposes.
+
+```
+GET <base url>/st/v0/get-tree-head-latest
+```
+
+Input:
+- None
+
+Output on success:
+- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number,
+  seconds since the UNIX epoch.
+- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number.
+- `root_hash`: `tree_head.root_hash` hex-encoded.
+- `signature`: hex-encoded Ed25519 signature over `timestamp`,
+  `tree_size` and `root_hash` serialized into a `tree_head` as
+  described in section `Merkle tree head`.
+- `key_hash`: a hash of the log's public verification key that can be
+  used to verify `signature`.  The key is encoded as defined in
+  [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
+  and then hashed using SHA256.  The hash value is hex-encoded.
+
+There is exactly one `signature` and one `key_hash` field. The
+`key_hash` refers to the log's public verification key.
+
+
+### get-proof-by-hash
+```
+POST <base url>/st/v0/get-proof-by-hash
+```
+
+Input:
+- `leaf_hash`: leaf hash identifying which `tree_leaf` the log should prove
+  inclusion of, hex-encoded.
+- `tree_size`: tree size of the tree head that the proof should be
+  based on, as an ASCII-encoded decimal number.
+
+Output on success:
+- `tree_size`: tree size that the proof is based on, as an
+  ASCII-encoded decimal number.
+- `leaf_index`: zero-based index of the leaf that the proof is based
+  on, as an ASCII-encoded decimal number.
+- `inclusion_path`: node hash, hex-encoded.
+
+The leaf hash is computed using the RFC 6962 hashing strategy.  In
+other words, `SHA256(0x00 | tree_leaf)`.
+
+`inclusion_path` may be omitted or repeated to represent an inclusion
+proof of zero or more node hashes.  The order of node hashes follows
+from the hash strategy, see RFC 6962.
+
+Example: `echo "leaf_hash=241fd4538d0a35c2d0394e4710ea9e6916854d08f62602fb03b55221dcdac90f
+tree_size=4711" | curl --data-binary @- localhost/st/v0/get-proof-by-hash`
+
+### get-consistency-proof
+```
+POST <base url>/st/v0/get-consistency-proof
+```
+
+Input:
+- `new_size`: tree size of a newer tree head, as an ASCII-encoded
+  decimal number.
+- `old_size`: tree size of an older tree head that the log should
+  prove is consistent with the newer tree head, as an ASCII-encoded
+  decimal number.
+
+Output on success:
+- `new_size`: tree size of the newer tree head that the proof is based
+  on, as an ASCII-encoded decimal number.
+- `old_size`: tree size of the older tree head that the proof is based
+  on, as an ASCII-encoded decimal number.
+- `consistency_path`: node hash, hex-encoded.
+
+`consistency_path` may be omitted or repeated to represent a
+consistency proof of zero or more node hashes.  The order of node
+hashes follows from the hash strategy, see RFC 6962.
+
+Example: `echo "new_size=4711
+old_size=42" | curl --data-binary @- localhost/st/v0/get-consistency-proof`
+
+### get-leaves
+```
+POST <base url>/st/v0/get-leaves
+```
+
+Input:
+- `start_size`: index of the first leaf to retrieve, as an
+  ASCII-encoded decimal number.
+- `end_size`: index of the last leaf to retrieve, as an ASCII-encoded
+  decimal number.
+
+Output on success:
+- `shard_hint`: `tree_leaf.message.shard_hint` as an ASCII-encoded
+  decimal number.
+- `checksum`: `tree_leaf.message.checksum`, hex-encoded.
+- `signature`: `tree_leaf.signature_over_message`, hex-encoded.
+- `key_hash`: `tree_leaf.key_hash`, hex-encoded.
+
+All fields may be repeated to return more than one leaf.  The first
+value in each list refers to the first leaf, the second value in each
+list refers to the second leaf, etc.  The size of each list must
+match.
+
+A log may return fewer leaves than requested.  At least one leaf
+must be returned on HTTP status code 200 OK.
+
+Example: `echo "start_size=42
+end_size=4711" | curl --data-binary @- localhost/st/v0/get-leaves`
+
+### add-leaf
+```
+POST <base url>/st/v0/add-leaf
+```
+
+Input:
+- `shard_hint`: number within the log's shard interval as an
+  ASCII-encoded decimal number.
+- `checksum`: the cryptographic checksum that the submitter wants to
+  log, hex-encoded.
+- `signature_over_message`: the submitter's signature over
+  `tree_leaf.message`, hex-encoded.
+- `verification_key`: the submitter's public verification key.  The
+  key is encoded as defined in
+  [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2)
+  and then hex-encoded.
+- `domain_hint`: domain name indicating where `tree_leaf.key_hash`
+  can be found as a DNS TXT resource record.
+
+Output on success:
+- None
+
+The submission will not be accepted if `signature_over_message` is
+invalid or if the key hash retrieved using `domain_hint` does not
+match a hash over `verification_key`.
+
+The submission may also be rejected if the second-level domain
+name has exceeded its rate limit.  By coupling every add-leaf request to
+a second-level domain, it becomes more difficult to spam logs.  You
+would need an excessive number of domain names.  This becomes costly
+if free domain names are rejected.
+
+Logs don't publish domain-name to key bindings because key
+management is more complex than that.
+
+Public logging should not be assumed to have happened until an
+inclusion proof is available.  An inclusion proof should not be relied
+upon unless it leads up to a trustworthy signed tree head.  Witness
+cosigning can make a tree head trustworthy.
+
+Example: `echo "shard_hint=1640995200
+checksum=cfa2d8e78bf273ab85d3cef7bde62716261d1e42626d776f9b4e6aae7b6ff953
+signature_over_message=c026687411dea494539516ee0c4e790c24450f1a4440c2eb74df311ca9a7adf2847b99273af78b0bda65dfe9c4f7d23a5d319b596a8881d3bc2964749ae9ece3
+verification_key=c9a674888e905db1761ba3f10f3ad09586dddfe8581964b55787b44f318cbcdf
+domain_hint=example.com" | curl --data-binary @- localhost/st/v0/add-leaf`
+
+### add-cosignature
+```
+POST <base url>/st/v0/add-cosignature
+```
+
+Input:
+- `signature`: Ed25519 signature over `tree_head`, hex-encoded.
+- `key_hash`: hash of the witness' public verification key that can be
+  used to verify `signature`.  The key is encoded as defined in
+  [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
+  and then hashed using SHA256. The hash value is hex-encoded.
+
+Output on success:
+- None
+
+`key_hash` can be used to identify which witness signed the tree
+head.  A key-hash, rather than the full verification key, is used to
+motivate verifiers to locate the appropriate key and make an explicit
+trust decision.
+
+Example: `echo "signature=d1b15061d0f287847d066630339beaa0915a6bbb77332c3e839a32f66f1831b69c678e8ca63afd24e436525554dbc6daa3b1201cc0c93721de24b778027d41af
+key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | curl --data-binary @- localhost/st/v0/add-cosignature`
+
+## Summary of log parameters
+- **Public key**: The Ed25519 verification key to be used for
+  verifying tree head signatures.
+- **Log identifier**: The public verification key `Public key` hashed
+  using SHA256.
+- **Shard interval start**: The earliest time at which logging
+  requests are accepted as the number of seconds since the UNIX epoch.
+- **Shard interval end**: The latest time at which logging
+  requests are accepted as the number of seconds since the UNIX epoch.
+- **Base URL**: Where the log can be reached over HTTP(S).  It is the
+  prefix to be used to construct a version 0 specific endpoint.
diff --git a/doc/claimant.md b/doc/claimant.md
new file mode 100644
index 0000000..6728fef
--- /dev/null
+++ b/doc/claimant.md
@@ -0,0 +1,71 @@
+# Claimant model
+## **System<sup>CHECKSUM</sup>**
+System<sup>CHECKSUM</sup> is about the claims made by a data publisher.
+* **Claim<sup>CHECKSUM</sup>**:
+	_I, data publisher, claim that the data_:
+	1. has cryptographic hash X
+	2. is produced by no-one but myself
+* **Statement<sup>CHECKSUM</sup>**: signed checksum<br>
+* **Claimant<sup>CHECKSUM</sup>**: data publisher<br>
+	The data publisher is a party that wants to publish some data.
+* **Believer<sup>CHECKSUM</sup>**: end-user<br>
+	The end-user is a party that wants to use some published data.
+* **Verifier<sup>CHECKSUM</sup>**: data publisher<br>
+	Only the data publisher can verify the above claims.
+* **Arbiter<sup>CHECKSUM</sup>**:<br>
+    There's no official body.  Invalidated claims would affect reputation.
+
+System<sup>CHECKSUM\*</sup> can be defined to make more specific claims.  Below
+is a reproducible builds example.
+
+### **System<sup>CHECKSUM-RB</sup>**
+System<sup>CHECKSUM-RB</sup> is about the claims made by a _software publisher_
+that makes reproducible builds available.
+* **Claim<sup>CHECKSUM-RB</sup>**:
+	_I, software publisher, claim that the data_:
+	1. has cryptographic hash X
+	2. is the output of a reproducible build for which the source can be located
+	using X as an identifier
+* **Statement<sup>CHECKSUM-RB</sup>**: Statement<sup>CHECKSUM</sup>
+* **Claimant<sup>CHECKSUM-RB</sup>**: software publisher<br>
+	The software publisher is a party that wants to publish the output of a
+	reproducible build.
+* **Believer<sup>CHECKSUM-RB</sup>**: end-user<br>
+	The end-user is a party that wants to run an executable binary that was built
+	reproducibly.
+* **Verifier<sup>CHECKSUM-RB</sup>**: any interested party<br>
+	These parties try to verify the above claims.  For example:
+	* the software publisher itself (_"has my identity been compromised?"_)
+	* rebuilders that check for locatability and reproducibility
+* **Arbiter<sup>CHECKSUM-RB</sup>**:<br>
+    There's no official body.  Invalidated claims would affect reputation.
+
+## **System<sup>CHECKSUM-LOG</sup>**
+System<sup>CHECKSUM-LOG</sup> is about the claims made by a _log operator_.
+It adds _discoverability_ into System<sup>CHECKSUM\*</sup>.  Discoverability
+means that Verifier<sup>CHECKSUM\*</sup> can see all
+Statement<sup>CHECKSUM</sup> that Believer<sup>CHECKSUM\*</sup> accept.
+
+* **Claim<sup>CHECKSUM-LOG</sup>**:
+	_I, log operator, make available:_
+	1. a globally consistent append-only log of Statement<sup>CHECKSUM</sup>
+* **Statement<sup>CHECKSUM-LOG</sup>**: signed tree head
+* **Claimant<sup>CHECKSUM-LOG</sup>**: log operator<br>
+   Possible operators might be:
+	* a small subset of data publishers
+	* members of relevant consortia
+* **Believer<sup>CHECKSUM-LOG</sup>**:
+	* Believer<sup>CHECKSUM\*</sup>
+	* Verifier<sup>CHECKSUM\*</sup><br>
+* **Verifier<sup>CHECKSUM-LOG</sup>**: third parties<br>
+	These parties verify the above claims.  Examples include:
+	* members of relevant consortia
+	* non-profits and other reputable organizations
+	* security enthusiasts and researchers
+	* log operators (cross-ecosystem)
+	* monitors (cross-ecosystem)
+	* a small subset of data publishers (cross-ecosystem)
+* **Arbiter<sup>CHECKSUM-LOG</sup>**:<br>
+	There is no official body.  The ecosystem at large should stop using an
+	instance of System<sup>CHECKSUM-LOG</sup> if cryptographic proofs of log
+	misbehavior are presented by some Verifier<sup>CHECKSUM-LOG</sup>.
diff --git a/doc/design.md b/doc/design.md
new file mode 100644
index 0000000..2e01a34
--- /dev/null
+++ b/doc/design.md
@@ -0,0 +1,251 @@
+# System Transparency Logging: Design v0
+We propose System Transparency logging.  It is similar to Certificate
+Transparency, except that cryptographically signed checksums are logged as
+opposed to X.509 certificates.  Publicly logging signed checksums allows anyone
+to discover which keys produced what signatures.  As such, malicious and
+unintended key-usage can be _detected_.  We present our design and conclude by
+providing two use-cases: binary transparency and reproducible builds.
+
+**Target audience.**
+You are most likely interested in transparency logs or supply-chain security.
+
+**Preliminaries.**
+You have a basic understanding of cryptographic primitives like digital
+signatures, hash functions, and Merkle trees.  You roughly know what problem
+Certificate Transparency solves and how.
+
+**Warning.**
+This is a work-in-progress document that may be moved or modified.  A future
+revision of this document will bump the version number to v1.  Please let us
+know if you have any feedback.
+
+## Introduction
+Transparency logs make it possible to detect unwanted events.  For example,
+	are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962),
+	did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md),
+	or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/).
+A System Transparency log makes signed checksums transparent.  The overall goal
+is to facilitate detection of unwanted key-usage.
+
+## Threat model and (non-)goals
+We consider a powerful attacker that gained control of a target's signing and
+release infrastructure.  This covers a weaker form of attacker that is able to
+sign data and distribute it to a subset of isolated users.  For example, this is
+essentially what the FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order).
+The fact that signing keys and related infrastructure components get
+compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
+
+The attacker can also gain control of the transparency log's signing key and
+infrastructure.  This covers a weaker form of attacker that is able to sign log
+data and distribute it to a subset of isolated users.  For example, this could
+have been the case when a remote code execution was found for a Certificate
+Transparency Log [\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM).
+
+Any attacker that is able to position itself to control these components will
+likely be _risk-averse_.  This is at minimum due to two factors.  First,
+detection would result in a significant loss of capability that is by no means
+trivial to come by.  Second, detection means that some part of the attacker's
+malicious behavior will be disclosed publicly.
+
+Our goal is to facilitate _detection_ of compromised signing keys.  We consider
+a signing key compromised if an end-user accepts an unwanted signature as valid.
+The solution that we propose is that signed checksums are transparency logged.
+For security we need a collision resistant hash function and an unforgeable
+signature scheme.  We also assume that at most a threshold of seemingly
+independent parties are adversarial.
+
+It is a non-goal to disclose the data that a checksum represents.  For example,
+the log cannot distinguish between a checksum that represents a tax declaration,
+an ISO image, or a Debian package.  This means that the type of detection we
+support is more _coarse-grained_ when compared to Certificate Transparency.
+
+## Design
+We consider a data publisher that wants to digitally sign their data.  The data
+is of opaque type.  We assume that end-users have a mechanism to locate the
+relevant public verification keys.  Data and signatures can also be retrieved
+(in)directly from the data publisher.  We make few assumptions about the
+signature tooling.  The ecosystem at large can continue to use `gpg`, `openssl`,
+`ssh-keygen -Y`, `signify`, or something else.
+
+We _have to assume_ that additional tooling can be installed by end-users that
+wish to enforce transparency logging.  For example, none of the existing
+signature tooling supports verification of Merkle tree proofs.  A side-effect of
+our design is that this additional tooling makes no outbound connections.  The
+above data flows are thus preserved.
+
+### A bird's-eye view
+A central part of any transparency log is the data stored by the log.  The data is stored in the
+leaves of an append-only Merkle tree.  Our leaf structure contains four fields:
+- **shard_hint**: a number that binds the leaf to a particular _shard interval_.
+Sharding means that the log has a predefined time during which logging requests
+are accepted.  Once elapsed, the log can be shut down.
+- **checksum**: a cryptographic hash of some opaque data.  The log never
+sees the opaque data; just the hash made by the data publisher.
+- **signature**: a digital signature that is computed by the data publisher over
+the leaf's shard hint and checksum.
+- **key_hash**: a cryptographic hash of the data publisher's public verification key that can be
+used to verify the signature.
+
+#### Step 1 - preparing a logging request
+The data publisher selects a shard hint and a checksum that should be logged.
+For example, the shard hint could be "logs that are active during 2021".  The
+checksum might be the hash of a release file.
+
+The data publisher signs the selected shard hint and checksum using a secret
+signing key.  Both the signed message and the signature are stored
+in the leaf for anyone to verify.  Including a shard hint in the signed message
+ensures that a good Samaritan cannot change it to log all leaves from an
+earlier shard into a newer one.
+
+A hash of the public verification key is also stored in the leaf.  This makes it
+possible to attribute the leaf to the data publisher.  For example, a data publisher
+that monitors the log can look for leaves that match their own key hash(es).
+
+A hash, rather than the full public verification key, is used to motivate the
+verifier to locate the key and make an explicit trust decision.  Not disclosing the public
+verification key in the leaf makes it less likely that someone would use an untrusted key _by
+mistake_.
+
+#### Step 2 - submitting a logging request
+The log implements an HTTP(S) API.  Input and output are human-readable and use
+a simple key-value format.  A more complex parser like JSON is not needed
+because the exchanged data structures are primitive enough.
+
+The data publisher submits their shard hint, checksum, signature, and public
+verification key as key-value pairs.  The log will use the public verification
+key to check that the signature is valid, then hash it to construct the `key_hash` part of the leaf.
+
+The data publisher also submits a _domain hint_.  The log will download a DNS
+TXT resource record based on the provided domain name.  The downloaded result
+must match the public verification key hash.  By verifying that the submitter
+controls a domain that is aware of the public verification key, rate limits can
+be applied per second-level domain.  As a result, you would need a large number
+of domain names to spam the log in any significant way.
+
+Using DNS to combat spam is convenient because many data publishers already have
+a domain name.  A single domain name is also relatively cheap.  Another
+benefit is that the same anti-spam mechanism can be used across several
+independent logs without coordination.  This is important because a healthy log
+ecosystem needs more than one log in order to be reliable.  DNS also has built-in
+caching which data publishers can influence by setting TTLs accordingly.
+
+The submitter's domain hint is not part of the leaf because key management is
+more complex than that.  A separate project should focus on transparent key
+management.  The scope of our work is transparent _key-usage_.
+
+The log will _try_ to incorporate a leaf into the Merkle tree if a logging
+request is accepted.  There are no _promises of public logging_ as in
+Certificate Transparency.  Therefore, the submitter needs to wait for an
+inclusion proof to appear before concluding that the logging request succeeded.  Not having
+inclusion promises makes the log less complex.
+
+#### Step 3 - distributing proofs of public logging
+The data publisher is responsible for collecting all cryptographic proofs that
+their end-users will need to enforce public logging.  The collection below
+should be downloadable from the same place that published data is normally hosted.
+1. **Opaque data**: the data publisher's opaque data.
+2. **Shard hint**: the data publisher's selected shard hint.
+3. **Signature**: the data publisher's leaf signature.
+4. **Cosigned tree head**: the log's tree head and a _list of signatures_ that
+state it is consistent with prior history.
+5. **Inclusion proof**: a proof of inclusion based on the logged leaf and tree
+head in question.
+
+The data publisher's public verification key is known.  Therefore, the first three fields are
+sufficient to reconstruct the logged leaf.  The leaf's signature can be
+verified.  The final two fields then prove that the leaf is in the log.  If the
+leaf is included in the log, any monitor can detect that there is a new
+signature associated with a given data publisher's public verification key.
+
+The catch is that the proof of logging is only as convincing as the tree head
+that the inclusion proof leads up to.  To bypass public logging, the attacker
+needs to control a threshold of independent _witnesses_ that cosign the log.  A
+benign witness will only sign the log's tree head if it is consistent with prior
+history.
+
+#### Summary
+The log is sharded and will shut down at a predefined time.  The log can shut
+down _safely_ because end-user verification is not interactive.  The difficulty
+of bypassing public logging is based on the difficulty of controlling a
+threshold of independent witnesses.  Witnesses cosign tree heads to make them
+trustworthy.
+
+Submitters, monitors, and witnesses interact with the log using an HTTP(S) API.
+Submitters must prove that they own a domain name as an anti-spam mechanism.
+End-users interact with the log _indirectly_ via a data publisher.  It is the
+data publisher's job to log signed checksums, distribute necessary proofs of
+logging, and monitor the log.
+
+### A peek into the details
+Our bird's-eye view introduction skipped many details that matter in practice.  Some
+of these details are presented here using a question-answer format.  A
+question-answer format is helpful because it is easily modified and extended.
+
+#### What cryptographic primitives are supported?
+The only supported hash algorithm is SHA256.  The only supported signature
+scheme is Ed25519.  Not having any cryptographic agility makes the protocol less
+complex and more secure.
+
+We can be cryptographically opinionated because of a key insight.  Existing
+signature tools like `gpg`, `ssh-keygen -Y`, and `signify` cannot verify proofs
+of public logging.  Therefore, _additional tooling must already be installed by
+end-users_.  That tooling should verify hashes using the log's hash function.
+That tooling should also verify signatures using the log's signature scheme.
+Both tree heads and tree leaves are being signed.
+
+#### Why not let the data publisher pick their own signature scheme and format?
+Agility introduces complexity and difficult policy questions.  For example,
+which algorithms and formats should (not) be supported and why?  Picking Ed25519
+is a current best practice that should be encouraged if possible.
+
+There is not much we can do if a data publisher _refuses_ to rely on the log's
+hash function or signature scheme.
+
+#### What if the data publisher must use a specific signature scheme or format?
+They may _cross-sign_ the data as follows.
+1. Sign the data as they're used to.
+2. Hash the data and use the result as the leaf's checksum to be logged.
+3. Sign the leaf using the log's signature scheme.
+
+For verification, the end-user first verifies that the usual signature from
+step 1 is valid.  Then the end-user uses the additional tooling (which is
+already required) to verify the rest.
+Cross-signing should be a relatively comfortable upgrade path that is backwards
+compatible.  The downside is that the data publisher may need to manage an
+additional key-pair.
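
The three cross-signing steps could be sketched as follows.  This is a minimal illustration under stated assumptions: `cross_sign`, `legacy_sign`, and `log_scheme_sign` are hypothetical names, the two signers are passed in as callables because Ed25519 is not in Python's standard library, and signing only the checksum is a simplification since the real leaf serialization is not defined in this section.

```python
import hashlib
from typing import Callable, Dict

# A signer takes a message and returns a signature (hypothetical interface).
Signer = Callable[[bytes], bytes]


def cross_sign(data: bytes, legacy_sign: Signer,
               log_scheme_sign: Signer) -> Dict[str, bytes]:
    """Cross-sign `data` with a legacy scheme and the log's scheme."""
    # Step 1: sign the data as the publisher is used to.
    legacy_signature = legacy_sign(data)
    # Step 2: hash the data; the result is the checksum to be logged.
    checksum = hashlib.sha256(data).digest()
    # Step 3: sign the leaf with the log's signature scheme.
    # Assumption: the checksum stands in for the full leaf serialization.
    leaf_signature = log_scheme_sign(checksum)
    return {
        "legacy_signature": legacy_signature,
        "checksum": checksum,
        "leaf_signature": leaf_signature,
    }
```

In a real deployment `log_scheme_sign` would wrap an Ed25519 signing key, and `legacy_sign` would wrap whatever tool the data publisher already uses.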
+
+#### What (de)serialization parsers are needed?
+#### What policy should be used?
+#### Why witness cosigning?
+#### Why sharding?
+Unlike X.509 certificates, which already have validity ranges, a
+checksum does not carry any such information.  Therefore, we require
+that the submitter selects a _shard hint_.  The selected shard hint
+must be in the log's _shard interval_.  A shard interval is defined by
+a start time and an end time.  Both ends of the shard interval are
+inclusive and expressed as the number of seconds since the UNIX epoch
+(January 1, 1970 00:00 UTC).
+
+Sharding simplifies log operations because it becomes explicit when a
+log can be shut down.  A log must only accept logging requests that
+have valid shard hints.  A log should only accept logging requests
+during the predefined shard interval.  Note that _the submitter's
+shard hint is not a verified timestamp_.  The submitter should set the
+shard hint as large as possible.  If a roughly verified timestamp is
+needed, a cosigned tree head can be used.
+
+Without a shard hint, a good Samaritan could log all leaves from an
+earlier shard into a newer one.  Not only would that defeat the
+purpose of sharding, but it would also become a potential
+denial-of-service vector.
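
The shard hint rules above can be sketched as two small checks.  The function names and the example interval are hypothetical.  The only details taken from the text are that both ends of the shard interval are inclusive and expressed in seconds since the UNIX epoch.

```python
def shard_hint_is_valid(shard_hint: int, shard_start: int,
                        shard_end: int) -> bool:
    """A shard hint is valid if it lies in the inclusive shard interval."""
    return shard_start <= shard_hint <= shard_end


def should_accept(now: int, shard_hint: int, shard_start: int,
                  shard_end: int) -> bool:
    """Accept only valid shard hints, and only during the shard interval.

    Note that `shard_hint` is chosen by the submitter and is not a
    verified timestamp; `now` is the log's own clock.
    """
    during_interval = shard_start <= now <= shard_end
    return during_interval and shard_hint_is_valid(
        shard_hint, shard_start, shard_end)


# Example interval (arbitrary values, seconds since the UNIX epoch).
START, END = 1609459200, 1640995199
print(should_accept(now=1620000000, shard_hint=END,
                    shard_start=START, shard_end=END))
```

The example sets the shard hint to the end of the interval, in line with the recommendation that submitters set it as large as possible.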
+
+#### TODO
+Add more key questions and answers.
+- Log spamming
+- Log poisoning
+- Why we removed the identifier field from the leaf
+- Explain `latest`, `stable`, and `cosigned` tree heads.
+- Privacy aspects
+- How does this whole thing work with more than one log?
+
+## Concluding remarks
+Example of binary transparency and reproducible builds.
diff --git a/doc/sketch.md b/doc/sketch.md
deleted file mode 100644
index 31964e0..0000000
--- a/doc/sketch.md
+++ /dev/null
@@ -1,372 +0,0 @@
-# System Transparency Logging
-This document provides a sketch of System Transparency (ST) logging.  The basic
-idea is to insert hashes of system artifacts into a public, append-only, and
-tamper-evident transparency log, such that any enforcing client can be sure that
-they see the same system artifacts as everyone else.  A system artifact could
-be a browser update, an operating system image, a Debian package, or more
-generally something that is opaque.
-
-We take inspiration from the Certificate Transparency Front-End
-([CTFE](https://github.com/google/certificate-transparency-go/tree/master/trillian/ctfe))
-that implements [RFC 6962](https://tools.ietf.org/html/rfc6962) for
-[Trillian](https://transparency.dev).
-
-## Log parameters
-An ST log is defined by the following parameters:
-- `log_identifier`: a `Namespace` of type `ed25519_v1` that defines the log's
-signing algorithm and public verification key.
-- `supported_namespaces`: a list of namespace types that the log supports.
-Entities must use a supported namespace type when posting signed data to the
-log.
-- `base_url`: prefix used by clients that contact the log, e.g.,
-example.com:1234/log.
-- `final_cosigned_tree_head`: an `StItem` of type `cosigned_tree_head_v*`.  Not
-set until the log is turned into read-only mode in preparation of a shutdown.
-
-ST logs use the same hash strategy as described in RFC 6962: SHA256 with `0x00`
-as leaf node prefix and `0x01` as interior node prefix.
-
-In contrast to Certificate Transparency (CT) **there is no Maximum Merge Delay
-(MMD)**.  New entries are merged into the log as soon as possible, and no client
-should trust that something is logged until an inclusion proof can be provided
-that references a trustworthy STH.  Therefore, **there are no "promises" of
-public logging** as in CT.
-
-To produce trustworthy STHs a simple form of [witness
-cosigning](https://arxiv.org/pdf/1503.08768.pdf) is built into the log.
-Witnesses poll the log for the next stable STH, and verify that it is consistent
-before posting a cosignature that can then be served by the log.
-
-## Acceptance criteria and scope
-A log should accept a leaf submission if it is:
-- Well-formed, see data structure definitions below.
-- Digitally signed by a registered namespace.
-
-Rate limits may be applied per namespace to combat spam.  Namespaces may also be
-used by clients to determine which entries belong to who.  It is up to the
-submitters to communicate trusted namespaces to their own clients.  In other
-words, there are no mappings from namespaces to identities built into the log.
-There is also no revocation of namespaces: **we facilitate _detection_ of
-compromised signing keys by making artifact hashes public, which is not to be
-confused with _prevention_ or even _recovery_ after detection**.
-
-## Data structure definitions
-Data structures are defined and serialized using the presentation language in
-[RFC 5246, §4](https://tools.ietf.org/html/rfc5246).  A definition of the log's
-Merkle tree can be found in [RFC 6962,
-§2](https://tools.ietf.org/html/rfc6962#section-2).
-
-### Namespace
-A _namespace_ is a versioned data structure that contains a public verification
-key (or fingerprint), as well as enough information to determine its format,
-signing, and verification operations.  Namespaces are used as identifiers, both
-for the log itself and the parties that submit artifact hashes and cosignatures.
-
-```
-enum {
-	reserved(0),
-	ed25519_v1(1),
-	(2^16-1)
-} NamespaceFormat;
-
-struct {
-	NamespaceFormat format;
-	select (format) {
-		case ed25519_v1: Ed25519V1;
-	} message;
-} Namespace;
-```
-
-Our namespace format is inspired by Keybase's
-[key-id](https://keybase.io/docs/api/1.0/kid).
-
-#### Ed25519V1
-At this time the only supported namespace type is based on Ed25519.  The
-namespace field contains the full verification key.  Signing operations and
-serialized formats are defined by [RFC
-8032](https://tools.ietf.org/html/rfc8032).
-```
-struct {
-	opaque namespace[32]; // public verification key
-} Ed25519V1;
-```
-
-### `StItem`
-A general-purpose `TransItem` is defined in [RFC 6962/bis,
-§4.5](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.5).
-We define our own `TransItem`, but name it `StItem` to emphasize that they are
-not the same.
-
-```
-enum {
-	reserved(0),
-	signed_tree_head_v1(1),
-	cosigned_tree_head_v1(2),
-	consistency_proof_v1(3),
-	inclusion_proof_v1(4),
-	signed_checksum_v1(5), // leaf type
-	(2^16-1)
-} StFormat;
-
-struct {
-	StFormat format;
-	select (format) {
-		case signed_tree_head_v1: SignedTreeHeadV1;
-		case cosigned_tree_head_v1: CosignedTreeHeadV1;
-		case consistency_proof_v1: ConsistencyProofV1;
-		case inclusion_proof_v1: InclusionProofV1;
-		case signed_checksum_v1: SignedChecksumV1;
-	} message;
-} StItem;
-
-struct {
-	StItem items<0..2^32-1>;
-} StItemList;
-```
-
-#### `signed_tree_head_v1`
-We use the same tree head definition as in [RFC 6962/bis,
-§4.9](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.9).
-The resulting _signed_ tree head is packaged differently: a namespace is used as
-log identifier, and it is communicated in a `SignatureV1` structure.
-```
-struct {
-	TreeHeadV1 tree_head;
-	SignatureV1 signature;
-} SignedTreeHeadV1;
-
-struct {
-	uint64 timestamp;
-	uint64 tree_size;
-	NodeHash root_hash;
-	Extension extensions<0..2^16-1>;
-} TreeHeadV1;
-opaque NodeHash<32..2^8-1>;
-
-struct {
-	Namespace namespace;
-	opaque signature<1..2^16-1>;
-} SignatureV1;
-```
-
-#### `cosigned_tree_head_v1`
-Transparency logs were designed to be cryptographically verifiable in the
-presence of a gossip-audit model that ensures everyone observes _the same
-cryptographically verifiable log_.  The gossip-audit model is largely undefined
-in today's existing transparency logging ecosystems, which means that the logs
-must be trusted to play by the rules.   We wanted to avoid that outcome in our
-ecosystem.  Therefore, a gossip-audit model is built into the log.
-
-The basic idea is that an STH should only be considered valid if it is cosigned
-by a number of witnesses that verify the append-only property.  Which witnesses
-to trust and under what circumstances is defined by a client-side _witness
-cosigning policy_.  For example,
-	"require no witness cosigning",
-	"must have at least `k` signatures from witnesses A...J", and
-	"must have at least `k` signatures from witnesses A...J where one is from
-		witness B".
-
-Witness cosigning policies are beyond the scope of this specification.
-
-A cosigned STH is composed of an STH and a list of cosignatures.  A cosignature
-must cover the serialized STH as an `StItem`, and be produced with a witness
-namespace of type `ed25519_v1`.
-
-```
-struct {
-	SignedTreeHeadV1 signed_tree_head;
-	SignatureV1 cosignatures<0..2^32-1>; // vector of cosignatures
-} CosignedTreeHeadV1;
-```
-
-#### `consistency_proof_v1`
-For the most part we use the same consistency proof definition as in [RFC
-6962/bis,
-§4.11](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.11).
-There are two modifications: our log identifier is a namespace rather than an
-[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4),
-and a consistency proof may be empty.
-
-```
-struct {
-	Namespace log_id;
-	uint64 tree_size_1;
-	uint64 tree_size_2;
-	NodeHash consistency_path<0..2^16-1>;
-} ConsistencyProofV1;
-```
-
-#### `inclusion_proof_v1`
-For the most part we use the same inclusion proof definition as in [RFC
-6962/bis,
-§4.12](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.12).
-There are two modifications: our log identifier is a namespace rather than an
-[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4),
-and an inclusion proof may be empty.
-```
-struct {
-	Namespace log_id;
-	uint64 tree_size;
-	uint64 leaf_index;
-	NodeHash inclusion_path<0..2^16-1>;
-} InclusionProofV1;
-```
-
-#### `signed_checksum_v1`
-A checksum entry contains a package identifier like `foobar-1.2.3` and an
-artifact hash.   It is then signed so that clients can distinguish artifact
-hashes from two different software publishers A and B.  For example, the
-`signed_checksum_v1` type can help [enforce public binary logging before
-accepting a new software
-update](https://wiki.mozilla.org/Security/Binary_Transparency).
-
-```
-struct {
-	ChecksumV1 data;
-	SignatureV1 signature;
-} SignedChecksumV1;
-
-struct {
-	opaque identifier<1..128>;
-	opaque checksum<1..64>;
-} ChecksumV1;
-```
-
-It is assumed that clients know how to find the real artifact source (if not
-already at hand), such that the logged hash can be recomputed and compared for
-equality.  The log is not aware of how artifact hashes are computed, which means
-that it is up to the submitters to define hash functions, data formats, and
-such.
-
-## Public endpoints
-Clients talk to the log using HTTP(S). Successfully processed requests are
-responded to with HTTP status code `200 OK`, and any returned data is
-serialized.  Endpoints without input parameters use HTTP GET requests.
-Endpoints that have input parameters HTTP POST a TLS-serialized data structure.
-The HTTP content type `application/octet-stream` is used when sending data.
-
-### add-entry
-```
-POST https://<base url>/st/v1/add-entry
-```
-
-Input:
-- An `StItem` of type `signed_checksum_v1`.
-
-No output.
-
-### add-cosignature
-```
-POST https://<base url>/st/v1/add-cosignature
-```
-
-Input:
-- An `StItem` of type `cosigned_tree_head_v1`.  The list of cosignatures must
-be of length one, the witness signature must cover the item's STH, and that STH
-must additionally match the log's stable STH that is currently being cosigned.
-
-No output.
-
-### get-latest-sth
-```
-GET https://<base url>/st/v1/get-latest-sth
-```
-
-No input.
-
-Output:
-- An `StItem` of type `signed_tree_head_v1` that corresponds to the most
-recent STH.
-
-### get-stable-sth
-```
-GET https://<base url>/st/v1/get-stable-sth
-```
-
-No input.
-
-Output:
-- An `StItem` of type `signed_tree_head_v1` that corresponds to a stable STH
-that witnesses should cosign.  The same STH is returned for a period of time.
-
-### get-cosigned-sth
-```
-GET https://<base url>/st/v1/get-cosigned-sth
-```
-
-No input.
-
-Output:
-- An `StItem` of type `cosigned_tree_head_v1` that corresponds to the most
-recent cosigned STH.
-
-### get-proof-by-hash
-```
-POST https://<base url>/st/v1/get-proof-by-hash
-```
-
-Input:
-```
-struct {
-	opaque hash[32]; // leaf hash
-	uint64 tree_size; // tree size that the proof should be based on
-} GetProofByHashV1;
-```
-
-Output:
-- An `StItem` of type `inclusion_proof_v1`.
-
-### get-consistency-proof
-```
-POST https://<base url>/st/v1/get-consistency-proof
-```
-
-Input:
-```
-struct {
-	uint64 first; // first tree size that the proof should be based on
-	uint64 second; // second tree size that the proof should be based on
-} GetConsistencyProofV1;
-```
-
-Output:
-- An `StItem` of type `consistency_proof_v1`.
-
-### get-entries
-```
-POST https://<base url>/st/v1/get-entries
-```
-
-Input:
-```
-struct {
-	uint64 start; // 0-based index of first entry to retrieve
-	uint64 end; // 0-based index of last entry to retrieve in decimal.
-} GetEntriesV1;
-```
-
-Output:
-- An `StItem` list where each entry is of type `signed_checksum_v1`.  The first
-`StItem` corresponds to the start index, the second one to `start+1`, etc.  The
-log may return fewer entries than requested.
-
-# Appendix A
-In the future other namespace types might be supported.  For example, we could
-add [RSASSA-PKCS1-v1_5](https://tools.ietf.org/html/rfc3447#section-8.2) as
-follows:
-1. Add `rsa_v1` format and RSAV1 namespace.  This is what we would register on
-the server-side such that the server knows the namespace and complete key.
-```
-struct {
-	opaque namespace<32>; // key fingerprint
-	// + some encoding of public key
-} RSAV1;
-```
-2. Add `rsassa_pkcs1_5_v1` format and `RSASSAPKCS1_5_v1`.  This is what the
-submitter would use to communicate namespace and RSA signature mode.
-```
-struct {
-	opaque namespace<32>; // key fingerprint
-	// + necessary parameters, e.g., SHA256 as hash function
-} RSASSAPKCS1_5V1;
-```