12 files changed, 466 insertions, 532 deletions
diff --git a/doc/api.md b/doc/api.md
new file mode 100644
index 0000000..760663b
--- /dev/null
+++ b/doc/api.md
@@ -0,0 +1,247 @@
+# System Transparency Logging: API v0
+This document describes details of the System Transparency logging API,
+version 0.  The broader picture is not explained here.  We assume that you have
+read the System Transparency design document.  It can be found [here](https://github.com/system-transparency/stfe/blob/design/doc/design.md).
+
+**Warning.**
+This is a work-in-progress document that may be moved or modified.
+
+## Overview
+The log implements an HTTP(S) API:
+- Requests that add data to the log use the HTTP POST method.  The HTTP content
+type is `application/x-www-form-urlencoded`.  The posted data are key-value
+pairs.  Binary data must be base64-encoded.
+- Requests that retrieve data from the log use the HTTP GET method.  The HTTP
+content type is `application/x-www-form-urlencoded`.  Input parameters are
+key-value pairs.
+- Responses are JSON objects.  The HTTP content type is `application/json`.
+- Error messages are human-readable strings.  The HTTP content type is
+`text/plain`.
+
+We decided to use these web formats for requests and responses because the log
+is running as an HTTP(S) service.  In other words, anyone who interacts with
+the log is most likely using these formats already.  The other benefit is that
+all requests and responses are human-readable.  This makes it easier to
+understand the protocol, troubleshoot issues, and copy-paste.  We favored
+compatibility and understandability over a wire-efficient format.
+
+Note that we are not using JSON for signed and/or logged data.  In other words,
+a submitter that wishes to distribute log responses to their user base in a
+different format may do so.  The only (de)serialization parser that is forced
+on _end-users_ handles a small subset of Trunnel.  Trunnel is an "idiot-proof"
+wire-format that the Tor project uses.
+
+## Primitives
+### Cryptography
+The log uses the same Merkle tree hash strategy as [RFC 6962, §2](https://tools.ietf.org/html/rfc6962#section-2).
+The hash function must be [SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf).
+The log must sign tree heads using [Ed25519](https://tools.ietf.org/html/rfc8032).
+The log's witnesses must also sign tree heads using Ed25519.
+
+All other parts that are not Merkle tree related also use SHA256 as the hash
+function.  Using more than one hash function would increase the overall attack
+surface: two hash functions must be collision resistant instead of one.
+
+We recommend that submitters sign using Ed25519.  We also support RSA with
+[deterministic](https://tools.ietf.org/html/rfc8017#section-8.2)
+or [probabilistic](https://tools.ietf.org/html/rfc8017#section-8.1)
+padding.  Supporting RSA is suboptimal, but excluding it would make the log
+useless for many possible adopters.
+
+### Serialization
+We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html)
+to define (de)serialization of data structures that need to be signed or
+inserted into the Merkle tree.  Trunnel is more expressive than the
+[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5).
+It is about as expressive as the [TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3).
+A notable difference is that Trunnel supports integer constraints.  The Trunnel
+language is also readable by humans _and_ machines.  "Obviously correct code"
+can be generated in C and Go.
+
+A fair summary of our Trunnel usage is as follows.
+
+All integers are 64-bit, unsigned, and in network byte order.  A fixed-size byte
+array is put into the serialization buffer in-order, starting from the first
+byte.  These basic types are concatenated to form a collection.  You should not
+need a general-purpose Trunnel (de)serialization parser to work with this
+format.  If you have one, you may use it though.  The main point of using
+Trunnel is that it makes a simple format explicit and unambiguous.
+
+TODO: URL-encode _or_ JSON?  I think we should only need one.  Always doing HTTP
+POST would also ensure that input parameters don't show up in web server logs.
+
+#### Merkle tree head
+A tree head is signed by the log and its witnesses.  It contains a timestamp, a
+tree size, and a root hash.  The timestamp is included so that monitors can
+ensure _liveness_.  It is the time since the UNIX epoch (January 1, 1970
+00:00:00 UTC) in milliseconds.  The tree size specifies the current number of
+leaves.  The root hash fixes the structure and content of the Merkle tree.
+
+```
+struct tree_head {
+	u64 timestamp;
+	u64 tree_size;
+	u8 root_hash[32];
+};
+```
+
+The serialized tree head must be signed using Ed25519.  A witness must only sign
+the log's tree head if it is consistent with prior history and the timestamp is
+roughly correct.  A timestamp is roughly correct if it is not backdated or
+future-dated by more than 12 hours.
+
+#### Merkle tree leaf
+The log supports a single leaf type.  It contains a checksum, a signature
+scheme, a signature that the submitter computed over that checksum, and the hash
+of the public verification key that can be used to verify the signature.
+
+```
+const ALG_ED25519 = 1; // RFC 8032
+const ALG_RSASSA_PKCS1_V1_5 = 2; // RFC 8017
+const ALG_RSASSA_PSS = 3; // RFC 8017
+
+struct tree_leaf {
+	u8 checksum[32];
+	u64 signature_scheme IN [
+		ALG_ED25519,
+		ALG_RSASSA_PKCS1_V1_5,
+		ALG_RSASSA_PSS,
+	];
+	union signature[signature_scheme] {
+		ALG_ED25519: u8 ed25519[64];
+		default:     u8 rsa[512];
+	}
+	u8 key_hash[32];
+}
+```
+
+A key-hash is included in the leaf so that it can be attributed to the signing
+entity.  A hash, rather than the full public verification key, is used to force
+the verifier to locate the appropriate key and make an explicit trust decision.
+
+## Public endpoints
+Every log has a base URL that identifies it uniquely.  The only constraint is
+that it must be a valid HTTP(S) URL that can have the `/st/v0/<endpoint>` suffix
+appended.  For example, a complete endpoint URL could be
+`https://log.example.com/2021/st/v0/get-signed-tree-head`.
+
+### get-signed-tree-head
+```
+GET <base url>/st/v0/get-signed-tree-head
+```
+
+Input key-value pairs:
+- `type`: one of the strings "latest", "stable", or "cosigned".
+	- "latest": ask for the most recent signed tree head.
+	- "stable": ask for a recent signed tree head that is fixed for some period
+	  of time.
+	- "cosigned": ask for a recent cosigned tree head.
+
+Output:
+- On success: status 200 OK and a signed tree head.  The response body is
+defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/sth.schema.json).
+- On failure: a different status code and a human-readable error message.
+
+### get-proof-by-hash
+```
+POST <base url>/st/v0/get-proof-by-hash
+```
+
+Input key-value pairs:
+- `leaf_hash`: a base64-encoded leaf hash that identifies which `tree_leaf` the
+log should prove inclusion for.  The leaf hash is computed using the RFC 6962
+hashing strategy.  In other words, `H(0x00 | tree_leaf)`.
+- `tree_size`: the tree size of a tree head that the proof should be based on.
+
+Output:
+- On success: status 200 OK and an inclusion proof.  The response body is
+defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/inclusion_proof.schema.json).
+- On failure: a different status code and a human-readable error message.
+
+### get-consistency-proof
+```
+POST <base url>/st/v0/get-consistency-proof
+```
+
+Input key-value pairs:
+- `new_size`: the tree size of a newer tree head.
+- `old_size`: the tree size of an older tree head that the log should prove is
+consistent with the newer tree head.
+
+Output:
+- On success: status 200 OK and a consistency proof.  The response body is
+defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/consistency_proof.schema.json).
+- On failure: a different status code and a human-readable error message.
+
+### get-leaves
+```
+POST <base url>/st/v0/get-leaves
+```
+
+Input key-value pairs:
+- `start_size`: zero-based index of the first leaf to retrieve.
+- `end_size`: index of the last leaf to retrieve.
+
+Output:
+- On success: status 200 OK and a list of leaves.  The response body is
+defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/leaves.schema.json).
+- On failure: a different status code and a human-readable error message.
+
+The log may truncate the list of returned leaves.  However, it must not be an
+empty list on success.
+
+### add-leaf
+```
+POST <base url>/st/v0/add-leaf
+```
+
+Input key-value pairs:
+- `leaf_checksum`: the checksum that the submitter wants to log in base64.
+- `signature_scheme`: the signature scheme that the submitter wants to use.
+- `tree_leaf_signature`: the submitter's `tree_leaf` signature in base64.
+- `verification_key`: the submitter's public verification key.  It is serialized
+as described in the corresponding RFC, then base64-encoded.
+- `domain_hint`: a domain name that indicates where the public verification-key
+hash can be downloaded in base64.  Supported methods: DNS and HTTPS
+(TODO: docdoc).
+
+Output:
+- On success: status 200 OK.  The log will _try_ to incorporate the submitted leaf
+into its Merkle tree.
+- On failure: a different status code and a human-readable error message.
+
+The submitted entry will not be accepted if the signature is invalid or if the
+downloaded verification-key hash does not match.  The submitted entry may also
+not be accepted if the second-level domain name exceeded its rate limit.  By
+coupling every add-leaf request with a second-level domain, it becomes more
+difficult to spam the log.  You would need an excessive number of domain names.
+This becomes costly if free domain names are rejected.
+
+The log does not publish domain-name to key bindings because key management is
+more complex than that.
+
+Public logging should not be assumed until an inclusion proof is available.  An
+inclusion proof should not be relied upon unless it leads up to a trustworthy
+signed tree head.  Witness cosigning can make a tree head trustworthy.
+
+TODO: the log may allow no `domain_hint`?  Especially useful for v0 testing.
+
+### add-cosignature
+```
+POST <base url>/st/v0/add-cosignature
+```
+
+Input key-value pairs:
+- `signature`: a base64-encoded signature over a `tree_head` that is fixed for
+some period of time. The cosigning witness retrieves the tree head using the
+`get-signed-tree-head` endpoint with the "stable" type.
+- `key_hash`: a base64-encoded hash of the public verification key that can be
+used to verify the signature.
+
+Output:
+- HTTP status 200 OK on success.  Otherwise a different status code and a
+human-readable error message.
+
+The key-hash can be used to identify which witness signed the log's tree head.
+A key-hash, rather than the full verification key, is used to force the verifier
+to locate the appropriate key and make an explicit trust decision.
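
Note on `doc/api.md`, section "Merkle tree head": the serialization rules stated there (64-bit unsigned integers in network byte order, fixed-size byte arrays copied in order) are simple enough to implement by hand.  The Python sketch below is only an illustration under those rules.  The names `serialize_tree_head`, `timestamp_roughly_correct`, and `TWELVE_HOURS_MS` are hypothetical and not part of the proposed API.

```python
import struct
import time
from typing import Optional

# Assumption: "roughly correct" means within 12 hours in either direction.
TWELVE_HOURS_MS = 12 * 60 * 60 * 1000


def serialize_tree_head(timestamp: int, tree_size: int, root_hash: bytes) -> bytes:
    """Serialize struct tree_head: u64 timestamp, u64 tree_size, u8 root_hash[32]."""
    if len(root_hash) != 32:
        raise ValueError("root_hash must be exactly 32 bytes")
    # ">QQ" packs two unsigned 64-bit integers in network (big-endian) byte order.
    return struct.pack(">QQ", timestamp, tree_size) + root_hash


def timestamp_roughly_correct(timestamp_ms: int, now_ms: Optional[int] = None) -> bool:
    """Return True if the timestamp is neither backdated nor future-dated by more than 12 hours."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return abs(now_ms - timestamp_ms) <= TWELVE_HOURS_MS
```

The serialized tree head is always 48 bytes, which is the message that the log and its witnesses would sign.  Signature creation and verification are left out because Ed25519 is not available in the Python standard library.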
diff --git a/doc/design.md b/doc/design.md
new file mode 100644
index 0000000..f966d03
--- /dev/null
+++ b/doc/design.md
@@ -0,0 +1,32 @@
+# System Transparency Logging: Design v0
+We propose System Transparency logging.  It is similar to Certificate
+Transparency, except that cryptographically signed checksums are logged as
+opposed to X.509 certificates.  Publicly logging signed checksums allows anyone
+to discover which keys signed what.  As such, malicious and unintended key-usage
+can be _discovered_.  We present our design and discuss how two possible
+use-cases influenced it: binary transparency and reproducible builds.
+
+**Target audience.**
+You are most likely interested in transparency logs or supply-chain security.
+
+**Preliminaries.**
+You have a basic understanding of cryptographic primitives like digital
+signatures, hash functions, and Merkle trees.  You roughly know what problem
+Certificate Transparency solves and how.  You may never have heard the term
+_gossip-audit model_, or know how it is related to trust assumptions and
+detectability properties.
+
+**Warning.**
+This is a work-in-progress document that may be moved or modified.
+
+## Introduction
+Transparency logs make it possible to detect unwanted events.  For example,
+	are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962),
+	did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md),
+	or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/).
+System Transparency logging makes signed checksums transparent.  The goal is to
+_detect_ unwanted key-usage without making assumptions about the signed data.
+
+## Threat model and (non-)goals
+
+## Design
diff --git a/doc/formats.md b/doc/formats.md
deleted file mode 100644
index bffd05f..0000000
--- a/doc/formats.md
+++ /dev/null
@@ -1,160 +0,0 @@
-# Formats
-This document defines data structures and data formats.
-
-## Overview
-Here we give an overview of our presentation language / serialization rules.
-
-All integers are represented by 64-bit unsigned integers in network byte order.
-
-Variable length lists have an integer specifying its length.  Then each list
-item is enumerated.
-
-TODO: fixme.
-
-## Items
-Every item type start with a versioned format specifier.  Protocol version 1
-uses format specifiers in the range 1--X.
-
-### Request data structures
-Log endpoints that take input data use the following request data structures.
-
-#### `get_entries_v1`
-```
-0  Format  8                16               24
-+----------+----------------+----------------+
-|    1     |   Start Size   |    End Size    |
-+----------+----------------+----------------+
-   uint64        uint64           uint64
-```
-- Format is always 1 for items of type `get_entries_v1`.
-- Start size specifies the index of the first Merkle tree leaf to retrieve.
-- End size specifies the index of the last Merkle tree leaf to retrieve.
-
-#### `get_proof_by_hash_v1`
-```
-0  Format  8                16               48
-+----------+----------------+----------------+
-|    2     |   Tree size    |    Leaf hash   |
-+----------+----------------+----------------+
-   uint64        uint64      fixed byte array
-```
-- Format is always 2 for items of type `get_proof_by_hash_v1`.
-- Leaf hash is computed as described in [RFC 6962/bis, §2.1.1](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.1).
-- Tree size specifies which Merkle tree root inclusion should be proven for.
-
-#### `get_consistency_proof_v1`
-```
-0  Format  8                16               24
-+----------+----------------+----------------+
-|    3     |    Old size    |    New size    |
-+----------+----------------+----------------+
-   uint64        uint64           uint64
-```
-- Format is always 3 for items of type `get_consistency_proof_v1`.
-- Old size specifies the tree size of an older Merkle tree head.
-- New size specifies the tree size of a newer Merkle tree head.
-
-### Proof and log data structures
-#### `inclusion_proof_v1`
-```
-                                                                               --zero or more node hashes-->
-0  Format  8                48               56               64               72                 72+Length
-+----------+----------------+----------------+----------------+----------------+--------//--------+
-|    4     |   Identifier   |    Tree size   |    Leaf index  |     Length     |    Node hashes   |
-+----------+----------------+----------------+----------------+----------------+--------//--------+
-   uint64      ed25519_v1         uint64           uint64           uint64           list body
-```
-- Format is always 4 for items of type `inclusion_proof_v1`.
-- Identifier identifies the log uniquely as an `ed25519_v1` item.
-- Tree size is the size of the Merkle tree that the proof is based on.
-- Leaf index is a zero-based index of the log entry that the proof is based on.
-- The remaining part is a list of node hashes.
-	- Length specifies the full byte size of the list.  It must be `32 * m`,
-	where `m >= 0`.  This means that an inclusion needs zero or more node
-	hashes to be well-formed.
-	- Node hash is a node hash in the Merkle tree that the proof is based on.
-
-Remark: the list of node hashes is generated and verified as in [RFC 6962/bis,
-§2.1.3](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.3).
-
-#### `consistency_proof_v1`
-```
-                                                                               --zero or more node hashes-->
-0  Format  8                48               56               64               72                 72+Length
-+----------+----------------+----------------+----------------+----------------+--------//--------+
-|    5     |   Identifier   |    Old size    |    New size    |     Length     |    Node hashes   |
-+----------+----------------+----------------+----------------+----------------+--------//--------+
-   uint64     ed25519_v1          uint64           uint64           uint64           list body
-```
-- Format is always 5 for items of type `consistency_proof_v1`.
-- Identifier identifies the log uniquely as an `ed25519_v1` item.
-- Old size is the tree size of the older Merkle tree.
-- New size is the tree size of the newer Merkle tree.
-- The remaining part is a list of node hashes.
-	- Length specifies the full byte size of the list.  It must be `32 * m`,
-	where `m >= 0`.  This means that a consistenty proof needs zero or more node
-	hashes to be well-formed.
-	- Node hash is a node hash from the older or the newer Merkle tree.
-
-Remark: the list of node hashes is generated and verified as in [RFC 6962/bis,
-§2.1.4](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.4).
-
-#### `signed_tree_head_v1`
-```
-                                                                               ----one or more signature-identifier pairs------->
-0  Format  8               16               24               56                64               128              168    64+Length
-+----------+----------------+----------------+----------------+----------------+----------------+----------------+--//--+
-|    6     |   Timestamp    |   Tree size    |    Root hash   |     Length     |    Signature   |   Identifier   | .... |
-+----------+----------------+----------------+----------------+----------------+----------------+----------------+--//--+
-   uint64        uint64           uint64      fixed byte array      uint64      fixed byte array     ed25519_v1   cont. list body
-```
-- Format is always 6 for items of type `signed_tree_head_v1`.
-- Timestamp is the time since the UNIX epoch (January 1, 1970 00:00:00 UTC) in
-milliseconds.
-- Tree size is the number of leaves in the current Merkle tree.
-- Root hash is the root hash of the current Merkle tree.
-- The remaining part is a list of signature-identifier pairs. 
-	- Length specifies the full byte size of the list.  It must be `104 * m`,
-	where `m > 1`.  This means that a signed tree head needs at least one
-	signature-identifier pair to be well-formed.
-	- Signature is an Ed25519 signature over bytes 0--56.  The signature is
-	encodes as in [RFC 8032, §3.3](https://tools.ietf.org/html/rfc8032#section-3.3).
-	- Identifier identifies the signer uniquely as an `ed25519_v1` item.
-
-Remark: there may be multiple signature-identifier pairs if the log is cosigned.
-
-#### `signed_checksum32_ed25519_v1`
-```
-0  Format  8                40               56                 56+Length        120+Length         160+Length
-+----------+----------------+----------------+-------//---------+----------------+--------//--------+
-|    7     |     Checksum   |     Length     |    Identifier    |    Signature   |    Namespace     |
-+----------+----------------+----------------+-------//---------+----------------+--------//--------+
-   uint64   fixed byte array      uint64          byte array     fixed byte array      ed25519_v1
-```
-- Format is always 7 for items of type `signed_checksum32_ed25519_v1`.
-- Checksum is a 32-byte checksum that represents a data item of opaque type.
-- Length specified the full byte size of the following identifier.  It must be
-larger than zero and less than 128.
-- Identifier identifies what the checksum represents.  The aforementioned length
-constraint means that the identifier cannot be omitted or exceed 128 bytes.
-- Signature is an Ed25519 signature over bytes 0--56+Length.  The signature is
-encodes as in [RFC 8032, §3.3](https://tools.ietf.org/html/rfc8032#section-3.3).
-- Namespace is an `ed25519_v1` item that identifies the signer uniquely.
-
-Remark: to keep this checksum entry as simple as possible it does not have a
-variable length checksum or any agility with regards to the signing namespace.
-This means that we need to have multiple leaf types that follow the pattern
-`signed_checksum{32,64}_namespace_v1`.
-
-### Namespace data structures
-#### `ed25519_v1`
-```
-0  Format  8                40
-+----------+----------------+
-|    8     |   public key   |
-+----------+----------------+
-   uint64   fixed byte array
-```
-- The format is always 8 for items of type `ed25519_v1`.
-- The public Ed25519 verification key is always 32 bytes.  See encoding in [RFC
-8032, §3.2](https://tools.ietf.org/html/rfc8032#section-3.2).
diff --git a/doc/schema/consistency_proof.schema.json b/doc/schema/consistency_proof.schema.json
new file mode 100644
index 0000000..003f3c7
--- /dev/null
+++ b/doc/schema/consistency_proof.schema.json
@@ -0,0 +1,30 @@
+{
+	"$schema": "https://json-schema.org/draft-07/schema#",
+	"title": "consistency_proof",
+	"description": "JSON-formatted consistency proof, version 0.",
+
+	"type": "object",
+	"required": [ "new_size", "old_size", "consistency_proof" ],
+	"properties": {
+		"new_size": {
+			"description": "The tree size of the newer Merkle tree head.",
+			"type": "integer",
+			"minimum": 0
+		},
+		"old_size": {
+			"description": "The tree size of the older Merkle tree head.",
+			"type": "integer",
+			"minimum": 0
+		},
+		"consistency_proof": {
+			"description": "A list of base64-encoded node hashes that proves consistency",
+			"type": "array",
+			"items": {
+				"description": "A node hash in base64",
+				"type": "string",
+				"minLength": 44,
+				"maxLength": 44
+			}
+		}
+	}
+}
diff --git a/doc/schema/example/consistency_proof.json b/doc/schema/example/consistency_proof.json
new file mode 100644
index 0000000..0a323b7
--- /dev/null
+++ b/doc/schema/example/consistency_proof.json
@@ -0,0 +1,7 @@
+{
+	"new_size": 2,
+	"old_size": 1,
+	"consistency_proof": [
+		"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
+	]
+}
diff --git a/doc/schema/example/inclusion_proof.json b/doc/schema/example/inclusion_proof.json
new file mode 100644
index 0000000..d46d426
--- /dev/null
+++ b/doc/schema/example/inclusion_proof.json
@@ -0,0 +1,7 @@
+{
+	"tree_size": 2,
+	"leaf_index": 0,
+	"inclusion_proof": [
+		"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
+	]
+}
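
Note on `doc/schema/example/inclusion_proof.json`: a client that receives such a response still has to check it against a trusted root hash.  The Python sketch below shows one way to do that, assuming the verification steps of RFC 9162, §2.1.3.2 (which are compatible with the RFC 6962 hash strategy that `doc/api.md` refers to).  The function names are hypothetical.

```python
import base64
import hashlib


def _node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962 interior node hash: SHA256(0x01 | left | right).
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf_hash: bytes, proof: dict, root_hash: bytes) -> bool:
    """Check a parsed inclusion proof response against a trusted root hash."""
    leaf_index = proof["leaf_index"]
    tree_size = proof["tree_size"]
    path = [base64.b64decode(h, validate=True) for h in proof["inclusion_proof"]]

    if leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf_hash
    for p in path:
        if sn == 0:
            return False
        if (fn & 1) == 1 or fn == sn:
            r = _node_hash(p, r)
            # Skip levels where this node has no right sibling.
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = _node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```

The `leaf_hash` argument is `H(0x00 | tree_leaf)` as described for `get-proof-by-hash`, and `root_hash` should come from a signed tree head of the same `tree_size`.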
diff --git a/doc/schema/example/leaves.json b/doc/schema/example/leaves.json
new file mode 100644
index 0000000..1eed05d
--- /dev/null
+++ b/doc/schema/example/leaves.json
@@ -0,0 +1,14 @@
+[
+	{
+		"checksum": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
+		"signature_scheme": 1,
+		"signature": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=",
+		"key_hash": "DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD="
+	},
+	{
+		"checksum": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
+		"signature_scheme": 2,
+		"signature": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=",
+		"key_hash": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC="
+	}
+]
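
Note on `doc/schema/example/leaves.json`: to compute the leaf hash that `get-proof-by-hash` expects, each JSON entry has to be turned back into the serialized `tree_leaf` from `doc/api.md`.  The Python sketch below assumes that the Trunnel union contributes only the bytes of the selected signature member, and that fields are concatenated in declaration order.  The function names are hypothetical, and the signature length is deliberately not checked because it depends on the scheme.

```python
import base64
import hashlib
import struct

# ALG_ED25519, ALG_RSASSA_PKCS1_V1_5, ALG_RSASSA_PSS from doc/api.md.
SUPPORTED_SCHEMES = {1, 2, 3}


def serialize_tree_leaf(entry: dict) -> bytes:
    """Rebuild the serialized tree_leaf from one JSON leaf entry."""
    checksum = base64.b64decode(entry["checksum"], validate=True)
    signature = base64.b64decode(entry["signature"], validate=True)
    key_hash = base64.b64decode(entry["key_hash"], validate=True)
    scheme = entry["signature_scheme"]

    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported signature scheme")
    if len(checksum) != 32 or len(key_hash) != 32:
        raise ValueError("checksum and key_hash must be 32 bytes each")

    # Order: checksum, u64 signature_scheme (big-endian), signature, key_hash.
    return checksum + struct.pack(">Q", scheme) + signature + key_hash


def leaf_hash(serialized_leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA256(0x00 | tree_leaf)."""
    return hashlib.sha256(b"\x00" + serialized_leaf).digest()
```

The result of `leaf_hash` would then be base64-encoded and sent as the `leaf_hash` input parameter.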
diff --git a/doc/schema/example/sth.json b/doc/schema/example/sth.json
new file mode 100644
index 0000000..ec3ad11
--- /dev/null
+++ b/doc/schema/example/sth.json
@@ -0,0 +1,11 @@
+{
+    "timestamp": 0,
+    "tree_size": 0,
+    "root_hash": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
+    "signatures": [
+        {
+            "key_hash": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=",
+            "signature": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC="
+        }
+    ]
+}
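
Note on `doc/schema/example/sth.json`: the JSON response is only a transport format, so a verifier has to rebuild the serialized `tree_head` before checking any signature.  The Python sketch below shows that step under the serialization rules in `doc/api.md`.  The function name `parse_sth` is hypothetical, and signature verification is omitted because Ed25519 is not part of the Python standard library.

```python
import base64
import json
import struct
from typing import List, Tuple


def parse_sth(text: str) -> Tuple[bytes, List[Tuple[bytes, bytes]]]:
    """Return the signed message and a list of (key_hash, signature) pairs."""
    sth = json.loads(text)

    root_hash = base64.b64decode(sth["root_hash"], validate=True)
    if len(root_hash) != 32:
        raise ValueError("root_hash must decode to 32 bytes")

    # The signed message is the serialized tree_head, not the JSON text.
    message = struct.pack(">QQ", sth["timestamp"], sth["tree_size"]) + root_hash

    signers = []
    for pair in sth["signatures"]:
        key_hash = base64.b64decode(pair["key_hash"], validate=True)
        if len(key_hash) != 32:
            raise ValueError("key_hash must decode to 32 bytes")
        signature = base64.b64decode(pair["signature"], validate=True)
        signers.append((key_hash, signature))
    if not signers:
        raise ValueError("a signed tree head needs at least one signature")
    return message, signers
```

A verifier would look up each `key_hash` among the keys it already trusts and then check the corresponding signature over `message`.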
diff --git a/doc/schema/inclusion_proof.schema.json b/doc/schema/inclusion_proof.schema.json
new file mode 100644
index 0000000..3309d37
--- /dev/null
+++ b/doc/schema/inclusion_proof.schema.json
@@ -0,0 +1,30 @@
+{
+	"$schema": "https://json-schema.org/draft-07/schema#",
+	"title": "inclusion_proof",
+	"description": "JSON-formatted inclusion proof, version 0.",
+
+	"type": "object",
+	"required": [ "tree_size", "leaf_index", "inclusion_proof" ],
+	"properties": {
+		"tree_size": {
+			"description": "The Merkle tree size that the inclusion proof is based on.",
+			"type": "integer",
+			"minimum": 0
+		},
+		"leaf_index": {
+			"description": "The zero-based index of the leaf that the inclusion proof is for.",
+			"type": "integer",
+			"minimum": 0
+		},
+		"inclusion_proof": {
+			"description": "A list of base64-encoded node hashes that proves inclusion",
+			"type": "array",
+			"items": {
+				"description": "A node hash in base64",
+				"type": "string",
+				"minLength": 44,
+				"maxLength": 44
+			}
+		}
+	}
+}
diff --git a/doc/schema/leaves.schema.json b/doc/schema/leaves.schema.json
new file mode 100644
index 0000000..74d7454
--- /dev/null
+++ b/doc/schema/leaves.schema.json
@@ -0,0 +1,38 @@
+{
+	"$schema": "https://json-schema.org/draft-07/schema#",
+	"title": "list of tree_leaf",
+	"description": "JSON-formatted tree leaf list, version 0.",
+
+	"type": "array",
+	"description": "A list of Merkle tree leaves",
+	"items": {
+		"type": "object",
+		"required": [ "checksum", "signature_scheme", "signature", "key_hash" ],
+		"properties": {
+			"checksum": {
+				"description": "A cryptographic hash that is computed over some data of opaque type.  The result is base64-encoded.",
+				"type": "string",
+				"minLength": 44,
+				"maxLength": 44
+			},
+			"signature_scheme": {
+				"description": "An integer that identifies the signature scheme used by the submitter.  See API documentation.",
+				"type": "integer",
+				"enum": [ 1, 2, 3 ]
+			},
+			"signature": {
+				"description": "The submitter's signature over the checksum in base64",
+				"type": "string",
+				"minLength": 88,
+				"maxLength": 684
+			},
+			"key_hash": {
+				"description": "A public verification-key hash that identifies the signer.",
+				"type": "string",
+				"minLength": 44,
+				"maxLength": 44
+			}
+		}
+	},
+	"minItems": 1
+}
diff --git a/doc/schema/sth.schema.json b/doc/schema/sth.schema.json
new file mode 100644
index 0000000..86de2d3
--- /dev/null
+++ b/doc/schema/sth.schema.json
@@ -0,0 +1,50 @@
+{
+	"$schema": "https://json-schema.org/draft-07/schema#",
+	"title": "signed_tree_head_v0",
+	"description": "JSON-formatted signed tree head, version 0.",
+
+	"type": "object",
+	"required": [ "timestamp", "tree_size", "root_hash", "signatures" ],
+	"properties": {
+		"timestamp": {
+			"description": "The number of milliseconds since the UNIX epoch (January 1, 1970 00:00:00 UTC).",
+			"type": "integer",
+			"minimum": 0
+		},
+		"tree_size": {
+			"description": "The number of entries that are stored in the log's Merkle tree.",
+			"type": "integer",
+			"minimum": 0
+		},
+		"root_hash": {
+			"description": "The log's Merkle tree root hash in base64.",
+			"type": "string",
+			"minLength": 44,
+			"maxLength": 44
+		},
+		"signatures": {
+			"description": "A list of signer-signature pairs.",
+			"type": "array",
+			"items": {
+				"description": "A signer-signature pair.",
+				"type": "object",
+				"required": [ "key_hash", "signature" ],
+				"properties": {
+					"key_hash": {
+						"description": "A public verification-key hash that identifies the signer in base64.",
+						"type": "string",
+						"minLength": 44,
+						"maxLength": 44
+					},
+					"signature": {
+						"description": "The signer's signature over the log's tree_head structure in base64.",
+						"type": "string",
+						"minLength": 88,
+						"maxLength": 88
+					}
+				}
+			},
+			"minItems": 1
+		}
+	}
+}
diff --git a/doc/sketch.md b/doc/sketch.md
deleted file mode 100644
index 31964e0..0000000
--- a/doc/sketch.md
+++ /dev/null
@@ -1,372 +0,0 @@
-# System Transparency Logging
-This document provides a sketch of System Transparency (ST) logging.  The basic
-idea is to insert hashes of system artifacts into a public, append-only, and
-tamper-evident transparency log, such that any enforcing client can be sure that
-they see the same system artifacts as everyone else.  A system artifact could
-be a browser update, an operating system image, a Debian package, or more
-generally something that is opaque.
-
-We take inspiration from the Certificate Transparency Front-End
-([CTFE](https://github.com/google/certificate-transparency-go/tree/master/trillian/ctfe))
-that implements [RFC 6962](https://tools.ietf.org/html/rfc6962) for
-[Trillian](https://transparency.dev).
-
-## Log parameters
-An ST log is defined by the following parameters:
-- `log_identifier`: a `Namespace` of type `ed25519_v1` that defines the log's
-signing algorithm and public verification key.
-- `supported_namespaces`: a list of namespace types that the log supports.
-Entities must use a supported namespace type when posting signed data to the
-log.
-- `base_url`: prefix used by clients that contact the log, e.g.,
-example.com:1234/log.
-- `final_cosigned_tree_head`: an `StItem` of type `cosigned_tree_head_v*`.  Not
-set until the log is turned into read-only mode in preparation of a shutdown.
-
-ST logs use the same hash strategy as described in RFC 6962: SHA256 with `0x00`
-as leaf node prefix and `0x01` as interior node prefix.
-
-In contrast to Certificate Transparency (CT) **there is no Maximum Merge Delay
-(MMD)**.  New entries are merged into the log as soon as possible, and no client
-should trust that something is logged until an inclusion proof can be provided
-that references a trustworthy STH.  Therefore, **there are no "promises" of
-public logging** as in CT.
-
-To produce trustworthy STHs a simple form of [witness
-cosigning](https://arxiv.org/pdf/1503.08768.pdf) is built into the log.
-Witnesses poll the log for the next stable STH, and verify that it is consistent
-before posting a cosignature that can then be served by the log.
-
-## Acceptance criteria and scope
-A log should accept a leaf submission if it is:
-- Well-formed, see data structure definitions below.
-- Digitally signed by a registered namespace.
-
-Rate limits may be applied per namespace to combat spam.  Namespaces may also be
-used by clients to determine which entries belong to who.  It is up to the
-submitters to communicate trusted namespaces to their own clients.  In other
-words, there are no mappings from namespaces to identities built into the log.
-There is also no revocation of namespaces: **we facilitate _detection_ of
-compromised signing keys by making artifact hashes public, which is not to be
-confused with _prevention_ or even _recovery_ after detection**.
-
-## Data structure definitions
-Data structures are defined and serialized using the presentation language in
-[RFC 5246, §4](https://tools.ietf.org/html/rfc5246).  A definition of the log's
-Merkle tree can be found in [RFC 6962,
-§2](https://tools.ietf.org/html/rfc6962#section-2).
-
-### Namespace
-A _namespace_ is a versioned data structure that contains a public verification
-key (or fingerprint), as well as enough information to determine its format,
-signing, and verification operations.  Namespaces are used as identifiers, both
-for the log itself and the parties that submit artifact hashes and cosignatures.
-
-```
-enum {
-	reserved(0),
-	ed25519_v1(1),
-	(2^16-1)
-} NamespaceFormat;
-
-struct {
-	NamespaceFormat format;
-	select (format) {
-		case ed25519_v1: Ed25519V1;
-	} message;
-} Namespace;
-```
-
-Our namespace format is inspired by Keybase's
-[key-id](https://keybase.io/docs/api/1.0/kid).
-
-#### Ed25519V1
-At this time the only supported namespace type is based on Ed25519.  The
-namespace field contains the full verification key.  Signing operations and
-serialized formats are defined by [RFC
-8032](https://tools.ietf.org/html/rfc8032).
-```
-struct {
-	opaque namespace[32]; // public verification key
-} Ed25519V1;
-```
-
-### `StItem`
-A general-purpose `TransItem` is defined in [RFC 6962/bis,
-§4.5](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.5).
-We define our own `TransItem`, but name it `StItem` to emphasize that they are
-not the same.
-
-```
-enum {
-	reserved(0),
-	signed_tree_head_v1(1),
-	cosigned_tree_head_v1(2),
-	consistency_proof_v1(3),
-	inclusion_proof_v1(4),
-	signed_checksum_v1(5), // leaf type
-	(2^16-1)
-} StFormat;
-
-struct {
-	StFormat format;
-	select (format) {
-		case signed_tree_head_v1: SignedTreeHeadV1;
-		case cosigned_tree_head_v1: CosignedTreeHeadV1;
-		case consistency_proof_v1: ConsistencyProofV1;
-		case inclusion_proof_v1: InclusionProofV1;
-		case signed_checksum_v1: SignedChecksumV1;
-	} message;
-} StItem;
-
-struct {
-	StItem items<0..2^32-1>;
-} StItemList;
-```
-
-#### `signed_tree_head_v1`
-We use the same tree head definition as in [RFC 6962/bis,
-§4.9](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.9).
-The resulting _signed_ tree head is packaged differently: a namespace is used as
-log identifier, and it is communicated in a `SignatureV1` structure.
-```
-struct {
-	TreeHeadV1 tree_head;
-	SignatureV1 signature;
-} SignedTreeHeadV1;
-
-struct {
-	uint64 timestamp;
-	uint64 tree_size;
-	NodeHash root_hash;
-	Extension extensions<0..2^16-1>;
-} TreeHeadV1;
-opaque NodeHash<32..2^8-1>;
-
-struct {
-	Namespace namespace;
-	opaque signature<1..2^16-1>;
-} SignatureV1;
-```
-
-#### `cosigned_tree_head_v1`
-Transparency logs were designed to be cryptographically verifiable in the
-presence of a gossip-audit model that ensures everyone observes _the same
-cryptographically verifiable log_.  The gossip-audit model is largely undefined
-in today's transparency logging ecosystems, which means that the logs
-must be trusted to play by the rules.  We wanted to avoid that outcome in our
-ecosystem.  Therefore, a gossip-audit model is built into the log.
-
-The basic idea is that an STH should only be considered valid if it is cosigned
-by a number of witnesses that verify the append-only property.  Which witnesses
-to trust and under what circumstances is defined by a client-side _witness
-cosigning policy_.  For example:
-- "require no witness cosigning",
-- "must have at least `k` signatures from witnesses A...J", and
-- "must have at least `k` signatures from witnesses A...J where one is from
-witness B".
-
-Witness cosigning policies are beyond the scope of this specification.
-
-A cosigned STH is composed of an STH and a list of cosignatures.  A cosignature
-must cover the serialized STH as an `StItem`, and be produced with a witness
-namespace of type `ed25519_v1`.
-
-```
-struct {
-	SignedTreeHeadV1 signed_tree_head;
-	SignatureV1 cosignatures<0..2^32-1>; // vector of cosignatures
-} CosignedTreeHeadV1;
-```
-
-#### `consistency_proof_v1`
-For the most part we use the same consistency proof definition as in [RFC
-6962/bis,
-§4.11](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.11).
-There are two modifications: our log identifier is a namespace rather than an
-[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4),
-and a consistency proof may be empty.
-
-```
-struct {
-	Namespace log_id;
-	uint64 tree_size_1;
-	uint64 tree_size_2;
-	NodeHash consistency_path<0..2^16-1>;
-} ConsistencyProofV1;
-```
-
-#### `inclusion_proof_v1`
-For the most part we use the same inclusion proof definition as in [RFC
-6962/bis,
-§4.12](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.12).
-There are two modifications: our log identifier is a namespace rather than an
-[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4),
-and an inclusion proof may be empty.
-```
-struct {
-	Namespace log_id;
-	uint64 tree_size;
-	uint64 leaf_index;
-	NodeHash inclusion_path<0..2^16-1>;
-} InclusionProofV1;
-```
-
-#### `signed_checksum_v1`
-A checksum entry contains a package identifier like `foobar-1.2.3` and an
-artifact hash.   It is then signed so that clients can distinguish artifact
-hashes from two different software publishers A and B.  For example, the
-`signed_checksum_v1` type can help [enforce public binary logging before
-accepting a new software
-update](https://wiki.mozilla.org/Security/Binary_Transparency).
-
-```
-struct {
-	ChecksumV1 data;
-	SignatureV1 signature;
-} SignedChecksumV1;
-
-struct {
-	opaque identifier<1..128>;
-	opaque checksum<1..64>;
-} ChecksumV1;
-```
-
-It is assumed that clients know how to find the real artifact source (if not
-already at hand), such that the logged hash can be recomputed and compared for
-equality.  The log is not aware of how artifact hashes are computed, which means
-that it is up to the submitters to define hash functions, data formats, and
-such.
-
-## Public endpoints
-Clients talk to the log using HTTP(S).  Successfully processed requests are
-answered with HTTP status code `200 OK`, and any returned data is serialized.
-Endpoints without input parameters use HTTP GET requests.  Endpoints with input
-parameters use HTTP POST requests that carry a TLS-serialized data structure.
-The HTTP content type `application/octet-stream` is used when sending data.
-
-### add-entry
-```
-POST https://<base url>/st/v1/add-entry
-```
-
-Input:
-- An `StItem` of type `signed_checksum_v1`.
-
-No output.
-
-### add-cosignature
-```
-POST https://<base url>/st/v1/add-cosignature
-```
-
-Input:
-- An `StItem` of type `cosigned_tree_head_v1`.  The list of cosignatures must
-be of length one, the witness signature must cover the item's STH, and that STH
-must additionally match the log's stable STH that is currently being cosigned.
-
-No output.
-
-### get-latest-sth
-```
-GET https://<base url>/st/v1/get-latest-sth
-```
-
-No input.
-
-Output:
-- An `StItem` of type `signed_tree_head_v1` that corresponds to the most
-recent STH.
-
-### get-stable-sth
-```
-GET https://<base url>/st/v1/get-stable-sth
-```
-
-No input.
-
-Output:
-- An `StItem` of type `signed_tree_head_v1` that corresponds to a stable STH
-that witnesses should cosign.  The same STH is returned for a period of time.
-
-### get-cosigned-sth
-```
-GET https://<base url>/st/v1/get-cosigned-sth
-```
-
-No input.
-
-Output:
-- An `StItem` of type `cosigned_tree_head_v1` that corresponds to the most
-recent cosigned STH.
-
-### get-proof-by-hash
-```
-POST https://<base url>/st/v1/get-proof-by-hash
-```
-
-Input:
-```
-struct {
-	opaque hash[32]; // leaf hash
-	uint64 tree_size; // tree size that the proof should be based on
-} GetProofByHashV1;
-```
-
-Output:
-- An `StItem` of type `inclusion_proof_v1`.
-
-### get-consistency-proof
-```
-POST https://<base url>/st/v1/get-consistency-proof
-```
-
-Input:
-```
-struct {
-	uint64 first; // first tree size that the proof should be based on
-	uint64 second; // second tree size that the proof should be based on
-} GetConsistencyProofV1;
-```
-
-Output:
-- An `StItem` of type `consistency_proof_v1`.
-
-### get-entries
-```
-POST https://<base url>/st/v1/get-entries
-```
-
-Input:
-```
-struct {
-	uint64 start; // 0-based index of first entry to retrieve
-	uint64 end; // 0-based index of last entry to retrieve
-} GetEntriesV1;
-```
-
-Output:
-- An `StItemList` where each item is of type `signed_checksum_v1`.  The first
-`StItem` corresponds to the start index, the second one to `start+1`, etc.  The
-log may return fewer entries than requested.
-
-# Appendix A
-In the future other namespace types might be supported.  For example, we could
-add [RSASSA-PKCS1-v1_5](https://tools.ietf.org/html/rfc3447#section-8.2) as
-follows:
-1. Add `rsa_v1` format and `RSAV1` namespace.  This is what we would register on
-the server-side such that the server knows the namespace and complete key.
-```
-struct {
-	opaque namespace[32]; // key fingerprint
-	// + some encoding of public key
-} RSAV1;
-```
-2. Add `rsassa_pkcs1_5_v1` format and `RSASSAPKCS1_5V1`.  This is what the
-submitter would use to communicate namespace and RSA signature mode.
-```
-struct {
-	opaque namespace[32]; // key fingerprint
-	// + necessary parameters, e.g., SHA256 as hash function
-} RSASSAPKCS1_5V1;
-```