1 files changed, 116 insertions, 90 deletions
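
Reviewer note: a minimal sketch of how a client might handle the formats that this change specifies, namely percent-encoded key-value pairs, hex-encoded binary values, repeated "signature"/"key_hash" keys, and the `SHA256(0x00 | tree_leaf)` leaf hash. It is not part of the patch. The function names (`parse_tree_head_response`, `leaf_hash`, `proof_request_body`) and the returned dictionary layout are assumptions made for illustration; the document does not prescribe a client API.

```python
import hashlib
from urllib.parse import parse_qs, urlencode


def parse_tree_head_response(body: str) -> dict:
    """Decode a percent-encoded get-signed-tree-head response body.

    Hypothetical helper: the field names come from the document, the
    return layout is an assumption.
    """
    # parse_qs collects repeated keys into lists, preserving their order.
    fields = parse_qs(body, strict_parsing=True)
    signatures = [bytes.fromhex(s) for s in fields.get("signature", [])]
    key_hashes = [bytes.fromhex(k) for k in fields.get("key_hash", [])]
    if len(signatures) != len(key_hashes):
        raise ValueError("number of signatures and key hashes must match")
    return {
        "timestamp": int(fields["timestamp"][0]),
        "tree_size": int(fields["tree_size"][0]),
        "root_hash": bytes.fromhex(fields["root_hash"][0]),
        # The i-th signature corresponds to the i-th key hash.
        "cosignatures": list(zip(key_hashes, signatures)),
    }


def leaf_hash(tree_leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA256(0x00 | tree_leaf)."""
    return hashlib.sha256(b"\x00" + tree_leaf).digest()


def proof_request_body(tree_leaf: bytes, tree_size: int) -> str:
    """Build a get-proof-by-hash request body: hex first, then percent-encode."""
    return urlencode({
        "leaf_hash": leaf_hash(tree_leaf).hex(),
        "tree_size": tree_size,
    })
```

Hex output only uses characters that survive percent encoding unchanged, so the encoded body stays readable. Signature verification is deliberately left out, since it operates on the Trunnel-serialized `tree_head` rather than on the percent-encoded response.
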
diff --git a/doc/api.md b/doc/api.md
index b5d54e6..174d2c9 100644
--- a/doc/api.md
+++ b/doc/api.md
@@ -8,28 +8,18 @@ This is a work-in-progress document that may be moved or modified.
 
 ## Overview
 The log implements an HTTP(S) API:
-- Requests that add data to the log use the HTTP POST method.  The HTTP content
-type is `application/x-www-form-urlencoded`.  The posted data are key-value
-pairs.  Binary data must be base64-encoded.
-- Requests that retrieve data from the log use the HTTP GET method.  The HTTP
-content type is `application/x-www-form-urlencoded`.  Input parameters are
-key-value pairs.
-- Responses are JSON objects.  The HTTP content type is `application/json`.
-- Error messages are human-readable strings.  The HTTP content type is
-`text/plain`.
-
-We decided to use these web formats for requests and responses because the log
-is running as an HTTP(S) service.  In other words, anyone that interacts with
-the log is most likely using these formats already.  The other benefit is that
-all requests and responses are human-readable.  This makes it easier to
-understand the protocol, troubleshoot issues, and copy-paste.  We favored
-compatibility and understandability over a wire-efficient format.
-
-Note that we are not using JSON for signed and/or logged data.  In other words,
-a submitter that wishes to distribute log responses to their user base in a
-different format may do so.  The forced (de)serialization parser on _end-users_
-is a small subset of Trunnel.  Trunnel is an "idiot-proof" wire-format that the
-Tor project uses.
+- Requests that add data to the log use the HTTP POST method.
+- Requests that retrieve data from the log use the HTTP GET method.
+- The HTTP content type is `application/x-www-form-urlencoded` for requests and
+responses.  This means that all input and output are expressed as key-value
+pairs.  Binary data must be hex-encoded.
+
+We decided to use percent encoding for requests and responses because it is a
+_simple format_ that is commonly used on the web.  We are not using percent
+encoding for signed and/or logged data.  In other words, a submitter may
+distribute log responses to their end-users in a different format that suits
+them.  The forced (de)serialization parser on _end-users_ is a small subset of
+Trunnel.  Trunnel is an "idiot-proof" wire-format that the Tor project uses.
 
 ## Primitives
 ### Cryptography
@@ -49,6 +39,13 @@ padding.  Supporting RSA is suboptimal, but excluding it would make the log
 useless for many possible adopters.
 
 ### Serialization
+Log requests and responses are percent encoded.  Percent encoding is a smaller
+dependency than an alternative parser like JSON.  It is comparable to rolling
+your own minimalistic line-terminated format.  Some input and output data is
+binary: cryptographic hashes and signatures.  Binary data must be expressed as
+hex before percent-encoding it.  We decided to use hex as opposed to base64
+because it is simpler, favoring simplicity over efficiency on the wire.
+
 We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html)
 to define (de)serialization of data structures that need to be signed or
 inserted into the Merkle tree.  Trunnel is more expressive than the
@@ -62,13 +59,12 @@ A fair summary of our Trunnel usage is as follows.
 
 All integers are 64-bit, unsigned, and in network byte order.  A fixed-size byte
 array is put into the serialization buffer in-order, starting from the first
-byte.  These basic types are concatenated to form a collection.  You should not
-need a general-purpose Trunnel (de)serialization parser to work with this
-format.  If you have one, you may use it though.  The main point of using
-Trunnel is that it makes a simple format explicit and unambiguous.
-
-TODO: URL-encode _or_ JSON?  I think we should only need one.  Always doing HTTP
-POST would also ensure that input parameters don't show up in web server logs.
+byte.  A variable-length byte array first declares its length as an integer,
+which is then followed by that number of bytes.  These basic types are
+concatenated to form a collection.  You should not need a general-purpose
+Trunnel (de)serialization parser to work with this format.  If you have one, you
+may use it though.  The main point of using Trunnel is that it makes a simple
+format explicit and unambiguous.
 
 #### Merkle tree head
 Tree heads are signed by the log and its witnesses.  It contains a timestamp, a
@@ -160,91 +156,124 @@ that it must be a valid HTTP(S) URL that can have the `/st/v0/<endpoint>` suffix
 appended.  For example, a complete endpoint URL could be
 `https://log.example.com/2021/st/v0/get-signed-tree-head`.
 
+The HTTP status code is 200 OK to indicate success.  A different HTTP status
+code is used to indicate failure.  On failure, the log should set the "error"
+key to a human-readable value that describes what went wrong.  For example,
+`error=invalid+signature`, `error=rate+limit+exceeded`, or
+`error=unknown+leaf+hash`.
+
 ### get-signed-tree-head
 ```
 GET <base url>/st/v0/get-signed-tree-head
 ```
 
-Input key-value pairs:
-- `type`: either the string "latest", "stable", or "cosigned".
-	- "latest": ask for the most recent signed tree head.
-	- "stable": ask for a recent signed tree head that is fixed for some period
+Input:
+- "type": either the string "latest", "stable", or "cosigned".
+	- latest: ask for the most recent signed tree head.
+	- stable: ask for a recent signed tree head that is fixed for some period
 	  of time.
-	- "cosigned": ask for a recent cosigned tree head.
-
-Output:
-- On success: status 200 OK and a signed tree head.  The response body is
-defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/sth.schema.json).
-- On failure: a different status code and a human-readable error message.
+	- cosigned: ask for a recent cosigned tree head.
+
+Output on success:
+- "timestamp": `tree_head.timestamp` as a human-readable number.
+- "tree_size": `tree_head.tree_size` as a human-readable number.
+- "root_hash": `tree_head.root_hash` in hex.
+- "signature": an Ed25519 signature over `tree_head`.  The result is
+hex-encoded.
+- "key_hash": a hash of the public verification key that can be used to verify
+the signature.  The public verification key is serialized as in RFC 8032, then
+hashed using SHA256.  The result is hex-encoded.
+
+The "signature" and "key_hash" fields may repeat.  The first signature
+corresponds to the first key hash, the second signature corresponds to the
+second key hash, etc.  The number of signatures and key hashes must match.
 
 ### get-proof-by-hash
 ```
 POST <base url>/st/v0/get-proof-by-hash
 ```
 
-Input key-value pairs:
-- `leaf_hash`: a base64-encoded leaf hash that identifies which `tree_leaf` the
+Input:
+- "leaf_hash": a hex-encoded leaf hash that identifies which `tree_leaf` the
 log should prove inclusion for.  The leaf hash is computed using the RFC 6962
-hashing strategy.  In other words, `H(0x00 | tree_leaf)`.
-- `tree_size`: the tree size of a tree head that the proof should be based on.
+hashing strategy.  In other words, `SHA256(0x00 | tree_leaf)`.
+- "tree_size": a human-readable tree size of the tree head that the proof should
+be based on.
 
-Output:
-- On success: status 200 OK and an inclusion proof.  The response body is
-defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/inclusion_proof.schema.json).
-- On failure: a different status code and a human-readable error message.
+Output on success:
+- "tree_size": human-readable tree size that the proof is based on.
+- "leaf_index": human-readable zero-based index of the leaf that the proof is
+based on.
+- "inclusion_path": a node hash in hex.
+
+The "inclusion_path" may be omitted or repeated to represent an inclusion proof
+of zero or more node hashes.  The order of node hashes follows from our hash
+strategy, see RFC 6962.
 
 ### get-consistency-proof
 ```
 POST <base url>/st/v0/get-consistency-proof
 ```
 
-Input key-value pairs:
-- `new_size`: the tree size of a newer tree head.
-- `old_size`: the tree size of an older tree head that the log should prove is
-consistent with the newer tree head.
+Input:
+- "new_size": human-readable tree size of a newer tree head.
+- "old_size": human-readable tree size of an older tree head that the log should
+prove is consistent with the newer tree head.
+
+Output on success:
+- "new_size": human-readable tree size of a newer tree head that the proof
+is based on.
+- "old_size": human-readable tree size of an older tree head that the proof is
+based on.
+- "consistency_path": a node hash in hex.
 
-Output:
-- On success: status 200 OK and a consistency proof.  The response body is
-defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/consistency_proof.schema.json).
-- On failure: a different status code and a human-readable error message.
+The "consistency_path" may be omitted or repeated to represent a consistency
+proof of zero or more node hashes.  The order of node hashes follows from our
+hash strategy, see RFC 6962.
 
 ### get-leaves
 ```
 POST <base url>/st/v0/get-leaves
 ```
 
-Input key-value pairs:
-- `start_size`: zero-based index of the first leaf to retrieve.
-- `end_size`: index of the last leaf to retrieve.
+Input:
+- "start_size": human-readable index of the first leaf to retrieve.
+- "end_size": human-readable index of the last leaf to retrieve.
 
-Output:
-- On success: status 200 OK and a list of leaves.  The response body is
-defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/leaves.schema.json).
-- On failure: a different status code and a human-readable error message.
+Output on success:
+- "shard_hint": `tree_leaf.message.shard_hint` as a human-readable number.
+- "checksum": `tree_leaf.message.checksum` in hex.
+- "signature_scheme": human-readable number that identifies a signature scheme.
+- "signature": `tree_leaf.signature` in hex.
+- "key_hash": `tree_leaf.key_hash` in hex.
 
-The log may truncate the list of returned leaves.  However, it must not be an
-empty list on success. 
+All fields may be repeated to return more than one leaf.  The first value in
+each list refers to the first leaf, the second value in each list refers to the
+second leaf, etc.  The size of each list must match.
+
+The log may return fewer leaves than requested.  At least one leaf must be
+returned on HTTP status code 200 OK.
 
 ### add-leaf
 ```
 POST <base url>/st/v0/add-leaf
 ```
 
-Input key-value pairs:
-- `shard_hint`: the shard hint that the submitter selected.
-- `checksum`: the checksum that the submitter wants to log in base64.
-- `signature_scheme`: the signature scheme that the submitter wants to use.
-- `signature`: the submitter's signature over `tree_leaf.message` in base64.
-- `verification_key`: the submitter's public verification key.  It is serialized
-as described in the corresponding RFC, then base64-encoded.
-- `domain_hint`: a domain name that indicates where the public verification-key
-hash can be downloaded in base64.  Supported methods: DNS and HTTPS
-(TODO: docdoc).
-
-Output:
-- On success: HTTP 200.  The log will _try_ to incorporate the submitted leaf
-into its Merkle tree.
-- On failure: a different status code and a human-readable error message.
+Input:
+- "shard_hint": human-readable number in the log's shard interval that the
+submitter selected.
+- "checksum": the cryptographic checksum that the submitter wants to log, in hex.
+- "signature_scheme": human-readable number that identifies the submitter's
+signature scheme.
+- "signature": the submitter's signature over `tree_leaf.message`.  The result
+is hex-encoded.
+- "verification_key": the submitter's public verification key.  It is serialized
+as described in the corresponding RFC.  The result is hex-encoded.
+- "domain_hint": a domain name that indicates where a hex-encoded
+`tree_leaf.key_hash` can be retrieved as a DNS TXT resource record.
+
+Output on success:
+- None
 
 The submitted entry will not be accepted if the signature is invalid or if the
 downloaded verification-key hash does not match.  The submitted entry may also
@@ -260,23 +289,20 @@ Public logging should not be assumed until an inclusion proof is available.  An
 inclusion proof should not be relied upon unless it leads up to a trustworthy
 signed tree head.  Witness cosigning can make a tree head trustworthy.
 
-TODO: the log may allow no `domain_hint`?  Especially useful for v0 testing.
-
 ### add-cosignature
 ```
 POST <base url>/st/v0/add-cosignature
 ```
 
-Input key-value pairs:
-- `signature`: a base64-encoded signature over a `tree_head` that is fixed for
-some period of time. The cosigning witness retrieves the tree head using the
-`get-signed-tree-head` endpoint with the "stable" type.
-- `key_hash`: a base64-encoded hash of the public verification key that can be
-used to verify the signature.
+Input:
+- "signature": an Ed25519 signature over `tree_head`.  The result is
+hex-encoded.
+- "key_hash": a hash of the public verification key that can be used to verify
+the signature.  The public verification key is serialized as in RFC 8032, then
+hashed using SHA256.  The result is hex-encoded.
 
-Output:
-- HTTP status 200 OK on success.  Otherwise a different status code and a
-human-readable error message.
+Output on success:
+- None
 
 The key-hash can be used to identify which witness signed the log's tree head.
 A key-hash, rather than the full verification key, is used to force the verifier