-rw-r--r-- | doc/api.md | 206 |
1 files changed, 116 insertions, 90 deletions
@@ -8,28 +8,18 @@ This is a work-in-progress document that may be moved or modified. ## Overview The log implements an HTTP(S) API: -- Requests that add data to the log use the HTTP POST method. The HTTP content -type is `application/x-www-form-urlencoded`. The posted data are key-value -pairs. Binary data must be base64-encoded. -- Requests that retrieve data from the log use the HTTP GET method. The HTTP -content type is `application/x-www-form-urlencoded`. Input parameters are -key-value pairs. -- Responses are JSON objects. The HTTP content type is `application/json`. -- Error messages are human-readable strings. The HTTP content type is -`text/plain`. - -We decided to use these web formats for requests and responses because the log -is running as an HTTP(S) service. In other words, anyone that interacts with -the log is most likely using these formats already. The other benefit is that -all requests and responses are human-readable. This makes it easier to -understand the protocol, troubleshoot issues, and copy-paste. We favored -compatibility and understandability over a wire-efficient format. - -Note that we are not using JSON for signed and/or logged data. In other words, -a submitter that wishes to distribute log responses to their user base in a -different format may do so. The forced (de)serialization parser on _end-users_ -is a small subset of Trunnel. Trunnel is an "idiot-proof" wire-format that the -Tor project uses. +- Requests that add data to the log use the HTTP POST method. +- Request that retrieve data from the log use the HTTP GET method. +- The HTTP content type is `application/x-www-form-urlencoded` for requests and +responses. This means that all input and output are expressed as key-value +pairs. Binary data must be hex-encoded. + +We decided to use percent encoding for requests and responses because it is a +_simple format_ that is commonly used on the web. We are not using percent +encoding for signed and/or logged data. 
In other words, a submitter may +distribute log responses to their end-users in a different format that suit +them. The forced (de)serialization parser on _end-users_ is a small subset of +Trunnel. Trunnel is an "idiot-proof" wire-format that the Tor project uses. ## Primitives ### Cryptography @@ -49,6 +39,13 @@ padding. Supporting RSA is suboptimal, but excluding it would make the log useless for many possible adopters. ### Serialization +Log requests and responses are percent encoded. Percent encoding is a smaller +dependency than an alternative parser like JSON. It is comparable to rolling +your own minimalistic line-terminated format. Some input and output data is +binary: cryptographic hashes and signatures. Binary data must be expressed as +hex before percent-encoding it. We decided to use hex as opposed to base64 +because it is simpler, favoring simplicity over efficiency on the wire. + We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) to define (de)serialization of data structures that need to be signed or inserted into the Merkle tree. Trunnel is more expressive than the @@ -62,13 +59,12 @@ A fair summary of our Trunnel usage is as follows. All integers are 64-bit, unsigned, and in network byte order. A fixed-size byte array is put into the serialization buffer in-order, starting from the first -byte. These basic types are concatenated to form a collection. You should not -need a general-purpose Trunnel (de)serialization parser to work with this -format. If you have one, you may use it though. The main point of using -Trunnel is that it makes a simple format explicit and unambiguous. - -TODO: URL-encode _or_ JSON? I think we should only need one. Always doing HTTP -POST would also ensure that input parameters don't show up in web server logs. +byte. A variable length byte array first declares its length as an integer, +which is then followed by that number of bytes. 
These basic types are +concatenated to form a collection. You should not need a general-purpose +Trunnel (de)serialization parser to work with this format. If you have one, you +may use it though. The main point of using Trunnel is that it makes a simple +format explicit and unambiguous. #### Merkle tree head Tree heads are signed by the log and its witnesses. It contains a timestamp, a @@ -160,91 +156,124 @@ that it must be a valid HTTP(S) URL that can have the `/st/v0/<endpoint>` suffix appended. For example, a complete endpoint URL could be `https://log.example.com/2021/st/v0/get-signed-tree-head`. +The HTTP status code is 200 OK to indicate success. A different HTTP status +code is used to indicate failure. The log should set the "error" key to a +human-readable value that describes what went wrong. For example, +`error=invalid+signature`, `error=rate+limit+exceeded`, or +`error=unknown+leaf+hash`. + ### get-signed-tree-head ``` GET <base url>/st/v0/get-signed-tree-head ``` -Input key-value pairs: -- `type`: either the string "latest", "stable", or "cosigned". - - "latest": ask for the most recent signed tree head. - - "stable": ask for a recent signed tree head that is fixed for some period +Input: +- "type": either the string "latest", "stable", or "cosigned". + - latest: ask for the most recent signed tree head. + - stable: ask for a recent signed tree head that is fixed for some period of time. - - "cosigned": ask for a recent cosigned tree head. - -Output: -- On success: status 200 OK and a signed tree head. The response body is -defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/sth.schema.json). -- On failure: a different status code and a human-readable error message. + - cosigned: ask for a recent cosigned tree head. + +Output on success: +- "timestamp": `tree_head.timestamp` as a human-readable number. +- "tree_size": `tree_head.tree_size` as a human-readable number. 
+- "root_hash": `tree_head.root_hash` in hex. +- "signature": an Ed25519 signature over `tree_head`. The result is +hex-encoded. +- "key_hash": a hash of the public verification key that can be used to verify +the signature. The public verification key is serialized as in RFC 8032, then +hashed using SHA256. The result is hex-encoded. + +The "signature" and "key_hash" fields may repeat. The first signature +corresponds to the first key hash, the second signature corresponds to the +second key hash, etc. The number of signatures and key hashes must match. ### get-proof-by-hash ``` POST <base url>/st/v0/get-proof-by-hash ``` -Input key-value pairs: -- `leaf_hash`: a base64-encoded leaf hash that identifies which `tree_leaf` the +Input: +- "leaf_hash": a hex-encoded leaf hash that identifies which `tree_leaf` the log should prove inclusion for. The leaf hash is computed using the RFC 6962 -hashing strategy. In other words, `H(0x00 | tree_leaf)`. -- `tree_size`: the tree size of a tree head that the proof should be based on. +hashing strategy. In other words, `SHA256(0x00 | tree_leaf)`. +- "tree_size": a human-readable tree size of the tree head that the proof should +be based on. -Output: -- On success: status 200 OK and an inclusion proof. The response body is -defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/inclusion_proof.schema.json). -- On failure: a different status code and a human-readable error message. +Output on success: +- "tree_size": human-readable tree size that the proof is based on. +- "leaf_index": human-readable zero-based index of the leaf that the proof is +based on. +- "inclusion_path": a node hash in hex. + +The "inclusion_path" may be omitted or repeated to represent an inclusion proof +of zero or more node hashes. The order of node hashes follow from our hash +strategy, see RFC 6962. 
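As an illustration of the `get-proof-by-hash` encoding described above, the following is a minimal client-side sketch, not part of the change itself. The function names (`leaf_hash_hex`, `encode_request`, `parse_response`) are hypothetical, and the sketch assumes that "human-readable" numbers are plain decimal strings and that the response body uses the same `application/x-www-form-urlencoded` key-value layout as the request.

```python
import hashlib
from urllib.parse import parse_qs, urlencode


def leaf_hash_hex(tree_leaf: bytes) -> str:
    """Return SHA256(0x00 | tree_leaf) in hex (RFC 6962 leaf hashing)."""
    return hashlib.sha256(b"\x00" + tree_leaf).hexdigest()


def encode_request(tree_leaf: bytes, tree_size: int) -> str:
    """Build the POST body for get-proof-by-hash.

    Assumption: tree_size is sent as a decimal string.
    """
    return urlencode({
        "leaf_hash": leaf_hash_hex(tree_leaf),
        "tree_size": tree_size,
    })


def parse_response(body: str) -> dict:
    """Parse a successful get-proof-by-hash response body.

    "inclusion_path" may be omitted or repeated, so it is read as a list
    of zero or more node hashes, kept in the order they appear.
    """
    fields = parse_qs(body)
    return {
        "tree_size": int(fields["tree_size"][0]),
        "leaf_index": int(fields["leaf_index"][0]),
        "inclusion_path": [
            bytes.fromhex(h) for h in fields.get("inclusion_path", [])
        ],
    }
```

Reading repeated keys as an ordered list matters here because the order of node hashes follows the RFC 6962 hashing strategy; a parser that keeps only the last value per key would silently drop most of the proof.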
### get-consistency-proof ``` POST <base url>/st/v0/get-consistency-proof ``` -Input key-value pairs: -- `new_size`: the tree size of a newer tree head. -- `old_size`: the tree size of an older tree head that the log should prove is -consistent with the newer tree head. +Input: +- "new_size": human-readable tree size of a newer tree head. +- "old_size": human-readable tree size of an older tree head that the log should +prove is consistent with the newer tree head. + +Output on success: +- "new_size": human-readable tree size of a newer tree head that the proof +is based on. +- "old_size": human-readable tree size of an older tree head that the proof is +based on. +- "consistency_path": a node hash in hex. -Output: -- On success: status 200 OK and a consistency proof. The response body is -defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/consistency_proof.schema.json). -- On failure: a different status code and a human-readable error message. +The "consistency_path" may be omitted or repeated to represent a consistency +proof of zero or more node hashes. The order of node hashes follow from our +hash strategy, see RFC 6962. ### get-leaves ``` POST <base url>/st/v0/get-leaves ``` -Input key-value pairs: -- `start_size`: zero-based index of the first leaf to retrieve. -- `end_size`: index of the last leaf to retrieve. +Input: +- "start_size": human-readable index of the first leaf to retrieve. +- "end_size": human-readable index of the last leaf to retrieve. -Output: -- On success: status 200 OK and a list of leaves. The response body is -defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/leaves.schema.json). -- On failure: a different status code and a human-readable error message. +Output on success: +- "shard_hint": `tree_leaf.message.shard_hint` as a human-readable number. +- "checksum": `tree_leaf.message.checksum` in hex. 
+- "signature_scheme": human-readable number that identifies a signature scheme. +- "signature": `tree_leaf.signature` in hex. +- "key_hash": `tree_leaf.key_hash` in hex. -The log may truncate the list of returned leaves. However, it must not be an -empty list on success. +All fields may be repeated to return more than one leaf. The first value in +each list refers to the first leaf, the second value in each list refers to the +second leaf, etc. The size of each list must match. + +The log may return fewer leaves than requested. At least one leaf must be +returned on HTTP status code 200 OK. ### add-leaf ``` POST <base url>/st/v0/add-leaf ``` -Input key-value pairs: -- `shard_hint`: the shard hint that the submitter selected. -- `checksum`: the checksum that the submitter wants to log in base64. -- `signature_scheme`: the signature scheme that the submitter wants to use. -- `signature`: the submitter's signature over `tree_leaf.message` in base64. -- `verification_key`: the submitter's public verification key. It is serialized -as described in the corresponding RFC, then base64-encoded. -- `domain_hint`: a domain name that indicates where the public verification-key -hash can be downloaded in base64. Supported methods: DNS and HTTPS -(TODO: docdoc). - -Output: -- On success: HTTP 200. The log will _try_ to incorporate the submitted leaf -into its Merkle tree. -- On failure: a different status code and a human-readable error message. +Input: +- "shard_hint": human-readable number in the log's shard interval that the +submitter selected. +- "checksum": the cryptographic checksum that the submitter wants to log in hex. +- "signature_scheme": human-readable number that identifies the submitter's +signature scheme. +- "signature": the submitter's signature over `tree_leaf.message`. The result +is hex-encoded. +- "verification_key": the submitter's public verification key. It is serialized +as described in the corresponding RFC. The result is hex-encoded. 
+- "domain_hint": a domain name that indicates where `tree_leaf.key_hash` can be +retrieved as a DNS TXT resource record in hex. + +Output on success: +- None The submitted entry will not be accepted if the signature is invalid or if the downloaded verification-key hash does not match. The submitted entry may also @@ -260,23 +289,20 @@ Public logging should not be assumed until an inclusion proof is available. An inclusion proof should not be relied upon unless it leads up to a trustworthy signed tree head. Witness cosigning can make a tree head trustworthy. -TODO: the log may allow no `domain_hint`? Especially useful for v0 testing. - ### add-cosignature ``` POST <base url>/st/v0/add-cosignature ``` -Input key-value pairs: -- `signature`: a base64-encoded signature over a `tree_head` that is fixed for -some period of time. The cosigning witness retrieves the tree head using the -`get-signed-tree-head` endpoint with the "stable" type. -- `key_hash`: a base64-encoded hash of the public verification key that can be -used to verify the signature. +Input: +- "signature": an Ed25519 signature over `tree_head`. The result is +hex-encoded. +- "key_hash": a hash of the public verification key that can be used to verify +the signature. The public verification key is serialized as in RFC 8032, then +hashed using SHA256. The result is hex-encoded. -Output: -- HTTP status 200 OK on success. Otherwise a different status code and a -human-readable error message. +Output on success: +- None The key-hash can be used to identify which witness signed the log's tree head. A key-hash, rather than the full verification key, is used to force the verifier |