From 24cc6b0db8ef9c718925d14b329f21938e5d2b1b Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Tue, 20 Apr 2021 12:28:28 +0200 Subject: started on our in-progress (re)design documents --- doc/api.md | 247 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 247 insertions(+) create mode 100644 doc/api.md (limited to 'doc/api.md') diff --git a/doc/api.md b/doc/api.md new file mode 100644 index 0000000..760663b --- /dev/null +++ b/doc/api.md @@ -0,0 +1,247 @@ +# System Transparency Logging: API v0 +This document describes details of the System Transparency logging API, +version 0. The broader picture is not explained here. We assume that you have +read the System Transparency design document. It can be found [here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). + +**Warning.** +This is a work-in-progress document that may be moved or modified. + +## Overview +The log implements an HTTP(S) API: +- Requests that add data to the log use the HTTP POST method. The HTTP content +type is `application/x-www-form-urlencoded`. The posted data are key-value +pairs. Binary data must be base64-encoded. +- Requests that retrieve data from the log use the HTTP GET method. The HTTP +content type is `application/x-www-form-urlencoded`. Input parameters are +key-value pairs. +- Responses are JSON objects. The HTTP content type is `application/json`. +- Error messages are human-readable strings. The HTTP content type is +`text/plain`. + +We decided to use these web formats for requests and responses because the log +is running as an HTTP(S) service. In other words, anyone that interacts with +the log is most likely using these formats already. The other benefit is that +all requests and responses are human-readable. This makes it easier to +understand the protocol, troubleshoot issues, and copy-paste. We favored +compatibility and understandability over a wire-efficient format. + +Note that we are not using JSON for signed and/or logged data. In other words, +a submitter that wishes to distribute log responses to their user base in a +different format may do so. The forced (de)serialization parser on _end-users_ +is a small subset of Trunnel. Trunnel is an "idiot-proof" wire-format that the +Tor project uses. + +## Primitives +### Cryptography +The log uses the same Merkle tree hash strategy as [RFC 6962, ยง2](https://tools.ietf.org/html/rfc6962#section-2). +The hash functions must be [SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). +The log must sign tree heads using [Ed25519](https://tools.ietf.org/html/rfc8032). +The log's witnesses must also sign tree heads using Ed25519. + +All other parts that are not Merkle tree related also use SHA256 as the hash +function. Using more than one hash function would increases the overall attack +surface: two hash functions must be collision resistant instead of one. + +We recommend that submitters sign using Ed25519. We also support RSA with +[deterministic](https://tools.ietf.org/html/rfc8017#section-8.2) +or [probabilistic](https://tools.ietf.org/html/rfc8017#section-8.1) +padding. Supporting RSA is suboptimal, but excluding it would make the log +useless for many possible adopters. + +### Serialization +We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) +to define (de)serialization of data structures that need to be signed or +inserted into the Merkle tree. Trunnel is more expressive than the +[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5). +It is about as expressive as the [TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). +A notable difference is that Trunnel supports integer constraints. The Trunnel +language is also readable by humans _and_ machines. "Obviously correct code" +can be generated in C and Go. + +A fair summary of our Trunnel usage is as follows. + +All integers are 64-bit, unsigned, and in network byte order. A fixed-size byte +array is put into the serialization buffer in-order, starting from the first +byte. These basic types are concatenated to form a collection. You should not +need a general-purpose Trunnel (de)serialization parser to work with this +format. If you have one, you may use it though. The main point of using +Trunnel is that it makes a simple format explicit and unambiguous. + +TODO: URL-encode _or_ JSON? I think we should only need one. Always doing HTTP +POST would also ensure that input parameters don't show up in web server logs. + +#### Merkle tree head +Tree heads are signed by the log and its witnesses. It contains a timestamp, a +tree size, and a root hash. The timestamp is included so that monitors can +ensure _liveliness_. It is the time since the UNIX epoch (January 1, 1970 +00:00:00 UTC) in milliseconds. The tree size specifies the current number of +leaves. The root hash fixes the structure and content of the Merkle tree. + +``` +struct tree_head { + u64 timestamp; + u64 tree_size; + u8 root_hash[32]; +}; +``` + +The serialized tree head must be signed using Ed25519. A witness must only sign +the log's tree head if it is consistent with prior history and the timestamp is +roughly correct. A timestamp is roughly correct if it is not backdated or +future-dated more than 12 hours. + +#### Merkle tree leaf +The log supports a single leaf type. It contains a checksum, a signature +scheme, a signature that the submitter computed over that checksum, and the hash +of the public verification key that can be used to verify the signature. + +``` +const ALG_ED25519 = 1; // RFC 8032 +const ALG_RSASSA_PKCS1_V1_5 = 2; // RFC 8017 +const ALG_RSASSA_PSS = 3; // RFC 8017 + +struct tree_leaf { + u8 checksum[32]; + u64 signature_scheme IN [ + ALG_ED25519, + ALG_RSASSA_PKCS1_V1_5, + ALG_RSASSA_PSS, + ]; + union signature[signature_scheme] { + ALG_ED25519: u8 ed25519[32]; + default: u8 rsa[512]; + } + u8 key_hash[32]; +} +``` + +A key-hash is included in the leaf so that it can be attributed to the signing +entity. A hash, rather than the full public verification key, is used to force +the verifier to locate the appropriate key and make an explicit trust decision. + +## Public endpoints +Every log has a base URL that identifies it uniquely. The only constraint is +that it must be a valid HTTP(S) URL that can have the `/st/v0/` suffix +appended. For example, a complete endpoint URL could be +`https://log.example.com/2021/st/v0/get-signed-tree-head`. + +### get-signed-tree-head +``` +GET /st/v0/get-signed-tree-head +``` + +Input key-value pairs: +- `type`: either the string "latest", "stable", or "cosigned". + - "latest": ask for the most recent signed tree head. + - "stable": ask for a recent signed tree head that is fixed for some period + of time. + - "cosigned": ask for a recent cosigned tree head. + +Output: +- On success: status 200 OK and a signed tree head. The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/sth.schema.json). +- On failure: a different status code and a human-readable error message. + +### get-proof-by-hash +``` +POST /st/v0/get-proof-by-hash +``` + +Input key-value pairs: +- `leaf_hash`: a base64-encoded leaf hash that identifies which `tree_leaf` the +log should prove inclusion for. The leaf hash is computed using the RFC 6962 +hashing strategy. In other words, `H(0x00 | tree_leaf)`. +- `tree_size`: the tree size of a tree head that the proof should be based on. + +Output: +- On success: status 200 OK and an inclusion proof. The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/inclusion_proof.schema.json). +- On failure: a different status code and a human-readable error message. + +### get-consistency-proof +``` +POST /st/v0/get-consistency-proof +``` + +Input key-value pairs: +- `new_size`: the tree size of a newer tree head. +- `old_size`: the tree size of an older tree head that the log should prove is +consistent with the newer tree head. + +Output: +- On success: status 200 OK and a consistency proof. The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/consistency_proof.schema.json). +- On failure: a different status code and a human-readable error message. + +### get-leaves +``` +POST /st/v0/get-leaves +``` + +Input key-value pairs: +- `start_size`: zero-based index of the first leaf to retrieve. +- `end_size`: index of the last leaf to retrieve. + +Output: +- On success: status 200 OK and a list of leaves. The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/leaves.schema.json). +- On failure: a different status code and a human-readable error message. + +The log may truncate the list of returned leaves. However, it must not be an +empty list on success. + +### add-leaf +``` +POST /st/v0/add-leaf +``` + +Input key-value pairs: +- `leaf_checksum`: the checksum that the submitter wants to log in base64. +- `signature_scheme`: the signature scheme that the submitter wants to use. +- `tree_leaf_signature`: the submitter's `tree_leaf` signature in base64. +- `verification_key`: the submitter's public verification key. It is serialized +as described in the corresponding RFC, then base64-encoded. +- `domain_hint`: a domain name that indicates where the public verification-key +hash can be downloaded in base64. Supported methods: DNS and HTTPS +(TODO: docdoc). + +Output: +- On success: HTTP 200. The log will _try_ to incorporate the submitted leaf +into its Merkle tree. +- On failure: a different status code and a human-readable error message. + +The submitted entry will not be accepted if the signature is invalid or if the +downloaded verification-key hash does not match. The submitted entry may also +not be accepted if the second-level domain name exceeded its rate limit. By +coupling every add-leaf request with a second-level domain, it becomes more +difficult to spam the log. You would need an excessive number of domain names. +This becomes costly if free domain names are rejected. + +The log does not publish domain-name to key bindings because key management is +more complex than that. + +Public logging should not be assumed until an inclusion proof is available. An +inclusion proof should not be relied upon unless it leads up to a trustworthy +signed tree head. Witness cosigning can make a tree head trustworthy. + +TODO: the log may allow no `domain_hint`? Especially useful for v0 testing. + +### add-cosignature +``` +POST /st/v0/add-cosignature +``` + +Input key-value pairs: +- `signature`: a base64-encoded signature over a `tree_head` that is fixed for +some period of time. The cosigning witness retrieves the tree head using the +`get-signed-tree-head` endpoint with the "stable" type. +- `key_hash`: a base64-encoded hash of the public verification key that can be +used to verify the signature. + +Output: +- HTTP status 200 OK on success. Otherwise a different status code and a +human-readable error message. + +The key-hash can be used to identify which witness signed the log's tree head. +A key-hash, rather than the full verification key, is used to force the verifier +to locate the appropriate key and make an explicit trust decision. -- cgit v1.2.3