-rw-r--r-- | doc/api.md | 323 |
1 files changed, 181 insertions, 142 deletions
@@ -1,7 +1,9 @@ # System Transparency Logging: API v0 -This document describes details of the System Transparency logging API, -version 0. The broader picture is not explained here. We assume that you have -read the System Transparency Logging design document. It can be found [here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). +This document describes details of the System Transparency logging +API, version 0. The broader picture is not explained here. We assume +that you have read the System Transparency Logging design document. +It can be found +[here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). **Warning.** This is a work-in-progress document that may be moved or modified. @@ -17,24 +19,28 @@ The log implements an HTTP(S) API: - Binary data is hex-encoded before being transmitted. The motivation for using a text based key/value format for request and -response data is that it's simple to parse. Note that this format is not being -used for the serialization of signed or logged data, where a more -well defined and storage efficient format is desirable. -A submitter may distribute log responses to their end-users in any +response data is that it's simple to parse. Note that this format is +not being used for the serialization of signed or logged data, where a +more well defined and storage efficient format is desirable. A +submitter may distribute log responses to their end-users in any format that suits them. The (de)serialization required for _end-users_ is a small subset of Trunnel. Trunnel is an "idiot-proof" wire-format in use by the Tor project. ## Primitives ### Cryptography -The log uses the same Merkle tree hash strategy as [RFC 6962, §2](https://tools.ietf.org/html/rfc6962#section-2). -The hash functions must be [SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). 
-The log must sign tree heads using [Ed25519](https://tools.ietf.org/html/rfc8032). -The log's witnesses must also sign tree heads using Ed25519. - -All other parts that are not Merkle tree related also use SHA256 as the hash -function. Using more than one hash function would increases the overall attack -surface: two hash functions must be collision resistant instead of one. +The log uses the same Merkle tree hash strategy as +[RFC 6962,§2](https://tools.ietf.org/html/rfc6962#section-2). +The hash functions must be +[SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). +The log must sign tree heads using +[Ed25519](https://tools.ietf.org/html/rfc8032). The log's witnesses +must also sign tree heads using Ed25519. + +All other parts that are not Merkle tree related also use SHA256 as +the hash function. Using more than one hash function would increases +the overall attack surface: two hash functions must be collision +resistant instead of one. ### Serialization Log requests and responses are transmitted as ASCII-encoded key/value @@ -45,32 +51,36 @@ encoding. Using hex as opposed to base64 is motivated by it being simpler, favoring ease of decoding and encoding over efficiency on the wire. -We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) +We use the +[Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) to define (de)serialization of data structures that need to be signed or inserted into the Merkle tree. Trunnel is more expressive than the [SSH wire format](https://tools.ietf.org/html/rfc4251#section-5). -It is about as expressive as the [TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). -A notable difference is that Trunnel supports integer constraints. The Trunnel -language is also readable by humans _and_ machines. 
"Obviously correct code" -can be generated in C and Go. +It is about as expressive as the +[TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). +A notable difference is that Trunnel supports integer constraints. +The Trunnel language is also readable by humans _and_ machines. +"Obviously correct code" can be generated in C and Go. A fair summary of our Trunnel usage is as follows. -All integers are 64-bit, unsigned, and in network byte order. Fixed-size byte -arrays are put into the serialization buffer in-order, starting from the first -byte. Variable length byte arrays first declare their length as an integer, -which is then followed by that number of bytes. These basic types are -concatenated to form a collection. You should not need a general-purpose -Trunnel (de)serialization parser to work with this format. If you have one, you -may use it though. The main point of using Trunnel is that it makes a simple -format explicit and unambiguous. +All integers are 64-bit, unsigned, and in network byte order. +Fixed-size byte arrays are put into the serialization buffer in-order, +starting from the first byte. Variable length byte arrays first +declare their length as an integer, which is then followed by that +number of bytes. These basic types are concatenated to form a +collection. You should not need a general-purpose Trunnel +(de)serialization parser to work with this format. If you have one, +you may use it though. The main point of using Trunnel is that it +makes a simple format explicit and unambiguous. #### Merkle tree head -Tree heads are signed by the log and its witnesses. It contains a timestamp, a -tree size, and a root hash. The timestamp is included so that monitors can -ensure _liveliness_. It is the time since the UNIX epoch (January 1, 1970 -00:00:00 UTC) in seconds. The tree size specifies the current number of -leaves. The root hash fixes the structure and content of the Merkle tree. 
+Tree heads are signed by the log and its witnesses. It contains a +timestamp, a tree size, and a root hash. The timestamp is included so +that monitors can ensure _liveliness_. It is the time since the UNIX +epoch (January 1, 1970 00:00:00 UTC) in seconds. The tree size +specifies the current number of leaves. The root hash fixes the +structure and content of the Merkle tree. ``` struct tree_head { @@ -80,14 +90,16 @@ struct tree_head { }; ``` -The serialized tree head must be signed using Ed25519. A witness must not -cosign a tree head if it is inconsistent with prior history or if the timestamp -is backdated or future-dated more than 12 hours. +The serialized tree head must be signed using Ed25519. A witness must +not cosign a tree head if it is inconsistent with prior history or if +the timestamp is backdated or future-dated more than 12 hours. #### Merkle tree leaf -The log supports a single leaf type. It contains a shard hint, a checksum over whatever the submitter wants to log a checksum for, -a signature that the submitter computed over the shard hint and the checksum, and a hash of the -submitter's public verification key, that can be used to verify the signature. +The log supports a single leaf type. It contains a shard hint, a +checksum over whatever the submitter wants to log a checksum for, a +signature that the submitter computed over the shard hint and the +checksum, and a hash of the submitter's public verification key, that +can be used to verify the signature. ``` struct message { @@ -102,23 +114,26 @@ struct tree_leaf { } ``` -Unlike X.509 certificates which already have validity ranges, a checksum does not -carry any such information. Therefore, we require that the submitter selects a -_shard hint_. The selected shard hint must be in the log's _shard interval_. A -shard interval is defined by a start time and an end time. 
Both ends of the -shard interval are inclusive and expressed as the number of seconds since -the UNIX epoch (January 1, 1970 00:00 UTC). - -Sharding simplifies log operations because it becomes explicit when a log can be -shutdown. A log must only accept logging requests that have valid shard hints. -A log should only accept logging requests during the predefined shard interval. -Note that _the submitter's shard hint is not a verified timestamp_. The -submitter should set the shard hint as large as possible. If a roughly verified -timestamp is needed, a cosigned tree head can be used. - -Without a shard hint, the good Samaritan could log all leaves from an earlier -shard into a newer one. Not only would that defeat the purpose of sharding, but -it would also become a potential denial-of-service vector. +Unlike X.509 certificates which already have validity ranges, a +checksum does not carry any such information. Therefore, we require +that the submitter selects a _shard hint_. The selected shard hint +must be in the log's _shard interval_. A shard interval is defined by +a start time and an end time. Both ends of the shard interval are +inclusive and expressed as the number of seconds since the UNIX epoch +(January 1, 1970 00:00 UTC). + +Sharding simplifies log operations because it becomes explicit when a +log can be shutdown. A log must only accept logging requests that +have valid shard hints. A log should only accept logging requests +during the predefined shard interval. Note that _the submitter's +shard hint is not a verified timestamp_. The submitter should set the +shard hint as large as possible. If a roughly verified timestamp is +needed, a cosigned tree head can be used. + +Without a shard hint, the good Samaritan could log all leaves from an +earlier shard into a newer one. Not only would that defeat the +purpose of sharding, but it would also become a potential +denial-of-service vector. 
The signed message is composed of the chosen `shard_hint` and the submitter's `checksum`. It must be possible to verify @@ -136,9 +151,10 @@ verifier to locate the appropriate key and make an explicit trust decision. ## Public endpoints -Every log has a base URL that identifies it uniquely. The only constraint is -that it must be a valid HTTP(S) URL that can have the `/st/v0/<endpoint>` suffix -appended. For example, a complete endpoint URL could be +Every log has a base URL that identifies it uniquely. The only +constraint is that it must be a valid HTTP(S) URL that can have the +`/st/v0/<endpoint>` suffix appended. For example, a complete endpoint +URL could be `https://log.example.com/2021/st/v0/get-signed-tree-head`. Input data (in requests) is sent as ASCII key/value pairs as HTTP @@ -151,11 +167,11 @@ format as the input data, i.e. as ASCII key/value pairs on the format `Key: Value`. Example: For sending `tree_size=4711` as output a log would send an HTTP message body consisting of `stlog-tree_size: 4711`. -The HTTP status code is 200 OK to indicate success. A different HTTP status -code is used to indicate failure. The log should set the "error" key to a -human-readable value that describes what went wrong. For example, -`error=invalid+signature`, `error=rate+limit+exceeded`, or -`error=unknown+leaf+hash`. +The HTTP status code is 200 OK to indicate success. A different HTTP +status code is used to indicate failure. The log should set the +"error" key to a human-readable value that describes what went wrong. +For example, `error=invalid+signature`, `error=rate+limit+exceeded`, +or `error=unknown+leaf+hash`. ### get-tree-head-cosigned Returns the latest cosigned tree head. Used together with @@ -169,17 +185,22 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. 
- "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. -- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. -- "key_hash": a hash of the public verification key (belonging to either the log or to one of its witnesses), which can be used to verify -the most recent `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then -hashed using SHA256. The hash value is hex-encoded. +- "signature": hex-encoded Ed25519 signature over `tree_head` + serialzed as described in section `Merkle tree head`. +- "key_hash": a hash of the public verification key (belonging to + either the log or to one of its witnesses), which can be used to + verify the most recent `signature`. The key is encoded as defined + in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. The "signature" and "key_hash" fields may repeat. The first signature -corresponds to the first key hash, the second signature corresponds to the -second key hash, etc. The number of signatures and key hashes must match. +corresponds to the first key hash, the second signature corresponds to +the second key hash, etc. The number of signatures and key hashes +must match. ### get-tree-head-to-sign Returns the latest tree head to be signed by log witnesses. Used by @@ -193,20 +214,24 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. - "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. -- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. 
-- "key_hash": a hash of the log's public verification key, which can be used to verify -`signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then -hashed using SHA256. The hash value is hex-encoded. +- "signature": hex-encoded Ed25519 signature over `tree_head` + serialzed as described in section `Merkle tree head`. +- "key_hash": a hash of the log's public verification key, which can + be used to verify `signature`. The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. There is exactly one `signature` and one `key_hash` field. The `key_hash` refers to the log's public verification key. ### get-tree-head-latest -Returns the latest tree head, signed only by the log. Used for debugging purposes. +Returns the latest tree head, signed only by the log. Used for +debugging purposes. ``` GET <base url>/st/v0/get-tree-head-latest @@ -216,14 +241,16 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. - "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. -- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. +- "signature": hex-encoded Ed25519 signature over `tree_head` + serialzed as described in section `Merkle tree head`. - "key_hash": a hash of the log's public verification key that can be -used to verify `signature`. The key is encoded as defined in -[RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), -and then hashed using SHA256. The hash value is hex-encoded. + used to verify `signature`. 
The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. There is exactly one `signature` and one `key_hash` field. The `key_hash` refers to the log's public verification key. @@ -235,21 +262,22 @@ POST <base url>/st/v0/get-proof-by-hash ``` Input: -- "leaf_hash": a hex-encoded leaf hash that identifies which `tree_leaf` the -log should prove inclusion for. The leaf hash is computed using the RFC 6962 -hashing strategy. In other words, `SHA256(0x00 | tree_leaf)`. -- "tree_size": a human-readable tree size of the tree head that the proof should -be based on. +- "leaf_hash": a hex-encoded leaf hash that identifies which + `tree_leaf` the log should prove inclusion for. The leaf hash is + computed using the RFC 6962 hashing strategy. In other words, + `SHA256(0x00 | tree_leaf)`. +- "tree_size": a human-readable tree size of the tree head that the + proof should be based on. Output on success: - "tree_size": human-readable tree size that the proof is based on. -- "leaf_index": human-readable zero-based index of the leaf that the proof is -based on. +- "leaf_index": human-readable zero-based index of the leaf that the + proof is based on. - "inclusion_path": a node hash in hex. -The "inclusion_path" may be omitted or repeated to represent an inclusion proof -of zero or more node hashes. The order of node hashes follow from our hash -strategy, see RFC 6962. +The "inclusion_path" may be omitted or repeated to represent an +inclusion proof of zero or more node hashes. The order of node hashes +follow from our hash strategy, see RFC 6962. ### get-consistency-proof ``` @@ -258,19 +286,19 @@ POST <base url>/st/v0/get-consistency-proof Input: - "new_size": human-readable tree size of a newer tree head. -- "old_size": human-readable tree size of an older tree head that the log should -prove is consistent with the newer tree head. 
+- "old_size": human-readable tree size of an older tree head that the + log should prove is consistent with the newer tree head. Output on success: -- "new_size": human-readable tree size of a newer tree head that the proof -is based on. -- "old_size": human-readable tree size of an older tree head that the proof is -based on. +- "new_size": human-readable tree size of a newer tree head that the + proof is based on. +- "old_size": human-readable tree size of an older tree head that the + proof is based on. - "consistency_path": a node hash in hex. -The "consistency_path" may be omitted or repeated to represent a consistency -proof of zero or more node hashes. The order of node hashes follow from our -hash strategy, see RFC 6962. +The "consistency_path" may be omitted or repeated to represent a +consistency proof of zero or more node hashes. The order of node +hashes follow from our hash strategy, see RFC 6962. ### get-leaves ``` @@ -282,18 +310,21 @@ Input: - "end_size": human-readable index of the last leaf to retrieve. Output on success: -- "shard_hint": `tree_leaf.message.shard_hint` as a human-readable number. +- "shard_hint": `tree_leaf.message.shard_hint` as a human-readable + number. - "checksum": `tree_leaf.message.checksum` in hex. -- "signature_scheme": human-readable number that identifies a signature scheme. +- "signature_scheme": human-readable number that identifies a + signature scheme. - "signature": `tree_leaf.signature` in hex. - "key_hash": `tree_leaf.key_hash` in hex. -All fields may be repeated to return more than one leaf. The first value in -each list refers to the first leaf, the second value in each list refers to the -second leaf, etc. The size of each list must match. +All fields may be repeated to return more than one leaf. The first +value in each list refers to the first leaf, the second value in each +list refers to the second leaf, etc. The size of each list must +match. -The log may return fewer leaves than requested. 
At least one leaf must be -returned on HTTP status code 200 OK. +The log may return fewer leaves than requested. At least one leaf +must be returned on HTTP status code 200 OK. ### add-leaf ``` @@ -301,31 +332,38 @@ POST <base url>/st/v0/add-leaf ``` Input: -- "shard_hint": human-readable decimal number in the log's shard interval that the -submitter selected. -- "checksum": the cryptographic checksum that the submitter wants to log in hex. note: fixed length 64 bytes, validated by the server somehow -- "signature": the submitter's signature over `tree_leaf.message`. The result -is hex-encoded. -- "verification_key": the submitter's public verification key. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2). The result is hex-encoded. -- "domain_hint": a domain name that indicates where `tree_leaf.key_hash` can be -retrieved as a DNS TXT resource record in hex. +- "shard_hint": human-readable decimal number in the log's shard + interval that the submitter selected. +- "checksum": the cryptographic checksum that the submitter wants to + log in hex. note: fixed length 64 bytes, validated by the server + somehow +- "signature": the submitter's signature over `tree_leaf.message`. + The result is hex-encoded. +- "verification_key": the submitter's public verification key. The + key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2). The result is hex-encoded. +- "domain_hint": a domain name that indicates where + `tree_leaf.key_hash` can be retrieved as a DNS TXT resource record + in hex. Output on success: - None -The submitted entry will not be accepted if the signature is invalid or if the -downloaded verification-key hash does not match. The submitted entry may also -not be accepted if the second-level domain name exceeded its rate limit. By -coupling every add-leaf request with a second-level domain, it becomes more -difficult to spam the log. 
You would need an excessive number of domain names. -This becomes costly if free domain names are rejected. +The submitted entry will not be accepted if the signature is invalid +or if the downloaded verification-key hash does not match. The +submitted entry may also not be accepted if the second-level domain +name exceeded its rate limit. By coupling every add-leaf request with +a second-level domain, it becomes more difficult to spam the log. You +would need an excessive number of domain names. This becomes costly +if free domain names are rejected. -The log does not publish domain-name to key bindings because key management is -more complex than that. +The log does not publish domain-name to key bindings because key +management is more complex than that. -Public logging should not be assumed until an inclusion proof is available. An -inclusion proof should not be relied upon unless it leads up to a trustworthy -signed tree head. Witness cosigning can make a tree head trustworthy. +Public logging should not be assumed until an inclusion proof is +available. An inclusion proof should not be relied upon unless it +leads up to a trustworthy signed tree head. Witness cosigning can +make a tree head trustworthy. ### add-cosignature ``` @@ -334,25 +372,26 @@ POST <base url>/st/v0/add-cosignature Input: - "signature": an Ed25519 signature over `tree_head`. The result is -hex-encoded. -- "key_hash": a hash of the witness' public verification key that can be used -to verify the signature. The key is encoded as defined in [RFC 8032, -section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and -then hashed using SHA256. The hash value is hex-encoded. + hex-encoded. +- "key_hash": a hash of the witness' public verification key that can + be used to verify the signature. The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. 
Output on success: - None -The key-hash can be used to identify which witness signed the log's tree head. -A key-hash, rather than the full verification key, is used to force the verifier -to locate the appropriate key and make an explicit trust decision. +The key-hash can be used to identify which witness signed the log's +tree head. A key-hash, rather than the full verification key, is used +to force the verifier to locate the appropriate key and make an +explicit trust decision. ## Summary of log parameters -- **Public key**: an Ed25519 verification key that can be used to verify the -log's tree head signatures. +- **Public key**: an Ed25519 verification key that can be used to + verify the log's tree head signatures. - **Log identifier**: the hashed public verification key using SHA256. -- **Shard interval**: the time during which the log accepts logging requests. -The shard interval's start and end are inclusive and expressed as the number of -seconds since the UNIX epoch. -- **Base URL**: where the log can be reached over HTTP(S). It is the prefix -before a version-0 specific endpoint. +- **Shard interval**: the time during which the log accepts logging + requests. The shard interval's start and end are inclusive and + expressed as the number of seconds since the UNIX epoch. +- **Base URL**: where the log can be reached over HTTP(S). It is the + prefix before a version-0 specific endpoint.
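Note on the serialization rules touched by this diff: the patched text says that all integers are 64-bit, unsigned, and in network byte order, that fixed-size byte arrays are copied in order, and that a leaf hash is `SHA256(0x00 | tree_leaf)`. The following Python is a minimal sketch of those rules, not the project's implementation. The function names are hypothetical, and the field order of `tree_head` (timestamp, tree size, root hash) is an assumption taken from the order in which the prose lists the fields, since the struct body is elided in the diff context.

```python
import hashlib
import struct

HASH_SIZE = 32  # SHA256 output length in bytes


def serialize_tree_head(timestamp: int, tree_size: int, root_hash: bytes) -> bytes:
    """Serialize a tree head as two big-endian u64 values followed by the root hash.

    Assumption: fields appear in the order timestamp, tree_size, root_hash.
    """
    if len(root_hash) != HASH_SIZE:
        raise ValueError("root_hash must be 32 bytes")
    # ">QQ" packs two unsigned 64-bit integers in network (big-endian) byte order.
    return struct.pack(">QQ", timestamp, tree_size) + root_hash


def leaf_hash(serialized_leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA256 over a 0x00 prefix followed by the leaf bytes."""
    return hashlib.sha256(b"\x00" + serialized_leaf).digest()


def timestamp_is_fresh(timestamp: int, now: int, max_skew: int = 12 * 3600) -> bool:
    """Witness-side check: timestamp is at most 12 hours behind or ahead of now."""
    return abs(now - timestamp) <= max_skew


if __name__ == "__main__":
    root = hashlib.sha256(b"").digest()  # placeholder root hash for illustration
    buf = serialize_tree_head(1614000000, 4711, root)
    print(len(buf))  # 8 + 8 + 32 = 48 bytes
    print(leaf_hash(b"example leaf bytes").hex())
```

The bytes returned by `serialize_tree_head` are what an Ed25519 signature would be computed over under these assumptions. Signing itself is left out because it needs a third-party library.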