diff options
-rw-r--r-- | archive/2021-09-14-api-refactor-aborted.patch | 548 |
1 files changed, 548 insertions, 0 deletions
diff --git a/archive/2021-09-14-api-refactor-aborted.patch b/archive/2021-09-14-api-refactor-aborted.patch new file mode 100644 index 0000000..fc8cf54 --- /dev/null +++ b/archive/2021-09-14-api-refactor-aborted.patch @@ -0,0 +1,548 @@ +From d8e0354941a23e2fafa4ac4257904431eb656554 Mon Sep 17 00:00:00 2001 +From: Rasmus Dahlberg <rasmus.dahlberg@kau.se> +Date: Mon, 30 Aug 2021 18:29:32 +0200 +Subject: [PATCH] added example of a possible API refactor + +- Worked checkpoint into API documentation as a placeholder +- s/hex/base64 to be consistent with checkpoint format +- Played with the idea of only doing line-terminated parsing, i.e., only +values with a predefined order as opposed to unordered key-value pairs. +--- + doc/api.md | 399 +++++++++++++++++++++++------------------------------ + 1 file changed, 174 insertions(+), 225 deletions(-) + +diff --git a/doc/api.md b/doc/api.md +index eb35df4..628523b 100644 +--- a/doc/api.md ++++ b/doc/api.md +@@ -6,104 +6,93 @@ logging [design document](https://github.com/sigsum/sigsum/blob/main/doc/design. + **Warning.** + This is a work-in-progress document that may be moved or modified. + +-## Overview ++## 1 - Overview + A log implements an HTTP(S) API for accepting requests and sending responses. + + - Requests without any input data use the HTTP GET method. + - Requests with input data use the HTTP POST method. +-- Input data in requests and output data in responses are expressed as +-ASCII-encoded key/value pairs. +-- Binary data is hex-encoded before being transmitted. +- +-The motivation for using a text based key/value format for request and +-response data is that it is simple to parse. Note that this format is +-not being used for the serialization of signed or logged data, where a +-more well defined and storage efficient format is desirable. A +-submitter should distribute log responses to their end-users in any +-format that suits them. The (de)serialization required for +-_end-users_ is a small subset of Trunnel. Trunnel is an "idiot-proof" +-wire-format in use by the Tor project. +- +-## Primitives +-### Cryptography ++- Input data in requests and output data in responses are expressed with ++line-terminated ASCII. ++- Binary data is base64-encoded before being transmitted. ++ ++The motivation for using a text based format for request and response data is ++that it is simple to parse. This format is not used for the serialization of a ++claimant's statements or the log's leaf data, for which a more well-defined and ++storage-efficient format is desired. ++ ++A claimant should distribute log responses to their believers in any format that ++suits them. The (de)serialization required for _believers_ is a small subset of ++Trunnel. Trunnel is an "idiot-proof" wire-format in use by the Tor project. ++ ++A believer will also need to de-serialize a (witnessed) checkpoint. The sigsum ++API that is related to witnessing may be used by other log ecosystems. ++ ++## 2 - Primitives ++### 2.1 - Cryptography + Logs use the same Merkle tree hash strategy as + [RFC 6962,ยง2](https://tools.ietf.org/html/rfc6962#section-2). + The hash functions must be + [SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). + Logs must sign tree heads using + [Ed25519](https://tools.ietf.org/html/rfc8032). Log witnesses +-must cosign signed tree heads using Ed25519. ++must cosign tree heads using Ed25519. + + All other parts that are not Merkle tree related should use SHA256 as + the hash function. Using more than one hash function would increases + the overall attack surface: two hash functions must be collision + resistant instead of one. + +-### Serialization +-Log requests and responses are transmitted as ASCII-encoded key/value +-pairs, for a smaller dependency than an alternative parser like JSON. +-Some input and output data is binary: cryptographic hashes and +-signatures. Binary data must be Base16-encoded, also known as hex +-encoding. Using hex as opposed to base64 is motivated by it being +-simpler, favoring ease of decoding and encoding over efficiency on the +-wire. +- +-We use the +-[Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) +-to define (de)serialization of data structures that need to be signed or +-inserted into the Merkle tree. Trunnel is more expressive than the +-[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5). +-It is about as expressive as the +-[TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). +-A notable difference is that Trunnel supports integer constraints. +-The Trunnel language is also readable by humans _and_ machines. +-"Obviously correct code" can be generated in C and Go. ++### 2.2 - Serialization ++Log requests and responses are transmitted as line-terminated ASCII -- for a ++smaller dependency than an alternative parser like JSON. A log's (co)signed ++checkpoints also use line-terminated ASCII. The used checkpoint format is ++shared with other log ecosystems to simply witness cosigning efforts. ++ ++Some input and output data is binary: cryptographic hashes and signatures. ++Binary data is base64-encoded to be consistent with a log's checkpoints. ++ ++We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) ++to define (de)serialization of a claimant's signed statements. We also use ++Trunnel to define the format of a log's leaves. Trunnel is about as expressive ++as the [TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). ++However, it is readable by humans _and_ machines. "Obviously correct code" can ++be generated in C and Go. + + A fair summary of our Trunnel usage is as follows. + +-All integers are 64-bit, unsigned, and in network byte order. +-Fixed-size byte arrays are put into the serialization buffer in-order, +-starting from the first byte. These basic types are concatenated to form a +-collection. You should not need a general-purpose Trunnel +-(de)serialization parser to work with this format. If you have one, +-you may use it though. The main point of using Trunnel is that it +-makes a simple format explicit and unambiguous. +- +-#### Merkle tree head +-A tree head contains a timestamp, a tree size, and a root hash. The timestamp +-is included so that monitors can ensure _liveliness_. It is the time since the +-UNIX epoch (January 1, 1970 00:00 UTC) in seconds. The tree size specifies the +-current number of leaves. The root hash fixes the structure and content of the +-Merkle tree. ++All integers are 64-bit, unsigned, and in network byte order. Fixed-size byte ++arrays are put into the serialization buffer in-order, starting from the first ++byte. These basic types are concatenated to form a collection. + +-``` +-struct tree_head { +- u64 timestamp; +- u64 tree_size; +- u8 root_hash[32]; +-}; +-``` ++You should not need a general-purpose Trunnel (de)serialization parser to work ++with this format. If you have one, you may use it though. The main point of ++using Trunnel is that it makes a simple format explicit and unambiguous. + +-#### (Co)signed Merkle tree head +-A log signs the serialized tree head using Ed25519. A witness cosigns the +-serialized _signed tree head_ using Ed25519. This means that a witness +-signature can not be mistaken for a log signature and vice versa. ++### 2.3 - Merkle tree ++#### 2.3.1 - Tree head ++A tree head contains a globally unique log identifier, a timestamp, a tree size, ++and a root hash. The timestamp is included so that monitors can ensure ++_freshness_. It is the time since the UNIX epoch (January 1, 1970 00:00 UTC) in ++seconds. The tree size specifies the current number of leaves. The root hash ++fixes the structure and content of the Merkle tree. + +-``` +-struct signed_tree_head { +- struct tree_head tree_head; +- u8 signature[64]; +-}; +-``` ++TODO: description of checkpoint goes here when it is finalized, then refactor. ++ ++A log signs the envelope body using Ed25519. A witness cosigns the same ++envelope body iff it is consistent with its prior history _and_ the timestamp ++is fresh. Freshness is defined as not being backdated more than X time units. + +-A witness must not cosign a signed tree head if it is inconsistent with prior +-history or if the timestamp is backdated or future-dated more than 12 hours. ++TODO: define X. + +-#### Merkle tree leaf +-Logs support a single leaf type. It contains a shard hint, a +-checksum, a signature, and a key hash. ++#### 2.3.2 - Tree leaf ++Logs support a single leaf type. It contains a shard hint, a checksum, a ++signature, and a key hash. + + ``` ++/* ++ * Trunnel definition of a sigsum log's tree leaf. Dropping signature and ++ * key_hash gives us the statement that a claimant signs. ++ */ + struct tree_leaf { + u64 shard_hint; + u8 checksum[32]; +@@ -113,7 +102,7 @@ struct tree_leaf { + ``` + + `shard_hint` is chosen by the submitter to match the log's shard interval, see +-design document. ++[design document](). + + `checksum` is computed by the submitter and represents some opaque data. + +@@ -126,236 +115,188 @@ in `tree_leaf` so that the leaf can be attributed to the submitter. A hash, + rather than the full public key, is used to motivate verifiers to locate the + appropriate key and make an explicit trust decision. + +-## Public endpoints ++## 3 - Public endpoints + Every log has a base URL that identifies it uniquely. The only + constraint is that it must be a valid HTTP(S) URL that can have the + `/sigsum/v0/<endpoint>` suffix appended. For example, a complete endpoint + URL could be `https://log.example.com/2021/sigsum/v0/get-tree-head-cosigned`. + +-Input data (in requests) is POST:ed in the HTTP message body as ASCII +-key/value pairs. ++Input data (in requests) is POST:ed in the HTTP message body using ++line-terminated ASCII. + + Output data (in replies) is sent in the HTTP message body in the same +-format as the input data, i.e. as ASCII key/value pairs on the format +-`Key=Value` ++format as the input data, i.e., as line-terminated ASCII. + + The HTTP status code is 200 OK to indicate success. A different HTTP + status code is used to indicate failure, in which case a log should +-respond with a human-readable string describing what went wrong using +-the key `error`. Example: `error=Invalid signature`. ++respond with a human-readable string that describes what went wrong. + +-### get-tree-head-cosigned +-Returns the latest cosigned tree head. Used together with +-`get-proof-by-hash` and `get-consistency-proof` for verifying the tree. ++### 3.1 - get-checkpoint-latest ++Returns the latest checkpoint. Mainly used for debugging purposes. + + ``` +-GET <base url>/sigsum/v0/get-tree-head-cosigned ++GET <base url>/sigsum/v0/get-checkpoint-latest + ``` + +-Input: ++Input + - None + +-Output on success: +-- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, +- seconds since the UNIX epoch. +-- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. +-- `root_hash`: `tree_head.root_hash` hex-encoded. +-- `signature`: hex-encoded Ed25519 log signature over `timestamp`, +- `tree_size` and `root_hash` serialized into a `tree_head` as +- described in section `Merkle tree head`. +-- `cosignature`: hex-encoded Ed25519 witness signature over `timestamp`, +- `tree_size`, `root_hash`, and `signature` serialized into a `signed_tree_head` +- as described in section `(Co)signed Merkle tree head`. +-- `key_hash`: a hash of the witness verification key that can be used to +- verify the above `cosignature`. The key is encoded as defined +- in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), +- and then hashed using SHA256. The hash value is hex-encoded. +- +-The `cosignature` and `key_hash` fields may repeat. The first witness signature +-corresponds to the first key hash, the second witness signature corresponds to +-the second key hash, etc. At least one witness signature must be returned on +-success. The number of witness signatures and key hashes must match. +- +-### get-tree-head-to-sign +-Returns the latest signed tree head to be cosigned. Used by witnesses. ++Output on success ++- A formatted checkpoint, see Section [2.3.1](). ++ ++Success requires that there are no witness signatures. ++ ++### 3.2 - get-checkpoint-to-sign ++Returns the latest checkpoint _to be cosigned_. Used by witnesses so that they ++cosign the same checkpoint regardless of if new leaves were added to the log. + + ``` +-GET <base url>/sigsum/v0/get-tree-head-to-sign ++GET <base url>/sigsum/v0/get-checkpoint-to-sign + ``` + + Input: + - None + + Output on success: +-- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, +- seconds since the UNIX epoch. +-- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. +-- `root_hash`: `tree_head.root_hash` hex-encoded. +-- `signature`: hex-encoded Ed25519 log signature over `timestamp`, +- `tree_size` and `root_hash` serialized into a `tree_head` as +- described in section `Merkle tree head`. ++- A formatted checkpoint, see Section [2.3.1](). ++ ++Success requires that there are no witness signatures. + +-### get-tree-head-latest +-Returns the latest signed tree head. Used for debugging purposes. ++### 3.3 - get-checkpoint-cosigned ++Returns the latest cosigned checkpoint. Used together with `get-inclusion-proof` ++and `get-leaves`. Ensures that verifiers see the same statements as believers. + + ``` +-GET <base url>/sigsum/v0/get-tree-head-latest ++GET <base url>/sigsum/v0/get-checkpoint-cosigned + ``` + +-Input an output follows the same formatting as `get-tree-head-to-sign`. ++Input: ++- None ++ ++Output on success: ++- A formatted checkpoint, see Section [2.3.1](). + +-### get-proof-by-hash ++At least one witness signature is required for a request to be successful. ++ ++### 3.4 - add-cosignature + ``` +-POST <base url>/sigsum/v0/get-proof-by-hash ++POST <base url>/sigsum/v0/add-cosignature + ``` + + Input: +-- `leaf_hash`: leaf identifying which `tree_leaf` the log should prove +- inclusion of, hex-encoded. +-- `tree_size`: tree size of the tree head that the proof should be +- based on, as an ASCII-encoded decimal number. ++- Line 1: signature line that is valid for the log's to-sign checkpoint. + + Output on success: +-- `tree_size`: tree size that the proof is based on, as an +- ASCII-encoded decimal number. +-- `leaf_index`: zero-based index of the leaf that the proof is based +- on, as an ASCII-encoded decimal number. +-- `inclusion_path`: node hash, hex-encoded. +- +-The leaf hash is computed using the RFC 6962 hashing strategy. In +-other words, `SHA256(0x00 | tree_leaf)`. +- +-`inclusion_path` may be omitted or repeated to represent an inclusion +-proof of zero or more node hashes. The order of node hashes follow +-from the hash strategy, see RFC 6962. ++- None + +-Example: `echo "leaf_hash=241fd4538d0a35c2d0394e4710ea9e6916854d08f62602fb03b55221dcdac90f +-tree_size=4711" | curl --data-binary @- localhost/sigsum/v0/get-proof-by-hash` ++A log may reject signatures from witnesses that they are not aware of. + +-### get-consistency-proof ++### 3.5 - get-consistency-proof + ``` + POST <base url>/sigsum/v0/get-consistency-proof + ``` + + Input: +-- `new_size`: tree size of a newer tree head, as an ASCII-encoded +- decimal number. +-- `old_size`: tree size of an older tree head that the log should +- prove is consistent with the newer tree head, as an ASCII-encoded +- decimal number. ++- Line 1: tree size of a newer tree head, as an ASCII-encoded decimal number. ++- Line 2: tree size of an older tree head that the log should prove is ++consistent with the newer tree head, as an ASCII-encoded decimal number. + + Output on success: +-- `new_size`: tree size of the newer tree head that the proof is based +- on, as an ASCII-encoded decimal number. +-- `old_size`: tree size of the older tree head that the proof is based +- on, as an ASCII-encoded decimal number. +-- `consistency_path`: node hash, hex-encoded. ++- Line 1+: a consistency path with one node hash per line, in base64. + +-`consistency_path` may be omitted or repeated to represent a +-consistency proof of zero or more node hashes. The order of node +-hashes follow from the hash strategy, see RFC 6962. ++The order of node hashes follow from the hash strategy, see RFC 6962. + +-Example: `echo "new_size=4711 +-old_size=42" | curl --data-binary @- localhost/sigsum/v0/get-consistency-proof` ++Example: `echo "4711 ++42" | curl --data-binary @- localhost/sigsum/v0/get-consistency-proof` + +-### get-leaves ++### 3.6 - get-inclusion-proof + ``` +-POST <base url>/sigsum/v0/get-leaves ++POST <base url>/sigsum/v0/get-inclusion-proof + ``` + + Input: +-- `start_size`: index of the first leaf to retrieve, as an +- ASCII-encoded decimal number. +-- `end_size`: index of the last leaf to retrieve, as an ASCII-encoded +- decimal number. ++- Line 1: leaf hash that identifies which `tree_leaf` the log should prove ++inclusion of, in base64. ++- Line 2: tree size of the tree head that the proof should be based on, as an ++ASCII-encoded decimal number. + + Output on success: +-- `shard_hint`: `tree_leaf.message.shard_hint` as an ASCII-encoded +- decimal number. +-- `checksum`: `tree_leaf.message.checksum`, hex-encoded. +-- `signature`: `tree_leaf.signature_over_message`, hex-encoded. +-- `key_hash`: `tree_leaf.key_hash`, hex-encoded. ++- Line 1: zero-based index of the leaf that the proof is based on, as an ++ASCII-encoded decimal number. ++- Line 2+: an inclusion path with one node hash per line, in base64. + +-All fields may be repeated to return more than one leaf. The first +-value in each list refers to the first leaf, the second value in each +-list refers to the second leaf, etc. The size of each list must +-match. ++The leaf hash is computed using the RFC 6962 hashing strategy. In other words, ++`H(0x00 | tree_leaf)`. + +-A log may return fewer leaves than requested. At least one leaf +-must be returned on HTTP status code 200 OK. ++The order of node hashes follow from the hash strategy, see RFC 6962. + +-Example: `echo "start_size=42 +-end_size=4711" | curl --data-binary @- localhost/sigsum/v0/get-leaves` ++Example: `echo "leaf_hash=WJG1tSLV3whtD/CxEPvZ0hu0/HFjrzTQgoai6Eb2vgM= ++4711" | curl --data-binary @- localhost/sigsum/v0/get-inclusion-proof` + +-### add-leaf ++### 3.7 - get-leaves + ``` +-POST <base url>/sigsum/v0/add-leaf ++POST <base url>/sigsum/v0/get-leaves + ``` + + Input: +-- `shard_hint`: number within the log's shard interval as an +- ASCII-encoded decimal number. +-- `checksum`: the cryptographic checksum that the submitter wants to +- log, hex-encoded. +-- `signature`: the submitter's signature over `tree_leaf.shard_hint` and +- `tree_leaf.checksum`, see section `Merkle tree leaf`. The resulting signature +- is hex-encoded. +-- `verification_key`: the submitter's public verification key. The +- key is encoded as defined in +- [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2) +- and then hex-encoded. +-- `domain_hint`: domain name indicating where `tree_leaf.key_hash` +- can be found as a DNS TXT resource record with hex-encoding. ++- Line 1: index of the first leaf to retrieve, as an ASCII-encoded decimal ++number. ++- Line 2: index of the last leaf to retrieve, as an ASCII-encoded decimal ++number. + + Output on success: +-- None +- +-The submission will not be accepted if `signature` is +-invalid or if the key hash retrieved using `domain_hint` does not +-match a hash over `verification_key`. +- +-The submission may also not be accepted if the second-level domain +-name exceeded its rate limit. By coupling every add-leaf request to +-a second-level domain, it becomes more difficult to spam logs. You +-would need an excessive number of domain names. This becomes costly +-if free domain names are rejected. ++- Line `k` for `k` in `[1, num_returned_leaves]`: ++ `tree_leaf.shard_hint` as an ASCII-encoded decimal number, ++ `tree_leaf.checksum` in base64, ++ `tree_leaf.signature` in base64, and ++ `tree_leaf.key_hash` in base64; ++ all separated by a single blank space character. + +-Logs don't publish domain-name to key bindings because key +-management is more complex than that. ++The first line corresponds to the first leaf that was requested, the second ++line corresponds to the second leaf that was requested, etc. + +-Public logging should not be assumed to have happened until an +-inclusion proof is available. An inclusion proof should not be relied +-upon unless it leads up to a trustworthy signed tree head. Witness +-cosigning can make a tree head trustworthy. ++A log may return fewer leaves than requested. At least one leaf must be ++returned on HTTP status code 200 OK. + +-Example: `echo "shard_hint=1640995200 +-checksum=cfa2d8e78bf273ab85d3cef7bde62716261d1e42626d776f9b4e6aae7b6ff953 +-signature=c026687411dea494539516ee0c4e790c24450f1a4440c2eb74df311ca9a7adf2847b99273af78b0bda65dfe9c4f7d23a5d319b596a8881d3bc2964749ae9ece3 +-verification_key=c9a674888e905db1761ba3f10f3ad09586dddfe8581964b55787b44f318cbcdf +-domain_hint=example.com" | curl --data-binary @- localhost/sigsum/v0/add-leaf` ++Example: `echo "42 ++4711" | curl --data-binary @- localhost/sigsum/v0/get-leaves` + +-### add-cosignature ++### 3.8 - add-leaf + ``` +-POST <base url>/sigsum/v0/add-cosignature ++POST <base url>/sigsum/v0/add-leaf + ``` + + Input: +-- `cosignature`: Ed25519 witness signature over `signed_tree_head`, hex-encoded. +-- `key_hash`: hash of the witness' public verification key that can be +- used to verify `cosignature`. The key is encoded as defined in +- [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), +- and then hashed using SHA256. The hash value is hex-encoded. ++- Line 1: a shard hint within the log's shard interval as an ASCII-encoded ++decimal number. ++- Line 2: a cryptographic checksum that the submitter wants to log, in base64. ++- Line 3: a signature over `tree_leaf.shard_hint` and `tree_leaf.checksum`, see ++section `Merkle tree leaf`. The resulting signature is in base64. ++- Line 4: a public verification key that can be used to verify the above ++signature. This key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), ++then base64-encoded. ++- Line 5: domain hint that indicates where `tree_leaf.key_hash` can be found as ++a DNS TXT resource record in base64. + + Output on success: + - None + +-`key_hash` can be used to identify which witness cosigned a signed tree +-head. A key-hash, rather than the full verification key, is used to +-motivate verifiers to locate the appropriate key and make an explicit +-trust decision. ++The submission will not be accepted if `signature` is invalid or if the ++retrieved key hash does not match the specified verification key. + +-Example: `echo "cosignature=d1b15061d0f287847d066630339beaa0915a6bbb77332c3e839a32f66f1831b69c678e8ca63afd24e436525554dbc6daa3b1201cc0c93721de24b778027d41af +-key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | curl --data-binary @- localhost/sigsum/v0/add-cosignature` ++The submission may also not be accepted if the second-level domain ++name exceeded its rate limit. By coupling every add-leaf request to ++a second-level domain, it becomes more difficult to spam logs. ++ ++Public logging should not be assumed to have happened until an ++inclusion proof is available. An inclusion proof should not be relied ++upon unless it leads up to a cosigned checkpoint. ++ ++Example: `echo "1640995200 ++z6LY54vyc6uF0873veYnFiYdHkJibXdvm05qrntv+VM= ++wCZodBHepJRTlRbuDE55DCRFDxpEQMLrdN8xHKmnrfKEe5knOveLC9pl3+nE99I6XTGbWWqIgdO8KWR0muns4w== ++yaZ0iI6QXbF2G6PxDzrQlYbd3+hYGWS1V4e0TzGMvN8= ++example.com" | curl --data-binary @- localhost/sigsum/v0/add-leaf` + + ## Summary of log parameters + - **Public key**: The Ed25519 verification key to be used for +@@ -366,3 +307,11 @@ key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | cur + requests are accepted as the number of seconds since the UNIX epoch. + - **Base URL**: Where the log can be reached over HTTP(S). It is the + prefix to be used to construct a version 0 specific endpoint. ++ ++## TODO notes ++- s/submitter/claimant, or define/clarify this in API doc? ++- what do we think about ordered lines vs unordered key-value pairs? ++- Need a unique log ID in the updated checkpoint format ++ - Option: public key, or hashed public key ++ - Option: base URL ++ - Option: free form, pick a fun log name +-- +2.20.1 + |