From 8288635071a972265af0dd2aa547f8376185f458 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Thu, 1 Apr 2021 00:17:06 +0200 Subject: added drafty ascii charts (work in progress) --- doc/formats.md | 160 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 160 insertions(+) create mode 100644 doc/formats.md (limited to 'doc') diff --git a/doc/formats.md b/doc/formats.md new file mode 100644 index 0000000..bffd05f --- /dev/null +++ b/doc/formats.md @@ -0,0 +1,160 @@ +# Formats +This document defines data structures and data formats. + +## Overview +Here we give an overview of our presentation language / serialization rules. + +All integers are represented by 64-bit unsigned integers in network byte order. + +Variable length lists have an integer specifying its length. Then each list +item is enumerated. + +TODO: fixme. + +## Items +Every item type start with a versioned format specifier. Protocol version 1 +uses format specifiers in the range 1--X. + +### Request data structures +Log endpoints that take input data use the following request data structures. + +#### `get_entries_v1` +``` +0 Format 8 16 24 ++----------+----------------+----------------+ +| 1 | Start Size | End Size | ++----------+----------------+----------------+ + uint64 uint64 uint64 +``` +- Format is always 1 for items of type `get_entries_v1`. +- Start size specifies the index of the first Merkle tree leaf to retrieve. +- End size specifies the index of the last Merkle tree leaf to retrieve. + +#### `get_proof_by_hash_v1` +``` +0 Format 8 16 48 ++----------+----------------+----------------+ +| 2 | Tree size | Leaf hash | ++----------+----------------+----------------+ + uint64 uint64 fixed byte array +``` +- Format is always 2 for items of type `get_proof_by_hash_v1`. +- Leaf hash is computed as described in [RFC 6962/bis, §2.1.1](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.1). 
+- Tree size specifies which Merkle tree root inclusion should be proven for. + +#### `get_consistency_proof_v1` +``` +0 Format 8 16 24 ++----------+----------------+----------------+ +| 3 | Old size | New size | ++----------+----------------+----------------+ + uint64 uint64 uint64 +``` +- Format is always 3 for items of type `get_consistency_proof_v1`. +- Old size specifies the tree size of an older Merkle tree head. +- New size specifies the tree size of a newer Merkle tree head. + +### Proof and log data structures +#### `inclusion_proof_v1` +``` + --zero or more node hashes--> +0 Format 8 48 56 64 72 72+Length ++----------+----------------+----------------+----------------+----------------+--------//--------+ +| 4 | Identifier | Tree size | Leaf index | Length | Node hashes | ++----------+----------------+----------------+----------------+----------------+--------//--------+ + uint64 ed25519_v1 uint64 uint64 uint64 list body +``` +- Format is always 4 for items of type `inclusion_proof_v1`. +- Identifier identifies the log uniquely as an `ed25519_v1` item. +- Tree size is the size of the Merkle tree that the proof is based on. +- Leaf index is a zero-based index of the log entry that the proof is based on. +- The remaining part is a list of node hashes. + - Length specifies the full byte size of the list. It must be `32 * m`, + where `m >= 0`. This means that an inclusion needs zero or more node + hashes to be well-formed. + - Node hash is a node hash in the Merkle tree that the proof is based on. + +Remark: the list of node hashes is generated and verified as in [RFC 6962/bis, +§2.1.3](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.3). 
+ +#### `consistency_proof_v1` +``` + --zero or more node hashes--> +0 Format 8 48 56 64 72 72+Length ++----------+----------------+----------------+----------------+----------------+--------//--------+ +| 5 | Identifier | Old size | New size | Length | Node hashes | ++----------+----------------+----------------+----------------+----------------+--------//--------+ + uint64 ed25519_v1 uint64 uint64 uint64 list body +``` +- Format is always 5 for items of type `consistency_proof_v1`. +- Identifier identifies the log uniquely as an `ed25519_v1` item. +- Old size is the tree size of the older Merkle tree. +- New size is the tree size of the newer Merkle tree. +- The remaining part is a list of node hashes. + - Length specifies the full byte size of the list. It must be `32 * m`, + where `m >= 0`. This means that a consistenty proof needs zero or more node + hashes to be well-formed. + - Node hash is a node hash from the older or the newer Merkle tree. + +Remark: the list of node hashes is generated and verified as in [RFC 6962/bis, +§2.1.4](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.4). + +#### `signed_tree_head_v1` +``` + ----one or more signature-identifier pairs-------> +0 Format 8 16 24 56 64 128 168 64+Length ++----------+----------------+----------------+----------------+----------------+----------------+----------------+--//--+ +| 6 | Timestamp | Tree size | Root hash | Length | Signature | Identifier | .... | ++----------+----------------+----------------+----------------+----------------+----------------+----------------+--//--+ + uint64 uint64 uint64 fixed byte array uint64 fixed byte array ed25519_v1 cont. list body +``` +- Format is always 6 for items of type `signed_tree_head_v1`. +- Timestamp is the time since the UNIX epoch (January 1, 1970 00:00:00 UTC) in +milliseconds. +- Tree size is the number of leaves in the current Merkle tree. +- Root hash is the root hash of the current Merkle tree. 
+- The remaining part is a list of signature-identifier pairs. + - Length specifies the full byte size of the list. It must be `104 * m`, + where `m > 1`. This means that a signed tree head needs at least one + signature-identifier pair to be well-formed. + - Signature is an Ed25519 signature over bytes 0--56. The signature is + encodes as in [RFC 8032, §3.3](https://tools.ietf.org/html/rfc8032#section-3.3). + - Identifier identifies the signer uniquely as an `ed25519_v1` item. + +Remark: there may be multiple signature-identifier pairs if the log is cosigned. + +#### `signed_checksum32_ed25519_v1` +``` +0 Format 8 40 56 56+Length 120+Length 160+Length ++----------+----------------+----------------+-------//---------+----------------+--------//--------+ +| 7 | Checksum | Length | Identifier | Signature | Namespace | ++----------+----------------+----------------+-------//---------+----------------+--------//--------+ + uint64 fixed byte array uint64 byte array fixed byte array ed25519_v1 +``` +- Format is always 7 for items of type `signed_checksum32_ed25519_v1`. +- Checksum is a 32-byte checksum that represents a data item of opaque type. +- Length specified the full byte size of the following identifier. It must be +larger than zero and less than 128. +- Identifier identifies what the checksum represents. The aforementioned length +constraint means that the identifier cannot be omitted or exceed 128 bytes. +- Signature is an Ed25519 signature over bytes 0--56+Length. The signature is +encodes as in [RFC 8032, §3.3](https://tools.ietf.org/html/rfc8032#section-3.3). +- Namespace is an `ed25519_v1` item that identifies the signer uniquely. + +Remark: to keep this checksum entry as simple as possible it does not have a +variable length checksum or any agility with regards to the signing namespace. +This means that we need to have multiple leaf types that follow the pattern +`signed_checksum{32,64}_namespace_v1`. 
+ +### Namespace data structures +#### `ed25519_v1` +``` +0 Format 8 40 ++----------+----------------+ +| 8 | public key | ++----------+----------------+ + uint64 fixed byte array +``` +- The format is always 8 for items of type `ed25519_v1`. +- The public Ed25519 verification key is always 32 bytes. See encoding in [RFC +8032, §3.2](https://tools.ietf.org/html/rfc8032#section-3.2). -- cgit v1.2.3 From 24cc6b0db8ef9c718925d14b329f21938e5d2b1b Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Tue, 20 Apr 2021 12:28:28 +0200 Subject: started on our in-progress (re)design documents --- doc/api.md | 247 ++++++++++++++++++++ doc/design.md | 32 +++ doc/formats.md | 160 ------------- doc/schema/consistency_proof.schema.json | 30 +++ doc/schema/example/consistency_proof.json | 7 + doc/schema/example/inclusion_proof.json | 7 + doc/schema/example/leaves.json | 14 ++ doc/schema/example/sth.json | 11 + doc/schema/inclusion_proof.schema.json | 30 +++ doc/schema/leaves.schema.json | 38 +++ doc/schema/sth.schema.json | 50 ++++ doc/sketch.md | 372 ------------------------------ 12 files changed, 466 insertions(+), 532 deletions(-) create mode 100644 doc/api.md create mode 100644 doc/design.md delete mode 100644 doc/formats.md create mode 100644 doc/schema/consistency_proof.schema.json create mode 100644 doc/schema/example/consistency_proof.json create mode 100644 doc/schema/example/inclusion_proof.json create mode 100644 doc/schema/example/leaves.json create mode 100644 doc/schema/example/sth.json create mode 100644 doc/schema/inclusion_proof.schema.json create mode 100644 doc/schema/leaves.schema.json create mode 100644 doc/schema/sth.schema.json delete mode 100644 doc/sketch.md (limited to 'doc') diff --git a/doc/api.md b/doc/api.md new file mode 100644 index 0000000..760663b --- /dev/null +++ b/doc/api.md @@ -0,0 +1,247 @@ +# System Transparency Logging: API v0 +This document describes details of the System Transparency logging API, +version 0. 
The broader picture is not explained here. We assume that you have +read the System Transparency design document. It can be found [here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). + +**Warning.** +This is a work-in-progress document that may be moved or modified. + +## Overview +The log implements an HTTP(S) API: +- Requests that add data to the log use the HTTP POST method. The HTTP content +type is `application/x-www-form-urlencoded`. The posted data are key-value +pairs. Binary data must be base64-encoded. +- Requests that retrieve data from the log use the HTTP GET method. The HTTP +content type is `application/x-www-form-urlencoded`. Input parameters are +key-value pairs. +- Responses are JSON objects. The HTTP content type is `application/json`. +- Error messages are human-readable strings. The HTTP content type is +`text/plain`. + +We decided to use these web formats for requests and responses because the log +is running as an HTTP(S) service. In other words, anyone that interacts with +the log is most likely using these formats already. The other benefit is that +all requests and responses are human-readable. This makes it easier to +understand the protocol, troubleshoot issues, and copy-paste. We favored +compatibility and understandability over a wire-efficient format. + +Note that we are not using JSON for signed and/or logged data. In other words, +a submitter that wishes to distribute log responses to their user base in a +different format may do so. The forced (de)serialization parser on _end-users_ +is a small subset of Trunnel. Trunnel is an "idiot-proof" wire-format that the +Tor project uses. + +## Primitives +### Cryptography +The log uses the same Merkle tree hash strategy as [RFC 6962, §2](https://tools.ietf.org/html/rfc6962#section-2). +The hash functions must be [SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). 
+The log must sign tree heads using [Ed25519](https://tools.ietf.org/html/rfc8032). +The log's witnesses must also sign tree heads using Ed25519. + +All other parts that are not Merkle tree related also use SHA256 as the hash +function. Using more than one hash function would increase the overall attack +surface: two hash functions must be collision resistant instead of one. + +We recommend that submitters sign using Ed25519. We also support RSA with +[deterministic](https://tools.ietf.org/html/rfc8017#section-8.2) +or [probabilistic](https://tools.ietf.org/html/rfc8017#section-8.1) +padding. Supporting RSA is suboptimal, but excluding it would make the log +useless for many possible adopters. + +### Serialization +We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) +to define (de)serialization of data structures that need to be signed or +inserted into the Merkle tree. Trunnel is more expressive than the +[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5). +It is about as expressive as the [TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). +A notable difference is that Trunnel supports integer constraints. The Trunnel +language is also readable by humans _and_ machines. "Obviously correct code" +can be generated in C and Go. + +A fair summary of our Trunnel usage is as follows. + +All integers are 64-bit, unsigned, and in network byte order. A fixed-size byte +array is put into the serialization buffer in-order, starting from the first +byte. These basic types are concatenated to form a collection. You should not +need a general-purpose Trunnel (de)serialization parser to work with this +format. If you have one, you may use it though. The main point of using +Trunnel is that it makes a simple format explicit and unambiguous. + +TODO: URL-encode _or_ JSON? I think we should only need one.
Always doing HTTP +POST would also ensure that input parameters don't show up in web server logs. + +#### Merkle tree head +Tree heads are signed by the log and its witnesses. Each tree head contains a timestamp, a +tree size, and a root hash. The timestamp is included so that monitors can +ensure _liveness_. It is the time since the UNIX epoch (January 1, 1970 +00:00:00 UTC) in milliseconds. The tree size specifies the current number of +leaves. The root hash fixes the structure and content of the Merkle tree. + +``` +struct tree_head { + u64 timestamp; + u64 tree_size; + u8 root_hash[32]; +}; +``` + +The serialized tree head must be signed using Ed25519. A witness must only sign +the log's tree head if it is consistent with prior history and the timestamp is +roughly correct. A timestamp is roughly correct if it is not backdated or +future-dated by more than 12 hours. + +#### Merkle tree leaf +The log supports a single leaf type. It contains a checksum, a signature +scheme, a signature that the submitter computed over that checksum, and the hash +of the public verification key that can be used to verify the signature. + +``` +const ALG_ED25519 = 1; // RFC 8032 +const ALG_RSASSA_PKCS1_V1_5 = 2; // RFC 8017 +const ALG_RSASSA_PSS = 3; // RFC 8017 + +struct tree_leaf { + u8 checksum[32]; + u64 signature_scheme IN [ + ALG_ED25519, + ALG_RSASSA_PKCS1_V1_5, + ALG_RSASSA_PSS, + ]; + union signature[signature_scheme] { + ALG_ED25519: u8 ed25519[32]; + default: u8 rsa[512]; + } + u8 key_hash[32]; +} +``` + +A key-hash is included in the leaf so that it can be attributed to the signing +entity. A hash, rather than the full public verification key, is used to force +the verifier to locate the appropriate key and make an explicit trust decision. + +## Public endpoints +Every log has a base URL that identifies it uniquely. The only constraint is +that it must be a valid HTTP(S) URL that can have the `/st/v0/` suffix +appended.
For example, a complete endpoint URL could be +`https://log.example.com/2021/st/v0/get-signed-tree-head`. + +### get-signed-tree-head +``` +GET /st/v0/get-signed-tree-head +``` + +Input key-value pairs: +- `type`: either the string "latest", "stable", or "cosigned". + - "latest": ask for the most recent signed tree head. + - "stable": ask for a recent signed tree head that is fixed for some period + of time. + - "cosigned": ask for a recent cosigned tree head. + +Output: +- On success: status 200 OK and a signed tree head. The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/sth.schema.json). +- On failure: a different status code and a human-readable error message. + +### get-proof-by-hash +``` +POST /st/v0/get-proof-by-hash +``` + +Input key-value pairs: +- `leaf_hash`: a base64-encoded leaf hash that identifies which `tree_leaf` the +log should prove inclusion for. The leaf hash is computed using the RFC 6962 +hashing strategy. In other words, `H(0x00 | tree_leaf)`. +- `tree_size`: the tree size of a tree head that the proof should be based on. + +Output: +- On success: status 200 OK and an inclusion proof. The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/inclusion_proof.schema.json). +- On failure: a different status code and a human-readable error message. + +### get-consistency-proof +``` +POST /st/v0/get-consistency-proof +``` + +Input key-value pairs: +- `new_size`: the tree size of a newer tree head. +- `old_size`: the tree size of an older tree head that the log should prove is +consistent with the newer tree head. + +Output: +- On success: status 200 OK and a consistency proof. The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/consistency_proof.schema.json). +- On failure: a different status code and a human-readable error message. 
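The serialization rules for `tree_head` and the `leaf_hash` input of get-proof-by-hash, both described earlier, can be sketched in a few lines of Python. This is a minimal sketch and not part of the API: the function names `serialize_tree_head` and `leaf_hash` are hypothetical, and the leaf is assumed to be already serialized into bytes.

```python
import hashlib
import struct


def serialize_tree_head(timestamp: int, tree_size: int, root_hash: bytes) -> bytes:
    """Concatenate two u64 fields in network byte order and a 32-byte hash."""
    if len(root_hash) != 32:
        raise ValueError("root_hash must be 32 bytes")
    # ">QQ" packs two unsigned 64-bit integers in big-endian order.
    return struct.pack(">QQ", timestamp, tree_size) + root_hash


def leaf_hash(serialized_leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: H(0x00 | tree_leaf) with SHA256."""
    return hashlib.sha256(b"\x00" + serialized_leaf).digest()
```

A serialized tree head is always 48 bytes under these rules, which is the message that the log and its witnesses would sign.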
+ +### get-leaves +``` +POST /st/v0/get-leaves +``` + +Input key-value pairs: +- `start_size`: zero-based index of the first leaf to retrieve. +- `end_size`: index of the last leaf to retrieve. + +Output: +- On success: status 200 OK and a list of leaves. The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/leaves.schema.json). +- On failure: a different status code and a human-readable error message. + +The log may truncate the list of returned leaves. However, it must not be an +empty list on success. + +### add-leaf +``` +POST /st/v0/add-leaf +``` + +Input key-value pairs: +- `leaf_checksum`: the checksum that the submitter wants to log in base64. +- `signature_scheme`: the signature scheme that the submitter wants to use. +- `tree_leaf_signature`: the submitter's `tree_leaf` signature in base64. +- `verification_key`: the submitter's public verification key. It is serialized +as described in the corresponding RFC, then base64-encoded. +- `domain_hint`: a domain name that indicates where the public verification-key +hash can be downloaded in base64. Supported methods: DNS and HTTPS +(TODO: docdoc). + +Output: +- On success: HTTP 200. The log will _try_ to incorporate the submitted leaf +into its Merkle tree. +- On failure: a different status code and a human-readable error message. + +The submitted entry will not be accepted if the signature is invalid or if the +downloaded verification-key hash does not match. The submitted entry may also +not be accepted if the second-level domain name exceeded its rate limit. By +coupling every add-leaf request with a second-level domain, it becomes more +difficult to spam the log. You would need an excessive number of domain names. +This becomes costly if free domain names are rejected. + +The log does not publish domain-name to key bindings because key management is +more complex than that. 
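As an illustration of the add-leaf input described above, the following Python sketch builds an `application/x-www-form-urlencoded` request body with base64-encoded binary values. The helper name `add_leaf_body` is hypothetical, and passing `domain_hint` as a plain domain-name string is an assumption because its exact format is still marked as TODO.

```python
import base64
from urllib.parse import urlencode


def _b64(data: bytes) -> str:
    """Base64-encode binary data as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def add_leaf_body(checksum: bytes, signature_scheme: int, signature: bytes,
                  verification_key: bytes, domain_hint: str) -> str:
    """Build the key-value pairs for POST /st/v0/add-leaf."""
    return urlencode({
        "leaf_checksum": _b64(checksum),
        "signature_scheme": str(signature_scheme),
        "tree_leaf_signature": _b64(signature),
        "verification_key": _b64(verification_key),
        "domain_hint": domain_hint,  # assumed to be a plain domain name
    })
```

`urlencode` percent-encodes the `+`, `/`, and `=` characters that base64 output may contain, so the values survive form decoding on the server side.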
+ +Public logging should not be assumed until an inclusion proof is available. An +inclusion proof should not be relied upon unless it leads up to a trustworthy +signed tree head. Witness cosigning can make a tree head trustworthy. + +TODO: the log may allow no `domain_hint`? Especially useful for v0 testing. + +### add-cosignature +``` +POST /st/v0/add-cosignature +``` + +Input key-value pairs: +- `signature`: a base64-encoded signature over a `tree_head` that is fixed for +some period of time. The cosigning witness retrieves the tree head using the +`get-signed-tree-head` endpoint with the "stable" type. +- `key_hash`: a base64-encoded hash of the public verification key that can be +used to verify the signature. + +Output: +- HTTP status 200 OK on success. Otherwise a different status code and a +human-readable error message. + +The key-hash can be used to identify which witness signed the log's tree head. +A key-hash, rather than the full verification key, is used to force the verifier +to locate the appropriate key and make an explicit trust decision. diff --git a/doc/design.md b/doc/design.md new file mode 100644 index 0000000..f966d03 --- /dev/null +++ b/doc/design.md @@ -0,0 +1,32 @@ +# System Transparency Logging: Design v0 +We propose System Transparency logging. It is similar to Certificate +Transparency, except that cryptographically signed checksums are logged as +opposed to X.509 certificates. Publicly logging signed checksums allows anyone +to discover which keys signed what. As such, malicious and unintended key-usage +can be _discovered_. We present our design and discuss how two possible +use-cases influenced it: binary transparency and reproducible builds. + +**Target audience.** +You are most likely interested in transparency logs or supply-chain security. + +**Preliminaries.** +You have a basic understanding of cryptographic primitives like digital +signatures, hash functions, and Merkle trees.
You roughly know what problem +Certificate Transparency solves and how. You may never have heard the term +_gossip-audit model_, or know how it is related to trust assumptions and +detectability properties. + +**Warning.** +This is a work-in-progress document that may be moved or modified. + +## Introduction +Transparency logs make it possible to detect unwanted events. For example, + are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962), + did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md), + or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/). +System Transparency logging makes signed checksums transparent. The goal is to +_detect_ unwanted key-usage without making assumptions about the signed data. + +## Threat model and (non-)goals + +## Design diff --git a/doc/formats.md b/doc/formats.md deleted file mode 100644 index bffd05f..0000000 --- a/doc/formats.md +++ /dev/null @@ -1,160 +0,0 @@ -# Formats -This document defines data structures and data formats. - -## Overview -Here we give an overview of our presentation language / serialization rules. - -All integers are represented by 64-bit unsigned integers in network byte order. - -Variable length lists have an integer specifying its length. Then each list -item is enumerated. - -TODO: fixme. - -## Items -Every item type start with a versioned format specifier. Protocol version 1 -uses format specifiers in the range 1--X. - -### Request data structures -Log endpoints that take input data use the following request data structures. 
- -#### `get_entries_v1` -``` -0 Format 8 16 24 -+----------+----------------+----------------+ -| 1 | Start Size | End Size | -+----------+----------------+----------------+ - uint64 uint64 uint64 -``` -- Format is always 1 for items of type `get_entries_v1`. -- Start size specifies the index of the first Merkle tree leaf to retrieve. -- End size specifies the index of the last Merkle tree leaf to retrieve. - -#### `get_proof_by_hash_v1` -``` -0 Format 8 16 48 -+----------+----------------+----------------+ -| 2 | Tree size | Leaf hash | -+----------+----------------+----------------+ - uint64 uint64 fixed byte array -``` -- Format is always 2 for items of type `get_proof_by_hash_v1`. -- Leaf hash is computed as described in [RFC 6962/bis, §2.1.1](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.1). -- Tree size specifies which Merkle tree root inclusion should be proven for. - -#### `get_consistency_proof_v1` -``` -0 Format 8 16 24 -+----------+----------------+----------------+ -| 3 | Old size | New size | -+----------+----------------+----------------+ - uint64 uint64 uint64 -``` -- Format is always 3 for items of type `get_consistency_proof_v1`. -- Old size specifies the tree size of an older Merkle tree head. -- New size specifies the tree size of a newer Merkle tree head. - -### Proof and log data structures -#### `inclusion_proof_v1` -``` - --zero or more node hashes--> -0 Format 8 48 56 64 72 72+Length -+----------+----------------+----------------+----------------+----------------+--------//--------+ -| 4 | Identifier | Tree size | Leaf index | Length | Node hashes | -+----------+----------------+----------------+----------------+----------------+--------//--------+ - uint64 ed25519_v1 uint64 uint64 uint64 list body -``` -- Format is always 4 for items of type `inclusion_proof_v1`. -- Identifier identifies the log uniquely as an `ed25519_v1` item. -- Tree size is the size of the Merkle tree that the proof is based on. 
-- Leaf index is a zero-based index of the log entry that the proof is based on. -- The remaining part is a list of node hashes. - - Length specifies the full byte size of the list. It must be `32 * m`, - where `m >= 0`. This means that an inclusion needs zero or more node - hashes to be well-formed. - - Node hash is a node hash in the Merkle tree that the proof is based on. - -Remark: the list of node hashes is generated and verified as in [RFC 6962/bis, -§2.1.3](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.3). - -#### `consistency_proof_v1` -``` - --zero or more node hashes--> -0 Format 8 48 56 64 72 72+Length -+----------+----------------+----------------+----------------+----------------+--------//--------+ -| 5 | Identifier | Old size | New size | Length | Node hashes | -+----------+----------------+----------------+----------------+----------------+--------//--------+ - uint64 ed25519_v1 uint64 uint64 uint64 list body -``` -- Format is always 5 for items of type `consistency_proof_v1`. -- Identifier identifies the log uniquely as an `ed25519_v1` item. -- Old size is the tree size of the older Merkle tree. -- New size is the tree size of the newer Merkle tree. -- The remaining part is a list of node hashes. - - Length specifies the full byte size of the list. It must be `32 * m`, - where `m >= 0`. This means that a consistenty proof needs zero or more node - hashes to be well-formed. - - Node hash is a node hash from the older or the newer Merkle tree. - -Remark: the list of node hashes is generated and verified as in [RFC 6962/bis, -§2.1.4](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.4). 
- -#### `signed_tree_head_v1` -``` - ----one or more signature-identifier pairs-------> -0 Format 8 16 24 56 64 128 168 64+Length -+----------+----------------+----------------+----------------+----------------+----------------+----------------+--//--+ -| 6 | Timestamp | Tree size | Root hash | Length | Signature | Identifier | .... | -+----------+----------------+----------------+----------------+----------------+----------------+----------------+--//--+ - uint64 uint64 uint64 fixed byte array uint64 fixed byte array ed25519_v1 cont. list body -``` -- Format is always 6 for items of type `signed_tree_head_v1`. -- Timestamp is the time since the UNIX epoch (January 1, 1970 00:00:00 UTC) in -milliseconds. -- Tree size is the number of leaves in the current Merkle tree. -- Root hash is the root hash of the current Merkle tree. -- The remaining part is a list of signature-identifier pairs. - - Length specifies the full byte size of the list. It must be `104 * m`, - where `m > 1`. This means that a signed tree head needs at least one - signature-identifier pair to be well-formed. - - Signature is an Ed25519 signature over bytes 0--56. The signature is - encodes as in [RFC 8032, §3.3](https://tools.ietf.org/html/rfc8032#section-3.3). - - Identifier identifies the signer uniquely as an `ed25519_v1` item. - -Remark: there may be multiple signature-identifier pairs if the log is cosigned. - -#### `signed_checksum32_ed25519_v1` -``` -0 Format 8 40 56 56+Length 120+Length 160+Length -+----------+----------------+----------------+-------//---------+----------------+--------//--------+ -| 7 | Checksum | Length | Identifier | Signature | Namespace | -+----------+----------------+----------------+-------//---------+----------------+--------//--------+ - uint64 fixed byte array uint64 byte array fixed byte array ed25519_v1 -``` -- Format is always 7 for items of type `signed_checksum32_ed25519_v1`. -- Checksum is a 32-byte checksum that represents a data item of opaque type. 
-- Length specified the full byte size of the following identifier. It must be -larger than zero and less than 128. -- Identifier identifies what the checksum represents. The aforementioned length -constraint means that the identifier cannot be omitted or exceed 128 bytes. -- Signature is an Ed25519 signature over bytes 0--56+Length. The signature is -encodes as in [RFC 8032, §3.3](https://tools.ietf.org/html/rfc8032#section-3.3). -- Namespace is an `ed25519_v1` item that identifies the signer uniquely. - -Remark: to keep this checksum entry as simple as possible it does not have a -variable length checksum or any agility with regards to the signing namespace. -This means that we need to have multiple leaf types that follow the pattern -`signed_checksum{32,64}_namespace_v1`. - -### Namespace data structures -#### `ed25519_v1` -``` -0 Format 8 40 -+----------+----------------+ -| 8 | public key | -+----------+----------------+ - uint64 fixed byte array -``` -- The format is always 8 for items of type `ed25519_v1`. -- The public Ed25519 verification key is always 32 bytes. See encoding in [RFC -8032, §3.2](https://tools.ietf.org/html/rfc8032#section-3.2). 
diff --git a/doc/schema/consistency_proof.schema.json b/doc/schema/consistency_proof.schema.json new file mode 100644 index 0000000..003f3c7 --- /dev/null +++ b/doc/schema/consistency_proof.schema.json @@ -0,0 +1,30 @@ +{ + "$schema": "https://json-schema.org/draft-07/schema#", + "title": "consistency_proof", + "description": "JSON-formatted consistency proof, version 0.", + + "type": "object", + "required": [ "new_size", "old_size", "consistency_proof" ], + "properties": { + "new_size": { + "description": "The tree size of the newer Merkle tree head.", + "type": "integer", + "minimum": 0 + }, + "old_size": { + "description": "The tree size of the older Merkle tree head.", + "type": "integer", + "minimum": 0 + }, + "consistency_proof": { + "description": "A list of base64-encoded node hashes that proves consistency", + "type": "array", + "items": { + "description": "A node hash in base64", + "type": "string", + "minLength": 44, + "maxLength": 44 + } + } + } +} diff --git a/doc/schema/example/consistency_proof.json b/doc/schema/example/consistency_proof.json new file mode 100644 index 0000000..0a323b7 --- /dev/null +++ b/doc/schema/example/consistency_proof.json @@ -0,0 +1,7 @@ +{ + "new_size": 2, + "old_size": 1, + "consistency_proof": [ + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" + ] +} diff --git a/doc/schema/example/inclusion_proof.json b/doc/schema/example/inclusion_proof.json new file mode 100644 index 0000000..d46d426 --- /dev/null +++ b/doc/schema/example/inclusion_proof.json @@ -0,0 +1,7 @@ +{ + "tree_size": 2, + "leaf_index": 0, + "inclusion_proof": [ + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" + ] +} diff --git a/doc/schema/example/leaves.json b/doc/schema/example/leaves.json new file mode 100644 index 0000000..1eed05d --- /dev/null +++ b/doc/schema/example/leaves.json @@ -0,0 +1,14 @@ +[ + { + "checksum": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", + "signature_scheme": 1, + "signature": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=", +
"key_hash": "DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD=" + }, + { + "checksum": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", + "signature_scheme": 2, + "signature": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=", + "key_hash": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=" + } +] diff --git a/doc/schema/example/sth.json b/doc/schema/example/sth.json new file mode 100644 index 0000000..ec3ad11 --- /dev/null +++ b/doc/schema/example/sth.json @@ -0,0 +1,11 @@ +{ + "timestamp": 0, + "tree_size": 0, + "root_hash": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", + "signatures": [ + { + "key_hash": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=", + "signature": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=" + } + ] +} diff --git a/doc/schema/inclusion_proof.schema.json b/doc/schema/inclusion_proof.schema.json new file mode 100644 index 0000000..3309d37 --- /dev/null +++ b/doc/schema/inclusion_proof.schema.json @@ -0,0 +1,30 @@ +{ + "$schema": "https://json-schema.org/draft-07/schema#", + "title": "inclusion_proof", + "description": "JSON-formatted inclusion proof, version 0.", + + "type": "object", + "required": [ "tree_size", "leaf_index", "inclusion_proof" ], + "properties": { + "tree_size": { + "description": "The Merkle tree size that the inclusion proof is based on.", + "type": "integer", + "minimum": 0 + }, + "leaf_index": { + 
"description": "The zero-based index of the leaf that the inclusion proof is for.", + "type": "integer", + "minimum": 0 + }, + "inclusion_proof": { + "description": "A list of base64-encoded node hashes that proves inclusion", + "type": "array", + "items": { + "description": "A node hash in base64", + "type": "string", + "minLength": 44, + "maxLength": 44 + } + } + } +} diff --git a/doc/schema/leaves.schema.json b/doc/schema/leaves.schema.json new file mode 100644 index 0000000..74d7454 --- /dev/null +++ b/doc/schema/leaves.schema.json @@ -0,0 +1,38 @@ +{ + "$schema": "https://json-schema.org/draft-07/schema#", + "title": "list of tree_leaf", + "description": "JSON-formatted tree leaf list, version 0.", + + "type": "array", + "description": "A list Merkle tree leaves", + "items": { + "type": "object", + "required": [ "checksum", "signature_scheme", "signature", "key_hash" ], + "properties": { + "checksum": { + "description": "A cryptographic hash that is computed over some data of opaque type. The result is base64-encoded.", + "type": "string", + "minLength": 44, + "maxLength": 44 + }, + "signature_scheme": { + "description": "An integer that identifies the signature scheme used by the submitter. 
See API documentation.", + "type": "integer", + "enum": [ 1, 2, 3 ] + }, + "signature": { + "description": "The submitter's signature over the checksum in base64", + "type": "string", + "minLength": 44, + "maxLength": 684 + }, + "key_hash": { + "description": "A public verification-key hash that identifies the signer.", + "type": "string", + "minLength": 44, + "maxLength": 44 + } + } + }, + "minItems": 1 +} diff --git a/doc/schema/sth.schema.json b/doc/schema/sth.schema.json new file mode 100644 index 0000000..86de2d3 --- /dev/null +++ b/doc/schema/sth.schema.json @@ -0,0 +1,50 @@ +{ + "$schema": "https://json-schema.org/draft-07/schema#", + "title": "signed_tree_head_v0", + "description": "JSON-formatted signed tree head, version 0.", + + "type": "object", + "required": [ "timestamp", "tree_size", "root_hash", "signatures" ], + "properties": { + "timestamp": { + "description": "The number of milliseconds since the UNIX epoch (January 1, 1970 00:00:00 UTC).", + "type": "integer", + "minimum": 0 + }, + "tree_size": { + "description": "The number of entries that are stored in the log's Merkle tree.", + "type": "integer", + "minimum": 0 + }, + "root_hash": { + "description": "The log's Merkle tree root hash in base64.", + "type": "string", + "minLength": 44, + "maxLength": 44 + }, + "signatures": { + "description": "A list of signer-signature pairs.", + "type": "array", + "items": { + "description": "A signer-signature pair.", + "type": "object", + "required": [ "key_hash", "signature" ], + "properties": { + "key_hash": { + "description": "A public verification-key hash that identifies the signer in base64.", + "type": "string", + "minLength": 44, + "maxLength": 44 + }, + "signature": { + "description": "The signer's signature over the log's tree_leaf structure in base64.", + "type": "string", + "minLength": 44, + "maxLength": 44 + } + } + }, + "minItems": 1 + } + } +} diff --git a/doc/sketch.md b/doc/sketch.md deleted file mode 100644 index 31964e0..0000000 --- 
a/doc/sketch.md +++ /dev/null @@ -1,372 +0,0 @@ -# System Transparency Logging -This document provides a sketch of System Transparency (ST) logging. The basic -idea is to insert hashes of system artifacts into a public, append-only, and -tamper-evident transparency log, such that any enforcing client can be sure that -they see the same system artifacts as everyone else. A system artifact could -be a browser update, an operating system image, a Debian package, or more -generally something that is opaque. - -We take inspiration from the Certificate Transparency Front-End -([CTFE](https://github.com/google/certificate-transparency-go/tree/master/trillian/ctfe)) -that implements [RFC 6962](https://tools.ietf.org/html/rfc6962) for -[Trillian](https://transparency.dev). - -## Log parameters -An ST log is defined by the following parameters: -- `log_identifier`: a `Namespace` of type `ed25519_v1` that defines the log's -signing algorithm and public verification key. -- `supported_namespaces`: a list of namespace types that the log supports. -Entities must use a supported namespace type when posting signed data to the -log. -- `base_url`: prefix used by clients that contact the log, e.g., -example.com:1234/log. -- `final_cosigned_tree_head`: an `StItem` of type `cosigned_tree_head_v*`. Not -set until the log is turned into read-only mode in preparation of a shutdown. - -ST logs use the same hash strategy as described in RFC 6962: SHA256 with `0x00` -as leaf node prefix and `0x01` as interior node prefix. - -In contrast to Certificate Transparency (CT) **there is no Maximum Merge Delay -(MMD)**. New entries are merged into the log as soon as possible, and no client -should trust that something is logged until an inclusion proof can be provided -that references a trustworthy STH. Therefore, **there are no "promises" of -public logging** as in CT. - -To produce trustworthy STHs a simple form of [witness -cosigning](https://arxiv.org/pdf/1503.08768.pdf) is built into the log. 
-Witnesses poll the log for the next stable STH, and verify that it is consistent -before posting a cosignature that can then be served by the log. - -## Acceptance criteria and scope -A log should accept a leaf submission if it is: -- Well-formed, see data structure definitions below. -- Digitally signed by a registered namespace. - -Rate limits may be applied per namespace to combat spam. Namespaces may also be -used by clients to determine which entries belong to who. It is up to the -submitters to communicate trusted namespaces to their own clients. In other -words, there are no mappings from namespaces to identities built into the log. -There is also no revocation of namespaces: **we facilitate _detection_ of -compromised signing keys by making artifact hashes public, which is not to be -confused with _prevention_ or even _recovery_ after detection**. - -## Data structure definitions -Data structures are defined and serialized using the presentation language in -[RFC 5246, §4](https://tools.ietf.org/html/rfc5246). A definition of the log's -Merkle tree can be found in [RFC 6962, -§2](https://tools.ietf.org/html/rfc6962#section-2). - -### Namespace -A _namespace_ is a versioned data structure that contains a public verification -key (or fingerprint), as well as enough information to determine its format, -signing, and verification operations. Namespaces are used as identifiers, both -for the log itself and the parties that submit artifact hashes and cosignatures. - -``` -enum { - reserved(0), - ed25519_v1(1), - (2^16-1) -} NamespaceFormat; - -struct { - NamespaceFormat format; - select (format) { - case ed25519_v1: Ed25519V1; - } message; -} Namespace; -``` - -Our namespace format is inspired by Keybase's -[key-id](https://keybase.io/docs/api/1.0/kid). - -#### Ed25519V1 -At this time the only supported namespace type is based on Ed25519. The -namespace field contains the full verification key. 
Signing operations and -serialized formats are defined by [RFC -8032](https://tools.ietf.org/html/rfc8032). -``` -struct { - opaque namespace[32]; // public verification key -} Ed25519V1; -``` - -### `StItem` -A general-purpose `TransItem` is defined in [RFC 6962/bis, -§4.5](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.5). -We define our own `TransItem`, but name it `StItem` to emphasize that they are -not the same. - -``` -enum { - reserved(0), - signed_tree_head_v1(1), - cosigned_tree_head_v1(2), - consistency_proof_v1(3), - inclusion_proof_v1(4), - signed_checksum_v1(5), // leaf type - (2^16-1) -} StFormat; - -struct { - StFormat format; - select (format) { - case signed_tree_head_v1: SignedTreeHeadV1; - case cosigned_tree_head_v1: CosignedTreeHeadV1; - case consistency_proof_v1: ConsistencyProofV1; - case inclusion_proof_v1: InclusionProofV1; - case signed_checksum_v1: SignedChecksumV1; - } message; -} StItem; - -struct { - StItem items<0..2^32-1>; -} StItemList; -``` - -#### `signed_tree_head_v1` -We use the same tree head definition as in [RFC 6962/bis, -§4.9](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.9). -The resulting _signed_ tree head is packaged differently: a namespace is used as -log identifier, and it is communicated in a `SignatureV1` structure. -``` -struct { - TreeHeadV1 tree_head; - SignatureV1 signature; -} SignedTreeHeadV1; - -struct { - uint64 timestamp; - uint64 tree_size; - NodeHash root_hash; - Extension extensions<0..2^16-1>; -} TreeHeadV1; -opaque NodeHash<32..2^8-1>; - -struct { - Namespace namespace; - opaque signature<1..2^16-1>; -} SignatureV1; -``` - -#### `cosigned_tree_head_v1` -Transparency logs were designed to be cryptographically verifiable in the -presence of a gossip-audit model that ensures everyone observes _the same -cryptographically verifiable log_. 
The gossip-audit model is largely undefined -in today's existing transparency logging ecosystems, which means that the logs -must be trusted to play by the rules. We wanted to avoid that outcome in our -ecosystem. Therefore, a gossip-audit model is built into the log. - -The basic idea is that an STH should only be considered valid if it is cosigned -by a number of witnesses that verify the append-only property. Which witnesses -to trust and under what circumstances is defined by a client-side _witness -cosigning policy_. For example, - "require no witness cosigning", - "must have at least `k` signatures from witnesses A...J", and - "must have at least `k` signatures from witnesses A...J where one is from - witness B". - -Witness cosigning policies are beyond the scope of this specification. - -A cosigned STH is composed of an STH and a list of cosignatures. A cosignature -must cover the serialized STH as an `StItem`, and be produced with a witness -namespace of type `ed25519_v1`. - -``` -struct { - SignedTreeHeadV1 signed_tree_head; - SignatureV1 cosignatures<0..2^32-1>; // vector of cosignatures -} CosignedTreeHeadV1; -``` - -#### `consistency_proof_v1` -For the most part we use the same consistency proof definition as in [RFC -6962/bis, -§4.11](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.11). -There are two modifications: our log identifier is a namespace rather than an -[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4), -and a consistency proof may be empty. - -``` -struct { - Namespace log_id; - uint64 tree_size_1; - uint64 tree_size_2; - NodeHash consistency_path<0..2^16-1>; -} ConsistencyProofV1; -``` - -#### `inclusion_proof_v1` -For the most part we use the same inclusion proof definition as in [RFC -6962/bis, -§4.12](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.12). 
-There are two modifications: our log identifier is a namespace rather than an -[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4), -and an inclusion proof may be empty. -``` -struct { - Namespace log_id; - uint64 tree_size; - uint64 leaf_index; - NodeHash inclusion_path<0..2^16-1>; -} InclusionProofV1; -``` - -#### `signed_checksum_v1` -A checksum entry contains a package identifier like `foobar-1.2.3` and an -artifact hash. It is then signed so that clients can distinguish artifact -hashes from two different software publishers A and B. For example, the -`signed_checksum_v1` type can help [enforce public binary logging before -accepting a new software -update](https://wiki.mozilla.org/Security/Binary_Transparency). - -``` -struct { - ChecksumV1 data; - SignatureV1 signature; -} SignedChecksumV1; - -struct { - opaque identifier<1..128>; - opaque checksum<1..64>; -} ChecksumV1; -``` - -It is assumed that clients know how to find the real artifact source (if not -already at hand), such that the logged hash can be recomputed and compared for -equality. The log is not aware of how artifact hashes are computed, which means -that it is up to the submitters to define hash functions, data formats, and -such. - -## Public endpoints -Clients talk to the log using HTTP(S). Successfully processed requests are -responded to with HTTP status code `200 OK`, and any returned data is -serialized. Endpoints without input parameters use HTTP GET requests. -Endpoints that have input parameters HTTP POST a TLS-serialized data structure. -The HTTP content type `application/octet-stream` is used when sending data. - -### add-entry -``` -POST https:///st/v1/add-entry -``` - -Input: -- An `StItem` of type `signed_checksum_v1`. - -No output. - -### add-cosignature -``` -POST https:///st/v1/add-cosignature -``` - -Input: -- An `StItem` of type `cosigned_tree_head_v1`. 
The list of cosignatures must -be of length one, the witness signature must cover the item's STH, and that STH -must additionally match the log's stable STH that is currently being cosigned. - -No output. - -### get-latest-sth -``` -GET https:///st/v1/get-latest-sth -``` - -No input. - -Output: -- An `StItem` of type `signed_tree_head_v1` that corresponds to the most -recent STH. - -### get-stable-sth -``` -GET https:///st/v1/get-stable-sth -``` - -No input. - -Output: -- An `StItem` of type `signed_tree_head_v1` that corresponds to a stable STH -that witnesses should cosign. The same STH is returned for a period of time. - -### get-cosigned-sth -``` -GET https:///st/v1/get-cosigned-sth -``` - -No input. - -Output: -- An `StItem` of type `cosigned_tree_head_v1` that corresponds to the most -recent cosigned STH. - -### get-proof-by-hash -``` -POST https:///st/v1/get-proof-by-hash -``` - -Input: -``` -struct { - opaque hash[32]; // leaf hash - uint64 tree_size; // tree size that the proof should be based on -} GetProofByHashV1; -``` - -Output: -- An `StItem` of type `inclusion_proof_v1`. - -### get-consistency-proof -``` -POST https:///st/v1/get-consistency-proof -``` - -Input: -``` -struct { - uint64 first; // first tree size that the proof should be based on - uint64 second; // second tree size that the proof should be based on -} GetConsistencyProofV1; -``` - -Output: -- An `StItem` of type `consistency_proof_v1`. - -### get-entries -``` -POST https:///st/v1/get-entries -``` - -Input: -``` -struct { - uint64 start; // 0-based index of first entry to retrieve - uint64 end; // 0-based index of last entry to retrieve in decimal. -} GetEntriesV1; -``` - -Output: -- An `StItem` list where each entry is of type `signed_checksum_v1`. The first -`StItem` corresponds to the start index, the second one to `start+1`, etc. The -log may return fewer entries than requested. - -# Appendix A -In the future other namespace types might be supported. 
For example, we could -add [RSASSA-PKCS1-v1_5](https://tools.ietf.org/html/rfc3447#section-8.2) as -follows: -1. Add `rsa_v1` format and RSAV1 namespace. This is what we would register on -the server-side such that the server knows the namespace and complete key. -``` -struct { - opaque namespace<32>; // key fingerprint - // + some encoding of public key -} RSAV1; -``` -2. Add `rsassa_pkcs1_5_v1` format and `RSASSAPKCS1_5_v1`. This is what the -submitter would use to communicate namespace and RSA signature mode. -``` -struct { - opaque namespace<32>; // key fingerprint - // + necessary parameters, e.g., SHA256 as hash function -} RSASSAPKCS1_5V1; -``` -- cgit v1.2.3 From aa8f64c0ed18f384a6af1ade6268b35ec60dac85 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Tue, 20 Apr 2021 21:45:03 +0200 Subject: added shard_hint --- doc/api.md | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 760663b..0f873e4 100644 --- a/doc/api.md +++ b/doc/api.md @@ -119,6 +119,39 @@ A key-hash is included in the leaf so that it can be attributed to the signing entity. A hash, rather than the full public verification key, is used to force the verifier to locate the appropriate key and make an explicit trust decision. +#### Shard hint +The log is only accepting new leaves during a predefined time interval. We +refer to this time interval as the log's _shard_. Sharding can simplify log +operations because it becomes explicit when the log can be shutdown. + +Unlike X.509 certificates that already have a validity range, a checksum does +not have any such information. Therefore, we require the submitter to sign a +_shard hint_. A shard hint is composed of a prefix and a tree leaf. + +``` +struct shard_hint { + u64 prefix; + struct tree_leaf leaf; +} +``` + +The log will check that the signed `shard_hint` can be verified using the +submitter's public verification key. 
The prefix could be anything and may +repeat. This API documentation assumes that the prefix is set to zero. + +As long as the `shard_hint` signature is not revealed, no one but the submitter +can submit a leaf that the log will accept. Therefore, the good Samaritan +cannot submit all leaves from an earlier shard into a newer one. The +`shard_hint` does not prevent the _legitimate submitter_ from reusing an earlier +submission in a future shard. + +Note the importance of letting the submitter decide if an entry is logged again +or not. If the log has a rate limiting function, replayed submissions could +deny service in a new shard. In practise we expect submitters to not log a +leaf again. Once an inclusion proof and a cosigned tree head is available, you +have all the necessary proofs. These proofs continue to be valid after the log +shuts down because the verification process is non-interactive. + ## Public endpoints Every log has a base URL that identifies it uniquely. The only constraint is that it must be a valid HTTP(S) URL that can have the `/st/v0/` suffix @@ -199,6 +232,7 @@ Input key-value pairs: - `leaf_checksum`: the checksum that the submitter wants to log in base64. - `signature_scheme`: the signature scheme that the submitter wants to use. - `tree_leaf_signature`: the submitter's `tree_leaf` signature in base64. +- `shard_hint_signature`: the submitter's `shard_hint` signature in base64. - `verification_key`: the submitter's public verification key. It is serialized as described in the corresponding RFC, then base64-encoded. 
- `domain_hint`: a domain name that indicates where the public verification-key -- cgit v1.2.3 From 66b3d09b526c3c1e8d5f2d9a92deba497ca8124c Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Mon, 26 Apr 2021 12:48:22 +0200 Subject: moved shard_hint into tree_leaf --- doc/api.md | 114 ++++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 63 insertions(+), 51 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 0f873e4..b5d54e6 100644 --- a/doc/api.md +++ b/doc/api.md @@ -85,72 +85,74 @@ struct tree_head { }; ``` -The serialized tree head must be signed using Ed25519. A witness must only sign -the log's tree head if it is consistent with prior history and the timestamp is -roughly correct. A timestamp is roughly correct if it is not backdated or -future-dated more than 12 hours. +The serialized tree head must be signed using Ed25519. A witness must not +cosign a tree head if it is inconsistent with prior history or if the timestamp +is backdated or future-dated more than 12 hours. #### Merkle tree leaf -The log supports a single leaf type. It contains a checksum, a signature -scheme, a signature that the submitter computed over that checksum, and the hash -of the public verification key that can be used to verify the signature. +The log supports a single leaf type. It contains a message, a signature scheme, +a signature that the submitter computed over the message, and a hash of the +public verification key that can be used to verify the signature. 
``` -const ALG_ED25519 = 1; // RFC 8032 -const ALG_RSASSA_PKCS1_V1_5 = 2; // RFC 8017 -const ALG_RSASSA_PSS = 3; // RFC 8017 +const SIGNATURE_SCHEME_ED25519 = 1; // RFC 8032 +const SIGNATURE_SCHEME_RSASSA_PKCS1_V1_5 = 2; // RFC 8017 +const SIGNATURE_SCHEME_RSASSA_PSS = 3; // RFC 8017 -struct tree_leaf { +struct signature_ed25519 { + u8 signature[32]; +}; + +struct signature_rsassa { + u64 num_bytes IN [ 256, 384, 512 ]; + u8 signature[num_bytes]; +}; + +struct message { + u64 shard_hint; u8 checksum[32]; +}; + +struct tree_leaf { + struct message message; u64 signature_scheme IN [ - ALG_ED25519, - ALG_RSASSA_PKCS1_V1_5, - ALG_RSASSA_PSS, + SIGNATURE_SCHEME_ED25519, + SIGNATURE_SCHEME_RSASSA_PKCS1_V1_5, + SIGNATURE_SCHEME_RSASSA_PSS, ]; union signature[signature_scheme] { - ALG_ED25519: u8 ed25519[32]; - default: u8 rsa[512]; + SIGNATURE_SCHEME_ED25519: struct signature_ed25519 ed25519; + default: struct signature_rsassa rsassa; } u8 key_hash[32]; } ``` -A key-hash is included in the leaf so that it can be attributed to the signing -entity. A hash, rather than the full public verification key, is used to force -the verifier to locate the appropriate key and make an explicit trust decision. - -#### Shard hint -The log is only accepting new leaves during a predefined time interval. We -refer to this time interval as the log's _shard_. Sharding can simplify log -operations because it becomes explicit when the log can be shutdown. - -Unlike X.509 certificates that already have a validity range, a checksum does -not have any such information. Therefore, we require the submitter to sign a -_shard hint_. A shard hint is composed of a prefix and a tree leaf. +Unlike X.509 certificates that already have validity ranges, a checksum does not +have any such information. Therefore, we require that the submitter selects a +_shard hint_. The selected shard hint must be in the log's _shard interval_. A +shard interval is defined by a start time and an end time. 
Both ends of the +shard interval are inclusive and expressed as the number of milliseconds since +the UNIX epoch (January 1, 1970 00:00:00 UTC). -``` -struct shard_hint { - u64 prefix; - struct tree_leaf leaf; -} -``` +Sharding simplifies log operations because it becomes explicit when a log can be +shutdown. A log must only accept logging requests that have valid shard hints. +A log should only accept logging requests during the predefined shard interval. +Note that _the submitter's shard hint is not a verified timestamp_. The +submitter should set the shard hint as large as possible. If a roughly verified +timestamp is needed, a cosigned tree head can be used. -The log will check that the signed `shard_hint` can be verified using the -submitter's public verification key. The prefix could be anything and may -repeat. This API documentation assumes that the prefix is set to zero. +Without a shard hint, the good Samaritan could log all leaves from an earlier +shard into a newer one. Not only would that defeat the purpose of sharding, but +it would also become a potential denial-of-service vector. -As long as the `shard_hint` signature is not revealed, no one but the submitter -can submit a leaf that the log will accept. Therefore, the good Samaritan -cannot submit all leaves from an earlier shard into a newer one. The -`shard_hint` does not prevent the _legitimate submitter_ from reusing an earlier -submission in a future shard. +The signed message is composed of the selected shard hint and the submitter's +checksum. It must be possible to verify the signature using the specified +signature scheme and the submitter's public verification key. -Note the importance of letting the submitter decide if an entry is logged again -or not. If the log has a rate limiting function, replayed submissions could -deny service in a new shard. In practise we expect submitters to not log a -leaf again. 
Once an inclusion proof and a cosigned tree head is available, you -have all the necessary proofs. These proofs continue to be valid after the log -shuts down because the verification process is non-interactive. +A key-hash is included in the leaf so that it can be attributed to the signing +entity. A hash, rather than the full public verification key, is used to force +the verifier to locate the appropriate key and make an explicit trust decision. ## Public endpoints Every log has a base URL that identifies it uniquely. The only constraint is @@ -229,10 +231,10 @@ POST /st/v0/add-leaf ``` Input key-value pairs: -- `leaf_checksum`: the checksum that the submitter wants to log in base64. +- `shard_hint`: the shard hint that the submitter selected. +- `checksum`: the checksum that the submitter wants to log in base64. - `signature_scheme`: the signature scheme that the submitter wants to use. -- `tree_leaf_signature`: the submitter's `tree_leaf` signature in base64. -- `shard_hint_signature`: the submitter's `shard_hint` signature in base64. +- `signature`: the submitter's signature over `tree_leaf.message` in base64. - `verification_key`: the submitter's public verification key. It is serialized as described in the corresponding RFC, then base64-encoded. - `domain_hint`: a domain name that indicates where the public verification-key @@ -279,3 +281,13 @@ human-readable error message. The key-hash can be used to identify which witness signed the log's tree head. A key-hash, rather than the full verification key, is used to force the verifier to locate the appropriate key and make an explicit trust decision. + +## Summary of log parameters +- **Public key**: an Ed25519 verification key that can be used to verify the +log's tree head signatures. +- **Log identifier**: the hashed public verification key using SHA256. +- **Shard interval**: the time during which the log accepts logging requests. 
+The shard interval's start and end are inclusive and expressed as the number of +milliseconds since the UNIX epoch. +- **Base URL**: where the log can be reached over HTTP(S). It is the prefix +before a version-0 specific endpoint. -- cgit v1.2.3 From 83d38bfc5c3b9304953d04a4679658e3c2645367 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Mon, 26 Apr 2021 15:12:57 +0200 Subject: drafty experiment where we would only use percent encoding --- doc/api.md | 206 ++++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 116 insertions(+), 90 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index b5d54e6..174d2c9 100644 --- a/doc/api.md +++ b/doc/api.md @@ -8,28 +8,18 @@ This is a work-in-progress document that may be moved or modified. ## Overview The log implements an HTTP(S) API: -- Requests that add data to the log use the HTTP POST method. The HTTP content -type is `application/x-www-form-urlencoded`. The posted data are key-value -pairs. Binary data must be base64-encoded. -- Requests that retrieve data from the log use the HTTP GET method. The HTTP -content type is `application/x-www-form-urlencoded`. Input parameters are -key-value pairs. -- Responses are JSON objects. The HTTP content type is `application/json`. -- Error messages are human-readable strings. The HTTP content type is -`text/plain`. - -We decided to use these web formats for requests and responses because the log -is running as an HTTP(S) service. In other words, anyone that interacts with -the log is most likely using these formats already. The other benefit is that -all requests and responses are human-readable. This makes it easier to -understand the protocol, troubleshoot issues, and copy-paste. We favored -compatibility and understandability over a wire-efficient format. - -Note that we are not using JSON for signed and/or logged data. 
In other words, -a submitter that wishes to distribute log responses to their user base in a -different format may do so. The forced (de)serialization parser on _end-users_ -is a small subset of Trunnel. Trunnel is an "idiot-proof" wire-format that the -Tor project uses. +- Requests that add data to the log use the HTTP POST method. +- Requests that retrieve data from the log use the HTTP GET method. +- The HTTP content type is `application/x-www-form-urlencoded` for requests and +responses. This means that all input and output are expressed as key-value +pairs. Binary data must be hex-encoded. + +We decided to use percent encoding for requests and responses because it is a +_simple format_ that is commonly used on the web. We are not using percent +encoding for signed and/or logged data. In other words, a submitter may +distribute log responses to their end-users in a different format that suits +them. The forced (de)serialization parser on _end-users_ is a small subset of +Trunnel. Trunnel is an "idiot-proof" wire-format that the Tor project uses. ## Primitives ### Cryptography @@ -49,6 +39,13 @@ padding. Supporting RSA is suboptimal, but excluding it would make the log useless for many possible adopters. ### Serialization +Log requests and responses are percent encoded. Percent encoding is a smaller +dependency than an alternative parser like JSON. It is comparable to rolling +your own minimalistic line-terminated format. Some input and output data is +binary: cryptographic hashes and signatures. Binary data must be expressed as +hex before percent-encoding it. We decided to use hex as opposed to base64 +because it is simpler, favoring simplicity over efficiency on the wire. + We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) to define (de)serialization of data structures that need to be signed or inserted into the Merkle tree. Trunnel is more expressive than the 
Trunnel is more expressive than the @@ -62,13 +59,12 @@ A fair summary of our Trunnel usage is as follows. All integers are 64-bit, unsigned, and in network byte order. A fixed-size byte array is put into the serialization buffer in-order, starting from the first -byte. These basic types are concatenated to form a collection. You should not -need a general-purpose Trunnel (de)serialization parser to work with this -format. If you have one, you may use it though. The main point of using -Trunnel is that it makes a simple format explicit and unambiguous. - -TODO: URL-encode _or_ JSON? I think we should only need one. Always doing HTTP -POST would also ensure that input parameters don't show up in web server logs. +byte. A variable length byte array first declares its length as an integer, +which is then followed by that number of bytes. These basic types are +concatenated to form a collection. You should not need a general-purpose +Trunnel (de)serialization parser to work with this format. If you have one, you +may use it though. The main point of using Trunnel is that it makes a simple +format explicit and unambiguous. #### Merkle tree head Tree heads are signed by the log and its witnesses. It contains a timestamp, a @@ -160,91 +156,124 @@ that it must be a valid HTTP(S) URL that can have the `/st/v0/` suffix appended. For example, a complete endpoint URL could be `https://log.example.com/2021/st/v0/get-signed-tree-head`. +The HTTP status code is 200 OK to indicate success. A different HTTP status +code is used to indicate failure. The log should set the "error" key to a +human-readable value that describes what went wrong. For example, +`error=invalid+signature`, `error=rate+limit+exceeded`, or +`error=unknown+leaf+hash`. + ### get-signed-tree-head ``` GET /st/v0/get-signed-tree-head ``` -Input key-value pairs: -- `type`: either the string "latest", "stable", or "cosigned". - - "latest": ask for the most recent signed tree head. 
- - "stable": ask for a recent signed tree head that is fixed for some period +Input: +- "type": either the string "latest", "stable", or "cosigned". + - latest: ask for the most recent signed tree head. + - stable: ask for a recent signed tree head that is fixed for some period of time. - - "cosigned": ask for a recent cosigned tree head. - -Output: -- On success: status 200 OK and a signed tree head. The response body is -defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/sth.schema.json). -- On failure: a different status code and a human-readable error message. + - cosigned: ask for a recent cosigned tree head. + +Output on success: +- "timestamp": `tree_head.timestamp` as a human-readable number. +- "tree_size": `tree_head.tree_size` as a human-readable number. +- "root_hash": `tree_head.root_hash` in hex. +- "signature": an Ed25519 signature over `tree_head`. The result is +hex-encoded. +- "key_hash": a hash of the public verification key that can be used to verify +the signature. The public verification key is serialized as in RFC 8032, then +hashed using SHA256. The result is hex-encoded. + +The "signature" and "key_hash" fields may repeat. The first signature +corresponds to the first key hash, the second signature corresponds to the +second key hash, etc. The number of signatures and key hashes must match. ### get-proof-by-hash ``` POST /st/v0/get-proof-by-hash ``` -Input key-value pairs: -- `leaf_hash`: a base64-encoded leaf hash that identifies which `tree_leaf` the +Input: +- "leaf_hash": a hex-encoded leaf hash that identifies which `tree_leaf` the log should prove inclusion for. The leaf hash is computed using the RFC 6962 -hashing strategy. In other words, `H(0x00 | tree_leaf)`. -- `tree_size`: the tree size of a tree head that the proof should be based on. +hashing strategy. In other words, `SHA256(0x00 | tree_leaf)`. 
+- "tree_size": a human-readable tree size of the tree head that the proof should +be based on. -Output: -- On success: status 200 OK and an inclusion proof. The response body is -defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/inclusion_proof.schema.json). -- On failure: a different status code and a human-readable error message. +Output on success: +- "tree_size": human-readable tree size that the proof is based on. +- "leaf_index": human-readable zero-based index of the leaf that the proof is +based on. +- "inclusion_path": a node hash in hex. + +The "inclusion_path" may be omitted or repeated to represent an inclusion proof +of zero or more node hashes. The order of node hashes follow from our hash +strategy, see RFC 6962. ### get-consistency-proof ``` POST /st/v0/get-consistency-proof ``` -Input key-value pairs: -- `new_size`: the tree size of a newer tree head. -- `old_size`: the tree size of an older tree head that the log should prove is -consistent with the newer tree head. +Input: +- "new_size": human-readable tree size of a newer tree head. +- "old_size": human-readable tree size of an older tree head that the log should +prove is consistent with the newer tree head. + +Output on success: +- "new_size": human-readable tree size of a newer tree head that the proof +is based on. +- "old_size": human-readable tree size of an older tree head that the proof is +based on. +- "consistency_path": a node hash in hex. -Output: -- On success: status 200 OK and a consistency proof. The response body is -defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/consistency_proof.schema.json). -- On failure: a different status code and a human-readable error message. +The "consistency_path" may be omitted or repeated to represent a consistency +proof of zero or more node hashes. The order of node hashes follow from our +hash strategy, see RFC 6962. 
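The repeated-key convention described above can be sketched as follows. This is a minimal illustration, not the log's reference parser. It assumes the response body is percent-encoded `key=value` pairs joined by `&` (consistent with the `error=invalid+signature` examples), and that node hashes are 32-byte SHA256 values in hex. The function name `parse_consistency_proof` is hypothetical.

```python
from urllib.parse import parse_qs


def parse_consistency_proof(body: str) -> dict:
    """Parse a get-consistency-proof response body (assumed form-encoded)."""
    # parse_qs collects repeated keys into lists and preserves their order.
    fields = parse_qs(body, strict_parsing=True)

    # "new_size" and "old_size" are expected exactly once each.
    sizes = {}
    for key in ("new_size", "old_size"):
        values = fields.get(key, [])
        if len(values) != 1:
            raise ValueError(f"expected exactly one {key!r} value")
        sizes[key] = int(values[0])

    # "consistency_path" may be omitted (empty proof) or repeated.
    path = [bytes.fromhex(h) for h in fields.get("consistency_path", [])]
    if any(len(node) != 32 for node in path):
        raise ValueError("node hashes must be 32 bytes (SHA256)")

    return {
        "new_size": sizes["new_size"],
        "old_size": sizes["old_size"],
        "consistency_path": path,
    }
```

The same approach would apply to "inclusion_path" and to the parallel "signature"/"key_hash" lists, where a caller should additionally check that the repeated lists have equal length.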
### get-leaves ``` POST /st/v0/get-leaves ``` -Input key-value pairs: -- `start_size`: zero-based index of the first leaf to retrieve. -- `end_size`: index of the last leaf to retrieve. +Input: +- "start_size": human-readable index of the first leaf to retrieve. +- "end_size": human-readable index of the last leaf to retrieve. -Output: -- On success: status 200 OK and a list of leaves. The response body is -defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/leaves.schema.json). -- On failure: a different status code and a human-readable error message. +Output on success: +- "shard_hint": `tree_leaf.message.shard_hint` as a human-readable number. +- "checksum": `tree_leaf.message.checksum` in hex. +- "signature_scheme": human-readable number that identifies a signature scheme. +- "signature": `tree_leaf.signature` in hex. +- "key_hash": `tree_leaf.key_hash` in hex. -The log may truncate the list of returned leaves. However, it must not be an -empty list on success. +All fields may be repeated to return more than one leaf. The first value in +each list refers to the first leaf, the second value in each list refers to the +second leaf, etc. The size of each list must match. + +The log may return fewer leaves than requested. At least one leaf must be +returned on HTTP status code 200 OK. ### add-leaf ``` POST /st/v0/add-leaf ``` -Input key-value pairs: -- `shard_hint`: the shard hint that the submitter selected. -- `checksum`: the checksum that the submitter wants to log in base64. -- `signature_scheme`: the signature scheme that the submitter wants to use. -- `signature`: the submitter's signature over `tree_leaf.message` in base64. -- `verification_key`: the submitter's public verification key. It is serialized -as described in the corresponding RFC, then base64-encoded. -- `domain_hint`: a domain name that indicates where the public verification-key -hash can be downloaded in base64. 
Supported methods: DNS and HTTPS -(TODO: docdoc). - -Output: -- On success: HTTP 200. The log will _try_ to incorporate the submitted leaf -into its Merkle tree. -- On failure: a different status code and a human-readable error message. +Input: +- "shard_hint": human-readable number in the log's shard interval that the +submitter selected. +- "checksum": the cryptographic checksum that the submitter wants to log in hex. +- "signature_scheme": human-readable number that identifies the submitter's +signature scheme. +- "signature": the submitter's signature over `tree_leaf.message`. The result +is hex-encoded. +- "verification_key": the submitter's public verification key. It is serialized +as described in the corresponding RFC. The result is hex-encoded. +- "domain_hint": a domain name that indicates where `tree_leaf.key_hash` can be +retrieved as a DNS TXT resource record in hex. + +Output on success: +- None The submitted entry will not be accepted if the signature is invalid or if the downloaded verification-key hash does not match. The submitted entry may also @@ -260,23 +289,20 @@ Public logging should not be assumed until an inclusion proof is available. An inclusion proof should not be relied upon unless it leads up to a trustworthy signed tree head. Witness cosigning can make a tree head trustworthy. -TODO: the log may allow no `domain_hint`? Especially useful for v0 testing. - ### add-cosignature ``` POST /st/v0/add-cosignature ``` -Input key-value pairs: -- `signature`: a base64-encoded signature over a `tree_head` that is fixed for -some period of time. The cosigning witness retrieves the tree head using the -`get-signed-tree-head` endpoint with the "stable" type. -- `key_hash`: a base64-encoded hash of the public verification key that can be -used to verify the signature. +Input: +- "signature": an Ed25519 signature over `tree_head`. The result is +hex-encoded. 
+- "key_hash": a hash of the public verification key that can be used to verify +the signature. The public verification key is serialized as in RFC 8032, then +hashed using SHA256. The result is hex-encoded. -Output: -- HTTP status 200 OK on success. Otherwise a different status code and a -human-readable error message. +Output on success: +- None The key-hash can be used to identify which witness signed the log's tree head. A key-hash, rather than the full verification key, is used to force the verifier -- cgit v1.2.3 From 87a2fa506c1861158ca04fd34d64e10b6447d8f3 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Mon, 26 Apr 2021 19:54:06 +0200 Subject: added drafty threat model text --- doc/design.md | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index f966d03..59cd7c8 100644 --- a/doc/design.md +++ b/doc/design.md @@ -28,5 +28,35 @@ System Transparency logging makes signed checksums transparent. The goal is to _detect_ unwanted key-usage without making assumptions about the signed data. ## Threat model and (non-)goals +We consider a powerful attacker that gained control of a target's signing and +release infrastructure. This covers a weaker form of attacker that is able to +sign data and distribute it to a subset of isolated users. For example, this is +essentially what FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order). +The fact that signing keys and related infrastructure components get +compromised should not be controversial [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). + +The attacker can also gain control of the transparency log's signing key and +infrastructure. This covers a weaker form of attacker that is able to sign log +data and distribute it to a subset of isolated users. 
For example, this could +have been the case when a remote code execution was found for a Certificate +Transparency Log [\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM). + +Any attacker that is able to position itself to control these components will +likely be _risk-averse_. This is at minimum due to two factors. First, +detection would result in a significant loss of capability that is by no means +trivial to come by. Second, detection means that some part of the attacker's +malicious behavior will be disclosed publicly. + +Our goal is to facilitate _detection_ of compromised signing keys. Therefore, +we transparency log signed checksums. We assume that clients _fail closed_ if a +checksum does not appear in a public log. We also assume that the attacker +controls at most a threshold of independent parties to achieve our goal +("strength in numbers"). + +It is a non-goal to disclose the data that a signed checksum represents. For +example, the log cannot distinguish between a checksum that represents a tax +declaration, an ISO image, or a Debian package. This means that the type of +detection we support is _courser-grained_ when compared to Certificate +Transparency. ## Design -- cgit v1.2.3 From 94fea7a3c993686d26efbf7ca9b73d598222a272 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Thu, 29 Apr 2021 14:50:49 +0200 Subject: added start on design document Work in progress. --- doc/design.md | 196 ++++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 176 insertions(+), 20 deletions(-) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index 59cd7c8..9fcf4b6 100644 --- a/doc/design.md +++ b/doc/design.md @@ -2,9 +2,9 @@ We propose System Transparency logging. It is similar to Certificate Transparency, expect that cryptographically signed checksums are logged as opposed to X.509 certificates. Publicly logging signed checksums allow anyone -to discover which keys signed what. 
As such, malicious and unintended key-usage -can be _discovered_. We present our design and discuss how two possible -use-cases influenced it: binary transparency and reproducible builds. +to discover which keys produced what signatures. As such, malicious and +unintended key-usage can be _detected_. We present our design and conclude by +providing two use-cases: binary transparency and reproducible builds. **Target audience.** You are most likely interested in transparency logs or supply-chain security. @@ -12,20 +12,20 @@ You are most likely interested in transparency logs or supply-chain security. **Preliminaries.** You have basic understanding of cryptographic primitives like digital signatures, hash functions, and Merkle trees. You roughly know what problem -Certificate Transparency solves and how. You may never have heard the term -_gossip-audit model_, or know how it is related to trust assumptions and -detectability properties. +Certificate Transparency solves and how. **Warning.** -This is a work-in-progress document that may be moved or modified. +This is a work-in-progress document that may be moved or modified. A future +revision of this document will bump the version number to v1. Please let us +know if you have any feedback. ## Introduction Transparency logs make it possible to detect unwanted events. For example, are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962), did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md), or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/). -System Transparency logging makes signed checksums transparent. The goal is to -_detect_ unwanted key-usage without making assumptions about the signed data. +A System Transparency log makes signed checksums transparent. 
The overall goal +is to facilitate detection of unwanted key-usage. ## Threat model and (non-)goals We consider a powerful attacker that gained control of a target's signing and @@ -33,7 +33,7 @@ release infrastructure. This covers a weaker form of attacker that is able to sign data and distribute it to a subset of isolated users. For example, this is essentially what FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order). The fact that signing keys and related infrastructure components get -compromised should not be controversial [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). +compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). The attacker can also gain control of the transparency log's signing key and infrastructure. This covers a weaker form of attacker that is able to sign log @@ -47,16 +47,172 @@ detection would result in a significant loss of capability that is by no means trivial to come by. Second, detection means that some part of the attacker's malicious behavior will be disclosed publicly. -Our goal is to facilitate _detection_ of compromised signing keys. Therefore, -we transparency log signed checksums. We assume that clients _fail closed_ if a -checksum does not appear in a public log. We also assume that the attacker -controls at most a threshold of independent parties to achieve our goal -("strength in numbers"). +Our goal is to facilitate _detection_ of compromised signing keys. We consider +a signing key compromised if an end-user accepts an unwanted signature as valid. +The solution that we propose is that signed checksums are transparency logged. +For security we need a collision resistant hash function and an unforgeable +signature scheme. 
We also assume that at most a threshold of seemingly +independent parties are adversarial. -It is a non-goal to disclose the data that a signed checksum represents. For -example, the log cannot distinguish between a checksum that represents a tax -declaration, an ISO image, or a Debian package. This means that the type of -detection we support is _courser-grained_ when compared to Certificate -Transparency. +It is a non-goal to disclose the data that a checksum represents. For example, +the log cannot distinguish between a checksum that represents a tax declaration, +an ISO image, or a Debian package. This means that the type of detection we +support is more _course-grained_ when compared to Certificate Transparency. ## Design +We consider a data publisher that wants to digitally sign their data. The data +is of opaque type. We assume that end-users have a mechanism to locate the +relevant public verification keys. Data and signatures can also be retrieved +(in)directly from the data publisher. We make little assumptions about the +signature tooling. The ecosystem at large can continue to use `gpg`, `openssl`, +`ssh-keygen -Y`, `signify`, or something else. + +We _have to assume_ that additional tooling can be installed by end-users that +wish to enforce transparency logging. For example, none of the existing +signature tooling support verification of Merkle tree proofs. A side-effect of +our design is that this additional tooling makes no outbound connections. The +above data flows are thus preserved. + +### A bird's view +A central part of any transparency log is the data. The data is stored by the +leaves of an append-only Merkle tree. Our leaf structure contains four fields: +- **shard_hint**: a number that binds the leaf to a particular _shard interval_. +Sharding means that the log has a predefined time during which logging requests +will be accepted. Once elapsed, the log can be shutdown. +- **checksum**: a cryptographic hash of some opaque data. 
The log never +sees the opaque data; just the hash. +- **signature**: a digital signature that is computed by the data publisher over +the leaf's shard hint and checksum. +- **key_hash**: a cryptographic hash of the public verification key that can be +used to verify the leaf's signature. + +#### Step 1 - preparing a logging request +The data publisher selects a shard hint and a checksum that should be logged. +For example, the shard hint could be "logs that are active during 2021". The +checksum might be a hashed release file or something else. + +The data publisher signs the selected shard hint and checksum using their secret +signing key. Both the signed message and the signature is stored +in the leaf for anyone to verify. Including a shard hint in the signed message +ensures that the good Samaritan cannot change it to log all leaves from an +earlier shard into a newer one. + +The hashed public verification key is also stored in the leaf. This makes it +easy to attribute the leaf to the signing entity. For example, a data publisher +that monitors the log can look for leaves that match their own key hash(es). + +A hash, rather than the full public verification key, is used to force the +verifier to locate the key and trust it explicitly. Not disclosing the public +verification key in the leaf makes it more difficult to use an untrusted key _by +mistake_. + +#### Step 2 - submitting a logging request +The log implements an HTTP(S) API. Input and output is human-readable and uses +percent encoding. We decided to use percent encoding for requests and responses +because it is a simple format that is commonly used on the web. A more complex +parser like JSON is not needed if the exchanged data structures are basic +enough. + +The data publisher submits their shard hint, checksum, signature, and public +verification key as key-value pairs. The log will use the public verification +key to check that the signature is valid, then hash it to construct the leaf. 
+ +The data publisher also submits a _domain hint_. The log will download a DNS +TXT resource record based on the provided domain name. The downloaded result +must match the public verification key hash. By verifying that the submitter +controls a domain that is aware of the public verification key, rate limits can +be applied per second-level domain. As a result, you would need a large number +of domain names to spam the log in any significant way. + +Using DNS to combat spam is convenient because many data publishers already have +a domain name. A single domain name is also relatively cheap. Another +benefit is that the same anti-spam mechanism can be used across several +independent logs without coordination. This is important because a healthy log +ecosystem needs more than one log to be reliable. DNS also has built-in +caching that can be influenced by setting TTLs accordingly. + +The submitter's domain hint is not part of the leaf because key management is +more complex than that. The only service that the log provides is discovery of +signed checksums. Key transparency projects have their own merit. + +The log will _try_ to incorporate a leaf into the Merkle tree if a logging +request is accepted. There are no _promises of public logging_ as in +Certificate Transparency. Therefore, the submitter needs to wait for an +inclusion proof before concluding that the request succeeded. Not having +inclusion promises makes the log less complex. + +#### Step 3 - distributing proofs of public logging +The data publisher is responsible for collecting all cryptographic proofs that +their end-users will need to enforce public logging. It must be possible to +download the following collection (in)directly from the data publisher: +1. **Shard hint**: the data publisher's selected shard hint. +2. **Opaque data**: the data publisher's opaque data. +3. **Signature**: the data publisher's leaf signature. +5. 
**Cosigned tree head**: the log's tree head and a _list of signatures_ that +state it is consistent with prior history. +6. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree +head in question. + +The public verification key is known. Therefore, the first three fields are +sufficient to reconstruct the logged leaf. The leaf's signature can be +verified. The final two fields then prove that the leaf is in the log. If the +leaf is included in the log, any monitor can detect that there is a new +signature for a data publisher's public verification key. + +The catch is that the proof of logging is only as convincing as the tree head +that the inclusion proof leads up to. To bypass public logging, the attacker +needs to control a threshold of independent _witnesses_ that cosign the log. A +benign witness will only sign the log's tree head if it is consistent with prior +history. + +#### Summary +The log is sharded and will shutdown at a predefined time. The log can shut +down _safely_ because end-user verification is not interactive. The difficulty +of bypassing public logging is based on the difficulty of controlling a +threshold of independent witnesses. Witnesses cosign tree heads to make them +trustworthy. + +Submitters, monitors, and witnesses interact with the log using an HTTP(S) API. +Submitters must prove that they own a domain name as an anti-spam mechanism. +End-users interact with the log _indirectly_ via a data publisher. It is the +data publisher's job to log signed checksums, distribute necessary proofs of +logging, and monitor the log. + +### A peak into the details +Our bird's view introduction skipped many details that matter in practise. Some +of these details are presented here using a question-answer format. A +question-answer format is helpful because it is easily modified and extended. + +#### What cryptographic primitives are supported? +The only supported hash algorithm is SHA256. 
The only supported signature +scheme is Ed25519. Not having any cryptographic agility makes the protocol +simpler and more secure. + +An immediate follow-up question is how that is supposed to work with existing +and future signature tooling. The key insight is that _additional tooling is +already required to verify Merkle tree proofs. That tooling should use SHA256. +That tooling should also verify all Ed25519 signatures that logs, witnesses, and +data publishers create_. + +For example, suppose that an ecosystem uses `gpg` which has its own incompatible +signature format and algorithms. The data publisher could _cross-sign_ using +Ed25519 as follows: +1. Sign the opaque data as you normally would with `gpg`. +2. Hash the opaque data and use that as the leaf's checksum. Sign the leaf +using Ed25519. + +First the end-user verifies that the `gpg` signature is valid. This is the +old verification process. Then the end-user uses the additional tooling to +verify proofs of logging, which involves SHA256 hashing and Ed25519 signatures. + +The downside is that the data publisher may need to manage an Ed25519 key _as +well_. TODO: motivate why that is a suboptimal but worth-while trade-off. + +#### What (de)serialization parsers are needed? +#### Why witness cosigning? +#### What policy should be used? +#### TODO +Add more key questions and answers. + +## Concluding remarks +Example of binary transparency and reproducible builds. -- cgit v1.2.3 From 6cae1445318e22ce909b0211fc405dbeb6db7c44 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Fri, 30 Apr 2021 12:11:40 +0200 Subject: fixed typos --- doc/design.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index 9fcf4b6..cb379e5 100644 --- a/doc/design.md +++ b/doc/design.md @@ -1,6 +1,6 @@ # System Transparency Logging: Design v0 We propose System Transparency logging. 
It is similar to Certificate -Transparency, expect that cryptographically signed checksums are logged as +Transparency, except that cryptographically signed checksums are logged as opposed to X.509 certificates. Publicly logging signed checksums allow anyone to discover which keys produced what signatures. As such, malicious and unintended key-usage can be _detected_. We present our design and conclude by @@ -31,7 +31,7 @@ is to facilitate detection of unwanted key-usage. We consider a powerful attacker that gained control of a target's signing and release infrastructure. This covers a weaker form of attacker that is able to sign data and distribute it to a subset of isolated users. For example, this is -essentially what FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order). +essentially what the FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order). The fact that signing keys and related infrastructure components get compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). @@ -57,7 +57,7 @@ independent parties are adversarial. It is a non-goal to disclose the data that a checksum represents. For example, the log cannot distinguish between a checksum that represents a tax declaration, an ISO image, or a Debian package. This means that the type of detection we -support is more _course-grained_ when compared to Certificate Transparency. +support is more _coarse-grained_ when compared to Certificate Transparency. ## Design We consider a data publisher that wants to digitally sign their data. The data @@ -69,7 +69,7 @@ signature tooling. 
The ecosystem at large can continue to use `gpg`, `openssl`, We _have to assume_ that additional tooling can be installed by end-users that wish to enforce transparency logging. For example, none of the existing -signature tooling support verification of Merkle tree proofs. A side-effect of +signature tooling supports verification of Merkle tree proofs. A side-effect of our design is that this additional tooling makes no outbound connections. The above data flows are thus preserved. @@ -78,7 +78,7 @@ A central part of any transparency log is the data. The data is stored by the leaves of an append-only Merkle tree. Our leaf structure contains four fields: - **shard_hint**: a number that binds the leaf to a particular _shard interval_. Sharding means that the log has a predefined time during which logging requests -will be accepted. Once elapsed, the log can be shutdown. +will be accepted. Once elapsed, the log can be shut down. - **checksum**: a cryptographic hash of some opaque data. The log never sees the opaque data; just the hash. - **signature**: a digital signature that is computed by the data publisher over @@ -166,7 +166,7 @@ benign witness will only sign the log's tree head if it is consistent with prior history. #### Summary -The log is sharded and will shutdown at a predefined time. The log can shut +The log is sharded and will shut down at a predefined time. The log can shut down _safely_ because end-user verification is not interactive. The difficulty of bypassing public logging is based on the difficulty of controlling a threshold of independent witnesses. Witnesses cosign tree heads to make them @@ -178,7 +178,7 @@ End-users interact with the log _indirectly_ via a data publisher. It is the data publisher's job to log signed checksums, distribute necessary proofs of logging, and monitor the log. -### A peak into the details +### A peek into the details Our bird's view introduction skipped many details that matter in practise. 
Some of these details are presented here using a question-answer format. A question-answer format is helpful because it is easily modified and extended. -- cgit v1.2.3 From 984f73e11ea1000b3af4f36199f591450afca2af Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Fri, 30 Apr 2021 14:15:50 +0200 Subject: clarified why domain hint is not in the leaf --- doc/design.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index cb379e5..dda9efe 100644 --- a/doc/design.md +++ b/doc/design.md @@ -132,8 +132,8 @@ ecosystem needs more than one log to be reliable. DNS also has built-in caching that can be influenced by setting TTLs accordingly. The submitter's domain hint is not part of the leaf because key management is -more complex than that. The only service that the log provides is discovery of -signed checksums. Key transparency projects have their own merit. +more complex than that. A separate project should focus on transparent key +management. The scope of our work is transparent _key-usage_. The log will _try_ to incorporate a leaf into the Merkle tree if a logging request is accepted. There are no _promises of public logging_ as in -- cgit v1.2.3 From b78c5a72cd6284b5be3cf4e42fd85b7f16cb0dc4 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Fri, 30 Apr 2021 14:32:10 +0200 Subject: rephrased a complex sentence --- doc/design.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index dda9efe..c7be178 100644 --- a/doc/design.md +++ b/doc/design.md @@ -143,14 +143,14 @@ inclusion promises makes the log less complex. #### Step 3 - distributing proofs of public logging The data publisher is responsible for collecting all cryptographic proofs that -their end-users will need to enforce public logging. It must be possible to -download the following collection (in)directly from the data publisher: -1. 
**Shard hint**: the data publisher's selected shard hint. -2. **Opaque data**: the data publisher's opaque data. +their end-users will need to enforce public logging. The collection below +should be downloadable from the same place that the data is normally hosted. +1. **Opaque data**: the data publisher's opaque data. +2. **Shard hint**: the data publisher's selected shard hint. 3. **Signature**: the data publisher's leaf signature. -5. **Cosigned tree head**: the log's tree head and a _list of signatures_ that +4. **Cosigned tree head**: the log's tree head and a _list of signatures_ that state it is consistent with prior history. -6. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree +5. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree head in question. The public verification key is known. Therefore, the first three fields are -- cgit v1.2.3 From 6de2935d3a6589d35a6e7a59c56c5a67313f3ccb Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Fri, 30 Apr 2021 14:34:38 +0200 Subject: minor edit --- doc/design.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index c7be178..0aa83f2 100644 --- a/doc/design.md +++ b/doc/design.md @@ -186,7 +186,7 @@ question-answer format is helpful because it is easily modified and extended. #### What cryptographic primitives are supported? The only supported hash algorithm is SHA256. The only supported signature scheme is Ed25519. Not having any cryptographic agility makes the protocol -simpler and more secure. +less complex and more secure. An immediate follow-up question is how that is supposed to work with existing and future signature tooling. 
The key insight is that _additional tooling is -- cgit v1.2.3 From f649f2715dc6c4c7f45116b83a6347a08d7193b4 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sat, 1 May 2021 15:15:22 +0200 Subject: removed unnecessary parser details in the bird's view --- doc/design.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index 0aa83f2..2836364 100644 --- a/doc/design.md +++ b/doc/design.md @@ -108,10 +108,8 @@ mistake_. #### Step 2 - submitting a logging request The log implements an HTTP(S) API. Input and output is human-readable and uses -percent encoding. We decided to use percent encoding for requests and responses -because it is a simple format that is commonly used on the web. A more complex -parser like JSON is not needed if the exchanged data structures are basic -enough. +a simple key-value format. A more complex parser like JSON is not needed +because the exchanged data structures are basic enough. The data publisher submits their shard hint, checksum, signature, and public verification key as key-value pairs. The log will use the public verification -- cgit v1.2.3 From e61bd2fb0e845eeef11b1825fdbc5e5c52fb2ec5 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sat, 1 May 2021 19:39:45 +0200 Subject: added context regarding the supported cryptographic primitives --- doc/design.md | 49 ++++++++++++++++++++++++++++--------------------- 1 file changed, 28 insertions(+), 21 deletions(-) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index 2836364..91de288 100644 --- a/doc/design.md +++ b/doc/design.md @@ -183,32 +183,39 @@ question-answer format is helpful because it is easily modified and extended. #### What cryptographic primitives are supported? The only supported hash algorithm is SHA256. The only supported signature -scheme is Ed25519. Not having any cryptographic agility makes the protocol -less complex and more secure. 
- -An immediate follow-up question is how that is supposed to work with existing -and future signature tooling. The key insight is that _additional tooling is -already required to verify Merkle tree proofs. That tooling should use SHA256. -That tooling should also verify all Ed25519 signatures that logs, witnesses, and -data publishers create_. - -For example, suppose that an ecosystem uses `gpg` which has its own incompatible -signature format and algorithms. The data publisher could _cross-sign_ using -Ed25519 as follows: -1. Sign the opaque data as you normally would with `gpg`. +scheme is Ed25519. Not having any cryptographic agility makes the protocol less +complex and more secure. + +We can be cryptographically opinionated because of a key insight. Existing +signature tools like `gpg`, `ssh-keygen -Y`, and `signify` cannot verify proofs +of public logging. Therefore, _additional tooling must already be installed by +end-users_. That tooling should verify hashes using the log's hash function. +That tooling should also verify signatures using the log's signature scheme. +Signed messages include tree heads as well as tree leaves. + +#### Why not let the data publisher pick their own signature scheme and format? +Agility introduces complexity and difficult policy questions. For example, +which algorithms and formats should (not) be supported and why? Picking Ed25519 +is a current best practise that should be encouraged if possible. + +There is not much we can do if a data publisher _refuses_ to rely on the log's +hash function or signature scheme. + +#### What if the data publisher must use a specific signature scheme or format? +You may _cross-sign_ the data as follows. +1. Sign the opaque data as you normally would. 2. Hash the opaque data and use that as the leaf's checksum. Sign the leaf -using Ed25519. +using the log's signature scheme. -First the end-user verifies that the `gpg` signature is valid. This is the -old verification process. 
Then the end-user uses the additional tooling to -verify proofs of logging, which involves SHA256 hashing and Ed25519 signatures. - -The downside is that the data publisher may need to manage an Ed25519 key _as -well_. TODO: motivate why that is a suboptimal but worth-while trade-off. +First the end-user verifies that the normal signature is valid. Then the +end-user lets the additional tooling (that is already required) verify the rest. +Cross-signing should be a relatively comfortable upgrade path that is backwards +compatible. The downside is that the data publisher may need to manage an +additional key-pair. #### What (de)serialization parsers are needed? -#### Why witness cosigning? #### What policy should be used? +#### Why witness cosigning? #### TODO Add more key questions and answers. -- cgit v1.2.3 From 16eed32e779f2fef850c084cb2631898dddcc5dc Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sat, 1 May 2021 19:46:52 +0200 Subject: added q/a topics --- doc/design.md | 3 +++ 1 file changed, 3 insertions(+) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index 91de288..22bfab0 100644 --- a/doc/design.md +++ b/doc/design.md @@ -218,6 +218,9 @@ additional key-pair. #### Why witness cosigning? #### TODO Add more key questions and answers. +- Log spamming +- Log poisoning +- Why we removed identifier field from the leaf ## Concluding remarks Example of binary transparency and reproducible builds. -- cgit v1.2.3 From 8f76216554d83cf45094686f6a43f757d2c186fe Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Mon, 3 May 2021 10:47:57 +0200 Subject: added detail that needs to be explained --- doc/design.md | 1 + 1 file changed, 1 insertion(+) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index 22bfab0..bd24878 100644 --- a/doc/design.md +++ b/doc/design.md @@ -221,6 +221,7 @@ Add more key questions and answers. 
- Log spamming - Log poisoning - Why we removed identifier field from the leaf +- Explain `latest`, `stable` and `cosigned` tree head. ## Concluding remarks Example of binary transparency and reproducible builds. -- cgit v1.2.3 From 57a600662e98f86fc103f2671a5ec9602e1b7dd0 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Mon, 3 May 2021 15:38:51 +0200 Subject: Incorporate changes from recent discussions. Remove all RSA support. Motivation: Simpler format for tree_leaf. Replace percent-encoding with headers for indata and key/value in body for outdata. Motivation: ':' is exactly what we want and it works for output data (responses) and not only for input data (requests). Don't POST. Motivation: We don't need the complexity of POST since we don't ever send a lot of data to the log. Split up the get-signed-tree-head endpoint into three separate without input data. Motivation: More explicit API plus easier debugging. Change timestamps and shard hints to use seconds rather than milliseconds. Motivation: time(1) and time(2). --- doc/api.md | 190 ++++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 105 insertions(+), 85 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 174d2c9..2d54001 100644 --- a/doc/api.md +++ b/doc/api.md @@ -1,25 +1,29 @@ # System Transparency Logging: API v0 This document describes details of the System Transparency logging API, version 0. The broader picture is not explained here. We assume that you have -read the System Transparency design document. It can be found [here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). +read the System Transparency Logging design document. It can be found [here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). **Warning.** This is a work-in-progress document that may be moved or modified. ## Overview The log implements an HTTP(S) API: -- Requests that add data to the log use the HTTP POST method. 
-- Request that retrieve data from the log use the HTTP GET method. -- The HTTP content type is `application/x-www-form-urlencoded` for requests and -responses. This means that all input and output are expressed as key-value -pairs. Binary data must be hex-encoded. - -We decided to use percent encoding for requests and responses because it is a -_simple format_ that is commonly used on the web. We are not using percent -encoding for signed and/or logged data. In other words, a submitter may -distribute log responses to their end-users in a different format that suit -them. The forced (de)serialization parser on _end-users_ is a small subset of -Trunnel. Trunnel is an "idiot-proof" wire-format that the Tor project uses. + +- Requests to the log use the HTTP GET method. +- Input data (in requests) and output data (in responses) are + expressed as ASCII-encoded key/value pairs. +- Requests use HTTP request headers for input data while responses use + the HTTP message body for output data. +- Binary data is hex-encoded before being transmitted. + +The motivation for using a text based key/value format for request and +response data is that it's simple to parse. Note that this format is not being +used for the serialization of signed or logged data, where a more +well defined and storage efficient format is desirable. +A submitter may distribute log responses to their end-users in any +format that suits them. The (de)serialization required for +_end-users_ is a small subset of Trunnel. Trunnel is an "idiot-proof" +wire-format in use by the Tor project. ## Primitives ### Cryptography @@ -32,19 +36,14 @@ All other parts that are not Merkle tree related also use SHA256 as the hash function. Using more than one hash function would increases the overall attack surface: two hash functions must be collision resistant instead of one. -We recommend that submitters sign using Ed25519. 
We also support RSA with -[deterministic](https://tools.ietf.org/html/rfc8017#section-8.2) -or [probabilistic](https://tools.ietf.org/html/rfc8017#section-8.1) -padding. Supporting RSA is suboptimal, but excluding it would make the log -useless for many possible adopters. - ### Serialization -Log requests and responses are percent encoded. Percent encoding is a smaller -dependency than an alternative parser like JSON. It is comparable to rolling -your own minimalistic line-terminated format. Some input and output data is -binary: cryptographic hashes and signatures. Binary data must be expressed as -hex before percent-encoding it. We decided to use hex as opposed to base64 -because it is simpler, favoring simplicity over efficiency on the wire. +Log requests and responses are transmitted as ASCII-encoded key/value +pairs, for a smaller dependency than an alternative parser like JSON. +Some input and output data is binary: cryptographic hashes and +signatures. Binary data must be Base16-encoded, also known as hex +encoding. Using hex as opposed to base64 is motivated by it being +simpler, favoring ease of decoding and encoding over efficiency on the +wire. We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) to define (de)serialization of data structures that need to be signed or @@ -57,9 +56,9 @@ can be generated in C and Go. A fair summary of our Trunnel usage is as follows. -All integers are 64-bit, unsigned, and in network byte order. A fixed-size byte -array is put into the serialization buffer in-order, starting from the first -byte. A variable length byte array first declares its length as an integer, +All integers are 64-bit, unsigned, and in network byte order. Fixed-size byte +arrays are put into the serialization buffer in-order, starting from the first +byte. 
Variable length byte arrays first declare their length as an integer, which is then followed by that number of bytes. These basic types are concatenated to form a collection. You should not need a general-purpose Trunnel (de)serialization parser to work with this format. If you have one, you @@ -70,7 +69,7 @@ format explicit and unambiguous. Tree heads are signed by the log and its witnesses. It contains a timestamp, a tree size, and a root hash. The timestamp is included so that monitors can ensure _liveliness_. It is the time since the UNIX epoch (January 1, 1970 -00:00:00 UTC) in milliseconds. The tree size specifies the current number of +00:00:00 UTC) in seconds. The tree size specifies the current number of leaves. The root hash fixes the structure and content of the Merkle tree. ``` @@ -86,50 +85,25 @@ cosign a tree head if it is inconsistent with prior history or if the timestamp is backdated or future-dated more than 12 hours. #### Merkle tree leaf -The log supports a single leaf type. It contains a message, a signature scheme, -a signature that the submitter computed over the message, and a hash of the +The log supports a single leaf type. It contains a shard hint, a checksum over whatever the submitter wants to log a checksum for, +a signature that the submitter computed over the shard hint and the checksum, and a hash of the public verification key that can be used to verify the signature. 
``` -const SIGNATURE_SCHEME_ED25519 = 1; // RFC 8032 -const SIGNATURE_SCHEME_RSASSA_PKCS1_V1_5 = 2; // RFC 8017 -const SIGNATURE_SCHEME_RSASSA_PSS = 3; // RFC 8017 - -struct signature_ed25519 { - u8 signature[32]; -}; - -struct signature_rsassa { - u64 num_bytes IN [ 256, 384, 512 ]; - u8 signature[num_bytes]; -}; - -struct message { +struct tree_leaf { u64 shard_hint; u8 checksum[32]; -}; - -struct tree_leaf { - struct message message; - u64 signature_scheme IN [ - SIGNATURE_SCHEME_ED25519, - SIGNATURE_SCHEME_RSASSA_PKCS1_V1_5, - SIGNATURE_SCHEME_RSASSA_PSS, - ]; - union signature[signature_scheme] { - SIGNATURE_SCHEME_ED25519: struct signature_ed25519 ed25519; - default: struct signature_rsassa rsassa; - } + u8 signature[32]; u8 key_hash[32]; } ``` -Unlike X.509 certificates that already have validity ranges, a checksum does not -have any such information. Therefore, we require that the submitter selects a +Unlike X.509 certificates which already have validity ranges, a checksum does not +carry any such information. Therefore, we require that the submitter selects a _shard hint_. The selected shard hint must be in the log's _shard interval_. A shard interval is defined by a start time and an end time. Both ends of the -shard interval are inclusive and expressed as the number of milliseconds since -the UNIX epoch (January 1, 1970 00:00:00 UTC). +shard interval are inclusive and expressed as the number of seconds since +the UNIX epoch (January 1, 1970 00:00 UTC). Sharding simplifies log operations because it becomes explicit when a log can be shutdown. A log must only accept logging requests that have valid shard hints. @@ -143,11 +117,15 @@ shard into a newer one. Not only would that defeat the purpose of sharding, but it would also become a potential denial-of-service vector. The signed message is composed of the selected shard hint and the submitter's -checksum. 
It must be possible to verify the signature using the specified -signature scheme and the submitter's public verification key. +checksum. It must be possible to verify the signature using the +submitter's public verification key. -A key-hash is included in the leaf so that it can be attributed to the signing -entity. A hash, rather than the full public verification key, is used to force +Note that the way `shard_hint` and `chekcsum` are serialized with +regards to signing differs from how they're being transmitted to the +log. + +A key hash is included in the leaf so that the leaf can be attributed to the +submitter. A hash, rather than the full public verification key, is used to motivate the verifier to locate the appropriate key and make an explicit trust decision. ## Public endpoints @@ -162,32 +140,76 @@ human-readable value that describes what went wrong. For example, `error=invalid+signature`, `error=rate+limit+exceeded`, or `error=unknown+leaf+hash`. -### get-signed-tree-head +### get-tree-head-cosigned +Returns the latest cosigned tree head. Used by ordinary users of the log. + ``` -GET /st/v0/get-signed-tree-head +GET /st/v0/get-tree-head-cosigned ``` Input: -- "type": either the string "latest", "stable", or "cosigned". - - latest: ask for the most recent signed tree head. - - stable: ask for a recent signed tree head that is fixed for some period - of time. - - cosigned: ask for a recent cosigned tree head. +- None Output on success: -- "timestamp": `tree_head.timestamp` as a human-readable number. -- "tree_size": `tree_head.tree_size` as a human-readable number. -- "root_hash": `tree_head.root_hash` in hex. -- "signature": an Ed25519 signature over `tree_head`. The result is -hex-encoded. +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. +- "root_hash": `tree_head.root_hash` hex-encoded. 
+- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. - "key_hash": a hash of the public verification key that can be used to verify -the signature. The public verification key is serialized as in RFC 8032, then -hashed using SHA256. The result is hex-encoded. +the signature. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then +hashed using SHA256. The hash value is hex-encoded. The "signature" and "key_hash" fields may repeat. The first signature corresponds to the first key hash, the second signature corresponds to the second key hash, etc. The number of signatures and key hashes must match. +### get-tree-head-to-sign +Returns the latest tree head to be signed by log witnesses. Used by +witnesses. + +``` +GET /st/v0/get-tree-head-to-sign +``` + +Input: +- None + +Output on success: +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. +- "root_hash": `tree_head.root_hash` hex-encoded. +- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. +- "key_hash": a hash of the public verification key that can be used to verify +the signature. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then +hashed using SHA256. The hash value is hex-encoded. + +There is exactly one `signature` and one `key_hash` field. The +`key_hash` refers to the log's signing key. + + +### get-tree-head-latest +Returns the latest tree head, signed only by the log. Used for debug. + +``` +GET /st/v0/get-tree-head-latest +``` + +Input: +- None + +Output on success: +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. 
+- "root_hash": `tree_head.root_hash` hex-encoded. +- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. +- "key_hash": a hash of the public verification key that can be used to verify +the signature. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then +hashed using SHA256. The hash value is hex-encoded. + +There is exactly one `signature` and one `key_hash` field. The +`key_hash` refers to the log's signing key. + + ### get-proof-by-hash ``` POST /st/v0/get-proof-by-hash @@ -260,11 +282,9 @@ POST /st/v0/add-leaf ``` Input: -- "shard_hint": human-readable number in the log's shard interval that the +- "shard_hint": human-readable decimal number in the log's shard interval that the submitter selected. -- "checksum": the cryptographic checksum that the submitter wants to log in hex. -- "signature_scheme": human-readable number that identifies the submitter's -signature scheme. +- "checksum": the cryptographic checksum that the submitter wants to log in hex. note: fixed length 64 bytes, validated by the server somehow - "signature": the submitter's signature over `tree_leaf.message`. The result is hex-encoded. - "verification_key": the submitter's public verification key. It is serialized @@ -314,6 +334,6 @@ log's tree head signatures. - **Log identifier**: the hashed public verification key using SHA256. - **Shard interval**: the time during which the log accepts logging requests. The shard interval's start and end are inclusive and expressed as the number of -milliseconds since the UNIX epoch. +seconds since the UNIX epoch. - **Base URL**: where the log can be reached over HTTP(S). It is the prefix before a version-0 specific endpoint. 
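
Illustration (not part of the patch above): the serialization rules in this revision of `doc/api.md` can be sketched in a few lines. The sketch assumes that the signed message is exactly the 64-bit `shard_hint` in network byte order followed by the 32-byte `checksum`, as the Trunnel summary suggests, and that a key hash is SHA256 over the 32-byte RFC 8032 public key encoding. The function names `serialize_message`, `key_hash`, and `in_shard_interval` are hypothetical and do not come from the stfe code base.

```python
import hashlib
import struct


def serialize_message(shard_hint: int, checksum: bytes) -> bytes:
    """Serialize the signed message (assumed layout): u64 big-endian, then 32 bytes."""
    if len(checksum) != 32:
        raise ValueError("checksum must be exactly 32 bytes")
    # ">Q" is an unsigned 64-bit integer in network (big-endian) byte order.
    return struct.pack(">Q", shard_hint) + checksum


def key_hash(public_key: bytes) -> str:
    """Return the hex-encoded SHA256 hash of a 32-byte Ed25519 public key."""
    if len(public_key) != 32:
        raise ValueError("an RFC 8032 Ed25519 public key is 32 bytes")
    return hashlib.sha256(public_key).hexdigest()


def in_shard_interval(shard_hint: int, start: int, end: int) -> bool:
    """Check a shard hint against an interval; both ends are inclusive (seconds)."""
    return start <= shard_hint <= end
```

A submitter would sign the output of `serialize_message` with its Ed25519 key and hex-encode the binary values before sending them to the log. Signing itself is left out because it needs a third-party library.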
-- cgit v1.2.3 From e7bd2f29e7226e39bee7d0a1b89965ef5bdf5dc2 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Mon, 3 May 2021 22:48:17 +0200 Subject: added q/a topic --- doc/design.md | 1 + 1 file changed, 1 insertion(+) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index bd24878..4c764e3 100644 --- a/doc/design.md +++ b/doc/design.md @@ -222,6 +222,7 @@ Add more key questions and answers. - Log poisoning - Why we removed identifier field from the leaf - Explain `latest`, `stable` and `cosigned` tree head. +- Privacy aspects ## Concluding remarks Example of binary transparency and reproducible builds. -- cgit v1.2.3 From c82c4e1266c5e8fbe08cb0f6140caea2723ef205 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 14:51:36 +0200 Subject: be explicit with key type; define struct message, for tree_leaf Specify who's verification key -- log, witness or submitter. Move shard_hint and checksum in tree_leaf into its own struct, for a more explicit definition of what to be signed. --- doc/api.md | 59 ++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 34 insertions(+), 25 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 2d54001..0fa445a 100644 --- a/doc/api.md +++ b/doc/api.md @@ -87,14 +87,18 @@ is backdated or future-dated more than 12 hours. #### Merkle tree leaf The log supports a single leaf type. It contains a shard hint, a checksum over whatever the submitter wants to log a checksum for, a signature that the submitter computed over the shard hint and the checksum, and a hash of the -public verification key that can be used to verify the signature. +submitter's public verification key, that can be used to verify the signature. 
``` +struct message { + u64 shard_hint; + u8 checksum[32]; +}; + struct tree_leaf { - u64 shard_hint; - u8 checksum[32]; - u8 signature[32]; - u8 key_hash[32]; + struct message; + u8 signature_over_message[32]; + u8 key_hash[32]; } ``` @@ -116,17 +120,20 @@ Without a shard hint, the good Samaritan could log all leaves from an earlier shard into a newer one. Not only would that defeat the purpose of sharding, but it would also become a potential denial-of-service vector. -The signed message is composed of the selected shard hint and the submitter's -checksum. It must be possible to verify the signature using the -submitter's public verification key. +The signed message is composed of the chosen `shard_hint` and the +submitter's `checksum`. It must be possible to verify +`signature_over_message` using the submitter's public verification +key. -Note that the way `shard_hint` and `chekcsum` are serialized with +Note that the way `shard_hint` and `checksum` are serialized with regards to signing differs from how they're being transmitted to the log. -A key hash is included in the leaf so that the leaf can be attributed to the -submitter. A hash, rather than the full public verification key, is used to motivate -the verifier to locate the appropriate key and make an explicit trust decision. +A `key_hash` of the key used for signing `message` is included in +`tree_leaf` so that the leaf can be attributed to the submitter. A +hash, rather than the full public key, is used to motivate the +verifier to locate the appropriate key and make an explicit trust +decision. ## Public endpoints Every log has a base URL that identifies it uniquely. The only constraint is @@ -155,8 +162,8 @@ Output on success: - "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. - "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. 
-- "key_hash": a hash of the public verification key that can be used to verify -the signature. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then +- "key_hash": a hash of the public verification key (belonging to either the log or to one of its witnesses), which can be used to verify +the most recent `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then hashed using SHA256. The hash value is hex-encoded. The "signature" and "key_hash" fields may repeat. The first signature @@ -179,16 +186,16 @@ Output on success: - "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. - "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. -- "key_hash": a hash of the public verification key that can be used to verify -the signature. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then +- "key_hash": a hash of the log's public verification key, which can be used to verify +`signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then hashed using SHA256. The hash value is hex-encoded. There is exactly one `signature` and one `key_hash` field. The -`key_hash` refers to the log's signing key. +`key_hash` refers to the log's public verification key. ### get-tree-head-latest -Returns the latest tree head, signed only by the log. Used for debug. +Returns the latest tree head, signed only by the log. Used for debugging purposes. ``` GET /st/v0/get-tree-head-latest @@ -202,12 +209,13 @@ Output on success: - "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. 
- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. -- "key_hash": a hash of the public verification key that can be used to verify -the signature. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then -hashed using SHA256. The hash value is hex-encoded. +- "key_hash": a hash of the log's public verification key that can be +used to verify `signature`. The key is encoded as defined in +[RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), +and then hashed using SHA256. The hash value is hex-encoded. There is exactly one `signature` and one `key_hash` field. The -`key_hash` refers to the log's signing key. +`key_hash` refers to the log's public verification key. ### get-proof-by-hash @@ -317,9 +325,10 @@ POST /st/v0/add-cosignature Input: - "signature": an Ed25519 signature over `tree_head`. The result is hex-encoded. -- "key_hash": a hash of the public verification key that can be used to verify -the signature. The public verification key is serialized as in RFC 8032, then -hashed using SHA256. The result is hex-encoded. +- "key_hash": a hash of the witness' public verification key that can be used +to verify the signature. The key is encoded as defined in [RFC 8032, +section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and +then hashed using SHA256. The hash value is hex-encoded. Output on success: - None -- cgit v1.2.3 From a30eb85272010ffbcfd3fb2c6932dc2f15d596c1 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 14:53:00 +0200 Subject: get rid of the underspecified term "ordinary users" --- doc/api.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 0fa445a..976c167 100644 --- a/doc/api.md +++ b/doc/api.md @@ -148,7 +148,8 @@ human-readable value that describes what went wrong. 
For example, `error=unknown+leaf+hash`. ### get-tree-head-cosigned -Returns the latest cosigned tree head. Used by ordinary users of the log. +Returns the latest cosigned tree head. Used together with +`get-proof-by-hash` and `get-consistency-proof` for verifying the log. ``` GET /st/v0/get-tree-head-cosigned -- cgit v1.2.3 From c163da97d32a291a1c913800c926d7758c641c6e Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 14:53:36 +0200 Subject: specify serialization of key --- doc/api.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 976c167..362dc46 100644 --- a/doc/api.md +++ b/doc/api.md @@ -296,8 +296,7 @@ submitter selected. - "checksum": the cryptographic checksum that the submitter wants to log in hex. note: fixed length 64 bytes, validated by the server somehow - "signature": the submitter's signature over `tree_leaf.message`. The result is hex-encoded. -- "verification_key": the submitter's public verification key. It is serialized -as described in the corresponding RFC. The result is hex-encoded. +- "verification_key": the submitter's public verification key. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2). The result is hex-encoded. - "domain_hint": a domain name that indicates where `tree_leaf.key_hash` can be retrieved as a DNS TXT resource record in hex. -- cgit v1.2.3 From 9ee06539685bdcaea84b3daede5354d83264c1e4 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 16:08:22 +0200 Subject: explain how input and output data are sent This is the "header in, body out" idea written up. We might change to a "POST body in, receive body out" scheme with "Content-Type: application/stfe" if we can decide that POST is not a terrible idea after all. 
--- doc/api.md | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 362dc46..c747aa2 100644 --- a/doc/api.md +++ b/doc/api.md @@ -141,6 +141,16 @@ that it must be a valid HTTP(S) URL that can have the `/st/v0/` suffix appended. For example, a complete endpoint URL could be `https://log.example.com/2021/st/v0/get-signed-tree-head`. +Input data (in requests) is sent as ASCII key/value pairs as HTTP +entity headers, with their keys prefixed with the string +`stlog-`. Example: For sending `treee_size=4711` as input a client +would send the HTTP header `stlog-tree_size: 4711`. + +Output data (in replies) is sent in the HTTP message body in the same +format as the input data, i.e. as ASCII key/value pairs on the format +`Key: Value`. Example: For sending `tree_size=4711` as output a log +would send an HTTP message body consisting of `stlog-tree_size: 4711`. + The HTTP status code is 200 OK to indicate success. A different HTTP status code is used to indicate failure. The log should set the "error" key to a human-readable value that describes what went wrong. For example, -- cgit v1.2.3 From 044dc540aaf3950695602e849906969aed7e6a46 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 16:15:59 +0200 Subject: be consistent with "request" vs "entity" headers --- doc/api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index c747aa2..638b753 100644 --- a/doc/api.md +++ b/doc/api.md @@ -12,7 +12,7 @@ The log implements an HTTP(S) API: - Requests to the log use the HTTP GET method. - Input data (in requests) and output data (in responses) are expressed as ASCII-encoded key/value pairs. -- Requests use HTTP request headers for input data while responses use +- Requests use HTTP entity headers for input data while responses use the HTTP message body for output data. - Binary data is hex-encoded before being transmitted. 
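
Illustration (not part of the patches above): the "header in, body out" scheme could be handled roughly as follows. This is a minimal sketch under assumptions: the helper names `request_headers` and `parse_body` are hypothetical, body lines are taken to be `Key: Value` pairs separated by newlines, and a leading `stlog-` prefix on a body key is stripped if present, since the patch text shows it in the output example without stating whether it is required. Repeated keys such as `signature` and `key_hash` are collected in order.

```python
from collections import defaultdict

HEADER_PREFIX = "stlog-"  # prefix described in the patch for input headers


def request_headers(params: dict) -> dict:
    """Turn input key/value pairs into prefixed HTTP header fields."""
    return {HEADER_PREFIX + key: str(value) for key, value in params.items()}


def parse_body(body: str) -> dict:
    """Parse 'Key: Value' lines into a dict mapping each key to a list of values."""
    fields = defaultdict(list)
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate blank lines (assumption)
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        key = key.strip()
        if key.startswith(HEADER_PREFIX):
            key = key[len(HEADER_PREFIX):]  # assumption: prefix is optional in bodies
        fields[key].append(value.strip())
    return dict(fields)
```

Keeping every value in a list preserves the pairing rule for cosigned tree heads, where the first `signature` corresponds to the first `key_hash`, and so on.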
-- cgit v1.2.3 From 3b5b4429d94e142ee12af7eb5f89b49997b72237 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 16:22:32 +0200 Subject: whitespace changes --- doc/api.md | 323 ++++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 181 insertions(+), 142 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 638b753..5b7cb19 100644 --- a/doc/api.md +++ b/doc/api.md @@ -1,7 +1,9 @@ # System Transparency Logging: API v0 -This document describes details of the System Transparency logging API, -version 0. The broader picture is not explained here. We assume that you have -read the System Transparency Logging design document. It can be found [here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). +This document describes details of the System Transparency logging +API, version 0. The broader picture is not explained here. We assume +that you have read the System Transparency Logging design document. +It can be found +[here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). **Warning.** This is a work-in-progress document that may be moved or modified. @@ -17,24 +19,28 @@ The log implements an HTTP(S) API: - Binary data is hex-encoded before being transmitted. The motivation for using a text based key/value format for request and -response data is that it's simple to parse. Note that this format is not being -used for the serialization of signed or logged data, where a more -well defined and storage efficient format is desirable. -A submitter may distribute log responses to their end-users in any +response data is that it's simple to parse. Note that this format is +not being used for the serialization of signed or logged data, where a +more well defined and storage efficient format is desirable. A +submitter may distribute log responses to their end-users in any format that suits them. The (de)serialization required for _end-users_ is a small subset of Trunnel. 
Trunnel is an "idiot-proof" wire-format in use by the Tor project. ## Primitives ### Cryptography -The log uses the same Merkle tree hash strategy as [RFC 6962, §2](https://tools.ietf.org/html/rfc6962#section-2). -The hash functions must be [SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). -The log must sign tree heads using [Ed25519](https://tools.ietf.org/html/rfc8032). -The log's witnesses must also sign tree heads using Ed25519. - -All other parts that are not Merkle tree related also use SHA256 as the hash -function. Using more than one hash function would increases the overall attack -surface: two hash functions must be collision resistant instead of one. +The log uses the same Merkle tree hash strategy as +[RFC 6962,§2](https://tools.ietf.org/html/rfc6962#section-2). +The hash functions must be +[SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). +The log must sign tree heads using +[Ed25519](https://tools.ietf.org/html/rfc8032). The log's witnesses +must also sign tree heads using Ed25519. + +All other parts that are not Merkle tree related also use SHA256 as +the hash function. Using more than one hash function would increases +the overall attack surface: two hash functions must be collision +resistant instead of one. ### Serialization Log requests and responses are transmitted as ASCII-encoded key/value @@ -45,32 +51,36 @@ encoding. Using hex as opposed to base64 is motivated by it being simpler, favoring ease of decoding and encoding over efficiency on the wire. -We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) +We use the +[Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) to define (de)serialization of data structures that need to be signed or inserted into the Merkle tree. 
Trunnel is more expressive than the [SSH wire format](https://tools.ietf.org/html/rfc4251#section-5). -It is about as expressive as the [TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). -A notable difference is that Trunnel supports integer constraints. The Trunnel -language is also readable by humans _and_ machines. "Obviously correct code" -can be generated in C and Go. +It is about as expressive as the +[TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). +A notable difference is that Trunnel supports integer constraints. +The Trunnel language is also readable by humans _and_ machines. +"Obviously correct code" can be generated in C and Go. A fair summary of our Trunnel usage is as follows. -All integers are 64-bit, unsigned, and in network byte order. Fixed-size byte -arrays are put into the serialization buffer in-order, starting from the first -byte. Variable length byte arrays first declare their length as an integer, -which is then followed by that number of bytes. These basic types are -concatenated to form a collection. You should not need a general-purpose -Trunnel (de)serialization parser to work with this format. If you have one, you -may use it though. The main point of using Trunnel is that it makes a simple -format explicit and unambiguous. +All integers are 64-bit, unsigned, and in network byte order. +Fixed-size byte arrays are put into the serialization buffer in-order, +starting from the first byte. Variable length byte arrays first +declare their length as an integer, which is then followed by that +number of bytes. These basic types are concatenated to form a +collection. You should not need a general-purpose Trunnel +(de)serialization parser to work with this format. If you have one, +you may use it though. The main point of using Trunnel is that it +makes a simple format explicit and unambiguous. #### Merkle tree head -Tree heads are signed by the log and its witnesses. 
It contains a timestamp, a -tree size, and a root hash. The timestamp is included so that monitors can -ensure _liveliness_. It is the time since the UNIX epoch (January 1, 1970 -00:00:00 UTC) in seconds. The tree size specifies the current number of -leaves. The root hash fixes the structure and content of the Merkle tree. +Tree heads are signed by the log and its witnesses. It contains a +timestamp, a tree size, and a root hash. The timestamp is included so +that monitors can ensure _liveliness_. It is the time since the UNIX +epoch (January 1, 1970 00:00:00 UTC) in seconds. The tree size +specifies the current number of leaves. The root hash fixes the +structure and content of the Merkle tree. ``` struct tree_head { @@ -80,14 +90,16 @@ struct tree_head { }; ``` -The serialized tree head must be signed using Ed25519. A witness must not -cosign a tree head if it is inconsistent with prior history or if the timestamp -is backdated or future-dated more than 12 hours. +The serialized tree head must be signed using Ed25519. A witness must +not cosign a tree head if it is inconsistent with prior history or if +the timestamp is backdated or future-dated more than 12 hours. #### Merkle tree leaf -The log supports a single leaf type. It contains a shard hint, a checksum over whatever the submitter wants to log a checksum for, -a signature that the submitter computed over the shard hint and the checksum, and a hash of the -submitter's public verification key, that can be used to verify the signature. +The log supports a single leaf type. It contains a shard hint, a +checksum over whatever the submitter wants to log a checksum for, a +signature that the submitter computed over the shard hint and the +checksum, and a hash of the submitter's public verification key, that +can be used to verify the signature. 
``` struct message { @@ -102,23 +114,26 @@ struct tree_leaf { } ``` -Unlike X.509 certificates which already have validity ranges, a checksum does not -carry any such information. Therefore, we require that the submitter selects a -_shard hint_. The selected shard hint must be in the log's _shard interval_. A -shard interval is defined by a start time and an end time. Both ends of the -shard interval are inclusive and expressed as the number of seconds since -the UNIX epoch (January 1, 1970 00:00 UTC). - -Sharding simplifies log operations because it becomes explicit when a log can be -shutdown. A log must only accept logging requests that have valid shard hints. -A log should only accept logging requests during the predefined shard interval. -Note that _the submitter's shard hint is not a verified timestamp_. The -submitter should set the shard hint as large as possible. If a roughly verified -timestamp is needed, a cosigned tree head can be used. - -Without a shard hint, the good Samaritan could log all leaves from an earlier -shard into a newer one. Not only would that defeat the purpose of sharding, but -it would also become a potential denial-of-service vector. +Unlike X.509 certificates which already have validity ranges, a +checksum does not carry any such information. Therefore, we require +that the submitter selects a _shard hint_. The selected shard hint +must be in the log's _shard interval_. A shard interval is defined by +a start time and an end time. Both ends of the shard interval are +inclusive and expressed as the number of seconds since the UNIX epoch +(January 1, 1970 00:00 UTC). + +Sharding simplifies log operations because it becomes explicit when a +log can be shutdown. A log must only accept logging requests that +have valid shard hints. A log should only accept logging requests +during the predefined shard interval. Note that _the submitter's +shard hint is not a verified timestamp_. 
The submitter should set the +shard hint as large as possible. If a roughly verified timestamp is +needed, a cosigned tree head can be used. + +Without a shard hint, the good Samaritan could log all leaves from an +earlier shard into a newer one. Not only would that defeat the +purpose of sharding, but it would also become a potential +denial-of-service vector. The signed message is composed of the chosen `shard_hint` and the submitter's `checksum`. It must be possible to verify @@ -136,9 +151,10 @@ verifier to locate the appropriate key and make an explicit trust decision. ## Public endpoints -Every log has a base URL that identifies it uniquely. The only constraint is -that it must be a valid HTTP(S) URL that can have the `/st/v0/` suffix -appended. For example, a complete endpoint URL could be +Every log has a base URL that identifies it uniquely. The only +constraint is that it must be a valid HTTP(S) URL that can have the +`/st/v0/` suffix appended. For example, a complete endpoint +URL could be `https://log.example.com/2021/st/v0/get-signed-tree-head`. Input data (in requests) is sent as ASCII key/value pairs as HTTP @@ -151,11 +167,11 @@ format as the input data, i.e. as ASCII key/value pairs on the format `Key: Value`. Example: For sending `tree_size=4711` as output a log would send an HTTP message body consisting of `stlog-tree_size: 4711`. -The HTTP status code is 200 OK to indicate success. A different HTTP status -code is used to indicate failure. The log should set the "error" key to a -human-readable value that describes what went wrong. For example, -`error=invalid+signature`, `error=rate+limit+exceeded`, or -`error=unknown+leaf+hash`. +The HTTP status code is 200 OK to indicate success. A different HTTP +status code is used to indicate failure. The log should set the +"error" key to a human-readable value that describes what went wrong. +For example, `error=invalid+signature`, `error=rate+limit+exceeded`, +or `error=unknown+leaf+hash`. 
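The tree-head endpoints that follow return a signature over the serialized `tree_head` and a hash of the signer's key. A rough Python sketch of both byte strings is given here, using the serialization rules summarized earlier (64-bit unsigned integers in network byte order, fixed-size byte arrays copied in order). The field order `timestamp`, `tree_size`, `root_hash` is an assumption taken from the prose, because the struct body is not visible in this hunk; the function names are hypothetical, and Ed25519 signing and verification are left out.

```python
import hashlib
import struct

HASH_SIZE = 32  # SHA256 output size in bytes


def serialize_tree_head(timestamp, tree_size, root_hash):
    """Serialize a tree head: two big-endian uint64 values, then the hash.

    Assumed field order: timestamp, tree_size, root_hash.
    """
    if len(root_hash) != HASH_SIZE:
        raise ValueError("root_hash must be %d bytes" % HASH_SIZE)
    return struct.pack(">QQ", timestamp, tree_size) + root_hash


def key_hash_hex(verification_key):
    """Hex-encoded SHA256 over an RFC 8032 encoded public key."""
    return hashlib.sha256(verification_key).hexdigest()
```

A client would hex-decode `root_hash` from the response, rebuild these bytes, and pass them to an Ed25519 verifier together with the key whose hash matches `key_hash`.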
### get-tree-head-cosigned Returns the latest cosigned tree head. Used together with @@ -169,17 +185,22 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. - "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. -- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. -- "key_hash": a hash of the public verification key (belonging to either the log or to one of its witnesses), which can be used to verify -the most recent `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then -hashed using SHA256. The hash value is hex-encoded. +- "signature": hex-encoded Ed25519 signature over `tree_head` + serialzed as described in section `Merkle tree head`. +- "key_hash": a hash of the public verification key (belonging to + either the log or to one of its witnesses), which can be used to + verify the most recent `signature`. The key is encoded as defined + in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. The "signature" and "key_hash" fields may repeat. The first signature -corresponds to the first key hash, the second signature corresponds to the -second key hash, etc. The number of signatures and key hashes must match. +corresponds to the first key hash, the second signature corresponds to +the second key hash, etc. The number of signatures and key hashes +must match. ### get-tree-head-to-sign Returns the latest tree head to be signed by log witnesses. 
Used by @@ -193,20 +214,24 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. - "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. -- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. -- "key_hash": a hash of the log's public verification key, which can be used to verify -`signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then -hashed using SHA256. The hash value is hex-encoded. +- "signature": hex-encoded Ed25519 signature over `tree_head` + serialzed as described in section `Merkle tree head`. +- "key_hash": a hash of the log's public verification key, which can + be used to verify `signature`. The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. There is exactly one `signature` and one `key_hash` field. The `key_hash` refers to the log's public verification key. ### get-tree-head-latest -Returns the latest tree head, signed only by the log. Used for debugging purposes. +Returns the latest tree head, signed only by the log. Used for +debugging purposes. ``` GET /st/v0/get-tree-head-latest @@ -216,14 +241,16 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. +- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, + seconds since the UNIX epoch. - "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. - "root_hash": `tree_head.root_hash` hex-encoded. 
-- "signature": hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. +- "signature": hex-encoded Ed25519 signature over `tree_head` + serialzed as described in section `Merkle tree head`. - "key_hash": a hash of the log's public verification key that can be -used to verify `signature`. The key is encoded as defined in -[RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), -and then hashed using SHA256. The hash value is hex-encoded. + used to verify `signature`. The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. There is exactly one `signature` and one `key_hash` field. The `key_hash` refers to the log's public verification key. @@ -235,21 +262,22 @@ POST /st/v0/get-proof-by-hash ``` Input: -- "leaf_hash": a hex-encoded leaf hash that identifies which `tree_leaf` the -log should prove inclusion for. The leaf hash is computed using the RFC 6962 -hashing strategy. In other words, `SHA256(0x00 | tree_leaf)`. -- "tree_size": a human-readable tree size of the tree head that the proof should -be based on. +- "leaf_hash": a hex-encoded leaf hash that identifies which + `tree_leaf` the log should prove inclusion for. The leaf hash is + computed using the RFC 6962 hashing strategy. In other words, + `SHA256(0x00 | tree_leaf)`. +- "tree_size": a human-readable tree size of the tree head that the + proof should be based on. Output on success: - "tree_size": human-readable tree size that the proof is based on. -- "leaf_index": human-readable zero-based index of the leaf that the proof is -based on. +- "leaf_index": human-readable zero-based index of the leaf that the + proof is based on. - "inclusion_path": a node hash in hex. -The "inclusion_path" may be omitted or repeated to represent an inclusion proof -of zero or more node hashes. 
The order of node hashes follow from our hash -strategy, see RFC 6962. +The "inclusion_path" may be omitted or repeated to represent an +inclusion proof of zero or more node hashes. The order of node hashes +follow from our hash strategy, see RFC 6962. ### get-consistency-proof ``` @@ -258,19 +286,19 @@ POST /st/v0/get-consistency-proof Input: - "new_size": human-readable tree size of a newer tree head. -- "old_size": human-readable tree size of an older tree head that the log should -prove is consistent with the newer tree head. +- "old_size": human-readable tree size of an older tree head that the + log should prove is consistent with the newer tree head. Output on success: -- "new_size": human-readable tree size of a newer tree head that the proof -is based on. -- "old_size": human-readable tree size of an older tree head that the proof is -based on. +- "new_size": human-readable tree size of a newer tree head that the + proof is based on. +- "old_size": human-readable tree size of an older tree head that the + proof is based on. - "consistency_path": a node hash in hex. -The "consistency_path" may be omitted or repeated to represent a consistency -proof of zero or more node hashes. The order of node hashes follow from our -hash strategy, see RFC 6962. +The "consistency_path" may be omitted or repeated to represent a +consistency proof of zero or more node hashes. The order of node +hashes follow from our hash strategy, see RFC 6962. ### get-leaves ``` @@ -282,18 +310,21 @@ Input: - "end_size": human-readable index of the last leaf to retrieve. Output on success: -- "shard_hint": `tree_leaf.message.shard_hint` as a human-readable number. +- "shard_hint": `tree_leaf.message.shard_hint` as a human-readable + number. - "checksum": `tree_leaf.message.checksum` in hex. -- "signature_scheme": human-readable number that identifies a signature scheme. +- "signature_scheme": human-readable number that identifies a + signature scheme. 
- "signature": `tree_leaf.signature` in hex. - "key_hash": `tree_leaf.key_hash` in hex. -All fields may be repeated to return more than one leaf. The first value in -each list refers to the first leaf, the second value in each list refers to the -second leaf, etc. The size of each list must match. +All fields may be repeated to return more than one leaf. The first +value in each list refers to the first leaf, the second value in each +list refers to the second leaf, etc. The size of each list must +match. -The log may return fewer leaves than requested. At least one leaf must be -returned on HTTP status code 200 OK. +The log may return fewer leaves than requested. At least one leaf +must be returned on HTTP status code 200 OK. ### add-leaf ``` @@ -301,31 +332,38 @@ POST /st/v0/add-leaf ``` Input: -- "shard_hint": human-readable decimal number in the log's shard interval that the -submitter selected. -- "checksum": the cryptographic checksum that the submitter wants to log in hex. note: fixed length 64 bytes, validated by the server somehow -- "signature": the submitter's signature over `tree_leaf.message`. The result -is hex-encoded. -- "verification_key": the submitter's public verification key. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2). The result is hex-encoded. -- "domain_hint": a domain name that indicates where `tree_leaf.key_hash` can be -retrieved as a DNS TXT resource record in hex. +- "shard_hint": human-readable decimal number in the log's shard + interval that the submitter selected. +- "checksum": the cryptographic checksum that the submitter wants to + log in hex. note: fixed length 64 bytes, validated by the server + somehow +- "signature": the submitter's signature over `tree_leaf.message`. + The result is hex-encoded. +- "verification_key": the submitter's public verification key. 
The + key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2). The result is hex-encoded. +- "domain_hint": a domain name that indicates where + `tree_leaf.key_hash` can be retrieved as a DNS TXT resource record + in hex. Output on success: - None -The submitted entry will not be accepted if the signature is invalid or if the -downloaded verification-key hash does not match. The submitted entry may also -not be accepted if the second-level domain name exceeded its rate limit. By -coupling every add-leaf request with a second-level domain, it becomes more -difficult to spam the log. You would need an excessive number of domain names. -This becomes costly if free domain names are rejected. +The submitted entry will not be accepted if the signature is invalid +or if the downloaded verification-key hash does not match. The +submitted entry may also not be accepted if the second-level domain +name exceeded its rate limit. By coupling every add-leaf request with +a second-level domain, it becomes more difficult to spam the log. You +would need an excessive number of domain names. This becomes costly +if free domain names are rejected. -The log does not publish domain-name to key bindings because key management is -more complex than that. +The log does not publish domain-name to key bindings because key +management is more complex than that. -Public logging should not be assumed until an inclusion proof is available. An -inclusion proof should not be relied upon unless it leads up to a trustworthy -signed tree head. Witness cosigning can make a tree head trustworthy. +Public logging should not be assumed until an inclusion proof is +available. An inclusion proof should not be relied upon unless it +leads up to a trustworthy signed tree head. Witness cosigning can +make a tree head trustworthy. 
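To make "leads up to a trustworthy signed tree head" concrete, the following Python sketch checks an inclusion proof against a root hash. It uses the RFC 6962 hashing strategy (`0x00` prefix for leaves, `0x01` for interior nodes) and the verification steps described in the RFC 6962-bis line of documents; the function names are hypothetical, and the inputs are raw bytes, so hex values from `get-proof-by-hash` would first be decoded with `bytes.fromhex`.

```python
import hashlib


def leaf_hash(serialized_leaf):
    """SHA256(0x00 | tree_leaf), as stated for get-proof-by-hash."""
    return hashlib.sha256(b"\x00" + serialized_leaf).digest()


def node_hash(left, right):
    """Interior node hash in the RFC 6962 strategy."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf_index, tree_size, leaf, path, root_hash):
    """Return True if path proves that leaf is at leaf_index in the tree.

    leaf is a leaf hash, path is the list of node hashes in proof order.
    """
    if leaf_index >= tree_size:
        return False
    fn, sn, r = leaf_index, tree_size - 1, leaf
    for p in path:
        if sn == 0:
            return False  # proof is longer than the tree is deep
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            if not fn & 1:
                # Skip levels where this node has no right sibling.
                while fn & 1 == 0 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```

The root hash passed in should come from a tree head whose signatures have already been checked; the proof by itself says nothing about who vouches for that root.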
### add-cosignature ``` @@ -334,25 +372,26 @@ POST /st/v0/add-cosignature Input: - "signature": an Ed25519 signature over `tree_head`. The result is -hex-encoded. -- "key_hash": a hash of the witness' public verification key that can be used -to verify the signature. The key is encoded as defined in [RFC 8032, -section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and -then hashed using SHA256. The hash value is hex-encoded. + hex-encoded. +- "key_hash": a hash of the witness' public verification key that can + be used to verify the signature. The key is encoded as defined in + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), + and then hashed using SHA256. The hash value is hex-encoded. Output on success: - None -The key-hash can be used to identify which witness signed the log's tree head. -A key-hash, rather than the full verification key, is used to force the verifier -to locate the appropriate key and make an explicit trust decision. +The key-hash can be used to identify which witness signed the log's +tree head. A key-hash, rather than the full verification key, is used +to force the verifier to locate the appropriate key and make an +explicit trust decision. ## Summary of log parameters -- **Public key**: an Ed25519 verification key that can be used to verify the -log's tree head signatures. +- **Public key**: an Ed25519 verification key that can be used to + verify the log's tree head signatures. - **Log identifier**: the hashed public verification key using SHA256. -- **Shard interval**: the time during which the log accepts logging requests. -The shard interval's start and end are inclusive and expressed as the number of -seconds since the UNIX epoch. -- **Base URL**: where the log can be reached over HTTP(S). It is the prefix -before a version-0 specific endpoint. +- **Shard interval**: the time during which the log accepts logging + requests. 
The shard interval's start and end are inclusive and + expressed as the number of seconds since the UNIX epoch. +- **Base URL**: where the log can be reached over HTTP(S). It is the + prefix before a version-0 specific endpoint. -- cgit v1.2.3 From d13da7fd14c9050a70313f00b71955beb4276132 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 16:25:36 +0200 Subject: seconds, not milliseconds --- doc/api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 5b7cb19..c9d3db9 100644 --- a/doc/api.md +++ b/doc/api.md @@ -78,7 +78,7 @@ makes a simple format explicit and unambiguous. Tree heads are signed by the log and its witnesses. It contains a timestamp, a tree size, and a root hash. The timestamp is included so that monitors can ensure _liveliness_. It is the time since the UNIX -epoch (January 1, 1970 00:00:00 UTC) in seconds. The tree size +epoch (January 1, 1970 00:00 UTC) in seconds. The tree size specifies the current number of leaves. The root hash fixes the structure and content of the Merkle tree. -- cgit v1.2.3 From 866320e7cb3f8eee21f464cbc56d518f6eb66c72 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 16:33:01 +0200 Subject: move long description of sharding to the design doc --- doc/api.md | 49 ++++++++++++++----------------------------------- doc/design.md | 22 ++++++++++++++++++++++ 2 files changed, 36 insertions(+), 35 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index c9d3db9..3a595ee 100644 --- a/doc/api.md +++ b/doc/api.md @@ -114,41 +114,20 @@ struct tree_leaf { } ``` -Unlike X.509 certificates which already have validity ranges, a -checksum does not carry any such information. Therefore, we require -that the submitter selects a _shard hint_. The selected shard hint -must be in the log's _shard interval_. A shard interval is defined by -a start time and an end time. 
Both ends of the shard interval are -inclusive and expressed as the number of seconds since the UNIX epoch -(January 1, 1970 00:00 UTC). - -Sharding simplifies log operations because it becomes explicit when a -log can be shutdown. A log must only accept logging requests that -have valid shard hints. A log should only accept logging requests -during the predefined shard interval. Note that _the submitter's -shard hint is not a verified timestamp_. The submitter should set the -shard hint as large as possible. If a roughly verified timestamp is -needed, a cosigned tree head can be used. - -Without a shard hint, the good Samaritan could log all leaves from an -earlier shard into a newer one. Not only would that defeat the -purpose of sharding, but it would also become a potential -denial-of-service vector. - -The signed message is composed of the chosen `shard_hint` and the -submitter's `checksum`. It must be possible to verify -`signature_over_message` using the submitter's public verification -key. - -Note that the way `shard_hint` and `checksum` are serialized with -regards to signing differs from how they're being transmitted to the -log. - -A `key_hash` of the key used for signing `message` is included in -`tree_leaf` so that the leaf can be attributed to the submitter. A -hash, rather than the full public key, is used to motivate the -verifier to locate the appropriate key and make an explicit trust -decision. +`message` is composed of the `shard_hint`, chosen by the submitter to +match the shard interval for the log, and the submitter's `checksum` +to be logged. + +`signature_over_message` is a signature over `message`, using the +submitter's verification key. It must be possible to verify the +signature using the submitter's public verification key, as indicated +by `key_hash`. + +`key_hash` is a hash of the submitter's verification key used for +signing `message`. It is included in `tree_leaf` so that the leaf can +be attributed to the submitter. 
A hash, rather than the full public +key, is used to motivate verifiers to locate the appropriate key and +make an explicit trust decision. ## Public endpoints Every log has a base URL that identifies it uniquely. The only diff --git a/doc/design.md b/doc/design.md index 4c764e3..a840c01 100644 --- a/doc/design.md +++ b/doc/design.md @@ -216,6 +216,28 @@ additional key-pair. #### What (de)serialization parsers are needed? #### What policy should be used? #### Why witness cosigning? +#### Why sharding? +Unlike X.509 certificates which already have validity ranges, a +checksum does not carry any such information. Therefore, we require +that the submitter selects a _shard hint_. The selected shard hint +must be in the log's _shard interval_. A shard interval is defined by +a start time and an end time. Both ends of the shard interval are +inclusive and expressed as the number of seconds since the UNIX epoch +(January 1, 1970 00:00 UTC). + +Sharding simplifies log operations because it becomes explicit when a +log can be shutdown. A log must only accept logging requests that +have valid shard hints. A log should only accept logging requests +during the predefined shard interval. Note that _the submitter's +shard hint is not a verified timestamp_. The submitter should set the +shard hint as large as possible. If a roughly verified timestamp is +needed, a cosigned tree head can be used. + +Without a shard hint, the good Samaritan could log all leaves from an +earlier shard into a newer one. Not only would that defeat the +purpose of sharding, but it would also become a potential +denial-of-service vector. + #### TODO Add more key questions and answers. 
- Log spamming -- cgit v1.2.3 From 78c68e528517f157f29784f9dc87b3246f046e52 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 16:43:31 +0200 Subject: no need for encoding SPACE --- doc/api.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 3a595ee..d75fe6f 100644 --- a/doc/api.md +++ b/doc/api.md @@ -147,10 +147,10 @@ format as the input data, i.e. as ASCII key/value pairs on the format would send an HTTP message body consisting of `stlog-tree_size: 4711`. The HTTP status code is 200 OK to indicate success. A different HTTP -status code is used to indicate failure. The log should set the -"error" key to a human-readable value that describes what went wrong. -For example, `error=invalid+signature`, `error=rate+limit+exceeded`, -or `error=unknown+leaf+hash`. +status code is used to indicate failure. The log should set the +value for the key `error` to a human-readable string describing what +went wrong. For example, `error: invalid signature`, `error: rate +limit exceeded`, or `error: unknown leaf hash`. ### get-tree-head-cosigned Returns the latest cosigned tree head. Used together with -- cgit v1.2.3 From ee4ad9e1e4be9e969c13a12f5e76a2b439077b6e Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 17:19:46 +0200 Subject: another pass over the input and output descriptions Mostly replacing "human-readable" with something more well defined. --- doc/api.md | 132 +++++++++++++++++++++++++++++++------------------------------ 1 file changed, 68 insertions(+), 64 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index d75fe6f..8a46af6 100644 --- a/doc/api.md +++ b/doc/api.md @@ -237,65 +237,69 @@ There is exactly one `signature` and one `key_hash` field.
The ### get-proof-by-hash ``` -POST /st/v0/get-proof-by-hash +GET /st/v0/get-proof-by-hash ``` Input: -- "leaf_hash": a hex-encoded leaf hash that identifies which - `tree_leaf` the log should prove inclusion for. The leaf hash is - computed using the RFC 6962 hashing strategy. In other words, - `SHA256(0x00 | tree_leaf)`. -- "tree_size": a human-readable tree size of the tree head that the - proof should be based on. +- "leaf_hash": leaf hash identifying which `tree_leaf` the log should prove + inclusion of, hex-encoded. +- "tree_size": tree size of the tree head that the proof should be + based on, as an ASCII-encoded decimal number. Output on success: -- "tree_size": human-readable tree size that the proof is based on. -- "leaf_index": human-readable zero-based index of the leaf that the - proof is based on. -- "inclusion_path": a node hash in hex. +- "tree_size": tree size that the proof is based on, as an + ASCII-encoded decimal number. +- "leaf_index": zero-based index of the leaf that the proof is based + on, as an ASCII-encoded decimal number. +- "inclusion_path": node hash, hex-encoded. -The "inclusion_path" may be omitted or repeated to represent an -inclusion proof of zero or more node hashes. The order of node hashes -follow from our hash strategy, see RFC 6962. +The leaf hash is computed using the RFC 6962 hashing strategy. In +other words, `SHA256(0x00 | tree_leaf)`. + +`inclusion_path` may be omitted or repeated to represent an inclusion +proof of zero or more node hashes. The order of node hashes follows +from the hash strategy, see RFC 6962. ### get-consistency-proof ``` -POST /st/v0/get-consistency-proof +GET /st/v0/get-consistency-proof ``` Input: -- "new_size": human-readable tree size of a newer tree head. -- "old_size": human-readable tree size of an older tree head that the - log should prove is consistent with the newer tree head. +- "new_size": tree size of a newer tree head, as an ASCII-encoded + decimal number.
+- "old_size": tree size of an older tree head that the log should + prove is consistent with the newer tree head, as an ASCII-encoded + decimal number. Output on success: -- "new_size": human-readable tree size of a newer tree head that the - proof is based on. -- "old_size": human-readable tree size of an older tree head that the - proof is based on. -- "consistency_path": a node hash in hex. +- "new_size": tree size of the newer tree head that the proof is based + on, as an ASCII-encoded decimal number. +- "old_size": tree size of the older tree head that the proof is based + on, as an ASCII-encoded decimal number. +- "consistency_path": node hash, hex-encoded. -The "consistency_path" may be omitted or repeated to represent a +`consistency_path` may be omitted or repeated to represent a consistency proof of zero or more node hashes. The order of node -hashes follow from our hash strategy, see RFC 6962. +hashes follow from the hash strategy, see RFC 6962. ### get-leaves ``` -POST /st/v0/get-leaves +GET /st/v0/get-leaves ``` Input: -- "start_size": human-readable index of the first leaf to retrieve. -- "end_size": human-readable index of the last leaf to retrieve. +- "start_size": index of the first leaf to retrieve, as an + ASCII-encoded decimal number. +- "end_size": index of the last leaf to retrieve, as an ASCII-encoded + decimal number. Output on success: -- "shard_hint": `tree_leaf.message.shard_hint` as a human-readable - number. -- "checksum": `tree_leaf.message.checksum` in hex. -- "signature_scheme": human-readable number that identifies a - signature scheme. -- "signature": `tree_leaf.signature` in hex. -- "key_hash": `tree_leaf.key_hash` in hex. +- "shard_hint": `tree_leaf.message.shard_hint` as an ASCII-encoded + decimal number. +- "checksum": `tree_leaf.message.checksum`, hex-encoded. +- "signature": `tree_leaf.signature_over_message`, hex-encoded. +- "key_hash": `tree_leaf.key_hash`, hex-encoded. 
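The four `get-leaves` output fields listed above are enough to rebuild a `tree_leaf` and compute the leaf hash `SHA256(0x00 | tree_leaf)` that `get-proof-by-hash` takes as input. A minimal sketch follows. It assumes that `shard_hint` is serialized as a 64-bit unsigned integer in network byte order and that `checksum` is 32 bytes; the patches shown here confirm only the 64-byte signature and the 32-byte key hash. The function names are hypothetical and not part of the log's API.

```python
import hashlib
import struct


def serialize_tree_leaf(shard_hint: int, checksum: bytes,
                        signature: bytes, key_hash: bytes) -> bytes:
    """Concatenate the leaf fields in struct order (assumed layout)."""
    # Assumption: checksum is a 32-byte hash; key_hash is 32 bytes per the struct.
    if len(checksum) != 32 or len(key_hash) != 32:
        raise ValueError("checksum and key_hash must be 32 bytes each")
    if len(signature) != 64:
        raise ValueError("signature_over_message must be 64 bytes")
    # Assumption: shard_hint is a 64-bit unsigned integer, network byte order.
    return struct.pack(">Q", shard_hint) + checksum + signature + key_hash


def leaf_hash(tree_leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA256(0x00 | tree_leaf)."""
    return hashlib.sha256(b"\x00" + tree_leaf).digest()


def leaf_hash_from_fields(shard_hint: str, checksum: str,
                          signature: str, key_hash: str) -> str:
    """Map get-leaves style values (decimal and hex strings) to a hex leaf hash."""
    leaf = serialize_tree_leaf(int(shard_hint), bytes.fromhex(checksum),
                               bytes.fromhex(signature), bytes.fromhex(key_hash))
    return leaf_hash(leaf).hex()
```

The resulting hex string is what a client would pass as `leaf_hash`, provided the assumed serialization matches the log's.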
All fields may be repeated to return more than one leaf. The first value in each list refers to the first leaf, the second value in each @@ -307,31 +311,32 @@ must be returned on HTTP status code 200 OK. ### add-leaf ``` -POST /st/v0/add-leaf +GET /st/v0/add-leaf ``` Input: -- "shard_hint": human-readable decimal number in the log's shard - interval that the submitter selected. +- "shard_hint": number within the log's shard interval as an + ASCII-encoded decimal number. - "checksum": the cryptographic checksum that the submitter wants to - log in hex. note: fixed length 64 bytes, validated by the server - somehow -- "signature": the submitter's signature over `tree_leaf.message`. - The result is hex-encoded. + log, hex-encoded. +- "signature_over_message": the submitter's signature over + `tree_leaf.message`, hex-encoded. - "verification_key": the submitter's public verification key. The key is encoded as defined in - [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2). The result is hex-encoded. -- "domain_hint": a domain name that indicates where - `tree_leaf.key_hash` can be retrieved as a DNS TXT resource record - in hex. + [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2) + and then hex-encoded. +- "domain_hint": domain name indicating where `tree_leaf.key_hash` + can be found as a DNS TXT resource record. Output on success: - None -The submitted entry will not be accepted if the signature is invalid -or if the downloaded verification-key hash does not match. The -submitted entry may also not be accepted if the second-level domain -name exceeded its rate limit. By coupling every add-leaf request with +The submission will not be accepted if `signature_over_message` is +invalid or if the key hash retrieved using `domain_hint` does not +match a hash over `verification_key`. + +The submission may also not be accepted if the second-level domain +name exceeded its rate limit. 
By coupling every add-leaf request to a second-level domain, it becomes more difficult to spam the log. You would need an excessive number of domain names. This becomes costly if free domain names are rejected. @@ -339,31 +344,30 @@ if free domain names are rejected. The log does not publish domain-name to key bindings because key management is more complex than that. -Public logging should not be assumed until an inclusion proof is -available. An inclusion proof should not be relied upon unless it -leads up to a trustworthy signed tree head. Witness cosigning can -make a tree head trustworthy. +Public logging should not be assumed to have happened until an +inclusion proof is available. An inclusion proof should not be relied +upon unless it leads up to a trustworthy signed tree head. Witness +cosigning can make a tree head trustworthy. ### add-cosignature ``` -POST /st/v0/add-cosignature +GET /st/v0/add-cosignature ``` Input: -- "signature": an Ed25519 signature over `tree_head`. The result is - hex-encoded. -- "key_hash": a hash of the witness' public verification key that can - be used to verify the signature. The key is encoded as defined in +- "signature": Ed25519 signature over `tree_head`, hex-encoded. +- "key_hash": hash of the witness' public verification key that can be + used to verify `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), - and then hashed using SHA256. The hash value is hex-encoded. + and then hashed using SHA256. The hash value is hex-encoded. Output on success: - None -The key-hash can be used to identify which witness signed the log's -tree head. A key-hash, rather than the full verification key, is used -to force the verifier to locate the appropriate key and make an -explicit trust decision. +`key_hash` can be used to identify which witness signed the log's tree +head. 
A key-hash, rather than the full verification key, is used to +motivate verifiers to locate the appropriate key and make an explicit +trust decision. ## Summary of log parameters - **Public key**: an Ed25519 verification key that can be used to -- cgit v1.2.3 From 8301e63f91b023e57b2d7c8b11d3dff4f0056aed Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 4 May 2021 17:22:06 +0200 Subject: use backticks for quoting single words I think this is more markdownish. --- doc/api.md | 78 +++++++++++++++++++++++++++++++------------------------------- 1 file changed, 39 insertions(+), 39 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 8a46af6..c6a4569 100644 --- a/doc/api.md +++ b/doc/api.md @@ -164,19 +164,19 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, +- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. -- "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. -- "root_hash": `tree_head.root_hash` hex-encoded. -- "signature": hex-encoded Ed25519 signature over `tree_head` +- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. +- `root_hash`: `tree_head.root_hash` hex-encoded. +- `signature`: hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. -- "key_hash": a hash of the public verification key (belonging to +- `key_hash`: a hash of the public verification key (belonging to either the log or to one of its witnesses), which can be used to verify the most recent `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then hashed using SHA256. The hash value is hex-encoded. -The "signature" and "key_hash" fields may repeat. The first signature +The `signature` and `key_hash` fields may repeat. 
The first signature corresponds to the first key hash, the second signature corresponds to the second key hash, etc. The number of signatures and key hashes must match. @@ -193,13 +193,13 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, +- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. -- "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. -- "root_hash": `tree_head.root_hash` hex-encoded. -- "signature": hex-encoded Ed25519 signature over `tree_head` +- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. +- `root_hash`: `tree_head.root_hash` hex-encoded. +- `signature`: hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. -- "key_hash": a hash of the log's public verification key, which can +- `key_hash`: a hash of the log's public verification key, which can be used to verify `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then hashed using SHA256. The hash value is hex-encoded. @@ -220,13 +220,13 @@ Input: - None Output on success: -- "timestamp": `tree_head.timestamp` ASCII-encoded decimal number, +- `timestamp`: `tree_head.timestamp` ASCII-encoded decimal number, seconds since the UNIX epoch. -- "tree_size": `tree_head.tree_size` ASCII-encoded decimal number. -- "root_hash": `tree_head.root_hash` hex-encoded. -- "signature": hex-encoded Ed25519 signature over `tree_head` +- `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. +- `root_hash`: `tree_head.root_hash` hex-encoded. +- `signature`: hex-encoded Ed25519 signature over `tree_head` serialzed as described in section `Merkle tree head`. -- "key_hash": a hash of the log's public verification key that can be +- `key_hash`: a hash of the log's public verification key that can be used to verify `signature`. 
The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then hashed using SHA256. The hash value is hex-encoded. @@ -241,17 +241,17 @@ GET /st/v0/get-proof-by-hash ``` Input: -- "leaf_hash": leaf identifying which `tree_leaf` the log should prove +- `leaf_hash`: leaf identifying which `tree_leaf` the log should prove inclusion of, hex-encoded. -- "tree_size": tree size of the tree head that the proof should be +- `tree_size`: tree size of the tree head that the proof should be based on, as an ASCII-encoded decimal number. Output on success: -- "tree_size": tree size that the proof is based on, as an +- `tree_size`: tree size that the proof is based on, as an ASCII-encoded decimal number. -- "leaf_index": zero-based index of the leaf that the proof is based +- `leaf_index`: zero-based index of the leaf that the proof is based on, as an ASCII-encoded decimal number. -- "inclusion_path": node hash, hex-encoded. +- `inclusion_path`: node hash, hex-encoded. The leaf hash is computed using the RFC 6962 hashing strategy. In other words, `SHA256(0x00 | tree_leaf)`. @@ -266,18 +266,18 @@ GET /st/v0/get-consistency-proof ``` Input: -- "new_size": tree size of a newer tree head, as an ASCII-encoded +- `new_size`: tree size of a newer tree head, as an ASCII-encoded decimal number. -- "old_size": tree size of an older tree head that the log should +- `old_size`: tree size of an older tree head that the log should prove is consistent with the newer tree head, as an ASCII-encoded decimal number. Output on success: -- "new_size": tree size of the newer tree head that the proof is based +- `new_size`: tree size of the newer tree head that the proof is based on, as an ASCII-encoded decimal number. -- "old_size": tree size of the older tree head that the proof is based +- `old_size`: tree size of the older tree head that the proof is based on, as an ASCII-encoded decimal number. -- "consistency_path": node hash, hex-encoded. 
+- `consistency_path`: node hash, hex-encoded. `consistency_path` may be omitted or repeated to represent a consistency proof of zero or more node hashes. The order of node @@ -289,17 +289,17 @@ GET /st/v0/get-leaves ``` Input: -- "start_size": index of the first leaf to retrieve, as an +- `start_size`: index of the first leaf to retrieve, as an ASCII-encoded decimal number. -- "end_size": index of the last leaf to retrieve, as an ASCII-encoded +- `end_size`: index of the last leaf to retrieve, as an ASCII-encoded decimal number. Output on success: -- "shard_hint": `tree_leaf.message.shard_hint` as an ASCII-encoded +- `shard_hint`: `tree_leaf.message.shard_hint` as an ASCII-encoded decimal number. -- "checksum": `tree_leaf.message.checksum`, hex-encoded. -- "signature": `tree_leaf.signature_over_message`, hex-encoded. -- "key_hash": `tree_leaf.key_hash`, hex-encoded. +- `checksum`: `tree_leaf.message.checksum`, hex-encoded. +- `signature`: `tree_leaf.signature_over_message`, hex-encoded. +- `key_hash`: `tree_leaf.key_hash`, hex-encoded. All fields may be repeated to return more than one leaf. The first value in each list refers to the first leaf, the second value in each @@ -315,17 +315,17 @@ GET /st/v0/add-leaf ``` Input: -- "shard_hint": number within the log's shard interval as an +- `shard_hint`: number within the log's shard interval as an ASCII-encoded decimal number. -- "checksum": the cryptographic checksum that the submitter wants to +- `checksum`: the cryptographic checksum that the submitter wants to log, hex-encoded. -- "signature_over_message": the submitter's signature over +- `signature_over_message`: the submitter's signature over `tree_leaf.message`, hex-encoded. -- "verification_key": the submitter's public verification key. The +- `verification_key`: the submitter's public verification key. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2) and then hex-encoded. 
-- "domain_hint": domain name indicating where `tree_leaf.key_hash` +- `domain_hint`: domain name indicating where `tree_leaf.key_hash` can be found as a DNS TXT resource record. Output on success: @@ -355,8 +355,8 @@ GET /st/v0/add-cosignature ``` Input: -- "signature": Ed25519 signature over `tree_head`, hex-encoded. -- "key_hash": hash of the witness' public verification key that can be +- `signature`: Ed25519 signature over `tree_head`, hex-encoded. +- `key_hash`: hash of the witness' public verification key that can be used to verify `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), and then hashed using SHA256. The hash value is hex-encoded. -- cgit v1.2.3 From 8261776989fd25fbdcf1f0e930c1b3848886ba70 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Wed, 5 May 2021 10:09:35 +0200 Subject: minor wording --- doc/design.md | 58 +++++++++++++++++++++++++++++----------------------------- 1 file changed, 29 insertions(+), 29 deletions(-) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index a840c01..a1a6140 100644 --- a/doc/design.md +++ b/doc/design.md @@ -74,46 +74,46 @@ our design is that this additional tooling makes no outbound connections. The above data flows are thus preserved. ### A bird's view -A central part of any transparency log is the data. The data is stored by the +A central part of any transparency log is the data stored by the log. The data is stored by the leaves of an append-only Merkle tree. Our leaf structure contains four fields: - **shard_hint**: a number that binds the leaf to a particular _shard interval_. Sharding means that the log has a predefined time during which logging requests -will be accepted. Once elapsed, the log can be shut down. +are accepted. Once elapsed, the log can be shut down. - **checksum**: a cryptographic hash of some opaque data. The log never -sees the opaque data; just the hash. 
+sees the opaque data; just the hash made by the data publisher. - **signature**: a digital signature that is computed by the data publisher over the leaf's shard hint and checksum. -- **key_hash**: a cryptographic hash of the public verification key that can be -used to verify the leaf's signature. +- **key_hash**: a cryptographic hash of the data publisher's public verification key that can be +used to verify the signature. #### Step 1 - preparing a logging request The data publisher selects a shard hint and a checksum that should be logged. For example, the shard hint could be "logs that are active during 2021". The -checksum might be a hashed release file or something else. +checksum might be the hash of a release file. -The data publisher signs the selected shard hint and checksum using their secret +The data publisher signs the selected shard hint and checksum using a secret signing key. Both the signed message and the signature is stored in the leaf for anyone to verify. Including a shard hint in the signed message -ensures that the good Samaritan cannot change it to log all leaves from an +ensures that a good Samaritan cannot change it to log all leaves from an earlier shard into a newer one. -The hashed public verification key is also stored in the leaf. This makes it -easy to attribute the leaf to the signing entity. For example, a data publisher +A hash of the public verification key is also stored in the leaf. This makes it +possible to attribute the leaf to the data publisher. For example, a data publisher that monitors the log can look for leaves that match their own key hash(es). -A hash, rather than the full public verification key, is used to force the -verifier to locate the key and trust it explicitly. 
Not disclosing the public -verification key in the leaf makes it more difficult to use an untrusted key _by +A hash, rather than the full public verification key, is used to motivate the +verifier to locate the key and make an explicit trust decision. Not disclosing the public +verification key in the leaf makes it more unlikely that someone would use an untrusted key _by mistake_. #### Step 2 - submitting a logging request The log implements an HTTP(S) API. Input and output is human-readable and uses a simple key-value format. A more complex parser like JSON is not needed -because the exchanged data structures are basic enough. +because the exchanged data structures are primitive enough. The data publisher submits their shard hint, checksum, signature, and public verification key as key-value pairs. The log will use the public verification -key to check that the signature is valid, then hash it to construct the leaf. +key to check that the signature is valid, then hash it to construct the `key_hash` part of the leaf. The data publisher also submits a _domain hint_. The log will download a DNS TXT resource record based on the provided domain name. The downloaded result @@ -126,8 +126,8 @@ Using DNS to combat spam is convenient because many data publishers already have a domain name. A single domain name is also relatively cheap. Another benefit is that the same anti-spam mechanism can be used across several independent logs without coordination. This is important because a healthy log -ecosystem needs more than one log to be reliable. DNS also has built-in -caching that can be influenced by setting TTLs accordingly. +ecosystem needs more than one log in order to be reliable. DNS also has built-in +caching which data publishers can influence by setting TTLs accordingly. The submitter's domain hint is not part of the leaf because key management is more complex than that. A separate project should focus on transparent key @@ -136,26 +136,26 @@ management. 
The scope of our work is transparent _key-usage_. The log will _try_ to incorporate a leaf into the Merkle tree if a logging request is accepted. There are no _promises of public logging_ as in Certificate Transparency. Therefore, the submitter needs to wait for an -inclusion proof before concluding that the request succeeded. Not having +inclusion proof to appear before concluding that the logging request succeeded. Not having inclusion promises makes the log less complex. #### Step 3 - distributing proofs of public logging The data publisher is responsible for collecting all cryptographic proofs that their end-users will need to enforce public logging. The collection below -should be downloadable from the same place that the data is normally hosted. +should be downloadable from the same place that published data is normally hosted. 1. **Opaque data**: the data publisher's opaque data. 2. **Shard hint**: the data publisher's selected shard hint. 3. **Signature**: the data publisher's leaf signature. 4. **Cosigned tree head**: the log's tree head and a _list of signatures_ that state it is consistent with prior history. -5. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree +5. **Inclusion proof**: a proof of inclusion based on the logged leaf and tree head in question. -The public verification key is known. Therefore, the first three fields are +The data publisher's public verification key is known. Therefore, the first three fields are sufficient to reconstruct the logged leaf. The leaf's signature can be verified. The final two fields then prove that the leaf is in the log. If the leaf is included in the log, any monitor can detect that there is a new -signature for a data publisher's public verification key. +signature made by a given data publisher, 's public verification key. The catch is that the proof of logging is only as convincing as the tree head that the inclusion proof leads up to. 
To bypass public logging, the attacker @@ -191,7 +191,7 @@ signature tools like `gpg`, `ssh-keygen -Y`, and `signify` cannot verify proofs of public logging. Therefore, _additional tooling must already be installed by end-users_. That tooling should verify hashes using the log's hash function. That tooling should also verify signatures using the log's signature scheme. -Signed messages include tree heads as well as tree leaves. +Both tree heads and tree leaves are being signed. #### Why not let the data publisher pick their own signature scheme and format? Agility introduces complexity and difficult policy questions. For example, @@ -202,13 +202,13 @@ There is not much we can do if a data publisher _refuses_ to rely on the log's hash function or signature scheme. #### What if the data publisher must use a specific signature scheme or format? -You may _cross-sign_ the data as follows. -1. Sign the opaque data as you normally would. -2. Hash the opaque data and use that as the leaf's checksum. Sign the leaf -using the log's signature scheme. +They may _cross-sign_ the data as follows. +1. Sign the data as they're used to. +2. Hash the data and use the result as the leaf's checksum to be logged. +3. Sign the leaf using the log's signature scheme. -First the end-user verifies that the normal signature is valid. Then the -end-user lets the additional tooling (that is already required) verify the rest. +For verification, the end-user first verifies that the usual signature from step 1 is valid. Then the +end-user uses the additional tooling (which is already required) to verify the rest. Cross-signing should be a relatively comfortable upgrade path that is backwards compatible. The downside is that the data publisher may need to manage an additional key-pair. 
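Steps 2 and 3 of the cross-signing procedure above can be sketched as follows. This is an illustration under stated assumptions, not the project's tooling: the checksum is assumed to be SHA256 over the data, the signed message is assumed to be a 64-bit network-byte-order shard hint followed by the checksum, and `sign_with_log_scheme` is a hypothetical caller-supplied function wrapping an Ed25519 signing key. Step 1, the publisher's usual signature, happens outside this code.

```python
import hashlib
import struct
from typing import Callable, Dict


def prepare_cross_signed_leaf(
    data: bytes,
    shard_hint: int,
    sign_with_log_scheme: Callable[[bytes], bytes],
) -> Dict[str, str]:
    """Build the add-leaf values for data that is already signed elsewhere."""
    # Step 2: hash the data and use the result as the leaf's checksum.
    # Assumption: the log's hash function is SHA256.
    checksum = hashlib.sha256(data).digest()

    # Step 3: sign the leaf message using the log's signature scheme.
    # Assumption: the message is shard_hint (u64, big-endian) followed by checksum.
    message = struct.pack(">Q", shard_hint) + checksum
    signature = sign_with_log_scheme(message)

    # Encode the values the way the add-leaf input expects them:
    # decimal for numbers, hex for byte strings.
    return {
        "shard_hint": str(shard_hint),
        "checksum": checksum.hex(),
        "signature_over_message": signature.hex(),
    }
```

Keeping the signer as a parameter leaves key handling to the caller, which matches the point that the data publisher may need to manage an additional key pair.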
-- cgit v1.2.3 From cd02e6e2bd7e36d8333824e57913d08a56d8a85b Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Wed, 5 May 2021 12:31:04 +0200 Subject: add reminder about another q/a --- doc/design.md | 1 + 1 file changed, 1 insertion(+) (limited to 'doc') diff --git a/doc/design.md b/doc/design.md index a1a6140..2e01a34 100644 --- a/doc/design.md +++ b/doc/design.md @@ -245,6 +245,7 @@ Add more key questions and answers. - Why we removed identifier field from the leaf - Explain `latest`, `stable` and `cosigned` tree head. - Privacy aspects +- How does this whole thing work with more than one log? ## Concluding remarks Example of binary transparency and reproducible builds. -- cgit v1.2.3 From c4a99d20dcbf524f94a018ac712d830e7e655ce2 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Tue, 11 May 2021 11:28:01 +0200 Subject: removed unused schemas --- doc/schema/consistency_proof.schema.json | 30 ------------------- doc/schema/example/consistency_proof.json | 7 ----- doc/schema/example/inclusion_proof.json | 7 ----- doc/schema/example/leaves.json | 14 --------- doc/schema/example/sth.json | 11 ------- doc/schema/inclusion_proof.schema.json | 30 ------------------- doc/schema/leaves.schema.json | 38 ----------------------- doc/schema/sth.schema.json | 50 ------------------------------- 8 files changed, 187 deletions(-) delete mode 100644 doc/schema/consistency_proof.schema.json delete mode 100644 doc/schema/example/consistency_proof.json delete mode 100644 doc/schema/example/inclusion_proof.json delete mode 100644 doc/schema/example/leaves.json delete mode 100644 doc/schema/example/sth.json delete mode 100644 doc/schema/inclusion_proof.schema.json delete mode 100644 doc/schema/leaves.schema.json delete mode 100644 doc/schema/sth.schema.json (limited to 'doc') diff --git a/doc/schema/consistency_proof.schema.json b/doc/schema/consistency_proof.schema.json deleted file mode 100644 index 003f3c7..0000000 --- a/doc/schema/consistency_proof.schema.json +++ /dev/null @@ 
-1,30 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft-07/schema#", - "title": "inclusion_proof", - "description": "JSON-formatted inclusion proof, version 0.", - - "type": "object", - "required": [ "new_size", "old_size", "consistency_proof" ], - "properties": { - "new_size": { - "description": "The tree size of the newer Merkle tree head.", - "type": "integer", - "minimum": 0 - }, - "old_size": { - "description": "The tree size of the older Merkle tree head.", - "type": "integer", - "minimum": 0 - }, - "consistency_proof": { - "description": "A list of base64-encoded node hashes that proves consistency", - "type": "array", - "items": { - "description": "A node hash in base64", - "type": "string", - "minLength": 44, - "maxLength": 44 - } - } - } -} diff --git a/doc/schema/example/consistency_proof.json b/doc/schema/example/consistency_proof.json deleted file mode 100644 index 0a323b7..0000000 --- a/doc/schema/example/consistency_proof.json +++ /dev/null @@ -1,7 +0,0 @@ -{ - "new_size": 2, - "old_size": 1, - "consistency_proof": [ - "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" - ] -} diff --git a/doc/schema/example/inclusion_proof.json b/doc/schema/example/inclusion_proof.json deleted file mode 100644 index d46d426..0000000 --- a/doc/schema/example/inclusion_proof.json +++ /dev/null @@ -1,7 +0,0 @@ -{ - "tree_size": 2, - "leaf_index": 0, - "inclusion_proof": [ - "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" - ] -} diff --git a/doc/schema/example/leaves.json b/doc/schema/example/leaves.json deleted file mode 100644 index 1eed05d..0000000 --- a/doc/schema/example/leaves.json +++ /dev/null @@ -1,14 +0,0 @@ -[ - { - "checksum": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", - "signature_scheme": 1, - "signature": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=", - "key_hash": "DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD=" - }, - { - "checksum": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", - "signature_scheme": 2, - "signature": 
"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=", - "key_hash": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=" - } -] diff --git a/doc/schema/example/sth.json b/doc/schema/example/sth.json deleted file mode 100644 index ec3ad11..0000000 --- a/doc/schema/example/sth.json +++ /dev/null @@ -1,11 +0,0 @@ -{ - "timestamp": 0, - "tree_size": 0, - "root_hash": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", - "signatures": [ - { - "key_hash": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=", - "signature": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=" - } - ] -} diff --git a/doc/schema/inclusion_proof.schema.json b/doc/schema/inclusion_proof.schema.json deleted file mode 100644 index 3309d37..0000000 --- a/doc/schema/inclusion_proof.schema.json +++ /dev/null @@ -1,30 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft-07/schema#", - "title": "inclusion_proof", - "description": "JSON-formatted inclusion proof, version 0.", - - "type": "object", - "required": [ "tree_size", "leaf_index", "inclusion_proof" ], - "properties": { - "tree_size": { - "description": "The Merkle tree size that the inclusion proof is based on.", - "type": "integer", - "minimum": 0 - }, - "leaf_index": { - "description": "The zero-based index of the leaf that the inclusion proof is for.", - "type": "integer", - "minimum": 0 - }, - "inclusion_proof": { - "description": "A 
list of base64-encoded node hashes that proves inclusion", - "type": "array", - "items": { - "description": "A node hash in base64", - "type": "string", - "minLength": 44, - "maxLength": 44 - } - } - } -} diff --git a/doc/schema/leaves.schema.json b/doc/schema/leaves.schema.json deleted file mode 100644 index 74d7454..0000000 --- a/doc/schema/leaves.schema.json +++ /dev/null @@ -1,38 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft-07/schema#", - "title": "list of tree_leaf", - "description": "JSON-formatted tree leaf list, version 0.", - - "type": "array", - "description": "A list Merkle tree leaves", - "items": { - "type": "object", - "required": [ "checksum", "signature_scheme", "signature", "key_hash" ], - "properties": { - "checksum": { - "description": "A cryptographic hash that is computed over some data of opaque type. The result is base64-encoded.", - "type": "string", - "minLength": 44, - "maxLength": 44 - }, - "signature_scheme": { - "description": "An integer that identifies the signature scheme used by the submitter. 
See API documentation.", - "type": "integer", - "enum": [ 1, 2, 3 ] - }, - "signature": { - "description": "The submitter's signature over the checksum in base64", - "type": "string", - "minLength": 44, - "maxLength": 684 - }, - "key_hash": { - "description": "A public verification-key hash that identifies the signer.", - "type": "string", - "minLength": 44, - "maxLength": 44 - } - } - }, - "minItems": 1 -} diff --git a/doc/schema/sth.schema.json b/doc/schema/sth.schema.json deleted file mode 100644 index 86de2d3..0000000 --- a/doc/schema/sth.schema.json +++ /dev/null @@ -1,50 +0,0 @@ -{ - "$schema": "https://json-schema.org/draft-07/schema#", - "title": "signed_tree_head_v0", - "description": "JSON-formatted signed tree head, version 0.", - - "type": "object", - "required": [ "timestamp", "tree_size", "root_hash", "signatures" ], - "properties": { - "timestamp": { - "description": "The number of milliseconds since the UNIX epoch (January 1, 1970 00:00:00 UTC).", - "type": "integer", - "minimum": 0 - }, - "tree_size": { - "description": "The number of entries that are stored in the log's Merkle tree.", - "type": "integer", - "minimum": 0 - }, - "root_hash": { - "description": "The log's Merkle tree root hash in base64.", - "type": "string", - "minLength": 44, - "maxLength": 44 - }, - "signatures": { - "description": "A list of signer-signature pairs.", - "type": "array", - "items": { - "description": "A signer-signature pair.", - "type": "object", - "required": [ "key_hash", "signature" ], - "properties": { - "key_hash": { - "description": "A public verification-key hash that identifies the signer in base64.", - "type": "string", - "minLength": 44, - "maxLength": 44 - }, - "signature": { - "description": "The signer's signature over the log's tree_leaf structure in base64.", - "type": "string", - "minLength": 44, - "maxLength": 44 - } - } - }, - "minItems": 1 - } - } -} -- cgit v1.2.3 From af5b14b9e9f85fe15253fbdb48945a302f0b7bec Mon Sep 17 00:00:00 2001 From: 
Linus Nordberg Date: Tue, 11 May 2021 14:05:31 +0200 Subject: signatures are 64 octets Spotted by Rasmus. --- doc/api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index c6a4569..4f43d2c 100644 --- a/doc/api.md +++ b/doc/api.md @@ -109,7 +109,7 @@ struct message { struct tree_leaf { struct message; - u8 signature_over_message[32]; + u8 signature_over_message[64]; u8 key_hash[32]; } ``` -- cgit v1.2.3 From 6ab06df1cd3dca8f4367ee009dde77a7b2fb79b1 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Wed, 12 May 2021 16:24:05 +0200 Subject: added a first take on claimant model There might be a few inconsistencies and errors. To be discussed! --- doc/claimant.md | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 84 insertions(+) create mode 100644 doc/claimant.md (limited to 'doc') diff --git a/doc/claimant.md b/doc/claimant.md new file mode 100644 index 0000000..2aeebf0 --- /dev/null +++ b/doc/claimant.md @@ -0,0 +1,84 @@ +# Claimant model +## **SystemCHECKSUM**: +SystemCHECKSUM is about the claims made by a _data publisher_. +* **ClaimCHECKSUM**: + _I, data publisher, claim that the data_: + 1. has cryptographic hash X + 2. can be located using X as an identifier + 3. has properties Y (_"ecosystem specific_") +* **StatementCHECKSUM**: signed checksum
+* **ClaimantCHECKSUM**: data publisher
+ The data publisher is a party that wants to publish some data to an
+ end-user.
+* **BelieverCHECKSUM**: end-user
+ Belief is based on seeing a valid StatementCHECKSUM.
+* **VerifierCHECKSUM**: any interested party
+ These parties try to verify the above claims. For example:
+ * the data publisher itself (_"has my identity been compromised?"_)
+ * third-parties that want to look further into the data (_"ecosystem
+ specific_")
+* **ArbiterCHECKSUM**:
+ There's no official body. Invalidated claims would affect reputation.
+
+**Example.**
+The published data could be an executable binary from a reproducible build. The
+ecosystem-specific claim would be that the corresponding source code can be
+looked-up in a public database using X as an identifier. A rebuilder would
+verify this claim by compiling the source, comparing the hashed output to the
+claimed value.
+
+## **SystemCHECKSUM-LOG**:
+SystemCHECKSUM-LOG is about the claims made by a _log operator_.
+It adds _discoverability_ into SystemCHECKSUM. Discoverability means
+that VerifierCHECKSUM can see all StatementCHECKSUM that
+BelieverCHECKSUM will accept.
+
+* **ClaimCHECKSUM-LOG**:
+ _I, log operator, make available:_
+ 1. a globally consistent append-only log of StatementCHECKSUM
+* **StatementCHECKSUM-LOG**: signed tree head
+* **ClaimantCHECKSUM-LOG**: log operator
+ Possible operators might be:
+ * a small subset of data publishers
+ * members of relevant consortia
+* **BelieverCHECKSUM-LOG**:
+ BelieverCHECKSUM and
+ VerifierCHECKSUM
+ Belief is based on two factors:
+ 1. seeing a valid StatementCHECKSUM-LOG
+ 2. seeing a number of valid StatementCHECKSUM-WITNESS from
+ independent instances on SystemCHECKSUM-WITNESS
+
+ A _policy_ defines the exact conditions that must be met.
+* **VerifierCHECKSUM-LOG**: SystemCHECKSUM-WITNESS
+ Witnesses verify the log's append-only property from their own local
+ vantage point(s).
+* **ArbiterCHECKSUM-LOG**:
+ There is no official body. The ecosystem at large should stop using an
+ instance of SystemCHECKSUM-LOG if cryptographic proofs of log
+ misbehavior are preseneted by some VerifierCHECKSUM-LOG.
+
+## **SystemCHECKSUM-WITNESS**:
+SystemCHECKSUM-WITNESS is about making the claims of a log operator
+_trustworthy_.
+* **ClaimCHECKSUM-WITNESS**:
+ _I, witness, claim that_:
+ 1. SystemCHECKSUM-LOG provides a locally consistent append-only
+ log
+* **StatementCHECKSUM-WITNESS**: signed tree head
+* **ClaimantCHECKSUM-WITNESS**: third party
+ Examples of parties that may take on this role include:
+ * members of relevant consortia
+ * non-profits and other reputable organizations
+ * security enthusiasts and researchers
+ * log operators (cross-ecosystem)
+ * monitors (cross-ecosystem)
+ * a small subset of data publishers (cross-ecosystem)
+* **BelieverCHECKSUM-WITNESS**:
+ BelieverCHECKSUM and
+ VerifierCHECKSUM
+ Belief is based on seeing a valid StatementCHECKSUM-WITNESS.
+* **VerifierCHECKSUM-WITNESS**: n/a
+ Witnesses are trusted parties. Security is based on _strength in numbers_.
+* **ArbiterCHECKSUM-WITNESS**:
+ There is no official body. Invalidated claims would affect reputation. -- cgit v1.2.3 From caf91fa52c192c188adb14a81219602628d46d9d Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Wed, 12 May 2021 16:32:18 +0200 Subject: fixed spacing typos --- doc/claimant.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'doc') diff --git a/doc/claimant.md b/doc/claimant.md index 2aeebf0..c10e657 100644 --- a/doc/claimant.md +++ b/doc/claimant.md @@ -27,7 +27,7 @@ looked-up in a public database using X as an identifier. A rebuilder would verify this claim by compiling the source, comparing the hashed output to the claimed value. -## **SystemCHECKSUM-LOG**: +## **SystemCHECKSUM-LOG**: SystemCHECKSUM-LOG is about the claims made by a _log operator_. It adds _discoverability_ into SystemCHECKSUM. Discoverability means that VerifierCHECKSUM can see all StatementCHECKSUM that @@ -47,9 +47,7 @@ BelieverCHECKSUM will accept. Belief is based on two factors: 1. seeing a valid StatementCHECKSUM-LOG 2. seeing a number of valid StatementCHECKSUM-WITNESS from - independent instances on SystemCHECKSUM-WITNESS - - A _policy_ defines the exact conditions that must be met. + independent instances on SystemCHECKSUM-WITNESS. * **VerifierCHECKSUM-LOG**: SystemCHECKSUM-WITNESS
Witnesses verify the log's append-only property from their own local vantage point(s). @@ -58,7 +56,7 @@ BelieverCHECKSUM will accept. instance of SystemCHECKSUM-LOG if cryptographic proofs of log misbehavior are preseneted by some VerifierCHECKSUM-LOG. -## **SystemCHECKSUM-WITNESS**: +## **SystemCHECKSUM-WITNESS**: SystemCHECKSUM-WITNESS is about making the claims of a log operator _trustworthy_. * **ClaimCHECKSUM-WITNESS**: -- cgit v1.2.3 From 540306404d792ed7387ab0d8ca63632e7750aed3 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Thu, 13 May 2021 12:33:09 +0200 Subject: added claimant model, take 2 There might be some inconsistencies and errors. To be discussed! --- doc/claimant.md | 57 ++++++++++++++++++++++++++++++++++----------------------- 1 file changed, 34 insertions(+), 23 deletions(-) (limited to 'doc') diff --git a/doc/claimant.md b/doc/claimant.md index c10e657..b98f2ad 100644 --- a/doc/claimant.md +++ b/doc/claimant.md @@ -1,37 +1,48 @@ # Claimant model -## **SystemCHECKSUM**: -SystemCHECKSUM is about the claims made by a _data publisher_. +## **SystemCHECKSUM** +SystemCHECKSUM is about the claims made by a data publisher. * **ClaimCHECKSUM**: _I, data publisher, claim that the data_: 1. has cryptographic hash X - 2. can be located using X as an identifier - 3. has properties Y (_"ecosystem specific_") + 2. is produced by no-one but myself * **StatementCHECKSUM**: signed checksum
* **ClaimantCHECKSUM**: data publisher
The data publisher is a party that wants to publish some data to an end-user. * **BelieverCHECKSUM**: end-user
Belief is based on seeing a valid StatementCHECKSUM. -* **VerifierCHECKSUM**: any interested party
- These parties try to verify the above claims. For example: - * the data publisher itself (_"has my identity been compromised?"_) - * third-parties that want to look further into the data (_"ecosystem - specific_") +* **VerifierCHECKSUM**: data publisher
+ The data publisher tries to detect unwanted statements. * **ArbiterCHECKSUM**:
There's no official body. Invalidated claims would affect reputation. -**Example.** -The published data could be an executable binary from a reproducible build. The -ecosystem-specific claim would be that the corresponding source code can be -looked-up in a public database using X as an identifier. A rebuilder would -verify this claim by compiling the source, comparing the hashed output to the -claimed value. +SystemCHECKSUM\* can be defined to make more specific claims. Below +is a reproducible builds example. + +### **SystemCHECKSUM-RB**: +SystemCHECKSUM-RB is about the claims made by a _software publisher_ +that makes reproducible builds available. +* **ClaimCHECKSUM-RB**: + _I, software publisher, claim that the data_: + 1. has cryptographic hash X + 2. is the output of a reproducible build for which the source can be located + using X as an identifier +* **StatementCHECKSUM-RB**: StatementCHECKSUM +* **ClaimantCHECKSUM-RB**: software publisher
+* **BelieverCHECKSUM-RB**: end-user
+ Belief is based on seeing a valid StatementCHECKSUM-RB. +* **VerifierCHECKSUM-RB**: any interested party
+ These parties try to verify the above claims. For example: + * the software publisher itself (_"has my identity been compromised?"_) + * rebuilders that check for locatability and reproducibility +* **ArbiterCHECKSUM-RB**:
+ There's no official body. Invalidated claims would affect reputation. ## **SystemCHECKSUM-LOG**: SystemCHECKSUM-LOG is about the claims made by a _log operator_. -It adds _discoverability_ into SystemCHECKSUM. Discoverability means -that VerifierCHECKSUM can see all StatementCHECKSUM that -BelieverCHECKSUM will accept. +It adds _discoverability_ into SystemCHECKSUM\*. Discoverability +means that VerifierCHECKSUM\* can see all +StatementCHECKSUM that BelieverCHECKSUM\* accept. * **ClaimCHECKSUM-LOG**: _I, log operator, make available:_ @@ -42,12 +53,12 @@ BelieverCHECKSUM will accept. * a small subset of data publishers * members of relevant consortia * **BelieverCHECKSUM-LOG**: - BelieverCHECKSUM and - VerifierCHECKSUM
+ BelieverCHECKSUM\* and + VerifierCHECKSUM\*
Belief is based on two factors: 1. seeing a valid StatementCHECKSUM-LOG 2. seeing a number of valid StatementCHECKSUM-WITNESS from - independent instances on SystemCHECKSUM-WITNESS. + independent instances of SystemCHECKSUM-WITNESS. * **VerifierCHECKSUM-LOG**: SystemCHECKSUM-WITNESS
Witnesses verify the log's append-only property from their own local vantage point(s). @@ -73,8 +84,8 @@ _trustworthy_. * monitors (cross-ecosystem) * a small subset of data publishers (cross-ecosystem) * **BelieverCHECKSUM-WITNESS**: - BelieverCHECKSUM and - VerifierCHECKSUM
+ BelieverCHECKSUM\* and + VerifierCHECKSUM\*
Belief is based on seeing a valid StatementCHECKSUM-WITNESS. * **VerifierCHECKSUM-WITNESS**: n/a
Witnesses are trusted parties. Security is based on _strength in numbers_. -- cgit v1.2.3 From 5a780e8cd56509218123671be5826cbd2f0e8d2c Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Thu, 13 May 2021 16:00:37 +0200 Subject: added claimant model, take 3 --- doc/claimant.md | 52 +++++++++++++++------------------------------------- 1 file changed, 15 insertions(+), 37 deletions(-) (limited to 'doc') diff --git a/doc/claimant.md b/doc/claimant.md index b98f2ad..6728fef 100644 --- a/doc/claimant.md +++ b/doc/claimant.md @@ -7,12 +7,11 @@ SystemCHECKSUM is about the claims made by a data publisher. 2. is produced by no-one but myself * **StatementCHECKSUM**: signed checksum
* **ClaimantCHECKSUM**: data publisher
- The data publisher is a party that wants to publish some data to an - end-user. + The data publisher is a party that wants to publish some data. * **BelieverCHECKSUM**: end-user
- Belief is based on seeing a valid StatementCHECKSUM. + The end-user is a party that wants to use some published data. * **VerifierCHECKSUM**: data publisher
- The data publisher tries to detect unwanted statements. + Only the data publisher can verify the above claims. * **ArbiterCHECKSUM**:
There's no official body. Invalidated claims would affect reputation. @@ -29,8 +28,11 @@ that makes reproducible builds available. using X as an identifier * **StatementCHECKSUM-RB**: StatementCHECKSUM * **ClaimantCHECKSUM-RB**: software publisher
+ The software publisher is a party that wants to publish the output of a + reproducible build. * **BelieverCHECKSUM-RB**: end-user
- Belief is based on seeing a valid StatementCHECKSUM-RB. + The end-user is a party that wants to run an executable binary that built + reproducibly. * **VerifierCHECKSUM-RB**: any interested party
These parties try to verify the above claims. For example: * the software publisher itself (_"has my identity been compromised?"_) @@ -53,41 +55,17 @@ StatementCHECKSUM that BelieverCHECKSUM\* accept. * a small subset of data publishers * members of relevant consortia * **BelieverCHECKSUM-LOG**: - BelieverCHECKSUM\* and - VerifierCHECKSUM\*
- Belief is based on two factors: - 1. seeing a valid StatementCHECKSUM-LOG - 2. seeing a number of valid StatementCHECKSUM-WITNESS from - independent instances of SystemCHECKSUM-WITNESS. -* **VerifierCHECKSUM-LOG**: SystemCHECKSUM-WITNESS
- Witnesses verify the log's append-only property from their own local - vantage point(s). -* **ArbiterCHECKSUM-LOG**:
- There is no official body. The ecosystem at large should stop using an - instance of SystemCHECKSUM-LOG if cryptographic proofs of log - misbehavior are preseneted by some VerifierCHECKSUM-LOG. - -## **SystemCHECKSUM-WITNESS**: -SystemCHECKSUM-WITNESS is about making the claims of a log operator -_trustworthy_. -* **ClaimCHECKSUM-WITNESS**: - _I, witness, claim that_: - 1. SystemCHECKSUM-LOG provides a locally consistent append-only - log -* **StatementCHECKSUM-WITNESS**: signed tree head -* **ClaimantCHECKSUM-WITNESS**: third party
- Examples of parties that may take on this role include: + * BelieverCHECKSUM\* + * VerifierCHECKSUM\*
+* **VerifierCHECKSUM-LOG**: third parties
+ These parties verify the above claims. Examples include: * members of relevant consortia * non-profits and other reputable organizations * security enthusiasts and researchers * log operators (cross-ecosystem) * monitors (cross-ecosystem) * a small subset of data publishers (cross-ecosystem) -* **BelieverCHECKSUM-WITNESS**: - BelieverCHECKSUM\* and - VerifierCHECKSUM\*
- Belief is based on seeing a valid StatementCHECKSUM-WITNESS. -* **VerifierCHECKSUM-WITNESS**: n/a
- Witnesses are trusted parties. Security is based on _strength in numbers_. -* **ArbiterCHECKSUM-WITNESS**:
- There is no official body. Invalidated claims would affect reputation. +* **ArbiterCHECKSUM-LOG**:
+ There is no official body. The ecosystem at large should stop using an + instance of SystemCHECKSUM-LOG if cryptographic proofs of log + misbehavior are preseneted by some VerifierCHECKSUM-LOG. -- cgit v1.2.3 From 533f683ef1ae999c2fdc0086cbc3de4e675d1e33 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 25 May 2021 11:26:32 +0200 Subject: use POST for requests with input data The major argument for moving input data from HTTP headers in GET requests to body of POST's is that we define the protocol ourselves without any dependencies on HTTP and can make it even simpler to parse. --- doc/api.md | 52 ++++++++++++++++++++++++++++++++-------------------- 1 file changed, 32 insertions(+), 20 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 4f43d2c..a998d70 100644 --- a/doc/api.md +++ b/doc/api.md @@ -11,11 +11,9 @@ This is a work-in-progress document that may be moved or modified. ## Overview The log implements an HTTP(S) API: -- Requests to the log use the HTTP GET method. -- Input data (in requests) and output data (in responses) are - expressed as ASCII-encoded key/value pairs. -- Requests use HTTP entity headers for input data while responses use - the HTTP message body for output data. +- Input data in requests and output data in responses are expressed as + ASCII-encoded key/value pairs. +- Requests with input data use POST to send the data to the log. - Binary data is hex-encoded before being transmitted. The motivation for using a text based key/value format for request and @@ -136,21 +134,17 @@ constraint is that it must be a valid HTTP(S) URL that can have the URL could be `https://log.example.com/2021/st/v0/get-signed-tree-head`. -Input data (in requests) is sent as ASCII key/value pairs as HTTP -entity headers, with their keys prefixed with the string -`stlog-`. Example: For sending `treee_size=4711` as input a client -would send the HTTP header `stlog-tree_size: 4711`. 
+Input data (in requests) is POST:ed in the HTTP message body as ASCII +key/value pairs. Output data (in replies) is sent in the HTTP message body in the same format as the input data, i.e. as ASCII key/value pairs on the format -`Key: Value`. Example: For sending `tree_size=4711` as output a log -would send an HTTP message body consisting of `stlog-tree_size: 4711`. +`Key=Value` The HTTP status code is 200 OK to indicate success. A different HTTP -status code is used to indicate failure. The log should set the value -value for the key `error` to a human-readable string describing what -went wrong. For example, `error: invalid signature`, `error: rate -limit exceeded`, or `error: unknown leaf hash`. +status code is used to indicate failure, in which case the log should +respond with a human-readable string describing what went wrong using +the key `error`. Example: `error=Invalid signature.`. ### get-tree-head-cosigned Returns the latest cosigned tree head. Used together with @@ -237,7 +231,7 @@ There is exactly one `signature` and one `key_hash` field. The ### get-proof-by-hash ``` -GET /st/v0/get-proof-by-hash +POST /st/v0/get-proof-by-hash ``` Input: @@ -260,9 +254,12 @@ other words, `SHA256(0x00 | tree_leaf)`. proof of zero or more node hashes. The order of node hashes follow from the hash strategy, see RFC 6962. +Example: `echo "leaf_hash=241fd4538d0a35c2d0394e4710ea9e6916854d08f62602fb03b55221dcdac90f +tree_size=4711" | curl --data-binary @- localhost/st/v0/get-proof-by-hash` + ### get-consistency-proof ``` -GET /st/v0/get-consistency-proof +POST /st/v0/get-consistency-proof ``` Input: @@ -283,9 +280,12 @@ Output on success: consistency proof of zero or more node hashes. The order of node hashes follow from the hash strategy, see RFC 6962. +Example: `echo "new_size=4711 +old_size=42" | curl --data-binary @- localhost/st/v0/get-consistency-proof` + ### get-leaves ``` -GET /st/v0/get-leaves +POST /st/v0/get-leaves ``` Input: @@ -309,9 +309,12 @@ match. 
The log may return fewer leaves than requested. At least one leaf must be returned on HTTP status code 200 OK. +Example: `echo "start_size=42 +end_size=4711" | curl --data-binary @- localhost/st/v0/get-leaves` + ### add-leaf ``` -GET /st/v0/add-leaf +POST /st/v0/add-leaf ``` Input: @@ -349,9 +352,15 @@ inclusion proof is available. An inclusion proof should not be relied upon unless it leads up to a trustworthy signed tree head. Witness cosigning can make a tree head trustworthy. +Example: `echo "shard_hint=1640995200 +checksum=cfa2d8e78bf273ab85d3cef7bde62716261d1e42626d776f9b4e6aae7b6ff953 +signature_over_message=c026687411dea494539516ee0c4e790c24450f1a4440c2eb74df311ca9a7adf2847b99273af78b0bda65dfe9c4f7d23a5d319b596a8881d3bc2964749ae9ece3 +verification_key=c9a674888e905db1761ba3f10f3ad09586dddfe8581964b55787b44f318cbcdf +domain_hint=example.com" | curl --data-binary @- localhost/st/v0/add-leaf` + ### add-cosignature ``` -GET /st/v0/add-cosignature +POST /st/v0/add-cosignature ``` Input: @@ -369,6 +378,9 @@ head. A key-hash, rather than the full verification key, is used to motivate verifiers to locate the appropriate key and make an explicit trust decision. +Example: `echo "signature=d1b15061d0f287847d066630339beaa0915a6bbb77332c3e839a32f66f1831b69c678e8ca63afd24e436525554dbc6daa3b1201cc0c93721de24b778027d41af +key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | curl --data-binary @- localhost/st/v0/add-cosignature` + ## Summary of log parameters - **Public key**: an Ed25519 verification key that can be used to verify the log's tree head signatures. 
-- cgit v1.2.3 From 8822e78af9fb67dc9280de08c2758350a862b8ab Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Tue, 25 May 2021 12:14:45 +0200 Subject: replace some of "the log" and other rephrasing --- doc/api.md | 49 ++++++++++++++++++++++++++----------------------- 1 file changed, 26 insertions(+), 23 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index a998d70..beda293 100644 --- a/doc/api.md +++ b/doc/api.md @@ -9,11 +9,12 @@ It can be found This is a work-in-progress document that may be moved or modified. ## Overview -The log implements an HTTP(S) API: +Logs implement an HTTP(S) API for accepting requests and sending +responses. - Input data in requests and output data in responses are expressed as ASCII-encoded key/value pairs. -- Requests with input data use POST to send the data to the log. +- Requests with input data use HTTP POST to send the data to a log. - Binary data is hex-encoded before being transmitted. The motivation for using a text based key/value format for request and @@ -27,12 +28,12 @@ wire-format in use by the Tor project. ## Primitives ### Cryptography -The log uses the same Merkle tree hash strategy as +Logs use the same Merkle tree hash strategy as [RFC 6962,§2](https://tools.ietf.org/html/rfc6962#section-2). The hash functions must be [SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). -The log must sign tree heads using -[Ed25519](https://tools.ietf.org/html/rfc8032). The log's witnesses +Logs must sign tree heads using +[Ed25519](https://tools.ietf.org/html/rfc8032). Log witnesses must also sign tree heads using Ed25519. All other parts that are not Merkle tree related also use SHA256 as @@ -73,7 +74,7 @@ you may use it though. The main point of using Trunnel is that it makes a simple format explicit and unambiguous. #### Merkle tree head -Tree heads are signed by the log and its witnesses. 
It contains a +Tree heads are signed both by a log and its witnesses. It contains a timestamp, a tree size, and a root hash. The timestamp is included so that monitors can ensure _liveliness_. It is the time since the UNIX epoch (January 1, 1970 00:00 UTC) in seconds. The tree size @@ -93,7 +94,7 @@ not cosign a tree head if it is inconsistent with prior history or if the timestamp is backdated or future-dated more than 12 hours. #### Merkle tree leaf -The log supports a single leaf type. It contains a shard hint, a +Logs support a single leaf type. It contains a shard hint, a checksum over whatever the submitter wants to log a checksum for, a signature that the submitter computed over the shard hint and the checksum, and a hash of the submitter's public verification key, that @@ -113,8 +114,8 @@ struct tree_leaf { ``` `message` is composed of the `shard_hint`, chosen by the submitter to -match the shard interval for the log, and the submitter's `checksum` -to be logged. +match the shard interval for the log it's submitting to, and the +submitter's `checksum` to be logged. `signature_over_message` is a signature over `message`, using the submitter's verification key. It must be possible to verify the @@ -142,13 +143,13 @@ format as the input data, i.e. as ASCII key/value pairs on the format `Key=Value` The HTTP status code is 200 OK to indicate success. A different HTTP -status code is used to indicate failure, in which case the log should +status code is used to indicate failure, in which case a log should respond with a human-readable string describing what went wrong using the key `error`. Example: `error=Invalid signature.`. ### get-tree-head-cosigned Returns the latest cosigned tree head. Used together with -`get-proof-by-hash` and `get-consistency-proof` for verifying the log. +`get-proof-by-hash` and `get-consistency-proof` for verifying the tree. 
``` GET /st/v0/get-tree-head-cosigned @@ -306,7 +307,7 @@ value in each list refers to the first leaf, the second value in each list refers to the second leaf, etc. The size of each list must match. -The log may return fewer leaves than requested. At least one leaf +A log may return fewer leaves than requested. At least one leaf must be returned on HTTP status code 200 OK. Example: `echo "start_size=42 @@ -340,11 +341,11 @@ match a hash over `verification_key`. The submission may also not be accepted if the second-level domain name exceeded its rate limit. By coupling every add-leaf request to -a second-level domain, it becomes more difficult to spam the log. You +a second-level domain, it becomes more difficult to spam logs. You would need an excessive number of domain names. This becomes costly if free domain names are rejected. -The log does not publish domain-name to key bindings because key +Logs don't publish domain-name to key bindings because key management is more complex than that. Public logging should not be assumed to have happened until an @@ -373,7 +374,7 @@ Input: Output on success: - None -`key_hash` can be used to identify which witness signed the log's tree +`key_hash` can be used to identify which witness signed the tree head. A key-hash, rather than the full verification key, is used to motivate verifiers to locate the appropriate key and make an explicit trust decision. @@ -382,11 +383,13 @@ Example: `echo "signature=d1b15061d0f287847d066630339beaa0915a6bbb77332c3e839a32 key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | curl --data-binary @- localhost/st/v0/add-cosignature` ## Summary of log parameters -- **Public key**: an Ed25519 verification key that can be used to - verify the log's tree head signatures. -- **Log identifier**: the hashed public verification key using SHA256. -- **Shard interval**: the time during which the log accepts logging - requests. 
The shard interval's start and end are inclusive and - expressed as the number of seconds since the UNIX epoch. -- **Base URL**: where the log can be reached over HTTP(S). It is the - prefix before a version-0 specific endpoint. +- **Public key**: The Ed25519 verification key to be used for + verifying tree head signatures. +- **Log identifier**: The public verification key `Public key` hashed + using SHA256. +- **Shard interval start**: The earliest time at which logging + requests are accepted as the number of seconds since the UNIX epoch. +- **Shard interval end**: The latest time at which logging + requests are accepted as the number of seconds since the UNIX epoch. +- **Base URL**: Where the log can be reached over HTTP(S). It is the + prefix to be used to construct a version 0 specific endpoint. -- cgit v1.2.3 From e374db9e70cd329ff46f1a4443c59a8fa118ddd6 Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Fri, 28 May 2021 11:44:39 +0200 Subject: use a proper endpoint in example --- doc/api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index beda293..92344c5 100644 --- a/doc/api.md +++ b/doc/api.md @@ -133,7 +133,7 @@ Every log has a base URL that identifies it uniquely. The only constraint is that it must be a valid HTTP(S) URL that can have the `/st/v0/` suffix appended. For example, a complete endpoint URL could be -`https://log.example.com/2021/st/v0/get-signed-tree-head`. +`https://log.example.com/2021/st/v0/get-tree-head-cosigned`. Input data (in requests) is POST:ed in the HTTP message body as ASCII key/value pairs. 
-- cgit v1.2.3 From fe2e20f346e5f8a66c92016d77f32241498b790e Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Fri, 28 May 2021 11:44:54 +0200 Subject: clarify what the signature in get-tree-head-* is covering --- doc/api.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) (limited to 'doc') diff --git a/doc/api.md b/doc/api.md index 92344c5..57ad119 100644 --- a/doc/api.md +++ b/doc/api.md @@ -163,8 +163,9 @@ Output on success: seconds since the UNIX epoch. - `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. - `root_hash`: `tree_head.root_hash` hex-encoded. -- `signature`: hex-encoded Ed25519 signature over `tree_head` - serialzed as described in section `Merkle tree head`. +- `signature`: hex-encoded Ed25519 signature over `timestamp`, + `tree_size` and `root_hash` serialized into a `tree_head` as + described in section `Merkle tree head`. - `key_hash`: a hash of the public verification key (belonging to either the log or to one of its witnesses), which can be used to verify the most recent `signature`. The key is encoded as defined @@ -192,8 +193,9 @@ Output on success: seconds since the UNIX epoch. - `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. - `root_hash`: `tree_head.root_hash` hex-encoded. -- `signature`: hex-encoded Ed25519 signature over `tree_head` - serialzed as described in section `Merkle tree head`. +- `signature`: hex-encoded Ed25519 signature over `timestamp`, + `tree_size` and `root_hash` serialized into a `tree_head` as + described in section `Merkle tree head`. - `key_hash`: a hash of the log's public verification key, which can be used to verify `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), @@ -219,8 +221,9 @@ Output on success: seconds since the UNIX epoch. - `tree_size`: `tree_head.tree_size` ASCII-encoded decimal number. - `root_hash`: `tree_head.root_hash` hex-encoded. 
-- `signature`: hex-encoded Ed25519 signature over `tree_head` - serialzed as described in section `Merkle tree head`. +- `signature`: hex-encoded Ed25519 signature over `timestamp`, + `tree_size` and `root_hash` serialized into a `tree_head` as + described in section `Merkle tree head`. - `key_hash`: a hash of the log's public verification key that can be used to verify `signature`. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2), -- cgit v1.2.3
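
Reader's note, not part of any commit: the doc/api.md patches above settle on ASCII `Key=Value` pairs in the HTTP message body, with binary data hex-encoded and repeated keys forming lists (as in get-leaves). The sketch below illustrates that format under some assumptions the patches only imply through their `echo ... | curl --data-binary @-` examples: one pair per line, a trailing newline after each pair, and splitting at the first `=`. The helper names `encode_kv` and `decode_kv` are hypothetical and are not taken from the log implementation.

```python
def encode_kv(pairs):
    """Serialize (key, value) pairs as ASCII `key=value` lines.

    Assumption: one pair per line, each terminated by a newline,
    mirroring the echo/curl examples in the patch.
    """
    lines = []
    for key, value in pairs:
        if isinstance(value, (bytes, bytearray)):
            # Binary data is hex-encoded before being transmitted.
            value = bytes(value).hex()
        lines.append(f"{key}={value}\n")
    return "".join(lines).encode("ascii")


def decode_kv(body):
    """Parse an ASCII key/value body into {key: [values, ...]}.

    Values are collected in lists because a key may repeat, for
    example once per leaf in a get-leaves response.
    Assumption: the first `=` on a line separates key from value.
    """
    fields = {}
    for line in body.decode("ascii").splitlines():
        if not line:
            continue  # tolerate blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        fields.setdefault(key, []).append(value)
    return fields
```

For instance, `encode_kv([("old_size", 42), ("new_size", 4711)])` would produce a body of the same shape as the get-consistency-proof example in the patch, and `decode_kv` could be applied to an `error=...` response body. Whether a real log accepts or emits exactly this framing should be checked against the final specification.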