| field | value | date |
|---|---|---|
| author | Rasmus Dahlberg <rasmus.dahlberg@kau.se> | 2021-04-20 12:28:28 +0200 |
| committer | Rasmus Dahlberg <rasmus.dahlberg@kau.se> | 2021-04-20 12:28:28 +0200 |
| commit | 24cc6b0db8ef9c718925d14b329f21938e5d2b1b | |
| tree | ecf078b59ea10d8212615dbfc4f0879c3d6560a0 /doc | |
| parent | f3134997ccbb525cd09a8144ed6daeeb3245326a | |
started on our in-progress (re)design documents
Diffstat (limited to 'doc')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | doc/api.md | 247 |
| -rw-r--r-- | doc/design.md | 32 |
| -rw-r--r-- | doc/formats.md | 160 |
| -rw-r--r-- | doc/schema/consistency_proof.schema.json | 30 |
| -rw-r--r-- | doc/schema/example/consistency_proof.json | 7 |
| -rw-r--r-- | doc/schema/example/inclusion_proof.json | 7 |
| -rw-r--r-- | doc/schema/example/leaves.json | 14 |
| -rw-r--r-- | doc/schema/example/sth.json | 11 |
| -rw-r--r-- | doc/schema/inclusion_proof.schema.json | 30 |
| -rw-r--r-- | doc/schema/leaves.schema.json | 38 |
| -rw-r--r-- | doc/schema/sth.schema.json | 50 |
| -rw-r--r-- | doc/sketch.md | 372 |
12 files changed, 466 insertions, 532 deletions
| diff --git a/doc/api.md b/doc/api.md new file mode 100644 index 0000000..760663b --- /dev/null +++ b/doc/api.md @@ -0,0 +1,247 @@ +# System Transparency Logging: API v0 +This document describes details of the System Transparency logging API, +version 0.  The broader picture is not explained here.  We assume that you have +read the System Transparency design document.  It can be found [here](https://github.com/system-transparency/stfe/blob/design/doc/design.md). + +**Warning.** +This is a work-in-progress document that may be moved or modified. + +## Overview +The log implements an HTTP(S) API: +- Requests that add data to the log use the HTTP POST method.  The HTTP content +type is `application/x-www-form-urlencoded`.  The posted data are key-value +pairs.  Binary data must be base64-encoded. +- Requests that retrieve data from the log use the HTTP GET method.  The HTTP +content type is `application/x-www-form-urlencoded`.  Input parameters are +key-value pairs. +- Responses are JSON objects.  The HTTP content type is `application/json`. +- Error messages are human-readable strings.  The HTTP content type is +`text/plain`. + +We decided to use these web formats for requests and responses because the log +is running as an HTTP(S) service.  In other words, anyone that interacts with +the log is most likely using these formats already.  The other benefit is that +all requests and responses are human-readable.  This makes it easier to +understand the protocol, troubleshoot issues, and copy-paste.  We favored +compatibility and understandability over a wire-efficient format. + +Note that we are not using JSON for signed and/or logged data.  In other words, +a submitter that wishes to distribute log responses to their user base in a +different format may do so.  The forced (de)serialization parser on _end-users_ +is a small subset of Trunnel.  Trunnel is an "idiot-proof" wire-format that the +Tor project uses. 
+ +## Primitives +### Cryptography +The log uses the same Merkle tree hash strategy as [RFC 6962, §2](https://tools.ietf.org/html/rfc6962#section-2). +The hash functions must be [SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf). +The log must sign tree heads using [Ed25519](https://tools.ietf.org/html/rfc8032). +The log's witnesses must also sign tree heads using Ed25519. + +All other parts that are not Merkle tree related also use SHA256 as the hash +function.  Using more than one hash function would increases the overall attack +surface: two hash functions must be collision resistant instead of one. + +We recommend that submitters sign using Ed25519.  We also support RSA with +[deterministic](https://tools.ietf.org/html/rfc8017#section-8.2) +or [probabilistic](https://tools.ietf.org/html/rfc8017#section-8.1) +padding.  Supporting RSA is suboptimal, but excluding it would make the log +useless for many possible adopters. + +### Serialization +We use the [Trunnel](https://gitweb.torproject.org/trunnel.git) [description language](https://www.seul.org/~nickm/trunnel-manual.html) +to define (de)serialization of data structures that need to be signed or +inserted into the Merkle tree.  Trunnel is more expressive than the +[SSH wire format](https://tools.ietf.org/html/rfc4251#section-5). +It is about as expressive as the [TLS presentation language](https://tools.ietf.org/html/rfc8446#section-3). +A notable difference is that Trunnel supports integer constraints.  The Trunnel +language is also readable by humans _and_ machines.  "Obviously correct code" +can be generated in C and Go. + +A fair summary of our Trunnel usage is as follows. + +All integers are 64-bit, unsigned, and in network byte order.  A fixed-size byte +array is put into the serialization buffer in-order, starting from the first +byte.  These basic types are concatenated to form a collection.  
You should not +need a general-purpose Trunnel (de)serialization parser to work with this +format.  If you have one, you may use it though.  The main point of using +Trunnel is that it makes a simple format explicit and unambiguous. + +TODO: URL-encode _or_ JSON?  I think we should only need one.  Always doing HTTP +POST would also ensure that input parameters don't show up in web server logs. + +#### Merkle tree head +Tree heads are signed by the log and its witnesses.  It contains a timestamp, a +tree size, and a root hash.  The timestamp is included so that monitors can +ensure _liveliness_.  It is the time since the UNIX epoch (January 1, 1970 +00:00:00 UTC) in milliseconds.  The tree size specifies the current number of +leaves.  The root hash fixes the structure and content of the Merkle tree. + +``` +struct tree_head { +	u64 timestamp; +	u64 tree_size; +	u8 root_hash[32]; +}; +``` + +The serialized tree head must be signed using Ed25519.  A witness must only sign +the log's tree head if it is consistent with prior history and the timestamp is +roughly correct.  A timestamp is roughly correct if it is not backdated or +future-dated more than 12 hours. + +#### Merkle tree leaf +The log supports a single leaf type.  It contains a checksum, a signature +scheme, a signature that the submitter computed over that checksum, and the hash +of the public verification key that can be used to verify the signature. + +``` +const ALG_ED25519 = 1; // RFC 8032 +const ALG_RSASSA_PKCS1_V1_5 = 2; // RFC 8017 +const ALG_RSASSA_PSS = 3; // RFC 8017 + +struct tree_leaf { +	u8 checksum[32]; +	u64 signature_scheme IN [ +		ALG_ED25519, +		ALG_RSASSA_PKCS1_V1_5, +		ALG_RSASSA_PSS, +	]; +	union signature[signature_scheme] { +		ALG_ED25519: u8 ed25519[32]; +		default:     u8 rsa[512]; +	} +	u8 key_hash[32]; +} +``` + +A key-hash is included in the leaf so that it can be attributed to the signing +entity.  
A hash, rather than the full public verification key, is used to force +the verifier to locate the appropriate key and make an explicit trust decision. + +## Public endpoints +Every log has a base URL that identifies it uniquely.  The only constraint is +that it must be a valid HTTP(S) URL that can have the `/st/v0/<endpoint>` suffix +appended.  For example, a complete endpoint URL could be +`https://log.example.com/2021/st/v0/get-signed-tree-head`. + +### get-signed-tree-head +``` +GET <base url>/st/v0/get-signed-tree-head +``` + +Input key-value pairs: +- `type`: either the string "latest", "stable", or "cosigned". +	- "latest": ask for the most recent signed tree head. +	- "stable": ask for a recent signed tree head that is fixed for some period +	  of time. +	- "cosigned": ask for a recent cosigned tree head. + +Output: +- On success: status 200 OK and a signed tree head.  The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/sth.schema.json). +- On failure: a different status code and a human-readable error message. + +### get-proof-by-hash +``` +POST <base url>/st/v0/get-proof-by-hash +``` + +Input key-value pairs: +- `leaf_hash`: a base64-encoded leaf hash that identifies which `tree_leaf` the +log should prove inclusion for.  The leaf hash is computed using the RFC 6962 +hashing strategy.  In other words, `H(0x00 | tree_leaf)`. +- `tree_size`: the tree size of a tree head that the proof should be based on. + +Output: +- On success: status 200 OK and an inclusion proof.  The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/inclusion_proof.schema.json). +- On failure: a different status code and a human-readable error message. + +### get-consistency-proof +``` +POST <base url>/st/v0/get-consistency-proof +``` + +Input key-value pairs: +- `new_size`: the tree size of a newer tree head. 
+- `old_size`: the tree size of an older tree head that the log should prove is +consistent with the newer tree head. + +Output: +- On success: status 200 OK and a consistency proof.  The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/consistency_proof.schema.json). +- On failure: a different status code and a human-readable error message. + +### get-leaves +``` +POST <base url>/st/v0/get-leaves +``` + +Input key-value pairs: +- `start_size`: zero-based index of the first leaf to retrieve. +- `end_size`: index of the last leaf to retrieve. + +Output: +- On success: status 200 OK and a list of leaves.  The response body is +defined by the following [schema](https://github.com/system-transparency/stfe/blob/design/doc/schema/leaves.schema.json). +- On failure: a different status code and a human-readable error message. + +The log may truncate the list of returned leaves.  However, it must not be an +empty list on success.  + +### add-leaf +``` +POST <base url>/st/v0/add-leaf +``` + +Input key-value pairs: +- `leaf_checksum`: the checksum that the submitter wants to log in base64. +- `signature_scheme`: the signature scheme that the submitter wants to use. +- `tree_leaf_signature`: the submitter's `tree_leaf` signature in base64. +- `verification_key`: the submitter's public verification key.  It is serialized +as described in the corresponding RFC, then base64-encoded. +- `domain_hint`: a domain name that indicates where the public verification-key +hash can be downloaded in base64.  Supported methods: DNS and HTTPS +(TODO: docdoc). + +Output: +- On success: HTTP 200.  The log will _try_ to incorporate the submitted leaf +into its Merkle tree. +- On failure: a different status code and a human-readable error message. + +The submitted entry will not be accepted if the signature is invalid or if the +downloaded verification-key hash does not match.  
The submitted entry may also +not be accepted if the second-level domain name exceeded its rate limit.  By +coupling every add-leaf request with a second-level domain, it becomes more +difficult to spam the log.  You would need an excessive number of domain names. +This becomes costly if free domain names are rejected. + +The log does not publish domain-name to key bindings because key management is +more complex than that. + +Public logging should not be assumed until an inclusion proof is available.  An +inclusion proof should not be relied upon unless it leads up to a trustworthy +signed tree head.  Witness cosigning can make a tree head trustworthy. + +TODO: the log may allow no `domain_hint`?  Especially useful for v0 testing. + +### add-cosignature +``` +POST <base url>/st/v0/add-cosignature +``` + +Input key-value pairs: +- `signature`: a base64-encoded signature over a `tree_head` that is fixed for +some period of time. The cosigning witness retrieves the tree head using the +`get-signed-tree-head` endpoint with the "stable" type. +- `key_hash`: a base64-encoded hash of the public verification key that can be +used to verify the signature. + +Output: +- HTTP status 200 OK on success.  Otherwise a different status code and a +human-readable error message. + +The key-hash can be used to identify which witness signed the log's tree head. +A key-hash, rather than the full verification key, is used to force the verifier +to locate the appropriate key and make an explicit trust decision. diff --git a/doc/design.md b/doc/design.md new file mode 100644 index 0000000..f966d03 --- /dev/null +++ b/doc/design.md @@ -0,0 +1,32 @@ +# System Transparency Logging: Design v0 +We propose System Transparency logging.  It is similar to Certificate +Transparency, expect that cryptographically signed checksums are logged as +opposed to X.509 certificates.  Publicly logging signed checksums allow anyone +to discover which keys signed what.  
As such, malicious and unintended key-usage +can be _discovered_.  We present our design and discuss how two possible +use-cases influenced it: binary transparency and reproducible builds. + +**Target audience.** +You are most likely interested in transparency logs or supply-chain security. + +**Preliminaries.** +You have basic understanding of cryptographic primitives like digital +signatures, hash functions, and Merkle trees.  You roughly know what problem +Certificate Transparency solves and how.  You may never have heard the term +_gossip-audit model_, or know how it is related to trust assumptions and +detectability properties. + +**Warning.** +This is a work-in-progress document that may be moved or modified. + +## Introduction +Transparency logs make it possible to detect unwanted events.  For example, +	are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962), +	did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md), +	or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/). +System Transparency logging makes signed checksums transparent.  The goal is to +_detect_ unwanted key-usage without making assumptions about the signed data. + +## Threat model and (non-)goals + +## Design diff --git a/doc/formats.md b/doc/formats.md deleted file mode 100644 index bffd05f..0000000 --- a/doc/formats.md +++ /dev/null @@ -1,160 +0,0 @@ -# Formats -This document defines data structures and data formats. - -## Overview -Here we give an overview of our presentation language / serialization rules. - -All integers are represented by 64-bit unsigned integers in network byte order. - -Variable length lists have an integer specifying its length.  Then each list -item is enumerated. - -TODO: fixme. 
- -## Items -Every item type start with a versioned format specifier.  Protocol version 1 -uses format specifiers in the range 1--X. - -### Request data structures -Log endpoints that take input data use the following request data structures. - -#### `get_entries_v1` -``` -0  Format  8                16               24 -+----------+----------------+----------------+ -|    1     |   Start Size   |    End Size    | -+----------+----------------+----------------+ -   uint64        uint64           uint64 -``` -- Format is always 1 for items of type `get_entries_v1`. -- Start size specifies the index of the first Merkle tree leaf to retrieve. -- End size specifies the index of the last Merkle tree leaf to retrieve. - -#### `get_proof_by_hash_v1` -``` -0  Format  8                16               48 -+----------+----------------+----------------+ -|    2     |   Tree size    |    Leaf hash   | -+----------+----------------+----------------+ -   uint64        uint64      fixed byte array -``` -- Format is always 2 for items of type `get_proof_by_hash_v1`. -- Leaf hash is computed as described in [RFC 6962/bis, §2.1.1](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.1). -- Tree size specifies which Merkle tree root inclusion should be proven for. - -#### `get_consistency_proof_v1` -``` -0  Format  8                16               24 -+----------+----------------+----------------+ -|    3     |    Old size    |    New size    | -+----------+----------------+----------------+ -   uint64        uint64           uint64 -``` -- Format is always 3 for items of type `get_consistency_proof_v1`. -- Old size specifies the tree size of an older Merkle tree head. -- New size specifies the tree size of a newer Merkle tree head. 
- -### Proof and log data structures -#### `inclusion_proof_v1` -``` -                                                                               --zero or more node hashes--> -0  Format  8                48               56               64               72                 72+Length -+----------+----------------+----------------+----------------+----------------+--------//--------+ -|    4     |   Identifier   |    Tree size   |    Leaf index  |     Length     |    Node hashes   | -+----------+----------------+----------------+----------------+----------------+--------//--------+ -   uint64      ed25519_v1         uint64           uint64           uint64           list body -``` -- Format is always 4 for items of type `inclusion_proof_v1`. -- Identifier identifies the log uniquely as an `ed25519_v1` item. -- Tree size is the size of the Merkle tree that the proof is based on. -- Leaf index is a zero-based index of the log entry that the proof is based on. -- The remaining part is a list of node hashes. -	- Length specifies the full byte size of the list.  It must be `32 * m`, -	where `m >= 0`.  This means that an inclusion needs zero or more node -	hashes to be well-formed. -	- Node hash is a node hash in the Merkle tree that the proof is based on. - -Remark: the list of node hashes is generated and verified as in [RFC 6962/bis, -§2.1.3](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.3). 
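Editorial note, not part of the commit: the remark above defers inclusion-proof verification to RFC 6962/bis. A minimal Python sketch of that verification, assuming SHA256 with the `0x00` leaf prefix and `0x01` interior-node prefix, could look as follows. The function names are illustrative assumptions and do not come from the repository.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    """Interior node hash: SHA256 over 0x01, the left child, the right child."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf_hash: bytes, leaf_index: int, tree_size: int,
                     proof: list, root_hash: bytes) -> bool:
    """Recompute the root from a leaf hash and a list of node hashes."""
    if leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf_hash
    for p in proof:
        if sn == 0:
            return False  # more node hashes than the tree height allows
        if fn & 1 or fn == sn:
            # The current node is a right child: the sibling goes on the left.
            r = node_hash(p, r)
            if not fn & 1:
                # Skip levels where the node has no sibling on the right edge.
                while fn & 1 == 0 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            # The current node is a left child: the sibling goes on the right.
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```

An empty list of node hashes is accepted for a tree with a single leaf, which matches the "zero or more node hashes" wording above.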
- -#### `consistency_proof_v1` -``` -                                                                               --zero or more node hashes--> -0  Format  8                48               56               64               72                 72+Length -+----------+----------------+----------------+----------------+----------------+--------//--------+ -|    5     |   Identifier   |    Old size    |    New size    |     Length     |    Node hashes   | -+----------+----------------+----------------+----------------+----------------+--------//--------+ -   uint64     ed25519_v1          uint64           uint64           uint64           list body -``` -- Format is always 5 for items of type `consistency_proof_v1`. -- Identifier identifies the log uniquely as an `ed25519_v1` item. -- Old size is the tree size of the older Merkle tree. -- New size is the tree size of the newer Merkle tree. -- The remaining part is a list of node hashes. -	- Length specifies the full byte size of the list.  It must be `32 * m`, -	where `m >= 0`.  This means that a consistenty proof needs zero or more node -	hashes to be well-formed. -	- Node hash is a node hash from the older or the newer Merkle tree. - -Remark: the list of node hashes is generated and verified as in [RFC 6962/bis, -§2.1.4](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-35#section-2.1.4). - -#### `signed_tree_head_v1` -``` -                                                                               ----one or more signature-identifier pairs-------> -0  Format  8               16               24               56                64               128              168    64+Length -+----------+----------------+----------------+----------------+----------------+----------------+----------------+--//--+ -|    6     |   Timestamp    |   Tree size    |    Root hash   |     Length     |    Signature   |   Identifier   | .... 
| -+----------+----------------+----------------+----------------+----------------+----------------+----------------+--//--+ -   uint64        uint64           uint64      fixed byte array      uint64      fixed byte array     ed25519_v1   cont. list body -``` -- Format is always 6 for items of type `signed_tree_head_v1`. -- Timestamp is the time since the UNIX epoch (January 1, 1970 00:00:00 UTC) in -milliseconds. -- Tree size is the number of leaves in the current Merkle tree. -- Root hash is the root hash of the current Merkle tree. -- The remaining part is a list of signature-identifier pairs.  -	- Length specifies the full byte size of the list.  It must be `104 * m`, -	where `m > 1`.  This means that a signed tree head needs at least one -	signature-identifier pair to be well-formed. -	- Signature is an Ed25519 signature over bytes 0--56.  The signature is -	encodes as in [RFC 8032, §3.3](https://tools.ietf.org/html/rfc8032#section-3.3). -	- Identifier identifies the signer uniquely as an `ed25519_v1` item. - -Remark: there may be multiple signature-identifier pairs if the log is cosigned. - -#### `signed_checksum32_ed25519_v1` -``` -0  Format  8                40               56                 56+Length        120+Length         160+Length -+----------+----------------+----------------+-------//---------+----------------+--------//--------+ -|    7     |     Checksum   |     Length     |    Identifier    |    Signature   |    Namespace     | -+----------+----------------+----------------+-------//---------+----------------+--------//--------+ -   uint64   fixed byte array      uint64          byte array     fixed byte array      ed25519_v1 -``` -- Format is always 7 for items of type `signed_checksum32_ed25519_v1`. -- Checksum is a 32-byte checksum that represents a data item of opaque type. -- Length specified the full byte size of the following identifier.  It must be -larger than zero and less than 128. 
-- Identifier identifies what the checksum represents.  The aforementioned length -constraint means that the identifier cannot be omitted or exceed 128 bytes. -- Signature is an Ed25519 signature over bytes 0--56+Length.  The signature is -encodes as in [RFC 8032, §3.3](https://tools.ietf.org/html/rfc8032#section-3.3). -- Namespace is an `ed25519_v1` item that identifies the signer uniquely. - -Remark: to keep this checksum entry as simple as possible it does not have a -variable length checksum or any agility with regards to the signing namespace. -This means that we need to have multiple leaf types that follow the pattern -`signed_checksum{32,64}_namespace_v1`. - -### Namespace data structures -#### `ed25519_v1` -``` -0  Format  8                40 -+----------+----------------+ -|    8     |   public key   | -+----------+----------------+ -   uint64   fixed byte array -``` -- The format is always 8 for items of type `ed25519_v1`. -- The public Ed25519 verification key is always 32 bytes.  See encoding in [RFC -8032, §3.2](https://tools.ietf.org/html/rfc8032#section-3.2). 
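Editorial note, not part of the commit: both the removed `formats.md` and the new `api.md` serialize integers as 64-bit unsigned values in network byte order, followed by fixed-size byte arrays in order. The Python sketch below packs the `tree_head` struct from `api.md`, computes a leaf hash as `H(0x00 | tree_leaf)`, and shows why the JSON schemas that follow use a string length of 44 for base64-encoded 32-byte values. Function names and sample values are assumptions made for illustration only.

```python
import base64
import hashlib
import struct


def serialize_tree_head(timestamp: int, tree_size: int, root_hash: bytes) -> bytes:
    """Pack tree_head as u64 timestamp, u64 tree_size, u8 root_hash[32]."""
    if len(root_hash) != 32:
        raise ValueError("root_hash must be 32 bytes")
    # ">QQ" means two big-endian (network byte order) unsigned 64-bit integers.
    return struct.pack(">QQ", timestamp, tree_size) + root_hash


def leaf_hash(serialized_leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA256 over a 0x00 prefix and the leaf bytes."""
    return hashlib.sha256(b"\x00" + serialized_leaf).digest()


# Hypothetical sample values, chosen only for illustration.
head = serialize_tree_head(1618914508000, 2, bytes(32))
encoded_root = base64.b64encode(bytes(32)).decode("ascii")
```

The serialized tree head is always 48 bytes (8 + 8 + 32), and any 32-byte value encodes to 44 base64 characters including padding.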
diff --git a/doc/schema/consistency_proof.schema.json b/doc/schema/consistency_proof.schema.json new file mode 100644 index 0000000..003f3c7 --- /dev/null +++ b/doc/schema/consistency_proof.schema.json @@ -0,0 +1,30 @@ +{ +	"$schema": "https://json-schema.org/draft-07/schema#", +	"title": "inclusion_proof", +	"description": "JSON-formatted inclusion proof, version 0.", + +	"type": "object", +	"required": [ "new_size", "old_size", "consistency_proof" ], +	"properties": { +		"new_size": { +			"description": "The tree size of the newer Merkle tree head.", +			"type": "integer", +			"minimum": 0 +		}, +		"old_size": { +			"description": "The tree size of the older Merkle tree head.", +			"type": "integer", +			"minimum": 0 +		}, +		"consistency_proof": { +			"description": "A list of base64-encoded node hashes that proves consistency", +			"type": "array", +			"items": { +				"description": "A node hash in base64", +				"type": "string", +				"minLength": 44, +				"maxLength": 44 +			} +		} +	} +} diff --git a/doc/schema/example/consistency_proof.json b/doc/schema/example/consistency_proof.json new file mode 100644 index 0000000..0a323b7 --- /dev/null +++ b/doc/schema/example/consistency_proof.json @@ -0,0 +1,7 @@ +{ +	"new_size": 2, +	"old_size": 1, +	"consistency_proof": [ +		"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" +	] +} diff --git a/doc/schema/example/inclusion_proof.json b/doc/schema/example/inclusion_proof.json new file mode 100644 index 0000000..d46d426 --- /dev/null +++ b/doc/schema/example/inclusion_proof.json @@ -0,0 +1,7 @@ +{ +	"tree_size": 2, +	"leaf_index": 0, +	"inclusion_proof": [ +		"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" +	] +} diff --git a/doc/schema/example/leaves.json b/doc/schema/example/leaves.json new file mode 100644 index 0000000..1eed05d --- /dev/null +++ b/doc/schema/example/leaves.json @@ -0,0 +1,14 @@ +[ +	{ +		"checksum": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", +		"signature_scheme": 1, +		"signature": 
"CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=", +		"key_hash": "DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD=" +	}, +	{ +		"checksum": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", +		"signature_scheme": 2, +		"signature": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=", +		"key_hash": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=" +	} +] diff --git a/doc/schema/example/sth.json b/doc/schema/example/sth.json new file mode 100644 index 0000000..ec3ad11 --- /dev/null +++ b/doc/schema/example/sth.json @@ -0,0 +1,11 @@ +{ +    "timestamp": 0, +    "tree_size": 0, +    "root_hash": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=", +    "signatures": [ +        { +            "key_hash": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=", +            "signature": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=" +        } +    ] +} diff --git a/doc/schema/inclusion_proof.schema.json b/doc/schema/inclusion_proof.schema.json new file mode 100644 index 0000000..3309d37 --- /dev/null +++ b/doc/schema/inclusion_proof.schema.json @@ -0,0 +1,30 @@ +{ +	"$schema": "https://json-schema.org/draft-07/schema#", +	"title": "inclusion_proof", +	"description": "JSON-formatted inclusion proof, version 0.", + +	"type": "object", +	"required": [ "tree_size", "leaf_index", "inclusion_proof" ], +	"properties": { +		"tree_size": { +			"description": "The Merkle tree size 
that the inclusion proof is based on.", +			"type": "integer", +			"minimum": 0 +		}, +		"leaf_index": { +			"description": "The zero-based index of the leaf that the inclusion proof is for.", +			"type": "integer", +			"minimum": 0 +		}, +		"inclusion_proof": { +			"description": "A list of base64-encoded node hashes that proves inclusion", +			"type": "array", +			"items": { +				"description": "A node hash in base64", +				"type": "string", +				"minLength": 44, +				"maxLength": 44 +			} +		} +	} +} diff --git a/doc/schema/leaves.schema.json b/doc/schema/leaves.schema.json new file mode 100644 index 0000000..74d7454 --- /dev/null +++ b/doc/schema/leaves.schema.json @@ -0,0 +1,38 @@ +{ +	"$schema": "https://json-schema.org/draft-07/schema#", +	"title": "list of tree_leaf", +	"description": "JSON-formatted tree leaf list, version 0.", + +	"type": "array", +	"description": "A list Merkle tree leaves", +	"items": { +		"type": "object", +		"required": [ "checksum", "signature_scheme", "signature", "key_hash" ], +		"properties": { +			"checksum": { +				"description": "A cryptographic hash that is computed over some data of opaque type.  The result is base64-encoded.", +				"type": "string", +				"minLength": 44, +				"maxLength": 44 +			}, +			"signature_scheme": { +				"description": "An integer that identifies the signature scheme used by the submitter.  
See API documentation.", +				"type": "integer", +				"enum": [ 1, 2, 3 ] +			}, +			"signature": { +				"description": "The submitter's signature over the checksum in base64", +				"type": "string", +				"minLength": 44, +				"maxLength": 684 +			}, +			"key_hash": { +				"description": "A public verification-key hash that identifies the signer.", +				"type": "string", +				"minLength": 44, +				"maxLength": 44 +			} +		} +	}, +	"minItems": 1 +} diff --git a/doc/schema/sth.schema.json b/doc/schema/sth.schema.json new file mode 100644 index 0000000..86de2d3 --- /dev/null +++ b/doc/schema/sth.schema.json @@ -0,0 +1,50 @@ +{ +	"$schema": "https://json-schema.org/draft-07/schema#", +	"title": "signed_tree_head_v0", +	"description": "JSON-formatted signed tree head, version 0.", + +	"type": "object", +	"required": [ "timestamp", "tree_size", "root_hash", "signatures" ], +	"properties": { +		"timestamp": { +			"description": "The number of milliseconds since the UNIX epoch (January 1, 1970 00:00:00 UTC).", +			"type": "integer", +			"minimum": 0 +		}, +		"tree_size": { +			"description": "The number of entries that are stored in the log's Merkle tree.", +			"type": "integer", +			"minimum": 0 +		}, +		"root_hash": { +			"description": "The log's Merkle tree root hash in base64.", +			"type": "string", +			"minLength": 44, +			"maxLength": 44 +		}, +		"signatures": { +			"description": "A list of signer-signature pairs.", +			"type": "array", +			"items": { +				"description": "A signer-signature pair.", +				"type": "object", +				"required": [ "key_hash", "signature" ], +				"properties": { +					"key_hash": { +						"description": "A public verification-key hash that identifies the signer in base64.", +						"type": "string", +						"minLength": 44, +						"maxLength": 44 +					}, +					"signature": { +						"description": "The signer's signature over the log's tree_leaf structure in base64.", +						"type": "string", +						"minLength": 44, +						"maxLength": 44 +		
			} +				} +			}, +			"minItems": 1 +		} +	} +} diff --git a/doc/sketch.md b/doc/sketch.md deleted file mode 100644 index 31964e0..0000000 --- a/doc/sketch.md +++ /dev/null @@ -1,372 +0,0 @@ -# System Transparency Logging -This document provides a sketch of System Transparency (ST) logging.  The basic -idea is to insert hashes of system artifacts into a public, append-only, and -tamper-evident transparency log, such that any enforcing client can be sure that -they see the same system artifacts as everyone else.  A system artifact could -be a browser update, an operating system image, a Debian package, or more -generally something that is opaque. - -We take inspiration from the Certificate Transparency Front-End -([CTFE](https://github.com/google/certificate-transparency-go/tree/master/trillian/ctfe)) -that implements [RFC 6962](https://tools.ietf.org/html/rfc6962) for -[Trillian](https://transparency.dev). - -## Log parameters -An ST log is defined by the following parameters: -- `log_identifier`: a `Namespace` of type `ed25519_v1` that defines the log's -signing algorithm and public verification key. -- `supported_namespaces`: a list of namespace types that the log supports. -Entities must use a supported namespace type when posting signed data to the -log. -- `base_url`: prefix used by clients that contact the log, e.g., -example.com:1234/log. -- `final_cosigned_tree_head`: an `StItem` of type `cosigned_tree_head_v*`.  Not -set until the log is turned into read-only mode in preparation of a shutdown. - -ST logs use the same hash strategy as described in RFC 6962: SHA256 with `0x00` -as leaf node prefix and `0x01` as interior node prefix. - -In contrast to Certificate Transparency (CT) **there is no Maximum Merge Delay -(MMD)**.  New entries are merged into the log as soon as possible, and no client -should trust that something is logged until an inclusion proof can be provided -that references a trustworthy STH.  
Therefore, **there are no "promises" of -public logging** as in CT. - -To produce trustworthy STHs a simple form of [witness -cosigning](https://arxiv.org/pdf/1503.08768.pdf) is built into the log. -Witnesses poll the log for the next stable STH, and verify that it is consistent -before posting a cosignature that can then be served by the log. - -## Acceptance criteria and scope -A log should accept a leaf submission if it is: -- Well-formed, see data structure definitions below. -- Digitally signed by a registered namespace. - -Rate limits may be applied per namespace to combat spam.  Namespaces may also be -used by clients to determine which entries belong to who.  It is up to the -submitters to communicate trusted namespaces to their own clients.  In other -words, there are no mappings from namespaces to identities built into the log. -There is also no revocation of namespaces: **we facilitate _detection_ of -compromised signing keys by making artifact hashes public, which is not to be -confused with _prevention_ or even _recovery_ after detection**. - -## Data structure definitions -Data structures are defined and serialized using the presentation language in -[RFC 5246, §4](https://tools.ietf.org/html/rfc5246).  A definition of the log's -Merkle tree can be found in [RFC 6962, -§2](https://tools.ietf.org/html/rfc6962#section-2). - -### Namespace -A _namespace_ is a versioned data structure that contains a public verification -key (or fingerprint), as well as enough information to determine its format, -signing, and verification operations.  Namespaces are used as identifiers, both -for the log itself and the parties that submit artifact hashes and cosignatures. - -``` -enum { -	reserved(0), -	ed25519_v1(1), -	(2^16-1) -} NamespaceFormat; - -struct { -	NamespaceFormat format; -	select (format) { -		case ed25519_v1: Ed25519V1; -	} message; -} Namespace; -``` - -Our namespace format is inspired by Keybase's -[key-id](https://keybase.io/docs/api/1.0/kid). 
- -#### Ed25519V1 -At this time the only supported namespace type is based on Ed25519.  The -namespace field contains the full verification key.  Signing operations and -serialized formats are defined by [RFC -8032](https://tools.ietf.org/html/rfc8032). -``` -struct { -	opaque namespace[32]; // public verification key -} Ed25519V1; -``` - -### `StItem` -A general-purpose `TransItem` is defined in [RFC 6962/bis, -§4.5](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.5). -We define our own `TransItem`, but name it `StItem` to emphasize that they are -not the same. - -``` -enum { -	reserved(0), -	signed_tree_head_v1(1), -	cosigned_tree_head_v1(2), -	consistency_proof_v1(3), -	inclusion_proof_v1(4), -	signed_checksum_v1(5), // leaf type -	(2^16-1) -} StFormat; - -struct { -	StFormat format; -	select (format) { -		case signed_tree_head_v1: SignedTreeHeadV1; -		case cosigned_tree_head_v1: CosignedTreeHeadV1; -		case consistency_proof_v1: ConsistencyProofV1; -		case inclusion_proof_v1: InclusionProofV1; -		case signed_checksum_v1: SignedChecksumV1; -	} message; -} StItem; - -struct { -	StItem items<0..2^32-1>; -} StItemList; -``` - -#### `signed_tree_head_v1` -We use the same tree head definition as in [RFC 6962/bis, -§4.9](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.9). -The resulting _signed_ tree head is packaged differently: a namespace is used as -log identifier, and it is communicated in a `SignatureV1` structure. 
-``` -struct { -	TreeHeadV1 tree_head; -	SignatureV1 signature; -} SignedTreeHeadV1; - -struct { -	uint64 timestamp; -	uint64 tree_size; -	NodeHash root_hash; -	Extension extensions<0..2^16-1>; -} TreeHeadV1; -opaque NodeHash<32..2^8-1>; - -struct { -	Namespace namespace; -	opaque signature<1..2^16-1>; -} SignatureV1; -``` - -#### `cosigned_tree_head_v1` -Transparency logs were designed to be cryptographically verifiable in the -presence of a gossip-audit model that ensures everyone observes _the same -cryptographically verifiable log_.  The gossip-audit model is largely undefined -in today's existing transparency logging ecosystems, which means that the logs -must be trusted to play by the rules.   We wanted to avoid that outcome in our -ecosystem.  Therefore, a gossip-audit model is built into the log. - -The basic idea is that an STH should only be considered valid if it is cosigned -by a number of witnesses that verify the append-only property.  Which witnesses -to trust and under what circumstances is defined by a client-side _witness -cosigning policy_.  For example, -	"require no witness cosigning", -	"must have at least `k` signatures from witnesses A...J", and -	"must have at least `k` signatures from witnesses A...J where one is from -		witness B". - -Witness cosigning policies are beyond the scope of this specification. - -A cosigned STH is composed of an STH and a list of cosignatures.  A cosignature -must cover the serialized STH as an `StItem`, and be produced with a witness -namespace of type `ed25519_v1`. - -``` -struct { -	SignedTreeHeadV1 signed_tree_head; -	SignatureV1 cosignatures<0..2^32-1>; // vector of cosignatures -} CosignedTreeHeadV1; -``` - -#### `consistency_proof_v1` -For the most part we use the same consistency proof definition as in [RFC -6962/bis, -§4.11](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.11). 
-There are two modifications: our log identifier is a namespace rather than an -[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4), -and a consistency proof may be empty. - -``` -struct { -	Namespace log_id; -	uint64 tree_size_1; -	uint64 tree_size_2; -	NodeHash consistency_path<0..2^16-1>; -} ConsistencyProofV1; -``` - -#### `inclusion_proof_v1` -For the most part we use the same inclusion proof definition as in [RFC -6962/bis, -§4.12](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.12). -There are two modifications: our log identifier is a namespace rather than an -[OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4), -and an inclusion proof may be empty. -``` -struct { -	Namespace log_id; -	uint64 tree_size; -	uint64 leaf_index; -	NodeHash inclusion_path<0..2^16-1>; -} InclusionProofV1; -``` - -#### `signed_checksum_v1` -A checksum entry contains a package identifier like `foobar-1.2.3` and an -artifact hash.   It is then signed so that clients can distinguish artifact -hashes from two different software publishers A and B.  For example, the -`signed_checksum_v1` type can help [enforce public binary logging before -accepting a new software -update](https://wiki.mozilla.org/Security/Binary_Transparency). - -``` -struct { -	ChecksumV1 data; -	SignatureV1 signature; -} SignedChecksumV1; - -struct { -	opaque identifier<1..128>; -	opaque checksum<1..64>; -} ChecksumV1; -``` - -It is assumed that clients know how to find the real artifact source (if not -already at hand), such that the logged hash can be recomputed and compared for -equality.  The log is not aware of how artifact hashes are computed, which means -that it is up to the submitters to define hash functions, data formats, and -such. - -## Public endpoints -Clients talk to the log using HTTP(S). Successfully processed requests are -responded to with HTTP status code `200 OK`, and any returned data is -serialized.  
Endpoints without input parameters use HTTP GET requests. -Endpoints that have input parameters HTTP POST a TLS-serialized data structure. -The HTTP content type `application/octet-stream` is used when sending data. - -### add-entry -``` -POST https://<base url>/st/v1/add-entry -``` - -Input: -- An `StItem` of type `signed_checksum_v1`. - -No output. - -### add-cosignature -``` -POST https://<base url>/st/v1/add-cosignature -``` - -Input: -- An `StItem` of type `cosigned_tree_head_v1`.  The list of cosignatures must -be of length one, the witness signature must cover the item's STH, and that STH -must additionally match the log's stable STH that is currently being cosigned. - -No output. - -### get-latest-sth -``` -GET https://<base url>/st/v1/get-latest-sth -``` - -No input. - -Output: -- An `StItem` of type `signed_tree_head_v1` that corresponds to the most -recent STH. - -### get-stable-sth -``` -GET https://<base url>/st/v1/get-stable-sth -``` - -No input. - -Output: -- An `StItem` of type `signed_tree_head_v1` that corresponds to a stable STH -that witnesses should cosign.  The same STH is returned for a period of time. - -### get-cosigned-sth -``` -GET https://<base url>/st/v1/get-cosigned-sth -``` - -No input. - -Output: -- An `StItem` of type `cosigned_tree_head_v1` that corresponds to the most -recent cosigned STH. - -### get-proof-by-hash -``` -POST https://<base url>/st/v1/get-proof-by-hash -``` - -Input: -``` -struct { -	opaque hash[32]; // leaf hash -	uint64 tree_size; // tree size that the proof should be based on -} GetProofByHashV1; -``` - -Output: -- An `StItem` of type `inclusion_proof_v1`. - -### get-consistency-proof -``` -POST https://<base url>/st/v1/get-consistency-proof -``` - -Input: -``` -struct { -	uint64 first; // first tree size that the proof should be based on -	uint64 second; // second tree size that the proof should be based on -} GetConsistencyProofV1; -``` - -Output: -- An `StItem` of type `consistency_proof_v1`. 
- -### get-entries -``` -POST https://<base url>/st/v1/get-entries -``` - -Input: -``` -struct { -	uint64 start; // 0-based index of first entry to retrieve -	uint64 end; // 0-based index of last entry to retrieve in decimal. -} GetEntriesV1; -``` - -Output: -- An `StItem` list where each entry is of type `signed_checksum_v1`.  The first -`StItem` corresponds to the start index, the second one to `start+1`, etc.  The -log may return fewer entries than requested. - -# Appendix A -In the future other namespace types might be supported.  For example, we could -add [RSASSA-PKCS1-v1_5](https://tools.ietf.org/html/rfc3447#section-8.2) as -follows: -1. Add `rsa_v1` format and RSAV1 namespace.  This is what we would register on -the server-side such that the server knows the namespace and complete key. -``` -struct { -	opaque namespace<32>; // key fingerprint -	// + some encoding of public key -} RSAV1; -``` -2. Add `rsassa_pkcs1_5_v1` format and `RSASSAPKCS1_5_v1`.  This is what the -submitter would use to communicate namespace and RSA signature mode. -``` -struct { -	opaque namespace<32>; // key fingerprint -	// + necessary parameters, e.g., SHA256 as hash function -} RSASSAPKCS1_5V1; -``` | 
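Note on the hashing that this patch touches: the removed `doc/sketch.md` specifies the RFC 6962 hash strategy (SHA256 with `0x00` as leaf node prefix and `0x01` as interior node prefix), and the added `doc/schema/sth.schema.json` pins `root_hash` to exactly 44 base64 characters. The Python sketch below shows how the two fit together. It is an illustration only: `leaf_hash`, `node_hash`, and the two-entry tree are hypothetical names and data, not taken from the repository.

```python
import base64
import hashlib

LEAF_PREFIX = b"\x00"      # RFC 6962 leaf node prefix
INTERIOR_PREFIX = b"\x01"  # RFC 6962 interior node prefix


def leaf_hash(data: bytes) -> bytes:
    """Hash a leaf: SHA256(0x00 || data)."""
    return hashlib.sha256(LEAF_PREFIX + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash an interior node: SHA256(0x01 || left || right)."""
    return hashlib.sha256(INTERIOR_PREFIX + left + right).digest()


# Hypothetical two-entry tree: the root is the interior node over both leaves.
root = node_hash(leaf_hash(b"entry-0"), leaf_hash(b"entry-1"))

# A 32-byte SHA256 digest always encodes to 44 base64 characters
# (including one '=' of padding), matching minLength/maxLength in the schema.
root_b64 = base64.b64encode(root).decode("ascii")
print(len(root), len(root_b64))
```

The distinct prefixes keep a leaf from being reinterpreted as an interior node, which is why both docs rely on the RFC 6962 construction rather than a bare SHA256.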

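Note on the request structures in the removed `doc/sketch.md`: `GetProofByHashV1`, `GetConsistencyProofV1`, and `GetEntriesV1` contain only fixed-size fields, so their serialization under the RFC 5246 presentation language is a plain concatenation of big-endian integers and raw bytes, with no length prefixes. A minimal Python sketch follows. The function names are hypothetical, and this is not how the repository itself encodes requests.

```python
import struct


def serialize_get_proof_by_hash(leaf_hash: bytes, tree_size: int) -> bytes:
    """opaque hash[32] followed by uint64 tree_size (big-endian)."""
    if len(leaf_hash) != 32:
        raise ValueError("leaf hash must be exactly 32 bytes")
    return leaf_hash + struct.pack(">Q", tree_size)


def serialize_get_consistency_proof(first: int, second: int) -> bytes:
    """uint64 first followed by uint64 second (both big-endian)."""
    return struct.pack(">QQ", first, second)


def serialize_get_entries(start: int, end: int) -> bytes:
    """uint64 start followed by uint64 end (both big-endian)."""
    return struct.pack(">QQ", start, end)


# Fixed-size fields give fixed-size messages: 40, 16, and 16 bytes.
print(len(serialize_get_proof_by_hash(bytes(32), 5)))
print(len(serialize_get_consistency_proof(1, 2)))
print(len(serialize_get_entries(0, 9)))
```

Variable-length vectors such as `consistency_path<0..2^16-1>` would additionally need a length prefix sized to the declared maximum, which this sketch deliberately leaves out.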