# System Transparency Logging This document provides a sketch of System Transparency (ST) logging. The basic idea is to insert hashes of system artifacts into a public, append-only, and tamper-evident transparency log, such that any enforcing client can be sure that they see the same system artifacts as everyone else. A system artifact could be a browser update, an operating system image, a Debian package, or more generally something that is opaque. We take inspiration from the Certificate Transparency Front-End ([CTFE](https://github.com/google/certificate-transparency-go/tree/master/trillian/ctfe)) that implements [RFC 6962](https://tools.ietf.org/html/rfc6962) for [Trillian](https://transparency.dev). ## Log parameters and overview An ST log is defined by the following parameters: - `log_identifier`: namespace of type `ed25519_v1` that defines the log's signing algorithm and public verification key. - `supported_namespaces`: a list of namespace types the log supports. Submitters must use a supported namespace type when adding new entries. - `base_url`: prefix used by clients that contact the log, e.g., example.com:1234/log. ST logs use the same hash strategy as described in RFC 6962: SHA256 with `0x00` as leaf node prefix and `0x01` as interior node prefix. In contrast to Certificate Transparency (CT) **there is no Maximum Merge Delay (MMD)**. New entries are merged into the log as soon as possible, and no client should trust that something is logged until an inclusion proof can be provided that references a trustworthy STH. Therefore, **there are no "promises" of public logging** as in CT. To produce trustworthy STHs a simple form of [witness cosigning](https://arxiv.org/pdf/1503.08768.pdf) is built into the log. ## Minimum acceptance criteria and scope A log should accept a submission if it is: - Well-formed, see data structure definitions below. - Digitally signed by a registered namespace. Rate limits may be applied per namespace to combat spam. Namespaces may also be used by clients to determine which entries belong to who. It is up to the submitters to communicate trusted namespaces to their own clients. In other words, there are no mappings from namespaces to identities built into the log. There is also no revocation of namespaces: **we facilitate _detection_ of compromised signing keys by making artifact hashes public, which is not to be confused with _prevention_ or even _recovery_ after detection**. ## Data structure definitions Data structures are defined and serialized using the presentation language in [RFC 5246, §4](https://tools.ietf.org/html/rfc5246). A definition of the log's Merkle tree can be found in [RFC 6962, §2](https://tools.ietf.org/html/rfc6962#section-2). ### Namespace A _namespace_ is a versioned data structure that contains a public verification key (or fingerprint), as well as enough information to determine its format, signing, and verification operations. Namespaces are used as identifiers, both for the log itself and the parties that submit artifact hashes and cosignatures. ``` enum { reserved(0), ed25519_v1(1), (65535) } NamespaceFormat; struct { NamespaceFormat format; select (format) { case ed25519_v1: Ed25519V1; } message; } Namespace; ``` Our namespace format is inspired by Keybase's [key-id](https://keybase.io/docs/api/1.0/kid). #### Ed25519V1 At this time the only supported namespace type is based on Ed25519. The namespace field contains the full verification key. Signing operations and serialized formats are defined by [RFC 8032](https://tools.ietf.org/html/rfc8032). ``` struct { opaque namespace<32>; // public verification key } Ed25519V1; ``` ### `StItem` A general-purpose `TransItem` is defined in [RFC 6962/bis, §4.5](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.5). We define our own `TransItem`, but name it `StItem` to emphasize that they are not the same. ``` enum { reserved(0), signed_tree_head_v1(1), cosigned_tree_head_v1(2), consistency_proof_v1(3), inclusion_proof_v1(4), checksum_v1(5), // leaf type (65535) } StFormat; struct { StFormat format; select (format) { case signed_tree_head_v1: SignedTreeHeadV1; case cosigned_tree_head_v1: CosignedTreeHeadV1; case consistency_proof_v1: ConsistencyProofV1; case inclusion_proof_v1: InclusionProofV1; case checksum_v1: ChecksumV1; } message; } StItem; ``` #### `signed_tree_head_v1` For the most part we use the same signed tree head definition as in [RFC 6962/bis, §4.9](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.10). There is one modification: our log identifier is a namespace rather than an [OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4). ``` opaque NodeHash<32..2^8-1>; struct { uint64 timestamp; uint64 tree_size; NodeHash root_hash; Extension sth_extensions<0..2^16-1>; } TreeHeadV1; struct { Namespace log_id; TreeHeadV1 tree_head; opaque signature<0..2^16-1>; } SignedTreeHeadV1; ``` #### `cosigned_tree_head_v1` Transparency logs were designed to be cryptographically verifiable in the presence of a gossip-audit model that ensures everyone observes _the same cryptographically verifiable log_. The gossip-audit model is largely undefined in today's existing transparency logging ecosystems, which means that the logs must be trusted to play by the rules. We wanted to avoid that outcome in our ecosystem. Therefore, a gossip-audit model is built into the log. The basic idea is that an STH should only be considered valid if it is cosigned by a number of witnesses that verify the append-only property. Which witnesses to trust and under what circumstances is defined by a client-side _witness cosigning policy_. For example, "require no witness cosigning", "must have at least `k` signatures from witnesses A...J", and "must have at least `k` signatures from witnesses A...J where one is from witness B". Witness cosigning policies are beyond the scope of this specification. A cosigned STH is composed of an STH and a list of cosignatures. A cosignature must cover the serialized STH as an `StItem`, and be produced with a witness namespace of type `ed25519_v1`. ``` struct { SignedTreeHeadV1 sth; CosignatureV1 <0..4294967295>; // vector of cosignatures } CosignedTreeHeadV1; struct { Namespace witness_id; opaque signature<1..65535>; } CosignatureV1; ``` #### `consistency_proof_v1` For the most part we use the same consistency proof definition as in [RFC 6962/bis, §4.11](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.11). There are two modifications: our log identifier is a namespace rather than an [OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4), and a consistency proof may be empty. ``` struct { Namespace log_id; uint64 tree_size_1; uint64 tree_size_2; NodeHash consistency_path<0..2^16-1>; } ConsistencyProofV1; ``` #### `inclusion_proof_v1` For the most part we use the same inclusion proof definition as in [RFC 6962/bis, §4.12](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.12). There are two modifications: our log identifier is a namespace rather than an [OID](https://tools.ietf.org/html/draft-ietf-trans-rfc6962-bis-34#section-4.4), and an inclusion proof may be empty. ``` struct { Namespace log_id; uint64 tree_size; uint64 leaf_index; NodeHash inclusion_path<0..2^16-1>; } InclusionProofV1; ``` #### `checksum_v1` A checksum entry contains a package identifier such as `foobar-1.2.3` and an artifact hash. It also contains a namespace that allows clients to distinguish artifact hashes from two different software publishers A and B. For example, the `checksum_v1` can help [enforce public binary logging before accepting a new software update](https://wiki.mozilla.org/Security/Binary_Transparency). ``` struct { opaque package<1..2^8-1>; // package identifier opaque checksum<1..64>; // hash of some artifact Namespace namespace; } ChecksumV1; ``` It is assumed that clients know how to find the real artifact source (if not already at hand), such that the logged hash can be recomputed and compared for equality. The log is not aware of how artifact hashes are computed, which means that it is up to the submitters to define hash functions, data formats, and such. ## Public endpoints Clients talk to the log using HTTP(S). Parameters for HTTP GET requests are URL-encoded with `Content-Type: application/x-www-form-urlencoded`. In contrast, HTTP POST requests post a single base64-encoded serialized `StItem` with `Content-Type: text/plain`. The log returns HTTP Status `200 OK` to signal success, and the response (if any) is serialized and then base64-encoded with `Content-Type: text/plain`. ### add-entry ``` POST https:///st/v1/add-entry ``` Input: - An `StItem` of type `checksum_v1`. No output. ### add-cosignature ``` POST https:///st/v1/add-cosignature ``` Input: - An `StItem` of type `cosigned_tree_head_v1`. The list of cosignatures must be of length one, the witness signature must cover the item's STH, and that STH must additionally match the log's stable STH that is currently being cosigned. No output. ### get-latest-sth ``` GET https:///st/v1/get-latest-sth ``` No input. Output: - An `StItem` of type `signed_tree_head_v1` that corresponds to the most recent STH. ### get-stable-sth ``` GET https:///st/v1/get-stable-sth ``` No input. Output: - An `StItem` of type `signed_tree_head_v1` that corresponds to a stable STH that witnesses should cosign. The same STH is returned for a period of time. ### get-cosigned-sth ``` GET https:///st/v1/get-cosigned-sth ``` No input. Output: - An `StItem` of type `cosigned_tree_head_v1` that corresponds to the most recent cosigned STH. ### get-proof-by-hash ``` GET https:///st/v1/get-proof-by-hash ``` Input: - `hash`: a base-64 encoded leaf hash using the log's hash function. - `tree_size`: the tree size that the proof should be based on in decimal. Output: - An `StItem` of type `inclusion_proof_v1`. ### get-consistency-proof ``` GET https:///st/v1/get-consistency-proof ``` Input: - first: the `tree_size` of the older tree in decimal. - second: the `tree_size` of the newer tree in decimal. Output: - An `StItem` of type `consistency_proof_v1`. ### get-entries ``` GET https:///st/v1/get-entries ``` Input: - `start`: 0-based index of first entry to retrieve in decimal. - `end`: 0-based index of last entry to retrieve in decimal. Output: - An `StItem` list where each entry is of type `checksum_v1`. The first `StItem` corresponds to the start index, the second one to `start+1`, etc. The log may return fewer entries than requested. ## Witness behavior TODO: docdoc expected witness behavior somewhere # Appendix A In the future other namespace types might be supported. For example, we could add [RSASSA-PKCS1-v1_5](https://tools.ietf.org/html/rfc3447#section-8.2) as follows: 1. Add `rsa_v1` format and RSAV1 namespace. This is what we would register on the server-side such that the server knows the namespace and complete key. ``` struct { opaque namespace<32>; // key fingerprint // + some encoding of public key } RSAV1; ``` 2. Add `rsassa_pkcs1_5_v1` format and `RSASSAPKCS1_5_v1`. This is what the submitter would use to communicate namespace and RSA signature mode. ``` struct { opaque namespace<32>; // key fingerprint // + necessary parameters, e.g., SHA256 as hash function } RSASSAPKCS1_5V1; ```