From 348c58a567f18b111ca41653bf550cc64b15443b Mon Sep 17 00:00:00 2001 From: Linus Nordberg Date: Sun, 10 Oct 2021 20:00:32 +0200 Subject: fixed small/medium issues and left some comments - Deleted unnecessary roadmap - Clarified distribution and verification section - Proposed down-to-the-point text for domain hint description - Left comments that we should consider addressing - A bunch of minor edits For transparency this commit was squashed and rebased by rgdd. --- doc/design.md | 130 ++++++++++++++++++++++++++++++---------------------------- 1 file changed, 67 insertions(+), 63 deletions(-) diff --git a/doc/design.md b/doc/design.md index 535685b..bba234c 100644 --- a/doc/design.md +++ b/doc/design.md @@ -33,7 +33,7 @@ sigsum logging as pre-hashed digital signing with transparency. The signing party is called a _signer_. The user of the signed data is called a _verifier_. -The problem with _just digital signing_ is that it is difficult to determine +The problem with _digital signing on its own_ is that it is difficult to determine whether the signed data is _actually the data that should have been signed_. How would we detect if a secret signing key got compromised? How would we detect if something was signed by mistake, or even worse, @@ -46,7 +46,7 @@ block that can be used to facilitate verification of falsifiable claims. Examples include: - Everyone gets the same executable binaries [\[BT\]](https://wiki.mozilla.org/Security/Binary_Transparency) -- A domain does not serve malicious javascript +- A web server does not serve malicious javascript [\[SRI\]](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity) - A list of key-value pairs is maintained with a certain policy. @@ -107,12 +107,8 @@ that a verifier is required to support. Signers, monitors, and witnesses additionally need to interact with a sigsum log's line-terminated ASCII HTTP(S) [API](https://git.sigsum.org/sigsum/tree/doc/api.md). 
-### 1.3 - Roadmap -First we describe our threat model. Then we give a bird's view of the design. -Finally, we wrap up with an incomplete frequently asked questions section. - ## 2 - Threat model -We consider a powerful attacker that gained control of a signer's signing and +We consider a powerful attacker that has gained control of a signer's software signing and release infrastructure. This covers a weaker form of attacker that is able to sign data and distribute it to a subset of isolated verifiers. For example, this is essentially what the FBI requested from Apple in the San Bernardino case @@ -121,14 +117,14 @@ The fact that signing keys and related infrastructure components get compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). -The attacker can also gain control of the log's signing key and infrastructure. +The same attacker has also gained control of the signing key and infrastructure of a sigsum log used for transparency. This covers a weaker form of attacker that is able to sign log data and distribute it to a subset of isolated verifiers. For example, this could have been the case when a remote code execution was found for a Certificate Transparency Log [\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM). -The overall system is said to be secure if a monitor can discover every signed +The overall system is said to be secure if a log monitor can discover every signed checksum that a verifier would accept. A log can misbehave by not presenting the same append-only Merkle tree to everyone because it is attacker-controlled. However, a log operator would only do that if it is likely to go unnoticed. @@ -160,7 +156,7 @@ we give a brief primer below. 
| +---------->| Monitor |<-------+ |proof v +---------+ v +---------+ | +----------+ - | witness | | false | Verifier | + | Witness | | false | Verifier | +---------+ | claim +----------+ v investigate @@ -194,48 +190,45 @@ verify that this tree is fresh and append-only before cosigning it to achieve a distributed form of trust. A tree leaf contains four fields: - **shard_hint**: a number that binds the leaf to a particular _shard interval_. Sharding means that the log has a predefined time during which logging requests -are accepted. Once elapsed, the log can be shut down. +are accepted. Once elapsed, the log can be shut down or be made read-only. - **checksum**: most likely a hash of some data. The log is not aware of data; just checksums. - **signature**: a digital signature that is computed by a signer over the -leaf's shard hint and checksum. +shard hint and checksum. - **key_hash**: a cryptographic hash of the signer's verification key that can be used to verify the signature. -A shard hint is included in the signed statement to prevent replays in a -non-overlapping shard. See details in Section 4.2. - Any additional metadata that is use-case specific can be stored as part of the data that a checksum represents. Where data is located is use-case specific. Note that a key hash is logged rather than the public key itself. This reduces the likelihood that an untrusted key is discovered and used by mistake. In -other words, verifiers and monitors must locate keys and trust them explicitly. +other words, verifiers and monitors must locate signer verification keys independently of logs, and trust them explicitly. ### 3.2 - Usage pattern #### 3.2.1 - Prepare a request -A signer selects a shard hint and a checksum that should be logged. The -selected shard hint represents an abstract statement like "sigsum logs that are -active during 2021". 
The selected checksum is most likely the output of a +A signer selects a shard hint representing an abstract statement like "sigsum logs that are +active during 2021". +A shard hint is +incorporated into the signed statement to ensure that a log's leaves cannot be +replayed in a non-overlapping shard, for example by a good Samaritan. + +The signer selects a checksum that should be logged, most likely the output of a hash function. For example, it could be the hash of an executable binary. -The selected shard hint and checksum are signed by the signer. A shard hint is -incorporated into the signed statement to ensure that a log's leaves cannot be -replayed in a non-overlapping shard by a good Samaritan. +The signer signs the selected shard hint and checksum. The signer also has to do a one-time DNS setup. As outlined below, logs will check that _some domain_ is aware of the signer's verification key. This is -part of a defense mechanism that helps us combat log spam. +part of a defense mechanism that helps log operators to deal with log spam. +Once present in DNS, a verification key can be used in log requests. #### 3.2.2 - Submit request Sigsum logs implement an HTTP(S) API. Input and output is human-readable and -uses a simple ASCII format. A more complex parser like JSON is not needed -because the exchanged data structures are primitive enough. - -A signer submits their shard hint, checksum, signature, and public verification -key as key-value pairs. The log uses the public verification key to check that -the signature is valid, then hashes it to construct the leaf's key hash. +use a simple ASCII format. A more complex parser like JSON is not needed +since the data structures being exchanged are primitive enough. +[[move domain hint discussion to its own section vvv /ln]] The signer also submits a _domain hint_. The log will download a DNS TXT resource record based on the provided domain name. The downloaded result must match the public verification key hash. 
By verifying that all signers control a @@ -253,19 +246,24 @@ more than one log to be reliable in case of downtime or unexpected events like A signer's domain hint is not part of the logged leaf because key management is more complex than that. A separate project should focus on transparent key management. Our work is about transparent _key-usage_. +[[^^^ move domain hint discussion to its own section /ln]] + +A signer submits shard hint, checksum, signature, public verification +key and domain hint as ASCII key-value pairs. The log verifies that the public verification key is present in DNS and uses it to check that +the signature is valid, then constructs the Merkle tree leaf as described in 3.1 and hashes it to construct the leaf's key hash. -A sigsum log _tries_ to incorporate a leaf into its Merkle tree if a logging -request is accepted. There are however no _promises of public logging_ as in -Certificate Transparency. Therefore, sigsum logs do not provide low-latency. A +When a submitted logging +request is accepted, the log _tries_ to incorporate the submitted leaf into its Merkle tree. There are however no _promises of public logging_ as in +Certificate Transparency. Therefore, sigsum logs do not provide low latency -- the signer has to wait for an inclusion proof and a cosigned tree head. #### 3.2.3 - Wait for witness cosigning -Sigsum logs freeze a tree head every five minutes. Cosigning witnesses poll the -logs for so-called _to-sign_ tree heads, verifying that they are fresh and +Sigsum logs periodically freeze the most current tree head, typically every five minutes. Cosigning witnesses poll +logs for so-called _to-sign_ tree heads and verify that they are fresh and append-only before doing a cosignature operation. Cosignatures are posted back -to the logs so that signers can easily fetch the finalized cosigned tree heads. +to logs so that signers can easily fetch finalized cosigned tree heads. 
-It takes five to ten minutes before a signer's distribution phase can start. +It thus takes five to ten minutes before a signer's distribution phase can start. The added latency is a trade-off that simplifies sigsum logging by removing the need for reactive gossip-audit protocols [\[G1,](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7346853) @@ -276,40 +274,41 @@ need for reactive gossip-audit protocols Use-cases like instant certificate issuance are not supported by design. #### 3.2.4 - Distribution -After a signer collected proofs of public logging the distribution phase can +Once a signer has collected proofs of public logging the distribution phase can start. Distribution happens using the same mechanism that is normally used for the data. For example, on a website, in a git repository, etc. +Signers distribute at least the following pieces: **Data:** -the signer's data. It can be used to reproduce a logged checksum. +the signer's data, for example an executable binary. It can be used to reproduce a logged checksum. **Metadata:** -a signer's shard hint, signature, and verification key hash. Note that the +the shard hint, the signature over shard hint and checksum, and the verification key hash used in the log request. Note that the combination of data and metadata can be used to reconstruct the logged leaf. **Proof:** -an inclusion proof that leads up to a cosigned tree head. +an inclusion proof that leads up to a cosigned tree head. Note that _proof_ +refers to the collection of an inclusion proof and a cosigned tree head. #### 3.2.5 - Verification -A verifier should only accept the distributed data if these criteria hold: -1. The signer's checksum is correct for the distributed data. -2. The signer's signed statement is valid for the specified public key. -3. The provided tree head can be reconstructed from the logged leaf and +A verifier should only accept the distributed data if the following criteria hold: +1. 
The data's checksum and shard hint are signed using the specified public key. +2. The provided tree head can be reconstructed from the logged leaf and its inclusion proof. -4. The provided tree head is from a known log with enough valid cosignatures. +3. The provided tree head is from a known log with enough valid cosignatures. Notice that there are no new outbound network connections for a verifier. Therefore, a proof of public logging is only as convincing as the tree head that -an inclusion proof leads up to. Sigsum logs have trustworthy tree heads due to -using a variant of witness cosigning. In other words, a verifier cannot be -tricked into accepting some data whose checksum have yet to be publicly logged +an inclusion proof leads up to. Sigsum logs have trustworthy tree heads thanks to +using a variant of witness cosigning. A verifier cannot be +tricked into accepting data whose checksum have not been publicly logged unless the attacker controls more than a threshold of witnesses. #### 3.2.6 - Monitoring An often overlooked step is that transparency logging falls short if no-one keeps track of what appears in the public logs. Monitoring is necessarily use-case -specific in sigsum. At minimum, you need to locate relevant public keys. You -may also need to be aware of how to locate the data that a checksum represents. +specific in sigsum. At a minimum, a monitor needs to locate relevant public keys. It +may also need to be aware of how to locate the data that a given checksum represents. It should also be noted that sigsum logging can facilitate detection of attacks even if a verifier fails open by enforcing the third and fourth criteria partially @@ -317,34 +316,39 @@ in Section 3.2.5. For example, the fact that a distribution mechanism does not serve proofs of public logging could indicate that there is an ongoing attack against a signer's distributed infrastructure. A monitor may detect that. 
+[["fails open" needs an explanation /ln]] +[["by enforcing the third and fourth criteria partially in Section 3.2.5" needs a little more context -- partially how? /ln]] + ### 3.3 - Summary +[[move the summary to the top of section 3? /ln]] Sigsum logs are sharded and shut down at predefined times. A sigsum log can shut down _safely_ because verification on the verifier-side is not interactive. + The difficulty of bypassing public logging is based on the difficulty of controlling enough independent witnesses. A witness checks that a log's tree -head is correct before cosigning. Correct refers to fresh and append-only. +head is correct before cosigning. Correctness includes freshness and the append-only property. Signers, monitors, and witnesses interact with the logs using an ASCII HTTP(S) -API. A signer must prove that they own a domain name as an anti-spam mechanism. -No data and rich metadata is logged to protect the log operator from poisoning. -It also keeps log operations simpler because there are fewer bytes to manage. +API. A signer must prove that they control a DNS domain name as an anti-spam mechanism. +No data or rich metadata is being logged, to protect the log operator from poisoning. +This also keeps log operations simpler because there are less data to manage. -Verifiers interact with the logs indirectly through their signer's existing +Verifiers interact with logs indirectly through their signer's existing distribution mechanism. Signers are responsible for logging signed checksums -and distributing necessary proofs of public logging. Monitor discover signed -checksums in the logs, generating alerts if any key-usage is inappropriate. +and distributing necessary proofs of public logging. Monitors discover signed +checksums in the logs and generate alerts if any key-usage is inappropriate. ### 4 - Frequently Asked Questions -#### 4.1 - What parts of the design are we still thinking about? +#### 4.1 - What parts of the design are up for debate? 
A brief summary appeared in our archive on [2021-10-05](https://git.sigsum.org/sigsum/tree/archive/2021-10-05-open-design-thoughts?id=5c02770b5bd7d43b9327623d3de9adeda2468e84). It may be incomplete, but covers some details that are worth thinking more -about. We are still open to remove, add, or change things if it is motivated. +about. We are still open to remove, add, or change things. #### 4.2 - What is the point of having a shard hint? Unlike TLS certificates which already have validity ranges, a checksum does not carry any such information. Therefore, we require that the signer selects a -shard hint. The selected shard hint must be in a log's shard interval. A shard +shard hint. The selected shard hint must be within a log's shard interval. A shard interval is defined by a start time and an end time. Both ends of the shard interval are inclusive and expressed as the number of seconds since the UNIX epoch (January 1, 1970 00:00 UTC). @@ -360,13 +364,13 @@ set it as large as possible. If a verified timestamp is needed to reason about the time of logging, you may use a cosigned tree head instead [\[TS\]](https://git.sigsum.org/sigsum/commit/?id=fef460586e847e378a197381ef1ae3a64e6ea38b). -#### 4.3 - XXX +#### 4.3 - More questions - Why not store data in the log? XXX: answered enough already? - Why not store rich metadata in the log? XXX: answered enough already? - What (de)serialization parsers are needed and why? - What cryptographic primitives are supported and why? - What thought went into witness cosigning? Compare with other approaches, and should include `get-tree-head-*` endpoints in more detail. -- Are there any privacy concerns? +- What are the privacy concerns? - How does it work with more than one log? -- What policy should a verifier use? +- What policy should a verifier follow? 
-- cgit v1.2.3 From 05548e4e289890f318d93b90cb47730c45acc210 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sun, 10 Oct 2021 20:00:44 +0200 Subject: refactored extended domain hint text into FAQ --- doc/design.md | 43 ++++++++++++++++++++++--------------------- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/doc/design.md b/doc/design.md index bba234c..9df24f0 100644 --- a/doc/design.md +++ b/doc/design.md @@ -228,26 +228,6 @@ Sigsum logs implement an HTTP(S) API. Input and output is human-readable and use a simple ASCII format. A more complex parser like JSON is not needed since the data structures being exchanged are primitive enough. -[[move domain hint discussion to its own section vvv /ln]] -The signer also submits a _domain hint_. The log will download a DNS TXT -resource record based on the provided domain name. The downloaded result must -match the public verification key hash. By verifying that all signers control a -domain that is aware of their verification key, rate limits can be applied per -second-level domain. You would need a large number of domain names to spam the -log in any significant way if rate limits are not too loose. - -Using DNS to combat spam is convenient because many signers already have a -domain name. A single domain name is also relatively cheap. Another benefit is -that the same anti-spam mechanism can be used across several independent logs -without coordination. This is important because a healthy log ecosystem needs -more than one log to be reliable in case of downtime or unexpected events like - [cosmic rays](https://groups.google.com/a/chromium.org/g/ct-policy/c/PCkKU357M2Q/). - -A signer's domain hint is not part of the logged leaf because key management is -more complex than that. A separate project should focus on transparent key -management. Our work is about transparent _key-usage_. 
-[[^^^ move domain hint discussion to its own section /ln]] - A signer submits shard hint, checksum, signature, public verification key and domain hint as ASCII key-value pairs. The log verifies that the public verification key is present in DNS and uses it to check that the signature is valid, then constructs the Merkle tree leaf as described in 3.1 and hashes it to construct the leaf's key hash. @@ -364,7 +344,28 @@ set it as large as possible. If a verified timestamp is needed to reason about the time of logging, you may use a cosigned tree head instead [\[TS\]](https://git.sigsum.org/sigsum/commit/?id=fef460586e847e378a197381ef1ae3a64e6ea38b). -#### 4.3 - More questions +#### 4.3 - What is the point of having a domain hint? +Domain hints help log operators combat spam. By verifying that every signer +controls a domain name that is aware of their public key, rate limits can be +applied per second-level domain. You would need a large number of domain names +to spam a log in any significant way if rate limits are not set too loose. + +Notice that the effect of spam is not only about storage. It is also about +merge latencies. Too many submissions from a single party may render a log +unusable for others. This kind of incident happened in the real-world already + [\[Aviator\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/ZZf3iryLgCo/m/rdTAHWcdBgAJ). + +Using DNS as an anti-spam mechanism is not a perfect solution. It is however +better than not having any anti-spam mechanism at all. We picked DNS because +many signers have a domain. A single domain name is also relatively cheap. + +A signer's domain hint is not part of the logged leaf because key management is +more complex than that. A separate project should focus on transparent key +management. Our work is about transparent _key-usage_. + +We are considering if additional anti-spam mechanisms should be supported. + +#### 4.4 - More questions - Why not store data in the log? 
XXX: answered enough already? - Why not store rich metadata in the log? XXX: answered enough already? - What (de)serialization parsers are needed and why? -- cgit v1.2.3 From 8211f0ecdf8a65584d34ee177616dda80ebcab17 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sun, 10 Oct 2021 20:00:51 +0200 Subject: reworked partial enforcement of verification criteria - Expanded into two separate examples - Moved it into the verification subsection --- doc/design.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/doc/design.md b/doc/design.md index 9df24f0..22ceb1d 100644 --- a/doc/design.md +++ b/doc/design.md @@ -284,21 +284,22 @@ using a variant of witness cosigning. A verifier cannot be tricked into accepting data whose checksum have not been publicly logged unless the attacker controls more than a threshold of witnesses. +In a less ideal world sigsum logging can facilitate detection of attacks if a +verifier _fails open_ by enforcing the second and third criteria partially. For +example, some verifier may not enforce these criteria at all, and so would +accept data from a malicious data mirror without proofs of public logging. +Someone in a similar area may be able to detect this and report the attack. + +Another example of partial enforcement would be if a verifier required logging +in a known log without witnessing. Attacks against the signer's signing and +release infrastructure would be detected if the log is not compromised. + #### 3.2.6 - Monitoring An often overlooked step is that transparency logging falls short if no-one keeps track of what appears in the public logs. Monitoring is necessarily use-case specific in sigsum. At a minimum, a monitor needs to locate relevant public keys. It may also need to be aware of how to locate the data that a given checksum represents. 
-It should also be noted that sigsum logging can facilitate detection of attacks -even if a verifier fails open by enforcing the third and fourth criteria partially -in Section 3.2.5. For example, the fact that a distribution mechanism does not -serve proofs of public logging could indicate that there is an ongoing attack -against a signer's distributed infrastructure. A monitor may detect that. - -[["fails open" needs an explanation /ln]] -[["by enforcing the third and fourth criteria partially in Section 3.2.5" needs a little more context -- partially how? /ln]] - ### 3.3 - Summary [[move the summary to the top of section 3? /ln]] Sigsum logs are sharded and shut down at predefined times. A sigsum log can -- cgit v1.2.3 From 90f8d431fe694b0b6b040c31447d961bcc75e52f Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sun, 10 Oct 2021 20:00:58 +0200 Subject: keep summary section at its current location I don't think it improves our design document by being moved. We already have a summary of properties in the introduction, and an easier primer at the start of Section 3 that is strongly coupled to Figure 1. Perhaps it is no longer necessary though. When we wrote this we did not have a summary of properties in introduction, or a relatively detailed walk-through of the log's intended usage-pattern. I'm fine with both keeping as is or deleting if it feels redundant. --- doc/design.md | 1 - 1 file changed, 1 deletion(-) diff --git a/doc/design.md b/doc/design.md index 22ceb1d..03769d2 100644 --- a/doc/design.md +++ b/doc/design.md @@ -301,7 +301,6 @@ specific in sigsum. At a minimum, a monitor needs to locate relevant public key may also need to be aware of how to locate the data that a given checksum represents. ### 3.3 - Summary -[[move the summary to the top of section 3? /ln]] Sigsum logs are sharded and shut down at predefined times. A sigsum log can shut down _safely_ because verification on the verifier-side is not interactive. 
-- cgit v1.2.3 From 5df66c5b5498195b5b076ca5f0eebdce8a9a7881 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sun, 10 Oct 2021 20:01:06 +0200 Subject: added a few minor edits --- doc/.design.md.swp | Bin 0 -> 36864 bytes doc/design.md | 30 ++++++++++++++---------------- 2 files changed, 14 insertions(+), 16 deletions(-) create mode 100644 doc/.design.md.swp diff --git a/doc/.design.md.swp b/doc/.design.md.swp new file mode 100644 index 0000000..ff611e3 Binary files /dev/null and b/doc/.design.md.swp differ diff --git a/doc/design.md b/doc/design.md index 03769d2..f16fa81 100644 --- a/doc/design.md +++ b/doc/design.md @@ -108,7 +108,7 @@ additionally need to interact with a sigsum log's line-terminated ASCII HTTP(S) [API](https://git.sigsum.org/sigsum/tree/doc/api.md). ## 2 - Threat model -We consider a powerful attacker that has gained control of a signer's software signing and +We consider a powerful attacker that gained control of a signer's signing and release infrastructure. This covers a weaker form of attacker that is able to sign data and distribute it to a subset of isolated verifiers. For example, this is essentially what the FBI requested from Apple in the San Bernardino case @@ -117,7 +117,7 @@ The fact that signing keys and related infrastructure components get compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/). -The same attacker has also gained control of the signing key and infrastructure of a sigsum log used for transparency. +The same attacker also gained control of the signing key and infrastructure of a sigsum log that is used for transparency. This covers a weaker form of attacker that is able to sign log data and distribute it to a subset of isolated verifiers. For example, this could have been the case when a remote code execution was found for a Certificate @@ -194,7 +194,7 @@ are accepted. 
Once elapsed, the log can be shut down or be made read-only. - **checksum**: most likely a hash of some data. The log is not aware of data; just checksums. - **signature**: a digital signature that is computed by a signer over the -shard hint and checksum. +selected shard hint and checksum. - **key_hash**: a cryptographic hash of the signer's verification key that can be used to verify the signature. @@ -207,30 +207,28 @@ other words, verifiers and monitors must locate signer verification keys indepen ### 3.2 - Usage pattern #### 3.2.1 - Prepare a request -A signer selects a shard hint representing an abstract statement like "sigsum logs that are -active during 2021". -A shard hint is -incorporated into the signed statement to ensure that a log's leaves cannot be -replayed in a non-overlapping shard, for example by a good Samaritan. - -The signer selects a checksum that should be logged, most likely the output of a -hash function. For example, it could be the hash of an executable binary. +A signer selects a checksum that should be logged. For example, it could be the +hash of an executable binary or something else. The signer also selects a shard +hint representing an abstract statement like "sigsum logs that are active during +2021". Shard hints ensure that a log's leaves cannot be replayed in a +non-overlapping shard. The signer signs the selected shard hint and checksum. The signer also has to do a one-time DNS setup. As outlined below, logs will check that _some domain_ is aware of the signer's verification key. This is part of a defense mechanism that helps log operators to deal with log spam. -Once present in DNS, a verification key can be used in log requests. +Once present in DNS, a verification key can be used in subsequent log requests. #### 3.2.2 - Submit request Sigsum logs implement an HTTP(S) API. Input and output is human-readable and use a simple ASCII format. 
A more complex parser like JSON is not needed since the data structures being exchanged are primitive enough. -A signer submits shard hint, checksum, signature, public verification +The signer submits their shard hint, checksum, signature, public verification key and domain hint as ASCII key-value pairs. The log verifies that the public verification key is present in DNS and uses it to check that -the signature is valid, then constructs the Merkle tree leaf as described in 3.1 and hashes it to construct the leaf's key hash. +the signature is valid, then hashes it to constructs the Merkle tree leaf as described in Section 3.1. + When a submitted logging request is accepted, the log _tries_ to incorporate the submitted leaf into its Merkle tree. There are however no _promises of public logging_ as in @@ -297,8 +295,8 @@ release infrastructure would be detected if the log is not compromised. #### 3.2.6 - Monitoring An often overlooked step is that transparency logging falls short if no-one keeps track of what appears in the public logs. Monitoring is necessarily use-case -specific in sigsum. At a minimum, a monitor needs to locate relevant public keys. It -may also need to be aware of how to locate the data that a given checksum represents. +specific in sigsum. At a minimum, monitors need to locate relevant public keys. They +may also need to be aware of how to locate the data that found checksums represent. ### 3.3 - Summary Sigsum logs are sharded and shut down at predefined times. 
A sigsum log can -- cgit v1.2.3 From ab2b24a7b9fab6ff6f13c3558f8007a41692038e Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sun, 10 Oct 2021 20:01:12 +0200 Subject: fixed overflowing lines, no content changes --- doc/.design.md.swp | Bin 36864 -> 0 bytes doc/design.md | 99 +++++++++++++++++++++++++++++------------------------ 2 files changed, 55 insertions(+), 44 deletions(-) delete mode 100644 doc/.design.md.swp diff --git a/doc/.design.md.swp b/doc/.design.md.swp deleted file mode 100644 index ff611e3..0000000 Binary files a/doc/.design.md.swp and /dev/null differ diff --git a/doc/design.md b/doc/design.md index f16fa81..e155762 100644 --- a/doc/design.md +++ b/doc/design.md @@ -33,11 +33,11 @@ sigsum logging as pre-hashed digital signing with transparency. The signing party is called a _signer_. The user of the signed data is called a _verifier_. -The problem with _digital signing on its own_ is that it is difficult to determine -whether the signed data is _actually the data that should have been signed_. -How would we detect if a secret signing key got compromised? -How would we detect if something was signed by mistake, or even worse, -if the signing party was forced to sign malicious data against their will? +The problem with _digital signing on its own_ is that it is difficult to +determine whether the signed data is _actually the data that should have been +signed_. How would we detect if a secret signing key got compromised? How +would we detect if something was signed by mistake, or even worse, if the +signing party was forced to sign malicious data against their will? Sigsum logs make it possible to answers these types of questions. The basic idea is to make a signer's _key-usage_ transparent. 
This is a powerful building
@@ -117,16 +117,17 @@ The fact that signing keys and related infrastructure components get compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
-The same attacker also gained control of the signing key and infrastructure of a sigsum log that is used for transparency.
-This covers a weaker form of attacker that is able to sign log data and
-distribute it to a subset of isolated verifiers. For example, this could have
-been the case when a remote code execution was found for a Certificate
-Transparency Log
+The same attacker also gained control of the signing key and infrastructure of a
+sigsum log that is used for transparency. This covers a weaker form of attacker
+that is able to sign log data and distribute it to a subset of isolated
+verifiers. For example, this could have been the case when a remote code
+execution was found for a Certificate Transparency Log
 [\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM).
-The overall system is said to be secure if a log monitor can discover every signed
-checksum that a verifier would accept. A log can misbehave by not presenting
-the same append-only Merkle tree to everyone because it is attacker-controlled.
+The overall system is said to be secure if a log monitor can discover every
+signed checksum that a verifier would accept.
+A log can misbehave by not presenting the same append-only Merkle tree to
+everyone because it is attacker-controlled.
 However, a log operator would only do that if it is likely to go unnoticed. For security we need a collision resistant hash function and an unforgeable
@@ -203,15 +204,17 @@ data that a checksum represents. Where data is located is use-case specific. Note that a key hash is logged rather than the public key itself. This reduces the likelihood that an untrusted key is discovered and used by mistake.
In
-other words, verifiers and monitors must locate signer verification keys independently of logs, and trust them explicitly.
+other words, verifiers and monitors must locate signer verification keys
+independently of logs, and trust them explicitly.
 ### 3.2 - Usage pattern
 #### 3.2.1 - Prepare a request
 A signer selects a checksum that should be logged. For example, it could be the
-hash of an executable binary or something else. The signer also selects a shard
-hint representing an abstract statement like "sigsum logs that are active during
-2021". Shard hints ensure that a log's leaves cannot be replayed in a
-non-overlapping shard.
+hash of an executable binary or something else.
+
+The signer also selects a shard hint representing an abstract statement like
+"sigsum logs that are active during 2021". Shard hints ensure that a log's
+leaves cannot be replayed in a non-overlapping shard.
 The signer signs the selected shard hint and checksum.
@@ -226,20 +229,23 @@ use a simple ASCII format. A more complex parser like JSON is not needed since the data structures being exchanged are primitive enough. The signer submits their shard hint, checksum, signature, public verification
-key and domain hint as ASCII key-value pairs. The log verifies that the public verification key is present in DNS and uses it to check that
-the signature is valid, then hashes it to constructs the Merkle tree leaf as described in Section 3.1.
-
+key and domain hint as ASCII key-value pairs. The log verifies that the public
+verification key is present in DNS and uses it to check that the signature is
+valid, then hashes it to constructs the Merkle tree leaf as described in
+Section 3.1.
-When a submitted logging
-request is accepted, the log _tries_ to incorporate the submitted leaf into its Merkle tree. There are however no _promises of public logging_ as in
-Certificate Transparency. Therefore, sigsum logs do not provide low latency -- the
-signer has to wait for an inclusion proof and a cosigned tree head.
+When a submitted logging request is accepted, the log _tries_ to incorporate the
+submitted leaf into its Merkle tree. There are however no _promises of public
+logging_ as in Certificate Transparency. Therefore, sigsum logs do not provide
+low latency---the signer has to wait for an inclusion proof and a cosigned tree
+head.
 #### 3.2.3 - Wait for witness cosigning
-Sigsum logs periodically freeze the most current tree head, typically every five minutes. Cosigning witnesses poll
-logs for so-called _to-sign_ tree heads and verify that they are fresh and
-append-only before doing a cosignature operation. Cosignatures are posted back
-to logs so that signers can easily fetch finalized cosigned tree heads.
+Sigsum logs periodically freeze the most current tree head, typically every five
+minutes. Cosigning witnesses poll logs for so-called _to-sign_ tree heads and
+verify that they are fresh and append-only before doing a cosignature operation.
+Cosignatures are posted back to logs so that signers can easily fetch finalized
+cosigned tree heads.
 It thus takes five to ten minutes before a signer's distribution phase can start. The added latency is a trade-off that simplifies sigsum logging by removing the
@@ -258,11 +264,13 @@ the data. For example, on a website, in a git repository, etc. Signers distribute at least the following pieces: **Data:**
-the signer's data, for example an executable binary. It can be used to reproduce a logged checksum.
+the signer's data, for example an executable binary. It can be used to
+reproduce a logged checksum.
 **Metadata:**
-the shard hint, the signature over shard hint and checksum, and the verification key hash used in the log request. Note that the
-combination of data and metadata can be used to reconstruct the logged leaf.
+the shard hint, the signature over shard hint and checksum, and the verification
+key hash used in the log request. Note that the combination of data and
+metadata can be used to reconstruct the logged leaf.
 **Proof:** an inclusion proof that leads up to a cosigned tree head. Note that _proof_
@@ -293,10 +301,11 @@ in a known log without witnessing. Attacks against the signer's signing and release infrastructure would be detected if the log is not compromised.
 #### 3.2.6 - Monitoring
-An often overlooked step is that transparency logging falls short if no-one keeps
-track of what appears in the public logs. Monitoring is necessarily use-case
-specific in sigsum. At a minimum, monitors need to locate relevant public keys. They
-may also need to be aware of how to locate the data that found checksums represent.
+An often overlooked step is that transparency logging falls short if no-one
+keeps track of what appears in the public logs. Monitoring is necessarily
+use-case specific in sigsum. At a minimum, monitors need to locate relevant
+public keys. They may also need to be aware of how to locate the data that
+found checksums represent.
 ### 3.3 - Summary
 Sigsum logs are sharded and shut down at predefined times. A sigsum log can
@@ -304,12 +313,14 @@ shut down _safely_ because verification on the verifier-side is not interactive. The difficulty of bypassing public logging is based on the difficulty of controlling enough independent witnesses. A witness checks that a log's tree
-head is correct before cosigning. Correctness includes freshness and the append-only property.
+head is correct before cosigning. Correctness includes freshness and the
+append-only property.
 Signers, monitors, and witnesses interact with the logs using an ASCII HTTP(S)
-API. A signer must prove that they control a DNS domain name as an anti-spam mechanism.
-No data or rich metadata is being logged, to protect the log operator from poisoning.
-This also keeps log operations simpler because there are less data to manage.
+API. A signer must prove that they control a DNS domain name as an anti-spam
+mechanism. No data or rich metadata is being logged, to protect the log
+operator from poisoning. This also keeps log operations simpler because there
+are less data to manage.
 Verifiers interact with logs indirectly through their signer's existing distribution mechanism. Signers are responsible for logging signed checksums
@@ -326,10 +337,10 @@ about. We are still open to remove, add, or change things.
 #### 4.2 - What is the point of having a shard hint?
 Unlike TLS certificates which already have validity ranges, a checksum does not carry any such information. Therefore, we require that the signer selects a
-shard hint. The selected shard hint must be within a log's shard interval. A shard
-interval is defined by a start time and an end time. Both ends of the shard
-interval are inclusive and expressed as the number of seconds since the UNIX
-epoch (January 1, 1970 00:00 UTC).
+shard hint. The selected shard hint must be within a log's shard interval. A
+shard interval is defined by a start time and an end time. Both ends of the
+shard interval are inclusive and expressed as the number of seconds since the
+UNIX epoch (January 1, 1970 00:00 UTC).
 Without sharding, a good Samaritan can add all leaves from an old log into a newer one that just started its operations. This makes log operations
--
cgit v1.2.3

From ab7b2645e73bc0880960d8b1378bcc9a926acd1d Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Sun, 10 Oct 2021 20:01:17 +0200
Subject: explained property of usage pattern that relates to sharding

---
 doc/design.md | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index e155762..9030091 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -284,11 +284,15 @@ its inclusion proof. 3.
The provided tree head is from a known log with enough valid cosignatures. Notice that there are no new outbound network connections for a verifier.
-Therefore, a proof of public logging is only as convincing as the tree head that
-an inclusion proof leads up to. Sigsum logs have trustworthy tree heads thanks to
-using a variant of witness cosigning. A verifier cannot be
-tricked into accepting data whose checksum have not been publicly logged
-unless the attacker controls more than a threshold of witnesses.
+Therefore, a verifier will not be affected by future log downtime since the
+signer already collected relevant proofs of public logging. Log downtime may be
+caused by temporary operational issues or simply because a shard is done.
+
+The lack of external communication means that a proof of public logging cannot
+be more convincing than the tree head an inclusion proof leads up to. Sigsum
+logs have trustworthy tree heads thanks to using a variant of witness cosigning.
+A verifier cannot be tricked into accepting data whose checksum have not been
+publicly logged unless the attacker controls more than a threshold of witnesses.
 In a less ideal world sigsum logging can facilitate detection of attacks if a verifier _fails open_ by enforcing the second and third criteria partially. For
@@ -353,6 +357,10 @@ set it as large as possible. If a verified timestamp is needed to reason about the time of logging, you may use a cosigned tree head instead [\[TS\]](https://git.sigsum.org/sigsum/commit/?id=fef460586e847e378a197381ef1ae3a64e6ea38b).
+A log operator that shuts down a completed shard will not effect verifiers. In
+other words, a signer can continue to distribute proofs that were once
+collected. This is important because a checksum does not necessarily expire.
+
 #### 4.3 - What is the point of having a domain hint?
 Domain hints help log operators combat spam.
By verifying that every signer controls a domain name that is aware of their public key, rate limits can be
--
cgit v1.2.3

From d1ea4e9a9940367cc5dfdaf0d9eab99d1a54eb8b Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Sun, 10 Oct 2021 20:01:22 +0200
Subject: emphasized "attacker" instead of "log operator"

---
 doc/design.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index 9030091..40287a4 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -128,7 +128,7 @@ The overall system is said to be secure if a log monitor can discover every signed checksum that a verifier would accept. A log can misbehave by not presenting the same append-only Merkle tree to everyone because it is attacker-controlled.
-However, a log operator would only do that if it is likely to go unnoticed.
+The attacker would only do that if it is likely to go unnoticed, however.
 For security we need a collision resistant hash function and an unforgeable signature scheme. We also assume that at most a threshold of independent
@@ -137,7 +137,7 @@ attempts [split-view](https://datatracker.ietf.org/doc/html/draft-ietf-trans-gossip-05) and [slow-down](https://git.sigsum.org/sigsum/tree/archive/2021-08-24-checkpoint-timestamp)
-attacks. A log operator can at best deny service with these assumptions.
+attacks. An attacker can at best deny service with these assumptions.
 ## 3 - Design
 An overview of sigsum logging is shown in Figure 1. Before going into detail
--
cgit v1.2.3

From 4912cd5813e1ce69c3d8c5a95d91a78f70d92172 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Sun, 10 Oct 2021 20:01:25 +0200
Subject: added additional witnessing thoughts in FAQ

---
 doc/design.md | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index 40287a4..1117f02 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -382,13 +382,31 @@ management. Our work is about transparent _key-usage_.
We are considering if additional anti-spam mechanisms should be supported.
-#### 4.4 - More questions
+#### 4.4 - Is witness cosigning done?
+There are interesting policy aspects that relate to witness cosigning. For
+example, what witnessing policy should a verifier use and how are trustworthy
+witnesses discovered. This is somewhat analogous to a related policy question
+that all log ecosystems must address. Which logs should be considered known?
+
+We do however think that witness cosigning could be done _from the perspective
+of a log and its operator_. The
+ [sigsum/v0 API](https://git.sigsum.org/sigsum/tree/doc/api.md)
+supports witness cosigning. Policy aspects for a log operator are easy because
+it is relatively cheap to allow a witness to be a cosigner. It is not a log
+operator's job to determine if any real-world entity is trustworthy. It is not
+even a log operator's job to help signers and verifiers discover witness keys.
+
+Given a permissive policy for which witnesses are allowed to cosign, a signer
+may not care for all retrieved cosignatures. Unwanted ones can simply be
+removed before distribution to a verifier takes place. This is in contrast to
+the original proposal by
+ [Syta et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7546521),
+which puts an authority right in the middle of a slowly evolving witnessing policy.
+
+#### 4.5 - More questions
 - Why not store data in the log? XXX: answered enough already?
 - Why not store rich metadata in the log? XXX: answered enough already?
 - What (de)serialization parsers are needed and why?
 - What cryptographic primitives are supported and why?
-- What thought went into witness cosigning? Compare with other approaches, and
-should include `get-tree-head-*` endpoints in more detail.
 - What are the privacy concerns?
 - How does it work with more than one log?
-- What policy should a verifier follow?
--
cgit v1.2.3

From 8218cefd1b0789807208f01df3e2a382748cb371 Mon Sep 17 00:00:00 2001
From: Linus Nordberg
Date: Tue, 12 Oct 2021 14:18:55 +0200
Subject: minor wording

---
 doc/design.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index 1117f02..0393a91 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -231,7 +231,7 @@ since the data structures being exchanged are primitive enough. The signer submits their shard hint, checksum, signature, public verification key and domain hint as ASCII key-value pairs. The log verifies that the public verification key is present in DNS and uses it to check that the signature is
-valid, then hashes it to constructs the Merkle tree leaf as described in
+valid, then hashes it to construct the Merkle tree leaf as described in
 Section 3.1. When a submitted logging request is accepted, the log _tries_ to incorporate the
@@ -279,7 +279,7 @@ refers to the collection of an inclusion proof and a cosigned tree head.
 #### 3.2.5 - Verification
 A verifier should only accept the distributed data if the following criteria hold:
 1. The data's checksum and shard hint are signed using the specified public key.
-2. The provided tree head can be reconstructed from the logged leaf and
+2. The provided tree head can be reconstructed from the logged leaf and
 its inclusion proof.
 3. The provided tree head is from a known log with enough valid cosignatures.
@@ -309,7 +309,7 @@ An often overlooked step is that transparency logging falls short if no-one keeps track of what appears in the public logs. Monitoring is necessarily use-case specific in sigsum. At a minimum, monitors need to locate relevant public keys. They may also need to be aware of how to locate the data that
-found checksums represent.
+logged checksums represent.
 ### 3.3 - Summary
 Sigsum logs are sharded and shut down at predefined times. A sigsum log can
@@ -357,7 +357,7 @@ set it as large as possible.
If a verified timestamp is needed to reason about the time of logging, you may use a cosigned tree head instead [\[TS\]](https://git.sigsum.org/sigsum/commit/?id=fef460586e847e378a197381ef1ae3a64e6ea38b).
-A log operator that shuts down a completed shard will not effect verifiers. In
+A log operator that shuts down a completed shard will not affect verifiers. In
 other words, a signer can continue to distribute proofs that were once collected. This is important because a checksum does not necessarily expire.
@@ -369,7 +369,7 @@ to spam a log in any significant way if rate limits are not set too loose. Notice that the effect of spam is not only about storage. It is also about merge latencies. Too many submissions from a single party may render a log
-unusable for others. This kind of incident happened in the real-world already
+unusable for others. This kind of incident happened in the real world already
 [\[Aviator\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/ZZf3iryLgCo/m/rdTAHWcdBgAJ). Using DNS as an anti-spam mechanism is not a perfect solution. It is however
@@ -382,7 +382,7 @@ management. Our work is about transparent _key-usage_. We are considering if additional anti-spam mechanisms should be supported.
-#### 4.4 - Is witness cosigning done?
+#### 4.4 - Are you done with the witness cosigning design?
 There are interesting policy aspects that relate to witness cosigning. For example, what witnessing policy should a verifier use and how are trustworthy witnesses discovered. This is somewhat analogous to a related policy question
--
cgit v1.2.3

From 7d5f7cf12c6b16baa31a942a88e6a12affcb8a73 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 12 Oct 2021 16:06:28 +0200
Subject: renamed section 4.4

Discussed with ln5.
---
 doc/design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/design.md b/doc/design.md
index 0393a91..821ba88 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -382,7 +382,7 @@ management.
Our work is about transparent _key-usage_. We are considering if additional anti-spam mechanisms should be supported.
-#### 4.4 - Are you done with the witness cosigning design?
+#### 4.4 - What parts of witness cosigning are not done?
 There are interesting policy aspects that relate to witness cosigning. For example, what witnessing policy should a verifier use and how are trustworthy witnesses discovered. This is somewhat analogous to a related policy question
--
cgit v1.2.3

From 29d1971f5f5dd3a3f2943d1add5029c62e5f9372 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 12 Oct 2021 17:38:34 +0200
Subject: removed mention of Go modules

Strictly speaking the logged data structure is a checksum of a Go module. This could be confusing with the wording of this sentence.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index aa068e7..d938a92 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # The Sigsum Project
 Sigsum is a free and open-source project that brings transparency logging to **sig**ned check**sum**s. The overall design is kept general by not logging
-a more concrete data structure like TLS certificates or Go modules.
+a more concrete data structure like TLS certificates.
 - [x] Discoverability of signed checksums for the data of your choice
 - [x] Centralised log operations but distributed trust assumptions
--
cgit v1.2.3

From 34746cefa42bb7d4fd1b3d8bace285bd393db7d5 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 12 Oct 2021 17:40:26 +0200
Subject: spelled out FOSS project correctly

Spotted by ln5.
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index d938a92..8aef8d4 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # The Sigsum Project
-Sigsum is a free and open-source project that brings transparency logging to
-**sig**ned check**sum**s.
The overall design is kept general by not logging
-a more concrete data structure like TLS certificates.
+Sigsum is a free and open source software project that brings transparency
+logging to **sig**ned check**sum**s. The overall design is kept general
+by not logging a more concrete data structure like TLS certificates.
 - [x] Discoverability of signed checksums for the data of your choice
 - [x] Centralised log operations but distributed trust assumptions
--
cgit v1.2.3

From 8c10d09289289ddbc349503dac4b0493bf73b2b3 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 12 Oct 2021 17:43:03 +0200
Subject: removed comments about partial enforcement

To be re-added at a later time somewhere else. It is not helpful for a reader that is trying to understand the basic design for the first time. Spotted by ln5.
---
 doc/design.md | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index 821ba88..e1f3b5e 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -294,16 +294,6 @@ logs have trustworthy tree heads thanks to using a variant of witness cosigning. A verifier cannot be tricked into accepting data whose checksum have not been publicly logged unless the attacker controls more than a threshold of witnesses.
-In a less ideal world sigsum logging can facilitate detection of attacks if a
-verifier _fails open_ by enforcing the second and third criteria partially. For
-example, some verifier may not enforce these criteria at all, and so would
-accept data from a malicious data mirror without proofs of public logging.
-Someone in a similar area may be able to detect this and report the attack.
-
-Another example of partial enforcement would be if a verifier required logging
-in a known log without witnessing. Attacks against the signer's signing and
-release infrastructure would be detected if the log is not compromised.
-
 #### 3.2.6 - Monitoring
 An often overlooked step is that transparency logging falls short if no-one keeps track of what appears in the public logs. Monitoring is necessarily
--
cgit v1.2.3

From 924b2d40311831dd8158f63afe067fd43db7ee98 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 12 Oct 2021 17:51:16 +0200
Subject: cleaned-up more questions section

These questions are to some extent answered as part of our refactor, or addressed as things we are still open to think more about. I think we can leave them out for now and add them later _with answers_ if needed. I kept the privacy concerns question because that is not addressed anywhere yet. We think that the answer is "mostly none".
---
 doc/design.md | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index e1f3b5e..4f7c06a 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -394,9 +394,5 @@ the original proposal by which puts an authority right in the middle of a slowly evolving witnessing policy.
 #### 4.5 - More questions
-- Why not store data in the log? XXX: answered enough already?
-- Why not store rich metadata in the log? XXX: answered enough already?
-- What (de)serialization parsers are needed and why?
-- What cryptographic primitives are supported and why?
 - What are the privacy concerns?
-- How does it work with more than one log?
+- Add more questions here!
--
cgit v1.2.3
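
Reviewer note on section 4.2 of doc/design.md: the shard-interval rule touched by this series (a shard hint must lie within the log's shard interval, both ends inclusive, expressed in seconds since the UNIX epoch) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ShardInterval`, `contains`, `unix_seconds`, and the calendar-year-2021 interval are hypothetical names and values, not part of the sigsum API or of any real log's configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ShardInterval:
    """Hypothetical helper: a shard interval with inclusive UNIX-time bounds."""
    start: int  # seconds since the UNIX epoch, inclusive
    end: int    # seconds since the UNIX epoch, inclusive

    def contains(self, shard_hint: int) -> bool:
        # Both ends of the interval are inclusive, as section 4.2 states.
        return self.start <= shard_hint <= self.end


def unix_seconds(year: int, month: int, day: int) -> int:
    """Seconds since the UNIX epoch for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Assumed example: a shard that is active during calendar year 2021 (UTC).
shard_2021 = ShardInterval(
    start=unix_seconds(2021, 1, 1),
    end=unix_seconds(2022, 1, 1) - 1,
)

# A shard hint from mid-2021 falls inside the shard; one from 2022 does not,
# so a leaf signed for the 2021 shard could not be replayed into a 2022 shard.
print(shard_2021.contains(unix_seconds(2021, 6, 15)))
print(shard_2021.contains(unix_seconds(2022, 1, 1)))
```

A log following this rule would presumably reject submissions whose shard hint falls outside its own interval, which is what keeps leaves from being replayed in a non-overlapping shard.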