From ab2b24a7b9fab6ff6f13c3558f8007a41692038e Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Sun, 10 Oct 2021 20:01:12 +0200
Subject: fixed overflowing lines, no content changes

---
 doc/design.md | 99 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 55 insertions(+), 44 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index f16fa81..e155762 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -33,11 +33,11 @@ sigsum logging as pre-hashed digital signing with transparency.
 The signing party is called a _signer_.
 The user of the signed data is called a _verifier_.
 
-The problem with _digital signing on its own_ is that it is difficult to determine
-whether the signed data is _actually the data that should have been signed_.
-How would we detect if a secret signing key got compromised?
-How would we detect if something was signed by mistake, or even worse,
-if the signing party was forced to sign malicious data against their will?
+The problem with _digital signing on its own_ is that it is difficult to
+determine whether the signed data is _actually the data that should have been
+signed_.  How would we detect if a secret signing key got compromised?  How
+would we detect if something was signed by mistake, or even worse, if the
+signing party was forced to sign malicious data against their will?
 
 Sigsum logs make it possible to answer these types of questions.  The basic
 idea is to make a signer's _key-usage_ transparent.  This is a powerful building
@@ -117,16 +117,17 @@ The fact that signing keys and related infrastructure components get
 compromised should not be controversial these days
 	[\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
 
-The same attacker also gained control of the signing key and infrastructure of a sigsum log that is used for transparency.
-This covers a weaker form of attacker that is able to sign log data and
-distribute it to a subset of isolated verifiers.  For example, this could have
-been the case when a remote code execution was found for a Certificate
-Transparency Log
+The same attacker also gained control of the signing key and infrastructure of a
+sigsum log that is used for transparency.  This covers a weaker form of attacker
+that is able to sign log data and distribute it to a subset of isolated
+verifiers.  For example, this could have been the case when a remote code
+execution was found for a Certificate Transparency Log
 	[\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM).
 
-The overall system is said to be secure if a log monitor can discover every signed
-checksum that a verifier would accept.  A log can misbehave by not presenting
-the same append-only Merkle tree to everyone because it is attacker-controlled.
+The overall system is said to be secure if a log monitor can discover every
+signed checksum that a verifier would accept.
+A log can misbehave by not presenting the same append-only Merkle tree to
+everyone because it is attacker-controlled.
 However, a log operator would only do that if it is likely to go unnoticed.
 
 For security we need a collision resistant hash function and an unforgeable
@@ -203,15 +204,17 @@ data that a checksum represents.  Where data is located is use-case specific.
 
 Note that a key hash is logged rather than the public key itself.  This reduces
 the likelihood that an untrusted key is discovered and used by mistake.  In
-other words, verifiers and monitors must locate signer verification keys independently of logs, and trust them explicitly.
+other words, verifiers and monitors must locate signer verification keys
+independently of logs, and trust them explicitly.
 
 ### 3.2 - Usage pattern
 #### 3.2.1 - Prepare a request
 A signer selects a checksum that should be logged.  For example, it could be the
-hash of an executable binary or something else.  The signer also selects a shard
-hint representing an abstract statement like "sigsum logs that are active during
-2021".  Shard hints ensure that a log's leaves cannot be replayed in a
-non-overlapping shard.
+hash of an executable binary or something else.
+
+The signer also selects a shard hint representing an abstract statement like
+"sigsum logs that are active during 2021".  Shard hints ensure that a log's
+leaves cannot be replayed in a non-overlapping shard.
 
 The signer signs the selected shard hint and checksum.
 
@@ -226,20 +229,23 @@ use a simple ASCII format.  A more complex parser like JSON is not needed
 since the data structures being exchanged are primitive enough.
 
 The signer submits their shard hint, checksum, signature, public verification
-key and domain hint as ASCII key-value pairs.  The log verifies that the public verification key is present in DNS and uses it to check that
-the signature is valid, then hashes it to construct the Merkle tree leaf as described in Section 3.1.
-
+key and domain hint as ASCII key-value pairs.  The log verifies that the public
+verification key is present in DNS and uses it to check that the signature is
+valid, then hashes it to construct the Merkle tree leaf as described in
+Section 3.1.
 
-When a submitted logging
-request is accepted, the log _tries_ to incorporate the submitted leaf into its Merkle tree.  There are however no _promises of public logging_ as in
-Certificate Transparency.  Therefore, sigsum logs do not provide low latency -- the
-signer has to wait for an inclusion proof and a cosigned tree head.
+When a submitted logging request is accepted, the log _tries_ to incorporate the
+submitted leaf into its Merkle tree.  There are however no _promises of public
+logging_ as in Certificate Transparency.  Therefore, sigsum logs do not provide
+low latency---the signer has to wait for an inclusion proof and a cosigned tree
+head.
 
 #### 3.2.3 - Wait for witness cosigning
-Sigsum logs periodically freeze the most current tree head, typically every five minutes.  Cosigning witnesses poll
-logs for so-called _to-sign_ tree heads and verify that they are fresh and
-append-only before doing a cosignature operation.  Cosignatures are posted back
-to logs so that signers can easily fetch finalized cosigned tree heads.
+Sigsum logs periodically freeze the most current tree head, typically every five
+minutes.  Cosigning witnesses poll logs for so-called _to-sign_ tree heads and
+verify that they are fresh and append-only before doing a cosignature operation.
+Cosignatures are posted back to logs so that signers can easily fetch finalized
+cosigned tree heads.
 
 It thus takes five to ten minutes before a signer's distribution phase can start.
 The added latency is a trade-off that simplifies sigsum logging by removing the
@@ -258,11 +264,13 @@ the data.  For example, on a website, in a git repository, etc.
 Signers distribute at least the following pieces:
 
 **Data:**
-the signer's data, for example an executable binary.  It can be used to reproduce a logged checksum.
+the signer's data, for example an executable binary.  It can be used to
+reproduce a logged checksum.
 
 **Metadata:**
-the shard hint, the signature over shard hint and checksum, and the verification key hash used in the log request.  Note that the
-combination of data and metadata can be used to reconstruct the logged leaf.
+the shard hint, the signature over shard hint and checksum, and the verification
+key hash used in the log request.  Note that the combination of data and
+metadata can be used to reconstruct the logged leaf.
 
 **Proof:**
 an inclusion proof that leads up to a cosigned tree head.  Note that _proof_
@@ -293,10 +301,11 @@ in a known log without witnessing.  Attacks against the signer's signing and
 release infrastructure would be detected if the log is not compromised.
 
 #### 3.2.6 - Monitoring
-An often overlooked step is that transparency logging falls short if no-one keeps
-track of what appears in the public logs.  Monitoring is necessarily use-case
-specific in sigsum.  At a minimum, monitors need to locate relevant public keys.  They
-may also need to be aware of how to locate the data that found checksums represent.
+An often overlooked step is that transparency logging falls short if no-one
+keeps track of what appears in the public logs.  Monitoring is necessarily
+use-case specific in sigsum.  At a minimum, monitors need to locate relevant
+public keys.  They may also need to be aware of how to locate the data that
+found checksums represent.
 
 ### 3.3 - Summary
 Sigsum logs are sharded and shut down at predefined times.  A sigsum log can
@@ -304,12 +313,14 @@ shut down _safely_ because verification on the verifier-side is not interactive.
 
 The difficulty of bypassing public logging is based on the difficulty of
 controlling enough independent witnesses.  A witness checks that a log's tree
-head is correct before cosigning.  Correctness includes freshness and the append-only property.
+head is correct before cosigning.  Correctness includes freshness and the
+append-only property.
 
 Signers, monitors, and witnesses interact with the logs using an ASCII HTTP(S)
-API.  A signer must prove that they control a DNS domain name as an anti-spam mechanism.
-No data or rich metadata is being logged, to protect the log operator from poisoning.
-This also keeps log operations simpler because there is less data to manage.
+API.  A signer must prove that they control a DNS domain name as an anti-spam
+mechanism.  No data or rich metadata is being logged, to protect the log
+operator from poisoning.  This also keeps log operations simpler because there
+is less data to manage.
 
 Verifiers interact with logs indirectly through their signer's existing
 distribution mechanism.  Signers are responsible for logging signed checksums
@@ -326,10 +337,10 @@ about.  We are still open to remove, add, or change things.
 #### 4.2 - What is the point of having a shard hint?
 Unlike TLS certificates which already have validity ranges, a checksum does not
 carry any such information.  Therefore, we require that the signer selects a
-shard hint.  The selected shard hint must be within a log's shard interval.  A shard
-interval is defined by a start time and an end time.  Both ends of the shard
-interval are inclusive and expressed as the number of seconds since the UNIX
-epoch (January 1, 1970 00:00 UTC).
+shard hint.  The selected shard hint must be within a log's shard interval.  A
+shard interval is defined by a start time and an end time.  Both ends of the
+shard interval are inclusive and expressed as the number of seconds since the
+UNIX epoch (January 1, 1970 00:00 UTC).
 
 Without sharding, a good Samaritan can add all leaves from an old log into a
 newer one that just started its operations.  This makes log operations
-- 
cgit v1.2.3