From 24cc6b0db8ef9c718925d14b329f21938e5d2b1b Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Tue, 20 Apr 2021 12:28:28 +0200
Subject: started on our in-progress (re)design documents

---
 doc/design.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 doc/design.md

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
new file mode 100644
index 0000000..f966d03
--- /dev/null
+++ b/doc/design.md
@@ -0,0 +1,32 @@
+# System Transparency Logging: Design v0
+We propose System Transparency logging.  It is similar to Certificate
+Transparency, expect that cryptographically signed checksums are logged as
+opposed to X.509 certificates.  Publicly logging signed checksums allow anyone
+to discover which keys signed what.  As such, malicious and unintended key-usage
+can be _discovered_.  We present our design and discuss how two possible
+use-cases influenced it: binary transparency and reproducible builds.
+
+**Target audience.**
+You are most likely interested in transparency logs or supply-chain security.
+
+**Preliminaries.**
+You have basic understanding of cryptographic primitives like digital
+signatures, hash functions, and Merkle trees.  You roughly know what problem
+Certificate Transparency solves and how.  You may never have heard the term
+_gossip-audit model_, or know how it is related to trust assumptions and
+detectability properties.
+
+**Warning.**
+This is a work-in-progress document that may be moved or modified.
+
+## Introduction
+Transparency logs make it possible to detect unwanted events.  For example,
+	are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962),
+	did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md),
+	or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/).
+System Transparency logging makes signed checksums transparent.  The goal is to
+_detect_ unwanted key-usage without making assumptions about the signed data.
+
+## Threat model and (non-)goals
+
+## Design
-- 
cgit v1.2.3


From 87a2fa506c1861158ca04fd34d64e10b6447d8f3 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Mon, 26 Apr 2021 19:54:06 +0200
Subject: added drafty threat model text

---
 doc/design.md | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index f966d03..59cd7c8 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -28,5 +28,35 @@ System Transparency logging makes signed checksums transparent.  The goal is to
 _detect_ unwanted key-usage without making assumptions about the signed data.
 
 ## Threat model and (non-)goals
+We consider a powerful attacker that gained control of a target's signing and
+release infrastructure.  This covers a weaker form of attacker that is able to
+sign data and distribute it to a subset of isolated users.  For example, this is
+essentially what FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order).
+The fact that signing keys and related infrastructure components get
+compromised should not be controversial [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
+
+The attacker can also gain control of the transparency log's signing key and
+infrastructure.  This covers a weaker form of attacker that is able to sign log
+data and distribute it to a subset of isolated users.  For example, this could
+have been the case when a remote code execution was found for a Certificate
+Transparency Log [\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM).
+
+Any attacker that is able to position itself to control these components will
+likely be _risk-averse_.  This is at minimum due to two factors.  First,
+detection would result in a significant loss of capability that is by no means
+trivial to come by.  Second, detection means that some part of the attacker's
+malicious behavior will be disclosed publicly.
+
+Our goal is to facilitate _detection_ of compromised signing keys.  Therefore,
+we transparency log signed checksums.  We assume that clients _fail closed_ if a
+checksum does not appear in a public log.  We also assume that the attacker
+controls at most a threshold of independent parties to achieve our goal
+("strength in numbers").
+
+It is a non-goal to disclose the data that a signed checksum represents.  For
+example, the log cannot distinguish between a checksum that represents a tax
+declaration, an ISO image, or a Debian package.  This means that the type of
+detection we support is _courser-grained_ when compared to Certificate
+Transparency.
 
 ## Design
-- 
cgit v1.2.3


From 94fea7a3c993686d26efbf7ca9b73d598222a272 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Thu, 29 Apr 2021 14:50:49 +0200
Subject: added start on design document

Work in progress.
---
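Note on the leaf structure that this patch introduces under "A bird's view":
the following is a minimal Python sketch of how a data publisher might
assemble the four leaf fields.  The names `Leaf`, `signed_message`, and
`make_leaf`, as well as the 8-byte big-endian shard hint encoding, are
assumptions made for illustration; the document does not specify a
serialization.  The `sign` argument stands in for an Ed25519 signer, which the
Python standard library does not provide.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Leaf:
    shard_hint: int   # binds the leaf to a shard interval
    checksum: bytes   # SHA256 of the opaque data (the log never sees the data)
    signature: bytes  # publisher's signature over shard hint and checksum
    key_hash: bytes   # SHA256 of the public verification key


def signed_message(shard_hint: int, checksum: bytes) -> bytes:
    # Hypothetical serialization: 8-byte big-endian shard hint, then checksum.
    return shard_hint.to_bytes(8, "big") + checksum


def make_leaf(shard_hint: int, data: bytes, public_key: bytes,
              sign: Callable[[bytes], bytes]) -> Leaf:
    # Hash the opaque data, sign (shard_hint, checksum), and hash the key.
    checksum = hashlib.sha256(data).digest()
    signature = sign(signed_message(shard_hint, checksum))
    key_hash = hashlib.sha256(public_key).digest()
    return Leaf(shard_hint, checksum, signature, key_hash)
```

Because the shard hint is part of the signed message, changing it after the
fact would invalidate the signature, which is the property that Step 1 relies
on.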
 doc/design.md | 196 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 176 insertions(+), 20 deletions(-)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index 59cd7c8..9fcf4b6 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -2,9 +2,9 @@
 We propose System Transparency logging.  It is similar to Certificate
 Transparency, expect that cryptographically signed checksums are logged as
 opposed to X.509 certificates.  Publicly logging signed checksums allow anyone
-to discover which keys signed what.  As such, malicious and unintended key-usage
-can be _discovered_.  We present our design and discuss how two possible
-use-cases influenced it: binary transparency and reproducible builds.
+to discover which keys produced what signatures.  As such, malicious and
+unintended key-usage can be _detected_.  We present our design and conclude by
+providing two use-cases: binary transparency and reproducible builds.
 
 **Target audience.**
 You are most likely interested in transparency logs or supply-chain security.
@@ -12,20 +12,20 @@ You are most likely interested in transparency logs or supply-chain security.
 **Preliminaries.**
 You have basic understanding of cryptographic primitives like digital
 signatures, hash functions, and Merkle trees.  You roughly know what problem
-Certificate Transparency solves and how.  You may never have heard the term
-_gossip-audit model_, or know how it is related to trust assumptions and
-detectability properties.
+Certificate Transparency solves and how.
 
 **Warning.**
-This is a work-in-progress document that may be moved or modified.
+This is a work-in-progress document that may be moved or modified.  A future
+revision of this document will bump the version number to v1.  Please let us
+know if you have any feedback.
 
 ## Introduction
 Transparency logs make it possible to detect unwanted events.  For example,
 	are there any (mis-)issued TLS certificates [\[CT\]](https://tools.ietf.org/html/rfc6962),
 	did you get a different Go module than everyone else [\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md),
 	or is someone running unexpected commands on your server [\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/).
-System Transparency logging makes signed checksums transparent.  The goal is to
-_detect_ unwanted key-usage without making assumptions about the signed data.
+A System Transparency log makes signed checksums transparent.  The overall goal
+is to facilitate detection of unwanted key-usage.
 
 ## Threat model and (non-)goals
 We consider a powerful attacker that gained control of a target's signing and
@@ -33,7 +33,7 @@ release infrastructure.  This covers a weaker form of attacker that is able to
 sign data and distribute it to a subset of isolated users.  For example, this is
 essentially what FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order).
 The fact that signing keys and related infrastructure components get
-compromised should not be controversial [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
+compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
 
 The attacker can also gain control of the transparency log's signing key and
 infrastructure.  This covers a weaker form of attacker that is able to sign log
@@ -47,16 +47,172 @@ detection would result in a significant loss of capability that is by no means
 trivial to come by.  Second, detection means that some part of the attacker's
 malicious behavior will be disclosed publicly.
 
-Our goal is to facilitate _detection_ of compromised signing keys.  Therefore,
-we transparency log signed checksums.  We assume that clients _fail closed_ if a
-checksum does not appear in a public log.  We also assume that the attacker
-controls at most a threshold of independent parties to achieve our goal
-("strength in numbers").
+Our goal is to facilitate _detection_ of compromised signing keys.  We consider
+a signing key compromised if an end-user accepts an unwanted signature as valid.
+The solution that we propose is that signed checksums are transparency logged.
+For security we need a collision resistant hash function and an unforgeable
+signature scheme.  We also assume that at most a threshold of seemingly
+independent parties are adversarial.
 
-It is a non-goal to disclose the data that a signed checksum represents.  For
-example, the log cannot distinguish between a checksum that represents a tax
-declaration, an ISO image, or a Debian package.  This means that the type of
-detection we support is _courser-grained_ when compared to Certificate
-Transparency.
+It is a non-goal to disclose the data that a checksum represents.  For example,
+the log cannot distinguish between a checksum that represents a tax declaration,
+an ISO image, or a Debian package.  This means that the type of detection we
+support is more _course-grained_ when compared to Certificate Transparency.
 
 ## Design
+We consider a data publisher that wants to digitally sign their data.  The data
+is of opaque type.  We assume that end-users have a mechanism to locate the
+relevant public verification keys.  Data and signatures can also be retrieved
+(in)directly from the data publisher.  We make few assumptions about the
+signature tooling.  The ecosystem at large can continue to use `gpg`, `openssl`,
+`ssh-keygen -Y`, `signify`, or something else.
+
+We _have to assume_ that additional tooling can be installed by end-users that
+wish to enforce transparency logging.  For example, none of the existing
+signature tooling support verification of Merkle tree proofs.  A side-effect of
+our design is that this additional tooling makes no outbound connections.  The
+above data flows are thus preserved.
+
+### A bird's view
+A central part of any transparency log is the data.  The data is stored by the
+leaves of an append-only Merkle tree.  Our leaf structure contains four fields:
+- **shard_hint**: a number that binds the leaf to a particular _shard interval_.
+Sharding means that the log has a predefined time during which logging requests
+will be accepted.  Once elapsed, the log can be shutdown.
+- **checksum**: a cryptographic hash of some opaque data.  The log never
+sees the opaque data; just the hash.
+- **signature**: a digital signature that is computed by the data publisher over
+the leaf's shard hint and checksum.
+- **key_hash**: a cryptographic hash of the public verification key that can be
+used to verify the leaf's signature.
+
+#### Step 1 - preparing a logging request
+The data publisher selects a shard hint and a checksum that should be logged.
+For example, the shard hint could be "logs that are active during 2021".  The
+checksum might be a hashed release file or something else.
+
+The data publisher signs the selected shard hint and checksum using their secret
+signing key.  Both the signed message and the signature are stored
+in the leaf for anyone to verify.  Including a shard hint in the signed message
+ensures that the good Samaritan cannot change it to log all leaves from an
+earlier shard into a newer one.
+
+The hashed public verification key is also stored in the leaf.  This makes it
+easy to attribute the leaf to the signing entity.  For example, a data publisher
+that monitors the log can look for leaves that match their own key hash(es).
+
+A hash, rather than the full public verification key, is used to force the
+verifier to locate the key and trust it explicitly.  Not disclosing the public
+verification key in the leaf makes it more difficult to use an untrusted key _by
+mistake_.
+
+#### Step 2 - submitting a logging request
+The log implements an HTTP(S) API.  Input and output is human-readable and uses
+percent encoding.  We decided to use percent encoding for requests and responses
+because it is a simple format that is commonly used on the web.  A more complex
+parser like JSON is not needed if the exchanged data structures are basic
+enough.
+
+The data publisher submits their shard hint, checksum, signature, and public
+verification key as key-value pairs.  The log will use the public verification
+key to check that the signature is valid, then hash it to construct the leaf.
+
+The data publisher also submits a _domain hint_.  The log will download a DNS
+TXT resource record based on the provided domain name.  The downloaded result
+must match the public verification key hash.  By verifying that the submitter
+controls a domain that is aware of the public verification key, rate limits can
+be applied per second-level domain.  As a result, you would need a large number
+of domain names to spam the log in any significant way.
+
+Using DNS to combat spam is convenient because many data publishers already have
+a domain name.  A single domain name is also relatively cheap.  Another
+benefit is that the same anti-spam mechanism can be used across several
+independent logs without coordination.  This is important because a healthy log
+ecosystem needs more than one log to be reliable.  DNS also has built-in
+caching that can be influenced by setting TTLs accordingly.
+
+The submitter's domain hint is not part of the leaf because key management is
+more complex than that.  The only service that the log provides is discovery of
+signed checksums.  Key transparency projects have their own merit.
+
+The log will _try_ to incorporate a leaf into the Merkle tree if a logging
+request is accepted.  There are no _promises of public logging_ as in
+Certificate Transparency.  Therefore, the submitter needs to wait for an
+inclusion proof before concluding that the request succeeded.  Not having
+inclusion promises makes the log less complex.
+
+#### Step 3 - distributing proofs of public logging
+The data publisher is responsible for collecting all cryptographic proofs that
+their end-users will need to enforce public logging.  It must be possible to
+download the following collection (in)directly from the data publisher:
+1. **Shard hint**: the data publisher's selected shard hint.
+2. **Opaque data**: the data publisher's opaque data.
+3. **Signature**: the data publisher's leaf signature.
+5. **Cosigned tree head**: the log's tree head and a _list of signatures_ that
+state it is consistent with prior history.
+6. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree
+head in question.
+
+The public verification key is known.  Therefore, the first three fields are
+sufficient to reconstruct the logged leaf.  The leaf's signature can be
+verified.  The final two fields then prove that the leaf is in the log.  If the
+leaf is included in the log, any monitor can detect that there is a new
+signature for a data publisher's public verification key.
+
+The catch is that the proof of logging is only as convincing as the tree head
+that the inclusion proof leads up to.  To bypass public logging, the attacker
+needs to control a threshold of independent _witnesses_ that cosign the log.  A
+benign witness will only sign the log's tree head if it is consistent with prior
+history.
+
+#### Summary
+The log is sharded and will shutdown at a predefined time.  The log can shut
+down _safely_ because end-user verification is not interactive.  The difficulty
+of bypassing public logging is based on the difficulty of controlling a
+threshold of independent witnesses.  Witnesses cosign tree heads to make them
+trustworthy.
+
+Submitters, monitors, and witnesses interact with the log using an HTTP(S) API.
+Submitters must prove that they own a domain name as an anti-spam mechanism.
+End-users interact with the log _indirectly_ via a data publisher.  It is the
+data publisher's job to log signed checksums, distribute necessary proofs of
+logging, and monitor the log.
+
+### A peak into the details
+Our bird's view introduction skipped many details that matter in practise.  Some
+of these details are presented here using a question-answer format.  A
+question-answer format is helpful because it is easily modified and extended.
+
+#### What cryptographic primitives are supported?
+The only supported hash algorithm is SHA256.  The only supported signature
+scheme is Ed25519.  Not having any cryptographic agility makes the protocol
+simpler and more secure.
+
+An immediate follow-up question is how that is supposed to work with existing
+and future signature tooling.  The key insight is that _additional tooling is
+already required to verify Merkle tree proofs.  That tooling should use SHA256.
+That tooling should also verify all Ed25519 signatures that logs, witnesses, and
+data publishers create_.
+
+For example, suppose that an ecosystem uses `gpg` which has its own incompatible
+signature format and algorithms.  The data publisher could _cross-sign_ using
+Ed25519 as follows:
+1. Sign the opaque data as you normally would with `gpg`.
+2. Hash the opaque data and use that as the leaf's checksum.  Sign the leaf
+using Ed25519.
+
+First the end-user verifies that the `gpg` signature is valid.  This is the
+old verification process.  Then the end-user uses the additional tooling to
+verify proofs of logging, which involves SHA256 hashing and Ed25519 signatures.
+
+The downside is that the data publisher may need to manage an Ed25519 key _as
+well_.  TODO: motivate why that is a suboptimal but worth-while trade-off.
+
+#### What (de)serialization parsers are needed?
+#### Why witness cosigning?
+#### What policy should be used?
+#### TODO
+Add more key questions and answers.
+
+## Concluding remarks
+Example of binary transparency and reproducible builds.
-- 
cgit v1.2.3


From 6cae1445318e22ce909b0211fc405dbeb6db7c44 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Fri, 30 Apr 2021 12:11:40 +0200
Subject: fixed typos

---
 doc/design.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index 9fcf4b6..cb379e5 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -1,6 +1,6 @@
 # System Transparency Logging: Design v0
 We propose System Transparency logging.  It is similar to Certificate
-Transparency, expect that cryptographically signed checksums are logged as
+Transparency, except that cryptographically signed checksums are logged as
 opposed to X.509 certificates.  Publicly logging signed checksums allow anyone
 to discover which keys produced what signatures.  As such, malicious and
 unintended key-usage can be _detected_.  We present our design and conclude by
@@ -31,7 +31,7 @@ is to facilitate detection of unwanted key-usage.
 We consider a powerful attacker that gained control of a target's signing and
 release infrastructure.  This covers a weaker form of attacker that is able to
 sign data and distribute it to a subset of isolated users.  For example, this is
-essentially what FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order).
+essentially what the FBI requested from Apple in the San Bernardino case [\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order).
 The fact that signing keys and related infrastructure components get
 compromised should not be controversial these days [\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).
 
@@ -57,7 +57,7 @@ independent parties are adversarial.
 It is a non-goal to disclose the data that a checksum represents.  For example,
 the log cannot distinguish between a checksum that represents a tax declaration,
 an ISO image, or a Debian package.  This means that the type of detection we
-support is more _course-grained_ when compared to Certificate Transparency.
+support is more _coarse-grained_ when compared to Certificate Transparency.
 
 ## Design
 We consider a data publisher that wants to digitally sign their data.  The data
@@ -69,7 +69,7 @@ signature tooling.  The ecosystem at large can continue to use `gpg`, `openssl`,
 
 We _have to assume_ that additional tooling can be installed by end-users that
 wish to enforce transparency logging.  For example, none of the existing
-signature tooling support verification of Merkle tree proofs.  A side-effect of
+signature tooling supports verification of Merkle tree proofs.  A side-effect of
 our design is that this additional tooling makes no outbound connections.  The
 above data flows are thus preserved.
 
@@ -78,7 +78,7 @@ A central part of any transparency log is the data.  The data is stored by the
 leaves of an append-only Merkle tree.  Our leaf structure contains four fields:
 - **shard_hint**: a number that binds the leaf to a particular _shard interval_.
 Sharding means that the log has a predefined time during which logging requests
-will be accepted.  Once elapsed, the log can be shutdown.
+will be accepted.  Once elapsed, the log can be shut down.
 - **checksum**: a cryptographic hash of some opaque data.  The log never
 sees the opaque data; just the hash.
 - **signature**: a digital signature that is computed by the data publisher over
@@ -166,7 +166,7 @@ benign witness will only sign the log's tree head if it is consistent with prior
 history.
 
 #### Summary
-The log is sharded and will shutdown at a predefined time.  The log can shut
+The log is sharded and will shut down at a predefined time.  The log can shut
 down _safely_ because end-user verification is not interactive.  The difficulty
 of bypassing public logging is based on the difficulty of controlling a
 threshold of independent witnesses.  Witnesses cosign tree heads to make them
@@ -178,7 +178,7 @@ End-users interact with the log _indirectly_ via a data publisher.  It is the
 data publisher's job to log signed checksums, distribute necessary proofs of
 logging, and monitor the log.
 
-### A peak into the details
+### A peek into the details
 Our bird's view introduction skipped many details that matter in practise.  Some
 of these details are presented here using a question-answer format.  A
 question-answer format is helpful because it is easily modified and extended.
-- 
cgit v1.2.3


From 984f73e11ea1000b3af4f36199f591450afca2af Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Fri, 30 Apr 2021 14:15:50 +0200
Subject: clarified why domain hint is not in the leaf

---
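Note on the domain hint that this patch discusses: the hint is only used when
a logging request is submitted, which could be sketched as below.  The
function names, the hex encoding of the TXT record value, and the naive
"last two labels" rule for finding the second-level domain are assumptions
for illustration.  The TXT values are passed in as already-fetched strings,
so no DNS lookup is performed here.

```python
import hashlib
from typing import Iterable


def key_hash_hex(public_key: bytes) -> str:
    # SHA256 of the public verification key, hex-encoded (assumed encoding).
    return hashlib.sha256(public_key).hexdigest()


def domain_hint_ok(txt_records: Iterable[str], public_key: bytes) -> bool:
    # Accept the request if any TXT value matches the key hash.
    want = key_hash_hex(public_key)
    return any(record.strip().lower() == want for record in txt_records)


def rate_limit_bucket(domain_hint: str) -> str:
    # Naive second-level domain: the last two labels.  A real log would
    # likely need a public suffix list to handle names like example.co.uk.
    labels = domain_hint.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])
```

Nothing in this check ends up in the leaf; it only decides whether the request
is accepted and which rate-limit counter it is charged to.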
 doc/design.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index cb379e5..dda9efe 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -132,8 +132,8 @@ ecosystem needs more than one log to be reliable.  DNS also has built-in
 caching that can be influenced by setting TTLs accordingly.
 
 The submitter's domain hint is not part of the leaf because key management is
-more complex than that.  The only service that the log provides is discovery of
-signed checksums.  Key transparency projects have their own merit.
+more complex than that.  A separate project should focus on transparent key
+management.  The scope of our work is transparent _key-usage_.
 
 The log will _try_ to incorporate a leaf into the Merkle tree if a logging
 request is accepted.  There are no _promises of public logging_ as in
-- 
cgit v1.2.3


From b78c5a72cd6284b5be3cf4e42fd85b7f16cb0dc4 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Fri, 30 Apr 2021 14:32:10 +0200
Subject: rephrased a complex sentence

---
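Note on the proof collection that this patch reorders: its last item is an
inclusion proof, which the end-user's additional tooling must verify against
the cosigned tree head.  A minimal Python sketch follows, assuming the Merkle
tree hashing of RFC 6962 (0x00 prefix for leaves, 0x01 for interior nodes);
the document does not state which tree hashing the log uses.  `merkle_root`
and `inclusion_proof` are reference helpers that exist only to produce inputs
for `verify_inclusion`.

```python
import hashlib
from typing import List


def leaf_hash(leaf: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + leaf).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    # Largest power of two strictly smaller than n (requires n > 1).
    return 1 << ((n - 1).bit_length() - 1)


def merkle_root(leaves: List[bytes]) -> bytes:
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(index: int, leaves: List[bytes]) -> List[bytes]:
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_proof(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_proof(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def verify_inclusion(lh: bytes, index: int, tree_size: int,
                     proof: List[bytes], root: bytes) -> bool:
    # Recompute the root from the leaf hash and the proof nodes.
    if index >= tree_size:
        return False
    fn, sn, r = index, tree_size - 1, lh
    for p in proof:
        if sn == 0:
            return False
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node is the rightmost, unpaired one.
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

Verification needs only the leaf hash, the proof, and the tree head, which is
consistent with the claim that end-user verification is not interactive.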
 doc/design.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index dda9efe..c7be178 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -143,14 +143,14 @@ inclusion promises makes the log less complex.
 
 #### Step 3 - distributing proofs of public logging
 The data publisher is responsible for collecting all cryptographic proofs that
-their end-users will need to enforce public logging.  It must be possible to
-download the following collection (in)directly from the data publisher:
-1. **Shard hint**: the data publisher's selected shard hint.
-2. **Opaque data**: the data publisher's opaque data.
+their end-users will need to enforce public logging.  The collection below
+should be downloadable from the same place that the data is normally hosted.
+1. **Opaque data**: the data publisher's opaque data.
+2. **Shard hint**: the data publisher's selected shard hint.
 3. **Signature**: the data publisher's leaf signature.
-5. **Cosigned tree head**: the log's tree head and a _list of signatures_ that
+4. **Cosigned tree head**: the log's tree head and a _list of signatures_ that
 state it is consistent with prior history.
-6. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree
+5. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree
 head in question.
 
 The public verification key is known.  Therefore, the first three fields are
-- 
cgit v1.2.3


From 6de2935d3a6589d35a6e7a59c56c5a67313f3ccb Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Fri, 30 Apr 2021 14:34:38 +0200
Subject: minor edit

---
 doc/design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index c7be178..0aa83f2 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -186,7 +186,7 @@ question-answer format is helpful because it is easily modified and extended.
 #### What cryptographic primitives are supported?
 The only supported hash algorithm is SHA256.  The only supported signature
 scheme is Ed25519.  Not having any cryptographic agility makes the protocol
-simpler and more secure.
+less complex and more secure.
 
 An immediate follow-up question is how that is supposed to work with existing
 and future signature tooling.  The key insight is that _additional tooling is
-- 
cgit v1.2.3


From f649f2715dc6c4c7f45116b83a6347a08d7193b4 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Sat, 1 May 2021 15:15:22 +0200
Subject: removed unnecessary parser details in the bird's view

---
 doc/design.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index 0aa83f2..2836364 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -108,10 +108,8 @@ mistake_.
 
 #### Step 2 - submitting a logging request
 The log implements an HTTP(S) API.  Input and output is human-readable and uses
-percent encoding.  We decided to use percent encoding for requests and responses
-because it is a simple format that is commonly used on the web.  A more complex
-parser like JSON is not needed if the exchanged data structures are basic
-enough.
+a simple key-value format.  A more complex parser like JSON is not needed
+because the exchanged data structures are basic enough.
 
 The data publisher submits their shard hint, checksum, signature, and public
 verification key as key-value pairs.  The log will use the public verification
-- 
cgit v1.2.3


From e61bd2fb0e845eeef11b1825fdbc5e5c52fb2ec5 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Sat, 1 May 2021 19:39:45 +0200
Subject: added context regarding the supported cryptographic primitives

---
 doc/design.md | 49 ++++++++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index 2836364..91de288 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -183,32 +183,39 @@ question-answer format is helpful because it is easily modified and extended.
 
 #### What cryptographic primitives are supported?
 The only supported hash algorithm is SHA256.  The only supported signature
-scheme is Ed25519.  Not having any cryptographic agility makes the protocol
-less complex and more secure.
-
-An immediate follow-up question is how that is supposed to work with existing
-and future signature tooling.  The key insight is that _additional tooling is
-already required to verify Merkle tree proofs.  That tooling should use SHA256.
-That tooling should also verify all Ed25519 signatures that logs, witnesses, and
-data publishers create_.
-
-For example, suppose that an ecosystem uses `gpg` which has its own incompatible
-signature format and algorithms.  The data publisher could _cross-sign_ using
-Ed25519 as follows:
-1. Sign the opaque data as you normally would with `gpg`.
+scheme is Ed25519.  Not having any cryptographic agility makes the protocol less
+complex and more secure.
+
+We can be cryptographically opinionated because of a key insight.  Existing
+signature tools like `gpg`, `ssh-keygen -Y`, and `signify` cannot verify proofs
+of public logging.  Therefore, _additional tooling must already be installed by
+end-users_.  That tooling should verify hashes using the log's hash function.
+That tooling should also verify signatures using the log's signature scheme.
+Signed messages include tree heads as well as tree leaves.
+
+#### Why not let the data publisher pick their own signature scheme and format?
+Agility introduces complexity and difficult policy questions.  For example,
+which algorithms and formats should (not) be supported and why?  Picking Ed25519
+is a current best practice that should be encouraged if possible.
+
+There is not much we can do if a data publisher _refuses_ to rely on the log's
+hash function or signature scheme.
+
+#### What if the data publisher must use a specific signature scheme or format?
+You may _cross-sign_ the data as follows.
+1. Sign the opaque data as you normally would. 
 2. Hash the opaque data and use that as the leaf's checksum.  Sign the leaf
-using Ed25519.
+using the log's signature scheme.
 
-First the end-user verifies that the `gpg` signature is valid.  This is the
-old verification process.  Then the end-user uses the additional tooling to
-verify proofs of logging, which involves SHA256 hashing and Ed25519 signatures.
-
-The downside is that the data publisher may need to manage an Ed25519 key _as
-well_.  TODO: motivate why that is a suboptimal but worth-while trade-off.
+First the end-user verifies that the normal signature is valid.  Then the
+end-user lets the additional tooling (that is already required) verify the rest.
+Cross-signing should be a relatively comfortable upgrade path that is backwards
+compatible.  The downside is that the data publisher may need to manage an
+additional key-pair.
 
 #### What (de)serialization parsers are needed?
-#### Why witness cosigning?
 #### What policy should be used?
+#### Why witness cosigning?
 #### TODO
 Add more key questions and answers.
 
-- 
cgit v1.2.3


From 16eed32e779f2fef850c084cb2631898dddcc5dc Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Sat, 1 May 2021 19:46:52 +0200
Subject: added q/a topics

---
 doc/design.md | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index 91de288..22bfab0 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -218,6 +218,9 @@ additional key-pair.
 #### Why witness cosigning?
 #### TODO
 Add more key questions and answers.
+- Log spamming
+- Log poisoning
+- Why we removed identifier field from the leaf
 
 ## Concluding remarks
 Example of binary transparency and reproducible builds.
-- 
cgit v1.2.3


From 8f76216554d83cf45094686f6a43f757d2c186fe Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Mon, 3 May 2021 10:47:57 +0200
Subject: added detail that needs to be explained

---
 doc/design.md | 1 +
 1 file changed, 1 insertion(+)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index 22bfab0..bd24878 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -221,6 +221,7 @@ Add more key questions and answers.
 - Log spamming
 - Log poisoning
 - Why we removed identifier field from the leaf
+- Explain `latest`, `stable` and `cosigned` tree head.
 
 ## Concluding remarks
 Example of binary transparency and reproducible builds.
-- 
cgit v1.2.3


From e7bd2f29e7226e39bee7d0a1b89965ef5bdf5dc2 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus.dahlberg@kau.se>
Date: Mon, 3 May 2021 22:48:17 +0200
Subject: added q/a topic

---
 doc/design.md | 1 +
 1 file changed, 1 insertion(+)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index bd24878..4c764e3 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -222,6 +222,7 @@ Add more key questions and answers.
 - Log poisoning
 - Why we removed identifier field from the leaf
 - Explain `latest`, `stable` and `cosigned` tree head.
+- Privacy aspects
 
 ## Concluding remarks
 Example of binary transparency and reproducible builds.
-- 
cgit v1.2.3


From 866320e7cb3f8eee21f464cbc56d518f6eb66c72 Mon Sep 17 00:00:00 2001
From: Linus Nordberg <linus@nordberg.se>
Date: Tue, 4 May 2021 16:33:01 +0200
Subject: move long description of sharding to the design doc

---
 doc/api.md    | 49 ++++++++++++++-----------------------------------
 doc/design.md | 22 ++++++++++++++++++++++
 2 files changed, 36 insertions(+), 35 deletions(-)

(limited to 'doc/design.md')

diff --git a/doc/api.md b/doc/api.md
index c9d3db9..3a595ee 100644
--- a/doc/api.md
+++ b/doc/api.md
@@ -114,41 +114,20 @@ struct tree_leaf {
 }
 ```
 
-Unlike X.509 certificates which already have validity ranges, a
-checksum does not carry any such information.  Therefore, we require
-that the submitter selects a _shard hint_.  The selected shard hint
-must be in the log's _shard interval_.  A shard interval is defined by
-a start time and an end time.  Both ends of the shard interval are
-inclusive and expressed as the number of seconds since the UNIX epoch
-(January 1, 1970 00:00 UTC).
-
-Sharding simplifies log operations because it becomes explicit when a
-log can be shutdown.  A log must only accept logging requests that
-have valid shard hints.  A log should only accept logging requests
-during the predefined shard interval.  Note that _the submitter's
-shard hint is not a verified timestamp_.  The submitter should set the
-shard hint as large as possible.  If a roughly verified timestamp is
-needed, a cosigned tree head can be used.
-
-Without a shard hint, the good Samaritan could log all leaves from an
-earlier shard into a newer one.  Not only would that defeat the
-purpose of sharding, but it would also become a potential
-denial-of-service vector.
-
-The signed message is composed of the chosen `shard_hint` and the
-submitter's `checksum`.  It must be possible to verify
-`signature_over_message` using the submitter's public verification
-key.
-
-Note that the way `shard_hint` and `checksum` are serialized with
-regards to signing differs from how they're being transmitted to the
-log.
-
-A `key_hash` of the key used for signing `message` is included in
-`tree_leaf` so that the leaf can be attributed to the submitter.  A
-hash, rather than the full public key, is used to motivate the
-verifier to locate the appropriate key and make an explicit trust
-decision.
+`message` is composed of the `shard_hint`, chosen by the submitter to
+match the shard interval for the log, and the submitter's `checksum`
+to be logged.
+
+`signature_over_message` is a signature over `message`, made with the
+submitter's secret signing key. It must be possible to verify the
+signature using the submitter's public verification key, as indicated
+by `key_hash`.
+
+`key_hash` is a hash of the public verification key that matches the
+key used for signing `message`. It is included in `tree_leaf` so that the leaf can
+be attributed to the submitter.  A hash, rather than the full public
+key, is used to motivate verifiers to locate the appropriate key and
+make an explicit trust decision.
 
 ## Public endpoints
 Every log has a base URL that identifies it uniquely.  The only
diff --git a/doc/design.md b/doc/design.md
index 4c764e3..a840c01 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -216,6 +216,28 @@ additional key-pair.
 #### What (de)serialization parsers are needed?
 #### What policy should be used?
 #### Why witness cosigning?
+#### Why sharding?
+Unlike X.509 certificates which already have validity ranges, a
+checksum does not carry any such information.  Therefore, we require
+that the submitter selects a _shard hint_.  The selected shard hint
+must be in the log's _shard interval_.  A shard interval is defined by
+a start time and an end time.  Both ends of the shard interval are
+inclusive and expressed as the number of seconds since the UNIX epoch
+(January 1, 1970 00:00 UTC).
+
+Sharding simplifies log operations because it becomes explicit when a
+log can be shut down.  A log must only accept logging requests that
+have valid shard hints.  A log should only accept logging requests
+during the predefined shard interval.  Note that _the submitter's
+shard hint is not a verified timestamp_.  The submitter should set the
+shard hint as large as possible.  If a roughly verified timestamp is
+needed, a cosigned tree head can be used.
+
+Without a shard hint, the good Samaritan could log all leaves from an
+earlier shard into a newer one.  Not only would that defeat the
+purpose of sharding, but it would also become a potential
+denial-of-service vector.
+
 #### TODO
 Add more key questions and answers.
 - Log spamming
-- 
cgit v1.2.3
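
The shard-hint rule described in the patch above can be sketched as follows.
This is a minimal illustration, not the log's implementation: the function
names and the example interval covering 2021 are assumptions made here.

```python
from datetime import datetime, timezone

# Hypothetical shard interval covering the year 2021.  Both ends are inclusive
# and expressed as seconds since the UNIX epoch.
SHARD_START = int(datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp())
SHARD_END = int(datetime(2021, 12, 31, 23, 59, 59, tzinfo=timezone.utc).timestamp())


def shard_hint_is_valid(shard_hint: int, start: int = SHARD_START, end: int = SHARD_END) -> bool:
    """A log must only accept requests whose shard hint is in the shard interval."""
    return start <= shard_hint <= end


def log_is_accepting(now: int, start: int = SHARD_START, end: int = SHARD_END) -> bool:
    """A log should only accept requests while the shard interval is ongoing."""
    return start <= now <= end
```

A submitter that follows the advice to set the shard hint as large as possible
would pick `SHARD_END` here.  The two checks are kept separate because the
shard hint is not a verified timestamp.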


From 8261776989fd25fbdcf1f0e930c1b3848886ba70 Mon Sep 17 00:00:00 2001
From: Linus Nordberg <linus@nordberg.se>
Date: Wed, 5 May 2021 10:09:35 +0200
Subject: minor wording

---
 doc/design.md | 58 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index a840c01..a1a6140 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -74,46 +74,46 @@ our design is that this additional tooling makes no outbound connections.  The
 above data flows are thus preserved.
 
 ### A bird's view
-A central part of any transparency log is the data.  The data is stored by the
+A central part of any transparency log is the data stored by the log.  The data is stored by the
 leaves of an append-only Merkle tree.  Our leaf structure contains four fields:
 - **shard_hint**: a number that binds the leaf to a particular _shard interval_.
 Sharding means that the log has a predefined time during which logging requests
-will be accepted.  Once elapsed, the log can be shut down.
+are accepted.  Once elapsed, the log can be shut down.
 - **checksum**: a cryptographic hash of some opaque data.  The log never
-sees the opaque data; just the hash.
+sees the opaque data; just the hash made by the data publisher.
 - **signature**: a digital signature that is computed by the data publisher over
 the leaf's shard hint and checksum.
-- **key_hash**: a cryptographic hash of the public verification key that can be
-used to verify the leaf's signature.
+- **key_hash**: a cryptographic hash of the data publisher's public verification key that can be
+used to verify the signature.
 
 #### Step 1 - preparing a logging request
 The data publisher selects a shard hint and a checksum that should be logged.
 For example, the shard hint could be "logs that are active during 2021".  The
-checksum might be a hashed release file or something else.
+checksum might be the hash of a release file.
 
-The data publisher signs the selected shard hint and checksum using their secret
+The data publisher signs the selected shard hint and checksum using a secret
 signing key.  Both the signed message and the signature is stored
 in the leaf for anyone to verify.  Including a shard hint in the signed message
-ensures that the good Samaritan cannot change it to log all leaves from an
+ensures that a good Samaritan cannot change it to log all leaves from an
 earlier shard into a newer one.
 
-The hashed public verification key is also stored in the leaf.  This makes it
-easy to attribute the leaf to the signing entity.  For example, a data publisher
+A hash of the public verification key is also stored in the leaf.  This makes it
+possible to attribute the leaf to the data publisher.  For example, a data publisher
 that monitors the log can look for leaves that match their own key hash(es).
 
-A hash, rather than the full public verification key, is used to force the
-verifier to locate the key and trust it explicitly.  Not disclosing the public
-verification key in the leaf makes it more difficult to use an untrusted key _by
+A hash, rather than the full public verification key, is used to motivate the
+verifier to locate the key and make an explicit trust decision.  Not disclosing the public
+verification key in the leaf makes it less likely that someone would use an untrusted key _by
 mistake_.
 
 #### Step 2 - submitting a logging request
 The log implements an HTTP(S) API.  Input and output is human-readable and uses
 a simple key-value format.  A more complex parser like JSON is not needed
-because the exchanged data structures are basic enough.
+because the exchanged data structures are primitive enough.
 
 The data publisher submits their shard hint, checksum, signature, and public
 verification key as key-value pairs.  The log will use the public verification
-key to check that the signature is valid, then hash it to construct the leaf.
+key to check that the signature is valid, then hash it to construct the `key_hash` part of the leaf.
 
 The data publisher also submits a _domain hint_.  The log will download a DNS
 TXT resource record based on the provided domain name.  The downloaded result
@@ -126,8 +126,8 @@ Using DNS to combat spam is convenient because many data publishers already have
 a domain name.  A single domain name is also relatively cheap.  Another
 benefit is that the same anti-spam mechanism can be used across several
 independent logs without coordination.  This is important because a healthy log
-ecosystem needs more than one log to be reliable.  DNS also has built-in
-caching that can be influenced by setting TTLs accordingly.
+ecosystem needs more than one log in order to be reliable.  DNS also has built-in
+caching which data publishers can influence by setting TTLs accordingly.
 
 The submitter's domain hint is not part of the leaf because key management is
 more complex than that.  A separate project should focus on transparent key
@@ -136,26 +136,26 @@ management.  The scope of our work is transparent _key-usage_.
 The log will _try_ to incorporate a leaf into the Merkle tree if a logging
 request is accepted.  There are no _promises of public logging_ as in
 Certificate Transparency.  Therefore, the submitter needs to wait for an
-inclusion proof before concluding that the request succeeded.  Not having
+inclusion proof to appear before concluding that the logging request succeeded.  Not having
 inclusion promises makes the log less complex.
 
 #### Step 3 - distributing proofs of public logging
 The data publisher is responsible for collecting all cryptographic proofs that
 their end-users will need to enforce public logging.  The collection below
-should be downloadable from the same place that the data is normally hosted.
+should be downloadable from the same place that published data is normally hosted.
 1. **Opaque data**: the data publisher's opaque data.
 2. **Shard hint**: the data publisher's selected shard hint.
 3. **Signature**: the data publisher's leaf signature.
 4. **Cosigned tree head**: the log's tree head and a _list of signatures_ that
 state it is consistent with prior history.
-5. **Inclusion proof**: a proof of inclusion that is based on the leaf and tree
+5. **Inclusion proof**: a proof of inclusion based on the logged leaf and tree
 head in question.
 
-The public verification key is known.  Therefore, the first three fields are
+The data publisher's public verification key is known.  Therefore, the first three fields are
 sufficient to reconstruct the logged leaf.  The leaf's signature can be
 verified.  The final two fields then prove that the leaf is in the log.  If the
 leaf is included in the log, any monitor can detect that there is a new
-signature for a data publisher's public verification key.
+signature made by a given data publisher.
 
 The catch is that the proof of logging is only as convincing as the tree head
 that the inclusion proof leads up to.  To bypass public logging, the attacker
@@ -191,7 +191,7 @@ signature tools like `gpg`, `ssh-keygen -Y`, and `signify` cannot verify proofs
 of public logging.  Therefore, _additional tooling must already be installed by
 end-users_.  That tooling should verify hashes using the log's hash function.
 That tooling should also verify signatures using the log's signature scheme.
-Signed messages include tree heads as well as tree leaves.
+Both tree heads and tree leaves are signed.
 
 #### Why not let the data publisher pick their own signature scheme and format?
 Agility introduces complexity and difficult policy questions.  For example,
@@ -202,13 +202,13 @@ There is not much we can do if a data publisher _refuses_ to rely on the log's
 hash function or signature scheme.
 
 #### What if the data publisher must use a specific signature scheme or format?
-You may _cross-sign_ the data as follows.
-1. Sign the opaque data as you normally would. 
-2. Hash the opaque data and use that as the leaf's checksum.  Sign the leaf
-using the log's signature scheme.
+They may _cross-sign_ the data as follows.
+1. Sign the data as they're used to.
+2. Hash the data and use the result as the leaf's checksum to be logged.
+3. Sign the leaf using the log's signature scheme.
 
-First the end-user verifies that the normal signature is valid.  Then the
-end-user lets the additional tooling (that is already required) verify the rest.
+For verification, the end-user first verifies that the usual signature from step 1 is valid.  Then the
+end-user uses the additional tooling (which is already required) to verify the rest.
 Cross-signing should be a relatively comfortable upgrade path that is backwards
 compatible.  The downside is that the data publisher may need to manage an
 additional key-pair.
-- 
cgit v1.2.3
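
The leaf fields and the hashing step of cross-signing touched by the patch
above can be sketched as follows.  This is an illustration under stated
assumptions: SHA256 as the log's hash function, an 8-byte big-endian shard hint
followed by the checksum as the message to be signed, and hypothetical function
names throughout.  The actual serialization is defined by the log, not by this
sketch.  Producing the signature itself is left out because it needs a library
outside Python's standard library.

```python
import hashlib
import struct


def checksum_of(data: bytes) -> bytes:
    """Hash the opaque data; the log only ever sees this hash."""
    return hashlib.sha256(data).digest()


def key_hash_of(public_key: bytes) -> bytes:
    """Hash of the data publisher's public verification key, stored in the leaf."""
    return hashlib.sha256(public_key).digest()


def message_to_sign(shard_hint: int, checksum: bytes) -> bytes:
    """Assumed serialization of the signed message: shard hint, then checksum."""
    return struct.pack(">Q", shard_hint) + checksum


def matches_logged_checksum(data: bytes, logged_checksum: bytes) -> bool:
    """End-user check that downloaded data corresponds to the logged checksum."""
    return checksum_of(data) == logged_checksum
```

With cross-signing, an end-user would first verify the usual signature over the
data, and the additional tooling would then perform a check along the lines of
`matches_logged_checksum` before verifying the leaf signature and the proofs of
logging.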


From cd02e6e2bd7e36d8333824e57913d08a56d8a85b Mon Sep 17 00:00:00 2001
From: Linus Nordberg <linus@nordberg.se>
Date: Wed, 5 May 2021 12:31:04 +0200
Subject: add reminder about another q/a

---
 doc/design.md | 1 +
 1 file changed, 1 insertion(+)

(limited to 'doc/design.md')

diff --git a/doc/design.md b/doc/design.md
index a1a6140..2e01a34 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -245,6 +245,7 @@ Add more key questions and answers.
 - Why we removed identifier field from the leaf
 - Explain `latest`, `stable` and `cosigned` tree head.
 - Privacy aspects
+- How does this whole thing work with more than one log?
 
 ## Concluding remarks
 Example of binary transparency and reproducible builds.
-- 
cgit v1.2.3