author Rasmus Dahlberg <rasmus@mullvad.net> 2022-01-31 17:22:45 +0100
committer Rasmus Dahlberg <rasmus@mullvad.net> 2022-01-31 17:22:45 +0100
commit 07fdec6d86895706a4d5f6e3c50f8a522968b91b (patch)
tree 77fbd1dff86f92ec428486eba1842b68c2f3971b /doc
parent a395a3ac6949394db9541e6f050bbff6a7c8d560 (diff)
documented the decided remove arbitrary bytes proposal
Refer to doc/proposals/2021-11-remove-arbitrary-bytes.md for details. Since our proposal left the exact terminology undefined, this commit took a stab at that. The main idea was to keep referring to what we have in a leaf and what is being signed as a _checksum_. This ensures that we are not undermining or stepping away from our core of "signed checksums". It seemed quite natural to refer to a checksum's preimage.
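For readers skimming this commit, the preimage/checksum relationship described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes `H` is SHA-256 (the diff only says `H`), and the helper names `preimage_for` and `checksum_from_preimage` are hypothetical.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Assumption: H is SHA-256, which yields a 32-byte digest."""
    return hashlib.sha256(data).digest()


def preimage_for(data: bytes) -> bytes:
    """Signer side (hypothetical helper): the recommended preimage is H(data)."""
    return H(data)


def checksum_from_preimage(preimage: bytes) -> bytes:
    """Log side (hypothetical helper): the checksum is H(preimage).

    The preimage is limited to a 32-byte buffer, so anything else is rejected.
    """
    if len(preimage) != 32:
        raise ValueError("preimage must be exactly 32 bytes")
    return H(preimage)


# Example: the signer hashes its data once, the log hashes the preimage again.
data = b"example release artifact"          # illustrative input, not from the commit
preimage = preimage_for(data)               # what is submitted as `preimage`
checksum = checksum_from_preimage(preimage) # what ends up in the leaf: H(H(data))

print("preimage=" + preimage.hex())
print("checksum=" + checksum.hex())
```

Because the log derives the checksum itself, a submitter cannot choose the bytes stored in the leaf directly; they only control a 32-byte input to the hash.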
Diffstat (limited to 'doc')
-rw-r--r-- doc/api.md 8
-rw-r--r-- doc/design.md 40
2 files changed, 32 insertions, 16 deletions
diff --git a/doc/api.md b/doc/api.md
index c28c254..b9465b2 100644
--- a/doc/api.md
+++ b/doc/api.md
@@ -129,7 +129,9 @@ struct tree_leaf {
`shard_hint` must match a log's shard interval and is determined by the signer.
-`checksum` represents some data and is computed by the signer.
+`checksum` is a hashed preimage. The signer selects a 32-byte preimage which
+represents some data. It is recommended to set this preimage to `H(data)`, in
+which case the checksum will be `H(H(data))`.
`signature` is a signature over a serialized `statement`. It must be possible
to verify this signature using the signer's public verification key.
@@ -319,7 +321,7 @@ POST <base url>/sigsum/v0/add-leaf
Input:
- `shard_hint`: `tree_leaf.statement.shard_hint`, ASCII-encoded decimal number.
-- `checksum`: `tree_leaf.statement.checksum`, hex-encoded.
+- `preimage`: the preimage used to compute `tree_leaf.statement.checksum`, hex-encoded.
- `signature`: `tree_leaf.signature`, hex-encoded.
- `verification_key`: public verification key that can be used to verify the
above signature. The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
@@ -343,7 +345,7 @@ should (re)send their add-leaf request until observing HTTP status 200 OK.
Example:
```
$ echo "shard_hint=1633039200
-checksum=315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3
+preimage=315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3
signature=0b849ed46b71b550d47ae320a8a37401129d71888edcc387b6a604b2fe1579e25479adb0edd1769f9b525d44b843ac0b3527ea12b8d9574676464b2ec6077401
verification_key=46a6aaceb6feee9cb50c258123e573cc5a8aa09e5e51d1a56cace9bfd7c5569c
domain_hint=_sigsum_v0.example.com" | curl --data-binary @- <base url>/sigsum/v0/add-leaf
diff --git a/doc/design.md b/doc/design.md
index 1173501..d0d62cb 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -192,8 +192,7 @@ distributed form of trust. A tree leaf contains four fields:
- **shard_hint**: a number that binds the leaf to a particular _shard interval_.
Sharding means that the log has a predefined time during which logging requests
are accepted. Once elapsed, the log can be shut down or be made read-only.
-- **checksum**: most likely a hash of some data. The log is not aware of data;
-just checksums.
+- **checksum**: a cryptographic hash that commits to some data.
- **signature**: a digital signature that is computed by a signer over the
selected shard hint and checksum.
- **key_hash**: a cryptographic hash of the signer's verification key that can
@@ -209,8 +208,8 @@ independently of logs, and trust them explicitly.
### 3.2 - Usage pattern
#### 3.2.1 - Prepare a request
-A signer selects a checksum that should be logged. For example, it could be the
-hash of an executable binary or something else.
+A signer computes the checksum to be logged. For example, it could be a
+hash that commits to an executable binary or something else.
The signer also selects a shard hint representing an abstract statement like
"sigsum logs that are active during 2021". Shard hints ensure that a log's
@@ -228,11 +227,13 @@ Sigsum logs implement an HTTP(S) API. Input and output is human-readable and
use a simple ASCII format. A more complex parser like JSON is not needed
since the data structures being exchanged are primitive enough.
-The signer submits their shard hint, checksum, signature, public verification
-key and domain hint as ASCII key-value pairs. The log verifies that the public
-verification key is present in DNS and uses it to check that the signature is
-valid, then hashes it to construct the Merkle tree leaf as described in
-Section 3.1.
+The signer submits their shard hint, checksum preimage, signature, public
+verification key and domain hint as ASCII key-value pairs. The log uses the
+specified preimage to compute the signer's checksum. The log also verifies
+that the public verification key is present in DNS, and uses it to check that
+the signature is valid for the resulting checksum and shard hint. The public
+verification key is then hashed to construct the Merkle tree leaf as described
+in Section 3.1.
A sigsum log will
[try](https://git.sigsum.org/sigsum/tree/doc/proposals/2022-01-add-leaf-endpoint)
@@ -329,7 +330,20 @@ A brief summary appeared in our archive on
It may be incomplete, but covers some details that are worth thinking more
about. We are still open to remove, add, or change things.
-#### 4.2 - What is the point of having a domain hint?
+#### 4.2 - What is the point of submitting a checksum's preimage?
+Logging arbitrary bytes can poison a log with inappropriate content. While a
+leaf is already light in Sigsum, a stream of leaves could be used. By not
+allowing any checksum to be arbitrary because logs compute them, a malicious
+party would have to craft leaves that are computationally costly to encode more
+than a few bytes.
+
+It is worth pointing out that the submitted preimage is limited to be a 32-byte
+buffer. If the data to be transparently signed is `D`, the recommended preimage
+is `H(D)`. The resulting checksum would be `H(H(D))`. The log will not be in a
+position to observe the data `D`, thereby removing power in the form of trivial
+data mining while at the same time making the overall protocol less heavy.
+
+#### 4.3 - What is the point of having a domain hint?
Domain hints help log operators combat spam. By verifying that every signer
controls a domain name that is aware of their public key, rate limits can be
applied per second-level domain. You would need a large number of domain names
@@ -356,7 +370,7 @@ that added this criteria.
We are considering if additional anti-spam mechanisms should be supported in v1.
-#### 4.3 - What is the point of having a shard hint?
+#### 4.4 - What is the point of having a shard hint?
Unlike TLS certificates which already have validity ranges, a checksum does not
carry any such information. Therefore, we require that the signer selects a
shard hint. The selected shard hint must be within a log's shard interval.
@@ -383,7 +397,7 @@ A log operator that shuts down a completed shard will not affect verifiers. In
other words, a signer can continue to distribute proofs that were once
collected. This is important because a checksum does not necessarily expire.
-#### 4.4 - What parts of witness cosigning are not done?
+#### 4.5 - What parts of witness cosigning are not done?
There are interesting policy aspects that relate to witness cosigning. For
example, what witnessing policy should a verifier use and how are trustworthy
witnesses discovered. This is somewhat analogous to a related policy question
@@ -404,6 +418,6 @@ the original proposal by
[Syta et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7546521),
which puts an authority right in the middle of a slowly evolving witnessing policy.
-#### 4.5 - More questions
+#### 4.6 - More questions
- What are the privacy concerns?
- Add more questions here!