author Rasmus Dahlberg <rasmus.dahlberg@kau.se> 2021-11-15 12:04:45 +0100
committer Rasmus Dahlberg <rasmus.dahlberg@kau.se> 2021-11-15 12:04:45 +0100
commit 8c7d8f925b00f8ab81735c08a69176e1e4efa07d
tree d70d22b4199e25f111c1dddc4c7b44bb74e2ae02 /doc
parent 32ee3924c528d715bf45fb135bcec6c123055aa8
added remove arbitrary bytes proposal
Diffstat (limited to 'doc')
-rw-r--r-- doc/proposals/2021-11-remove-arbitrary-bytes.md 32
1 files changed, 32 insertions, 0 deletions
diff --git a/doc/proposals/2021-11-remove-arbitrary-bytes.md b/doc/proposals/2021-11-remove-arbitrary-bytes.md
new file mode 100644
index 0000000..bcdf3cc
--- /dev/null
+++ b/doc/proposals/2021-11-remove-arbitrary-bytes.md
@@ -0,0 +1,32 @@
+# Remove arbitrary bytes
+A leaf's checksum is currently an opaque array of 32 arbitrary bytes. We would
+like to change this to H(checksum), so that no logged bytes are arbitrary. As a
+result, the threat of log poisoning goes from unlikely to very unlikely.
+
+## Details
+New leaf:
+- Shard hint
+- H(checksum), previously the checksum itself
+- Signature
+- H(public key)
+
+A signer's signed statement must be for H(checksum), not checksum. In other
+words, a signer in effect signs H(H(data)) and then submits checksum = H(data)
+to our current add-leaf endpoint. The log computes H(checksum) for incoming
+add-leaf requests. No other changes are required for the log's leaf endpoints.
+
+Monitors locate data externally based on H(checksum), not checksum. Note that
+monitors can verify observed signatures as before without locating the data.
+This is important so that we can be sure a signing operation actually happened.
+
+Verifiers need the same (meta)data distributed as before, but must compute
+H(checksum) in the verification step to verify signatures and inclusion proofs.
+
+Witnesses are not affected by this change.
+
+## Other
+A different approach would be to submit data and let the log hash that. Not
+letting the log see data is a feature:
+- The data cannot be analyzed by the log unless its location is known
+- The log cannot be expected to store the data in the future
+- Each logging request becomes cheaper
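
The hashing flow described in the proposal can be sketched as follows. This is a minimal illustration, not the log's implementation: it assumes H is SHA-256 (the proposal only says the checksum is 32 bytes), the function names `signer_prepare` and `log_leaf_checksum` are hypothetical, and the shard hint, signature, and H(public key) fields are left out.

```python
import hashlib
from typing import Tuple


def H(b: bytes) -> bytes:
    """Hash function H. SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(b).digest()


def signer_prepare(data: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical signer-side helper.

    Returns (checksum, to_sign): checksum = H(data) is what gets submitted
    on add-leaf, and to_sign = H(checksum) is what the signed statement
    must cover under this proposal.
    """
    checksum = H(data)
    to_sign = H(checksum)
    return checksum, to_sign


def log_leaf_checksum(submitted_checksum: bytes) -> bytes:
    """Hypothetical log-side helper.

    The log never sees the data. It hashes the submitted checksum so that
    the value stored in the leaf is not an arbitrary byte string.
    """
    if len(submitted_checksum) != 32:
        raise ValueError("checksum must be 32 bytes")
    return H(submitted_checksum)


if __name__ == "__main__":
    checksum, to_sign = signer_prepare(b"example data")
    stored = log_leaf_checksum(checksum)
    # The value the log stores equals the value the signer signed over.
    print(stored == to_sign)
```

A monitor or verifier that holds the checksum would recompute H(checksum) in the same way before checking a signature or an inclusion proof.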