**Title**: Remove arbitrary bytes <br/>
**Date**: 2021-12-04 <br/>
**State**: Decided <br/>

# Summary
A leaf's checksum is currently an opaque array of 32 arbitrary bytes.  We would
like to change this to H(checksum), so that no logged bytes are arbitrary.  As a
result, the threat of log poisoning goes from unlikely to very unlikely.

# Detailed description
New leaf:
- Shard hint
- H(checksum); previously the checksum itself
- Signature
- H(public key)

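A signer's signed statement would cover the shard hint and H(checksum), not the
shard hint and the checksum.  The inputs to an add-leaf submission stay the
same: the submitter still provides the checksum itself.  The log hashes the
submitted checksum and then does all verification as before.  The hashed
checksum is stored in the log's leaf.  As such, a submitter can no longer choose
these leaf bytes directly.  They would have to search for checksums whose hashes
contain the desired bytes, which makes it computationally expensive to craft
many arbitrary leaf bytes.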
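
The log-side handling of an add-leaf submission could look roughly like the
sketch below.  It rests on several assumptions that are not part of this
proposal: H is taken to be SHA-256, `signed_message` uses a made-up
serialization (8-byte big-endian shard hint followed by H(checksum)), and the
names `Leaf`, `add_leaf` and `Verifier` are hypothetical.  Signature
verification is passed in as a callback so that the sketch does not commit to a
particular signature scheme.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Callable


def H(data: bytes) -> bytes:
    # Assumption: H is SHA-256; the proposal text does not name the hash.
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class Leaf:
    shard_hint: int
    checksum_hash: bytes  # H(checksum); previously the checksum itself
    signature: bytes
    key_hash: bytes       # H(public key)


def signed_message(shard_hint: int, checksum_hash: bytes) -> bytes:
    # Hypothetical serialization: 8-byte big-endian shard hint, then H(checksum).
    return struct.pack(">Q", shard_hint) + checksum_hash


# verify(public_key, message, signature) -> bool, supplied by the caller.
Verifier = Callable[[bytes, bytes, bytes], bool]


def add_leaf(shard_hint: int, checksum: bytes, signature: bytes,
             public_key: bytes, verify: Verifier) -> Leaf:
    if len(checksum) != 32:
        raise ValueError("checksum must be 32 bytes")
    checksum_hash = H(checksum)  # the log hashes the submitted checksum
    message = signed_message(shard_hint, checksum_hash)
    if not verify(public_key, message, signature):
        raise ValueError("signature does not cover shard hint and H(checksum)")
    # Only the hash of the checksum is stored in the leaf.
    return Leaf(shard_hint, checksum_hash, signature, H(public_key))
```

Note that a signature made over the shard hint and the raw checksum is rejected
here, because the message that the log reconstructs contains H(checksum).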

Monitors locate data externally based on H(checksum), not checksum.  Note that
monitors can verify observed signatures as before without locating the data.
This is important so that we can be sure a signing operation actually happened.
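
For a monitor, the change mostly means re-keying whatever lookup table it uses.
A minimal sketch, assuming H is SHA-256 and that the monitor already keeps a
table from checksums to data locations (the names `build_index` and `locate`,
and the location strings, are hypothetical):

```python
import hashlib
from typing import Dict, Optional


def H(data: bytes) -> bytes:
    # Assumption: H is SHA-256; the proposal text does not name the hash.
    return hashlib.sha256(data).digest()


def build_index(locations: Dict[bytes, str]) -> Dict[bytes, str]:
    # Re-key a checksum -> location table by H(checksum), as seen in leaves.
    return {H(checksum): where for checksum, where in locations.items()}


def locate(index: Dict[bytes, str], logged_checksum_hash: bytes) -> Optional[str]:
    # Returns None when the monitor has no data for this leaf.
    return index.get(logged_checksum_hash)
```

A leaf whose H(checksum) is not in the index can still have its signature
verified; the monitor just cannot inspect the corresponding data.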

Verifiers need the same (meta)data distributed as before, but they must compute
H(checksum) during verification in order to check signatures and inclusion
proofs.

Witnesses are not affected by this change.

Note: a different approach would have been to submit data and let the log hash
that.  Not letting the log see data is a feature:
- The log cannot analyze the data unless it knows where the data is located
- The log cannot be expected to store the data in the future
- Each logging request becomes cheaper

Note: the above terminology is a bit "meh" and messy.  Consider better naming
when moving this into the main documentation.