path: root/doc/proposals/2021-11-remove-arbitrary-bytes.md
# Remove arbitrary bytes
A leaf's checksum is currently an opaque array of 32 arbitrary bytes supplied
by the submitter.  We would like to log H(checksum) instead, so that no logged
bytes are arbitrary.  As a result, the threat of log poisoning goes from
unlikely to very unlikely.

## Details
New leaf:
- Shard hint
- H(checksum), previously just the checksum
- Signature
- H(public key)

A signer's signed statement must be for H(checksum), not checksum.  In other
words, a signer in effect signs H(H(data)) and then submits checksum<-H(data)
to our current add-leaf endpoint.  The log computes H(checksum) for incoming
add-leaf requests.  No other changes are required for the log's leaf endpoints.
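
The hashing steps above can be sketched as follows.  This is a minimal
illustration, not the log's implementation: it assumes H is SHA-256 (the
proposal only fixes a 32-byte output), the function names are hypothetical,
and the other parts of the signed statement (such as the shard hint) and the
signature itself are left out.

```python
import hashlib
from typing import Tuple


def H(b: bytes) -> bytes:
    # Assumption: H is SHA-256; the proposal does not name the hash function.
    return hashlib.sha256(b).digest()


def signer_prepare(data: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: return (checksum, value covered by the signature)."""
    checksum = H(data)      # submitted on the add-leaf endpoint
    to_sign = H(checksum)   # the signed statement is for H(checksum) = H(H(data))
    return checksum, to_sign


def log_leaf_value(submitted_checksum: bytes) -> bytes:
    """Hypothetical helper: the value the log stores in the leaf."""
    if len(submitted_checksum) != 32:
        raise ValueError("checksum must be 32 bytes")
    # The log never stores the submitted bytes directly, only their hash.
    return H(submitted_checksum)
```

With this arrangement the value a signer signs and the value the log stores
are the same 32 bytes, even though the log only ever receives checksum.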

Monitors locate data externally based on H(checksum), not checksum.  Note that
monitors can verify observed signatures as before without locating the data.
This matters because a valid signature shows that a signing operation actually
happened, even when the data cannot be found.

Verifiers need the same (meta)data distributed as before, but the verification
step must compute H(checksum) in order to verify signatures and inclusion
proofs.
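
The extra verifier step might look like the sketch below.  It only covers the
hash comparison, again assuming SHA-256 for H and using a hypothetical function
name; signature and inclusion proof verification would then run over the
resulting value and are not shown.

```python
import hashlib


def H(b: bytes) -> bytes:
    # Assumption: H is SHA-256; the proposal does not name the hash function.
    return hashlib.sha256(b).digest()


def leaf_matches_data(data: bytes, logged_value: bytes) -> bool:
    """Hypothetical check: does the logged leaf value correspond to this data?"""
    checksum = H(data)                  # what the signer submitted
    return H(checksum) == logged_value  # what the log actually stored
```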

Witnesses are not affected by this change.

## Other
A different approach would be to submit the data itself and let the log hash
it.  Not letting the log see the data is a feature:
- The log cannot analyze the data unless its location is known
- The log cannot be expected to store the data in the future
- Each logging request becomes cheaper