# Sigsum Logging Design v0
We propose sigsum logging.  It is similar to Certificate Transparency and Go's
checksum database, except that cryptographically **sig**ned check**sum**s are
logged in order to make signature operations transparent.  For example,
malicious and unintended key-usage can be detected using a sigsum log.  This is
a building block that can be used for a variety of use-cases.  Transparent
management of executable binaries and provenance are two examples.  Our
architecture revolves around centralized log operations, distributed trust, and
minimalism that simplifies usage.

**Preliminaries.**
You have a basic understanding of cryptographic primitives, e.g., digital
signatures, hash functions, and Merkle trees.  You roughly know what problem
Certificate Transparency solves and how.

**Warning.**
This is a work-in-progress document that may be moved or modified.  A future
revision of this document will bump the version number to v1.

Please let us know if you have any feedback.

## 1 - Introduction
Transparency logs make it possible to detect unwanted events.  For example,
are there any (mis-)issued TLS certificates
	[\[CT\]](https://tools.ietf.org/html/rfc6962),
did you get a different Go module than everyone else
	[\[ChecksumDB\]](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md),
or is someone running unexpected commands on your server
	[\[AuditLog\]](https://transparency.dev/application/reliably-log-all-actions-performed-on-your-servers/)?

A sigsum log brings transparency to **sig**ned check**sum**s.  You can think of
sigsum logging as pre-hashed digital signing with transparency.
The signing party is called a _signer_.
The user of the signed data is called a _verifier_.

The problem with _just digital signing_ is that it is difficult to determine
whether the signed data is _actually the data that should have been signed_.
How would we detect if a secret signing key got compromised?
How would we detect if something was signed by mistake, or even worse,
if the signing party was forced to sign malicious data against their will?

Sigsum logs make it possible to answer these types of questions.  The basic
idea is to make a signer's _key-usage_ transparent.  This is a powerful building
block that can be used to facilitate verification of falsifiable claims.

Examples include:
- Everyone gets the same executable binaries
	[\[BT\]](https://wiki.mozilla.org/Security/Binary_Transparency)
- A domain does not serve malicious JavaScript
	[\[SRI\]](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity)
- A list of key-value pairs is maintained with a certain policy.

There are many other use-cases that sigsum logging can help with.  We intend to
document them based on what people are working on in a
        [separate document](https://git.sigsum.org/sigsum/tree/doc/claimant.md)
using the
        [claimant model](https://github.com/google/trillian/blob/master/docs/claimantmodel/CoreModel.md).
This document is about our log design.

### 1.1 - Goals and non-scope
The goal of sigsum logging is to make a signer's key-usage transparent in
general.  Therefore, sigsum logs allow logging of signed checksums and some
minimally required metadata.  Storing data and rich metadata is a non-goal.

We want the resulting design to be easy from many different perspectives, for
example log operations and verification in constrained environments.  This
includes considerations such as simple parsing, protection against log spam and
poisoning, and a well-defined gossip protocol without complex auditing logic.

This is in contrast to Certificate Transparency, which requires ASN.1 parsing,
storage of arbitrary certificate fields, reactive auditing of complicated log
promises, and deployment of a gossip protocol that suits the web
	[\[G1,](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7346853)
	[G2\]](https://datatracker.ietf.org/doc/html/draft-ietf-trans-gossip-05).

### 1.2 - Log properties
It is fair to say that much thought went into _removing_ unwanted usage-patterns
of sigsum logs, ultimately leaving us with a design that has the below
properties.  It does not mean that the sigsum log design is set in stone yet,
but it is mature enough to capture what type of ecosystem we want to bootstrap.
- **Preserved data flows:** a verifier can enforce sigsum logging without making
additional outbound network connections.  Proofs of public logging are provided
using the same distribution mechanism as is used for distributing the actual data.
In other words, the signer talks to the log on behalf of the verifying party.
- **Sharding to simplify log life cycles:** starting to operate a log is easier
than closing it down in a reliable way.  We have a predefined sharding interval
that determines the time during which the log will be active.  Submissions to
an older log shard cannot be replayed in another non-overlapping log shard.
- **Defenses against log spam and poisoning:** to keep logs as useful as
possible they should be open for everyone.  However, accepting logging requests
from anyone at arbitrary rates can lead to abusive usage patterns.  We store as
little metadata as possible to combat log poisoning.  We piggyback on DNS to
combat log spam.  Sharding is also helpful to combat log spam in the long run.
- **Built-in mechanisms that ensure a globally consistent log:** transparency
logs rely on gossip protocols to detect forks.  We built a proactive gossip
protocol directly into the log.  It is a variant of
	[witness cosigning](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7546521).
- **No cryptographic agility**: the only supported signature schemes and hash
functions are Ed25519 and SHA256.  Not having any cryptographic agility makes
protocols and data formats simpler and more secure.
- **Simple (de)serialization parsers:** complex (de)serialization parsers
increase attack surfaces and make the system more difficult to use in
constrained environments.  Signed and logged data can be (de)serialized using
	[Trunnel](https://gitlab.torproject.org/tpo/core/trunnel/-/blob/main/doc/trunnel.md),
or "by hand" in many modern programming languages.  This is the only parsing
that a verifier is required to support.  Signers, monitors, and witnesses
additionally need to interact with a sigsum log's line-terminated ASCII HTTP(S)
        [API](https://git.sigsum.org/sigsum/tree/doc/api.md).

### 1.3 - Roadmap
First we describe our threat model.  Then we give a bird's-eye view of the design.
Finally, we wrap up with an incomplete frequently asked questions section.

## 2 - Threat model
We consider a powerful attacker that gained control of a signer's signing and
release infrastructure.  This covers a weaker form of attacker that is able to
sign data and distribute it to a subset of isolated verifiers.  For example,
this is essentially what the FBI requested from Apple in the San Bernardino case
	[\[FBI-Apple\]](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order).
The fact that signing keys and related infrastructure components get
compromised should not be controversial these days
	[\[SolarWinds\]](https://www.zdnet.com/article/third-malware-strain-discovered-in-solarwinds-supply-chain-attack/).

The attacker can also gain control of the log's signing key and infrastructure.
This covers a weaker form of attacker that is able to sign log data and
distribute it to a subset of isolated verifiers.  For example, this could have
been the case when a remote code execution was found for a Certificate
Transparency Log
	[\[DigiCert\]](https://groups.google.com/a/chromium.org/g/ct-policy/c/aKNbZuJzwfM).

The overall system is said to be secure if a monitor can discover every signed
checksum that a verifier would accept.  A log can misbehave by not presenting
the same append-only Merkle tree to everyone because it is attacker-controlled.
However, a log operator would only do that if it is likely to go unnoticed.

For security we need a collision resistant hash function and an unforgeable
signature scheme.  We also assume that at most a threshold of independent
witnesses stop following protocol to protect against a malicious log that
attempts
	[split-view](https://datatracker.ietf.org/doc/html/draft-ietf-trans-gossip-05)
and
	[slow-down](https://git.sigsum.org/sigsum/tree/archive/2021-08-24-checkpoint-timestamp)
attacks.   A log operator can at best deny service with these assumptions.

## 3 - Design
An overview of sigsum logging is shown in Figure 1.  Before going into detail
we give a brief primer below.
```
                               +----------+
           checksum +----------|  Signer  |-----------+ data
           metadata |          +----------+           | metadata
                    |                ^                | proof
                    v                |                v
  +-----+ H(vk) +---------+   proof  |          +--------------+
  | DNS |------>|   Log   |----------+          | Distribution |
  +-----+       +---------+                     +--------------+
                 ^  | checksum                     |  |
                 |  | metadata                     |  |data
                 |  | proof     +---------+   data |  |metadata
                 |  +---------->| Monitor |<-------+  |proof
                 v              +---------+           v
               +---------+           |             +----------+
               | witness |           | false       | Verifier |
               +---------+           | claim       +----------+
                                     v
                                investigate
           
                       Figure 1: system overview
```

A signer wants to make their key-usage transparent.  Therefore, they sign a
statement that sigsum logs accept.  That statement encodes a checksum of some
data.  Minimal metadata must also be logged, such as the checksum's signature
and a hash of the public verification key.  A hash of the public verification
key is configured in DNS as a TXT record to help log operators combat spam.

The signing party waits for their submission to be included in the log.  When an
inclusion proof is available that leads up to a trustworthy Merkle tree head,
the signed checksum's data is ready for distribution with proofs of public
logging.  A sigsum log does not help the signer with any data distribution.

Verifiers use the signer's data if it is accompanied by proofs of public
logging.  Monitors look for signed checksums and data that correspond to public
keys that they are aware of.  Any falsifiable claim that a signer makes about
their key-usage can now be verified because no signing operation goes unnoticed.

Verifiers and monitors can be convinced that public logging happened without
additional outbound network connections if a threshold of witnesses followed a
cosigning protocol.  More detail is provided in Section 3.2.3.

### 3.1 - Merkle tree
A sigsum log maintains a public append-only Merkle tree.  Independent witnesses
verify that this tree is fresh and append-only before cosigning it to achieve a
distributed form of trust.  A tree leaf contains four fields:
- **shard_hint**: a number that binds the leaf to a particular _shard interval_.
Sharding means that the log has a predefined time during which logging requests
are accepted.  Once elapsed, the log can be shut down.
- **checksum**: most likely a hash of some data.  The log is not aware of data;
just checksums.
- **signature**: a digital signature that is computed by a signer over the
leaf's shard hint and checksum.
- **key_hash**: a cryptographic hash of the signer's verification key that can
be used to verify the signature.

A shard hint is included in the signed statement to prevent replays in a
non-overlapping shard.  See details in Section 4.2.
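
The four leaf fields can be pictured as a fixed-size record.  The following is a
minimal sketch, not the normative format.  The field widths are assumptions that
this document does not specify: an 8-byte big-endian shard hint, a 32-byte
checksum, a 64-byte Ed25519 signature, and a 32-byte SHA256 key hash.  The
names `Leaf`, `LEAF_FORMAT`, and `LEAF_SIZE` are hypothetical.

```python
import struct
from dataclasses import dataclass

# Assumed layout (not taken from this document): big-endian, no padding.
#   Q   -> shard_hint, unsigned 64-bit integer
#   32s -> checksum, 32 bytes
#   64s -> signature, 64 bytes (Ed25519)
#   32s -> key_hash, 32 bytes (SHA256)
LEAF_FORMAT = ">Q32s64s32s"
LEAF_SIZE = struct.calcsize(LEAF_FORMAT)


@dataclass(frozen=True)
class Leaf:
    shard_hint: int
    checksum: bytes
    signature: bytes
    key_hash: bytes

    def serialize(self) -> bytes:
        """Encode the leaf as a fixed-size byte string."""
        if (len(self.checksum), len(self.signature), len(self.key_hash)) != (32, 64, 32):
            raise ValueError("unexpected field length")
        return struct.pack(
            LEAF_FORMAT, self.shard_hint, self.checksum, self.signature, self.key_hash
        )

    @classmethod
    def parse(cls, blob: bytes) -> "Leaf":
        """Decode a leaf, rejecting input of the wrong size."""
        if len(blob) != LEAF_SIZE:
            raise ValueError("unexpected leaf length")
        return cls(*struct.unpack(LEAF_FORMAT, blob))
```

Because every field has a fixed width, parsing needs no length prefixes or
delimiters, which is in line with the goal of simple (de)serialization.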

Any additional metadata that is use-case specific can be stored as part of the
data that a checksum represents.  Where data is located is use-case specific.

Note that a key hash is logged rather than the public key itself.  This reduces
the likelihood that an untrusted key is discovered and used by mistake.  In
other words, verifiers and monitors must locate keys and trust them explicitly.

### 3.2 - Usage pattern
#### 3.2.1 - Prepare a request
A signer selects a shard hint and a checksum that should be logged.  The
selected shard hint represents an abstract statement like "sigsum logs that are
active during 2021".  The selected checksum is most likely the output of a
hash function.  For example, it could be the hash of an executable binary.

The selected shard hint and checksum are signed by the signer.  A shard hint is
incorporated into the signed statement to ensure that a log's leaves cannot be
replayed in a non-overlapping shard by a good Samaritan.
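
As a rough illustration of the preparation step, the sketch below computes a
checksum and assembles the bytes that a signer would sign.  The byte layout
(8-byte big-endian shard hint followed by a 32-byte checksum) is an assumption,
and the helper names are hypothetical.  Producing the Ed25519 signature itself
is left to whatever signing library or tool the signer already uses.

```python
import hashlib
import struct


def checksum_of(data: bytes) -> bytes:
    """Checksum to be logged: here, the SHA256 hash of the signer's data."""
    return hashlib.sha256(data).digest()


def statement_to_sign(shard_hint: int, checksum: bytes) -> bytes:
    """Bytes that the signer would sign with their Ed25519 key.

    Assumed layout: 8-byte big-endian shard hint, then a 32-byte checksum.
    """
    if len(checksum) != 32:
        raise ValueError("checksum must be 32 bytes")
    return struct.pack(">Q", shard_hint) + checksum


# Example: a statement for some data and a shard hint chosen by the signer.
statement = statement_to_sign(1640995199, checksum_of(b"example binary"))
```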

The signer also has to do a one-time DNS setup.  As outlined below, logs will
check that _some domain_ is aware of the signer's verification key.  This is
part of a defense mechanism that helps us combat log spam.

#### 3.2.2 - Submit request
Sigsum logs implement an HTTP(S) API.  Input and output are human-readable and
use a simple ASCII format.  A more complex parser like JSON is not needed
because the exchanged data structures are primitive enough.

A signer submits their shard hint, checksum, signature, and public verification
key as key-value pairs.  The log uses the public verification key to check that
the signature is valid, then hashes it to construct the leaf's key hash.
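
To give a feel for how little parsing such a format needs, here is a sketch of
an encoder and a parser for line-terminated `key=value` pairs.  The exact wire
format is defined in the API document, not here, so the `key=value\n` shape and
the field names in the example are assumptions made for illustration.

```python
def encode_request(fields: dict) -> str:
    """Encode string key-value pairs as one `key=value` line per pair."""
    lines = []
    for key, value in fields.items():
        if not key or "=" in key or "\n" in key or "\n" in value:
            raise ValueError("invalid character in key or value")
        lines.append(f"{key}={value}\n")
    return "".join(lines)


def parse_request(text: str) -> dict:
    """Parse newline-terminated `key=value` lines, rejecting malformed input."""
    if not text.endswith("\n"):
        raise ValueError("input must end with a newline")
    fields = {}
    for line in text[:-1].split("\n"):
        key, sep, value = line.partition("=")
        if not sep or not key or key in fields:
            raise ValueError(f"malformed or duplicate line: {line!r}")
        fields[key] = value
    return fields


# Hypothetical field names; binary values are shown hex-encoded.
request = encode_request({
    "shard_hint": "1640995199",
    "checksum": "ab" * 32,
})
```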

The signer also submits a _domain hint_.  The log will download a DNS TXT
resource record based on the provided domain name.  The downloaded result must
match the public verification key hash.  By verifying that all signers control a
domain that is aware of their verification key, rate limits can be applied per
second-level domain.  You would need a large number of domain names to spam the
log in any significant way if rate limits are not too loose.
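
The log-side check can be sketched as follows.  The DNS lookup itself is omitted:
`txt_records` stands for the TXT values that the log already fetched for the
domain hint.  The hex encoding of the key hash, the naive "last two labels"
notion of a second-level domain, and the `RateLimiter` class are all assumptions
for illustration.  A real deployment would likely need something like the
Public Suffix List to find the registrable domain.

```python
import hashlib
from collections import Counter


def key_hash_hex(verification_key: bytes) -> str:
    """Hex-encoded SHA256 hash of the signer's public verification key."""
    return hashlib.sha256(verification_key).hexdigest()


def domain_hint_ok(txt_records: list, verification_key: bytes) -> bool:
    """True if any fetched TXT value equals the hex-encoded key hash."""
    expected = key_hash_hex(verification_key)
    return any(record.strip().lower() == expected for record in txt_records)


def rate_limit_key(domain_hint: str) -> str:
    """Naive second-level domain: the last two labels of the name."""
    labels = domain_hint.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


class RateLimiter:
    """Counts accepted requests per second-level domain within one interval."""

    def __init__(self, max_per_interval: int):
        self.max_per_interval = max_per_interval
        self.counts = Counter()

    def allow(self, domain_hint: str) -> bool:
        key = rate_limit_key(domain_hint)
        if self.counts[key] >= self.max_per_interval:
            return False
        self.counts[key] += 1
        return True

    def reset(self) -> None:
        """Call at the start of each rate-limit interval."""
        self.counts.clear()
```

With this shape, subdomains of the same second-level domain share one budget, so
creating many subdomains does not help a spammer.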

Using DNS to combat spam is convenient because many signers already have a
domain name.  A single domain name is also relatively cheap.  Another benefit is
that the same anti-spam mechanism can be used across several independent logs
without coordination.  This is important because a healthy log ecosystem needs
more than one log to be reliable in case of downtime or unexpected events like
        [cosmic rays](https://groups.google.com/a/chromium.org/g/ct-policy/c/PCkKU357M2Q/).

A signer's domain hint is not part of the logged leaf because key management is
more complex than that.  A separate project should focus on transparent key
management.  Our work is about transparent _key-usage_.

A sigsum log _tries_ to incorporate a leaf into its Merkle tree if a logging
request is accepted.  There are however no _promises of public logging_ as in
Certificate Transparency.  Therefore, sigsum logs do not provide low latency.  A
signer has to wait for an inclusion proof and a cosigned tree head.

#### 3.2.3 - Wait for witness cosigning
Sigsum logs freeze a tree head every five minutes.  Cosigning witnesses poll the
logs for so-called _to-sign_ tree heads, verifying that they are fresh and
append-only before doing a cosignature operation.  Cosignatures are posted back
to the logs so that signers can easily fetch the finalized cosigned tree heads.
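
A witness's decision can be sketched as below.  This is only an outline under
stated assumptions: the tree head is assumed to carry a timestamp, a tree size,
and a root hash; the freshness window is assumed to equal the five-minute freeze
interval; and `consistent` is a hypothetical callable that verifies a Merkle
consistency proof between two tree heads, which is not implemented here.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeHead:
    timestamp: int    # seconds since the UNIX epoch (assumed field)
    tree_size: int    # number of leaves
    root_hash: bytes  # Merkle tree root


MAX_AGE_SECONDS = 5 * 60  # assumed freshness window


def is_fresh(head: TreeHead, now=None, max_age: int = MAX_AGE_SECONDS) -> bool:
    """A tree head is fresh if its timestamp is recent and not in the future."""
    now = int(time.time()) if now is None else now
    return now - max_age <= head.timestamp <= now


def should_cosign(previous: TreeHead, to_sign: TreeHead, consistent, now=None) -> bool:
    """Cosign only if the to-sign tree head is fresh and append-only.

    `previous` is the last tree head this witness cosigned for the log.
    """
    if not is_fresh(to_sign, now):
        return False
    if to_sign.tree_size < previous.tree_size:
        return False  # the tree shrank, so it cannot be append-only
    if to_sign.tree_size == previous.tree_size:
        return to_sign.root_hash == previous.root_hash
    return consistent(previous, to_sign)
```

The witness only needs to remember the last tree head it cosigned per log, which
keeps witness state small.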

It takes five to ten minutes before a signer's distribution phase can start.
The added latency is a trade-off that simplifies sigsum logging by removing the
need for reactive gossip-audit protocols
	[\[G1,](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7346853)
	[G2,](https://datatracker.ietf.org/doc/html/draft-ietf-trans-gossip-05)
	[G3,](https://petsymposium.org/2021/files/papers/issue2/popets-2021-0024.pdf)
	[G4\]](https://docs.google.com/document/d/16G-Q7iN3kB46GSW5b-sfH5MO3nKSYyEb77YsM7TMZGE/edit).

Use-cases like instant certificate issuance are not supported by design.

#### 3.2.4 - Distribution
After a signer has collected proofs of public logging, the distribution phase can
start.  Distribution happens using the same mechanism that is normally used for
the data.  For example, on a website, in a git repository, etc.

**Data:**
the signer's data.  It can be used to reproduce a logged checksum.

**Metadata:**
a signer's shard hint, signature, and verification key hash.  Note that the
combination of data and metadata can be used to reconstruct the logged leaf.

**Proof:**
an inclusion proof that leads up to a cosigned tree head.

#### 3.2.5 - Verification
A verifier should only accept the distributed data if these criteria hold:
1. The signer's checksum is correct for the distributed data.
2. The signer's signed statement is valid for the specified public key.
3. The provided tree head can be reconstructed from the logged leaf and 
its inclusion proof.
4. The provided tree head is from a known log with enough valid cosignatures.

Notice that there are no new outbound network connections for a verifier.
Therefore, a proof of public logging is only as convincing as the tree head that
an inclusion proof leads up to.  Sigsum logs have trustworthy tree heads due to
using a variant of witness cosigning.  In other words, a verifier cannot be
tricked into accepting some data whose checksum has yet to be publicly logged
unless the attacker controls more than a threshold of witnesses.
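
The third and fourth criteria can be sketched as follows, assuming RFC 6962-style
Merkle tree hashing (a `0x00` prefix for leaves and a `0x01` prefix for interior
nodes) and the inclusion proof verification procedure from RFC 9162.  The
function names are hypothetical.  `enough_cosignatures` assumes that each
cosignature has already been verified, so that it only counts how many trusted
witnesses are represented.

```python
import hashlib


def leaf_hash(leaf: bytes) -> bytes:
    """Hash of a serialized leaf (assumed RFC 6962 domain separation)."""
    return hashlib.sha256(b"\x00" + leaf).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash of an interior node (assumed RFC 6962 domain separation)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def root_from_inclusion_proof(leaf_index: int, tree_size: int,
                              leaf_hash_value: bytes, path: list) -> bytes:
    """Reconstruct the tree root from a leaf hash and its inclusion proof."""
    if not 0 <= leaf_index < tree_size:
        raise ValueError("leaf index out of range")
    fn, sn = leaf_index, tree_size - 1
    r = leaf_hash_value
    for p in path:
        if sn == 0:
            raise ValueError("proof is too long")
        if (fn & 1) or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node has no right sibling.
            while fn != 0 and not (fn & 1):
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    if sn != 0:
        raise ValueError("proof is too short")
    return r


def enough_cosignatures(valid_cosigners: set, trusted_witnesses: set,
                        threshold: int) -> bool:
    """True if at least `threshold` trusted witnesses cosigned the tree head."""
    return len(valid_cosigners & trusted_witnesses) >= threshold
```

A verifier would compare the reconstructed root with the root hash in the
cosigned tree head, and accept only if the cosignature threshold is also met.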

#### 3.2.6 - Monitoring
An often overlooked step is that transparency logging falls short if no one keeps
track of what appears in the public logs.  Monitoring is necessarily use-case
specific in sigsum.  At minimum, you need to locate relevant public keys.  You
may also need to be aware of how to locate the data that a checksum represents.

It should also be noted that sigsum logging can facilitate detection of attacks
even if a verifier fails open by only partially enforcing the third and fourth
criteria in Section 3.2.5.  For example, the fact that a distribution mechanism does not
serve proofs of public logging could indicate that there is an ongoing attack
against a signer's distributed infrastructure.  A monitor may detect that.

### 3.3 - Summary
Sigsum logs are sharded and shut down at predefined times.  A sigsum log can
shut down _safely_ because verification on the verifier-side is not interactive.
The difficulty of bypassing public logging is based on the difficulty of
controlling enough independent witnesses.  A witness checks that a log's tree
head is correct before cosigning.  Correct refers to fresh and append-only.

Signers, monitors, and witnesses interact with the logs using an ASCII HTTP(S)
API.  A signer must prove that they own a domain name as an anti-spam mechanism.
No data or rich metadata is logged to protect the log operator from poisoning.
It also keeps log operations simpler because there are fewer bytes to manage.

Verifiers interact with the logs indirectly through their signer's existing
distribution mechanism.  Signers are responsible for logging signed checksums
and distributing necessary proofs of public logging.  Monitors discover signed
checksums in the logs, generating alerts if any key-usage is inappropriate.

## 4 - Frequently Asked Questions
#### 4.1 - What parts of the design are we still thinking about?
A brief summary appeared in our archive on
	[2021-10-05](https://git.sigsum.org/sigsum/tree/archive/2021-10-05-open-design-thoughts?id=5c02770b5bd7d43b9327623d3de9adeda2468e84).
It may be incomplete, but covers some details that are worth thinking more
about.  We are still open to removing, adding, or changing things if it is motivated.

#### 4.2 - What is the point of having a shard hint?
Unlike TLS certificates, which already have validity ranges, a checksum does not
carry any such information.  Therefore, we require that the signer selects a
shard hint.  The selected shard hint must be in a log's shard interval.  A shard
interval is defined by a start time and an end time.  Both ends of the shard
interval are inclusive and expressed as the number of seconds since the UNIX
epoch (January 1, 1970 00:00 UTC).
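
As a small worked example, the check that a log would apply to a submitted shard
hint can be sketched like this.  The shard interval for "active during 2021" is
an illustrative assumption, and the names are hypothetical.

```python
from datetime import datetime, timezone

# Example shard interval covering the year 2021, in seconds since the UNIX
# epoch.  Both ends are inclusive.
SHARD_START = int(datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp())
SHARD_END = int(datetime(2021, 12, 31, 23, 59, 59, tzinfo=timezone.utc).timestamp())


def shard_hint_in_interval(shard_hint: int, shard_start: int, shard_end: int) -> bool:
    """True if the shard hint falls within the inclusive shard interval."""
    return shard_start <= shard_hint <= shard_end
```

A leaf signed with a shard hint from this interval would be rejected by a log
whose shard interval starts in 2022, which is what prevents replays across
non-overlapping shards.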

Without sharding, a good Samaritan can add all leaves from an old log into a
newer one that just started its operations.  This makes log operations
unsustainable in the long run because log sizes grow indefinitely.

Such re-logging also comes at the risk of activating someone else's rate limits.

Note that a signer's shard hint is not a verified timestamp.  We recommend
setting it as large as possible.  If a verified timestamp is needed to reason about
the time of logging, you may use a cosigned tree head instead
	[\[TS\]](https://git.sigsum.org/sigsum/commit/?id=fef460586e847e378a197381ef1ae3a64e6ea38b).

#### 4.3 - XXX
- Why not store data in the log?  XXX: answered enough already?
- Why not store rich metadata in the log? XXX: answered enough already?
- What (de)serialization parsers are needed and why?
- What cryptographic primitives are supported and why?
- What thought went into witness cosigning?  Compare with other approaches, and
should include `get-tree-head-*` endpoints in more detail.
- Are there any privacy concerns?
- How does it work with more than one log?
- What policy should a verifier use?