# Sigsum Logging API v0
This document outlines the sigsum logging API, version 0.  The broader picture
is not explained here.  We assume that you are already familiar with the sigsum
logging [design document](https://git.sigsum.org/sigsum/tree/doc/design.md).

**Warning.**
This is a work-in-progress document that may be moved or modified.

## 1 - Overview
A log implements an HTTP(S) API for accepting requests and sending responses.

- Requests that retrieve data from the log use the HTTP GET method.
- Requests that add data to the log use the HTTP POST method.
- Input data in get-requests is expressed as ASCII values that are
slash-delimited at the end of the respective endpoint URLs.
- Input data in add-requests and output data in responses are expressed as
ASCII-encoded key/value pairs.
- Binary data is hex-encoded before being transmitted.

The motivation for using text-based formats for request and response data is
that they are simple for humans to parse and understand.  These formats are not
used for the serialization of signed and/or logged data, where a better-defined
and more storage-efficient format is desirable.

A _signer_ should distribute log responses to their end-users in any format that
suits them.  The (de)serialization required for _end-users_ is a small subset of
Trunnel.  Trunnel is an "idiot-proof" wire-format in use by the Tor project.

Figure 1 of our design document gives an intuition of all involved parties.

## 2 - Primitives
### 2.1 - Cryptography
Logs use the same Merkle tree hash strategy as
	[RFC 6962,§2](https://tools.ietf.org/html/rfc6962#section-2).
Any mentions of hash functions or digital signature schemes refer to
	[SHA256](https://csrc.nist.gov/csrc/media/publications/fips/180/4/final/documents/fips180-4-draft-aug2014.pdf)
and
	[Ed25519](https://tools.ietf.org/html/rfc8032).
The exact
	[signature format](https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.sshsig)
is defined by OpenSSH.

### 2.2 - Serialization
Log requests and responses are transmitted using simple ASCII encodings, for a
smaller dependency than alternative parsers like JSON or percent-encoded URLs.
Some input and output data is binary: cryptographic hashes and signatures.
Binary data must be lower-case base16-encoded, also known as lower-case hex
encoding.  Using hex as opposed to base64 is motivated by it being simpler,
favoring ease of decoding and encoding over efficiency on the wire.
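
The hex rule above can be sketched in a few lines of Python.  This is only an illustration: the helper names `encode_binary` and `decode_binary` are hypothetical, and the strict rejection of upper-case digits and whitespace is an assumption about how a careful parser might enforce "lower-case base16".

```python
import re

# Zero or more pairs of lower-case hex digits, nothing else.
_LOWER_HEX = re.compile(r"(?:[0-9a-f]{2})*")

def encode_binary(value: bytes) -> str:
    """Return the lower-case hex encoding of a binary value."""
    return value.hex()

def decode_binary(text: str) -> bytes:
    """Decode a hex field, rejecting upper-case digits, odd lengths, and whitespace."""
    if _LOWER_HEX.fullmatch(text) is None:
        raise ValueError("expected lower-case hex with an even number of digits")
    return bytes.fromhex(text)
```

The explicit pattern check is there because `bytes.fromhex` on its own accepts upper-case digits and embedded whitespace.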

We use the [Trunnel](https://gitweb.torproject.org/trunnel.git)
[description language](https://www.seul.org/~nickm/trunnel-manual.html)
to define data structures that need to be (de)serialized in the log.  Data
structures that need to be signed have additional SSH-specific metadata.  For
example, metadata includes a magic preamble string and a signing context.  An
implementer can easily express the SSH signing format using Trunnel.

### 2.3 - Merkle tree
#### 2.3.1 - Tree head
A tree head contains a timestamp, a tree size, and a root hash.

```
struct tree_head {
	u64 timestamp;
	u64 tree_size;
	u8 root_hash[32];
};
```
`timestamp` is the time since the UNIX epoch (January 1, 1970 00:00 UTC) in
seconds.  It is included so that monitors can be convinced of _freshness_ if
enough witnesses added their cosignatures.  A signer can also use timestamps
to prove to an end-user that public logging happened within some interval
	[\[TS\]](https://git.sigsum.org/sigsum/commit/?id=fef460586e847e378a197381ef1ae3a64e6ea38b).

`tree_size` is the number of leaves in a log.

`root_hash` is a Merkle tree root hash that fixes a log's structure and content.
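
As a rough sketch, the `tree_head` structure could be serialized in Python as shown here.  The function name `serialize_tree_head` is hypothetical.  Trunnel encodes integers in network (big-endian) byte order, so the result is a fixed 48-byte string.

```python
import struct

def serialize_tree_head(timestamp: int, tree_size: int, root_hash: bytes) -> bytes:
    """Serialize a tree head: u64 timestamp, u64 tree_size, u8 root_hash[32]."""
    if len(root_hash) != 32:
        raise ValueError("root_hash must be 32 bytes")
    # ">QQ32s": two big-endian unsigned 64-bit integers followed by 32 raw bytes.
    return struct.pack(">QQ32s", timestamp, tree_size, root_hash)
```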

#### 2.3.2 - (Co)signed tree head
Logs and witnesses perform (co)signing operations by treating the serialized
tree head as the message `M` in SSH's
	[signing format](https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.sshsig).
The hash algorithm string must be "SHA256".  The reserved string must be empty.
The namespace field must be set to `tree_head:v0:<key-hash>@sigsum.org`, where
`<key-hash>` is substituted with the log's hashed public key.  The public key is
encoded as defined in
	[RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2)
before hashing it.  This ensures a _sigsum log specific tree head context_ that
prevents a possible
	[attack](https://git.sigsum.org/sigsum/tree/archive/2021-08-10-witnessing-broader-discuss#n95)
in multi-log ecosystems.

A witness must not cosign a tree head if it is inconsistent with prior history
or if the timestamp is older than five (5) minutes.  This means that a witness plays
	[two abstract roles](https://git.sigsum.org/sigsum/tree/archive/2021-08-31-checkpoint-timestamp-continued#n84):
Verifier("append-only") and Verifier("freshness").
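
The freshness half of this rule might be checked as in the sketch below.  The names `MAX_TREE_HEAD_AGE` and `is_fresh` are hypothetical, and the sketch only covers the five-minute limit; the append-only check additionally requires verifying a consistency proof against the witness's previously cosigned tree head.

```python
import time
from typing import Optional

# Five minutes, expressed in seconds to match tree_head.timestamp.
MAX_TREE_HEAD_AGE = 5 * 60

def is_fresh(timestamp: int, now: Optional[int] = None) -> bool:
    """Return True if the tree head timestamp is at most five minutes old."""
    if now is None:
        now = int(time.time())
    return now - timestamp <= MAX_TREE_HEAD_AGE
```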

#### 2.3.3 - Tree leaf
Logs support a single leaf type.  It contains a shard hint, a checksum of the
signer's message, a signature, and a key hash.

```
struct tree_leaf {
    u64 shard_hint;
    u8 checksum[32];
    u8 signature[64];
    u8 key_hash[32];
}
```

`shard_hint` is a shard hint that matches the log's shard interval.

`checksum` is a hash of the 32-byte message submitted by the signer.
The message is meant to represent some data and it is recommended that
the signer uses `H(data)` as the message, in which case `checksum`
will be `H(H(data))`.

`signature` is computed by treating the above message as `M`
in SSH's
	[signing format](https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.sshsig).
The hash algorithm string must be "SHA256".  The reserved string must be empty.
The namespace field must be set to `tree_leaf:v0:<shard_hint>@sigsum.org`, where
`<shard_hint>` is replaced with the shortest decimal ASCII representation of `shard_hint`.
This ensures a _sigsum shard-specific tree leaf context_.

`key_hash` is a hash of the signer's public key using the same
format as Section 2.3.2.  It is included
in `tree_leaf` so that each leaf can be attributed to a signer.  A hash,
rather than the full public key, is used to motivate monitors and end-users to
locate the appropriate key and make an explicit trust decision.
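
To make the leaf format more concrete, here is a minimal Python sketch of the data a signer signs and of the serialized leaf.  All function names are hypothetical.  The layout of the signed blob (the `SSHSIG` preamble followed by the namespace, reserved, hash algorithm, and `H(message)` strings) follows PROTOCOL.sshsig, and the hash algorithm string is copied from the text above; adjust it if the deployed format spells it differently.  Producing the actual Ed25519 signature is out of scope here.

```python
import hashlib
import struct

HASH_ALGORITHM = b"SHA256"  # as written in this document

def ssh_string(data: bytes) -> bytes:
    """Encode an SSH string: 4-byte big-endian length, then the bytes."""
    return struct.pack(">I", len(data)) + data

def leaf_namespace(shard_hint: int) -> bytes:
    """Namespace with the shortest decimal representation of shard_hint."""
    return f"tree_leaf:v0:{shard_hint}@sigsum.org".encode("ascii")

def leaf_signed_data(shard_hint: int, message: bytes) -> bytes:
    """Build the blob that is signed for a leaf, with the message as M."""
    if len(message) != 32:
        raise ValueError("message must be 32 bytes")
    return (
        b"SSHSIG"
        + ssh_string(leaf_namespace(shard_hint))
        + ssh_string(b"")  # reserved, must be empty
        + ssh_string(HASH_ALGORITHM)
        + ssh_string(hashlib.sha256(message).digest())
    )

def serialize_tree_leaf(shard_hint: int, checksum: bytes,
                        signature: bytes, key_hash: bytes) -> bytes:
    """Serialize tree_leaf; Trunnel integers are big-endian."""
    if (len(checksum), len(signature), len(key_hash)) != (32, 64, 32):
        raise ValueError("unexpected field length")
    return struct.pack(">Q32s64s32s", shard_hint, checksum, signature, key_hash)
```

Note that the `H(message)` value inside the signed blob is the same 32 bytes as the leaf's `checksum`, so a verifier can rebuild the signed blob from a leaf without knowing the message.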

## 3 - Public endpoints
A log must have a fixed and unique log URL.  A valid log URL is any valid
HTTP(S) URL that ends with "/sigsum/v0".  Example:
```
https://log.example.com:4711/opossum/2021/sigsum/v0
```

Input data in `get-*` requests is appended to the end of an endpoint's
URL.  Values are delimited by a `/`.  The order of values is defined by
the respective endpoints.  For an example, see Section 3.4.

Input data in `add-*` requests is POSTed in the HTTP message body as
line-terminated ASCII key/value pairs.  The key-value format is `Key=Value\n`.
Everything before the first equal sign is considered a key.
Everything after the first equal sign and before the next newline character is
considered a value.  Different keys may appear in any order.  A key may be
repeated, in which case the relative order must be preserved.  Example:
```
blue=first value for blue key
red=some value for red key
blue=second value for blue key
```
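
A parser for this format can be quite small.  The following Python sketch uses a hypothetical function name, `parse_key_values`, and returns a dictionary that maps each key to the list of its values in order of appearance.  Treating a missing final newline or a line without an equal sign as an error is an assumption about strictness, not something the format description mandates.

```python
from typing import Dict, List

def parse_key_values(body: str) -> Dict[str, List[str]]:
    """Parse `Key=Value\\n` lines, preserving the order of repeated keys."""
    if body and not body.endswith("\n"):
        raise ValueError("last line is not newline-terminated")
    fields: Dict[str, List[str]] = {}
    # The final element after splitting is the empty string that follows
    # the last newline, so it is dropped.
    for line in body.split("\n")[:-1]:
        # Split on the first equal sign only; later ones belong to the value.
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"missing '=' in line: {line!r}")
        fields.setdefault(key, []).append(value)
    return fields
```

Applied to the example above, the `blue` key maps to a two-element list and the `red` key to a one-element list.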

Output data (in replies) is sent in the HTTP message body using the same
key-value format as for `add-*` input data.

The HTTP status code is 200 OK to indicate success.  A different HTTP
status code is used to indicate failure.  On failure, a log must respond with
a human-readable string describing what went wrong using the key `error`.
Example:
```
error=Invalid signature
```

### 3.1 - get-tree-head-to-cosign
Returns a tree head that witnesses should cosign.

```
GET <log URL>/get-tree-head-to-cosign
```

Input:
- None

Output on success:
- `timestamp`: `tree_head.timestamp`, ASCII-encoded decimal number.
- `tree_size`: `tree_head.tree_size`, ASCII-encoded decimal number.
- `root_hash`: `tree_head.root_hash`, hex-encoded.
- `signature`: log signature for the above tree head, hex-encoded.

### 3.2 - get-tree-head-cosigned
Returns a tree head that has been cosigned by at least one witness.  The list of
cosignatures is updated every time a new cosignature gets added.  This
endpoint is used by signers that want _enough cosignatures as fast as possible_.

```
GET <log URL>/get-tree-head-cosigned
```

Input:
- None

Output on success:
- `timestamp`: `tree_head.timestamp`, ASCII-encoded decimal number.
- `tree_size`: `tree_head.tree_size`, ASCII-encoded decimal number.
- `root_hash`: `tree_head.root_hash`, hex-encoded.
- `signature`: log signature for the above tree head, hex-encoded.
- `cosignature`: witness signature for the above tree head, hex-encoded.
- `key_hash`: hashed witness public key that can be used to verify the
  above cosignature.  The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2)
  before hashing.  The resulting hash value is hex-encoded.

The `cosignature` and `key_hash` fields may repeat. The first witness signature
corresponds to the first key hash, the second witness signature corresponds to
the second key hash, etc.  At least one witness signature must be returned on
success.  The number of witness signatures and key hashes must match.
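
The pairing rule can be sketched as follows.  The function name `pair_cosignatures` is hypothetical, and the input is assumed to be a dictionary that maps each key of the parsed response to its values in order of appearance.

```python
from typing import Dict, List, Tuple

def pair_cosignatures(fields: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (key_hash, cosignature) pairs in the order they were sent."""
    cosignatures = fields.get("cosignature", [])
    key_hashes = fields.get("key_hash", [])
    if not cosignatures:
        raise ValueError("at least one cosignature is required")
    if len(cosignatures) != len(key_hashes):
        raise ValueError("number of cosignatures and key hashes differ")
    # The i-th cosignature belongs to the i-th key hash.
    return list(zip(key_hashes, cosignatures))
```

A client would then look up each key hash among the witnesses it trusts and verify the matching cosignature over the tree head.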

### 3.3 - get-inclusion-proof
```
GET <log URL>/get-inclusion-proof/<tree_size>/<leaf_hash>
```

Input:
- `tree_size`: tree size of the tree head that the proof should be
  based on, ASCII-encoded decimal number.
- `leaf_hash`: leaf hash identifying which `tree_leaf` the log should prove
  inclusion of, hex-encoded.

Output on success:
- `leaf_index`: zero-based index of the leaf that the proof is based on,
  ASCII-encoded decimal number.
- `inclusion_path`: node hash, hex-encoded.

The leaf hash is computed using the RFC 6962 hashing strategy.  In
other words, `H(0x00 | tree_leaf)`.

`inclusion_path` must contain one or more hashes.  The order of node hashes
follows from the hash strategy, see RFC 6962.

Example:
```
$ curl <log URL>/get-inclusion-proof/4711/241fd4538d0a35c2d0394e4710ea9e6916854d08f62602fb03b55221dcdac90f
```
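
Once a client has `leaf_index` and the `inclusion_path` hashes, it can recompute the root hash and compare it with the one in a tree head.  The Python sketch below follows the iterative verification procedure for RFC 6962-style Merkle trees; the function names are hypothetical, and the inputs are assumed to be already hex-decoded.

```python
import hashlib
from typing import Sequence

def leaf_hash(serialized_leaf: bytes) -> bytes:
    """H(0x00 | tree_leaf)."""
    return hashlib.sha256(b"\x00" + serialized_leaf).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    """H(0x01 | left | right), the interior node hash."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def verify_inclusion(leaf: bytes, leaf_index: int, tree_size: int,
                     path: Sequence[bytes], root_hash: bytes) -> bool:
    """Check that `leaf` (a leaf hash) is at `leaf_index` in the tree."""
    if leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf
    for p in path:
        if sn == 0:
            return False  # path is longer than the tree is deep
        if fn & 1 or fn == sn:
            # Current node is a right child: sibling goes on the left.
            r = node_hash(p, r)
            # Skip levels where this node has no sibling.
            while fn & 1 == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            # Current node is a left child: sibling goes on the right.
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```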

### 3.4 - get-consistency-proof
```
GET <log URL>/get-consistency-proof/<old_size>/<new_size>
```

Input:
- `old_size`: tree size of an older tree head that the log should prove is
  consistent with a newer tree head, ASCII-encoded decimal number.
- `new_size`: tree size of a newer tree head, ASCII-encoded decimal number.

Output on success:
- `consistency_path`: node hash, hex-encoded.

`consistency_path` must contain one or more hashes.  The order of node
hashes follows from the hash strategy, see RFC 6962.

Example:
```
$ curl <log URL>/get-consistency-proof/42/4711
```
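
A client that knows both root hashes can check the returned `consistency_path` along the lines of the sketch below.  It follows the iterative verification procedure for RFC 6962-style Merkle trees.  The function names are hypothetical, the inputs are assumed to be already hex-decoded, and the sketch only handles the case `0 < old_size < new_size`.

```python
import hashlib
from typing import Sequence

def node_hash(left: bytes, right: bytes) -> bytes:
    """H(0x01 | left | right), the interior node hash."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def verify_consistency(old_size: int, new_size: int, old_root: bytes,
                       new_root: bytes, path: Sequence[bytes]) -> bool:
    """Check that the tree with `new_root` extends the tree with `old_root`."""
    if not 0 < old_size < new_size or not path:
        return False
    nodes = list(path)
    # If the old tree is a complete subtree, its root is implied and
    # is not part of the proof, so it is put back in front.
    if old_size & (old_size - 1) == 0:
        nodes.insert(0, old_root)
    fn, sn = old_size - 1, new_size - 1
    while fn & 1:
        fn >>= 1
        sn >>= 1
    fr = sr = nodes[0]
    for c in nodes[1:]:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            # c is a left sibling shared by both trees.
            fr = node_hash(c, fr)
            sr = node_hash(c, sr)
            while fn & 1 == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            # c is a right sibling that only exists in the new tree.
            sr = node_hash(sr, c)
        fn >>= 1
        sn >>= 1
    return sn == 0 and fr == old_root and sr == new_root
```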

### 3.5 - get-leaves
```
GET <log URL>/get-leaves/<start_size>/<end_size>
```

Input:
- `start_size`: index of the first leaf to retrieve, ASCII-encoded decimal
  number.
- `end_size`: index of the last leaf to retrieve, ASCII-encoded decimal number.

Output on success:
- `shard_hint`: shard hint to use as tree leaf context, ASCII-encoded decimal
  number.
- `checksum`: `tree_leaf.checksum`, hex-encoded.
- `signature`: `tree_leaf.signature`, hex-encoded.
- `key_hash`: `tree_leaf.key_hash`, hex-encoded.

All fields may be repeated to return more than one leaf.  The first
value in each list refers to the first leaf, the second value in each
list refers to the second leaf, etc.  The size of each list must match.

A log may return fewer leaves than requested.  At least one leaf
must be returned on success.

Example:
```
$ curl <log URL>/get-leaves/42/4711
```

### 3.6 - add-leaf
```
POST <log URL>/add-leaf
```

Input:
- `shard_hint`: shard hint to use as tree leaf context, ASCII-encoded decimal
  number.
- `message`: the message used to compute `tree_leaf.checksum`, hex-encoded.
- `signature`: `tree_leaf.signature`, hex-encoded.
- `public_key`: public key that can be used to verify the
  above signature.  The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2),
  then hex-encoded.
- `domain_hint`: domain name indicating where `tree_leaf.key_hash` can be found
  as a DNS TXT resource record with hex-encoding.  The left-most label must be
  set to `_sigsum_v0`.

Output on success:
- None

A submission will not be accepted if `signature` or `shard_hint` is invalid.
The key hash retrieved via `domain_hint` must also match the specified public
key.

A submission may be rejected if the second-level domain name has exceeded its
rate limit.  A rate limit should only be charged for the specified domain hint
on success.

HTTP status 200 OK must not be returned unless the log has sequenced its Merkle
tree so that the next signed tree head will include the added leaf.  A submitter
should (re)send their add-leaf request until observing HTTP status 200 OK.

Example:
```
$ echo "shard_hint=1633039200
message=315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3
signature=0b849ed46b71b550d47ae320a8a37401129d71888edcc387b6a604b2fe1579e25479adb0edd1769f9b525d44b843ac0b3527ea12b8d9574676464b2ec6077401
public_key=46a6aaceb6feee9cb50c258123e573cc5a8aa09e5e51d1a56cace9bfd7c5569c
domain_hint=_sigsum_v0.example.com" | curl --data-binary @- <log URL>/add-leaf
```

TODO: update the above with valid input.  Link
	[proposal](https://git.sigsum.org/sigsum/tree/doc/proposals/2021-11-ssh-signature-format.md)
on how one could produce it "byte-for-byte" using Python and ssh-keygen -Y.
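
For clients that do not shell out to curl, the request body could be assembled as in this Python sketch.  The function names are hypothetical, and the sketch only builds the body and the key hash expected in DNS; it does not produce signatures or send anything.  The public key is assumed to already be in its 32-byte RFC 8032 encoding.

```python
import hashlib

def key_hash_hex(public_key: bytes) -> str:
    """Hex-encoded SHA256 hash of an encoded Ed25519 public key."""
    if len(public_key) != 32:
        raise ValueError("public key must be 32 bytes")
    return hashlib.sha256(public_key).hexdigest()

def add_leaf_body(shard_hint: int, message: bytes, signature: bytes,
                  public_key: bytes, domain_hint: str) -> bytes:
    """Build the key/value body for a POST to <log URL>/add-leaf."""
    if not domain_hint.startswith("_sigsum_v0."):
        raise ValueError("left-most label must be _sigsum_v0")
    fields = [
        ("shard_hint", str(shard_hint)),    # decimal
        ("message", message.hex()),         # lower-case hex
        ("signature", signature.hex()),
        ("public_key", public_key.hex()),
        ("domain_hint", domain_hint),
    ]
    # Every pair is terminated by a newline, including the last one.
    return "".join(f"{key}={value}\n" for key, value in fields).encode("ascii")
```

The value returned by `key_hash_hex` is what a signer would publish in the TXT record named by `domain_hint`.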

### 3.7 - add-cosignature
```
POST <log URL>/add-cosignature
```

Input:
- `cosignature`: witness signature over `tree_head`, hex-encoded.
- `key_hash`: hashed witness public key that can be used to verify the
  above cosignature.  The key is encoded as defined in [RFC 8032, section 5.1.2](https://tools.ietf.org/html/rfc8032#section-5.1.2)
  prior to hashing.  The resulting hash value is hex-encoded.

Output on success:
- None

`key_hash` can be used to identify which witness cosigned a tree head.  A
key-hash, rather than the full public key, is used to motivate monitors
and end-users to locate the appropriate key and make an explicit trust decision.

Note that logs must be configured with relevant public keys for witnesses.

Example:
```
$ echo "cosignature=d1b15061d0f287847d066630339beaa0915a6bbb77332c3e839a32f66f1831b69c678e8ca63afd24e436525554dbc6daa3b1201cc0c93721de24b778027d41af
key_hash=662ce093682280f8fbea9939abe02fdba1f0dc39594c832b411ddafcffb75b1d" | curl --data-binary @- <log URL>/add-cosignature
```

TODO: update the above with valid input.  Link
	[proposal](https://git.sigsum.org/sigsum/tree/doc/proposals/2021-11-ssh-signature-format.md)
on how one could produce it "byte-for-byte" using Python and ssh-keygen -Y.

## 4 - Parameter summary
Ed25519 is used as the signature scheme and SHA256 as the hash function.

### 4.1 - Log
- **Public key**: public key that is used to verify tree head
  signatures.
- **Base URL**: where the log can be reached over HTTP(S).  It is the
  prefix to be used to construct a version 0 specific endpoint.
- **Shard interval start**: the earliest time at which logging
  requests are accepted as the number of seconds since the UNIX epoch.
- **Shard interval end**: determined by policy.  A log that is active should
  use the number of seconds since the UNIX epoch as a dynamic shard end.

### 4.2 - Witness
- **Public key**: public key that is used to verify tree head
  cosignatures.