From f8d61a93109656e89cbbdeae56ca778127a0eafe Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Mon, 31 Jan 2022 15:36:29 +0100 Subject: moved some persisted pads to proposal directory See doc/proposals/2022-01-how-to-use-proposal-folder for details. --- archive/2022-01-04--meeting-minutes | 8 +- archive/2022-01-04-proposal-add-leaf-endpoint | 90 ---------------- archive/2022-01-04-proposal-domain-hint | 51 --------- archive/2022-01-04-proposal-get-endpoints | 46 -------- archive/2022-01-04-proposal-tree-head-endpoints | 118 --------------------- archive/2022-01-18--meeting-minutes | 5 +- .../2022-01-18-proposal-author-reader-terminology | 41 ------- archive/2022-01-18-proposal-log-url | 26 ----- doc/proposals/2022-01-add-leaf-endpoint | 90 ++++++++++++++++ doc/proposals/2022-01-author-reader-terminology | 41 +++++++ doc/proposals/2022-01-domain-hint | 51 +++++++++ doc/proposals/2022-01-get-endpoints | 46 ++++++++ doc/proposals/2022-01-log-url | 26 +++++ doc/proposals/2022-01-tree-head-endpoints | 118 +++++++++++++++++++++ 14 files changed, 378 insertions(+), 379 deletions(-) delete mode 100644 archive/2022-01-04-proposal-add-leaf-endpoint delete mode 100644 archive/2022-01-04-proposal-domain-hint delete mode 100644 archive/2022-01-04-proposal-get-endpoints delete mode 100644 archive/2022-01-04-proposal-tree-head-endpoints delete mode 100644 archive/2022-01-18-proposal-author-reader-terminology delete mode 100644 archive/2022-01-18-proposal-log-url create mode 100644 doc/proposals/2022-01-add-leaf-endpoint create mode 100644 doc/proposals/2022-01-author-reader-terminology create mode 100644 doc/proposals/2022-01-domain-hint create mode 100644 doc/proposals/2022-01-get-endpoints create mode 100644 doc/proposals/2022-01-log-url create mode 100644 doc/proposals/2022-01-tree-head-endpoints diff --git a/archive/2022-01-04--meeting-minutes b/archive/2022-01-04--meeting-minutes index 284c3e5..a828953 100644 --- a/archive/2022-01-04--meeting-minutes +++ 
b/archive/2022-01-04--meeting-minutes @@ -16,13 +16,13 @@ Status round * [rgdd, ln5] remove arbitrary bytes proposal (re-opened) * https://git.sigsum.org/sigsum/tree/doc/proposals/2021-11-remove-arbitrary-bytes.md * [rgdd, ln5] change get-* endpoints that use HTTP post proposal (new) - * https://git.sigsum.org/sigsum/tree/archive/2022-01-04-proposal-get-endpoints + * https://git.sigsum.org/sigsum/tree/doc/proposals/2022-01-get-endpoints * [rgdd, ln5] stricter domain hint verification proposal (new) - * https://git.sigsum.org/sigsum/tree/archive/2022-01-04-proposal-domain-hint + * https://git.sigsum.org/sigsum/tree/doc/proposals/2022-01-domain-hint * [rgdd, ln5] change add-leaf endpoint proposal (new) - * https://git.sigsum.org/sigsum/tree/archive/2022-01-04-proposal-add-leaf-endpoint + * https://git.sigsum.org/sigsum/tree/doc/proposals/2022-01-add-leaf-endpoint * [rgdd, ln5] change tree-head endpoints proposal (new) - * https://git.sigsum.org/sigsum/tree/archive/2022-01-04-proposal-tree-head-endpoints + * https://git.sigsum.org/sigsum/tree/doc/proposals/2022-01-tree-head-endpoints Decisions * Decision: adopt remove arbitrary bytes proposal diff --git a/archive/2022-01-04-proposal-add-leaf-endpoint b/archive/2022-01-04-proposal-add-leaf-endpoint deleted file mode 100644 index 3123e02..0000000 --- a/archive/2022-01-04-proposal-add-leaf-endpoint +++ /dev/null @@ -1,90 +0,0 @@ -Proposal: change add-leaf endpoint - -Background ---- -Right now a log returns HTTP status 200 OK if it will "try" to merge a submitted -leaf into its Merkle tree. A submitter should not assume that logging happened -until they see an inclusion proof that leads up to a (co)signed tree head. - -If a submitted leaf does not show up in the log despite seeing HTTP status 200 -OK, the submitter must resubmit it. When a resubmission is required/expected is -undefined. 
- -The reason for this "try" behavior is that log operations become much easier, -especially in self-hosted environments that do not rely on managed databases. -In other words, it is OK to just be "pretty sure" that a submitted leaf will be -persisted and sequenced, and "100%" sure after sequencing actually happened. - -Proposal ---- -A log should not return HTTP status 200 OK unless: -1. The submitted leaf has been sequenced as part of a persisted database. -2. The next tree head that the log signs will contain the submitted leaf. - -HTTP status 3XX is returned with, e.g., "Error=leaf has not been sequenced yet" -if it is not guaranteed that the submitted leaf has been sequenced. - -This means that logging should be assumed after seeing HTTP status 200 OK. This -assumption will be confirmed when the submitter obtains the next (co)signed tree -head. Further investigation is required if it turns out that this assumption is -false. - -Notes ---- -An earlier draft of this proposal considered if useful debug information should -be returned, such as "leaf index", "leaf hash", and "estimated time until a -cosigned tree head is available". We decided to not go in this direction to -avoid redundant and unsigned output that may be mis-used and tampered with ("not -consistent with design"). - -(Note that it is easy to determine when the next cosigned tree head will be -available. The to-sign tree head has a timestamp, and it is rotated every 300s. -Then it takes an additional 300s before the to-sign tree head is served with -collected cosignatures.) - -An earlier draft of this proposal also considered to have verifiable output: - * Option 1: An inclusion proof and a signed tree head - * Option 2: An inclusion proof and a cosigned tree head - -This could be a worthwhile direction if the submitter can only obtain the -required data by using the add-leaf endpoint, thus "forcing resubmits until the -desired output is obtained". 
Credit to Al Cutter who proposed this (very nice) -idea to us a while back. - -It is not appropriate to always return an inclusion proof for a signed tree -head. What we want is for submitters to get inclusion proofs that reference -cosigned tree heads. - -There are drawbacks to replace the above signed tree head with a cosigned tree -head: - * A submitter that submits multiple leaves will likely (have to?) retrieve - the same cosigned tree head multiple times via the add-leaf endpoint. That - overhead adds up. - * A submitter will have to be in a "resubmit phase" for several minutes as - the default, because it takes time before a cosigned tree head becomes - available. - * (The most sensible implementation would likely resubmit periodically, - say, once per minute. A clever implementation would look at the - timestamp of the to-sign endpoint to determine when is the earliest time - that a merged may have happened.) - -Moreover, removing the get-inclusion-proof and get-tree-head-cosigned endpoints -to force usage of add-leaf excludes (or makes for wonky) usage patterns of the -log: - * "I just want to download all cosigned tree heads to archive them" -> add - leaves. - * "I just want to debug/know that the log is committed to have the leaf - logged, and rely on other witnesses" -> still forced to observe the log's - cosignatures. - * "I want an inclusion proof to a particular tree head" -> build the Merkle - tree yourself to construct that proof. The log's API chooses tree heads for - you. - * (Keeping these endpoints in addition to any new add-leaf output would to - some degree defeat the purpose of adding output, which is why it is not - considered an option.) - -In gist, we decided to go with a solution that is somewhere in between what we -did before and what Al Cutter proposed. We defined when a resubmission is (not) -expected. 
As a result, a self-hosted log may return at least one HTTP 3XX for -each leaf request, and a few seconds later return HTTP status 200 OK for the -same input data. diff --git a/archive/2022-01-04-proposal-domain-hint b/archive/2022-01-04-proposal-domain-hint deleted file mode 100644 index 322d9cc..0000000 --- a/archive/2022-01-04-proposal-domain-hint +++ /dev/null @@ -1,51 +0,0 @@ -Proposal: stricter domain hint requirements - -Background ---- -Right now a log is expected to look up a submitter's public key hash via DNS. A -domain hint, say, example.com, specifies the location of a TXT RR that contains -the appropriate key hash in hex-encoding. "Some domain knows about the key". - -Downsides with this: -1. A log can be instructed to look up arbitrary TXT records -2. No versioning - -As far as we know there are no amplification threats with (1), but ideally it -would only be possible to query TXT RRs that are actually relevant for Sigsum. - -Not having any versioning could potentially become a headache. All other log -endpoints are versioned. There is no good reason to not have versioning here, -unless that would imply something like registering many different things with -IANA as a result. - -Proposal ---- -Require that a domain hint is formatted as: - - _sigsum_v0.* - -Examples of valid domain hints: - - _sigsum_v0.com - _sigsum_v0.example.com - _sigsum_v0.sub.example.com - -Examples of invalid domain hints: - - _sigsum_v0hello.example.com - -This change addresses both (1) and (2), without making DNS configs harder. - -Notes ---- -For v1 we need to consider if something should be registered with IANA. 
Credit -to Patrik Wallström who pointed us towards documentation about labels with -underscores: - * https://www.rfc-editor.org/rfc/rfc8552.html - * https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#underscored-globally-scoped-dns-node-names - -Note also that the dependency on TXT look-ups means that a "hidden log" via Tor -would need help from a resolver that is also available over Tor (preferably an -onion but at minimum reachable over TCP). This is because TXT records cannot be -resolved over Tor. This proposal allows the used resolver to be restricted to -only resolve _sigsum_*. diff --git a/archive/2022-01-04-proposal-get-endpoints b/archive/2022-01-04-proposal-get-endpoints deleted file mode 100644 index cbe3170..0000000 --- a/archive/2022-01-04-proposal-get-endpoints +++ /dev/null @@ -1,46 +0,0 @@ -Proposal: change get-* endpoints that use HTTP post - -Background ---- -Right now we HTTP POST ASCII key-value pairs on these endpoints: - * get-leaves - * get-inclusion-proof - * get-consistency-proof - -The original reason was to not have an additional parser, say, input-parameters -with percent-encoding as part of the request URL. - -A major problem with this approach is that it will not be possible to benefit -from HTTP caching. Debugging, with "URLs that reference data" also becomes more -messy. You would have to say "I did printf | curl ...". - -Proposal ---- -Change these endpoints so that they use HTTP GET. Encode input params in URL: - - /get-leaves/10/20 # get leaves 10,11,...20 - /get-consistency-proof/10/20 # proof from tree size 10 to 20 - /get-inclusion-proof/10/ # proof for tree size 10 - -This notably avoids percent-encoding which is more messy. - -Notes ---- -We considered if it would be a good idea to re-use our ASCII parser for the -portion of the URL that encodes input data. The basic idea would be that -different "end of key" and "end of value" patterns could be used that are better -suited for a URL. 
- -For example, instead of (=,\n) one could use ([,]) as ("end of key", "end of value"). - * get-leaves/start_size[10]end_size[12] - * get-consistency-proof/old_size[12]new_size[14] - * get-inclusion-proof/tree_size[10]leaf_hash[ab...ef] - -The reasons why we aborted this direction: - * We can not think of any concrete security risk with the shorter '/' proposal. - * There are very few parameters at play here, hard to confuse and quick - feedback loop if you do. For example, "Error=start size must be smaller or - equal to end size". - * We can be sure that the '/' proposal will not introduce any wonky - interoperability issues; picking a ("end of key", "end of value") would - require much more care. diff --git a/archive/2022-01-04-proposal-tree-head-endpoints b/archive/2022-01-04-proposal-tree-head-endpoints deleted file mode 100644 index b2831bf..0000000 --- a/archive/2022-01-04-proposal-tree-head-endpoints +++ /dev/null @@ -1,118 +0,0 @@ -Proposal: change tree-head endpoints - -Background ---- -Right now the get-tree-head-to-sign endpoint returns the signed tree head that -witnesses should cosign. It does not return any cosignatures. One needs to -wait until the to-sign tree head is finalized and served via -get-tree-head-cosigned. We also have a get-tree-head-latest endpoint that is -sort of hanging around for "debug purposes". - -It would be nice if a submitter could find required cosignatures without always -having to wait for five minutes. The log will likely have received a majority -of cosignatures after one minute, but a submitter currently needs to wait the -full duration before getting access via the get-tree-head-cosigned endpoint. - -It would also be nice to consider if the get-tree-head-latest endpoint can be -removed. - -Here is a rough break-down of how we think about the sigsum API's usage via -roles: - * Submitter - * add-leaf, until HTTP status 200 OK which should mean "you have been sequenced". 
- * [fetching an inclusion proof for a signed tree head to "verify sequencing" - is not a recommended usage pattern, and does not prevent DoS. The only - difference is that the submitter would notice that the log has not - included with regards to the latest tree head sooner than with regards - to the cosigned tree head. In both cases, there is no proof that - submitter got 200 OK without getting sequenced.] - * Distributor - * get-tree-head-cosigned - * get-inclusion-proof - * [wants "enough" cosignatures, sooner rather than later is a soft requirement] - * Monitor - * get-leaves - * get-tree-head-cosigned - * might hit get-{consistency,inclusion}-proof depending on implementation - * [wants as many cosignatures as possible, does not care about ~minutes of waiting] - * Witness - * get-consistency-proof - * get-tree-head-to-sign - * add-cosignature - * [does not / should not care about other cosignatures; just that the - log signed and that the tree head is consistent with prior history as - observed by the witness] - * End-user - * [does not hit any of the log's endpoints] - * "The curious" - * the latest signed tree head, as fast as possible for quick debug - probably. "is the thing I'm doing working". - * the latest cosigned tree head, with as many cosignatures as possible - for archiving - -Keep in mind that the below proposal should not introduce the log's key hash as -output on any API endpoint. We removed this and other redundant output because -that reduces the risk of faulty implementations that operate on untrusted input. - -For example, in the same way that a faulty witness could verify "the wrong -consistency proof" if it just verified the proof against the tree sizes that the -log returned redundantly (as opposed to the tree sizes the witness asked for), a -faulty witness could end-up cosigning a tree head with another log's context -because "they just copied the key hash and used it because it was there". 
- -Note that we cannot add the key_hash and cosignature fields to the output of -get-tree-head-to-sign. Our ASCII parser is so simple that it does not permit -empty lists. So, we will either need a way to handle empty lists, or tweak our -endpoints so that they still do what we want without having any empty list. - -[Both rgdd and ln5 would like to avoid complicating the ASCII parser.] - -Proposal ---- -1. Remove the get-tree-head-latest endpoint. We no longer have any recommended -usage-pattern for this endpoint and so it should be removed. Our strongest -arguments for removal are "don't use a signed tree head, it is sort of like a -promise", and "it does not even help you prove that the log's HTTP status 200 OK -semantics were faulty". -2. The get-tree-head-to-sign endpoint is kept as is, but renamed. - * Purpose: used by witnesses. -3. Add an endpoint that returns the logs "to-cosign" tree head and all -cosignatures that were collected thus far. If no cosignatures were received -yet, return an error to avoid having an empty list as output. - * Purpose: used by distributors, but could also be used by a witness' - internal monitoring setup ("is my witness working, are the signatures really - showing up?"). -4. Keep an endpoint that serves the "finalized" cosigned tree head. - * Purpose: mainly used by monitors, but could also be used by distributor's - that don't mind the additional waiting or by parties that want to archive - cosigned tree heads. - -This proposal currently does not have a name for the above endpoints. Help -wanted. - -Notes ---- -A witness polls the "get-tree-head-to-sign" endpoint as before. Witnesses are -recommended to poll the log at least once per minute at randomly selected times. - -After a successful add-cosignature request, a witness should not attempt to add -the same cosignature again. A log can refresh their "to-sign tree head" to -instruct witnesses to send their cosignatures again for the same tree size. 
- -A witness operator may check that their cosignatures appear on the -"get-tree-head-cosigned endpoints". Such checking would likely be part of how -the operator monitors that the witness operates correctly (i.e., it would not be -something that the witness software does itself after a successful -add-cosignature request). - -A submitter ("Signer" in Figure 1) that wants a cosigned tree head that -satisifies a given policy as fast as possible can poll the "dynamic cosigned -tree head endpoint". Keep in mind that polling more than a few times per minute -would not let you obtain cosignatures much faster, see the above recommendation -for how often witnesses should provide their cosignatures. - -A helpful reflection with regards to naming: - * "The log's to-sign STH shows up, it gets filled-up with cosignatures; the - previous cosigned tree head is served on a separate endpoint. Then - "prev=curr, curr=new". I.e., there is a time aspect here that might be - helpful for naming, although previous and current would be bad choices." 
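Reviewer sketch (not part of the patch): the notes in the tree-head proposal say that a submitter who wants a cosigned tree head satisfying a given policy can poll the "dynamic cosigned tree head endpoint", and that polling more than a few times per minute does not help. A minimal way to picture that loop is shown below. Everything named here is an assumption made for illustration: the k-of-n policy shape, the function names `policy_satisfied` and `poll_until_satisfied`, the injected `fetch_cosigners` callable (standing in for the still unnamed endpoint), and the 20 second interval. The sketch also assumes that cosignatures were already verified and that an error response from the log ("no cosignatures yet") is mapped to an empty set.

```python
import time
from typing import Callable, Optional, Set


def policy_satisfied(cosigners: Set[str], trusted: Set[str], threshold: int) -> bool:
    """Hypothetical k-of-n policy: at least `threshold` trusted witnesses cosigned."""
    return len(cosigners & trusted) >= threshold


def poll_until_satisfied(
    fetch_cosigners: Callable[[], Set[str]],
    trusted: Set[str],
    threshold: int,
    interval_s: float = 20.0,   # assumption: "a few times per minute"
    max_attempts: int = 15,     # assumption: 15 * 20s is roughly five minutes
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Set[str]]:
    """Poll until the policy holds; return the cosigner set, or None on give-up.

    `fetch_cosigners` is a stand-in for a request to the log. It must return
    the key hashes of witnesses whose cosignatures were ALREADY verified, and
    an empty set if the log answered with an error (no cosignatures yet).
    """
    for attempt in range(max_attempts):
        cosigners = fetch_cosigners()
        if policy_satisfied(cosigners, trusted, threshold):
            return cosigners
        if attempt + 1 < max_attempts:
            sleep(interval_s)  # do not sleep after the final attempt
    return None
```

Injecting `fetch_cosigners` and `sleep` keeps the sketch free of network and timing details, so the give-up path and the success path can both be exercised with canned responses.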
diff --git a/archive/2022-01-18--meeting-minutes b/archive/2022-01-18--meeting-minutes index 5040581..fafd4a9 100644 --- a/archive/2022-01-18--meeting-minutes +++ b/archive/2022-01-18--meeting-minutes @@ -31,9 +31,9 @@ Status round * [rgdd, ln5] started sketching on what www.sigsum.org might be eventually * https://git.sigsum.org/sigsum/tree/archive/2022-01-18-future-website-sketch * [rgdd] start using the terminology "author" and "reader" proposal - * https://git.sigsum.org/sigsum/tree/archive/2022-01-18-proposal-author-reader-terminology + * https://git.sigsum.org/sigsum/tree/doc/proposals/2022-01-author-reader-terminology * [rgdd] redefine "base URL" as "log URL" proposal - * https://git.sigsum.org/sigsum/tree/archive/2022-01-18-proposal-log-url + * https://git.sigsum.org/sigsum/tree/doc/proposals/2022-01-log-url * [ln5, rgdd] continued work on logotype Decisions @@ -42,7 +42,6 @@ Decisions * Author seems like a good abstraction definition * Reader seems less good * Decision: redefine "base URL" as "log URL" - * Next steps * [rgdd] tree-head endpoint implementation, think about replacement for "reader" diff --git a/archive/2022-01-18-proposal-author-reader-terminology b/archive/2022-01-18-proposal-author-reader-terminology deleted file mode 100644 index fb447d2..0000000 --- a/archive/2022-01-18-proposal-author-reader-terminology +++ /dev/null @@ -1,41 +0,0 @@ -Start using the terminology "author" and "reader" proposal - -Background ---- -Figure 1 in doc/design.md refers to - - a) the party producing a signed checksum as "Signer", and - b) the party verifying a signed checksum as "Verifier". - -This is fine in isolation, but less appropriate when looking at it from a -broader Sigsum perspective. For example, a "Signer" may also be a "Submitter". -It seems like we are mixing terminology for roles and concrete actors here. - -The above is also ambiguous. For example, logs and witnesses sign things; -witnesses and monitors verify things. 
- -Proposal ---- -1) Replace "Signer" with "Author" when we are talking about a concrete party. - -According to Wikipedia's definition (https://en.wikipedia.org/wiki/Author), an -'author is "the person who originated or gave existence to anything" and whose -authorship determines responsibility for what was created'. This seems -appropriate for us. - -The term "author" has been used in academic litterature before us for similar -purposes: - * "In the setting of transparency logging [18] as depicted in Fig. 1, the - author generates events intended for recipients that describe data - processing by the author as it takes place" - * Link to paper: https://link.springer.com/chapter/10.1007/978-3-319-45741-3_7 - -2) Replace "Verifier" with "Reader" when we are talking about a concrete party. - -According to Wikipedia's definition (https://en.wikipedia.org/wiki/Reading), -"[r]eading is the process of taking in the sense or meaning of letters, symbols, -etc., especially by sight or touch". Although the latter is not a perfect -description for us, the first part is quite close and we could argue that we are -in the "etc" category. - -The main idea here is that it should feel intuitive that an author has readers. diff --git a/archive/2022-01-18-proposal-log-url b/archive/2022-01-18-proposal-log-url deleted file mode 100644 index 598fb43..0000000 --- a/archive/2022-01-18-proposal-log-url +++ /dev/null @@ -1,26 +0,0 @@ -Redefine "base URL" as "log URL" proposal - -Background ---- -The current api.md specification requires that a log has a fixed unique "base -URL". It is any valid HTTP(S) URL that can end with "/sigsum/v0/". - -Proposal ---- -Remove the term "base URL" and instead define "log URL". A log URL is a valid -HTTP(S) URL that ends with "/sigsum/v0/". Example of a valid log URL: - - https://example.com:4711/opposum/sigsum/v0/ - -This means that a named sigsum endpoint can be appended to a log's URL. 
For -example, if the endpoint is "get-tree-head-quickly" the resulting "endpoint URL" -would be: - - https://example.com:4711/opposum/sigsum/v0/get-tree-head-quickly/ - -And with input parameters for "get-leaves": - - https://example.com:4711/opposum/sigsum/v0/get-leaves/42/4711/ - -Note the final slash in all of the above URLs. Should that be enforced (?). - * XXX: Need to check in URL specification(s). Defer for now. diff --git a/doc/proposals/2022-01-add-leaf-endpoint b/doc/proposals/2022-01-add-leaf-endpoint new file mode 100644 index 0000000..3123e02 --- /dev/null +++ b/doc/proposals/2022-01-add-leaf-endpoint @@ -0,0 +1,90 @@ +Proposal: change add-leaf endpoint + +Background +--- +Right now a log returns HTTP status 200 OK if it will "try" to merge a submitted +leaf into its Merkle tree. A submitter should not assume that logging happened +until they see an inclusion proof that leads up to a (co)signed tree head. + +If a submitted leaf does not show up in the log despite seeing HTTP status 200 +OK, the submitter must resubmit it. When a resubmission is required/expected is +undefined. + +The reason for this "try" behavior is that log operations become much easier, +especially in self-hosted environments that do not rely on managed databases. +In other words, it is OK to just be "pretty sure" that a submitted leaf will be +persisted and sequenced, and "100%" sure after sequencing actually happened. + +Proposal +--- +A log should not return HTTP status 200 OK unless: +1. The submitted leaf has been sequenced as part of a persisted database. +2. The next tree head that the log signs will contain the submitted leaf. + +HTTP status 3XX is returned with, e.g., "Error=leaf has not been sequenced yet" +if it is not guaranteed that the submitted leaf has been sequenced. + +This means that logging should be assumed after seeing HTTP status 200 OK. This +assumption will be confirmed when the submitter obtains the next (co)signed tree +head. 
Further investigation is required if it turns out that this assumption is +false. + +Notes +--- +An earlier draft of this proposal considered if useful debug information should +be returned, such as "leaf index", "leaf hash", and "estimated time until a +cosigned tree head is available". We decided to not go in this direction to +avoid redundant and unsigned output that may be mis-used and tampered with ("not +consistent with design"). + +(Note that it is easy to determine when the next cosigned tree head will be +available. The to-sign tree head has a timestamp, and it is rotated every 300s. +Then it takes an additional 300s before the to-sign tree head is served with +collected cosignatures.) + +An earlier draft of this proposal also considered to have verifiable output: + * Option 1: An inclusion proof and a signed tree head + * Option 2: An inclusion proof and a cosigned tree head + +This could be a worthwhile direction if the submitter can only obtain the +required data by using the add-leaf endpoint, thus "forcing resubmits until the +desired output is obtained". Credit to Al Cutter who proposed this (very nice) +idea to us a while back. + +It is not appropriate to always return an inclusion proof for a signed tree +head. What we want is for submitters to get inclusion proofs that reference +cosigned tree heads. + +There are drawbacks to replace the above signed tree head with a cosigned tree +head: + * A submitter that submits multiple leaves will likely (have to?) retrieve + the same cosigned tree head multiple times via the add-leaf endpoint. That + overhead adds up. + * A submitter will have to be in a "resubmit phase" for several minutes as + the default, because it takes time before a cosigned tree head becomes + available. + * (The most sensible implementation would likely resubmit periodically, + say, once per minute. 
A clever implementation would look at the + timestamp of the to-sign endpoint to determine when is the earliest time + that a merged may have happened.) + +Moreover, removing the get-inclusion-proof and get-tree-head-cosigned endpoints +to force usage of add-leaf excludes (or makes for wonky) usage patterns of the +log: + * "I just want to download all cosigned tree heads to archive them" -> add + leaves. + * "I just want to debug/know that the log is committed to have the leaf + logged, and rely on other witnesses" -> still forced to observe the log's + cosignatures. + * "I want an inclusion proof to a particular tree head" -> build the Merkle + tree yourself to construct that proof. The log's API chooses tree heads for + you. + * (Keeping these endpoints in addition to any new add-leaf output would to + some degree defeat the purpose of adding output, which is why it is not + considered an option.) + +In gist, we decided to go with a solution that is somewhere in between what we +did before and what Al Cutter proposed. We defined when a resubmission is (not) +expected. As a result, a self-hosted log may return at least one HTTP 3XX for +each leaf request, and a few seconds later return HTTP status 200 OK for the +same input data. diff --git a/doc/proposals/2022-01-author-reader-terminology b/doc/proposals/2022-01-author-reader-terminology new file mode 100644 index 0000000..fb447d2 --- /dev/null +++ b/doc/proposals/2022-01-author-reader-terminology @@ -0,0 +1,41 @@ +Start using the terminology "author" and "reader" proposal + +Background +--- +Figure 1 in doc/design.md refers to + + a) the party producing a signed checksum as "Signer", and + b) the party verifying a signed checksum as "Verifier". + +This is fine in isolation, but less appropriate when looking at it from a +broader Sigsum perspective. For example, a "Signer" may also be a "Submitter". +It seems like we are mixing terminology for roles and concrete actors here. + +The above is also ambiguous. 
For example, logs and witnesses sign things; +witnesses and monitors verify things. + +Proposal +--- +1) Replace "Signer" with "Author" when we are talking about a concrete party. + +According to Wikipedia's definition (https://en.wikipedia.org/wiki/Author), an +'author is "the person who originated or gave existence to anything" and whose +authorship determines responsibility for what was created'. This seems +appropriate for us. + +The term "author" has been used in academic litterature before us for similar +purposes: + * "In the setting of transparency logging [18] as depicted in Fig. 1, the + author generates events intended for recipients that describe data + processing by the author as it takes place" + * Link to paper: https://link.springer.com/chapter/10.1007/978-3-319-45741-3_7 + +2) Replace "Verifier" with "Reader" when we are talking about a concrete party. + +According to Wikipedia's definition (https://en.wikipedia.org/wiki/Reading), +"[r]eading is the process of taking in the sense or meaning of letters, symbols, +etc., especially by sight or touch". Although the latter is not a perfect +description for us, the first part is quite close and we could argue that we are +in the "etc" category. + +The main idea here is that it should feel intuitive that an author has readers. diff --git a/doc/proposals/2022-01-domain-hint b/doc/proposals/2022-01-domain-hint new file mode 100644 index 0000000..322d9cc --- /dev/null +++ b/doc/proposals/2022-01-domain-hint @@ -0,0 +1,51 @@ +Proposal: stricter domain hint requirements + +Background +--- +Right now a log is expected to look up a submitter's public key hash via DNS. A +domain hint, say, example.com, specifies the location of a TXT RR that contains +the appropriate key hash in hex-encoding. "Some domain knows about the key". + +Downsides with this: +1. A log can be instructed to look up arbitrary TXT records +2. 
No versioning + +As far as we know there are no amplification threats with (1), but ideally it +would only be possible to query TXT RRs that are actually relevant for Sigsum. + +Not having any versioning could potentially become a headache. All other log +endpoints are versioned. There is no good reason to not have versioning here, +unless that would imply something like registering many different things with +IANA as a result. + +Proposal +--- +Require that a domain hint is formatted as: + + _sigsum_v0.* + +Examples of valid domain hints: + + _sigsum_v0.com + _sigsum_v0.example.com + _sigsum_v0.sub.example.com + +Examples of invalid domain hints: + + _sigsum_v0hello.example.com + +This change addresses both (1) and (2), without making DNS configs harder. + +Notes +--- +For v1 we need to consider if something should be registered with IANA. Credit +to Patrik Wallström who pointed us towards documentation about labels with +underscores: + * https://www.rfc-editor.org/rfc/rfc8552.html + * https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#underscored-globally-scoped-dns-node-names + +Note also that the dependency on TXT look-ups means that a "hidden log" via Tor +would need help from a resolver that is also available over Tor (preferably an +onion but at minimum reachable over TCP). This is because TXT records cannot be +resolved over Tor. This proposal allows the used resolver to be restricted to +only resolve _sigsum_*. 
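Reviewer sketch (not part of the patch): the domain hint proposal requires the form `_sigsum_v0.*` and lists valid and invalid examples. A minimal check that matches those examples could look like the following, assuming the rule is "the first label is exactly `_sigsum_v0` and at least one non-empty label follows". The function name `is_valid_domain_hint` and the constant `HINT_LABEL` are hypothetical, and the handling of empty labels (for example a trailing dot) is an assumption, since the proposal does not spell it out.

```python
HINT_LABEL = "_sigsum_v0"  # label taken from the proposal; constant name is hypothetical


def is_valid_domain_hint(hint: str) -> bool:
    """Return True if `hint` looks like "_sigsum_v0.<one or more labels>"."""
    first, sep, rest = hint.partition(".")
    # The first label must match exactly, so "_sigsum_v0hello" is rejected.
    if first != HINT_LABEL or sep != ".":
        return False
    # Assumption: every remaining label must be non-empty (no "..", no trailing dot).
    return all(label != "" for label in rest.split("."))


if __name__ == "__main__":
    examples = [
        "_sigsum_v0.com",
        "_sigsum_v0.example.com",
        "_sigsum_v0.sub.example.com",
        "_sigsum_v0hello.example.com",
    ]
    for hint in examples:
        print(hint, is_valid_domain_hint(hint))
```

Checking the hint before any DNS query is what addresses downside (1): a log that rejects other names cannot be steered into looking up arbitrary TXT records.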
diff --git a/doc/proposals/2022-01-get-endpoints b/doc/proposals/2022-01-get-endpoints new file mode 100644 index 0000000..cbe3170 --- /dev/null +++ b/doc/proposals/2022-01-get-endpoints @@ -0,0 +1,46 @@ +Proposal: change get-* endpoints that use HTTP post + +Background +--- +Right now we HTTP POST ASCII key-value pairs on these endpoints: + * get-leaves + * get-inclusion-proof + * get-consistency-proof + +The original reason was to not have an additional parser, say, input-parameters +with percent-encoding as part of the request URL. + +A major problem with this approach is that it will not be possible to benefit +from HTTP caching. Debugging, with "URLs that reference data" also becomes more +messy. You would have to say "I did printf | curl ...". + +Proposal +--- +Change these endpoints so that they use HTTP GET. Encode input params in URL: + + /get-leaves/10/20 # get leaves 10,11,...20 + /get-consistency-proof/10/20 # proof from tree size 10 to 20 + /get-inclusion-proof/10/ # proof for tree size 10 + +This notably avoids percent-encoding which is more messy. + +Notes +--- +We considered if it would be a good idea to re-use our ASCII parser for the +portion of the URL that encodes input data. The basic idea would be that +different "end of key" and "end of value" patterns could be used that are better +suited for a URL. + +For example, instead of (=,\n) one could use ([,]) as ("end of key", "end of value"). + * get-leaves/start_size[10]end_size[12] + * get-consistency-proof/old_size[12]new_size[14] + * get-inclusion-proof/tree_size[10]leaf_hash[ab...ef] + +The reasons why we aborted this direction: + * We can not think of any concrete security risk with the shorter '/' proposal. + * There are very few parameters at play here, hard to confuse and quick + feedback loop if you do. For example, "Error=start size must be smaller or + equal to end size". 
+  * We can be sure that the '/' proposal will not introduce any wonky
+    interoperability issues; picking a ("end of key", "end of value") would
+    require much more care.
diff --git a/doc/proposals/2022-01-log-url b/doc/proposals/2022-01-log-url
new file mode 100644
index 0000000..598fb43
--- /dev/null
+++ b/doc/proposals/2022-01-log-url
@@ -0,0 +1,26 @@
+Redefine "base URL" as "log URL" proposal
+
+Background
+---
+The current api.md specification requires that a log has a fixed unique "base
+URL". It is any valid HTTP(S) URL that can end with "/sigsum/v0/".
+
+Proposal
+---
+Remove the term "base URL" and instead define "log URL". A log URL is a valid
+HTTP(S) URL that ends with "/sigsum/v0/". Example of a valid log URL:
+
+    https://example.com:4711/opposum/sigsum/v0/
+
+This means that a named sigsum endpoint can be appended to a log's URL. For
+example, if the endpoint is "get-tree-head-quickly" the resulting "endpoint URL"
+would be:
+
+    https://example.com:4711/opposum/sigsum/v0/get-tree-head-quickly/
+
+And with input parameters for "get-leaves":
+
+    https://example.com:4711/opposum/sigsum/v0/get-leaves/42/4711/
+
+Note the final slash in all of the above URLs. Should that be enforced (?).
+  * XXX: Need to check in URL specification(s). Defer for now.
diff --git a/doc/proposals/2022-01-tree-head-endpoints b/doc/proposals/2022-01-tree-head-endpoints
new file mode 100644
index 0000000..b2831bf
--- /dev/null
+++ b/doc/proposals/2022-01-tree-head-endpoints
@@ -0,0 +1,118 @@
+Proposal: change tree-head endpoints
+
+Background
+---
+Right now the get-tree-head-to-sign endpoint returns the signed tree head that
+witnesses should cosign. It does not return any cosignatures. One needs to
+wait until the to-sign tree head is finalized and served via
+get-tree-head-cosigned. We also have a get-tree-head-latest endpoint that is
+sort of hanging around for "debug purposes".
+
+It would be nice if a submitter could find required cosignatures without always
+having to wait for five minutes. The log will likely have received a majority
+of cosignatures after one minute, but a submitter currently needs to wait the
+full duration before getting access via the get-tree-head-cosigned endpoint.
+
+It would also be nice to consider if the get-tree-head-latest endpoint can be
+removed.
+
+Here is a rough break-down of how we think about the sigsum API's usage via
+roles:
+  * Submitter
+    * add-leaf, until HTTP status 200 OK which should mean "you have been sequenced".
+    * [fetching an inclusion proof for a signed tree head to "verify sequencing"
+      is not a recommended usage pattern, and does not prevent DoS. The only
+      difference is that the submitter would notice that the log has not
+      included the leaf with regard to the latest tree head sooner than with
+      regard to the cosigned tree head. In both cases, there is no proof that
+      the submitter got 200 OK without getting sequenced.]
+  * Distributor
+    * get-tree-head-cosigned
+    * get-inclusion-proof
+    * [wants "enough" cosignatures, sooner rather than later is a soft requirement]
+  * Monitor
+    * get-leaves
+    * get-tree-head-cosigned
+    * might hit get-{consistency,inclusion}-proof depending on implementation
+    * [wants as many cosignatures as possible, does not care about ~minutes of waiting]
+  * Witness
+    * get-consistency-proof
+    * get-tree-head-to-sign
+    * add-cosignature
+    * [does not / should not care about other cosignatures; just that the
+      log signed and that the tree head is consistent with prior history as
+      observed by the witness]
+  * End-user
+    * [does not hit any of the log's endpoints]
+  * "The curious"
+    * the latest signed tree head, as fast as possible for quick debug
+      probably. "is the thing I'm doing working".
+    * the latest cosigned tree head, with as many cosignatures as possible
+      for archiving
+
+Keep in mind that the below proposal should not introduce the log's key hash as
+output on any API endpoint. We removed this and other redundant output because
+that reduces the risk of faulty implementations that operate on untrusted input.
+
+For example, in the same way that a faulty witness could verify "the wrong
+consistency proof" if it just verified the proof against the tree sizes that the
+log returned redundantly (as opposed to the tree sizes the witness asked for), a
+faulty witness could end up cosigning a tree head with another log's context
+because "they just copied the key hash and used it because it was there".
+
+Note that we cannot add the key_hash and cosignature fields to the output of
+get-tree-head-to-sign. Our ASCII parser is so simple that it does not permit
+empty lists. So, we will either need a way to handle empty lists, or tweak our
+endpoints so that they still do what we want without having any empty list.
+
+[Both rgdd and ln5 would like to avoid complicating the ASCII parser.]
+
+Proposal
+---
+1. Remove the get-tree-head-latest endpoint. We no longer have any recommended
+usage pattern for this endpoint and so it should be removed. Our strongest
+arguments for removal are "don't use a signed tree head, it is sort of like a
+promise", and "it does not even help you prove that the log's HTTP status 200 OK
+semantics were faulty".
+2. The get-tree-head-to-sign endpoint is kept as is, but renamed.
+  * Purpose: used by witnesses.
+3. Add an endpoint that returns the log's "to-cosign" tree head and all
+cosignatures that were collected thus far. If no cosignatures were received
+yet, return an error to avoid having an empty list as output.
+  * Purpose: used by distributors, but could also be used by a witness'
+    internal monitoring setup ("is my witness working, are the signatures really
+    showing up?").
+4.
Keep an endpoint that serves the "finalized" cosigned tree head.
+  * Purpose: mainly used by monitors, but could also be used by distributors
+    that don't mind the additional waiting or by parties that want to archive
+    cosigned tree heads.
+
+This proposal currently does not have a name for the above endpoints. Help
+wanted.
+
+Notes
+---
+A witness polls the "get-tree-head-to-sign" endpoint as before. Witnesses are
+recommended to poll the log at least once per minute at randomly selected times.
+
+After a successful add-cosignature request, a witness should not attempt to add
+the same cosignature again. A log can refresh its "to-sign tree head" to
+instruct witnesses to send their cosignatures again for the same tree size.
+
+A witness operator may check that their cosignatures appear on the
+"get-tree-head-cosigned endpoints". Such checking would likely be part of how
+the operator monitors that the witness operates correctly (i.e., it would not be
+something that the witness software does itself after a successful
+add-cosignature request).
+
+A submitter ("Signer" in Figure 1) that wants a cosigned tree head that
+satisfies a given policy as fast as possible can poll the "dynamic cosigned
+tree head endpoint". Keep in mind that polling more than a few times per minute
+would not let you obtain cosignatures much faster; see the above recommendation
+for how often witnesses should provide their cosignatures.
+
+A helpful reflection with regard to naming:
+  * "The log's to-sign STH shows up, it gets filled up with cosignatures; the
+    previous cosigned tree head is served on a separate endpoint. Then
+    "prev=curr, curr=new". I.e., there is a time aspect here that might be
+    helpful for naming, although previous and current would be bad choices."
-- 
cgit v1.2.3