Speculation Rules

A Collection of Interesting Ideas,

Editor:
(Google)

Abstract

A flexible syntax for defining what outgoing links can be prepared speculatively before navigation.

1. Speculation rules

1.1. Definitions

A speculation rule is a struct with the following items: URLs, a list of URLs; and requirements, an ordered set of strings.

The only valid string for requirements to contain is "anonymous-client-ip-when-cross-origin".

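A speculation rule set is a struct with the following items: prefetch rules, a list of speculation rules; and prefetch-with-subresources rules, a list of speculation rules.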

1.2. The script element

Note: This section contains modifications to the corresponding section of [HTML].

To process speculation rules consistently with the existing script types, we make the following changes:

The following algorithms are updated accordingly:

We should consider whether we also want to make this execute even if scripting is disabled.

We should also incorporate the case where a src attribute is set.

We could fire error and load events if we wanted to.

1.3. Prepare a script

Inside the prepare a script algorithm we make the following changes:

1.4. Parsing

The general principle is to tolerate directives that are not understood, but never to accept into the rule set a rule that the user agent does not fully understand. This reduces the risk of unintended activity by user agents that are unaware of the most recently added directives, which might limit the scope of a rule.

To parse speculation rules given a string input and a URL baseURL, perform the following steps. They return a speculation rule set or null.
  1. Let parsed be the result of parsing a JSON string to an Infra value given input.

  2. If parsed is not a map, then return null.

  3. Let result be an empty speculation rule set.

  4. If parsed["prefetch"] exists and is a list, then for each prefetchRule of parsed["prefetch"]:

    1. If prefetchRule is not a map, then continue.

    2. Let rule be the result of parsing a speculation rule given prefetchRule and baseURL.

    3. If rule is null, then continue.

    4. Append rule to result’s prefetch rules.

  5. If parsed["prefetch_with_subresources"] exists and is a list, then for each pwsRule of parsed["prefetch_with_subresources"]:

    1. If pwsRule is not a map, then continue.

    2. Let rule be the result of parsing a speculation rule given pwsRule and baseURL.

    3. If rule is null, then continue.

    4. Append rule to result’s prefetch-with-subresources rules.

  6. Return result.

To parse a speculation rule given a map input and a URL baseURL, perform the following steps. They return a speculation rule or null.
  1. If input has any key other than "source", "urls", and "requires", then return null.

  2. If input["source"] does not exist or is not the string "list", then return null.

  3. Let urls be an empty list.

  4. If input["urls"] does not exist, is not a list, or has any element which is not a string, then return null.

  5. For each urlString of input["urls"]:

    1. Let parsedURL be the result of parsing urlString with baseURL.

    2. If parsedURL is failure, then continue.

    3. If parsedURL’s scheme is not an HTTP(S) scheme, then continue.

    4. Append parsedURL to urls.

  6. Let requirements be an empty ordered set.

  7. If input["requires"] exists, but is not a list, then return null.

  8. If input["requires"] exists, then for each requirement of input["requires"]:

    1. If requirement is not the string "anonymous-client-ip-when-cross-origin", then return null.

    2. Append requirement to requirements.

  9. Return a speculation rule with URLs urls and requirements requirements.
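
The two parsing algorithms above can be sketched in Python as follows. This is an illustrative sketch only, not a conforming implementation: the class and function names are invented for the example, urllib.parse.urljoin merely approximates the URL Standard's parser, and returning None for malformed JSON is an assumption (the steps above do not say what happens when JSON parsing fails).

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urljoin, urlsplit

ALLOWED_KEYS = {"source", "urls", "requires"}
KNOWN_REQUIREMENT = "anonymous-client-ip-when-cross-origin"


@dataclass
class SpeculationRule:  # hypothetical name for the "speculation rule" struct
    urls: list
    requirements: list  # insertion-ordered, no duplicates (models an ordered set)


@dataclass
class SpeculationRuleSet:  # hypothetical name for the "speculation rule set" struct
    prefetch_rules: list = field(default_factory=list)
    prefetch_with_subresources_rules: list = field(default_factory=list)


def parse_speculation_rule(rule_input, base_url):
    # A rule with any key the parser does not understand is rejected outright.
    if set(rule_input) - ALLOWED_KEYS:
        return None
    if rule_input.get("source") != "list":
        return None
    raw_urls = rule_input.get("urls")
    if not isinstance(raw_urls, list) or not all(isinstance(u, str) for u in raw_urls):
        return None
    urls = []
    for url_string in raw_urls:
        try:
            # Assumption: urljoin/urlsplit stand in for the URL Standard parser.
            parsed = urljoin(base_url, url_string)
            scheme = urlsplit(parsed).scheme
        except ValueError:
            continue  # treated as a parse failure: skip this URL only
        if scheme not in ("http", "https"):
            continue
        urls.append(parsed)
    requirements = []
    if "requires" in rule_input:
        if not isinstance(rule_input["requires"], list):
            return None
        for requirement in rule_input["requires"]:
            if requirement != KNOWN_REQUIREMENT:
                return None
            if requirement not in requirements:
                requirements.append(requirement)
    return SpeculationRule(urls, requirements)


def parse_speculation_rules(text, base_url):
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None  # assumption: malformed JSON yields no rule set
    if not isinstance(parsed, dict):
        return None
    result = SpeculationRuleSet()
    targets = (
        ("prefetch", result.prefetch_rules),
        ("prefetch_with_subresources", result.prefetch_with_subresources_rules),
    )
    for key, target in targets:
        candidates = parsed.get(key)
        if not isinstance(candidates, list):
            continue  # unknown or malformed top-level directives are ignored
        for candidate in candidates:
            if not isinstance(candidate, dict):
                continue
            rule = parse_speculation_rule(candidate, base_url)
            if rule is not None:
                target.append(rule)
    return result
```

Note the asymmetry the sketch preserves: an unrecognized top-level key is silently ignored, whereas an unrecognized key inside a rule causes that whole rule to be dropped.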

1.5. Processing model

A document has a list of speculation rule sets, which is an initially empty list.

Periodically, for any document document, the user agent may queue a global task on the DOM manipulation task source with document’s relevant global object to consider speculation for document.

The user agent will likely do this after the insertion of new speculation rules, or when resources are idle and available.

To consider speculation for a document document:
  1. If document is not fully active, then return.

    It’s likely that we should also handle prerendered and back-forward cached documents.

  2. For each ruleSet of document’s list of speculation rule sets:

    1. For each rule of ruleSet’s prefetch-with-subresources rules:

      1. Let requiresAnonymousClientIPWhenCrossOrigin be true if rule’s requirements contains "anonymous-client-ip-when-cross-origin", and false otherwise.

      2. For each url of rule’s URLs:

        1. The user agent may prefetch url given requiresAnonymousClientIPWhenCrossOrigin, including subresources identified by speculative HTML parsing.

          TODO: expand this along with prefetch more generally.

    2. For each rule of ruleSet’s prefetch rules:

      1. Let requiresAnonymousClientIPWhenCrossOrigin be true if rule’s requirements contains "anonymous-client-ip-when-cross-origin", and false otherwise.

      2. For each url of rule’s URLs:

        1. The user agent may prefetch url given requiresAnonymousClientIPWhenCrossOrigin.

          TODO: expand this to actually elaborate on how prefetch works, once initiated, and to incorporate the requiresAnonymousClientIPWhenCrossOrigin flag. We may wish to include language about when the UA should deduplicate requests.
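
The traversal in the steps above can be sketched as a function that lists the candidate prefetches a user agent may choose to act on. This is a hedged illustration: the Rule, RuleSet, and Candidate classes and the fully_active parameter are hypothetical names for this example, and the sketch deliberately stops short of performing any fetch, since that behavior is not yet specified.

```python
from dataclasses import dataclass, field

ANONYMOUS_IP = "anonymous-client-ip-when-cross-origin"


@dataclass
class Rule:  # hypothetical stand-in for a speculation rule
    urls: list
    requirements: list = field(default_factory=list)


@dataclass
class RuleSet:  # hypothetical stand-in for a speculation rule set
    prefetch_rules: list = field(default_factory=list)
    prefetch_with_subresources_rules: list = field(default_factory=list)


@dataclass
class Candidate:  # one optional action the user agent may take
    url: str
    with_subresources: bool
    requires_anonymous_client_ip: bool


def speculation_candidates(rule_sets, fully_active=True):
    # Step 1: nothing is considered for a document that is not fully active.
    if not fully_active:
        return []
    candidates = []
    for rule_set in rule_sets:
        # Prefetch-with-subresources rules are visited before plain prefetch rules.
        groups = (
            (True, rule_set.prefetch_with_subresources_rules),
            (False, rule_set.prefetch_rules),
        )
        for with_subresources, rules in groups:
            for rule in rules:
                requires_anon = ANONYMOUS_IP in rule.requirements
                for url in rule.urls:
                    candidates.append(Candidate(url, with_subresources, requires_anon))
    return candidates
```

Every candidate is optional: the user agent may act on any subset of the returned list, or on none of it.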

We should also notice removals and consider cancelling speculated actions.

2. Security considerations

2.1. Cross-site request forgery

This specification allows documents to cause HTTP requests to be issued.

When a supported action acts on a URL that is same origin with the document, it does not constitute a risk of cross-site request forgery, since the request uses only the credentials available to the document.

Otherwise, requests are always issued without using any previously existing credentials. This limits the ambient authority available to any potentially forged request, and such requests can already be made through [FETCH], a subresource or frame, or various other means. Site operators are therefore already well-advised to use CSRF tokens or other mitigations for this threat.
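
The credential rule described above can be illustrated with a small sketch. It assumes http(s) URLs only and approximates the origin comparison with a (scheme, host, port) tuple; the function names are invented for this example, and the authoritative definition of same origin is the one in [HTML].

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    # Approximate an origin as (scheme, host, port), filling in default ports.
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def may_send_existing_credentials(document_url, target_url):
    # Same-origin targets use the document's own credentials;
    # any other target is requested without pre-existing credentials.
    return origin(document_url) == origin(target_url)
```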

2.2. Cross-site scripting

This specification causes activity in response to content found in the document, so it is worth considering the options open to an attacker able to inject unescaped HTML.

Such an attacker is otherwise able to inject JavaScript, frames, or other elements. The activity possible with this specification (requesting fetches, etc.) is generally less dangerous than arbitrary script execution, and comparable to other elements. The same mitigations available to other features also apply here. In particular, the [CSP] script-src directive applies to the parsing of the speculation rules and the prefetch-src directive applies to prefetch requests arising from the rules.

2.3. Type confusion

In the case of speculation rules in an inline <script>, an application which erroneously parsed speculation rules as a JavaScript script (though user agents are instructed not to execute scripts whose "type" is unrecognized) would either interpret it as the empty block {} or produce a syntax error, since the U+003A COLON (:) after the first key is invalid JavaScript. In neither case would such an application execute harmful behavior.

Since the parsing behavior of the <script> element has long been part of HTML, any modern HTML parser would not construct any non-text children of the element. There is thus a low risk of other text hidden inside a <script> element with type="speculationrules" which is parsed as part of the script content by compliant HTML implementations but as HTML tags by others.

Authors should, however, still escape any potentially attacker-controlled content inserted into speculation rules. In particular, it may be necessary to escape JSON syntax as well as, if the speculation rules are in an inline <script> tag, the closing </script> tag. [CSP] is a useful additional mitigation for vulnerabilities of this type.
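
One way an author might follow this advice when generating an inline rule set on the server is sketched below. It is a minimal sketch under stated assumptions: the function name is hypothetical, the rule set contains only a "prefetch" list rule, and the approach relies on json.dumps for JSON escaping plus replacing every "<" with the JSON escape \u003c so that no closing </script> tag can appear in the output.

```python
import json


def inline_speculation_rules(urls):
    # json.dumps handles JSON-level escaping of quotes, backslashes, and so on.
    payload = json.dumps({"prefetch": [{"source": "list", "urls": list(urls)}]})
    # In JSON, "<" can only occur inside string values, so replacing it with
    # the \u003c escape leaves the parsed value unchanged while preventing
    # "</script>" or "<!--" from being seen by the HTML parser.
    safe = payload.replace("<", "\\u003c")
    return '<script type="speculationrules">' + safe + "</script>"
```

After JSON parsing, the escaped text yields exactly the original URL strings, so the escaping is invisible to the speculation rules parser.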

Expand this section once externally loaded (via "src") speculation rules are specified.

2.4. IP anonymization

This specification allows authors to request prefetch traffic using IP anonymization technology provided by the user agent. The details of this technology are not a part of this specification; nonetheless some general principles apply.

To the extent IP anonymization is implemented using a proxy service, it is advisable to minimize the information available to the service operator and other entities on the network path. This likely involves, at a minimum, the use of [TLS] for the connection.

Site operators should be aware that, similar to virtual private network (VPN) technology, the client IP address seen by the HTTP server may not exactly correspond to the user’s actual network provider or location, and traffic for multiple distinct subscribers may originate from a single client IP address. This may affect site operators' security and abuse prevention measures. IP anonymization measures may make an effort to use an egress IP address which has a similar geolocation or is located in the same jurisdiction as the user, but any such behavior is particular to the user agent and not guaranteed by this specification.

3. Privacy considerations

3.1. Heuristics

Because the candidate prefetches and other actions are not required, the user agent can use heuristics to determine which actions would be best to execute. Because it may be observable to the document whether actions were executed, user agents must take care to protect privacy when making such decisions — for instance by only using information which is already available to the origin. If these heuristics depend on any persistent state, that state must be erased whenever the user erases other site data. If the user agent automatically clears other site data from time to time, it must erase such persistent state at the same time.

The use of origin instead of site here is intentional. Origins generally form the basis for the web’s security boundary. Though same-site origins are generally allowed to coordinate if they wish, origins are generally not allowed access to data from other origins, even same-site ones.

Examples of inputs which would be already known to the document:

Examples of persistent data related to the origin (which the origin could have gathered itself) but which must be erased according to user intent:

Examples of device information which may be valuable in deciding whether prefetching is appropriate, but which must be considered as part of the user agent’s overall privacy posture because it may make the user more identifiable across origins:

3.2. Intent

While efforts have been made to minimize the privacy impact of prefetching, some users may nonetheless prefer that prefetching not occur, even though this may make loading slower. User agents are encouraged to provide a setting to disable prefetching features to accommodate such users.

3.3. Partitioning

Some user agents partition storage according to the site or origin of the top-level document. In order for prefetching and prerendering to be useful, it is therefore essential that prefetching or prerendering of a document either occur in the partition in which the navigation would occur (e.g., for a same-origin URL) or in an isolated partition, so as to ensure that prefetching does not become a mechanism for bypassing the partitioning scheme.

Expand this section once more detail on prefetch and prerender partitioning mechanism is specified.

3.4. Identity joining

This specification describes a mechanism through which HTTP requests for later top-level navigation (in the case of prefetching) can be made without a user gesture. It is natural to ask whether it is possible for two coordinating sites to connect user identities.

Since existing credentials for the destination origin are not sent (assuming it is not same origin with the referrer), that site is limited in its ability to identify the user before navigation, much as if the referrer site had simply used [FETCH] to make an uncredentialed request. Upon navigation, this becomes similar to ordinary navigation (e.g., by clicking a link that was not prefetched).

To the extent that user agents attempt to mitigate identity joining for ordinary fetches and navigations, they can apply similar mitigations to prefetched navigations.


References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/

Informative References

[CSP]
Mike West. Content Security Policy Level 3. 29 June 2021. WD. URL: https://www.w3.org/TR/CSP3/
[TLS]
E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.3. August 2018. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc8446
