diff --git a/manuscript/microsyntaxes.md b/manuscript/microsyntaxes.md
index 4114fbd..b06aba8 100755
--- a/manuscript/microsyntaxes.md
+++ b/manuscript/microsyntaxes.md
@@ -186,7 +186,13 @@ The format of the srcset attribute is as follows:
 * If an *image candidate string* has no descriptors and no trailing whitespace, then the next *image candidate string* must begin with whitespace (otherwise it would get jammed together with the previous URL).
-A naïve processing would be to split the string on commas and then split on whitespace, to get a list of URLs and their descriptors. However, this would fail to correctly parse URLs that contain commas (for example data: URLs), and, for the purpose of compatibility with possible future complex descriptors, the parsing of those are more involved, too.
+The `srcset` microsyntax doesn't have legacy baggage (other than URLs) to attribute its complexity to. It was a new attribute and the syntax was designed. However, a number of requirements led to complexity anyway:
+
+* graceful error handling: in the spirit of HTML, an error somewhere shouldn't cause the entire attribute value to be ignored.
+
+* compatibility with URLs: URLs can contain basically any character and should still work. A naïve processing would be to split the string on commas and then split on whitespace, to get a list of URLs and their descriptors. However, this would fail to correctly parse URLs that contain commas (for example, `data:` URLs).
+
+* support future extensions: it should be possible to add new descriptors in the future without causing unexpected behavior in legacy user agents that don't support the new descriptor. One such anticipated descriptor is a descriptor analogous to the `integrity` attribute on `script` and `link` -- integrity checks would need to be per URL, so for images, each URL in `srcset` would need to be annotated individually.
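The failure mode of the naive split described in the list above can be illustrated with a short sketch. This is not the algorithm from the specification; `naive_parse_srcset` is a hypothetical helper written only for this illustration, and the base64 payload in the `data:` URL is a placeholder rather than a real image.

```python
def naive_parse_srcset(value: str) -> list[tuple[str, list[str]]]:
    """Naive approach: split on commas, then split each piece on whitespace.

    Hypothetical helper for illustration only; it deliberately ignores the
    fact that URLs may themselves contain commas.
    """
    candidates = []
    for part in value.split(","):
        tokens = part.split()
        if tokens:
            # First token is treated as the URL, the rest as descriptors.
            candidates.append((tokens[0], tokens[1:]))
    return candidates


# Works for simple values.
simple = "small.jpg 1x, large.jpg 2x"
print(naive_parse_srcset(simple))
# [('small.jpg', ['1x']), ('large.jpg', ['2x'])]

# Breaks on a data: URL, because the URL itself contains a comma.
# (The payload below is a placeholder, not a valid PNG.)
with_data_url = "data:image/png;base64,iVBORw0KGgo= 1x, large.jpg 2x"
print(naive_parse_srcset(with_data_url))
# [('data:image/png;base64', []), ('iVBORw0KGgo=', ['1x']), ('large.jpg', ['2x'])]
```

The second call yields three candidates instead of two: the `data:` URL is cut at its internal comma, so neither half is the URL the author wrote.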
 The processing is as follows:
@@ -198,14 +204,118 @@ The processing is as follows:
 * If the URL ends with a comma, then all trailing commas are removed (only a single trailing comma is conforming). Otherwise, descriptors for the current item are parsed:
- * A state machine is used to tokenize descriptors. This is to handle whitespace and commas inside parentheses. For example, `size(50, 50, 30)` is tokenized to a single descriptor. A top-level comma ends the tokenizer.
+ * A state machine is used to tokenize descriptors. This is to handle whitespace and commas inside parentheses. For example, `size(50, 50, 30)` is tokenized to a single descriptor. A top-level comma ends the descriptors tokenizer.
-* The tokenized descriptors are parsed into *density*, *width*, and *future-compat-h*. The last one is for gracefully handling future web content that uses not-yet-specified *height* descriptors in addition to *width* descriptors. If any of the descriptors are invalid, the entire candidate is dropped.
+* The tokenized descriptors are parsed into *density*, *width*, and *future-compat-h*. If any of the descriptors are invalid, the entire candidate is dropped. The *future-compat-h* descriptor is for gracefully handling future web content that uses not-yet-specified *height* descriptors in addition to *width* descriptors. Instead of dropping the candidate when seeing an "h" descriptor, only that descriptor is ignored.
-* Run the above steps in a loop.
+* Run the above steps in a loop until reaching the end of the string.
 ### Sizes
+The `sizes` attribute is used in conjunction with the `srcset` attribute when *width* descriptors are used. The *width* descriptor tells the browser the width of the image resource, and the `sizes` attribute tells the browser what the *intended* layout size is for the image.
+
+You may wonder why this attribute is needed in the first place. Can't the browser just use the layout information that is provided in the CSS to decide which image to load?
+
+To answer this question, we first need to know a bit about how browsers load web pages. When navigating to a web page, the browser will first receive the HTML, and it will subsequently fetch further resources as it finds them in the HTML. For most kinds of resources, the HTML parser will continue processing while the subresource is being fetched. So for a simple document that includes an external stylesheet and then an image, the browser will fetch both in parallel.
+
+```html
+
+
+wow
+```
+
+This means that the browser can't wait for `style.css` to be available before starting to load the image, as that would regress page load performance.
+
+There are other scenarios to consider as well, but the above is the simplest one and is enough to justify the `sizes` attribute.
+
+A more complicated scenario involves `script` elements that block the HTML parser and an optimization that browsers have, the speculative tokenizer or speculative parser, that speculatively continues to process HTML past a blocking `script` element and speculatively fetches further subresources found, such as scripts, stylesheets, and images. A `script` element blocks the HTML parser if it is external (has a `src` attribute) and is a classic script (not `type="module"`) and does not use `async` or `defer` attributes.
+
+```html
+
+
+
+wow
+```
+
+The reason scripts can block the HTML parser is that scripts can call `document.write()`, which can change the state of the HTML parser and thus change the meaning of all markup after the `</script>` end tag. Consider `script.js` being the script `document.write('