Follow-up to #2549077 (Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small whitelist of attributes by default) to backport it to Drupal 7
Problem/Motivation
- Drupal 8 core outputs <!DOCTYPE html> at the top of html.html.twig, which tells the browser that the content is HTML5.
- http://www.w3.org/TR/html5/single-page.html#non-conforming-features lists many elements and attributes from older versions of HTML that the HTML5 specification declares obsolete. For example, it states: "Elements in [this] list are entirely obsolete, and must not be used by authors."
- Install the Standard profile and go to /node/add/article. Suppose you can't think of anything original to write, so you search other web pages for interesting content to copy and paste. You find https://www.cs.tut.fi/~jkorpela/www/justify.html, a perfectly valid HTML 4 page, and like its section "What is justification?", so you select the whole section and copy it to your clipboard. You switch back to the node form, paste it into the Body field, and click Save. Voilà: your site now has a page with some lovely justified and right-aligned paragraphs.
- The problem is that align is not a valid HTML5 attribute. Current browsers still respect it, but nothing stops a future browser from ignoring it. Meanwhile, your site now fails strict HTML validation (if you care about that sort of thing).
- If non-HTML5-valid attributes were instead filtered out, you would have the opportunity to discover a more correct (i.e., standards-compliant and therefore future-proof) way of aligning your paragraphs, such as going to your text format configuration and adding the alignment/justification buttons to your toolbar.
Proposed resolution
Fortunately, we already have all the tooling necessary to fix this. FilterInterface::getHTMLRestrictions() provides the API for expressing attribute (and other) restrictions, and Html::load() gives us a DOM document that we can use to apply those rules. The proposed resolution therefore consists of changing the allowed_html setting of filter_html to include a whitelist of attributes and, for each attribute, an optional whitelist of allowed attribute values.
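To make the idea concrete, here is a minimal Python sketch of applying such a whitelist to parsed markup. It is not Drupal's filter_html code (which is PHP and works on the DOM document from Html::load()); Python's html.parser stands in for the DOM here. The rule shape (tag mapped to {attribute: None for any value, or a set of allowed lowercase values}), the AttributeFilter class and the apply_html_restrictions() function are hypothetical names chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser


class AttributeFilter(HTMLParser):
    """Re-serializes HTML, keeping only whitelisted tags and attributes.

    Hypothetical rule shape: {tag: {attribute: None | set_of_lowercase_values}}.
    None means any value is allowed for that attribute.
    """

    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.rules = rules
        self.out = []

    def _open(self, tag, attrs, self_closing):
        allowed = self.rules.get(tag)
        if allowed is None:
            return  # Tag not whitelisted: drop the tag itself, keep its text.
        parts = [tag]
        for name, value in attrs:
            if name not in allowed:
                continue  # Attribute not whitelisted (e.g. align on <p>).
            values = allowed[name]
            if values is not None and (value or "").lower() not in values:
                continue  # Value not in this attribute's value whitelist.
            if value is None:
                parts.append(name)  # Boolean-style attribute with no value.
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs, False)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag in self.rules:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text was already unescaped by the parser, so escape it again.
        self.out.append(escape(data, quote=False))


def apply_html_restrictions(markup, rules):
    parser = AttributeFilter(rules)
    parser.feed(markup)
    parser.close()  # Flush any buffered trailing text.
    return "".join(parser.out)


if __name__ == "__main__":
    rules = {"p": {}, "a": {"href": None, "dir": {"ltr", "rtl"}}}
    print(apply_html_restrictions('<p align="justify">Justified text</p>', rules))
```

With these rules, the pasted <p align="justify"> from the scenario above comes out as a bare <p>, and an <a dir="up"> loses its dir attribute because "up" is not an allowed value. Tags outside the whitelist (such as <font>) are dropped while their text is kept; this is only a sketch of the rule-application step, not a security-reviewed filter.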
Remaining tasks
Review.
User interface changes
Changes to the text format configuration UI.
API changes
None.
Data model changes
The allowed_html key in the filter_html filter's configuration is expanded to allow whitelisting attributes. For example, <a> becomes <a href hreflang dir>, <h4> becomes <h4 id>, and so on.
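The expanded setting is still a single string of tags, now with attribute names listed inside each tag. A short Python sketch of turning such a string into a tag-to-allowed-attributes map is shown here; parse_allowed_html() is a hypothetical helper, it handles only bare attribute names (not any attribute-value syntax, which this summary does not specify), and it is not the parser Drupal uses.

```python
import re

# Matches one tag in the setting: group 1 is the tag name,
# group 2 is everything between the name and the closing ">".
TAG_RE = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)([^>]*)>")


def parse_allowed_html(setting: str) -> dict[str, set[str]]:
    """Map each allowed tag name to the set of attribute names allowed on it.

    Hypothetical helper: only whitespace-separated bare attribute names
    (e.g. "<a href hreflang dir>") are understood.
    """
    allowed: dict[str, set[str]] = {}
    for match in TAG_RE.finditer(setting):
        tag = match.group(1).lower()
        attrs = {name.lower() for name in match.group(2).split()}
        allowed.setdefault(tag, set()).update(attrs)
    return allowed


if __name__ == "__main__":
    print(parse_allowed_html("<a href hreflang dir> <h4 id> <p>"))
```

A tag listed with no attributes, such as <p>, maps to an empty set, meaning the tag is allowed but every attribute on it is stripped.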
Beta phase evaluation
| Criterion | Evaluation |
|---|---|
| Issue category | Bug, because content authors shouldn't be required to know the HTML5 specification in order to avoid generating invalid web pages. |
| Issue priority | Major, because we should not change the format configurations of existing sites; if we don't prioritize this for 8.0.0, sites built between 8.0.0 and the release containing this fix may contain invalid HTML for a long time. |
| Prioritized changes | The main goal of this issue is usability, in the sense of helping content authors generate content that is standards-compliant and therefore future-proof. |
| Disruption | Requires an upgrade path and is disruptive to some existing D8 sites: depending on which attributes their content relies on, they may need to manually whitelist additional attributes and attribute values. Potentially disruptive for contributed and custom modules that implement filters that run after 'filter_html' but use internal attributes that 'filter_html' will now filter out. However, these modules can resolve that, for example by instructing site builders on the configuration required for compatibility and/or by using the FilterInterface::prepare() method. |
Comments
Comment #2
wim leers: Note that we should still move the key comments about D7 from #2549077 (Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small whitelist of attributes by default) into this issue's summary.
Comment #3
nod_
Comment #4
nod_