Follow-up to #2549077 (backport to Drupal 7): Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small whitelist of attributes by default

Problem/Motivation

  • Drupal 8 core outputs <!DOCTYPE html> at the top of html.html.twig, which tells the browser that the contents are HTML5.
  • http://www.w3.org/TR/html5/single-page.html#non-conforming-features lists many elements and attributes from older versions of HTML that the HTML5 specification treats as obsolete: "Elements in [this] list are entirely obsolete, and must not be used by authors."
  • Install the Standard profile and go to /node/add/article. Decide that you can't think of anything original to write, so you go searching other web pages for interesting content to copy and paste. You stumble upon https://www.cs.tut.fi/~jkorpela/www/justify.html, which is a perfectly valid HTML 4 page, and find that you like the section "What is justification?", so you select that whole section and copy it to your clipboard. You then switch back to your node form, paste it into the body field, and click Save. Voilà: you now have a page on your site with some lovely justified and right-aligned paragraphs.
  • The problem is that align is not a valid HTML5 attribute. Current browsers still respect it, but nothing stops a future browser from ignoring it. Meanwhile, your site now fails strict HTML validation (if you care about that sort of thing).
  • If, instead, attributes that are not valid in HTML5 were filtered out, you would have the opportunity to discover a more correct (i.e., standards-compliant and therefore future-proof) way of aligning your paragraphs, such as going to your format configuration and adding the alignment/justification buttons to your toolbar.

Proposed resolution

Fortunately, we already have all the tooling necessary to fix this. FilterInterface::getHTMLRestrictions() provides the API for expressing attribute (and other) restrictions, and Html::load() gives us a DOM document to which we can apply those rules. The proposed resolution therefore consists of changing the allowed_html setting of filter_html to include a whitelist of attributes for each tag and, for each attribute, an optional whitelist of allowed attribute values.
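
For readers less familiar with the DOM-based approach, here is a minimal sketch of the filtering idea in Python rather than PHP. It is not Drupal's implementation: it uses the standard library's html.parser instead of Html::load(), and the ALLOWED dictionary (tag, then attribute, then a set of allowed values or None for any value) and the restrict_attributes() helper are hypothetical names chosen for illustration. It shows how an align attribute like the one in the pasted HTML 4 content would be dropped while whitelisted attributes survive.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag -> {attribute: set of allowed values, or None
# meaning any value}. This shape is illustrative, not Drupal's internal format.
ALLOWED = {
    "p": {},
    "em": {},
    "strong": {},
    "a": {"href": None, "hreflang": None, "dir": {"ltr", "rtl"}},
    "h4": {"id": None},
}

# Elements that never get a closing tag.
VOID_TAGS = {"br", "hr", "img"}


class AttributeFilter(HTMLParser):
    """Re-serializes HTML, keeping only whitelisted tags and attributes."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return  # Drop the tag itself; its text content is still emitted.
        rules = self.allowed[tag]
        kept = []
        for name, value in attrs:
            if name not in rules:
                continue  # Attribute not whitelisted for this tag, e.g. align.
            values = rules[name]
            if values is not None and value not in values:
                continue  # Value not in this attribute's value whitelist.
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join([tag] + kept) + ">")

    def handle_endtag(self, tag):
        if tag in self.allowed and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text, since convert_charrefs=True decoded entities.
        self.out.append(escape(data, quote=False))


def restrict_attributes(markup, allowed=ALLOWED):
    parser = AttributeFilter(allowed)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    # prints <p>Right-aligned text</p>
    print(restrict_attributes('<p align="right">Right-aligned text</p>'))
```

Disallowed tags are removed but their text is kept, which is fine for wrappers like <div> but not for elements such as <script>, whose contents a real filter would have to handle separately.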

Remaining tasks

Review.

User interface changes

The format configuration UI.

API changes

None.

Data model changes

The allowed_html key of the filter_html filter's configuration is expanded to allow whitelisting attributes. For example, <a> becomes <a href hreflang dir>, <h4> becomes <h4 id>, and so on.
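
The expanded setting can be read as a small grammar: each <tag ...> token names a tag followed by its whitelisted attributes. Below is a minimal Python sketch of parsing it into a lookup table. parse_allowed_html() is a hypothetical helper, not part of Drupal's API, and the attr="v1 v2" form for restricting attribute values is an assumption about the syntax, since the examples above only show attribute names.

```python
import re

# Matches one "<tag attr attr=...>" token in the setting string.
TAG_RE = re.compile(r"<([a-z][a-z0-9]*)([^>]*)>", re.IGNORECASE)
# Matches an attribute name, optionally followed by ="space separated values".
ATTR_RE = re.compile(r'([a-z][a-z0-9_:-]*)(?:="([^"]*)")?', re.IGNORECASE)


def parse_allowed_html(setting):
    """Parse e.g. '<a href hreflang dir> <h4 id>' into {tag: {attr: values}}.

    values is None when any value is allowed, otherwise a set of strings.
    (Hypothetical helper; the ="..." value syntax is an assumption.)
    """
    allowed = {}
    for tag, attr_text in TAG_RE.findall(setting):
        rules = allowed.setdefault(tag.lower(), {})
        for name, values in ATTR_RE.findall(attr_text):
            # An empty or missing value list means "any value is allowed".
            rules[name.lower()] = set(values.split()) if values else None
    return allowed


if __name__ == "__main__":
    print(parse_allowed_html('<a href hreflang dir> <h4 id> <ol type="1 A">'))
```

For example, parse_allowed_html('<a href hreflang dir> <h4 id>') returns {'a': {'href': None, 'hreflang': None, 'dir': None}, 'h4': {'id': None}}, a table that a filter can consult tag by tag and attribute by attribute.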

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category: Bug, because content authors shouldn't be required to know the HTML5 specification in order to avoid generating invalid web pages.
Issue priority: Major, because we should not change the format configurations of existing sites. If we don't prioritize this for 8.0.0, sites built between 8.0.0 and the release of the fix will potentially have invalid HTML content for a long time.
Prioritized changes: The main goal of this issue is usability, in the sense of helping content authors generate content that is standards-compliant and therefore future-proof.
Disruption: Requires an upgrade path and is disruptive to some existing D8 sites (depending on which attributes their content relies on, they may need to manually whitelist additional attributes and attribute values). Potentially disruptive for contributed and custom modules that implement filters that run after 'filter_html' but rely on internal attributes that 'filter_html' will now filter out. However, these modules have ways to resolve that, such as documenting the required configuration for site builders and/or using the FilterInterface::prepare() method.

Comments

pwolanin created an issue. See original summary.

nod_

Title: Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small whitelist of attributes by default » Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small allow list of attributes by default
Issue tags: -JavaScript +JavaScript
nod_

Title: Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small allow list of attributes by default » Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small list of attributes by default

Status: Patch (to be ported) » Closed (outdated)

Automatically closed because Drupal 7 security and bugfix support has ended as of 5 January 2025. If the issue verifiably applies to later versions, please reopen with details and update the version.