Why:

- DeduplicateStyles runs as a default post-cache output transformation
  for every backend pageview. It tokenizes the article HTML via Remex
  to deduplicate the style nodes within it.
- This is expensive for large pages. On the Barack Obama page, the
  transform takes 350+ ms on a parser cache hit.
- Some other transforms, like HandleSectionLinks, already use regexes
  so that their Remex-driven transforms run only on relevant elements,
  avoiding a potentially expensive tokenization of the whole page.

What:

- Use a regular expression to limit this transform so that it only
  tokenizes potential <style> nodes. This takes ~2 ms to execute on a
  large page[1], compared to ~166 ms currently.
- Restrict this optimization to legacy parser output transformations,
  since the naïve regex used might otherwise match encoded style tags
  within data-parsoid attribute values, as described in
  I32d3d1772243c3819e1e1486351d16871b6e21c4. Add a test for this.

[1] https://en.m.wikipedia.org/wiki/Democratic_Party_(United_States)?action=render

Bug: T394059
Change-Id: I33ebcc2da7685b4b6dafdad3ed3ef2a9edea9a00
(cherry picked from commit 02f69d5dc99a964981c57b597eedffa1f253a14c)
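The regex prefilter idea can be sketched as follows. This is a simplified Python illustration and not the MediaWiki code, which is PHP and hands each candidate to Remex rather than parsing attributes with a second regex. Several details are assumptions: the `data-mw-deduplicate` key attribute, the `<link rel="mw-deduplicated-inline-style">` placeholder, and the names `STYLE_RE` and `dedupe_styles` are all hypothetical. The final check shows why the optimization is limited to legacy output. A naive pattern also matches a `<style>` that appears only inside an attribute value, such as a hypothetical data-parsoid JSON payload.

```python
import re

# Naive candidate pattern: an opening <style ...> tag up to the first </style>.
# Group 1 holds the opening tag's attributes. The pattern ignores attribute
# quoting, so it can also match "<style" text inside an attribute value.
STYLE_RE = re.compile(r"<style\b([^>]*)>(.*?)</style>", re.IGNORECASE | re.DOTALL)

# Hypothetical attribute carrying the deduplication key (an assumption).
KEY_RE = re.compile(r'data-mw-deduplicate="([^"]*)"')


def dedupe_styles(html: str) -> str:
    """Replace repeated keyed <style> elements with a placeholder <link>.

    Only the regex matches are examined. The rest of the page is never
    tokenized, which is the point of the prefilter.
    """
    seen = set()

    def replace(match: re.Match) -> str:
        key_match = KEY_RE.search(match.group(1))  # look only at the opening tag
        if key_match is None:
            return match.group(0)  # unkeyed style: leave it untouched
        key = key_match.group(1)
        if key in seen:
            # Illustrative placeholder format; the real output may differ.
            return f'<link rel="mw-deduplicated-inline-style" href="mw-data:{key}">'
        seen.add(key)
        return match.group(0)  # first occurrence is kept as-is

    return STYLE_RE.sub(replace, html)


if __name__ == "__main__":
    page = (
        '<style data-mw-deduplicate="k1">.a{}</style><p>x</p>'
        '<style data-mw-deduplicate="k1">.a{}</style>'
    )
    print(dedupe_styles(page))

    # False positive: the only "<style>" here sits inside an attribute value,
    # the kind of Parsoid markup that makes the regex unsafe there.
    attr_only = "<div data-parsoid='{\"src\":\"<style>.b{}</style>\"}'></div>"
    print(STYLE_RE.search(attr_only) is not None)
```

In the sketch, only the first occurrence of each key survives. A real implementation would also need to decide what to do with styles that lack a key, and this sketch simply leaves them alone.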