This patch exports the necessary information from the Parser into the
ParserOutput to ensure that the Table of Contents can be properly
language-converted: both ensuring that the target language is correct
(in cases where it differs from the content language) and that various
conversion-suppression mechanisms are functional. When the
ParserCache does not (yet) have the new properties from Parser, the
behavior is unchanged from before (the content language is used, and
its "preferred variant").
This is a follow-up to the "quick fix" deployed in
Ic14b3a49a8ee7ed600485d4f8a363a206035a847 to fix a UBN regression.
Parser tests have also been added to verify that ToC conversion
is correctly done (T299973).
Task T303329 has been opened to (eventually) rename the
'core:target-lang' and 'core:target-lang-variant' properties added to
the ParserOutput in this patch.
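A minimal sketch of the fallback described above, in Python for illustration only. The function name, the plain mapping standing in for ParserOutput properties, and the choice to fall back when either property is missing are assumptions, not the actual implementation; only the two property names come from this patch.

```python
from typing import Mapping, Tuple


def resolve_toc_language(
    properties: Mapping[str, str],
    content_lang: str,
    preferred_variant: str,
) -> Tuple[str, str]:
    """Return the (language, variant) pair to use for ToC conversion."""
    target_lang = properties.get("core:target-lang")
    target_variant = properties.get("core:target-lang-variant")
    if target_lang is None or target_variant is None:
        # Assumed behavior for cache entries written before this patch:
        # use the content language and its preferred variant, as before.
        return content_lang, preferred_variant
    return target_lang, target_variant
```

With an empty mapping (an old cache entry) the content language wins; once both properties are present they take precedence.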
Bug: T303235
Bug: T295187
Bug: T299973
Followup-To: Ic14b3a49a8ee7ed600485d4f8a363a206035a847
Followup-To: Ib273f88531c340b561072ee9f616aa60725091e6
Change-Id: Ie0f1d7b6daffc8ff47228f6f086a257518f72717
This is spec'd out at https://www.mediawiki.org/wiki/Specs/HTML#Media
It's also useful in the bug to determine when the link is pointing at
the resource, and hence MediaViewer should open.
Previously that was distinguished with .image class on the link but
that's now omitted in getDescLinkAttribs.
FIXME: Should the "resource" contain querystrings? Maybe this needs to
be done on the Parsoid side as well.
Bug: T292657
Depends-On: Idb60e418f79dcb6a121de2a11e6e0ed0b31fd3ff
Change-Id: Ia94138383ebdbfc2feef75fdf651b969085a72b1
Just newer and overlooked tests. All the media in those galleries are
invalid and the gallery changes went in later in
Iff2bdc3aa02f84f0bf4ca55d177706823934cc08.
Change-Id: I6d03037af1b5c90e6d57fd048506da2b4e4bc704
All of my favorite text editors corrupt this test case whenever I edit
parserTests.txt. extraParserTests.txt contains other tests with weird
characters that may get corrupted by normal text editors.
(I had to use `vi` to make this patch, and I wouldn't wish this on
anyone.)
Change-Id: Id474469180fc284e3e28b55f65808be727507875
The value in the attribute displaytitle must contain valid HTML. The
sanitizer of the {{DISPLAYTITLE}} parser ensures that only valid HTML
is accepted.
If there is no {{DISPLAYTITLE}} in the wikitext then displaytitle
falls back to $title->getPrefixedText(). Here an HTML encoding of
special characters is necessary. This affects only the replacement of
& by &amp; because other special characters like < and > are not
allowed in the title.
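The fallback rule can be sketched as follows, in Python for illustration only. The function and parameter names are hypothetical, and `html.escape` stands in for whatever escaping helper the PHP code actually uses.

```python
import html
from typing import Optional


def display_title(sanitized_displaytitle: Optional[str], prefixed_text: str) -> str:
    """Return an HTML-safe display title."""
    if sanitized_displaytitle is not None:
        # Already valid HTML: it went through the {{DISPLAYTITLE}} sanitizer.
        return sanitized_displaytitle
    # Fallback: the plain title text must be HTML-encoded. In practice only
    # "&" is affected, since "<" and ">" cannot occur in a title.
    return html.escape(prefixed_text, quote=False)
```

For a page titled "Tom & Jerry" with no {{DISPLAYTITLE}}, this returns "Tom &amp;amp; Jerry" as seen in the HTML source, i.e. the ampersand is encoded once.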
This change affects the displaytitle fallback in the following places:
* ApiParse
* ApiQueryInfo
* InfoAction
* Parser
The displaytitle fallback in OutputPage is also updated to this
behavior, although
Sanitizer::normalizeCharReferences( Sanitizer::removeHTMLtags( $html ) )
also replaces & by &amp;.
Also add test cases with & in the displaytitle to:
* ApiParseTest
* ApiQueryInfoTest
* parserTests
Bug: T291985
Change-Id: I8ee1e2731d9bfa49725d663b34986e7e3073e4ca
Gated behind the flag $wgParserEnableLegacyMediaDOM. The scattershot
usage of it is a little unfortunate but isn't expected to live very long
so maybe that's acceptable.
Further details can be found at:
https://www.mediawiki.org/wiki/Parsing/Media_structure
Bug: T51097
Bug: T266148
Bug: T271129
Change-Id: I978187f9f6e9e0a105521ab3e26821e36a96b911
This character is no longer required here.
It was added to ensure correct display of parentheses in mixed
LTR/RTL environment, for example an interlanguage link from
an RTL wiki to an LTR language with parentheses in its name.
However, the Unicode bidirectional algorithm was updated
to handle parentheses more cleverly and automatically,
making manual adjustment with RLM/LRM unnecessary.
This update was implemented years ago in all browsers and
operating systems. I've tested this in Firefox, Chrome, Edge,
and Internet Explorer 11, and it works correctly without
the RLM/LRM characters.
Parser tests are updated accordingly.
Bug: T280435
Change-Id: I63107f623ade3b8367eae579a8e96d7e2c18b747
Our PortableInfobox extension uses the HTML5 <aside> tag in its generated HTML.
This tag isn't recognized as a block element (in the way e.g. <div> is) by the
legacy parser, resulting in some spurious empty paragraphs in the output.
As a fix, make the legacy parser aware of <aside> tags to avoid unnecessary
p-wrapping. Also add <aside> to the Sanitizer's internal attribute check.
I3e57f55ac69d2c1ee8a1d41c21b692e56fc7e628 takes care of updating Parsoid-PHP
accordingly.
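A toy sketch of why the block-element list matters, in Python for illustration only. This is not the legacy parser's algorithm; the tag set, the regex, and both function names are assumptions made for the example.

```python
import re

# Illustrative subset of block-level tags; "aside" is the addition.
BLOCK_TAGS = {"div", "p", "table", "ul", "ol", "pre", "blockquote", "aside"}
_LEADING_TAG = re.compile(r"<\s*/?\s*([a-zA-Z][a-zA-Z0-9]*)")


def starts_with_block_tag(chunk: str) -> bool:
    """True if the chunk begins with a tag from BLOCK_TAGS."""
    m = _LEADING_TAG.match(chunk.lstrip())
    return m is not None and m.group(1).lower() in BLOCK_TAGS


def wrap_paragraphs(chunks):
    """Wrap each non-empty chunk in <p> unless it starts with a block tag."""
    out = []
    for chunk in chunks:
        if not chunk.strip():
            continue
        out.append(chunk if starts_with_block_tag(chunk) else f"<p>{chunk}</p>")
    return out
```

Without "aside" in the set, an infobox chunk would be wrapped like inline content, which is the kind of stray paragraph markup this patch avoids.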
Bug: T278565
Change-Id: I89dbdf7770e13e1b62320228a366c64e64217b0b
This fails without the follow-up patch, with the same exception as reported
on the task.
Bug: T276476
Follow-Up: I014da3a333f8ee6ca623b98c415b8d9f9d1be084
Change-Id: Ib61e9ea44a6fdc31e10b89c3504cecec5b9fd208
We lost some insight in c44a395 because we're no longer analysing the
entire DOM as a serialized string, but instead running our regexp on
individual text nodes.
This patch as written here just allows for the space to be at the start
of the text node. However, some git spelunking shows that in 9dc65ef,
the condition requiring a non-whitespace character before the space
existed only because armoring French spacing happened before
doBlockLevels and needed to protect indent pre's.
That's certainly not the case anymore, so we can probably get away with
dropping the condition altogether now.
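A rough sketch of the unconditional form, in Python for illustration only. The punctuation sets and the function name are assumptions; the real patterns live in the PHP Sanitizer/Parser code and may differ.

```python
import re

NBSP = "\u00a0"

# No requirement for a preceding non-whitespace character, so a space at
# the very start of a text node is armored too.
_SPACE_BEFORE = re.compile(r" (?=[?:;!%»›])")
_SPACE_AFTER = re.compile(r"([«‹]) ")


def armor_french_spacing(text_node: str) -> str:
    """Replace spaces around French punctuation with non-breaking spaces."""
    text_node = _SPACE_BEFORE.sub(NBSP, text_node)
    return _SPACE_AFTER.sub(lambda m: m.group(1) + NBSP, text_node)
```

A pattern of the older shape, such as `(\S) (?=[?:;!%»›])`, would skip a text node that begins with " !" because nothing precedes the space within that node.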
Bug: T275918
Change-Id: I654a09b0f98937379b9fad3f325134ead7f2d8a6
This also means we don't need to take special care for French spacing in
attributes, since it's no longer applied there.
Adds a test that captures this change.
Note that the test "Nowiki and french spacing" wonders whether this
escaping should be applied to nowiki content.
Bug: T255007
Change-Id: Ic8965e81882d7cf024bdced437f684064a30ac86
This validates langconvert's "from" and "to" arguments as valid BCP 47 tags. For example, it will accept "sr-Cyrl" and "sr-cyrl" and reject the non-standard internal MediaWiki code "sr-ec". I made the BCP 47 matching case-insensitive, as that seems consistent with how MediaWiki handles it elsewhere, and case-sensitive matching would probably be a headache for users.
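One way to picture the matching, in Python for illustration only. The lookup table and function name are hypothetical, and the sketch assumes validation works by comparing the argument against the BCP 47 form of each known variant.

```python
from typing import Optional

# Hypothetical table: internal variant code -> BCP 47 tag.
VARIANT_TO_BCP47 = {"sr-ec": "sr-Cyrl", "sr-el": "sr-Latn"}


def resolve_variant(tag: str) -> Optional[str]:
    """Map a BCP 47 tag to an internal variant code, ignoring case."""
    wanted = tag.lower()
    for internal, bcp47 in VARIANT_TO_BCP47.items():
        if bcp47.lower() == wanted:
            return internal
    # Internal codes such as "sr-ec" are not BCP 47 forms and are rejected.
    return None
```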
Bug: T271758
Change-Id: I9f765fe650279820d61c3a7e499ca99468df3d14
Currently MediaWiki turns `[[test, abc|]]` into `[[test, abc|test]]`
while saving the page, but that comma isn't used in Persian,
so this patch makes MediaWiki treat the Arabic comma the same way
as the regular comma.
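A simplified sketch of the expansion, in Python for illustration only. It assumes the link is written with a trailing pipe and ignores namespaces and parentheses, which the real pre-save transform also handles; the regex and function name are made up for this example.

```python
import re

# Separator is either ASCII ", " or the Arabic comma U+060C plus a space.
_PIPE_TRICK = re.compile(
    r"\[\[([^\[\]|]+?)((?:, |\u060c )[^\[\]|]+)?\|\]\]"
)


def expand_pipe_trick(wikitext: str) -> str:
    """Fill in the label of [[target|]] links, dropping the comma suffix."""
    def repl(m):
        head = m.group(1)
        tail = m.group(2) or ""
        return f"[[{head}{tail}|{head}]]"
    return _PIPE_TRICK.sub(repl, wikitext)
```

With both separators in the alternation, a title using `،` gets the same label-shortening as one using `,`.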
Change-Id: Ib8051023abc25b7c4f97a3f50246f35650057ec9
Document and enforce the correct type for the first argument to
a Parser tag hook, which will be `null` if the tag is self-closed.
Mark the methods in CoreTagHooks @internal. They are apparently
unused outside MediaWiki core:
https://codesearch.wmcloud.org/search/?q=CoreTagHooks&i=nope&files=&repos=
Add coverage test cases to ensure that all tag hooks properly handle
the `null` value of the first argument; prior to this patch the
`<html>` tag emitted a broken strip tag in this case. The other hooks
passed the null to other callees in violation of their type
signatures, but eventually every other hook managed to safely cast the
null to the empty string without throwing an exception or emitting a
warning. For those, this patch does not change existing behavior---it
just makes the cast to the empty string much more obvious to the
reader.
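The explicit cast looks roughly like this, shown in Python for illustration only. The hook name and its output are hypothetical; the point is just the visible handling of a null first argument for a self-closed tag.

```python
import html
from typing import Mapping, Optional


def example_tag_hook(content: Optional[str], attributes: Mapping[str, str]) -> str:
    """Hypothetical tag hook; content is None for a self-closed tag."""
    # Make the null-to-empty-string conversion explicit instead of relying
    # on a downstream callee to coerce it silently.
    text = content if content is not None else ""
    return f"<pre>{html.escape(text)}</pre>"
```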
Change-Id: I69fde6c06eabb2db27bb1cc23d2cb19b99273391
Html::element is more lenient about which characters it escapes.
But really this is just factored out of the next patch for ease of
review.
Change-Id: I9abb4d866a624df7bf4628ab9cc581967e715160
The <langconvert> tag takes two attributes: from (language variant from) and to (language variant to). It returns the content of the tag converted using LanguageConverter. It returns an error if the attributes are not present, if the variants do not exist, or if the variants belong to different languages. Currently it does not work for IuConverter, because the variants use the code ike rather than iu, and ike isn't in the list of languages with converters available.
This patchset reimplements the feature as a tag instead of a parser function, and renames it from transliterate to langconvert.
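The three error conditions can be sketched as below, in Python for illustration only. The registry contents, function names, and error strings are all hypothetical; the real tag reports errors through MediaWiki messages.

```python
from typing import Mapping, Optional

# Hypothetical registry: language code -> variants its converter supports.
CONVERTER_VARIANTS = {
    "sr": {"sr-ec", "sr-el"},
    "zh": {"zh-hans", "zh-hant"},
}


def language_of(variant: str) -> Optional[str]:
    """Return the language a variant belongs to, or None if unknown."""
    for lang, variants in CONVERTER_VARIANTS.items():
        if variant in variants:
            return lang
    return None


def check_langconvert(attrs: Mapping[str, str]) -> Optional[str]:
    """Return an error description, or None when conversion may proceed."""
    if "from" not in attrs or "to" not in attrs:
        return "missing attribute"
    src, dst = language_of(attrs["from"]), language_of(attrs["to"])
    if src is None or dst is None:
        return "unknown variant"
    if src != dst:
        return "variants belong to different languages"
    return None
```

A variant whose language has no registered converter, as with ike for IuConverter, falls into the "unknown variant" branch in this sketch.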
Bug: T263082
Change-Id: Idc3a32c66d5a0466c63e7ce8753d2619354c30b0