757 commits

Author SHA1 Message Date
C. Scott Ananian
0955046ca5 Ensure that ToC is converted into the proper target language
This patch exports the necessary information from the Parser into the
ParserOutput to ensure that the Table of Contents can be properly
language-converted: both ensuring that the target language is correct
(in cases where it differs from the content language) and that various
conversion-suppression mechanisms are functional.  When the
ParserCache does not (yet) have the new properties from Parser, the
behavior is unchanged from before (the content language is used, and
its "preferred variant").

This is a follow up to the "quick fix" deployed in
Ic14b3a49a8ee7ed600485d4f8a363a206035a847 to fix a UBN regression.

Parser tests have also been added to verify that ToC conversion
is correctly done (T299973).

Task T303329 has been opened to (eventually) rename the
'core:target-lang' and 'core:target-lang-variant' properties added to
the ParserOutput in this patch.

Bug: T303235
Bug: T295187
Bug: T299973
Followup-To: Ic14b3a49a8ee7ed600485d4f8a363a206035a847
Followup-To: Ib273f88531c340b561072ee9f616aa60725091e6
Change-Id: Ie0f1d7b6daffc8ff47228f6f086a257518f72717
2022-03-09 00:08:57 -05:00
Arlo Breault
4f22d39828 Fix parserTest name
Follow up to I206f1ccbfee1a601f3e5a4b52cb6acb5a6fbf113

Change-Id: I684e140b9a8d907f72f8e45ba2d7402dcb9d102d
2022-03-01 17:56:08 -05:00
Arlo Breault
1f381ce2cf Sync up with Parsoid parserTests files
This now aligns with Parsoid commit 3326011bc6ac9539eb197a22a22497347a4b2e35

Change-Id: Ifda86614572455d51e1855e5336c9ee613f842fd
2022-03-01 17:28:21 -05:00
Arlo Breault
350721cc2c Add mw-file-description class on links to the file description page
Matches Parsoid output.

Bug: T292657
Depends-On: Iccee2dcbc7b06d80bcb4e026eedc11042585550b
Change-Id: I206f1ccbfee1a601f3e5a4b52cb6acb5a6fbf113
2022-03-01 15:53:05 -05:00
Arlo Breault
30ae2b3a3b Sync up with Parsoid parserTests files
This now aligns with Parsoid commit 0a99d4ef38f1eba7637e6c0ddb6be434ccd5e72e

Change-Id: I0251d7c56dfb5fa917ab3666a586217e90043bec
2022-02-19 09:58:09 -05:00
Arlo Breault
4ae0758db2 Revert "Add "resource" attribute to img tags"
This reverts commit 5809ef7caa.

Bug: T292657
Bug: T297984
Depends-On: I261887a3b2d15130894b947d18a2e85537d50a1f
Change-Id: Id4e8d16344ce0f420bfd3e0d5833c67d1cf85fd8
2022-02-18 20:14:20 -05:00
Tim Starling
80a22645f6 Allow parser tests to test the value of extension data and properties
* Add "property" and "extension" options to parser tests
* Slightly refactor the relevant code since it's getting big.
* Slightly refactor the documentation too.

Change-Id: Idc4ac4eb4e20d8e3e2fdbd093ff75f26d3af0d57
2022-01-24 12:46:34 +11:00
Arlo Breault
5254ddea2d Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit d9cbfe3ae95ead0111e935f214408ccb49aa12a6

Change-Id: Id5dd98d2bd54187a060ae7523de634747e5d7594
2022-01-07 15:58:30 -05:00
jenkins-bot
7d62f16f77 Merge "Add "resource" attribute to img tags" 2022-01-07 19:23:48 +00:00
Arlo Breault
20be2afb0a Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 1b39be0995a201b4cc949bee32c5964053bdf77e

Change-Id: I243885af834baab178cf4c4fa71de5f59e8609aa
2022-01-05 18:59:28 -05:00
Arlo Breault
5809ef7caa Add "resource" attribute to img tags
This is spec'd out at https://www.mediawiki.org/wiki/Specs/HTML#Media

It's also useful in the bug to determine when the link is pointing at
the resource, and hence MediaViewer should open.

Previously that was distinguished with .image class on the link but
that's now omitted in getDescLinkAttribs.

FIXME: Should the "resource" contain querystrings?  Maybe this needs to
be done on the Parsoid side as well.

Bug: T292657
Depends-On: Idb60e418f79dcb6a121de2a11e6e0ed0b31fd3ff
Change-Id: Ia94138383ebdbfc2feef75fdf651b969085a72b1
2021-12-15 19:16:44 -05:00
Arlo Breault
7406194be4 Disable the legacy media dom on a few more tests
Just newer and overlooked tests.  All the media in those galleries are
invalid and the gallery changes went in later in
Iff2bdc3aa02f84f0bf4ca55d177706823934cc08.

Change-Id: I6d03037af1b5c90e6d57fd048506da2b4e4bc704
2021-12-15 16:39:35 -05:00
C. Scott Ananian
4f60541f49 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 819630e57c646038215a144fc03e6e9c29c12328

Change-Id: Ia814636d0d8550ebb6c76be6b4b7964b2b5ce105
2021-12-10 14:33:34 -05:00
Bartosz Dziewoński
dd4d1db814 TestRunner: Set local interwiki URLs to match wgServer, like in production
Matching Parsoid change I6e7bdcdea6bc2fd955f0a04f25f09314ec1230c8.

Change-Id: I6e7bdcdea6bc2fd955f0a04f25f09314ec1230c8
2021-12-07 16:20:26 -05:00
Reedy
a349d6b677 parserTexts.txt: Remove usages of "sanity"
Bug: T254646
Change-Id: Iaf8a1df2a88a59e787c0a039c8c7becbd51dfcb5
2021-11-21 23:07:26 +00:00
Winston Sung
6eda8891a0 Update 台灣 to 臺灣 according to Wikipedia-zh village pump discussions
https://zh.wikipedia.org/wiki/Wikipedia:互助客栈/其他/存档/2019年2月?oldid=61018059#「台灣」「正體」?

Follow-up of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/700626

Change-Id: I6d2a128f682e71312400b97333ffbfffe9968ee7
2021-10-26 11:02:07 +00:00
Fomafix
e86f180bd4 Merge "Encode & to &amp; in displaytitle fallback" 2021-10-14 17:58:06 +00:00
Bartosz Dziewoński
3223981217 Move parser test with stray carriage return to extraParserTests.txt
All of my favorite text editors corrupt this test case whenever I edit
parserTests.txt. extraParserTests.txt contains other tests with weird
characters that may get corrupted by normal text editors.

(I had to use `vi` to make this patch, and I wouldn't wish this on
anyone.)

Change-Id: Id474469180fc284e3e28b55f65808be727507875
2021-10-14 00:49:58 +02:00
Fomafix
eed3121a8f Encode & to &amp; in displaytitle fallback
The value in the attribute displaytitle must contain valid HTML. The
sanitizer of the {{DISPLAYTITLE}} parser ensures that only valid HTML
is accepted.

If there is no {{DISPLAYTITLE}} in the wikitext then displaytitle
falls back to $title->getPrefixedText(). Here an HTML encoding of
special characters is necessary. This affects only the replacement of
& by &amp; because other special characters like < and > are not
allowed in the title.

This change affects the displaytitle fallback on the following places:
* ApiParse
* ApiQueryInfo
* InfoAction
* Parser

The displaytitle fallback in OutputPage is also updated to this
behavior although
Sanitizer::normalizeCharReferences( Sanitizer::removeHTMLtags( $html ) )
also replaces & by &amp;.

Also add test cases with & in the displaytitle to:
* ApiParseTest
* ApiQueryInfoTest
* parserTests

Bug: T291985
Change-Id: I8ee1e2731d9bfa49725d663b34986e7e3073e4ca
2021-10-05 18:09:15 +00:00
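
The fallback encoding described in eed3121a8f can be illustrated with a minimal Python sketch. This is not MediaWiki's PHP code: `display_title` and its parameters are hypothetical names, and the sanitizer step is assumed to have already produced valid HTML.

```python
import html
from typing import Optional


def display_title(sanitized_displaytitle: Optional[str], prefixed_text: str) -> str:
    """Return an HTML-safe display title (hypothetical helper)."""
    if sanitized_displaytitle is not None:
        # Assumed to be valid HTML already, e.g. sanitizer output.
        return sanitized_displaytitle
    # Fallback: the plain title text is not HTML, so encode it.
    # In practice only "&" is affected, since "<" and ">" cannot
    # appear in a title.
    return html.escape(prefixed_text, quote=False)
```

With this sketch, a page titled `AT&T` without an explicit display title would yield `AT&amp;T`, while an explicit, already sanitized value is passed through unchanged.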
Subramanya Sastry
c417f6eb5f Sync up with Parsoid (legacyMediaP|mediaP|p)arserTests.txt
This now aligns with Parsoid commit 29f8e7051529ecbb62fc52bff6726a4df8bf20c2

Change-Id: I4f1be053aad137c974a18291ce018f9ce8fa8f82
2021-09-30 14:57:54 -05:00
Isabelle Hurbain-Palatin
1fd9493285 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 9cf6f53f8adf52e92ae6fd0dc6fc6505ab6fce1f

Change-Id: Ib35dc9decc2acda0d244bd0ec7ea983867903b4e
2021-09-29 15:29:00 +02:00
Arlo Breault
9c854614a3 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 356629d62ad930d67798c54aa8c11f45f328d030

Change-Id: I5179a5f6fc4fdb221f1b4fd92fe0bfb3fa4442e5
2021-06-25 10:55:22 -04:00
Arlo Breault
fdd8f864b8 Emit media structure as piloted in Parsoid
Gated behind the flag $wgParserEnableLegacyMediaDOM.  The scattershot
usage of it is a little unfortunate but isn't expected to live very long
so maybe that's acceptable.

Further details can be found at
https://www.mediawiki.org/wiki/Parsing/Media_structure

Bug: T51097
Bug: T266148
Bug: T271129
Change-Id: I978187f9f6e9e0a105521ab3e26821e36a96b911
2021-06-24 23:32:40 +00:00
Arlo Breault
c32e539bcd Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 760eb7ea841efff29a9e740662985c330501601b

Change-Id: I06928d461e2948db2b23806e64adb2de4ef2c724
2021-04-26 15:09:39 -04:00
jenkins-bot
ee3e2a572d Merge "Don't p-wrap <aside> tags in extension HTML" 2021-04-26 18:50:46 +00:00
Amir Aharoni
c8caf26ffd Remove RLM/LRM from Names.php
This character is no longer required here.

It was added to ensure correct display of parentheses in mixed
LTR/RTL environment, for example an interlanguage link from
an RTL wiki to an LTR language with parentheses in its name.
However, the Unicode bidirectional algorithm was updated
to handle parentheses more cleverly and automatically,
making manual adjustment with RLM/LRM unnecessary.
This update was implemented years ago in all browsers and
operating systems. I've tested this in Firefox, Chrome, Edge,
and Internet Explorer 11, and it works correctly without
the RLM/LRM characters.

Parser tests are updated accordingly.

Bug: T280435
Change-Id: I63107f623ade3b8367eae579a8e96d7e2c18b747
2021-04-22 08:27:41 +00:00
Máté Szabó
377c53ae51 Don't p-wrap <aside> tags in extension HTML
Our PortableInfobox extension uses the HTML5 <aside> tag in its generated HTML.
This tag isn't recognized as a block element (in the way e.g. <div> is) by the
legacy parser, resulting in some spurious empty paragraphs in the output.

As a fix, make the legacy parser aware of <aside> tags to avoid unnecessary
p-wrapping. Also add <aside> to the Sanitizer's internal attribute check.
I3e57f55ac69d2c1ee8a1d41c21b692e56fc7e628 takes care of updating Parsoid-PHP
accordingly.

Bug: T278565
Change-Id: I89dbdf7770e13e1b62320228a366c64e64217b0b
2021-04-06 16:26:12 +02:00
Umherirrender
dc7cfa0434 Add parser test for {{safesubst:self}}
This fails without the follow-up patch, with the same exception as on the
task.

Bug: T276476
Follow-Up: I014da3a333f8ee6ca623b98c415b8d9f9d1be084
Change-Id: Ib61e9ea44a6fdc31e10b89c3504cecec5b9fd208
2021-04-04 22:22:29 +02:00
James D. Forrester
7c74fc35e2 parserTests: Avoid problematic language in comments
Bug: T277986
Change-Id: I1e079d670ecfb5338223a26df507427b45e28121
2021-03-28 21:23:37 -07:00
jenkins-bot
800e1f8cea Merge "Don't worry about something before when armoring french spaces" 2021-03-02 01:28:03 +00:00
Arlo Breault
6222a1aee8 Don't worry about something before when armoring french spaces
We lost some insight in c44a395 because we're no longer analysing the
entire dom as a serialized string, but instead running our regexp on
individual text nodes.

This patch as written here just allows for the space to be at the start
of the text node.  However, some git spelunking shows that in 9dc65ef,
the condition for there being a non-whitespace character previous to the
space was only because armoring French spacing happened before
doBlockLevels and wanted to protect indent pre's.

That's certainly not the case anymore, so we can probably get away with
dropping the condition altogether now.

Bug: T275918
Change-Id: I654a09b0f98937379b9fad3f325134ead7f2d8a6
2021-03-01 11:52:27 -05:00
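
To make the change in 6222a1aee8 concrete, here is a rough Python sketch of French-space armoring applied to a single text node. The regular expressions are an assumption modeled on the behavior the message describes, not a copy of the MediaWiki implementation, and `armor_french_spaces` is a hypothetical name. The point is that no preceding non-whitespace character is required, so a space at the very start of a text node is armored too.

```python
import re

NBSP = "\u00a0"  # non-breaking space


def armor_french_spaces(text: str) -> str:
    """Armor spaces around French punctuation in one text node (sketch)."""
    # A space before French punctuation becomes non-breaking, unless the
    # punctuation is directly followed by a word character ("a :b").
    # Nothing is required before the space.
    text = re.sub(r" (?=[?:;!%»›](?!\w))", NBSP, text)
    # A space after an opening guillemet becomes non-breaking.
    return re.sub(r"([«‹]) ", r"\1" + NBSP, text)
```

Because the function only sees one text node at a time, it never touches attribute values or raw text elements, which matches the intent of the earlier change c44a3958a3.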
Arlo Breault
ed543be03b Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 241a08fb80cd5b4b16146eb99054b25c0261998c

Change-Id: I8d176b76891e10de096247f6ad3ed52ec6f5735e
2021-02-18 11:23:19 -05:00
Arlo Breault
c44a3958a3 Don't apply French spacing in raw text elements
This also means we don't need to take special care for French spacing in
attributes, since it's no longer applied there.

Adds a test that captures this change.

Note that the test "Nowiki and french spacing" wonders whether this
escaping should be applied to nowiki content.

Bug: T255007
Change-Id: Ic8965e81882d7cf024bdced437f684064a30ac86
2021-02-16 19:26:29 -05:00
jenkins-bot
7b2a853019 Merge "Parser test for Balinese language conversion" 2021-01-30 15:22:25 +00:00
Arlo Breault
21dfb00fa3 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 4dd80737783737621bf1fc0e0b7e954f3d1bbf3c

Change-Id: Ib780af2f1e71aa6df8369d17cebf66d3bc85686b
2021-01-29 17:28:43 -05:00
jenkins-bot
03e2d471c4 Merge "Rewrite <langconvert> to support BCP 47 tags" 2021-01-28 16:30:31 +00:00
Tim Starling
0384793a2e Parser test for Balinese language conversion
Bug: T263082
Change-Id: I0a51656c54fbd547a6283dd23a7ee571dfb43d08
2021-01-28 03:43:07 +00:00
jenkins-bot
e845067ab6 Merge "Adopt pipe trick with Arabic comma" 2021-01-16 03:29:36 +00:00
Arlo Breault
96d9eaa8c7 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit ebf0a41507ec09a17f247acd2fdbb72555cbf2af

Change-Id: Ic3f59b93ae7b3132e1f410d0dfd35b1a4f6852be
2021-01-14 10:21:57 -05:00
David Kamholz
cdbd2e791d Rewrite <langconvert> to support BCP 47 tags
This validates langconvert's "from" and "to" arguments as valid BCP 47 tags. For example, it will accept "sr-Cyrl" and "sr-cyrl" and reject the non-standard internal MediaWiki code "sr-ec". I made the BCP 47 matching case insensitive as that seems to conform with how MediaWiki handles it elsewhere and case sensitive matching would probably be a headache for users.

Bug: T271758
Change-Id: I9f765fe650279820d61c3a7e499ca99468df3d14
2021-01-13 19:00:47 -08:00
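
The case-insensitive matching described in cdbd2e791d might look like the following Python sketch. The mapping table and the name `resolve_variant` are assumptions made for illustration; the real code works against MediaWiki's own language data.

```python
from typing import Optional

# Hypothetical table: BCP 47 tag -> internal MediaWiki variant code.
BCP47_TO_INTERNAL = {
    "sr-Cyrl": "sr-ec",
    "sr-Latn": "sr-el",
}

# Lower-cased lookup so that "sr-Cyrl" and "sr-cyrl" both match.
_LOOKUP = {tag.lower(): code for tag, code in BCP47_TO_INTERNAL.items()}


def resolve_variant(tag: str) -> Optional[str]:
    """Return the internal code for a BCP 47 tag, or None if rejected."""
    return _LOOKUP.get(tag.lower())
```

Only the BCP 47 forms are keys in the lookup, so the internal code `sr-ec` is rejected when passed as a "from" or "to" value, while `sr-Cyrl` and `sr-cyrl` resolve to the same variant.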
Arlo Breault
78e85ab9e5 Split out media parser tests
Bug: T111604
Bug: T271129
Change-Id: I9893d11d50b8e5884239da2bb41262e093afc47f
2021-01-13 15:53:33 -05:00
Ebrahim Byagowi
9fe1d1f734 Adopt pipe trick with Arabic comma
Currently MediaWiki turns `[[test, abc|]]` to `[[test, abc|test]]`
while saving the page, but that comma isn't in use in Persian,
so this patch makes MediaWiki treat the Arabic comma the same way
as the regular comma.

Change-Id: Ib8051023abc25b7c4f97a3f50246f35650057ec9
2021-01-11 21:43:33 +00:00
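
A simplified Python sketch of the comma case of the pipe trick from 9fe1d1f734 follows. It ignores namespaces and parenthesized suffixes, which the real pipe trick also handles, and `expand_pipe_trick` is a hypothetical name. The only point illustrated is that the Arabic comma (U+060C) is accepted alongside the ASCII comma.

```python
import re

ARABIC_COMMA = "\u060c"

# Link target, then an optional ", rest" or "، rest" part, then "|]]".
_PIPE_TRICK = re.compile(
    r"\[\[([^\[\]|]+?)((?:,|" + ARABIC_COMMA + r") [^\[\]|]+)?\|\]\]"
)


def expand_pipe_trick(wikitext: str) -> str:
    """Fill in the empty label of [[target|]] links (comma case only)."""
    def repl(match: re.Match) -> str:
        head = match.group(1)          # text before the comma
        tail = match.group(2) or ""    # comma part, if present
        return f"[[{head}{tail}|{head}]]"

    return _PIPE_TRICK.sub(repl, wikitext)
```

Under these assumptions, `[[test, abc|]]` and `[[test، abc|]]` both end up with the label `test`.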
C. Scott Ananian
a41f284324 CoreTagHooks: First argument passed to parser tags can be null
Document and enforce the correct type for the first argument to
a Parser tag hook, which will be `null` if the tag is self-closed.

Mark the methods in CoreTagHooks @internal.  They are apparently
unused outside MediaWiki core:
  https://codesearch.wmcloud.org/search/?q=CoreTagHooks&i=nope&files=&repos=

Add coverage test cases to ensure that all tag hooks properly handle
the `null` value of the first argument; prior to this patch the
`<html>` tag emitted a broken strip tag in this case.  The other hooks
passed the null to other callees in violation of their type
signatures, but eventually every other hook managed to safely cast the
null to the empty string without throwing an exception or emitting a
warning.  For those, this patch does not change existing behavior---it
just makes the cast to the empty string much more obvious to the
reader.

Change-Id: I69fde6c06eabb2db27bb1cc23d2cb19b99273391
2021-01-05 14:19:44 -05:00
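
The null-handling pattern described in a41f284324 can be sketched in Python as follows. `pre_tag_hook` and its signature are hypothetical stand-ins for a tag hook, with `None` playing the role of PHP's `null` for a self-closed tag.

```python
import html
from typing import Mapping, Optional


def pre_tag_hook(content: Optional[str], attribs: Mapping[str, str]) -> str:
    """Render a <pre>-like tag; content is None when the tag is self-closed."""
    # Make the cast to the empty string explicit instead of relying on
    # callees to coerce None silently.
    text = content if content is not None else ""
    return "<pre>" + html.escape(text, quote=False) + "</pre>"
```

A self-closed `<pre/>` then produces an empty element rather than passing `None` further down the call chain.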
Subramanya Sastry
94705b1e6a Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit b8c7ac91f5d4ec5860e23455e17a09d6c579b338

Change-Id: I80e93b2e22e10129e48a3a0312c46090e8d02551
2020-12-21 18:06:37 -06:00
C. Scott Ananian
727a77a19e ParserTestRunner: add interwiki prefixes used by Parsoid tests
Bug: T254181
Change-Id: Ia79992e8e44435746f8512b2f05408c560c80533
2020-12-21 16:53:43 -05:00
Arlo Breault
b12f5d8e20 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 67180924cc1d78eed9b300b6f867498da51c35bc

Change-Id: Icb1c8c3cc4e19db9fa5c93b62a6afadb9f6676dc
2020-12-18 12:30:25 -05:00
Arlo Breault
c2cef6cb58 Consistent label escaping in makeBrokenImageLinkObj
Html::element is more lenient about which characters it escapes.

But really this is just factored out of the next patch for ease of
review.

Change-Id: I9abb4d866a624df7bf4628ab9cc581967e715160
2020-12-18 11:41:09 -05:00
Arlo Breault
c203c574bd Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit c2952b434c1dc52d7c73154ca47bda19f2c2602f

Change-Id: Ic878dd183592c0ace77e3e078c40df40e54b7eab
2020-12-17 13:18:44 -05:00
David Kamholz
a7ad0547bc Implement <langconvert> tag
The <langconvert> tag takes two attributes: from (language variant from) and to (language variant to). It returns the content of the tag converted using LanguageConverter. It returns an error if the attributes are not present, if the variants do not exist, or if the variants belong to different languages. Currently it does not work for IuConverter, because the variants use the code ike rather than iu, and ike isn't in the list of languages with converters available.

This patchset reimplements the feature as a tag instead of a parser function, and renames it from transliterate to langconvert.

Bug: T263082
Change-Id: Idc3a32c66d5a0466c63e7ce8753d2619354c30b0
2020-12-14 19:40:31 -08:00
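
The three error conditions listed in a7ad0547bc can be sketched in Python as below. The variant registry, the error strings, and the name `check_langconvert` are assumptions made for illustration and do not reflect the actual messages or data structures in MediaWiki.

```python
from typing import Mapping, Optional

# Hypothetical registry: variant code -> language that owns the variant.
VARIANT_LANGUAGE = {
    "sr-ec": "sr",
    "sr-el": "sr",
    "zh-hans": "zh",
    "zh-hant": "zh",
}


def check_langconvert(attrs: Mapping[str, str]) -> Optional[str]:
    """Return an error description, or None if the attributes are usable."""
    src, dst = attrs.get("from"), attrs.get("to")
    if src is None or dst is None:
        return "missing attribute"
    if src not in VARIANT_LANGUAGE or dst not in VARIANT_LANGUAGE:
        return "unknown variant"
    if VARIANT_LANGUAGE[src] != VARIANT_LANGUAGE[dst]:
        return "variants belong to different languages"
    return None
```

Only when all three checks pass would the tag content be handed to the language converter.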
Arlo Breault
acb40ea0d5 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 010856ed7d4aeb9617ac264782809cc58d94fc47

Change-Id: Iae87302abe2c11deb36088100e50a638d58cffe6
2020-10-05 16:09:18 -04:00