863 commits

Author SHA1 Message Date
Arlo Breault
023a1ab36e Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit bbc56abd8fcc245876fc340723ed3431565b61d1

Change-Id: I463401aa3d2dd290d5ba6813e8e869f713682dd3
2018-07-03 15:02:33 -04:00
Gergő Tisza
66d5060369 Update parser tests for translation change
Bug: T198625
Change-Id: I1f52288be319f7d429a8536987cf5efc357d78c6
2018-07-02 16:20:34 +02:00
C. Scott Ananian
ee8d2c461b Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit b068bb51d29e294a4f4a875ae829cca8cf314205

Change-Id: Ie8e8a7ed631894f56372e286ed01d1583f7a8979
2018-06-26 10:38:05 -04:00
jenkins-bot
875f8ea9a2 Merge "Armor against French spaces detection in HTML attributes" 2018-06-25 21:42:03 +00:00
Aaron Schulz
ec3289524e Avoid bad method call to patchPatch() in DbTestRecorder
Bug: T193995
Change-Id: Ibc480b04463792b7cd720a6eb080e0960a30e440
2018-06-25 21:14:13 +01:00
Fomafix
a60dcdc2e3 Armor against French spaces detection in HTML attributes
This change also solves T13874 in a generic way.

Bug: T5158
Change-Id: Id8cdb887182f346acab2d108836ce201626848af
2018-06-21 19:24:07 +02:00
jenkins-bot
84fa176c9c Merge "Avoid deprecated LinkCache::singleton()" 2018-06-14 23:48:54 +00:00
jenkins-bot
23b7f3bbd5 Merge "parser: Validate $length in padleft/padright parser functions" 2018-06-12 00:39:53 +00:00
Fomafix
e1630b6a53 PHP: Use short ternary operator (?:) where possible
Change-Id: Idcc7e4fcdd4d8302ceda44bf6d294fa8c2219381
2018-06-11 11:26:35 +02:00
Kunal Mehta
c4e5a9dd97 Avoid deprecated LinkCache::singleton()
Change-Id: Ie0e5c4ef0fe6ec896378bb2433af0898655dd907
2018-06-10 23:55:11 -07:00
Kunal Mehta
dc96f656af parser: Validate $length in padleft/padright parser functions
$length is user input, so cast it to an int before passing it to min().
If there is nothing to add at that point, return immediately.

In PHP 7.1+ this raised a warning of "A non-numeric value encountered"
because min() would return the junk value as a string. We would then
try to subtract an int from it (the return value of mb_strlen()),
triggering the warning.

Added a parser test to verify the behavior, and confirmed that it
triggers warnings without the patch.
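
A minimal Python sketch of the validation described above, not the actual PHP patch. The names php_int, pad and MAX_PAD_LENGTH are hypothetical, and the cap of 500 is an assumption; php_int only roughly emulates PHP's (int) cast.

```python
import re

MAX_PAD_LENGTH = 500  # assumed upper bound, for illustration only


def php_int(value: str) -> int:
    """Roughly emulate PHP's (int) cast: leading integer, else 0."""
    match = re.match(r"\s*([+-]?\d+)", value)
    return int(match.group(1)) if match else 0


def pad(string: str, length: str, padding: str = "0", left: bool = True) -> str:
    """Pad `string` to `length` characters; `length` is untrusted user input."""
    # Cast first, so min() only ever compares integers.
    pad_len = min(php_int(length), MAX_PAD_LENGTH) - len(string)
    # Nothing to add: return immediately.
    if pad_len <= 0 or padding == "":
        return string
    fill = (padding * pad_len)[:pad_len]
    return fill + string if left else string + fill
```

With this shape, a junk length such as "junk" is cast to 0 and the input is returned unchanged instead of reaching the subtraction.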

Bug: T180403
Change-Id: I614750962104f6251a864519035366ac9798fc0f
2018-06-10 11:20:13 -07:00
jenkins-bot
5a6c78c441 Merge "Use PHP 7 '??' operator instead of '?:' with 'isset()' where convenient" 2018-05-31 19:01:07 +00:00
Jforrester
9befbd38dc Revert "Strip soft hyphens (U+00AD) from title"
This reverts commit 6b8a5a137d.

Change-Id: Ica5abe69c316792aa2f7eafad9b1d63183b282a8
2018-05-31 10:45:11 -07:00
Jforrester
722ff7b1fc Revert "Strip Unicode 6.3.0 directional formatting characters from title"
This reverts commit 7564624d1c.

Change-Id: I5d596f8f3c784920829de6ae50b270b0396369e0
2018-05-31 10:45:11 -07:00
Bartosz Dziewoński
485f66f174 Use PHP 7 '??' operator instead of '?:' with 'isset()' where convenient
Find: /isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*/
Replace with: '\1 ?? '

(Everywhere except includes/PHPVersionCheck.php)
(Then, manually fix some line length and indentation issues)

Then manually reviewed the replacements for cases where confusing
operator precedence would result in incorrect results
(fixing those in I478db046a1cc162c6767003ce45c9b56270f3372).
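
The find/replace above can be tried out with Python's re module, which accepts the same pattern syntax. This is only a sketch: the function name to_null_coalescing is hypothetical, and the rewrite is purely textual, so it does not check operator precedence (hence the manual review mentioned above).

```python
import re

# Same pattern as in the commit message; \1 requires both operands to match.
ISSET_TERNARY = re.compile(r"isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*")


def to_null_coalescing(php_source: str) -> str:
    """Rewrite `isset( X ) ? X : ` into `X ?? ` wherever both X are identical."""
    return ISSET_TERNARY.sub(r"\1 ?? ", php_source)


before = "$x = isset( $a['b'] ) ? $a['b'] : 'default';"
after = to_null_coalescing(before)
# A ternary whose two operands differ is left untouched.
untouched = to_null_coalescing("$y = isset( $a ) ? $b : $c;")
```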

Change-Id: I33b421c8cb11cdd4ce896488c9ff5313f03a38cf
2018-05-30 18:06:13 -07:00
Kunal Mehta
99e861920d Complete coverage of Parser::getTemplateDom()
This test covers the branch of code when the $mTplRedirCache is already
populated, by using the same template that redirects twice.

Change-Id: Ie0ce277c75366b7b060e0da6873175976621aff9
2018-05-27 23:17:50 -07:00
Kunal Mehta
3e0b1f7cf6 Improve Parser::braceSubstitution() coverage
Change-Id: I3d9426143fe486c6aed0494b68773a36e24c02d9
2018-05-26 22:57:55 -07:00
Fomafix
7564624d1c Strip Unicode 6.3.0 directional formatting characters from title
Unicode 6.3.0 (September 2013) added additional directional
formatting characters:

U+061C ARABIC LETTER MARK
U+2066 LEFT-TO-RIGHT ISOLATE
U+2067 RIGHT-TO-LEFT ISOLATE
U+2068 FIRST STRONG ISOLATE
U+2069 POP DIRECTIONAL ISOLATE

https://www.fileformat.info/info/unicode/version/6.3/index.htm

This change strips the new directional formatting characters from the
title like the directional formatting characters from Unicode 1.1.0
(June 1993).
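
As an illustration only, stripping the five characters listed above could look like the following Python sketch. The names NEW_DIRECTIONAL_CHARS and strip_directional are hypothetical; the actual change is in MediaWiki's PHP title handling.

```python
# The five directional formatting characters added in Unicode 6.3.0.
NEW_DIRECTIONAL_CHARS = "\u061C\u2066\u2067\u2068\u2069"

# str.translate() deletes every code point that maps to None.
_STRIP_TABLE = {ord(ch): None for ch in NEW_DIRECTIONAL_CHARS}


def strip_directional(title: str) -> str:
    """Return `title` without the Unicode 6.3.0 directional characters."""
    return title.translate(_STRIP_TABLE)
```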

Any existing titles containing the new Unicode directional formatting
characters get stripped by a run of maintenance/cleanupTitles.php after
deployment.

This change also allows inserting the new Unicode directional
formatting characters into the DISPLAYTITLE.

Change-Id: I2279f51048f5252c2e4280ec6a13f060ff9967cb
2018-05-10 08:58:23 +02:00
Fomafix
6b8a5a137d Strip soft hyphens (U+00AD) from title
This change strips all soft hyphens from the title. This is already
done for Unicode bidi characters (T5696).

URLs with soft hyphens (%C2%AD) get redirected (301) to the URL without
soft hyphens (T145605):
https://de.wikipedia.org/wiki/Bosnatal%C2%ADbahn gets redirected to
https://de.wikipedia.org/wiki/Bosnatalbahn

Links in wikitext containing a soft hyphen, such as
"[[Bosnatal<AD>bahn]]" (the "<AD>" stands here for a soft hyphen), link
to "Bosnatalbahn" but display "Bosnatal<AD>bahn".

This change also allows inserting soft hyphens into the displaytitle
(T66528), which permits manual hyphenation of very long words in the
first heading.

This change prevents access to any existing articles containing soft
hyphens in the title. After deploying this change, a run of
maintenance/cleanupTitles.php must be performed to rename existing
titles with soft hyphens. Before deploying this change, existing
articles and redirects with soft hyphens in the title can already be
renamed or deleted.

Bug: T121979
Bug: T66528
Change-Id: Ie13626c433cdb460dbf00b3bba28d1bb5a7b6d6a
2018-05-10 08:56:29 +02:00
jenkins-bot
9879c85d1e Merge "Replace wfGetLB" 2018-05-04 19:08:32 +00:00
Umherirrender
e1a203603c Replace wfGetLB
@deprecated since 1.27

Change-Id: Ibdd49fdfc0d1511503e1ed2173a592c612996c53
2018-05-02 22:30:24 +02:00
James D. Forrester
846f4f58f5 Remove $wgExperimentalHtmlIds and related code, deprecated in 1.30
Bug: T139744
Change-Id: Ia15d5ab6e7637fd40d5c3399822a3dbeb7b383b5
2018-05-01 14:34:02 -07:00
Kunal Mehta
3849c58953 ParserTestRunner: Fix some documentation types
Change-Id: I7d0375815dc6ac91cc3f39ea7910cdf1dff49666
2018-04-20 18:28:29 +00:00
Brad Jorsch
8853300a6b ParserTestRunner: Reset InterwikiLookup service
Otherwise earlier tests might have cached prefixes in the service and
cause these tests to fail.

Change-Id: Id0e6184aff8f9d7e8f32558e1de14faa0168cc1d
2018-04-10 17:09:01 -04:00
jenkins-bot
2384948256 Merge "Fix parsing of <pre> tags generated by extension tag hooks" 2018-04-04 19:36:31 +00:00
Subramanya Sastry
c349861427 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit ad7c4322d4dd7903065f066d8d96ead875b5126b

Change-Id: Ica20c20ce8f40786f9b2b8ec4c3021f49843354f
2018-03-26 16:42:26 -07:00
jenkins-bot
abeca9ac48 Merge "Fix whitespace trimming in headings" 2018-03-26 23:12:31 +00:00
Antoine Musso
b7d495cf0c ParserTest: clear Language namespaces cache
The content language object has a cache for namespaces, so it might
not take into account $wgExtraNamespaces set by the parser test suite,
which causes unknown-namespace errors.

Ensure the new language object has a clean cache.

Repro:
php phpunit.php --filter '(ParserMethodsTest::testValidCovers|T53680)'

Bug: T190554
Change-Id: I9c4104d7bb3a0c84b60d7e7b4154743cbe58348c
2018-03-25 15:36:10 +02:00
Subramanya Sastry
87c7ccd9bc Fix whitespace trimming in headings
* b3dd3881 was trimming whitespace in wikitext as well as HTML headings
  whereas the whitespace-trimming proposal was going to leave HTML tags
  untouched.

* 30495ea1 missed this because coincidentally, the test I added there
  for HTML headings had a typo and used <h2>...<h2> instead of
  <h2>...</h2> which caused the test to magically pass.

* This patch trims whitespace in
  doHeadings (which deals with wikitext headings) instead of
  formatHeadings (which deals with all headings).

* Updated parser tests to account for this.

Change-Id: I854f20b4c39a0a8e03d70155b269de77acf02cae
2018-03-23 11:42:01 -05:00
jenkins-bot
743eca95e3 Merge "Clarify -{ => {{ transition" 2018-03-21 20:37:21 +00:00
Subramanya Sastry
7aa15f7220 Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 3f79aa9fd48e68d32d1b9bdc3e29ec4536f297b8

Change-Id: I12249e39ddc6e3344a9dd8a1545b129ed469e184
2018-03-20 11:11:32 -05:00
jenkins-bot
3bae295aef Merge "RFC T157418: Trim whitespace in table cells, list items, headings" 2018-03-20 15:26:06 +00:00
This, that and the other
5ff5dbc7dc Fix parsing of <pre> tags generated by extension tag hooks
When this part of BlockLevelPass::execute() encounters a block-level tag,
such as <hr>, one of $openMatch or $closeMatch will be truthy.

Without this patch, $this->closeParagraph() is unconditionally called in
this situation, which sets $this->inPre = false. If we're already inside a
<pre> tag, this makes the parser think we're no longer in a <pre>
environment, so it starts wrapping the <pre> tag's content in <p> tags as
if it was processing regular content.

We should only call $this->closeParagraph() in the case that (a) we are not
inside a <pre> tag, or (b) the block-level tag that is being opened is
itself a <pre> tag (in which case $preOpenMatch will be truthy, and
$this->inPre will have already been set to true).
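
The condition in the paragraph above reduces to a small predicate. The Python sketch below only mirrors the logic as described; should_close_paragraph and its parameters are hypothetical stand-ins for $this->inPre and $preOpenMatch, not the real BlockLevelPass code.

```python
def should_close_paragraph(in_pre: bool, pre_open_match: bool) -> bool:
    """Decide whether a block-level tag should close the current paragraph.

    (a) not inside a <pre> tag, or
    (b) the tag being opened is itself a <pre> tag.
    """
    return (not in_pre) or pre_open_match
```

The one case that changes is a non-<pre> block tag such as <hr> seen while already inside <pre>: there the predicate is false, so the <pre> state is preserved.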

This doesn't affect the parsing of <pre> tags that are written in wikitext,
since their content isn't parsed. It only affects hooks and the like that
return <pre> tags.

This doesn't solve the task T7718 that is mentioned in the code comment,
but if the testwiki test cases linked there are anything to go by, it
doesn't make the problem worse in any way.

This is required for Poem change I754f2e84f7d6efc0829765c82297f2de5f9ca149.

Change-Id: I469e633fc41d8ca73653c7e982c591092dcb1708
2018-03-17 23:20:50 +11:00
Subramanya Sastry
30495ea1f9 RFC T157418: Trim whitespace in table cells, list items, headings
* Matmarex had implemented this for wikitext headings in b3dd3881.
* This patch extends this to wikitext list items and wikitext table cells.
* Updated RELEASE NOTES.

tests/parser/parserTests.txt:
* All whitespace removed in output of list items, table cells, and
  headings. Removed corresponding whitespace in the input wikitext
  except for a few tests where the whitespace is significant "| +"
  or "| -", for example.
* Updated output of html/parsoid sections as well.
* Added new tests to spec white-space trimming behavior.

tests/phpunit/*:
* Fixed a few tests that used whitespace in list items and table cells.

Bug: T157418
Change-Id: I8ea34c7ab893c0c125c81d810feeb3c581e4bba1
2018-03-16 13:42:55 -05:00
Umherirrender
1ca39974ce Remove @group from ParserTestRunner
It should be used on phpunit tests only

Change-Id: I0f73690606c8b92ce65ed1324394f5523b8156f5
2018-03-16 17:45:33 +01:00
Arlo Breault
f27c50b580 Clarify -{ => {{ transition
Ensure we have the correct rule on the stack.

Change-Id: Ie814df7b759a2381be0b815eeefdb5d1f7adcde0
2018-03-15 13:04:59 -04:00
jenkins-bot
6b97d969c3 Merge "Replace wfGetLBFactory" 2018-03-09 11:33:10 +00:00
C. Scott Ananian
65fcb7a945 Use class="free external" only on unbracketed URLs
The ability for URLs to be marked free even if they use bracketed syntax
but "sorta look free" (aka unbracketed) was added 13 years ago in
2d71cb3080 (r7074).

It seemed like a reasonable idea at the time: make printed output a little
prettier by marking "sorta free" URLs as free.  But this complicates the
semantics of wikitext, and introduces all sorts of strange corner cases,
for example:

  [http://example.com/&amp; http://example.com/&]

isn't marked as free, even though the parser output is:

  <a rel="nofollow" class="external text" href="http://example.com/&amp;">http://example.com/&amp;</a>

This functionality isn't actually needed: if you want the pretty printed
output of an unbracketed URL, then actually use an unbracketed URL.

In recent years we're more concerned with simplifying the semantics of
wikitext and eliminating corner cases, such that the content of our wikis
can be effectively archived.  The "effectively free" URLs are low-hanging
fruit in this quest.

Change-Id: I339e8698786c60c96a37a73443cb9a04362662c4
2018-03-07 00:20:09 -05:00
C. Scott Ananian
dc6c5002cf Sync up with Parsoid parserTests.txt
This now aligns with Parsoid commit 7d2a92f81ebbc0941e8fba2a136f5929406ea5e6

Change-Id: I0b57b1bd3b0802ce08249dd0bf376b931d8c7698
2018-03-06 17:56:16 -05:00
jenkins-bot
6e65e0ab0d Merge "Use RemexHtml as the tidy implementation for parser tests" 2018-03-06 22:27:42 +00:00
Kunal Mehta
bd91229204 Use RemexHtml as the tidy implementation for parser tests
* RemexHtml is the future of "tidy" in MediaWiki,
  so run our parser tests using it.

* This is a necessary step before we can make it
  the default in MediaWiki (T185753).

* Cleaned up a bunch of tests:
  (a) where html/php+tidy and html/parsoid match up,
      retained a html+tidy section and removed the others.
  (b) where html/php and html/php+tidy match up,
      retained the html/php section and removed the
      html/php+tidy section.

* Annotating tests with explanations where Parsoid & Remex
  output differ. This is usually because of two reasons:
  (a) Parsoid has Tidy-emulation code in some cases (which
      we can consider stripping away separately).
  (b) Parsoid does a bunch of cleanup on the DOM (which was
      probably done to emulate Tidy output, but which could
      probably be retained). Since Parsoid (in some form)
      will be default parser in the future, no reason to try
      to port this cleanup (in broken markup scenarios) into
      Remex.

* Left a bunch of FIXMEs for later followup.

Unrelated cleanup:
* Renamed a few tests since the functionality in Parsoid
  was fixed up. There is no more "implicit <td>" support.
  Those all now lead to fostered content.
* Fixed some clearly broken output in html/parsoid sections
  for some tests.

Co-Authored-by: Kunal Mehta <legoktm@member.fsf.org>
Co-Authored-by: Subramanya Sastry <ssastry@wikimedia.org>
Bug: T188167
Depends-On: I646dbabb3c2ed28c1ea72c5bd8f7f92d03f57c75
Change-Id: Ic7c34d57a300dbd36a37f03fbfe33391b2950b44
2018-03-02 14:30:27 -08:00
Brad Jorsch
b3e575bb8f Parser: Don't wrap <style> or <link> tags in paragraphs
If <style> or <link> tags are by themselves on a line, don't wrap them
in <p> tags. But, at the same time, don't end an existing paragraph if
we find <style> or <link> in the middle (like we would if we just
treated them as block tags).

If <style> or <link> is on a line with other text, though, let it be
wrapped in a paragraph along with that other text.

Bug: T186965
Change-Id: Ide4005842cdab537226aa538cb5f7d8e363ba95d
2018-02-28 14:12:49 -05:00
Umherirrender
554f9c857c Replace wfGetLBFactory
@deprecated since 1.27

Change-Id: I11a7253cebe525948a55cebee183e6de128fdc39
2018-02-27 20:02:48 +00:00
Brad Jorsch
27c61fb1e9 Add actor table and code to start using it
Storing the user name or IP in every row in large tables like revision
and logging takes up space and makes operations on these tables slower.
This patch begins the process of moving those into one "actor" table
which other tables can reference with a single integer field.

A subsequent patch will remove the old columns.
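
The normalization described above can be sketched with an in-memory SQLite database. The schema is a deliberately simplified assumption (the column names actor_name and rev_actor and the helper actor_id_for are illustrative, not the schema this commit adds).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (
    actor_id   INTEGER PRIMARY KEY,
    actor_name TEXT NOT NULL UNIQUE      -- user name or IP, stored once
);
CREATE TABLE revision (
    rev_id    INTEGER PRIMARY KEY,
    rev_actor INTEGER NOT NULL REFERENCES actor(actor_id)
);
""")


def actor_id_for(name: str) -> int:
    """Return the actor row id for `name`, creating the row if needed."""
    conn.execute("INSERT OR IGNORE INTO actor (actor_name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT actor_id FROM actor WHERE actor_name = ?", (name,)
    ).fetchone()
    return row[0]


# Each revision stores one integer instead of repeating the name or IP.
for rev_id, name in [(1, "Alice"), (2, "192.0.2.1"), (3, "Alice")]:
    conn.execute(
        "INSERT INTO revision (rev_id, rev_actor) VALUES (?, ?)",
        (rev_id, actor_id_for(name)),
    )

rows = conn.execute(
    "SELECT rev_id, actor_name FROM revision "
    "JOIN actor ON actor_id = rev_actor ORDER BY rev_id"
).fetchall()
```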

Bug: T167246
Depends-On: I9293fd6e0f958d87e52965de925046f1bb8f8a50
Change-Id: I8d825eb02c69cc66d90bd41325133fd3f99f0226
2018-02-23 10:06:20 -08:00
Umherirrender
63d96c15fd build: Updating mediawiki/mediawiki-codesniffer to 16.0.0
Change-Id: I59b59f79bbf3ce4feff3b3a20c1c31bc16370531
2018-02-17 13:29:13 +01:00
Reedy
39f0f919c5 Update suppressWarning()/restoreWarning() calls
Bug: T182273
Change-Id: I9e1b628fe5949ca54258424c2e45b2fb6d491d0f
2018-02-10 08:50:12 +00:00
Kunal Mehta
1ec5bb6dbf editTests: Use the correct list of parser test files
$wgParserTestFiles is deprecated, so this wasn't running the core parser
tests. Using ParserTestRunner::getParserTestFiles() includes everything,
including autodiscovered extension parser tests.

Change-Id: Ie3b02565c184e8e06931ab52a39ca8ae0877aab9
2018-02-09 12:08:05 -08:00
Brad Jorsch
d511626236 Add 'unwrap' ParserOutput post-cache transform
And deprecate passing false for ParserOptions::setWrapOutputClass().

There are three cases for the Parser wrapper: the default
mw-parser-output, a custom wrapper, or no wrapper. As things currently
stand, we have to fragment the parser cache on each of these options,
which uses a nontrivial amount of storage space (T167784).

Ideally we'd do all the wrapping as a post-cache transform, but
TemplateStyles needs to know the wrapper in use in order to properly
prefix its CSS rules (that's why we added the wrapper in the first
place). So, second best option is to make *un*wrapping be a post-cache
transform and make "custom wrapper" be uncacheable.

This patch does the first bit (unwrapping as a post-cache transform),
and a followup will do the second part once the deprecation process is
satisfied.
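
A rough Python sketch of the idea of unwrapping after the cache lookup, so that one cached entry can serve both the wrapped and the unwrapped case. The function get_text and its unwrap flag are hypothetical names for illustration; this is not the ParserOutput API.

```python
import re

# Matches the default wrapper around the whole cached fragment.
_WRAPPER = re.compile(r'^<div class="mw-parser-output">(.*)</div>$', re.DOTALL)


def get_text(cached_html: str, unwrap: bool = False) -> str:
    """Return cached parser output, optionally without the default wrapper."""
    if unwrap:
        # Post-cache transform: the cached value itself is never modified.
        return _WRAPPER.sub(r"\1", cached_html)
    return cached_html


cached = '<div class="mw-parser-output"><p>Hi</p></div>'
wrapped = get_text(cached)
unwrapped = get_text(cached, unwrap=True)
```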

Bug: T181846
Change-Id: Iba16e78c41be992467101e7d83e9c3134765b101
2018-02-01 14:24:27 -08:00
Umherirrender
45da581551 Use ::class to resolve class names in tests
This helps to find renamed or misspelled classes earlier.
Phan will check the class names

Change-Id: Ie541a7baae10ab6f5c13f95ac2ff6598b8f8950c
2018-01-26 22:49:13 +01:00
Prateek Saxena
60a64e8912 Gallery: Use Parser::parseWidthParam() for gallery dimensions
Used by the `setWidths` and `setHeights` methods to make sure we are
using correct values.

Makes `parseWidthParam` static to be used in the gallery class.

Bug: T129372
Change-Id: I38b9ef0ea26e3748ad5d5458fadd2545f677ef93
2018-01-25 17:35:40 -05:00