1801 commits

Author SHA1 Message Date
C. Scott Ananian
ad89079a44 Avoid counting input lines twice in BlockLevelPass::execute()
In T208070 / I120ca25a77b7b933de4afddd1d458e36a95e26da we added a
check whether we were processing the last line of input, in order
to avoid emitting extra trailing newlines.  But if the number of
input lines is large, StringUtils::explode() will return an
iterator which doesn't implement Countable for efficiency.
I22eebb70af1b19d7c25241fc78bfcced4470e78a fixed this, but at the
cost of scanning the string twice: once just to count the number
of newlines before we begin to iterate over the lines.

This patch uses Iterator::valid() to determine if we're on the
last iteration without having to scan the string twice.

Bug: T208070
Bug: T218817
Change-Id: I41a45266d266195aa6002d3854e018cacf052ca6
2019-03-20 17:35:14 -04:00
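
The single-pass "is this the last line?" check described in ad89079a44 can be sketched as follows. This is a minimal Python illustration, not the MediaWiki code: Python iterators have no valid() method, so the sketch reads one item ahead instead, and the names with_last_flag and join_lines are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def with_last_flag(lines: Iterable[str]) -> Iterator[Tuple[str, bool]]:
    """Yield (line, is_last) pairs while reading the input only once."""
    it = iter(lines)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for upcoming in it:
        # Another item exists, so `current` cannot be the last line.
        yield current, False
        current = upcoming
    yield current, True


def join_lines(lines: Iterable[str]) -> str:
    """Re-join lines with newlines, omitting the newline after the final line."""
    out = []
    for line, is_last in with_last_flag(lines):
        out.append(line)
        if not is_last:
            out.append("\n")
    return "".join(out)
```

Because the flag comes from looking one item ahead, the input never has to be counted up front, which is the cost the commit message sets out to avoid.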
Arlo Breault
73239ee9bd BlockLevelPass: further fixes for T218817
The previous fix for T218817 (I22eebb70af1b19d7c25241fc78bfcced4470e78a)
was a bit premature: we didn't notice that ExplodeIterator *also*
used a different Iterator::key() than ArrayIterator -- it used
the string position as a key, not the line number.  Combined with
an inequality test for "not the last line", this meant that almost
every line was now the "last line" and we were missing a lot of
needed newlines.

Count the lines ourselves to fix the problem.

Bug: T208070
Bug: T218817
Change-Id: I55a2c4c0ec304292162c51aa88b206fea0142392
2019-03-20 17:31:29 -04:00
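
The key mismatch described in 73239ee9bd can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: explode_with_offsets only imitates an iterator keyed by string position, and the exact form of the inequality test (key < index of the last line) is a guess, since the commit message does not quote it.

```python
from typing import Iterator, Tuple


def explode_with_offsets(text: str, sep: str = "\n") -> Iterator[Tuple[int, str]]:
    """Lazily split text, keyed by the start offset of each piece (hypothetical)."""
    pos = 0
    while True:
        end = text.find(sep, pos)
        if end == -1:
            yield pos, text[pos:]
            return
        yield pos, text[pos:end]
        pos = end + len(sep)


def count_not_last_buggy(text: str) -> int:
    """Count lines treated as 'not the last line' when keys are string offsets."""
    last_index = text.count("\n")  # index of the last line
    return sum(1 for key, _ in explode_with_offsets(text) if key < last_index)


def count_not_last_fixed(text: str) -> int:
    """Same test, but counting the lines ourselves instead of trusting the key."""
    last_index = text.count("\n")
    pieces = explode_with_offsets(text)
    return sum(1 for line_no, _ in enumerate(pieces) if line_no < last_index)
```

For the input "aaa\nbbb\nccc\nddd" the offsets are 0, 4, 8 and 12, so under the assumed test only the first line counts as "not last", while counting the lines by hand gives the expected three.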
Arlo Breault
4d7dcf5c96 parser: Count occurrences of newlines
StringUtils::explode() returns an ExplodeIterator if the number of
separators is too high, which doesn't implement count.

So count the way that explode does.

Bug: T218817
Change-Id: I22eebb70af1b19d7c25241fc78bfcced4470e78a
2019-03-20 16:33:55 -04:00
jenkins-bot
47296f1381 Merge "parser: Rename $lastSection to $lastParagraph" 2019-03-19 14:37:54 +00:00
jenkins-bot
da42dd58e7 Merge "parser: Omit outputting newline after final line" 2019-03-19 14:37:16 +00:00
jenkins-bot
76e1adc554 Merge "parser: Remove trailing newline after prefixes have been cleared" 2019-03-19 14:18:30 +00:00
jenkins-bot
be8019b967 Merge "Improve RemexStripTagHandler working with tables" 2019-03-19 04:25:47 +00:00
Arlo Breault
19cea6b2cd parser: Rename $lastSection to $lastParagraph
This now at least matches the function names even though what's actually
meant is more like 'block-level tag'.  Section is a poor choice of name
since there're wikitext sections unrelated to this.

Change-Id: Ic83aff4d862800b778441c28884194480b7e7d96
2019-03-15 15:45:59 -04:00
Arlo Breault
a91757523a parser: Omit outputting newline after final line
Bug: T208070
Depends-On: I47d1d9620031036b9497cacf70b34a45c3e5f409
Depends-On: I6119b4af9632496dbda81c3a3951c55217e7c2d5
Depends-On: I584f74e2ba0d14c2975fb43cc53c5e26080e6fc7
Depends-On: Ie70e1915c172d2d67b3b8b90eb35f753b800f61e
Change-Id: I120ca25a77b7b933de4afddd1d458e36a95e26da
2019-03-15 14:22:45 -04:00
Arlo Breault
8384d48ae0 parser: Remove trailing newline after prefixes have been cleared
Bug: T208070
Depends-On: I74953d5de765a2245a2999f17c7ae1cf49376bd1
Change-Id: I05511aee275238954f22db78616b19ce10cd6490
2019-03-15 14:22:42 -04:00
jenkins-bot
fa0fe8d294 Merge "Avoid using outdated $casToken field for BagOStuff calls" 2019-03-14 22:14:23 +00:00
Erik Bernhardson
aef02d516d Improve RemexStripTagHandler working with tables
HTML, generated by some infoboxes and perhaps other places, gets
stripped in a way that merges words together that should not be
merged. Add tr, th, and td to the list of tags that should force
word separation.

Bug: T218001
Change-Id: Ib374339628b1f543ea4e07f24aa3e3b76f3117b5
2019-03-14 13:11:59 -07:00
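
The word-merging problem fixed in aef02d516d can be illustrated with a rough Python sketch. It uses the standard library HTMLParser rather than Remex, and the TagStripper class and strip_tags function are hypothetical names; only the choice of tr, th and td as separating tags comes from the commit message.

```python
from html.parser import HTMLParser

# Tags that force a word break; tr, th and td are the ones the commit adds.
SEPARATING_TAGS = frozenset({"tr", "th", "td"})


class TagStripper(HTMLParser):
    """Collect text content, optionally inserting a space around some tags."""

    def __init__(self, separating_tags=frozenset()):
        super().__init__(convert_charrefs=True)
        self.separating_tags = separating_tags
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.separating_tags:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.separating_tags:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text, separating_tags=frozenset()):
    parser = TagStripper(separating_tags)
    parser.feed(html_text)
    parser.close()
    # Collapse runs of whitespace so the inserted spaces do not pile up.
    return " ".join("".join(parser.parts).split())
```

With an infobox-like row such as <tr><th>Born</th><td>1970</td></tr>, stripping without separating tags yields "Born1970", while passing SEPARATING_TAGS yields "Born 1970".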
Arlo Breault
fe3a04748b parser: closeParagraph already resets the lastSection
Change-Id: Ic24c9aa25852cc786a5ca438c2c1e9031f9e7c17
2019-03-13 03:42:43 +00:00
Aaron Schulz
f474fa4cef Avoid using outdated $casToken field for BagOStuff calls
Change-Id: Ic9bcb388e4f50e2ae16ae57aa16113e79b43350b
2019-03-11 23:39:29 -07:00
Aaron Schulz
ba7645032a Remove deprecated ParserOutput::legacyOptions
Change-Id: I5f32dd741f3ee795ec599aacb687d5cee2c52835
2019-03-11 22:50:47 -07:00
Max Semenik
1e9db557d7 Remove $wgMediaInTargetLanguage
It's a temporary feature flag not included in any release, just
removing it outright. The functionality will now always be enabled.

Bug: T205040
Change-Id: Ia9da82e6f6b2d270f1790a99fc8c35ad5e6aee5e
2019-03-08 15:24:39 -08:00
Fomafix
f17c297624 Use short assignment operator in PHP
Use
  $var .= $foo
instead of
  $var = $var . $foo

Change-Id: I5dcdd7278e618c14968e5ac1fb8ea43ac2200deb
2019-03-07 09:55:49 +01:00
Timo Tijhof
c6f3440832 resourceloader: Remove addModuleScripts, and deprecate getModuleScripts.
The addModuleScripts() methods were deprecated in 1.31 and 1.32,
these are now removed.

The getModuleScripts() are now deprecated as well, always returning
an empty array. To be removed in 1.34.

Depends on commits for bundled/wmf-deployed extensions that
remove the last few remaining callers to the deprecated functions
in: 3D, Collection, Flow, GlobalUserPage, and Wikibase.

Bug: T188689
Depends-On: If9f0bc6aef85117587fa1929f34f8861c8d80314
Depends-On: Ia8d41b97fbf6822f5f8f7ac889408acce1ac9a3a
Depends-On: I503b919739ea474ff33726815b0da55e2f7e2724
Depends-On: I236ef637fd03b810a46eb361e25067a037e9d183
Depends-On: I62e17779753b977a452cc0c9694947941e999cc3
Change-Id: I5a19b8f164ccf666485d2971202194b747f882df
2019-03-05 16:54:08 +00:00
Fomafix
204126e7c7 Simplify strings in PHP code
Change-Id: I481810ade68b0c5a5be21d22e2a107646d5813e6
2019-03-01 22:16:26 +01:00
Thiemo Kreuz
37b3383e8b Remove comments literally repeating the next line of code
I would argue that these comments do not add any information that
would not be there already. Having them adds mental overhead, because
one needs to read both the comment and the next line of code first to
understand they say the exact same thing. I don't find this helpful, but
more distracting.

Change-Id: I39c98f25225947ebffdcc2fd8f0243e7a6c070d7
2019-02-27 17:28:40 +00:00
Max Semenik
adf90edb33 Sanitizer: remove deprecated parameter to escapeIdReferenceList()
Change-Id: Iacd5796718c1d64e7290cfd9669c99d8f9e85dc5
2019-02-21 20:12:22 -08:00
Brian Wolff
286d49011f Various fixes for phan-taint-check
Bug: T216348
Change-Id: I0adafdc680dae0e930f38f08fe926645c57be06c
2019-02-17 11:41:11 +00:00
addshore
bc86b698cd parser: Add new pcache metrics, split by page content model
Change-Id: I31c3c5b863309ffcc4424c43891b577b3fb7a753
2019-02-11 20:48:56 +00:00
Kunal Mehta
cc5d9a92a2 build: Updating mediawiki/mediawiki-codesniffer to 24.0.0
Change-Id: I66b1775b7c1d36076d9ca78cbeb42787a743f2aa
2019-02-07 18:39:42 +00:00
jenkins-bot
3082005f00 Merge "Safe replacement of a lot of !count() with === []" 2019-01-16 07:03:47 +00:00
Thiemo Kreuz
c3dfa88966 Add missing empty lines between methods
This might hint at an edge-case in the PHP CodeSniffer sniff that should
detect if methods are separated by a single empty line. Feel free to
investigate. I, personally, can't invest more time in this than
suggesting this quick fix.

Change-Id: Ib3c60eac76f255b4fe929f7933de256222716576
2019-01-15 19:14:35 +00:00
Thiemo Kreuz
734a969d55 Safe replacement of a lot of !count() with === []
This was originally a global search and replace. I manually checked all
replacements and reverted them if (due to the lack of type hints) either
null (that would be 0 when counted) or a Countable object can end in the
variable or property in question.

Now this patch only touches places where I'm sure nothing can break.

For the sanity of the honorable reviewers this patch is exclusively touching
negated counts. You should not find a single `!== []` in this patch, that
would be a mistake.

Change-Id: I5eafd4d8fccdb53a668be8e6f25a566f9c3a0a95
2019-01-15 17:28:49 +01:00
jenkins-bot
bd78869618 Merge "No yoda conditions" 2018-12-09 01:34:23 +00:00
Alangi Derick
19adaa6a4b parser: Fix PHPDoc annotations in parser module
Change-Id: I09680d72516f943051e86655b5fddf9ff2988e4e
2018-12-08 13:07:10 +00:00
jenkins-bot
9ff8e0a946 Merge "Remove most support for configuring Tidy, including Raggett" 2018-12-05 18:59:50 +00:00
jenkins-bot
cbc6169044 Merge "Remove duplicate keys from arrays" 2018-12-02 00:18:23 +00:00
Jakub Vrana
3fc3b9e578 Change typehint callback to callable
Found by PHPStan.

Change-Id: I77877a18131bd69996bad07f2ee1c5f3ba3ba2e7
2018-12-01 10:02:48 +01:00
jenkins-bot
c984a1f2f8 Merge "Quoted attributes don't need to be followed by a space" 2018-11-27 16:21:41 +00:00
jenkins-bot
4077b57759 Merge "Parse wikitext in gallery caption" 2018-11-27 15:47:50 +00:00
C. Scott Ananian
f87898b488 Protect legacy URL parameter syntax in link and alt options
HTML doesn't allow certain semicolon-less HTML entities in attribute
values to avoid breaking legacy markup like:
  <a href="http://example.com?foo&param=bar">...</a>
(Note that the & in that URL is not properly entity-escaped as `&amp;`.)

Unlike wikitext, HTML generally allows semicolon-less legacy entities
in text.

Our alt and link option processing shove text through
Sanitizer::stripAllTags, which does entity decoding including these
legacy semicolon-less entities.  Wikitext doesn't allow semicolon-less
entities, so escape & characters where appropriate to protect alt/link
options and avoid breaking URLs.

This was a "regression" in how alt options were handled starting in
ddb4913f53 when we switched to using
Remex for Sanitizer::stripAllTags -- semicolon-less entities (previously
invalid in wikitext) were now being decoded when stripAllTags was
called on alt text.  This change became a problem when
ad80f0bca2 sent link option text through
Sanitizer::stripAllTags (with the new semicolon-less entity decode)
instead of PHP's strip_tags (which, in addition to its other faults,
doesn't do entity decode at all).  This suddenly started decoding
"non-wikitext" entities like `&para` inside URLs, breaking links.
Filed T210437 as a follow-up to consider changing the behavior
of Sanitizer::stripAllTags() globally to prevent it from decoding
semicolon-less entities for all callers.

Bug: T209236
Change-Id: I5925e110e335d83eafa9de935c4e06806322f4a9
2018-11-27 10:12:05 -05:00
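
The failure mode in f87898b488 can be reproduced outside MediaWiki. Python's html.unescape follows the HTML5 rules and also decodes legacy semicolon-less entities, so it serves here as a stand-in for the decoding step; the helper names and the regular expression below are illustrative assumptions, not the Sanitizer implementation.

```python
import html
import re

# An "&" that does NOT start a semicolon-terminated entity reference.
BARE_AMP = re.compile(r"&(?!(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)")


def protect_bare_ampersands(text: str) -> str:
    """Escape bare ampersands so a later entity decode leaves them alone."""
    return BARE_AMP.sub("&amp;", text)


def decode_strict(text: str) -> str:
    """Decode only semicolon-terminated entities (hypothetical helper)."""
    return html.unescape(protect_bare_ampersands(text))
```

Decoding http://example.com?foo&param=bar directly turns "&para" into a pilcrow and breaks the URL; with the ampersand escaped first, the URL comes back unchanged, while properly terminated entities such as &eacute; are still decoded.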
Jakub Vrana
9f14c02e20 Remove duplicate keys from arrays
Found by PHPStan.

Change-Id: Ie0e0cfa33b3caa4a13f4dfb04c772c8a0284435a
2018-11-26 19:22:08 +01:00
Fomafix
3ee1560232 No yoda conditions
Replace
  if ( 42 === $foo )
by
  if ( $foo === 42 )

Change-Id: Ice320ef1ae64a59ed035c20134326b35d454f943
2018-11-21 17:54:39 +01:00
C. Scott Ananian
6db35b3c98 Remove most support for configuring Tidy, including Raggett
Remex is pure PHP so there is no reason to use an external tidy any
more. Configuration variables and implementation classes were
deprecated in 1.32 or earlier.  We've kept only $wgTidyConfig
which can be used for experimental features or debugging Remex.

Bug: T198214
Change-Id: I99d48f858d97b6e1d1e6cd76a42c960cc2c61f9f
2018-11-15 12:22:06 -05:00
jenkins-bot
7a3eb1f3a6 Merge "Hard deprecate codepaths where tidy is disabled" 2018-11-13 23:54:24 +00:00
Brad Jorsch
d65e96b763 Use new externallinks.el_index_60 field
This adds a method to LinkFilter to build the query conditions necessary
to properly use it, and adjusts code to use it.

This also takes the opportunity to clean up the calculation of el_index:
IPs are handled more sensibly and IDNs are canonicalized.

Also weird edge cases for invalid hosts like "http://.example.com" and
corresponding searches like "http://*..example.com" are now handled more
regularly instead of being treated as if the extra dot were omitted,
while explicit specification of the DNS root like "http://example.com./"
is canonicalized to the usual implicit specification.

Note that this patch will break link searches for links where the host
is an IP or IDN until refreshExternallinksIndex.php is run.

Bug: T59176
Bug: T130482
Change-Id: I84d224ef23de22dfe179009ec3a11fd0e4b5f56d
2018-11-12 22:33:18 +00:00
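
One of the canonicalizations mentioned in d65e96b763, folding an explicit DNS root into the usual implicit form, can be sketched as below. This is only a loose Python illustration with a hypothetical canonical_host helper; it does not reproduce the el_index format, and it leaves IP and IDN handling out entirely.

```python
from urllib.parse import urlsplit


def canonical_host(url: str) -> str:
    """Return the lowercased host with a single trailing root dot removed."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    if host.endswith("."):
        host = host[:-1]  # "example.com." -> "example.com"
    return host
```

Under this sketch "http://example.com./" and "http://example.com/" map to the same host, while a leading dot as in "http://.example.com" is kept rather than silently dropped.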
Arlo Breault
59bb8864a2 Quoted attributes don't need to be followed by a space
Further, this splits up attribute parsing from filtering.

Change-Id: Ib4e0a808a6ca2ba032873e885837233e2f2feefe
2018-11-09 16:00:18 -05:00
C. Scott Ananian
54ac31f94d Hard deprecate codepaths where tidy is disabled
Future parsers will not support the output generated with tidy disabled.

Parser tests using untidied output will also be deprecated (and
rewritten) in a follow-up patch.

No new release notes necessary since user-visible tidy configuration
was deprecated previously (in 1.32), and individual methods which had
disabled tidy during execution were individually release-noted as they
were updated.

Bug: T198214
Depends-On: I0f417f75a49dfea873e9a2f44d81796a48b9f428
Depends-On: If5c619cdd3e7f786687cfc2ca166074d9197ca11
Change-Id: I592e0e0dfef7d929f05c60ffe4d60e09725b39cc
2018-11-05 18:49:16 +00:00
jenkins-bot
9aedec343e Merge "Handle <nowiki> and other markup consistently in image link/alt options" 2018-11-02 01:59:01 +00:00
Max Semenik
c16704c33a Display SVGs in target language
Previously, they were always displayed in the default language unless
forced explicitly in wikitext, e.g. [[File:Foo.svg|lang=ru]].
This change adds a feature flag that would enable always trying to
display in page language.

* If enabled, Parser will pass a new parameter - 'pagelang' - to
  the media handler.
* SvgHandler uses page language when determining what language to
  render the image in.
* 'pagelang' can always be overridden by 'lang'.
* If no translation in page language is available, the default
  language (English) will be used for thumbnail URLs, to prevent
  cluttering media storage and HTTP caches with useless copies.

Performance: this requires accessing image's metadata during parsing.
My testing indicates there was no code path where this wasn't the
case already, so no performance hit is expected, however we should
still keep an eye on page save performance.

Bug: T205040
Change-Id: I348840ef405e1370cc0c17d69051bce30153c9c0
2018-10-30 16:12:11 -07:00
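
The language selection rules listed in c16704c33a can be summarized in a short Python sketch. The function name, the parameter dictionary and the set of available translations are hypothetical; what happens when an explicit 'lang' has no translation is not stated in the commit message, so the sketch simply returns it.

```python
from typing import Dict, Iterable

DEFAULT_LANG = "en"  # the commit names English as the default language


def choose_render_lang(params: Dict[str, str], available: Iterable[str]) -> str:
    """Pick the language to render an SVG in (hypothetical helper)."""
    translations = set(available)
    if "lang" in params:
        # An explicit 'lang' always overrides 'pagelang'.
        # Assumption: it is used as given, without checking availability.
        return params["lang"]
    page_lang = params.get("pagelang")
    if page_lang in translations:
        return page_lang
    # No translation in the page language: fall back to the default so
    # thumbnails are not duplicated per language.
    return DEFAULT_LANG
```

For example, a Russian page showing a file translated only into German would get the default rendering, whereas [[File:Foo.svg|lang=ru]] keeps 'ru' regardless of the page language.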
jenkins-bot
682f3d92b2 Merge "Parser: Remove markNoConversion for displaytitle error message" 2018-10-29 21:42:40 +00:00
jenkins-bot
824936c08f Merge "Hard deprecate $wgTidyConfig['driver'] = 'disabled'" 2018-10-29 20:59:59 +00:00
Fomafix
e085e3f310 Parser: Remove markNoConversion for displaytitle error message
bacd87e49 moved the displaytitle error message from the content to
outside of the content. Only the content is converted by the language
conversion. The error message outside of the content is not converted.
Therefore markNoConversion is not needed here anymore.

This change removes the -{R|...}- around the displaytitle in the error
message when the language converter is active.

Bug: T208249
Change-Id: Ieec43e9af045d19b0b7a82afb889e076b347eed1
2018-10-29 21:23:52 +01:00
C. Scott Ananian
58abac2d14 Change ParserOptions tidy default to true
We are deprecating the non-tidy modes of the parser.

ParserOptions::getCanonicalOverrides() has always set `tidy` to `true` at
any rate, so this isn't going to invalidate any parser cache entries.

Change-Id: Ib703a041edf8a8d57e94f136965f72d9bbfcf222
2018-10-29 19:37:00 +00:00
C. Scott Ananian
661c43f3eb Hard deprecate $wgTidyConfig['driver'] = 'disabled'
This was already deprecated in the release notes, and is not used in
production, but I'd overlooked adding an appropriate hard deprecation
notice in MWTidy::factory() to notify downstream users.

Change-Id: I8f4d8154a1d8a233017f54f0fb4bcfdf4a0373e1
2018-10-25 22:20:49 -04:00
Tim Starling
a6a017cea4 Fix use of non-existent variable Parser::$config
Fix bug from Ib4394f370cb561ccf195338a1c2e9e465dcb3dc3

Add test.

Bug: T208000
Change-Id: Ia81cca1b64afef2af3cb8dff19719a7f0de9d306
2018-10-25 16:27:55 -07:00