Commit graph

2492 commits

Author SHA1 Message Date
Umherirrender
1f71eccf63 phan: Disable null_casts_as_any_type setting
Make phan stricter about null types by setting null_casts_as_any_type to
false (the default in mediawiki-phan-config)
Remaining false positive issues are suppressed.
The suppression and the setting change can only be done together

Bug: T242536
Bug: T301991
Change-Id: I0f295382b96fb3be8037a01c10487d9d591e7e01
2022-03-21 18:25:07 +00:00
jenkins-bot
2876074562 Merge "phan: Disable scalar_implicit_cast setting" 2022-03-20 15:55:45 +00:00
Umherirrender
6dd8a2bb32 phan: Disable scalar_implicit_cast setting
Make phan stricter about scalar types by setting scalar_implicit_cast to
false (the default in mediawiki-phan-config)

Bug: T242536
Bug: T301991
Change-Id: Ia2fe30b17804186571722e728578121c8b75d455
2022-03-18 18:52:24 +00:00
Alexander Vorwerk
82739980fd parser: change 'level' in parse api back to string
We changed to operate on an int internally in I92daeb0f7be8a0.
Let's cast it back to a string for the api in order to prevent
a breaking change, which is not really necessary.

Bug: T304171
Change-Id: I5f5a9203b4dd085cb5defba72c6650532bc9e8d1
2022-03-18 19:52:24 +01:00
jenkins-bot
5e6021e01b Merge "parser: Cast return of Timestamp::format to int for n" 2022-03-18 13:02:44 +00:00
jenkins-bot
a2bdaea11e Merge "parser: Fix various scalar types to match documentation" 2022-03-18 13:02:21 +00:00
Umherirrender
20d4c4ba37 parser: Cast return of Timestamp::format to int for n
Some Language function are documented to take int,
cast the formatted string to int
n = Numeric representation of a month, without leading zeros

Found by phan strict checks

Change-Id: Ifc7fc64ac26a756f181b7d0155f13a6500114f5e
2022-03-15 23:55:39 +01:00
jenkins-bot
7e5a2477ff Merge "Pass a ConvertibleTimestamp to CoreMagicVariables" 2022-03-15 22:26:32 +00:00
C. Scott Ananian
b72fa830d6 Pass a ConvertibleTimestamp to CoreMagicVariables
This avoids a type mismatch found by phan strict checks
(Ifc7fc64ac26a756f181b7d0155f13a6500114f5e) -- the passed timestamp
from Parser was a string in unix format (ie, an integer as a string)
but was declared as an integer.  It was then passed to
MWTimestamp::getInstance() which expected a string.

However, the 'simple' fix for this issue still caused unnecessary
conversions to/from timestamp format.  We took the string (nominally
in TS_MW format), ran a regexp against it to convert it to an
MWTimestamp instance, then converted that MWTimestamp to UNIX format
and exported that as a string, just to take that string and run four
different regexps against it *again* to convert it back to an
MWTimestamp instance so we can format it.

Better to just pass the MWTimestamp directly.  Only two wrinkles:

1. the ParserGetVariableValueTs hook expects to be passed a string
in TS_UNIX format and then to be able to mutate it. Nothing in production
uses that hook, so only do this conversion if the hook is registered.

2. Parsoid would like to use the definitions in CoreMagicVariables
in the future as well.  So pass the timestamp as the not-MW-@internal
ConvertibleTimestamp class instead of directly as a MWTimestamp.

Change-Id: Ib2c5fa45630c54c2716897370a0580ed48d27242
2022-03-14 17:30:15 -04:00
Umherirrender
112e9f3895 parser: Fix various scalar types to match documentation
Found by phan strict checks

Change-Id: I0a1d1eeb6ee63ce8f6f6fd7dc6fe9273847609af
2022-03-14 20:20:06 +00:00
jenkins-bot
b148414cd3 Merge "ParserOutput::collectMetadata: Suppress hard failures from Parsoid" 2022-03-12 01:25:51 +00:00
jenkins-bot
920b15ede9 Merge "Hard deprecate Parser::{get,set}DefaultSort() and ::getCustomDefaultSort()" 2022-03-12 01:25:44 +00:00
jenkins-bot
85c43c06bc Merge "Export ParserOutput strategy keys for jsConfigVars" 2022-03-11 22:36:10 +00:00
jenkins-bot
4ff0b1fa18 Merge "parser cache: Adjust types for timestamps" 2022-03-11 19:33:01 +00:00
C. Scott Ananian
71f2705498 ParserOutput::collectMetadata: Suppress hard failures from Parsoid
A number of exceptions assume that the ParserAfterParse hook is called
exactly once per top-level page, and use that hook to "finalize"
various write-once properties in ParserOutput, including jsconfigvars.
Unfortunately, ParserAfterParse can't be supported properly in Parsoid
(T303630), which results in legacy extensions using this hook
overwriting extensiondata and jsconfigvars multiple times.

Instead of throwing an exception, restore the previous Parsoid
behavior where the last write wins.  This doesn't fix the root cause,
but at least it doesn't regress.  Eventually we'll have to deprecate
the ParserAfterParse hook, and when we do so we can add deprecation
warnings to these code paths in ParserOutput::collectMetadata() as
well and eventually remove them.

Noting uses ParserOutput::collectMetadata() except Parsoid at the
moment.

Bug: T303014
Bug: T303015
Change-Id: I9d1f0f6bab1305552a0350667d6142a24bc04049
2022-03-11 12:51:32 -05:00
jenkins-bot
71d1038133 Merge "parser: Simplify CoreParserFunctions::formatRaw for phan" 2022-03-11 17:01:01 +00:00
C. Scott Ananian
a6d443f352 parser cache: Adjust types for timestamps
Split from Ifc7fc64ac26a756f181b7d0155f13a6500114f5e.

Found by phan strict checks.

Change-Id: I897367ecc1448edb542a0383e34fb34c0de8ed73
2022-03-11 11:42:05 -05:00
jenkins-bot
0200221a93 Merge "Fix various documentation related to null types (part II)" 2022-03-09 22:42:42 +00:00
C. Scott Ananian
a3000c2972 Export ParserOutput strategy keys for jsConfigVars
This allows Parsoid to properly merge jsconfigvars via the external API
(ie, when Parsoid is run in 'standalone mode') when an extension uses
the new-in-1.38 ParserOutput::appendJsConfigVar() method.

Change-Id: I974d9ecfb4ca8b22361d25c4c70fc5e55c39d5ed
2022-03-09 17:42:00 -05:00
jenkins-bot
fc8b8e77fe Merge "Fix various documentation related to scalar types" 2022-03-09 21:58:03 +00:00
Umherirrender
d30b3d8926 Fix various documentation related to scalar types
Found by phan strict checks

Change-Id: If41d16b473baddd92cc4261cdc2bfbe65fedcb19
2022-03-09 20:49:51 +00:00
Umherirrender
3fdf000bbf parser: Simplify CoreParserFunctions::formatRaw for phan
Allow static code analyzer to understand that the factory is always set
by using only on outer if for $raw

Found by phan strict checks

Change-Id: I644f03f08fdb1b23a3074c603d00e2aa863ae8c0
2022-03-09 20:21:53 +01:00
C. Scott Ananian
dc3d489156 Hard deprecate Parser::{get,set}DefaultSort() and ::getCustomDefaultSort()
Code search:
https://codesearch.wmcloud.org/deployed/?q=-%3E%28get%7Cset%29%28Custom%29%3FDefaultSort%5C%28&i=nope&files=&excludeFiles=&repos=

Change-Id: I582f13b104a6c7bb9e98427a260b191d94866fa9
Depends-On: I603051d75b551d398233c607d73403584dc75852
2022-03-09 12:06:58 -05:00
jenkins-bot
a9900325fa Merge "Ensure that ToC is converted into the proper target language" 2022-03-09 05:34:10 +00:00
jenkins-bot
74ebfcc00a Merge "Revert "ParserOutput: Use page language instead of site content language for conversion"" 2022-03-09 05:11:23 +00:00
C. Scott Ananian
0955046ca5 Ensure that ToC is converted into the proper target language
This patch exports the necessary information from the Parser into the
ParserOutput to ensure that the Table of Contents can be properly
language-converted: both ensuring that the target language is correct
(in cases where it differs from the content language) and that various
conversion-suppression mechanisms are functional.  When the
ParserCache does not (yet) have the new properties from Parser, the
behavior is unchanged from before (the content language is used, and
its "preferred variant").

This is a follow up to the "quick fix" deployed in
Ic14b3a49a8ee7ed600485d4f8a363a206035a847 to fix an UBN regression.

Parser tests have also been added to verify that ToC conversion
is correctly done (T299973).

Task T303329 has been opened to (eventually) rename the
'core:target-lang' and 'core:target-lang-variant' properties added to
the ParserOutput in this patch.

Bug: T303235
Bug: T295187
Bug: T299973
Followup-To: Ic14b3a49a8ee7ed600485d4f8a363a206035a847
Followup-To: Ib273f88531c340b561072ee9f616aa60725091e6
Change-Id: Ie0f1d7b6daffc8ff47228f6f086a257518f72717
2022-03-09 00:08:57 -05:00
C. Scott Ananian
d5cddb0c24 Ensure that the recognizedTagData static cache is properly initialized
This fixes a bug introduced inadvertently in
9f14fbd002 when the static cache for
Sanitizer::getRecognizedTagData() was refactored to reduce setup
overhead.

Bug: T303360
Change-Id: I606eeda1fcdbb6d4c62e2dc8db5b6e1659ae3f3f
2022-03-08 19:18:32 -05:00
Umherirrender
d7248d63fb Fix various documentation related to null types (part II)
The functions returning null or the class property is set explict null.
Some function should not accept null or return null.

Found by phan strict checks

Change-Id: Ie50f23249282cdb18caa332f562a3945a58d86ff
2022-03-08 23:45:31 +00:00
jenkins-bot
057d3b6358 Merge "Deprecate the ParserOutputHook functionality" 2022-03-08 21:38:43 +00:00
jenkins-bot
4140d7eaad Merge "Deprecate Parser::getDefaultSort(), ::setDefaultSort(), ::getCustomDefaultSort()" 2022-03-08 21:08:53 +00:00
C. Scott Ananian
e997f811ef Revert "ParserOutput: Use page language instead of site content language for conversion"
This reverts commit 0fdd607a84.

This attempt to fix T295187 caused other issues (T303235). A proper
fix is in Ie0f1d7b6daffc8ff47228f6f086a257518f72717.

Bug: T303235
Change-Id: Ib273f88531c340b561072ee9f616aa60725091e6
2022-03-08 16:01:08 -05:00
C. Scott Ananian
822020da6f Deprecate Parser::getDefaultSort(), ::setDefaultSort(), ::getCustomDefaultSort()
In modern mediawiki these methods are just wrappers for the 'defaultsort'
page property, and don't need a parser property of their own.

Change-Id: I18bdffd4d6565733fb52cbff409cc25d49a76b65
2022-03-08 15:08:02 -05:00
jenkins-bot
c268687d46 Merge "Hard deprecate Sanitizer::removeHTMLtags()" 2022-03-08 19:29:55 +00:00
jenkins-bot
d1cfc0317d Merge "Add explicit casts between scalar types" 2022-03-08 17:32:26 +00:00
Umherirrender
6ea3d6ac2c Add explicit casts between scalar types
php internal functions like floor/round/ceil documented to return
float, most cases the result is used as int, added casts

Found by phan strict checks

Change-Id: I92daeb0f7be8a0566fd9258f66ed3aced9a7b792
2022-03-08 16:59:01 +00:00
C. Scott Ananian
d6576e5dc6 Hard deprecate Sanitizer::removeHTMLtags()
Rename Sanitizer::removeHTMLtags() into an @internal method named
::internalRemoveHtmlTags() so that we can deprecate external use.

Code search:
https://codesearch.wmcloud.org/deployed/?q=removeHTMLtags&i=nope&files=&excludeFiles=&repos=

Followup-To: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
Depends-On: Iaca83ed06e9c61d8366579cd2283cba653c82319
Depends-On: I1963bfe9a99198ea02ca482a5769467ce806cd58
Depends-On: I83923d8b38d33f3638cd53958dd10f257ec21f7c
Depends-On: I018b34bb5f6e113056da9b04cc72d4318422adce
Change-Id: I202826f8b27519f7be89643e24eda47a6e3fc9f6
2022-03-07 22:04:56 -05:00
C. Scott Ananian
773801e439 Deprecate the ParserOutputHook functionality
These hooks should be implemented in the OutputPageParserOutput hook
instead.

Bug: T292321
Change-Id: Ib6f457596ea9d193bc03e15a48f135db4f4a6b27
2022-03-07 16:56:10 -05:00
C. Scott Ananian
a5e96d9d66 Add inline taint information for Sanitizer::remove*Tags()
This moves the taint information to be directly on the method,
moving it out of the SecurityCheckPlugin.  See discussion on
Ieb202ef92bd9888ce767f8dd4d97f19eeb10a073.

We also fix a legit "double-escape" issue flagged by the phan
SecurityCheckPlugin once the correct taint information has been
added.

Followup-To: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
Change-Id: I0f873618d43cb6daf9c43394a669125469462223
2022-03-07 16:50:58 -05:00
jenkins-bot
fd2c1795a9 Merge "Add Sanitizer::removeSomeTags() which uses Remex to tokenize" 2022-03-06 08:35:40 +00:00
C. Scott Ananian
9f14fbd002 Add Sanitizer::removeSomeTags() which uses Remex to tokenize
The existing Sanitizer::removeHTMLtags() method, in addition to having
dodgy capitalization, uses regular expressions to parse the HTML.
That produces corner cases like T298401 and T67747 and is not guaranteed
to yield balanced or well-formed HTML.

Instead, introduce and use a new Sanitizer::removeSomeTags() method
which is guaranteed to always return balanced and well-formed HTML.

Note that Sanitizer::removeHTMLtags()/::removeSomeTags() take a callback
argument which (as far as I can tell) is never used outside core. Mark
that argument as @internal, and clean up the version used by
::removeSomeTags().

Use the new ::removeSomeTags() method in the two places where
DISPLAYTITLE is handled (following up on T67747).  The use by the
legacy parser is more difficult to replace (and would have a
performace cost), so leave the old ::removeHTMLtags() method in place
for that call site for now: when the legacy parser is replaced by
Parsoid the need for the old ::removeHTMLtags() will go away.  In a
follow-up patch we'll rename ::removeHTMLtags() and mark it @internal
so that we can deprecate ::removeHTMLtags() for external use.

Some benchmarking code added.  On my machine, with PHP 7.4, the new
method tidies short 30-character title strings at a rate of about
6764/s while the tidy-based method being replaced here managed 6384/s.
Sanitizer::removeHTMLtags blazes through short strings 20x faster
(120,915/s); some of this difference is due to the set up cost of
creating the tag whitelist and the Remex pipeline, so further
optimizations could doubtless be done if Sanitizer::removeSomeTags()
is more widely used.

Bug: T299722
Bug: T67747
Change-Id: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
2022-03-04 14:06:02 -05:00
Umherirrender
45a4868f55 parser: Adjust documentation about false for $wgExternalLinkTarget
The default for $wgExternalLinkTarget is false,
which can be set or returned on ParserOptions

Found by phan strict checks

Change-Id: Ibcf6c40eaf1ffb628fc1b92a0d0ecd79d9421100
2022-03-03 22:35:41 +01:00
Umherirrender
86fc7aaf9e parser: Improve line indent in Parser::expandMagicVariable
Also break some line on other hookrunner

Change-Id: Ia4077b7f0240abd7e5b731739f2c9801fe3b5096
2022-03-01 23:42:44 +01:00
Umherirrender
9efd9ca45e Add explicit casts between scalar types
* Some functions accept only string, cast ints and floats to string
* After preg_matches or explode() casts numbers to int to do maths
* Cast unix timestamps to int to do maths
* Cast return values from timestamp format function to int
* Cast bitwise operator to bool when needed as bool

* php internal functions like floor/round/ceil documented to return
  float, most cases the result is used as int, added casts

Found by phan strict checks

Change-Id: Icb2de32107f43817acc45fe296fb77acf65c1786
2022-03-01 18:19:33 +01:00
jenkins-bot
df0801833a Merge "ParserOutput: Use page language instead of site content language for conversion" 2022-02-24 16:08:07 +00:00
jenkins-bot
5abb2ecdd5 Merge "ParserOutput: implement the abstract ContentMetadataCollector interface" 2022-02-22 22:52:52 +00:00
Func
0fdd607a84 ParserOutput: Use page language instead of site content language for conversion
Content language of specific pages can be changed manually or by the Translate extension.

Bug: T295187
Change-Id: I714711201ba71a2234d625c2e71505973655f36e
2022-02-21 14:59:37 +08:00
jenkins-bot
99dee6855a Merge "Change return value of ParserOutput::getPageProperty() when property is missing" 2022-02-19 00:49:48 +00:00
Arlo Breault
c5c7d5c9bb Fallback to parser-extlink-target instead of setting link-target
This makes the call to getImageLinkMTOParams consistent between thumbs
and not thumbs by now passing the parser in the former.  The result
being that the nofollow relationship is now added for external thumb
links as well.

Linker::getImageLinkMTOParams sets the parser-extlink-target based on
the same getExternalLinkTarget call.  This also makes it clearer when
parser-extlink-rel is modified by the target in getExternalLinkAttribs.

custom-target-link does have a higher precedence but it looks like this
was added in 1a4957e as a fallback to respect the global config so the
change seems fine.

A few parserTests cover this with the wgExternalLinkTarget config but a
case for thumbs with external links was added.

The phan annotations look like a false positive similar to
https://github.com/phan/phan/issues/3645
from the use of `$params[$type][$paramName] = $value;`

Change-Id: Icf887b13d046b0f610b1984d641f248d1dec5226
2022-02-18 23:06:43 +00:00
C. Scott Ananian
bc36c768f0 ParserOutput: implement the abstract ContentMetadataCollector interface
This has core implement an abstract interface defined by Parsoid in order
to allow Parsoid to record metadata in ParserOutput without introducing
a cyclic dependency.

Bug: T287216
Followup-To: Ia02c6774c87b13d1ae5a8ed1e55cdd8c88c19b9e
Depends-On: Ie0e358a4910c1946eb4added76318fcacf9308df
Change-Id: I15c0e81185b9957fe097c82e6609a200742ee7d1
2022-02-18 16:24:50 -05:00
C. Scott Ananian
c39ef6c6c9 Change return value of ParserOutput::getPageProperty() when property is missing
The old ParserOutput::getProperty() method returned `false` when a property
was missing.  This requires callers to use the `?:` syntax to supply default
values, which then causes any falsey value to be treated as missing.
So, for example, setting the defaultsort to '0' will cause the default
sort to be ignored.

Modern php convention is to use `null` for missing values, and the `??`
syntax is a better/more restrictive alternative to `?:`.

We renamed `ParserOutput::getProperty()` to `::getPageProperty()` in
1.38 (Ie963eea5aa0f0e984ced7c4dfa0fd65d57313cfa/T287216) but kept the
return value convention.  Before this actually makes it into a 1.38
release, take the opportunity to fix the return value for the new
`ParserOutput::getPageProperty()` method to return `null` when the
property is missing.

We need to do some temporary workarounds to the places we'd
already swapped over to use the new `::getPageProperty()` method
to allow them to handle either `false` or `null` as a return value;
we'll clean that up once this is merged.

Code search:
https://codesearch.wmcloud.org/deployed/?q=-%3EgetPageProperty%5C%28|T301915&i=nope&files=&excludeFiles=&repos=

Bug: T301915
Depends-On: I3f11ce604970e47b41fc1c123792df8c3045626f
Depends-On: Ie7533f49fe4cad01ebfda29760d23c61e9867b10
Depends-On: Ic5c09f5caa4c897bc553c614fbae9cee159566a2
Depends-On: I0278b2eafd90e77e4fee41c45a1165fb79ddf47e
Depends-On: I383abb6b7dc5e96c0061af13957609f6e31a1065
Depends-On: I79f9f4078e415284af29b15047bafd1c823d7f5b
Depends-On: I02276c48c49f5d2d241a69eb0a6cdf439b572d8b
Depends-On: I71628661b4539a4e35ae32846e719f92bcf782e0
Depends-On: I7e215cb43de0ce150a6bcc00f92481dcdcfed383
Change-Id: Iaa25c390118d2db2b6578cdd558f2defd5351d15
2022-02-18 21:15:58 +00:00