This patch only adds and removes suppressions, which must be done in the
same patch as the version bump.
Bug: T298571
Change-Id: I4044d4d9ce82b3dae7ba0af85bf04f22cb1dd347
Eliminate a difference between the magic variable (no-arg) and
parser function (1-arg) forms; aka the difference between
{{REVISIONTIMESTAMP}} and {{REVISIONTIMESTAMP:{{PAGENAME}}}}.
This is a follow-up to I8d25755e4d92bd91988cfb706d85bdb170abb207.
The magic variable contains a MAX_TTS optimisation which reduces
the use of vary-revision-timestamp, since it has severe performance
implications; this patch applies the same optimisation to the
parser function.
The ParserTestRunner had a small issue with test setup:
ParserOptions::setTimestamp() was called with a unix-format timestamp,
where it expected a TS_MW format timestamp. This patch fixes that
and tweaks the test timestamps so that a timestamp coming from
ParserOptions is still distinguishable from one coming from the
revision.
Bug: T204370
Change-Id: I883d42d67013b6fb0da57c61e715b51d3a807879
Reduce code duplication by having the "magic variable" implementations
of:
revisionday, revisionday2, revisionmonth, revisionmonth1,
revisionyear, revisiontimestamp
invoke the corresponding "parser function" implementations with no
arguments. This also supports consistent results from direct
invocation of the parser function with no arguments in the future.
Bug: T204370
Change-Id: I8d25755e4d92bd91988cfb706d85bdb170abb207
The following magic variables also have "parser function" forms, where
the first argument is a user-supplied title:
pagename, pagenamee, fullpagename, fullpagenamee, subpagename,
subpagenamee, rootpagename, rootpagenamee, basepagename,
basepagenamee, talkpagename, talkpagenamee, subjectpagename,
subjectpagenamee, pageid, cascadingsources, namespace, namespacee,
namespacenumber, talkspace, talkspacee, subjectspace, subjectspacee
Refactor the code so that the magic variable form invokes the parser
function form with no arguments to reduce code duplication. We also
tweak the behavior of the parser function when invoked with no arguments,
although this change will not be directly visible because the parser
always prefers magic variables over parser functions, so the parser
function is never actually invoked with no arguments.
A future patch may allow the parser function to be invoked with a
hash prefix (I895087c546dc820c77c0dda596dfeb72586b87cc) in which case
consistency will be more important.
Note that `revisionuser`, `revisionid`, `revisionday`, `revisionday2`,
`revisionmonth`, `revisionmonth1`, `revisionyear` and
`revisiontimestamp` are also of a similar form and could be included
in this list, but their magic variable and parser function
implementations do not appear to be consistent. This will be
addressed in future patches.
In addition, the following magic variables have a "parser function" form
where the presence or absence of the first argument selects "raw" output:
numberofarticles, numberoffiles, numberofusers, numberofactiveusers,
numberofpages, numberofadmins, numberofedits
Similar to above, refactor the code so that the magic variable form
invokes the parser function form with no arguments to reduce code
duplication (and to support future direct invocation of the parser
function with no arguments).
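The delegation described above can be sketched roughly as follows. This
is a Python illustration, not the PHP in the patch; the function names,
the ARTICLE_COUNT value and the use of comma grouping as a stand-in for
language-aware formatting are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical stand-in for the site statistics lookup.
ARTICLE_COUNT = 1234567


def numberofarticles_pf(raw: Optional[str] = None) -> str:
    """Parser-function form: the presence of a first argument selects raw output."""
    if raw is not None:
        return str(ARTICLE_COUNT)
    # Stand-in for language-aware number formatting.
    return f"{ARTICLE_COUNT:,}"


def numberofarticles_magic() -> str:
    """Magic-variable form: delegate to the parser function with no arguments."""
    return numberofarticles_pf()
```

With this shape there is a single code path, so the no-argument parser
function and the magic variable cannot drift apart.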
Bug: T204370
Change-Id: Iaec33fb40a2d9884daf2852ed6a6a3b53c9d3863
The check is only relevant when calling Title::getTalkPage/getTalkNsText.
The check has existed since the addition of the functions in a4fafb0.
Bug: T317582
Change-Id: I2e36fd963b2f943ed67a93c2573008b2d1fb094b
* This patch relies on extensions setting a flag in their Parsoid ext.
config indicating that a specific tag handler needs nowikis stripped
from #tag arguments.
In the #tag parser function implementation, Parsoid's SiteConfig is
looked up to see if nowiki needs to be stripped.
* This need not be limited to nowikis, but to support extension use in
{{#tag:ext|...}} more generally, we would need to either
(a) implement the #tag parser function in Parsoid natively; OR
(b) find a way to call Parsoid from extensionSubstitution
Solution (a) needs Parsoid to support parser functions natively.
If this general support becomes necessary, a later patch can
generalize this appropriately.
Bug: T272939
Bug: T299103
Depends-On: I6a653889afd42fefb61daefd8ac842107dce8759
Depends-On: I56043e0cb7d355a3f0d08e429bb1dbba6acb4fba
Change-Id: I614153af67b5a14f33b7dfc04bd00dd9e03557d0
This corresponds to the `namespace` parser function, but between PHP 5.3
and PHP 7, `namespace` was a reserved name that couldn't be used as a
function name. It was made "semi-reserved" by the PHP 7 context-sensitive
lexer, and MW currently requires PHP >= 7.3.19.
Change-Id: If8a1401c38b9140bb40a3381845a0d115546422a
PHP 8.1 doesn't like passing nulls around, and in context the empty
string makes more sense anyways as the default value for unspecified
options.
Bug: T313663
Bug: T313662
Change-Id: Ica9460716129481f9cb1ebff3b660d2d1bb15f55
Now largely automated:
  VARS=$(grep -o "'[A-Za-z0-9_]*'" includes/MainConfigNames.php | \
    tr "\n" '|' | sed "s/|$/\n/;s/'//g")
  sed -i -E "s/'($VARS)'/MainConfigNames::\1/g" \
    $(grep -ERIl "'($VARS)'" includes/)

Then git add -p with lots of error-prone manual checking. Then
semi-manually add all the necessary "use" lines:

  vim $(grep -L 'use MediaWiki\\MainConfigNames;' \
    $(git diff --cached --name-only --diff-filter=M HEAD^))

I didn't bother fixing lines that were over 100 characters unless they
were over 120 and triggered phpcs.
Bug: T305805
Change-Id: I74e0ab511abecb276717ad4276a124760a268147
Make phan stricter about scalar types by setting scalar_implicit_cast to
false (the default in mediawiki-phan-config)
Bug: T242536
Bug: T301991
Change-Id: Ia2fe30b17804186571722e728578121c8b75d455
Allow the static code analyzer to understand that the factory is always
set by using only one outer if for $raw.
Found by phan strict checks
Change-Id: I644f03f08fdb1b23a3074c603d00e2aa863ae8c0
In modern MediaWiki these methods are just wrappers for the 'defaultsort'
page property, and don't need a parser property of their own.
Change-Id: I18bdffd4d6565733fb52cbff409cc25d49a76b65
This moves the taint information to be directly on the method,
moving it out of the SecurityCheckPlugin. See discussion on
Ieb202ef92bd9888ce767f8dd4d97f19eeb10a073.
We also fix a legit "double-escape" issue flagged by the phan
SecurityCheckPlugin once the correct taint information has been
added.
Followup-To: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
Change-Id: I0f873618d43cb6daf9c43394a669125469462223
The existing Sanitizer::removeHTMLtags() method, in addition to having
dodgy capitalization, uses regular expressions to parse the HTML.
That produces corner cases like T298401 and T67747 and is not guaranteed
to yield balanced or well-formed HTML.
Instead, introduce and use a new Sanitizer::removeSomeTags() method
which is guaranteed to always return balanced and well-formed HTML.
Note that Sanitizer::removeHTMLtags()/::removeSomeTags() take a callback
argument which (as far as I can tell) is never used outside core. Mark
that argument as @internal, and clean up the version used by
::removeSomeTags().
Use the new ::removeSomeTags() method in the two places where
DISPLAYTITLE is handled (following up on T67747). The use by the
legacy parser is more difficult to replace (and would have a
performance cost), so leave the old ::removeHTMLtags() method in place
for that call site for now: when the legacy parser is replaced by
Parsoid the need for the old ::removeHTMLtags() will go away. In a
follow-up patch we'll rename ::removeHTMLtags() and mark it @internal
so that we can deprecate ::removeHTMLtags() for external use.
Some benchmarking code added. On my machine, with PHP 7.4, the new
method tidies short 30-character title strings at a rate of about
6764/s while the tidy-based method being replaced here managed 6384/s.
Sanitizer::removeHTMLtags blazes through short strings 20x faster
(120,915/s); some of this difference is due to the set up cost of
creating the tag whitelist and the Remex pipeline, so further
optimizations could doubtless be done if Sanitizer::removeSomeTags()
is more widely used.
Bug: T299722
Bug: T67747
Change-Id: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
* Some functions accept only strings; cast ints and floats to string
* After preg_match() or explode(), cast numbers to int to do maths
* Cast unix timestamps to int to do maths
* Cast return values from timestamp format functions to int
* Cast results of bitwise operators to bool when needed as bool
* PHP internal functions like floor/round/ceil are documented to return
float, but in most cases the result is used as an int; added casts
Found by phan strict checks
Change-Id: Icb2de32107f43817acc45fe296fb77acf65c1786
The old ParserOutput::getProperty() method returned `false` when a property
was missing. This requires callers to use the `?:` syntax to supply default
values, which then causes any falsey value to be treated as missing.
So, for example, setting the defaultsort to '0' will cause the default
sort to be ignored.
Modern PHP convention is to use `null` for missing values, and the `??`
syntax is a better/more restrictive alternative to `?:`.
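The difference between the two operators can be sketched as follows.
This is a Python emulation, since Python's own truthiness rules differ
from PHP's (Python treats '0' as truthy); the helper names are
hypothetical and php_truthy() only models the scalar cases relevant
here.

```python
def php_truthy(value) -> bool:
    """Approximate PHP's boolean conversion for scalars (assumption:
    arrays and objects are not modelled)."""
    return value not in (None, False, 0, 0.0, "", "0")


def elvis(value, default):
    """Emulates PHP `$value ?: $default`: any falsey value is replaced."""
    return value if php_truthy(value) else default


def null_coalesce(value, default):
    """Emulates PHP `$value ?? $default`: only a missing (null) value is replaced."""
    return default if value is None else value


props = {"defaultsort": "0"}

# Old convention: a missing property is reported as False.
old_value = props.get("defaultsort", False)
# New convention: a missing property is reported as None.
new_value = props.get("defaultsort")

ignored = elvis(old_value, "Page title")            # the '0' sort key is lost
respected = null_coalesce(new_value, "Page title")  # the '0' sort key survives
```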
We renamed `ParserOutput::getProperty()` to `::getPageProperty()` in
1.38 (Ie963eea5aa0f0e984ced7c4dfa0fd65d57313cfa/T287216) but kept the
return value convention. Before this actually makes it into a 1.38
release, take the opportunity to fix the return value for the new
`ParserOutput::getPageProperty()` method to return `null` when the
property is missing.
We need to do some temporary workarounds to the places we'd
already swapped over to use the new `::getPageProperty()` method
to allow them to handle either `false` or `null` as a return value;
we'll clean that up once this is merged.
Code search:
https://codesearch.wmcloud.org/deployed/?q=-%3EgetPageProperty%5C%28|T301915&i=nope&files=&excludeFiles=&repos=
Bug: T301915
Depends-On: I3f11ce604970e47b41fc1c123792df8c3045626f
Depends-On: Ie7533f49fe4cad01ebfda29760d23c61e9867b10
Depends-On: Ic5c09f5caa4c897bc553c614fbae9cee159566a2
Depends-On: I0278b2eafd90e77e4fee41c45a1165fb79ddf47e
Depends-On: I383abb6b7dc5e96c0061af13957609f6e31a1065
Depends-On: I79f9f4078e415284af29b15047bafd1c823d7f5b
Depends-On: I02276c48c49f5d2d241a69eb0a6cdf439b572d8b
Depends-On: I71628661b4539a4e35ae32846e719f92bcf782e0
Depends-On: I7e215cb43de0ce150a6bcc00f92481dcdcfed383
Change-Id: Iaa25c390118d2db2b6578cdd558f2defd5351d15
Loops ServiceOptions through to CoreParserFunctions and CoreTagHooks to
avoid access to the main config from static methods.
Bug: T294739
Change-Id: Ia6c97f2d0952964c2ad6189f8053ad127589b37c
The getPageCount() method returns `cat_pages`, a value that makes sense
in the database table but is currently non-intuitive in object context,
where there's a value that better deserves the name. This makes it
necessary for callers to repeat the same logic to get the content page
count, along with a comment to explain the behavior.
In this patch, getMemberCount() is added. It returns the total
member count, as getPageCount() currently does by default.
getPageCount() now takes a parameter and two public constants are
provided for it: Category::COUNT_ALL_MEMBERS returns the count of all
members to retain existing behavior, and Category::COUNT_CONTENT_PAGES
returns only content pages.
In the future there will be no need for the parameter: content pages
will always be returned. The total member count is already accessible
with getMemberCount().
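A minimal sketch of the resulting API shape, in Python rather than PHP.
The COUNT_ALL_MEMBERS name, the snake_case method names, and the
assumption that the content page count is the total minus
subcategories and files are illustrative, not taken from the patch.

```python
class Category:
    # Selector constants for get_page_count(); COUNT_ALL_MEMBERS is an
    # assumed name for the "all members" selector.
    COUNT_ALL_MEMBERS = 0
    COUNT_CONTENT_PAGES = 1

    def __init__(self, pages: int, subcats: int, files: int) -> None:
        self._pages = pages      # like cat_pages: every member of the category
        self._subcats = subcats
        self._files = files

    def get_member_count(self) -> int:
        """Total member count, whatever the member type."""
        return self._pages

    def get_page_count(self, count_type: int = COUNT_ALL_MEMBERS) -> int:
        """Defaults to the old behavior; callers can opt in to content pages."""
        if count_type == Category.COUNT_CONTENT_PAGES:
            # Assumption: content pages exclude subcategories and files.
            return self._pages - self._subcats - self._files
        return self._pages
```

Callers that previously did the subtraction themselves would pass
Category.COUNT_CONTENT_PAGES instead.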
Also improve return type doc of getId() and getName()
Bug: T299350
Change-Id: I63c711ebc697c1a131a50910c854f956d4021254
In PHP 8.1 the default $flags argument to htmlspecialchars() has changed
from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401. This
breaks some tests.
I changed all the calls that break unit tests, and some others
based on a quick code review. A lot of callers just use the default for
convenience, and were already over-quoting, so the default should still
be good enough for them.
Change-Id: Ie9fbeae6f0417c6cf29dceaf429243a135f9fecb
Encourage localization and factor out common code by taking a message
key as the first argument to ::addWarningMsg() instead of a wikitext
string. This also plays nicer with Parsoid by separating out the
localization code from the parse.
Bug: T293515
Change-Id: I6a7c04c67ac586ab00d4edcbb3d09485a7794e23
This is a uniform mechanism to access a number of bespoke boolean
flags in ParserOutput. It allows extensibility in core (by adding new
field names to ParserOutputFlags) without exposing new getter/setter
methods to Parsoid. It replaces the ParserOutput::{get,set}Flag()
interface which (a) doesn't allow access to certain flags, and (b) is
typically called with a string rather than a constant, and (c) has a
very generic name. (Note that Parser::setOutputFlag() already called
these "output flags".)
In the future we might unify the representation so that we store
everything in $mFlags and don't have explicit properties in
ParserOutput, but those representation details should be invisible to
the clients of this API. (We might also use a proper enumeration
for ParserOutputFlags, when PHP supports this.)
There is some overlap with ParserOutput::{get,set}ExtensionData(), but
I've left those methods as-is because (a) they allow for non-boolean
data, unlike the *Flag() methods, and (b) it seems worthwhile to
distinguish properties set by extensions from properties used by core.
Code search:
https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos=
Bug: T292868
Change-Id: I39bc58d207836df6f328c54be9e3330719cebbeb
The ::getProperty() naming is too generic and doesn't clearly indicate
that these are "page properties" (which have their own table in the DB).
As part of refactoring a clean API out of ParserOutput which can be used
by Parsoid, clean up the naming here.
This patch only soft-deprecates the old name; there are a handful of
external users which need to be cleaned up before we hard-deprecate.
Bug: T287216
Change-Id: Ie963eea5aa0f0e984ced7c4dfa0fd65d57313cfa
Updates for the removal of the Revision class itself
and the various methods/hooks/variables removed in the
process, including:
- Update some documentation removing most references
to the Revision class and updating the MCR migration
notes to reflect the past tense for Revision methods.
- Change some capitalization from "Revision" to "revision"
to make it clear comments are about revisions in general,
not the Revision class in particular.
- Minor code tweaks including removing unused variables that
were around for the old hooks that were removed, and
removing the use of DeprecatablePropertyArray where no
longer needed for anything.
- Fix incorrect documentation for PageUpdater::getStatus(),
the status value changed a while ago to have revision-record
in addition to revision, and recently to only have the
revision-record, but ironically PageUpdater was never updated.
- Removed Parser::$mRevisionObject, used to be a Revision object
and was deprecated in 1.35, missed earlier because it was no
longer being set to Revision objects, always null.
- Add RevisionRecord typehints in DummyLinker to match those
in the corresponding Linker methods
This should be a no-op in terms of functionality.
Bug: T247143
Change-Id: I03bbb94fc29085855448780b1a5ad9063911ecc4
This has effectively been the case since 1.35; this just cleans up the
remaining code which assumed it still needed to explicitly call
Parser::firstCallInit() on a newly-constructed Parser.
Bug: T250444
Change-Id: I340947c721172f12ff413322b4283627c0b0b3a4
This is a micro-optimization of closure code to avoid binding the closure
to $this where it is not needed.
Created by I25a17fb22b6b669e817317a0f45051ae9c608208
Change-Id: I0ffc6200f6c6693d78a3151cb8cea7dce7c21653
The PHP function is_numeric() returns true for numbers like '123.456'
and even '1.23e45'. However, it returns false for (string)NAN,
(string)INF, and (string)-INF (which are "NAN", "INF" and "-INF"
respectively). We can return the appropriate unicode characters for
the infinities to localize these/make them universal, and allow a
localization of the "Not a Number" message.
Make the corresponding change to Language::parseFormattedNumber() so
that it remains the inverse operation to ::formatNum().
Accept "NAN"/"INF"/"-INF" only when they stand alone in the string;
in the legacy case where text and numbers are intermingled, split
only on "traditional" numbers; I think we're more likely to find
INF/NAN "innocently" in the middle of text than we are to find it
as a "real" number.
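The stand-alone handling can be sketched as follows, in Python rather
than PHP. The function names, the exact output characters and the
English message text are assumptions for illustration; ordinary number
formatting is left out entirely.

```python
INFINITY = "\u221e"             # assumed output for "INF"
NEG_INFINITY = "\u2212\u221e"   # assumed output for "-INF"
NAN_MESSAGE = "Not a Number"    # stand-in for the localizable message


def format_num(text: str) -> str:
    """Map the special values only when they are the whole string."""
    if text == "NAN":
        return NAN_MESSAGE
    if text == "INF":
        return INFINITY
    if text == "-INF":
        return NEG_INFINITY
    return text  # ordinary number formatting omitted from this sketch


def parse_formatted_number(text: str) -> str:
    """Inverse of format_num() for the special values."""
    if text == NAN_MESSAGE:
        return "NAN"
    if text == INFINITY:
        return "INF"
    if text == NEG_INFINITY:
        return "-INF"
    return text
```

Text that merely contains "INF" or "NAN", such as "INFO 12", passes
through untouched in this sketch.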
Change-Id: I3ff227a4aac66fc938182dc9fb8a7b743e94faca
NumberFormatter handles exponential notation fine, and is_numeric
recognizes it, but some of our checks on the {{formatnum}} parser
function were a bit too strict.
Bug: T237467
Change-Id: I20c51da1e58bffeefba18237815541c1b6ccb415
* Update digitGroupingPattern to match CLDR 31: new versions of CLDR have
digit grouping patterns with a decimal part. Update digitGroupingPattern
values in Message classes with this improved pattern.
Refer: http://unicode.org/reports/tr35/tr35-numbers.html
* Refer to the following chart for the decimal patterns.
http://www.unicode.org/cldr/charts/31/by_type/numbers.number_formatting_patterns.html
* Use the PHP NumberFormatter class for the commafy implementation, which
is available in PHP 7.
* Some tests needed updating to match the TR 35 spec
* The formatNum public method in Language.php is the preferred way to
use this feature. It does separator transformation and digit transformation
wherever applicable.
* Renamed the second param name for formatNum from noCommafy to noSeparators
* The commafy method is deprecated and formatNum is preferred. Practically,
we are not just adding commas, but separators according to the language.
Replaced some tests based on commafy methods with tests based on formatNum.
Note: The corresponding js implementation is not changed in this commit.
It would probably be a good idea to use globalize.js, which is also based
on the CLDR patterns.
Note: This patch preserves the existing off-by-one error in
$minimumGroupingDigits; T262500 will eventually fix this.
Bug: T167088
Co-Authored-By: C. Scott Ananian <cscott@cscott.net>
Change-Id: Ic721b9a91e78e4ef07040339d1006b7a90a910c0
The {{formatnum}} parser function can take anything, not just numeric
strings. We'd like to restrict Language::commafy() to operate only on
numeric strings, however (see T237467). Split the argument to the
{{formatnum}} parser function so that we only invoke
Language::commafy() on numeric strings. Add a tracking category so we
can (gradually) lint our content appropriately.
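The splitting idea can be sketched as follows, in Python rather than
PHP. The regular expression and the comma-grouping helper are
hypothetical stand-ins; the real patch defers to Language::commafy()
and its own notion of a numeric string.

```python
import re

# Hypothetical pattern for a "traditional" number: digits with an optional
# decimal part. The capturing group makes re.split() keep the matches.
NUMBER_RE = re.compile(r"(\d+(?:\.\d+)?)")


def group_digits(number: str) -> str:
    """Stand-in for commafy: group the integer part with commas."""
    integer, dot, fraction = number.partition(".")
    return f"{int(integer):,}" + dot + fraction


def formatnum(arg: str) -> str:
    """Format only the numeric pieces of the argument; pass text through."""
    pieces = NUMBER_RE.split(arg)
    # With one capturing group, odd indices hold the numeric matches.
    return "".join(
        group_digits(piece) if index % 2 else piece
        for index, piece in enumerate(pieces)
    )
```

Note that this sketch drops leading zeros in the integer part, which a
real implementation would need to decide how to handle.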
Bug: T237467
Change-Id: Ib6c832df1f69aa4579402701fad1f77e548291ee
This reverts commit c45ccd7ca8.
Reason for revert: Assuming that I6af7aeabbba fixes the real issue.
Change-Id: Ie1fc595a18e54f0c29b43740039cd7114d8e071e
This newly-added method returns `false` on error; the caller expects
it to return `null`.
Bug: T253725
Followup-To: If36b35391f7833a1aded8b5a0de706d44187d423
Change-Id: I6af7aeabbba9f95338497026fd08d9ae23f75c22
Reason for revert: issue arose again when deployed with wmf.34
Partial revert: keep the intended fix in Parser.php, revert
removal of fail-safe logic in CoreParserFunctions.php
This reverts commit 2712cb8330.
Bug: T253725
Change-Id: I06266ca8bd29520b2c8f86c430d0f1e2d5dd20c0
Parser::getRevisionRecordObject() returns `null` if the revision is
missing, but it invokes ParserOptions::getCurrentRevisionRecordCallback()
(ie, Parser::statelessFetchRevisionRecord() by default) which returns
`false` as its error condition.
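The mismatch and its fix at the boundary can be sketched as follows, in
Python rather than PHP. The REVISIONS table and both function names are
hypothetical; the point is only that the wrapper normalizes the
callback's False into None so callers see a single "missing" value.

```python
from typing import Callable, Optional, Union

# Hypothetical revision store for the sketch.
REVISIONS = {"Main Page": {"id": 42, "user": "Example"}}


def stateless_fetch(title: str) -> Union[dict, bool]:
    """Callback convention: returns False when the revision is missing."""
    return REVISIONS.get(title, False)


def get_revision_record(
    title: str,
    fetch: Callable[[str], Union[dict, bool]] = stateless_fetch,
) -> Optional[dict]:
    """Documented convention: returns None when the revision is missing."""
    revision = fetch(title)
    return None if revision is False else revision
```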
This reverts commit ae74a29af3, and instead
fixes the bug at its root.
Bug: T251952
Change-Id: If36b35391f7833a1aded8b5a0de706d44187d423
Private method, no need to worry about deprecation
Most of its uses in the class called Revision methods that were
identical to the RevisionRecord methods, and didn't need to change
to reflect the new type being returned.
Remove a use of Revision::getUserText
Bug: T249393
Bug: T250579
Change-Id: Ide0dcd01caee3d3388038e6f40edda25528f55d8