php internal functions like floor/round/ceil documented to return
float, most cases the result is used as int, added casts
Found by phan strict checks
Change-Id: I92daeb0f7be8a0566fd9258f66ed3aced9a7b792
Rename Sanitizer::removeHTMLtags() into an @internal method named
::internalRemoveHtmlTags() so that we can deprecate external use.
Code search:
https://codesearch.wmcloud.org/deployed/?q=removeHTMLtags&i=nope&files=&excludeFiles=&repos=
Followup-To: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
Depends-On: Iaca83ed06e9c61d8366579cd2283cba653c82319
Depends-On: I1963bfe9a99198ea02ca482a5769467ce806cd58
Depends-On: I83923d8b38d33f3638cd53958dd10f257ec21f7c
Depends-On: I018b34bb5f6e113056da9b04cc72d4318422adce
Change-Id: I202826f8b27519f7be89643e24eda47a6e3fc9f6
The existing Sanitizer::removeHTMLtags() method, in addition to having
dodgy capitalization, uses regular expressions to parse the HTML.
That produces corner cases like T298401 and T67747 and is not guaranteed
to yield balanced or well-formed HTML.
Instead, introduce and use a new Sanitizer::removeSomeTags() method
which is guaranteed to always return balanced and well-formed HTML.
Note that Sanitizer::removeHTMLtags()/::removeSomeTags() take a callback
argument which (as far as I can tell) is never used outside core. Mark
that argument as @internal, and clean up the version used by
::removeSomeTags().
Use the new ::removeSomeTags() method in the two places where
DISPLAYTITLE is handled (following up on T67747). The use by the
legacy parser is more difficult to replace (and would have a
performace cost), so leave the old ::removeHTMLtags() method in place
for that call site for now: when the legacy parser is replaced by
Parsoid the need for the old ::removeHTMLtags() will go away. In a
follow-up patch we'll rename ::removeHTMLtags() and mark it @internal
so that we can deprecate ::removeHTMLtags() for external use.
Some benchmarking code added. On my machine, with PHP 7.4, the new
method tidies short 30-character title strings at a rate of about
6764/s while the tidy-based method being replaced here managed 6384/s.
Sanitizer::removeHTMLtags blazes through short strings 20x faster
(120,915/s); some of this difference is due to the set up cost of
creating the tag whitelist and the Remex pipeline, so further
optimizations could doubtless be done if Sanitizer::removeSomeTags()
is more widely used.
Bug: T299722
Bug: T67747
Change-Id: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
The old ParserOutput::getProperty() method returned `false` when a property
was missing. This requires callers to use the `?:` syntax to supply default
values, which then causes any falsey value to be treated as missing.
So, for example, setting the defaultsort to '0' will cause the default
sort to be ignored.
Modern php convention is to use `null` for missing values, and the `??`
syntax is a better/more restrictive alternative to `?:`.
We renamed `ParserOutput::getProperty()` to `::getPageProperty()` in
1.38 (Ie963eea5aa0f0e984ced7c4dfa0fd65d57313cfa/T287216) but kept the
return value convention. Before this actually makes it into a 1.38
release, take the opportunity to fix the return value for the new
`ParserOutput::getPageProperty()` method to return `null` when the
property is missing.
We need to do some temporary workarounds to the places we'd
already swapped over to use the new `::getPageProperty()` method
to allow them to handle either `false` or `null` as a return value;
we'll clean that up once this is merged.
Code search:
https://codesearch.wmcloud.org/deployed/?q=-%3EgetPageProperty%5C%28|T301915&i=nope&files=&excludeFiles=&repos=
Bug: T301915
Depends-On: I3f11ce604970e47b41fc1c123792df8c3045626f
Depends-On: Ie7533f49fe4cad01ebfda29760d23c61e9867b10
Depends-On: Ic5c09f5caa4c897bc553c614fbae9cee159566a2
Depends-On: I0278b2eafd90e77e4fee41c45a1165fb79ddf47e
Depends-On: I383abb6b7dc5e96c0061af13957609f6e31a1065
Depends-On: I79f9f4078e415284af29b15047bafd1c823d7f5b
Depends-On: I02276c48c49f5d2d241a69eb0a6cdf439b572d8b
Depends-On: I71628661b4539a4e35ae32846e719f92bcf782e0
Depends-On: I7e215cb43de0ce150a6bcc00f92481dcdcfed383
Change-Id: Iaa25c390118d2db2b6578cdd558f2defd5351d15
There is a common and reasonable need for longer lines in tests.
The nudge for shorter lines doesn't seem valuable here. The natural
breaks will likely still fall in 80-100 given the enforced practice
for non-test code, e.g. whether through habit, or 80-100 column markers
in text editors, or the finite width of diff and code review
interfaces.
Change-Id: I879479e13551789a67624ce66f0946d2f185e6ee
Category keys are supposed to be non-null strings.
The test cases use bogus integer values, which causes issues when
refactoring more strictly enforces validity checks on category sort
key values.
Change-Id: If2937a694ba6bd4c522336f33aa58d40949b5a54
Valid values for ParserCache::$mIndexPolicy are '', 'noindex', and 'index'.
The test cases use the bogus value 'policy1', which causes issues when
refactoring more strictly enforces validity checks on index policy
values.
Change-Id: I2d00ff4e3ded043d18942c8482a39fac14ec60bc
Expected value is the first parameter to assertSame() or assertEquals().
And turn to use assertCount() for some assertions aginst count of array.
Based on code search `assert(?:Same|Equals)\(.+,.+expected` and I look
through files roughly, so some assertions that don't contains 'expected'
are also fixed. In the meantime, some assertions that I am not clear
about are not touched.
Change-Id: I75798b60d29fd19b33f4fdf34ed3c788db420d01
Soft-deprecate the use of ::setExtensionData() to destructively update
the value stored under a single key. Add the new
::appendExtensionData() method to use where multiple values are
desired. This accomodates the asynchronous and incremental parsing
goals on the Parsoid roadmap.
Bug: T300981
Change-Id: I2dea4ba71ea506428854a9983c1abd906b2efd5f
Deprecate ParserOutput::addJsConfigVars() and add setter methods which
better ensure that the ParserOutput contents are independent of parse
order. This accomodates the asynchronous and incremental parsing goals
on the Parsoid roadmap.
Bug: T300307
Change-Id: I4f08d1098da211f7bf5c43c08c620de224cbf37f
Loops ServiceOptions through to CoreParserFunctions and CoreTagHooks to
avoid access to the main config from static methods.
Bug: T294739
Change-Id: Ia6c97f2d0952964c2ad6189f8053ad127589b37c
This reverts commit 2bcb3fe567.
Reason for revert: this is a good change,
just needed more work to not break CI
Change-Id: I23768bee242e3cf81b1493a740cf070e7ad1e224
This does not move the actual limit report data into
ParserOptions yet, that should be done separately
given that it will require serialization changes.
Let's get this change settled first before messing
with serialization.
This unifies canonical and non-canonical ParserOptions,
so ParserCache can now be used with both. It is hard
to say how this will affect the ParserCache capacity,
so we should monitor it after releasing this.
Change-Id: I154c0a77a5b0287b5572614d56339fb57ac56c33
* Do not store table of contents in parser output
* Instead inject table of contents via strpos where needed
inside Article based on Skin "toc" option
* Use <mw:tocplace> as a TOC placeholder; for Parsoid compatibility
this will be replaced with a <meta> tag in a followup patch.
Bug: T287767
Change-Id: I44045b3b9e78e7ab793da3f37e3c0dbc91cd7d39
Encourage localization and factor out common code by taking a message
key as the first argument to ::addWarningMsg() instead of a wikitext
string. This also plays nicer with Parsoid by separating out the
localization code from the parse.
Bug: T293515
Change-Id: I6a7c04c67ac586ab00d4edcbb3d09485a7794e23
This moves the implementation of ParserOutput::addTrackingCategory()
to the TrackingCategories class as a non-static method. This makes
invocation from ParserOutput awkward, but when invoking as
Parser::addTrackingCategory() all the necessary services are
available. As a result, we've also soft-deprecated
ParserOutput::addTrackingCategory(); new users should use the
TrackingCategories::addTrackingCategory() method, or else
Parser::addTrackingCategory() if the parser object is available.
The Parser class is already kind of bloated as it is (alas), but there
aren't too many callsites which invoke
ParserOutput::addTrackingCategory() and don't have the corresponding
Parser object handy; see:
https://codesearch.wmcloud.org/search/?q=%5BOo%5Dutput%28%5C%28%5C%29%29%3F-%3EaddTrackingCategory%5C%28&i=nope&files=&excludeFiles=&repos=
Change-Id: I697ce188a912e445a6a748121575548e79aabac6
Make ::setCategory() consistent with the corresponding singular method,
which is ::addCategory(), not ::addCategoryLink(). Also, don't return
a value.
This renaming is in preparation for factoring out a write-only base
class from ParserOutput suitable to be used by Parsoid.
Note that OutputPage does distinguish a 'category link' from a
'category list', and there are separate OutputPage::getCategories()
and OutputPage::getCategoryLinks() methods. However, the category
map in ParserOutput isn't exactly the same as either of these:
it's actually a map (or list of pairs) of category name to sort key.
Rename ParserOutput::getCategoryLinks() to ::getCategoryNames()
in order to clarify that the concept involved is not the same as
the OutputPage "category links" methods.
Code search:
https://codesearch.wmcloud.org/deployed/?q=-%3E(get%7Cset)CategoryLinks%5C(&i=nope&files=&excludeFiles=&repos=
(Note that many of the code search matches are for the methods in
OutputPage, which we are trying to disambiguate here.)
Bug: T287216
Change-Id: Idb383d3d9ef7b76f8a0208a057a3cb8c639465c9
Usages of Title in the public interface of ParsrOutput are being
replaced with LinkTarget or PageReference, as appropriate.
Change-Id: I571cf18fd7e666a59ef3947b09c2e75357bcac04
- Added a test where ParserOutput objects with CacheTime
properties set are unserialized from previous versions.
- Generate new serialization tests for 1.38
Now all serialization in production is JSON, so changing
property visibility shouldn't affect ParserCache.
Bug: T263851
Depends-On: I283340ff559420ceee8f286ba3ef202c01206a23
Change-Id: I70d6feb1c995a0a0f763b21261141ae8ee6dc570
The ::getProperty() naming is too generic and doesn't clearly indicate
that these are "page properties" (which have their own table in the DB).
As part of refactoring a clean API out of ParserOutput which can be used
by Parsoid, clean up the naming here.
Soft-deprecation in this patch, there are a handful of external users
which need to be cleaned up before we hard-deprecate.
Bug: T287216
Change-Id: Ie963eea5aa0f0e984ced7c4dfa0fd65d57313cfa
This name is consist with the rest of the setter and getter methods
in ParserOutput. Renamed the methods in OutputPage, ImageHistoryList,
ImageHistoryPseudoPager, and ContribsPager as well for consistency;
it also makes chasing down lingering references in codesearch easier.
Soft-deprecated the old name for 1.38. Hard-deprecation will follow,
but there are a number of users in production that should be chased
down first.
Code search:
https://codesearch.https://codesearch.wmcloud.org/deployed/?q=(allow%7Cprevent)Clickjacking&i=nope&files=&excludeFiles=&repos=
Bug: T287216
Change-Id: I9822c60c180d204bd30cb4447a1120155d456da4
New option 'absoluteURLs' was added to getText method
of the ParserOutput object that replaces all links
in the page HTML with absolute URLs.
Removing the action=render special case from Title
seems safe cause we will end up replacing the result
with absolute URL if we're in a render action no matter
where Title::getLocalUrl was called from.
This change is safely revertable from the perspective
of ParserCache.
Bug: T263581
Change-Id: Id660e1026192f40181587199d3418568f0fdb6d3
If the default is not set for a lazy loaded option,
canonical parser output key ends up being 'lazy_option=default',
because when computing the keys we exclude defaults.
If on the other hand we register a default for a lazy option,
it does not get loaded, since in lazyLoadOption we believe
the default is already the loaded value.
Change-Id: I92b3e18fabef4eecac2ec2a4844f1be2716e5d89
Needed-By: I3bce04684070ad306685dabbc51267def25773cd