We already have the ServiceOptions injected into the constructor, so let's
make sure all the settings are added to the constant and use that
instead of accessing the full config via the global service container.
Except for static methods, other cases can be easily replaced as done
in this patch.
Change-Id: Id173775ea48c302cdfa698db33a0b75d6da76652
This script was deleted in Ib845f1bc2cd5c452a998b01612d45fe59e8ffc37,
but it seems some references to it were forgotten.
@note: The methods deleted alongside it are private, so there is no need
for them to go through the deprecation process.
Change-Id: Ib4df56e225b44f5a69be1b635d40537863a971e3
* Explicitly set wrapSections to true. This has no significant
impact since it defaults to true within Parsoid.
* 'pageName' and 'prefix' removed from ParsoidOutputAccess since
they are not needed / used in Parsoid.
* 'logLinterData' needs to be set in the ParserOutputAccess paths.
* A bunch of documentation FIXMEs as I was digging through the code.
* Record a FIXME that ParsoidOutputAccess and ParsoidParser (which
is used in the ParserOutputAccess code path) differ in how they
handle the language value (whether the default value of the title /
page or the pageLanguageOverride from the REST API). ParsoidParser
computes a preferred variant whereas ParsoidOutputAccess right now
does NOT do that. So, as part of the switchover to ParserOutputAccess,
we will need to set disableContentConversion in ParserOptions.
That will happen in a later patch.
Bug: T332931
Change-Id: I7326ae3452a7d496a57f5c4ff2ddeaf0daa7ab70
StubUserLang was meant to avoid the cost of looking up the user
preferences on requests which don't need it. There's no point in using
it if you are going to unconditionally call a method on the resulting
object.
StubUserLang proxies to RequestContext::getLanguage() via __call(),
which has a cost. Originally this cost was avoided on subsequent calls
by overwriting $wgLang, but this mechanism is not effective if you retain
a reference to the StubUserLang.
Removing the potential for Title::getPageLanguage() to return
StubUserLang simplifies the type declarations for methods that call it.
Bug: T160814
Change-Id: I12ad75c2496ca727580aac55e860178d15febb6e
Follow-up of I348840ef405e1370cc0c17d69051bce30153c9c0 for the gallery part.
Bug: T205040
Bug: T310453
Change-Id: Ia0c699675d40f6effbe359818aca3278c56042e3
The Hooks class contains deprecated functions and the whole class is
going to get removed, so remove the convenience function and inline the
code.
Bug: T335536
Change-Id: I8ef3468a64a0199996f26ef293543fcacdf2797f
Just a bit of cleanup to simplify the logic around showing/suppressing
the TOC.
Change-Id: I99f1f29bf067df2ea3f9f235af7ce054d7e4af68
Followup-To: Ib41e6e4926cb752826ad75d10e8692125fc0b064
Rather than suppress the TOCData in ParserOutput when __NOTOC__ is used,
set a new parser output flag, NO_TOC, since some clients want to know
whether there are sections present on the page irrespective of whether
the UX for the Table Of Contents should be displayed/suppressed.
Added OutputPage::getOutputFlag() as an @internal method for the
moment; eventually we should use the same object to represent
metadata in ParserOutput and OutputPage (T301020).
Bug: T332243
Followup-To: Ife2126ace95ac4d9ec44f6374c63d8fc995cf034
Followup-To: Iea6426336f93c053a5977768f0785cdb46daf5bf
Change-Id: Ib41e6e4926cb752826ad75d10e8692125fc0b064
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.
Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.
Do the same with the corresponding TitleFactory methods.
Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).
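The difference between the two families can be sketched in a few lines.
This is an illustration in Python type hints, not the MediaWiki code: the
PageIdentity and Title classes below are hypothetical stand-ins, and only
the null handling mirrors what is described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageIdentity:
    """Hypothetical stand-in for a page identity value object."""
    namespace: int
    dbkey: str


@dataclass(frozen=True)
class Title:
    """Hypothetical stand-in for Title; only the two factories matter here."""
    namespace: int
    dbkey: str

    @staticmethod
    def cast_from_page_identity(page: Optional[PageIdentity]) -> Optional["Title"]:
        # Null-tolerant variant: None in, None out. The return type has to
        # be Optional even when the caller knows the argument is set.
        if page is None:
            return None
        return Title.new_from_page_identity(page)

    @staticmethod
    def new_from_page_identity(page: PageIdentity) -> "Title":
        # Strict variant: the parameter is required, so the result is never
        # None and callers need no type check or suppression.
        return Title(page.namespace, page.dbkey)
```

With the strict variant a caller can write
Title.new_from_page_identity(page).dbkey directly, whereas the cast variant
forces a None check before any method call on the result.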
Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
This reverts commit 42aa5f9481.
Reason for revert: Caused T334753, the proposed fix may need more time for review. Let's revert for now, before the train cut.
Bug: T310453
Bug: T334753
Change-Id: I790604eef00491b7f2a921fb3423a2f727f6593b
Consolidate cache TTL handling within CoreMagicVariables.
Make the TTL account for how many seconds away the value is from changing.
For example, CURRENTHOUR should change soon after the next hour is reached.
There is a minimum adjustment TTL to avoid parser-after-save delays.
This allows for longer caching in most cases, as well as more up-to-date
rendering when the hour/day/week/year is about to change. Previously, there
were blind TTLs, which were either way too pessimistic or way too generous.
This commit does not change the CURRENTTIME, CURRENTTIMESTAMP, LOCALTIME,
and LOCALTIMESTAMP words, since there is no reasonable way to cache output
while keeping them up-to-date.
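As a rough sketch of the idea for CURRENTHOUR (Python, illustrative only):
the TTL is the number of seconds until the value changes, clamped to a
floor. The function names and the 15-second floor are assumptions made for
this example; the actual minimum is not stated here.

```python
from datetime import datetime, timedelta

# Assumed floor for illustration; the real minimum is not given above.
MIN_TTL_SECONDS = 15


def seconds_until_next_hour(now: datetime) -> int:
    """Whole seconds from `now` until the hour value rolls over."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return int((next_hour - now).total_seconds())


def ttl_for_current_hour(now: datetime) -> int:
    """TTL that expires when the hour changes, but never below the floor."""
    return max(MIN_TTL_SECONDS, seconds_until_next_hour(now))
```

At 10:15:00 this yields a TTL of 2700 seconds; at 10:59:58 the raw value
would be 2 seconds, so the floor applies instead.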
Bug: T320668
Change-Id: I9acb42b0d9ff67798a1624cbf9c7cac99c8fbe2f
* Unnecessary regex modifier. I agree with this inspection which flags
/s modifiers on regexes that don't use a dot.
* Property declared dynamically.
* Unused local variable. But it's acceptable for an unused local
variable to take the return value of a method under test, when it is
being tested for its side-effects. And it's acceptable for an unused
local variable to document unused list expansion elements, or the
nature of array keys in a foreach.
Change-Id: I067b5b45dd1138c00e7269b66d3d1385f202fe7f
* Inappropriate @inheritDoc usage. Arguably all @inheritDoc is
inappropriate but these are the ones PHPStorm flags as misleading
due to the method not being inherited.
* Doc comment type does not match actual argument/return type.
* I replaced "@return void|never" with "@return void" since never means
never, it doesn't make sense for it to be conditional. If a method
can return (even if that is unlikely) then @return contains the type
that it returns. "@return never" means that there is no such type
because the method never returns.
* Incomplete/partial/broken doc tags
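The void/never distinction has a close analogue in Python's type hints,
shown here only to illustrate the reasoning (the function names are made
up): a function that can return is annotated with what it returns, and
NoReturn is reserved for functions that never return at all.

```python
from typing import NoReturn


def abort(message: str) -> NoReturn:
    """Never returns: every path raises."""
    raise SystemExit(message)


def abort_if(condition: bool, message: str) -> None:
    """May return, so the declared type is what it returns (None)."""
    if condition:
        abort(message)
```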
Change-Id: Ide86bd6d2b44387f37d234c2b059d6fbc42ec962
* Illegal string offset and invalid argument supplied to foreach, due to incorrect type information
* Array internal pointer reset is unnecessary
* $hookData unused since MW 1.35 due to incomplete revert
* array_push() with single element
* Unnecessary sprintf()
* for loop can be replaced with str_repeat()
* preg_replace() can be replaced with rtrim()
* array_values() call is redundant
* Unnecessary cast to string
* Unnecessary ternary. Often the result relies on short-circuit evaluation, but I find it more readable nonetheless.
Change-Id: I4c45bdb59b51b243fa96286bec8b58deb097d707
The TOC used to be language-converted in ParserOutput::getText(), but
it wasn't possible to apply custom rules defined in the wikitext
article body at ::getText() time. Remove the various hacks that we'd
added in an attempt to do so, which were made unnecessary by
I321cd31dae64bbf845d53282e5d28a55bc4ec319.
Bug: T306862
Change-Id: Ib12cd02e9ade91d5794462e8833f2aa3b45a51f2
Provide a way for backend code to determine the primary language of a
ParserOutput, eg for setting the Content-Language header of an API
response.
This is read-only and backed by extension data at the moment for
transition purposes; if this API sticks we'll graduate it to a
"real" property in the future, with appropriate serialization
to/from JSON (T303329).
Similarly, this patch only includes the most basic code to handle
the various ParserOutput merge cases in
ParserOutput::merge{Internal,Html,Tracking}MetaDataFrom(),
ParserOutput::collectMetadata(), and
OutputPage::addParserOutput{Content,Metadata,Text,}(); mostly
inherited from the fact that the storage is backed by extension
data at the moment.
Generally only the "top-level" parser output gets to set the
primary language; we'll presumably need to ensure that the
language is consistent during merge.
Change-Id: I767daba22805a877d9b806fd77334e508902844b
The LanguageConverter::convert()/::convertTo() methods clear the
converted title and reset other (less important) bits of
LanguageConverter state. Add an optional parameter in order
to skip this reset.
(The LanguageConverter::translate() methods are available which
don't reset LanguageConverter state, but they also don't process
embedded language converter markup. Since headings can contain
embedded markup, the ::translate() methods aren't appropriate.)
Bug: T306862
Bug: T331316
Change-Id: Ifb2745e45974755ba5a6068c13e84be6c4e3f329
In 24949480eb (Oct 2021) injection of
the Table of Contents was moved from Parser to
ParserOutput::getText(); that is, from parse time to "postprocess text
possibly fetched from the cache" time. Unfortunately, this meant that
language conversion wasn't done on the table of contents (!), for
either traditional skins or the vector-2022 skin. This was fixed for
traditional skins by 059e62cde6 (Nov
2021), later amended by 0955046ca5 (Mar
2022), which added explicit language conversion to the TOC injection
process in ParserOutput::getText(). This fix was still not complete,
however, since editor-defined custom language-conversion rules defined
in the article body were no longer available to the language converter
when conversion was done in ParserOutput::getText(); the ToC title was
also being double-converted. Further, neither of these short-term
fixes addressed the output of ParserOutput::getSections() (now
ParserOutput::getTOCData()) which was used by vector-2022 to generate
the ToC in the sidebar and which remained entirely unconverted.
With 439656e019 (Jan 2023), we started
using the ::getSections()/::getTOCData() output for main article text
as well, but we kept the previous hack which post-converted the
generated HTML. This kept old skins at parity with the post-Oct-2021
status, but also didn't address the conversion issue for vector-2022.
The solution here is to perform language conversion on the ToC lines
at parse time along with the rest of the language conversion, and
store *converted* headings in TOCData. This has a number of side
effects:
1. The ToC information array available via the action API
is now language converted. This is *probably* what you wanted in the
first place, but could potentially be disruptive.
2. The ToC is consistently converted with the full set of
editor-defined custom conversion rules. Before Oct 2021, the ToC was
converted using the set of custom conversion rules *active at the
point at which the ToC was inserted* (which was usually near the
beginning of the article). When all conversion rules appear at the
very top of the article (best practice!):
-{en:Foo; en-x-piglatin:Bar;}-
Lead section text
== Introduction ==
== Foo ==
There should be no difference between pre-Oct 2021 behavior and the
behavior after this patch: in both cases the rule defined in the
article body will be applied both to the heading and to the TOC, and
they will be consistent. (After Oct 2021 and before this patch, Foo
would be converted in the heading but not in the table of contents.)
But in cases where conversion rules are defined after the
TOC insertion point, the section heading as it appears in the body
text could appear different from the section heading as it appears in
the ToC. For example, if you defined a conversion rule just before
using a term in a heading:
== Introduction ==
-{en:Foo; en-x-piglatin:Bar;}-
== Foo ==
Before Oct 2021, this rule would be applied to the heading, but not to
the TOC (because the TOC insertion point was before the rule
definition). This would also be the behavior before this patch (since
rules defined in the article body are currently not applied at all).
After this patch, the rule will be applied to both the heading and the
TOC (because the rule application location is effectively "at the very
end of the article"). In the rare cases when rules are not defined in
glossaries at the top of the article, this type of usage (definition
immediately preceding first use) is expected to be the most common
and the behavior after this patch is more correct.
But alternatively, if you defined a conversion rule *after* using
the term in a heading:
== Introduction ==
== Foo ==
-{en:Foo; en-x-piglatin:Bar;}-
Before Oct 2021, this rule wouldn't be applied to the heading *or* the
TOC. Before this patch, this would also be the case (because rules
defined in the article body are not applied at all). After this
patch, the rule will be applied to the ToC but not the heading, since
the application point for the TOC is effectively at the end of the
article. This inconsistency is probably not desirable, but this case
is expected to be rare, and (assuming the editor intended 'Foo' to be
unconverted) the editor can work around the inconsistency by
explicitly protecting 'Foo' from conversion:
== -{Foo}- ==
-{en:Foo; en-x-piglatin:Bar;}-
And if the editor /intended/ Foo to be converted, the rule definition
should be moved earlier in the article. Again, putting all rules at
the top of the article is the preferred style, and works better with
the glossary style used by the zhwiki community (see also
https://www.mediawiki.org/wiki/Requests_for_comment/Scoped_language_converter
).
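The three placements discussed above can be modelled with a toy simulation
(Python, illustrative only). It is not the LanguageConverter: a rule here
simply replaces a heading that matches it exactly, body headings see only
the rules defined before them, and ToC entries see every rule, which models
an application point "at the very end of the article". The render() helper
and the tuple layout are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# A toy article is a list of ("rule", source, target) and ("heading", text).
Item = Tuple[str, ...]


def render(article: List[Item]) -> Tuple[List[str], List[str]]:
    """Return (body headings, ToC entries) for a toy article."""
    rules: Dict[str, str] = {}
    raw_headings: List[str] = []
    body: List[str] = []
    for item in article:
        if item[0] == "rule":
            rules[item[1]] = item[2]
        else:
            raw_headings.append(item[1])
            # Body headings are converted with the rules seen so far.
            body.append(rules.get(item[1], item[1]))
    # ToC entries are converted with the full set of rules.
    toc = [rules.get(heading, heading) for heading in raw_headings]
    return body, toc


rule = ("rule", "Foo", "Bar")
intro, foo = ("heading", "Introduction"), ("heading", "Foo")

rules_at_top = render([rule, intro, foo])     # heading and ToC agree
rule_before_use = render([intro, rule, foo])  # heading and ToC agree
rule_after_use = render([intro, foo, rule])   # ToC converted, heading not
```

Only the last case produces a mismatch: the body keeps 'Foo' while the ToC
shows 'Bar'.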
Bug: T306862
Depends-On: I0c9c9fec920f7cb028d935e552a8f11475a23ba7
Change-Id: I321cd31dae64bbf845d53282e5d28a55bc4ec319
The offset is actually measured in codepoints, not in bytes.
This field is meant to replace the "byteoffset" field since the
current naming is misleading, and we already have misused it in
deployed extensions. Support was added in Parsoid in
Ide436dca5a609c866da3c63049723243b8242c34 and the patch depends
on a version of Parsoid with that patch in mediawiki-vendor.
Parsoid still uses the old name in the ::toLegacy() serialization, and
thus in the action API, but that method will eventually be deprecated
(T327439, T330232).
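To see why the old name is misleading, compare the two measurements on a
string with non-ASCII characters. This is a standalone Python illustration
with a made-up sample string, not code from this patch.

```python
# "Füße == Heading", written with escapes to keep the code points explicit.
text = "F\u00fc\u00dfe == Heading"
marker = "=="

# str.index counts Unicode code points.
codepoint_offset = text.index(marker)

# The same position measured in UTF-8 bytes is larger, because
# "\u00fc" and "\u00df" each take two bytes.
byte_offset = len(text[:codepoint_offset].encode("utf-8"))
```

Here the codepoint offset is 5 and the byte offset is 7, so a consumer that
treats one as the other will point at the wrong place.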
Bug: T319141
Depends-On: Iacdd9a11b79bbafb9cfe9568c889ed721a137833
Depends-On: Ide436dca5a609c866da3c63049723243b8242c34
Change-Id: Ie618a964574780d2ad72192483b399407c7a0bbe
Ensure that TOCData is non-null if there is a valid table of contents
for the page -- that is, it is not suppressed (due to non-wikitext
content) and the editor hasn't explicitly suppressed it, for
example by using __NOTOC__. (Note that __FORCETOC__ and __TOC__
both intentionally override __NOTOC__, and the TOCData will be
non-null if those are set, regardless of whether __NOTOC__ is
present.)
This gives skins the information they need to make their own
decisions about whether the table of contents is "big enough"
to be interesting, without forcing them also to reimplement the
__NOTOC__ logic.
Note that the SHOW_TOC parser output flag is provided for
legacy compatibility; it is set only when the TOC is "big enough"
under the legacy skin, which injects the TOC into the article HTML.
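The split of responsibilities can be sketched as follows (Python,
illustrative only). Both function names are hypothetical, and the sketch
assumes that the non-wikitext suppression takes precedence over the magic
words, which is not spelled out above.

```python
def has_toc_data(is_wikitext: bool, notoc: bool, forcetoc: bool, toc: bool) -> bool:
    """Whether TOC data would be present under the rules described above."""
    if not is_wikitext:
        # Assumption: suppression for non-wikitext content always wins.
        return False
    if forcetoc or toc:
        # __FORCETOC__ and __TOC__ override __NOTOC__.
        return True
    return not notoc


def skin_shows_toc(toc_available: bool, section_count: int, min_sections: int) -> bool:
    """A skin's own "big enough" decision; min_sections is skin-specific."""
    return toc_available and section_count >= min_sections
```

The skin only supplies its own threshold; the __NOTOC__ handling stays in
one place.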
Change-Id: Ife2126ace95ac4d9ec44f6374c63d8fc995cf034
Before 1.39 we used <mw:toc> and in 1.39 we switched to <mw:tocplace/>
(commit 24949480eb). This was changed to a <meta> tag in 1.40 (commits
0b10563895 and fa8646ca7b) and the old content has long since expired
from the ParserCache. Clean up the old ParserCache transition code.
Change-Id: I3254d0acba31e107b50767797a2b0ad28aba59ee
Matches Parsoid's current output.
Not a canonical source, but this site says:
https://help.siteimprove.com/support/solutions/articles/80000863904-accessibility-image-alt-text-best-practices
> If no alt attribute is present, the screen reader will read the file
> name for the image instead, which can be a major distraction to those
> using screen-reading technology.
So, reading the filename seems to be a default behaviour anyways and
using the filename doesn't seem to add any benefit. However, placing
it preempts any improvements that might happen in screen reading
technology since the screen reader would likely prefer the alt attribute
to any magic it tries to do in its absence (like machine vision
processing of the image).
An alternative proposal would be to strip off the file extension as in
I218e5565816b7643f3b85083031644e3e4749a5c and implement the same in
Parsoid.
Longer term plans that actually address the issue here are in T325955
and T63566.
Bug: T326041
Bug: T63566
Bug: T325955
Depends-On: I7b1f07190e8eaca5cbda38d9ce366aa60041ab81
Depends-On: I9dd37f70be8163df76c154f175ef50134fb811d8
Depends-On: If9cdabdfac26656272fcf3b4aaae0576aaed1346
Change-Id: If1e55feb86ce8b32f772e3b78bc9d29f122f4d58
From https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#attr-alt
> Setting this attribute to an empty string (alt="") indicates that
> this image is not a key part of the content (it's decoration or a
> tracking pixel), and that non-visual browsers may omit it from
> rendering. Visual browsers will also hide the broken image icon if the
> alt is empty and the image failed to display.
This matches Parsoid's current output as well.
The parserTest "Image: empty alt attribute (T50924)" asserts that the
empty media option (|alt=|) is still respected.
Depends-On: Id6ad0b922f8384f2bbf08e1032b0197aa3136233
Change-Id: I8d059852f472b40b4f4f80a8fa12230f6f4f13ad
* The removed code has been extracted to Parsoid's TOCData class
to enable reuse and avoid code duplication.
Change-Id: Id17cf037b3a2bd4f9de0a12ebb382f3974244091
* Rather than computing TOC HTML in Parser and setting it in
ParserOutput, compute it on demand based on section metadata.
This will let Parsoid set section metadata in ParserOutput
and have the TOC generated automatically.
* This required fixing some "bugs" in Linker's generateTOC
which didn't properly close tags and relied on Tidy to fix
up unclosed li and ul tags.
* This patch relies on converting section metadata objects to
array objects, but Linker::generateTOC could be converted to
use TOC data instead.
* Since TOC generation is now moved to getText(), this is done
post-PC load and this eliminates the parser cache split on
user language for TOC heading localization.
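For illustration, generating a nested list from section metadata with every
tag explicitly closed could look like the sketch below. It is written in
Python rather than PHP, is not Linker::generateTOC, and assumes that ToC
levels start at 1 and never increase by more than one step between
consecutive sections; generate_toc and its input shape are hypothetical.

```python
from html import escape
from typing import List, Tuple


def generate_toc(sections: List[Tuple[int, str]]) -> str:
    """Build nested <ul>/<li> markup from (toc_level, heading) pairs."""
    out: List[str] = []
    depth = 0
    for level, heading in sections:
        if level > depth:
            # Open a nested list inside the still-open <li>.
            out.append("<ul>" * (level - depth))
        else:
            # Close the previous item, plus any deeper lists and the
            # items that contain them.
            out.append("</li>" + "</ul></li>" * (depth - level))
        out.append("<li>" + escape(heading))
        depth = level
    if depth:
        # Close whatever is still open at the end.
        out.append("</li>" + "</ul></li>" * (depth - 1) + "</ul>")
    return "".join(out)
```

Because the markup is derived from metadata at output time, nothing relies
on a later cleanup pass to close the li and ul tags.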
Bug: T293513
Change-Id: Ief1bba326d3612b40930440c872a61abadffab10