Several methods on the Content interface had been deprecated in 1.35 and
1.36 in favor of corresponding methods on the ContentHandler base class,
to allow implementations of these methods to use proper dependency
injection. This patch removes backwards compatibility support for
subclasses that were overriding these methods.
Change-Id: I8e474a1cc4dec760a7f6db25e4b313392f3723b1
* See inline comment in WikitextContentHandler that explains
what is happening here.
* Added a new WikitextContentHandlerIntegration test to verify
this expectation (which fails without the change in this patch).
Bug: T349087
Change-Id: I072ddf89562fe79bad47d741feb5788430e05bb6
WikitextContent should be a value object that does not need access to
services. For this reason, getRedirectTargetAndText has to be moved to
WikitextContentHandler.
Change-Id: Ia6595caf7e913ef580709a4d076aa2cc9dbaacef
The use of Article::getRedirectHeaderHtml() has been discouraged for a
while, since WikitextContentHandler can (should) be used to insert the
redirect header. Further, since I20db09619999919bfeda997d79561d21e3bf8718
the header should be added as an extension data property instead of
directly concatenated to the HTML. Regardless, this functionality
logically should live in LinkRenderer.
Change-Id: I4d0de0e72473ae039dca420a2733bc746d8c2951
Insert the redirect handler as part of the post-processing done in
ParserOutput::getText(). This ensures that it does not corrupt
edit-mode Parsoid output.
Depends-On: Ia6e390d849830993a6b97004f099773cfd4fa54b
Change-Id: I20db09619999919bfeda997d79561d21e3bf8718
Start using `class-string<ClassName>` as a type hint in a few places
where the information is really helpful. A lot of tools are able to
understand this already.
Change-Id: Ide45cae8c7875e664fab1155c6c720e515d8d811
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.
Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.
Do the same with the corresponding TitleFactory methods.
Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).
Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
This is an initial quick-and-dirty implementation. The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.
Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.
This patch also addresses T300325 by ensuring the the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing. This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.
We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata. Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for parsoid requests and
vice-versa. T330677 is a follow up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stop gap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.
Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011
file_text has previously been set to false or an empty array,
depending on context, when it wasn't available. As part of normalizing
the set of types used in the search index default the value to null
and only set it if the media handler is able to extract text content
from it.
Setting the value to null when not available is done to clear out
historical false/empty array values in storage. Perhaps we need to
think more in the future about if/when default values should be
provided to search updates, everything is a bit ad-hoc today.
Bug: T322327
Change-Id: I1367154b17d9e69c9373e7efee384838aa3b51e8
Added an optional RevisionRecord param to:
- ContentHandler::getParserOutputForIndexing
- ContentHandler::getDataForSearchIndex
- the SearchDataForIndex hook
So that they have a chance to build the content related to a specific
revision.
Ultimately we'd like to make this parameter mandatory.
Bug: T317309
Depends-On: I8b220cd6c4aeeca1d924bdd527409b8602318944
Depends-On: I8616b611caab3f5fa97ff0e655b19c3034304597
Change-Id: I3298ce7591069eb32f624b2c9fbb6de58ae04a29
Following up on the comment I made at Ibbc1423166f4804a5122, make Parser
instance management a ParserFactory responsibility. It is weird for
Parser to have a ParserFactory proxy aspect.
* Add ParserFactory::getMainInstance(), which is equivalent to the old
MediaWikiServices::getParser() and $wgParser.
* Add ParserFactory::getInstance(), which is equivalent to
$wgParser->getFreshInstance(), returning the main instance if it is
free, or a new instance otherwise. The naming is supposed to encourage
it as the default way to get a parser, which will help with the linked
bug.
* Deprecate Parser::getFreshParser() and migrate all core callers.
I left the entry in ServiceWiring.php so that it's not immediately
necessary to migrate ObjectFactory specs that ask for Parser.
Bug: T310948
Change-Id: I762b191e978c2d1bbc9f332c9cfa047888ce2e67
Previously:
* It was unclear that generate-html is an optional optimization
* Most of MediaWiki core was doing $parserOutput->setText('') if
html wasn't generated. However this is wrong and will cause
$parserOutput->hasText() to return true and also potentially cause
cache pollution if a content handler both does that and supports
parser cache (Like MassMessage; see T299896)
* The default value of mText in the constructor was '', and most
of the time MW used that default. This doesn't seem right. If
setText() is never called, the ParserOutput should not be considered
to have text
* It was impossible to set mText to null, as $parserOutput->setText(null)
was a no-op. Docs implied you were supposed to do this, so it was very
confusing.
This patch clarifies docs, changes the default value for ParserOutput::$mText
from '' to null, and makes $parserOutput->setText(null) do what you
expect it to. The last two are arguably breaking changes, although
the previous behaviours were unexpected, mostly undocumented and
based on a code search do not appear to be relied on.
It seems like the main reason this only broke MassMessage is most
content handlers either don't support generateHtml, or they don't
support parser cache.
Bug: T306591
Change-Id: I49cdf21411c6b02ac9a221a13393bebe17c7871e
Depends-On: I68ad491735b2df13951399312a4f9c37b63a08fa
Make phan stricter about null types by setting null_casts_as_any_type to
false (the default in mediawiki-phan-config)
Remaining false positive issues are suppressed.
The suppression and the setting change can only be done together
Bug: T242536
Bug: T301991
Change-Id: I0f295382b96fb3be8037a01c10487d9d591e7e01
We always implicitly converted a string argument to an array anyway; just
ask the caller to do this instead so that we can have a simpler and
more straight-forward method signature which matches the plural form
of the method name.
Part of the ParserOutput API cleanup / Parsoid unification discussed
in T287216.
In a number of places we also rename $out to $parserOutput, to make it
easier for codesearch (and human readers) to distinguish between
ParserOutput and OutputPage methods.
Code search:
https://codesearch.wmcloud.org/deployed/?q=p%28arser%29%3F%28Out%7Cout%29%28put%29%3F-%3EaddModule%28Style%29%3Fs%5C%28&i=nope&files=&excludeFiles=&repos=https://codesearch.wmcloud.org/deployed/?q=arser-%3EgetOutput%5C%28%5C%29-%3EaddModule%28Style%29%3Fs%5C%28&i=nope&files=&excludeFiles=&repos=
Bug: T296123
Depends-On: Iedea960bd450474966eb60ff8dfbf31c127025b6
Depends-On: I7900c5746a9ea75ce4918ffd97d45128038ab3f0
Depends-On: If29dc1d696b3a4c249fa9b150cedf2a502796ea1
Depends-On: I8f1bc7233a00382123a9b1b0bb549bd4dbc4a095
Depends-On: I52dda72aee6c7784a8961488c437863e31affc17
Depends-On: Ia1dcc86cb64f6aa39c68403d37bd76f970e55b97
Depends-On: Ib89ef9c900514d50173e13ab49d17c312b729900
Depends-On: If54244a0278d532c8553029c487c916068e1300f
Depends-On: I8d9b34f5d1ed5b1534bb29f5cd6edcdc086b71ca
Depends-On: I068f9f8e85e88a5c457d40e6a92f09b7eddd6b81
Depends-On: Iced2fc7b4f3cda5296532f22d233875bbc2f5d1b
Depends-On: If14866f76703aa62d33e197bb18a5eacde7a55c0
Depends-On: I9b7fe5acee73c3a378153c0820b46816164ebf21
Depends-On: I95858c08bce0d90709ac7771a910f73d78cc8be4
Depends-On: If9a70e8f8545d4f9ee3b605ad849dbd7de742fc1
Depends-On: I982c81e1ad73b58a90649648e19501cf9172d493
Depends-On: I53a8fd22b22c93bba703233b62377c49ba9f5562
Depends-On: Ic532bca4348b17882716fcb2ca8656a04766c095
Depends-On: If34330acf97d2c4e357b693b086264a718738fb1
Change-Id: Ie4d6bbe258cc483d5693f7a27dbccb60d8f37e2c
Make it safe to parse articles while in the parser, by always calling
getFreshParser() from WikitextContentHandler.
I think ideally this should be a ParserFactory responsibility, with
Parser instances stored by ParserFactory instead of directly by
ServiceContainer, but this fixes the bug, follows existing conventions,
and does not reduce performance in the usual case.
Bug: T299149
Change-Id: Ibbc1423166f4804a5122de10293ea26f5704d96d
The concept of a redirect chain didn't really work for a value of
max redirect > 1. In the ideal world, we just want to have a source
which points to target (source -> target) discarding the concept of
a redirect chain completely.
Having something like: source -> target -> target1 -> target2 doesn't
really work well with the current database design.
NOTE: Support for $wgMaxRedirect will be removed soon hence
deprecation without interfaces for replacement.
Bug: T290639
Change-Id: I469de6f85e405e8ddbe7abaa5b99b77cb9cf415d
This is a uniform mechanism to access a number of bespoke boolean
flags in ParserOutput. It allows extensibility in core (by adding new
field names to ParserOutputFlags) without exposing new getter/setter
methods to Parsoid. It replaces the ParserOutput::{get,set}Flag()
interface which (a) doesn't allow access to certain flags, and (b) is
typically called with a string rather than a constant, and (c) has a
very generic name. (Note that Parser::setOutputFlag() already called
these "output flags".)
In the future we might unify the representation so that we store
everything in $mFlags and don't have explicit properties in
ParserOutput, but those representation details should be invisible to
the clients of this API. (We might also use a proper enumeration
for ParserOutputFlags, when PHP supports this.)
There is some overlap with ParserOutput::{get,set}ExtensionData(), but
I've left those methods as-is because (a) they allow for non-boolean
data, unlike the *Flag() methods, and (b) it seems worthwhile to
distingush properties set by extensions from properties used by core.
Code search:
https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos=
Bug: T292868
Change-Id: I39bc58d207836df6f328c54be9e3330719cebbeb
Update/Create override classes of ContentHandler.
Soft-deprecate and remove method from Content and classes that override them.
Bug: T287158
Change-Id: Idfcfbfe1a196cd69a04ca357281d08bb3d097ce2
Update ContentTransformer to access ContentHandler::preLoadTransform through the service.
Prepare object to hold a data that required for ContentHandler::preLoadTranform params.
This is a fully backwards compatible change.
We are doing hard deprecation via MWDebug::detectDeprecatedOverride.
However, with the ContentHandler calling Content and
Content calling ContentHandler, it doesn't matter whether
callers use Content or ContentHandler. This will allow us
to naturally convert all callers.
Bug: T287157
Change-Id: I89537e1e7d24c6e15252b2b51890a0bd81ea3e6b
If an exctension ContentHandler overrides one of the
subclasses of the core ContentHandler, for example
TextContentHandler, when switching calls we no longer
call deprecated Content::preSaveTransform for the
extension Content model.
Bug: T288191
Change-Id: Ie7edc97be9098f3cd188949bd37943c37a0b65ff
Create ContentTransformer to access ContentHandler::preSaveTransform through the service.
Prepare object to hold a data that required for ContentHandler::preSaveTranform params.
This will require making a semi-backwards-incompatible
change no matter what, we don't really have a great way
of hard-deprecating overriding methods.
However, with the ContentHandler calling Content and
Content calling ContentHandler, and with the ProxyContent
trick to stop infinite recursion, it doesn't matter whether
callers use Content or ContentHandler. This will allow us
to naturally convert all callers. But won't really allow
hard-deprecation.
Bug: T287156
Change-Id: If6a2025868ceca3a3b6f11baec39695e47292e40
For example, documenting the method getUser() with "get the User
object" does not add any information that's not already there.
But I have to read the text first to understand that it doesn't
document anything that's not already obvious from the code.
Some of this is from a time when we had a PHPCS sniff that was
complaining when a line like `@param User $user` doesn't end
with some descriptive text. Some users started adding text like
`@param User $user The User` back then. Let's please remove
this.
Change-Id: I0ea8d051bc732466c73940de9259f87ffb86ce7a
getNativeData() is under-specified - callers can do nothing with the
value returned by getNativeData without knowing the concrete Content
class. And if they know the concrete class, they can and should use
a specialized getter instead, anyway.
Basically, getNativeData is overly generic, an example of polymorphism
done poorly. Let's fix it now.
Bug: T155582
Change-Id: Id2c61dcd38ab30416a25746e3680edb8791ae8e8
I was bored. What? Don't look at me that way.
I mostly targetted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
Added FLAG_SOURCE_DATA to support additional data that is not supposed to be
part of the default mapping.
Should merged with I1484c2e62788bedb57a42869a5fb25cd8f64482f, otherwize rebuilding
an index may add an extra field to CirrusSearch mapping.
Bug: T134978
Change-Id: Ia41f8eeb9dd4f764543bdd4d71b7a50de8101101
ContentHandler implementations were not including fields
defined by their parent ContentHandler classes.
merge method is added to the SearchIndexFieldDefinition
mock in SearchEngineTest, to allow merges of fields
in the way that SearchIndexFieldDefition implementation does.
Change-Id: Id04a51528f566da2666bad0394a2f61c949c69b4