Commit graph

65 commits

Author SHA1 Message Date
Reedy
d3b14ec862 WikiTextStructure/WikitextContentHandler: Minor cleanup
Change-Id: If2f8243867994609d82618e61ddaaacca3516990
2024-02-01 08:01:27 +00:00
James D. Forrester
9bfb75ff90 Namespace ParserOutput
Most used non-namespaced class!

Bug: T353458
Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
2023-12-14 14:57:34 -05:00
daniel
eb881d9b59 Remove deprecated methods from Content interface
Several methods on the Content interface had been deprecated in 1.35 and
1.36 in favor of corresponding methods on the ContentHandler base class,
to allow implementations of these methods to use proper dependency
injection. This patch removes backwards compatibility support for
subclasses that were overriding these methods.

Change-Id: I8e474a1cc4dec760a7f6db25e4b313392f3723b1
2023-11-21 12:40:11 +01:00
Subramanya Sastry
f4a9eb992a Pass full content to Parsoid for redirect pages
* See inline comment in WikitextContentHandler that explains
  what is happening here.

* Added a new WikitextContentHandlerIntegration test to verify
  this expectation (which fails without the change in this patch).

Bug: T349087
Change-Id: I072ddf89562fe79bad47d741feb5788430e05bb6
2023-10-17 13:12:17 -04:00
daniel
954d4c5c59 Move getRedirectTargetAndText out of WikitextContent
WikitextContent should be a value object that does not need access to
services. For this reason, getRedirectTargetAndText has to be moved to
WikitextContentHandler.

Change-Id: Ia6595caf7e913ef580709a4d076aa2cc9dbaacef
2023-09-22 17:08:20 -04:00
C. Scott Ananian
097983e9d7 Unit test for LinkRenderer::makeRedirectHeader() used by WikitextContentHandler
Change-Id: I3577633670d0c2a771c690e3d6601300e1867222
2023-09-22 17:08:20 -04:00
C. Scott Ananian
07b396d5b5 Move Article::getRedirectHeaderHtml() to LinkRenderer::makeRedirectHeader()
The use of Article::getRedirectHeaderHtml() has been discouraged for a
while, since WikitextContentHandler can (should) be used to insert the
redirect header.  Further, since I20db09619999919bfeda997d79561d21e3bf8718
the header should be added as an extension data property instead of
directly concatenated to the HTML.  Regardless, this functionality
logically should live in LinkRenderer.

Change-Id: I4d0de0e72473ae039dca420a2733bc746d8c2951
2023-09-22 17:08:17 -04:00
C. Scott Ananian
3dd695a1ec WikitextContentHandler/ParserOutput: move redirect header to post processing
Insert the redirect handler as part of the post-processing done in
ParserOutput::getText().  This ensures that it does not corrupt
edit-mode Parsoid output.

Depends-On: Ia6e390d849830993a6b97004f099773cfd4fa54b
Change-Id: I20db09619999919bfeda997d79561d21e3bf8718
2023-09-15 15:20:01 -04:00
thiemowmde
f549005179 content,maintenance: Use class-string<ClassName> in doc blocks
Start using `class-string<ClassName>` as a type hint in a few places
where the information is really helpful. A lot of tools are able to
understand this already.

Change-Id: Ide45cae8c7875e664fab1155c6c720e515d8d811
2023-07-31 17:14:09 +00:00
Bartosz Dziewoński
6ba47296d9 Fix Phan suppressions related to Title::castFrom*() and friends
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.

Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.

Do the same with the corresponding TitleFactory methods.

Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).

Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
2023-04-22 16:45:09 +02:00
C. Scott Ananian
cfd9c516e1 Allow setting a ParserOption to generate Parsoid HTML
This is an initial quick-and-dirty implementation.  The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.

Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.

This patch also addresses T300325 by ensuring the the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing.  This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.

We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata.  Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for parsoid requests and
vice-versa.  T330677 is a follow up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stop gap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.

Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011
2023-03-26 21:46:05 -04:00
James D. Forrester
ad06527fb4 Reorg: Namespace the Title class
This is moderately messy.

Process was principally:

* xargs rg --files-with-matches '^use Title;' | grep 'php$' | \
  xargs -P 1 -n 1 sed -i -z 's/use Title;/use MediaWiki\\Title\\Title;/1'
* rg --files-without-match 'MediaWiki\\Title\\Title;' . | grep 'php$' | \
  xargs rg --files-with-matches 'Title\b' | \
  xargs -P 1 -n 1 sed -i -z 's/\nuse /\nuse MediaWiki\\Title\\Title;\nuse /1'
* composer fix

Then manual fix-ups for a few files that don't have any use statements.

Bug: T166010
Follows-Up: Ia5d8cb759dc3bc9e9bbe217d0fb109e2f8c4101a
Change-Id: If8fc9d0d95fc1a114021e282a706fc3e7da3524b
2023-03-02 08:46:53 -05:00
jenkins-bot
7a01a3b0c9 Merge "search: Set file_text to null when not available" 2023-01-05 08:48:47 +00:00
Amir Sarabadani
a1b4699fea Reorg: Move MagicWord related files to under parser/
This is approved as part of T166010 RFC.

Bug: T321882
Change-Id: Ia4498c0a20e38a6a288dc14065ea8242c84fbc49
2022-12-09 13:48:35 +01:00
daniel
090ec5777d Use services in WikitextContentHandler
Change-Id: I626b5ee9a070ad3a97ab9ac9f44cb7003d68bf13
2022-12-06 15:44:40 -05:00
Erik Bernhardson
3ff25f3fa9 search: Set file_text to null when not available
file_text has previously been set to false or an empty array,
depending on context, when it wasn't available. As part of normalizing
the set of types used in the search index default the value to null
and only set it if the media handler is able to extract text content
from it.

Setting the value to null when not available is done to clear out
historical false/empty array values in storage. Perhaps we need to
think more in the future about if/when default values should be
provided to search updates, everything is a bit ad-hoc today.

Bug: T322327
Change-Id: I1367154b17d9e69c9373e7efee384838aa3b51e8
2022-12-02 18:21:16 +00:00
David Causse
9fbd8f500f Make the doc building for search aware of the revision
Added an optional RevisionRecord param to:
- ContentHandler::getParserOutputForIndexing
- ContentHandler::getDataForSearchIndex
- the SearchDataForIndex hook

So that they have a chance to build the content related to a specific
revision.

Ultimately we'd like to make this parameter mandatory.

Bug: T317309
Depends-On: I8b220cd6c4aeeca1d924bdd527409b8602318944
Depends-On: I8616b611caab3f5fa97ff0e655b19c3034304597
Change-Id: I3298ce7591069eb32f624b2c9fbb6de58ae04a29
2022-10-25 18:45:23 +02:00
Tim Starling
0077c5da15 Use short array destructuring instead of list()
Introduced in PHP 7.1. Because it's shorter and looks nice.

I used regex replacement.

Change-Id: I0555e199d126cd44501f859cb4589f8bd49694da
2022-10-21 15:33:37 +11:00
jenkins-bot
14d324c15f Merge "Add new ContentHandler::supportsPreloadContent() feature" 2022-07-06 22:38:10 +00:00
Thiemo Kreuz
6de15c17c1 Add new ContentHandler::supportsPreloadContent() feature
Enable it for JSON content.

Bug: T300644
Change-Id: Ia5c491cd856ca395fb431bcefd63026084b01a99
2022-07-06 22:19:32 +00:00
Tim Starling
e2c26e1774 Migrate risky callers of MediaWikiServices::getParser()
Don't call MediaWikiServices::getParser() from ContentHandler.
Always use ParserFactory::getInstance().

Bug: T310948
Change-Id: I5fcdc28111e0c5c7d4a76e69b3978402433ebad9
2022-07-05 14:09:36 +10:00
Tim Starling
f270881ca2 Deprecate Parser::getFreshParser()
Following up on the comment I made at Ibbc1423166f4804a5122, make Parser
instance management a ParserFactory responsibility. It is weird for
Parser to have a ParserFactory proxy aspect.

* Add ParserFactory::getMainInstance(), which is equivalent to the old
  MediaWikiServices::getParser() and $wgParser.
* Add ParserFactory::getInstance(), which is equivalent to
  $wgParser->getFreshInstance(), returning the main instance if it is
  free, or a new instance otherwise. The naming is supposed to encourage
  it as the default way to get a parser, which will help with the linked
  bug.
* Deprecate Parser::getFreshParser() and migrate all core callers.

I left the entry in ServiceWiring.php so that it's not immediately
necessary to migrate ObjectFactory specs that ask for Parser.

Bug: T310948
Change-Id: I762b191e978c2d1bbc9f332c9cfa047888ce2e67
2022-07-05 14:09:36 +10:00
Brian Wolff
bec8dada48 Clarify generate-html and make ParserOutput behave as expected
Previously:
* It was unclear that generate-html is an optional optimization
* Most of MediaWiki core was doing $parserOutput->setText('') if
html wasn't generated. However this is wrong and will cause
$parserOutput->hasText() to return true and also potentially cause
cache pollution if a content handler both does that and supports
parser cache (Like MassMessage; see T299896)
* The default value of mText in the constructor was '', and most
of the time MW used that default. This doesn't seem right. If
setText() is never called, the ParserOutput should not be considered
to have text
* It was impossible to set mText to null, as $parserOutput->setText(null)
was a no-op. Docs implied you were supposed to do this, so it was very
confusing.

This patch clarifies docs, changes the default value for ParserOutput::$mText
from '' to null, and makes $parserOutput->setText(null) do what you
expect it to. The last two are arguably breaking changes, although
the previous behaviours were unexpected, mostly undocumented and
based on a code search do not appear to be relied on.

It seems like the main reason this only broke MassMessage is most
content handlers either don't support generateHtml, or they don't
support parser cache.

Bug: T306591
Change-Id: I49cdf21411c6b02ac9a221a13393bebe17c7871e
Depends-On: I68ad491735b2df13951399312a4f9c37b63a08fa
2022-05-03 11:23:08 +02:00
Umherirrender
1f71eccf63 phan: Disable null_casts_as_any_type setting
Make phan stricter about null types by setting null_casts_as_any_type to
false (the default in mediawiki-phan-config)
Remaining false positive issues are suppressed.
The suppression and the setting change can only be done together

Bug: T242536
Bug: T301991
Change-Id: I0f295382b96fb3be8037a01c10487d9d591e7e01
2022-03-21 18:25:07 +00:00
C. Scott Ananian
75480cf1e0 Narrow the signature of ParserOutput::addModules() and ::addModuleStyles()
We always implicitly converted a string argument to an array anyway; just
ask the caller to do this instead so that we can have a simpler and
more straight-forward method signature which matches the plural form
of the method name.

Part of the ParserOutput API cleanup / Parsoid unification discussed
in T287216.

In a number of places we also rename $out to $parserOutput, to make it
easier for codesearch (and human readers) to distinguish between
ParserOutput and OutputPage methods.

Code search:

https://codesearch.wmcloud.org/deployed/?q=p%28arser%29%3F%28Out%7Cout%29%28put%29%3F-%3EaddModule%28Style%29%3Fs%5C%28&i=nope&files=&excludeFiles=&repos=
https://codesearch.wmcloud.org/deployed/?q=arser-%3EgetOutput%5C%28%5C%29-%3EaddModule%28Style%29%3Fs%5C%28&i=nope&files=&excludeFiles=&repos=

Bug: T296123
Depends-On: Iedea960bd450474966eb60ff8dfbf31c127025b6
Depends-On: I7900c5746a9ea75ce4918ffd97d45128038ab3f0
Depends-On: If29dc1d696b3a4c249fa9b150cedf2a502796ea1
Depends-On: I8f1bc7233a00382123a9b1b0bb549bd4dbc4a095
Depends-On: I52dda72aee6c7784a8961488c437863e31affc17
Depends-On: Ia1dcc86cb64f6aa39c68403d37bd76f970e55b97
Depends-On: Ib89ef9c900514d50173e13ab49d17c312b729900
Depends-On: If54244a0278d532c8553029c487c916068e1300f
Depends-On: I8d9b34f5d1ed5b1534bb29f5cd6edcdc086b71ca
Depends-On: I068f9f8e85e88a5c457d40e6a92f09b7eddd6b81
Depends-On: Iced2fc7b4f3cda5296532f22d233875bbc2f5d1b
Depends-On: If14866f76703aa62d33e197bb18a5eacde7a55c0
Depends-On: I9b7fe5acee73c3a378153c0820b46816164ebf21
Depends-On: I95858c08bce0d90709ac7771a910f73d78cc8be4
Depends-On: If9a70e8f8545d4f9ee3b605ad849dbd7de742fc1
Depends-On: I982c81e1ad73b58a90649648e19501cf9172d493
Depends-On: I53a8fd22b22c93bba703233b62377c49ba9f5562
Depends-On: Ic532bca4348b17882716fcb2ca8656a04766c095
Depends-On: If34330acf97d2c4e357b693b086264a718738fb1
Change-Id: Ie4d6bbe258cc483d5693f7a27dbccb60d8f37e2c
2022-01-20 13:14:20 -05:00
Tim Starling
d636ae57c1 In WikitextContentHandler always use getFreshParser()
Make it safe to parse articles while in the parser, by always calling
getFreshParser() from WikitextContentHandler.

I think ideally this should be a ParserFactory responsibility, with
Parser instances stored by ParserFactory instead of directly by
ServiceContainer, but this fixes the bug, follows existing conventions,
and does not reduce performance in the usual case.

Bug: T299149
Change-Id: Ibbc1423166f4804a5122de10293ea26f5704d96d
2022-01-14 09:36:02 +11:00
Derick Alangi
8fe9e0317f Introduce Redirect(Lookup&Store) services to handle redirects
The concept of a redirect chain didn't really work for a value of
max redirect > 1. In the ideal world, we just want to have a source
which points to target (source -> target) discarding the concept of
a redirect chain completely.

Having something like: source -> target -> target1 -> target2 doesn't
really work well with the current database design.

NOTE: Support for $wgMaxRedirect will be removed soon hence
deprecation without interfaces for replacement.

Bug: T290639
Change-Id: I469de6f85e405e8ddbe7abaa5b99b77cb9cf415d
2021-12-01 19:14:22 +01:00
C. Scott Ananian
06ab90f163 Add new ParserOutput::{get,set}OutputFlag() interface
This is a uniform mechanism to access a number of bespoke boolean
flags in ParserOutput.  It allows extensibility in core (by adding new
field names to ParserOutputFlags) without exposing new getter/setter
methods to Parsoid.  It replaces the ParserOutput::{get,set}Flag()
interface which (a) doesn't allow access to certain flags, and (b) is
typically called with a string rather than a constant, and (c) has a
very generic name.  (Note that Parser::setOutputFlag() already called
these "output flags".)

In the future we might unify the representation so that we store
everything in $mFlags and don't have explicit properties in
ParserOutput, but those representation details should be invisible to
the clients of this API.  (We might also use a proper enumeration
for ParserOutputFlags, when PHP supports this.)

There is some overlap with ParserOutput::{get,set}ExtensionData(), but
I've left those methods as-is because (a) they allow for non-boolean
data, unlike the *Flag() methods, and (b) it seems worthwhile to
distingush properties set by extensions from properties used by core.

Code search:
https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos=

Bug: T292868
Change-Id: I39bc58d207836df6f328c54be9e3330719cebbeb
2021-10-15 14:25:54 -04:00
Roman Stolar
a68e641f9d Move Content::getParserOutput & AbstractContent::fillParserOutput to ContentHandler
Update/Create override classes of ContentHandler.
Soft-deprecate and remove method from Content and classes that override them.

Bug: T287158
Change-Id: Idfcfbfe1a196cd69a04ca357281d08bb3d097ce2
2021-09-29 13:10:51 +03:00
Roman Stolar
42442e01ff Move Content::preloadTransform to ContentHandler
Update ContentTransformer to access ContentHandler::preLoadTransform through the service.
Prepare object to hold a data that required for ContentHandler::preLoadTranform params.

This is a fully backwards compatible change.
We are doing hard deprecation via MWDebug::detectDeprecatedOverride.

However, with the ContentHandler calling Content and
Content calling ContentHandler, it doesn't matter whether
callers use Content or ContentHandler. This will allow us
to naturally convert all callers.

Bug: T287157
Change-Id: I89537e1e7d24c6e15252b2b51890a0bd81ea3e6b
2021-08-17 15:17:34 +00:00
Petr Pchelko
bf438e8c87 Support deprecated Content::preSaveTransform override
If an exctension ContentHandler overrides one of the
subclasses of the core ContentHandler, for example
TextContentHandler, when switching calls we no longer
call deprecated Content::preSaveTransform for the
extension Content model.

Bug: T288191
Change-Id: Ie7edc97be9098f3cd188949bd37943c37a0b65ff
2021-08-05 08:56:47 -07:00
Petr Pchelko
b782a7e66d Move Content::preSaveTransform to ContentHandler
Create ContentTransformer to access ContentHandler::preSaveTransform through the service.
Prepare object to hold a data that required for ContentHandler::preSaveTranform params.

This will require making a semi-backwards-incompatible
change no matter what, we don't really have a great way
of hard-deprecating overriding methods.

However, with the ContentHandler calling Content and
Content calling ContentHandler, and with the ProxyContent
trick to stop infinite recursion, it doesn't matter whether
callers use Content or ContentHandler. This will allow us
to naturally convert all callers. But won't really allow
hard-deprecation.

Bug: T287156
Change-Id: If6a2025868ceca3a3b6f11baec39695e47292e40
2021-07-29 18:06:02 +03:00
Thiemo Kreuz
1fc8d79ac6 Remove documentation that literally repeats the code
For example, documenting the method getUser() with "get the User
object" does not add any information that's not already there.
But I have to read the text first to understand that it doesn't
document anything that's not already obvious from the code.

Some of this is from a time when we had a PHPCS sniff that was
complaining when a line like `@param User $user` doesn't end
with some descriptive text. Some users started adding text like
`@param User $user The User` back then. Let's please remove
this.

Change-Id: I0ea8d051bc732466c73940de9259f87ffb86ce7a
2020-10-27 19:20:26 +00:00
Ed Sanders
7683f7d839 Use strict (in)equality with namespaces constants when LHS is definitely an integer
Change-Id: I8fede00dfe1270d93c5d78d3c36e788cddfc8a99
2020-07-31 18:03:28 +01:00
Ed Sanders
0cf40a4f7a Flip Yoda conditionals
Change-Id: Id3495b6f15c267123c89f3a0ace496e6ecbeb58e
2020-07-22 17:49:12 +01:00
Petr Pchelko
204fa7e509 Remove usages of deprecated Language methods
Change-Id: Iad3375b141b1d87c890baec6ecd16ed92f93e699
2020-02-16 00:45:48 +00:00
daniel
54c70c3551 Deprecate Content::getNativeData, add TextContent::getText
getNativeData() is under-specified - callers can do nothing with the
value returned by getNativeData without knowing the concrete Content
class. And if they know the concrete class, they can and should use
a specialized getter instead, anyway.

Basically, getNativeData is overly generic, an example of polymorphism
done poorly. Let's fix it now.

Bug: T155582
Change-Id: Id2c61dcd38ab30416a25746e3680edb8791ae8e8
2019-01-16 11:57:50 -08:00
Aryeh Gregor
4bdae1c9d2 Convert remaining MagicWord:: calls to MagicWordFactory
Bug: T200247
Depends-On: Ie061fe90f9b9eca0cbf7e8199d9ca325c464867a
Change-Id: I49c507f3875e46a8e15fd2c28d61c17188aabffc
2018-08-01 10:47:43 +03:00
Bartosz Dziewoński
ecdef925bb Miscellaneous indentation tweaks
I was bored. What? Don't look at me that way.

I mostly targetted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.

Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
2017-02-27 19:23:54 +01:00
Stanislav Malyshev
2a395370fc Create fields & data for image/file data indexing
Bug: T145558
Change-Id: I23d4c8235d0e4150eefec31cea4b2cfdd32bf32a
2016-09-26 23:42:06 -07:00
jenkins-bot
fd8a5d4689 Merge "Remove SourceIndexField FLAG_SOURCE_DATA" 2016-09-01 18:59:55 +00:00
dcausse
2dc04ccdb4 Remove SourceIndexField FLAG_SOURCE_DATA
Change-Id: I080f06a5a09f2d67a153b491555d0dbf65c626d0
2016-09-01 17:03:56 +02:00
jenkins-bot
dc36560cdf Merge "Add DEFAULTSORT to search index field data" 2016-09-01 14:51:41 +00:00
dcausse
7c09f09432 Add DEFAULTSORT to search index field data
Added FLAG_SOURCE_DATA to support additional data that is not supposed to be
part of the default mapping.

Should merged with I1484c2e62788bedb57a42869a5fb25cd8f64482f, otherwize rebuilding
an index may add an extra field to CirrusSearch mapping.

Bug: T134978
Change-Id: Ia41f8eeb9dd4f764543bdd4d71b7a50de8101101
2016-08-29 16:51:57 +02:00
jenkins-bot
4b70bc2b28 Merge "Extract ParserOutput search index data fields from WikiTextContentHandler" 2016-08-19 18:40:17 +00:00
aude
64ee3d3269 Extract ParserOutput search index data fields from WikiTextContentHandler
Bug: T142491
Change-Id: I69b010b893135e53fac7f16f4b927b8fbcba06d2
2016-08-19 09:26:17 -04:00
Stanislav Malyshev
9053f5f2c6 Fix text extraction where we don't have proper file handler
Bug: T143251
Change-Id: I611f6a001bbcea971cc9126bd3f004622e88b47d
2016-08-17 13:54:23 -07:00
aude
c67536716d Call parent::getFieldsForSearchIndex in ContentHandlers
ContentHandler implementations were not including fields
defined by their parent ContentHandler classes.

merge method is added to the SearchIndexFieldDefinition
mock in SearchEngineTest, to allow merges of fields
in the way that SearchIndexFieldDefition implementation does.

Change-Id: Id04a51528f566da2666bad0394a2f61c949c69b4
2016-08-15 19:33:09 -04:00
Kunal Mehta
3cb341b185 content: Use "::class" when overriding TextContent::getContentClass()
Change-Id: Iea03d2cd24fdb90253145a8abfefe9f8a09e46cd
2016-08-12 21:16:37 -07:00
Stanislav Malyshev
add1ebe2ab Make content handlers assemble content for search
Bug: T89733
Change-Id: Ie45de496ecc826211d98eea3a410c7639b4be0a4
2016-07-26 13:08:45 -07:00