Thijs/wiki.techinc.nl

Author	SHA1	Message	Date
Reedy	d3b14ec862	WikiTextStructure/WikitextContentHandler: Minor cleanup Change-Id: If2f8243867994609d82618e61ddaaacca3516990	2024-02-01 08:01:27 +00:00
James D. Forrester	9bfb75ff90	Namespace ParserOutput Most used non-namespaced class! Bug: T353458 Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf	2023-12-14 14:57:34 -05:00
daniel	eb881d9b59	Remove deprecated methods from Content interface Several methods on the Content interface had been deprecated in 1.35 and 1.36 in favor of corresponding methods on the ContentHandler base class, to allow implementations of these methods to use proper dependency injection. This patch removes backwards compatibility support for subclasses that were overriding these methods. Change-Id: I8e474a1cc4dec760a7f6db25e4b313392f3723b1	2023-11-21 12:40:11 +01:00
Subramanya Sastry	f4a9eb992a	Pass full content to Parsoid for redirect pages * See inline comment in WikitextContentHandler that explains what is happening here. * Added a new WikitextContentHandlerIntegration test to verify this expectation (which fails without the change in this patch). Bug: T349087 Change-Id: I072ddf89562fe79bad47d741feb5788430e05bb6	2023-10-17 13:12:17 -04:00
daniel	954d4c5c59	Move getRedirectTargetAndText out of WikitextContent WikitextContent should be a value object that does not need access to services. For this reason, getRedirectTargetAndText has to be moved to WikitextContentHandler. Change-Id: Ia6595caf7e913ef580709a4d076aa2cc9dbaacef	2023-09-22 17:08:20 -04:00
C. Scott Ananian	097983e9d7	Unit test for LinkRenderer::makeRedirectHeader() used by WikitextContentHandler Change-Id: I3577633670d0c2a771c690e3d6601300e1867222	2023-09-22 17:08:20 -04:00
C. Scott Ananian	07b396d5b5	Move Article::getRedirectHeaderHtml() to LinkRenderer::makeRedirectHeader() The use of Article::getRedirectHeaderHtml() has been discouraged for a while, since WikitextContentHandler can (should) be used to insert the redirect header. Further, since I20db09619999919bfeda997d79561d21e3bf8718 the header should be added as an extension data property instead of directly concatenated to the HTML. Regardless, this functionality logically should live in LinkRenderer. Change-Id: I4d0de0e72473ae039dca420a2733bc746d8c2951	2023-09-22 17:08:17 -04:00
C. Scott Ananian	3dd695a1ec	WikitextContentHandler/ParserOutput: move redirect header to post processing Insert the redirect handler as part of the post-processing done in ParserOutput::getText(). This ensures that it does not corrupt edit-mode Parsoid output. Depends-On: Ia6e390d849830993a6b97004f099773cfd4fa54b Change-Id: I20db09619999919bfeda997d79561d21e3bf8718	2023-09-15 15:20:01 -04:00
thiemowmde	f549005179	content,maintenance: Use class-string<ClassName> in doc blocks Start using `class-string<ClassName>` as a type hint in a few places where the information is really helpful. A lot of tools are able to understand this already. Change-Id: Ide45cae8c7875e664fab1155c6c720e515d8d811	2023-07-31 17:14:09 +00:00
Bartosz Dziewoński	6ba47296d9	Fix Phan suppressions related to Title::castFrom() and friends There is no way to express that Title::castFromPageIdentity(), Title::castFromPageReference() and Title::castFromLinkTarget() can only return null when the parameter is null. We need to add Phan suppressions or explicit types almost everywhere that these methods are used with parameters that are known to not be null. Instead, introduce new methods Title::newFromPageIdentity() and Title::newFromPageReference() (Title::newFromLinkTarget() already exists), without the null-coalescing behavior, and use them when the parameter is not null. This lets static analysis tools, and humans, easily understand where nulls can't appear. Do the same with the corresponding TitleFactory methods. Change the obvious uses of castFrom() to newFrom*() (if there is a Phan suppression, a type check, or a method call on the result). Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb	2023-04-22 16:45:09 +02:00
C. Scott Ananian	cfd9c516e1	Allow setting a ParserOption to generate Parsoid HTML This is an initial quick-and-dirty implementation. The ParsoidParser class will eventually inherit from \Parser, but this is an initial placeholder to unblock other Parsoid read views work. Currently Parsoid does not fully implement all the ParserOutput metadata set by the legacy parser, but we're working on it. This patch also addresses T300325 by ensuring that the Page HTML APIs use ParserOutput::getRawText(), which will return the entire Parsoid HTML document without post-processing. This is what the Parsoid team refers to as "edit mode" HTML. The ParserOutput::getText() method returns only the <body> contents of the HTML, and applies several transformations, including inserting Table of Contents and style deduplication; this is the "read views" flavor of the Parsoid HTML. We need to be careful of the interaction of the `useParsoid` flag with the ParserCacheMetadata. Effectively `useParsoid` should always be marked as "used" or else the ParserCache will assume its value doesn't matter and will serve legacy content for parsoid requests and vice-versa. T330677 is a follow up to address this more thoroughly by splitting the parser cache in ParserOutputAccess; the stop gap in this patch is fragile and, because it doesn't fork the ParserCacheMetadata cache, may corrupt the ParserCacheMetadata in the case when Parsoid and the legacy parser consult different sets of options to render a page. Bug: T300191 Bug: T330677 Bug: T300325 Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011	2023-03-26 21:46:05 -04:00
James D. Forrester	ad06527fb4	Reorg: Namespace the Title class This is moderately messy. Process was principally: * xargs rg --files-with-matches '^use Title;' \| grep 'php$' \| \ xargs -P 1 -n 1 sed -i -z 's/use Title;/use MediaWiki\\Title\\Title;/1' * rg --files-without-match 'MediaWiki\\Title\\Title;' . \| grep 'php$' \| \ xargs rg --files-with-matches 'Title\b' \| \ xargs -P 1 -n 1 sed -i -z 's/\nuse /\nuse MediaWiki\\Title\\Title;\nuse /1' * composer fix Then manual fix-ups for a few files that don't have any use statements. Bug: T166010 Follows-Up: Ia5d8cb759dc3bc9e9bbe217d0fb109e2f8c4101a Change-Id: If8fc9d0d95fc1a114021e282a706fc3e7da3524b	2023-03-02 08:46:53 -05:00
jenkins-bot	7a01a3b0c9	Merge "search: Set file_text to null when not available"	2023-01-05 08:48:47 +00:00
Amir Sarabadani	a1b4699fea	Reorg: Move MagicWord related files to under parser/ This is approved as part of T166010 RFC. Bug: T321882 Change-Id: Ia4498c0a20e38a6a288dc14065ea8242c84fbc49	2022-12-09 13:48:35 +01:00
daniel	090ec5777d	Use services in WikitextContentHandler Change-Id: I626b5ee9a070ad3a97ab9ac9f44cb7003d68bf13	2022-12-06 15:44:40 -05:00
Erik Bernhardson	3ff25f3fa9	search: Set file_text to null when not available file_text has previously been set to false or an empty array, depending on context, when it wasn't available. As part of normalizing the set of types used in the search index default the value to null and only set it if the media handler is able to extract text content from it. Setting the value to null when not available is done to clear out historical false/empty array values in storage. Perhaps we need to think more in the future about if/when default values should be provided to search updates, everything is a bit ad-hoc today. Bug: T322327 Change-Id: I1367154b17d9e69c9373e7efee384838aa3b51e8	2022-12-02 18:21:16 +00:00
David Causse	9fbd8f500f	Make the doc building for search aware of the revision Added an optional RevisionRecord param to: - ContentHandler::getParserOutputForIndexing - ContentHandler::getDataForSearchIndex - the SearchDataForIndex hook So that they have a chance to build the content related to a specific revision. Ultimately we'd like to make this parameter mandatory. Bug: T317309 Depends-On: I8b220cd6c4aeeca1d924bdd527409b8602318944 Depends-On: I8616b611caab3f5fa97ff0e655b19c3034304597 Change-Id: I3298ce7591069eb32f624b2c9fbb6de58ae04a29	2022-10-25 18:45:23 +02:00
Tim Starling	0077c5da15	Use short array destructuring instead of list() Introduced in PHP 7.1. Because it's shorter and looks nice. I used regex replacement. Change-Id: I0555e199d126cd44501f859cb4589f8bd49694da	2022-10-21 15:33:37 +11:00
jenkins-bot	14d324c15f	Merge "Add new ContentHandler::supportsPreloadContent() feature"	2022-07-06 22:38:10 +00:00
Thiemo Kreuz	6de15c17c1	Add new ContentHandler::supportsPreloadContent() feature Enable it for JSON content. Bug: T300644 Change-Id: Ia5c491cd856ca395fb431bcefd63026084b01a99	2022-07-06 22:19:32 +00:00
Tim Starling	e2c26e1774	Migrate risky callers of MediaWikiServices::getParser() Don't call MediaWikiServices::getParser() from ContentHandler. Always use ParserFactory::getInstance(). Bug: T310948 Change-Id: I5fcdc28111e0c5c7d4a76e69b3978402433ebad9	2022-07-05 14:09:36 +10:00
Tim Starling	f270881ca2	Deprecate Parser::getFreshParser() Following up on the comment I made at Ibbc1423166f4804a5122, make Parser instance management a ParserFactory responsibility. It is weird for Parser to have a ParserFactory proxy aspect. * Add ParserFactory::getMainInstance(), which is equivalent to the old MediaWikiServices::getParser() and $wgParser. * Add ParserFactory::getInstance(), which is equivalent to $wgParser->getFreshInstance(), returning the main instance if it is free, or a new instance otherwise. The naming is supposed to encourage it as the default way to get a parser, which will help with the linked bug. * Deprecate Parser::getFreshParser() and migrate all core callers. I left the entry in ServiceWiring.php so that it's not immediately necessary to migrate ObjectFactory specs that ask for Parser. Bug: T310948 Change-Id: I762b191e978c2d1bbc9f332c9cfa047888ce2e67	2022-07-05 14:09:36 +10:00
Brian Wolff	bec8dada48	Clarify generate-html and make ParserOutput behave as expected Previously: * It was unclear that generate-html is an optional optimization * Most of MediaWiki core was doing $parserOutput->setText('') if html wasn't generated. However this is wrong and will cause $parserOutput->hasText() to return true and also potentially cause cache pollution if a content handler both does that and supports parser cache (Like MassMessage; see T299896) * The default value of mText in the constructor was '', and most of the time MW used that default. This doesn't seem right. If setText() is never called, the ParserOutput should not be considered to have text * It was impossible to set mText to null, as $parserOutput->setText(null) was a no-op. Docs implied you were supposed to do this, so it was very confusing. This patch clarifies docs, changes the default value for ParserOutput::$mText from '' to null, and makes $parserOutput->setText(null) do what you expect it to. The last two are arguably breaking changes, although the previous behaviours were unexpected, mostly undocumented and based on a code search do not appear to be relied on. It seems like the main reason this only broke MassMessage is most content handlers either don't support generateHtml, or they don't support parser cache. Bug: T306591 Change-Id: I49cdf21411c6b02ac9a221a13393bebe17c7871e Depends-On: I68ad491735b2df13951399312a4f9c37b63a08fa	2022-05-03 11:23:08 +02:00
Umherirrender	1f71eccf63	phan: Disable null_casts_as_any_type setting Make phan stricter about null types by setting null_casts_as_any_type to false (the default in mediawiki-phan-config) Remaining false positive issues are suppressed. The suppression and the setting change can only be done together Bug: T242536 Bug: T301991 Change-Id: I0f295382b96fb3be8037a01c10487d9d591e7e01	2022-03-21 18:25:07 +00:00
C. Scott Ananian	75480cf1e0	Narrow the signature of ParserOutput::addModules() and ::addModuleStyles() We always implicitly converted a string argument to an array anyway; just ask the caller to do this instead so that we can have a simpler and more straight-forward method signature which matches the plural form of the method name. Part of the ParserOutput API cleanup / Parsoid unification discussed in T287216. In a number of places we also rename $out to $parserOutput, to make it easier for codesearch (and human readers) to distinguish between ParserOutput and OutputPage methods. Code search: https://codesearch.wmcloud.org/deployed/?q=p%28arser%29%3F%28Out%7Cout%29%28put%29%3F-%3EaddModule%28Style%29%3Fs%5C%28&i=nope&files=&excludeFiles=&repos= https://codesearch.wmcloud.org/deployed/?q=arser-%3EgetOutput%5C%28%5C%29-%3EaddModule%28Style%29%3Fs%5C%28&i=nope&files=&excludeFiles=&repos= Bug: T296123 Depends-On: Iedea960bd450474966eb60ff8dfbf31c127025b6 Depends-On: I7900c5746a9ea75ce4918ffd97d45128038ab3f0 Depends-On: If29dc1d696b3a4c249fa9b150cedf2a502796ea1 Depends-On: I8f1bc7233a00382123a9b1b0bb549bd4dbc4a095 Depends-On: I52dda72aee6c7784a8961488c437863e31affc17 Depends-On: Ia1dcc86cb64f6aa39c68403d37bd76f970e55b97 Depends-On: Ib89ef9c900514d50173e13ab49d17c312b729900 Depends-On: If54244a0278d532c8553029c487c916068e1300f Depends-On: I8d9b34f5d1ed5b1534bb29f5cd6edcdc086b71ca Depends-On: I068f9f8e85e88a5c457d40e6a92f09b7eddd6b81 Depends-On: Iced2fc7b4f3cda5296532f22d233875bbc2f5d1b Depends-On: If14866f76703aa62d33e197bb18a5eacde7a55c0 Depends-On: I9b7fe5acee73c3a378153c0820b46816164ebf21 Depends-On: I95858c08bce0d90709ac7771a910f73d78cc8be4 Depends-On: If9a70e8f8545d4f9ee3b605ad849dbd7de742fc1 Depends-On: I982c81e1ad73b58a90649648e19501cf9172d493 Depends-On: I53a8fd22b22c93bba703233b62377c49ba9f5562 Depends-On: Ic532bca4348b17882716fcb2ca8656a04766c095 Depends-On: If34330acf97d2c4e357b693b086264a718738fb1 Change-Id: Ie4d6bbe258cc483d5693f7a27dbccb60d8f37e2c	2022-01-20 13:14:20 -05:00
Tim Starling	d636ae57c1	In WikitextContentHandler always use getFreshParser() Make it safe to parse articles while in the parser, by always calling getFreshParser() from WikitextContentHandler. I think ideally this should be a ParserFactory responsibility, with Parser instances stored by ParserFactory instead of directly by ServiceContainer, but this fixes the bug, follows existing conventions, and does not reduce performance in the usual case. Bug: T299149 Change-Id: Ibbc1423166f4804a5122de10293ea26f5704d96d	2022-01-14 09:36:02 +11:00
Derick Alangi	8fe9e0317f	Introduce `Redirect(Lookup&Store)` services to handle redirects The concept of a redirect chain didn't really work for a value of max redirect > 1. In the ideal world, we just want to have a source which points to target (source -> target) discarding the concept of a redirect chain completely. Having something like: source -> target -> target1 -> target2 doesn't really work well with the current database design. NOTE: Support for $wgMaxRedirect will be removed soon hence deprecation without interfaces for replacement. Bug: T290639 Change-Id: I469de6f85e405e8ddbe7abaa5b99b77cb9cf415d	2021-12-01 19:14:22 +01:00
C. Scott Ananian	06ab90f163	Add new ParserOutput::{get,set}OutputFlag() interface This is a uniform mechanism to access a number of bespoke boolean flags in ParserOutput. It allows extensibility in core (by adding new field names to ParserOutputFlags) without exposing new getter/setter methods to Parsoid. It replaces the ParserOutput::{get,set}Flag() interface which (a) doesn't allow access to certain flags, and (b) is typically called with a string rather than a constant, and (c) has a very generic name. (Note that Parser::setOutputFlag() already called these "output flags".) In the future we might unify the representation so that we store everything in $mFlags and don't have explicit properties in ParserOutput, but those representation details should be invisible to the clients of this API. (We might also use a proper enumeration for ParserOutputFlags, when PHP supports this.) There is some overlap with ParserOutput::{get,set}ExtensionData(), but I've left those methods as-is because (a) they allow for non-boolean data, unlike the *Flag() methods, and (b) it seems worthwhile to distinguish properties set by extensions from properties used by core. Code search: https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos= Bug: T292868 Change-Id: I39bc58d207836df6f328c54be9e3330719cebbeb	2021-10-15 14:25:54 -04:00
Roman Stolar	a68e641f9d	Move Content::getParserOutput & AbstractContent::fillParserOutput to ContentHandler Update/Create override classes of ContentHandler. Soft-deprecate and remove method from Content and classes that override them. Bug: T287158 Change-Id: Idfcfbfe1a196cd69a04ca357281d08bb3d097ce2	2021-09-29 13:10:51 +03:00
Roman Stolar	42442e01ff	Move Content::preloadTransform to ContentHandler Update ContentTransformer to access ContentHandler::preLoadTransform through the service. Prepare object to hold the data required for ContentHandler::preLoadTransform params. This is a fully backwards compatible change. We are doing hard deprecation via MWDebug::detectDeprecatedOverride. However, with the ContentHandler calling Content and Content calling ContentHandler, it doesn't matter whether callers use Content or ContentHandler. This will allow us to naturally convert all callers. Bug: T287157 Change-Id: I89537e1e7d24c6e15252b2b51890a0bd81ea3e6b	2021-08-17 15:17:34 +00:00
Petr Pchelko	bf438e8c87	Support deprecated Content::preSaveTransform override If an extension ContentHandler overrides one of the subclasses of the core ContentHandler, for example TextContentHandler, when switching calls we no longer call deprecated Content::preSaveTransform for the extension Content model. Bug: T288191 Change-Id: Ie7edc97be9098f3cd188949bd37943c37a0b65ff	2021-08-05 08:56:47 -07:00
Petr Pchelko	b782a7e66d	Move Content::preSaveTransform to ContentHandler Create ContentTransformer to access ContentHandler::preSaveTransform through the service. Prepare object to hold the data required for ContentHandler::preSaveTransform params. This will require making a semi-backwards-incompatible change no matter what, we don't really have a great way of hard-deprecating overriding methods. However, with the ContentHandler calling Content and Content calling ContentHandler, and with the ProxyContent trick to stop infinite recursion, it doesn't matter whether callers use Content or ContentHandler. This will allow us to naturally convert all callers. But won't really allow hard-deprecation. Bug: T287156 Change-Id: If6a2025868ceca3a3b6f11baec39695e47292e40	2021-07-29 18:06:02 +03:00
Thiemo Kreuz	1fc8d79ac6	Remove documentation that literally repeats the code For example, documenting the method getUser() with "get the User object" does not add any information that's not already there. But I have to read the text first to understand that it doesn't document anything that's not already obvious from the code. Some of this is from a time when we had a PHPCS sniff that was complaining when a line like `@param User $user` doesn't end with some descriptive text. Some users started adding text like `@param User $user The User` back then. Let's please remove this. Change-Id: I0ea8d051bc732466c73940de9259f87ffb86ce7a	2020-10-27 19:20:26 +00:00
Ed Sanders	7683f7d839	Use strict (in)equality with namespaces constants when LHS is definitely an integer Change-Id: I8fede00dfe1270d93c5d78d3c36e788cddfc8a99	2020-07-31 18:03:28 +01:00
Ed Sanders	0cf40a4f7a	Flip Yoda conditionals Change-Id: Id3495b6f15c267123c89f3a0ace496e6ecbeb58e	2020-07-22 17:49:12 +01:00
Petr Pchelko	204fa7e509	Remove usages of deprecated Language methods Change-Id: Iad3375b141b1d87c890baec6ecd16ed92f93e699	2020-02-16 00:45:48 +00:00
daniel	54c70c3551	Deprecate Content::getNativeData, add TextContent::getText getNativeData() is under-specified - callers can do nothing with the value returned by getNativeData without knowing the concrete Content class. And if they know the concrete class, they can and should use a specialized getter instead, anyway. Basically, getNativeData is overly generic, an example of polymorphism done poorly. Let's fix it now. Bug: T155582 Change-Id: Id2c61dcd38ab30416a25746e3680edb8791ae8e8	2019-01-16 11:57:50 -08:00
Aryeh Gregor	4bdae1c9d2	Convert remaining MagicWord:: calls to MagicWordFactory Bug: T200247 Depends-On: Ie061fe90f9b9eca0cbf7e8199d9ca325c464867a Change-Id: I49c507f3875e46a8e15fd2c28d61c17188aabffc	2018-08-01 10:47:43 +03:00
Bartosz Dziewoński	ecdef925bb	Miscellaneous indentation tweaks I was bored. What? Don't look at me that way. I mostly targetted mixed tabs and spaces, but others were not spared. Note that some of the whitespace changes are inside HTML output, extended regexps or SQL snippets. Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0	2017-02-27 19:23:54 +01:00
Stanislav Malyshev	2a395370fc	Create fields & data for image/file data indexing Bug: T145558 Change-Id: I23d4c8235d0e4150eefec31cea4b2cfdd32bf32a	2016-09-26 23:42:06 -07:00
jenkins-bot	fd8a5d4689	Merge "Remove SourceIndexField FLAG_SOURCE_DATA"	2016-09-01 18:59:55 +00:00
dcausse	2dc04ccdb4	Remove SourceIndexField FLAG_SOURCE_DATA Change-Id: I080f06a5a09f2d67a153b491555d0dbf65c626d0	2016-09-01 17:03:56 +02:00
jenkins-bot	dc36560cdf	Merge "Add DEFAULTSORT to search index field data"	2016-09-01 14:51:41 +00:00
dcausse	7c09f09432	Add DEFAULTSORT to search index field data Added FLAG_SOURCE_DATA to support additional data that is not supposed to be part of the default mapping. Should be merged with I1484c2e62788bedb57a42869a5fb25cd8f64482f, otherwise rebuilding an index may add an extra field to CirrusSearch mapping. Bug: T134978 Change-Id: Ia41f8eeb9dd4f764543bdd4d71b7a50de8101101	2016-08-29 16:51:57 +02:00
jenkins-bot	4b70bc2b28	Merge "Extract ParserOutput search index data fields from WikiTextContentHandler"	2016-08-19 18:40:17 +00:00
aude	64ee3d3269	Extract ParserOutput search index data fields from WikiTextContentHandler Bug: T142491 Change-Id: I69b010b893135e53fac7f16f4b927b8fbcba06d2	2016-08-19 09:26:17 -04:00
Stanislav Malyshev	9053f5f2c6	Fix text extraction where we don't have proper file handler Bug: T143251 Change-Id: I611f6a001bbcea971cc9126bd3f004622e88b47d	2016-08-17 13:54:23 -07:00
aude	c67536716d	Call parent::getFieldsForSearchIndex in ContentHandlers ContentHandler implementations were not including fields defined by their parent ContentHandler classes. A merge method is added to the SearchIndexFieldDefinition mock in SearchEngineTest, to allow merges of fields in the way that the SearchIndexFieldDefinition implementation does. Change-Id: Id04a51528f566da2666bad0394a2f61c949c69b4	2016-08-15 19:33:09 -04:00
Kunal Mehta	3cb341b185	content: Use "::class" when overriding TextContent::getContentClass() Change-Id: Iea03d2cd24fdb90253145a8abfefe9f8a09e46cd	2016-08-12 21:16:37 -07:00
Stanislav Malyshev	add1ebe2ab	Make content handlers assemble content for search Bug: T89733 Change-Id: Ie45de496ecc826211d98eea3a410c7639b4be0a4	2016-07-26 13:08:45 -07:00

65 commits