2321 commits

Author SHA1 Message Date
Arlo Breault
fdd8f864b8 Emit media structure as piloted in Parsoid
Gated behind the flag $wgParserEnableLegacyMediaDOM. The scattershot
usage of it is a little unfortunate, but it isn't expected to live very
long, so maybe that's acceptable.

Further details can be found at:
https://www.mediawiki.org/wiki/Parsing/Media_structure

Bug: T51097
Bug: T266148
Bug: T271129
Change-Id: I978187f9f6e9e0a105521ab3e26821e36a96b911
2021-06-24 23:32:40 +00:00
jenkins-bot
f3ecfead48 Merge "Add <figure> to the never suppressing group in BlockLevelPass" 2021-06-24 19:37:41 +00:00
Arlo Breault
5438048386 Add <figure> to the never suppressing group in BlockLevelPass
This matches Parsoid and shouldn't be an issue since nothing in core is
generating <figure>s yet, presumably.

Bug: T51097
Change-Id: I9f489b13d5d4db0415a3f0f0dbb936c8e89461fa
2021-06-24 15:07:04 -04:00
DannyS712
29ec3ec7e3 Remove $wgUser fallback in ParserOptions
ParserOptions::__construct() and ::newCanonical()
no longer accept null and fall back to the global
$wgUser. Instead, ::__construct() has a typehint
for a UserIdentity, and ::newCanonical() will throw
an exception.

Bug: T284977
Change-Id: I35865e160190582ab10abaa696c6fc6686cc8989
2021-06-24 02:55:20 +00:00
DannyS712
47d70dbfba Post Revision-removal cleanup
Updates for the removal of the Revision class itself
and the various methods/hooks/variables removed in the
process, including:

- Update some documentation removing most references
to the Revision class and updating the MCR migration
notes to reflect the past tense for Revision methods.

- Change some capitalization from "Revision" to "revision"
to make it clear comments are about revisions in general,
not the Revision class in particular.

- Minor code tweaks including removing unused variables that
were around for the old hooks that were removed, and
removing the use of DeprecatablePropertyArray where no
longer needed for anything.

- Fix incorrect documentation for PageUpdater::getStatus(),
the status value changed a while ago to have revision-record
in addition to revision, and recently to only have the
revision-record, but ironically PageUpdater was never updated.

- Remove Parser::$mRevisionObject, which used to hold a Revision
object and was deprecated in 1.35. It was missed earlier because it
was no longer being set to Revision objects and was always null.

- Add RevisionRecord typehints in DummyLinker to match those
in the corresponding Linker methods.

This should be a no-op in terms of functionality.

Bug: T247143
Change-Id: I03bbb94fc29085855448780b1a5ad9063911ecc4
2021-06-24 00:32:39 +00:00
Thiemo Kreuz
2ba01c7ee7 Remove some more comments that literally repeat the code
… including PHPDoc tags like `@return <type> $variableName`.
A return value doesn't have a variable name. I can see that
some people do this intentionally, repeating the variable
name that was used in the final `return $var;` at the end
of a method. This can indeed be helpful. I left a lot of
these untouched and removed them only when they were obviously
wrong or did not provide any information beyond what the code
already says.

Change-Id: Ia18cd9f25ef658b08ad25b97a744897e2a8deffc
2021-06-18 21:23:56 +00:00
Thiemo Kreuz
51777ee8c1 Add and fix various type hints in PHPDocs
Random fixes I collected over the past weeks in my local dev
environment.

Change-Id: Ic8a6262fd28e05cb57335f2faf390a47ff97dbaa
2021-06-18 08:19:23 +00:00
Tim Starling
9c3c0b704b Use array_fill_keys() instead of array_flip() if that reflects the developer's intention
array_fill_keys() was introduced in PHP 5.2.0 and works like
array_flip() except that it does only one thing (copying keys) instead
of two things (copying keys and values). That makes it faster and more
obvious.

When array_flip() calls were paired, I left them as is, because that
pattern is too cute. I couldn't kill something so cute.

Sometimes it was hard to figure out whether the values in the array_flip()
result were used. That's the point of this change: if you use
array_fill_keys(), the intention is obvious.
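
To make the difference concrete, here is a minimal Python model of the two PHP built-ins. It is only a sketch: `php_array_flip` and `php_array_fill_keys` are hypothetical helpers that imitate the PHP functions with dicts. They are not MediaWiki code.

```python
def php_array_flip(arr: dict) -> dict:
    """Model of PHP array_flip(): values become keys and keys become values.

    If a value occurs more than once, the last key wins, as in PHP.
    """
    return {value: key for key, value in arr.items()}


def php_array_fill_keys(keys, value) -> dict:
    """Model of PHP array_fill_keys(): every element of `keys` becomes a key
    mapped to the same `value`."""
    return {key: value for key in keys}


# A PHP list is an array keyed 0..n-1; model it as a dict.
tags = {0: "div", 1: "p", 2: "aside"}

flipped = php_array_flip(tags)                     # values are positions 0, 1, 2
as_set = php_array_fill_keys(tags.values(), True)  # values are all True

# Both answer "is this key present?" the same way...
assert ("div" in flipped) == ("div" in as_set)
# ...but the flipped array carries positions, and position 0 is falsy
# if a caller tests the value instead of the key's presence.
assert not flipped["div"] and as_set["div"]
```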

Change-Id: If8d340a8bc816a15afec37e64f00106ae45e10ed
2021-06-15 00:11:10 +00:00
Petr Pchelko
92564edc7c Use Message::page instead of Message::title
Also modified new APIs added to ApiErrorFormatter to
use PageReference instead of Title.

Change-Id: I093c89f8e1e6d383603f887358be6ece70f23a02
2021-06-09 13:18:22 +00:00
Petr Pchelko
fb6529e653 FileRepo::findFile - support Authority
Change-Id: Ib42b7f7d5aa88447b4fb363f52062b08a1af30c3
2021-05-26 19:01:12 -07:00
DannyS712
8ec531f3c8 Add @since to some Parser methods
Change-Id: If8fd6fe6f0cf3aeaeb43e3fdccc774089b9c058e
2021-05-20 11:21:58 -07:00
Umherirrender
8831b494c1 Use @deprecated annotation on hook interfaces, not functions
Use only one place to document the deprecation of hook
interfaces/functions

Bug: T282903
Change-Id: Ie7d2d7a50afe2897e5c2369f473a33ecaa821637
2021-05-17 23:00:40 +02:00
jenkins-bot
f808e7dc4a Merge "WikiPage: Document triggerOpportunisticLinksUpdate and related code" 2021-05-05 22:41:05 +00:00
Timo Tijhof
481f1a49d6 WikiPage: Document triggerOpportunisticLinksUpdate and related code
== History of WikiPage::triggerOpportunisticLinksUpdate ==

* 2007 (r19095; T10575; b3a8d488a8)

  Introduces the "cascading protection" feature.

  This commit added code to Article.php, in a conditional branch
  where we encountered a ParserCache "miss" and thus have done a
  fresh parse. The code in question would query which templates
  we ended up using, and if that differed from what the database
  said (e.g. stored during the last actual edit or links update),
  then a new LinksUpdate is ad-hoc constructed and executed.

  I could not find it anywhere explicitly spelled out, but my best
  guess is that the reason for this is to make sure that if the page
  in question contains wikitext that transcludes a different page based
  on the current date and time (such as how most Wikipedia main pages
  transclude news information and "Did you know" information based on
  dated subpages that are prepared in advance), then we don't just
  want to re-render the page after a day has passed, we also want to
  re-do the links update to ensure the search index, category links,
  and "WhatLinksHere" are correct, and thus, by extension, to make sure
  that cascading protection from the main page does in fact apply
  to the "current" set of subpages and templates actually in-use.

* 2007 (r19227; 0c0c0eff81)

  This adds an optimisation to the added logic that limits it to
  pages that satisfy `mTitle->areRestrictionsCascading()`.

  Thus for most articles, which aren't protected at all, we don't
  run LinksUpdate mid-request after a cache miss page view.

  Because of this commit, the pre-2007 status quo remained unaltered
  and has remained unaltered to this very day: we don't re-index
  categories, WhatLinksHere, etc. unless an article edit or
  propagating template edit takes place.

* 2009 (r52888; 1353a8ba29)

  Introduces the PoolCounter feature.

  The logic in question moves to Article::doCascadeProtectionUpdates().

* 2015 (Iea952d4d2e66; df5ef8b5d7)

  The logic in question is changed, motivated by wanting to avoid
  DB writes during page views.

  * Instead of executing LinksUpdate mid-request, we now queue a
    RefreshLinksJob on the JobQueue, and utilize a newly added
    `prioritize => true` parameter.

  This commit also introduces a new feature, which is to queue
  RefreshLinksJob also for pages that do not have cascading
  protection, but that do satisfy a new boolean method
  called `$parserOutput->hasDynamicContent()`, which is set when
  the Parser encounters TTL-reducing magic words and functions
  such as {{CURRENTDAY}} and {{#time}}. For this new case, however,
  the `prioritize` parameter is not set, and this feature is disabled
  in WMF production (and other farms that enable $wgMiserMode).

  This commit also renamed doCascadeProtectionUpdates()
  to triggerOpportunisticLinksUpdate().

  This commit also removed various documentation comments, which
  I've partly restored in this patch, the patch you're looking at
  now.
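
Read as a decision procedure, the behaviour from 2015 onward comes down roughly to the following. This is a simplified Python sketch and not the WikiPage code. `Job` and `opportunistic_links_update` are hypothetical names, and the sketch leaves out any further checks the real method may perform (for example, whether the links are actually stale, or de-duplication of repeated jobs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for a queued RefreshLinksJob."""
    page: str
    prioritize: bool


def opportunistic_links_update(title: str, restrictions_cascading: bool,
                               has_reduced_expiry: bool, miser_mode: bool,
                               queue: list) -> None:
    """Decide whether to queue a links refresh after a ParserCache miss."""
    if restrictions_cascading:
        # Cascade-protecting pages: the links should really be up to date,
        # so the job is queued with prioritize => true.
        queue.append(Job(title, prioritize=True))
    elif has_reduced_expiry and not miser_mode:
        # Output with a reduced cache expiry (e.g. {{CURRENTDAY}}): queue a
        # normal-priority job, but not on farms with $wgMiserMode enabled.
        queue.append(Job(title, prioritize=False))
    # Otherwise nothing is queued; links are only refreshed on edits.
```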

== Actual changes ==

* Rename hasDynamicContent() to hasReducedExpiry() and keep the
  previous method as a non-deprecated wrapper.

  This change is motivated by T280605, in which I intend to make use
  of a Parser hook that reduces the cache expiry. There are numerous
  extensions in WMF production that already do this, and thus the
  assumption that these have "dynamic content" is already false in
  some cases. I'm not yet sure how, or whether, to refactor this so as
  to allow reducing the TTL *without* causing this side-effect, but as
  a first step we can make the method more obvious in its impact
  and behaviour.

  I've also updated two of the callers that I think will benefit from
  this more explicit name and (current) implementation detail.

Bug: T280605
Change-Id: I85bdff7f86911f8ea5b866e3639f08ddd3f3bf6f
2021-05-05 02:03:30 +01:00
DannyS712
f60ea069ba Remove remaining non-test uses of Revision objects
The following methods no longer support Revision parameters:
- CategoryMembershipChange::__construct
- ContentHandler::getUndoContent
- DerivedPageDataUpdater::prepareUpdate
- DifferenceEngine::getRevisionHeader

The following methods were removed entirely:
- Title::countAuthorsBetween

The following methods return arrays that formerly included
a 'revision' key that would emit deprecation warnings when
accessed and return a Revision object. The Revision object
has been removed from the arrays, and the 'revision-record'
key should be used to get the relevant RevisionRecord instead:
- PageUpdater::doModify
- PageUpdater::doCreate
- Parser::statelessFetchTemplate

The ParserOptions `templateCallback` option is a callback
that is called in Parser::fetchTemplateAndTitle() and should
return an array. The 'revision' key in that array used to
hold a Revision object and was used if no 'revision-record'
was returned; it is now ignored.

Bug: T247143
Change-Id: I163ada88d649c75697aff4fa31a3a3c0bdef78b7
2021-05-04 13:10:22 -07:00
DannyS712
7bd7d2a6c1 Remove hooks that use Revision objects
All hooks were previously hard deprecated
in 1.35. Affected hooks:
* ArticleRevisionUndeleted - use RevisionUndeleted
* ArticleRollbackComplete - use RollbackComplete
* DiffRevisionTools - use DiffTools
* DiffViewHeader - use DifferenceEngineViewHeader
* HistoryRevisionTools - use HistoryTools
* NewRevisionFromEditComplete - use RevisionFromEditComplete
* PageContentInsertComplete - use PageSaveComplete
* PageContentSaveComplete - use PageSaveComplete
* ParserFetchTemplate - use BeforeParserFetchTemplateRevisionRecord
* RevisionInsertComplete - use RevisionRecordInserted
* TitleMoveComplete - use PageMoveComplete
* TitleMoveCompleting - use PageMoveCompleting
* UndeleteShowRevision - no replacement

Includes a fix for setting the associated rev id
of page protections. Previously this was only done
using $nullRevision, a Revision object that was created
only if any hooks needed it; those hooks were hard deprecated,
so on WMF production the rev id was not being set.

Bug: T247143
Depends-On: Idfa345193ae99fb2f1c9a8f8d28d8d540a6e3d62
Change-Id: I519167f76a5a3c1f5410415b2721462a3dcc3ec8
2021-04-30 17:28:20 +00:00
jenkins-bot
b92d5e0e0e Merge "Document methods that may return StubUserLang instead of Language" 2021-04-30 04:36:32 +00:00
Umherirrender
e69f56fc9b build: Remove unneeded phpcs:ignore on false positives
False positives are resolved with the current release

Change-Id: I21986ec808edb341bf56abae8ee4e34e1559bc49
2021-04-29 23:50:07 +02:00
jenkins-bot
acb29985a2 Merge "Parser: remove Title from method signatures" 2021-04-29 18:28:28 +00:00
daniel
4880a82555 Parser: remove Title from method signatures
Bug: T281068
Change-Id: I3280e38dd82d71845c343eeb911e71dd33bb380b
2021-04-29 18:11:46 +02:00
Petr Pchelko
61599cd74a Clean up hard-deprecated Parser methods returning Revision
Bug: T278376
Change-Id: Ia4b5ab71c1df20e07dbfa3465be022225e8b44c1
2021-04-26 13:59:53 -07:00
jenkins-bot
ee3e2a572d Merge "Don't p-wrap <aside> tags in extension HTML" 2021-04-26 18:50:46 +00:00
Ammarpad
ed6450374b Document methods that may return StubUserLang instead of Language
Typehinting parameters that take the return value of these methods
with Language is not safe, as they may return the global $wgLang, which
may or may not be an instance of Language.
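
The hazard can be illustrated with a small Python model. This is a sketch under assumptions. `Language` here is a toy class. `LazyLanguageStub` is a hypothetical stand-in for StubUserLang: it forwards calls but is not a subclass. `require_language` imitates a parameter typehinted as Language.

```python
class Language:
    """Hypothetical stand-in for MediaWiki's Language class."""

    def __init__(self, code: str):
        self.code = code

    def get_code(self) -> str:
        return self.code


class LazyLanguageStub:
    """Lazy placeholder in the spirit of StubUserLang: not a Language
    subclass, but it builds the real object on first use and forwards to it."""

    def __init__(self, factory):
        self._factory = factory
        self._real = None

    def __getattr__(self, name):
        # Python calls this only for attributes the stub itself lacks.
        if self._real is None:
            self._real = self._factory()
        return getattr(self._real, name)


def require_language(lang) -> str:
    """Behaves like a parameter typehinted as Language: rejects anything else."""
    if not isinstance(lang, Language):
        raise TypeError(f"expected Language, got {type(lang).__name__}")
    return lang.get_code()


stub = LazyLanguageStub(lambda: Language("en"))
code = stub.get_code()  # duck typing works: the call is forwarded to the real object
# require_language(stub) would raise TypeError, just as a PHP typehint would fail.
```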

Bug: T278429
Change-Id: Ia5a71e4c39124f4427bd816e6e19207bb371cc6b
2021-04-19 12:50:14 +00:00
Bartosz Dziewoński
89eaaac661 Parser: Trim trailing whitespace as the last step in pre-save transform
It was accidentally removed in 2016 in commit
85034abca5.

Bug: T279964
Change-Id: I1da4d67143b86e7f852be7ccf3f16ae7b4f99bc4
2021-04-14 20:53:49 +02:00
daniel
489e2826e0 ParserCache: fix stats for metadata cache misses
Cache misses in metadata were miscounted as miss.unserialize.
Count them as miss.absent.metadata instead.
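
The intended classification can be sketched in Python as follows. The sketch makes several assumptions. `classify_metadata_lookup` is a hypothetical function. JSON decoding stands in for the real unserialization step. The backend is assumed to return None when there is no entry. Only the two metric names come from the message above.

```python
import json
from collections import Counter


def classify_metadata_lookup(raw):
    """Return the metric key for one metadata lookup, or None if it succeeded.

    `raw` is whatever the cache backend returned; None means "no entry".
    """
    if raw is None:
        # Nothing stored: a plain miss, counted separately from decode errors.
        return "miss.absent.metadata"
    try:
        json.loads(raw)  # stand-in for the real unserialization step
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # An entry exists but cannot be decoded.
        return "miss.unserialize"
    return None


stats = Counter()
for raw in [None, "{not json", '{"ok": true}']:
    key = classify_metadata_lookup(raw)
    if key is not None:
        stats[key] += 1
```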

Change-Id: Idff062325a34445478a4543709a9f2b3cc365f60
2021-04-08 17:54:01 +02:00
Petr Pchelko
d1f481f242 ParserCache: only use in-process caching for metadata
CachedBagOStuff caches negatives, so it breaks PoolCounter.
We only need to cache metadata in-process, since it's commonly
used twice within the request.
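
One way to read the PoolCounter problem: after a miss, a request waits while another worker renders and stores the page, then it re-checks the cache. If that re-check is answered by an in-process layer that memoized the earlier miss, the waiter never sees the new entry. The Python sketch below models this failure mode. `SharedCache` and `NegativeCachingLayer` are hypothetical stand-ins, not BagOStuff classes.

```python
class SharedCache:
    """Stand-in for the shared cache backend seen by every worker."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)  # None means "not found"

    def set(self, key, value):
        self.data[key] = value


class NegativeCachingLayer:
    """In-process layer that memoizes every lookup, misses included."""

    def __init__(self, backend):
        self.backend = backend
        self.local = {}

    def get(self, key):
        if key not in self.local:
            self.local[key] = self.backend.get(key)  # a None result is memoized too
        return self.local[key]


shared = SharedCache()
cache = NegativeCachingLayer(shared)

first = cache.get("page:42")      # miss, so this request waits for the PoolCounter lock
shared.set("page:42", "<html/>")  # meanwhile the lock holder renders and stores the page
second = cache.get("page:42")     # re-check after waiting: still the memoized miss
direct = shared.get("page:42")    # the shared backend does have the fresh entry
```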

Bug: T277829
Change-Id: I11a147c24b6cdb275b521b48802d6f3d0e1a4387
2021-04-06 17:53:38 -06:00
Máté Szabó
377c53ae51 Don't p-wrap <aside> tags in extension HTML
Our PortableInfobox extension uses the HTML5 <aside> tag in its generated HTML.
This tag isn't recognized as a block element (in the way e.g. <div> is) by the
legacy parser, resulting in some spurious empty paragraphs in the output.

As a fix, make the legacy parser aware of <aside> tags to avoid unnecessary
p-wrapping. Also add <aside> to the Sanitizer's internal attribute check.
I3e57f55ac69d2c1ee8a1d41c21b692e56fc7e628 takes care of updating Parsoid-PHP
accordingly.
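
The real BlockLevelPass is far more involved. The toy Python version below only shows the kind of stray <p> wrapping caused by a tag missing from the block-level list. `is_block_line`, `p_wrap` and both tag tuples are hypothetical.

```python
import re


def is_block_line(line: str, block_tags) -> bool:
    """True if the line starts with an opening or closing tag named in block_tags."""
    pattern = r"^\s*</?(?:%s)\b" % "|".join(map(re.escape, block_tags))
    return re.match(pattern, line, re.IGNORECASE) is not None


def p_wrap(lines, block_tags):
    """Toy p-wrapping: wrap each non-empty line that is not a block-tag line."""
    return [line if is_block_line(line, block_tags) else f"<p>{line}</p>"
            for line in lines if line.strip()]


OLD_TAGS = ("div", "table", "blockquote")  # hypothetical subset without <aside>
NEW_TAGS = OLD_TAGS + ("aside",)

doc = ['<aside class="portable-infobox">', "<div>Infobox body</div>", "</aside>"]
before = p_wrap(doc, OLD_TAGS)  # both <aside> lines end up inside <p> wrappers
after = p_wrap(doc, NEW_TAGS)   # every line passes through unchanged
```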

Bug: T278565
Change-Id: I89dbdf7770e13e1b62320228a366c64e64217b0b
2021-04-06 16:26:12 +02:00
Petr Pchelko
f642215aed Convert ParserCache to PageRecord
ParserOptions was not updated because it depends on the
Title::getLanguage implementation.

Tests converted to not require a DB anymore. Can't be proper unit
tests yet due to globals in ParserOptions and fake time hacks,
but exec time does go down from 70 seconds to 9 seconds.

Page content model is still emitted in the metrics since
it was considered useful. Should be removed when we get
something like a page type concept.

Change-Id: Ib16fd0b5b87ffc3cb4d21f4aa43d1203cb7206d2
2021-04-02 21:14:54 -06:00
jenkins-bot
e4488f349a Merge "RevisionRenderer should set revision ID/Timestamp in ParserOutput" 2021-03-26 22:20:00 +00:00
Petr Pchelko
37030c04f0 RevisionRenderer should set revision ID/Timestamp in ParserOutput
ParserOutput object wraps revision ID and revision timestamp
of the parsed revision. Currently ParserCache sets these properties,
but that's not its job at all: whatever generates the ParserOutput
knows much better what revision it parsed. This also allows us to
simplify ParserCache and more easily switch it to PageRecord.

I've only removed setting the timestamp inside ParserCache
because it's a blocker for PageRecord; I will do follow-ups
to remove the $revId parameter from ParserCache as well.

cacheRevisionId should also be renamed, but later.

Bug: T278284
Change-Id: I9a82e9fd154b29a81d1f7a3c4abb073c9a27314e
2021-03-24 10:25:56 -06:00
Petr Pchelko
7bf51ccef3 Convert ParserOptions to UserIdentity
We still need a lot of refactoring in ParserOptions
construction, but for now converting the public interface
should be enough.

Change-Id: I04663c39ca037129b827b33555c3f59def5f9b59
2021-03-24 09:40:42 -06:00
Reedy
cce3fb49d0 Use more neutral or alternative language
Bug: T277987
Change-Id: Iafc4b3e3137936046487119b7e17635f4e560277
2021-03-20 19:47:18 +00:00
jenkins-bot
cba20b9981 Merge "When the parser fetches revision content, guard against empty slots" 2021-03-18 11:51:15 +00:00
Petr Pchelko
0f0a11f6dc Make Parser use UserIdentity instead of User
Change-Id: Idf8578e88af1fd4824f49417a200b16befdbca51
2021-03-17 13:51:52 -06:00
Tim Starling
e767ae1ecc When the parser fetches revision content, guard against empty slots
Bug: T276476
Change-Id: I014da3a333f8ee6ca623b98c415b8d9f9d1be084
2021-03-17 12:18:10 +11:00
C. Scott Ananian
6844f3f158 Make Parser::$mPreprocessor private
This property was deprecated in 1.35.  The replacement function
Parser::getPreprocessor() was introduced in MediaWiki 1.12.0
(commit 8404b249ad).

Code search:
https://codesearch.wmcloud.org/search/?q=mPreprocessor&i=nope&files=&excludeFiles=&repos=

Bug: T275160
Change-Id: Ie5368fce94b5a239a91552bc7a145d9bdfaf47e5
Depends-On: I0917290d4ade8675b2d0eac17a22682c9c1b4a85
2021-03-16 22:38:01 +00:00
C. Scott Ananian
4497f99796 Parser: initialize preprocessor in constructor
Initializing the preprocessor in the constructor allows better
dependency injection, and removes code complexity caused by
lazy initialization.  Any use of the parser is going to end
up creating the preprocessor in any case, so deferring the
initialization doesn't save any performance.  (Best performance
is given by not creating the Parser in the first place if it
is not needed, which is what DI allows.)

Old code tried to unbreak cyclic dependencies by setting the
preprocessor to null.  This is somewhat of a lost cause,
since there are a number of other cyclic dependencies
involving the parser, including StripState, LinkHolders,
etc.  The code complexity is not worth it, given how
ineffective it is in any case.

This is part of T275160 in so far as it allows
Parser::getPreprocessor() to be a simple getter, and thus
(once this patch is merged) we can safely replace any
direct access to Parser::$mPreprocessor with a call to
Parser::getPreprocessor().
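
The contrast between lazy and constructor-time initialization can be sketched in Python. This is an illustration under assumptions, not the Parser code. `Preprocessor`, `LazyParser`, `Parser` and the `preprocessor_factory` parameter are hypothetical. The factory parameter only illustrates how creating the dependency up front makes it injectable.

```python
class Preprocessor:
    """Hypothetical stand-in; keeps a back-reference to its parser."""

    def __init__(self, parser):
        self.parser = parser


class LazyParser:
    """Old style: the preprocessor is created on first use."""

    def __init__(self):
        self._preprocessor = None

    def get_preprocessor(self):
        if self._preprocessor is None:
            self._preprocessor = Preprocessor(self)
        return self._preprocessor


class Parser:
    """New style: the preprocessor exists from construction onward."""

    def __init__(self, preprocessor_factory=Preprocessor):
        # The factory parameter is hypothetical; it lets a caller swap the class.
        self._preprocessor = preprocessor_factory(self)

    def get_preprocessor(self):
        # A plain getter, so it can safely replace direct property access.
        return self._preprocessor


parser = Parser()
same = parser.get_preprocessor() is parser.get_preprocessor()
```

With the eager version there is no "not yet created" state. The getter therefore has no branch, and any direct read of the property returns the same object the getter would.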

Bug: T275160
Change-Id: I38c6fe7d5a97badffdbf34d8b9d725756ed86514
2021-03-16 22:37:40 +00:00
C. Scott Ananian
e95b42eda6 Make Parser::$mOutput private
This property was deprecated in 1.35.  The replacement function
Parser::getOutput() was introduced in MediaWiki 1.12.2
(commit 350b498b9f).

Code search:
https://codesearch.wmcloud.org/search/?q=ser-%3EmOutput&i=nope&files=&excludeFiles=&repos=

Depends-On clauses below are for WMF-deployed code.  Other uses in
non-WMF-deployed code have been patched in:
* I550b19f58520f30ce158dab1969108edc9cdcce9 (SemanticDrilldown)
* https://github.com/SemanticMediaWiki/SemanticFormsSelect/pull/81
* Ic2798b0df5f1f11aea6becdfc186f1be0ecb43e4 (ApprovedRevs)
* I58dff3fc17292d9f6b5e1e43b3d18485027ec880 (DisplayTitle)
* Idd9736dacf257788d74e503687b8554e138ec3c5 (JsonData)
* Ia42f9fa0c45abe6eef21c9815f3f6d6794e3cf95 (MediaFunctions)
* I4f4f7b0118470741a6cdaba562f858e425fcf350 (ParserFunctions)
* Ie44573d9952e62e1fe75e2b9f4691e0d757c53c1 (PhpTags)
* Ib59a4789cfeebf1acbc24c5c00fb996413ae9d5c (SmoothGallery)
* I2b6f3be928a4cb101836ded7abaf2eb8665d4d50 (TinyMCE)
* https://gitlab.com/hydrawiki/extensions/DynamicPageList/-/merge_requests/118
* https://gitlab.com/nonsensopedia/extensions/advancedbacklinks/-/merge_requests/91
* https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/4935

Bug: T275160
Depends-On: I4f4f7b0118470741a6cdaba562f858e425fcf350
Change-Id: Ib5e9e22db1781ba338dc63ec479ef587de2cd675
2021-03-16 22:37:08 +00:00
C. Scott Ananian
77c48b6857 Remove Parser::$mConf
This was deprecated in 1.34 and then made private; no one uses it any more.

Code search:
https://codesearch.wmcloud.org/search/?q=-%3EmConf&i=nope&files=&excludeFiles=&repos=

Bug: T275160
Change-Id: I4f054328dcc20091030c130ddbcb5fac1eeeac82
2021-03-16 19:43:22 +00:00
C. Scott Ananian
ff20e86a4c Make Parser::$mFunctionHooks private
This property was deprecated in 1.35.

Code search:
https://codesearch.wmcloud.org/search/?q=mFunctionHooks&i=nope&files=&excludeFiles=&repos=

No dependencies in WMF-deployed code.  One use in non-WMF-deployed code:
* I5ae19465561b150ee1c74a1fe03fa359964e81c4

Bug: T275160
Change-Id: I96ca88048c5a1cc8032ebcd502015819958680fb
2021-03-16 19:42:25 +00:00
C. Scott Ananian
1c6e24a1a2 Make Parser::$mTagHooks private
This property was deprecated in 1.35.

Code search:
https://codesearch.wmcloud.org/search/?q=mTagHooks&i=nope&files=&excludeFiles=&repos=

Dependencies below are for WMF-deployed code.  Other uses in
non-WMF-deployed code have been patched in:
* I435b0d1ccae9d9bf6fff85dc3e79d3c4b447eb37
* I85ef0e6ce3f0c818df85809d39259d13b56d966c
* Idab6c9475f78ff4040061f2f317560bbe41666d8

Bug: T275160
Depends-On: Ic5445471d770e396421a4fb2bcfbe1490a77e1bf
Depends-On: Ib708e3f84aa871de84aa56561c875f4a85bb000c
Change-Id: I42e23b101e870b66d169cbb731a0359e90f46265
2021-03-16 19:42:12 +00:00
C. Scott Ananian
3f990d5b4c Inline Parser::firstCallInit() into ::__construct()
This has effectively been the case since 1.35; this just cleans up the
remaining code which assumed it still needed to explicitly call
Parser::firstCallInit() on a newly-constructed Parser.

Bug: T250444
Change-Id: I340947c721172f12ff413322b4283627c0b0b3a4
2021-03-16 19:41:56 +00:00
C. Scott Ananian
115410f077 Parser::__construct(): Remove deprecated argument variants
A number of different argument variants were deprecated in 1.34,
and direct calls to the Parser constructor were deprecated at the
same time (a ParserFactory should be used instead).  These were
hard-deprecated in 1.35.  They should be safe to remove now.

Code search:
https://codesearch.wmcloud.org/deployed/?q=new%20Parser%5C%28&i=nope&files=%5C.php%24&excludeFiles=&repos=

Bug: T236811
Change-Id: I58f7b3ba1b1d62851b2db71197a8d9129e8d473d
2021-03-16 19:41:45 +00:00
C. Scott Ananian
e99cf5c98d Deprecate MWTidy and TidyDriverBase::supportsValidate()
Also copied the tests that used to be in TidyTest into
RemexDriverTest, so that we're not losing coverage when MWTidy is
eventually removed.

Bug: T198214
Change-Id: I0b301f6c98d0943ce4b6dc224f1066cb7bf244d1
2021-03-16 12:29:55 -07:00
jenkins-bot
f6e4f7280b Merge "Introduce Tidy service" 2021-03-16 19:29:17 +00:00
jenkins-bot
1b8fb385a0 Merge "Parser: Move Sanitizer::normalizeCharReferences into RemexCompatFormatter" 2021-03-16 19:09:28 +00:00
jenkins-bot
461b4ef559 Merge "Minor readability tweaks in Parser.php" 2021-03-16 16:53:41 +00:00
Peter Ovchyn
45140daa29 Avoid using User ::getDefaultOption, ::getDefaultOptions
This patch hard-deprecates the methods above

Bug: T276035
Change-Id: Ic36b0702f7547acce0d162d6e0b54bbd4ecf4d81
2021-03-16 17:24:17 +02:00
jenkins-bot
92aab7c56b Merge "French spacing: don't require non-space before French spacing" 2021-03-15 21:44:21 +00:00
C. Scott Ananian
1fd4a7af4e Introduce Tidy service
Refactor the old MWTidy singleton as a DI service.

Change-Id: I95605ea5fd22f53a7f90fe07a6a73fa6c959597a
2021-03-15 17:22:36 -04:00