Commit graph

455 commits

Author SHA1 Message Date
Petr Pchelko
7c68ae9296 Safe ParserOutput extension data and JsonUnserializable helper.
One major difference with what we've had before is that now we
actually write class names into the serialization - given that
this new mechanism is extencible, we can't establish any kind
of mapping of allowed classes. I do not think it's a problem
though.

Bug: T264394
Change-Id: Ia152f3b76b967aabde2d8a182e3aec7d3002e5ea
2020-11-10 11:21:09 -07:00
jenkins-bot
646dd3d594 Merge "Introduce ParserOutputAccess" 2020-11-10 14:56:12 +00:00
daniel
67d0986211 Introduce ParserOutputAccess
Encapsulate logic for getting rendered page content, for any revision,
with caching and pooling hidden away.

Introducing such a service object will also give us a leverage point for
supporting output transformations. Output transformations are currently
implemented partially in ParserOutput, partially in Parser, and partially
duplicated in Parsoid.

Bug: T267234
Change-Id: I566d7a7936633823ba68b5aecbc8c2d88949b4f8
2020-11-10 15:12:12 +01:00
Petr Pchelko
017cfcf016 Forward-compat for merging CacheTime and ParserOutput mOptions
CacheTime::mUsedOptions and ParserOutput::mAccessedOptions
do exactly the same thing and has to be merged into a single property.
This patch adds forward-compatibility and needs to be deployed
at least one train before the patch which actually merges the properties.

Change-Id: Ic9d71a443994e2545ebf2a826b9155c82961cb88
2020-11-10 07:09:41 -07:00
daniel
cac89b547c ParserOutput: add support for binary properties in JSON.
This introduces a mechanism for encoding binary data in
strings set via setProperty(). This is needed to accommodate compressed
data as used by TemplateData, which uses gzip compression to make the
data fit into the page_props table.

Bug: T266200
Change-Id: I19fa0dea8c25d93fcdec9dc5ddd6f3c9c162b621
2020-11-04 18:52:09 +01:00
jenkins-bot
831fcaf0da Merge "Use ::class together with createMock in unit tests" 2020-10-30 17:56:24 +00:00
Petr Pchelko
2ab1aa3b0a Add some more tests for invalid JSON in ParserCache.
Change-Id: I4983592b9a964f4371ef42c824090468eb938862
2020-10-30 14:02:20 +00:00
Umherirrender
5d41326891 Use ::class together with createMock in unit tests
This makes it easier for IDEs to find usage
This works even for non-existing classes

Change-Id: I4a6389a9bc0b3c212633841d69bd4f48a7ed6f56
2020-10-30 14:45:37 +01:00
Thiemo Kreuz
1fc8d79ac6 Remove documentation that literally repeats the code
For example, documenting the method getUser() with "get the User
object" does not add any information that's not already there.
But I have to read the text first to understand that it doesn't
document anything that's not already obvious from the code.

Some of this is from a time when we had a PHPCS sniff that was
complaining when a line like `@param User $user` doesn't end
with some descriptive text. Some users started adding text like
`@param User $user The User` back then. Let's please remove
this.

Change-Id: I0ea8d051bc732466c73940de9259f87ffb86ce7a
2020-10-27 19:20:26 +00:00
Petr Pchelko
8cc6b7f99a ParserCache JSON - do not \u encode unicode and special characters.
Without passing ALL_OK constant, json-encoding will \u-escape
all the unicode, which will blow the size of serialized data,
especially on Russian wiki out of proportion.

Bug: T263579
Change-Id: Ifaaf1cdfaeeb17c3a99ed742b64ae5cc3157500c
2020-10-22 18:26:59 -07:00
DannyS712
e2731a76ad Normalize error messages for non-serializable properties
Change-Id: If599082bd4acdc9df5b32aaabf2ba8d24e830914
2020-10-21 22:49:57 +00:00
Petr Pchelko
09c14b9dd0 Move serializability validation from ParserOutput to ParserCache
Bug: T263579
Change-Id: Iac2dbc817c2e7af4a6d112f01bd380a04354db22
2020-10-15 13:15:30 -07:00
daniel
600f64029f Use JSON for parser cache
This adds JSON serialization and deserialization capabilities
to CacheTime and ParserOutput.

NOTE: JSON serialization is disabled for now. Merging this patch
should not change behavior in production.

Bug: T263579
Change-Id: I18187e8bce573d21f6f1bd29106e07c63a6d2f4d
2020-10-13 16:28:52 -07:00
Petr Pchelko
bb39896603 Hard-deprecate ParserCache::getKey.
Bug: T263689
Depends-On: I20b5a3eece79afaac6a4fef733d7a60ea23c6ffe
Depends-On: I3ed1188e267f4eaab0ae46f2bc6f9a379dea58ce
Change-Id: I30d05ee5b217fce0521d14867309979e76f34760
2020-10-13 08:31:23 -07:00
Petr Pchelko
13574e8404 Deprecate ParserCache::getKey and replace it with getMetadata
Bug: T263689
Change-Id: I4a71e5a7eb1c25cd53b857c115883cd00160736b
2020-10-13 08:31:22 -07:00
Petr Pchelko
8a879605d9 Add deserialization acceptance tests for ParserOutput
Bug: T264397
Change-Id: I6476fd9b8eff0e1b61ce5f43280d1cd9b7aaa77c
2020-10-12 08:55:32 +00:00
daniel
6eea7d7ed5 Add test infra for ParserCache serialization/deserialization
Based on Daniel's work at Ia6e70179b7ee5ce4e93888585ccc30d92da165c3
however was changed enough to move into a separate changeset.

More acceptance tests and data will be added in a followup commit.

Bug: T264397
Change-Id: I135187e83cbfa02b97c5656f0752f8bf1ceb58d0
2020-10-09 08:14:57 -06:00
daniel
ff07253be5 ParserCache: be resilient to string values
This makes the parser cache resilient to encountering string values
where it is currently expecting to get a ParserOutput objerct from the
underlying cache.

This provides forward compatibility with a switch to JSON based caching:
If we have to switch back after writing JSON to the cache for a while,
ParserCache would simply ignore the respective entries, rather than
causing fatal errors.

Bug: T263579
Change-Id: Iaed582097ab2d05edb4b99a738ac39c530fd63c1
2020-10-01 14:53:00 -06:00
Petr Pchelko
e7ff3cbb6b Cover ParserCache with integration tests
Bug: T250500
Change-Id: I8c45e7c6706b532f1569d06330cc45e841f208b7
2020-10-01 13:56:22 -06:00
Ppchelko
3254e41a4c Revert "Revert "Revert "Hard deprecate all public properties in CacheTime and ParserOutput"""
This reverts commit deacee9088.

Bug: T264257
Change-Id: Ie68d8081a42e7d8103e287b6d6857a30dc522f75
2020-10-01 12:03:41 -06:00
jenkins-bot
da3270feac Merge "Hard-deprecate passing -1 to CacheTime::setCacheTime" 2020-09-28 18:25:11 +00:00
jenkins-bot
b43b4c728f Merge "Revert "Revert "Hard deprecate all public properties in CacheTime and ParserOutput""" 2020-09-24 16:26:17 +00:00
Ppchelko
deacee9088 Revert "Revert "Hard deprecate all public properties in CacheTime and ParserOutput""
This reverts commit a4dc6d82af.

I've reverted the merged patch since I didn't do enough testing
on serialized/reserialized ParserOutput and CacheTime. Now I'm
confident serialization/deserialization works.

Changes since original reverted version:
 - Use __get/__set instead of DeprecationHelper in order to
   avoid $deprecateProperties array to be serialized.
 - Add test for old format serialization new format deserialization.

Change-Id: Ic911c2724ad709931d3316e609781fb89b5b7b28
2020-09-24 07:55:18 -07:00
Petr Pchelko
6a5eb040c7 Hard-deprecate passing -1 to CacheTime::setCacheTime
I've been trying to find how -1 can actually get to
the cacheTime and could find any usages in core or
extensions. The proper way of making ParserOutput
uncacheable is to use updateExpiry( 0 ), that is
documented and has some warning about performance costs.
Setting the cacheTime to -1 seems to be an undocumented
feature (it is mentioned in comments as undocumented)
that doesn't seem to be used anywhere anymore.

I intend to remove it entirely after the deprecation is
deployed and we so not see any warnings for a few weeks.

Change-Id: Ie90b7e4a21faae726940fa9082f2e6a6ea8df613
2020-09-23 19:48:51 -07:00
daniel
e6f37dc1d8 ParserOutput: don't throw on bad editsection
When ParserOutput encounters a bad page title in an editsection
placeholder, this should not cause a fatal error. We can just not
produce an edit link and continue.

It's still worth logging though, since the parser shouldn't be putting
invalid links into editsection placeholders.

Bug: T261347
Change-Id: I154e85aec4b408e659e6281b02473c51f370865d
2020-09-23 22:30:59 +00:00
Reedy
813bd8114a Use $msg2 in CoreParserFunctionsTest::testGender
Bug: T263091
Change-Id: Ifd5bcab9e3aba16ddf4b86d7e28971507bee696a
2020-09-17 03:26:43 +01:00
C. Scott Ananian
c704adaf9f Remove Parser::setFunctionTagHook(), deprecated in 1.35
Code search:
https://codesearch.wmcloud.org/search/?q=mFunctionTagHooks%7CsetFunctionTagHook&i=nope&files=&repos=

Bug: T236809
Change-Id: I293b017cd1caa646b71dffecab02c4cd6df6544c
2020-08-26 13:49:00 -04:00
C. Scott Ananian
b8abd8e01e Hard-deprecate Sanitizer::escapeIdReferenceList()
Code search: https://codesearch.wmcloud.org/search/?q=escapeIdReferenceList&i=nope&files=&repos=

Followup-To: Ifce057b0c436eabec310f812394e86ee7123e7c8
Change-Id: I18f2c47ad6b4f6256d1727f24314cc3c5e13f466
2020-08-20 19:59:13 -04:00
C. Scott Ananian
36da9ef204 Remove all methods of MWTidy except for MWTidy::tidy()
These methods were either @internal or deprecated in 1.35

Bug: T198214
Change-Id: Ica1d1fdfd2a23a2040eac90c71f6211a4513c916
2020-08-17 18:15:37 +00:00
addshore
959bc315f2 MediaWikiTestCase to MediaWikiIntegrationTestCase
The name change happened some time ago, and I think its
about time to start using the name name!
(Done with a find and replace)

My personal motivation for doing this is that I have started
trying out vscode as an IDE for mediawiki development, and
right now it doesn't appear to handle php aliases very well
or at all.

Change-Id: I412235d91ae26e4c1c6a62e0dbb7e7cf3c5ed4a6
2020-06-30 17:02:22 +01:00
Tim Starling
f7f6f0d700 Update LinkHolderArray tests for new HookContainer parameter
Change-Id: I63fc731ca1dbaef6f215279ee0b1788e735783df
2020-06-23 09:00:32 +10:00
Thiemo Kreuz
231bcef6af parser: Remove unused $query param from LinkHolderArray::makeHolder
We know it's never anything but an empty array:
https://codesearch.wmflabs.org/search/?q=makeHolder

Change-Id: Ibc230ec1a1a15a9a5dc61abe5b989a3391d671c1
2020-06-22 14:33:59 +00:00
Thiemo Kreuz
5f3a92385b Fix visibility of setUp/tearDown
Change-Id: I636be48eb9f713680abac35d46091f7b49374696
2020-06-16 21:02:05 +02:00
DannyS712
44945be0a5 Hard deprecate calling ParserOptions::newCanonical with no parameters
Falls back to $wgUser
No remaining deployed uses in MW 1.35+

Bug: T246861
Change-Id: If4304de546457fe0a96a6ac8d705a70c480c6fae
2020-06-15 23:11:45 +00:00
C. Scott Ananian
86fb3b14af Use 'list of allowed attributes' in Sanitizer, instead of 'whitelist'
Bug: T254646
Change-Id: I48d1a5b318c3511fae94291d84f65e5c9cd05a27
2020-06-10 15:58:39 -04:00
jenkins-bot
db3ee78de4 Merge "New unit and integraton tests for class LinkHolderArray" 2020-06-10 10:09:35 +00:00
Thiemo Kreuz
6aa6d10e86 Replace all call_user_func(_array) in all tests
There is native support for all of this now in PHP, thanks to changes
and additions that have been made in later versions. There should be no
need any more to ever use call_user_func() or call_user_func_array().

Reviewing this should be fairly easy: Because this patch touches
exclusivly tests, but no production code, there is no such thing as
"insufficent test coverage". As long as CI goes green, this should be
fine.

Change-Id: Ib9690103687734bb5a85d3dab0e5642a07087bbc
2020-06-06 18:41:20 +02:00
ArtBaltai
757072d182 New unit and integraton tests for class LinkHolderArray
Bug: T243747
Change-Id: I2c12cc76a9bf01eb527db3ea038e4adc59446cac
2020-06-04 11:38:40 +00:00
DannyS712
381d873a8b Replace core uses and hard deprecate Parser(Options) Revision methods
Bug: T249384
Change-Id: Iff10e76120eb8b6b4fbb939182dede83c86d3da2
2020-06-03 05:55:35 +00:00
Tim Starling
68c433bd23 Hooks::run() call site migration
Migrate all callers of Hooks::run() to use the new
HookContainer/HookRunner system.

General principles:
* Use DI if it is already used. We're not changing the way state is
  managed in this patch.
* HookContainer is always injected, not HookRunner. HookContainer
  is a service, it's a more generic interface, it is the only
  thing that provides isRegistered() which is needed in some cases,
  and a HookRunner can be efficiently constructed from it
  (confirmed by benchmark). Because HookContainer is needed
  for object construction, it is also needed by all factories.
* "Ask your friendly local base class". Big hierarchies like
  SpecialPage and ApiBase have getHookContainer() and getHookRunner()
  methods in the base class, and classes that extend that base class
  are not expected to know or care where the base class gets its
  HookContainer from.
* ProtectedHookAccessorTrait provides protected getHookContainer() and
  getHookRunner() methods, getting them from the global service
  container. The point of this is to ease migration to DI by ensuring
  that call sites ask their local friendly base class rather than
  getting a HookRunner from the service container directly.
* Private $this->hookRunner. In some smaller classes where accessor
  methods did not seem warranted, there is a private HookRunner property
  which is accessed directly. Very rarely (two cases), there is a
  protected property, for consistency with code that conventionally
  assumes protected=private, but in cases where the class might actually
  be overridden, a protected accessor is preferred over a protected
  property.
* The last resort: Hooks::runner(). Mostly for static, file-scope and
  global code. In a few cases it was used for objects with broken
  construction schemes, out of horror or laziness.

Constructors with new required arguments:
* AuthManager
* BadFileLookup
* BlockManager
* ClassicInterwikiLookup
* ContentHandlerFactory
* ContentSecurityPolicy
* DefaultOptionsManager
* DerivedPageDataUpdater
* FullSearchResultWidget
* HtmlCacheUpdater
* LanguageFactory
* LanguageNameUtils
* LinkRenderer
* LinkRendererFactory
* LocalisationCache
* MagicWordFactory
* MessageCache
* NamespaceInfo
* PageEditStash
* PageHandlerFactory
* PageUpdater
* ParserFactory
* PermissionManager
* RevisionStore
* RevisionStoreFactory
* SearchEngineConfig
* SearchEngineFactory
* SearchFormWidget
* SearchNearMatcher
* SessionBackend
* SpecialPageFactory
* UserNameUtils
* UserOptionsManager
* WatchedItemQueryService
* WatchedItemStore

Constructors with new optional arguments:
* DefaultPreferencesFactory
* Language
* LinkHolderArray
* MovePage
* Parser
* ParserCache
* PasswordReset
* Router

setHookContainer() now required after construction:
* AuthenticationProvider
* ResourceLoaderModule
* SearchEngine

Change-Id: Id442b0dbe43aba84bd5cf801d86dedc768b082c7
2020-05-30 14:23:28 +00:00
jenkins-bot
5a412a6046 Merge "Use HTML5 semantics for self-closed HTML tags in wikitext" 2020-05-28 23:02:54 +00:00
C. Scott Ananian
05bc687111 Use HTML5 semantics for self-closed HTML tags in wikitext
This behavior has been deprecated and with a tracking category since
1.28.  Time to remove the temporary parameter added to
Sanitizer::removeHTMLtags() and (finally) tweak the behavior to match
HTML5.

Bug: T134423
Change-Id: I5c725175d05854139c95a2b3d8d35ff63cb6707b
2020-05-27 11:59:18 -04:00
DannyS712
a8ba047008 ParserOptionsTest: Rename non-global variable $wgLang
Bug: T160814
Change-Id: I281e71e22f335427829e07b9fce25a753e7f5246
2020-05-24 23:10:55 +00:00
DannyS712
7b08cbf765 ParserOptionsTest: Rename non-global variable $wgUser
Bug: T243708
Change-Id: I135085fff0f1d9b24a69f6d048f3d3d3e97a8470
2020-04-29 17:32:10 +00:00
jenkins-bot
eb88d78c3a Merge "Refactor magic word implementations out of Parser.php" 2020-04-22 08:34:54 +00:00
C. Scott Ananian
7a2331706f Deprecate Parser::firstCallInit()
Originally we created a Parser object on every request, and so care
was taken to make Parser construction lightweight.  In particular,
all potentially costly initialization was moved into a separate
Parser::firstCallInit() method.  Starting with 1.32, parser construction
has instead been done lazily, via the ParserFactory registered with
MediaWikiServices.  The extra complexity associated with the old manual
lazy initialization of Parser is therefore no longer needed.

Deprecate Parser::firstCallInit() as part of a general plan to refactor
the Parser class to allow subclasses and alternate parser implementations.
Add some tests to assert that parsers are being created lazily, and are
not being created when they are not needed.

Bug: T250444
Change-Id: Iffd2b38a2f848dad88010d243250b37506b2c715
2020-04-17 12:49:34 -04:00
C. Scott Ananian
7dd65ba43d Deprecate old-style accessor/mutation methods of Parser
Parser::Options(), Parser::OutputType(), and Parser::Title() have been
deprecated.  All of these had incomplete replacements with either a
::get* method or a ::set* method (and in the case of Title, both).
Add the missing getters or setters where required.

Only Parser::Title() has been hard deprecated.  Replacing the other
uses in deployed code requires the newly-added Parser::getOutputType()
or Parser::setOptions() methods, so we can't replace those methods in
our deployed code after this patch has been merged.

Code search:
https://codesearch.wmflabs.org/deployed/?q=-%3E%28OutputType%7CTitle%7COptions%29%5C%28&i=nope&files=&repos=

Bug: T236809
Change-Id: I0b4d5f170216597afb259cedbb13b8028d284715
2020-04-16 16:37:02 -04:00
C. Scott Ananian
0eaaceea3e Hard-deprecate direct calls to Parser::__construct()
These were deprecated in 1.34, but let's put in some hard deprecation
warnings for 1.35 since this class is certainly going to be refactored
in the future to allow both the legacy parser and Parsoid to extend
Parser as a base class.  Access via the ParserFactory will be fine,
but cut down on the number of different ways Parsers can be constructed
and initialized.

Code search:
https://codesearch.wmflabs.org/deployed/?q=new%20Parser%5C%28&i=nope&files=&repos=
https://codesearch.wmflabs.org/deployed/?q=getMockBuilder%5C%28%20Parser%3A%3A&i=nope&files=&repos=
https://codesearch.wmflabs.org/deployed/?q=new%20Parser%3B&i=nope&files=&repos=

Bug: T236811
Depends-On: Ib3be450c55e1793b027d9b4dae692ba5891b0328
Depends-On: I9d16513f8bd449a43b0a0afbd73651a5c0afa588
Depends-On: I74efda708470efeb82f8f80346ec1ee7e9fd8f2b
Depends-On: I777475d0ab0144e53240173f501d6c8da35d33fb
Change-Id: If36283ec0b78b188b61f658639105d1ed7653e0a
2020-04-16 16:34:34 -04:00
C. Scott Ananian
83a22b7fcd Remove codepaths which ran parser in 'untidy' mode
Disabling tidy has been deprecated since 1.33.  This cleans up the code
paths which still used untidy output.

Bug: T198214
Change-Id: I821ef3b8f59b272d983583d407b2f0794fe1e791
2020-04-13 21:34:04 +00:00
C. Scott Ananian
0a53c9725a Refactor magic word implementations out of Parser.php
This allows them to be more easily reused by other Parser implementations
(ie, Parsoid), and helps keep Parser.php compact.

Bug: T236813
Change-Id: I68fb1e786374e445b7df047934c532d7e10b8e94
2020-04-10 14:43:40 -04:00