152 commits

Author SHA1 Message Date
Petr Pchelko
d334de960a Expand local URLs to absolute URLs in ParserOutput
New option 'absoluteURLs' was added to getText method
of the ParserOutput object that replaces all links
in the page HTML with absolute URLs.

Removing the action=render special case from Title
seems safe cause we will end up replacing the result
with absolute URL if we're in a render action no matter
where Title::getLocalUrl was called from.

This change is safely revertable from the perspective
of ParserCache.

Bug: T263581
Change-Id: Id660e1026192f40181587199d3418568f0fdb6d3
2021-09-23 11:48:51 -07:00
Timo Tijhof
e387cd9c35 Change trivial use of getVal('action') to getRawVal
Per docs added in I18767cd809f67b, these don't need normalization
as they are only compared against predefined strings, and besides
are generally entered manually in a form, and even then would not
require the kinds of Unicode chars that have multiple/non-normalized
forms.

In nearby areas to also fix some trivial cases:

* getVal('title') obviously needs normalization.
  Use getText() to make this more obvious.

* getVal() compared against simple string literals within the code
  obviously don't need normalization (e.g. printable === 'no').

* Change hot code in MediaWiki checking for whether 'diff' or 'oldid'
  are set to getCheck (which uses getRawVal) instead of getVal.
  As a bonus this means it now handles values like "0" correctly,
  which could theoretically have caused bad behaviour before.

Change-Id: Ied721cfdf59c7ba11d1afa6f4cc59ede1381238e
2021-08-26 22:11:58 +01:00
Petr Pchelko
1aa68d183d Remove deprecated ParserCache::getKey and ::getEtag
Change-Id: Idea037eaab851110d0c58f537dafcb2153cd2613
2021-07-27 14:47:49 -07:00
Thiemo Kreuz
51777ee8c1 Add and fix various type hints in PHPDocs
Random fixes I collected the past weeks in my local dev
environment.

Change-Id: Ic8a6262fd28e05cb57335f2faf390a47ff97dbaa
2021-06-18 08:19:23 +00:00
daniel
489e2826e0 ParserCache: fix stats for metadata cache missed
Cache misses in metadata were miscounted as miss.unserialize.
Count them as miss.absent.metadata instead.

Change-Id: Idff062325a34445478a4543709a9f2b3cc365f60
2021-04-08 17:54:01 +02:00
Petr Pchelko
d1f481f242 ParserCache: only use in-process caching for metadata
CachedBagOStuff caches negatives, so it breaks PoolCounter.
We only need to cache metadata in-process, since it's commonly
used twice within the request.

Bug: T277829
Change-Id: I11a147c24b6cdb275b521b48802d6f3d0e1a4387
2021-04-06 17:53:38 -06:00
Petr Pchelko
f642215aed Convert ParserCache to PageRecord
ParserOptions not updated cause they depend on Title::getLanguage
implementation.

Tests converted to not require a DB anymore. Can't be proper unit
tests yet due to globals in ParserOptions and fake time hacks,
but exec time does go down from 70 seconds to 9 seconds.

Page content model is still emitted in the metrics since
it was considered useful. Should be removed when we get
something like a page type concept.

Change-Id: Ib16fd0b5b87ffc3cb4d21f4aa43d1203cb7206d2
2021-04-02 21:14:54 -06:00
Petr Pchelko
37030c04f0 RevisionRenderer should set revision ID/Timestamp in ParserOutput
ParserOutput object wraps revision ID and revision timestamp
of the parsed revision. Currently ParserCache sets these properties,
but it's not at all its job - whatever generates the ParserOutput
knows much better what revision it parsed. This also allows us to
simplify ParserCache and more easily switch it to PageRecord.

I've only removed setting the timestamp inside ParserCache
cause it's a blocker for page record, I will do follow-ups
to remove the $revId parameter from ParserCache as well.

cacheRevisionId should also be renamed, but later.

Bug: T278284
Change-Id: I9a82e9fd154b29a81d1f7a3c4abb073c9a27314e
2021-03-24 10:25:56 -06:00
Timo Tijhof
eb7b9c8e7d ParserCache: Instrument CachedBagOStuff to understand dupe fetches
Follows-up 66cc685b45.

Bug: T269593
Change-Id: Iff5267689a17281330307575d618cfd531051e57
2021-03-13 01:43:10 +00:00
jenkins-bot
d491f23b90 Merge "Respect used options for ParserOptions::isSafeToCache" 2021-01-25 19:13:53 +00:00
Petr Pchelko
7e8d1a11c8 Return back accidentally removed ParserCache 'hit' metric
Change-Id: Ibd69e532a2f373f9d0129ac2a2c6ac70039c9bec
2021-01-05 14:44:19 -06:00
Petr Pchelko
46b66f093a Respect used options for ParserOptions::isSafeToCache
Bug: T269293
Change-Id: Ic3cf908265ad470815f0ac81442d33bde04a5665
2021-01-04 10:32:34 -06:00
Petr Pchelko
71bb51ed55 ParserCache: general code cleanup, abstracted expiration checks.
Change-Id: I7374f30d582064236b8f782e6a2528eb692e3010
2020-12-16 12:09:55 +00:00
Petr Pchelko
66cc685b45 Make ParserCache use CachedBagOStuff
Bug: T269593
Change-Id: I21e6e39eccad22b781252b142c1e5b079c1ee0b4
2020-12-07 10:28:30 -06:00
Petr Pchelko
4417b13d58 Make ParserCache respect ParserOptions::isSafeToCache
Bug: T269154
Change-Id: I8e9ecd2787aa8d172e708ba64ea936e63fbc6b36
2020-12-02 14:02:36 -06:00
Petr Pchelko
b956c77d27 Merge CacheTime and ParserOutput accessedOptions properties
Change-Id: I5785596d68e8923f8bcbd182ace0b1991bd75c9a
2020-11-19 10:12:39 -07:00
Petr Pchelko
dbdc2a3cd3 Introduce JsonCodec to help with serialization/deserialization
Change-Id: I5433090ae8e2b3f2a4590cc404baf838025546ce
2020-11-19 08:32:21 -07:00
Petr Pchelko
7c68ae9296 Safe ParserOutput extension data and JsonUnserializable helper.
One major difference with what we've had before is that now we
actually write class names into the serialization - given that
this new mechanism is extensible, we can't establish any kind
of mapping of allowed classes. I do not think it's a problem
though.

Bug: T264394
Change-Id: Ia152f3b76b967aabde2d8a182e3aec7d3002e5ea
2020-11-10 11:21:09 -07:00
Petr Pchelko
8cc6b7f99a ParserCache JSON - do not \u encode unicode and special characters.
Without passing ALL_OK constant, json-encoding will \u-escape
all the unicode, which will blow the size of serialized data,
especially on Russian wiki out of proportion.

Bug: T263579
Change-Id: Ifaaf1cdfaeeb17c3a99ed742b64ae5cc3157500c
2020-10-22 18:26:59 -07:00
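
The size effect described in commit 8cc6b7f99a above can be illustrated with a minimal sketch. It uses Python's standard `json` module rather than MediaWiki's PHP encoder, so `ensure_ascii` only stands in for the escaping behaviour that the commit's ALL_OK constant controls, and the sample string is made up for illustration.

```python
import json

# Hypothetical sample: a short Cyrillic string standing in for page content.
sample = {"text": "Привет, мир"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(sample)

# With escaping disabled, the characters are written as-is.
raw = json.dumps(sample, ensure_ascii=False)

# Compare the serialized sizes in bytes once encoded as UTF-8.
escaped_size = len(escaped.encode("utf-8"))
raw_size = len(raw.encode("utf-8"))

print(escaped_size, raw_size)
```

Both forms decode to the same data; only the stored size differs, which is why the effect is most visible on wikis written in non-Latin scripts.
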
DannyS712
e2731a76ad Normalize error messages for non-serializable properties
Change-Id: If599082bd4acdc9df5b32aaabf2ba8d24e830914
2020-10-21 22:49:57 +00:00
Petr Pchelko
2bbf1dc97e ParserCache: add serialization format to HTML debug message.
Bug: T263579
Change-Id: I80f316ce78285cb245e05d01c7e1a8e314a2e732
2020-10-20 12:48:44 -07:00
Petr Pchelko
e269dd028b Hard-deprecate ParserCache::getETag.
This is not ParserCache business to build etags for output.

See https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/4862
for removal of the only use.
Change-Id: Iceb6bd761acc7511ea7d9d14b9df2e9e1fa51648
2020-10-16 20:17:26 +00:00
jenkins-bot
ed57d5295f Merge "Move serializability validation from ParserOutput to ParserCache" 2020-10-16 13:19:59 +00:00
Petr Pchelko
0f16608e6d Add basic docs for ParserCache
Change-Id: I6290c2f064d6ddc4693a27f1d8bf933bcdb4293f
2020-10-15 13:51:25 -07:00
Petr Pchelko
09c14b9dd0 Move serializability validation from ParserOutput to ParserCache
Bug: T263579
Change-Id: Iac2dbc817c2e7af4a6d112f01bd380a04354db22
2020-10-15 13:15:30 -07:00
daniel
0c059b7381 ParserCache: introduce feature flag for enabling JSON encoding.
This introduces $wgParserCacheUseJson for selectively enabling
JSON encoding in the parser cache. This is intended for testing only.

It should be removed before the release of 1.36.

Bug: T263579
Change-Id: I0d9cab3fafb984a3159e24f9e80f792429ff3c71
2020-10-13 23:46:57 +00:00
daniel
600f64029f Use JSON for parser cache
This adds JSON serialization and deserialization capabilities
to CacheTime and ParserOutput.

NOTE: JSON serialization is disabled for now. Merging this patch
should not change behavior in production.

Bug: T263579
Change-Id: I18187e8bce573d21f6f1bd29106e07c63a6d2f4d
2020-10-13 16:28:52 -07:00
Petr Pchelko
bb39896603 Hard-deprecate ParserCache::getKey.
Bug: T263689
Depends-On: I20b5a3eece79afaac6a4fef733d7a60ea23c6ffe
Depends-On: I3ed1188e267f4eaab0ae46f2bc6f9a379dea58ce
Change-Id: I30d05ee5b217fce0521d14867309979e76f34760
2020-10-13 08:31:23 -07:00
Petr Pchelko
13574e8404 Deprecate ParserCache::getKey and replace it with getMetadata
Bug: T263689
Change-Id: I4a71e5a7eb1c25cd53b857c115883cd00160736b
2020-10-13 08:31:22 -07:00
jenkins-bot
f43007d3f1 Merge "HACK/ParserCache: Force cache-miss if mUsedOptions is undefined" 2020-10-05 13:58:14 +00:00
daniel
ff07253be5 ParserCache: be resilient to string values
This makes the parser cache resilient to encountering string values
where it is currently expecting to get a ParserOutput object from the
underlying cache.

This provides forward compatibility with a switch to JSON based caching:
If we have to switch back after writing JSON to the cache for a while,
ParserCache would simply ignore the respective entries, rather than
causing fatal errors.

Bug: T263579
Change-Id: Iaed582097ab2d05edb4b99a738ac39c530fd63c1
2020-10-01 14:53:00 -06:00
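
The resilience idea in commit ff07253be5 above (treat an entry of an unexpected type as a cache miss instead of failing) can be sketched as follows. This is a Python illustration under stated assumptions, not the MediaWiki implementation: `ParsedPage`, `get_parsed` and the dictionary-backed store are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ParsedPage:
    """Hypothetical stand-in for the object the cache expects to hold."""
    html: str


def get_parsed(store: Dict[str, Any], key: str) -> Optional[ParsedPage]:
    """Return the cached object, or None (a miss) for anything unexpected."""
    value = store.get(key)
    if isinstance(value, ParsedPage):
        return value
    # Absent entries and entries in another format (for example a JSON
    # string written by a newer version) are both treated as misses.
    return None


# Hypothetical store mixing an expected object and a string entry.
store = {
    "page:1": ParsedPage(html="<p>ok</p>"),
    "page:2": '{"html": "<p>json</p>"}',
}
```

A caller that gets `None` would re-parse the page, so a rollback after writing the newer format costs extra parses rather than fatal errors.
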
Petr Pchelko
e7ff3cbb6b Cover ParserCache with integration tests
Bug: T250500
Change-Id: I8c45e7c6706b532f1569d06330cc45e841f208b7
2020-10-01 13:56:22 -06:00
Timo Tijhof
b52660a1f1 HACK/ParserCache: Force cache-miss if mUsedOptions is undefined
These are causing thousands of errors from wmf.11-cached pages
since we rolled back to wmf.10.

Bug: T264257
Change-Id: Ia3357b2f593ca16fc12241d7ea22bbfd222f2536
(cherry picked from commit 71ee44aabba5c10187ad6d5cb26b5ef072cbf9b2)
2020-10-01 18:25:47 +00:00
Ppchelko
3254e41a4c Revert "Revert "Revert "Hard deprecate all public properties in CacheTime and ParserOutput"""
This reverts commit deacee9088.

Bug: T264257
Change-Id: Ie68d8081a42e7d8103e287b6d6857a30dc522f75
2020-10-01 12:03:41 -06:00
Petr Pchelko
f24125684c Clean up ParserCache construction and inject logger
Bug: T263583
Depends-On: Iceaa0e872c53aa79b7012711813895221fa62fa6
Change-Id: I6f131a078e9d6eb5da3533b0ac3730e24bd3f56f
2020-09-28 13:17:30 -07:00
jenkins-bot
17291773c1 Merge "Create ParserCacheFactory." 2020-09-28 16:13:37 +00:00
Petr Pchelko
6417f2c49f ParserCache::get - drop support for passing Article.
Deprecated in 1.35. However, if you look closely,
the deprecation warning emitting code was passing
numeric 1.35 instead of a string '1.35' which caused
the deprecation function to throw an exception.

Thus, this code has not been deprecated in 1.35, but
was accidentally broken. Instead of fixing the deprecation,
just remove the fallback.

Change-Id: I369f03d6b01053fc0396beb635c7b7d49bd249da
2020-09-27 15:46:34 -07:00
Petr Pchelko
fec48eb5a4 Create ParserCacheFactory.
* Makes ParserCache take the root of the key
  as a constructor argument
* Introduces a ParserCacheFactory

Next steps:
- convert FlaggedRevs to using this.
- cleanup

This assumes that we wouldn't want to differentiate
the parser cache settings per use-case, as it is now
for default vs flaggedrevs caches. There are only two settings:
$wgParserCacheType - name of the BagOStuff to use
$wgParserCacheExpireTime - the expiration time.
I think if we wanted to have different settings for different
caches, we could add that as a next step.

Bug: T263583
Change-Id: I188772da541a95c95a5ecece7c7dd748395506c2
2020-09-25 18:17:58 -07:00
Ppchelko
deacee9088 Revert "Revert "Hard deprecate all public properties in CacheTime and ParserOutput""
This reverts commit a4dc6d82af.

I've reverted the merged patch since I didn't do enough testing
on serialized/reserialized ParserOutput and CacheTime. Now I'm
confident serialization/deserialization works.

Changes since original reverted version:
 - Use __get/__set instead of DeprecationHelper in order to
   avoid $deprecateProperties array to be serialized.
 - Add test for old format serialization new format deserialization.

Change-Id: Ic911c2724ad709931d3316e609781fb89b5b7b28
2020-09-24 07:55:18 -07:00
Ppchelko
a4dc6d82af Revert "Hard deprecate all public properties in CacheTime and ParserOutput"
This reverts commit 799c10b7eb.

Reason for revert: Didn't test how this would work with deserializing stored ParserOutput.

Change-Id: I4221bc26282f3b4bd044f0ab50d00e77eb57ede0
2020-09-23 22:46:33 +00:00
Petr Pchelko
799c10b7eb Hard deprecate all public properties in CacheTime and ParserOutput
* In preparation for ParserCache/Parsoid integration, it's nice to
  do some cleanups. Will untie our hands a bit more.
* Verified no usages in extensions deployed at Wikimedia, other than
  Flow, fixed in the dependent patch.

Change-Id: Idd78413a36887e2ff5c902d410e55691cafb736b
2020-09-23 07:17:13 -07:00
Tim Starling
6b05a27987 Require three parameters to ParserCache::__construct()
Change-Id: I8a74fdf016bafa2efd32ef81f3c51909bc1d8ec7
Depends-On: I8bc1b94c01d2e6e0b352a44bcb8e1d24a9fbe4ee
2020-09-18 08:14:15 +10:00
Umherirrender
381c934075 Use StatsdDataFactory service in ParserCache
New argument is optional, because extension extends this class

Change-Id: I710016c0ca9f8bb595d9f3ccd9452c76fdda3ef3
2020-06-21 21:15:17 +02:00
DannyS712
cbbd029cac Remove terminating line breaks from wfDebugLog calls
Change-Id: Iac61ba7924597d654df7bf0a9136eeb3adbe0eef
2020-06-03 02:48:36 +00:00
Tim Starling
47a1619027 Remove terminating line breaks from debug messages
A terminating line break has not been required in wfDebug() since 2014,
however no migration was done. Some of these line breaks found their way
into LoggerInterface::debug() calls, where they mess up the formatting
of the debug log.

So, remove terminating line breaks from wfDebug() and
LoggerInterface::debug() calls.

Also:
* Fix the stripping of leading line breaks from the log header emitted
  by Setup.php. This feature, accidentally broken in 2014, allows
  requests to be distinguished in the log file.
* Avoid using the global variable $self.
* Move the logging of the client IP back to Setup.php. It was moved to
  WebRequest in the hopes that it would not always be needed, however
  $wgRequest->getIP() is now called unconditionally a few lines up in
  Setup.php. This means that it is put in its proper place after the
  "start request" message.
* Wrap the log header code in a closure so that variables like $name do
  not leak into global scope.
* In Linker.php, remove a few instances of an unnecessary second
  parameter to wfDebug().

Change-Id: I96651d3044a95b9d210b51cb8368edc76bebbb9e
2020-06-03 12:01:16 +10:00
Tim Starling
68c433bd23 Hooks::run() call site migration
Migrate all callers of Hooks::run() to use the new
HookContainer/HookRunner system.

General principles:
* Use DI if it is already used. We're not changing the way state is
  managed in this patch.
* HookContainer is always injected, not HookRunner. HookContainer
  is a service, it's a more generic interface, it is the only
  thing that provides isRegistered() which is needed in some cases,
  and a HookRunner can be efficiently constructed from it
  (confirmed by benchmark). Because HookContainer is needed
  for object construction, it is also needed by all factories.
* "Ask your friendly local base class". Big hierarchies like
  SpecialPage and ApiBase have getHookContainer() and getHookRunner()
  methods in the base class, and classes that extend that base class
  are not expected to know or care where the base class gets its
  HookContainer from.
* ProtectedHookAccessorTrait provides protected getHookContainer() and
  getHookRunner() methods, getting them from the global service
  container. The point of this is to ease migration to DI by ensuring
  that call sites ask their local friendly base class rather than
  getting a HookRunner from the service container directly.
* Private $this->hookRunner. In some smaller classes where accessor
  methods did not seem warranted, there is a private HookRunner property
  which is accessed directly. Very rarely (two cases), there is a
  protected property, for consistency with code that conventionally
  assumes protected=private, but in cases where the class might actually
  be overridden, a protected accessor is preferred over a protected
  property.
* The last resort: Hooks::runner(). Mostly for static, file-scope and
  global code. In a few cases it was used for objects with broken
  construction schemes, out of horror or laziness.

Constructors with new required arguments:
* AuthManager
* BadFileLookup
* BlockManager
* ClassicInterwikiLookup
* ContentHandlerFactory
* ContentSecurityPolicy
* DefaultOptionsManager
* DerivedPageDataUpdater
* FullSearchResultWidget
* HtmlCacheUpdater
* LanguageFactory
* LanguageNameUtils
* LinkRenderer
* LinkRendererFactory
* LocalisationCache
* MagicWordFactory
* MessageCache
* NamespaceInfo
* PageEditStash
* PageHandlerFactory
* PageUpdater
* ParserFactory
* PermissionManager
* RevisionStore
* RevisionStoreFactory
* SearchEngineConfig
* SearchEngineFactory
* SearchFormWidget
* SearchNearMatcher
* SessionBackend
* SpecialPageFactory
* UserNameUtils
* UserOptionsManager
* WatchedItemQueryService
* WatchedItemStore

Constructors with new optional arguments:
* DefaultPreferencesFactory
* Language
* LinkHolderArray
* MovePage
* Parser
* ParserCache
* PasswordReset
* Router

setHookContainer() now required after construction:
* AuthenticationProvider
* ResourceLoaderModule
* SearchEngine

Change-Id: Id442b0dbe43aba84bd5cf801d86dedc768b082c7
2020-05-30 14:23:28 +00:00
Reedy
b038d6333a Fix even more PSR12.Properties.ConstantVisibility.NotFound
Change-Id: I6d98efcfac1f1c0ab6a442e0af6d5daa6ef7801a
2020-05-16 00:28:41 +00:00
DannyS712
4721717527 Replace uses and hard deprecate Article:: and WikiPage::getRevision
Bug: T250532
Bug: T239975
Change-Id: Ic8f2baa0ac805d5196a7107bdc7a1abb36eba139
2020-04-20 23:06:48 +00:00
ArtBaltai
13ae7b807f ParserCache::get use WikiPage only as argument
ParserCache works only with WikiPage; remove Article and Page interfaces.
Rename WikiPage property names and type hints.

Bug: T248719
Change-Id: I08afded432b059f94538be574a4789e18e89bf03
2020-04-12 03:49:48 +03:00
C. Scott Ananian
8a1c656150 Hard deprecate ParserCache::singleton(), deprecated in 1.30
Code search:
https://codesearch.wmflabs.org/search/?q=ParserCache%5Cs*%3A%3A%5Cs*singleton&i=fosho&files=&repos=

Bug: T249032
Change-Id: I22308bb2530a4aaa6a29e42d50fd679b932a6e9f
2020-04-01 10:31:38 -04:00