Commit graph

158 commits

Author SHA1 Message Date
thiemowmde
52963bbcc0 tests: Make use of ?? and ??= operators in test code
I believe the more recent syntax is quite a bit more readable. The
most obvious benefit is that it allows for much less duplication.

Note this patch is intentionally only touching tests, so it can't
have any effect on production code.

Change-Id: Ibdde9e37edf80f0d3bb3cb9056bee5f7df8010ee
2024-08-08 15:51:20 +02:00
Yiannis Giannelos
90bac43f11 Extract StatsFactory methods in parsoid SiteConfig
* It's not very clean to import Wikimedia\Stats in parsoid
  * Mediawiki depends on parsoid
* As a workaround we can extract the 2 methods we need in SiteConfig

Bug: T354908
Change-Id: I696131cfba6ccc26ae1f705f216e221a7c3db175
2024-07-10 18:01:56 +02:00
Ebrahim Byagowi
fab78547ad Add namespace to the root classes of ObjectCache
And deprecated aliases for the non-namespaced classes.

ReplicatedBagOStuff, which is already deprecated, isn't moved.

Bug: T353458
Change-Id: Ie01962517e5b53e59b9721e9996d4f1ea95abb51
2024-07-10 00:14:54 +03:30
Umherirrender
f27c2433bb tests: Use namespaced classes (2)
Changes to the use statements done automatically via script
Addition of missing use statement done manually

Change-Id: I4ff4d0c10820dc2a3b8419b4115fadf81a76f7a2
2024-06-13 23:21:02 +02:00
Isabelle Hurbain-Palatin
f65d1c44d0 Make $headers['content-language'] a string instead of Bcp47Code
Page bundle headers should not contain objects, as they are supposed
to represent plaintext HTTP headers.

Change-Id: I2a87a8233b9e42cbafdba63bdf513abe00d826ce
2024-06-11 11:08:34 +02:00
jenkins-bot
08ef39abfd Merge "Move ParsoidOutputAccess::supportsContentModel() into Parsoid SiteConfig" 2024-05-22 16:46:31 +00:00
C. Scott Ananian
a565e388f9 Move ParsoidOutputAccess::supportsContentModel() into Parsoid SiteConfig
The `supportsContentModel` method is really querying Parsoid for the
set of content models it supports, so it makes sense to put it in the
Parsoid-specific SiteConfig service.

This is part of the work to deprecate and remove ParsoidOutputAccess.

Change-Id: I81eb2df8cef93ede95361a4e03185b3d58e5b84b
2024-05-22 10:57:37 -04:00
Fomafix
66aa439d00 Parser: Inject service LanguageNameUtils
Change-Id: Ia9884f991550c96e4d9bbca9bfb882144716cd24
2024-05-20 19:23:37 +00:00
thiemowmde
52ddf3e8ce Remove all @package comments
I don't think these do anything with the documentation generators
we currently use. Especially not in tests. How are tests part of a
"package" when the code is not?

Note how most of these are simply identical to the namespace. They
are most probably auto-generated by some IDEs but don't actually
mean anything.

Change-Id: I771b5f2041a8e3b077865c79cbebddbe028543d1
2024-05-10 13:53:15 +02:00
jenkins-bot
b6df1fd6f7 Merge "Re-enable test after bumping Parsoid" 2024-04-22 04:34:20 +00:00
jenkins-bot
76a7f4aefd Merge "Skip test to bump Parsoid version" 2024-04-22 03:30:13 +00:00
Arlo Breault
9731a015f5 Re-enable test after bumping Parsoid
Follows-Up: I10b77b800dd23f00707011f545817182d3cb58b7
Change-Id: Id1b684876b6fbcafc96e4ae35cd9712720bad1c9
2024-04-19 20:11:26 -04:00
Arlo Breault
0ab4f85ed2 Skip test to bump Parsoid version
The method was moved / renamed.

Needed-By: I441699e7fe9827a5e06e4638ce88c685deb9b856
Change-Id: I10b77b800dd23f00707011f545817182d3cb58b7
2024-04-19 20:10:42 -04:00
Umherirrender
8d97313f81 Fix some line indent
Change-Id: I8f82724197d20f9289d80e138d80310f1eab29f2
2024-04-20 00:25:15 +02:00
thiemowmde
021f2f6c32 Add __debugInfo to MediaWikiTestCaseTrait::createNoOpMock
I keep running into this whenever I use createNoOpMock. I think it's
XDebug that's calling this method, and then PHPUnit flooding the
console with extremely long stack traces.

We pretty much never do anything custom with this method:
https://codesearch.wmcloud.org/search/?q=__debugInfo&files=%5C.php%24

Change-Id: Ib2ab86fb243555f5e4449ed72cb032cb465e415d
2024-04-10 14:25:07 +02:00
Derick Alangi
483321e601 parser: Remove explicit StatsdDataFactory backward-compat logic
This is a follow-up to: I0b683461212a357c7eb09ddec59c87539e323c65
and I40a8372a76f33c5f62ea73bb1180dd7c47412c89 which explicitly for
backward compatibility reasons supports IBufferingStatsdDataFactory.

Now that we've fully switched to StatsFactory together with the
`copyToStatsdAt()` method, we're fine to fully remove this `instanceof`
logic.

Bug: T356815
Change-Id: I164d82904b6d3fb575cb973c14f9454569bf09ac
2024-03-26 22:53:58 +00:00
jenkins-bot
cf35b37992 Merge "HtmlOutputRendererHelper: fall back to page language" 2024-03-13 15:57:24 +00:00
jenkins-bot
c3cc71b430 Merge "test: Add PHPUnit tests for ParsoidParserFactory" 2024-03-13 08:49:12 +00:00
Doğu Abaris
8a1eae0684 test: Add PHPUnit tests for PageContent
Covered:
- Constructor initialization with correct dependencies.
- Retrieve roles assigned to page content.
- Check if the specified role exists in the page content slots.
- Retrieve model name for specified role in page content
- Handle exception for non-existent role when retrieving model
- Retrieve content format for specified role in page content
- Retrieve serialized content for specified role in page content
- Handle exception for non-existent role when retrieving content

Change-Id: Ia2129e37b15bb8c09c0b26e487a9e311e66b932f
2024-03-08 14:56:00 +00:00
daniel
e7f21f6e64 HtmlOutputRendererHelper: fall back to page language
HtmlOutputRendererHelper should not crash hard if the ParserOutput has
no language set. ParserOutput may come from a variety of places, we
should be lenient about it not having a language.

However, we should try harder to actually set a language on ParserOutput
if we have one available. So this also updates
PageBundleParserOutputConverter to keep the ParserOutput's language in
sync with the language header in the PageBundle.

Bug: T349868
Bug: T353689
Bug: T359426
Change-Id: I2edf20dc3b199e22cda2f32bc858c21ca7d8f4bd
2024-03-06 17:18:16 +00:00
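The lenient fallback described in the commit above could look roughly like the following. This is a minimal sketch, not the actual MediaWiki code: the function name and the use of plain strings for language codes are assumptions made for illustration.

```python
from typing import Optional


def resolve_content_language(output_language: Optional[str],
                             page_language: str) -> str:
    """Return the language set on the output, else the page language.

    Hypothetical helper: both arguments are plain language-code strings.
    """
    # Be lenient: the output may come from a source that never set a language.
    if output_language:
        return output_language
    return page_language
```

The point of the design is that a missing language on the output is treated as a recoverable condition rather than a hard failure.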
James D. Forrester
fe1fbb3a5c build: Upgrade mediawiki/mediawiki-codesniffer to v43.0.0
Depends-On: I5349d3378b5acd04f0d7c60072a9b1e3dd8f2052
Change-Id: I3b7fd4c460418e72ed0c36febef75f41bad0afb1
2024-03-01 15:58:13 -05:00
jenkins-bot
a62f5c7911 Merge "[ParserOutput] Rename $mText to $mRawText and ::setText() to ::setRawText()" 2024-02-21 17:11:00 +00:00
C. Scott Ananian
72c4945a72 [ParserOutput] Rename $mText to $mRawText and ::setText() to ::setRawText()
ParserOutput::getText() is not a simple getter, but does
transformations on the "text" of the ParserOutput; the simple getter
is named ::getRawText().

To maintain consistency, rename ParserOutput::setText() to
::setRawText() and the property name ParserOutput::$mText to
::$mRawText so future readers are not confused.

The JSON property name as it appears in the serialized ParserCache
is left as 'Text' so that we don't have any forward- or backward-
rollback issues.

Change-Id: I3ef34814ab9473cc70d0a6806e8c5a4a02b73491
2024-02-20 17:13:28 +00:00
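The idea of renaming the in-memory property while leaving the serialized key untouched can be sketched as follows. The class and method names here are hypothetical and only illustrate the pattern; the only detail taken from the commit message is that the serialized key stays 'Text'.

```python
import json


class SerializableOutput:
    """Hypothetical stand-in for an object whose field was renamed."""

    def __init__(self, raw_text: str = "") -> None:
        # New in-memory name for the field.
        self.raw_text = raw_text

    def to_json(self) -> str:
        # The serialized key keeps its old name, so entries written before
        # or after a rollback stay readable by either version of the code.
        return json.dumps({"Text": self.raw_text})

    @classmethod
    def from_json(cls, data: str) -> "SerializableOutput":
        return cls(raw_text=json.loads(data)["Text"])
```

Decoupling the wire name from the property name is what avoids the forward- and backward-rollback issues the message mentions.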
Doğu Abaris
e8a13d0266 test: Add PHPUnit tests for ParsoidParserFactory
Covered:
- `testCreate`: Test the create method to create a new Parsoid parser.

Change-Id: I8aba66397e3beae5ddb765398a4ff83a606f4076
2024-02-18 21:18:08 +00:00
C. Scott Ananian
19ae795ac2 [Parsoid\Config\SiteConfig] enable Parsoid support for disabling magic links
Bug: T145590
Change-Id: Ic35c964e1ae224ca6985ddc01ad9eda5671fb7b6
2024-02-17 01:57:42 +00:00
Reedy
85396a9c99 tests: Fix @covers and @coversDefaultClass to have leading \
Change-Id: I5629f91387f2ac453ee4341bfe4bba310bd52f03
2024-02-16 22:43:56 +00:00
Reedy
19c8ca74c2 tests: Add or fix Parser test namespaces
Bug: T357823
Change-Id: I1d07ff559f4607ba98bc834a1432e014f3ebdd35
2024-02-16 22:39:13 +00:00
Reedy
e94e265a93 tests: Add Tests to PHP namespacing
Change-Id: I849268172751d50292e93aa75abe8094873f56bc
2024-02-16 19:10:11 +00:00
Subramanya Sastry
e55cc517da Move Parser to Mediawiki\Parser namespace
Bug: T166010
Co-Authored-By: Daimona Eaytoy <daimona.wiki@gmail.com>
Co-Authored-By: James Forrester <jforrester@wikimedia.org>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Change-Id: I79b4e732c45095eedbaa80afa5eb7479b387ed8a
2024-02-16 09:18:38 -05:00
Brian Wolff
289a900665 Allow filter: in inline CSS.
This was banned because it could be used to load other files,
including potentially local files, in IE9 and earlier. This
browser is no longer relevant. Wikimedia sites stopped supporting
the needed TLS versions for that browser 4 years ago.

Modern browsers have redefined filter to mean something different.
Generally the new filter is perfectly safe as long as we ban the url()
function which we do.

For context on why it was originally banned, see
https://static-codereview.wikimedia.org/MediaWiki/66990.html

Bug: T308160
Change-Id: Ic94f499dfe66e3cce12496893d0ecbee006bd243
2024-02-13 17:59:07 +00:00
C. Scott Ananian
52320c0902 Move ParsoidRenderID to MediaWiki\Edit
This class belongs with the rest of the Parsoid output stash code.

This class has been marked @unstable since 1.39 and thus the move
does not need release notes.

Change-Id: I16061c0c28b1549fbe90ea082cc717fee4a09a6e
2024-02-07 21:22:06 -05:00
C. Scott Ananian
0de13d7662 Add ParserOutput::{get,set}RenderId() and set render id in ContentRenderer
Set the render ID for each parse stored into cache so that we are able
to identify a specific parse when there are dependencies (for example
in an edit based on that parse).  This is recorded as a property added
to the ParserOutput, not the parent CacheTime interface.  Even though
the render ID is /related/ to the CacheTime interface, CacheTime is
also used directly as a parser cache key, and the UUID should not be
part of the lookup key.

In general we are trying to move the location where these cache
properties are set as early as possible, so we check at each location
to ensure we don't overwrite a previously-set value.  Eventually we
can convert most of these checks into assertions that the cache
properties have already been set (T350538).  The primary location for
setting cache properties is the ContentRenderer.

Moved setting the revision timestamp into ContentRenderer as well, as
it was set along the same code paths.  An extra parameter was added to
ContentRenderer::getParserOutput() to support this.

Added merge code to ParserOutput::mergeInternalMetaDataFrom() which
should ensure that cache time, revision, timestamp, and render id are
all set properly when multiple slots are combined together in MCR.

In order to ensure the render ID is set on all codepaths we needed to
plumb the GlobalIdGenerator service into ContentRenderer, ParserCache,
ParserCacheFactory, and RevisionOutputCache.  Eventually (T350538) it
should only be necessary in the ContentRenderer.

Bug: T350538
Bug: T349868
Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
2024-02-07 21:22:06 -05:00
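The "check at each location so we don't overwrite a previously-set value" rule from the commit above can be sketched like this. The names are hypothetical, and `uuid.uuid4()` stands in for whatever ID the real GlobalIdGenerator service produces.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class RenderedOutput:
    """Hypothetical stand-in for a parse result carrying a render ID."""
    render_id: Optional[str] = None


def ensure_render_id(output: RenderedOutput) -> RenderedOutput:
    # Only fill in the ID when an earlier stage has not already set one,
    # so the earliest setter always wins.
    if output.render_id is None:
        output.render_id = str(uuid.uuid4())  # assumption: any UUID will do
    return output
```

Calling `ensure_render_id` repeatedly along a code path is harmless, which is what would let such checks later be tightened into assertions.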
Umherirrender
a3a9cf99cb tests: Use namespaced class names in @covers annotations
Assist from 8c9cb701e56226cac43fee2fa24b0d0e586f1733

Change-Id: I47897c499028d9e24c00ad0bc6ba7fd8002d9bc1
2024-01-27 01:11:07 +01:00
Daimona Eaytoy
175c0c4abf Replace more instances of deprecated MWException
Bug: T328220
Change-Id: Iba90f7f9b5766bccc05380d040138d74d5e9558a
2024-01-19 23:11:59 +00:00
James D. Forrester
9bfb75ff90 Namespace ParserOutput
Most used non-namespaced class!

Bug: T353458
Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
2023-12-14 14:57:34 -05:00
Umherirrender
388b0374fa tests: Use namespaced classes
Changes to the use statements done automatically via script
Addition of missing use statements and changes to docs done manually

Change-Id: Ib326ae1e5c8409a98398c721e8b8ce42c73bd012
2023-12-11 15:59:55 +01:00
jenkins-bot
b7fc1b2f43 Merge "Only cache expensive renderings" 2023-11-30 21:24:34 +00:00
daniel
e3fb964439 Only cache expensive renderings
Pages that are fast to render can be omitted from the parser cache
to preserve disk space and cache write operations.

The threshold is configurable per namespace, so the tradeoff can
be evaluated based on different access patterns. For example, pages
that are accessed rarely, like file description pages on commons,
may have a high threshold configured, while pages that are read
frequently, like wikipedia articles, may be configured to be always
cached, using a 0 threshold.

Filtering is based on a time profile recorded in the ParserOutput.
A generic mechanism for capturing the timing profile is implemented
in the ContentHandler base class. Subclasses may implement a more
rigorous capture mechanism.

Bug: T346765
Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac
2023-11-30 20:56:12 +00:00
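A per-namespace threshold filter of the kind described above might be sketched as follows. The function name, the mapping shape, and the threshold values are assumptions for illustration only. The namespace numbers 0 (main) and 6 (File) are standard MediaWiki IDs.

```python
from typing import Mapping


def should_cache(namespace: int,
                 render_seconds: float,
                 thresholds: Mapping[int, float],
                 default_threshold: float = 0.0) -> bool:
    """Decide whether a rendering was expensive enough to be cached.

    Hypothetical helper: `thresholds` maps namespace IDs to a minimum
    render time in seconds.
    """
    threshold = thresholds.get(namespace, default_threshold)
    # A threshold of 0 means "always cache".
    return render_seconds >= threshold
```

For example, with `thresholds = {0: 0.0, 6: 1.0}`, an article in namespace 0 is always cached, while a file description page in namespace 6 is cached only if it took at least one second to render.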
Martin Urbanec
29af4dd074 Move user options related classes into its own namespace
There are a couple of user options related classes already,
and the T321527 work on dynamic defaults is going to add
even more. Let's move them into a separate namespace
to make core a bit more organized.

Old name is kept as an alias for compatibility purposes.

Bug: T321527
Bug: T352284
Change-Id: I9822eb1553870b876d0b8a927e4e86c27d83bd52
2023-11-29 13:27:13 +01:00
thiemowmde
9a3d6ecd03 Add PHPUnit test for MagicWord class
This is much more trivial than e.g. MagicWordArray. Still deserves
its own test, in my opinion.

Change-Id: I1c19c9c1e51fd210a3827a2200153686f7205eee
2023-11-21 16:13:39 +00:00
thiemowmde
10a828ba72 Deprecate MagicWordFactory::getSubstIDs
The main motivation is to further reduce the complexity of the class:
* There is no code that ever writes to $this->mSubstIDs. It's
  effectively a constant.
* According to CodeSearch the getSubstIDs() method is not used
  anywhere. It's @internal to the parser.
* I find it weird that the parser needs to call 2 factory methods to
  do 1 thing.
* I still find it a good idea to keep the knowledge encapsulated in
  the factory and not have the [ 'subst', 'safesubst' ] array in the
  parser. That's why I propose the new method.

Change-Id: I5c147c75200c3c34a410d93a0328b56ea00a050f
2023-11-13 11:10:24 +01:00
jenkins-bot
93b7d14a16 Merge "Make MagicWordArray not fail on old revs with broken UTF-8" 2023-10-27 19:13:18 +00:00
thiemowmde
6f32dc8a8d Make MagicWordArray not fail on old revs with broken UTF-8
Garbage in, garbage out. When the wikitext is broken, it's still
helpful if the user can see the broken wikitext. Even if it's not
fully parsed. It's not the job of this class to fix broken UTF-8.
The worst thing that can happen is that the wikitext contains some
unparsed magic words. However, this is really only relevant for
very old revisions (20 years old, see T321234). It's very normal
that old revisions can't be 100% parsed any more, most notably
because of deleted templates. This here is not much different.

Bug: T321234
Change-Id: I0ce40f6575668847ef309599ee32de52190ab212
2023-10-27 16:45:10 +00:00
thiemowmde
c5541bfa71 parser: Replace exception with /J modifier in MagicWordArray
The extra code that scans for duplicates and throws an exception was
added via I95dea67 in 2017. I'm not entirely sure why. This should
be impossible in all relevant real-world scenarios. Maybe it happened
in a local dev scenario?

Even if it did, duplicates are harmless. Let me explain:

The only way a duplicate can end here is when the same magic word is
added twice to the $this->names array. The only thing that happens
then is that the resulting regex contains one of the sub-patterns
twice. It doesn't matter which one matches. We know these subpatterns
are identical. Unfortunately the PCRE compiler doesn't know and
assumes duplicate names are a problem. We have two options to fix
this: Strip duplicates in $this->names with array_unique() or tell
the PCRE compiler that duplicates are ok with the /J modifier.

I would like to avoid the extra, potentially expensive array_unique()
because, as said, duplicates never happen in real-world scenarios.

The /J modifier is supported since PHP 7.2.

Change-Id: I5f113abdbb44354fcc01be7f36fbc7d07f75876c
2023-10-27 12:48:03 +02:00
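The duplicate-name problem can be reproduced in Python, whose `re` module also rejects a pattern that defines the same group name twice. Python's `re` has no counterpart to the /J modifier, so this sketch shows only the failure and the `array_unique()`-style alternative that the commit decided against. It is not the committed fix, and the helper names are hypothetical.

```python
import re


def build_alternation(names):
    # One named group per magic word ID, joined into a single alternation.
    return "|".join(f"(?P<{name}>{re.escape(name)})" for name in names)


def compile_strict(names):
    # Raises re.error if the same name appears twice in `names`.
    return re.compile(build_alternation(names))


def compile_deduplicated(names):
    # dict.fromkeys() drops duplicates while preserving the original order.
    return re.compile(build_alternation(dict.fromkeys(names)))
```

With `names = ["subst", "safesubst", "subst"]`, `compile_strict` fails while `compile_deduplicated` succeeds. The deduplication pass is the extra work the commit avoids by telling the regex compiler to tolerate duplicates instead.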
Timo Tijhof
08ddbf3465 parser: deprecate unused MagicWord::getId, improve docs and tests
* MagicWord::getId was added in r24808 (164bb322f2) but never used.
  At the time, access modifiers like 'private' were not yet in use.
  Deprecate the method with warnings, for removal in a future release.

* Fix zero coverage for MagicWord. Because the constructor is
  internal, the class is only intended to be created via the array
  and factory classes. Let their tests cover this class.

* Remove redundant file-level description and ensure the class desc
  and ingroup tag are on the class block instead.
  Ref https://gerrit.wikimedia.org/r/q/owner:Krinkle+message:ingroup

* Mark constructor `@internal` (was already implied by
  stable interface policy), and explain where to get the object
  instead.

* Mark load() `@internal`. Method was introduced in 1.1 when the
  class (and PHP) did not yet use visibility modifiers for private
  methods. The only way to get an instance of MagicWord
  (MagicWordFactory::get) already calls load(), the method is not
  a no-op if called a second time, and (fortunately) there exist no
  callers to this outside this class that I could find.

* MagicWordArray::getBaseRegex was marked as internal
  in change I17f1b7207db8d2203c904508f3ab8a64b68736a8.

Change-Id: I4084f858bb356029c142fbdb699f91cf0d6ec56f
2023-10-26 16:07:20 +01:00
thiemowmde
6cc9f835c2 parser: Add more complex MagicWordArray test cases
The tests we added before create only MagicWordArray objects with a
single magic word. Here we are testing actual arrays of magic words.

Change-Id: I5880cca2a1e1ecf7018edd22c11229da5d5baffd
2023-10-19 17:22:59 +02:00
thiemowmde
97e269836f Add missing PHPUnit test for MagicWordArray class
I think this code is effectively covered by the parser tests that use
magic words. Still it worried me more and more to make changes to
this code without dedicated unit tests.

Change-Id: Id72e1d7ef4736e4d0672798d720465648d91b3ba
2023-10-06 15:00:07 +00:00
jenkins-bot
a1f4fb418a Merge "Allow Bcp47Code as parameter to LanguageCode::bcp47ToInternal()" 2023-09-29 21:27:27 +00:00
C. Scott Ananian
f47de6ec61 Allow Bcp47Code as parameter to LanguageCode::bcp47ToInternal()
This nominally takes a string-valued language code conforming to the
BCP-47 standard, but this is often generated from a Bcp47Code object.
Since the MediaWiki Language class implements Bcp47Code, we may have
the case where we have a Language object in hand (but typed as a
Bcp47Code not Language) and call Language::toBcp47Code() only to pass
it to LanguageCode::bcp47ToInternal to convert it back to a
mediawiki-internal code.

We can save steps and be more efficient if we allow the parameter to be a
Bcp47Code object, and write a fast path for the special case where
that Bcp47Code happens to be a Language object and we can simply call
Language::getCode() to obtain the internal code.

Change-Id: I24932449b8c40e3a5072748d87667184f4befa67
2023-09-29 15:10:29 -04:00
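The fast path described above can be sketched as follows. All class and function names are hypothetical Python stand-ins for the PHP ones, and both string conversions are placeholders (case changes only), not the real mapping rules.

```python
from typing import Union


class Bcp47Code:
    """Hypothetical interface: anything that can produce a BCP-47 string."""

    def to_bcp47_code(self) -> str:
        raise NotImplementedError


class Language(Bcp47Code):
    """Hypothetical language object that also knows its internal code."""

    def __init__(self, internal_code: str) -> None:
        self._internal_code = internal_code

    def get_code(self) -> str:
        return self._internal_code

    def to_bcp47_code(self) -> str:
        # Placeholder conversion: upper-case everything after the first "-".
        head, sep, tail = self._internal_code.partition("-")
        return head + sep + tail.upper()


def bcp47_to_internal(code: Union[str, Bcp47Code]) -> str:
    if isinstance(code, Language):
        # Fast path: skip the round trip through a BCP-47 string.
        return code.get_code()
    if isinstance(code, Bcp47Code):
        code = code.to_bcp47_code()
    # Placeholder for the real string-to-internal-code mapping.
    return code.lower()
```

Accepting either a string or an object keeps existing callers working, while callers that already hold a `Language` avoid converting to BCP-47 and back.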
James D. Forrester
c1599c91b3 Namespace Config-related classes under \MediaWiki\Config
Bug: T166010
Change-Id: I4066885a7ea071d22497abcdb3f95e73e154d08c
2023-09-21 05:41:58 +00:00