Commit graph

15 commits

Author SHA1 Message Date
Umherirrender
e662614f95 Use explicit nullable type on parameter arguments
Implicitly marking parameter $... as nullable is deprecated in PHP 8.4;
the explicit nullable type must be used instead.

Created with autofix from Ide15839e98a6229c22584d1c1c88c690982e1d7a

Break one long line in SpecialPage.php

Bug: T376276
Change-Id: I807257b2ba1ab2744ab74d9572c9c3d3ac2a968e
2024-10-16 20:58:33 +02:00
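The deprecation addressed by e662614f95 can be illustrated with a small function. `greet` is invented for this example and is not MediaWiki code. Before PHP 8.4, a parameter typed `string` with a `null` default was silently treated as `?string`. PHP 8.4 deprecates that, so the nullable type has to be written out explicitly:

```php
<?php
declare(strict_types=1);

// Deprecated since PHP 8.4 (implicitly nullable via the null default):
//     function greet(string $name = null): string { ... }

// Hypothetical example with the explicit nullable type, no deprecation notice.
function greet(?string $name = null): string
{
    return $name === null ? 'Hello, anonymous' : "Hello, {$name}";
}

echo greet(), "\n";      // Hello, anonymous
echo greet('Ada'), "\n"; // Hello, Ada
```

The behaviour is identical to the old signature. Only the declared type changes, which is why an automated fix is practical.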
jenkins-bot
702fb9abe4 Merge "Rewrite MagicWordArray::matchAndRemove to use single preg_… call"
2024-05-15 16:00:22 +00:00
James D. Forrester
8e940c4f21 Standardise all our class alias deprecation comments for ease of grepping
Change-Id: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
2024-03-19 20:11:29 +00:00
thiemowmde
a9003f0fc4 Rewrite MagicWordArray::matchAndRemove to use single preg_… call
… instead of running the same (possibly expensive) regular expression
twice. The new code does the same as before, as proven by the tests.

I leave the error handling untouched for the moment, even though we
know from T321234 that probably the only reason the preg_… calls here
can fail is broken UTF-8.

The error message still mentions "preg_match_all". This is intentional
to make it easier to find old and new errors under the same name in
logstash.

I benchmarked this and it is indeed faster, by about 25%.
However, performance was not the main motivation for this patch but
the readability of the code.

Bug: T321234
Change-Id: I5b7a04abc008dd095dc87d0a05d06e061eb8d6b2
2024-03-01 12:34:21 +01:00
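The idea behind a9003f0fc4 is to collect the matches and remove them in one regex pass, instead of running `preg_match_all` followed by `preg_replace` with the same pattern. A rough sketch using `preg_replace_callback` follows. `matchAndRemoveOnce` and its regex are hypothetical simplifications, not the code from the patch. Like the patch, the sketch keeps "preg_match_all" in its error message:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: find double-underscore magic words and strip them
 * from the text in a single regex pass.
 *
 * @return array{0: array<string, true>, 1: string} [found names, remaining text]
 */
function matchAndRemoveOnce(string $regex, string $text): array
{
    $found = [];
    $result = preg_replace_callback(
        $regex,
        static function (array $m) use (&$found): string {
            // Remember which magic word matched (capture group 1), then delete it.
            $found[$m[1]] = true;
            return '';
        },
        $text
    );
    if ($result === null) {
        // preg_* returns null on failure, e.g. invalid UTF-8 with /u.
        // The old function name is kept so old and new errors group together.
        throw new RuntimeException('preg_match_all error: ' . preg_last_error_msg());
    }
    return [$found, $result];
}

[$found, $rest] = matchAndRemoveOnce('/__(NOTOC|NOEDITSECTION)__/u', 'Intro __NOTOC__ text');
// Callers only check keys, e.g. isset($found['NOTOC']).
```

Because the callback records each match while replacing it, the pattern runs over the text only once.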
jenkins-bot
93b7d14a16 Merge "Make MagicWordArray not fail on old revs with broken UTF-8"
2023-10-27 19:13:18 +00:00
thiemowmde
6f32dc8a8d Make MagicWordArray not fail on old revs with broken UTF-8
Garbage in, garbage out. When the wikitext is broken, it's still
helpful if the user can see the broken wikitext, even if it's not
fully parsed. It's not the job of this class to fix broken UTF-8.
The worst thing that can happen is that the wikitext contains some
unparsed magic words. However, this is really only relevant for
very old revisions (20 years old, see T321234). It's quite normal
that old revisions can't be 100% parsed anymore, most notably
because of deleted templates. This case is not much different.

Bug: T321234
Change-Id: I0ce40f6575668847ef309599ee32de52190ab212
2023-10-27 16:45:10 +00:00
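The "garbage in, garbage out" behaviour described in 6f32dc8a8d can be illustrated with a helper. `stripMagicWordsLenient` is invented for this example, and the actual patch may be structured differently. With the `/u` modifier, PCRE rejects invalid UTF-8: `preg_replace` returns `null` and `preg_last_error()` reports `PREG_BAD_UTF8_ERROR`. Instead of throwing, the sketch returns the wikitext unchanged, so the magic words simply remain unparsed:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: remove magic words, but tolerate broken UTF-8.
function stripMagicWordsLenient(string $regex, string $text): string
{
    $result = preg_replace($regex, '', $text);
    if ($result !== null) {
        return $result;
    }
    if (preg_last_error() === PREG_BAD_UTF8_ERROR) {
        // Broken input: show the raw wikitext rather than failing the page.
        return $text;
    }
    // Any other PCRE failure is still treated as an error.
    throw new RuntimeException('preg error: ' . preg_last_error_msg());
}

$broken = "Old revision \xFF __NOTOC__"; // byte 0xFF is never valid UTF-8
```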
thiemowmde
c5541bfa71 parser: Replace exception with /J modifier in MagicWordArray
The extra code that scans for duplicates and throws an exception was
added via I95dea67 in 2017. I'm not entirely sure why. This should
be impossible in all relevant real-world scenarios. Maybe it happened
in a local dev scenario?

Even if they did occur, duplicates are harmless. Let me explain:

The only way a duplicate can end up here is when the same magic word
is added twice to the $this->names array. The only consequence is that
the resulting regex contains one of the subpatterns twice. It doesn't
matter which one matches; we know these subpatterns are identical.
Unfortunately the PCRE compiler doesn't know that and treats duplicate
names as an error. We have two options to fix this: strip duplicates
from $this->names with array_unique(), or tell the PCRE compiler that
duplicates are OK with the /J modifier.

I would like to avoid the extra, potentially expensive array_unique()
because, as said, duplicates never happen in real-world scenarios.

The /J modifier has been supported since PHP 7.2.

Change-Id: I5f113abdbb44354fcc01be7f36fbc7d07f75876c
2023-10-27 12:48:03 +02:00
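The duplicate-name situation described in c5541bfa71 can be reproduced with a tiny regex. The pattern below is a hypothetical stand-in for a MagicWordArray regex in which the same magic word was added twice. Without `/J`, PCRE refuses to compile it; with `/J`, duplicate subpattern names are allowed and matching works as usual:

```php
<?php
declare(strict_types=1);

// Returns true if PCRE accepts the pattern. "@" silences the compile
// warning that preg_match emits for an invalid pattern.
function compiles(string $regex): bool
{
    return @preg_match($regex, '') !== false;
}

// Hypothetical regex: the same magic word added twice, so two
// subpatterns share the name "notoc".
$duplicate = '/(?<notoc>__NOTOC__)|(?<notoc>__NOTOC__)/';

var_dump(compiles($duplicate));       // bool(false): duplicate names rejected
var_dump(compiles($duplicate . 'J')); // bool(true): /J allows duplicate names

preg_match($duplicate . 'J', 'Intro __NOTOC__', $m);
echo $m[0], "\n"; // __NOTOC__
```

Since both alternatives are identical, it makes no difference which one matches. That is the argument for preferring `/J` over an extra `array_unique()` pass.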
Timo Tijhof
08ddbf3465 parser: deprecate unused MagicWord::getId, improve docs and tests
* MagicWord::getId was added in r24808 (164bb322f2) but never used.
  At the time, access modifiers like 'private' were not yet in use.
  Deprecate the method with warnings, for removal in a future release.

* Fix zero coverage for MagicWord. Its constructor is internal, so
  the class is only intended to be created via the array and
  factory classes. Let their tests cover this class.

* Remove redundant file-level description and ensure the class desc
  and ingroup tag are on the class block instead.
  Ref https://gerrit.wikimedia.org/r/q/owner:Krinkle+message:ingroup

* Mark constructor `@internal` (was already implied by
  stable interface policy), and explain where to get the object
  instead.

* Mark load() `@internal`. The method was introduced in 1.1 when the
  class (and PHP) did not yet use visibility modifiers for private
  methods. The only way to get an instance of MagicWord
  (MagicWordFactory::get) already calls load(); the method is not
  a no-op if called a second time; and (fortunately) I could find no
  callers to it outside this class.

* MagicWordArray::getBaseRegex was marked as internal
  in change I17f1b7207db8d2203c904508f3ab8a64b68736a8.

Change-Id: I4084f858bb356029c142fbdb699f91cf0d6ec56f
2023-10-26 16:07:20 +01:00
thiemowmde
6447dbc37b parser: Use more specific exceptions in MagicWord classes
… instead of the generic MWException and even more generic Exception.
Most, if not all, of these should be unreachable anyway, i.e. these
are what we call "unchecked" exceptions, see T240672.

We also have a polyfill for preg_last_error_msg, so there is no need
to wrap it in a function_exists check any more.

Change-Id: Ie26bef3b4371d011ec3f1874986072605692f486
2023-10-25 15:34:03 +02:00
thiemowmde
88fb445dd4 parser: Replace key/current/next loop with foreach
The original motivation was readability. I added comments that
hopefully explain better what's going on here.

I also benchmarked this and it's more than 10 times faster than
before. The main difference comes from foreach itself.

Change-Id: I5e717ea0f3c0ce12f4beffac7105314b63cb752a
2023-10-23 15:55:48 +02:00
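The kind of rewrite made in 88fb445dd4 can be shown side by side. The task here is hypothetical: return the first named group in a `preg_match()` result that actually captured something. It is loosely modelled on what a match parser has to do, and the function names are invented. Both versions walk the same array, but `foreach` needs no manual `reset`/`key`/`current`/`next` bookkeeping:

```php
<?php
declare(strict_types=1);

// Old style: manual iteration with the internal array pointer.
function firstMatchedNameLegacy(array $m): ?string
{
    reset($m);
    while (($key = key($m)) !== null) {
        $value = current($m);
        next($m);
        if (is_string($key) && $value !== '') {
            return $key;
        }
    }
    return null;
}

// New style: foreach performs the same walk without pointer bookkeeping.
function firstMatchedName(array $m): ?string
{
    foreach ($m as $key => $value) {
        // Numeric keys are skipped; named groups appear as string keys.
        if (is_string($key) && $value !== '') {
            return $key;
        }
    }
    return null;
}

// Group "toc" does not participate, so it is reported as an empty string.
preg_match('/(?<toc>__TOC__)|(?<notoc>__NOTOC__)/', 'x __NOTOC__', $m);
```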
thiemowmde
2e0301e634 parser: Add strict type constraints to MagicWord… classes
This patch is intentionally "incomplete". It's limited to places
where we can be 100% sure about the type just from looking at the
code. More to be done in later patches.

Change-Id: Ideea49ea9603127038ef08c6a9805f40a0b86b6d
2023-10-16 10:36:36 +02:00
thiemowmde
bef3da3210 parser: Improve PHPDoc type hints in MagicWord… classes
Intentionally split across multiple patches. This is only about
documentation and cannot break anything (other than Phan).

MagicWordArray::matchAndRemove is particularly confusing because the
documentation and structure of the returned array make it look like
it would support parameters. But it never (!) did.

The method was added like this in 2008 via commit 269a9103 (r31113).

There was always only a single caller in the Parser class. The
parser never used the array values, only the keys (via isset),
which makes sense because that code in the parser is about "double
underscore" magic words (e.g. __NOTOC__). These don't support
parameters anyway.

Change-Id: Ife92fc3d6d5b03606ba2b209a886cadef3451fea
2023-10-11 00:07:19 +00:00
thiemowmde
231a562c37 Remove unused public methods from MagicWord & MagicWordArray
Remove unused methods:
https://codesearch.wmcloud.org/search/?q=%5CbmatchStart%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbmatchVariableStartToEnd%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbsubstituteCallback%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbgetVariableRegex%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbgetVariableStartToEndRegex%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbgetWasModified%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbaddArray%5Cb

Mark internal methods as private:
https://codesearch.wmcloud.org/search/?q=%5CbgetRegex%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbgetRegexStart%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbgetVariableStartToEndRegex%5Cb
https://codesearch.wmcloud.org/search/?q=%5CbparseMatch%5Cb

There are probably more, e.g. the "add" method, but these are much
harder to prove.

Change-Id: I4093489b309105d2272535fb92135f5052f96ab6
2023-10-04 02:24:37 +02:00
James D. Forrester
14ab1a5276 Follow-up a1b4699: Add in-code comment on aliases for when they were added
Change-Id: I84266fd02edff7002c765f53d3ddee6085d922d4
2023-08-28 14:39:35 -04:00
Amir Sarabadani
a1b4699fea Reorg: Move MagicWord related files to under parser/
This is approved as part of T166010 RFC.

Bug: T321882
Change-Id: Ia4498c0a20e38a6a288dc14065ea8242c84fbc49
2022-12-09 13:48:35 +01:00
Renamed from includes/MagicWordArray.php