Commit graph

2105 commits

Author SHA1 Message Date
jenkins-bot
3ea0a9068a Merge "preferences: Signature validation (lint errors, user links, nested subst)" 2020-06-24 22:14:57 +00:00
DannyS712
c1f07ca663 Parser::statelessFetchTemplate - return DeprecatablePropertyArray
Bug: T249393
Change-Id: I8cea2c7451b33f2e9a6063cfb1c85b3dbbbc5d96
2020-06-24 02:08:10 +00:00
Bartosz Dziewoński
df7231ad89 preferences: Signature validation (lint errors, user links, nested subst)
Three new checks are now applied to user signatures in preferences:

* Disallow invalid HTML and lint errors (T140606)

  Since 15e0e9bb4b we can rely on Parsoid to check the signature for
  lint errors. (The old PHP Parser doesn't have this capability.)

  Most importantly, this will disallow unclosed HTML tags. Unclosed
  formatting tags like `<i>` (and also wikitext markup like `''`)
  could affect the entire page with the bad markup.

  A new configuration variable, $wgSignatureAllowedLintErrors, is added
  to allow ignoring some errors. The default value ignores the
  'obsolete-tag' error (caused by HTML tags like `<font>` and `<tt>`).

* Require a link to user page, talk page or contributions (T237700)

  Various tools don't work correctly when such a link is missing. For
  example, Echo notifications are not sent, DiscussionTools will not
  allow replying to these comments, and English Wikipedia's SineBot
  treats these comments as unsigned.

  Such a requirement has been present for a long time in the policies of
  many Wikimedia wikis, but it was not enforced by software.

* Disallow "nested" substitution in signature (T230652)

  Clever abuse of "subst" markup and tildes allows users to save edits
  containing wikitext in which substitution occurs again when the page
  is next saved. Disallow this in signatures, at least.
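The first two checks can be modelled as a small Python sketch. It assumes that the lint error types and the signature's link targets have already been extracted (the commit says Parsoid provides the lint errors). All names are hypothetical except the 'obsolete-tag' default. Title normalization and the nested-subst check are deliberately not modelled.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Sequence

# Mirrors the documented default of $wgSignatureAllowedLintErrors.
DEFAULT_ALLOWED_LINT_ERRORS = ("obsolete-tag",)


@dataclass
class SignatureCheck:
    blocking_lint_errors: List[str] = field(default_factory=list)
    has_user_link: bool = False

    @property
    def valid(self) -> bool:
        return not self.blocking_lint_errors and self.has_user_link


def check_signature(lint_types: Iterable[str], link_targets: Iterable[str],
                    username: str,
                    allowed: Sequence[str] = DEFAULT_ALLOWED_LINT_ERRORS) -> SignatureCheck:
    """lint_types: lint error types reported for the signature.
    link_targets: titles the signature links to (already extracted)."""
    # Check 1: lint errors, minus the types configured as ignorable.
    blocking = [t for t in lint_types if t not in allowed]
    # Check 2: at least one link to the user page, talk page or contributions.
    accepted = {f"User:{username}", f"User talk:{username}",
                f"Special:Contributions/{username}"}
    has_link = any(t in accepted for t in link_targets)
    # Check 3 (nested subst) is not modelled in this sketch.
    return SignatureCheck(blocking, has_link)
```

For example, `check_signature(["obsolete-tag"], ["User:Example"], "Example").valid` is true, because the only lint error is on the ignore list.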

A new configuration variable, $wgSignatureValidation, is added to control
what we do about the result of the validation described above. The
options are:

* 'warning':
  Only displays a warning near the field on Special:Preferences if
  the current signature is invalid. Signatures can still be changed
  regardless of validity and will be used when signing comments.

* 'new':
  In addition to the above, if a user tries to change their signature,
  the new one must be valid. Existing invalid signatures are still
  used when signing comments.

* 'disallow':
  In addition to the above, existing invalid signatures are no longer
  used when signing comments.
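The three modes can be read as a small decision table. Below is a hedged Python sketch of that table. The function names are hypothetical; only the mode strings come from the list above.

```python
VALIDATION_MODES = {"warning", "new", "disallow"}


def _check_mode(mode: str) -> None:
    if mode not in VALIDATION_MODES:
        raise ValueError(f"unknown $wgSignatureValidation mode: {mode!r}")


def show_warning(mode: str, current_sig_valid: bool) -> bool:
    """Every mode warns on Special:Preferences about an invalid current signature."""
    _check_mode(mode)
    return not current_sig_valid


def may_save_new_signature(mode: str, new_sig_valid: bool) -> bool:
    """'warning' accepts any new signature; 'new' and 'disallow' require a valid one."""
    _check_mode(mode)
    return mode == "warning" or new_sig_valid


def use_current_signature(mode: str, current_sig_valid: bool) -> bool:
    """Only 'disallow' stops using an existing invalid signature when signing."""
    _check_mode(mode)
    return mode != "disallow" or current_sig_valid
```

Each mode differs from the previous one in exactly one of these functions.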

Bug: T140606
Bug: T237700
Bug: T230652
Change-Id: I07c575c2d9d2afe7a89c4847d16ac044417297bf
2020-06-24 01:20:05 +02:00
jenkins-bot
a7dae830b3 Merge "Introduce wfDeprecatedMsg()" 2020-06-22 22:30:49 +00:00
Thiemo Kreuz
231bcef6af parser: Remove unused $query param from LinkHolderArray::makeHolder
We know it's never anything but an empty array:
https://codesearch.wmflabs.org/search/?q=makeHolder

Change-Id: Ibc230ec1a1a15a9a5dc61abe5b989a3391d671c1
2020-06-22 14:33:59 +00:00
Thiemo Kreuz
7c2d4ca8a3 parser: Streamline LinkHolderArray::$size handling
Note there is another line of code (line #96, as of now) where the
$this->size property is increased *before* the two $this->internals
and $this->interwikis arrays are increased. Just do the same here.

Change-Id: I15f9e438706d75323ec17cb92e933f600701f9b8
2020-06-22 14:33:43 +00:00
Thiemo Kreuz
6363d64112 parser: Add Title type hint to LinkHolderArray::makeHolder
We *know* this can never be anything but a Title object:
https://codesearch.wmflabs.org/search/?q=makeHolder

Change-Id: Id6de0df627f2aeda79c6483f12a6d500ccd7853f
2020-06-22 14:33:04 +00:00
jenkins-bot
ff2a6d19e0 Merge "parser: Trivial code transformations to LinkHolderArray" 2020-06-22 14:31:02 +00:00
Thiemo Kreuz
5b22e184a0 parser: Trivial code transformations to LinkHolderArray
This is a series of extremely basic, trivial transformations that
don't change any behavior. The goal of this patch is to make the
code less surprising and less cluttered.

In detail:
* Remove an unused property:
  https://codesearch.wmflabs.org/search/?q=tempIdOffset
* Add a strict "Parser" type hint. Note that this code would fail anyway
  if that property were not a Parser.
* Avoid count() if we don't need to know the actual number, only
  whether it's empty.
* Inline a few single-use variables.

Change-Id: Ic76cc3984462b1b7700bbc675adaca8fc8219152
2020-06-22 05:50:43 +00:00
Tim Starling
d459add63d Introduce wfDeprecatedMsg()
Deprecating something means to say something nasty about it, or to draw
its character into question. For example, "this function is lazy and good
for nothing". Deprecatory remarks by a developer are generally taken as a
warning that violence will soon be done against the function in question.
Other developers are thus warned to avoid associating with the deprecated
function.

However, since wfDeprecated() was introduced, it has become obvious that
the targets of deprecation are not limited to functions. Developers can
deprecate literally anything: a parameter, a return value, a file
format, Mondays, the concept of being, etc. wfDeprecated() requires
every deprecatory statement to begin with "use of", leading to some
awkward sentences. For example, one might say: "Use of your mouth to
cough without it being covered by your arm is deprecated since 2020."

So, introduce wfDeprecatedMsg(), which allows deprecation messages to be
specified in plain text, with the caller description being optionally
appended. Migrate incorrect or grammatically awkward uses of wfDeprecated()
to wfDeprecatedMsg().
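The difference between the two styles is about who writes the sentence. The hypothetical Python helpers below contrast them. They do not reproduce the exact strings MediaWiki emits; the "[Called from ...]" suffix format is an assumption.

```python
from typing import Optional


def deprecated_use_of(thing: str, version: str) -> str:
    # wfDeprecated()-style: every message is forced into a "Use of ..." sentence.
    return f"Use of {thing} was deprecated in MediaWiki {version}."


def deprecated_msg(msg: str, caller: Optional[str] = None) -> str:
    # wfDeprecatedMsg()-style: the caller writes the whole sentence in plain
    # text, and a description of the calling code may optionally be appended.
    return msg if caller is None else f"{msg} [Called from {caller}]"
```

With the second style, "Mondays are deprecated since 2020." is a valid message on its own.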

Change-Id: Ib3dd2fe37677d98425d0f3692db5c9e988943ae8
2020-06-22 14:34:39 +10:00
jenkins-bot
836400235f Merge "parser: Remove return from callback for Sanitizer::removeHTMLtags" 2020-06-18 16:46:06 +00:00
DannyS712
44945be0a5 Hard deprecate calling ParserOptions::newCanonical with no parameters
Without parameters, it falls back to $wgUser.
There are no remaining deployed uses in MW 1.35+.

Bug: T246861
Change-Id: If4304de546457fe0a96a6ac8d705a70c480c6fae
2020-06-15 23:11:45 +00:00
jenkins-bot
006ec1b507 Merge "Deprecate external image related configuration in ParserOptions" 2020-06-15 14:10:19 +00:00
jenkins-bot
efdcab9f0b Merge "Hard-deprecate sequential array as parameter to Sanitizer::validateAttributes" 2020-06-15 12:31:03 +00:00
C. Scott Ananian
3928fd218e Hard-deprecate sequential array as parameter to Sanitizer::validateAttributes
Code search:
https://codesearch.wmflabs.org/search/?q=validateAttributes&i=nope&files=&repos=
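The commit message does not say what the non-deprecated form is. Assuming it is an associative array keyed by attribute name, a compatibility shim might look like the Python sketch below. The function names are hypothetical, and the real method also sanitizes attribute values, which is omitted here.

```python
from __future__ import annotations

import warnings
from collections.abc import Mapping


def normalize_allowed(allowed) -> Mapping:
    """Return the allowed attributes as a mapping keyed by attribute name."""
    if isinstance(allowed, Mapping):
        return allowed
    # Sequential form: still accepted, but emits a deprecation warning.
    warnings.warn(
        "Passing a sequential list of allowed attributes is deprecated; "
        "pass a mapping keyed by attribute name instead",
        DeprecationWarning,
        stacklevel=3,
    )
    return dict.fromkeys(allowed, True)


def validate_attributes(attribs: dict, allowed) -> dict:
    """Keep only attributes whose names are allowed (value sanitizing omitted)."""
    allowed = normalize_allowed(allowed)
    return {name: value for name, value in attribs.items() if name in allowed}
```

Keyed lookup is a hash test rather than a linear scan, which is one plausible reason to prefer the associative form.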

Bug: T255049
Depends-On: I68f122d5a3fa06b0434863cff73851a39dd10514
Depends-On: Ia6315da837f1b27794bac8bc2e96008c60ca28ae
Change-Id: Ie942e7e24dbf3256db15fe83bb5592f7a7c2fbc1
2020-06-15 12:11:30 +00:00
Umherirrender
fb184336f6 Call StubObject::unstub directly
There is no need to check with isRealObject() first.

Change-Id: Ia060de98e51a7f11ee4a7d2ffa06f3175fddaddf
2020-06-15 09:04:13 +00:00
Umherirrender
461d587ed6 parser: Remove return from callback for Sanitizer::removeHTMLtags
Change-Id: I119668c87c3e7e6d2727bf986746678540262d72
2020-06-15 00:54:04 +00:00
Umherirrender
19a1b7b5d4 Avoid variable reuse in LinkHolderArray
Helps taint-check determine the correct taint of the variable.

Change-Id: I943dad3bc3fa495b78f37776f54788b4766fa3b2
2020-06-14 21:25:55 +02:00
jenkins-bot
1bf3db2214 Merge "Use 'list of allowed attributes' in Sanitizer, instead of 'whitelist'" 2020-06-12 23:37:20 +00:00
jenkins-bot
a2812b8a6a Merge "Rename CoreMagicWords to CoreMagicVariables and update docs" 2020-06-12 19:18:02 +00:00
jenkins-bot
10d722489d Merge "Return null instead of false in Parser methods newly added in 1.35" 2020-06-11 12:14:07 +00:00
Tim Starling
a30b328bd4 Rename CoreMagicWords to CoreMagicVariables and update docs
There's already a thing called magic words, and this is not it. These
things are called variables. There are many usages of this term in the
source. The term was introduced by Lee in 2002: originally
OutputPage::replaceVariables() contained only this functionality.

I introduced the term "magic word", meaning a localizable keyword.
Localizable keywords are an abstraction not limited to this use case.

"Magic variables" is a neologism, but I suppose it is permissible, since
it disambiguates. Whereas calling a variable a magic word conflates rather
than disambiguates.

Fix terminology in magicword.md and update the examples.

Change-Id: I621c888e3790a145ca9978f6b30ff1a8f685b64c
2020-06-11 13:28:45 +10:00
C. Scott Ananian
8c43d75841 Hard deprecate $wgAllowImageTag configuration
The future Parsoid parser will not support this, and it appears to be unused.

It could be reimplemented as an extension tag once it is removed from core.

Code search:
https://codesearch.wmflabs.org/search/?q=allowimagetag&i=fosho&files=&repos=

Bug: T254802
Change-Id: I1b532a7a8794766f8df6fdf375a6ffd78fee94e5
2020-06-10 21:01:10 +00:00
C. Scott Ananian
86fb3b14af Use 'list of allowed attributes' in Sanitizer, instead of 'whitelist'
Bug: T254646
Change-Id: I48d1a5b318c3511fae94291d84f65e5c9cd05a27
2020-06-10 15:58:39 -04:00
C. Scott Ananian
ae594f44f0 Remove unnecessary use of black/whitelist in Sanitizer comments
Bug: T254646
Change-Id: Ie1d4ce761f02304db4a990495e687e75e6783411
2020-06-10 13:30:59 -04:00
C. Scott Ananian
ee8759a281 Deprecate external image related configuration in ParserOptions
Per-parser configuration is discouraged; use sitewide configuration instead.

Code search:
https://codesearch.wmflabs.org/search/?q=setEnableImageWhitelist%7CsetAllowExternalImages%7CsetAllowExternalImagesFrom&i=nope&files=&repos=

The ultimate goal here is to refactor the image filtering functionality into
an extension and move it out of core, so that it can be used by both
Parsoid and the legacy parser in the same way.  We may add back per-parser
customization of the filtering, but the API will probably look different.
Deprecate the existing ParserOptions-based mechanism, which code search
indicates almost no one is using.

Bug: T254802
Change-Id: Ib4a59bbae10cfc924c0290948330d93e02de9ed0
2020-06-10 13:05:34 -04:00
DannyS712
a6d16bd03d Remove unneeded creation of revision objects
Clean up some technical debt; use MutableRevisionRecord instead of
manually constructing a Revision from an array, remove last uses of
RevisionStoreDbTestBase::revisionToRow and remove the method.

Each file can be reviewed separately (except that the removal of
revisionToRow depends on replacing its usages).

Bug: T246284
Change-Id: I0bdc069b21a5c41ef8f9e972c5b17ff189d4a741
2020-06-10 09:09:55 +00:00
jenkins-bot
18ec60d147 Merge "Un-deprecate the ParserPreSaveTransformComplete hook" 2020-06-04 22:03:55 +00:00
C. Scott Ananian
2d8a125b48 Return null instead of false in Parser methods newly added in 1.35
The `false` return has been the source of persistent bugs (T253725,
T251952); let's nip this pattern in the bud before we release these new
APIs.

It would be nice to fix Parser::statelessFetchRevisionRecord() as well,
but that was released in 1.34, so it's not quite as easy to change.
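Here is a Python analogy of the bug class, with None standing in for PHP's null. A caller written against the "null means not found" convention checks only for null. When it receives `false` instead, it goes on to use the result as an object. All names are hypothetical.

```python
from typing import Optional, Union


class RevisionRecord:
    """Hypothetical stand-in for a revision object."""

    def __init__(self, rev_id: int) -> None:
        self.rev_id = rev_id


def fetch_revision_false(rev_id: int) -> Union[RevisionRecord, bool]:
    # Old pattern: "not found" is signalled with False.
    return RevisionRecord(rev_id) if rev_id > 0 else False


def fetch_revision_none(rev_id: int) -> Optional[RevisionRecord]:
    # New pattern: "not found" is signalled with None (null in PHP).
    return RevisionRecord(rev_id) if rev_id > 0 else None


def current_rev_id(fetch, rev_id: int) -> Optional[int]:
    rev = fetch(rev_id)
    if rev is None:      # caller written against the null convention
        return None
    return rev.rev_id    # with the False convention, this raises AttributeError
```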

Change-Id: I05a968e3dfb660d0709a6417d1d53a1d08ed4818
2020-06-04 13:59:15 -04:00
jenkins-bot
edb800a0f1 Merge "Revert "Partially revert "Fix impedance mismatch with Parser::getRevisionRecordObject()""" 2020-06-04 17:48:28 +00:00
jenkins-bot
765814513a Merge "Remove terminating line breaks from wfDebugLog calls" 2020-06-04 01:56:05 +00:00
DannyS712
381d873a8b Replace core uses and hard deprecate Parser(Options) Revision methods
Bug: T249384
Change-Id: Iff10e76120eb8b6b4fbb939182dede83c86d3da2
2020-06-03 05:55:35 +00:00
DannyS712
cbbd029cac Remove terminating line breaks from wfDebugLog calls
Change-Id: Iac61ba7924597d654df7bf0a9136eeb3adbe0eef
2020-06-03 02:48:36 +00:00
Tim Starling
47a1619027 Remove terminating line breaks from debug messages
A terminating line break has not been required in wfDebug() since 2014,
but no migration was done. Some of these line breaks found their way
into LoggerInterface::debug() calls, where they mess up the formatting
of the debug log.

So, remove terminating line breaks from wfDebug() and
LoggerInterface::debug() calls.

Also:
* Fix the stripping of leading line breaks from the log header emitted
  by Setup.php. This feature, accidentally broken in 2014, allows
  requests to be distinguished in the log file.
* Avoid using the global variable $self.
* Move the logging of the client IP back to Setup.php. It was moved to
  WebRequest in the hope that it would not always be needed; however,
  $wgRequest->getIP() is now called unconditionally a few lines up in
  Setup.php. Moving it back also puts it in its proper place after the
  "start request" message.
* Wrap the log header code in a closure so that variables like $name do
  not leak into global scope.
* In Linker.php, remove a few instances of an unnecessary second
  parameter to wfDebug().
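Why a trailing break "messes up the formatting" can be shown with Python's standard `logging` module. It stands in here for any line-oriented handler that appends its own newline to each record. The logger name is hypothetical.

```python
import io
import logging


def make_line_logger(stream) -> logging.Logger:
    """A logger whose handler appends its own newline to every record."""
    logger = logging.getLogger("line-logger-sketch")  # hypothetical name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)           # terminator defaults to "\n"
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


buf = io.StringIO()
log = make_line_logger(buf)
log.debug("old style message\n")  # trailing break: leaves an empty line in the log
log.debug("new style message")    # one message, one line
```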

Change-Id: I96651d3044a95b9d210b51cb8368edc76bebbb9e
2020-06-03 12:01:16 +10:00
jenkins-bot
50b861dcb3 Merge "Move french space armoring below language conversion" 2020-06-01 17:17:57 +00:00
C. Scott Ananian
13d0ad0de6 Un-deprecate the ParserPreSaveTransformComplete hook
Although it's true that Parsoid doesn't (yet) support this hook, and
the $parser object referenced in the hook is likely going to be changed,
this is a hook added in 1.35 (eb6c5f70d9)
to replace use of an even worse hook.  So let's keep the lesser of the
evils, at least for now.

Bug: T236809
Change-Id: I8f866c3b9f1fc51848cfe9364635112371d18e3e
2020-06-01 10:14:39 -04:00
Tim Starling
68c433bd23 Hooks::run() call site migration
Migrate all callers of Hooks::run() to use the new
HookContainer/HookRunner system.

General principles:
* Use DI if it is already used. We're not changing the way state is
  managed in this patch.
* HookContainer is always injected, not HookRunner. HookContainer
  is a service, it's a more generic interface, it is the only
  thing that provides isRegistered() which is needed in some cases,
  and a HookRunner can be efficiently constructed from it
  (confirmed by benchmark). Because HookContainer is needed
  for object construction, it is also needed by all factories.
* "Ask your friendly local base class". Big hierarchies like
  SpecialPage and ApiBase have getHookContainer() and getHookRunner()
  methods in the base class, and classes that extend that base class
  are not expected to know or care where the base class gets its
  HookContainer from.
* ProtectedHookAccessorTrait provides protected getHookContainer() and
  getHookRunner() methods, getting them from the global service
  container. The point of this is to ease migration to DI by ensuring
  that call sites ask their local friendly base class rather than
  getting a HookRunner from the service container directly.
* Private $this->hookRunner. In some smaller classes where accessor
  methods did not seem warranted, there is a private HookRunner property
  which is accessed directly. Very rarely (two cases), there is a
  protected property, for consistency with code that conventionally
  assumes protected=private, but in cases where the class might actually
  be overridden, a protected accessor is preferred over a protected
  property.
* The last resort: Hooks::runner(). Mostly for static, file-scope and
  global code. In a few cases it was used for objects with broken
  construction schemes, out of horror or laziness.
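The split between a generic container and a typed runner, reached through a base-class accessor, can be sketched in Python. This is a simplified analogue, not MediaWiki's PHP API. The hook name "PageSaved", the abort-on-False rule as modelled here, and all class names are illustrative assumptions.

```python
from typing import Callable, Dict, List


class HookContainer:
    """Generic registry: knows hook names and handlers, nothing hook-specific."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable]] = {}

    def register(self, name: str, handler: Callable) -> None:
        self._handlers.setdefault(name, []).append(handler)

    def is_registered(self, name: str) -> bool:
        return bool(self._handlers.get(name))

    def run(self, name: str, *args) -> bool:
        for handler in self._handlers.get(name, []):
            if handler(*args) is False:  # in this sketch, False aborts the chain
                return False
        return True


class HookRunner:
    """Typed facade with one method per hook; cheap to build from a container."""

    def __init__(self, container: HookContainer) -> None:
        self._container = container

    def on_page_saved(self, title: str) -> bool:  # hypothetical hook
        return self._container.run("PageSaved", title)


class SpecialPageBase:
    """'Ask your friendly local base class': subclasses don't know the source."""

    def __init__(self, hook_container: HookContainer) -> None:
        self._hook_container = hook_container  # injected, never the runner

    def get_hook_container(self) -> HookContainer:
        return self._hook_container

    def get_hook_runner(self) -> HookRunner:
        return HookRunner(self._hook_container)


class MySpecialPage(SpecialPageBase):
    def execute(self, title: str) -> bool:
        return self.get_hook_runner().on_page_saved(title)
```

Injecting the container rather than the runner keeps `is_registered()` available, matching the reasoning in the second bullet.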

Constructors with new required arguments:
* AuthManager
* BadFileLookup
* BlockManager
* ClassicInterwikiLookup
* ContentHandlerFactory
* ContentSecurityPolicy
* DefaultOptionsManager
* DerivedPageDataUpdater
* FullSearchResultWidget
* HtmlCacheUpdater
* LanguageFactory
* LanguageNameUtils
* LinkRenderer
* LinkRendererFactory
* LocalisationCache
* MagicWordFactory
* MessageCache
* NamespaceInfo
* PageEditStash
* PageHandlerFactory
* PageUpdater
* ParserFactory
* PermissionManager
* RevisionStore
* RevisionStoreFactory
* SearchEngineConfig
* SearchEngineFactory
* SearchFormWidget
* SearchNearMatcher
* SessionBackend
* SpecialPageFactory
* UserNameUtils
* UserOptionsManager
* WatchedItemQueryService
* WatchedItemStore

Constructors with new optional arguments:
* DefaultPreferencesFactory
* Language
* LinkHolderArray
* MovePage
* Parser
* ParserCache
* PasswordReset
* Router

setHookContainer() now required after construction:
* AuthenticationProvider
* ResourceLoaderModule
* SearchEngine

Change-Id: Id442b0dbe43aba84bd5cf801d86dedc768b082c7
2020-05-30 14:23:28 +00:00
jenkins-bot
5a412a6046 Merge "Use HTML5 semantics for self-closed HTML tags in wikitext" 2020-05-28 23:02:54 +00:00
jenkins-bot
671c98f5ae Merge "Add caption to always suppressing" 2020-05-27 21:00:09 +00:00
Jforrester
03bddca131 Revert "Partially revert "Fix impedance mismatch with Parser::getRevisionRecordObject()""
This reverts commit c45ccd7ca8.

Reason for revert: Assuming that I6af7aeabbba fixes the real issue.

Change-Id: Ie1fc595a18e54f0c29b43740039cd7114d8e071e
2020-05-27 19:22:22 +00:00
jenkins-bot
ae4049fb6d Merge "Fix impedance mismatch with Parser::fetchCurrentRevisionRecordOfTitle" 2020-05-27 19:21:24 +00:00
jenkins-bot
5373d5722f Merge "Partially revert "Fix impedance mismatch with Parser::getRevisionRecordObject()"" 2020-05-27 19:21:16 +00:00
Arlo Breault
938b7234a4 Add caption to always suppressing
In brief, the BlockLevelPass looks at opening and closing tags on a line
to determine whether it should do paragraph wrapping.  The blockElems
want to stop wrapping when opened and start again when closed.  The
antiBlockElems want the opposite, to start when they're opened and stop
when closed.  "table" is in blockElems and "td"|"th" are in antiBlockElems,
so that content found in the interstitial spaces of tables is never
paragraph-wrapped.
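The rule described above can be sketched as a per-line state machine in Python. This is a simplified model, not BlockLevelPass itself. The element sets are illustrative subsets, and the function names are hypothetical.

```python
import re

# Illustrative subsets only; the real lists in BlockLevelPass are longer.
BLOCK_ELEMS = {"table"}
ANTI_BLOCK_ELEMS = {"td", "th"}

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def wrapping_after_line(line: str, wrapping: bool) -> bool:
    """Given whether paragraph wrapping was on before `line`, return whether
    it is on afterwards, scanning the line's tags in order."""
    for match in TAG_RE.finditer(line):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name in BLOCK_ELEMS:
            wrapping = closing      # <table> stops wrapping, </table> restarts it
        elif name in ANTI_BLOCK_ELEMS:
            wrapping = not closing  # <td>/<th> start wrapping, </td>/</th> stop it
    return wrapping


def trace(lines, wrapping=True):
    """Wrapping state after each line."""
    states = []
    for line in lines:
        wrapping = wrapping_after_line(line, wrapping)
        states.append(wrapping)
    return states
```

For `["<table>", "stray text", "<td>cell", "</td>", "</table>"]`, the stray text between `<table>` and the first cell is left unwrapped, while cell content is wrapped.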

That means that, to date, "caption" elements are always found in a place
where paragraph wrapping is already suppressed, so adding them to that
set won't change any test results.  However, a new test is added to spec
out this behaviour.

In the legacy parser, captions are always found in the right place
because handleTables runs at an earlier stage.  In Parsoid, however, the
treebuilder is relied on to close table cells [0], so when we get to the
token stream paragraph wrapping pass, captions are found in table
cells and therefore get wrapped, even though the treebuilder is about to
be induced to close the cell before opening the caption.

Therefore, in Parsoid, the fix would require us to make captions always-
suppressing to match the legacy parser behaviour.  Thus, this change
here is just to keep these lists [1] consistent between the two
parsers.

[0] 5e11a3f390/src/Wt2Html/TokenizerUtils.php (L138-L151)
[1] 5e11a3f390/src/Wt2Html/TT/ParagraphWrapper.php (L71-L78)

Bug: T210647
Change-Id: I8ccefd69d47dca740f50924b235dffa3873d1f99
2020-05-27 12:29:59 -04:00
C. Scott Ananian
1113039771 Fix impedance mismatch with Parser::fetchCurrentRevisionRecordOfTitle
This newly-added method returns `false` on error; the caller expects
it to return `null`.

Bug: T253725
Followup-To: If36b35391f7833a1aded8b5a0de706d44187d423
Change-Id: I6af7aeabbba9f95338497026fd08d9ae23f75c22
2020-05-27 12:10:27 -04:00
C. Scott Ananian
05bc687111 Use HTML5 semantics for self-closed HTML tags in wikitext
This behavior has been deprecated, with a tracking category, since
1.28.  Time to remove the temporary parameter added to
Sanitizer::removeHTMLtags() and (finally) tweak the behavior to match
HTML5.
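Under HTML5 parsing rules, the trailing slash on an HTML element's start tag is ignored unless the element is void. So `<br/>` is a complete element, while `<span/>` merely opens a span. The Python sketch below encodes that rule using the WHATWG void-element list. The function name is hypothetical, and MediaWiki's Sanitizer permits only a subset of these tags in wikitext anyway.

```python
# Void elements per the WHATWG HTML standard.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def self_closed_tag_meaning(tag_name: str) -> str:
    """How an HTML5 parser treats `<tag_name/>` for an HTML element."""
    if tag_name.lower() in VOID_ELEMENTS:
        return "complete element"  # e.g. <br/> is simply <br>
    return "start tag only"        # e.g. <span/> opens a span; the slash is ignored
```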

Bug: T134423
Change-Id: I5c725175d05854139c95a2b3d8d35ff63cb6707b
2020-05-27 11:59:18 -04:00
DannyS712
c45ccd7ca8 Partially revert "Fix impedance mismatch with Parser::getRevisionRecordObject()"
Reason for revert: issue arose again when deployed with wmf.34

Partial revert: keep the intended fix in Parser.php, revert
removal of fail-safe logic in CoreParserFunctions.php

This reverts commit 2712cb8330.

Bug: T253725
Change-Id: I06266ca8bd29520b2c8f86c430d0f1e2d5dd20c0
2020-05-27 08:10:50 +00:00
jenkins-bot
90d5547799 Merge "Fix impedance mismatch with Parser::getRevisionRecordObject()" 2020-05-20 15:59:53 +00:00
Arlo Breault
cbe83b089d Move french space armoring below language conversion
This is a follow-up to I3eae3719ab8fb50b7996d4fd8a9fa0d5ca250023, where
it was moved below doBlockLevels.

This puts it next to the other call to the sanitizer and aligns it
closer with the idea of a post-processing pass in Parsoid.

Bug: T197879
Change-Id: I8ba4934c01a24d53d4871b8efa1e9cf737ba9ebd
2020-05-19 19:31:31 -04:00
jenkins-bot
c0cb506ad8 Merge "Move french space armoring after doBlockLevels" 2020-05-19 22:09:52 +00:00
Reedy
af063dd794 Fix more Squiz.Scope.MethodScope.Missing
Change-Id: I44cd7ba39a898a27f0f66cf34238ab95370d2279
2020-05-18 21:02:14 +00:00