3378 commits

Author SHA1 Message Date
Ebrahim Byagowi
4c270a72ac Add namespace to WikitextContent
It adds the MediaWiki\Content namespace to WikitextContent
and two related classes.

Change-Id: Ib74e4c5b3edac6aa0e35d3b2093ce1d0b794cb6d
2024-08-06 17:42:51 +03:30
jenkins-bot
512c78b8ea Merge "Make {{#language}} consistent with {{#dir}} and {{#bcp47}}" 2024-07-31 11:42:16 +00:00
jenkins-bot
52a10a36b1 Merge "Add {{#bcp47}} parser function" 2024-07-31 11:42:08 +00:00
jenkins-bot
f338ac3295 Merge "Add {{#dir}} parser function" 2024-07-30 20:34:27 +00:00
C. Scott Ananian
450fe7fcd8 Make {{#language}} consistent with {{#dir}} and {{#bcp47}}
Add the same no-arg options for language code that
{{#dir}} and {{#bcp47}} have, for consistency:
* `{{#language}}` will return the name of the *target language*
  (for articles, the content language; for messages, the user language)

The default value for the "in language" argument should be the autonym.
This was working previously but only via a baroque code flow path for
invalid language codes.  Make this a bit clearer and add tests.
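
A minimal Python sketch of the argument defaulting described above. It is not MediaWiki's PHP code: `language()`, the `NAMES` table and its entries are hypothetical, and falling back to the autonym when no translation is known is an assumption of this sketch.

```python
# Hypothetical name table: NAMES[code][in_language] -> display name.
# Real names come from MediaWiki's language-name data (and Extension:CLDR
# in production); these entries are illustrative only.
NAMES = {
    "en": {"en": "English"},
    "de": {"de": "Deutsch", "en": "German"},
    "fr": {"fr": "français", "en": "French"},
}


def language(code: str = "", in_language: str = "", target_language: str = "en") -> str:
    """Sketch of {{#language:code|in_language}} argument handling."""
    if not code:
        # No argument: use the target language (content or user language).
        code = target_language
    if not in_language:
        # The default "in language" is the code itself, i.e. the autonym.
        in_language = code
    names = NAMES.get(code, {})
    # Assumption: fall back to the autonym when no translation is known.
    return names.get(in_language) or names.get(code, "")
```

With the defaults above, `language()` returns the target language's autonym and `language("de", "en")` returns the English name.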

Since non-autonym language names are provided by
[[Extension:CLDR]] in production, hook LanguageGetTranslatedLanguageNames
in the ParserTestRunner to ensure that we can test this.

Followup-To: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
Followup-To: I19c3e91a924e080f37dc95a0d4e61493583b533e
Change-Id: Ibf6e7f194cc056eadb48a5ad8e6d01a761d9351c
2024-07-30 20:27:17 +00:00
C. Scott Ananian
416c33bb6a Add {{#bcp47}} parser function
Template:Bcp47 is one of the most used templates in Wikimedia Commons.
Providing its functionality as a parser function, tied to MediaWiki's
language-handling code, reduces code duplication and will allow us to
reduce template usage on Commons.

As with the {{#dir}} parser function, support one special case:

* `{{#bcp47}}` will return the BCP-47 code of the *target language*
  (for articles, the content language; for messages, the user language)

Note the following slight differences from [[Template:BCP47]] on Commons,
documented in an added parser test:

* 'simple' maps to 'en-simple' (not just 'en')
* 'roa-tara' maps to 'nap-x-tara' (not 'it-x-tara')
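
A minimal Python sketch of the lookup, assuming a table-driven mapping. Only the two mappings named above come from this commit message; `bcp47()` and `INTERNAL_TO_BCP47` are hypothetical names, and passing every other code through unchanged is a simplification of MediaWiki's real conversion.

```python
# Hypothetical, partial table of internal codes whose BCP-47 form differs.
# Only the two mappings named in this commit message are included.
INTERNAL_TO_BCP47 = {
    "simple": "en-simple",
    "roa-tara": "nap-x-tara",
}


def bcp47(code: str = "", target_language: str = "en") -> str:
    """Sketch of {{#bcp47:code}}; with no argument, use the target language."""
    if not code:
        code = target_language
    code = code.strip().lower()
    # Simplification: codes not in the table pass through unchanged
    # (the real conversion also fixes the casing of script/region subtags).
    return INTERNAL_TO_BCP47.get(code, code)
```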

Bug: T366623
Change-Id: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
2024-07-30 20:27:03 +00:00
jenkins-bot
1cbafee5de Merge "[ParsoidParser] Remove unneeded code to set render ID" 2024-07-30 16:54:51 +00:00
Kunal Mehta
4d49a4a59e Extract LintErrorChecker out of SignatureValidator
This code was partially copied into MassMessage; extracting it will hopefully
enable more places that accept arbitrary wikitext to check for lint errors.

It also hides the internal details of checking with the Linter
extension's configuration in one place until it can be refactored into
something more acceptable (T360809).

Bug: T368690
Change-Id: Iaeb3ccbd61a2a8cb0d8b3dc8b06a3a10bc8fa653
2024-07-29 14:35:40 -04:00
jenkins-bot
2c6d357b9b Merge "Extract StatsFactory methods in parsoid SiteConfig" 2024-07-19 22:20:18 +00:00
Ebrahim Byagowi
e1385d3bdf Add {{#dir}} parser function
Template:Dir is one of the most used templates in Wikimedia Commons.
This tries to provide parts of its functionality, in the hope that we can
eventually simplify or get rid of the template for clarity and
performance reasons.

As a convenience, `{{#dir}}` and `{{#dir:}}` are synonyms for
`{{#dir:{{PAGELANGUAGE}}}}`: they return the direction of the target
language.  For articles, the target language is the content language;
for messages, the target language is the user language.

In addition, to avoid confusion between BCP-47 language codes and
MediaWiki-internal language codes, an optional second parameter can be
supplied.  If the second parameter is the (localizable) string
'bcp47', the language code given in the first parameter will be
treated as a BCP-47 code.  For example: `{{#dir:sr-Cyrl|bcp47}}`.
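
A minimal Python sketch of the argument handling described above, under stated assumptions: the RTL code set is a tiny illustrative sample, the `sr-Cyrl` to `sr-ec` entry is an assumed BCP-47-to-internal mapping, `dir_function()` is a hypothetical name, and only the literal (unlocalized) string 'bcp47' is recognized.

```python
# Hypothetical data: a tiny set of right-to-left internal codes, and a
# BCP-47 -> internal mapping with assumed entries (keys lowercased).
RTL_INTERNAL_CODES = {"ar", "fa", "he", "ur"}
BCP47_TO_INTERNAL = {"sr-cyrl": "sr-ec", "en-simple": "simple"}


def dir_function(code: str = "", mode: str = "", target_language: str = "en") -> str:
    """Sketch of {{#dir:code|bcp47}} returning 'ltr' or 'rtl'."""
    if not code:
        # {{#dir}} and {{#dir:}} both mean "the target language".
        code = target_language
    code = code.strip().lower()
    if mode == "bcp47":
        # Treat the first argument as BCP-47 and convert it to an internal code.
        code = BCP47_TO_INTERNAL.get(code, code)
    return "rtl" if code in RTL_INTERNAL_CODES else "ltr"
```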

(See LanguageCode::bcp47ToInternal() for a description of the
differences and overlaps between MediaWiki internal and BCP-47
codes.  These overlaps *so far* don't result in any case where
the two interpretations of a code give different directions, but the
'bcp47' option is provided anyway, encouraging editors to be precise
about which set of enumerated string values they are using, for
consistency with other language-related functions, and because
MediaWiki internally differentiates between BCP-47 codes and
internal codes.)

Bug: T359761
Change-Id: I19c3e91a924e080f37dc95a0d4e61493583b533e
2024-07-19 16:57:48 -04:00
C. Scott Ananian
16de2c0851 [ParsoidParser] Remove unneeded code to set render ID
Since I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78 it is part of the
contract of ContentRenderer::getParserOutput() that the render ID (and
other cache parameters) will be set when it returns.
(ContentHandler::getParserOutput() can set them even earlier if it has
custom content-based overrides.)  We had a lot of temporary
backward-compatibility code "later" in the parse process to try to close
the barn door if some code path "forgot" to set them, but these are
unnecessary now.

This patch removes that backward-compatibility code in ParsoidParser;
there is similar remaining code in ParserCache etc. which can be
addressed in follow-ups.

(For compatibility we do have to temporarily copy the render ID code
inside ParsoidOutputAccess::parseUncachable, but that class is
deprecated and will be removed.)

The HtmlOutputRendererHelper path which used to call
ParsoidParser::parseFakeRevision() is now replaced with a codepath that
goes through RevisionRenderer.  In order to maintain the same behavior
of the ParsoidHandler, we have also added 'useParsoid' handling to the
JsonContentHandler.  This support can perhaps be deprecated eventually.

Bug: T350538
Change-Id: I0853624cf785f72fd956c6c2336f979f4402a68f
2024-07-19 16:09:32 -04:00
C. Scott Ananian
fc0af94d32 Hard deprecate ParsoidOutputAccess
This class was @unstable and should be replaced by ParserOutputAccess.

Bug: T367074
Depends-On: I543a6e9da4fc473a2ac54ac635286453f2aff96a
Change-Id: Ie51b9b7a8b42a6faafeb28378c188347f274a9c5
2024-07-19 03:09:35 -04:00
Yiannis Giannelos
90bac43f11 Extract StatsFactory methods in parsoid SiteConfig
* It's not very clean to import Wikimedia\Stats in Parsoid
  * MediaWiki depends on Parsoid
* As a workaround, we can extract the 2 methods we need in SiteConfig

Bug: T354908
Change-Id: I696131cfba6ccc26ae1f705f216e221a7c3db175
2024-07-10 18:01:56 +02:00
Ebrahim Byagowi
fab78547ad Add namespace to the root classes of ObjectCache
Also adds deprecated aliases for the non-namespaced classes.

ReplicatedBagOStuff, which is already deprecated, isn't moved.

Bug: T353458
Change-Id: Ie01962517e5b53e59b9721e9996d4f1ea95abb51
2024-07-10 00:14:54 +03:30
jenkins-bot
b46da91b60 Merge "Fix various version mention for class_alias" 2024-07-05 19:32:42 +00:00
Umherirrender
1951aea6b8 Fix various version mention for class_alias
Versions were changed in 8e940c4f21,
but that made the versions wrong.

Follow-Up: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
Change-Id: Iae43725b8e0fffc4d44bf57f6227334b41290bd9
2024-07-05 18:39:49 +02:00
Umherirrender
1934b45c65 parser: Fix version mention for class_alias
e55cc517da is REL1_42

Follow-Up: I79b4e732c45095eedbaa80afa5eb7479b387ed8a
Change-Id: Ib92b8f64f7443406742c5c6244866e701b010079
2024-07-05 16:16:44 +00:00
Umherirrender
e66f66d875 Use namespaced classes
Change-Id: Ie08a616eb07c8da50e971a5fc3f6207c34c3f342
2024-07-05 00:16:44 +02:00
jenkins-bot
57731f2c90 Merge "add @deprecated to hard deprecated methods" 2024-07-02 17:52:09 +00:00
jenkins-bot
6fe77b9950 Merge "Add a warning to Sanitizer::checkCSS" 2024-06-28 15:19:42 +00:00
Novem Linguae
3391b91af2 Sanitizer: delete method removeHTMLtags()
Git blame on wfDeprecated() is 7 years old.

Unique name, no sign of it in CodeSearch for MediaWiki & services
at WMF. Couple of CodeSearch hits using the "Everything" filter.

Bug: T362636
Change-Id: I8961ebb3a72b328e839659aeeee3e73512a88dee
2024-06-27 14:09:00 +00:00
jenkins-bot
65533a55b6 Merge "ParserOptions: delete 3 hard deprecated methods" 2024-06-27 14:02:24 +00:00
jenkins-bot
1dba954b3f Merge "Remove image and gallery image caption trimming" 2024-06-27 13:54:58 +00:00
jenkins-bot
f412e1b3b8 Merge "Fix mw-selflink-fragment on variant fragment links" 2024-06-27 10:42:44 +00:00
Brian Wolff
d5dc6e657b Add a warning to Sanitizer::checkCSS
Good to be extra clear on security-sensitive code.

Change-Id: I66b6404aac6f51200bce606f698a179737c15675
2024-06-26 23:39:56 -07:00
Arlo Breault
6b05fa3a21 Remove image and gallery image caption trimming
Post I5039c7ef9e07199c256fd568b4f94714e5831d17, gallery image captions
are no longer placed on new lines, so the presence of leading whitespace
shouldn't be significant.

This fixes an inconsistency in gallery image caption trimming, where
only the first and last option had start and end trimming, respectively.

It also matches Parsoid output, where no trimming takes place, as seen
in the updated tests.

Change-Id: I2a80198c43598dc8c7fa61cb4b0340a97d2ee895
2024-06-26 21:51:40 -04:00
Novem Linguae
bf8e83fe44 ParserOptions: delete 3 hard deprecated methods
Git blame on wfDeprecated() is 4 years old.

Unique names, no sign of them in CodeSearch.

Bug: T362636
Change-Id: I90f11dc78be0938aea53a304b5824f034dd70107
2024-06-26 18:56:59 +00:00
Novem Linguae
bee9048b0e add @deprecated to hard deprecated methods
Stable interface policy says that hard deprecations MUST also
contain soft deprecations in docblock. I imagine this is for
Doxygen and IDEs.

https://www.mediawiki.org/wiki/Stable_interface_policy#Hard_deprecation
Change-Id: Ic1aeb031a4479a1c86c5a1d645c53f2a51055191
2024-06-26 00:21:08 +00:00
Arlo Breault
c356dfed72 Fix mw-selflink-fragment on variant fragment links
Should have been part of 1fca3b5b.

The fix to doVariants can be seen in old output linking [[Dуна#Foo]] to
Дуна despite [[Dуна]] being a self-link in the test above.

Bug: T198652
Change-Id: Id38cfc47041492c5cc68b4f8f9566f421c9168bd
2024-06-19 08:50:46 +00:00
Umherirrender
c08b492d75 Use namespaced classes (3)
Changes to the use statements done automatically via script
Addition of missing use statement done manually

Change-Id: Ia35b2d3105880631dd26ec974068b000ac7f4b6b
2024-06-16 20:26:43 +02:00
Bartosz Dziewoński
ccd423225f Add "implements Stringable" to every class with "function __toString()"
In PHP 8, but not in PHP 7.4, every class with a __toString() function
implicitly implements the Stringable interface. Therefore, the
behavior of checks like "instanceof Stringable" differs between these
PHP versions when such classes are involved. Make every such class
implement the interface so that the behavior will be consistent.

The PHP 7.4 fallback for the Stringable interface is provided by
symfony/polyfill-php80.

Change-Id: I3f0330c2555c7d3bf99b654ed3c0b0303e257ea1
2024-06-13 00:23:39 +00:00
jenkins-bot
6b6f333791 Merge "Make MessageValue implement JsonDeserializable" 2024-06-12 20:15:46 +00:00
jenkins-bot
45c105ec46 Merge "Parser: Avoid extra escaping in replaceTableOfContentsMarker" 2024-06-12 19:51:18 +00:00
Bartosz Dziewoński
c7f52f0ddb Make MessageValue implement JsonDeserializable
MessageValue and friends are pure value objects and newable, so
it makes sense for them to be (de)serializable too. There are some
places where we want to serialize messages, such as in ParserOutput.

The structure of the resulting JSON is inspired by the way we
represent Message objects as plain values elsewhere in MediaWiki,
e.g. StatusValue::getStatusArray().
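
A minimal Python sketch of the idea of a newable message value object that round-trips through JSON. The class name, method names and the `{"key": ..., "params": [...]}` shape are assumptions for illustration, not the actual MessageValue JSON format.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class MessageValueSketch:
    """Hypothetical Python stand-in for a newable message value object."""
    key: str
    params: Tuple[Any, ...] = ()

    def to_json_array(self) -> Dict[str, Any]:
        # Assumed shape, loosely modelled on key + params message arrays.
        return {"key": self.key, "params": list(self.params)}

    @classmethod
    def new_from_json_array(cls, data: Dict[str, Any]) -> "MessageValueSketch":
        # JSON arrays come back as lists; store them as an immutable tuple.
        return cls(data["key"], tuple(data.get("params", ())))
```

Because the object is a pure value, equality after a round trip is all that needs checking.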

Co-Authored-By: C. Scott Ananian <cscott@cscott.net>
Depends-On: Ia32f95a6bdf342262b4ef044140527f0676402b9
Depends-On: I7bafe80cd36c2558517f474871148286350a4e76
Change-Id: Id47d58b5e26707fa0e0dbdd37418c0d54c8dd503
2024-06-12 15:47:37 -04:00
James D. Forrester
19f4e6945a Rename JsonUnserial… to JsonDeserial…
This is to make it clearer that they're related to converting serialized
JSON content back into objects, rather than stating that things are not
representable in JSON.

Change-Id: Ic440ac2d05b5ac238a1c0e4821d3f2d858bc3d76
2024-06-12 14:50:58 -04:00
vahurzpu
fbba3bb2cf Parser: Avoid extra escaping in replaceTableOfContentsMarker
I60fdfc2c52 changed replaceTableOfContentsMarker from using
preg_replace, which supports backreferences in the replacement, and
thus expects literal backslashes and dollar signs to be escaped,
to using preg_replace_callback, which does not expect any escaping.
This caused unwanted backslashes in headings. This patch removes the
escaping.
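
The same pitfall can be reproduced in Python, whose `re.sub` behaves analogously: a string replacement is parsed for backslash escapes, while a callable's return value is inserted verbatim. A minimal sketch; the `<toc-marker/>` placeholder and both function names are hypothetical.

```python
import re


def replace_marker_buggy(text: str, toc: str) -> str:
    # Escaping that a *template* replacement string would need (\ and $)...
    escaped = toc.replace("\\", "\\\\").replace("$", "\\$")
    # ...but a callback's return value is used verbatim, so the escapes leak.
    return re.sub(r"<toc-marker/>", lambda m: escaped, text)


def replace_marker_fixed(text: str, toc: str) -> str:
    # With a callback, no escaping is needed at all.
    return re.sub(r"<toc-marker/>", lambda m: toc, text)
```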

Bug: T365413
Change-Id: Idbdc3074c7ad007627c4c259a1aaf090a5d0c7f9
2024-06-12 06:10:58 -04:00
Isabelle Hurbain-Palatin
f65d1c44d0 Make $headers['content-language'] a string instead of Bcp47Code
Page bundle headers should not contain objects, as they are supposed
to represent plaintext HTTP headers.

Change-Id: I2a87a8233b9e42cbafdba63bdf513abe00d826ce
2024-06-11 11:08:34 +02:00
jenkins-bot
bbadf63fa8 Merge "Move Linker::makeExternalLink() to the LinkRenderer service" 2024-06-10 19:58:45 +00:00
C. Scott Ananian
b855c62f66 Move Linker::makeExternalLink() to the LinkRenderer service
Move Linker::makeExternalLink to the LinkRenderer service, as has been
done with the other static methods of Linker.

In order to allow phan's SecurityCheckPlugin to perform a more accurate
analysis of taintedness, tweak the API of Linker::makeExternalLink to
clearly indicate via the type system whether the link text has already
been escaped or not: a `string` argument will always be escaped, and
if the argument is already escaped it should be passed as an HtmlArmor
object.  In refactoring, `Message` arguments were also common, so they
are accepted as-is to spare the caller from deciding whether to call
Message::text() or Message::escaped().

This allows us to provide a more precise taint type to the $text argument,
avoids an opaque boolean argument, and avoids spurious errors from
SecurityCheck.

We also require the caller to explicitly pass a Title context, instead
of implicitly relying on the global $wgTitle.  This works cleanly
everywhere except for CommentParser, which has a $selfLinkTarget which
generally works as the title context for the external link, but which
is nullable.  The original Linker::makeExternalLink() used $wgTitle as
a fallback, but $wgTitle can also be null in some circumstances.  The
title context only determines how $wgNoFollowNsExceptions is handled,
so existing code basically just ignored $wgNoFollowNsExceptions when
$wgTitle was null, which isn't terrible.  A future refactor could/should
clean up CommentParser to ensure that there is always a non-null title
context that can be used.
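
A minimal Python sketch of the typed-escaping contract and the nullable title context described above. `HtmlArmor` here is a hypothetical Python stand-in for the PHP class, the function signature is illustrative, and `Message` handling is omitted.

```python
import html
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union


@dataclass(frozen=True)
class HtmlArmor:
    """Hypothetical stand-in for MediaWiki's HtmlArmor: text already escaped as HTML."""
    value: str


def make_external_link(
    url: str,
    text: Union[str, HtmlArmor],
    title_namespace: Optional[int],
    nofollow_ns_exceptions: FrozenSet[int] = frozenset(),
) -> str:
    """Sketch: plain strings are always escaped; HtmlArmor is trusted as-is."""
    inner = text.value if isinstance(text, HtmlArmor) else html.escape(text)
    # With no title context the namespace exceptions cannot apply, so nofollow is used.
    nofollow = title_namespace is None or title_namespace not in nofollow_ns_exceptions
    rel = ' rel="nofollow"' if nofollow else ""
    return f'<a class="external" href="{html.escape(url)}"{rel}>{inner}</a>'
```

Encoding "already escaped" in the argument type, rather than in a boolean flag, is what lets a static taint checker see which path a caller takes.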

Change-Id: I9bcf4780f388ba639a9cc882dd9dd42eda5736ae
2024-06-10 18:47:32 +00:00
jenkins-bot
afef7619f3 Merge "Parser: Deprecate use of mOutput before initialization" 2024-06-06 16:07:42 +00:00
Paladox
580811c573 Parser: Deprecate use of mOutput before initialization
Some extensions check whether getOutput() returns null and,
if it does, return early.  For example:

https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/5579

But with the type change in I247643b9bf0cabdc92a7e893d653edeaed9a1307
(MW 1.41.0), that attempt fails with the following:

```
Error: Typed property Parser::$mOutput must not be accessed before initialization
```

Fix this by allowing Parser::getOutput() to return null,
but emit a deprecation warning to notify users that the
Parser object has not been properly initialized at this
point.
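
A minimal Python sketch of the "return null, but warn" pattern. PHP typed properties have no direct Python equivalent, so `ParserSketch` and its methods are hypothetical and only model the behavior: an uninitialized output yields `None` plus a deprecation warning.

```python
import warnings
from typing import Optional


class ParserSketch:
    """Hypothetical Python analogue of a parser whose output is set lazily."""

    def __init__(self) -> None:
        self._output: Optional[str] = None  # not yet initialized

    def start_parse(self) -> None:
        self._output = "parser output"

    def get_output(self) -> Optional[str]:
        if self._output is None:
            # Keep returning None for old callers, but flag the misuse.
            warnings.warn(
                "get_output() called before the parser was initialized",
                DeprecationWarning,
                stacklevel=2,
            )
        return self._output
```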

Change-Id: I0de2cf0381ceac36374a47fb11e260b1c522353b
2024-06-06 15:48:41 +00:00
jenkins-bot
10b0dc3fe1 Merge "Sanitizer: Disallow src()" 2024-05-30 14:10:18 +00:00
Gergő Tisza
c24c73619f Sanitizer: Disallow src()
src() is equivalent to url() in CSS Values 4 (minus the quote mark
weirdness): https://www.w3.org/TR/css-values-4/#funcdef-src

It's not supported by anything yet (doesn't even have a caniuse
entry) but no harm in being proactive.
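
A simplified Python sketch of the check: reject CSS that calls `url()` or `src()`, case-insensitively and allowing whitespace before the parenthesis. The names are hypothetical; the real Sanitizer::checkCSS() is more thorough (for example, it normalizes the CSS before matching) and rejects other constructs too.

```python
import re

# Simplified sketch: match url( or src( as a whole word, any case,
# with optional whitespace before the opening parenthesis.
_UNSAFE_CSS_FUNCTION = re.compile(r"\b(?:url|src)\s*\(", re.IGNORECASE)


def css_is_safe(css: str) -> bool:
    return _UNSAFE_CSS_FUNCTION.search(css) is None
```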

Change-Id: I01af662b5fba7e84fa69c36e066b15277b8f73ea
2024-05-30 13:52:48 +02:00
Derk-Jan Hartman
2953f337fa Detect modern image formats when using wgAllowExternalImages
Add webp, avif, and svg
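
A minimal Python sketch of extension-based detection of external image URLs. The pattern is an illustrative approximation rather than MediaWiki's actual regex, and it assumes the pre-existing formats were GIF, PNG and JPEG, with AVIF, SVG and WebP added here.

```python
import re

# Sketch: an external image URL must be http(s) and end in a known image
# extension. Assumption: the older list was gif/png/jpg/jpeg; this commit
# adds avif, svg and webp.
_EXT_IMAGE = re.compile(
    r"^https?://[^\s<>\"\[\]]+/[^\s<>\"\[\]/]+\.(?:avif|gif|jpe?g|png|svg|webp)$",
    re.IGNORECASE,
)


def looks_like_external_image(url: str) -> bool:
    return _EXT_IMAGE.match(url) is not None
```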

Bug: T365636
Change-Id: I11c45094c9f2228bc4a019ff2946a58bc91ba9a0
2024-05-24 21:55:42 +00:00
jenkins-bot
08ef39abfd Merge "Move ParsoidOutputAccess::supportsContentModel() into Parsoid SiteConfig" 2024-05-22 16:46:31 +00:00
jenkins-bot
df0fa2b09c Merge "Parser: Inject service LanguageNameUtils" 2024-05-22 15:31:12 +00:00
C. Scott Ananian
a565e388f9 Move ParsoidOutputAccess::supportsContentModel() into Parsoid SiteConfig
The `supportsContentModel` method is really querying Parsoid for the
set of content models it supports, so it makes sense to put it in the
Parsoid-specific SiteConfig service.

This is part of the work to deprecate and remove ParsoidOutputAccess.

Change-Id: I81eb2df8cef93ede95361a4e03185b3d58e5b84b
2024-05-22 10:57:37 -04:00
jenkins-bot
63b2bf6e3c Merge "[ParserOutput] Remove unused TOCHTML from ParserCache serialization" 2024-05-20 21:08:16 +00:00
Fomafix
66aa439d00 Parser: Inject service LanguageNameUtils
Change-Id: Ia9884f991550c96e4d9bbca9bfb882144716cd24
2024-05-20 19:23:37 +00:00
jenkins-bot
60eb078088 Merge "Serialization test cases: fix filename after ParserOutput namespacing" 2024-05-20 13:13:06 +00:00