34 commits

Author SHA1 Message Date
James D. Forrester
a5387c7c20 Namespace all remaining classes in includes/parser
Bug: T353458
Change-Id: If02cc9b1ff78e26c1cf8c91ee4695845eb133829
2024-10-15 23:54:32 +01:00
dvorapa
10ab0e40a9 parser: Add a new {{USERLANGUAGE}} magic word for use in wikitext
Depending on configuration, this returns either the interface language
code of the current user or the current page language.

Bug: T4085
Change-Id: Iab7fda272ec81af88c74612727ff6bed014d4a81
2024-09-07 19:16:32 +00:00
jenkins-bot
672f5d7e6b Merge "parser: Avoid cache stampede on pages which use {{CURRENTYEAR}}" 2024-08-26 16:00:08 +00:00
C. Scott Ananian
450fe7fcd8 Make {{#language}} consistent with {{#dir}} and {{#bcp47}}
Add the same no-arg options for language code that
{{#dir}} and {{#bcp47}} have, for consistency:
* `{{#language}}` will return the name of the *target language*
  (for articles, the content language; for messages, the user language)

The default value for the "in language" argument should be the autonym.
This was working previously but only via a baroque code flow path for
invalid language codes.  Make this a bit clearer and add tests.

Since translations of language names other than the autonym are added by
[[Extension:CLDR]] in production, hook LanguageGetTranslatedLanguageNames
in the ParserTestRunner to ensure that we can test this.

Followup-To: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
Followup-To: I19c3e91a924e080f37dc95a0d4e61493583b533e
Change-Id: Ibf6e7f194cc056eadb48a5ad8e6d01a761d9351c
2024-07-30 20:27:17 +00:00
C. Scott Ananian
416c33bb6a Add {{#bcp47}} parser function
Template:Bcp47 is one of the most used templates in Wikimedia Commons.
Providing its functionality as a parser function, tied to MediaWiki's
language-handling code, reduces code duplication and will allow us to
reduce template usage on Commons.

As with the {{#dir}} parser function, support one special case:

* `{{#bcp47}}` will return the BCP-47 code of the *target language*
  (for articles, the content language; for messages, the user language)

Note the following slight differences from [[Template:BCP47]] on Commons,
documented in an added parser test:

* 'simple' maps to 'en-simple' (not just 'en')
* 'roa-tara' maps to 'nap-x-tara' (not 'it-x-tara')

Bug: T366623
Change-Id: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
2024-07-30 20:27:03 +00:00
Ebrahim Byagowi
e1385d3bdf Add {{#dir}} parser function
Template:Dir is one of the most used templates in Wikimedia Commons.
This patch provides part of its functionality as a parser function, in
the hope that we can eventually simplify or remove the template for
clarity and performance reasons.

As a convenience, `{{#dir}}` and `{{#dir:}}` are synonyms for
`{{#dir:{{PAGELANGUAGE}}}}`: they return the direction of the target
language.  For articles, the target language is the content language;
for messages, the target language is the user language.

In addition, to avoid confusion between BCP-47 language codes and
MediaWiki-internal language codes, an optional second parameter can be
supplied.  If the second parameter is the (localizable) string
'bcp47', the language code given in the first parameter will be
treated as a BCP-47 code.  For example: `{{#dir:sr-Cyrl|bcp47}}`.

(See LanguageCode::bcp47ToInternal() for a description of the
differences and overlaps between MediaWiki internal and BCP-47
codes.  These overlaps *so far* don't result in any case where the
two interpretations of a code give a different direction, but the
parameter encourages editors to be precise about which set of
enumerated string values they are using, for consistency with other
language-related functions, and because MediaWiki internally
differentiates between BCP-47 codes and internal codes.)

Bug: T359761
Change-Id: I19c3e91a924e080f37dc95a0d4e61493583b533e
2024-07-19 16:57:48 -04:00
C. Scott Ananian
c668d33d21 parser: Avoid cache stampede on pages which use {{CURRENTYEAR}}
With the precise expiration computation for magic words like
{{CURRENTYEAR}} there's the potential for all the pages which
use (say) {{CURRENTYEAR}} to expire at the exact same time,
causing a cache stampede.

I9acb42b0d9ff67798a1624cbf9c7cac99c8fbe2f added code to "randomly"
stagger the cache expiration:

  $ttl += ( $deadlineUnix % self::DEADLINE_TTL_STAGGER_MAX );

However, this "stagger" was not actually random, since it is
based on the computed deadline.  All of the pages which use
{{CURRENTYEAR}} compute the same `$deadlineUnix` (midnight
of January 1st of the next year) and thus the same "stagger".

Change $deadlineUnix to $tsUnix so that the stagger is based on when
this particular page is being parsed, which should actually have
the desired effect of spreading out the possible cache stampede.
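A minimal sketch of why the change matters, written in Python rather than MediaWiki's PHP. It models only the stagger term: `STAGGER_MAX` is a hypothetical stand-in for `self::DEADLINE_TTL_STAGGER_MAX` (whose real value is not given here), and any minimum-TTL clamping is ignored. Five pages parsed one second apart share a single expiry instant under the deadline-based stagger, but get distinct expiries when the stagger comes from the parse timestamp.

```python
from typing import Callable

# Sketch (not MediaWiki code) of the cache-expiry stagger before and after
# the fix. STAGGER_MAX is a hypothetical stand-in for
# self::DEADLINE_TTL_STAGGER_MAX; minimum-TTL handling is omitted.
STAGGER_MAX = 15  # seconds; hypothetical value


def ttl_old(ts_unix: int, deadline_unix: int) -> int:
    """Original behaviour: the stagger depends only on the shared deadline."""
    return (deadline_unix - ts_unix) + (deadline_unix % STAGGER_MAX)


def ttl_new(ts_unix: int, deadline_unix: int) -> int:
    """Fixed behaviour: the stagger depends on when this page was parsed."""
    return (deadline_unix - ts_unix) + (ts_unix % STAGGER_MAX)


def expiry(ts_unix: int, deadline_unix: int,
           ttl_fn: Callable[[int, int], int]) -> int:
    """Absolute time at which the cached page expires."""
    return ts_unix + ttl_fn(ts_unix, deadline_unix)


deadline = 1735689600  # 2025-01-01T00:00:00Z, the next change of {{CURRENTYEAR}}
parse_times = [deadline - 86400 + i for i in range(5)]  # parses one second apart
print({expiry(t, deadline, ttl_old) for t in parse_times})          # a single shared expiry
print(sorted({expiry(t, deadline, ttl_new) for t in parse_times}))  # five distinct expiries
```

Because every page computes the same deadline, only a per-page input such as the parse time can spread the expiries out.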

Followup-To: I9acb42b0d9ff67798a1624cbf9c7cac99c8fbe2f
Change-Id: I95272e301c00e4646dd29ca22abc26c6cbe9028e
2024-05-06 16:36:45 -04:00
Subramanya Sastry
e55cc517da Move Parser to MediaWiki\Parser namespace
Bug: T166010
Co-Authored-By: Daimona Eaytoy <daimona.wiki@gmail.com>
Co-Authored-By: James Forrester <jforrester@wikimedia.org>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Change-Id: I79b4e732c45095eedbaa80afa5eb7479b387ed8a
2024-02-16 09:18:38 -05:00
C. Scott Ananian
c46c71749f [SpecialVersion] Fix double-escaping in {{CURRENTVERSION}}
Phan correctly catches a case of double-escaping here: the output of
SpecialVersion::getVersion() is used as the return value for the
CoreMagicVariables implementation of {{CURRENTVERSION}}, which means it
is wikitext (not escaped HTML).

This has no practical consequence, since the only things being escaped are
a SHA hash and the localized parentheses, neither of which is likely to
contain characters that would trigger the extra escape.

Change-Id: I9f125d56b9d143f2a0baea3da8bbb92b38317537
2024-02-09 23:47:21 +00:00
James D. Forrester
870f7c3f1a Namespace SpecialVersion under \MediaWiki\Specials
Change-Id: Ibeb181c653dac3796c44b36c8ff9f2ed572d5f42
2023-09-14 19:25:51 +00:00
Matěj Suchánek
1c8896a0dd Fix various typos and documentation issues
Change-Id: I2cd4b647c01d84cfe0e1b4d55e155ced8c918b17
2023-08-27 12:05:11 +02:00
Amir Sarabadani
15a278189f Reorg: Move MWTimestamp to MediaWiki\Utils
Bug: T321882
Change-Id: I48c10343295c4eb3d9ef8037343b0070e928f040
2023-08-19 05:53:40 +02:00
C. Scott Ananian
4d4a96b469 CoreMagicVariables.php: Improve documentation of DEADLINE_DATE_SPEC_BY_UNIT
Followup-To: I9acb42b0d9ff67798a1624cbf9c7cac99c8fbe2f
Change-Id: I7022a04f359226058a90ba2b2195984862de1ebf
2023-03-29 09:53:04 -04:00
Aaron Schulz
366a0afd63 parser: improve cache TTL accuracy for CURRENT*/LOCAL* magic words
Consolidate cache TTL handling within CoreMagicVariables.

Make the TTL account for how many seconds away the value is from changing.
For example, CURRENTHOUR should change soon after the next hour is reached.
There is a minimum adjustment TTL to avoid parser-after-save delays.

This allows for longer caching in most cases, as well as more up-to-date
rendering when the hour/day/week/year is about to change. Previously, there
were blind TTLs, which were either far too pessimistic or far too generous.
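A rough Python sketch of the idea, assuming UTC and covering only the hour, day and year units (week boundaries are left out). `MIN_TTL`, `next_boundary` and `magic_word_ttl` are hypothetical names that do not reflect the constants or methods in CoreMagicVariables: the TTL is the number of seconds until the value next changes, clamped from below by a minimum.

```python
from datetime import datetime, timedelta, timezone

MIN_TTL = 15  # hypothetical floor in seconds; not the value MediaWiki uses


def next_boundary(now: datetime, unit: str) -> datetime:
    """Return the instant at which a CURRENT* value for `unit` next changes."""
    if unit == "hour":
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if unit == "day":
        return now.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    if unit == "year":
        return now.replace(year=now.year + 1, month=1, day=1,
                           hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


def magic_word_ttl(now: datetime, unit: str) -> int:
    """Seconds until the value changes, but never less than MIN_TTL."""
    remaining = int((next_boundary(now, unit) - now).total_seconds())
    return max(remaining, MIN_TTL)


print(magic_word_ttl(datetime(2024, 5, 6, 16, 30, tzinfo=timezone.utc), "hour"))  # 1800
```

The floor matters near a boundary: at 16:59:50 only ten seconds remain, so the sketch returns `MIN_TTL` instead.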

This commit does not change the CURRENTTIME, CURRENTTIMESTAMP, LOCALTIME,
and LOCALTIMESTAMP words, since there is no reasonable way to cache output
while keeping them up-to-date.

Bug: T320668
Change-Id: I9acb42b0d9ff67798a1624cbf9c7cac99c8fbe2f
2023-03-28 22:35:17 +00:00
Tim Starling
be3018b268 Just another 80 or so PHPStorm inspection fixes (#4)
* Unnecessary regex modifier. I agree with this inspection, which flags
  /s modifiers on regexes that don't use a dot.
* Property declared dynamically.
* Unused local variable. But it's acceptable for an unused local
  variable to take the return value of a method under test, when it is
  being tested for its side-effects. And it's acceptable for an unused
  local variable to document unused list expansion elements, or the
  nature of array keys in a foreach.

Change-Id: I067b5b45dd1138c00e7269b66d3d1385f202fe7f
2023-03-25 00:39:06 +00:00
jenkins-bot
548ede7d7b Merge "CoreMagicVariables/CoreParserFunction: unify revisionid" 2023-02-24 04:58:07 +00:00
jenkins-bot
7d697a6dea Merge "CoreMagicVariables/CoreParserFunction: unify revisionuser" 2023-02-24 04:25:22 +00:00
C. Scott Ananian
24d69ef952 Deprecate Parser::getFunctionLang()
This is identical to Parser::getTargetLanguage() in modern MediaWiki,
since 7df3473cfe in MW 1.19 (2012).

Bug: T318860
Depends-On: If5fa696e27e84a3aa1343551d7482c933da0a9b6
Depends-On: I87a7ceedce173f6de4bb6722ffe594273c7b0359
Change-Id: Ieed03003095656e69b8e64ed307c6bd67c45c1e7
2022-11-16 16:47:16 -05:00
C. Scott Ananian
d8e519987d CoreMagicVariables/CoreParserFunction: unify revisionid
Reduce code duplication by having the "magic variable" implementation
of `revisionid` invoke the corresponding "parser function"
implementation with no arguments.  This also supports consistent
results from direct invocation of the parser function with no
arguments in the future.

Bug: T204370
Change-Id: I2dc4799559f440511b4584a73513c17d5b0f1ff0
2022-09-22 13:05:19 -04:00
C. Scott Ananian
2da6deaba4 CoreMagicVariables/CoreParserFunction: unify revisionuser
Reduce code duplication by having the "magic variable" implementation
of `revisionuser` invoke the corresponding "parser function"
implementation with no arguments.  This also supports consistent
results from direct invocation of the parser function with no
arguments in the future.

Bug: T204370
Change-Id: I4200e4f05a8987b509349832d6d0b164acbe0dd8
2022-09-22 13:05:19 -04:00
C. Scott Ananian
d08e0cdf20 CoreMagicVariables/CoreParserFunctions: unify revisiontimestamp & etc
Reduce code duplication by having the "magic variable" implementations
of:

  revisionday, revisionday2, revisionmonth, revisionmonth1,
  revisionyear, revisiontimestamp

invoke the corresponding "parser function" implementations with no
arguments.  This also supports consistent results from direct
invocation of the parser function with no arguments in the future.

Bug: T204370
Change-Id: I8d25755e4d92bd91988cfb706d85bdb170abb207
2022-09-21 16:58:02 -04:00
C. Scott Ananian
9a37dbda6d Unify the "magic variable" and "parser function" form of several built-ins
The following magic variables also have "parser function" forms, where
the first argument is a user-supplied title:

  pagename, pagenamee, fullpagename, fullpagenamee, subpagename,
  subpagenamee, rootpagename, rootpagenamee, basepagename,
  basepagenamee, talkpagename, talkpagenamee, subjectpagename,
  subjectpagenamee, pageid, cascadingsources, namespace, namespacee,
  namespacenumber, talkspace, talkspacee, subjectspace, subjectspacee

Refactor the code so that the magic variable form invokes the parser
function form with no arguments to reduce code duplication.  We also
tweak the behavior of the parser function when invoked with no arguments,
although this change will not be directly visible because the parser
always prefers magic variables over parser functions, so the parser
function is never actually invoked with no arguments.

A future patch may allow the parser function to be invoked with a
hash prefix (I895087c546dc820c77c0dda596dfeb72586b87cc) in which case
consistency will be more important.

Note that `revisionuser`, `revisionid`, `revisionday`, `revisionday2`,
`revisionmonth`, `revisionmonth1`, `revisionyear` and
`revisiontimestamp` are also of a similar form and could be included
in this list, but their magic variable and parser function
implementations do not appear to be consistent.  This will be
addressed in future patches.

In addition, the following magic variables have a "parser function" form
where the presence or absence of the first argument selects "raw" output:

  numberofarticles, numberoffiles, numberofusers, numberofactiveusers,
  numberofpages, numberofadmins, numberofedits

Similar to above, refactor the code so that the magic variable form
invokes the parser function form with no arguments to reduce code
duplication (and to support future direct invocation of the parser
function with no arguments).
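The delegation pattern above can be sketched in Python as follows. Everything here is hypothetical and simplified (`FakeParser`, the `pf_*` functions and the `MAGIC_VARIABLES` table are not MediaWiki names): the parser-function form owns the behaviour, and the magic-variable form simply calls it with no arguments. As further simplifications, any non-empty argument selects raw output, and numbers are grouped with Python's `,` format rather than with localized formatting.

```python
from typing import Callable, Dict, Optional


class FakeParser:
    """Hypothetical stand-in for the parser state the built-ins read."""

    def __init__(self, current_title: str, article_count: int) -> None:
        self.current_title = current_title
        self.article_count = article_count


def pf_fullpagename(parser: FakeParser, title: Optional[str] = None) -> str:
    # Parser-function form: with no argument it falls back to the page
    # being parsed (simplified: no title normalization or escaping).
    return title if title else parser.current_title


def pf_numberofarticles(parser: FakeParser, raw: Optional[str] = None) -> str:
    # Parser-function form: the presence of an argument selects raw output.
    n = parser.article_count
    return str(n) if raw else f"{n:,}"


# Magic-variable forms: each one delegates to its parser-function form with
# no arguments, so every behaviour has a single implementation.
MAGIC_VARIABLES: Dict[str, Callable[[FakeParser], str]] = {
    "fullpagename": lambda parser: pf_fullpagename(parser),
    "numberofarticles": lambda parser: pf_numberofarticles(parser),
}

parser = FakeParser("Main Page", 1234567)
print(MAGIC_VARIABLES["numberofarticles"](parser))  # 1,234,567
print(pf_numberofarticles(parser, "R"))             # 1234567
```

With this shape, a later change that lets editors call the parser function directly with no arguments yields the same output as the variable form by construction.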

Bug: T204370
Change-Id: Iaec33fb40a2d9884daf2852ed6a6a3b53c9d3863
2022-09-19 22:11:58 -04:00
C. Scott Ananian
cf52f646bb Add {{=}} as a built-in magic word
This is a replay of 4bc0dc348a, which
was reverted in 9bd4fc0ae9 due to unexpected
use on Dutch Wiktionary.  In 1.36 deprecation warnings and a tracking
category were added if a wiki defined [[Template:=]] to expand to
anything other than `=` (see aeb3f45c20).
This patch follows up that deprecation by finally defining `{{=}}` as
a built-in, since the last usage on deployed wikis was cleaned up
sometime around February 2021 (list at
https://meta.wikimedia.org/wiki/Equals_sign_parser_function_template_conflicts
).

We've left the tracking category defined for now, so that any remaining
pages left in the tracking category on third-party wikis still retain
localized category documentation.  But it is expected that the next MW
release will also remove the tracking category.

Bug: T91154
Change-Id: I4717172f1d74d326212d51015a6cd87c3758f30d
2022-05-20 13:08:20 -04:00
Aryeh Gregor
7b791474a5 Use MainConfigNames instead of string literals, #4
Now largely automated:

VARS=$(grep -o "'[A-Za-z0-9_]*'" includes/MainConfigNames.php | \
  tr "\n" '|' | sed "s/|$/\n/;s/'//g")
sed -i -E "s/'($VARS)'/MainConfigNames::\1/g" \
  $(grep -ERIl "'($VARS)'" includes/)

Then git add -p with lots of error-prone manual checking. Then
semi-manually add all the necessary "use" lines:

vim $(grep -L 'use MediaWiki\\MainConfigNames;' \
  $(git diff --cached --name-only --diff-filter=M HEAD^))

I didn't bother fixing lines that were over 100 characters unless they
were over 120 and triggered phpcs.

Bug: T305805
Change-Id: I74e0ab511abecb276717ad4276a124760a268147
2022-04-26 19:03:37 +03:00
Aryeh Gregor
747bc81ac0 Use MainConfigNames instead of string literals
Part 1, proof of concept. Hundreds of files left to go. These changes
brought to you in large part by vim macros.

Bug: T305805
Change-Id: I44789091e9f6394c800a11b29f22528c8dcacf71
2022-04-11 17:53:27 +03:00
Umherirrender
20d4c4ba37 parser: Cast return of Timestamp::format to int for n
Some Language functions are documented to take an int, so cast the
formatted string to int. (The format character n is the numeric
representation of a month, without leading zeros.)

Found by phan strict checks

Change-Id: Ifc7fc64ac26a756f181b7d0155f13a6500114f5e
2022-03-15 23:55:39 +01:00
C. Scott Ananian
b72fa830d6 Pass a ConvertibleTimestamp to CoreMagicVariables
This avoids a type mismatch found by phan strict checks
(Ifc7fc64ac26a756f181b7d0155f13a6500114f5e) -- the passed timestamp
from Parser was a string in Unix format (i.e., an integer as a string)
but was declared as an integer.  It was then passed to
MWTimestamp::getInstance() which expected a string.

However, the 'simple' fix for this issue still caused unnecessary
conversions to/from timestamp format.  We took the string (nominally
in TS_MW format), ran a regexp against it to convert it to an
MWTimestamp instance, then converted that MWTimestamp to UNIX format
and exported that as a string, just to take that string and run four
different regexps against it *again* to convert it back to an
MWTimestamp instance so we can format it.

Better to just pass the MWTimestamp directly.  Only two wrinkles:

1. the ParserGetVariableValueTs hook expects to be passed a string
in TS_UNIX format and then to be able to mutate it. Nothing in production
uses that hook, so only do this conversion if the hook is registered.

2. Parsoid would like to use the definitions in CoreMagicVariables
in the future as well.  So pass the timestamp as the not-MW-@internal
ConvertibleTimestamp class instead of directly as a MWTimestamp.

Change-Id: Ib2c5fa45630c54c2716897370a0580ed48d27242
2022-03-14 17:30:15 -04:00
Umherirrender
9efd9ca45e Add explicit casts between scalar types
* Some functions accept only strings; cast ints and floats to string
* After preg_match() or explode(), cast numbers to int to do maths
* Cast unix timestamps to int to do maths
* Cast return values from timestamp format functions to int
* Cast results of bitwise operators to bool when they are used as bool

* PHP internal functions like floor/round/ceil are documented to return
  float; in most cases the result is used as an int, so casts were added

Found by phan strict checks

Change-Id: Icb2de32107f43817acc45fe296fb77acf65c1786
2022-03-01 18:19:33 +01:00
Umherirrender
535aa27593 Remove Parser dependency on config LanguageCode/DisableLangConversion
Use the values from the corresponding services,
for consistency when services are injected from outside of service wiring.

Change-Id: Ib0f6af20df8dbc0deae71023e5493524d43ce211
2021-12-18 16:18:19 +00:00
C. Scott Ananian
06ab90f163 Add new ParserOutput::{get,set}OutputFlag() interface
This is a uniform mechanism to access a number of bespoke boolean
flags in ParserOutput.  It allows extensibility in core (by adding new
field names to ParserOutputFlags) without exposing new getter/setter
methods to Parsoid.  It replaces the ParserOutput::{get,set}Flag()
interface, which (a) doesn't allow access to certain flags, (b) is
typically called with a string rather than a constant, and (c) has a
very generic name.  (Note that Parser::setOutputFlag() already called
these "output flags".)

In the future we might unify the representation so that we store
everything in $mFlags and don't have explicit properties in
ParserOutput, but those representation details should be invisible to
the clients of this API.  (We might also use a proper enumeration
for ParserOutputFlags, when PHP supports this.)

There is some overlap with ParserOutput::{get,set}ExtensionData(), but
I've left those methods as-is because (a) they allow for non-boolean
data, unlike the *Flag() methods, and (b) it seems worthwhile to
distinguish properties set by extensions from properties used by core.
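A small Python sketch of the uniform-accessor idea, under the assumption that all flags live in a single set (the future representation mentioned above). The flag names, `KNOWN_FLAGS` and the `ParserOutputSketch` class are hypothetical, not MediaWiki's ParserOutputFlags constants or the ParserOutput API; the point is that adding a flag only requires a new constant, not a new getter/setter pair.

```python
from typing import Set


class OutputFlags:
    """Hypothetical constant holder; names are illustrative only."""
    NO_GALLERY = "no-gallery"
    HIDE_TITLE = "hide-title"


KNOWN_FLAGS = frozenset({OutputFlags.NO_GALLERY, OutputFlags.HIDE_TITLE})


class ParserOutputSketch:
    """Stores every boolean flag in one set behind a uniform get/set pair."""

    def __init__(self) -> None:
        self._flags: Set[str] = set()

    def set_output_flag(self, name: str, value: bool = True) -> None:
        # Reject unknown names so typos in flag strings fail loudly.
        if name not in KNOWN_FLAGS:
            raise ValueError(f"unknown output flag: {name}")
        if value:
            self._flags.add(name)
        else:
            self._flags.discard(name)

    def get_output_flag(self, name: str) -> bool:
        return name in self._flags


out = ParserOutputSketch()
out.set_output_flag(OutputFlags.NO_GALLERY)
print(out.get_output_flag(OutputFlags.NO_GALLERY))  # True
print(out.get_output_flag(OutputFlags.HIDE_TITLE))  # False
```

Callers pass constants rather than bare strings, which addresses the "typically called with a string" complaint about the older interface.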

Code search:
https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos=

Bug: T292868
Change-Id: I39bc58d207836df6f328c54be9e3330719cebbeb
2021-10-15 14:25:54 -04:00
C. Scott Ananian
1c9bbfc14e Language: hard deprecate the noSeparators parameter to ::formatNum
Code should use Language::formatNumNoSeparators() instead, which has
existed since MW 1.21.

Code search:
https://codesearch.wmcloud.org/search/?q=formatNum%5C%28%5B%5E%29%5D*%2C&i=nope&files=&repos=

Depends-On: I95c365e2535bb3c47bed69a9b702c8f13d9fab87
Depends-On: I012434d5f6c749fec45a6c160e8d5d03686192e9
Depends-On: If3de5645a92514f605d4117fea3a820ed6c86624
Change-Id: I58a66975e505f16d8db5d663a9ca225535277983
2020-10-21 10:08:04 -04:00
C. Scott Ananian
9bd4fc0ae9 Revert "Adding = as a parser function"
This reverts commit 4bc0dc348a.

Reason for revert: Dutch Wiktionary uses {{=}} for something else;
see https://phabricator.wikimedia.org/T91154#6276915 for details.

Revert for now so it doesn't disrupt next week's train; we'll add it back with a config var or some other mitigation.

Bug: T91154
Change-Id: I9f81c7b73a04d6c1d77b67ce311cc7e6d279eb8b
2020-07-03 14:52:27 +00:00
Base
4bc0dc348a Adding = as a parser function
Bug: T91154
Change-Id: I8c9df0a8ce07e1febef137946615efd74d4800e3
2020-07-01 14:20:37 -04:00
Tim Starling
a30b328bd4 Rename CoreMagicWords to CoreMagicVariables and update docs
There's already a thing called magic words, and this is not it. These
things are called variables. There are many usages of this term in the
source. The term was introduced by Lee in 2002: originally
OutputPage::replaceVariables() contained only this functionality.

I introduced the term "magic word", meaning a localizable keyword.
Localizable keywords are an abstraction not limited to this use case.

"Magic variables" is a neologism, but I suppose it is permissible, since
it disambiguates. Whereas calling a variable a magic word conflates rather
than disambiguates.

Fix terminology in magicword.md and update the examples.

Change-Id: I621c888e3790a145ca9978f6b30ff1a8f685b64c
2020-06-11 13:28:45 +10:00
Renamed from includes/parser/CoreMagicWords.php