Html::element is more lenient about which characters it escapes.
But really this is just factored out of the next patch for ease of
review.
Change-Id: I9abb4d866a624df7bf4628ab9cc581967e715160
The <langconvert> tag takes two attributes: from (language variant from) and to (language variant to). It returns the content of the tag converted using LanguageConverter. It returns an error if the attributes are not present, if the variants do not exist, or if the variants belong to different languages. Currently it does not work for IuConverter, because the variants use the code ike rather than iu, and ike isn't in the list of languages with converters available.
This patchset reimplements the feature as a tag instead of a parser function, and renames it from transliterate to langconvert.
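The error conditions listed above can be sketched as a small validation step. This is a minimal Python sketch, not the patch's code: the `VARIANTS` table, the `convert` callback and the error strings are all hypothetical, and the real variant lists come from LanguageConverter.

```python
# Hypothetical table mapping each variant code to its parent language.
VARIANTS = {
    "sr-ec": "sr",
    "sr-el": "sr",
    "zh-hans": "zh",
    "zh-hant": "zh",
}


def langconvert(content, attrs, convert=lambda text, src, dst: text):
    """Validate the tag attributes, then delegate to a converter callback."""
    src, dst = attrs.get("from"), attrs.get("to")
    # Both attributes must be present.
    if not src or not dst:
        return "error: both 'from' and 'to' are required"
    # Both variants must exist.
    for code in (src, dst):
        if code not in VARIANTS:
            return f"error: unknown variant '{code}'"
    # Both variants must belong to the same language.
    if VARIANTS[src] != VARIANTS[dst]:
        return "error: variants belong to different languages"
    return convert(content, src, dst)
```

The default `convert` callback returns the text unchanged, so the sketch only exercises the attribute checks.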
Bug: T263082
Change-Id: Idc3a32c66d5a0466c63e7ce8753d2619354c30b0
This replaces the 'requirements' from parser tests (hooks and
functionhooks) with a more flexible 'options' clause to allow
additional file-level requirements/options to support running parser
tests in multiple modes. (For example, with the legacy parser or in
one of two parsoid modes.)
Bug: T254181
Depends-On: I636bd1f2c8aee327acbbd1636e2ac76355f1d80e
Change-Id: I58373d135c3a804f4ce9967112c338435f5cd4b6
Replace direct access to $wgDisableLangConversion with
LanguageConverterFactory::isConversionDisabled(), and replace direct
access to $wgDisableTitleConversion with
LanguageConverterFactory::isTitleConversionDisabled(). However, most
places that check ::isTitleConversionDisabled() actually want
::isLinkConversionDisabled(), so add that too (and deprecate
isTitleConversionDisabled()).
Code search:
https://codesearch.wmcloud.org/search/?q=Disable%28Lang|Title%29Conversion&i=nope&files=&repos=
This change removes a number of spurious dependencies on the global
configuration and reduces code duplication (for example, if the logic
for disabling language conversion were ever to change).
Depends-On: I6fa8230ae97b0e34c381003548e61f9b7387d363
Change-Id: Icc4687638ff1815003dd903854efdbd904854f1e
Over the past few weeks I collected a few minor updates in my local dev
environment, and would like to submit them now.
Change-Id: Ibe00d72763f1b66c50cf73e00c8fa52d265043fc
The default is a-z plus every non-ASCII character, but this
is too broad.
Instead use the same character set as is used for link trails,
specifically Latin & Arabic letters.
Bonus:
Add combining diacritics to both sets as when these are appended
to letters, the resulting glyph is still considered a letter.
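To illustrate why the default is too broad, here is a rough Python sketch that compares it with a narrower set. The ranges in `NARROW` are illustrative assumptions (some Latin letters, basic Arabic letters, and the combining diacritical marks block); they are not the ranges used by this patch, and `trailing_run` is a hypothetical helper.

```python
import re

# Broad default described above: a-z plus any non-ASCII character.
BROAD = re.compile(r"[a-z\u0080-\U0010ffff]+\Z")

# Narrower, hypothetical set: a-z, Latin-1 and Latin Extended-A letters,
# basic Arabic letters, and combining diacritics (U+0300 to U+036F).
NARROW = re.compile(
    r"[a-z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u017f"
    r"\u0621-\u064a\u0300-\u036f]+\Z"
)


def trailing_run(pattern, text):
    """Return the run of matching characters at the end of text, or ''."""
    match = pattern.search(text)
    return match.group(0) if match else ""
```

With these definitions the broad set accepts a currency sign as if it were a letter, while the narrow set still accepts a letter followed by a combining accent.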
Bug: T263266
Change-Id: I358673f79989491799d3d68da17e73b806b167e0
This is causing problems for Parsoid CI, as parser tests fail when
phpunit runs the tests at a different point than they are run in
core's CI due to the side-effects of content-language changes made in
other phpunit tests. (For example, phpunit runs all extension tests
after core tests, so the same parsertest can pass if included in core
and then fail when included in an extension.)
SpecialPageFactory::$aliases has a dependency on the current content
language, with no way to reset it other than to recreate the
SpecialPageFactory.
Change-Id: I278580ed5cf2c85403cbaf601f8af4753e14a9d0
The parsertests file allows certain tests to declare a dependency on
a particular tag hook, but this doesn't work for extensions like
TimedMediaHandler which affect the output but don't register a
unique extension tag name. Allow using 'extension:Foo' in the
`hooks` clause to register a dependency on the specific extension name,
instead of indirectly on the registered extension tag name.
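A minimal Python sketch of the check described above, assuming the runner knows the set of registered tag hooks and the set of loaded extension names. The function name and its parameters are hypothetical.

```python
def missing_requirements(hooks, tag_hooks, extensions):
    """Return the entries of a `hooks` clause that are not satisfied."""
    prefix = "extension:"
    missing = []
    for name in hooks:
        if name.startswith(prefix):
            # Dependency on an extension by name.
            if name[len(prefix):] not in extensions:
                missing.append(name)
        elif name not in tag_hooks:
            # Dependency on a registered extension tag.
            missing.append(name)
    return missing
```

A test would be skipped whenever the returned list is non-empty.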
Change-Id: I2d3f7e1313b4456733f820e6d8c504bb8d7427a7
This seems to primarily be used in ParserTestPrinter::showTesting()
to print the string,
"Running test $desc... "
Follow up to 585cbcd
Change-Id: I53fc98ae56e3e9faad6ab1ca5a5a778f1c146fd1
We plan to add {{=}} as a built-in parser function, expanding to `=`,
in the same way that `{{!}}` is a built-in. It will be used to
automatically escape uses of `=` in template arguments (again, in the
same way that `{{!}}` can be used to protect uses of `|` in template
arguments).
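As a rough illustration of why `=` needs protecting, the Python sketch below splits a template argument on the first `=` that is not nested inside `{{...}}`. It is a simplified model, not the parser's code; `split_argument` and `escape_equals` are hypothetical names.

```python
def split_argument(arg):
    """Split an argument into (name, value); name is None if positional."""
    depth = 0
    i = 0
    while i < len(arg):
        if arg.startswith("{{", i):
            depth += 1
            i += 2
        elif arg.startswith("}}", i):
            depth -= 1
            i += 2
        else:
            # Only a top-level "=" separates a name from its value.
            if arg[i] == "=" and depth == 0:
                return arg[:i].strip(), arg[i + 1:].strip()
            i += 1
    return None, arg


def escape_equals(value):
    """Protect literal "=" characters so the argument stays positional."""
    return value.replace("=", "{{=}}")
```

Unescaped, `1+1=2` is read as a parameter named `1+1`; escaped, it stays a positional value.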
Some wikis have non-standard definitions of `Template:=`; add a
tracking category to warn these wikis to transition before we turn on
the built-in parser function in a future release.
New parser test file added, so we can re-define Template:= and test
both cases of this new warning.
Bug: T91154
Change-Id: I50ff8a7b6be95901ebb14ffbe64940a0f499cfac
It leads to surprising results when the definitions in one parser test
file leak into all the others. This can cause spurious test failures
when you happen to have two extensions which define conflicting
article fixtures, and prevents you from using parser tests to test
patches like I50ff8a7b6be95901ebb14ffbe64940a0f499cfac, where you
deliberately want to set up and test two different definitions for the
same template name.
Change-Id: I958c6305a95ca32418d83b7f33f7c180a3b370cd
Reduce code duplication by using the authoritative HTML entity list
from Remex, instead of duplicating the table inside MediaWiki.
This also extends the set of entities accepted in wikitext to nearly
match HTML5. (HTML5 allows some entities which are not
semicolon-terminated; wikitext insists on the semicolon.)
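The semicolon rule can be sketched in Python. This sketch uses the standard library's `html.entities.html5` table as a stand-in for the Remex list, which is an assumption made for illustration only; `STRICT`, `ENTITY` and `decode_named` are hypothetical names.

```python
import re
from html.entities import html5  # has both "amp;" and the legacy "amp"

# Keep only the semicolon-terminated names.
STRICT = {name: text for name, text in html5.items() if name.endswith(";")}

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")


def decode_named(text):
    """Decode semicolon-terminated named entities; leave the rest alone."""
    return ENTITY.sub(lambda m: STRICT.get(m.group(1), m.group(0)), text)
```

Unknown names and names without a trailing semicolon pass through unchanged.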
This patch brings the core parser closer to Parsoid output, as in most
cases Parsoid already accepted the full HTML5 entity list.
(I873a6120e4bd1c69fee9da76d266e24e97a22add is a corresponding patch to
Parsoid to unify its copy of Sanitizer.)
Also deprecate Sanitizer::hackDocType() while we're updating it, since
this method should not be public.
Bug: T94603
Change-Id: Ia08bc261c3644f83109f13df04b692101b4e8ef2
One (test file) parser to rule them all. Reduce a little bit of
redundant code between core and Parsoid by using Parsoid's parser test
file parser to run core's parser tests.
This should have no effect on users of TestFileReader::read() *except*
that Parsoid's test file reader is more strict about bogus lines in
the test file, including duplicate test names, and we've removed support
for the old v1 format (hard deprecated in 1.35).
Next step will be to be able to execute parser tests on extensions
using Parsoid's parser as well.
Bug: T254181
Depends-On: I8ab4a8c59ed1b6837dba428f96a8ba0084b7fb68
Change-Id: I5acaf82819ae964895a831be4f28c31c77a09e84
Support for Preprocessor_DOM was removed in 1.35; it's time to clean up
any old parser tests which required it.
Change-Id: I36c7906b8ce31ef6885aef54175749e67e51d07c
This fixes a regression in the parser test CLI runner caused by
435d5f4d55, which added a change tag for
"manual reverts". The parser test framework was triggering this
change tag addition as it set up its mock article database, and then
subsequent attempts to write the change tag to the database failed
because the tables were missing.
Bug: T259186
Change-Id: I232e918dfdc83244a010681b6adffd6c1171cf24
This isn't really user visible, but the algorithm for ensuring there
are no conflicts in automatically-generated parser test class names
had a number of issues which led to inconsistent naming.
Change-Id: I50ff5b72381332c77f0d99af08e689796019a7af
This reverts commit 4bc0dc348a.
Reason for revert: Dutch Wiktionary uses {{=}} for something else;
see https://phabricator.wikimedia.org/T91154#6276915 for details.
Revert for now so it doesn't disrupt next week's train; we'll add it back with a config var or some other mitigation.
Bug: T91154
Change-Id: I9f81c7b73a04d6c1d77b67ce311cc7e6d279eb8b
The name change happened some time ago, and I think it's
about time to start using the new name!
(Done with a find and replace)
My personal motivation for doing this is that I have started
trying out vscode as an IDE for mediawiki development, and
right now it doesn't appear to handle php aliases very well
or at all.
Change-Id: I412235d91ae26e4c1c6a62e0dbb7e7cf3c5ed4a6
Deprecating something means to say something nasty about it, or to draw
its character into question. For example, "this function is lazy and good
for nothing". Deprecatory remarks by a developer are generally taken as a
warning that violence will soon be done against the function in question.
Other developers are thus warned to avoid associating with the deprecated
function.
However, since wfDeprecated() was introduced, it has become obvious that
the targets of deprecation are not limited to functions. Developers can
deprecate literally anything: a parameter, a return value, a file
format, Mondays, the concept of being, etc. wfDeprecated() requires
every deprecatory statement to begin with "use of", leading to some
awkward sentences. For example, one might say: "Use of your mouth to
cough without it being covered by your arm is deprecated since 2020."
So, introduce wfDeprecatedMsg(), which allows deprecation messages to be
specified in plain text, with the caller description being optionally
appended. Migrate incorrect or grammatically awkward uses of wfDeprecated()
to wfDeprecatedMsg().
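The difference between the two styles can be sketched with a Python analogue. This is not the MediaWiki implementation: `deprecated` and `deprecated_msg` are hypothetical names, and the exact message wording is an assumption.

```python
import warnings


def deprecated(function, version=None):
    """Old style: the message is always built around "Use of ..."."""
    msg = f"Use of {function} was deprecated"
    if version:
        msg += f" in {version}"
    msg += "."
    warnings.warn(msg, DeprecationWarning, stacklevel=2)
    return msg


def deprecated_msg(msg, caller=None):
    """New style: free-form text, with the caller optionally appended."""
    if caller:
        msg += f" [Called from {caller}]"
    warnings.warn(msg, DeprecationWarning, stacklevel=2)
    return msg
```

The second form lets the message describe a parameter, a return value or a file format without forcing it into a "Use of" sentence.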
Change-Id: Ib3dd2fe37677d98425d0f3692db5c9e988943ae8
Clean up some technical debt; use MutableRevisionRecord instead of
manually constructing a Revision from an array, remove last uses of
RevisionStoreDbTestBase::revisionToRow and remove the method.
Each file can be reviewed separately (except that the removal of
revisionToRow depends on replacing its usage).
Bug: T246284
Change-Id: I0bdc069b21a5c41ef8f9e972c5b17ff189d4a741
There is native support for all of this now in PHP, thanks to changes
and additions that have been made in later versions. There should be no
need any more to ever use call_user_func() or call_user_func_array().
Reviewing this should be fairly easy: Because this patch touches
exclusively tests, but no production code, there is no such thing as
"insufficient test coverage". As long as CI goes green, this should be
fine.
Change-Id: Ib9690103687734bb5a85d3dab0e5642a07087bbc
A terminating line break has not been required in wfDebug() since 2014;
however, no migration was done. Some of these line breaks found their way
into LoggerInterface::debug() calls, where they mess up the formatting
of the debug log.
So, remove terminating line breaks from wfDebug() and
LoggerInterface::debug() calls.
Also:
* Fix the stripping of leading line breaks from the log header emitted
by Setup.php. This feature, accidentally broken in 2014, allows
requests to be distinguished in the log file.
* Avoid using the global variable $self.
* Move the logging of the client IP back to Setup.php. It was moved to
WebRequest in the hopes that it would not always be needed; however,
$wgRequest->getIP() is now called unconditionally a few lines up in
Setup.php. This means that it is put in its proper place after the
"start request" message.
* Wrap the log header code in a closure so that variables like $name do
not leak into global scope.
* In Linker.php, remove a few instances of an unnecessary second
parameter to wfDebug().
Change-Id: I96651d3044a95b9d210b51cb8368edc76bebbb9e
Migrate all callers of Hooks::run() to use the new
HookContainer/HookRunner system.
General principles:
* Use DI if it is already used. We're not changing the way state is
managed in this patch.
* HookContainer is always injected, not HookRunner. HookContainer
is a service, it's a more generic interface, it is the only
thing that provides isRegistered() which is needed in some cases,
and a HookRunner can be efficiently constructed from it
(confirmed by benchmark). Because HookContainer is needed
for object construction, it is also needed by all factories.
* "Ask your friendly local base class". Big hierarchies like
SpecialPage and ApiBase have getHookContainer() and getHookRunner()
methods in the base class, and classes that extend that base class
are not expected to know or care where the base class gets its
HookContainer from.
* ProtectedHookAccessorTrait provides protected getHookContainer() and
getHookRunner() methods, getting them from the global service
container. The point of this is to ease migration to DI by ensuring
that call sites ask their local friendly base class rather than
getting a HookRunner from the service container directly.
* Private $this->hookRunner. In some smaller classes where accessor
methods did not seem warranted, there is a private HookRunner property
which is accessed directly. Very rarely (two cases), there is a
protected property, for consistency with code that conventionally
assumes protected=private, but in cases where the class might actually
be overridden, a protected accessor is preferred over a protected
property.
* The last resort: Hooks::runner(). Mostly for static, file-scope and
global code. In a few cases it was used for objects with broken
construction schemes, out of horror or laziness.
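The "inject the container, derive the runner" principle above can be sketched in Python. This is a simplified model, not the MediaWiki classes: the method names, the `PageSaved` hook and `ExampleUpdater` are hypothetical.

```python
class HookContainer:
    """Generic service: stores handlers by hook name."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers.setdefault(name, []).append(handler)

    def is_registered(self, name):
        return bool(self._handlers.get(name))

    def run(self, name, args):
        # Stop and report failure as soon as a handler returns False.
        for handler in self._handlers.get(name, []):
            if handler(*args) is False:
                return False
        return True


class HookRunner:
    """Thin facade with one method per hook, built from a container."""

    def __init__(self, container):
        self._container = container

    def on_page_saved(self, title, user):  # hypothetical hook
        return self._container.run("PageSaved", [title, user])


class ExampleUpdater:
    """The container is injected; the runner is constructed from it."""

    def __init__(self, hook_container):
        self._hook_container = hook_container
        self._hook_runner = HookRunner(hook_container)

    def save(self, title, user):
        return self._hook_runner.on_page_saved(title, user)
```

Keeping the container as the injected dependency leaves `is_registered()` available to the class, while call sites still use the runner's named methods.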
Constructors with new required arguments:
* AuthManager
* BadFileLookup
* BlockManager
* ClassicInterwikiLookup
* ContentHandlerFactory
* ContentSecurityPolicy
* DefaultOptionsManager
* DerivedPageDataUpdater
* FullSearchResultWidget
* HtmlCacheUpdater
* LanguageFactory
* LanguageNameUtils
* LinkRenderer
* LinkRendererFactory
* LocalisationCache
* MagicWordFactory
* MessageCache
* NamespaceInfo
* PageEditStash
* PageHandlerFactory
* PageUpdater
* ParserFactory
* PermissionManager
* RevisionStore
* RevisionStoreFactory
* SearchEngineConfig
* SearchEngineFactory
* SearchFormWidget
* SearchNearMatcher
* SessionBackend
* SpecialPageFactory
* UserNameUtils
* UserOptionsManager
* WatchedItemQueryService
* WatchedItemStore
Constructors with new optional arguments:
* DefaultPreferencesFactory
* Language
* LinkHolderArray
* MovePage
* Parser
* ParserCache
* PasswordReset
* Router
setHookContainer() now required after construction:
* AuthenticationProvider
* ResourceLoaderModule
* SearchEngine
Change-Id: Id442b0dbe43aba84bd5cf801d86dedc768b082c7
In brief, the BlockLevelPass looks at opening and closing tags on a line
to determine whether it should do paragraph wrapping. The blockElems
want to stop wrapping when opened and start again when closed. The
antiBlockElems want the opposite, to start when they're opened and stop
when closed. "table" is a blockElems and "td"|"th" are antiBlockElems
so that content found in the interstitial spaces of tables is never
paragraph wrapped.
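The open/close rule above can be modelled with a toy Python sketch. It is a drastic simplification of BlockLevelPass: the two sets contain only the elements named in this message, and `wrapping_after` is a hypothetical helper.

```python
import re

BLOCK_ELEMS = {"table"}          # stop wrapping when opened
ANTI_BLOCK_ELEMS = {"td", "th"}  # start wrapping when opened

TAG = re.compile(r"<(/?)([a-z]+)[^>]*>")


def wrapping_after(line, wrapping=True):
    """Return whether paragraph wrapping is active after scanning a line."""
    for closing, name in TAG.findall(line):
        is_close = closing == "/"
        if name in BLOCK_ELEMS:
            # Opened: suppress wrapping. Closed: resume wrapping.
            wrapping = is_close
        elif name in ANTI_BLOCK_ELEMS:
            # Opened: resume wrapping. Closed: suppress wrapping.
            wrapping = not is_close
    return wrapping
```

In this model, text between `<table>` and the first `<td>` is never wrapped, which is the interstitial case described above.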
That means that, to date, "caption" elements are always found in a place
where paragraph wrapping is always suppressed and so adding them to that
set won't change any test results. However, a new test is added to spec
out this behaviour.
In the legacy parser, "captions" are always found in the right place
because handleTables runs at an earlier stage. In Parsoid, however, the
treebuilder is relied on to close table cells [0] so when we get to the
token stream paragraph wrapping pass, "caption"s are found in table
cells and therefore get wrapped, even though the treebuilder is about to
be induced to close the cell before opening the caption.
Therefore, in Parsoid, the fix would require us to make captions always-
suppressing to match the legacy parser behaviour. Thus, this change
here is just to keep these lists [1] consistent between the two
parsers.
[0] 5e11a3f390/src/Wt2Html/TokenizerUtils.php (L138-L151)
[1] 5e11a3f390/src/Wt2Html/TT/ParagraphWrapper.php (L71-L78)
Bug: T210647
Change-Id: I8ccefd69d47dca740f50924b235dffa3873d1f99
This behavior has been deprecated, with a tracking category, since
1.28. Time to remove the temporary parameter added to
Sanitizer::removeHTMLtags() and (finally) tweak the behavior to match
HTML5.
Bug: T134423
Change-Id: I5c725175d05854139c95a2b3d8d35ff63cb6707b