The existing Sanitizer::removeHTMLtags() method, in addition to having
dodgy capitalization, uses regular expressions to parse the HTML.
That produces corner cases like T298401 and T67747 and is not guaranteed
to yield balanced or well-formed HTML.
Instead, introduce and use a new Sanitizer::removeSomeTags() method
which is guaranteed to always return balanced and well-formed HTML.
Note that Sanitizer::removeHTMLtags()/::removeSomeTags() take a callback
argument which (as far as I can tell) is never used outside core. Mark
that argument as @internal, and clean up the version used by
::removeSomeTags().
Use the new ::removeSomeTags() method in the two places where
DISPLAYTITLE is handled (following up on T67747). The use by the
legacy parser is more difficult to replace (and would have a
performace cost), so leave the old ::removeHTMLtags() method in place
for that call site for now: when the legacy parser is replaced by
Parsoid the need for the old ::removeHTMLtags() will go away. In a
follow-up patch we'll rename ::removeHTMLtags() and mark it @internal
so that we can deprecate ::removeHTMLtags() for external use.
Some benchmarking code added. On my machine, with PHP 7.4, the new
method tidies short 30-character title strings at a rate of about
6764/s while the tidy-based method being replaced here managed 6384/s.
Sanitizer::removeHTMLtags blazes through short strings 20x faster
(120,915/s); some of this difference is due to the set up cost of
creating the tag whitelist and the Remex pipeline, so further
optimizations could doubtless be done if Sanitizer::removeSomeTags()
is more widely used.
Bug: T299722
Bug: T67747
Change-Id: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
- Moves Skin::getSectionsData and related methods into new TOC component.
- Update SkinMustacheTest for toc getTemplateData method.
- Add test for new TOC component.
Bug: T301523
Change-Id: I29dda96f1e91da6892840d38a80c6102d425d0f7
I noticed that the CentralAuth integration tests were slow due to their
use of permanent rather than temporary tables, which cause MariaDB to do
unlink+fsync on truncate. The result was an incredible ~100ms of latency
per truncate query, reproducible with this script, although it was
reduced to ~10ms by mounting with nobarrier. Since this is likely
system-dependent, I am curious to know if others have the same issue.
Change-Id: Ifb25ff7675fff179e0b6e465aa3043b928048fc0
This rewrites both generateSchemaChangeSql.php and generateSchemaSql.php
to deduplicate logic and adds the 'all' option to --type to generate
schemas for each supported DBMS platform.
Specifying a path with --sql will have the script insert a directory
named after the platform, but only when --type=all is provided.
Only if a directory named mysql/ exists will sql files for mysql be
placed there, to allow using a setup similar to MediaWiki, where only
non-mysql types have their sql files placed in dedicated directories.
This also adds preparations for T298320, so that the abstract schema
can be validated and any errors can be handled before generating a
schema.
Bug: T268587
Change-Id: Ief8282017c8d38659b79262afb8fc691b5bda256
In preparation for refactoring SkinTemplate so that SkinMustache
extends Skin rather than SkinTemplate, we take the opportunity
to reorganize the skin code around the concept of components.
Going forward a skin will consist of multiple components, each
of which must return template data that can be passed to an
associated template.
This will result in code that is easier to work with, compared
with the existing 3000 line skin class.
This is the beginning of that journey. Other components will follow
while maintaining backwards compatibility
Bug: T263213
Change-Id: Ib62724c24601e04aa13ab09b3242e70d7d6436ca
This is a first draft of the configuration doc renderer.
The resulting markdown certainly needs some love, but
we can work on improvements incrementally. This gives
us a baseline to reference on doc.wikimedia.org
Bug: T296647
Change-Id: I3c426b9fc37b1cf7ce8423969b2d7589767ee6cc
Per Krinkle. This was just introduced and not included in any
released version of MW, so we don't have to worry about back-compat.
However, this patch should be merged at the same time as the
corresponding patch for TimedMediaHandler.
Change-Id: Ife41cabb0f5a6b89b160aec9a123220275edf914
Don't catch and discard exceptions from the RequestTimeout library,
except when the exception is properly handled and the code seems to be
trying to wrap things up.
In most cases the exception is rethrown. Ideally it should instead be
done by narrowing the catch, and this was feasible in a few cases. But
sometimes the exception being caught is an instance of the base class
(notably DateTime::__construct()). Often Exception is the root of the
hierarchy of exceptions being thrown and so is the obvious catch-all.
Notes on specific callers:
* In the case of ResourceLoader::respond(), exceptions were caught for API
correctness, but processing continued. I added an outer try block for
timeout handling so that termination would be more prompt.
* In LCStoreCDB the Exception being caught was Cdb\Exception not
\Exception. I added an alias to avoid confusion.
* In ImageGallery I added a special exception class.
* In Message::__toString() the rationale for catching disappears
in PHP 7.4.0+, so I added a PHP version check.
* In PoolCounterRedis, let the shutdown function do its thing, but
rethrow the exception for logging.
Change-Id: I4c3770b9efc76a1ce42ed9f59329c36de04d657c
Add a hook which runs at the end of UserEditCountUpdate. The idea is to
allow CentralAuth to keep its own similar edit count. There's no
other hook appropriate for this purpose since
UserEditTracker::incrementUserEditCount() has multiple callers.
Convert the private associative array in UserEditCountUpdate to a class.
Instances of the class are passed to the new hook. A class is better
suited to a public interface.
Bug: T300075
Change-Id: I16a92e6a6e9faf1be4c7fbe25354a08559df163d
This would make Database class smaller and encapsulates the transaction
logic making it easier to understand and change.
Bug: T299698
Change-Id: I409a474209ab4b714c5f62e5e7c0b7a62b9e82c1
Right now, TimedMediaHandler hardcodes looking for ForeignAPIRepo
class name to do some api queries. This means if you make an extension
file repo that supports making api queries that you want to work
with TimedMediaHandler, you need to subclass ForeignAPIRepo.
This isn't very friendly to extension developers. Instead introduce
an interface that extension file repos can implement, so that they
can mark their repos as working for this purpose.
Change-Id: Iadcb831442529d0b9443a422069ada3ce9eac0dd
This script from ~2005 is not runnable on large wikis, partly redundant
given deleteOrphanedRevisions.php, and is pretty scary. It also is one
of the few callers of lockTables().
Bug: T294969
Change-Id: I8d4608c51ce83ac2e221a91727959b8c1df13db7
PageImages is expensively loading and reparsing the lead section during
LinksUpdate because the parser's image link hooks do not give enough
context to tell whether the image is in the lead section.
So, add a hook which allows PageImages to add a marker to image links.
The marker can be associated with the section number in ParserAfterTidy.
Bug: T296895
Bug: T176520
Change-Id: I24528381e8d24ca8d138bceadb9397c83fd31356
Add a collation that gets its data from a remote Shellbox instance. This
is meant as a migration helper to use during an ICU upgrade.
Add a batch method to Collation so that this can be somewhat efficient
when adding multiple categories.
Bug: T263437
Change-Id: I76610d251fb55df90c78acb9f59fd81421f876dd
== Background ==
The `user.options` module is private, and thus has to be embedded in
the page HTML. This data is quite large. For example, on enwiki the
finalized mw.user.options object is about 3KB serialized/compressed
(7KB uncompressed).
The `user.defaults` module is an implementation detail of
`user.options`, and was created to accomplish mainly two things:
* Save significant data transfers by allowing it to be cached
client-side without being part of the article.
* Ensure consistency between articles and allow faster deployment of
changes, by not being part of the cacheable article HTML.
All our pageviews already load `user.defaults`, as a dependency of
the popular `mediawiki.api` and `mediawiki.user` modules. These are
used by `mediawiki.page.ready` (queued on all pages), and on Wikipedia
these are also loaded on all pages by ULS, VisualEditor, EventLogging,
and more.
As such, in practice, bundling "user.defaults" with "mediawiki.base"
will not cause the data to be loaded more often than before.
== What ==
* Add virtual "user.json" package file with the same data that
was previously exported by ResourceLoaderUserDefaultsModule,
and pass it to mw.user.options.set() from base module's entry point.
An alternative way would be to use a "user.js" file, which would
return a generated "mw.user.options.set()" expression. I went
for exporting it as JSON for improved maintainability (reducing
the amount of JS code written in PHP), and because it performs
slightly better. The JS file would implicitly come with a file
closure (tiny bit more bytes), and would then be lazy executed
(tiny bit more time).
The chosen approach allows the browser to compile the JSON
off-the-main-thread ahead of time while the module response downloads.
Then when the module executes, we can reference the JSON object
and use it directly.
* Update internal dependency from `user.options`.
* Remove `user.defaults` module without deprecation. It is an internal
module with no direct use anywhere in Git (Codeseach), and no use
anywhere on-wiki (Global Search).
Change-Id: Id3916f94f75078808951863dea2b3a9c71b0e30c
Otherwise they are using the 'delete' and 'protect' message key,
which are already defined.
Follow-Up: Idf076573c0f429171221660145b616ec83516a2a
Bug: T295611
Change-Id: I797f0e23ca639dac6889f3f363b7a8c72e30f681
Adding Special:Delete and Special:Protect as redirects to the delete and
protect action.
Bug: T295611
Change-Id: Idf076573c0f429171221660145b616ec83516a2a
Since 1.34 setting non-default HTTP engine
has been deprecated. It's time to remove
the old implementations. Only Guzzle is
now available.
Change-Id: I978b75827e69db02cbc027fe0b89a028adfc6820
This is a uniform mechanism to access a number of bespoke boolean
flags in ParserOutput. It allows extensibility in core (by adding new
field names to ParserOutputFlags) without exposing new getter/setter
methods to Parsoid. It replaces the ParserOutput::{get,set}Flag()
interface which (a) doesn't allow access to certain flags, and (b) is
typically called with a string rather than a constant, and (c) has a
very generic name. (Note that Parser::setOutputFlag() already called
these "output flags".)
In the future we might unify the representation so that we store
everything in $mFlags and don't have explicit properties in
ParserOutput, but those representation details should be invisible to
the clients of this API. (We might also use a proper enumeration
for ParserOutputFlags, when PHP supports this.)
There is some overlap with ParserOutput::{get,set}ExtensionData(), but
I've left those methods as-is because (a) they allow for non-boolean
data, unlike the *Flag() methods, and (b) it seems worthwhile to
distingush properties set by extensions from properties used by core.
Code search:
https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos=
Bug: T292868
Change-Id: I39bc58d207836df6f328c54be9e3330719cebbeb
This hook allows extensions to add additional pages to the list of
pages being exported. For example, if an extension creates supplemental
pages that must be exported along with a content page, this hook will
allow the extension to export the supplemental pages automatically when
the content page is exported.
Bug: T292378
Change-Id: I3b6b466b6ed32b1939674872df9e431bd6941645
With this patch deprecation warnings will be emitted
if $wgUser is accessed or written into. The only pattern
of usage still allowed is
$oldUser = $wgUser;
$wgUser = $newUser;
// Do something
$wgUser = $oldUser;
Once there is no deprecation warnings, we know that nothing
legitimately depends on $wgUser being set, so we can safely
remove the code that's still allowed as well.
Bug: T267861
Change-Id: Ia1c42b3a32acd0e2bb9b0e93f1dc3c82640dcb22
This reverts commit 0d453d915a.
Reason for revert: Trying to find a better approach than an abstract class
Change-Id: I69c0ec25cb920f41cb0eb6b3ad921d7b02290581
Removes deprecated API endpoints and modules for dealing with
CSRF tokens.
Note: i18n messages are removed in a followup for ease of revert.
Bug: T280806
Depends-On: Ic83f44587db119ff2e3e6d5ff33a10894e0695e7
Change-Id: I58aedec6942ac5d3c21574cb0072f00ef365098c