669 commits

Author SHA1 Message Date
Daimona Eaytoy
2b37cfaf18 build: Bump mediawiki-codesniffer to 31.0.0
Done with `composer fix` and suppressing the rest (i.e. sniffs for
global variables, which for core should be suppressed anyway).

Additionally, add `-p` to `phpcbf`, as otherwise it just seems stuck.

Change-Id: Ide8d6cdd083655891b6d654e78440fbda81ab2bc
2020-05-30 14:56:28 +00:00
Ori Livneh
19931e069f mime: Update usage of MimeAnalyzer methods
Follow-up to I93bd71ec1.

Bug: T252228
Change-Id: I45c9fc592c9e41e0868e7d965206d4c04f4f92e1
2020-05-28 20:13:47 +00:00
Ori Livneh
7e01e86e09 mime: Represent lists as arrays instead of space-delimited strings
Deprecate the interfaces in MimeAnalyzer that return lists as
space-separated strings in favor of replacement methods that return
arrays.

Deprecated:

 - ::getExtensionsForType( $mime ) : string|null
 - ::getTypesForExtension( $ext ) : string|null
 - ::guessTypesForExtension( $ext ) : string|null

Added:

 - ::getExtensionsFromMimeType( $mime ) : string[]
 - ::getMimeTypesFromExtension( $ext ) : string[]
 - ::getExtensionFromMimeTypeOrNull( $mime ) : string|null
 - ::getMimeTypeFromExtensionOrNull( $ext ) : string|null

- "From" is clearer than "For"[1] and is neatly symmetrical with "To"
  (viz. ::mExtToMime and ::mMimeToExt).
- "MimeType" is less ambiguous than "Type", which in this context may
  refer either to media type or MIME type.
- "{..}OrNull" is better because it helps users remember to handle a null
  return value. Putting the "OrNull" at the end (getXFromYOrNull) is
  better than putting it in the middle (getXOrNullFromY) because it's
  harder to ignore that way, at the cost of a very slight grammatical
  ambiguity.

Usage in Core will be updated in a separate commit.

Lastly, this change prepares for the deprecation of mutating the public
'mExtToMime' attribute as a means of registering extensions. It will be
formally deprecated in a follow-up change.

  [1]: Positive signal: https://developer.android.com/reference/android/webkit/MimeTypeMap#getMimeTypeFromExtension(java.lang.String)

Bug: T252228
Change-Id: I93bd71ec18492722f05c66e0a2945d93281c3100
2020-05-28 15:15:43 +00:00
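The interface change described in 7e01e86e09 can be sketched in Python. This is only an illustration of the idea (list-returning lookups replacing space-delimited strings, plus an explicit "OrNull" variant); the class and method names are hypothetical stand-ins rather than MediaWiki's PHP API, and returning the first listed extension from the "OrNull" variant is an assumption.

```python
from typing import Dict, List, Optional


class MimeMapSketch:
    """Hypothetical stand-in for a MIME type <=> extension lookup table."""

    def __init__(self, mime_to_exts: Dict[str, List[str]]) -> None:
        self._mime_to_exts = mime_to_exts

    def get_extensions_for_type(self, mime: str) -> Optional[str]:
        """Old style: a space-delimited string, or None if unknown."""
        exts = self._mime_to_exts.get(mime)
        return " ".join(exts) if exts else None

    def get_extensions_from_mime_type(self, mime: str) -> List[str]:
        """New style: always a list, empty if the type is unknown."""
        return list(self._mime_to_exts.get(mime, []))

    def get_extension_from_mime_type_or_null(self, mime: str) -> Optional[str]:
        """First listed extension (assumed), or None; the suffix flags the nullable return."""
        exts = self._mime_to_exts.get(mime)
        return exts[0] if exts else None


mime_map = MimeMapSketch({"image/jpeg": ["jpg", "jpeg", "jpe"]})
```

Callers of the list-returning method can iterate without splitting strings or checking for null first, which is the motivation the commit message gives for the new methods.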
jenkins-bot
d535ebeb51 Merge "WatchedItemStore: Enforce a maximum watchlist expiry duration" 2020-05-27 22:32:06 +00:00
Reedy
229b2c15e8 Fix a plethora of class and function call case mismatches
Bug: T231412
Change-Id: I597a25de3294a6673424f30475760280ef209a8a
2020-05-26 14:14:46 +01:00
Ori Livneh
7c9e19ed5e mime: Document null return from MimeAnalyzer::improveTypeFromExtension()
This method returns null when $mime is 'unknown/unknown' and the file
extension is unknown to MediaWiki. The inline documentation and @return
annotation omitted this.

I don't think this was an intentional design choice, but it's the
existing behavior and I'm not sure it's safe to change.

Since it is the existing behavior, document it and add a test case, to
ensure that any changes to this behavior are intentional.

Bug: T253483
Change-Id: Ie6615a4bd9ae77e9ab59cfe76edb237cace693b1
2020-05-24 15:51:08 -04:00
Ori Livneh
9971d4dced mime: Add test for MimeAnalyzer::addExtra{Types,Info}
..and for adding file extensions by modifying the mExtToMime field.
All three interfaces will be deprecated in a follow-up change.

Change-Id: I7ec940a8b2fe02cd0fe01593cd6897f75777a8fa
2020-05-23 01:11:41 -04:00
MusikAnimal
0694cc02f1 WatchedItemStore: Enforce a maximum watchlist expiry duration
Introduces $wgWatchlistExpiryMaxDuration, which is used instead of the given
expiry if the given expiry exceeds it. This is done in the storage layer. The
reasoning is to control the size of the watchlist_expiry table. Hence,
the max duration does not apply to indefinite expiries (since that would
mean no row in watchlist_expiry).

The frontend is responsible for disallowing expiries greater than the
max, if it chooses to do so.

APIs should now pass in $wgWatchlistExpiryMaxDuration as the PARAM_MAX
setting for the 'expiry' type. They should also set PARAM_USE_MAX so
that the maximum value is used if it is exceeded.

Other APIs that watch pages will be updated in separate patches
(see T248512 and T248514).

Bug: T249672
Change-Id: I811c444c36c1da1470f2d6e185404b6121a263eb
2020-05-22 00:15:23 -04:00
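The storage-layer rule described in 0694cc02f1 amounts to clamping a requested expiry to a maximum while leaving indefinite expiries alone. A minimal Python sketch follows; the function name, the use of `None` for "indefinite", and the 180-day value in the usage lines are assumptions made for illustration and are not taken from MediaWiki.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def clamp_expiry(
    requested: Optional[datetime],
    now: datetime,
    max_duration: timedelta,
) -> Optional[datetime]:
    """Return the expiry to store, capped at now + max_duration.

    None stands for an indefinite expiry (an assumption of this sketch);
    it produces no expiry row, so the cap does not apply to it.
    """
    if requested is None:
        return None
    latest_allowed = now + max_duration
    return min(requested, latest_allowed)


now = datetime(2020, 5, 22, tzinfo=timezone.utc)
max_duration = timedelta(days=180)  # hypothetical configuration value
capped = clamp_expiry(now + timedelta(days=365), now, max_duration)
```

Here `capped` is `now + max_duration`, while a request for 30 days would be stored unchanged.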
jenkins-bot
f40f3e8b27 Merge "Add rawTables(), getQueryInfo() and queryInfo() to SelectQueryBuilder" 2020-05-21 18:18:30 +00:00
jenkins-bot
3f2937810e Merge "mime: Convert built-in MIME mappings to PHP arrays" 2020-05-21 01:01:06 +00:00
Ori Livneh
cb44ddf85b mime: Convert built-in MIME mappings to PHP arrays
Currently, MimeAnalyzer builds the internal mappings of MIME types <=> file
extensions by concatenating several string buffers in mime.type format into a
giant string, and then parsing it. The mapping of MIME types to internal
media types is built up in a similar way, except we use a dubious homegrown
format with undocumented conventions. It's a mess, and an expensive one --
~1.5% of api.php CPU time on the WMF cluster is spent building these buffers
and parsing them. Converting the mappings to PHP associative arrays makes
them much cheaper to load and easier to maintain.

Doing this without breaking compatibility with existing behaviors requires
some delicate footwork. The current mime.types buffer is made up of the
following fragments, in order:

  1) MimeAnalyzer::$wellKnownTypes
  2) If $wgMimeTypeFile == 'includes/mime.types' (sic!):
       the contents of includes/libs/mime/mime.types.
     If $wgMimeTypeFile is another file path (e.g., '/etc/mime.types'):
       the contents of that file.
     If !$wgMimeTypeFile, this fragment is blank.
  3) MimeAnalyzer::$extraTypes (populated by extensions via hook).

The mime.info buffer is built up in the exact same way, except it's
MimeAnalyzer::$wellKnownInfo, $wgMimeInfoFile, and MimeAnalyzer::$extraInfo.

What this means in effect is that some built-in MediaWiki MIME mappings are
"baked in" (anything in MimeAnalyzer::$wellKnown*), and others can be
overridden (anything in includes/libs/mime/mime.*).

To avoid breaking backward compatibility, we have to preserve the
distinction.  Thus this change has two MIME mappings, encapsulated in two
classes: 'MimeMapMinimal', which contains just the baked-in mappings, and
'MimeMap' which contains both the baked-in and overridable mappings.  We also
have to keep the code for parsing mime.types and the ad-hoc mime.info format,
at least for now.

In a FUTURE change (i.e., not here), I think we can:

* Deprecate $wgMimeTypeFile in favor of a new config var,
  $wgExtraMimeTypeFile. $wgMimeTypeFile is evil because if you are using it to
  add support for additional MIME types, you can end up unwittingly dropping
  support for other types that exist in MediaWiki's mime.types but not your
  file. The new $wgExtraMimeTypeFile would only be used to add new MIME
  mappings on top of the standard MimeMappings, which was probably the
  original intent for $wgMimeTypeFile.
* Deprecate $wgMimeInfoFile. I don't think we need to provide a replacement,
  because extensions can use the hook, and I doubt anyone is using the config
  var. But if we wanted to provide an alternative, we could have a
  $wgExtraMimeInfoMap that has an array of extra mappings.
* Deprecate MimeAnalyzer::addExtraTypes and MimeAnalyzer::addExtraInfo, and
  provide alternative interfaces that take structured input instead of string
  blobs.

I tested this by dumping the internal state of MimeAnalyzer before and after
this CL using the script in Ib856a69fe, using both default and custom values
for $wgMime(Info|Type)File.

Bug: T252228
Change-Id: I9b2979d3c9c0dee96bb19e0290f680724e718891
2020-05-19 00:59:52 -04:00
Tim Starling
8c1904a4d4 Add rawTables(), getQueryInfo() and queryInfo() to SelectQueryBuilder
To support direct access to the underlying arrays as required by the
ApiQueryBaseBeforeQuery hook. It was a design goal of SelectQueryBuilder
to retain these arrays for the benefit of legacy code.

Change-Id: I523a9e53d17659ad35098e586e8a501f57e4de25
2020-05-18 14:42:42 +10:00
Reedy
a8b006426e Fix tests/ PSR12.Properties.ConstantVisibility.NotFound
Change-Id: I0beed1a35e046705fb84c9d1f63cf92afd009bb4
2020-05-16 04:30:21 +01:00
Ori Livneh
a7e9412297 mime: Fix whitespace parsing of 'mime.info' file
Some entries in mime.info used runs of spaces instead of tabs to
separate the list of MIME types from the media type field, and the
mime.info code handled that incorrectly. This led to us treating the
empty string as a valid MIME alias for application/sla.

Fix that by splitting on /\s+/ rather than ' '. Also made tab usage in
mime.info uniform. I'd document the convention, but I plan on nuking
mime.info in a forthcoming changeset anyway.

Bug: T252228
Change-Id: I06c733b54fd622280ea67e206340598605cb6958
2020-05-13 17:28:26 +00:00
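The parsing bug fixed in a7e9412297 is easy to reproduce in any language: splitting on a single space yields empty strings whenever fields are separated by a run of spaces. The Python sketch below shows the difference; the sample field is made up for illustration and is not copied from mime.info.

```python
import re

# Two MIME types separated by a run of spaces (made-up sample input).
types_field = "application/sla   application/vnd.ms-pki.stl"

# Splitting on a single space keeps an empty string for each extra space.
naive = types_field.split(" ")

# Splitting on runs of whitespace yields only the real tokens.
fixed = re.split(r"\s+", types_field.strip())
```

With the naive split, the empty strings end up in the list of aliases, which matches the symptom the commit message describes.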
jenkins-bot
20bfc01395 Merge "objectcache: add "non-global" mode to WANObjectCache "coalesceKeys"" 2020-05-08 17:15:07 +00:00
Aaron Schulz
c95c6f7470 objectcache: add "non-global" mode to WANObjectCache "coalesceKeys"
This makes it easier to roll out one keyspace/project at a time even
if some keys are shared and receive purges. The shared keys can all
be done as the last step.

Also, simplify getMulti() to no longer need extractBaseKey().
Make the "warm up cache" logic a bit easier to follow and less
likely to copy values around.

Change-Id: I8b602ddf5dd1feaada45fb0af202c5603836a8dd
2020-05-08 17:01:31 +00:00
jenkins-bot
a449c5005b Merge "database: Disallow db->update() without condition" 2020-05-06 16:15:11 +00:00
Max Semenik
79ca26279c UploadedFileStream: PHP 8 compatibility
In PHP 8, some I/O functions now throw; adapt to this.

Bug: T248925
Change-Id: Ic167ae0e903f143a8d423dc185383437e4c0afd2
2020-04-30 17:09:23 +00:00
Peter Ovchyn
64ea02b060 database: Disallow db->update() without condition
In order to prevent possible performance or replication issues, an empty
condition for 'update' queries is no longer allowed.

Bug: T243619
Depends-On: Ica5f4719c7c927a4e33ba818c40c9f6fc1a5ee7b
Change-Id: Ib728b639ec0c1b079046ac0f8492449def36f2a0
2020-04-29 19:25:27 +03:00
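The guard introduced by 64ea02b060 can be pictured as a check made before any SQL is built. The following Python sketch uses a hypothetical helper (`build_update_sql`) with `?` placeholders; it illustrates the idea only and does not reflect the signature or behavior of MediaWiki's `update()` method.

```python
from typing import Any, Dict, List, Tuple


def build_update_sql(
    table: str,
    values: Dict[str, Any],
    conds: Dict[str, Any],
) -> Tuple[str, List[Any]]:
    """Build a parameterized UPDATE, refusing to touch every row by accident."""
    if not conds:
        # An empty condition would update the whole table.
        raise ValueError("refusing to build UPDATE without a condition")
    set_clause = ", ".join(f"{col} = ?" for col in values)
    where_clause = " AND ".join(f"{col} = ?" for col in conds)
    sql = f"UPDATE {table} SET {set_clause} WHERE {where_clause}"
    params = list(values.values()) + list(conds.values())
    return sql, params


sql, params = build_update_sql("page", {"page_touched": "20200429"}, {"page_id": 1})
```

Failing early in the builder means a missing condition surfaces as an error in the caller instead of as a slow, table-wide write.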
Aaron Schulz
27cf5ace45 rdbms: add IDatabase::QUERY_* flags to obviate isWriteQuery()
This reduces regex overhead and reliance on brittle assumptions.
This will also be useful for complex write queries involving WITH.
Some RDBMS types allow writes within the WITH aliases themselves,
in addition to the main query itself. Checking raw SQL strings for
such things would get fairly complex.

Change-Id: I8ac4bc4d671abf02f97e82c5daf7b21271b85e5e
2020-04-28 00:49:11 +00:00
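To see why caller-supplied flags are more robust than inspecting SQL text, as 27cf5ace45 argues, consider this Python sketch. The flag names and the regex are hypothetical and deliberately simplistic; they are not the actual IDatabase::QUERY_* constants or the isWriteQuery() implementation.

```python
import re
from enum import IntFlag


class QueryFlag(IntFlag):
    """Hypothetical flags a caller can attach to a query."""
    NONE = 0
    CHANGE = 1  # the caller declares that the query writes data


# Simplistic heuristic: only looks at the leading verb of the statement.
_WRITE_VERBS = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|REPLACE|TRUNCATE|DROP|CREATE|ALTER)\b",
    re.IGNORECASE,
)


def looks_like_write(sql: str) -> bool:
    """Guess from the raw SQL string whether the query writes."""
    return bool(_WRITE_VERBS.match(sql))


def is_write(sql: str, flags: QueryFlag = QueryFlag.NONE) -> bool:
    """Trust an explicit flag first; fall back to the heuristic."""
    if flags & QueryFlag.CHANGE:
        return True
    return looks_like_write(sql)


# A write hidden inside a WITH alias (Postgres-style data-modifying CTE).
cte = "WITH moved AS (DELETE FROM queue RETURNING id) SELECT id FROM moved"
```

The heuristic misses the write inside the WITH clause because the statement starts with `WITH`, whereas the flagged call classifies it correctly without parsing anything.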
Aaron Schulz
cbc700e186 rbms: optimize and rename truncateTable() to truncate()
Allow truncation of multiple tables. This also provides for
a way to avoid risky keywords like CASCADE for Postgres.

For Postgres, use RESTART IDENTITY, which has been supported
since Postgres 8.4.

Avoid TRUNCATE/DELETE queries for empty temp tables, which is
useful for integration tests that frequently call this method.

Reorganize and tweak the regexes in Database::getTempWrites().
It now recognizes multi-table DROP/TRUNCATE (Postgres-style).

Change-Id: Idd49f118b20ea5a0f7a3e8c00369aabcd45dd44e
2020-04-21 01:26:18 -07:00
jenkins-bot
8e3297246d Merge "objectcache: make WANObjectCache::set() handle very slow regeneration" 2020-04-15 01:46:35 +00:00
Aaron Schulz
9ec57d7e5b objectcache: make WANObjectCache::set() handle very slow regeneration
If a key always takes a very long time to regenerate, is popular,
and does not use lockTSE, it still needs to be cacheable. Since a
value cannot be more up-to-date than the time it takes to regenerate
it, take the "lower the TTL" approach for these cases. Use "walltime"
to narrow down the "reject the set()" case based on regeneration time.
This is already provided by getWithSetCallback() automatically.

Bug: T244877
Change-Id: Id43fb02738b28dad3bc922057efb7eee0272d0e1
2020-04-14 22:53:38 +00:00
MusikAnimal
2d21ee58ec Add expiry type to ParamValidator
This commit also changes ApiWatch to make use of the new parameter type.
Other APIs will be updated to use it in a separate patch (T248196).

In doing this, we are for the first time using logic within a TypeDef outside
the API. This seems acceptable given TypeDefs chiefly appear to serve as
a validation method, with otherwise no particular logic tied to the
concept of APIs.

wfIsInfinity() now uses ExpiryDef::INFINITY_VALS

Bug: T248508
Change-Id: If8f0df059eafb73ec9f39cc076b3a9ce2412d60a
2020-04-08 16:21:04 -04:00
Max Semenik
b04e62f31d phpunit: Simplify StructureTest
* Get rid of a long regexp that had to be maintained and
  was broken anyway, resulting in a false negative.
* Fix that false negative.
* Make the failures array a bit more readable.

Bug: T248075
Change-Id: I4e4e5d6487d23b0d64f29c113d84bddce758e516
2020-03-29 15:40:45 +00:00
Aaron Schulz
13b11a946e rdbms: reduce duplication in Database via helper methods
Add several new internal methods to help with wrangling
the various formats that rows, conditions, options, and
unique key lists can come in. Remove now unused method
isMultiRowArray().

Add various sanity checks and logging for parameters to
upsert(), replace(), insert(), and insertSelect().

Move DatabasePostgresTest to the integration/ directory.

Change-Id: If5988a6f0816e8da2cbf2fd612e1a3e3a2e9c52f
2020-03-10 22:26:04 +00:00
jenkins-bot
ec5ccd7e07 Merge "Replace all new stdClass() with identical (object)[]" 2020-03-04 21:36:01 +00:00
jenkins-bot
44354f45eb Merge "rdbms: inject replLogger into Database and consolidate duplicated logging" 2020-03-04 21:29:23 +00:00
Thiemo Kreuz
6b2c9deef5 Replace all new stdClass() with identical (object)[]
This should be the exact same. It's more a style change than anything.
So why do it then?
* I believe this is much less confusing than code mentioning a weird
"standard class". Barely anybody knows what this is, and what the
difference between "object" and "stdClass" is.
* The code is shorter.
* It's even faster. In my micro benchmark it's twice as fast.

Change-Id: I7ee0e8ae6d9264a89b6cd1dd861f0466ae620ccc
2020-03-04 21:18:30 +00:00
Thiemo Kreuz
e1dd371e11 Make use of PHPUnit's assertCount feature where possible
… and avoid assertEmpty() on arrays, in favor of a much more strict
assertSame( [] ).

Change-Id: I20266b0b1fc38a3a87666ba1b0793cb2b37d94a9
2020-03-02 15:58:41 +00:00
Aaron Schulz
6c5d937adb rdbms: inject replLogger into Database and consolidate duplicated logging
Bug: T235244
Change-Id: I9397f6f74f703a395ef1be4713702247060d8bd4
2020-02-23 00:33:33 +00:00
Aaron Schulz
6b12696452 Move UIDGenerator code to a service and put it under /libs
All MediaWiki dependencies have been removed or injected.

Change-Id: I01c9e96edd6b03496c1595670967ffa5a4069c9d
2020-02-18 00:20:40 +00:00
Aaron Schulz
5d6470d37d rdbms: make Database::build(Greatest|Least) support expressions
This makes it possible to use with counter UPDATE queries.

Also add some extra sanity checks for input types.

Change-Id: Ibc2b7173e28022b5ba7bb04d11c594313a47a101
2020-02-15 21:56:57 +00:00
jenkins-bot
991b58bffb Merge "objectcache: fix "coalesceKeys" option name in WANObjectCache" 2020-02-13 17:24:34 +00:00
Aaron Schulz
b9b3f366bb objectcache: fix "coalesceKeys" option name in WANObjectCache
Follow-up to 85bc62c5a8

Fix related unit tests that otherwise break as a result

Change-Id: I28b3a1537d319c68a7c12c578e1acfb916f3ec99
2020-02-13 03:17:34 +00:00
jenkins-bot
c3f20ad3f7 Merge "Add SelectQueryBuilder" 2020-02-13 00:04:11 +00:00
jenkins-bot
3cddc7eb36 Merge "In Database::select() allow an empty array for $table" 2020-02-06 23:36:03 +00:00
Tim Starling
d06a3e049b Add SelectQueryBuilder
Add a query builder class which encapsulates the parameters to
IDatabase::select() and related functions.

Override useIndexClause() and ignoreIndexClause() in DatabaseTestHelper
so that index hints can be tested.

Bug: T243051
Change-Id: I58eec4eeb23bd7fb05b8b77d0a748f1026afee52
2020-02-07 10:10:17 +11:00
Tim Starling
00c8a5cbde In Database::select() allow an empty array for $table
Previously it would give FROM followed by nothing which is always a
syntax error. Easier to fix it here than to convert empty arrays to
empty strings in SelectQueryBuilder.

Bug: T243051
Change-Id: I95a9b6a34cfb5c1ca4cf243c4226b5ed4f968035
2020-02-07 09:57:27 +11:00
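The two commits above (d06a3e049b and 00c8a5cbde) describe a fluent builder that collects the parameters of a SELECT, and a fix so that an empty table list no longer produces a dangling FROM. A rough Python sketch of both ideas follows; the class and method names are hypothetical, the SQL assembly is simplified, and none of it mirrors the real SelectQueryBuilder API.

```python
from typing import Dict, List


class SelectBuilderSketch:
    """Hypothetical fluent builder collecting the parts of a SELECT query."""

    def __init__(self) -> None:
        self._fields: List[str] = []
        self._tables: List[str] = []
        self._conds: List[str] = []

    def select(self, *fields: str) -> "SelectBuilderSketch":
        self._fields.extend(fields)
        return self

    def from_table(self, table: str) -> "SelectBuilderSketch":
        self._tables.append(table)
        return self

    def where(self, cond: str) -> "SelectBuilderSketch":
        self._conds.append(cond)
        return self

    def get_query_info(self) -> Dict[str, List[str]]:
        """Expose the underlying lists for code that wants the raw parts."""
        return {
            "tables": list(self._tables),
            "fields": list(self._fields),
            "conds": list(self._conds),
        }

    def to_sql(self) -> str:
        sql = "SELECT " + ", ".join(self._fields)
        if self._tables:  # omit FROM entirely when no table was given
            sql += " FROM " + ", ".join(self._tables)
        if self._conds:
            sql += " WHERE " + " AND ".join(self._conds)
        return sql


query = (
    SelectBuilderSketch()
    .select("page_id", "page_title")
    .from_table("page")
    .where("page_namespace = 0")
)
```

Keeping the collected parts accessible, as `get_query_info()` does here, is what lets older array-based code keep working alongside the builder.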
Aaron Schulz
85bc62c5a8 objectcache: add "coalesceKeys" option to WANObjectCache for key grouping
This is useful for grouping related keys on the same servers to reduce
the need for cache server connections and availability. A cache key that
uses "lockTSE" can already involve accessing several keys during the
read/write cache-aside paths:
a) The value key itself
b) The check key (named after the main key, a common pattern)
c) The mutex key (used if the value looks stale)
d) The cool-off key (used if regeneration took a while)

Any problems accessing the first two could cause extra value regenerations.
Problems with the mutex key could lead to stampedes due to threads assuming
another thread was regenerating a soon-to-expire value when, in fact, none was.
A similar problem could happen with cool-off keys, with threads assuming
that another saved the newly regenerated value when, in fact, none did.

The use of hash stops puts the tiny related keys on the same server as the
main cache key that they serve. This is only for hash-based routing, and not
route prefix routing (e.g. All*Route still sends the key to multiple child
routes, but the PoolRoute/HashRoute function will hash differently).

The option is not enabled by default yet.

Change-Id: I37e92a88f356ef1e2a2b7728263040e2f6f09a13
2020-02-06 20:27:08 +00:00
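The effect of hash stops described in 85bc62c5a8 can be sketched as follows: only the part of a key before the hash stop is hashed for routing, so all helper keys that share that prefix land on the same server. In this Python sketch the `|#|` delimiter, the key layouts, and the hashing scheme are assumptions chosen for illustration; they are not taken from WANObjectCache or from any particular cache proxy's configuration.

```python
import hashlib
from typing import List

HASH_STOP = "|#|"  # assumed delimiter; everything after it is ignored for routing


def routing_key(key: str) -> str:
    """Return the portion of the key that takes part in server selection."""
    return key.split(HASH_STOP, 1)[0]


def pick_server(key: str, servers: List[str]) -> str:
    """Pick a server by hashing only the routing portion of the key."""
    digest = hashlib.md5(routing_key(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


servers = ["cache1", "cache2", "cache3", "cache4"]

# Hypothetical key layouts: value, check, mutex, and cool-off keys for one item.
related_keys = [
    "wiki:page-summary:42|#|value",
    "wiki:page-summary:42|#|check",
    "wiki:page-summary:42|#|mutex",
    "wiki:page-summary:42|#|cooloff",
]
chosen = {pick_server(key, servers) for key in related_keys}
```

Because all four keys hash identically, a single reachable server is enough to serve the whole read/write path for that item.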
Brad Jorsch
a04633f678 ParamValidator: Default PresenceBooleanDef to false rather than implicitly null
PresenceBooleanDef is causing ParamValidator::getValue() to return null,
while historical Action API behavior had returned false instead.

While it generally shouldn't make a difference since PHP considers both
falsey, and good arguments could be made either way, we can restore the
historical behavior easily enough by having normalizeSettings()
default PARAM_DEFAULT to false.

Bug: T244440
Change-Id: Iee1d8e5753407674adc3f7384989841bc9b44c54
2020-02-06 09:44:06 -05:00
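The behavior restored by a04633f678 can be illustrated with a small Python sketch: a presence-style boolean is true when the parameter appears in the request, and the normalized settings supply an explicit false default otherwise. The settings key and function names are hypothetical and are not ParamValidator's actual constants or methods.

```python
from typing import Any, Dict

PARAM_DEFAULT = "param-default"  # hypothetical settings key for this sketch


def normalize_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in an explicit False default instead of leaving it unset."""
    normalized = dict(settings)
    normalized.setdefault(PARAM_DEFAULT, False)
    return normalized


def get_presence_value(
    request_params: Dict[str, str],
    name: str,
    settings: Dict[str, Any],
) -> bool:
    """True if the parameter is present at all, else the normalized default."""
    if name in request_params:
        return True
    return normalize_settings(settings)[PARAM_DEFAULT]


present = get_presence_value({"watch": ""}, "watch", {})
absent = get_presence_value({}, "watch", {})
```

Callers that compare strictly against false keep working, which is the compatibility concern the commit message raises.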
jenkins-bot
7b54f5dc3d Merge "rdbms: add GREATEST/LEAST wrappers to IDatabase" 2020-02-04 22:03:11 +00:00
Brad Jorsch
d4c2f0d899 Move some validation logic from ApiStructureTest to ParamValidator
ApiStructureTest has a lot of logic for validating Action API settings
arrays during CI. Some of that logic should be part of ParamValidator
instead.

Bug: T242887
Change-Id: I3c3d23e38456de19179ae3e5855397316b6e4c40
Depends-On: I04de72d731b94468d8a12b35df67f359382b3742
2020-02-04 20:29:35 +00:00
jenkins-bot
51280df815 Merge "objectcache: fix cache pollution in WANObectCache Multi* methods" 2020-01-30 18:17:03 +00:00
Aaron Schulz
527fd0109f objectcache: fix cache pollution in WANObectCache Multi* methods
This was triggered by bad reference handling during preemptive refreshes

Bug: T235188
Change-Id: I239a3e1922f478c74c94d8d2debff28f525c7c31
2020-01-29 21:08:39 +00:00
jenkins-bot
0a46bef2db Merge "objectcache: fix storage of null values in WANObjectCache" 2020-01-29 04:14:20 +00:00
Aaron Schulz
4fb5210b62 objectcache: fix storage of null values in WANObjectCache
Bug: T234583
Change-Id: I38a531b9a0acb95d7884519f3381b48cd9d8faa0
2020-01-24 22:49:26 +00:00
Aaron Schulz
314efebb56 rdbms: add GREATEST/LEAST wrappers to IDatabase
Change-Id: I9de931123b03ce10713a3a9bbb34e1332dd5965b
2020-01-17 22:19:08 +00:00
Brad Jorsch
55d3d81803 ParamValidator: Tighten unrecognized value handling
Rename PARAM_IGNORE_INVALID_VALUES to PARAM_IGNORE_UNRECOGNIZED_VALUES,
and make it only ignore the "badvalue" errors thrown by EnumDef for
unrecognized values instead of ignoring every kind of error.

This better matches historical Action API behavior.

Change-Id: Ifdb9063c0a2e82c728e98065e2ac950f4a713552
2020-01-14 17:24:20 -05:00
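The narrowing described in 55d3d81803 (ignore only "badvalue" errors for unrecognized enum values, not every kind of error) can be sketched in Python as below. The exception class, error codes as plain strings, and function names are hypothetical and do not mirror ParamValidator's classes.

```python
from typing import Iterable, List


class ValidationError(Exception):
    """Hypothetical validation failure carrying a short error code."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code


def validate_enum(value: str, allowed: Iterable[str]) -> str:
    """Accept a value only if it is one of the allowed choices."""
    if value not in allowed:
        raise ValidationError("badvalue", value)
    return value


def validate_multi(
    values: List[str],
    allowed: Iterable[str],
    limit: int,
    ignore_unrecognized: bool = False,
) -> List[str]:
    """Validate a multi-value parameter.

    With ignore_unrecognized set, only "badvalue" errors are skipped;
    any other error, such as exceeding the limit, still propagates.
    """
    if len(values) > limit:
        raise ValidationError("toomanyvalues", str(len(values)))
    allowed_set = set(allowed)
    accepted: List[str] = []
    for value in values:
        try:
            accepted.append(validate_enum(value, allowed_set))
        except ValidationError as err:
            if ignore_unrecognized and err.code == "badvalue":
                continue
            raise
    return accepted


kept = validate_multi(["info", "bogus", "links"], ["info", "links"], 5, True)
```

The unrecognized value is dropped silently, while a request with too many values would still fail even with the option enabled.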
Brad Jorsch
6e9ec41d0a ParamValidator: Adjust message usage
This fixes a few things in ParamValidator's use of messages:

* Values need to be stringified in various places.
* Don't pass the value to paramvalidator-toomanyvalues, to make it match
  apierror-toomanyvalues. We weren't using it anyway.
* Rename "paramvalidator-help-multi-sep" to match
  "api-help-param-multi-separate".
* Fix attempts to use messages "paramvalidator-param-default" and
  "paramvalidator-param-default-empty" that don't exist.
* EnumDef's logic for selecting between enummulti vs enumnotmulti
  was backwards.
* Sort enum values in getParamInfo() output.
* Adjust some message texts to match equivalent former Action API
  messages.

Change-Id: I1551a59bd110f12e478de7292a00f9a767cd94ca
2020-01-14 17:24:20 -05:00