Commit graph

59 commits

Author SHA1 Message Date
Aaron Schulz
21e71e0235 Use IDatabase type hints in /maintenance
Relatedly, move lockTables()/unlockTables() to IMaintainableDatabase

Change-Id: Ib53e9fa948deb2f9a70f0ce16c002613d0060bf9
2017-04-07 23:37:41 +00:00
Aaron Schulz
30f4b3c103 Replace DatabaseBase => Database in more places
Change-Id: If37a7909056bf2c31a8228cbc84f0fbbf5f1c517
2016-09-28 15:53:02 -07:00
Aaron Schulz
950cf6016c Rename DB_SLAVE constant to DB_REPLICA
This is more consistent with LoadBalancer, modern, and inclusive
of master/master mysql, NDB cluster, and MariaDB galera cluster.

The old constant is an alias now.

Change-Id: I0b37299ecb439cc446ffbe8c341365d1eef45849
2016-09-05 22:55:53 -07:00
jenkins-bot
a206443c0e Merge "Change "slave" => "replica DB" in /maintenance" 2016-09-06 01:05:50 +00:00
Aaron Schulz
c0a9ab0f6d Change "slave" => "replica DB" in /maintenance
Change-Id: Ibd3d617901130378a935402326cd4eefbb382c9e
2016-09-06 00:13:08 +00:00
Kaldari
c1bf1f369e Fixing dry-run logic in updateCollation.php
Currently if you run updateCollation.php in dry-run mode, it ignores
the other parameters and doesn't give you a row estimate. Now it
will behave the same as an actual run (just without making any
changes to the database).

Change-Id: I25a9751d8ab7554e7975e5f08122dd1ddaaf40a7
2016-08-29 16:29:24 -07:00
jenkins-bot
e2eefa3a97 Merge "Make updateCollation wait for slaves every 500 (instead of 2000)" 2016-05-12 11:47:08 +00:00
Brian Wolff
d9a965c69d Make updateCollation wait for slaves every 500 (instead of 2000)
2000 writes per wfWaitForSlaves() seems a bit high. There was a
report of this script causing some slave lag when being run.
Note that the amount of time between wfWaitForSlaves() calls was
previously increased in r97146.

Bug: T58041
Change-Id: I07a29499775a17255865f25e6b9f1058f898193b
2016-05-12 06:32:47 -04:00
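The throttling idea in the commit above can be sketched as follows. This is only an illustration, not the script's actual code: `update_in_batches`, `update_row` and `wait_for_replicas` are hypothetical names, with `wait_for_replicas` standing in for wfWaitForSlaves(). The only value taken from the log is the interval of 500 rows.

```python
from typing import Callable, Sequence

# Rows written between replica waits (500 per the commit message above).
SYNC_INTERVAL = 500


def update_in_batches(
    rows: Sequence[int],
    batch_size: int,
    update_row: Callable[[int], None],
    wait_for_replicas: Callable[[], None],
) -> int:
    """Apply update_row to every row and return how many times we paused."""
    rows_since_sync = 0
    waits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        for row in batch:
            update_row(row)  # hypothetical per-row UPDATE
        rows_since_sync += len(batch)
        # Pause only once enough writes have accumulated, not after every batch.
        if rows_since_sync >= SYNC_INTERVAL:
            wait_for_replicas()  # hypothetical stand-in for wfWaitForSlaves()
            waits += 1
            rows_since_sync = 0
    return waits
```

With 1200 rows and a batch size of 100, this sketch pauses twice, after rows 500 and 1000. A smaller interval trades throughput for less replication lag.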
jenkins-bot
1b526b27d3 Merge "use slave for row estimate in updateCollation.php" 2016-05-11 17:06:05 +00:00
Brian Wolff
09205b2954 use slave for row estimate in updateCollation.php
jcrespo reported a lag spike at the very beginning of running this
script. I'm guessing that's due to counting how many rows in
categorylinks to give the progress bar. Since we only need a
rough estimate for the progress meter, make that query run on
a slave. Also add a wfWaitForSlaves() immediately after it for
good measure.

Bug: T58041
Change-Id: I3cba392f0013fcb2ef86803632e2d9b1b88b3b29
2016-05-10 15:38:27 -04:00
Brian Wolff
01bcb5a9e6 Use STRAIGHT_JOIN on updateCollation.php per jcrespo
Was not using the right index on ruwiktionary

Bug: T58041
Change-Id: Ib55a2cdd7807a96df7076a1b54457dd4f74912ce
2016-05-10 15:30:03 -04:00
Brian Wolff
0d4e0ca543 Add -f as an alias of --force to cli args of updateCollation.php
Because I kept accidentally using -f without realizing it didn't
work.

Change-Id: I71da15c81ca12c630304f594d144c4c7289ec28c
2016-04-26 17:07:30 +00:00
Brian Wolff
eec016ece6 Add new index to make updateCollation.php painless
We want to update categories in order, to minimize disruption
to users. Previous indexes required a filesort to do this, which
exploded things on large wikis. See bug for details

Bug: T58041
Change-Id: Iee6cd997ff87a313a46fda19d8ab063d0fed8ce8
2016-03-22 16:32:52 -06:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Max Semenik
59db24e90b Use addDescription() instead of accessing mDescription directly
Change-Id: I0e2aa83024b8abf5298cfea4b21bf45722ad3103
2016-01-30 01:28:32 -08:00
Kevin Israel
1dd4c867e5 updateCollation.php: Switch back to using cl_from index for now
Using the cl_sortkey index instead (to reduce disruption to a live
site), as currently implemented, seems to have two serious problems:

* MySQL / MariaDB filesorts all rows that "sort above the given row
  [the last row of the previous batch]", not just a single category
  at a time until the row limit is reached.
* The current approach to pagination is broken in that it does not
  work with ENUM columns such as cl_type, causing 'file' rows to be
  skipped, or rows of any type to be repeated. See T119173.

This reverts part of commit a43f751cf6.

Bug: T58041
Change-Id: I619564e85b2122f249bdacc45d547b9ce1b3beb5
2016-01-21 05:57:48 +00:00
Aaron Schulz
fa8e1a9b00 Clean up transactions in maintenance scripts
Add transaction methods to complement getDB().
This makes it easy to grep for direct begin()/commit()
calls to IDatabase by having script use their own
wrapper. Maintenance scripts are one of the few places
that can (and need to) use begin/commit instead of the
start/end atomic methods.

Eventually, there should be almost no direct callers
and those methods can be made stricter about throwing
errors on nested calls.

Change-Id: Ibbfc7a77c0d2a55f7fc2261087f6c3a19061e0aa
2015-12-30 23:40:35 +00:00
Kevin Israel
924a34c298 Remove --max-slave-lag options and remnants from maintenance scripts
Change-Id: Id01fb9a82bcfe1af8cbce23a9aec7eccaa0f6b21
2015-03-26 19:33:35 -04:00
umherirrender
b0cfcd0fcb Add missing @return and @param to doc blocks
Change-Id: I9d99ba1968ed8f97624d957754c8847dfe1b41da
2014-08-27 21:57:45 +02:00
umherirrender
6b4c44c2db Add missing @param to function docs
Change-Id: Ib26407bc55dff7969d8a3b1e2ae51751b202d8fb
2014-08-18 16:24:59 +00:00
Siebrand Mazeland
606c680b21 Update formatting in maintenance/ (4/4)
Change-Id: I6b58d014a4bfd6600e4e6f80188fdcfce18482ca
2014-04-23 20:09:26 +02:00
Mark A. Hershberger
0b5acd0623 Move reference to $row where it is in-scope and doesn't produce
E_STRICT notices.

Bug: 57575
Change-Id: Ic508ebbb0816acd32be355b5f19b46637d58c36a
2013-11-25 22:18:55 -05:00
MatmaRex
c9e8cffc81 updateCollation.php: sanity check the collation before proceeding
In some cases the constructor will work, but trying to access first
letter data will raise an exception, breaking all category pages.

Bug: 46615
Change-Id: I77de040f97080653fe0d1734d38490eaa2d322db
2013-07-04 05:21:04 +00:00
Timo Tijhof
beb1c4a0ec phpcs: More require/include is not a function
Follows-up I1343872de7, Ia533aedf63 and I2df2f80b81.

Also updated usage in text in documentation and the
installer LocalSettingsGenerator.

Most of them were handled by this regex:
- find: (require|include|require_once|include_once)\s*\(\s*(.+?)\s*\)\s*;$
- replace: $1 $2;

Change-Id: I6b38aad9a5149c9c43ce18bd8edbab14b8ce43fa
2013-05-21 23:26:28 +02:00
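To see what the find/replace pair quoted in the commit above does, here is a small sketch that applies the same pattern with Python's `re` module. The pattern is copied from the commit message; the helper name `strip_call_parens` and the sample lines are assumptions made for illustration, and the commit itself notes that the regex handled most cases, not all.

```python
import re

# Pattern from the commit message; "$1 $2;" becomes r"\1 \2;" in Python syntax.
PATTERN = re.compile(
    r"(require|include|require_once|include_once)\s*\(\s*(.+?)\s*\)\s*;$",
    re.MULTILINE,
)


def strip_call_parens(source: str) -> str:
    """Rewrite require(...)/include(...) statements to the keyword form."""
    return PATTERN.sub(r"\1 \2;", source)


if __name__ == "__main__":
    sample = "require_once( __DIR__ . '/Maintenance.php' );"
    print(strip_call_parens(sample))
```

The lazy `(.+?)` still captures nested parentheses correctly because the match has to end at a `)` followed by `;` at the end of the line.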
Brian Wolff
af6d3572fa Revert "(bug 46615) updateCollation.php: sanity check the collation before proceeding"
Sorry, forgot that method was not in the base class, and I had only tested with uca based collations. This breaks on uppercase type collations.

This reverts commit 6eb84144df

Change-Id: Ib7b9597ff842a76185ba5c153922834ffb741237
2013-05-15 22:40:29 +00:00
Timo Tijhof
50e7985d4d phpcs: Fix WhiteSpace.LanguageConstructSpacing warnings
Squiz.WhiteSpace.LanguageConstructSpacing:
   Language constructs must be followed by a single space;
   expected "require_once expression" but found
   "require_once(expression)"

It is a keyword (e.g. like `new`, `return` and `print`). As
such the parentheses don't make sense.

Per our code conventions, we use a space after keywords like
these. We appeared to have an unwritten exception for `require`
that doesn't make sense. About 60% of require/include usage
was missing the space and/or had superfluous parentheses.

It is as silly as print("foo") or return("foo"). It works
because whitespace between a keyword and the expression that
follows has no significance, and since expressions can be
wrapped in parentheses for clarity (e.g. when doing string
concatenation or mathematical operations), the parentheses
before and after are basically just ignored.

Change-Id: I2df2f80b8123714bea7e0771bf94b51ad5bb4b87
2013-05-09 05:56:26 +02:00
MatmaRex
6eb84144df (bug 46615) updateCollation.php: sanity check the collation before proceeding
Change-Id: I5be1b1ec1823fdb7438c3f501fb6194142c1e9dc
2013-03-27 21:16:57 +01:00
Platonides
c3f1a3c9ea a43f751 removed the usage of $wgMiserMode
Change-Id: I5528dba582d218721324431015bd930b9b6ab57e
2013-03-18 04:21:55 +00:00
Tim Starling
1db83c1b76 Restore SET cl_timestamp=cl_timestamp
Apparently cl_timestamp=cl_timestamp is a workaround for obscure
behaviour of the timestamp type in MySQL

Change-Id: I803f20bcf4e28e8e2833a07bcf00e7edc00ad84b
2013-03-13 10:18:12 +11:00
Tim Starling
a43f751cf6 Reduce disruption during updateCollation.php
Have updateCollation.php order by cl_to, so that each category is
updated all at once. This minimises the time during which a category
will appear to be incorrectly sorted, while the maintenance script is in
progress.

Mark the cl_collation index as needing deletion, it was always pretty
pointless. You can't do much better than a full table scan when you're
changing the collation value on a wiki.

Increase the batch size since the lack of a cl_to,cl_from index means
that it will have to filesort each category. A larger batch size means
fewer sorts. As noted by Liangent on bug 45970, you can't order by
cl_sortkey since that will change during execution.

Also fix an inappropriate use of $wgMiserMode and remove a no-op from
the SET clause of the UPDATE.

Very lightly tested.

Change-Id: I19bc8d6701f5f78040aa9c521427ac98ef488d89
2013-03-12 23:08:29 +00:00
Marius Hoch
652c4be7c2 Clean up: Declare variables with public instead of var
Variables in classes should be declared using public $foo
instead of var $foo for various reasons. As we require PHP 5.3
we don't have to take care about that PHP4 left over, but can
get rid of it in favour of the more clear and better readable
public.
See also: http://php.net/manual/en/language.oop5.visibility.php
(Divided into several commits to keep reviewable)

Change-Id: Ic723d0347ab2e3c78bc0097345c68bbee3dc035a
2012-09-14 21:00:00 +02:00
Alexandre Emsenhuber
2a7478b4fb Improve documentation of maintenance scripts.
Change-Id: Id7a04ff816dc47a8cc81a4da5ab0dff26b688bd5
2012-09-03 20:10:09 +02:00
jeroendedauw
38c7f444e1 Use __DIR__ instead of dirname( __FILE__ )
We can now do this since we finally switched to PHP 5.3 for MW 1.20 and get rid of the silly dirname(__FILE__) stuff :)

Change-Id: Id9b2c9cd2e678197aa81c78adced5d1d31ff57b1
2012-08-27 21:45:00 +02:00
Tim Starling
8df24d5586 updateCollation.php size histogram feature
Added a feature allowing updateCollation.php to show a histogram of
sort key sizes, to assess the effect of index size truncation. Added
--dry-run and --target-collation options to allow the index truncation
to be assessed without actually changing the collation.

Change-Id: I497b5d0740384f5d6fdebc6d5ccfea5d853fbd37
2012-07-18 13:23:14 +10:00
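A sort key size histogram of the kind described in the commit above could be computed roughly as below. This is a sketch under stated assumptions rather than the script's implementation: the function name `size_histogram`, the bucket width of 10 bytes and the sample keys are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def size_histogram(sort_keys: Iterable[bytes], bucket_width: int = 10) -> Dict[int, int]:
    """Count sort keys per size bucket, keyed by the bucket's lower bound in bytes."""
    counts: Counter = Counter()
    for key in sort_keys:
        bucket = (len(key) // bucket_width) * bucket_width
        counts[bucket] += 1
    # Sort by bucket so the result reads from small keys to large keys.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    keys = [b"a" * 3, b"b" * 12, b"c" * 15, b"d" * 31]
    for lower, count in size_histogram(keys).items():
        print(f"{lower:>4}-{lower + 9:<4} {'*' * count}")
```

Comparing the upper buckets against the index prefix length shows how many keys would be truncated by the index, which is the question the dry-run options were added to answer.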
Reedy
a8cdc7df3a Use estimateRowPage if wiki is using wgMiserMode
Change-Id: I59404e9514a87f65faf3eb865fafe358d9f01079
2012-07-06 17:57:40 +01:00
Sam Reed
c47f83a4d4 More __METHOD__ in our madness 2012-02-24 18:45:24 +00:00
Sam Reed
62491fef13 Comments, braces, explicit member variables
Remove a couple of unused variables
2011-11-16 13:22:03 +00:00
Roan Kattouw
a47f2dcb2d Followup r97146: drop the $lb->waitTimeout() call per Tim. Was used so Tim could sleep while a schema change was going on, but this is the kind of live hack that doesn't belong in core. 2011-09-15 12:42:29 +00:00
Roan Kattouw
c6fb8af8ef Merge live hacks from r83992 to trunk, after cleaning some things up.
* Wait for slaves after every thousand rows rather than after processing every batch. r83992 had 1000 hard-coded, I put it in SYNC_INTERVAL
* Set $lb->waitTimeout(100000). I have no idea why, but it was in the live hack. Maybe Tim or Domas could enlighten me
* Use a STRAIGHT_JOIN for the query on categorylinks and page because MySQL appears to want to join the tables the wrong way around
* Use cl_collation='previousValue' rather than cl_collation!='newValue' if possible. This was originally a dirty live hack, but I re-implemented it nicely with a --previous-collation command line option
* Print a status update both before and after the SELECT query. This allows the user to notice when the SELECT queries are getting increasingly slower, which is an indication you may want to set --previous-collation
2011-09-15 12:17:44 +00:00
Max Semenik
c79a16167a Introduced Maintenance::getDB() and corresponding setDB() to control externally what database object should be used by maintenance script. Currently used by updater to avoid DatabaseSqliteTest from running stuff like Populate* on the live database instead of the one used for testing. 2011-05-24 17:48:22 +00:00
Sam Reed
fa7662d94a Ensure $collationConds is defined on all paths 2011-04-14 18:46:37 +00:00
Sam Reed
b88afb0daa Fixup/add documentation
Remove some unused variables
2011-03-30 19:00:11 +00:00
Roan Kattouw
a38fd53df2 (bug 27975) Fix r83529 (slave catchup in updateCollation.php) to not try to wait for slaves if there are none. Reporter was getting a permission error for getting the master position on a single-server setup 2011-03-14 09:30:56 +00:00
Aryeh Gregor
8c69bdb0a6 Change collationUpdate batch size from 1000 to 50
It selects that many rows, then does PHP processing and an individual
update query for each one.  This is not a good idea when each batch is
done in a single transaction: 1000 MySQL updates interspersed with PHP
processing might take a second or more while locks are held.
2011-03-08 21:21:08 +00:00
Roan Kattouw
ff6fec1e6f Make updateCollation.php a bit less murderous for WMF databases:
* Don't run a COUNT(*) query on what's potentially the entire categorylinks table on enwiki (hundreds of millions of rows). Put it in a miser mode check
* Wait for DB replication to catch up before processing the next batch. Implemented LoadBalancer::waitAll() for this purpose, which should behave more nicely than wfWaitForSlaves()
2011-03-08 16:47:26 +00:00
Tim Starling
f1869f59b0 Add --force option to updateCollation.php. 2011-01-20 06:24:11 +00:00
Tim Starling
eaeea84b44 * Introduced a non-dummy collation for $wgCategoryCollation, namely UCA with default tables.
* Added a maintenance script which generates a list of first letters. Unified Han are omitted for performance, and because they shouldn't be used as headings anyway. A future collation specific to Chinese would provide the KangXi radicals as "first letters".
* Provided a precomputed list of first letters. Used Unicode 6.0.0 data and ICU 4.2. 
* Moved collation functionality from Language to a Collation class hierarchy with factory function. Removed the recently-added methods from Language and updated all callers.
* Changed Title::getCategorySortkey() to separate its parts with a line break instead of a null character. All collations supported by the intl extension ignore the null character, i.e. "ab" == "a\0b". It would have required a lot of hacking to make it work.
* Fixed the uppercase collation to handle non-ASCII characters, redundantly with r80436. I don't think it's necessary to change the collation name as was done there, so I reverted that in the course of my conflict merge. A --force option to updateCollation.php might be nice though.
2011-01-17 14:02:22 +00:00
Brian Wolff
c79b4bdd21 Change the default collation from strtoupper to Language::uc, so that non-ascii characters get to play too.
I know the uppercase thing is just a standby until a real collation function is written. However, in the
meantime, I think it'd be really weird for a wiki with $wgCapitalLinks = false to suddenly have
[[a]] and [[A]] sort under the same letter in a category page, but [[Ä]] and [[ä]] sort nowhere
near each other, even though on a capitalized wiki they would be the same page.

See discussion on r69816.

Also fix an issue with maintenance/updateCollation.php, where php thinks
that 'uppercase' == 0 (?!). I don't really know what the deal with that
is, but using a ! instead of == 0 seems to fix it. (Follow-up r69961)
2011-01-17 06:27:49 +00:00
Chad Horohoe
26505b170a Fix concern raised by Brion in r74108 (but has really existed since the maintenance rewrite). Right now, including a maintenance script causes it to execute. This is bad when you want to reuse the particular class but not have it start executing all by itself.
Until now, we relied on setting MW_NO_SETUP which was a) hacky, b) irreversible, and c) likely to be forgotten if you didn't use one of the wrappers like runChild().

Instead, move the freaky magic to doMaintenance and have *it* check if it's in a specific call stack that indicates this is being run from the file scope and should be executed. Rename DO_MAINTENANCE to RUN_MAINTENANCE_IF_MAIN so it's nice and clear what magic happens behind the require_once().
2011-01-13 22:58:55 +00:00
Alexandre Emsenhuber
9f5d06527c Part of bug 26280: added license headers to PHP files in maintenance 2010-12-16 19:15:12 +00:00