Commit graph

29 commits

Author SHA1 Message Date
Marius Hoch
652c4be7c2 Clean up: Declare variables with public instead of var
Variables in classes should be declared using public $foo
instead of var $foo for various reasons. As we require PHP 5.3,
we no longer need to keep that PHP 4 leftover and can
replace it with the clearer, more readable
public.
See also: http://php.net/manual/en/language.oop5.visibility.php
(Divided into several commits to keep reviewable)

Change-Id: Ic723d0347ab2e3c78bc0097345c68bbee3dc035a
2012-09-14 21:00:00 +02:00
Alexandre Emsenhuber
2a7478b4fb Improve documentation of maintenance scripts.
Change-Id: Id7a04ff816dc47a8cc81a4da5ab0dff26b688bd5
2012-09-03 20:10:09 +02:00
jeroendedauw
38c7f444e1 Use __DIR__ instead of dirname( __FILE__ )
We can now do this since we finally switched to PHP 5.3 for MW 1.20, so we can get rid of the silly dirname(__FILE__) stuff :)

Change-Id: Id9b2c9cd2e678197aa81c78adced5d1d31ff57b1
2012-08-27 21:45:00 +02:00
Tim Starling
8df24d5586 updateCollation.php size histogram feature
Added a feature allowing updateCollation.php to show a histogram of
sort key sizes, to assess the effect of index size truncation. Added
--dry-run and --target-collation options to allow the index truncation
to be assessed without actually changing the collation.

Change-Id: I497b5d0740384f5d6fdebc6d5ccfea5d853fbd37
2012-07-18 13:23:14 +10:00
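
The histogram idea in the commit above can be sketched as follows. This is a minimal Python illustration, not the script's PHP code: the function names, the bucket width, and the length limit passed as max_index_bytes are all hypothetical and chosen only for the example.

```python
from collections import Counter


def size_histogram(sort_keys, bucket_width=10):
    """Count sort keys per size bucket; sizes are measured in bytes.

    bucket_width is a hypothetical parameter, not an option of the real script.
    """
    histogram = Counter()
    for key in sort_keys:
        # Round the key length down to the start of its bucket.
        bucket = (len(key) // bucket_width) * bucket_width
        histogram[bucket] += 1
    return dict(sorted(histogram.items()))


def truncated_fraction(sort_keys, max_index_bytes):
    """Fraction of keys longer than an assumed index length limit."""
    if not sort_keys:
        return 0.0
    too_long = sum(1 for key in sort_keys if len(key) > max_index_bytes)
    return too_long / len(sort_keys)


# Made-up sample keys with byte lengths 5, 12, 25 and 31.
keys = [b"apple", b"banana split", b"x" * 25, b"y" * 31]
print(size_histogram(keys))
print(truncated_fraction(keys, max_index_bytes=30))
```

Because nothing is written back, a report like this corresponds to what a dry run is for: judging how many keys a given truncation length would affect before changing any data.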
Reedy
a8cdc7df3a Use estimateRowPage if wiki is using wgMiserMode
Change-Id: I59404e9514a87f65faf3eb865fafe358d9f01079
2012-07-06 17:57:40 +01:00
Sam Reed
c47f83a4d4 More __METHOD__ in our madness 2012-02-24 18:45:24 +00:00
Sam Reed
62491fef13 Comments, braces, explicit member variables
Remove a couple of unused variables
2011-11-16 13:22:03 +00:00
Roan Kattouw
a47f2dcb2d Followup r97146: drop the $lb->waitTimeout() call per Tim. Was used so Tim could sleep while a schema change was going on, but this is the kind of live hack that doesn't belong in core. 2011-09-15 12:42:29 +00:00
Roan Kattouw
c6fb8af8ef Merge live hacks from r83992 to trunk, after cleaning some things up.
* Wait for slaves after every thousand rows rather than after processing every batch. r83992 had 1000 hard-coded, I put it in SYNC_INTERVAL
* Set $lb->waitTimeout(100000). I have no idea why, but it was in the live hack. Maybe Tim or Domas could enlighten me
* Use a STRAIGHT JOIN for the query on categorylinks and page because MySQL appears to want to join the tables the wrong way around
* Use cl_collation='previousValue' rather than cl_collation!='newValue' if possible. This was originally a dirty live hack, but I re-implemented it nicely with a --previous-collation command line option
* Print a status update both before and after the SELECT query. This allows the user to notice when the SELECT queries are getting progressively slower, which is an indication you may want to set --previous-collation
2011-09-15 12:17:44 +00:00
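
The first bullet of the commit above, waiting after every thousand rows rather than after every batch, can be sketched roughly like this. It is a Python illustration under stated assumptions, not the PHP implementation: process_in_batches and the wait_for_replicas callback are hypothetical names, and only SYNC_INTERVAL is taken from the commit message.

```python
SYNC_INTERVAL = 1000  # rows processed between replication waits


def process_in_batches(rows, batch_size, wait_for_replicas):
    """Process rows in batches, waiting for replication every SYNC_INTERVAL rows.

    wait_for_replicas is a hypothetical callback standing in for whatever
    blocks until replication has caught up.
    """
    processed = 0
    rows_since_sync = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # ... per-row work on the batch would happen here ...
        processed += len(batch)
        rows_since_sync += len(batch)
        if rows_since_sync >= SYNC_INTERVAL:
            wait_for_replicas()
            rows_since_sync = 0
    return processed


waits = []
total = process_in_batches(list(range(2500)), batch_size=50,
                           wait_for_replicas=lambda: waits.append(1))
print(total, len(waits))
```

Decoupling the wait from the batch size keeps transactions short while avoiding a replication wait after every small batch.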
Max Semenik
c79a16167a Introduced Maintenance::getDB() and corresponding setDB() to control externally what database object should be used by a maintenance script. Currently used by the updater to prevent DatabaseSqliteTest from running stuff like Populate* on the live database instead of the one used for testing. 2011-05-24 17:48:22 +00:00
Sam Reed
fa7662d94a Ensure $collationConds is defined on all paths 2011-04-14 18:46:37 +00:00
Sam Reed
b88afb0daa Fixup/add documentation
Remove some unused variables
2011-03-30 19:00:11 +00:00
Roan Kattouw
a38fd53df2 (bug 27975) Fix r83529 (slave catchup in updateCollation.php) to not try to wait for slaves if there are none. Reporter was getting a permission error for getting the master position on a single-server setup 2011-03-14 09:30:56 +00:00
Aryeh Gregor
8c69bdb0a6 Change collationUpdate batch size from 1000 to 50
It selects that many rows, then does PHP processing and an individual
update query for each one.  This is not a good idea when each batch is
done in a single transaction: 1000 MySQL updates interspersed with PHP
processing might take a second or more while locks are held.
2011-03-08 21:21:08 +00:00
Roan Kattouw
ff6fec1e6f Make updateCollation.php a bit less murderous for WMF databases:
* Don't run a COUNT(*) query on what's potentially the entire categorylinks table on enwiki (hundreds of millions of rows). Put it in a miser mode check
* Wait for DB replication to catch up before processing the next batch. Implemented LoadBalancer::waitAll() for this purpose, which should behave more nicely than wfWaitForSlaves()
2011-03-08 16:47:26 +00:00
Tim Starling
f1869f59b0 Add --force option to updateCollation.php. 2011-01-20 06:24:11 +00:00
Tim Starling
eaeea84b44 * Introduced a non-dummy collation for $wgCategoryCollation, namely UCA with default tables.
* Added a maintenance script which generates a list of first letters. Unified Han are omitted for performance, and because they shouldn't be used as headings anyway. A future collation specific to Chinese would provide the KangXi radicals as "first letters".
* Provided a precomputed list of first letters. Used Unicode 6.0.0 data and ICU 4.2. 
* Moved collation functionality from Language to a Collation class hierarchy with factory function. Removed the recently-added methods from Language and updated all callers.
* Changed Title::getCategorySortkey() to separate its parts with a line break instead of a null character. All collations supported by the intl extension ignore the null character, i.e. "ab" == "a\0b". It would have required a lot of hacking to make it work.
* Fixed the uppercase collation to handle non-ASCII characters, redundantly with r80436. I don't think it's necessary to change the collation name as was done there, so I reverted that in the course of my conflict merge. A --force option to updateCollation.php might be nice though.
2011-01-17 14:02:22 +00:00
Brian Wolff
c79b4bdd21 Change the default collation from strtoupper to Language::uc, so that non-ASCII characters get to play too.
I know the uppercase thing is just a standby until a real collation function is written. However, in the
meantime, I think it'd be really weird for a wiki with $wgCapitalLinks = false to suddenly have
[[a]] and [[A]] sort under the same letter in a category page, but [[Ä]] and [[ä]] sort nowhere
near each other, even though on a capitalized wiki they would be the same page.

See discussion on r69816.

Also fix an issue with maintenance/updateCollation.php, where PHP thinks
that 'uppercase' == 0 (?!). I don't really know what the deal with that
is, but using a ! instead of == 0 seems to fix it. (Follow-up r69961)
2011-01-17 06:27:49 +00:00
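
The sorting difference described in the commit above can be sketched in Python. This is an analogy rather than MediaWiki code: ascii_upper is a hypothetical stand-in for an uppercase routine that only handles ASCII letters, and Python's str.upper stands in for a Unicode-aware routine such as Language::uc.

```python
def ascii_upper(text):
    """Uppercase only a-z, leaving every other character untouched."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


def first_letter(title, upper):
    """Heading letter a title would be grouped under, given an uppercase function."""
    return upper(title)[0]


# "\u00e4" is a-umlaut, "\u00c4" is A-umlaut (both precomposed characters).
titles = ["a", "A", "\u00e4", "\u00c4"]

ascii_groups = [first_letter(t, ascii_upper) for t in titles]
unicode_groups = [first_letter(t, str.upper) for t in titles]

print(ascii_groups)    # the two umlaut titles keep different letters
print(unicode_groups)  # the two umlaut titles share one letter
```

With the ASCII-only function the two umlaut titles land under different letters, which is the inconsistency the commit message objects to.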
Chad Horohoe
26505b170a Fix concern raised by Brion in r74108 (but has really existed since the maintenance rewrite). Right now, including a maintenance script causes it to execute. This is bad when you want to reuse the particular class but not have it start executing all by itself.
Until now, we relied on setting MW_NO_SETUP, which was a) hacky, b) irreversible, and c) likely to be forgotten if you didn't use one of the wrappers like runChild().

Instead, move the freaky magic to doMaintenance and have *it* check if it's in a specific call stack that indicates this is being run from the file scope and should be executed. Rename DO_MAINTENANCE to RUN_MAINTENANCE_IF_MAIN so it's nice and clear what magic happens behind the require_once().
2011-01-13 22:58:55 +00:00
Alexandre Emsenhuber
9f5d06527c Part of bug 26280: added license headers to PHP files in maintenance 2010-12-16 19:15:12 +00:00
Mark A. Hershberger
617a5b1e15 Whitespace fixup under the maint directory. 2010-12-04 03:20:14 +00:00
Aryeh Gregor
dcd5d260d4 Further categorylinks schema changes
Per review by Tim, I made two changes:

1) Fix cl_sortkey to be varbinary(255).

2) Expand cl_collation to varbinary(32), and change $wgCollationVersion
to $wgCategoryCollation, to account for the variety of collations we
might have.  tinyint is too small.  I could have gone with int, but
that's annoyingly inscrutable in practice, as we all know from namespace
fields.

To make the upgrade easier for non-trunk users, I updated the old patch
file to incorporate the new changes, using the updatelog table so that
people upgrading from 1.16 won't have to do two alters on categorylinks.
I didn't test the upgrade-from-1.16 code path yet, so if anyone tests
that and it seems not to break, commenting to that effect would be
appreciated.

Also removed wfDeprecated() from archive().  Do *not* add this to
functions that are still actively used in core.  If you think this
function is so terrible that it really mustn't be used, remove callers
yourself, don't pester every single developer with messages in the hope
that someone else will do it for you.
2010-09-03 20:52:08 +00:00
Aryeh Gregor
5b132a4f47 Preserve cl_timestamp in updateCollation.php
For those crazy Wikinews people, and other DPL users.  Why do we use a
crazy auto-updating column type instead of specifying the current time
explicitly when we want to update it, again . . . ?
2010-08-04 00:29:20 +00:00
Aryeh Gregor
34db6f4b6f Use exact counts in updateCollation.php
There's no reason to avoid a one-time COUNT(*), is there?  It will be
free if collations are actually up-to-date, because the column is
indexed.
2010-08-03 21:11:16 +00:00
Aryeh Gregor
a30d4319a5 Sort pages in categories without namespace prefix
This removes $wgCategoryPrefixedDefaultSortkey and effectively always
makes it false.  The setting was added in the first place to hack around
the default, clearly broken behavior, but this just fixes it instead, so
the setting is no longer needed.

Running maintenance/updateCollation.php for the first time will fix
this, no need to run refreshLinks.php.  If you've already run
updateCollation.php, you can do UPDATE categorylinks SET cl_collation =
76; or such and then run the script again.
2010-08-03 20:50:31 +00:00
Aryeh Gregor
7ec501be6a Enable new category sort by default
Patch best viewed with whitespace changes ignored.  This will doubtless
introduce a bunch of bugs.  Please report any so I can fix them.  If
they're big enough and the fix isn't obvious, please revert.
2010-08-03 20:50:01 +00:00
Aryeh Gregor
2ffa5e4876 Fix bug in prefixing scheme
As Bawolff pointed out at [[mw:User talk:Simetrical/Collation]], the
prefixing scheme I was using meant that the page "Z" with sort key of
"F" would sort after a page named "A" with a sort key of "FF", since the
first one's raw sort key would compute to "FZ", and the second's would
compute to "FFA".  I've fixed this by separating the prefix from the
unprefixed part by a null byte (cl_sortkey is eventually going to be
totally binary anyway, may as well start now).
2010-07-26 22:04:19 +00:00
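
The ordering problem in the commit above, and the effect of the null-byte separator, can be reproduced with a short Python sketch. The function names are hypothetical; only the example pages, sort keys, and the separator come from the commit message.

```python
def raw_sortkey_concat(prefix, page_name):
    """Old scheme: prefix and page name simply concatenated."""
    return prefix + page_name


def raw_sortkey_separated(prefix, page_name):
    """Fixed scheme: a null byte separates the prefix from the page name."""
    return prefix + "\0" + page_name


# (sort key prefix, page name) pairs from the commit message.
pages = [("F", "Z"), ("FF", "A")]

concat_order = sorted(pages, key=lambda p: raw_sortkey_concat(*p))
separated_order = sorted(pages, key=lambda p: raw_sortkey_separated(*p))

print(concat_order)     # "FFA" sorts before "FZ", so page "A" comes first
print(separated_order)  # "F\0Z" sorts before "FF\0A", so page "Z" comes first
```

The separator works because the null byte compares lower than any character that can follow it, so a shorter prefix always sorts ahead of a longer prefix that starts the same way.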
Aryeh Gregor
022b7ba140 Reconcept cl_raw_sortkey as cl_sortkey_prefix
In response to feedback by Phillipe Verdy on bug 164.  Now if a bunch of
pages have [[Category:Foo| ]], they'll sort amongst themselves according
to page name, instead of in basically random order as it is currently.
This also makes storage more elegant and intuitive: instead of giving
NULL a magic meaning when there's no custom sortkey specified, we just
store an empty string, since there's no prefix.

This means {{defaultsort:}} really now means {{defaultsortprefix:}},
which is slightly confusing, and a lot of code is now slightly
misleading or poorly named.  But it should all work fine.

Also, while I was at it, I made updateCollation.php work as a transition
script, so you can apply the SQL patch and then run updateCollation.php
and things will work.  However, with the new schema it's not trivial to
reverse this -- you'd have to recover the raw sort keys with some PHP.
Conversion goes at about a thousand rows a second for me, and seems to
be CPU-bound.  Could probably be optimized.

I also adjusted the transition script so it will fix rows with collation
versions *greater* than the current one, as well as less.  Thus if some
site wants to use their own collation, they can call it 137 or
something, and if they later want to switch back to MediaWiki stock
collation 7, it will work.

Also fixed a silly bug in updateCollation.php where it would say "1000
done" if it did nothing, and changed $res->numRows() >= self::BATCH_SIZE
to == so people don't wonder how it could be bigger (since it can't, I
hope).
2010-07-26 19:27:13 +00:00
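
The change to which rows the transition script fixes, described in the commit above, amounts to replacing a "lower than current" check with a "different from current" check. A minimal Python sketch, with hypothetical function names and the version numbers 7 and 137 taken from the commit message:

```python
def needs_update_lower_only(row_version, current_version):
    """Earlier behaviour: only rows with an older collation version are fixed."""
    return row_version < current_version


def needs_update_any_mismatch(row_version, current_version):
    """Adjusted behaviour: any row whose version differs is fixed."""
    return row_version != current_version


# A site used its own collation 137 and now switches back to stock collation 7.
stored_versions = [6, 7, 137]
current = 7

lower_only = [v for v in stored_versions if needs_update_lower_only(v, current)]
any_mismatch = [v for v in stored_versions if needs_update_any_mismatch(v, current)]

print(lower_only)    # rows at version 137 are missed
print(any_mismatch)  # rows at version 137 are picked up as well
```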
Aryeh Gregor
3783aa2a3c Add non-identity collation, with migration script
It seemed to work correctly, with the newly-created page "bob" sorting
as "BOB", but then I nuked all my cl_sortkey by running the migration
script before refreshLinks.php had finished running, so I'll have to
wait a while to see if it works properly with a non-messed-up database.
It's possible there's something wrong with the display of section
letters in the categories, but otherwise I think this is working right.
2010-07-23 20:58:11 +00:00