55 commits

Author SHA1 Message Date
jenkins-bot
fcef0f1cea Merge "avoid link cache issues with duplicate title keys for xml dumps" 2019-04-09 14:00:38 +00:00
jenkins-bot
06aaf493e1 Merge "redo: don't die producing xml files if rev text export conversion fails" 2019-04-08 18:55:30 +00:00
Ariel T. Glenn
ff12075282 avoid link cache issues with duplicate title keys for xml dumps
Bug: T220316
Change-Id: If73d6c9b4cac298a7832d65ffa34bc8f69b87752
2019-04-08 21:46:54 +03:00
Derick Alangi
84292b7728 Replace deprecated function wfEscapeShellArg with Shell::escape()
Change-Id: I4046d593d1450cfffc489ca2abadba1084a540e4
2019-04-07 20:17:39 +01:00
Ariel T. Glenn
804b7f1f0f for exports, make sure we compare page titles as strings only
...and not as numbers!! Also added strict compare for the namespaces
field while we're in here.

Bug: T220257
Change-Id: If68b79334188c2f3be5d254bea3c1e27d52c4a9f
2019-04-06 13:01:33 +03:00
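The commit message does not show the code, so the following is only a sketch of the general hazard it names: when two numeric-looking titles are compared as numbers, distinct titles can collide. The helper names `loose_equal` and `strict_equal` are hypothetical, and Python's `float()` is used as a stand-in for a numeric comparison; it is not the exporter's actual logic.

```python
def loose_equal(a: str, b: str) -> bool:
    """Compare as numbers when both strings look numeric, else as strings.

    Hypothetical helper that imitates a loose, number-aware comparison.
    """
    try:
        return float(a) == float(b)
    except ValueError:
        return a == b


def strict_equal(a: str, b: str) -> bool:
    """Compare titles as strings only."""
    return a == b


# Two different page titles that happen to look like the same number.
first_title, second_title = "1e3", "1000"

collides_when_loose = loose_equal(first_title, second_title)
collides_when_strict = strict_equal(first_title, second_title)
```

With the loose comparison the two titles are treated as the same page; with the string-only comparison they stay distinct.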
Ariel T. Glenn
7fdcc1d319 redo: don't die producing xml files if rev text export conversion fails
Regression introduced in If4c31b7975b4d901afa8c194c10446c99e27eadf

Bug: T217329
Change-Id: I003a8c230db293d37ae05e0157b3447775a95e59
2019-04-01 18:30:35 +03:00
jenkins-bot
b9789f9c56 Merge "add lbzip2 output processor for exports" 2019-03-23 23:35:29 +00:00
Ariel T. Glenn
b01ff36537 add lbzip2 output processor for exports
Bug: T214293
Change-Id: I98e26b833df473bbeb3dc1b881f428174d776b64
2019-03-24 01:20:04 +02:00
jenkins-bot
9aed0482f4 Merge "don't die producing xml files if rev text export conversion fails" 2019-03-22 09:22:41 +00:00
daniel
45f3912bf1 Make the XML dump schema version configurable.
Bug: T174031
Change-Id: I979b6c8f0a72bc1f5ecce1d499d3fdfa0f671588
2019-03-21 12:43:32 +01:00
jenkins-bot
68b12dfded Merge "Make BackupDumper MCR compatible (main slot only)" 2019-03-20 02:15:43 +00:00
daniel
5988e35505 Make BackupDumper MCR compatible (main slot only)
This makes BackupDumper compatible with the new mechanism for accessing
revision content.

This requires some changes to the way database connections are re-used,
since RevisionStore/SqlBlobStore needs to be able to run queries against
the database while the overall result set is being streamed.

This change does not yet add handling for extra slots to BackupDumper.
That first needs a spec for how extra slots will be represented in the
XML schema (T174031).

NOTE: this changes the output of fetchText from using integer text_id
values to using content_address values (e.g. "tt:4567" for text row
with old_id 4567). It also changes fetchText to accept such addresses
as input, for forward-compatibility. XML stub dumps still use the
numeric format in the id attribute, pending T199121.

Bug: T198706
Change-Id: If4c31b7975b4d901afa8c194c10446c99e27eadf
2019-03-14 13:19:51 +00:00
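The NOTE in this commit describes two accepted input shapes: a content address such as "tt:4567" and the older bare integer text_id. A minimal sketch of accepting both, assuming the only address form is `tt:` followed by decimal digits (the commit gives just that one example); `parse_text_address` is a hypothetical name, not a function from the codebase.

```python
import re

# Assumption: an address is either "tt:<digits>" or bare "<digits>".
_ADDRESS_RE = re.compile(r"(?:tt:)?([0-9]+)")


def parse_text_address(address: str) -> int:
    """Return the numeric text row id for 'tt:4567' or '4567'."""
    match = _ADDRESS_RE.fullmatch(address)
    if match is None:
        raise ValueError(f"unrecognized text address: {address!r}")
    return int(match.group(1))


new_style = parse_text_address("tt:4567")
old_style = parse_text_address("4567")
```

Both calls resolve to the same row id, which is the forward-compatibility property the NOTE describes.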
Ariel T. Glenn
45831b2213 don't die producing xml files if rev text export conversion fails
In abstracts for the specific case, we don't care at all, since the
problem is that it's a self redirect. Redirects are filtered out of
the stream at the end so it won't even show up.

In anything else, we do what dumpTextPass does already, which is to
leave the text alone and emit it as is.

Bug: T217329
Change-Id: I39cdf89531c67962b1a9bba4e0a91f7c655ad6f3
2019-03-14 01:16:24 +02:00
Aaron Schulz
cb15755e92 Normalize use of "INNER JOIN" to "JOIN" in database queries
The ANSI SQL default join type is INNER and this might save
some line breaks here and there.

Change-Id: Ibd39976f46ca3f9b71190d3b60b76ca085787a00
2019-03-06 09:17:30 -08:00
Brian Wolff
a848eae679 Use htmlspecialchars() not htmlentities in xml export for validity
htmlentities() can output entity references that are invalid in XML.
Use htmlspecialchars() instead.

Additionally, cast user-id to int for phan-taint-check

Bug: T216348
Change-Id: Idf781f5a3ffc3c6463969b3f5af63f0f08ae837c
2019-02-17 11:23:50 +00:00
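The validity problem can be reproduced outside PHP. The sketch below uses Python's standard library as a stand-in: a named HTML entity such as `&eacute;` is not defined in plain XML, so a parser rejects it, while escaping only the XML-special characters keeps the document well formed. The `<comment>` element and the `is_well_formed` helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# What an HTML-oriented escaper might emit: a named entity XML does not define.
with_named_entity = "<comment>caf&eacute;</comment>"

# Escaping only &, < and > leaves other characters as literal text.
with_minimal_escaping = "<comment>" + escape("café & <b>") + "</comment>"

named_ok = is_well_formed(with_named_entity)
minimal_ok = is_well_formed(with_minimal_escaping)
```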
Kunal Mehta
cc5d9a92a2 build: Updating mediawiki/mediawiki-codesniffer to 24.0.0
Change-Id: I66b1775b7c1d36076d9ca78cbeb42787a743f2aa
2019-02-07 18:39:42 +00:00
Thiemo Kreuz
ed96e6f1a7 export: Mark DumpFilter::pass() as being protected
I used
https://codesearch.wmflabs.org/search/?q=-%3Epass%5C(
to make sure there really is no other call to this. This function really
is meant to be protected.

I also used
https://codesearch.wmflabs.org/search/?q=function%20pass%5C(&files=php
to make sure I got all subclasses.

Required for I7da632c43681438aa886bdb709379f10cd9cc658.

Change-Id: I9aaf95c66a6efa22131de627ce015587a109858b
2019-01-11 18:39:40 +00:00
Max Semenik
c70119302d Don't check for LIBXML_PARSEHUGE presence
It's been present since PHP 5.3.2.

Change-Id: I23a3c50c10e984abe6ff214fbf504ab6f6be763c
2019-01-07 19:32:39 -08:00
Alangi Derick
0848d0e607 export: Fix return value of write() function in DumpOutput.php
Per http://php.net/manual/en/function.print.php, print always returns
1 (integer value of one) but write() doesn't actually return this value
to its caller. So "@return bool" in this case doesn't make sense as
one will think it's returning a bool type. `write()` only takes a string
and prints it, and if we really want to return its value (in this case),
it will be "@return int" not data of type bool.

Change-Id: I45b4a157cde2f768d77dda76d6ae0caa47e28f20
2018-12-03 12:28:48 +01:00
Jakub Vrana
3559bca6f7 export: Do not pass unused parameter
Found by PHPStan.

Change-Id: I0e4971a3d5a4170ada776cd88ea664806f23c6ed
2018-12-01 23:50:22 +00:00
Bill Pirkle
94ec06e0bf Fix for missing end tag </page> on some exports
T203424 replaced streaming mode with batched queries.
However, it did not properly handle some values of
the $wgExportMaxHistory config variable, and emitted
broken XML. This change fixes that issue.

Bug: T207974
Change-Id: Iade3fc603e513da51b7a970c16275516c02ede49
2018-10-31 18:02:02 -05:00
Ariel T. Glenn
1113b1203c fix stubs dump query to use straight join
regression from a combination of
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/380669/
and
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/459885/

We have to do a straight join for all stubs, since they all
now order revisions by rev_id after deployment of the second
patchset above.

Bug: T207628
Change-Id: I4d2a311c14c66d4813eb9fc3c587fa3ddb958454
2018-10-23 18:33:02 +03:00
Bill Pirkle
085b6e4787 Replace WikiExporter streaming (unbuffered) mode with batched queries
WikiExporter allows streaming mode, using unbuffered mode on
the database connection. We are moving away from this technique.
Instead, do multiple normal queries and retrieve the
information in batches.

Bug: T203424
Change-Id: I582240b67c91a8be993a68c23831c9d86617350e
2018-09-28 10:55:05 -05:00
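The batching approach described here can be sketched as follows. This is not WikiExporter's code: it uses an in-memory SQLite table, a simplified two-column `revision` table, and a hypothetical `iter_revisions` helper, and it assumes rows are paged by an increasing `rev_id`.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_revisions(conn: sqlite3.Connection,
                   batch_size: int = 2) -> Iterator[Tuple[int, int]]:
    """Yield all rows using several ordinary queries instead of one stream."""
    last_id = 0
    while True:
        # Each query is fully fetched, so the connection is free between batches.
        rows = conn.execute(
            "SELECT rev_id, rev_page FROM revision "
            "WHERE rev_id > ? ORDER BY rev_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]  # resume after the last row seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER)")
conn.executemany("INSERT INTO revision VALUES (?, ?)",
                 [(i, i % 3) for i in range(1, 6)])

all_rows = list(iter_revisions(conn))
```

Because each batch is a complete result set, other queries can run on the same connection between batches, which an unbuffered stream does not allow.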
daniel
7c2d0202ab Allow dumps to function with MCR in read-new mode.
This is a temporary fix that forces the dump code to continue to
use the old database schema, even when MCR is in
SCHEMA_COMPAT_READ_NEW mode. This will continue to function until
SCHEMA_COMPAT_WRITE_OLD mode is disabled.

Bug: T198561
Change-Id: Ic54ee703f47d1843f70fdb7185ac1b098f148680
2018-09-07 14:18:40 +02:00
Aryeh Gregor
90d4f56fe4 Mass conversion of $wgContLang to service
Brought to you by vim macros.

Bug: T200246
Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a
2018-08-11 22:44:29 -06:00
Umherirrender
130ec2523d Fix PhanTypeMismatchDeclaredParam
Auto fix MediaWiki.Commenting.FunctionComment.DefaultNullTypeParam sniff

Change-Id: I865323fd0295aabd06f3e3c75e0e5043fb31069e
2018-07-07 00:34:30 +00:00
Max Semenik
8a82749ce9 Don't use deprecated NS_IMAGE*
Change-Id: I3df00c8e55a79baa3f3727daeea6c9441113aebc
2018-03-16 22:58:58 -07:00
Brad Jorsch
27c61fb1e9 Add actor table and code to start using it
Storing the user name or IP in every row in large tables like revision
and logging takes up space and makes operations on these tables slower.
This patch begins the process of moving those into one "actor" table
which other tables can reference with a single integer field.

A subsequent patch will remove the old columns.

Bug: T167246
Depends-On: I9293fd6e0f958d87e52965de925046f1bb8f8a50
Change-Id: I8d825eb02c69cc66d90bd41325133fd3f99f0226
2018-02-23 10:06:20 -08:00
addshore
e5879da149 Pass $key into CommentStore methods and use MediawikiServices
This allows CommentStore to be added to MediaWikiServices
without the need of an additional Factory.

This change includes a compatibility layer to allow the behaviour
from 1.30 to continue to be used while deprecated.

CommentStore::newKey has been deprecated.
Keys are now passed into the public methods of CommentStore
where needed.
The following CommentStore methods have had their signatures changed
to introduce a $key parameter, but when used in conjunction with
CommentStore::newKey behaviour will remain unchanged:
  * CommentStore::getFields
  * CommentStore::getJoin
  * CommentStore::getComment
  * CommentStore::getCommentLegacy
  * CommentStore::insert
  * CommentStore::insertWithTemplate

Change-Id: I3abb62a5cfb0dcd456da9f4eb35583476ae41cfb
2018-02-05 15:34:12 +00:00
Chad Horohoe
93d44c9a42 Move BaseDump into includes/export/
There's no reason for this to have to live in Maintenance land. It's
generally useful and lets us avoid some random require/include calls

Change-Id: I60419c7f9fc52313905053bbeb3aa81666c9160c
2018-01-08 22:10:25 -08:00
Albert221
6a47a03236 Fix autoloading of ExportProgressFilter
Bug: T177239
Change-Id: Ieb5d5aa78d569af8cd8f8bfa32ce10a33482cb84
2017-12-13 22:04:59 +01:00
Reedy
7a836958be Run strval() over the File description
Bug: T176090
Change-Id: I8488666c221a1bd4e4e063291e74819a07a4a20f
2017-09-18 01:00:10 +01:00
Brad Jorsch
11cf01dd9a Add comment table and code to start using it
A subsequent patch will remove the old columns.

Bug: T166732
Change-Id: Ic3a434c061ed6e443ea072bc62dda09acbeeed7f
2017-08-30 15:05:00 +10:00
Umherirrender
718e63694d Add missing @param and @return documentation
Change-Id: I1d1098eec3933df6561cceef646576013ddc08c8
2017-08-11 22:17:01 +02:00
Umherirrender
a9007e8baf Add missing & to @param documentation to match function call
Change-Id: I81e68310abcbc59964b22e0e74842d509f6b1fb9
2017-08-11 18:47:46 +02:00
Kunal Mehta
d1cf48a397 build: Update mediawiki/mediawiki-codesniffer to 0.10.1
And auto-fix all errors.

The `<exclude-pattern>` stanzas are now included in the default ruleset
and don't need to be repeated.

Change-Id: I928af549dc88ac2c6cb82058f64c7c7f3111598a
2017-07-22 18:24:09 -07:00
Umherirrender
b5cddfb27b Remove empty lines at begin of function, if, foreach, switch
Organize phpcs.xml a bit

Change-Id: Ifb767729b481b4b686e6d6444cf48b1f580cc478
2017-07-01 11:34:16 +00:00
Aaron Schulz
488a647831 Move IDatabase/IMaintainableDatabase to Rdbms namespace
Change-Id: If7e8a8ff574661fd827de8bcec11d2c39a687300
2017-03-28 15:32:38 -07:00
jenkins-bot
22806b0a45 Merge "Handle missing namespace prefix in XML dumps more gracefully" 2017-03-08 05:07:57 +00:00
Aaron Schulz
e01fd44388 Move ResultWrapper subclasses to Rdbms
Change-Id: I6f3f0e85e268b24c57c537aa6ad8016e0b4cdddb
2017-03-03 00:44:41 +00:00
Bartosz Dziewoński
ecdef925bb Miscellaneous indentation tweaks
I was bored. What? Don't look at me that way.

I mostly targeted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.

Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
2017-02-27 19:23:54 +01:00
Brad Jorsch
fb3ae6fbe3 Replace use of &$this
Use of &$this doesn't work in PHP 7.1. For callbacks to methods like
array_map() it's completely unnecessary, while for hooks we still need
to pass a reference and so we need to copy $this into a local variable.

Bug: T153505
Change-Id: I8bbb26e248cd6f213fd0e7460d6d6935a3f9e468
2017-01-31 23:01:54 -05:00
This, that and the other
ef8bc825c6 Handle missing namespace prefix in XML dumps more gracefully
If an XML dump of a wiki is exported using dumpBackup.php, and there are
pages in a namespace that is not registered (perhaps because of a missing
extension), they will appear in the dump in the form

<page> ... <title>PageTitle</title> <ns>1234</ns> ... </page>

This caused the ForeignTitle code to raise an undefined offset error,
because it assumed that the <title> element was of the form
"Namespace:PageTitle" when <ns> was nonzero. This assumption is not valid.

Now, the importation of such dumps will no longer throw errors and the
pages will be correctly imported, although possibly to unexpected
locations.

Bug: T114115
Change-Id: I0271435dc208e7ea118339584f8a0e359c96113a
2017-01-01 09:11:45 +00:00
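The faulty assumption described in this commit can be illustrated with a small sketch. It is not the ForeignTitle code: `split_foreign_title` is a hypothetical helper, and the rule it applies (only split when a colon is actually present) is an assumption about what "more gracefully" means here.

```python
from typing import Optional, Tuple


def split_foreign_title(title: str, ns: int) -> Tuple[Optional[str], str]:
    """Split 'Namespace:PageTitle' only when a prefix is really there."""
    if ns != 0 and ":" in title:
        prefix, page = title.split(":", 1)
        return prefix, page
    # Main namespace, or a nonzero <ns> whose <title> carries no prefix.
    return None, title


registered = split_foreign_title("Talk:Foo", 1)
unregistered = split_foreign_title("PageTitle", 1234)
```

A version that always indexed the second element after splitting on ":" would fail on the second call, which corresponds to the undefined offset error mentioned above.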
jenkins-bot
c8d361a380 Merge "Export: Use BCP 47 language code for attribute xml:lang" 2016-12-14 09:21:35 +00:00
Fomafix
155ee515d4 Export: Use BCP 47 language code for attribute xml:lang
The patch changes for example
 https://crh.wikipedia.org/wiki/Mahsus:Export/Ba%C5%9F_Saife
from
 xml:lang="crh-latn"
to
 xml:lang="crh-Latn"

Change-Id: I2fb218fe026c5ffee081fb8aaee7b154a8732bdc
2016-12-13 20:23:25 +01:00
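The casing change in this commit follows the usual BCP 47 convention. A simplified sketch, assuming only the common rules (language subtag lowercase, four-letter script subtag in title case, two-letter region subtag uppercase); the function name `bcp47` is illustrative, and special cases such as private-use subtags after `x` are ignored.

```python
def bcp47(code: str) -> str:
    """Apply conventional BCP 47 casing to a lowercase language code."""
    parts = code.lower().split("-")
    result = [parts[0]]  # primary language subtag stays lowercase
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result.append(part.capitalize())  # script, e.g. Latn
        elif len(part) == 2 and part.isalpha():
            result.append(part.upper())       # region, e.g. TW
        else:
            result.append(part)
    return "-".join(result)


xml_lang = bcp47("crh-latn")
```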
Fomafix
202f695f67 Update weblinks in comments from HTTP to HTTPS
Use HTTPS instead of HTTP where the HTTP link is a redirect to the HTTPS link.

Also update some broken links.

Change-Id: Ic3a5eac910d098ed5c2a21e9f47c9b6ee06b2643
2016-11-07 15:24:46 +01:00
Kevin Israel
be46ffa771 DumpStringOutput: Rename getOutput() to __toString()
Though getOutput() is what first came to mind, I do not particularly
like the name, partly because it is used in many, many places as a
method that returns an OutputPage object.

* Renamed the method to __toString(). This is appropriate because
  each instance, at any given time, corresponds to a single string
  value (and exceptions cannot occur during this conversion).
* Removed unnecessary variables from ApiQuery and ExportTest. In
  these and most other cases, it should no longer be necessary to
  call getOutput() explicitly.

Change-Id: Icf202743d1f332f8981338f42eb6e3e5a04abdf1
2016-07-14 06:28:16 -04:00
Kevin Israel
81d5d8adc2 ApiQuery: Don't mess with PHP output buffering
Specifically, it is not necessary to use output buffering functions
to capture XML generated by the export code because it is already
possible to set the "output sink" object to be used.

* Created a DumpStringOutput class, which appends all output to a
  string property rather than printing output immediately.
* Used that class, instead of ob_start() and ob_get_clean(), in
  ApiQuery and ExportTest.

Change-Id: I238f5d5ec7fd442c845b25cb59ef81ac3285099f
2016-07-08 18:30:55 -04:00
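The output-sink idea in this commit can be sketched in a few lines. This is an illustration rather than the DumpStringOutput class itself: `StringSink` and `export_page` are hypothetical names, and the exporter is reduced to a function that writes three fragments.

```python
from xml.sax.saxutils import escape


class StringSink:
    """Collects everything written to it instead of printing immediately."""

    def __init__(self) -> None:
        self._parts = []

    def write(self, text: str) -> None:
        self._parts.append(text)

    def __str__(self) -> str:
        return "".join(self._parts)


def export_page(sink: StringSink, title: str) -> None:
    """Stand-in exporter: it only knows how to write to a sink."""
    sink.write("<page>")
    sink.write("<title>" + escape(title) + "</title>")
    sink.write("</page>")


sink = StringSink()
export_page(sink, "Main Page")
captured = str(sink)
```

Because the caller chooses the sink, no global output buffering is needed to capture the generated XML.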
Ariel T. Glenn
327d8c8b54 add option to XML dump stubs of page ranges with explicit rev_id ordering
tested for stubs, text, logging with and without start/end values,
with and without orderrevs, seems to work as expected, with the
appropriate changes to the query.

Bug: T29112
Change-Id: I94ca4a06235bdbed384bb997deb7432bb5aaa5b9
2016-06-22 22:05:55 +03:00
Reedy
b5656b6953 Many more function case mismatches
Change-Id: I5d3a5eb8adea1ecbf136415bb9fd7a162633ccca
2016-03-19 00:20:58 +00:00