
87 commits

Author SHA1 Message Date
Umherirrender
0688dd7c6d Set method visibility for various constructors
Change-Id: Id3c88257e866923b06e878ccdeddded7f08f2c98
2019-12-03 20:17:30 +01:00
daniel
6d505204e5 XmlDumpWriter: emit xml:space only if text is present.
This patch restores the old behavior of omitting the xml:space attribute
on empty <text> tags in stub dumps and suppressed revisions.

Bug: T228763
Change-Id: I12e72a3f4f3583e4e41daa11a9a28a96cadf7725
2019-11-25 20:58:07 +00:00
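The rule restored by 6d505204e5 can be shown as a minimal Python sketch. It is for illustration only. The real XmlDumpWriter is PHP, and the helper name text_element() and its attribute handling are hypothetical. The point is that xml:space="preserve" is added only when there is text to preserve, so empty stub or suppressed <text> tags stay bare.

```python
from xml.sax.saxutils import escape, quoteattr


def text_element(text, attrs=None):
    """Render a <text> element, adding xml:space only when text is present.

    Hypothetical helper; attribute names other than xml:space are illustrative.
    """
    attrs = dict(attrs or {})
    if text:
        # Whitespace only matters when there is content to preserve.
        attrs["xml:space"] = "preserve"
    attr_str = "".join(f" {name}={quoteattr(str(value))}" for name, value in attrs.items())
    if not text:
        # Stub dumps and suppressed revisions: an empty, self-closing tag.
        return f"<text{attr_str} />"
    return f"<text{attr_str}>{escape(text)}</text>"


print(text_element("", {"bytes": 0}))  # <text bytes="0" />
print(text_element("Hello & bye"))     # <text xml:space="preserve">Hello &amp; bye</text>
```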
daniel
d9209707cc WikiExporter: Remove unnecessary check for SCHEMA_COMPAT_WRITE_OLD flag
WikiExporter used to require SCHEMA_COMPAT_WRITE_OLD to be enabled,
until that requirement was fixed in I5ea972bb07ca1cfb3a2ad8ef120aef7.
However, I failed to remove the explicit check for the flag at the
time, causing all exports to fail in SCHEMA_COMPAT_NEW mode. This
change removes the obsolete check.

Bug: T236735
Change-Id: I809ed4e2f1f30fdc4bd817f815d733d8a62f3d4f
2019-10-28 20:46:30 +00:00
Max Semenik
bdf7e3f5bd Set constant visibility, part 1
Change-Id: I3dad26b1a0bd469fa84fee5c15d9b581765ceb94
2019-10-18 02:19:24 +00:00
Daimona Eaytoy
e3412efac3 Unsuppress PhanParamReqAfterOpt, use PHP71 nullable types
These were all checked with codesearch to ensure nothing is overriding
these methods.
For the most part, I've updated the signature to use nullable types; for
two Pagers, I've just made all parameters non-optional, because you're
already forced to pass them with a required parameter at the end.

Bug: T231636
Change-Id: Ie047891f55fcd322039194cfa9a8549e4f1f6f14
2019-10-10 11:53:58 +02:00
Daimona Eaytoy
1927fda909 export: Align docs of close(Rename/Reopen) methods
The base implementation says it can accept an array with a single
element, but the subclasses only had `string` in the docblock (although
they could handle the array case). Hence, replace docblocks in
subclasses with @inheritDoc to copy the parent description and avoid
such discrepancies in the future.

Plus, change `array` to `string[]` for better type inference.

Change-Id: Ica9929fd50f31d8d5f0e29f7c60364086ea39ae5
2019-09-19 17:29:30 +00:00
Umherirrender
f74400487f phan: Disable enable_class_alias_support
It is enabled for b/c in extensions, but not needed in core

Change-Id: I51dca12be9c77049f77563d9bf0edd07928c2300
2019-09-15 08:26:52 +00:00
jenkins-bot
d2f799f103 Merge "Declare dynamic properties"
2019-09-13 21:49:14 +00:00
Daimona Eaytoy
9699158f74 Declare dynamic properties
This is for all classes with 2 or more undeclared properties.

Change-Id: I1d80deb31f331bcc277b33f9e9f74857ba825637
2019-09-13 17:54:37 +00:00
jenkins-bot
5d1b19ed47 Merge "Avoid Database::tableName in WikiExporter"
2019-09-12 22:36:03 +00:00
Umherirrender
5a4d30ed09 Avoid Database::tableName in WikiExporter
Using * in a select is not the preferred way.
List all needed columns to make their use visible and to avoid issues when
new fields with large data get added.
As each column name is unique, there is no need to get the table name for
prefixing the columns.

The following columns are no longer selected:
- log_user_text -> not used, since the ActorMigration class is used
- log_actor -> Added by the ActorMigration class
- log_comment_id -> Added by CommentStore
- log_page -> Unused in the writer; the ns/title pair is used instead

Move the arrays out of the loop, because they do not depend on
values that change in the loop.

Change-Id: I140641b7ed75bc2b8db2e7612020d668f1be663b
2019-09-12 20:06:07 +02:00
Daimona Eaytoy
b5cbb5ab3f Upgrade phan config to 0.7.1
This allows us to remove many suppressions for phan false positives.

Bug: T231636
Depends-On: I82a279e1f7b0fdefd3bb712e46c7d0665429d065
Change-Id: I5c251e9584a1ae9fb1577afcafb5001e0dcd41c7
2019-09-04 08:20:53 +00:00
Daimona Eaytoy
c659bc6308 Unsuppress another phan issue (part 7)
Bug: T231636
Depends-On: I2cd24e73726394e3200a570c45d5e86b6849bfa9
Depends-On: I4fa3e6aad872434ca397325ed7a83f94973661d0
Change-Id: Ie6233561de78457cae5e4e44e220feec2d1272d8
2019-09-03 17:19:21 +00:00
Daimona Eaytoy
5eac6d131c Unsuppress more phan issues (part 3)
Bug: T231636
Depends-On: I78354bf5f0c831108c8f606e50c87cf6bc00d8bd
Change-Id: I58e67c2b38389df874438deada4239510d21654f
2019-08-31 16:38:55 +00:00
jenkins-bot
bd62b3562a Merge "Improve type hints in export related classes"
2019-07-25 21:42:32 +00:00
Ariel T. Glenn
a27820692f make XmlDumpWriter more resilient to blob store corruption
Loading content can also throw InvalidArgumentException when
the content address refers to an unknown cluster.

Bug: T228720
Change-Id: I313f9a5a27b21a33e90639abae3f505640c30e23
2019-07-24 08:59:38 +03:00
daniel
30bb36f210 Make XmlDumpWriter resilient to blob store corruption.
In the WMF databases, we have several revisions for which we cannot
load the content. They typically (but not necessarily) have
content_address = "tt:0" and content_sha1 = "" and rev_sha1 = ""
and content_size = 0 and rev_len = 0.

This patch makes sure we can still generate dumps in the presence of
such revisions.

Bug: T228720
Change-Id: Iaadad44eb5b5fe5a4f2e60da406ffc11f39c735b
2019-07-23 13:59:57 +02:00
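The resilience added in 30bb36f210, and extended in a27820692f, amounts to "a revision whose content cannot be loaded must not abort the dump." A minimal Python sketch of that loop follows. The names dump_revisions() and load_blob are hypothetical. ValueError stands in for PHP's InvalidArgumentException, and LookupError stands in for a missing blob. The sketch does not claim to reproduce what XmlDumpWriter actually emits for such revisions.

```python
import logging


def dump_revisions(revisions, load_blob):
    """Yield (rev_id, text) pairs, with text=None when the blob cannot be loaded.

    `revisions` is an iterable of (rev_id, content_address) tuples and
    `load_blob` is a hypothetical callable standing in for the blob store.
    """
    for rev_id, address in revisions:
        try:
            text = load_blob(address)
        except (LookupError, ValueError) as exc:
            # LookupError: no such blob (e.g. a "tt:0" address).
            # ValueError: stand-in for PHP's InvalidArgumentException,
            # e.g. an address naming an unknown cluster.
            logging.warning("cannot load %s for revision %s: %s", address, rev_id, exc)
            text = None
        # Keep going: one bad revision must not abort the whole dump.
        yield rev_id, text


store = {"tt:1": "Hello"}
print(list(dump_revisions([(1, "tt:1"), (2, "tt:0")], store.__getitem__)))
# [(1, 'Hello'), (2, None)]
```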
Ariel T. Glenn
accecbc9a8 don't load revision text content unless requested to
Bug: T228614
Change-Id: Idef4d9684560110a16c6a7c074402c5a5a6e59db
2019-07-22 10:50:06 +03:00
Derick Alangi
339211a1ea Avoid usage of deprecated Revision::* constants, use RevisionRecord
Change-Id: I872fc89e5c02dd6a3ae9cd7e76640b95dc33f514
2019-07-21 15:03:03 +01:00
Daimona Eaytoy
148b239f03 Add fields and docs to WikiExporter
Three fields were undeclared, thus raising some phan warnings.

Change-Id: Ib7934b507cb69d29a3d2422dadc24b12207a12ad
2019-07-09 10:33:03 +02:00
Umherirrender
f5fa7a94d9 Improve type hints in export related classes
Change-Id: I3a11173bc96611c69cdc615eba741c6e4f92824a
2019-07-05 18:41:56 +00:00
Ariel T. Glenn
6531479e78 Restore previous export behavior with respect to empty comment text
Bug: T174031
Change-Id: I0df1be8cb832e94ecda3db57b5fee5922a866aea
2019-07-03 12:10:44 +03:00
daniel
fdc3e9f952 Add support for xml dump schema 0.11
Bug: T174031
Change-Id: I2717019ea7efe36694bd2b2fba4dc2952a987cfc
2019-06-27 21:56:01 +00:00
daniel
dd14601afb Join slot and content tables when dumping XML
This introduces a way to construct a RevisionRecord based on a
known set of SlotRecords. To allow this to be used consistently
with the legacy revision schema, some tweaks had to be made
to getSlotsQueryInfo().

Bug: T220493
Change-Id: I5ea972bb07ca1cfb3a2ad8ef120aef77e460745c
2019-06-27 22:26:22 +02:00
Aaron Schulz
1fb1494c93 Use IResultWrapper in code comments instead of ResultWrapper
Change-Id: Idb813c20bef0d41d0f9f01440daab4fee6cdb38d
2019-06-22 17:58:39 +00:00
Derick Alangi
2dca5bbbf5 Remove unnecessary semi-colons
Change-Id: I9eb65bdfbd3aa581effc14ead801b9e89b0359c3
2019-06-12 14:35:59 +01:00
Derick Alangi
21e2d71560 Replace some uses of deprecated wfFindFile() and wfLocalFile()
These global functions were deprecated in 1.34, and services were made
available to replace them:

* wfFindFile() - MediaWikiServices::getInstance()->getRepoGroup()->findFile()
* wfLocalFile() - MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo()->newFile()

NOTES:

* wfFindFile() and wfLocalFile() usages in tests have been left alone
  in this change, per @Timo's comments about the state of objects.

* includes/upload/UploadBase.php is also left unchanged for now, as
  changing it causes some failures I don't fully understand; I will
  investigate and handle it in a follow-up patch.

* includes/MovePage.php is also left unchanged.

Change-Id: I9437494de003f40fbe591321da7b42d16bb732d6
2019-06-11 13:26:37 +00:00
Ariel T. Glenn
e8805741ed make sure revision uids are 0 in the xml if missing/0 in the db
Bug: T224221
Change-Id: Id9861866fd9e4d2fe8d151c9631403aa24b9a779
2019-06-03 15:49:11 +00:00
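The uid normalization in e8805741ed can be sketched in a few lines of Python. The helper name and the shape of the raw value are assumptions. The idea is that a missing, empty, or zero user id in the database is written out as the integer 0 rather than as an empty value.

```python
def contributor_id(raw):
    """Return the user id to write to the XML, using 0 when it is missing.

    `raw` is a hypothetical DB value: None, "", "0" or a numeric string.
    """
    # `raw or 0` maps None and "" to 0; int() also normalizes "0" and "42".
    return int(raw or 0)


print(contributor_id(None))  # 0
print(contributor_id("42"))  # 42
```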
Ariel T. Glenn
c27ced4f31 always order by page_id for dumps of current revisions
Also drop ordering of revs within pages, since only one revision
per page is being dumped.

Bug: T207628
Change-Id: I5e4f0bea7b54506ca389818407c43152a290da6e
2019-05-27 18:42:54 +00:00
jenkins-bot
284778405b Merge "allow xml page content or metadata dumps to target specific namespaces"
2019-05-13 06:51:45 +00:00
Aryeh Gregor
2e1ac38485 Mass conversion to NamespaceInfo
Change-Id: I2fef157ceec772f304c0923a1cd8c0eef2e82a0f
2019-05-07 22:44:56 +02:00
Ariel T. Glenn
7f51b9e040 allow xml page content or metadata dumps to target specific namespaces
We don't alter the db query for this, but throw away the extraneous
rows before doing any processing on them whatsoever.

The DumpNamespaceFilter is applied too late to avoid the per-revision
processing done in XmlDumpWriter::writeRevision.

Bug: T220940
Change-Id: I9cb30ce612d862d97d96720ac68ff2327409f485
2019-04-18 14:42:50 +03:00
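The approach in 7f51b9e040 leaves the query alone and drops rows from unwanted namespaces before any further work. A minimal Python sketch follows. The row dictionaries are hypothetical, and only the column name page_namespace is taken from MediaWiki's page table.

```python
def filter_by_namespace(rows, namespaces):
    """Yield only rows whose page_namespace is in `namespaces`.

    The query is left unchanged; unwanted rows are dropped here, before
    any per-row processing happens.
    """
    wanted = {int(ns) for ns in namespaces}
    for row in rows:
        # DB drivers may hand back numbers as strings, so normalize first.
        if int(row["page_namespace"]) in wanted:
            yield row


rows = [
    {"page_namespace": "0", "page_title": "Main_Page"},
    {"page_namespace": "1", "page_title": "Main_Page"},
    {"page_namespace": "4", "page_title": "About"},
]
print([r["page_title"] for r in filter_by_namespace(rows, [0, 4])])
# ['Main_Page', 'About']
```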
jenkins-bot
fcef0f1cea Merge "avoid link cache issues with duplicate title keys for xml dumps"
2019-04-09 14:00:38 +00:00
jenkins-bot
06aaf493e1 Merge "redo: don't die producing xml files if rev text export conversion fails"
2019-04-08 18:55:30 +00:00
Ariel T. Glenn
ff12075282 avoid link cache issues with duplicate title keys for xml dumps
Bug: T220316
Change-Id: If73d6c9b4cac298a7832d65ffa34bc8f69b87752
2019-04-08 21:46:54 +03:00
Derick Alangi
84292b7728 Replace deprecated function wfEscapeShellArg with Shell::escape()
Change-Id: I4046d593d1450cfffc489ca2abadba1084a540e4
2019-04-07 20:17:39 +01:00
Ariel T. Glenn
804b7f1f0f for exports, make sure we compare page titles as strings only
...and not as numbers! Also added a strict comparison for the namespace
field while we're in here.

Bug: T220257
Change-Id: If68b79334188c2f3be5d254bea3c1e27d52c4a9f
2019-04-06 13:01:33 +03:00
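Numeric comparison is dangerous for titles because PHP's loose `==` compares two numeric-looking strings as numbers. Under that rule, distinct titles such as "1e3" and "1000" are considered equal. The Python sketch below only approximates PHP 7's rule. The regex is a simplification and ignores edge cases such as overflow. The function name is hypothetical.

```python
import re

# PHP 7 numeric-string syntax, simplified: optional leading whitespace,
# sign, digits with an optional fraction, optional exponent.
_NUMERIC = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def php7_loose_equal(a: str, b: str) -> bool:
    """Approximate PHP 7 `a == b` for two strings."""
    if _NUMERIC.fullmatch(a) and _NUMERIC.fullmatch(b):
        # Both look numeric, so PHP compares them as numbers.
        return float(a) == float(b)
    return a == b


print(php7_loose_equal("1e3", "1000"))  # True: different titles, "equal" numbers
print("1e3" == "1000")                  # False: plain string comparison
```

A strict, string-only comparison, which the commit switches to, never conflates such titles.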
Ariel T. Glenn
7fdcc1d319 redo: don't die producing xml files if rev text export conversion fails
Regression introduced in If4c31b7975b4d901afa8c194c10446c99e27eadf

Bug: T217329
Change-Id: I003a8c230db293d37ae05e0157b3447775a95e59
2019-04-01 18:30:35 +03:00
jenkins-bot
b9789f9c56 Merge "add lbzip2 output processor for exports"
2019-03-23 23:35:29 +00:00
Ariel T. Glenn
b01ff36537 add lbzip2 output processor for exports
Bug: T214293
Change-Id: I98e26b833df473bbeb3dc1b881f428174d776b64
2019-03-24 01:20:04 +02:00
jenkins-bot
9aed0482f4 Merge "don't die producing xml files if rev text export conversion fails"
2019-03-22 09:22:41 +00:00
daniel
45f3912bf1 Make the XML dump schema version configurable.
Bug: T174031
Change-Id: I979b6c8f0a72bc1f5ecce1d499d3fdfa0f671588
2019-03-21 12:43:32 +01:00
jenkins-bot
68b12dfded Merge "Make BackupDumper MCR compatible (main slot only)"
2019-03-20 02:15:43 +00:00
daniel
5988e35505 Make BackupDumper MCR compatible (main slot only)
This makes BackupDumper compatible with the new mechanism for accessing
revision content.

This requires some changes to the way database connections are re-used,
since RevisionStore/SqlBlobStore needs to be able to run queries against
the database while the overall result set is being streamed.

This change does not yet add handling for extra slots to BackupDumper.
That first needs a spec for how extra slots will be represented in the
XML schema (T174031).

NOTE: this changes the output of fetchText from using integer text_id
values to using content_address values (e.g. "tt:4567" for text row
with old_id 4567). It also changes fetchText to accept such addresses
as input, for forward-compatibility. XML stub dumps still use the
numeric format in the id attribute, pending T199121.

Bug: T198706
Change-Id: If4c31b7975b4d901afa8c194c10446c99e27eadf
2019-03-14 13:19:51 +00:00
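The address format described in 5988e35505 maps a text row with old_id 4567 to "tt:4567". The Python sketch below shows that mapping and an input normalizer that accepts either form, in the spirit of fetchText's forward-compatible input. All three function names are hypothetical, and only the "tt:" prefix is taken from the message.

```python
import re

_TT_ADDRESS = re.compile(r"tt:(\d+)")


def text_id_to_address(text_id: int) -> str:
    """Map a legacy text row id (old_id) to its content address."""
    return f"tt:{text_id}"


def address_to_text_id(address: str):
    """Return the old_id for a "tt:" address, or None for any other address."""
    m = _TT_ADDRESS.fullmatch(address)
    return int(m.group(1)) if m else None


def normalize_fetch_input(value: str) -> str:
    """Accept either a numeric text id or a content address (hypothetical)."""
    return text_id_to_address(int(value)) if value.isdigit() else value


print(text_id_to_address(4567))       # tt:4567
print(address_to_text_id("tt:4567"))  # 4567
print(normalize_fetch_input("4567"))  # tt:4567
```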
Ariel T. Glenn
45831b2213 don't die producing xml files if rev text export conversion fails
For abstracts, in the specific case that triggered this bug, we don't
care at all: the problem is a self-redirect, and redirects are filtered
out of the stream at the end, so it won't even show up.

In all other cases, we do what dumpTextPass already does, which is to
leave the text alone and emit it as is.

Bug: T217329
Change-Id: I39cdf89531c67962b1a9bba4e0a91f7c655ad6f3
2019-03-14 01:16:24 +02:00
Aaron Schulz
cb15755e92 Normalize use of "INNER JOIN" to "JOIN" in database queries
The ANSI SQL default join type is INNER and this might save
some line breaks here and there.

Change-Id: Ibd39976f46ca3f9b71190d3b60b76ca085787a00
2019-03-06 09:17:30 -08:00
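The equivalence behind cb15755e92 can be checked quickly with SQLite from Python's standard library. The tables below are a toy schema that borrows MediaWiki-style column names. They are not the real schema. Both spellings return the same rows, and revision 12, which has no matching page, is dropped by both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables borrowing MediaWiki-style column names; not the real schema.
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
    INSERT INTO page VALUES (1, 'A'), (2, 'B');
    INSERT INTO revision VALUES (10, 1), (11, 1), (12, 3);
""")


def joined_rows(join_keyword: str):
    sql = (f"SELECT rev_id, page_title FROM revision {join_keyword} page "
           "ON rev_page = page_id ORDER BY rev_id")
    return conn.execute(sql).fetchall()


print(joined_rows("JOIN"))        # [(10, 'A'), (11, 'A')]
print(joined_rows("INNER JOIN"))  # [(10, 'A'), (11, 'A')]
```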
Brian Wolff
a848eae679 Use htmlspecialchars(), not htmlentities(), in XML export for validity
htmlentities() can output entity references that are invalid in XML.
Use htmlspecialchars() instead.

Additionally, cast user-id to int for phan-taint-check

Bug: T216348
Change-Id: Idf781f5a3ffc3c6463969b3f5af63f0f08ae837c
2019-02-17 11:23:50 +00:00
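The validity problem fixed in a848eae679 can be demonstrated with Python's standard XML parser. Python's html.escape() is only roughly analogous to PHP's htmlspecialchars(): it escapes &, <, >, " and ' and nothing else. The entity-style string below is hand-written to resemble htmlentities() output and was not produced by PHP. XML predefines only five named entities, so a reference like &eacute; makes the document malformed.

```python
import html
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


title = 'Café & "Co" <1>'
escaped = html.escape(title)  # only &, <, >, ", ' are escaped (like htmlspecialchars)
entity_style = "Caf&eacute; &amp; &quot;Co&quot; &lt;1&gt;"  # hand-written htmlentities-style text
print(is_well_formed(f"<title>{escaped}</title>"))       # True
print(is_well_formed(f"<title>{entity_style}</title>"))  # False: &eacute; is not defined in XML
```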
Kunal Mehta
cc5d9a92a2 build: Updating mediawiki/mediawiki-codesniffer to 24.0.0
Change-Id: I66b1775b7c1d36076d9ca78cbeb42787a743f2aa
2019-02-07 18:39:42 +00:00
Thiemo Kreuz
ed96e6f1a7 export: Mark DumpFilter::pass() as being protected
I used
https://codesearch.wmflabs.org/search/?q=-%3Epass%5C(
to make sure there really is no other call to this. This function really
is meant to be protected.

I also used
https://codesearch.wmflabs.org/search/?q=function%20pass%5C(&files=php
to make sure I got all subclasses.

Required for I7da632c43681438aa886bdb709379f10cd9cc658.

Change-Id: I9aaf95c66a6efa22131de627ce015587a109858b
2019-01-11 18:39:40 +00:00
Max Semenik
c70119302d Don't check for LIBXML_PARSEHUGE presence
It's been present since PHP 5.3.2.

Change-Id: I23a3c50c10e984abe6ff214fbf504ab6f6be763c
2019-01-07 19:32:39 -08:00