Commit graph

35 commits

Author SHA1 Message Date
James D. Forrester
f7ce0a0976 Move remaining four classes in includes/content into Content namespace
Bug: T353458
Change-Id: Ia0f3e22078550be410c4b87faf6aa4eabe6e270d
2024-08-10 10:40:53 +02:00
Umherirrender
fc9e42823b rdbms: Create IReadableDatabase::andExpr() / ::orExpr()
Avoid calling the internal constructors of AndExpressionGroup and
OrExpressionGroup by adding factory functions similar to the
IReadableDatabase::expr function for Expression objects.

This also replaces calls to ISQLPlatform::makeList with a LIST_AND or
LIST_OR argument, which reduces passing SQL as strings to the query
builders.

Two functions are provided so that each can declare the return type of
its expression group, which allows further ->and() or ->or() calls on
the returned object.
Depending on the length of the array argument to makeList(), it is
sometimes hard to see whether the list gets converted to AND or OR.
Having the operator in the function name makes this easier to read, so
two functions are helpful in this case as well.

Bug: T358961
Change-Id: Ica29689cbd0b111b099bb09b20845f85ae4c3376
2024-07-11 15:29:20 +00:00
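The factory idea described in this commit can be sketched as a toy model in Python. This is not the rdbms API: the class names, and_expr(), or_expr(), and_(), or_() and to_sql() are all hypothetical stand-ins, and repr() is used in place of real database quoting. It only illustrates why two factories with distinct return types let callers keep chaining the matching operator.

```python
class Expr:
    """A single comparison such as `a = 1` (toy model)."""

    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value

    def to_sql(self):
        # repr() is only a stand-in for proper database quoting.
        return f"{self.field} {self.op} {self.value!r}"


class _Group:
    joiner = ""

    def __init__(self, exprs):
        self.exprs = list(exprs)

    def to_sql(self):
        return "(" + self.joiner.join(e.to_sql() for e in self.exprs) + ")"


class AndGroup(_Group):
    joiner = " AND "

    def and_(self, field, op, value):
        self.exprs.append(Expr(field, op, value))
        return self


class OrGroup(_Group):
    joiner = " OR "

    def or_(self, field, op, value):
        self.exprs.append(Expr(field, op, value))
        return self


def and_expr(conds: dict) -> AndGroup:
    """Hypothetical factory: the operator is visible in the function name."""
    return AndGroup(Expr(f, "=", v) for f, v in conds.items())


def or_expr(conds: dict) -> OrGroup:
    """Hypothetical factory returning a group that supports or_()."""
    return OrGroup(Expr(f, "=", v) for f, v in conds.items())
```

With this model, and_expr({"a": 1}).and_("b", ">", 2) reads unambiguously as an AND group, whereas a generic list builder would need a separate flag to say which operator is meant.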
Ebrahim Byagowi
97d1202784 Add namespace and deprecation alias to TextContent
This patch introduces a namespace declaration for the
MediaWiki\Content to TextContent and establishes a class
alias marked as deprecated since version 1.43.

Bug: T353458
Change-Id: Ic251b1ddfcf6db9c85cb54cddf912aa827d2bc3a
2024-05-19 23:23:01 +03:30
Lucas Werkmeister
8908074552 LinkFilter::makeLikeArray: Fix another 'path' access
If a news: or mailto: URL is specified with two slashes, it will have a
'host' rather than a 'path' after all, so this workaround is unnecessary
and should be skipped in that case; compare also change Idc6b389da9
(commit ec1b572362) for makeIndexes().

I’m not very sure that the test case makes much sense, but it’s at least
enough to trigger the error and verify the fix.

Bug: T364743
Change-Id: I09be813e661b80968da00d8a898b2add8c95fec7
2024-05-14 12:29:23 +02:00
Umherirrender
8d97313f81 Fix some line indent
Change-Id: I8f82724197d20f9289d80e138d80310f1eab29f2
2024-04-20 00:25:15 +02:00
James D. Forrester
8e940c4f21 Standardise all our class alias deprecation comments for ease of grepping
Change-Id: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
2024-03-19 20:11:29 +00:00
Umherirrender
564ff2beab Combine the expressions in LinkFilter::getQueryConditions
Avoid the use of an array.
This removes some extra parentheses from the query around conditions
that are now a single expression rather than an AndExpressionGroup, as
a group with one element is not needed.

Bug: T358961
Change-Id: I9daad5e3703bd4a94f56d384c922cb415b5c2fb4
2024-03-02 20:01:36 +00:00
Bartosz Dziewoński
e4c7272976 Change uses of getDBLoadBalancerFactory() to getConnectionProvider()
Update cases where one of the IConnectionProvider methods is called
immediately.

This doesn't really change anything, but I hope it helps promote
getConnectionProvider() as the common way to do this.

Follow-up to 8604c384f6.

Change-Id: Id0e7d02bab0c570343c2b1f03c70b44ee39db112
2024-01-22 22:27:45 +01:00
Dogu
652cfccb5c Replace deprecated wfParseUrl with UrlUtils::parse
The wfParseUrl function is deprecated as of MediaWiki 1.39 and has been
replaced with the UrlUtils::parse method provided by the UrlUtils class.

Change-Id: I5df192af99b38774c458bd4e0836fdce581683dd
2024-01-08 14:12:32 +01:00
Amir Sarabadani
f60e576c69 rdbms: Add support for LIKE in expression builder
Bug: T210206
Change-Id: Iec33a64bb1ec1485ce91b8b05e660f8c1723182b
2023-11-03 02:03:44 +01:00
Amir Sarabadani
d5adc3ca65 Mass migrate simple cases to use expression builder
Done via
'([A-Za-z_\.]+) ?(=|!=|<|<=|>|>=) ?' . (\$db(?:r|w|))->addQuotes\( (.+?) \)
to:
$3->expr\( '$1', '$2', $4 \)

And
'([A-Za-z_\.]+) IS NULL OR ([A-Za-z_\.]+) ?(=|!=|<|<=|>|>=) ?' . (\$db(?:r|w|))->addQuotes\( (.+?) \)
to:
$4->expr( '$1', '=', null )->or\( '$2', '$3', $5 \)

Bug: T210206
Change-Id: I109bf2a712bdefa9e074f775b1bee41ac5b9d665
2023-10-26 16:59:19 +00:00
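The two find-and-replace rules in this commit message can be tried out with a small Python sketch. The patterns are copied from the message; the replacement templates are rewritten in Python's `\N` group syntax, and the function name migrate() and the sample PHP fragments are assumptions made for illustration only.

```python
import re

# Patterns as given in the commit message (the dot between the quote and
# the $db variable is left unescaped, as in the original).
SIMPLE = re.compile(
    r"'([A-Za-z_\.]+) ?(=|!=|<|<=|>|>=) ?' . (\$db(?:r|w|))->addQuotes\( (.+?) \)"
)
NULL_OR = re.compile(
    r"'([A-Za-z_\.]+) IS NULL OR ([A-Za-z_\.]+) ?(=|!=|<|<=|>|>=) ?' . "
    r"(\$db(?:r|w|))->addQuotes\( (.+?) \)"
)


def migrate(php: str) -> str:
    """Apply both rewrites to a fragment of PHP source (hypothetical helper)."""
    # Run the more specific "IS NULL OR" rule first.
    php = NULL_OR.sub(r"\4->expr( '\1', '=', null )->or( '\2', '\3', \5 )", php)
    return SIMPLE.sub(r"\3->expr( '\1', '\2', \4 )", php)


if __name__ == "__main__":
    print(migrate("'rev_id > ' . $dbr->addQuotes( $id )"))
    print(migrate("'el_x IS NULL OR el_x < ' . $dbw->addQuotes( $ts )"))
```

Only concatenations of exactly this shape are rewritten, which is presumably why the commit describes them as the simple cases.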
Bartosz Dziewoński
978d739bc6 Replace single-value $db->buildComparison() with $db->expr()
Find:
->buildComparison\( ('..?'), \[(\s*)([^\],]+) => ([^\],]+)(\s*)\] \)

Replace with:
->expr($2$3, $1, $4$5)

Change-Id: I2cfc3070c2a08fc3888ad48a995f7d79198cc336
2023-10-22 01:05:47 +02:00
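As a rough check of the find/replace pair above, here is the same rule expressed with Python's re module. The pattern is taken from the commit message; the replacement uses Python's `\N` syntax instead of `$N`, and migrate_comparison() and the sample inputs are hypothetical.

```python
import re

FIND = re.compile(
    r"->buildComparison\( ('..?'), \[(\s*)([^\],]+) => ([^\],]+)(\s*)\] \)"
)
REPLACE = r"->expr(\2\3, \1, \4\5)"


def migrate_comparison(php: str) -> str:
    """Rewrite single-value buildComparison() calls to expr() calls."""
    return FIND.sub(REPLACE, php)


if __name__ == "__main__":
    print(migrate_comparison("$db->buildComparison( '>=', [ 'rev_id' => $id ] )"))
```

Because neither capture group may contain a comma, arrays with more than one key are left untouched, which matches the "single-value" wording of the commit subject.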
jenkins-bot
70ef48b846 Merge "Improve performance of trivial encoding/decoding regexes"
2023-10-17 20:54:11 +00:00
James D. Forrester
ec1b572362 LinkFilter::makeIndexes: Don't explode if the 'host' key is missing for news://
Bug: T347574
Change-Id: Idc6b389da974a70bdee9b1d49e4b5c45ccdd0d73
2023-10-10 09:55:19 -04:00
thiemowmde
f5cd1ba7ca Improve performance of trivial encoding/decoding regexes
Instead of replacing 1 character at a time the functions used here
can replace sequences of any length. This can dramatically reduce the
function call overhead.

Also make use of the `fn ()` syntax because we can.

Change-Id: I2dbc2271aa7847d9b687703f837cb0d850596ef0
2023-10-04 11:09:44 +02:00
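The effect described in this commit can be illustrated with a small Python sketch. It is not the LinkFilter code: the percent-encoding scheme, the character class and both function names are assumptions chosen only to show how matching runs of characters reduces the number of callback invocations while producing the same output.

```python
import re


def encode_per_char(text):
    """Encode one character per callback invocation."""
    calls = 0

    def repl(match):
        nonlocal calls
        calls += 1
        return "%{:02X}".format(ord(match.group(0)))

    return re.sub(r"[^A-Za-z0-9]", repl, text), calls


def encode_per_run(text):
    """Encode a whole run of characters per callback invocation."""
    calls = 0

    def repl(match):
        nonlocal calls
        calls += 1
        return "".join("%{:02X}".format(ord(c)) for c in match.group(0))

    return re.sub(r"[^A-Za-z0-9]+", repl, text), calls


if __name__ == "__main__":
    print(encode_per_char("a   b!!c"))
    print(encode_per_run("a   b!!c"))
```

The only difference between the two patterns is the `+` quantifier; the callback is then invoked once per run rather than once per character.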
jenkins-bot
871c2f2160 Merge "Follow RFC 3986 on what is path in mailto URLs"
2023-09-20 15:38:07 +00:00
Lucas Werkmeister
7122b6b2c7 Add $wgExternalLinksDomainGaps config setting
This setting can be used to optimize externallinks queries for certain
domains that have many entries in the externallinks table, but also big
“gaps” where the table contains no entries for that domain. By putting
those gaps (whose el_id values would usually have been obtained on the
analytics databases) into the configuration, we can have MediaWiki tell
the database to skip those ranges of the table instead of scanning
through them. (This is only relevant for domains that have enough
entries that the database chooses to scan the table in primary key order
rather than using the el_to_domain_index_to_path index and filesorting.)

Bug: T341000
Change-Id: Iec4fe01aaa595fbaf3b427b7baa68a9d7209b117
2023-09-06 20:18:32 +02:00
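One way to picture the gap-skipping idea is the sketch below. The shape of the configuration (a mapping from domain to a list of first/last el_id pairs), the function name gap_conditions() and the generated SQL fragments are all assumptions for illustration; the commit message does not spell out the actual format.

```python
def gap_conditions(domain, gaps):
    """Build SQL fragments that tell the database to skip known-empty ranges.

    `gaps` is a hypothetical mapping: domain -> list of (first_id, last_id).
    """
    return [
        f"el_id NOT BETWEEN {int(first)} AND {int(last)}"
        for first, last in gaps.get(domain, [])
    ]


if __name__ == "__main__":
    example_gaps = {"example.org": [(1, 100), (500, 900)]}
    for condition in gap_conditions("example.org", example_gaps):
        print(condition)
```

Domains without configured gaps yield no extra conditions, so queries for them are unaffected.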
Amir Sarabadani
06fa7a9107 ExternalLinks: Drop migration code
Remove anything that writes to or reads from the now-dropped columns.

Bug: T312666
Change-Id: Ic1c69de717bfa03bba94e97dabad9e717ba13fd6
2023-09-05 16:43:18 +02:00
jenkins-bot
9f3c7996ee Merge "Schema: Drop old externallinks columns and indexes"
2023-09-05 14:05:03 +00:00
Amir Sarabadani
e5eda1c358 Schema: Drop old externallinks columns and indexes
Already dropped from production

Also dropping FixExtLinksProtocolRelative as it's not useful anymore and
it has been run in previous releases so it's not worth fixing.

Bug: T312666
Change-Id: I1dd6e704b34e685ada6e316da11243d10827d769
2023-09-05 15:32:23 +02:00
Amir Sarabadani
e1b3323312 Migrate calls to wfGetDB() in static methods
wfGetDB() has been deprecated since 1.39 (or more?), and it's better to
inject LBF and call ::getReplicaDatabase() or ::getPrimaryDatabase().
That is not straightforward in classes, and for static functions there
is no way to inject the service, so there we can simply call
MediaWikiServices::getInstance()->getDBLoadBalancerFactory()

While I was here, I migrated one call to SelectQueryBuilder.

Bug: T330641
Change-Id: Idd2278cef647035dce05a2d461a620e145fe1167
2023-09-05 10:48:31 +02:00
Petr Pchelko
5ad8ee4d92 Follow RFC 3986 on what is path in mailto URLs
This hack was originally added to wfParseUrl
as a fix for T10324 specifically for LinkFilter,
however according to RFC 3986 this is wrong.

The RFC defines that in URLs the authority component
must start with //, so in URLs without //, e.g. news:
or mailto:, there is no authority component and thus
no host component. Everything after : is actually a path,
so the default PHP parse_url is correct.

The RFC even has an example:
> For example, the URI <mailto:fred@example.com>
> has a path of "fred@example.com".

It's fairly ugly to just copy-paste the hack
into LinkFilter, but I didn't find an easy and
elegant way to rewrite it without making any
changes to the link indexes values stored in the DB.

See https://datatracker.ietf.org/doc/html/rfc3986

Co-Authored-by: 沈澄心 <dringsim@qq.com>
Change-Id: I3dd04495db9c7a66f62c3914c0eff06754b7d560
2023-09-04 05:48:23 +00:00
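The RFC 3986 rule quoted in this commit can be observed with Python's standard urllib.parse.urlsplit, used here only as a stand-in for PHP's parse_url. The helper name host_and_path() is hypothetical. Without `//` there is no authority and everything after the scheme is the path; with `//` the same text is parsed as an authority containing a host.

```python
from urllib.parse import urlsplit


def host_and_path(url):
    """Return (host, path) as split by an RFC 3986 style parser."""
    parts = urlsplit(url)
    return parts.hostname, parts.path


if __name__ == "__main__":
    for url in (
        "mailto:fred@example.com",
        "mailto://fred@example.com",
        "news:comp.lang.php",
        "news://news.example.com/comp.lang.php",
    ):
        print(url, host_and_path(url))
```

Code that indexes such URLs therefore has to handle both shapes: a path-only result for `mailto:` and `news:`, and a host-bearing result when two slashes are present.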
James D. Forrester
c68841f2c7 Follow-up 22cec53: Add in-code comment on alias for when it was added
Change-Id: I98a0cc509c3436a2f77996781db393d555d3504a
2023-08-25 20:54:52 +00:00
Amir Sarabadani
8059435f23 Externallinks: Keep domain wildcard if path is not specified
Currently, if the query is *.wikipedia.org, it still makes an exact
match to only wikipedia.org and not any of the subdomains.

Bug: T326251
Change-Id: Ib372c35220a89ad9cd4d9879f4436ed153a830c7
2023-07-11 12:56:42 +02:00
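The difference between an exact match and a wildcard match on a domain can be sketched as follows. This is a toy model that assumes domains are compared in reversed, dot-terminated form (for example "org.wikipedia."); the function names and that storage format are assumptions for illustration and are not taken from the commit.

```python
def reverse_host(host):
    """Reverse the labels of a host name and terminate with a dot."""
    return ".".join(reversed(host.split("."))) + "."


def domain_like_prefix(pattern):
    """Return (reversed_prefix, is_wildcard) for a pattern such as *.wikipedia.org."""
    wildcard = pattern.startswith("*.")
    host = pattern[2:] if wildcard else pattern
    return reverse_host(host), wildcard


def matches(pattern, host):
    """Prefix match when the pattern has a wildcard, exact match otherwise."""
    prefix, wildcard = domain_like_prefix(pattern)
    candidate = reverse_host(host)
    return candidate.startswith(prefix) if wildcard else candidate == prefix


if __name__ == "__main__":
    print(matches("*.wikipedia.org", "en.wikipedia.org"))
    print(matches("wikipedia.org", "en.wikipedia.org"))
```

Dropping the wildcard flag when no path is given would turn the first case into an exact comparison, which is the behaviour this commit describes as a bug.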
Amir Sarabadani
4f726c6d59 ExternalLinks: Make oneWildcard avoid adding wildcard to domain
Adding the wildcard is not providing much value, and on top of that,
dropping it makes it possible to use the el_to_domain_index_to_path
index by turning the LIKE into an exact match.

Bug: T326251
Change-Id: Icace8725ab8b19e78072ed45f306ccf4ef90e2eb
2023-07-10 18:38:55 +02:00
Timo Tijhof
bcd6c5eaac ExternalLinks: Clean up LinkFilter file header and code comments
Clean up doc blocks. Remove redundant file-level description and
ensure the description and any ingroup are on the class block.
Ref https://gerrit.wikimedia.org/r/q/owner:Krinkle+message:ingroup

Remove mention of outdated `el_index` field and instead describe
the purpose more generally. The internal column names should mostly
not matter to the callers anyway.

Follow-up to I123662f40f6efb. These are mostly pre-existing issues,
except for the duplicate `'protocol'` default being specified in two
places, which this patch improves upon.

Change-Id: Ief9b733377ce4611881b15b7faeedc5ee13916ae
2023-06-21 18:34:51 +00:00
Amir Sarabadani
e4078e9940 LinkSearch: Change default protocol to http:// and https:// in READ_NEW
Now that el_to_domain is much smaller and indexed, this shouldn't be
taxing on the database anymore.

It's not perfect but it works beautifully.

Bug: T14810
Change-Id: I123662f40f6efbfd24f280984cd824ced6892840
2023-06-16 00:42:29 +02:00
Amir Sarabadani
88d7e39857 Externallinks: Make port part of the index
The port is needed to rebuild the URL; omitting it causes bugs such as T337149#8910620

Bug: T337149
Change-Id: I9cd5a17da6da9fdd85574de06e6f5d0310dd48f3
2023-06-08 20:28:03 +02:00
Amir Sarabadani
a69e2d9ce4 ExternalLinks: Make IP links work with read new
Fixed tests and such. It's not the prettiest patch I have ever written
but I'm planning to refactor the whole class once we are done with the
data migration.

Bug: T337149
Change-Id: I3303a063455cf444b78f4d5832d6bf243b290556
2023-05-31 21:11:11 +02:00
Amir Sarabadani
c4de31c2e6 ExternalLinks: Fix mailto: handling in read new
Added regression tests too.

Bug: T337149
Change-Id: Ia5edf60cd4180bc92e87a5cebf34cf23aed3c574
2023-05-31 16:47:18 +02:00
Amir Sarabadani
4d2396dec5 ExternalLinks: Add support for non-reversed indexed URLs
This can be useful to get https://foo.com out of a proto-relative URL,
or to add a trailing / to URLs without one, etc. That way we can compare
the content of externallinks with the URLs provided.

Bug: T337149
Change-Id: I921728974cde0a095fb3034fc80f7f4bb046f380
2023-05-26 12:40:18 +00:00
Amir Sarabadani
9ef9dc366a Stop storing more than one row for proto-relative external links
Bug: T335819
Change-Id: I21e467bdd57768bee0ca0a6018fec4e20009911e
2023-05-25 15:43:23 -04:00
Amir Sarabadani
e06f77134d ExternalLinks: Add function for looking up extlinks of a page
This logic has been repeated in three different extensions, let's DRY
them up.

Bug: T326251
Change-Id: I8ae9ef388957b0c04efa281f3bc3b5796bec17fe
2023-04-24 19:55:57 +02:00
Amir Sarabadani
4d30532784 Add support for externallinks read new
In API and Special:LinkSearch

That's basically where it's needed in core.

Bug: T326251
Change-Id: I7f95a2bb983987319c1b0ca0ff231064b0c07278
2023-04-20 14:15:01 +02:00
Amir Sarabadani
22cec534c5 Reorg: Move LinkFilter to ExternalLinks
It's a one-class namespace and I know it's not great but:
 - I hope to add more classes with the redesign of externallinks table
 - It's not named very well either, it's a collection of URL-related
   functionalities
 - Making it clear LinkFilter is about external links, not internal or
   interwiki or templatelinks etc.

Bug: T321882
Change-Id: I0dd530237f45e4fec786178ec03ee941c6bcd982
2023-03-01 22:08:29 +01:00