Followup to I76231d6f adding more clarity to the doc blocks that were
changed with respect to when and why they can return different values.
Change-Id: I9afe50c8ee7a74b2c99a89ab19817e466153ba78
Passing around strings that are expected to be safe html and are
known to be based on user input is a fairly unsafe operation. Make
it harder to do the wrong thing by requiring HtmlArmor to be returned
from the ResultSet snippets. This does not address the snippets on
individual result objects as the api surface is larger and requires
more bc handling.
Change-Id: I76231d6fc53c4982eb4cd174d2e6a75eb2740497
These are almost only doc changes, with two exceptions:
1-In LinkHolderArray, int-alike array keys are now cast to int, to be uniform with what we do in other code paths
2-In ExtensionRegistration, changed a line to throw an Exception
immediately, instead of an ExtensionDependencyError. This is because the
latter takes an array with msg and type, but we were passing it a plain
string (and in fact the code was bugged).
Bug: T231636
Change-Id: I8b0ef50d279c2a87490dde6a467a4e22c0710afd
The use case is code that uses search internally, but wants to allow
a development setup where search is proxied via some public API and
the actual querying is performed on a production server, instead of
having to set up all the search services + staging content locally.
Also fix a small logic error in SearchResultSet::shrink.
Change-Id: I83931c1a19c1cb747380e17a4aff3b1fadd5fcf3
Yes, the constructor of a class constructs a new instance of that
class. That's what constructors are for. No need to document this.
Change-Id: Ia94f476512f437f48edfeb71d95b269ac2ca0ab4
Transitional step for the transformation of SearchResult into an
abstract base class:
- RevisionSearchResult is introduced to behave like SearchResult
- methods are currently shared between RevisionSearchResult and
SearchResult in the RevisionSearchResultTrait
Bug: T228626
Change-Id: I13d132de50f6c66086b7f9055d036f2e76667b27
Use the ExtensionRegistry instead of checking for the main class
name of the Cite extension. - I'm not entirely sure if this is much
better. A hook would still be the better option. But at least Cite
could be refactored and the class name changed.
Bug: T89151
Change-Id: I35e5aa9955141b575de68a5be2c0d5b87585eb77
It was just a wrapper to Content::getTextForSearchIndex(), simply use
this method rather than depending on (and sometimes constructing) a
SearchEngine.
Change-Id: I8541248ffdca303f0af3b959cf2f051dcb497925
was only returning two constants and forced a cyclic dep bewteen
SearchEngine and its SearchResult instance.
Simple make these values a constant in SearchHighlighter and use
them as default values for the highlight methods.
Change-Id: Ia78408d5266d0a305006027fe6265a2a1d68b0b9
The function calling this abstract function allows to return null,
so it should be okay to return null here
The null for empty result in SearchOracle
Change-Id: I66a8fb3a4190bf5506f358a47f6f4833b1715c7f
trait meant to hold methods that rarely overridden or that are
non trivial to implement.
Bug: T228626
Change-Id: I4fba27b22860109d648e830a435d4a036d26ad06
Was only called by SpecialSearch, according to IResultWrapper::free()
this method is rarely worth being called. Therefor it does not seem wise
to expose it in the upcoming interface defining a search result set.
Bug: T228626
Change-Id: I12d41a488025eb2d6dd543c9fbdc1c803c840316
This is the preferred method as it enforces read-only mode for DB_REPLICA
and handles LoadBalancer::reuseConnection() calls automatically.
Change-Id: Iab9439ba8e0810fa14c302661ed7a3534f6bfc0d
And start indicating that hooks relying on this data might become
unreliable as this data is only populated by SearchDatabase search
engines.
This information was only populated by SearchDatabase implementations
and due to bad initial design of SearchResult[Set] (now fixed) it forced
users of these classes to carry this information for the sole purpose of
highlighting.
Because SearchEngine can now own their SearchResult[Set] implementations
nothing that is engine specific should be exposed outside of these
specific implementations.
If there are some logic that still requires access to such list of terms
they should be made engine specific by guarding their code against
instanceof SqlSearchResult.
Change-Id: I38b82c5e4c35309ee447edc3ded60ca6a18b247a
Depends-On: I53fe37c65c7940f696c1e184125e01e592a976e4
Also make the update() methods of the subclasses use DB_MASTER as they
should. This avoids read-only errors.
In addition, avoid passing a dummy argument of null in some cases
within SearchEngineFactory::create(). Fix some dynamic calls to
static methods too.
Change-Id: Id94f34994b0f9c18e23ef30cb2fe895e6dedd09c
These global functions were deprecated in 1.34 and services made
available to replace them. See services below;
* wfFindFile() - MediaWikiServices::getInstance()->getRepoGroup()->findFile()
* wfLocalFind() - MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo()->newFile()
NOTES:
* wfFindFile() and wfLocalFind() usages in tests have been ignored
in this change per @Timo's comments about state of objects.
* includes/upload/UploadBase.php also maintained for now as it causes
some failures I don't fully understand, will investigate and handle
it in a follow up patch.
* Also, includes/MovePage.php
Change-Id: I9437494de003f40fbe591321da7b42d16bb732d6
Seems this was a typo and I think 1.32 which is a double/float will be
implicitly converted to true (bool) because it will resolve 1.32 to 1 as
integer and then 1 which maps to true (bool).
To avoid this, use '1.32' instead of the integer form of the version.
Change-Id: Ifaf6ab0d36bc02bd1707f8caf375f65a30eb1af5
Seems this was a typo and I think 1.32 which is a double/float will be
implicitly converted to true (bool) because it will resolve 1.32 to 1 as
integer and then 1 which maps to true (bool).
To avoid this, use '1.32' instead of the integer form of the version.
Change-Id: I2420396e110284f582cd79820ffc6064e247b4b9
I've checked around with code search tool and realized that SMW has
its own implementation of `transformSearchTerm()` which overwrites the
implementation of this method from core as it extends SearchEngine.
So removing this won't break SMW, see usage below;
Usage
=====
https://codesearch.wmflabs.org/search/?q=%5CbtransformSearchTerm%5Cb&i=nope&files=&repos=
Bug: T220656
Change-Id: I3dbe04ecba07700167c894673e23c1eead95460f
Field rev_text_id is being retired as part of MCR Schema Migration.
Remove references to this field from the Postgres search queries.
Bug: T198341
Change-Id: Id3cbf853aacc06709f441a93248b55943097cd11
The most notable improvements I was able to fit into this patch can be
seen in the User class, as well as in AbstractRestriction.
Our documentation generator ignores the @const tag. It's not needed. Just
have a comment above a constant and it will show up in the generated
documentation.
Using @var is misleading because a constant is not a "variable". The type
of a constant is strictly derived from it's value. Documenting the type
typically does not provide useful information. Doxygen does not understand
the type, but ignores any @… tag and renders everything else as plain text.
I can split this patch if you prefer. Please tell me.
Change-Id: I8019ae45c049822cdc1768d895ea3e3216c6db5f
This was originally a global search and replace. I manually checked all
replacements and reverted them if (due to the lack of type hints) either
null (that would be 0 when counted) or a Countable object can end in the
variable or property in question.
Now this patch only touches places where I'm sure nothing can break.
For the sanity of the honorable reviewers this patch is exclusively touching
negated counts. You should not find a single `!== []` in this patch, that
would be a mistake.
Change-Id: I5eafd4d8fccdb53a668be8e6f25a566f9c3a0a95
Non-breaking change. Remaining uses are public interfaces (a constant, two
globals, a config sub-parameter, SQL queries, storage function names), one i18n
message key, and a whole lot of maintenance scripts with calls to the deprecated
function wfWaitForSlaves().
Change-Id: I6ee5ca92ccf6a80c08f53d9efe38ebb4b05064d7
When submiting using the 'go' feature of Special:Search (standard
usecase for autocomplete) a nearby title can be selected.
Unfortunately a fragment only title is considered valid, which
sends the user to Main_Page#searchterm.
Instead don't allow a search starting with # to trigger the go
feature.
Bug: T182452
Change-Id: I8ce3ce8633e23db41ae6eafb9996ca7294f8445a
In some functions MediaWikiServices::getInstance() was called twices or
in loops. Extract the variable to reduce calls.
Change-Id: I2705db11d7a9ea73efb9b5a5c40747ab0b3ea36f
Search engines support the sort parameter, it would be nice if we
could pass it to be rendered in the web ui. This avoids implementing
any ui, which can be done at a later time.
Bug: T195071
Change-Id: Id7bd35bd4259bb1f1f856933d279dbd6eddfa2e2
For ease of understanding, use the same technique for both
sides of the comparison. Follows-up c2a308075f.
Change-Id: I66475fd4baaa6ab7cd24ad884e9fe01a1cd30d9f
This should be handled internally by SearchEngine implementations.
Bug: T198860
Change-Id: Ifbfd0fcb81fcacf5228bd2ffcac7b80fca872b2a
Depends-On: I7d4ff9498fa1f4ea66835c634b8931f4c29993fb
These methods are very similar there should be no need to have
two differents way to extract the namespace prefix.
Bug: T198860
Change-Id: I22802278452559d35a3d8f6068549c1fef1a5e86
This method was introduced in 4115586000
to support the prefix URI param introduced by the InputBox extension.
There are no reasons that this logic is exposed to SearchEngine users
and should be handled internally by SearchEngine implementations
that supports it.
Previously the search query was updated, now the prefix param will passed
along using SpecialSearch::$extraParams.
Bug: T198318
Change-Id: I33518d3f3ddee741ff4f3b47eb4928009bea66d1
Depends-On: I67c7f1886dd6a2d07c12015e2711c138e9f140e9
The Language::truncate() function was split into
Language::truncateForVisual() (which measures characters) and
Language::truncateForDatabase() (which measures bytes) in 1.31, but
the patch which soft-deprecated Language::truncate() didn't actually
remove all the uses in the codebase. Replace most of those old uses
now, which should actually improve the situation for
non-latin-alphabet users who were getting unfairly squeezed in a
number of places.
Bug: T197492
Change-Id: I2291c69d9df17c1a9e4ab1b7d4cbc73bc51d3ebb
This mimics how full text works by silenty dropping results returned from
search that no longer exist. This could be because the search index is slightly
out of sync with reality, or the search engine could simply be broken.
Only silent from the users perspective. We maintain a count in statsd of
the number of titles dropped. This can be monitored over time to
recognize any increases.
Bug: T115756
Change-Id: I2f29d73e258cd448a14d35a2b4902a4fb6f61c68
Various code using the search engine shouldn't need to implement it's
own methods, such as over-fetching, to determine if there are more
results available. This should be knowledge internal to search that is
exposed by a boolean.
Change-Id: Ica094428700637dfdedb723b03f6aeadfe12b9f4
The funky iteration here was at best annoying. Switch
it over to an iterator based approach with appropriate
BC code to simulate the old iteration style.
Depends-On: I19a8d6621a130811871dec9335038797627d9448
Change-Id: I9fccda15dd58a0dc35771d3b5cd7a6e8b02514a0
The plan is to convert these methods into final, considering
it a removal under the deprecation policy. By making entry
points into the search engine final we provide a guaranteed
point where generic handling can be applied to all search engines.
The first use case for this generic handling is pushing pagination
via overfetch into the SearchEngine class instead of re-implementing
an overfetch in individual parts of the code that perform searches.
Change-Id: I3426d6a2f32d8b368b044b154e1cb70dac007c62
MySQL normally attempts to rank the results when performing a full-
text search. However, this behavior is disabled when using BOOLEAN
MODE. While BOOLEAN MODE is needed in the WHERE clause, the default
NATURAL LANGUAGE MODE can be used in an ORDER BY clause to reenable
ranking.
Bug: T192458
Change-Id: I09462c339432927efead58fb543a10aed2c53195
These comments do not add anything. I argue they are worse than having
no comments, because I have to read them first to understand they
actually don't explain anything. Removing them makes room for actual
improvements in the future (if needed).
Change-Id: Iee70aad681b3385e9af282d5581c10addbb91ac4
In phpcs.xml rename renamed sniffs and add the failing sniffs,
because now the whole sniff is no longer excluded.
Change-Id: If5b0bd16028761abc2c47ace9e97d37ad14bb36f
- mostly auto fixes
- some too long lines fixed
- ignore amp space in one case passing by reference
Change-Id: I6472f83bc3cbf4bd629d83050cc3319b19ec465c
For some varargs a variable name is added with suffix ,... as seen for
many other varargs
Some @param are swapped, because there are in the wrong order
Enable Sniff MediaWiki.Commenting.FunctionComment.ParamNameNoMatch
Change-Id: I60fec6025bce824d5c67563ab7b65ad6cd628ad8
Having such comments is worse than not having them. They add zero
information. But you must read the text to understand there is
nothing you don't already know from the class and the method name.
This is similar to I994d11e. Even more trivial, because this here is
about comments that don't say anything but "constructor".
Change-Id: I474dcdb5997bea3aafd11c0760ee072dfaff124c
Work to support interleaved AB testing of search will display
interleaved results of two search configurations on the first page of
results, but standard results on all pages other than the first page.
This means that while 20 results may be requested, the next 'new' result
of the primary query may be at position 10. Allow the SearchResultSet
to declare what the new offset is.
Bug: T150032
Change-Id: I14c0c33249fcdb66f72f5966e2aa72781a34daee
Having such comments is worse than not having them. They add zero
information. But you must read the text to understand there is
nothing you don't already know from the class and the method name.
Change-Id: I994d11e05f202b880390723e148d79c72cca29f0
Partially revert I61dc536 that broke phrase search support.
Fix phrase search by making explicit that there are two
kind of legalSearchChars() usecases :
- the chars allowed to be part of the search query (including special
syntax chars such as " and *). Used by SearchDatabase::filter() to
cleanup the whole query string (the default).
- the chars allowed to be part of a search term (excluding special
syntax chars) Used by search engine implementaions when parsing with
a regex.
For future reference:
Originally this distinction was made "explicit" by calling directly
SearchEngine::legalSearchChars() during the parsing stage. This was
broken by Iaabc10c by enabling inheritance.
This patch adds a new optional param to legalSearchChars to make this
more explicit.
Also remove the function I introduced in I61dc536 (I wrongly assumed
that the disctinction made between legalSearchChars usecases was due
to a difference in behavior between indexing and searching).
Added more tests to prevent this from happening in the future.
Bug: T167798
Change-Id: Ibdc796bb2881a2ed8194099d8c9f491980010f0f
The used phpcs has a bug, so the version 0.9.0 could not be enforced at the moment.
Will be fixed in next version, see T167168
Changed:
- Remove duplicate newline at end of file
- Add space between function and ( for closures
- and -> &&, or -> ||
Change-Id: I4172fb08861729bccd55aecbd07e029e2638d311
I think the bug was introduced during a cleanup in Iaabc10c.
I don't think that " should be part of the legalSearchChars at query
time, it seems to break the regex.
The strategy here is to distinguish legalSearchChars used query time vs
the ones used at index time by introducing:
SearchEngine::legalSearchCharsForUpdate()
Bug: T167798
Change-Id: I61dc53665e26d3c6c48caed78dd3bbde9a33def7
Allows search engine clients that implement custom definitions
to pass engine hints used at index time.
Hints are a way to fine tune the behavior of the search engine
when handling a particular field.
As of now this is introduced for CirrusSearch to let SearchIndexField
implementations to control how the noop script is configured.
The noop hint with CirrusSearch allows to (for example):
- ignore an update if a numeric value does not change for more than X%
- control the merge strategy of complex fields
Bug: T166589
Change-Id: Ia560e41d33013c30ac47e5a60543f8cb133e61fb
In the non-default configuration where $wgAdvancedSearchHighlighting
is set to true, there is an XSS vulnerability as HTML tags are
not properly escaped if the tag spans multiple search results
Issue introduced in abf726ea0 (MediaWiki 1.13 and above).
Bug: T144845
Change-Id: I2db7888d591b97f1a01bfd3b7567ce6f169874d3
Allows search engine to suggest deleted titles for undelete search.
Note that the titles are still verified against the archive table,
to ensure search engine is not out-of-date.
Bug: T109561
Change-Id: Id6099fe9fbf18481068a6f0a329bbde0d218135f
I was bored. What? Don't look at me that way.
I mostly targetted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
Useful in case the client wants to re-evaluate what was set
here, or if the SearchEngine implementation wants to expose
some of its states.
In our case it allows CirrusSearch to inform SpecialSearch
that we prefer to display search results with a new experimental
layout.
Bug: T156299
Change-Id: I7f661c852ef70ea7bc9ae2959f7d6e48776a9877
Searched for /([^\d\w\s\)\]]\s*)- \d/ to find potential issues.
It seems there's no PHPCS check for this, huh.
Also fixed typo in a comment in LoginSignupSpecialPage.
Change-Id: Iaab1a1f5a9f234971e550e7909aa5c3e0c02a983
At one point SearchIndexFieldDefinition was updated to require the
engine to be passed in, but it seems that update was missed here.
BackupDumper::loadPlugin() requires the second argument, set it to
the empty string to keep current behaviour.
Change-Id: Ifbd8fc4870ff63b2d338f8bb4d251d7a3477b989
This is a pure documentation change. It mostly removes empty lines from
comments (and entirely empty comments), as well as adds a few missing
documentation blocks and fixes a minor mistake. I hope it's ok to have
this in one patch. I can split it, please tell me.
Change-Id: I9668338602ac77b903ab6b02ff56bd52743c37c4
It looks like there is something missing after the last statement
Also remove some other empty lines at begin of functions, ifs or loops
while at these files
Change-Id: Ib00b5cfd31ca4dcd0c32ce33754d3c80bae70641
When scrolling results on prefix search api the exact Title
match is always at pos 0 even if we want to scroll the results
by setting offset to > 0.
Change-Id: Ib02c9d3e479d739e6fe79014d962db50b6fd9de8
Useful for search engines that allow users to customize search profiles.
Depends-On: Icd577c8ebc6e162befe30bde4fe276e633d2e434
Change-Id: I471cd090730d2a25cb70d622ec3bebbe9583118c