Previously things like "192.168.1.1" couldn't be searched very cleanly in the MySQL backend for two reasons:
* First, the periods were stripped out. This resulted in it being broken into multiple short words: "192 168 1 1", leading at best to false positives and general weirdness.
* Second, for IP addresses the resulting pieces were shorter than the default minimum word length of 4 and thus didn't even get indexed!
The addition of padding for short words let them at least get indexed, but they still didn't turn up cleanly due to the word split. Now allowing periods through to the indexed text, and encoding periods that appear within a compound word so they get caught more cleanly.
Also made a tweak so highlighting works a bit better on word boundaries -- e.g. "192.168.1.1" no longer hits a highlight match for "192.168.1.100". However, it still doesn't handle some cases with periods 100%. Sigh.
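A rough sketch of the two ideas above, in Python rather than the actual backend code: periods inside a compound word are encoded so the word survives as one token, and highlight matches must fall on word boundaries. The token `U82e` and both function names are made up for illustration; the real encoding isn't spelled out here, and this sketch does not claim to fix the remaining period cases.

```python
import re

# Hypothetical stand-in for an encoded period; the real encoding is not given above.
PERIOD_TOKEN = "U82e"

def normalize_for_index(text):
    """Keep periods that sit inside compound words, drop the rest."""
    # A period with a word character on both sides is part of a compound
    # word such as "192.168.1.1", so encode it instead of splitting on it.
    text = re.sub(r"(?<=\w)\.(?=\w)", PERIOD_TOKEN, text)
    # Any period left over is ordinary punctuation and becomes whitespace.
    return " ".join(text.replace(".", " ").split())

def highlight_pattern(term):
    """Match the term only when it is not embedded in a longer word."""
    return re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")

print(normalize_for_index("ping 192.168.1.1."))
print(highlight_pattern("192.168.1.1").search("192.168.1.100"))
```

With this, the trailing sentence period is dropped while the address stays a single token, and the search against "192.168.1.100" finds nothing because the character after the candidate match is still a word character.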
Let's avoid making up our own syntax that nobody will know or think to try...
Lucene, Google, Yahoo!, and Windows Live search all understand "red OR lion" but see nothing special in "?red ?lion". If we're going to add it, let's make the OR thing work. :)
This is a global search and replace of NS_IMAGE and NS_IMAGE_TALK with NS_FILE and NS_FILE_TALK respectively in all core files, excluding those already updated in step 1 (r44004).
Can we please do these sorts of experimental developments on branches before putting them on trunk? Our trunk is meant to be functional and ready to deploy at all times; doing experiments on trunk delays deployments because we have to test and fix them or roll them back.
Reverted:
r43376
r43385
r43403
r43404
r43405
r43406
r43423
Issues I noticed during a few minutes testing:
* Background color mismatch on the frameset for wikis that set a BG color on Special:
* 'browse pages with this prefix' is kind of funky in "Pages & Project" (goes to main namespace only)
* Which namespaces are in "Pages & Project" is not discoverable
* Default link to nonexistent help page Project:Searching kind of sucks. We should avoid doing those by default. It used to link to Help:Contents; any reason for the change from one page which might have previously existed to one which doesn't?
* "Create the page "Stuff" on this wiki!" - Double page creation links seem kind of weird. There should only be one, otherwise why have two?
* Bad prefix link output when search terms are not a valid wiki title
* We've lost the Whatlinkshere link from the old results
** User search preference removed
** Added Project search
** Removed 'all' prefix use
** Keep help/page link in same fieldset
** Fixed title case of direct page/create link
** Moved "Files" link next to 'page'
** Made 'advanced' act more consistent
* Added div to make next/prev links clearer
* Reduced the 'next' Links to Nowhere
* Removed misaligned bullets
* Code cleanup/better function names/mark visibility
* remove all the horribly long messages that kept getting stuck onto the search page. There is
no reason to show long messages on *every search* to *everyone*; use the Help link instead
* organize search options into a straightforward menu on right side of search box
* Search box now comes with a header for quickly switching between typical namespace groups:
- Articles - wgNamespacesToBeSearchedDefault namespaces, default for anons
- Articles/Project - wgNamespacesToBeSearchedDefault + wgNamespacesToBeSearchedProject,
default for logged-in users. Contains namespaces like main, user, project, etc.
- Images - local/commons images
- Everything - quick link to search *all* namespaces
- Advanced - this will show our powerbox, which is now not shown on every page
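To make the header groups concrete, here is a minimal sketch of how each group could map to a namespace list. It is Python, not the actual SpecialSearch code; the namespace IDs are MediaWiki's usual ones, but the settings' contents, the profile names and the function names are assumptions for illustration only.

```python
# Namespace IDs as used by MediaWiki (main, user, project, image, help).
NS_MAIN, NS_USER, NS_PROJECT, NS_IMAGE, NS_HELP = 0, 2, 4, 6, 12

# Stand-ins for wgNamespacesToBeSearchedDefault and
# wgNamespacesToBeSearchedProject; the values are invented for this example.
SEARCHED_DEFAULT = {NS_MAIN: True}
SEARCHED_PROJECT = {NS_USER: True, NS_PROJECT: True, NS_HELP: True}
ALL_NAMESPACES = [NS_MAIN, NS_USER, NS_PROJECT, NS_IMAGE, NS_HELP]

def enabled(setting):
    """Return the namespace IDs switched on in a settings map."""
    return sorted(ns for ns, on in setting.items() if on)

def namespaces_for(profile):
    """Map a header group to the namespaces it searches."""
    if profile == "articles":
        return enabled(SEARCHED_DEFAULT)
    if profile == "articles_project":
        return sorted(set(enabled(SEARCHED_DEFAULT)) | set(enabled(SEARCHED_PROJECT)))
    if profile == "images":
        return [NS_IMAGE]
    if profile == "everything":
        return list(ALL_NAMESPACES)
    # "Advanced" has no fixed list: the user picks namespaces in the powerbox.
    raise ValueError("no fixed namespace list for profile: " + profile)

def default_profile(logged_in):
    """Anons get articles only; logged-in users also get project namespaces."""
    return "articles_project" if logged_in else "articles"

print(namespaces_for(default_profile(False)))
print(namespaces_for(default_profile(True)))
```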
Preferences change:
* logged-in users by default search more namespaces than anons; this relies on the assumption
that logged in users are more likely to be regular contributors in a community, and
thus be interested in community stuff as well as articles
* bug 14609, if users leave their namespaces settings on default, changing default
search namespaces will change users' namespaces as well
Images:
* bug 5101. Don't hide commons images as broken links if search backend is smart enough to
return them.
*In SearchEngine's case, it gets caught every time further up the page (it never even reaches the upper/lower casing of only first character part)
*PrefixSearch eventually sends it through ApiQueryBase::titleToKey()
*AjaxFunctions immediately sends it to Title::newFromText()
The loss of specific names would create a visible name conflict; when you've got "MySQL.php" open, what the hell is it? Is it the DatabaseMySQL class? Some other random MySQL-related thing? Update.php is also confusing -- we have an update.php which is a command-line script.
Don't do these confusing names; there's no pressing functional need to move the files at all, but if you must move them at least keep their distinct names so I can find my code.
Doxygen documentation update:
* Changed all @addtogroup to @ingroup. @addtogroup adds the comment to the group description, but doesn't add the file, class, function, ... to the group like @ingroup does. See for example http://svn.wikimedia.org/doc/group__SpecialPage.html where it's impossible to see related files, classes, ... that should belong to that group.
* Added @file to file description, it seems that it should be explicitly declared for file descriptions, otherwise doxygen will think that the comment documents the first class, variable, function, ... that is in that file.
* Removed some empty comments
* Removed some ?>
Added following groups:
* ExternalStorage
* JobQueue
* MaintenanceLanguage
One more thing: there are still a lot of warnings when generating the doc.
* turned off by default (set $wgAdvancedSearchHighlighting to turn on)
* reverted r26269, \b doesn't interact very well with unicode data,
so it broke highlighting of words that end/begin in nonascii chars
completely
* small bugfixes in unicode handling, tested in more languages
* $wgSearchHighlightBoundaries needs to be set to "" for CJK wikis
* benchmarking: on typical simplewiki data, the code is around 4-5 times slower
(according to noc.wikimedia.org the old code profiles to about 0.8%),
but can be up to 20 times slower on featured-size articles
* update release notes (also for r33400)
* fix profiling errors in SpecialSearch
* r34072 -- new highlighter code; looks a bit expensive, not fully tested yet.
* r33489 -- broke search result highlighting all around
* Part of r32350 -- bring the color back to search highlighting so we can see our results again. Why was this removed without comment?
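On the \b problem behind the r26269 revert mentioned above, a small Python illustration may help. Python's `re` is Unicode-aware by default, so `re.ASCII` is used here to imitate an engine whose idea of a word character is ASCII-only; this is an analogy, not the PHP code in question. The boundary character class is made up for the example, loosely in the spirit of a configurable boundary set.

```python
import re

text = "un élan rapide"

# With ASCII-only word characters, "é" is not a word character, so there is
# no \b between the space and "é" and the word is never found.
ascii_hit = re.search(r"\bélan\b", text, re.ASCII)

# With Unicode-aware matching the same pattern works.
unicode_hit = re.search(r"\bélan\b", text)

# Alternative that avoids \b entirely: spell out which characters count as
# boundaries. This character class is an assumption, not a real setting.
BOUNDARY = r"[\s.,;:!?()\"']"
explicit = re.compile(
    r"(?:^|(?<=" + BOUNDARY + r"))" + re.escape("élan") + r"(?=" + BOUNDARY + r"|$)"
)
explicit_hit = explicit.search(text)

print(ascii_hit, unicode_hit, explicit_hit)
```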
snippet extraction:
* prefer text hits over matches on images/templates/tables, making the
snippets more readable and relevant
* cleanup wikitext
* prefer snippets with exact query match - works only for whole phrases
* drop the old context calculation and replace it with a more flexible one
that does a better job keeping snippets of constant width
* if the first line of the article matches whole query show only one snippet
* manually lower/uppercase non-ascii chars so that words in e.g. cyrillic
are also case-insensitive
* workaround for php limited utf8 support so that snippets end up being of
constant char-size over single and multiple byte text
* if there is no text match for some reason, show beginning of the article
Warning:
* haven't done performance testing, might not be safe to go live, although
I don't see any immediate problems with it
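For the constant char-size point above, a minimal sketch of the underlying idea: when text is handled as raw UTF-8 bytes, a snippet has to be cut by counting characters rather than bytes, and never in the middle of a multi-byte sequence. This is Python with a made-up helper name, not the PHP workaround itself.

```python
def utf8_prefix(data, max_chars):
    """Return the first max_chars characters of UTF-8 encoded bytes."""
    count = 0
    for i, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a
        # new character (ASCII or the lead byte of a multi-byte sequence).
        if byte & 0xC0 != 0x80:
            if count == max_chars:
                return data[:i]
            count += 1
    return data

cyrillic = "слово дня".encode("utf-8")
print(len(cyrillic), utf8_prefix(cyrillic, 5).decode("utf-8"))
print(utf8_prefix(b"hello world", 5).decode("utf-8"))
```

Both calls return five characters, even though the Cyrillic prefix takes ten bytes and the ASCII one takes five.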
* check in a new ajax suggestion engine (mwsuggest.js) which uses
OpenSearch to fetch results (by default via API), this should
deprecate the old ajaxsearch thingy
* extend PrefixSearchBackend hook to accept multiple namespaces for
future lucene use (default implementation however can still
process only one)
* Added to preferences, also a feature to turn it on/off for every
input (disabled atm until I work out browser issues completely)
* WMF wikis probably won't be using API to fetch results, but a
custom php wrapper that just forwards the request to appropriate
lucene daemon, added support for that
SpecialSearch:
* moved stuff out of SpecialSearch to SearchEngine, like snippet
highlighting and such
* support for additional interwiki results, e.g. title matches
from other projects shown in a separate box on the right
* todo: interwiki box doesn't have standard prev/next links to
avoid clutter and unintuitive interface
* support for related articles
* add "all:" prefix that searches all namespaces (port from LuceneSearch)
* added a simplistic replacePrefixes so that now image:something will
always search the image namespace
* let the backend provide snippets and other info, fill only what is not
provided
* wrap textual results in a div, should make the snippets look more
compact and consistent over hits
* added a did you mean.. container
* show total number of hits if available
* added messages for "redirects to article", and "relevant section" hits
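The prefix handling described above ("all:" and image:something) could look roughly like the following. This is a Python sketch under assumptions, not the actual replacePrefixes code: the prefix table, the `None` sentinel for "every namespace" and the function signature are all invented for illustration.

```python
# Hypothetical prefix table; 6 is MediaWiki's image namespace ID.
NAMESPACE_PREFIXES = {"image": [6]}

# Sentinel meaning "search every namespace" (an assumption of this sketch).
ALL_NAMESPACES = None

def replace_prefixes(query, default_namespaces):
    """Strip a recognised prefix and return (query, namespaces to search)."""
    prefix, sep, rest = query.partition(":")
    if not sep:
        # No colon at all, so there is nothing to interpret.
        return query, default_namespaces
    key = prefix.strip().lower()
    if key == "all":
        return rest.strip(), ALL_NAMESPACES
    if key in NAMESPACE_PREFIXES:
        return rest.strip(), NAMESPACE_PREFIXES[key]
    # Unknown prefix: leave the query untouched.
    return query, default_namespaces

print(replace_prefixes("image:sunset", [0]))
print(replace_prefixes("all:sunset", [0]))
```

Unknown prefixes fall through unchanged, so a query that merely contains a colon is still searched as typed.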
so it throws an error. Created a "too many" class as an alternate search result
to return, and consider any error in SearchPostgres when running the actual search as a "too many"
problem. Not an ideal solution, but I'm not sure how to get at the error message
without requiring a newer version of PHP.
* Added support for configuration of an arbitrary number of commons-style file repositories.
* Split Image.php into filerepo/File.php and filerepo/LocalFile.php
* Renamed Image::getImagePath() to File::getPath()
* Added initial support for timestamp-based file fetching (OldLocalFile), to be expanded upon by aaron.
* Changed the interface for Image/File object creation: use wfFindFile() or wfLocalFile() depending on semantics
* ImageGallery::add() now accepts a title object as the first parameter
* Moved file handling operations on upload from SpecialUpload to File
* Removed path-related functions from ImageFunctions.php. Removed static path accessors from File.
* Added a Content-Disposition header to thumb.php output
* Improved thumb.php error handling
* Updated the unit test suite to kind of partially work with modern computers. RunTests.php doesn't work just yet. Fixed an actual regression that the test suite detected -- moved some defines to Defines.php where they will be loaded consistently.
* Add @addtogroup tags to various classes, to try and group conceptually-related classes together.
* Add brief descriptions to various Special pages, thanks to Phil Boswell.
* Moving some docs to be right above the classes they represent, so that they are picked up.