This reverts r45470 and fixes the problems it identified. Things should
now work as they always used to, but with less code duplication, and
with $wgEnforceHtmlIds = false working correctly as well.
This means that old links will still work in a lot of cases. However,
if the legacy anchor is invalid XML, we omit it. In particular, on
non-Latin wikis, practically all old section links (from external
sources or using external links) are likely to break, since the first
character of legacy anchors starting with a non-ASCII character will be
".", which is invalid in XML as well as in HTML4.
r45267 commented out the logic prohibiting numbers and so on at the
start of id's. I've uncommented this logic, but passed 'noninitial' as
an option to escapeId() in all necessary circumstances. This shouldn't
change behavior; it is to simplify some further work I'm about to do.
Fixes other cases broken by Parser's assumptions failing to hold after change in Title::isAlwaysKnown()'s behavior:
* Links to invalid Special: pages were being recorded, but shouldn't
* Links to valid MediaWiki: pages were no longer recorded
Instead of the NS_FILE special-case in r45174, I'm just tossing *all* isAlwaysKnown links over to ParserOutput::addLink(), and letting the latter worry about what types of titles it won't record.
Just for good measure, in case any NS_MEDIA titles make it into ParserOutput::addLink() they'll be normalized to NS_FILE.
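
For illustration, the filtering and normalization described above might look roughly like the following. This is a Python sketch rather than the actual PHP; the class name, the method signature, and the rule about empty database keys are assumptions, and only the namespace numbers are taken from MediaWiki core.

```python
# Namespace numbers as defined by MediaWiki core.
NS_MEDIA = -2
NS_SPECIAL = -1
NS_FILE = 6


class LinkRecorder:
    """Hypothetical stand-in for the link-tracking part of ParserOutput."""

    def __init__(self):
        # Maps namespace number -> {dbkey: page_id}.
        self.links = {}

    def add_link(self, ns, dbkey, page_id=0):
        """Record a link, letting this method decide what is not recordable."""
        if ns == NS_MEDIA:
            # Media: links point at the same file as File: links.
            ns = NS_FILE
        elif ns == NS_SPECIAL:
            # Special pages have no page row, so there is nothing to record.
            return
        elif dbkey == "":
            # Assumed rule: fragment-only links to the current page are skipped.
            return
        self.links.setdefault(ns, {})[dbkey] = page_id
```

With this shape, the caller can hand over every always-known title without special-casing any namespace itself.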
This adds a config option, $wgEnforceHtmlIds, true by default. If this
is set to false, all characters that are allowed in XML ids are let
through in header ids and manually-specified ids. In particular, this
should include all alphabetic and numeric characters.
Some remaining issues to work out:
* This will cause backward-compatibility issues for some types of links
and references: links from non-MediaWiki sources, links from MediaWiki
sources running a different version, external links, and references from
stylesheets/scripts. These could be partially alleviated by having a
second <a name="" id=""> for headers where the two versions differ, but
it would remain an issue for manually-specified id's.
* Any invalid characters are now, effectively, stripped (replaced with
underscores). This might cause problems if some writing systems are
invalid in id's for some reason: we'll want to double-check the list of
prohibited characters carefully.
* Some user agents might not support these links. IE5 appears to, and
so do recent versions of Opera and Firefox, but I didn't do extensive
testing.
* Not tested extensively, so there are probably some bugs.
I think this would be good to enable on testwiki for the moment to see
how it goes.
No parser test regressions. No change to RELEASE-NOTES, we can add that
when the option is enabled by default (ideally, removed entirely).
Calling it with no extra arguments will now assume that you're escaping
a whole id, not an id fragment, which is safer. Also, instead of ugly
bitfield-based options, I've changed the options to use an array of
strings. I fixed all callers in trunk. Out-of-tree callers that were
using Sanitizer::NONE will get correct behavior, while those that were
calling it with no arguments will get slightly changed behavior (an x
will be prepended). I think this is harmless enough that we can skip
back-compat cruft here.
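
A rough sketch of the new calling convention, written in Python rather than PHP. The function name and the exact encoding step are assumptions: `urllib.parse.quote` stands in for PHP's `urlencode()` and may differ on a few punctuation characters. Only the 'noninitial' option string and the prepended "x" come from the description above.

```python
import re
from urllib.parse import quote


def escape_id(raw_id, options=()):
    """Approximate the legacy id escaping (sketch only, not the real code)."""
    # Spaces become underscores, then anything outside a small safe set
    # is percent-encoded as UTF-8 bytes.
    encoded = quote(raw_id.replace(" ", "_"), safe="")
    # "%" is not allowed in ids, so "." is used as the escape character.
    encoded = encoded.replace("%", ".")
    # A whole id must start with a letter; a fragment ('noninitial') need not.
    if "noninitial" not in options and not re.match(r"[A-Za-z]", encoded):
        encoded = "x" + encoded
    return encoded
```

Called with no options, `escape_id("1st")` yields `x1st`, while `escape_id("1st", ["noninitial"])` leaves the leading digit alone. An id beginning with a non-ASCII character comes out starting with "." in the 'noninitial' case, which is the invalid-XML situation mentioned earlier.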
This should cause no visible changes. No parser test regressions.
This will break any preexisting links to such sections (other than
those generated by the software, of course). There should be no parser
test regressions.
This is a global search and replace of NS_IMAGE and NS_IMAGE_TALK with NS_FILE and NS_FILE_TALK respectively in all core files, excluding those already updated in step 1 (r44004).
The namespace parsing thing feels very hacky and grabs bits out of an internal implementation function which doesn't feel like a stable interface.
Would recommend thinking about this and coming up with a more serious stable interface for it.
** <poem> handling fixes: use DoubleReplacer class instead of create_function(), move recursiveTagParse above the line-break replacements, remove strip items, update parser tests to reflect the new output when combined with <nowiki>
Alt text is now set in the following ways, in decreasing priority:
1) Set to the alt= parameter if present.
2) Set to the unnamed (caption) parameter if present, and if the image does not have the thumb or frame option set (i.e., if the unnamed parameter is not actually being used for a caption -- using it as both caption and alt text would just lead to text being repeated).
3) Set to the empty string.
Title text and captions should not be affected in any case. The only backward-compatibility effect (i.e., on images not using the new alt= syntax) should be that if previously the same text was repeated in the alt text and then again in the caption, the alt text will now be empty. Setting the alt parameter should never change the HTML output compared to not setting it, except of course changing the alt text.
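
The priority order above can be sketched as follows. This is an illustrative Python version, not the parser's PHP; the function name and the dictionary keys are hypothetical.

```python
def choose_alt_text(params):
    """Pick the alt text for an image from its parsed options.

    `params` is a dict with hypothetical keys: 'alt' (alt= value),
    'caption' (the unnamed parameter), 'thumb' and 'frame' (booleans).
    """
    # 1) An explicit alt= parameter always wins, even if it is empty.
    if "alt" in params:
        return params["alt"]
    # 2) Fall back to the unnamed parameter, but only when it is not
    #    already being displayed as a visible caption.
    caption = params.get("caption")
    shows_caption = params.get("thumb", False) or params.get("frame", False)
    if caption is not None and not shows_caption:
        return caption
    # 3) Otherwise the alt text is empty.
    return ""
```

For example, a thumbnail with only a caption gets empty alt text, so the caption is not read out twice.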
All parser tests pass, except the usual ones.
* Renamed to "link", which seems clearer and less mouse-centric ;)
* Added parser test cases:
3 new PASSING test(s) :)
* Image with link parameter, wiki target [Has never failed]
* Image with link parameter, URL target [Has never failed]
* Image with empty link parameter [Has never failed]
* Don't call quickUserCan('edit') unless section edit is enabled
* In DatabasePostgres and DatabaseSqlite: throw an exception on connection error
* In DatabasePostgres: don't send an invalid connection string whenever one of the fields is empty. Use quoting.
* In Database: make the captured PHP error prettier
* Display a descriptive error message, rather than a parse error, when the user navigates to index.php with PHP 4. Check to see if the *.php5 extension works, using file_get_contents().
* The default port number for PostgreSQL is 5432, not blank.
* Better default for $wgDBname
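
As a sketch of the connection-string fix in DatabasePostgres, one way to avoid sending an invalid string is to skip empty fields and quote the rest. This is Python, not the actual PHP, and the function name and the choice to omit empty fields are assumptions. The quoting follows the libpq keyword/value rules: values go in single quotes, with backslashes and single quotes escaped by a backslash.

```python
def build_conn_string(host="", port="", dbname="", user="", password=""):
    """Build a libpq-style keyword/value connection string (sketch only)."""
    fields = [
        ("host", host),
        ("port", port),
        ("dbname", dbname),
        ("user", user),
        ("password", password),
    ]
    parts = []
    for key, value in fields:
        value = str(value)
        if value == "":
            # Leave the keyword out so the server-side default applies.
            continue
        # Escape backslashes first, then single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        parts.append(f"{key}='{escaped}'")
    return " ".join(parts)
```

Quoting every value keeps names containing spaces or quotes from breaking the string.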
The caption was originally defined *as* the alt text (defaulting to the image file name if there is no alt text). Note that a separate caption text is only displayed in some display modes ('frame' and 'thumb', iirc), and not by default.
Please run the parser tests and check the effect you have on them. If it's really an appropriate change, then update the test cases. If you're not sure, consider backing out pending further discussion. :)
It might be appropriate to omit the 'alt' attribute in the frame/thumb cases, but definitely not for inline images, where we already have a way of setting the alt text, which you're removing!
* Removed the namespace parameter from Linker::makeExternalLink(), added a generic associative array of attributes instead. Let the Parser decide whether to use rel=nofollow.