TITLE.DOC

The Wikipedia software's "Title" class represents article
titles, which are used for many purposes: as the human-readable
text title of the article, in the URL used to access the article,
in the wikitext link to the article, as the key into the article
database, and so on. The class is instantiated from one of
these forms and can be queried for the others, and for other
attributes of the title. This is intended to be an
immutable "value" class, so there are no mutator functions.

To get a new instance, call one of the static factory
methods WikiTitle::newFromURL(), WikiTitle::newFromDBKey(),
or WikiTitle::newFromText(). Once instantiated, the
non-static accessor methods can be used, such as
getText(), getDBKey(), getNamespace(), etc.

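The factory-plus-accessor pattern described above can be sketched
in a few lines. This is a minimal Python model, not the real class:
the name TitleSketch and its snake_case methods are hypothetical,
prefix handling and canonicalization are left out, and percent-decoding
with UTF-8 is an assumption.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass(frozen=True)  # frozen: attribute assignment raises, so no mutators
class TitleSketch:
    """Hypothetical stand-in for the Title class, for illustration only."""

    _dbkey: str  # stored internally in DB-key form (underscores)

    @classmethod
    def new_from_text(cls, text: str) -> "TitleSketch":
        # Text form uses spaces; convert to the underscore form.
        return cls(text.replace(" ", "_"))

    @classmethod
    def new_from_dbkey(cls, key: str) -> "TitleSketch":
        return cls(key)

    @classmethod
    def new_from_url(cls, url_title: str) -> "TitleSketch":
        # Assumption: the URL form is percent-encoded UTF-8.
        return cls(unquote(url_title).replace(" ", "_"))

    def get_text(self) -> str:
        return self._dbkey.replace("_", " ")

    def get_dbkey(self) -> str:
        return self._dbkey
```

Whichever factory is used, the resulting objects compare equal when
they name the same title, which is what makes the class usable as a
plain value.
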
The prefix rules: a title consists of an optional interwiki
prefix (such as "m:" for meta or "de:" for German), followed
by an optional namespace, followed by the remainder of the
title. Both interwiki prefixes and namespace prefixes have
the same rules: they contain only letters, digits, space, and
underscore, must start with a letter, are case insensitive,
and treat spaces and underscores as interchangeable. Prefixes end
with a ":". A prefix is only recognized if it is one of those
specifically allowed by the software. For example, "de:name"
is a link to the article "name" in the German Wikipedia, because
"de" is recognized as one of the allowable interwikis. The
title "talk:name" is a link to the article "name" in the "talk"
namespace of the current wiki, because "talk" is a recognized
namespace. Both may be present, and if so, the interwiki must
come first, for example, "m:talk:name". If a title begins with
a colon as its first character, no prefixes are scanned for,
and the colon is just removed. Note that because of these
rules, it is possible to have articles with colons in their
names. "E. Coli 0157:H7" is a valid title, as is "2001: A Space
Odyssey", because "E. Coli 0157" and "2001" are not valid
interwikis or namespaces. Likewise, ":de:name" is a link to
the article "de:name"--even though "de" is a valid interwiki,
the initial colon stops all prefix matching.

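The prefix rules above can be sketched as a small parser. This is
a Python illustration under stated assumptions: the INTERWIKIS and
NAMESPACES sets are made-up stand-ins for whatever tables the software
actually consults, "letters" is taken to mean ASCII letters, and the
function name split_prefixes is hypothetical.

```python
import re

# Assumed, illustrative tables; the real software has its own lists.
INTERWIKIS = {"m", "de"}
NAMESPACES = {"talk"}

# A candidate prefix starts with a letter, continues with letters, digits,
# spaces or underscores, and ends at the first ":".
_PREFIX = re.compile(r"^([A-Za-z][A-Za-z0-9 _]*):(.*)$", re.DOTALL)


def _normalize(prefix: str) -> str:
    # Prefixes are case insensitive; spaces and underscores are interchangeable.
    return prefix.lower().replace(" ", "_")


def split_prefixes(title: str):
    """Return (interwiki, namespace, remainder); absent parts are ''."""
    if title.startswith(":"):
        # A leading colon disables prefix scanning and is simply removed.
        return "", "", title[1:]

    interwiki = namespace = ""
    m = _PREFIX.match(title)
    # The interwiki, if present, must come first.
    if m and _normalize(m.group(1)) in INTERWIKIS:
        interwiki, title = _normalize(m.group(1)), m.group(2)
        m = _PREFIX.match(title)
    # Then an optional namespace.
    if m and _normalize(m.group(1)) in NAMESPACES:
        namespace, title = _normalize(m.group(1)), m.group(2)
    return interwiki, namespace, title
```

With these tables, split_prefixes("m:talk:name") yields
("m", "talk", "name"), while "2001: A Space Odyssey" and ":de:name"
come back with both prefix slots empty because nothing before the
colon is a recognized prefix, or because scanning was switched off.
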
Character mapping rules: Once prefixes have been stripped, the
rest of the title is processed this way: spaces and underscores are
treated as equivalent and each is converted to the other in the
appropriate context (underscores in URLs and database keys, spaces
in plain text). "Extended" characters in the 0x80..0xFF range
are allowed in all places, and are valid characters. They are
encoded in URLs. Other characters may be ASCII letters, digits,
hyphen, comma, period, apostrophe, parentheses, and colon. No
other ASCII characters are allowed; any that are found will be
deleted (they would probably cause a browser to misinterpret the URL).
Extended characters are _not_ urlencoded when used as text or
database keys.

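A rough Python sketch of the character mapping above follows. The
function names are hypothetical, extended characters are modelled as
any code point at or above 0x80, and the URL encoding is assumed to be
percent-encoded UTF-8, which the text itself does not specify.

```python
import string
from urllib.parse import quote

# ASCII characters the text lists as allowed, plus space and underscore.
ALLOWED_ASCII = set(string.ascii_letters + string.digits + "-,.'(): _")


def clean(rest: str) -> str:
    """Delete disallowed ASCII characters; keep everything >= 0x80."""
    return "".join(c for c in rest if ord(c) >= 0x80 or c in ALLOWED_ASCII)


def to_dbkey(rest: str) -> str:
    # Database keys use underscores; extended characters stay unencoded.
    return clean(rest).replace(" ", "_")


def to_text(rest: str) -> str:
    # Plain text uses spaces; extended characters stay unencoded.
    return clean(rest).replace("_", " ")


def to_url(rest: str) -> str:
    # URLs use underscores; only extended characters get percent-encoded
    # here (assumption: UTF-8, the default for urllib.parse.quote).
    return quote(to_dbkey(rest), safe="-,.'():_")
```

For example, to_dbkey("Hello <World>!") drops the angle brackets and
the exclamation mark and returns "Hello_World".
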
Character encoding rules: TODO

Canonical forms: the canonical form of a title will always be
returned by the object. In this form, the first (and only the
first) character of the namespace and title will be uppercased;
the rest of the namespace will be lowercased, while the title
will be left as is. The text form will use spaces, the URL and
DBkey forms will use underscores. Interwiki prefixes are all
lowercase. The namespace will use underscores when returned
alone; it will use spaces only when attached to the text title.

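The canonicalization rules above can be expressed as a handful of
small functions. This is a Python sketch only; the function names and
the "wikipedia talk" namespace used in the usage note are illustrative
assumptions, not taken from the software.

```python
def canonical_interwiki(prefix: str) -> str:
    # Interwiki prefixes are all lowercase.
    return prefix.lower()


def canonical_namespace(ns: str) -> str:
    # First character uppercased, the rest lowercased; underscores when
    # the namespace is returned on its own.
    ns = ns.replace(" ", "_")
    return ns[:1].upper() + ns[1:].lower()


def canonical_dbkey(title: str) -> str:
    # Only the first character is uppercased; the rest is left as is.
    title = title.replace(" ", "_")
    return title[:1].upper() + title[1:]


def canonical_text(ns: str, title: str) -> str:
    # The text form uses spaces, including inside an attached namespace.
    text = canonical_dbkey(title).replace("_", " ")
    if ns:
        return canonical_namespace(ns).replace("_", " ") + ":" + text
    return text
```

Under these assumptions, canonical_text("wikipedia_TALK", "main_Page")
gives "Wikipedia talk:Main Page", while canonical_namespace on the same
input alone gives "Wikipedia_talk".
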
getArticleID() needs some explanation: for "internal" articles,
it should return the "cur_id" field if the article exists, else
it returns 0. For all external articles it returns 0. All of
the IDs for all instances of Title created during a request are
cached, so they can be looked up quickly while rendering wiki
text with lots of internal links.

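The caching behaviour of getArticleID() can be illustrated with a
small per-request cache. This is a Python sketch under assumptions:
the ArticleIdCache class and the lookup callable are hypothetical, the
lookup stands in for whatever database query fetches "cur_id", and
namespaces are ignored in the cache key for brevity.

```python
class ArticleIdCache:
    """Hypothetical per-request cache of article IDs, keyed by DB key."""

    def __init__(self, lookup):
        # lookup: assumed callable mapping a DB key to a cur_id, or None
        # when the article does not exist.
        self._lookup = lookup
        self._ids = {}

    def get_article_id(self, dbkey: str, interwiki: str = "") -> int:
        if interwiki:
            # External (interwiki) articles always report 0.
            return 0
        if dbkey not in self._ids:
            # Missing articles are cached as 0 too, so each title costs
            # at most one lookup per request.
            self._ids[dbkey] = self._lookup(dbkey) or 0
        return self._ids[dbkey]
```

A page that links to the same internal article many times would then
trigger only one lookup for it, which is the point of the cache when
rendering link-heavy wiki text.
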