Doxygen was unable to parse the file past validateSig().
> Parser.php:6397: warning: reached end of file while inside a ~~~ block!
> The command that should end the block seems to be missing!
Change-Id: I3d1b547968302611d2bd78a7c11dd0738b40d23a
* This makes use of the injected new revision object used elsewhere
in Parser to solve this problem.
Bug: T94407
Change-Id: I7881583cf7cb2bc799c89ffaa2a344a2d4ca3a4e
This drops support for the custom utf8 normal PHP extension in favor
of the intl extension.
Bug: T90825
Change-Id: Ifbaeb2ef684217cf6187ccc4fb4d303f89608300
* There's a branch path in the sanitizer that depends on $wgUseTidy,
which means the test output differs from on wiki.
* In general, we should set these variables to match the wiki behaviour
in tests.
* Exposes T92892, Sanitizer removes empty tags when tidy is disabled.
* Tweaked tests for T19663 to use an extension tag to show that
HTML5 tags with non-word characters make it through the parser
intact (before being ultimately sanitized).
Change-Id: I09c72fd739e11a8b757f37dc4c790758d782ad73
This is a hard deprecation, with getSecondaryDataUpdates returning an
empty array and addSecondaryDataUpdate throwing an exception. This seems
prudent since there are no known users of these methods, and they
interfere with the parser cache:
DataUpdates are basically jobs, they need access to services to
function. That makes them inherently non-serializable. This interferes
with the function of the parser cache, which serializes ParserOutput
objects in order to persist them.
This could be solved by splitting DataUpdates into DataUpdateDefinitions
and DataUpdateHandlers, similar to how JobSpecification works with
wgJobClasses. That however seems pointless and overkill, since
ParserOutput already has a mechanism for storing arbitrary data,
including any info needed by an UpdateJob: the setExtensionData method.
After this change, the preferred method to introduce custom data updates
is to store any relevant data using setExtensionData and
implement Content::getSecondaryDataUpdates() if possible. If not,
use the 'SecondaryDataUpdates' hook to construct the necessary update
objects from the info stored using setExtensionData.
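
The pattern described above can be illustrated with a minimal sketch. It is written in Python rather than PHP, and every name in it (`ParserOutputSketch`, `CountUpdate`, `secondary_data_updates`, the `example-count` key) is hypothetical and not part of MediaWiki. It only shows the idea: cache plain data, and build the update objects on demand after unserialization.

```python
import json


class ParserOutputSketch:
    """Hypothetical stand-in for a cacheable parser output object."""

    def __init__(self):
        self.extension_data = {}

    def set_extension_data(self, key, value):
        # Only plain, serializable values are stored here.
        self.extension_data[key] = value

    def get_extension_data(self, key):
        return self.extension_data.get(key)

    def serialize(self):
        return json.dumps(self.extension_data)

    @classmethod
    def unserialize(cls, blob):
        out = cls()
        out.extension_data = json.loads(blob)
        return out


class CountUpdate:
    """Hypothetical update object; built on demand, never cached."""

    def __init__(self, count):
        self.count = count

    def do_update(self, store):
        store["count"] = self.count


def secondary_data_updates(output):
    """Plays the role of a hook handler: rebuild updates from stored data."""
    count = output.get_extension_data("example-count")
    return [] if count is None else [CountUpdate(count)]


# Round trip through the "cache", then construct and run the updates.
out = ParserOutputSketch()
out.set_extension_data("example-count", 3)
restored = ParserOutputSketch.unserialize(out.serialize())
store = {}
for update in secondary_data_updates(restored):
    update.do_update(store)
```

Because the cached object holds only plain data, serialization cannot fail on service references; the update object exists only for the duration of the update run.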
Change-Id: I0f6f49e61fa3d8904e55f42c99f342a3dc357495
* Use special prioritized refreshLinksJobs instead, which triggers when
transcluded pages are changed
* Also added a triggerOpportunisticLinksUpdate() method to handle
dynamic transcludes
Bug: T89389
Change-Id: Iea952d4d2e660b7957eafb5f73fc87fab347dbe7
* Use special prioritized refreshLinksJobs instead, which triggers when
transcluded pages are changed
* Also added a triggerOpportunisticLinksUpdate() method to handle
dynamic transcludes
Bug: T89389
Change-Id: I8e5a6ddb643c12e0fb5c1c68bc83f912944e6e8d
Currently, the parser adds a "_2" to the second of two identical headlines to
avoid collisions, but there's still a collision if another headline actually
ends in "_2". This change causes the new headline to also be checked for a
collision, and advances to "_3" or beyond if there is one.
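
A minimal sketch of the collision check described above, in Python for illustration only. The function name `unique_anchor` and the use of a plain set are assumptions; the real parser code is PHP and may normalize anchors differently.

```python
def unique_anchor(base, used):
    """Return an anchor not yet in `used`, and record it there."""
    candidate = base
    suffix = 2
    # Keep advancing the suffix until the candidate itself is free,
    # instead of assuming that "<base>_2" cannot already exist.
    while candidate in used:
        candidate = f"{base}_{suffix}"
        suffix += 1
    used.add(candidate)
    return candidate


used = set()
anchors = [unique_anchor(h, used) for h in ["Foo", "Foo_2", "Foo"]]
```

With the headlines "Foo", "Foo_2", "Foo", the third one is assigned "Foo_3" because "Foo_2" is already taken by a real headline.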
Bug: T26787
Change-Id: Id0a55aa4c1917bac2f8f0d4863fcb85bd3dff1ca
* Sanitizer: dev.w3.org/html5/spec-preview
Follows-up 8e8b15afc6.
Use stable reference to www.w3.org/TR/html5 (currently
from October 2014) instead of an old preview branch from 2012.
* parserTests: dev.w3.org/html5
Follows-up 959aa336a1.
URL is now a dead end. Replaced with a link to a draft from around
that time. The relevant section no longer exists in the current
spec as it got split off into a separate spec. Maybe this one:
https://url.spec.whatwg.org/#percent-encoded-bytes
* Parser, HTMLIntField: dev.w3.org/html5
Use stable reference to www.w3.org/TR/html5 instead.
* HTMLFloatField.php: dev.w3.org/html5
URL is now a dead end. Draft from around that time:
http://www.w3.org/TR/2011/WD-html5-20110525/common-microsyntaxes.html#real-numbers
The section "Real numbers" no longer exists in the current spec,
but the Infrastructure chapter has a section on floating point
numbers that describes the same sequence now.
Change-Id: I7dcd49b6cd39785fb1b294e4eeaf39bda52337b2
This demonstrates how we can transition from extensions putting
things into the global scope ($wgTrackingCategories) to instead
storing them in the extension registry. This will increase the
overall performance of the extension registry since it no
longer needs to do an array_merge with $wgTrackingCategories.
For extensions already converted to using the registry
no change is needed as the schema is still the same.
Change-Id: Ie0df4c20b123dac784a1c02eb991edc609a911b6
My previous patch broke this: ApiStashEdit would stash ParserOutput
with no custom DataUpdates, but calling getSecondaryDataUpdates still
failed after unserialization. This patch should fix that.
Bug: T86305
Change-Id: Ic114e521c5dfd0d3c028ea7d16e93eace758deef
Xhprof generates this data now. Custom profiling of various
sub-function units is kept.
Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.
Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
When a page transcludes itself, such as <noinclude>foo
{{:{{FULLPAGENAME}}}}</noinclude><includeonly>bar</includeonly>, use the
preview content in its own transclusions. This code was basically ripped
straight from Extension:TemplateSandbox.
Bug: T85408
Bug: T7278
Change-Id: I1aa091a395a4f7b7b744e09e0bed59bc2e1176d0
This continues the work started in T67278 to make magic link parsing
more consistent with wiki text parsing in general, and closes two
long-standing bugs.
Bug: T30950
Bug: T31025
Change-Id: I71f8b337543163569c64bbfdec154eb9b69d7264
When #tag is given a tag that it doesn't recognize, re-emit it as a
regular tag instead of giving an error. This allows for it to be used with
transparent tags and HTML tags.
Change-Id: I0ceee8a4fdaf2d3142054a108f445ff06597c31a
This is broken, for reasons indicated in
<https://gerrit.wikimedia.org/r/#/c/180384/>. It was broken before, but I made
it more broken. So revert for now, and I'll give this another stab.
Change-Id: I7e67a61f7d6370f90487be6470bebe1449432a4c
Autolinking free external links is clever about making sure that trailing
punctuation isn't included in the link. But if an HTML entity happens to
terminate the URL, the semicolon from the entity is stripped from the URL,
breaking it.
Fix this corner case. This also unifies autolink parsing with Parsoid.
See: I5ae8435322c78dd1df170d7a3543fff3642759b1
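
The corner case can be sketched as follows, in Python for illustration only. The punctuation set, the entity pattern, and the name `split_trailing_punct` are assumptions made for this sketch; the actual parser handles more cases (parentheses, for example).

```python
import re

# Assumed set of characters treated as trailing punctuation.
TRAILING_PUNCT = ",;.:!?"
# Matches an entity that is only missing its terminating semicolon.
ENTITY_TAIL = re.compile(r"&(?:[A-Za-z]+|#[0-9]+|#[xX][0-9A-Fa-f]+)$")


def split_trailing_punct(url):
    """Split `url` into (link target, trailing punctuation)."""
    stripped = url.rstrip(TRAILING_PUNCT)
    trail = url[len(stripped):]
    # If the first stripped character is the ';' that terminates an
    # HTML entity, give it back to the URL.
    if trail.startswith(";") and ENTITY_TAIL.search(stripped):
        stripped += ";"
        trail = trail[1:]
    return stripped, trail
```

For `http://example.com/?a=1&amp;.` this keeps `&amp;` inside the link and leaves only the final `.` outside it.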
Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
The behavior of the different preprocessors differs when given \r or
\r\n newlines. We already normalize the latter here, so may as well do
the former here too.
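
A minimal sketch of the normalization, in Python for illustration; the function name is hypothetical. The order matters: `\r\n` must be replaced first, otherwise each `\r\n` would turn into two newlines.

```python
def normalize_newlines(text):
    """Convert \\r\\n and lone \\r line endings to \\n."""
    # Replace the two-character sequence first, then any remaining lone \r.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```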
Bug: T78488
Change-Id: Id6390f64a73ea01088729f25d79103388c1fe7e8
Ensure that there is a \b boundary before and after RFC, PMID, and ISBN
links. (Previously we enforced \b boundaries only before free external
links and after ISBN links.) Consistency is a good thing!
In addition:
* \b is not a PHP escape sequence, so you don't need to write \\b inside
a string.
* \b before the numeric part of an ISBN is pointless: by the structure
of the regexp there will always be a space on the left and a word
character (a digit) on the right.
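
The effect of the boundaries can be shown with a simplified pattern, in Python for illustration. The regexp below is an assumption made for this sketch and is far smaller than the parser's real one. Unlike in PHP, `\b` in a non-raw Python string is a backspace escape, so the sketch uses a raw string.

```python
import re

# Simplified, hypothetical magic-link pattern with \b on both sides.
MAGIC = re.compile(r"\b(RFC|PMID) +([0-9]+)\b")


def find_magic_links(text):
    """Return (keyword, number) pairs for each magic link found."""
    return [(m.group(1), m.group(2)) for m in MAGIC.finditer(text)]
```

With this pattern, `See RFC 2616 for details` yields one link, while `xRFC 2616` and `RFC 2616abc` yield none, because the keyword or the number runs into an adjacent word character.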
Bug: 65278
Change-Id: Ic315b988091a5c7530a8285b9249804db72e55db
* Make the internal MWTidy::*clean() functions always return an array of two
elements: the output buffer and the error buffer.
* Make MWTidy::externalTidy() always read both stdout and stderr. We can read
stderr after stdout because tidy.c produces output in the same order.
* Remove the $stderr parameter from the private MWTidy::*clean() methods, since
error output is always returned.
* Merge MWTidy::phpClean and MWTidy::hhvmClean, since the difference between
them is now small enough that splitting them up is not warranted.
* On HHVM, MWTidy::internalTidy() always returns an empty string for the error
buffer.
Change-Id: I178b42d6ebdd1a5b9bd5921eb093a6c5014ffa49
* This also changes previews to render section edit tokens but
remove them on output, avoiding cache fragmentation.
* Also shortened the resulting getStashKey() value.
Change-Id: Ic8fa87669106b960c76912b864788b781f6ee2e6
- Added/removed spaces around parentheses
- Added newline in empty blocks
- Added space after switch/foreach/function
- Used tabs at beginning of line
- Added newline at end of file
Change-Id: I244cdb2c333489e1020931bf4ac5266a87439f0d
EZC doesn't currently support direct access to object properties via the
obj->std.properties hashtable, but tidy uses this extensively. But it
turns out that for production use cases, tidy_repair_string() should be
sufficient. $wgDebugTidy and $wgValidateAllHtml are not used, and
no deployed extension calls MWTidy::checkErrors().
The only difference I know of is that errors from tidy (status==2) lead
to the tidy output being used, rather than discarded. But
TY_(ReportFatal) has very few callers in tidylib -- probably none that
are reachable from stripped parser output.
So, throw an exception if MWTidy::checkErrors() is requested on an HHVM
instance with the tidy extension. For MWTidy::tidy(), use
tidy_repair_string(). Refactor some relevant code.
Bug: T758
Change-Id: I8d5b1c2c9f9ddce46d8ad099a671a2e297d256e0