This change adds the ability to enable OOUI from within the parser,
so that parser tag functions can easily enable OOUI, if they need it,
for every page view, directly from the function that handles the
parser tag.
Bug: T106949
Change-Id: If1e139d4f07be98e418e11470794ea42e8a9b2eb
* \s matches the trim on the line.
* Since leading space is ok for table start tags, and you can use them
in ":" context, you should be able to compose the two together.
Bug: T105238
Change-Id: Id08e24e5dd2bb8ca09453adec87b21225df4a840
Non-string input shouldn't be fed into newFromText(). We currently handle this
indirectly by relying on Title to do it. Instead, just return early and don't
try to construct a title from bad input.
Bug: T102321
Change-Id: I9bc96111378d9d4ed5981bffc6f150cbd0c1e331
If someone renames a section but wants old targeted links to still work,
<span id="old-anchor"></span> is the usual solution. And sometimes
people put it inside the section header markup, like
== <span id="old-anchor"></span>New name ==
since putting it before makes it be considered part of the previous section
while putting it after causes the browser to scroll the section header
off the screen.
But this has the unfortunate side effect that the TOC text for that
section will be "<span></span>New name". We should strip that useless
empty span.
Bug: T96153
Change-Id: I47a33ceb79d48f6d0c38fa3b3814a378feb5e31e
Generating one-time, unique strip markers hurts us in multiple ways:
* The strip marker regexes don't benefit from JIT compilation, so they are
slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
compiled, because HHVM bets on regexes getting reused. This extra work is
fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
displaces from the cache regexes which are in fact reused.
Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:
* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*:preprocessToObj() and Parser::fetchTemplateAndTitle,
replace any occurrences of \x7f with '?', to prevent strip marker forgery.
\x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
prefix may no longer be specified.
Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
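The forgery protection described above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP in the change: the marker values and the function name `sanitize_input` are assumptions, since the commit message does not state them.

```python
# Hypothetical fixed marker parts; the real Parser::MARKER_PREFIX and
# Parser::MARKER_SUFFIX values are not given in the commit message.
MARKER_PREFIX = "\x7fUNIQ-"
MARKER_SUFFIX = "-QINU\x7f"


def sanitize_input(text):
    """Replace \\x7f in source text so it can never contain a marker.

    Because the markers are now fixed rather than random, user input
    must not be able to carry the delimiter byte at all.
    """
    return text.replace("\x7f", "?")


def make_marker(name, index):
    """Build a marker from the fixed prefix and suffix (illustrative)."""
    return f"{MARKER_PREFIX}{name}-{index}{MARKER_SUFFIX}"
```

Since \x7f never survives sanitizing, any marker found later in the text must have been produced by the parser itself.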
preg_match_all can return false on failure, which then results in
undefined index access.
Check the result and treat a failure as nothing found by processing an
empty array.
Change-Id: I1f11894240dc6869506d68d3513715abdc3abb5d
When adding strip markers, allow closures to be passed in place of text.
The closure is then called during unstrip. Also, add a hook that runs
after unstripGeneral. This is needed for Extension:Cite's I0e136f952.
Change-Id: If83b0623671fd67e5ccc9deaaaab456a6679af8f
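A minimal sketch of the closure idea above, in Python rather than PHP. The class name `LazyStripState` and its methods are hypothetical and only illustrate deferring work until unstrip time; the hook mentioned in the commit is not modeled.

```python
class LazyStripState:
    """Toy strip state whose entries may be text or zero-argument callables."""

    def __init__(self):
        self.items = {}

    def add(self, marker, value):
        # value is either a string or a callable returning a string.
        self.items[marker] = value

    def unstrip(self, text):
        for marker, value in self.items.items():
            if marker in text:
                # Callables are only invoked here, at unstrip time.
                replacement = value() if callable(value) else value
                text = text.replace(marker, replacement)
        return text
```

The benefit is that content which depends on the rest of the parse (such as a list of references) can be computed after all markers have been registered.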
Doxygen was unable to parse the file past validateSig().
> Parser.php:6397: warning: reached end of file while inside a ~~~ block!
> The command that should end the block seems to be missing!
Change-Id: I3d1b547968302611d2bd78a7c11dd0738b40d23a
* This makes use of the injected new revision object used elsewhere
in Parser to solve this problem.
Bug: T94407
Change-Id: I7881583cf7cb2bc799c89ffaa2a344a2d4ca3a4e
Currently, the parser adds a "_2" to the second of two identical headlines to
avoid collisions, but there's still a collision if another headline actually
ends in "_2". This change causes the new headline to also be checked for a
collision, and advances to "_3" or beyond if there is one.
Bug: T26787
Change-Id: Id0a55aa4c1917bac2f8f0d4863fcb85bd3dff1ca
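The collision check described above can be sketched as a loop that re-checks each suffixed candidate. This is an illustrative Python sketch; the function name and the use of a set are assumptions, and details such as case-insensitive comparison are left out.

```python
def unique_anchor(anchor, used):
    """Return an anchor not yet in `used`, and record it there."""
    candidate = anchor
    suffix = 2
    # Keep advancing until the candidate itself is free. Checking only
    # the base name would miss a headline that literally ends in "_2".
    while candidate in used:
        candidate = f"{anchor}_{suffix}"
        suffix += 1
    used.add(candidate)
    return candidate
```

For example, with headlines "Foo", "Foo_2", and "Foo" in that order, the third headline receives "Foo_3" rather than colliding with the second.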
* Sanitizer: dev.w3.org/html5/spec-preview
Follows-up 8e8b15afc6.
Use a stable reference to www.w3.org/TR/html5 (currently
from October 2014) instead of an old preview branch from 2012.
* parserTests: dev.w3.org/html5
Follows-up 959aa336a1.
URL is now a dead end. Replaced with a link to a draft from around
that time. The relevant section no longer exists in the current
spec as it got split off into a separate spec. Maybe this one:
https://url.spec.whatwg.org/#percent-encoded-bytes
* Parser, HTMLIntField: dev.w3.org/html5
Use stable reference to www.w3.org/TR/html5 instead.
* HTMLFloatField.php: dev.w3.org/html5
Url is now a dead end. Draft from around that time:
http://www.w3.org/TR/2011/WD-html5-20110525/common-microsyntaxes.html#real-numbers
The section "Real numbers" no longer exists in the current spec,
but the Infrastructure chapter has a section on floating point
numbers that describes the same sequence now.
Change-Id: I7dcd49b6cd39785fb1b294e4eeaf39bda52337b2
Xhprof generates this data now. Custom profiling of various
sub-function units is kept.
Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.
Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
This continues the work started in T67278 to make magic link parsing
more consistent with wiki text parsing in general, and closes two
long-standing bugs.
Bug: T30950
Bug: T31025
Change-Id: I71f8b337543163569c64bbfdec154eb9b69d7264
Autolinking free external links is clever about making sure that trailing
punctuation isn't included in the link. But if an HTML entity happens to
terminate the URL, the semicolon from the entity is stripped from the url,
breaking it.
Fix this corner case. This also unifies autolink parsing with Parsoid.
See: I5ae8435322c78dd1df170d7a3543fff3642759b1
Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
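The corner case above can be illustrated with a small sketch. This is Python, not the parser's PHP, and the punctuation set, the entity pattern, and the function name are all assumptions made for illustration.

```python
import re

# Assumed set of trailing characters that should not be part of a link.
TRAILING = ",;.:!?"
# A semicolon that closes a named or numeric HTML entity.
ENTITY_AT_END = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);$")


def split_trailing_punctuation(url):
    """Split url into (link, trail) without breaking a final entity."""
    end = len(url)
    while end > 0 and url[end - 1] in TRAILING:
        # Stop if this semicolon terminates an entity such as "&amp;".
        if url[end - 1] == ";" and ENTITY_AT_END.search(url[:end]):
            break
        end -= 1
    return url[:end], url[end:]
```

A plain trailing semicolon is still stripped; only one that closes an entity is kept with the URL.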
The behavior of the different preprocessors differs when given \r or
\r\n newlines. We already normalize the latter here, so may as well do
the former here too.
Bug: T78488
Change-Id: Id6390f64a73ea01088729f25d79103388c1fe7e8
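The normalization above amounts to the following, sketched in Python under the assumption that a lone \r is mapped to \n; the function name is hypothetical.

```python
def normalize_newlines(text):
    """Convert \\r\\n and lone \\r to \\n."""
    # Replace the two-character sequence first, so that "\r\n" does not
    # turn into two newlines when lone "\r" is handled afterwards.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```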
Ensure that there is a \b boundary before and after RFC, PMID, and ISBN
links. (Previously we enforced \b boundaries only before free external
links and after ISBN links.) Consistency is a good thing!
In addition:
* \b is not a PHP escape sequence, so you don't need to write \\b inside
a string.
* \b before the numeric part of an ISBN is pointless: by the structure
of the regexp there will always be a space on the left and a word
character (a digit) on the right.
Bug: 65278
Change-Id: Ic315b988091a5c7530a8285b9249804db72e55db
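The effect of the \b boundaries above can be shown with a simplified pattern. This Python sketch covers only RFC and PMID, and the pattern is an assumption, not the parser's actual regexp. Note that, unlike in PHP, "\b" in an ordinary Python string is a backspace, so a raw string is used here.

```python
import re

# \b before the keyword and after the digits, as described above.
MAGIC_LINK = re.compile(r"\b(RFC|PMID) +([0-9]+)\b")


def find_magic_links(text):
    """Return (keyword, number) pairs for each magic link in text."""
    return [(m.group(1), m.group(2)) for m in MAGIC_LINK.finditer(text)]
```

With the boundaries in place, "XRFC 2616" and "RFC 2616bis" no longer produce links, while "RFC 2616." still does.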
- Added/removed spaces around parentheses
- Added newline in empty blocks
- Added space after switch/foreach/function
- Use tabs at the beginning of lines
- Add newline at end of file
Change-Id: I244cdb2c333489e1020931bf4ac5266a87439f0d
* Added a standard getFunctionStats() method for Profilers to return
per function data as maps. This is not toolbar specific like getRawData().
* Cleaned up the interface of SectionProfiler::getFunctionStats() a bit.
* Removed unused cpu_sq, real_sq fields from profiler UDP output.
* Moved getTime/getInitialTime to ProfilerStandard.
Co-Authored-By: Aaron Schulz <aschulz@wikimedia.org>
Change-Id: I266ed82031a434465f64896eb327f3872fdf1db1
includes/parser/Parser.php
* Pull out a chunk of code we need to reuse from parse() to
internalParseHalfParsed(). This is a fully backwards-compatible
change.
Code changes:
* Add a guard for running ParserBeforeTidy and ParserAfterTidy
hooks, as extensions might not expect them to be called for
snippets, only full page content.
* Change $options to $this->mOptions.
The bulk of parsing work is now done in internalParse() and
internalParseHalfParsed(); parse() only handles four things:
* Resetting parser state when a parse starts/finishes
* Page title language conversion
* Outputting limit report and limitation warnings
* Running ParserAfterParse hook (dunno why, but it's documented)
* Expand documentation for recursiveTagParse(), with some uppercase
warnings so that no one does the stupid thing I did ever again.
* Add new public method recursiveTagParseFully(), which is a
recursive parser entry point that produces fully parsed HTML ready
for inclusion in HTML output. Compared to Parser::parse(), it
doesn't produce limit reports and doesn't run the ParserAfterParse
hook.
includes/parser/CoreTagHooks.php
* Use the new recursiveTagParseFully() method.
* Use Parser::stripOuterParagraph() to remove silly tags.
Bug: 72887
Change-Id: I89ae9a50b82245f9a9e4a903563aeb1c51b6103e
Breaks extensions, doesn't entirely fix the problem it was meant to fix.
This reverts commit 6da3f169ac.
Change-Id: Ic193abcff8c72b0c8b434fcac514f88603a45beb
The JIT compiler in newer versions of PCRE experiences lock contention
when multithreaded applications perform a high rate of concurrent
compilations. We are seeing some performance impact on HHVM under normal
production traffic.
The random part of the strip marker is just there to protect against
deliberate insertion of strip markers into the source text, which is
very rare. So use a generic regex to find strip markers, and check in
the callback whether the random state ID is correct.
StripState::killMarkers() will be slower when it has to remove many
strip markers, but most calls to it will not match any strip markers, so
overall performance should be improved due to reduced JIT compilation.
Bug: 72205
Change-Id: I8d37ae929a8c669c9e39adc8096b89e5732b68d0
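The approach above, one generic regex plus a state ID check in the callback, can be sketched like this. It is a Python illustration; the marker layout and the class are assumptions, not the actual StripState code.

```python
import re

# One fixed pattern shared by every parse, so it only needs compiling once.
GENERIC_MARKER = re.compile(r"\x7fUNIQ([0-9a-f]+)-([0-9]+)-QINU\x7f")


class StripState:
    """Toy strip state identified by a per-parse random state ID."""

    def __init__(self, state_id):
        self.state_id = state_id
        self.items = []

    def add(self, text):
        self.items.append(text)
        return f"\x7fUNIQ{self.state_id}-{len(self.items) - 1}-QINU\x7f"

    def unstrip(self, text):
        def replace(match):
            state_id, index = match.group(1), int(match.group(2))
            # Markers carrying a foreign state ID are left untouched.
            if state_id != self.state_id or index >= len(self.items):
                return match.group(0)
            return self.items[index]

        return GENERIC_MARKER.sub(replace, text)
```

The regex no longer embeds the random ID, so it is identical across requests; the ID comparison moves into ordinary code.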
addTrackingCategory is more in line with ParserOutput's functionality
(addLink, addCategory etc), and tracking categories are useful even for
content types which do not use the parser at all. There is no reason to
require the caller to obtain a Parser object just to be able to add
tracking categories.
Change-Id: I89d9ea1db3a4e6486e77eee940bd438f7753b776
If you have a reference *to* an object field (anywhere in the call
stack) when you clone the object, the field will be cloned as a
reference rather than as a value.
So we have to break those unexpected references in the cloned object
manually, which is easy enough by making a non-reference copy and then
rebinding the cloned object's reference to this copy.
Bug: 56226
Change-Id: I9c600e9c0845b4fde0366126ce3809d74e2240b4
Add Parser::fetchCurrentRevisionOfTitle(). By default, this just calls
Revision::newFromTitle, but a callback can be set in ParserOptions that
will override it. Anything that runs as part of a parse should use this
wherever possible.
Bug: 70495
Change-Id: I521f1f68ad819cf0f37e63240806f10c1cceef9c
The previous implementation would unescape '&', '=', '+', and '%'. The
first three will break the URL when unescaped in the query string, and
the last will break when unescaped anywhere.
The code is now changed to treat the path, query, and fragment parts of
the URL separately when unescaping. We also escape any unsafe characters
and ensure all percent-encodings use uppercase hexits.
And since the old name is no longer accurate,
Parser::replaceUnusualEscapes is deprecated in favor of
Parser::normalizeLinkUrl.
Bug: 57909
Change-Id: I77dc308d0d016c395ad737c08cf10a7711e25bbd
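The per-component handling above might look roughly like the following Python sketch. The sets of characters that are safe to unescape in each component are assumptions chosen to match the description ('&', '=', '+' stay escaped in the query, '%' stays escaped everywhere), and the step that escapes unsafe raw characters is not shown.

```python
import re
import string

# RFC 3986 unreserved characters: always safe to unescape.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# Assumed: these only have special meaning in the query string.
SAFE_OUTSIDE_QUERY = UNRESERVED | set("&=+")


def _normalize(component, safe):
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in safe:
            return char
        # Keep the escape, but with uppercase hex digits.
        return "%" + match.group(1).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", fix, component)


def normalize_link_url(url):
    """Normalize percent-escapes in path, query, and fragment separately."""
    rest, hash_sep, fragment = url.partition("#")
    path, query_sep, query = rest.partition("?")
    return (
        _normalize(path, SAFE_OUTSIDE_QUERY)
        + query_sep + _normalize(query, UNRESERVED)
        + hash_sep + _normalize(fragment, SAFE_OUTSIDE_QUERY)
    )
```

Each escape is examined once, so "%2541" stays "%2541" rather than being decoded twice.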