PHP JSON decoding has surprising behavior on some edge cases.
Documented this via comments, added related tests, and tweaked
related CommentStore code.
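
The message above does not say which edge cases are meant. One
well-known example, given here only as an illustration, is that
json_decode() returns null both for the valid document "null" and for
malformed input, so the two can only be told apart via
json_last_error():

```php
<?php
// Valid JSON whose value is null: decodes to null with no error.
$validNull = json_decode( 'null' );
$validNullError = json_last_error();

// Malformed JSON: also yields null, but the error code is set.
$broken = json_decode( '{' );
$brokenError = json_last_error();

var_dump( $validNull === $broken );
var_dump( $validNullError === JSON_ERROR_NONE );
var_dump( $brokenError === JSON_ERROR_SYNTAX );
```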
Bug: T206411
Change-Id: I6927fdaf616b37a04d81a638a0ed257afac9b844
Inside a "switch", "continue" statements are equivalent to "break".
In PHP 7.3, this usage generates a warning.
Bug: T200595
Change-Id: I244ecb2e1ce5a76295f014fb1becd8d263196846
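
A minimal sketch of the "continue" versus "break" point above, using a
hypothetical helper (collectEven is not part of the change). To skip
to the next loop iteration from inside a switch, "continue 2" is
needed; a bare "continue" would merely leave the switch:

```php
<?php
// Hypothetical helper: collects even numbers, skipping odd ones.
function collectEven( array $numbers ) {
	$even = [];
	foreach ( $numbers as $n ) {
		switch ( $n % 2 ) {
			case 1:
				// "continue 2" targets the enclosing foreach and skips
				// the append below. A bare "continue" here would only
				// leave the switch, exactly like "break".
				continue 2;
			default:
				break;
		}
		$even[] = $n;
	}
	return $even;
}
```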
* Use PHP 5.6 constant expression support in definition of ALL_OK.
* Remove one level of nesting in encode(). Follows up I801eaffc.
* Update HTML5 section number in doc comment for XMLMETA_OK.
* Made other minor doc comment fixes, such as capitalizing "JSON".
* Not done: changing $badChars and $badCharsEscaped to constants.
This will have to wait until HHVM 3.18 support is dropped.
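
For illustration, a constant expression of the kind mentioned in the
first bullet could look like this. The class name and the numeric
values are assumptions made for the sketch:

```php
<?php
// Hypothetical class; names are modelled on the commit message.
class JsonFlags {
	const UTF8_OK = 1;
	const XMLMETA_OK = 2;
	// Constant expression (PHP 5.6+): derived from the other constants
	// instead of hardcoding the combined value.
	const ALL_OK = self::UTF8_OK | self::XMLMETA_OK;
}
```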
Change-Id: I06413dfe0fedddfd20d3e375eadd9daad6d6230e
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.
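
A short check that both spellings produce the same bytes (PHP 7.0+),
and that the escape is only interpreted in double-quoted strings:

```php
<?php
// Unicode codepoint escape versus the raw UTF-8 bytes.
$byEscape = "\u{00A0}";
$byBytes = "\xC2\xA0";

// Single-quoted strings keep the escape as literal text.
$literal = '\u{00A0}';

var_dump( $byEscape === $byBytes );
var_dump( bin2hex( $byEscape ) );
var_dump( strlen( $literal ) );
```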
Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
(e.g. in code converting from legacy encodings or testing the
handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
are actually handled by PCRE and have to be dealt with carefully
(those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
probably be left as-is.
The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:
    chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
    puts chars.split('').map{|char|
      '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
    }.join('')
Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
The PHP bug 66021 <https://bugs.php.net/bug.php?id=66021> was fixed by
https://github.com/php/php-src/pull/518 and is included in PHP 5.4.28+
and PHP 5.5.12+.
This workaround is not necessary anymore because the minimum PHP
version for MediaWiki is 7.0.0+.
Change-Id: I801eaffc253fd88e0d3c87cfe97777837bd3902d
Searched for /([^\d\w\s\)\]]\s*)- \d/ to find potential issues.
It seems there's no PHPCS check for this, huh.
Also fixed typo in a comment in LoginSignupSpecialPage.
Change-Id: Iaab1a1f5a9f234971e550e7909aa5c3e0c02a983
This fixes the outstanding mis-spaced cast operators to bring them
into line with the coding standards on mediawiki.org (and with the
more common usage within this codebase).
Bug: T149545
Change-Id: Ib7bcf95bbee83d20c05f6d621ce7b4e1fb58a347
MediaWiki now only works with PHP versions that are new enough
to have the encoding options required by encode54(). So fold
that into encode() and remove encode53() and prettyPrint().
Change-Id: I6b22daf8fa01ef608efbde9c6aecdbb5ce03e2b9
This makes sure static analyzers don't warn for supposedly unsafe
code accessing variables as strings when they could be boolean after
having only checked against false.
https://github.com/scrutinizer-ci/php-analyzer/issues/605
Change-Id: Idb676de7587f1eccb46c12de0131bea4489a0785
Add stripComments method that can be used to remove single-line and
multiline comments from an otherwise valid JSON string. Inspired by the
comment removal code in redisJobRunnerService and discussions on IRC
about the Extension registration RFC.
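
A rough sketch of how such comment stripping can work. This is a
hypothetical function written for illustration, not the method added
by this change; it walks the text once and copies string literals
verbatim so that "//" inside a string (e.g. a URL) survives:

```php
<?php
// Hypothetical sketch, not MediaWiki's actual implementation.
// Drops "//" and "/* */" comments; string literals are copied verbatim.
function stripJsonComments( $json ) {
	$out = '';
	$len = strlen( $json );
	$inString = false;
	for ( $i = 0; $i < $len; $i++ ) {
		$ch = $json[$i];
		$next = $i + 1 < $len ? $json[$i + 1] : '';
		if ( $inString ) {
			$out .= $ch;
			if ( $ch === '\\' && $next !== '' ) {
				// Copy the escaped character so \" does not end the string.
				$out .= $next;
				$i++;
			} elseif ( $ch === '"' ) {
				$inString = false;
			}
		} elseif ( $ch === '"' ) {
			$inString = true;
			$out .= $ch;
		} elseif ( $ch === '/' && $next === '/' ) {
			// Single-line comment: skip to the end of the line and let
			// the next iteration copy the newline itself.
			$end = strpos( $json, "\n", $i );
			if ( $end === false ) {
				break;
			}
			$i = $end - 1;
		} elseif ( $ch === '/' && $next === '*' ) {
			// Multiline comment: skip past the closing "*/".
			$end = strpos( $json, '*/', $i + 2 );
			if ( $end === false ) {
				break;
			}
			$i = $end + 1;
		} else {
			$out .= $ch;
		}
	}
	return $out;
}
```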
Change-Id: Ie743957bfbb7b1fca8cb78ad48c1efd953362fde
Constant values were changed to be above 0xFF - this way
we can easily decide to allow depth-parsing-limit to be OR-able:
FormatJson::parse( $value, 30 | FormatJson::FORCE_ASSOC )
Follows-up Ic0eb0a7 and I1c4f37a.
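
To illustrate why values above 0xFF make this possible: the low byte
stays free for a depth limit, so both can be recovered from a single
integer. The flag value and helper below are assumptions made for the
sketch, not the actual code:

```php
<?php
// Hypothetical value: the flag sits above 0xFF, so the low byte is
// free to carry a depth limit.
const FORCE_ASSOC = 0x100;

// Hypothetical helper that splits a combined options integer.
function splitOptions( $options ) {
	return [
		'depth' => $options & 0xFF,
		'assoc' => ( $options & FORCE_ASSOC ) !== 0,
	];
}
```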
Change-Id: I9bfd67a5ca4ea1d399821549c7e63ffdecd56ad1
Removes trailing commas from JSON text when parsing.
Solves very common cases like [1,2,3,].
The resulting status will be OK but not Good, to warn the caller.
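
A naive sketch of the idea, using a hypothetical helper and a much
simpler regex than a real fix needs (it does not protect string
contents such as ",]"). The replacement count is what a caller could
use to decide whether to attach a warning:

```php
<?php
// Naive, hypothetical sketch: drop a comma that is directly followed
// (ignoring whitespace) by a closing bracket or brace.
function removeTrailingCommas( $json, &$count = 0 ) {
	return preg_replace( '/,\s*([\]}])/', '$1', $json, -1, $count );
}

// Strict decoding rejects the trailing comma and returns null.
$strict = json_decode( '[1,2,3,]' );
// After the cleanup the same text decodes normally.
$fixed = json_decode( removeTrailingCommas( '[1,2,3,]', $count ) );
```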
Change-Id: Ic0eb0a711da3ae578d6bb58d7474279d6845a4a7
* Returns Status object that will contain decoded value on success
* Adds i18n messages for all available PHP JSON errors
ATTN Translation team: please copy these messages:
gwtoolset-json-error-depth => json-error-depth
gwtoolset-json-error-state-mismatch => json-error-state-mismatch
gwtoolset-json-error-ctrl-char => json-error-ctrl-char
gwtoolset-json-error-syntax => json-error-syntax
gwtoolset-json-error-utf8 => json-error-utf8
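
A sketch of the mapping from PHP JSON error codes to the message keys
listed above. The function and its plain-array return value are
hypothetical stand-ins for the Status object, and the fallback key is
an assumption:

```php
<?php
// Hypothetical stand-in for a Status object: returns an array holding
// either the decoded value or an i18n message key.
function parseJson( $text ) {
	$value = json_decode( $text );
	$code = json_last_error();
	if ( $code === JSON_ERROR_NONE ) {
		return [ 'ok' => true, 'value' => $value ];
	}
	$keys = [
		JSON_ERROR_DEPTH => 'json-error-depth',
		JSON_ERROR_STATE_MISMATCH => 'json-error-state-mismatch',
		JSON_ERROR_CTRL_CHAR => 'json-error-ctrl-char',
		JSON_ERROR_SYNTAX => 'json-error-syntax',
		JSON_ERROR_UTF8 => 'json-error-utf8',
	];
	// 'json-error-unknown' is an assumed fallback, not from the commit.
	$key = isset( $keys[$code] ) ? $keys[$code] : 'json-error-unknown';
	return [ 'ok' => false, 'error' => $key ];
}
```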
Change-Id: I1c4f37aaabad369b75a1fbd223fad27ebcfe1c3c
Follows-up bec7e8287c. The comment "Can be removed once we require
PHP >= 5.4.28, 5.5.12, 5.6.0" relies on some assumptions that might
later prove to be incorrect:
* That the fix won't be reverted from any of those PHP versions
(e.g. if deemed to break BC)
* That the bug will be fixed in PECL jsonc and jsond, as well as in
HHVM
* That we don't need to support older versions of those once we
require one of the mentioned PHP versions
Change-Id: I67034c561d54d37dee961ada8c9cf5ccfd113da1
The patch[1] for PHP bug 66021[2], which removes the same undesirable
whitespace that WS_CLEANUP_REGEX does, has been merged into php-src.
Subsequent PHP versions having the patch shouldn't have to take the
10-20% performance hit from that workaround.
[1]: https://github.com/php/php-src/commit/82a4f1a1a287
[2]: https://bugs.php.net/bug.php?id=66021
Change-Id: I717a0e164952cc6ace104f13f6236e86c4ab8b58
This is to allow consistency with MediaWiki PHP and JS files (e.g. when
generating JSON i18n files), not because tabs are "better" than spaces for
indenting code (both have advantages and disadvantages).
Because PHP's json_encode() function hardcodes the indent string, using tabs
has a performance cost (in post-processing the output) and is less suitable
for web output; thus the API and ResourceLoader debug mode will continue to
use four spaces. Adjusting the maintenance scripts and JSON files is left to
separate change sets.
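
A sketch of the kind of post-processing this implies, assuming a
hypothetical helper name. Raw newlines cannot occur inside JSON
strings, so spaces at the start of a line are always indentation and
can be converted safely:

```php
<?php
// Hypothetical sketch: json_encode() indents with four spaces, so
// convert each leading group of four spaces to one tab afterwards.
function encodeWithTabs( $value ) {
	$json = json_encode( $value, JSON_PRETTY_PRINT );
	return preg_replace_callback( '/^(?: {4})+/m', function ( $m ) {
		return str_repeat( "\t", (int)( strlen( $m[0] ) / 4 ) );
	}, $json );
}

echo encodeWithTabs( [ 'a' => [ 1 ] ] ), "\n";
```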
Bug: 63444
Change-Id: Ic915c50b0acd2e236940b70d5dd48ea87954c9d5
As noted in c370ad21d7, the pretty output can differ between
Zend PHP and HHVM. This change adds some post-processing to make
the output consistent across implementations and with JavaScript
JSON.stringify() and Python json.dumps(); all whitespace between
the opening and closing brackets/braces is removed.
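
Assuming the whitespace in question is what some implementations emit
inside empty arrays and objects, the post-processing could be
sketched as follows. The helper and regex are hypothetical and ignore
the possibility of such sequences inside string values:

```php
<?php
// Hypothetical sketch: collapse whitespace-only gaps between an
// opening and a closing bracket/brace, e.g. "[\n\n]" becomes "[]".
function collapseEmpty( $json ) {
	return preg_replace( '/(?<=[\[{])\s+(?=[\]}])/', '', $json );
}
```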
Change-Id: I490e0ff1fac3d6c3fb44ab127e432872c0301a9d
* Prefer feature detection over version comparison.
* Prefer pre- over post- increment and decrement operators.
* Remove the statement that FormatJson::decode() decodes `true`,
`false`, and `null` case insensitively. Nobody should assume
this is (or is not) the case, even though the PHP manual says so.
* Avoid using the ternary operator with long strings; prior to
PHP 5.4, the operator prevented the copy-on-write optimization.
* Avoid placing comments on the same lines as code.
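
As an illustration of the first bullet, feature detection can test
for the constant itself rather than comparing version strings. The
variable names are arbitrary:

```php
<?php
// Feature detection: ask whether the constant exists rather than
// comparing PHP_VERSION against a version number.
$flags = 0;
if ( defined( 'JSON_UNESCAPED_UNICODE' ) ) {
	$flags |= JSON_UNESCAPED_UNICODE;
}

// U+00E9 given as UTF-8 bytes; left unescaped when the flag is set.
$encoded = json_encode( "\xC3\xA9", $flags );
```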
Change-Id: I8fc88e9b7b49aa0cbd4128216557836a3b2cd011
Using the following command line, I have found doc comments mentioning
"1.21" when they should mention "1.22" instead, which I have fixed
manually:
git diff REL1_21 | grep --color=always -C 10 -iE \
'^\+.*(since|deprecated).*1\.21(\D|$)' | aha > oldver.html
I also moved the release notes for I1987190f ("Combine JavaScript and
JSON encoding logic") from RELEASE-NOTES-1.21 to RELEASE-NOTES-1.22
because I had reverted the commit on REL1_21 only (see Id3b88102 and
bug 47431 for the rationale).
Change-Id: I11b917a371e07267dfa98b8449776d0c1cb29b15
Follows-Up: I25cf5a94f6e47f85a9d0b80cc1c9c9f957288478
Follows-Up: I3d72e4105f6244b0695116940e62a2ddef66eb66
Follows-Up: I3faa9c3e8107c6e46cdf21f8c18adda1f42890d7
Follows-Up: I6aab19c8d68bf47beddad42632b0360a7b12f251
Follows-Up: I86368821fc2cd0729df5342b8572eb470c0f77a0
Follows-Up: Id3b88102e768318e3605a19e9952121091a40915
Follows-Up: Ie667088010e24eb6cb569f9e8e8e2553005223eb
* Replace strtr with str_replace where faster.
* Use addcslashes to escape json_encode's output. Because no control
characters are included, the only characters that have to be
escaped are \ and ". (irrelevant for PHP 5.4+ installations)
Re-encoding a ~1.5 MB API response from the Chinese Wikipedia:
* PHP 5.3: 32% faster (from 347 ms to 239 ms)
* PHP 5.4: 70% faster (from 51 ms to 15 ms)
* HHVM: 42% faster (from 326 ms to 191 ms)
Change-Id: I7c9342682986d40a2f2436ac978390b6018a3521
If a string encoded as part of the output ends in a backslash
(e.g. an edit token), FormatJson::prettyPrint() may incorrectly
treat the unescaped double quote marking the end of the string as
a character that is part of the string.
This is a serious problem in that the "pretty" output may not
necessarily be valid JSON; a later string literal might contain
one or more of these tokens: :[{,]}
To fix the bug, I exploit strtr's behavior when it is given an
associative array having keys of the same length to skip over
escaped backslashes while replacing escaped double quotes with "\x01".
I also updated the corresponding unit test.
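
A reduced sketch of the strtr trick described above, using a
hypothetical helper name. Because both keys have the same length,
strtr consumes an escaped backslash as a unit before its second half
can be paired with a following quote, which a plain str_replace
gets wrong:

```php
<?php
// Hypothetical helper: replace escaped double quotes with "\x01" while
// skipping over escaped backslashes.
function maskEscapedQuotes( $json ) {
	return strtr( $json, [ '\\\\' => '\\\\', '\\"' => "\x01" ] );
}

// JSON text "a\\" : a string ending in an escaped backslash.
$endsInBackslash = '"a\\\\"';
// JSON text "say \"hi\"" : a string containing escaped quotes.
$hasQuotes = '"say \\"hi\\""';

// The naive approach wrongly swallows the closing quote.
$naive = str_replace( '\\"', "\x01", $endsInBackslash );
```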
Change-Id: I159105b6493c14b82cd0a41a95e04bfed744931e
This will help with improving human readability of JS and JSON
objects encoded by both ResourceLoader and the API. This patch
also adds a new "utf8" parameter to the JSON formatter of the API.
Changes to FormatJson class:
* Added escaping of '<', '>', and '&' by default to protect against XSS.
* Removed unnecessary escaping of '/' and added an additional option to
unescape non-ASCII characters (those above U+007F) as well.
* Added PHP 5.3 pretty printing code (to replace Services_JSON) that
uses a four-space indent as PHP 5.4 does.
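
The escaping described in the first two bullets can be illustrated
with the json_encode() flags available since PHP 5.4. This is only an
illustration of the resulting output; the change itself also had to
support PHP 5.3:

```php
<?php
$html = '<a href="/x">&</a>';

// '<', '>' and '&' become \u escapes; '/' is left alone.
$safe = json_encode(
	$html,
	JSON_HEX_TAG | JSON_HEX_AMP | JSON_UNESCAPED_SLASHES
);

// Default behaviour for comparison: markup kept, '/' escaped.
$default = json_encode( $html );
```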
Changes to Xml class:
* Defined Xml::encodeJsVar() in terms of FormatJson::encode()
and added a pretty printing option. Added the same option to
Xml::encodeJsCall().
* Deprecated Xml::escapeJsString() and QuickTemplate::jstext();
callers have to add quotes themselves, hence the escaping of
both double quotes and apostrophes.
Bug: 26818
Change-Id: I1987190f1ba5bf41738e7bd611209706c1f6bb5c
* Ran spell-checker over code comments in /includes/
* A few spellchecking fixes for wfDebug() calls
Found one very strange (NOOP?) line in Linker.php - see "TODO: BUG?"
Change-Id: Ibb86b51073b980eda9ecce2cf0b8dd33f058adbf
Doxygen expects parameter types to come before the
parameter name in @param tags. Used a quick regex
to switch everything around where possible. This
only fixes cases where a primitive variable (or a
primitive followed by other types) is the variable
type. Other cases will need to be fixed manually.
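
The regex actually used is not recorded here. A hypothetical one with
the same effect for a few primitive types might look like this:

```php
<?php
// Hypothetical regex, not the one used for the commit: rewrite
// "@param $name type" as "@param type $name" for some primitives,
// optionally followed by further "|"-separated types.
function fixParamOrder( $doc ) {
	return preg_replace(
		'/(@param\s+)(\$\w+)(\s+)((?:string|int|bool|array|float)(?:\|\w+)*)\b/',
		'$1$4$3$2',
		$doc
	);
}
```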
Change-Id: Ic59fd20856eb0489d70f3469a56ebce0efb3db13
As the FIXME notes, $isHtml has nothing to do with HTML; it simply controls
whether the resultant JSON should be formatted for readability by inserting
whitespace as appropriate.
Change-Id: I90d46d6624d683f18a39c98500bd71bbd0ca3800
We can now do this since we finally switched to PHP 5.3 for MW 1.20, and so get rid of the silly dirname(__FILE__) stuff :)
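
For reference, the two spellings yield the same value:

```php
<?php
// __DIR__ (PHP 5.3+) is the directory of the current file, the same
// value the older idiom computes with a function call.
$old = dirname( __FILE__ );
$new = __DIR__;

var_dump( $old === $new );
```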
Change-Id: Id9b2c9cd2e678197aa81c78adced5d1d31ff57b1
Use this conditionally, when $isHtml is true and the code is
running on PHP >= 5.4. Otherwise return the default, 0.
Change-Id: Ief775720a99d1a305c3f9f4ba7cc04eb96817fb3
When inserting XML elements inline <such as this one>, doxygen chokes
about it not being known. Simply enclosing the tag in double quotes
prevents doxygen from emitting a warning.
Also enclosed a few invalid function calls such as \. and double quoted
the HTML entities such as &foobar;
Change-Id: I4019637145e683c2bec3d17b2fd98b0c50a932f1