
68 commits

Author SHA1 Message Date
Bill Pirkle
5a166f00d8 Comments, tests, and tweaks for JSON decoding quirks
PHP JSON decoding has surprising behavior on some edge cases.
Documented this via comments, added related tests, and tweaked
related CommentStore code.

Bug: T206411
Change-Id: I6927fdaf616b37a04d81a638a0ed257afac9b844
2018-11-07 13:04:21 -06:00
RazeSoldier
24ffbd9bd1 Use "break" instead of "continue"
Inside a switch, "continue" is equivalent to "break", and PHP 7.3 generates a warning for it.

Bug: T200595
Change-Id: I244ecb2e1ce5a76295f014fb1becd8d263196846
2018-08-24 00:18:07 +08:00
Kevin Israel
381858ab52 FormatJson: cleanup after PHP 5.5 support removal
* Use PHP 5.6 constant expression support in definition of ALL_OK.
* Remove one level of nesting in encode(). Follows up I801eaffc.
* Update HTML5 section number in doc comment for XMLMETA_OK.
* Make other minor doc comment fixes, such as capitalizing "JSON".
* Not done: changing $badChars and $badCharsEscaped to constants.
  This will have to wait until HHVM 3.18 support is dropped.

Change-Id: I06413dfe0fedddfd20d3e375eadd9daad6d6230e
2018-06-09 09:06:02 -04:00
Bartosz Dziewoński
0313128b10 Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.

This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.

Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.

The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:

  chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
  puts chars.split('').map{|char|
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join('')
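A variant of the script above that avoids eval can be sketched as follows. It assumes the input is a run of \xNN escapes copied from PHP source. The method name hex_escapes_to_unicode is hypothetical. The sketch refuses byte sequences that are not valid UTF-8, because binary data has to keep its \xNN escapes.

```ruby
# Hypothetical helper: convert escapes such as '\xc2\xa0' (as written in PHP
# source) into '\u{00A0}' without eval'ing the input.
def hex_escapes_to_unicode(escapes)
  unless escapes.match?(/\A(?:\\x\h\h)+\z/)
    raise ArgumentError, 'expected a run of \xNN escapes'
  end

  # Turn each NN pair into a byte, then interpret the bytes as UTF-8.
  bytes = escapes.scan(/\\x(\h\h)/).flatten.map { |pair| pair.to_i(16) }
  text = bytes.pack('C*').force_encoding(Encoding::UTF_8)

  # Binary (non-UTF-8) data cannot be expressed with Unicode escapes.
  raise ArgumentError, 'bytes are not valid UTF-8' unless text.valid_encoding?

  # Uppercase, zero-padded to at least four digits, like the Unicode standard.
  text.each_char.map { |char| format('\\u{%04X}', char.ord) }.join
end

puts hex_escapes_to_unicode('\xc2\xa0') # prints \u{00A0}
```

As noted above, ASCII escapes ("\x7F" and lower) should probably be left alone. This sketch does not enforce that.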

Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
2018-06-04 16:20:13 +00:00
Fomafix
bb52950fee Remove workaround for PHP bug 66021 (PHP < 5.5.12)
PHP bug 66021 <https://bugs.php.net/bug.php?id=66021> was fixed by
https://github.com/php/php-src/pull/518, which is included in PHP 5.4.28+
and PHP 5.5.12+.
The workaround is no longer necessary because the minimum PHP
version for MediaWiki is now 7.0.0.

Change-Id: I801eaffc253fd88e0d3c87cfe97777837bd3902d
2018-05-31 01:14:58 +00:00
Bartosz Dziewoński
0cccd68dc8 Code style: no space after unary minus operator
Searched for /([^\d\w\s\)\]]\s*)- \d/ to find potential issues.
It seems there's no PHPCS check for this, huh.

Also fixed typo in a comment in LoginSignupSpecialPage.

Change-Id: Iaab1a1f5a9f234971e550e7909aa5c3e0c02a983
2017-01-05 14:38:32 +01:00
Sam Wilson
66e215baee Remove spaces after cast operators
This fixes the outstanding mis-spaced cast operators to bring them
into line with the coding standards on mediawiki.org (and with the
more common usage within this codebase).

Bug: T149545
Change-Id: Ib7bcf95bbee83d20c05f6d621ce7b4e1fb58a347
2016-10-31 13:57:39 +00:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Kevin Israel
a508f5daee FormatJson: Remove PHP 5.3 compatibility code
MediaWiki now only works with PHP versions that are new enough
to have the encoding options required by encode54(). So fold
that into encode() and remove encode53() and prettyPrint().

Change-Id: I6b22daf8fa01ef608efbde9c6aecdbb5ce03e2b9
2016-02-12 18:49:01 -05:00
Vivek Ghaisas
9f5b6f5aeb Fix whitespace issues around parentheses
Fix issues found by MediaWiki.WhiteSpace.SpaceyParenthesis sniff.

Bug: T102617
Change-Id: Iec7f71e64081659fba373ec20d9d2006306a98f4
2015-06-16 22:14:02 +03:00
Timo Tijhof
532337e6ff Use "string|false" as @return instead of "string|bool" where appropriate
This keeps static analyzers from warning that code may be using a
boolean as a string when it has already checked the return value
against false.

https://github.com/scrutinizer-ci/php-analyzer/issues/605

Change-Id: Idb676de7587f1eccb46c12de0131bea4489a0785
2015-04-01 09:48:30 +01:00
Kunal Mehta
c91fd8043b Fix phpcs errors and warnings in includes/json
Change-Id: Id5ae1cabe87f73f7458a744834ebb6a1a7c3dbf8
2015-03-15 02:35:26 +00:00
Bryan Davis
8fea9c619d FormatJson::stripComments
Add a stripComments method that can be used to remove single-line and
multiline comments from an otherwise valid JSON string. Inspired by the
comment removal code in redisJobRunnerService and discussions on IRC
about the Extension registration RFC.
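The idea behind stripComments can be sketched as a small scanner. It copies string literals verbatim, so a "//" or "/*" inside a string survives, and it drops comments everywhere else. This is an illustrative Ruby sketch with a hypothetical method name, not the FormatJson implementation.

```ruby
# Hypothetical sketch: remove // line comments and /* */ block comments that
# appear outside JSON string literals. String contents are copied unchanged.
def strip_json_comments(json)
  out = +''
  in_string = false
  i = 0
  while i < json.length
    char = json[i]
    if in_string
      out << char
      if char == '\\'
        out << json[i + 1].to_s # copy the escaped character verbatim
        i += 2
        next
      end
      in_string = false if char == '"'
      i += 1
    elsif char == '"'
      in_string = true
      out << char
      i += 1
    elsif json[i, 2] == '//'
      # Skip to the end of the line, keeping the newline itself.
      i = json.index("\n", i) || json.length
    elsif json[i, 2] == '/*'
      close = json.index('*/', i + 2)
      i = close ? close + 2 : json.length # an unterminated comment runs to EOF
    else
      out << char
      i += 1
    end
  end
  out
end
```

The scanner tracks the escape character inside strings. Without that, a string containing \" would end the string early and expose its remaining contents to comment removal.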

Change-Id: Ie743957bfbb7b1fca8cb78ad48c1efd953362fde
2014-10-12 12:34:22 -06:00
Yuri Astrakhan
c361cb74fa Added missing FormatJson::parse() release notes, fixed docs
Constant values were changed to be above 0xFF, so that we can
easily decide later to allow a parsing depth limit to be OR-ed in:

  FormatJson::parse( $value, 30 | FormatJson::FORCE_ASSOC )
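The following Ruby sketch illustrates why flag values above 0xFF leave room for a depth limit in the same integer. The low byte carries the depth and the higher bits carry the flags. The constant values and the split_parse_options helper are assumptions for illustration only.

```ruby
# Assumed flag values: both lie above 0xFF, leaving the low byte free.
FORCE_ASSOC = 0x100
TRY_FIXING  = 0x200
DEPTH_MASK  = 0xFF

# Hypothetical helper: split an OR-ed options integer into its parts.
def split_parse_options(options)
  depth = options & DEPTH_MASK
  {
    depth: depth.zero? ? nil : depth, # nil means "use the default depth"
    force_assoc: (options & FORCE_ASSOC) != 0,
    try_fixing: (options & TRY_FIXING) != 0
  }
end

options = split_parse_options(30 | FORCE_ASSOC)
puts options[:depth] # prints 30
```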

Follows-up Ic0eb0a7 and I1c4f37a.

Change-Id: I9bfd67a5ca4ea1d399821549c7e63ffdecd56ad1
2014-09-30 01:45:35 +00:00
Yuri Astrakhan
289d3e4f00 FormatJson::parse( TRY_FIXING ) - remove trailing commas
Removes trailing commas from JSON text when parsing.
This solves very common cases like [1,2,3,].

The resulting status will be set to OK but not Good, to warn the caller.
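Trailing-comma removal has to leave commas inside string literals alone. One way to do that is a single regular expression that consumes whole strings before it considers commas. The following is a hedged Ruby sketch with hypothetical names, not the regex used in FormatJson. It also returns a flag that a caller could use to report "OK but not Good".

```ruby
# A JSON string literal, or a comma followed only by whitespace and a closing
# bracket or brace. Matching strings first keeps commas inside them intact.
STRING_OR_TRAILING_COMMA = /"(?:[^"\\]|\\.)*"|,(?=\s*[\]}])/

# Hypothetical sketch: returns the repaired text and whether anything changed.
def remove_trailing_commas(json)
  fixed = false
  repaired = json.gsub(STRING_OR_TRAILING_COMMA) do |match|
    next match unless match == ','

    fixed = true
    ''
  end
  [repaired, fixed]
end

repaired, _fixed = remove_trailing_commas('[1,2,3,]')
puts repaired # prints [1,2,3]
```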

Change-Id: Ic0eb0a711da3ae578d6bb58d7474279d6845a4a7
2014-09-27 06:20:36 -04:00
Yuri Astrakhan
9a380626bc Added FormatJson::parse( $value, $options = 0 ) returning Status
* Returns Status object that will contain decoded value on success
* Adds i18n messages for all available PHP JSON errors

ATTN Translation team: please copy these messages:

gwtoolset-json-error-depth => json-error-depth
gwtoolset-json-error-state-mismatch => json-error-state-mismatch
gwtoolset-json-error-ctrl-char => json-error-ctrl-char
gwtoolset-json-error-syntax => json-error-syntax
gwtoolset-json-error-utf8 => json-error-utf8

Change-Id: I1c4f37aaabad369b75a1fbd223fad27ebcfe1c3c
2014-09-26 18:55:09 +00:00
Kevin Israel
b9a12b0b33 FormatJson: Remove speculative comment
Follows-up bec7e8287c. The comment "Can be removed once we require
PHP >= 5.4.28, 5.5.12, 5.6.0" relies on some assumptions that might
later prove to be incorrect:

* That the fix won't be reverted from any of those PHP versions
  (e.g. if deemed to break BC)

* That the bug will be fixed in PECL jsonc and jsond, as well as in
  HHVM

* That we don't need to support older versions of those once we
  require one of the mentioned PHP versions

Change-Id: I67034c561d54d37dee961ada8c9cf5ccfd113da1
2014-04-25 15:16:18 -04:00
Kevin Israel
bec7e8287c FormatJson: Skip whitespace cleanup when unnecessary
The patch[1] for PHP bug 66021[2], which removes the same undesirable
whitespace that WS_CLEANUP_REGEX does, has been merged into php-src.
Subsequent PHP versions having the patch shouldn't have to take the
10-20% performance hit from that workaround.

[1]: https://github.com/php/php-src/commit/82a4f1a1a287
[2]: https://bugs.php.net/bug.php?id=66021

Change-Id: I717a0e164952cc6ace104f13f6236e86c4ab8b58
2014-04-24 20:54:44 -04:00
Kevin Israel
1efdda25ee FormatJson: Make it possible to change the indent string
This is to allow consistency with MediaWiki PHP and JS files (e.g. when
generating JSON i18n files), not because tabs are "better" than spaces for
indenting code (both have advantages and disadvantages).

Because PHP's json_encode() function hardcodes the indent string, using tabs
has a performance cost (in post-processing the output) and is less suitable
for web output; thus the API and ResourceLoader debug mode will continue to
use four spaces. Adjusting the maintenance scripts and JSON files is left to
separate change sets.
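Post-processing the indent string is simple because JSON string literals cannot contain raw newlines; they are escaped as \n. So in pretty-printed output, every run of spaces at the start of a line is indentation. The following is a minimal Ruby sketch. It assumes the input uses PHP's four-space indent, and the method name is hypothetical.

```ruby
# Hypothetical sketch: replace each leading group of four spaces with `indent`.
def reindent(pretty_json, indent)
  pretty_json.gsub(/^(?: {4})+/) { |lead| indent * (lead.length / 4) }
end

puts reindent("{\n    \"a\": [\n        1\n    ]\n}", "\t")
```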

Bug: 63444
Change-Id: Ic915c50b0acd2e236940b70d5dd48ea87954c9d5
2014-04-16 10:00:10 -04:00
Siebrand Mazeland
07a3e0ae54 Update formatting and comments in FormatJson
Change-Id: I4edc08760004e170b045b201d3cc5444b2028e55
2013-11-25 18:46:25 +01:00
Kevin Israel
b6a5bb484d FormatJson: Remove whitespace from empty arrays and objects
As noted in c370ad21d7, the pretty output can differ between
Zend PHP and HHVM. This change adds some post-processing to make
the output consistent across implementations and with JavaScript
JSON.stringify() and Python json.dumps(); all whitespace between
the opening and closing brackets/braces is removed.
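The post-processing described above can be sketched with a single regular expression. The expression consumes string literals whole, so brackets inside strings are never touched. This is a Ruby illustration with hypothetical names, not the code from this change.

```ruby
# Match a whole string literal, or an empty array/object containing only whitespace.
STRING_OR_EMPTY_CONTAINER = /"(?:[^"\\]|\\.)*"|\[\s+\]|\{\s+\}/

# Hypothetical sketch: turn "[\n    ]" into "[]" and "{\n    }" into "{}",
# leaving string literals such as "[ ]" untouched.
def collapse_empty_containers(json)
  json.gsub(STRING_OR_EMPTY_CONTAINER) do |match|
    match.start_with?('"') ? match : match[0] + match[-1]
  end
end

puts collapse_empty_containers("{\n    \"a\": [\n    ]\n}")
```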

Change-Id: I490e0ff1fac3d6c3fb44ab127e432872c0301a9d
2013-10-09 04:13:31 +02:00
Kevin Israel
39e22628f7 FormatJson: minor cleanup
* Prefer feature detection over version comparison.
* Prefer pre- over post- increment and decrement operators.
* Remove the statement that FormatJson::decode() decodes `true`,
  `false`, and `null` case insensitively. Nobody should assume
  this is (or is not) the case, even though the PHP manual says so.
* Avoid using the ternary operator with long strings; prior to
  PHP 5.4, the operator prevented the copy-on-write optimization.
* Avoid placing comments on the same lines as code.

Change-Id: I8fc88e9b7b49aa0cbd4128216557836a3b2cd011
2013-10-08 18:28:09 -04:00
Kevin Israel
876bddf637 Change @since and @deprecated notes to 1.22
Using the following command line, I have found doc comments mentioning
"1.21" when they should mention "1.22" instead, which I have fixed
manually:

git diff REL1_21 | grep --color=always -C 10 -iE \
'^\+.*(since|deprecated).*1\.21(\D|$)' | aha > oldver.html

I also moved the release notes for I1987190f ("Combine JavaScript and
JSON encoding logic") from RELEASE-NOTES-1.21 to RELEASE-NOTES-1.22
because I had reverted the commit on REL1_21 only (see Id3b88102 and
bug 47431 for the rationale).

Change-Id: I11b917a371e07267dfa98b8449776d0c1cb29b15
Follows-Up: I25cf5a94f6e47f85a9d0b80cc1c9c9f957288478
Follows-Up: I3d72e4105f6244b0695116940e62a2ddef66eb66
Follows-Up: I3faa9c3e8107c6e46cdf21f8c18adda1f42890d7
Follows-Up: I6aab19c8d68bf47beddad42632b0360a7b12f251
Follows-Up: I86368821fc2cd0729df5342b8572eb470c0f77a0
Follows-Up: Id3b88102e768318e3605a19e9952121091a40915
Follows-Up: Ie667088010e24eb6cb569f9e8e8e2553005223eb
2013-06-21 05:33:22 +00:00
Kevin Israel
e9443cf677 FormatJson: microoptimizations for UTF8_OK mode
* Replace strtr with str_replace where faster.
* Use addcslashes to escape json_encode's output. Because no control
  characters are included, the only characters that have to be
  escaped are \ and ". (irrelevant for PHP 5.4+ installations)

Re-encoding a ~1.5 MB API response from the Chinese Wikipedia:
* PHP 5.3: 32% faster (from 347 ms to 239 ms)
* PHP 5.4: 70% faster (from 51 ms to 15 ms)
* HHVM: 42% faster (from 326 ms to 191 ms)

Change-Id: I7c9342682986d40a2f2436ac978390b6018a3521
2013-04-08 05:07:07 +00:00
Kevin Israel
217cb2e3a6 Fix pretty JSON when strings end with backslashes
If a string encoded as part of the output ends in a backslash
(e.g. an edit token), FormatJson::prettyPrint() may incorrectly
treat the unescaped double quote marking the end of the string as
a character that is part of the string.

This is a serious problem in that the "pretty" output may not
necessarily be valid JSON; a later string literal might contain
one or more of these tokens: :[{,]}

To fix the bug, I exploit strtr's behavior when it is given an
associative array having keys of the same length to skip over
escaped backslashes while replacing escaped double quotes with "\x01".
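The strtr trick works because the replacement scans left to right. It matches the two-character tokens \\ and \" without re-scanning replaced text. An escaped backslash is therefore consumed before the quote that follows it can be mistaken for an escaped quote. Ruby's gsub also finds non-overlapping matches from left to right. The following hedged sketch shows the masking and its inverse; the method names are hypothetical.

```ruby
MARKER = "\x01" # placeholder for an escaped double quote, as in the commit

# Hypothetical sketch: replace each escaped quote (\") with MARKER while leaving
# escaped backslashes (\\) alone, so "a\\" keeps its closing quote.
def mask_escaped_quotes(json)
  json.gsub(/\\[\\"]/) { |token| token == '\\"' ? MARKER : token }
end

# Undo the masking after the structural characters have been processed.
def unmask_escaped_quotes(json)
  json.gsub(MARKER) { '\\"' }
end
```

A naive replacement of every \" would also turn the final backslash of "a\\" plus its closing quote into the marker. The string would then appear never to end.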

I also updated the corresponding unit test.

Change-Id: I159105b6493c14b82cd0a41a95e04bfed744931e
2013-03-30 16:23:24 -04:00
Kevin Israel
79f80cc495 Combine JavaScript and JSON encoding logic
This will help improve the human readability of JS and JSON
objects encoded by both ResourceLoader and the API. This patch
also adds a new "utf8" parameter to the API's JSON formatter.

Changes to FormatJson class:

* Added escaping of '<', '>', and '&' by default to protect against XSS.
* Removed unnecessary escaping of '/' and added an additional option to
  unescape non-ASCII characters (those above U+007F) as well.
* Added PHP 5.3 pretty printing code (to replace Services_JSON) that
  uses a four-space indent as PHP 5.4 does.
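The XSS protection in the first bullet relies on a property of valid JSON text. Outside string literals, JSON text contains only punctuation, numbers, literals, and whitespace, so '<', '>', and '&' can only appear inside strings. Inside a string, each of them can be replaced by an equivalent \uXXXX escape. The following is a minimal Ruby sketch with a hypothetical method name. PHP's JSON_HEX_TAG and JSON_HEX_AMP flags produce the same escapes.

```ruby
# Equivalent \uXXXX escapes for characters that matter in HTML.
HTML_SENSITIVE = { '<' => '\u003C', '>' => '\u003E', '&' => '\u0026' }.freeze

# Hypothetical sketch: safe on any valid JSON text, since these characters can
# only appear inside string literals there.
def escape_html_sensitive(json)
  json.gsub(/[<>&]/, HTML_SENSITIVE)
end

puts escape_html_sensitive('{"x":"</script>"}') # prints {"x":"\u003C/script\u003E"}
```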

Changes to Xml class:

* Defined Xml::encodeJsVar() in terms of FormatJson::encode()
  and added a pretty printing option. Also added a pretty printing
  option to Xml::encodeJsCall().
* Deprecated Xml::escapeJsString() and QuickTemplate::jstext();
  callers have to add quotes themselves, hence the escaping of
  both double quotes and apostrophes.

Bug: 26818
Change-Id: I1987190f1ba5bf41738e7bd611209706c1f6bb5c
2013-03-27 20:22:45 -04:00
Yuri Astrakhan
9506e3d812 Spellchecked /includes directory
* Ran spell-checker over code comments in /includes/
* A few spellchecking fixes for wfDebug() calls

Found one very strange (NOOP?) line in Linker.php - see "TODO: BUG?"

Change-Id: Ibb86b51073b980eda9ecce2cf0b8dd33f058adbf
2013-03-13 03:42:41 -04:00
Tyler Anthony Romeo
4dcc7961df Fixed @param tags to conform with Doxygen format.
Doxygen expects parameter types to come before the
parameter name in @param tags. Used a quick regex
to switch everything around where possible. This
only fixes cases where a primitive variable (or a
primitive followed by other types) is the variable
type. Other cases will need to be fixed manually.

Change-Id: Ic59fd20856eb0489d70f3469a56ebce0efb3db13
2013-03-11 13:15:01 -04:00
umherirrender
25bc3a0727 The use of function sizeof() is forbidden; use count() instead
From phpcs

Change-Id: I919c8af46a722cd1c14bb8c134400e2ec51160d1
2013-01-26 22:20:04 +01:00
Ori Livneh
10254ee6e8 Fix misleading param name in FormatJson::encode signature
As the FIXME notes, $isHtml has nothing to do with HTML; it simply controls
whether the resultant JSON should be formatted for readability by inserting
whitespace as appropriate.

Change-Id: I90d46d6624d683f18a39c98500bd71bbd0ca3800
2012-11-23 13:51:15 -08:00
umherirrender
e5f5e95137 Fix indentation whitespace errors
Change-Id: Ie268bee2098c589c050e1b5b0e93fe1b3feca86f
2012-10-26 17:42:13 +02:00
jeroendedauw
38c7f444e1 Use __DIR__ instead of dirname( __FILE__ )
We can now do this since we have finally switched to PHP 5.3 for MW 1.20, and can get rid of the silly dirname(__FILE__) stuff :)

Change-Id: Id9b2c9cd2e678197aa81c78adced5d1d31ff57b1
2012-08-27 21:45:00 +02:00
Reedy
28e7830d78 PHP 5.4 has JSON_PRETTY_PRINT
Use it when $isHtml is true and we are running on
PHP >= 5.4. Otherwise, return the default of 0.

Change-Id: Ief775720a99d1a305c3f9f4ba7cc04eb96817fb3
2012-08-14 18:57:40 +02:00
Reedy
de185ca16f Remove workaround hack for php bug 46944
https://bugs.php.net/bug.php?id=46944

Fixed in PHP 5.3.0; since we require >= 5.3.2, the workaround is redundant.

http://php.net/ChangeLog-5.php

Change-Id: I567466c0c747dba2f903e9258d0f06f725cefb8f
2012-08-12 21:11:46 +00:00
Antoine Musso
aab43dd495 escape tags and entity in doxygen comments
When inserting XML elements inline <such as this one>, doxygen complains
that the element is unknown. Simply enclosing the tag in double quotes
prevents doxygen from emitting a warning.

Also enclosed a few invalid function calls such as \. and double-quoted
the HTML entities such as &foobar;

Change-Id: I4019637145e683c2bec3d17b2fd98b0c50a932f1
2012-07-10 17:08:32 +02:00
Alexandre Emsenhuber
aad9d5fd6b Removed checks for the "MEDIAWIKI" constant on files that only define classes.
These checks are not needed in that case.

Change-Id: Ia83447427de8b7ea32aced8ff43c7a252b8d504c
2012-05-23 21:20:42 +02:00
Alexandre Emsenhuber
63176b99b7 Added missing GPLv2 headers in some places.
Also made file/class documentation more consistent.

Change-Id: I1deb70318d01a257b51948ba806d80cd1a239f4f
2012-05-04 08:47:07 +02:00
Sam Reed
2ec09c5165 More return documentation
2012-02-09 21:35:05 +00:00
Chad Horohoe
9ee17e43f6 Simplify $assoc check
2011-12-20 20:15:42 +00:00
Sam Reed
9d41b95053 Kill various unused variables
Comment some out also

Add some bits of documentation
2011-10-29 01:17:26 +00:00
Sam Reed
4622da783c More documentation!
2011-10-26 04:15:09 +00:00
Roan Kattouw
485dc3a0d6 Fix the non-PEAR alternative for Services_JSON_Error to be less useless. bug 29278 shows an error related to this class not being stringifiable
2011-06-06 09:41:33 +00:00
Siebrand Mazeland
75c6696aa8 Use consistent notation for "@todo FIXME". Should update http://svn.wikimedia.org/doc/todo.html nicely.
2011-05-17 22:03:20 +00:00
Roan Kattouw
e77aa77cd8 Remove debugging code accidentally committed in r86253
2011-04-17 10:49:27 +00:00
Roan Kattouw
3a7731a6f0 Fix broken check for bad JSON encoders, had been broken since inception and caused the native JSON encoder to always be bypassed in favor of Services_JSON.
2011-04-17 10:48:17 +00:00
Alexandre Emsenhuber
4207ab0c63 * (bug 28511) Use [] syntax instead of {} for string offset access
2011-04-17 07:59:58 +00:00
Bryan Tong Minh
cd311a1678 Revert r83647, was based on the inability to read the function signature
2011-03-10 12:04:38 +00:00
Bryan Tong Minh
e018563918 Add wfObjectToArray to json_decode to ensure that the return value is an array
2011-03-10 12:02:35 +00:00
Platonides
e98e6abdbd Follow up r82091, which uses the local file constant SERVICES_JSON_LOOSE_TYPE.
2011-02-19 16:59:34 +00:00
Brion Vibber
8982ac25b7 * (bug 23817, bug 26250) Use Services_JSON's native associative array mode in FormatJson::decode(), bypassing wfObjectToArray (which is also fixed)
Patches from Tim Yates on https://bugzilla.wikimedia.org/show_bug.cgi?id=23817
2011-02-13 23:08:28 +00:00