Commit graph

66 commits

Author SHA1 Message Date
Amir Sarabadani
f4e68e055f Reorg: Move Status to MediaWiki\Status\
This class is used heavily basically everywhere, moving it to Utils
wouldn't make much sense. Also with this change, we can move
StatusValue to MediaWiki\Status as well.

Bug: T321882
Depends-On: I5f89ecf27ce1471a74f31c6018806461781213c3
Change-Id: I04c1dcf5129df437589149f0f3e284974d7c98fa
2023-08-25 15:44:17 +02:00
Kevin Israel
210a34369a FormatJson: Optimize encode() for supported PHP versions
- Removed the str_replace() call to replace unescaped line terminators
  if UTF8_OK is set. PHP 7.1 and later escape these by default.

  The speedup isn't much at all (about 1% in my testing when encoding an
  API siteinfo result taken from enwiki). Perhaps it's not surprising
  given the way str_replace() works[1]. Still, it's better not to spend
  CPU time looking for characters that will not occur.

- Changed the algorithm for the optional spaces-to-tabs conversion when
  pretty printing. Instead of replacing one indent level throughout the
  entire string before replacing the next level, use a regex to replace
  in one pass. This is usually faster now that PHP 7 enables PCRE's JIT
  compiler by default. Without JIT, the regex was often slower.

  The speedup can be large for deeply nested data. For example, in my
  testing the languages/i18n data took about 8% less time to encode as
  tab-indented JSON, yet the API site info result took about 45% less.
  (This, of course, isn't actually relevant to the API even when pretty
  printed output is requested, because ApiFormatJson uses the default
  indent string of four spaces, which will always be faster unless
  support for tab indentation is added to PHP's json extension.)

- Set options using if statements instead of the ternary operator. This
  is the clearer way, and maybe the slightly faster one, skipping the
  assignment when the flags do not need to be set.

[1]: https://github.com/php/php-src/blob/PHP-8.0.10/ext/standard/string.c#L2969

Change-Id: Iebb1df0264e335a1819956710eeacf6d6b8f1471
2021-08-20 08:03:11 -04:00
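The one-pass spaces-to-tabs conversion described in 210a34369a can be sketched as follows. This is a minimal illustration in Python rather than the PHP of FormatJson itself. The name indent_with_tabs and its width parameter are hypothetical, and the pattern shows the general technique only, not the regex used in the commit.

```python
import json
import re


def indent_with_tabs(pretty, width=4):
    """Convert leading runs of `width`-space indents to tabs in a single pass."""
    return re.sub(
        r"^(?: {%d})+" % width,  # one or more whole indent levels at line start
        lambda m: "\t" * (len(m.group(0)) // width),
        pretty,
        flags=re.MULTILINE,
    )


data = {"a": {"b": [1, 2]}}
pretty = json.dumps(data, indent=4)  # four-space indentation
tabbed = indent_with_tabs(pretty)    # same document, tab indentation
```

Because JSON string literals cannot contain raw newlines, every line start in pretty-printed output is indentation, so the pattern never touches string contents.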
Kevin Israel
b084f499db FormatJson: Add message for JSON_ERROR_INVALID_PROPERTY_NAME
The comment added in b9461e3f1c is incorrect. This is actually a
decode error, so is relevant to FormatJson::parse().

Change-Id: I3cc33f0f260c0ba4fe96fb75565f52d089b9a975
2021-08-16 10:57:14 -04:00
Petr Pchelko
dbdc2a3cd3 Introduce JsonCodec to help with serialization/deserialization
Change-Id: I5433090ae8e2b3f2a4590cc404baf838025546ce
2020-11-19 08:32:21 -07:00
Petr Pchelko
7c68ae9296 Safe ParserOutput extension data and JsonUnserializable helper.
One major difference with what we've had before is that now we
actually write class names into the serialization - given that
this new mechanism is extensible, we can't establish any kind
of mapping of allowed classes. I do not think it's a problem
though.

Bug: T264394
Change-Id: Ia152f3b76b967aabde2d8a182e3aec7d3002e5ea
2020-11-10 11:21:09 -07:00
Petr Pchelko
1c70cca3ee Check if non-JSON-serializable data passed to ParserOutput
Bug: T264394
Change-Id: I6eedd03a81b95f6f55d25c00b31e01cbd8658d43
2020-10-05 10:54:08 -06:00
Ed Sanders
0cf40a4f7a Flip Yoda conditionals
Change-Id: Id3495b6f15c267123c89f3a0ace496e6ecbeb58e
2020-07-22 17:49:12 +01:00
Reedy
4cd8d9cff5 Fix numerous PSR12.Properties.ConstantVisibility.NotFound
Change-Id: I2ec09c02c2e4ed399d993cb1871e67df02167ca8
2020-05-11 01:36:36 +01:00
Kunal Mehta
b9461e3f1c FormatJson: Improve parse() error code handling and tests
Three of the errors are encode errors that won't be emitted when we're
trying to decode JSON, so we can ignore those lines of code.

JSON_ERROR_UTF16 is a new error code in PHP 7.0, so add that in.

Improve test coverage while we're at it. The UTF16 test case was
copied from php-src/ext/json/tests/bug62010.phpt.

Change-Id: I79aa0db3d967d512611f8521bb052af36c3cda8e
2020-01-01 02:34:44 -08:00
Max Semenik
8a98dd9d59 Convert some private static arrays to constants
Remove @since for some private ones as we don't guarantee anything
about private class members.

Change-Id: Ifb898353c02082e9ef69d67f69339345c6cd154d
2019-10-16 01:30:54 +00:00
Bill Pirkle
5a166f00d8 Comments, tests, and tweaks for JSON decoding quirks
PHP JSON decoding has surprising behavior on some edge cases.
Documented this via comments, added related tests, and tweaked
related CommentStore code.

Bug: T206411
Change-Id: I6927fdaf616b37a04d81a638a0ed257afac9b844
2018-11-07 13:04:21 -06:00
RazeSoldier
24ffbd9bd1 Use "break" instead of "continue"
Inside a switch, "continue" behaves the same as "break". PHP 7.3 will
generate a warning for this usage.

Bug: T200595
Change-Id: I244ecb2e1ce5a76295f014fb1becd8d263196846
2018-08-24 00:18:07 +08:00
Kevin Israel
381858ab52 FormatJson: cleanup after PHP 5.5 support removal
* Use PHP 5.6 constant expression support in definition of ALL_OK.
* Remove one level of nesting in encode(). Follows up I801eaffc.
* Update HTML5 section number in doc comment for XMLMETA_OK.
* Made other minor doc comment fixes, such as capitalizing "JSON".
* Not done: changing $badChars and $badCharsEscaped to constants.
  This will have to wait until HHVM 3.18 support is dropped.

Change-Id: I06413dfe0fedddfd20d3e375eadd9daad6d6230e
2018-06-09 09:06:02 -04:00
Bartosz Dziewoński
0313128b10 Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.

This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.

Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.

The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:

  chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
  puts chars.split('').map{|char|
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join('')

Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
2018-06-04 16:20:13 +00:00
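For readers who do not use Ruby, the conversion performed by the script in 0313128b10 can be approximated as below. This is a sketch in Python under the assumption that the input is valid UTF-8. The name to_unicode_escapes is hypothetical.

```python
def to_unicode_escapes(raw):
    """Render UTF-8 bytes as PHP 7 "\\u{NNNN}" escapes, uppercase and
    zero-padded to at least four hex digits."""
    return "".join("\\u{%04X}" % ord(ch) for ch in raw.decode("utf-8"))


nbsp = to_unicode_escapes(b"\xc2\xa0")  # the UTF-8 bytes of NO-BREAK SPACE
```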
Fomafix
bb52950fee Remove workaround for PHP bug 66021 (PHP < 5.5.12)
The PHP bug 66021 <https://bugs.php.net/bug.php?id=66021> was fixed by
https://github.com/php/php-src/pull/518 and is included in PHP 5.4.28+
and PHP 5.5.12+.
This workaround is not necessary anymore because the minimum PHP
version for MediaWiki is 7.0.0+.

Change-Id: I801eaffc253fd88e0d3c87cfe97777837bd3902d
2018-05-31 01:14:58 +00:00
Bartosz Dziewoński
0cccd68dc8 Code style: no space after unary minus operator
Searched for /([^\d\w\s\)\]]\s*)- \d/ to find potential issues.
It seems there's no PHPCS check for this, huh.

Also fixed typo in a comment in LoginSignupSpecialPage.

Change-Id: Iaab1a1f5a9f234971e550e7909aa5c3e0c02a983
2017-01-05 14:38:32 +01:00
Sam Wilson
66e215baee Remove spaces after cast operators
This fixes the outstanding mis-spaced cast operators to bring them
into line with the coding standards on mediawiki.org (and with the
more common usage within this codebase).

Bug: T149545
Change-Id: Ib7bcf95bbee83d20c05f6d621ce7b4e1fb58a347
2016-10-31 13:57:39 +00:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Kevin Israel
a508f5daee FormatJson: Remove PHP 5.3 compatibility code
MediaWiki now only works with PHP versions that are new enough
to have the encoding options required by encode54(). So fold
that into encode() and remove encode53() and prettyPrint().

Change-Id: I6b22daf8fa01ef608efbde9c6aecdbb5ce03e2b9
2016-02-12 18:49:01 -05:00
Vivek Ghaisas
9f5b6f5aeb Fix whitespace issues around parentheses
Fix issues found by MediaWiki.WhiteSpace.SpaceyParenthesis sniff.

Bug: T102617
Change-Id: Iec7f71e64081659fba373ec20d9d2006306a98f4
2015-06-16 22:14:02 +03:00
Timo Tijhof
532337e6ff Use "string|false" as @return instead of "string|bool" where appropriate
This makes sure static analyzers don't warn for supposedly unsafe
code accessing variables as strings when they could be boolean after
having only checked against false.

https://github.com/scrutinizer-ci/php-analyzer/issues/605

Change-Id: Idb676de7587f1eccb46c12de0131bea4489a0785
2015-04-01 09:48:30 +01:00
Kunal Mehta
c91fd8043b Fix phpcs errors and warnings in includes/json
Change-Id: Id5ae1cabe87f73f7458a744834ebb6a1a7c3dbf8
2015-03-15 02:35:26 +00:00
Bryan Davis
8fea9c619d FormatJson::stripComments
Add stripComments method that can be used to remove single line and
multiline comments from an otherwise valid JSON string. Inspired by the
comment removal code in redisJobRunnerService and discussions on irc
about the Extension registration RFC.

Change-Id: Ie743957bfbb7b1fca8cb78ad48c1efd953362fde
2014-10-12 12:34:22 -06:00
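The comment removal introduced in 8fea9c619d has to leave string literals alone, since a URL such as "http://..." contains "//". One way to do that is sketched below in Python. It is not the FormatJson::stripComments implementation; the names _TOKEN and strip_comments are hypothetical.

```python
import json
import re

# Hypothetical tokenizer: a string literal, a // comment, or a /* */ comment.
_TOKEN = re.compile(
    r'"(?:[^"\\]|\\.)*"'  # complete string literal (kept verbatim)
    r"|//[^\n]*"          # single-line comment (dropped)
    r"|/\*.*?\*/",        # multiline comment (dropped)
    re.DOTALL,
)


def strip_comments(text):
    """Remove comments that appear outside of string literals."""
    return _TOKEN.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else "", text
    )


source = """{
    // registration name
    "name": "Example", /* spans
    two lines */
    "url": "http://example.org/*not-a-comment*/"
}"""
config = json.loads(strip_comments(source))
```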
Yuri Astrakhan
c361cb74fa Added missing FormatJson::parse() RELEASE NOTES, fixed docs
Constant values were changed to be above 0xFF - this way
we can easily decide to allow depth-parsing-limit to be OR-able:

  FormatJson::parse( $value, 30 | FormatJson::FORCE_ASSOC )

Follows-up Ic0eb0a7 and I1c4f37a.

Change-Id: I9bfd67a5ca4ea1d399821549c7e63ffdecd56ad1
2014-09-30 01:45:35 +00:00
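The reason for keeping the constants above 0xFF in c361cb74fa can be shown with a little bit arithmetic: the low byte stays free for a depth limit. The sketch below is in Python, the constant values are illustrative only, and split_options is a hypothetical helper, not part of FormatJson.

```python
# Illustrative values only: any flags above 0xFF would work the same way.
FORCE_ASSOC = 0x100
TRY_FIXING = 0x200


def split_options(options):
    """Split a combined value into (depth limit, flag bits)."""
    return options & 0xFF, options & ~0xFF


depth, flags = split_options(30 | FORCE_ASSOC)
```

The low eight bits carry the depth limit and everything above them carries flags, so the two never collide as long as the depth stays below 256.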
Yuri Astrakhan
289d3e4f00 FormatJson::parse( TRY_FIXING ) - remove trailing commas
Removes trailing commas from JSON text when parsing.
Solves very common cases like [1,2,3,].

The resulting status will be set to OK but not Good, to warn the caller.

Change-Id: Ic0eb0a711da3ae578d6bb58d7474279d6845a4a7
2014-09-27 06:20:36 -04:00
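The retry-after-repair idea in 289d3e4f00 can be sketched as follows. This is a Python illustration, not the FormatJson::parse() code. parse_lenient and its boolean "fixed" result are hypothetical stand-ins for a Status that is OK but not Good, and the regex is an assumption about one reasonable way to find trailing commas.

```python
import json
import re

# Match either a complete string literal (kept as-is) or a comma followed
# only by whitespace and a closing bracket/brace (the comma is dropped).
_TRAILING_COMMA = re.compile(r'("(?:[^"\\]|\\.)*")|,(\s*[\]}])')


def parse_lenient(text):
    """Return (value, fixed); fixed is True if trailing commas were removed."""
    try:
        return json.loads(text), False
    except json.JSONDecodeError:
        repaired = _TRAILING_COMMA.sub(
            lambda m: m.group(1) or m.group(2), text
        )
        # Still raises JSONDecodeError if the text is broken in other ways.
        return json.loads(repaired), True


value, fixed = parse_lenient("[1,2,3,]")
```

Strict parsing is tried first, so well-formed input never pays for the repair and the caller can tell when a fix was needed.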
Yuri Astrakhan
9a380626bc Added FormatJson::parse( $value, $options = 0 ) returning Status
* Returns Status object that will contain decoded value on success
* Adds i18n messages for all available PHP JSON errors

ATTN Translation team: please copy these messages:

gwtoolset-json-error-depth => json-error-depth
gwtoolset-json-error-state-mismatch => json-error-state-mismatch
gwtoolset-json-error-ctrl-char => json-error-ctrl-char
gwtoolset-json-error-syntax => json-error-syntax
gwtoolset-json-error-utf8 => json-error-utf8

Change-Id: I1c4f37aaabad369b75a1fbd223fad27ebcfe1c3c
2014-09-26 18:55:09 +00:00
Kevin Israel
b9a12b0b33 FormatJson: Remove speculative comment
Follows-up bec7e8287c. The comment "Can be removed once we require
PHP >= 5.4.28, 5.5.12, 5.6.0" relies on some assumptions that might
later prove to be incorrect:

* That the fix won't be reverted from any of those PHP versions
  (e.g. if deemed to break BC)

* That the bug will be fixed in PECL jsonc and jsond, as well as in
  HHVM

* That we don't need to support older versions of those once we
  require one of the mentioned PHP versions

Change-Id: I67034c561d54d37dee961ada8c9cf5ccfd113da1
2014-04-25 15:16:18 -04:00
Kevin Israel
bec7e8287c FormatJson: Skip whitespace cleanup when unnecessary
The patch[1] for PHP bug 66021[2], which removes the same undesirable
whitespace that WS_CLEANUP_REGEX does, has been merged into php-src.
Subsequent PHP versions having the patch shouldn't have to take the
10-20% performance hit from that workaround.

[1]: https://github.com/php/php-src/commit/82a4f1a1a287
[2]: https://bugs.php.net/bug.php?id=66021

Change-Id: I717a0e164952cc6ace104f13f6236e86c4ab8b58
2014-04-24 20:54:44 -04:00
Kevin Israel
1efdda25ee FormatJson: Make it possible to change the indent string
This is to allow consistency with MediaWiki PHP and JS files (e.g. when
generating JSON i18n files), not because tabs are "better" than spaces for
indenting code (both have advantages and disadvantages).

Because PHP's json_encode() function hardcodes the indent string, using tabs
has a performance cost (in post-processing the output) and is less suitable
for web output; thus the API and ResourceLoader debug mode will continue to
use four spaces. Adjusting the maintenance scripts and JSON files is left to
separate change sets.

Bug: 63444
Change-Id: Ic915c50b0acd2e236940b70d5dd48ea87954c9d5
2014-04-16 10:00:10 -04:00
Siebrand Mazeland
07a3e0ae54 Update formatting and comments in FormatJson
Change-Id: I4edc08760004e170b045b201d3cc5444b2028e55
2013-11-25 18:46:25 +01:00
Kevin Israel
b6a5bb484d FormatJson: Remove whitespace from empty arrays and objects
As noted in c370ad21d7, the pretty output can differ between
Zend PHP and HHVM. This change adds some post-processing to make
the output consistent across implementations and with JavaScript
JSON.stringify() and Python json.dumps(); all whitespace between
the opening and closing brackets/braces is removed.

Change-Id: I490e0ff1fac3d6c3fb44ab127e432872c0301a9d
2013-10-09 04:13:31 +02:00
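The post-processing described in b6a5bb484d, removing all whitespace between the brackets or braces of an empty array or object, might look like the sketch below. It is written in Python for illustration; collapse_empty and its pattern are hypothetical and are not the regex used by FormatJson.

```python
import re

# A string literal (kept verbatim), or an empty array/object whose brackets
# are separated only by whitespace.
_EMPTY = re.compile(r'("(?:[^"\\]|\\.)*")|\[\s+\]|\{\s+\}')


def collapse_empty(pretty):
    """Rewrite "[   ]" as "[]" and "{   }" as "{}" outside string literals."""
    return _EMPTY.sub(
        lambda m: m.group(1) if m.group(1) else m.group(0)[0] + m.group(0)[-1],
        pretty,
    )


before = '{\n    "a": [\n\n    ],\n    "b": {\n    },\n    "c": "[ ]"\n}'
after = collapse_empty(before)
```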
Kevin Israel
39e22628f7 FormatJson: minor cleanup
* Prefer feature detection over version comparison.
* Prefer pre- over post- increment and decrement operators.
* Remove the statement that FormatJson::decode() decodes `true`,
  `false`, and `null` case insensitively. Nobody should assume
  this is (or is not) the case, even though the PHP manual says so.
* Avoid using the ternary operator with long strings; prior to
  PHP 5.4, the operator prevented the copy-on-write optimization.
* Avoid placing comments on the same lines as code.

Change-Id: I8fc88e9b7b49aa0cbd4128216557836a3b2cd011
2013-10-08 18:28:09 -04:00
Kevin Israel
876bddf637 Change @since and @deprecated notes to 1.22
Using the following command line, I have found doc comments mentioning
"1.21" when they should mention "1.22" instead, which I have fixed
manually:

git diff REL1_21 | grep --color=always -C 10 -iE \
'^\+.*(since|deprecated).*1\.21(\D|$)' | aha > oldver.html

I also moved the release notes for I1987190f ("Combine JavaScript and
JSON encoding logic") from RELEASE-NOTES-1.21 to RELEASE-NOTES-1.22
because I had reverted the commit on REL1_21 only (see Id3b88102 and
bug 47431 for the rationale).

Change-Id: I11b917a371e07267dfa98b8449776d0c1cb29b15
Follows-Up: I25cf5a94f6e47f85a9d0b80cc1c9c9f957288478
Follows-Up: I3d72e4105f6244b0695116940e62a2ddef66eb66
Follows-Up: I3faa9c3e8107c6e46cdf21f8c18adda1f42890d7
Follows-Up: I6aab19c8d68bf47beddad42632b0360a7b12f251
Follows-Up: I86368821fc2cd0729df5342b8572eb470c0f77a0
Follows-Up: Id3b88102e768318e3605a19e9952121091a40915
Follows-Up: Ie667088010e24eb6cb569f9e8e8e2553005223eb
2013-06-21 05:33:22 +00:00
Kevin Israel
e9443cf677 FormatJson: microoptimizations for UTF8_OK mode
* Replace strtr with str_replace where faster.
* Use addcslashes to escape json_encode's output. Because no control
  characters are included, the only characters that have to be
  escaped are \ and ". (irrelevant for PHP 5.4+ installations)

Re-encoding a ~1.5 MB API response from the Chinese Wikipedia:
* PHP 5.3: 32% faster (from 347 ms to 239 ms)
* PHP 5.4: 70% faster (from 51 ms to 15 ms)
* HHVM: 42% faster (from 326 ms to 191 ms)

Change-Id: I7c9342682986d40a2f2436ac978390b6018a3521
2013-04-08 05:07:07 +00:00
Kevin Israel
217cb2e3a6 Fix pretty JSON when strings end with backslashes
If a string encoded as part of the output ends in a backslash
(e.g. an edit token), FormatJson::prettyPrint() may incorrectly
treat the unescaped double quote marking the end of the string as
a character that is part of the string.

This is a serious problem in that the "pretty" output may not
necessarily be valid JSON; a later string literal might contain
one or more of these tokens: :[{,]}

To fix the bug, I exploit strtr's behavior when it is given an
associative array having keys of the same length to skip over
escaped backslashes while replacing escaped double quotes with "\x01".

I also updated the corresponding unit test.

Change-Id: I159105b6493c14b82cd0a41a95e04bfed744931e
2013-03-30 16:23:24 -04:00
Kevin Israel
79f80cc495 Combine JavaScript and JSON encoding logic
This will help with improving human readability of JS and JSON
objects encoded by both ResourceLoader and the API. This patch
also adds new "utf8" parameter to the JSON formatter of the API.

Changes to FormatJson class:

* Added escaping of '<', '>', and '&' by default to protect against XSS.
* Removed unnecessary escaping of '/' and added an additional option to
  unescape non-ASCII characters (those above U+007F) as well.
* Added PHP 5.3 pretty printing code (to replace Services_JSON) that
  uses a four-space indent as PHP 5.4 does.

Changes to Xml class:

* Defined Xml::encodeJsVar() in terms of FormatJson::encode()
  and added a pretty printing option. Also added a pretty printing
  option to Xml::encodeJsCall() as well.
* Deprecated Xml::escapeJsString() and QuickTemplate::jstext();
  callers have to add quotes themselves, hence the escaping of
  both double quotes and apostrophes.

Bug: 26818
Change-Id: I1987190f1ba5bf41738e7bd611209706c1f6bb5c
2013-03-27 20:22:45 -04:00
Yuri Astrakhan
9506e3d812 Spellchecked /includes directory
* Ran spell-checker over code comments in /includes/
* A few spellchecking fixes for wfDebug() calls

Found one very strange (NOOP?) line in Linker.php - see "TODO: BUG?"

Change-Id: Ibb86b51073b980eda9ecce2cf0b8dd33f058adbf
2013-03-13 03:42:41 -04:00
Tyler Anthony Romeo
4dcc7961df Fixed @param tags to conform with Doxygen format.
Doxygen expects parameter types to come before the
parameter name in @param tags. Used a quick regex
to switch everything around where possible. This
only fixes cases where a primitive variable (or a
primitive followed by other types) is the variable
type. Other cases will need to be fixed manually.

Change-Id: Ic59fd20856eb0489d70f3469a56ebce0efb3db13
2013-03-11 13:15:01 -04:00
Ori Livneh
10254ee6e8 Fix misleading param name in FormatJson::encode signature
As the FIXME notes, $isHtml has nothing to do with HTML; it simply controls
whether the resultant JSON should be formatted for readability by inserting
whitespace as appropriate.

Change-Id: I90d46d6624d683f18a39c98500bd71bbd0ca3800
2012-11-23 13:51:15 -08:00
jeroendedauw
38c7f444e1 Use __DIR__ instead of dirname( __FILE__ )
We can now do this since we finally switched to PHP 5.3 for MW 1.20,
so we can get rid of the silly dirname(__FILE__) stuff :)

Change-Id: Id9b2c9cd2e678197aa81c78adced5d1d31ff57b1
2012-08-27 21:45:00 +02:00
Reedy
28e7830d78 PHP 5.4 has JSON_PRETTY_PRINT
Use this conditionally when $isHtml is true and we are
running on PHP >= 5.4. Otherwise, use the default of 0.

Change-Id: Ief775720a99d1a305c3f9f4ba7cc04eb96817fb3
2012-08-14 18:57:40 +02:00
Reedy
de185ca16f Remove workaround hack for php bug 46944
https://bugs.php.net/bug.php?id=46944

Fixed for 5.3.0, and as we require >= 5.3.2, workaround is redundant

http://php.net/ChangeLog-5.php

Change-Id: I567466c0c747dba2f903e9258d0f06f725cefb8f
2012-08-12 21:11:46 +00:00
Antoine Musso
aab43dd495 escape tags and entity in doxygen comments
When inserting XML elements inline <such as this one>, doxygen chokes
about it not being known. Simply enclosing the tag in double quotes
prevents doxygen from emitting a warning.

Also enclosed a few invalid functions calls such as \. and double quoted
the HTML entities such as &foobar;

Change-Id: I4019637145e683c2bec3d17b2fd98b0c50a932f1
2012-07-10 17:08:32 +02:00
Alexandre Emsenhuber
aad9d5fd6b Removed checks for the "MEDIAWIKI" constant on files that only define classes.
These checks are not needed in that case.

Change-Id: Ia83447427de8b7ea32aced8ff43c7a252b8d504c
2012-05-23 21:20:42 +02:00
Alexandre Emsenhuber
63176b99b7 Added missing GPLv2 headers in some places.
Also made file/class documentation more consistent.

Change-Id: I1deb70318d01a257b51948ba806d80cd1a239f4f
2012-05-04 08:47:07 +02:00
Chad Horohoe
9ee17e43f6 Simplify $assoc check
2011-12-20 20:15:42 +00:00
Sam Reed
4622da783c More documentation!
2011-10-26 04:15:09 +00:00
Siebrand Mazeland
75c6696aa8 Use consistent notation for "@todo FIXME". Should update http://svn.wikimedia.org/doc/todo.html nicely.
2011-05-17 22:03:20 +00:00
Roan Kattouw
e77aa77cd8 Remove debugging code accidentally committed in r86253
2011-04-17 10:49:27 +00:00
Roan Kattouw
3a7731a6f0 Fix broken check for bad JSON encoders, had been broken since inception and caused the native JSON encoder to always be bypassed in favor of Services_JSON.
2011-04-17 10:48:17 +00:00