The commafy method "should" be given valid numeric strings, but this
wasn't enforced. If it were given input which started with a UTF-8
multibyte character and then had a single ASCII digit somewhere after
that, it would return only the first byte of the input string,
resulting in an invalid UTF-8 sequence.
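As an illustration of why returning a single byte is harmful, here is a
minimal Python sketch (not the PHP code touched by this change; the
helper names first_byte and is_valid_utf8 are hypothetical). It shows
that the first byte of a multibyte character is not valid UTF-8 on its
own:

```python
def first_byte(text: str) -> bytes:
    """Return only the first byte of the UTF-8 encoding of text."""
    return text.encode("utf-8")[:1]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A multibyte character followed by a single ASCII digit, the shape of
# input described above. "é" is encoded as the two bytes C3 A9.
sample = "é1"
lead = first_byte(sample)
print(lead, is_valid_utf8(lead))
```

The lone lead byte has no continuation byte after it, so any consumer
that validates UTF-8 will reject the result.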
Fix this bug with belt and suspenders: first, enforce the expected
input structure at the top of the function. Since there is existing
code which expects us to "do our best" with invalid input, split the
input string into valid numeric chunks before processing it. This
split code triggers a hard deprecation warning, so we can eventually
remove it.
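The "split into numeric chunks" approach could be sketched roughly as
follows. This is an illustrative Python sketch under assumptions, not
the actual implementation: the NUMERIC pattern, the function names, and
the warning text are all hypothetical, and the real definition of a
valid numeric string may differ.

```python
import re
import warnings

# Assumed shape of a "valid numeric chunk": optional minus sign, ASCII
# digits, optional fractional part. The real rule may be different.
NUMERIC = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def split_numeric_chunks(text):
    """Split text into (chunk, is_numeric) pairs, preserving order."""
    chunks = []
    pos = 0
    for match in NUMERIC.finditer(text):
        if match.start() > pos:
            chunks.append((text[pos:match.start()], False))
        chunks.append((match.group(), True))
        pos = match.end()
    if pos < len(text):
        chunks.append((text[pos:], False))
    return chunks


def process_lenient(text, handler):
    """Apply handler to numeric chunks only; warn if anything else is present."""
    chunks = split_numeric_chunks(text)
    if any(not is_numeric for _, is_numeric in chunks):
        warnings.warn(
            "non-numeric input is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return "".join(
        handler(chunk) if is_numeric else chunk
        for chunk, is_numeric in chunks
    )
```

Non-numeric chunks pass through unchanged, so the numeric-only code is
never asked to slice a multibyte character.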
Second, make the sign test more robust and anchor the $integerPart
regexp to match assumptions made in the algorithm, so that even if
bogus input *did* creep through (a sloppy future maintainer, say) it
wouldn't lead to corrupt UTF-8 in the output.
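To illustrate the second line of defense, here is a hedged Python
sketch of an anchored pattern (again not the PHP from this change; the
INTEGER_PART pattern and this commafy function are assumptions made for
the example). Because the pattern must match the whole string, bogus
input is returned untouched instead of being sliced at a byte offset:

```python
import re

# Anchored at both ends: sign, integer part, optional fraction.
# Anything that does not fit this shape is rejected outright.
INTEGER_PART = re.compile(r"\A(-?)([0-9]+)((?:\.[0-9]+)?)\Z")


def commafy(number: str) -> str:
    """Insert thousands separators into the integer part of number."""
    match = INTEGER_PART.match(number)
    if match is None:
        # Unexpected input: return it as-is rather than guess.
        return number
    sign, integer, fraction = match.groups()
    # Insert a comma wherever a multiple of three digits remains.
    grouped = re.sub(r"(?<=[0-9])(?=(?:[0-9]{3})+\Z)", ",", integer)
    return sign + grouped + fraction


print(commafy("1234567"))
print(commafy("-1234.5678"))
print(commafy("é1"))
```

The sign is captured by the same anchored match rather than tested
separately, so the sign and the integer part can never disagree about
where the number starts.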
Add test cases covering these conditions, borrowing liberally from
I741b70757e43b1312c86719920e29885566e916c, which points out that while
commafy expects numeric strings, formatNum replaces – character by
character – digits and separator characters with language-specific
ones. Optionally, thousands separators are added (a.k.a. "commafy").
Eventually we should tighten the spec for formatNum as well; some of
this has already been done in
I03ffa99f7de1dcc48535ba1e1251567dbf3db116 and
I89b17a9e11b3afc6c653ba7ccc6ff84c37863b66.
Some additional test case fixes are borrowed from
If45ef33a50b2623322f17306d123f0d8cb468618, which updated a few test
cases to be more specific, i.e. to actually test something (for
example, commafy doesn't happen on 3-digit numbers, and numerals are
not translated in English).
Bug: T237467
Depends-On: I89b17a9e11b3afc6c653ba7ccc6ff84c37863b66
Depends-On: I9dcbe91fa926dba1cfd24d9bf075ee1ebef36b9e
Depends-On: I03ffa99f7de1dcc48535ba1e1251567dbf3db116
Change-Id: If3dcfd71acd8ebf3eea6a49408260f2aaa07e469