Thijs/wiki.techinc.nl

Author	SHA1	Message	Date
Ævar Arnfjörð Bjarmason	a26d5a49d7	* s~\t+$~~	2006-01-07 13:31:29 +00:00
Ævar Arnfjörð Bjarmason	7bbe971aec	* s~ +$~~	2006-01-07 13:09:30 +00:00
Brion Vibber	af2177edfd	Code cleanup: normalize case for intval(), strval(), floatval() calls.	2005-08-16 23:36:16 +00:00
Brion Vibber	727e4d1aab	Fix composition bug: completed hangul syllable should not be merged with another following final jamo	2004-11-15 00:59:40 +00:00
Brion Vibber	c6340de5b3	Fix regression in ICU-mode UTF-8 verification: U+FFFF is forbidden	2004-11-14 21:36:43 +00:00
Brion Vibber	e4e75a58a6	Support using ICU to do most of the heavy lifting in cleanUp() if the extension is loaded. Modestly faster for roman text (1-2x), 16-20x faster than the PHP looping for already normalized Russian, Japanese, and Korean text.	2004-11-14 05:17:29 +00:00
Brion Vibber	4a4f248655	Fix regression: surrogate half followed by extra tail bytes	2004-11-14 04:27:03 +00:00
Brion Vibber	9535fc035b	Fix UTF-8 validation regression: well-formed but forbidden UTF-8 sequence followed by bogus tail bytes	2004-11-14 04:07:28 +00:00
Brion Vibber	dd69eb14f5	Fix UTF-8 validation regression where a bad head byte is followed by ascii, then bad tail byte.	2004-11-14 03:48:49 +00:00
Brion Vibber	7bf6095d73	Fix UTF-8 validation bug where some cases didn't get replacement chars inserted correctly	2004-11-14 02:24:44 +00:00
Brion Vibber	eae361e2f0	cleanUp() optimization: speed up Japanese, Korean tests by another 15% by rearranging the loop and avoiding rebuilding the string if there are no illegal characters. Removed restrictions on U+FDD0 and friends; these do seem to be allowed by XML, though they 'recommend' you avoid them.	2004-11-07 11:28:00 +00:00
Brion Vibber	7434438b98	Don't forgot to actually _make_ the replacements for illegal chars. :P	2004-11-06 02:52:25 +00:00
Brion Vibber	51dd271399	Shave off a few more milliseconds from cleanUp() inner loop.	2004-11-05 09:13:02 +00:00
Brion Vibber	97f577163c	Shave a few more percentage points from times on cleanUp() on unicode text by building a combined NFC-check hash.	2004-11-05 08:22:56 +00:00
Brion Vibber	0db79dbed6	More incremental optimization on cleanUp(): * when splitting ascii vs non-ascii chunks, don't split punctuation and control chars as aggressively; this benefits the Korean test data * use output buffer and echo; it's _slightly_ faster than string concatenation. * Separate the surrogate check from the others; many Korean letters fall in the adjacent area with the same head byte, so this gives a small speed boost on Korean text	2004-11-05 04:07:04 +00:00
Brion Vibber	874f8b48c6	cleanUp() optimization: about 1/8 speed boost on unicode-dominant text (Japanese, Korean test data)	2004-11-05 00:47:03 +00:00
Brion Vibber	9ba6a6c74a	cleanUp() optimization: split the string into pure ASCII chunks and chunks which need to be checked byte by byte. Over 5x speedup for German text sample.	2004-11-05 00:26:09 +00:00
Brion Vibber	48cb181bd2	Optimization on cleanUp(): roughly 1/3 speed boost on ascii-dominant but not ascii-pure text (eg German)	2004-11-04 23:53:44 +00:00
Brion Vibber	5f530ba1f3	Optimize inner loop in cleanUp(): boosts performance on non-ASCII text by about 20%. Also, trim the XML-illegal control characters from pure ASCII as well as non-ASCII strings.	2004-11-04 11:44:45 +00:00
Brion Vibber	1897c54f2a	The pass-by-reference on the string on fastCompose() really slows things down sometimes in PHP4. Taking it out speeds up processing of Japanese text significantly.	2004-10-30 12:35:37 +00:00
Brion Vibber	286dd13042	More inlining; fastCompose() is now twice as fast on hangul chars, which cuts down the NFC() time on Korean text a fair chunk.	2004-10-30 12:06:31 +00:00
Brion Vibber	de3549d9e9	Optimize inner loops a bit.	2004-10-30 06:02:30 +00:00
Brion Vibber	d2e152e6de	Munge doc comments. Mark as its own package for docs.	2004-10-28 02:56:13 +00:00
Brion Vibber	6377e82b76	Load form C data on demand; if we are dealing in all-ASCII text we can save some memory and time by not loading it.	2004-10-09 08:08:26 +00:00
Brion Vibber	0824182956	Add support for using ICU to perform normalization, which is much much faster than the PHP code! Still need to add support for cleanup/verification.	2004-10-07 05:59:10 +00:00
Brion Vibber	f0610d0f67	Doc comments	2004-09-27 02:59:24 +00:00
Brion Vibber	dd195aa594	Some more phpdoc bits	2004-09-04 09:35:01 +00:00
Antoine Musso	ba2afcd9fa	Split files and classes in different packages for phpdocumentor. I probably changed some double quotes to single and used function foo () { shema	2004-09-03 23:00:01 +00:00
Brion Vibber	9857a47c3f	Correction to the \r stripping	2004-09-03 06:44:57 +00:00
Brion Vibber	ed46bd50fe	Add UtfNormal::cleanUp() function: strips XML-unsafe characters and illegal UTF-8 sequences, then normalizes to form C.	2004-09-03 05:39:30 +00:00
Brion Vibber	53e71c1702	Split the data arrays for form KC, KD to a separate include file and load it on demand. These are less likely to be used, so save the memory and parse time...	2004-09-02 07:39:06 +00:00
Brion Vibber	a5cfdf0360	Unicode normalization routines. See: http://www.unicode.org/reports/tr15/	2004-08-29 10:30:23 +00:00

32 commits