118 commits

Author SHA1 Message Date
Nick Jenkins
baaee13afc Prevent some unnecessary lstat system calls, generated by include or require directives.
This can be done either by:
* Using explicit full paths, using the $IP global for the installation directory full path, and then working down the tree from there.
* Using explicit full paths, using the "dirname(__FILE__)" directive to get a full directory path for the includer file. 
* Occasionally removing the line altogether, and then for some files the inclusion is handled by the autoloader.

For example, if the "extensions/wikihiero/wh_main.php" file does an include or require on "wh_list.php", then PHP does the following:
* tries to open "wiki/wh_list.php", and fails.
* tries to open "wiki/includes/wh_list.php", and fails.
* tries to open "wiki/languages/wh_list.php", and fails.
* tries to open "wiki/extensions/wikihiero/wh_list.php", and succeeds.

So in this example, the first 3 calls can be prevented if PHP is told where the file is.
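
The lookup order above can be modelled outside PHP. The following Python sketch is only an illustration under stated assumptions: the `resolve_include` and `demo` helpers are hypothetical, the search list simply mirrors the four directories named in the example, and it is not how PHP implements include resolution internally. It counts how many probes fail before the file is found.

```python
import os
import tempfile


def resolve_include(name, search_dirs):
    """Probe each directory in order; return (resolved_path, failed_probes).

    Every entry in failed_probes stands for one wasted filesystem lookup,
    comparable to the ENOENT results described in the commit message.
    """
    failed = []
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate, failed
        failed.append(candidate)
    return None, failed


def demo():
    """Build a throwaway tree shaped like the example and resolve wh_list.php."""
    with tempfile.TemporaryDirectory() as root:
        wiki = os.path.join(root, "wiki")
        hiero = os.path.join(wiki, "extensions", "wikihiero")
        for sub in ("includes", "languages", os.path.join("extensions", "wikihiero")):
            os.makedirs(os.path.join(wiki, sub))
        open(os.path.join(hiero, "wh_list.php"), "w").close()

        # Assumed search order, copied from the four attempts listed above.
        search_dirs = [
            wiki,
            os.path.join(wiki, "includes"),
            os.path.join(wiki, "languages"),
            hiero,
        ]
        found, failed = resolve_include("wh_list.php", search_dirs)
        return os.path.relpath(found, root), len(failed)


if __name__ == "__main__":
    path, misses = demo()
    print(path, "found after", misses, "failed probes")
```

Passing a one-element search list (the known directory) makes the failed-probe count drop to zero, which is the effect the explicit full paths are meant to have.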

Testing Method: On a Linux box, run these commands to attach strace to all the apache2 processes, and log their system calls to a temporary file, then generate some activity, and then stop the strace:
-----------------------------------
rm /tmp/strace-log.txt
strace -tt -o /tmp/strace-log.txt -p `pidof apache2 | sed 's/ / -p /g'` &
php maintenance/fuzz-tester.php --keep-passed-tests --include-binary --max-runtime=3 > /tmp/strace-tests.txt
killall -9 strace
grep "No such file or directory"  /tmp/strace-log.txt | sort -u
-----------------------------------

Any failed file stats will be marked with: "-1 ENOENT (No such file or directory)".
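
To rank the failures rather than just list them, the log can be post-processed. This Python sketch is one possible approach, not part of the original testing method: `count_missing_paths` is a hypothetical helper, and the sample lines are invented to resemble strace output (the exact syscall names, PIDs, and argument formatting in a real log may differ).

```python
import re
from collections import Counter

# First quoted argument of any syscall whose result is "-1 ENOENT".
ENOENT_LINE = re.compile(r'\w+\("([^"]+)".*=\s*-1 ENOENT')


def count_missing_paths(lines):
    """Count how often each path was probed and reported as missing."""
    counts = Counter()
    for line in lines:
        match = ENOENT_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '4242 05:36:56.100001 lstat64("/var/www/wiki/wh_list.php", 0xbfd0a1c0) = -1 ENOENT (No such file or directory)',
    '4242 05:36:56.100105 lstat64("/var/www/wiki/includes/wh_list.php", 0xbfd0a1c0) = -1 ENOENT (No such file or directory)',
    '4243 05:36:56.200001 lstat64("/var/www/wiki/wh_list.php", 0xbfd0a1c0) = -1 ENOENT (No such file or directory)',
    '4242 05:36:56.100300 open("/var/www/wiki/extensions/wikihiero/wh_list.php", O_RDONLY) = 5',
]

if __name__ == "__main__":
    for path, hits in count_missing_paths(SAMPLE_LOG).most_common():
        print(hits, path)
```

For a real run, the lines would come from the log file, for example `count_missing_paths(open("/tmp/strace-log.txt"))`.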

Also:
* Strict Standards: Undefined offset:  230 in includes/normal/UtfNormal.php on line 637
* Strict Standards: iconv() [<a href='function.iconv'>function.iconv</a>]: Detected an illegal character in input string in languages/Language.php on line 776
  [Note: Partial only - despite adding "//IGNORE", it still seems to be possible with some
         messed-up binary input to cause PHP 5.1.2's iconv() function to squeal like a stuck pig].
* Update one $fname variable (method belongs to HistoryBlobStub class).
2007-02-09 05:36:56 +00:00
Brion Vibber
8fab89a6c4 Cleanup from r19742:
* use diffchange class alone for backwards compatibility with old renderings and diff plugins
* set text-decoration: none in diffs in RSS/Atom feeds
* fix bad diff regex in UTF-8 RandomTest script
2007-02-04 18:42:07 +00:00
Antoine Musso
fe7d2d15d4 Fix #6844: Semantically correct tags for diffchanges (<ins> && <del>)
Bumps wgStyleVersion to 55.
Patch by Messi <messias+spam@gmail.com>
2007-02-03 21:47:53 +00:00
Antoine Musso
c771fc9c96 Use Doxygen @addtogroup instead of phpdoc @package && @subpackage 2007-01-20 15:09:52 +00:00
Brion Vibber
e398816be7 use number_format on bytes/sec in output to make it easier to read 2007-01-13 04:22:47 +00:00
Brion Vibber
f15f0e05bb fix benchmark test data downloads; fix link for english text; find another page for korean text (page was deleted) 2007-01-13 02:57:58 +00:00
Brion Vibber
161d9aee1f * (bug 7250) Updated Unicode normalization tables to Unicode 5.0 2007-01-13 02:30:59 +00:00
Brion Vibber
25a2f1b60a adjust CleanUpTest to run with PHPUnit 3 2007-01-13 02:15:19 +00:00
Nick Jenkins
14c53b728f Code housekeeping stuff (and barring any stuff-ups on my behalf, there should be no changes in behaviour whatsoever after this) -
* removing some unused global declarations.
* removing or commenting out or adding comments for unused local vars.
* Adding one or two local var declarations.
* Declaring $matches array passed to preg_match() / preg_match_all() as array() before using [not required, just have a slight preference for the explicitness].
* remove one or two pass-by-reference function declarations where the value is not modified.
* Adding some braces to if-else blocks.
* In Parser.php, the strip state is now an object rather than an array as per r17820, so we no longer need to ask for a reference to it (as in "$x =& $this->mStripState;"), and in fact it's probably just simpler to get rid of $x altogether.
* Moving some preg regexes from "" quoting to '' quoting to stop static analyzer whinging about bad escape sequences.

... up to "LinksUpdate.php" in the includes/ directory.
2006-11-23 08:25:56 +00:00
Yuri Astrakhan
7b49a7bdda Marked all functions as static 2006-10-21 08:30:48 +00:00
Tim Starling
f3ce9d418d Use absolute path in require_once, errors reported in some configurations due to odd include_path. 2006-10-03 13:06:39 +00:00
Antoine Musso
93154120cc Remove forced dereferencing (new() returns a reference in PHP5) 2006-07-11 14:11:23 +00:00
Antoine Musso
473cd5cbcc unused variables as per #3692 2006-05-01 10:53:59 +00:00
Antoine Musso
69689725c1 Switching from phpdoc to doxygen (use less than 32MB of memory).
Run maintenance/mwdocgen.php to generate doc in ./docs/html/ .
2006-04-19 15:46:24 +00:00
Brion Vibber
3bbf7dcbd2 Remove .cvsignore files 2006-04-05 08:23:27 +00:00
Brion Vibber
f2c29baf9f Update the FSF's address in all these GPL stub headers 2006-04-05 07:43:17 +00:00
Tim Starling
11f0b952f6 Replaced codepointToUtf8 calls with string literals, should save a few milliseconds according to xdebug. Ran unit test. 2006-03-05 03:03:03 +00:00
Brion Vibber
266d41f165 * Added wfDie() wrapper, and some manual die(-1), to force a nonzero return code
to the shell when we crap out with an error.
2006-01-14 02:49:43 +00:00
Ævar Arnfjörð Bjarmason
a26d5a49d7 * s~\t+$~~ 2006-01-07 13:31:29 +00:00
Ævar Arnfjörð Bjarmason
7bbe971aec * s~ +$~~ 2006-01-07 13:09:30 +00:00
Brion Vibber
af2177edfd Code cleanup: normalize case for intval(), strval(), floatval() calls. 2005-08-16 23:36:16 +00:00
Brion Vibber
f77b1cbbf3 Update files as currently generated. 2005-05-18 09:18:07 +00:00
Antoine Musso
2104f62734 fix phpdoc comment 2005-01-27 19:51:47 +00:00
Brion Vibber
9f963dfac7 notes 2004-12-03 10:41:57 +00:00
Brion Vibber
11e0f6ecff Require running from command line 2004-12-03 10:30:50 +00:00
Brion Vibber
727e4d1aab Fix composition bug: completed hangul syllable should not be merged with another following final jamo 2004-11-15 00:59:40 +00:00
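
The rule behind this fix can be shown with the standard Unicode Hangul composition arithmetic. The Python sketch below is an illustration only, not the UtfNormal PHP code: `compose_hangul` is a hypothetical helper that handles Hangul alone and ignores every other composition rule. A trailing jamo is merged only into a syllable that has no final yet.

```python
# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def compose_hangul(text):
    """Compose conjoining jamo into precomposed syllables (Hangul only)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if out:
            last = ord(out[-1])
            # Leading consonant + vowel -> LV syllable.
            l_index = last - L_BASE
            v_index = cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # LV syllable + final consonant -> LVT syllable.
            # The modulo test rejects syllables that already carry a final,
            # so a completed syllable is never merged with another final jamo.
            s_index = last - S_BASE
            t_index = cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # U+1112 + U+1161 + U+11AB composes to U+D55C; a second U+11AB stays separate.
    print([hex(ord(c)) for c in compose_hangul("\u1112\u1161\u11ab\u11ab")])
    # prints ['0xd55c', '0x11ab']
```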
Brion Vibber
deb0452649 Add a utf-8 to hex sequence function for debugging 2004-11-15 00:58:36 +00:00
Brion Vibber
66e64d98d2 Test: feeds random strings to both pure PHP and ICU code paths looking for differences. 2004-11-14 21:40:44 +00:00
Brion Vibber
c6340de5b3 Fix regression in ICU-mode UTF-8 verification: U+FFFF is forbidden 2004-11-14 21:36:43 +00:00
Brion Vibber
e4e75a58a6 Support using ICU to do most of the heavy lifting in cleanUp() if the extension is loaded.
Modestly faster for roman text (1-2x), 16-20x faster than the PHP looping for already normalized Russian, Japanese, and Korean text.
2004-11-14 05:17:29 +00:00
Brion Vibber
4a4f248655 Fix regression: surrogate half followed by extra tail bytes 2004-11-14 04:27:03 +00:00
Brion Vibber
9535fc035b Fix UTF-8 validation regression: well-formed but forbidden UTF-8 sequence followed by bogus tail bytes 2004-11-14 04:07:28 +00:00
Brion Vibber
dd69eb14f5 Fix UTF-8 validation regression where a bad head byte is followed by ascii, then bad tail byte. 2004-11-14 03:48:49 +00:00
Brion Vibber
dec06744da Ignore some Mac-related files 2004-11-14 02:25:44 +00:00
Brion Vibber
7bf6095d73 Fix UTF-8 validation bug where some cases didn't get replacement chars inserted correctly 2004-11-14 02:24:44 +00:00
Brion Vibber
b108d98286 Add a Russian test file to the benchmark (2-byte characters, using ASCII spacing and punctuation) 2004-11-11 07:05:21 +00:00
Brion Vibber
961187ba17 Tweak benchmark a bit; display times in milliseconds instead of seconds for legibility. 2004-11-07 22:01:57 +00:00
Brion Vibber
eae361e2f0 cleanUp() optimization: speed up Japanese, Korean tests by another 15% by rearranging the loop and avoiding rebuilding the string if there are no illegal characters.
Removed restrictions on U+FDD0 and friends; these do seem to be allowed by XML, though they 'recommend' you avoid them.
2004-11-07 11:28:00 +00:00
Brion Vibber
8efe66008c Don't run the control characters through the invariant test, as they are stripped by cleanUp() for XML safety. 2004-11-06 03:00:29 +00:00
Brion Vibber
7434438b98 Don't forget to actually _make_ the replacements for illegal chars. :P 2004-11-06 02:52:25 +00:00
Brion Vibber
93c098dfb7 Adding some extra tests for the cleanUp() function 2004-11-06 02:51:43 +00:00
Brion Vibber
51dd271399 Shave off a few more milliseconds from cleanUp() inner loop. 2004-11-05 09:13:02 +00:00
Brion Vibber
97f577163c Shave a few more percentage points from times on cleanUp() on unicode text by building a combined NFC-check hash. 2004-11-05 08:22:56 +00:00
Brion Vibber
0db79dbed6 More incremental optimization on cleanUp():
* when splitting ascii vs non-ascii chunks, don't split punctuation and control chars as aggressively; this benefits the Korean test data
* use output buffer and echo; it's _slightly_ faster than string concatenation.
* Separate the surrogate check from the others; many Korean letters fall in the adjacent area with the same head byte, so this gives a small speed boost on Korean text
2004-11-05 04:07:04 +00:00
Brion Vibber
874f8b48c6 cleanUp() optimization: about 1/8 speed boost on unicode-dominant text (Japanese, Korean test data) 2004-11-05 00:47:03 +00:00
Brion Vibber
9ba6a6c74a cleanUp() optimization: split the string into pure ASCII chunks and chunks which need to be checked byte by byte. Over 5x speedup for German text sample. 2004-11-05 00:26:09 +00:00
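
The chunking idea can be sketched as follows. This Python version is an analogue under stated assumptions, not the PHP implementation: `split_ascii_runs` and `clean_up` are hypothetical names, and Python's built-in UTF-8 decoder stands in for the byte-by-byte validation loop. Pure ASCII runs skip validation entirely, which is where the gain on mostly-ASCII text such as German would come from.

```python
import re

# Alternating runs of pure-ASCII bytes and bytes with the high bit set.
RUNS = re.compile(rb"[\x00-\x7f]+|[\x80-\xff]+")


def split_ascii_runs(data):
    """Split bytes into (chunk, is_ascii) pairs, preserving order."""
    return [(m.group(0), m.group(0)[0] < 0x80) for m in RUNS.finditer(data)]


def clean_up(data):
    """Decode bytes, validating only the chunks that contain non-ASCII bytes."""
    parts = []
    for chunk, is_ascii in split_ascii_runs(data):
        if is_ascii:
            # Fast path: nothing to validate in a pure-ASCII run.
            parts.append(chunk.decode("ascii"))
        else:
            # Slow path: invalid sequences become U+FFFD replacement chars.
            parts.append(chunk.decode("utf-8", errors="replace"))
    return "".join(parts)


if __name__ == "__main__":
    sample = "Grüße aus Köln".encode("utf-8")
    print(split_ascii_runs(sample))
    print(clean_up(b"ab\xffcd"))
```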
Brion Vibber
48cb181bd2 Optimization on cleanUp(): roughly 1/3 speed boost on ascii-dominant but not ascii-pure text (eg German) 2004-11-04 23:53:44 +00:00
Brion Vibber
5f530ba1f3 Optimize inner loop in cleanUp(): boosts performance on non-ASCII text by about 20%.
Also, trim the XML-illegal control characters from pure ASCII as well as non-ASCII strings.
2004-11-04 11:44:45 +00:00
Brion Vibber
1897c54f2a The pass-by-reference on the string on fastCompose() really slows things down sometimes in PHP4. Taking it out speeds up processing of Japanese text significantly. 2004-10-30 12:35:37 +00:00
Brion Vibber
286dd13042 More inlining; fastCompose() is now twice as fast on hangul chars, which cuts down the NFC() time on Korean text a fair chunk. 2004-10-30 12:06:31 +00:00
Brion Vibber
dafeb1fe3b Work through the NFC substeps with the actual data to make the substep times more meaningful 2004-10-30 10:20:19 +00:00
Brion Vibber
711899c70d Benchmark was pulling the wrong Tokyo article (shorter than the others) 2004-10-30 06:47:36 +00:00
Brion Vibber
959f097c2d Add some sub-functions back to the benchmark 2004-10-30 06:42:39 +00:00
Brion Vibber
de3549d9e9 Optimize inner loops a bit. 2004-10-30 06:02:30 +00:00
Brion Vibber
5cf94de93f Subject UtfNormal::cleanUp() to the same tests as UtfNormal::toNFC() 2004-10-30 05:24:24 +00:00
Brion Vibber
d2e152e6de Munge doc comments. Mark as its own package for docs. 2004-10-28 02:56:13 +00:00
Brion Vibber
6377e82b76 Load form C data on demand; if we are dealing in all-ASCII text we can save some memory and time by not loading it. 2004-10-09 08:08:26 +00:00
Brion Vibber
0824182956 Add support for using ICU to perform normalization, which is much much faster than the PHP code!
Still need to add support for cleanup/verification.
2004-10-07 05:59:10 +00:00
Brion Vibber
bcd1e9e844 Fetch test data for the benchmark 2004-10-07 03:40:06 +00:00
Brion Vibber
f0610d0f67 Doc comments 2004-09-27 02:59:24 +00:00
Brion Vibber
106d11a197 Add remotely fetched files to .cvsignore to reduce screen pollution 2004-09-23 07:29:25 +00:00
Brion Vibber
dd195aa594 Some more phpdoc bits 2004-09-04 09:35:01 +00:00
Antoine Musso
ba2afcd9fa Split files and classes in different packages for phpdocumentor. I probably changed some double quotes to single and used the function foo () { scheme 2004-09-03 23:00:01 +00:00
Antoine Musso
705bb88da0 Change the way comments are generated so they are compatible with phpdocumentor. Changes already existing files as well. 2004-09-03 22:52:28 +00:00
Brion Vibber
9857a47c3f Correction to the \r stripping 2004-09-03 06:44:57 +00:00
Brion Vibber
ed46bd50fe Add UtfNormal::cleanUp() function: strips XML-unsafe characters and illegal UTF-8 sequences, then normalizes to form C. 2004-09-03 05:39:30 +00:00
Brion Vibber
53e71c1702 Split the data arrays for form KC, KD to a separate include file and load it on demand.
These are less likely to be used, so save the memory and parse time...
2004-09-02 07:39:06 +00:00
Brion Vibber
a5cfdf0360 Unicode normalization routines.
See: http://www.unicode.org/reports/tr15/
2004-08-29 10:30:23 +00:00