moveToExternal: Also check for utf8 encoding before trying to convert

While most rows in production use 'utf-8' to flag content as UTF-8,
we also have lots of rows flagged with 'utf8', e.g. on dawiki:
mysql:research@s3-analytics-replica.eqiad.wmnet [dawiki]> select old_flags, count(*) from text group by old_flags limit 50;
+---------------------+----------+
| old_flags           | count(*) |
+---------------------+----------+
| error               |        2 |
| external,gzip       |       49 |
| external,object     |       36 |
| external,utf-8      |  1614469 |
| external,utf8       |   336780 |
| gzip,utf-8,external |     1094 |
| utf-8,gzip,external |  9458083 |
+---------------------+----------+
7 rows in set (26.038 sec)

The script only checked for 'utf-8', so it treated these rows as
legacy-encoded and tried to re-encode text that was already UTF-8,
which could lead to double-encoded text and other errors.
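A minimal PHP sketch of the corrected check, run against flag combinations from the query above. It assumes `old_flags` is split on commas; `needsLegacyConversion()` and the 'windows-1252' legacy encoding are hypothetical and only for illustration:

```php
<?php
// Hypothetical helper mirroring the condition in resolveLegacyEncoding():
// text is re-encoded only when a legacy encoding is configured AND the row
// is not already flagged as UTF-8 under either spelling.
function needsLegacyConversion( ?string $legacyEncoding, string $oldFlags ): bool {
	// Assumption: old_flags is a comma-separated list, e.g. "utf-8,gzip,external".
	$flags = $oldFlags === '' ? [] : explode( ',', $oldFlags );
	return $legacyEncoding !== null
		&& !in_array( 'utf-8', $flags, true )
		&& !in_array( 'utf8', $flags, true );
}

// Flag combinations taken from the dawiki query; 'windows-1252' is an assumed legacy encoding.
$samples = [ 'external,utf-8', 'external,utf8', 'utf-8,gzip,external', 'external,gzip' ];
foreach ( $samples as $oldFlags ) {
	printf( "%-20s %s\n", $oldFlags,
		needsLegacyConversion( 'windows-1252', $oldFlags ) ? 'convert' : 'skip' );
}
```

With the 'utf8' check in place, only 'external,gzip' among these samples would be passed to iconv.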

Change-Id: I9b4a38538199c9954cfed51cdd2bba8b0f6cb953
Amir Sarabadani 2023-06-08 14:20:51 +02:00
parent 15f076efca
commit 4dd3850beb

@@ -232,6 +232,7 @@ class MoveToExternal extends Maintenance {
 	private function resolveLegacyEncoding( $text, $flags ) {
 		if ( $this->legacyEncoding !== null
 			&& !in_array( 'utf-8', $flags )
+			&& !in_array( 'utf8', $flags )
 		) {
 			AtEase::suppressWarnings();
 			$text = iconv( $this->legacyEncoding, 'UTF-8//IGNORE', $text );