Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
<?php
|
|
|
|
|
|
/**
|
|
|
|
|
|
* @group BagOStuff
|
|
|
|
|
|
*/
|
|
|
|
|
|
class MemcachedBagOStuffTest extends MediaWikiTestCase {
|
|
|
|
|
|
/** @var MemcachedBagOStuff */
|
|
|
|
|
|
private $cache;
|
|
|
|
|
|
|
|
|
|
|
|
protected function setUp() {
|
|
|
|
|
|
parent::setUp();
|
2016-02-17 09:09:32 +00:00
|
|
|
|
$this->cache = new MemcachedBagOStuff( [ 'keyspace' => 'test' ] );
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
2015-10-28 15:26:37 +00:00
|
|
|
|
* @covers MemcachedBagOStuff::makeKey
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
*/
|
|
|
|
|
|
public function testKeyNormalization() {
|
|
|
|
|
|
$this->assertEquals(
|
2015-10-24 04:03:11 +00:00
|
|
|
|
'test:vanilla',
|
|
|
|
|
|
$this->cache->makeKey( 'vanilla' )
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
2015-10-24 04:03:11 +00:00
|
|
|
|
'test:punctuation_marks_are_ok:!@$^&*()',
|
|
|
|
|
|
$this->cache->makeKey( 'punctuation_marks_are_ok', '!@$^&*()' )
|
2015-10-24 00:53:25 +00:00
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
2015-10-24 04:03:11 +00:00
|
|
|
|
'test:but_spaces:hashes%23:and%0Anewlines:are_not',
|
|
|
|
|
|
$this->cache->makeKey( 'but spaces', 'hashes#', "and\nnewlines", 'are_not' )
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
2015-10-24 04:03:11 +00:00
|
|
|
|
'test:this:key:contains:%F0%9D%95%9E%F0%9D%95%A6%F0%9D%95%9D%F0%9D%95%A5%F0%9' .
|
|
|
|
|
|
'D%95%9A%F0%9D%95%93%F0%9D%95%AA%F0%9D%95%A5%F0%9D%95%96:characters',
|
|
|
|
|
|
$this->cache->makeKey( 'this', 'key', 'contains', '𝕞𝕦𝕝𝕥𝕚𝕓𝕪𝕥𝕖', 'characters' )
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
2015-10-24 04:03:11 +00:00
|
|
|
|
'test:this:key:contains:#c118f92685a635cb843039de50014c9c',
|
|
|
|
|
|
$this->cache->makeKey( 'this', 'key', 'contains', '𝕥𝕠𝕠 𝕞𝕒𝕟𝕪 𝕞𝕦𝕝𝕥𝕚𝕓𝕪𝕥𝕖 𝕔𝕙𝕒𝕣𝕒𝕔𝕥𝕖𝕣𝕤' )
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
2017-11-21 00:04:38 +00:00
|
|
|
|
'test:BagOStuff-long-key:##dc89dcb43b28614da27660240af478b5',
|
2015-10-24 04:03:11 +00:00
|
|
|
|
$this->cache->makeKey( '𝕖𝕧𝕖𝕟', '𝕚𝕗', '𝕨𝕖', '𝕄𝔻𝟝', '𝕖𝕒𝕔𝕙',
|
|
|
|
|
|
'𝕒𝕣𝕘𝕦𝕞𝕖𝕟𝕥', '𝕥𝕙𝕚𝕤', '𝕜𝕖𝕪', '𝕨𝕠𝕦𝕝𝕕', '𝕤𝕥𝕚𝕝𝕝', '𝕓𝕖', '𝕥𝕠𝕠', '𝕝𝕠𝕟𝕘' )
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
2015-10-24 04:03:11 +00:00
|
|
|
|
'test:%23%235820ad1d105aa4dc698585c39df73e19',
|
|
|
|
|
|
$this->cache->makeKey( '##5820ad1d105aa4dc698585c39df73e19' )
|
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
|
|
|
|
|
'test:percent_is_escaped:!@$%25^&*()',
|
|
|
|
|
|
$this->cache->makeKey( 'percent_is_escaped', '!@$%^&*()' )
|
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
|
|
|
|
|
'test:colon_is_escaped:!@$%3A^&*()',
|
|
|
|
|
|
$this->cache->makeKey( 'colon_is_escaped', '!@$:^&*()' )
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
$this->assertEquals(
|
2015-10-24 04:03:11 +00:00
|
|
|
|
'test:long_key_part_hashed:#0244f7b1811d982dd932dd7de01465ac',
|
|
|
|
|
|
$this->cache->makeKey( 'long_key_part_hashed', str_repeat( 'y', 500 ) )
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
);
|
|
|
|
|
|
}
|
2015-10-28 15:26:37 +00:00
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
|
* @dataProvider validKeyProvider
|
2017-12-25 07:26:26 +00:00
|
|
|
|
* @covers MemcachedBagOStuff::validateKeyEncoding
|
2015-10-28 15:26:37 +00:00
|
|
|
|
*/
|
|
|
|
|
|
public function testValidateKeyEncoding( $key ) {
|
|
|
|
|
|
$this->assertSame( $key, $this->cache->validateKeyEncoding( $key ) );
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
public function validKeyProvider() {
|
2016-02-17 09:09:32 +00:00
|
|
|
|
return [
|
|
|
|
|
|
'empty' => [ '' ],
|
|
|
|
|
|
'digits' => [ '09' ],
|
|
|
|
|
|
'letters' => [ 'AZaz' ],
|
|
|
|
|
|
'ASCII special characters' => [ '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~' ],
|
|
|
|
|
|
];
|
2015-10-28 15:26:37 +00:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
|
* @dataProvider invalidKeyProvider
|
2017-12-25 07:26:26 +00:00
|
|
|
|
* @covers MemcachedBagOStuff::validateKeyEncoding
|
2015-10-28 15:26:37 +00:00
|
|
|
|
*/
|
|
|
|
|
|
public function testValidateKeyEncodingThrowsException( $key ) {
|
2018-01-13 00:02:09 +00:00
|
|
|
|
$this->setExpectedException( Exception::class );
|
2015-10-28 15:26:37 +00:00
|
|
|
|
$this->cache->validateKeyEncoding( $key );
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
public function invalidKeyProvider() {
|
2016-02-17 09:09:32 +00:00
|
|
|
|
return [
|
|
|
|
|
|
[ "\x00" ],
|
|
|
|
|
|
[ ' ' ],
|
|
|
|
|
|
[ "\x1F" ],
|
|
|
|
|
|
[ "\x7F" ],
|
|
|
|
|
|
[ "\x80" ],
|
|
|
|
|
|
[ "\xFF" ],
|
|
|
|
|
|
];
|
2015-10-28 15:26:37 +00:00
|
|
|
|
}
|
Improve normalization and sanitization of memcached keys
The motivation for this patch came from seeing the following error in the
memcached log:
memcached ERROR: Memcached error for key "flowdb:flow_ref:wiki:by-source:\
v3:0:tawikinews:பூமிக்கு_அச்சுறுத்தலான_சிறுகோள்களைக்_கண்டுபிடிக்கும்_முயற்சியில்_தனியார்_நிறுவன0jம்:4.7" \
on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE
I submitted a fix to Flow in I26e531f6, but I noticed that AbuseFilter had a
similar issue (fixed by Aaron in I27b51a4b), so I started thinking about how
to solve this more generally:
* The regular expression we current use to sanitize keys does not cover
characters outside the ASCII range, but such characters can be illegal
if one of their constituent bytes (when taken by itself) is an ASCII
control character. So change the regular expression to cover any and all
characters that fall outside the range \x22-\x7e (and '#' -- see below).
* Enforce a key length limit of 255 bytes, which is the maximum length
permitted by the memcached protocol. The Tamil segment in the key above is 84
characters, but 233 bytes in UTF-8, which become 684 characters when
URL-encoded. To fix this, try to shrink any segment that would push the total
key length over the limit by md5()ing it. If the end result is *still* over
the limit (this would happen if, for example, $args consists of many short
strings), then concatenate all args together and MD5 them.
* MD5'd arguments are prefixed with '#'. Any "organic" '#'s in the key segments
are URL-encoded.
Change-Id: Ia46987d3b0a09bb6b1952abd936d4c72ea7c56a0
2015-10-23 18:10:10 +00:00
|
|
|
|
}
|