62 commits

Author SHA1 Message Date
Arlo Breault
c44a3958a3 Don't apply French spacing in raw text elements
This also means we don't need to take special care for French spacing in
attributes, since it's no longer applied there.

Adds a test that captures this change.

Note that the test "Nowiki and french spacing" wonders whether this
escaping should be applied to nowiki content.

Bug: T255007
Change-Id: Ic8965e81882d7cf024bdced437f684064a30ac86
2021-02-16 19:26:29 -05:00
Umherirrender
8de3b7d324 Use static closures where safe to use
This is a micro-optimization of closure code to avoid binding the closure
to $this where it is not needed.

Created by I25a17fb22b6b669e817317a0f45051ae9c608208

Change-Id: I0ffc6200f6c6693d78a3151cb8cea7dce7c21653
2021-02-11 00:13:52 +00:00
Daimona Eaytoy
95e17ee645 Fix some unit tests accessing MediaWikiServices
These are mostly easy fixes. Tests were fixed when that didn't require
any change to the tested code, and moved to /integration otherwise.

MediaWikiUnitTestCase::setTemporaryHook was removed: the
caller should provide a HookContainer, at which point it would just
become a useless wrapper around HookContainer::register. (We don't
really need it to be temporary, if proper DI is used).
The method was only used in the tests touched by this commit.

Change-Id: I2aba02560c41b77eea9dd4bff0e4d1c4bb0da9a2
2020-11-12 19:13:47 +00:00
Arlo Breault
5723f1fa45 Remove figure-inline from the set of allowed tags in the Sanitizer
This was added in f6038b0 to keep Parsoid and the legacy parser in sync.
However, in T251641, we're moving away from using it in both.

Bug: T251641
Change-Id: I148bcf09e64ae443104723f94e6bbdb4ad23a8ef
2020-09-11 17:05:18 +00:00
jenkins-bot
c101873e0f Merge "Hard-deprecate Sanitizer::escapeIdReferenceList()" 2020-08-21 07:13:58 +00:00
C. Scott Ananian
e66f8e2393 Sanitizer: use RemexHtml entity table, instead of its own
Reduce code duplication by using the authoritative HTML entity list
from Remex, instead of duplicating the table inside MediaWiki.

This also extends the set of entities accepted in wikitext to nearly
match HTML5.  (HTML5 allows some entities which are not
semicolon-terminated; wikitext insists on the semicolon.)

This patch brings the core parser closer to Parsoid output, as in most
cases Parsoid already accepted the full HTML5 entity list.
(I873a6120e4bd1c69fee9da76d266e24e97a22add is a corresponding patch to
Parsoid to unify its copy of Sanitizer.)

Also deprecate Sanitizer::hackDocType() while we're updating it, since
this method should not be public.

Bug: T94603
Change-Id: Ia08bc261c3644f83109f13df04b692101b4e8ef2
2020-08-21 00:02:44 +00:00
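The semicolon rule described in e66f8e2393 can be illustrated with a short Python sketch. This is not MediaWiki's code. Python's `html.entities.html5` table stands in for the RemexHtml entity list, and the function name `decode_named_entities` is hypothetical. Only `&name;` forms with a terminating semicolon are decoded, even though the HTML5 table also lists some legacy names (such as `amp`) without one. Unknown names are left untouched, which simplifies the Sanitizer's real escaping.

```python
import re
from html.entities import html5  # HTML5 named character references, keyed like "amp;"

# Wikitext-style rule (per the commit message): a named entity must end in ";".
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_named_entities(text: str) -> str:
    """Decode semicolon-terminated named entities that appear in the HTML5 table."""

    def replace(match):
        # html5 keys include the trailing semicolon, e.g. "hellip;".
        return html5.get(match.group(1) + ";", match.group(0))

    return _NAMED_ENTITY.sub(replace, text)


print(decode_named_entities("a &amp b &amp; c &hellip; &bogus;"))
# → a &amp b & c … &bogus;
```

The unterminated `&amp` survives as literal text, while `&amp;` and `&hellip;` are resolved from the shared table.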
C. Scott Ananian
b8abd8e01e Hard-deprecate Sanitizer::escapeIdReferenceList()
Code search: https://codesearch.wmcloud.org/search/?q=escapeIdReferenceList&i=nope&files=&repos=

Followup-To: Ifce057b0c436eabec310f812394e86ee7123e7c8
Change-Id: I18f2c47ad6b4f6256d1727f24314cc3c5e13f466
2020-08-20 19:59:13 -04:00
C. Scott Ananian
0de9d89e37 Sanitizer: Truncate IDs to a reasonable length; deprecate escapeIdReferenceList
Overly-long anchors can cause OOMs later on during TOC processing, and
are needless.

The method Sanitizer::escapeIdReferenceList() is also deprecated in
this patch, since it is a way to get around the ID length limit and
appears to be unused outside the Sanitizer class.  Since the use
within Sanitizer (for ARIA attributes) appears safe, we'll just make
this private in a future release and avoid the potential that someone
will misuse this.

Bug: T251506
Change-Id: Ifce057b0c436eabec310f812394e86ee7123e7c8
2020-08-13 11:33:16 -04:00
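The ID truncation in 0de9d89e37 might look roughly like the following Python sketch. The commit states neither the limit nor whether it counts bytes or characters. Here `max_bytes` is a hypothetical caller-supplied parameter, and measuring in UTF-8 bytes is an assumption. The sketch cuts on a character boundary so that a multi-byte character is never split.

```python
def truncate_id(anchor: str, max_bytes: int) -> str:
    """Truncate an ID to at most max_bytes of UTF-8 (limit is hypothetical)."""
    encoded = anchor.encode("utf-8")
    if len(encoded) <= max_bytes:
        return anchor
    # Cutting raw bytes may leave a partial multi-byte sequence at the end;
    # errors="ignore" drops that incomplete tail instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

A byte limit keeps the output bounded regardless of script. A character-based limit would be the other obvious design.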
James D. Forrester
a52c933998 Drop Sanitizer::escapeId(), deprecated in MediaWiki 1.30
Hard deprecation was in b79c1e2, which shipped in MediaWiki 1.35.

Change-Id: I7186462c95d346f362ba0cf84b136c083d66a7d3
2020-07-29 17:08:45 -04:00
daniel
f59bf8a22f Use @internal instead of @private per policy
https://www.mediawiki.org/wiki/Stable_interface_policy mandates the use
of @internal. The semantics of @private was never properly defined.

Bug: T247862
Change-Id: I4c7c6e7b5a80e86456965521f88d1dfa7d698f84
2020-06-26 14:14:23 +02:00
Tim Starling
d459add63d Introduce wfDeprecatedMsg()
Deprecating something means to say something nasty about it, or to draw
its character into question. For example, "this function is lazy and good
for nothing". Deprecatory remarks by a developer are generally taken as a
warning that violence will soon be done against the function in question.
Other developers are thus warned to avoid associating with the deprecated
function.

However, since wfDeprecated() was introduced, it has become obvious that
the targets of deprecation are not limited to functions. Developers can
deprecate literally anything: a parameter, a return value, a file
format, Mondays, the concept of being, etc. wfDeprecated() requires
every deprecatory statement to begin with "use of", leading to some
awkward sentences. For example, one might say: "Use of your mouth to
cough without it being covered by your arm is deprecated since 2020."

So, introduce wfDeprecatedMsg(), which allows deprecation messages to be
specified in plain text, with the caller description being optionally
appended. Migrate incorrect or grammatically awkward uses of wfDeprecated()
to wfDeprecatedMsg().

Change-Id: Ib3dd2fe37677d98425d0f3692db5c9e988943ae8
2020-06-22 14:34:39 +10:00
jenkins-bot
efdcab9f0b Merge "Hard-deprecate sequential array as parameter to Sanitizer::validateAttributes" 2020-06-15 12:31:03 +00:00
C. Scott Ananian
3928fd218e Hard-deprecate sequential array as parameter to Sanitizer::validateAttributes
Code search:
https://codesearch.wmflabs.org/search/?q=validateAttributes&i=nope&files=&repos=

Bug: T255049
Depends-On: I68f122d5a3fa06b0434863cff73851a39dd10514
Depends-On: Ia6315da837f1b27794bac8bc2e96008c60ca28ae
Change-Id: Ie942e7e24dbf3256db15fe83bb5592f7a7c2fbc1
2020-06-15 12:11:30 +00:00
jenkins-bot
1bf3db2214 Merge "Use 'list of allowed attributes' in Sanitizer, instead of 'whitelist'" 2020-06-12 23:37:20 +00:00
C. Scott Ananian
8c43d75841 Hard deprecate $wgAllowImageTag configuration
The future Parsoid parser will not support this, and it appears to be unused.

It could be reimplemented as an extension tag once it is removed from core.

Code search:
https://codesearch.wmflabs.org/search/?q=allowimagetag&i=fosho&files=&repos=

Bug: T254802
Change-Id: I1b532a7a8794766f8df6fdf375a6ffd78fee94e5
2020-06-10 21:01:10 +00:00
C. Scott Ananian
86fb3b14af Use 'list of allowed attributes' in Sanitizer, instead of 'whitelist'
Bug: T254646
Change-Id: I48d1a5b318c3511fae94291d84f65e5c9cd05a27
2020-06-10 15:58:39 -04:00
C. Scott Ananian
ae594f44f0 Remove unnecessary use of black/whitelist in Sanitizer comments
Bug: T254646
Change-Id: Ie1d4ce761f02304db4a990495e687e75e6783411
2020-06-10 13:30:59 -04:00
Tim Starling
68c433bd23 Hooks::run() call site migration
Migrate all callers of Hooks::run() to use the new
HookContainer/HookRunner system.

General principles:
* Use DI if it is already used. We're not changing the way state is
  managed in this patch.
* HookContainer is always injected, not HookRunner. HookContainer
  is a service, it's a more generic interface, it is the only
  thing that provides isRegistered() which is needed in some cases,
  and a HookRunner can be efficiently constructed from it
  (confirmed by benchmark). Because HookContainer is needed
  for object construction, it is also needed by all factories.
* "Ask your friendly local base class". Big hierarchies like
  SpecialPage and ApiBase have getHookContainer() and getHookRunner()
  methods in the base class, and classes that extend that base class
  are not expected to know or care where the base class gets its
  HookContainer from.
* ProtectedHookAccessorTrait provides protected getHookContainer() and
  getHookRunner() methods, getting them from the global service
  container. The point of this is to ease migration to DI by ensuring
  that call sites ask their local friendly base class rather than
  getting a HookRunner from the service container directly.
* Private $this->hookRunner. In some smaller classes where accessor
  methods did not seem warranted, there is a private HookRunner property
  which is accessed directly. Very rarely (two cases), there is a
  protected property, for consistency with code that conventionally
  assumes protected=private, but in cases where the class might actually
  be overridden, a protected accessor is preferred over a protected
  property.
* The last resort: Hooks::runner(). Mostly for static, file-scope and
  global code. In a few cases it was used for objects with broken
  construction schemes, out of horror or laziness.

Constructors with new required arguments:
* AuthManager
* BadFileLookup
* BlockManager
* ClassicInterwikiLookup
* ContentHandlerFactory
* ContentSecurityPolicy
* DefaultOptionsManager
* DerivedPageDataUpdater
* FullSearchResultWidget
* HtmlCacheUpdater
* LanguageFactory
* LanguageNameUtils
* LinkRenderer
* LinkRendererFactory
* LocalisationCache
* MagicWordFactory
* MessageCache
* NamespaceInfo
* PageEditStash
* PageHandlerFactory
* PageUpdater
* ParserFactory
* PermissionManager
* RevisionStore
* RevisionStoreFactory
* SearchEngineConfig
* SearchEngineFactory
* SearchFormWidget
* SearchNearMatcher
* SessionBackend
* SpecialPageFactory
* UserNameUtils
* UserOptionsManager
* WatchedItemQueryService
* WatchedItemStore

Constructors with new optional arguments:
* DefaultPreferencesFactory
* Language
* LinkHolderArray
* MovePage
* Parser
* ParserCache
* PasswordReset
* Router

setHookContainer() now required after construction:
* AuthenticationProvider
* ResourceLoaderModule
* SearchEngine

Change-Id: Id442b0dbe43aba84bd5cf801d86dedc768b082c7
2020-05-30 14:23:28 +00:00
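The injection principle in 68c433bd23 (inject the HookContainer, then build a HookRunner from it) can be sketched in Python. Everything here is a hypothetical analogue, not MediaWiki's PHP API. The container and runner class names echo MediaWiki's, but `ExampleService`, the `PageSaved` hook and `on_page_saved()` are invented for illustration. In this sketch, a handler that returns `False` stops the remaining handlers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class HookContainer:
    """Generic service: maps hook names to handler callables."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., object]]] = defaultdict(list)

    def register(self, name: str, handler: Callable[..., object]) -> None:
        self._handlers[name].append(handler)

    def is_registered(self, name: str) -> bool:
        # .get() avoids creating an empty entry in the defaultdict.
        return bool(self._handlers.get(name))

    def run(self, name: str, *args: object) -> bool:
        for handler in self._handlers.get(name, []):
            if handler(*args) is False:
                return False  # in this sketch, False aborts the remaining handlers
        return True


class HookRunner:
    """Cheap typed facade built from a container: one method per hook."""

    def __init__(self, container: HookContainer) -> None:
        self._container = container

    def on_page_saved(self, title: str) -> bool:  # hypothetical hook
        return self._container.run("PageSaved", title)


class ExampleService:
    """Hypothetical service: the container is injected, the runner is derived."""

    def __init__(self, hook_container: HookContainer) -> None:
        self._hook_container = hook_container
        self._hook_runner = HookRunner(hook_container)

    def save(self, title: str) -> bool:
        return self._hook_runner.on_page_saved(title)
```

Injecting the container rather than the runner keeps `is_registered()` available to the service. This matches the commit's stated reason for always injecting HookContainer.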
C. Scott Ananian
05bc687111 Use HTML5 semantics for self-closed HTML tags in wikitext
This behavior has been deprecated, with a tracking category, since
1.28.  Time to remove the temporary parameter added to
Sanitizer::removeHTMLtags() and (finally) tweak the behavior to match
HTML5.

Bug: T134423
Change-Id: I5c725175d05854139c95a2b3d8d35ff63cb6707b
2020-05-27 11:59:18 -04:00
Reedy
af063dd794 Fix more Squiz.Scope.MethodScope.Missing
Change-Id: I44cd7ba39a898a27f0f66cf34238ab95370d2279
2020-05-18 21:02:14 +00:00
Reedy
b038d6333a Fix even more PSR12.Properties.ConstantVisibility.NotFound
Change-Id: I6d98efcfac1f1c0ab6a442e0af6d5daa6ef7801a
2020-05-16 00:28:41 +00:00
Reedy
12a3883a7b Fix SingleSpaceBeforeSingleLineComment
Change-Id: I285af438ce484af40741489797f20455726ec110
2020-05-11 00:57:11 +00:00
C. Scott Ananian
83a22b7fcd Remove codepaths which ran parser in 'untidy' mode
Disabling tidy has been deprecated since 1.33.  This cleans up the code
paths which still used untidy output.

Bug: T198214
Change-Id: I821ef3b8f59b272d983583d407b2f0794fe1e791
2020-04-13 21:34:04 +00:00
Brian Wolff
b186b20d9f Allow users to set tabindex="0" on elements
This is important for keyboard focusability, for example to ensure that
users with motor impairments can reach those elements.
This patch does not allow setting tabindex="-1" or tabindex > 0.
Allowing users to set a positive tabindex seems like a terrible idea.
I don't see any valid reason for tabindex="-1" in wikitext, so
let's not allow that for now either.

Bug: T247910
Change-Id: I5065b2deeb14bdb3682dd176b87f254ac6f2cf88
2020-03-18 01:17:30 +00:00
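The tabindex rule in b186b20d9f amounts to a one-value allowlist. Below is a minimal Python sketch that assumes attributes arrive as a plain name-to-value dict. `filter_tabindex` is a hypothetical helper. The commit does not say whether the real Sanitizer trims whitespace around the value before comparing, so this sketch compares the raw string.

```python
def filter_tabindex(attrs: dict) -> dict:
    """Keep tabindex only when its value is exactly "0"; drop -1 and positive values."""
    result = dict(attrs)  # don't mutate the caller's dict
    if "tabindex" in result and result["tabindex"] != "0":
        del result["tabindex"]
    return result
```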
Brian Wolff
0bdce21381 Make id attributes not include ascii whitespace per spec
HTML5 says id attributes should not have whitespace, where
whitespace is defined as LF, CR, FF, TAB or SPACE (oddly enough
VT does not count). Firefox in my testing actually was fine with
these except CR. Nonetheless we should follow the spec, so this converts
these whitespace characters to _. I don't think this will
cause any back-compat issues, since it's very hard to make these
characters in wikitext (other than space which was already
being converted) and basically requires either Lua or html entities
to make these (with FF seeming to be impossible).

Bug: T238385
Depends-On: Ie6fa40798f06a358f6082110b4d8cc0028c80321
Change-Id: Ie2b7c9429691e2c491c3359d5b400d8f078aa789
2020-02-25 05:27:33 -08:00
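The whitespace conversion in 0de9d89e37 can be sketched in Python as a character translation. This shows only this one step of ID escaping, not the rest of the Sanitizer's ID handling, and `replace_id_whitespace` is a hypothetical name.

```python
# ASCII whitespace as HTML5 defines it: TAB, LF, FF, CR, SPACE (VT is not included).
_ASCII_WHITESPACE = "\t\n\x0c\r "
_TO_UNDERSCORE = str.maketrans({ch: "_" for ch in _ASCII_WHITESPACE})


def replace_id_whitespace(value: str) -> str:
    """Replace each HTML5 ASCII whitespace character in an id with "_"."""
    return value.translate(_TO_UNDERSCORE)


print(repr(replace_id_whitespace("a b\tc\x0bd")))  # → 'a_b_c\x0bd'
```

As the commit message notes, vertical tab (`\x0b`) passes through unchanged because HTML5 does not count it as whitespace.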
Brian Wolff
28d44262aa Escape % sign if it forms a valid percent-encoding in fragment identifiers
Currently, if a headline combines a valid percent-encoding with an
unescaped character that is reserved in URLs, the TOC link does not work.
E.g. ==`%41== needs #`%2541, but we currently generate #`%41, which
matches ==`A== instead.

Tested in Firefox and Chrome.

Bug: T238385
Change-Id: Ice2bbf79bed612d488ed6feb7510035e9dfb33af
2020-02-15 02:54:32 -08:00
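The fix in 28d44262aa can be sketched in Python. The idea is to escape a `%` only when it is followed by two hex digits, because the browser would otherwise decode that sequence. This covers only the `%` step, not the rest of fragment encoding, and `escape_percent_in_fragment` is a hypothetical name.

```python
import re

# A "%" followed by two hex digits would be decoded as a percent-escape,
# so escape the "%" itself to "%25"; a lone "%" is left alone.
_PERCENT_ESCAPE = re.compile(r"%(?=[0-9A-Fa-f]{2})")


def escape_percent_in_fragment(fragment: str) -> str:
    return _PERCENT_ESCAPE.sub("%25", fragment)


print(escape_percent_in_fragment("`%41"))  # → `%2541
```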
C. Scott Ananian
0437877656 Whitelist aria-hidden attribute in Sanitizer
Bug: T204618
Change-Id: I34b9b729eccd7658d5165b6661e5fd45a733b36c
2020-01-28 21:54:16 +00:00
jenkins-bot
015d3d7a9b Merge "Hard-deprecate Sanitizer::escapeId()" 2020-01-26 22:22:29 +00:00
C. Scott Ananian
b79c1e22ad Hard-deprecate Sanitizer::escapeId()
Deprecated in MW 1.30; time to clean up any remaining uses.

Code search:
https://codesearch.wmflabs.org/deployed/?q=escapeId%5C%28&i=nope&files=&repos=

Depends-On: Ic03a5da2e1d6b8f5656555420dd573a1d698b9cc
Depends-On: I311f44a5035f73c0fb2289f727eb39b73007429b
Depends-On: I76c5b539bae5572c4ac65f28fec9c0c36381348c
Depends-On: Id4cbfc3b113b1b04f949d485187e89ffe0b487f5
Depends-On: I7d5ba4930688ed7f011a4babed5986b8e40910a0
Depends-On: I964f83ce88fb9c66a7c59037c6066f4567bcf4c9
Change-Id: I89504cfdf8e02831d54a26900bfdc63a33b4eade
2020-01-26 22:05:45 +00:00
C. Scott Ananian
2d4aced658 Remove Sanitizer::attributeWhitelist()/setupAttributeWhitelist()
These methods were deprecated in 1.34 and should never have been public
in the first place.  New private methods have replaced them.

Code search:
https://codesearch.wmflabs.org/deployed/?q=attributeWhitelist%5C%28&i=nope&files=&repos=

Change-Id: I363530b7edaced77f2c5b06721b1930d85e2e9dc
2020-01-25 13:06:19 -05:00
James D. Forrester
0958a0bce4 Coding style: Auto-fix MediaWiki.Usage.IsNull.IsNull
Change-Id: I90cfe8366c0245c9c67e598d17800684897a4e27
2020-01-10 14:17:13 -08:00
Tim Starling
164a3ac1f0 Remove IE 6 security features from server-side code
* Deprecate WebRequest::checkUrlExtension() and have it always return
  true. This reverts the security fixes made for T30235.
* Remove IEUrlExtension. This is a helper for checkUrlExtension() which
  is not used in any extensions.
* Remove CSS sanitization code which is specific to IE6. This reverts
  the changes made to fix T57332, and related followups. I confirmed
  that the relevant test cases do not result in XSS on IE8.
* Remove related tests.

Bug: T232563
Change-Id: I7318ea4a63210252ebc64968691d4f62d79a63e9
2019-11-28 15:11:56 +11:00
C. Scott Ananian
b14c1e2bea Improve efficiency of french-spacing regexp
Improvement pointed out by Od1n (thanks!).

Bug: T197902
Change-Id: I4c560539873b2c50f8658df89263e927efc9ce10
2019-10-25 09:20:00 -04:00
Max Semenik
8a98dd9d59 Convert some private static arrays to constants
Remove @since for some private ones as we don't guarantee anything
about private class members.

Change-Id: Ifb898353c02082e9ef69d67f69339345c6cd154d
2019-10-16 01:30:54 +00:00
sbassett
dcdbd13d97 Set @return-taint of Sanitizer::stripAllTags to tainted
phan-taint-check (aka SecurityCheckPlugin) doesn't recognize
Sanitizer::stripAllTags' output as tainted in certain situations.
Adding a @return-taint of tainted to ensure that it does, which
may result in the reporting of more issues.

Bug: T230234
Change-Id: I357c168417a26882c7c460df20f36ec2be401096
2019-08-13 17:07:27 -05:00
C. Scott Ananian
bda42cef3c Deprecate Sanitizer::setupAttributeWhitelist/attributeWhitelist
These methods should be made private in the next release, but
hard-deprecate them for 1.34.

Tweak the return value of the attribute whitelist to be an
associative rather than a sequential array, which makes the
lookup of allowed attributes more efficient and avoids an
array_flip for every html element sanitized.

Bug: T221677
Change-Id: I17d734937accec6c2679dbe17328cf9554bd556a
2019-06-20 14:42:20 -04:00
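The data-structure change in bda42cef3c corresponds to the PHP difference between scanning a list and checking a key in an associative array. The following is a rough Python analogue. The allowlist contents are hypothetical and heavily abbreviated (the real list is much larger), and `filter_attributes` is an invented name.

```python
# Sequential form: per-element attribute names in a list.
ALLOWED_SEQUENTIAL = {
    "a": ["href", "title"],
    "span": ["class", "style", "title"],
}

# Associative form, built once: name -> True. Membership tests are hash
# lookups, so nothing has to be flipped for every sanitized element.
ALLOWED_ASSOCIATIVE = {
    tag: dict.fromkeys(names, True) for tag, names in ALLOWED_SEQUENTIAL.items()
}


def filter_attributes(tag: str, attrs: dict) -> dict:
    """Drop attributes not allowed on this element (unknown elements allow none)."""
    allowed = ALLOWED_ASSOCIATIVE.get(tag, {})
    return {name: value for name, value in attrs.items() if name in allowed}
```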
Fomafix
110a5877e9 Use [...] instead of array(...) in PHP comments and documentation
Change-Id: I0c83783051bf35fe785bc01644eeb2946902b6b2
2019-06-17 21:15:09 +02:00
Max Semenik
214b37ff07 SECURITY: blacklist CSS var()
Bug: T208881
Change-Id: I9a4ced2bc47eb5f96cf35e693bf5261c48acb126
2019-06-06 16:15:55 +00:00
C. Scott Ananian
f6038b0c81 Allow <figure-inline> attributes through Sanitizer
Parsoid uses <figure-inline> for inline figures.  The intention is to
transition core to use <figure> and <figure-inline> as well in the
future (T118517).  As a first step (and to keep Parsoid and the legacy
parser in sync) allow <figure-inline> attributes in the Sanitizer.

Note that this does not allow <figure-inline> in wikitext,
since neither <figure> nor <figure-inline> is on the
getRecognizedTagData() list.

Bug: T51097
Bug: T118517
Bug: T118520
Change-Id: I5248717739bef0f7106c2bcf0b4a15acbc3c9a68
2019-04-22 13:04:49 -04:00
C. Scott Ananian
cfc60acc56 Synchronize allowed attributes for <audio> with Parsoid/TimedMediaHandler
We synchronized the allowed attributes for <video> in
4e7483ffd3 but then decided to use the
<audio> tag for audio media in Parsoid commit
5f3dbdc8794f2605101609f28e679df29a0387bc and updated its Sanitizer,
but never updated core to match.

Bug: T163583
Bug: T133673
Change-Id: Iefcbead2f335949eb45e2880861fd9473b810367
2019-04-22 13:04:49 -04:00
Reedy
c13fee87d4 Collapse some nested if statements
Change-Id: I9a97325d738d09370d29d35d5254bc0dadc57ff4
2019-04-04 19:02:22 +00:00
Max Semenik
adf90edb33 Sanitizer: remove deprecated parameter to escapeIdReferenceList()
Change-Id: Iacd5796718c1d64e7290cfd9669c99d8f9e85dc5
2019-02-21 20:12:22 -08:00
jenkins-bot
c984a1f2f8 Merge "Quoted attributes don't need to be followed by a space" 2018-11-27 16:21:41 +00:00
Arlo Breault
59bb8864a2 Quoted attributes don't need to be followed by a space
Further, this splits up attribute parsing from filtering.

Change-Id: Ib4e0a808a6ca2ba032873e885837233e2f2feefe
2018-11-09 16:00:18 -05:00
C. Scott Ananian
54ac31f94d Hard deprecate codepaths where tidy is disabled
Future parsers will not support the output generated with tidy disabled.

Parser tests using untidied output will also be deprecated (and
rewritten) in a follow-up patch.

No new release notes necessary since user-visible tidy configuration
was deprecated previously (in 1.32), and individual methods which had
disabled tidy during execution were individually release-noted as they
were updated.

Bug: T198214
Depends-On: I0f417f75a49dfea873e9a2f44d81796a48b9f428
Depends-On: If5c619cdd3e7f786687cfc2ca166074d9197ca11
Change-Id: I592e0e0dfef7d929f05c60ffe4d60e09725b39cc
2018-11-05 18:49:16 +00:00
Erik Bernhardson
0d779c1ac6 Preserve whitespace in search index text content
Certain html tags imply a word break, but our html stripping doesn't
understand that at all. Adjust the html stripping to inject whitespace
for all block level tags (per MDN) along with the <br> element.

Bug: T195389
Change-Id: I9fbfac765ea88628e4f9b2794fb54e1cd0060203
2018-09-14 11:10:35 -07:00
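The whitespace injection in 0d779c1ac6 can be sketched in Python. Here Python's `html.parser` stands in for MediaWiki's own stripping code. `BLOCK_TAGS` is an abbreviated, illustrative subset of MDN's block-level list, not the list the patch uses, and the function name is hypothetical.

```python
from html.parser import HTMLParser

# Abbreviated, illustrative subset of MDN's block-level element list.
BLOCK_TAGS = {
    "address", "blockquote", "div", "dl", "dd", "dt", "h1", "h2", "h3",
    "h4", "h5", "h6", "li", "ol", "p", "pre", "table", "ul",
}


class _TextExtractor(HTMLParser):
    """Collects text, emitting a space at block boundaries and at <br>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS or tag == "br":
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags_preserving_breaks(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

Inline tags such as `<b>` add nothing, so `<b>foo</b>bar` still yields `foobar`. In contrast, `<p>foo</p><p>bar</p>` now yields two separate words.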
Aryeh Gregor
90d4f56fe4 Mass conversion of $wgContLang to service
Brought to you by vim macros.

Bug: T200246
Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a
2018-08-11 22:44:29 -06:00
jenkins-bot
1360a2884a Merge "Don't armor french spaces before punctuation followed by word characters" 2018-07-13 17:22:34 +00:00
Umherirrender
130ec2523d Fix PhanTypeMismatchDeclaredParam
Auto fix MediaWiki.Commenting.FunctionComment.DefaultNullTypeParam sniff

Change-Id: I865323fd0295aabd06f3e3c75e0e5043fb31069e
2018-07-07 00:34:30 +00:00
C. Scott Ananian
be266087b4 Don't armor french spaces before punctuation followed by word characters
This makes Sanitizer::armorFrenchSpaces() more selective about where it
inserts &nbsp;, avoiding the need to protect common "not actually French"
cases like `color: red !important` and `foo :bar`.

We also added the single-guillemet to the rules, to accommodate Swiss French.

Bug: T197902
Change-Id: I42e747f17c17c1513fec96cdd2d3285da7da05a4
2018-06-26 12:51:48 -04:00
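The selective armoring described in be266087b4 can be sketched in Python. The character class and both regular expressions are assumptions reconstructed from the commit message, not copied from `Sanitizer::armorFrenchSpaces()`. The function name mirrors that method only for readability, and `&#160;` is used as the non-breaking space.

```python
import re

NBSP = "&#160;"

# A space preceded by a non-space and followed by closing punctuation that is
# NOT itself followed by a word character ("Quoi ?" but not "red !important").
_SPACE_BEFORE_PUNCT = re.compile(r"(?<=\S) (?=[?:;!»›](?!\w))")
# A space right after an opening guillemet, double or single.
_SPACE_AFTER_GUILLEMET = re.compile(r"(?<=[«‹]) ")


def armor_french_spaces(text: str, space: str = NBSP) -> str:
    # A callable replacement avoids re's backslash processing of `space`.
    text = _SPACE_BEFORE_PUNCT.sub(lambda _m: space, text)
    return _SPACE_AFTER_GUILLEMET.sub(lambda _m: space, text)
```

With this sketch, `color: red !important` and `foo :bar` pass through unchanged, which are the "not actually French" cases the commit mentions.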