1423 commits

Author SHA1 Message Date
Amir Sarabadani
d8e542abf9 Reorg: Move three output related classes to includes/Output/
And namespace them:
 - StreamFile
 - OutputHandler
 - OutputPage

Bug: T321882
Change-Id: Iedf8d88c595e580f2d8f0734c92aa5c45618ba33
2023-09-05 19:36:42 +01:00
Derick Alangi
8abcc65747 parser: Remove b/c alias Parser::OT_MSG flag
In https://static-codereview.wikimedia.org/MediaWiki/29945.html,
this flag/constant was aliased for backward-compatibility. That was
a very long time ago (2008), and there is no remaining usage.

The constant is no longer used anywhere, just a comment in mw-config
which I'll remove in another patch.
See: https://codesearch.wmcloud.org/search/?q=%28Parser%7Cself%29%3A%3AOT_MSG&files=&excludeFiles=&repos=

Change-Id: I8f065af61bd497a5174c553d754c5484c8639dff
2023-09-05 12:19:08 +00:00
jenkins-bot
36b8da0c30 Merge "parser: Remove references to preprocessorFuzzTest.php script" 2023-09-05 01:07:59 +00:00
Derick Alangi
452633ecf3 parser: Use ServiceOptions already injected in Parser::__construct()
We already have the ServiceOptions injected into the constructor, let's
make sure we have all the settings added to the constant and use that
instead of accessing the full config via global service container.

Except for static methods, other cases can be easily replaced as done
in this patch.

Change-Id: Id173775ea48c302cdfa698db33a0b75d6da76652
2023-09-04 23:40:59 +01:00
Derick Alangi
538d0b93b9 parser: Remove references to preprocessorFuzzTest.php script
This script was deleted in Ib845f1bc2cd5c452a998b01612d45fe59e8ffc37
and it seems some references to it were forgotten.

@note: The methods deleted along-side are private so no need for them
to go through the deprecation process.

Change-Id: Ib4df56e225b44f5a69be1b635d40537863a971e3
2023-09-04 22:49:05 +01:00
Amir Sarabadani
15a278189f Reorg: Move MWTimestamp to MediaWiki\Utils
Bug: T321882
Change-Id: I48c10343295c4eb3d9ef8037343b0070e928f040
2023-08-19 05:53:40 +02:00
Subramanya Sastry
83ea46ff65 Reconcile Parsoid opts in ParsoidOutputAccess & ParserOutputAccess
* Explicitly set wrapSections to true. This has no significant
  impact since it defaults to true within Parsoid.
* 'pageName' and 'prefix' removed from ParsoidOutputAccess since
  they are not needed / used in Parsoid.
* 'logLinterData' needs to be set in the ParserOutputAccess paths.
* A bunch of documentation FIXMEs as I was digging through the code.
* Record a FIXME that ParsoidOutputAccess and ParsoidParser (which
  is used in the ParserOutputAccess use page) differ in how they
  handle the language value (whether the default value of the title /
  page or the pageLanguageOverride from the REST API). ParsoidParser
  computes a preferred variant whereas ParsoidOutputAccess right now
  does NOT do that. So, as part of the switchover to ParserOutputAccess,
  we will need to set disableContentConversion in ParserOptions.

  That will happen in a later patch.

Bug: T332931
Change-Id: I7326ae3452a7d496a57f5c4ff2ddeaf0daa7ab70
2023-08-10 23:40:26 +00:00
Tim Starling
6790bf9910 Remove $wgLang usage from Title
StubUserLang was meant to avoid the cost of looking up the user
preferences on requests which don't need it. There's no point in using
it if you are going to unconditionally call a method on the resulting
object.

StubUserLang proxies to RequestContext::getLanguage() via __call(),
which has a cost. Originally this cost was avoided on subsequent calls
by overwriting $wgLang, but this mechanism is not effective if you retain
a reference to the StubUserLang.

Removing the potential for Title::getPageLanguage() to return
StubUserLang simplifies the type declarations for methods that call it.

Bug: T160814
Change-Id: I12ad75c2496ca727580aac55e860178d15febb6e
2023-07-11 11:15:02 +10:00
Arlo Breault
50401b2c7e SECURITY: Move badFile lookup to Linker
CVE-2023-36674

Bug: T335612
Change-Id: I849d02f1d3dc9995353b7a9995601d214053dca3
2023-06-30 15:46:54 +00:00
Daimona Eaytoy
518a5da533 Replace deprecated MWException
Bug: T328220
Change-Id: I0408575ee71e58d1c9e9ebedabab35bd3813f515
2023-06-12 12:27:49 +00:00
jenkins-bot
8c369d467a Merge "Add a page property for __TOC__" 2023-05-20 12:49:41 +00:00
Arlo Breault
8a286dbceb Support multilingual SVGs in page language in galleries
Follow-up of I348840ef405e1370cc0c17d69051bce30153c9c0 for the gallery part.

Bug: T205040
Bug: T310453
Change-Id: Ia0c699675d40f6effbe359818aca3278c56042e3
2023-05-15 07:55:06 +00:00
Umherirrender
e04d3a28f6 Replace internal Hooks::runner
The Hooks class contains deprecated functions and the whole class is
going to get removed, so remove the convenience function and inline the
code.

Bug: T335536
Change-Id: I8ef3468a64a0199996f26ef293543fcacdf2797f
2023-05-11 06:17:38 +00:00
C. Scott Ananian
52d1259b1b Parser: Simplify showTOC/suppressTOC logic
Just a bit of cleanup to simplify the logic around showing/suppressing
the TOC.

Change-Id: I99f1f29bf067df2ea3f9f235af7ce054d7e4af68
Followup-To: Ib41e6e4926cb752826ad75d10e8692125fc0b064
2023-04-28 11:23:58 -04:00
C. Scott Ananian
0448851e92 Add ParserOutputFlags::NO_TOC
Rather than suppress the TOCData in ParserOutput when __NOTOC__ is used,
set a new parser output flag, NO_TOC, since some clients want to know
whether there are sections present on the page irrespective of whether
the UX for the Table Of Contents should be displayed/suppressed.

Added OutputPage::getOutputFlag() as an @internal method for the
moment; eventually we should use the same object to represent
metadata in ParserOutput and OutputPage (T301020).

Bug: T332243
Followup-To: Ife2126ace95ac4d9ec44f6374c63d8fc995cf034
Followup-To: Iea6426336f93c053a5977768f0785cdb46daf5bf
Change-Id: Ib41e6e4926cb752826ad75d10e8692125fc0b064
2023-04-28 10:56:57 -04:00
Bartosz Dziewoński
6ba47296d9 Fix Phan suppressions related to Title::castFrom*() and friends
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.

Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.

Do the same with the corresponding TitleFactory methods.

Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).

Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
2023-04-22 16:45:09 +02:00
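
The distinction drawn in 6ba47296d9 between the null-coalescing castFrom*() methods and the strict newFrom*() methods can be illustrated with a minimal Python sketch. The classes and function names below are hypothetical stand-ins chosen for illustration, not the MediaWiki API; the point is only that the strict variant has a non-optional return type that a type checker can rely on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageIdentity:
    """Hypothetical stand-in for a page identity value object."""
    namespace: int
    dbkey: str


@dataclass(frozen=True)
class Title:
    """Hypothetical stand-in for Title; not the MediaWiki class."""
    namespace: int
    dbkey: str


def cast_from_page_identity(page: Optional[PageIdentity]) -> Optional[Title]:
    """Null-coalescing variant: None in, None out."""
    if page is None:
        return None
    return Title(page.namespace, page.dbkey)


def new_from_page_identity(page: PageIdentity) -> Title:
    """Strict variant: the argument is never None, so neither is the result."""
    return Title(page.namespace, page.dbkey)
```

With the strict variant, callers that already hold a non-null value need no suppression or extra type check before calling a method on the result.
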
jenkins-bot
b1979fef97 Merge "Revert "Display SVGs in page view language for language variants"" 2023-04-17 13:27:36 +00:00
Func
eb065bb6e1 Revert "Display SVGs in page view language for language variants"
This reverts commit 42aa5f9481.

Reason for revert: Caused T334753, the proposed fix may need more time for review. Let's revert for now, before the train cut.

Bug: T310453
Bug: T334753
Change-Id: I790604eef00491b7f2a921fb3423a2f727f6593b
2023-04-17 11:53:37 +00:00
jenkins-bot
97d5377ea3 Merge "Display SVGs in page view language for language variants" 2023-04-13 21:11:10 +00:00
Aaron Schulz
366a0afd63 parser: improve cache TTL accuracy for CURRENT*/LOCAL* magic words
Consolidate cache TTL handling within CoreMagicVariables.

Make the TTL account for how many seconds away the value is from changing.
For example, CURRENTHOUR should change soon after the next hour is reached.
There is a minimum adjustment TTL to avoid parser-after-save delays.

This allows for longer caching in most cases, as well as more up-to-date
rendering when the hour/day/week/year is about to change. Previously, there
were blind TTLs, which are either way too pessimistic or way too generous.

This commit does not change the CURRENTTIME, CURRENTTIMESTAMP, LOCALTIME,
and LOCALTIMESTAMP words, since there is no reasonable way to cache output
while keeping them up-to-date.

Bug: T320668
Change-Id: I9acb42b0d9ff67798a1624cbf9c7cac99c8fbe2f
2023-03-28 22:35:17 +00:00
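
The TTL rule described in 366a0afd63 (cache until the value is about to change, but never below a minimum) can be sketched roughly as follows for a CURRENTHOUR-like value. This is an illustrative Python sketch, not the CoreMagicVariables code; the function names and the 15-second floor are assumptions, since the commit message only says that a minimum exists.

```python
from datetime import datetime, timedelta, timezone

# Assumed floor; the commit only states that a minimum adjustment TTL exists.
MIN_TTL_SECONDS = 15


def seconds_until_next_hour(now: datetime) -> int:
    """Whole seconds from `now` until the hour value changes."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return int((next_hour - now).total_seconds())


def ttl_for_current_hour(now: datetime, min_ttl: int = MIN_TTL_SECONDS) -> int:
    """Cache until the hour rolls over, but never for less than min_ttl."""
    return max(min_ttl, seconds_until_next_hour(now))


# 22:35:17 is 24 min 43 s before 23:00:00, so the TTL is 1483 seconds.
print(ttl_for_current_hour(datetime(2023, 3, 28, 22, 35, 17, tzinfo=timezone.utc)))
```

Compared with a fixed TTL, this caches longer early in the hour and expires promptly near the boundary.
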
jenkins-bot
90997943f9 Merge "Parser: Remove back-compatibility NO_TOC_CONVERSION code" 2023-03-27 20:43:53 +00:00
Winston Sung
42aa5f9481 Display SVGs in page view language for language variants
Bug: T310453
Change-Id: I45e495d2c4fc026bdfc54e3219ff7138789d25dd
2023-03-27 19:59:20 +00:00
Tim Starling
be3018b268 Just another 80 or so PHPStorm inspection fixes (#4)
* Unnecessary regex modifier. I agree with this inspection which flags
  /s modifiers on regexes that don't use a dot.
* Property declared dynamically.
* Unused local variable. But it's acceptable for an unused local
  variable to take the return value of a method under test, when it is
  being tested for its side-effects. And it's acceptable for an unused
  local variable to document unused list expansion elements, or the
  nature of array keys in a foreach.

Change-Id: I067b5b45dd1138c00e7269b66d3d1385f202fe7f
2023-03-25 00:39:06 +00:00
Tim Starling
317b460500 Fix even more PHPStorm inspections (#3)
* Inappropriate @inheritDoc usage. Arguably all @inheritDoc is
  inappropriate but these are the ones PHPStorm flags as misleading
  due to the method not being inherited.
* Doc comment type does not match actual argument/return type.
* I replaced "@return void|never" with "@return void" since never means
  never, it doesn't make sense for it to be conditional. If a method
  can return (even if that is unlikely) then @return contains the type
  that it returns. "@return never" means that there is no such type
  because the method never returns.
* Incomplete/partial/broken doc tags

Change-Id: Ide86bd6d2b44387f37d234c2b059d6fbc42ec962
2023-03-25 00:30:15 +00:00
Tim Starling
580ec48e5b Fix more PHPStorm inspections (#2)
* Illegal string offset and invalid argument supplied to foreach, due to incorrect type information
* Array internal pointer reset is unnecessary
* $hookData unused since MW 1.35 due to incomplete revert
* array_push() with single element
* Unnecessary sprintf()
* for loop can be replaced with str_repeat()
* preg_replace() can be replaced with rtrim()
* array_values() call is redundant
* Unnecessary cast to string
* Unnecessary ternary. Often the result relies on short-circuit evaluation, but I find it more readable nonetheless.

Change-Id: I4c45bdb59b51b243fa96286bec8b58deb097d707
2023-03-25 00:19:58 +00:00
C. Scott Ananian
8aae904254 Parser: Remove back-compatibility NO_TOC_CONVERSION code
The TOC used to be language-converted in ParserOutput::getText(), but
it wasn't possible to apply custom rules defined in the wikitext
article body at ::getText() time.  Remove the various hacks that we'd
added in an attempt to do so, which were made unnecessary by
I321cd31dae64bbf845d53282e5d28a55bc4ec319.

Bug: T306862
Change-Id: Ib12cd02e9ade91d5794462e8833f2aa3b45a51f2
2023-03-24 22:14:42 +00:00
C. Scott Ananian
183a6da420 Add ParserOutput::getLanguage()
Provide a way for backend code to determine the primary language of a
ParserOutput, eg for setting the Content-Language header of an API
response.

This is read-only and backed by extension data at the moment for
transition purposes; if this API sticks we'll graduate it to a
"real" property in the future, with appropriate serialization
to/from JSON (T303329).

Similarly, this patch only includes the most basic code to handle
the various ParserOutput merge cases in
ParserOutput::merge{Internal,Html,Tracking}MetaDataFrom(),
ParserOutput::collectMetadata(), and
OutputPage::addParserOutput{Content,Metadata,Text,}(); mostly
inherited from the fact that the storage is backed by extension
data at the moment.

Generally only the "top-level" parser output gets to set the
primary language; we'll presumably need to ensure that the
language is consistent during merge.

Change-Id: I767daba22805a877d9b806fd77334e508902844b
2023-03-10 18:42:29 -05:00
C. Scott Ananian
4e4008c976 Don't clear LanguageConverter display title when converting ToC
The LanguageConverter::convert()/::convertTo() methods clear the
converted title and reset other (less important) bits of
LanguageConverter state.  Add an optional parameter in order
to skip this reset.

(The LanguageConverter::translate() methods are available which
don't reset LanguageConverter state, but they also don't process
embedded language converter markup.  Since headings can contain
embedded markup, the ::translate() methods aren't appropriate.)

Bug: T306862
Bug: T331316
Change-Id: Ifb2745e45974755ba5a6068c13e84be6c4e3f329
2023-03-09 13:08:01 -05:00
jenkins-bot
cc60b9a3c4 Merge "Parser: Cleanup the getRevisionRecordObject() method" 2023-03-04 02:36:18 +00:00
James D. Forrester
ad06527fb4 Reorg: Namespace the Title class
This is moderately messy.

Process was principally:

* xargs rg --files-with-matches '^use Title;' | grep 'php$' | \
  xargs -P 1 -n 1 sed -i -z 's/use Title;/use MediaWiki\\Title\\Title;/1'
* rg --files-without-match 'MediaWiki\\Title\\Title;' . | grep 'php$' | \
  xargs rg --files-with-matches 'Title\b' | \
  xargs -P 1 -n 1 sed -i -z 's/\nuse /\nuse MediaWiki\\Title\\Title;\nuse /1'
* composer fix

Then manual fix-ups for a few files that don't have any use statements.

Bug: T166010
Follows-Up: Ia5d8cb759dc3bc9e9bbe217d0fb109e2f8c4101a
Change-Id: If8fc9d0d95fc1a114021e282a706fc3e7da3524b
2023-03-02 08:46:53 -05:00
C. Scott Ananian
e7a762fd59 Language-convert Table of Contents at parse time
In 24949480eb (Oct 2021) injection of
the Table of Contents was moved from Parser to
ParserOutput::getText(); that is, from parse time to "postprocess text
possibly fetched from the cache" time.  Unfortunately, this meant that
language conversion wasn't done on the table of contents (!), for
either traditional skins or the vector-2022 skin.  This was fixed for
traditional skins by 059e62cde6 (Nov
2021), later amended by 0955046ca5 (Mar
2022), which added explicit language conversion to the TOC injection
process in ParserOutput::getText().  This fix was still not complete,
however, since editor-defined custom language-conversion rules defined
in the article body were no longer available to the language converter
when conversion was done in ParserOutput::getText(); the ToC title was
also being double-converted.  Further, neither of these short-term
fixes addressed the output of ParserOutput::getSections() (now
ParserOutput::getTOCData()) which was used by vector-2022 to generate
the ToC in the sidebar and which remained entirely unconverted.

With 439656e019 (Jan 2023), we started
using the ::getSections()/::getTOCData() output for main article text
as well, but we kept the previous hack which post-converted the
generated HTML. This kept old skins at parity with the post-Oct-2021
status, but also didn't address the conversion issue for vector-2022.

The solution here is to perform language conversion on the ToC lines
at parse time along with the rest of the language conversion, and
store *converted* headings in TOCData.  This has a number of side
effects:

1. The ToC information array available via the action API
is now language converted.  This is *probably* what you wanted in the
first place, but could potentially be disruptive.

2. The ToC is consistently converted with the full set of
editor-defined custom conversion rules.  Before Oct 2021, the ToC was
converted using the set of custom conversion rules *active at the
point at which the ToC was inserted* (which was usually near the
beginning of the article).  When all conversion rules appear at the
very top of the article (best practice!):

 -{en:Foo; en-x-piglatin:Bar;}-
 Lead section text
 == Introduction ==
 == Foo ==

There should be no difference between pre-Oct 2021 behavior and the
behavior after this patch: in both cases the rule defined in the
article body will be applied both to the heading and to the TOC, and
they will be consistent.  (After Oct 2021 and before this patch, Foo
would be converted in the heading but not in the table of contents.)

But in cases where conversion rules are defined after the
TOC insertion point, the section heading as it appears in the body
text could appear different from the section heading as it appears in
the ToC.  For example, if you defined a conversion rule just before
using a term in a heading:

 == Introduction ==
 -{en:Foo; en-x-piglatin:Bar;}-
 == Foo ==

Before Oct 2021, this rule would be applied to the heading, but not to
the TOC (because the TOC insertion point was before the rule
definition).  This would also be the behavior before this patch (since
rules defined in the article body are currently not applied at all).
After this patch, the rule will be applied to both the heading and the
TOC (because the rule application location is effectively "at the very
end of the article").  In the rare cases when rules are not defined in
glossaries at the top of the article, this type of usage (definition
immediately preceding first use) is expected to be the most common
and the behavior after this patch is more correct.

But alternatively, if you defined a conversion rule *after* using
the term in a heading:

 == Introduction ==
 == Foo ==
 -{en:Foo; en-x-piglatin:Bar;}-

Before Oct 2021, this rule wouldn't be applied to the heading *or* the
TOC.  Before this patch, this would also be the case (because rules
defined in the article body are not applied at all).  After this
patch, the rule will be applied to the ToC but not the heading, since
the application point for the TOC is effectively at the end of the
article.  This inconsistency is probably not desirable, but this case
is expected to be rare, and (assuming the editor intended 'Foo' to be
unconverted) the editor can work around the inconsistency by
explicitly protecting 'Foo' from conversion:

  == -{Foo}- ==
  -{en:Foo; en-x-piglatin:Bar;}-

And if the editor /intended/ Foo to be converted, the rule definition
should be moved earlier in the article.  Again, putting all rules at
the top of the article is the preferred style, and works better with
the glossary style used by the zhwiki community (see also
https://www.mediawiki.org/wiki/Requests_for_comment/Scoped_language_converter
).

Bug: T306862
Depends-On: I0c9c9fec920f7cb028d935e552a8f11475a23ba7
Change-Id: I321cd31dae64bbf845d53282e5d28a55bc4ec319
2023-02-24 10:09:53 -05:00
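
The three rule-placement cases discussed in e7a762fd59 can be reproduced with a toy model. The Python sketch below is not the LanguageConverter; it assumes a single target variant (en-x-piglatin), plain string replacement, and a hypothetical `convert` helper. It models only the stated behavior after the patch: a body heading sees the rules defined before it, while ToC entries are converted with every rule in the article. Protection markup such as -{Foo}- is not modeled.

```python
import re

RULE = re.compile(r"^-\{en:(\w+); en-x-piglatin:(\w+);\}-$")
HEADING = re.compile(r"^==\s*(.+?)\s*==$")


def apply_rules(text, rules):
    """Replace each source term with its en-x-piglatin counterpart."""
    for source, target in rules.items():
        text = text.replace(source, target)
    return text


def convert(lines):
    """Return (body_headings, toc_entries) for the en-x-piglatin variant."""
    rules = {}
    raw_headings = []
    body_headings = []
    for line in lines:
        rule = RULE.match(line.strip())
        if rule:
            rules[rule.group(1)] = rule.group(2)
            continue
        heading = HEADING.match(line.strip())
        if heading:
            raw_headings.append(heading.group(1))
            # Body headings see only the rules defined so far.
            body_headings.append(apply_rules(heading.group(1), rules))
    # ToC entries are converted with every rule found in the article.
    toc_entries = [apply_rules(h, rules) for h in raw_headings]
    return body_headings, toc_entries


rule = "-{en:Foo; en-x-piglatin:Bar;}-"
print(convert([rule, "Lead section text", "== Introduction ==", "== Foo =="]))
print(convert(["== Introduction ==", rule, "== Foo =="]))
print(convert(["== Introduction ==", "== Foo ==", rule]))
```

Only the last case yields a heading ("Foo") that differs from its ToC entry ("Bar"), which is the inconsistency the commit message calls out.
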
Func
b08c7643ec Parser: Cleanup the getRevisionRecordObject() method
We can early-return in this case.

Change-Id: Ia817e272aca6981d3138f17be2608ec89bf8fd79
2023-02-24 13:53:05 +08:00
jenkins-bot
548ede7d7b Merge "CoreMagicVariables/CoreParserFunction: unify revisionid" 2023-02-24 04:58:07 +00:00
jenkins-bot
5e2a43166c Merge "Reorg: Move five page-related classes to page/ out of includes/" 2023-02-23 16:43:43 +00:00
Amir Sarabadani
0f13e81a15 Reorg: Move five page-related classes to page/ out of includes/
These classes:
 - MergeHistory
 - MovePage
 - ProtectionForm
 - BadFileLookup (to MediaWiki\Page\File)
 - FileDeleteForm (to MediaWiki\Page\File)

Bug: T321882
Change-Id: Ibeb488ba322c62a34042a0307bbb5562773bcad1
2023-02-23 17:03:49 +01:00
jenkins-bot
9839dd4387 Merge "Parser: Section offsets are in codepoints, not in bytes" 2023-02-23 04:00:12 +00:00
Func
eb60f38513 Parser: Section offsets are in codepoints, not in bytes
The offset is actually measured in codepoints, not in bytes.

This field is meant to replace the "byteoffset" field since the
current naming is misleading, and we already have misused it in
deployed extensions.  Support was added in Parsoid in
Ide436dca5a609c866da3c63049723243b8242c34 and the patch depends
on a version of Parsoid with that patch in mediawiki-vendor.

Parsoid still uses the old name in the ::toLegacy() serialization, and
thus in the action API, but that method will eventually be deprecated
(T327439, T330232).

Bug: T319141
Depends-On: Iacdd9a11b79bbafb9cfe9568c889ed721a137833
Depends-On: Ide436dca5a609c866da3c63049723243b8242c34
Change-Id: Ie618a964574780d2ad72192483b399407c7a0bbe
2023-02-21 20:12:03 +00:00
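
Why the naming in eb60f38513 matters: for non-ASCII text, an offset counted in codepoints differs from one counted in UTF-8 bytes. A small Python sketch (the helper name is made up for illustration and is unrelated to the Parser code):

```python
from typing import Tuple


def heading_offsets(text: str, heading: str) -> Tuple[int, int]:
    """Return (codepoint_offset, utf8_byte_offset) of the first occurrence."""
    codepoint_offset = text.index(heading)
    byte_offset = len(text[:codepoint_offset].encode("utf-8"))
    return codepoint_offset, byte_offset


# "\u00c9t\u00e9\n" is 4 codepoints but 6 bytes in UTF-8.
print(heading_offsets("\u00c9t\u00e9\n== Foo ==", "== Foo =="))  # (4, 6)
```

A consumer that treats the codepoint value as a byte offset would land in the wrong place as soon as multi-byte characters precede the section.
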
C. Scott Ananian
544a479a27 Parser: don't set TOCData if __NOTOC__ is used on the page
Ensure that TOCData is non-null if there is a valid table of contents
for the page -- that is, it is not suppressed (due to non-wikitext
content) and the editor hasn't explicitly suppressed it, for
example by using __NOTOC__.  (Note that __FORCETOC__ and __TOC__
both intentionally override __NOTOC__, and the TOCData will be
non-null if those are set, regardless of whether __NOTOC__ is
present.)

This gives skins the information they need to make their own
decisions about whether the table of contents is "big enough"
to be interesting, without forcing them also to reimplement the
__NOTOC__ logic.

Note that the SHOW_TOC parser output flag is provided for
legacy compatibility; it is set only when the TOC is "big enough"
under the legacy skin, which injects the TOC into the article HTML.

Change-Id: Ife2126ace95ac4d9ec44f6374c63d8fc995cf034
2023-02-16 17:44:24 -05:00
C. Scott Ananian
d5b39490ca Remove back-compatibility code for ToC marker
Before 1.39 we used <mw:toc> and in 1.39 we switched to <mw:tocplace/>
(commit 24949480eb).  This was changed
to a <meta> tag in 1.40 (commit
0b10563895 and
fa8646ca7b) and the old content has long
since expired from the ParserCache.  Clean up the old ParserCache
transition code.

Change-Id: I3254d0acba31e107b50767797a2b0ad28aba59ee
2023-02-10 00:03:54 -05:00
jenkins-bot
9f2e36641c Merge "Reorg: Move category-related classes from includes/ to Category/" 2023-02-09 23:20:40 +00:00
Arlo Breault
34d599bf11 Do not use media filename as alt attribute
Matches Parsoid's current output.

Not a canonical source, but, this site says,
https://help.siteimprove.com/support/solutions/articles/80000863904-accessibility-image-alt-text-best-practices

> If no alt attribute is present, the screen reader will read the file
> name for the image instead, which can be a major distraction to those
> using screen-reading technology.

So, reading the filename seems to be a default behaviour anyways and
using the filename doesn't seem to be adding any benefit.  However, placing
it preempts any improvements that might happen in screen reading
technology since the screen reader would likely prefer the alt attribute
to any magic it tries to do in its absence (like machine vision
processing of the image).

An alternative proposal would be to strip off the file extension as in
I218e5565816b7643f3b85083031644e3e4749a5c and implement the same in
Parsoid.

Longer term plans that actually address the issue here are in T325955
and T63566.

Bug: T326041
Bug: T63566
Bug: T325955
Depends-On: I7b1f07190e8eaca5cbda38d9ce366aa60041ab81
Depends-On: I9dd37f70be8163df76c154f175ef50134fb811d8
Depends-On: If9cdabdfac26656272fcf3b4aaae0576aaed1346
Change-Id: If1e55feb86ce8b32f772e3b78bc9d29f122f4d58
2023-02-09 17:29:32 -05:00
Amir Sarabadani
c8116223b4 Reorg: Move category-related classes from includes/ to Category/
Bug: T321882
Change-Id: I0b86acfdeaa3a2a0a14b7763fd088122820bafdc
2023-02-09 20:18:54 +01:00
jenkins-bot
f5cf8e9eff Merge "Don't emit empty alt attribute unless it's explicitly asked for" 2023-02-09 18:20:18 +00:00
Arlo Breault
ee1d5248b4 Don't emit empty alt attribute unless it's explicitly asked for
From https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#attr-alt

> Setting this attribute to an empty string (alt="") indicates that
> this image is not a key part of the content (it's decoration or a
> tracking pixel), and that non-visual browsers may omit it from
> rendering. Visual browsers will also hide the broken image icon if the
> alt is empty and the image failed to display.

This matches Parsoid's current output as well.

The parserTest "Image: empty alt attribute (T50924)" asserts that the
empty media option (|alt=|) is still respected.

Depends-On: Id6ad0b922f8384f2bbf08e1032b0197aa3136233
Change-Id: I8d059852f472b40b4f4f80a8fa12230f6f4f13ad
2023-02-09 12:57:48 -05:00
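
The rule in ee1d5248b4 amounts to distinguishing "no alt option given" from "an explicitly empty alt option". A minimal Python sketch under that reading; `img_tag` is a hypothetical helper, not Linker code, and any fallback text for the attribute is deliberately ignored here.

```python
from html import escape
from typing import Optional


def img_tag(src: str, alt: Optional[str] = None) -> str:
    """alt=None: no |alt= option was given. alt="": an explicit empty |alt=|."""
    attrs = [f'src="{escape(src, quote=True)}"']
    if alt is not None:
        # Emitted even when empty, because the editor asked for it.
        attrs.append(f'alt="{escape(alt, quote=True)}"')
    return "<img " + " ".join(attrs) + ">"


print(img_tag("Foo.jpg"))      # <img src="Foo.jpg">
print(img_tag("Foo.jpg", ""))  # <img src="Foo.jpg" alt="">
```
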
Umherirrender
ed169d991e Remove unused arguments to private functions
Found by Phan dead-code detection

Change-Id: I93379b7b9a733206d0e53add04fcdb9478c58755
2023-02-08 19:00:47 +00:00
jenkins-bot
38a241252a Merge "docs: Add missing StubUserLang type to some @param/@return" 2023-02-03 23:45:03 +00:00
Subramanya Sastry
efbcf9117a Use TOCData methods to process new headings
* The removed code has been extracted to Parsoid's TOCData class
  to enable reuse and avoid code duplication.

Change-Id: Id17cf037b3a2bd4f9de0a12ebb382f3974244091
2023-02-01 13:12:28 -05:00
C. Scott Ananian
439656e019 Generate TOC HTML on demand in ParserOutput::getText()
* Rather than computing TOC HTML in Parser and setting it in
  ParserOutput, compute it on demand based on section metadata.

  This will let Parsoid set section metadata in ParserOutput
  and have the TOC generated automatically.

* This required fixing some "bugs" in Linker's generateTOC
  which didn't properly close tags and relied on Tidy to fix
  up unclosed li and ul tags.

* This patch relies on converting section metadata objects to
  array objects, but Linker::generateTOC could be converted to
  use TOC data instead.

* Since TOC generation is now moved to getText(), this is done
  post-PC load and this eliminates the parser cache split on
  user language for TOC heading localization.

Bug: T293513
Change-Id: Ief1bba326d3612b40930440c872a61abadffab10
2023-01-25 16:42:16 -05:00
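
Generating the ToC from section metadata, with every li and ul explicitly closed as 439656e019 requires, might look roughly like the Python sketch below. It is not Linker::generateTOC: the dict keys ('toclevel', 'anchor', 'line') and the bare markup are assumptions for illustration, and it assumes toclevel starts at 1 and never increases by more than one between consecutive sections.

```python
from html import escape


def generate_toc(sections):
    """sections: list of dicts with 'toclevel' (1-based), 'anchor', 'line'."""
    out = []
    level = 0
    for section in sections:
        target = section["toclevel"]
        if target > level:
            # Descend: open a nested list inside the still-open item.
            while level < target:
                out.append("<ul>")
                level += 1
        else:
            # Close the previous item, plus any deeper lists left open.
            out.append("</li>")
            while level > target:
                out.append("</ul></li>")
                level -= 1
        out.append(
            f'<li><a href="#{escape(section["anchor"], quote=True)}">'
            f'{escape(section["line"])}</a>'
        )
    # Close everything still open at the end.
    while level > 0:
        out.append("</li></ul>")
        level -= 1
    return "".join(out)


print(generate_toc([
    {"toclevel": 1, "anchor": "A", "line": "A"},
    {"toclevel": 2, "anchor": "B", "line": "B"},
    {"toclevel": 1, "anchor": "C", "line": "C"},
]))
```

Because the output is balanced by construction, it does not depend on a later tidy pass to repair unclosed tags.
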
jenkins-bot
6b66390f82 Merge "Add Parser::msg() helper for messages from extensions or parser functions" 2023-01-24 20:37:39 +00:00
Jon Robson
212529a68a Self link fragments should be properly escaped
Bug: T327467
Change-Id: Ic0625f8503ad7f6c918dcb263c5a4ef27d191759
2023-01-23 15:02:11 -08:00