XSLT: Moving grouped HTML elements into section levels

Posted on 2024-10-09 10:45:07

I'm trying to write an XSLT that organizes an HTML file into different section levels depending on the header level. Here is my input:

<html>
 <head>
  <title></title>
 </head>
 <body>
  <h1>HEADER 1 CONTENT</h1>
  <p>Level 1 para</p>
  <p>Level 1 para</p>
  <p>Level 1 para</p>
  <p>Level 1 para</p>

  <h2>Header 2 CONTENT</h2>
  <p>Level 2 para</p>
  <p>Level 2 para</p>
  <p>Level 2 para</p>
  <p>Level 2 para</p>
 </body>
</html>

I'm working with a fairly simple structure at the moment, so this pattern will stay constant for the time being. I need output like this:

<document> 
  <section level="1">
     <header1>Header 1 CONTENT</header1>
     <p>Level 1 para</p>
     <p>Level 1 para</p>
     <p>Level 1 para</p>
     <p>Level 1 para</p>
     <section level="2">
        <header2>Header 2 CONTENT</header2>
        <p>Level 2 para</p>
        <p>Level 2 para</p>
        <p>Level 2 para</p>
        <p>Level 2 para</p>
     </section>
  </section>
</document>

I had been working from this example: Stack Overflow answer

However, I cannot get it to do exactly what I need.

I'm using Saxon 9 to run the XSLT within Oxygen for development. In production I'll run it from a cmd/bat file, still with Saxon 9. I'd like to handle up to 4 nested section levels if possible.

Any help is much appreciated!

I need to add to this, as I've run into another requirement that I probably should have thought of earlier.

I'm also encountering input like the following:

<html>
<head>
<title></title>
</head>
<body>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>

<h1>Header 2 CONTENT</h1>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
</body>
</html>

As you can see, the <p> is a child of <body> while in my first snippet, <p> was always a child of a header level. My desired result is the same as above except that when I encounter <p> as a child of <body>, it should be wrapped in <section level="1">.

<document> 
<section level="1">     
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
</section>
<section level="1">
<header1>Header 2 CONTENT</header1>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
</section>
</document>

香橙ぽ 2024-10-16 10:45:07

Here is an XSLT 2.0 stylesheet:

<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf"
  version="2.0">

  <xsl:output indent="yes"/>

  <xsl:function name="mf:group" as="node()*">
    <xsl:param name="elements" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$elements" group-starting-with="*[local-name() eq concat('h', $level)]">
      <xsl:choose>
        <xsl:when test="self::*[local-name() eq concat('h', $level)]">
          <section level="{$level}">
            <xsl:element name="header{$level}"><xsl:apply-templates/></xsl:element>
            <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
          </section>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:function>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/html">
    <document>
      <xsl:apply-templates select="body"/>
    </document>
  </xsl:template>

  <xsl:template match="body">
    <xsl:sequence select="mf:group(*, 1)"/>
  </xsl:template>

</xsl:stylesheet>

It should do what you asked for, although it does not stop at four nested levels but rather groups as long as it finds h[n] elements.
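
If a hard cap on nesting depth is wanted (the question mentions four levels), one possible variation is sketched below. It is not part of the original answer: the stylesheet parameter `max-level` is a hypothetical name introduced here, and the assumption is that anything below the cap should simply be copied unchanged (so an `h5` would stay an `h5` inside the level 4 section).

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf"
  version="2.0">

  <xsl:output indent="yes"/>

  <!-- Hypothetical parameter (not in the original answer):
       deepest heading level that still opens a section -->
  <xsl:param name="max-level" as="xs:integer" select="4"/>

  <xsl:function name="mf:group" as="node()*">
    <xsl:param name="elements" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:choose>
      <!-- Past the cap: stop grouping and copy the remaining elements as they are -->
      <xsl:when test="$level gt $max-level">
        <xsl:apply-templates select="$elements"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Same grouping logic as in the stylesheet above -->
        <xsl:for-each-group select="$elements"
            group-starting-with="*[local-name() eq concat('h', $level)]">
          <xsl:choose>
            <xsl:when test="self::*[local-name() eq concat('h', $level)]">
              <section level="{$level}">
                <xsl:element name="header{$level}"><xsl:apply-templates/></xsl:element>
                <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
              </section>
            </xsl:when>
            <xsl:otherwise>
              <xsl:apply-templates select="current-group()"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- Identity template: copies everything that is not restructured -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/html">
    <document>
      <xsl:apply-templates select="body"/>
    </document>
  </xsl:template>

  <xsl:template match="body">
    <xsl:sequence select="mf:group(*, 1)"/>
  </xsl:template>

</xsl:stylesheet>
```

The only change from the stylesheet above is the outer `xsl:choose`, which checks the level before any grouping happens.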

油焖大侠 2024-10-16 10:45:07

An XSLT 1.0 solution (essentially borrowed from Jeni Tennison):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="html">
   <document><xsl:apply-templates/></document>
 </xsl:template>

 <xsl:template match="body">
   <xsl:apply-templates select="h1" />
 </xsl:template>

 <xsl:key name="next-headings" match="h6"
          use="generate-id(preceding-sibling::*[self::h1 or self::h2 or
                                               self::h3 or self::h4 or
                                               self::h5][1])" />
 <xsl:key name="next-headings" match="h5"
          use="generate-id(preceding-sibling::*[self::h1 or self::h2 or
                                               self::h3 or self::h4][1])" />
 <xsl:key name="next-headings" match="h4"
          use="generate-id(preceding-sibling::*[self::h1 or self::h2 or
                                               self::h3][1])" />
 <xsl:key name="next-headings" match="h3"
          use="generate-id(preceding-sibling::*[self::h1 or self::h2][1])" />
 <xsl:key name="next-headings" match="h2"
          use="generate-id(preceding-sibling::h1[1])" />

 <xsl:key name="immediate-nodes"
          match="node()[not(self::h1 | self::h2 | self::h3 | self::h4 |
                           self::h5 | self::h6)]"
          use="generate-id(preceding-sibling::*[self::h1 or self::h2 or
                                               self::h3 or self::h4 or
                                               self::h5 or self::h6][1])" />

 <xsl:template match="h1 | h2 | h3 | h4 | h5 | h6">
   <xsl:variable name="vLevel" select="substring-after(name(), 'h')" />
   <section level="{$vLevel}">
      <xsl:element name="header{$vLevel}">
        <xsl:apply-templates />
      </xsl:element>
      <xsl:apply-templates select="key('immediate-nodes', generate-id())" />
      <xsl:apply-templates select="key('next-headings', generate-id())" />
   </section>
 </xsl:template>

 <xsl:template match="/*/*/node()" priority="-20">
   <xsl:copy-of select="." />
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<html>
    <body>
        <h1>1</h1>
        <p>1</p>
        <h2>1.1</h2>
        <p>2</p>
        <h3>1.1.1</h3>
        <p>3</p>
        <h2>1.2</h2>
        <p>4</p>
        <h1>2</h1>
        <p>5</p>
        <h2>2.1</h2>
        <p>6</p>
    </body>
</html>

the desired result is produced:

<document>
   <section level="1">
      <header1>1</header1>
      <p>1</p>
      <section level="2">
         <header2>1.1</header2>
         <p>2</p>
         <section level="3">
            <header3>1.1.1</header3>
            <p>3</p>
         </section>
      </section>
      <section level="2">
         <header2>1.2</header2>
         <p>4</p>
      </section>
   </section>
   <section level="1">
      <header1>2</header1>
      <p>5</p>
      <section level="2">
         <header2>2.1</header2>
         <p>6</p>
      </section>
   </section>
</document>

网白 2024-10-16 10:45:07

A more general grouping in XSLT 1.0:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kHeaderByPreceding"
             match="body/*[starts-with(name(),'h')]"
             use="generate-id(preceding-sibling::*
                                 [starts-with(name(),'h')]
                                 [substring(name(current()),2)
                                   > substring(name(),2)][1])"/>
    <xsl:key name="kElementByPreceding"
             match="body/*[not(starts-with(name(),'h'))]"
             use="generate-id(preceding-sibling::*
                                 [starts-with(name(),'h')][1])"/>
    <xsl:template match="node()|@*" mode="copy">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*" mode="copy"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="body">
        <document>
            <xsl:apply-templates select="key('kHeaderByPreceding','')"/>
        </document>
    </xsl:template>
    <xsl:template match="body/*[starts-with(name(),'h')]">
        <section level="{substring(name(),2)}">
            <xsl:element name="header{substring(name(),2)}">
                <xsl:apply-templates mode="copy"/>
            </xsl:element>
            <xsl:apply-templates select="key('kElementByPreceding',
                                             generate-id())"
                                 mode="copy"/>
            <xsl:apply-templates select="key('kHeaderByPreceding',
                                             generate-id())"/>
        </section>
    </xsl:template>
    <xsl:template match="text()"/>
</xsl:stylesheet>

Output for the input from the question:

<document>
    <section level="1">
        <header1>HEADER 1 CONTENT</header1>
        <p>Level 1 para</p>
        <p>Level 1 para</p>
        <p>Level 1 para</p>
        <p>Level 1 para</p>
        <section level="2">
            <header2>Header 2 CONTENT</header2>
            <p>Level 2 para</p>
            <p>Level 2 para</p>
            <p>Level 2 para</p>
            <p>Level 2 para</p>
        </section>
    </section>
</document>

And with a more complex input sample such as:

<body>
    <h1>1</h1>
    <p>1</p>
    <h2>1.1</h2>
    <p>2</p>
    <h3>1.1.1</h3>
    <p>3</p>
    <h2>1.2</h2>
    <p>4</p>
    <h1>2</h1>
    <p>5</p>
    <h2>2.1</h2>
    <p>6</p>
</body>

Output:

<document>
    <section level="1">
        <header1>1</header1>
        <p>1</p>
        <section level="2">
            <header2>1.1</header2>
            <p>2</p>
            <section level="3">
                <header3>1.1.1</header3>
                <p>3</p>
            </section>
        </section>
        <section level="2">
            <header2>1.2</header2>
            <p>4</p>
        </section>
    </section>
    <section level="1">
        <header1>2</header1>
        <p>5</p>
        <section level="2">
            <header2>2.1</header2>
            <p>6</p>
        </section>
    </section>
</document>

太阳公公是暖光 2024-10-16 10:45:07

I was able to get something working for my addendum above. I added logic into the body template to test for header tags. It may not work for every situation, but it is doing well for my task.

<xsl:template match="body">
  <xsl:choose>
    <xsl:when test="descendant::h1">
      <xsl:apply-templates/>
    </xsl:when>
    <xsl:otherwise>
      <section level="1">
        <item>
          <block ccm="yes" onbup="no" quickref="no" web="no">
            <xsl:apply-templates/>
          </block>
        </item>
      </section>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
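
Note that the `descendant::h1` test only distinguishes a body with no `h1` at all from one that has an `h1` somewhere. For the mixed case in the addendum, where paragraphs come before the first `h1`, one option is to extend the XSLT 2.0 grouping function from the earlier answer. The following is a rough sketch under that assumption, not a tested production stylesheet; the extra `xsl:when` branch for `$level eq 1` is the addition, and it emits a plain `<section level="1">` as in the question's desired output rather than the `item`/`block` wrappers used above.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf"
  version="2.0">

  <xsl:output indent="yes"/>

  <xsl:function name="mf:group" as="node()*">
    <xsl:param name="elements" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$elements"
        group-starting-with="*[local-name() eq concat('h', $level)]">
      <xsl:choose>
        <!-- Group starts with a heading of the current level: open a section -->
        <xsl:when test="self::*[local-name() eq concat('h', $level)]">
          <section level="{$level}">
            <xsl:element name="header{$level}"><xsl:apply-templates/></xsl:element>
            <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
          </section>
        </xsl:when>
        <!-- Added branch (assumption): top-level content that appears before
             the first h1, or in a body without any h1, gets its own
             level 1 section with no header element -->
        <xsl:when test="$level eq 1">
          <section level="1">
            <xsl:apply-templates select="current-group()"/>
          </section>
        </xsl:when>
        <!-- Deeper levels: content before the next heading is copied in place -->
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:function>

  <!-- Identity template: copies everything that is not restructured -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/html">
    <document>
      <xsl:apply-templates select="body"/>
    </document>
  </xsl:template>

  <xsl:template match="body">
    <xsl:sequence select="mf:group(*, 1)"/>
  </xsl:template>

</xsl:stylesheet>
```

With `group-starting-with`, any items that precede the first matching `h1` form a group of their own, which is what lets the added branch wrap them separately from the sections that follow.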