Grouping in XSLT 2.0, similar to the br-to-p problem

Posted on 2024-10-26 15:08:39

In XSLT 1.0, a common question in forums was how to convert flat HTML into hierarchical XML, which often boiled down to wrapping the text between <br /> tags in <p> tags.

I have a similar problem, which I think I've partially solved using XSLT 2.0, but it's a new approach to me and I'd like to get a second opinion.

The XHTML source has <span class="pageStart"></span> markers scattered throughout. They can appear in several different parent nodes. I want to wrap all the nodes between one page start marker and the next in a <page> node. The solution I currently have is:

<xsl:template match="*[child::span[@class='pageStart']]">
  <xsl:copy>
    <xsl:copy-of select="@*" />
      <xsl:for-each-group select="node()" 
                          group-starting-with="span[@class='pageStart']">
        <page>
          <xsl:apply-templates select="current-group()"/>
        </page>
      </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

There's at least one flaw with this: the parent node of the marker gets a <page> as a child node when I don't want it. In other words, if there's a <div> that has a child page marker anywhere in it, a <page> node is created as an immediate child of <div> in addition to the locations I expect.

I had hoped that I could simply make the template rule be <xsl:template match="span[@class='pageStart']"> but current-group() seems to be empty no matter what I try. The common sense approach I tried was <xsl:for-each-group select="node()" group-starting-with="span[@class='pageStart']">.

Is there an easier way to solve this problem that I'm missing?

EDIT

Here's an example of the input:

<?xml version="1.0" encoding="UTF-8"?>
<html>
<head></head>
<body>
    <span class="pageStart"/>
    <p>...</p>
    <div>...</div>
    <img />
    <p></p>
    <span class="pageStart"/>
    <div>...</div>
    <span class="pageStart"/>
    <p>...</p>
    <div>
        <span class="pageStart"/>
        <p>...</p>
        <p>...</p>
        <span class="pageStart"/>
        <div>...</div>
        <img/>
    </div>
</body>
</html>

I assume the last two nested pages make this problem more difficult, so I'd be perfectly happy getting this as the output, or something close:

<?xml version="1.0" encoding="UTF-8"?>
<html>
<head></head>
<body>
    <page>
        <span class="pageStart"/>
        <p>...</p>
        <div>...</div>
        <img />
        <p></p>
    </page>
    <page>
        <span class="pageStart"/>
        <div>...</div>
    </page>
    <page>
        <span class="pageStart"/>
        <p>...</p>
        <div>
            <page>
                <span class="pageStart"/>
                <p>...</p>
                <p>...</p>
            </page>
            <page>
                <span class="pageStart"/>
                <div>...</div>
                <img/>
            </page>
        </div>
    </page>
</body>
</html>

Comments (1)

热血少△年 2024-11-02 15:08:39

This transformation:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*[span/@class='pageStart']">
  <xsl:copy>
   <xsl:copy-of select="@*"/>
   <xsl:for-each-group select="node()"
       group-starting-with="span[@class='pageStart']">
     <page>
      <xsl:apply-templates select="current-group()"/>
     </page>
   </xsl:for-each-group>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<html>
<head></head>
<body>
    <span class="pageStart"/>
    <p>...</p>
    <div>...</div>
    <img />
    <p></p>
    <span class="pageStart"/>
    <div>...</div>
    <span class="pageStart"/>
    <p>...</p>
    <div>
        <span class="pageStart"/>
        <p>...</p>
        <p>...</p>
        <span class="pageStart"/>
        <div>...</div>
        <img/>
    </div>
</body>
</html>

produces the wanted, correct result:

<html>
   <head/>
   <body>
      <page>
         <span class="pageStart"/>
         <p>...</p>
         <div>...</div>
         <img/>
         <p/>
      </page>
      <page>
         <span class="pageStart"/>
         <div>...</div>
      </page>
      <page>
         <span class="pageStart"/>
         <p>...</p>
         <div>
            <page>
               <span class="pageStart"/>
               <p>...</p>
               <p>...</p>
            </page>
            <page>
               <span class="pageStart"/>
               <div>...</div>
               <img/>
            </page>
         </div>
      </page>
   </body>
</html>
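
One caveat may explain the extra <page> described in the question. With group-starting-with, any nodes that come before the first matching marker form a group of their own, so they get wrapped too. The sample input hides this because every parent begins with a marker and xsl:strip-space discards the whitespace-only text nodes in front of it. The following is a minimal sketch of a variant, assuming that content ahead of the first marker should be copied through unwrapped (the question does not say what should happen to it, so that is an assumption):

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copy everything that is not handled below -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Any element with at least one page marker as a direct child -->
 <xsl:template match="*[span/@class='pageStart']">
  <xsl:copy>
   <xsl:copy-of select="@*"/>
   <xsl:for-each-group select="node()"
       group-starting-with="span[@class='pageStart']">
    <xsl:choose>
     <!-- The group really begins with a marker: wrap it in <page> -->
     <xsl:when test="current-group()[1][self::span[@class='pageStart']]">
      <page>
       <xsl:apply-templates select="current-group()"/>
      </page>
     </xsl:when>
     <!-- Assumption: nodes before the first marker stay unwrapped -->
     <xsl:otherwise>
      <xsl:apply-templates select="current-group()"/>
     </xsl:otherwise>
    </xsl:choose>
   </xsl:for-each-group>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

For the sample input every group starts with a marker, so this variant should give the same result as the stylesheet above. It only behaves differently when a parent has content before its first <span class="pageStart"/>.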