Grouping in XSLT 2.0, similar to the br-to-p problem
In XSLT 1.0, a common forum question was how to convert flat HTML into hierarchical XML, which often boiled down to wrapping the text between <br /> tags in <p> tags.
I have a similar problem, which I think I've partially solved using XSLT 2.0, but it's a new approach to me and I'd like to get a second opinion.
The XHTML source has <span class="pageStart"></span> markers scattered throughout. They can appear in several different parent nodes. I want to wrap all the nodes between one page start marker and the next in a <page> node. The solution I currently have is:
<xsl:template match="*[child::span[@class='pageStart']]">
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:for-each-group select="node()"
                        group-starting-with="span[@class='pageStart']">
      <page>
        <xsl:apply-templates select="current-group()"/>
      </page>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
There's at least one flaw with this: the parent node of the marker gets a <page> as a child node when I don't want it. In other words, if there's a <div> that has a child page marker anywhere in it, a <page> node is created as an immediate child of <div> in addition to the locations I expect.
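One possible cause of the extra <page> is that any nodes preceding the first marker (including whitespace-only text nodes) form a group of their own, which the template wraps like every other group. The following is a minimal sketch of a complete stylesheet that guards against that case. It assumes that whitespace-only text is insignificant in this document and that every other node should be copied unchanged; the identity template and the xsl:strip-space declaration are additions for illustration, not part of the original template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: whitespace-only text nodes carry no meaning, so they
       are stripped to keep them from forming a leading group. -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copies every node not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element that has a page marker among its children. -->
  <xsl:template match="*[span[@class='pageStart']]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="node()"
          group-starting-with="span[@class='pageStart']">
        <xsl:choose>
          <!-- Inside xsl:for-each-group the context item is the first
               item of the current group. -->
          <xsl:when test="self::span[@class='pageStart']">
            <page>
              <xsl:apply-templates select="current-group()"/>
            </page>
          </xsl:when>
          <!-- Nodes before the first marker: emit without a wrapper. -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the groups are processed with xsl:apply-templates, a nested <div> that contains its own markers is matched by the same template again, so nested pages should come out wrapped as well.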
I had hoped that I could simply make the template rule be <xsl:template match="span[@class='pageStart']">, but current-group() seems to be empty no matter what I try. The common-sense approach I tried was <xsl:for-each-group select="node()" group-starting-with="span[@class='pageStart']">.
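A likely explanation for the empty result: inside a template that matches the marker, the context node is the marker itself, so select="node()" selects the children of the empty <span>, and there is nothing to group. Driving the wrapping from the marker is still possible by walking its following siblings instead. The sketch below shows one way to do that; the mode name in-page and the explicit priorities are choices made for this illustration, and it again assumes whitespace-only text can be stripped.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template for everything outside any page. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Each marker opens a page and pulls in the siblings that follow
       it, up to (but not including) the next marker. -->
  <xsl:template match="span[@class='pageStart']" priority="2">
    <xsl:variable name="start" select="."/>
    <page>
      <xsl:copy-of select="."/>
      <xsl:apply-templates mode="in-page"
          select="following-sibling::node()
                    [not(self::span[@class='pageStart'])]
                    [preceding-sibling::span[@class='pageStart'][1] is $start]"/>
    </page>
  </xsl:template>

  <!-- In the default mode, siblings after a marker are suppressed,
       because the marker template has already emitted them. -->
  <xsl:template match="node()[preceding-sibling::span[@class='pageStart']]"
      priority="1"/>

  <!-- Inside a page: copy the node, then return to the default mode
       for its content so nested markers are handled the same way. -->
  <xsl:template match="node()" mode="in-page">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This version needs explicit priorities and a second mode to avoid emitting siblings twice, and it re-scans preceding siblings for every node, so the xsl:for-each-group approach on the parent is probably the simpler and cheaper of the two.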
Is there an easier way to solve this problem that I'm missing?
EDIT
Here's an example of the input:
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head></head>
  <body>
    <span class="pageStart"/>
    <p>...</p>
    <div>...</div>
    <img />
    <p></p>
    <span class="pageStart"/>
    <div>...</div>
    <span class="pageStart"/>
    <p>...</p>
    <div>
      <span class="pageStart"/>
      <p>...</p>
      <p>...</p>
      <span class="pageStart"/>
      <div>...</div>
      <img/>
    </div>
  </body>
</html>
I assume the last two nested pages make this problem more difficult, so I'd be perfectly happy getting this as the output, or something close:
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head></head>
  <body>
    <page>
      <span class="pageStart"/>
      <p>...</p>
      <div>...</div>
      <img />
      <p></p>
    </page>
    <page>
      <span class="pageStart"/>
      <div>...</div>
    </page>
    <page>
      <span class="pageStart"/>
      <p>...</p>
      <div>
        <page>
          <span class="pageStart"/>
          <p>...</p>
          <p>...</p>
        </page>
        <page>
          <span class="pageStart"/>
          <div>...</div>
          <img/>
        </page>
      </div>
    </page>
  </body>
</html>
Comments (1)
This transformation:
when applied on the provided XML document:
produces the wanted, correct result: