Handling cyclic dependencies with XSLT
I’m processing an XML file that, simplified, looks something like this:
<resources>
  <resource id="a">
    <dependency idref="b"/>
    <!-- some other stuff -->
  </resource>
  <resource id="b">
    <!-- some other stuff -->
  </resource>
</resources>
The XSLT stylesheet must process a particular resource that we’re interested in, which I will call the root resource, and all of its recursive dependencies. Dependencies are other resources, uniquely identified by their id attribute.
It doesn’t matter if a resource is processed twice, although it’s preferable to process each required resource only once. It also doesn’t matter what order the resources are processed in.
It’s important that only the root resource and its recursive dependencies are processed. We can’t just process all the resources and be done with it.
A naïve implementation is as follows:
<xsl:key name="resource-id" match="resource" use="@id"/>
<xsl:template match="resource">
<!-- do whatever is required to process the resource. -->
<!-- then handle any dependencies -->
<xsl:apply-templates select="key('resource-id', dependency/@idref)"/>
</xsl:template>
This implementation works fine for the example above, as well as in many real-world cases. It does have the disadvantage that it often processes the same resource more than once, but as stated above that’s not hugely important.
The problem is that sometimes resources have cyclic dependencies:
<resources>
  <resource id="a">
    <dependency idref="b"/>
    <dependency idref="d"/>
  </resource>
  <resource id="b">
    <dependency idref="c"/>
  </resource>
  <resource id="c">
    <dependency idref="a"/>
  </resource>
  <resource id="d"/>
</resources>
If you use the naïve implementation to process this example, and you start by processing a, b or c, you get infinite recursion.
Unfortunately, I can’t control the input data, and in any case cyclic dependencies are perfectly valid and allowed by the relevant specification.
I’ve come up with various partial solutions, but nothing that works in all cases.
The ideal solution would be a general approach to preventing a node from being processed more than once, but I don’t think that’s possible. In fact, I suspect this whole problem is impossible to solve.
If it helps, I have most of EXSLT available (including functions). If necessary I can also pre-process the input with any number of other XSLT scripts, although it’s preferable not to do excessive pre-processing of resources that won’t end up in the output.
What I can’t do is switch to processing this with another language (at least not without substantial re-engineering). I also can’t use XSLT 2.0.
Any ideas?
This is a simple solution:
When applied to the provided XML document:
the wanted, correct result is produced:
The main idea is simple: maintain a list of the ids of visited resources, and only allow a new resource to be processed if its id is not already in the list. The "processing" is for demonstration purposes only: it outputs the request wrapping, recursively, all the other requests on which it depends.
Also note that every request is processed only once. Years ago I provided a similar solution to a graph-traversal problem; it can be found in the xml-dev group archives (here). :)
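The visited-list idea can be sketched in XSLT 1.0 as follows. This is a minimal sketch under stated assumptions, not the answerer's actual stylesheet: the root resource is named by a hypothetical root-id parameter, the visited list is a '|'-delimited string of ids (so ids are assumed not to contain '|'), and a worklist of pending resources is threaded through a named template so that the visited list is shared across sibling dependencies and each resource is processed exactly once. The template in the hypothetical "process" mode is only a placeholder for the real work.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: the id of the root resource. -->
  <xsl:param name="root-id" select="'a'"/>

  <xsl:key name="resource-id" match="resource" use="@id"/>

  <xsl:template match="/">
    <processed>
      <xsl:call-template name="walk">
        <xsl:with-param name="pending" select="key('resource-id', $root-id)"/>
      </xsl:call-template>
    </processed>
  </xsl:template>

  <!-- $pending: resources still to look at (a node-set).
       $visited: ids already processed, stored as "|a|b|...|". -->
  <xsl:template name="walk">
    <xsl:param name="pending" select="/.."/>
    <xsl:param name="visited" select="'|'"/>
    <xsl:if test="$pending">
      <xsl:variable name="current" select="$pending[1]"/>
      <xsl:variable name="rest" select="$pending[position() &gt; 1]"/>
      <xsl:choose>
        <xsl:when test="contains($visited, concat('|', $current/@id, '|'))">
          <!-- Already processed: drop it from the worklist and carry on. -->
          <xsl:call-template name="walk">
            <xsl:with-param name="pending" select="$rest"/>
            <xsl:with-param name="visited" select="$visited"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <!-- Process this resource once, then queue its dependencies. -->
          <xsl:apply-templates select="$current" mode="process"/>
          <xsl:call-template name="walk">
            <xsl:with-param name="pending"
                select="$rest | key('resource-id', $current/dependency/@idref)"/>
            <xsl:with-param name="visited"
                select="concat($visited, $current/@id, '|')"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>

  <!-- Placeholder for whatever the real per-resource processing is. -->
  <xsl:template match="resource" mode="process">
    <processed-resource id="{@id}"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the cyclic example with root-id set to 'a', this should emit one processed-resource element each for a, b, c and d and then stop, because every step either discards an already-visited resource or adds a new id to the list. Each step is a recursive call, so on very large dependency graphs a processor without tail-call optimization may run out of stack.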
Just for fun, here is another solution (following Dimitre's) that instead grows a node-set of visited nodes. I'm posting two stylesheets, one using node-set logic and the other using node-set comparison, because you will have to test which is faster for large XML inputs.
So, this stylesheet:
And this stylesheet:
Both produce:
(With the first input)
(With the last input)
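The node-set variant can be sketched the same way. Again this is an illustration under assumptions, not the answerer's stylesheet: it uses a hypothetical root-id parameter and a placeholder template in a hypothetical "process" mode. Here $visited holds the visited resource nodes themselves, and the next worklist is computed as a set difference with the XSLT 1.0 idiom $set[count(. | $visited) != count($visited)], which keeps only the nodes whose union with $visited would enlarge it (the "node-set logic" test). The id-comparison alternative is shown in a comment.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: the id of the root resource. -->
  <xsl:param name="root-id" select="'a'"/>

  <xsl:key name="resource-id" match="resource" use="@id"/>

  <xsl:template match="/">
    <processed>
      <xsl:call-template name="walk">
        <xsl:with-param name="pending" select="key('resource-id', $root-id)"/>
      </xsl:call-template>
    </processed>
  </xsl:template>

  <!-- Invariant: $pending never contains a node that is in $visited. -->
  <xsl:template name="walk">
    <xsl:param name="pending" select="/.."/>
    <xsl:param name="visited" select="/.."/>
    <xsl:if test="$pending">
      <xsl:variable name="current" select="$pending[1]"/>
      <xsl:variable name="now-visited" select="$visited | $current"/>
      <xsl:apply-templates select="$current" mode="process"/>
      <!-- Remaining work plus this resource's dependencies. -->
      <xsl:variable name="candidates"
          select="$pending[position() &gt; 1]
                  | key('resource-id', $current/dependency/@idref)"/>
      <xsl:call-template name="walk">
        <!-- Set difference: drop every candidate already in $now-visited.
             Comparison-based alternative, testing ids instead of identity:
             select="$candidates[not(@id = $now-visited/@id)]" -->
        <xsl:with-param name="pending"
            select="$candidates[count(. | $now-visited) != count($now-visited)]"/>
        <xsl:with-param name="visited" select="$now-visited"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Placeholder for whatever the real per-resource processing is. -->
  <xsl:template match="resource" mode="process">
    <processed-resource id="{@id}"/>
  </xsl:template>
</xsl:stylesheet>
```

Because visited nodes are never put back on the worklist, $visited grows by one node per step and the walk terminates even on cyclic input. Which of the two tests is faster is processor-dependent, so it is worth measuring on your own large inputs, as the answer suggests.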