Improving XSL performance
I am using the XSLT 2.0 code below to find the ids of the text nodes that contain the list of indices I give as input. The code works correctly, but in terms of performance it takes a long time for huge files. Even for huge files, if the index values are small, the result comes back within a few milliseconds. I am using the Saxon 9 HE Java processor to execute the XSL.
<xsl:variable name="insert-data" as="element(data)*">
  <xsl:for-each-group
      select="doc($insert-file)/insert-data/data"
      group-by="xsd:integer(@index)">
    <xsl:sort select="current-grouping-key()"/>
    <data index="{current-grouping-key()}"
          text-id="{generate-id(
            $main-root/descendant::text()[
              sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()
            ][1]
          )}">
      <xsl:copy-of select="current-group()/node()"/>
    </data>
  </xsl:for-each-group>
</xsl:variable>
In the above solution, if the index value is large, say 270962, the XSL takes 83427 ms to execute. For huge files with large index values, say 4605415 or 4605431, it takes several minutes. The computation of the variable "insert-data" seems to be what takes the time, even though it is a global variable and is computed only once. Should the XSL be addressed, or the processor? How can I improve the performance of the XSL?
1 Answer
I'd guess the problem is the generation of text-id, i.e. the expression

  $main-root/descendant::text()[
    sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()
  ][1]

You are potentially recalculating a lot of sums here. I think the easiest path would be to invert your approach: recurse across the text nodes in the document, aggregate the string length so far, and output data elements each time a new @index is reached. Note that each unique @index and each text node is then visited only once.
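A sketch of that inverted, single-pass approach in XSLT 2.0 might look like the following. The template and parameter names are illustrative (not from the original post), and it assumes $indexes is a sorted ascending sequence of the distinct index values (e.g. as produced by the question's xsl:for-each-group/xsl:sort); attaching the grouped data content is left out for brevity:

  <!-- Sketch only: walks the text nodes once, carrying the running
       character count, and emits a <data> element for each index as
       soon as it is reached. -->
  <xsl:template name="match-indexes">
    <xsl:param name="texts"   as="text()*"/>
    <xsl:param name="indexes" as="xs:integer*"/>
    <xsl:param name="length-so-far" as="xs:integer" select="0"/>
    <xsl:if test="exists($texts) and exists($indexes)">
      <xsl:variable name="new-length"
                    select="$length-so-far + string-length($texts[1])"/>
      <!-- every index that falls inside the current text node -->
      <xsl:for-each select="$indexes[. le $new-length]">
        <data index="{.}" text-id="{generate-id($texts[1])}"/>
      </xsl:for-each>
      <!-- recurse with the remaining text nodes and unmatched indexes -->
      <xsl:call-template name="match-indexes">
        <xsl:with-param name="texts"   select="$texts[position() gt 1]"/>
        <xsl:with-param name="indexes" select="$indexes[. gt $new-length]"/>
        <xsl:with-param name="length-so-far" select="$new-length"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

It would be invoked with something like $main-root/descendant::text() as the texts parameter. Each text node's length is added to the running total exactly once, so the cost is linear in the number of text nodes, instead of re-summing all preceding text for every index as in the original predicate. Saxon's tail-call optimization should keep the recursion depth manageable even for large documents.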