Improving XSL performance
I am using the XSLT 2.0 code below to find the ids of the text nodes that contain the list of indices I give as input. The code works correctly, but in terms of performance it takes a long time for huge files. Even for huge files, if the index values are small, the result comes back within a few milliseconds. I am using the Saxon 9 HE Java processor to execute the XSL.
<xsl:variable name="insert-data" as="element(data)*">
  <xsl:for-each-group
      select="doc($insert-file)/insert-data/data"
      group-by="xsd:integer(@index)">
    <xsl:sort select="current-grouping-key()"/>
    <data index="{current-grouping-key()}"
          text-id="{generate-id(
            $main-root/descendant::text()[
              sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()
            ][1]
          )}">
      <xsl:copy-of select="current-group()/node()"/>
    </data>
  </xsl:for-each-group>
</xsl:variable>
In the above solution, if the index value is large, say 270962, the XSL takes 83427 ms to execute. For huge files with large index values, say 4605415 or 4605431, it takes several minutes. The computation of the variable "insert-data" seems to be what takes the time, even though it is a global variable and is computed only once. Should the XSL be addressed, or the processor? How can I improve the performance of the XSL?
1 Answer
I'd guess the problem is the generation of text-id, i.e. the expression

  $main-root/descendant::text()[
    sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()
  ][1]

You are potentially recalculating a lot of sums here. I think the easiest path would be to invert your approach: recurse across the text nodes in the document, aggregate the string length so far, and output data elements each time a new @index is reached. Note that each unique @index and each text node is then visited only once.
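A sketch of that inverted, single-pass approach in XSLT 2.0 might look like the following. The template and parameter names are illustrative (not from the original post), and it assumes $indexes is a sorted ascending sequence of the distinct index values (e.g. as produced by the question's xsl:for-each-group/xsl:sort); attaching the grouped data content is left out for brevity:

  <!-- Sketch only: walks the text nodes once, carrying the running
       character count, and emits a <data> element for each index as
       soon as it is reached. -->
  <xsl:template name="match-indexes">
    <xsl:param name="texts"   as="text()*"/>
    <xsl:param name="indexes" as="xs:integer*"/>
    <xsl:param name="length-so-far" as="xs:integer" select="0"/>
    <xsl:if test="exists($texts) and exists($indexes)">
      <xsl:variable name="new-length"
                    select="$length-so-far + string-length($texts[1])"/>
      <!-- every index that falls inside the current text node -->
      <xsl:for-each select="$indexes[. le $new-length]">
        <data index="{.}" text-id="{generate-id($texts[1])}"/>
      </xsl:for-each>
      <!-- recurse with the remaining text nodes and unmatched indexes -->
      <xsl:call-template name="match-indexes">
        <xsl:with-param name="texts"   select="$texts[position() gt 1]"/>
        <xsl:with-param name="indexes" select="$indexes[. gt $new-length]"/>
        <xsl:with-param name="length-so-far" select="$new-length"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

It would be invoked with something like $main-root/descendant::text() as the texts parameter. Each text node's length is added to the running total exactly once, so the cost is linear in the number of text nodes, instead of re-summing all preceding text for every index as in the original predicate. Saxon's tail-call optimization should keep the recursion depth manageable even for large documents.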