Sorting XML elements with XSLT according to an order specified in an external document

Posted on 2024-10-07 13:48:37

Platform: Saxon 9 - XSLT 2.0

I have 3000 xml docs that need to be regularly edited, updated and saved.

Part of the process involves checking-out a document from a repository before editing, and publishing it at regular intervals when editing is complete.

Each document contains a series of individually named sections, e.g.

   <part>
        <meta>
            <place_id>12345</place_id>
            <place_name>London</place_name>
            <country_id>GB</country_id>
            <country_name>United Kingdom</country_name>
        </meta>
        <text>
            <docs>some blurb</docs>
            <airport>some blurb LGW LHR</airport>
            <trains>some blurb</trains>
            <hotels>some blurb</hotels>
            <health>some blurb</health>
            <attractions>some blurb</attractions>
        </text>
   </part>

Within the text element there are nearly 100 sections, and as with all editorial teams, they change their mind on the preferred order on an occasional, but regular, basis. Maybe twice per year.

At the moment, we present the XML doc sections to the editors IN THE CURRENT PREFERRED ORDER for editing and for publishing. This order is specified in a dynamically generated external document called 'stdhdg.xml', and appears something like this:

<hdgs>
    <hdg name="docs" newsort="10"/>
    <hdg name="airport" newsort="30"/>
    <hdg name="trains" newsort="20"/>
    <hdg name="hotels" newsort="40"/>
    <hdg name="health" newsort="60"/>
    <hdg name="attractions" newsort="50"/>
</hdgs>

where the preferred sort-order is specified by hdg/@newsort.

So I use a template like this to process the sections in the correct order:

<xsl:template match="text">
    <xsl:variable name="thetext" select="."/>
    <xsl:variable name="stdhead" select="document('stdhdg.xml')"/>
    <text>
        <xsl:for-each select="$stdhead//hdg">
            <xsl:sort data-type="number" order="ascending" select="@newsort"/>
            <xsl:variable name="tagname" select="@name"/>
            <xsl:variable name="thisnode" select="$thetext/*[local-name() = $tagname]"/>
            <xsl:apply-templates select="$thisnode"/>
        </xsl:for-each>
    </text>
</xsl:template>

But it seems very slow and cumbersome, and I feel that I should be using keys to speed it up.

Is there a simpler/neater way of doing this sorting operation?

(Please don't ask me to change the way the editors edit. That is more than my life's worth)

TIA

Feargal

溺渁∝ 2024-10-14 13:48:37

Yes, keys should speed up such a lookup. Here is an outline:

<xsl:stylesheet ...>

  <xsl:key name="k1" match="text/*" use="local-name()"/>

  <xsl:variable name="stdhead" select="document('stdhdg.xml')"/>

  ...

<xsl:template match="text">
    <xsl:variable name="thetext" select="."/>
    <text>
        <xsl:for-each select="$stdhead//hdg">
            <xsl:sort data-type="number" order="ascending" select="@newsort"/>
            <xsl:apply-templates select="key('k1', @name, $thetext)"/>
        </xsl:for-each>
    </text>
</xsl:template>

</xsl:stylesheet>

All typed directly in the browser, so take that as an outline of how to approach it, not as tested code.

[edit] As a second thought, I think sorting each time you process a text element is a waste, so you could change to

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:key name="k1" match="text/*" use="local-name()"/>

  <xsl:variable name="stdhead" select="document('stdhdg.xml')"/>

  <xsl:variable name="sorted-headers" as="element(hdg)*">
    <xsl:perform-sort select="$stdhead//hdg">
      <xsl:sort select="@newsort" data-type="number"/>
    </xsl:perform-sort>
  </xsl:variable>

<xsl:template match="text">
    <xsl:variable name="thetext" select="."/>
    <text>
        <xsl:for-each select="$sorted-headers">
            <xsl:apply-templates select="key('k1', @name, $thetext)"/>
        </xsl:for-each>
    </text>
</xsl:template>

</xsl:stylesheet>
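
One thing to watch with the outline above: it only outputs sections whose names appear in stdhdg.xml, so any section missing from that file is dropped. If that is not what you want, a possible variant is to sort the children of text directly and look up each one's sort value through a key on the header document. The following is an untested sketch, not a definitive implementation. The key name hdg-by-name is made up for illustration, and placing unlisted sections last is an assumption about the desired behaviour.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs" version="2.0">

  <xsl:output indent="yes"/>

  <!-- Index the hdg elements of stdhdg.xml by section name (hypothetical key name) -->
  <xsl:key name="hdg-by-name" match="hdg" use="@name"/>

  <xsl:variable name="stdhead" select="document('stdhdg.xml')"/>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text">
    <text>
      <xsl:apply-templates select="*">
        <!-- Sort key: the section's newsort value as a double.
             Sections not listed in stdhdg.xml get INF, so they sort last
             (assumption); the sort is stable, so they keep document order. -->
        <xsl:sort select="(key('hdg-by-name', local-name(), $stdhead)/@newsort/xs:double(.),
                           xs:double('INF'))[1]"/>
      </xsl:apply-templates>
    </text>
  </xsl:template>

</xsl:stylesheet>

Because the sort key is already an xs:double, no data-type attribute is needed on xsl:sort. The three-argument form of key() is what makes the lookup run against stdhdg.xml rather than the document being transformed.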

反差帅 2024-10-14 13:48:37

"Within the text element there are nearly 100 sections, and as with all editorial teams, they change their mind on the preferred order on an occasional, but regular, basis. Maybe twice per year.
. . . . . .
But it seems very slow and cumbersome and I feel that I should be using keys to speed it up"

Sorting the document each time it is presented for editing is the wrong approach.

The best solution is to sort it and save it sorted only twice per year, when the 'stdhdg.xml' document is changed.

If the changes to 'stdhdg.xml' cannot be synchronized well organizationally, you can have a repeating (say daily) job that runs the following transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="vHeaderLoc" select="'file:///C:/temp/deleteMe/stdhdg.xml'"/>

 <xsl:variable name="vHeaderDoc" select=
 "document($vHeaderLoc)"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "part/@hash
          [not(.
              = 
               string(document('file:///C:/temp/deleteMe/stdhdg.xml'))
              )
          ]">
  <xsl:attribute name="hash">
   <xsl:value-of select="string($vHeaderDoc)"/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match=
   "/*/text[not(/*/@hash
                = string(document('file:///C:/temp/deleteMe/stdhdg.xml'))
                )
            ]">
  <text>
   <xsl:apply-templates select="*">
    <xsl:sort data-type="number"
     select="$vHeaderDoc/*/hdg[@name=name(current())]"/>
   </xsl:apply-templates>
  </text>
 </xsl:template>
</xsl:stylesheet>

When the main content XML document (note that the top element now has a hash attribute) is:

<part hash="010203040506">
    <meta>
        <place_id>12345</place_id>
        <place_name>London</place_name>
        <country_id>GB</country_id>
        <country_name>United Kingdom</country_name>
    </meta>
    <text>
        <docs>some blurb</docs>
        <airport>some blurb LGW LHR</airport>
        <trains>some blurb</trains>
        <hotels>some blurb</hotels>
        <health>some blurb</health>
        <attractions>some blurb</attractions>
    </text>
</part>

and the stdhdg.xml file is:

<hdgs>
    <hdg name="docs">10</hdg>
    <hdg name="airport">30</hdg>
    <hdg name="trains">20</hdg>
    <hdg name="hotels">40</hdg>
    <hdg name="health">60</hdg>
    <hdg name="attractions">50</hdg>
</hdgs>

then the transformation above produces a newly-sorted main content that has the latest hash:

<part hash="103020406050">
   <meta>
      <place_id>12345</place_id>
      <place_name>London</place_name>
      <country_id>GB</country_id>
      <country_name>United Kingdom</country_name>
   </meta>
   <text>
      <docs>some blurb</docs>
      <trains>some blurb</trains>
      <airport>some blurb LGW LHR</airport>
      <hotels>some blurb</hotels>
      <attractions>some blurb</attractions>
      <health>some blurb</health>
   </text>
</part>

Do note:

The top element of the main content document now has a hash attribute, whose value is the concatenation of the sort keys residing in the stdhdg.xml document.
The format of the stdhdg.xml file is also slightly changed, so that the concatenation of the keys can be easily produced as the string value of the document.
The daily-run transformation is the identity transformation if the hash saved in the main content is the same as the sort-keys concatenation in stdhdg.xml.
If the old hash does not match the sort keys in stdhdg.xml, then it is updated to the new hash and the sections are re-sorted.
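
Since the question targets Saxon 9 and XSLT 2.0, the same idea can probably be written a little more compactly there. XSLT 1.0 does not allow variable references in match patterns, which is why the document() call is repeated in the patterns above; XSLT 2.0 does allow references to global variables and parameters. The following is an untested sketch under that assumption. The variable name vHash and the relative default for vHeaderLoc are introduced here for illustration only.

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Hypothetical default location; pass the real one as a parameter -->
 <xsl:param name="vHeaderLoc" select="'stdhdg.xml'"/>

 <xsl:variable name="vHeaderDoc" select="document($vHeaderLoc)"/>

 <!-- Concatenation of the sort keys, computed once (hypothetical name) -->
 <xsl:variable name="vHash" select="string($vHeaderDoc)"/>

 <!-- Identity template: copy everything unchanged by default -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- Stale hash: replace it with the current one -->
 <xsl:template match="part/@hash[not(. = $vHash)]">
  <xsl:attribute name="hash" select="$vHash"/>
 </xsl:template>

 <!-- Stale hash: re-sort the sections -->
 <xsl:template match="/*/text[not(/*/@hash = $vHash)]">
  <text>
   <xsl:apply-templates select="*">
    <xsl:sort data-type="number"
     select="$vHeaderDoc/*/hdg[@name = name(current())]"/>
   </xsl:apply-templates>
  </text>
 </xsl:template>
</xsl:stylesheet>

As in the 1.0 version, this relies on xsl:strip-space so that the string value of stdhdg.xml is just the concatenated keys, and it expects the top element to already carry a hash attribute.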
