XSLT: Merging a set of tree hierarchies

Posted on 2024-07-15 22:01:56

I have an XML document based on what Excel produces when saving as "XML Spreadsheet 2003 (*.xml)".

The spreadsheet itself contains a header section with a hierarchy of labels:

 | A     B     C     D     E     F     G     H     I
-+-----------------------------------------------------
1| a1                                  a2
2| a11         a12         a13         a21   a22
3| a111  a112  a121  a122  a131  a132        a221  a222

This hierarchy is present on all sheets in the workbook, and looks more or less the same everywhere.

Excel XML is structured just like an ordinary HTML table: <Row> elements that contain <Cell> elements. I have been able to transform everything into a tree structure like this:

<node title="a1" col="1">
  <node title="a11" col="1">
    <node title="a111" col="1"/>
    <node title="a112" col="2"/>
  </node>
  <node title="a12" col="3">
    <node title="a121" col="3" />
    <node title="a122" col="4" />
  </node>
  <!-- and so on -->
</node>
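For readers who still need this step, here is a minimal sketch of turning the header rows into that tree. It uses Python's standard xml.etree.ElementTree rather than XSLT and is not the asker's code. It assumes each header row has already been reduced to a dict that maps a 1-based column number to its label. It also assumes a label's parent is the label in the row above, either in the same column or in the nearest column to its left. That holds for the merged-cell layout in the grid above. The name headers_to_tree is hypothetical.

```python
import xml.etree.ElementTree as ET

def headers_to_tree(rows):
    """Build nested <node title=".." col=".."> elements from header rows.

    rows: list of dicts (top header row first), each mapping a 1-based
    column number to the label in that column. Assumption: a label's parent
    is the label in the row above at the same column or nearest to its left.
    """
    root = ET.Element("t")
    previous = {}  # column -> element created for the row above
    for row in rows:
        current = {}
        for col in sorted(row):
            node = ET.Element("node", title=row[col], col=str(col))
            # Nearest label in the previous row at or left of this column;
            # labels with no such candidate become top-level nodes.
            candidates = [c for c in previous if c <= col]
            parent = previous[max(candidates)] if candidates else root
            parent.append(node)
            current[col] = node
        previous = current
    return root

# The header grid from the question (A=1 ... I=9).
grid = [
    {1: "a1", 7: "a2"},
    {1: "a11", 3: "a12", 5: "a13", 7: "a21", 8: "a22"},
    {1: "a111", 2: "a112", 3: "a121", 4: "a122",
     5: "a131", 6: "a132", 8: "a221", 9: "a222"},
]
tree = headers_to_tree(grid)
print([n.get("title") for n in tree[0]])  # ['a11', 'a12', 'a13']
```

The "nearest column to the left" rule is what lets a merged parent cell (such as "a1" spanning A to F) own every child label under its span, without the merge width having to be known.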

But here is the complication:

  • there is more than one worksheet, so there is a tree for each of them
  • the hierarchy may be slightly different on each sheet, the trees will not be equal (for example, sheet 2 may have "a113", while the others don't)
  • tree depth is not explicitly limited
  • the labels however are meant to be the same across all sheets, which means they can be used for grouping

I'd like to merge these separate trees into one that looks like this:

<node title="a1">
  <col on="sheet1">1</col>
  <col on="sheet2">1</col>
  <node title="a11">
    <col on="sheet1">1</col>
    <col on="sheet2">1</col>
    <node title="a111">
      <col on="sheet1">1</col>
      <col on="sheet2">1</col>
    </node>
    <node title="a112">
      <col on="sheet1">2</col>
      <col on="sheet2">2</col>
    </node>
    <node title="a113"><!-- different here -->
      <col on="sheet2">3</col>
    </node>
  </node>
  <node title="a12">
    <col on="sheet1">3</col>
    <col on="sheet2">4</col>
    <node title="a121">
      <col on="sheet1">3</col>
      <col on="sheet2">4</col>
    </node>
    <node title="a122">
      <col on="sheet1">4</col>
      <col on="sheet2">5</col>
    </node>
  </node>
  <!-- and so on -->
</node>

Ideally I'd like to be able to do the merge before I even build the tree structure from the Excel XML (if you can get me started on this, that would be great). But since I have no idea how to do this, a merge after the trees have been built (i.e. the situation described above) will be fine.
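One possible starting point for working on the Excel XML directly is to read the first few rows of every Worksheet into {column: label} maps. It is sketched here in Python rather than XSLT. The sketch relies on several assumptions:

- the SpreadsheetML 2003 namespace is urn:schemas-microsoft-com:office:spreadsheet;
- the header occupies the first n_rows Row elements, with no skipped rows;
- a cell without ss:Index sits in the column after the previous cell, or after the end of the previous cell's ss:MergeAcross span.

The function name header_rows and the sample workbook are illustrative.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

def header_rows(xml_text, n_rows):
    """Return {sheet name: [{col: label}, ...]} for the first n_rows rows."""
    workbook = ET.fromstring(xml_text)
    sheets = {}
    for ws in workbook.findall("ss:Worksheet", NS):
        name = ws.get(f"{{{SS}}}Name")
        rows = []
        # Assumption: header rows are the first n_rows <Row> elements.
        for row in ws.findall("ss:Table/ss:Row", NS)[:n_rows]:
            labels, col = {}, 0
            for cell in row.findall("ss:Cell", NS):
                # ss:Index jumps to an explicit 1-based column.
                col = int(cell.get(f"{{{SS}}}Index", col + 1))
                data = cell.find("ss:Data", NS)
                if data is not None and data.text:
                    labels[col] = data.text.strip()
                # A merged cell occupies MergeAcross extra columns.
                col += int(cell.get(f"{{{SS}}}MergeAcross", 0))
            rows.append(labels)
        sheets[name] = rows
    return sheets

# A tiny hypothetical workbook: "a1" merged over A-F, "a2" over G-I.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="sheet1"><Table>
  <Row><Cell ss:MergeAcross="5"><Data ss:Type="String">a1</Data></Cell>
       <Cell ss:MergeAcross="2"><Data ss:Type="String">a2</Data></Cell></Row>
  <Row><Cell ss:MergeAcross="1"><Data ss:Type="String">a11</Data></Cell>
       <Cell ss:Index="7"><Data ss:Type="String">a21</Data></Cell></Row>
  <Row><Cell><Data ss:Type="Number">42</Data></Cell></Row>
 </Table></Worksheet>
</Workbook>"""

sheets = header_rows(SAMPLE, 2)
print(sheets)
```

The labels are meant to be the same on every sheet, so these per-sheet maps can be grouped by label before any tree is built. It is worth checking the column arithmetic against a file actually saved by your Excel version, for example whether it writes ss:Index after merged cells.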

Thanks for your time. :)

Comments (2)

唯憾梦倾城 2024-07-22 22:01:56

Here is one possible solution in XSLT 1.0:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/*">
      <t>
        <xsl:apply-templates
           select="node[@title='a1'][1]">
          <xsl:with-param name="pOther"
            select="node[@title='a1'][2]"/>
        </xsl:apply-templates>
      </t>
    </xsl:template>

    <xsl:template match="node">
      <xsl:param name="pOther"/>

      <node title="{@title}">
        <col on="sheet1">
          <xsl:value-of select="@col"/>
        </col>
          <xsl:choose>
            <xsl:when test="not($pOther)">
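              <!-- select="node" so that whitespace-only text children are not copied by the built-in text template -->
              <xsl:apply-templates select="node" mode="copy">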
                <xsl:with-param name="pSheet" select="'sheet1'"/>
              </xsl:apply-templates>
            </xsl:when>
            <xsl:otherwise>
              <col on="sheet2">
                <xsl:value-of select="$pOther/@col"/>
              </col>
              <xsl:for-each select=
                "node[@title = $pOther/node/@title]">

                <xsl:apply-templates select=".">
                  <xsl:with-param name="pOther" select=
                   "$pOther/node[@title = current()/@title]"/>
                </xsl:apply-templates>
              </xsl:for-each>

              <xsl:apply-templates mode="copy" select=
                "node[not(@title = $pOther/node/@title)]">
                <xsl:with-param name="pSheet" select="'sheet1'"/>
              </xsl:apply-templates>

              <xsl:apply-templates mode="copy" select=
                "$pOther/node[not(@title = current()/node/@title)]">
                <xsl:with-param name="pSheet" select="'sheet2'"/>
              </xsl:apply-templates>
            </xsl:otherwise>
          </xsl:choose>
      </node>
    </xsl:template>

    <xsl:template match="node" mode="copy">
      <xsl:param name="pSheet"/>

      <node title="{@title}">
        <col on="{$pSheet}">
          <xsl:value-of select="@col"/>
        </col>

        <xsl:apply-templates select="node" mode="copy">
          <xsl:with-param name="pSheet" select="$pSheet"/>
        </xsl:apply-templates>
      </node>
    </xsl:template>
</xsl:stylesheet>

When the above transformation is applied to this XML document (the two XML documents concatenated under a common top node, which is left as an exercise for the reader :) ):

<t>
    <node title="a1" col="1">
        <node title="a11" col="1">
            <node title="a111" col="1"/>
            <node title="a112" col="2"/>
        </node>
        <node title="a12" col="3">
            <node title="a121" col="3" />
            <node title="a122" col="4" />
        </node>
        <!-- and so on -->
    </node>
    <node title="a1" col="1">
        <node title="a11" col="1">
            <node title="a111" col="1"/>
            <node title="a112" col="2"/>
            <node title="a113" col="3"/>
        </node>
        <node title="a12" col="4">
            <node title="a121" col="4" />
            <node title="a122" col="5" />
        </node>
        <!-- and so on -->
    </node>
</t>

The wanted result is produced:

<t>
    <node title="a1">
        <col on="sheet1">1</col>
        <col on="sheet2">1</col>
        <node title="a11">
            <col on="sheet1">1</col>
            <col on="sheet2">1</col>
            <node title="a111">
                <col on="sheet1">1</col>
                <col on="sheet2">1</col>
            </node>
            <node title="a112">
                <col on="sheet1">2</col>
                <col on="sheet2">2</col>
            </node>
            <node title="a113">
                <col on="sheet2">3</col>
            </node>
        </node>
        <node title="a12">
            <col on="sheet1">3</col>
            <col on="sheet2">4</col>
            <node title="a121">
                <col on="sheet1">3</col>
                <col on="sheet2">4</col>
            </node>
            <node title="a122">
                <col on="sheet1">4</col>
                <col on="sheet2">5</col>
            </node>
        </node>
    </node>
</t>

Do note the following:

  1. We assume that both top-level node elements have "a1" as the value of their title attribute. This can easily be generalized.

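  2. The template matching node has a parameter named pOther, which holds the corresponding node element from the other document. In the recursive calls this template is applied only when $pOther exists. The not($pOther) branch matters only for the top-level call, when the second document has no matching top node.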

  3. When no corresponding node element exists, another template is applied instead. It also matches node, but in mode copy. This template has a parameter named pSheet, whose value is the name (a string) of the sheet that the element belongs to.

∞琼窗梦回ˉ 2024-07-22 22:01:56

How about a callable template taking the sheet number as a parameter, which examines the input and returns the correct "col" node if it appears in that sheet's XML, and nothing if it doesn't. At each node, call it once for each sheet.

To merge the trees, maybe a template that looks for all children of the current node in any sheet, and recurses on itself for each of them.

Sorry, no sample code; I find writing XSLT pretty slow, probably because I don't do it often. So I may well have missed something crucial. But putting it all together would give something like:

  • get the title of "/node". With that title:
    • search all sheets for this title, emitting the "col" node for each
    • search all sheets for children of nodes with this title (discarding duplicates)
    • recurse on each of those titles.
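The outline above can be sketched in Python with xml.etree.ElementTree. This is neither XSLT nor code from this answer, only an illustration of the same recursion. At each level it does three things:

- collects the titles of the sibling nodes on every sheet, dropping duplicates while keeping first-seen order;
- emits one col element for each sheet that has the title;
- recurses into the matching children.

Unlike the two-document stylesheet in the other answer, it works for any number of sheets. The function names merge_trees and _merge_level are hypothetical. It assumes that titles are unique among siblings on any one sheet.

```python
import xml.etree.ElementTree as ET

def merge_trees(sheets):
    """Merge per-sheet <node title=".." col=".."> trees into one tree.

    sheets: list of (sheet_name, root) pairs; each root holds top-level
    <node> elements. Nodes are matched by title under the same parent.
    """
    merged = ET.Element("t")
    _merge_level(merged, [(name, list(root)) for name, root in sheets])
    return merged

def _merge_level(out_parent, groups):
    # groups: (sheet_name, sibling <node> elements on that sheet)
    titles = []
    for _, siblings in groups:
        for n in siblings:
            if n.get("title") not in titles:  # discard duplicates, keep order
                titles.append(n.get("title"))
    for title in titles:
        out = ET.SubElement(out_parent, "node", title=title)
        child_groups = []
        for name, siblings in groups:
            match = next((n for n in siblings if n.get("title") == title), None)
            if match is None:
                continue  # title absent on this sheet: no <col> for it
            col = ET.SubElement(out, "col", on=name)
            col.text = match.get("col")
            child_groups.append((name, list(match)))
        _merge_level(out, child_groups)  # recurse on the children's titles

SHEET1 = """<t><node title="a1" col="1">
  <node title="a11" col="1"><node title="a111" col="1"/><node title="a112" col="2"/></node>
  <node title="a12" col="3"><node title="a121" col="3"/><node title="a122" col="4"/></node>
</node></t>"""
SHEET2 = """<t><node title="a1" col="1">
  <node title="a11" col="1"><node title="a111" col="1"/><node title="a112" col="2"/>
    <node title="a113" col="3"/></node>
  <node title="a12" col="4"><node title="a121" col="4"/><node title="a122" col="5"/></node>
</node></t>"""

merged = merge_trees([("sheet1", ET.fromstring(SHEET1)),
                      ("sheet2", ET.fromstring(SHEET2))])
print(ET.tostring(merged, encoding="unicode"))
```

Because titles are gathered in sheet order, a title that first appears on a later sheet is placed after the earlier sheet's siblings. This is why "a113" ends up after "a112", as in the wanted output.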

Here are some snippets for removing duplicates in various ways:

http://www.dpawson.co.uk/xsl/sect2/N2696.html

Reading multiple documents is processor-dependent, but if all else fails a bit of cut-and-pastery with any old scripting language would probably do, provided that you know they'll all have the same encoding, don't use conflicting ids, and so on.
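Instead of textual cut-and-paste, the concatenation step can also be done with a real XML parser, sketched here in Python. Passing each document as bytes lets the parser honour that document's own encoding declaration, so the files do not all need the same encoding. Conflicting ids are not handled. The function name concatenate and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def concatenate(documents, root_tag="t"):
    """Wrap the root element of each XML document in one common root element."""
    combined = ET.Element(root_tag)
    for doc in documents:
        # Parsing bytes lets each document's own encoding declaration be honoured.
        combined.append(ET.fromstring(doc))
    return combined

# Two hypothetical per-sheet trees saved with different encodings.
doc1 = '<?xml version="1.0" encoding="UTF-8"?><node title="a1" col="1"/>'.encode("utf-8")
doc2 = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
        '<node title="a1" col="1"><node title="a1\u00e4" col="2"/></node>'
        ).encode("iso-8859-1")

combined = concatenate([doc1, doc2])
print(ET.tostring(combined, encoding="unicode"))
```

For real files, ET.parse(path).getroot() can replace ET.fromstring. The result can then be saved with ET.ElementTree(combined).write(path, encoding="utf-8", xml_declaration=True).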
