XSLT: Merging a set of tree hierarchies
I have an XML document based on what Excel produces when saving as "XML Spreadsheet 2003 (*.xml)".
The spreadsheet itself contains a header section with a hierarchy of labels:
 |  A     B     C     D     E     F     G     H     I
-+-----------------------------------------------------
1|  a1                                  a2
2|  a11         a12         a13         a21   a22
3|  a111  a112  a121  a122  a131  a132        a221  a222
This hierarchy is present on all sheets in the workbook, and looks more or less the same everywhere.
Excel XML works exactly like an ordinary HTML table (<row>s that contain <cell>s). I have been able to transform everything into a tree structure like this:
<node title="a1" col="1">
  <node title="a11" col="1">
    <node title="a111" col="1"/>
    <node title="a112" col="2"/>
  </node>
  <node title="a12" col="3">
    <node title="a121" col="3"/>
    <node title="a122" col="4"/>
  </node>
  <!-- and so on -->
</node>
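A minimal sketch of that row-to-tree step, in Python with the standard xml.etree.ElementTree module rather than XSLT (it is not the asker's code). It assumes each header row has already been reduced to (column, label) pairs for its non-empty cells, sorted by 1-based column, and that a label starts in the leftmost column of the span it covers. A label's children are then the labels on the next row that lie between its own column and the next label on its row. The names build, header_tree and HEADER are hypothetical, and the sample columns are read off the header above, assuming a2 starts in column G and a21 has no children.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one list per header row, holding (column, label)
# pairs for the non-empty cells, sorted by 1-based column (A = 1).
HEADER = [
    [(1, "a1"), (7, "a2")],
    [(1, "a11"), (3, "a12"), (5, "a13"), (7, "a21"), (8, "a22")],
    [(1, "a111"), (2, "a112"), (3, "a121"), (4, "a122"),
     (5, "a131"), (6, "a132"), (8, "a221"), (9, "a222")],
]


def build(rows, parent, depth, start, end):
    """Attach the labels of rows[depth] whose column lies in [start, end)."""
    if depth >= len(rows):
        return
    cells = [(c, t) for c, t in rows[depth] if start <= c < end]
    for i, (col, title) in enumerate(cells):
        # A label spans up to the next label on its row, or to its parent's end.
        stop = cells[i + 1][0] if i + 1 < len(cells) else end
        node = ET.SubElement(parent, "node", title=title, col=str(col))
        build(rows, node, depth + 1, col, stop)


def header_tree(rows, width):
    """Return a <sheet> element holding one <node> tree per top-level label."""
    root = ET.Element("sheet")
    build(rows, root, 0, 1, width + 1)
    return root


if __name__ == "__main__":
    print(ET.tostring(header_tree(HEADER, 9), encoding="unicode"))
```

The recursion passes each label's column span down to the next row, so tree depth is limited only by the number of header rows.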
But here is the complication:
- there is more than one worksheet, so there is a tree for each of them
- the hierarchy may differ slightly from sheet to sheet, so the trees will not be identical (for example, sheet 2 may have "a113" while the others don't)
- tree depth is not explicitly limited
- the labels however are meant to be the same across all sheets, which means they can be used for grouping
I'd like to merge these separate trees into one that looks like this:
<node title="a1">
  <col on="sheet1">1</col>
  <col on="sheet2">1</col>
  <node title="a11">
    <col on="sheet1">1</col>
    <col on="sheet2">1</col>
    <node title="a111">
      <col on="sheet1">1</col>
      <col on="sheet2">1</col>
    </node>
    <node title="a112">
      <col on="sheet1">2</col>
      <col on="sheet2">2</col>
    </node>
    <node title="a113"><!-- different here -->
      <col on="sheet2">3</col>
    </node>
  </node>
  <node title="a12">
    <col on="sheet1">3</col>
    <col on="sheet2">4</col>
    <node title="a121">
      <col on="sheet1">3</col>
      <col on="sheet2">4</col>
    </node>
    <node title="a122">
      <col on="sheet1">4</col>
      <col on="sheet2">5</col>
    </node>
  </node>
  <!-- and so on -->
</node>
Ideally I'd like to do the merge before I even build the tree structure from the Excel XML (if you could get me started on that, it'd be great). But since I have no idea how to do this, a merge after the trees have been built (i.e. the situation described above) will be fine.
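One way to merge before any tree exists, sketched in Python rather than XSLT and resting on several assumptions: each sheet's header rows are already lists of (column, label) pairs sorted by column, every header level is filled, and each label sits in the leftmost column of its span. Under those assumptions a label's parent on each row above is the nearest label at or to the left of its column, so every label gets a key made of the titles above it plus its own. Columns are collected per sheet under that key and the merged tree is built once at the end. The names label_paths, merge_headers, SHEET1 and SHEET2 are hypothetical; SHEET2 mirrors the "a113" example above.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_right


def label_paths(rows):
    """Yield (title path, column) for every header label, top row first.

    Assumes a label starts in the leftmost column of its span, so its parent
    on each row above is the nearest label at or left of its column.
    """
    for depth, row in enumerate(rows):
        for col, title in row:
            path = []
            for above in rows[:depth]:
                i = bisect_right([c for c, _ in above], col) - 1
                if i < 0:
                    raise ValueError(f"no parent label for {title!r}")
                path.append(above[i][1])
            yield tuple(path) + (title,), col


def merge_headers(sheets):
    """Map of sheet name -> header rows in, one merged <node> tree out."""
    cols = {}  # title path -> {sheet name: column}; dicts keep first-seen order
    for name, rows in sheets.items():
        for path, col in label_paths(rows):
            cols.setdefault(path, {})[name] = col
    root = ET.Element("merged")
    elements = {(): root}
    for path, per_sheet in cols.items():
        # A parent path is always recorded before its children (rows go top-down).
        node = ET.SubElement(elements[path[:-1]], "node", title=path[-1])
        for name, col in per_sheet.items():
            ET.SubElement(node, "col", on=name).text = str(col)
        elements[path] = node
    return root


# Hypothetical sample headers: sheet2 has an extra "a113" in column C.
SHEET1 = [
    [(1, "a1")],
    [(1, "a11"), (3, "a12")],
    [(1, "a111"), (2, "a112"), (3, "a121"), (4, "a122")],
]
SHEET2 = [
    [(1, "a1")],
    [(1, "a11"), (4, "a12")],
    [(1, "a111"), (2, "a112"), (3, "a113"), (4, "a121"), (5, "a122")],
]

if __name__ == "__main__":
    merged = merge_headers({"sheet1": SHEET1, "sheet2": SHEET2})
    print(ET.tostring(merged, encoding="unicode"))
```

Labels that appear only on a later sheet are appended after their existing siblings. That matches the a113 case, but a label new on sheet 2 placed before its siblings would still end up last.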
Thanks for your time. :)
Here is one possible solution in XSLT 1.0:
When the above transformation is applied on this XML document (the concatenation of the two XML documents under a common top node -- left as an exercise for the reader :) ):
The wanted result is produced:
Do note the following:
- We suppose that both top node elements have "a1" as the value of their title attribute. This can easily be generalized.
- The template matching node has a parameter named pOther, which is the corresponding element named node from the other document. This template is applied only if $pOther exists.
- When no corresponding element named node exists, another template, also matching node but in mode copy, is applied. This template has a parameter named pSheet, the value of which is the name (a string) of the sheet this element belongs to.
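The two templates described in this answer can be mirrored by two recursive functions. This is a Python transcription of the idea, not the answer's XSLT: merge_pair plays the role of the template that receives pOther, copy_node that of the mode="copy" template with pSheet, and counterparts are matched by title. Like the answer, it assumes the two top nodes already correspond (both "a1") and merges exactly two sheets. The function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def copy_node(node, sheet):
    """Like the mode="copy" template: a node present on one sheet only."""
    out = ET.Element("node", title=node.get("title"))
    ET.SubElement(out, "col", on=sheet).text = node.get("col")
    for child in node.findall("node"):
        out.append(copy_node(child, sheet))
    return out


def merge_pair(node, other, sheet, other_sheet):
    """Like the template with the pOther parameter: called only when both exist."""
    out = ET.Element("node", title=node.get("title"))
    ET.SubElement(out, "col", on=sheet).text = node.get("col")
    ET.SubElement(out, "col", on=other_sheet).text = other.get("col")
    # Counterparts are matched by title; leftovers exist only on other_sheet.
    unmatched = {c.get("title"): c for c in other.findall("node")}
    for child in node.findall("node"):
        match = unmatched.pop(child.get("title"), None)
        if match is None:
            out.append(copy_node(child, sheet))
        else:
            out.append(merge_pair(child, match, sheet, other_sheet))
    for child in unmatched.values():  # e.g. a113, found on sheet2 only
        out.append(copy_node(child, other_sheet))
    return out


SHEET1 = """<node title="a1" col="1">
  <node title="a11" col="1"><node title="a111" col="1"/><node title="a112" col="2"/></node>
  <node title="a12" col="3"><node title="a121" col="3"/><node title="a122" col="4"/></node>
</node>"""
SHEET2 = """<node title="a1" col="1">
  <node title="a11" col="1"><node title="a111" col="1"/><node title="a112" col="2"/>
    <node title="a113" col="3"/></node>
  <node title="a12" col="4"><node title="a121" col="4"/><node title="a122" col="5"/></node>
</node>"""

if __name__ == "__main__":
    merged = merge_pair(ET.fromstring(SHEET1), ET.fromstring(SHEET2), "sheet1", "sheet2")
    print(ET.tostring(merged, encoding="unicode"))
```

Children found only in the second tree are appended after the matched ones; with more than two sheets, this pairwise scheme would have to be applied repeatedly.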
How about a callable template taking the sheet number as a parameter, which examines the input and returns the correct "col" node if it appears in that sheet's XML, and nothing if it doesn't. At each node, call it once for each sheet.
To merge the trees, maybe a template that looks for all children of the current node in any sheet, and recurses on itself for each of them.
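A rough rendering of those two steps, in Python with xml.etree.ElementTree rather than XSLT (and not the answerer's code): find_child stands in for the per-sheet lookup that is called once per sheet at each node, and merge_level for the template that recurses over the union of all sheets' child titles, with duplicates removed. Unlike a pairwise merge, it handles any number of sheets. All names and the sample data are hypothetical, and labels are matched by title only.

```python
import xml.etree.ElementTree as ET


def find_child(parent, title):
    """Per-sheet lookup: the child <node> with this title, or None if absent."""
    return next((c for c in parent.findall("node") if c.get("title") == title), None)


def merge_level(parents, out):
    """parents maps sheet name -> element on that sheet; merged children go into out."""
    titles = {}  # dict used as an ordered set: union of child titles, no duplicates
    for parent in parents.values():
        for child in parent.findall("node"):
            titles.setdefault(child.get("title"), None)
    for title in titles:
        node = ET.SubElement(out, "node", title=title)
        found = {}
        for sheet, parent in parents.items():  # one lookup per sheet
            child = find_child(parent, title)
            if child is not None:
                ET.SubElement(node, "col", on=sheet).text = child.get("col")
                found[sheet] = child
        merge_level(found, node)  # recurse only into sheets that have this label


def merge_all(sheets):
    """sheets maps sheet name -> root element whose children are the top <node>s."""
    merged = ET.Element("merged")
    merge_level(sheets, merged)
    return merged


# Hypothetical sample: sheet3 lacks a12 but has an extra top-level a2.
SHEETS = {
    "sheet1": '<sheet><node title="a1" col="1"><node title="a11" col="1"/>'
              '<node title="a12" col="3"/></node></sheet>',
    "sheet2": '<sheet><node title="a1" col="1"><node title="a11" col="1"/>'
              '<node title="a12" col="4"/></node></sheet>',
    "sheet3": '<sheet><node title="a1" col="1"><node title="a11" col="1"/></node>'
              '<node title="a2" col="2"/></sheet>',
}

if __name__ == "__main__":
    roots = {name: ET.fromstring(xml) for name, xml in SHEETS.items()}
    print(ET.tostring(merge_all(roots), encoding="unicode"))
```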
Sorry, no sample code: I find writing XSLT pretty slow, probably because I don't do it often, so I may well have missed something crucial. But putting it all together would give something like:
Here are some snippets for removing duplicates in various ways:
http://www.dpawson.co.uk/xsl/sect2/N2696.html
Reading multiple documents is processor-dependent, but if all else fails a bit of cut-and-pastery with any old scripting language would probably do, provided that you know they'll all have the same encoding, don't use conflicting ids, and so on.
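If the per-sheet trees live in separate files, the "concatenation under a common top node" can be done with a short script instead of literal cut-and-paste. A Python sketch with the standard xml.etree.ElementTree module follows; concatenate, SAMPLE and the sheets/sheet wrapper elements are hypothetical, and the layout a given XSLT transformation expects may differ. Because each file is parsed rather than pasted, its XML declaration is read by the parser instead of ending up in the middle of the combined document; id clashes and similar issues would still need checking.

```python
import xml.etree.ElementTree as ET


def concatenate(docs):
    """docs maps sheet name -> raw bytes of that sheet's XML file.

    Each file is parsed rather than pasted as text, so its own XML declaration
    is read by the parser instead of landing in the middle of the result.
    The <sheets>/<sheet name="..."> wrapper is a hypothetical choice.
    """
    top = ET.Element("sheets")
    for name, data in docs.items():
        ET.SubElement(top, "sheet", name=name).append(ET.fromstring(data))
    return ET.tostring(top, encoding="unicode")


# Hypothetical per-sheet files; with real files, use open(path, "rb").read().
SAMPLE = {
    "sheet1": b'<?xml version="1.0" encoding="UTF-8"?>\n<node title="a1" col="1"/>',
    "sheet2": b'<?xml version="1.0" encoding="UTF-8"?>\n'
              b'<node title="a1" col="1"><node title="a113" col="3"/></node>',
}

if __name__ == "__main__":
    print(concatenate(SAMPLE))
```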