Tricky XSLT transformation
I have some loosely structured XHTML data that I need to convert to better-structured XML.
Here's an example:
<tbody>
  <tr>
    <td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td>
  </tr>
  <tr>
    <td>Green</td>
    <td>Round shaped</td>
    <td>Tasty</td>
  </tr>
  <tr>
    <td>Red</td>
    <td>Round shaped</td>
    <td>Bitter</td>
  </tr>
  <tr>
    <td>Pink</td>
    <td>Round shaped</td>
    <td>Tasty</td>
  </tr>
  <tr>
    <td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td>
  </tr>
  <tr>
    <td>Red</td>
    <td>Heart shaped</td>
    <td>Super tasty</td>
  </tr>
  <tr>
    <td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td>
  </tr>
  <tr>
    <td>Yellow</td>
    <td>Smile shaped</td>
    <td>Fairly tasty</td>
  </tr>
  <tr>
    <td>Brown</td>
    <td>Smile shaped</td>
    <td>Too sweet</td>
  </tr>
</tbody>
I am trying to produce the following structure:
<data>
  <entry>
    <type>Apples</type>
    <country>Portugal</country>
    <rank>First Grade</rank>
    <color>Green</color>
    <shape>Round shaped</shape>
    <taste>Tasty</taste>
  </entry>
  <entry>
    <type>Apples</type>
    <country>Portugal</country>
    <rank>First Grade</rank>
    <color>Red</color>
    <shape>Round shaped</shape>
    <taste>Bitter</taste>
  </entry>
  <entry>
    <type>Apples</type>
    <country>Portugal</country>
    <rank>First Grade</rank>
    <color>Pink</color>
    <shape>Round shaped</shape>
    <taste>Tasty</taste>
  </entry>
  <entry>
    <type>Strawberries</type>
    <country>USA</country>
    <rank>Fifth Grade</rank>
    <color>Red</color>
    <shape>Heart shaped</shape>
    <taste>Super tasty</taste>
  </entry>
  <entry>
    <type>Bananas</type>
    <country>Congo</country>
    <rank>Third Grade</rank>
    <color>Yellow</color>
    <shape>Smile shaped</shape>
    <taste>Fairly tasty</taste>
  </entry>
  <entry>
    <type>Bananas</type>
    <country>Congo</country>
    <rank>Third Grade</rank>
    <color>Brown</color>
    <shape>Smile shaped</shape>
    <taste>Too sweet</taste>
  </entry>
</data>
First, I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the grade from the text of tbody/tr/td itself.
Next, I need to output every item under each category as its own entry, repeating those three values in each one (as shown above).
But, as you can see, the data I was given is very loosely structured: a category is simply a header td, followed by rows for all of the items in that category. To make things worse, in my datasets the number of items under each category varies between 1 and 100.
I've tried a few approaches but just can't seem to get it. Any help is greatly appreciated. I know that XSLT 2.0 introduces xsl:for-each-group, but I am limited to XSLT 1.0.
Answer
In this case, you are not actually grouping elements. It is more like ungrouping them.
One way to do this is to use an xsl:key to look up the "header" row for each of the detail rows, i.e. for each detail row, get the nearest preceding header row.
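A minimal sketch of such a key in XSLT 1.0, placed at the top level of the xsl:stylesheet, is shown below. The key name fruity is an arbitrary choice, and the patterns assume, as in the sample, that the rows are in no namespace. Because preceding-sibling is a reverse axis, the [1] predicate picks the header row closest to the detail row rather than the first header in the table.

```xml
<!-- Index every detail row (a tr without a header cell) by the
     generated id of the nearest header row above it -->
<xsl:key name="fruity"
         match="tr[not(td[@class='header'])]"
         use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>
```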
Next, match all of your header rows, so that each header cell is processed by its own template.
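For example, assuming tbody is the document element as in the sample, a template like the following could create the data wrapper and apply templates to the header cells only, leaving the detail rows to be reached through the key:

```xml
<xsl:template match="/tbody">
  <data>
    <!-- Process only the header cells; each one emits the entries of its group -->
    <xsl:apply-templates select="tr/td[@class='header']"/>
  </data>
</xsl:template>
```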
Within the matching template, you can then extract the type, country and rank. To get the associated detail rows, it is then simply a case of looking up the key for the parent row, i.e. the tr that contains the header cell.
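One way to write that header template is sketched below. It assumes the icon file names always follow the icon_<fruit>.gif pattern and that the fruit name only needs its first letter upper-cased; XSLT 1.0 has no upper-case() function, so translate() does the case mapping. Here generate-id(..) is the id of the parent tr, which is exactly the value the key above was built on. The three header values are passed as parameters to whichever template handles the detail tr elements; that template would declare type, country and rank with xsl:param and write them out next to td[1], td[2] and td[3].

```xml
<xsl:template match="td[@class='header']">
  <!-- "icon_apples.gif" becomes "apples" (assumes this file-name pattern) -->
  <xsl:variable name="fruit"
                select="substring-before(substring-after(img[1]/@src, 'icon_'), '.gif')"/>
  <!-- Fetch the detail rows keyed to this cell's parent tr -->
  <xsl:apply-templates select="key('fruity', generate-id(..))">
    <!-- Upper-case the first letter: "apples" becomes "Apples" -->
    <xsl:with-param name="type"
                    select="concat(translate(substring($fruit, 1, 1),
                                             'abcdefghijklmnopqrstuvwxyz',
                                             'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
                                   substring($fruit, 2))"/>
    <xsl:with-param name="country" select="string(img[2]/@alt)"/>
    <!-- The cell's text is " First Grade"; trim the leading space -->
    <xsl:with-param name="rank" select="normalize-space(.)"/>
  </xsl:apply-templates>
</xsl:template>
```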
Put together, these pieces form the overall XSLT. When applied to your input document, it generates one entry per detail row, in the structure shown in the question.
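A complete XSLT 1.0 stylesheet along these lines might look like the sketch below. It assumes the input's document element is tbody and that the elements are in no namespace; real XHTML in the http://www.w3.org/1999/xhtml namespace would need a prefix bound to that namespace and used in every match and select. The key name fruity and the icon_<fruit>.gif file-name pattern are likewise assumptions taken from the sample.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Each detail row, indexed by the id of the nearest header row above it -->
  <xsl:key name="fruity"
           match="tr[not(td[@class='header'])]"
           use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

  <xsl:template match="/tbody">
    <data>
      <!-- Only header cells are visited directly -->
      <xsl:apply-templates select="tr/td[@class='header']"/>
    </data>
  </xsl:template>

  <xsl:template match="td[@class='header']">
    <!-- "icon_apples.gif" becomes "apples" -->
    <xsl:variable name="fruit"
                  select="substring-before(substring-after(img[1]/@src, 'icon_'), '.gif')"/>
    <xsl:apply-templates select="key('fruity', generate-id(..))">
      <xsl:with-param name="type"
                      select="concat(translate(substring($fruit, 1, 1),
                                               'abcdefghijklmnopqrstuvwxyz',
                                               'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
                                     substring($fruit, 2))"/>
      <xsl:with-param name="country" select="string(img[2]/@alt)"/>
      <xsl:with-param name="rank" select="normalize-space(.)"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- One entry per detail row: the header values plus the row's own cells -->
  <xsl:template match="tr">
    <xsl:param name="type"/>
    <xsl:param name="country"/>
    <xsl:param name="rank"/>
    <entry>
      <type><xsl:value-of select="$type"/></type>
      <country><xsl:value-of select="$country"/></country>
      <rank><xsl:value-of select="$rank"/></rank>
      <color><xsl:value-of select="td[1]"/></color>
      <shape><xsl:value-of select="td[2]"/></shape>
      <taste><xsl:value-of select="td[3]"/></taste>
    </entry>
  </xsl:template>
</xsl:stylesheet>
```

Any XSLT 1.0 processor should run it, for example xsltproc stylesheet.xsl input.xml (both file names are placeholders). Because the detail rows are found through the key rather than by counting siblings, the number of items under each category (1 to 100 in your data) does not matter.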