Tricky XSLT transformation

Posted on 2024-12-06 08:38:36

I have some loosely structured XHTML data that I need to convert to better-structured XML.

Here's the example:

<tbody>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td>
</tr>
<tr>
    <td>Green</td>
    <td>Round shaped</td>
    <td>Tasty</td>
</tr>
<tr>
    <td>Red</td>
    <td>Round shaped</td>
    <td>Bitter</td>
</tr>
<tr>
    <td>Pink</td>
    <td>Round shaped</td>
    <td>Tasty</td>
</tr>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td>
</tr>
<tr>
    <td>Red</td>
    <td>Heart shaped</td>
    <td>Super tasty</td>
</tr>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td>
</tr>
<tr>
    <td>Yellow</td>
    <td>Smile shaped</td>
    <td>Fairly tasty</td>
</tr>
<tr>
    <td>Brown</td>
    <td>Smile shaped</td>
    <td>Too sweet</td>
</tr>
</tbody>

I am trying to achieve the following structure:

<data>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Green</color>
        <shape>Round shaped</shape>
        <taste>Tasty</taste>
    </entry>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Red</color>
        <shape>Round shaped</shape>
        <taste>Bitter</taste>
    </entry>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Pink</color>
        <shape>Round shaped</shape>
        <taste>Tasty</taste>
    </entry>
    <entry>
        <type>Strawberries</type>
        <country>USA</country>
        <rank>Fifth Grade</rank>
        <color>Red</color>
        <shape>Heart shaped</shape>
        <taste>Super tasty</taste>
    </entry>
    <entry>
        <type>Bananas</type>
        <country>Congo</country>
        <rank>Third Grade</rank>
        <color>Yellow</color>
        <shape>Smile shaped</shape>
        <taste>Fairly tasty</taste>
    </entry>
    <entry>
        <type>Bananas</type>
        <country>Congo</country>
        <rank>Third Grade</rank>
        <color>Brown</color>
        <shape>Smile shaped</shape>
        <taste>Too sweet</taste>
    </entry>
</data>

First, I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the grade from the text of tbody/tr/td itself.

Next, I need to emit an entry for every item under each category, including those three values in each entry (as shown above).

But as you can see, the data I was given is very loosely structured. A category is simply a td, and all the items in that category follow it as sibling rows. To make things worse, in my datasets the number of items under each category varies between 1 and 100.

I've tried a few approaches but just can't seem to get it. Any help is greatly appreciated. I know that XSLT 2.0 introduces xsl:for-each-group, but I am limited to XSLT 1.0.

Comments (1)

奢欲 2024-12-13 08:38:36

In this case, you are not actually grouping elements. It is more like ungrouping them.
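
Read that way, each detail row simply becomes one entry and only has to find the header row above it. A minimal sketch of that idea without any key, assuming (as in the sample) that `tbody` is the root element and carries no namespace; the `header` variable name is just illustrative:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:template match="/tbody">
      <data>
         <!-- Visit only the detail rows, in document order -->
         <xsl:apply-templates select="tr[not(td[@class='header'])]"/>
      </data>
   </xsl:template>

   <xsl:template match="tr">
      <!-- Header cell of the nearest preceding header row -->
      <xsl:variable name="header"
         select="preceding-sibling::tr[td[@class='header']][1]/td"/>
      <entry>
         <type>
            <xsl:value-of select="substring-before(substring-after($header/img[1]/@src, 'images/icon_'), '.gif')"/>
         </type>
         <country><xsl:value-of select="$header/img[2]/@alt"/></country>
         <rank><xsl:value-of select="normalize-space($header/text())"/></rank>
         <color><xsl:value-of select="td[1]"/></color>
         <shape><xsl:value-of select="td[2]"/></shape>
         <taste><xsl:value-of select="td[3]"/></taste>
      </entry>
   </xsl:template>
</xsl:stylesheet>
```

Because `preceding-sibling` is a reverse axis, the `[1]` predicate picks the closest header row rather than the first one in the document.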

One way to do this is to use an xsl:key that indexes the detail rows by their "header" row.

<xsl:key name="fruity" 
   match="tr[not(td[@class='header'])]" 
   use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

That is, each detail row is keyed by the generated ID of the nearest preceding header row.

Next, you can select all the header cells like so:
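
To see what the key holds, a small throwaway stylesheet can report how many detail rows are filed under each header row. This is only an illustrative sketch: the `groups` and `group` element names are made up here, and it assumes, as in the sample, that `tbody` is the root element and has no namespace.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <!-- Same key as above: detail rows indexed by the ID of their header row -->
   <xsl:key name="fruity"
      match="tr[not(td[@class='header'])]"
      use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

   <xsl:template match="/tbody">
      <groups>
         <!-- For each header row, count the detail rows stored under its ID -->
         <xsl:for-each select="tr[td[@class='header']]">
            <group country="{td/img[2]/@alt}"
                   rows="{count(key('fruity', generate-id()))}"/>
         </xsl:for-each>
      </groups>
   </xsl:template>
</xsl:stylesheet>
```

For the sample input this should report 3 rows for Portugal, 1 for USA and 2 for Congo.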

<xsl:apply-templates select="tr/td[@class='header']"/>

Within the matching template, you can then extract the type, country and rank. To get the associated detail rows, simply look up the key using the ID of the parent row:

<xsl:apply-templates select="key('fruity', generate-id(..))"/>

Here is the complete XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:key name="fruity" 
      match="tr[not(td[@class='header'])]" 
      use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

   <xsl:template match="/tbody">
      <data>
         <!-- Match header rows -->
         <xsl:apply-templates select="tr/td[@class='header']"/>
      </data>
   </xsl:template>

   <xsl:template match="td">
      <!-- Match associated detail rows -->
      <xsl:apply-templates select="key('fruity', generate-id(..))">
         <!-- Extract relevant parameters from the td cell -->
         <xsl:with-param name="type" select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
         <xsl:with-param name="country" select="img[2]/@alt"/>
         <xsl:with-param name="rank" select="normalize-space(text())"/>
      </xsl:apply-templates>
   </xsl:template>

   <xsl:template match="tr">
      <xsl:param name="type"/>
      <xsl:param name="country"/>
      <xsl:param name="rank"/>
      <entry>
         <type>
            <xsl:value-of select="$type"/>
         </type>
         <country>
            <xsl:value-of select="$country"/>
         </country>
         <rank>
            <xsl:value-of select="$rank"/>
         </rank>
         <color>
            <xsl:value-of select="td[1]"/>
         </color>
         <shape>
            <xsl:value-of select="td[2]"/>
         </shape>
         <taste>
            <xsl:value-of select="td[3]"/>
         </taste>
      </entry>
   </xsl:template>
</xsl:stylesheet>

When applied to your input document, the following output is generated:

<data>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Green</color>
      <shape>Round shaped</shape>
      <taste>Tasty</taste>
   </entry>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Red</color>
      <shape>Round shaped</shape>
      <taste>Bitter</taste>
   </entry>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Pink</color>
      <shape>Round shaped</shape>
      <taste>Tasty</taste>
   </entry>
   <entry>
      <type>strawberries</type>
      <country>USA</country>
      <rank>Fifth Grade</rank>
      <color>Red</color>
      <shape>Heart shaped</shape>
      <taste>Super tasty</taste>
   </entry>
   <entry>
      <type>bananas</type>
      <country>Congo</country>
      <rank>Third Grade</rank>
      <color>Yellow</color>
      <shape>Smile shaped</shape>
      <taste>Fairly tasty</taste>
   </entry>
   <entry>
      <type>bananas</type>
      <country>Congo</country>
      <rank>Third Grade</rank>
      <color>Brown</color>
      <shape>Smile shaped</shape>
      <taste>Too sweet</taste>
   </entry>
</data>
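
Note that the type comes out in lower case ("apples"), because it is taken straight from the image file name, whereas the requested output has "Apples". XSLT 1.0 has no upper-case function, but `translate()` can do it for plain ASCII letters. A minimal sketch, in which the `capitalize` template name and the `types` wrapper element are hypothetical and only serve to demonstrate the helper:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <!-- Hypothetical helper: upper-cases the first character of $text (ASCII only) -->
   <xsl:template name="capitalize">
      <xsl:param name="text"/>
      <xsl:value-of select="concat(
         translate(substring($text, 1, 1),
                   'abcdefghijklmnopqrstuvwxyz',
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
         substring($text, 2))"/>
   </xsl:template>

   <!-- Demo: lists the capitalized type of every header cell -->
   <xsl:template match="/tbody">
      <types>
         <xsl:for-each select="tr/td[@class='header']">
            <type>
               <xsl:call-template name="capitalize">
                  <xsl:with-param name="text"
                     select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
               </xsl:call-template>
            </type>
         </xsl:for-each>
      </types>
   </xsl:template>
</xsl:stylesheet>
```

In the full stylesheet, the same helper could replace the `<xsl:value-of select="$type"/>` inside `<type>` with an `xsl:call-template` that passes `$type` as the `text` parameter.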