A way to split a huge XML file into smaller XML files using XSL
I get a huge XML file containing a list of TV broadcasts, and I have to split it into small files, each containing all broadcasts for a single day. I managed to do that, but I have two problems: the XML header and a node both appear multiple times.
The structure of the XML is the following:
<?xml version="1.0" encoding="UTF-8"?>
<broadcasts>
<broadcast>
<id>4637445812</id>
<week>39</week>
<date>2009-09-22</date>
<time>21:45:00:00</time>
... (some more)
</broadcast>
... (long list of broadcast nodes)
</broadcasts>
My XSL looks like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:redirect="http://xml.apache.org/xalan/redirect"
extension-element-prefixes="redirect"
version="1.0">
<!-- mark the CDATA escaped tags -->
<xsl:output method="xml" cdata-section-elements="title text"
indent="yes" omit-xml-declaration="no" />
<xsl:template match="broadcasts">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="broadcast">
<!-- Build filename PRG_YYYYMMDD.xml -->
<xsl:variable name="filename" select="concat(substring(date,1,4),substring(date,6,2))"/>
<xsl:variable name="filename" select="concat($filename,substring(date,9,2))" />
<xsl:variable name="filename" select="concat($filename,'.xml')" />
<redirect:write select="concat('PRG_',$filename)" append="true">
<schedule>
<broadcast program="TEST">
<!-- format timestamp in specific way -->
<xsl:variable name="tmstmp" select="concat(substring(date,9,2),'/')"/>
<xsl:variable name="tmstmp" select="concat($tmstmp,substring(date,6,2))"/>
<xsl:variable name="tmstmp" select="concat($tmstmp,'/')"/>
<xsl:variable name="tmstmp" select="concat($tmstmp,substring(date,1,4))"/>
<xsl:variable name="tmstmp" select="concat($tmstmp,' ')"/>
<xsl:variable name="tmstmp" select="concat($tmstmp,substring(time,1,5))"/>
<timestamp><xsl:value-of select="$tmstmp"/></timestamp>
<xsl:copy-of select="title"/>
<text><xsl:value-of select="subtitle"/></text>
<xsl:variable name="newVps" select="concat(substring(vps,1,2),substring(vps,4,2))"/>
<xsl:variable name="newVps" select="concat($newVps,substring(vps,7,2))"/>
<xsl:variable name="newVps" select="concat($newVps,substring(vps,10,2))"/>
<vps><xsl:value-of select="$newVps"/></vps>
<nextday>false</nextday>
</broadcast>
</schedule>
</redirect:write>
</xsl:template>
</xsl:stylesheet>
My output XMLs are like this:
PRG_20090512.xml:
<?xml version="1.0" encoding="UTF-8"?>
<schedule>
<broadcast program="TEST">
<timestamp>01/03/2010 06:00</timestamp>
<title><![CDATA[TELEKOLLEG Geschichte ]]></title>
<text><![CDATA[Giganten in Fernost]]></text>
<vps>06000000</vps>
<nextday>false</nextday>
</broadcast>
</schedule>
<?xml version="1.0" encoding="UTF-8"?> <!-- don't want this -->
<schedule> <!-- don't want this -->
<broadcast program="TEST">
<timestamp>01/03/2010 06:30</timestamp>
<title><![CDATA[Die chemische Bindung]]></title>
<text/>
<vps>06300000</vps>
<nextday>false</nextday>
</broadcast>
</schedule>
<?xml version="1.0" encoding="UTF-8"?>
...and so on
I can put omit-xml-declaration="yes" in the output declaration, but then I don't have any XML header at all. I tried to add a check for whether the tag is already in the output, but failed to select nodes in the output...
This is what I tried:
<xsl:choose>
<xsl:when test="count(schedule) = 0"> <!-- schedule needed -->
<schedule>
<broadcast>
...
<xsl:otherwise> <!-- no schedule needed -->
<broadcast>
...
Thanks for any help, as I'm unaware how to handle that. ;(
YeTI
Comments (2)
Write a single file at a time, containing all broadcasts for that date.
This becomes a problem of grouping the input elements by date. As Xalan is XSLT 1.0, you do this with keys.
We define a key to group broadcasts by date. Then we select each broadcast that is the first for its date, and select all the broadcasts for the same date using the key function.
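The stylesheet that went with this answer is not preserved here, so the following is only a sketch of the key-based (Muenchian) grouping described above, not the answerer's original code. It assumes the element names from the question (`broadcast`, `date`, `time`, `title`, `subtitle`, `vps`), dates in `YYYY-MM-DD` form, and the Xalan `redirect:write` extension the question already uses. The key name `broadcasts-by-date` is made up for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:redirect="http://xml.apache.org/xalan/redirect"
                extension-element-prefixes="redirect"
                version="1.0">

  <xsl:output method="xml" cdata-section-elements="title text"
              indent="yes" omit-xml-declaration="no"/>

  <!-- Hypothetical key name: indexes every broadcast by the value of its date child -->
  <xsl:key name="broadcasts-by-date" match="broadcast" use="date"/>

  <xsl:template match="broadcasts">
    <!-- Visit only the first broadcast of each distinct date -->
    <xsl:for-each select="broadcast[generate-id() = generate-id(key('broadcasts-by-date', date)[1])]">
      <!-- Build filename PRG_YYYYMMDD.xml by stripping the hyphens from the date -->
      <xsl:variable name="filename"
                    select="concat('PRG_', translate(date, '-', ''), '.xml')"/>
      <!-- One write per date, without append: one XML declaration, one schedule root -->
      <redirect:write select="$filename">
        <schedule>
          <!-- All broadcasts sharing this date, in document order -->
          <xsl:apply-templates select="key('broadcasts-by-date', date)"/>
        </schedule>
      </redirect:write>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="broadcast">
    <broadcast program="TEST">
      <!-- Timestamp as DD/MM/YYYY HH:MM -->
      <timestamp>
        <xsl:value-of select="concat(substring(date,9,2), '/', substring(date,6,2), '/',
                                     substring(date,1,4), ' ', substring(time,1,5))"/>
      </timestamp>
      <xsl:copy-of select="title"/>
      <text><xsl:value-of select="subtitle"/></text>
      <!-- Same character positions of vps as in the question's stylesheet -->
      <vps>
        <xsl:value-of select="concat(substring(vps,1,2), substring(vps,4,2),
                                     substring(vps,7,2), substring(vps,10,2))"/>
      </vps>
      <nextday>false</nextday>
    </broadcast>
  </xsl:template>
</xsl:stylesheet>
```

Because each file is written exactly once, `append="true"` is no longer needed, and the repeated declaration and repeated `schedule` element should not occur. Building each value with a single `concat` call also avoids redeclaring the same variable name several times within one template.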
Wrap your schedule elements with a unique parent and see if that makes the problem go away.
I'm not familiar with this particular problem, but my guess is that it's caused by your trying to generate XML documents with multiple top level elements. Every XML document must have exactly one top level element (a stupid requirement if you ask me, e.g. it makes XML unsuitable for logfiles, but that's the way it is).
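To make the single-root rule concrete, here is a sketch of what one per-day file might look like when well-formed, using the two broadcasts from the question's sample output. It assumes, based on the "don't want this" comments in the question, that the goal is one `schedule` root per file with every `broadcast` inside it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schedule>
  <broadcast program="TEST">
    <timestamp>01/03/2010 06:00</timestamp>
    <title><![CDATA[TELEKOLLEG Geschichte ]]></title>
    <text><![CDATA[Giganten in Fernost]]></text>
    <vps>06000000</vps>
    <nextday>false</nextday>
  </broadcast>
  <broadcast program="TEST">
    <timestamp>01/03/2010 06:30</timestamp>
    <title><![CDATA[Die chemische Bindung]]></title>
    <text/>
    <vps>06300000</vps>
    <nextday>false</nextday>
  </broadcast>
</schedule>
```

The XML declaration appears once, at the very start, and there is exactly one top-level element.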