XML to XML transformation via XSL/XSLT?
I have been trying hard, with no luck, to take an XML document produced by a proprietary database and transform it into a well-formed XML document that Apache Solr can index.
I would like to take the following XML file and transform it into the Apache Solr format shown after it.
<?xml version="1.0" encoding="UTF-8" ?>
<ecatalogue>
  <tuple>
    <table name="CatObjectName_tab">
      <tuple>
        <atom name="CatObjectName">Clog</atom>
      </tuple>
    </table>
    <atom name="CatObjectNumber">2003-39-27A</atom>
    <atom name="CatObjectTitle"></atom>
    <table name="CatOtherNumbers_tab">
      <tuple>
        <atom name="CatOtherNumbers">1895.1.117a</atom>
      </tuple>
    </table>
    <table name="ProPlaceName_tab">
      <tuple>
        <atom name="ProPlaceName">China</atom>
      </tuple>
    </table>
    <table name="CatOtherNumberType_tab">
      <tuple>
        <atom name="CatOtherNumberType">Other Number</atom>
      </tuple>
    </table>
    <atom name="DatDateMade"></atom>
    <atom name="DatEarliestDateMadeOrig"></atom>
    <atom name="DatLatestDateMadeOrig"></atom>
  </tuple>
  <tuple>
    <table name="CatObjectName_tab">
      <tuple>
        <atom name="CatObjectName">Boot</atom>
      </tuple>
    </table>
    <atom name="CatObjectNumber">2003-39-20B</atom>
    <atom name="CatObjectTitle"></atom>
    <table name="CatOtherNumbers_tab">
      <tuple>
        <atom name="CatOtherNumbers">1895.1.91b</atom>
      </tuple>
    </table>
    <table name="ProPlaceName_tab">
      <tuple>
        <atom name="ProPlaceName">China</atom>
      </tuple>
    </table>
    <table name="CatOtherNumberType_tab">
      <tuple>
        <atom name="CatOtherNumberType">Other Number</atom>
      </tuple>
    </table>
    <atom name="DatDateMade"></atom>
    <atom name="DatEarliestDateMadeOrig"></atom>
    <atom name="DatLatestDateMadeOrig"></atom>
  </tuple>
</ecatalogue>
I would like to transform the above into this:
<add>
  <doc>
    <field name="ProPlaceName">China</field>
    <field name="CatObjectTitle"></field>
    <field name="CatObjectNumber">2003-39-27A</field>
    <field name="CatOtherNumberType">Other Number</field>
    <field name="CatOtherNumbers">1895.1.117a</field>
    <field name="CatObjectName_tab">Clog</field>
    <field name="DatDateMade"></field>
    <field name="DatEarliestDateMadeOrig"></field>
    <field name="DatLatestDateMadeOrig"></field>
  </doc>
  <!-- Row 2 -->
  <doc>
    <field name="ProPlaceName">China</field>
    <field name="CatObjectTitle"></field>
    <field name="CatObjectNumber">2003-39-20B</field>
    <field name="CatOtherNumberType">Other Number</field>
    <field name="CatOtherNumbers">1895.1.91b</field>
    <field name="CatObjectName_tab">Boot</field>
    <field name="DatDateMade"></field>
    <field name="DatEarliestDateMadeOrig"></field>
    <field name="DatLatestDateMadeOrig"></field>
  </doc>
</add>
Is it better to use XSL/XSLT, or something like Java or another programming language, to make the transformation? How would you approach this problem, and can you point me in the right direction?
I believe it can be done using XSL. Any help is appreciated.
Comments (2)
Here's something that should help. It's fairly simple: it assumes you want to skip the nested table structure and grab only the atoms inside the tables. It does not sort the fields in any particular order.
Unless you can guarantee that the XML will always be valid, I would go with a programming-language approach. It gives you more flexibility in how you parse your data. You stated that the data comes from a proprietary database, and that is what makes me want the flexibility.
Case in point: what if the database exports invalid XML because of a defect? Which component would you be able to change sooner?
Why not choose a solution that parses the XML and then creates an object model that can be output in the desired format? You could use your own XML/XSLT or a templating toolset (POJO/Velocity) to handle the final transformation.
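As a rough illustration of that parse, object model, output pipeline, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. The names CatalogueRecord, parse_catalogue and to_solr_add are hypothetical and chosen for this example only. It assumes each top-level tuple under ecatalogue is one Solr document and that every atom, however deeply nested, becomes one field named after the atom. Note that this gives CatObjectName rather than the CatObjectName_tab shown in the question's desired output.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CatalogueRecord:
    """One top-level <tuple>, flattened to (name, value) pairs (hypothetical model)."""
    fields: list = field(default_factory=list)


def parse_catalogue(xml_text):
    """Parse the export into records; raise ValueError if it is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A faulty export surfaces here, which is where repair or logging could go.
        raise ValueError(f"export is not well-formed XML: {exc}") from exc

    records = []
    for top in root.findall("tuple"):  # direct children only: one per document
        record = CatalogueRecord()
        # iter() also descends into nested tables, so every <atom> is collected
        # in document order.
        for atom in top.iter("atom"):
            record.fields.append((atom.get("name"), atom.text or ""))
        records.append(record)
    return records


def to_solr_add(records):
    """Serialize the records as a Solr <add> update message."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.fields:
            ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")
```

Calling to_solr_add(parse_catalogue(xml_text)) on the question's input would yield one doc per top-level tuple, with fields in document order rather than the order shown in the question. Keeping the records as plain objects between the two steps is what leaves room to validate, rename or reorder fields before they are written out.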