在 Java 中通过 XSLT 分解 XML
我需要将具有嵌套(分层)表单结构的大型 XML 文件转换
<Root>
Flat XML
Hierarchical XML (multiple blocks, some repetitive)
Flat XML
</Root>
为更扁平(“粉碎”)的表单,每个重复的嵌套块有 1 个块。
数据具有许多不同的标签和层次结构变化(尤其是分层 XML 前后的分解 XML 的标签数量),因此理想情况下不应对标签和属性名称或层次结构级别做出任何假设。
仅 4 个级别的层次结构的顶级视图看起来像这样
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
,所需的输出将是 也就是说
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
,如果在每个级别 i
都有 Li
不同的组件,总共会产生 Product(Li)
个不同的组件(上面只有 2 个,因为唯一的差异化因素是 Level 4,所以 L1*L2*L3*L4 = 2
)。
根据我的观察,XSLT 可能是可行的方法,但任何其他解决方案(例如 StAX 甚至 JDOM)都可以。
使用虚构信息的更详细的示例是:
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
上述数据应被分解为 5 个块(即,每个不同的
块一个),每个块将使所有其他标签保持相同并且只有一个
元素。因此,考虑到上面示例中的 5 个不同的
块,转换后的(“粉碎的”)XML 将是
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
I need to transform large XML files that have a nested (hierarchical) structure of the form
<Root>
Flat XML
Hierarchical XML (multiple blocks, some repetitive)
Flat XML
</Root>
into a flatter ("shredded") form, with 1 block for each repetitive nested block.
The data has numerous different tags and hierarchy variations (especially in the number of tags of the shredded XML before and after the hierarchical XML), so ideally no assumption should be made about tag and attribute names, or the hierarchical level.
A top-level view of the hierarchy for just 4 levels would look something like
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
and the desired output would then be
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
That is, if at each level i
there are Li
different components, a total of Product(Li)
different components will be produced (just 2 above, since the only differentiating factor is Level 4, so L1*L2*L3*L4 = 2
).
From what I have seen around, XSLT may be the way to go, but any other solution (e.g., StAX or even JDOM) would do.
A more detailed example, using fictitious information, would be
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
The above data should be shredded into 5 blocks (i.e., one for each different <Job>
block), each of which will leave all other tags identical and just have a single <Job>
element. So, given the 5 different <Job>
blocks in the above example, the transformed ("shredded") XML would be
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employment>
</EmploymentHistory>
</Employee>
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这是一个按要求提供的通用解决方案:
当应用于提供的简化(通用)XML 文档时:
生成所需的正确结果:
现在,如果我们将行:
更改为:
并将转换应用到
Employee
XML 文档:我们再次得到想要的正确结果:
解释:该处理在命名模板 (
StructRepro
) 中完成,并由名为pLeafNodes
的单个外部参数控制,该参数必须包含其“向上结构”要实现的所有节点的节点集。被再现在结果中。Here is a generic solution as requested:
When applied on the provided simplified (and generic) XML document:
the wanted, correct result is produced:
Now, if we change the line:
to:
and apply the transformation to the
Employee
XML document:we again get the wanted, correct result:
Explanation: The processing is done in a named template (
StructRepro
) and controlled by a single external parameter namedpLeafNodes
, that must contain a nodeset of all nodes whose "upward structure" is to be reproduced in the result.给定以下 XML:
以下 XSLT:
给出以下输出:
请注意,我添加了一个输出根元素以确保文档格式良好。
这是你想要的吗?
您也许还可以使用 xsl:copy 来复制更高级别的元素,但我需要多考虑一下这一点。使用上面的 xslt,您拥有更多控制权,但您也必须重新定义元素......
Given the following XML:
The following XSLT:
Gives the following output:
Note that I've added an Output root element to ensure the document is well formed.
Is this what you wanted?
You might also be able to use xsl:copy to copy the higher level elements, but I need to think about this one a bit more. With the above xslt, you have more control, but also you have to redefine your elements...