Shredding XML in Java via XSLT

Posted 2024-12-21 12:40:48


I need to transform large XML files that have a nested (hierarchical) structure of the form

   Flat XML
   Hierarchical XML (multiple blocks, some repetitive)
   Flat XML

into a flatter ("shredded") form, with one block for each repeated nested block.

The data has numerous different tags and hierarchy variations (especially in the number of tags in the flat XML before and after the hierarchical XML), so ideally no assumption should be made about tag and attribute names or about the depth of the hierarchy.

A top-level view of the hierarchy for just 4 levels would look something like

<Level-1>
  <Level-2>
    <Level-3>
      <Level-4>A</Level-4>
      <Level-4>B</Level-4>
    </Level-3>
  </Level-2>
</Level-1>

and the desired output would then be

<Level-1>
  <Level-2>
    <Level-3>
      <Level-4>A</Level-4>
    </Level-3>
  </Level-2>
</Level-1>

<Level-1>
  <Level-2>
    <Level-3>
      <Level-4>B</Level-4>
    </Level-3>
  </Level-2>
</Level-1>

That is, if at each level i there are Li different components, a total of L1 × L2 × ... × Ln different components will be produced (just 2 above: Level 4 is the only level that varies, so L1 × L2 × L3 × L4 = 1 × 1 × 1 × 2 = 2).

From what I have seen, XSLT may be the way to go, but any other solution (e.g., StAX or even JDOM) would do.

A more detailed example, using fictitious information, would be

<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <JobDetails>
        <Job title="Senior Developer"/>
        <Job title="Senior Developer"/>
        <Job title="Senior Developer"/>
      </JobDetails>
    </Employment>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <JobDetails>
        <Job title="Junior Developer"/>
        <Job title="Junior Developer"/>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Experience unit="years">6</Experience>
</Employee>

The above data should be shredded into 5 blocks (i.e., one for each <Job> element), each of which keeps all other tags identical and contains just a single <Job> element. So, given the 5 <Job> elements in the above example, the transformed ("shredded") XML would be

<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <JobDetails>
        <Job title="Senior Developer"/>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Experience unit="years">6</Experience>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <JobDetails>
        <Job title="Senior Developer"/>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Experience unit="years">6</Experience>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <JobDetails>
        <Job title="Senior Developer"/>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Experience unit="years">6</Experience>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <JobDetails>
        <Job title="Junior Developer"/>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Experience unit="years">6</Experience>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <JobDetails>
        <Job title="Junior Developer"/>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Experience unit="years">6</Experience>
</Employee>
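Since the question also accepts non-XSLT solutions, here is a minimal sketch of the same shredding in plain Java. It uses the JDK's built-in DOM and XPath APIs rather than JDOM or StAX. It assumes the whole document fits in memory, which is a real limitation for very large files, and that the repeated blocks can be selected with one XPath expression such as //Job. The class and method names are hypothetical. For each selected leaf, the sketch deep-copies the document. Then, for every other leaf, it removes the largest enclosing subtree that does not also contain the kept leaf. This leaves Address, Comment and Experience untouched, and drops the whole UK Employment when a US Job is kept.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: returns one shredded copy of the document per leaf node.
public class DomShredder {

    public static List<Document> shred(String xml, String leafXPath) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        Document source = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        int leafCount = leaves(source, leafXPath).size();

        List<Document> result = new ArrayList<>();
        for (int keep = 0; keep < leafCount; keep++) {
            // Deep-copy the whole tree into a fresh document.
            Document copy = dbf.newDocumentBuilder().newDocument();
            copy.appendChild(copy.importNode(source.getDocumentElement(), true));

            List<Node> copyLeaves = leaves(copy, leafXPath);
            Node kept = copyLeaves.get(keep);
            for (int i = 0; i < copyLeaves.size(); i++) {
                if (i == keep) continue;
                // Climb to the largest subtree that holds this leaf but not the kept one...
                Node top = copyLeaves.get(i);
                while (top.getParentNode() != null && !isAncestorOrSelf(top.getParentNode(), kept)) {
                    top = top.getParentNode();
                }
                // ...and detach it (skipped if an earlier removal already detached it).
                if (top.getParentNode() != null) {
                    top.getParentNode().removeChild(top);
                }
            }
            result.add(copy);
        }
        return result;
    }

    // Evaluates the XPath and snapshots the matches before the tree is modified.
    private static List<Node> leaves(Document doc, String leafXPath) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(leafXPath, doc, XPathConstants.NODESET);
        List<Node> list = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            list.add(nodes.item(i));
        }
        return list;
    }

    // True if candidate is node itself or one of its ancestors.
    private static boolean isAncestorOrSelf(Node candidate, Node node) {
        for (Node n = node; n != null; n = n.getParentNode()) {
            if (n == candidate) return true;
        }
        return false;
    }
}
```

Each returned Document can then be serialized with a javax.xml.transform.Transformer. Indentation whitespace left behind by removed siblings will show up as blank lines unless it is stripped first.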






祁梦 2024-12-28 12:40:48



Here is a generic solution as requested:

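<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- The nodes whose "upward structure" is to be reproduced -->
 <xsl:param name="pLeafNodes" select="//Level-4"/>

 <!-- Wrap all copies in one element so the result is well-formed -->
 <xsl:template match="/">
  <t>
    <xsl:call-template name="StructRepro"/>
  </t>
 </xsl:template>

 <xsl:template name="StructRepro">
   <xsl:param name="pLeaves" select="$pLeafNodes"/>

   <!-- One copy of the document per leaf node -->
   <xsl:for-each select="$pLeaves">
     <xsl:apply-templates mode="build" select="/*">
      <xsl:with-param name="pChild" select="."/>
      <xsl:with-param name="pLeaves" select="$pLeaves"/>
     </xsl:apply-templates>
   </xsl:for-each>
 </xsl:template>

  <xsl:template mode="build" match="node()|@*">
      <xsl:param name="pChild"/>
      <xsl:param name="pLeaves"/>

     <xsl:copy>
       <xsl:apply-templates mode="build" select="@*"/>

       <!-- count(.|$X) = count($X) is true when the node . belongs to $X;
            here: the child of this node that is the current leaf, if any -->
       <xsl:variable name="vLeafChild" select=
         "*[count(.|$pChild) = count($pChild)]"/>

       <xsl:choose>
        <xsl:when test="$vLeafChild">
         <!-- Keep the current leaf and all non-leaf children; drop the other leaves -->
         <xsl:apply-templates mode="build" select=
           "$vLeafChild | node()[not(count(.|$pLeaves) = count($pLeaves))]">
             <xsl:with-param name="pChild" select="$pChild"/>
             <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
         <!-- Keep children with no leaf descendants, or with the current leaf as a descendant -->
         <xsl:apply-templates mode="build" select=
         "node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                or
                  .//*[count(.|$pChild) = count($pChild)]
                ]">
             <xsl:with-param name="pChild" select="$pChild"/>
             <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
        </xsl:otherwise>
       </xsl:choose>
     </xsl:copy>
 </xsl:template>
 <xsl:template match="text()"/>
</xsl:stylesheet>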

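When applied on the provided simplified (and generic) XML document, with the levels named Level-1 to Level-4 (XML element names cannot contain spaces), the wanted, correct result is produced: one copy of the Level-1/Level-2/Level-3 structure for each Level-4 element, i.e. the two blocks the question asks for.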

Now, if we change the line:

 <xsl:param name="pLeafNodes" select="//Level-4"/>

to:

 <xsl:param name="pLeafNodes" select="//Job"/>

and apply the transformation to the Employee XML document:

<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <JobDetails>
        <Job title="Senior Developer"/>
        <Job title="Senior Developer"/>
        <Job title="Senior Developer"/>
      </JobDetails>
    </Employment>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <JobDetails>
        <Job title="Junior Developer"/>
        <Job title="Junior Developer"/>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Experience unit="years">6</Experience>
</Employee>

we again get the wanted, correct result:

<t>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <JobDetails>
               <Job title="Senior Developer"/>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <JobDetails>
               <Job title="Senior Developer"/>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <JobDetails>
               <Job title="Senior Developer"/>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <EmploymentHistory>
         <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <JobDetails>
               <Job title="Junior Developer"/>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <EmploymentHistory>
         <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <JobDetails>
               <Job title="Junior Developer"/>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Experience unit="years">6</Experience>
   </Employee>
</t>

Explanation: The processing is done in a named template (StructRepro) and controlled by a single external parameter named pLeafNodes, which must contain a node-set of all nodes whose "upward structure" is to be reproduced in the result.
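Since the question asks for Java, here is a minimal sketch of running the StructRepro stylesheet through the standard JAXP javax.xml.transform API. Calling Transformer.setParameter with a plain String would make $pLeafNodes a string rather than a node-set, and XSLT 1.0 cannot evaluate a string as an XPath expression. For that reason, this sketch splices the leaf expression into the parameter's select attribute when it builds the stylesheet text. That splicing is an assumption of the sketch, so the expression must not contain a single quote, < or &. The class and method names are hypothetical, and the t wrapper element is chosen so the output parses as one document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical wrapper around the StructRepro technique (unused text() template omitted).
public class StructReproRunner {

    // Builds the stylesheet text with leafXPath spliced into pLeafNodes' select attribute.
    static String stylesheet(String leafXPath) {
        // Both recursive apply-templates calls pass the same two parameters down.
        String params = "<xsl:with-param name='pChild' select='$pChild'/>"
                + "<xsl:with-param name='pLeaves' select='$pLeaves'/>";
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output omit-xml-declaration='yes' indent='yes'/>"
                + "<xsl:strip-space elements='*'/>"
                + "<xsl:param name='pLeafNodes' select='" + leafXPath + "'/>"
                // A single t wrapper keeps the result well-formed.
                + "<xsl:template match='/'><t><xsl:call-template name='StructRepro'/></t></xsl:template>"
                + "<xsl:template name='StructRepro'>"
                + "<xsl:param name='pLeaves' select='$pLeafNodes'/>"
                + "<xsl:for-each select='$pLeaves'>"
                + "<xsl:apply-templates mode='build' select='/*'>"
                + "<xsl:with-param name='pChild' select='.'/>"
                + "<xsl:with-param name='pLeaves' select='$pLeaves'/>"
                + "</xsl:apply-templates></xsl:for-each></xsl:template>"
                + "<xsl:template mode='build' match='node()|@*'>"
                + "<xsl:param name='pChild'/><xsl:param name='pLeaves'/>"
                + "<xsl:copy><xsl:apply-templates mode='build' select='@*'/>"
                + "<xsl:variable name='vLeafChild' select='*[count(.|$pChild) = count($pChild)]'/>"
                + "<xsl:choose><xsl:when test='$vLeafChild'>"
                + "<xsl:apply-templates mode='build'"
                + " select='$vLeafChild | node()[not(count(.|$pLeaves) = count($pLeaves))]'>"
                + params + "</xsl:apply-templates></xsl:when>"
                + "<xsl:otherwise><xsl:apply-templates mode='build' select='node()"
                + "[not(.//*[count(.|$pLeaves) = count($pLeaves)]) or .//*[count(.|$pChild) = count($pChild)]]'>"
                + params + "</xsl:apply-templates></xsl:otherwise>"
                + "</xsl:choose></xsl:copy></xsl:template>"
                + "</xsl:stylesheet>";
    }

    // Compiles the stylesheet with the JDK's default TransformerFactory and applies it to xml.
    public static String shred(String xml, String leafXPath) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(leafXPath))));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

For example, StructReproRunner.shred(employeeXml, "//Job") corresponds to the second run above. For repeated runs, the compiled javax.xml.transform.Templates object could be cached per leaf expression instead of recompiling each time.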

素手挽清风 2024-12-28 12:40:48


Given the following XML:

<?xml version="1.0" encoding="utf-8" ?>
<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <JobDetails>
        <Job title="Developer"/>
        <Job title="Developer"/>
        <Job title="Developer"/>
      </JobDetails>
    </Employment>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <JobDetails>
        <Job title="Developer"/>
        <Job title="Developer"/>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Experience unit="years">6</Experience>
</Employee>

The following XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">

    <xsl:output method="xml" indent="yes"/>

    <!-- A single Output root element keeps the result well-formed -->
    <xsl:template match="/">
      <Output>
        <xsl:apply-templates select="//Employee/EmploymentHistory/Employment/JobDetails/Job" />
      </Output>
    </xsl:template>

  <!-- Rebuild the ancestors of each Job, keeping only that Job -->
  <xsl:template match="//Employee/EmploymentHistory/Employment/JobDetails/Job">
    <Employee>
      <xsl:attribute name="name">
        <xsl:value-of select="ancestor::Employee/@name"/>
      </xsl:attribute>
      <!-- copy-of outputs nothing when an optional element (Age, Jobs, Available) is absent -->
      <xsl:copy-of select="ancestor::Employee/Address"/>
      <xsl:copy-of select="ancestor::Employee/Age"/>
      <EmploymentHistory>
        <Employment>
          <xsl:attribute name="country">
            <xsl:value-of select="ancestor::Employment/@country"/>
          </xsl:attribute>
          <xsl:copy-of select="ancestor::Employment/Comment"/>
          <xsl:copy-of select="ancestor::Employment/Jobs"/>
          <JobDetails>
            <xsl:copy-of select="."/>
          </JobDetails>
        </Employment>
      </EmploymentHistory>
      <xsl:copy-of select="ancestor::Employee/Available"/>
      <xsl:copy-of select="ancestor::Employee/Experience"/>
    </Employee>
  </xsl:template>
</xsl:stylesheet>



Gives the following output:

<?xml version="1.0" encoding="utf-8"?>
<Output>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <JobDetails>
          <Job title="Developer" />
        </JobDetails>
      </Employment>
    </EmploymentHistory>
    <Experience unit="years">6</Experience>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <JobDetails>
          <Job title="Developer" />
        </JobDetails>
      </Employment>
    </EmploymentHistory>
    <Experience unit="years">6</Experience>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <JobDetails>
          <Job title="Developer" />
        </JobDetails>
      </Employment>
    </EmploymentHistory>
    <Experience unit="years">6</Experience>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <EmploymentHistory>
      <Employment country="UK">
        <Comment>List of previous jobs in the UK</Comment>
        <JobDetails>
          <Job title="Developer" />
        </JobDetails>
      </Employment>
    </EmploymentHistory>
    <Experience unit="years">6</Experience>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <EmploymentHistory>
      <Employment country="UK">
        <Comment>List of previous jobs in the UK</Comment>
        <JobDetails>
          <Job title="Developer" />
        </JobDetails>
      </Employment>
    </EmploymentHistory>
    <Experience unit="years">6</Experience>
  </Employee>
</Output>

Note that I've added an Output root element to ensure the document is well formed.

Is this what you wanted?

You might also be able to use xsl:copy to copy the higher-level elements, but I need to think about this one a bit more. With the above XSLT you have more control, but you also have to redefine your elements.
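The Output wrapper is needed because a well-formed XML document has exactly one root element. Sometimes each shredded block should instead become a standalone document, for example one file per Employee. In that case, a minimal Java sketch can re-parse the wrapped result and serialize each child element of the root separately with an identity Transformer. The class and method names are hypothetical, and the whole result is assumed to fit in memory.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: splits <Output>...</Output> into one string per child element.
public class WrapperSplitter {

    public static List<String> split(String wrappedXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrappedXml)));
        // Identity transformer (no stylesheet) used only for serialization.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> blocks = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip indentation whitespace
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(n), new StreamResult(out));
            blocks.add(out.toString());
        }
        return blocks;
    }
}
```

For files that do not fit in memory, the same split could be done with StAX instead of DOM, at the cost of more code.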
