Merging multiple occurrences of a node under a single parent node using C#

Posted on 2024-10-21 02:58:46

I have an XML document containing multiple <Page PageID="1"> nodes. Each such node has <Para ParaID="1"> nodes under it. I want a single occurrence of each <Page> node, so that all <Para> nodes belonging to the same page appear as children of that one <Page> node. For example:

INPUT:

<Page PageID="**1**">
   <Para ParaID="1">
     <some nodes as child of para>
   </Para>
</Page>
<Page PageID="**2**">
   <Para ParaID="**1**">
     <some nodes as child of para>
   </Para>
</Page>
<Page PageID="**1**"> <!-- Page 1 encountered again -->
   <Para ParaID="**1**">
     <some nodes as child of para>
   </Para>
</Page>
<Page PageID="**3**">
   <Para ParaID="**1**">
     <some nodes as child of para>
   </Para>
</Page>

Expected OUTPUT:

<Page PageID="**1**">
   <Para ParaID="**1**">
     <some nodes as child of para>
   </Para>
   <Para ParaID="**2**">           <!-- all <Para> of Page 1 are under a single <Page> node -->
     <some nodes as child of para>
   </Para>
</Page>
<Page PageID="**2**">
   <Para ParaID="**1**">
     <some nodes as child of para>
   </Para>
</Page>
<Page PageID="**3**">
   <Para ParaID="**1**">
     <some nodes as child of para>
   </Para>
</Page>

你对谁都笑 2024-10-28 02:58:46

If you are using .NET 3.5 or later, you can use the XDocument family of classes (LINQ to XML) and the LINQ extension methods to make fairly light work of the task:

var doc1 = XDocument.Parse(stringContainingYourXML);
var groups = doc1.Root.Elements().ToLookup(elt => elt.Attribute("PageID").Value);
var unique = groups.AsEnumerable().Select(group => group.First());
var doc2 = new XDocument(new XElement("root", unique));

Line 2 builds a lookup table in which elements with the same PageID value are grouped together. Given your example XML (wrapped in a single root element, which XDocument.Parse requires), it takes the 4 <Page/> elements and creates 3 groups, with one group containing both PageID="1" elements.

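On line 3, we loop through the 3 groups and extract just the first XML element from each one, and on line 4 we put those 3 elements into a new document. Note that this keeps only the first <Page> for each PageID, so the <Para> children of any later duplicate are dropped. The resulting XML is: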

<root>
  <Page PageID="**1**">
    <Para ParaID="1" />
  </Page>
  <Page PageID="**2**">
    <Para ParaID="**1**" />
  </Page>
  <Page PageID="**3**">
    <Para ParaID="**1**" />
  </Page>
</root>
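
To see what the lookup built on line 2 holds, the groups can be inspected directly. This is only a minimal sketch: it assumes the pages are wrapped in a <root> element and that every <Page> has a PageID attribute, and the names LookupDemo and CountPagesById are invented for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class LookupDemo {
    // Maps each PageID to the number of <Page> elements that share it.
    // Assumes every <Page> carries a PageID attribute.
    public static Dictionary<string, int> CountPagesById(string xml) {
        var doc = XDocument.Parse(xml);
        return doc.Root.Elements("Page")
                  .ToLookup(page => page.Attribute("PageID").Value)
                  .ToDictionary(group => group.Key, group => group.Count());
    }
}
```

For the four pages in the question, the group for the repeated PageID holds two elements and the other two groups hold one each, which is why taking group.First() discards one page.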

Update: 2011/03/12

The code below takes into account the requirement that paragraphs from duplicate instances of a page be merged into the first instance, with their ParaID values renumbered so that they continue from the last ParaID already on that page.

The revised solution is pretty awful compared to the previous one, but messing around with the ParaID values (especially in the format they are in) was quite annoying. I'm not proud of this, but here it is:

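using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

namespace SO {
    class Program {
        // Sample input. The <root> wrapper is an assumption: XDocument.Parse
        // requires a single root element, which the XML in the question lacks.
        const string xmlstr = @"<root>
  <Page PageID=""**1**""><Para ParaID=""1"" /></Page>
  <Page PageID=""**2**""><Para ParaID=""**1**"" /></Page>
  <Page PageID=""**1**""><Para ParaID=""**1**"" /></Page>
  <Page PageID=""**3**""><Para ParaID=""**1**"" /></Page>
</root>";

        static void Main(string[] args) {
            var doc1 = XDocument.Parse(xmlstr);
            // Group the <Page> elements by their PageID value.
            var groups = doc1.Root.Elements().ToLookup(page => page.Attribute("PageID").Value);
            var doc2 = new XDocument(new XElement("root"));

            foreach (var group in groups) {
                var firstpage = group.First();
                // Numeric part of the last ParaID already present on the first page.
                var startindex = firstpage.Elements("Para").Last().Attribute("ParaID").Value;
                var lastindex = int.Parse(Regex.Match(startindex, @"\d+").Value);

                // Append the <Para> elements of the duplicate pages, replacing the
                // digits in each ParaID with the next number in sequence.
                firstpage.Add(
                    group.Skip(1)
                         .SelectMany(page => page.Elements("Para"))
                         .Select(
                             para => {
                                 para.Attribute("ParaID").Value = Regex.Replace(
                                     para.Attribute("ParaID").Value,
                                     @"\d+",
                                     m => (++lastindex).ToString()
                                 );
                                 return para;
                             }
                         )
                );

                // firstpage still belongs to doc1, so Add inserts a copy of it.
                doc2.Root.Add(firstpage);
            }

            Console.WriteLine(doc2);
            Console.ReadKey(true);
        }
    }
}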
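
If the ** markers in the question are only emphasis and the real ParaID values are plain integers, the regular expression can be avoided. The following is a sketch under that assumption, not the answer's method: it swaps ToLookup for GroupBy, builds new <Page> elements instead of modifying the source, and renumbers every <Para> from 1 within its merged page. The names PageMerger and Merge are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

static class PageMerger {
    // Merges <Page> elements that share a PageID and renumbers their <Para>
    // children 1..n. Assumes every <Page> has a PageID attribute.
    public static XDocument Merge(string xml) {
        var source = XDocument.Parse(xml);
        var merged = source.Root.Elements("Page")
            .GroupBy(page => page.Attribute("PageID").Value)
            .Select(group => new XElement("Page",
                group.First().Attributes(),            // attributes come from the first page only
                group.Elements("Para").Select((para, index) => {
                    var copy = new XElement(para);     // deep copy; the source stays untouched
                    copy.SetAttributeValue("ParaID", index + 1);
                    return copy;
                })));
        return new XDocument(new XElement(source.Root.Name, merged));
    }
}
```

Calling PageMerger.Merge(xmlText).ToString() returns the merged XML as a string. GroupBy yields groups in the order their keys first appear, so the pages keep their original relative order.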

平生欢 2024-10-28 02:58:46

With XSLT, this isn't especially efficient, because every Page is compared against all of its preceding siblings (there's a faster method that uses xsl:key), but it will work in most cases where the source document isn't unreasonably large. Add the following to the identity transform:

<!-- filter out Page elements that aren't the first occurrence for their PageID -->
<xsl:template match="Page[@PageID = preceding-sibling::Page/@PageID]"/>

<!-- for each distinct page, copy all Page child nodes with the current PageID -->      
<xsl:template match="Page">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="/root/Page[@PageID = current()/@PageID]/node()"/>
  </xsl:copy>
</xsl:template>

Note that you haven't said what to do when the Page elements being merged carry other attributes. The templates above basically ignore them: only the attributes of the first Page element with a given PageID are copied. The select expression also assumes that the document element is named root.
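
The faster xsl:key variant mentioned above could look roughly like the following, run from C# through XslCompiledTransform. This is a sketch, not a tested drop-in: the key name pagesById and the class name XsltPageMerger are made up for illustration, and the stylesheet uses the common generate-id() comparison (Muenchian grouping) in place of the preceding-sibling test. Like the templates above, it does not renumber ParaID values.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

static class XsltPageMerger {
    const string Stylesheet = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <!-- index every Page by its PageID -->
  <xsl:key name=""pagesById"" match=""Page"" use=""@PageID""/>

  <!-- identity transform -->
  <xsl:template match=""@*|node()"">
    <xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>
  </xsl:template>

  <!-- drop every Page that is not the first one in its key group -->
  <xsl:template match=""Page[generate-id() != generate-id(key('pagesById', @PageID)[1])]""/>

  <!-- first Page of each group: copy the children of all Pages in the group -->
  <xsl:template match=""Page"">
    <xsl:copy>
      <xsl:apply-templates select=""@*""/>
      <xsl:apply-templates select=""key('pagesById', @PageID)/node()""/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the result.
    public static XDocument Merge(string xml) {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet))) {
            xslt.Load(reader);
        }
        var result = new XDocument();
        using (var writer = result.CreateWriter()) {
            xslt.Transform(XDocument.Parse(xml).CreateReader(), writer);
        }
        return result;
    }
}
```

Because the key lookup replaces the absolute /root/Page path, this version does not depend on the name of the document element.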
