Remove empty XML tags

Posted on 2024-12-02 22:01:08

I am looking for a good approach that can remove empty tags from XML efficiently. What do you recommend? Regex? XDocument? XmlTextReader?

For example,

const string original = 
    @"<?xml version=""1.0"" encoding=""utf-16""?>
    <pet>
        <cat>Tom</cat>
        <pig />
        <dog>Puppy</dog>
        <snake></snake>
        <elephant>
            <africanElephant></africanElephant>
            <asianElephant>Biggy</asianElephant>
        </elephant>
        <tiger>
            <tigerWoods></tigerWoods>       
            <americanTiger></americanTiger>
        </tiger>
    </pet>";

Could become:

const string expected = 
    @"<?xml version=""1.0"" encoding=""utf-16""?>
        <pet>
        <cat>Tom</cat>
        <dog>Puppy</dog>        
        <elephant>                                              
            <asianElephant>Biggy</asianElephant>
        </elephant>                                 
    </pet>";

流星番茄 2024-12-09 22:01:08

Loading your original into an XDocument and using the following code gives your desired output:

var document = XDocument.Parse(original);
document.Descendants()
        .Where(e => e.IsEmpty || String.IsNullOrWhiteSpace(e.Value))
        .Remove();

鲸落 2024-12-09 22:01:08

This is meant to be an improvement on the accepted answer to handle attributes:

XDocument xd = XDocument.Parse(original);
xd.Descendants()
    .Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration || string.IsNullOrWhiteSpace(a.Value))
            && string.IsNullOrWhiteSpace(e.Value)
            && e.Descendants().SelectMany(c => c.Attributes()).All(ca => ca.IsNamespaceDeclaration || string.IsNullOrWhiteSpace(ca.Value))))
    .Remove();

The idea here is to check that all attributes on an element are also empty before removing it. There is also the case where empty descendants have non-empty attributes, so I added a third condition that checks that every attribute among the element's descendants is empty as well. Consider the following document, with node8 added:

<root>
  <node />
  <node2 blah='' adf='2'></node2>
  <node3>
    <child />
  </node3>
  <node4></node4>
  <node5><![CDATA[asdfasdf]]></node5>
  <node6 xmlns='urn://blah' d='a'/>
  <node7 xmlns='urn://blah2' />
  <node8>
     <child2 d='a' />
  </node8>
</root>

This would become:

<root>
  <node2 blah="" adf="2"></node2>
  <node5><![CDATA[asdfasdf]]></node5>
  <node6 xmlns="urn://blah" d="a" />
  <node8>
    <child2 d="a" />
  </node8>
</root>

The original and improved answers to this question would lose the node2, node6, and node8 nodes. Checking e.IsEmpty would work if you only want to strip out nodes like <node />, but it's redundant if you're going for both <node /> and <node></node>. If you also need to remove empty attributes, you could do this:

xd.Descendants().Attributes().Where(a => string.IsNullOrWhiteSpace(a.Value)).Remove();
xd.Descendants()
  .Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration))
            && string.IsNullOrWhiteSpace(e.Value))
  .Remove();

which would give you:

<root>
  <node2 adf="2"></node2>
  <node5><![CDATA[asdfasdf]]></node5>
  <node6 xmlns="urn://blah" d="a" />
</root>

烟凡古楼 2024-12-09 22:01:08

As always, it depends on your requirements.

Do you know how the empty tags will appear? (e.g. <pig />, <pig></pig>, etc.) I usually do not recommend regular expressions (they are really useful, but at the same time they are evil). A string.Replace approach also seems problematic unless your XML has a very specific, predictable structure.

Finally, I would recommend using an XML parser approach (make sure your input is valid XML).

var doc = XDocument.Parse(original);
var emptyElements = from descendant in doc.Descendants()
                    where descendant.IsEmpty || string.IsNullOrWhiteSpace(descendant.Value)
                    select descendant;
emptyElements.Remove();

未蓝澄海的烟 2024-12-09 22:01:08

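Whatever you use will have to pass through the file at least once. If it is just a single named tag that you know in advance, then regex is your friend; otherwise use a stack approach. Start with the parent tag, and if it has child tags, push it onto the stack. If you find an empty tag, remove it. Once you have gone through the child tags and reached the closing tag of the element on top of the stack, pop it and check it: if it is now empty, remove it as well. This way you remove all empty tags, including tags that contain only empty children.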
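The stack approach described above could be sketched roughly as follows. This is only one possible reading of it: it walks an already-parsed XElement tree rather than raw text, and the class name EmptyTagStripper, the method RemoveEmptyElements, and the choice to treat elements with attributes as non-empty are all assumptions made for illustration, not something the answer specifies.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class EmptyTagStripper
{
    // Hypothetical helper: removes every element that has no attributes,
    // no non-whitespace text, and no surviving child elements.
    public static void RemoveEmptyElements(XElement root)
    {
        // Each frame pairs an element with a flag that says whether its
        // children were already pushed (i.e. we are back at its "end tag").
        var stack = new Stack<(XElement Element, bool ChildrenDone)>();
        stack.Push((root, false));

        while (stack.Count > 0)
        {
            var (element, childrenDone) = stack.Pop();

            if (!childrenDone)
            {
                // Revisit this element only after all of its children.
                stack.Push((element, true));
                foreach (var child in element.Elements().ToList())
                    stack.Push((child, false));
                continue;
            }

            // Children have been processed, so any empty ones are gone.
            bool isEmpty = !element.HasElements
                           && !element.HasAttributes
                           && string.IsNullOrWhiteSpace(element.Value);

            // Assumption: the root is always kept, even if it ends up empty.
            if (isEmpty && element != root)
                element.Remove();
        }
    }
}
```

Calling EmptyTagStripper.RemoveEmptyElements(XDocument.Parse(original).Root) on the question's sample should drop pig, snake, africanElephant, and the whole tiger subtree, because each parent is checked only after its children have been handled.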

If you are going with a regular expression, use this

一桥轻雨一伞开 2024-12-09 22:01:08

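XDocument is probably the easiest to implement, and it will perform well enough if you know your documents are reasonably small.

XmlTextReader will be faster and use less memory than XDocument when processing very large documents.

Regex is best suited to handling text rather than XML. It might not handle all the edge cases the way you want (e.g. a tag inside a CDATA section, or a tag with an xmlns attribute), so it is probably not a good idea for a general implementation, but it may be adequate depending on how much control you have over the input XML.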
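To make the CDATA edge case concrete, here is a small comparison. The pattern and the names RegexPitfallDemo, StripWithRegex, and StripWithParser are invented for this illustration; the regex is deliberately naive (attribute-less tags with simple names only) and is not a recommended pattern.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class RegexPitfallDemo
{
    // Illustrative pattern only: matches <name/>, <name /> and <name></name>
    // for tags without attributes.
    private static readonly Regex EmptyTag =
        new Regex(@"<(\w+)\s*/>|<(\w+)>\s*</\2>");

    // Text-level removal: knows nothing about CDATA or namespaces.
    public static string StripWithRegex(string xml) => EmptyTag.Replace(xml, "");

    // Parser-level removal: only real elements are considered.
    public static string StripWithParser(string xml)
    {
        var doc = XDocument.Parse(xml);
        doc.Descendants()
           .Where(e => !e.HasAttributes
                       && !e.HasElements
                       && string.IsNullOrWhiteSpace(e.Value))
           .Remove();
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

For an input such as <root><note><![CDATA[use <br/> here]]></note><empty/></root>, the regex version also deletes the <br/> that is only text inside the CDATA section, while the parser version leaves the CDATA content alone and removes just the real <empty/> element. A single regex pass also leaves behind parents that become empty only after their children are removed, so it would have to be repeated.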

三五鸿雁 2024-12-09 22:01:08

XmlTextReader is preferable if we are talking about performance (it provides fast, forward-only access to XML). You can determine whether a tag is empty using the XmlReader.IsEmptyElement property.

An XDocument approach that produces the desired output:

public static bool IsEmpty(XElement n)
{
    return n.IsEmpty 
        || (string.IsNullOrEmpty(n.Value) 
            && (!n.HasElements || n.Elements().All(IsEmpty)));
}

var doc = XDocument.Parse(original);
var emptyNodes = doc.Descendants().Where(IsEmpty);
foreach (var emptyNode in emptyNodes.ToArray())
{
    emptyNode.Remove();
}
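For the reader-based route mentioned at the top of this answer, a minimal sketch of checking XmlReader.IsEmptyElement might look like the following. The class EmptyElementScanner and the method FindSelfClosing are hypothetical names, and the sketch uses XmlReader.Create in place of XmlTextReader, which is my substitution. It only detects elements; it does not rewrite the document.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EmptyElementScanner
{
    // Hypothetical helper: returns the names of elements written in
    // self-closing form, in document order.
    public static List<string> FindSelfClosing(string xml)
    {
        var names = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // Forward-only pass: each node is visited exactly once.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.IsEmptyElement)
                    names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

One caveat worth knowing: IsEmptyElement is true only for the self-closing form such as <pig />. An element written as <snake></snake> reports false, so a streaming solution would also need to notice a start tag that is immediately followed by its end tag.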
