Removing empty XML tags
I am looking for a good way to remove empty tags from XML efficiently. What would you recommend: regex, XDocument, or XmlTextReader?
For example,
const string original =
@"<?xml version=""1.0"" encoding=""utf-16""?>
<pet>
<cat>Tom</cat>
<pig />
<dog>Puppy</dog>
<snake></snake>
<elephant>
<africanElephant></africanElephant>
<asianElephant>Biggy</asianElephant>
</elephant>
<tiger>
<tigerWoods></tigerWoods>
<americanTiger></americanTiger>
</tiger>
</pet>";
Could become:
const string expected =
@"<?xml version=""1.0"" encoding=""utf-16""?>
<pet>
<cat>Tom</cat>
<dog>Puppy</dog>
<elephant>
<asianElephant>Biggy</asianElephant>
</elephant>
</pet>";
Loading your original into an XDocument and using the following code gives your desired output:
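A minimal sketch of this XDocument approach, assuming "empty" means an element whose subtree contains no non-whitespace text (this may differ from the answer's exact code). It relies on the LINQ to XML Remove() extension, which copies the matched elements into a list before removing them. Note that, under this rule, an element that carries attributes but no text is also removed.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sample input from the question (XML declaration omitted for brevity).
const string original = @"<pet>
  <cat>Tom</cat>
  <pig />
  <dog>Puppy</dog>
  <snake></snake>
  <elephant>
    <africanElephant></africanElephant>
    <asianElephant>Biggy</asianElephant>
  </elephant>
  <tiger>
    <tigerWoods></tigerWoods>
    <americanTiger></americanTiger>
  </tiger>
</pet>";

XDocument doc = XDocument.Parse(original);

// XElement.Value concatenates all descendant text, so a blank Value catches
// <pig />, <snake></snake>, and parents such as <tiger> whose children are all empty.
doc.Descendants()
   .Where(e => string.IsNullOrWhiteSpace(e.Value))
   .Remove();

Console.WriteLine(doc);
```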
This is meant to be an improvement on the accepted answer to handle attributes:
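A sketch of the three conditions this answer describes, using the same text-based notion of "empty". The sample document and the helper name AllBlank are hypothetical, and in this sketch namespace declarations (xmlns) are treated like any other attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample: <b> has no text but a non-empty attribute,
// and <c> has no text but a child that carries one.
const string xml = @"<root>
  <a>text</a>
  <b id=""42"" />
  <c><d flag=""on"" /></c>
  <e empty="""" />
  <f></f>
</root>";

XDocument doc = XDocument.Parse(xml);

// True when every attribute has a blank value (also true for no attributes).
static bool AllBlank(IEnumerable<XAttribute> attrs) =>
    attrs.All(a => string.IsNullOrWhiteSpace(a.Value));

doc.Descendants()
   .Where(e =>
       string.IsNullOrWhiteSpace(e.Value)                            // 1. no text content
       && AllBlank(e.Attributes())                                   // 2. only blank attributes on the element
       && AllBlank(e.Descendants().SelectMany(d => d.Attributes()))) // 3. only blank attributes below it
   .Remove();

Console.WriteLine(doc);
```

Here <e> and <f> are removed, while <b>, <c>, and <d> survive because of their non-empty attributes.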
The idea here is to check that all attributes on an element are also empty before removing it. Empty descendants can also have non-empty attributes, so I inserted a third condition that checks that all attributes among the element's descendants are empty as well. Consider the following document, with node8 added:
This would become:
The original and improved answers to this question would lose the node2, node6, and node8 nodes. Checking for e.IsEmpty would work if you only want to strip out nodes like <node />, but it's redundant if you're going for both <node /> and <node></node>. If you also need to remove empty attributes, you could do this:
which would give you:
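A minimal sketch of the "remove empty attributes too" variant, assuming it runs in two passes: first drop attributes with blank values, then drop elements that have no text and no remaining attributes anywhere in their subtree. The sample input is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample: <b> has a real attribute, <d> and <e> only blank ones.
const string xml = @"<root>
  <a>text</a>
  <b id=""42"" />
  <c><d flag="""" /></c>
  <e empty="""" />
  <f></f>
</root>";

XDocument doc = XDocument.Parse(xml);

// Pass 1: drop attributes whose value is blank.
doc.Descendants()
   .Attributes()
   .Where(a => string.IsNullOrWhiteSpace(a.Value))
   .Remove();

// Pass 2: drop elements with no text and no attributes left in their subtree.
// Checking Value already covers both <node /> and <node></node>, so there is no IsEmpty test.
doc.Descendants()
   .Where(e => string.IsNullOrWhiteSpace(e.Value)
               && !e.DescendantsAndSelf().Attributes().Any())
   .Remove();

Console.WriteLine(doc);
```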
As always, it depends on your requirements.
Do you know what form the empty tags will take (e.g. <pig />, <pig></pig>, etc.)? I usually do not recommend regular expressions (they are really useful, but at the same time they are evil). A string.Replace approach also seems problematic unless your XML has a fixed, known structure. Finally, I would recommend an XML parser approach (make sure your input is valid XML).
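A small hypothetical example of the string.Replace concern: replacing the exact empty forms you know about works once, but it cannot notice that a parent has become empty as a result, and it misses spelling variants such as <pig/> versus <pig />.

```csharp
using System;

// Hypothetical fragment of the question's sample.
string xml = "<pet><cat>Tom</cat><pig /><tiger><tigerWoods></tigerWoods></tiger></pet>";

// Replace only the exact empty forms we happen to know about.
string replaced = xml
    .Replace("<pig />", "")                     // would not match <pig/> or <pig  />
    .Replace("<tigerWoods></tigerWoods>", "");

// <tiger> is now empty too, but one round of Replace calls cannot see that.
Console.WriteLine(replaced); // prints <pet><cat>Tom</cat><tiger></tiger></pet>
```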
Whatever you use will have to pass through the file at least once. If it's just a single named tag that you know in advance, regex is your friend; otherwise, use a stack approach. Start with the parent tag, and if it has a child tag, push it onto the stack. If you find an empty tag, remove it. Once you have gone through the child tags and reached the closing tag of the element on top of the stack, pop it and check it as well; if it's empty, remove it too. This way you can remove all empty tags, including tags whose children are all empty.
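A sketch of the stack idea as a single streaming pass with XmlReader and XmlWriter. Assumptions: an element counts as empty when no non-whitespace text or CDATA occurs anywhere inside it (attributes alone don't keep it); start tags wait on the stack and are written only once real content shows up; whitespace, comments, and processing instructions are dropped; the helper name RemoveEmptyElements is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: one forward-only pass that drops every element whose
// subtree contains no non-whitespace text or CDATA (attributes alone don't count).
static string RemoveEmptyElements(string input)
{
    // Open elements not yet known to be non-empty sit on this stack (bottom first).
    var stack = new List<(string Prefix, string Name, string Ns,
                          List<(string P, string N, string Ns, string V)> Attrs)>();
    int written = 0; // stack[0..written) have already been sent to the writer

    var output = new StringWriter();
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
    using (XmlReader reader = XmlReader.Create(new StringReader(input)))
    using (XmlWriter writer = XmlWriter.Create(output, settings))
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    if (reader.IsEmptyElement) break; // <pig /> can never hold text
                    // Copy the attributes now; the reader will have moved on by the time we write.
                    var attrs = new List<(string P, string N, string Ns, string V)>();
                    while (reader.MoveToNextAttribute())
                        attrs.Add((reader.Prefix, reader.LocalName, reader.NamespaceURI, reader.Value));
                    reader.MoveToElement();
                    stack.Add((reader.Prefix, reader.LocalName, reader.NamespaceURI, attrs));
                    break;

                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    if (string.IsNullOrWhiteSpace(reader.Value)) break;
                    // Real content: every pending ancestor is non-empty, so emit their start tags.
                    for (; written < stack.Count; written++)
                    {
                        var f = stack[written];
                        writer.WriteStartElement(f.Prefix, f.Name, f.Ns);
                        foreach (var a in f.Attrs)
                            writer.WriteAttributeString(a.P, a.N, a.Ns, a.V);
                    }
                    if (reader.NodeType == XmlNodeType.CDATA) writer.WriteCData(reader.Value);
                    else writer.WriteString(reader.Value);
                    break;

                case XmlNodeType.EndElement:
                    // Pop the matching start tag; close it only if it was written.
                    bool wasWritten = written == stack.Count;
                    stack.RemoveAt(stack.Count - 1);
                    if (wasWritten) { writer.WriteEndElement(); written--; }
                    break;

                // Whitespace, comments, processing instructions, etc. are dropped here.
            }
        }
    }
    return output.ToString();
}

string result = RemoveEmptyElements(
    "<pet><cat>Tom</cat><pig /><snake></snake><tiger><tigerWoods></tigerWoods></tiger></pet>");
Console.WriteLine(result); // prints <pet><cat>Tom</cat></pet>
```

Because start tags are only flushed from the bottom of the stack upward, the already-written entries always form the bottom of the stack, so a single counter is enough to tell whether the element being closed was emitted.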
If you are after a regex, use this:
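One possible pattern, offered as a hypothetical sketch under narrow assumptions: element names match \w+, empty elements carry no attributes, and the input has no comments or CDATA sections. Applying it repeatedly until the text stops changing also removes parents that become empty, such as <tiger>.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern: optional leading whitespace, then an attribute-free element that is
// either self-closing (<pig />) or has only whitespace between its tags (<snake></snake>).
// \w+ does not match names containing '-', '.' or ':'.
var emptyElement = new Regex(@"\s*<(\w+)(?:\s*/>|>\s*</\1>)");

string xml = "<pet><cat>Tom</cat><pig /><dog>Puppy</dog><snake></snake>"
           + "<elephant><africanElephant></africanElephant><asianElephant>Biggy</asianElephant></elephant>"
           + "<tiger><tigerWoods></tigerWoods><americanTiger></americanTiger></tiger></pet>";

// Each pass removes the innermost empty elements; repeat until nothing changes
// so that parents emptied by the previous pass are removed too.
string previous;
do
{
    previous = xml;
    xml = emptyElement.Replace(xml, "");
} while (xml != previous);

Console.WriteLine(xml);
```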
XDocument is probably the simplest to implement, and it will give adequate performance if you know your documents are reasonably small. XmlTextReader will be faster and use less memory than XDocument when processing very large documents. Regex is best for handling text rather than XML. It might not handle all edge cases as you would like (e.g. a tag within a CDATA section, or a tag with an xmlns attribute), so it is probably not a good idea for a general implementation, but it may be adequate depending on how much control you have over the input XML.
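A small hypothetical input showing the CDATA edge case: a text-level regex sees "<b></b>" inside the CDATA section as an empty element and deletes it, while XDocument treats the same characters as text and therefore keeps the element that holds them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical input: <snippet> holds markup as CDATA text, not as child elements.
const string xml = "<doc><snippet><![CDATA[<b></b>]]></snippet><empty></empty></doc>";

// Text-level regex for attribute-free empty elements (<x/> or <x></x>).
string viaRegex = Regex.Replace(xml, @"<(\w+)(?:\s*/>|>\s*</\1>)", "");

// Parser-level removal: an element is empty when it has no non-whitespace text.
XDocument doc = XDocument.Parse(xml);
doc.Descendants().Where(e => string.IsNullOrWhiteSpace(e.Value)).Remove();
string viaXDocument = doc.ToString(SaveOptions.DisableFormatting);

Console.WriteLine(viaRegex);     // the CDATA payload "<b></b>" has been stripped
Console.WriteLine(viaXDocument); // the CDATA payload is kept; only <empty> is gone
```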
XmlTextReader is preferable if we are talking about performance (it provides fast, forward-only access to XML). You can determine whether a tag is self-closing (e.g. <pig />) using the XmlReader.IsEmptyElement property. An XDocument approach that produces the desired output:
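A minimal sketch of IsEmptyElement in use. It obtains the reader through XmlReader.Create, which Microsoft recommends over constructing XmlTextReader directly. IsEmptyElement is true only for the self-closing form, so <snake></snake> reports false; detecting that form, or a parent whose children are all empty, still requires tracking content as you read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

const string xml = "<pet><pig /><snake></snake><cat>Tom</cat></pet>";

// Record, for each start tag, whether the reader reports it as an empty element.
var flags = new List<(string Name, bool IsEmpty)>();
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
            flags.Add((reader.LocalName, reader.IsEmptyElement));
    }
}

foreach (var (name, isEmpty) in flags)
    Console.WriteLine($"{name}: IsEmptyElement = {isEmpty}");
```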