XML repair in C#
The file format my application uses is XML-based. I just got a customer who has a botched XML file. The thing contains nearly 90,000 lines, and for some reason there are about 20 "=" symbols randomly interspersed.
I get an XmlException for most of them, with a line number and character position, which allows me to find the offending characters and remove them manually. I've just started writing a small app that automates this process, but I was wondering if there are better ways to repair damaged XML files.
Example of a botched line:
<item name="InstanceGuid" typ=e_name="gh_guid" type_code="9">ee330f9f-a1e2-451a-8c6d-723f066a6bd4</item>
(the stray "=" is inside typ=e_name, which is supposed to be type_name)
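The XmlException described above exposes LineNumber and LinePosition properties, so a small tool can report where parsing stopped without anyone opening the file by hand. The following is a minimal sketch, assuming the file fits in memory as a string; the names XmlDiagnostics and FindFirstError are made up for illustration. The parser stops at the first problem it hits, so each call reports only one error.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlDiagnostics
{
    // Hypothetical helper: returns the first well-formedness error,
    // or null if the text parses cleanly.
    public static XmlException FindFirstError(string xml)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                // Reading to the end forces the parser over the whole document.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            // ex.LineNumber and ex.LinePosition say where the parser gave up.
            return ex;
        }
    }
}
```

A caller could print error.LineNumber, error.LinePosition and error.Message, fix that spot, and call the method again until it returns null.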
Comments (3)
You could search for any equals sign that isn't followed by a double quote. A regular expression (regex) for that would be pretty simple to write.
Or you could just open the file in an advanced text editor and use that same regex to find and replace/remove. Some text editors support regex find/replace, so you could search for any equals sign not followed by a double quote and just remove it.
Of course, I'd keep a copy of the original, since if you have equals signs in the inner XML (text content, for example) this might mess them up.
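One way to apply this suggestion in C# while limiting the risk to equals signs in the inner XML is to run the replacement only inside tags and to skip over quoted attribute values. This is a rough sketch and not a general XML repair tool: the class and method names are hypothetical, and it assumes that attribute values contain no ">" character and that the quotes themselves are intact.

```csharp
using System.Text.RegularExpressions;

public static class XmlRepair
{
    // A start, end, or empty-element tag: '<', optional '/', a name-start
    // character, then everything up to the next '>'.
    private static readonly Regex Tag = new Regex(@"</?[A-Za-z_][^<>]*>");

    // Inside a tag: a quoted attribute value (kept as-is),
    // or an '=' that is not followed by a quote (treated as stray).
    private static readonly Regex ValueOrStrayEquals =
        new Regex(@"""[^""]*""|'[^']*'|=(?![""'])");

    public static string RemoveStrayEquals(string xml)
    {
        return Tag.Replace(xml, tag =>
            ValueOrStrayEquals.Replace(tag.Value,
                m => m.Value == "=" ? "" : m.Value));
    }
}
```

Text between tags is never touched, so content such as "a = b" survives. A stray "=" that landed inside a quoted value is also left alone, which is harmless because the file is still well-formed there.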
Use a regular expression to clean the XML first.
something like:
Obviously this would need to be ported to your Regex engine of choice :)
In TextPad, if you search using the regular expression =[^"] you will find any = sign that is not followed by a ".
This should find the locations in the document where the rogue = signs have appeared. To replace them, first open the document in TextPad. Then press F8.
In the dialog, enter the following:
Find what: =\([^"]\)
Replace with: \1
Check the "Regular expressions" box, select "All documents" and click "Replace All".
This matches every = that isn't followed by a ", together with the character after it, and replaces the pair with just that following character. For example:
typename="test" typ=ename="test"
will become
typename="test" typename="test"
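The same find/replace can be ported to .NET more or less directly. In .NET regex syntax the capture group is written with unescaped parentheses and the replacement refers to it as $1 instead of \1. A minimal sketch follows; the class and method names are placeholders, and like the TextPad version it also rewrites = signs in text content.

```csharp
using System.Text.RegularExpressions;

public static class TextPadPort
{
    // .NET equivalent of "Find what: =\([^"]\)" / "Replace with: \1":
    // drop any '=' whose next character is not a double quote,
    // keeping that next character.
    public static string DropRogueEquals(string text)
    {
        return Regex.Replace(text, "=([^\"])", "$1");
    }
}
```

It could be applied to a whole file with File.ReadAllText and File.WriteAllText, writing to a new path so that the original stays untouched.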