XML repair in C#

Posted on 2024-08-04 06:19:10

The file format my application uses is XML-based. I just got a customer with a botched XML file. It contains nearly 90,000 lines and, for some reason, about 20 "=" symbols randomly interspersed.

I get an XmlException for most of them, with a line number and character position, which lets me find the offending characters and remove them manually. I've just started writing a small app to automate this process, but I was wondering whether there are better ways to repair damaged XML files.

Example of botched line:

<item name="InstanceGuid" typ=e_name="gh_guid" type_code="9">ee330f9f-a1e2-451a-8c6d-723f066a6bd4</item>
                             ↑ (this is supposed to be [type_name])
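
For reference, here is a minimal sketch of what automating the exception-driven approach could look like. It assumes the whole file fits in memory, that every stray "=" sits inside an attribute name (as in the example line), and that the line number reported by XmlException is the line containing the damage. The class name `XmlEqualsRepair`, the method `Repair`, and the `maxPasses` limit are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlEqualsRepair
{
    // Assumption: a stray '=' splits an attribute name, so it sits between
    // name characters and is followed by the rest of the name and the real ="...".
    private static readonly Regex StrayEquals =
        new Regex(@"(?<=[\w:.-])=(?=[\w:.-]+\s*=\s*[""'])");

    // Re-parses the text until it loads cleanly, fixing one reported line per pass.
    public static string Repair(string xml, int maxPasses = 1000)
    {
        string[] lines = xml.Split('\n');
        for (int pass = 0; pass < maxPasses; pass++)
        {
            string candidate = string.Join("\n", lines);
            try
            {
                new XmlDocument().LoadXml(candidate);
                return candidate; // parsed without errors
            }
            catch (XmlException ex)
            {
                int index = ex.LineNumber - 1; // LineNumber is 1-based
                if (index < 0 || index >= lines.Length) throw;

                // Remove only the first stray '=' on the line the parser complained about.
                string repaired = StrayEquals.Replace(lines[index], "", 1);
                if (repaired == lines[index]) throw; // not the kind of damage handled here

                lines[index] = repaired;
            }
        }
        throw new InvalidOperationException("Gave up after too many repair passes.");
    }
}
```

Because only lines the parser actually rejects are touched, "=" characters in text content or attribute values are left alone. Any error this sketch does not recognise is rethrown, so the remaining cases can still be fixed by hand.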

Comments (3)

笑,眼淚并存 2024-08-11 06:19:10

You could search for any equal sign that isn't followed by a double quote. A regular expression (regex) for that would be pretty simple to write.

Or you could just open the file in an advanced text editor that supports regex find/replace, search with that same regex, and remove each match.

Of course, I'd keep a copy of the original: if there are legitimate equal signs in the text content, this search will hit them too.
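
As a rough illustration in C#, the sketch below only reports where such equal signs occur instead of deleting them, so each hit can be checked against the original first. The pattern `=(?!")` is one way to write "an equal sign not followed by a double quote"; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StrayEqualsFinder
{
    // '=' that is NOT immediately followed by a double quote.
    private static readonly Regex Suspicious = new Regex(@"=(?!"")");

    // Returns 1-based (line, column) positions of every suspicious '='.
    public static List<(int Line, int Column)> Find(string text)
    {
        var hits = new List<(int Line, int Column)>();
        string[] lines = text.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            foreach (Match m in Suspicious.Matches(lines[i]))
            {
                hits.Add((i + 1, m.Index + 1));
            }
        }
        return hits;
    }
}
```

Note that this also flags equal signs in text content and in attributes written with single quotes, so the list is a set of candidates to review, not a list of confirmed errors.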

总以为 2024-08-11 06:19:10

Use a regular expression to clean the XML first.

Something like:

s/([^\s"]+)=([^\s"]+="[^"]*")/\1\2/

Obviously this would need to be ported to your Regex engine of choice :)
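
For the .NET engine, a port might look like the sketch below. The pattern is the one above, written as a C# verbatim string (so each `"` is doubled), and `\1\2` becomes `$1$2`. The names `SedPort` and `Clean` are just placeholders.

```csharp
using System.Text.RegularExpressions;

public static class SedPort
{
    // Same pattern as the sed expression: name-part, stray '=', then rest-of-name="value".
    private static readonly Regex StrayEqualsInName =
        new Regex(@"([^\s""]+)=([^\s""]+=""[^""]*"")");

    // Joins the two halves of the attribute name, dropping the stray '=' between them.
    public static string Clean(string xml)
    {
        return StrayEqualsInName.Replace(xml, "$1$2");
    }
}
```

Regex.Replace substitutes every non-overlapping match in the input, so this behaves like the sed expression with the g flag. It can also rewrite text content that happens to look like `a=b="c"`, so it is worth diffing the result against the original.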

北风几吹夏 2024-08-11 06:19:10

In TextPad, if you search using the regular expression =[^"], you will find any = sign that is not followed by a ".

This should find the locations in the document where the rogue = signs have appeared. To replace them, first open the document in TextPad. Then press F8.

In the dialog enter the following:

Find what: =\([^"]\)

Replace with: \1

Check the "Regular expressions" box, select "All documents" and click "Replace All"

This should match every = that isn't followed by a " and replace the match with just the character that followed it, which removes the stray =. For example:

typename="test" typ=ename="test"

will become

typename="test" typename="test"
