Replace text not contained in a tag using Regex or an XML parser
I know that using regular expressions to parse or manipulate HTML/XML is a bad idea, and I usually would never do it. I am only considering it because of a lack of alternatives.
I need to replace text inside a string that is not already part of a tag (ideally a span tag with a specific id) using C#.
For example, let's say I want to replace all instances of ABC in the following text that are not inside a span with alternate text (another span in my case):
ABC at start of line or ABC here must be replaced but, <span id="__publishingReusableFragment" >ABC inside span must not be replaced with anything. Another ABC here </span> this ABC must also be replaced
I tried using regex with both lookahead and lookbehind assertions, in various combinations along the lines of
string regexPattern = "(?<!id=\"__publishingReusableFragment\").*?" + stringToMatch + ".*?(?!span)";
but gave up on that.
I tried loading it into an XElement, creating a writer from there, and getting the text that is not inside a node, but I couldn't figure that out either.
XElement xel = XElement.Parse("<payload>" + inputString + @"</payload>");
XmlWriter requiredWriter = xel.CreateWriter();
I am hoping to somehow use the writer to get the strings that are not part of a node and replace them.
Basically, I am open to any suggestions/solutions to solve this problem.
Thanks in advance for the help.
Comments (2)
will work with all the caveats about HTML parsing (that you seem to know, so I won't repeat them here) still valid.
The regex matches ABC if it's not preceded by an opening <span id="__publishingReusableFragment"> tag and if there is no closing </span> tag between the two. It will obviously fail if there can be nested <span> tags.
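The pattern this answer describes did not survive in the text, so the following is only a minimal sketch of the same idea, not the answerer's original code. It assumes .NET's support for variable-length lookbehind; the names FragmentReplacer and ReplaceOutsideSpan are hypothetical, and the tag matching is deliberately loose about whitespace and extra attributes.

```csharp
using System.Text.RegularExpressions;

public static class FragmentReplacer
{
    // Hypothetical helper: replaces every occurrence of `target` that is not
    // located inside <span id="__publishingReusableFragment"> ... </span>.
    public static string ReplaceOutsideSpan(string input, string target, string replacement)
    {
        // Negative lookbehind: fail if an opening span with the given id
        // precedes the match with no closing </span> in between.
        string pattern =
            @"(?<!<span\b[^>]*\bid=""__publishingReusableFragment""[^>]*>(?:(?!</span>).)*)"
            + Regex.Escape(target);

        // A MatchEvaluator is used so that "$" in the replacement is taken literally.
        // Singleline lets "." cross line breaks inside the span.
        return Regex.Replace(input, pattern, m => replacement, RegexOptions.Singleline);
    }
}
```

Calling FragmentReplacer.ReplaceOutsideSpan(text, "ABC", "XYZ") on the sample from the question should replace the three occurrences outside the span and leave the two inside it untouched. As the answer warns, nested <span> tags inside the fragment would break this.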
I know it's slightly ugly, but this will work
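For comparison, the XElement route from the question can be carried through without a writer. This is an illustration only, not the code this answer referred to. It assumes the input becomes well-formed XML once wrapped in <payload>, that any text sitting directly under that wrapper counts as "not inside a tag", and that the replacement is plain text; XmlFragmentReplacer and ReplaceInTopLevelText are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlFragmentReplacer
{
    // Hypothetical helper: replaces `target` only in text nodes that are direct
    // children of the wrapper element, i.e. text that is not inside any tag.
    public static string ReplaceInTopLevelText(string input, string target, string replacement)
    {
        // Wrap the fragment so it has a single root, as in the question.
        XElement root = XElement.Parse(
            "<payload>" + input + "</payload>", LoadOptions.PreserveWhitespace);

        // Nodes() returns only direct children, so text nested inside
        // the span (or any other element) is never visited.
        foreach (XText text in root.Nodes().OfType<XText>().ToList())
        {
            text.Value = text.Value.Replace(target, replacement);
        }

        // Serialize the children back to a string without the wrapper.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Two limitations of this sketch: the tags are re-serialized, so whitespace inside a tag (such as the space before > in the sample) is normalized, and a replacement that contains markup would be escaped as text. Inserting another span, as the question wants, would mean replacing each XText with a mix of XText and XElement nodes instead of editing its Value.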