What should I watch out for when doing a case-insensitive replace with a regular expression?

Posted on 2024-08-13 04:12:46

I wrote the following code to do a case-insensitive replace in C#:

Regex.Replace(textBoxText, 
    Regex.Escape(findText), 
    replaceText, 
    RegexOptions.IgnoreCase);

I just wanted to check whether this is the right approach, whether there is a better one, and whether I'm overlooking something I should be aware of.

Note: Please don't give me hand-crafted code. I used a fast replace function from CodeProject, and that code crashes on the client side, where I have no way of knowing what input the user was using. So I prefer a simple but correct and reliable method.

Comments (1)

怎言笑 2024-08-20 04:12:46

Your code looks fine, but remember that case-insensitive matching like this uses the current culture (locale) of the running thread. It is usually better to specify the culture you want, or to let the user select it. CultureInvariant is usually a good general choice, because it behaves the same in any locale:

Regex.Replace(textBoxText, 
    Regex.Escape(findText), 
    replaceText, 
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
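
One more thing that may be worth checking: Regex.Escape only protects the pattern, while the replacement string is still parsed for substitutions such as $0 or $1. The following is a minimal sketch, assuming that both findText and replaceText are meant as plain text. The class and method names are hypothetical, and the early return for an empty search string is an assumption about the desired behavior (an empty pattern would otherwise match at every position).

```csharp
using System.Text.RegularExpressions;

static class LiteralReplace
{
    // Hypothetical helper: replaces every case-insensitive occurrence of
    // findText with replaceText, treating both strings as plain text.
    public static string ReplaceIgnoreCase(string input, string findText, string replaceText)
    {
        // Assumption: an empty search string should leave the input unchanged.
        if (string.IsNullOrEmpty(findText))
        {
            return input;
        }

        return Regex.Replace(
            input,
            Regex.Escape(findText),
            // "$$" in a replacement pattern produces one literal "$"
            replaceText.Replace("$", "$$"),
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

For example, LiteralReplace.ReplaceIgnoreCase("Price: TBD", "tbd", "$0.99") returns "Price: $0.99". Without the doubling, $0 would be read as "the whole match" and the result would be "Price: TBD.99".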

To use another locale, you need to do a bit more hocus pocus:

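// requires: using System.Globalization; using System.Text.RegularExpressions; using System.Threading;

// remember the current culture
CultureInfo originalCulture = Thread.CurrentThread.CurrentCulture;
string result;
try
{
    // set the user-selected culture here (in place of "en-US")
    Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");

    // do the regex; IgnoreCase uses the culture that is current at this point
    result = Regex.Replace(textBoxText,
        Regex.Escape(findText),
        replaceText,
        RegexOptions.IgnoreCase);
}
finally
{
    // restore the original culture, even if the replace throws
    Thread.CurrentThread.CurrentCulture = originalCulture;
}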
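
To see why the culture matters, a small probe like the one below can help. It is only a sketch: the class and method names are hypothetical, and the culture-sensitive result depends on the runtime's globalization data. Under Turkish casing rules, uppercase I pairs with dotless ı rather than with i, so a culture-sensitive match of "i" against "I" under "tr-TR" is expected to differ from the invariant one.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;
using System.Threading;

static class CultureProbe
{
    // Hypothetical helper: evaluates a match while the named culture is
    // current, then restores the previous culture.
    public static bool MatchesUnder(string cultureName, string input, string pattern, RegexOptions options)
    {
        CultureInfo original = Thread.CurrentThread.CurrentCulture;
        try
        {
            Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture(cultureName);
            return Regex.IsMatch(input, pattern, options);
        }
        finally
        {
            // restore the previous culture even if the match throws
            Thread.CurrentThread.CurrentCulture = original;
        }
    }
}
```

Calling CultureProbe.MatchesUnder("tr-TR", "I", "i", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant) returns true, because the invariant rules ignore the current culture. Dropping CultureInvariant from the same call lets the Turkish rules apply, which is the case where user input can behave differently from what you tested.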

Note that you can switch case-insensitivity on or off inside the pattern. The switch is explicit rather than a toggle, which means that:

// these three statements are equivalent and all return "":
Regex.Replace("tExT", "[a-z]", "", RegexOptions.IgnoreCase);
Regex.Replace("tExT", "(?i)[a-z]", "", RegexOptions.IgnoreCase);
Regex.Replace("tExT", "(?i)[a-z]", "");

// once IgnoreCase is used, this switches it off for the whole expression (returns "ET")...
Regex.Replace("tExT", "(?-i)[a-z]", "", RegexOptions.IgnoreCase);

//...and this switches it off only inside the group, which here is still the whole expression (returns "ET"):
Regex.Replace("tExT", "(?:(?-i)[a-z])", "", RegexOptions.IgnoreCase);

The last one is interesting: inside a non-capturing group (?: ... ), the case switch (?-i) applies only up to the group's closing parenthesis; after that it is no longer in effect. You can use this as often as you like in an expression. Used outside a group, a switch stays in effect until the next case-sensitivity switch or the end of the pattern.
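
The difference in scope is easier to see with a pattern that has something after the group. The sketch below uses hypothetical names and the shorthand (?-i:a), which is equivalent to (?:(?-i)a); it is an illustration, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

static class InlineCaseSwitches
{
    // Group-scoped: case-insensitivity is off only inside the group,
    // so "a" must be lowercase while "b" may be either case.
    public static bool Scoped(string input)
    {
        return Regex.IsMatch(input, "^(?-i:a)b$", RegexOptions.IgnoreCase);
    }

    // Unscoped: (?-i) stays in effect to the end of the pattern,
    // so both "a" and "b" must be lowercase.
    public static bool Unscoped(string input)
    {
        return Regex.IsMatch(input, "^(?-i)ab$", RegexOptions.IgnoreCase);
    }
}
```

Here Scoped("aB") is true and Scoped("Ab") is false, while Unscoped("aB") is false and Unscoped("ab") is true.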

Update: I wrongly assumed that you can't do case-sensitivity switching. The text above has been edited with this in mind.
