How to escape foreign characters using Regex or StringBuilder?

Posted on 2025-01-18 19:26:41

I have the following method to clean up strings:

public static String UseStringBuilderWithHashSet(string strIn)
{
    var hashSet = new HashSet<char>("?&^$#@!()+-,:;<>’\'-_*");
    // specify capacity of StringBuilder to avoid resizing
    StringBuilder sb = new StringBuilder(strIn.Length);
    foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
    {
        sb.Append(x);
    }
    return sb.ToString();
}

However, strings such as [MV] REOL ちるちる ChiruChiru or [MV] REOL ヒビカセ Hibikase do not get cleaned up.

How can I modify my method so that it turns one of the above strings into, for example:
[MV] REOL ChiruChiru

Comments (1)

尸血腥色 2025-01-25 19:26:41

You're trying to solve this exhaustively by filtering out everything you don't want. That doesn't scale, because there are more than 100,000 possible Unicode characters and your blacklist can only ever cover a handful of them.

You may find better results if you only accept what you do want.

public static string CleanInput(string input)
{
    // a-zA-Z allows any English letter, upper or lower case
    // \[ and \] allow square brackets
    // \s allows whitespace
    var regex = new Regex(@"[a-zA-Z\[\]\s]");
    var stringBuilder = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        if (regex.IsMatch(c.ToString()))
        {
            stringBuilder.Append(c);
        }
    }
    string output = stringBuilder.ToString();
    // \s+ matches any run of whitespace and replaces it with
    // a single space.
    return Regex.Replace(output, @"\s+", " ");
}
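
If the per-character IsMatch call turns out to be slow, the same whitelist idea can probably be expressed as a single replace with a negated character class. This is only a sketch: the class name TitleCleaner and the two static field names are made up for illustration, and the final Trim() is an extra step that the method above does not perform.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleCleaner
{
    // Hypothetical helper field: matches every character that is NOT
    // an ASCII letter, a square bracket, or whitespace.
    private static readonly Regex Disallowed =
        new Regex(@"[^a-zA-Z\[\]\s]", RegexOptions.Compiled);

    // Hypothetical helper field: matches a run of one or more whitespace characters.
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    public static string CleanInput(string input)
    {
        // Remove everything outside the allow-list in one call.
        string kept = Disallowed.Replace(input, string.Empty);

        // Collapse the gaps left behind into single spaces.
        // Trim() is an addition (assumption): it drops leading and trailing spaces.
        return WhitespaceRun.Replace(kept, " ").Trim();
    }
}
```

With the strings from the question, TitleCleaner.CleanInput("[MV] REOL ちるちる ChiruChiru") gives "[MV] REOL ChiruChiru". Note that digits and punctuation are stripped as well, so extend the character class (for example with 0-9) if you need to keep them.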