Case-insensitive string replace that works correctly with ligatures like "ß" <=> "ss"

Posted 2024-09-01 05:54:06

I have built a little ASP.NET form that searches for something and displays the results. I want to highlight the search string within the search results. Example:

Query: "p"
Results: a<b>p</b>ple, banana, <b>p</b>lum

The code that I have goes like this:

public static string HighlightSubstring(string text, string substring)
{
    var index = text.IndexOf(substring, StringComparison.CurrentCultureIgnoreCase);
    if (index == -1) return HttpUtility.HtmlEncode(text);
    string p0, p1, p2;
    text.SplitAt(index, index + substring.Length, out p0, out p1, out p2);
    return HttpUtility.HtmlEncode(p0) + "<b>" + HttpUtility.HtmlEncode(p1) + "</b>" + HttpUtility.HtmlEncode(p2);
}

It mostly works, but try it with HighlightSubstring("ß", "ss"), for example. This crashes because under a German culture the IndexOf method considers "ß" and "ss" to be equal, yet they have different lengths!

That would be fine if there were a way to find out how long the match in "text" is. Remember that this length can differ from substring.Length.

So how do I find out the length of the match that IndexOf produces in the presence of ligatures and exotic language characters (ligatures in this case)?


┾廆蒐ゝ 2024-09-08 05:54:06

This may not directly answer your question but perhaps will solve your actual problem.

Why not substitute instead?

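using System.Text.RegularExpressions;
using System.Web;

public static string HighlightString(string text, string substring)
{
    // Encode both sides first so the pattern is matched against the encoded text.
    Regex r = new Regex(Regex.Escape(HttpUtility.HtmlEncode(substring)),
                        RegexOptions.IgnoreCase);
    // "$&" inserts the whole match, so the original casing of the text is preserved.
    return r.Replace(HttpUtility.HtmlEncode(text), "<b>$&</b>");
}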

But what of the culture? If you specify a Regex as case-insensitive, it is culture-sensitive by default according to http://msdn.microsoft.com/en-us/library/z0sbec17.aspx.
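
To address the original question more directly (how long is the match that IndexOf found?), here is a minimal sketch of one possible approach. It is not from the posts above. The idea is to ask the culture's CompareInfo for the match position, then probe increasing lengths with CompareInfo.Compare until the slice of the text compares equal to the search string. The names Highlighter and MatchLengthAt are hypothetical, and System.Net.WebUtility.HtmlEncode is swapped in for HttpUtility.HtmlEncode only so that the sketch compiles without a reference to System.Web.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class Highlighter
{
    // Hypothetical helper: length of the shortest slice of `text` starting at `index`
    // that compares equal to `value` under the given rules, or -1 if there is none.
    public static int MatchLengthAt(string text, int index, string value,
                                    CompareInfo compareInfo, CompareOptions options)
    {
        for (int length = 0; index + length <= text.Length; length++)
        {
            if (compareInfo.Compare(text, index, length,
                                    value, 0, value.Length, options) == 0)
            {
                return length;
            }
        }
        return -1;
    }

    public static string HighlightSubstring(string text, string substring)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(substring))
        {
            return WebUtility.HtmlEncode(text);
        }

        CompareInfo compareInfo = CultureInfo.CurrentCulture.CompareInfo;
        const CompareOptions options = CompareOptions.IgnoreCase;

        int index = compareInfo.IndexOf(text, substring, options);
        if (index < 0) return WebUtility.HtmlEncode(text);

        // The matched slice of `text` may be shorter or longer than `substring`.
        int length = MatchLengthAt(text, index, substring, compareInfo, options);
        if (length < 0) return WebUtility.HtmlEncode(text); // defensive fallback

        string before = text.Substring(0, index);
        string match = text.Substring(index, length);
        string after = text.Substring(index + length);

        return WebUtility.HtmlEncode(before)
             + "<b>" + WebUtility.HtmlEncode(match) + "</b>"
             + WebUtility.HtmlEncode(after);
    }
}
```

Because the highlighted slice is measured in the text rather than taken from substring.Length, a call such as Highlighter.HighlightSubstring("ß", "ss") should no longer run past the end of the string. Whether "ß" and "ss" compare equal at all depends on the current culture and on the globalization backend in use, so the output for that input is not guaranteed. The probing loop costs one comparison per candidate length, which is acceptable for short search strings. If I recall correctly, .NET 5 and later also provide a CompareInfo.IndexOf overload with an out matchLength parameter, which would make the helper unnecessary there.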
