I have built a little ASP.NET form that searches for something and displays the results. I want to highlight the search string within the search results. Example:
Query: "p"
Results: a<b>p</b>ple, banana, <b>p</b>lum
The code that I have goes like this:
    public static string HighlightSubstring(string text, string substring)
    {
        var index = text.IndexOf(substring, StringComparison.CurrentCultureIgnoreCase);
        if (index == -1) return HttpUtility.HtmlEncode(text);
        string p0, p1, p2;
        text.SplitAt(index, index + substring.Length, out p0, out p1, out p2);
        return HttpUtility.HtmlEncode(p0) + "<b>" + HttpUtility.HtmlEncode(p1) + "</b>" + HttpUtility.HtmlEncode(p2);
    }
It mostly works, but try it with HighlightSubstring("ß", "ss"), for example. This crashes because, under a German culture setting, the IndexOf method considers "ß" and "ss" to be equal, but they have different lengths!
Now that would be OK if there were a way to find out how long the match in "text" is. Remember that this length can be != substring.Length.
So how do I find out the length of the match that IndexOf produces in the presence of ligatures and exotic language characters (ligatures in this case)?
Comments (1)
This may not directly answer your question but perhaps will solve your actual problem.
Why not substitute instead?
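A minimal sketch of what the substitution idea could look like, assuming Regex.Match with RegexOptions.IgnoreCase. The class and method names (RegexHighlighter, HighlightWithRegex) are hypothetical, and System.Net.WebUtility.HtmlEncode stands in for HttpUtility.HtmlEncode so the snippet has no System.Web dependency. Like the code in the question, it highlights only the first occurrence.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class RegexHighlighter
{
    // Hypothetical helper: highlights the first case-insensitive match of
    // `substring` in `text`, HTML-encoding every piece of the output.
    public static string HighlightWithRegex(string text, string substring)
    {
        if (string.IsNullOrEmpty(substring)) return WebUtility.HtmlEncode(text);

        // Regex.Escape makes the query match literally instead of as a pattern.
        Match match = Regex.Match(text, Regex.Escape(substring), RegexOptions.IgnoreCase);
        if (!match.Success) return WebUtility.HtmlEncode(text);

        // match.Length is the length of the matched span inside `text`,
        // so the slices below cannot run past the end of the string.
        int end = match.Index + match.Length;
        return WebUtility.HtmlEncode(text.Substring(0, match.Index))
             + "<b>" + WebUtility.HtmlEncode(match.Value) + "</b>"
             + WebUtility.HtmlEncode(text.Substring(end));
    }
}
```

Because the split points come from the Match object rather than from substring.Length, the out-of-range crash cannot happen. As far as I know, Regex compares character by character, so "ss" would probably not match "ß" at all: no crash, but no highlight either.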
But what of the culture? If you specify a Regex as case-insensitive, it is culture-sensitive by default according to http://msdn.microsoft.com/en-us/library/z0sbec17.aspx.
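If linguistic matching is needed after all, one option to look at is the span-based CompareInfo.IndexOf overload with an out matchLength parameter. This is a sketch under the assumption that you are on .NET 5 or later, where that overload is available; the class and method names are hypothetical, and whether "ß" actually matches "ss" depends on the culture and the globalization backend in use.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class CultureHighlighter
{
    // Hypothetical helper: culture-sensitive, case-insensitive highlight of
    // the first match, sliced by the length the comparer actually matched.
    public static string HighlightCultureAware(string text, string substring)
    {
        if (string.IsNullOrEmpty(substring)) return WebUtility.HtmlEncode(text);

        CompareInfo compareInfo = CultureInfo.CurrentCulture.CompareInfo;

        // Assumption: .NET 5+ overload that reports how many chars of `text` matched.
        int index = compareInfo.IndexOf(
            text.AsSpan(), substring.AsSpan(),
            CompareOptions.IgnoreCase, out int matchLength);
        if (index < 0) return WebUtility.HtmlEncode(text);

        // Use matchLength, not substring.Length, to find the end of the match.
        int end = index + matchLength;
        return WebUtility.HtmlEncode(text.Substring(0, index))
             + "<b>" + WebUtility.HtmlEncode(text.Substring(index, matchLength)) + "</b>"
             + WebUtility.HtmlEncode(text.Substring(end));
    }
}
```

The only real difference from the code in the question is that the end of the match comes from matchLength, which can differ from substring.Length when the comparer treats strings of different lengths as equal.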