Regex not matching Unicode
How would I go about using Regex to match Unicode strings? I'm loading a couple of keywords from a text file and using them with Regex on another file. Both keywords contain Unicode characters (such as á). I'm not sure where the problem is. Is there some option I have to set?
Code:
    foreach (string currWord in _keywordList)
    {
        MatchCollection mCount = Regex.Matches(
            nSearch.InnerHtml, "\\b" + @currWord + "\\b", RegexOptions.IgnoreCase);
        if (mCount.Count > 0)
        {
            wordFound.Add(currWord);
            MessageBox.Show(@currWord, mCount.ToString());
        }
    }
And reading the keywords to a list:
    var rdComp = new StreamReader(opnDiag.FileName);
    string compSplit = rdComp.ReadToEnd()
        .Replace("\r\n", "\n")
        .Replace("\n\r", "\n");
    rdComp.Dispose();
    string[] compList = compSplit.Split(new[] {'\n'});
Then I change the array to a list.
Comments (2)
When matching on a specific character, I believe regular expressions only support literals for the ASCII character set. Beyond that, you can use \uxxxx to match on the Unicode code point.
See here.
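A quick way to check the \uxxxx suggestion is a minimal sketch like the one below. The sample text, the class name UnicodeEscapeDemo, and the helper WholeWord are made up for illustration and are not from the question. It tries the same word once with the \u00E1 escape and once with the literal character; as far as I know, .NET treats both the same way, so if both succeed here the failure in the original code probably comes from somewhere else (for example how the keyword file is decoded), which is only a guess.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapeDemo
{
    // Hypothetical helper: builds a whole-word pattern from a keyword.
    // Regex.Escape guards against keywords containing regex metacharacters.
    public static string WholeWord(string keyword) =>
        @"\b" + Regex.Escape(keyword) + @"\b";

    public static void Main()
    {
        string text = "El niño está aquí"; // made-up sample text

        // \u00E1 is the code point of 'á', written as a regex escape.
        bool viaEscape = Regex.IsMatch(text, @"\best\u00E1\b", RegexOptions.IgnoreCase);

        // The same word with the literal character in the pattern.
        bool viaLiteral = Regex.IsMatch(text, WholeWord("está"), RegexOptions.IgnoreCase);

        Console.WriteLine($"escape: {viaEscape}, literal: {viaLiteral}");
    }
}
```

Note that neither form would match text that stores á as a plain a followed by a combining accent (U+0301); normalizing both strings with string.Normalize() first is one way to rule that out.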
You can use [\u0000-\uffff]+ to match at least the BMP
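To see what that range does in practice, here is a small sketch. The sample string and the names BmpRangeDemo and LetterRuns are my own, for illustration only. Because .NET strings are UTF-16, [\u0000-\uffff] matches every char, including spaces and each half of a surrogate pair, so it swallows the whole input. The Unicode category classes \p{L} (letters) and \p{M} (combining marks) are shown as a narrower alternative when only word-like runs are wanted.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BmpRangeDemo
{
    // Hypothetical helper: returns runs of letters and combining marks.
    public static string[] LetterRuns(string text) =>
        Regex.Matches(text, @"[\p{L}\p{M}]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // Made-up sample: two accented words around an emoji (a surrogate pair).
        string text = "café \uD83D\uDE00 naïve";

        // The full-range class matches every UTF-16 code unit, so one match
        // covers the entire string, spaces and emoji included.
        string whole = Regex.Match(text, @"[\u0000-\uffff]+").Value;
        Console.WriteLine(whole == text);

        // The category-based class picks out only the words.
        Console.WriteLine(string.Join(", ", LetterRuns(text)));
    }
}
```

So the range is safe to use, but on its own it does not narrow anything down; it is mostly useful as a building block inside a larger pattern.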