Replace all non-ASCII characters except the right-angle character in C#
I'm writing a file utility to strip all non-ASCII characters out of files. I have this regex:
Regex rgx = new Regex(@"[^\u0000-\u007F]");
This works fine. Unfortunately, I've discovered that some silly people use right angles (¬) as delimiters in their files, so these get stripped out as well, but I need to keep them!
I'm pretty new to Regex, and I do understand the basics, but any help would be awesome!
Thanks in advance!
Comments (2)
You just need to include the code point for the angle bracket in the set:
Try this:
Or this:
(Where xxxx is the Unicode code point for the character you want to preserve.)
The reason for giving two options here is that I know you can specify multiple ranges within one negative character group, but I don't know if you can match individual characters with ranges.
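For what it's worth, a .NET character class can mix ranges and single characters, so a single-character entry is enough. Here is a minimal sketch of that form, assuming the delimiter is the NOT SIGN (¬, U+00AC); the `AsciiFilter` class and `Strip` method names are made up for illustration, and the code point should be swapped if the files use a different look-alike character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Matches any character that is neither ASCII (U+0000 to U+007F)
    // nor the delimiter. Assumption: the delimiter is NOT SIGN, U+00AC.
    private static readonly Regex NonAsciiExceptDelimiter =
        new Regex(@"[^\u0000-\u007F\u00AC]");

    // Removes every matched character and leaves the rest untouched.
    public static string Strip(string input)
    {
        return NonAsciiExceptDelimiter.Replace(input, string.Empty);
    }
}
```

Calling `AsciiFilter.Strip(line)` on each line of a file would then drop accented letters and other non-ASCII characters while keeping the ¬ delimiters in place.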
Jon's answer is absolutely correct. You may be using the wrong code for the character. Try the following for the similar looking characters:
First one should work I think.
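If it is unclear which code point the files actually contain, one way to check is to list the non-ASCII characters found in a sample line. The sketch below is only an illustration (the `CodePointInspector` class and `NonAsciiEscapes` method are hypothetical names); it reports each distinct non-ASCII UTF-16 code unit in `\uXXXX` form, ready to paste into the character class. For reference, some characters that look like a right angle are U+00AC (NOT SIGN), U+2310 (REVERSED NOT SIGN), U+231D (TOP RIGHT CORNER), U+2510 (BOX DRAWINGS LIGHT DOWN AND LEFT) and U+FFE2 (FULLWIDTH NOT SIGN).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodePointInspector
{
    // Returns the distinct non-ASCII UTF-16 code units in the text,
    // formatted as \uXXXX escapes in order of first appearance.
    public static List<string> NonAsciiEscapes(string text)
    {
        return text.Where(c => c > '\u007F')
                   .Distinct()
                   .Select(c => "\\u" + ((int)c).ToString("X4"))
                   .ToList();
    }
}
```

Running this over a line that is known to contain the delimiter shows which escape to add to the regex. Note that the file has to be read with the correct encoding first, otherwise the reported code points will not match what the file really contains.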