Replace all non-ASCII characters except the right-angle character in C#

Posted 2024-10-02 02:42:34 · 240 characters · 0 views · 0 comments

Writing a file utility to strip out all non-ASCII characters from files. I have this Regex:

Regex rgx = new Regex(@"[^\u0000-\u007F]");

Which works fine. But unfortunately, I've discovered some silly people use right angles (¬) as delimiters in their files, so these get stripped out as well, but I need those!

I'm pretty new to Regex, and I do understand the basics, but any help would be awesome!

Thanks in advance!

深白境迁sunset 2024-10-09 02:42:34

You just need to include the code point for the angle bracket in the set:

Try this:

Regex rgx = new Regex(@"[^\uxxxx\u0000-\u007F]");

Or this:

Regex rgx = new Regex(@"[^\uxxxx-\uxxxx\u0000-\u007F]");

(Where xxxx is the Unicode code point for the character you want to preserve.)

The reason for giving two options is that I know you can specify multiple ranges within one negated character group, but I'm not sure whether you can mix individual characters with ranges.
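On the open question of mixing a single character with a range: .NET character classes accept both in the same group, so the first form is enough. Below is a minimal sketch, assuming the delimiter is U+00AC (NOT SIGN, the ¬ shown in the question). The names `AsciiFilter` and `Strip` are hypothetical and only there for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Negated class: matches any character that is neither ASCII
    // (U+0000 to U+007F) nor the single kept character U+00AC.
    // Assumption: U+00AC is the delimiter you want to preserve.
    private static readonly Regex NonAsciiExceptDelimiter =
        new Regex(@"[^\u00AC\u0000-\u007F]");

    // Removes every match, leaving ASCII text and the delimiter intact.
    public static string Strip(string input)
    {
        return NonAsciiExceptDelimiter.Replace(input, string.Empty);
    }

    public static void Main()
    {
        // Sample with two delimiters, a euro sign and an accented e.
        string sample = "a\u00ACb\u20ACc\u00ACd\u00E9";
        string cleaned = Strip(sample);

        // The euro sign and the accented e are removed, the delimiters stay.
        Console.WriteLine(cleaned == "a\u00ACbc\u00ACd"); // prints True
    }
}
```

If the delimiter in your files turns out to be a different code point, only the `\u00AC` escape in the pattern needs to change.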

情定在深秋 2024-10-09 02:42:34

Jon's answer is absolutely correct. You may be using the wrong code for the character. Try one of the following, which cover the similar-looking characters:

Regex regex = new Regex(@"([^\u00ac\u0000-\u007F])"); // U+00AC NOT SIGN (¬)
Regex regex = new Regex(@"([^\u02fa\u0000-\u007F])"); // U+02FA MODIFIER LETTER END HIGH TONE
Regex regex = new Regex(@"([^\u031a\u0000-\u007F])"); // U+031A COMBINING LEFT ANGLE ABOVE

The first one should work, I think.
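If none of the candidates match, it may help to check which code point the file actually contains rather than guessing from the glyph. Below is a rough sketch that lists the distinct non-ASCII characters in a string as `U+XXXX` values. `CodePointReport` and `NonAsciiCodePoints` are hypothetical names, and the sample string stands in for your file contents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodePointReport
{
    // Returns the distinct non-ASCII UTF-16 code units found in the text,
    // sorted and formatted as "U+XXXX".
    public static List<string> NonAsciiCodePoints(string text)
    {
        return text
            .Where(c => c > '\u007F')
            .Distinct()
            .OrderBy(c => c)
            .Select(c => "U+" + ((int)c).ToString("X4"))
            .ToList();
    }

    public static void Main()
    {
        // Stand-in for file contents: two NOT SIGN delimiters and an accented e.
        string sample = "id\u00ACname\u00ACcaf\u00E9";
        Console.WriteLine(string.Join(", ", NonAsciiCodePoints(sample)));
        // prints: U+00AC, U+00E9
    }
}
```

For a real file you would pass in something like `File.ReadAllText(path)`. Note that if the file is decoded with the wrong encoding, the delimiter can show up as a different code point than expected, which would also explain a pattern that never matches.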
