How do I convert (transliterate) a string from UTF-8 to ASCII (single byte) in C#?
I have a string object, "with multiple characters and even special characters", and I am trying to use

UTF8Encoding utf8 = new UTF8Encoding();
ASCIIEncoding ascii = new ASCIIEncoding();

objects in order to convert that string to ASCII. Could someone shed some light on this simple task, which has been haunting my afternoon?
EDIT 1:
What we are trying to accomplish is getting rid of special characters, such as some of the special Windows apostrophes. The code that I posted below as an answer does not take care of that. Basically,
O’Brian becomes O?Brian, where ’ is one of the special apostrophes.
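To see why this happens, here is a minimal sketch, assuming the special apostrophe is U+2019 (RIGHT SINGLE QUOTATION MARK), which is what Windows "smart quotes" usually produce. An ASCIIEncoding created with new ASCIIEncoding() uses a replacement fallback whose replacement string is "?", so every character outside the 0-127 range is encoded as byte 0x3F.

```csharp
using System;
using System.Text;

// Assumption: the "special" apostrophe is U+2019 RIGHT SINGLE QUOTATION MARK.
string input = "O\u2019Brian";

ASCIIEncoding ascii = new ASCIIEncoding();

// U+2019 has no ASCII equivalent, so the encoder has to fall back.
byte[] bytes = ascii.GetBytes(input);

// The default fallback is an EncoderReplacementFallback that substitutes "?".
var fallback = (EncoderReplacementFallback)ascii.EncoderFallback;
Console.WriteLine(fallback.DefaultString);  // ?
Console.WriteLine(bytes[1]);               // 63, i.e. 0x3F, the '?' character
Console.WriteLine(ascii.GetString(bytes)); // O?Brian
```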
This was in response to your other question, which looks like it has been deleted... the point still stands.
Looks like a classic Unicode to ASCII issue. The trick would be to find where it's happening.
.NET works fine with Unicode, assuming it's told it's Unicode to begin with (or left at the default).
My guess is that your receiving app can't handle it. So I'd probably use an ASCII encoding with an EncoderReplacementFallback of String.Empty, which drops unencodable characters instead of replacing them with "?".
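A minimal sketch of that idea (the answer's original code is not preserved here). Encoding.GetEncoding accepts an encoder fallback and a decoder fallback, and an EncoderReplacementFallback built with an empty string makes the encoder silently drop anything it cannot represent. The encoding name "us-ascii" and the sample string are assumptions for illustration.

```csharp
using System;
using System.Text;

// ASCII encoding whose encoder replaces unencodable characters with ""
// (i.e. drops them) instead of the default "?".
Encoding asciiDropping = Encoding.GetEncoding(
    "us-ascii",
    new EncoderReplacementFallback(string.Empty),
    new DecoderExceptionFallback());

string input = "O\u2019Brian caf\u00E9";

byte[] bytes = asciiDropping.GetBytes(input);   // only bytes 0x00-0x7F remain
string result = asciiDropping.GetString(bytes);

Console.WriteLine(result); // OBrian caf
```

Note that this drops the curly apostrophe entirely, so O’Brian becomes OBrian rather than O'Brian; mapping characters to ASCII look-alikes needs a transliteration table instead.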
Of course, in the old days, we'd just loop through and remove any chars greater than 127... well, those of us in the US at least. ;)
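The old-school loop could look like this: a sketch that keeps only characters with code points 0 to 127 and discards everything else (the helper name StripNonAscii is hypothetical).

```csharp
using System;
using System.Text;

// Keep only characters in the 7-bit ASCII range (0-127); drop the rest.
static string StripNonAscii(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if (c <= 127)
            sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine(StripNonAscii("O\u2019Brian caf\u00E9")); // OBrian caf
```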
I was able to figure it out. In case anyone wants to know, below is the code that worked for me:
Let me know if there is a simpler way of doing it.
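The code from this answer did not survive, but one possibly simpler route is worth noting: .NET strings are already UTF-16 in memory, so there is no need to produce UTF-8 bytes first and run Encoding.Convert. A sketch (ToAscii is a hypothetical helper name); like the behavior described in EDIT 1, it still turns the curly apostrophe into '?'.

```csharp
using System;
using System.Text;

// Encode the UTF-16 string straight to ASCII and decode it back.
// Characters outside 0-127 become '?' through Encoding.ASCII's default fallback.
static string ToAscii(string s) => Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));

string converted = ToAscii("O\u2019Brian");
Console.WriteLine(converted); // O?Brian
```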
For anyone who likes extension methods, this one does the trick for us.
(It lives in the System namespace, so it's available pretty much automatically for all of our strings.)
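A sketch of such an extension method, assuming it wraps the drop-on-fallback ASCII encoding from the first answer. The class name StringExtensions and the method name RemoveNonAscii are hypothetical, since the original code is not preserved. Because the class is declared in namespace System, the method is visible in any file that has using System; (or the implicit usings of newer SDK-style projects).

```csharp
using System;
using System.Text;

namespace System
{
    public static class StringExtensions
    {
        // ASCII encoding that drops (replaces with "") every character it cannot encode.
        private static readonly Encoding AsciiDropping = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderExceptionFallback());

        // Returns a copy of the string containing only 7-bit ASCII characters.
        public static string RemoveNonAscii(this string value)
        {
            if (value is null) throw new ArgumentNullException(nameof(value));
            return AsciiDropping.GetString(AsciiDropping.GetBytes(value));
        }
    }
}
```

Usage: "O\u2019Brian".RemoveNonAscii() returns "OBrian".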
Based on Mark's answer above (and Geo's comment), I created a two-liner version that removes every character ASCII cannot encode from a string. Posting it for people searching for this answer (as I was).
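A sketch of what such a two-liner can look like, assuming the drop-on-fallback idea from Mark's answer (an EncoderReplacementFallback with an empty string): convert the UTF-8 bytes to ASCII bytes with Encoding.Convert, then decode them back into a string. The sample string and variable names are assumptions.

```csharp
using System;
using System.Text;

string original = "O\u2019Brian caf\u00E9";

// Line 1: UTF-8 bytes -> ASCII bytes, dropping anything ASCII cannot encode.
byte[] asciiBytes = Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding("us-ascii", new EncoderReplacementFallback(string.Empty), new DecoderExceptionFallback()), Encoding.UTF8.GetBytes(original));
// Line 2: ASCII bytes -> string.
string ascii = Encoding.ASCII.GetString(asciiBytes);

Console.WriteLine(ascii); // OBrian caf
```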
If you want an 8-bit representation of characters that are used in many encodings, this may help you.
You need to change the variable targetEncoding to whatever encoding you want.
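A minimal sketch of that pattern, assuming ISO-8859-1 (Latin-1) as the target. Latin-1 is built into .NET, whereas Windows code pages such as windows-1252 need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) to be called first on .NET Core and .NET 5+. Passing explicit replacement fallbacks is an assumption made so that unmappable characters reliably become '?'.

```csharp
using System;
using System.Text;

// Change targetEncoding to whichever single-byte encoding you need.
Encoding targetEncoding = Encoding.GetEncoding(
    "iso-8859-1",
    EncoderFallback.ReplacementFallback,   // unmappable chars are encoded as '?'
    DecoderFallback.ReplacementFallback);

string text = "caf\u00E9 O\u2019Brian";
byte[] encoded = targetEncoding.GetBytes(text);

// One byte per character: é maps to 0xE9, but U+2019 is not in Latin-1 and becomes '?'.
Console.WriteLine(BitConverter.ToString(encoded));
Console.WriteLine(targetEncoding.GetString(encoded)); // café O?Brian
```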
Here is code that transliterates Unicode characters to their closest ASCII equivalents where possible. It removes or fixes accents, macrons, typesetter's colons, dashes, curly quotes, apostrophes, invisible spaces, and other bad characters.
This is useful if you need to feed data into another system that does not support Unicode. The code is fast because it uses a StringBuilder and a simple loop (in testing, processing an 8,000-character string 10,000 times took 1.1 seconds).
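A compact sketch of the same idea, not the answer's original code: decompose with NormalizationForm.FormD so accents and macrons become separate combining marks, drop those marks, map a handful of typographic characters through a small table, and keep plain ASCII, all in one StringBuilder loop. The replacement table is hypothetical and deliberately short; in particular, treating U+2236 RATIO as the "typesetter's colon" is a guess. Extend the table with whatever characters your data contains.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical, deliberately short replacement table.
var replacements = new Dictionary<char, string>
{
    ['\u2018'] = "'",   // left single quotation mark
    ['\u2019'] = "'",   // right single quotation mark (curly apostrophe)
    ['\u201C'] = "\"",  // left double quotation mark
    ['\u201D'] = "\"",  // right double quotation mark
    ['\u2013'] = "-",   // en dash
    ['\u2014'] = "-",   // em dash
    ['\u2026'] = "...", // horizontal ellipsis
    ['\u00A0'] = " ",   // no-break space
    ['\u200B'] = "",    // zero-width space (invisible)
    ['\u2236'] = ":",   // ratio, assumed to be the "typesetter's colon"
};

string Transliterate(string input)
{
    // FormD splits "é" into "e" + U+0301 and "ā" into "a" + U+0304 (combining marks).
    string decomposed = input.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder(decomposed.Length);
    foreach (char c in decomposed)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            continue;                  // drop accents, macrons, etc.
        if (replacements.TryGetValue(c, out var mapped))
            sb.Append(mapped);         // known typographic character
        else if (c <= 127)
            sb.Append(c);              // plain ASCII passes through
        // any other character has no mapping and is dropped
    }
    return sb.ToString();
}

Console.WriteLine(Transliterate("O\u2019Brian \u2014 caf\u00E9, M\u0101ori")); // O'Brian - cafe, Maori
```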