Handling unencodable characters in C#
Given an input string and an encoding, I want to process each character in the input string as follows:
If the codepoint can be encoded, then encode it;
If not, output (the encoding of) the string
&#xUUUU;
where UUUU is the hex value of the Unicode codepoint.
I've read through the .NET documentation for Encoder and EncoderFallback, and I can see how to get notified when an unencodable character is found, but I can't see any way to output something that actually depends on the particular character in question.
Any ideas?
Looking a bit deeper (thanks @JosefZ), I see that the description of the EncoderFallback class says it supports three mechanisms, including:
Best-fit fallback, which maps valid Unicode characters that cannot be
encoded to an approximate equivalent. For example, a best-fit fallback
handler for the ASCIIEncoding class might map Æ (U+00C6) to AE (U+0041 +
U+0045). A best-fit fallback handler might also be implemented to transliterate one alphabet (such as Cyrillic) to another (such as
Latin or Roman). The .NET Framework does not provide any public
best-fit fallback implementations.
That appears to be the one I am after, so do I have to work out how to write my own implementation of EncoderFallback?
Comments (1)
You can use the following EncoderFallback and EncoderFallbackBuffer to do what you want (dotnetfiddle)
You use it like this
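A minimal sketch of such a fallback pair and its use might look like the code below. The class names HexEntityEncoderFallback and HexEntityEncoderFallbackBuffer are hypothetical and are not taken from the answer's dotnetfiddle. The sketch also assumes the target encoding can represent the ASCII characters used in the replacement text (&, #, x, hex digits and ;).

```csharp
using System;
using System.Text;

// Hypothetical fallback: replaces each unencodable code point with "&#xUUUU;".
public sealed class HexEntityEncoderFallback : EncoderFallback
{
    // The longest replacement is "&#x10FFFF;", which is 10 characters.
    public override int MaxCharCount => 10;

    public override EncoderFallbackBuffer CreateFallbackBuffer()
        => new HexEntityEncoderFallbackBuffer();
}

public sealed class HexEntityEncoderFallbackBuffer : EncoderFallbackBuffer
{
    private string _replacement = string.Empty;
    private int _position; // index of the next replacement char to hand out

    public override int Remaining => _replacement.Length - _position;

    // Called for a single unencodable UTF-16 code unit.
    public override bool Fallback(char charUnknown, int index)
        => Start(charUnknown);

    // Called for an unencodable surrogate pair.
    public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        => Start(char.ConvertToUtf32(charUnknownHigh, charUnknownLow));

    private bool Start(int codePoint)
    {
        // The replacement depends on the specific character that failed.
        _replacement = "&#x" + codePoint.ToString("X4") + ";";
        _position = 0;
        return true; // true means replacement chars are available
    }

    // The encoder pulls replacement chars until this returns '\0'.
    public override char GetNextChar()
        => _position < _replacement.Length ? _replacement[_position++] : '\0';

    public override bool MovePrevious()
    {
        if (_position == 0) return false;
        _position--;
        return true;
    }

    public override void Reset()
    {
        _replacement = string.Empty;
        _position = 0;
    }
}

public static class Program
{
    public static void Main()
    {
        // Attach the custom fallback when obtaining the encoding.
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new HexEntityEncoderFallback(),
            DecoderFallback.ExceptionFallback);

        byte[] bytes = ascii.GetBytes("price: 5 \u20AC");
        Console.WriteLine(Encoding.ASCII.GetString(bytes));
    }
}
```

The key point is that Fallback receives the offending character (or surrogate pair), so the buffer can build a replacement string that depends on it. The encoder then reads that string one character at a time through GetNextChar.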