Convert two ASCII characters to their "corresponding" one-character extended ASCII representation
The problem:
I have two fixed-width strings from an external system. The first contains the base characters (like a-z); the second MAY contain diacritics to be applied to the characters of the first string to create the actual characters.
string asciibase = "Dutch has funny chars: a,e,u";
string diacrits = " ' \" \"";
//no clue what to do
string result = "Dutch has funny chars: á,ë,ü";
I could write a massive search-and-replace covering every combination of character and diacritic, but was hoping for something a bit more elegant.
Does anybody have a clue how to solve this one? I tried calculating the decimal values and using string.Normalize (C#), but got no results. Google didn't really turn up anything either.
Answers (4)
Convert the diacritics to suitable unicode values from the Unicode combining diacritical marks range:
http://www.unicode.org/charts/PDF/U0300.pdf
Then slap the char and its diacritic together, e.g. for e-acute, U+0065 = "e" and U+0301 = combining acute accent.
Then:
Will combine the two into a new string.
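A minimal C# sketch of this approach, assuming the external system uses ' for an acute accent and " for a diaeresis (as in the question's example) and that the two strings line up character by character. CombiningMarks, ToCombiningMark and Combine are hypothetical names; the real work is done by string.Normalize(NormalizationForm.FormC), which composes each base character + combining mark pair into one precomposed character where Unicode defines one.

```csharp
using System.Text;

// Sketch: append a Unicode combining mark after each base character, then
// normalize to Form C so each pair becomes one precomposed character.
public static class CombiningMarks
{
    // Maps an ASCII marker to a mark from the Combining Diacritical Marks
    // block (U+0300..U+036F); returns '\0' if the marker is not recognised.
    // Only the two markers seen in the question are handled here.
    public static char ToCombiningMark(char marker)
    {
        switch (marker)
        {
            case '\'': return '\u0301'; // COMBINING ACUTE ACCENT
            case '"': return '\u0308';  // COMBINING DIAERESIS
            default: return '\0';
        }
    }

    public static string Combine(string baseText, string markers)
    {
        var sb = new StringBuilder(baseText.Length * 2);
        for (int i = 0; i < baseText.Length; i++)
        {
            sb.Append(baseText[i]);
            // A missing or unrecognised marker leaves the base character alone.
            char mark = i < markers.Length ? ToCombiningMark(markers[i]) : '\0';
            if (mark != '\0')
                sb.Append(mark);
        }
        // FormC composes e.g. "e\u0301" into "\u00E9" where Unicode defines a
        // precomposed character; other pairs stay as base + combining mark.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, CombiningMarks.Combine(asciibase, diacrits) should give the result string from the question, provided each marker in diacrits sits under its base character. New markers only need one more case in ToCombiningMark, not a case per letter.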
I cannot find an easy solution except using lookup tables:
[EDIT: Simplified code after suggestions in the answers from @JonB and @Oliver]
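A sketch of what such a lookup table might look like in C#. The table is a hypothetical, deliberately small subset (lowercase vowels, with ' for acute and " for diaeresis as in the question); a real one would need an entry for every combination the external system can send. DiacriticLookup and Combine are illustrative names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class DiacriticLookup
{
    // Hypothetical subset: key = base character followed by its marker.
    private static readonly Dictionary<string, char> Table = new Dictionary<string, char>
    {
        { "a'", 'á' }, { "e'", 'é' }, { "i'", 'í' }, { "o'", 'ó' }, { "u'", 'ú' },
        { "a\"", 'ä' }, { "e\"", 'ë' }, { "i\"", 'ï' }, { "o\"", 'ö' }, { "u\"", 'ü' },
    };

    public static string Combine(string baseText, string markers)
    {
        var sb = new StringBuilder(baseText.Length);
        for (int i = 0; i < baseText.Length; i++)
        {
            char c = baseText[i];
            char marker = i < markers.Length ? markers[i] : ' ';
            char mapped;
            // Pairs not in the table (including "no marker") keep the base character.
            sb.Append(Table.TryGetValue(c.ToString() + marker, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```

The table gives full control over odd markers, at the cost of listing every combination by hand.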
The problem is that the specified diacritics have to be parsed explicitly, because the two dots (the diaeresis) don't exist as a standalone ASCII character, so the double quote is used in their place. So to solve your problem you have no choice but to implement each needed case.
Here is a starting point to get a clue...
Enumerable.Zip is already implemented in .NET 4, but to get it in 3.5 you'll need this code (taken from Eric Lippert):
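A hedged sketch of the same idea, not the code the answer refers to: a plain Zip re-implementation for .NET 3.5, plus a hypothetical ExplicitCases helper that pairs each base character with its marker and spells out every supported case, as suggested above. Only ' (acute) and " (diaeresis) on a, e and u are handled, as an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Zip for .NET 3.5, which lacks Enumerable.Zip (added in .NET 4).
// A plain re-implementation for illustration, not Eric Lippert's code.
public static class ZipCompat
{
    public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        // Validate eagerly; an iterator method would defer these checks.
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        if (resultSelector == null) throw new ArgumentNullException("resultSelector");
        return ZipIterator(first, second, resultSelector);
    }

    private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>(
        IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        using (IEnumerator<TFirst> e1 = first.GetEnumerator())
        using (IEnumerator<TSecond> e2 = second.GetEnumerator())
        {
            // Stop as soon as either sequence is exhausted.
            while (e1.MoveNext() && e2.MoveNext())
                yield return resultSelector(e1.Current, e2.Current);
        }
    }
}

// Hypothetical helper: every supported (base, marker) case is spelled out.
public static class ExplicitCases
{
    public static char Apply(char c, char marker)
    {
        switch (marker)
        {
            case '\'': // acute accent
                switch (c) { case 'a': return 'á'; case 'e': return 'é'; case 'u': return 'ú'; }
                break;
            case '"': // diaeresis, since ASCII has no standalone "two dots"
                switch (c) { case 'a': return 'ä'; case 'e': return 'ë'; case 'u': return 'ü'; }
                break;
        }
        return c; // no marker, or an unsupported combination
    }

    public static string Combine(string baseText, string markers)
    {
        // Pad the markers so Zip does not cut off trailing base characters.
        string padded = markers.PadRight(baseText.Length);
        var sb = new StringBuilder(baseText.Length);
        // Called statically so it also compiles unambiguously on .NET 4+,
        // where System.Linq's Enumerable.Zip could otherwise compete.
        foreach (char ch in ZipCompat.Zip(baseText, padded, (c, m) => Apply(c, m)))
            sb.Append(ch);
        return sb.ToString();
    }
}
```

On .NET 4 and later, ZipCompat can be dropped in favour of System.Linq's Enumerable.Zip with the same arguments.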
I don't know C# or its standard libraries, but one alternative approach might be to use an existing HTML/SGML/XML character entity parser/renderer or, if you are actually going to present the text in a browser, nothing at all!
Pseudo code:
Thus,
A + o -> Å
u + " -> ü
and so on. If you can then parse HTML entities, you should be home free, and even portable between charsets!
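In C#, System.Net.WebUtility.HtmlDecode (available since .NET 4) can play the role of the entity parser. The sketch below assumes the marker-to-entity mapping ' -> acute, " -> uml and o -> ring, taken from the examples in the question and in this answer; EntityCombiner and Suffix are hypothetical names.

```csharp
using System.Net;
using System.Text;

// Sketch: build a named HTML entity such as "&uuml;" from the base character
// and its marker, then let WebUtility.HtmlDecode resolve it.
public static class EntityCombiner
{
    // Assumed marker conventions; returns null when there is no marker.
    private static string Suffix(char marker)
    {
        switch (marker)
        {
            case '\'': return "acute"; // a + ' -> &aacute; -> á
            case '"': return "uml";    // u + " -> &uuml; -> ü
            case 'o': return "ring";   // A + o -> &Aring; -> Å
            default: return null;
        }
    }

    public static string Combine(string baseText, string markers)
    {
        var sb = new StringBuilder(baseText.Length);
        for (int i = 0; i < baseText.Length; i++)
        {
            char c = baseText[i];
            string suffix = i < markers.Length ? Suffix(markers[i]) : null;
            if (suffix == null)
            {
                sb.Append(c);
                continue;
            }
            string entity = "&" + c + suffix + ";";
            string decoded = WebUtility.HtmlDecode(entity);
            // HtmlDecode leaves unknown entities (e.g. "&xuml;") untouched,
            // so fall back to the bare base character in that case.
            sb.Append(decoded == entity ? c.ToString() : decoded);
        }
        return sb.ToString();
    }
}
```

Because the entity names are defined by HTML rather than by the program, the output is plain Unicode text regardless of the target charset.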