Unescape UTF-8 characters from a file (InputStream)
I am trying to unescape Unicode escape sequences like "\u00f6" into the characters they represent.
E.g. a file that contains "Aalk\u00f6rben" should yield "Aalkörben".
val tmp = text.toByteArray(Charsets.UTF_8)
val escaped = tmp.decodeToString()
// or val escaped = tmp.toString(Charsets.UTF_8)
When I set the string manually to "Aalk\u00f6rben", this works fine. However, when the string is read from the file, it is interpreted as "Aalk\\u00f6rben" with the backslash escaped (two backslashes), and the unescaping fails.
Is there any way to convince Kotlin to convert the special characters? I would rather not use external libraries such as those from Apache.
Comments (1)
I do not know how you read the file, but most probably ...\u00f6... is read as six separate characters (a backslash, "u", and four hex digits), and the backslash is probably being escaped. You can check this in the debugger.
So my assumption is that in memory you have "Aalk\\u00f6rben". Try this replace:
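A minimal sketch of such a replacement, assuming the text in memory really contains a backslash followed by u00f6. The function name unescapeOUmlaut is hypothetical and only used for illustration:

```kotlin
// Hypothetical helper: turns the literal six-character sequence \u00f6 into "ö".
fun unescapeOUmlaut(text: String): String =
    // "\\u00f6" in Kotlin source is one backslash followed by u00f6.
    // String.replace(String, String) performs a literal (non-regex) replacement.
    text.replace("\\u00f6", "ö")
```

This only handles that one escape sequence, so it is mainly useful to confirm the assumption about what was read from the file.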
Edit: this should replace all escaped characters of the form \uXXXX (four hex digits):
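One way this could look using only the Kotlin standard library is sketched below. The names UNICODE_ESCAPE and unescapeUnicode are assumptions made for this example, and the sketch assumes every backslash-u sequence in the input is meant as an escape:

```kotlin
// Matches a literal backslash, the letter u, and exactly four hex digits.
// The four digits are captured in group 1.
val UNICODE_ESCAPE = Regex("""\\u([0-9a-fA-F]{4})""")

// Hypothetical helper: replaces every \uXXXX sequence with the UTF-16
// code unit it denotes. Text without such sequences is returned unchanged.
fun unescapeUnicode(text: String): String =
    UNICODE_ESCAPE.replace(text) { match ->
        // Parse the hex digits and convert the value to a single Char.
        match.groupValues[1].toInt(16).toChar().toString()
    }
```

Calling unescapeUnicode on the text read from the file would then convert each sequence in one pass. Characters outside the Basic Multilingual Plane written as two consecutive escapes (a surrogate pair) come out as two adjacent Chars, which together form the intended character. The sketch does not treat a doubled backslash as an escaped backslash.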