Some French accented characters are encoded as UTF-8 but still don't render properly
Hi there: I'm importing a Stata file that has a lot of French accented characters. On import, I set the encoding to UTF-8. However, some of the accented characters are not rendering properly. See a sample of rows from my dataset below.
How do I handle this?
test<-tibble::tribble(
~municipality,
"Sainte-Anne-de-Beaupré",
"Sainte-Anne-de-Beaupré",
"Sainte-Anne-de-Beaupré",
"Beaupré",
"Beaupré",
"Beaupré",
"Beaupré",
"Beaupré",
"Beaupré"
)
Encoding(test$municipality)
Encoding(test$municipality)<-'utf-8'
test$municipality
As Giacomo mentions, this looks like a file in which part of the text (you also show correctly encoded é's) was UTF-8, was read as if it were Latin-1, and was then encoded again as UTF-8. That means your declared encoding is correct: Ã and © are themselves valid UTF-8 characters, so they are displayed as such. What you can do is fix such errors after the fact.
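To see how that happens, here is a small illustration (my own sketch, not part of the original data) that takes a correctly encoded é and puts it through the same misreading with base R's `iconv()`:

```r
# "\u00e9" is "é"; in UTF-8 it is stored as the two bytes c3 a9.
original <- "\u00e9"
charToRaw(original)

# Misread those two bytes as Latin-1 and re-encode the result as UTF-8.
# iconv() ignores the declared encoding and works on the raw bytes.
garbled <- iconv(original, from = "latin1", to = "UTF-8")

nchar(garbled)      # 2: one character has become two
utf8ToInt(garbled)  # 195 169, the code points of "Ã" and "©"
```

Both resulting characters are valid UTF-8, which is why declaring the column as UTF-8 changes nothing.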
Knowing how it happened means we know how to fix it!
So I wrote a function in the past (it supports some major screw-ups such as triple wrong encodings): we simulate what each character becomes after being wrongly encoded and saved as UTF-8 again.
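A minimal sketch of that idea, not the original function: it assumes a single round of Latin-1 misreading and only covers the characters U+00A0 to U+00FF, so it does not handle the triple wrong encodings mentioned above. The name `fix_double_utf8` is hypothetical.

```r
# Hypothetical helper: a sketch of the lookup-table approach, not the
# answerer's original function.
fix_double_utf8 <- function(x) {
  # Candidate characters: the Latin-1 Supplement block, U+00A0 to U+00FF.
  good <- intToUtf8(0x00A0:0x00FF, multiple = TRUE)
  # Simulate the damage: reinterpret each character's UTF-8 bytes as
  # Latin-1 and convert the result back to UTF-8.
  bad <- iconv(good, from = "latin1", to = "UTF-8")
  # Replace every garbled sequence with the character it came from.
  for (i in seq_along(good)) {
    x <- gsub(bad[i], good[i], x, fixed = TRUE)
  }
  x
}

# "Beaupr\u00c3\u00a9" is the garbled form, "Beaupr\u00e9" the correct one.
municipality <- c("Sainte-Anne-de-Beaupr\u00c3\u00a9", "Beaupr\u00e9")
fix_double_utf8(municipality)
```

Because it replaces specific garbled sequences rather than re-converting whole strings, values that are already correct pass through unchanged, which matters when a column mixes both forms. The trade-off is that text that legitimately contains Ã or Â followed by one of these characters would be altered too.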
Let's run it on your data:
Another example: