Strange characters render correctly in Notepad but appear as control characters elsewhere

Posted 2024-12-07 09:37:57

I have a .csv list of businesses. The file has some strange characters in it. For example, in this field: Stocktonon-Tees, the first hyphen, between Stockton and on, seems to be a character with the value 6 rather than a hyphen, which has the value 45. Stack Overflow will probably sanitize this so you can't see it, so here is a pastebin:

http://pastebin.com/NuyyaQy9

Can anyone explain why this could be? Is it some encoding issue that I have missed? Or a corruption in the dataset?
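One way to check which byte values are really in the file, independent of how any editor renders them, is to scan the raw bytes for C0 control codes (values below 32). The following is a minimal Python sketch; `find_control_bytes` and the sample bytes are hypothetical, modeled on the field described above.

```python
def find_control_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every C0 control byte except tab, LF and CR."""
    allowed = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return are normal in CSV
    return [(i, b) for i, b in enumerate(data) if b < 0x20 and b not in allowed]


if __name__ == "__main__":
    # Hypothetical sample: byte 6 where a hyphen (byte 45) would be expected.
    sample = b"Stockton\x06on-Tees\r\n"
    for offset, value in find_control_bytes(sample):
        print(f"offset {offset}: byte {value} (0x{value:02x})")
    # For a real file, read it in binary mode, e.g.:
    # with open("businesses.csv", "rb") as f: print(find_control_bytes(f.read()))
```

This assumes an ASCII-compatible encoding such as UTF-8 or a Windows code page. In a UTF-16 file every ASCII character carries a zero byte, so nearly every other byte would be flagged, which is itself a useful hint about the encoding.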

街角迷惘 2024-12-14 09:37:57

Yes, it's almost certainly an encoding issue. A file just consists of binary data; what matters is how you interpret that data. It sounds like Notepad is correctly guessing the encoding the file was originally written in, but whatever else you're using isn't.
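To make the point about interpretation concrete, here is a small Python sketch that decodes the same three bytes under two encodings. The en dash is an illustrative choice, not taken from the file in the question, so this shows the general mechanism rather than explaining the specific byte 6.

```python
# Hypothetical example bytes: the UTF-8 encoding of U+2013 EN DASH.
raw = b"\xe2\x80\x93"

as_utf8 = raw.decode("utf-8")     # one character: U+2013
as_cp1252 = raw.decode("cp1252")  # three characters: U+00E2, U+20AC, U+201C (mojibake)

# Print code points rather than glyphs so the output is readable in any console.
print([hex(ord(c)) for c in as_utf8])    # ['0x2013']
print([hex(ord(c)) for c in as_cp1252])  # ['0xe2', '0x20ac', '0x201c']
```

The bytes never change; only the decoder does, and a reader that picks the wrong one sees different characters (or different "values") for the same data.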

Unfortunately you haven't said anything about what software is trying to read the file or what wrote it in the first place - but you should look at what encoding Notepad thinks it is, and work from there.
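As a rough illustration of that kind of guessing, the sketch below checks for a byte-order mark (BOM) first, then tries strict UTF-8, then falls back to a legacy code page. This is an approximation under stated assumptions, not Notepad's actual algorithm: `guess_encoding` is a hypothetical helper, and `cp1252` is an assumed fallback for a Western-European Windows "ANSI" code page.

```python
import codecs

# (BOM bytes, codec name that decodes the data and strips that BOM).
# UTF-32 BOMs are tested before UTF-16 because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical helper: BOM first, then strict UTF-8, then a legacy code page."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    try:
        data.decode("utf-8")  # strict by default: raises on invalid byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback. cp1252 leaves a few byte values undefined,
        # so decoding with it can still fail on some inputs.
        return fallback


if __name__ == "__main__":
    samples = {
        "UTF-16 with BOM": "Stockton-on-Tees".encode("utf-16"),
        "plain ASCII/UTF-8": b"Stockton-on-Tees",
        "single legacy byte": b"Caf\xe9",
    }
    for label, data in samples.items():
        enc = guess_encoding(data)
        print(f"{label}: {enc} -> {ascii(data.decode(enc))}")
```

Note that a stray control byte such as 6 is valid in ASCII, UTF-8 and cp1252 alike, so a guess like this will not flag it by itself; it only tells you which decoder to use before inspecting the text.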

If it's your code that wrote the file out, and you get to decide the encoding, I'd recommend UTF-8 as a good general purpose, platform-portable encoding.
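If you control the writer, a minimal Python sketch of writing the CSV with an explicit UTF-8 encoding, and reading it back with the same one, might look like this. `write_csv`, `read_csv` and the sample row are hypothetical; `newline=""` follows the recommendation in the Python `csv` module documentation.

```python
import csv
from typing import Iterable, Sequence


def write_csv(path: str, rows: Iterable[Sequence[str]], encoding: str = "utf-8") -> None:
    """Write rows to a CSV file with an explicit encoding (UTF-8 by default)."""
    # newline="" lets the csv module write its own line endings.
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def read_csv(path: str, encoding: str = "utf-8") -> list[list[str]]:
    """Read the file back using the same encoding it was written with."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    import os
    import tempfile

    # Hypothetical data with a non-ASCII character, to show the round trip is lossless.
    rows = [["name", "town"], ["Caf\u00e9 Example Ltd", "Stockton-on-Tees"]]
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    write_csv(path, rows)
    print(read_csv(path) == rows)  # True
    os.remove(path)
```

If the file will mostly be opened in Windows tools such as Excel, `encoding="utf-8-sig"` writes a UTF-8 BOM that helps them detect the encoding; otherwise plain `utf-8` is usually the safer default.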
