Unicode in CSV files?

Posted on 2024-09-19 22:39:55

I need to generate a CSV file. Maybe I am 'doing it wrong' because I am dumping the file with my own code instead of using a library, but anyway.

It looks like I have everything right. Quotes, commas, and everything else seem to be escaped correctly. It was rather easy. The problem is that I am testing with Unicode strings and they come out as ????. When I use MS Excel to save a file containing my test string as CSV and then open that file, I get the same problem (the Unicode letters become ?????). Is Unicode not supported?

I just tried dumping the string like this instead of outputting it to a web page:

var f = new System.IO.StreamWriter(filename, false, System.Text.Encoding.Unicode);

and now I see the Unicode text, but everything is in one column. What's weird is that everything looks normal in my text editor of choice, and if I copy a few columns out, paste them into a new file, and save it as .csv, I see the columns fine, although that probably strips the Unicode out.

How do I save this properly?

纵山崖 2024-09-26 22:39:55

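System.Text.Encoding.Unicode uses the UTF-16 encoding. Try telling your text editor to decode the file as UTF-16; my guess is that the editor you are using to display the output file defaults to UTF-8 or ASCII. If so, another approach might be to encode your output with System.Text.Encoding.UTF8.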
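As a minimal sketch of the UTF-8 route suggested above: the class CsvUtf8Example and its Escape and Write methods are hypothetical names made up for illustration, not the asker's code. It assumes every field is quoted and that a UTF-8 byte order mark is wanted at the start of the file. Whether Excel then recognizes the encoding and splits the columns as expected depends on the Excel version and its regional settings, so that part is something to verify rather than rely on.

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvUtf8Example
{
    // Quote a field using the common CSV convention: wrap it in double
    // quotes and double any quote characters inside it.
    public static string Escape(string field)
    {
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Write the rows as comma-separated lines, encoded as UTF-8.
    public static void Write(string path, string[][] rows)
    {
        // Passing true asks the encoding to supply a preamble, so the
        // StreamWriter emits the bytes EF BB BF before the first line.
        var utf8WithBom = new UTF8Encoding(true);
        using (var writer = new StreamWriter(path, false, utf8WithBom))
        {
            foreach (var row in rows)
            {
                string[] escaped = Array.ConvertAll(row, f => Escape(f));
                writer.WriteLine(string.Join(",", escaped));
            }
        }
    }
}
```

Calling CsvUtf8Example.Write("out.csv", rows) with a jagged string array produces one line per row. System.Text.Encoding.UTF8 also supplies the same preamble, so passing it directly to StreamWriter should behave the same way; the explicit UTF8Encoding(true) is only there to make the intent visible.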


迷乱花海 2024-09-26 22:39:55

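You need to do two things: mark the text file (or HTML page) as containing Unicode characters (UTF-8 or UTF-16), and make sure the text editor you use supports Unicode text. On Windows, Notepad is a good choice.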

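To mark a text file (e.g. .csv) as containing Unicode text, you need to write a byte order mark (BOM) as the first character in the file. For UTF-16 little-endian (Intel), the BOM is the bytes 0xFF, 0xFE. The byte order mark tells the document reader whether the characters in the document are stored big-endian or little-endian. The BOM character (U+FEFF) is a reserved non-printing character in the Unicode character table. The BOM can also be used to distinguish ASCII text from UTF-8 and other Unicode encodings, since the UTF-8 BOM byte sequence differs from the UTF-16 one, and so on.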
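To make the byte sequences concrete, here is a small sketch that asks the .NET encodings for their preambles, which is what .NET calls the BOM bytes. The class name BomTable and the method PreambleHex are hypothetical helpers made up for this example.

```csharp
using System;
using System.Text;

public static class BomTable
{
    // Format an encoding's preamble (its BOM) as space-separated hex bytes.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble()).Replace("-", " ");
    }

    // Print the preambles of the three encodings discussed here.
    public static void Print()
    {
        Console.WriteLine("UTF-8:     " + PreambleHex(Encoding.UTF8));             // EF BB BF
        Console.WriteLine("UTF-16 LE: " + PreambleHex(Encoding.Unicode));          // FF FE
        Console.WriteLine("UTF-16 BE: " + PreambleHex(Encoding.BigEndianUnicode)); // FE FF
    }
}
```

All three sequences encode the same character, U+FEFF; only the byte layout differs, which is why a reader can use the first bytes to tell the encodings apart.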

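Some document writers will write the BOM for you, or offer an option to include or exclude it. Look at the bytes of the text file with a binary hex dump to determine whether a BOM is present. Don't use a text editor for this: the BOM is a non-displaying character.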
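If no hex dump tool is at hand, the same check can be sketched in a few lines of code. BomSniffer, Detect, and DetectFile are hypothetical names for this illustration. The sketch only knows the UTF-8 and UTF-16 marks, so a UTF-32 little-endian file, whose BOM also begins with FF FE, would be reported as UTF-16 LE.

```csharp
using System;
using System.IO;

public static class BomSniffer
{
    // Return a label for the BOM found at the start of the bytes, or "none".
    // Only the UTF-8 and UTF-16 marks are recognized in this sketch.
    public static string Detect(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE";
        return "none";
    }

    // Read up to the first three bytes of a file and classify them.
    public static string DetectFile(string path)
    {
        var head = new byte[3];
        using (var stream = File.OpenRead(path))
        {
            // Read returns how many bytes were actually available.
            int read = stream.Read(head, 0, head.Length);
            Array.Resize(ref head, read);
        }
        return Detect(head);
    }
}
```

A result of "none" does not prove the file is ASCII; it only means no mark was written, which is common for UTF-8 files produced by tools that omit the BOM.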

To indicate that an HTML page you generate contains Unicode characters, you need to set the Content-Type header to name a Unicode character set: for example, Content-Type: text/html; charset=utf-8 for UTF-8 encoded Unicode text.
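For the header itself, one way to build the value in .NET is the MediaTypeHeaderValue class from System.Net.Http.Headers. This is only a sketch: ContentTypeExample and Build are hypothetical names, and how the resulting string gets attached to a response depends on the web framework in use, which the question does not specify.

```csharp
using System.Net.Http.Headers;

public static class ContentTypeExample
{
    // Build a Content-Type header value such as "text/html" with a
    // charset parameter, letting the library handle the formatting.
    public static string Build(string mediaType, string charset)
    {
        var header = new MediaTypeHeaderValue(mediaType) { CharSet = charset };
        return header.ToString();
    }
}
```

The charset named in the header has to match the encoding actually used for the response bytes; declaring utf-8 while writing UTF-16 would bring the garbled characters back.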
