How do I encode Unicode so that both iPad and Excel can understand it?

Posted on 2024-11-16 18:25:29

I have a CSV that is encoded as UTF-32. When I open the stream in IE and then open it with Excel, I can read everything. When I stream it to an iPad, I get a blank page with no content whatsoever. (I don't know how to view source on an iPad, so there could be something hidden in the HTML.)

The HTTP response is written in ASP.NET (C#):

Response.Clear();
Response.Buffer = true;

Response.ContentType = "text/comma-separated-values";
Response.AddHeader("Content-Disposition", "attachment;filename=\"InventoryCount.csv\"");

Response.RedirectLocation = "InventoryCount.csv";
Response.ContentEncoding = Encoding.UTF32;//works on Excel wrong in iPad
//Response.ContentEncoding = Encoding.UTF8;//works on iPad wrong in Excel

Response.Charset = "UTF-8";//tried also adding Charset just to see if it works somehow, but it does not.
EnableViewState = false;

NMDUtilities.Export oUtilities = new NMDUtilities.Export();

Response.Write(oUtilities.DataGridToCSV(gvExport, ","));

Response.End();

The only guess I can make is that the iPad cannot read UTF-32. Is that true? And how can I view source on an iPad?


UPDATE
I just made an interesting discovery. When my encoding is UTF-8, things work on the iPad and characters are displayed properly, but Excel messes up a character. When I use UTF-32, the inverse is true: the iPad displays nothing, but Excel works perfectly. I really have no idea what I can do about this.

iPad UTF8 outputs = " Quattrode® "
Excel UTF8 outputs = " Quattrode® "

iPad UTF32 outputs = " "
Excel UTF32 outputs = " Quattrode® "

Here's my implementation of DataGridToCsv

public string DataGridToCsv(GridView input, string delimiter)
{
    StringBuilder sb = new StringBuilder();

    //iterate Gridview and put row results in stringbuilder...
    string result = HttpUtility.HtmlDecode(sb.ToString());
    return result;
}


UPDATE2 Excel is barfing on UTF-8 >:{. Man. I just undid the second option he lists because it doesn't work on iPad. I can't win for losing on this one.

UPDATE3
Per your suggestions I have looked at the hex code. There is no BOM, but there is a difference between the file layouts.

UTF8
4D 61 74 65 (MATE from the first word MATERIAL)
UTF32
4D 00 00 00 (M from the first word MATERIAL)

So it looks like UTF-32 lays out every character in 32 bits (4 bytes), while UTF-8 uses 8 bits (1 byte) for these ASCII characters. I think this is why Excel can guess. Now I will try your suggested fixes.
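
The layout difference in the hex dump can be reproduced outside the web page. The following is only a minimal sketch: the class and method names (`EncodingLayoutDemo`, `ToHex`, `HexOf`) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text;

// Hypothetical helper, not from the original post.
public static class EncodingLayoutDemo
{
    // Formats bytes as space-separated uppercase hex, e.g. "4D 61 74 65".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    // Encodes the text with the given encoding and returns the hex dump.
    // GetBytes never adds a BOM; the BOM is only available via GetPreamble().
    public static string HexOf(string text, Encoding encoding)
    {
        return ToHex(encoding.GetBytes(text));
    }
}
```

With this, `EncodingLayoutDemo.HexOf("MATE", Encoding.UTF8)` gives `4D 61 74 65`, while `Encoding.UTF32` (little-endian in .NET) gives `4D 00 00 00 61 00 00 00 74 00 00 00 65 00 00 00`. `EncodingLayoutDemo.ToHex(Encoding.UTF8.GetPreamble())` gives `EF BB BF`, the bytes that were missing from the file.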


Comments (2)

时光匆匆的小流年 2024-11-23 18:25:29

The problem is that the browser knows your data's encoding is UTF-8, but it has no way of telling Excel. When Excel opens the file, it assumes your system's default encoding. If you copy some non-ASCII text, paste it in Notepad, and save it with UTF-8 encoding, though, you'll see that Excel can properly detect it. It works on the iPad because its default encoding just happens to be UTF-8.

The reason is that Notepad puts the proper byte order mark (EF BB BF for UTF-8) in the beginning of the file. You can try it yourself by using a hex editor or some other means to create a file containing

EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20

and opening that file in Excel. (I used Excel 2010, but I assume it would work with all recent versions.)

Try making sure your output starts with those first 3 bytes.
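
To see what "assumes your system's default encoding" does to the ® character, here is a rough sketch. It assumes the default is a single-byte Western code page and uses ISO-8859-1 as a stand-in for it; the real default depends on the machine, and `MojibakeDemo` is a hypothetical name.

```csharp
using System.Text;

// Hypothetical helper, not from the original post.
public static class MojibakeDemo
{
    // Encodes the text as UTF-8 without a BOM, then decodes those bytes the way
    // a reader assuming a single-byte legacy code page would.
    // ISO-8859-1 is an assumption standing in for the system default.
    public static string ReadAsLegacy(string text)
    {
        byte[] utf8Bytes = new UTF8Encoding(false).GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }
}
```

`MojibakeDemo.ReadAsLegacy("Quattrode®")` returns `QuattrodeÂ®`: the two UTF-8 bytes `C2 AE` for ® are read as two separate characters. ASCII characters are unaffected, which is why only that one character looks wrong.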


How to write a BOM in C#

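    byte[] BOM = new byte[] { 0xEF, 0xBB, 0xBF };//UTF-8 byte order mark
    Response.ContentEncoding = Encoding.UTF8;//the BOM only helps if the body really is UTF-8
    Response.BinaryWrite(BOM);//write the BOM first
    Response.Write(utility.DataGridToCSV(gvExport, ","));//then write your CSV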
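
An alternative is to let the framework produce the BOM instead of hard-coding the three bytes. This is a sketch under assumptions, not a drop-in replacement: `CsvBomWriter` and `ToUtf8WithBom` are hypothetical names, and the CSV text is assumed to be available as a string (as returned by `DataGridToCSV`).

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not from the original post.
public static class CsvBomWriter
{
    // Encodes the CSV text as UTF-8 and prefixes the UTF-8 BOM (EF BB BF).
    public static byte[] ToUtf8WithBom(string csv)
    {
        // encoderShouldEmitUTF8Identifier: true makes StreamWriter emit the BOM
        // ahead of the text when writing from the start of the stream.
        var utf8WithBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer, utf8WithBom))
            {
                writer.Write(csv);
            } // disposing the writer flushes it into the buffer

            // ToArray still works after the writer has closed the stream.
            return buffer.ToArray();
        }
    }
}
```

For the input `" Quattrode® "` this yields exactly the byte sequence shown earlier, `EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20`. The resulting array could then be passed to `Response.BinaryWrite` in a single call, so the BOM and the body cannot end up in different encodings.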

樱娆 2024-11-23 18:25:29

Excel tries to infer the encoding from your file contents, and ASCII and UTF-8 are byte-for-byte identical for the first 128 characters (the basic Latin letters, digits, and punctuation). When you use UTF-16 or UTF-32, it can figure out that the content isn't ASCII. But since most of your UTF-8 content matches ASCII, if you want your file to be read as UTF-8, you have to tell Excel explicitly by writing the byte order mark, as Gabe said in his answer. Also, see the answer by Andrew Csontos on this other question:

What's the best way to export UTF8 data into Excel?
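
To illustrate what "telling it explicitly" means at the byte level, here is a small BOM check. It is a sketch only and is not Excel's actual detection logic, which is not documented in this thread; `BomSniffer` and `DetectBom` are hypothetical names.

```csharp
// Hypothetical helper, not from the original post and not Excel's algorithm.
public static class BomSniffer
{
    // Returns the encoding implied by a leading BOM, or null when there is
    // no BOM and a reader would have to guess from the content.
    public static string DetectBom(byte[] data)
    {
        // UTF-32 LE must be tested before UTF-16 LE: both start with FF FE.
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE";
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 BE";
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16 LE";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16 BE";
        return null;
    }

    // True when data begins with the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

For the BOM-less UTF-8 bytes `4D 61 74 65` from the question, `DetectBom` returns `null`, so nothing distinguishes the file from plain ASCII or a legacy code page. With `EF BB BF` in front, it returns `"UTF-8"`.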
