Which encoding opens CSV files correctly with Excel on both Mac and Windows?
We have a web app that exports CSV files containing foreign characters with UTF-8, no BOM. Both Windows and Mac users get garbage characters in Excel. I tried converting to UTF-8 with BOM; Excel/Win is fine with it, Excel/Mac shows gibberish. I'm using Excel 2003/Win, Excel 2011/Mac.
Here are all the encodings I tried:
Encoding  BOM  Win                                         Mac
--------  ---  ------------------------------------------  -------------------
utf-8     --   scrambled                                   scrambled
utf-8     BOM  WORKS                                       scrambled
utf-16    --   file not recognized                         file not recognized
utf-16    BOM  file not recognized                         Chinese gibberish
utf-16LE  --   file not recognized                         file not recognized
utf-16LE  BOM  characters OK, row data all in first field  same as Win
The best one is UTF-16LE with BOM, but the file is not recognized as CSV: all of each row's data ends up in the first field. The field separator is a comma; switching to a semicolon doesn't change anything.
Is there any encoding that works in both worlds?
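To reproduce the byte layouts behind the table above, here is a minimal Python sketch. The sample text and the function name are hypothetical. It covers only the UTF-8 and UTF-16LE rows. The plain "utf-16" rows are left out because the question does not say which byte order they used, and Python's own "utf-16" codec always writes a BOM in the machine's native order.

```python
import codecs

# Hypothetical sample content with non-ASCII characters.
SAMPLE = "name,city\r\nJürgen,Köln\r\n"


def variants(text: str) -> dict:
    """Return UTF-8 and UTF-16LE encodings of text, keyed by (encoding, has_bom)."""
    utf8 = text.encode("utf-8")
    utf16le = text.encode("utf-16-le")  # the explicit-endian codec never writes a BOM itself
    return {
        ("utf-8", False): utf8,
        ("utf-8", True): codecs.BOM_UTF8 + utf8,
        ("utf-16LE", False): utf16le,
        ("utf-16LE", True): codecs.BOM_UTF16_LE + utf16le,
    }


for (encoding, has_bom), data in variants(SAMPLE).items():
    # Show the first four bytes so the presence or absence of a BOM is visible.
    print(f"{encoding:9} BOM={str(has_bom):5} first bytes: {data[:4].hex(' ')}")
```

Writing each value to its own file gives one test file per table row.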
Excel Encodings
I found the WINDOWS-1252 encoding to be the least frustrating when dealing with Excel. Since it's basically Microsoft's own proprietary character set, one can assume it will work on both the Mac and the Windows versions of MS-Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector which correctly reads the data.
Depending on your system and the tools you use, this encoding could also be named CP1252, ANSI, Windows (ANSI), MS-ANSI or just Windows, among other variations.
This encoding is a superset of ISO-8859-1 (aka LATIN1 and others), so you can fall back to ISO-8859-1 if you cannot use WINDOWS-1252 for some reason. Be advised that ISO-8859-1 is missing some characters that WINDOWS-1252 has; note, for example, that the euro sign (€) is missing.
A table comparing the two character sets can be found at Alan Wood's site.
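You can check the missing characters with Python's built-in codecs, which name these encodings cp1252 and latin-1. This is a minimal sketch; the helper name is hypothetical. It also shows where the two differ at the byte level: cp1252 puts printable characters in the 0x80-0x9F range, where ISO-8859-1 has control codes.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Accented letters exist in both; the euro sign and curly quotes only in cp1252.
for ch in "éü€Š“":
    print(ch, "cp1252:", encodable(ch, "cp1252"), "latin-1:", encodable(ch, "latin-1"))

# The same byte 0x80 is the euro sign in cp1252 but a C1 control code in latin-1.
print(repr(b"\x80".decode("cp1252")), repr(b"\x80".decode("latin-1")))
```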
Conversion
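Conversion is done differently in every tool and language. However, suppose you have a file query_result.csv which you know is UTF-8 encoded. Convert it to WINDOWS-1252 using iconv, passing the source encoding with -f and the target encoding with -t; iconv writes the result to standard output, so redirect it to a new file.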
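Where iconv is not available, the same conversion can be done in Python. This minimal sketch assumes the file fits in memory, and the function names are hypothetical. Characters with no Windows-1252 equivalent, such as Cyrillic, raise UnicodeEncodeError by default. Pass errors="replace" to turn them into "?" instead.

```python
from pathlib import Path


def utf8_to_cp1252(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode UTF-8 bytes as Windows-1252 (Python codec name "cp1252")."""
    text = data.decode("utf-8-sig")  # "utf-8-sig" also drops a leading UTF-8 BOM, if any
    return text.encode("cp1252", errors=errors)


def convert_file(src: str, dst: str) -> None:
    """Read src as UTF-8 and write a Windows-1252 copy to dst (line endings untouched)."""
    Path(dst).write_bytes(utf8_to_cp1252(Path(src).read_bytes()))


# Usage (output file name is hypothetical):
# convert_file("query_result.csv", "query_result-win.csv")
```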
For UTF-16LE with BOM, if you use tab characters as your delimiters instead of commas, Excel will recognise the fields. This works because Excel actually ends up using its Unicode *.txt parser.
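Here is a minimal Python sketch of that layout. It builds tab-delimited rows with the standard csv module and prepends the UTF-16LE BOM (FF FE) by hand. The function name and sample rows are hypothetical.

```python
import codecs
import csv
import io


def to_utf16le_tsv(rows) -> bytes:
    """Serialize rows as tab-delimited text, encoded as UTF-16LE with a leading BOM."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # The "utf-16-le" codec writes no BOM, so add FF FE explicitly.
    return codecs.BOM_UTF16_LE + buf.getvalue().encode("utf-16-le")


data = to_utf16le_tsv([["Name", "City"], ["Jürgen", "Köln"]])
```

Write data to a file opened in binary mode ("wb") so no further newline or encoding translation happens.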
Caveat: If the file is edited in Excel and saved, it will be saved as tab-delimited ASCII. The problem is that when you re-open the file, Excel assumes it's real CSV (with commas), sees that it's not Unicode, and so parses it as comma-delimited, which makes a hash of it!
Update: The above caveat doesn't appear to be happening for me today in Excel 2010 (Windows) at least, although there does appear to be a difference in saving behaviour if:
compared to:
The lowdown is: There is no solution. Excel 2011/Mac cannot correctly interpret a CSV file containing umlauts and diacritical marks no matter what encoding or hoop jumping you do. I'd be glad to hear someone tell me different!
You only have tried comma-separated and semicolon-separated CSV. If you had tried tab-separated CSV (also called TSV) you would have found the answer:
UTF-16LE with BOM (byte order mark), tab-separated
But: in a comment you mention that TSV is not an option for you (I haven't been able to find this requirement in your question, though). That's a pity. It often means that you allow manual editing of TSV files, which is probably not a good idea. Visual checking of TSV files is not a problem, and editors can be set to display a special character to mark tabs.
And yes, I tried this out on Windows and Mac.
Here's the clincher on importing UTF-8-encoded CSV into Excel 2011 for Mac: Microsoft says, "Excel for Mac does not currently support UTF-8." (Source: "Excel for Mac 2011 and UTF-8".)
Yay, way to go MS!
The best workaround for reading UTF-8 CSV files on a Mac is to convert them to XLSX format. I found a script made by Konrad Foerstner, which I improved a little by adding support for different delimiter characters.
Download the script from GitHub: https://github.com/brablc/clit/blob/master/csv2xlsx.py. To run it, you need to install the Python module openpyxl for Excel file manipulation:
sudo easy_install openpyxl
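The linked script is not reproduced here. The following is an independent minimal sketch of the same idea, assuming openpyxl is installed; the function names are hypothetical. It reads a UTF-8 CSV (BOM optional) with a configurable delimiter and appends each row to a new workbook. All cells are stored as text.

```python
import csv
import io


def parse_rows(text: str, delimiter: str = ","):
    """Split CSV text into rows; newline="" keeps quoted line breaks intact for csv.reader."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


def csv_to_xlsx(src: str, dst: str, delimiter: str = ",") -> None:
    """Copy every row of a UTF-8 CSV file into the first sheet of a new workbook."""
    # Imported here so parse_rows stays usable without the third-party package.
    from openpyxl import Workbook

    with open(src, encoding="utf-8-sig", newline="") as f:
        rows = parse_rows(f.read(), delimiter)
    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append(row)
    wb.save(dst)


# Usage (file names are hypothetical):
# csv_to_xlsx("export.csv", "export.xlsx", delimiter=";")
```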
In my case this worked (Mac, Excel 2011, both Cyrillic and Latin characters with Czech diacritics):
In my case, it seems that Excel 2011 for Mac OS is not using Encoding.GetEncoding("10000") (Mac Roman), as I had thought and wasted 2 days on, but the same ISO encoding as on Windows.
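Code page 10000 is Mac OS Roman, which Python calls mac_roman. This quick sketch shows why picking the wrong one scrambles Western European text: the same letter maps to different bytes, so bytes written in one encoding and read in the other come out as different characters. The sample text is hypothetical.

```python
text = "Crème brûlée, Straße"

cp1252_bytes = text.encode("cp1252")
mac_bytes = text.encode("mac_roman")

# Same characters, different byte sequences.
print(cp1252_bytes == mac_bytes)
print("é ->", "é".encode("cp1252").hex(), "(cp1252) vs", "é".encode("mac_roman").hex(), "(mac_roman)")

# Mac Roman defines all 256 byte values, so decoding Windows-1252 bytes with it
# never fails; it silently produces the wrong letters instead.
garbled = cp1252_bytes.decode("mac_roman")
print(garbled)
```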
The best proof of this: create a file with special characters in Excel 2011 for Mac, save it as CSV, then open it in a Mac text editor; the characters are scrambled.
This approach worked for me, meaning that the CSV exported from Excel 2011 on Mac OS contains special Western European characters:
UTF-8 with no BOM currently works for me in Excel Mac 2011 14.3.2.
UTF-8 + BOM kind of works, but the BOM is rendered as gibberish.
UTF-16 works if you import the file and complete the wizard, but not if you just double-click it.
The following worked for me on Excel for Mac 2011 and Windows Excel 2002:
Using iconv on Mac, convert the file to UTF-16 Little-Endian + name it *.txt (the .txt extension forces Excel to run the Text Import Wizard):
iconv -f UTF-8 -t UTF-16LE filename.csv >filename_UTF-16LE.csv.txt
Open the file in Excel and in the Text Import Wizard choose:
PS: The UTF-16LE file created by iconv has the BOM bytes FF FE at the beginning.
PPS: My original CSV file was created on a Windows 7 computer in UTF-8 format (with the BOM bytes EF BB BF at the beginning) and used CRLF line breaks. A comma was used as the field delimiter and a single quote as the text qualifier. It contained ASCII letters plus various Latin letters with tildes, umlauts, etc., plus some Cyrillic. Everything displayed properly in both Excel for Win and Mac.
PPPS: Exact software versions:
* Mac OS X 10.6.8
* Excel for Mac 2011 v.14.1.3
* Windows Server 2003 SP2
* Windows Excel 2002 v.10.2701.2625
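For comparison, here is a rough Python equivalent of that iconv step; the function name is hypothetical. One assumption is worth stating: a UTF-16LE target does not normally add a BOM by itself. The FF FE above most likely comes from the source file's UTF-8 BOM (U+FEFF) being carried over. The sketch therefore strips any incoming UTF-8 BOM and writes exactly one UTF-16LE BOM, whether or not the source had one.

```python
import codecs


def utf8_to_utf16le_bom(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes (with or without BOM) as UTF-16LE with exactly one BOM."""
    text = data.decode("utf-8-sig")  # drops an incoming EF BB BF, if present
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Usage, mirroring the file names from the iconv command above:
# from pathlib import Path
# src = Path("filename.csv")
# Path("filename_UTF-16LE.csv.txt").write_bytes(utf8_to_utf16le_bom(src.read_bytes()))
```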
On my Mac OS, Text Wrangler identified a CSV file created with Excel as having "Western" encoding.
After some googling I have made this small script (I am not sure about Windows availability, maybe with Cygwin?):
$ cat /usr/local/bin/utf8.sh
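The body of utf8.sh is not included above. As a hedged Python stand-in, assume TextWrangler's "Western" means Windows-1252. It can also mean Mac OS Roman, so set source_encoding to whatever your editor reports. The function names are hypothetical.

```python
from pathlib import Path


def reencode(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from source_encoding and return them encoded as UTF-8.

    Raises UnicodeDecodeError if data contains bytes undefined in source_encoding.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_in_place(path: str, source_encoding: str = "cp1252") -> None:
    """Overwrite the file at path with a UTF-8 copy of its contents."""
    p = Path(path)
    p.write_bytes(reencode(p.read_bytes(), source_encoding))


# Usage (file name is hypothetical):
# convert_in_place("excel_export.csv")               # Windows-1252 source
# convert_in_place("excel_export.csv", "mac_roman")  # Mac OS Roman source
```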
This works for me
Now the key point: choose Macintosh as the File Origin (it should be the first choice).
This is using Excel 2011 (version 14.4.2)
*There's a little dropdown at the bottom of the window
In my case adding Preamble to file solved my problem:
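Assuming "Preamble" here refers to .NET's Encoding.GetPreamble(), which for UTF-8 is the byte order mark EF BB BF, the Python analogue is the utf-8-sig codec. This is a minimal sketch; the function name is hypothetical. Per the table in the question, a UTF-8 BOM helped Excel/Win but not Excel 2011/Mac.

```python
import csv
import io


def csv_bytes_with_bom(rows, delimiter: str = ",") -> bytes:
    """Serialize rows as CSV and encode them as UTF-8 with a leading BOM ("preamble")."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\r\n").writerows(rows)
    return buf.getvalue().encode("utf-8-sig")  # utf-8-sig prepends EF BB BF
```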
Instead of CSV, try outputting HTML with an XLS extension and the "application/excel" MIME type. I know this works in Windows, but I can't speak for macOS.
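Here is a minimal Python sketch of that idea. It builds an HTML table that declares UTF-8 in a meta tag and escapes every cell value. Serving it with the .xls extension and the MIME type is left to your web framework. The function name is hypothetical. Newer Excel versions may also warn that the file's format does not match its extension.

```python
from html import escape


def rows_to_html_table(rows) -> str:
    """Render rows as a minimal HTML document; encode the result as UTF-8 when sending it."""
    body = "\n".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "</head><body>\n"
        f"<table>\n{body}\n</table>\n</body></html>\n"
    )
```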
Solve this using Java (UTF-16LE with BOM):
Note that the CSV file should use TAB as the separator. You can then read the CSV file on both Windows and Mac OS X. Refer to: How do I encode/decode UTF-16LE byte arrays with a BOM?