This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 4 years ago.
For those looking for an entirely programmatic (or at least server-side) solution, I've had great success using catdoc's xls2csv tool.
After installing catdoc, run xls2csv on the file to do the conversion. It is blazing fast.
Note that it's important to include the -d utf-8 flag; otherwise the output is encoded in the default cp1252 encoding, and you run the risk of losing information.
Note also that xls2csv only works with .xls files; it does not work with .xlsx files.
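For anyone who wants to drive this from a script, here is a minimal sketch that wraps the xls2csv call described above. It assumes catdoc is installed and xls2csv is on the PATH; the function names and file names are hypothetical, and only the -d utf-8 flag comes from the answer itself.

```python
import subprocess
from pathlib import Path


def build_xls2csv_command(xls_path: str) -> list[str]:
    """Return the argument list for catdoc's xls2csv with UTF-8 output."""
    # -d utf-8 selects the output charset, as the answer stresses.
    return ["xls2csv", "-d", "utf-8", xls_path]


def convert_xls_to_csv(xls_path: str, csv_path: str) -> None:
    """Run xls2csv and store its standard output in csv_path."""
    if Path(xls_path).suffix.lower() != ".xls":
        raise ValueError("xls2csv reads only .xls files, not .xlsx")
    # xls2csv writes the CSV to standard output, so capture it as raw bytes.
    result = subprocess.run(
        build_xls2csv_command(xls_path), check=True, capture_output=True
    )
    Path(csv_path).write_bytes(result.stdout)
```

Calling convert_xls_to_csv("report.xls", "report.csv") would then leave a UTF-8 file next to the original, provided the tool is available on the machine.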
Easiest way:
No need for OpenOffice or Google Docs.
[...] other code page that you want
[...] csv file you just renamed, and replace all tabs with commas. To do this in Notepad on Windows 10, select one tab character, then press Ctrl+H. In the window that opens, type a comma (,) in the "Replace with" field, then click "Replace All". Save your file. The result will be a comma-delimited UTF-8 csv file.
Don't open it with MS Office anyway!
Now you have a tab-delimited CSV file.
Or a comma-delimited one, if you applied step 5.
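One caveat with a plain find-and-replace is that a field which already contains a comma ends up split in two. A hedged alternative, assuming the tab-delimited text is already in memory as a string, is to let Python's csv module do the delimiter swap and add quotes where needed. The function name is made up for illustration.

```python
import csv
import io


def tabs_to_commas(tsv_text: str) -> str:
    """Convert tab-delimited text to comma-delimited CSV text."""
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    # The writer quotes any field that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(tabs_to_commas("name\tcity\nMüller\tKöln, DE\n"))
```

Here the field "Köln, DE" comes out wrapped in double quotes, so an importer still sees two columns rather than three.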
As funny as it may seem, the easiest way I found to save my 180 MB spreadsheet as a UTF-8 CSV file was to select the cells in Excel, copy them, and paste the clipboard contents into Sublime Text.
I was not able to find a VBA solution for this problem on Mac Excel. There simply seemed to be no way to output UTF-8 text.
So I finally gave up on VBA, bit the bullet, and learned AppleScript. It wasn't nearly as bad as I had thought.
The solution is described here:
http://talesoftech.blogspot.com/2011/05/excel-on-mac-goodbye-vba-hello.html
Assuming a Windows environment, save and work with the file as usual in Excel, but then open the saved Excel file in Gnome Gnumeric (free). Save Gnumeric's spreadsheet as CSV, which (for me, anyway) saves it as UTF-8 CSV.
Easy way to do it: download OpenOffice, start the spreadsheet application, and open the Excel file (.xls or .xlsx). Then save it as a text CSV file; a window opens asking whether to keep the current format or to save in ODF format. Select "Keep the current format", and in the next window select the option that works best for you, according to the language your file is written in. For Spanish, select Western Europe (Windows-1252/WinLatin 1) and the file works just fine. If you select Unicode (UTF-8), it is not going to work with the Spanish characters.
Save the xls file (Excel file) as Unicode Text; the file will be saved in text format (.txt).
Change the format from .txt to .csv (rename the file from XYX.txt to XYX.csv).
I also came across the same problem, but there is an easy solution for this.
It works perfectly, and it generates a csv file that can be imported into any software. I imported this csv file into my SQLite database and it worked perfectly, with all Unicode characters intact.
I came across the same problem and found this post by googling. None of the above worked for me. In the end I converted my Unicode .xls to .xml (choose Save As ... XML Spreadsheet 2003), and it produced the correct characters. Then I wrote code to parse the XML and extract the content for my use.
I have written a small Python script that can export worksheets in UTF-8.
You just have to provide the Excel file as the first parameter, followed by the sheets that you would like to export. If you do not provide any sheets, the script will export all worksheets present in the Excel file.
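The script itself is not reproduced here, so the following is only a sketch of how such an exporter could look, not the author's code. It assumes the third-party openpyxl package (which reads .xlsx, not .xls), and the function names and the one-file-per-sheet naming are hypothetical.

```python
import csv


def write_utf8_csv(rows, csv_path: str) -> None:
    """Write an iterable of rows to csv_path, encoded as UTF-8."""
    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        # csv.writer writes None (an empty cell) as an empty string.
        csv.writer(handle).writerows(rows)


def export_sheets(xlsx_path: str, sheet_names=None) -> list[str]:
    """Export the named sheets (or all sheets) to '<sheet>.csv' files."""
    from openpyxl import load_workbook  # third-party; assumed to be installed

    workbook = load_workbook(xlsx_path, read_only=True, data_only=True)
    written = []
    # An empty or missing list of names means "export every sheet".
    for name in sheet_names or workbook.sheetnames:
        csv_path = f"{name}.csv"
        write_utf8_csv(workbook[name].iter_rows(values_only=True), csv_path)
        written.append(csv_path)
    return written
```

To mirror the command-line behaviour described above, one could call export_sheets(sys.argv[1], sys.argv[2:]) from a small entry point.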
Excel typically saves a csv file in ANSI encoding instead of UTF-8.
One option to correct the file is to use Notepad or Notepad++:
A second option, in addition to the one from "nevets1219", is to open your CSV file in Notepad++ and do a conversion to ANSI.
Choose in the top menu:
Encoding -> Convert to ANSI
Encoding -> Convert to ANSI will encode it in ANSI. UTF-8 is an encoding of Unicode. Perhaps the file will be encoded correctly in ANSI, but here we are talking about UTF-8, @SequenceDigitale.
There are faster ways, like exporting as csv (comma delimited), opening that csv with Notepad++ (free), and then choosing Encoding > Convert to UTF-8. But that only makes sense if you have to do this once per file. If you need to change and export frequently, then the LibreOffice or GDocs solution is best.
Microsoft Excel has an option to export a spreadsheet using Unicode encoding. See the following screenshot.
Open the .csv file with Notepad++. If the encoding looks right (all characters display as they should), click Encoding, then Convert to ANSI.
Otherwise, find out what your current encoding is.
Another solution is to open the file in Word (winword), save it as txt, and then reopen it in Excel; it will work, ISA.
Save Dialog > Tools Button > Web Options > Encoding Tab
I had the same problem and came across this add-in. It works perfectly fine in Excel 2013, in addition to Excel 2007 and 2010, which are the versions it mentions.
A simple workaround is to use Google Spreadsheet. Paste (values only, if you have complex formulas) or import the sheet, then download it as CSV. I just tried a few characters and it works rather well.
NOTE: Google Sheets does have limitations when importing. See here.
NOTE: Be careful with sensitive data in Google Sheets.
EDIT: Another alternative: basically, they use a VB macro or add-ins to force saving as UTF-8. I have not tried any of these solutions, but they sound reasonable.
I've found that OpenOffice's spreadsheet application, Calc, is really good at handling CSV data.
In the "Save As..." dialog, click "Format Options" to get different encodings for CSV. LibreOffice works the same way, AFAIK.
Save the Excel sheet as "Unicode Text (.txt)". The good news is that all the international characters are in UTF-16 (note: not UTF-8). However, the new "*.txt" file is TAB delimited, not comma delimited, and therefore is not a true CSV.
(Optional) Unless you can use a TAB-delimited file for import, use your favorite text editor to replace the TAB characters with commas ",".
Import your *.txt file into the target application. Make sure it can accept UTF-16.
If UTF-16 has been properly implemented with support for non-BMP code points, then you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.
I use this procedure to import data from Excel into Moodle.
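As one possible method for that last conversion step, here is a small Python sketch. It assumes the saved text file starts with a byte order mark, which I believe is what Excel's "Unicode Text" export produces, and the function name is just for illustration.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 bytes (with a byte order mark) to UTF-8 bytes."""
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    return data.decode("utf-16").encode("utf-8")


# The sample includes U+1F600, which lies outside the BMP and is
# therefore stored as a surrogate pair in UTF-16.
sample = "id\tname\n1\tJosé \U0001F600\n"
utf16_bytes = sample.encode("utf-16")  # the encoder prepends a BOM
utf8_bytes = utf16_to_utf8(utf16_bytes)
assert utf8_bytes.decode("utf-8") == sample  # nothing was lost
```

The round trip in the last line is the "without losing information" claim from the answer, checked on a string that contains a non-BMP character.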
I know this is an old question, but I happened upon it while struggling with the same issues as the OP.
Not having found any of the offered solutions a viable option, I set out to discover whether there is a way to do this using just Excel.
Fortunately, I found that the lost-character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.
Please give it a try and see if it works for you. Good luck.
You can use the iconv command under Unix (also available on Windows as libiconv).
After saving as CSV in Excel, put this on the command line:
(remember to replace cp1250 with your encoding).
It works fast and is great for big files, like a post code database, which cannot be imported into Google Docs (400,000-cell limit).
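If iconv is not at hand, roughly the same re-encoding can be sketched in Python. This is not the iconv command from the answer; the function name and paths are hypothetical, and cp1250 is only the default because the answer mentions it.

```python
import shutil


def reencode_file(src_path: str, dst_path: str, src_encoding: str = "cp1250") -> None:
    """Copy src_path to dst_path, converting from src_encoding to UTF-8."""
    # newline="" leaves the line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # copyfileobj streams in chunks, so large files need not fit in memory.
        shutil.copyfileobj(src, dst)
```

As with iconv, the source encoding has to be the one Excel actually used when saving, so pass your own value for src_encoding if it is not cp1250.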
You can do this on a modern Windows machine without third-party software. This method is reliable, and it will handle data that includes quoted commas, quoted tab characters, CJK characters, etc.
1. Save from Excel
In Excel, save the data to file.txt using the type Unicode Text (*.txt).
2. Start PowerShell
Run powershell from the Start menu.
3. Load the file in PowerShell
4. Save the data as CSV
The only "easy way" of doing this is as follows. First, realize that there is a difference between what is displayed and what is kept hidden in the Excel .csv file.
This file is in UTF-8, retains all characters and accents, and can be imported, for example, into MySQL and other database programs.
This answer is taken from this forum.
Another one I've found useful:
"Numbers" allows encoding settings when saving as CSV.
Using Notepad++
This will fix the corrupted CSV file saved by Excel and re-save it in the proper encoding.
Excel saves in CP-1252 / Windows-1252. Open the CSV file in Notepad++. Select
Then
First tell Notepad++ the encoding, then convert. Some of the other answers convert without setting the proper encoding first, mangling the file even more. They would turn what should be ’ into 達. If your character does not fit into CP-1252, then it was already lost when the file was saved as CSV. Use another answer for that.
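To see why the order matters, here is a short Python illustration of the same two steps (declare the source encoding, then convert). The sample bytes and the function name are made up; the byte value 0x92 for the curly apostrophe is how Windows-1252 encodes it.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Declare the bytes as Windows-1252 first, then convert to UTF-8."""
    return data.decode("cp1252").encode("utf-8")


# Byte 0x92 is the right single quotation mark (U+2019) in Windows-1252.
excel_bytes = b"It\x92s"
assert cp1252_to_utf8(excel_bytes) == b"It\xe2\x80\x99s"

# Skipping the first step fails: a lone 0x92 is not valid UTF-8.
try:
    excel_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8; the source encoding must be declared first")
```

A tool that guesses some other encoding instead of failing will happily produce a different character, which is the kind of mangling described above.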
Under Excel 2016 and up (including Office 365), there is a CSV option dedicated to the UTF-8 format.
In Office 365, do Save As; where previously one might have chosen CSV (Comma delimited), one of the file types you can now save as is CSV UTF-8 (Comma delimited) (*.csv).
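One thing to watch when reading such a file programmatically: as far as I know, this format starts with a UTF-8 byte order mark. A hedged Python sketch for consuming it, with a hypothetical function name and sample data:

```python
import csv
import io


def read_excel_utf8_csv(data: bytes) -> list[list[str]]:
    """Parse CSV bytes, dropping a leading UTF-8 byte order mark if present."""
    # Plain "utf-8" would keep U+FEFF glued to the first cell.
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


rows = read_excel_utf8_csv(b"\xef\xbb\xbfname,city\r\nZo\xc3\xab,Z\xc3\xbcrich\r\n")
assert rows[0] == ["name", "city"]
```

The "utf-8-sig" codec also accepts files without the mark, so it is a reasonably safe default for CSVs that may or may not come from Excel.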
"nevets1219" is right about Google Docs; however, if you simply "import" the file, it often does not convert it to UTF-8.
But if you import the CSV into an existing Google spreadsheet, it does convert to UTF-8.
Here's a recipe:
The resulting file will be in UTF-8.
What about using PowerShell?