How to handle Japanese words with Python xlrd

Posted on 2024-11-08 14:50:11


This is my code:

#!/usr/bin/python   
#-*-coding:utf-8-*-   

import xlrd,sys,re

data = xlrd.open_workbook('a.xls',encoding_override="utf-8")
a = data.sheets()[0]
s=''
for i in range(a.nrows):
    if 9<i<20:
        #stage
        print a.row_values(i)[1].decode('shift_jis')+'\n'

But it shows:

????
????????
??????
????
????
????
????????

So what can I do?

Thanks


皇甫轩 2024-11-15 14:50:11

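Background: In a "modern" (Excel 97-2003) XLS file, text is effectively stored as Unicode. In older files, text is stored as 8-bit strings, and a "codepage" record tells how it is encoded, e.g. the integer 1252 corresponds to the encoding known as cp1252 or windows-1252. In either case, xlrd presents extracted text as unicode objects.

Please insert this line into your code: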

print data.biff_version, data.codepage, data.encoding

If you have a new file, you should see

80 1200 utf_16_le
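
To make the relationship between codepage numbers and Python codec names concrete, here is a rough sketch. The helper name `codec_for_codepage` and its small lookup table are hypothetical and are not part of xlrd, which keeps its own internal mapping; 932 is the Windows codepage commonly used for Japanese.

```python
import codecs

# Hypothetical lookup table, for illustration only (not xlrd's own mapping).
SPECIAL_CODEPAGES = {
    1200: "utf_16_le",  # "new" (Excel 97-2003) files: text stored as Unicode
}


def codec_for_codepage(codepage):
    """Return a Python codec name for an Excel codepage number."""
    name = SPECIAL_CODEPAGES.get(codepage, "cp%d" % codepage)
    codecs.lookup(name)  # raises LookupError if Python has no such codec
    return name


for cp in (1200, 1252, 932):
    print("%d -> %s" % (cp, codec_for_codepage(cp)))
```

If `data.codepage` reports something other than 1200, a lookup along these lines suggests which codec the 8-bit strings in an old file would use.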

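In any case, please edit your question to report the outcome.

Problem 1: encoding_override is required ONLY if the file is an old file AND you know/suspect that the codepage record is omitted or wrong. It is ignored if the file is a new file. Do you really know that the file is pre-Excel-97 and the text is encoded in UTF-8? If so, it can only have been created by some seriously deluded 3rd-party software, and Excel will blow up if you try to open it with Excel; visit the author with a baseball bat. Otherwise, don't use encoding_override.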

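Problem 2: You should have unicode objects. To display them, you need to encode (not decode) them from unicode to str using a suitable encoding. It is very surprising that print unicode_object.decode('shift-jis') doesn't raise an exception and prints question marks.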
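
The direction of encode and decode can be shown with a short sketch. It is written so that it runs on both Python 2 and Python 3; the sample text is made up for illustration. The last step shows one way question marks can appear, namely a lossy encode to a codec that cannot represent Japanese. Whether that is what happened in the question is an assumption, not something the output above confirms.

```python
# -*- coding: utf-8 -*-
# Made-up sample: the three characters of "Japanese language" as a unicode object.
text = u"\u65e5\u672c\u8a9e"

# Encoding goes from unicode text to bytes ...
sjis_bytes = text.encode("shift_jis")

# ... and decoding goes from bytes back to unicode text.
assert sjis_bytes.decode("shift_jis") == text

# Encoding to a codec that cannot represent the characters, with the
# "replace" error handler, substitutes one "?" per character.
lossy = text.encode("ascii", "replace")
print(repr(lossy))
```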

To help understand this, please change your code to be like this:

text = a.row_values(i)[1]
print i, repr(text)
print repr(text.decode('shift-jis'))

and report the outcome.
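
For comparison, a minimal sketch of the same loop without any `decode` call is given here. It relies only on `nrows` and `row_values(i)`, which xlrd sheets provide; the `FakeSheet` class and its sample rows are hypothetical stand-ins so the sketch runs without a real workbook.

```python
def column_text(sheet, col, first_row, last_row):
    """Yield (row_index, value) for rows first_row..last_row-1 of one column.

    `sheet` is expected to offer `nrows` and `row_values(i)`, as xlrd sheets do.
    """
    for i in range(first_row, min(last_row, sheet.nrows)):
        yield i, sheet.row_values(i)[col]


class FakeSheet(object):
    """Hypothetical stand-in for an xlrd sheet, so no .xls file is needed."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, i):
        return self._rows[i]


# Made-up sample data; text cells are already unicode objects.
sheet = FakeSheet([
    [u"id", u"\u540d\u524d"],
    [1.0, u"\u6771\u4eac"],
    [2.0, u"\u5927\u962a"],
])

for i, value in column_text(sheet, 1, 1, 20):
    # repr() shows exactly what kind of object each cell holds.
    print("%d %r" % (i, value))
```

With a real file, the `9<i<20` condition from the question corresponds to `column_text(a, 1, 10, 20)`, where `a` is the sheet obtained from `xlrd.open_workbook('a.xls')`, assuming no `encoding_override` is needed.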

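So that we can help you choose an appropriate encoding (if any), tell us what version of what operating system you are using, and what the following displays: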

print sys.stdout.encoding
import locale
print locale.getpreferredencoding()
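
Once those two values are known, choosing an output encoding might look like the sketch below. The helper names `output_encoding` and `encode_for_output` are hypothetical, and falling back to the locale's preferred encoding is an assumption about what is sensible, not a rule from xlrd or from the answer.

```python
import locale
import sys


def output_encoding(stream=None):
    """Pick an encoding for writing text to `stream` (default: sys.stdout)."""
    stream = sys.stdout if stream is None else stream
    # stream.encoding may be None or missing, e.g. for some redirected streams.
    return getattr(stream, "encoding", None) or locale.getpreferredencoding()


def encode_for_output(text, stream=None):
    """Encode unicode text for `stream`, replacing unrepresentable characters."""
    return text.encode(output_encoding(stream), "replace")


print(output_encoding())
```

If the reported encoding cannot represent Japanese (for example `ascii` or `cp1252`), the `replace` handler produces question marks instead of raising an error, so a terminal or console setting that supports Japanese is still needed to see the real text.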

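Further reading:

(1) The xlrd documentation (the section on Unicode, right at the front), included in the distribution or available from the latest commit.

(2) The Python Unicode HOWTO.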


About the author: 浅紫色的梦幻
