Reading srt (subtitle) files with Python 3
I wish to be able to read an srt file with python3.
These files can be found here:
http://www.opensubtitles.org/
With info here:
http://en.wikipedia.org/wiki/SubRip
SubRip supports any encoding: ASCII or Unicode, for example.
If I understand correctly, I need to specify which decoder to use when I call Python's read function. So am I right in saying that I need to know how each file is encoded in order to make this judgement? If so, how do I establish that for each file if I have a hundred such files from different sources and in different languages?
Ultimately I would prefer to convert the files so that they are all in UTF-8 to start with. But some of these files might be in some obscure encoding, for all I know.
Please help,
Barry
Comments (3)
You could use the charade package (formerly chardet) to detect the encoding.
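A minimal sketch of the detection suggestion, assuming the chardet package is installed and using its detect() function, which returns a dict with 'encoding' and 'confidence' keys. The helper names guess_encoding and read_srt are made up for illustration, and the fallback to UTF-8 when no guess is available is an assumption, not something the answer prescribes.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch importable without it
    chardet = None


def guess_encoding(raw, default="utf-8"):
    """Return chardet's best guess for raw bytes, or default if there is none."""
    if chardet is None or not raw:
        return default
    result = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
    return result["encoding"] or default


def read_srt(path):
    """Read a subtitle file as text using the guessed encoding."""
    with open(path, "rb") as f:
        raw = f.read()
    # errors="replace" keeps going if the guess turns out to be slightly off.
    return raw.decode(guess_encoding(raw), errors="replace")
```

Detection is statistical, so short files can be misidentified; checking result["confidence"] before trusting the guess is a reasonable precaution.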
You can check for the byte order mark (BOM) at the start of each .srt file to test for its encoding. However, this probably won't work for all files, as the BOM is not a required attribute and is only defined for UTF files anyway. A check can be performed by reading the first few bytes of the file and comparing them against the known BOM sequences.
What you probably want to do is simply open your file, decode whatever you pull out of it into unicode, work with the unicode representation until you are ready to output, and then encode it back again. See this talk for some more information, and code samples that might be relevant.
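The BOM check and the decode, process, encode round trip can be sketched with the standard library alone. The function names and the list of fallback encodings (UTF-8, then cp1252, then latin-1) are assumptions chosen for illustration; a real collection of subtitles may need a different list.

```python
import codecs

# UTF-32 marks come before UTF-16 because the UTF-32 LE mark begins with
# the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


def decode_subtitles(raw, fallbacks=("utf-8", "cp1252")):
    """Decode bytes to str: trust a BOM if present, else try the fallbacks."""
    encoding = sniff_bom(raw)
    if encoding is not None:
        return raw.decode(encoding)  # these codecs strip the BOM
    for candidate in fallbacks:
        try:
            return raw.decode(candidate)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")  # last resort: every byte maps to a character


def convert_to_utf8(src_path, dst_path):
    """Rewrite one subtitle file as UTF-8, keeping its line endings."""
    with open(src_path, "rb") as f:
        text = decode_subtitles(f.read())
    with open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

A successful decode does not prove the guess was right: a cp1251 file also decodes without error as cp1252 and simply produces the wrong characters, which is why statistical detection is useful when the sources are mixed.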
There's also a decent library for handling SRT files:
https://pypi.python.org/pypi/pysrt
You can specify the encoding when opening and writing SRT files.
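As a rough illustration, assuming pysrt is installed, its open() function and the save() method of the returned object both take an encoding argument, so a file can be re-encoded in two calls. The helper name reencode_srt and the plain-text fallback used when pysrt is missing are assumptions added for this sketch.

```python
try:
    import pysrt  # third-party: pip install pysrt
except ImportError:  # keep the sketch importable without it
    pysrt = None


def reencode_srt(src_path, dst_path, src_encoding):
    """Read an SRT file in src_encoding and write it back out as UTF-8."""
    if pysrt is not None:
        subs = pysrt.open(src_path, encoding=src_encoding)
        subs.save(dst_path, encoding="utf-8")
        return
    # Fallback without pysrt: plain text re-encode, no subtitle parsing.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

The source encoding still has to be known or guessed beforehand; pysrt handles parsing and writing the subtitles, not detection.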