What kind of encoding declaration should I put in a Python file?

Posted on 2024-12-17 21:23:32

I learned from the website http://www.python.org/dev/peps/pep-0263/ that I should add an encoding declaration in Python when I want to type Unicode (non-ASCII) characters directly in the source, but I'm still confused about it.

Assume that I work in Linux with vim, and I create a new .py file and type the following code:

#!/usr/bin/python2.7
# -*- coding: utf8 -*-
s = u'ޔ'
print s

1. I tried replacing line 2 with the following code:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

but it doesn't work. Aren't they the same?

2. I am not very familiar with Linux, and I really don't know why I should add -*- at the beginning and end of the encoding declaration. When I tried replacing # -*- coding: utf8 -*- with # code=utf8 or # code: utf8, I got an error:

File "pythontest.py", line 3
SyntaxError: Non-ASCII character '\xde' in file pythontest.py on line 3, but no encoding declared; see     http://www.python.org/peps/pep-0263.html for details

But these declarations are mentioned on the website http://www.python.org/dev/peps/pep-0263/!

And according to the documentation, the following declaration is allowed:

# This Python file uses the following encoding: utf-8

Oops, what's this? I don't think a computer can recognize that. What on earth should the declaration look like? I feel more and more confused.

Thanks for help.

Comments (4)

只涨不跌 2024-12-24 21:23:32

The abstract of the PEP you link really says it all:

This PEP proposes to introduce a syntax to declare the encoding of *a Python source file*. The encoding information is then used by the *Python parser* to interpret the file using the given encoding. Most notably this enhances the interpretation of Unicode literals in the source code and makes it possible to write Unicode literals using e.g. UTF-8 directly in an Unicode aware editor.

(the emphasis is mine).

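Even if what you wanted to do had worked (replacing the encoding of the source file programmatically), it wouldn't have made any sense. Think about it: the code is static (it doesn't change). It would make no sense to try to read it with a different encoding: there is only one correct one (the one the author used when editing the source). Besides, sys.setdefaultencoding() only runs once the program is executing, whereas the SyntaxError is raised while the file is still being parsed, so it can never change how the source itself is decoded; it only changes the default codec Python 2 uses for implicit str/unicode conversions at run time.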
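A minimal Python 3 sketch of why a run-time call comes too late: the built-in compile() decodes the source bytes, honouring any coding declaration, before a single statement executes. The Latin-1 sample text, the "<demo>" filename and the helper try_compile are illustrative assumptions. Python 3 is used because it is what you can run today; note that its default source encoding is UTF-8 rather than Python 2's ASCII.

```python
# Sketch (Python 3): the coding declaration is read by the parser when the
# source bytes are compiled, before any of the code in them runs.

# Source text saved as Latin-1: 'é' becomes the single byte 0xE9,
# which is not valid UTF-8 on its own.
body = "s = '\u00e9'\n".encode("latin-1")


def try_compile(source: bytes) -> str:
    """Compile and run source bytes; return the value of s or the error text."""
    namespace = {}
    try:
        code = compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        # Decoding failures surface as SyntaxError at compile time.
        return "SyntaxError: " + str(exc.msg)
    exec(code, namespace)
    return namespace["s"]


# Without a declaration, Python 3 assumes UTF-8 and rejects the 0xE9 byte.
print(ascii(try_compile(body)))

# With a declaration on the first line, the parser decodes the bytes as Latin-1.
print(ascii(try_compile(b"# -*- coding: latin-1 -*-\n" + body)))
```

No statement inside the source gets a chance to run in the first case, which is the same reason an import sys / reload(sys) line cannot rescue a file the parser has already rejected.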

As for the syntax:

# This Python file uses the following encoding: utf-8

the PEP itself lists that syntax under "Without interpreter line, using plain text". It is worded for humans, so that if you open the file in a text editor and find it full of gibberish, you know which encoding to set manually in the editor's menu. Python reads it too, though: the words "encoding: utf-8" contain "coding: utf-8", which is exactly what the parser looks for.

EDIT: As for why you should put the encoding between # -*- and -*-... That's purely conventional. The first symbol, the hash sign, marks the line as a comment (so it won't be compiled to bytecode), and the -*- ... -*- markers are the Emacs convention for file-local variables: they tell the Emacs editor, not Python, which encoding to use. Python's parser ignores the markers and only looks for coding: or coding=.

It is no different from putting this in your source:

# TODO: fix this nasty bug

in which the TODO: part tells the developer (and some IDEs) that this message requires action. You could really have used whatever you wanted, including @MarkZar or WTF!... it's just convention!
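One way to convince yourself that the -*- markers are only decoration is to run the regular expression quoted from PEP 263, coding[:=]\s*([-\w.]+), over a few candidate comment lines. This Python 3 sketch uses re.search; the helper name declared_encoding and the sample lines are illustrative, and the vim-style line is a form PEP 263 itself lists.

```python
import re

# The pattern quoted in PEP 263; the first group is the encoding name.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(comment_line):
    """Return the encoding name found in a comment line, or None."""
    match = CODING_RE.search(comment_line)
    return match.group(1) if match else None


candidates = [
    "# -*- coding: utf8 -*-",          # Emacs-style markers around it
    "# coding: utf8",                  # no markers at all
    "# vim: set fileencoding=utf8 :",  # vim style: 'fileencoding=' contains 'coding='
    "# code=utf8",                     # 'code' is not 'coding': no match
    "# code: utf8",                    # same problem
]

for line in candidates:
    print(repr(line), "->", declared_encoding(line))
```

The first three lines all yield utf8, while the two code variants from the question yield None, which is why Python then falls back to its default encoding and rejects the non-ASCII byte.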

HTH!

鯉魚旗 2024-12-24 21:23:32

The important part of a Python encoding declaration is coding: utf-8 (or coding=utf-8). It must be in a comment on the first or second line of the file, and you can do whatever you want with the rest of the comment.

Here are the lines in the PEP that describe this behaviour:

More precisely, the first or second line must match the regular
expression "coding[:=]\s*([-\w.]+)". The first group of this
expression is then interpreted as encoding name. If the encoding
is unknown to Python, an error is raised during compilation. There
must not be any Python statement on the line that contains the
encoding declaration.
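In Python 3 the standard library's tokenize.detect_encoding() performs this first-or-second-line lookup on a byte stream, so it can be used to check which positions count. A sketch under stated assumptions: the helper name encoding_of and the sample sources are made up; tokenize uses a slightly stricter pattern than the one quoted (the match must sit in a comment) and only looks at line 2 when line 1 is blank or a comment; and it falls back to UTF-8, the Python 3 default, rather than Python 2's ASCII.

```python
import codecs
import io
import tokenize


def encoding_of(source: bytes) -> str:
    """Return the canonical codec name tokenize detects for the source bytes."""
    name, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    # Normalise aliases such as 'utf8' or 'iso-8859-1' to the codec's name.
    return codecs.lookup(name).name


# Declaration on line 2, after the shebang: it is honoured.
on_line_2 = b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\ns = 1\n"

# Declaration on line 3: it is ignored and the default applies.
on_line_3 = b"#!/usr/bin/python\n\n# -*- coding: latin-1 -*-\ns = 1\n"

# 'code' instead of 'coding': not recognised, so the default applies.
misspelled = b"# -*- code: latin-1 -*-\ns = 1\n"

print(encoding_of(on_line_2))   # iso8859-1
print(encoding_of(on_line_3))   # utf-8
print(encoding_of(misspelled))  # utf-8
```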

情归归情 2024-12-24 21:23:32

You need the line since you need to tell the compiler which encoding the source code uses.

放赐 2024-12-24 21:23:32

The encoding setting is searched using the regular expression coding[:=]\s*([-\w.]+) anywhere on the line. This means:

  • find the exact string coding= or coding:, followed by zero or more white-space characters, followed by a run of at least one character that is alphanumeric, _, - or . (a period).

  • capture that run of characters.

  • the captured part is used as the encoding.

That is, it is perfectly legal to use something like

# This program was written for Python 3. Encoding that should be used for decoding: UTF-8!

because a string in the required format can still be found there: the coding: UTF-8 at the end of decoding: UTF-8.
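To see where the pattern actually hits in prose-style declarations like this one, this Python 3 sketch prints both the whole match and the captured encoding name for the sentence above and for the plain-text form from PEP 263. The variable names are illustrative; the point is that re.search scans the whole line, so a match can start in the middle of a longer word.

```python
import re

# Pattern quoted from PEP 263; group 1 is the encoding name.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

lines = [
    "# This program was written for Python 3. "
    "Encoding that should be used for decoding: UTF-8!",
    "# This Python file uses the following encoding: utf-8",
]

for line in lines:
    match = CODING_RE.search(line)
    # group(0) is the matched text, group(1) the captured encoding name.
    print(repr(match.group(0)), "->", match.group(1))
```

In the first line, "Encoding that" is skipped because a space follows "coding", so the match lands inside "decoding:" and the capture stops at the "!"; in the second, it lands inside "encoding:".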


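Python 3 source files default to UTF-8 as the encoding, so no # coding: utf-8 is necessary in Python 3 code as long as you use UTF-8. (Python 2 defaults to ASCII, which is why the file in the question fails as soon as the declaration is removed.)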
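As a quick check of that default, this sketch compiles the question's one-character string literal as UTF-8 bytes with no declaration at all. It assumes Python 3; under Python 2 the same bytes would need the coding line.

```python
# Python 3 sketch: UTF-8 source bytes need no coding declaration.
# The literal is the Thaana character from the question.
source = "s = 'ޔ'\n".encode("utf-8")

# Its UTF-8 encoding starts with the byte 0xDE, the '\xde' that Python 2
# complained about when no declaration was present.
print(source[5:7])

namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
print(ascii(namespace["s"]))
```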
