charset-utf8 and character entities
I am proposing to convert my windows-1252 XHTML web pages to UTF-8.
I have the following character entities in my coding:
- &#39; for the apostrophe,
- the entity for ► (right pointer),
- the entity for ◄ (left pointer).
If I change the charset and save the pages as UTF-8 using my editor:
- the apostrophe remains in as a character entity;
- the pointers are converted to symbols within the code (presumably because the entities are not supported in UTF-8?).
Questions:
If I understand UTF-8 correctly, you don't need to use the entities and can type characters directly into the code. In that case, is it safe for me to replace &#39; with a typed apostrophe? Is it correct that the editor has placed the pointer symbols directly into my code, and will these be displayed reliably in modern browsers? It seems to be OK. Presumably I can't revert to the entities anyway if I use UTF-8?
Thanks.
Comments (3)
It's charset, not chartset.
1) It depends on where the apostrophe is used. It is a valid ASCII character as well, so depending on the character's purpose (whether it is for display only, inside a DOM Text node, or used in code), you may or may not be able to use a literal apostrophe.
2) If your editor is a modern editor, it will be using UTF-8 byte sequences instead of single-byte chars to represent text. Most of the characters used in code are plain ASCII (and ASCII is a subset of UTF-8), so those characters take up one byte. Other characters may take up two, three or even four bytes. They will still be displayed to you as one character, but the relation between character and byte is no longer one-to-one.
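The byte counts described above can be checked directly. This is only a minimal sketch in Python; the sample characters (including the emoji, which is not from the question) are chosen for illustration.

```python
# Count how many bytes each sample character needs in UTF-8.
samples = ["'", "\u00e9", "\u25ba", "\U0001F600"]  # ', é, ►, and an emoji
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} {ch!r}: {n} byte(s) in UTF-8")

# windows-1252 has no slot for the pointer symbol at all, which is why
# a windows-1252 page can only carry it as a character reference.
try:
    "\u25ba".encode("cp1252")
    pointer_fits_cp1252 = True
except UnicodeEncodeError:
    pointer_fits_cp1252 = False

print("► encodable in windows-1252:", pointer_fits_cp1252)
```

The apostrophe stays at one byte, while the pointer needs three bytes in UTF-8 and cannot be written in windows-1252 at all.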
Anyway, since all valid ASCII characters are exactly the same in ASCII, UTF-8 and even windows-1252, you should not see any problems using UTF-8. You can still use numeric and named entities, because they are written in those valid characters. You just don't have to.
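As a rough illustration of why references survive a charset change, the sketch below encodes a numeric reference in all three encodings and compares the bytes. The decimal value 9658 is simply U+25BA (►) written in decimal; whether the original pages used this exact spelling is an assumption.

```python
import html

reference = "&#9658;"  # numeric character reference for U+25BA (assumed spelling)

# The reference itself consists only of ASCII characters, so its bytes
# are identical whichever of these encodings the file is saved in.
encodings = ("ascii", "cp1252", "utf-8")
encoded = {name: reference.encode(name) for name in encodings}
same_bytes = len(set(encoded.values())) == 1

# A conforming HTML consumer resolves the reference to the character.
pointer = html.unescape(reference)
apostrophe = html.unescape("&#39;")

print(same_bytes, pointer, apostrophe)
```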
P.S. All modern browsers handle UTF-8 just fine, but our definitions of "modern" may vary.
Entities have three purposes: encoding characters that cannot be represented in the character encoding used (not relevant with UTF-8), encoding characters that are not convenient to type on a given keyboard, and encoding characters that are illegal unescaped.
The entity for ► should always produce ► no matter what the encoding. If it doesn't, it's a bug elsewhere. Typing ► directly in the source is fine in UTF-8. You can do either that or the entity, and it makes no difference.
A literal ' is fine in most contexts, but not in some. It is allowed in element text and inside a double-quoted attribute value, but it has to be encoded inside a single-quoted attribute value, because otherwise it would be taken as the ' that ends the attribute value.
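The quoting rule for ' in attribute values can be tried out with Python's standard html.parser module. This is only a sketch: the markup strings and the AttrCollector and titles_in names are made up for illustration, and a browser's parser may recover from the broken case differently.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects the value of every title attribute it sees."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "title":
                self.titles.append(value)


def titles_in(markup):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.titles


# A literal apostrophe inside a double-quoted attribute is fine.
double_quoted = titles_in('<p title="Jon\'s page">Jon\'s page</p>')

# Inside a single-quoted attribute it has to be written as a reference.
escaped = titles_in("<p title='Jon&#39;s page'>text</p>")

# Unescaped, the apostrophe closes the attribute value early.
broken = titles_in("<p title='Jon's page'>text</p>")

print(double_quoted, escaped, broken)
```

The first two calls both yield the full value, while the third does not, since the value is cut off at the first closing quote.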
Use entities if you copy and paste content from a word processor or if the code is an XML dialect. Use a macro in your text editor to find and replace the common ones in one pass. Here is a simple list:
½
é
&
'
`
\
•
$
¢
…
—
–
“
”
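If an editor macro is not available, the same find-and-replace pass could be scripted. The following is a minimal sketch, not the answerer's method: the helper name to_char_refs and the sample string are hypothetical, and it emits numeric references rather than named entities.

```python
import html


def to_char_refs(text):
    """Escape markup characters, then turn non-ASCII characters into references."""
    # html.escape handles &, <, > and, with quote=True, " and '.
    escaped = html.escape(text, quote=True)
    # The xmlcharrefreplace error handler rewrites anything outside ASCII
    # as a decimal numeric character reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "Tom & Jerry's \u00bd caf\u00e9 \u25ba"
converted = to_char_refs(sample)
print(converted)
```

Because the result is pure ASCII, it reads the same whether the file is later saved as windows-1252 or UTF-8.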