"â€™" showing on page instead of "'"

Posted on 2024-08-25 09:14:44

â€™ is showing on my page instead of '.

I have the Content-Type set to UTF-8 in both my <head> tag and my HTTP headers:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

In addition, my browser is set to Unicode (UTF-8).

So what's the problem, and how can I fix it?

为你拒绝所有暧昧 2024-09-01 09:14:44

So what's the problem,

It's a ’ (RIGHT SINGLE QUOTATION MARK - U+2019) character which is being decoded as CP-1252 instead of UTF-8. If you check the Encodings table of this character at FileFormat.Info, you'll see that in UTF-8 this character is composed of the bytes 0xE2, 0x80 and 0x99.

And if you check the CP-1252 code page layout at Wikipedia, then you'll see that the hex bytes E2, 80 and 99 stand for the individual characters â, € and ™.
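
The byte-level claim above can be checked in a few lines. This is a minimal sketch, assuming Python 3.8+ and its built-in `cp1252` codec; the variable names are only illustrative.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, written as an escape so the
# example does not depend on how this file itself is encoded.
quote = "\u2019"

# Encoding as UTF-8 yields the three bytes named in the answer.
utf8_bytes = quote.encode("utf-8")
print(utf8_bytes.hex(" "))

# Decoding those same bytes with the wrong code page (CP-1252)
# yields three separate characters instead of one.
wrong = utf8_bytes.decode("cp1252")
print([f"U+{ord(ch):04X}" for ch in wrong])
```

The second list should contain the code points of â, € and ™, which is exactly the sequence seen on the page.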


and how can I fix it?

Use UTF-8 instead of CP-1252 to read, write, store, and display the characters.


I have the Content-Type set to UTF-8 in both my <head> tag and my HTTP headers:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This only instructs the client which encoding to use to interpret and display the characters. It doesn't instruct your own program which encoding to use to read, write, store, and display the characters. The exact answer depends on the server-side platform / database / programming language used. Do note that the charset set in the HTTP response header takes precedence over the one in the HTML meta tag. The HTML meta tag is only used when the page is opened from the local disk file system via a file:// URL instead of from the web via a http(s):// URL.


In addition, my browser is set to Unicode (UTF-8):

This only forces the client to use that encoding to interpret and display the characters. But the actual problem is that you're already sending the exact characters â€™ (encoded in UTF-8) to the client instead of the character ’. The client is basically correctly displaying â€™ using the UTF-8 encoding. If the client was misinstructed to use for example ISO-8859-1 to display them, then you would likely have seen ââ¬â¢ instead.


I am using ASP.NET 2.0 with a database.

This is most likely where your problem lies. You need to verify with an independent database tool what the data looks like.

If the ’ character is correctly there, then you are most likely not correctly connecting to the database from your program. You basically need to reconfigure the database connector to use UTF-8. How to do that depends on the database being used.

Or if your database already contains â€™, then it's your database that's messed up. Most probably the tables aren't configured to use UTF-8. Instead, they use the database's default encoding, which varies depending on the configuration. If this is your issue, then usually just altering the table to use UTF-8 is sufficient. If your database doesn't support that, you'll need to recreate the tables. It is good practice to set the encoding of the table when you create it.
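
When inspecting the stored value, it helps to look at code points rather than at rendered glyphs, because the viewing tool may have encoding problems of its own. The following is a small sketch, not tied to any particular database: `describe` is a hypothetical helper, and the two sample strings stand in for values you would fetch with your own tool.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' entry per character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Stand-ins for values read from the database (assumed sample data).
stored_ok = "\u2019"               # a healthy row: one character
stored_bad = "\u00e2\u20ac\u2122"  # a damaged row: three characters

print(describe(stored_ok))
print(describe(stored_bad))
```

One entry means the data is stored correctly and the connection is the suspect; three entries mean the damage is already in the table.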

You're most likely using SQL Server, but here is some MySQL code (copied from this article):

CREATE DATABASE db_name CHARACTER SET utf8;
CREATE TABLE tbl_name (...) CHARACTER SET utf8;

If your table is however already UTF-8, then you need to take a step back: who or what put the data there? That's where the problem is. One example would be HTML form submitted values which are incorrectly encoded/decoded.


十年不长 2024-09-01 09:14:44

Ensure the browser and editor are using UTF-8 encoding instead of ISO-8859-1/Windows-1252.

Or use &rsquo;.

悲欢浪云 2024-09-01 09:14:44

’ (Unicode codepoint U+2019 RIGHT SINGLE QUOTATION MARK) is encoded in UTF-8 as bytes:

0xE2 0x80 0x99.

â€™ (Unicode codepoints U+00E2 U+20AC U+2122) is encoded in UTF-8 as bytes:

0xC3 0xA2   0xE2 0x82 0xAC   0xE2 0x84 0xA2.

These are the bytes your browser is actually receiving in order to produce â€™ when processed as UTF-8.

That means that your source data is going through two charset conversions before being sent to the browser:

  1. The source ’ character (U+2019) is first encoded as UTF-8 bytes:

    0xE2 0x80 0x99

  2. Those individual bytes are then mis-interpreted and decoded to Unicode codepoints U+00E2 U+20AC U+2122 by one of the Windows-125X charsets (1252, 1254, 1256, and 1258 all map 0xE2 0x80 0x99 to U+00E2 U+20AC U+2122), and then those codepoints are encoded as UTF-8 bytes:

    0xE2 -> U+00E2 -> 0xC3 0xA2
    0x80 -> U+20AC -> 0xE2 0x82 0xAC
    0x99 -> U+2122 -> 0xE2 0x84 0xA2

You need to find where the extra conversion in step 2 is being performed and remove it.
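
The two conversions can be reproduced to confirm the byte sequences listed above. This is a sketch assuming Python 3 and CP-1252 as the offending charset; it only demonstrates the mechanism and does not locate where your own stack performs step 2.

```python
source = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Step 1: the legitimate encoding to UTF-8.
step1 = source.encode("utf-8")

# Step 2: the unwanted extra round trip (decode as CP-1252, re-encode as UTF-8).
step2 = step1.decode("cp1252").encode("utf-8")

print(step1.hex(" "))
print(step2.hex(" "))

# Reversing step 2 recovers the original character, which confirms
# that exactly one extra conversion took place.
repaired = step2.decode("utf-8").encode("cp1252").decode("utf-8")
```

If `repaired` equals `source` for your real data as well, the data went through exactly one extra conversion.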

故人的歌 2024-09-01 09:14:44

I have some documents where … was showing as â€¦ and ê was showing as Ãª. This is how it got there (Python code):

# Adam edits original file using windows-1252 (bytes literal, so this runs on Python 3)
windows = b'\x85\xea'
# that is HORIZONTAL ELLIPSIS, LATIN SMALL LETTER E WITH CIRCUMFLEX

# Beth reads it correctly as windows-1252 and writes it as utf-8
utf8 = windows.decode("windows-1252").encode("utf-8")
print(utf8)

# Charlie reads it *incorrectly* as windows-1252 and writes a twingled utf-8 version
twingled = utf8.decode("windows-1252").encode("utf-8")
print(twingled)

# detwingle by reading as utf-8 and writing as windows-1252 (it's really utf-8)
detwingled = twingled.decode("utf-8").encode("windows-1252")

assert utf8==detwingled

To fix the problem, I used python code like this:

with open("dirty.html","rb") as f:
    dt = f.read()
ct = dt.decode("utf8").encode("windows-1252")
with open("clean.html","wb") as g:
    g.write(ct)

(Because someone had inserted the twingled version into a correct UTF-8 document, I actually had to extract only the twingled part, detwingle it and insert it back in. I used BeautifulSoup for this.)
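
For a mixed document like that, a cruder alternative to parsing the HTML is to attempt the repair line by line and keep any line that does not survive the round trip. This is only a sketch under that assumption, not the BeautifulSoup approach mentioned above; `detwingle_line` and `detwingle_text` are hypothetical names.

```python
def detwingle_line(line):
    """Undo one round of CP-1252 mis-decoding, or return the line unchanged."""
    try:
        # A twingled line encodes cleanly to CP-1252 and the result is valid UTF-8.
        return line.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # A correctly encoded line with non-ASCII text fails one of the two
        # steps, so it is left alone.
        return line


def detwingle_text(text):
    """Apply detwingle_line to every line of a document."""
    return "\n".join(detwingle_line(line) for line in text.split("\n"))


# First line is correct text, second line is twingled.
mixed = "caf\u00e9 ok\nit\u00e2\u20ac\u2122s broken"
fixed = detwingle_text(mixed)
```

A line that mixes correct non-ASCII text with twingled text is left untouched by this sketch, so finer-grained extraction is still needed for such lines.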

It is far more likely that you have a Charlie in content creation than that the web server configuration is wrong. You can also force your web browser to twingle the page by selecting windows-1252 encoding for a utf-8 document. Your web browser cannot detwingle the document that Charlie saved.

Note: the same problem can happen with any other single-byte code page (e.g. latin-1) instead of windows-1252.
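
To illustrate the note, the sketch below decodes the same UTF-8 bytes with both code pages. It assumes Python 3 codec behaviour: `latin-1` maps bytes 0x80 and 0x99 to invisible C1 control characters rather than to € and ™, so the repair has to use the same code page that caused the damage.

```python
utf8_bytes = "\u2019".encode("utf-8")

as_cp1252 = utf8_bytes.decode("cp1252")   # three visible characters
as_latin1 = utf8_bytes.decode("latin-1")  # one visible character plus two C1 controls

print([hex(ord(ch)) for ch in as_cp1252])
print([hex(ord(ch)) for ch in as_latin1])

# Repairing with the matching code page works.
repaired = as_latin1.encode("latin-1").decode("utf-8")

# Repairing with the wrong code page fails, because CP-1252 cannot
# encode the C1 control characters that latin-1 produced.
try:
    as_latin1.encode("cp1252")
    mismatch_detected = False
except UnicodeEncodeError:
    mismatch_detected = True
```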

陈独秀 2024-09-01 09:14:44

This sometimes happens when a string is converted from Windows-1252 to UTF-8 twice.

We had this in a Zend/PHP/MySQL application where characters like that were appearing in the database, probably due to the MySQL connection not specifying the correct character set. We had to:

  1. Ensure Zend and PHP were communicating with the database in UTF-8 (was not by default)

  2. Repair the broken characters with several SQL queries like this...

    UPDATE MyTable SET 
    MyField1 = CONVERT(CAST(CONVERT(MyField1 USING latin1) AS BINARY) USING utf8),
    MyField2 = CONVERT(CAST(CONVERT(MyField2 USING latin1) AS BINARY) USING utf8);
    

    Do this for as many tables/columns as necessary.

You can also fix some of these strings in PHP if necessary. Note that because characters have been encoded twice, we actually need to do a reverse conversion from UTF-8 back to Windows-1252, which confused me at first.

mb_convert_encoding('â€™', 'Windows-1252', 'UTF-8');    // returns ’

傲世九天 2024-09-01 09:14:44

You have a mismatch in your character encoding; your string is encoded in one encoding (UTF-8) and whatever is interpreting this page is using another (say ASCII).

Always specify your encoding in your HTTP headers and make sure this matches your framework's definition of encoding.

Sample HTTP header:

Content-Type: text/html; charset=utf-8
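
As an illustration of keeping the declared and the actual encoding in step, here is a minimal WSGI sketch in Python. It is not part of the ASP.NET or JSP setups discussed in this answer; `app` is a hypothetical application callable, and the point is only that the body is encoded with the same charset the header announces.

```python
def app(environ, start_response):
    """Tiny WSGI app whose body encoding matches its Content-Type header."""
    # Encode the response text with the charset declared below.
    body = "<p>It\u2019s fine</p>".encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

It can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library.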

Setting encoding in ASP.NET (web.config):

<configuration>
  <system.web>
    <globalization
      fileEncoding="utf-8"
      requestEncoding="utf-8"
      responseEncoding="utf-8"
      culture="en-US"
      uiCulture="de-DE"
    />
  </system.web>
</configuration>

Setting encoding in jsp

人│生佛魔见 2024-09-01 09:14:44

If your content type is already UTF-8, then it is likely the data is already arriving in the wrong encoding. If you are getting the data from a database, make sure the database connection uses UTF-8.

If this is data from a file, make sure the file is encoded correctly as UTF-8. You can usually set this in the "Save as..." dialog of the editor of your choice.

If the data is already broken when you view it in the source file, chances are that it used to be a UTF-8 file but was saved in the wrong encoding somewhere along the way.
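
A quick way to check a file is to read its raw bytes and test whether they decode as UTF-8. This is a rough sketch assuming Python 3.7+; `sniff_encoding` is a hypothetical helper. It cannot detect double-encoded text, because that is still valid UTF-8.

```python
def sniff_encoding(raw):
    """Classify raw bytes as 'ascii', 'utf-8', or 'not utf-8'."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Typical for files saved as Windows-1252 or ISO-8859-1.
        return "not utf-8"
    return "ascii" if text.isascii() else "utf-8"


# A lone 0x92 byte is how CP-1252 stores the right single quotation mark.
print(sniff_encoding(b"it\x92s"))
print(sniff_encoding("it\u2019s".encode("utf-8")))
```

For a real file, pass in the result of `open(path, "rb").read()`.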

压抑⊿情绪 2024-09-01 09:14:44

If you get this error on a WordPress website, you need to change the DB charset in wp-config:

define('DB_CHARSET', 'utf8mb4_unicode_ci');

instead of:

define('DB_CHARSET', 'utf8mb4');

枕花眠 2024-09-01 09:14:44

If the other answers haven't helped, you might want to check whether your database is actually storing the mojibake characters. I was viewing the text in utf-8, but I was still seeing the mojibake and it turned out that, due to a database upgrade, the text had been permanently "mojibaked".

In this case, one option is to "fix" the text with Python's ftfy package (or its JavaScript version).

半山落雨半山空 2024-09-01 09:14:44

You must have copied and pasted text from a Word document. Word documents use smart quotes. You can replace the quote with the HTML entity (&rsquo;) or simply type a plain apostrophe (') in your HTML editor.

I'm sure this will solve your problem.

自由如风 2024-09-01 09:14:44

In DBeaver (or other editors), the script file you're working on can prompt you to save it as UTF-8, and that will change the character

–

into

â€“

or

Ã¢â‚¬â€œ

水波映月 2024-09-01 09:14:44

The same thing happened to me with the '–' character (long minus sign).
I used this simple replace to resolve it:

htmlText = htmlText.Replace('–', '-');