HTML-encoding characters that are not in the character set

Posted on 2024-10-07 22:10:57

We have a web app that uses the ISO-8859-1 character set. Occasionally users have 'strange' names which contain characters like &#352; (HTML-encoded here for your convenience). ~~We store this in our database, but~~ we can't display it correctly.

What is the best way of dealing with this? I'm thinking I should convert characters outside the character set to their numeric HTML entities (Š to &#352;), but I'm having trouble finding out how to do this automatically (without using a table of all values).

This code works for extended ASCII characters like 'å' (that are present in ISO-8859-1). I would like to do the same with other characters. Is there a pattern in these HTML entity encoding values I can use?

unsigned int c;  
for( int i=0; i < html.GetLength(); i++)  
{  
    c = html[i];  
    if( c > 255 || c < 0 )  
    {  
        CString orig = CString(html[i]);  
        CString encoded = "&#";  
        encoded += CTool::String((byte)c);  
        encoded += ";";  
        html.Replace(orig, encoded);  
    }  
}

风柔一江水 2024-10-14 22:10:57

The webpage should instruct the browser to display the response in UTF-8. This usually happens by supplying the charset in the Content-Type response header like text/html;charset=UTF-8.

Response.AppendHeader("Content-Type", "text/html;charset=UTF-8");

The HTML/XML entities are solely there so that you will be able to save the webpage source in an encoding other than UTF-8.
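Note that the header only declares the encoding; the bytes written to the response have to be UTF-8 as well. As a rough sketch of what that means for a single character, the function below appends the UTF-8 form of one code point to a byte string. The name `AppendUtf8` is hypothetical and not part of any framework, and the sketch assumes the input is a valid code point.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one Unicode code point.
// Assumes cp is valid (at most 0x10FFFF and not a surrogate).
void AppendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

With this, Š (U+0160) becomes the two bytes 0xC5 0xA0, which a browser told to expect UTF-8 will render without any entity.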

戴着白色围巾的女孩 2024-10-14 22:10:57

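html appears to be a "Unicode" CString, which means it is UTF-16 encoded. The "&#ddd;" syntax uses the Unicode code point number. Generally, that is very simple: Š is U+0160, which means it is 0x0160 in UTF-16. That is 352 in decimal, so you get &#352;.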
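A minimal sketch of that rule, using `std::u16string` as a stand-in for the Unicode CString so that it compiles without MFC. The function name `EncodeForLatin1` is made up for illustration, and the sketch assumes the text contains only BMP characters. It also shows why the question's code fails: the `(byte)c` cast keeps only the low 8 bits, so 0x0160 would be written as 0x60 (96) instead of 352.

```cpp
#include <string>

// Hypothetical helper: keep code units that fit in ISO-8859-1 as single
// bytes and write everything else as a decimal numeric character reference.
// Assumes BMP-only input (no surrogate pairs).
std::string EncodeForLatin1(const std::u16string& text)
{
    std::string out;
    for (char16_t unit : text) {
        if (unit <= 0xFF) {
            out += static_cast<char>(unit);  // representable in ISO-8859-1
        } else {
            // Use the full 16-bit value; do not truncate it to a byte.
            out += "&#" + std::to_string(static_cast<unsigned>(unit)) + ";";
        }
    }
    return out;
}
```

Calling `EncodeForLatin1(u"\u0160koda")` gives `&#352;koda`, while 'å' (U+00E5) stays a single ISO-8859-1 byte.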

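The only problem arises when you encounter characters outside the Basic Multilingual Plane (BMP), that is, above U+FFFF. Such a character no longer fits in 16 bits, so it occupies two elements (a surrogate pair) in the html string. It should nevertheless produce only a single &#ddddd; value. This is rare enough that you can often ignore it.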
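If you would rather not ignore that case, the two halves can be combined before the number is written. The following is a sketch only, again with `std::u16string` standing in for the CString and a hypothetical function name; it leaves unpaired surrogates as their raw value, which a production version would probably want to handle differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: like a per-unit encoder, but a high surrogate
// followed by a low surrogate is merged into one code point first.
std::string EncodeWithSurrogates(const std::u16string& text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::uint32_t cp = text[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        if (isHigh && i + 1 < text.size()) {
            const std::uint32_t low = text[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Standard UTF-16 decoding of a surrogate pair.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        if (cp <= 0xFF) {
            out += static_cast<char>(cp);    // fits in ISO-8859-1
        } else {
            out += "&#" + std::to_string(cp) + ";";
        }
    }
    return out;
}
```

For instance, U+1F600 is stored as the pair 0xD83D 0xDE00 and comes out as the single reference `&#128512;`.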

About the author

能否归途做我良人
