Browser codepage detection

Posted on 2024-10-01 23:01:51


I have an ASP.NET page where a user can enter some text in a TEXTAREA and submit it to the server. The text is stored in a database and later displayed in a WinForms application.

How can I make sure that the WinForms application displays exactly the characters the user entered in the TEXTAREA?

That is, could I run into problems if, for example, the user enters language-specific letters such as the Danish Æ, Ø and Å?
Those letters have different codes depending on the codepage, so as far as I can see, I need to know which codepage the TEXTAREA control uses for its input. Or am I missing something here?
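To make the concern concrete, here is a small Python sketch (Python is used only for illustration and is not part of the ASP.NET/WinForms setup described above). It encodes the three Danish letters in a few codepages chosen as examples (UTF-8, Windows-1252, the Nordic DOS codepage cp865 and MacRoman) and then shows what happens when UTF-8 bytes are read back as if they were Windows-1252.

```python
# Encode the Danish letters Æ, Ø and Å in several codepages to show that
# the same characters map to different byte values in each one.
TEXT = "ÆØÅ"

for codec in ("utf-8", "cp1252", "cp865", "mac_roman"):
    print(f"{codec:10} {TEXT.encode(codec).hex(' ')}")

# Reading UTF-8 bytes with the wrong codepage produces mojibake.
garbled = TEXT.encode("utf-8").decode("cp1252")
print(garbled)  # Ã†Ã˜Ã…
```

So the bytes alone do not tell you which letters were meant; the receiving side has to know (or detect) the encoding the browser used.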

I have tried to find material on this on the net, but it is difficult to find anything that addresses this issue. Mostly I found pages about which codepage the server should tell the browser to use so that the data it sends is displayed correctly.

But my question goes the other way, i.e. from client to server.


满栀 2024-10-08 23:01:52


You could also use the HEBCI (HTML Entity-Based Codepage Inference) technique if you really want to be sure that users sending text with crappy browsers don't corrupt your data backbone.

In essence this is how it works:

Every codepage has its own fingerprint. For instance, the single entity &ordm; (º) is enough to distinguish between the Big Three: ISO-8859-1/Windows-1252 (=BA), MacRoman (=BC), and UTF-8 (=C2BA).
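The single-entity case can be checked with a short Python sketch. It encodes º (U+00BA) with Python's built-in codecs and maps the resulting bytes back to an encoding name; the function name `guess_big_three` is hypothetical.

```python
# The masculine ordinal indicator (U+00BA, HTML entity &ordm;) has a different
# byte representation in each of the "Big Three" encodings.
ORDM = "\u00ba"

BIG_THREE = {
    ORDM.encode("cp1252"): "ISO-8859-1/Windows-1252",  # b"\xba" in both
    ORDM.encode("mac_roman"): "MacRoman",               # b"\xbc"
    ORDM.encode("utf-8"): "UTF-8",                      # b"\xc2\xba"
}


def guess_big_three(raw: bytes) -> str:
    """Map the raw bytes submitted for &ordm; to an encoding name, or 'unknown'."""
    return BIG_THREE.get(raw, "unknown")


for codec in ("latin-1", "cp1252", "mac_roman", "utf-8"):
    raw = ORDM.encode(codec)
    print(codec, raw.hex().upper(), guess_big_three(raw))
```

One character cannot separate ISO-8859-1 from Windows-1252, which is why the fuller version uses several entities.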

In the form you simply add hidden inputs containing those fingerprint characters as entities (like &deg;, &divide;, and &mdash;). When the user submits the form, you check the returned hex values against your fingerprint table.
Only if this does not give a match should you continue with other fallback solutions.
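The procedure can be sketched in Python as follows, under several assumptions: one hidden field per entity with hypothetical names `fp_deg`, `fp_divide` and so on; only a subset of the answer's table; and an empty table entry read as "don't check this position", which is an interpretation rather than something the answer states. The function works on the raw, still percent-encoded request body, because once a framework has decoded the form with some assumed encoding, the original bytes may be lost.

```python
from __future__ import annotations

import urllib.parse

# Entity names in the order used by the fingerprint table.
FP_ENTS = ["deg", "divide", "mdash", "bdquo", "euro"]

# Subset of the answer's table, most specific codepage first.
# Assumption: "" means "ignore this position" when matching.
FINGERPRINTS = {
    "UTF-8": ["c2b0", "c3b7", "e28094", "e2809e", "e282ac"],
    "WINDOWS-1252": ["b0", "f7", "97", "84", "80"],
    "MAC": ["a1", "d6", "d1", "e3", "db"],
    "ISO-8859-15": ["b0", "f7", "", "", "a4"],
    "ISO-8859-1": ["b0", "f7", "", "", ""],
}


def hidden_inputs() -> str:
    """HTML for the hidden fields (the fp_<entity> names are hypothetical)."""
    return "\n".join(
        f'<input type="hidden" name="fp_{e}" value="&{e};">' for e in FP_ENTS
    )


def raw_fields(body: str) -> dict[str, bytes]:
    """Split an application/x-www-form-urlencoded body into undecoded byte values."""
    fields = {}
    for pair in body.split("&"):
        name, _, value = pair.partition("=")
        # '+' stands for a space; percent-escapes become raw bytes.
        fields[name] = urllib.parse.unquote_to_bytes(value.replace("+", " "))
    return fields


def detect_codepage(body: str) -> str | None:
    """Return the first table entry whose fingerprint matches, else None."""
    fields = raw_fields(body)
    got = [fields.get(f"fp_{e}", b"").hex() for e in FP_ENTS]
    for name, expected in FINGERPRINTS.items():
        if all(exp == "" or exp == g for exp, g in zip(expected, got)):
            return name
    return None  # no match: fall back to other heuristics
```

Because Python dictionaries keep insertion order, UTF-8 and Windows-1252 are checked before the looser ISO-8859 rows, which would otherwise match a Windows-1252 submission through their ignored positions.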

A slightly larger implementation works great with only five codepoints:

```perl
use strict;
use warnings;

# Entities to embed in hidden form fields, in table order:
# &deg; &divide; &mdash; &bdquo; &euro;
my @fp_ents = qw/deg divide mdash bdquo euro/;

# Hex bytes each codepage submits for those entities.
# An empty string marks a character with no usable fingerprint
# in that codepage.
my %fingerprints = (
  "UTF-8"        => ['c2b0','c3b7','e28094','e2809e','e282ac'],
  "WINDOWS-1252" => ['b0','f7','97','84','80'],
  "MAC"          => ['a1','d6','d1','e3','db'],
  "MS-HEBR"      => ['b0','ba','97','84','80'],
  "MAC-CYRILLIC" => ['a1','d6','d1','d7',''],
  "MS-GREEK"     => ['b0','','97','84','80'],
  "MAC-IS"       => ['a1','d6','d0','e3',''],
  "MS-CYRL"      => ['b0','','97','84','88'],
  "MS932"        => ['818b','8180','815c','',''],
  "WINDOWS-31J"  => ['818b','8180','815c','',''],
  "WINDOWS-936"  => ['a1e3','a1c2','a1aa','',''],
  "MS_KANJI"     => ['818b','8180','','',''],
  "ISO-8859-15"  => ['b0','f7','','','a4'],
  "ISO-8859-1"   => ['b0','f7','','',''],
  "CSIBM864"     => ['80','dd','','',''],
);
```
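Several rows of this table can be recomputed from Python's built-in codecs as a sanity check. The sketch below assumes that the table's labels correspond to these Python codec names (in particular MS-HEBR to `cp1255` and MS-CYRL to `cp1251`, which is an assumption), and it writes '' where the codec cannot encode a character.

```python
# Python codec names standing in for some of the table's labels.
# Assumption: MS-HEBR ~ cp1255, MS-CYRL ~ cp1251.
CODECS = {
    "UTF-8": "utf-8",
    "WINDOWS-1252": "cp1252",
    "MAC": "mac_roman",
    "MS-HEBR": "cp1255",
    "MS-CYRL": "cp1251",
    "ISO-8859-15": "iso8859_15",
    "ISO-8859-1": "latin-1",
}

# &deg; &divide; &mdash; &bdquo; &euro;, in the same order as @fp_ents
CHARS = "\u00b0\u00f7\u2014\u201e\u20ac"


def fingerprint(codec: str) -> list:
    """Hex bytes for each character in CHARS, or '' if the codec cannot encode it."""
    row = []
    for ch in CHARS:
        try:
            row.append(ch.encode(codec).hex())
        except UnicodeEncodeError:
            row.append("")
    return row


for label, codec in CODECS.items():
    print(f"{label:13} {fingerprint(codec)}")
```

For these codecs the computed rows agree with the table; the CJK and Mac Cyrillic/Icelandic rows are not checked here.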

如此安好 2024-10-08 23:01:52


You can look at the charset parameter of the request's Content-Type header, if the browser sends one, to find out the encoding.
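A minimal Python sketch of reading that parameter, using the standard library's `email.message.Message` as a header parser. The helper name and the fallback argument are illustrative assumptions; a fallback is needed because many browsers omit the charset parameter on form posts.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type header, or fallback."""
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_charset(fallback)


print(charset_from_content_type("application/x-www-form-urlencoded; charset=UTF-8"))
print(charset_from_content_type("application/x-www-form-urlencoded", "windows-1252"))
```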

For more details see this SO answer to a related question.
