PHP: encoding with Unicode

Posted on 2024-09-08 01:48:32

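How do I get the HTML entity from \u00e4 (which represents &auml; (ä))?

For escaping reasons, I have backslashes in the string. When I strip the slashes, I get something like u00e4. I have to strip the slashes in order to store the string in the session and restore it later.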


兔小萌 2024-09-15 01:48:32

With htmlentities():

<?php

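// "\xE4" is ä in ISO-8859-1. Name the encoding explicitly: since PHP 5.4 the
// default is UTF-8, where a lone \xE4 byte is invalid and yields an empty string.
echo htmlentities("\xE4", ENT_QUOTES, 'ISO-8859-1'); // &auml;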

?>

However, it's worth noting that:

Sessions do not care about character encoding.
HTML entities are not required in HTML Unicode documents (except for chars with a special meaning in HTML such as < and >).

So this won't fix your problem, it will just hide it ;-)
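
To illustrate the second point, here is a minimal sketch, assuming the page is served as UTF-8. The variable names are made up for illustration. htmlspecialchars() escapes only the characters that are special in HTML and leaves ä alone:

```php
<?php
// Assumes this file is saved as UTF-8 and the page declares charset=utf-8.
$text = 'Käse <b>';

// Only &, <, > and quotes are escaped; ä passes through unchanged.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL; // Käse &lt;b&gt;
```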

Update

I had overlooked the reference to \u00e4 in the original question. The ä character corresponds to the U+00E4 Unicode code point. However, PHP string literals had no code point escape before PHP 7.0, which added the "\u{e4}" syntax. On older versions, if you need to type the character in your PHP code and your keyboard does not have that symbol, you can save the document as UTF-8 and then provide the UTF-8 bytes (c3 a4) with the double-quote escape syntax:

<?php
// \[0-7]{1,3} or \x[0-9A-Fa-f]{1,2}
echo "\xc3\xa4";
?>

Still, this has no relation to sessions or HTML. I can't understand what your exact problem is.
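
If the string really does contain a literal \u00e4 sequence (backslash included), one possible approach is to decode it to UTF-8 first and only then ask for an entity. This is a rough sketch, not a definitive implementation: decode_unicode_escapes() is a hypothetical helper, it assumes PHP 7.0+ with the mbstring extension, and it ignores surrogate pairs.

```php
<?php
// Hypothetical helper: turns literal \uXXXX escapes (backslash included)
// into UTF-8 characters. Handles BMP code points only, not surrogate pairs.
function decode_unicode_escapes(string $input): string
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function (array $m): string {
            // The four hex digits are one big-endian UTF-16 code unit.
            return mb_convert_encoding(pack('H*', $m[1]), 'UTF-8', 'UTF-16BE');
        },
        $input
    );
}

$raw    = 'M\u00e4dchen';                 // single quotes keep the backslash literal
$utf8   = decode_unicode_escapes($raw);   // "Mädchen" as UTF-8 bytes
$entity = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

echo $entity, PHP_EOL; // M&auml;dchen
```

This only works while the backslash is still present, so it has to run before any slash stripping.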

Second update

So serialize() cannot handle associative arrays and json_decode() cannot be fed with json_encode()'s output...

<?php

$associative_array = array(
    'foo' => 'ä',
    'bar' => 33,
    'gee' => array(10, 20, 30),
);

var_dump($associative_array);
echo PHP_EOL;
var_dump(serialize($associative_array));
echo PHP_EOL;
var_dump(unserialize(serialize($associative_array)));
echo PHP_EOL;

var_dump(json_encode($associative_array));
echo PHP_EOL;
var_dump(json_decode(json_encode($associative_array)));
echo PHP_EOL;

?>

...

array(3) {
  ["foo"]=>
  string(2) "ä"
  ["bar"]=>
  int(33)
  ["gee"]=>
  array(3) {
    [0]=>
    int(10)
    [1]=>
    int(20)
    [2]=>
    int(30)
  }
}

string(83) "a:3:{s:3:"foo";s:2:"ä";s:3:"bar";i:33;s:3:"gee";a:3:{i:0;i:10;i:1;i:20;i:2;i:30;}}"

array(3) {
  ["foo"]=>
  string(2) "ä"
  ["bar"]=>
  int(33)
  ["gee"]=>
  array(3) {
    [0]=>
    int(10)
    [1]=>
    int(20)
    [2]=>
    int(30)
  }
}

string(42) "{"foo":"\u00e4","bar":33,"gee":[10,20,30]}"

object(stdClass)#1 (3) {
  ["foo"]=>
  string(2) "ä"
  ["bar"]=>
  int(33)
  ["gee"]=>
  array(3) {
    [0]=>
    int(10)
    [1]=>
    int(20)
    [2]=>
    int(30)
  }
}
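
The output above shows json_encode() writing ä as \u00e4 and json_decode() handing back a stdClass object. If either is unwanted, both can be adjusted. A small sketch, assuming PHP 5.4+ and a UTF-8 source file, with a shortened version of the array:

```php
<?php
// Assumes the source file is saved as UTF-8.
$data = array('foo' => 'ä', 'bar' => 33);

// JSON_UNESCAPED_UNICODE (PHP 5.4+) keeps ä literal instead of \u00e4.
$json = json_encode($data, JSON_UNESCAPED_UNICODE);

// Passing true as the second argument returns arrays rather than stdClass.
$back = json_decode($json, true);

var_dump($json);           // string(21) "{"foo":"ä","bar":33}"
var_dump($back === $data); // bool(true)
```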

It appears to me that you are adding several layers of complexity to a simple script because you are making assumptions about how some PHP functions work instead of checking the manual or testing yourself. At this point, the information provided hardly resembles the original question and we still haven't seen a single line of code.

My advice so far is that you stop trying to debug your app as a whole, divide it into smaller pieces and use var_dump() to find out what each of these parts actually generates. Don't assume things: test stuff yourself. Also, take into account that PHP doesn't support Unicode natively as other languages do. Every single task that involves multi-byte string handling must be carefully implemented with the appropriate multi-byte functions, which often requires hard-coding the character encoding.
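
As a quick illustration of that last point, here is a minimal sketch comparing a byte-oriented function with its multi-byte counterparts. It assumes a UTF-8 source file and the mbstring extension; the variable names are only for illustration.

```php
<?php
// Assumes the source file is saved as UTF-8 and mbstring is loaded.
$word = 'Mädchen';

$bytes = strlen($word);                  // counts bytes: ä takes two in UTF-8
$chars = mb_strlen($word, 'UTF-8');      // counts characters
$upper = mb_strtoupper($word, 'UTF-8');  // encoding passed explicitly

var_dump($bytes); // int(8)
var_dump($chars); // int(7)
var_dump($upper); // string(8) "MÄDCHEN"
```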


只是一片海 2024-09-15 01:48:32

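Do you mean you are having trouble when you load it back?
Are you outputting it to an HTML page? In that case, you may have set the wrong charset.
As for using entities, take a look at:
htmlentities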


酸甜透明夹心 2024-09-15 01:48:32

I'm not sure this will help you, but take a look at WordPress's sanitize_title function, where you can find some huge character tables.


去了角落 2024-09-15 01:48:32

As you can see in the discussions and answers, this is a problem that PHP can't handle natively (or at least nobody here knows how so far).

I suggest using this very heavy function... I mean, this is my solution so far, and I don't like it very much.

function parse_umlaut($string){

        // Map each escape that lost its backslash (e.g. "u00e4") to its character.
        // Caveat: ordinary text that happens to contain one of these tokens is rewritten too.
        $map = array(
            'u00c4' => 'Ä',
            'u00e4' => 'ä',
            'u00d6' => 'Ö',
            'u00f6' => 'ö',
            'u00dc' => 'Ü',
            'u00fc' => 'ü',
            'u00df' => 'ß',
        );

        return strtr($string, $map);
 }
