PHP character-encoding domdocument

PHP DOMDocument, Unicode problem

Posted on 2025-01-08 10:25:26

I have a problem with the following code:

$source = "<html><body><h1>&#8220;</h1></body></html>";
$dom = new DOMDocument();
$dom->loadHTML($source);
echo $dom->saveHTML();

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><h1>“</h1></body></html>

OK, this works correctly. But if I want to extract a single node like this:

$source = "<html><body><h1>&#8220;</h1></body></html>";
$dom = new DOMDocument();
$dom->loadHTML($source);
$h1 = $dom->getElementsByTagName('h1');
echo $dom->saveHTML($h1->item(0));

It outputs garbled text:

<h1>â€œ</h1>

Does anyone know how to solve this?

Comments (2)

肩上的翅膀 2025-01-15 10:25:26

Your code example works for me; the output is <h1>“</h1>.

“    &#8220;    Left double quotation mark

The UTF-8 byte sequence of “ is:

0xE2 (226) 0x80 (128) 0x9C (156)
 |          |           `------ Windows-1252: œ
 |          `--- most Windows 125x encodings: €
 `--- ISO 8859-1, 2, 3, 4, 9, 10, 14, 15, 16: â
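
The byte breakdown above can be reproduced in PHP. This is a minimal sketch, assuming PHP 7.0+ (for the `\u{...}` escape) and that the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php
// U+201C LEFT DOUBLE QUOTATION MARK, written with a Unicode escape (PHP 7.0+).
$quote = "\u{201C}";

// Its UTF-8 encoding is the three bytes E2 80 9C.
$hex = bin2hex($quote);

// Treat those same bytes as Windows-1252 and convert the result to UTF-8.
// This mimics a viewer that decodes UTF-8 output with the wrong charset.
// Assumption: the mbstring extension is available.
$misread = mb_convert_encoding($quote, 'UTF-8', 'Windows-1252');

echo $hex, "\n";      // e2809c
echo $misread, "\n";  // the three characters â, €, œ
```

The three characters printed last match the garbled `<h1>` content in the question, which suggests the bytes are correct and only their interpretation is wrong.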

So where are you viewing that output?

Probably in a browser on Windows? If so, have you tried adding

header('Content-Type: text/html; charset=utf-8');

at the top of your script?

See also: Setting the HTTP charset parameter and Checking HTTP Headers.
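
The header suggestion can be combined with the code from the question roughly as follows. This is a sketch under stated assumptions, not a definitive fix: the helper name `renderFirstH1` is hypothetical, the DOM extension is assumed to be enabled, and nothing may have been sent to the client before `header()` runs.

```php
<?php
// Hypothetical helper: load the markup and serialize only the first <h1>.
function renderFirstH1(string $source): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($source);
    $h1 = $dom->getElementsByTagName('h1')->item(0);

    // saveHTML() returns false on failure; normalize that to an empty string.
    return $h1 === null ? '' : (string) $dom->saveHTML($h1);
}

// Declare the charset before any output so the browser decodes the bytes as UTF-8.
// headers_sent() guards against the "headers already sent" warning.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderFirstH1('<html><body><h1>&#8220;</h1></body></html>');
```

Whether the quotation mark is emitted as raw UTF-8 bytes or as an entity may depend on the PHP and libxml2 versions in use; with the charset declared, either form should render as a single “ in the browser.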


千仐 2025-01-15 10:25:26

You need the second parameter of the DOMDocument constructor (see http://nl.php.net/manual/en/domdocument.construct.php):

$dom = new DOMDocument('1.0', 'utf-8');
