PHP SimpleXML returns values with weird characters in place of hyphens and apostrophes

Posted 2024-10-15 03:28:54

I have looked around and can't seem to find a solution so here it is.

I have the following code:

$file = "adhddrugs.xml";
$xmlstr = simplexml_load_file($file);
echo $xmlstr->report_description;

This is the simple version, but even with this, any hyphens or apostrophes are turned into: â€™ (an "a" with a circumflex, a euro sign, a trademark sign).

Things I have tried are:

echo (string)$xmlstr->report_description; /* did not work */
echo addslashes($xmlstr->report_description); /* yes, I know this doesn't work with hyphens; I was mainly trying to see if I could escape the apostrophes */
echo addslashes((string)$xmlstr->report_description); /* did not work */

I also tried htmlspecialchars (again, I know it does not work with hyphens), htmlentities, and a few other tricks.

The situation is that I am getting the XML files from a feed, so I cannot change them, but they are pretty standard. The text with the hyphens etc. is wrapped in a CDATA section and the encoding is UTF-8. If I check the source, I see the hyphens and apostrophes there.

Just to see whether the encoding was off, mislabeled, or something else weird, I tried viewing the raw XML file, and sure enough it is displayed correctly.

I am sure that in my rush to find the answer I have overlooked something simple, and since this is really the first time I have ever used SimpleXML, I am probably missing a very simple solution. Just don't dock me for it; I really did try to find the answer on my own.

Thanks again.

花之痕靓丽 2024-10-22 03:28:54

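> This is the simple version, but even trying this any hyphens or apostrophes are turned into: â€™ (an "a" with a circumflex, a euro sign, a trademark sign).

This is caused by an incorrect character-set guess (and possibly re-encoding).

If the text contains a "curly apostrophe" = "right single quotation mark" = U+2019 character, saving it in UTF-8 encoding produces the bytes 0xE2 0x80 0x99. If the same file is then read back on the assumption that its charset is windows-1252, the byte stream of the apostrophe (0xE2 0x80 0x99) is interpreted as the characters â€™ (= small "a" with circumflex, euro sign, trademark sign). If this misinterpreted text is in turn saved as UTF-8, the original character becomes the byte stream 0xC3 0xA2 0xE2 0x82 0xAC 0xE2 0x84 0xA2.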
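The byte-level explanation above can be reproduced in a few lines. This is only a minimal sketch, assuming PHP 7 or later and the mbstring extension; the variable names are made up for illustration.

```php
<?php
// Assumption: the mbstring extension is loaded.

// U+2019 (right single quotation mark); as UTF-8 it is the bytes E2 80 99.
$original = "\u{2019}";

// Simulate the bug: treat those UTF-8 bytes as if they were windows-1252
// and convert them to UTF-8 a second time.
$mojibake = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Undo the damage by converting in the opposite direction.
$repaired = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');

echo bin2hex($original), "\n"; // e28099
echo bin2hex($mojibake), "\n"; // c3a2e282ace284a2
echo bin2hex($repaired), "\n"; // e28099
```

The reverse conversion only recovers the text when every garbled character maps back to a windows-1252 byte, so it is better treated as a diagnostic than as a permanent fix; declaring the right charset at the point of output avoids the problem in the first place.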

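Summary: your original data is UTF-8, and some part of your code reads it assuming it is windows-1252 (or ISO-8859-1, which in practice is usually treated as windows-1252). One possible reason for this charset assumption is that the default charset for HTTP is ISO-8859-1: "When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP." Source: RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1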

PS. This is a very common problem. Just run a Google or Bing search with the query doesnâ€™t -doesn't and you will see many pages with this same encoding error.

轻拂→两袖风尘 2024-10-22 03:28:54

Do you know the document's character set?

If you are not already doing so, you could call header('Content-Type: text/html; charset=utf-8'); before printing anything.
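Applied to the code in the question, that suggestion might look roughly like the sketch below. It assumes the SimpleXML extension is enabled; `describe_report()` and the inline `$sample` document are hypothetical stand-ins for the real feed file, which would be loaded with simplexml_load_file() instead.

```php
<?php
// Hypothetical helper: parse the XML and return the description,
// escaped for HTML output and still encoded as UTF-8.
function describe_report(string $xmlSource): string
{
    $xml = simplexml_load_string($xmlSource);
    if ($xml === false) {
        throw new RuntimeException('Could not parse the XML');
    }
    return htmlspecialchars((string) $xml->report_description, ENT_QUOTES, 'UTF-8');
}

// Made-up sample standing in for adhddrugs.xml; it contains U+2019 inside CDATA.
$sample = "<report><report_description><![CDATA[It doesn\u{2019}t change]]>"
    . "</report_description></report>";

// Declare the charset before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$html = describe_report($sample);
echo $html, "\n";
```

SimpleXML hands back UTF-8 strings regardless of the input document's encoding, so the remaining job is to tell the browser that the response is UTF-8 rather than to transform the string itself.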

唠甜嗑 2024-10-22 03:28:54

Make sure you have set up SimpleXML to use UTF-8 too.

Be sure that all the entities are encoded using hex notation, not HTML entities.

Also maybe:

$string = html_entity_decode($string, ENT_QUOTES, "utf-8");

will help.
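To see what that call does and does not change, here is a small sketch (assuming PHP 7 or later; the sample strings are invented). It decodes named and numeric entities into UTF-8, but it leaves text that has already been garbled by a charset mix-up untouched, so it would only help if the feed actually contained entities.

```php
<?php
// A named entity and a hexadecimal numeric entity both decode to U+2019.
$fromNamed   = html_entity_decode('doesn&rsquo;t', ENT_QUOTES, 'UTF-8');
$fromNumeric = html_entity_decode('doesn&#x2019;t', ENT_QUOTES, 'UTF-8');

// Already-garbled text (a-circumflex, euro sign, trademark sign) contains
// no entities, so the function returns it unchanged.
$garbled      = "doesn\u{00E2}\u{20AC}\u{2122}t";
$stillGarbled = html_entity_decode($garbled, ENT_QUOTES, 'UTF-8');

var_dump($fromNamed === "doesn\u{2019}t");   // bool(true)
var_dump($fromNumeric === "doesn\u{2019}t"); // bool(true)
var_dump($stillGarbled === $garbled);        // bool(true)
```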

耶耶耶 2024-10-22 03:28:54

This is a symptom of declaring an incorrect character set in the <head> section of your page (or of not declaring one at all and falling back on a default character set that lacks accents and special characters).

This does the trick for Latin-script languages.

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

For TOTAL NEWBIES: HTML pages for browsers have a basic layout, with a HEAD section that tells the browser some basic things about the page and can preload scripts the page will use for its functionality.

<html>
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 </head>
 <body>
  Hello world
 </body>
</html>

If the <head> section is omitted, the browser will use defaults (it takes some things for granted, such as a North American character set that does NOT include many accented letters, which then show up as "weird characters").
