PHP: Writing non-English characters to XML - encoding problem
I wrote a small PHP script to edit the site news XML file.
I used DOM to manipulate the XML (Loading, writing, editing).
It works fine when writing English characters, but when non-English characters are written, PHP throws an error when trying to load the file.
If I manually type non-English characters into the file - it's loaded perfectly fine, but if PHP writes the non-English characters the encoding goes wrong, although I specified the utf-8 encoding.
Any help is appreciated.
Update: with the helpful answers, it is solved (read below).
Errors:
Warning: DOMDocument::load() [domdocument.load]: Entity 'times' not defined in filepath
Warning: DOMDocument::load() [domdocument.load]: Input is not proper UTF-8, indicate encoding ! Bytes: 0x91 0x26 0x74 0x69 in filepath
Here are the functions responsible for loading and saving the file (self-explanatory):
function get_tags_from_xml(){
    // Load news entries from XML file for display
    $errors = Array();
    if(!$xml_file = load_news_file()){ // Load file
        // String indicates error presence
        $errors = "file not found";
        return $errors;
    }
    $taglist = $xml_file->getElementsByTagName("text");
    return $taglist;
}

function set_news_lang(){
    // Sets the news language
    global $news_lang;
    if($_POST["news-lang"]){
        $news_lang = htmlentities($_POST["news-lang"]);
    }
    elseif($_GET["news-lang"]){
        $news_lang = htmlentities($_GET["news-lang"]);
    }
    else{
        $news_lang = "he";
    }
}

function load_news_file(){
    // Load XML news file for processing, depending on language
    global $news_lang;
    $doc = new DOMDocument('1.0','utf-8'); // Create new XML document
    $doc->load("news_{$news_lang}.xml"); // Load news file by language
    $doc->formatOutput = true; // Nicely format the file
    return $doc;
}

function save_news_file($doc){
    // Save XML news file, depending on language
    global $news_lang;
    $doc->saveXML($doc->documentElement);
    $doc->save("news_{$news_lang}.xml");
}
Here is the code for writing to XML (add news):
<?php ob_start()?>
<?php include("include/xml_functions.php")?>
<?php include("../include/functions.php")?>
<?php get_lang();?>
<?php
//TODO: ADD USER AUTHENTICATION!
if(isset($_POST["news"]) && isset($_POST["news-lang"])){
    set_news_lang();
    $news = htmlentities($_POST["news"]);
    $xml_doc = load_news_file();
    $news_list = $xml_doc->getElementsByTagName("text"); // Get all existing news from file
    $doc_root_element = $xml_doc->getElementsByTagName("news")->item(0); // Get the root element of the new XML document
    $new_news_entry = $xml_doc->createElement("text",$news); // Create the submitted news entry
    $doc_root_element->appendChild($new_news_entry); // Append submitted news entry
    $xml_doc->appendChild($doc_root_element);
    save_news_file($xml_doc);
    header("Location: /cpanel/index.php?lang={$lang}&news-lang={$news_lang}");
}
else{
    header("Location: /cpanel/index.php?lang={$lang}&news-lang={$news_lang}");
}
?>
<?php ob_end_flush()?>
Update: with the helpful answers you provided, it's solved. The value submitted by the form is non-English and contains some HTML entities. I used htmlentities() on the POST data, which made the non-English string unreadable. I replaced htmlentities() with htmlspecialchars(), and it works like magic.
Conclusion: htmlentities() can ruin non-English strings.
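The byte sequence in the error message (0x91 0x26 0x74 0x69, that is 0x91 followed by "&ti") can be reproduced with a short sketch. It assumes the input was UTF-8 Hebrew text that htmlentities() read as ISO-8859-1, which was its default charset before PHP 5.4; the sample string and the helper loads_as_xml() are made up for illustration and are not part of the original script.

```php
<?php
// The Hebrew letter bet encoded as UTF-8 is the two bytes 0xD7 0x91.
$news = "\xD7\x91";

// Read as ISO-8859-1, byte 0xD7 is the multiplication sign, so htmlentities()
// emits the HTML-only entity &times; and leaves a stray 0x91 byte behind.
$broken = htmlentities($news, ENT_QUOTES, 'ISO-8859-1');

// htmlspecialchars() only escapes &, <, >, " and ', so the UTF-8 bytes survive.
$safe = htmlspecialchars($news, ENT_QUOTES, 'UTF-8');

// Hypothetical helper: can the value be embedded in a loadable XML document?
function loads_as_xml($value) {
    libxml_use_internal_errors(true); // collect parser errors instead of warnings
    $doc = new DOMDocument('1.0', 'utf-8');
    $ok = $doc->loadXML('<news><text>' . $value . '</text></news>');
    libxml_clear_errors();
    return $ok;
}

var_dump(bin2hex($broken));      // hex dump of the damaged string
var_dump(loads_as_xml($broken)); // the parser rejects it
var_dump(loads_as_xml($safe));   // the escaped UTF-8 text loads
```

On PHP 5.4 and later, htmlentities() defaults to UTF-8, so the stray byte would not appear, but named entities such as &times; would still be undefined in XML.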
Comments (2)
Character encoding is always a hassle. Make sure the page containing your form, the XML you load into $dom, and the PHP file itself are all UTF-8 encoded, or convert accordingly. Otherwise not all of your strings will be UTF-8, and handling them as UTF-8 won't work.
Try this: echo your original news XML onto an empty page. Then switch the page encoding in the browser to see which one displays the characters correctly. Repeat this for $news after retrieving the input from the form. This usually provides a clue about where the encoding goes wrong.
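The same check can be done in code rather than by eye. The following is only a sketch, assuming the mbstring extension is enabled; the helper name describe_encoding() and the sample strings are invented for illustration.

```php
<?php
// Hypothetical helper: report whether a string is valid UTF-8 and show its raw bytes.
function describe_encoding($label, $value) {
    $valid = mb_check_encoding($value, 'UTF-8');
    return sprintf(
        "%s: %s, bytes=%s",
        $label,
        $valid ? 'valid UTF-8' : 'NOT valid UTF-8',
        bin2hex($value)
    );
}

// A Hebrew letter as proper UTF-8, and the kind of damaged string from the question.
echo describe_encoding('clean input', "\xD7\x91"), "\n";
echo describe_encoding('damaged input', "&times;\x91"), "\n";
```

In the script from the question, it could be called on $_POST["news"] right after the form is submitted and on the contents of the news file before loading it, to see at which step the bytes stop being valid UTF-8.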
It's hard to diagnose the exact issue without pulling the app apart a bit more, but this is a good clue: "Entity 'times' not defined".
XML doesn't generally like HTML entities like &times;. The only entities guaranteed to work are &lt;, &gt;, &amp; and &quot; (XML also predefines &apos;).
Use numeric entities instead. So for &times;, use &#215; and so on.
Here's a quick and dirty trick you can add after your call to htmlentities(): swap each named entity for its numeric equivalent. You can do fancier things with preg_replace and array_map, but the named-to-numeric mapping is the data you'll need.
Alternatively, if performance is an issue for you, you can do some fancy multi-byte-character detection and bypass the named entity step altogether. There are plenty of examples on the PHP website.
Strictly speaking, if you've marked your XML document as UTF-8 encoded, you can leave the entity encoding out completely and just encode the four main guys: &lt;, &gt;, &amp; and &quot;.
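The named-to-numeric conversion described above could be sketched as follows. This is not the snippet from the original answer. The function name named_to_numeric_entities() is hypothetical, and the sketch assumes PHP 7.2 or later with the mbstring extension (for mb_ord()).

```php
<?php
// Hypothetical helper: rewrite HTML-only named entities (e.g. &times;) as numeric
// character references, leaving the five entities that XML predefines untouched.
function named_to_numeric_entities($text) {
    $xml_entities = array('&lt;', '&gt;', '&amp;', '&quot;', '&apos;');
    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function ($m) use ($xml_entities) {
            if (in_array($m[0], $xml_entities, true)) {
                return $m[0]; // already valid in XML
            }
            // Decode against the HTML 4.01 table, where each entity is one character.
            $char = html_entity_decode($m[0], ENT_QUOTES, 'UTF-8');
            if ($char === $m[0]) {
                return $m[0]; // unknown name: left as-is for the caller to handle
            }
            return '&#' . mb_ord($char, 'UTF-8') . ';';
        },
        $text
    );
}

echo named_to_numeric_entities('3 &times; 4 &lt; 13 &amp; &copy;'), "\n";
// prints: 3 &#215; 4 &lt; 13 &amp; &#169;
```

If the document is saved as UTF-8 anyway, skipping htmlentities() and calling htmlspecialchars() with an explicit 'UTF-8' charset avoids the need for this step.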