Removing hexadecimal characters from XML files using PHP
To start, I have an array of XML files. I need to iterate through these files, check them for certain 'unrecognized' hexadecimal characters, and replace those with normal UTF-8 text or some kind of placeholder.
I've tried iterating through the files and replacing the hex codes using both str_replace and preg_replace, with no luck. My ultimate problem is that I receive errors about 'non-UTF characters' when I try to open these files with SimpleXML.
Here's what I have so far:
class HexadecimalConverter {
    public $filenames = array();
    public function __construct($filenames) {
        $this->filenames = $filenames;
        $this->removeHex();
    }
    public function removeHex() {
        foreach ($this->filenames as $key => $value) {
            $contents = file_get_contents($value);
            $contents = preg_replace("/\x96/", '–', $contents);
            $contents = preg_replace("/\x97/", '—', $contents);
            $contents = preg_replace("/\x85/", "...", $contents);
            $contents = preg_replace("/\xBA/", "", $contents);
            file_put_contents($value, $contents);
        }
    }
}
Here is the error I'm trying to fix: Warning: simplexml_load_file() [function.simplexml-load-file]: ./04R_P455_S1157.xml:5: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0x97 0x0D 0x0A 0x69 in C:\xampp\htdocs\hint_updater\libraries\hint_updater_classes.php on line 130
Still no luck. I've tried everything suggested in this thread, but preg_replace doesn't appear to replace every instance of the hex codes.
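To see which bytes are still tripping the parser after the replacements, it may help to list every byte in a file that is not part of a well-formed UTF-8 sequence. The sketch below is only an illustration: `findInvalidUtf8Bytes` is a hypothetical helper, not something from the original post, and it relies on `preg_match()` with the `/u` modifier returning false for malformed UTF-8.

```php
<?php
// Hypothetical helper (not part of the original post): returns a map of
// byte offset => hex value for every byte that does not belong to a
// well-formed UTF-8 sequence.
function findInvalidUtf8Bytes(string $data): array
{
    $bad = [];
    $len = strlen($data);
    $i = 0;
    while ($i < $len) {
        $byte = ord($data[$i]);
        // Expected sequence length, derived from the lead byte.
        if ($byte < 0x80) {
            $n = 1;
        } elseif ($byte >= 0xC2 && $byte <= 0xDF) {
            $n = 2;
        } elseif ($byte >= 0xE0 && $byte <= 0xEF) {
            $n = 3;
        } elseif ($byte >= 0xF0 && $byte <= 0xF4) {
            $n = 4;
        } else {
            $n = 0; // stray continuation byte or illegal lead byte
        }
        // preg_match() with the /u modifier returns false for malformed UTF-8.
        if ($n > 0 && $i + $n <= $len && preg_match('//u', substr($data, $i, $n)) === 1) {
            $i += $n;
        } else {
            $bad[$i] = sprintf('0x%02X', $byte);
            $i++;
        }
    }
    return $bad;
}
```

Calling `print_r(findInvalidUtf8Bytes(file_get_contents($value)));` inside the loop, after the replacements, would show whether any byte such as 0x97 is still present in the data about to be written back.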
Comments (2)
You should first read the preg_replace docs. They clearly state that the function returns the modified string, so you will have to change every preg_replace line in your code to
$contents = preg_replace(...);
to make your replacements work. Right now you're doing the replacement but throwing the resulting string away, so in the end you write the original string back to the file.

preg_replace returns the new string. Try
$contents = preg_replace("/\x96/", '–', $contents);
and the like.
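If assigning the return value is already in place and the error persists, one thing that may be worth checking is the replacement literals themselves. A `'–'` typed into a PHP file that is saved in a Windows code page is stored as the single byte 0x96, so the replacement would write the same invalid byte back. The sketch below sidesteps that by spelling the UTF-8 replacements as byte escapes. It assumes the stray bytes are Windows-1252 characters (0x96 en dash, 0x97 em dash, 0x85 ellipsis), which the post does not confirm. `CP1252_TO_UTF8` and `replaceCp1252Bytes` are hypothetical names, and the choices for 0x85 and 0xBA simply mirror the question's code.

```php
<?php
// Assumed mapping (not from the original post): single Windows-1252 bytes
// mapped to UTF-8 replacements, written as byte escapes so the result does
// not depend on the encoding this PHP file is saved in.
const CP1252_TO_UTF8 = [
    "\x96" => "\xE2\x80\x93", // U+2013 en dash
    "\x97" => "\xE2\x80\x94", // U+2014 em dash
    "\x85" => "...",          // plain ASCII placeholder, as in the question
    "\xBA" => "",             // dropped, as in the question
];

// Hypothetical helper: returns $data with the mapped bytes replaced.
function replaceCp1252Bytes(string $data): string
{
    // Data that is already valid UTF-8 is left alone, because bytes such as
    // 0x96 also occur inside legitimate multi-byte sequences.
    if (preg_match('//u', $data) === 1) {
        return $data;
    }
    return strtr($data, CP1252_TO_UTF8);
}
```

In `removeHex()` this would be used as `$contents = replaceCp1252Bytes($contents);` before `file_put_contents()`. A file that mixes valid multi-byte UTF-8 with stray bytes would still need more careful handling than this. If the files are Windows-1252 throughout, converting them in one step with `iconv()` or `mb_convert_encoding()` is probably simpler than mapping individual bytes.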