Removing hexadecimal characters from XML files using PHP

Posted on 2024-09-14 02:02:02

To start, I have an array of XML files. I need to iterate through these files, check them for certain 'unrecognized' hexadecimal characters, and replace those characters with normal UTF-8 text or some kind of placeholder.

I've tried iterating through the files and replacing the hex codes using both str_replace and preg_replace, with no luck. My ultimate problem is that I'm receiving errors about 'non-utf characters' when trying to open these files with simpleXML.

Here's what I have so far:

class HexadecimalConverter {

    public $filenames = array();

    public function __construct($filenames) {

        $this->filenames = $filenames;
        $this->removeHex();

    }

    public function removeHex() {

        foreach ($this->filenames as $key => $value) {

            $contents = file_get_contents($value);

            $contents = preg_replace("/\x96/", '–', $contents);
            $contents = preg_replace("/\x97/", '—', $contents);
            $contents = preg_replace("/\x85/", "...", $contents);
            $contents = preg_replace("/\xBA/", "", $contents);

            file_put_contents($value, $contents);

        }

    }

}

Here is the error I'm trying to fix: Warning: simplexml_load_file() [function.simplexml-load-file]: ./04R_P455_S1157.xml:5: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0x97 0x0D 0x0A 0x69 in C:\xampp\htdocs\hint_updater\libraries\hint_updater_classes.php on line 130

Still no luck. I've tried everything suggested in this thread, but preg_replace doesn't appear to be replacing all instances of the hex codes.
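
Before replacing anything, it can help to see exactly which lines and bytes the parser is rejecting. The following is a minimal diagnostic sketch, not part of the original post; the function name findInvalidUtf8Lines is hypothetical. It relies only on the PCRE /u modifier, which makes preg_match() return false when the subject is not valid UTF-8.

```php
<?php
// Hypothetical diagnostic helper, not part of the original post.
// Returns [line number => list of non-ASCII bytes as hex strings]
// for every line that is not valid UTF-8.
function findInvalidUtf8Lines(string $contents): array
{
    $bad = [];
    foreach (explode("\n", $contents) as $index => $line) {
        // With the /u modifier PCRE validates the subject first;
        // preg_match() returns false when it is not valid UTF-8.
        if (preg_match('//u', $line) === 1) {
            continue;
        }
        // Collect every byte >= 0x80 on the offending line.
        preg_match_all('/[\x80-\xFF]/', $line, $matches);
        $bad[$index + 1] = array_map('bin2hex', $matches[0]);
    }
    return $bad;
}
```

Calling print_r(findInvalidUtf8Lines(file_get_contents($value))) inside the loop would list the line numbers alongside the stray bytes, which can then be compared with the "Bytes: 0x97 ..." part of the warning.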

Comments (2)

2024-09-21 02:02:02

You should first read the preg_replace docs. They clearly state that the function returns the modified string, so you will have to change every preg_replace line in your code to $contents = preg_replace(...); to make your replacements work. Right now you're doing the replacement but throwing the resulting string away, so in the end you write the original string back to the file.
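
A small sketch of the difference, using a made-up sample string rather than the asker's files: the first call discards the return value and leaves the variable untouched, while the second assigns the result back.

```php
<?php
// Made-up sample input containing the stray byte 0x96.
$contents = "a\x96b";

// Return value discarded: $contents is NOT modified.
preg_replace('/\x96/', '-', $contents);
$afterDiscard = $contents;

// Return value assigned back: $contents now holds the replaced text.
$contents = preg_replace('/\x96/', '-', $contents);
```

Here $afterDiscard still contains the original byte, and only the assigned call changes $contents.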

长安忆 2024-09-21 02:02:02

preg_replace returns the new string.

Try $contents = preg_replace("/\x96/", '–', $contents); and the like.
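
If the replacements still seem to have no effect after assigning the result, one possible cause (an assumption, not confirmed anywhere in this thread) is that the PHP script itself is saved in Windows-1252, in which case a literal dash in the source is the same single byte that is being replaced. The sketch below avoids that by writing both sides of each replacement as byte escapes. It assumes the files are single-byte Windows-1252 throughout, where 0x85, 0x96, 0x97 and 0xBA are the ellipsis, en dash, em dash and masculine ordinal indicator; the names CP1252_TO_UTF8 and fixStrayBytes are hypothetical.

```php
<?php
// Assumption: the stray bytes are Windows-1252 characters.
// Keys and values are byte escapes, so the result does not depend on
// the encoding the PHP source file is saved in.
const CP1252_TO_UTF8 = [
    "\x85" => "\xE2\x80\xA6", // U+2026 horizontal ellipsis
    "\x96" => "\xE2\x80\x93", // U+2013 en dash
    "\x97" => "\xE2\x80\x94", // U+2014 em dash
    "\xBA" => "\xC2\xBA",     // U+00BA; use "" here to drop it, as the post does
];

// Hypothetical helper: rewrites the mapped bytes only when the input
// is not already valid UTF-8.
function fixStrayBytes(string $contents): string
{
    // Valid UTF-8 is returned untouched so that existing multi-byte
    // sequences (which may contain bytes like 0x96) are not damaged.
    if (preg_match('//u', $contents) === 1) {
        return $contents;
    }
    return strtr($contents, CP1252_TO_UTF8);
}
```

In the loop from the question this would be used as file_put_contents($value, fixStrayBytes(file_get_contents($value))). A file that mixes valid UTF-8 sequences with stray bytes would not be handled correctly by this sketch, and bytes outside the map are left as they are.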
