ARC2 (PHP Semantic Web library) wrongly double-converts UTF-8 files to UTF-8

Posted on 2025-01-07 09:03:34

Using ARC2, textual data gets corrupted.

My RDF input file is in UTF-8. It gets loaded into ARC2, which uses a MySQL backend, through a LOAD <path/to/file.rdf> query. The MySQL database is in UTF-8 too, as a check with phpMyAdmin confirms.

However, the textual data gets corrupted. After several conversion checks, the problem seems to be that the original UTF-8 file is believed to be in ISO-8859-1, and converted to UTF-8 once again.

Example: "surmonté" → "surmonteÌ".

This "surmonteÌ" is actually stored as UTF-8 in the database.
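
The corruption can be reproduced outside ARC2 with a minimal sketch. It assumes the mbstring extension is available and that the é in the input file is stored in decomposed form (the letter e followed by U+0301), which is what the "eÌ" in the corrupted output suggests; the variable names are only illustrative.

```php
<?php
// "surmonté" with a decomposed é: 'e' followed by U+0301 (combining acute accent).
// Assumption: the RDF file stores the accent in this decomposed form.
$original = "surmonte\u{0301}";

// Misread the UTF-8 bytes as ISO-8859-1 and convert them to UTF-8 a second time.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// The two bytes CC 81 of the accent have become two separate characters:
// U+00CC (Ì) and U+0081 (an invisible control character).
echo bin2hex($original), "\n";
echo bin2hex($double), "\n";

// Applying the opposite conversion undoes the damage.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original);
```

Comparing the two hex dumps shows whether a stored value went through one conversion too many.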

Is this related to the way ARC2 opens files (digging through the code, not exhaustively but quite deep, did not show anything suspicious), or could this be a more general case with PHP and MySQL?

How can I make sure the imported data is not wrongly re-encoded but taken as the original?

Comments (1)

暮凉 2025-01-14 09:03:34

ARC2 uses two functions: $store->setUp(), which CREATEs the TABLEs and the DATABASE if need be; and query('LOAD …'), as detailed in the question.

It turns out, the setUp() part must not be called in the same script as the load part. At least, not during the same execution. The solution I took was to make two separate scripts, one to init the database, another to load the data, but simply commenting out the init part once it is done also works. In any case, the trick is to make sure the loading won't take place right after the initialization.
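
The separation described above can be sketched as a single script that performs only one step per execution. This is an illustration, not ARC2's recommended setup: runStep() is a hypothetical helper, the location of ARC2.php and all configuration values are placeholders, and the path to the RDF file is the one from the question.

```php
<?php
// Hypothetical helper: performs exactly one step per execution, so that
// setUp() and LOAD never happen during the same run.
function runStep($store, string $step, string $rdfPath): void
{
    switch ($step) {
        case 'init':
            // Create the tables only if they do not exist yet.
            if (!$store->isSetUp()) {
                $store->setUp();
            }
            break;
        case 'load':
            $store->query('LOAD <file://' . $rdfPath . '>');
            break;
        default:
            throw new InvalidArgumentException("Unknown step: $step");
    }
}

$arc2 = __DIR__ . '/arc/ARC2.php'; // assumed location of the ARC2 library
if (is_file($arc2) && isset($argv[1])) {
    include_once $arc2;
    $store = ARC2::getStore([
        'db_host'    => 'localhost', // placeholder values
        'db_name'    => 'my_db',
        'db_user'    => 'user',
        'db_pwd'     => 'secret',
        'store_name' => 'my_store',
    ]);
    runStep($store, $argv[1], __DIR__ . '/path/to/file.rdf');
}
```

Assuming the script is saved as import.php (a hypothetical name), run "php import.php init" once, then "php import.php load" as a separate execution.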

This happens because the SET NAMES utf8 encoding specification is issued upon DB connection only after collation detection, and MySQL does not seem to detect the collation properly if the database has just been created. I made a pull request with a fix.


As a side note, it is not efficient to use the LOAD <path/to/file.rdf> construct of the question: the path will be resolved as a relative web address, so the server ends up downloading the file from itself through the network. It is much more efficient to use a construct such as:

 $store->query('LOAD <file://' . dirname(__FILE__) . '/path/to/file.rdf>');