CakePHP encoding problem: storing an uppercase S with a caron (Š) works in the database but causes errors when Cake processes it

Posted on 2024-11-17 20:14:59

So I am working on a site that stores information about cuneiform tablets. We use Semitic characters for transliteration.

In my script, I create a term list from the transliteration of a tablet.

My problem is that with Š, my script creates two different terms: it thinks there is a space inside the word, because of the way Cake handles the special character.

Example:

Partial contents of a tablet:

  1. utu-DIŠ-nu-il2

Terms extracted from the tablet by my script:

utu-DIŠ, -nu-il2

It should be:

utu-DIŠ-nu-il2

When I print the contents of my array partway through processing, I see this:

  1. utu-DI� -nu-il2

So this means the incorrect parsing of the text creates a space, which my script interprets as two words instead of one.
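One plausible reading of this symptom, offered as an assumption rather than a confirmed diagnosis: in UTF-8, Š (U+0160) is the two bytes 0xC5 0xA0, and 0xA0 is the no-break space in Latin-1. If a byte-oriented whitespace step treats that 0xA0 byte as a space, the word is left with a lone 0xC5 byte (displayed as �) followed by a real space, which matches the "utu-DI� -nu-il2" output. The sketch below only simulates that effect by hand with str_replace(); it does not reproduce the actual cause.

```php
<?php
// Š (U+0160) encoded as UTF-8 is two bytes: 0xC5 0xA0.
$shin = "\u{160}";
echo bin2hex($shin), "\n"; // prints c5a0

$line = "utu-DI{$shin}-nu-il2";

// Simulation only: replace the 0xA0 byte with a space, as a byte-oriented
// whitespace match that counted 0xA0 (Latin-1 no-break space) would.
$corrupted = str_replace("\xA0", ' ', $line);
echo bin2hex($corrupted), "\n";

// explode() now sees two "words"; the first ends in a lone 0xC5 byte.
$terms = explode(' ', $corrupted);
var_dump(count($terms)); // int(2)
```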

In the database, the text is stored correctly.

I also get these errors:

Warning (512): SQL Error: 1366: Incorrect string value: '\xC5' for column 'term' at row 1 [CORE\cake\libs\model\datasources\dbo_source.php, line 684]

Query: INSERT INTO terms (term, lft, rght) VALUES ('utu-DI�', 449, 450)

Query: INSERT INTO terms (term, lft, rght) VALUES ('A�', 449, 450)

Query: INSERT INTO terms (term, lft, rght) VALUES ('xDI�', 449, 450)
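The SQL warning fits the same picture. Assuming the term column uses a UTF-8 character set (which error 1366 on '\xC5' suggests), MySQL rejects the value because a lone 0xC5 lead byte is not valid UTF-8. A check like the hypothetical isValidUtf8() helper below, built on the common preg_match('//u', ...) idiom (preg_match() returns false when the subject is malformed UTF-8 and /u is set), could flag such terms before the INSERT. It only detects the damage; it does not repair it.

```php
<?php
/**
 * Hypothetical helper: true when $s is well-formed UTF-8.
 * With the /u modifier, preg_match() returns false for malformed UTF-8.
 */
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$good = "utu-DI\u{160}-nu-il2"; // intact Š (0xC5 0xA0)
$bad  = "utu-DI\xC5";           // lead byte 0xC5 without its continuation byte

var_dump(isValidUtf8($good)); // bool(true)
var_dump(isValidUtf8($bad));  // bool(false)
```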

Does anybody know what I could do to make this work?

Thanks!

Added info:

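    // Raw transliteration text submitted with the form
    $terms = $this->data['Tablet']['translit'];
    // Turn line breaks and tabs into spaces (double quotes so PHP interprets the escapes)
    $terms = str_replace(array("\r\n", "\n\r", "\r", "\n", "\t"), ' ', $terms);
    // Strip byte 0xAD (chr(173)) from both ends
    $terms = trim($terms, chr(173));
    print_r($terms);
    // Collapse runs of whitespace into one space (no /u modifier, so PCRE works byte by byte)
    $terms = preg_replace('/\s+/', ' ', $terms);
    $terms = explode(' ', $terms);
    $terms = array_map('trim', $terms);
    // Tokens that are not real terms: section markers, line numbers, placeholders
    $anti_terms = array('@tablet', '@obverse', '@reverse', 'Rev.', 'Obv.',
        '1.', '2.', '3.', '4.', '5.', '6.', '7.', '8.', '9.', '10.',
        '11.', '12.', '13.', '14.', '15.', '16.', '17.', '18.', '19.', '20.',
        'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9',
        "\r", "\n", "\r\n", "\t", '', ' ', null, chr(173), 'x', '[x]', '[...]');
    // Drop stop-list tokens and bare numbers
    foreach ($terms as $key => $term) {
        if (in_array($term, $anti_terms) || is_numeric($term)) {
            unset($terms[$key]);
        }
    }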

If I put my print_r before the preg_replace() call, the Š characters are fine; if I put it after, they display as black diamonds (�). So I guess the preg function is the problem!


I just found this:
http://www.php.net/manual/fr/function.preg-replace.php#84385

But it seems that mb_ereg_replace() causes the same problem as preg_replace()...


Solution:

    // Make the mbstring functions, including the mb_ereg_* regex functions, treat strings as UTF-8
    mb_internal_encoding("UTF-8");
    mb_regex_encoding("UTF-8");
    $terms = mb_ereg_replace('\s+', ' ', $terms);

and the error is gone!
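For reference, the preg functions can also be made UTF-8 aware with the /u modifier, which makes PCRE treat the subject as UTF-8 code points instead of bytes, so the two bytes of Š stay together. This is an alternative to the mb_ereg_replace() solution above, not what the poster used. The sketch folds the whitespace collapse, the explode() and the stop-list filter into one hypothetical extractTerms() function; the sample input and the shortened stop list are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: split a transliteration into terms, treating the input as UTF-8.
 * The /u modifier makes PCRE work on code points, so Š (0xC5 0xA0) is never split.
 */
function extractTerms(string $translit, array $antiTerms): array
{
    // Split on any run of whitespace (spaces, tabs, CR/LF) and drop empty pieces.
    $pieces = preg_split('/\s+/u', $translit, -1, PREG_SPLIT_NO_EMPTY);
    if ($pieces === false) {
        // preg_split() returns false when the input is malformed UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Use the stop list as array keys for constant-time lookups.
    $stop = array_flip($antiTerms);
    $terms = [];
    foreach ($pieces as $piece) {
        if (isset($stop[$piece]) || is_numeric($piece)) {
            continue; // marker, line number or placeholder
        }
        $terms[] = $piece;
    }
    return $terms;
}

// Illustrative input and a shortened stop list (assumptions, not the real data).
$antiTerms = ['@tablet', '@obverse', '@reverse', '1.', '2.', 'Rev.', 'Obv.', 'x', '[x]', '[...]'];
$translit = "@tablet\r\n@obverse\r\n1. utu-DI\u{160}-nu-il2\n2.\tx A\u{160}";
print_r(extractTerms($translit, $antiTerms));
```

Using the stop list as array keys (array_flip() plus isset()) avoids the linear scan and the loose comparisons of in_array(). The chr(173) trim from the original is left out on purpose: in UTF-8 the soft hyphen is the two-byte sequence 0xC2 0xAD, so trimming the single byte 0xAD can cut the last byte off a character such as í (0xC3 0xAD) at the end of the text.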


Comments (1)

离线来电, 2024-11-24 20:14:59:

    mb_internal_encoding("UTF-8");
    mb_regex_encoding("UTF-8");
    $terms = mb_ereg_replace('\s+', ' ', $terms);