PHP Regex: Searching articles for English and Arabic text

Posted on 2024-11-03 20:47:15


I'm searching articles for a list of keywords that contains both English and Arabic words.
Each article can be in either English or Arabic.

My current code is:

$k = implode("|", $keywords);
$regexp = "/(?i)\b(".$k.")\b/";
preg_match_all( $regexp, $content, $matches );

But this doesn't find keywords in Arabic articles for some reason. I've verified that both the keywords and articles are being read correctly; no encoding issues.

What can I do to fix this? Note that there is no way for me to detect whether an article or keyword is in English or Arabic, so there has to be a single regex to match them all.


Answer by 三寸金莲, 2024-11-10 20:47:16

Your regex may simply be missing the /u (Unicode) modifier:

$regexp = "/(?i)\b(".$k.")\b/u";

Without it, PCRE compares raw bytes. It may still find the words (as long as their UTF-8 byte sequences are identical), but it will never detect word boundaries (\b) around them, because the individual bytes that make up an Arabic letter are not word characters.
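A minimal sketch of the difference the /u flag makes, assuming a UTF-8 source file and PHP's default (C) locale. The Arabic sample sentence and keyword are made-up examples; the pattern is the question's own, with only the flag changed.

```php
<?php
// Hypothetical sample data: one Arabic keyword and one Arabic sentence, both UTF-8.
$keywords = ['كتاب'];
$content  = 'هذا كتاب جديد';

$k = implode('|', $keywords);

// Byte mode (no /u): every byte of an Arabic letter is >= 0x80 and is not \w
// in the default tables, so \b finds no boundary around the keyword.
$bytes = preg_match_all("/(?i)\b(" . $k . ")\b/", $content, $m1);

// UTF-8 mode (/u): PHP also enables Unicode properties for \w with this flag,
// so Arabic letters count as word characters and \b can match.
$utf8 = preg_match_all("/(?i)\b(" . $k . ")\b/u", $content, $m2);

echo "without /u: $bytes, with /u: $utf8\n";
```

On a typical current PHP build this reports 0 matches without /u and 1 with it.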

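Update
Okay, \b really only detects \w boundaries, so it only works if \w counts Arabic letters as word characters. Current PHP versions turn that on together with the /u flag; otherwise it depends on the locale setting. A more robust option is to spell the boundary out with lookaround assertions: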

$regexp = "/(?<!\p{L})(".$k.")(?!\p{L})/ui";
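A minimal sketch that wraps the lookaround pattern in a reusable function. The name findKeywords() and the sample strings are hypothetical, and it assumes UTF-8 input and PHP 7.4+ (for the arrow function). Unlike the original code, it passes each keyword through preg_quote(), so keywords containing regex metacharacters such as . or + are matched literally.

```php
<?php
/**
 * Hypothetical helper: return whole-word keyword matches in English or Arabic text.
 *
 * @param string[] $keywords UTF-8 keywords in any language
 * @return string[] matched substrings, in order of appearance
 */
function findKeywords(array $keywords, string $content): array
{
    if ($keywords === []) {
        return [];
    }
    // Escape each keyword; '/' is passed because it is the pattern delimiter.
    $escaped = array_map(fn(string $kw): string => preg_quote($kw, '/'), $keywords);
    $k = implode('|', $escaped);

    // The keyword must not be preceded or followed by a letter of any script.
    // /u = UTF-8 mode, /i = case-insensitive.
    $regexp = '/(?<!\p{L})(' . $k . ')(?!\p{L})/ui';

    if (preg_match_all($regexp, $content, $matches) === false) {
        // e.g. invalid UTF-8 in $content or in a keyword
        return [];
    }
    return $matches[1];
}

print_r(findKeywords(['book', 'كتاب'], 'A Book review: هذا كتاب جديد, not a bookshop.'));
```

Because neither assertion depends on the script of the surrounding text, one pattern covers English and Arabic keywords in either kind of article, which is what the question requires.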


About the author: 乖不如嘢
