How to write a PHP search script where words with diacritics match search terms without diacritics, and the results are underlined?

Posted 2024-08-23 12:29:19 · 132 words · 10 views · 0 comments

I've got this site where there are lots of texts with diacritics in them (ancillary glyphs added to letters, according to Wikipedia), and most people search these texts using words without the glyphs. It shouldn't be challenging to do this by keeping a copy of the texts without diacritics. However, I want to highlight the matches in the original text.
What's the best way to do it?

Comments (1)

翻了热茶 2024-08-30 12:29:20

You should try changing the collation setting in your MySQL DB.

Three collations seem to come up often in discussions of this topic:

  1. utf8_general_ci

  2. utf8_unicode_ci

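  3. utf8_bin ← The one to use when accents must be distinguished.

I have found that #3 matches accents exactly in searches: utf8_bin compares raw byte values, so it is sensitive to both accents and case, whereas the two _ci collations treat an accented letter and its base letter as equal. This answer gives a bit of background on the differences, but it doesn't mention the fact that utf8_bin is also sensitive to accents. For the question as asked (unaccented search terms matching accented text), that makes one of the _ci collations the place to start. You might want to try all three so you can test for yourself which one works with the language/script you're dealing with.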

To be really sure that things are going to match correctly, you will also have to look into Unicode normalization, which is really a whole different ball of wax. Your user could type a query whose accented characters are in a different normalization form from the one your data is stored in (for example, a precomposed é versus an e followed by a combining accent), and the query might then fail to match. I've had that problem with SQLite; I'm not sure whether it applies to MySQL.
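To illustrate the normalization mismatch described above, here is a minimal sketch in Python (used only for illustration; the variable and function names are made up for this example). It assumes the stored text is in NFC form and the query arrives in NFD form, and shows that normalizing both sides to a single form makes them compare equal.

```python
import unicodedata

# Assumption for illustration: the stored value uses a precomposed character,
# while the user's query uses a base letter plus a combining accent.
stored = "caf\u00e9"    # 'é' as a single code point (NFC form)
typed = "cafe\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT (NFD form)


def normalize_nfc(text):
    """Return text in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


# The two strings render identically but differ code point by code point.
raw_equal = stored == typed

# After normalizing both sides to the same form, they compare equal.
normalized_equal = normalize_nfc(stored) == normalize_nfc(typed)
```

The same idea would apply before a query ever reaches the database: normalize both the stored data and the incoming search term to one form. In PHP, the intl extension's Normalizer class provides the equivalent operation.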

FWIW, here's a CREATE TABLE statement I'm currently using for a table where I needed accents to match exactly; it sets the collation:

CREATE TABLE `glosses` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `morphemes` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `labels` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `phrase_id` int(11) DEFAULT NULL,
  `nth_word` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

You can see the COLLATE=utf8_bin tacked on at the end.
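The collation only affects matching inside the database; the highlighting part of the question still has to happen in application code. One possible approach, sketched here in Python and not taken from the setup above, is to strip the diacritics while remembering which original character each stripped character came from, search the stripped text, and map each hit back to the original. The function names `fold` and `highlight` and the `<u>` tags are hypothetical, and the text is assumed to be HTML-escaped already.

```python
import unicodedata


def fold(text):
    """Lowercase text and drop combining marks.

    Returns (folded, index_map), where index_map[i] is the index in the
    original text of the character that produced folded[i].
    """
    folded = []
    index_map = []
    for i, ch in enumerate(text):
        # Decompose so that accents become separate combining marks.
        for part in unicodedata.normalize("NFD", ch.lower()):
            if unicodedata.combining(part):
                continue  # skip the diacritic itself
            folded.append(part)
            index_map.append(i)
    return "".join(folded), index_map


def highlight(text, query, start_tag="<u>", end_tag="</u>"):
    """Wrap every accent-insensitive match of query in text with tags."""
    folded_text, index_map = fold(text)
    folded_query, _ = fold(query)
    if not folded_query:
        return text

    out = []
    last = 0  # end of the part of the original text already copied
    pos = folded_text.find(folded_query)
    while pos != -1:
        start = index_map[pos]
        end = index_map[pos + len(folded_query) - 1] + 1
        # Keep combining marks that follow the last matched letter inside
        # the highlighted span (relevant when the text is stored in NFD).
        while end < len(text) and unicodedata.combining(text[end]):
            end += 1
        if start >= last:
            out.append(text[last:start])
            out.append(start_tag + text[start:end] + end_tag)
            last = end
        pos = folded_text.find(folded_query, pos + len(folded_query))
    out.append(text[last:])
    return "".join(out)
```

For example, calling `highlight("Café crème", "creme")` wraps the original word `crème`, accents intact, in the tags. A stored copy of the texts without diacritics is not needed for the highlighting step, although it may still be useful for indexing.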
