找到与关键字给定的相似字符串,每个关键字都有自己的“力量”;

发布于 2024-10-26 22:18:16 字数 868 浏览 1 评论 0原文

这个问题对我来说是一个挑战,我的朋友不能告诉我该怎么做,但他真的是很好的程序员(我认为)。

用户可以将句子存入数据库。当用户放置一个句子时,它会保存在sentences表中。 接下来,句子被分割成单词,单词的每个 soundex 被保存到带有分割句子 id 的表 tags 中。

最后,将单词的每个 soundex 放入 weights 表中,如果存在相同的 soundex,则函数将该 soundex 的 counter 加 1。

(对于那些不知道的人:soundex 是一个返回字符串的语音表示(发音方式)的函数)

数据库的结构: 一张表 sentences 包含两行:idsentence。 其他表tags包含id(with是句子的id)和tag(with是句子中的一个单词)。 tag 实际上不仅仅是简单的单词,而是这个单词的发音。 最后一个表weights包含tagweight(带有数字,它告诉我们表tags<中有多少个这样的标签) /code>)

我的问题是:如何使函数返回给定字符串的相似句子。 它应该使用标签(单词的 soundex),并且每个标签应该基于权重表拥有自己的权力。 经常使用的标签比更原始的标签更重要。可以只用一个mysql查询来完成吗?

下一个问题:我认为这种寻找相似句子的方式很好,但是这个函数的速度怎么样? 我需要在我的网站中非常非常频繁地使用它。

This question is a chalenge for me, my friend can`t tell me how to do it, but he is really good programmer (I think).

Users can put into database sentences. When user puts a sentence it is saved in sentences table.
Next, sentence is split into words, each soundex of the word is saved into table tags with id of the splited sentence.

Last, each soundax of the word is put into weights table, if there arleady is the same soundex, function adds 1 to counter of this soundex.

(For those who dont know: soundex is a function that returns a phonetic representation (the way it sounds) of a string)

Structure of the database:
One table sentences contains two rows: id and sentence.
Other table tags contains id (with is id of a sentence) and tag (with is one word from the sentence).
tag isn't really just plain word, but soundex of this word.
Last table weights contains tag and weight (with is number, it tells us how many there is tags like this in table tags)

My question is: how can I make a function witch returns similar sentences to given string.
It should use tags (soundex of word) and each tag should have its own power based on weights table.
Tags, that are often used are more important, then more original tags. Can it be done in just one mysql query?

Next question: I think that this way of looking for similar sentences is good, but what with speed of this function?
I need to use it very very often in my site.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

若沐 2024-11-02 22:18:16

那么为什么没有一个权重表,而是一个将标签与句子相关联的表呢?因此,有一个名为 sentence_tags 的表,其中包含 sentence_idtag_id 列。然后,您可以通过对这两个表进行联接来计算权重,并且仍然引用包含该标签的句子。您也可以在使用时将标签和 soundex 都存储在标签表中。

Well instead of having a weights table, why don't you have a table that relates tags to sentences? So have a table called sentence_tags with a sentence_id and a tag_id column. Then you can compute the weights by doing a join on those two tables, and still reference back to the sentence that contains the tag. You may as well store both the tag and the soundex in the tags table, while you're at it.

长亭外,古道边 2024-11-02 22:18:16

也许 Levenshtein 距离 就是您正在寻找的。它计算从一个单词转移到另一个单词所需的步骤数。

请务必意识到这是一项成本高昂的操作。

Perhaps the Levenshtein Distance is what you are looking for. It calculates the number of steps there are needed to transfer from one word to another.

Do realize this is a costly operation.

傻比既视感 2024-11-02 22:18:16

Joe K 的建议对于良好的数据库设计来说似乎是正确的。

不要存储可以推断的信息。

意思是,使用 join 语句和 PHP 在运行时计算权重。

我知道这可能不是您设计中的正确解决方案,但通常花一些时间进行智能数据库结构设计将使一切工作得更好。

Joe K's suggestion seems spot on for good database design.

Do not store information that can be extrapolated.

Meaning, use the join statement and PHP to calculate the weight at run-time.

I understand this may not be the correct solution in your design, but often a few moments spent on smart database struture design will make everything work that much better.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文