找到与关键字给定的相似字符串,每个关键字都有自己的“力量”;
这个问题对我来说是一个挑战,我的朋友不能告诉我该怎么做,但他真的是很好的程序员(我认为)。
用户可以将句子存入数据库。当用户放置一个句子时,它会保存在sentences
表中。 接下来,句子被分割成单词,单词的每个 soundex 被保存到带有分割句子 id 的表 tags
中。
最后,将单词的每个 soundex 放入 weights
表中,如果存在相同的 soundex,则函数将该 soundex 的 counter
加 1。
(对于那些不知道的人:soundex 是一个返回字符串的语音表示(发音方式)的函数)
数据库的结构: 一张表 sentences
包含两行:id
和 sentence
。 其他表tags
包含id
(with是句子的id)和tag
(with是句子中的一个单词)。 tag
实际上不仅仅是简单的单词,而是这个单词的发音。 最后一个表weights
包含tag
和weight
(带有数字,它告诉我们表tags<中有多少个这样的标签) /code>)
我的问题是:如何使函数返回给定字符串的相似句子。 它应该使用标签(单词的 soundex),并且每个标签应该基于权重
表拥有自己的权力。 经常使用的标签比更原始的标签更重要。可以只用一个mysql查询来完成吗?
下一个问题:我认为这种寻找相似句子的方式很好,但是这个函数的速度怎么样? 我需要在我的网站中非常非常频繁地使用它。
This question is a chalenge for me, my friend can`t tell me how to do it, but he is really good programmer (I think).
Users can put into database sentences. When user puts a sentence it is saved in sentences
table.
Next, sentence is split into words, each soundex of the word is saved into table tags
with id of the splited sentence.
Last, each soundax of the word is put into weights
table, if there arleady is the same soundex, function adds 1 to counter
of this soundex.
(For those who dont know: soundex is a function that returns a phonetic representation (the way it sounds) of a string)
Structure of the database:
One table sentences
contains two rows: id
and sentence
.
Other table tags
contains id
(with is id of a sentence) and tag
(with is one word from the sentence).tag
isn't really just plain word, but soundex of this word.
Last table weights
contains tag
and weight
(with is number, it tells us how many there is tags like this in table tags
)
My question is: how can I make a function witch returns similar sentences to given string.
It should use tags (soundex of word) and each tag should have its own power based on weights
table.
Tags, that are often used are more important, then more original tags. Can it be done in just one mysql query?
Next question: I think that this way of looking for similar sentences is good, but what with speed of this function?
I need to use it very very often in my site.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
那么为什么没有一个权重表,而是一个将标签与句子相关联的表呢?因此,有一个名为
sentence_tags
的表,其中包含sentence_id
和tag_id
列。然后,您可以通过对这两个表进行联接来计算权重,并且仍然引用包含该标签的句子。您也可以在使用时将标签和 soundex 都存储在标签表中。Well instead of having a weights table, why don't you have a table that relates tags to sentences? So have a table called
sentence_tags
with asentence_id
and atag_id
column. Then you can compute the weights by doing a join on those two tables, and still reference back to the sentence that contains the tag. You may as well store both the tag and the soundex in the tags table, while you're at it.也许 Levenshtein 距离 就是您正在寻找的。它计算从一个单词转移到另一个单词所需的步骤数。
请务必意识到这是一项成本高昂的操作。
Perhaps the Levenshtein Distance is what you are looking for. It calculates the number of steps there are needed to transfer from one word to another.
Do realize this is a costly operation.
Joe K 的建议对于良好的数据库设计来说似乎是正确的。
意思是,使用 join 语句和 PHP 在运行时计算权重。
我知道这可能不是您设计中的正确解决方案,但通常花一些时间进行智能数据库结构设计将使一切工作得更好。
Joe K's suggestion seems spot on for good database design.
Meaning, use the join statement and PHP to calculate the weight at run-time.
I understand this may not be the correct solution in your design, but often a few moments spent on smart database struture design will make everything work that much better.