"Personality quiz" style comparisons [PHP]
I'm trying to think of an efficient or reasonable algorithm to take the results of a test submitted by the user and compare them with the values of several profiles to find a match (like how online dating services match your answers to suitable mates).
I really have no idea how to go about this. If the user answers 10 questions about himself and there are 10 candidates to match him with, we're looking at thousands of comparisons through the database. There must be a better way to do this.
Of the research I've done, maybe I could accomplish this with the Levenshtein distance function, but I don't know how to go about it because I'm not entirely familiar with this and I don't understand it that well. But maybe I could do something like compare the user's results concatenated into a string (e.g. 'AEBCDAABEAD') with the answers of each candidate and measure similarity that way?
Any suggestions?
Thanks much.
Comments (1)
I don't think comparing exact answers is flexible enough for every purpose, because some answers may have little impact on certain profile types.
With exact matching, someone who picks 1-2 and someone who picks 3-4 simply count as a mismatch, exactly like someone who picks 20-25, even though that answer is much further off. AFAIK, with Levenshtein, 'AB' and 'AC' are exactly as similar as 'AZ' and 'AB'.
So although the Levenshtein algorithm is a good idea, I guess you would get noticeably worse matches in some cases if you do this on a per-question basis.
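To make the Levenshtein point concrete, here is a minimal sketch of the standard edit-distance computation. It is written in Python only for brevity (PHP ships a built-in `levenshtein()` function for the same metric), and the function name and the variables `near` and `far` are illustrative, not something from the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance.

    Insertions, deletions and substitutions all cost 1.
    """
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1,      # delete from a
                               current[j - 1] + 1,   # insert into a
                               substitution))        # replace or keep
        previous = current
    return previous[-1]


# 'B' vs 'C' (neighbouring answers) and 'Z' vs 'B' (far apart)
# are both a single substitution, so the metric cannot tell them apart.
near = levenshtein("AB", "AC")
far = levenshtein("AZ", "AB")
print(near, far)
```

Both calls return the same distance, which is why a string metric says nothing about how far apart two answers to the same question really are.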
Let me describe the technique that came to mind when I read your question.
Profile categories and answer weight
I'm thinking of a configuration in which you describe a few profile or attribute categories.
Let's take for example food tastes. So our categories may look like:
sweet, sour, spicy, normal
etc. Now, for your survey, I would configure category weights for each question, which you can then accumulate.
Example
Do you like chili con carne?
Yes - spicy +3
No - spicy -1
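As a rough sketch of how such weights could be accumulated into one score per category (Python used for illustration; the same structure maps directly onto nested PHP arrays). The `chili` entry uses the values from the example above, while the `dessert` question, its weights, and all names here are made up for the sake of the example.

```python
from collections import Counter

CATEGORIES = ("sweet", "sour", "spicy", "normal")

# question -> answer -> {category: points}
# "chili" comes from the example above; "dessert" is a hypothetical second question.
QUESTION_WEIGHTS = {
    "chili": {"yes": {"spicy": 3}, "no": {"spicy": -1}},
    "dessert": {"yes": {"sweet": 2}, "no": {"sweet": -1, "sour": 1}},
}


def build_profile(answers):
    """Sum the category weights of every given answer into one score per category."""
    totals = Counter()
    for question, answer in answers.items():
        # Counter.update adds the values, so negative weights subtract.
        totals.update(QUESTION_WEIGHTS[question][answer])
    # Report every category, including the ones no answer touched (score 0).
    return {category: totals[category] for category in CATEGORIES}


profile = build_profile({"chili": "yes", "dessert": "no"})
print(profile)
```

The resulting profile is a small fixed-size vector of numbers, so it can be stored once per person instead of being recomputed from raw answers on every comparison.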
Now you can use an algorithm to determine the distance in each category and weight those distances in a calculation.
Now you can compare, for example, the persons' decisions and see that the distance between [2] and [3] is way smaller than the distance between [1] and [2].
Note: I'm not talking about Levenshtein distance here, because these values are numeric, and a calculation gives better results than just counting characters that don't match.
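One possible way to turn this into a number is a weighted Euclidean distance over the category scores. The sketch below (Python, for illustration) assumes that choice of metric; the importance factors, the profiles, and the names `ana` and `ben` are all hypothetical and are not the profiles referred to as [1] to [3] above.

```python
import math

CATEGORIES = ("sweet", "sour", "spicy", "normal")

# Hypothetical importance of each category in the final score.
CATEGORY_IMPORTANCE = {"sweet": 1.0, "sour": 1.0, "spicy": 2.0, "normal": 0.5}


def profile_distance(a, b):
    """Weighted Euclidean distance between two category-score profiles."""
    return math.sqrt(sum(
        CATEGORY_IMPORTANCE[c] * (a[c] - b[c]) ** 2 for c in CATEGORIES
    ))


def rank_candidates(user, candidates):
    """Return candidate names sorted from closest to farthest profile."""
    return sorted(candidates,
                  key=lambda name: profile_distance(user, candidates[name]))


# Made-up accumulated scores for one user and two candidates.
user = {"sweet": -1, "sour": 1, "spicy": 3, "normal": 0}
candidates = {
    "ana": {"sweet": 0, "sour": 1, "spicy": 2, "normal": 0},
    "ben": {"sweet": 4, "sour": -1, "spicy": -2, "normal": 1},
}
ranking = rank_candidates(user, candidates)
print(ranking)
```

Each comparison is only a handful of arithmetic operations on precomputed scores, so matching one user against many candidates is a single cheap pass rather than a per-question string comparison.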
I'm not sure if this is helpful to you, but this came to mind and seemed to be a neat solution.