Relevance percentage (higher rank when the search string is found further left in the source string)
Can anybody help me choose an algorithm?
I need to compare two strings and get a relevance percentage (the further left the search terms are found, the higher the rank).
Maybe two algorithms could be combined.
For example:
searching for "chocolate white quills"
we have these records:
CHOCOLATE, WHITE/DARK QUILLS [MONA LISA, 4 #/CS]
CHOCOLATE, WHITE QUILLS [SWISS CHALET, 900 GR BOX]
PASTRY INGR., CHOCOLATE QUILLS WHITE [SWISS CHALET FINE FO, 16 / 120 CT]
The result should look like this:
CHOCOLATE, WHITE QUILLS [SWISS CHALET, 900 GR BOX] | 0,1
CHOCOLATE, WHITE/DARK QUILLS [MONA LISA, 4 #/CS] | 0,2
PASTRY INGR., CHOCOLATE QUILLS WHITE [SWISS CHALET FINE FO, 16 / 120 CT] | 0,4
As you can see, a non-strict (fuzzy) comparison has to be used.
I'm currently using JaroWinkler,
and the result looks like this:
CHOCOLATE, WHITE/DARK QUILLS [MONA LISA, 4 #/CS] | 0,3775
CHOCOLATE, WHITE QUILLS [SWISS CHALET, 900 GR BOX] | 0,3769
PASTRY INGR., CHOCOLATE QUILLS WHITE [SWISS CHALET FINE FO, 16 / 120 CT] | 0,3728
Comments (1)
With any ranking of text, you need to be explicit about what you're trying to measure. In your example, why is the first item ranked lower than the second? I sort of understand why the bottom one has the highest rating, because it contains all the items in the string without intermediate strings. Provide some more details and we'll try to help.
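
To show what "being explicit about what you measure" could look like, here is a minimal Python sketch of one possible measure. It is not the asker's method and not JaroWinkler. It assumes that a lower score means more relevant (as in the desired output), that words are matched exactly after lowercasing, and that a query word counts as "further left" when its word index in the record is close to its word index in the query. The names `tokenize` and `left_weighted_distance` are made up for this illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of letters or digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def left_weighted_distance(query, record):
    """Return a distance in [0, 1]; lower means more relevant.

    Each query word contributes how far its first (leftmost) occurrence in
    the record is displaced from its index in the query, relative to the
    record length. A query word missing from the record contributes 1.0.
    """
    q_tokens = tokenize(query)
    r_tokens = tokenize(record)
    if not q_tokens or not r_tokens:
        return 1.0
    total = 0.0
    for i, word in enumerate(q_tokens):
        if word in r_tokens:
            pos = r_tokens.index(word)  # leftmost occurrence only
            total += min(1.0, abs(pos - i) / len(r_tokens))
        else:
            total += 1.0  # assumption: a missing word gets the maximum penalty
    return total / len(q_tokens)


query = "chocolate white quills"
records = [
    "CHOCOLATE, WHITE/DARK QUILLS [MONA LISA, 4 #/CS]",
    "CHOCOLATE, WHITE QUILLS [SWISS CHALET, 900 GR BOX]",
    "PASTRY INGR., CHOCOLATE QUILLS WHITE [SWISS CHALET FINE FO, 16 / 120 CT]",
]

# Sort ascending: the smallest distance is the best match.
ranked = sorted(records, key=lambda r: left_weighted_distance(query, r))
for r in ranked:
    print(f"{r} | {left_weighted_distance(query, r):.4f}")
```

On the three records from the question this puts the SWISS CHALET 900 GR BOX record first, the MONA LISA record second and the PASTRY INGR. record last, which is the order asked for, although the numbers themselves differ from the 0,1 / 0,2 / 0,4 in the question. If misspelled words also need to match, the exact `word in r_tokens` test could be swapped for a per-word fuzzy comparison such as JaroWinkler, keeping the positional penalty as it is.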