What are some practical (and lightweight) semantic/data matching techniques?
I have an application that lets users publish unstructured keywords. At the same time, other users can publish items that must be matched to one or more specified keywords. There is no restriction on the keywords either set of users may use, so simply hoping for an exact collision is likely to produce very few matches. In reality, users may pick different keywords for the same thing, or keywords that are close enough (e.g., 'bicycles' and 'cycling', or 'meat' and 'food').
I need this to work on mobile devices (Android), so I'm happy to sacrifice matching accuracy for efficiency and a small footprint. I know about s-match, but it relies on a 15 MB backing dictionary, so it isn't ideal.
What other ideas/approaches/frameworks might help with this?
Comments (1)
Your example of 'bicycles' and 'cycling' could be addressed by a variation on the Levenshtein edit-distance algorithm, since the two words share most of their letters. Your example of 'meat' and 'food', however, has no spelling overlap at all, so it would indeed require a sizable backing dictionary, unless of course the concept set or target audience is limited to, say, foodies.
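To make the edit-distance idea concrete, here is a minimal sketch in Python (chosen for brevity; the same logic ports directly to Java or Kotlin on Android). The names `similarity` and `is_match` and the 0.7 default threshold are my own illustrative assumptions, not part of any library, and the threshold would need tuning on real keywords.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # iterate the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to 0.0 (nothing shared) .. 1.0 (identical)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(a: str, b: str, threshold: float = 0.7) -> bool:
    """Hypothetical helper: the 0.7 default is an assumption, not a tuned value."""
    return similarity(a, b) >= threshold
```

With this normalization, 'bicycle' vs 'bicycles' scores 0.875, while 'bicycles' vs 'cycling' scores only 0.375 and 'meat' vs 'food' scores 0.0. So plain edit distance handles spelling variants well, but the 'cycling' case would need a loose threshold or some extra step such as stripping common prefixes and suffixes before comparing.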
Have you considered hosting the dictionary as a web service and accessing the data as needed? The drawback, of course, is that your app would only work while it has network coverage.
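One way to soften that drawback is to cache what the service returns and fall back to exact matching when the network is unavailable. The sketch below assumes a hypothetical `fetch` callable that takes a keyword and returns a set of related terms; the class name, the cache size, and the service behind `fetch` are all illustrative assumptions rather than an existing API.

```python
from typing import Callable, Dict, Set

# Hypothetical signature: keyword in, related terms out; raises OSError when offline.
Fetcher = Callable[[str], Set[str]]


class RelatedTermsLookup:
    def __init__(self, fetch: Fetcher, max_cache: int = 500):
        self._fetch = fetch
        self._cache: Dict[str, Set[str]] = {}
        self._max_cache = max_cache  # keeps the on-device footprint bounded

    def related(self, keyword: str) -> Set[str]:
        key = keyword.strip().lower()
        if key in self._cache:
            return self._cache[key]
        try:
            terms = {t.strip().lower() for t in self._fetch(key)}
        except OSError:
            # No coverage: degrade to exact matching and do not cache the miss.
            return {key}
        terms.add(key)
        if len(self._cache) >= self._max_cache:
            # Evict the oldest inserted entry (dicts preserve insertion order).
            self._cache.pop(next(iter(self._cache)))
        self._cache[key] = terms
        return terms

    def matches(self, a: str, b: str) -> bool:
        """True if either keyword appears among the other's related terms."""
        a_key, b_key = a.strip().lower(), b.strip().lower()
        return b_key in self.related(a) or a_key in self.related(b)
```

In a real app, `fetch` would wrap the HTTP call to the hosted dictionary; injecting it as a parameter keeps the matching logic testable without a network connection.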