How can I use TF-IDF weights to rank relevance?
I have a set of key terms and have calculated TF-IDF weights along with tag frequencies and term counts for each term, persisted in a database.
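For reference, one common formulation of the stored weight is tf(t, d) × log(N / df(t)). Here tf(t, d) is the count of term t in document d, N is the number of documents, and df(t) is the number of documents that contain t. The sketch below computes it over a tiny hypothetical corpus. The `tfidf` helper, the toy documents, and the choice of the unsmoothed natural-log variant are assumptions for illustration; many libraries use smoothed or normalized variants.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = raw count of the term in the document
    idf = ln(N / df), unsmoothed; other weighting schemes exist.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["python", "code", "python"],
    ["python", "snake"],
    ["java", "code"],
]
w = tfidf(docs)
```

A term that appears in every document gets idf = ln(1) = 0. This is how TF-IDF pushes ubiquitous terms toward zero weight while rare, locally frequent terms score highest.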
How can I use these database values to produce a set of related terms, given a single term?
I have read the Wikipedia page on TF-IDF and have gone through many Google search results about cosine similarity, n-gram algorithms, and the like. My strengths are not really in linear algebra, information retrieval (IR), or calculus, so I'm struggling to make sense of those documents.
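Cosine similarity needs no linear algebra beyond a dot product. Treat each item as a sparse vector of TF-IDF weights, multiply the entries the two vectors share, and divide by the product of the two vector lengths. A minimal sketch, assuming vectors are stored as plain `{key: weight}` dicts; the `cosine` name is hypothetical.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    # Dot product: only keys present in both vectors contribute.
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    # Euclidean length (norm) of each vector.
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

The result is 1.0 for vectors pointing in the same direction and 0.0 when they share no non-zero entries. Because TF-IDF weights in the unsmoothed variant above are non-negative, the score never drops below 0.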
I'd like to understand how TF-IDF weights relate to relevance. Is there a way to rank these values? Do I need to rank them relative to the weight of a predefined term?
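One possible approach, offered as a hedged sketch rather than the only answer, is to represent each term by the vector of its TF-IDF weights across documents, i.e. a `{document_id: weight}` dict. That is the shape a (term, document, weight) table would naturally load into. Every other term can then be ranked by its cosine similarity to the query term's vector, so the ranking is relative to the chosen query term rather than to a fixed threshold. The `WEIGHTS` data, the table layout it implies, and the `related_terms` and `cosine` names are all hypothetical.

```python
import math

# Hypothetical data: term -> {document_id: tf-idf weight},
# as it might be loaded from a (term, doc_id, weight) table.
WEIGHTS: dict[str, dict[int, float]] = {
    "python": {1: 0.81, 2: 0.41},
    "snake": {2: 1.10},
    "code": {1: 0.41, 3: 0.41},
    "java": {3: 1.10},
}


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity of two sparse {key: weight} vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def related_terms(query: str, weights: dict[str, dict[int, float]],
                  top_n: int = 3) -> list[tuple[str, float]]:
    """Return up to top_n (term, score) pairs most similar to `query`."""
    q = weights[query]
    scores = [(t, cosine(q, v)) for t, v in weights.items() if t != query]
    # Drop terms that share no document with the query term.
    scores = [(t, s) for t, s in scores if s > 0]
    # Highest similarity first.
    scores.sort(key=lambda ts: ts[1], reverse=True)
    return scores[:top_n]
```

With this sample data, `related_terms("python", WEIGHTS)` ranks "code" first and "snake" second. "java" is left out because it never appears in a document that also contains "python".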
How can I use these numbers now that I have them?