Speeding up text comparison (using sparse matrices)
I have a function that takes two strings and returns their cosine similarity, a value that indicates how closely the two texts are related.
If I want to compare 75 texts with each other, I need to make 5,625 (75 × 75) single comparisons to have every text compared with every other.
Is there a way to reduce this number of comparisons? For example, with sparse matrices or k-means?
I don't want to discuss my function or ways of comparing texts, only how to reduce the number of comparisons.
Comments (2)
What Ben says is true: to get better help, you need to tell us what the goal is.
For example, if you want to find similar strings, one possible optimization is to store the string vectors in a spatial data structure such as a quadtree, where you can outright discard vectors that are too far away from each other and so avoid many comparisons.
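To make the spatial-pruning idea concrete, here is a rough sketch. It is not a quadtree: it swaps in a uniform grid, which is simpler to write and shows the same principle. It relies on the fact that for unit-length vectors, cosine similarity above a threshold is equivalent to Euclidean distance below a radius. The function names, the threshold `min_cos` (which must be below 1), and the tiny 2-D vectors are all assumptions made for illustration. A grid like this only pays off in a few dimensions, so real term vectors would likely need a different index.

```python
import math
from collections import defaultdict
from itertools import product


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def candidate_pairs(vectors, min_cos):
    """Return index pairs (i, j), i < j, that could reach min_cos.

    Hypothetical helper: every pair whose cosine similarity is at
    least min_cos is included; pairs in distant cells are skipped.
    """
    # For unit vectors: |a - b|^2 = 2 - 2 * cos(a, b),
    # so cos >= min_cos is the same as distance <= radius.
    radius = math.sqrt(2.0 - 2.0 * min_cos)
    units = [normalize(v) for v in vectors]

    # Bucket every vector into a grid cell with side length `radius`.
    grid = defaultdict(list)
    for i, u in enumerate(units):
        cell = tuple(math.floor(x / radius) for x in u)
        grid[cell].append(i)

    # Two vectors within `radius` always lie in the same or adjacent cells.
    dims = len(units[0])
    offsets = list(product((-1, 0, 1), repeat=dims))
    pairs = set()
    for cell, members in grid.items():
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            for i in members:
                for j in grid.get(neighbor, ()):
                    if i < j:
                        pairs.add((i, j))
    return pairs


vectors = [(1, 0), (0.9, 0.1), (0, 1), (-1, 0)]
pairs = candidate_pairs(vectors, min_cos=0.9)
```

Only the pairs returned here would then be passed to the real similarity function; everything else is known to fall below the threshold.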
If your algorithm is pair-wise, then you probably can't reduce the number of comparisons, by definition.
You'll need to use a different algorithm, or at the very least pre-process your input if you want to reduce the number of comparisons.
Without the details of your function, it's difficult to give any concrete help.
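As one illustration of what pre-processing could look like, here is a minimal sketch. It assumes, without knowing the actual function, that the similarity is computed on bag-of-words vectors with non-negative weights, so two texts that share no word have a cosine similarity of exactly 0 and need not be compared. It also uses the fact that cosine similarity is symmetric, so each unordered pair is needed only once: for 75 texts that is 75 × 74 / 2 = 2,775 pairs. The function name and the whitespace tokenization are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs_by_shared_terms(texts):
    """Return index pairs (i, j), i < j, of texts sharing at least one word.

    Assumption: texts without a common word score 0 and can be skipped.
    """
    # Inverted index: word -> ids of the texts that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(texts):
        for token in set(text.lower().split()):
            index[token].add(doc_id)

    # Only texts that appear together under some word form a candidate pair.
    pairs = set()
    for doc_ids in index.values():
        pairs.update(combinations(sorted(doc_ids), 2))
    return pairs


texts = ["red apple", "green apple", "blue sky", "grey sky", "quantum"]
pairs = candidate_pairs_by_shared_terms(texts)

# Symmetry alone: unordered pairs among 75 texts.
unordered_pairs_for_75 = len(list(combinations(range(75), 2)))
```

How much this saves depends entirely on the data: if every text shares a common word with every other, all unordered pairs remain candidates, so very frequent words would have to be filtered out first.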