n维匹配算法
在这里寻求一些建议。 有谁知道在 n 维空间中开始研究匹配算法的好地方。 例如,任何约会网站都必须使用某种算法来匹配 2 个人。 我读到的是,我们可以将一个人的特征映射到一个 n 维数组中,每个特征都有一个点系统。 一旦我们拥有一个人的所有(可用)特征,我们就可以用 n 维数组中的一个点来表示这个人。 然后,匹配 2 个人就像在这个 n 维数组中找到 2 点之间的最短距离一样简单。 有人对此类算法的实现有任何参考吗? 编写此类内容的最佳语言是什么?
Looking for some advice here. Does anyone know a good place to start looking into matching algorithm in a n-dimensional space. For example, any dating site out there must be using some sort of algorithm to match 2 people. What I have read is that we can map characteristics of a person in a n-dimensional array with a point system for each characteristic. Once we have all (available) characteristics of a person, we can represent this person in a point within a n-dimensional array. Then, to match 2 person would be as simple as finding the shortest distance between 2 point in this n-dim array. Does anyone has any reference in implementation of these kind of algorithm? What's the best language to write these kind of stuff in?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
如果您想找到一个人最接近的匹配,Bentley & Shamos 发表了一种多维分治方法:O(N log N) 时间内的分治方法:多维空间中的分而治之,1976 年第八届 ACM 计算理论年度研讨会论文集。如果您找不到副本 这个也可能有帮助。
然而,对于您的示例应用程序来说,实际上找到最近的邻居似乎并不是最大的问题 - 更棘手的是将输入映射到维度。 例如,如果一个维度是“喜欢动物”,那么您对喜欢狗和动物的人给予什么价值? 猫却不能忍受马? 如果有人喜欢马、认为狗还好、对猫感到恼火、对金鱼抱有矛盾的态度呢?
If you want to find the closest match for one person, Bentley & Shamos published a multi-dimensional divide-and-conquer method: Divide-and-conquer in O(N log N) time: Divide-and-conquer in multidimensional space in Proceedings of the eighth annual ACM symposium on Theory of computing 1976. If you can't get a copy this may also be helpful.
However for your example application actually finding the nearest neighbour doesn't seem to be the biggest problem - much trickier is mapping inputs into dimensions. For example if one dimension is "likes animals", what value do you give to someone who likes dogs & cats but can't stand horses? What about someone who loves horses, thinks dogs are OK, is annoyed by cats and is ambivalent about goldfish?
首先,选择您最熟悉的语言。 处理这个问题的算法相当简单,并且应该适用于任何现代语言。 (只要有一些数组的概念,并且可能有一个矩阵库,你应该没问题。)我之前已经用 C、C++ 和 C# 实现了其中的许多概念,但也见过 python、vb.net 等中的实现 。
根据您想要做什么,有几种选择
话虽这么说,你想做什么取决于你的目标。 如果您只想找到最佳匹配,可以使用简单的距离计算(即:n 维数组中每个维度/属性的平方和的 sqrt),可选择对每个属性距离进行加权,并使用最近的点。
如果您想将人员分组,您需要查看聚类算法。 对于这样的数据,我怀疑某种形式的 K 均值聚类或模糊 c 均值聚类效果最好。
First off, pick the language you are most familiar with. The algorithms for handling this are fairly straightforward, and should work in any modern language. (As long as there is some concept of array, and potentially a matrix library, you should be fine.) I've implemented many of these in C, C++, and C# before, but seen implementations in python, vb.net, etc.
Depending on what you are trying to do, there are a couple of options.
That being said, what you want to do depends on your goals. If you just want to find the best match, you can use simple distance calculations (ie: sqrt of sum of squares for each dimension/property in the n-dimensional array), optionally weight each properties distance, and use the closest point.
If you want to group together people, you'll want to look at clustering algorithms. For data like this, I would suspect that some form of K-Means clustering or fuzzy c-means clustering would work the best.
下面的解决方案怎么样。
假设用户是U1、U2、U3、U4、U5……Un。
属性为 A1、A2、A3、A4、A5..... Am
将它们存储为
A1 - U1、U2、U3...
A2 - U4、U6、U7...
A3-
Profile属性是索引,存储所有用户。 现在,如果有新用户到来,请查看其属性,并针对这些属性找到普通人。 一个人出现在这些列表中的次数 - 排名更高。
How about following solution.
Assume the users are U1, U2, U3, U4, U5.... Un.
Attributes are A1, A2, A3, A4, A5..... Am
Store these as
A1 - U1, U2, U3...
A2 - U4, U6, U7....
A3 -
Profile attribute is the index and stores all users. Now, if a new User comes, see the its attributes and for those attributes, find common people. number of times a person exists in these lists - higher ranking.
您提到的过程称为 k 最近邻,k=1。 这是查找相似向量的最直观方法。
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
The process you mention is known as k-nearest neighbor, with k=1. It's the most intuitive approach for finding similar vectors.
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
您在示例中描述的不是n维匹配,而是具有多个节点的二分匹配特征。 (您需要提供一个函数,在两个人计算这个距离的情况下)。 应该有非常有效的算法。 在 n 维匹配中,您将尝试匹配来自两个以上集合的节点(在您的示例中,假设您可以将人切割为身体、灵魂和音乐偏好,然后将它们重新组合以形成新的人。然后 n 维匹配将将人们分开,然后重新组合,这样新设计的人就能成为非常好的情侣:D) 这是 关于 3 维匹配的维基百科文章,它是 np 完全的。
另外,正如另一个人指出的那样,如果您的目标不是将人配对,而是找到兼容的群体,那么您应该考虑将他们分组。 这可以通过例如无监督学习来完成
What you describe with your example is not n-dimentional matching, but rather bipartite matching of nodes with multiple features. (You will need to provide a function that will, given two persons compute this distance). There should be very efficient algorithms for that. In n-dimentional matching you would try to match nodes from more than two sets (in your example, suppose you could cut people to body, soul, and music preferences, then recombine them to make new persons. Then the n-dimentional matching would cut the people apart, and recombine them so that the new persons engineered would make really nice couples :D ) Here is the wikipedia article for 3-dimentional matching, which is np-complete.
Also, as noted by another, if your goal is not to match persons in pairs, but rather find compatible groups, you should consider clustering them to groups. This can be done with e.g. Unsupervised Learning