Grouping similar values in Matlab
If I have an array that contains the values [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247]
and I want 4 groups of similar numbers so that the output is 4 matrices:
[6247]
[6712, 6718]
[7023]
[7510, 7509, 7514, 7509]
What would be the best way to accomplish this?
I believe the term you are looking for is clustering. For example, we can apply the Kmeans algorithm to group the data into 4 clusters:
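A minimal sketch of that idea, assuming the Statistics and Machine Learning Toolbox is available (it provides kmeans). The variable names x, k, idx and groups are illustrative choices, not taken from the original answer. Because kmeans starts from random initial centroids, the grouping can differ from run to run.

```matlab
% Sketch only: kmeans requires the Statistics and Machine Learning Toolbox.
x = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
k = 4;                          % number of groups requested in the question

% kmeans expects one observation per row, so pass a column vector.
% idx(i) is the cluster label (1..k) assigned to x(i).
idx = kmeans(x(:), k);

% Collect the members of each cluster into a cell array.
groups = cell(1, k);
for c = 1:k
    groups{c} = x(idx == c);
end
celldisp(groups)
```

The cluster labels themselves are arbitrary, so cluster 1 is not necessarily the group with the smallest values.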
This is one of the results I get:
Usually this works best with more points (it also works for multidimensional data).
Actually, for your specific case, there's really no need for any kind of complicated (and quite incomprehensible) clustering procedure, nor for any (seemingly simple) explicit sorting-based solution.
Assuming that values close to each other (more or less, like abs(x - x_0) <= 50) define the groups of interest, why not just proceed in a very simple and straightforward manner? By utilizing the 'most natural' proximity of your values to each other, you could simply proceed as follows:
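One way this could look, as a sketch rather than the answerer's own listing: take the first unassigned value as the reference x_0, collect everything within the tolerance of it, and repeat on what is left. The names tol, remaining, inGroup and groups are assumptions made for illustration, and 50 is the example tolerance from the text.

```matlab
x = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
tol = 50;                        % assumed closeness threshold

remaining = x;                   % values not yet assigned to a group
groups = {};                     % grows by one cell per group found
while ~isempty(remaining)
    x0 = remaining(1);                        % reference value for this group
    inGroup = abs(remaining - x0) <= tol;     % everything within tol of x0
    groups{end+1} = remaining(inGroup);
    remaining = remaining(~inGroup);
end
celldisp(groups)
```

For the sample data this yields four groups, in order of first appearance and with the original ordering kept inside each group. Note that the result depends on which value ends up as the reference, so it is only reliable when each group is narrower than the tolerance and the groups are well separated.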
What do you mean by "similar"? For instance, why is 6718 not similar to 7023? Do we mean "difference < N between consecutive ints in a group"?
If so, sort the array and then step through it, identifying boundaries where you need them (i.e., when the difference is too great). Then simply split off a new array.
Such as...
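A minimal sketch of the sort-and-step approach, assuming N = 50 as the largest gap allowed between consecutive sorted values in one group. The names N, s and groups are illustrative and not from the original answer.

```matlab
x = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
N = 50;                          % assumed maximum gap inside a group

s = sort(x);
groups = {s(1)};                 % start the first group with the smallest value
for i = 2:numel(s)
    if s(i) - s(i-1) < N
        groups{end}(end+1) = s(i);   % gap is small: extend the current group
    else
        groups{end+1} = s(i);        % gap is too large: start a new group
    end
end
celldisp(groups)
```

Since the array is sorted first, the groups come out in ascending order, and so do the values within each group.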
You have to first decide what the criterion is for determining the boundaries of your groups. For example, you could set a threshold value of 50, so any value that differs from its nearest larger or smaller value by more than 50 is considered to be in a different group.
You can solve this in a vectorized way by first sorting the array using the function SORT, then finding the indices into the sorted array where the differences between neighboring values are greater than your threshold (i.e. where the group boundaries are) using the functions DIFF and FIND. Taking the differences between these indices (again using the function DIFF) gives you a vector of sizes for each group, which can be used to break the sorted array into a cell array using the function MAT2CELL. Here's what the code would look like:
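A minimal sketch of the steps just described, using the example threshold of 50. Apart from groupArray, which the text names, the variable names (array, threshold, sortedArray, isBoundary, boundaryIdx, nPerGroup) are assumptions chosen for readability.

```matlab
array = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
threshold = 50;                              % example threshold from the text

sortedArray = sort(array);
% Logical mask marking the start of each group, with sentinels at both ends:
% position 1 always starts a group, and the trailing true marks one past the end.
isBoundary = [true, diff(sortedArray) > threshold, true];
boundaryIdx = find(isBoundary);
nPerGroup = diff(boundaryIdx);               % number of values in each group
groupArray = mat2cell(sortedArray, 1, nPerGroup);
```

The two sentinel entries make the group sizes add up to the full length of the array, which mat2cell requires.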
The result, groupArray, will be a 1-by-4 cell array where each cell contains the values for one group. For the above example with a threshold of 50, its contents are [6247], [6712 6718], [7023], and [7509 7509 7510 7514].