Grouping similar values in Matlab

Posted on 2024-11-26 09:55:28

If I have an array that contains the values [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247] and I want 4 groups of similar numbers so that the output is 4 matrices:

[6247]
[6712, 6718]
[7023]
[7510, 7509, 7514, 7509]

What would be the best way to accomplish this?

Comments (4)

幸福丶如此 2024-12-03 09:55:28

I believe the term you are looking for is clustering. For example, we can apply the Kmeans algorithm to group the data into 4 clusters:

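X = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
% kmeans treats each row as one observation, so pass the data as a column
[IDX,C] = kmeans(X(:), 4, 'EmptyAction','singleton');
G = cell(4,1);
for i=1:4
    G{i} = X(IDX==i);   % values assigned to cluster i
end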

This is one of the results I get:

>> G{:}
ans =
        7510        7509        7514        7509
ans =
        7023
ans =
        6247
ans =
        6712        6718

Usually this works best with more points (it also works for multidimensional data).
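
Because k-means starts from random centroids, the order of the groups can change from run to run. The following is a minimal sketch of one way to make the output repeatable and ordered by value. It assumes the Statistics and Machine Learning Toolbox is available; the seed value, the number of replicates, and the variable name `order` are illustrative choices, not part of the answer above.

```matlab
X = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];

rng(0);  % assumed seed, only to make repeated runs give the same result

% Run k-means several times from different starts and keep the best fit
[IDX, C] = kmeans(X(:), 4, 'EmptyAction', 'singleton', 'Replicates', 10);

% Sort the cluster labels by ascending centroid so G{1} holds the smallest values
[~, order] = sort(C);
G = cell(1, 4);
for i = 1:4
    G{i} = X(IDX == order(i));
end
```

Ordering by centroid is only a convenience for reading the result; the cluster assignment itself still comes from `kmeans`.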

弥繁 2024-12-03 09:55:28

Actually, for your specific case there is no real need for a complicated (and rather opaque) clustering procedure, nor for an explicit sorting-based solution, however simple it may look.

Assuming that closeness of the values to each other (roughly abs(x - x_0) <= 50) is what defines the groups of interest, why not proceed in a very simple and straightforward way?

By exploiting this natural proximity of the values, you can bin them with a width of 50 and collect the values that share a bin:

>>> x= [6712 7023 7510 7509 6718 7514 7509 6247]; g= round(x/ 50)
g =
   134 140 150 150 134 150 150 125

>>> groups= {}; for g_u= unique(g), groups{end+ 1}= x(g_u== g); end
>>> groups
groups =
{
  [1,1] =  6247
  [1,2] =  6712 6718
  [1,3] =  7023
  [1,4] =  7510 7509 7514 7509
}
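
One thing to keep in mind with fixed-width binning is that two values can be very close and still land in different bins when they straddle a bin edge. The sketch below illustrates this with two hypothetical values (they are not from the question) and the same bin width of 50.

```matlab
w = 50;                        % bin width used in the answer above
a = [6724 6726];               % hypothetical values that are only 2 apart

binA = round(a / w);           % 134.48 rounds to 134, 134.52 rounds to 135
sameBin = binA(1) == binA(2);  % false: the pair is split across two bins
```

For the data in the question this does not happen, since every group sits well inside a single bin.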
七颜 2024-12-03 09:55:28

What do you mean by "similar"? For instance, why is 6718 not similar to 7023? Do we mean "difference < N between consecutive ints in a group"?

If so, sort the array and then step through it, identifying boundaries where you need them (i.e., when the difference is too great). Then simply split off a new array.

Such as...

  GroupSimilar(values, diff)
     result := list()
     values' := sort(values)
     temp := list()
     for i := 1 to |values'| do
        temp.add(values'[i])
        // close the current group at the last value,
        // or when the gap to the next value exceeds diff
        if i = |values'| or values'[i+1] - values'[i] > diff then
           result.add(temp)
           temp := list()
     return result
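
As a rough MATLAB rendering of the sort-and-split idea above, the following sketch walks the sorted values and closes a group whenever the gap to the next value is too large. The threshold of 50 and the names `maxDiff`, `sorted`, `temp`, and `result` are assumptions chosen for illustration.

```matlab
values  = [6712 7023 7510 7509 6718 7514 7509 6247];
maxDiff = 50;                 % assumed threshold for "similar"

sorted = sort(values);
result = {};                  % finished groups
temp   = [];                  % group currently being built

for i = 1:numel(sorted)
    temp(end+1) = sorted(i);  %#ok<SAGROW> append the current value
    % Close the group at the last value, or when the next gap is too large.
    % The short-circuit || prevents indexing past the end of the array.
    if i == numel(sorted) || sorted(i+1) - sorted(i) > maxDiff
        result{end+1} = temp; %#ok<SAGROW>
        temp = [];
    end
end
```

With this data the loop produces four groups, matching the grouping asked for in the question (each group in ascending order).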
拧巴小姐 2024-12-03 09:55:28

You first have to decide what the criterion is for determining the boundaries of your groups. For example, you could set a threshold value of 50, so any values that differ by more than 50 from their nearest larger or smaller value are considered to be in a different group.

You can solve this in a vectorized way by first sorting the array using the function SORT, then finding the indices into the sorted array where the differences between neighboring values are greater than your threshold (i.e. where the group boundaries are) using the functions DIFF and FIND. Taking the differences between these indices (again using the function DIFF) gives you a vector of sizes for each group, which can be used to break the sorted array into a cell array using the function MAT2CELL. Here's what the code would look like:

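threshold = 50;
array = [6712 7023 7510 7509 6718 7514 7509 6247];
sortedArray = sort(array);
% Mark group boundaries (gaps larger than the threshold), padded with a
% boundary at each end, then take the distance between boundaries
nPerGroup = diff(find([1 (diff(sortedArray) > threshold) 1]));
% Split the sorted values into cells of the computed sizes
groupArray = mat2cell(sortedArray,1,nPerGroup);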

And groupArray will be a 1-by-4 cell array where each cell contains a set of values for a group. Here are the contents of groupArray for the above example:

>> groupArray{:}

ans =

        6247

ans =

        6712        6718

ans =

        7023

ans =

        7509        7509        7510        7514
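
If you also need to know which group each element of the original (unsorted) array belongs to, one possible variation is sketched below. It uses the second output of SORT together with CUMSUM; the names `sortIndex`, `labels`, and `groupId` are illustrative and not part of the code above.

```matlab
threshold = 50;
array = [6712 7023 7510 7509 6718 7514 7509 6247];

% sortIndex(k) is the position in array of the k-th smallest value
[sortedArray, sortIndex] = sort(array);

% Start at group 1 and increment the label at every gap above the threshold
labels = cumsum([1, diff(sortedArray) > threshold]);

% Scatter the labels back to the original element order
groupId = zeros(size(array));
groupId(sortIndex) = labels;
```

Here `groupId(k)` is the group number of `array(k)`, with groups numbered in ascending order of their values.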