How do I accumulate a data set?

Posted on 2024-12-27 22:50:44

I have a vector with values between 1 and N > 1. Some values may occur multiple times consecutively. I want to collapse each run of consecutive equal entries into a single entry and add a second column that counts how many entries were in the run, e.g.:

A = [1 2 1 1 3 2 4 4 1 1 1 2]'

would lead to:

     1     1
     2     1
     1     2
     3     1
     2     1
     4     2
     1     3
     2     1

(As you can see, the second column contains the number of consecutive entries.)
I recently came across accumarray() in MATLAB, but I can't find a solution to this task with it, since it always considers the whole vector rather than only consecutive entries.

Any ideas?

孤君无依 2025-01-03 22:50:44

This probably isn't the most readable or elegant way of doing it, but if you have large vectors and speed is an issue, this vectorised approach may help.

A = [1 2 1 1 3 2 4 4 1 1 1 2];

First I'm going to pad A with a leading and a trailing zero to capture the first and final transitions. This works because the values in A lie between 1 and N, so 0 never occurs in the data:

>> A = [0, A, 0];

The transition locations can be found where the difference between neighbouring values is not equal to zero:

>> locations = find(diff(A)~=0);

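But because we padded the start of A with a zero, the first transition belongs to the padding rather than to a real segment, so we only take locations(2:end). Each of these indices points at the last element of a segment in the padded A, so the values of A there are the segment values: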

>> first_column = A(locations(2:end))

first_column =

     1     2     1     3     2     4     1     2

That's the first column. Now to find the count of each number: this is the difference between successive locations. This is where padding A at both ends becomes important:

>> second_column = diff(locations)

second_column =

 1     1     2     1     1     2     3     1
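
To see why the trailing zero matters, here is a small sketch that compares the counts with and without it. The variable names (`counts_no_tail`, `counts_padded`, and so on) are made up for this illustration and are not part of the answer's code. It also relies on the question's statement that all values are at least 1, so a padding value of 0 always differs from its neighbour.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2];

% Leading zero only: the end of the final run is never detected,
% so diff() of the locations yields one count too few.
locations_no_tail = find(diff([0, A]) ~= 0);
counts_no_tail = diff(locations_no_tail);

% Leading and trailing zero: every run has both a start and an end
% transition, so every run gets a count.
locations_padded = find(diff([0, A, 0]) ~= 0);
counts_padded = diff(locations_padded);
```

With only the leading zero, the count for the last run (the single trailing 2) is missing, while the fully padded version returns one count per run.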

Finally combining:

B = [first_column', second_column']

B =

 1     1
 2     1
 1     2
 3     1
 2     1
 4     2
 1     3
 2     1

This can all be combined into one less-readable line:

>> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
>> B = [A(find(diff([A; 0]) ~= 0)), diff(find(diff([0; A; 0])))]

B =

 1     1
 2     1
 1     2
 3     1
 2     1
 4     2
 1     3
 2     1
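
Since the question asks about accumarray(), here is a hedged sketch of one way it could be used for this task. It is an alternative to the diff/find approach above, not the method of this answer. The idea is to give each run its own index with cumsum() and then let accumarray() count the entries per index. The names `is_start`, `run_id`, `values` and `counts` are illustrative, and the sketch assumes A is a non-empty column vector.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2]';

% true wherever a new run begins (the first element always starts a run)
is_start = [true; diff(A) ~= 0];

% Label each element with the index of the run it belongs to:
% 1 2 3 3 4 5 6 6 7 7 7 8
run_id = cumsum(is_start);

values = A(is_start);            % one value per run
counts = accumarray(run_id, 1);  % number of elements carrying each run index

B = [values, counts];
```

Because accumarray() now groups by run index instead of by value, repeated values in separate runs are counted separately. As a sanity check, `repelem(B(:,1), B(:,2))` should reproduce the original A.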

雨轻弹 2025-01-03 22:50:44

I don't see another way than looping through the data set, but it is rather straightforward. Maybe this is not the most elegant solution, but as far as I can see, it works fine.

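function B = accum_data_set(A)
    % Collapse runs of equal consecutive values in the vector A.
    % Each row of B is [value, run_length].
    B = [];
    if isempty(A)
        return;                     % nothing to collapse
    end
    prev = A(1);                    % value of the current run
    count = 1;                      % length of the current run
    for i = 2:length(A)
        if prev == A(i)
            count = count + 1;      % same value: extend the run
        else
            B = [B; prev count];    % run ended: record it
            count = 1;
        end
        prev = A(i);
    end
    B = [B; prev count];            % record the final run
end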

Output:

>> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
>> B = accum_data_set(A)

B =

     1     1
     2     1
     1     2
     3     1
     2     1
     4     2
     1     3
     2     1
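
The loop above grows B by one row each time a run ends, which can become slow for long vectors. A possible variation, sketched here as a plain script, preallocates B for the worst case and trims it afterwards. The counter `k` is introduced only for this illustration, and the sketch assumes A is a non-empty vector.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2]';

n = numel(A);
B = zeros(n, 2);          % worst case: every entry is its own run
k = 1;                    % row of B holding the current run
B(1, :) = [A(1), 1];      % the first element starts the first run

for i = 2:n
    if A(i) == B(k, 1)
        B(k, 2) = B(k, 2) + 1;   % same value: extend the current run
    else
        k = k + 1;
        B(k, :) = [A(i), 1];     % different value: start a new run
    end
end

B = B(1:k, :);            % drop the unused preallocated rows
```

The result is the same matrix as the output shown above. The only difference is that memory for B is allocated once instead of on every run boundary.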
