C代码-内存访问/抢占

发布于 2024-11-18 19:12:46 字数 588 浏览 9 评论 0原文

我写了一段代码，其中有一个数据：

unsigned char buf[4096]; // data in chunks of size 4k
unsigned counter[256];

我将每 3 个连续字节的 i/p 数据相加并存储 ans。例如：温度[4096]；临时[0] = buf[0] + buf[1] + buf[2]; ...直到 4096

然后使用以下代码从 temp 的结果生成直方图：

for(i = 0; i < 4096; i++)
counter[temp[i]]++;

对直方图进行排序（冒泡排序），然后取前 8 个最常出现的值。代码在linux内核（2.6.35）中运行

我面临的问题是，如果我删除排序部分，执行代码所需的时间非常快（在我的笔记本电脑上为6微秒，使用gettimeofday func测量）。但引入排序后，这个过程大大减慢了（44微秒）。排序功能本身需要 20 微秒，我不明白为什么时间会增加这么多。我使用cachegrind进行了内存分析，结果是正常的，我什至尝试禁用抢占，但仍然没有显示出任何差异。如果有人可以在这里帮助我。谢谢！

原文

I have written a piece of code wherein a data:

unsigned char buf[4096]; // data in chunks of size 4k
unsigned counter[256];

I am adding up the i/p data for every 3 contiguous bytes and storing the ans.
ex: temp[4096]; temp[0] = buf[0] + buf[1] + buf[2]; ... till 4096

Then a histogram is generated from the results of temp using the code:

for(i = 0; i < 4096; i++)
counter[temp[i]]++;

The histogram is sorted (bubble sort) and then top 8 most recurring values are taken. The code is run in the linux kernel (2.6.35)

The problem I am facing is that if I remove the sorting part, the time taken to execute the code is very fast (6 microsec on my laptop, measured using gettimeofday func). But after introducing the sorting, the process slows down to a great extent (44 microsec). The sorting function itself takes 20 microsecs, I cant understand why is the time then increasing so much. I did a memory analysis using cachegrind, the results are normal and I even tried disabling preemption ubut still it doesnt show any difference. If anybody can help me out over here. Thanks!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

小嗲 2024-11-25 19:12:46

冒泡排序很慢，它会比较和交换您的值最多 4096*4096 = 16,777,216 次。如果您只需要 8 个最佳值，那么 1 次扫描选择肯定会更快。类似的事情。

 const uint_t n = 8;
 uint_t best[n] = {0};
 uint_t index[n] = {0};
 uint_t j;

 for(uint_t i=0; i<4096; i++) {

   if(counter[i] > best[n-1]) {
     for(j=n-2; j && counter[i] > best[j]; j--);           /* Find the insertion position, as our value might be bigger than the value at position n-1. */
     memmove(&best [j+1], &best[j] , (n-1 -j) * sizeof best[0]);      /* Shift the values beyond j up 1  */
     memmove(&index[j+1], &index[j], (n-1 -j) * sizeof index[0]);
     best[j] = counter[i];                                 /* Put the current best value at the top */
     index[j] = i;                                         /* Store the index in the second array to know where the best value was. */
   }
 }

这样，您只需比较一次值，并且 memmove 的成本可以忽略不计，因为您的选择数组很小。
无需对数组进行排序，该算法的时间复杂度为 O(nm)，其中 n 是数组的大小，m 是选择的大小。最好的排序是 O((n.log2 n).m)。因此，如果 m 很小并且 n 很大，那么任何通用排序算法都无法击败它。

编辑：我添加了索引数组。

EDIT2：引入第二个来纠正我在第一个实例中遇到的基本错误。

EDIT3：注释：大小为 0 的 memmove 是允许的，并且基本上是一个 nop。

Bubble sort is slow, it compares and swaps your values up to 4096*4096 = 16,777,216 times. If you need only the 8 best values, a 1 sweep selection is certainly faster. Something like that.

 const uint_t n = 8;
 uint_t best[n] = {0};
 uint_t index[n] = {0};
 uint_t j;

 for(uint_t i=0; i<4096; i++) {

   if(counter[i] > best[n-1]) {
     for(j=n-2; j && counter[i] > best[j]; j--);           /* Find the insertion position, as our value might be bigger than the value at position n-1. */
     memmove(&best [j+1], &best[j] , (n-1 -j) * sizeof best[0]);      /* Shift the values beyond j up 1  */
     memmove(&index[j+1], &index[j], (n-1 -j) * sizeof index[0]);
     best[j] = counter[i];                                 /* Put the current best value at the top */
     index[j] = i;                                         /* Store the index in the second array to know where the best value was. */
   }
 }

With that, you compare your values only once and the cost of the memmove is negligible because your selection array is small.
No need to sort the array, this algo is O(nm) with n the size of your array and m the size of your selection. The best sort would be O((n.log2 n).m). So if m is small and n is big, it is unbeatable by any generic sort algorithm.

EDIT: I added the array for the index.

EDIT2: Introduced second to correct the fundamental bug I had in first instance.

EDIT3: Comment: memmove with size 0 is allowed and is basically a nop.

回复收藏 0 原文