在 C 中将二维数组归零的最快方法?

发布于 2024-08-26 17:27:53 字数 358 浏览 11 评论 0原文

我想在 C 中重复将一个大的二维数组归零。这就是我目前所做的:

// Array of size n * m, where n may not equal m
for(j = 0; j < n; j++)
{
    for(i = 0; i < m; i++)
    {  
        array[i][j] = 0;
    }
}

我尝试使用 memset:

memset(array, 0, sizeof(array))

但这仅适用于一维数组。当我 printf 二维数组的内容时,第一行为零,但随后我得到了一堆随机大数字,然后它崩溃了。

I want to repeatedly zero a large 2d array in C. This is what I do at the moment:

// Array of size n * m, where n may not equal m
for(j = 0; j < n; j++)
{
    for(i = 0; i < m; i++)
    {  
        array[i][j] = 0;
    }
}

I've tried using memset:

memset(array, 0, sizeof(array))

But this only works for 1D arrays. When I printf the contents of the 2D array, the first row is zeroes, but then I got a load of random large numbers and it crashes.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(13

怎会甘心 2024-09-02 17:27:53
memset(array, 0, sizeof(array[0][0]) * m * n);

其中 m 和 n 是二维数组的宽度和高度(在您的示例中,您有一个方形二维数组,因此 m == n)。

memset(array, 0, sizeof(array[0][0]) * m * n);

Where m and n are the width and height of the two-dimensional array (in your example, you have a square two-dimensional array, so m == n).

假装不在乎 2024-09-02 17:27:53

如果 array 确实是一个数组,那么您可以使用以下方法将其“归零”:

memset(array, 0, sizeof array);

但是您应该知道两点:

让我们做一个实验:

#include <stdio.h>

void f(int (*arr)[5])
{
    printf("f:    sizeof arr:       %zu\n", sizeof arr);
    printf("f:    sizeof arr[0]:    %zu\n", sizeof arr[0]);
    printf("f:    sizeof arr[0][0]: %zu\n", sizeof arr[0][0]);
}

int main(void)
{
    int arr[10][5];
    printf("main: sizeof arr:       %zu\n", sizeof arr);
    printf("main: sizeof arr[0]:    %zu\n", sizeof arr[0]);
    printf("main: sizeof arr[0][0]: %zu\n\n", sizeof arr[0][0]);
    f(arr);
    return 0;
}

在我的机器上,上面的打印结果是:

main: sizeof arr:       200
main: sizeof arr[0]:    20
main: sizeof arr[0][0]: 4

f:    sizeof arr:       8
f:    sizeof arr[0]:    20
f:    sizeof arr[0][0]: 4

尽管 arr 是一个数组,但当传递给 f() 时,它会衰减为指向其第一个元素的指针,因此 f() 中打印的尺寸是“错误的”。另外,在 f() 中,arr[0] 的大小是数组 arr[0] 的大小,这是一个“数组int 的 [5]”。它不是 int * 的大小,因为“衰减”仅发生在第一级,这就是为什么我们需要将 f() 声明为采用指向正确大小的数组的指针。

所以,正如我所说,只有满足上述两个条件,你原来所做的事情才会起作用。如果没有,您将需要按照其他人所说的进行操作:

memset(array, 0, m*n*sizeof array[0][0]);

最后,您发布的 memset()for 循环在严格意义上并不等效。可能(并且已经)存在某些编译器,其中“所有位为零”对于某些类型(例如指针和浮点值)不等于零。但我怀疑你是否需要担心这一点。

If array is truly an array, then you can "zero it out" with:

memset(array, 0, sizeof array);

But there are two points you should know:

  • this works only if array is really a "two-d array", i.e., was declared T array[M][N]; for some type T.
  • it works only in the scope where array was declared. If you pass it to a function, then the name array decays to a pointer, and sizeof will not give you the size of the array.

Let's do an experiment:

#include <stdio.h>

void f(int (*arr)[5])
{
    printf("f:    sizeof arr:       %zu\n", sizeof arr);
    printf("f:    sizeof arr[0]:    %zu\n", sizeof arr[0]);
    printf("f:    sizeof arr[0][0]: %zu\n", sizeof arr[0][0]);
}

int main(void)
{
    int arr[10][5];
    printf("main: sizeof arr:       %zu\n", sizeof arr);
    printf("main: sizeof arr[0]:    %zu\n", sizeof arr[0]);
    printf("main: sizeof arr[0][0]: %zu\n\n", sizeof arr[0][0]);
    f(arr);
    return 0;
}

On my machine, the above prints:

main: sizeof arr:       200
main: sizeof arr[0]:    20
main: sizeof arr[0][0]: 4

f:    sizeof arr:       8
f:    sizeof arr[0]:    20
f:    sizeof arr[0][0]: 4

Even though arr is an array, it decays to a pointer to its first element when passed to f(), and therefore the sizes printed in f() are "wrong". Also, in f() the size of arr[0] is the size of the array arr[0], which is an "array [5] of int". It is not the size of an int *, because the "decaying" only happens at the first level, and that is why we need to declare f() as taking a pointer to an array of the correct size.

So, as I said, what you were doing originally will work only if the two conditions above are satisfied. If not, you will need to do what others have said:

memset(array, 0, m*n*sizeof array[0][0]);

Finally, memset() and the for loop you posted are not equivalent in the strict sense. There could be (and have been) compilers where "all bits zero" does not equal zero for certain types, such as pointers and floating-point values. I doubt that you need to worry about that though.

酒儿 2024-09-02 17:27:53

好吧,最快的方法就是根本不做。

我知道听起来很奇怪,这里有一些伪代码:

int array [][];
bool array_is_empty;


void ClearArray ()
{
   array_is_empty = true;
}

int ReadValue (int x, int y)
{
   return array_is_empty ? 0 : array [x][y];
}

void SetValue (int x, int y, int value)
{
   if (array_is_empty)
   {
      memset (array, 0, number of byte the array uses);
      array_is_empty = false;
   }
   array [x][y] = value;
}

实际上,它仍然在清除数组,但只有当有东西被写入数组时。这在这里并不是一个很大的优势。但是,如果二维数组是使用四叉树(不是动态思维)或数据行集合来实现的,那么您可以本地化布尔标志的效果,但您需要更多标志。在四叉树中,只需为根节点设置空标志,在行数组中,只需为每行设置标志。

这就引出了一个问题“为什么要重复对大型二维数组进行归零”?数组是用来做什么的?有没有办法更改代码以使数组不需要归零?

例如,如果您有:

clear array
for each set of data
  for each element in data set
    array += element 

也就是说,将其用作累积缓冲区,那么像这样更改它将会无休止地提高性能:

 for set 0 and set 1
   for each element in each set
     array = element1 + element2

 for remaining data sets
   for each element in data set
     array += element 

这不需要清除数组,但仍然有效。这将比清除数组快得多。就像我说的,最快的方法就是一开始就不这样做。

Well, the fastest way to do it is to not do it at all.

Sounds odd I know, here's some pseudocode:

int array [][];
bool array_is_empty;


void ClearArray ()
{
   array_is_empty = true;
}

int ReadValue (int x, int y)
{
   return array_is_empty ? 0 : array [x][y];
}

void SetValue (int x, int y, int value)
{
   if (array_is_empty)
   {
      memset (array, 0, number of byte the array uses);
      array_is_empty = false;
   }
   array [x][y] = value;
}

Actually, it's still clearing the array, but only when something is being written to the array. This isn't a big advantage here. However, if the 2D array was implemented using, say, a quad tree (not a dynamic one mind), or a collection of rows of data, then you can localise the effect of the boolean flag, but you'd need more flags. In the quad tree just set the empty flag for the root node, in the array of rows just set the flag for each row.

Which leads to the question "why do you want to repeatedly zero a large 2d array"? What is the array used for? Is there a way to change the code so that the array doesn't need zeroing?

For example, if you had:

clear array
for each set of data
  for each element in data set
    array += element 

that is, use it for an accumulation buffer, then changing it like this would improve the performance no end:

 for set 0 and set 1
   for each element in each set
     array = element1 + element2

 for remaining data sets
   for each element in data set
     array += element 

This doesn't require the array to be cleared but still works. And that will be far faster than clearing the array. Like I said, the fastest way is to not do it in the first place.

韵柒 2024-09-02 17:27:53

如果您真的非常注重速度(而不是太注重可移植性),我认为绝对最快的方法是使用 SIMD 向量内在函数。例如,在 Intel CPU 上,您可以使用这些 SSE2 指令:

__m128i _mm_setzero_si128 ();                   // Create a quadword with a value of 0.
void _mm_storeu_si128 (__m128i *p, __m128i a);  // Write a quadword to the specified address.

每个存储指令将在一次命中中将四个 32 位整数设置为零。

p 必须是 16 字节对齐,但此限制也有利于速度,因为它有助于缓存。另一个限制是 p 必须指向 16 字节倍数的分配大小,但这也很酷,因为它允许我们轻松展开循环。

将其放入循环中,并展开循环几次,您将获得一个疯狂的快速初始化程序:

// Assumes int is 32-bits.
const int mr = roundUpToNearestMultiple(m, 4);      // This isn't the optimal modification of m and n, but done this way here for clarity.    
const int nr = roundUpToNearestMultiple(n, 4);    

int i = 0;
int array[mr][nr] __attribute__ ((aligned (16)));   // GCC directive.
__m128i* px = (__m128i*)array;
const int incr = s >> 2;                            // Unroll it 4 times.
const __m128i zero128 = _mm_setzero_si128();

for(i = 0; i < s; i += incr)
{
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
}

还有一个 _mm_storeu 的变体可以绕过缓存(即,将数组清零不会污染缓存),在某些情况下这可以给您带来一些次要的性能优势。

有关 SSE2 参考,请参阅此处:http://msdn.microsoft.com/en -us/library/kcwz153a(v=vs.80).aspx

If you are really, really obsessed with speed (and not so much with portability) I think the absolute fastest way to do this would be to use SIMD vector intrinsics. e.g. on Intel CPUs, you could use these SSE2 instructions:

__m128i _mm_setzero_si128 ();                   // Create a quadword with a value of 0.
void _mm_storeu_si128 (__m128i *p, __m128i a);  // Write a quadword to the specified address.

Each store instruction will set four 32-bit ints to zero in one hit.

p must be 16-byte aligned, but this restriction is also good for speed because it will help the cache. The other restriction is that p must point to an allocation size that is a multiple of 16-bytes, but this is cool too because it allows us to unroll the loop easily.

Have this in a loop, and unroll the loop a few times, and you will have a crazy fast initialiser:

// Assumes int is 32-bits.
const int mr = roundUpToNearestMultiple(m, 4);      // This isn't the optimal modification of m and n, but done this way here for clarity.    
const int nr = roundUpToNearestMultiple(n, 4);    

int i = 0;
int array[mr][nr] __attribute__ ((aligned (16)));   // GCC directive.
__m128i* px = (__m128i*)array;
const int incr = s >> 2;                            // Unroll it 4 times.
const __m128i zero128 = _mm_setzero_si128();

for(i = 0; i < s; i += incr)
{
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
    _mm_storeu_si128(px++, zero128);
}

There is also a variant of _mm_storeu that bypasses the cache (i.e. zeroing the array won't pollute the cache) which could give you some secondary performance benefits in some circumstances.

See here for SSE2 reference: http://msdn.microsoft.com/en-us/library/kcwz153a(v=vs.80).aspx

木緿 2024-09-02 17:27:53

如果使用 malloc 初始化数组,请改用 calloc;它会免费将您的阵列归零。 (显然与 memset 具有相同的性能,只是代码更少。)

If you initialize the array with malloc, use calloc instead; it will zero your array for free. (Same perf obviously as memset, just less code for you.)

冷默言语 2024-09-02 17:27:53

int array[N][M] = {0};

...至少在 GCC 4.8 中。

int array[N][M] = {0};

...at least in GCC 4.8.

独孤求败 2024-09-02 17:27:53

你的二维数组是如何声明的?

如果是这样的:

int arr[20][30];

您可以通过执行以下操作将其归零:

memset(arr, sizeof(int)*20*30);

How was your 2D array declared?

If it something like:

int arr[20][30];

You can zero it by doing:

memset(arr, sizeof(int)*20*30);
蓝咒 2024-09-02 17:27:53

使用 calloc 而不是 malloc 。 calloc 会将所有字段初始化为 0。

int *a = (int *)calloc(n,size of(int)) ;

//a的所有单元格都已初始化为0

Use calloc instead of malloc . calloc will initiate all fields to 0.

int *a = (int *)calloc(n,size of(int)) ;

//all cells of a have been initialized to 0

数理化全能战士 2024-09-02 17:27:53

我认为手动执行此操作的最快方法是遵循代码。您可以将它的速度与 memset 函数进行比较,但它不应该更慢。

(如果您的数组类型与 int 不同,请更改 ptr 和 ptr1 指针的类型)


#define SIZE_X 100
#define SIZE_Y 100

int *ptr, *ptr1;
ptr = &array[0][0];
ptr1 = ptr + SIZE_X*SIZE_Y*sizeof(array[0][0]);

while(ptr < ptr1)
{
    *ptr++ = 0;
}

I think that the fastest way to do it by hand is following code. You can compare it's speed to memset function, but it shouldn't be slower.

(change type of ptr and ptr1 pointers if your array type is different then int)


#define SIZE_X 100
#define SIZE_Y 100

int *ptr, *ptr1;
ptr = &array[0][0];
ptr1 = ptr + SIZE_X*SIZE_Y*sizeof(array[0][0]);

while(ptr < ptr1)
{
    *ptr++ = 0;
}

妖妓 2024-09-02 17:27:53
memset(array, 0, sizeof(int [n][n]));
memset(array, 0, sizeof(int [n][n]));
毁我热情 2024-09-02 17:27:53

你可以试试这个

int array[20,30] = {{0}};

You can try this

int array[20,30] = {{0}};
等待圉鍢 2024-09-02 17:27:53

我没有资格发表评论,但是……

我有点同意 Skizz 的回答,即“最快的方法是[通常]不这样做”,但有一个非常重要的警告。如果您要像这样过滤访问,您确实需要考虑浪费的每个周期。

例如,当 Skizz 有:

void ClearArray ()
{
    array_is_empty = true;
}

int ReadValue (int x, int y)
{
   return array_is_empty ? 0 : array [x][y];
}

我会指出条件 ?: 是一个不必要的分支。要摆脱这个问题,您可以使用值为01的整数array_is_empty,然后.. 。

int ReadValue (int x, int y)
{
   return array_is_empty * array [x][y];
}

这会删除条件,但会添加不必要的乘法,编译器不会对其进行优化,因为它不知道 array_is_empty 是否会具有 0 或以外的值 1

因此,我建议使用全 1 的 AND 掩码。这可以是有符号类型中的“-1”表示,也可以是任何简单类型中的~0(零,位翻转)。

int ReadValue (int x, int y)
{
   return array_is_empty & array [x][y];
}

这会产生一个更干净、更快的解决方案,没有不必要的分支预测,也没有缓慢的乘法运算。 AND 的速度非常快,如果您要对数组进行大量查找,则这一点非常重要。

我要添加的最后一个警告要严重得多:

当大多数人将数组归零时,他们期望每个元素都为零。使用“单标志”方法,设置单个元素将删除过滤器并公开所有未初始化的数据......这几乎从来不是用户所期望的。这可能会在以后引起很多头痛。

另一种方法(如 Skizz 所示)是在第一次写入时将数组清零。这消除了该方法的所有潜在好处,只留下缺点和性能损失。

因此,我只会在完全初始化或根本没有初始化的数组上使用它。而且,说实话,这严重限制了它的实用性。

可以有用的地方是数组在块中使用,例如行......特别是在稀疏占用的情况下。如果每一行或每一块都有这样的标志,那么它就成为一种非常有效的方法。第一次写入时的单个通用“memset”(参见 Skizz 的示例)会消除该方法的任何潜在好处,并且是一个很大的警钟,您应该在构建时将数组归零; )

另外,还有一点,但是,就我个人而言,我会避免使用“ClearArray”之类的方法名称...我谦虚地认为“MarkUninitialized”或“MarkUnused”可能看起来很难看,但可以避免混淆。

尽管如此,这一切都归功于斯基兹的回答。当有意义的事情去做时,它就非常值得去做。避免不必要的工作与稍后优化重构一样好; )

我很抱歉回答这个问题。我只是认为值得指出,考虑到数组访问期间的额外工作有多快失控......以及 memset-on-first-write 的问题

I don't have the reputation to comment, but...

I'd somewhat agree with Skizz's answer that "the fastest way to do it is [often] not to do it" but with a very important caveat. If you're going to filter access like this you really need to consider every cycle wasted.

For example, where Skizz has:

void ClearArray ()
{
    array_is_empty = true;
}

int ReadValue (int x, int y)
{
   return array_is_empty ? 0 : array [x][y];
}

I'd point out that the conditional ?: is an unnecessary branch. To get rid of this you could use an integer array_is_empty with a value of 0 or 1 and then ...

int ReadValue (int x, int y)
{
   return array_is_empty * array [x][y];
}

And this removes the conditional but adds an unnecessary multiplication which the compiler won't optimise away since it won't know if array_is_empty will ever have a value other than 0 or 1

So, instead I suggest using an AND mask of all 1's. This could be a '-1' representation in a signed type, or just a ~0 (Zero, bit-flipped) in any simple type.

int ReadValue (int x, int y)
{
   return array_is_empty & array [x][y];
}

This results in a MUCH cleaner faster solution with no unnecessary branch prediction and no slow multiplication operations. An AND is super fast and this will be important if you're performing this filtering over a large number of lookups into an array.

The final caveat I'd add is considerably more serious:

When most people zero an array, they expect each element to be zero. With a 'single-flag' approach, setting a single element will remove the filter and expose all the uninitialised data... which is almost never what the user expects. This can cause a lot of headaches later.

The alternative (as Skizz shows) is to zero the array on first write. This undoes every potential benefit of the approach and leaves ONLY downsides and performance loss.

So, I'd use this only on arrays that are either initialised in full, or not at all. And, to be honest, that severely limits its usefulness.

Where it can come in useful is where the array is used in blocks, such as lines ... particularly when sparsely occupied. If each line or block has such a flag then it becomes a very efficient approach. A single general 'memset' on first write (see Skizz's example) undoes any potential benefit of the approach and is a big alarm bell that you should just Zero the array on construction ; )

Also, a minor point, but, personally I'd avoid method names like "ClearArray"... I humbly submit that "MarkUninitialised" -or- "MarkUnused" may seem ugly, but avoids confusion.

Still, all credit to Skizz for his answer. When it makes sense to do, it is well worth doing. Unnecessary work avoided is as good as an optimising refactor later ; )

My apologies for making this an answer. I just thought it worth pointing out, given how rapidly additional work during array-access gets out of hand... and that issue with memset-on-first-write

青丝拂面 2024-09-02 17:27:53

发生这种情况是因为 sizeof(array) 为您提供了 array 指向的对象的分配大小。 (数组只是指向多维数组第一行的指针)。但是,您分配了 j 个大小为 i 的数组。因此,您需要将 sizeof(array) 返回的一行大小乘以您分配的行数,例如:

bzero(array, sizeof(array) * j);

另请注意,sizeof(array) 仅适用于静态分配的数组。对于动态分配的数组,您可以编写

size_t arrayByteSize = sizeof(int) * i * j; 
int *array = malloc(array2dByteSite);
bzero(array, arrayByteSize);

This happens because sizeof(array) gives you the allocation size of the object pointed to by array. (array is just a pointer to the first row of your multidimensional array). However, you allocated j arrays of size i. Consequently, you need to multiply the size of one row, which is returned by sizeof(array) with the number of rows you allocated, e.g.:

bzero(array, sizeof(array) * j);

Also note that sizeof(array) will only work for statically allocated arrays. For a dynamically allocated array you would write

size_t arrayByteSize = sizeof(int) * i * j; 
int *array = malloc(array2dByteSite);
bzero(array, arrayByteSize);
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文