SIGABRT with malloc/free in C code

Posted on 2025-01-07 21:50:34

I wrote this small program to find all occurrences of a substring in a larger string, that is, a needle in a haystack. When I run the program locally, it seems to work just fine. However, when I submit it to an online contest for judging, it gives a SIGABRT error. I assumed this was due to poor memory management, so I deleted the free() calls, but then I got a Time Limit Exceeded error instead (the SIGABRT error disappeared). Does removing the free() calls slow the program down? And are there any leaks in my program?

Here is the contest I was talking about: Needle in the Haystack

Here's the code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define RAW_INPUT_SIZE 10000

#define BOOL unsigned int
#define NO 0
#define YES 1

int main (int argc, char **argv)
{
   int needleLength;
   char *rawNeedle = (char *)malloc(RAW_INPUT_SIZE);
   char *rawHaystack = (char *)malloc(RAW_INPUT_SIZE);
   char *needle; // to be allocated later
   char *haystack; // to be allocated later, but not deallocated
   while (scanf("%i\n%s\n%s", &needleLength, rawNeedle, rawHaystack) != EOF)
   {
      needle = (char *)malloc(needleLength);
      strncpy(needle, rawNeedle, needleLength);
      haystack = strchr(rawHaystack, needle[0]);
      int i = haystack - rawHaystack;
      BOOL matchesFound = NO;
      if (i + needleLength - 1 < strlen(rawHaystack))
      {
         while (haystack != NULL)
         {
            if (i + needleLength - 1 < strlen(rawHaystack))
            {
               char *substr = (char *)malloc(needleLength);
               strncpy(substr, haystack, needleLength);
               if (strcmp(needle, substr) == 0)
               {
                  printf("%i\n", i);
                  matchesFound = YES;
               }
               free(substr);
               substr = NULL;
            }
            haystack = strchr(haystack+1, needle[0]);
            i = haystack - rawHaystack;
         }
      }
      if (matchesFound == NO)
         printf("\n");
      free(needle);
      needle = NULL;
   }
   free(rawNeedle);
   free(rawHaystack);
   rawNeedle = NULL;
   rawHaystack = NULL;
   return 0;
}

Transcription of input and output specification from the question

Input

The input consists of a number of test cases. Each test case is composed of three lines, containing:

  • the length of the needle,
  • the needle itself,
  • the haystack.

The length of the needle is only limited by the memory available to your program, so do not make any assumptions - instead, read the length and allocate memory as needed. The haystack is not limited in size, which implies that your program should not read the whole haystack at once. The KMP algorithm is stream-based, i.e. it processes the haystack character by character, so this is not a problem.

The test cases come one after another, each occupying three lines, with no additional space or line breaks in between.

Output

For each test case, your program should output all positions of the needle's occurrences within the haystack. If a match is found, the output should contain the position of the first character of the match. Characters in the haystack are numbered starting with zero.

For a given test case, the positions output should be sorted in ascending order, and each of these should be printed in a separate line. For two different test cases, the positions should be separated by an empty line.

Comments (3)

沙与沫 2025-01-14 21:50:34

Why use any memory allocation? If the specification includes a maximum needle length of 10,000, simply use local arrays:

char needle[RAW_INPUT_SIZE];
char haystack[RAW_INPUT_SIZE];

Read directly into these; don't copy them around.

char *substr = (char *)malloc(needleLength);
strncpy(substr, haystack, needleLength);
if (strcmp(needle, substr) == 0)

It is not clear that your needle length includes the trailing null. Therefore, this does not allocate enough space and does not guarantee null termination, both of which can easily lead to SIGABRT problems.
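
A minimal sketch of a safer copy, assuming the needle length does not count the terminator. The helper name copy_prefix is hypothetical and not part of the original program; it only illustrates allocating one extra byte and terminating explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of at most n characters of src,
   always NUL-terminated. The caller frees the result.
   Returns NULL if the allocation fails. */
char *copy_prefix(const char *src, size_t n)
{
    char *dst = malloc(n + 1);   /* one extra byte for the terminator */
    if (dst == NULL)
        return NULL;
    strncpy(dst, src, n);        /* copies at most n chars; pads with '\0' if src is shorter */
    dst[n] = '\0';               /* strncpy adds no terminator when src has n or more chars */
    return dst;
}
```

With a copy made this way, strcmp() on the result is well defined, because both strings are guaranteed to end in '\0'.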

Repeatedly using strlen() on your haystack will make your program run slowly. You can calculate the lengths without needing to do strlen() more than once on each of the needle and haystack.
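
As a rough illustration of that point, the sketch below computes each length once and compares in place with memcmp(), so no temporary substring is allocated. The function name find_all and its output-array interface are assumptions made for this example. It is still the naive quadratic search and needs the whole haystack in memory.

```c
#include <string.h>

/* Hypothetical helper: stores the zero-based start of each occurrence of
   needle in positions[] (up to max_positions entries) and returns the
   total number of occurrences, overlapping ones included. */
size_t find_all(const char *haystack, const char *needle,
                size_t *positions, size_t max_positions)
{
    size_t hay_len = strlen(haystack);    /* computed once, outside the loop */
    size_t needle_len = strlen(needle);
    size_t found = 0;

    if (needle_len == 0 || needle_len > hay_len)
        return 0;

    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        /* memcmp compares in place: no copy and no terminator required */
        if (memcmp(haystack + i, needle, needle_len) == 0) {
            if (found < max_positions)
                positions[found] = i;
            found++;
        }
    }
    return found;
}
```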

Unless you are guaranteed no blanks in the data, your scanf() code will read less than you expect. You should always check that you get all the values you expect.
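
One way to check the conversion count, sketched with a hypothetical read_case() helper that takes a FILE * so it can be used with stdin or any other stream. The field width 9999 assumes the question's RAW_INPUT_SIZE of 10000. Note that %s still stops at the first blank, so this only detects missing fields; it does not make blanks inside the data work.

```c
#include <stdio.h>

#define RAW_INPUT_SIZE 10000

/* Hypothetical helper: returns 1 if a complete test case (length, needle,
   haystack) was read, 0 on end of input or malformed input. */
int read_case(FILE *in, int *needle_length, char *needle, char *haystack)
{
    /* %9999s stores at most RAW_INPUT_SIZE - 1 characters plus the terminator,
       so an oversized token cannot overflow the buffers. */
    int fields = fscanf(in, "%d %9999s %9999s", needle_length, needle, haystack);
    return fields == 3;   /* anything else means at least one value is missing */
}
```

A caller would loop with while (read_case(stdin, &needleLength, rawNeedle, rawHaystack)) instead of comparing the scanf() result against EOF.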

You should look up the function strstr().
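
For reference, a small sketch of how strstr() can report every occurrence, assuming the whole haystack fits in memory (which the contest statement does not guarantee). The name print_matches is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints the zero-based position of every occurrence
   of needle in haystack (overlapping ones included), one per line, and
   returns how many were found. */
size_t print_matches(const char *haystack, const char *needle)
{
    size_t count = 0;

    if (*needle == '\0')
        return 0;   /* strstr() would report an empty needle at every position */

    for (const char *p = strstr(haystack, needle);
         p != NULL;
         p = strstr(p + 1, needle)) {     /* resume one character past the last hit */
        printf("%td\n", p - haystack);    /* %td is the conversion for ptrdiff_t */
        count++;
    }
    return count;
}
```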

若能看破又如何 2025-01-14 21:50:34

I'm not sure if it's the direct cause of your problem, but one obvious issue is that you're not using strncpy correctly. strncpy does not necessarily NUL-terminate.

Also, you don't check if malloc succeeds, nor do you check if strchr succeeds.
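
A minimal sketch of both checks. The helper names first_index_of and alloc_buffer are hypothetical, and exiting on allocation failure is just one possible policy.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: index of the first occurrence of c in s, or -1. */
ptrdiff_t first_index_of(const char *s, char c)
{
    const char *p = strchr(s, c);
    if (p == NULL)
        return -1;   /* not found: computing NULL - s would be undefined behavior */
    return p - s;
}

/* Hypothetical helper: malloc that reports failure instead of returning NULL. */
char *alloc_buffer(size_t n)
{
    char *p = malloc(n);
    if (p == NULL) {
        fprintf(stderr, "out of memory (%zu bytes requested)\n", n);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

In the question's code, the line int i = haystack - rawHaystack; runs before haystack is compared with NULL, which is the situation first_index_of() guards against.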

如痴如狂 2025-01-14 21:50:34

You are probably overwriting crucial memory.

The length of the needle is only limited by the memory available to your program, so do not make any assumptions - instead, read the length and allocate memory as needed. The haystack is not limited in size, which implies that your program should not read the whole haystack at once. The KMP algorithm is stream-based, i.e. it processes the haystack character by character, so this is not a problem.

You can count on some inputs being longer than 10000 chars. That means you're using unallocated memory, with unpredictable consequences.

More predictable is that - as already mentioned by Jonathan Leffler - your substr will generally not be 0-terminated, so the strcmp can only return 0 if perchance substr is immediately followed by a '\0' character, thus you're likely to miss occurrences of needle in haystack (and likely to clobber haystack in the course of your algorithm).

And your algorithm is the naive algorithm (somewhat enhanced by scanning specifically for the first character of needle), which is probably too slow:

However, a naive approach will probably exceed the time limit, whereas other algorithms are more complicated... The choice is yours.

You should for each test case

  1. read the length of the needle
  2. allocate enough space for the needle (including the 0-terminator)
  3. read in the needle
  4. read in the haystack in chunks, interleaved with scanning for the needle
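
Steps 1 to 3 might look roughly like the sketch below. The function read_needle is a hypothetical name, and the code assumes the needle contains exactly as many characters as the length line announces.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads "<length>\n<needle>" from in and returns a
   heap-allocated, NUL-terminated needle (caller frees), storing its length
   in *length. Returns NULL on EOF, malformed input, or allocation failure. */
char *read_needle(FILE *in, size_t *length)
{
    size_t len;
    int c;

    if (fscanf(in, "%zu", &len) != 1)
        return NULL;                      /* EOF or not a number */

    char *needle = malloc(len + 1);       /* +1 for the terminating '\0' */
    if (needle == NULL)
        return NULL;

    /* Skip the line break between the length and the needle. */
    while ((c = getc(in)) != EOF && isspace(c))
        ;
    if (c != EOF)
        ungetc(c, in);

    if (fread(needle, 1, len, in) != len) {   /* short read: truncated input */
        free(needle);
        return NULL;
    }
    needle[len] = '\0';
    *length = len;
    return needle;
}
```

Because the buffer is sized from the length that was actually read, no fixed limit such as 10000 is baked in.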

SPOJ recommends using the KMP algorithm, not without reason. Using the Boyer-Moore algorithm is a good alternative, but makes handling matches that cross chunk boundaries more complicated.
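
To make the stream-based idea concrete, here is a sketch of KMP written as a small state machine that consumes the haystack one character at a time. The Kmp struct and the names kmp_init, kmp_feed, and kmp_scan_line are invented for this example, and it has not been run against the judge.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *needle;
    size_t len;
    size_t *fail;      /* fail[i]: length of the longest proper prefix of
                          needle[0..i] that is also a suffix of it */
    size_t matched;    /* number of needle characters currently matched */
} Kmp;

/* Builds the failure table. Returns 1 on success, 0 on an empty needle
   or allocation failure. Release k->fail with free() when done. */
int kmp_init(Kmp *k, const char *needle, size_t len)
{
    if (len == 0)
        return 0;
    k->needle = needle;
    k->len = len;
    k->matched = 0;
    k->fail = malloc(len * sizeof *k->fail);
    if (k->fail == NULL)
        return 0;
    k->fail[0] = 0;
    for (size_t i = 1, j = 0; i < len; i++) {
        while (j > 0 && needle[i] != needle[j])
            j = k->fail[j - 1];           /* fall back to a shorter border */
        if (needle[i] == needle[j])
            j++;
        k->fail[i] = j;
    }
    return 1;
}

/* Feeds one haystack character. Returns 1 if an occurrence of the needle
   ends at this character, 0 otherwise. */
int kmp_feed(Kmp *k, char c)
{
    while (k->matched > 0 && c != k->needle[k->matched])
        k->matched = k->fail[k->matched - 1];
    if (c == k->needle[k->matched])
        k->matched++;
    if (k->matched == k->len) {
        k->matched = k->fail[k->len - 1];  /* keep going to find overlapping matches */
        return 1;
    }
    return 0;
}

/* Scans one haystack line from in without storing it, printing the
   zero-based start of each match. Returns the number of matches. */
size_t kmp_scan_line(Kmp *k, FILE *in)
{
    size_t pos = 0, count = 0;
    int c;

    k->matched = 0;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (kmp_feed(k, (char)c)) {
            printf("%zu\n", pos + 1 - k->len);
            count++;
        }
        pos++;
    }
    return count;
}
```

Only the needle and its failure table are kept in memory, so the haystack length is irrelevant to memory use, and each haystack character is handled in amortized constant time.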
