SIGABRT with malloc/free in C code
I wrote this small program to find all occurrences of a substring in a larger string, or a needle in a haystack. When I run the program locally, it seems to work just fine. However, when I submit it to an online contest for judging, it gives a SIGABRT error. I assumed it was because of poor memory management, so I deleted the free() function calls, but then I got a Time Limit Exceeded error (although the SIGABRT error disappeared). Does removing the free() calls slow the program? And are there any leaks in my program?
Here is the contest I was talking about: Needle in the Haystack
Here's the code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define RAW_INPUT_SIZE 10000
#define BOOL unsigned int
#define NO 0
#define YES 1
int main (int argc, char **argv)
{
    int needleLength;
    char *rawNeedle = (char *)malloc(RAW_INPUT_SIZE);
    char *rawHaystack = (char *)malloc(RAW_INPUT_SIZE);
    char *needle; // to be allocated later
    char *haystack; // to be allocated later, but not deallocated
    while (scanf("%i\n%s\n%s", &needleLength, rawNeedle, rawHaystack) != EOF)
    {
        needle = (char *)malloc(needleLength);
        strncpy(needle, rawNeedle, needleLength);
        haystack = strchr(rawHaystack, needle[0]);
        int i = haystack - rawHaystack;
        BOOL matchesFound = NO;
        if (i + needleLength - 1 < strlen(rawHaystack))
        {
            while (haystack != NULL)
            {
                if (i + needleLength - 1 < strlen(rawHaystack))
                {
                    char *substr = (char *)malloc(needleLength);
                    strncpy(substr, haystack, needleLength);
                    if (strcmp(needle, substr) == 0)
                    {
                        printf("%i\n", i);
                        matchesFound = YES;
                    }
                    free(substr);
                    substr = NULL;
                }
                haystack = strchr(haystack+1, needle[0]);
                i = haystack - rawHaystack;
            }
        }
        if (matchesFound == NO)
            printf("\n");
        free(needle);
        needle = NULL;
    }
    free(rawNeedle);
    free(rawHaystack);
    rawNeedle = NULL;
    rawHaystack = NULL;
    return 0;
}
Transcription of input and output specification from the question
Input
The input consists of a number of test cases. Each test case is composed of three lines, containing:
- the length of the needle,
- the needle itself,
- the haystack.
The length of the needle is only limited by the memory available to your program, so do not make any assumptions - instead, read the length and allocate memory as needed. The haystack is not limited in size, which implies that your program should not read the whole haystack at once. The KMP algorithm is stream-based, i.e. it processes the haystack character by character, so this is not a problem.
The test cases come one after another, each occupying three lines, with no additional space or line breaks in between.
Output
For each test case, your program should output all positions of the needle's occurrences within the haystack. If a match is found, the output should contain the position of the first character of the match. Characters in the haystack are numbered starting with zero.
For a given test case, the positions output should be sorted in ascending order, and each of these should be printed in a separate line. For two different test cases, the positions should be separated by an empty line.
Why use any memory allocation? If the specification includes a maximum needle length of 10,000, simply use local arrays. Read directly into these; don't copy them around.
It is not clear that your needle length includes the trailing null. Therefore, malloc(needleLength) followed by strncpy(needle, rawNeedle, needleLength) does not allocate enough space and does not guarantee null termination, both of which can easily lead to SIGABRT problems.
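To make the point about space and termination concrete, here is a minimal sketch of a copy that reserves one extra byte and terminates the result explicitly. The helper name copy_terminated is hypothetical and not part of the question's code; it assumes n is the number of characters wanted, excluding the terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the question): returns a freshly allocated,
 * NUL-terminated copy of at most the first n characters of src, or NULL if
 * malloc fails. The caller is responsible for calling free() on the result. */
char *copy_terminated(const char *src, size_t n)
{
    char *dst = malloc(n + 1);   /* one extra byte for the terminator */
    if (dst == NULL)
        return NULL;
    strncpy(dst, src, n);        /* copies at most n bytes; pads with '\0' if src is shorter */
    dst[n] = '\0';               /* strncpy adds no terminator when src has n or more characters */
    return dst;
}
```

With a copy like this, strcmp on the result is well defined, because the buffer always ends in '\0'.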
Repeatedly using strlen() on your haystack will make your program run slowly. You can calculate the lengths without needing to call strlen() more than once on each of the needle and haystack.
Unless you are guaranteed no blanks in the data, your scanf() code will read less than you expect. You should always check that you get all the values you expect.
You should look up the function strstr().
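As a rough illustration of the strstr() suggestion, the sketch below reports every occurrence of a needle in a NUL-terminated haystack without copying substrings or calling strlen() in a loop. The function name print_occurrences and its FILE * parameter are assumptions made for the example. It only works when the whole haystack fits in memory, and strstr() is not guaranteed to run in linear time, so it may still be too slow for the contest.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints the zero-based index of every occurrence of
 * needle in haystack (overlapping ones included), one per line, to out.
 * Returns the number of occurrences found. Both strings must be
 * NUL-terminated. */
size_t print_occurrences(const char *haystack, const char *needle, FILE *out)
{
    size_t count = 0;
    if (needle[0] == '\0')       /* an empty needle would match everywhere; skip it */
        return 0;
    const char *p = strstr(haystack, needle);
    while (p != NULL) {
        fprintf(out, "%td\n", p - haystack);   /* %td prints a ptrdiff_t */
        ++count;
        p = strstr(p + 1, needle);  /* step one character so overlapping matches are found */
    }
    return count;
}
```

A caller would typically test that scanf() returned 3 (the number of conversions requested) rather than comparing against EOF, and would give each %s a field width one less than the buffer size so that long input cannot overrun the arrays.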
I'm not sure if it's the direct cause of your problem, but one obvious issue is that you're not using strncpy correctly. strncpy does not necessarily NUL-terminate.
Also, you don't check if malloc succeeds, nor do you check if strchr succeeds.
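The strchr point matters because the question's code subtracts pointers right after the call, and that subtraction is undefined when strchr returns NULL. A small sketch of a checked lookup follows; the name next_index_of and the -1 convention are assumptions for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first occurrence of c in s
 * at or after position from, or -1 if there is none.
 * Assumes s is NUL-terminated and from <= strlen(s). */
long next_index_of(const char *s, size_t from, char c)
{
    const char *p = strchr(s + from, c);
    if (p == NULL)               /* check before doing any pointer arithmetic */
        return -1;
    return (long)(p - s);
}
```

The same pattern applies to malloc: test the returned pointer against NULL before the first write through it.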
You are probably overwriting crucial memory.
You can count on some inputs being longer than 10000 chars. That means you're using unallocated memory, with unpredictable consequences.
More predictable is that, as already mentioned by Jonathan Leffler, your substr will generally not be 0-terminated, so the strcmp can only return 0 if substr happens to be immediately followed by a '\0' character. You are therefore likely to miss occurrences of needle in haystack (and likely to clobber haystack in the course of your algorithm).
Your algorithm is also the naive algorithm (somewhat enhanced by scanning specifically for the first character of needle), and that is probably too slow.
For each test case, you should read the haystack in chunks, interleaved with scanning for the needle.
SPOJ recommends using the KMP algorithm, not without reason. Using the Boyer-Moore algorithm is a good alternative, but makes handling matches that cross chunk boundaries more complicated.
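For reference, here is a minimal sketch of a stream-based KMP matcher of the kind the problem statement hints at. The names kmp_build and kmp_step are hypothetical, and the sketch assumes the needle has already been read into memory with a known length m > 0; it is an illustration under those assumptions, not a full contest submission.

```c
#include <stddef.h>

/* Fills fail[0..m-1] with the KMP failure function of needle: fail[i] is the
 * length of the longest proper prefix of needle[0..i] that is also a suffix
 * of it. fail must have room for m entries. */
void kmp_build(const char *needle, size_t m, size_t *fail)
{
    size_t k = 0;                /* length of the currently matched prefix */
    if (m == 0)
        return;
    fail[0] = 0;
    for (size_t i = 1; i < m; ++i) {
        while (k > 0 && needle[i] != needle[k])
            k = fail[k - 1];     /* fall back to the next shorter border */
        if (needle[i] == needle[k])
            ++k;
        fail[i] = k;
    }
}

/* Consumes one haystack character c. state is the number of needle
 * characters matched so far (0..m); the return value is the new state.
 * A return value equal to m means a match ends at this character.
 * Assumes m > 0 and that fail was filled by kmp_build. */
size_t kmp_step(const char *needle, const size_t *fail, size_t m,
                size_t state, char c)
{
    if (state == m)              /* the previous character completed a match */
        state = fail[m - 1];
    while (state > 0 && c != needle[state])
        state = fail[state - 1];
    if (c == needle[state])
        ++state;
    return state;
}
```

A driver could read the haystack one character at a time (for example with getchar() until a newline or EOF), call kmp_step on each character, and print i + 1 - m whenever the returned state equals m, where i is the zero-based index of the character just consumed. Because only the needle, the failure table, and a single state value are kept, the haystack never has to be stored, so no chunk boundaries need special handling.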