C - I don't want to allocate memory I won't use! (Newbie question)

Posted on 2024-11-08 20:41:14

I've made my first C program. It strips C comments ('//'). I pass a string to the function strip_comments, create a new string with the same size as the argument string, and then copy it char by char, ignoring comments.

This is the code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define IN 1
#define OUT 0

int file_size(FILE *file);
char * strip_comments(char *content);

int main(int argc, char *argv[])
{
    FILE *file;
    char *buffer, *content;
    int size;

    if (argc == 1)
    {
        printf("USAGE: stripccomments filename\n");
        return 1;
    }

    if ((file = fopen(argv[1], "rw")) == NULL)
    {
        printf("Could not open file '%s'.\n", argv[1]);
        return 1;
    }

    size = file_size(file);
    buffer = malloc(sizeof(char) * size);

    if (buffer == NULL)
    {
        printf("Could not allocate memory\n");
        return 1;
    }

    fread(buffer, sizeof(char), size, file);

    content = strip_comments(buffer);
    printf("%s", content);

    free(buffer);
    fclose(file);

    return 0;
}

int file_size(FILE *file)
{
    int size;

    fseek(file, 0, SEEK_END);
    size = ftell(file);
    rewind(file);

    return size;
}

char * strip_comments(char *content)
{
    int state, length, i, j;
    char *new_content;

    state = OUT;
    length = strlen(content);
    new_content = malloc(sizeof(char) * length);
    j = 0;

    for (i = 0; i < length; i++)
    {
        if (content[i] == '/' && content[i + 1] == '/')
        {
            state = IN;
            i++;
            continue;
        }

        if (state && content[i] == '\n')
        {
            state = OUT;
        }

        if (!state)
        {
            new_content[j] = content[i];
            j++;
        }
    }
    new_content[j + 1] = '\0';

    return new_content;
}

If there are some comments, I will use only some of the bytes of the allocated string. I don't want to allocate more than I will use. What's the best approach to do that and return a new string? Or should I modify the string passed as a parameter?

UPDATE: What happens to this unused space? Does it remain in "limbo"? And what happens to this space when execution reaches the end?

Thank you.

Comments (3)

榆西 2024-11-15 20:41:14

This doesn't solve your problem as stated, but my approach would be to write this as a function that works with an input file handle and an output file handle, i.e. void strip_comments(FILE *infile, FILE *outfile);. You're really only doing character-at-a-time parsing anyway, so what's to lose by switching from content[i++] to getc()? If you do it with file handles instead of strings, you a) don't have to store potentially large entire files in memory, and b) don't have to worry about allocating memory for a buffer.
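
A minimal sketch of that streaming idea follows. It keeps the signature suggested above; the handling of a lone '/' through ungetc() is an assumption of this sketch, and, like the program in the question, it does not treat string literals or block comments specially.

```c
#include <stdio.h>

/* Copy infile to outfile, dropping everything from "//" up to (but not
   including) the end of the line. No buffer for the whole file is needed. */
void strip_comments(FILE *infile, FILE *outfile)
{
    int c;

    while ((c = getc(infile)) != EOF)
    {
        if (c != '/')
        {
            putc(c, outfile);
            continue;
        }

        int next = getc(infile);
        if (next == '/')
        {
            /* Inside a comment: discard characters until the newline. */
            while ((c = getc(infile)) != EOF && c != '\n')
                ;
            if (c == '\n')
                putc('\n', outfile);
        }
        else
        {
            /* A lone '/': emit it and re-examine the character after it. */
            putc('/', outfile);
            if (next != EOF)
                ungetc(next, infile);
        }
    }
}
```

Calling strip_comments(file, stdout) from main would replace the malloc, fread, and printf steps of the original program.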

That said, if you want to do it with char *s instead, you could always parse through the string twice: once to calculate how many bytes to allocate, then once to copy said bytes over. Or you could simply call realloc at the end to shorten your buffer down to the appropriate size.
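
The realloc variant could look roughly like this. The name strip_comments_shrink is made up for this sketch, and the loop is the one from the question with the terminator placed at index j and one extra byte allocated for it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of strip_comments: allocate for the worst case,
   then shrink the block to the size actually used. */
char *strip_comments_shrink(const char *content)
{
    size_t length = strlen(content);
    char *result = malloc(length + 1);   /* worst case: nothing is removed */
    size_t i, j = 0;
    int in_comment = 0;

    if (result == NULL)
        return NULL;

    for (i = 0; i < length; i++)
    {
        if (!in_comment && content[i] == '/' && content[i + 1] == '/')
        {
            in_comment = 1;
            i++;                         /* skip the second slash */
            continue;
        }
        if (in_comment && content[i] == '\n')
            in_comment = 0;              /* the newline itself is kept */
        if (!in_comment)
            result[j++] = content[i];
    }
    result[j] = '\0';

    /* Shrink to j + 1 bytes. If realloc fails, the original block is
       still valid, so fall back to returning it unchanged. */
    char *shrunk = realloc(result, j + 1);
    return shrunk != NULL ? shrunk : result;
}
```

The caller owns the returned pointer and releases it with free.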

或十年 2024-11-15 20:41:14

Since your program cannot know in advance how much memory is needed to hold the stripped source, you can start with an initial buffer size and increase it as necessary. Another way is to scan the file first and calculate the difference in sizes. Both approaches have performance implications that depend on the amount of comments passed in: multiple malloc/realloc calls will slow things down, as will reading through the entire file twice. On the other hand, you're worried about wasting memory. The choice is yours, or you can support all three, setting a default and then implementing command-line flags to let the user pick an option if they so choose.
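
The "start small and grow" option might be sketched as below. The names append_char and strip_comments_growing, the starting capacity of 64, and the doubling policy are all assumptions of this example, not something the original program uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: append one char and keep the buffer terminated,
   doubling the capacity whenever it runs out. */
static int append_char(char **buf, size_t *len, size_t *cap, char c)
{
    if (*len + 1 >= *cap)
    {
        size_t new_cap = *cap * 2;
        char *tmp = realloc(*buf, new_cap);
        if (tmp == NULL)
            return -1;
        *buf = tmp;
        *cap = new_cap;
    }
    (*buf)[(*len)++] = c;
    (*buf)[*len] = '\0';
    return 0;
}

/* Read a stream, drop "//" comments, and return a malloc'd string. */
char *strip_comments_growing(FILE *in)
{
    size_t len = 0, cap = 64;            /* assumed initial capacity */
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    buf[0] = '\0';

    while ((c = getc(in)) != EOF)
    {
        if (c == '/')
        {
            int next = getc(in);
            if (next == '/')
            {
                /* Skip the comment body; stop at the newline or EOF. */
                while ((c = getc(in)) != EOF && c != '\n')
                    ;
                if (c == EOF)
                    break;
                /* c is '\n' here and is appended below. */
            }
            else if (next != EOF)
            {
                ungetc(next, in);        /* lone '/': re-read what follows */
            }
        }
        if (append_char(&buf, &len, &cap, (char)c) != 0)
        {
            free(buf);
            return NULL;
        }
    }
    return buf;
}
```

Doubling keeps the number of realloc calls logarithmic in the output size, at the cost of up to half the block being unused at the end.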

Also remember to add +1 to your strlen to account for the null character. Your current implementation will go into unmalloced territory if the file contains no comments.
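
As a small illustration of the off-by-one, here is a sketch of a plain string copy; copy_string is a hypothetical helper, not part of the original program. The same two adjustments apply to strip_comments: allocate length + 1 bytes, and write the terminator at index j rather than j + 1, since j already points one past the last character written.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a string with room for the terminator. */
char *copy_string(const char *src)
{
    size_t length = strlen(src);          /* does not count the '\0' */
    char *dst = malloc(length + 1);       /* +1 for the terminator */
    size_t j;

    if (dst == NULL)
        return NULL;

    for (j = 0; j < length; j++)
        dst[j] = src[j];
    dst[j] = '\0';                        /* j == length: last valid index */

    return dst;
}
```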

For your update:

Yes, the wasted space will sit at the end of your string doing nothing, but it will be properly reclaimed once free is called. For instance, a string with a strlen of 10 in a block of memory allocated for 15 bytes may look like this:

size of 10\0#%^@&
            ^^^^^ garbage
          ^^ null char
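
The same situation in code, as a small sketch (unused_tail_demo is a made-up name): the bytes after the terminator are simply never touched, and free releases the whole block regardless of how much of it held data. On mainstream desktop and server operating systems, any memory still allocated when the program exits is returned to the system along with the rest of the process.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the string length stored in a deliberately oversized block. */
size_t unused_tail_demo(void)
{
    char *block = malloc(15);          /* 15 bytes requested */
    size_t used;

    if (block == NULL)
        return 0;

    strcpy(block, "size of 10");       /* 10 characters + '\0' = 11 bytes written */
    used = strlen(block);              /* the remaining 4 bytes are never touched */

    free(block);                       /* releases all 15 bytes, used or not */
    return used;
}
```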

々眼睛长脚气 2024-11-15 20:41:14

I can only think of one way that might make your allocation more efficient (not that I think it needs to be; honestly, what you're doing now seems pretty reasonable, especially for a new C programmer).

What I can think of is to go through your file in two passes. In the first pass, you calculate the amount of memory that you will need to allocate. After that, you allocate exactly that amount of memory, and in the second pass you do the actual copying.

Also, you might benefit from doing this with file handles instead of entirely in memory, so that you do not need to allocate large swaths of memory at once.
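
One way to sketch the two passes is to share a single scanner between them, so the counting pass and the copying pass cannot disagree. The names strip_into and strip_comments_exact are hypothetical, and the scanning rules mirror the loop in the question.

```c
#include <stdlib.h>
#include <string.h>

/* Shared scanner: writes kept characters to out when out is non-NULL,
   and always returns how many characters were kept. */
static size_t strip_into(const char *content, char *out)
{
    size_t i, kept = 0;
    int in_comment = 0;

    for (i = 0; content[i] != '\0'; i++)
    {
        if (!in_comment && content[i] == '/' && content[i + 1] == '/')
        {
            in_comment = 1;
            i++;                          /* skip the second slash */
            continue;
        }
        if (in_comment && content[i] == '\n')
            in_comment = 0;               /* the newline itself is kept */
        if (!in_comment)
        {
            if (out != NULL)
                out[kept] = content[i];
            kept++;
        }
    }
    return kept;
}

char *strip_comments_exact(const char *content)
{
    size_t kept = strip_into(content, NULL);   /* pass 1: count only */
    char *result = malloc(kept + 1);           /* exact size plus '\0' */

    if (result == NULL)
        return NULL;

    strip_into(content, result);               /* pass 2: copy */
    result[kept] = '\0';
    return result;
}
```

The trade-off is that the input is scanned twice in exchange for never holding more memory than the result needs.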
