Is this a good substr() for C?

Posted 2024-07-19 07:58:54 · 2058 words · 4 views · 0 comments

See also C Tokenizer


Here is a quick substr() for C that I wrote (yes, the variable initializations need to be moved to the start of the function etc., but you get the idea).

I have seen many "smart" implementations of substr() that are simple one-liner calls to strncpy()!

They are all wrong (strncpy does not guarantee null termination and thus the call might NOT produce a correct substring!)
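To make the strncpy() point concrete, here is a minimal sketch. The helper names strncpy_terminated and copy_terminated are hypothetical, chosen only for illustration. The first reports whether strncpy() left a terminator inside the bytes it wrote; the second shows the usual repair of writing the terminator by hand.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if strncpy() left a '\0' within the first n bytes of dst,
   0 otherwise. dst must have room for at least n bytes. */
int strncpy_terminated(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, n);   /* writes exactly n bytes, nothing more */
    return memchr(dst, '\0', n) != NULL;
}

/* Copies at most n characters of src and always terminates the result.
   dst must have room for at least n + 1 bytes. */
void copy_terminated(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, n);
    dst[n] = '\0';          /* the step the one-liners leave out */
}
```

When the source is at least n characters long, strncpy_terminated() returns 0: the destination holds n characters and no terminator, so printing it as a string would read past the copied data.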

Here is something maybe better?

Bring out the bugs!

char* substr(const char* text, int nStartingPos, int nRun)
{
    char* emptyString = strdup(""); /* C'mon! This cannot fail */

    if(text == NULL) return emptyString;

    int textLen = strlen(text);

    --nStartingPos;

    if((nStartingPos < 0) || (nRun <= 0) || (textLen == 0) || (textLen < nStartingPos)) return emptyString;

    char* returnString = (char *)calloc((1 + nRun), sizeof(char));

    if(returnString == NULL) return emptyString;

    strncat(returnString, (nStartingPos + text), nRun);

    /* We do not need emptyString anymore from this point onwards */

    free(emptyString);
    emptyString = NULL;

    return returnString;
}


int main()
{
    const char *text = "-2--4--6-7-8-9-10-11-";

    char *p = substr(text, -1, 2);
    printf("[*]'%s' (\")\n",  ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 1, 2);
    printf("[*]'%s' (-2)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 3, 2);
    printf("[*]'%s' (--)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 16, 2);
    printf("[*]'%s' (10)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 16, 20);
    printf("[*]'%s' (10-11-)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 100, 2);
    printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 1, 0);
    printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    return 0;
}

Output:

[*]'' (")
[*]'-2' (-2)
[*]'--' (--)
[*]'10' (10)
[*]'10-11-' (10-11-)
[*]'' (")
[*]'' (")

Comments (5)

凯凯我们等你回来 2024-07-26 07:58:54

Your function seems very complicated for what should be a simple operation. Some problems are (not all of these are bugs):

  • strdup(), and other memory allocation functions, can fail; you should allow for all possible issues.
  • only allocate resources (memory in this case) if and when you need them.
  • you should be able to distinguish between errors and valid strings. At the moment, the caller cannot tell whether an empty string came from a malloc() failure in substr ("xxx",1,1) or from a working substr ("xxx",1,0).
  • you don't need to calloc() memory that you're going to overwrite anyway.
  • all invalid parameters should either cause an error or be coerced to a valid parameter (and your API should document which).
  • you don't need to set the local emptyString to NULL after freeing it - it will be lost on function return.
  • you don't need to use strncat() - you should know the sizes and the memory you have available before doing any copying, so you can use the (most likely) faster memcpy().
  • your use of base-1 rather than base-0 for string offsets goes against the grain of C.

The following segment is what I'd do (I rather like the Python idiom of negative values to count from the end of the string but I've kept length rather than end position).

#include <stdlib.h>
#include <string.h>

char *substr (const char *inpStr, int startPos, int strLen) {
    char *buff;
    int inpLen;

    /* Cannot do anything with NULL. */

    if (inpStr == NULL) return NULL;

    /* Measure once; assumes the string length fits in an int. */

    inpLen = (int) strlen (inpStr);

    /* All negative positions to go from end, and cannot
       start before start of string, force to start. */

    if (startPos < 0)
        startPos = inpLen + startPos;
    if (startPos < 0)
        startPos = 0;

    /* Force negative lengths to zero and cannot
       start after end of string, force to end. */

    if (strLen < 0)
        strLen = 0;
    if (startPos > inpLen)
        startPos = inpLen;

    /* Adjust length if source string too short. */

    if (strLen > inpLen - startPos)
        strLen = inpLen - startPos;

    /* Get long enough string from heap, return NULL if no go. */

    if ((buff = malloc ((size_t) strLen + 1)) == NULL)
        return NULL;

    /* Transfer string section and return it. */

    memcpy (buff, &(inpStr[startPos]), (size_t) strLen);
    buff[strLen] = '\0';

    return buff;
}

深爱成瘾 2024-07-26 07:58:54

I would say return NULL if the input isn't valid rather than a malloc()ed empty string. That way you can test whether the function failed with if(p) rather than if(*p == 0).

Also, I think your function leaks memory because emptyString is only free()d in one conditional. You should make sure you free() it unconditionally, i.e. right before the return.

As to your comment on strncpy() not NUL-terminating the string (which is true), if you use calloc() to allocate the string rather than malloc(), this won't be a problem if you allocate one byte more than you copy, since calloc() automatically sets all values (including, in this case, the end) to 0.
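A minimal sketch of that calloc() argument, assuming a hypothetical helper named copy_prefix that is not part of the original code. The buffer is one byte longer than the number of characters copied, so the final byte is never written by strncpy() and keeps the zero that calloc() put there.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a calloc()ed copy of at most n characters
   of src, or NULL if allocation fails. The caller must free() the result. */
char *copy_prefix(const char *src, size_t n)
{
    char *dst;

    assert(src != NULL);
    dst = calloc(n + 1, 1);   /* n + 1 bytes, all zero */
    if (dst == NULL)
        return NULL;
    strncpy(dst, src, n);     /* writes dst[0] .. dst[n - 1] only */
    return dst;               /* dst[n] is still the 0 from calloc() */
}
```

The same guarantee comes from malloc() followed by an explicit dst[n] = '\0', which avoids zeroing bytes that are about to be overwritten.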

I would give you more notes but I hate reading camelCase code. Not that there's anything wrong with it.

EDIT: With regards to your updates:

Be aware that the C standard defines sizeof(char) to be 1 regardless of your system. If you're using a computer that uses 9 bits in a byte (god forbid), sizeof(char) is still going to be 1. Not that there's anything wrong with saying sizeof(char) - it clearly shows your intention and provides symmetry with calls to calloc() or malloc() for other types. But sizeof(int) is actually useful (ints can be different sizes on 16- and 32- and these newfangled 64-bit computers). The more you know.

I'd also like to reiterate that consistency with most other C code is to return NULL on an error rather than "". I know many functions (like strcmp()) will probably do bad things if you pass them NULL - this is to be expected. But the C standard library (and many other C APIs) takes the approach of "It's the caller's responsibility to check for NULL, not the function's responsibility to baby him/her if (s)he doesn't." If you want to do it the other way, that's cool, but it's going against one of the stronger trends in C interface design.

Also, I would use strncpy() (or memcpy()) rather than strncat(). Using strncat() (and strcat()) obscures your intent - it makes someone looking at your code think you want to add to the end of the string (which you do, because after calloc(), the end is the beginning), when what you want to do is set the string. strncat() makes it look like you're adding to a string, while strcpy() (or another copy routine) would make it look more like what your intent is. The following three lines all do the same thing in this context - pick whichever one you think looks nicest:

strncat(returnString, text + nStartingPos, nRun);

strncpy(returnString, text + nStartingPos, nRun);

memcpy(returnString, text + nStartingPos, nRun);

Plus, strncpy() and memcpy() will probably be a (wee little) bit faster/more efficient than strncat().

text + nStartingPos is the same as nStartingPos + text - I would put the char * first, as I think that's clearer, but whatever order you want to put them in is up to you. Also, the parentheses around them are unnecessary (but nice), since + has higher precedence than ,.

EDIT 2: The three lines of code don't do the same thing, but in this context they will all produce the same result. Thanks for catching me on that.
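The difference between the three calls can be made visible with a small sketch. The names bytes_written, CAP and the copier enum are hypothetical, introduced only for this illustration. The function fills a scratch buffer with 'x', runs one of the three routines, and counts how many bytes were overwritten.

```c
#include <assert.h>
#include <string.h>

#define CAP 16   /* size of the scratch buffer; an arbitrary choice */

/* Which copy routine to exercise. */
enum copier { USE_STRNCAT, USE_STRNCPY, USE_MEMCPY };

/* Copies with the chosen routine into a buffer pre-filled with 'x' and
   returns how many bytes were written (every byte that is no longer 'x').
   src must not contain 'x', n must be below CAP, and for USE_MEMCPY
   n must not exceed strlen(src) + 1, because memcpy() always reads
   exactly n bytes from the source. */
size_t bytes_written(enum copier which, const char *src, size_t n)
{
    char dst[CAP];
    size_t i, count = 0;

    assert(src != NULL && n < CAP);
    memset(dst, 'x', sizeof dst);

    switch (which) {
    case USE_STRNCAT:
        dst[0] = '\0';            /* strncat() appends to a string */
        strncat(dst, src, n);     /* stops at '\0', then adds one '\0' */
        break;
    case USE_STRNCPY:
        strncpy(dst, src, n);     /* always n bytes, padded with '\0' */
        break;
    case USE_MEMCPY:
        assert(n <= strlen(src) + 1);
        memcpy(dst, src, n);      /* exactly n bytes, no string logic */
        break;
    }

    for (i = 0; i < CAP; i++)
        if (dst[i] != 'x')
            count++;
    return count;
}
```

With the source "ab" and n = 5, strncat() writes 3 bytes and strncpy() writes 5. With "abcdef" and n = 2, strncat() writes 3 bytes while strncpy() and memcpy() write 2 and leave the result unterminated. Note also that memcpy() is only a safe substitute once the requested length has been clamped to what remains of the source; a call such as substr(text, 16, 20) asks for more characters than are left, and memcpy() would read past the end of the string.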

墨落成白 2024-07-26 07:58:54

char* emptyString = strdup(""); /* C'mon! This cannot fail? */

You need to check for null. Remember that it still must allocate 1 byte for the null character.

风吹雨成花 2024-07-26 07:58:54

strdup could fail (though it is very unlikely and not worth checking for, IMHO). It does have another problem however - it is not a Standard C function. It would be better to use malloc.
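For reference, a portable stand-in for strdup() takes only a few lines of malloc() and memcpy(). This is a sketch under the assumption that a hypothetical name, dup_string, is used so it cannot clash with a library strdup(). As far as I know, strdup() comes from POSIX and only entered the ISO C standard with C23, so this matters mainly for older or strictly conforming toolchains.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for strdup(): returns a malloc()ed copy of s,
   or NULL if allocation fails. The caller must free() the result. */
char *dup_string(const char *s)
{
    size_t size;
    char *copy;

    assert(s != NULL);
    size = strlen(s) + 1;        /* include the terminating '\0' */
    copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;                 /* NULL when malloc() failed */
}
```

Even dup_string("") has to allocate one byte for the terminator, which is why the result still needs a NULL check.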

鱼忆七猫命九 2024-07-26 07:58:54

You can also use the memmove function to return a substring of a given length from a given start position.
Building on paxdiablo's solution, here is another one:

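    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    /* Returns a malloc()ed substring of idata, or NULL on invalid
       arguments or allocation failure. The caller must free() it. */
    char *splitstr(const char *idata, int start, int slen) {
            char *ret;
            int len;
            if(idata == NULL || start < 0 || slen < 0) {
                    return NULL;
            }
            len = (int)strlen(idata);
            if(start > len) {
                    return NULL;
            }
            /* slen == 0 means "everything from start onwards" */
            if(slen == 0 || slen > len - start) {
                    slen = len - start;
            }
            /* heap buffer: a pointer to a local array must not be returned */
            if((ret = malloc((size_t)slen + 1)) == NULL) {
                    return NULL;
            }
            memmove(ret, idata + start, (size_t)slen);
            ret[slen] = '\0';
            return ret;
    }

    /*
    Usage:
            char ostr[]="Hello World!";
            char *ores=splitstr(ostr, 0, 5);
            Outputs:
                    Hello
            free(ores) when done.
    */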

Hope it helps. Tested on Windows 7 Home Premium with the TCC C compiler.
