What is the difference between strtok and strsep in C?

Posted on 2024-12-01 12:37:21

Could someone explain to me what the differences are between strtok() and strsep()?
What are their advantages and disadvantages?
And why would I pick one over the other?

Comments (3)

寄意 2024-12-08 12:37:21

One major difference between strtok() and strsep() is that strtok() is standardized (by the C standard, and hence also by POSIX) but strsep() is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use strtok() than strsep().

Another difference is that calls to the strsep() function on different strings can be interleaved, whereas you cannot do that with strtok() (though you can with strtok_r()). So, using strsep() in a library doesn't break other code accidentally, whereas using strtok() in a library function must be documented because other code using strtok() at the same time cannot call the library function.

The manual page for strsep() at kernel.org says:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.

Thus, the other major difference is the one highlighted by George Gaál in his answer: strtok() permits multiple delimiters between tokens and treats a run of them as a single separator, whereas strsep() expects a single delimiter between tokens and interprets adjacent delimiters as an empty token.

Both strsep() and strtok() modify their input strings and neither lets you identify which delimiter character marked the end of the token (because both write a NUL '\0' over the separator after the end of the token).
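If you do need to know which delimiter ended a token, one option is to save that character before it is overwritten. The following is only a sketch: next_field() is a hypothetical helper (not part of any library) that behaves like strsep() but also reports the delimiter through an extra out-parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a library function: works like strsep(),
 * but also stores the delimiter that ended the token in *delim_out
 * ('\0' if the token ran to the end of the string). */
char *next_field(char **cursor, const char *delims, char *delim_out)
{
    assert(cursor != NULL && delims != NULL && delim_out != NULL);

    char *start = *cursor;
    if (start == NULL)
        return NULL;                      /* nothing left to scan */

    size_t len = strcspn(start, delims);  /* length of this token */
    *delim_out = start[len];              /* remember it before overwriting */

    if (start[len] == '\0') {
        *cursor = NULL;                   /* this was the last token */
    } else {
        start[len] = '\0';                /* terminate the token in place */
        *cursor = start + len + 1;        /* resume after the delimiter */
    }
    return start;
}
```

Like strsep(), this sketch keeps its position in the caller's pointer, so it can be interleaved across several strings.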

When to use them?

  • You would use strsep() when you want empty tokens rather than allowing multiple delimiters between tokens, and when you don't mind about portability.
  • You would use strtok_r() when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
  • You would only use strtok() when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; you would then abandon all use of it once more. It is poisonous; do not use it. It would be better to write your own strtok_r() or strsep() than to use strtok().
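To give an idea of how little code "write your own strtok_r()" involves, here is a minimal sketch. The name my_strtok_r() is made up for illustration, and the code is an assumption about how such a function can be written with strspn() and strcspn(), not a copy of any particular library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for strtok_r(): all state lives in *state,
 * which the caller owns, so calls on different strings can interleave. */
char *my_strtok_r(char *s, const char *delims, char **state)
{
    assert(delims != NULL && state != NULL);

    if (s == NULL)
        s = *state;                 /* continue the previous scan */
    if (s == NULL)
        return NULL;                /* no scan in progress */

    s += strspn(s, delims);         /* skip any run of leading delimiters */
    if (*s == '\0') {
        *state = s;                 /* only delimiters were left */
        return NULL;
    }

    char *end = s + strcspn(s, delims);  /* first delimiter after the token */
    if (*end != '\0')
        *end++ = '\0';              /* terminate token, step past delimiter */
    *state = end;
    return s;
}
```

The strspn() call is what makes this behave like strtok_r() rather than strsep(): runs of delimiters are skipped, so empty tokens are never returned.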

Why is strtok() poisonous?

The strtok() function is poisonous if used in a library function. If your library function uses strtok(), it must be documented clearly.

That's because:

  1. If any calling function is using strtok() and calls your function that also uses strtok(), you break the calling function.
  2. If your function calls any function that calls strtok(), that will break your function's use of strtok().
  3. If your program is multithreaded, at most one thread can be in the middle of a sequence of strtok() calls at any given time.

The root of this problem is the saved state between calls that allows strtok() to continue where it left off. There is no sensible way to fix the problem other than "do not use strtok()".

  • You can use strsep() if it is available.
  • You can use POSIX's strtok_r() if it is available.
  • You can use Microsoft's strtok_s() if it is available.
  • Nominally, you could use the ISO/IEC 9899:2011 Annex K.3.7.3.1 function strtok_s(), but its interface is different from both strtok_r() and Microsoft's strtok_s().

BSD strsep():

char *strsep(char **stringp, const char *delim);

POSIX strtok_r():

char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state);

Microsoft strtok_s():

char *strtok_s(char *strToken, const char *strDelimit, char **context);

Annex K strtok_s():

char *strtok_s(char * restrict s1, rsize_t * restrict s1max,
               const char * restrict s2, char ** restrict ptr);

Note that this has 4 arguments, not 3 as in the other two variants on strtok().
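As a rough illustration of why the extra state argument matters, the sketch below nests two strtok_r() scans, something plain strtok() cannot do. The function count_fields() and its record format (records separated by ';', fields by ',') are invented for this example, and the feature-test macro is an assumption about building on a POSIX system with a strict -std= mode.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: exposes strtok_r() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the comma-separated fields in all
 * semicolon-separated records of text. Modifies text in place. */
size_t count_fields(char *text)
{
    assert(text != NULL);

    size_t total = 0;
    char *outer_state = NULL;   /* position of the record scan */
    char *inner_state = NULL;   /* position of the field scan */

    for (char *record = strtok_r(text, ";", &outer_state);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_state)) {
        /* The inner scan has its own state, so it does not disturb
         * the outer scan. */
        for (char *field = strtok_r(record, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            total++;
        }
    }
    return total;
}
```

With the input "a,b;c;d,e,f" this counts 2 + 1 + 3 fields.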

苹果你个爱泡泡 2024-12-08 12:37:21

From The GNU C Library manual - Finding Tokens in a String:

One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.
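The test for an empty string that the manual mentions could look something like the following sketch. split_nonempty() is a hypothetical helper, and the _DEFAULT_SOURCE macro is an assumption about building against glibc, where it makes strsep() visible under strict -std= modes.

```c
#define _DEFAULT_SOURCE  /* assumption: glibc feature-test macro that exposes strsep() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits text in place with strsep() and stores
 * up to max non-empty tokens in out. Returns the number stored. */
size_t split_nonempty(char *text, const char *delims, char **out, size_t max)
{
    assert(delims != NULL && out != NULL);

    size_t n = 0;
    char *cursor = text;
    char *token;

    while (n < max && (token = strsep(&cursor, delims)) != NULL) {
        if (*token == '\0')
            continue;           /* adjacent delimiters: skip the empty field */
        out[n++] = token;
    }
    return n;
}
```

Skipping the empty strings gives the same tokens strtok_r() would produce; dropping the check keeps the empty fields.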

浪菊怪哟 2024-12-08 12:37:21

The first difference between strtok() and strsep() is the way they handle contiguous delimiter characters in the input string.

Contiguous delimiter characters handling by strtok():

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* teststr = "aaa-bbb --ccc-ddd"; //Contiguous delimiters between bbb and ccc sub-string
    const char* delims = " -";  // delimiters - space and hyphen character
    char* token;
    char* ptr = strdup(teststr);

    if (ptr == NULL) {
        fprintf(stderr, "strdup failed\n");
        exit(EXIT_FAILURE);
    }

    printf ("Original String: %s\n", ptr);

    token = strtok (ptr, delims);
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok (NULL, delims);
    }

    printf ("Original String: %s\n", ptr);
    free (ptr);
    return 0;
}

Output:

# ./example1_strtok
Original String: aaa-bbb --ccc-ddd
aaa
bbb
ccc
ddd
Original String: aaa

In the output, you can see the tokens "bbb" and "ccc" one after another. strtok() gives no indication that contiguous delimiter characters occurred. Also, strtok() modifies the input string, which is why the final line prints only "aaa".

Contiguous delimiter characters handling by strsep():

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    const char* teststr = "aaa-bbb --ccc-ddd"; //Contiguous delimiters between bbb and ccc sub-string
    const char* delims = " -";  // delimiters - space and hyphen character
    char* token;
    char* ptr1;
    char* ptr = strdup(teststr);

    if (ptr == NULL) {
        fprintf(stderr, "strdup failed\n");
        exit(EXIT_FAILURE);
    }

    ptr1 = ptr;

    printf ("Original String: %s\n", ptr);
    while ((token = strsep(&ptr1, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
    }

    if (ptr1 == NULL) // This is just to show that the strsep() modifies the pointer passed to it
        printf ("ptr1 is NULL\n");
    printf ("Original String: %s\n", ptr);
    free (ptr);
    return 0;
}

Output:

# ./example1_strsep
Original String: aaa-bbb --ccc-ddd
aaa
bbb
<empty>             <==============
<empty>             <==============
ccc
ddd
ptr1 is NULL
Original String: aaa

In the output, you can see two empty strings (shown as <empty>) between bbb and ccc. They come from the three consecutive delimiters " --" between "bbb" and "ccc". When strsep() found the delimiter character ' ' after "bbb", it replaced it with a '\0' character and returned "bbb". On the next call, strsep() immediately found another delimiter character, '-', replaced it with '\0', and returned an empty string. The same happened for the following '-'.

Contiguous delimiter characters are indicated when strsep() returns a pointer to a null character (that is, a character with the value '\0').

strsep() modifies the input string as well as the pointer whose address is passed as its first argument.

The second difference is that strtok() relies on a static variable to keep track of the current parse position within a string. With this implementation, one string must be parsed completely before parsing of a second string begins. This is not the case with strsep().

Calling strtok() when another strtok() is not finished:

#include <stdio.h>
#include <string.h>

void another_function_callng_strtok(void)
{
    char str[] ="ttt -vvvv";
    const char* delims = " -";
    char* token;

    printf ("Original String: %s\n", str);
    token = strtok (str, delims);
    while (token != NULL) {
        printf ("%s\n", token);
        token = strtok (NULL, delims);
    }
    printf ("another_function_callng_strtok: I am done.\n");
}

void function_callng_strtok(void)
{
    char str[] ="aaa --bbb-ccc";
    const char* delims = " -";
    char* token;

    printf ("Original String: %s\n", str);
    token = strtok (str, delims);
    while (token != NULL)
    {
        printf ("%s\n",token);
        another_function_callng_strtok();
        token = strtok (NULL, delims);
    }
}

int main(void) {
    function_callng_strtok();
    return 0;
}

Output:

# ./example2_strtok
Original String: aaa --bbb-ccc
aaa
Original String: ttt -vvvv
ttt
vvvv
another_function_callng_strtok: I am done.

The function function_callng_strtok() prints only the token "aaa" and none of the remaining tokens of its input string. This is because it calls another_function_callng_strtok(), which in turn calls strtok() on a different string and leaves strtok()'s static pointer set to NULL once it has extracted all of its tokens. When control returns to the while loop in function_callng_strtok(), strtok() returns NULL because of that static pointer, so the loop condition is false and the loop exits.

Calling strsep() when another strsep() is not finished:

#include <stdio.h>
#include <string.h>

void another_function_callng_strsep(void)
{
    char str[] ="ttt -vvvv";
    const char* delims = " -";
    char* token;
    char* ptr = str;

    printf ("Original String: %s\n", str);
    while ((token = strsep(&ptr, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
    }
    printf ("another_function_callng_strsep: I am done.\n");
}

void function_callng_strsep(void)
{
    char str[] ="aaa --bbb-ccc";
    const char* delims = " -";
    char* token;
    char* ptr = str;

    printf ("Original String: %s\n", str);
    while ((token = strsep(&ptr, delims)) != NULL) {
        if (*token == '\0') {
            token = "<empty>";
        }
        printf("%s\n", token);
        another_function_callng_strsep();
    }
}

int main(void) {
    function_callng_strsep();
    return 0;
}

Output:

# ./example2_strsep
Original String: aaa --bbb-ccc
aaa
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
<empty>
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
<empty>
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
bbb
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.
ccc
Original String: ttt -vvvv
ttt
<empty>
vvvv
another_function_callng_strsep: I am done.

Here you can see that calling strsep() on a second string before the first has been completely parsed makes no difference.

So, the disadvantage shared by strtok() and strsep() is that both modify the input string, but strsep() has a couple of advantages over strtok(), as illustrated above.

From the strsep() manual page:

The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990 (``ISO C90'')) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.

