What is the difference between strtok and strsep in C?
Could someone explain the differences between `strtok()` and `strsep()`? What are the advantages and disadvantages of each? And why would I pick one over the other?
One major difference between `strtok()` and `strsep()` is that `strtok()` is standardized (by the C standard, and hence also by POSIX) but `strsep()` is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use `strtok()` than `strsep()`.

Another difference is that calls to the `strsep()` function on different strings can be interleaved, whereas you cannot do that with `strtok()` (though you can with `strtok_r()`). So, using `strsep()` in a library doesn't break other code accidentally, whereas using `strtok()` in a library function must be documented because other code using `strtok()` at the same time cannot call the library function.

The manual page for `strsep()` at kernel.org says that the function was introduced as a replacement for `strtok()`, since the latter cannot handle empty fields.

Thus, the other major difference is the one highlighted by George Gaál in his answer: `strtok()` permits multiple delimiters between tokens, whereas `strsep()` expects a single delimiter between tokens, and interprets adjacent delimiters as an empty token.

Both `strsep()` and `strtok()` modify their input strings, and neither lets you identify which delimiter character marked the end of the token (because both write a NUL `'\0'` over the separator after the end of the token).

When to use them?
- Use `strsep()` when you want empty tokens rather than allowing multiple delimiters between tokens, and when you don't mind about portability.
- Use `strtok_r()` when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
- Use `strtok()` only when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; you would then abandon all use of it once more. It is poisonous; do not use it. It would be better to write your own `strtok_r()` or `strsep()` than to use `strtok()`.

Why is `strtok()` poisonous?

The `strtok()` function is poisonous if used in a library function. If your library function uses `strtok()`, it must be documented clearly.

That's because:
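- If any calling function is using `strtok()` and calls your function that also uses `strtok()`, you break the calling function.
- If your function calls any function that calls `strtok()`, that will break your function's use of `strtok()`.
- In a multithreaded program, at most one thread can be using `strtok()` at any given time, across a sequence of `strtok()` calls.

The root of this problem is the saved state between calls that allows `strtok()` to continue where it left off. There is no sensible way to fix the problem other than "do not use `strtok()`".

- You can use `strsep()` if it is available.
- You can use POSIX's `strtok_r()` if it is available.
- You can use Microsoft's `strtok_s()` if it is available.
- Nominally, you could use the C11 Annex K function `strtok_s()`, but its interface is different from both `strtok_r()` and Microsoft's `strtok_s()`.

BSD `strsep()`: `char *strsep(char **stringp, const char *delim);`

POSIX `strtok_r()`: `char *strtok_r(char *restrict s, const char *restrict sep, char **restrict lasts);`

Microsoft `strtok_s()`: `char *strtok_s(char *strToken, const char *strDelimit, char **context);`

Annex K `strtok_s()`: `char *strtok_s(char * restrict s1, rsize_t * restrict s1max, const char * restrict s2, char ** restrict ptr);`

Note that this has 4 arguments, not 3 as in the other two variants on `strtok()`.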
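From the GNU C Library manual, "Finding Tokens in a String": one difference between `strsep` and `strtok_r` is that if the input string contains more than one delimiter character in a row, `strsep` returns an empty string for each pair of delimiter characters, so a program should normally test for an empty string before processing a token.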
The first difference between `strtok()` and `strsep()` is the way they handle contiguous delimiter characters in the input string.

Contiguous delimiter characters handling by `strtok()`:
In the output, you can see the tokens `"bbb"` and `"ccc"` one after another. `strtok()` does not indicate the occurrence of contiguous delimiter characters. Also, `strtok()` modifies the input string.

Contiguous delimiter characters handling by `strsep()`:
In the output, you can see the two empty strings (indicated through `<empty>`) between `bbb` and `ccc`. Those two empty strings are for `"--"` between `"bbb"` and `"ccc"`. When `strsep()` found the delimiter character `' '` after `"bbb"`, it replaced the delimiter character with a `'\0'` character and returned `"bbb"`. After this, `strsep()` found another delimiter character, `'-'`. It replaced that delimiter character with `'\0'` and returned the empty string. The same happens for the next delimiter character.

Contiguous delimiter characters are indicated when `strsep()` returns a pointer to a null character (that is, a character with the value `'\0'`).

`strsep()` modifies the input string as well as the pointer whose address is passed as its first argument.

The second difference is that `strtok()` relies on a static variable to keep track of the current parse location within a string. This implementation requires one string to be parsed completely before a second string is begun. But this is not the case with `strsep()`.

Calling `strtok()` when another `strtok()` is not finished:
The function `function_callng_strtok()` only prints the token `"aaa"` and does not print the rest of the tokens of the input string, because it calls `another_function_callng_strtok()`, which in turn calls `strtok()` and leaves the static pointer of `strtok()` exhausted (at the end of its own string, or `NULL`, depending on the implementation) once it has extracted all the tokens. When control comes back to the `while` loop in `function_callng_strtok()`, `strtok()` returns `NULL` because of that exhausted state, which makes the loop condition false, and the loop exits.

Calling `strsep()` when another `strsep()` is not finished:
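Here you can see that calling `strsep()` before one string has been completely parsed doesn't make any difference.

So, the disadvantage of `strtok()` and `strsep()` is that both modify the input string, but `strsep()` has a couple of advantages over `strtok()`, as illustrated above.

From the `strsep` manual page: the function was introduced as a replacement for `strtok()`, since the latter cannot handle empty fields; however, `strtok()` conforms to C89/C99 and hence is more portable.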