How does strtok() split a string into tokens in C?
Please explain to me how the strtok() function works. The manual says it breaks the string into tokens, but I am unable to understand from the manual what it actually does.
I added watches on str and *pch to check how it works when the first while loop iteration ran, and the contents of str were only "this". How was the output shown below printed on the screen?
/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] = "- This, a sample string.";
    char * pch;
    printf ("Splitting string \"%s\" into tokens:\n", str);
    pch = strtok (str, " ,.-");
    while (pch != NULL)
    {
        printf ("%s\n", pch);
        pch = strtok (NULL, " ,.-");
    }
    return 0;
}
Output:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
The strtok runtime function works like this:
The first time you call strtok, you provide the string that you want to tokenize.
In that string, a space seems to be a good delimiter between words, so let's use it as the delimiter.
What happens now is that the string is searched until a space character is found; the first token ("this") is returned, and p (the variable receiving the return value) points to that token.
To get the next token and continue with the same string, NULL is passed as the first argument, since strtok maintains a static pointer to the string you previously passed.
p now points to "is",
and so on until no more spaces can be found; then the rest of the string is returned as the last token, "string", and the call after that returns NULL.
More conveniently, you could write this as a loop that prints out all the tokens.
EDIT:
If you want to store the values returned from strtok, copy each token to another buffer, e.g. with strdup(p). The returned pointers point into the original string (the one the static pointer inside strtok refers to), and that string is modified between iterations in order to return the tokens, so the tokens are only usable as long as that buffer is left alone.
strtok() divides the string into tokens, i.e. the text from one delimiter to the next is one token. In your case, the text between "-" and the following space " " is empty, so strtok skips it instead of returning an empty token. The next token starts after that space and ends at ",", which is why you get "This" as output. Similarly, the rest of the string is split into tokens from space to space, and the last token ends at ".".
strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference. This is the reason strtok isn't re-entrant: as soon as you pass it a new pointer, the old internal reference gets clobbered.
strtok will tokenize a string, i.e. convert it into a series of substrings.
It does that by searching for delimiters that separate these tokens (or substrings), and you specify the delimiters. In your case, you want ' ', ',', '.' or '-' to be the delimiter.
The programming model for extracting these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, at which point it returns NULL. Another rule is that you pass the string in only the first time, and NULL on subsequent calls. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from a previous session. Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread-safe (you should use strtok_r instead). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiter that ends each token it finds.
One way to invoke strtok, succinctly, is as follows:
Result:
strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance the pointer it has kept however it needs to in order to perform its operations.)
From the POSIX strtok page:
There is a thread-safe variant (strtok_r) that doesn't do this type of magic.
The first time you call it, you provide strtok with the string to tokenize. Then, to get the following tokens, you just pass NULL to that function, as long as it returns a non-NULL pointer.
The strtok function records the string you first provided when you call it. (Which is really dangerous for multithreaded applications.)
strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.
To understand how strtok() works, one first needs to know what a static variable is. This link explains it quite well.
The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the very original string that was passed to it when it is invoked with a null pointer in successive calls).
Have a look at my own strtok() implementation, called zStrtok(), which has slightly different functionality from the one provided by strtok().
And here is an example usage.
The code is from a string processing library I maintain on GitHub, called zString. Have a look at the code, or even contribute :)
https://github.com/fnoyanisi/zString
This is how I implemented strtok. It's not that great, but after working on it for 2 hours I finally got it to work. It does support multiple delimiters.
For those who are still having a hard time understanding the strtok() function, take a look at this pythontutor example; it is a great tool for visualizing your C (or C++, Python, ...) code. In case the link gets broken, paste in:
Credits go to Anders K.
Here is my implementation, which uses a hash table for the delimiters, which means it is O(n) instead of O(n^2) (here is a link to the code):
strtok() stores, in a static variable, a pointer to where it last left off, so on its second call, when we pass NULL, strtok() gets the pointer from the static variable.
If you pass the string again instead of NULL, it starts again from the beginning.
Moreover, strtok() is destructive, i.e. it makes changes to the original string, so make sure you always keep a copy of the original.
One more problem with strtok() is that, since it stores the address in a static variable, calling it from more than one thread at a time in multithreaded programs will cause errors. For this, use strtok_r().
strtok replaces the delimiter that ends each token (one of the characters listed in its second argument) with a null character ('\0'), and a null character also marks the end of a string.
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
You can scan the char array looking for the delimiter: if you find it, just print a newline; otherwise, print the character.
So, here is a code snippet to help better understand this topic.
Printing Tokens
Task: Given a sentence, s, print each word of the sentence on a new line.
Input:
How is that
Result:
How
is
that
Explanation: Here the strtok() function is used, and it is iterated with a for loop to print the tokens on separate lines.
The function takes a string and a set of break points (delimiters) as parameters, breaks the string at those break points, and forms tokens. These tokens are stored in p and then used for printing.
strtok replaces the delimiter in the given string with '\0', the null character.
CODE
OUTPUT
Before tokenizing the string
I assigned the address of string s to a pointer (p1) and tried to print the string through that pointer; the whole string was printed.
After tokenizing
strtok returned the address of string s to a pointer (p2), but when I tried to print the string through that pointer, it only printed "30", not the whole string. So it is clear that strtok does not just return an address; it also places a '\0' character where the delimiter is.
Cross-check
1. I again assigned the address of string s to a pointer (p3) and tried to print the string; it printed "30", because during tokenizing the string is updated with '\0' at the delimiter.
2. Printing string s character by character in a loop shows that the first delimiter has been replaced by '\0', so nothing visible is printed in its place rather than the delimiter.