Split string with delimiters in C
How do I write a function in C that splits a string on a delimiter and returns an array of strings?
char* str = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
str_split(str,',');
You can use the strtok() function to split a string (and specify the delimiter to use). Note that strtok() will modify the string passed into it. If the original string is required elsewhere, make a copy of it and pass the copy to strtok().
EDIT:
Example (note that it does not handle consecutive delimiters, "JAN,,,FEB,MAR" for example):
Output:
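As a minimal sketch of the strtok() approach described in this answer (not the answer's own listing), the function below copies the input first and then tokenizes the copy. The name split_with_strtok and the buf_out parameter are made up for illustration. Because strtok() treats a run of delimiters as one separator, "JAN,,,FEB,MAR" yields three tokens and no empty ones.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits a private copy of `input` on the characters in
 * `delims` and returns a NULL-terminated array of pointers into that copy.
 * The copy is stored in *buf_out; free both *buf_out and the returned array. */
char **split_with_strtok(const char *input, const char *delims, char **buf_out)
{
    size_t len = strlen(input);
    char *buf = malloc(len + 1);
    /* A string of length len holds at most len/2 + 1 non-empty tokens,
     * plus one slot for the NULL terminator. */
    char **tokens = malloc((len / 2 + 2) * sizeof *tokens);
    size_t n = 0;

    if (buf == NULL || tokens == NULL) {
        free(buf);
        free(tokens);
        return NULL;
    }
    memcpy(buf, input, len + 1);          /* strtok() will write into buf */

    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        tokens[n++] = tok;
    tokens[n] = NULL;

    *buf_out = buf;
    return tokens;
}
```

Since the input is copied, a string literal can be passed safely. Keep in mind that strtok() keeps hidden state between calls, so this sketch is not safe to use from several threads at once.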
I think strsep is still the best tool for this:
That is literally one line that splits a string.
The extra parentheses are a stylistic element to indicate that we're intentionally testing the result of an assignment, not an equality operator ==.
For that pattern to work, token and str both have type char *. If you started with a string literal, then you'd want to make a copy of it first:
If two delimiters appear together in str, you'll get a token value that's the empty string. The value of str is modified in that each delimiter encountered is overwritten with a zero byte, which is another good reason to copy the string being parsed first.
In a comment, someone suggested that strtok is better than strsep because strtok is more portable. Ubuntu and Mac OS X have strsep; it's safe to guess that other unixy systems do as well. Windows lacks strsep, but it has strpbrk, which enables this short and sweet strsep replacement:
Here is a good explanation of strsep vs strtok. The pros and cons may be judged subjectively; however, I think it's a telling sign that strsep was designed as a replacement for strtok.
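To make the strsep pattern concrete without depending on a non-standard function, here is a hedged sketch of a strpbrk()-based stand-in together with the one-line loop. The names my_strsep and count_fields are hypothetical and this is not the answer's original code; it only illustrates the behaviour described above, including the empty token produced by two adjacent delimiters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strsep(), built on standard strpbrk().
 * Returns the next token from *stringp (possibly empty), overwrites the
 * delimiter that ends it with '\0', and advances *stringp past it.
 * Returns NULL once *stringp is NULL. */
char *my_strsep(char **stringp, const char *delims)
{
    char *start = *stringp;
    if (start == NULL)
        return NULL;

    char *end = strpbrk(start, delims);
    if (end != NULL)
        *end++ = '\0';      /* terminate this token, step past delimiter */
    *stringp = end;         /* NULL when the last token was just returned */
    return start;
}

/* Counts tokens in a writable buffer using the one-line loop pattern. */
size_t count_fields(char *str, const char *delims)
{
    char *token;
    size_t n = 0;
    while ((token = my_strsep(&str, delims)) != NULL)
        n++;                /* token may be "" between adjacent delimiters */
    return n;
}
```

The buffer passed in must be writable (an array or a heap copy, not a string literal), and the caller should keep its own pointer to the start of the buffer because the loop moves str along.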
String tokenizer: this code should put you in the right direction.
Here are my two cents:
Usage:
The method below will do the whole job for you (memory allocation, counting the length). More information and a description can be found here: Implementation of Java String.split() method to split C string
How to use it:
In the above example, there would be a way to return an array of null-terminated strings (like you want) in place in the string. It would not be possible to pass a literal string though, as it would have to be modified by the function:
There is probably a neater way to do it, but you get the idea.
I think the following solution is ideal:
Explanation of the code:
- [Define a structure] token to store the address and lengths of the tokens
- [Allocate enough memory for these in the worst case, which is when] str is made up entirely of separators, so there are strlen(str) + 1 tokens, all of them empty strings
- [Scan] str, recording the address and length of every token
- [Allocate the output array, including extra space for a] NULL sentinel value
- [Copy the tokens using the recorded address and length] information; use memcpy as it's faster than strcpy and we know the lengths
Note: malloc checking omitted for brevity.
In general, I wouldn't return an array of char * pointers from a split function like this, as it places a lot of responsibility on the caller to free them correctly. An interface I prefer is to allow the caller to pass a callback function and call this for every token, as I have described here: Split a String in C.
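The callback interface mentioned at the end of this answer could look roughly like the sketch below. All names here (token_fn, split_each, struct stats, collect_stats) are assumptions made for illustration, not the code from the linked post. Each field is handed to the callback as a start pointer and a length, so nothing is allocated and the caller has nothing to free.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one token as (start, length) plus a
 * caller-supplied context pointer. The token is not NUL-terminated. */
typedef void (*token_fn)(const char *start, size_t len, void *ctx);

/* Calls `fn` once per field of `str`, splitting on any character in
 * `delims`. Nothing is allocated and `str` is not modified; adjacent
 * delimiters yield empty fields (len == 0). */
void split_each(const char *str, const char *delims, token_fn fn, void *ctx)
{
    for (;;) {
        size_t len = strcspn(str, delims);  /* length up to next delimiter */
        fn(str, len, ctx);
        if (str[len] == '\0')
            break;                          /* that was the last field */
        str += len + 1;                     /* skip the delimiter */
    }
}

/* Example callback: counts fields and remembers the longest length seen. */
struct stats { size_t count; size_t longest; };

void collect_stats(const char *start, size_t len, void *ctx)
{
    struct stats *s = ctx;
    (void)start;                            /* only the length is needed */
    s->count++;
    if (len > s->longest)
        s->longest = len;
}
```

A caller would declare a struct stats initialized to zero and pass its address as ctx. The trade-off is that tokens are not NUL-terminated, so a callback that needs a C string has to copy len bytes itself.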
This optimized method creates (or updates an existing) array of pointers in *result and returns the number of elements in *count.
Use "max" to indicate the maximum number of strings you expect (when you specify an existing array or for any other reason), else set it to 0.
To compare against a list of delimiters, define delim as a char* and replace the line:
with the two following lines:
Enjoy
Usage example:
My version:
This function takes a char* string and splits it by the delimiter. There can be multiple delimiters in a row. Note that the function modifies the original string. You must make a copy of the original string first if you need the original to stay unaltered. This function doesn't use any cstring function calls, so it might be a little faster than others. If you don't care about memory allocation, you can allocate sub_strings at the top of the function with size strlen(src_str)/2 and (like the C++ "version" mentioned) skip the bottom half of the function. If you do this, the function is reduced to O(N), but the memory-optimized way shown below is O(2N) (that is, two passes; asymptotically both are linear).
The function:
How to use it:
Below is my strtok() implementation from the zString library. zstring_strtok() differs from the standard library's strtok() in the way it treats consecutive delimiters.
Just have a look at the code below; I'm sure you will get an idea of how it works (I tried to use as many comments as I could).
Below is an example usage...
The library can be downloaded from GitHub: https://github.com/fnoyanisi/zString
This is a string splitting function that can handle multi-character delimiters. Note that if the delimiter is longer than the string that is being split, then buffer and stringLengths will be set to (void *) 0, and numStrings will be set to 0.
This algorithm has been tested, and works. (Disclaimer: It has not been tested for non-ASCII strings, and it assumes that the caller gave valid parameters.)
Sample code:
Libraries:
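For readers who want to see the multi-character idea in code, here is a minimal sketch built on standard strstr(). The function name split_on_string and its signature are hypothetical, and its behaviour is not identical to the answer's function: when the delimiter does not occur (for instance because it is longer than the input), this sketch returns the whole input as a single field instead of setting everything to zero.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: splits `str` on every occurrence of the whole string
 * `delim` (which must be non-empty). Returns a NULL-terminated array of
 * freshly allocated copies and stores the field count in *count.
 * Returns NULL on allocation failure or if `delim` is empty. */
char **split_on_string(const char *str, const char *delim, size_t *count)
{
    size_t dlen = strlen(delim);
    if (dlen == 0)
        return NULL;

    size_t n = 1;                               /* k matches => k + 1 fields */
    for (const char *p = strstr(str, delim); p != NULL; p = strstr(p + dlen, delim))
        n++;

    char **out = malloc((n + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    const char *start = str;
    for (size_t i = 0; i < n; i++) {
        const char *end = strstr(start, delim);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        out[i] = malloc(len + 1);
        if (out[i] == NULL) {                   /* undo partial work */
            while (i > 0)
                free(out[--i]);
            free(out);
            return NULL;
        }
        memcpy(out[i], start, len);
        out[i][len] = '\0';
        start = end ? end + dlen : start + len; /* move past the delimiter */
    }
    out[n] = NULL;
    *count = n;
    return out;
}
```

Matches are taken left to right without overlap, so "JAN::FEB::::MAR" split on "::" gives "JAN", "FEB", an empty field, and "MAR". The caller frees each element and then the array.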
Explode & implode: the initial string remains intact, dynamic memory allocation
Usage:
If you are willing to use an external library, I can't recommend bstrlib enough. It takes a little extra setup, but is easier to use in the long run.
For example, to split the string below, one first creates a bstring with the bfromcstr() call. (A bstring is a wrapper around a char buffer.)
Next, split the string on commas, saving the result in a struct bstrList, which has fields qty and an array entry, which is an array of bstrings.
bstrlib has many other functions to operate on bstrings.
Easy as pie...
Late to the party, I know, but here are 2 more functions to play with and probably further adjust to your needs (source code at the bottom of the post).
See also the Implementation Notes, further below, to decide which function suits your needs better.
Usage Notes
Their prototypes are almost identical, except for the source string (strp and str, respectively). strp (pointer to string) is the address of an already allocated, non-constant c-string, to be tokenized in place. str is a c-string which is not altered (it can even be a string literal). By c-string I mean a nul-terminated buffer of chars. The rest of the arguments are the same for both functions.
To parse all available tokens, mute ntoks (meaning set it to 0 before passing it to any of the functions, or pass it as a NULL pointer). Otherwise the functions parse up to *ntoks tokens, or until there are no more tokens (whichever comes first). In any case, when ntoks is non-NULL it gets updated with the count of successfully parsed tokens.
Note also that a non-muted ntoks determines how many pointers will be allocated. Thus if the source string contains, say, 10 tokens and we set ntoks to 1000, we'll end up with 990 needlessly allocated pointers. On the other hand, if the source string contains, say, 1000 tokens but we only need the first 10, setting ntoks to 10 sounds like a much wiser choice.
Both functions allocate and return an array of char-pointers, but str_toksarray_alloc() makes them point to the tokens in the modified source string itself, while str_toksarray_alloc2() makes them point to dynamically allocated copies of the tokens (the 2 at the end of its name indicates the 2 levels of allocation).
The returned array is appended with a NULL sentinel pointer, which is not taken into account in the passed-back value of ntoks (put otherwise, when non-NULL, ntoks passes back to the caller the length of the returned array, not its 1st-level size).
When keepnulls is set to true, the resulting tokens are similar to what we'd expect from the strsep() function. This mostly means that consecutive delimiters in the source string produce empty tokens (nulls), and if delim is an empty c-string or none of its delimiter chars were found in the source string, the result is just 1 token: the source string. Contrary to strsep(), empty tokens can be ignored by setting keepnulls to false.
Failed calls of the functions can be identified by checking their return value against NULL, or by checking the passed-back value of ntoks against 0 (provided ntoks was non-NULL). I suggest always checking against failure before attempting to access the returned array, because the functions include sanity checks which can postpone otherwise immediate crashes (for example, passing a NULL pointer as the source string).
On success, the caller should free the array when they're done with it.
For str_toksarray_alloc(), a simple free() is enough. For str_toksarray_alloc2() a loop is involved, due to the 2nd level of allocation. The NULL sentinel (or the passed-back value of a non-NULL ntoks) makes this trivial, but I'm also providing a toksarray_free2() function below, for all the lazy bees out there :)
Simplified examples using both functions follow.
Prep:
str_toksarray_alloc():
str_toksarray_alloc2():
Implementation Notes
Both functions use strsep() for the tokenization, which makes them thread-safe, but it's not a standard function. If it is not provided, you can always use an open-source implementation (like GNU's or Apple's, for example). The same goes for the function strdup(), which is used in str_toksarray_alloc2() (its implementation is trivial, but again here are GNU's and Apple's, for example).
A side effect of using strsep() in str_toksarray_alloc() is that the starting pointer of the source string keeps moving to the next token in every step of the parsing loop. This means that the caller won't be able to free the parsed string unless they have saved the starting address to an extra pointer. We save them the hassle by doing that locally in the function, using the strpSaved pointer. str_toksarray_alloc2() is not affected by this, because it doesn't touch the source string.
A main difference between the 2 functions is that str_toksarray_alloc() does not allocate memory for the found tokens. Rather, it allocates space just for the array pointers and sets them pointing directly into the source string. This works because strsep() nul-terminates the found tokens in place. This dependency can complicate your supporting code, but with big strings it can also make a big difference in performance. If preserving the source string is not important, it can make a big difference in memory footprint too.
On the other hand, str_toksarray_alloc2() allocates and returns a self-sustained array of dynamically allocated copies of the tokens, without further dependencies. It does so firstly by creating the array from a local duplicate of the source string, and secondly by duplicating the actual token contents into the array. This is a lot slower and leaves a much bigger memory footprint compared to str_toksarray_alloc(), but it has no further dependencies and sets no special requirements for the nature of the source string. This makes it easier to write simpler (hence more maintainable) supporting code.
Another difference between the 2 functions is the 1st level of allocation (the array pointers) when ntoks is muted. They both parse all available tokens, but they take quite different approaches. str_toksarray_alloc() uses alloc-ahead with an initial size of 16 (char-pointers), doubling it on demand in the parsing loop. str_toksarray_alloc2() makes a 1st pass counting all available tokens, then allocates that many char-pointers just once. That 1st pass is done with a helper function str_toksfound(), which uses the standard functions strpbrk() and strchr(). I'm providing the source code of that function too, further below.
Which approach is better is really up to you to decide, depending on the needs of your project. Feel free to adjust the code of each function to either approach and take it from there.
I'd say that on average and for really big strings alloc-ahead is much faster, especially when the initial size and growth factor are fine-tuned on a per-case basis (making them function parameters, for example). Saving that extra pass with all those strchr() and strpbrk() calls can make a difference there. However, with relatively small strings, which is pretty much the norm, allocating ahead just a bunch of char-pointers is overkill. It doesn't hurt, but it does clutter the code for no good reason in this case. Anyway, feel free to choose whichever suits you best.
The same goes for these 2 functions. I'd say in most cases str_toksarray_alloc2() is much simpler to cope with, since memory and performance are rarely an issue with small to medium strings. If you have to deal with huge strings, then consider using str_toksarray_alloc() (though in those cases you should roll a specialized string parsing function, close to the needs of your project and the specs of your input).
Oh boy, I think that was a bit more than just 2 cents (lol).
Anyway, here is the code of the 2 functions and the helper ones (I've removed most of their description comments, since I've covered pretty much everything already).
Source Code
str_toksarray_alloc():
str_toksarray_alloc2():
str_tokscount() - helper function, used by str_toksarray_alloc2():
toksarray_free2() - use it on the array returned by str_toksarray_alloc2():
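The two helper roles described in this answer (a counting first pass, and cleanup of a two-level allocation) can be sketched as follows. These are not the answer's str_tokscount() or toksarray_free2(); the names count_fields_keep_empty and free_token_array are hypothetical, and the counting rule shown is only the keepnulls = true case, where every delimiter character starts a new field.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counting pass: number of fields in `s` when empty fields are
 * kept, i.e. one more than the number of delimiter characters found.
 * With an empty `delims`, strpbrk() never matches and the result is 1. */
size_t count_fields_keep_empty(const char *s, const char *delims)
{
    size_t n = 1;
    while ((s = strpbrk(s, delims)) != NULL) {
        n++;
        s++;                        /* continue after this delimiter */
    }
    return n;
}

/* Hypothetical cleanup for a NULL-terminated array whose elements were
 * allocated individually (two levels of allocation). */
void free_token_array(char **tokens)
{
    if (tokens == NULL)
        return;
    for (char **p = tokens; *p != NULL; p++)
        free(*p);                   /* second level: each token copy */
    free(tokens);                   /* first level: the pointer array */
}
```

Counting first lets the caller allocate the pointer array exactly once, at the cost of scanning the string twice; the NULL sentinel is what lets the cleanup loop run without being told the length.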
Both strtok() and strsep() modify the input string. We can write a function to split the string based on delimiters using strspn() and strpbrk().
Algorithm:
1. [If the input string is empty, return] null.
2. [Skip any leading delimiters and record where the token starts] (use strspn() for this), call it start.
3. [Find the next delimiter, or the end of the string] (use strpbrk() for this), call it end.
4. [Allocate memory and copy the string from] start to end in that memory.
Advantage:
- [Does not modify the input string, as] strtok() and strsep() do.
Implementation:
Output:
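The strspn()/strpbrk() steps above can be sketched as a single "next token" function. This is an illustration under assumptions, not the answer's implementation: the name next_token_copy and the cursor parameter are made up here. All scanning state lives in the caller's cursor, so the input is never written to and no hidden static state is involved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: returns a malloc'ed copy of the next token in *cursor,
 * or NULL when no token is left (or allocation fails). *cursor is advanced
 * past the token; the input string itself is never written to. */
char *next_token_copy(const char **cursor, const char *delims)
{
    const char *start = *cursor + strspn(*cursor, delims); /* skip delimiters */
    if (*start == '\0') {
        *cursor = start;
        return NULL;                                       /* nothing left */
    }

    const char *end = strpbrk(start, delims);              /* next delimiter */
    if (end == NULL)
        end = start + strlen(start);                       /* or end of input */

    size_t len = (size_t)(end - start);
    char *token = malloc(len + 1);
    if (token == NULL)
        return NULL;
    memcpy(token, start, len);
    token[len] = '\0';

    *cursor = end;
    return token;
}
```

A caller sets a const char * to the start of the input and calls the function in a loop until it returns NULL, freeing each token when done. Because leading delimiters are skipped on every call, empty fields are never returned.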
Try using this.
My approach is to scan the string and let the pointers point to every character after a delimiter (and to the first character), while overwriting each occurrence of the delimiter in the string with '\0'.
First make a copy of the original string (since it's constant), then get the number of splits by scanning it and pass that back through the pointer parameter len. After that, point the first result pointer to the start of the copy, then scan the copy: on encountering a delimiter, overwrite it with '\0' so that the previous result string is terminated, and point the next result pointer to the following character.
My code (tested):
Result:
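As a rough illustration of the copy-then-scan approach this answer describes (and not the answer's tested code), the sketch below copies the input, counts the pieces, then walks the copy replacing each delimiter with '\0'. The function name split_copy and the exact signature are assumptions; only the len out-parameter comes from the description.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the copy-then-scan approach. Returns an array of
 * *len pointers into one private copy of `str`; result[0] is the start of
 * that copy, so free(result[0]) followed by free(result) releases it all. */
char **split_copy(const char *str, char delim, size_t *len)
{
    size_t size = strlen(str) + 1;
    char *copy = malloc(size);
    if (copy == NULL)
        return NULL;
    memcpy(copy, str, size);

    size_t n = 1;                           /* first pass: count the pieces */
    for (const char *p = copy; *p != '\0'; p++)
        if (*p == delim)
            n++;

    char **result = malloc(n * sizeof *result);
    if (result == NULL) {
        free(copy);
        return NULL;
    }

    size_t i = 0;
    result[i++] = copy;                     /* first piece starts the copy */
    for (char *p = copy; *p != '\0'; p++) {
        if (*p == delim) {
            *p = '\0';                      /* terminate the previous piece */
            result[i++] = p + 1;            /* next piece follows it */
        }
    }
    *len = n;
    return result;
}
```

Only two allocations are made regardless of the number of pieces, which keeps cleanup to two free() calls. Adjacent delimiters produce empty strings in the result.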
Two issues surrounding this question are memory management and thread safety. As you can see from the numerous posts, this isn't an easy task to accomplish seamlessly in C. I desired a solution that is:
The solution I came up with meets all of these criteria. It's probably a little more work to set up than some other solutions posted here, but I think that in practice, the extra work is worth it in order to avoid the common pitfalls of other solutions.
Below is an example compile and output. Note that in my example, I purposefully spelled out "APRIL" so that you can see how the soft error works.
Enjoy!
Here is another implementation that will operate safely to tokenize a string literal, matching the prototype requested in the question and returning an allocated pointer-to-pointer to char (e.g. char **). The delimiter string can contain multiple characters, and the input string can contain any number of tokens. All allocations and reallocations are handled by malloc or realloc without POSIX strdup.
The initial number of pointers allocated is controlled by the NPTRS constant, and the only limitation is that it be greater than zero. The char ** returned contains a sentinel NULL after the last token, similar to *argv[], and is in a form usable by execv, execvp and execve.
As with strtok(), multiple sequential delimiters are treated as a single delimiter, so "JAN,FEB,MAR,APR,MAY,,,JUN,JUL,AUG,SEP,OCT,NOV,DEC" will be parsed as if only a single ',' separates "MAY,JUN".
The function below is commented in-line and a short main() was added splitting the months. The initial number of pointers allocated was set at 2 to force three reallocations while tokenizing the input string:
Example Use/Output
Let me know if you have any further questions.
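For reference, the grow-on-demand idea in this answer can be sketched like this. It is a simplified stand-in rather than the answer's function: split_growing and INITIAL_PTRS are hypothetical names (the answer's constant is NPTRS), and doubling the capacity is an assumption about the growth policy. Runs of delimiters collapse into one separator, and the array ends with a NULL sentinel.

```c
#include <stdlib.h>
#include <string.h>

#define INITIAL_PTRS 2   /* hypothetical starting capacity; must be > 0 */

/* Hypothetical sketch: tokenizes `str` without modifying it. Runs of
 * delimiter characters count as one separator. Returns a NULL-terminated
 * array of malloc'ed copies (free each element, then the array). */
char **split_growing(const char *str, const char *delims)
{
    size_t cap = INITIAL_PTRS, n = 0;
    char **tokens = malloc(cap * sizeof *tokens);
    if (tokens == NULL)
        return NULL;

    for (;;) {
        str += strspn(str, delims);             /* skip a run of delimiters */
        size_t len = strcspn(str, delims);      /* length of the next token */
        if (len == 0)
            break;                              /* reached end of input */

        if (n + 1 == cap) {                     /* keep one slot for NULL */
            char **tmp = realloc(tokens, 2 * cap * sizeof *tokens);
            if (tmp == NULL)
                goto fail;
            tokens = tmp;
            cap *= 2;
        }
        tokens[n] = malloc(len + 1);
        if (tokens[n] == NULL)
            goto fail;
        memcpy(tokens[n], str, len);
        tokens[n][len] = '\0';
        n++;
        str += len;
    }
    tokens[n] = NULL;
    return tokens;

fail:
    while (n > 0)
        free(tokens[--n]);
    free(tokens);
    return NULL;
}
```

Assigning the realloc() result to a temporary first means the original array is still reachable, and can be freed, if the reallocation fails.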
I came across this while looking for a simple solution.
I am fascinated by all of the options but dissatisfied for my own use case/taste (which may be terrible).
I have created a somewhat unique solution that aims to behave clearly for its user, not re-allocate any memory, and be human-readable, with comments.
Uploaded to gist.github here: https://gist.github.com/RepComm/1e89f7611733ce0e75c8476d5ef66093
Example:
Output:
Full source of strutils.c
Outputs
I tried to make a very simple one. I am also showing an example in main().
Assume a .txt file has
Running it with a .txt file as a parameter would give