Is this a good substr() for C?
See also C Tokenizer
Here is a quick substr() for C that I wrote (yes, the variable initializations need to be moved to the start of the function, etc., but you get the idea).
I have seen many "smart" implementations of substr() that are simple one-liner calls to strncpy()!
They are all wrong (strncpy does not guarantee null termination and thus the call might NOT produce a correct substring!)
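To make the strncpy() point concrete, here is a minimal sketch. The helper name copy_prefix is hypothetical (it is not from the post), and it assumes the destination buffer holds at least n + 1 bytes. strncpy() writes no terminator when the source has n or more characters, so the helper adds one itself.

```c
#include <string.h>

/* Hypothetical helper, not part of the original post.
 * Copies at most n characters of src into dst and always terminates.
 * Assumes dst has room for at least n + 1 bytes. */
static void copy_prefix(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);  /* writes no '\0' if strlen(src) >= n */
    dst[n] = '\0';         /* so terminate explicitly */
}
```

Calling strncpy(buf, "abcdef", 3) on its own leaves buf[3] untouched, which is why a bare one-liner can return garbage past the third character.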
Here is something maybe better?
Bring out the bugs!
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char* substr(const char* text, int nStartingPos, int nRun)
    {
        char* emptyString = strdup(""); /* C'mon! This cannot fail */
        if(text == NULL) return emptyString;
        int textLen = strlen(text);
        --nStartingPos; /* callers pass a 1-based position */
        if((nStartingPos < 0) || (nRun <= 0) || (textLen == 0) || (textLen < nStartingPos)) return emptyString;
        char* returnString = (char *)calloc((1 + nRun), sizeof(char));
        if(returnString == NULL) return emptyString;
        strncat(returnString, (nStartingPos + text), nRun);
        /* We do not need emptyString anymore from this point onwards */
        free(emptyString);
        emptyString = NULL;
        return returnString;
    }

    int main()
    {
        const char *text = "-2--4--6-7-8-9-10-11-";
        char *p = substr(text, -1, 2);
        printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
        free(p);
        p = substr(text, 1, 2);
        printf("[*]'%s' (-2)\n", ((p == NULL) ? "<NULL>" : p));
        free(p);
        p = substr(text, 3, 2);
        printf("[*]'%s' (--)\n", ((p == NULL) ? "<NULL>" : p));
        free(p);
        p = substr(text, 16, 2);
        printf("[*]'%s' (10)\n", ((p == NULL) ? "<NULL>" : p));
        free(p);
        p = substr(text, 16, 20);
        printf("[*]'%s' (10-11-)\n", ((p == NULL) ? "<NULL>" : p));
        free(p);
        p = substr(text, 100, 2);
        printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
        free(p);
        p = substr(text, 1, 0);
        printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
        free(p);
        return 0;
    }
Output:
    [*]'' (")
    [*]'-2' (-2)
    [*]'--' (--)
    [*]'10' (10)
    [*]'10-11-' (10-11-)
    [*]'' (")
    [*]'' (")
Comments (5)
Your function seems very complicated for what should be a simple operation. Some problems are (not all of these are bugs):
- strdup(), and other memory allocation functions, can fail; you should allow for all possible issues.
- You cannot tell an error from a valid result: either a malloc() failure of substr("xxx",1,1) or a working substr("xxx",1,0) produces an empty string.
- There is no need to calloc() memory that you're going to overwrite anyway.
- There is no need for strncat(): you should know the sizes and the memory you have available before doing any copying, so you can use the (most likely) faster memcpy().
The following segment is what I'd do (I rather like the Python idiom of negative values to count from the end of the string, but I've kept length rather than end position).
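The answer's own listing is not reproduced here, so the following is only a sketch of a function along the lines it describes, not the answerer's code. The name substr_sketch, the 0-based start index, and the choice to clamp an over-long length are all assumptions. It returns NULL on any error, lets a negative start count from the end of the string, and uses malloc() plus memcpy() because the sizes are known before copying.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not the answerer's original code.
 * start is 0-based; a negative start counts back from the end (-1 = last char).
 * Returns a malloc()ed string the caller must free(), or NULL on error. */
char *substr_sketch(const char *text, long start, size_t len)
{
    if (text == NULL)
        return NULL;

    size_t text_len = strlen(text);
    size_t pos;

    if (start < 0) {
        /* Distance from the end, computed without overflowing on LONG_MIN. */
        size_t back = (size_t)(-(start + 1)) + 1;
        if (back > text_len)
            return NULL;
        pos = text_len - back;
    } else {
        pos = (size_t)start;
        if (pos > text_len)
            return NULL;
    }

    /* Clamp the length to what is actually available (an assumption). */
    size_t avail = text_len - pos;
    if (len > avail)
        len = avail;

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, text + pos, len);  /* size is known, so memcpy is enough */
    out[len] = '\0';
    return out;
}
```

With this shape, a caller can tell failure (NULL) from a legitimately empty result such as substr_sketch("xxx", 1, 0), which returns an allocated empty string.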
I would say return NULL if the input isn't valid rather than a malloc()ed empty string. That way you can test whether or not the function failed with if(p) rather than if(*p == 0).
Also, I think your function leaks memory because emptyString is only free()d in one conditional. You should make sure you free() it unconditionally, i.e. right before the return.
As to your comment on strncpy() not NUL-terminating the string (which is true), if you use calloc() to allocate the string rather than malloc(), this won't be a problem if you allocate one byte more than you copy, since calloc() automatically sets all values (including, in this case, the end) to 0.
I would give you more notes but I hate reading camelCase code. Not that there's anything wrong with it.
EDIT: With regards to your updates:
Be aware that the C standard defines sizeof(char) to be 1 regardless of your system. If you're using a computer that uses 9 bits in a byte (god forbid), sizeof(char) is still going to be 1. Not that there's anything wrong with saying sizeof(char) - it clearly shows your intention and provides symmetry with calls to calloc() or malloc() for other types. But sizeof(int) is actually useful (ints can be different sizes on 16- and 32- and these newfangled 64-bit computers). The more you know.
I'd also like to reiterate that consistency with most other C code is to return NULL on an error rather than "". I know many functions (like strcmp()) will probably do bad things if you pass them NULL - this is to be expected. But the C standard library (and many other C APIs) take the approach of "It's the caller's responsibility to check for NULL, not the function's responsibility to baby him/her if (s)he doesn't." If you want to do it the other way, that's cool, but it's going against one of the stronger trends in C interface design.
Also, I would use strncpy() (or memcpy()) rather than strncat(). Using strncat() (and strcat()) obscures your intent - it makes someone looking at your code think you want to add to the end of the string (which you do, because after calloc(), the end is the beginning), when what you want to do is set the string. strncat() makes it look like you're adding to a string, while strcpy() (or another copy routine) would make it look more like what your intent is. The following three lines all do the same thing in this context - pick whichever one you think looks nicest:
Plus, strncpy() and memcpy() will probably be a (wee little) bit faster/more efficient than strncat().
text + nStartingPos is the same as nStartingPos + text - I would put the char * first, as I think that's clearer, but whatever order you want to put them in is up to you. Also, the parentheses around them are unnecessary (but nice), since + has higher precedence than the comma.
EDIT 2: The three lines of code don't do the same thing, but in this context they will all produce the same result. Thanks for catching me on that.
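The three lines themselves did not survive on this page. As a hedged illustration of the point in EDIT 2, the sketch below copies into a calloc()ed buffer with each of the three calls the answer names. The names copy_n and copy_kind are hypothetical, and the helper assumes n does not exceed strlen(src).

```c
#include <stdlib.h>
#include <string.h>

enum copy_kind { USE_STRNCAT, USE_STRNCPY, USE_MEMCPY };

/* Hypothetical helper for illustration only.
 * Returns a calloc()ed copy of the first n characters of src.
 * Assumes n <= strlen(src); the caller must free() the result. */
char *copy_n(const char *src, size_t n, enum copy_kind kind)
{
    char *dst = calloc(n + 1, 1);  /* zero-filled, so dst[n] is already '\0' */
    if (dst == NULL)
        return NULL;

    switch (kind) {
    case USE_STRNCAT:
        strncat(dst, src, n);  /* appends to the empty string, terminates */
        break;
    case USE_STRNCPY:
        strncpy(dst, src, n);  /* no terminator written; calloc supplies it */
        break;
    case USE_MEMCPY:
        memcpy(dst, src, n);   /* copies exactly n bytes, ignores '\0' in src */
        break;
    }
    return dst;
}
```

The calls differ once n is larger than the remaining text: strncat() and strncpy() stop at the source's terminator, while memcpy() reads n bytes regardless. In the question's code, substr(text, 16, 20) asks for more characters than remain, so a memcpy() version would need the length clamped first.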
You need to check for null. Remember that it still must allocate 1 byte for the null character.
strdup could fail (though it is very unlikely and not worth checking for, IMHO). It does have another problem however - it is not a Standard C function. It would be better to use malloc.
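As a sketch of the malloc() route, a portable stand-in for strdup() takes only a few lines. The name dup_string is hypothetical; strdup() itself comes from POSIX and was only added to ISO C in C23.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable replacement for strdup().
 * Returns a malloc()ed copy of s, or NULL if allocation fails.
 * s must not be NULL; the caller must free() the result. */
char *dup_string(const char *s)
{
    size_t size = strlen(s) + 1;  /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;
}
```

Because it can return NULL, callers still have to check the result, which is the same check the other answers ask for after strdup().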
You can also use the memmove function to return a substring from start to length.
Improving/adding another solution from paxdiablo's solution:
Hope it helps. Tested on Windows 7 Home Premium with the TCC C compiler.
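The listing for this answer is missing here, so what follows is only one possible reading of the memmove() idea, not the poster's code. memmove() matters when source and destination overlap, so this sketch extracts the substring in place inside a writable buffer. The name substr_inplace and the 0-based start index are assumptions.

```c
#include <string.h>

/* Hypothetical in-place variant, not the poster's original code.
 * Moves the substring [start, start + len) to the front of s and
 * terminates it. Returns s, or NULL if s is NULL or start is out of range.
 * s must be a writable buffer (not a string literal). */
char *substr_inplace(char *s, size_t start, size_t len)
{
    if (s == NULL)
        return NULL;

    size_t n = strlen(s);
    if (start > n)
        return NULL;

    if (len > n - start)
        len = n - start;      /* clamp to the available characters */

    memmove(s, s + start, len);  /* safe even though the ranges overlap */
    s[len] = '\0';
    return s;
}
```

No allocation is needed with this approach, at the cost of destroying the original string.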