UTF-8 strings and malloc in C
With opendir and readdir I read a directory's contents. During that process I do some string manipulation and allocation, something like this:
int stringlength = strlen(cur_dir)+strlen(ep->d_name)+2;
char *file_with_path = xmalloc(stringlength); //xmalloc is a malloc wrapper with some tests (like no more memory)
snprintf (file_with_path, (size_t)stringlength, "%s/%s", cur_dir, ep->d_name);
But what if a string contains a two-byte UTF-8 character? How do you handle that? stringlength*2?
Thanks
Comments (2)
strlen() counts the bytes in the string; it doesn't care whether those bytes represent UTF-8 encoded Unicode characters. So, for example, strlen() of a string containing the UTF-8 encoding of "aöü" would return 5, since the string is encoded as "a\xc3\xb6\xc3\xbc".

strlen counts the number of bytes in a string (up to the terminating NUL), not the number of UTF-8 characters, so stringlength should already be as large as you need it.