C:从分隔源字符串创建字符串数组

发布于 2024-08-19 09:22:40 字数 508 浏览 12 评论 0原文

在 C(而不是 C++)中将分隔字符串转换为字符串数组的有效方法是什么?例如,我可能有:

char *input = "valgrind --leak-check=yes --track-origins=yes ./a.out"

源字符串始终只有一个空格作为分隔符。我想要一个 malloc'ed 字符串数组 char *myarray[] 这样:

myarray[0]=="valgrind"
myarray[1]=="--leak-check=yes"
...

编辑 我必须假设有任意数量的令牌inputString 所以我不能只将其限制为 10 或其他。

我尝试使用 strtok 和我实现的链接列表尝试一个混乱的解决方案,但 valgrind 抱怨太多,所以我放弃了。

(如果您想知道,这是我正在尝试编写的基本 Unix shell。)

What would be an efficient way of converting a delimited string into an array of strings in C (not C++)? For example, I might have:

char *input = "valgrind --leak-check=yes --track-origins=yes ./a.out"

The source string will always have only a single space as the delimiter. And I would like a malloc'ed array of malloc'ed strings char *myarray[] such that:

myarray[0]=="valgrind"
myarray[1]=="--leak-check=yes"
...

Edit I have to assume that there are an arbitrary number of tokens in the inputString so I can't just limit it to 10 or something.

I've attempted a messy solution with strtok and a linked list I've implemented, but valgrind complained so much that I gave up.

(If you're wondering, this is for a basic Unix shell I'm trying to write.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

待"谢繁草 2024-08-26 09:22:40

类似这样的事情是怎么回事:

char* string = "valgrind --leak-check=yes --track-origins=yes ./a.out";
char** args = (char**)malloc(MAX_ARGS*sizeof(char*));
memset(args, 0, sizeof(char*)*MAX_ARGS);

char* curToken = strtok(string, " \t");

for (int i = 0; curToken != NULL; ++i)
{
  args[i] = strdup(curToken);
  curToken = strtok(NULL, " \t");
}

What's about something like:

char* string = "valgrind --leak-check=yes --track-origins=yes ./a.out";
char** args = (char**)malloc(MAX_ARGS*sizeof(char*));
memset(args, 0, sizeof(char*)*MAX_ARGS);

char* curToken = strtok(string, " \t");

for (int i = 0; curToken != NULL; ++i)
{
  args[i] = strdup(curToken);
  curToken = strtok(NULL, " \t");
}
画中仙 2024-08-26 09:22:40

如果您一开始就拥有 input 中的所有输入,那么您的令牌永远不会多于 strlen(input)。如果您不允许“”作为令牌,那么您永远不能拥有超过 strlen(input)/2 令牌。因此,除非输入很大,否则您可以安全地编写。

char ** myarray = malloc( (strlen(input)/2) * sizeof(char*) );

int NumActualTokens = 0;
while (char * pToken = get_token_copy(input))
{ 
   myarray[++NumActualTokens] = pToken;
   input = skip_token(input);
}

char ** myarray = (char**) realloc(myarray, NumActualTokens * sizeof(char*));

作为进一步的优化,您可以保留 input 并仅用 \0 替换空格,并将指针放入 input 缓冲区中并放入 myarray[] 中。不需要为每个令牌单独分配内存,除非由于某种原因您需要单独释放它们。

if you have all of the input in input to begin with then you can never have more tokens than strlen(input). If you don't allow "" as a token, then you can never have more than strlen(input)/2 tokens. So unless input is huge you can safely write.

char ** myarray = malloc( (strlen(input)/2) * sizeof(char*) );

int NumActualTokens = 0;
while (char * pToken = get_token_copy(input))
{ 
   myarray[++NumActualTokens] = pToken;
   input = skip_token(input);
}

char ** myarray = (char**) realloc(myarray, NumActualTokens * sizeof(char*));

As a further optimization, you can keep input around and just replace spaces with \0 and put pointers into the input buffer into myarray[]. No need for a separate malloc for each token unless for some reason you need to free them individually.

听,心雨的声音 2024-08-26 09:22:40

您是否记得为标记字符串结尾的终止 null 分配一个额外的字节?

Were you remembering to malloc an extra byte for the terminating null that marks the end of string?

因为看清所以看轻 2024-08-26 09:22:40

来自 OSX 上的 strsep(3) 联机帮助页:

   char **ap, *argv[10], *inputstring;

   for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
           if (**ap != '\0')
                   if (++ap >= &argv[10])
                           break;

针对任意标记数量进行编辑:

char **ap, **argv, *inputstring;

int arglen = 10;
argv = calloc(arglen, sizeof(char*));
for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
    if (**ap != '\0')
        if (++ap >= &argv[arglen])
        {
            arglen += 10;
            argv = realloc(argv, arglen);
            ap = &argv[arglen-10];
        }

或类似的内容。上面的方法可能行不通,但如果行不通,也相差不远了。构建链表比不断调用 realloc 更有效,但这实际上不是重点 - 重点是如何最好地利用 strsep

From the strsep(3) manpage on OSX:

   char **ap, *argv[10], *inputstring;

   for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
           if (**ap != '\0')
                   if (++ap >= &argv[10])
                           break;

Edited for arbitrary # of tokens:

char **ap, **argv, *inputstring;

int arglen = 10;
argv = calloc(arglen, sizeof(char*));
for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
    if (**ap != '\0')
        if (++ap >= &argv[arglen])
        {
            arglen += 10;
            argv = realloc(argv, arglen);
            ap = &argv[arglen-10];
        }

Or something close to that. The above may not work, but if not it's not far off. Building a linked list would be more efficient than continually calling realloc, but that's really besides the point - the point is how best to make use of strsep.

耶耶耶 2024-08-26 09:22:40

看看其他答案,对于 C 初学者来说,由于代码的紧凑,它看起来会很复杂,我想我会把它放在初学者身上,实际解析字符串而不是使用 可能更容易strtok...类似这样:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char **parseInput(const char *str, int *nLen);
void resizeptr(char ***, int nLen);

int main(int argc, char **argv){
    int maxLen = 0;
    int i = 0;
    char **ptr = NULL;
    char *str = "valgrind --leak-check=yes --track-origins=yes ./a.out";
    ptr = parseInput(str, &maxLen);
    if (!ptr) printf("Error!\n");
    else{
        for (i = 0; i < maxLen; i++) printf("%s\n", ptr[i]);
    }
    for (i = 0; i < maxLen; i++) free(ptr[i]);
    free(ptr);
    return 0;
}

char **parseInput(const char *str, int *Index){
    char **pStr = NULL;
    char *ptr = (char *)str;
    int charPos = 0, indx = 0;
    while (ptr++ && *ptr){
        if (!isspace(*ptr) && *ptr) charPos++;
        else{
            resizeptr(&ptr, ++indx);
            pStr[indx-1] = (char *)malloc(((charPos+1) * sizeof(char))+1);
            if (!pStr[indx-1]) return NULL;
            strncpy(pStr[indx-1], ptr - (charPos+1), charPos+1);
            pStr[indx-1][charPos+1]='\0';
            charPos = 0;
        }
    }
    if (charPos > 0){
        resizeptr(&pStr, ++indx);
        pStr[indx-1] = (char *)malloc(((charPos+1) * sizeof(char))+1);
        if (!pStr[indx-1]) return NULL;
        strncpy(pStr[indx-1], ptr - (charPos+1), charPos+1);
        pStr[indx-1][charPos+1]='\0';
    }
    *Index = indx;
    return (char **)pStr;
}

void resizeptr(char ***ptr, int nLen){
    if (*(ptr) == (char **)NULL){
        *(ptr) = (char **)malloc(nLen * sizeof(char*));
        if (!*(ptr)) perror("error!");
    }else{
        char **tmp = (char **)realloc(*(ptr),nLen);
        if (!tmp) perror("error!");
        *(ptr) = tmp;
    }
}

我稍微修改了代码以使其更容易。我使用的唯一字符串函数是 strncpy 。当然它有点啰嗦,但它确实动态地重新分配字符串数组,而不是使用硬编码的 MAX_ARGS,这意味着双指针当只有 3 或 4 个就已经占用内存了,这也会使内存使用高效且很小,通过使用 realloc,简单的解析通过使用 isspace 来覆盖,因为它使用指针进行迭代。当遇到空格时,它会重新分配双指针,并malloc偏移量来保存字符串。

请注意如何在 resizeptr 函数中使用三重指针。事实上,我认为这将是一个简单 C 程序、指针、realloc、malloc、按引用传递、基本的绝佳示例。解析字符串的元素...

希望这有帮助,
此致,
汤姆.

Looking at the other answers, for a beginner in C, it would look complex due to the tight size of code, I thought I would put this in for a beginner, it might be easier to actually parse the string instead of using strtok...something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char **parseInput(const char *str, int *nLen);
void resizeptr(char ***, int nLen);

int main(int argc, char **argv){
    int maxLen = 0;
    int i = 0;
    char **ptr = NULL;
    char *str = "valgrind --leak-check=yes --track-origins=yes ./a.out";
    ptr = parseInput(str, &maxLen);
    if (!ptr) printf("Error!\n");
    else{
        for (i = 0; i < maxLen; i++) printf("%s\n", ptr[i]);
    }
    for (i = 0; i < maxLen; i++) free(ptr[i]);
    free(ptr);
    return 0;
}

char **parseInput(const char *str, int *Index){
    char **pStr = NULL;
    char *ptr = (char *)str;
    int charPos = 0, indx = 0;
    while (ptr++ && *ptr){
        if (!isspace(*ptr) && *ptr) charPos++;
        else{
            resizeptr(&ptr, ++indx);
            pStr[indx-1] = (char *)malloc(((charPos+1) * sizeof(char))+1);
            if (!pStr[indx-1]) return NULL;
            strncpy(pStr[indx-1], ptr - (charPos+1), charPos+1);
            pStr[indx-1][charPos+1]='\0';
            charPos = 0;
        }
    }
    if (charPos > 0){
        resizeptr(&pStr, ++indx);
        pStr[indx-1] = (char *)malloc(((charPos+1) * sizeof(char))+1);
        if (!pStr[indx-1]) return NULL;
        strncpy(pStr[indx-1], ptr - (charPos+1), charPos+1);
        pStr[indx-1][charPos+1]='\0';
    }
    *Index = indx;
    return (char **)pStr;
}

void resizeptr(char ***ptr, int nLen){
    if (*(ptr) == (char **)NULL){
        *(ptr) = (char **)malloc(nLen * sizeof(char*));
        if (!*(ptr)) perror("error!");
    }else{
        char **tmp = (char **)realloc(*(ptr),nLen);
        if (!tmp) perror("error!");
        *(ptr) = tmp;
    }
}

I slightly modified the code to make it easier. The only string function that I used was strncpy..sure it is a bit long-winded but it does reallocate the array of strings dynamically instead of using a hard-coded MAX_ARGS, which means that the double pointer is already hogging up memory when only 3 or 4 would do, also which would make the memory usage efficient and tiny, by using realloc, the simple parsing is covered by employing isspace, as it iterates using the pointer. When a space is encountered, it reallocates the double pointer, and malloc the offset to hold the string.

Notice how the triple pointers are used in the resizeptr function.. in fact, I thought this would serve an excellent example of a simple C program, pointers, realloc, malloc, passing-by-reference, basic element of parsing a string...

Hope this helps,
Best regards,
Tom.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文