Need to know when there is no data between two token delimiters when using strtok()

Posted on 2024-12-24 01:01:28

I am trying to tokenize a string, but I need to know exactly when no data appears between two delimiters. For example, when tokenizing the string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd', which I am unable to find using strtok() alone. My attempt is shown below:

char arr_fields[num_of_fields][32]; /* one buffer per field; 32 is an assumed maximum width */
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data

for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
    if(tok)
        snprintf(arr_fields[i], sizeof arr_fields[i], "%s", tok);
    else
        snprintf(arr_fields[i], sizeof arr_fields[i], "%s", "-");
}

Running the code above on the example string puts a, b, c, d, e into the first five elements of arr_fields, which is not what I want. Each field must land at the array index matching its position in the line: if a field is missing between two delimiters, that slot should be recorded as empty.

永不分离 2024-12-31 01:01:28

7.21.5.8 The strtok function

The standard says the following regarding strtok:

[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.

As the quote above shows, strtok cannot solve your specific problem: it treats any run of consecutive characters from delims as a single delimiter, so it never returns an empty token.


Am I doomed to weep in silence, or can somebody help me out?

You can easily implement your own version of strtok that does what you want; see the snippets at the end of this post.

strtok_single makes use of strpbrk (char const* src, const char* delims), which returns a pointer to the first occurrence in the null-terminated string src of any character from delims.

If no matching character is found the function will return NULL.


strtok_single

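#include <string.h>  /* strpbrk */

/* Like strtok, but every delimiter ends a field, so empty fields are returned as "". */
char *
strtok_single (char * str, char const * delims)
{
  static char  * src = NULL;  /* remaining input, kept between calls */
  char  *  p,  * ret = 0;

  if (str != NULL)
    src = str;

  if (src == NULL)
    return NULL;

  if ((p = strpbrk (src, delims)) != NULL) {
    *p  = 0;       /* terminate the field at the delimiter */
    ret = src;
    src = ++p;     /* resume just past the delimiter */

  } else if (*src) {
    ret = src;     /* last field: no delimiter left */
    src = NULL;
  }

  return ret;
}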

Sample use

  char delims[] = ",";
  char data  [] = "foo,bar,,baz,biz";

  char * p    = strtok_single (data, delims);

  while (p) {
    printf ("%s\n", *p ? p : "<empty>");

    p = strtok_single (NULL, delims);
  }

Output

foo
bar
<empty>
baz
biz
別甾虛僞 2024-12-31 01:01:28

You can't use strtok() if that's what you want. From the man page:

A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.

Therefore it is just going to jump from c to d in your example.

You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.

我不在是我 2024-12-31 01:01:28

Lately I was looking for a solution to the same problem and found this thread.

You can use strsep().
From the manual:

The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.

手心的温暖 2024-12-31 01:01:28

As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:

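char arr_fields[num_of_fields][32];   /* one buffer per field; 32 is an assumed maximum width */
char delim[]=",\n";

size_t current_token = 0;   /* offset of the current field within line */
for (i = 0; i < num_of_fields; i++)
{
    /* length of the field starting at current_token (0 if it is empty) */
    size_t token_length = strcspn(line + current_token, delim);
    if(token_length)
        snprintf(arr_fields[i], sizeof arr_fields[i], "%.*s", (int)token_length, line + current_token);
    else
        snprintf(arr_fields[i], sizeof arr_fields[i], "%s", "-");
    current_token += token_length;
    if (line[current_token] != '\0')
        current_token++;    /* step over the delimiter itself */
}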
樱&纷飞 2024-12-31 01:01:28
  1. Parse (for example, strtok)
  2. Sort
  3. Insert
  4. Rinse and repeat as needed :)
☆獨立☆ 2024-12-31 01:01:28

You could use strchr to find the positions of the , characters. Manually copy your string up to each comma you find (using memcpy or strncpy), then call strchr again starting from the next character. This way you can see when two or more commas are adjacent (the pointers returned by consecutive strchr calls will differ by exactly 1), and you can write an if statement to handle that case.
