Getting all integers from an irregular string in C

Posted 2024-11-14 01:33:06 · 289 words · 6 views · 0 comments

I am looking for a (relatively) simple way to parse a random string, extract all of the integers from it, and put them into an array. This differs from some of the other, similar questions because my strings have no standard format.

Example:

pt112parah salin10n m5:isstupid::42$%&%^*%7first3

I would need to eventually get an array with these contents:

112 10 5 42 7 3

And I would like a method more efficient than going character by character through the string.

Thanks for your help

Comments (6)

任谁 2024-11-21 01:33:06

A quick solution. I'm assuming that there are no numbers that exceed the range of long, and that there are no minus signs to worry about. If those are problems, then you need to do a lot more work analyzing the results of strtol() and you need to detect '-' followed by a digit.

The code does loop over all characters; I don't think you can avoid that. But it does use strtol() to process each sequence of digits (once the first digit is found), and resumes where strtol() left off (and strtol() is kind enough to tell us exactly where it stopped its conversion).

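#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

int main(void)
{
    const char data[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
    long results[100];
    int  nresult = 0;

    const char *s = data;
    char c;

    /* Stop at the end of the string or when results[] is full. */
    while ((c = *s++) != '\0' && nresult < 100)
    {
        /* isdigit() needs a value representable as unsigned char. */
        if (isdigit((unsigned char)c))
        {
            char *end;
            /* s has already moved past c, so the number starts at s-1. */
            results[nresult++] = strtol(s-1, &end, 10);
            s = end;    /* resume right after the last digit converted */
        }
    }

    for (int i = 0; i < nresult; i++)
        printf("%d: %ld\n", i, results[i]);
    return 0;
}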

Output:

0: 112
1: 10
2: 5
3: 42
4: 7
5: 3
堇年纸鸢 2024-11-21 01:33:06

More efficient than going through character by character?

Not possible, because you must look at every character to know that it is not an integer.

Now, given that you have to go through the string character by character, I would recommend simply casting each character to an int and checking its value:

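// Declared before the loop: int value = 0, in_number = 0, count = 0; int array[N];
// Body of the loop, run once for each character c of the string:
int intVal = (int)c;
if (intVal >= 48 && intVal <= 57) {      // '0'-'9' are 48-57 when a char is cast to int (ASCII)
    value = value * 10 + (intVal - 48);  // append this digit to the current number
    in_number = 1;
}
else if (in_number) {
    array[count++] = value;              // a run of digits just ended: store it
    value = 0;
    in_number = 0;
}
// After the loop: if (in_number) array[count++] = value;
// otherwise a number at the very end of the string (the final 3) is lost.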

array will contain your solution.

放血 2024-11-21 01:33:06

Just because I've been writing Python all day and I want a break. Declaring an array will be tricky. Either you have to run it twice to work out how many numbers you have (and then allocate the array) or just use the numbers one by one as in this example.

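NB the ASCII characters for '0' to '9' are 48 to 57 (i.e. consecutive). The C standard guarantees that '0' to '9' are consecutive in every character set, so input[i] - '0' gives the digit's value even outside ASCII.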

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

int main(void)
{
    const char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";

    size_t length = strlen(input);
    int value = 0;
    bool gotnumber = false;
    for (size_t i = 0; i < length; i++)
    {
        if (input[i] >= '0' && input[i] <= '9')
        {
            gotnumber = true;
            value = value * 10; // shift up a column
            value += input[i] - '0'; // convert the digit character to its numeric value
        }
        else if (gotnumber) // we hit this the first time we encounter a non-number after we've had numbers
        {
            printf("Value: %d \n", value);
            value = 0;
            gotnumber = false;
        }
    }
    if (gotnumber) // the string ended inside a number (the trailing 3), so print it too
        printf("Value: %d \n", value);

    return 0;
}

EDIT: the previous version didn't deal with 0

云巢 2024-11-21 01:33:06

Another solution is to use the strtok function

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," abcdefghijklmnopqrstuvwxyz:$%&^*");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " abcdefghijklmnopqrstuvwxyz:$%&^*");
  }
  return 0;
}

Gives:

112
10
5
42
7
3

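Perhaps not the best solution for this task, since you need to list every character that should be treated as a delimiter (the list above misses uppercase letters and punctuation such as '!' or '#', for example). Note also that strtok() modifies the string it splits, so str must be a writable array rather than a string literal. But it is an alternative to the other solutions.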

南烟 2024-11-21 01:33:06

And if you don't mind using C++ instead of C (usually there isn't a good reason why not), then you can reduce your solution to just two lines of code (using AXE parser generator):

std::vector<int> numbers;
auto number_rule = *(*(axe::r_any() - axe::r_num()) 
   & *axe::r_num() >> axe::e_push_back(numbers));

now test it:

std::string str = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
number_rule(str.begin(), str.end());
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });

and sure enough, you got your numbers back.

And as a bonus, you don't need to change anything when parsing Unicode wide strings:

std::wstring str = L"pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
numbers.clear(); // otherwise this parse appends to the numbers from the previous one
number_rule(str.begin(), str.end());
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });

and sure enough, you got the same numbers back.

别靠近我心 2024-11-21 01:33:06
#include <stdio.h>
#include <string.h>
#include <math.h>

int main(void)
{
    const char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";
    const char *pos = input;
    // The maximum possible number of integers is half the length of the string, rounded up, because the smallest integer has 1 digit and at least 1 character separates two integers
    int integers[(strlen(input) + 1) / 2];
    unsigned int numInts = 0;

    while ((pos = strpbrk(pos, "0123456789")) != NULL) // strpbrk() prototype in string.h
    {
        sscanf(pos, "%d", &integers[numInts]);   // %d matches int *; %u would need unsigned int *

        // Skip the digits just read: log10(n) + 1 of them, or 1 for n == 0.
        // Caveat: leading zeros ("007") are not counted, so the same run is read
        // again and the half-length bound on integers[] no longer holds.
        if (integers[numInts] == 0)
            pos++;
        else
            pos += (int) log10(integers[numInts]) + 1;        // requires math.h; link with -lm

        numInts++;
    }

    for (unsigned int i = 0; i < numInts; i++)
        printf("%d ", integers[i]);

    return 0;
}

Finding the integers is done by calling strpbrk() repeatedly on an advancing pointer. After each integer is read, the pointer is advanced by the number of digits in it, computed as the base-10 logarithm of the integer plus 1 (with a special case for 0). There is no need for abs() when taking the logarithm, since you stated the integers will be non-negative. If you wanted to be more space-efficient, you could use unsigned char integers[] (read with "%hhu") rather than int integers[], since you stated the integers will all be < 256, but that isn't necessary.
