Identifying and extracting specific strings

Posted on 2024-12-26 11:49:29 · 629 characters · 1 view · 0 comments

I have a file of numbers like this (X, Y, Z and U stand for unknown digits):

XXXX
YY YYYY YYY YYYY
YYYY YYY YY YYY
ZZZ
UUU UU UUUU UUUUUU UU UUUU
UU UUU UUUU U


The number of numbers per line and the number of number lines are unknown.
I only know how many "blocks" there are (a block is a single number followed by several lines of numbers).

My goals are:
- extract XXXX and store it in an array
- tokenize each number line into numbers and fill my matrix with them

What I have so far:
I read a line, but I don't know whether it is a single number or a line of numbers.

I tried sscanf to determine whether there is one number or several, but the result is inconclusive. I also checked the value of ret, but sscanf always returns 1.
So it is impossible to tell whether there is more than one number.

ret = sscanf(line, "%d", &value);

I don't want to use PCRE. I'm sure this can be done with the standard C library, but how? Given a char*, how can I tell the two kinds of line apart?

Thanks, and sorry for my English :)


Comments (1)

莫多说 2025-01-02 11:49:29

If your line separator is a newline (\n) and your token separator is a space, then read one character at a time into a buffer.

Once you hit either separator, terminate the buffer, print it, reset the buffer's index, and then keep on reading through the file for the next separator.

Here's some code to do that:

#include <stdio.h>
#include <stdlib.h>

/*
    INT_MAX is typically 2147483647, so a number has at
    most 10 digits. We add one more char to hold the
    null terminator.
*/

enum { kMaxNumberLength = 11 };
static const char *kNumberFilename = "numbers.txt";

int main(int argc, char *argv[])
{
    /* Read the file named on the command line, or the default one. */
    const char *filename = (argc > 1) ? argv[1] : kNumberFilename;
    FILE *fp = NULL;
    int currC; /* int rather than char, so EOF is distinguishable */
    char buffer[kMaxNumberLength];
    unsigned int cIndex = 0U;

    fp = fopen(filename, "r");

    if (fp) {
        do {
            currC = fgetc(fp);
            if ((currC == ' ') || (currC == '\n') || (currC == EOF)) {
                if (cIndex > 0U) { /* ignore empty tokens */
                    buffer[cIndex] = '\0'; /* terminate buffer */
                    fprintf(stdout, "found number: %d\n", atoi(buffer));
                    cIndex = 0U;
                }
                continue;
            }
            if (cIndex < kMaxNumberLength - 1U)
                buffer[cIndex++] = (char)currC; /* extra characters are dropped */
        } while (currC != EOF);

        fclose(fp);
    }
    else
        return EXIT_FAILURE;

    return EXIT_SUCCESS;
}

Let's say you have the following file numbers.txt:

1234
234 567 1
4 5
9

Let's compile and run the code:

$ gcc -Wall test.c
$ ./a.out numbers.txt 
found number: 1234
found number: 234
found number: 567
found number: 1
found number: 4
found number: 5
found number: 9