How do I get the substring between two substrings in C?

Posted 2024-08-18 07:27:24 · 346 words · 7 views · 0 comments

I have packet capture code that writes the HTTP payload to a file. Now I want to extract the URL information from these dumps.
For each packet, the payload begins like this:

GET /intl/en_com/images/logo_plain.png HTTP/1.1..Host: www.google.co.in..User-Agent: Mozilla/5.0

I would like to extract:

  1. the string between "GET" and "HTTP/1.1"
  2. the string between "Host:" and "User-Agent"

How can I do this in C? Are there any built-in string functions for this, or should I use regular expressions?

Comments (2)

如日中天 2024-08-25 07:27:24

C doesn't have built-in regular expressions, though libraries are available: http://www.arglist.com/regex/, http://www.pcre.org/ are the two I see most often.

For a task this simple, you can easily get away without using regexes though. Provided the lines are all less than some maximum length MAXLEN, just process them one line at a time:

/* Needs <stdio.h> and <string.h>; MAXLEN is a line-length limit you choose. */
char buf[MAXLEN];
char url[MAXLEN];
char host[MAXLEN];
int state = 0;      /* 0: Haven't seen GET yet; 1: haven't seen Host yet */
FILE *f = fopen("my_input_file", "rb");

if (!f) {
    report_error_somehow();   /* Must not fall through: f is NULL here */
}

while (fgets(buf, sizeof buf, f)) {
    /* Strip trailing \r and \n */
    size_t len = strlen(buf);
    if (len >= 2 && buf[len - 1] == '\n' && buf[len - 2] == '\r') {
        buf[len - 2] = 0;
    } else {
        if (feof(f)) {
            /* Last line was not \r\n-terminated: probably OK to ignore */
        } else {
            /* Either the line was too long, or ends with \n but not \r\n. */
            report_error_somehow();
        }
    }

    if (state == 0 && !strncmp(buf, "GET ", 4)) {
        char *sp;
        strcpy(url, buf + 4);    /* We know url[] is big enough */
        sp = strrchr(url, ' ');  /* Cut off the trailing " HTTP/1.1" */
        if (sp) *sp = 0;
        ++state;
    } else if (state == 1 && !strncmp(buf, "Host: ", 6)) {
        strcpy(host, buf + 6);   /* We know host[] is big enough */
        break;
    }
}

fclose(f);

This solution doesn't require buffering the entire file in memory, as KennyTM's answer does (though buffering is fine if you know the files are small). Note that we use fgets() rather than the unsafe gets(), which cannot limit how much it reads and so overflows the buffer on long lines.

莫相离 2024-08-25 07:27:24

Look for the location of \r using strchr (or strstr). Since the strings GET and HTTP/1.1 and Host: are of fixed length, the index and location of the path in between can be extracted easily.
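The marker-search idea above could be sketched roughly as follows. This is only an illustration, not code from the answer: extract_between is a hypothetical helper name, and it assumes the payload is already in memory as a NUL-terminated string and that the ".." shown in the dump stands for "\r\n".

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the text between the first occurrence of `start` and the next
 * occurrence of `end` in `src` into `out` (capacity `outsz`), trimming
 * surrounding spaces. Returns 0 on success, -1 if a marker is missing
 * or the result does not fit. (Hypothetical helper, for illustration.)
 */
int extract_between(const char *src, const char *start, const char *end,
                    char *out, size_t outsz)
{
    const char *p = strstr(src, start);
    if (p == NULL)
        return -1;
    p += strlen(start);                 /* skip past the start marker */

    const char *q = strstr(p, end);     /* search only after the start */
    if (q == NULL)
        return -1;

    while (p < q && *p == ' ')          /* trim leading spaces */
        p++;
    while (q > p && q[-1] == ' ')       /* trim trailing spaces */
        q--;

    size_t n = (size_t)(q - p);
    if (n >= outsz)
        return -1;                      /* would overflow out[] */
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

Calling extract_between(payload, "GET", "HTTP/1.1", url, sizeof url) would then fill in the path, and extract_between(payload, "Host:", "\r\n", host, sizeof host) the host name. Ending the second search at "\r\n" rather than at "User-Agent" is a deliberate choice here, since the header after Host: is not guaranteed to be User-Agent.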


If you want to use regular expressions, on POSIX-compliant systems there is regcomp(3), but that's also quite hard to use.
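For comparison, here is a rough sketch of the POSIX route using regcomp(), regexec() and regfree() from <regex.h>. The helper name regex_capture and the patterns shown afterwards are assumptions made for illustration, and the payload is again assumed to be a NUL-terminated string.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Match `pattern` (POSIX extended syntax) against `src` and copy the
 * first parenthesised group into `out` (capacity `outsz`).
 * Returns 0 on success, -1 on a bad pattern, no match, or no room.
 * (Hypothetical helper, for illustration.)
 */
int regex_capture(const char *src, const char *pattern,
                  char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];                    /* m[0]: whole match, m[1]: group 1 */
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, src, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n < outsz) {
            memcpy(out, src + m[1].rm_so, n);
            out[n] = '\0';
            rc = 0;
        }
    }

    regfree(&re);                       /* always release the compiled regex */
    return rc;
}
```

With this helper, a pattern such as "^GET +([^ ]+) +HTTP/1\\.1" would capture the path and "Host: *([^\r\n]+)" the host name. It is noticeably more machinery than the strstr approach, which is the trade-off the answer is pointing at.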
