
How do I get the substring between two substrings in C?

Posted 2024-08-18 07:27:24

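I have a packet-capture program that writes HTTP payloads to a file. Now I want to extract the URL information from these dumps. For each packet, the payload starts like this: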

GET /intl/en_com/images/logo_plain.png HTTP/1.1..Host: www.google.co.in..User-Agent: Mozilla/5.0

I want to extract:

the string between "GET" and "HTTP/1.1"
the string between "Host:" and "User-Agent"

How can I do this in C? Are there any built-in string functions for it, or should I use regular expressions?


如日中天 2024-08-25 07:27:24

C doesn't have built-in regular expressions, though libraries are available: http://www.arglist.com/regex/ and http://www.pcre.org/ are the two I see most often.
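For comparison, here is a minimal sketch of the regex route. It assumes a system that ships the POSIX <regex.h> API (regcomp/regexec/regfree), which is a different interface from either library linked above. The helper name match_request_path() and the pattern are illustrative choices, not part of any library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the request path of a "GET <path> HTTP/1.x"
 * line into out. Returns 0 on success, -1 if the line does not match or
 * the path does not fit in out.
 */
int match_request_path(const char *line, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];    /* m[0]: whole match, m[1]: the captured path */
    int rc = -1;

    assert(line && out && out_size > 0);

    /* POSIX extended syntax; the parentheses capture the path. */
    if (regcomp(&re, "^GET ([^ ]+) HTTP/1\\.[01]", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n < out_size) {
            memcpy(out, line + m[1].rm_so, n);
            out[n] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

Compiling the pattern on every call keeps the sketch short; code that handles many packets would probably compile it once and reuse the regex_t.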

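For a task this simple, though, you can easily get by without regexes. Provided every line is shorter than some maximum length MAXLEN, just process the input one line at a time: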

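#include <stdio.h>
#include <string.h>

#define MAXLEN 4096     /* Assumed maximum line length; adjust to your data */

/* Placeholder: replace with whatever error handling you use */
void report_error_somehow(void);

int main(void)
{
    char buf[MAXLEN];
    char url[MAXLEN] = "";
    char host[MAXLEN] = "";
    int state = 0;      /* 0: Haven't seen GET yet; 1: haven't seen Host yet */
    FILE *f = fopen("my_input_file", "rb");

    if (!f) {
        report_error_somehow();
        return 1;       /* f is NULL, so don't fall through to fgets() */
    }

    while (fgets(buf, sizeof buf, f)) {
        /* Strip trailing \r\n */
        size_t len = strlen(buf);
        if (len >= 2 && buf[len - 1] == '\n' && buf[len - 2] == '\r') {
            buf[len - 2] = 0;
        } else {
            if (feof(f)) {
                /* Last line was not \r\n-terminated: probably OK to ignore */
            } else {
                /* Either the line was too long, or ends with \n but not \r\n. */
                report_error_somehow();
            }
        }

        /* strncmp() stops at the terminator, so short lines are safe */
        if (state == 0 && !strncmp(buf, "GET ", 4)) {
            char *sp;
            strcpy(url, buf + 4);    /* We know url[] is big enough */
            sp = strchr(url, ' ');
            if (sp) {
                *sp = 0;             /* Cut off the trailing " HTTP/1.1" */
            }
            ++state;
        } else if (state == 1 && !strncmp(buf, "Host: ", 6)) {
            strcpy(host, buf + 6);   /* We know host[] is big enough */
            break;
        }
    }

    fclose(f);
    /* url and host now hold the extracted values (empty if not found) */
    return 0;
}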

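This solution doesn't require buffering the entire file in memory as KennyTM's answer does (though that approach is fine if you know the files are small). Note that we use fgets() instead of the unsafe gets(), which can easily overflow the buffer on long lines.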
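If the payload is already in memory as a NUL-terminated string, the general "text between two markers" question can probably be handled with strstr() alone. The following is a minimal sketch under that assumption. extract_between() is a hypothetical helper name, and it will not work on payloads that contain embedded NUL bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text between the first occurrence of
 * `start` and the next occurrence of `end` into `out`, NUL-terminated.
 * Returns 0 on success, -1 if either marker is missing or the result
 * would not fit in out_size bytes (it refuses to truncate).
 */
int extract_between(const char *text, const char *start, const char *end,
                    char *out, size_t out_size)
{
    const char *from;
    const char *to;
    size_t n;

    assert(text && start && end && out && out_size > 0);

    from = strstr(text, start);
    if (!from)
        return -1;
    from += strlen(start);      /* Skip past the start marker itself */

    to = strstr(from, end);     /* Search for the end marker after it */
    if (!to)
        return -1;

    n = (size_t)(to - from);
    if (n >= out_size)
        return -1;

    memcpy(out, from, n);
    out[n] = '\0';
    return 0;
}
```

With the sample payload, a call such as extract_between(payload, "GET ", " HTTP/1.1", url, sizeof url) would pick out the request path, and using "Host: " and "\r\n" as the markers would pick out the host name. This assumes the ".." in the dump stands for a CR LF pair.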

