How do I unescape HTML entities in C?

Posted on 2024-10-21 14:00:31

I'm trying to decode HTML entities (in the format &#39;) in C.

So far I've got some code that tries to decode them, but it produces odd output.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* convertHtmlEntities(char* str) {
    size_t length = strlen(str);
    size_t i;
    char *endchar = malloc(sizeof(char));
    long charCode;
    if (!endchar) {
        fprintf(stderr,"not enough memory");
        exit(EXIT_FAILURE);
    }
    for (i=0;i<length;i++) {
        if (*(str+i) == '&' && *(str+i+1) == '#' && *(str+i+2) >= '0' && *(str+i+2) <= '9' && *(str+i+3) >= '0' && *(str+i+3) <= '9' && *(str+i+4) == ';') {
            charCode = strtol(str+i+2,&endchar,0);
            printf("ascii %li\n",charCode);
            *(str+i) = charCode;
            strncpy(str+i+1,str+i+5,length - (i+5));
            *(str + length - 5) = 0; /* null terminate string */
        }
    }
    return str;
}

int main()
{
    char string[] = "Helloworld&#39;s parent company has changed - comF";
    printf("%s",convertHtmlEntities(&string));
}

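I'm not sure whether main is correct, because I wrote it just for this example; my program builds the string from a web URL, but the idea is the same.

The function does replace &#39; with an apostrophe, but the output is garbled at the end and just after the replacement.

Does anyone have a solution?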



Comments (2)

半步萧音过轻尘 2024-10-28 14:00:31

strncpy (and strcpy) have undefined behavior when the source and destination overlap.

Your strings str+i+1 and str+i+5 overlap. Don't do that!

Replace strncpy with memmove:

            *(str+i) = charCode;
            memmove(str+i+1,str+i+5,length - (i+5) + 1); /* also copy the '\0' */
            /* strncpy(str+i+1,str+i+5,length - (i+5)); */
            /* *(str + length - 5) = 0; */ /* null terminate string */
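
To put the memmove fix in context, here is a minimal sketch of a complete in-place decoder. The function name decode_numeric_entities is hypothetical and not from the original post. It assumes only decimal entities with ASCII codes (1 to 127) need handling, and it parses with base 10 rather than base 0 so that a leading zero is not read as octal. Entities of any digit count are accepted, which goes beyond the two-digit check in the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): decodes decimal
 * numeric entities such as "&#39;" in place and returns str. */
char *decode_numeric_entities(char *str)
{
    size_t len = strlen(str);
    size_t i = 0;

    while (i < len) {
        if (str[i] == '&' && str[i + 1] == '#' &&
            str[i + 2] >= '0' && str[i + 2] <= '9') {
            char *end;
            /* Base 10, so "&#010;" is not read as octal. */
            long code = strtol(str + i + 2, &end, 10);

            /* Assumption: only single-byte ASCII codes are handled. */
            if (*end == ';' && code > 0 && code <= 127) {
                size_t entity_len = (size_t)(end - (str + i)) + 1;

                str[i] = (char)code;
                /* The regions overlap, so use memmove; the +1 also
                 * copies the terminating '\0'. */
                memmove(str + i + 1, str + i + entity_len,
                        len - (i + entity_len) + 1);
                len -= entity_len - 1;
            }
        }
        i++;
    }
    return str;
}
```

The buffer must be writable, so pass a char array rather than a string literal. In a caller like the main in the question, pass the array itself (string), not &string, which has type pointer-to-array rather than char *. Note that strtol only needs a local char * for its end pointer, so the malloc for endchar in the question is unnecessary.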

愛上了 2024-10-28 14:00:31

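I had another problem with the code: it cut off the last 'F' character. I replaced this line:

 *(str + length - 5) = 0; /* null terminate string */

with this:

 *(str + length - 4) = 0; /* null terminate string */

I believe it's because you delete five characters and add one, so the new length is not old-5, but old-4.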
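
The length arithmetic can be written out as a small sketch. The helper name length_after_replacement is hypothetical and only illustrates the reasoning; it assumes exactly one entity of entity_len characters is replaced by a single character.

```c
#include <stddef.h>

/* Hypothetical helper: length of a string after one entity of
 * entity_len characters has been replaced by a single character. */
size_t length_after_replacement(size_t old_len, size_t entity_len)
{
    /* entity_len characters are removed and one is written back. */
    return old_len - entity_len + 1;
}
```

For "&#39;" the entity is five characters long, so the result is old_len - 4, which is the index where the terminating '\0' belongs.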

