
How do I add UTF-8 support and an associated font table to an embedded project?

Posted 2024-07-15 12:18:39

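I am currently designing a font engine for an embedded display. The basic problem is the following:

I need to take a dynamically generated text string, look up the values from that string in a UTF-8 table, then use the table to point to the compressed bitmap array of all the supported characters. After this is complete, I call a bitcopy routine that moves the data from the bitmap array to the display.

I will not be supporting the full UTF-8 character set, as I have very limited system resources to work with (32K ROM, 8K RAM), but I want to be able to add needed glyphs later for localization purposes. All development is being done in C and assembly.

The glyph size is a maximum of 16 bits wide by 16 bits tall. We will probably need support for the entire Basic Multilingual Plane (3 bytes), as some of our larger customers are in Asia. However, we would not include the entire table in any specific localization.

My question is this:
What is the best way to add this UTF-8 support and the associated table?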


鸩远一方 2024-07-22 12:18:39


The solution below assumes that the lower 16 bits of the Unicode space (the Basic Multilingual Plane) will be enough for you. If your bitmap table has, say, U+0020 through U+007E at positions 0x00 to 0x5E, U+00A0 through U+00FF at positions 0x5F to 0xBE, and U+1200 through U+1240 at positions 0xBF to 0xFF, you could do something like the code below (which isn't tested, not even compile-tested).

bitmapmap contains a series of pairs of values, terminated by 0. The first value of each pair is the first Unicode code point of a run, and the second value is the number of directly adjacent code points in that run. The bitmap table is assumed to hold the runs back to back, so the first value of the first pair is the code point whose bitmap sits at index 0.

The first part of the while loop iterates through UTF-8 input and builds up a Unicode code point in ucs2char. Once a complete character is found, the second part searches for that character in one of the ranges mentioned in bitmapmap. If it finds an appropriate bitmap index, it adds it to indexes. Characters for which no bitmap is present are silently dropped.

The function returns the number of bitmap indexes found.

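This way of doing things should be memory-efficient in terms of the unicode->bitmap table (two 16-bit values per run), reasonably fast (a linear scan over the runs for each character) and reasonably flexible (a localization adds glyphs by adding or extending runs).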
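If a localization ends up with many separate runs (a CJK subset, for instance), the linear scan grows with the number of runs. A minimal sketch of one alternative, assuming the runs are sorted by code point and each entry stores its first bitmap index precomputed, is a binary search. The names glyph_range, glyph_ranges and find_bitmap_index are hypothetical and are not part of the code in this answer.

```c
#include <stddef.h>

/* Hypothetical layout: one entry per run of consecutive code points.
 * Entries must be sorted by `first` and must not overlap. */
struct glyph_range {
    unsigned short first;  /* first code point in the run */
    unsigned short count;  /* number of consecutive glyphs in the run */
    unsigned short base;   /* bitmap index of `first` (precomputed) */
};

/* Example runs using the positions described in this answer;
 * const so the table can be placed in ROM. */
static const struct glyph_range glyph_ranges[] = {
    {0x0020, 0x005F, 0x0000},  /* U+0020..U+007E -> 0x00..0x5E */
    {0x00A0, 0x0060, 0x005F},  /* U+00A0..U+00FF -> 0x5F..0xBE */
    {0x1200, 0x0041, 0x00BF},  /* U+1200..U+1240 -> 0xBF..0xFF */
};

#define GLYPH_RANGE_COUNT (sizeof glyph_ranges / sizeof glyph_ranges[0])

/* Returns the bitmap index for `cp`, or -1 if no glyph is available.
 * The return type is long so that -1 stays distinct from any 16-bit index. */
long find_bitmap_index(unsigned short cp)
{
    size_t lo = 0, hi = GLYPH_RANGE_COUNT;  /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct glyph_range *r = &glyph_ranges[mid];
        if (cp < r->first)
            hi = mid;                        /* cp lies before this run */
        else if ((unsigned short)(cp - r->first) >= r->count)
            lo = mid + 1;                    /* cp lies after this run */
        else
            return (long)r->base + (cp - r->first);
    }
    return -1;
}
```

The extra base field costs one more 16-bit value per run in ROM; whether that trade is worth it depends on how many runs a given localization has.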

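// Code below assumes C99, but is about three cut-and-pastes from C89
// Assuming an unsigned short is 16-bit

// Pairs of {first code point, number of consecutive glyphs}, terminated by 0.
// const lets the table live in ROM.
const unsigned short bitmapmap[]={0x0020, 0x005F,  // U+0020..U+007E -> 0x00..0x5E
                                  0x00A0, 0x0060,  // U+00A0..U+00FF -> 0x5F..0xBE
                                  0x1200, 0x0041,  // U+1200..U+1240 -> 0xBF..0xFF
                                  0x0000};

// indexes must have room for one entry per character in utf8
int utf8_to_bitmap_indexes(const unsigned char *utf8, unsigned short *indexes)
{
    int bitmapsfound=0;
    int utf8numchars=0;  // bytes still expected in the current sequence
    unsigned char c;
    unsigned short ucs2char=0;
    while (*utf8)
    {
        c=*utf8++;  // advance here so that continue cannot loop forever
        if (c>=0xc0)
        {
            // Lead byte: its leading 1 bits give the sequence length
            utf8numchars=0;
            while (c&0x80)
            {
                utf8numchars++;
                c<<=1;
            }
            c>>=utf8numchars;
            ucs2char=0;
            if (utf8numchars>3)
            {
                // Beyond the BMP, will not fit in 16 bits.  Drop it; its
                // continuation bytes are skipped as strays below.
                utf8numchars=0;
                continue;
            }
        }
        else if (c>=0x80 && !utf8numchars)
            continue; // Stray continuation byte.  Invalid UTF-8, skip it.
        else if (utf8numchars && c<0x80)
        {
            // This is invalid UTF-8.  Do our best.
            utf8numchars=0;
        }

        if (utf8numchars)
        {
            c&=0x3f;
            ucs2char<<=6;
            ucs2char+=c;
            utf8numchars--;
            if (utf8numchars)
                continue; // Our work here is done - no char yet
        }
        else
            ucs2char=c;

        // At this point, we have a complete UCS-2 char in ucs2char

        unsigned short bmpsearch=0;
        unsigned short bmpix=0;
        while (bitmapmap[bmpsearch])
        {
            // Second value is a count, so the run is [first, first+count)
            if (ucs2char>=bitmapmap[bmpsearch] && ucs2char-bitmapmap[bmpsearch]<bitmapmap[bmpsearch+1])
            {
                *indexes++ = bmpix+(ucs2char-bitmapmap[bmpsearch]);
                bitmapsfound++;
                break;
            }

            bmpix+=bitmapmap[bmpsearch+1];
            bmpsearch+=2;
        }
    }
    return bitmapsfound;
}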
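
The indexes this produces still have to be turned into glyph data. Since the question mentions compressed bitmaps, the glyphs may not all be the same size. One possible arrangement, sketched here as an assumption and not something the question specifies, is an offset table with one extra entry, so that each glyph's length is the difference between neighbouring offsets. The names glyph_data, glyph_offsets, glyph_start and glyph_length are hypothetical, and the three tiny glyphs are placeholder bytes.

```c
#include <stddef.h>

/* Placeholder data: three "compressed" glyphs of 2, 3 and 1 bytes,
 * stored back to back. const so the data can be placed in ROM. */
static const unsigned char glyph_data[] = {
    0x18, 0x3C,        /* glyph 0 */
    0x7E, 0x42, 0x7E,  /* glyph 1 */
    0xFF               /* glyph 2 */
};

/* glyph_offsets[i] is where glyph i starts in glyph_data;
 * the final entry marks the end of the last glyph. */
static const unsigned short glyph_offsets[] = {0, 2, 5, 6};

#define GLYPH_COUNT (sizeof glyph_offsets / sizeof glyph_offsets[0] - 1)

/* Returns a pointer to the first byte of the glyph, or NULL if the
 * index is out of range. */
const unsigned char *glyph_start(unsigned short index)
{
    if (index >= GLYPH_COUNT)
        return NULL;
    return &glyph_data[glyph_offsets[index]];
}

/* Returns the stored length of the glyph in bytes, or 0 if the index
 * is out of range. */
size_t glyph_length(unsigned short index)
{
    if (index >= GLYPH_COUNT)
        return 0;
    return (size_t)(glyph_offsets[index + 1] - glyph_offsets[index]);
}
```

With 16-bit offsets the glyph data can be at most 64K bytes, which is more than the 32K of ROM mentioned in the question.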

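EDIT: You mentioned that you need more than the lower 16 bits. s/unsigned short/unsigned long/;s/ucs2char/codepoint/; in the above code and it can then do the whole Unicode space, provided 4-byte sequences are accumulated in full. Note that unsigned int is only guaranteed to be 16 bits wide, which is common on small embedded targets, so use unsigned long (or uint32_t) for the code point.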
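
For reference, a code point outside the BMP needs up to 21 bits. Here is a minimal sketch of a standalone decoder with a 32-bit accumulator, assuming <stdint.h> is available on your toolchain. utf8_next is a hypothetical helper, not part of the code above, and it only does basic validation: it does not reject overlong encodings or surrogates.

```c
#include <stdint.h>

/* Decodes one code point starting at *s and advances *s past it.
 * Returns the code point, or 0xFFFFFFFF for a malformed sequence, in
 * which case *s advances by exactly one byte so the caller can resync.
 * The caller should stop at the terminating NUL before calling this. */
uint32_t utf8_next(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int extra;  /* continuation bytes still expected */

    if (p[0] < 0x80)      { cp = p[0];        extra = 0; }
    else if (p[0] < 0xC0) { *s = p + 1; return 0xFFFFFFFFu; } /* stray continuation byte */
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; }
    else if (p[0] < 0xF8) { cp = p[0] & 0x07; extra = 3; }
    else                  { *s = p + 1; return 0xFFFFFFFFu; } /* never a valid lead byte */

    p++;
    while (extra-- > 0) {
        if ((*p & 0xC0) != 0x80) {  /* not a continuation byte (also stops at NUL) */
            *s = *s + 1;
            return 0xFFFFFFFFu;
        }
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
    }
    *s = p;
    return cp;
}
```

The returned code point can then be looked up in the range table exactly as before, as long as the table entries are widened to match.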
