
How can I determine whether a Unicode character is visible?

Posted 2024-07-08 20:32:35

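I'm writing a text editor that has an option to display bullets in place of any invisible Unicode characters. Unfortunately, there appears to be no easy way to determine whether a Unicode character is invisible.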

I need a text file that contains every Unicode character so that I can look through it for the invisible ones. Does anyone know where I can find such a file?

EDIT: I'm writing this application in Cocoa on Mac OS X.
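One way to get such a file is to generate it yourself. The C sketch below (C compiles unchanged inside an Objective-C project) writes every Unicode scalar value, meaning every code point from U+0000 to U+10FFFF except the surrogates U+D800..U+DFFF, as UTF-8, one per line with a U+XXXX label. The function names `encode_utf8` and `write_all_code_points` are hypothetical. Note that the output also contains unassigned code points, and the file by itself does not tell you which characters are invisible; you would still have to inspect it or consult the Unicode character data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode one Unicode scalar value as UTF-8 into buf (4 bytes of room).
   Returns the number of bytes written (1 to 4). */
size_t encode_utf8(uint32_t cp, unsigned char buf[4]) {
    /* Surrogates and values above U+10FFFF are not scalar values. */
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Write every scalar value to out as "U+XXXX<TAB><char>\n".
   Returns the number of code points written. */
unsigned long write_all_code_points(FILE *out) {
    unsigned long count = 0;
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue; /* skip the surrogate range */
        unsigned char buf[4];
        size_t n = encode_utf8(cp, buf);
        fprintf(out, "U+%04lX\t", (unsigned long)cp);
        fwrite(buf, 1, n, out);
        fputc('\n', out);
        count++;
    }
    return count;
}
```

For example, calling `write_all_code_points(stdout)` from `main` and redirecting the output to a file should report 1,112,064 code points (1,114,112 minus the 2,048 surrogates).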


〃温暖了心ぐ 2024-07-15 20:32:35

Oh, I see... actual invisible characters ;) This FAQ will probably be useful:

http://www.unicode.org/faq/unsup_char.html

It lists the current invisible codepoints and has other information that you might find helpful.

EDIT: Added some Cocoa-specific information

Since you're using Cocoa, you can get the Unicode character set for control characters and compare against that:

NSCharacterSet* controlChars = [NSCharacterSet controlCharacterSet];
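For reference, the Cc (control) part of that set is small enough to check with plain range arithmetic. The C sketch below is a hypothetical helper, not a Cocoa API. It tests only General Category Cc, which is exactly U+0000..U+001F and U+007F..U+009F; Apple's documentation describes `controlCharacterSet` as also containing the Cf (format) characters, such as U+200B ZERO WIDTH SPACE, which this helper does not cover.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cp has Unicode General Category Cc: the C0 controls
   U+0000..U+001F, DELETE (U+007F), and the C1 controls U+0080..U+009F. */
bool is_general_category_cc(uint32_t cp) {
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}
```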

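You might also want to look at the FAQ linked above and, based on the information there, add any characters you think you need to the character set returned by controlCharacterSet. That set is an immutable NSCharacterSet, so take a mutableCopy of it (giving an NSMutableCharacterSet) before adding characters.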
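Outside Foundation, the same "base set plus extra characters" idea can be sketched in C as a sorted table of code point ranges searched with binary search. The table below is a hypothetical, hand-picked example: it combines the Cc ranges with a few well-known invisible characters, it is not the list from the FAQ, and it is far from complete. The names `CodePointRange`, `kInvisibleRanges`, and `is_invisible` are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first, last; } CodePointRange;

/* Sorted, non-overlapping, inclusive ranges. Illustrative only:
   NOT a complete list of invisible characters. */
static const CodePointRange kInvisibleRanges[] = {
    {0x0000, 0x001F}, /* C0 controls (Cc) */
    {0x007F, 0x009F}, /* DELETE and C1 controls (Cc) */
    {0x00AD, 0x00AD}, /* SOFT HYPHEN */
    {0x200B, 0x200F}, /* ZERO WIDTH SPACE .. RIGHT-TO-LEFT MARK */
    {0x202A, 0x202E}, /* bidi embedding and override controls */
    {0x2060, 0x2064}, /* WORD JOINER .. INVISIBLE PLUS */
    {0xFEFF, 0xFEFF}, /* ZERO WIDTH NO-BREAK SPACE (byte order mark) */
};

/* Binary search for the range containing cp. */
bool is_invisible(uint32_t cp) {
    size_t lo = 0;
    size_t hi = sizeof kInvisibleRanges / sizeof kInvisibleRanges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < kInvisibleRanges[mid].first)
            hi = mid;
        else if (cp > kInvisibleRanges[mid].last)
            lo = mid + 1;
        else
            return true;
    }
    return false;
}
```

Keeping the table sorted is what makes the binary search valid; adding a character means inserting a range in order rather than appending it.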

EDIT: Added an example of creating a Unicode string from a Unicode character

unichar theChar = 0x000D;
NSString* thestring = [NSString stringWithCharacters:&theChar length:1];
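A single `unichar` can only hold a character from the Basic Multilingual Plane (U+0000..U+FFFF). For a code point above U+FFFF, the buffer has to hold a UTF-16 surrogate pair and the length has to be 2. The C helper below (the name `encode_utf16` is hypothetical) sketches that conversion; its output units and return value could be passed as the characters and length to `stringWithCharacters:length:`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16 code units (16-bit units,
   the same size as Cocoa's unichar). Returns the number of units written:
   1 for BMP characters, 2 (a surrogate pair) for U+10000..U+10FFFF. */
size_t encode_utf16(uint32_t cp, uint16_t out[2]) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
    return 2;
}
```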

谁对谁错谁最难过 2024-07-15 20:32:35

Let me know if this code helps at all:

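-(NSString*)stringByReplacingControlCharacters:(NSString*)originalString
{
    NSUInteger length = [originalString length];
    // Buffer for the string's UTF-16 code units.
    unichar *strAsUnichar = (unichar*)malloc(length*sizeof(unichar));
    NSCharacterSet* controlChars = [NSCharacterSet controlCharacterSet];
    unichar bullet = 0x2022; // U+2022 BULLET

    // Copy all code units into the buffer (the range form avoids the unsafe getCharacters:).
    [originalString getCharacters:strAsUnichar range:NSMakeRange(0, length)];

    // Replace every unit that belongs to the control-character set with a bullet.
    for( NSUInteger i = 0; i < length; i++ ) {
        if( [controlChars characterIsMember:strAsUnichar[i]] )
            strAsUnichar[i] = bullet;
    }

    // Build a new string from the modified buffer, then release the buffer.
    NSString* newString = [NSString stringWithCharacters:strAsUnichar length:length];
    free(strAsUnichar);

    return newString;
}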

Important caveats:

This probably isn't the most efficient way of doing this, so you will have to decide how to optimize it once you have it working. It only handles characters in the Basic Multilingual Plane (BMP); if you need characters outside it, which are stored as surrogate pairs, or other composed character sequences, you will have to add support for them. It also does no error checking at all.
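To illustrate the BMP caveat: `characterIsMember:` takes a single `unichar`, so the loop above sees a character outside the BMP as two unrelated surrogate units. NSCharacterSet also provides `longCharacterIsMember:`, which takes a full `UTF32Char`, but using it requires decoding surrogate pairs first. The C sketch below shows that decoding step: it combines each valid surrogate pair into one code point before testing it, and replaces the whole pair with a single bullet. `should_replace` and `replace_invisible_utf16` are hypothetical names, and the predicate's character list is only an example (it includes the TAG characters so that a non-BMP case is exercised).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Example predicate only: C0/C1 controls, U+200B..U+200F, and the TAG
   characters U+E0001 and U+E0020..U+E007F (outside the BMP). */
static bool should_replace(uint32_t cp) {
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F) ||
           (cp >= 0x200B && cp <= 0x200F) ||
           cp == 0xE0001 || (cp >= 0xE0020 && cp <= 0xE007F);
}

/* Copy len UTF-16 units from src to dst, replacing each code point for which
   should_replace() is true with one U+2022 BULLET. Valid surrogate pairs are
   decoded and tested as a single code point; unpaired surrogates are copied
   unchanged. dst needs room for len units. Returns the units written. */
size_t replace_invisible_utf16(const uint16_t *src, size_t len, uint16_t *dst) {
    size_t out = 0;
    for (size_t i = 0; i < len; ) {
        uint32_t cp = src[i];
        size_t units = 1;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(src[i + 1] - 0xDC00);
            units = 2;
        }
        if (should_replace(cp)) {
            dst[out++] = 0x2022;
        } else {
            for (size_t k = 0; k < units; k++)
                dst[out++] = src[i + k];
        }
        i += units;
    }
    return out;
}
```

Because each replacement writes one unit in place of one or two, the output is never longer than the input, so a destination buffer the size of the input is always enough.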
