Determine whether a Unicode character is visible?
I am writing a text editor which has an option to display a bullet in place of any invisible Unicode character. Unfortunately, there appears to be no easy way to determine whether a Unicode character is invisible.
I need to find a text file containing every Unicode character so that I can look through it for invisible characters. Would anyone know where I can find such a file?
EDIT: I am writing this app in Cocoa for Mac OS X.
Oh, I see... actual invisible characters ;) This FAQ will probably be useful:
http://www.unicode.org/faq/unsup_char.html
It lists the current invisible code points and has other information that you might find helpful.
EDIT: Added some Cocoa-specific information
Since you're using Cocoa, you can get the Unicode character set for control characters from controlCharacterSet and compare against that.
You might also want to take a look at the FAQ linked above and, based on the information there, add any characters that you think you may need to the character set returned by controlCharacterSet.
EDIT: Added an example of creating a Unicode string from a Unicode character
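The "control characters plus extras from the FAQ" idea can be sketched outside Cocoa as well. The following is a minimal Python sketch, not the answer's Cocoa code: it uses Unicode general category Cc as a stand-in for the control character set (what Cocoa's controlCharacterSet actually contains should be checked against Apple's documentation), and EXTRA_INVISIBLE is a hypothetical hand-picked list, not one copied from the FAQ.

```python
import unicodedata

# Hypothetical extras, chosen for illustration only: code points that render
# with no visible glyph but are not in general category Cc.
EXTRA_INVISIBLE = {
    0x00AD,  # SOFT HYPHEN
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE
}


def build_invisible_set(extras=EXTRA_INVISIBLE):
    """Return a frozenset of code points: every Cc character plus extras."""
    control = {
        cp for cp in range(0x110000)
        if unicodedata.category(chr(cp)) == "Cc"
    }
    return frozenset(control | set(extras))


# Built once; membership tests afterwards are cheap set lookups.
INVISIBLE = build_invisible_set()


def is_invisible(ch):
    """True if the single character ch is in the precomputed set."""
    return ord(ch) in INVISIBLE
```

Building the set once up front mirrors the "get a character set, then compare against it" approach: the editor only pays for a set lookup per character while drawing.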
Let me know if this code helps at all:
Important caveats:
This probably isn't the most efficient way of doing this, so you will have to decide how you want to optimize after you get it working. This only works with characters on the BMP; support for composed characters would have to be added if you have such a requirement. This does no error checking at all.
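For readers who want something concrete to experiment with, here is a minimal Python sketch of the bullet substitution the question asks about. It is not the Cocoa code this answer refers to. The function name show_invisibles and the keep parameter are made up for illustration, and "invisible" is approximated by Python's str.isprintable(), which also flags unassigned and private-use code points. Because Python iterates strings by code point, characters outside the BMP stay whole; multi-code-point composed sequences get no special handling beyond combining marks being left alone.

```python
BULLET = "\u2022"  # BULLET


def show_invisibles(text, keep="\n\t "):
    """Return text with each non-printable code point replaced by a bullet.

    Characters listed in keep (line breaks, tabs, plain spaces by default)
    are passed through unchanged so the layout of the text survives.
    """
    out = []
    for ch in text:  # iterates by code point, so non-BMP characters stay whole
        if ch in keep or ch.isprintable():
            out.append(ch)
        else:
            out.append(BULLET)
    return "".join(out)
```

For example, show_invisibles("a\u200bb") replaces the zero width space between the two letters with a bullet, while ordinary tabs and newlines are left as they are.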
A good place to start is the Unicode Consortium itself, which provides a large body of data, some of which would be what you're looking for.
I'm also in the process of producing a DLL that takes a string and gives back the UCNs of each character. But don't hold your breath.
At the time of writing, the current official Unicode version is 5.1.0, and text files describing all of its code points can be found at http://www.unicode.org/standard/versions/components-latest.html
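One of those text files, UnicodeData.txt, lists one code point per line as semicolon-separated fields, with the code point in hex first, the name second, and the general category third. A minimal Python sketch of grouping code points by category follows. The file name is not mentioned in the answer above, the function name is made up, and the sample lines are embedded only so the sketch runs without a download; the real file should be consulted for actual data. Range entries (the "First"/"Last" pairs) are not expanded.

```python
from collections import defaultdict

# A few lines in the UnicodeData.txt layout, embedded so the sketch is
# self-contained. Fields: code point (hex); name; general category; ...
SAMPLE = """\
0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
0020;SPACE;Zs;0;WS;;;;;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00AD;SOFT HYPHEN;Cf;0;BN;;;;;N;;;;;
200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
"""


def code_points_by_category(lines):
    """Map each general category to the list of code points carrying it."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(";")
        groups[fields[2]].append(int(fields[0], 16))
    return dict(groups)


groups = code_points_by_category(SAMPLE.splitlines())
```

With the full file, passing an open file object instead of SAMPLE.splitlines() would give the complete list for categories such as Cc and Cf, which is a starting point for looking through candidates for invisible characters.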
For Java, java.lang.Character.getType. For C (with ICU), u_charType() or u_isgraph().
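The same category lookup exists in other standard libraries. As a rough Python counterpart, unicodedata.category returns the two-letter general category of a character, which can then be tested against a chosen set. In the sketch below, INVISIBLE_CATEGORIES is an assumption about which categories an editor might want to flag, not something prescribed by Unicode, and nothing here mirrors u_isgraph() exactly.

```python
import unicodedata

# Assumption: these general categories count as "invisible" for the editor's
# purposes (controls, format characters, and the separator categories).
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def general_category(ch):
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


def is_invisible_by_category(ch):
    """True if the character's general category is in the chosen set."""
    return unicodedata.category(ch) in INVISIBLE_CATEGORIES
```

Whether plain spaces (Zs) should be flagged is a design choice; dropping "Zs" from the set leaves ordinary spaces and no-break spaces alone.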
You might find this code to be of interest: http://gavingrover.blogspot.com/2008/11/unicode-for-grerlvy.html
It's an impossible task; Unicode supports even Klingon, so it's not going to work. However, most text editors use the standard ANSI invisible characters. And if your Unicode library is good, it will support finding equivalent characters and/or categories; you can use these two features to do it as well as any editor out there.
Edit: Yes, I was being silly about Klingon support, but that doesn't make it not true... Of course Klingon is not supported by the Consortium; however, there is a movement to place the Klingon alphabet in Unicode's "Private Use Area" (U+F8D0 - U+F8FF). Link here for those interested :)
Note: I wonder what editor Klingon programmers use...