isLetter with accented characters in C

Posted on 2024-10-31 11:26:26

I'd like to create (or find) a C function that checks whether a char c is a letter.
I can do this easily for a-z and A-Z, of course.

However, I get an error when testing c == á, ã, ô, ç, ë, etc.

Probably those special characters are stored in more than one char.

I'd like to know:
How are these special characters stored, which arguments does my function need to receive, and how should it do the check?
I'd also like to know whether there is any standard function that already does this.

Comments (4)

毅然前行 2024-11-07 11:26:26

I think you're looking for the iswalpha() routine:

   #include <wctype.h>

   int iswalpha(wint_t wc);

DESCRIPTION
   The iswalpha() function is the wide-character equivalent of
   the isalpha(3) function.  It tests whether wc is a wide
   character belonging to the wide-character class "alpha".

It does depend upon the LC_CTYPE of the current locale(7), so its use in a program that is supposed to handle multiple types of input correctly simultaneously might not be ideal.
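As a minimal sketch of how iswalpha() might be used, the helper below (count_wide_letters is a hypothetical name, not a standard function) counts the alphabetic characters in a wide string. It assumes the caller has already selected a locale, for example with setlocale(LC_CTYPE, ""); in the default "C" locale, accented letters may not be classified as alphabetic.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the wide characters in s that the
 * current LC_CTYPE locale classifies as alphabetic. */
size_t count_wide_letters(const wchar_t *s)
{
    size_t n = 0;
    for (; *s != L'\0'; ++s) {
        if (iswalpha((wint_t)*s))
            ++n;
    }
    return n;
}
```

Whether a call such as count_wide_letters(L"ação") counts the accented letters depends on the locale in effect and on how the compiler encodes the wide literal, so the result is not guaranteed to be the same on every system.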

苏大泽ㄣ 2024-11-07 11:26:26

If you are working with single-byte codesets such as ISO 8859-1 or 8859-15 (or any of the other 8859-x codesets), then the isalpha() function will do the job if you also remember to use setlocale(LC_ALL, ""); (or some other suitable invocation of setlocale()) in your program. Without this, the program runs in the C locale, which only classifies the ASCII characters (8859-x characters in the range 0x00..0x7F).

If you are working with multibyte or wide character codesets (such as UTF-8 or UTF-16), then you need to look to the wide character functions found in <wchar.h> and <wctype.h>.
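The two cases can be sketched roughly as follows. Both helper names (is_letter_byte and count_letters_mb) are made up for illustration, and both assume the program has called setlocale(LC_ALL, "") at startup; without that call only ASCII letters are recognised. The first helper covers single-byte codesets; the second decodes a multibyte string with mbrtowc() and classifies each decoded character with iswalpha().

```c
#include <ctype.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Single-byte codesets: the cast to unsigned char matters, because
 * passing a negative char value to isalpha() is undefined behaviour. */
int is_letter_byte(char c)
{
    return isalpha((unsigned char)c) != 0;
}

/* Multibyte codesets: decode one character at a time in the current
 * locale. Returns the number of letters, or (size_t)-1 if the string
 * contains an invalid or incomplete sequence. */
size_t count_letters_mb(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    size_t letters = 0;

    memset(&st, 0, sizeof st);
    while (left > 0) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s, left, &st);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;
        if (len == 0)
            break; /* decoded a null character */
        if (iswalpha((wint_t)wc))
            ++letters;
        s += len;
        left -= len;
    }
    return letters;
}
```

This sketch treats each decoded wchar_t as one character, so on platforms where wchar_t is 16 bits wide it does not handle characters outside the Basic Multilingual Plane.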

看春风乍起 2024-11-07 11:26:26

How these characters are stored depends on the locale and its encoding. On most UNIX systems they'll be stored as UTF-8, whereas a Win32 machine will likely represent them as UTF-16. UTF-8 uses a variable number of chars per character, whereas UTF-16 uses 16-bit units held in a wchar_t (or unsigned short). Incidentally, sizeof(wchar_t) on Windows is only 2 (vs. 4 on most *nix systems), so any character that needs a surrogate pair takes 2 wchar_t values there. The accented Latin letters in the question, in their precomposed forms, each fit in a single 16-bit unit.

As was mentioned, the iswalpha() routine will do this for you. It should take care of locale-specific issues for you.
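To make the "variable number of chars" point concrete, here is a small sketch that reads the length of a UTF-8 sequence from its first byte. The function name utf8_seq_len is hypothetical, and the check is deliberately simplified: it inspects only the leading bit pattern and does not reject overlong or otherwise invalid sequences.

```c
/* Hypothetical helper: returns how many bytes a UTF-8 sequence
 * occupies, judged only by its lead byte. Returns 0 for a
 * continuation byte or an invalid lead byte. */
int utf8_seq_len(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1; /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx */
    return 0;                            /* 10xxxxxx or invalid */
}
```

For example, á (U+00E1) is encoded in UTF-8 as the two bytes 0xC3 0xA1, which is why comparing a single char against it cannot work: the lead byte 0xC3 alone only announces a two-byte sequence.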

浮华 2024-11-07 11:26:26

You probably want http://site.icu-project.org/. It provides a portable library with APIs for this.
