Optimized implementations of some ctype.h functions

Posted on 2024-11-24 14:51:43

In one of my C applications I am using the following functions from ctype.h:

isalpha(), isspace(), ispunct(), tolower().

After profiling, I see bottlenecks in the calls to these functions. (My app does character/string processing on an input text file, so these functions are called extensively inside critical loops.) I want to optimize them for speed and write my own implementations if that helps.

Where can I find such implementations, or the logic needed to write them?


Comments (5)

夜灵血窟げ 2024-12-01 14:51:43

You could implement them as macros or inline functions:

#define IS_ALPHA(x) (((x) >= 'a' && (x) <= 'z') || ((x) >= 'A' && (x) <= 'Z'))
#define IS_SPACE(x) ((x) == ' ' || (x) == '\t')
... etc.

Note, however, that the original isalpha, isspace, ispunct, etc. depend on the current locale and may yield different results depending on the language. In the default "C" locale, isspace() also accepts '\n', '\v', '\f' and '\r', so extend IS_SPACE if your input can contain them. These macros also evaluate their argument more than once, so don't pass an expression with side effects.
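As a rough sketch of the inline-function variant mentioned above: the names ascii_isalpha and ascii_isspace are made up for illustration, and the code assumes ASCII input and deliberately ignores the locale. Unlike the macros, each function evaluates its argument exactly once.

```c
#include <stdbool.h>

/* Hypothetical helpers: ASCII-only, locale-independent by design. */
static inline bool ascii_isalpha(int c)
{
    /* In ASCII, setting bit 0x20 folds 'A'..'Z' onto 'a'..'z';
       one unsigned comparison then covers both ranges. */
    return (unsigned)((c | 0x20) - 'a') < 26u;
}

static inline bool ascii_isspace(int c)
{
    /* ' ' plus the contiguous run '\t', '\n', '\v', '\f', '\r' (9..13). */
    return c == ' ' || (unsigned)(c - '\t') < 5u;
}
```

Because these are real functions, a call such as ascii_isalpha(*p++) increments p only once, which would not be true of the IS_ALPHA macro.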

拍不死你 2024-12-01 14:51:43

It sounds odd to me that such functions would be your bottleneck; most likely they take the locale into account, and this makes them "slower". If you can disregard the locale, you can implement them as easily as this (just an idea written on the fly):

#include <stdbool.h>
#include <string.h>

/* my_ prefix: the ctype.h names are reserved for the standard library */
bool my_isalpha(int c)
{
   return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

bool my_isspace(int c)
{
   return c == ' ' || c == '\t'; // || whatever other char you consider space
}

bool my_ispunct(int c)
{
   static const char *punct = ".;!?...";
   // c != 0 guard: strchr would otherwise match the string's terminating NUL
   return c != 0 && strchr(punct, c) != NULL;
}

int my_tolower(int c)
{
   // map 'A'..'Z' onto 'a'..'z' (assumes contiguous letters, as in ASCII)
   return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

Then make them inline functions.

许你一世情深 2024-12-01 14:51:43

You can make fast implementations of these functions by using a lookup table of 256 elements. For isalpha(), the i'th element is 1 if the character whose ASCII value is i is alphabetic. Then isalpha is just a table lookup.

You can save some space and encode all of these functions in one table by devoting one bit of each entry to the result of one function. Each function then simply looks up the entry for the character passed in and masks out the bit it needs.

Dave
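A minimal sketch of the one-table, one-bit-per-function idea, assuming ASCII and the classification rules of the "C" locale. All names here (CT_ALPHA, ct_table, ct_init, ct_is, ct_lower) are invented for this example, and the second table for lowercase conversion is an extra assumption that goes beyond the answer above.

```c
/* One flag bit per classification function (hypothetical names). */
enum { CT_ALPHA = 1, CT_SPACE = 2, CT_PUNCT = 4 };

static unsigned char ct_table[256];      /* flag bits per character value */
static unsigned char ct_lower_tab[256];  /* lowercase mapping per value   */

/* Fill both tables once at startup; must run before any lookup. */
static void ct_init(void)
{
    for (int i = 0; i < 256; i++) {
        unsigned char flags = 0;
        int is_digit = (i >= '0' && i <= '9');

        if ((i >= 'a' && i <= 'z') || (i >= 'A' && i <= 'Z'))
            flags |= CT_ALPHA;
        if (i == ' ' || (i >= '\t' && i <= '\r'))
            flags |= CT_SPACE;
        /* Printable ASCII that is neither a letter, a digit, nor a space. */
        if (i > ' ' && i < 127 && !(flags & CT_ALPHA) && !is_digit)
            flags |= CT_PUNCT;

        ct_table[i] = flags;
        ct_lower_tab[i] = (unsigned char)((i >= 'A' && i <= 'Z') ? i + ('a' - 'A') : i);
    }
}

/* Look up the entry and mask out the requested bit(s); nonzero means true. */
static inline int ct_is(int c, int mask)
{
    return ct_table[(unsigned char)c] & mask;
}

static inline int ct_lower(int c)
{
    return ct_lower_tab[(unsigned char)c];
}
```

After a single call to ct_init(), a test such as ct_is(c, CT_ALPHA | CT_PUNCT) answers two questions with one load and one AND. The cast to unsigned char keeps the index in range, but it also folds EOF onto 255, so the caller should check for EOF before the lookup.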

探春 2024-12-01 14:51:43

In general, the people who write library code are very good software engineers, and those functions have been tuned to the nth degree. Unless you can remove some of the cases that those functions have to account for, you will have trouble matching their performance.

残疾 2024-12-01 14:51:43

Take a look at the ctype.h header: your compiler's library probably already provides a way to have these functions inlined or implemented as macros (if inline isn't supported for whatever reason). (By the way, what compiler and target platform are you using?)

If these things are already inlined or macros, then you might want to post some details about how you're using the functions. Maybe there's a way to skip some of the calls (for example, if isspace() is true, you don't need to call isalpha() or ispunct(), since they cannot also be true).
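To illustrate the short-circuit idea, here is a small sketch that classifies each character once and stops at the first test that succeeds. The enum and the function name classify are hypothetical, and the order of the tests is only a guess; put the class your input hits most often first.

```c
#include <ctype.h>

/* Hypothetical result type for a one-pass classification. */
enum char_class { CC_SPACE, CC_ALPHA, CC_PUNCT, CC_OTHER };

/* Taking unsigned char keeps the argument valid for the ctype.h functions. */
static enum char_class classify(unsigned char c)
{
    if (isspace(c)) return CC_SPACE;  /* later tests are skipped on a hit */
    if (isalpha(c)) return CC_ALPHA;
    if (ispunct(c)) return CC_PUNCT;
    return CC_OTHER;                  /* digits, control characters, ... */
}
```

A loop can then switch on the result of classify(), so each character costs at most three ctype.h calls and often only one.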
