C style: macros or a preprocessor?
I've written a library to match strings against a set of patterns and I can now easily embed lexical scanners into C programs.
I know there are many well-established tools for creating lexical scanners (lex and re2c, to name just the first two that come to mind), but this question is not about lexers; it's about the best approach to "extend" C syntax. The lexer example is just a concrete case of a general problem.
I can see two possible solutions:
- write a preprocessor that converts a source file with the embedded lexer to a plain C file and, possibly, to a set of other files to be used in the compilation.
- write a set of C macros to represent lexers in a more readable form.
I've already done both, but the question is: "Which one would you consider better practice according to the following criteria?"
- Readability. The lexer logic should be clear and easy to understand
- Maintainability. Finding and fixing a bug should not be a nightmare!
- Interference with the build process. The preprocessor requires an additional step in the build process, it has to be on the path, etc.
In other words, if you had to maintain or write a piece of software that uses one of the two approaches, which one would disappoint you less?
As an example, here is a lexer for the following problem:
- Sum all numbers (can be in decimal form including exponential like 1.3E-4.2)
- Skip strings (double and single quoted)
- Skip lists (similar to LISP lists: (3 4 (0 1)() 3) )
- Stop on encountering the word end (case is irrelevant) or at the end of the buffer
In the two styles.
/**** SCANNER STYLE 1 (preprocessor) ****/
#include "pmx.h"
t = buffer;
while (*t) {
switch pmx(t) { /* the preprocessor will handle this */
case "&q" : /* skip strings */
break;
case "&f<?=eE>&F" : /* sum numbers */
sum += atof(pmx(Start,0));
break;
case "&b()": /* skip lists */
break;
case "&iend" : /* stop processing */
t = "";
break;
case "<.>": /* skip a char and proceed */
break;
}
}
/**** SCANNER STYLE 2 (macros) ****/
#include "pmx.h"
/* There can be up to 128 tokens per scanner with id x80 to xFF */
#define TOK_STRING x81
#define TOK_NUMBER x82
#define TOK_LIST x83
#define TOK_END x84
#define TOK_CHAR x85
pmxScanner( /* pmxScanner() is a pretty complex macro */
buffer
,
pmxTokSet("&q" , TOK_STRING)
pmxTokSet("&f<?=eE>&F" , TOK_NUMBER)
pmxTokSet("&b()" , TOK_LIST)
pmxTokSet("&iend" , TOK_END)
pmxTokSet("<.>" , TOK_CHAR)
,
pmxTokCase(TOK_STRING) : /* skip strings */
continue;
pmxTokCase(TOK_NUMBER) : /* sum numbers */
sum += atof(pmxTokStart(0));
continue;
pmxTokCase(TOK_LIST): /* skip lists */
continue;
pmxTokCase(TOK_END) : /* stop processing */
break;
pmxTokCase(TOK_CHAR) : /* skip a char and proceed */
continue;
);
Should anyone be interested in the current implementation, the code is here: http://sites.google.com/site/clibutl .
A preprocessor will offer a more robust and generic solution. Macros, on the other hand, are quick to whip up, provide a good proof of concept, and are easy when the sample keyword/token space is small. Scaling up or including new features may become tedious with macros after a point. I'd say whip up macros to get started and then convert them to your preprocessor commands.
Also, try to be able to use a generic preprocessor rather than writing your own, if possible.
Yes. But so would any solution you write :) -- and you have to maintain it. Most of the programs you've named have a Windows port available (e.g. see m4 for Windows). The advantage of using such a solution is that you save a lot of time. Of course, the downside is that you may have to get up to speed with the source code if and when an odd bug turns up (though the folks maintaining these tools are very helpful and will certainly make sure you get every bit of help).
And again, yes, I'd prefer a packaged solution to rolling my own.
A custom preprocessor is the typical approach in parser/interpreter generators, as the possibilities of macros are very limited, and they create problems at the expansion stage that make debugging a tremendous effort.
I suggest you use a time-tested tool such as the classic Yacc/Lex Unix programs, or, if you want to "extend" C, use C++ and Boost::spirit, a parser generator that uses templates extensively.