How to define a grammar that excludes a specific set of words?
I have built a small tool for static analysis of C code. Its purpose is to warn users about calls to functions such as strcpy(), which can cause buffer overflows.
Now, to formalise this, I need to write a formal grammar that shows the excluded library functions as NOT part of the set of library functions that are allowed.
For example,
AllowedSentence->ANSI C Permitted Code, NOT UnSafeLibraryMethods
UnSafeLibraryMethods->strcpy|other potentially unsafe methods
Any ideas on how this grammar can be formalised?
Comments (2)
I think this should not be done at the grammar level. It should be a rule that is applied to the parse tree after parsing is done.
You hardly need a parser for the problem as you have posed it. If your only goal is to object to the presence of certain identifiers ("strcpy"), you can simply build a lexer that processes C and picks out identifiers. Special lexemes can then recognize your "you shouldn't use this" list. This way you use positive recognition instead of negative recognition to pick out the identifiers that you believe to be trouble.
If you want a more sophisticated analysis tool, you'll likely want to parse C, name-resolve the identifiers to their actual definitions, and then scan the tree looking for identifiers that are objectionable. This will at least let you decide whether the identifier is actually defined by the user or comes from some known library; surely, if my code defines strcpy, you shouldn't complain unless you know my strcpy is defective somehow.
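To make the lexer idea concrete, here is a minimal sketch in C. It is not a full C lexer: it only skips comments, string and character literals, and numeric literals, and it ignores preprocessing, line continuations and trigraphs. The names `UNSAFE_NAMES`, `is_unsafe_name` and `count_unsafe_identifiers` are made up for illustration, and every deny-list entry other than strcpy is my own assumption about what you might want to flag.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical deny-list; entries other than strcpy are illustrative. */
static const char *const UNSAFE_NAMES[] = { "strcpy", "strcat", "sprintf", "gets" };

/* Return 1 if the identifier s[0..len) is on the deny-list. */
static int is_unsafe_name(const char *s, size_t len) {
    size_t n = sizeof UNSAFE_NAMES / sizeof UNSAFE_NAMES[0];
    for (size_t i = 0; i < n; i++) {
        if (strlen(UNSAFE_NAMES[i]) == len && memcmp(UNSAFE_NAMES[i], s, len) == 0)
            return 1;
    }
    return 0;
}

/* Scan C source text and count identifiers that are on the deny-list.
   Comments, string literals and character literals are skipped. */
int count_unsafe_identifiers(const char *src) {
    int hits = 0;
    const char *p = src;
    while (*p) {
        if (p[0] == '/' && p[1] == '*') {            /* block comment */
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/')) p++;
            if (*p) p += 2;
        } else if (p[0] == '/' && p[1] == '/') {     /* line comment */
            while (*p && *p != '\n') p++;
        } else if (*p == '"' || *p == '\'') {        /* string or char literal */
            char quote = *p++;
            while (*p && *p != quote) {
                if (*p == '\\' && p[1]) p++;         /* skip the escaped character */
                p++;
            }
            if (*p) p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;                   /* whole identifier */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            if (is_unsafe_name(start, (size_t)(p - start))) hits++;
        } else if (isdigit((unsigned char)*p)) {
            /* numeric literal: consume it so a suffix is not read as an identifier */
            while (isalnum((unsigned char)*p) || *p == '.') p++;
        } else {
            p++;
        }
    }
    return hits;
}
```

Because the scanner always consumes a whole identifier before comparing, a name like my_strcpy is not flagged, and strcpy inside a comment or a string literal is ignored. That is the "positive recognition" point: the deny-list is matched directly, and nothing has to be subtracted from the C grammar.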
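A rough sketch of that second stage, assuming a parser and name resolver have already run and handed you a symbol table plus a flat list of call sites. Everything here (`struct symbol`, `struct call_node`, `find_unsafe_calls`, the `DENIED` list) is hypothetical and only stands in for whatever your real front end produces; treating unresolved names as library calls is also my assumption.

```c
#include <stddef.h>
#include <string.h>

/* Where the resolver found the definition of a name (hypothetical model). */
enum origin { ORIGIN_USER, ORIGIN_LIBRARY };

struct symbol    { const char *name; enum origin origin; };
struct call_node { const char *callee; int line; };

/* Hypothetical deny-list; entries other than strcpy are illustrative. */
static const char *const DENIED[] = { "strcpy", "strcat", "sprintf", "gets" };

static int is_denied(const char *name) {
    for (size_t i = 0; i < sizeof DENIED / sizeof DENIED[0]; i++)
        if (strcmp(DENIED[i], name) == 0) return 1;
    return 0;
}

/* Look a name up in the symbol table; NULL means it was not resolved. */
static const struct symbol *resolve(const struct symbol *table, size_t n,
                                    const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0) return &table[i];
    return NULL;
}

/* Walk the call sites and record the line of every denied call that does
   NOT resolve to a user definition. Returns the total number found; at
   most out_cap line numbers are written to out_lines. */
size_t find_unsafe_calls(const struct symbol *table, size_t nsyms,
                         const struct call_node *calls, size_t ncalls,
                         int *out_lines, size_t out_cap) {
    size_t found = 0;
    for (size_t i = 0; i < ncalls; i++) {
        if (!is_denied(calls[i].callee)) continue;
        const struct symbol *sym = resolve(table, nsyms, calls[i].callee);
        /* Assumption: an unresolved name is treated as a library call. */
        if (sym && sym->origin == ORIGIN_USER) continue;
        if (found < out_cap) out_lines[found] = calls[i].line;
        found++;
    }
    return found;
}
```

With a table in which strcpy is marked ORIGIN_USER and strcat is marked ORIGIN_LIBRARY, a call to strcpy is left alone while a call to strcat is reported, which matches the "don't complain about my own strcpy" rule above.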