Parser generation
I am doing a project on software plagiarism detection, and I intend to do it with the C language. For that I am supposed to create a token generator and a parser, but I don't know where to start. Can anyone help me out with this?
I created a database of tokens and separated the tokens from my program. The next thing I want to do is compare two programs to find out whether one is plagiarized from the other. For that I need to create a syntax analyzer, and I don't know where to start.
That is, I want to create a parser for C programs in Python.
Comments (3)
If you want to create a parser in Python you can look at these libraries:
PLY
pyparsing
and Lepl - new but very powerful
Building a real C parser by yourself is a really big task.
I suggest you either find one that is already done, e.g. pycparser, or define a really simple subset of C that is easily parsed.
You'll have plenty of work to do for your plagiarism detector after you are done parsing C.
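To give a feel for the "simple subset" route, here is a minimal sketch of a tokenizer plus recursive-descent parser using only the standard library. The subset is my own assumption, not something from the question: only assignment statements of the form `name = expression;` with `+ - * /` and parentheses. The names `tokenize`, `Parser` and `parse` are hypothetical, and real C needs far more than this.

```python
import re

# Lexical rules for a deliberately tiny, hypothetical C subset.
TOKEN_RE = re.compile(r"""
    (?P<NUM>\d+)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<OP>[-+*/=();])
  | (?P<WS>\s+)
  | (?P<BAD>.)
""", re.VERBOSE)

def tokenize(src):
    """Turn source text into a list of (kind, text) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        kind = m.lastgroup
        if kind == "WS":
            continue  # whitespace carries no meaning in this subset
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        tokens.append((kind, m.group()))
    return tokens

class Parser:
    """Recursive-descent parser; each method handles one grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def expect(self, text):
        value = self.peek()[1]
        if value != text:
            raise SyntaxError(f"expected {text!r}, got {value!r}")
        self.pos += 1

    def statement(self):
        # statement := ID '=' expr ';'
        kind, name = self.peek()
        if kind != "ID":
            raise SyntaxError(f"expected identifier, got {name!r}")
        self.pos += 1
        self.expect("=")
        value = self.expr()
        self.expect(";")
        return ("assign", name, value)

    def expr(self):
        # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.peek()[1]
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek()[1] in ("*", "/"):
            op = self.peek()[1]
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUM | ID | '(' expr ')'
        kind, value = self.peek()
        if kind in ("NUM", "ID"):
            self.pos += 1
            return (kind.lower(), value)
        self.expect("(")
        node = self.expr()
        self.expect(")")
        return node

def parse(src):
    """Parse a sequence of assignment statements into nested tuples."""
    parser = Parser(tokenize(src))
    statements = []
    while parser.peek()[0] != "EOF":
        statements.append(parser.statement())
    return statements
```

Calling `parse("x = a + b * 2;")` gives one `("assign", "x", ...)` tuple in which the multiplication is nested inside the addition. Every extra C feature (declarations, control flow, pointers, the preprocessor) means more rules, which is why an existing parser is usually the better choice.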
I'm not sure you need to parse the token stream to detect the features you're looking for. In fact, it's probably going to complicate things more than anything.
What you're really looking for is sequences of the original source code that are very similar to the suspect sample code being tested. This sounds very similar to the purpose of a Bayes classifier, like those used in spam filtering and language detection.
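As a rough illustration of comparing sequences without a parser, here is a minimal sketch. Note that it is not a Bayes classifier: it swaps in the standard library's `difflib.SequenceMatcher` as a simpler stand-in for measuring similarity. The keyword set is a small, incomplete assumption, the regex tokenizer ignores comments, string literals and multi-character operators, and `normalized_tokens` and `similarity` are hypothetical names.

```python
import re
from difflib import SequenceMatcher

# Small, incomplete set of C keywords (an assumption for this sketch).
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "if", "else", "for",
    "while", "do", "return", "struct", "break", "continue",
}

# Words, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def normalized_tokens(src):
    """Tokenize and replace names and numbers with placeholders,
    so that renaming variables does not hide a copy."""
    out = []
    for tok in TOKEN_RE.findall(src):
        if tok in C_KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out

def similarity(src_a, src_b):
    """Return a score in [0, 1]; higher means more similar token streams."""
    a = normalized_tokens(src_a)
    b = normalized_tokens(src_b)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

original = "int sum(int a, int b) { return a + b; }"
renamed = "int add(int x, int y) { return x + y; }"
print(similarity(original, renamed))
```

Because identifiers are collapsed to `ID`, the renamed copy scores 1.0 against the original. Where to set the threshold for "suspicious" is a judgment call that would need tuning on real submissions.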