Software metrics for identifying developers by coding style

Posted on 2025-01-05 09:18:10

Traditional software metrics deal with the quality of software. I'm looking for metrics that can be used to identify developers by their code, in the same vein as plagiarism software and stylometry can be used to identify authors by their writing style. I can imagine that certain existing metrics can be used here as well, such as comment ratio. I can also imagine metrics that would be irrelevant from a quality point of view, such as the (over)use of certain methods or design patterns, average length of variable names, etc.

I'm interested either in a pointer to a collection of such metrics or studies, or individual metrics. They may be language-agnostic or related to a language or programming paradigm.

I want to use it to understand and analyze different coding styles, not to detect plagiarism.
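
To make two of the metrics mentioned above concrete, here is a minimal sketch for Python source, using the standard `tokenize` module. The function name `style_metrics` and the exact definitions are assumptions made for illustration: comment ratio is taken as lines containing a comment divided by lines containing a comment or code, and identifier length is averaged over every non-keyword name token, not just variable declarations.

```python
import io
import keyword
import tokenize


def style_metrics(source: str) -> dict:
    """Compute two simple style metrics for a Python source string.

    Hypothetical helper: the metric definitions are illustrative choices,
    not taken from any published study.
    """
    comment_lines = set()  # line numbers that contain a comment
    code_lines = set()     # line numbers that contain a code token
    name_lengths = []      # lengths of non-keyword identifiers

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])
        elif tok.type == tokenize.NAME:
            code_lines.add(tok.start[0])
            if not keyword.iskeyword(tok.string):
                name_lengths.append(len(tok.string))
        elif tok.type in (tokenize.OP, tokenize.NUMBER, tokenize.STRING):
            # Only the starting line of a multi-line token is counted.
            code_lines.add(tok.start[0])

    total = len(comment_lines | code_lines)
    return {
        "comment_ratio": len(comment_lines) / total if total else 0.0,
        "mean_identifier_length": (
            sum(name_lengths) / len(name_lengths) if name_lengths else 0.0
        ),
    }


sample = "# add two numbers\ndef add(a, b):\n    return a + b  # sum\n"
metrics = style_metrics(sample)
print(metrics)
```

Other languages would need their own tokenizer; the point is only that such metrics are cheap to compute once tokens are available.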

旧情别恋 2025-01-12 09:18:10

I see there are already a couple of studies that looked into this. They might help.

  1. Kothari, J., Shevertalov, M., Stehle, E., Mancoridis, S., "A probabilistic approach to source code authorship identification", In Proceedings of the International Conference on Information Technology, pp.243-248, IEEE, 2007.

    Available online here

    Quoting from the abstract:

    We begin by computing a set of metrics to build profiles for a population of known authors using code samples that are verified to be authentic. We then compute metrics on unidentified source code to determine the closest matching profile. [...] In our case study we are able to determine authorship with greater than 70% accuracy in choosing the single nearest match and greater than 90% accuracy in choosing the top three ordered nearest matches.

  2. Shevertalov, M., Kothari, J., Stehle, E., Mancoridis, S., "On the use of discretized source code metrics for author identification", In Proceedings of the 1st International Symposium on Search Based Software Engineering, pp.69-78, IEEE, 2009.

    Available online here. This is a follow-up to the previous study.

  3. Lange, R., Mancoridis, S., "Using code metric histograms and genetic algorithms to perform author identification for software forensics", In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp.2082-2089, ACM, 2007.

    Available online here

    This is also related to the first reference (common author), and discusses the metrics in more detail. Again quoting from the abstract:

    Our method involves measuring the differences in histogram distributions for code metrics. Identifying a combination of metrics that is effective in distinguishing developer styles is key to the utility of the technique. Our case study involves 18 metrics.
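
The general idea in these abstracts (build a metric profile per known author, then rank the profiles by closeness to an unidentified sample) can be sketched roughly as follows. This is not the method from any of the papers: it uses a single illustrative metric (line length), a plain L1 distance between normalized histograms, and no metric selection. All function names, the bin settings, and the sample data are assumptions.

```python
from collections import Counter


def line_length_histogram(source: str, bin_width: int = 20, bins: int = 6) -> list[float]:
    """Normalized histogram of non-blank line lengths (one illustrative metric)."""
    counts = Counter()
    for line in source.splitlines():
        if line.strip():
            # Lines longer than the last bin boundary fall into the last bin.
            counts[min(len(line) // bin_width, bins - 1)] += 1
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(bins)]


def histogram_distance(p: list[float], q: list[float]) -> float:
    """L1 distance between two normalized histograms (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def rank_authors(profiles: dict[str, list[float]], unknown: list[float]) -> list[str]:
    """Return author names ordered from closest to farthest profile."""
    return sorted(profiles, key=lambda name: histogram_distance(profiles[name], unknown))


# Hypothetical code samples standing in for verified per-author corpora.
samples = {
    "alice": "x = 1\ny = 2\nz = x + y\n",
    "bob": "result_of_the_long_computation = compute_something(alpha, beta, gamma)\n" * 3,
}
profiles = {name: line_length_histogram(src) for name, src in samples.items()}
unknown = line_length_histogram("a = 3\nb = 4\n")
print(rank_authors(profiles, unknown))
```

Taking the first element of the ranking corresponds to the "single nearest match" setting, and the first three to the "top three" setting described in the first abstract. A realistic version would combine many metrics and far larger samples.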

You can also use Google Scholar for other references, and for finding other papers based on the ones above (using the "cited by" option).

忆依然 2025-01-12 09:18:10

If you're looking for potential metrics, you might try reviewing some coding standards. Since these dictate a particular style, it follows that the things they talk about (spacing, placement of braces, identifier lengths, mandatory comments, etc.) are things that might be used to identify developers from their code.
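
As a rough sketch of turning such coding-standard items into countable features, the snippet below tallies a few layout habits in C-like source text. It is a regex-based heuristic, not a parser; the function name `layout_features` and the chosen features are assumptions for illustration, and string literals or comments containing similar patterns would be miscounted.

```python
import re


def layout_features(source: str) -> dict:
    """Count a few layout habits in C-like source text (heuristic only)."""
    lines = source.splitlines()
    return {
        # Opening brace at the end of a statement line, e.g. "if (x) {"
        "brace_same_line": sum(
            1 for line in lines
            if line.rstrip().endswith("{") and line.strip() != "{"
        ),
        # Opening brace alone on its own line
        "brace_own_line": sum(1 for line in lines if line.strip() == "{"),
        # Indentation character preference
        "tab_indented": sum(1 for line in lines if line.startswith("\t")),
        "space_indented": sum(1 for line in lines if line.startswith(" ")),
        # Spacing around simple assignments: "x = 1" versus "x=1"
        "spaced_assign": len(re.findall(r"\w = \w", source)),
        "tight_assign": len(re.findall(r"\w=\w", source)),
    }


sample = """int main()
{
    int x = 1;
    if (x) {
        x=2;
    }
}
"""
print(layout_features(sample))
```

A developer who follows one standard consistently should show a strong skew in each pair of counts, which is what would make these features useful for telling styles apart.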

Also, if you're interested in .NET code, you might find NDepend to be a useful tool. It enables you to run queries against a code base, and supports 82 metrics.
