Identifying renamed and modified files in Git using SHA1

Posted on 2024-12-19 01:12:04

I'm hacking around a Git repository at a low level, trying to retrieve a file's history from it, and I'm having difficulty identifying a file that was modified and renamed in the same revision.

I'm developing a C# application and I need to implement the git log --follow FILENAME feature.

Modification is simple: search for the file by its path in the tree attached to each revision; if the SHA1 differs, voilà!

Rename is simple too: if the search by path is unsuccessful, look for an object with the same SHA1 as before; if one is found, voilà!

But if nothing is found, it might be either a file deletion, in which case my search is over, or a rename and a modification in the same revision. How can I distinguish between these cases?
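
The three cases described so far can be sketched as below. This is only an illustration in Python rather than C#. It assumes each revision's tree has already been flattened into a hypothetical dict mapping path to blob SHA1. The name classify_path, the Tree alias and the status strings are made up for this sketch and are not part of Git or of any library.

```python
from typing import Dict, Optional, Tuple

Tree = Dict[str, str]  # hypothetical flattened tree: path -> blob SHA1


def classify_path(path: str, newer: Tree, older: Tree) -> Tuple[str, Optional[str]]:
    """Classify what happens to `path` (present in `newer`) when stepping back to `older`.

    Returns (status, path_in_older); path_in_older is None when no match is found.
    """
    sha = newer[path]
    if path in older:
        # Same path in both trees: compare the blob hashes.
        status = "unchanged" if older[path] == sha else "modified"
        return status, path
    # Path is missing from the older tree: look for the same blob under a
    # name that no longer exists in the newer tree (an exact rename).
    for old_path in sorted(older):
        if older[old_path] == sha and old_path not in newer:
            return "renamed", old_path
    # Neither the path nor the SHA1 matches: the history ends here, or the
    # file was renamed and modified at once. Hashes alone cannot tell.
    return "unknown", None
```

The "unknown" result is exactly the ambiguous case from the question: a hash comparison cannot resolve it, so some comparison of file contents is needed.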

I've studied everything I could find about Git internals, but I still cannot work out what to do in this case. What might the tree objects corresponding to the same file, modified and renamed, have in common across revisions?

Many thanks in advance for your help!

淡忘如思 2024-12-26 01:12:04

Git already has that functionality. See the -M/--find-renames, -C/--find-copies and -C -C/--find-copies-harder options to diff (they apply to log and show as well) and the --follow option to log.

The principle of --find-renames is that if it sees a new file in a revision, it looks at the files removed in that revision, compares them, and if any is similar enough, declares it a rename.
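
That principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: file contents are already loaded as text, difflib.SequenceMatcher stands in for Git's own similarity score (which is computed differently), and find_rename_source is a hypothetical helper name. The 0.5 default mirrors the 50% similarity threshold that -M uses by default.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def find_rename_source(new_content: str,
                       deleted: Dict[str, str],
                       threshold: float = 0.5) -> Optional[str]:
    """Return the deleted path whose content is most similar to `new_content`.

    `deleted` maps each path removed in the revision to its old content.
    Returns None when no candidate reaches `threshold`.
    """
    new_lines = new_content.splitlines()
    best_path, best_score = None, 0.0
    for path in sorted(deleted):
        old_lines = deleted[path].splitlines()
        # Line-based similarity in [0, 1]; a stand-in for Git's own metric.
        score = SequenceMatcher(None, old_lines, new_lines, autojunk=False).ratio()
        if score >= threshold and score > best_score:
            best_path, best_score = path, score
    return best_path
```

A result of None corresponds to a genuinely new file, while a returned path corresponds to a rename combined with a modification.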

Edit: In more detail: to detect copies and renames, git compares the two revisions. First it compares the lists of files. Then, for each path that appears only in the new revision, it compares the content with the content of files from the old revision: those that were deleted (-M), also those that were modified (-C), or all files (-C -C). If two files are similar enough (which requires a diff), it marks the new path as a rename or copy as appropriate. This is part of the diff core and is available to all commands that show diffs in any form, including name-status, which does not do detailed line-by-line analysis. On top of this, --follow works by iterating over the revisions one by one: it does a name-status diff with rename detection, outputs the revision if the file was modified, and, when the file was renamed, remembers the name to look for next (the old one).
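
The --follow loop described here can be sketched roughly as below. This is a simplified Python illustration, not Git's implementation. It assumes a linear history listed newest first, with the name-status changes of each commit (relative to its parent) already computed with rename detection. The Change and Commit shapes and the function name follow are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical shapes for this sketch only.
Change = Tuple[str, Optional[str], str]  # (status, old_path, new_path)
Commit = Tuple[str, List[Change]]        # (commit id, changes vs. its parent)


def follow(path: str, history: List[Commit]) -> List[Tuple[str, str]]:
    """Walk `history` (newest first) and list the commits that touch the file.

    Each result is (commit id, path of the file as of that commit).
    Statuses: "M" modified, "R" renamed, "A" added.
    """
    touched: List[Tuple[str, str]] = []
    for commit_id, changes in history:
        for status, old_path, new_path in changes:
            if new_path != path:
                continue  # this change concerns a different file
            touched.append((commit_id, path))
            if status == "R" and old_path is not None:
                path = old_path  # keep following under the older name
            elif status == "A":
                return touched   # the file was created here; history ends
            break
    return touched
```

Merges and copies are left out on purpose; handling them would need the parent graph and the -C candidates mentioned above.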
