Git: How to find the commit that best matches a directory?

Posted 2024-11-15 23:37:57

Someone took a version (unknown to me) of Moodle, applied many changes within a directory, and released it (tree here).

How can I determine which commit of the original project was most likely edited to form this tree?

This would allow me to create a branch at the appropriate commit and apply this patch there. It surely came from either the 1.8 or the 1.9 branch, probably from a release tag, but diffing against individual commits by hand doesn't help me much.

Postmortem Update: knittl's answer got me as close as I'm going to get. I first added my patch repo as the remote "foreign" (no commits in common, that's OK), then did diffs in loops with a couple format options. The first used the --shortstat format:

for REV in $(git rev-list v1.9.0^..v1.9.5); do
    # Summary of files changed, insertions, and deletions under mod/assignment
    git diff --shortstat "$REV" f7f7ad53c8839b8ea4e7 -- mod/assignment >> ~/rdiffs.txt
    echo "$REV" >> ~/rdiffs.txt
done

The second just counted the line changes in a unified diff with no context:

for REV in $(git rev-list v1.9.0^..v1.9.5); do
    # Number of lines in a zero-context unified diff under mod/assignment
    git diff -U0 "$REV" f7f7ad53c8839b8ea4e7 -- mod/assignment | wc -l >> ~/rdiffs2.txt
    echo "$REV" >> ~/rdiffs2.txt
done

There were thousands of commits to dig through, but this one seems to be the closest match.
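Digging through the result file by hand is not strictly necessary. As a minimal sketch, assuming the two-line record layout that the second loop writes to ~/rdiffs2.txt (a line count followed by the commit hash it belongs to), the pairs can be joined and sorted. The helper name closest_from_pairs is hypothetical.

```shell
#!/bin/sh
# Sketch only: closest_from_pairs is a hypothetical helper name.
# Input file layout (assumed): alternating lines, a count, then its commit hash.

# usage: closest_from_pairs <file> [how-many]
closest_from_pairs() {
    # paste joins every two input lines into one "count<TAB>hash" line,
    # sort -n orders by the count, head keeps the smallest ones.
    paste - - < "$1" | sort -n | head -n "${2:-5}"
}

# Example call (left commented out so the block only defines the helper):
# closest_from_pairs ~/rdiffs2.txt 5
```

This pairing only holds for the `wc -l` file. As far as I know, `git diff --shortstat` prints nothing when two trees are identical, so ~/rdiffs.txt can contain a hash with no stat line before it, and the pairs would drift out of step.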

Comments (5)

萌无敌 2024-11-22 23:37:57

You could write a script that diffs the given tree against a range of revisions in your repository.

Assume we first fetch the changed tree (without history) into our own repository:

git remote add foreign git://…
git fetch foreign

We then output the diffstat (in short form) for each revision we want to match against:

for REV in $(git rev-list 1.8^..1.9); do
    # Print the hash first so each stat line can be attributed to a commit
    echo "$REV"
    git diff --shortstat foreign/master "$REV"
done

Look for the commit with the smallest number of changes (or sort the output).
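One possible sorting mechanism, as a rough sketch rather than a definitive implementation: `git diff --numstat` prints added and deleted line counts per file in a machine-readable form, so they can be summed into one number per revision and sorted. The helper names sum_numstat and rank_revisions are hypothetical, and the optional path argument is an assumption borrowed from the question's mod/assignment case.

```shell
#!/bin/sh
# Sketch only: sum_numstat and rank_revisions are hypothetical helper names.

# Read `git diff --numstat` output on stdin and print added + deleted lines.
# Binary files are reported as "-" and contribute 0 to the total.
sum_numstat() {
    awk '{ n += $1 + $2 } END { print n + 0 }'
}

# usage: rank_revisions <rev-range> <target-ref> [path]
rank_revisions() {
    for REV in $(git rev-list "$1"); do
        # One "total hash" line per candidate revision
        total=$(git diff --numstat "$2" "$REV" -- ${3:+"$3"} | sum_numstat)
        echo "$total $REV"
    done | sort -n | head -n 5
}

# Example call (left commented out so the block only defines the helpers):
# rank_revisions 1.8^..1.9 foreign/master mod/assignment
```

Summing insertions and deletions is only one possible distance measure; counting changed files, as another answer does, is coarser but faster to eyeball.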

清君侧 2024-11-22 23:37:57

This was my solution:

#!/bin/sh
# Rank the commits in a date window by how many files differ from needle_ref.

start_date="2012-03-01"
end_date="2012-06-01"
needle_ref="aaa"

# Start with an empty results file
: > /tmp/script.out
shas=$(git log --format=%h --all --after="$start_date" --until="$end_date")
for sha in $shas
do
    # Count the files that differ between the needle and this commit
    wc=$(git diff --name-only "$needle_ref" "$sha" | wc -l)
    wc=$(printf %04d $wc)
    echo "$wc $sha" >> /tmp/script.out
done
# Show the five commits with the fewest differing files
sort -n /tmp/script.out | head -5

深白境迁sunset 2024-11-22 23:37:57

Some really great solutions here!

I used something similar to try to find the closest revision of a source file (given a target file):

  1. iterate backwards through all commits in the branch merge,
  2. look for the closest match with the file target.txt,
  3. print the git revision and the number of differing lines of text.

N.B. Do this inside a new, throw-away branch: reset --hard is destructive, since it moves the current branch and discards uncommitted changes.

for REV in $(git rev-list merge); do
    git reset --hard "$REV"
    # comm -2 -3 prints the lines found only in source.txt; comm expects sorted input
    echo "$REV" $(comm -2 -3 source.txt ../target.txt | wc -l)
done

You'll get output like the following, which tells you which revision was the closest match (i.e. least differing lines):

1c58bd5925a1fc8233730626**************** 771
HEAD is now at ...
9b2c29b00f1b4541a4135906**************** 775
HEAD is now at ...
b8e0bf5ec4372ebbcbd4edd0**************** 342
HEAD is now at ...
ba0d474bf2aac40dae48923e**************** 342
HEAD is now at ...
6d96921d3e9ad760ce55e76c**************** 335 <-- Closest match
HEAD is now at ...
795cd4caae5a5b08563443c9**************** 396
HEAD is now at ...
8743f42b24dd77e3bcc897dd**************** 399
HEAD is now at ...
d1b74dd33074c17da3fff638**************** 929
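A variant that leaves the working tree alone is also possible. The sketch below is not the method used above: it swaps `git reset --hard` for `git show <rev>:<path>`, which prints a file as of a given commit, and it sorts both files before handing them to comm, since comm expects sorted input. The helper names count_unique_lines and closest_file_revision are hypothetical, and the path is assumed to be relative to the repository root.

```shell
#!/bin/sh
# Sketch only: both function names below are hypothetical helpers.

# Print how many lines of file $1 do not appear in file $2.
count_unique_lines() {
    a=$(mktemp)
    b=$(mktemp)
    sort "$1" > "$a"
    sort "$2" > "$b"
    # -2 -3 suppresses lines unique to $2 and lines common to both
    comm -2 -3 "$a" "$b" | wc -l | tr -d ' '
    rm -f "$a" "$b"
}

# usage: closest_file_revision <branch> <path-in-repo> <target-file>
closest_file_revision() {
    tmp=$(mktemp)
    for REV in $(git rev-list "$1"); do
        # Skip commits in which the file does not exist
        git show "$REV:$2" > "$tmp" 2>/dev/null || continue
        echo "$(count_unique_lines "$tmp" "$3") $REV"
    done | sort -n | head -n 5
    rm -f "$tmp"
}

# Example call (left commented out so the block only defines the helpers):
# closest_file_revision merge source.txt ../target.txt
```

Because nothing is checked out, no throw-away branch is needed, and the output contains no "HEAD is now at" lines between the results.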

Further reading:

  • comm - for outputting differing lines
  • wc - for counting lines of text

Credit:

往日情怀 2024-11-22 23:37:57

How about using git to create a patch from each version of 1.8 and 1.9 to this new release?
Then you could see which patch makes the most 'sense'.

For example, if the patch 'removes' many methods, then it is probably not this release, but one before. If the patch has many sections that don't make sense as a single edit, then it probably isn't this release either.

And so on... Unfortunately, in reality there is no algorithm that does this perfectly. It will have to be some heuristic.
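One rough way to put a number on the "removes many methods" heuristic, offered as a sketch under assumptions: report added and deleted lines separately for each candidate, with the diff running from the candidate to the new release. A candidate with many deleted lines contains code the release lacks, which may suggest the candidate is newer than the true base. The helper names numstat_split and added_vs_deleted are hypothetical.

```shell
#!/bin/sh
# Sketch only: numstat_split and added_vs_deleted are hypothetical helper names.

# Read `git diff --numstat` output on stdin; print "<added> <deleted>" totals.
# Binary files are reported as "-" and count as 0.
numstat_split() {
    awk '{ a += $1; d += $2 } END { print a + 0, d + 0 }'
}

# usage: added_vs_deleted <rev-range> <release-ref>
added_vs_deleted() {
    for REV in $(git rev-list "$1"); do
        # Diff direction is candidate -> release, so "deleted" lines exist
        # in the candidate but not in the release.
        echo "$(git diff --numstat "$REV" "$2" | numstat_split) $REV"
    done
}

# Example call (left commented out so the block only defines the helpers):
# added_vs_deleted v1.9.0^..v1.9.5 foreign/master
```

Line counts cannot tell whether a patch makes sense as a single edit, so the shortlisted candidates still need to be read by a person.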

内心荒芜 2024-11-22 23:37:57

How about using 'git blame'? It will show you, for each line, who changed it, and in which revision.
