Git commit generation numbers

Posted 2024-11-23 16:21:12

What are git commit generation numbers (Hacker News link), and what is their significance?

无声静候 2024-11-30 16:21:12

Just to add to siri's answer, "Commit Generation Numbers" are:

explained here:

A commit's generation is its height in the history graph, as
measured from the farthest root. It is defined as:
If the commit has no parents, then its generation is 0.
Otherwise, its generation is 1 more than the maximum of its parents' generations.
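
The definition above can be sketched in a few lines of Python. This is only an illustration: the `parents` dictionary and the commit names are hypothetical stand-ins for a real commit graph, and the function is not how git itself computes or stores anything.

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the generation of every commit in `parents`.

    `parents` maps a commit id to the list of its parent ids
    (a hypothetical in-memory model, assumed to be acyclic).
    """
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]  # explicit stack, so deep histories do not hit the recursion limit
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            pending = [p for p in parents[commit] if p not in gen]
            if pending:
                # Parents must be numbered before the commit itself.
                stack.extend(pending)
            else:
                # Root commits have no parents: max() falls back to -1, giving 0.
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=-1)
                stack.pop()
    return gen


# A is the root; B and C both branch off A; D follows B; M merges D and C.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "M": ["D", "C"]}
print(generation_numbers(history))
```

Note that the merge M gets generation 3, one more than its deepest parent D, even though its other parent C is only at generation 1.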

an old topic already mentioned at the creation of Git in 2005:

Linus Torvalds (yesterday, July 14th):
Ok, so I see that the old discussion about generation numbers has resurfaced.
And I have to say, with six years of git use, I think it's not a coincidence that the notion of generation numbers has come up several times over the years: I think the lack of them is literally the only real design mistake we have.
[...]
It actually came up as early as July 2005, so the "let's use generation numbers in commits" thing is really old.

about the question of quickly knowing if a commit is an ancestor of another commit (without having to walk back the DAG -- the graph of commits --):

I think it's entirely reasonable to say that the issue basically boils down to one git question: "can commit X be an ancestor of commit Y" (as a way to basically limit certain algorithms from having to walk all the way down). We've used commit dates for it, and realistically it really has worked very well. But it was always a broken heuristic.
So yes, I personally see generation counters as a way to do the commit date comparisons right. And it would be perfectly fine to just say "if there are no generation numbers, we'll use the datestamps instead, and know that they could be incorrect".
That "use the datestamps" fallback thing may well involve all the heuristics we already do (ie check for the stamps looking sane, and not trusting just one individual one).

As the Hacker news thread mentions:

Generation numbers are a result of the state of the tree, while timestamps are derived from the ambient (and potentially incorrect!) environment from which the commit was made.
At the moment, each commit stores a reference to the parent tree.
By parsing that tree and reading the entire history you can obtain a hierarchy of commits.
Because you need to order commits in many situations, reading the entire history is extremely inefficient, so git uses timestamps to determine the ordering of commits.
This of course fails if the system clock on a given machine is off.
With a generation number, you can get an ordering locally from the latest commits, without having to rely on timestamps or read the entire tree.
When you have a commit with generation n, any later commits that include it would have generation >n, so to tell the relation between commits, you only need look as far back as n, and you can immediately get the order of any intermediate commits.
It has nothing to do with "easy to remember". It's about making git more efficient and robust.
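
To make the "only look as far back as n" point concrete, here is a minimal sketch of an ancestry check that uses generation numbers as a cutoff. It assumes the generations are already known and trustworthy; the `parents` and `gen` dictionaries and the function name `is_ancestor` are made up for the illustration and are not git's actual implementation.

```python
from typing import Dict, List

# Hypothetical history: A is the root, B and C branch off A,
# D follows B, and M merges D and C.
parents: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["A"], "D": ["B"], "M": ["D", "C"],
}
# Generations precomputed per the definition (root = 0, else 1 + max of parents).
gen: Dict[str, int] = {"A": 0, "B": 1, "C": 1, "D": 2, "M": 3}


def is_ancestor(x: str, y: str) -> bool:
    """Return True if x is reachable from y via parent links (x == y counts)."""
    if gen[x] > gen[y]:
        # An ancestor always has a strictly smaller generation than its descendants.
        return False
    stack = [y]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == x:
            return True
        if commit in seen or gen[commit] <= gen[x]:
            # This commit is not x and sits at or below x's generation,
            # so x cannot be among its ancestors: stop walking here.
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return False


print(is_ancestor("B", "M"))  # B is on the D side of the merge
print(is_ancestor("C", "D"))  # C and D are on different branches
```

In the second call the walk stops at B, because B's generation is not greater than C's, and never touches the root. With commit dates the same cutoff would only be a heuristic, since a skewed clock can make a parent look newer than its child.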

not redundant:

Generation numbers are completely redundant with the actual structure of history represented by the parent pointers.

Linus:

Not true. That's only true if you add "... if you parse the whole history" to that statement.
And we've never parsed the whole history, because it's just too expensive and doesn't scale. So right now we depend on commit dates with a few hacks.
So no, generation numbers are not at all redundant. They are fundamental. It's why we had this discussion six years ago.

There is still a debate as to where to cache that information (or whether it should be cached at all), but from the user's point of view, it is still seen as some "easy to remember" information (which isn't the goal of commit generation numbers):

So it's almost, but not quite, like the revision numbers everyone else has always had?
Yes -- almost, but not quite.
If you and I each create a branch off of a commit with gen #123, then, as I understand it, the subsequent commits in my branch would be #124, #125, etc., and your commits in your branch would also be #124, #125, etc.
Contrast this:
- with CVS, where I would have 1.124.1.1, 1.124.1.2, etc., and you would have 1.124.2.1, 1.124.2.2, or
- with Subversion, where I might get revisions 125, 128, and 129, while the server gave your commits #124, 127 and 130, and someone else, on a totally different part of the project got #126.
As long as development proceeds linearly, on a single branch, then yeah, it's about the same as revision numbers in a centralized RCS -- once you start branching and merging, though, it represents a different concept entirely.
For a single repository, it does have a very similar interpretation to, say, svn revnos.
You can speak of "revision #125 of a branch" in a specific repository. Which is generally exactly what you need for human-to-human communication about development.
"Can you see if that bug is in r125 of unstable?" "I've got all changes up to r245 of prod"
I guess the confusing aspect would be if "r245 of prod" in the central server was "r100 of prod" in my local repo because I haven't cloned the full history?
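
The "almost, but not quite" point can be seen in a small model where every new commit records its generation from its parents at creation time. The `Commit` class and `commit` helper below are hypothetical, and the base commit is simply given generation 123 to match the example in the quote.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    name: str
    parents: Tuple["Commit", ...] = ()
    generation: int = 0


def commit(name: str, *parents: Commit) -> Commit:
    """Create a commit whose generation is derived from its parents only."""
    generation = 1 + max((p.generation for p in parents), default=-1)
    return Commit(name, parents, generation)


# Stand-in for an existing commit at generation 123 (its history is not modelled).
base = Commit("base", (), 123)

# Two developers branch off the same commit independently.
mine_1 = commit("mine-1", base)
mine_2 = commit("mine-2", mine_1)
yours_1 = commit("yours-1", base)
yours_2 = commit("yours-2", yours_1)
yours_3 = commit("yours-3", yours_2)

# Both branches reuse the same numbers, so a generation does not name a commit.
print(mine_1.generation, yours_1.generation)
print(mine_2.generation, yours_2.generation)

# A merge takes the larger parent generation plus one.
merged = commit("merge", mine_2, yours_3)
print(merged.generation)
```

Only the parents are consulted when a commit is created, so no repository needs a central counter. For the same reason the number cannot serve as a unique revision id the way an svn revno does.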


欲拥i 2024-11-30 16:21:12

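The problem (as hinted in the thread on [email protected]) is that the DAG direction we trust runs opposite to the direction being counted: we walk from the branch heads backward through the ancestry, whereas the generation number (even if recorded at commit time) is counted up through the descendants. In addition, we often confuse the perceived history across different (distributed) repositories, which is where all the problems come from.

Having just read Linus's latest: apart from his misreading about renames (I think George Spelvin agrees with him: don't record renames in the repository, just take snapshots), he does point out:

"The basic design of git is all about incomplete DAG traversal. The DAG traversal part is pretty obvious and simple, but the partial part is really, really important."
So, in essence, a pre-recorded commit "generation" number tells you how much farther (at most) you still have to go to reach the bottom (the root), so if you can trust it, you can choose where to stop the incomplete DAG traversal. Without it, you would have to walk all the way back to the root, which is inefficient.
So I guess I have now changed my mind: I realize that it is a stopping criterion. That is not to say that some (locally computed) cache might not speed up certain searches.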