How does "git log --graph" or "hg graphlog" work?

Posted on 2024-10-13 07:32:49

I know that the history in Git is stored in a data structure called a DAG. I've heard about DFS and know it's somewhat related.

I'm curious, how do programs such as git log --graph or hg graphlog draw the history? I always thought it's quite complicated to draw the lanes and everything in such a nice way.

Could someone write some pseudo code that demonstrates it?

note: I tried looking around Git or hg's code but it's very hard to follow and get a general idea of what's going on.

Comments (4)

把昨日还给我 2024-10-20 07:32:49

First, one obtains a list of commits (as with git rev-list), and parents of each commit. A "column reservation list" is kept in memory.

For each commit then:

  • If the commit has no column reserved for it, assign it to a free column. This is how the branch heads will start.
  • Print the tree graphics according to the column reservation list, and then the commit message
  • The reservation's list entry for the current column/commit is updated with the first parent of the current commit, such that the parent is going to be printed in the same column.
  • Other parents get a new free column.
  • If this was a merge, the next line will try to link the second parent to a column where the commit is expected (this makes for the loops and the "≡ bridge")
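The steps above can be sketched in a few lines of Python. This is only a minimal illustration of the column reservation idea, not the code of git-forest, Git, or hg: the function name `draw_graph`, the `(commit_id, parents)` input shape, and the sample `history` are assumptions made for the example, and it draws no connector lines (no "/" or "\") when branches split or merge.

```python
def draw_graph(commits):
    """Render commits as text rows using a column reservation list.

    commits: list of (commit_id, [parent_ids]) with children listed before
    their parents, roughly what a rev-list style walk would give
    (hypothetical input shape for this sketch).
    """
    columns = []  # columns[i] is the commit expected next in column i, or None if free
    rows = []
    for commit_id, parents in commits:
        # Use the column reserved for this commit, else take a free one.
        if commit_id in columns:
            col = columns.index(commit_id)
        elif None in columns:
            col = columns.index(None)
            columns[col] = commit_id
        else:
            columns.append(commit_id)
            col = len(columns) - 1
        # Several children may have reserved the same commit:
        # keep the leftmost column and free the others.
        for i, reserved in enumerate(columns):
            if reserved == commit_id and i != col:
                columns[i] = None
        # Print the graph part from the reservation list, then the commit.
        cells = []
        for i, reserved in enumerate(columns):
            if i == col:
                cells.append("*")
            elif reserved is None:
                cells.append(" ")
            else:
                cells.append("|")
        rows.append(" ".join(cells).rstrip() + "  " + commit_id)
        # The first parent inherits this column; other parents get free columns.
        columns[col] = parents[0] if parents else None
        for parent in parents[1:]:
            if parent not in columns:
                if None in columns:
                    columns[columns.index(None)] = parent
                else:
                    columns.append(parent)
        # Drop trailing free columns so rows stay narrow.
        while columns and columns[-1] is None:
            columns.pop()
    return rows


# Hypothetical history: E merges D and C, which both descend from B.
history = [("E", ["D", "C"]), ("D", ["B"]), ("C", ["B"]), ("B", ["A"]), ("A", [])]
print("\n".join(draw_graph(history)))
```

A real implementation would add the extra rows that link a merge's second parent to its column, which is where the loops and bridges mentioned above come from.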

Example showing the output of git-forest on aufs2-util (with an extra commit to have more than one branch).

Example

With lookahead, one can anticipate how far down the merge point will be and squeeze the wood between two columns to give a more aesthetically pleasing result.

娜些时光,永不杰束 2024-10-20 07:32:49

I tried looking around Git or hg's code but it's very hard to follow and get a general idea of what's going on.

For hg, did you try to follow the code in hg itself, or in graphlog?

Because the code of graphlog is pretty short. You can find it in hgext/graphlog.py, and really the important part is the top ~200 lines, the rest is the extension's bootstrapping and finding the revision graph selected. The code generation function is ascii, with its last parameter being the result of a call to asciiedge (the call itself is performed on the last line of generate, the function being provided to generate by graphlog)

牵强ㄟ 2024-10-20 07:32:49

This particular problem isn't that hard, compared to graph display in general. Because you want to keep the nodes in the order they were committed the problem gets much simpler.

Also note that the display model is grid based, rows are commits and columns are edges into the past/future.

While I didn't read the git source you probably just walk the list of commits, starting from the newest, and maintain a list of open edges into the past. Following the edges naturally leads to splitting/merging columns and you end up with the kind of tree git/hg display.

When merging edges you want to avoid crossing other edges, so you'll have to try to order your columns ahead of time. This is actually the only part that may not be straightforward. For example, one could do a two-pass algorithm, making up a column order for the edges in the first pass and doing the drawing in the second pass.
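As a rough sketch of that two-pass shape in Python: the first pass walks the commits newest first while maintaining the list of open edges, and records each commit's column plus the overall width; the second pass only draws. The names `layout` and `draw_rows` and the sample `history` are made up for illustration. This sketch does not reorder columns to avoid crossings, and columns to the right shift left when edges merge, so it shows the structure of the approach rather than a polished renderer.

```python
def layout(commits):
    """First pass: place each commit in a column of the grid.

    commits: list of (commit_id, [parent_ids]), newest first
    (hypothetical input shape for this sketch).
    Returns (rows, width) where rows holds (commit_id, column, open_edge_count).
    """
    open_edges = []  # open_edges[i] is the commit that the edge in column i leads to
    rows = []
    width = 0
    for commit_id, parents in commits:
        if commit_id not in open_edges:
            open_edges.append(commit_id)  # a new head opens a new column
        col = open_edges.index(commit_id)
        rows.append((commit_id, col, len(open_edges)))
        width = max(width, len(open_edges))
        # Replace this commit's edge by edges to its parents (a split),
        # and collapse edges that lead to the same commit (a merge of columns).
        merged = []
        for target in open_edges[:col] + list(parents) + open_edges[col + 1:]:
            if target not in merged:
                merged.append(target)
        open_edges = merged
    return rows, width


def draw_rows(rows, width):
    """Second pass: draw each row, padded so that commit labels line up."""
    lines = []
    for commit_id, col, n_open in rows:
        cells = ["*" if i == col else "|" for i in range(n_open)]
        lines.append(" ".join(cells).ljust(2 * width) + commit_id)
    return lines


# Hypothetical history: E merges D and C, which both descend from B.
history = [("E", ["D", "C"]), ("D", ["B"]), ("C", ["B"]), ("B", ["A"]), ("A", [])]
rows, width = layout(history)
print("\n".join(draw_rows(rows, width)))
```

Because the width is known before anything is drawn, the second pass can align every label, which a single pass cannot do without buffering.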

ぇ气 2024-10-20 07:32:49

Note: Git 2.18 (Q2 2018) now pre-computes and stores the information necessary for ancestry traversal in a separate file to optimize graph walking.

That notion of a commit graph does change how 'git log --graph' works.

As mentioned here:

git config --global core.commitGraph true
git config --global gc.writeCommitGraph true
cd /path/to/repo
git commit-graph write

See commit 7547b95, commit 3d5df01, commit 049d51a, commit 177722b, commit 4f2542b, commit 1b70dfd, commit 2a2e32b (10 Apr 2018), and commit f237c8b, commit 08fd81c, commit 4ce58ee, commit ae30d7b, commit b84f767, commit cfe8321, commit f2af9f5 (02 Apr 2018) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit b10edb2, 08 May 2018)

You now have the command git commit-graph: Write and verify Git commit graph files.

Write a commit graph file based on the commits found in packfiles.
Includes all commits from the existing commit graph file.

The design document states:

Git walks the commit graph for many reasons, including:

  1. Listing and filtering commit history.
  2. Computing merge bases.

These operations can become slow as the commit count grows. The merge
base calculation shows up in many user-facing commands, such as 'merge-base'
or 'status' and can take minutes to compute depending on history shape.

There are two main costs here:

  1. Decompressing and parsing commits.
  2. Walking the entire graph to satisfy topological order constraints.

The commit graph file is a supplemental data structure that accelerates
commit graph walks.
If a user downgrades or disables the 'core.commitGraph' config setting, then the existing ODB is sufficient.

The file is stored as "commit-graph" either in the .git/objects/info directory or in the info directory of an alternate.

The commit graph file stores the commit graph structure along with some
extra metadata to speed up graph walks.

By listing commit OIDs in lexicographic order, we can identify an integer position for each commit and refer to the parents of a commit using those integer positions.
We use binary search to find initial commits and then use the integer positions
for fast lookups during the walk.
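To make the integer-position idea concrete, here is a small Python sketch. It is not the on-disk commit-graph format; the names `build_index`, `find_position` and `ancestors`, and the short fake OIDs, are assumptions for illustration only. It sorts OIDs lexicographically, stores parents as positions in that sorted list, finds the starting commit by binary search, and then walks using integers only.

```python
from bisect import bisect_left


def build_index(parents_by_oid):
    """Return (oids, parent_positions) for a {oid: [parent_oids]} mapping.

    Assumes every parent also appears as a key of the mapping.
    """
    oids = sorted(parents_by_oid)  # lexicographic order
    position = {oid: i for i, oid in enumerate(oids)}  # only needed while building
    parent_positions = [[position[p] for p in parents_by_oid[oid]] for oid in oids]
    return oids, parent_positions


def find_position(oids, oid):
    """Binary search for an OID; returns its integer position or -1."""
    i = bisect_left(oids, oid)
    if i < len(oids) and oids[i] == oid:
        return i
    return -1


def ancestors(oids, parent_positions, start_oid):
    """Walk from start_oid to all ancestors using integer positions only."""
    start = find_position(oids, start_oid)
    if start < 0:
        raise KeyError(start_oid)
    seen = {start}
    stack = [start]
    while stack:
        pos = stack.pop()
        for parent_pos in parent_positions[pos]:
            if parent_pos not in seen:
                seen.add(parent_pos)
                stack.append(parent_pos)
    return sorted(oids[i] for i in seen)


# Hypothetical two-character OIDs; real OIDs are full hashes.
graph = {"c3": ["a1", "b2"], "a1": [], "b2": ["a1"], "d4": ["c3"]}
oids, parent_positions = build_index(graph)
print(ancestors(oids, parent_positions, "c3"))
```

After the single binary search at the start, no commit has to be decompressed or parsed during the walk, which is the first of the two costs the design document lists.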

You can see the test use cases:

git log --oneline $BRANCH
git log --topo-order $BRANCH
git log --graph $COMPARE..$BRANCH
git branch -vv
git merge-base -a $BRANCH $COMPARE

This will improve git log performance.


With Git 2.39 (Q4 2022), the glossary entries for "commit-graph file" and "reachability bitmap" have been added.

See commit 8fea12a, commit 4973726, commit fa8e8d5, commit 776ba91 (29 Oct 2022) by Philip Oakley (PhilipOakley).
(Merged by Taylor Blau -- ttaylorr -- in commit 4b6302c, 08 Nov 2022)

glossary: add reachability bitmap description

Signed-off-by: Philip Oakley
Signed-off-by: Taylor Blau

Describe the purpose of the reachability bitmap.

glossary-content now includes in its man page:

reachability bitmaps

Reachability bitmaps store information about the
reachability of a selected set of commits in
a packfile, or a multi-pack index (MIDX), to speed up object search.
The bitmaps are stored in a ".bitmap" file.
A repository may have at
most one bitmap file in use.
The bitmap file may belong to either one
pack, or the repository's multi-pack index (if it exists).

And:

glossary: add "commit graph" description

Signed-off-by: Philip Oakley
Signed-off-by: Taylor Blau

Git has an additional "commit graph" capability that supplements the normal commit object's directed acyclic graph (DAG).
The supplemental commit graph file is designed for speed of access.

Describe the commit graph both from the normative DAG view point and from the commit graph file perspective.

Also, clarify the link between the branch ref and branch tip by linking to the ref glossary entry, matching this commit graph entry.

The commit-graph file is also distinguished by its hyphenation.

Subsequent commit catches the few cases where the hyphenation of commit-graph was missing.

glossary-content now includes in its man page:

commit graph concept, representations and usage

A synonym for the DAG structure formed by the commits
in the object database, referenced by branch tips,
using their chain of linked commits.
This structure is the definitive commit graph. The
graph can be represented in other ways, e.g. the
"commit-graph" file.

commit-graph file

The "commit-graph" (normally hyphenated) file is a supplemental
representation of the commit graph
which accelerates commit graph walks.
The "commit-graph" file is
stored either in the .git/objects/info directory or in the info
directory of an alternate object database.


Git 2.19 (Q3 2018) will take care of the lock file:

See commit 33286dc (10 May 2018), commit 1472978, commit 7adf526, commit 04bc8d1, commit d7c1ec3, commit f9b8908, commit 819807b, commit e2838d8, commit 3afc679, commit 3258c66 (01 May 2018), and commit 83073cc, commit 8fb572a (25 Apr 2018) by Derrick Stolee (derrickstolee).
Helped-by: Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a856e7d, 25 Jun 2018)

commit-graph: fix UX issue when .lock file exists

We use the lockfile API to avoid multiple Git processes from writing to
the commit-graph file in the .git/objects/info directory.
In some cases, this directory may not exist, so we check for its existence.

The existing code does the following when acquiring the lock:

  1. Try to acquire the lock.
  2. If it fails, try to create the .git/objects/info directory.
  3. Try to acquire the lock, failing if necessary.

The problem is that if the lockfile exists, then the mkdir fails, giving
an error that doesn't help the user:

"fatal: cannot mkdir .git/objects/info: File exists"

While technically this honors the lockfile, it does not help the user.

Instead, do the following:

  1. Check for existence of .git/objects/info; create if necessary.
  2. Try to acquire the lock, failing if necessary.

The new output looks like:

fatal: Unable to create
'<dir>/.git/objects/info/commit-graph.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. 
Please make sure all processes are terminated then try again. 
If it still fails, a git process may have crashed in this repository earlier:
remove the file manually to continue.
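The revised order (create the directory first, then take the lock) can be illustrated with a short Python sketch. This is not Git's lockfile API; `acquire_lock` is a hypothetical helper that relies on an exclusive file create, and the error text only imitates the message quoted above.

```python
import os
import tempfile


def acquire_lock(info_dir, name="commit-graph"):
    """Take '<info_dir>/<name>.lock', creating info_dir first if needed.

    Hypothetical helper for illustration; returns (file_descriptor, lock_path).
    """
    # Step 1: make sure the directory exists, so that a failure in step 2
    # can only mean that the lock itself is already held.
    os.makedirs(info_dir, exist_ok=True)
    lock_path = os.path.join(info_dir, name + ".lock")
    # Step 2: an exclusive create fails if another process holds the lock.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"Unable to create '{lock_path}': File exists.") from None
    return fd, lock_path


with tempfile.TemporaryDirectory() as repo:
    info_dir = os.path.join(repo, "objects", "info")
    fd, lock_path = acquire_lock(info_dir)
    try:
        acquire_lock(info_dir)  # second attempt: the lock is already held
    except RuntimeError as err:
        print(err)
    os.close(fd)
```

With this order, the error the user sees names the lock file rather than a failed mkdir.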

Note: The commit-graph facility did not work when in-core objects that
are promoted from unknown type to commit (e.g. a commit that is
accessed via a tag that refers to it) were involved, which has been
corrected with Git 2.21 (Feb. 2019)

See commit 4468d44 (27 Jan 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 2ed3de4, 05 Feb 2019)


That algorithm is being refactored in Git 2.23 (Q3 2019).

See commit 238def5, commit f998d54, commit 014e344, commit b2c8306, commit 4c9efe8, commit ef5b83f, commit c9905be, commit 10bd0be, commit 5af8039, commit e103f72 (12 Jun 2019), and commit c794405 (09 May 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e116894, 09 Jul 2019)

Commit 10bd0be explains the change of scope.


With Git 2.24 (Q4 2019), the code to write commit-graph over given commit object names has been made a bit more robust.

See commit 7c5c9b9, commit 39d8831, commit 9916073 (05 Aug 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 6ba06b5, 22 Aug 2019)


And, still with Git 2.24 (Q4 2019), the code to parse and use the commit-graph file has been made more robust against corrupted input.

See commit 806278d, commit 16749b8, commit 23424ea (05 Sep 2019) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 80693e3, 07 Oct 2019)

t/t5318: introduce failing 'git commit-graph write' tests

When invoking 'git commit-graph' in a corrupt repository, one can cause a segfault when ancestral commits are corrupt in one way or another.
This is due to two function calls in the 'commit-graph.c' code that may
return NULL, but are not checked for NULL-ness before dereferencing.

Hence:

commit-graph.c: handle commit parsing errors

To write a commit graph chunk, 'write_graph_chunk_data()' takes a list of commits to write and parses each one before writing the necessary data, and continuing on to the next commit in the list.

Since the majority of these commits are not parsed ahead of time (an exception is made for the last commit in the list, which is parsed early within 'copy_oids_to_commits'), it is possible that calling 'parse_commit_no_graph()' on them may return an error.
Failing to catch these errors before de-referencing later calls can result in an undefined memory access and a SIGSEGV.

One such example of this is 'get_commit_tree_oid()', which expects a parsed object as its input (in this case, the commit-graph code passes '*list').
If '*list' causes a parse error, the subsequent call will fail.

Prevent such an issue by checking the return value of 'parse_commit_no_graph()' to avoid passing an unparsed object to a function which expects a parsed object, thus preventing a segfault.


With Git 2.26 (Q1 2020), the code to compute the commit-graph has been taught to use a more robust way to tell if two object directories refer to the same thing.

See commit a7df60c, commit ad2dd5b, commit 13c2499 (03 Feb 2020), commit 0bd52e2 (04 Feb 2020), and commit 1793280 (30 Jan 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 53c3be2, 14 Feb 2020)

commit-graph.h: store an odb in 'struct write_commit_graph_context'

Signed-off-by: Taylor Blau

There are lots of places in commit-graph.h where a function either has (or almost has) a full 'struct object_directory *', accesses '->path', and then throws away the rest of the struct.

This can cause headaches when comparing the locations of object directories across alternates (e.g., in the case of deciding if two commit-graph layers can be merged).
These paths are normalized with normalize_path_copy(), which mitigates some comparison issues, but not all.

Replace usage of char *object_dir with odb->path by storing a struct object_directory* in the write_commit_graph_context structure.
This is an intermediate step towards getting rid of all path normalization in 'commit-graph.c'.

Resolving a user-provided '--object-dir' argument now requires that we compare it to the known alternates for equality.

Prior to this patch, an unknown '--object-dir' argument would silently exit with status zero.

This can clearly lead to unintended behavior, such as verifying commit-graphs that aren't in a repository's own object store (or one of its alternates), or causing a typo to mask a legitimate commit-graph verification failure.
Make this error non-silent by 'die()'-ing when the given '--object-dir' does not match any known alternate object store.
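The behaviour described here (match the requested directory against the known object directories, and fail loudly otherwise) might look like this in Python. It is an illustration only: `resolve_object_dir` is a made-up name, comparing with `os.path.realpath` is an assumption of this sketch rather than a statement about how Git compares paths, and Git calls 'die()' where the sketch raises an exception.

```python
import os
import tempfile


def resolve_object_dir(requested, known_dirs):
    """Return the known object directory that matches 'requested'.

    Raises ValueError instead of silently succeeding when nothing matches.
    Hypothetical helper; realpath comparison is an assumption of this sketch.
    """
    wanted = os.path.realpath(requested)
    for known in known_dirs:
        if os.path.realpath(known) == wanted:
            return known
    raise ValueError(f"could not find object directory matching {requested}")


with tempfile.TemporaryDirectory() as repo:
    objects = os.path.join(repo, "objects")
    os.mkdir(objects)
    # A differently spelled path to the same directory still matches.
    print(resolve_object_dir(os.path.join(repo, "objects", ".", ""), [objects]) == objects)
    # A typo no longer goes unnoticed.
    try:
        resolve_object_dir(os.path.join(repo, "objcets"), [objects])
    except ValueError as err:
        print("error:", err)
```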


With Git 2.28 (Q3 2020), commit-graph write --stdin-commits is optimized.

See commit 2f00c35, commit 1f1304d, commit 0ec2d0f, commit 5b6653e, commit 630cd51, commit d335ce8 (13 May 2020), commit fa8953c (18 May 2020), and commit 1fe1084 (05 May 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit dc57a9b, 09 Jun 2020)

commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag

Helped-by: Jeff King
Signed-off-by: Taylor Blau

Since 7c5c9b9c57 ("commit-graph: error out on invalid commit oids in 'write --stdin-commits'", 2019-08-05, Git v2.24.0-rc0 -- merge listed in batch #1), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'.

This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input.

Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether.

If callers do wish to retain this behavior, they can easily work around this change by doing the following:

git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
awk '
  !/commit/ { print "not-a-commit:"$1 }
   /commit/ { print $1 }
' |
git commit-graph write --stdin-commits

To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals.

This is tested with Git 2.28 (Q3 2020).

See commit 94fbd91 (01 Jun 2020), and commit 6334c5f (03 Jun 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit abacefe, 18 Jun 2020)

t5318: test that '--stdin-commits' respects '--[no-]progress'

Signed-off-by: Taylor Blau
Acked-by: Derrick Stolee

The following lines were not covered in a recent line-coverage test against Git:

builtin/commit-graph.c
5b6653e5 244) progress = start_delayed_progress(
5b6653e5 268) stop_progress(&progress);

These statements are executed when both '--stdin-commits' and '--progress' are passed. Introduce a trio of tests that exercise various combinations of these options to ensure that these lines are covered.

More importantly, this is exercising a (somewhat) previously-ignored feature of '--stdin-commits', which is that it respects '--progress'.

Prior to 5b6653e523 ("[builtin/commit-graph.c](https://github.com/git/git/blob/94fbd9149a2d59b0dca18448ef9d3e0607a7a19d/builtin/commit-graph.c): dereference tags in builtin", 2020-05-13, Git v2.28.0 -- merge listed in batch #2), dereferencing input from '--stdin-commits' was done inside of commit-graph.c.

Now that an additional progress meter may be generated from outside of commit-graph.c, add a corresponding test to make sure that it also respects '--[no]-progress'.

The other location that generates progress meter output (from d335ce8f24 ("[commit-graph.c](https://github.com/git/git/blob/94fbd9149a2d59b0dca18448ef9d3e0607a7a19d/commit-graph.c): show progress of finding reachable commits", 2020-05-13, Git v2.28.0 -- merge listed in batch #2)) is already covered by any test that passes '--reachable'.


With Git 2.29 (Q4 2020), in_merge_bases_many(), a way to see if a commit is reachable from any commit in a set of commits, was totally broken when the commit-graph feature was in use, which has been corrected.

See commit 8791bf1 (02 Oct 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c01b041, 05 Oct 2020)

commit-reach: fix in_merge_bases_many bug

Reported-by: Srinidhi Kaushik
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

Way back in f9b8908b ("[commit.c](https://github.com/git/git/blob/8791bf18414a37205127e184c04cad53a43aeff1/commit.c): use generation numbers for in_merge_bases()", 2018-05-01, Git v2.19.0-rc0 -- merge listed in batch #1), a heuristic was used to short-circuit the in_merge_bases() walk.
This works just fine as long as the caller is checking only two commits, but when there are multiple, there is a possibility that this heuristic is very wrong.

Some code moves since then has changed this method to repo_in_merge_bases_many() inside commit-reach.c. The heuristic computes the minimum generation number of the "reference" list, then compares this number to the generation number of the "commit".

In a recent topic, a test was added that used in_merge_bases_many() to test if a commit was reachable from a number of commits pulled from a reflog. However, this highlighted the problem: if any of the reference commits have a smaller generation number than the given commit, then the walk is skipped _even if there exist some with higher generation number_.

This heuristic is wrong! It must check the MAXIMUM generation number of the reference commits, not the MINIMUM.

The fix itself is to swap min_generation with a max_generation in repo_in_merge_bases_many().
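A toy Python example shows why the minimum is the wrong bound. Here a commit's generation number is taken to be one more than the largest generation among its parents, so a commit can only be reached from commits with a higher generation number. The function names, the `use_max` switch and the five-commit history are invented for this sketch and are not Git's code.

```python
def generation_numbers(parents_by_id):
    """Generation number: 1 for a root, else 1 + the max over the parents."""
    gen = {}

    def visit(commit_id):
        if commit_id not in gen:
            gen[commit_id] = 1 + max(
                (visit(p) for p in parents_by_id[commit_id]), default=0
            )
        return gen[commit_id]

    for commit_id in parents_by_id:
        visit(commit_id)
    return gen


def can_skip_walk(commit_gen, reference_gens, use_max=True):
    """True if no reference can possibly reach the commit.

    use_max=False reproduces the broken heuristic (minimum generation);
    use_max=True is the corrected one (maximum generation).
    """
    bound = max(reference_gens) if use_max else min(reference_gens)
    return bound < commit_gen


# Hypothetical history: A <- B <- C <- D, plus a side commit X on top of A.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "X": ["A"]}
gen = generation_numbers(history)

# Is C reachable from any of the references X and D? It is, through D.
references = [gen["X"], gen["D"]]
print(can_skip_walk(gen["C"], references, use_max=False))  # broken: skips the walk
print(can_skip_walk(gen["C"], references, use_max=True))   # fixed: walk is performed
```

With the minimum, the low generation number of X alone is enough to skip the walk, even though D has a higher generation number and does reach C.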


Before Git 2.32 hopefully (Q1 2021), when certain features (e.g. grafts) used in the repository are incompatible with the use of the commit-graph, we used to silently turn commit-graph off; we now tell the user what we are doing.

See commit c85eec7 (11 Feb 2021) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 726b11d, 17 Feb 2021)

That will show what was intended for Git 2.31, but it has been reverted, as it is a bit overzealous in its current form.

commit-graph: when incompatible with graphs, indicate why

Signed-off-by: Johannes Schindelin
Acked-by: Derrick Stolee

When gc.writeCommitGraph = true, it is possible that the commit-graph is still not written: replace objects, grafts and shallow repositories are incompatible with the commit-graph feature.

Under such circumstances, we need to indicate to the user why the commit-graph was not written instead of staying silent about it.

The warnings will be:

repository contains replace objects; skipping commit-graph
repository contains (deprecated) grafts; skipping commit-graph
repository is shallow; skipping commit-graph