Is there any way to limit the amount of memory that "git gc" uses?

Posted 2024-09-06 23:03:47


I'm hosting a git repo on a shared host. My repo necessarily has a couple of very large files in it, and every time I try to run "git gc" on the repo now, my process gets killed by the shared hosting provider for using too much memory. Is there a way to limit the amount of memory that git gc can consume? My hope would be that it can trade memory usage for speed and just take a little longer to do its work.


Comments (5)

感性不性感 2024-09-13 23:03:48


Git repack's memory use is: (pack.deltaCacheSize + pack.windowMemory) × pack.threads. Respective defaults are 256MiB, unlimited, nproc.

The delta cache isn't useful: most of the time is spent computing deltas on a sliding window, the majority of which are discarded; caching the survivors so they can be reused once (when writing) won't improve the runtime. That cache also isn't shared between threads.

By default the window memory is limited through pack.window (gc.aggressiveWindow). Limiting packing that way is a bad idea, because the working set size and efficiency will vary widely. It's best to raise both to much higher values and rely on pack.windowMemory to limit the window size.

Finally, threading has the disadvantage of splitting the working set. Lowering pack.threads and increasing pack.windowMemory so that the total stays the same should improve the run time.

repack has other useful tunables (pack.depth, pack.compression, the bitmap options), but they don't affect memory use.
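Translated into concrete commands, a possible starting point could look like this (the values are illustrative guesses for a memory-constrained host, not figures from the answer):

```shell
# Run inside the repository on the shared host.
# Total repack memory is roughly
# (pack.deltaCacheSize + pack.windowMemory) x pack.threads,
# so use a single thread and cap the window memory directly.
git config pack.threads 1
git config pack.windowMemory 256m
git config pack.deltaCacheSize 64m
# Keep pack.window generous and let pack.windowMemory do the limiting.
git config pack.window 250
```

With one thread the window memory is no longer multiplied, so the cap applies to the whole repack.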

离鸿 2024-09-13 23:03:48


You could turn off the delta attribute to disable delta compression for just the blobs of those pathnames:

In foo/.git/info/attributes (or foo.git/info/attributes if it is a bare repository) (see the delta entry in gitattributes and see gitignore for the pattern syntax):

/large_file_dir/* -delta
*.psd -delta
/data/*.iso -delta
/some/big/file -delta
another/file/that/is/large -delta

This will not affect clones of the repository. To affect other repositories (i.e. clones), put the attributes in a .gitattributes file instead of (or in addition to) the info/attributes file.

病毒体 2024-09-13 23:03:48


Git 2.18 (Q2 2018) will improve the gc memory consumption.
Before 2.18, "git pack-objects" needs to allocate tons of "struct object_entry" while doing its work: shrinking its size helps the performance quite a bit.
This influences git gc.

See commit f6a5576, commit 3b13a5f, commit 0aca34e, commit ac77d0c, commit 27a7d06, commit 660b373, commit 0cb3c14, commit 898eba5, commit 43fa44f, commit 06af3bb, commit b5c0cbd, commit 0c6804a, commit fd9b1ba, commit 8d6ccce, commit 4c2db93 (14 Apr 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit ad635e8, 23 May 2018)

pack-objects: reorder members to shrink struct object_entry

Previous patches leave lots of holes and padding in this struct.
This patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes on 64-bit systems, before any field shrinking is done)
with 16 bits to spare (and a couple more in in_pack_header_size when
we really run out of bits).

This is the last in a series of memory reduction patches (see
"pack-objects: a bit of document about struct object_entry" for the
first one).

Overall they've reduced repack memory size on linux-2.6.git from
3.747G to 3.424G, or by around 320M, a decrease of 8.5%.
The runtime of repack has stayed the same throughout this series.
Ævar's testing on a big monorepo he has access to (bigger than linux-2.6.git) has shown a 7.9% reduction, so the overall expected improvement should be somewhere around 8%.


With Git 2.20 (Q4 2018), it will be easier to check an object that exists in one fork is not made into a delta against another object that does not appear in the same forked repository.

See commit fe0ac2f, commit 108f530, commit f64ba53 (16 Aug 2018) by Christian Couder (chriscool).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
See commit 9eb0986, commit 16d75fa, commit 28b8a73, commit c8d521f (16 Aug 2018) by Jeff King (peff).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit f3504ea, 17 Sep 2018)

pack-objects: move 'layer' into 'struct packing_data'

This reduces the size of 'struct object_entry' from 88 bytes to 80 and therefore makes packing objects more efficient.

For example on a Linux repo with 12M objects, git pack-objects --all needs extra 96MB memory even if the layer feature is not used.


Note that Git 2.21 (Feb. 2019) fixes a small bug: "git pack-objects" incorrectly used uninitialized mutex, which has been corrected.

See commit edb673c, commit 459307b (25 Jan 2019) by Patrick Hogg (``).
Helped-by: Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit d243a32, 05 Feb 2019)

pack-objects: move read mutex to packing_data struct

ac77d0c ("pack-objects: shrink size field in struct object_entry",
2018-04-14) added an extra usage of read_lock/read_unlock in the newly
introduced oe_get_size_slow for thread safety in parallel calls to
try_delta().
Unfortunately oe_get_size_slow is also used in serial
code, some of which is called before the first invocation of
ll_find_deltas.
As such the read mutex is not guaranteed to be initialized.

Resolve this by moving the read mutex to packing_data and initializing
it in prepare_packing_data which is initialized in cmd_pack_objects.


Git 2.21 (Feb. 2019) found yet another way to shrink the size of the pack: "git pack-objects" learned another algorithm to compute the set of objects to send, trading a somewhat larger resulting packfile for a lower traversal cost, to favor small pushes.

pack-objects: create pack.useSparse setting

The '--sparse' flag in 'git pack-objects' changes the algorithm
used to enumerate objects to one that is faster for individual
users pushing new objects that change only a small cone of the
working directory.
The sparse algorithm is not recommended for a server, which likely sends new objects that appear across the entire working directory.

Create a 'pack.useSparse' setting that enables this new algorithm.
This allows 'git push' to use this algorithm without passing a
'--sparse' flag all the way through four levels of run_command()
calls.

If the '--no-sparse' flag is set, then this config setting is
overridden.

The config pack documentation now includes:

pack.useSparse:

When true, Git will default to using the '--sparse' option in
'git pack-objects' when the '--revs' option is present.
This algorithm only walks trees that appear in paths that introduce new objects.

This can have significant performance benefits when computing a pack to send a small change.

However, it is possible that extra objects are added to the pack-file if the included commits contain certain types of direct renames.

See "git push is very slow for a huge repo" for a concrete illustration.
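A quick sketch of opting in, either per repository or via the flag itself:

```shell
# Enable the sparse object-enumeration algorithm for this repository,
# so "git push" uses it without needing a flag.
git config pack.useSparse true
# The equivalent one-off flag on pack-objects (with --revs) would be:
#   git pack-objects --revs --sparse <base-name>
```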


Note: as commented in Git 2.24, a setting like pack.useSparse is still experimental.

See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)

repo-settings: create feature.experimental setting

The 'feature.experimental' setting includes config options that are not committed to become defaults, but could use additional testing.

Update the following config settings to take new defaults, and to
use the repo_settings struct if not already using them:

  • 'pack.useSparse=true'
  • 'fetch.negotiationAlgorithm=skipping'

With Git 2.26 (Q1 2020), the way "git pack-objects" reuses objects stored in existing packs to generate its result has been improved.

See commit d2ea031, commit 92fb0db, commit bb514de, commit ff48302, commit e704fc7, commit 2f4af77, commit 8ebf529, commit 59b2829, commit 40d18ff, commit 14fbd26 (18 Dec 2019), and commit 56d9cbe, commit bab28d9 (13 Sep 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a14aebe, 14 Feb 2020)

pack-objects: improve partial packfile reuse

Helped-by: Jonathan Tan
Signed-off-by: Jeff King
Signed-off-by: Christian Couder

The old code to reuse deltas from an existing packfile just tried to dump a whole segment of the pack verbatim. That's faster than the traditional way of actually adding objects to the packing list, but it didn't kick in very often. This new code is really going for a middle ground: do some per-object work, but way less than we'd traditionally do.

The general strategy of the new code is to make a bitmap of objects from the packfile we'll include, and then iterate over it, writing out each object exactly as it is in our on-disk pack, but not adding it to our packlist (which costs memory, and increases the search space for deltas).

One complication is that if we're omitting some objects, we can't set a delta against a base that we're not sending. So we have to check each object in try_partial_reuse() to make sure we have its delta.

About performance, in the worst case we might have interleaved objects that we are sending or not sending, and we'd have as many chunks as objects. But in practice we send big chunks.

For instance, packing torvalds/linux on GitHub servers now reused 6.5M objects, but only needed ~50k chunks.


With Git 2.34 (Q4 2021), git repack itself (used by git gc) benefits from a reduced memory usage.

See commit b017334, commit a9fd2f2, commit a241878 (29 Aug 2021) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 9559de3, 10 Sep 2021)

builtin/pack-objects.c: remove duplicate hash lookup

Signed-off-by: Taylor Blau

In the original code from 08cdfb1 ("pack-objects --keep-unreachable", 2007-09-16, Git v1.5.4-rc0 -- merge), we add each object to the packing list with type obj->type, where obj comes from lookup_unknown_object().
Unless we had already looked up and parsed the object, this will be OBJ_NONE.
That's fine, since oe_set_type() sets the type_valid bit to '0', and we determine the real type later on.

So the only thing we need from the object lookup is access to the flags field so that we can mark that we've added the object with OBJECT_ADDED to avoid adding it again (we can just pass OBJ_NONE directly instead of grabbing it from the object).

But add_object_entry() already rejects duplicates! This has been the behavior since 7a979d9 ("Thin pack - create packfile with missing delta base.", 2006-02-19, Git v1.3.0-rc1 -- merge), but 08cdfb1 didn't take advantage of it.
Moreover, to do the OBJECT_ADDED check, we have to do a hash lookup in obj_hash.

So we can drop the lookup_unknown_object() call completely, and the OBJECT_ADDED flag, too, since the spot we're touching here is the only location that checks it.

In the end, we perform the same number of hash lookups, but with the added bonus that we don't waste memory allocating an OBJ_NONE object (if we were traversing, we'd need it eventually, but the whole point of this code path is not to traverse).


The interaction between fetch.negotiationAlgorithm and feature.experimental configuration variables has been corrected with Git 2.36 (Q2 2022).

See commit 714edc6, commit a9a136c, commit a68c5b9 (02 Feb 2022) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 70ff41f, 16 Feb 2022)

repo-settings: rename the traditional default fetch.negotiationAlgorithm

Signed-off-by: Elijah Newren

Give the traditional default fetch.negotiationAlgorithm the name 'consecutive'.
Also allow a choice of 'default' to have Git decide between the choices (currently, picking 'skipping' if feature.experimental is true and 'consecutive' otherwise).
Update the documentation accordingly.

git config now includes in its man page:

Control how information about the commits in the local repository
is sent when negotiating the contents of the packfile to be sent by
the server.

  • Set to "consecutive" to use an algorithm that walks
    over consecutive commits checking each one.
  • Set to "skipping" to
    use an algorithm that skips commits in an effort to converge
    faster, but may result in a larger-than-necessary packfile; or set
    to "noop" to not send any information at all, which will almost
    certainly result in a larger-than-necessary packfile, but will skip
    the negotiation step.
  • Set to "default" to override settings made
    previously and use the default behaviour.

The default is normally
"consecutive", but if feature.experimental is true, then the
default is "skipping".
Unknown values will cause 'git fetch' to
error out (unknown fetch negotiation algorithm).
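For example, to pick an algorithm explicitly rather than relying on feature.experimental:

```shell
# "skipping" converges faster during fetch negotiation but may
# produce a larger-than-necessary packfile; "consecutive" is the
# traditional default.
git config fetch.negotiationAlgorithm skipping
```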


With Git 2.44 (Q1 2024), streaming spans of packfile data used to be done only from a single, primary, pack in a repository with multiple packfiles.
It has been extended to allow reuse from other packfiles, too. That can influence the gc.

See commit ba47d88, commit af626ac, commit 9410741, commit 3bea0c0, commit 54393e4, commit 519e17f, commit dbd5c52, commit e1bfe30, commit b1e3333, commit ed9f414, commit b96289a, commit ca0fd69, commit 4805125, commit 073b40e, commit d1d701e, commit 5e29c3f, commit 83296d2, commit 35e156b, commit e5d48bf, commit dab6093, commit 307d75b, commit 5f5ccd9, commit fba6818, commit a96015a, commit 6cdb67b, commit 66f0c71 (14 Dec 2023) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 0fea6b7, 12 Jan 2024)

pack-bitmap: enable reuse from all bitmapped packs

Signed-off-by: Taylor Blau

Now that both the pack-bitmap and pack-objects code are prepared to handle marking and using objects from multiple bitmapped packs for verbatim reuse, allow marking objects from all bitmapped packs as eligible for reuse.

Within the reuse_partial_packfile_from_bitmap() function, we no longer only mark the pack whose first object is at bit position zero for reuse, and instead mark any pack contained in the MIDX as a reuse candidate.

Provide a handful of test cases in a new script (t5332) exercising interesting behavior for multi-pack reuse to ensure that we performed all of the previous steps correctly.

git config now includes in its man page:

When true or "single", and when reachability bitmaps are
enabled, pack-objects will try to send parts of the bitmapped
packfile verbatim.
When "multi", and when a multi-pack
reachability bitmap is available, pack-objects will try to send
parts of all packs in the MIDX.

If only a single pack bitmap is available, and
pack.allowPackReuse is set to "multi", reuse parts of just the
bitmapped packfile. This can reduce memory and CPU usage to
serve fetches, but might result in sending a slightly larger
pack.
Defaults to true.
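As a sketch, enabling multi-pack reuse explicitly rather than through feature.experimental:

```shell
# Opt in to verbatim reuse across all bitmapped packs (Git >= 2.44).
git config pack.allowPackReuse multi
```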


With Git 2.44 (Q1 2024), rc1, setting feature.experimental opts the user into the multi-pack reuse experiment.

See commit 23c1e71, commit 7c01878 (05 Feb 2024) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 3b89ff1, 12 Feb 2024)

pack-objects: enable multi-pack reuse via feature.experimental

Signed-off-by: Taylor Blau

Now that multi-pack reuse is supported, enable it via the feature.experimental configuration in addition to the classic pack.allowPackReuse.

This will allow more users to experiment with the new behavior who might not otherwise be aware of the existing pack.allowPackReuse configuration option.

The enum with values NO_PACK_REUSE, SINGLE_PACK_REUSE, and MULTI_PACK_REUSE is defined statically in builtin/pack-objects.c's compilation unit.
We could hoist that enum into a scope visible from the repository_settings struct, and then use that enum value in pack-objects.
Instead, define a single int that indicates what pack-objects's default value should be to avoid additional unnecessary code movement.

Though feature.experimental implies pack.allowPackReuse=multi, this can still be overridden by explicitly setting the latter configuration to either "single" or "false".

git config now includes in its man page:

  • pack.allowPackReuse=multi may improve the time it takes to create a pack by reusing objects from multiple packs instead of just one.
零時差 2024-09-13 23:03:47


I used the instructions from this link. Same idea as Charles Bailey suggested.

A copy of the commands is here:

git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"

This worked for me on hostgator with shared hosting account.

呆头 2024-09-13 23:03:47


Yes, have a look at the help page for git config and look at the pack.* options, specifically pack.depth, pack.window, pack.windowMemory and pack.deltaCacheSize.

It's not a totally exact size as git needs to map each object into memory so one very large object can cause a lot of memory usage regardless of the window and delta cache settings.

You may have better luck packing locally and transferring pack files to the remote side "manually", adding a .keep file so that the remote git doesn't ever try to completely repack everything.
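A rough sketch of that workflow; the pack file name below is a placeholder, since git names packs after a hash it computes:

```shell
# On a machine with plenty of memory, repack everything from scratch.
git repack -a -d -f
# Copy the resulting pack and index into the bare repo on the host
# (pack-<hash> stands for the real generated name):
#   scp .git/objects/pack/pack-<hash>.pack .git/objects/pack/pack-<hash>.idx \
#       host:repo.git/objects/pack/
# A matching .keep file on the host prevents future repacks from
# rewriting that pack:
#   touch repo.git/objects/pack/pack-<hash>.keep
```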
