Mercurial: is it possible to compress the .hg folder into a few large BLOBs?

Posted 2024-12-17 02:42:42

Issue: cloning a Mercurial repository over the network takes too much time (~12 minutes). We suspect it is because the .hg directory contains a lot of files (> 15,000).

We also have a Git repository which is even larger, but clone performance is quite good: around 1 minute. It looks like this is because the .git folder transferred over the network contains only a few files (usually < 30).

Question: does Mercurial support compressing the repository into a single blob, and if it does, how do we enable it?

Thanks

UPDATE

Mercurial version: 1.8.3

Access method: SAMBA share (\\server\path\to\repo)

Mercurial is installed on a Linux box and accessed from Windows machines (via Windows domain login)

Comments (2)

一世旳自豪 2024-12-24 02:42:42

Mercurial uses some kind of compression to send data over the network (see http://hgbook.red-bean.com/read/behind-the-scenes.html#id358828), but by using Samba you bypass this mechanism entirely. Mercurial thinks the remote repository is on a local filesystem, and the mechanism used is different.

The linked documentation clearly states that the data is compressed as a whole before it is sent:

This combination of algorithm and compression of the entire stream
(instead of a revision at a time) substantially reduces the number of
bytes to be transferred, yielding better network performance over most
kinds of network.

So you won't have the 15,000-file problem if you use a "real" network protocol.

BTW, I strongly recommend against using something like Samba to share your repository. It is really asking for various kinds of problems:

  • lock problems when multiple people attempt to access the repository at the same time
  • file permission problems
  • file stat problems
  • problems with symlink management (if symlinks are used)

You can find information about publishing repositories on the wiki: PublishingRepositories (where you can see that Samba is not recommended at all).

And to answer the question: AFAIK, there's no way to compress the Mercurial metadata or otherwise reduce the number of files. But if the repository is published correctly, this won't be a problem anymore.
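
For reference, one low-setup way to publish the repository over a "real" protocol is Mercurial's built-in web server; the host name and port below are placeholders, and for anything beyond a quick test the PublishingRepositories wiki page recommends hgweb behind a proper web server:

  # on the Linux server, inside the repository
  hg serve --port 8000

  # from a Windows client
  hg clone http://server:8000/ therepo

This goes through Mercurial's wire protocol, so the changeset stream is compressed and sent as one transfer instead of 15,000 individual files.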

爱的十字路口 2024-12-24 02:42:42

You could compress it to a blob by creating a bundle:

  • hg bundle --all \\server\therepo.bundle
  • hg clone \\server\therepo.bundle
  • hg log -R therepo.bundle

You do need to re-create or update the bundle periodically, but creating the bundle is fast and could be done in a hook on the server after each changeset, or nightly. (The remaining changesets can be fetched by pulling from the repository after cloning from the bundle, if you set [paths] correctly in .hg/hgrc.)
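
A minimal sketch of the [paths] setup mentioned above (the paths and names are hypothetical): the clone made from the bundle points its default path back at the live repository, so a later pull picks up changesets created after the bundle was made:

  hg clone \\server\therepo.bundle therepo

  # therepo\.hg\hgrc on the client
  [paths]
  default = \\server\path\to\repo

  # later: fetch only the changesets missing from the bundle
  hg pull -R therepo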

So, to answer your question about several blobs, you could create a bundle every X changesets and have the clients clone/unbundle each of those. (However, keeping a single bundle updated regularly, plus a normal pull for any remaining changesets, seems easier...)
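
As a sketch of automating the single-bundle approach (the share path is hypothetical, and the changegroup hook is used here as an approximation of the server-side hook mentioned above), the server repository's .hg/hgrc could regenerate the bundle whenever new changesets arrive:

  [hooks]
  changegroup = hg bundle --all /srv/share/therepo.bundle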

However, since you're running Linux on the server anyway, I suggest serving the repository with hg-ssh or hgweb.cgi. That's what we do and it works well for us (with Windows clients).
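
If SSH access to the Linux box is available, a clone over SSH looks roughly like this (the user, host, and path are placeholders; note the double slash for an absolute path on the server):

  hg clone ssh://hguser@server//srv/hg/therepo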
