How to cleanly handle source code and data in a repository
I'm working on a collaborative scientific project that is made up of a handful of Python scripts (1M max) and a relatively large dataset (1.5 GB). The datasets are tightly linked to the Python scripts, since the datasets themselves are the science and the scripts are a simple interface to them.
I'm using Mercurial as my source control tool, but I am not clear on a good mechanism for defining the repository. Logistically it makes sense to bundle these together, so that by cloning the repository you'd get the entire package. On the other hand, I'm concerned about the source control tool dealing with large amounts of data.
Is there a clean mechanism to handle this?
3 Answers
If the data files change rarely and you normally need all of them anyway, then just add them to Mercurial and be done with it. All your clones will be 1.5 GB, but that is just the way it has to be with that amount of data.
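For that first option the workflow is just the ordinary add-and-commit cycle; a minimal sketch, assuming the dataset sits in a `data/` directory at the repository root and using a made-up clone URL:

```
# Track the dataset alongside the scripts in one repository.
hg add data/
hg commit -m "Add dataset alongside the analysis scripts"

# Collaborators then get code and data in a single clone.
hg clone https://hg.example.org/project
```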
If the data is binary and changes often, then you might try to avoid downloading all of the old data. One way to do this is to use a Subversion subrepository. You will have a `.hgsub` file which tells Mercurial to make a `svn checkout` from the right-hand side URL and put the Subversion working copy into your Mercurial clone as `data`. Mercurial will maintain an additional file for you called `.hgsubstate`, in which it records the SVN revision number to check out for any given Mercurial changeset. By using Subversion like this, you only end up with the latest version of the data on your machine, but Mercurial will know how to get older versions of the data when needed. Please see this guide to subrepositories if you go down this route.
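To make the subrepository mechanics concrete, here is a minimal sketch of what the `.hgsub` file could contain; the `data` path and the Subversion URL are placeholders, not values from the original answer:

```
# .hgsub -- maps a path inside the Mercurial clone to an external repository.
# The [svn] prefix tells Mercurial this subrepository is a Subversion checkout;
# the URL on the right-hand side is a hypothetical example.
data = [svn]https://svn.example.org/project-data/trunk
```

Once `.hgsub` is committed, Mercurial checks the subrepository out on clone and update, and pins the SVN revision in `.hgsubstate` for every changeset, so updating to an old changeset brings back the matching version of the data.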
There is an article on the official wiki about handling large binary files. But the suggestion from @MartinGeisler is a really nice new alternative.
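One of the approaches discussed in that area of the wiki is the largefiles extension (bundled with Mercurial since version 2.0); a hedged sketch of using it, with an illustrative file name:

```
# Enable the extension in your hgrc:
#   [extensions]
#   largefiles =

# Files added with --large are kept outside the normal history; clones only
# download the large-file revisions they actually check out.
hg add --large data/measurements.h5
hg commit -m "Track the 1.5 GB dataset as a largefile"
```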
My first inclination is to separate the Python scripts out into their own repository, but I really need more domain information to make the "right" call.
On the one hand, if new datasets will be created, then you would want a core set of tools able to handle all of them, right? But I can also see how new datasets might introduce cases that the scripts have not previously handled... although it seems like, in an ideal world, you would want scripts written in a general way so they can handle both future data and the existing datasets?