How do I add a single file with DVC?

Posted 2025-02-07 13:17:37

Suppose I run the following commands:

# set up DVC
mkdir foo
cd foo && git init
dvc init
git add * && git commit -m "dvc init"

# make a data file
mkdir -p bar/biz
touch bar/biz/boz

# add the data file
dvc add bar/biz/boz

And DVC outputs the following:

To track the changes with git, run:

  git add bar/biz/.gitignore bar/biz/boz.dvc

This last part is what I would like to avoid. Preferably, DVC would only change the top-level .gitignore (located at the project root, where git init was executed) and would only create or change .dvc files at the top level.

And here's why:

I have a rather large dataset developed in an original work more or less ad-hoc. This data is not systematically organized, nor do I want to organize it as-is.

Instead, I want to incrementally add this old, bespoke data to the DVC directory tree. And each time I add some of the data to the tree, I want to check it in with DVC as I would if I were modifying code or mixing one project's code into another.

However, DVC wants to create a local .dvc file and a .gitignore in every directory I add data to. This creates a mess, and I have no reasonable faith that all of these atomic, distributed datastores will be easy to maintain.


The question:

What is the preferred way to incrementally add data in DVC so that DVC uses the root gitignore and root DVC files/items?

Comments (1)

Posted 2025-02-14 13:17:37

Assuming bar/ is the dataset directory you're incrementally adding to, you can instead run:

dvc add bar

This creates a bar.dvc file and writes to .gitignore at the top level.

When you update content in bar/, run dvc add on it again or use dvc commit to register the new dataset version. The new files get added to the project cache, and the .dvc file gets an updated md5 hash that identifies the latest directory structure.

Some docs:
https://dvc.org/doc/start/data-management#making-changes
https://dvc.org/doc/command-reference/add
https://dvc.org/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory
