How do I add a single file using DVC?
Suppose I run the following commands:
# set up DVC
mkdir foo
cd foo && git init
dvc init
git add * && git commit -m "dvc init"
# make a data file
mkdir -p bar/biz
touch bar/biz/boz
# add the data file
dvc add bar/biz/boz
And DVC outputs the following:
To track the changes with git, run:
git add bar/biz/.gitignore bar/biz/boz.dvc
This last part is what I would like to avoid. Preferably, DVC would only change the top-level .gitignore (located at the project root, where git init was executed) and would only create .dvc files at the top level.
And here's why:
I have a rather large dataset developed in an original work more or less ad-hoc. This data is not systematically organized, nor do I want to organize it as-is.
Instead, I want to incrementally add this old, bespoke data to the DVC directory tree. And each time I add some of the data to the tree, I want to check it in with DVC as I would if I were modifying code or mixing one project's code into another.
However, DVC wants to create a local file and gitignore at every location I add. This creates a mess and I have no reasonable faith that it will be easy to maintain all of these atomic and distributed datastores.
The question:
What is the preferred way to incrementally add data in DVC so that DVC uses the root gitignore and root DVC files/items?
Comments (1)
Assuming bar/ is the dataset directory you're incrementally adding to, you can instead track the whole directory:
dvc add bar
This creates a bar.dvc file and writes to .gitignore at the top level.
When you update content in bar/, dvc add it again or use dvc commit to register the new dataset version. The new files get added to the project cache, and the .dvc file gets an updated md5 hash that identifies the latest directory structure.

Some docs:
https://dvc.org/doc/start/data-management#making-changes
https://dvc.org/doc/command-reference/add
https://dvc.org/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory
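
For illustration, the directory-level workflow described above might look roughly like the sketch below. It assumes git and dvc are installed, runs in a throwaway temporary directory, and uses a hypothetical project name (demo) and a hypothetical second data file (bar/biz/baz) that are not part of the original question.

```shell
# Sketch only: work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)"
mkdir demo && cd demo        # "demo" is a hypothetical project name
git init -q
dvc init -q

# Make a data file, as in the question.
mkdir -p bar/biz
touch bar/biz/boz

# Track the whole directory instead of the single file.
# This writes bar.dvc and .gitignore next to bar/, i.e. at the project root.
dvc add -q bar
git add bar.dvc .gitignore

# Later: add more data (hypothetical file) and register the new version.
echo "more data" > bar/biz/baz
dvc add -q bar               # alternatively: dvc commit bar.dvc
git add bar.dvc

# Inspect the result: one .dvc file and one .gitignore entry at the root.
cat bar.dvc
cat .gitignore
```

After each dvc add, commit bar.dvc and .gitignore with git as usual. No .gitignore or .dvc files should appear under bar/biz in this layout.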