DVC error on Google Colab - dvc.scm.CloneError: Failed to clone repo
I'm having a problem trying to run "dvc pull" on Google Colab. I have two repositories (let's call them A and B), where repository A holds my machine learning code and repository B holds my dataset.
I've successfully pushed my dataset to repository B with DVC (using gdrive as my remote storage) and I also managed to successfully run "dvc import" (as well as "dvc pull/update") on my local project of repository A.
The problem comes when I use colab to run my project. So what I did was the following:
- Created a new notebook on colab
- Successfully git-cloned my machine learning project (repository A)
- Ran "!pip install dvc"
- Ran "!dvc pull -v" (This is what causes the error)
In step 4, I got the error below. This is the full stack trace; note that I changed the repo URL in it for confidentiality reasons.
2022-03-08 08:53:31,863 DEBUG: Adding '/content/<my_project_A>/.dvc/config.local' to gitignore file.
2022-03-08 08:53:31,866 DEBUG: Adding '/content/<my_project_A>/.dvc/tmp' to gitignore file.
2022-03-08 08:53:31,866 DEBUG: Adding '/content/<my_project_A>/.dvc/cache' to gitignore file.
2022-03-08 08:53:31,916 DEBUG: Creating external repo https://gitlab.com/<my-dataset-repo-B>.git@3a3f2019efabff8ec71429da39b86688d1c98e75
2022-03-08 08:53:31,916 DEBUG: erepo: git clone 'https://gitlab.com/<my-dataset-repo-B>.git' to a temporary dir
Everything is up to date.
2022-03-08 08:53:32,154 ERROR: failed to pull data from the cloud - Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/backend/gitpython.py", line 185, in clone
tmp_repo = clone_from()
File "/usr/local/lib/python3.7/dist-packages/git/repo/base.py", line 1148, in clone_from
return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/git/repo/base.py", line 1079, in _clone
finalize_process, decode_streams=False)
File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 176, in handle_process_output
return finalizer(process)
File "/usr/local/lib/python3.7/dist-packages/git/util.py", line 386, in finalize_process
proc.wait(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 502, in wait
raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git clone -v --no-single-branch --progress https://gitlab.com/<my-dataset-repo-B>.git /tmp/tmp2x6y9z0edvc-clone
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/dvc/scm.py", line 104, in clone
return Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/__init__.py", line 121, in clone
backend.clone(url, to_path, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/backend/gitpython.py", line 190, in clone
raise CloneError(url, to_path) from exc
scmrepo.exceptions.CloneError: Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/dvc/command/data_sync.py", line 41, in run
glob=self.args.glob,
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/pull.py", line 38, in pull
run_cache=run_cache,
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/fetch.py", line 50, in fetch
revs=revs,
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 437, in used_objs
with_deps=with_deps,
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/index.py", line 190, in used_objs
filter_info=filter_info,
File "/usr/local/lib/python3.7/dist-packages/dvc/stage/__init__.py", line 660, in get_used_objs
for odb, objs in out.get_used_objs(*args, **kwargs).items():
File "/usr/local/lib/python3.7/dist-packages/dvc/output.py", line 918, in get_used_objs
return self.get_used_external(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/output.py", line 973, in get_used_external
return dep.get_used_objs(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/dependency/repo.py", line 94, in get_used_objs
used, _ = self._get_used_and_obj(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/dependency/repo.py", line 108, in _get_used_and_obj
locked=locked, cache_dir=local_odb.cache_dir
File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 35, in external_repo
path = _cached_clone(url, rev, for_write=for_write)
File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 155, in _cached_clone
clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
File "/usr/local/lib/python3.7/dist-packages/funcy/decorators.py", line 45, in wrapper
return deco(call, *dargs, **dkwargs)
File "/usr/local/lib/python3.7/dist-packages/funcy/flow.py", line 274, in wrap_with
return call()
File "/usr/local/lib/python3.7/dist-packages/funcy/decorators.py", line 66, in __call__
return self._func(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 220, in _clone_default_branch
git = clone(url, clone_path)
File "/usr/local/lib/python3.7/dist-packages/dvc/scm.py", line 106, in clone
raise CloneError(str(exc))
dvc.scm.CloneError: Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
------------------------------------------------------------
2022-03-08 08:53:32,161 DEBUG: Analytics is enabled.
2022-03-08 08:53:32,192 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp4x5js0dk']'
2022-03-08 08:53:32,193 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp6x11s0dk']'
And by the way, this is how I cloned my git repository (repo A):
!git config --global user.name "Zharfan"
!git config --global user.email "[email protected]"
!git clone https://<MyTokenName>:<MyToken>@link-to-my-repo-A.git
Does anyone know why? Any help would be greatly appreciated. Thank you in advance!

To summarize the discussion in the comments:
Most likely this happens because DVC can't get access to the private repo on GitLab. (The error message is obscure and should be fixed.)
In the same way, you would not be able to run a plain "git clone" of the private repo's HTTPS URL from a Colab cell without credentials. That also fails with a pretty obscure error. (I think it's related to how the tty is set up in Colab.)
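One way to confirm this diagnosis from a notebook cell is to ask git for the remote's branch list with prompting disabled. The sketch below is an illustration and not part of the original discussion: the helper names build_probe and can_reach are hypothetical, and it assumes the git CLI is on the PATH (the "cmdline:" entry in the traceback suggests it is). GIT_TERMINAL_PROMPT=0 is a standard git environment variable that makes git fail instead of asking for a username.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_probe(url: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the command and environment for a non-interactive access check."""
    env = dict(os.environ)
    # With prompting disabled, git exits with an error instead of trying to
    # read credentials from a terminal that Colab may not provide.
    env["GIT_TERMINAL_PROMPT"] = "0"
    # ls-remote only lists refs, so nothing is cloned or written to disk.
    return ["git", "ls-remote", "--heads", url], env


def can_reach(url: str) -> bool:
    """Return True if git can list the remote's branches without prompting."""
    cmd, env = build_probe(url)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    if result.returncode != 0:
        # Print git's own message, which DVC's CloneError does not surface.
        print(result.stderr.strip())
    return result.returncode == 0
```

Calling can_reach("https://gitlab.com/<my-dataset-repo-B>.git") before "dvc pull" should return False in the situation described here, and git's stderr shows the underlying reason.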
The best way to solve this is to use SSH, as described here for example.
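A minimal sketch of what an SSH setup could look like in Colab, under several assumptions that are not from the original thread: an SSH key pair already exists with its public half registered on GitLab, the helper names are hypothetical, and DVC clones through the git CLI (as the "cmdline:" entry in the traceback suggests), so a global insteadOf rule can redirect the HTTPS URL recorded by "dvc import" to SSH.

```python
import os
from pathlib import Path
from typing import List


def install_private_key(key_text: str, ssh_dir: Path, name: str = "id_ed25519") -> Path:
    """Write a private key into ssh_dir with owner-only permissions."""
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    key_path = ssh_dir / name
    # Create the file with restrictive permissions before writing the secret.
    key_path.touch(mode=0o600, exist_ok=True)
    key_path.write_text(key_text if key_text.endswith("\n") else key_text + "\n")
    # ssh refuses private keys that other users can read, so enforce 0600.
    os.chmod(key_path, 0o600)
    return key_path


def rewrite_https_to_ssh(host: str = "gitlab.com") -> List[str]:
    """Build the git config command that maps https://<host>/ URLs to SSH."""
    return [
        "git", "config", "--global",
        f"url.git@{host}:.insteadOf",
        f"https://{host}/",
    ]
```

In a notebook this might be used as install_private_key(key, Path.home() / ".ssh"), followed by "!ssh-keyscan gitlab.com >> ~/.ssh/known_hosts", then subprocess.run(rewrite_https_to_ssh(), check=True), and finally "!dvc pull". Note that ssh-keyscan trusts whatever key the host presents, so it is worth comparing the result against the fingerprints GitLab publishes.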