DVC error on Google Colab - dvc.scm.CloneError: Failed to clone repo

Posted on 01-12 02:13

I'm having a problem trying to run "dvc pull" on Google Colab. I have two repositories (let's call them A and B): repository A holds my machine learning code and repository B holds my dataset.

I've successfully pushed my dataset to repository B with DVC (using gdrive as my remote storage) and I also managed to successfully run "dvc import" (as well as "dvc pull/update") on my local project of repository A.

The problem comes when I use colab to run my project. So what I did was the following:

  1. Created a new notebook on colab
  2. Successfully git-cloned my machine learning project (repository A)
  3. Ran "!pip install dvc"
  4. Ran "!dvc pull -v" (This is what causes the error)

At step 4, I got the error below. This is the full stack trace; note that I changed the repo URL in it for confidentiality reasons.

2022-03-08 08:53:31,863 DEBUG: Adding '/content/<my_project_A>/.dvc/config.local' to gitignore file.
2022-03-08 08:53:31,866 DEBUG: Adding '/content/<my_project_A>/.dvc/tmp' to gitignore file.
2022-03-08 08:53:31,866 DEBUG: Adding '/content/<my_project_A>/.dvc/cache' to gitignore file.
2022-03-08 08:53:31,916 DEBUG: Creating external repo https://gitlab.com/<my-dataset-repo-B>.git@3a3f2019efabff8ec71429da39b86688d1c98e75
2022-03-08 08:53:31,916 DEBUG: erepo: git clone 'https://gitlab.com/<my-dataset-repo-B>.git' to a temporary dir
Everything is up to date.
2022-03-08 08:53:32,154 ERROR: failed to pull data from the cloud - Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/backend/gitpython.py", line 185, in clone
    tmp_repo = clone_from()
  File "/usr/local/lib/python3.7/dist-packages/git/repo/base.py", line 1148, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/git/repo/base.py", line 1079, in _clone
    finalize_process, decode_streams=False)
  File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 176, in handle_process_output
    return finalizer(process)
  File "/usr/local/lib/python3.7/dist-packages/git/util.py", line 386, in finalize_process
    proc.wait(**kwargs)
  File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 502, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v --no-single-branch --progress https://gitlab.com/<my-dataset-repo-B>.git /tmp/tmp2x6y9z0edvc-clone

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/dvc/scm.py", line 104, in clone
    return Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/__init__.py", line 121, in clone
    backend.clone(url, to_path, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/backend/gitpython.py", line 190, in clone
    raise CloneError(url, to_path) from exc
scmrepo.exceptions.CloneError: Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/dvc/command/data_sync.py", line 41, in run
    glob=self.args.glob,
  File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/dvc/repo/pull.py", line 38, in pull
    run_cache=run_cache,
  File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/dvc/repo/fetch.py", line 50, in fetch
    revs=revs,
  File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 437, in used_objs
    with_deps=with_deps,
  File "/usr/local/lib/python3.7/dist-packages/dvc/repo/index.py", line 190, in used_objs
    filter_info=filter_info,
  File "/usr/local/lib/python3.7/dist-packages/dvc/stage/__init__.py", line 660, in get_used_objs
    for odb, objs in out.get_used_objs(*args, **kwargs).items():
  File "/usr/local/lib/python3.7/dist-packages/dvc/output.py", line 918, in get_used_objs
    return self.get_used_external(**kwargs)
  File "/usr/local/lib/python3.7/dist-packages/dvc/output.py", line 973, in get_used_external
    return dep.get_used_objs(**kwargs)
  File "/usr/local/lib/python3.7/dist-packages/dvc/dependency/repo.py", line 94, in get_used_objs
    used, _ = self._get_used_and_obj(**kwargs)
  File "/usr/local/lib/python3.7/dist-packages/dvc/dependency/repo.py", line 108, in _get_used_and_obj
    locked=locked, cache_dir=local_odb.cache_dir
  File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 35, in external_repo
    path = _cached_clone(url, rev, for_write=for_write)
  File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 155, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
  File "/usr/local/lib/python3.7/dist-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/usr/local/lib/python3.7/dist-packages/funcy/flow.py", line 274, in wrap_with
    return call()
  File "/usr/local/lib/python3.7/dist-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 220, in _clone_default_branch
    git = clone(url, clone_path)
  File "/usr/local/lib/python3.7/dist-packages/dvc/scm.py", line 106, in clone
    raise CloneError(str(exc))
dvc.scm.CloneError: Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
------------------------------------------------------------
2022-03-08 08:53:32,161 DEBUG: Analytics is enabled.
2022-03-08 08:53:32,192 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp4x5js0dk']'
2022-03-08 08:53:32,193 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp6x11s0dk']'

By the way, this is how I cloned my git repository (repo A):

!git config --global user.name "Zharfan"
!git config --global user.email "[email protected]"
!git clone https://<MyTokenName>:<MyToken>@link-to-my-repo-A.git

Does anyone know why? Any help would be greatly appreciated. Thank you in advance!


Comments (1)

我喜欢麦丽素 2025-01-19 02:13:36

To summarize the discussion in the comments thread.

Most likely this happens because DVC cannot access a private repo on GitLab. (The error message is obscure and should be fixed.)

In the same way, you would not be able to run:

!git clone https://gitlab.com/org/<private-repo>

It also returns a pretty obscure error:

Cloning into '<private-repo>'...
fatal: could not read Username for 'https://gitlab.com': No such device or address

(I think it's related to how the tty is set up in Colab?)
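
To confirm that missing credentials are the cause, one option is to ask git to list the remote's branches with prompting disabled. The sketch below is only an illustration, not part of DVC: the helper name `can_clone_without_prompt` is hypothetical, and it assumes `git` is on the PATH. It relies on git's `GIT_TERMINAL_PROMPT=0` environment variable, which makes git fail right away instead of trying to ask for a username.

```python
import os
import subprocess


def can_clone_without_prompt(url: str, timeout: float = 30.0) -> bool:
    """Return True if `git ls-remote` can read `url` without asking for credentials.

    Hypothetical helper for illustration; it is not a DVC or git API.
    """
    # GIT_TERMINAL_PROMPT=0 tells git to fail instead of prompting for a username.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", url],
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time.
        return False
    # A non-zero exit code means git could not read the remote.
    return result.returncode == 0
```

If this returns False for the URL of repository B but True for the token-bearing URL used for repository A, then "dvc pull" is most likely failing for the same reason the plain "git clone" does.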

The best approach to solve this is to use SSH, as described here for example.
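
One common way to switch to SSH without editing the imported URL is git's `url.<base>.insteadOf` setting, which rewrites matching HTTPS URLs to SSH at clone time. The sketch below is not necessarily what the linked guide does. The helper names `https_to_ssh` and `insteadof_command` are hypothetical, and it assumes an SSH key that GitLab accepts is already available in the Colab session.

```python
from urllib.parse import urlparse


def https_to_ssh(url: str) -> str:
    """Convert https://host/path(.git) to the scp-like SSH form git@host:path.git.

    Hypothetical helper; any user:token part of the URL is dropped.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"not an HTTPS URL: {url!r}")
    path = parsed.path.strip("/")
    if not path.endswith(".git"):
        path += ".git"
    return f"git@{parsed.hostname}:{path}"


def insteadof_command(host: str) -> list:
    """Build the `git config` call that rewrites HTTPS URLs for `host` to SSH."""
    return [
        "git", "config", "--global",
        f"url.git@{host}:.insteadOf",  # key has the form url.<base>.insteadOf
        f"https://{host}/",
    ]


if __name__ == "__main__":
    # Show the SSH form of an example URL and the config command for gitlab.com.
    print(https_to_ssh("https://gitlab.com/org/private-repo"))
    print(" ".join(insteadof_command("gitlab.com")))
```

Once the printed `git config` command has been run in the notebook, git fetches any `https://gitlab.com/...` URL over SSH, including clones that DVC starts for imported repositories.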
