Does distributed version control really have no central repository?
It might seem a silly question, but how do you get a working directory set up without a server to check out from? And how does a business keep a safe, backed-up copy of the repo?
I assume, then, that there must be a central repo... but then how exactly is it 'distributed'? I always thought of the distinction as server-client (SVN) vs. peer-to-peer (Git), but I don't believe that can be correct unless tools like Git depend on torrent-style technology?
There is no enforced central repository - it is only by convention. Most projects do have a central repository, but each repository is equal in the sense that they have the full history, and can push and pull patches between each other.
One way to think of it is that a centralised VCS is fixed in a star topology: one central hub acts as the server with the complete repository, and one or more clients hang off it. The clients typically have only a copy of the most recent clean checkout and limited history (if any), so most operations require a round-trip to the server. Branching is achieved by creating branches within that one repository.
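To make the star topology concrete, here is a minimal Python sketch of a centralised model. CentralServer and Client are hypothetical toy classes, not SVN's API: only the server stores history, and every commit or update is a round-trip, so a client simply cannot commit while the server is unreachable.

```python
class CentralServer:
    """Toy central server: the only place where history is stored."""

    def __init__(self):
        self.history = []   # every revision ever committed, oldest first
        self.online = True  # set to False to simulate a network outage

    def _require_connection(self):
        if not self.online:
            raise ConnectionError("central server unreachable")

    def commit(self, snapshot):
        self._require_connection()
        self.history.append(dict(snapshot))
        return len(self.history)  # revision numbers start at 1

    def log(self):
        self._require_connection()
        return list(self.history)


class Client:
    """Toy client: holds one checked-out snapshot and no history of its own."""

    def __init__(self, server):
        self.server = server
        self.working_copy = {}

    def update(self):
        history = self.server.log()  # round-trip to the server
        self.working_copy = dict(history[-1]) if history else {}

    def commit(self, snapshot):
        revision = self.server.commit(snapshot)  # round-trip to the server
        self.working_copy = dict(snapshot)
        return revision


if __name__ == "__main__":
    server = CentralServer()
    alice = Client(server)
    print(alice.commit({"README": "hello"}))  # 1
    server.online = False
    try:
        alice.commit({"README": "hello, world"})
    except ConnectionError as err:
        print("commit failed:", err)  # no server, no commit
```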
In a distributed VCS, there is no limit on the topology of your network. In theory you can have any shape you like: a separate repository per team or sub-project with staged commits, a stable repository and an unstable one, lots of feature branches, and so on. And there is no client/server distinction; all nodes are equal. Each repository is self-contained and complete, and can push changes to and/or pull changes from any other. To get started, you clone an existing repository (make your own copy to work from) and start making changes. Once you make your first commit, you effectively have a branch. Fortunately, it is usually very easy to merge your changes back when you're done.
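For contrast, a minimal sketch of the distributed model under the same toy assumptions. The Repo class and its methods are hypothetical, and commit ids here are a simplified hash of parents and message rather than of the full content. Every clone carries the whole commit graph, commits are purely local, and any repository can fetch from any other, with no server involved.

```python
import hashlib


class Repo:
    """Toy DVCS repository: every copy holds the complete commit graph."""

    def __init__(self):
        self.commits = {}  # commit id -> (tuple of parent ids, message)
        self.head = None   # id of the latest commit on our branch

    def commit(self, message):
        """Record a commit locally; no other repository is contacted."""
        parents = (self.head,) if self.head is not None else ()
        # Simplification: real systems hash the snapshot, author, date, etc.
        cid = hashlib.sha1(repr((parents, message)).encode()).hexdigest()[:8]
        self.commits[cid] = (parents, message)
        self.head = cid
        return cid

    def clone(self):
        """Make an independent, complete copy, history included."""
        copy = Repo()
        copy.commits = dict(self.commits)
        copy.head = self.head
        return copy

    def fetch_from(self, other):
        """Copy in any commits we lack from another peer; return their ids.

        Our own head is left alone: integrating them is a separate merge step.
        """
        new = {cid: c for cid, c in other.commits.items() if cid not in self.commits}
        self.commits.update(new)
        return sorted(new)


if __name__ == "__main__":
    origin = Repo()
    origin.commit("initial import")
    mine, theirs = origin.clone(), origin.clone()
    feature = mine.commit("add feature")         # purely local, works offline
    print(theirs.fetch_from(mine) == [feature])  # True: peers trade commits directly
```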
But what normally happens is that you have one repository on a central server, which makes it easier for people to get started and to keep track of where the latest changes are.
Your repository has to start somewhere with a source tree. So there is always a first repository, with the initial series of checkins. Let's say you want to work on Murky. You would clone the repository, which gives you a complete repository of your own, with all the history and checkins. You make some changes (thus creating a branch), and when you're done, you push your changes back, where they get merged. Both systems are acting as peers, and they push and pull changesets between each other.
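When you push your changes back and they get merged, the merge starts from the most recent commit that both histories have in common. Here is a sketch of finding those "best common ancestors" over a hypothetical parent map (commit id to a tuple of parent ids). It follows the definition documented for git merge-base (a common ancestor that is not itself an ancestor of another common ancestor), though Git's real implementation is far more efficient.

```python
from collections import deque


def ancestors(parents, start):
    """Every commit reachable from `start` (inclusive) via parent links."""
    seen, queue = set(), deque([start])
    while queue:
        commit = queue.popleft()
        if commit not in seen:
            seen.add(commit)
            queue.extend(parents[commit])
    return seen


def merge_bases(parents, a, b):
    """Best common ancestors of a and b: shared ancestors that are not
    themselves ancestors of another shared ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    older = set()
    for commit in common:
        older |= ancestors(parents, commit) - {commit}
    return common - older


if __name__ == "__main__":
    # Hypothetical history: both sides start from A and B; you commit C
    # in your clone while someone else pushes D to the central repository.
    history = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",)}
    print(merge_bases(history, "C", "D"))  # {'B'}
```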
Both Mercurial and Git keep the repository in a hidden subdirectory, so the one directory tree contains both your working copy (which can be in whatever state you like), and the repo itself.
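Because the repository lives in a hidden subdirectory (.git for Git, .hg for Mercurial), tools find it by walking up from the current directory until one turns up. A small Python sketch of that lookup, assuming only those two marker names; find_repo_root is a hypothetical helper, not part of either tool, and real Git also honours settings such as GIT_DIR, which this ignores.

```python
from pathlib import Path

# Hidden metadata directories used by Git and Mercurial.
REPO_MARKERS = (".git", ".hg")


def find_repo_root(start):
    """Walk up from `start` to the nearest directory holding .git or .hg.

    Returns (root_path, marker), or None if `start` is not inside a repo.
    """
    path = Path(start).resolve()
    for candidate in (path, *path.parents):
        for marker in REPO_MARKERS:
            # exists() also accepts a .git *file*, as used by Git worktrees
            if (candidate / marker).exists():
                return candidate, marker
    return None
```

Everything above the root that is not inside the marker directory is your working copy, in whatever state you left it.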
As above, you simply have a nominated master repository which has all the latest merged changes, and back it up like anything else. You can even have multiple backup repos, or have automated clones on physically separate boxes. In some ways, backing up is easier.
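One way to automate the "clones on physically separate boxes" idea is a scheduled job that maintains a mirror of the master repository. This sketch assumes Git and relies on two real commands, git clone --mirror for the first run and git remote update --prune afterwards; the helper names, URL and paths are hypothetical placeholders.

```python
import subprocess
from pathlib import Path


def backup_command(source_url, dest_dir):
    """Return the git command that creates or refreshes a mirror backup."""
    dest = Path(dest_dir)
    if dest.exists():
        # Existing mirror: fetch all refs, pruning ones deleted upstream.
        return ["git", "-C", str(dest), "remote", "update", "--prune"]
    # First run: a bare mirror clone copies every branch, tag and commit.
    return ["git", "clone", "--mirror", source_url, str(dest)]


def backup(source_url, dest_dir):
    """Run the backup; raises CalledProcessError if git fails."""
    subprocess.run(backup_command(source_url, dest_dir), check=True)
```

A nightly cron job on a separate machine might call backup("ssh://git.example.com/project.git", "/srv/backups/project.git"), with both arguments replaced by your own server and path. Since every mirror is a complete repository, restoring is just cloning from it.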
It is not distributed in the sense that different clients have different parts, like peer-to-peer file sharing. It is really just in contrast to the centralised model.
All DVCS repositories are first-class citizens. It becomes a social or managerial question of how to arrange them, rather than a technical issue.
Re: 'torrent-style technology': you're confusing two issues, one of network topology (peer-to-peer vs. server/client) and one of server authority. This is understandable, because the terms are almost identical. But nothing about distributed source control places any requirements on the network connection model; you could distribute changesets by email if you prefer. The important thing with distributed version control is that each person essentially runs their own server and merges in changes from the other servers. Of course, you need to get your initial clone from somewhere, and how you know where that 'somewhere' is falls outside the scope of the system itself. There is no 'tracker' program or anything like it; typically someone has a public repository somewhere, with its address published on a website. But once you've cloned it, your copy is a full one that can serve as the basis for someone else's clone.
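To illustrate that the transport is irrelevant, here is a toy exchange in which commits are just data. The sender serialises whatever the recipient is missing into plain text, which could travel by email, USB stick or HTTP, and the recipient checks that it knows every parent before accepting them. The JSON format and function names are hypothetical; Git's own text-based route is git format-patch with git am, and git bundle packs history into a single file.

```python
import json


def export_changesets(commits, recipient_has):
    """Serialise the commits the recipient lacks into a plain-text payload.

    `commits` maps commit id -> {"parents": [...], "message": "..."}.
    """
    missing = {cid: c for cid, c in commits.items() if cid not in recipient_has}
    return json.dumps(missing, sort_keys=True, indent=2)


def import_changesets(commits, payload):
    """Add commits from a payload, refusing any whose parents are unknown."""
    incoming = json.loads(payload)
    known = set(commits) | set(incoming)
    for cid, c in incoming.items():
        for parent in c["parents"]:
            if parent not in known:
                raise ValueError(f"{cid} needs parent {parent}, which we lack")
    commits.update(incoming)
    return sorted(incoming)
```

Neither function cares how the payload moves between machines, which is the point: the history, not the network, is what gets distributed.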
There's an important distinction to make here: is there a technical central server, or is there one by convention.
Technically, all clones of a git repository are equivalent. All of them allow changes, check-ins, branches, and merging with each other. There's no single repository that is somehow "more true" than any other.
By social convention most projects using git have a central repository that's considered the authoritative repository, representing the official state of the project.
Compare that with a more traditional VCS such as SVN: here the central repository is technically very different from the local checkout that each developer may have. The local check-out can only do VCS operations in relation to the central repository. Without the central repository, the developer can't commit.
With distributed version control, you can have a complete copy of the entire history (the entire repository) embedded as part of your local (checked out) copy.
In addition, most projects will have some central repository that will also have a copy of everything. This means that at some point you will need to push your changes from your local repository to the central one. But it also means that you can work locally to your heart's content, and then only push the changes that you want to push, and you only need to push them when you are ready.
For example, look at the Linux kernel: lots of people will "clone" (check out) a kernel tree from somewhere. It might be Linus's tree, or it might be one of the other trees floating around kernel.org or the internet. But Linus's tree exists both on kernel.org and (presumably) on Linus's computer(s), as well as on the computers of anyone else who has pulled from there.
Joel's latest blog post described the advantages (and major difference from systems like Subversion) of a DVCS best:
So you put a copy of the tree on some central server somewhere that other people can pull from (or on your private server so you have a backup) and when you feel like it, you push some bits across to there. Then if someone wants a copy, they can clone from there.
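The "push only when you feel like it" part comes down to reachability: the commits still waiting to be pushed are those reachable from your local branch tip but not from the last known tip of the central repository, roughly what git log origin/main..main lists in Git (the branch names are assumptions). A Python sketch over a hypothetical parent map:

```python
def reachable(parents, head):
    """All commits reachable from `head` (inclusive) via parent links."""
    seen = set()
    stack = [head] if head is not None else []
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def unpushed(parents, local_head, remote_head):
    """Local commits the central repository has not received yet.

    `remote_head` is None if nothing has ever been pushed.
    """
    return reachable(parents, local_head) - reachable(parents, remote_head)


if __name__ == "__main__":
    history = {"A": (), "B": ("A",), "C": ("B",), "D": ("C",)}
    print(sorted(unpushed(history, "D", "B")))  # ['C', 'D']
```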
Technically, a DVCS doesn't need a centralised repository.
In real life, an application (e.g. the Linux kernel) must be built from a single agreed collection of sources before being delivered.
In this way, DVCS tools don't impose any source management policy; they leave such decisions to project managers.
The "peer-to-peer" analogy actually refers to how you get changes from another repo.
Since you can fetch from any other "peer" repository (a repo that shares the same first commit), you could consider this a peer-to-peer model, since:
Technically you don't need a central server: you can just exchange commits with your peers and that's it.
Logically (just take a look at github.com), there will ALWAYS be (at least) one central repository, some sort of "master copy" you have to rely on. I guess that for the Linux kernel, Linus's repo is the master one from which changes are ultimately accepted, isn't it?
I think this will be especially true for companies embracing DVCS: they won't rely on developers' "copies" but on centralized ones, although, obviously, there could be MORE than one copy (which is also very good for avoiding disaster :-P, and happens rather naturally with DVCS).
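The answers above treat repositories that share history as peers. Here is a sketch of the "shares the same first commit" test mentioned earlier, over hypothetical parent maps (commit id to tuple of parent ids), where root commits are those with no parents. It only models that working definition: Git itself will fetch from an unrelated repository, and merging such histories requires git merge --allow-unrelated-histories.

```python
def root_commits(parents):
    """Commits with no parents: the 'first commits' of a history."""
    return {commit for commit, ps in parents.items() if not ps}


def share_history(repo_a, repo_b):
    """True if the two commit graphs have at least one first commit in common."""
    return bool(root_commits(repo_a) & root_commits(repo_b))


if __name__ == "__main__":
    upstream = {"r": (), "a": ("r",)}
    fork = {"r": (), "a": ("r",), "f": ("a",)}
    unrelated = {"x": ()}
    print(share_history(upstream, fork))       # True
    print(share_history(upstream, unrelated))  # False
```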