Distributed version control systems and the enterprise: a good combination?

Published 2024-11-01 11:27:55

Answers (9)

爱已欠费 2024-11-08 11:27:55

I have just introduced a DVCS (Git in this case) in a large banking company, where Perforce, SVN or ClearCase was the centralized VCS of choice:
I already knew of the challenges (see my previous answer "Can we finally move to DVCS in Corporate Software? Is SVN still a 'must have' for development?")

I have been challenged on three fronts:

  • centralization: while the decentralized model has its merits (and allows for private commits or working without the network while having access to the full history), there still needs to be a clear set of centralized repos, acting as the main reference for all developers.

  • authentication: a DVCS allows you to "sign-off" (commit) your code as... pretty much anyone (author "foo", email "[email protected]").
You can do a git config user.name foo, or git config user.name whateverNameIFeelToHave, and have all your commits carry bogus names.
    That doesn't mix well with the unique centralized "Active Directory" user referential used by big enterprises.

  • authorization: by default, you can clone, push to or pull from any repository, and modify any branch, or any directory.
    For sensitive projects, that can be a blocking issue (the banking world is usually very protective of some pricing or quants algorithms, which require strict read/write access for a very limited number of people)

The answer (for a Git setup) was:

  • centralization: a single server has been set up for any repository that has to be accessible by all users.
    Backups are taken care of (incremental every day, full every week).
    DRP (Disaster Recovery Plan) has been implemented, with a second server on another site, and with real-time data replication through SRDF.
    This setup in itself is independent of the type of referential or tool you need (DVCS, or Nexus repo, or main Hudson scheduler, or...): any tool which can be critical for a release into production needs to be installed on servers with backup and DR.

  • authentication: only two protocols allow users to access the main repos:
    • ssh based, with public/private key:
      • useful for users external to the organization (like off-shore development),
      • and useful for generic accounts that Active Directory managers don't want to create (because it would be an "anonymous" account): a real person has to be responsible for that generic account, and that would be the one owning the private key
    • https-based, with Apache authenticating the users through an LDAP setting: that way, an actual login must be provided for any git operation on those repos.
      Git offers it with its smart http protocol, allowing not just pull (read) through http, but also push (write) through http.
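
      A minimal sketch of what such an Apache front end could look like, assuming git-http-backend and mod_authnz_ldap (the paths, host name and LDAP base DN below are illustrative, not the actual production values):

          # Expose the repositories over Git's smart HTTP protocol
          SetEnv GIT_PROJECT_ROOT /srv/git
          SetEnv GIT_HTTP_EXPORT_ALL
          ScriptAlias /git/ /usr/lib/git-core/git-http-backend/

          <Location /git/>
              # Every git operation must present credentials checked against LDAP / Active Directory
              AuthType Basic
              AuthName "Git repositories"
              AuthBasicProvider ldap
              AuthLDAPURL "ldap://ldap.example.com/ou=people,dc=example,dc=com?uid"
              Require valid-user
          </Location>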

The authentication part is also reinforced at the Git level by a post-receive hook which makes sure that at least one of the commits you are pushing to a repo has a "committer name" equal to the user name detected through the ssh or https protocol.
In other words, you need to set up your git config user.name correctly, or any push you want to make to a central repo will be rejected.
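
A minimal sketch of that kind of check (shown here as a pre-receive hook so the push can actually be refused; gitolite exposes the authenticated user in the GL_USER environment variable, and all names and messages below are illustrative, not the production script):

    #!/bin/sh
    # pre-receive hook: one "<old-sha> <new-sha> <refname>" line per pushed ref on stdin
    while read oldrev newrev refname; do
        # For a brand-new branch there is no old revision to diff against.
        if expr "$oldrev" : '0*$' >/dev/null; then
            range="$newrev"
        else
            range="$oldrev..$newrev"
        fi

        # GL_USER holds the name authenticated through ssh or https (set by gitolite).
        if ! git log --format='%cn' "$range" | grep -qx "$GL_USER"; then
            echo "Push rejected for $refname: set git config user.name to '$GL_USER'." >&2
            exit 1
        fi
    done
    exit 0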

  • authorization: both previous settings (ssh or https) are wired to call the same set of Perl scripts, named gitolite, with the following parameters:
    • the actual username detected by those two protocols
    • the git command (clone, push or pull) that user wants to do

The gitolite perl script will parse a simple text file where the authorizations (read/write access for a whole repository, or for branches within a given repository, or even for directories within a repository) have been set.
If the access level required by the git command doesn't match the ACL defined in that file, the command is rejected.
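
As an illustration, such an authorization file could look like the fragment below (the group names, repository names and branch/directory rules are hypothetical; directory-level restrictions rely on gitolite's NAME/ virtual refs):

    # gitolite.conf fragment (illustrative)
    @quants   = alice bob
    @devs     = carol dave @quants

    repo common-libs
        RW+                 = @devs    # full read/write on the whole repository
        R                   = @all

    repo pricing-engine
        RW+ master          = @quants  # only quants may rewrite the main branch
        RW  dev/            = @devs    # other developers limited to dev/* branches
        -   VREF/NAME/algo/ = @devs    # and they may not touch the algo/ directory
        R                   = @devs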


The above describes what I needed to implement for a Git setting, but more importantly, it lists the main issues that need to be addressed for a DVCS setting to make sense in a big corporation with a unique user base.

Then, and only then, a DVCS (Git, Mercurial, ...) can add value because of:

  • data exchange between multiple sites: while those users are all authenticated through the same Active Directory, they can be located across the world (the companies I have worked for usually have development split between teams in two or three countries). A DVCS is naturally made for efficiently exchanging data between those distributed teams.

  • replication across environments: a setting taking care of authentication/authorization allows for cloning those repositories on other dedicated servers (for integration testing, UAT testing, pre-production, and pre-deployment purposes)

  • process automation: the ease with which you can clone a repo can also be used locally on one user's workstation, for unit-testing purposes with the "guarded commits" technique and other clever uses: see "What is the cleverest use of source repository that you have ever seen?".
    In short, as sketched just after this list, you can push to a second local repo in charge of:

    • various tasks (unit test or static analysis of the code)
    • pushing back to the main repo if those tasks are successful
    • while you are still working in the first repo without having to wait for the result of those tasks.
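
    A minimal sketch of the hook on that second, local "guard" repository, assuming it is a bare repo with a remote named central pointing at the main server and that ./run-tests.sh is the project's test entry point (both names are placeholders):

        #!/bin/sh
        # post-receive hook on the local "guard" repository
        while read oldrev newrev refname; do
            worktree=$(mktemp -d)
            # Materialize the pushed revision in a throw-away work tree.
            git archive "$newrev" | tar -x -C "$worktree"
            if (cd "$worktree" && ./run-tests.sh); then
                # Checks passed: forward the commits to the main repository.
                git push central "$newrev:$refname"
            else
                echo "Checks failed for $refname; not forwarded to central." >&2
            fi
            rm -rf "$worktree"
        done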

  • killer features: any DVCS comes with those, the main one being merging (ever tried to do a complex merge workflow with SVN? Or sloooowly merge 6000 files with ClearCase?).
    That alone (merging) means you can really take advantage of branching, while being able at all times to merge your code back into another "main" line of development, because you would do so (see the short command sequence after this list):

    • first locally within your own repo, without disturbing anybody
    • then on the remote server, pushing the result of that merge on the central repo.
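
    For instance, the local-merge-then-push sequence boils down to something like this (branch and remote names are illustrative):

        git checkout master                # your local copy of the "main" line
        git merge feature/pricing-model    # resolve conflicts and test locally, disturbing nobody
        git push origin master             # only then publish the merge result to the central repo
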
落花浅忆 2024-11-08 11:27:55

Absolutely a distributed source model can make sense in an enterprise, but it does depend on the structure of your teams.

Distributed source control gives you the flexibility to create your own workflows.

Imagine, if you will, a larger team, within which are smaller teams working on separate feature branches.

  • These teams can all have their own central repositories, with their own build automation/checkin control mechanisms.
  • They can work anywhere, and backup their local work whenever they so desire.
  • They can then choose what checkins they'd like to share between groups.
  • They can have a single individual integrator, working on their own machine, performing merging, without impacting others.

These are things you could achieve with a traditional centralised server, but as @Brook points out, the centralised model has to scale, whereas the distributed model is already sharded, so there is no (or at least less) need to vertically scale any servers.
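
As a hedged sketch of what that integrator role could look like in practice (repository URLs, remote names and branch names are all illustrative):

    git clone ssh://git.example.com/product/integration.git
    cd integration
    git remote add team-ui   ssh://git.example.com/product/team-ui.git
    git remote add team-core ssh://git.example.com/product/team-core.git

    git fetch --multiple team-ui team-core   # pick up what each team chose to share
    git merge team-ui/feature-dashboard      # merge on the integrator's own machine
    git merge team-core/feature-pricing
    git push origin master                   # publish only once the integration is green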

仅此而已 2024-11-08 11:27:55

To add to the other comments, I would observe that there's no reason you can't have a Corporate Central Repository. Technically it's just another repository, but it's the one you ship production from. I've been using one form or another of VCS for over 30 years and I can say that switching to Mercurial was like a city boy breathing clean country air for the first time.

七分※倦醒 2024-11-08 11:27:55

DVCSs have a better story (generally) than centralized systems for offline or slow networks. They tend to be faster, which is really noticeable for developers (using TDD) who do lots of check-ins.

Centralized systems are somewhat easier to grasp initially and might be a better choice for less experienced developers. DVCSs allow you to create lots of mini-branches and isolate new features while still following a red-green-refactor style of coding, checking in on green. Again, this is very powerful but only attractive to fairly savvy development teams.

Having a single central repository with support for exclusive locks makes sense if you deal with files that are not mergeable, like digital assets and non-text documents (PDFs, Word, etc.), as it prevents you from getting into a mess and having to merge manually.

I don't think the number of developers or codebase size plays into it that much; both systems have been shown to support large source trees and numbers of committers. However, for large code bases and projects a DVCS gives a lot of flexibility in quickly creating decentralized remote branches. You can do this with centralized systems, but you need to be more deliberate about it, which is both good and bad.

In short there are some technical aspects to consider but you should also think about the maturity of your team and their current process around SCCS.

归途 2024-11-08 11:27:55

At least with TFS 2013 you do have the ability to work disconnected, using local workspaces. Distributed vs. centralized is defined by the business and depends on the needs and requirements of the projects under development.

For enterprise projects the ability to connect workflow and documents to code changes can be critical in connecting business requirements and higher order elements to specific code changes that address a specific change, bug or feature addition.

This connection between workflow and code repository separates TFS from code-repository-only solutions. In places where a higher order of project auditing is required, only a product like TFS will satisfy more of the project auditing requirements.

An overview of the application lifecycle management process can be found here.

http://msdn.microsoft.com/en-us/library/vstudio/fda2bad5(v=vs.110).aspx

染年凉城似染瑾 2024-11-08 11:27:55

The biggest issue we face with Git in an enterprise setting is the lack of path-based read-access control. It is inherent in Git's architecture (and, I would assume, most DVCSs) that if you have read access to a repository you get the whole thing. But sometimes a project requires a sparse checkout (i.e. you want to version-control sensitive data close to the source, or you want to give a third party a selective view of part of the project).

Out of the box, Git provides no permissions - you've got hooks to write your own.

Most of the popular repo managers (GitHub Enterprise, GitLab, Bitbucket) provide branch-based write restrictions. Gitolite allows finer granularity, providing path-based (and more) write restrictions.

The only repo manager I've heard of supporting read-access control is Perforce Helix, which reimplements the git protocol on top of a Perforce backend, but I have no hands-on experience with it. It is promising, but I would be concerned about how compatible it is with "plain" git.

久夏青 2024-11-08 11:27:55

To me the biggest thing they offer is Speed. They're orders of magnitude faster for the most common operations than centralized source control.

Working disconnected is also a huge plus.

幸福不弃 2024-11-08 11:27:55

Our team used TFS for about 3 years before switching to Mercurial. HG's branch/merge support is so much better than TFS. This is because a DVCS relies on painless merging.

厌味 2024-11-08 11:27:55

Better synchronization across remote / disconnected locations.
