How to handle widespread code formatting changes in a git repository

Posted 2024-08-13 20:30:32

We have a project with around 500,000 lines of code, managed with git, much of it several years old. We're about to make a series of modifications to bring the older code into conformance with the developer community's current standards and best practices with regard to naming conventions, exception handling, indentation, and so forth.

You can think of it as something between pretty printing and low-level, mechanical refactoring.

This process is likely to touch almost every line of code in the code base (~85%), and some lines will be modified as many as five times. All of the changes are intended to be semantically neutral.

  • Is there any way to make the changes transparent to git blame, etc., so that when looking at the code a month from now we'll see the commit in which the logic was introduced, not the one in which the indentation or capitalization was changed?
  • What's the best way to pull merges from forks that have not undergone this process? My present plan is to have a script clone the forked repo, apply the automated process to it and to its base, diff them, then apply the diff. But I'd love to have a cleaner answer.
  • Are there any other problems of this sort that I'm not seeing, and if so, what can be done to mitigate them? I expect git bisect and the like to be fine; following git log across the great divide will be annoying unless you are careful; and git diff across it will be hopeless. Still, I'm not convinced I'm not overlooking another pain point.

    Comments (4)

    韬韬不绝 2024-08-20 20:30:32

    I don't know how best to deal with some of the more invasive changes you're describing, but...

    Use these options with git blame and git diff to filter out the noise:

    • The -w option causes git to ignore changes in whitespace, so you can more easily see the real differences.
    • The -M and -C options make it follow renames and copies; with git blame, they also track lines moved or copied within a file (-M) and across files (-C).

    See: explainshell.com - git diff -w -M -C
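A minimal sketch of the -w effect on git blame, run in a throwaway repository so it touches no real project. The file name, commit messages, and identity are made up for the demo; the same flag could be combined with -M and -C.

```shell
set -eu
# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit 1 introduces the logic (unindented).
printf 'int f() {\nreturn 1;\n}\n' > a.c
git add a.c
git commit -q -m "add logic"
logic=$(git rev-parse HEAD)

# Commit 2 only re-indents line 2.
printf 'int f() {\n    return 1;\n}\n' > a.c
git commit -q -am "reindent"
reindent=$(git rev-parse HEAD)

# --porcelain prints the full commit hash as the first field of the first line.
plain=$(git blame --porcelain -L 2,2 a.c | head -n 1 | cut -d' ' -f1)
ignore_ws=$(git blame -w --porcelain -L 2,2 a.c | head -n 1 | cut -d' ' -f1)

echo "plain blame:   $plain"
echo "blame with -w: $ignore_ws"
```

Plain blame should name the reindent commit for line 2, while blame -w should reach back to the commit that added the logic. Note that -w only helps with whitespace-only edits; renamed identifiers still count as real changes.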

    鹿港小镇 2024-08-20 20:30:32

    I would recommend making those evolutions one step at a time, in a central Git repo (central as in "public reference for all other repositories to follow"):

    • indentation
    • then reordering methods
    • then renaming
    • then ...

    But not "indentation-reordering-renaming-...-one giant commit".

    That way, you give Git a reasonable chance to follow the changes across the refactoring modifications.

    Plus, I would not accept any new merge (pulled from another repo) whose authors have not applied the same refactoring before pushing their code.
    If applying the format process brings any changes to the fetched code, you could reject it and ask for the remote repo to conform to the new standards first (at least by pulling from your repo before making any more pushes).
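The accept-or-reject check described above could be sketched roughly as follows. The format_tree function is a stand-in (an assumption) for whatever the real formatting process is; here it merely expands tabs to four spaces in tracked .c files, and the repository is a throwaway one.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical formatter: tabs become 4 spaces in every tracked .c file.
format_tree() {
    git ls-files '*.c' | while IFS= read -r f; do
        expand -t 4 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Succeeds only if running the formatter leaves the checked-out code unchanged.
conforms() {
    format_tree
    if git diff --quiet; then
        return 0
    fi
    git checkout -q -- .   # discard the formatter's edits again
    return 1
}

# A contribution that has not been through the formatting process.
printf 'int f() {\n\treturn 1;\n}\n' > a.c
git add a.c
git commit -q -m "unformatted contribution"
if conforms; then first=accept; else first=reject; fi

# The same code after the contributor applies the process and commits.
format_tree
git commit -q -am "apply project formatting"
if conforms; then second=accept; else second=reject; fi

echo "before formatting: $first, after formatting: $second"
```

The check relies on the formatter being idempotent: running it on already-formatted code must change nothing, otherwise every contribution would be rejected.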

    ゃ人海孤独症 2024-08-20 20:30:32

    You will also need a mergetool that allows aggressive ignoring of whitespace. p4merge does this, and is freely downloadable.
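As a rough sketch of wiring this up: git knows p4merge as one of its built-in merge tool names, so selecting it is a matter of configuration. The install path in the comment is a made-up example, and as far as I know the whitespace-handling level is chosen in p4merge's own preferences, not by git.

```shell
# Select p4merge as the tool that `git mergetool` launches.
git config --global merge.tool p4merge

# Only needed if p4merge is not on PATH (hypothetical location):
#   git config --global mergetool.p4merge.path "/opt/perforce/bin/p4merge"

# After a merge stops with conflicts, resolve them with:
#   git mergetool
```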

    野の 2024-08-20 20:30:32

    This problem has a good solution. In short, use git filter-branch.

    I used this command myself:

    git filter-branch --tree-filter "git diff-tree --name-only --diff-filter=AM -r --no-commit-id \$GIT_COMMIT | grep -E '\.(cpp|h)\$' | xargs ./emacs-script" HEAD

    Here ./emacs-script is a script I wrote using emacs to change the code style; it simply calls indent-region on each file.

    This command works fine as long as no file has been deleted or removed from the repository. In that situation, using --ignore-unmatch may be helpful, but I'm not sure.
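The grep stage of that pipeline decides which paths reach the formatting script, so the pattern matters. A small sketch, using made-up path names, comparing a loose pattern such as '.*cpp\|.*h' (which matches any path containing "cpp" or "h") with one anchored to the .cpp and .h extensions:

```shell
set -eu
# Candidate paths, as git diff-tree --name-only might list them (made-up names).
paths='src/main.cpp
include/util.h
docs/shell.txt
src/cpp_notes.md'

# Loose pattern: "cpp" or "h" anywhere in the path.
loose=$(printf '%s\n' "$paths" | grep '.*cpp\|.*h')

# Anchored pattern: only paths that end in .cpp or .h.
strict=$(printf '%s\n' "$paths" | grep -E '\.(cpp|h)$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

With these names the loose pattern lets all four paths through, including the text and markdown files, while the anchored one keeps only the two source files.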
