How to read the output from git diff?
The man page for git-diff is rather long, and explains many cases which don't seem to be necessary for a beginner. For example:
git diff origin/master
Let's take a look at an example of an advanced diff from git history (commit 1088261f in the git.git repository).

Let's analyze this patch line by line.

The first line is a "git diff" header in the form `diff --git a/file1 b/file2`. The `a/` and `b/` filenames are the same unless a rename or copy is involved (as in our case). The `--git` means that the diff is in the "git" diff format.

Next are one or more extended header lines. The first three tell us that the file was renamed from `builtin-http-fetch.c` to `http-fetch.c` and that those two files are 95% identical (which was used to detect this rename).

The last line in the extended diff header, the `index` line, tells us about the mode of the given file (`100644` means that it is an ordinary file and not, e.g., a symlink, and that it doesn't have the executable permission bit set), and about the shortened hashes of the preimage (the version of the file before the given change) and the postimage (the version of the file after the change). This line is used by `git am --3way` to try a 3-way merge if the patch cannot be applied by itself.

Next is the two-line unified diff header. Compared to `diff -U` output, it has no from-file-modification-time or to-file-modification-time after the source (preimage) and destination (postimage) file names. If the file was created, the source is `/dev/null`; if the file was deleted, the target is `/dev/null`.

If you set the `diff.mnemonicPrefix` configuration variable to true, then in place of the `a/` and `b/` prefixes in this two-line header you can instead have `c/`, `i/`, `w/` and `o/` as prefixes, according to what you compare; see git-config(1).

Next come one or more hunks of differences; each hunk shows one area where the files differ. Each unified-format hunk starts with a line in the format `@@ from-file-range to-file-range @@ [header]`. The from-file-range is in the form `-<start line>,<number of lines>`, and the to-file-range is `+<start line>,<number of lines>`. Start line and number of lines refer to the position and length of the hunk in the preimage and postimage, respectively. If the number of lines is not shown, it is 1.

The optional header shows the C function where each change occurs, if it is a C file (like the `-p` option in GNU diff), or the equivalent, if any, for other types of files.

Next comes the description of where the files differ. The lines common to both files begin with a space character. The lines that actually differ between the two files have one of the following indicator characters in the left print column:

'+' -- A line was added here to the first file.
'-' -- A line was removed here from the first file.
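
The hunk header format described above can be taken apart mechanically. The following is only a minimal sketch in Python, not part of git or diff: the function name `parse_hunk_header` and the sample header strings are made up for illustration. It shows the rule that a missing line count stands for 1.

```python
import re

# Matches "@@ -start[,count] +start[,count] @@ optional header text".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count, header)."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a unified diff hunk header: {line!r}")
    old_start, old_count, new_start, new_count, header = match.groups()
    # A missing count means the range covers exactly one line.
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
        header,
    )


if __name__ == "__main__":
    # Hypothetical sample headers, not taken from the commit discussed above.
    print(parse_hunk_header("@@ -1,6 +1,4 @@"))
    print(parse_hunk_header("@@ -10 +10 @@ some_function() {"))
```

The second call returns a count of 1 on both sides, because neither range carries a `,<number of lines>` part.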
So, for example, the first hunk means that `cmd_http_fetch` was replaced by `main`, and that the `const char *prefix;` line was added.

In other words, before the change, the corresponding fragment of what was then the 'builtin-http-fetch.c' file contained `cmd_http_fetch` and no `const char *prefix;` line; after the change, this fragment of what is now the 'http-fetch.c' file contains `main` and the added `const char *prefix;` line instead.

A `\ No newline at end of file` line might also be present (it is not in the example diff).

As Donal Fellows said, it is best to practice reading diffs on real-life examples, where you know what you have changed.
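
To practice the "before" and "after" reading of a hunk, it can help to rebuild both fragments from the hunk body. This is a rough sketch under stated assumptions: `split_hunk` is a made-up helper name, and the hunk body below is a simplified, hypothetical one that only borrows the identifiers mentioned above; it is not copied from the actual commit.

```python
def split_hunk(hunk_lines):
    """Rebuild the old and new fragments from the body lines of one hunk."""
    old_fragment, new_fragment = [], []
    for line in hunk_lines:
        marker, text = line[:1], line[1:]
        if marker == " ":      # context: present in both versions
            old_fragment.append(text)
            new_fragment.append(text)
        elif marker == "-":    # removed: only in the preimage
            old_fragment.append(text)
        elif marker == "+":    # added: only in the postimage
            new_fragment.append(text)
        elif marker == "\\":   # "\ No newline at end of file" marker
            continue
        else:
            raise ValueError(f"unexpected hunk line: {line!r}")
    return old_fragment, new_fragment


# Hypothetical, simplified hunk body (not the real text of the commit).
hunk = [
    "-int cmd_http_fetch(int argc, const char **argv)",
    "+int main(int argc, const char **argv)",
    " {",
    "+    const char *prefix;",
    "     return 0;",
    " }",
]

old, new = split_hunk(hunk)
print("\n".join(old))
print("----")
print("\n".join(new))
```

Context lines land in both fragments, which is why the old fragment here has four lines and the new one has five.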
`@@ -1,2 +3,4 @@` part of the diff

This part took me a while to understand, so I've created a minimal example.

The format is basically the same as the `diff -u` unified diff.

For instance, take a file whose lines are the numbers 1 to 16 and remove lines 2, 3, 14 and 15. The output contains two hunks.

`@@ -1,6 +1,4 @@` means:

- `-1,6` means that this piece of the first file starts at line 1 and shows a total of 6 lines. Therefore it shows lines 1 to 6: 1, 2, 3, 4, 5 and 6. `-` means "old", as we usually invoke it as `diff -u old new`.
- `+1,4` means that this piece of the second file starts at line 1 and shows a total of 4 lines. Therefore it shows lines 1 to 4. `+` means "new". We only have 4 lines instead of 6 because 2 lines were removed! The new hunk is just: 1, 4, 5 and 6.

`@@ -11,6 +9,4 @@` for the second hunk is analogous:

- on the old file, we have 6 lines, starting at line 11 of the old file: 11, 12, 13, 14, 15 and 16.
- on the new file, we have 4 lines, starting at line 9 of the new file: 11, 12, 13 and 16.

Note that line `11` is the 9th line of the new file because we have already removed 2 lines in the previous hunk: 2 and 3.

Hunk header

Depending on your git version and configuration, you can also get a code line next to the `@@` line, e.g. `func1() {`. This can also be obtained with the `-p` flag of plain `diff`.

Example: given an old file that contains a function `func1`, if we remove line `6`, the diff shows `func1() {` next to the `@@` line. Note that the line number in the hunk header is not the line of `func1`: it skipped lines `1` and `2`.

This awesome feature often tells you exactly which function or class each hunk belongs to, which is very useful when interpreting the diff.

How exactly the algorithm chooses the header is discussed at: Where does the excerpt in the git diff hunk header come from?
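
As a rough illustration of where the header text can come from, here is a simplified sketch. It assumes the rule documented for GNU diff's `-p` option (a candidate line begins with a letter, `$` or `_`) and searches backwards from the line just before the hunk; git's real rules depend on file type and configuration, so treat this as an approximation. The helper name `hunk_header_text` and the contents of the sample file are assumptions.

```python
import re

# Simplified stand-in for the heuristic: a candidate header line starts
# with a letter, an underscore, or a dollar sign.
FUNC_LINE_RE = re.compile(r"^[A-Za-z_$]")


def hunk_header_text(old_lines, hunk_start):
    """Return the nearest candidate line strictly before `hunk_start` (1-based)."""
    for index in range(hunk_start - 2, -1, -1):
        if FUNC_LINE_RE.match(old_lines[index]):
            return old_lines[index]
    return ""


# Assumed example file: a function whose body lines are "1;" to "9;".
old_file = ["func1() {"] + [f"    {n};" for n in range(1, 10)] + ["}"]

# Removing the "6;" line (line 7 of this file) with 3 lines of context
# gives a hunk that starts at line 4 of the old file.
print(hunk_header_text(old_file, 4))
```

The indented body lines are skipped because they start with a space, so the search ends at the `func1() {` line even though the hunk itself starts a few lines later.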
One line hunk summarized notation

This is very rare, but consider a `diff -U0` of two files, where:

- `-U0` means: use 0 lines of context
- the second file is the same as the first, but with line `10` replaced with `hack`

In that case the diff output shows that when there's a single line change, the notation gets summarized to just one number instead of the `m,n` pair.

This behavior is documented in the documentation quoted by Todd's answer.

Single line hunk additions and removals are summarized in the same way.

Tested on diff 3.8, Ubuntu 22.10.
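
The examples in this answer can be reproduced without a shell. The sketch below uses Python's standard `difflib` module instead of the `diff` program, so it is a swapped-in tool rather than the commands the answer originally ran; the helper name `hunk_headers` is made up. It prints only the `@@` lines.

```python
import difflib


def hunk_headers(old, new, context=3):
    """Return the "@@ ... @@" lines of a unified diff between two line lists."""
    diff = difflib.unified_diff(old, new, "old", "new", n=context, lineterm="")
    return [line for line in diff if line.startswith("@@")]


numbers = [str(n) for n in range(1, 17)]

# Remove lines 2, 3, 14 and 15: two hunks, as in the first example.
trimmed = [n for n in numbers if n not in {"2", "3", "14", "15"}]
print(hunk_headers(numbers, trimmed))

# Replace line 10 with "hack" using zero lines of context.
hacked = ["hack" if n == "10" else n for n in numbers]
print(hunk_headers(numbers, hacked, context=0))

# Remove only line 10, then add it back, again with zero context.
without_ten = [n for n in numbers if n != "10"]
print(hunk_headers(numbers, without_ten, context=0))
print(hunk_headers(without_ten, numbers, context=0))
```

With `difflib`, the first call yields `@@ -1,6 +1,4 @@` and `@@ -11,6 +9,4 @@`, the replacement yields `@@ -10 +10 @@`, and the single-line removal and addition yield `@@ -10 +9,0 @@` and `@@ -9,0 +10 @@`: a range of exactly one line is written as a bare number, and an empty range is written with a count of 0.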
Here's a simple example, with an explanation of each part:

- `--git` is not a command; it means this is the git version of diff (not the Unix one).
- `a/` and `b/` are directories, but they are not real. They are just a convenience when we deal with the same file (in my case `a/` is in the index and `b/` is in the working directory).
- `10ff2df..84d4fa2` are the blob IDs of these 2 files.
- `100644` is the "mode bits", indicating that this is a regular file (not executable and not a symbolic link).
- `--- a/file` and `+++ b/file`: minus signs show lines that are in the `a/` version but missing from the `b/` version, and plus signs show lines missing in `a/` but present in `b/` (in my case `---` means deleted lines and `+++` means added lines in `b/`, which is the file in the working directory).
- `@@ -1,5 +1,5 @@`: to understand this it's better to work with a big file. If you have two changes in different places, you'll get two entries like `@@ -1,5 +1,5 @@`. Suppose you have a file with lines line1 ... line100, delete line10 and add a new line100: you'll get two such entries, one for each change.
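
The line1 ... line100 scenario can be tried with Python's standard `difflib` module, used here as a stand-in for git. This is a sketch under assumptions: the new line is taken to be appended at the end of the file, and its name `line101` is made up.

```python
import difflib

old = [f"line{n}" for n in range(1, 101)]
# Delete line10 and append one extra line at the end ("line101" is a made-up name).
new = [line for line in old if line != "line10"] + ["line101"]

diff = list(difflib.unified_diff(old, new, "a/file", "b/file", lineterm=""))
hunk_headers = [line for line in diff if line.startswith("@@")]

print("\n".join(diff))
```

The two changes are far apart, so they end up in two separate hunks: `@@ -7,7 +7,6 @@` around the deleted line and `@@ -98,3 +97,4 @@` around the appended one. The second hunk starts one line earlier in the new file because of the line removed in the first hunk.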
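The default output format (which originally comes from a program known as `diff`, if you want to look for more info) is known as a "unified diff". It contains essentially 4 different types of lines:

- context lines, which start with a single space,
- insertion lines, which start with a `+`,
- deletion lines, which start with a `-`, and
- metadata lines, which describe higher-level things such as which file the diff is talking about.

I advise that you practice reading diffs between two versions of a file where you know exactly what you changed. That way you'll recognize just what is going on when you see it.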
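
One way to practice is to count the line types in a diff you produced yourself. The sketch below is a simplified classifier, not a full patch parser: `count_line_types` is a made-up name and the sample diff is hypothetical. It uses the line counts from each `@@` header to know how many body lines belong to the hunk, so a removed line whose text happens to start with `--` is still counted as a deletion.

```python
import re
from collections import Counter

HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_line_types(diff_lines):
    """Count context, insertion, deletion and metadata lines in a unified diff."""
    counts = Counter()
    old_left = new_left = 0  # body lines still expected in the current hunk
    for line in diff_lines:
        if old_left > 0 or new_left > 0:
            if line.startswith(" "):
                counts["context"] += 1
                old_left -= 1
                new_left -= 1
            elif line.startswith("-"):
                counts["deletion"] += 1
                old_left -= 1
            elif line.startswith("+"):
                counts["insertion"] += 1
                new_left -= 1
            else:
                counts["metadata"] += 1  # e.g. "\ No newline at end of file"
            continue
        match = HUNK_RE.match(line)
        if match:
            # A missing count in the header stands for one line.
            old_left = int(match.group(1)) if match.group(1) is not None else 1
            new_left = int(match.group(2)) if match.group(2) is not None else 1
        counts["metadata"] += 1
    return counts


# Hypothetical sample diff, written by hand for illustration.
sample = [
    "diff --git a/file b/file",
    "index 10ff2df..84d4fa2 100644",
    "--- a/file",
    "+++ b/file",
    "@@ -1,3 +1,3 @@",
    " line1",
    "-line2",
    "+line2 changed",
    " line3",
]
print(dict(count_line_types(sample)))
```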
On my Mac, run `info diff`, then select: Output formats -> Context -> Unified format -> Detailed Unified.

The same section can also be reached by following the same path in the online GNU diff manual.
It's unclear from your question which part of the diffs you find confusing: the actual diff, or the extra header information git prints. Just in case, here's a quick overview of the header.

The first line is something like `diff --git a/path/to/file b/path/to/file`; it just tells you which file this section of the diff is for. If you set the boolean config variable `diff.mnemonicPrefix`, the `a` and `b` will be changed to more descriptive letters like `c` and `w` (commit and work tree).

Next, there are "mode lines": lines describing any changes that don't involve changing the content of the file. This includes new/deleted files, renamed/copied files, and permissions changes.

Finally, there's a line like `index 789bd4..0afb621 100644`. You'll probably never care about it, but those hex numbers are the abbreviated SHA1 hashes of the old and new blobs for this file (a blob is a git object storing raw data like a file's contents). The `100644` is the file's mode: the last three digits are the permissions; the first three give extra file metadata information (there is an SO post describing that).

After that, you're on to standard unified diff output (just like the classic `diff -U`). Each file's section starts with a pair of `---` and `+++` lines denoting the file in question, and is then split up into hunks; a hunk is a section of the file containing changes and their context. Each hunk shows (by default) three lines of context on either side of the `-` and `+` lines that mark the removed/added lines.
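
The `index` line described above can be taken apart with a few lines of Python. This is a minimal sketch, not git's own parser: the names `parse_index_line` and `describe_mode` are made up, and the regular expression assumes abbreviated lowercase hex hashes with an optional six-digit octal mode at the end.

```python
import re
import stat

INDEX_RE = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: ([0-7]{6}))?$")


def parse_index_line(line):
    """Return (old_blob, new_blob, mode) where mode is an int or None."""
    match = INDEX_RE.match(line)
    if match is None:
        raise ValueError(f"not an index line: {line!r}")
    old_blob, new_blob, mode_text = match.groups()
    # The mode is written in octal, e.g. "100644".
    mode = int(mode_text, 8) if mode_text is not None else None
    return old_blob, new_blob, mode


def describe_mode(mode):
    """Return (kind, permission bits as octal text, executable flag)."""
    if stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    executable = bool(mode & stat.S_IXUSR)
    return kind, oct(stat.S_IMODE(mode)), executable


old_blob, new_blob, mode = parse_index_line("index 789bd4..0afb621 100644")
print(old_blob, new_blob, describe_mode(mode))
```

Here the last three octal digits (`644`) come out as the permission bits, while the leading digits decide whether the entry is a regular file or a symlink.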