Rebase and checkout are wildly different commands, with different goals. Neither goal exactly matches your own—which is or seems to be to inspect something—but checkout comes much closer.
I'm afraid I blow right past the vocabulary limits for that, but let's start with the proper basics, which too many Git users have skipped (for reasons good or bad, but the end result was bad).
Git is about commits
The basic unit of storage in Git is the commit. A Git repository is a collection of commits, stored in a big database that Git calls the object database. A Git repository has several more parts, which we'll get to in a moment, but this first one—the object database—is essential: without it there's no repository.
The object database is a simple key-value store, using what Git calls OIDs or Object IDs to look up the objects. The most important kind of object for our purposes (in fact, the only one we really care about) is the commit object, which holds the first part of any commit. So our commits, in Git, have these OIDs. We'll call them hash IDs to avoid getting caught up in too many TLAs (Three Letter Acronyms) and probably, eventually, RAS syndrome. Some call them SHA or SHA-1, because Git initially (and currently) uses the SHA-1 cryptographic hash as its hash IDs, but Git is no longer wedded to SHA-1, so "hash ID" or "OID" is more appropriate.
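As a rough sketch of the key-value idea (a toy model, not how Git lays things out on disk), the key can be computed from the object's type and content. The `ObjectDatabase` class and its method names are made up for illustration; the hashing recipe (SHA-1 over a small header plus the content) is the one Git uses for SHA-1 repositories.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for an object of this type and content."""
    # Git hashes a header "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectDatabase:
    """Toy key-value store (hypothetical): key is the object ID."""

    def __init__(self):
        self._objects = {}

    def put(self, obj_type: str, content: bytes) -> str:
        oid = git_object_id(obj_type, content)
        self._objects[oid] = (obj_type, content)
        return oid

    def get(self, oid: str):
        return self._objects[oid]


db = ObjectDatabase()
oid = db.put("blob", b"hello\n")
print(oid)          # a 40-character hexadecimal string
print(db.get(oid))  # ('blob', b'hello\n')
```

The point of the sketch is only that the ID is derived from the content, so nobody has to hand out numbers: the same content always lands under the same key.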
An OID or hash ID is a big ugly string of letters and digits. This is actually a very large number, expressed in hexadecimal. Git needs these to find its objects. The ID is unique to that particular object: no other object, in the big objects database, can have that ID. Every commit you make has to get a new random-looking number, never-before-used, never to be used again ever, in any Git repository, unless it's being used to store your commit. Making this actually work is hard (technically, it's impossible[1]) but the sheer size of the hash ID makes it work in practice. A Git doomsday may come someday (see How does the newly found SHA-1 collision affect Git?) but it won't be for a while yet.

[1] See the pigeonhole principle.
Git is not about branches or files
If Git commits did not store files, Git would be useless. So commits do store files. But commits are not files themselves, and a file is not Git's "unit of work" as it were. Git is about the commits, which sort of accidentally-on-purpose contain files.
The word branch, in Git, is very badly overused, almost to the point of meaninglessness.[2] There are at least two or three things people mean when they say branch here, and it can get very confusing, although once you've got the basics down you'll find yourself right among all the other people casually tossing the word branch out in a sentence, maybe more than once in the same sentence, with each word meaning something different, yet the whole thing seems totally obvious.
To help keep this straight, I like to (try, at least, to) use the phrase branch name when referring to a name like main, dev, feature, and so on. A branch name, in Git, is a fast and important way to find one particular commit. Humans use these because human brains are no good at working with hash IDs: they're too big, ugly, and random-looking.

A repository therefore keeps a separate database (another simple key-value store) in which each key is a name and the value is the big ugly hash ID that goes with that name. Branch names are one of the many kinds of names that Git sticks in this second database. So, you can give Git a branch name; Git will look up the hash ID, and find the latest commit for that branch.
In this sense, we use branches—or more precisely, branch names—in Git to get to our commits. But Git isn't about these branches, really; it's still about the commits.
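To make the name lookup concrete, here is a minimal sketch of that second database as a plain dictionary. The hash IDs are obviously fake placeholders and `resolve_branch` is a hypothetical helper; the `refs/heads/` and `refs/tags/` prefixes are the namespaces Git uses for branch and tag names.

```python
# Toy names database: full name -> hash ID (placeholder values, not real commits).
refs = {
    "refs/heads/main": "1111111111111111111111111111111111111111",
    "refs/heads/dev": "2222222222222222222222222222222222222222",
    "refs/tags/v1.0": "1111111111111111111111111111111111111111",
}


def resolve_branch(refs: dict, branch_name: str) -> str:
    """Look up the hash ID of the latest commit on a branch by its short name."""
    full_name = f"refs/heads/{branch_name}"
    if full_name not in refs:
        raise KeyError(f"no such branch: {branch_name}")
    return refs[full_name]


print(resolve_branch(refs, "main"))
```

The human types `main`; everything after that is done with the hash ID.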
[2] For an even more extreme example of this problem, see Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. For more on Git's abuse of the word branch, see What exactly do we mean by "branch"?
What's in a commit
Now that we know Git is all about commits, let's take a look at an actual raw commit. Here's the one I referred to above:
That's the raw commit object, and it actually consists entirely of the commit's metadata.
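A raw commit object is just text: a few header lines, a blank line, then the message. The sketch below parses one and decodes its timestamp. Everything in `RAW_COMMIT` is a placeholder (fake hashes, a made-up author, a made-up message), except the timestamp, which I chose to match the date discussed in this answer. The parser is deliberately simplistic and ignores multi-line headers such as signatures.

```python
from datetime import datetime, timedelta, timezone

# Placeholder commit text (not a real commit); only the timestamp is meaningful.
RAW_COMMIT = """\
tree 1111111111111111111111111111111111111111
parent 2222222222222222222222222222222222222222
author A U Thor <author@example.com> 1651786597 -0700
committer A U Thor <author@example.com> 1651786597 -0700

A short commit message
"""


def parse_commit(raw: str):
    """Split raw commit text into header fields and the commit message."""
    header_text, _, message = raw.partition("\n\n")
    headers = {"parent": []}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            headers["parent"].append(value)  # zero, one, or several parents
        else:
            headers[key] = value
    return headers, message


def decode_timestamp(person_line: str) -> datetime:
    """Turn the trailing '<epoch-seconds> <+/-HHMM>' into an aware datetime."""
    *_, seconds, offset = person_line.split()
    sign = -1 if offset[0] == "-" else 1
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    return datetime.fromtimestamp(int(seconds), timezone(sign * delta))


headers, message = parse_commit(RAW_COMMIT)
when = decode_timestamp(headers["committer"])
print(headers["tree"], headers["parent"])
print(when.isoformat())
```

The timestamp is seconds since the Unix epoch, and the zone offset only affects how the moment is displayed, not which moment it is.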
A commit object has two parts:

- Every commit has a full snapshot of all of the files that make up that particular commit. In a real commit like the one above, that's the tree line, which is required: there must be one and only one tree.
- Every commit also has some metadata. That's the entire chunk of text above, really (including the tree line itself).

Note that the metadata tells us who made the commit, and when: the magic number 1651786597 above is a date-and-time-stamp meaning Thu May 5 14:36:37 2022. The -0700 is the time zone, which in this case is Pacific Daylight Time or UTC-7. (It could be Mountain Standard Time which is also UTC-7, and is in use right now in most of Arizona outside the Navajo Nation, but you can pretty safely bet that this was not Junio Hamano's actual location at the time.) It also has the committer's commit message, which in this case is remarkably short: compare with, e.g., the message of f8781bfda31756acdc0ae77da7e70337aedae7c9, which is a much better commit message. (Excluding the updated tests and a comment, the fix itself just adds three lines to builtin/diff-tree.c.)

The other really important part of the metadata, which Git sets up on its own, is the parent line. There can be more than one parent line (or, rarely, no parent line) because each commit carries, in its metadata, a list of parent hash IDs. These are just the raw hash IDs of some existing commits in the repository, that were there when you, or Junio, or whoever, added a new commit. We'll see in a moment what these are for.

Review so far
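A repository has two databases:

- The object database holds the commits (and Git's other internal objects), each found by its hash ID.
- The names database holds names, such as branch names, each of which maps to one hash ID.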
A working tree
Now, one of the tricks to making the hash IDs work, in Git, is that no part of any object can ever change. A commit, once made, is the way it is forever. That commit, with that hash ID, holds those files and that metadata and thus has that parent (or those parents) and so on. Everything is frozen for all time.
The files inside a commit are stored in a special, read-only, compressed (sometimes highly compressed), de-duplicated format. That avoids having the repository bloat up even though most commits mostly re-use most of the files from their parent commit(s). Because the files are de-duplicated, the duplicates literally take no space. Only a changed file needs any space.
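Here is a small sketch of why de-duplication falls out of content hashing almost for free. It is a toy: `store`, `store_blob`, and `snapshot` are invented names, and real Git also compresses and packs objects, which this ignores.

```python
import hashlib

store = {}  # object ID -> file content


def store_blob(content: bytes) -> str:
    """Store content under its hash; identical content reuses the same entry."""
    oid = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
    store[oid] = content
    return oid


def snapshot(files: dict) -> dict:
    """Record a snapshot as a mapping of path -> object ID."""
    return {path: store_blob(data) for path, data in files.items()}


first = snapshot({"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"})
second = snapshot({"README": b"hello, world\n", "main.c": b"int main(void) { return 0; }\n"})

print(len(store))                           # 3: two READMEs, one shared main.c
print(first["main.c"] == second["main.c"])  # True
```

Two snapshots, four file entries, but only three stored contents: the unchanged file is shared.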
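But there's an obvious problem: the files stored inside a commit are read-only, and they are in a format that only Git itself can read. If we're going to get any work done, we must have ordinary files, that ordinary programs can both read and write. Where will we get those?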
Git's answer is to provide, with any non-bare repository,[3] an area in which you can do your work. Git calls this area (a directory-tree or folder full of folders, or whatever terminology you like) your working tree, or work-tree for short. In fact, the typical setup is to have the repository proper live inside a hidden .git directory at the top level of the working tree. Everything inside this is Git's; everything outside it, at the top level of the working tree and in any sub-directory (folder) within it other than .git itself, is yours.

[3] A bare repository is one without a work-tree. This might seem kind of redundant or pointless, but it does actually have a function: see What problem is trying to solve a Git --bare repo?
What git checkout or git switch is about

When you check out some commit (with git checkout or git switch and a branch name) you're telling Git to remove, from your working tree, the files that came out of the commit you had checked out, and to replace them with the files from the commit you're moving to.

Git takes a big short-cut here when it can: if you're moving from one commit to another, and most of the files in those two commits are de-duplicated, Git won't actually bother with the remove-and-replace for these files. This short-cut becomes important later, but if you start out thinking of git checkout / git switch as meaning: remove the current commit's files, change to a new current commit, and extract those files, you have a good start.

How commits get strung together
Let's revisit the commit itself for a bit now. Each commit has, in its metadata, some set of parent lines. Most commits (by far in most repositories) have exactly one parent and that's the thing to start with.

Let's draw the commits in a simple, tiny, three-commit repository. The three commits will have three big ugly random-looking hash IDs, but rather than make some up, let's just call them commits A, B, and C in that order. Commit A was the very first commit (which is a bit special because it has no parent commit) and then you made B while using commit A, and made C while using B. So we have this: A <- B <- C.

That is, commit C, the latest commit, has some files as its snapshot, and has, as its parent, the raw hash ID of commit B. We say that C points to B.

Meanwhile, commit B has some files as its snapshot, and has commit A as its parent. We say that B points to A.

Your branch name, which we'll assume is main, points to the latest commit C: A--B--C <-- main (here I get lazy about drawing the arrows between commits as arrows, but they're still backwards-pointing arrows, really).
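The backwards-pointing structure can be sketched in a few lines. This is a toy model with single letters standing in for hash IDs; the `history` function and the dictionary layout are my own invention for illustration.

```python
# Each commit records its parent(s); single letters stand in for real hash IDs.
commits = {
    "A": {"parents": []},     # the very first commit has no parent
    "B": {"parents": ["A"]},
    "C": {"parents": ["B"]},
}
branches = {"main": "C"}      # the branch name points to the latest commit


def history(commits: dict, start: str) -> list:
    """Follow first-parent links backwards from start, newest first."""
    chain = []
    current = start
    while current is not None:
        chain.append(current)
        parents = commits[current]["parents"]
        current = parents[0] if parents else None
    return chain


print(" <- ".join(reversed(history(commits, branches["main"]))))  # A <- B <- C
```

Note that nothing points forwards: the only way to find B is to start at C (found via the name main) and step back.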
When you git checkout main, Git extracts all the commit-C files into your working tree. You have those files available to view and edit.

If you do edit some, you use git add and git commit to make a new commit. This new commit gets an all-new, never been used before anywhere in any Git repository in the universe, hash ID, but we'll just call this new commit D. Git will arrange for new commit D to point backwards to existing commit C, because C is the one you've been using, so let's draw in new commit D, on a line of its own below and to the right of C.

(The backwards slash going up-and-left from D is why I get lazy about the arrows: there are some arrow fonts but they don't work all that well on StackOverflow, so we just have to imagine the arrow from D to C.)

But now D is the latest main commit, so git commit also stores D's hash ID into the name main so that main now points to D (and now there's no reason to use extra lines to draw things; I just kept it for visual symmetry).
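The two things git commit does to the graph (add a commit that points back, then move the name) can be sketched like this. Again a toy: `make_commit` is a hypothetical function, and it ignores snapshots and metadata entirely.

```python
commits = {"A": [], "B": ["A"], "C": ["B"]}  # commit -> list of parents
branches = {"main": "C"}
head = "main"                                # the branch we have checked out


def make_commit(new_id: str) -> None:
    """Add a commit whose parent is the current tip, then move the branch name."""
    tip = branches[head]
    commits[new_id] = [tip]    # the new commit points backwards to the old tip
    branches[head] = new_id    # the name now points to the new commit


make_commit("D")
print(commits["D"], branches["main"])  # ['C'] D
```

Existing commits are never touched; only the name moves.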
This is one way a branch grows, in Git. You check out the branch, so that it's your current branch. Its tip-most commit (the one towards the right in this drawing, or towards the top in git log --graph output) becomes your current commit and those are the files you see in your working tree. You edit those files, use git add, and run git commit, and Git packages up the new files (with automatic de-duplication, so that if you change a file back to the way it was in B, it gets de-duplicated here!) into a new commit, then stuffs the new commit's hash ID into the current branch name.

How branches form
Let's say we start out with that same three-commit repository: A--B--C <-- main.

Let's now create a new branch name dev. This name must point to some existing commit. There are only three commits, so we have to pick one of A, B, or C, for the name dev to point to. The obvious one to use is the most recent: we probably don't need to go back in time to commit B to start adding new commits. So let's add dev so that it also points to C, by running a command such as git branch dev. We get: A--B--C <-- dev, main.
It's hard to tell from our drawing: are we on main or dev? That is, if we run git status, which will it say, "on branch dev" or "on branch main"? Let's add a special name, HEAD in all uppercase like this, and attach it to one of the two branch names, to show which name we are using: A--B--C <-- dev, main (HEAD).

We are "on" branch main. If we make a new commit now, commit D will point back to commit C as usual, and Git will stick the new hash ID into the name main.

But if we run git checkout dev (or git switch dev), Git will remove, from our working tree, all the commit-C files, and put in all the commit-C files instead. (Seems kind of silly, doesn't it? Short-cut! Git won't actually do any of that!) Now we have A--B--C <-- dev (HEAD), main, and when we make our new commit D, only dev moves: D points back to C, dev (HEAD) points to D, and main still points to C.

If we git checkout main, Git will remove the commit-D files and install the commit-C files, and we'll be back to main (HEAD) pointing to C, with dev still pointing to D. If we now make another new commit E, it too points back to C, and main (HEAD) now points to E.
This is how branches work in Git. A branch name, like main or dev, picks out a last commit. From there, Git works backwards. Commit E might be the last main commit, but commits A-B-C are on main because we get to them when we start from E and work backwards.

Meanwhile, commit D is the last dev commit, but commits A-B-C are on dev because we get to them when we start from D and work backwards. Commit D is not on main because we never reach commit D when we start from E and work backwards: that skips right over D.
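"Being on a branch" is therefore a reachability question. A minimal sketch, using the same toy graph (A-B-C shared, D on dev, E on main); `reachable` and `is_on_branch` are illustrative names, not Git commands.

```python
commits = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}
branches = {"main": "E", "dev": "D"}


def reachable(tip: str) -> set:
    """Return every commit found by starting at tip and working backwards."""
    seen, todo = set(), [tip]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(commits[commit])
    return seen


def is_on_branch(commit: str, branch: str) -> bool:
    return commit in reachable(branches[branch])


print(sorted(reachable(branches["main"])))                  # ['A', 'B', 'C', 'E']
print(is_on_branch("C", "dev"), is_on_branch("D", "main"))  # True False
```

Commits A, B, and C are on both branches at once, which is one reason the word branch gets so slippery.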
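We now know that Git is about commits, that a branch name finds the last commit on its branch, and that checking out a branch extracts that commit's files into the working tree. Now we'll get to git rebase.

What git rebase is about

We often find ourselves using Git and stuck in this kind of situation, where main has gained commits F-G-H since we started feature and made commits C-D-E on it,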
and we say to ourselves: Gosh, it would be nice if we had started out feature later, when main had commit G and/or H in it, because we need what's in those now.

There's nothing fundamentally wrong with commits C-D-E and we could just use git merge, but for whatever reason (the boss says so, the co-workers have decided they like a rebase flow, whatever it might be) we decide that we're going to "improve" our C-D-E commits. We're going to re-make them so that they come after F-G-H.

We can, quite literally, do this by checking out commit H, making a new branch, and then re-doing our work, one commit at a time.

What git rebase does is automate this for us. If we were to do it manually, each "redo" step would involve using git cherry-pick (which I won't go into in any detail here). The git rebase command automates the cherry-picking for us, and then adds one other twist: instead of requiring a new branch name like improved-feature, it simply yanks the old branch name off the old commits and makes it point to the new ones.

The old abandoned commits are actually still there, in Git, for at least 30 days or so. But with no name by which to find them, you can only see those commits if you have saved their hash IDs, or have some trick by which to find those hash IDs.[4]
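The copy-then-move-the-name idea can be sketched as below. This is only a model of the bookkeeping: I'm assuming feature forked from main at commit B, the "change" strings are stand-ins for what each commit did, and the trailing apostrophe is just my way of showing that a copy gets a different ID. Real cherry-picking re-applies changes and can stop with conflicts, which this toy ignores.

```python
# Assumed shape: main is A-B-F-G-H, feature forked at B and added C-D-E.
commits = {
    "A": {"parent": None, "change": "a"},
    "B": {"parent": "A", "change": "b"},
    "F": {"parent": "B", "change": "f"},
    "G": {"parent": "F", "change": "g"},
    "H": {"parent": "G", "change": "h"},
    "C": {"parent": "B", "change": "c"},
    "D": {"parent": "C", "change": "d"},
    "E": {"parent": "D", "change": "e"},
}
branches = {"main": "H", "feature": "E"}


def ancestors(tip):
    """List tip and everything behind it, newest first."""
    found = []
    while tip is not None:
        found.append(tip)
        tip = commits[tip]["parent"]
    return found


def commits_only_on(branch: str, upstream: str) -> list:
    """Commits reachable from branch but not from upstream, oldest first."""
    upstream_set = set(ancestors(branches[upstream]))
    own = [c for c in ancestors(branches[branch]) if c not in upstream_set]
    return list(reversed(own))


def rebase(branch: str, upstream: str) -> None:
    """Copy each branch-only commit onto the upstream tip, then move the name."""
    new_parent = branches[upstream]
    for old in commits_only_on(branch, upstream):
        copy = old + "'"  # a copy gets a new, different ID
        commits[copy] = {"parent": new_parent, "change": commits[old]["change"]}
        new_parent = copy
    branches[branch] = new_parent  # the old commits stay, but lose their name


rebase("feature", "main")
print(branches["feature"], commits["C'"]["parent"])  # E' H
print("E" in commits)                                # True
```

The last line is the important one: the originals are not modified or deleted by the rebase itself, they are merely no longer findable through the name feature.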
When the rebase finishes completely, our original commits are copied to new-and-improved commits. The new commits have new and different hash IDs, but since no human ever notices the actual hash IDs, a human who looks at this repository just sees three feature-branch-only commits and assumes they have magically been changed into the new improved ones.[5]

[4] Git comes with some handy tricks built-in, but we won't cover them here.

[5] Git sees the truth, and if you connect your Git repository to some other Git repository, they will have ... words, or a long conversation, about this and it can make a big mess if you don't know what you're doing. Basically, if they still have your originals, you can wind up getting them back when you thought you'd gotten rid of them! Any time you connect two Git repositories, you generally have one hand over any new commits it has that the other one is missing. This is where the magic of the hash IDs really comes into effect: they do this all by hash ID alone.
The bottom line here is that you should only rebase commits when all users of those commits agree that those commits can be rebased. If you're the only user, you just have to agree with yourself, so that's a lot easier. Otherwise, get agreement in advance from all other users before you start rebasing.
To review a remote branch (one I don't have yet), I prefer git switch aBranch: its guess mode would automatically set it to track the remote-tracking branch origin/aBranch, allowing me to do a simple git pull to update it in future review instances.

That would be the same as git switch -c <branch> --track <remote>/<branch>.
I also prefer setting pull.rebase to true (for example, with git config --global pull.rebase true). That way, a git pull on that branch would rebase any of my local commits on top of the updated branch, not only for my review, but also to check if my local (not yet pushed) code/commits still work on top of the updated remote branch.