Git talk
Vokinloksar33 November 09, 2017 #git #tech #learning #kl00kThis is the archive of the git talk I gave at a share session at Klook.
Goal:
-
solve more problem, increase efficiency. avoid error.
-
knowing what is happening.
-
choose the right tool to do the right thing
-
find out what happened in git history, AKA: 'who did this?!!'
-
recover from an error stage.
-
best practices
-
workflow (chen)
Note:
- this is a discussion.
- my opinions are biased
Ignored
- basics: getting and creating project, setup and config, clone, .gitignore (hide changes you don't want to see or track.), basic usage
- things are useful but simple to know: tagging, prettify
Why git
source code control manage system used for linux kernel
-
distributed
-
performance
-
integrity (SHA-1, used ubiquitously)
-
all other vcs are trash
-
BitKeeper no longer
What is git
- Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.
- It means that at the core of Git is a simple key-value data store. What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.
Git Objects
Blob Object (Binary Large OBject)
-
Now, let’s use git hash-object to create a new data object and manually store it in your new Git database:
-
In its simplest form, git hash-object would take the content you handed to it and merely return the unique key that would be used to store it in your Git database. The -w option then tells the command to not simply return the key, but to write that object to the database. Finally, the --stdin option tells git hash-object to get the content to be processed from stdin;
- git-cat-file - Provide content or type and size information for repository objects
Tree Object
-
All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents. A single tree object contains one or more tree entries, each of which contains a SHA-1 pointer to a blob or subtree with its associated mode, type, and filename.
-
You can fairly easily create your own tree. Git normally creates a tree by taking the state of your staging area or index and writing a series of tree objects from it. So, to create a tree object, you first have to set up an index by staging some files. To create an index with a single entry — the first version of your test.txt file — you can use the plumbing command git update-index. You use this command to artificially add the earlier version of the test.txt file to a new staging area. You must pass it the --add option because the file doesn’t yet exist in your staging area (you don’t even have a staging area set up yet) and --cacheinfo because the file you’re adding isn’t in your directory but is in your database. Then, you specify the mode, SHA-1, and filename:
-
Git normally creates a tree by taking the state of your staging area or index and writing a series of tree objects from it.
- Now, you can use git write-tree to write the staging area out to a tree object.
)
)
)
- You’ll now create a new tree with the second version of test.txt and a new file as well:
)
)
)
Your staging area now has the new version of test.txt as well as the new file new.txt. Write out that tree (recording the state of the staging area or index to a tree object) and see what it looks like:
)
)
- Just for fun, you’ll add the first tree as a subdirectory into this one. You can read trees into your staging area by calling git read-tree. In this case, you can read an existing tree into your staging area as a subtree by using the --prefix option with this command:
)
)
)
Commit Object
- now have three trees that represent the different snapshots of your project that you want to track
- you must remember all three SHA-1 values in order to recall the snapshots. You also don’t have any information about who saved the snapshots, when they were saved, or why they were saved.
|
)
- You will get a different hash value because of different creation time and author data.
)
- The format for a commit object is simple: it specifies the top-level tree for the snapshot of the project at that point; the author/committer information (which uses your user.name and user.email configuration settings and a timestamp); a blank line, and then the commit message.
|
)
|
)
- Each of the three commit objects points to one of the three snapshot trees you created.
)
|
)
|
|
) )
|
)
- Amazing. You’ve just done the low-level operations to build up a Git history without using any of the front end commands. This is essentially what Git does when you run the git add and git commit commands — it stores blobs for the files that have changed, updates the index, writes out trees, and writes commit objects that reference the top-level trees and the commits that came immediately before them.
-
commit
- def: Stores the current contents of the index in a new commit along with a log message from the user describing the changes.
- Git stores a commit object that contains a pointer to the snapshot of the content you staged. This object also contains the author’s name and email, the message that you typed, and pointers to the commit or commits that directly came before this commit (its parent or parents):
- When you commit in Git, Git stores a commit object that contains a pointer to the snapshot of the content you staged, the author and message metadata, and zero or more pointers to the commit or commits that were the direct parents of this commit: zero parents for the first commit, one parent for a normal commit, and multiple parents for a commit that results from a merge of two or more branches.
- The action of storing a new snapshot of the project’s state in the Git history, by creating a new commit representing the current state of the index and advancing HEAD to point at the new commit.
How git works
Git doesn’t store data as a series of changesets or deltas, but instead as a series of snapshots.
.git
- The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
- ./objects
- The objects directory stores all the content for your database,
- ./refs
- the refs directory stores pointers into commit objects in that data (branches, tags, remotes and more),
- You can run something like git log 1a410e to look through your whole history, but you still have to remember that 1a410e is the last commit in order to walk that history to find all those objects. You need a file in which you can store the SHA-1 value under a simple name so you can use that pointer rather than the raw SHA-1 value.
- In Git, these are called “references” or “refs;”
- ./HEAD
- the HEAD file points to the branch you currently have checked out,
- A named reference to the commit at the tip of a branch.
- ./index
- the index file is where Git stores your staging area information
- A collection of files with stat information, whose contents are stored as objects. The index is a stored version of your working tree https://stackoverflow.com/questions/4084921/what-does-the-git-index-contain-exactly/4086986#4086986
- ./objects
HEAD, INDEX, WORKING DIR
-
HEAD
- The HEAD in Git is the pointer to the current branch reference, which is in turn a pointer to the last commit you made or the last commit that was checked out into your working directory. That also means it will be the parent of the next commit you do. It's generally simplest to think of it as HEAD is the snapshot of your last commit.
-
INDEX
- index: The "index" holds a snapshot of the content of the working tree, and it is this snapshot that is taken as the contents of the next commit. Thus after making any changes to the working tree, and before running the commit command, you must use the add command to add any new or modified files to the index.it's simplest to think of it as the Index is the snapshot of your next commit.
)
-
WORKING DIR
- Finally, you have your working directory. This is where the content of files are placed into actual files on your filesystem so they're easily edited by you. The Working Directory is your scratch space, used to easily modify file content.
)
- workflow: So, Git is all about recording snapshots of your project in successively better states by manipulating these three trees, or collections of contents of files
you can use reset to update part of the Index or the Working Directory with previously committed content this way.
- an example of reset
https://git-scm.com/blog/2011/07/11/reset.html // this could be used as a comprehensive example.
Basics
-
file
-
The lifecycle of the status of your files.
-
-
diff
-
add
- In order to begin tracking a new file, you use the command git add.
- Defination: This command updates the index using the current content found in the working tree, to prepare the content staged for the next commit.
-
status
- defi: Displays paths that have differences between the index file and the current HEAD commit, paths that have differences between the working tree and the index file, and paths in the working tree that are not tracked by Git
-
branch
- what defines a branch, where stored?
- A branch in Git is simply a lightweight movable pointer to one of these commits.
- Because a branch in Git is actually a simple file that contains the 40 character SHA-1 checksum of the commit it points to
- What happens if you create a new branch? Well, doing so creates a new pointer for you to move around. Let’s say you create a new branch called testing. You do this with the git branch command: show an example https://git-scm.com/book/en/v1/Git-Branching-What-a-Branch-Is
-
checkout
- def: Updates files in the working tree to match the version in the index or the specified tree.
- This moves HEAD to point to the testing branch
Advanced
-
remote branch, track
git push -u
- origin/master is local
-
merge
-
Join two or more development histories together
-
Incorporates changes from the named commits (since the time their histories diverged from the current branch) into the current branch. This command is used by git pull to incorporate changes from another repository and can be used by hand to merge changes from one branch into another.
A---B---C topic / D---E---F---G master
-
Then "git merge topic" will replay the changes made on the topic branch since it diverged from master (i.e., E) until its current commit (C) on top of master, and record the result in a new commit along with the names of the two parent commits and a log message from the user describing the changes.
A---B---C topic / \ D---E---F---G---H master
-
-
rebase
-
Reapply commits on top of another base tip
-
All changes made by commits in the current branch but that are not in
are saved to a temporary area. A---B---C topic / D---E---F---G master
-
The commits that were previously saved into the temporary area are then reapplied to the current branch, one by one, in order. to:
A'--B'--C' topic / D---E---F---G master
-
can also be used as local cleanup
-
need
git push --force
. be careful!
-
-
merge vs rebase
- Both of these commands are designed to integrate changes from one branch into another branch—they just do it in very different ways.
- On the other hand, this also means that the feature branch will have an extraneous merge commit every time you need to incorporate upstream changes. If master is very active, this can pollute your feature branch’s history quite a bit. While it’s possible to mitigate this issue with advanced git log options, it can make it hard for other developers to understand the history of the project.
- The major benefit of rebasing is that you get a much cleaner project history.
- First, it eliminates the unnecessary merge commits required by git merge.
- Second, as you can see in the above diagram, rebasing also results in a perfectly linear project history—you can follow the tip of feature all the way to the beginning of the project without any forks.
- golden rule: The golden rule of git rebase is to never use it on public branches.
-
git-cherry-pick
- Apply the changes introduced by some existing commits
- In SCM jargon, "cherry pick" means to choose a subset of changes out of a series of changes (typically commits) and record them as a new series of changes on top of a different codebase. In Git, this is performed by the "git cherry-pick" command to extract the change introduced by an existing commit and to record it based on the tip of the current branch as a new commit.
-
commit --amend
- recover from a bad amend https://stackoverflow.com/questions/38001038/how-to-undo-a-git-commit-amend
-
detached HEAD
- Normally the HEAD stores the name of a branch, and commands that operate on the history HEAD represents operate on the history leading to the tip of the branch the HEAD points at. However, Git also allows you to check out an arbitrary commit that isn’t necessarily the tip of any particular branch. The HEAD in such a state is called "detached".
-
reflog
- def: Reference logs, or "reflogs", record when the tips of branches and other references were updated in the local repository.
- A reflog shows the local "history" of a ref.
-
worktree
- def: Manage multiple working trees attached to the same repository. A git repository can support multiple working trees, allowing you to check out more than one branch at a time.
- The working tree is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify
about error
-
know your status.(where am I? what am I doing? where do I go?)
-
know from where to where
-
don't just listen to git status
- rebase: don't pull again
-
do check git log
-
reflog
-
stackoverflow
Best Practice
-
dos:
- know more about git
- branching
- The way Git branches is incredibly lightweight, making branching operations nearly instantaneous, and switching back and forth between branches generally just as fast.
- prettify
- make alias
- git_prompt
- worktree
- git status often
- promoting: tagging
- git stash
- backup branch
- use a workflow
- keep up to date, find conflicts earlier
- keep the reflog
- commit
- compact your commits before deploy/push
- commit often
- write good commit message
- git diff before commit
- good branch naming
-
don'ts
- don't
git reset --hard
, - don't clean file.
git checkout -- <file> / git clean -f
- don't specify remote branch every time
- don't rewrite public history
- don't keep life-time branch.
- use
git fetch
rather thangit pull
# because git pull will do merge, but we want rebase.
- don't
https://sethrobertson.github.io/GitBestPractices/