Workflow with Git

For all the times I accidentally commit a file that is too big

workflow
Author

Sarah Tanja

Published

November 5, 2023

Modified

September 1, 2025

1 Setup for .gitignore

.gitignore cannot filter files by size — it only filters by path patterns, like filenames, extensions, or directory names. It doesn’t support dynamic criteria like file size, modification time, etc.

An example template for a .gitignore while working with large bioinformatics files:

.Rproj.user 
.Rhistory 
.RData 
.Ruserdata 
.bashvars 
*.bam
*.sam 
*.fastq
*.fq
*.fastq.gz
*.fq.gz 
*.fasta
*.fa 
*.gff3

2 Normal routine for git push

  1. git status : check on staged, not staged, and untracked files
  2. find . -type f -size -50M -exec git add {} +
    1. find .: Starts the search from the current directory.
    2. -type f: Finds only files (not directories).
    3. -size -50M: Limits the size to files smaller than 50 megabytes.
    4. -exec git add {} +: Executes the git add command on all found files, adding them to the Git staging area.
  3. git status : run again to double check staged, not staged, and untracked files
  4. git commit -m "your message here"
  5. git pull : merge if needed, see Write & quit vim during a merge
  6. git push

3 Write & quit vim during a merge

  1. Press INS key to insert text, write a short message about the merge

  2. Press Esc to make sure you’re in normal mode.

  3. Type : to enter command mode.

  4. Then type wq and press Enter

4 Setup git hooks to automatically gitignore large files

We can use built-in hooks to automatically ignore files larger than 100 MB (no matter the directory or file name!). Here are the steps to follow:

  • Create a new text file in the .git/hooks/ directory of your repository called pre-commit

  • Select the More tab with the gear icon under the RStudio Files navigator bar and select show hidden files to see the .git folder.

  • Add the following text to the .git/hooks/pre-commit file:

#!/bin/sh

# Maximum file size (in bytes)
max_file_size=104857600

# Find all files larger than max_file_size and add them to the .gitignore file
find . -type f -size +$max_file_size -exec echo "{}" >> .gitignore \;

This code sets the max_file_size variable to 100 MB and then uses the find command to locate all files in the repository that are larger than the specified max_file_size. The exec option of the find command appends the name of each file that matches the criteria to the .gitignore file.

or

#!/bin/bash

echo “automatically ignoring large files” find . -size 5M | sed ‘s|^./||g’ >> .gitignore cat .gitignore | sort | uniq > .gitignore

git diff –exit-code .gitignore exit_status=$? if [ $exit_status -eq 1 ] then set +e for i in cat .gitignore do set +e git rm –cached $i done

git add .gitignore
git commit .gitignore --no-verify -m "ignoring large files"

echo "ignored new large files"

It is pretty brute force and the downside is that in case there were new large files added by the git hook, the origin commit fails because the state (hash) changed. So you need to execute another commit to actually commit what you have staged. Consider this as a feature telling you that new large files were detected ;-)

Save the pre-commit file and make it executable by running the following command in Terminal from your base git directory:

chmod +x .git/hooks/pre-commit

With these changes, whenever you run a git commit command, Git will first execute the pre-commit hook, which will automatically add any files larger than 100 MB to the .gitignore file. This will prevent Git from tracking these files in the repository going forward.

5 Manually execute code to find and add files >1G to .gitignore before committing

Use the following code as a bash terminal command to find and add files >1G from repo to .gitignore find . -size +1G | sed 's|^./||g' | cat >> .gitignore from THIS Robert’s Lab Course Issue

5.1 Using git revert:

-   Open your terminal or command prompt.

-   Navigate to the repository directory using the **`cd`** command.

-   Execute the following command to revert the most recent commit:

    `git revert HEAD`

-   Git will create a new commit that undoes the changes introduced by the previous commit. A text editor will open for you to provide a commit message. Save and close the editor to complete the revert process.

5.2 Using git reset

-   Open your terminal or command prompt.

-   Navigate to the repository directory using the **`cd`** command.

-   Execute the following command to reset the repository to the commit before the most recent one:

    `git reset HEAD~1`

    This command moves the branch pointer to the commit before the most recent one, effectively undoing the last commit.

-   Note that the changes introduced in the undone commit will still be present in your working directory. You can then modify or discard them as needed.

Please exercise caution when using these commands, as they modify the Git history. If you have already pushed the commit you want to undo to a remote repository, it’s generally not recommended to modify the history, as it can cause conflicts for other team members. In that case, it may be better to consider reverting the changes in a new commit and pushing that instead.

6 Make a backup and use rsync

If you’ve made lots of changes along with the large file commit that you don’t want to redo, the best option is to make a backup of your local directory, re-clone the remote git repo, and then use rsync to copy over all your local changes minus the .git folder with its history…

  1. Turn your local directory into a backup

    mv git-repo git-repo-BU

  2. Clone the clean repo from GitHub

    git clone https://github.com/sarahtanja/git-repo.git

  3. Restore large files using rsync , excluding .git

    rsync -av --progress --exclude='.git/' git-repo-BU/ git-repo/

7 Burn it down with git reset –hard

Warning

⚠️warning this will overwrite any changes you made after your last successful push⚠️

If you still want to continue, you can un-comment the code and follow this instruction:

  1. Update all origin/<branch> refs to latest:
#git fetch --all
  1. Backup your current branch (e.g. master):
#git branch backup-branch
  1. Jump to the latest commit on origin/master :
#git reset --hard origin/main