Reducing the size of a repository

The best way to keep your repository at a manageable size is to use the .gitignore feature to make sure large files never become part of your Git repository in the first place. But sometimes it is too late, because you already committed large files in the past. Or maybe you legitimately needed some larger files at some point, but they are now obsolete.
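As a minimal sketch of that first line of defence, the commands below set up a throwaway repository, ignore a hypothetical dist/ build folder and *.zip archives, and then ask Git which rule matches. The patterns and file names are assumptions for illustration only; adapt them to your project.

```shell
# Throwaway repository so the example does not touch a real project.
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"

# Hypothetical patterns: a build folder and archive files.
printf '%s\n' 'dist/' '*.zip' > .gitignore

# Create files that should be ignored.
mkdir dist
touch dist/bundle.js release.zip

# check-ignore -v prints the matching rule for each ignored path.
git check-ignore -v dist/bundle.js release.zip

# Ignored files are not reported as untracked; only .gitignore itself is.
untracked=$(git ls-files --others --exclude-standard)
echo "$untracked"
```

Note that .gitignore only affects untracked files. Files that were already committed stay in the repository and its history.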

Tip

The removal of files from the Git history is also useful if you accidentally pushed secrets such as passwords, API keys or other private information to your repository. Without rewriting the history, this information would permanently linger in your Git history.

Either way, you might want to permanently remove files from your Git repository, to shrink its overall size or to remove individual files that weren't meant to be committed to it.

The first step is to delete the files from the current state of your branch, using the regular git rm filename.txt approach. Once these files are no longer in the current HEAD of your branch, you can rewrite the history of your Git repository. This ensures that the files do not remain in the overall Git history, where they would continue to take up space or, in the case of accidentally committed secrets, remain accessible to others.
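To illustrate why git rm alone is not enough, here is a small sketch using a throwaway repository. The file name big.bin and the demo identity are hypothetical. After the file has been removed from HEAD, the history still contains the commits that touch it.

```shell
# Throwaway demo repository; all names here are hypothetical.
demo=$(mktemp -d)
git init --quiet "$demo"
cd "$demo"
git config user.name "Demo User"
git config user.email "demo@example.invalid"

# Commit a 1 MiB file, then remove it the regular way.
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit --quiet -m "Add big.bin"
git rm --quiet big.bin
git commit --quiet -m "Remove big.bin"

# The file is gone from the current state of the branch ...
git ls-files

# ... but the history still references it (one commit adds it, one removes it).
git log --oneline -- big.bin
history_commits=$(git rev-list --count HEAD -- big.bin)
echo "commits touching big.bin: $history_commits"
```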

Warning

Rewriting your Git history is a destructive process. You should make a backup of your repository before attempting to rewrite the history.

Additionally, once you have rewritten the history, everyone who has made a working copy of your repository will have to ensure that their copy has the correct, new history.

If your repository has open pull requests, this will introduce conflicts for these, too.

Getting started

Installing the necessary tools

Install git-filter-repo, which will be used to actually perform the history rewrites. Optionally, you can also install git-sizer, if you want to find large files that are no longer used in your history.
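How you install these tools depends on your system, so no install command is assumed here. As a quick sanity check, a sketch like the following reports which tools are available on your PATH. The function name check_tools is made up for this example.

```shell
# Report which of the tools used in this guide are on the PATH.
check_tools() {
  for tool in git git-filter-repo git-sizer; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
    fi
  done
}

check_tools
```

This only looks at the PATH. Depending on how git-filter-repo was installed, it may still be reachable as a Git subcommand even if the check reports it as missing.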

Make a mirror clone of the repository

This step will ensure that you get a full copy of your repository, including all your references.

You can use git clone with the --mirror flag to create such a clone, e.g.:

git clone --mirror git@codeberg.org:your_user_name/your_repo.git

It's important that you clone your repository using the --mirror flag to be able to rewrite the history.
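To see what the --mirror flag does, the following sketch clones a small local stand-in repository instead of a Codeberg URL. All paths and the demo identity are hypothetical. A mirror clone is a bare repository whose origin remote is flagged as a mirror.

```shell
# Stand-in for the remote: a small local repository (hypothetical names).
work=$(mktemp -d)
git init --quiet "$work/source"
git -C "$work/source" -c user.name="Demo User" -c user.email="demo@example.invalid" \
  commit --quiet --allow-empty -m "Initial commit"

# Mirror-clone it, as you would with the Codeberg URL.
git clone --quiet --mirror "$work/source" "$work/mirror.git"

# A mirror clone is bare and flags its remote as a mirror.
is_bare=$(git -C "$work/mirror.git" rev-parse --is-bare-repository)
is_mirror=$(git -C "$work/mirror.git" config remote.origin.mirror)
echo "bare: $is_bare, mirror: $is_mirror"
```

Both values should be reported as true. The remote.origin.mirror setting is the one that gets unset again before pushing the rewritten history.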

Identifying files to remove & removing them

Optionally: Run git-filter-repo’s analyze command.

This optional step can help you identify files that have already been deleted from the current state of your branch, but that are still in the history. For example, it helps you find large build files or assets that were accidentally committed and have already been deleted, but still take up space in the history.

To identify such files, you can run:

git-filter-repo --analyze
head filter-repo/analysis/*-{all,deleted}-sizes.txt

# and/or

git-sizer -v
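If neither tool is at hand, a rough alternative is to list the largest blobs with plain Git commands. This is a sketch of a different technique, not a replacement for the analysis reports above. The function name largest_blobs is made up for this example, and the sizes shown are uncompressed object sizes, not the space used on disk.

```shell
# List the N largest blobs anywhere in the history (default: 10).
# Output columns: size in bytes, path.
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -n -r |
    head -n "${1:-10}"
}
```

Run it from inside the mirror clone, e.g. largest_blobs 20 for the twenty largest blobs. Paths that show up here but no longer exist in your current HEAD are candidates for removal.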

Run git-filter-repo to rewrite the history and permanently remove the files

This is the actual history rewrite, which takes place on your local mirror clone. As a reminder: you should only use git-filter-repo to remove files from your repository's history that are no longer in use in your current HEAD.

For example, let's assume that at some point you accidentally committed a folder dist/. You have already removed it from the current state of your repository using git rm -r dist/, but to minimize the size of your repo, you now also want to remove it from your history.

To achieve this, you can run:

git-filter-repo --path dist/ --invert-paths

You give dist/ as the path, and using --invert-paths you tell git-filter-repo that you want to keep all files, except the ones specified using --path.

  1. Run git-filter-repo --analyze and/or git-sizer again to check that the repository size has indeed been reduced:

    git-filter-repo --analyze --force
    head filter-repo/analysis/*-{all,deleted}-sizes.txt
    
    # and/or
    
    git-sizer -v
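Besides the size, you may also want to confirm that a specific path is really gone, for instance when removing secrets. The following sketch uses plain Git to check whether any commit reachable from any ref still touches a path. The function name path_gone_from_history is made up for this example, and dist/ is the example path from above.

```shell
# Succeeds (exit status 0) if no commit on any ref touches the given path.
path_gone_from_history() {
  count=$(git rev-list --all --count -- "$1")
  [ "$count" -eq 0 ]
}

if path_gone_from_history dist/; then
  echo "dist/ no longer appears in the history"
else
  echo "dist/ is still referenced by at least one commit"
fi
```

This only looks at commits reachable from refs in the repository you run it in, so run it inside the rewritten mirror clone.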

So far, all of these changes have been applied only to your local mirror copy of your repository. Once you are happy with the rewritten history (e.g. the size has been reduced or the secrets have been removed), you can push those changes to Codeberg.

Replacing the history on Codeberg

Warning

The following step replaces your repository's history on Codeberg in a destructive operation.

Ensure that you have a pre-rewrite backup of your repository somewhere, otherwise you will not be able to undo in case anything goes wrong.

  1. Turn off the mirror flag and carry out force pushes to your remote

    git config --unset remote.origin.mirror
    git push origin --force 'refs/heads/*'
    git push origin --force 'refs/tags/*'
    git push origin --force 'refs/replace/*'

Info

Remember: Everyone who has a working copy of your repository will now need to move to that new history as well.

For people who don't have any ongoing local work, the easiest way to ensure the correct history is to check out a fresh clone.

For people who have ongoing local work, the following steps should work, unless that work includes now-deleted files:

  1. git fetch the remote (not pull, to avoid errors/warnings)
  2. git checkout the local branch that has current work
  3. git rebase origin/branch with origin being the remote for the repository and branch being the branch you are working against.

This will rebase the local work onto the remote's main branch (or whichever branch you are working against).


© Codeberg Docs Contributors. See LICENSE