Categories
git git-filter-branch git-subtree

Detach (move) subdirectory into separate Git repository

1914

I have a Git repository which contains a number of subdirectories. Now I have found that one of the subdirectories is unrelated to the others and should be detached into a separate repository.

How can I do this while keeping the history of the files within the subdirectory?

I guess I could make a clone and remove the unwanted parts of each clone, but I suppose this would give me the complete tree when checking out an older revision etc. This might be acceptable, but I would prefer to be able to pretend that the two repositories don’t have a shared history.

Just to make it clear, I have the following structure:

XYZ/
    .git/
    XY1/
    ABC/
    XY2/

But I would like this instead:

XYZ/
    .git/
    XY1/
    XY2/
ABC/
    .git/
    ABC/

3

  • 9

    This is trivial now with git filter-branch; see my answer below.

    Aug 20, 2014 at 14:12

  • 14

    @jeremyjjbrown is right. This is no longer difficult to do but it is difficult to find the right answer on Google because all the old answers dominate the results.

    Oct 14, 2014 at 5:39

  • 3

    Use of git filter-branch is discouraged. See warning in docs.

    – djvg

    Nov 5, 2021 at 16:59

1266

Update: This process is so common that the Git team made it much simpler with a new tool, git subtree. See here: Detach (move) subdirectory into separate Git repository
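The linked answer is not reproduced here, so the following is only a sketch of the usual git subtree pattern, not necessarily what that answer recommends. It builds a throwaway stand-in for XYZ under a temporary directory; the file names, commit messages, the branch name abc-only and the Demo identity are all hypothetical. git subtree ships in Git's contrib area and is not installed everywhere, so the script checks for it first.

```shell
#!/bin/sh
# Sketch only: splits ABC/ out of a throwaway repository with git subtree.
set -eu

# git subtree is a contrib command; skip quietly if it is not installed.
if [ ! -x "$(git --exec-path)/git-subtree" ]; then
    echo "git subtree is not installed; skipping" >&2
    exit 0
fi

# Hypothetical identity so commits work on a machine without Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Build a stand-in for XYZ with two subdirectories.
git init -q "$work/XYZ"
cd "$work/XYZ"
mkdir XY1 ABC
echo one > XY1/a.txt
git add .
git commit -q -m "add XY1"
echo two > ABC/b.txt
git add .
git commit -q -m "add ABC"
echo three >> ABC/b.txt
git commit -q -a -m "edit ABC"

# Create a branch whose history contains only the contents of ABC/.
git subtree split --prefix=ABC -b abc-only

# Pull that branch into a fresh, unrelated repository.
git init -q "$work/ABC"
cd "$work/ABC"
git pull -q "$work/XYZ" abc-only

git log --oneline   # only the commits that touched ABC/
git ls-files        # b.txt, now at the repository root
```

Unlike the filter-branch route, this leaves the original repository's existing branches untouched; it only adds the abc-only branch.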


You want to clone your repository and then use git filter-branch to rewrite its history so that everything except the subdirectory you want in your new repo becomes unreferenced and can be garbage-collected.

  1. To clone your local repository:

    git clone /XYZ /ABC
    

    (Note: the repository will be cloned using hard-links, but that is not a problem since the hard-linked files will not be modified in themselves – new ones will be created.)

  2. Now, let us preserve the interesting branches which we want to rewrite as well, and then remove the origin to avoid pushing there and to make sure that old commits will not be referenced by the origin:

    cd /ABC
    for i in branch1 br2 br3; do git branch -t $i origin/$i; done
    git remote rm origin
    

    or for all remote branches:

    cd /ABC
    for i in $(git branch -r | grep -v -- '->' | sed "s/.*origin\///"); do git branch -t "$i" "origin/$i"; done
    git remote rm origin
    
  3. Now you might want to also remove tags which have no relation with the subproject; you can also do that later, but you might need to prune your repo again. I did not do so and got a WARNING: Ref 'refs/tags/v0.1' is unchanged for all tags (since they were all unrelated to the subproject); additionally, after removing such tags more space will be reclaimed. Apparently git filter-branch should be able to rewrite other tags, but I could not verify this. If you want to remove all tags, use git tag -l | xargs git tag -d.

  4. Then use filter-branch and reset to exclude the other files, so they can be pruned. Let’s also add --tag-name-filter cat to rewrite tags (note that this will have to strip their signatures) and --prune-empty to remove empty commits:

    git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter ABC -- --all
    

    or alternatively, to only rewrite the HEAD branch and ignore tags and other branches:

    git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter ABC HEAD
    
  5. Then delete the backup refs and reflogs so the space can actually be reclaimed (note that the operation is now destructive):

    git reset --hard
    git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
    git reflog expire --expire=now --all
    git gc --aggressive --prune=now
    

    and now you have a local git repository of the ABC sub-directory with all its history preserved.
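The steps above can be tried end to end without touching a real project. The sketch below builds a throwaway stand-in for XYZ under a temporary directory and then runs the same commands against a clone of it. The file names, commit messages, the feature branch, the v0.1 tag placement and the Demo identity are hypothetical; FILTER_BRANCH_SQUELCH_WARNING only skips the deprecation notice and pause that newer Git versions show before filter-branch runs.

```shell
#!/bin/sh
# Sketch only: runs the steps above against a throwaway repository.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
# Hypothetical identity so commits work on a machine without Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Build a stand-in for XYZ: two subdirectories, a second branch and a tag.
git init -q "$work/XYZ"
cd "$work/XYZ"
mkdir XY1 ABC
echo one > XY1/a.txt
git add .
git commit -q -m "add XY1"
echo two > ABC/b.txt
git add .
git commit -q -m "add ABC"
git tag v0.1
git branch feature
echo three >> ABC/b.txt
git commit -q -a -m "edit ABC"

# Steps 1-2: clone, keep the extra branch, then drop the origin.
git clone -q "$work/XYZ" "$work/ABC"
cd "$work/ABC"
git branch -t feature origin/feature
git remote rm origin

# Step 4: keep only the contents of ABC/ on every branch and tag.
git filter-branch --tag-name-filter cat --prune-empty \
    --subdirectory-filter ABC -- --all

# Step 5: delete the backup refs and reflogs, then repack.
git reset -q --hard
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

git log --oneline --all --decorate   # "add XY1" is gone from every ref
git ls-files                         # b.txt, now at the repository root
```

Because the demo tag sits on a commit that touched ABC, it is rewritten along with the branches; tags that point at unrelated commits produce the "is unchanged" warning described in step 3.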

Note: For most uses, git filter-branch should indeed have the added parameter -- --all. Yes, that really is two dashes, a space, and then --all. These must be the last parameters of the command. As Matli discovered, this keeps the project branches and tags included in the new repo.

Edit: various suggestions from comments below were incorporated to make sure, for instance, that the repository is actually shrunk (which was not always the case before).

28

  • 13

    Why do you need --no-hardlinks? Removing one hardlink won’t affect the other file. Git objects are immutable too. You only need --no-hardlinks if you change the owner or file permissions.

    – vdboor

    Feb 1, 2010 at 9:58


  • 2

    And if you want to rewrite your tags to not reference the old structure, add --tag-name-filter cat

    Oct 6, 2011 at 11:58

  • 8

    Like Paul, I did not want project tags in my new repo, so I did not use -- --all. I also ran git remote rm origin, and git tag -l | xargs git tag -d before the git filter-branch command. This shrunk my .git directory from 60M to ~300K. Note that I needed to run both of these commands in order to get the size reduction.

    Nov 17, 2011 at 21:18

  • 2

    The git man page recommends, instead of rm -rf .git/refs/original/, git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d; I guess the latter is more robust if refs are not stored in the right place. Moreover, I believe that ‘git remote rm origin’ is also needed to shrink the repo, otherwise the refs from origin will keep objects referenced. @jonp, I think that was the problem for you. Finally, to also rewrite other branches, one must set them up manually with git branch after cloning, add -- --all, and remove HEAD (which stops rewriting of other branches).

    Feb 20, 2012 at 0:07

  • 3

    Does this not create ABC/ instead of ABC/ABC/?

    May 21, 2013 at 8:39

140

Paul’s answer creates a new repository containing /ABC, but does not remove /ABC from within /XYZ. The following command will remove /ABC from within /XYZ:

git filter-branch --tree-filter "rm -rf ABC" --prune-empty HEAD

Of course, test it in a ‘clone --no-hardlinks’ repository first, and follow it with the reset, gc and prune commands Paul lists.
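As a way to try this safely, the sketch below runs the command above on a --no-hardlinks clone of a throwaway repository. The repository layout, file names, commit messages, the XYZ-trimmed directory name and the Demo identity are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING only skips the deprecation notice and pause shown by newer Git versions.

```shell
#!/bin/sh
# Sketch only: removes ABC/ from every commit on HEAD in a throwaway clone.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
# Hypothetical identity so commits work on a machine without Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Build a stand-in for XYZ in which one commit touches only ABC/.
git init -q "$work/XYZ"
cd "$work/XYZ"
mkdir XY1 ABC
echo one > XY1/a.txt
git add .
git commit -q -m "add XY1"
echo two > ABC/b.txt
git add .
git commit -q -m "add ABC"
echo three >> XY1/a.txt
git commit -q -a -m "edit XY1"

# Work on a copy first, as advised above.
git clone -q --no-hardlinks "$work/XYZ" "$work/XYZ-trimmed"
cd "$work/XYZ-trimmed"

# Delete ABC/ from each commit; commits left empty are dropped.
git filter-branch --tree-filter "rm -rf ABC" --prune-empty HEAD

git log --oneline   # "add ABC" is pruned because it became empty
git ls-files        # only XY1/a.txt remains
```

The old history is still reachable through refs/original/ and the reflog at this point, which is why the cleanup commands from Paul's answer are needed before any space is reclaimed.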

5

  • 54

    Make that git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch ABC" --prune-empty HEAD and it will be much faster. index-filter works on the index, while tree-filter has to check out and stage everything for every commit.

    – fmarc

    Sep 17, 2009 at 19:58

  • 51

    In some cases messing up the history of repository XYZ is overkill … just a simple "rm -rf ABC; git rm -r ABC; git commit -m 'extracted ABC into its own repo'" would work better for most people.

    – Evgeny

    Oct 28, 2010 at 23:24

  • 2

    You probably wish to use -f (force) on this command if you do it more than once, e.g., to remove two directories after they have been separated. Otherwise you will get “Cannot create a new backup.”

    Apr 18, 2011 at 17:59


  • 4

    If you’re doing the --index-filter method, you may also want to make that git rm -q -r -f, so that each invocation won’t print a line for each file it deletes.

    Oct 12, 2011 at 19:55

  • 1

    I would suggest editing Paul’s answer, only because Paul’s is so thorough.

    Mar 5, 2014 at 15:38