September 19th, 2019
When I was looking through the documentation of git commands, I noticed that many of them had an option for <pathspec>. I initially thought that this was just a technical way to say “path,” and assumed that it could only accept directories and filenames. After diving into the rabbit hole of documentation, I found that the pathspec option of git commands is capable of so much more.
The pathspec is the mechanism that git uses for limiting the scope of a git command to a subset of the repository. If you have used git much, you have likely used a pathspec whether you know it or not. For example, in the command git add README.md, the pathspec is README.md. However, pathspecs are capable of much more nuance and flexibility.
So, why should you learn about pathspecs? Since pathspecs are a part of many commands, these commands become much more powerful once you understand them. With git add, you can add just the files within a single directory. With git diff, you can examine just the changes made to files with an extension of .scss. You can git grep all files except for those in a given directory.
In addition, pathspecs can help with writing more generic git aliases. For example, I have an alias named git todo, which searches all of my repository files for the string 'todo'. However, I would like it to show all instances of the string, even if they are not within my current working directory. With pathspecs, we will see how this becomes possible.
File or directory
The most straightforward way to use a pathspec is with just a directory and/or filename. For example, with git add you can do the following, where ., src/, and README are the respective pathspecs for each command.
git add .      # add CWD (current working directory)
git add src/   # add src/ directory
git add README # add only the README file
You can also add multiple pathspecs to a command:
git add src/ server/ # adds both src/ and server/ directories
Sometimes, you may see a -- preceding the pathspec of a command. This removes any ambiguity about what is the pathspec and what is part of the command.
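To make the ambiguity concrete, here is a small sketch in a throwaway repository. The setup is hypothetical: I deliberately give a file and a tag the same name (v1) so that git cannot tell on its own which one I mean.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository, safe to delete afterwards
cd "$repo"
git init -q

# Hypothetical setup: a file and a tag that share the name "v1".
echo "release notes" > v1
git add v1
git -c user.name=Example -c user.email=example@example.com commit -q -m "add v1"
git tag v1

# Without --, git cannot tell the tag from the file and refuses to guess.
if git log --oneline v1 >/dev/null 2>&1; then
  ambiguous=no
else
  ambiguous=yes
fi

# Everything after -- is a pathspec: this is the history of the file v1.
file_log=$(git log --oneline -- v1)

# Everything before -- is a revision: this is the history reachable from the tag v1.
tag_log=$(git log --oneline v1 --)

echo "ambiguous without --: $ambiguous"
echo "$file_log"
```

In this tiny repository both logs happen to show the same single commit, but the two commands ask different questions, and the position of -- is what tells git which one is meant.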
In addition to files and directories, you can match patterns using *, ?, and []. The * symbol is used as a wildcard, and it will also match the / in paths; in other words, it will search through subdirectories.
git log '*.js'  # logs all .js files in CWD and subdirectories
git log '.*'    # logs all 'hidden' files and directories in CWD
git log '*/.*'  # logs all 'hidden' files and directories in subdirectories
The quotes are important, especially when using *! They prevent your shell (such as bash or ZSH) from attempting to expand the wildcards on its own. For example, let’s take a look at how git ls-files will list files with and without the quotes.
# example directory structure
#
# .
# ├── package-lock.json
# ├── package.json
# └── data
#     ├── bar.json
#     ├── baz.json
#     └── foo.json

git ls-files *.json
# package-lock.json
# package.json

git ls-files '*.json'
# data/bar.json
# data/baz.json
# data/foo.json
# package-lock.json
# package.json
Since the shell is expanding the * in the first command, git ls-files receives the command as git ls-files package-lock.json package.json. The quotes ensure that git is the one to resolve the wildcard.
You can also use the ? character as a wildcard for a single character. For example, to match either mp3 or mp4 files, you can do the following.
git ls-files '*.mp?'
You can also use bracket expressions to match a single character out of a set, such as [tj]. This will match either a t or a j.
git ls-files '*.[tj]s'
This will match either .ts files or .js files. In addition to listing individual characters, you can reference certain named collections of characters within bracket expressions. For example, you can use [:digit:] within a bracket expression to match any decimal digit, or you can use [:space:] to match any whitespace character.
git ls-files '*.mp[[:digit:]]'  # mp0, mp1, mp2, mp3, ..., mp9
git ls-files '*[[:space:]]*'    # matches any path containing a space
To read more about bracket expressions and how to use them, check out the GNU manual.
Pathspecs also have a special tool in their arsenal called “magic signatures,” which unlock some additional functionality. You invoke a magic signature by putting :(signature) at the beginning of your pathspec. If this doesn’t make sense, don’t worry: some examples will hopefully help clear it up.
The top signature tells git to match the pattern from the root of the git repository rather than the current working directory. You can also use the shorthand :/ rather than :(top).
git ls-files ':(top)*.js'
git ls-files ':/*.js' # shorthand
This will list all files in your repository that have an extension of .js. With the top signature, this can be called within any subdirectory in your repository. I find this to be especially useful when writing generic git aliases!
git config --global alias.js "ls-files -- ':(top)*.js'"
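The same idea makes the git todo alias from the introduction work from any subdirectory. Here is a minimal sketch in a throwaway repository. The file names are made up for illustration, and the exact grep flags are my assumption about how such an alias might be written; I also set the alias locally instead of with --global so the example does not touch your real configuration.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q

# Hypothetical project layout with a todo note in two different directories.
mkdir -p src docs
printf '// TODO: handle errors\n' > src/app.js
printf 'todo: write the guide\n' > docs/guide.md
git add .

# The :/ pathspec anchors the search at the repository root,
# no matter which directory the alias is run from.
git config alias.todo "grep -i -n 'todo' -- ':/'"

cd src
todo_hits=$(git todo)
printf '%s\n' "$todo_hits"
```

Run from src/, the alias should still report the note in docs/guide.md, shown with a path relative to the current directory.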
You can use
The icase signature tells git to not care about case when matching. This is handy when you don’t know or care which case a filename uses. For example, it is useful for matching jpg files, which sometimes use the uppercase extension JPG.
git ls-files ':(icase)*.jpg'
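A quick way to see the difference is to compare the same pattern with and without icase. This is only a sketch in a throwaway repository, and the two file names are hypothetical.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q

# Hypothetical files: one lowercase extension, one uppercase.
touch photo.jpg scan.JPG
git add .

# Plain wildcard: matching is case sensitive.
exact=$(git ls-files '*.jpg')

# With icase: both spellings of the extension match.
any_case=$(git ls-files ':(icase)*.jpg')
any_case_count=$(printf '%s\n' "$any_case" | wc -l | tr -d ' ')

echo "case sensitive:   $exact"
echo "case insensitive: $any_case_count files"
```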
The literal signature tells git to treat all of your characters literally. This would be used if you want to treat characters such as * and ? as themselves, rather than as wildcards. Unless your repository has filenames containing * or ?, I don’t expect that this signature would be used too often.
git log ':(literal)*.js' # returns log for the file '*.js'
When I started learning pathspecs, I noticed that wildcards worked differently than I was used to. Typically, I see a single asterisk * as a wildcard that does not match across directories, and consecutive asterisks (**) as a “deep” wildcard that does match names across directories. If you would prefer this style of wildcards, you can use the glob magic signature!
This can be useful if you want more fine-grained control over how you search through your project’s directory structure. As an example, take a look at how these two git ls-files commands can search through a React project.
git ls-files ':(glob)src/components/*/*.jsx'  # 'top level' jsx components
git ls-files ':(glob)src/components/**/*.jsx' # 'all' jsx components
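To see how the three styles differ, here is a small sketch in a throwaway repository. The component names are hypothetical; one component sits one directory deep and the other two directories deep.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q

# Hypothetical React layout.
mkdir -p src/components/Button src/components/forms/inputs
touch src/components/Button/Button.jsx
touch src/components/forms/inputs/Text.jsx
git add .

# Default wildcards: * also matches /, so both files are listed.
default_count=$(git ls-files 'src/components/*/*.jsx' | wc -l | tr -d ' ')

# glob with single *: exactly one directory level, so only Button.jsx.
shallow=$(git ls-files ':(glob)src/components/*/*.jsx')

# glob with **: any depth, so both files again.
deep_count=$(git ls-files ':(glob)src/components/**/*.jsx' | wc -l | tr -d ' ')

echo "default: $default_count, glob *: $shallow, glob **: $deep_count"
```

The middle command is the interesting one: only with the glob signature does a single * stop at directory boundaries.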
Git has the ability to set “attributes” on specific files. You can set these attributes using a .gitattributes file.
# .gitattributes
# sets the 'vendored' attribute (comments must be on their own line)
src/components/vendor/* vendored
src/styles/vendor/* vendored
The attr magic signature can set attribute requirements for your pathspec. For example, we might want to ignore the vendor files above.
git ls-files ':(attr:!vendored)*.js' # searches for non-vendored js files
git ls-files ':(attr:vendored)*.js'  # searches for vendored js files
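Here is a sketch that puts the attribute file and the attr signature together in a throwaway repository. The file names are hypothetical, and I use git check-attr only to show which attribute each file ends up with.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q

# Hypothetical layout: one authored file and one vendored file.
mkdir -p src/components/vendor
touch src/app.js src/components/vendor/lib.js
printf 'src/components/vendor/* vendored\n' > .gitattributes
git add .

# Show how git resolves the attribute for each file.
git check-attr vendored -- src/app.js src/components/vendor/lib.js

# attr:vendored keeps files with the attribute set;
# attr:!vendored keeps files where it is unspecified.
vendored_js=$(git ls-files ':(attr:vendored)*.js')
authored_js=$(git ls-files ':(attr:!vendored)*.js')

echo "vendored: $vendored_js"
echo "authored: $authored_js"
```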
Lastly, there is the “exclude” magic signature (shorthand :! or :^). This signature works differently from the rest of the magic signatures. After all other pathspecs have been resolved, all pathspecs with an exclude signature are resolved and then removed from the returned paths. For example, you can search through all of your .js files while excluding the .spec.js test files.
git grep 'foo' -- '*.js' ':(exclude)*.spec.js' # search .js files excluding .spec.js
git grep 'foo' -- '*.js' ':!*.spec.js'         # shorthand for the same
There is nothing limiting you from using multiple magic signatures in a single pathspec! You can use multiple signatures by separating your magic words with commas within your parentheses. For example, you can do the following if you’d like to match from the base of your repository (using top), case insensitively (using icase), using only authored code (ignoring vendor files with attr), and using glob-style wildcards (using glob).
git ls-files -- ':(top,icase,glob,attr:!vendored)src/components/*/*.jsx'
The only two magic signatures that you are unable to combine are glob and literal, since they both affect how git deals with wildcards. This is referenced in the git glossary with perhaps my favorite sentence that I have ever read in any documentation.
Glob magic is incompatible with literal magic.
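If you are curious what that incompatibility looks like in practice, this sketch tries the combination in a throwaway repository. I only check whether git accepts the pathspec, since the exact error text may vary between git versions.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
touch app.js
git add app.js

# Ask for both glob and literal magic in one pathspec.
if git ls-files ':(glob,literal)*.js' >/dev/null 2>&1; then
  combined=accepted
else
  combined=rejected
fi

echo "glob + literal: $combined"
```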