Common Path Patterns
The pathlib module, introduced in Python 3.4, added a class-based approach to path and file management. Most users of
Path never venture beyond the basics of joining paths together. That’s a shame, as the motivation behind the
pathlib module is to serve as more than just a replacement for the myriad functions in os and os.path.
Let’s take a look at some of the more useful parts of pathlib.
What is the difference between PurePath and Path?
One interesting problem you will encounter if you write cross-platform code is dealing with the peculiarities of each platform. The file system differences between POSIX (which you should interpret to mean platforms other than Windows in this instance) and Microsoft Windows are sometimes vast chasms and other times more or less the same. But as a package author you must reconcile them, somehow, and offer a unified interface.
To that end, the
pathlib module introduces two distinct types of classes you can instantiate directly if you want to keep your code platform-agnostic.
PurePath, for inspecting and manipulating abstract paths and filenames only. There are no I/O capabilities here, and no way to query the underlying file system. But if you need to construct or parse Windows paths on Linux, or vice versa, the
PurePath class is indispensable.
Path, which inherits from
PurePath, but adds file system and I/O support.
Interestingly, however, the two aforementioned classes are not the class instances you end up with if you create them. To ensure your code works everywhere the
pathlib module checks where the code is running and instantiates either:
PosixPath (or PurePosixPath) for non-Windows platforms; or
WindowsPath (or PureWindowsPath) for Windows.
Because of the distinction between
Pure and regular
Path objects, you cannot instantiate a
WindowsPath on Linux as Linux does not understand Windows-specific file system I/O (or the other way around) and it would result in errors or – worse – code that seems to work OK but silently fails or misbehaves. So if you want to manipulate platform-specific paths and filenames you must use the
PurePath-derived versions. For all other use cases you want Path.
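Here is a small sketch of that distinction. The filenames are made up for illustration, and the exact class you get from Path() depends on where you run it.

```python
import os
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath, PosixPath, WindowsPath

# Path() picks the concrete class for the platform the code runs on.
here = Path("example.txt")
print(type(here).__name__)  # "PosixPath" or "WindowsPath", depending on the platform

# Pure paths never touch the file system, so both flavours work anywhere.
win = PureWindowsPath(r"C:\Users\alice\notes.txt")
nix = PurePosixPath("/home/alice/notes.txt")
print(win.drive, win.name)
print(nix.parent)

# The concrete class for the *other* platform cannot be instantiated.
foreign = PosixPath if os.name == "nt" else WindowsPath
try:
    foreign("example.txt")
    foreign_failed = False
except NotImplementedError:
    foreign_failed = True
print(foreign_failed)
```

The pure classes are the ones to reach for when you only need to parse or build a path that belongs to another platform.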
Constructing and Altering Paths
Creating New Paths Explicitly with /
One of the more controversial features is the ability to join paths together using the overloaded division operator, /.
It’s useful. However, it assumes there is at least one Path-like object in the chain of operators, and that each division symbol has a Path-like object on its left-hand or right-hand side.
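As an illustration (the directory names are just placeholders), an expression of this shape joins cleanly:

```python
from pathlib import Path

# Strings on either side are fine as long as each "/" touches a Path.
bin_dir = "usr" / Path("local") / "bin"
print(bin_dir.parts)  # ('usr', 'local', 'bin')
```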
This works fine because the strings on the far left and right are adjacent to a Path object. This, however, is an error:
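A sketch of the failing case, again with placeholder names. The try/except is only there so the snippet runs to completion:

```python
from pathlib import Path

error = None
try:
    broken = "usr" / "local" / Path("bin")
except TypeError as exc:
    # The first "/" sees two plain strings, and str does not support division.
    error = exc
    print(error)
```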
You must take care when you join variables together in this way that you do not end up joining strings together instead of Path objects.
But if you alter the operator precedence it does work:
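For example, wrapping the right-hand pair in parentheses (same placeholder names as before) changes which operands meet first:

```python
from pathlib import Path

# The parentheses make "local" / Path("bin") run first, producing a Path
# that the leading string can then be joined to.
forced = "usr" / ("local" / Path("bin"))
print(forced.parts)  # ('usr', 'local', 'bin')
```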
It goes without saying that this is an obscure way of forcing the
Path to join the paths to its left. In your own code you should emphasize readability over convenience and not do this.
Programmatically Joining Paths
If you have to programmatically join paths together you cannot (well, not easily anyway) use the
/ operator. Instead you should use the joinpath() method.
The joinpath() method simply joins the path segments you pass it, so by all means include filenames at the end if that is what you are trying to do.
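A minimal sketch, assuming the segments arrive in a list at runtime (the names are hypothetical):

```python
from pathlib import Path

# Hypothetical segments gathered at runtime, e.g. from configuration.
segments = ["projects", "demo", "settings.toml"]

base = Path("home")
config_file = base.joinpath(*segments)
print(config_file.parts)  # ('home', 'projects', 'demo', 'settings.toml')
```

Unpacking the list with * is what makes this easier than chaining / operators in a loop.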
One word of caution when you use this method. If you use a leading
/ in one of the method arguments you will reset the root point of the path you generate. It’s a useful feature, but can cause hiccups if you’re not aware of it:
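To see the reset in action, here is a short sketch using PurePosixPath so that it behaves the same on every platform (the paths are illustrative):

```python
from pathlib import PurePosixPath

base = PurePosixPath("/usr/local")

relative = base.joinpath("share", "doc")
reset = base.joinpath("/etc", "hosts")

print(relative)  # /usr/local/share/doc
print(reset)     # /etc/hosts (the leading "/" discards /usr/local)
```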
Replacing the filename or extension
If you have a known
Path but you want to replace either the filename or extension, you can do so using the with_name() and with_suffix() methods.
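A brief sketch of both replacements, using an invented report path:

```python
from pathlib import PurePosixPath

report = PurePosixPath("/tmp/reports/summary.txt")

as_markdown = report.with_suffix(".md")
as_index = report.with_name("index.html")

print(as_markdown)  # /tmp/reports/summary.md
print(as_index)     # /tmp/reports/index.html
print(report)       # /tmp/reports/summary.txt (the original is unchanged)
```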
Listing or Iterating over Directories
A common activity is to walk through every file or directory given a starting Path.
Listing all the files in the current directory
The easiest method is with Path.iterdir().
It returns a generator that you must loop over, which is why it works well with large directories. If you require all the files at once you can pass the generator object through list().
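The sketch below lists a throwaway directory created with tempfile, so the file names are assumptions made purely for the demonstration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").touch()
    (root / "b.txt").touch()
    (root / "sub").mkdir()

    # iterdir() is lazy: entries are produced as the loop asks for them.
    for entry in root.iterdir():
        print(entry.name, "(dir)" if entry.is_dir() else "(file)")

    # Collect everything at once; sorted here because the order is arbitrary.
    names = sorted(entry.name for entry in root.iterdir())
    print(names)  # ['a.txt', 'b.txt', 'sub']
```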
Recursively listing files and directories
The simplest way is Path.rglob, as it takes a traditional “glob”-style pattern and recurses into sub-directories, yielding all filenames or paths that match the pattern. And unlike the classic
os.scandir, which forces you to write your own recursive function, this method is easy to use as it handles that for you.
The glob-style pattern matcher understands simple wildcard characters like * and ?.
As the scanner walks through each directory and filename in turn, the pattern is checked against that filename (or directory) only. You cannot, therefore, use a wildcard to match across directories and files: a pattern such as
dict*english will not match a directory beginning with dict followed by a filename ending in english.
Like Path.iterdir, this method returns a generator as it recursively walks through the tree (and a good thing too, as you may end up walking through tens of thousands of files). It therefore yields matches as it finds them instead of crawling the whole tree first and returning just the results that match.
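A small sketch of both points, run against a temporary tree whose layout is invented for the example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dictionary").mkdir()
    (root / "dictionary" / "english.txt").touch()
    (root / "dictionary" / "notes.md").touch()
    (root / "readme.txt").touch()

    # Recursive match on the filename alone.
    txt_files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
    print(txt_files)  # ['dictionary/english.txt', 'readme.txt']

    # The wildcard does not span the directory separator, so nothing matches.
    across = list(root.rglob("dict*english*"))
    print(across)  # []
```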
Reading and Writing to Files
The traditional way to
open() a file for reading is:
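Something along these lines, where the file is a temporary one created only so the snippet can run:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    filename = str(Path(tmp) / "greeting.txt")  # hypothetical file for the demo
    with open(filename, "w", encoding="utf-8") as f:
        f.write("hello\n")

    # The traditional approach: the built-in open() with a string filename.
    with open(filename, "r", encoding="utf-8") as f:
        contents = f.read()
    print(contents)
```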
But you can do this with a
Path object also. Most of the same arguments you pass to
open() also work with Path.open().
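The equivalent sketch with a Path object, again against a temporary file:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "greeting.txt"  # hypothetical file for the demo

    # mode and encoding are passed exactly as they would be to open().
    with path.open("w", encoding="utf-8") as f:
        f.write("hello\n")

    with path.open("r", encoding="utf-8") as f:
        first_line = f.readline()
    print(first_line)
```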
However, if you only need the contents of the file, and you do not need the still-open file object after the fact, you can simplify the code even further with Path.read_text().
Swap read for write, giving Path.write_text(), and you can write data.
And you can do the same with binary data, using Path.read_bytes() and Path.write_bytes().
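A round trip through all four helpers might look like this. The file names and contents are placeholders:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "notes.txt"
    text_file.write_text("first line\n", encoding="utf-8")
    text = text_file.read_text(encoding="utf-8")

    blob_file = Path(tmp) / "blob.bin"
    blob_file.write_bytes(b"\x00\x01\x02")
    blob = blob_file.read_bytes()

    print(repr(text))
    print(blob)
```

Each call opens the file, does its work, and closes it again, so there is no file object left to manage.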
Extracting and Splitting a Path
Operating on parts of a filename or filepath is common, and unlike the traditional functions found in os.path,
Path is designed to be user friendly, consistent and as unambiguous as possible from the start.
Extracting the Filename or File Extension
You can retrieve the outermost file extension, if the
Path has one, with the suffix property.
If there are multiple, such as with
.tar.gz, you can ask for all the suffixes as a list with the suffixes property.
This also works with zero or one suffixes, so if you always want your results in a list, you should prefer suffixes.
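A quick sketch with an invented archive name shows the difference between the two properties:

```python
from pathlib import PurePosixPath

archive = PurePosixPath("backups/site.tar.gz")
plain = PurePosixPath("README")

print(archive.suffix)       # .gz
print(archive.suffixes)     # ['.tar', '.gz']
print(repr(plain.suffix))   # ''
print(plain.suffixes)       # []
```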
Extracting the Filename
If you want the filename (without the extension) you can use the
stem property instead.
The stem property is clever enough to handle ‘hidden’ files that begin with a dot (.).
If you just want the filename portion of the path, but with the extension, use the name property.
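For instance, with a couple of made-up paths:

```python
from pathlib import PurePosixPath

archive = PurePosixPath("backups/site.tar.gz")
hidden = PurePosixPath("/home/alice/.bashrc")

print(archive.name)         # site.tar.gz
print(archive.stem)         # site.tar (only the last suffix is removed)
print(hidden.stem)          # .bashrc
print(repr(hidden.suffix))  # ''
```

Note that stem strips a single suffix, so a double extension needs a second pass if you want the bare name.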
Splitting the Path into its parts
You can split a path into all its component parts with the parts property.
This is particularly useful if you want to check that parts of a
Path exist within a certain hierarchy of directories.
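A sketch of that kind of check, assuming a hypothetical web server layout:

```python
from pathlib import PurePosixPath

script = PurePosixPath("/srv/www/site/static/app.js")
print(script.parts)  # ('/', 'srv', 'www', 'site', 'static', 'app.js')

# Is the file somewhere beneath a directory called "static"?
under_static = "static" in script.parts

# Does the path start with the expected hierarchy?
expected = ("/", "srv", "www")
starts_with_expected = script.parts[:len(expected)] == expected

print(under_static, starts_with_expected)  # True True
```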
Likewise, you can request the parent of a Path with the parent property.
Keep in mind that the parent of a Path that ends in a filename is simply the directory the file is in.
You can also request all of the parents with the parents property. As it’s a sequence, you must loop over it to reveal its elements; passing it to the
tuple() function consumes each element for you.
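Both properties, sketched against an illustrative path:

```python
from pathlib import PurePosixPath

script = PurePosixPath("/srv/www/site/app.js")

print(script.parent)  # /srv/www/site

# parents runs from the nearest directory up to the root.
all_parents = tuple(str(p) for p in script.parents)
print(all_parents)  # ('/srv/www/site', '/srv/www', '/srv', '/')
```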
Extracting the drive or root of a Path
You can query for the root on all platforms, though
drive is perhaps mostly useful on Windows.
On Linux, root will most likely return /.
And the drive is almost always an empty string.
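The pure classes make it easy to compare the two platforms side by side. The paths here are only examples:

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\Windows\System32")
nix = PurePosixPath("/usr/bin")

print(repr(win.drive), repr(win.root))  # 'C:' '\\'
print(repr(nix.drive), repr(nix.root))  # '' '/'
```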
If you quickly need a Path object set to one of several common defaults, you can use the class methods exposed on the Path class.
You can get the
cwd (current working directory) using the Path.cwd() class method.
You can also get the home directory on Linux or Windows with Path.home().
However, on Windows that may be any number of places, governed by several environment variables (USERNAME among them), as it uses a heuristic to try and infer the right place for it.
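Both class methods in a short sketch. The printed values depend entirely on your machine, so none are shown:

```python
from pathlib import Path

cwd = Path.cwd()    # current working directory, as an absolute Path
home = Path.home()  # the current user's home directory

print(cwd)
print(home)
```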
The Builder Pattern
One tantalizing feature of
Path – as I explained above – is that it consolidates many disparate functions into a standardized class that you can instantiate and query. One useful side-effect of that is what is known as the Builder Pattern.
The Builder Pattern is a common Object-Oriented Programming pattern. The
Path class implements a version of that. The basic idea is that when you instantiate a
Path class you are given an object that, in turn, can create more
Path objects based on the values already present in the object.
Consider this example:
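A minimal sketch of the idea, using a made-up project path:

```python
from pathlib import Path

p = Path("projects") / "demo" / "main.py"
parent = p.parent

print(type(p).__name__)       # PosixPath or WindowsPath, depending on the platform
print(type(parent).__name__)  # the same class again
print(parent.parts)           # ('projects', 'demo')
print(p.parts)                # ('projects', 'demo', 'main.py'), p itself is unchanged
```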
When you create a
Path object you are given either a PosixPath or a
WindowsPath object in return. I covered why that happens above.
Observe that creating a
Path object in turn gives you another
Path. That is the capstone of the builder pattern. Most methods or properties present on a
Path object will, in turn, give you a new
Path with the change you asked for applied to it.
It’s possible to chain these methods and properties to build new Path objects.
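For example, a chain like the following (the source layout is hypothetical) derives a build artifact path from a source file:

```python
from pathlib import PurePosixPath

source = PurePosixPath("/srv/app/src/module.py")

# Up two levels, into build/, keep the filename, swap the extension.
compiled = source.parent.parent.joinpath("build", source.name).with_suffix(".pyc")
print(compiled)  # /srv/app/build/module.pyc
```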
That’s a powerful feature and a useful way of constructing or altering paths in a programmatic, linear way. You can of course store parts of that chain in a function (or variable) and use them as a mini-factory:
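One way to sketch such a mini-factory is a small helper function. The name backup_path and the backup directory convention are assumptions made up for this example:

```python
from pathlib import PurePosixPath


def backup_path(original):
    """Return a sibling path inside a 'backup' directory with a .bak suffix.

    Hypothetical helper: it only rearranges the path, nothing is copied.
    """
    return original.parent.joinpath("backup", original.name).with_suffix(".bak")


first = backup_path(PurePosixPath("/data/reports/q1.csv"))
second = backup_path(PurePosixPath("/data/notes.txt"))

print(first)   # /data/reports/backup/q1.bak
print(second)  # /data/backup/notes.bak
```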
- The Path class makes it easy to interact with files and paths
By consolidating the most common file and path operations into a single class you won’t have to poke around in the os module and friends.
- Transparent cross-platform file and path support
You can use
Path to transparently handle both Linux and Windows paths without worry. If you must generate paths for a particular platform you can instantiate PurePosixPath or
PureWindowsPath, but you lose the ability to interact with the actual file system.
- Reading from, and writing to, files is also supported
If your needs are simple you can use the built-in file reader and writer in
Path. It also supports binary data. You can also
open a file directly via the
Path interface if you need access to the underlying file object.
- The builder pattern is a powerful way of constructing paths
Instead of string concatenation and awkwardly nesting function calls to
os and friends you can trivially construct and alter paths and filenames with a unified interface.