Common Path Patterns

The pathlib module, introduced in Python 3.4, added a class-based approach to path and file management. Most users of Path never venture beyond the basics of joining paths together. That’s a shame, as the motivation behind the pathlib module is to serve as more than just a replacement for the myriad functions in os and os.path.

Let’s take a look at some of the more useful parts of pathlib.

What is the difference between PurePath and Path?

One interesting problem you will encounter if you write cross-platform code is dealing with the peculiarities of each platform. The file system differences between POSIX (which you should interpret to mean platforms other than Windows in this instance) and Microsoft Windows are sometimes vast chasms and other times more or less the same. But as a package author you must reconcile them, somehow, and offer a unified interface.

To that end, the pathlib module introduces two distinct types of classes you can instantiate directly if you want to keep your code platform-agnostic.

  1. PurePath, for inspecting and manipulating abstract paths and filenames only. There are no I/O capabilities here, and no way to query the underlying file system. But if you need to construct or parse Windows paths on Linux, or vice versa, the PurePath class is indispensable.

  2. Path, which inherits from PurePath, but adds file system and I/O support.

Interestingly, however, the two aforementioned classes are not the classes of the objects you end up with when you instantiate them. To ensure your code works everywhere, the pathlib module checks where the code is running and instantiates either PosixPath (or PurePosixPath) on non-Windows platforms, or WindowsPath (or PureWindowsPath) on Windows.

Because of the distinction between Pure and regular Path objects, you cannot instantiate a WindowsPath on Linux (or a PosixPath on Windows). Linux does not understand Windows-specific file system I/O, and allowing it would result in errors or, worse, code that seems to work OK but silently fails or misbehaves. So if you want to manipulate paths and filenames for a platform other than the one you are running on, you must use the PurePath-derived versions. For all other use cases you want Path.
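To make the distinction concrete, here is a minimal sketch that parses a Windows path and a POSIX path with the pure classes. It should behave the same on any platform because nothing touches the file system. The example paths are made up for illustration.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Pure classes never touch the disk, so both can be created on any platform.
win = PureWindowsPath(r'C:\Users\inspiredpython\notes.txt')
posix = PurePosixPath('/home/inspiredpython/notes.txt')

print(win.drive)     # C:
print(win.name)      # notes.txt
print(posix.parent)  # /home/inspiredpython

# Path() picks the concrete class that matches the running platform.
concrete_name = type(Path('.')).__name__
print(concrete_name)  # PosixPath or WindowsPath, depending on where this runs
```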

Constructing and Altering Paths

Creating New Paths Explicitly with /

One of the more controversial features is the ability to join paths together using the overloaded division operator, /:

>>> Path('/tmp') / 'hello.txt'
PosixPath('/tmp/hello.txt')

It’s useful. However, it requires at least one Path-like object in the chain of operators, and every / must have a Path-like object on its left-hand or right-hand side at the moment it is evaluated:

>>> '/tmp' / Path('hello') / 'bar.txt'
PosixPath('/tmp/hello/bar.txt')

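This works fine because the strings on the far left and right are each adjacent to a Path object. The following, however, is an error, because Python evaluates / from left to right and so tries to divide the two strings first: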

>>> '/tmp' / 'hello' / Path('world.txt')
TypeError: unsupported operand type(s) for /: 'str' and 'str'

You must take care when you join variables together in this way that you do not end up joining strings together instead of Path objects.

But if you change the grouping with parentheses it does work:

>>> '/tmp' / ('hello' / Path('world.txt'))
PosixPath('/tmp/hello/world.txt')

It goes without saying that this is an obtuse way of forcing the Path to join paths to its left. In your own code you should emphasize readability over convenience and not do this.
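A more readable alternative, sketched below, is to convert the leftmost operand to a path object before joining. The variable names are hypothetical, and PurePosixPath is used only so the output is the same on every platform; Path works the same way.

```python
from pathlib import PurePosixPath

# Hypothetical string variables, e.g. read from a config file.
base = '/tmp'
folder = 'hello'
filename = 'world.txt'

# Converting the first operand guarantees every / has a path object on its left.
target = PurePosixPath(base) / folder / filename
print(target)  # /tmp/hello/world.txt
```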

Programmatically Joining Paths

If you have to programmatically join paths together you cannot – well, not easily anyway – use the / operator. Instead you should use the Path.joinpath() method:

>>> Path('/home/inspiredpython').joinpath('documents', 'hello.txt')
PosixPath('/home/inspiredpython/documents/hello.txt')

The joinpath() method simply appends each argument as a further path segment, so by all means include a filename at the end if that is what you are trying to do.

One word of caution when you use this method. If one of the arguments is an absolute path (on POSIX, one with a leading /), the result is reset to that argument and everything before it is discarded. It’s a useful feature, but can cause hiccups if you’re not aware of it:

>>> Path('/home').joinpath('/tmp', 'myfile.txt')
PosixPath('/tmp/myfile.txt')
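If the segments come from a list, they can be unpacked into joinpath(). The sketch below also guards against the root-reset behavior; join_under is a hypothetical helper, not part of pathlib, and it uses PurePosixPath so it gives the same result on any platform.

```python
from pathlib import PurePosixPath

def join_under(base, segments):
    """Join segments under base, refusing any segment that would reset the root."""
    for segment in segments:
        if PurePosixPath(segment).is_absolute():
            raise ValueError(f'absolute segment not allowed: {segment!r}')
    return PurePosixPath(base).joinpath(*segments)

joined = join_under('/home', ['inspiredpython', 'documents', 'hello.txt'])
print(joined)  # /home/inspiredpython/documents/hello.txt

try:
    join_under('/home', ['/tmp', 'myfile.txt'])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # absolute segment not allowed: '/tmp'
```

Note that this check only covers absolute segments; it does not stop a segment such as .. from pointing outside the base directory.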

Replacing the filename or extension

If you have a known Path but you want to replace either the filename or extension, you can do so with Path.with_name or Path.with_suffix:

>>> Path('/home/inspiredpython/test.py').with_name('hello.txt')
PosixPath('/home/inspiredpython/hello.txt')
>>> Path('/home/inspiredpython/test.py').with_suffix('.txt')
PosixPath('/home/inspiredpython/test.txt')
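Two related variations are worth sketching: passing an empty string to with_suffix removes the extension, and with_name can be combined with name to append to the existing filename. The file names here are made up for illustration.

```python
from pathlib import PurePosixPath

source = PurePosixPath('/home/inspiredpython/report.md')

html = source.with_suffix('.html')               # swap the extension
bare = source.with_suffix('')                    # drop the extension entirely
backup = source.with_name(source.name + '.bak')  # append to the full filename

print(html)    # /home/inspiredpython/report.html
print(bare)    # /home/inspiredpython/report
print(backup)  # /home/inspiredpython/report.md.bak
```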

Listing or Iterating over Directories

A common activity is to walk through every file or directory given a starting Path.

Listing all the files in the current directory

The easiest method is with Path.iterdir:

>>> list(Path('/home/inspiredpython/').iterdir())
[PosixPath('/home/inspiredpython/.bashrc'), ...]

It returns a generator that yields one entry (file or directory) at a time, which is why it works well with large directories. If you need all the entries at once you can do as above and pass the generator object through list().
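Because iterdir yields files and directories alike, you will often want to filter the results. Here is a small sketch that builds a throwaway directory (so it can run anywhere) and separates the two with is_file() and is_dir():

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create a couple of files and one sub-directory to list.
    (root / 'notes.txt').write_text('hello', encoding='utf-8')
    (root / 'data.csv').write_text('a,b', encoding='utf-8')
    (root / 'subdir').mkdir()

    # iterdir makes no ordering guarantee, so sort for a stable result.
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    dirs = sorted(p.name for p in root.iterdir() if p.is_dir())

print(files)  # ['data.csv', 'notes.txt']
print(dirs)   # ['subdir']
```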

Recursively listing files and directories

The Path.rglob method takes a traditional “glob”-style pattern and recurses into sub-directories, yielding every file or directory path that matches the pattern. And unlike the classic os.scandir, which forces you to write your own recursive function, rglob is easy to use as it handles the recursion for you.

The glob-style pattern matcher understands simple wildcard characters like * and ?.

>>> list(Path('/usr/share/dict/').rglob('*english'))
[PosixPath('/usr/share/dict/british-english'),
 PosixPath('/usr/share/dict/american-english')]

As the scanner walks through each directory and filename in turn, a single-component pattern like the one above is checked against that filename (or directory name) only. A wildcard such as * never matches a path separator, so you cannot use it to match across a directory and a filename: dict*english will not match dict/british-english.

Like Path.iterdir, this method returns a generator as it recursively walks through the tree, and a good thing too, as you may end up walking through tens of thousands of files. It yields matches as it finds them instead of crawling the whole tree first and then returning just the results that match.
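Here is a minimal sketch of both points: matching recursively, and taking only the first few results from the generator with itertools.islice. It builds a temporary tree with made-up file names so it can run anywhere.

```python
import tempfile
from itertools import islice
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'a' / 'b').mkdir(parents=True)
    for rel in ('top.txt', 'a/mid.txt', 'a/b/deep.txt', 'a/b/skip.log'):
        (root / rel).write_text('', encoding='utf-8')

    # Every .txt file at any depth, shown relative to the starting directory.
    matches = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.txt'))

    # The generator is lazy, so you can stop after the first two hits.
    first_two = list(islice(root.rglob('*.txt'), 2))

print(matches)         # ['a/b/deep.txt', 'a/mid.txt', 'top.txt']
print(len(first_two))  # 2
```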

Reading and Writing to Files

The traditional way to open() a file for reading is:

with open('/tmp/hello.txt') as f:
    text = f.read()

But you can also do this with a Path object. Most of the same arguments you pass to open() also work with Path.open():

with Path('/tmp/hello.txt').open('r', encoding='iso-8859-1') as f:
    text = f.read()

However, if you only need the contents of the file, and if you do not need the still-open file object after the fact, you can simplify the code even further:

text = Path('/tmp/hello.txt').read_text()

Replace read with write and you can write data:

text = "Hello, World!"
Path('/tmp/hello.txt').write_text(text)

And you can do the same with binary data, using Path.write_bytes() and Path.read_bytes(), if you have a bytes object:

import os
binary = os.urandom(32)
Path('/tmp/hello.bin').write_bytes(binary)
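Putting the text and binary helpers together, here is a small round-trip sketch. It writes into a temporary directory rather than /tmp so it can run on any platform, and it passes an explicit encoding, which is a choice made for this example rather than something the methods require.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Text round trip with an explicit encoding.
    text_file = Path(tmp) / 'hello.txt'
    text_file.write_text('Héllo, World!', encoding='utf-8')
    text = text_file.read_text(encoding='utf-8')

    # Binary round trip.
    bin_file = Path(tmp) / 'hello.bin'
    bin_file.write_bytes(b'\x00\x01\x02')
    data = bin_file.read_bytes()

print(text)  # Héllo, World!
print(data)  # b'\x00\x01\x02'
```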

Extracting and Splitting a Path

Operating on parts of a filename or filepath is common, and unlike the traditional functions found in os.path, Path is designed to be user friendly, consistent and as unambiguous as possible from the start.

Extracting the Filename or File Extension

You can retrieve the outer-most file extension, if the Path has one, with:

>>> Path('inspiredpython.txt').suffix
'.txt'

If there are multiple – such as with .tar.gz – you can ask for all the suffixes as a list.

>>> Path('archive.tar.gz').suffixes
['.tar', '.gz']

This also works with zero or one suffixes, so if you always want your results in a list, you should prefer suffixes to suffix.

>>> Path('README').suffixes
[]

Extracting the Filename

If you want the filename (without the extension) you can use the stem property instead:

>>> Path('README.md').stem
'README'

The stem property is clever enough to handle ‘hidden’ files that begin with a .:

>>> Path('.bashrc').stem
'.bashrc'

If you just want the filename portion of the path, but with the extension, use name:

>>> Path('/opt/hello.txt').name
'hello.txt'
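Note that stem only removes the last suffix, so archive.tar.gz has the stem archive.tar. If you want the name with every suffix removed, one possible approach is sketched below. base_name is a hypothetical helper, not part of pathlib.

```python
from pathlib import PurePosixPath

def base_name(path):
    """Return the filename with every suffix removed."""
    p = PurePosixPath(path)
    extensions = ''.join(p.suffixes)  # e.g. '.tar.gz', or '' if there are none
    return p.name[:len(p.name) - len(extensions)]

print(PurePosixPath('archive.tar.gz').stem)   # archive.tar
print(base_name('/backups/archive.tar.gz'))   # archive
print(base_name('README'))                    # README
print(base_name('.bashrc'))                   # .bashrc
```

Be aware that this treats every dot-separated piece as a suffix, so a name like my.file.v2.txt would be reduced to my.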

Splitting the Path into its parts

You can split a path into all its component parts with the parts property:

>>> Path('/var/log/dmesg').parts
('/', 'var', 'log', 'dmesg')

This is particularly useful if you want to check that parts of a Path exist within a certain hierarchy of directories.
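As a sketch of that kind of check, the example below tests for a directory name in parts and compares a leading slice of the tuple. For a strict "is this path inside that directory" test it uses relative_to, which raises ValueError when the path is not underneath the given directory. is_inside is a hypothetical helper.

```python
from pathlib import PurePosixPath

log_file = PurePosixPath('/var/log/nginx/access.log')

# Membership and prefix checks on the parts tuple.
has_nginx = 'nginx' in log_file.parts
under_var_log = log_file.parts[:3] == ('/', 'var', 'log')

def is_inside(path, directory):
    """Return True if path is located somewhere under directory."""
    try:
        PurePosixPath(path).relative_to(directory)
        return True
    except ValueError:
        return False

print(has_nginx, under_var_log)                   # True True
print(is_inside(log_file, '/var/log'))            # True
print(is_inside('/var/logs/app.log', '/var/log')) # False
```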

Likewise, you can request the parent or parents of a Path.

>>> Path('/home/inspiredpython/README.md').parent
PosixPath('/home/inspiredpython')

Keep in mind that the parent of a Path that ends in a filename is simply the directory the file is in.

You can also request all the parents at once. As it’s a sequence, you can index it or loop over it to reveal its elements. Below I’ve just run it through the tuple() function to consume each element.

>>> tuple(Path('/home/inspiredpython/README.md').parents)
(PosixPath('/home/inspiredpython'), PosixPath('/home'), PosixPath('/'))
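The parents sequence can also be indexed directly, and walking it is a handy way to find an enclosing directory. A minimal sketch follows; nearest_ancestor_named is a hypothetical helper and the paths are made up.

```python
from pathlib import PurePosixPath

readme = PurePosixPath('/home/inspiredpython/project/docs/README.md')

# Index 0 is the immediate parent, index 1 the grandparent, and so on.
print(readme.parents[0])  # /home/inspiredpython/project/docs
print(readme.parents[1])  # /home/inspiredpython/project

def nearest_ancestor_named(path, name):
    """Return the closest ancestor directory called name, or None."""
    for ancestor in PurePosixPath(path).parents:
        if ancestor.name == name:
            return ancestor
    return None

print(nearest_ancestor_named(readme, 'project'))  # /home/inspiredpython/project
print(nearest_ancestor_named(readme, 'missing'))  # None
```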

Extracting the drive or root of a Path

You can query for the root on all platforms, though drive is mostly useful on Windows. Running this on Windows gives:

>>> Path(r'C:\Users\inspiredpython\Document\inspired.py').drive
'C:'

On Linux the root of an absolute path is normally /:

>>> p = Path('/home/inspiredpython/inspired.py')
>>> p.root
'/'

And the drive is always '':

>>> p.drive
''
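To compare the two platforms side by side without switching machines, you can use the pure classes. This sketch also shows anchor, which is the drive and root combined, and that a relative path has an empty root.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r'C:\Users\inspiredpython\Document\inspired.py')
posix = PurePosixPath('/home/inspiredpython/inspired.py')
relative = PurePosixPath('docs/inspired.py')

print(win.drive)     # C:
print(win.anchor)    # C:\
print(posix.drive == '')  # True
print(posix.root)    # /
print(posix.anchor)  # /
print(relative.root == '')  # True
```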

Quick Shortcuts

If you quickly need a Path object set to one of several common defaults, you can use these class methods exposed on the Path class.

You can get the cwd (current working directory) using the Path.cwd() classmethod:

>>> Path.cwd()
PosixPath('/tmp/some-directory')

You can also get the home directory on Linux or Windows.

However, on Windows that may be any number of places, as the location is inferred from environment variables such as USERPROFILE, HOMEDRIVE and HOMEPATH.

>>> Path.home()
PosixPath('/home/inspiredpython')
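Both class methods return ordinary Path objects, so they make convenient starting points for building other paths. A short sketch follows; the myapp settings file is a made-up name, and nothing is read from or written to disk.

```python
from pathlib import Path

# A per-user settings file located relative to the home directory.
# 'myapp' and 'settings.toml' are hypothetical names.
config_file = Path.home() / '.config' / 'myapp' / 'settings.toml'

# Turn a relative path into an absolute one based on the current directory.
absolute = Path.cwd() / 'notes.txt'

print(config_file.name)        # settings.toml
print(absolute.is_absolute())  # True
```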

The Builder Pattern

One tantalizing feature of Path – as I explained above – is that it consolidates many disparate functions into a standardized class that you can instantiate and query. One useful side-effect of that is what is known as the Builder Pattern.

The Builder Pattern is a common Object-Oriented Programming pattern. The Path class implements a version of that. The basic idea is that when you instantiate a Path class you are given an object that, in turn, can create more Path objects based on the values already present in the object.

Consider this example:

>>> Path('/home/inspired/Documents/python.txt')
PosixPath('/home/inspired/Documents/python.txt')

When you create a Path object you are given either a PosixPath or WindowsPath object in return. I covered why that happens above.

Observe that creating a Path object in turn gives you another Path. That is the cornerstone of the builder pattern. Many methods and properties present on a Path object (parent, joinpath, with_name and with_suffix among them) will, in turn, give you a new Path with the change you asked for applied to it, leaving the original object untouched.

It’s possible to chain these methods and properties to build new Path constructs:

>>> Path('/home/inspired/Documents/python.txt').parent.parent.joinpath('files').joinpath('python.rst')
PosixPath('/home/inspired/files/python.rst')

That’s a powerful feature and a useful way of constructing or altering paths in a programmatic, linear way. You can of course store parts of that chain in a function (or variable) and use them as a mini-factory:

def switch_to_files(p):
    return p.parent.parent.joinpath('files')

>>> p = Path('/home/inspired/Documents/python.txt')
>>> switch_to_files(p)
PosixPath('/home/inspired/files')
>>> switch_to_files(p) / 'hello-world.txt'
PosixPath('/home/inspired/files/hello-world.txt')
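As a slightly larger sketch of the same idea, the hypothetical function below chains relative_to, joinpath and with_suffix to map a source file to an output location. The directory layout is invented for the example, and PurePosixPath keeps the output identical on every platform.

```python
from pathlib import PurePosixPath

def output_path_for(source, source_root, build_root):
    """Map a source file to the matching .html file under build_root."""
    relative = PurePosixPath(source).relative_to(source_root)
    return PurePosixPath(build_root).joinpath(relative).with_suffix('.html')

result = output_path_for(
    '/site/content/blog/pathlib.md', '/site/content', '/site/public'
)
print(result)  # /site/public/blog/pathlib.html
```

Each step returns a new path object, so the inputs are never modified and the function is safe to reuse.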

Summary

The Path class makes it easy to interact with files and paths

By consolidating the most common file and path operations into a single class you won’t have to poke around in the os module.

Transparent cross-platform file and path support

You can use Path to transparently handle both Linux and Windows paths without worry. If you must generate paths for a particular platform you can instantiate PurePosixPath and PureWindowsPath but you lose the ability to interact with the actual file system.

Reading from, and writing to, files is also supported

If your needs are simple you can use the built-in file reader and writer in Path. It also supports binary. You can also open a file directly via the Path interface if you need access to the underlying file object.

The builder pattern is a powerful way of constructing paths

Instead of string concatenation and awkwardly nesting function calls to os and friends you can trivially construct and alter paths and filenames with a unified interface.
