Mastering Structural Pattern Matching
If you’re not familiar with the term Structural Pattern Matching then you are not alone. It’s a feature that, until about 10-15 years ago, you would not see outside of functional programming languages. Its use, however, has spread; today you can find a feature like it in C#, Swift, and Ruby. What was once the preserve of niche languages is now available for you to try in Python 3.10.
Disciples of the functional programming school will surely love it, and seasoned developers who have had to tangle with the umpteenth business rules engine can look forward to some reprieve too. But what about day-to-day use cases? What makes Structural Pattern Matching useful for your typical Python project? What is it, even, and why would you want to adopt it when you can already solve complex problems without it?
The general concept – and I’ll walk you through how it all works soon enough – goes to the very heart of Computer Science and (especially) functional programming. Permeating all these different languages and their own take on this feature is a common vocabulary and understanding about what Pattern Matching is and the problems it tries to solve. Once you grasp the gist of pattern matching in Python you will recognize – and know how to apply – the concepts anywhere.
Tantalizingly, I left a snippet of code heralding the new feature above. It doesn’t look too bad, right? It’s a function that tries to intelligently format a greeting:
But there’s nothing in greet_person that you couldn’t do with a series of if statements. And that, right there, is the crux of what pattern matching tries to do: remove the verbiage and tedium of if statements and “getters” that interrogate the structure of an object to extract the information you want. In greet_person I want, ideally, several pieces of information: a greeting and a name, handled gracefully in case some or all of them are missing.
Manipulating data structures is a core part of programming, and the pattern matching system is there to help you achieve that. When you use isinstance calls, exceptions and membership tests against objects, dictionaries, lists, tuples and sets, you do so to ensure the structure of the data matches one or more patterns. That is what an ad hoc pattern matching engine looks like.
Consider what the match code above looks like the old-fashioned way:
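As a rough sketch under illustrative assumptions (a dictionary with optional "greeting" and "name" keys, or a plain string), the old-fashioned version leans on isinstance checks, membership tests and nested if statements. greet_person_old is a hypothetical name:

```python
def greet_person_old(p):
    """Hypothetical pre-3.10 version built from isinstance checks and nested ifs."""
    if isinstance(p, dict):
        if "name" in p:
            if "greeting" in p:
                return f"{p['greeting']}, {p['name']}"
            return f"Hello, {p['name']}"
        return "I didn't quite catch your name?"
    if isinstance(p, str):
        if p.isupper():
            return "No need to shout - I'm not deaf"
        return f"Nice to meet you, {p}."
    return "I didn't quite understand that!"
```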
This is just a part of the whole ordeal, and I made no effort to get clever either. But as you can see, deeply nested if statements make it easy to miss a business rule or put it in the wrong place; even worse, you have to parse the whole structure to figure out the right place to make changes. Not to mention the size of it. Add just a few more rules or complex checks to determine the right greeting format and you would have to create your own home-brewed matching engine. This approach simply does not scale.
And that, then, brings us to the heart of Structural Pattern Matching: the match and case keywords. This is a problem that you have, and will continue to have, in every facet of programming:
Do you have an inordinately deep and nested dict-of-dicts where you must check for the presence of keys and their values? You could use the structural pattern matcher.
Do you have complex business rules that depend on certain attributes in custom objects, like a Sales object? You could use the structural pattern matcher.
Do you have to parse the output of files or streams of data from other systems? Maybe transform them from a list of primitives (strings, integers, etc.) into a namedtuple, dictionary or custom dataclass object? You could use the structural pattern matcher.
So let’s take a look at how it really works.
Anatomy of the Pattern Matcher Syntax
OK, so it’s time to introduce a bit of terminology. The match statement is introduced by a soft keyword and takes an expression (think: similar to the right-hand side of a variable assignment) that becomes the subject of your case clauses.
A soft keyword, like match, is a keyword that does not cause a syntax error if used in a context that is unambiguously not part of a match pattern matching block.
That means you can continue to use match as a variable or function name, for instance.
The match statement is not a function, nor does it return anything. It simply demarcates the beginning of one or more case clauses, like a daisy-chain of if statements.
When Python executes the pattern matcher code it simply checks, in the order that you wrote the case clauses, for the first one that matches. You can have multiple case clauses that match, but only the first matching one that it encounters is used. Therefore, the ordering does matter.
The match statement picks the first case clause that matches
So try to order the case clauses in the order you want them to match. In this sense, it is similar to how you might order a chain of if and elif statements.
Each case clause represents one or more patterns that you want to match against the subject defined in the match statement.
In C-like languages, you must break out of switch-case statements or the code will simply fall through to the next case. That is not possible here: at most one case clause is executed. Indeed, switch-case in C-likes is nothing at all like pattern matching, so do not mix them up.
A case clause takes one or more patterns. Each pattern can in turn have its own sub-patterns.
A case clause can optionally have a guard, which is an if clause that lets you apply boolean conditionals that must be truthy for the case clause to match (see Truthy and Falsy Gotchas for more information). It’s a bit like the if clause in a list comprehension.
A case clause takes a block of statements that is executed if that case clause is the first one in the match block that matches the subject. If you want to yield or, say, talk to a database inside a case clause’s statement block, you can and should. This is where you put all the logic that you must invoke if the subject matches.
match-case statements may well become the center of your code in some applications
Finite state machines; walking tree and tree-like structures with declarative patterns and recursion; an infinite loop that processes incoming requests in a microservice; the part of your ETL application that reads in raw data from a live system before cranking out JSON and putting it into another. The sky is the limit.
What is a Pattern?
The first thing I need to mention is that the code you’ll write in a case clause is nothing like the code you’d write outside it!
When you write a pattern you describe the structure the case clause should test the subject against. That opens up a lot of avenues that are otherwise unavailable to you. You can deeply nest dictionaries, lists and tuples, and Python’s matching engine will delicately unwrap each layer and check whether the structure matches the pattern in each case clause.
Consider the example from before:
Let’s take a closer look at that case clause. It has exactly one pattern and that pattern mandates that:
The subject is a dictionary.
The dictionary contains at least two keys, one named "greeting" and the other "name".
And that the values of those two keys are bound to the named bindings greeting and name.
So if you pass greet_person anything that does not meet those three criteria, the case clause fails to match and the match statement proceeds to the next case clause.
And what’s a Capture Pattern?
The only confounding part so far is the bound names. Yes, they look an awful lot like variables, and in a dictionary anywhere else in your code they would be. But here they are not variables. That’s because they form a Capture Pattern, which is part and parcel of the pattern matching engine.
When Python has to map the subject onto the patterns in case blocks it can, along the way, bind the value it finds to a name given by you. They are called named bindings or bound names because they are captured as part of the pattern matching process. Once they are bound, though, you can use them as though they were variables. Crucially, it is only when Python attempts to pattern match that they take on the ephemeral nature of not being a variable.
Indeed, if the case clause succeeds, we can use greeting and name in its statement block just as we would any other variable.
- You can use named bindings to match large swathes of the subject
So you are by no means limited to just the values of a dictionary. As you’ll soon see, we can do so much more than that.
But always remember that named bindings are not variables. There is also the awkward matter of what happens when a pattern is partially matched, but ultimately fails. But I’ll cover the gotchas in a later chapter as they deserve scrutiny too.
- A named binding itself matches (or not!) parts of your pattern
Indeed I can capture the values of the dictionary but, of course, there’s an implicit assumption: that the keys exist in the first place and have some value, even if that value is None.
Therefore – and this is crucial – the named binding itself affects the pattern you want the subject to match against.
- A pattern is declarative and not imperative
Recall that imperative programming means writing code that tells Python how to do something, step by step. With a pattern you do not tell Python what to do; instead, you declare the result or outcome you want, and you expect Python to figure out the nitty-gritty details.
This is very important, and remembering that patterns are declarative is critical if you want to truly understand how pattern matching works. Consider the example from before: how does Python do what it does? I mean, it’s documented in several PEP specifications (specifically PEP 634, PEP 635 and PEP 636), and there’s of course the pattern matcher’s source code as well.
But – gotchas and engine limitations aside – it is not something that matters here. To use the structural pattern matching engine you must define patterns that are meaningful to you and Python and trust that Python will work out how to get you the answer.
Now you know that a pattern is a way of articulating a desired structure a subject must have for the pattern to match. That structure can be almost anything. But you can also extract portions of the structure that you are most interested in. That is the critical piece of what makes Structural Pattern Matching useful.
Literal Patterns
In theory the simplest of all the pattern types, the Literal Pattern matches literals, like strings, bools, numbers and None.
The literal pattern matcher has to make a number of assumptions to work the way most people’s intuition of Python works. That means making a number of explicit exceptions that most people would otherwise find confusing.
Literal pattern checks are made with equality checking (a == b), but there are a couple of special-cased exceptions and gotchas that you should know about.
Floating point numbers and integers are compared with equality checks, so some floating point numbers will naturally equal their integer counterparts.
You can force Python to prefer one or the other using a type constraint such as float(), like so:
Booleans require some forethought if you mix them with integers. True and False are both integers under the hood (bool is a subclass of int), so True == 1 and, in the literal pattern example above, the case True clause would never run as case 1 matches it first!
The way to fix that is to ensure the case True clause comes before case 1. That will fix the problem: 1 will match case 1 and True will match case True.
The reason is that True, False and None are matched by identity (a is b), like so:
In most codebases this is not going to be a problem, but it is worth knowing about nonetheless. I recommend you read Truthy and Falsy Gotchas to understand why mixing up equality and identity checking can get you into hot water.
AS Patterns
When you write patterns you may want to make certain declarations in your pattern that Python must adhere to for the pattern to match. But if you also want to bind that declaration to a name that you can use later, you must use the as keyword.
Here are two patterns. One makes a type declaration that must match strings, and the other integers. Note that unlike the example in Literal Patterns, I have not specified a particular string or integer, though I certainly could.
When I call the code it works as you would expect, because the as keyword binds the value matched by the pattern on its left-hand side to the name on its right.
- AS Patterns make it possible to bind grouped declarations
Without AS patterns you would have no general way to bind the value matched by a constrained or grouped pattern, such as a type constraint or a set of alternatives, to a name.
Guards
Strictly speaking, guards are not patterns. They are evaluated after a pattern is matched but before the code inside the case block is executed.
The greet_person example features a guard. Like the optional if in a list comprehension, you can optionally attach a guard to a case clause. Guards are important if you want to make decisions based on the values bound to the names in the pattern.
In this example the greet_person function checks if a person’s name is in uppercase and, if it is, politely asks them not to shout.
So even if the pattern matches, if the guard is not truthy the whole case clause fails and the match statement proceeds to the next one.
- Guards let you evaluate the bound names from a pattern and apply additional checks
Unlike the declarative nature of a pattern, expressions in a guard can have side effects or other complex logic like this:
You can therefore construct patterns and apply constraints that make sense from a functional perspective in your application without concerning yourself with the nitty-gritty of pulling data out of data structures.
OR Patterns
Wanting to match two or more patterns in a single case clause is a common need. Thanks to Python’s pattern matching engine you are not limited to a single pattern. You can combine multiple patterns at the case clause level, or inside individual patterns. The latter, in particular, is especially powerful.
One important caveat is that even though the pattern style is formally named OR Patterns, the actual syntax requires you to use | and not or.
Note that each highlighted line uses | and never or. Aside from that syntactic quirk, everything behaves in much the same way as it does in other parts of Python. I have specifically added parentheses around the OR patterns on line 3 to make their relationship with the as keyword clear, even though they are not strictly required.
The most powerful feature of OR patterns is the ability to nest them deep inside data structures that you wish to pattern match against.
Let’s analyze lines 5 & 6 a bit more closely.
The top-most pattern is a dictionary that mandates that a key named "greeting" must exist. But unlike the first example I gave, this one expects "Hi" | "Hello" as a sub-pattern against the value of "greeting". So either "Hi" or "Hello" is a valid greeting.
Line 6 is a bit more specific. There must be a key "name", and its value must be a dictionary containing one of two alternative keys, one of which is "name". The value of either is bound to the name name.
- Sub-patterns are powerful and expressive
The benefit of declaratively describing what we want rings true again. It’s not uncommon to have a nice and neat data structure (and the code to understand it) in your application but, like most things, it’ll evolve and change over time. As it does, you’ll still need to support the legacy format and the newer one at the same time. OR Patterns combined with the ability to embed sub-patterns inside an existing pattern make it readable, expressive and trivial to extend and understand.
- When you bind a name in an OR Pattern it must be present in all OR patterns
Observe that on line 6 I bind the value of both alternative keys to the same name, name. It is impossible to have a bound name in one part of an OR pattern and not another; Python rejects it with a SyntaxError. If it were possible, some bound names would be undefined and hard to reason about.
- There are no equivalent AND patterns or NOT patterns
You only get OR patterns. But that is usually okay; you can constrain the patterns you define to precisely match what you need, which should hopefully eliminate the need for NOT patterns and AND Patterns.
Wildcard Patterns
Frequently you want to match anything to indicate that you do not care about the actual value at all, just that there is something there. In Python that role is historically served by _.
And so it is in a pattern. You may have seen this pattern at the end of some of the examples:
That is a wildcard symbol and it matches anything. As _ can represent the entirety of the subject, it serves as a fallback that matches anything in the event none of the other case clauses do.
You can interrogate structures with them as well, disregarding elements in a list, for instance, that you do not care about:
[_, middle, _] extracts the middle element from a list of exactly three elements. You cannot refer to the wildcarded elements as they are unbound; they do not have a name, and cannot be used. Any attempt to use _ in the code block will instead look up an actual variable named _, if such a variable is in scope.
You can, however, name a wildcard with as to bind it if you so desire:
But that seems rather convoluted, so I recommend you avoid doing that and instead just use a bound name of your own choosing.
You can also use the *rest syntax to represent arbitrary sequences of elements, or **kwargs to capture the remaining key-value pairs of a dictionary, like so:
This pattern binds rest, a list of an unknown number of elements, provided there are two anonymous (wildcard) elements ahead of it:
It behaves as you would expect for dictionaries also:
Although Python is reasonably clever in deducing the structure of a list or dictionary, you cannot have more than one *rest or **kwargs token in a single sequence or mapping pattern. So if you want complex Prolog-style finitary relations and backtracking you will need to do some of that legwork yourself.
- Do not bind things you do not need
Although you can bind most things in a pattern, you should avoid doing so if you do not require the binding. Wildcards instruct Python to disregard the value so the pattern matcher can decide the most efficient way to return the bound names you do care about.
Prefer *_ to a named *rest if you do not care about the bound values. (Mapping patterns do not accept **_ at all; they ignore extra keys anyway.)
- You can combine wildcards with guards
So this is perfectly legitimate and a useful way of constraining a pattern beyond what you can reasonably achieve with a pattern alone:
Value Patterns
This is perhaps the most contentious and debated part of Python’s pattern matching implementation.
So far everything I have written pertains to static patterns. Meaning, I typed them into a Python file and I did not, in any way, include values derived from constants, variables or function arguments in the pattern itself. The subject, yes, but not the pattern.
Recall that a Capture pattern is where a pattern’s value is bound to a name.
The problems begin when you write code like this:
It looks fine and it works. But there is a problem.
PREFERRED_GREETING is a bound name and it shadows the module constant with the same name.
So the result is:
Not the answer we were looking for. Leave out the "greeting" key and it won’t match at all:
And the reason for that is to do with an unsettled argument about syntax. In languages that typify the use of pattern matching, like LISP for instance, you can (simplifying a bit here) quote or unquote something to indicate that it is (or is not) a variable or a symbol.
Python does not have that. There were endless discussions and, I admit, it’s a hard one to resolve without complicating the syntax and the notation further with a concept that is limited to this one feature of the language. Essentially, the problem you saw above could’ve been resolved if there were a way of marking PREFERRED_GREETING as being a value (maybe $PREFERRED_GREETING; the exact notation does not matter) or the other way around: that every capture pattern is clearly distinguished from values sourced from outside the pattern.
The only way to use value patterns is to put the values somewhere where Python can deduce that attribute access is required.
This works because constants is a module, and looking up PREFERRED_GREETING on it (equivalent to getattr(constants, 'PREFERRED_GREETING')) is an example of attribute access. Another option is to put constants in an Enum, which, if you can, is a much better way to do things anyway. Enums are symbolic, capture both a name and a value, and are a marriage made in heaven when you combine them with pattern matching.
- You cannot use plain variables, arguments or constants
Python gets them confused with Capture patterns and it’s a big old mess. Where possible you should avoid passing values into the pattern matching engine unless you gate them behind an attribute lookup (some_customer.user_id, for instance, instead of user_id).
- This is likely to be a source of bugs
Tread carefully and decide on a standard way of presenting constants or variable values you want to share with the pattern matching engine:
A simple data container (a namedtuple, a dataclass, etc.) that hosts the values you wish to use
A simple wrapper class that exposes a single property with the value you want to use in the pattern
Use an Enum, if that is possible
Store constants and other module-level stuff in a module and refer to it explicitly, like so:
Sequence Patterns
Sequences are lists and tuples, or anything that is an instance of the Abstract Base Class collections.abc.Sequence. Note, though, that the pattern matching engine will not expand iterables of any kind.
And unlike other parts of Python, where list("Hello") is a legitimate way of generating a list of a string’s characters, that does not apply here. Strings, bytes and bytearrays are never matched by sequence patterns; match them with literal patterns instead.
As you have seen by now, lists and tuples behave the way you expect them to.
- You cannot represent sets in a pattern
You can have them in the subject, but you cannot use pattern matching or set constructs in a case clause. I recommend you use guards to check for equality if that is what you are looking to do.
Mapping (“Dictionary”) Patterns
Mappings here means dictionaries (or anything that is an instance of collections.abc.Mapping), which you have already seen how to match. One caveat when you pattern match dictionaries is that the pattern you specify in the case clause implies a subset check against the subject:
Such a case clause matches the full-length dictionary, extra keys and all. If you do not want this, you should instead enforce it with a guard:
The guard checks if the rest of the dictionary is empty and only allows the match if it is.
- Dictionary entries must exist when the pattern matching takes place
Relying on a defaultdict to create elements as a side effect of the pattern matching process will not work, and no elements are created as a result of a pattern matching attempt. The matcher uses the object’s get(k) method to match the subject’s keys and values against the mapping pattern.
Class Patterns
Matching elementary structures like dicts and lists is useful, but in larger applications you’ll often capture this knowledge in compound objects and rely on encapsulation to present a homogeneous view of your data.
Luckily, Python 3.10 can work with most object structures with either no work, or very little work required.
dataclasses work out of the box with the pattern matching engine. As the example above shows, extracting attributes from an object is very simple indeed.
Now let’s consider an anti-pattern that oh-so-many developers fall into: putting complex, side-effect-causing code in the __init__ constructor of a custom class:
When you create an instance of Connection with a given host, its connect() method is called and, as it’s a demo, prints a message saying it’s connecting to the host.
Note that Python is clever enough not to create an instance of Connection during the pattern matching step. (If it had tried that, we would have seen another “Connecting to server” message.)
So even if you have side effects in your __init__ method, there are some safeguards in place to avoid triggering them during pattern matching.
Having said that, where possible you should move that sort of logic into a dedicated class method that does that work for you.
Phew. It’s a large feature, and in part two of this series I’ll show you some real-world use-cases for it beyond the fairly simple examples you’ve seen here.
It has a number of gotchas, particularly around Capture and Value patterns, but I think the good far outweighs the bad. And it’s quite possible that a future version of Python will have an elegant solution for this problem too.
I believe structural pattern matching will cut down on bugs. Particularly when you deal with imperfect data, or structured data that requires transformation. Even if you’re not a data scientist or don’t work with ETL, this is such a common thing we all need to do that I am certain it will find a place in the hearts and minds of most Python developers.
- Pattern matching is declarative not imperative
You should consider anything you write in a case clause to represent the structure of data declaratively. Nowhere else in Python do you have the ability to qualify what the structure of your data looks like (dict-of-dicts, namedtuples or custom objects, etc.) but also the ability to selectively match and extract meaning from that data.
Transforming and extracting information from data is already hard work, but Python’s pattern matching engine makes it much easier.
- Beware Value and Capture Patterns
For they look one and the same. Unfortunately. I am confident that future versions of Python will dull that sharp edge, but until they do, you should keep to the advice I gave earlier and not pass variables or constants to the pattern matching engine without first gating them behind an attribute lookup.
- Pattern Matching encourages code without side effects
It’s hard, though not impossible, to write code that accidentally calls other parts of your code from a match statement, thanks to the declarative and (mostly!) non-invasive way that Python probes the subjects and patterns that you write. You should consider how these concepts could apply to other parts of your code.
If you find that using the pattern matching engine causes side effects in your code, I would take the time to reflect on whether your code is doing the right thing, and whether you can find a way of doing the same work without them.