Game Boy Emulator: Writing the Z80 Disassembler
Let’s continue where we left off in the introduction to Game Boy Emulation with a deep dive into the Game Boy’s opcodes and operands – the language of the Z80 CPU – and how to make sense of it all.
As you’ll recall, the Z80 is an 8-bit CPU with a selection of 16-bit instructions, and each instruction has an associated opcode and zero or more operands used by the instruction.
Much later on we’ll implement the specifics of each instruction, but before we do, we need to understand how the Game Boy passes information to the CPU for processing; to understand that, we’ll start out with a quick run-through of what a cartridge ROM is, before moving on to writing our first piece of the emulator: a disassembler.
What is a Game Boy Cartridge ROM?
When you slot a cartridge into the back of the Game Boy it – somehow – boots up, and starts the game. Game Boy cartridges differ quite a bit depending on the game each was made for, the era it was created in, and the game developer who made it.
They all have some form of storage for the game’s code. Some of the larger games have more than one chip, and therefore need a memory bank controller in the cartridge, as the Game Boy only had a 16-bit address bus. The games could then switch between the chips as needed. Later generations featured everything from camera attachments to accelerometers. Each of these features would simply write to dedicated areas of memory, which the Game Boy could in turn read and the game’s code make use of. Simple, but effective.
Some also featured some sort of main memory, to store things like high scores and save games, and a small battery to keep a charge to said chip, to prevent data loss.
Laid out in full, the size of the cartridge’s effective storage ranged from 32 KiB to several MiB.
So, that’s a cartridge. A ROM – ROM being Read-Only Memory – is a catch-all term used in emulator circles to describe a clone of a cartridge, floppy disk, CD-ROM – anything, really – laid out in a format that emulator writers have agreed on over time. For simpler things it’s a 1:1 mapping. One byte in a chip somewhere; one byte in a file on your PC. Game Boy cartridges mostly work that way, which is good news for us.
To start with, and for quite a while actually, we won’t worry too much about complex memory bank switching and instead focus on games that don’t have any of that. They are easily identifiable in one of two ways: the size is exactly 32 KiB, and the other we’ll talk about later when we look at how to read out cartridge ROM metadata.
- Cartridge ROMs are byte-accurate ROM images of the cartridge’s chips
So a cartridge ROM, then, is just a series of bytes lifted from one or more chips in a physical cartridge. And that’s exactly the representation we want, as it’s easy to reason about.
Reading a Cartridge ROM’s Metadata
Unpacking binary data with the struct module
Most languages come with some sort of notation for representing collections of typed data. In C, it’s struct. In Pascal, it’s record. It’s an efficient way of structuring information, especially as you can instruct the compiler (if there is one) to pack the structure in such a way that you have complete control over the layout of that structure, bit-for-bit, in memory and on disk. That’s a useful property when you want to represent collections of bytes, like we need to with the cartridge’s header metadata.
You can do this in myriad ways in Python. The problem, however, is that binary structures like this one require an eye for precision: you need to not only read out the information byte-by-byte, but also take into account things like:
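- Endianness, or the direction in which you read a sequence of bytes
Big and little endian systems interpret byte structures differently. The Z80 (like the Game Boy’s CPU) is little endian, and yours probably is too, but a file format is free to store its values either way: the cartridge header’s global checksum, for one, is stored big endian. Check sys.byteorder in your Python interpreter to tell for sure what your machine uses.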
- Signed vs Unsigned integers
Unsigned integers are zero or positive only. Signed integers, on the other hand, can be negative or positive. The representation you pick will determine the value you read out of the byte string.
- String format
Is it a C-style string or a Pascal-style one? The former terminates a string with a NUL character to indicate the end is reached. Pascal strings instead prefix theirs with the byte size of the string that follows.
- Integer width
Are you reading an 8-bit number or a 16-bit number? Perhaps an even larger one?
And the list goes on and on. In other words, the meaning of the bits and bytes that make up our data is a matter of representation. Get it wrong, and you’ll read in garbage or, worse, it’ll work with some values but not others!
The struct module that ships with Python is equipped to deal with all of these issues. Using a little mini-language, not unlike the one you’d use for format strings, you can tell Python how to interpret a stream of binary data.
Big and Little Endian
Let’s briefly talk about endianness and what it is. It plays a prominent role in how we read and represent information. It’s the order in which you read a sequence of bytes of data.
A term borrowed from the book, Gulliver’s Travels, of all places.
So consider the two-byte hexadecimal string AB CD, held in Python as a byte string.
When that byte string is represented as little or big endian, the decimal value changes. Recall that at this point it’s just a byte string; it has no meaning yet. That means the numerical value of the hexadecimal string AB CD is ambiguous if you don’t know whether the person who wrote it chose big or little endian!
Consider the variable, data, from before: read as a little endian integer it is 52651, but read as a big endian integer it is 43981.
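A minimal sketch of that comparison, assuming data holds the two bytes AB CD (the variable name follows the text; the other names are mine):

```python
# The two-byte string AB CD from the text; it has no numeric meaning yet.
data = bytes.fromhex("AB CD")

# The same two bytes produce different integers depending on byte order.
as_little = int.from_bytes(data, byteorder="little")  # read as 0xCDAB
as_big = int.from_bytes(data, byteorder="big")        # read as 0xABCD

print(as_little, hex(as_little))  # 52651 0xcdab
print(as_big, hex(as_big))        # 43981 0xabcd
```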
And that’s because the orientation of the data differs between the two endian formats. Little endian interprets it as CD AB and big endian as AB CD.
Now you might wonder why it’s CD AB and not DC BA, i.e., why is the boundary a byte and not half a byte?
The long and the short of it is that most CPUs are (at least) 8-bit addressable, meaning the smallest unit of memory they read or write is 8 bits (or 1 byte) of data. The Game Boy has an 8-bit CPU and a 16-bit address bus, so the smallest unit it operates on is 1 byte.
Weird CPU platforms may differ, and many did 50 years ago, but as far as we’re concerned, CPUs today operate on multiples of 8 bits.
To demonstrate, you can convert any integer to a byte string padded to a given length in big or little endian. Here I am using hexadecimal notation to match one byte (the length keyword) of the example byte string from before.
As you can see, no byte transpositions took place. The reason is this: as the smallest unit we operate on is 8 bits, there’s no difference whether it’s read left to right or right to left; the word 0xCD is just 0xCD. Now it’s perfectly possible to have bit-level (as opposed to byte-level) endianness, where the order you read bits in changes. But that’s not the case here.
Now again but with a size of 2 (i.e., 16 bits):
And now it did transpose (as per the rule from before), with Python helpfully padding the extra byte with 0x00 in little endian to ensure a system expecting 2 bytes of little endian-ordered data reads it properly.
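Here is a small sketch of both conversions using int.to_bytes, assuming the value 0xCD from the text (the variable names are mine):

```python
# One byte: there is nothing to reorder, so both byte orders agree.
one_little = int.to_bytes(0xCD, length=1, byteorder="little")
one_big = int.to_bytes(0xCD, length=1, byteorder="big")
print(one_little, one_big)  # b'\xcd' b'\xcd'

# Two bytes: the 0x00 padding byte lands on opposite sides.
two_little = int.to_bytes(0xCD, length=2, byteorder="little")
two_big = int.to_bytes(0xCD, length=2, byteorder="big")
print(two_little, two_big)  # b'\xcd\x00' b'\x00\xcd'
```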
Converting between Big and Little Endian
As the examples above demonstrate, you can let Python do the hard work of converting between big and little endian. But you can also swap them manually with bit shifting:
- Converting a 16-bit value between big and little endian with bit shifting
I won’t belabor the method just yet; rest assured, bit twiddling is on the menu later on when we start implementing the Z80’s instructions.
This method works with values larger than 16 bits, too, of course, with a few modifications.
- Converting arbitrary values between big and little endian with int.to_bytes
This method converts any integer to a byte string of the given byteorder, either big or little.
Because integers are objects in Python, they come with an assortment of methods that you can invoke directly on them. I urge you to resist the temptation to do this with literal values and instead call the method through int. It’s far easier to read.
- Using the array module
The array module is a basic array implementation that ships with Python. You give it a type code that sets the size of each item (more on what they mean in the next section), a bit like dtype in numpy, and Python handles the rest. This method’s useful if you have an array full of values you want to swap.
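The three approaches above can be sketched side by side. This is my own illustration rather than the original listings: swap16 is a hypothetical helper name, and the array example assumes the "H" type code is two bytes wide, as it is on mainstream platforms.

```python
from array import array


def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit unsigned integer with shifts and masks."""
    low = value & 0x00FF           # isolate the low byte
    high = (value >> 8) & 0x00FF   # bring the high byte down
    return (low << 8) | high       # the old low byte is now the high byte


# 1. Bit shifting
shifted = swap16(0xABCD)

# 2. int.to_bytes and int.from_bytes: write in one order, read back in the other
raw = int.to_bytes(0xABCD, length=2, byteorder="little")
round_tripped = int.from_bytes(raw, byteorder="big")

# 3. array.byteswap: swaps the bytes of every item in place
values = array("H", [0xABCD, 0x1234])
values.byteswap()

print(hex(shifted), hex(round_tripped), [hex(v) for v in values])
```

All three should agree on 0xCDAB for the input 0xABCD; which one reads best depends on whether you are holding a single integer or a whole array of them.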
Byte strings and type representation
Mapping the cartridge’s bytes to the header fields is not hard, once you understand the basics. The main thing to remember, though, is that we only operate on bytes, in the form of Python byte strings.
Byte strings are important here because no conversion to or from your computer’s locale takes place; it’s just the raw form, untouched by any conversions to UTF-8 or other character encodings.
Consider this byte string with a bunch of escape-encoded stuff in it:
When I decode it from its byte format into UTF-8 I get… a snake. So the byte string’s just a raw segment of bytes; it can mean anything until we give it purpose: converting it to UTF-8 yields a snake, but if I use struct.unpack_from I can tell Python that it must represent it as an unsigned integer instead.
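A sketch of both interpretations. I am assuming the byte string is the UTF-8 encoding of the snake emoji (U+1F40D) and that the integer is read as a big endian 32-bit value; the original listing may have used a different format character.

```python
import struct

snake = b"\xf0\x9f\x90\x8d"  # four raw bytes, no meaning attached yet

# Interpreted as UTF-8 text, the four bytes are a single character (U+1F40D).
text = snake.decode("utf-8")
print(len(text), hex(ord(text)))  # 1 0x1f40d

# Interpreted as one big endian unsigned 32-bit integer ("I"), they are a number.
(number,) = struct.unpack_from(">I", snake)
print(hex(number))  # 0xf09f908d
```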
So that’s the crux of what we need to do with the Cartridge Header. We need to come up with a series of format string characters to give to unpack_from so it can work its magic.
Luckily we only need a couple of different ones:
|Format String|“C”-equivalent type|Purpose|
|---|---|---|
|x|Pad Byte|Skips a byte or pads out another format string. Useful for stuff we don’t care about.|
|=|Use your system’s native endian format|Python uses whichever byte order your machine uses when reading the data. Fine for single-byte fields; for multi-byte fields prefer an explicit indicator.|
|> and <|Big & Little Endian Indicator, respectively|Very important. Tells Python which byte order the data is stored in, so it can convert each value to a native integer for you. Note: It must be the first character in the format string.|
|s|char[]|Useful for arbitrary lengths of text. Takes a numeric prefix to indicate length.|
|H|Unsigned Short|2-byte unsigned integer|
|B|Unsigned Char|Used as 1-byte unsigned integer|
So to use it, you can combine the format strings into a sequence of unpack instructions. Consider this simple example that pulls out a couple of numbers – in big endian – and a string:
Pay close attention to >. Try running the code with < instead, and again with =.
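As an illustration, here is a sketch with made-up data: a 2-byte number, a 1-byte number and a 5-byte string. The values and names are my own, not the original listing’s.

```python
import struct

# Two bytes for an unsigned short, one byte for an unsigned char,
# then a 5-byte string.
data = bytes([0x12, 0x34, 0x56]) + b"hello"

big = struct.unpack_from(">HB5s", data)
little = struct.unpack_from("<HB5s", data)

print(big)     # (4660, 86, b'hello')
print(little)  # (13330, 86, b'hello')
```

Only the 2-byte field changes between the two runs; the single byte and the string read the same either way.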
The key thing to remember is this:
- You want to convert to your platform’s native endian format
I mean, you don’t have to, but you’ll have to deal with mentally and programmatically swapping things around all the time. Not fun.
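So in our case, the header’s multi-byte fields, such as the global checksum, are stored big endian, so mark the format string with > and let struct convert them to native integers for you. The Game Boy’s CPU itself is little endian, which matters once we start decoding instruction operands.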
- Knowing the byte order is critical
If you don’t know the byte order of a binary file format, you’re kind of screwed. You can try to reverse engineer the likely byte order by looking for telltale signs of format types’ encoding, like twos-complement, floating point, ASCII strings, but it’s a slog.
With that in mind, let’s get on with the cartridge reader.
Game Boy Cartridge Metadata Reader
The format string to struct.unpack_from must be contiguous as it does not support newlines nor comments. To get around that, and to add a bit of clarity to what would otherwise be a jumbled alphabet soup, I’ve built up a list of tuples, with each tuple holding the name of the future attribute I want to reference the value by later, along with its piece of the format string. If the name is None it indicates that I do not want to store the value at all.
With that, the Cartridge Metadata is sort-of done — well, the hard part anyway. Now let’s write a quick test using Hypothesis before we delve into the code that does the actual reading.
Hypothesis uses clever algorithms to generate test data to try and break your code. It’s great. You can read more about property-based testing with Hypothesis here.
So there’s a bit to unravel here, so let’s start at the top. I’m defining a number of constants for use in the test. The beginning and end of the cartridge header are known values to you now: they’re taken from Pan Docs along with the other cartridge metadata.
The test itself uses Hypothesis to generate a random assortment of binary junk of max_size equal to the size of the header plus its offset. I could just as easily offset everything by -0x100, but I like the idea that I’m also testing that we can read from the correct offset.
The test itself features read(), a helper function that reads count number of bytes from offset. Note that we need to add +1 because if offset = count = 1 then data[1:1] == b''.
read_cartridge_metadata calls our custom code to read the metadata – more on that below – and the test checks that it reads a few of the fields correctly. I’ve picked the title, as it’s a string, and the global checksum, as it’s a two-byte field and endianness is therefore important to get right.
The final check ensures we read in the checksum as though it were big endian.
Now for the cartridge reader itself:
Yep. That’s it.
CARTRIDGE_HEADER pulls out just the format string from each tuple in the field list and joins them together. CartridgeMetadata is a namedtuple with one attribute for each field_name that is not None.
The struct.unpack_from function does most of the heavy lifting. It takes an optional offset that we default to the usual location of 0x100. The unpacked tuple of values is fed directly into CartridgeMetadata._make, which turns the whole thing into a more accessible format.
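Put together, the field list and the reader might look like the sketch below. The names CARTRIDGE_HEADER, CartridgeMetadata, field_name and read_cartridge_metadata come from the text; FIELDS, the attribute names and the exact format characters are my assumptions, with offsets following the header layout documented in Pan Docs. I read multi-byte fields with > because the global checksum is stored big endian.

```python
import struct
from collections import namedtuple

# (attribute name, format characters). None means "do not store this value".
# FIELDS and the attribute names are assumed for this sketch.
FIELDS = [
    (None, ">"),                 # byte order of the multi-byte fields
    (None, "4x"),                # 0x100-0x103 entry point
    (None, "48x"),               # 0x104-0x133 Nintendo logo
    ("title", "15s"),            # 0x134-0x142 title
    ("cgb", "B"),                # 0x143 CGB flag
    ("new_licensee_code", "H"),  # 0x144-0x145
    ("sgb", "B"),                # 0x146 SGB flag
    ("cartridge_type", "B"),     # 0x147
    ("rom_size", "B"),           # 0x148
    ("ram_size", "B"),           # 0x149
    ("destination_code", "B"),   # 0x14A
    ("old_licensee_code", "B"),  # 0x14B
    ("mask_rom_version", "B"),   # 0x14C
    ("header_checksum", "B"),    # 0x14D
    ("global_checksum", "H"),    # 0x14E-0x14F
]

# One contiguous format string, and one attribute per named field.
CARTRIDGE_HEADER = "".join(fmt for _, fmt in FIELDS)
CartridgeMetadata = namedtuple(
    "CartridgeMetadata",
    [field_name for field_name, _ in FIELDS if field_name is not None],
)


def read_cartridge_metadata(buffer, offset=0x100):
    """Unpack the cartridge header found at `offset` in `buffer`."""
    values = struct.unpack_from(CARTRIDGE_HEADER, buffer, offset=offset)
    return CartridgeMetadata._make(values)


# Usage with a fake, zero-filled ROM that only has a title and a checksum.
rom = bytearray(0x150)
rom[0x134:0x138] = b"DEMO"
rom[0x14E:0x150] = b"\xAB\xCD"
metadata = read_cartridge_metadata(bytes(rom))
print(metadata.title, hex(metadata.global_checksum))
```

Pad bytes (x) produce no values, so the number of unpacked values matches the number of named fields exactly.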
And that’s it for the cartridge metadata reader.
- Endianness is important
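But only if you represent more than a single byte at a time. The Game Boy’s CPU (like the Z80) is little endian, while the cartridge header stores its global checksum big endian, so keep in mind which order each value uses when you read it in. sys.byteorder tells you which order your own CPU uses, and Python converts to it for you once you have told it the order of the data.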
- All the pieces matter
The cartridge metadata has some use in our emulator, but it’s also a great tutorial to test and improve your knowledge of low-level constructs like the binary representation of things. It’ll come in handy later, and it’s a nice and easy way to ease your way into it.
- Python can easily represent, and convert between, the representations we’ll need for the emulator
Hexadecimals, big and little endian, binary, and any number of structured binary formats are all possible thanks to a number of, admittedly hidden, method calls.
The Z80 Instruction Decoder and Disassembler
A brief but important interlude.
Throughout the course I have referred to the CPU as Z80 (or Z80-style) as it is similar to the CPU in the Game Boy. But it is not entirely the same: it’s an Intel 8080-like Sharp CPU called LR35902. I will nonetheless keep using the term Z80 even though it’s not 100% truthful. The reason for that is there’s scant documentation for the Sharp CPU on the internet except references to just the Game Boy. If you want to discover more literature on the CPU, your best bet is to search for Z80 as it’s a very common model of CPU. Keep in mind that the opcodes and some of the other CPU details do differ, though.
With a decent understanding of how the representation of a sequence of bytes depends on the context, let us now turn our attention to the disassembler.
One salient point before I proceed. The CPU emulator does not actually need a disassembler at all; but you will. The CPU only cares about decoding instructions from the byte stream, and it does not care about displaying them for humans to read on a screen. But good debugging and instrumentation facilities are paramount to a successful emulator project. And the best place to start is with the disassembler (and decoder) as you’ll want to understand the instructions the CPU is about to emulate, and why.
In Game Boy Emulator Introduction we parsed the opcodes file and there was an optional task to pretty print the opcodes also. We’ll need those parsed dictionaries of opcodes for this next step. I opted for dataclasses to hold them.
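A sketch of what such dataclasses might look like. The class names Operand and Instruction and the copy methods are mentioned in the text; the exact fields here are my assumption and may not match your parsed opcodes file.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Operand:
    name: str                    # e.g. "BC" or "d16"
    immediate: bool              # False for a memory reference such as (HL)
    bytes: Optional[int]         # extra bytes to read; None for fixed operands
    value: Optional[int] = None  # filled in by the decoder

    def copy(self, value):
        """Return an identical operand with `value` swapped out."""
        return replace(self, value=value)


@dataclass(frozen=True)
class Instruction:
    opcode: int
    mnemonic: str
    bytes: int                   # total length of the instruction
    cycles: tuple
    operands: tuple = ()

    def copy(self, operands):
        """Return an identical instruction with `operands` swapped out."""
        return replace(self, operands=tuple(operands))


# Example entry: opcode 0x01 is LD BC,d16, a three-byte instruction.
ld_bc_d16 = Instruction(
    opcode=0x01, mnemonic="LD", bytes=3, cycles=(12,),
    operands=(Operand("BC", True, None), Operand("d16", True, 2)),
)
decoded = ld_bc_d16.copy([ld_bc_d16.operands[0], ld_bc_d16.operands[1].copy(0x1234)])
print(decoded)
```

Frozen dataclasses keep the lookup tables immutable, so the decoder hands out copies instead of mutating the shared table entries.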
We need two dictionaries of instructions. One for the prefix instructions, and another for the regular instructions. There are two because it is not possible to represent all the different instructions with just a single byte. The prefixed instructions are thus, well, prefixed with 0xCB to indicate to the CPU that the byte following that one is the prefixed instruction.
CB 26 has the mnemonic of SLA (HL). You can see a list of the CPU Instruction Sets on Pan Docs and, of course, in your parsed dictionaries. I also recommend you keep the Game Boy CPU Manual on hand as it has more detailed explanations of the instructions.
So now that we have a list of opcodes it’s a case of mapping a stream of bytes to their opcode equivalents. There are, however, a couple of snags that make it infeasible to use the struct approach we used above:
- The byte lengths of the instructions are not fixed
Each instruction varies from one to three bytes once its operands are counted. All prefixed instructions are by their nature two bytes long.
- Opcodes are variadic
Some opcodes have operands, and others do not. 00 (NOP) has no operands, for instance. But CB 26 has one. Some also reference a special memory location, further lengthening the number of bytes to read.
- The offset you read from is unknown
Maybe you’re reading from 0x0, or perhaps another offset.
- The stream is potentially infinite
This is not the case when we disassemble a cartridge ROM (it has a fixed size), but it could happen once our emulator starts executing instructions, and we’d have no easy way of knowing, either. Deciding whether a running program ever stops is known as the Halting Problem.
So it’s much easier to take what we’ve learned and go about reading the data in one byte at a time, using the parsed opcodes as a guide for what we need to read.
So the goal is roughly:
1. Given an address (think index in an array of bytes) and our parsed opcodes, read one byte and increment the address by 1.
2. If the byte equals 0xCB, read the next byte as the opcode, use the prefix instructions opcode lookup table, and increment the address by 1.
3. Get the instruction from the opcode lookup table.
4. Loop over the instruction’s operands and:
   - If the operand has bytes > 0, read that many bytes, increment the address by the same amount, and store the result as the value of the operand.
   - If the operand’s bytes field is None, then the operand is not a data value but a fixed operand, so store the operand as-is.
5. At this point you’ll have an instruction and associated operands, if any. Return the address and the instruction.
Ensure that any multi-byte value you read is converted with the right byte order. The Game Boy stores 16-bit operands little endian, which matches sys.byteorder on most desktop machines.
The point of the exercise is to translate strings of bytes into the equivalent high-level instruction that both the CPU and us, the developers, can comprehend. Because the byte length varies depending on the opcode, we cannot simply chunk the stream into packets of instructions to parse.
Let’s start with a test for the NOP instruction.
Here I’m using a pytest factory fixture to generate the Decoder object that’ll do all the heavy lifting. The test, then, generates a decoder with a bytestring \x00. Next, I ask the decoder to decode address 0x0 (which is of course the first and only byte in our bytestring) and assert that the instruction matches the one I got from my parsed opcodes file, and that the address returned by the decoder reflects the new position.
Now for the decoder. Let’s start with the constructor and the skeleton of the class.
The Decoder requires data to decode. Later we’ll replace the generic concept of “data” with the emulator’s memory banks. For now, a generic bytestring is a decent stand-in.
There’s also an address that we encapsulate so we can later query the last position it had. Not needed just yet, but useful to have around. Finally there are two dictionaries containing the parsed opcodes.
The create classmethod is a factory that reads in the opcode file and calls load_opcodes (not shown), which parses the JSON opcodes file. It also takes two other parameters to seed the Decoder with data and a starting address.
Random aside: I recommend you avoid cramming code with side effects into __init__ constructors as it’s almost always a code smell. If creating or talking to other things is part of the contract of the class, you should instead put it into a @classmethod that does it for you, like I do here.
Now you can create an instance of Decoder directly and pass in faked dictionary values without having to patch out, or feature switch, the load_opcodes call like you’d otherwise have to if you had it in __init__.
And now for the meat of the class. The decoder method itself.
I think the read method speaks for itself. If we attempt to read beyond the bounds of the bytestring, raise an IndexError, otherwise return count number of bytes from address.
The decode method follows the algorithm I laid out above. We read one byte at a time, remembering to increment address when we do, and if there are operands associated with the matching instruction, we read an additional operand.bytes (again incrementing address) and store it in value. If operand.bytes is None we instead just store the operand as-is.
The reason for the bytes is not None check has to do with how the opcode table in the JSON file is laid out. Not all operands are parametric and require additional bytes to read. If they have no bytes to read, we still want the operand.
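The read and decode methods described above can be sketched as follows. This is a cut-down illustration under stated assumptions, not the original listing: the two tiny opcode tables are hand-written stand-ins for the parsed JSON, the dataclasses carry only the fields the decoder needs, the create factory is left out, and read returns an integer decoded as little endian.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Operand:
    name: str
    bytes: Optional[int]          # None marks a fixed operand such as a register
    value: Optional[int] = None


@dataclass(frozen=True)
class Instruction:
    mnemonic: str
    operands: tuple = ()


# Hand-written stand-ins for the two parsed opcode dictionaries.
UNPREFIXED = {
    0x00: Instruction("NOP"),
    0x01: Instruction("LD", (Operand("BC", None), Operand("d16", 2))),
}
PREFIXED = {0x26: Instruction("SLA", (Operand("HL", None),))}


class Decoder:
    def __init__(self, data, address=0, prefixed=PREFIXED, unprefixed=UNPREFIXED):
        self.data = data
        self.address = address
        self.prefixed_instructions = prefixed
        self.instructions = unprefixed

    def read(self, address, count=1):
        """Return `count` bytes at `address` as an integer, or raise IndexError."""
        if 0 <= address and address + count <= len(self.data):
            # Multi-byte operands are stored low byte first.
            return int.from_bytes(self.data[address:address + count], "little")
        raise IndexError(f"address {address:#06x} (+{count}) is out of range")

    def decode(self, address):
        """Decode one instruction and return (next address, instruction)."""
        opcode = self.read(address)
        address += 1
        table = self.instructions
        if opcode == 0xCB:  # prefix byte: the real opcode follows
            opcode = self.read(address)
            address += 1
            table = self.prefixed_instructions
        instruction = table[opcode]
        operands = []
        for operand in instruction.operands:
            if operand.bytes is not None:
                value = self.read(address, operand.bytes)
                address += operand.bytes
                operands.append(replace(operand, value=value))
            else:
                operands.append(operand)  # fixed operand, stored as-is
        self.address = address
        return address, replace(instruction, operands=tuple(operands))


decoder = Decoder(bytes.fromhex("00 01 34 12 CB 26"))
print(decoder.decode(0x0))
print(decoder.decode(0x1))
print(decoder.decode(0x4))
```

Passing the tables in through the constructor is what makes the class easy to test with faked dictionaries like the ones here.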
Both dictionaries of instructions contain instances of the Instruction dataclasses that I defined in Instruction and Operand Dataclasses. The only thing to note is the copy methods, which return an identical copy of the Operand instance, but with the value (or the operands, for Instruction) swapped out.
I also added a couple of pretty printers to both the Operand and Instruction dataclasses.
The printer code is self-explanatory. The goal is to format an instruction (and any operands) to look like hand-written assembly code. There’s a style to it, and you can see it’s more or less the same in all the Game Boy and Z80 Assembly language manuals.
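For illustration, a pair of pretty printers could look like this sketch. The method name print, the column width and the hexadecimal formatting are my own choices, and the dataclasses are trimmed to the fields the printers use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operand:
    name: str
    immediate: bool = True        # False means a memory reference, e.g. (HL)
    value: Optional[int] = None

    def print(self):
        # Prefer the decoded value (in hex) over the placeholder name.
        text = self.name if self.value is None else hex(self.value)
        return text if self.immediate else f"({text})"


@dataclass(frozen=True)
class Instruction:
    mnemonic: str
    operands: tuple = ()

    def print(self):
        # Mnemonic in a fixed-width column, operands separated by commas.
        ops = ", ".join(op.print() for op in self.operands)
        return f"{self.mnemonic:<8} {ops}".rstrip()


print(Instruction("LD", (Operand("BC"), Operand("d16", value=0x1234))).print())
print(Instruction("SLA", (Operand("HL", immediate=False),)).print())
print(Instruction("NOP").print())
```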
With a pretty printer and working decoder we’re almost done:
Generalizing this to a function capable of disassembling an arbitrary length of bytes is now easy:
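One possible shape for such a function is sketched below. The name disassemble and its parameters are my assumptions. It only relies on the decoder having a decode(address) method that returns the next address and an instruction, so I demonstrate it with a hypothetical stand-in decoder that treats every byte as a NOP.

```python
def disassemble(decoder, address, count, emit=print):
    """Decode and emit up to `count` instructions starting at `address`."""
    for _ in range(count):
        try:
            new_address, instruction = decoder.decode(address)
        except IndexError as exc:
            # Ran off the end of the data: report it and stop.
            emit(f"ERROR - {exc!s}")
            break
        emit(f"{address:04X}       {instruction}")
        address = new_address
    return address


class NopDecoder:
    """Hypothetical stand-in: decodes every byte as a one-byte NOP."""

    def __init__(self, data):
        self.data = data

    def decode(self, address):
        if not 0 <= address < len(self.data):
            raise IndexError(f"address {address:#06x} is out of range")
        return address + 1, "NOP"


lines = []
end = disassemble(NopDecoder(b"\x00\x00"), 0x0, 3, emit=lines.append)
print("\n".join(lines))
```

With a real decoder you would pass the instruction’s pretty-printed form to emit instead of the bare object.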
Which, when run with the offset of
0x150 (which happens to be entrypoint for
And that’s it. A working disassembler. Advanced ones like Ghidra and IDA Pro come with a battery of additional features like figuring out call graphs, where functions begin and end, and so much more. But this is enough for us to begin to understand what our future emulator CPU is executing.
We’re now ready to tackle the next part of the equation: writing the framework that will make up our CPU; the CPU registers (and what they are); and a crash course on Z80 assembly language to get us started.
- Representation is a matter of interpretation
Big and little endian is one thing to be aware of. Another is that a consecutive series of bits and bytes can mean different things. And we’ve only scratched the surface. Later on the concept of signed and unsigned numbers and how to represent them rears its head.
- Disassemblers are key to CPU emulation
If you’ve never done systems programming before, then the thought of writing a disassembler may seem difficult or challenging: and they definitely can be, if you have to reverse engineer the opcodes and operands! We’ve been given a big leg up because someone has carefully transcribed the opcodes and operands into parseable JSON. Without it, we’d have to do that tedious manual work first.
But even though pretty-printed disassembly is useful to us, the developers, the CPU still needs to go through what is known as a “Fetch-Decode-Execute” cycle. We’ve simplified the fetching, for now, as it does not read from memory yet. But the decoder is complete and it’ll serve as a keystone in the emulator going forward.