## How Many x86-64 Instructions Are There Anyway?

March 7, 2016

x86 is an enormously popular instruction set that is used on most desktop computers and servers (but usually not on mobile devices like phones). Given this wide use, it might seem like an easy question to ask how many x86 instructions there are, but it turns out this is more intricate than it looks.

### Background

x86 is the name of a whole family of so-called instruction set architectures (ISA for short). The ISA defines the available instructions of a CPU, but also the registers, the memory architecture, how interrupts are handled, and much more. It is different from the microarchitecture, though, which determines how the ISA is implemented. Different microarchitectures can implement the same ISA, which is what happens with different AMD and Intel microarchitectures implementing essentially the same ISA. Since the introduction of x86 by Intel in 1978, the architecture has evolved extensively, but with almost full backward compatibility. Partly for this reason, the instruction set is full of weird corner-cases and historical artifacts. Here, I will look at the 64-bit variant (sometimes called x86-64) of the ISA and attempt to count the number of instructions. Specifically, the ISA from the Haswell microarchitecture, one of the latest ISA available at the time of writing. Due to the backward compatibility, this instruction set still includes most of the instructions from the very first generation of x86, and anything in between.

### Attempts 1 and 2: Mnemonics

Machine instructions (the ones and zeros that tell your computer what to do next) have a portion called the *opcode* that determines what operation is performed. A mnemonic is a symbolic name for these opcodes. For instance, to add one to the `eax`

register, one could use the instruction `addl $1, %eax`

. Here, `addl`

is the mnemonic, and so we can count the number of unique mnemonics to determine the number of instructions. I’ve written a small program to do this:

`$ ./instr-count --mode "mnemonic_att" 1279`

Great, so we are done and there are 1,279 instructions, right? Actually, the story is more complicated. Those familiar with assembly will have realized that the `l`

in `addl`

is a suffix that indicates the bit width of the operands. For instance, to add one to a 16-bit register (instead of the 32-bit register `eax`

), one would use `addw $1, %ax`

. Both of these instructions are doing addition, maybe we should not count them separately. In fact, had we used Intel syntax for the instructions, then the same operation would be written as `add eax, 1`

(note the absence of a width suffix, and the inverted ordering of operands). Let’s try that:

`$ ./instr-count --mode "mnemonic_intel" 981`

### Attempt 3: Mnemonics and Operand Types

If we look a little closer at mnemonics, we find that the situation is actually not so simple. Let’s again look at the two instructions to add a constant to a 16 and 32-bit register, `addw $0x1, %ax`

and `addl $0x1, %eax`

, respectively. It turns out, that on 64-bit machines we consider here, they actually have relatively different semantics. The former just adds one to `ax`

. On the other hand, `addl $0x1, %eax`

also does the addition, but additionally zeros out the upper 32 bits of `rax`

. Here is an example illustrating this behavior:

Note that the 32-bit addition on the right zeroes out the upper 32 bits of `rax`

(marked in red), but the 16-bit addition (as well as the 8-bit addition not shown here) don’t do that. That doesn’t really seem like the same instruction any longer, even if both have the same (Intel) mnemonic. For another example why mnemonics can be misleading consider the instruction to perform a 32-bit signed multiplication: `imull`

. There is a variant that multiplies two registers, and there is a variant that multiplies two registers and a constant. Again, even though both of these share the same mnemonic, they hardly seem like the same instruction (one multiplies two numbers, the other three numbers).

Therefore, maybe we should really count all the different combinations of mnemonic as well as the argument types. For instance, `addl $1, %eax`

would be `add_imm32_r32`

. This ensures that we correctly identify the examples shown as different instructions, and leaves us with 3,683 instructions:

`$ ./instr-count --mode "operand_type" 3683`

### Attempt 4: Mnemonics and Operand Widths

Now we have a lot of instructions, and some of them seem like they are not really different instructions. For instance, the instruction to add a constant to a 32-bit register and the instruction to add a constant to a 32-bit memory location are counted separately, when the *operation* they perform could be argued to be identical. Thus, maybe a better way would be to consider the mnemonic in addition to the operand *width* . Our running example would be `add_32_32`

, regardless of whether the operand is a register, memory location or constant. This way, we end up with 2,034 instructions:

`$ ./instr-count --mode "operand_width" 2034`

### It’s All Wrong

Unfortunately, there are even more weird corner cases and historical artifacts in x86 that make counting instructions hard. For instance, we used mnemonics to group the same "mathematical operation" regardless of the operands, but it turns out that doing a bitwise `xor`

operation is done using `xor`

(in Intel syntax), but also `pxor`

, `vpxor`

, `vxorpd`

, `vxorps`

, `xorpd`

, and `xorps`

depending on the operand size and other things (and even more in AT&T syntax). So if attempt 2 tries to group the same operation regardless of the operation, then that count is actually too high.

And on the other end of the spectrum, we tried to count every different operation separately in attempt 3. However, it turns out there are instructions that really are a family of instructions. For instance, consider `cmpsd`

that compares two floating point numbers. A third operand, an 8-bit constant, determines if the comparison is for equality, inequality, if the first number is larger, and 5 more variants. These clearly seem like different instructions, yet we have counted them as a single one.

Unfortunately, I can’t think of an easy automatic way of counting such nuances, so I will leave it at the numbers I have.

### Conclusions

As I have hopefully convinced you by now, this innocent looking question is more difficult to answer than it might seem. It depends on your notion of what an instruction is, and is complicated by the many intricacies of the complex x86-64 instruction set. Here is an overview of possible answers:

Measure | Count | Comment |
---|---|---|

AT&T mnemonic (e.g., `addl` ) |
1,279 | Counts the number of unique mnemonics in AT&T syntax. |

Intel mnemonic (e.g., `add` ) |
981 | This is a rough estimate of the number of different kinds of operations the x86 instruction set can perform, ignoring the operand type and size. There are various caveats as described earlier, such as some operations having multiple different mnemonics. |

Mnemonic and operand types (e.g., `add_r32_imm32` ) |
3,683 | Distinguishes instructions in every way possible and is thus a sort of upper bound on the number of instructions. If you are looking for a very fine-grained notion of instruction, this is a good measure. |

Mnemonic and operand width (e.g., `add_32_32` ) |
2,034 | This is an estimate of the number of different kinds of instructions, if an operation on 8 bits is considered to be different from the same operation on 16 bits (because sometimes, these actually have different semantics, as shown with the zeroing of the upper 32 bits for the `addl` instruction). Does not distinguish instructions that operate on registers vs. constants vs. memory locations. |

Personally, the 4th attempt seems to capture my intuitive understanding of what an instruction is the closest. So, if I ever have to cross the bridge of death and get asked how many x86-64 instructions there are, I shall answer 2,034.

All the numbers in this blog post have been obtained through a small program making use of our awesome C++11 library for working with x86-64 assembly . Just like the library, my program is available as open source on GitHub .

转载本站任何文章请注明：转载至神刀安全网，谢谢神刀安全网 » How Many X86-64 Instructions Are There Anyway?