Recently I was having a discussion with some friends about programming, and this question came up: is writing x+=5 the same as writing x=x+5? It would seem obvious that the answer is yes. However, as I will explain in this post, the real answer to this question is not really a simple yes or no.
I’ll start with the obvious part. Writing x+=5 and x=x+5 will have the same end result: they will both increase the value of variable x by 5. This is (hopefully) something we can all agree on. Ok so problem solved right? Well, not exactly. What each of these statements does is exactly the same, but how they do it may not necessarily be the same. Allow me to explain.
For the sake of this argument, let’s assume we’re writing this program in C (ANSI C specifically). Well your computer can’t understand C code natively, so you hit that compile button to convert your C code to machine code. This (the compilation process) is where things get interesting, and where we may begin to see a difference between the two statements.
Now time for a disclaimer. The exact way that each of these statements gets compiled is compiler specific. This example doesn’t target any specific compiler, but rather demonstrates a possible implementation of the two aforementioned statements. I will explain this in greater detail later in the post.
The = operator is the basic assignment operator. In the generic case, x=y, the value of y is stored in variable x. Suppose the register EAX contains a pointer to variable x, and register EBX contains a pointer to variable y. One subtlety: x86 has no memory-to-memory mov, so the value has to pass through a register. Assuming ECX is free, this statement may be compiled to the following assembly operations:
mov ECX, [EBX] ; Load the value of variable y
mov [EAX], ECX ; Store it in variable x
Now for a slightly more complicated example using the addition operator as well as the basic assignment operator: x=y+5. This says to the compiler take the value of variable y, add 5 to it and store it in variable x. The compilation may yield the following assembly operations (assume the same register assignment from the previous example, and that register ECX is not in use for anything else):
mov ECX, [EBX] ; Take the value of variable y
add ECX, 5     ; Add 5 to it
mov [EAX], ECX ; Store it in variable x
Carrying along that line of thought, let’s look at x=x+5. In literal terms, this tells the compiler to take the value of variable x, add 5 to it and store it in variable x. Using EAX as a pointer to variable x and ECX as an extra register, x=x+5 compiles the same way as the other addition statement:
mov ECX, [EAX] ; Take the value of variable x
add ECX, 5     ; Add 5 to it
mov [EAX], ECX ; Store it in variable x
On the other hand, x+=5 says something slightly different to the compiler: it says add 5 directly to variable x. Now the x86 architecture supports adding to a memory location directly, so this would allow the statement x+=5 to compile as follows (assuming the same register assignment):
add dword [EAX], 5 ; Add 5 to variable x (the size must be spelled out when neither operand is a register)
So this implementation of x+=5 is clearly more efficient than this implementation of x=x+5: x+=5 uses only a single instruction, as opposed to the three used by x=x+5. Remember, fewer instructions generally means fewer CPU cycles are necessary to run the code, which means faster execution. Therefore it is clear in this case that, under the hood, x=x+5 and x+=5 are in fact not exactly the same, and that x+=5 is indeed preferable.
Now I already said that the way each of these statements gets compiled is compiler specific, and that the “compiler” used in the above examples is in fact a fictitious compiler. So now, let’s apply this example to real compilers.
When it comes to compilers, there are two types that are of concern here: optimizing compilers and non-optimizing compilers. An optimizing compiler will attempt to improve your code as best it can before emitting machine code. A good optimizing compiler will come across your x=x+5 statement, realize that it is equivalent to x+=5, and compile it to the more efficient code. Non-optimizing compilers do not do this: they will most likely treat x=x+5 the same way they would treat any other basic assignment statement, yielding the less efficient code. This is all just a general trend, though; it is by no means the case for every single compiler out there. There are some optimizing compilers that aren't all that good at optimizing, and there are some "non-optimizing compilers" that know to treat x=x+5 as x+=5.
In the modern (post-2005) day, most compilers are optimizing compilers. Both GCC and Microsoft's Visual C++ compiler are optimizing compilers[2,3]. So now you're probably wondering why this is even of any concern today. Here's the reason: still to this day, not every single compiler is an optimizing compiler. There will be times when you will not be able to use GCC, the Visual C++ compiler, or any other mainstream compiler. I remember once I was writing a program to run on an old Zilog Z80, and the only compiler available was a non-optimizing compiler from the 1980s. And this isn't just about archaic compilers either: I've used compilers from as recently as 2014 that are non-optimizing (one such compiler even has a statement about this specific situation on page 6 of its manual). Let this serve as an example of why it pays for a programmer to know how their compiler and processor work.
Additionally, this post only examines the Intel x86 architecture. Every architecture is different, as is every compiler for every different architecture. We can't deal with the whole big bang here, though, and this post must end somewhere.