How do computers loop the loop

What exactly does a cpu do?

For a second let’s backtrack to a middle school computer class, that’s starts off with a fundamental question: What’s a computer? Time for some personal definitions, a computer is anything that can compute. That’s cool, but what’s compute?

Compute is what transforms input data to output data. Hmm, so basically a computer takes in input data and spits out output data?

For a middle school student, this might be an acceptable answer, but a computer does not only transform input data to output. It also has appartus to take in input and present output in various forms(text,audio,video…). What we have defined here is actually the CPU of the computer
What takes in input and presents output are peripherals in hardware land.

The transformation of input data to output data is handled by the CPU

What’s the input and output for the CPU?

There are two inputs to a cpu: instructions and data. Literally every program that runs on your machine is a combination of instructions and data(see ELF for example)

The output of the CPU is transformed data: result of application of the input instructions to the input data.

The transformation is enabled by the hardware machinery that the CPU is equipped with.

What’s the difference between instruction and data?

We have used the words instruction and data loosely in the previous discussion. As far as we know they are just a stream of bytes, right? So what’s the difference?

The stream of bytes that can be used by the CPU to perform a dedicated functionality is called an instruction

So clearly a "Hello World" string in memory is different from an add instruction. To extend this a bit further, there are two different types of CPU architectures:

Harvard Architecture
Von-Neumann Architecture

The Harvard architecture separates the memory that stores instructions and data(IMem and DMem respectively) while Von-Neumann architectures use a contiguous block of memory for both.

A good exercise at this point is to figure out how your CPU stores instructions and memory. Is it the Harvard way or the Von-Neumann way?

Now that we have established the differences between data and instructions, we can examine the different kinds of instructions in a modern CPU.

How does a very simple CPU look like?

The initial idea that I had was to put an image of a RISC-V single cycle processor, the ones that you might see in a computer architecture class or in a book like Patterson or Hennessy. Instead of giving you a circuital picture, I am gonna spread out these components using some code(since my target audience for this article are primarily software devs). Lets take a simple python program that adds two numbers:

if __name__ == "__main__":
  a = 4
  b = 2
  c = a+b

Lets take each line of this program and think about what happens semantically. Inside the main function first we bind the value of 4 to a and the value of 2 to b. Then we bind the value of their sum to a variable c. Now I am gonna take parts of these sentences namely “bind the value” and “value of their sum”. Now we have boiled the conversation down to two questions:

How and were do we bind these values?
How do we calculate this sum?

Lets tackle the first question. Fundamental units of memory in your CPU is what electrical engineers call as a Flip-flop. Even though the memory of a modern computer is not restricted to registers(For example your SSD), inside the CPU the values for computation are stored in these small boxes(Small as they are usually the size of your platform architecture, 32 or 64 bits are most common nowadays). In this article I will not go into the details of the construction of a register. For us currently its just a box that stores values inside the CPU.

Now the second question, how do we calculate the sum: Well we can use a binary adder circuit. You might go like what??, binary?? Where did binary come from. Well all data in a computer are basically 0s and 1s(Everything from the pixels on your screen to the video packets that you recieve while streaming your favourite movies and tv shows). From elementary math, we know how to add numbers, but how can we add binary numbers? Well the algebra is similar, but the primary issue is all operations on these bits are logical ones. You can take two bits AND them,OR them, XOR them etc and using these operations we can add them together (see how they work)

Let’s summarize the discussion so far:

Computers store values in small boxes called registers
Logical operations are done via logic gates like AND,NOR,NOT etc
Arithmetic operations like addition,multiplication etc are done by constructing circuits built with logic gates that can achieve the same(this is done by using some boolean algebra + mapping the boolean algebra to logic gates)

So our CPU is a bunch of registers + circuitry that performs addition,logical operations, bit shifts etc. So are we done here? Not really, modern CPUs have more functional units like caches,memory management units,special circuitry for floating point operations etc. But from a birds eye view this is a pretty accurate picture. But we still haven’t covered all the different instructions a modern CPU has. Also we haven’t talked about:

its interaction with external memory.
The way it handles the more complicated constructs outside of arithmetic and logical operations(like loops and conditional logic).

What are these CPU instructions?

Let’s take a practical scenario: you have a calculator program: you give it binary inputs and an operation and it spits out an answer. Let’s try to think about how such a scenario can be handled within a CPU. First we have to store the binary inputs in our memory boxes/registers. Then based on the operation at hand we need to activate a particular circuital path that can perform that specific computation. This is based on the operand input that we give to the CPU. Let’s write some python code to model this situation:

  add = lambda x,y: x+y
  subtract = lambda x,y: x-y
  multiply = lambda x,y: x*y
  divide = lambda x,y: x/y

  if __name__ == "__main__":
    if operand == "+": return add(x,y)
    elif operand == "-": return subtract(x,y)
    elif operand == "*": return multiply(x,y)
    elif operand == "/": return divide(x,y)
    else: return "Invalid operand"

The computer will store each of the functions add,subtract,multiply and divide in certain locations in its memory. Based on the operand the cpu has to choose one location in instruction memory and then start execution from that location. The instruction that is executed by the cpu is indexed by something called as a program counter. The counter monotonically increases in case there are no conditional operations(common in if/else, switch statements and loops), monotonically here means the program counter moves the pointer from the current instruction to the immediate next one once its executed.

An analogy for this is: suppose you had 4 friends: one of them is very good at addition, another at subtraction,another at multiplication and another at division. Suppose they live at different room numbers in a hostel. When you recieve an addition operand you go to room 329 which is where your friend add lives. For the cpu when it encounters an add operand it will change the program counter to the location of the add function routine(analogous to the room number). When the program counter gets changed abruptly(or a better way to phrase it is, when the program counter is written to, with a value that is not the location of the next instruction), we say a branch has been taken.

From here we infer that in addition to arithmetic and logical instructins we need to have our branch instructions for non-linear/branched execution. Now the next question is “how does the cpu interact with external memory”

Well there are only two ways you can interact with memory, you can write to it and you can read from it. Writing to memory is what we call a store instruction. Reading is a bit more nuanced here: we don’t exactly want to read whats at a memory location, our goal is to perform computations with the value at that memory location. This according to our initial discussion requires the data to be placed in our register boxes. When we read from external memory and write to cpu registers, its called as load and the instruction that corresponds to the same is called as a load instruction.

Now lets summarize the whole discussion:

Our CPU has various different parts:
- Registers/ memory boxes to store values for computation
- Logic gates and their combinations to do logical and mathematical operations
- Program counter to control the flow of instructions(index the instruction memory: where our compiled instructions are stored)
We have 4 different kinds of instructions:
1. Arithmetic instructions/Logical instructions
2. Branch instructions
3. Load instructions
4. Store instructions

Now let’s try compiling an actual c program to its corresponding RISC-V assembly and check what are all the instructions that are being used.

Lets take a piece of c code:

#include<stdio.h>
int main(int argc,char* argv[]){
  return 42;
}

compiled to 32 bit risc-v assembly

 main:
        addi    sp,sp,-32
        sw      s0,28(sp)
        addi    s0,sp,32
        sw      a0,-20(s0)
        sw      a1,-24(s0)
        li      a5,42
        mv      a0,a5
        lw      s0,28(sp)
        addi    sp,sp,32
        jr      ra

Let’s go line by line here:

First instruction is addi which is an acronym for add immediate. Immediate instructions use a numeric value, instead of a value stored in a register to perform computations. For example add a3,a1,a2: translates to take the value in the a1 register and a2 register and add them together, then write the value to the register a3, whereas addi s0,sp,32 translates to take the value in register sp and add the numeric value(immediate value) 32 to it.(Why the number 32? It has to do with the fact that we have compiled the code for a 32 bit processor)
The next instruction is sw s0,28(sp): sw translates to store word, a word refers to a 32 bits in external memory. This instruction stores the value in register s0 to the external memory at index: (value at register sp+28). This might seem a bit intimidating at first but this is just a store instruction, the address that we write to is just computed using the values in our memory boxes(using the same circuitry we used for adding numbers)
addi is similar to what we have seen in (1).
similar to (2).
similar to (2).
Here the instruction is li a5,42. li translates to load immediate. From the semantics mentioned above we load an immediate numeric value of 42 to the register a5.
mv instruction(move) basically copies the value in the register a5 to the register a0.
similar to (1)
jr: is a jump instruction. During jump instructions the return address(the address which should have executed had the jump not happened) is written to a special register and the program counter is assigned a value equal to the value in the register ra. Not that since the program counter has changed we have made a jump. Also you might observe that this mechanism is very similar to how functions work, we jump execution the definition of the function whilst keeping track of where to go when the function returns.

#include<stdio.h>
int main(int argc,char* argv[]){
    int a = 3,b = 5;
    if(a != b){
        return 0;
    }else{
        return 1;
    }
}

compiled to 32 bit risc-v assembly

main:
        addi    sp,sp,-48
        sw      s0,44(sp)
        addi    s0,sp,48
        sw      a0,-36(s0)
        sw      a1,-40(s0)
        li      a5,3
        sw      a5,-20(s0)
        li      a5,5
        sw      a5,-24(s0)
        lw      a4,-20(s0)
        lw      a5,-24(s0)
        beq     a4,a5,.L2
        li      a5,0
        j       .L3
.L2:
        li      a5,1
.L3:
        mv      a0,a5
        lw      s0,44(sp)
        addi    sp,sp,48
        jr      ra

Here in the assembly, we can see that there are a couple of new instructions. Listing them out:

beq: this instruction translates to branch if equal. beq a4,a5,.L2 => if the values in the a4 and a5 registers are equal then jump to the the address corresponding to .L2
j: j writes the value of the return address to the x0 register and continues execution

Here .L2,.L3 are labels. They represent the value of their index in the instruction memory(This is where the program counter should jump to , to execute the corresponding functions). The li(load immediate) instructions load a value of 3 and 5 to registers a3 and a5 and beq checks if they are equal, if they are we jump to l2, where we store 1 to our a5 register and we continue execution from L3. If they are not equal a value of 0 is stored on the a5 register and we continue executing from L3

Using this program we have understood how the mechanics of an if else statements looks under the hood. Now let’s try the same on a loop.

#include<stdio.h>
int main(int argc,char* argv[]){
    int i = 0;
    do{
        i++;
    }while(i!=3);
    return 0;
}

main:
        addi    sp,sp,-48
        sw      s0,44(sp)
        addi    s0,sp,48
        sw      a0,-36(s0)
        sw      a1,-40(s0)
        sw      zero,-20(s0)
.L2:
        lw      a5,-20(s0)
        addi    a5,a5,1
        sw      a5,-20(s0)
        lw      a4,-20(s0)
        li      a5,3
        bne     a4,a5,.L2
        li      a5,0
        mv      a0,a5
        lw      s0,44(sp)
        addi    sp,sp,48
        jr      ra

Here let’s find where our branch instruction lies(cause loops loop based on some conditional check followed by a change in the flow). bne a4,a5,.L2, checks whether the values in the a4 and a5 registers are equal, if yes we branch to L2 and start execution there. Lets analyze the loop body that is the instructions between the branch conditional check and the place where the execution starts for the branch(since this is do while the branch conditional check comes after the loop body).Notice that addi instruction adds a value of 1 to the a5 register(which is the loop body). The value in the a5 registers is stored into a memory address and its loaded to the register a4. So now a4 keeps the state of the variable i. Then we load the value 3 which we are checking equality for and then we compare a4(current state of i) with a5(the value which we check equality for).

If you have hung around in the programming ecosystem(looking at you hackernews and tech twitter), you might have heard of something called as the goto statement. Most people consider it a bad practice since modern languages offer great control flow primitives and most likely you can replace your goto call with something more readable. But I feel is that people spent years on using these constructs without really understanding what actuallly happens under the hood. Computers do not have a control flow primitive, it can only change the value of the program counter. Conditional change of the program counter is what gives rise to all kinds of iterative behaviour regardless of the construct(while,for,forEach etc…).And this is where I personally feel gotos are relevant. If you dont write a goto, the compiler will write a goto for you. Just for the sake of demonstration I have rewritten the previous program using a goto. Reading this code gives a far better idea on how the assembled code looks like as compared to the do while loop written above.

#include<stdio.h>
int main(int argc,char* argv[]){
    int i = 0;
    jump:
        i++;
    if(i != 3){
        goto jump;
    }
    return 0;
}

main:
        addi    sp,sp,-48
        sw      s0,44(sp)
        addi    s0,sp,48
        sw      a0,-36(s0)
        sw      a1,-40(s0)
        sw      zero,-20(s0)
.L2:
        lw      a5,-20(s0)
        addi    a5,a5,1
        sw      a5,-20(s0)
        lw      a4,-20(s0)
        li      a5,3
        beq     a4,a5,.L3
        j       .L2
.L3:
        li      a5,0
        mv      a0,a5
        lw      s0,44(sp)
        addi    sp,sp,48
        jr      ra

Coming soon 🚀

Soon I will try to analyze what happens when a while or for loop is used, how they can be constructed from our do while loop and how their implementations in assembly closely resemble the goto equivalents and append to this blogpost, till then stay tuned.