Rohith Suresh

Inheritance and Polymorphism in C and Golang

July 27, 2023 | 11:25 AM

Table of Contents

Open Table of Contents

What’s a programming paradigm?

Two year’s ago, I was sitting in an interview (online because of covid) for a campus web-dev team and the interviewer asked me this question What do you mean when you say Javascript is multi-paradigm?.

As any other run of the mill freshman in college, I had zero clue what it meant: except for a vague idea what the word paradigm meant (thanks to a million essays that use the phrase “paradigm shift”).

Since now that I am no longer a naive freshmen(as I would like to think :) ), I can probably give a good answer to this question.

Paradigm basically refers to style, take music as an example: there are N different styles of music and there are starking differences between each style.

For example, blues music sounds very different from country music. What constitutes the differences between them? Are there similarities? How does this relate to different programming paradigms a.k.a styles?

Well I am no music expert and can’t give a good answer about the difference between blues and country music. The similarities between them, I would hazard a guess would be that they adhere to the same fundamental principles of music: the notes should be of the right frequency(not flatter or sharper) and the they should adhere to a time signature(or in laymans language should be in rhythm)

The first thing people usually do when they are asked about programming paradigms is that they start namedropping: Functional,Object-Oriented,Imperative,Logic programming etc

I feel that the reason for this is because “style” in the context of programming is too vague a term, the differences between functional and imperative programming is not just a stylistic difference, it is almost a philosophical difference.

Instead of name-dropping the paradigms here, I am gonna code-drop(not sure if such a word exists). Lets take a matrix multiply algorithm in pure and pristine C and Haskell.

transpose :: Num a => [[a]] -> [[a]]
transpose mat
  | all (\x -> length x == 1) mat = [map head mat]
  | otherwise = (map head mat) : transpose (map tail mat)

vectorMultiply :: Num a => [a]->[a]->[a]
vectorMultiply   = zipWith (*)

multiplyAccumulate :: Num a => [a] -> [a] -> a
multiplyAccumulate vec1 vec2 = sum (vectorMultiply vec1 vec2)

multiplyMatrices :: Num a => [[a]] -> [[a]] -> [[a]]
multiplyMatrices mat1 mat2 =
  map (\x -> (map (\y -> (multiplyAccumulate x y)) (transpose mat2) )) mat1
int** multiplyMatrix(int** matrix1, int** matrix2, int row1, int col1, int row2, int col2) {
    for (int i = 0; i < row1; ++i) {
        for (int j = 0; j < col2; ++j) {
            result[i][j] = 0;
            for (int k = 0; k < col1; ++k) {
                result[i][j] += matrix1[i][k] * matrix2[k][j];
            }
        }
    }
}

Both these programs do kind of the same thing: multiply matrices. What is the real difference between both these approaches?

Haskell tells you that in order to find the product of two matrices you should:

  1. Take each row in the first matrix, then take each row in the transpose of the second matrix a.k.a the first column
  2. These two vectors should be multiplied(dot product) and the sum of the matrix should be taken
    • This is called multiply-accumulate and modern GEMM accelerators have them implemented in hardware

C tells you that in order to find the product of two matrices you should:

  1. Iterate through each row of the first matrix
  2. For an element in the first matrix take each element in the column of the second
  3. Take each element in the ith row of the first matrix and multiply it with the corresponding element in the jth column of second matrix.

Stylistic and Philosophical Differences

Now let’s think about the stylistic differences between the imperative and functional style:

  1. Haskell ensures that each function has a dedicated functionality with no side effects:

    • This eliminates unwanted effects like variables getting mutated
    • The restriction to pure functions, along with nifty operations on containers (like map,zip,foldl/reduce etc) makes checking for theoretical correctness easier
    • Repl based debugging takes over, since once each unit works, it is theoretically certain that the function should produce the correct output
  2. C on the other hand is completely imperative:

    • It would take an experienced C programmer a couple of seconds to glance over the code and determine the generated assembly
    • The implementation details are completely exposed to the programmer:
      • Thus it is easy to reason about performance especially in context of the hardware( thus every tutorial on cache aware algorithms use C)

Haskell focuses on the idea that fundamentally compute is just functions composed on top of each other and the focus is on theoretical correctness , whereas C focues on the idea that the CPU is a large state machine clicking at each clock cycle executing each instructions.

This might explain the overwhelming majority of mathematicians using Haskell and electrical engineers using C

Enough about “programming styles a.k.a paradigms”. Let’s move to the meat of this post: Inheritance and Polymorphism

What exactly is inheritance and polymorphism?

In object-oriented programming (OOP), Inheritance is a fundamental concept that allows a class (known as the child or derived class) to inherit properties and behaviors from another class (known as the parent or base class). The child class can reuse and extend the functionality of the parent class, promoting code reusability and creating a hierarchical relationship between classes.

Polymorphism is the ability of a class to exhibhit various behaviours according to their implementations.

If you are kind-of familiar with Java/Typescript, think of the first as extends and the latter as interfaces/abstract classes. These concepts form the philosophical bedrock for object oriented programming along with some others which you can probably name if you have AbstractProxyFactoryBeaned at least once in your life.

Personally I dont prefer any of these paradigms as the core “philosophy” backing the way I write code.

I tend to think of programming with one fundamental abstraction a function (Not to be confused with the first sentence on your “Intro to functional programming” slide).

What’s your function abstraction and how is it relevant?

The function mentioned here is not a pure function. printf satisfies my criteria of being a function, so does the ptrace system call on Linux and so does foldl in Haskell.

Ok I get it but why Inheritance and Polymorphism then? Why can’t you just compose functions and call it a day like a functional programmer.

Hmm not quite. Let’s look closer into What exactly is a function?. A function takes in arguments, does some compute within its body and then returns a value.

Compute is what a cpu does and I kinda know that (If you are not sure, checkout my last blog post). But what exactly are these values?

Well vaguely value is data(of some type). Hmmm what exactly are these types? To quench your Socratic thirst, I am gonna put a functional programmer’s hat and give you an answer.

Types are of four different variations:

  1. Primitive Data Types: characters, integers, booleans and arrays of these types
  2. Product Types: These are compound types that hold state. Records in Haskell, Structs in C/C++, Classes in Java are good examples
  3. Sum Types: Represents a group of values out of which one can represent the current state. Enums in most languages are examples of sum types
  4. Parametrized Types: I will mention the word generics and leave the discussion there.

Type theory experts can probably list more variations, but these are the variants that I encounter often.

Enough programming talk, let’s watch a movie

Let’s take a small digression from our discussion so far.

Imagine a movie theatre with tickets for entry. Based on your age each tickets costs different. The owner of the theatre enforces that all tickets should have the name of the the person who owns it and each child ticket should have the age of the child

This is to ensure that people are not committing scams, since the child tickets cost half the adult tickets.

How can we use types to model this problem? I am gonna use Rust here to model this because out of all the programming languages I have used till date, Rust has the best enums.

enum Ticket<'a> {
    Adult { name: &'a str },
    Child { name: &'a str, age: i32 },
}

Now let’s use our type analysis mentioned above to poke Ticket.

  1. Ticket is an enum and so it is a sum type.
  2. Each entry in the sum type holds multiple primitive values together and hence are product types

This should be the first aha moment of our discussion. We can finally define polymorphism.

The entity called Ticket can exist in multiple different forms/morphisms. This is a property of sum types in general. Thus they can exhibhit polymorphism.

Now onto Inheritance with an example in a different language: Typescript:

class Ticket {
    string name;
};

class AdultTicket extends Ticket {
}

class ChildTicket extends Ticket {
    number age;
}

Ticket by default has a name, and the variants of it may/may not have additional data/functionality as required

Thus inheritance is a case of polymorphism where there exists a data hierarchy

Nice essay there mate, but where Go and C ?

A good way to demonstrate the inheritance and polymorphism implementation in C and Go, lets take a solid problem

We are given an interface (In C think of this as a struct with a function pointer) Expression that holds a function getExpressionValue. This interface is implemented by 3 structs: IntegerToken,PrefixExpression and InfixExpression. Write a function evaluate that takes in an instance of this interface and return the evaluated value(an integer).

Code-dropping the solution in Golang:

type Expression interface {
  getExpressionValue() string
}

type IntegerToken struct {
  value int32
}

type PrefixExpression struct {
  op  string
  rhs Expression
}

type InfixExpression struct {
  lhs Expression
  rhs Expression
  op  string
}


func (i IntegerToken) getExpressionValue() string {
  return strconv.Itoa(int(i.value))
}


func (i PrefixExpression) getExpressionValue() string {
  return i.op + strconv.Itoa(evalExpression(i.rhs))
}

func (i InfixExpression) getExpressionValue() string {
  return ""
}

func evalExpression(e Expression) int {
  operationFunctionMap := map[string]func(int, int) int{
    "+": func(a, b int) int { return a + b },
    "-": func(a, b int) int { return a - b },
    "*": func(a, b int) int { return a * b },
    "/": func(a, b int) int { return a / b },
  }
  switch v := e.(type) {
  case IntegerToken:
    return int(v.value)
  case PrefixExpression:
    return evalPrefix(v)
  case InfixExpression:
    return operationFunctionMap[v.op](evalExpression(v.lhs), evalExpression(v.rhs))
  }
  return 0
}

If you are someone who’s been programming in golang for a while, this pattern would be familiar for you. If you are not familiar, I would highly suggest reading about type switches in Tour Of Go.

TLDR: In order to evaluate an expression, we evaluate the type of the struct implementing the interface (The Golang compiler does this for us), and evaluate accordingly

If I posted this on twitter, there would be at least one greybeard programmer in front of his Arch Linux machine, on firefox (built from source) typing ferociously on a keyboard that he himself wrote the firmware for:

Modern programming languages are bloat, real programmers use C for everything.

Now in the words of Barny Stinson, “Challenge Accepted”.

Implementing inheritance in C

Let’s try to port the code given above to C. All our go structs can be mapped one to one to C structs, so that isn’t a huge problem, the real issue is the interface and the type checking switch statement. In order to solve this problem we use an age old technique in C that’s used in places like the CPython codebase and the Linux Kernel called Struct Composition.

Struct Composition

In C we layout our structs in memory using certain rules of struct alignment. Let’s take a sample struct a:

typedef struct a {
  int serial_no;
} a;

int main(){
  a *instance = (a*)malloc(sizeof(a));
  a->serial_no = 3;
}

Let’s consider the pointer to the struct that we have created. For pedagogical reasons let’s first ask ourselves the question what is a pointer?

Pointers, I hardly know ‘er

A pointer is a value that points to some memory location.

Here where does our instance pointer point to?

I don’t really want to get into the nitty grittys of struct alignment here (although you can refer to this link if you want to know more).

Now let’s consider a slightly more involved piece of C code.

We have a function that takes in a character input, and then based on whether its an integer or an alphabet you return an integer_token or a char_token. There’s a function that prints the token as TOKEN_NAME: TOKEN_VALUE.

typedef enum token_type {
    INTEGER,
    CHARACTER,
} token_type;

typedef struct base_token_t {
  token_type type;
} base_token_t;

typedef struct integer_token_t {
  base_token_t inherit_base;
  int value;
} integer_token_t;

typedef struct character_token_t {
  base_token_t inherit_base;
  char value;
} character_token_t;

void print_token(base_token_t* token){
  switch(token->type){
    case INTEGER:
      printf("integer: %d",((integer_token_t*)(token))->value);
    case CHARACTER:
      printf("character: %s",((character_token_t*)(token))->value);
    default:
      assert("Invalid token");
  }
}

Let’s try to understand what’s going on in the print_token function:

  1. Our function print_token takes in a base_token_t type as an argument. Why?? Well because it can take in both the character_token_t and integer_token_t. You don’t have a union type so you create a struct that acts like a parent(we have given it the name base_token_t)

  2. Suppose I pass in a pointer to integer_token_t to the print_token function. What’s gonna happen? Well the c compiler will typecast your argument(your pointer) to base_token_t*

  3. What are the implications of this typecasting? Let’s look at memory for a second:

    - In memory the `integer_token_t` is pointing to:
      - `integer_token_t` struct
      - the first field of the struct, which in our case is the `base_token_t` struct
  4. At this point you might ask yourself the question, why does a pointer need a type? Why can’t we just have an agnostic pointer with no types(Technically thats void*). Well we need it for the sake of interpretation. Imagine having an address in memory. How do you read it? How long should you read? 4 bytes, 8 bytes, read 3 bytes then ignore next 2 bytes and then read 5 bytes? (Now ask yourself the question, why does an http header have a Content-Length field)

  5. If a pointer type is for interpretation, then typecasting a pointer type is for interpreting it in a different way: Let’s take a classic example, we know that characters are stored in their ascii format in memory, if we interpret the pointer to the memory as a character and print it, we print the character to the screen. What if we interpreted it as a integer and then print it? Well now we interpret the ascii code as an integer and thus an integer will be printed to the screen. This is a really simple example and I would really suggest the reader to do this.

  6. In our example, the function is interpreting our integer_token_t struct as a base_token_t, this won’t cause any problems since the memory the struct is pointing can be interpreted safely as both of them.

  7. Now in the switch statement we check the type of the base_token_t, because the type is an enum value(integer technically), we read it and match it with the enum values and print it

  8. But now our token is interpreted as a base_token_t, how can we retrieve our value field from the integer_token_t struct?

Phew that was a long ride. Now let’s start porting our go code to C using struct composition.

typedef enum expression_type {
  PREFIX,
  INFIX,
  INTEGER,
} expression_type;

typedef struct expression_t {
  expression_type type;
} expression_t;

typedef struct integer_token_t {
  expression_t inherit_expression;
  uint32_t value;
} integer_token_t;

typedef struct prefix_expression_t {
  expression_t inherit_expression;
  char op;
  expression_t* rhs;
} prefix_expression_t;

typedef struct infix_expression_t {
  expression_t inherit_expression;
  expression_t* lhs;
  expression_t* rhs;
  char op;
} infix_expression_t;

int eval_expression(expression_t* expression){
  switch(expression->type){
    case PREFIX:{
      prefix_expression_t* prefix_expr = (prefix_expression_t*)expression;
      switch(prefix_expr->op){
        case '+':
          return eval_expression(prefix_expr->rhs);
        case '-':
          return -1 * eval_expression(prefix_expr->rhs);
        default:
          assert("Shouldn't hit this");
      }
    }
    case INFIX:{
      infix_expression_t* infix_expr = (infix_expression_t*)expression;
      switch(infix_expr->op){
        case '+':
          return eval_expression(infix_expr->lhs) + eval_expression(infix_expr->rhs);
        case '-':
          return eval_expression(infix_expr->lhs) - eval_expression(infix_expr->rhs);
        case '*':
          return eval_expression(infix_expr->lhs) * eval_expression(infix_expr->rhs);
        case '/':
          return eval_expression(infix_expr->lhs) / eval_expression(infix_expr->rhs);
        default:
          assert("Shouldn't hit this");
      }
    }
    case INTEGER:{
      return ((integer_token_t*)expression)->value;
    }
    default:
      assert("Should not come here");
  }
}

I thought you were going to patch GCC and add an extends keyword for structs 😡!!!

No I am not and that’s not just because of my inability(skill issue). Inheritance as data hierarchy does not require an extends keyword, in fact data hierarchy can be established in two ways:

  1. Inheritance
  2. Composition

We have composed our structs in such a way that all of them extend the same data and functionality.

Languages like Golang promote this by only providing interfaces, Rust and even PHP now on the other hand gives traits( a pseudonym for interfaces tbh) to stall the user from using the inheritance footgun.

This is far better then inheritance in my personal opinion, cause the implications both in terms of memory and compute are known to me and I dont have to dive into the compiler implementation to understand the costs of the inherits/extends abstraction(looking at you dynamic dispatch). The memory cost is determined by calculating the total size of the struct via the rules of alignment and the cost for dynamic dispatch is one cmp instruction(from our switch case).

Here if the parent changes, the children are not forced to add additional data as long as they are not using it, in a language like Java with an traditional inheritance, this will give an error at compile time. Also there exists this need to establish a taxonomy of data at significantly early stage of your project thus losing flexibility (Also Kind of why I think NoSQL might be a better choice for companies that haven’t established a PMF) .

And it’s not just the naive college senior (me) that thinks the same, read this from OG React Docs and this really nice video by CodeAesthetic.

You just implemented inheritance, where is polymorphism?

If you are in this camp: Read the blog again, especially the part of it where I present Inheritance as a special case of Polymorphism

Conclusion

Not much to write here TBH, but I kinda want to give my perspective on “understanding a concept” in Computer Science.

You only truly understand a concept if you can map it entirely to instructions and memory.

How do you do that?

You implement it in a tool that only exposes you to the same. In an ideal world write assembly on a system with a simple processor core and a block of DRAM without an operating system. If you don’t have access to that then write assembly on a Linux machine. If you don’t have the time for that then write C