Nim types (originally a Reddit reply)

9th January 2018 - Nim, Programming

While perusing the Nim subreddit, I stumbled across a post asking for an explanation of how types work in Nim, especially how Nim allocates different types on the heap and the stack. Since the answer grew pretty long, I decided to post it here as well for posterity. What follows is a copy of my response with some additional markdown.

First order of business: heap vs. stack

Whenever you call a function, a new stack frame is pushed onto the stack. Simply put, a stack is first in, last out, so when a function call is done its frame is popped from the stack and you return to the previous frame. When you declare a variable inside a function, it is typically allocated on the stack. This means that when the function returns, that part of memory is no longer in use and the value is gone (technically it's still there, but we shouldn't try to access it).
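To make stack frames a little more concrete, here is a minimal Nim sketch (the proc name depth is just for illustration). Conceptually, each recursive call gets its own frame holding its own copy of local; in practice the C compiler that Nim generates code for may keep such small values in registers instead, but the lifetime is the same: the value is gone once its call returns.

```nim
proc depth(n: int): int =
  # `local` conceptually lives in this call's stack frame. Every recursive
  # call gets its own `local`, which disappears when that call returns.
  let local = n * 2
  if n == 0:
    result = local
  else:
    # The recursive call pushes a new frame; when it returns, that frame is
    # popped and only its result value is copied back to us.
    result = local + depth(n - 1)

echo depth(3)  # 6 + 4 + 2 + 0 = 12
```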

The heap, on the other hand, is quite different. Items on the heap are allocated and live there until you deallocate them. In C this is done manually with calls like malloc and free. In Nim, however (and in C# for that matter), we have a garbage collector (or GC). This nifty thing goes through our memory and checks whether anything still points to the things allocated on the heap; if nothing points to something that was previously allocated, it gets freed. This means that we won't leak memory as easily as we would in C, since things that we've lost every reference to get cleaned up automatically.
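A minimal sketch of a heap object outliving the function that created it, assuming Nim's default garbage-collected memory management (Counter and makeCounter are hypothetical names). Note that there is no call to free anywhere:

```nim
type
  Counter = ref object   # hypothetical type: a reference to a heap object
    count: int

proc makeCounter(): Counter =
  # `new` allocates the Counter on the heap; only the reference itself
  # sits in makeCounter's stack frame.
  new(result)
  result.count = 0

var c = makeCounter()  # the heap object outlives makeCounter's frame
c.count += 1
echo c.count           # 1
c = nil                # nothing refers to the object any more, so the GC is
                       # free to reclaim it; no manual free is needed
```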

Declaring types and where they are placed

Now that we know where our stuff lives in memory, let's look at how Nim represents those things. Basic types are things like integers, floats, bools, and characters. Their size is known and fixed. Nim also calls strings a basic type, but they are not quite like the rest since they can change in size. Since our stack is first in, last out, it makes sense to store everything on it in order. But storing things in order isn't possible if you want to change the size of something (for example, appending to a string), so the growable contents of a string are kept on the heap instead. When we declare numbers as local variables in our Nim code, they will be stored on the stack. This is why you never need to free your numbers and why there is nothing to clean up when you're done with them.
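A small sketch of the difference. The size checks rely on what the Nim manual specifies (int has the same size as a pointer, float is float64); the internal layout of string differs between Nim versions and memory-management modes, so the comments describe it only loosely. The proc name addInts is hypothetical.

```nim
# Fixed-size basic types: the compiler knows exactly how big they are.
doAssert sizeof(int) == sizeof(pointer)  # int is pointer-sized in Nim
doAssert sizeof(float) == 8              # float is always float64
doAssert sizeof(bool) == 1
doAssert sizeof(char) == 1

proc addInts(a, b: int): int =
  # `a`, `b` and `total` are plain locals: no allocation, nothing to free.
  let total = a + b
  result = total

# A string can grow, so once modified its characters are kept in a heap
# buffer; the variable itself is only a small, fixed-size handle to it.
var greeting = "Hello"
greeting.add(", world")   # may (re)allocate the buffer on the heap
echo greeting             # Hello, world
echo greeting.len         # 12
echo addInts(2, 3)        # 5
```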

So what are types? In Nim you can create aliases for types, like type age = int. This is just a way to say that age is an integer, and it will be treated like one for all intents and purposes. If we want to group several values together to represent something particular, we can create objects. Don't think of these quite like objects in an object-oriented language; think of them more like structs in C. Such objects are simply a collection of values. So in Nim, when we create an object in a local variable, it lives with us on the stack. If we return an object, it will be copied to our caller, and if we insert it into a data structure, it will also be copied. While this might be practical in many cases (even offering a speed benefit if done right), we often don't want to copy our large objects around. This is where allocating on the heap comes into play.
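A sketch of both ideas: an alias and a plain (value) object. I've written the alias as Age, following Nim's convention of capitalised type names; Person and its fields are hypothetical.

```nim
type
  Age = int              # an alias: Age is just another name for int
  Person = object        # a plain value type, laid out much like a C struct
    height, weight: int
    age: Age

var alice = Person(height: 170, weight: 65, age: 30)
var twin = alice         # assignment copies every field
twin.age = 31            # changes only the copy
echo alice.age           # 30
echo twin.age            # 31

var people: seq[Person] = @[]
people.add(alice)        # the seq stores its own copy of alice
people[0].age = 99
echo alice.age           # still 30

let years: Age = 5
echo years + 1           # an Age works anywhere an int does: 6
```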

If we define our type as a ref object, the type is actually a reference to an object. I'll come back to the difference between a reference and a pointer later, but for now just remember that a reference is the memory location of an object on the heap. This means that if we return a ref object, we're only returning the memory address of that object, not copying the object itself. It also means that if we insert it into a data structure, only the reference to the object is inserted. Whenever you see new SomeObject, it allocates memory for that object on the heap and gives us a reference to it. If we had simply done var myObject: SomeObject, and SomeObject was defined as a ref object, we would only have a reference on our stack, so trying to access the object through it would crash with an "Illegal storage access" error. This is because Nim initialises our reference to nil, and no program is allowed to access memory address 0.
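The same kind of Person as a ref object, as a minimal sketch (all names are hypothetical). It shows the nil default, allocation with new, and assignment copying only the reference:

```nim
type
  Person = ref object    # variables of this type hold a reference
    height, weight, age: int

var nobody: Person       # defaults to nil: no object has been allocated
echo nobody.isNil        # true; reading nobody.age here would crash

let carol = new(Person)  # allocates a zero-initialised Person on the heap
echo carol.age           # 0

var bob = Person(height: 180, weight: 80, age: 40)  # also heap-allocated
var sameBob = bob        # copies the reference, not the object
sameBob.age = 41
echo bob.age             # 41: both names refer to the same heap object
```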

So imagine we had an object that contained the height, the weight, and the age of a person. That could be represented by three integers. If we wanted to return this object, it would mean copying all three of those values to our caller. If we defined it as a reference, we would only pass one pointer-sized value: the position in memory (on the heap) where the other three are stored. Conveniently, this also means that if one function modifies a value in a referenced object, that change will be visible to all other functions using the same reference (since they all point to the same place in memory). This is practical, for example, if you want to make one list sorted by age, one by height, and a third by weight. Instead of copying our person three times, we could just use three references, one in each list.
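The three-lists idea might look something like this, as a sketch using sortedByIt from Nim's std/algorithm module (the people and their numbers are made up):

```nim
import std/algorithm

type
  Person = ref object
    name: string
    height, weight, age: int

let people = @[
  Person(name: "Ann", height: 165, weight: 70, age: 52),
  Person(name: "Ben", height: 182, weight: 60, age: 23),
  Person(name: "Cid", height: 174, weight: 88, age: 37)]

# Three lists in three orders; each holds references, not copies,
# so there is still only one Ann, one Ben and one Cid on the heap.
var byAge = people.sortedByIt(it.age)          # Ben, Cid, Ann
var byHeight = people.sortedByIt(it.height)    # Ann, Cid, Ben
var byWeight = people.sortedByIt(it.weight)    # Ben, Ann, Cid

# Change Ben through one list...
byAge[0].weight = 95
# ...and every list sees it, because they all share the same object.
echo byWeight[0].name, " ", byWeight[0].weight  # Ben 95
```

Note that byWeight is not re-sorted when Ben's weight changes: sharing references keeps the data in sync, but each list's ordering is only as fresh as its last sort.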

Pointers vs. references

So now that we know where and how our values are stored, we can look at the difference between a pointer and a reference. In pure Nim code you would typically only use references; these are what every call to new creates, and what goes on behind the scenes most of the time when working with strings. A reference is also called a managed pointer, which simply means that Nim manages this area of memory and that it will be freed automatically by the garbage collector once Nim sees that we're no longer using it. Pointers, on the other hand, are unmanaged, meaning that Nim doesn't try to do anything with the memory behind a pointer; most of the time you won't even know what is there. The reason there are pointers in Nim is mostly to interface with C. In C, every time you want objects on the heap you need to manually malloc and free them. Many C libraries work by passing around a pointer to a structure containing some state, and all good libraries have some way of dealing with this memory. Typically you call some initialisation function when you begin and then some cleanup function when you are done. In Nim we might want to use such a C library, but the library may still keep a reference to that memory somewhere Nim doesn't know about, even after we have lost every reference to it in our own code. If it were a Nim reference, Nim would garbage collect it out from under the library. So instead we have pointers.
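A minimal sketch contrasting the two, using Nim's own alloc0 and dealloc to stand in for a C library's malloc and free (State is a hypothetical stand-in for whatever struct such a library would hand you):

```nim
type
  State = object            # hypothetical stand-in for a C library's state struct
    counter: int

# Unmanaged: we request raw, zeroed memory ourselves and must hand it back.
var raw = cast[ptr State](alloc0(sizeof(State)))
raw.counter = 10            # fields are reachable through the pointer
let observed = raw.counter
echo observed               # 10
dealloc(raw)                # forgetting this leaks; touching raw afterwards is undefined

# Managed: a ref is tracked by Nim and freed automatically.
var managed = new(State)    # the type is ref State
managed.counter = 10
echo managed.counter        # 10
```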