Sunday, July 21, 2013

Coding practice: under the hood

In normal programming practice, most of us focus on improving the efficiency of our code by optimizing the algorithm, which is, yes, very important. But equally important is the underlying system you are going to run your application on. A better understanding of it will let you push the limits.
It is normal, conventional and good practice to keep variables in "memory" and move data to and from "disk". But sometimes it becomes necessary to keep your data in memory as well for fast access, especially when you are building a real-time response system. Most operating systems handle this very well without even letting you know about it.

And those are the times you need malloc to allocate memory faster than usual. Some of us implement our own memory pool and spend inordinate amounts of time keeping track of which objects need to be in RAM and which are on disk, moving them back and forth depending on need.
You create an object in "RAM" and it gets used a few times in rapid succession after creation. Then after some time it gets no more hits, and the kernel notices this. Then somebody asks the kernel for memory, and the kernel decides to push those unused pages out to swap space and use the RAM more sensibly for data which is actually being used by a program. This, however, is done without you knowing about it. Your application still thinks that these objects are in RAM, and they will be, the very second it tries to access them, but until then the RAM is used for something productive.
This is what Virtual Memory is all about.
After some time, you will also notice that some of your objects are unused, and you decide to move them to disk so the RAM can be used for busier objects. So you create a file and write the objects to it (if you do ;-)).
Let's see what actually happens under the hood. Your application calls write(2); the address it gives is a "virtual address", and the kernel has that address marked as not available because the page has already been pushed out to swap.
So the CPU hardware's paging unit will raise a trap, a sort of interrupt to the operating system telling it "fix the memory please".
The kernel tries to find a free page. If there are none, it will take a little-used page from somewhere, likely another little-used application object, and write it to the swap area. When that write completes, it will read the data it "paged out" from another place in the paging pool into the now unused RAM page, fix up the paging tables, and retry the instruction which failed.
Your application knows nothing about this; for your application it was just a single normal memory access.
So now you have the object in a page in RAM and written to disk in two places: one copy in the operating system's paging space and one copy in the filesystem.
The application now uses this RAM for something else, but after some time, when the application needs the object back, it first needs some RAM, so it may decide to push another object out to disk (repeat the above). Then it reads the filesystem file back into RAM. Phew, that is a hell of a lot of work.

A smart guy would have done it like this:

Allocate some virtual memory and tell the operating system to back this memory with space from a disk file (a memory-mapped file). When the application needs to access an object, it simply refers to that piece of virtual memory and leaves the rest to the kernel.
If/when the kernel decides it needs to use RAM for something else, the page will get written to the backing file and the RAM page reused elsewhere.
The next time the application refers to the virtual memory, the operating system will find a RAM page, possibly freeing one, and read the contents in from the backing file.
And that's it. The application doesn't really need to control what is cached in RAM and what is not; the kernel has code and hardware support for that, and it does a good job.
The objects are not needed as filesystem objects, so there is no point in wasting time in the filesystem name space (directories, filenames and all that) for each object. All we need to keep in memory is a pointer into virtual memory and a length; the kernel does the rest.
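As a rough illustration of the idea above, here is a minimal sketch in C, assuming a POSIX system. The names obj_ref, map_backing_file and store_object are made up for this example, and a real object store would also need an allocator to hand out offsets inside the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle: everything the application keeps per object. */
struct obj_ref {
    char  *ptr;  /* pointer into the mapped virtual memory */
    size_t len;  /* length of the object in bytes */
};

/* Map `size` bytes of the file at `path` (created if missing) into memory.
 * Returns the base address of the mapping, or NULL on failure. */
char *map_backing_file(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {  /* make the file large enough */
        close(fd);
        return NULL;
    }
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return base == MAP_FAILED ? NULL : (char *)base;
}

/* Copy an object into the mapping at `offset` and return its handle.
 * The caller must make sure offset + len fits inside the mapping. */
struct obj_ref store_object(char *base, size_t offset,
                            const void *data, size_t len)
{
    assert(base != NULL);
    struct obj_ref ref = { base + offset, len };
    memcpy(ref.ptr, data, len);  /* a plain memory write, no write(2) call */
    return ref;
}
```

With this, something like store_object(base, 128, "hello", 6) gives back a pointer and a length, and from then on the kernel decides when the page lives in RAM and when it lives only in the backing file.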
Virtual memory was meant to make it easier to program when data was larger than the physical memory, but people have still not caught on.

Multi-CPU designs have become the fancy of the world, despite the fact that they suck as a programming model. Multiple layers of cache make it even more difficult to program optimally.

To read a memory location means to check if we have it in the CPU's level 1 cache. It is unlikely to be there unless it is very frequently used. Next we check the level 2 cache, and let us assume that is a miss as well.
If this is a single-CPU system, the game ends here: we pick it out of RAM and move on. On a multi-CPU system, and it doesn't matter if the CPUs share a socket or have their own, we first have to check if any of the other CPUs have a modified copy of the variable stored in their caches, so a special bus transaction goes out to find this out. If some CPU comes back and says "yeah, I have it", that CPU gets to write it to RAM. On good hardware designs, our CPU will listen in on the bus during that write operation; on bad designs it will have to do a memory read afterwards.
Now the CPU can increment the value of the variable and write it back. But it is unlikely to go directly back to memory; we might need it again quickly, so the modified value gets stored in our own L1 cache and then, at some point, it will end up in RAM.
Now imagine that another CPU wants to increment a different variable at the same time. Can it do that? Not if the two variables sit close together. Caches operate not on bytes but on "lines" of some size (smaller than the page size), typically from 8 to 128 bytes each. So while the first CPU is busy dealing with one variable, the second CPU will be trying to grab the same cache line, and it will have to wait, even though it is a different variable.
Remember what an increment involves: the CPU reads the variable and then writes it back. It may or may not involve a load into a CPU register, but that is not important.
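To make the cache-line point concrete, here is a small sketch in C11. The 64-byte line size is an assumption (it is common on current hardware, but check your own CPU), and the struct and function names are invented for this example. It only shows the memory layout; it does not measure the slowdown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size in bytes, not universal */

/* Two counters right next to each other. They will usually share one
 * cache line, so CPUs updating different counters still fight over it. */
struct packed_counters {
    uint64_t a;
    uint64_t b;
};

/* Each counter aligned to the start of its own cache line, so updating
 * one does not touch the line holding the other. Costs extra memory. */
struct padded_counters {
    alignas(CACHE_LINE) uint64_t a;
    alignas(CACHE_LINE) uint64_t b;
};

/* The padded version takes exactly two full lines. */
static_assert(sizeof(struct padded_counters) == 2 * CACHE_LINE,
              "each counter should occupy its own cache line");

/* Index of the cache line a byte offset falls in, counted from a
 * line-aligned base address. */
size_t line_of(size_t offset)
{
    return offset / CACHE_LINE;
}
```

Comparing line_of(offsetof(...)) for the two members shows the difference: in packed_counters both fields map to the same line, in padded_counters they do not.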

This is a place where pointers help. If you are writing an application that needs to process some kind of message, allocate virtual memory for the messages up front, pass pointers to them around the application, and reuse the objects instead of freeing them. This model of programming avoids unnecessary malloc and free calls. A zero-copy model of programming also helps in this case.
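One way to sketch that reuse in C is a fixed pool of message objects with a free list. Everything here (msg, msg_pool, the sizes, the function names) is hypothetical, and the pool is not thread-safe as written; it is only meant to show the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

#define MSG_SIZE   256  /* hypothetical fixed payload size */
#define POOL_COUNT 64   /* hypothetical number of preallocated messages */

struct msg {
    struct msg *next_free;         /* links unused messages together */
    size_t      len;               /* payload bytes currently in use */
    char        payload[MSG_SIZE];
};

struct msg_pool {
    struct msg  slots[POOL_COUNT]; /* all messages, allocated once */
    struct msg *free_list;         /* head of the list of unused messages */
};

/* Put every slot on the free list. */
void pool_init(struct msg_pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_COUNT; i++) {
        p->slots[i].len = 0;
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Take a message from the pool; returns NULL when the pool is empty. */
struct msg *pool_get(struct msg_pool *p)
{
    struct msg *m = p->free_list;
    if (m != NULL) {
        p->free_list = m->next_free;
        m->len = 0;
    }
    return m;
}

/* Hand a message back so the next pool_get can reuse it. */
void pool_put(struct msg_pool *p, struct msg *m)
{
    assert(m != NULL);
    m->next_free = p->free_list;
    p->free_list = m;
}
```

The rest of the application passes the struct msg pointer from stage to stage rather than copying the payload, and there is no malloc or free on the hot path.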